Corpus hierarchy #9

Merged
merged 6 commits into from
May 28, 2020
mkarikom
Contributor

@mkarikom mkarikom commented Mar 7, 2020

Hi, there are only two new commits in this branch, and the commit messages are itemized with details.

Looking toward the next PR, I'm going to start adding supervised LDA functionality. I'm looking to support two forms of document-level response:

  1. GLM (sLDA) : Gibbs and VEM
  2. truncated log-normal for censored response data (current project) : Gibbs, VEM dicey but tempting

boathit and others added 5 commits March 6, 2020 15:04
v1.3 compat

fixed lexicon
1) Type hierarchy for data: rooted at abstract corpus and document types, which support subtypes representing fully synthetic and real-world data
2) Type hierarchy for MCMC: break struct model into "model" and "state", reflecting the scope (document locality) of latent variables vs. model parameters and hyperpriors. This will facilitate clear-cut testing in the next PR, based on Grosse and Duvenaud https://arxiv.org/abs/1412.5218
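
A minimal sketch of how the two hierarchies described in this commit message might look in Julia. All type names (`AbstractCorpus`, `AbstractDocument`, `LexicalDocument`, `SyntheticDocument`, `Corpus`, `State`) and the prior fields on `Model` are assumptions made for illustration, not the PR's actual definitions; only the `assignments`, `docSums`, and `topicSums` field names mirror the reviewed diff.

```julia
# Hypothetical names throughout; the PR's actual types may differ.
abstract type AbstractDocument end
abstract type AbstractCorpus end

# A real-world document: only the token ids are observed.
struct LexicalDocument <: AbstractDocument
    terms::Vector{Int}
end

# A fully synthetic document: the generating topic of each token is also known.
struct SyntheticDocument <: AbstractDocument
    terms::Vector{Int}
    topics::Vector{Int}
end

struct Corpus{D<:AbstractDocument} <: AbstractCorpus
    docs::Vector{D}
    vocabSize::Int
end

# Model: parameters and hyperpriors shared by the whole corpus.
struct Model
    alphaPrior::Vector{Float64}   # Dirichlet prior over topics (length K)
    betaPrior::Vector{Float64}    # Dirichlet prior over words (length V)
end

# State: latent variables and count statistics local to documents and tokens.
struct State
    assignments::Vector{Vector{Int}}  # topic of each token, per document
    docSums::Matrix{Float64}          # K x D topic counts per document
    topicSums::Vector{Float64}        # number of tokens assigned to each topic
end

# Initialize the state with uniformly random topic assignments.
function State(model::Model, corpus::AbstractCorpus)
    K = length(model.alphaPrior)
    nDocs = length(corpus.docs)
    assignments = [rand(1:K, length(doc.terms)) for doc in corpus.docs]
    docSums = zeros(K, nDocs)
    topicSums = zeros(K)
    for (d, z) in enumerate(assignments), k in z
        docSums[k, d] += 1
        topicSums[k] += 1
    end
    return State(assignments, docSums, topicSums)
end

corpus = Corpus([LexicalDocument([1, 2, 2]), LexicalDocument([3, 1])], 3)
model = Model(fill(0.1, 2), fill(0.01, 3))
state = State(model, corpus)
```

Keeping `Model` immutable and separate from `State` means a sampler only has to mutate the contents of the state's arrays, which is one way to read the "model" vs. "state" split above.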
1) Per-word topics: add a test for consistency (with the full joint) of the corresponding conditional
2) Get rid of mutability on structs in src/Data.jl in favor of in-place assignment
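
The consistency test mentioned in this commit message can be sketched as follows, in the spirit of the Grosse and Duvenaud approach: the closed-form collapsed Gibbs conditional for one token should equal the conditional obtained by evaluating the full joint at every candidate topic and normalizing. This is an illustrative sketch that assumes symmetric scalar priors `alpha` and `beta`; the function names (`logRising`, `logJoint`, `gibbsConditional`, `conditionalFromJoint`) are hypothetical and not taken from the PR.

```julia
# log of Γ(a + n) / Γ(a) for integer n ≥ 0, written as a rising factorial
# so that no special-functions package is needed.
function logRising(a::Float64, n::Int)
    s = 0.0
    for j in 0:n-1
        s += log(a + j)
    end
    return s
end

# Collapsed log joint log p(w, z | alpha, beta) with symmetric Dirichlet priors.
function logJoint(docs, z, K, V, alpha, beta)
    lp = 0.0
    topicWord = zeros(Int, K, V)
    for (d, words) in enumerate(docs)
        docTopic = zeros(Int, K)
        for (i, w) in enumerate(words)
            docTopic[z[d][i]] += 1
            topicWord[z[d][i], w] += 1
        end
        lp += sum(logRising(alpha, n) for n in docTopic) -
              logRising(K * alpha, length(words))
    end
    for k in 1:K
        lp += sum(logRising(beta, topicWord[k, w]) for w in 1:V) -
              logRising(V * beta, sum(topicWord[k, :]))
    end
    return lp
end

# Closed-form conditional p(z_{d,i} = k | z_{-(d,i)}, w), with token (d, i) excluded
# from all counts.
function gibbsConditional(docs, z, d, i, K, V, alpha, beta)
    w = docs[d][i]
    p = zeros(K)
    for k in 1:K
        ndk = 0; nkw = 0; nk = 0
        for (dd, words) in enumerate(docs), (ii, ww) in enumerate(words)
            (dd == d && ii == i) && continue
            z[dd][ii] == k || continue
            nk += 1
            nkw += (ww == w)
            ndk += (dd == d)
        end
        p[k] = (ndk + alpha) * (nkw + beta) / (nk + V * beta)
    end
    return p ./ sum(p)
end

# The same conditional, obtained by brute force from the full joint.
function conditionalFromJoint(docs, z, d, i, K, V, alpha, beta)
    logp = zeros(K)
    zc = deepcopy(z)
    for k in 1:K
        zc[d][i] = k
        logp[k] = logJoint(docs, zc, K, V, alpha, beta)
    end
    p = exp.(logp .- maximum(logp))
    return p ./ sum(p)
end

docs = [[1, 2, 2, 3], [3, 3, 1]]   # toy corpus of word ids
z    = [[1, 2, 1, 2], [2, 1, 1]]   # toy topic assignments
K, V, alpha, beta = 2, 3, 0.5, 0.1
for d in 1:length(docs), i in 1:length(docs[d])
    a = gibbsConditional(docs, z, d, i, K, V, alpha, beta)
    b = conditionalFromJoint(docs, z, d, i, K, V, alpha, beta)
    @assert isapprox(a, b; atol=1e-10)
end
```

The brute-force version is far too slow for real sampling, but on a toy corpus it gives an independent check that the fast conditional is consistent with the joint.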
Owner

@slycoder slycoder left a comment

Cool, thanks for taking this on and sorry for taking so long to review! Just a couple of comments.

return
end

function topTopicWords(model::Model,
Owner

I would consider moving this to a separate file where, someday, all the functions that visualize/inspect the model could live.

topicSums::Vector{Float64}
docSums::Array{Float64,2}
assignments::Array{Array{Int64,1},1}
conditionals::Array{Array{Float64,2},1} # the p parameter for the word assignment (cat/multinom) variable
Owner

This might not be an issue, but in the past, on large corpora where there's memory pressure, keeping the conditionals around wound up being the limiting factor (as opposed to just keeping a temporary conditional for the word currently being sampled).
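
To illustrate the trade-off raised in this comment, here is a rough sketch of a collapsed Gibbs sweep that reuses a single length-K scratch buffer for the token currently being sampled, rather than storing a conditional for every token. It assumes symmetric scalar priors; `sampleToken!`, `sweep!`, and `wordSums` are hypothetical names, and this is not the PR's sampler.

```julia
# Fill `buffer` with the unnormalized conditional for one token and draw a topic.
# `docTopic`, `topicWord`, and `topicSums` must already exclude the current token.
function sampleToken!(buffer::Vector{Float64}, docTopic, topicWord, topicSums,
                      alpha, beta, V)
    K = length(buffer)
    for k in 1:K
        buffer[k] = (docTopic[k] + alpha) * (topicWord[k] + beta) /
                    (topicSums[k] + V * beta)
    end
    # Inverse-CDF draw from the unnormalized weights.
    u = rand() * sum(buffer)
    acc = 0.0
    for k in 1:K
        acc += buffer[k]
        u <= acc && return k
    end
    return K  # guard against floating-point round-off
end

# One full sweep over all tokens; conditional storage is O(K), not O(K * tokens).
function sweep!(assignments, docs, docSums, wordSums, topicSums, alpha, beta)
    K, V = size(wordSums)
    buffer = zeros(K)
    for (d, words) in enumerate(docs), (i, w) in enumerate(words)
        k = assignments[d][i]
        docSums[k, d] -= 1; wordSums[k, w] -= 1; topicSums[k] -= 1
        k = sampleToken!(buffer, view(docSums, :, d), view(wordSums, :, w),
                         topicSums, alpha, beta, V)
        assignments[d][i] = k
        docSums[k, d] += 1; wordSums[k, w] += 1; topicSums[k] += 1
    end
    return nothing
end

docs = [[1, 2, 2, 3], [3, 3, 1]]   # toy corpus of word ids
K, V = 2, 3
assignments = [rand(1:K, length(doc)) for doc in docs]
docSums = zeros(K, length(docs)); wordSums = zeros(K, V); topicSums = zeros(K)
for (d, words) in enumerate(docs), (i, w) in enumerate(words)
    k = assignments[d][i]
    docSums[k, d] += 1; wordSums[k, w] += 1; topicSums[k] += 1
end
sweep!(assignments, docs, docSums, wordSums, topicSums, 0.5, 0.1)
```

Storing every conditional costs K floats per token, whereas the buffer above costs K floats in total; the stored version is mainly useful for testing and inspection.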

@slycoder
Owner

slycoder commented May 3, 2020

(Also there seem to be conflicts on this branch)

@mkarikom
Contributor Author

mkarikom commented May 4, 2020 via email

@mkarikom
Contributor Author

Sorry about the late reply (had another paper that needed to be submitted)...

I think all of these are safe to overwrite.
Project.toml and Manifest.toml are written exclusively by the Pkg backend, such that:
- Project.toml gets updated because of the new deps in TopicModels.jl
- Manifest.toml gets updated automatically because of Project.toml

TopicModels.jl is completely changed: instead of holding classes and functions, it now just serves as the Julia equivalent of DESCRIPTION.R under the Julia 1.3 Pkg framework.

@mkarikom
Contributor Author

If you want, I can do the merge by hand; I just need write access ;)

@slycoder
Owner

I'm not sure what you mean? You should be able to resolve the conflicts in your branch.

@mkarikom
Contributor Author

OK, I see. Sorry, I'm just getting the hang of this. Done.

I'm going to move all .toml files to .gitignore; this is all going to be local and platform-dependent.

@slycoder slycoder merged commit c814d7f into slycoder:master May 28, 2020
@mkarikom mkarikom deleted the corpus_hierarchy branch June 1, 2020 20:57
3 participants