Corpus hierarchy #9

mkarikom · 2020-03-07T01:23:48Z

Hi, there are only two new commits in this branch, and the commit messages itemize the details.

Looking toward the next PR, I'm going to start adding supervised LDA functionality. I'm looking to support two forms of document-level response:

GLM (sLDA) : Gibbs and VEM
truncated log-normal for censored response data (current project) : Gibbs, VEM dicey but tempting

v1.3 compat fixed lexicon

1) Type hierarchy for data: rooted at abstract corpus and document, which support subtypes representing fully synthetic and real-world data
2) Type hierarchy for MCMC: break struct model into "model" and "state", reflecting the scope (document locality) of latent variables vs. model parameters and hyperpriors. This will facilitate clear-cut testing in the next PR, based on Grosse and Duvenaud https://arxiv.org/abs/1412.5218
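To make the two hierarchies concrete, here is a minimal sketch of how they might be laid out. All type and function names below (`AbstractCorpus`, `RealDocument`, `SyntheticDocument`, `State`, `initstate`, and so on) are hypothetical and chosen for illustration; only `Model`, `assignments`, and `docSums` echo names visible in the diff, and the actual definitions in this branch may differ.

```julia
# Hypothetical names throughout; the PR's actual types may differ.
abstract type AbstractDocument end
abstract type AbstractCorpus end

# A document from real-world data: only the word ids are observed.
struct RealDocument <: AbstractDocument
    terms::Vector{Int}
end

# A fully synthetic document also carries the topic that generated each word.
struct SyntheticDocument <: AbstractDocument
    terms::Vector{Int}
    topics::Vector{Int}
end

struct Corpus{D<:AbstractDocument} <: AbstractCorpus
    docs::Vector{D}
    lexicon::Vector{String}
end

# Model: global parameters and hyperpriors shared by every document.
struct Model
    alpha::Vector{Float64}   # Dirichlet prior over topics, one entry per topic
    beta::Vector{Float64}    # Dirichlet prior over words, one entry per lexicon term
end

# State: document-local latent variables. The struct is immutable, but its
# arrays are updated in place by the sampler.
struct State
    assignments::Vector{Vector{Int}}  # topic of each word, per document
    docSums::Matrix{Float64}          # topics x documents counts (assumed layout)
end

numtopics(m::Model) = length(m.alpha)

# Fresh state with every word unassigned (topic 0) and all counts at zero.
function initstate(m::Model, c::Corpus)
    State([zeros(Int, length(d.terms)) for d in c.docs],
          zeros(numtopics(m), length(c.docs)))
end
```

Keeping `State` separate from `Model` means a test can hold the model fixed and compare conditionals computed from different states, which is the kind of check the Grosse and Duvenaud style of testing relies on.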

1) Per-word topics: add a test for consistency (with the full joint) of the corresponding conditional
2) Get rid of mutability on structs in src/Data.jl in favor of in-place assignment

slycoder

Cool, thanks for taking this on and sorry for taking so long to review! Just a couple of comments.

slycoder · 2020-05-03T21:35:58Z

src/Computation.jl

+  return
+end
+
+function topTopicWords(model::Model,


I would consider moving this to a separate file where someday all the functions which visualize/inspect the model could live.

slycoder · 2020-05-03T21:37:09Z

src/Computation.jl

+  topicSums::Vector{Float64}
+  docSums::Array{Float64,2}
+  assignments::Array{Array{Int64,1},1}
+  conditionals::Array{Array{Float64,2},1} # the p parameter for the word assignment (cat/multinom) variable


This might not be an issue, but in the past, on large corpora where there's memory pressure, keeping the conditionals around wound up being the limiting factor (as opposed to just keeping a temporary conditional for the word currently being sampled).
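For illustration, the temporary-conditional approach could look roughly like the sketch below: one length-K buffer `p` is reused for every word instead of storing a K x N matrix per document. The function name `resample_word!`, the `topicWordSums` matrix, and the argument layout are assumptions made for this example, not code from the branch; the update is the standard collapsed Gibbs conditional for LDA.

```julia
# Sketch only: `resample_word!` and `topicWordSums` are hypothetical names.
# Resample the topic of word `i` in document `d`, overwriting the shared buffer `p`.
function resample_word!(p::Vector{Float64}, z::Vector{Int}, terms::Vector{Int},
                        i::Int, d::Int,
                        docSums::Matrix{Float64}, topicWordSums::Matrix{Float64},
                        topicSums::Vector{Float64},
                        alpha::Vector{Float64}, beta::Vector{Float64})
    w, kold = terms[i], z[i]
    # Remove the word's current assignment from the sufficient statistics.
    docSums[kold, d] -= 1
    topicWordSums[kold, w] -= 1
    topicSums[kold] -= 1
    # Fill the buffer with the unnormalized conditional over topics.
    betaSum = sum(beta)
    for k in eachindex(p)
        p[k] = (docSums[k, d] + alpha[k]) * (topicWordSums[k, w] + beta[w]) /
               (topicSums[k] + betaSum)
    end
    # Draw from the unnormalized categorical by walking the cumulative sum.
    u = rand() * sum(p)
    acc = 0.0
    knew = length(p)
    for k in eachindex(p)
        acc += p[k]
        if u < acc
            knew = k
            break
        end
    end
    # Add the new assignment back into the sufficient statistics.
    docSums[knew, d] += 1
    topicWordSums[knew, w] += 1
    topicSums[knew] += 1
    z[i] = knew
    return knew
end
```

With this layout the extra memory is O(K) regardless of corpus size, at the cost of not being able to inspect the per-word conditionals after the sweep.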

slycoder · 2020-05-03T21:48:34Z

(Also there seem to be conflicts on this branch)

mkarikom · 2020-05-04T20:13:31Z

Thanks for looking through the code! (I'll look into those conflicts and see what's going on)

I would consider moving this to a separate file where someday all the functions which visualize/inspect the model could live.

Sounds good, that would also solve the issue of exports being scattered around various files, where most of the functions are not exported.

This might not be an issue, but in the past, on large corpora where there's memory pressure, keeping the conditionals around wound up being the limiting factor (as opposed to just keeping a temporary conditional for the word currently being sampled).

Yeah, I agree. Since we already have sufficient stats from the collapsed model, I can just compute the theta samples post hoc if necessary. In my current application the mixed membership is of interest, but as you say, the actual class is usually the relevant observation.
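As a rough sketch of the post hoc computation: given the collapsed counts, the document-topic proportions for each document are conditionally Dirichlet with parameters `alpha .+ docSums[:, d]`. The hypothetical helper below, `estimate_theta`, returns the mean of that Dirichlet rather than a random draw, and assumes `docSums` is laid out as topics x documents.

```julia
# Sketch: conditional-mean estimate of theta from the collapsed counts.
# Assumes docSums is topics x documents; `estimate_theta` is a hypothetical name.
function estimate_theta(docSums::Matrix{Float64}, alpha::Vector{Float64})
    theta = docSums .+ alpha            # broadcast alpha down each column
    return theta ./ sum(theta, dims=1)  # normalize each document's column
end
```

An actual theta sample would replace the normalization with a Dirichlet draw using the same parameters, which needs a gamma sampler from outside the standard library.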

mkarikom · 2020-05-28T01:33:30Z

Sorry about the late reply (had another paper that needed to be submitted)...

I think all of these are safe to overwrite.
Project.toml and Manifest.toml are written exclusively by the Pkg backend:
Project.toml gets updated because of the new deps in TopicModels.jl
Manifest.toml gets updated automatically because of Project.toml

TopicModels.jl has changed completely, from containing classes and functions to just serving as the Julia equivalent of DESCRIPTION.R under the Julia 1.3 Pkg framework.

mkarikom · 2020-05-28T19:55:32Z

If you want I can do the merge by hand, just need write access ;)

slycoder · 2020-05-28T20:37:49Z

I'm not sure what you mean? You should be able to resolve the conflicts in your branch.

mkarikom · 2020-05-28T20:49:42Z

Ok, I see, sorry just getting the hang of this. Done

I'm going to move all .toml files to .gitignore, since they are all going to be dependent on the local platform.

boathit and others added 5 commits March 6, 2020 15:04

fix deprecated warning for julia 0.4

295e3e3

v1.3 compat

feec08d

v1.3 compat fixed lexicon

Add unit test for Gibbs sampler, etc

827bb48

1) Per-word topics: add a test for consistency (with the full joint) of the corresponding conditional
2) Get rid of mutability on structs in src/Data.jl in favor of in-place assignment

comments in gibbs tests

16cc790

slycoder approved these changes May 3, 2020


Merge branch 'master' into corpus_hierarchy

c2133e8

slycoder merged commit c814d7f into slycoder:master May 28, 2020

mkarikom deleted the corpus_hierarchy branch June 1, 2020 20:57
