lda: Topic modeling with latent Dirichlet allocation
lda implements latent Dirichlet allocation (LDA) using collapsed Gibbs
sampling. LDA is described in Blei et al. (2003) and Pritchard et al. (2000).
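For intuition, here is a minimal sketch of what one sweep of a collapsed Gibbs
sampler for LDA does, where alpha and beta are symmetric Dirichlet
hyperparameters. This is illustrative only: the function and variable names
are assumptions for the sketch, not lda's internals (which are written in
Cython)::

    import numpy as np

    def gibbs_sweep(z, doc_of, word_of, ndk, nkw, nk, alpha, beta, rng=None):
        """Resample the topic of every token once (one Gibbs sweep).

        z[i] is the current topic of token i; doc_of[i] and word_of[i] are
        its document and word ids.  ndk, nkw, and nk are the document-topic,
        topic-word, and topic count tables, kept in sync with z.
        """
        rng = rng if rng is not None else np.random.default_rng()
        n_topics, vocab_size = nkw.shape
        for i in range(len(z)):
            d, w, k = doc_of[i], word_of[i], z[i]
            # Remove token i from the count tables.
            ndk[d, k] -= 1
            nkw[k, w] -= 1
            nk[k] -= 1
            # Collapsed conditional: p(z_i = k | z_-i, w) is proportional to
            # (ndk[d, k] + alpha) * (nkw[k, w] + beta) / (nk[k] + V * beta).
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
            k = rng.choice(n_topics, p=p / p.sum())
            # Put token i back with its newly sampled topic.
            z[i] = k
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1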
Install lda with pip::

    pip install lda
lda.LDA implements latent Dirichlet allocation (LDA). The interface follows
conventions found in scikit-learn.
>>> import numpy as np
>>> import lda
>>> X = np.array([[1, 1], [2, 1], [3, 1], [4, 1], [5, 8], [6, 1]])
>>> model = lda.LDA(n_topics=2, n_iter=100, random_state=1)
>>> doc_topic = model.fit_transform(X)  # estimate of document-topic distributions
>>> model.components_  # estimate of topic-word distributions; model.topic_word_ is an alias

Python 2.7 or Python 3.3+ is required. numpy is also required.
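Continuing the example above, the rows of model.components_ can be sorted to
list each topic's most probable words. The two-entry vocabulary here is a
made-up stand-in for the two columns of X:

>>> vocab = ("word0", "word1")  # hypothetical labels for the columns of X
>>> for k, dist in enumerate(model.components_):
...     top = np.array(vocab)[np.argsort(dist)[::-1]]
...     print("topic {}: {}".format(k, " ".join(top)))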
lda aims for simplicity. (It happens to be fast, as essential parts are
written in C via Cython_.) If you are working with a very large corpus you may
wish to use more sophisticated topic models such as those implemented in hca_
and MALLET_. hca is written in C and MALLET is written in Java. Unlike
lda, hca can use more than one processor at a time.
- Documentation: http://pythonhosted.org/lda
- Source code: https://github.com/ariddell/lda/
- Issue tracker: https://github.com/ariddell/lda/issues
lda is licensed under Version 2.0 of the Mozilla Public License.