Pretty and opinionated topic model visualization in Python.
- Enhanced readability and legibility of graphs.
- Added helper tooltips to help you understand and interpret the graphs.
- Improved stability.
- Negative topic distributions are now supported in documents.
- Investigate complex relations between topics, words, documents and groups/genres/labels interactively
- Easy to use pipelines that can be utilized for downstream tasks
- Sklearn, Gensim and BERTopic compatible (stay tuned for more)
- Interactive and composable Plotly figures
- Automatically infer topic names, oooor...
- Name topics manually
- Easy deployment
Install from PyPI:
pip install topic-wizard
The main abstraction of topicwizard around a topic model is a topic pipeline. It consists of a vectorizer, which turns texts into bag-of-tokens representations, and a topic model, which decomposes these representations into vectors of topic importance.
topicwizard allows you to use either scikit-learn pipelines or its own TopicPipeline.
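To make the two stages concrete, here is a minimal sketch that uses only scikit-learn on a tiny made-up corpus (the `toy_corpus` texts are placeholders, not part of topicwizard). It shows the shape of the bag-of-tokens matrix and of the topic-importance vectors that the topic model produces from it.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer

# Placeholder texts, purely for illustration.
toy_corpus = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell as markets closed",
    "investors sold stocks and bonds",
]

# Stage 1: texts -> bag-of-tokens matrix of shape (n_documents, n_vocabulary).
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(toy_corpus)

# Stage 2: bag-of-tokens -> topic importances of shape (n_documents, n_topics).
model = NMF(n_components=2, random_state=0, max_iter=500)
doc_topic = model.fit_transform(bow)

print("bag-of-tokens shape:", bow.shape)
print("document-topic shape:", doc_topic.shape)
print("topic-term shape:", model.components_.shape)
```

Each row of `doc_topic` is one document's vector of topic importances, and each row of `model.components_` holds one topic's weights over the vocabulary.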
Let's build a pipeline. We will use scikit-learn's CountVectorizer as our vectorizer component:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(min_df=5, max_df=0.8, stop_words="english")
The topic model I will use for this example is Non-negative Matrix Factorization, as it is fast and usually finds good topics.
from sklearn.decomposition import NMF
model = NMF(n_components=10)
Then let's put this all together in a pipeline. You can either use sklearn Pipelines...
from sklearn.pipeline import make_pipeline
topic_pipeline = make_pipeline(vectorizer, model)
Or topicwizard's TopicPipeline:
from topicwizard.pipeline import make_topic_pipeline
topic_pipeline = make_topic_pipeline(vectorizer, model)
Let's load a corpus that we would like to analyze. In this example I will use 20newsgroups from sklearn.
from sklearn.datasets import fetch_20newsgroups
newsgroups = fetch_20newsgroups(subset="all")
corpus = newsgroups.data
# Sklearn returns the labels as integers, so we have to map them back to
# the actual textual labels.
group_labels = [newsgroups.target_names[label] for label in newsgroups.target]
Then let's fit our pipeline to this data:
topic_pipeline.fit(corpus)
You can launch the topicwizard web application to investigate your topic models interactively. The app is also quite easy to deploy in case you want to create a client-facing interface.
import topicwizard
topicwizard.visualize(corpus, pipeline=topic_pipeline)
From version 0.3.0 you can also disable pages you do not wish to display, which can save you a lot of time:
# Computing 2D projections for a large corpus takes a long time,
# so you can speed up preprocessing by disabling the page altogether.
topicwizard.visualize(corpus, pipeline=topic_pipeline, exclude_pages=["documents"])

[Screenshots of the application pages: Topics, Words, Documents, Groups]
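Relations between topics and group labels can also be checked by hand, outside the app. The sketch below is not topicwizard's own computation; it only uses scikit-learn and NumPy on a made-up corpus (`toy_corpus` and `toy_labels` are placeholders) and averages each group's document-topic vectors.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Placeholder texts and labels, purely for illustration.
toy_corpus = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell as markets closed",
    "investors sold stocks and bonds",
]
toy_labels = ["pets", "pets", "finance", "finance"]

pipeline = make_pipeline(
    CountVectorizer(),
    NMF(n_components=2, random_state=0, max_iter=500),
)
# Rows are documents, columns are topic importances.
doc_topic = pipeline.fit_transform(toy_corpus)

# Average the topic importances of the documents in each group.
labels = np.array(toy_labels)
group_topic = {
    group: doc_topic[labels == group].mean(axis=0)
    for group in sorted(set(toy_labels))
}

for group, importances in group_topic.items():
    print(group, np.round(importances, 3))
```

With a real corpus, `group_labels` from the 20newsgroups example would take the place of `toy_labels`.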
If you want customizable, faster, html-saveable interactive plots, you can use the figures API. Here are a couple of examples:
from topicwizard.figures import word_map, document_topic_timeline, topic_wordclouds, word_association_barchart

| Word Map | Timeline of Topics in a Document |
|---|---|
| `word_map(corpus, pipeline=topic_pipeline)` | `document_topic_timeline("Joe Biden takes over presidential office from Donald Trump.", pipeline=topic_pipeline)` |

| Wordclouds of Topics | Topic for Word Importance |
|---|---|
| `topic_wordclouds(corpus, pipeline=topic_pipeline)` | `word_association_barchart(["supreme", "court"], corpus=corpus, pipeline=topic_pipeline)` |
For more information, consult our Documentation.








