topicwizard

Pretty and opinionated topic model visualization in Python.

New in version 0.5.0 🌟

Enhanced readability and legibility of graphs.
Added helper tooltips to help you understand and interpret the graphs.
Improved stability.
Negative topic distributions are now supported in documents.

Features

Investigate complex relations between topics, words, documents and groups/genres/labels interactively
Easy-to-use pipelines that can be reused in downstream tasks
Sklearn, Gensim and BERTopic compatible (stay tuned for more) 🔩
Interactive and composable Plotly figures
Automatically infer topic names, oooor...
Name topics manually
Easy deployment 🌍

Installation

Install from PyPI:

pip install topic-wizard

Pipelines

topicwizard's main abstraction around a topic model is the topic pipeline. It consists of a vectorizer, which turns texts into bag-of-tokens representations, and a topic model, which decomposes these representations into vectors of topic importance. You can use either a scikit-learn Pipeline or topicwizard's own TopicPipeline.
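
To make the "bag-of-tokens" idea concrete, here is a minimal pure-Python sketch of what a vectorizer does conceptually. It is not how scikit-learn's CountVectorizer is implemented; the helper names `tokenize` and `bag_of_tokens` and the toy corpus are made up for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_tokens(texts):
    """Return (vocabulary, rows); each row counts vocabulary tokens in one text."""
    counts = [Counter(tokenize(text)) for text in texts]
    # Sort the vocabulary so that the column order is deterministic.
    vocabulary = sorted(set().union(*counts))
    rows = [[count[token] for token in vocabulary] for count in counts]
    return vocabulary, rows


# Hypothetical two-document corpus, for illustration only.
toy_corpus = ["The court ruled today.", "The supreme court met the press."]
vocabulary, rows = bag_of_tokens(toy_corpus)
print(vocabulary)
print(rows)
```

The topic model then works on count rows like these rather than on the raw text.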

Let's build a pipeline. We will use scikit-learn's CountVectorizer as our vectorizer component:

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(min_df=5, max_df=0.8, stop_words="english")

The topic model I will use for this example is Non-negative Matrix Factorization (NMF), as it is fast and usually finds good topics:

from sklearn.decomposition import NMF

model = NMF(n_components=10)
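
For readers unfamiliar with the method, NMF can be summarized in standard textbook notation (this is not taken from topicwizard's documentation). Given the non-negative document-term matrix V produced by the vectorizer, it looks for two non-negative matrices W and H such that:

```latex
V \approx W H, \qquad
V \in \mathbb{R}_{\ge 0}^{n \times m},\;
W \in \mathbb{R}_{\ge 0}^{n \times k},\;
H \in \mathbb{R}_{\ge 0}^{k \times m}
```

Here n is the number of documents, m the vocabulary size, and k the number of topics (`n_components=10` in this example). Each row of W holds the topic importances of one document, and each row of H holds the word weights of one topic.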

Then let's put this all together in a pipeline. You can either use sklearn Pipelines...

from sklearn.pipeline import make_pipeline

topic_pipeline = make_pipeline(vectorizer, model)

Or topicwizard's TopicPipeline:

from topicwizard.pipeline import make_topic_pipeline

topic_pipeline = make_topic_pipeline(vectorizer, model)

Let's load a corpus that we would like to analyze. In this example I will use 20newsgroups from sklearn:

from sklearn.datasets import fetch_20newsgroups

newsgroups = fetch_20newsgroups(subset="all")
corpus = newsgroups.data

# Sklearn returns the labels as integers, so we have to map them back to
# the actual textual labels.
group_labels = [newsgroups.target_names[label] for label in newsgroups.target]

Then let's fit our pipeline to this data:

topic_pipeline.fit(corpus)
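
Once fitted, a scikit-learn NMF model exposes its topic-term weights as `model.components_`, and CountVectorizer exposes its vocabulary through `vectorizer.get_feature_names_out()`. As a rough sketch of how one might read off the strongest words per topic, the hypothetical helper `top_words` below accepts any such matrix. It is shown on a small hand-made matrix with invented numbers, so it runs without fitting anything.

```python
def top_words(components, feature_names, n_words=3):
    """List the n_words highest-weighted words for each topic (row)."""
    topics = []
    for weights in components:
        # Pair each weight with its word, then sort by descending weight.
        ranked = sorted(zip(weights, feature_names), key=lambda pair: -pair[0])
        topics.append([word for _, word in ranked[:n_words]])
    return topics


# Invented toy values standing in for model.components_ and
# vectorizer.get_feature_names_out().
toy_components = [
    [0.9, 0.1, 0.0, 0.7],
    [0.0, 0.8, 0.6, 0.1],
]
toy_features = ["court", "game", "team", "law"]

print(top_words(toy_components, toy_features, n_words=2))
```

With the pipeline from this section, the call would look like `top_words(model.components_, vectorizer.get_feature_names_out())`, assuming the pipeline fitted the `vectorizer` and `model` objects in place, as scikit-learn's make_pipeline does.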

Web Application

You can launch the topicwizard web application to investigate your topic models interactively. The app is also quite easy to deploy in case you want to create a client-facing interface.

import topicwizard

topicwizard.visualize(corpus, pipeline=topic_pipeline)

From version 0.3.0 you can also disable pages you do not wish to display, which can save you a lot of time:

# Computing 2D projections for a large corpus takes a looong time,
# so you can speed up preprocessing by disabling the documents page altogether.
topicwizard.visualize(corpus, pipeline=topic_pipeline, exclude_pages=["documents"])

[Screenshots of the app pages: Topics, Words, Documents, Groups]

Figures

If you want customizable, faster, html-saveable interactive plots, you can use the figures API. Here are a couple of examples:

from topicwizard.figures import word_map, document_topic_timeline, topic_wordclouds, word_association_barchart

| Figure | Call |
| --- | --- |
| Word Map | `word_map(corpus, pipeline=topic_pipeline)` |
| Timeline of Topics in a Document | `document_topic_timeline("Joe Biden takes over presidential office from Donald Trump.", pipeline=topic_pipeline)` |
| Wordclouds of Topics | `topic_wordclouds(corpus, pipeline=topic_pipeline)` |
| Topic for Word Importance | `word_association_barchart(["supreme", "court"], corpus=corpus, pipeline=topic_pipeline)` |

For more information, consult our Documentation.
