A Python package for visualizing categorical data over time using temporal flow diagrams.
PyCatFlow is a specialized visualization tool designed to represent temporal developments in categorical data. It creates flow diagrams that show how categories evolve, appear, and disappear over time periods, making it ideal for analyzing trends in datasets with temporal and categorical dimensions.
- Temporal Flow Visualization: Create dynamic flow diagrams showing category changes over time
- Multiple Connection Types: Choose from semi-curved, curved, or straight connection styles
- Data Input: Support for CSV files
- Customizable Appearance: Extensive options for colors, spacing, labels, and legends
- Export Capabilities: Generate high-quality SVG and PNG outputs
- Professional Output: Publication-ready visualizations with comprehensive styling options
pip install pycatflow

To install from source for development:
git clone https://github.com/bumatic/PyCatFlow.git
cd PyCatFlow
pip install -r requirements-dev.txt
pip install -e .
Alternative using extras:
pip install -e ".[dev]"

PyCatFlow requires Cairo for PNG export functionality. Install Cairo using your system's package manager:
macOS (using Homebrew):
brew install cairo
Ubuntu/Debian:
sudo apt-get install libcairo2-dev
Windows: Follow the instructions at cairographics.org
Additional Python Dependencies: For PNG export functionality, install:
pip install cairosvg

import pycatflow as pcf
# Load and parse data
data = pcf.read_file(
"data.csv",
columns="time_period",
nodes="category",
categories="subcategory"
)
# Create visualization
viz = pcf.visualize(
data,
spacing=20,
width=800,
connection_type="semi-curved"
)
# Export results
viz.save_svg('output.svg')
viz.save_png('output.png')
# Display in Jupyter
viz

Your CSV data should contain at minimum:
- Time periods: Column indicating different time points
- Categories: Column with categorical data to track over time
- Subcategories (optional): Additional categorical dimension for color coding
Example data structure:
time_period,category,subcategory
2020,LibraryA,Core
2020,LibraryB,Optional
2021,LibraryA,Core
2021,LibraryC,New
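
To make concrete what the flow diagram conveys for this sample, the following is a minimal standard-library sketch that lists which categories appear and disappear between consecutive time periods. It does not use PyCatFlow; the helper names `categories_by_period` and `transitions` are hypothetical and exist only for this illustration.

```python
import csv
import io
from collections import defaultdict

# The example data structure shown above, as a CSV string.
CSV_TEXT = """time_period,category,subcategory
2020,LibraryA,Core
2020,LibraryB,Optional
2021,LibraryA,Core
2021,LibraryC,New
"""


def categories_by_period(csv_text):
    """Map each time period to the set of categories present in it."""
    periods = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        periods[row["time_period"]].add(row["category"])
    return dict(periods)


def transitions(periods):
    """Yield (previous, current, appeared, disappeared) for consecutive periods.

    Periods are ordered by plain string sorting, which is an assumption
    that happens to hold for four-digit years.
    """
    ordered = sorted(periods)
    for previous, current in zip(ordered, ordered[1:]):
        appeared = periods[current] - periods[previous]
        disappeared = periods[previous] - periods[current]
        yield previous, current, sorted(appeared), sorted(disappeared)


periods = categories_by_period(CSV_TEXT)
for previous, current, appeared, disappeared in transitions(periods):
    print(f"{previous} -> {current}: appeared={appeared}, disappeared={disappeared}")
# prints: 2020 -> 2021: appeared=['LibraryC'], disappeared=['LibraryB']
```

In the rendered diagram, LibraryA would be the category that carries over from 2020 to 2021.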
viz = pcf.visualize(
data,
# Layout
spacing=50, # Space between time periods
width=1200, # Canvas width (auto if None)
height=800, # Canvas height (auto if None)
# Node appearance
node_size=10, # Base node size
minValue=1, # Minimum node size
maxValue=20, # Maximum node size
node_scaling="linear", # Scaling method
# Connections
connection_type="semi-curved", # "semi-curved", "curved", "straight"
line_opacity=0.5, # Connection transparency
# Colors
color_categories=True, # Color by subcategory
color_startEnd=True, # Highlight start/end nodes
palette=("viridis", 10), # Matplotlib colormap
# Labels
show_labels=True, # Display node labels
label_text="item", # "item", "item_count", "item_category"
label_position="nodes", # "nodes", "start_end"
# Legend
legend=True, # Include legend
# Sorting
sort_by="frequency" # "frequency", "alphabetical", "category"
)

# File loading with custom parameters
data = pcf.read_file(
"data.csv",
columns="time_col", # Time period column
nodes="category_col", # Category column
categories="subcat_col", # Subcategory column (optional)
orientation="horizontal", # Data layout
delimiter=",", # Custom delimiter
column_order="order_col" # Column for custom time ordering
)
# Direct string parsing
data = pcf.read(
csv_string,
columns="time_col",
nodes="category_col"
)

import pycatflow as pcf
# Load dependency data
data = pcf.read_file(
"dependencies.csv",
columns="year",
nodes="library",
categories="type"
)
# Create professional visualization
viz = pcf.visualize(
data,
spacing=30,
width=1000,
connection_type="curved",
color_categories=True,
label_text="item_count",
legend=True
)
viz.save_svg('dependencies_flow.svg')

# Create visualization with custom colors
viz = pcf.visualize(
data,
palette=("Set3", 12),
nodes_color="#f0f0f0",
start_node_color="#2e8b57",
end_node_color="#dc143c",
line_opacity=0.7,
label_color="#333333"
)

read_file(): Load and parse data from a CSV file.
Parameters:
- filepath (str): Path to the CSV file
- columns (str): Name of the column containing time periods
- nodes (str): Name of the column containing the categories to track
- categories (str, optional): Name of the column containing subcategories
- orientation (str): "horizontal" or "vertical" data layout
- delimiter (str, optional): CSV delimiter (auto-detected if None)
Returns:
dict: Structured data ready for visualization
visualize(): Generate a flow visualization from structured data.
Parameters:
- data (dict): Output from read_file() or read()
- spacing (int): Space between time periods (default: 50)
- connection_type (str): "semi-curved", "curved", or "straight"
- color_categories (bool): Enable category-based coloring
- legend (bool): Include legend in output
Returns:
drawsvg.Drawing: SVG visualization object
The returned visualization object supports:
- save_svg(filename): Export as SVG
- save_png(filename): Export as PNG (requires cairosvg)
- Direct display in Jupyter notebooks
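
Because PNG export depends on cairosvg, a script may want to decide up front which formats to write. The sketch below uses only the standard library to check whether the `cairosvg` package is importable; the helper names `png_export_available` and `export_targets` are hypothetical. Note the limits of this check: it only finds the Python package and cannot tell whether the Cairo system library itself is installed.

```python
import importlib.util


def png_export_available():
    """Return True if the cairosvg package can be found on the import path.

    This does not import cairosvg, so a missing Cairo system library
    would still go undetected here.
    """
    return importlib.util.find_spec("cairosvg") is not None


def export_targets(stem):
    """Pick output filenames: always SVG, plus PNG only when cairosvg is found."""
    targets = [f"{stem}.svg"]
    if png_export_available():
        targets.append(f"{stem}.png")
    return targets


print(export_targets("output"))
```

With a visualization object in hand, each `.svg` target would go to `save_svg()` and each `.png` target to `save_png()`.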
Time periods in one column, categories in another:
time_period,category,subcategory
2020,ItemA,TypeX
2020,ItemB,TypeY
2021,ItemA,TypeX
2021,ItemC,TypeZ
Time periods as column headers:
category,2020,2021,2022
ItemA,TypeX,TypeX,
ItemB,TypeY,,TypeY
ItemC,,TypeZ,TypeZ
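
The relationship between the two layouts can be sketched with the standard library alone. The conversion below assumes, based only on the examples above, that each cell under a period header holds the subcategory and that an empty cell means the category is absent in that period. `wide_to_long` is a hypothetical helper for illustration; PyCatFlow selects the layout through the `orientation` parameter rather than through a conversion like this.

```python
import csv
import io

# The "time periods as column headers" example from above.
WIDE = """category,2020,2021,2022
ItemA,TypeX,TypeX,
ItemB,TypeY,,TypeY
ItemC,,TypeZ,TypeZ
"""


def wide_to_long(wide_text):
    """Turn period-as-header rows into (time_period, category, subcategory) tuples."""
    reader = csv.reader(io.StringIO(wide_text))
    header = next(reader)
    periods = header[1:]  # the first header cell names the category column
    records = []
    for row in reader:
        category, cells = row[0], row[1:]
        for period, subcategory in zip(periods, cells):
            if subcategory:  # assumption: empty cell = category absent that period
                records.append((period, category, subcategory))
    # Sort by period, then category, to mirror the one-row-per-observation layout.
    return sorted(records)


print("time_period,category,subcategory")
for record in wide_to_long(WIDE):
    print(",".join(record))
```

The first four printed records match the one-row-per-observation example; the 2022 rows come from the extra column in the wide sample.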
Major Update: drawSVG 2.x Migration
- Updated drawSVG dependency: Now requires drawsvg>=2.0 (previously drawSVG<2.0)
- API method names: Updated to snake_case following drawSVG 2.x conventions
  - viz.saveSvg() → viz.save_svg()
  - viz.savePng() → viz.save_png()
- Package name: Import statement unchanged (import drawsvg), but package name is now lowercase
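
For scripts that have to run against both the old camelCase and the new snake_case method names listed above, one option is a small shim that looks the method up by name. This is only a sketch: `save_svg_compat` is a hypothetical helper, and `_LegacyDrawing` is a stand-in class used here so the example runs without either drawSVG version installed.

```python
def save_svg_compat(drawing, filename):
    """Call save_svg() if the drawing has it, otherwise fall back to saveSvg()."""
    for name in ("save_svg", "saveSvg"):
        method = getattr(drawing, name, None)
        if callable(method):
            return method(filename)
    raise AttributeError("drawing has neither save_svg() nor saveSvg()")


class _LegacyDrawing:
    """Hypothetical stand-in for an object exposing only the old method name."""

    def __init__(self):
        self.saved = []

    def saveSvg(self, filename):
        # Record the call instead of writing a file.
        self.saved.append(filename)


legacy = _LegacyDrawing()
save_svg_compat(legacy, "output.svg")
print(legacy.saved)  # prints: ['output.svg']
```

Once all environments are on the new version, calling `viz.save_svg()` directly is simpler and the shim can be removed.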
Users upgrading from version 0.1.x should:
- Update method calls: use save_svg() and save_png() instead of the camelCase versions
- Install updated dependencies: pip install "drawsvg>=2.0" cairosvg
- Expect existing visualizations to be functionally identical, with minor coordinate improvements
- Initial release with drawSVG 1.x support
- Core visualization functionality
- Basic CSV data loading
- SVG and PNG export capabilities
- Multiple connection types and styling options
git clone https://github.com/bumatic/PyCatFlow.git
cd PyCatFlow
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
pip install -e .

# Using pytest (recommended)
python -m pytest tests/ -v
# With coverage report
python -m pytest tests/ --cov=pycatflow --cov-report=html
The project follows Python best practices:
- PEP 8 style guidelines
- Comprehensive docstrings
- Type hints where appropriate
- Professional error handling
PNG Export Not Working
pip install cairosvg

Import Errors
Ensure all dependencies are installed:
pip install "drawsvg>=2.0" matplotlib cairosvg

Data Loading Issues
- Verify CSV format matches expected structure
- Check column names match those specified in parameters
- Ensure file encoding is UTF-8
- Large datasets (>1000 categories) may require increased spacing
- Complex connection types (curved) take longer to render
- PNG export is slower than SVG due to rasterization
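
The data loading checks listed above (CSV structure, column names, UTF-8 encoding) can be automated before a file is handed to PyCatFlow. The following is a minimal standard-library sketch, not part of the package; `check_csv` is a hypothetical helper, and it assumes a comma-delimited file with a header row.

```python
import csv
import io


def check_csv(raw_bytes, required_columns):
    """Return a list of problems found in CSV bytes; an empty list means none."""
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError as error:
        return [f"not valid UTF-8: {error.reason} at byte {error.start}"]

    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = [
        f"missing column: {name}" for name in required_columns if name not in header
    ]
    for row in reader:
        # DictReader stores surplus fields under the key None and fills
        # missing fields with the value None.
        if None in row or None in row.values():
            problems.append(f"line {reader.line_num}: field count differs from header")
    return problems


good = b"time_period,category\n2020,LibraryA\n"
print(check_csv(good, ["time_period", "category"]))  # prints: []
print(check_csv(good, ["time_period", "library"]))   # prints: ['missing column: library']
```

To check a file on disk, read it in binary mode (`open(path, "rb").read()`) so that the encoding check sees the raw bytes.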
- Tutorial Article: Medium article with detailed explanation
- Interactive Tutorial: Jupyter Notebook with widgets
- Example Data: Sample datasets available in the example/ directory
If you use PyCatFlow in your research, please cite:
Marcus Burkhardt, and Herbert Natta. 2021. "PyCatFlow: A Python Package for Visualizing Categorical Data over Time". Zenodo. https://doi.org/10.5281/zenodo.5531785.
PyCatFlow is released under the MIT License. See LICENSE file for details.
Conceptualization: Marcus Burkhardt
Implementation: Marcus Burkhardt and Herbert Natta (@herbertmn)
Inspiration: Rankflow visualization tool by Bernhard Rieder
For questions, issues, or contributions, please visit the GitHub repository.