Snowball

A terminal-based tool for conducting Systematic Literature Reviews (SLR) using the snowballing methodology.

Snowball can be used both as a CLI/TUI application and as a Python library for programmatic access to snowballing functionality.

[Screenshot: the Snowball TUI]

How it works

  • Start by adding seed papers (via PDF or DOI)
  • Review seed papers for inclusion (just in case)
  • Snowball backwards (via references), forwards (via citations), or both (optionally restricted to a time period)
  • Review list of found papers for inclusion or exclusion
    • Simple keyboard shortcut interaction
    • Filter by keyword or review status
    • Enhance metadata by fetching the abstract, citations, etc.
    • Open DOI (if available) in web browser for more info and to access PDF
    • Repair any errors in metadata
  • Once there are no more "Pending" papers, add PDFs to the project's PDF folder and snowball again.
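The iteration loop above can be sketched in a few lines of Python (a toy illustration of the methodology, not Snowball's internals; the citation graph here is hard-coded, whereas Snowball fetches references and citations from APIs):

```python
# Toy citation graph: paper -> (references, citations). Illustrative only.
GRAPH = {
    "seed": (["ref1", "ref2"], ["cit1"]),
    "ref1": (["ref3"], []),
    "ref2": ([], []),
    "cit1": ([], ["cit2"]),
    "ref3": ([], []),
    "cit2": ([], []),
}

def snowball(seeds, iterations=2, direction="both"):
    """Breadth-first snowball: each round expands the frontier via
    references (backward), citations (forward), or both."""
    found = set(seeds)
    frontier = set(seeds)
    for _ in range(iterations):
        nxt = set()
        for paper in frontier:
            refs, cits = GRAPH.get(paper, ([], []))
            if direction in ("backward", "both"):
                nxt.update(refs)
            if direction in ("forward", "both"):
                nxt.update(cits)
        frontier = nxt - found  # only newly discovered papers
        found |= frontier
    return found
```

Each new iteration only expands papers discovered in the previous round, so the set of found papers grows until the citation neighborhood is exhausted or the iteration limit is reached.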

Quick Start

1. Initialize a Project

snowball init my-slr-project \
  --name "Machine Learning in Healthcare" \
  --description "SLR on ML applications in medical diagnosis" \
  --min-year 2015 \
  --max-year 2024

2. Add Seed Papers

From PDF files:

snowball add-seed my-slr-project \
  --pdf seed1.pdf seed2.pdf \
  --email [email protected]

From DOIs:

snowball add-seed my-slr-project \
  --doi "10.1234/example.doi" "10.5678/another.doi" \
  --s2-api-key YOUR_SEMANTIC_SCHOLAR_KEY \
  --email [email protected]

3. Run Snowballing

snowball snowball my-slr-project \
  --iterations 2 \
  --email [email protected]

You can also specify which direction to snowball:

# Only backward snowballing (papers referenced by your seeds)
snowball snowball my-slr-project --direction backward

# Only forward snowballing (papers citing your seeds)
snowball snowball my-slr-project --direction forward

# Both directions (default)
snowball snowball my-slr-project --direction both

This will:

  • Find all papers referenced by your seeds (backward) - if direction is "backward" or "both"
  • Find all papers citing your seeds (forward) - if direction is "forward" or "both"
  • Apply your configured filters
  • Save all discovered papers for review

4. Review Papers

Launch the interactive TUI:

snowball review my-slr-project

Navigation:

  • ↑/↓: Navigate papers (details appear automatically)
  • Enter/Space: Toggle detail view on/off
  • d: Toggle details panel visibility
  • Click column headers: Sort by that column (ascending → descending → default)

Quick Review (Tinder-style):

  • → or i: Include paper (and advance to next)
  • ←: Exclude paper (and advance to next)

Review Actions:

  • n: Add/edit notes
  • u: Undo last status change

Paper Actions:

  • o: Open paper's DOI, arXiv URL, or search Google Scholar
  • p: Open local PDF file
  • l: Link/unlink a PDF to the current paper
  • e: Enrich metadata from APIs (abstract, citations, etc.)

Project Actions:

  • s: Run another snowball iteration
  • x: Export results (BibTeX, CSV, TikZ, PNG)
  • f: Cycle filter (All → Pending → Included → Excluded)
  • g: Generate citation network graph
  • P: Parse PDFs in pdfs/inbox/ folder (Shift+P)
  • R: Compute relevance scores (Shift+R)

Other:

  • ?: Show help with all shortcuts
  • q: Quit

5. Export Results

# Export included papers to BibTeX
snowball export my-slr-project --format bibtex --included-only

# Export all papers to CSV with full metadata
snowball export my-slr-project --format csv

# Export citation graph as TikZ/LaTeX code
snowball export my-slr-project --format tikz --included-only

# Export citation graph as standalone LaTeX document
snowball export my-slr-project --format tikz --included-only --standalone

# Export all formats (BibTeX, CSV, and TikZ)
snowball export my-slr-project --format all

TikZ Export:

  • Generates publication-ready LaTeX/TikZ code for citation network visualization
  • Use --standalone to create a complete LaTeX document (compile with pdflatex)
  • Without --standalone, generates TikZ code for embedding in your own LaTeX document
  • Papers are positioned by iteration (left to right) and sorted by citation count

6. Relevance Scoring (Optional)

If you set a research question, Snowball can score papers by relevance to help prioritize review:

# Set research question during init
snowball init my-slr-project \
  --name "ML in Healthcare" \
  --research-question "How is machine learning used for medical diagnosis?"

# Or set/update research question later
snowball set-rq my-slr-project "How is machine learning used for medical diagnosis?"

# Compute relevance scores for pending papers
snowball compute-relevance my-slr-project --method tfidf   # Fast, offline
snowball compute-relevance my-slr-project --method llm     # Uses OpenAI API (requires OPENAI_API_KEY)

Relevance scores (0.0–1.0) appear in the "Rel" column in the TUI. Press R in the TUI to compute scores interactively.
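To make the TF-IDF method concrete, here is a minimal sketch of question-to-abstract similarity using plain term-count vectors and cosine similarity (illustrative only; it omits the inverse-document-frequency weighting and any preprocessing the actual scorer may apply):

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (0.0-1.0)."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

rq = "How is machine learning used for medical diagnosis?"
abstracts = {
    "p1": "Machine learning for diagnosis of skin lesions",
    "p2": "A survey of compiler optimization techniques",
}
scores = {pid: cosine(Counter(tokens(rq)), Counter(tokens(ab)))
          for pid, ab in abstracts.items()}
```

Abstracts sharing more vocabulary with the research question score closer to 1.0, which is what makes the "Rel" column useful for triaging a long pending list.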

7. PDF Management

Snowball supports two PDF workflows:

Automatic matching (inbox):

# Place PDFs in pdfs/inbox/ folder
cp paper1.pdf paper2.pdf my-slr-project/pdfs/inbox/

# Parse and auto-match by title
snowball parse-pdfs my-slr-project
# Matched PDFs are moved to pdfs/, unmatched stay in inbox/
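Auto-matching by title can be approximated with fuzzy string comparison; the sketch below (hypothetical helper names, not Snowball's code) normalizes titles and accepts the closest match above a similarity threshold:

```python
import difflib
import re

def norm(title):
    """Lowercase and strip punctuation so formatting differences
    (capitalization, trailing punctuation) don't block a match."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

def match_title(pdf_title, known_titles, threshold=0.9):
    """Return the closest known title above `threshold`, else None."""
    best, best_ratio = None, 0.0
    for t in known_titles:
        r = difflib.SequenceMatcher(None, norm(pdf_title), norm(t)).ratio()
        if r > best_ratio:
            best, best_ratio = t, r
    return best if best_ratio >= threshold else None
```

A PDF whose extracted title matches no known paper closely enough would stay in the inbox for manual linking.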

Manual linking (TUI):

  • Press l to link any PDF from pdfs/ or pdfs/inbox/ to the current paper
  • Press p to open the linked PDF

8. Update Citation Counts (Optional)

Update citation counts from Google Scholar for more accurate/current data:

# Update all papers
snowball update-citations my-slr-project

# Update only included papers
snowball update-citations my-slr-project --status included

# Custom delay between requests (default: 5 seconds)
snowball update-citations my-slr-project --delay 3

Note: This uses Google Scholar scraping via the scholarly library. Use responsibly with appropriate delays to avoid being rate-limited.
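The "appropriate delays" advice amounts to spacing out requests; a generic sketch (not the tool's implementation, mirroring what the --delay flag controls):

```python
import time

def polite(fn, items, delay=5.0):
    """Call fn for each item, sleeping `delay` seconds between calls
    so the scraped service is not hammered."""
    results = []
    for i, item in enumerate(items):
        if i:
            time.sleep(delay)
        results.append(fn(item))
    return results
```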

9. Non-Interactive Commands (Scripting/AI Agents)

These commands support automation and AI-assisted workflows:

# List papers with filtering and sorting
snowball list my-slr-project                           # All papers
snowball list my-slr-project --status pending          # Only pending
snowball list my-slr-project --iteration 1 --format json

# Show detailed paper information
snowball show my-slr-project --doi "10.1234/example"
snowball show my-slr-project --title "Machine Learning" --format json

# Update paper status programmatically
snowball set-status my-slr-project --id <paper-id> --status included
snowball set-status my-slr-project --doi "10.1234/example" --status excluded --notes "Out of scope"

# Get project statistics
snowball stats my-slr-project
snowball stats my-slr-project --format json

Configuration

API Keys (Optional but Recommended)

While most APIs work without keys, you'll get higher rate limits with authentication:

Semantic Scholar API Key:

  1. Register at https://www.semanticscholar.org/product/api
  2. Set via environment variable (recommended) or CLI flag

Email for Polite Pools:

  • CrossRef and OpenAlex offer faster service if you provide an email
  • Set via environment variable (recommended) or CLI flag

Environment Variables (Recommended):

Add these to your shell profile (~/.bashrc, ~/.zshrc, etc.):

export SEMANTIC_SCHOLAR_API_KEY="your-api-key-here"
export SNOWBALL_EMAIL="[email protected]"

Then commands will automatically use these credentials:

snowball snowball my-project  # Uses env vars automatically

CLI Flags (Alternative):

You can also pass credentials per-command:

snowball snowball my-project --s2-api-key YOUR_KEY --email [email protected]

CLI flags override environment variables when both are set.
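That precedence rule is the usual flag-over-environment pattern; a sketch (hypothetical helper, not Snowball's code):

```python
import os

def resolve_credential(cli_value, env_var):
    """CLI flag wins when given; otherwise fall back to the
    environment variable; None if neither is set."""
    return cli_value if cli_value is not None else os.environ.get(env_var)

# e.g. with no --s2-api-key flag, the env var is used:
api_key = resolve_credential(None, "SEMANTIC_SCHOLAR_API_KEY")
```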

GROBID (Optional)

For best PDF parsing results, install GROBID:

# Using Docker
docker run -p 8070:8070 lfoppiano/grobid:0.8.0

# Or install manually following https://grobid.readthedocs.io/

If GROBID is not available, Snowball will automatically fall back to Python-based PDF parsing.

Filter Criteria

Configure auto-filtering in the init command or edit project.json:

{
  "filter_criteria": {
    "min_year": 2015,
    "max_year": 2024,
    "min_citations": 5,
    "keywords": ["machine learning", "deep learning"],
    "excluded_keywords": ["deprecated", "retracted"],
    "venue_types": ["journal", "conference"]
  }
}
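Applied to a paper's metadata, these criteria act as a conjunction of checks; a simplified sketch (field names follow the JSON example above, but this is not Snowball's filter code):

```python
def passes(paper, criteria):
    """True if a paper dict satisfies every configured criterion."""
    if paper["year"] < criteria.get("min_year", 0):
        return False
    if paper["year"] > criteria.get("max_year", 9999):
        return False
    if paper.get("citations", 0) < criteria.get("min_citations", 0):
        return False
    text = paper.get("title", "").lower()
    if any(k in text for k in criteria.get("excluded_keywords", [])):
        return False
    keywords = criteria.get("keywords", [])
    return not keywords or any(k in text for k in keywords)

criteria = {"min_year": 2015, "max_year": 2024, "min_citations": 5,
            "keywords": ["machine learning"],
            "excluded_keywords": ["retracted"]}
```

Papers failing any single check are filtered out before they reach the review queue.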

Using Snowball as a Python Library

Snowball can be used programmatically as a Python library, allowing you to integrate snowballing into your own workflows, scripts, or applications.

Quick Example

from pathlib import Path
from snowball import (
    SnowballEngine,
    JSONStorage,
    APIAggregator,
    ReviewProject,
    FilterCriteria,
)

# Initialize components
storage = JSONStorage(Path("my-project"))
api = APIAggregator()
engine = SnowballEngine(storage, api)

# Create project
project = ReviewProject(
    name="My Review",
    filter_criteria=FilterCriteria(min_year=2020),
)
storage.save_project(project)

# Add seed paper
paper = engine.add_seed_from_doi("10.1234/example.doi", project)

# Run snowballing
stats = engine.run_snowball_iteration(project, direction="both")
print(f"Found {stats['added']} papers")

# Get papers for review
from snowball import PaperStatus
pending = storage.get_papers_by_status(PaperStatus.PENDING)

# Export results
from snowball import BibTeXExporter
exporter = BibTeXExporter()
bibtex = exporter.export(storage.load_all_papers(), only_included=True)

For comprehensive documentation on using Snowball as a library, see API_USAGE.md.

Available Components

The public API includes:

  • Core Engine: SnowballEngine for running iterations
  • Storage: JSONStorage for persistence
  • API Clients: APIAggregator, SemanticScholarClient, OpenAlexClient, etc.
  • Parsers: PDFParser for extracting metadata from PDFs
  • Exporters: BibTeXExporter, CSVExporter, TikZExporter
  • Filters: FilterEngine for applying criteria
  • Scoring: TFIDFScorer, LLMScorer for relevance scoring
  • Models: Paper, ReviewProject, FilterCriteria, etc.

Troubleshooting

PDF Parsing Issues

If PDF metadata extraction fails:

  1. Try with GROBID if not already using it
  2. Use DOI instead: --doi rather than --pdf
  3. Check PDF is not scanned/image-based
  4. Manually add metadata by editing the paper JSON file

API Rate Limits

If you hit rate limits:

  1. Use API keys (--s2-api-key)
  2. Provide email for polite pools (--email)
  3. Add delays between operations
  4. Process seeds in smaller batches

Missing Citations/References

Some papers may not have citation data:

  • Very recent papers have fewer citations
  • Some venues aren't well-indexed
  • Try multiple APIs (aggregator tries all automatically)
  • Consider using different seed papers
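The aggregator's "tries all automatically" behavior is a simple fallback chain; sketched generically (hypothetical, not the library's code):

```python
def first_success(fetchers, doi):
    """Try each metadata source in order; return the first usable
    result, swallowing per-source failures along the way."""
    for fetch in fetchers:
        try:
            result = fetch(doi)
        except Exception:
            continue  # source errored; try the next one
        if result is not None:
            return result
    return None  # no source had data for this paper
```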

Installation

Using uv (Recommended)

uv is a fast Python package installer and resolver. It's the recommended way to work with this project for development:

# Install uv (if not already installed)
pipx install uv

# Clone the repository
git clone <repo-url>
cd snowball

# Sync dependencies and create virtual environment
uv sync

# Run the tool
uv run snowball --help

# Optional: Install with GROBID support
uv sync --extra grobid

# Optional: Install development dependencies (for testing/linting)
uv sync --extra dev

The uv sync command will:

  • Create a .venv virtual environment in the project directory
  • Install all dependencies from the lockfile (uv.lock)
  • Install the snowball package in editable mode

You can then run commands with uv run <command>, which automatically uses the virtual environment.

Using pipx (Alternative)

# Install pipx (if not already installed)
python -m pip install --user pipx
pipx ensurepath
# Note: You may need to restart your terminal or run 'source ~/.bashrc' (Linux/Mac)
# or 'source ~/.bash_profile' (Mac) for PATH changes to take effect

# Install the package from the repository
pipx install git+<repo-url>

# Or install from a local clone
git clone <repo-url>
cd snowball
pipx install .

# Optional: Install with GROBID support
pipx install "git+<repo-url>[grobid]"
# Or from local: pipx install ".[grobid]"

# Note: For development work with editable installs, use uv instead

Citation

If you use Snowball in your research, please cite:

@software{snowball_2025,
  title = {Snowball: A Tool for Systematic Literature Reviews},
  author = {Richard Glassey and Daniel Bosk},
  year = {2025},
  url = {https://github.com/rjglasse/snowball}
}

