A terminal-based tool for conducting Systematic Literature Reviews (SLR) using the snowballing methodology.
Snowball can be used both as a CLI/TUI application and as a Python library for programmatic access to snowballing functionality.
- Start by adding seed papers (via PDF or DOI)
- Review seed papers for inclusion (just in case)
- Snowball backwards (via references), forwards (via citations), or both (optionally restricting the time period)
- Grobid is used to extract references
- APIs like OpenAlex and Semantic Scholar are used to find citations
- Review list of found papers for inclusion or exclusion
- Simple keyboard shortcut interaction
- Filter by keyword or review status
- Enhance metadata by fetching abstracts, citations, etc.
- Open the DOI (if available) in a web browser for more info and to access the PDF
- Repair any errors in metadata
- Once there are no more "Pending" papers, add PDFs to the project's PDF folder and snowball again.
Initialize a project:

```bash
snowball init my-slr-project \
--name "Machine Learning in Healthcare" \
--description "SLR on ML applications in medical diagnosis" \
--min-year 2015 \
--max-year 2024
```

From PDF files:

```bash
snowball add-seed my-slr-project \
--pdf seed1.pdf seed2.pdf \
--email [email protected]
```

From DOIs:

```bash
snowball add-seed my-slr-project \
--doi "10.1234/example.doi" "10.5678/another.doi" \
--s2-api-key YOUR_SEMANTIC_SCHOLAR_KEY \
--email [email protected]
```

Run snowballing:

```bash
snowball snowball my-slr-project \
--iterations 2 \
--email [email protected]
```

You can also specify which direction to snowball:

```bash
# Only backward snowballing (papers referenced by your seeds)
snowball snowball my-slr-project --direction backward
# Only forward snowballing (papers citing your seeds)
snowball snowball my-slr-project --direction forward
# Both directions (default)
snowball snowball my-slr-project --direction both
```

This will:
- Find all papers referenced by your seeds (backward) - if direction is "backward" or "both"
- Find all papers citing your seeds (forward) - if direction is "forward" or "both"
- Apply your configured filters
- Save all discovered papers for review
Launch the interactive TUI:
```bash
snowball review my-slr-project
```

Navigation:
- `↑`/`↓`: Navigate papers (details appear automatically)
- `Enter`/`Space`: Toggle detail view on/off
- `d`: Toggle details panel visibility
- Click column headers: Sort by that column (ascending → descending → default)
Quick Review (Tinder-style):
- `→` or `i`: Include paper (and advance to next)
- `←`: Exclude paper (and advance to next)
Review Actions:
- `n`: Add/edit notes
- `u`: Undo last status change
Paper Actions:
- `o`: Open paper's DOI, arXiv URL, or search Google Scholar
- `p`: Open local PDF file
- `l`: Link/unlink a PDF to the current paper
- `e`: Enrich metadata from APIs (abstract, citations, etc.)
Project Actions:
- `s`: Run another snowball iteration
- `x`: Export results (BibTeX, CSV, TikZ, PNG)
- `f`: Cycle filter (All → Pending → Included → Excluded)
- `g`: Generate citation network graph
- `P`: Parse PDFs in `pdfs/inbox/` folder (Shift+P)
- `R`: Compute relevance scores (Shift+R)
Other:
- `?`: Show help with all shortcuts
- `q`: Quit
```bash
# Export included papers to BibTeX
snowball export my-slr-project --format bibtex --included-only
# Export all papers to CSV with full metadata
snowball export my-slr-project --format csv
# Export citation graph as TikZ/LaTeX code
snowball export my-slr-project --format tikz --included-only
# Export citation graph as standalone LaTeX document
snowball export my-slr-project --format tikz --included-only --standalone
# Export all formats (BibTeX, CSV, and TikZ)
snowball export my-slr-project --format all
```

TikZ Export:
- Generates publication-ready LaTeX/TikZ code for citation network visualization
- Use `--standalone` to create a complete LaTeX document (compile with `pdflatex`)
- Without `--standalone`, generates TikZ code for embedding in your own LaTeX document
- Papers are positioned by iteration (left to right) and sorted by citation count
If you set a research question, Snowball can score papers by relevance to help prioritize review:
```bash
# Set research question during init
snowball init my-slr-project \
--name "ML in Healthcare" \
--research-question "How is machine learning used for medical diagnosis?"
# Or set/update research question later
snowball set-rq my-slr-project "How is machine learning used for medical diagnosis?"
# Compute relevance scores for pending papers
snowball compute-relevance my-slr-project --method tfidf # Fast, offline
snowball compute-relevance my-slr-project --method llm     # Uses OpenAI API (requires OPENAI_API_KEY)
```

Relevance scores (0.0–1.0) appear in the "Rel" column in the TUI. Press `R` in the TUI to compute scores interactively.
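To make the TF-IDF option concrete, here is a minimal sketch of the underlying idea, assuming scikit-learn and made-up paper texts; Snowball's own `TFIDFScorer` may differ in its details:

```python
# Sketch: score papers by cosine similarity between the research question
# and each paper's title/abstract text (0.0 = unrelated, 1.0 = identical).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

question = "How is machine learning used for medical diagnosis?"
papers = [
    "Deep learning for diagnostic radiology image classification",
    "A survey of blockchain consensus protocols",
]

# Fit TF-IDF over the question plus all papers, then compare the question
# vector (row 0) against every paper vector.
matrix = TfidfVectorizer(stop_words="english").fit_transform([question] + papers)
for text, score in zip(papers, cosine_similarity(matrix[0], matrix[1:])[0]):
    print(f"{score:.2f}  {text}")
```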
Snowball supports two PDF workflows:
Automatic matching (inbox):
```bash
# Place PDFs in pdfs/inbox/ folder
cp paper1.pdf paper2.pdf my-slr-project/pdfs/inbox/
# Parse and auto-match by title
snowball parse-pdfs my-slr-project
# Matched PDFs are moved to pdfs/, unmatched stay in inbox/
```

Manual linking (TUI):
- Press `l` to link any PDF from `pdfs/` or `pdfs/inbox/` to the current paper
- Press `p` to open the linked PDF
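Auto-matching is essentially fuzzy title matching against the papers already in the project. A minimal sketch of the idea using only the standard library (the 0.85 threshold and the exact logic Snowball uses are assumptions here):

```python
# Sketch: match a title extracted from a PDF against known papers by
# string similarity; return None when nothing is close enough.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two titles."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_pdf(pdf_title: str, papers: list[dict], threshold: float = 0.85):
    """Return the best-matching paper dict, or None below the threshold."""
    best = max(papers, key=lambda p: similarity(pdf_title, p["title"]), default=None)
    if best and similarity(pdf_title, best["title"]) >= threshold:
        return best
    return None
```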
Update citation counts from Google Scholar for more accurate/current data:
```bash
# Update all papers
snowball update-citations my-slr-project
# Update only included papers
snowball update-citations my-slr-project --status included
# Custom delay between requests (default: 5 seconds)
snowball update-citations my-slr-project --delay 3
```

Note: This uses Google Scholar scraping via the `scholarly` library. Use responsibly with appropriate delays to avoid being rate-limited.
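For reference, the same idea in a few lines of Python with `scholarly` (a sketch, not Snowball's actual code; `num_citations` is the field scholarly exposes on search results):

```python
# Sketch: refresh citation counts from Google Scholar, sleeping between
# requests to stay polite and avoid rate limits.
import time
from scholarly import scholarly

titles = ["Attention Is All You Need"]
for title in titles:
    hit = next(scholarly.search_pubs(title), None)  # first search result, if any
    if hit is not None:
        print(f"{title}: {hit.get('num_citations', 0)} citations")
    time.sleep(5)  # mirrors the default --delay of 5 seconds
```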
These commands support automation and AI-assisted workflows:
```bash
# List papers with filtering and sorting
snowball list my-slr-project # All papers
snowball list my-slr-project --status pending # Only pending
snowball list my-slr-project --iteration 1 --format json
# Show detailed paper information
snowball show my-slr-project --doi "10.1234/example"
snowball show my-slr-project --title "Machine Learning" --format json
# Update paper status programmatically
snowball set-status my-slr-project --id <paper-id> --status included
snowball set-status my-slr-project --doi "10.1234/example" --status excluded --notes "Out of scope"
# Get project statistics
snowball stats my-slr-project
snowball stats my-slr-project --format json
```
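Together, these make fully scripted review passes possible. For example (a sketch; it assumes `--format json` emits an array of objects with `id` and `title` fields):

```python
# Sketch: auto-exclude pending papers whose titles contain a stop word,
# driving the machine-readable CLI from Python.
import json
import subprocess

listing = subprocess.run(
    ["snowball", "list", "my-slr-project", "--status", "pending", "--format", "json"],
    capture_output=True, text=True, check=True,
).stdout

for paper in json.loads(listing):
    if "blockchain" in paper["title"].lower():
        subprocess.run(
            ["snowball", "set-status", "my-slr-project",
             "--id", paper["id"], "--status", "excluded",
             "--notes", "Out of scope"],
            check=True,
        )
```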
While most APIs work without keys, you'll get higher rate limits with authentication:

Semantic Scholar API Key:
- Register at https://www.semanticscholar.org/product/api
- Set via environment variable (recommended) or CLI flag
Email for Polite Pools:
- CrossRef and OpenAlex offer faster service if you provide an email
- Set via environment variable (recommended) or CLI flag
Environment Variables (Recommended):
Add these to your shell profile (~/.bashrc, ~/.zshrc, etc.):
```bash
export SEMANTIC_SCHOLAR_API_KEY="your-api-key-here"
export SNOWBALL_EMAIL="[email protected]"
```

Then commands will automatically use these credentials:
```bash
snowball snowball my-project  # Uses env vars automatically
```

CLI Flags (Alternative):
You can also pass credentials per-command:
```bash
snowball snowball my-project --s2-api-key YOUR_KEY --email [email protected]
```

CLI flags override environment variables when both are set.
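In other words, credential resolution follows the usual flag-over-environment pattern, roughly:

```python
# Sketch of the precedence rule: an explicit CLI flag wins, otherwise
# fall back to the environment variable (None if neither is set).
import os

def resolve_s2_api_key(cli_flag: str | None) -> str | None:
    return cli_flag or os.environ.get("SEMANTIC_SCHOLAR_API_KEY")
```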
For best PDF parsing results, install GROBID:
```bash
# Using Docker
docker run -p 8070:8070 lfoppiano/grobid:0.8.0
# Or install manually following https://grobid.readthedocs.io/
```

If GROBID is not available, Snowball will automatically fall back to Python-based PDF parsing.
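To sanity-check your GROBID instance (or call it directly), its REST API is easy to exercise; a minimal sketch with `requests`, assuming the Docker container above is running on localhost:8070:

```python
# Sketch: health-check a local GROBID server, then extract bibliographic
# header metadata (title, authors, DOI, ...) from a PDF as TEI XML.
import requests

GROBID = "http://localhost:8070"

assert requests.get(f"{GROBID}/api/isalive").ok, "GROBID is not running"

with open("seed1.pdf", "rb") as pdf:
    tei_xml = requests.post(
        f"{GROBID}/api/processHeaderDocument",
        files={"input": pdf},
    ).text
print(tei_xml[:500])
```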
Configure auto-filtering via the `init` command or by editing `project.json`:
```json
{
"filter_criteria": {
"min_year": 2015,
"max_year": 2024,
"min_citations": 5,
"keywords": ["machine learning", "deep learning"],
"excluded_keywords": ["deprecated", "retracted"],
"venue_types": ["journal", "conference"]
}
}
```
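Conceptually, these criteria form a predicate that each discovered paper must pass during a snowball iteration. A simplified sketch (field names follow the JSON above; Snowball's `FilterEngine` may apply them differently):

```python
# Sketch: apply filter criteria like those in project.json to a paper dict.
# Papers with a missing year default to 0 and so fail any min_year bound.
def passes_filter(paper: dict, criteria: dict) -> bool:
    text = (paper.get("title", "") + " " + paper.get("abstract", "")).lower()
    if not criteria.get("min_year", 0) <= paper.get("year", 0) <= criteria.get("max_year", 9999):
        return False
    if paper.get("citations", 0) < criteria.get("min_citations", 0):
        return False
    if any(kw in text for kw in criteria.get("excluded_keywords", [])):
        return False
    keywords = criteria.get("keywords", [])
    return not keywords or any(kw in text for kw in keywords)
```

In the real tool, the configured filters are applied automatically as each snowball iteration discovers new papers.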
Snowball can be used programmatically as a Python library, allowing you to integrate snowballing into your own workflows, scripts, or applications.

```python
from pathlib import Path
from snowball import (
SnowballEngine,
JSONStorage,
APIAggregator,
ReviewProject,
FilterCriteria,
)
# Initialize components
storage = JSONStorage(Path("my-project"))
api = APIAggregator()
engine = SnowballEngine(storage, api)
# Create project
project = ReviewProject(
name="My Review",
filter_criteria=FilterCriteria(min_year=2020),
)
storage.save_project(project)
# Add seed paper
paper = engine.add_seed_from_doi("10.1234/example.doi", project)
# Run snowballing
stats = engine.run_snowball_iteration(project, direction="both")
print(f"Found {stats['added']} papers")
# Get papers for review
from snowball import PaperStatus
pending = storage.get_papers_by_status(PaperStatus.PENDING)
# Export results
from snowball import BibTeXExporter
exporter = BibTeXExporter()
bibtex = exporter.export(storage.load_all_papers(), only_included=True)
```

For comprehensive documentation on using Snowball as a library, see `API_USAGE.md`.
The public API includes:
- Core Engine: `SnowballEngine` for running iterations
- Storage: `JSONStorage` for persistence
- API Clients: `APIAggregator`, `SemanticScholarClient`, `OpenAlexClient`, etc.
- Parsers: `PDFParser` for extracting metadata from PDFs
- Exporters: `BibTeXExporter`, `CSVExporter`, `TikZExporter`
- Filters: `FilterEngine` for applying criteria
- Scoring: `TFIDFScorer`, `LLMScorer` for relevance scoring
- Models: `Paper`, `ReviewProject`, `FilterCriteria`, etc.
If PDF metadata extraction fails:
- Try with GROBID if not already using it
- Use a DOI instead: `--doi` rather than `--pdf`
- Check that the PDF is not scanned/image-based
- Manually add metadata by editing the paper JSON file
If you hit rate limits:
- Use API keys (`--s2-api-key`)
- Provide an email for polite pools (`--email`)
- Add delays between operations
- Process seeds in smaller batches
Some papers may not have citation data:
- Very recent papers have fewer citations
- Some venues aren't well-indexed
- Try multiple APIs (aggregator tries all automatically)
- Consider using different seed papers
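The aggregator's behavior boils down to a fallback chain: ask each client in turn and keep the first usable answer. Schematically (the method name and return shape here are stand-ins, not the real `APIAggregator` interface):

```python
# Sketch of the try-each-API fallback pattern used for citation lookups.
def first_with_citations(doi: str, clients: list) -> dict | None:
    for client in clients:
        try:
            result = client.get_paper(doi)  # hypothetical method name
            if result and result.get("citations"):
                return result
        except Exception:
            continue  # move on to the next API on error or rate limit
    return None
```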
uv is a fast Python package installer and resolver. It's the recommended way to work with this project for development:
```bash
# Install uv (if not already installed)
pipx install uv
# Clone the repository
git clone <repo-url>
cd snowball
# Sync dependencies and create virtual environment
uv sync
# Run the tool
uv run snowball --help
# Optional: Install with GROBID support
uv sync --extra grobid
# Optional: Install development dependencies (for testing/linting)
uv sync --extra dev
```

The `uv sync` command will:
- Create a `.venv` virtual environment in the project directory
- Install all dependencies from the lockfile (`uv.lock`)
- Install the `snowball` package in editable mode
You can then run commands with `uv run <command>`, which automatically uses the virtual environment.
```bash
# Install pipx (if not already installed)
python -m pip install --user pipx
pipx ensurepath
# Note: You may need to restart your terminal or run 'source ~/.bashrc' (Linux/Mac)
# or 'source ~/.bash_profile' (Mac) for PATH changes to take effect
# Install the package from the repository
pipx install git+<repo-url>
# Or install from a local clone
git clone <repo-url>
cd snowball
pipx install .
# Optional: Install with GROBID support
pipx install "git+<repo-url>[grobid]"
# Or from local: pipx install ".[grobid]"
# Note: For development work with editable installs, use uv instead
```

If you use Snowball in your research, please cite:

```bibtex
@software{snowball_2025,
title = {Snowball: A Tool for Systematic Literature Reviews},
author = {Richard Glassey and Daniel Bosk},
year = {2025},
url = {https://github.com/rjglasse/snowball}
}
```

Built with: