A command-line tool for batch downloading academic papers with multi-source support (Sci-Hub, Unpaywall, arXiv, CORE).
- Multi-Source Support: Intelligently routes downloads across multiple sources
  - arXiv: Prioritized for preprints (free, no API key needed)
  - Unpaywall: For open access papers (requires email)
  - Sci-Hub: Fallback for older papers (coverage-driven, slower)
  - CORE: Additional OA fallback
- Smart Year-Based Routing:
  - Papers before 2021: OA sources first, Sci-Hub fallback
  - Papers 2021+: OA sources only (skip Sci-Hub)
- Parallel Source Querying: Fast sources queried in parallel with slow-source fallback
- Parallel Mirror Testing: Quickly finds working Sci-Hub mirrors (typically <2s)
- Smart Metadata Caching: Avoids redundant API calls across sources
- Smart Fallback: Automatically tries alternative sources if primary fails
- Flexible Input: Download papers using DOIs, arXiv IDs, or URLs
- Batch processing from a text file
- Automatic mirror selection and testing
- Customizable output directory
- Robust error handling and retries
- PDF validation (rejects HTML files)
- Progress reporting
- Metadata-based Filenames: Automatically names files as `[YYYY] - [Title].pdf` for easy organization
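
As an illustration of the metadata-based naming listed above, here is a minimal sketch of how a `[YYYY] - [Title].pdf` name might be built. The helper name `make_filename`, the set of stripped characters, the length limit, and the `Unknown` / `untitled` placeholders are all assumptions made for this example, as is the choice to keep literal brackets around the year; the tool's actual rules may differ.

```python
import re


def make_filename(year, title, max_length=150):
    """Build a '[YYYY] - Title.pdf' style name (hypothetical helper)."""
    # Replace characters that are invalid on common filesystems with spaces.
    safe_title = re.sub(r'[<>:"/\\|?*\x00-\x1f]', " ", title)
    # Collapse whitespace runs and trim leading/trailing spaces and dots.
    safe_title = re.sub(r"\s+", " ", safe_title).strip(" .")
    if not safe_title:
        safe_title = "untitled"  # assumed placeholder for a missing title
    # Truncate long titles so the full path stays within OS limits.
    safe_title = safe_title[:max_length].rstrip(" .")
    year_part = str(year) if year else "Unknown"  # assumed placeholder
    return f"[{year_part}] - {safe_title}.pdf"


print(make_filename(2020, "Array programming with NumPy"))
```

Replacing unsafe characters rather than dropping them keeps word boundaries readable in the resulting name.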
uv is an extremely fast Python package and project manager, written in Rust.
# macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Or via pip
pip install uv

# Install globally from the current directory
uv tool install .
# Or install globally from GitHub
uv tool install git+https://github.com/Oxidane-bot/scihub-cli.git
# Try without installing (temporary run)
uvx scihub-cli papers.txt

Note: `uv tool install` installs the tool globally on your system, making the `scihub-cli` command available from anywhere in your terminal.
- Global Installation: Use `uv tool install` to install the tool permanently on your system
- Temporary Usage: Use `uvx scihub-cli` to run the tool without installing it
- Source Code: Clone the repo and run directly with Python for development
If you prefer to run directly from source:
- Clone this repository:
    git clone https://github.com/Oxidane-bot/scihub-cli.git
    cd scihub-cli
- Install the required dependencies:
    pip install -r requirements.txt
- Run directly with Python:
    python -m scihub_cli input_file.txt
If you encounter issues with the installation, try the following:
- Ensure you have Python 3.9+ installed:
    python --version
- Verify uv is installed correctly:
    uv --version
- Check if the command is in your PATH:
    # On Windows
    where scihub-cli
    # On macOS/Linux
    which scihub-cli
- If you get "command not found" errors after installation:
    # Update shell environment
    uv tool update-shell
    # Manual PATH refresh
    # On Windows
    $env:Path = [System.Environment]::GetEnvironmentVariable("Path","User")
    # On macOS/Linux
    source ~/.bashrc  # or .zshrc, .bash_profile, etc.
- If problems persist, list, upgrade, or reinstall the tool:
    # List installed tools
    uv tool list
    # Upgrade a tool
    uv tool upgrade scihub-cli
    # Reinstall
    uv tool uninstall scihub-cli
    uv tool install scihub-cli
# If installed with uv
scihub-cli input_file.txt
# If running temporarily with uv
uvx scihub-cli input_file.txt
# If running directly from source
python -m scihub_cli input_file.txt

Here, input_file.txt is a text file containing DOIs, arXiv IDs, or paper URLs, one per line.
Create a text file with one identifier per line. Supports:
- DOIs: `10.1038/nature12373`
- arXiv IDs: `2301.12345` or `arxiv:2401.00001`
- URLs:
  - DOI URLs: `https://doi.org/10.1126/science.abc1234`
  - Direct PDF URLs: `https://files.eric.ed.gov/fulltext/EJ1358705.pdf`
  - PMC article URLs: `https://pmc.ncbi.nlm.nih.gov/articles/PMC6505544/`
  - OA landing pages (auto PDF extraction): `https://example.org/article/123`
Example papers.txt:
# Comments start with a hash symbol
10.1038/s41586-020-2649-2
2301.12345
arxiv:2401.00001
https://files.eric.ed.gov/fulltext/EJ1358705.pdf
https://pmc.ncbi.nlm.nih.gov/articles/PMC6505544/
10.1016/s1003-6326(21)65629-7
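
To make the input format concrete, the following is a rough sketch of how lines from such a file could be classified. It is not the tool's actual parser: the regular expressions and the names `classify_identifier` and `parse_input` are assumptions made for this example, and real DOI or arXiv ID matching may be more permissive.

```python
import re

# Assumed patterns: modern DOIs start with "10.<registrant>/", and new-style
# arXiv IDs look like YYMM.NNNNN with an optional "arxiv:" prefix and version.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ARXIV_RE = re.compile(r"^(?:arxiv:)?\d{4}\.\d{4,5}(?:v\d+)?$", re.IGNORECASE)


def classify_identifier(line):
    """Return (kind, value) for one input line, or None for blanks/comments."""
    text = line.strip()
    if not text or text.startswith("#"):
        return None  # skip empty lines and comment lines
    if text.lower().startswith(("http://", "https://")):
        return ("url", text)
    if ARXIV_RE.match(text):
        return ("arxiv", text)
    if DOI_RE.match(text):
        return ("doi", text)
    return ("unknown", text)


def parse_input(lines):
    """Classify every non-comment line of an input file."""
    return [item for item in map(classify_identifier, lines) if item is not None]


sample = ["# Comments start with a hash symbol", "10.1038/s41586-020-2649-2",
          "2301.12345", "arxiv:2401.00001"]
for kind, value in parse_input(sample):
    print(kind, value)
```

URLs are checked first so that a DOI URL such as `https://doi.org/...` is treated as a URL rather than as a bare DOI.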
Set an email if you want Unpaywall open-access lookup. If no email is provided, Unpaywall is skipped automatically.
# Enable Unpaywall by setting email
scihub-cli papers.txt --email [email protected]

The email is saved to ~/.scihub-cli/config.json and sent only to Unpaywall for rate limiting (not tracking).
usage: scihub-cli [-h] [-o OUTPUT] [-m MIRROR] [-t TIMEOUT] [-r RETRIES] [-p PARALLEL]
[--email EMAIL] [-v] [--version] input_file
Download academic papers from Sci-Hub and Unpaywall in batch mode.
positional arguments:
input_file Text file containing DOIs or URLs (one per line)
options:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
Output directory for downloaded PDFs (default: ./downloads)
-m MIRROR, --mirror MIRROR
Specific Sci-Hub mirror to use
-t TIMEOUT, --timeout TIMEOUT
Request timeout in seconds (default: 15)
-r RETRIES, --retries RETRIES
Number of retries for failed downloads (default: 3)
-p PARALLEL, --parallel PARALLEL
Number of parallel downloads (threads)
--email EMAIL Email for Unpaywall API (saves to config file)
-v, --verbose Enable verbose logging
--version show program's version number and exit
scihub-cli stores configuration in ~/.scihub-cli/config.json:
{
"email": "[email protected]"
}

You can edit this file directly or use --email to update it.
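
For readers who want to script against this file, here is a minimal sketch of reading and updating it. Only the path and the `email` key come from this README; the function names `load_config` and `save_email` and the error handling are assumptions for illustration and are not the tool's internal API.

```python
import json
from pathlib import Path

# Config location as described in this README.
CONFIG_PATH = Path.home() / ".scihub-cli" / "config.json"


def load_config(path=CONFIG_PATH):
    """Return the config as a dict, or {} if the file is missing or invalid."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


def save_email(email, path=CONFIG_PATH):
    """Store the email while preserving any other keys (hypothetical helper)."""
    path = Path(path)
    config = load_config(path)
    config["email"] = email
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


print(load_config().get("email", "no email configured; Unpaywall is skipped"))
```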
# Basic usage
scihub-cli papers.txt
# Specify output directory
scihub-cli -o research/papers papers.txt
# Use specific mirror
scihub-cli -m https://sci-hub.se papers.txt
# Increase verbosity
scihub-cli -v papers.txt

The tool uses intelligent multi-source routing:
- Year Detection: Queries the Crossref API to determine publication year
- Smart Routing:
  - Papers before 2021 → Try OA sources first, then Sci-Hub fallback
  - Papers 2021+ → Try OA sources only (skip Sci-Hub)
  - Unknown year → OA sources first with Sci-Hub fallback
- Download Process:
  - Get PDF URL from selected source
  - Download with progress tracking
  - Validate PDF (reject HTML files)
  - Generate filename from metadata: `[YYYY] - [Title].pdf`
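
The routing rules above can be sketched as a small function. This is an illustration under stated assumptions, not the tool's actual code: the name `plan_sources`, the source labels, and the ordering of the OA sources (arXiv, then Unpaywall, then CORE) are assumed here. The 2021 cutoff and the rule that Unpaywall is skipped without an email are taken from this README.

```python
# Assumed ordering of open-access sources; only the set of sources is documented.
OA_SOURCES = ["arxiv", "unpaywall", "core"]
SCIHUB_CUTOFF_YEAR = 2021  # papers from this year onward skip Sci-Hub


def plan_sources(year, email=None):
    """Return the ordered list of sources to try for a paper (hypothetical)."""
    # Unpaywall needs an email, so drop it when none is configured.
    sources = [s for s in OA_SOURCES if s != "unpaywall" or email]
    # Older papers and papers of unknown year get Sci-Hub as the last resort.
    if year is None or year < SCIHUB_CUTOFF_YEAR:
        sources.append("scihub")
    return sources


print(plan_sources(2013, email="reader@example.org"))
print(plan_sources(2021, email="reader@example.org"))
print(plan_sources(None))
```

Treating an unknown year like a pre-2021 paper mirrors the third routing rule: when Crossref gives no year, Sci-Hub stays available as a fallback.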
- Sci-Hub: Strong fallback for pre-2021 coverage, but stopped updating in 2020
- Unpaywall: Best for 2021+ open access papers (25-35% coverage for recent papers)
- Combined: OA-first speed with Sci-Hub fallback for older literature
The tool automatically adapts HTTP headers for different publishers:
- MDPI: Uses `curl/8.0.0` (required by their CDN)
- Others: Uses a browser User-Agent for compatibility
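
A per-publisher header choice like the one described above might look roughly as follows. The `curl/8.0.0` value comes from this README; the browser User-Agent string, the `mdpi.com` host match, and the function name `headers_for` are assumptions made for this sketch.

```python
from urllib.parse import urlparse

CURL_UA = "curl/8.0.0"
# Assumed example of a browser-like User-Agent; the tool's real value may differ.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def headers_for(url):
    """Pick request headers based on the target host (hypothetical helper)."""
    host = (urlparse(url).hostname or "").lower()
    # Match mdpi.com and its subdomains, but not lookalike hosts.
    if host == "mdpi.com" or host.endswith(".mdpi.com"):
        return {"User-Agent": CURL_UA}
    return {"User-Agent": BROWSER_UA}


print(headers_for("https://www.mdpi.com/some/article/pdf"))
```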
| Year Range | Primary Source | Success Rate |
|---|---|---|
| Before 2021 | OA first (Sci-Hub fallback) | 85-90% |
| 2021+ | OA (Unpaywall/arXiv/CORE) | 25-35% |
| Overall | Multi-source | 75-80% |
- Not all papers are available through Sci-Hub or Unpaywall
- Unpaywall only covers open access papers
- Some publishers may block automated downloads
- Sci-Hub mirrors may change or become unavailable
This tool is provided for educational and research purposes only. Users are responsible for ensuring they comply with applicable laws and regulations when using this tool.
The project includes comprehensive tests for multi-source functionality:
# Run all tests
cd tests
uv run python test_functionality.py
uv run python -m unittest test_metadata_utils.py -v
# Or run all unit tests
uv run python -m unittest discover -v

The test suite covers:
- ✅ Multi-Source Download: Tests OA-first routing with Sci-Hub fallback for pre-2021 papers
- ✅ PDF Validation: Verifies downloaded files have valid PDF headers
- ✅ Mirror Connectivity: Tests all Sci-Hub mirrors for accessibility
- ✅ Metadata Extraction: Tests Unpaywall API metadata retrieval
- ✅ Filename Generation: Tests filename sanitization and edge cases
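
As an illustration of the PDF header check covered by these tests, the sketch below tests whether downloaded bytes start with the `%PDF-` signature, which is how an HTML error page saved as `.pdf` can be rejected. The function names and the 1024-byte read window are assumptions for this example; the tool's own validation may be stricter.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes begin with the PDF signature '%PDF-'."""
    # Tolerate leading whitespace, which some servers prepend to the file.
    head = data[:1024].lstrip()
    return head.startswith(b"%PDF-")


def is_pdf_file(path) -> bool:
    """Check only the first bytes of a file on disk (hypothetical helper)."""
    with open(path, "rb") as fh:
        return looks_like_pdf(fh.read(1024))


print(looks_like_pdf(b"%PDF-1.7\n..."))
print(looks_like_pdf(b"<!DOCTYPE html><html>...</html>"))
```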
Multi-source download: 2/2 PASS
- 2013 paper (944 KB) via OA-first (Sci-Hub fallback if OA fails) ✓
- 2021 paper (1.6 MB) via OA (Sci-Hub skipped) ✓
PDF validation: All valid ✓
Metadata extraction: PASS ✓
This project is licensed under the MIT License - see the LICENSE file for details.