A utility to retrieve fastq files with ffq and aria2. It is designed to be fast and efficient, allowing you to download large datasets quickly and easily. This tool can be used to fetch fastq files from various public repositories, including:
- GEO: Gene Expression Omnibus,
- SRA: Sequence Read Archive,
- EMBL-EBI: European Molecular Biology Laboratory’s European Bioinformatics Institute.
Important
- Fast: Uses aria2 to download files in parallel, which can significantly speed up the download process.
- Integrity: Verifies the integrity of downloaded files using md5sum to ensure that the files are not corrupted during the download process.
- Retry Mechanism: Automatically attempts to re-download files if the initial download fails, ensuring successful retrieval of data.
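The integrity check and retry behaviour described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not ngsfetch's actual code: the helper names `md5_of` and `fetch_with_retry` and the `download` callable are hypothetical, and the sketch uses Python's `hashlib` rather than calling md5sum.

```python
import hashlib
from pathlib import Path
from typing import Callable


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_with_retry(
    download: Callable[[Path], object],  # hypothetical: writes the file to the given path
    dest: Path,
    expected_md5: str,
    attempts: int = 3,
) -> bool:
    """Call download(dest) until the file's MD5 matches or the attempts run out."""
    for _ in range(attempts):
        try:
            download(dest)
            if md5_of(dest) == expected_md5.lower():
                return True
        except OSError:
            pass  # failed transfer or missing file: fall through and retry
        dest.unlink(missing_ok=True)  # discard a corrupt or partial file before retrying
    return False
```

In a real pipeline, `download` would presumably wrap the actual transfer command, and `expected_md5` would come from the checksum published alongside each file by the repository.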
# Fetch fastq files of GSE52856
ngsfetch -i GSE52856 -o /path/to/output/GSE52856 -p 16
# Fetch fastq files of SRP175008
ngsfetch -i SRP175008 -o /path/to/output/SRP175008 -p 16
# Fetch fastq files of ERP126666
ngsfetch -i ERP126666 -o /path/to/output/ERP126666 -p 16

pip install ngsfetch
or
git clone https://github.com/NaotoKubota/ngsfetch.git
cd ngsfetch
pip install .

conda create -n ngsfetch python=3.9
conda activate ngsfetch
conda install -c bioconda ngsfetch

docker pull naotokubota/ngsfetch

- Linux (i.e. where the md5sum command is available)
- python (>=3.9)
- ffq (>=0.3.1)
- aria2 (>=0.0.1b0)
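Before running ngsfetch, it can be useful to confirm that these requirements are actually available. The following is a minimal sketch, not part of ngsfetch itself: the function names are hypothetical, and the executable names `ffq`, `aria2c`, and `md5sum` are assumed from the list above and the tool's help text.

```python
import shutil
import sys
from typing import Iterable, List, Tuple

# Assumed executable names; adjust them if your installation differs.
REQUIRED_TOOLS = ("ffq", "aria2c", "md5sum")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_ok(minimum: Tuple[int, int] = (3, 9)) -> bool:
    """Return True if the running interpreter is at least the given version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.9 or newer is required")
    for tool in missing_tools():
        print(f"Missing executable on PATH: {tool}")
```

This only checks that the executables exist on PATH; it does not verify the minimum versions listed above.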
usage: ngsfetch [-h] [-i ID] [-o OUTPUT] [-p PROCESSES] [--attempts ATTEMPTS] [-v]

ngsfetch v0.1.1 - fast retrieval of metadata and fastq files with ffq and aria2c

optional arguments:
  -h, --help            show this help message and exit
  -i ID, --id ID        ID of the data to fetch
  -o OUTPUT, --output OUTPUT
                        Output directory
  -p PROCESSES, --processes PROCESSES
                        Number of processes to use (up to 16)
  --attempts ATTEMPTS   Number of attempts to fetch metadata and fastq files
  -v, --verbose         Increase verbosity

Thank you for wanting to improve ngsfetch! If you have any bugs or questions, feel free to open an issue or pull request.
- Naoto Kubota (0000-0003-0612-2300)