A Snakemake workflow for genome assembly.
Why colora? 🐍 Colora means "snake" in the Sardinian language 🐍
Input reads: HiFi reads, optionally ONT reads, and Hi-C reads. Other inputs: oatk database, NCBI FCS database (optional), BUSCO database (to be implemented).
The usage of this workflow is described in the Snakemake Workflow Catalog.
If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and its DOI (see above).
- place raw HiFi reads in resources/raw_hifi
- place the oatk database of interest from the OatkDB repository (https://github.com/c-zhou/OatkDB) in resources/oatkDB
- place raw Hi-C reads in resources/raw_hic
- place the NCBI database for FCS-GX in resources/gx_db (optional; this needs ~500 GB of disk space and a large amount of RAM)
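To create this layout in one go, something along these lines should work (a sketch: it only makes the directories listed above, and gx_db is needed only if you plan to run FCS-GX):

# run from the colora root directory
mkdir -p resources/raw_hifi resources/raw_hic resources/oatkDB resources/gx_db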
How to run colora:
# full run:
snakemake --software-deployment-method conda --snakefile workflow/Snakefile --cores all
# dry run, to preview the jobs without executing anything:
snakemake --software-deployment-method conda --snakefile workflow/Snakefile --cores all --dry-run
# for the cluster:
snakemake --software-deployment-method conda --conda-frontend mamba --snakefile workflow/Snakefile --cores 100
Before executing the command, ensure you have adjusted your config.yaml appropriately.
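Clusters can also be targeted through a Snakemake profile (Slurm integration is still on the roadmap below, so treat this as a sketch; profiles/slurm is a hypothetical directory holding your scheduler settings):

# profiles/slurm is a placeholder for your own profile directory
snakemake --profile profiles/slurm --software-deployment-method conda --snakefile workflow/Snakefile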
Test the pipeline:
- Download test data
- Download oatk DB
git clone https://github.com/c-zhou/OatkDB.git
cd colora/resources
mkdir oatkDB
cp path/to/where/you/cloned/OatkDB/v20230921/dikarya* oatkDB/
- Download FCS-GX test database
You can skip this step if you are not going to run the decontamination step with FCS-GX.
mamba create -n ncbi_fcsgx ncbi-fcs-gx
mamba activate ncbi_fcsgx
cd colora/resources
mkdir gx_test_db
cd gx_test_db
sync_files.py get --mft https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/database/test-only/test-only.manifest --dir ./test-only
- Run the test pipeline
snakemake --configfile config/config_test.yaml --software-deployment-method conda --snakefile workflow/Snakefile --cores 4
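As for the main workflow, a dry run first previews the test jobs without executing anything (the same flags as above, plus --dry-run):

snakemake --configfile config/config_test.yaml --software-deployment-method conda --snakefile workflow/Snakefile --cores 4 --dry-run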
- The workflow will appear in the snakemake-workflow-catalog once it has been made public. Then the link under "Usage" will point to the usage instructions if <owner> and <repo> were correctly set.
- Rule for NanoPlot
- Rule for fastp
- Rules for the Arima pipeline (split into several rules)
- Rule for YaHS
- integrate the snakemake report in the workflow: not necessary
- input / output: hardcoded is okay
- test dataset
- test config file
- test the possibility to add ONT reads as an optional param in hifiasm
- test the possibility to add Hi-C reads as optional params in hifiasm: the output file names change in this case and need more study; this probably needs a separate rule (see the hifiasm sketch after this list)
- package versions: create stable yaml files with conda export
- add singularity and docker as option for environment management
- implement NCBI FCS (decontamination) as an optional rule (orange path in the scheme above)
- make purging steps optional
- slurm integration (profile)
- setting of resources for each rule
- Rules purge_dups.smk and purge_dups_alt.smk: redirecting outputs
- implement assemblyQC: waiting for a new Merqury release to make a new conda recipe (light green path above)
- formatting and linting to be fixed according to Snakemake requirements
- log files: some of them are empty because stderr and stdout cannot be redirected to the log file for some tools
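For the two hifiasm items above, a sketch of how optional ONT and Hi-C reads could be passed (the --ul, --h1, and --h2 flags are from hifiasm's CLI; all file names and the thread count are hypothetical placeholders):

# HiFi only: primary contigs end up in asm.bp.p_ctg.gfa
hifiasm -o asm -t 16 resources/raw_hifi/hifi.fastq.gz
# with optional ONT reads (--ul):
hifiasm -o asm -t 16 --ul ont.fastq.gz resources/raw_hifi/hifi.fastq.gz
# with Hi-C reads (--h1/--h2): outputs are renamed to asm.hic.*.gfa,
# which is why this case probably needs a separate rule
hifiasm -o asm -t 16 --h1 hic_R1.fastq.gz --h2 hic_R2.fastq.gz resources/raw_hifi/hifi.fastq.gz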
Notes:
- Arima pipeline: changes compared to the original pipeline:
  - conda environments with the needed tools are created, so there is no need to specify the tools' paths
  - the PREFIX line and the -p $PREFIX option were removed from the bwa command; they are not necessary and cause problems when reading files
  - the -M flag was added to the bwa mem command in steps 1.A and 1.B (see the sketch below)
  - the pipeline was split into several rules
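For reference, a sketch of the modified mapping command of steps 1.A/1.B (file names and thread count are placeholders; -M is bwa's flag to mark shorter split hits as secondary, for Picard compatibility):

# map one Hi-C read end at a time, as in the Arima pipeline
bwa mem -M -t 16 reference.fasta hic_R1.fastq.gz | samtools view -b -o aligned_R1.bam -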