tbooth/colora

 
 


Snakemake workflow: colora

Snakemake GitHub actions status DOI

A Snakemake workflow for genome assembly.

Why colora? 🐍 Colora means "snake" in the Sardinian language 🐍

[Figure: Colora_1, scheme of the colora workflow]

Input reads: HiFi reads, optionally ONT reads, and Hi-C reads. Other inputs: oatk database, NCBI FCS database (optional), BUSCO database (to be implemented).

Usage

The usage of this workflow is described in the Snakemake Workflow Catalog.

If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and its DOI (see above).

  • place raw hifi reads in resources/raw_hifi
  • place the oatk database of interest from the OatkDB repository (https://github.com/c-zhou/OatkDB) in resources/oatkDB
  • place raw hic reads in resources/raw_hic
  • place the NCBI database for FCS-GX in resources/gx_db (optional; this needs ~500 GB of disk space and a large amount of RAM)
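
Before launching a long run, it can help to confirm that the input directories listed above are populated. The following is a minimal sketch, not part of the workflow: `check_inputs` is a hypothetical helper, and it assumes that resources/raw_hifi, resources/oatkDB and resources/raw_hic are all required while resources/gx_db is optional and therefore not checked. Adjust the list to match your configuration.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of colora): report input
# directories that are missing or empty. The directory names come from the
# list above; treating all three as required is an assumption.
check_inputs() {
    local root="${1:-.}"   # repository root, defaults to the current directory
    local missing=0
    local dir
    for dir in resources/raw_hifi resources/oatkDB resources/raw_hic; do
        # ls -A prints nothing for an empty or non-existent directory
        if [ -z "$(ls -A "$root/$dir" 2>/dev/null)" ]; then
            echo "missing or empty: $dir"
            missing=1
        fi
    done
    return "$missing"
}
```

Called from the repository root, `check_inputs . && echo "inputs look complete"` prints one line per problem directory and returns a non-zero status if any were found.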

How to run colora:

snakemake --software-deployment-method conda --snakefile workflow/Snakefile --cores all

snakemake --software-deployment-method conda --snakefile workflow/Snakefile --cores all --dry-run


#for the cluster:

snakemake --software-deployment-method conda --conda-frontend mamba --snakefile workflow/Snakefile --cores 100

Before executing the command, make sure you have adjusted your config.yaml appropriately.
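
One way to combine the commands above is to perform the dry run first and start the real run only if it succeeds. This is a sketch under that assumption; `run_colora` is a hypothetical wrapper, and it uses only the flags already shown in this section.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of colora): dry-run first, then run for
# real only if the dry run exits successfully.
run_colora() {
    local cores="${1:-all}"   # number of cores, defaults to "all"
    snakemake --software-deployment-method conda \
        --snakefile workflow/Snakefile --cores "$cores" --dry-run &&
    snakemake --software-deployment-method conda \
        --snakefile workflow/Snakefile --cores "$cores"
}
```

For example, `run_colora 8` would plan and then execute the workflow on 8 cores, and `run_colora` with no argument uses all available cores.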

Test the pipeline:

    1. Download test data
    2. Download oatk DB
git clone https://github.com/c-zhou/OatkDB.git
cd colora/resources
mkdir oatkDB
cp path/to/where/you/cloned/OatkDB/v20230921/dikarya* oatkDB/
    3. Download FCS-GX test database

You can skip this step if you are not going to run the decontamination step with FCS-GX.

mamba create -n ncbi_fcsgx ncbi-fcs-gx
mamba activate ncbi_fcsgx
cd colora/resources
mkdir gx_test_db
cd gx_test_db
sync_files.py get --mft https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/FCS/database/test-only/test-only.manifest --dir ./test-only
    4. Run the test pipeline
snakemake --configfile config/config_test.yaml --software-deployment-method conda --snakefile workflow/Snakefile --cores 4

TODO

  • The workflow will appear in the snakemake-workflow-catalog once it has been made public. Then the link under "Usage" will point to the usage instructions if <owner> and <repo> were correctly set.
  • Rule for Nanoplot
  • Rule for fastp
  • Rules for arima pipeline - split in several rules
  • Rule for yahs
  • integrate the snakemake report in the workflow: not necessary
  • input / output: hardcoded is okay
  • test dataset
  • test config file
  • test possibility to add ONT reads as optional param in hifiasm
  • test possibility to add HiC reads as optional params in hifiasm: file names change in this case. Need more study. Probably this needs a separate rule.
  • package versions: create stable yaml files with conda export
  • add singularity and docker as option for environment management
  • implement ncbi FCS (decontamination) as optional rule (orange path in the scheme above)
  • make purging steps optional
  • slurm integration (profile)
  • setting of resources for each rule
  • Rule purge_dups.smk and purge_dups_alt.smk: redirecting outputs
  • implement assemblyQC - waiting for new Merqury release to make a new conda recipe (light green path above)
  • Formatting and linting to be fixed according to snakemake requirements
  • log files: some of them are empty because it's impossible to redirect stderr and stdout to the file

Notes:

  • Arima pipeline - changes compared to the original pipeline:
    • creating conda environments with needed tools so no need to specify tools' path
    • Remove the PREFIX line and the option -p $PREFIX from the bwa command: it is not necessary and causes problems when reading the files
    • add -M flag in bwa mem command - step 1.A and 1.B
    • pipeline split in several rules
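
To illustrate the -M change noted above, here is a minimal sketch of one mapping step in which each Hi-C mate file is aligned separately, as in steps 1.A and 1.B. It is an assumption about the general shape of such a step, not the rule used in this workflow: `map_mate` and its argument names are hypothetical, and the reference is assumed to be already indexed with bwa.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the workflow's actual rule): map one Hi-C mate
# file with bwa mem, using the -M flag mentioned in the notes above.
map_mate() {
    local ref="$1"            # reference FASTA, already indexed with bwa
    local reads="$2"          # one Hi-C mate file (R1 or R2)
    local out_bam="$3"        # output BAM for this mate
    local threads="${4:-4}"   # threads for both tools, defaults to 4
    # -M marks shorter split hits as secondary alignments
    bwa mem -M -t "$threads" "$ref" "$reads" \
        | samtools view -@ "$threads" -b - > "$out_bam"
}
```

With the tools installed through conda as described in the notes, no tool paths are needed: `map_mate asm.fa hic_R1.fastq.gz hic_R1.bam 8` would map the first mate, and a second call with the R2 file would map the other (the file names here are placeholders).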

About

Tim's fork of COlora for PRs
