Search for G-quadruplex motifs in sequencing reads and genome assemblies.

artorias111/pg4Findr

pg4Findr

Find G-quadruplex motifs in sequencing reads or genome assemblies. Input is a FASTQ file (optionally gzipped) or, for a genome assembly, a FASTA file. Output is a BED file with the columns sequence_id, start, end, G4, length, strand, written to stdout by default. See https://github.com/samtools/hts-specs/blob/master/BEDv1.pdf for more information on the BED file format.
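To make the column layout concrete, here is a minimal sketch of how one hit could be rendered as a tab-separated BED line. The `G4Hit` struct and `to_bed_line` method are hypothetical names chosen for illustration, not taken from pg4Findr's source. The sketch assumes BED's 0-based, half-open coordinates and assumes the length column is simply `end - start`.

```rust
/// One G4 hit, holding the columns named in the README.
/// Hypothetical type for illustration; not pg4Findr's actual struct.
struct G4Hit {
    sequence_id: String,
    start: usize, // 0-based, inclusive (BED convention)
    end: usize,   // 0-based, exclusive (BED convention)
    g4: String,   // the matched motif sequence
    strand: char, // '+' or '-'
}

impl G4Hit {
    /// Render the hit as one tab-separated line in the README's column order:
    /// sequence_id, start, end, G4, length, strand.
    fn to_bed_line(&self) -> String {
        // Assumption: the length column is the span of the match.
        let length = self.end - self.start;
        format!(
            "{}\t{}\t{}\t{}\t{}\t{}",
            self.sequence_id, self.start, self.end, self.g4, length, self.strand
        )
    }
}

fn main() {
    let hit = G4Hit {
        sequence_id: "read1".to_string(),
        start: 2,
        end: 17,
        g4: "GGGAGGGTGGGAGGG".to_string(),
        strand: '+',
    };
    println!("{}", hit.to_bed_line());
}
```

Because the coordinates are half-open, `end - start` equals the motif length without any off-by-one adjustment.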

Motifs are matched with the regular expression described in https://doi.org/10.1093/nar/gki609, using the Rust regex crate's find_iter() (https://docs.rs/regex/latest/regex/struct.Regex.html#method.find_iter). find_iter() returns successive non-overlapping matches, so overlapping motifs are not counted more than once.

Usage

# Output goes to stdout by default, in standard BED format; redirect it to a file to save it.

# quick run with cargo
cargo run -- --reads /path/to/reads.fastq(.gz) > g4_motifs.bed

# Multiple read files are accepted; pipe the output through gzip/pigz to compress it before saving
cargo run -- --reads ../*.fastq.gz | pigz > g4_motifs.bed.gz

# Run on Polar2020
/data2/work/local/pg4Findr/pg4Findr --reads /path/to/read/or/assembly.fa | pigz > g4_motifs.bed.gz
