pg4Findr

Find G-quadruplex (G4) motifs in sequencing reads or genome assemblies. Input is a FASTQ file (optionally gzipped) or, for a genome assembly, a FASTA file. Output is a BED file with the columns sequence_id, start, end, G4, length, strand, written to stdout by default. See https://github.com/samtools/hts-specs/blob/master/BEDv1.pdf for more information on the BED file format.
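To make the column layout concrete, here is a minimal sketch of how one output row could be assembled. The `BedRecord` struct and `to_line` method are hypothetical names used for illustration, not the tool's actual code. The sketch also assumes that coordinates follow the usual BED convention (0-based start, exclusive end), that columns are tab-separated, and that length equals end minus start.

```rust
/// Hypothetical row type; the fields mirror the columns listed above.
struct BedRecord<'a> {
    sequence_id: &'a str,
    start: usize, // assumed 0-based, inclusive (BED convention)
    end: usize,   // assumed exclusive
    g4: &'a str,  // the matched motif sequence
    strand: char, // '+' or '-'
}

impl BedRecord<'_> {
    /// Render the six columns as one tab-separated line.
    /// The length column is derived as end - start (an assumption).
    fn to_line(&self) -> String {
        format!(
            "{}\t{}\t{}\t{}\t{}\t{}",
            self.sequence_id,
            self.start,
            self.end,
            self.g4,
            self.end - self.start,
            self.strand
        )
    }
}

fn main() {
    // Made-up example values, not real output from the tool.
    let rec = BedRecord {
        sequence_id: "read_1",
        start: 2,
        end: 17,
        g4: "GGGAGGGAGGGAGGG",
        strand: '+',
    };
    println!("{}", rec.to_line());
}
```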

Motifs are matched with the regular expression described in https://doi.org/10.1093/nar/gki609, using find_iter() from the Rust regex crate (https://docs.rs/regex/latest/regex/struct.Regex.html#method.find_iter). find_iter() returns successive non-overlapping matches, so overlapping motifs are not reported and no match is counted twice.

Usage

# Output is written to stdout in standard BED format; redirect it to a file to save it.

# quick run with cargo
cargo run -- --reads /path/to/reads.fastq(.gz) > g4_motifs.bed

# Multiple read files are accepted; pipe the output through gzip/pigz to compress it before saving
cargo run -- --reads ../*.fastq.gz | pigz > g4_motifs.bed.gz

# Run on Polar2020
/data2/work/local/pg4Findr/pg4Findr --reads /path/to/read/or/assembly.fa | pigz > g4_motifs.bed.gz
