This repository contains a modular Snakemake pipeline for processing ATAC-seq data from raw FASTQ files through quality control, alignment, peak calling, and downstream analyses (e.g., fragment-length distribution, TSS enrichment, FRiP). The workflow is fully configurable via a config.yaml file.
- Features
- Prerequisites
- Installation
- Configuration (config.yaml)
- Usage
- Pipeline Steps
- Directory Structure
- Contributing
- License
- Automatic detection of samples and paired-end FASTQ files
- Adapter trimming (Cutadapt) and dual FastQC reports (pre- and post-trimming)
- Bowtie2 alignment (paired-end) with customizable parameters
- BAM filtering, duplicate marking (Picard), and indexing
- Peak calling with MACS3 (multiple q-value thresholds)
- Generation of BigWig coverage tracks
- Quality control plots: fragment-length distribution, TSS enrichment, FRiP score
- Support for ChromBPNet preprocessing and bias modeling
- Pseudoreplicate generation and IDR analysis
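
The FRiP score listed among the QC features is the fraction of reads that overlap called peaks. The README does not describe how the pipeline computes it, so the following is only a minimal pure-Python sketch of the definition. The function names `merge_intervals` and `frip` and the `(chrom, start, end)` tuple layout are assumptions made for illustration, and some workflows count fragments rather than individual reads.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def frip(reads, peaks):
    """Fraction of reads overlapping at least one peak.

    reads, peaks: lists of (chrom, start, end) with half-open coordinates.
    (Hypothetical input layout, chosen for this sketch.)
    """
    # Group peaks by chromosome and merge them so they are disjoint and sorted.
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    merged = {c: merge_intervals(iv) for c, iv in by_chrom.items()}
    starts = {c: [s for s, _ in iv] for c, iv in merged.items()}

    in_peaks = 0
    for chrom, start, end in reads:
        intervals = merged.get(chrom)
        if not intervals:
            continue
        # Index of the last merged peak that starts before the read ends.
        i = bisect_right(starts[chrom], end - 1) - 1
        if i >= 0 and intervals[i][1] > start:
            in_peaks += 1
    return in_peaks / len(reads) if reads else 0.0


reads = [("chr1", 100, 150), ("chr1", 500, 550), ("chr2", 10, 60), ("chr1", 190, 240)]
peaks = [("chr1", 120, 200), ("chr1", 180, 220)]
print(frip(reads, peaks))
```

Merging the peaks first means a read that spans two overlapping peaks is still counted once.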
- Clone this repository:
  git clone https://github.com/grandrews7/mohd-atac.git
  cd mohd-atac
- Create and activate a Snakemake environment:
  mamba create -n snakemake-env -f env/snakemake.yaml
  mamba activate snakemake-env
- Create an environment with the required software:
  mamba create -n MOHD-ATAC -f env/mohd-atac.yaml
All pipeline parameters and file locations are controlled in config.yaml. A minimal example:
# config.yaml
data_dir: "data/merged_data"
fastq_pattern: "{sample}_{read}.fastq.gz"
samples: []
reads:
- "R1"
- "R2"
genomes:
- "GRCh38"
qvals:
- 0.05
resources_dir: "resources"

Modify paths or add additional genomes, q-values, or samples as needed.
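
As a rough illustration of how an empty `samples` list could be filled in from the files under `data_dir`, the sketch below matches file names against `fastq_pattern` and keeps only samples that have every read in `reads`. This is an assumption about the behaviour, not the pipeline's actual code; the helper names `pattern_to_regex` and `detect_samples` are hypothetical.

```python
import re


def pattern_to_regex(fastq_pattern, reads):
    """Turn a pattern like "{sample}_{read}.fastq.gz" into a compiled regex."""
    escaped = re.escape(fastq_pattern)  # "{" and "}" become "\{" and "\}"
    read_alternatives = "|".join(re.escape(r) for r in reads)
    escaped = escaped.replace(r"\{sample\}", r"(?P<sample>.+)")
    escaped = escaped.replace(r"\{read\}", f"(?P<read>{read_alternatives})")
    return re.compile(f"^{escaped}$")


def detect_samples(filenames, fastq_pattern, reads):
    """Return sorted sample names that have a file for every entry in reads."""
    regex = pattern_to_regex(fastq_pattern, reads)
    found = {}
    for name in filenames:
        match = regex.match(name)
        if match:
            found.setdefault(match.group("sample"), set()).add(match.group("read"))
    # Drop samples with a missing mate file.
    return sorted(s for s, seen in found.items() if seen == set(reads))


filenames = [
    "liver_R1.fastq.gz",
    "liver_R2.fastq.gz",
    "heart_rep1_R1.fastq.gz",
    "heart_rep1_R2.fastq.gz",
    "orphan_R1.fastq.gz",  # no R2 mate, so it is skipped
    "notes.txt",
]
print(detect_samples(filenames, "{sample}_{read}.fastq.gz", ["R1", "R2"]))
```

In practice the file names would come from listing `data_dir` (for example with `os.listdir`). Inside a Snakefile, Snakemake's built-in `glob_wildcards` offers similar pattern matching.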
Run the full pipeline using conda/mamba (the example below uses 4 cores; adjust --cores as needed):
mamba activate snakemake-env
snakemake \
--cores 4 \
--use-conda \
--configfile config.yaml \
--printshellcmds

Or submit to Slurm:
snakemake \
--profile slurm \
--configfile config.yaml

- FastQC (raw)
- Cutadapt trimming
- FastQC (trimmed)
- Bowtie2 alignment (paired-end)
- Alignment filtering, mate fixing, deduplication (Picard)
- BEDPE/tagAlign: convert alignments to BEDPE and ENCODE tagAlign formats
- MACS3 peak calling (variable q-values)
- bedGraphToBigWig: convert signal tracks to bigWig format
- QC metrics: fragment-length distribution, TSS enrichment, FRiP
- Pseudoreplicates
- IDR analysis
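
The pseudoreplicate step can be pictured as a seeded shuffle followed by a split into two halves, which are then processed separately and compared with IDR. The sketch below shows that idea only; it is not taken from the pipeline, and the function name `make_pseudoreplicates` and its `seed` parameter are hypothetical. For paired-end data the records would typically be whole fragments (for example BEDPE lines) so that mates stay together.

```python
import random


def make_pseudoreplicates(records, seed=0):
    """Shuffle records reproducibly and split them into two near-equal halves."""
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    half = (len(shuffled) + 1) // 2  # first half gets the extra record if odd
    return shuffled[:half], shuffled[half:]


fragments = [f"fragment_{i}" for i in range(11)]
pr1, pr2 = make_pseudoreplicates(fragments, seed=42)
print(len(pr1), len(pr2))
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible across runs without touching the global random state.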
├── Snakefile
├── config.yaml
├── envs/ # conda environment YAMLs
├── scripts/ # custom Python scripts
├── resources/ # genome FASTA, sizes, blacklist, etc.
├── results/ # pipeline outputs
├── logs/ # per-rule log files
└── README.md # this file
Feel free to open issues or pull requests. For substantial changes, please discuss first via GitHub Issues.
This project is released under the MIT License. See LICENSE for details.