MOHD ATAC-seq Snakemake Pipeline

This repository contains a modular Snakemake pipeline for processing ATAC-seq data from raw FASTQ files through quality control, alignment, peak calling, and downstream analyses (e.g., fragment-length distribution, TSS enrichment, FRiP). The workflow is fully configurable via a config.yaml file.

Features

Automatic detection of samples and paired-end FASTQ files
Adapter trimming (Cutadapt) and dual FastQC reports (pre- and post-trimming)
Bowtie2 alignment (paired-end) with customizable parameters
BAM filtering, duplicate marking (Picard), and indexing
Peak calling with MACS3 (multiple q-value thresholds)
Generation of BigWig coverage tracks
Quality control plots: fragment-length distribution, TSS enrichment, FRiP score
Support for ChromBPNet preprocessing and bias modeling
Pseudoreplicate generation and IDR analysis

Installation

Clone this repository

git clone https://github.com/grandrews7/mohd-atac.git
cd mohd-atac

Create and activate a Snakemake environment

mamba create -n snakemake-env -f env/snakemake.yaml

Create an environment with the required software

mamba create -n MOHD-ATAC -f env/mohd-atac.yaml

Configuration (`config.yaml`)

All pipeline parameters and file locations are controlled in config.yaml. A minimal example:

# config.yaml

data_dir: "data/merged_data"
fastq_pattern: "{sample}_{read}.fastq.gz"
samples: []
reads:
  - "R1"
  - "R2"
genomes:
  - "GRCh38"
qvals:
  - 0.05
resources_dir: "resources"

Modify paths or add additional genomes, q-values, or samples as needed.
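As an illustration of how fastq_pattern and reads can drive sample detection, here is a minimal Python sketch. It is not taken from the pipeline's Snakefile: the helper names pattern_to_regex and discover_samples are hypothetical, and treating an empty samples list as "detect everything in data_dir" is an assumption. Snakemake's built-in glob_wildcards performs a similar match and may be what the workflow actually uses.

```python
import re
from collections import defaultdict


def pattern_to_regex(pattern):
    """Turn a pattern such as '{sample}_{read}.fastq.gz' into a compiled regex
    with one named group per wildcard (hypothetical helper)."""
    # re.split with a capture group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)        # literal text, e.g. '.fastq.gz'
        else:
            regex += f"(?P<{part}>.+)"      # wildcard, greedy like Snakemake's default
    return re.compile(f"^{regex}$")


def discover_samples(filenames, pattern, reads):
    """Return (complete, incomplete) sample names.

    A sample is complete when a file exists for every entry in `reads`.
    Files that do not match the pattern, or whose read label is not listed,
    are ignored.
    """
    rx = pattern_to_regex(pattern)
    found = defaultdict(set)
    for name in filenames:
        match = rx.match(name)
        if match and match.group("read") in reads:
            found[match.group("sample")].add(match.group("read"))
    wanted = set(reads)
    complete = sorted(s for s, r in found.items() if r == wanted)
    incomplete = sorted(s for s, r in found.items() if r != wanted)
    return complete, incomplete


if __name__ == "__main__":
    # Made-up file names for illustration only.
    files = [
        "A_R1.fastq.gz", "A_R2.fastq.gz",
        "B_rep1_R1.fastq.gz", "B_rep1_R2.fastq.gz",
        "C_R1.fastq.gz",            # mate file missing
        "notes.txt",                # does not match the pattern
    ]
    complete, incomplete = discover_samples(files, "{sample}_{read}.fastq.gz", ["R1", "R2"])
    print("paired samples:", complete)
    print("missing a mate:", incomplete)
```

In practice the file names would come from listing data_dir (for example with os.listdir). Because the wildcards are greedy, a sample name containing underscores, such as B_rep1, is kept whole and the read label is taken from the last underscore-separated field.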

Usage

Run the full pipeline locally using conda/mamba. The example below uses 4 cores; adjust --cores as needed:

mamba activate snakemake-env
snakemake \
  --cores 4 \
  --use-conda \
  --configfile config.yaml \
  --printshellcmds

Or submit to a Slurm cluster using the slurm profile:

snakemake \
  --profile slurm \
  --configfile config.yaml

Pipeline Steps

FastQC (raw)
Cutadapt trimming
FastQC (trimmed)
Bowtie2 alignment (paired-end)
Alignment filtering, mate fixing, deduplication (Picard)
BEDPE/tagAlign: convert alignments to BEDPE and ENCODE tagAlign formats
MACS3 peak calling (variable q-values)
bedGraphToBigWig: convert signal to bigWig format
QC metrics: fragment-length distribution, TSS enrichment, FRiP
Pseudoreplicates
IDR analysis
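To make the FRiP metric in the QC step concrete, the sketch below computes the fraction of fragments that overlap at least one peak. It is an assumption-laden illustration, not the pipeline's own script: the function names merge_peaks and frip are hypothetical, intervals are treated as half-open (chrom, start, end) tuples as in BED, and a 1 bp overlap is taken as sufficient. The pipeline may count reads rather than fragments or apply a different overlap rule.

```python
from bisect import bisect_left
from collections import defaultdict


def merge_peaks(peaks):
    """Group peaks by chromosome and merge overlapping or touching intervals.

    Returns {chrom: (starts, ends)} with both lists sorted and aligned.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts, ends = [], []
        for start, end in intervals:
            if ends and start <= ends[-1]:
                ends[-1] = max(ends[-1], end)   # extend the current merged peak
            else:
                starts.append(start)
                ends.append(end)
        merged[chrom] = (starts, ends)
    return merged


def frip(fragments, peaks):
    """Fraction of fragments overlapping any peak by at least 1 bp."""
    if not fragments:
        return 0.0
    merged = merge_peaks(peaks)
    in_peaks = 0
    for chrom, start, end in fragments:
        if chrom not in merged:
            continue
        starts, ends = merged[chrom]
        # Index of the last merged peak that starts before the fragment ends.
        i = bisect_left(starts, end) - 1
        # Merged peaks are disjoint, so only that peak can overlap the fragment.
        if i >= 0 and ends[i] > start:
            in_peaks += 1
    return in_peaks / len(fragments)


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
    fragments = [
        ("chr1", 90, 110),    # overlaps the merged chr1 peak
        ("chr1", 300, 350),   # starts exactly where the peak ends: no overlap
        ("chr2", 10, 50),     # ends exactly where the peak starts: no overlap
        ("chr2", 70, 120),    # overlaps the chr2 peak
        ("chr3", 1, 2),       # chromosome without peaks
    ]
    print(f"FRiP = {frip(fragments, peaks):.2f}")
```

Merging the peaks first keeps each lookup to a single binary search, which matters when the fragment list is large. For real data the fragment and peak tuples would be parsed from the tagAlign or BEDPE files and the MACS3 peak files produced by the earlier steps.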

Directory Structure

├── Snakefile
├── config.yaml
├── envs/                # conda environment YAMLs
├── scripts/             # custom Python scripts
├── resources/           # genome FASTA, sizes, blacklist, etc.
├── results/             # pipeline outputs
├── logs/                # per-rule log files
└── README.md            # this file

Contributing

Feel free to open issues or pull requests. For substantial changes, please discuss first via GitHub Issues.

License

This project is released under the MIT License. See LICENSE for details.

Test

This is a test
