We have developed nf-HiChIP, a pipeline that combines the standard ChIP-seq processing steps (mapping, filtering, peak calling, and coverage track calculation) with HiChIP-specific analysis (the MAPS pipeline; Juric, Ivan, et al.). It enables thorough and efficient analysis of multiple HiChIP datasets simultaneously, without requiring additional ChIP-seq experiments. The workflow is based on the reference implementation of the method designed by Zofia Tojek. The original version is available here.
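- You can familiarize yourself with the Nextflow options.
- The -resume flag lets you restart the pipeline from the last successful step. Note the single dash: Nextflow's own options take one dash, while pipeline parameters such as --design take two.
- For more details, see the Nextflow documentation.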
Docker image available:
https://hub.docker.com/repository/docker/mateuszchilinski/hichip-nf-pipeline/general
Command to run Docker image (use -v to bind folder with data):
docker run -v /path_to_your_data/:/data_in_container/ -it mateuszchilinski/hichip-nf-pipeline:latest bash
Required Files for Reference Folder (Total 6 files) -
1. Reference fasta files -
> Homo_sapiens_assembly38.fasta
2. BWA Reference Index files -
> Homo_sapiens_assembly38.fasta.amb
> Homo_sapiens_assembly38.fasta.ann
> Homo_sapiens_assembly38.fasta.bwt
> Homo_sapiens_assembly38.fasta.pac
> Homo_sapiens_assembly38.fasta.sa
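A quick pre-flight check can confirm that all six files are in place before the pipeline is launched. The sketch below is not part of nf-HiChIP; the function name `check_reference` and the directory argument are hypothetical. If the index files are missing, they can usually be generated with `bwa index Homo_sapiens_assembly38.fasta`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the pipeline): verify that the
# reference folder holds the FASTA file plus its five BWA index files.
check_reference() {
    local ref_dir="$1"
    local base="Homo_sapiens_assembly38.fasta"
    local missing=0
    local suffix
    # The empty suffix stands for the FASTA file itself.
    for suffix in "" .amb .ann .bwt .pac .sa; do
        if [ ! -f "${ref_dir}/${base}${suffix}" ]; then
            echo "missing: ${ref_dir}/${base}${suffix}"
            missing=$((missing + 1))
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "all 6 reference files found"
    else
        echo "${missing} of 6 reference files missing"
        return 1
    fi
}

# Usage (the path is a placeholder): check_reference /path/to/reference
```

The function returns a non-zero status when anything is missing, so it can guard a launch command with `check_reference DIR && ...`.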
Example 1 for design.csv file
If you have neither raw nor processed results (narrow peaks) from a ChIP-Seq experiment
| sample | fastq_1 | fastq_2 | replicate | chipseq |
|---|---|---|---|---|
| S1 | /data/SAMPLE1_1_R1.fastq.gz | /data/SAMPLE1_1_R2.fastq.gz | 1 | None |
| S1 | /data/SAMPLE1_2_R1.fastq.gz | /data/SAMPLE1_2_R2.fastq.gz | 2 | None |
| S2 | /data/SAMPLE2_1_R1.fastq.gz | /data/SAMPLE2_1_R2.fastq.gz | 1 | None |
| S2 | /data/SAMPLE2_2_R1.fastq.gz | /data/SAMPLE2_2_R2.fastq.gz | 2 | None |
Note -
- "None" (note the capital letter) in the last column.
- In this case, pseudo-ChIP-Seq data will be generated from HiChIP data.
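As an illustration, the table above could be written to disk as follows. This is a sketch under one assumption: that design.csv is plain comma-separated text whose header row matches the column names shown in the table. The final check is a hypothetical convenience that flags case variants such as "none" or "NONE" in the last column.

```shell
#!/usr/bin/env bash
# Assumption: design.csv is comma-separated with a header row
# matching the column names of the Example 1 table.
cat > design.csv <<'EOF'
sample,fastq_1,fastq_2,replicate,chipseq
S1,/data/SAMPLE1_1_R1.fastq.gz,/data/SAMPLE1_1_R2.fastq.gz,1,None
S1,/data/SAMPLE1_2_R1.fastq.gz,/data/SAMPLE1_2_R2.fastq.gz,2,None
S2,/data/SAMPLE2_1_R1.fastq.gz,/data/SAMPLE2_1_R2.fastq.gz,1,None
S2,/data/SAMPLE2_2_R1.fastq.gz,/data/SAMPLE2_2_R2.fastq.gz,2,None
EOF

# Report any chipseq value that spells "none" with the wrong capitalization.
awk -F, '
    BEGIN { bad = 0 }
    NR > 1 && tolower($5) == "none" && $5 != "None" {
        printf "row %d: chipseq is \"%s\", expected \"None\"\n", NR, $5
        bad = 1
    }
    END { exit bad }
' design.csv
```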
Example 2 for design.csv file
If you have processed ChIP-Seq experiment results (in the form of narrow peaks)
| sample | fastq_1 | fastq_2 | replicate | chipseq |
|---|---|---|---|---|
| S1 | /data/SAMPLE1_1_R1.fastq.gz | /data/SAMPLE1_1_R2.fastq.gz | 1 | /data/SAMPLE1.narrowPeak |
| S1 | /data/SAMPLE1_2_R1.fastq.gz | /data/SAMPLE1_2_R2.fastq.gz | 2 | /data/SAMPLE1.narrowPeak |
| S2 | /data/SAMPLE2_1_R1.fastq.gz | /data/SAMPLE2_1_R2.fastq.gz | 1 | /data/SAMPLE2.narrowPeak |
| S2 | /data/SAMPLE2_2_R1.fastq.gz | /data/SAMPLE2_2_R2.fastq.gz | 2 | /data/SAMPLE2.narrowPeak |
Note -
- The pipeline requires chromosome names in the narrowPeak file to use the "chr" prefix (e.g., chr1, chr14, chr21, chrX).
- Ensure that peak files follow this naming convention and the BED6+4 format (10 columns).
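Both requirements can be checked before a run. The following is a minimal sketch, not part of the pipeline: the function name `check_narrowpeak` is hypothetical, and it assumes the file contains only peak records (no track or comment lines), with whitespace-separated fields.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a narrowPeak file has 10 columns (BED6+4)
# and that every chromosome name starts with "chr".
check_narrowpeak() {
    awk '
        BEGIN { bad = 0 }
        NF != 10 {
            printf "line %d: expected 10 columns, found %d\n", NR, NF
            bad = 1
        }
        $1 !~ /^chr/ {
            printf "line %d: chromosome \"%s\" lacks the chr prefix\n", NR, $1
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Usage (the path is taken from the table above): check_narrowpeak /data/SAMPLE1.narrowPeak
```

The function prints one message per offending line and returns a non-zero status if any line fails either check.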
Example 3 for design.csv file
If you have raw ChIP-Seq data but the peaks have not been called yet
| sample | fastq_1 | fastq_2 | input_1 | input_2 | replicate |
|---|---|---|---|---|---|
| S1 | /data/SAMPLE1_1_R1.fastq.gz | /data/SAMPLE1_1_R2.fastq.gz | /data/SAMPLE1_INPUT_R1.fastq.gz | /data/SAMPLE1_INPUT_R2.fastq.gz | 1 |
| S1 | /data/SAMPLE1_2_R1.fastq.gz | /data/SAMPLE1_2_R2.fastq.gz | /data/SAMPLE1_INPUT_R1.fastq.gz | /data/SAMPLE1_INPUT_R2.fastq.gz | 2 |
| S2 | /data/SAMPLE2_1_R1.fastq.gz | /data/SAMPLE2_1_R2.fastq.gz | /data/SAMPLE2_INPUT_R1.fastq.gz | /data/SAMPLE2_INPUT_R2.fastq.gz | 1 |
| S2 | /data/SAMPLE2_2_R1.fastq.gz | /data/SAMPLE2_2_R2.fastq.gz | /data/SAMPLE2_INPUT_R1.fastq.gz | /data/SAMPLE2_INPUT_R2.fastq.gz | 2 |
To run the pipeline with a design file like Example 1 or Example 2, use main.nf (run the command inside the container):
/opt/nextflow run main.nf --design design.csv
To run the pipeline with a design file like Example 3, use main_chipseq.nf (run the command inside the container):
/opt/nextflow run main_chipseq.nf --design design.csv
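The choice between the two entry scripts follows from the design file's columns, so it can be automated. The sketch below is a hypothetical convenience, not part of the pipeline: it assumes design.csv is comma-separated with a header row, and treats the presence of an input_1 column as the marker for Example 3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick the entry script from the design file header.
# Assumption: a design with an input_1 column carries raw ChIP-Seq data (Example 3).
select_workflow() {
    local header
    header=$(head -n 1 "$1" | tr -d '\r')
    # Wrap the header in commas so each column name can be matched exactly.
    case ",${header}," in
        *,input_1,*) echo "main_chipseq.nf" ;;
        *)           echo "main.nf" ;;
    esac
}

# Usage inside the container:
#   /opt/nextflow run "$(select_workflow design.csv)" --design design.csv
```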
Example
/opt/nextflow run \
/mnt/sfglab/nf-hichip/nf-hichip/main.nf \
--ref /mnt/sfglab/Data/References/Genome/hg38/Homo_sapiens_assembly38/Homo_sapiens_assembly38.fasta \
--chrom_sizes /mnt/sfglab/Data/References/Genome/hg38/Homo_sapiens_assembly38/hg38.sizes \
--outdir /mnt/sfglab/workspaces/output/HiChIP_HG00731 \
--design /mnt/sfglab/workspaces/design/design_HiChIP_HG00731.csv \
--threads 4 \
--mem 10 \
--mapq 30 \
--peak_quality 0.01
The parameters of the pipeline can be found in the following table. All of them are optional:
| Parameter | Description | Default |
|---|---|---|
| --ref | Reference genome for the analysis. | /workspaces/hichip-nf-pipeline/ref/Homo_sapiens_assembly38.fasta |
| --outdir | Folder with the final results. | results |
| --design | .csv file containing information about samples and replicates. | /workspaces/hichip-nf-pipeline/design_high.csv |
| --chrom_sizes | Sizes of chromosomes for the specific reference genome. | /workspaces/hichip-nf-pipeline/hg38.chrom.sizes |
| --threads | Number of threads to use in each task. | 4 |
| --mem | Memory (in GB) for all samtools tasks. The value applies per sample and, as the example implies, per thread: 4 samples with 4 threads and 4 GB would consume 64 GB of memory. | 16 |
| --mapq | MAPQ for MAPS. | 30 |
| --peak_quality | Quality parameter (q-value (minimum FDR) cutoff) for MACS3. | 0.05 |
| --genome_size | Genome size string for MACS3. | hs |
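The memory example in the --mem row is a simple product, which is worth computing before picking values for a shared machine. The sketch below assumes the reading implied by that example (samples × threads × GB); the function name `estimate_memory_gb` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: peak samtools memory in GB, assuming the
# samples x threads x mem reading of the --mem description.
estimate_memory_gb() {
    local samples="$1" threads="$2" mem="$3"
    echo $(( samples * threads * mem ))
}

estimate_memory_gb 4 4 4    # the table's example: prints 64
estimate_memory_gb 4 4 16   # 4 samples at the defaults (--threads 4, --mem 16): prints 256
```

Under this reading, the default settings already call for 256 GB with four samples, so lowering --mem or --threads may be necessary on smaller machines.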
For post-processing and figure recreation, use the scripts in the post_processing folder.
If you use nf-HiChIP in your research (the idea, the algorithm, the analysis scripts, or the supplemental data), please give us a star on the GitHub repo page and cite our paper as follows:
Buka, K., Parteka-Tojek, Z., Agarwal, A. et al. Improved cohesin HiChIP protocol and bioinformatic analysis for robust detection of chromatin loops and stripes. Commun Biol 8, 437 (2025). https://doi.org/10.1038/s42003-025-07847-w