Add GATK contamination check to complement VerifyBamID2#758

Open
dorotejavujinovic wants to merge 9 commits into nf-core:dev from dorotejavujinovic:gatk-contamination-clean
@dorotejavujinovic

PR checklist

  • This comment contains a description of changes (with reason)
  • If you've fixed a bug or added code that should be tested, add tests!
  • If necessary, also make a PR on the nf-core/raredisease branch on the nf-core/test-datasets repo
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Make sure your code lints (nf-core lint .).
  • Documentation in docs is updated
  • CHANGELOG.md is updated
  • README.md is updated

Description

Adds GATK-based contamination detection to complement VerifyBamID2.

Background

  • VerifyBamID2 works well for WGS but has significant limitations with WES data
  • GATK CalculateContamination performs better on targeted sequencing (WES)
  • Having both methods for WGS provides cross-validation

Implementation

  • New subworkflow: CONTAMINATION_CHECK using GATK4 GetPileupSummaries and CalculateContamination
  • New module: PARSE_CONTAMINATION for MultiQC integration
  • Conditional interval handling: WGS (genome-wide) vs WES (target regions)
  • MultiQC configuration with color-coded thresholds
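
As a rough illustration of the two GATK4 steps listed above, the following Python sketch assembles the command lines. It is not the PR's code: the helper names and file names are hypothetical, and the interval rule (targets file for WES, otherwise the common-sites VCF itself) is only an assumption about what "conditional interval handling" could mean here.

```python
from typing import List, Optional


def pileup_command(bam: str, sites_vcf: str, out_table: str,
                   target_intervals: Optional[str] = None) -> List[str]:
    """Build a GATK4 GetPileupSummaries command line.

    Assumption: WES runs pass a target-regions file, while WGS runs fall
    back to the common-sites VCF itself as the interval list.
    """
    intervals = target_intervals if target_intervals else sites_vcf
    return [
        "gatk", "GetPileupSummaries",
        "-I", bam,          # aligned reads
        "-V", sites_vcf,    # common germline variant sites
        "-L", intervals,    # where to collect pileups
        "-O", out_table,    # pileup summary table
    ]


def contamination_command(pileup_table: str, out_table: str) -> List[str]:
    """Build a GATK4 CalculateContamination command line."""
    return ["gatk", "CalculateContamination", "-I", pileup_table, "-O", out_table]


if __name__ == "__main__":
    sites = "small_exac_common_3.hg38.vcf.gz"
    # WGS-style call: no targets file, so the sites VCF doubles as the intervals.
    print(" ".join(pileup_command("sample.bam", sites, "sample.pileups.table")))
    # WES-style call: restrict pileups to a (hypothetical) targets file.
    print(" ".join(pileup_command("sample.bam", sites, "sample.pileups.table", "targets.bed")))
    print(" ".join(contamination_command("sample.pileups.table", "sample.contamination.table")))
```

In the pipeline itself these calls would live in the Nextflow modules; the sketch only shows how the inputs relate to each other.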

Usage

params.run_contamination = true
params.contamination_sites = "small_exac_common_3.hg38.vcf.gz"
params.contamination_sites_tbi = "small_exac_common_3.hg38.vcf.gz.tbi"
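
The three parameters above depend on each other: the sites file and its index only matter when the check is switched on. A minimal sketch of that consistency check, in Python for illustration, could look as follows. The function name is hypothetical, and the rule that the index is named `<sites>.tbi` is an assumed convention, not something the PR states it enforces.

```python
from typing import Dict, List


def check_contamination_params(params: Dict[str, object]) -> List[str]:
    """Return human-readable problems; an empty list means the settings are consistent."""
    problems: List[str] = []
    if not params.get("run_contamination", False):
        return problems  # the step is disabled, so nothing else is required

    sites = params.get("contamination_sites")
    index = params.get("contamination_sites_tbi")
    if not sites:
        problems.append("contamination_sites must be set when run_contamination is true")
    if not index:
        problems.append("contamination_sites_tbi must be set when run_contamination is true")
    # Assumed convention: a tabix index is named after the VCF it indexes.
    if sites and index and index != f"{sites}.tbi":
        problems.append("contamination_sites_tbi is not named '<contamination_sites>.tbi'")
    return problems


if __name__ == "__main__":
    example = {
        "run_contamination": True,
        "contamination_sites": "small_exac_common_3.hg38.vcf.gz",
        "contamination_sites_tbi": "small_exac_common_3.hg38.vcf.gz.tbi",
    }
    print(check_contamination_params(example))
```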

Testing

Tested on both WGS and WES samples with successful integration into MultiQC reports.

Doroteja Vujinovic and others added 7 commits November 26, 2025 11:06
- Add CONTAMINATION_CHECK subworkflow using GATK4
- Add PARSE_CONTAMINATION module for MultiQC integration
- Add GATK4 GetPileupSummaries and CalculateContamination modules
- Implement conditional intervals handling (WGS vs WES)
- Update workflow to integrate contamination check after QC_BAM
- Configure MultiQC to display contamination results
Updated GATK contamination configuration for clarity and consistency.
Introduced GATK contamination check for WES/WGS samples, added new parameters and subworkflow, and updated MultiQC configuration.
@nf-core-bot
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.3.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

dorotejavujinovic changed the title from "Gatk contamination clean" to "Add GATK contamination check to complement VerifyBamID2" on Dec 15, 2025
Collaborator

ramprasadn left a comment

Thanks for the PR @dorotejavujinovic!

  ext.prefix = { "${meta.id}_sorted_md" }
  publishDir = [
-     enabled: !params.save_mapped_as_cram,
+     enabled: true,
Collaborator

I am curious: why do you want to change this?

  withName: '.*ALIGN:ALIGN_BWA_BWAMEM2_BWAMEME:SAMTOOLS_INDEX_MARKDUP' {
  publishDir = [
-     enabled: !params.save_mapped_as_cram,
+     enabled: true,
Collaborator

And this one?

script:
def prefix = task.ext.prefix ?: "${meta.id}"
"""
#!/usr/bin/env python3
Collaborator

Can you make this a module binary? We have had issues in the past with some systems interpreting indents differently.
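
For reference, a module binary here could be a standalone script, typically placed under the module's `resources/usr/bin/` directory and called by name from the `script:` block (Nextflow needs `nextflow.enable.moduleBinaries = true` for this). The sketch below is not the PR's parser. It assumes the CalculateContamination table is tab-separated with the columns `sample`, `contamination` and `error`, and the output column names and the conversion to percent are illustrative choices.

```python
#!/usr/bin/env python3
"""Hypothetical module binary: convert a contamination table into a MultiQC-friendly TSV."""
import csv
import sys
from typing import Dict, Iterable, List


def parse_contamination_table(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Read tab-separated rows with the assumed columns sample, contamination, error."""
    reader = csv.DictReader(lines, delimiter="\t")
    rows = []
    for row in reader:
        rows.append({
            "sample": row["sample"],
            "contamination": float(row["contamination"]),
            "error": float(row["error"]),
        })
    return rows


def to_multiqc_tsv(rows: List[Dict[str, object]]) -> str:
    """Render the rows as a TSV with fractions converted to percentages."""
    out = ["Sample\tcontamination_pct\terror_pct"]
    for r in rows:
        out.append(f"{r['sample']}\t{r['contamination'] * 100:.3f}\t{r['error'] * 100:.3f}")
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: parse_contamination.py <contamination.table> > <sample>_mqc.tsv
    with open(sys.argv[1]) as handle:
        sys.stdout.write(to_multiqc_tsv(parse_contamination_table(handle)))
```

Keeping the Python in its own file means its indentation is never re-interpreted inside the Nextflow string, which is the concern raised above.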

v.write('"${task.process}":\\n')
v.write(' python: "3.11"\\n')
"""
}
Collaborator

Also, can you add a stub section?

Comment on lines +8 to +14
### Added

- Added GATK contamination check for WES/WGS samples as complement to VerifyBamID2
- New parameters: `run_contamination`, `contamination_sites`, `contamination_sites_tbi`
- CONTAMINATION_CHECK subworkflow using GATK4 GetPileupSummaries and CalculateContamination
- PARSE_CONTAMINATION module for MultiQC integration
- Contamination results displayed in MultiQC with color-coded thresholds
Collaborator

You can add your log entries to 2.7.0dev, since it's the one in development. And don't forget to link the PR to your entries ;)

Also, we have a separate table for parameters and new tools under the ##Fixed section of 2.7.0dev, so you can add that information there.

Collaborator

Could you add a test for this subworkflow? We are currently in the process of adding subworkflow level tests using nf-test, so it would be fantastic if you can include one for this subworkflow.

3 participants