Issue running tumor only mode #38

@tomgutman

Hello,

I am trying to run Wakhan in tumor-only mode after running Severus. This is Nanopore data with a mean coverage of ~30X; Clair3 and WhatsHap were used to create the ${tumor_phased_vcf} file.
I checked that the reference and all associated files use the same contig nomenclature, e.g. "1" and not "chr1".
I also replaced --breakpoints ${SV_VCF} with --change-point-detection-for-cna, but I got the same error.
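
The nomenclature check mentioned above can be sketched in plain Python along these lines. This is only an illustration and not part of Wakhan: the helper names `fai_contigs` and `vcf_header_contigs` and the inline example contents are hypothetical, and it assumes the VCF declares its contigs in `##contig` header lines and that a `.fai` index exists for the reference.

```python
import re

def fai_contigs(fai_text):
    """Contig names from the text of a FASTA .fai index (first tab-separated column)."""
    return {line.split("\t")[0] for line in fai_text.splitlines() if line.strip()}

def vcf_header_contigs(vcf_text):
    """Contig names declared in ##contig=<ID=...> header lines of a VCF."""
    pattern = re.compile(r"^##contig=<ID=([^,>]+)")
    names = set()
    for line in vcf_text.splitlines():
        match = pattern.match(line)
        if match:
            names.add(match.group(1))
    return names

# Made-up example contents standing in for the real files.
example_fai = "1\t248956422\t3\t60\t61\nX\t156040895\t100\t60\t61\n"
example_vcf = (
    "##fileformat=VCFv4.2\n"
    "##contig=<ID=1,length=248956422>\n"
    "##contig=<ID=chrX,length=156040895>\n"
    "#CHROM\tPOS\n"
)

# Contigs named in the VCF header that the reference index does not know.
only_in_vcf = vcf_header_contigs(example_vcf) - fai_contigs(example_fai)
print(sorted(only_in_vcf))  # ['chrX']
```

An empty result means every contig declared in the VCF header also exists in the reference index; a compressed VCF would first need to be read through `gzip.open`.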

Do you have an explanation or a fix for this error?

Here is the command I'm running:

singularity exec --contain \
    -B /mnt/ ${WAKHAN} \
    python /Wakhan/wakhan.py --threads 12 \
    --reference ${REF} \
    --target-bam ${haplotaggued_tumor_bam} \
    --tumor-phased-vcf ${tumor_phased_vcf} \
    --genome-name ${SAMPLE_NAME} \
    --out-dir-plots ${OUTDIR2} \
    --contigs 1 \
    --breakpoints ${SV_VCF} \
    --loh-enable

Here is the log with the error I'm encountering:

[2025-09-23 09:06:29] INFO: Starting Wakhan 0.2.0
[2025-09-23 09:06:29] INFO: Cmd: /Wakhan/wakhan.py --threads 12 --reference /mnt/beegfs02/database/bioinfo/Index_DB/Fasta/Ensembl/GRCh38.109/homo_sapiens.GRCh38.109.fasta --target-bam /mnt/beegfs02/scratch/t_gutman/AML_merge_output_haplotagged.bam --tumor-phased-vcf /mnt/beegfs02/scratch/t_gutman/AML_merge_output_phased.vcf.gz --genome-name AML --out-dir-plots /mnt/beegfs02/scratch/t_gutman/wakhan/AML --contigs 1-22,X,Y --breakpoints /mnt/beegfs02/scratch/t_gutman/severus/AML/somatic_SVs/severus_somatic.vcf --loh-enable
[2025-09-23 09:06:29] INFO: Python version: 3.8.0 | packaged by conda-forge | (default, Nov 22 2019, 19:11:38) 
[GCC 7.3.0]
[2025-09-23 09:06:29] INFO: Starting hapcorrect() module...
[2025-09-23 09:06:30] INFO: Parsing reads from /mnt/beegfs02/scratch/t_gutman/AML_merge_output_haplotagged.bam
[2025-09-23 09:20:32] INFO: Parsed 22801547 segments
[2025-09-23 09:20:32] INFO: Computing coverage histogram
[2025-09-23 09:23:17] INFO: Writing tumor coverage for bins
[2025-09-23 09:23:17] INFO: Parsing phaseblocks information
[2025-09-23 09:23:17] INFO: bcftools -> Query for phasesets and GT, DP, VAF feilds by creating a CSV file
/Wakhan/src/hapcorrect/src/utils.py:62: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  dataframe = pd.read_csv(path, sep=sept, names=names)
[2025-09-23 09:23:44] INFO: Computing coverage for bins
[2025-09-23 09:23:44] INFO: bcftools -> Query for het SNPs and creating a /mnt/beegfs02/scratch/t_gutman/wakhan/AML/data_phasing/AML_merge_output_phased.vcf_het_snps.csv CSV file
[2025-09-23 09:23:54] INFO: SNPs frequency -> CSV to dataframe conversion for heterozygous SNPs
/Wakhan/src/hapcorrect/src/utils.py:62: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  dataframe = pd.read_csv(path, sep=sept, names=names)
[2025-09-23 09:24:07] INFO: SNPs frequency -> Computing SNPs frequency from tumor BAM
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[mpileup] 1 samples in 1 input files
[2025-09-23 10:10:13] INFO: SNPs frequency -> Computing ACGTs frequencies for heterozygous SNPs
/Wakhan/src/hapcorrect/src/utils.py:62: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  dataframe = pd.read_csv(path, sep=sept, names=names)
/Wakhan/src/hapcorrect/src/utils.py:62: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  dataframe = pd.read_csv(path, sep=sept, names=names)
[2025-09-23 10:12:18] INFO: bcftools -> Query for phasesets and GT, DP, VAF feilds by creating a CSV file
/Wakhan/src/hapcorrect/src/utils.py:62: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  dataframe = pd.read_csv(path, sep=sept, names=names)
[2025-09-23 10:12:51] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 1
[2025-09-23 10:14:49] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 2
[2025-09-23 10:17:06] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 3
[2025-09-23 10:18:41] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 4
[2025-09-23 10:20:13] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 5
[2025-09-23 10:21:30] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 6
[2025-09-23 10:22:41] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 7
[2025-09-23 10:23:48] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 8
[2025-09-23 10:24:37] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 9
[2025-09-23 10:25:27] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 10
[2025-09-23 10:26:21] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 11
[2025-09-23 10:27:14] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 12
[2025-09-23 10:28:05] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 13
[2025-09-23 10:28:37] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 14
[2025-09-23 10:29:06] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 15
[2025-09-23 10:29:35] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 16
[2025-09-23 10:29:59] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 17
[2025-09-23 10:30:21] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 18
[2025-09-23 10:30:45] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 19
[2025-09-23 10:30:59] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 20
[2025-09-23 10:31:16] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 21
[2025-09-23 10:31:25] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for 22
[2025-09-23 10:31:36] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for X
[2025-09-23 10:31:59] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for Y
[2025-09-23 10:33:07] INFO: Total phased length: 1961077905
[2025-09-23 10:33:07] INFO: Phase blocks N50: 318259
[2025-09-23 10:33:08] INFO: VCF edit for phase change segments
[2025-09-23 10:35:38] INFO: Total phased length: 3335716963
[2025-09-23 10:35:38] INFO: Phase blocks N50: 1783457
[2025-09-23 10:35:40] INFO: SNPs frequencies plots generation for 1
[2025-09-23 10:36:47] INFO: SNPs frequencies plots generation for 2
[2025-09-23 10:37:58] INFO: SNPs frequencies plots generation for 3
[2025-09-23 10:38:49] INFO: SNPs frequencies plots generation for 4
[2025-09-23 10:39:42] INFO: SNPs frequencies plots generation for 5
[2025-09-23 10:40:25] INFO: SNPs frequencies plots generation for 6
[2025-09-23 10:41:05] INFO: SNPs frequencies plots generation for 7
[2025-09-23 10:42:00] INFO: SNPs frequencies plots generation for 8
[2025-09-23 10:42:31] INFO: SNPs frequencies plots generation for 9
[2025-09-23 10:42:59] INFO: SNPs frequencies plots generation for 10
[2025-09-23 10:43:30] INFO: SNPs frequencies plots generation for 11
[2025-09-23 10:43:59] INFO: SNPs frequencies plots generation for 12
[2025-09-23 10:44:29] INFO: SNPs frequencies plots generation for 13
[2025-09-23 10:44:48] INFO: SNPs frequencies plots generation for 14
[2025-09-23 10:45:04] INFO: SNPs frequencies plots generation for 15
[2025-09-23 10:45:20] INFO: SNPs frequencies plots generation for 16
[2025-09-23 10:45:35] INFO: SNPs frequencies plots generation for 17
[2025-09-23 10:45:48] INFO: SNPs frequencies plots generation for 18
[2025-09-23 10:46:04] INFO: SNPs frequencies plots generation for 19
[2025-09-23 10:46:13] INFO: SNPs frequencies plots generation for 20
[2025-09-23 10:46:23] INFO: SNPs frequencies plots generation for 21
[2025-09-23 10:46:29] INFO: SNPs frequencies plots generation for 22
[2025-09-23 10:46:35] INFO: SNPs frequencies plots generation for X
[2025-09-23 10:46:53] INFO: SNPs frequencies plots generation for Y
[2025-09-23 10:47:07] INFO: hapcorrect() module finished successfully.
[2025-09-23 10:47:07] INFO: Starting cna() module...
[2025-09-23 10:47:08] INFO: Split variants = True
[2025-09-23 10:47:08] INFO: check info = True
[2025-09-23 10:47:08] INFO: Allele symbol = 0
[2025-09-23 10:47:08] INFO: Initializing HeaderParser
[2025-09-23 10:47:08] INFO: Reading vcf form file /mnt/beegfs02/scratch/t_gutman/severus/AML/somatic_SVs/severus_somatic.vcf
[2025-09-23 10:47:08] INFO: Setting self.individuals to ['AML_merge_output_haplotagged']
[2025-09-23 10:47:09] INFO: Using existing phase corrected coverage data
[2025-09-23 10:47:09] INFO: Generating coverage plots chromosomes-wise
[2025-09-23 10:47:09] INFO: Split variants = True
[2025-09-23 10:47:09] INFO: check info = True
[2025-09-23 10:47:09] INFO: Allele symbol = 0
[2025-09-23 10:47:09] INFO: Initializing HeaderParser
[2025-09-23 10:47:09] INFO: Reading vcf form file /mnt/beegfs02/scratch/t_gutman/severus/AML/somatic_SVs/severus_somatic.vcf
[2025-09-23 10:47:09] INFO: Setting self.individuals to ['AML_merge_output_haplotagged']
[2025-09-23 10:47:09] INFO: Split variants = True
[2025-09-23 10:47:09] INFO: check info = True
[2025-09-23 10:47:09] INFO: Allele symbol = 0
[2025-09-23 10:47:09] INFO: Initializing HeaderParser
[2025-09-23 10:47:09] INFO: Reading vcf form file /mnt/beegfs02/scratch/t_gutman/severus/AML/somatic_SVs/severus_somatic.vcf
[2025-09-23 10:47:09] INFO: Setting self.individuals to ['AML_merge_output_haplotagged']
[2025-09-23 10:47:10] INFO: Plots generation for 1
Traceback (most recent call last):
  File "/Wakhan/wakhan.py", line 28, in <module>
    main()
  File "/Wakhan/wakhan.py", line 24, in main
    sys.exit(main())
  File "/Wakhan/src/main.py", line 360, in main
    wakhan_all(args) #hapcorrect + cna
  File "/Wakhan/src/main.py", line 376, in wakhan_all
    cna_process(args) #cna
  File "/Wakhan/src/main.py", line 495, in cna_process
    coverage_plots_chromosomes(csv_df_coverage, csv_df_phasesets, args, thread_pool)
  File "/Wakhan/src/plots.py", line 380, in coverage_plots_chromosomes
    snps_cpd_means, snps_cpd_lens, df_means_chr = change_point_detection_means(args, chrom, breakpoints_segemnts, df_snps_freqs_chr, ref_start_values, ref_start_values_1, df_centm_chrom, df_loh_chrom)
  File "/Wakhan/src/utils.py", line 1181, in change_point_detection_means
    snps_haplotype1_mean = remove_indices(snps_haplotype1_mean, list(set(indices_cent_hp1[0] + indices_loh_hp1)))
IndexError: index 0 is out of bounds for axis 0 with size 0
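
For reference, the final `IndexError` can be reproduced outside Wakhan with a minimal NumPy sketch. This is not Wakhan's code: `empty_hits` and `first_or_empty` are hypothetical names, and the assumption that an empty lookup result is being indexed with `[0]` is only inferred from the traceback message.

```python
import numpy as np

# Hypothetical stand-in for a lookup that matched no rows for a chromosome.
empty_hits = np.array([])

try:
    empty_hits[0]
except IndexError as err:
    message = str(err)
    print(message)  # index 0 is out of bounds for axis 0 with size 0

def first_or_empty(arr):
    """Return the first entry as a list, or [] when the array has no entries."""
    arr = np.asarray(arr)
    if arr.shape[0] == 0:
        return []
    return list(np.atleast_1d(arr[0]))

print(first_or_empty(empty_hits))                    # empty input no longer raises
print(first_or_empty(np.array([[3, 4], [5, 6]])))    # first row as a list
```

A guard of this kind only avoids the crash; it does not explain why the lookup came back empty for chromosome 1 in the first place.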

Thank you for your help!
