htslib has problems with positions above ~500000000 #348

@bunop

Dear all,

We ran into an issue when working with VCF files from plant genomes, which can have very large chromosomes. In short, tabix cannot extract positions above roughly 5e8, and other software that uses htslib has the same problem. I've uploaded a gist with a sample VCF. Here are the instructions to reproduce the issue:

Download the latest htslib from GitHub and compile it, then download moved.vcf from the gist. Inspect the last lines of the VCF with tail:

$ tail -2 moved.vcf | cut -f 1-7
12      1060240521      .       G       T       203.196 .
12      1070240615      .       C       A       378.781 .

Now pack the file with bgzip and index it with tabix:

$ ./bgzip moved.vcf 
$ ./tabix moved.vcf.gz

Finally, query the VCF using tabix and chrom 12:

$ ./tabix moved.vcf.gz 12 | tail -2 | cut -f 1-7
12      520164811       .       G       A       73.6504 .
12      530164817       .       T       C       83.1434 .

Every record after line 109 (POS 530164817) is missing from the query output, even though the file itself runs up to POS 1070240615.
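The cutoff sits suspiciously close to 2^29 = 536870912, which is the coordinate range covered by the binning scheme of the classic tabix (.tbi) index. I have not confirmed that this is the cause, so treat the snippet below as a sketch under that assumption. It only compares the positions shown above against that limit; `TBI_LIMIT` and `check_pos` are names made up for illustration and are not part of htslib.

```shell
# Assumption: a classic .tbi index can only address coordinates up to 2^29.
TBI_LIMIT=$((1 << 29))   # 536870912

# Print "within" or "beyond" for a 1-based position relative to that limit.
check_pos() {
    if [ "$1" -le "$TBI_LIMIT" ]; then
        echo "within"
    else
        echo "beyond"
    fi
}

# Last record returned by the query, then the last two records in the file.
for pos in 530164817 1060240521 1070240615; do
    echo "$pos $(check_pos "$pos")"
done
```

The last record returned by the query falls within the limit, while the last two records in the file fall beyond it. If the index format really is the cause, building a CSI index instead (tabix has a `-C` option for this) might avoid the problem, but I have not tested that here.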

Thanks for your support.
