vg gbwt metadata parsing #3361

@brianabernathy

1. What were you trying to do?
I have 3 genome sequences, each with 12 chromosomes. I built an MSA for each chromosome sequence triplet and fed each one to vg construct. The 12 resulting .vg files were merged with vg combine and then converted to a single GFA file.

I wanted to create giraffe indexes that include the haplotypes from the GFA paths.

vg gbwt --path-regex "(.*)\.(chr.*)" --path-fields "SC" -G graph.gfa -g graph.gg -o graph.gbwt

My GFA paths are named "sampleX.chrXX", so the above command seemed appropriate for parsing the sample and contig metadata from the path names.

Inspecting the .gbwt file reveals...

vg gbwt -M graph.gbwt 
36 paths with names, 36 samples with names, 36 haplotypes, 3 contigs with names
vg gbwt -SL graph.gbwt
sample1.chr01
sample1.chr02
...
sample3.chr12
vg gbwt -CL graph.gbwt
sample1
sample2
sample3
vg gbwt -HL graph.gbwt
36

2. What did you want to happen?
I may be misinterpreting the results, as I couldn't find much documentation for these options. vg gbwt -M reporting 36 paths seems reasonable, but I was expecting 3 samples and 12 contigs. Given --path-regex "(.*)\.(chr.*)" --path-fields "SC", I'd expect the samples (S) to be sample1-sample3 and the contigs (C) to be chr01-chr12, parsed from the sampleX.chrXX path names.
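To make the expectation concrete, here is a minimal sketch of what the regex captures from one of my path names. It uses Python's re module as a stand-in, which is an assumption: vg's own regex engine and the way --path-fields assigns fields to submatches are not shown here, and split_path_name is a hypothetical helper, not part of vg.

```python
import re

# Same pattern as passed to --path-regex above.
PATH_REGEX = re.compile(r"(.*)\.(chr.*)")


def split_path_name(name):
    """Return (whole match, group 1, group 2) for a path name, or None.

    Hypothetical helper for illustration only; it does not reproduce
    how vg maps submatches to metadata fields.
    """
    match = PATH_REGEX.fullmatch(name)
    if match is None:
        return None
    return match.group(0), match.group(1), match.group(2)


if __name__ == "__main__":
    for name in ["sample1.chr01", "sample3.chr12"]:
        whole, first, second = split_path_name(name)
        print(f"whole={whole} group1={first} group2={second}")
```

I expected group 1 (sample1) to become the sample and group 2 (chr01) to become the contig. The names actually reported above look like the whole match for samples and group 1 for contigs, which may or may not be related to how the fields are assigned.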

Does the output seem reasonable, and am I just misunderstanding the contig and sample labels? Do I need to include a haplotype ID in the path names, or will it default to a valid value if omitted?

3. What actually happened?
see above

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

n/a

5. What data and command can the vg dev team use to make the problem happen?
input data is too large to post

6. What does running vg version say?

vg version v1.33.0 "Moscona"
Compiled with g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 on Linux
Linked against libstd++ 20200808
Built by anovak@octagon
