Significant SNP with absence of that SNP #87

@ingridvanw

Hello @caitiecollins,

I am trying to run treeWAS on a merged VCF file to find which SNPs are significantly associated with a binary phenotype (no disease = 0, disease = 1).
I merged 500 VCF files for Staphylococcus aureus with bcftools:
bcftools merge -m none -0 -O z *.compressed.vcf.gz |bcftools +fill-tags -O z -- -t AN,AC,AF > merged.vcf.gz

  1. Should I filter the VCF file to remove invariant SNPs, or filter on MAF > 0.01?

I have not performed any filtering so far. I read the VCF file with:

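library(vcfR)  # provides read.vcfR() and vcfR2genind()
myvcf <- read.vcfR(filename_vcf)
dna <- vcfR2genind(myvcf)
snps <- dna@tab  # allele count table: rows = isolates, columns = alleles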
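
For reference, the kind of filtering asked about in question 1 could be sketched as below. This is only a minimal illustration on toy data, assuming a binary 0/1 matrix with isolates in rows and SNPs in columns; `snps_toy` and `filter_by_maf` are hypothetical names and are not part of treeWAS or vcfR.

```r
# Toy stand-in for `snps` (hypothetical data): 200 isolates, 4 binary SNPs.
snps_toy <- cbind(
  snp1 = rep(c(0, 1), times = 100),       # MAF 0.5
  snp2 = rep(0, 200),                     # invariant site
  snp3 = c(1, rep(0, 199)),               # singleton, MAF 0.005
  snp4 = c(rep(1, 20), rep(0, 180))       # MAF 0.1
)

# Hypothetical helper: drop invariant columns and columns whose
# minor allele frequency is below `min_maf`.
filter_by_maf <- function(x, min_maf = 0.01) {
  af  <- colMeans(x, na.rm = TRUE)        # frequency of the "1" state
  maf <- pmin(af, 1 - af)                 # minor allele frequency
  x[, !is.na(maf) & maf >= min_maf, drop = FALSE]
}

filtered <- filter_by_maf(snps_toy)
colnames(filtered)                        # columns that survive the filter
```

Invariant sites have a MAF of 0, so a single MAF threshold removes them as well.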

In my treeWAS results, the terminal score returns almost 12,000 significant SNPs, the simultaneous score around 30, and the subsequent score none. Strikingly, G1P1 and G1P0 are both 0 for the significant SNPs, which suggests these SNPs are not biologically informative. How can this happen? Moreover, the sum of G1P1, G0P0, G1P0 and G0P1 does not equal the total number of isolates (n = 500).

  1. How can G1P1 and G1P0 both be 0 for SNPs that treeWAS reports as significant?
  2. Why does G1P1 + G1P0 + G0P0 + G0P1 not equal the total number of isolates (500 in this case)?

out <- treeWAS(snps = snps, phen = phen, tree = tree, chunk.size = 1000, p.value = 0.05, n.snps.sim = 50*ncol(snps), correct.prop = TRUE, filename.plot = treewas_file_SNP)
Output of out$simultaneous$sig.snps:

                         SNP.locus p.value      score G1P1 G0P0 G1P0 G0P1
NC_007795_958488_77653.1     77653       0  1.8585672    0  125    0   50
NC_007795_969239_78739.0     78739       0 -1.7829203    0  109    0   66
NC_007795_706380_58407.1     58407       0  1.6309743    0  124    0   51
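
The four counts in the table above can be cross-checked by hand for a single SNP, independently of treeWAS. The sketch below uses hypothetical toy vectors (`geno`, `phen_toy`) rather than the real data. The missing calls in the toy genotype are there only to show how they would lower the total; whether that is what happens in this dataset is not known.

```r
# Hypothetical toy data: 10 isolates, one binary SNP with two missing calls.
geno     <- c(1, 1, 0, 0, 0, NA, 1, 0, NA, 0)
phen_toy <- c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1)

# Count each genotype/phenotype combination; isolates with an NA
# genotype fall into none of the four cells.
counts <- c(
  G1P1 = sum(geno == 1 & phen_toy == 1, na.rm = TRUE),
  G0P0 = sum(geno == 0 & phen_toy == 0, na.rm = TRUE),
  G1P0 = sum(geno == 1 & phen_toy == 0, na.rm = TRUE),
  G0P1 = sum(geno == 0 & phen_toy == 1, na.rm = TRUE)
)

counts
sum(counts)        # isolates with a non-missing call
sum(is.na(geno))   # isolates excluded from the four cells
```

Running the same counts on one of the reported columns of `snps` against `phen` would show whether the totals agree with the treeWAS output.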
 
Thanks in advance!
