Hello @caitiecollins ,
I am trying to run treeWAS on a merged VCF file to see which SNPs are significantly associated with a binary phenotype (no disease = 0, disease = 1).
I merged 500 VCF files for Staphylococcus aureus with bcftools:
bcftools merge -m none -0 -O z *.compressed.vcf.gz |bcftools +fill-tags -O z -- -t AN,AC,AF > merged.vcf.gz
- Should I filter the VCF file to remove invariant SNPs, or to keep only SNPs with MAF > 0.01?
I have not performed any filtering so far. I read the VCF file with:
myvcf = read.vcfR(filename_vcf)
dna <- vcfR2genind(myvcf)
snps <- dna@tab
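For reference, a minimal sketch of what the filtering asked about above could look like once the data are in a matrix like `snps`. It uses base R only and a made-up toy matrix; `snps_toy` and `maf_threshold` are hypothetical names, and the sketch assumes the columns are coded 0/1 with missing calls as `NA`, which may not hold for every column of `dna@tab`.

```r
# Toy 0/1 matrix standing in for `snps` (rows = isolates, columns = loci).
# All values are invented for illustration.
snps_toy <- matrix(
  c(0, 0, 0, 0, 0, 0,    # column "inv": invariant
    1, 0, 1, 0, 1, NA,   # column "common": variable, one missing call
    1, 0, 0, 0, 0, 0),   # column "rare": a single carrier
  nrow = 6,
  dimnames = list(paste0("iso", 1:6), c("inv", "common", "rare"))
)

# Frequency of state 1 per column, ignoring missing calls.
freq1 <- colMeans(snps_toy, na.rm = TRUE)

# Minor allele frequency: the smaller of the two state frequencies.
maf <- pmin(freq1, 1 - freq1)

# Hypothetical threshold, chosen large because the toy data has only 6 rows;
# the value asked about in this post is 0.01.
maf_threshold <- 0.2

# Drop invariant columns (maf == 0) and columns below the threshold.
keep <- maf > 0 & maf >= maf_threshold
snps_filtered <- snps_toy[, keep, drop = FALSE]

print(colnames(snps_filtered))
```

With 500 isolates, a threshold of 0.01 would remove variants whose minor state is seen in fewer than 5 isolates.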
In my treeWAS results, the terminal score returns almost 12000 significant SNPs, the simultaneous score around 30, and the subsequent score none. Strikingly, the G1P1 and G1P0 counts for the significant SNPs are 0, which suggests these SNPs are not biologically informative. How can this happen? Moreover, the sum of G1P1, G0P0, G1P0 and G0P1 is not equal to the total number of isolates (n = 500).
- How can G1P1 and G1P0 both be 0 for SNPs that treeWAS reports as significant?
- Why is G1P1 + G1P0 + G0P0 + G0P1 not equal to the total number of isolates (500 in this case)?
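One way to check whether missing genotype calls could account for the gap in the second question is to recount the four cells for a single SNP and count the `NA` entries separately. The sketch below uses made-up vectors (`geno` and `phen_toy` are hypothetical names) and assumes missing calls are stored as `NA`. It does not reproduce how treeWAS computes these counts internally.

```r
# Toy genotype column for one SNP and a matching binary phenotype.
# All values are invented for illustration.
geno     <- c(1, 1, 0, 0, NA, NA, 0, 1)
phen_toy <- c(1, 0, 0, 1,  1,  0, 0, 1)

# Count the four genotype/phenotype combinations, skipping missing calls.
counts <- c(
  G1P1 = sum(geno == 1 & phen_toy == 1, na.rm = TRUE),
  G0P0 = sum(geno == 0 & phen_toy == 0, na.rm = TRUE),
  G1P0 = sum(geno == 1 & phen_toy == 0, na.rm = TRUE),
  G0P1 = sum(geno == 0 & phen_toy == 1, na.rm = TRUE)
)

# Isolates with no genotype call fall into none of the four cells.
n_missing <- sum(is.na(geno))

print(counts)
print(n_missing)

# The four cells plus the missing calls should add up to the number of isolates.
print(sum(counts) + n_missing == length(geno))
```

If the same check on a real column of `snps` shows many `NA` entries, that would be one possible explanation for the four counts not summing to 500.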
out <- treeWAS(snps = snps, phen = phen, tree = tree, chunk.size = 1000, p.value = 0.05, n.snps.sim = 50*ncol(snps), correct.prop = TRUE, filename.plot = treewas_file_SNP)
Output of out$simultaneous$sig.snps
| | SNP.locus | p.value | score | G1P1 | G0P0 | G1P0 | G0P1 |
|---|---|---|---|---|---|---|---|
| NC_007795_958488_77653.1 | 77653 | 0 | 1.8585672 | 0 | 125 | 0 | 50 |
| NC_007795_969239_78739.0 | 78739 | 0 | -1.7829203 | 0 | 109 | 0 | 66 |
| NC_007795_706380_58407.1 | 58407 | 0 | 1.6309743 | 0 | 124 | 0 | 51 |

Thanks in advance!