Applying polygenic scores (PGS) on imputed genotypes
- command line program (works on Linux or macOS)
- supports vcf.gz files (imputed or genotyped)
- supports different filters (e.g. r2 or variant list)
- supports PGS Catalog format (https://www.pgscatalog.org, currently over 2,000 scores)
- creates an interactive html report
- supports liftover of score files
- supports converting rsIDs to positions
- Download pgs-calc-*.tar.gz from the latest release
- Extract the downloaded archive (e.g. tar -xf pgs-calc-*.tar.gz)
- Validate the installation with pgs-calc --version
Applying polygenic scores (PGS) on imputed genotypes:
pgs-calc apply --ref PGS000018 --out PGS000018.scores.txt chr*.dose.noID.vcf.gz --report-html PGS000018.html
The weights for score PGS000018 are downloaded automatically from the PGS Catalog, and all scores are written to the file PGS000018.scores.txt. An interactive HTML report is written to PGS000018.html.
- --ref <file(s) or PGS-ID>: score file with weights or a PGS ID. Multiple scores are separated by , (e.g. score1.txt.gz,score2.txt.gz or PGS000018,PGS000027)
- --out <file>: Output file name
- --minR2 <value>: Use only variants with an imputation quality (R2) >= <value>
- --writeVariants <file>: Writes a csv file with all variants used in the calculation
- --includeVariants <file>: Restrict the calculation to variants from this csv file
- --genotypes GT|DS: Use genotypes or dosage
- --report-html <file>: Creates an interactive html report. The report includes summary statistics (like coverage) for each score and can be filtered by e.g. id or trait.
- --samples: Restrict the calculation to samples from this csv file
- --meta <file>: Use this meta file to annotate scores
- VCF file format (*.vcf and *.vcf.gz)
- one VCF file per chromosome (e.g. output of Imputationserver)
- works out of the box with imputed genotypes from Michigan Imputation Server
pgs-calc supports PGSCatalog out of the box: open the website, find your score of interest and download the provided txt.gz files.
As pgs-calc works with chromosomal positions and not with marker ids, the following requirements must be fulfilled:
- The build of your genotypes and the build of the score must be the same. If the score is on a different build, you can use the pgs-calc resolve command to lift it over to the build of the genotypes.
- The score file needs chr_name and chr_position columns. If only rsID is present, you need to set the parameter --dbsnp and provide the matching index to convert rsIDs on the fly to chromosomal positions. Depending on the build of your genotypes (hg19 or hg38), you can download the dbsnp-index from the dbsnp-index link listed below.
- The column other_allele is mandatory to handle multi-allelic variants in a unified way.
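The column requirements above can be checked before running pgs-calc. The following is a minimal sketch, not part of pgs-calc itself: check_score_columns is a hypothetical helper, and it assumes a PGS Catalog style file (gzip-compressed, tab-delimited, with '#' comment lines followed by a single header row). The toy file it builds is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether a score file has the columns
# chr_name, chr_position and other_allele in its header row.
# Assumes gzip-compressed, tab-delimited input with leading '#' comments.
check_score_columns() {
    gzip -dc "$1" | awk -F'\t' '
        /^#/ { next }                      # skip metadata comment lines
        {
            # first non-comment line is treated as the header row
            for (i = 1; i <= NF; i++) seen[$i] = 1
            missing = ""
            n = split("chr_name chr_position other_allele", required, " ")
            for (j = 1; j <= n; j++)
                if (!(required[j] in seen)) missing = missing " " required[j]
            if (missing == "") print "ok"
            else if ("rsID" in seen) print "missing:" missing " (rsID present, consider --dbsnp)"
            else print "missing:" missing
            exit
        }'
}

# Toy score file with rsIDs only, built here so the sketch is self-contained.
printf '#pgs_id=EXAMPLE\nrsID\teffect_allele\tother_allele\teffect_weight\nrs123\tA\tG\t0.12\n' \
    | gzip -c > toy_score.txt.gz
check_score_columns toy_score.txt.gz
```

For a file that only carries rsIDs, the helper points to the --dbsnp route described above; a file that already has positions and other_allele prints ok.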
If you want to create your own weight files, you need a tab-delimited text file with the following columns:
chr_name chr_position effect_allele other_allele effect_weight
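As an illustration of this layout, the sketch below writes a small weight file and checks that every line has five tab-separated fields. The file name my_score.txt and the variants and weights in it are made-up placeholders, not a real score.

```shell
#!/bin/sh
# Write a toy weight file with the five columns named above.
# Positions, alleles and weights are placeholders for illustration only.
{
    printf 'chr_name\tchr_position\teffect_allele\tother_allele\teffect_weight\n'
    printf '1\t1000\tA\tG\t0.25\n'
    printf '1\t2000\tT\tC\t-0.10\n'
} > my_score.txt

# Sanity check: every line must have exactly five tab-separated fields.
awk -F'\t' 'NF != 5 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
            END { exit bad + 0 }' my_score.txt \
    && echo "my_score.txt looks well-formed"
```

Using printf with explicit \t avoids editors silently replacing tabs with spaces, which is a common reason for a hand-written file to fail the field check.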
Apply PGS to a single file (e.g. one chromosome):
pgs-calc apply --ref PGS000018.txt.gz test.chr1.vcf.gz --out scores.txt
All scores are written to file scores.txt.
Apply PGS to multiple files (e.g. multiple chromosomes):
pgs-calc apply --ref PGS000018.txt.gz test.chr1.vcf.gz test.chr2.vcf.gz test.chr3.vcf.gz test.chr4.vcf.gz --out scores.txt
Apply PGS to multiple files by using file patterns:
pgs-calc apply --ref PGS000018.txt.gz test.chr*.vcf.gz --out scores.txt
Apply multiple score files:
pgs-calc apply --ref PGS000018.txt.gz,PGS000027.txt.gz test.chr*.vcf.gz --out scores.txt
You can also create a file scores_filenames.txt that lists all paths to your score files:
scores
PGS000018.txt.gz
PGS000027.txt.gz
pgs-calc apply --ref scores_filenames.txt test.chr*.vcf.gz --out scores.txt
Attention: All paths inside the file are relative to the location of the file itself.
Use only variants with an imputation quality (R2) >= 0.9:
pgs-calc apply --ref PGS000018.txt.gz test.chr*.vcf.gz --minR2 0.9 --out scores.txt
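To get a feel for how many variants a given threshold would keep, the sketch below counts the sites in a toy VCF that meet a minimum R2. It only illustrates the idea of the filter and is not how pgs-calc implements it. It assumes the imputation quality is stored as an R2=<value> entry in the INFO column; count_r2_pass and toy.vcf are hypothetical names.

```shell
#!/bin/sh
# Count variants whose INFO column carries R2 >= a threshold.
# Assumption: imputation quality is stored as an "R2=<value>" INFO entry.
count_r2_pass() {   # usage: count_r2_pass <vcf> <threshold>
    awk -F'\t' -v min="$2" '
        /^#/ { next }                          # skip VCF header lines
        {
            n = split($8, info, ";")           # column 8 is INFO
            for (i = 1; i <= n; i++)
                # match only entries that start with "R2=" (not e.g. "ER2=")
                if (index(info[i], "R2=") == 1 && substr(info[i], 4) + 0 >= min + 0)
                    pass++
        }
        END { print pass + 0 }' "$1"
}

# Toy sites-only VCF so the sketch runs without real data.
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '1\t1000\t.\tG\tA\t.\tPASS\tAF=0.20;R2=0.95\n'
    printf '1\t2000\t.\tC\tT\t.\tPASS\tAF=0.01;R2=0.42\n'
    printf '1\t3000\t.\tT\tC\t.\tPASS\tAF=0.35;R2=0.90\n'
} > toy.vcf
count_r2_pass toy.vcf 0.9
```

For a compressed file, the same awk program can read from gzip -dc test.chr1.vcf.gz instead of a plain file.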
If a PGS ID is provided, pgs-calc downloads the file from the PGS Catalog automatically:
pgs-calc apply --ref PGS000018 test.chr1.vcf.gz --out scores.txt
All scores are written to file scores.txt.
You can also use the download command to download a specific PGS ID:
pgs-calc download PGS000018 --out PGS000018.txt.gz
The weights are saved in file PGS000018.txt.gz.
If the --dbsnp parameter is set, pgs-calc automatically converts all rsIDs to their positions on the fly. Depending on the build of your genotypes (hg19 or hg38), you can download the dbsnp index from the dbsnp-index link listed below.
pgs-calc apply --ref PGS002297 test.chr1.vcf.gz --out scores.txt --dbsnp dbsnp154_hg19.txt.gz
All scores are written to file scores.txt.
The build of your genotypes and the score must be the same. If the score is on a different build, you can use the pgs-calc resolve command to lift over the score file to the build of the genotypes. You need a dbsnp-index file and a chain file.
pgs-calc resolve --in PGS002297 --out PGS002297.hg38.txt.gz --dbsnp dbsnp154_hg38.txt.gz --chain hg19_to_hg38.over.chain.gz
The new positions are written to file PGS002297.hg38.txt.gz, and this file can then be used by pgs-calc apply.
- dbsnp-index files to resolve rsIDs: https://imputationserver.sph.umich.edu/resources/chain/
- Chain files: https://imputationserver.sph.umich.edu/resources/chain/
Lukas Forer, Institute of Genetic Epidemiology, Medical University of Innsbruck