This repository contains the source code for the UK Biobank PRS Evaluation Tool developed by Genomics plc as part of the Thompson et al. publication (https://www.medrxiv.org/content/10.1101/2022.06.16.22276246v1).
This tool will be made available via the UK Biobank Research Analysis Platform and may be used to compare the performance of Polygenic Scores against the models published in Thompson et al.
You may only use this tool under the terms of the licence.
ukb-pret is a software package which compares the performance of two PRS in predicting an outcome phenotype.
This tool has been designed to evaluate an input set of PRS against a PRS selected from the UKB PRS release when
used as predictors for a set of input phenotype values for a relevant trait.
ukb-pret can be used by any registered user of the UK Biobank Research Analysis Platform (RAP)
with access to the UK Biobank data and the UK Biobank PRS Release data. Simply load a project with access to the
UK Biobank data and the UK Biobank PRS Release data, and follow the steps below.
In order to compare your own PRS against one from the UKB PRS release, you will need the following:
- A RAP project with dispensed data including the PRS release fields
- dxCLI tool installed
- docker to build the ukb-pret code for use in the RAP (with docker engine >v17.05)
- Your PRS scores in a csv file
- Your phenotype values in a csv file
Specific instructions on how to install or prepare each of these prerequisites can be found below.
If you don't already have the dxCLI tool installed, type pip3 install dxpy into the command line.
dxpy can also be installed from a tarball on Ubuntu, MacOS, Windows, CentOS, Red Hat Enterprise Linux and
Scientific Linux 5/6/7 machines. Details can be found on the DNANexus website, and full documentation for dxpy
can be found here.
The first step in running ukb-pret on the RAP is to export the PRS of interest and other required data to a CSV file.
This is done using the DNANexus tool table-exporter. A helper script is provided to run this tool and generate the
input CSV:
1. Choose the PRS you are interested in comparing against from the list here and note the "trait code" identifier. For example, the trait code for "PRS for enhanced age at menopause (AAM)" is AAM.
2. Log into DNANexus by typing dx login into the command line, and select the project containing the UKB PRS dataset.
3. Execute the run_table_exporter.sh script. The script takes two arguments: the trait code of your desired comparison PRS from step 1, and your Spark dataset ID. For example:
./run_table_exporter.sh AAM app1234_20221008123456
The pipeline should now run automatically, creating a CSV file named table_exporter_output_<trait code>.csv in your project directory containing some data that are required to run ukb-pret.
The second step is to generate a Docker image containing an installation of the ukb-pret tool*.
To do this, enter the top-level parent ukb-pret directory and run
docker build -t ukb-pret-docker -f docker/Dockerfile .
This will generate a Docker image named ukb-pret-docker, which should be converted to .tar.gz format and uploaded to
the RAP for use with the ukb-pret DNANexus app.
Convert to a compressed tarball:
docker save ukb-pret-docker | gzip > ukb-pret-docker.tar.gz
and upload it to the DNANexus platform:
dx upload ukb-pret-docker.tar.gz
*An installation of docker engine >v17.05 is required
The third step is to create a CSV file containing the PRS you wish to compare against the UKB PRS release, and upload it to your DNANexus project.
This file should contain two columns [eid,<data_tag>], where <data_tag> is the field used to
identify the PRS in the output.
Next upload the file to DNANexus with dx upload <my_prs_file>.csv, replacing <my_prs_file> with your own CSV path.
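The two-column layout described above can be produced with a few lines of Python. This is a minimal sketch rather than part of ukb-pret: the function name write_prs_csv, the file name my_prs.csv and the example eids and scores are hypothetical, and the check on <data_tag> assumes that "without spaces or special characters" (as stated in the evaluate-prs help text) means letters, digits and underscores only.

```python
import csv
import re


def write_prs_csv(path, scores, data_tag):
    """Write {eid: score} pairs to a CSV with header [eid,<data_tag>]."""
    # Assumption: a valid data_tag contains only letters, digits and underscores.
    if not re.fullmatch(r"[A-Za-z0-9_]+", data_tag):
        raise ValueError(f"data_tag {data_tag!r} contains spaces or special characters")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["eid", data_tag])  # header row
        for eid, score in scores.items():
            writer.writerow([eid, score])  # one participant per row


if __name__ == "__main__":
    # Hypothetical eids and scores, for illustration only.
    write_prs_csv("my_prs.csv", {1000001: 0.25, 1000002: -1.5}, "my_prs")
```

The resulting file can then be uploaded with dx upload as described above.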
The final step is to build and run the ukb-pret app on the platform.
To build the app, go to the ukb-pret/app directory and type
dx build ukb-pret
The app should now be built and available to view on the DNANexus platform.
Go to the online GUI, click on the app, and select your inputs through the GUI. These inputs should be:
- The PRS file generated by the table-exporter app
- The PRS you wish to compare to the UKB PRS release (as per instructions in the previous section)
- The phenotype file you wish to evaluate your PRS performance in (the headers should be [eid,<trait_code>], where <trait_code> can either refer to a UKB PRS release phenotype definition or to a user-defined trait. Additionally, this file can include the following columns to enable survival analysis in binary traits: [age_at_first_assessment,date_of_diagnosis,date_of_first_assessment,date_of_death,incident])*
- The docker tarball you uploaded to the platform
Click the "Run" button to commence the analysis. Once the job is finished, you will find a PDF named
prs_evaluation_report.pdf containing useful plots, metrics and summary statistics that describe and compare the two
input PRS. A CSV named evaluation_metrics.csv is also produced which can be used to create your own bespoke plots
and figures from the raw data.
*Notebooks for generating some of the Enhanced PRS phenotypes can be found in the pheconstructors directory (see the README for instructions on how to run these in the RAP)
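Before uploading a phenotype file, it can help to check its header against the column requirements listed above. The sketch below is an illustration only and is not how ukb-pret validates its inputs: the function name describe_pheno_header and the shape of the returned dictionary are hypothetical. The column names are taken from this README and from the evaluate-prs help text.

```python
# Columns that enable survival analysis in binary traits, as listed in this README.
SURVIVAL_COLUMNS = [
    "age_at_first_assessment",
    "date_of_diagnosis",
    "date_of_first_assessment",
    "date_of_death",
    "incident",
]


def describe_pheno_header(header, trait_code):
    """Report which required and optional phenotype columns a header contains."""
    columns = [name.strip() for name in header]
    missing_required = [c for c in ("eid", trait_code) if c not in columns]
    missing_survival = [c for c in SURVIVAL_COLUMNS if c not in columns]
    return {
        "valid": not missing_required,
        "missing_required": missing_required,
        # All five columns must be present for this flag to be True.
        "survival_columns_present": not missing_survival,
        "missing_survival": missing_survival,
        # [sex] is an optional column for stratified analysis.
        "has_sex": "sex" in columns,
    }


if __name__ == "__main__":
    print(describe_pheno_header(["eid", "AAM", "sex"], "AAM"))
```

A header can be read from the first row of the CSV with csv.reader and passed to this function together with the trait code.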
ukb-pret can be installed and run locally to compare the performance of two PRS in predicting a
phenotype, provided the inputs conform to the formats defined below.
Please note that survival analysis will only be performed if the fields
[age_at_first_assessment,date_of_diagnosis,date_of_first_assessment,date_of_death,incident] are provided in the
phenotype file.
Ancestry-stratified analysis will only be performed if a valid principal components file is provided.
Installation instructions (via pip) are to follow.
The ukb-pret tool can also be used more generally to compare the performance of any two sets of PRS
in predicting a phenotype. In this mode, all inputs come directly from the user and the tool is invoked through the evaluate-prs entrypoint.
Instructions on how to use ukb-pret's evaluate-prs CLI tool can be viewed by typing evaluate-prs --help
into your terminal:
usage: evaluate-prs [-h] --prs-files PRS_FILES PRS_FILES --pheno-file
                    PHENO_FILE [--pcs-file PCS_FILE] [--output-dir OUTPUT_DIR]
A CLI tool for evaluating a set of PRS against a phenotype.
optional arguments:
  -h, --help            show this help message and exit
  --prs-files PRS_FILES PRS_FILES
                        Paths to two files, each containing Polygenic Risk
                        Score (PRS) and participant eIDs in CSV format.
                        Headers should be [eid,<data_tag>], where <data_tag>
                        is a field without spaces or special characters that
                        is used to identify the PRS in the output (REQUIRED)
  --pheno-file PHENO_FILE
                        Paths to a file containing phenotype data and
                        participant eIDs in CSV format. Headers should contain
                        at least [eid,<trait_code>], where <trait_code> is a
                        field without spaces or special characters that can
                        either correspond to an existing Gplc phenotype
                        definition or be defined by the user. [sex] can also
                        be included as a header for stratified analysis using
                        coding {0: female, 1: male}. Additionally, this file
                        can include the following columns to enable survival
                        analysis in binary traits: ['age_at_first_assessment',
                        'date_of_diagnosis', 'date_of_first_assessment',
                        'date_of_death', 'incident'] (REQUIRED)
  --pcs-file PCS_FILE   Path to a file containing the first 4 genetically
                        inferred principal components. Headers should be
                        [eid,pc1,pc2,pc3,pc4] (OPTIONAL) (when omitted,
                        evaluation is carried out across all ancestries & the
                        report does not contain a quality control section)
  --output-dir OUTPUT_DIR
                        Output directory for evaluation report and CSV
                        containing metrics (default is current working
                        directory) (OPTIONAL)
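As an illustration of how the arguments above fit together, the following sketch assembles an evaluate-prs command line from Python. The flags come from the help text; the helper name build_evaluate_prs_command and the file names prs_a.csv, prs_b.csv, pheno.csv and results are hypothetical. The sketch only builds and prints the command and does not run the tool.

```python
import shlex


def build_evaluate_prs_command(prs_files, pheno_file, pcs_file=None, output_dir=None):
    """Return the evaluate-prs invocation as a list of arguments."""
    # --prs-files expects exactly two paths, per the usage line.
    if len(prs_files) != 2:
        raise ValueError("--prs-files takes exactly two paths")
    command = ["evaluate-prs", "--prs-files", *prs_files, "--pheno-file", pheno_file]
    if pcs_file is not None:
        command += ["--pcs-file", pcs_file]  # optional principal components file
    if output_dir is not None:
        command += ["--output-dir", output_dir]  # optional output directory
    return command


# Hypothetical input paths, for illustration only.
command = build_evaluate_prs_command(
    ["prs_a.csv", "prs_b.csv"], "pheno.csv", output_dir="results"
)
print(shlex.join(command))
```

With ukb-pret installed, the same list could be passed to subprocess.run(command, check=True) to launch the evaluation.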
Please feel free to contact us at [email protected]