Sentieon pipelines for Amazon Omics
Sentieon supports bioinformatic workflows running on Amazon Omics. The files in this repository can be used to run Sentieon pipelines as private workflows on Amazon Omics, or you can use this repository as a starting point for developing customized pipelines that use the Sentieon software.
The Sentieon software is a commercial software package, and a license is required to run it. To support workflows running on Amazon Omics, Sentieon operates a dedicated license server for Amazon Omics workflows.
To use the license server for Amazon Omics, you will need to provide Sentieon ([email protected]) with your AWS Canonical User ID. You can find your canonical ID by following the instructions at https://docs.aws.amazon.com/AmazonS3/latest/userguide/finding-canonical-user-id.html.
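The canonical ID can also be retrieved from the command line. The sketch below wraps `aws s3api list-buckets --query Owner.ID --output text`, one of the methods described in the AWS documentation linked above, and adds a sanity check before you send the value. The function names `get_canonical_id` and `is_canonical_id` are hypothetical, and the check assumes canonical IDs are 64-character hexadecimal strings.

```shell
# Print the canonical user ID of the account behind the current AWS credentials.
# Requires AWS CLI v2 and permission to list S3 buckets.
get_canonical_id() {
    aws s3api list-buckets --query Owner.ID --output text
}

# Return 0 if the argument looks like a canonical user ID.
# Assumption: canonical IDs are exactly 64 hexadecimal characters.
is_canonical_id() {
    printf '%s' "$1" | grep -Eq '^[0-9a-fA-F]{64}$'
}
```

For example, `id="$(get_canonical_id)" && is_canonical_id "$id" && echo "$id"` prints the ID only when it passes the format check.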
- Docker cli or another container implementation (Podman, etc.)
- AWS CLI v2
The following files are in the container directory:
- sentieon_omics.dockerfile: a Dockerfile that can be used to create a Sentieon container image for Amazon Omics
- omics_credentials.sh: a shell script to perform license authentication on Amazon Omics
To build the container image for the latest version of Sentieon, run:
cd ./container
docker build --platform linux/amd64 --build-arg SENTIEON_VERSION=202112.07 -t sentieon:omics-1 -f sentieon_omics.dockerfile .
Create a private repository in Amazon ECR:
aws ecr create-repository --repository-name sentieon
Log in to the registry:
aws ecr get-login-password --region <region-name> | docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region-name>.amazonaws.com
Tag the custom Sentieon container and push the container image to the repository:
docker tag sentieon:omics-1 <account-id>.dkr.ecr.<region-name>.amazonaws.com/sentieon:omics-1
docker push <account-id>.dkr.ecr.<region-name>.amazonaws.com/sentieon:omics-1
Grant the Omics service permission to interact with the repository using the policy in the assets directory:
aws ecr set-repository-policy --repository-name sentieon --policy-text file://assets/omics-ecr-repository-policy.json
As part of the license validation, the omics_credentials.sh script will obtain a license token from Amazon S3 for your workflow. Adding the following policy to your Amazon Omics service role will grant the workflow read access to files in the license bucket for your region:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObjectAcl",
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::sentieon-omics-license-<region>/*"
]
}
]
}
We are now ready to create Sentieon workflows on Amazon Omics. Running the following command at the start of the workflow will configure the environment for the Sentieon software:
source /opt/sentieon/omics_credentials.sh <SENTIEON_LICENSE> <CANONICAL_USER_ID>
Where <SENTIEON_LICENSE> is the FQDN and port of the Sentieon license server and <CANONICAL_USER_ID> is the AWS canonical user ID of the account running the workflow.
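A quick format check on the license address before sourcing the script may catch swapped or incomplete arguments early. The following is a minimal sketch, not part of the repository: the function name `check_license_address` is hypothetical, and the `host:port` form is an assumption based on the "FQDN and port" description above.

```shell
# Return 0 if the argument has the form <host>:<port> with a numeric port
# between 1 and 65535. Assumption: the license address is written as host:port.
# Note: sets the plain shell variables `host` and `port` as a side effect.
check_license_address() {
    # Reject values that contain no colon at all.
    case "$1" in
        *:*) ;;
        *) return 1 ;;
    esac
    host=${1%:*}     # everything before the last colon
    port=${1##*:}    # everything after the last colon
    # Reject an empty host or an empty port.
    [ -n "$host" ] && [ -n "$port" ] || return 1
    # Reject ports containing anything other than digits.
    case "$port" in
        *[!0-9]*) return 1 ;;
    esac
    [ "$port" -ge 1 ] && [ "$port" -le 65535 ]
}
```

Inside a workflow task this could guard the setup step, for example `check_license_address "$SENTIEON_LICENSE" && source /opt/sentieon/omics_credentials.sh "$SENTIEON_LICENSE" "$CANONICAL_USER_ID"`, where both variable names are placeholders.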
Example workflows can be found in the examples directory and complete workflow implementations can be found in the workflows directory.
(cd examples/wdl && zip test_sentieon.wdl.zip test_sentieon.wdl)
aws omics create-workflow \
--name test-sentieon-wdl \
--engine WDL \
--definition-zip fileb://examples/wdl/test_sentieon.wdl.zip \
--parameter-template file://examples/parameter.template.json
(cd examples/nextflow && zip -r ${OLDPWD}/test_sentieon.nextflow.zip .)
aws omics create-workflow \
--name test-sentieon-nextflow \
--engine NEXTFLOW \
--main test_sentieon.nf \
--definition-zip fileb://test_sentieon.nextflow.zip \
--parameter-template file://examples/parameter.template.json
The create-workflow command will output some information including the workflow-id.
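To avoid copying the workflow-id by hand, the CLI's global `--query` and `--output text` options can extract it directly. The sketch below assumes the `create-workflow` response exposes the identifier in a top-level `id` field; the function name `create_wdl_workflow` and the `WORKFLOW_ID` variable are illustrative only.

```shell
# Create the example WDL workflow and print only its workflow-id.
# Run from the repository root after building test_sentieon.wdl.zip.
# Assumption: the response contains a top-level "id" field.
create_wdl_workflow() {
    aws omics create-workflow \
        --name test-sentieon-wdl \
        --engine WDL \
        --definition-zip fileb://examples/wdl/test_sentieon.wdl.zip \
        --parameter-template file://examples/parameter.template.json \
        --query id \
        --output text
}
```

Calling `WORKFLOW_ID="$(create_wdl_workflow)"` then leaves the identifier in a variable for later commands.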
To run the example workflow, modify the examples/test.parameters.json file, replacing <canonical-id>, <account-id>, and <region-name> to match your environment. Then run the following, using the workflow-id from the create-workflow command and the role-name for your Amazon Omics service role:
aws omics start-run \
--role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
--workflow-id <workflow_id> \
--name "test $(date +%Y%m%d-%H%M%S)" \
--output-uri <s3-uri> \
--parameters file://examples/test.parameters.json
After about 20 minutes, verify that the test workflow completed successfully:
aws omics get-run --id <run-id>
You should see a response like:
{
"arn": "arn:aws:omics:<region>:<account-id>:run/<run-id>",
"creationTime": "2023-04-24T17:14:38.880864+00:00",
"digest": "sha256:<sha256>",
"id": "<run-id>",
"name": "test 20230424-101437",
"outputUri": "<s3-output-uri>",
"parameters": {
"canonical_user_id": "<canonical_id>",
"sentieon_docker": "<account-id>.dkr.ecr.<region>.amazonaws.com/sentieon:omics"
},
"resourceDigests": {
"<account-id>.dkr.ecr.<region>.amazonaws.com/sentieon:omics": "sha256:<sha256>"
},
"roleArn": "arn:aws:iam::<account-id>:role/<role-name>",
"startTime": "2023-04-24T17:25:06.021000+00:00",
"startedBy": "arn:aws:iam::<account-id>:<user>",
"status": "COMPLETED",
"stopTime": "2023-04-24T17:39:32.138095+00:00",
"tags": {},
"workflowId": "<workflow-id>",
"workflowType": "PRIVATE"
}
The example workflow has only one task called SentieonLicense. Locate this task:
aws omics list-run-tasks --id <run-id>
You should see a response like:
{
"items": [
{
"cpus": 1,
"creationTime": "2023-04-24T17:25:40.323030+00:00",
"memory": 4,
"name": "SentieonLicence",
"startTime": "2023-04-24T17:29:42.881000+00:00",
"status": "COMPLETED",
"stopTime": "2023-04-24T17:30:24.442000+00:00",
"taskId": "<task-id>"
}
]
}
Get the log-stream for the task:
aws logs get-log-events --log-group-name /aws/omics/WorkflowLog --log-stream-name run/<run-id>/task/<task-id> --output text
Note that get-log-events is paginated and may not return the full log stream for workflows with verbose logs.
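One way around the pagination limit is `aws logs tail`, which AWS CLI v2 provides and which pages through results itself. This is a sketch under that assumption, not the repository's method; the function name `task_log` and the default 7-day window are hypothetical choices. A window is passed explicitly because `--since` otherwise covers only the last 10 minutes.

```shell
# Print every event in one task's log stream.
# Arguments: <run-id> <task-id> [since], where "since" is a relative time
# such as 2h or 7d. Assumption: AWS CLI v2 with the `aws logs tail` command.
task_log() {
    aws logs tail /aws/omics/WorkflowLog \
        --log-stream-names "run/$1/task/$2" \
        --since "${3:-7d}" \
        --format short
}
```

For instance, `task_log <run-id> <task-id> 2h` would print the task's events from the last two hours. Its output layout differs from the `--output text` form of get-log-events shown in this document.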
If license verification is successful, you should see event lines like:
EVENTS 1682357406252 sentieon licclnt ping && echo "Ping is OK" 1682357399707
EVENTS 1682357406252 + sentieon licclnt ping 1682357399707
EVENTS 1682357406252 + echo 'Ping is OK' 1682357400013
EVENTS 1682357406252 sentieon licclnt query Haplotyper 1682357400013
EVENTS 1682357406252 + sentieon licclnt query Haplotyper 1682357400015
EVENTS 1682357406252 Ping is OK 1682357400015
EVENTS 1682357406252 499968 1682357400539
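To check for these markers from a script rather than by eye, the text output can be piped through a small filter. Searching for the phrase alone is not enough, because the echoed command lines also contain "Ping is OK", so this sketch matches the whole message field. It assumes the `--output text` fields are tab-separated in the order shown above (EVENTS, ingestion time, message, timestamp); the function name `license_ping_ok` is hypothetical.

```shell
# Read `aws logs get-log-events --output text` output on stdin and succeed
# only if some event's message is exactly "Ping is OK".
# Assumption: tab-separated fields, with the message in the third field.
license_ping_ok() {
    awk -F '\t' '
        $1 == "EVENTS" && $3 == "Ping is OK" { found = 1 }
        END { exit !found }
    '
}
```

A typical use would be to pipe the get-log-events command shown earlier into `license_ping_ok` and test its exit status.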
Congratulations! You've successfully run a test workflow with the Sentieon software on Amazon Omics. Feel free to update/extend the example workflow to implement your own custom pipelines with the Sentieon software.
Alternatively, you can find full pipeline implementations in the workflows directory that you can modify or implement as private workflows.