The purpose of this repository is to serve as a nice template to align with the values in DFWED.
The BaseSpace automation pipeline scans your BaseSpace account for all available project data sets. It checks your local step_mothur_log file to determine whether a given project data set has already been downloaded and/or analyzed by step_mothur. It reads all required parameters from a config file (config.ini).
Because the pipeline runs the step_mothur pipeline, you need to set up that environment first; please refer to the step_mothur GitHub repo.
If this is the first time you are running the BaseSpace automation pipeline, you will also need to install the Python schedule module with pip install schedule, so that the pipeline can run periodically at the given time slots. Alternatively, follow the instructions below.
This script checks for new projects under your BaseSpace account, downloads them, and runs step_mothur on those new projects.
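The "new project" check can be pictured as a set difference between the projects BaseSpace reports and the project IDs already recorded in step_mothur_log. The following is a minimal sketch, not the code in bscli_fq_downloader.py. The function names `logged_project_ids` and `select_new_projects` are hypothetical, and it assumes that any project_id present in the second column of the tab-delimited log counts as already handled.

```python
import csv


def logged_project_ids(lines):
    """Collect project_id values from step_mothur_log lines.

    `lines` is any iterable of text lines (an open file works).
    Assumes tab-delimited rows with project_id in the second column;
    shorter rows are skipped.
    """
    reader = csv.reader(lines, delimiter="\t")
    return {row[1] for row in reader if len(row) >= 2}


def select_new_projects(available, logged_ids):
    """Keep only the projects whose ID is not already in the log.

    `available` is a list of (project_name, project_id) pairs. How that
    list is fetched from BaseSpace is outside this sketch.
    """
    return [(name, pid) for name, pid in available if pid not in logged_ids]
```

To use it with a real log, open step_mothur_log in text mode and pass the file handle to `logged_project_ids`. An empty log simply yields an empty set, so every project is treated as new.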
- Copy the config file, then update the necessary path variables in config.ini:
cp config.ini.template config.ini
- Start a tmux session (use any name you prefer):
tmux new-session -s BD_SM
- Ensure pip is installed:
python -m ensurepip --default-pip
- Install the required Python libraries:
python -m pip install --user schedule
python -m pip install --user filelock
- Run the main script:
python bscli_fq_downloader.py -c config.ini
- This will check for all new projects under your BaseSpace account.
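Before starting the script, it can help to confirm that no path variable in config.ini was left blank after copying the template. The sketch below uses the standard-library configparser module. The section name `paths` and the helper names are placeholders, since the actual layout of config.ini.template is not described here; substitute the section that the template defines.

```python
import configparser


def load_section(config_text, section="paths"):
    """Parse INI text and return one section's options as a dict.

    The default section name "paths" is a placeholder, not the name
    used by config.ini.template.
    """
    # Interpolation is disabled so that values containing "%" are read as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section(section):
        raise KeyError(f"config has no [{section}] section")
    return dict(parser.items(section))


def blank_options(options):
    """Return the sorted names of options whose value is empty."""
    return sorted(name for name, value in options.items() if not value.strip())
```

Reading the file with `open("config.ini").read()` and passing the text to `load_section` gives a dict that `blank_options` can scan for values that still need to be filled in.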
- Two log files are required to run the pipeline: sphl_code_log and step_mothur_log.
- sphl_code_log is a 3-column (tab-delimited) text file: basespace owner_id, basespace owner_name, state_code.
- step_mothur_log is a 6-column (tab-delimited) text file: project_name, project_id, run_id, owner_id, timestamp, status. To run the pipeline for the first time, you can generate an empty step_mothur_log with the command touch step_mothur_log.
- Successfully processed projects are logged in step_mothur_log, ensuring they are not processed again.
- Two other log files are also useful for debugging: stepmothur_download.log and nextflow.log (the latter is in the folder where the SM pipeline is run).
- run_id is formatted as HMAS_{last_run_number: 3-digit padding}_{sphl_code}. The sphl_code needs to exist in SPHL_CODE_LOG, or the script will stop and notify you through an email.
- A few configuration variables with default values are defined at the beginning of the script; these need to be updated when running in a different environment.
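The run_id format and the sphl_code lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the script's own code: it assumes the sphl_code is the state_code column of sphl_code_log, looked up by owner_id, and it takes the run number as a plain argument because how the script derives last_run_number is not specified here. The function names are hypothetical, and the email notification is left out; the sketch raises an exception instead.

```python
import csv


def load_sphl_codes(lines):
    """Map owner_id to state_code from sphl_code_log lines.

    Expects three tab-delimited columns: owner_id, owner_name, state_code.
    """
    reader = csv.reader(lines, delimiter="\t")
    return {row[0]: row[2] for row in reader if len(row) >= 3}


def format_run_id(run_number, sphl_code):
    """Build HMAS_{run_number zero-padded to 3 digits}_{sphl_code}."""
    return f"HMAS_{run_number:03d}_{sphl_code}"


def run_id_for_owner(owner_id, run_number, sphl_codes):
    """Return the run_id, or raise if the owner has no code on file."""
    if owner_id not in sphl_codes:
        # The real script is described as stopping and sending an email here.
        raise LookupError(f"owner_id {owner_id} not found in sphl_code_log")
    return format_run_id(run_number, sphl_codes[owner_id])
```

For example, `format_run_id(7, "GA")` returns `HMAS_007_GA`.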
Add the flowchart for the pipeline.
Before running these scripts, ensure:
- You have installed BaseSpace Sequence Hub CLI (documentation).
- You have authenticated your BaseSpace account by running:
bs auth
- You might need the following information for the authentication:
Step-mothur-downloader v1.0.0
Client Id
99290fe834f241ff8d19df9edbc4f250
Client Secret
8624c0cf939044a3a72c1b7d397f46ff
Access Token
902ebaa7c85342fe9f0decb6ee68bf4a
Or use the CIMS BaseSpace account credentials to log in.
This repository constitutes a work of the United States Government and is not subject to domestic copyright protection under 17 USC § 105. This repository is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication. All contributions to this repository will be released under the CC0 dedication. By submitting a pull request you are agreeing to comply with this waiver of copyright interest.
This repository contains only non-sensitive, publicly available data and information. All material and community participation is covered by the Disclaimer and Code of Conduct. For more information about CDC's privacy policy, please visit http://www.cdc.gov/other/privacy.html.
Anyone is encouraged to contribute to the repository by forking and submitting a pull request. (If you are new to GitHub, you might start with a basic tutorial.) By contributing to this project, you grant a world-wide, royalty-free, perpetual, irrevocable, non-exclusive, transferable license to all users under the terms of the Apache Software License v2 or later.
All comments, messages, pull requests, and other submissions received through CDC including this GitHub page may be subject to applicable federal law, including but not limited to the Federal Records Act, and may be archived. Learn more at http://www.cdc.gov/other/privacy.html.
This repository is not a source of government records, but is a copy to increase collaboration and collaborative potential. All government records will be published through the CDC web site.