We have released a new polisher, PEPPER, that replaces MarginPolish-HELEN. If you have newer data (basecalled with Guppy >= 3.0.5), please use PEPPER instead of MarginPolish-HELEN. PEPPER is fully supported by our team.
H.E.L.E.N. (Homopolymer Encoded Long-read Error-corrector for Nanopore)
HELEN is published in Nature Biotechnology:
Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes
HELEN uses a Recurrent Neural Network (RNN) based Multi-Task Learning (MTL) model that predicts a base and a run-length for each genomic position, using the weights generated by MarginPolish.
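To make "a base and a run-length for each position" concrete, here is a toy illustration of run-length (homopolymer) encoding. It is not HELEN or MarginPolish code; the function name rle is hypothetical, and it only shows how a sequence splits into one base per homopolymer run plus that run's length.

```shell
# Illustration only (not HELEN's code): collapse each homopolymer run into
# one base plus its run length, e.g. AAACCG -> bases ACG, lengths 3,2,1.
rle() {
  printf '%s\n' "$1" | awk '{
    n = length($0); bases = ""; lengths = ""; i = 1
    while (i <= n) {
      c = substr($0, i, 1)
      j = i
      # advance j past the end of the current run of identical bases
      while (j <= n && substr($0, j, 1) == c) j++
      bases = bases c
      sep = (lengths == "") ? "" : ","
      lengths = lengths sep (j - i)
      i = j
    }
    print bases, lengths
  }'
}
```

For example, rle AAACCGTTTT prints ACGT 3,2,1,4.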
© 2020 Kishwar Shafin, Trevor Pesout, Benedict Paten.
Computational Genomics Lab (CGL), University of California, Santa Cruz.
- MarginPolish-HELEN outperforms other graph-based and Neural-Network based polishing pipelines.
- Simple installation steps.
- HELEN can use multiple GPUs at the same time.
- Highly optimized pipeline that is faster than any other available polishing tool.
- We have sequenced, assembled and polished 11 samples to ensure robustness, runtime-consistency and cost-efficiency.
- We tested GPU usage on Amazon Web Services (AWS) and Google Cloud Platform (GCP) to ensure scalability.
- Open source (MIT License).
MarginPolish-HELEN is supported on Ubuntu 16.10/18.04 or any other Linux-based system.
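Before installing, you may want to see which of the command-line tools from the apt-get dependency list below are already available. A minimal sketch, assuming a POSIX shell; missing_tools is a hypothetical helper name, and it only checks executables (development libraries such as libhdf5-dev are not commands and are not covered).

```shell
# Sketch: print the names of the given commands that are not found on PATH.
# Only executables can be checked this way; -dev library packages cannot.
missing_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Print the missing names space-separated, dropping the leading space.
  printf '%s\n' "${missing# }"
}
```

For example, missing_tools git cmake make gcc g++ autoconf bzip2 python3 prints the tools you still need to install, or an empty line if all are present.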
Before you follow any of the methods, make sure you install all the dependencies:
sudo apt-get -y install git cmake make gcc g++ autoconf bzip2 lzma-dev zlib1g-dev \
libcurl4-openssl-dev libpthread-stubs0-dev libbz2-dev liblzma-dev libhdf5-dev \
python3-pip python3-virtualenv virtualenv
You can install from the GitHub repository:
git clone https://github.com/kishwarshafin/helen.git
cd helen
make install
. ./venv/bin/activate
helen --help
marginpolish --help
Each time you want to use it, activate the virtualenv:
. <path/to/helen/venv/bin/activate>
Alternatively, install the prerequisites and then install MarginPolish-HELEN using pip:
python3 -m pip install helen --user
python3 -m helen.helen --help
python3 -m helen.marginpolish --help
To update the installed version:
python3 -m pip install --upgrade pip
python3 -m pip install helen --upgrade
You can also add the user-level scripts directory to your PATH:
echo 'export PATH="$(python3 -m site --user-base)/bin":$PATH' >> ~/.bashrc
source ~/.bashrc
marginpolish --help
helen --help
To run MarginPolish-HELEN with Docker (CPU):
# SEE CONFIGURATION
docker run --rm -it --ipc=host kishwars/helen:latest helen --help
docker run --rm -it --ipc=host kishwars/helen:latest marginpolish --help
docker run -it --ipc=host --user=`id -u`:`id -g` --cpus="16" \
-v </directory/with/inputs_outputs>:/data kishwars/helen:latest \
helen --help
To use GPUs with Docker, install nvidia-docker2:
sudo apt-get install -y nvidia-docker2
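In the Docker commands, -v </directory/with/inputs_outputs>:/data mounts a host directory at /data inside the container, so any input or output path you pass to helen or marginpolish there must be written under /data. A small sketch (the helper name to_container_path is hypothetical) that converts a host path inside the mounted directory into the path the container sees:

```shell
# Sketch: translate a host path inside the directory mounted with
# -v <host_dir>:/data into the corresponding path under /data.
to_container_path() {
  host_dir=${1%/}   # drop one trailing slash, if any
  host_path=$2
  case $host_path in
    "$host_dir")   printf '%s\n' /data ;;
    "$host_dir"/*) printf '/data/%s\n' "${host_path#"$host_dir"/}" ;;
    *) printf 'error: %s is not under %s\n' "$host_path" "$host_dir" >&2
       return 1 ;;
  esac
}
```

The --user=`id -u`:`id -g` option in the same commands runs the container process with your host user and group IDs, so files written under /data are owned by you rather than by root.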
# SEE CONFIGURATION
nvidia-docker run -it --ipc=host kishwars/helen:latest helen torch_stat
nvidia-docker run -it --ipc=host kishwars/helen:latest helen --help
nvidia-docker run -it --ipc=host kishwars/helen:latest marginpolish --help
# RUN HELEN
nvidia-docker run -it --ipc=host --user=`id -u`:`id -g` --cpus="16" \
-v </directory/with/inputs_outputs>:/data kishwars/helen:latest \
helen --help
MarginPolish requires a draft assembly and a mapping of reads to the draft assembly. We recommend using Shasta as the initial assembler and MiniMap2 for the mapping.
Generate a draft assembly using an ONT assembler such as Shasta.
We recommend using MiniMap2 to generate the mapping between the reads and the assembly. You don't have to follow these exact commands.
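The -F 0x904 option in the samtools view command excludes every record that has any of the mask's bits set. 0x904 is the bitwise OR of three SAM FLAG bits: 0x4 (read unmapped), 0x100 (secondary alignment) and 0x800 (supplementary alignment), so only primary mapped alignments are kept. The arithmetic can be checked in the shell; kept_by_filter is a hypothetical helper for illustration only.

```shell
# SAM FLAG bits (from the SAM specification):
#   0x4   read unmapped
#   0x100 secondary alignment
#   0x800 supplementary alignment
mask=$(( 0x4 | 0x100 | 0x800 ))
printf '0x%X\n' "$mask"   # prints 0x904

# Sketch: succeed if a record with FLAG $1 would survive `samtools view -F 0x904`,
# i.e. none of the mask bits are set in that FLAG.
kept_by_filter() {
  [ $(( $1 & 0x904 )) -eq 0 ]
}
```

For example, a reverse-strand primary alignment (FLAG 16) is kept, while a secondary (256) or supplementary (2048) alignment is dropped.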
minimap2 -ax map-ont -t 32 shasta_assembly.fa reads.fq | samtools view -hb -F 0x904 > unsorted.bam;
samtools sort -@32 -o reads_2_assembly.0x904.bam unsorted.bam;
samtools index -@32 reads_2_assembly.0x904.bam
Download the trained models:
helen download_models \
--output_dir <path/to/mp_helen_models/>
You can generate images using MarginPolish by running:
marginpolish reads_2_assembly.0x904.bam \
shasta_assembly.fa \
</path/to/model_name.json> \
-t <number_of_threads> \
-o <path/to/marginpolish_images> \
-f
You can find the models (the .json file for MarginPolish and the .pkl file for HELEN) by downloading them with helen download_models.
Next, run HELEN to polish the assembly using an RNN.
helen polish \
--image_dir </path/to/marginpolish_images/> \
--model_path </path/to/model.pkl> \
--batch_size 256 \
--num_workers 4 \
--threads <num_of_threads> \
--output_dir </path/to/output_dir> \
--output_prefix <output_filename.fa> \
--gpu_mode
If you are using CPUs, remove the --gpu_mode argument.
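If you run the same script on both GPU and CPU machines, you can add --gpu_mode conditionally. A minimal sketch, assuming that finding nvidia-smi on PATH is an acceptable proxy for a usable NVIDIA GPU (an assumption: it does not verify that PyTorch in your environment can use CUDA). The function gpu_mode_flag and its optional probe argument are hypothetical, not part of HELEN.

```shell
# Sketch: print "--gpu_mode" only if the probe command (default: nvidia-smi)
# is found on PATH; print nothing otherwise.
gpu_mode_flag() {
  probe=${1:-nvidia-smi}
  if command -v "$probe" >/dev/null 2>&1; then
    printf '%s\n' "--gpu_mode"
  fi
}
```

Leave the substitution unquoted so that it expands to nothing on CPU-only machines, e.g. helen polish <other arguments as above> $(gpu_mode_flag)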
Please open a github issue if you face any difficulties.
We are thankful to Sergey Koren and Karen Miga for their help with CHM13 data and evaluation.
We downloaded our data from the Telomere-to-Telomere (T2T) consortium to evaluate our pipeline on CHM13.
We acknowledge the work of the developers of these packages:
The name "HELEN" is inspired by the A.I. created by Tony Stark in the Marvel Comics (Earth-616). HELEN was created to control "Troy", the city Tony was building, making the A.I. "HELEN of Troy".