SparkAligner is a generalized version of SparkBWA with support for modules. These modules can be used to add support for other aligners.
Aligners are selected by name on the command line (the examples below use the bwa module). The general invocation is:
spark-submit \
--class com.github.sparkaligner.SparkAligner \
sparkaligner.jar \
<name of aligner to use> \
[<aligner specific options>]
For example, to run BWA's mem algorithm:
spark-submit \
--class com.github.sparkaligner.SparkAligner \
sparkaligner.jar \
bwa \
-algorithm mem \
-R /data/hg19/hg19.fasta \
-partitions 2 \
-I /data/input/datasets
The Docker image can be built with:
docker build --no-cache -t <name of Docker image> .
A prebuilt image is also available as paalka/spark-aligner. This Docker image also downloads a test dataset (the Lambda phage dataset) from f.128.no. The container is run with:
docker run \
-it \
-v <path to data folder>:<path to mount the data folder inside the container> \
paalka/spark-aligner \
<regular spark-aligner arguments here>
To keep the resulting SAM files, you need to mount the data directory into the container, for example:
docker run -it \
-v <path to test data>:/test_data \
paalka/spark-aligner bwa \
-algorithm mem \
-R /data/reference/lambda_virus.fa \
-I /test_data/<test_data_folder> \
-partitions 2 -bwaArgs "-t 4"
Make sure to clone the project using git clone --recursive, as it uses git
submodules. The JAR can then be built by running make.
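Put together, the build sequence looks roughly like this (the repository URL and project folder are placeholders, since they are not given here):

git clone --recursive <repository URL>
cd <project folder>
make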
The folder aligners contains the code for each module. New aligners are
required to extend the abstract class BaseAligner, which performs most of
the Spark-related work. This means that in order to add a new aligner, you
only need to specify how the aligner's arguments are processed and how the
aligner itself is run.
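As a rough illustration, a new module might look like the sketch below. The package layout, the hook methods parseArguments and runAligner, and their signatures are assumptions made for this sketch, not the actual BaseAligner API; check BaseAligner for the methods that really have to be overridden.

package com.github.sparkaligner.aligners.myaligner;

import com.github.sparkaligner.BaseAligner;

public class MyAligner extends BaseAligner {

  // Map SparkAligner's command-line options onto the options the
  // underlying aligner binary understands (hypothetical hook name).
  @Override
  protected String[] parseArguments(String[] args) {
    return args;
  }

  // Run the underlying aligner on one chunk of the input; BaseAligner is
  // responsible for distributing this work with Spark (hypothetical hook name).
  @Override
  protected void runAligner(String[] alignerArgs) {
    // Invoke the aligner here, e.g. through a native binding or an external process.
  }
}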