rush is a tool similar to GNU parallel
and gargs.
rush borrows some idea from them and has some unique features,
e.g.,
supporting custom defined variables,
resuming multi-line commands,
more advanced embeded replacement strings.
These features make rush suitable for easily and flexibly parallelizing
complex workflows in fields like Bioinformatics (see examples 18).
- Features
- Performance
- Installation
- Usage
- Examples
- Special Cases
- Contributors
- Acknowledgements
- Contact
- License
Major:
- Supporting Linux, OS X and Windows (not CygWin)!
- Avoid mixed line from multiple processes without loss of performance,
e.g. the first half of a line is from one process
and the last half of the line is from another process.
(
--line-bufferin GNU parallel) - Timeout (
-t). (--timeoutin GNU parallel) - Retry (
-r). (--retry-failed --joblogin GNU parallel) - Safe exit after capturing Ctrl-C
- Continue (
-c). (--resume --joblogin GNU parallel, but it does not support multi-line commands, which are common in workflow) awk -vlike custom defined variables (-v). (Using Shell variable in GNU parallel)- Keeping output in order of input (
-k). (Same-k/--keep-orderin GNU parallel) - Exit on first error(s) (
-e). (--halt 2in GNU parallel) - Settable record delimiter (
-D, default\n). (--recstartand--recendin GNU parallel) - Settable records sending to every command (
-n, default1). (-n/--max-argsin GNU parallel) - Settable field delimiter (
-d, default\s+). (Same-d/--delimiterin GNU parallel) - Practical replacement strings (like GNU parallel):
{#}, job ID. (Same in GNU parallel){}, full data. (Same in GNU parallel){n},nth field in delimiter-delimited data. (Same in GNU parallel)- Directory and file
{/}, dirname. ({//}in GNU parallel){%}, basename. ({/}in GNU parallel){.}, remove the last extension. (Same in GNU parallel){:}, remove any extension (Not supported in GNU parallel){^suffix}, removesuffix(Not supported in GNU parallel){@regexp}, capture submatch using regular expression (Not supported in GNU parallel)
- Combinations
{%.},{%:}, basename without extension{2.},{2/},{2%.}, manipulatenth field
- Preset variable (macro), e.g.,
rush -v p={^suffix} 'echo {p}_new_suffix', where{p}is replaced with{^suffix}. (Using Shell variable in GNU parallel)
Minor:
- Dry run (
--dry-run). (Same in GNU parallel) - Trim input data (
--trim). (Same in GNU parallel) - Verbose output (
--verbose). (Same in GNU parallel)
Differences between rush and GNU parallel on GNU parallel site.
Performance of rush is similar to gargs, and they are both slightly faster than parallel (Perl) and both slower than Rust parallel (discussion).
Note that speed is not the #.1 target, especially for processes that last long.
rush is implemented in Go programming language,
executable binary files for most popular operating systems are freely available
in release page.
Tip: run rush -V to check update !!!
| OS | Arch | File, (δΈε½ιε) | Download Count |
|---|---|---|---|
| Linux | 32-bit | rush_linux_386.tar.gz, (mirror) | |
| Linux | 64-bit | rush_linux_amd64.tar.gz, (mirror) | |
| OS X | 32-bit | rush_darwin_386.tar.gz, (mirror) | |
| OS X | 64-bit | rush_darwin_amd64.tar.gz, (mirror) | |
| Windows | 32-bit | rush_windows_386.exe.tar.gz, (mirror) | |
| Windows | 64-bit | rush_windows_amd64.exe.tar.gz, (mirror) |
Just download compressed
executable file of your operating system,
and decompress it with tar -zxvf *.tar.gz command or other tools.
And then:
-
For Linux-like systems
-
If you have root privilege simply copy it to
/usr/local/bin:sudo cp rush /usr/local/bin/ -
Or add the current directory of the executable file to environment variable
PATH:echo export PATH=\$PATH:\"$(pwd)\" >> ~/.bashrc source ~/.bashrc
-
-
For windows, just copy
rush.exetoC:\WINDOWS\system32.
go get -u github.com/shenwei356/rush/
rush -- a cross-platform command-line tool for executing jobs in parallel
Version: 0.3.0
Author: Wei Shen <[email protected]>
Homepage: https://github.com/shenwei356/rush
Usage:
rush [flags] [command] [args of command...]
Examples:
1. simple run, quoting is not necessary
$ seq 1 10 | rush echo {}
2. keep order
$ seq 1 10 | rush 'echo {}' -k
3. timeout
$ seq 1 | rush 'sleep 2; echo {}' -t 1
4. retry
$ seq 1 | rush 'python script.py' -r 3
5. dirname & basename & remove suffix
$ echo dir/file_1.txt.gz | rush 'echo {/} {%} {^_1.txt.gz}'
dir file.txt.gz dir/file
6. basename without last or any extension
$ echo dir.d/file.txt.gz | rush 'echo {.} {:} {%.} {%:}'
dir.d/file.txt dir.d/file file.txt file
7. job ID, combine fields and other replacement strings
$ echo 12 file.txt dir/s_1.fq.gz | rush 'echo job {#}: {2} {2.} {3%:^_1}'
job 1: file.txt file s
8. capture submatch using regular expression
$ echo read_1.fq.gz | rush 'echo {@(.+)_\d}'
read
9. custom field delimiter
$ echo a=b=c | rush 'echo {1} {2} {3}' -d =
a b c
10. custom record delimiter
$ echo a=b=c | rush -D "=" -k 'echo {}'
a
b
c
$ echo abc | rush -D "" -k 'echo {}'
a
b
c
11. assign value to variable, like "awk -v"
# seq 1 | rush 'echo Hello, {fname} {lname}!' -v fname=Wei,lname=Shen
$ seq 1 | rush 'echo Hello, {fname} {lname}!' -v fname=Wei -v lname=Shen
Hello, Wei Shen!
12. preset variable (Macro)
# equal to: echo read_1.fq.gz | rush 'echo {:^_1} {:^_1}_2.fq.gz'
$ echo read_1.fq.gz | rush -g -v p={:^_1} 'echo {p} {p}_2.fq.gz'
read read_2.fq.gz
13. save successful commands to continue in NEXT run
$ seq 1 3 | rush 'sleep {}; echo {}' -c -t 2
[INFO] ignore cmd #1: sleep 1; echo 1
[ERRO] run cmd #1: sleep 2; echo 2: time out
[ERRO] run cmd #2: sleep 3; echo 3: time out
14. escape special symbols
$ seq 1 | rush 'echo -e "a\tb" | awk "{print $1}"' -q
a
More examples: https://github.com/shenwei356/rush
Flags:
-v, --assign strings assign the value val to the variable var (format: var=val, val also supports replacement strings)
-c, --continue continue jobs. NOTES: 1) successful commands are saved in file (given by flag -C/--succ-cmd-file); 2) if the file does not exist, rush saves data so we can continue jobs next time; 3) if the file exists, rush ignores jobs in it and update the file
--dry-run print command but not run
-q, --escape escape special symbols like $ which you can customize by flag -Q/--escape-symbols
-Q, --escape-symbols string symbols to escape (default "$#&`")
-d, --field-delimiter string field delimiter in records, support regular expression (default "\\s+")
-h, --help help for rush
-i, --infile strings input data file, multi-values supported
-j, --jobs int run n jobs in parallel (default value depends on your device) (default 4)
-k, --keep-order keep output in order of input
--kill-on-ctrl-c kill child processes on ctrl-c (default true)
-n, --nrecords int number of records sent to a command (default 1)
-o, --out-file string out file ("-" for stdout) (default "-")
--print-retry-output print output from retry commands (default true)
--propagate-exit-status propagate child exit status up to the exit status of rush (default true)
-D, --record-delimiter string record delimiter (default is "\n") (default "\n")
-J, --records-join-sep string record separator for joining multi-records (default is "\n") (default "\n")
-r, --retries int maximum retries (default 0)
--retry-interval int retry interval (unit: second) (default 0)
-e, --stop-on-error stop all processes on first error(s)
-C, --succ-cmd-file string file for saving successful commands (default "successful_cmds.rush")
-t, --timeout int timeout of a command (unit: second, 0 for no timeout) (default 0)
-T, --trim string trim white space (" \t\r\n") in input (available values: "l" for left, "r" for right, "lr", "rl", "b" for both side)
--verbose print verbose information
-V, --version print version information and check for update
-
Simple run, quoting is not necessary
# seq 1 3 | rush 'echo {}' $ seq 1 3 | rush echo {} 3 1 2 -
Read data from file (
-i)$ rush echo {} -i data1.txt -i data2.txt -
Keep output order (
-k)$ seq 1 3 | rush 'echo {}' -k 1 2 3 -
Timeout (
-t)$ time seq 1 | rush 'sleep 2; echo {}' -t 1 [ERRO] run command #1: sleep 2; echo 1: time out real 0m1.010s user 0m0.005s sys 0m0.007s -
Retry (
-r)$ seq 1 | rush 'python unexisted_script.py' -r 1 python: can't open file 'unexisted_script.py': [Errno 2] No such file or directory [WARN] wait command: python unexisted_script.py: exit status 2 python: can't open file 'unexisted_script.py': [Errno 2] No such file or directory [ERRO] wait command: python unexisted_script.py: exit status 2 -
Dirname (
{/}) and basename ({%}) and remove custom suffix ({^suffix})$ echo dir/file_1.txt.gz | rush 'echo {/} {%} {^_1.txt.gz}' dir file_1.txt.gz dir/file -
Get basename, and remove last (
{.}) or any ({:}) extension$ echo dir.d/file.txt.gz | rush 'echo {.} {:} {%.} {%:}' dir.d/file.txt dir.d/file file.txt file -
Job ID, combine fields index and other replacement strings
$ echo 12 file.txt dir/s_1.fq.gz | rush 'echo job {#}: {2} {2.} {3%:^_1}' job 1: file.txt file s -
Capture submatch using regular expression (
{@regexp})$ echo read_1.fq.gz | rush 'echo {@(.+)_\d}' -
Custom field delimiter (
-d)$ echo a=b=c | rush 'echo {1} {2} {3}' -d = a b c -
Send multi-lines to every command (
-n)$ seq 5 | rush -n 2 -k 'echo "{}"; echo' 1 2 3 4 5 # Multiple records are joined with separator `"\n"` (`-J/--records-join-sep`) $ seq 5 | rush -n 2 -k 'echo "{}"; echo' -J ' ' 1 2 3 4 5 $ seq 5 | rush -n 2 -k -j 3 'echo {1}' 1 3 5 -
Custom record delimiter (
-D), note that empty records are not used.$ echo a b c d | rush -D " " -k 'echo {}' a b c d $ echo abcd | rush -D "" -k 'echo {}' a b c d # FASTA format $ echo -ne ">seq1\nactg\n>seq2\nAAAA\n>seq3\nCCCC" >seq1 actg >seq2 AAAA >seq3 CCCC $ echo -ne ">seq1\nactg\n>seq2\nAAAA\n>seq3\nCCCC" | rush -D ">" 'echo FASTA record {#}: name: {1} sequence: {2}' -k -d "\n" FASTA record 1: name: seq1 sequence: actg FASTA record 2: name: seq2 sequence: AAAA FASTA record 3: name: seq3 sequence: CCCC -
Assign value to variable, like
awk -v(-v)$ seq 1 | rush 'echo Hello, {fname} {lname}!' -v fname=Wei -v lname=Shen Hello, Wei Shen! $ seq 1 | rush 'echo Hello, {fname} {lname}!' -v fname=Wei,lname=Shen Hello, Wei Shen! $ for var in a b; do \ $ seq 1 3 | rush -k -v var=$var 'echo var: {var}, data: {}'; \ $ done var: a, data: 1 var: a, data: 2 var: a, data: 3 var: b, data: 1 var: b, data: 2 var: b, data: 3 -
Preset variable (
-v), avoid repeatedly writing verbose replacement strings# naive way $ echo read_1.fq.gz | rush 'echo {:^_1} {:^_1}_2.fq.gz' read read_2.fq.gz # macro + removing suffix $ echo read_1.fq.gz | rush -v p='{:^_1}' 'echo {p} {p}_2.fq.gz' # macro + regular expression $ echo read_1.fq.gz | rush -v p='{@(.+?)_\d}' 'echo {p} {p}_2.fq.gz' -
Escape special symbols
$ seq 1 | rush 'echo "I have $100"' I have 00 $ seq 1 | rush 'echo "I have $100"' -q I have $100 $ seq 1 | rush 'echo "I have $100"' -q --dry-run echo "I have \$100" $ seq 1 | rush 'echo -e "a\tb" | awk "{print $1}"' a b $ seq 1 | rush 'echo -e "a\tb" | awk "{print $1}"' -q a -
Interrupt jobs by
Ctrl-C, rush will stop unfinished commands and exit.$ seq 1 20 | rush 'sleep 1; echo {}' ^C[CRIT] received an interrupt, stopping unfinished commands... [ERRO] wait cmd #7: sleep 1; echo 7: signal: interrupt [ERRO] wait cmd #5: sleep 1; echo 5: signal: killed [ERRO] wait cmd #6: sleep 1; echo 6: signal: killed [ERRO] wait cmd #8: sleep 1; echo 8: signal: killed [ERRO] wait cmd #9: sleep 1; echo 9: signal: killed 1 3 4 2 -
Continue/resume jobs (
-c). When some jobs failed (by execution failure, timeout, or cancelling by user withCtrl + C), please switch flag-c/--continueon and run again, so thatrushcan save successful commands and ignore them in NEXT run.$ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c 1 2 [ERRO] run cmd #3: sleep 3; echo 3: time out # successful commands: $ cat successful_cmds.rush sleep 1; echo 1__CMD__ sleep 2; echo 2__CMD__ # run again $ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c [INFO] ignore cmd #1: sleep 1; echo 1 [INFO] ignore cmd #2: sleep 2; echo 2 [ERRO] run cmd #1: sleep 3; echo 3: time outCommands of multi-lines (Not supported in GNU parallel)
$ seq 1 3 | rush 'sleep {}; echo {}; \ echo finish {}' -t 3 -c -C finished.rush 1 finish 1 2 finish 2 [ERRO] run cmd #3: sleep 3; echo 3; \ echo finish 3: time out $ cat finished.rush sleep 1; echo 1; \ echo finish 1__CMD__ sleep 2; echo 2; \ echo finish 2__CMD__ # run again $ seq 1 3 | rush 'sleep {}; echo {}; \ echo finish {}' -t 3 -c -C finished.rush [INFO] ignore cmd #1: sleep 1; echo 1; \ echo finish 1 [INFO] ignore cmd #2: sleep 2; echo 2; \ echo finish 2 [ERRO] run cmd #1: sleep 3; echo 3; \ echo finish 3: time outCommands are saved to file (
-C) right after it finished, so we can view the check finished jobs:grep -c __CMD__ successful_cmds.rush -
A comprehensive example: downloading 1K+ pages given by three URL list files using
phantomjs save_page.js(some page contents are dynamicly generated by Javascript, sowgetdoes not work). Here I set max jobs number (-j) as20, each job has a max running time (-t) of60seconds and3retry changes (-r). Continue flag-cis also switched on, so we can continue unfinished jobs. Luckily, it's accomplished in one run π$ for f in $(seq 2014 2016); do \ $ /bin/rm -rf $f; mkdir -p $f; \ $ cat $f.html.txt | rush -v d=$f -d = 'phantomjs save_page.js "{}" > {d}/{3}.html' -j 20 -t 60 -r 3 -c; \ $ done -
A bioinformatics example: mapping with
bwa, and processing result withsamtools:$ tree raw.cluster.clean.mapping raw.cluster.clean.mapping βββ M1 βΒ Β βββ M1_1.fq.gz -> ../../raw.cluster.clean/M1/M1_1.fq.gz βΒ Β βββ M1_2.fq.gz -> ../../raw.cluster.clean/M1/M1_2.fq.gz ... $ ref=ref/xxx.fa $ threads=25 $ ls -d raw.cluster.clean.mapping/* \ | rush -v ref=$ref -v j=$threads \ 'bwa mem -t {j} -M -a {ref} {}/{%}_1.fq.gz {}/{%}_2.fq.gz > {}/{%}.sam; \ samtools view -bS {}/{%}.sam > {}/{%}.bam; \ samtools sort -T {}/{%}.tmp -@ {j} {}/{%}.bam -o {}/{%}.sorted.bam; \ samtools index {}/{%}.sorted.bam; \ samtools flagstat {}/{%}.sorted.bam > {}/{%}.sorted.bam.flagstat; \ /bin/rm {}/{%}.bam {}/{%}.sam;' \ -j 2 --verbose -c -C mapping.rushSince
{}/{%}appears many times, we can use preset variable (macro) to simplify it:$ ls -d raw.cluster.clean.mapping/* \ | rush -v ref=$ref -v j=$threads -v p='{}/{%}' \ 'bwa mem -t {j} -M -a {ref} {p}_1.fq.gz {p}_2.fq.gz > {p}.sam; \ samtools view -bS {p}.sam > {p}.bam; \ samtools sort -T {p}.tmp -@ {j} {p}.bam -o {p}.sorted.bam; \ samtools index {p}.sorted.bam; \ samtools flagstat {p}.sorted.bam > {p}.sorted.bam.flagstat; \ /bin/rm {p}.bam {p}.sam;' \ -j 2 --verbose -c -C mapping.rush
-
Shell
grepreturns exit code1when no matches found.rushthinks it failed to run. Please usegrep foo bar || trueinstead ofgrep foo bar.$ seq 1 | rush 'echo abc | grep 123' [ERRO] wait cmd #1: echo abc | grep 123: exit status 1 $ seq 1 | rush 'echo abc | grep 123 || true'
Specially thank @brentp
and his gargs, from which rush borrows
some ideas.
Thank @bburgin for his contribution on improvement of child process management.
Create an issue to report bugs, propose new functions or ask for help.