Pipeline crashes with large input files; Add sequencing center in BAM #56
apeltzer
left a comment
I resolved the merge conflicts. Otherwise, let's wait for the tests to pass :-)
@apeltzer Thank you! I was doing the same thing, but you're so quick that I then saw "the page out of date"...
ewels
left a comment
Great! Couple of minor cosmetic changes, but nothing functional.
conf/base.config
Outdated
cpus = { check_max( 10 * task.attempt, 'cpus' ) }
memory = { check_max( 80.GB * task.attempt, 'memory' ) }
time = { check_max( 8.h * task.attempt, 'time' ) }
errorStrategy = { task.exitStatus in [1,143,137,104] ? 'retry' : 'terminate' }
The default for all processes is set on line 20:
errorStrategy = { task.exitStatus in [143,137] ? 'retry' : 'terminate' }
Instead of adding these lines to these specific processes, we can just update that line with the extra exit statuses and it will apply to everything.
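For illustration, a minimal sketch of what that consolidation could look like in conf/base.config. The exit-status list is an assumption (the union of the statuses in the per-process lines quoted above), and the rest of the process scope is omitted.

```groovy
// Sketch only: a process-wide default in a Nextflow config file.
process {
    // Retry on any of the listed exit statuses, otherwise terminate the run.
    // Assumption: the list is the union of the statuses used in the
    // per-process overrides above, not a value confirmed by the maintainers.
    errorStrategy = { task.exitStatus in [1, 143, 137, 104] ? 'retry' : 'terminate' }
}
```

With the default set this way, a `withName:` block would only need to repeat `errorStrategy` if a process genuinely needs different behaviour.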
conf/base.config
Outdated
memory = { check_max( 80.GB * task.attempt, 'memory' ) }
time = { check_max( 8.h * task.attempt, 'time' ) }
errorStrategy = { task.exitStatus in [1,143,137,104] ? 'retry' : 'terminate' }
maxRetries = 2
I'm surprised that the default for all processes is only 1. I think it would be good to set the default to 2; then we won't need this line here either.
conf/base.config
Outdated
withName:markDuplicates {
  cpus = { check_max( 2 * task.attempt, 'cpus' ) }
  memory = { check_max( 16.GB * task.attempt, 'memory' ) }
  errorStrategy = { task.exitStatus in [1,143,137] ? 'retry' : 'terminate' }
As above, this isn't needed if we update the line that applies to all processes. We should keep the line below though, as 3 is different to the default.
main.nf
Outdated
script:
prefix = reads[0].toString() - ~/(_R1)?(_trimmed)?(_val_1)?(\.fq)?(\.fastq)?(\.gz)?$/
def avail_mem = task.memory == null ? '' : "--limitBAMsortRAM ${task.memory.toBytes() - 100000000}"
RG = params.seqCenter ? "--outSAMattrRGline ID:$prefix 'CN:$params.seqCenter'" : ''
Small thing, but I'd prefer a more human-readable variable name here. Also, I'm not sure whether Groovy does something special with variable names that are all caps: the GitHub syntax highlighting puts it in a different colour to other variable names. I'd prefer, for example:
def seqcenter = params.seqCenter ? "--outSAMattrRGline ID:$prefix 'CN:$params.seqCenter'" : ''
main.nf
Outdated
--met-stderr \\
--new-summary \\
--summary-file ${prefix}.hisat2_summary.txt \\
$RG \\
When we have options that are likely to be blank, I prefer for them not to be on their own line. Although it shouldn't affect things, it looks strange to see newlines in command scripts. Should be fine to just append the variable to the above line.
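As a rough illustration of the difference, the plain Groovy sketch below renders the same command fragment both ways with a blank option. The sample name and the variable names `ownLine` and `appended` are hypothetical, and only options from the diff above are used.

```groovy
def prefix = 'sampleA'   // hypothetical sample name
def seqcenter = ''       // blank, as when params.seqCenter is not set

// Optional argument on its own line: a blank value leaves a line holding only "\".
def ownLine = """\
--new-summary \\
--summary-file ${prefix}.hisat2_summary.txt \\
${seqcenter} \\
""".toString()

// Optional argument appended to the previous line: no stray line when it is blank.
def appended = """\
--new-summary \\
--summary-file ${prefix}.hisat2_summary.txt ${seqcenter} \\
""".toString()

assert ownLine.readLines().size() == 3
assert appended.readLines().size() == 2
assert ownLine.readLines()[2].trim() == '\\'
```

Both forms should behave the same in the shell; the appended form only keeps the rendered command script tidier.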
}
"""
picard MarkDuplicates \\
picard -Xmx${avail_mem}g MarkDuplicates \\
Good spot 👍
Changed target branch for PR to
ewels
left a comment
Looks good!
main.nf
Outdated
script:
prefix = reads[0].toString() - ~/(_R1)?(_trimmed)?(_val_1)?(\.fq)?(\.fastq)?(\.gz)?$/
def avail_mem = task.memory == null ? '' : "--limitBAMsortRAM ${task.memory.toBytes() - 100000000}"
def seqCenter = params.seqCenter ? "--outSAMattrRGline ID:$prefix 'CN:$params.seqCenter'" : ''
This is now triggering a syntax error in the Travis CI tests:
ERROR ~ Variable `prefix` already defined in the process scope @ line 580, column 68.
ter ? "--outSAMattrRGline ID:$prefix 'CN
^
main.nf
Outdated
script:
index_base = hs2_indices[0].toString() - ~/.\d.ht2/
prefix = reads[0].toString() - ~/(_R1)?(_trimmed)?(_val_1)?(\.fq)?(\.fastq)?(\.gz)?$/
def seqCenter = params.seqCenter ? "--rg-id ${prefix} --rg CN:${params.seqCenter.replaceAll("\\s","_")}" : ''
This is now triggering a syntax error in the Travis CI tests:
_nf_script_6da76d62: 631: Variable `prefix` already defined in the process scope @ line 631, column 54.
params.seqCenter ? "--rg-id ${prefix} -
^
Also, I don't think that you need the .replaceAll("\\s","_") according to @chuan-wang (whitespace is ok). So better to remove this so that it's the same as the other entries in this pipeline.
Whitespace is not allowed in HISAT2 and must be double-quoted in STAR, so I replaced spaces with underscores here and used '' in STAR.
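A minimal plain Groovy sketch of the two conventions described here, using the argument strings from the diffs above. The helper names `starReadGroup` and `hisat2ReadGroup` and the sample values are hypothetical and not part of the pipeline.

```groovy
// Hypothetical helpers: build the read-group arguments for each aligner.

def starReadGroup(String prefix, String seqCenter) {
    // STAR: keep the whitespace but quote the CN field, as in the STAR diff above.
    seqCenter ? "--outSAMattrRGline ID:${prefix} 'CN:${seqCenter}'" : ''
}

def hisat2ReadGroup(String prefix, String seqCenter) {
    // HISAT2: replace every whitespace character with an underscore.
    seqCenter ? "--rg-id ${prefix} --rg CN:${seqCenter.replaceAll('\\s', '_')}" : ''
}

// Usage with made-up values; an unset sequencing center yields an empty string.
assert starReadGroup('sampleA', 'My Center') == "--outSAMattrRGline ID:sampleA 'CN:My Center'"
assert hisat2ReadGroup('sampleA', 'My Center') == '--rg-id sampleA --rg CN:My_Center'
assert hisat2ReadGroup('sampleA', null) == ''
```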
Aha, there are tool specific differences on this? Sad times.. Ok that’s great though, thanks! I’ll just wait for the tests then merge.
Just waiting for you to push the updates now @jun-wan :)
ewels
left a comment
Yup, that's a separate issue - we can fix that in another PR. This looks great!
update 'dynamic computing resources' in base.config
add params.seqCenter in main.nf to add @RG line in BAM (STAR, HISAT2)