Replace check_max() by set_resource() #408
nf_core/pipeline-template/{{cookiecutter.name_noslash}}/nextflow.config
|
Just a minor comment @maxulysse regarding the description of the function. So this won't actually fix the problem I was alluding to, i.e. the function itself isn't findable when used in a |
|
Ohhh I see, sorry, I didn't catch that. |
ewels
left a comment
I remember that we discussed this ages ago. The code is nicer, but it does force everyone to use the nextflow object types - strings such as "8GB" won't work. But I'm not sure that it really matters much, and I guess it's no bad thing to force pipeline authors to be strict with typing..
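
To make the typing concern above concrete, here is a minimal, illustrative `nextflow.config` fragment (the values are made up for the example and are not taken from this PR). It only shows which kind of object each literal produces, which is what a type-based check would have to inspect:

```groovy
// Illustrative nextflow.config fragment; values are made up for the example.
process {
    // Typed values: Nextflow's Groovy extensions build the objects directly.
    memory = 8.GB   // nextflow.util.MemoryUnit
    time   = 5.h    // nextflow.util.Duration
    cpus   = 4      // java.lang.Integer

    // Values written as '8GB', '5h' or '4' are plain java.lang.String objects
    // when a config function receives them, so a check that looks at the
    // object's class would not see a MemoryUnit, a Duration or an Integer.
}
```

Whether a given string still ends up working depends on where the conversion happens, so this is a sketch of the concern rather than a statement of how the proposed function behaves.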
|
I agree that it's the only issue: |
|
Is that any better than https://github.com/nf-core/rnaseq/blob/37f260d360e59df7166cfd60e2b3c9a3999adf75/conf/base.config#L30 ..? 😉 |
|
Not for the memory, but definitely better for time and cpus: https://github.com/nf-core/rnaseq/blob/37f260d360e59df7166cfd60e2b3c9a3999adf75/conf/base.config#L35 |
|
But "5h" for time also won't work..? And "4" cpus also.. |
|
According to tests we did with @alneberg we only had issues with memory, nothing with time or cpus. |
|
|
|
If anyone fancies demonstrating that this new function works with command line options like I'm still worried about edge cases though, like people specifying memory in bytes as an integer and other weird stuff. I generally prefer the explicitness of the existing code we have personally. |
|
This is the current Sarek: So I'll try Phil's suggestion on Sarek to see if it works. |
|
It appears to be working as planned: |
|
ok nice 👍 Then if there is general agreement that this code is better than what was there before, then I'm happy to be outvoted 😄 Would be good to update the parameter names though 👍 (will need updates in schema and possibly also linting?) |
Codecov Report
@@           Coverage Diff           @@
##              dev     #408   +/-   ##
=======================================
  Coverage   68.14%   68.14%
=======================================
  Files          11       11
  Lines        1987     1987
=======================================
  Hits         1354     1354
  Misses        633      633
|
|
@drpatelh so you want to change the params name in a separate PR? |
|
I am not entirely sure about |
|
I agree for |
|
But then, it probably makes sense to do all that in just this one PR... |
|
not sure about |
|
Ok, so this issue has been rumbling on for years now. I'm going to try to summarise the decisions so that we can wrap it up.. Please correct me on any of this @maxulysse
As I see it, we have discussed three distinct things:
The two renaming points are fairly minor and I don't think that there is any controversy there really. The first point is what this PR is really about and the reason it hasn't been merged yet.
As I understand it, the core change in the refactoring is that the new function code guesses the resource type instead of requiring it to be passed as a parameter. This means that we can change the way that the function is called:
- time = { check_max( 5.h * task.attempt, 'time' ) }
+ time = { check_max( 5.h * task.attempt ) }
My main concern with this at the beginning was that the process of guessing might be difficult, leading to loss of flexibility when providing command line parameters. However @maxulysse has done a good job of testing a bunch of these and showing that they still work.
I have an outstanding concern that there will be edge cases where people could provide memory or time without units (bytes, seconds) and that will be interpreted as cpus. But it's an edge case at best and probably unlikely to happen much.
My only remaining problem with this change is that it requires updating all configs in all pipelines for what I view as a very minor improvement in code succinctness. Basically, we gain a few keyboard characters when writing configs, but we force all pipeline developers to manually edit all of their configs as well as incurring extra sync overhead. For me, the benefits do not outweigh the costs.
However, the change is not big, and most of my initial fears were unfounded. @maxulysse seems super keen on this change, so I'm happy to be overruled and accept this change (hence my positive review above). I am aware that I can be annoyingly stubborn on minor things like this, and I'm quite happy to let it go - @maxulysse has earned this through matching my stubbornness 😅
Hopefully this is a fair summary of where we are. I will leave it to everyone else to decide how to continue.
Phil |
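
For readers following the summary above, this is a rough sketch of what "guessing the resource type" could look like in a `nextflow.config`. It is a hypothetical illustration, not the code from this PR or from nf-core/sarek: the function body is an assumption, and it presumes that `params.max_memory`, `params.max_time` and `params.max_cpus` are defined elsewhere in the config.

```groovy
// Hypothetical sketch for nextflow.config; not the code from this PR.
// Assumes params.max_memory, params.max_time and params.max_cpus are defined.
def check_resource(obj) {
    if (obj instanceof nextflow.util.MemoryUnit) {
        // Cap a memory request at the configured maximum.
        def max = params.max_memory as nextflow.util.MemoryUnit
        return obj.compareTo(max) > 0 ? max : obj
    }
    if (obj instanceof nextflow.util.Duration) {
        // Cap a time request at the configured maximum.
        def max = params.max_time as nextflow.util.Duration
        return obj.compareTo(max) > 0 ? max : obj
    }
    if (obj instanceof Integer) {
        // Any bare integer lands here, so memory given in bytes or time given
        // without a unit would be capped against max_cpus instead.
        return Math.min(obj, params.max_cpus as int)
    }
    // Unrecognised types (for example a String) are returned unchanged.
    return obj
}
```

In this sketch the `Integer` branch is where the edge case mentioned above would show up: the function has no way to tell a cpu count from a unitless memory or time value, which is the trade-off for dropping the explicit second argument.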
|
I think it's a good summary. I think it's a good opportunity to change the function name, as it was a bad description of what the function was actually doing anyway. I don't really mind about the params, and I do think it's a big change, which could break things. So we can keep that part for another PR. Maybe keep both functions |
Entirely agree on this - renaming these is something different, and I think it's also helpful if we then have fewer users misinterpreting the parameter(s) involved across all pipelines.
Same here. I just feel a bit bad about a function that "hides" what it is changing, instead of making it explicitly clear what is changed/modified by the code here. It might be totally unjustified, but saving 9 letters once per pipeline doesn't justify changing the method across all pipelines in my opinion. This also has the potential for side-effects that we currently don't see/estimate/expect, but that might happen. A good example of something like this was the config here: nf-core/configs#162, which worked fine with the exception of the case that Phil discovered. It could be similar here, and the current code a.) works fine and b.) is very well tested in reality ;-) |
|
OK, so for the 3 points:
As the first one involves removing one argument to the function, I think it's a good opportunity to rename it to an actually more meaningful name.
|
|
I hadn't thought about the fact that renaming the Ok then my vote would be to just change the I'm not super keen on having two functions side by side, I think we should decide which style we want and be done with it. |
|
I'm not sure about changing the params name. For example, if I set --max_cpus 2, then a maximum of 2 cpus is to be used. What do you think? |
Forget I said anything, this is what the function is actually doing. If I do --max_cpus 2, I'm sure that the maximum number of cpus used is 2. |
|
I'm giving up on that PR ;-) |
check_max() with an improved check_resource(), which is already used in nf-core/sarek
PR checklist
docs is updated
CHANGELOG.md is updated
README.md is updated
Learn more about contributing: https://github.com/nf-core/tools/tree/master/.github/CONTRIBUTING.md