erikrikarddaniel
left a comment
I'm a little puzzled and unsure what the best approach is. Is your idea to first check that the files are present using your curl-based module, and then let them go to staging? If you did it the other way around, first stage and then check, you'd lose the point of this, I suppose?
You're calling the new module once for every file. Some sort of loop over a collected channel would be much more efficient. You could perhaps have a module that loops over all the URLs and just returns a channel of the correct ones? (One might end up with long commands, but that could perhaps be dealt with by splitting. I can't see a way of splitting a channel right now, though; only ways based on reading files.)
Even better would be if one could do this directly with nextflow/groovy and avoid the module altogether.
Let's discuss tomorrow.
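The "loop over a collected channel" idea can be sketched in shell as a single pass over a list of locations. This is only an illustration under assumptions, not the pipeline's code: it assumes the collected URLs have been written to a text file with one location per line, and `filter_reachable` and its check-command argument are hypothetical names.

```shell
# Print every location listed in file "$1" (one per line) for which the
# check command "$2" exits with status 0; report the others on stderr.
# Both the function name and the calling convention are assumptions.
filter_reachable() {
    list_file="$1"
    check_cmd="$2"
    # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
    while IFS= read -r location || [ -n "$location" ]; do
        # Skip empty lines.
        [ -z "$location" ] && continue
        # Detach stdin so the check cannot consume the rest of the list.
        if "$check_cmd" "$location" < /dev/null; then
            printf '%s\n' "$location"
        else
            printf 'Broken link: %s\n' "$location" >&2
        fi
    done < "$list_file"
}
```

Called as `filter_reachable urls.txt some_check`, where `some_check` is any command or function that exits 0 for a usable location, this runs one process for the whole list instead of one per file, and its standard output holds only the locations that passed.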
# Use curl to check if the URL returns 404
if curl -Is "${genome_fna}" | grep -q "404 Not Found"; then
    echo "Broken link: ${genome_fna}"
    exit 0 # Exit successfully but don't emit anything
This only works on remote files, right?
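One way to cover the local-file case raised here is to test local paths on disk and send a request only for remote locations, comparing the numeric status code rather than grepping for "404 Not Found". Grepping for that phrase can miss HTTP/2 responses, whose status line carries no reason phrase, and it ignores other failures such as 403 or 500. The sketch below is an illustration under assumptions, not the module's code: `status_is_ok` and `link_is_reachable` are hypothetical names, only `http://` and `https://` locations are treated as remote, and everything else is assumed to be a local path.

```shell
# Classify an HTTP status code: 2xx counts as reachable, anything else as broken.
status_is_ok() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *)           return 1 ;;
    esac
}

# Return 0 if the location can be used, 1 otherwise.
# http(s) URLs get a HEAD request; any other value is treated as a local path.
link_is_reachable() {
    location="$1"
    case "$location" in
        http://*|https://*)
            # -I sends a HEAD request, -L follows redirects, and -w prints the
            # final status code. If curl itself fails, fall back to 000.
            code=$(curl -s -I -L -o /dev/null --max-time 30 \
                        -w '%{http_code}' "$location") || code=000
            status_is_ok "$code"
            ;;
        *)
            # Local file: it only needs to exist.
            [ -e "$location" ]
            ;;
    esac
}
```

In the process script this could replace the grep-based test, for example `link_is_reachable "${genome_fna}" || echo "Broken link: ${genome_fna}"`. Servers that reject HEAD requests would be reported as broken by this sketch, so it may need adjusting for such sources.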
Co-authored-by: Daniel Lundin <[email protected]>
Closing, as we are addressing the issue with another approach.
PR checklist
- (nf-core lint).
- (nextflow run . -profile test,docker).
- docs/usage.md is updated.
- docs/output.md is updated.
- CHANGELOG.md is updated.
- README.md is updated (including new tool citations and authors/contributors).

I tried to address bug #90, which stops the pipeline when it encounters a broken link from an external source (such as NCBI).
There is now a module, "check_broken_links.nf", that checks whether each link is broken. If a link works, it is used to stage the genome; otherwise it is discarded.