@carpanz
Collaborator

@carpanz carpanz commented Oct 13, 2022

This is a PR to make the first release.

We have successfully tested the pipeline on eight 10-exome datasets from different species (2 human cohorts, mouse, Macaca mulatta, Macaca fascicularis, Pan paniscus, Bos taurus and Canis familiaris), plus an additional larger dataset of 88 human exomes.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/hgtseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@carpanz
Collaborator Author

carpanz commented Oct 20, 2022

Thank you again for the super useful review!
As discussed offline on Slack, I have updated the documentation to address the outstanding requests in PR #18.

Important suggestions for improvement, such as:

  • Missing function from template
  • Tidyverse functions (bind_cols, bind_rows)
  • Expanding input file extensions (.fq)

are noted and will be implemented in a minor release to be prepared soon.

Member

@jfy133 jfy133 left a comment

Output and input documentation is much improved. I think the only thing missing now is a review of the parameters: for example, in your test profiles you use fasta, but the parameter documentation only shows genome...? Also, you've added help info to the trimgalore trimming parameters, but these are still hidden, as are useful things such as bwaindex/bwamem2index etc.


<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
<p align="center">
<img src="docs/images/hgtseq_pipeline_metromap.png" alt="nf-core/circdna metromap" width="70%">
Member

Ok some feedback on this, but this is much clearer now I think:

  • Neon green isn't great: it's very hard to read on a white background. I would suggest using the same grey as 'pre-aligned' after the bam.
  • Are you missing an input FASTA route for the alignment step?

intsingle = read_tsv(integrations, col_names = c("read_name", "chr", "position"))
data_integrations = rbind(
data_integrations,
cbind(
Member

For next release

docs/usage.md Outdated
Comment on lines 104 to 106
### `--genome`

The user must specify the genome of interest. A list of genomes is available in the pipeline under the folder `conf/igenomes.config`, that contains illumina iGenomes reference file paths
Member

Or is it --fasta? or do you support both?

Collaborator

See above.
I prefer the usage to be simplified and handled through the iGenomes config.
In some of our tests where the genomes are not in iGenomes, we created a config following the nf-core iGenomes page and still passed only the --genome param to our pipeline.

carpanz and others added 2 commits October 21, 2022 11:26
Co-authored-by: James A. Fellows Yates <[email protected]>
Co-authored-by: James A. Fellows Yates <[email protected]>
@lescai
Collaborator

lescai commented Oct 21, 2022

> Output and input documentation is much improved. I think the only thing missing now is a review of the parameters: for example, in your test profiles you use fasta, but the parameter documentation only shows genome...? Also, you've added help info to the trimgalore trimming parameters, but these are still hidden, as are useful things such as bwaindex/bwamem2index etc.

Hi @jfy133, just to make sure I've replied to the correct comment, I'll probably repeat myself here.
Hiding parameters is a valid option, as you often set the granularity at various levels. The params you mention are hidden, as in my view they should be, because they're set through the iGenomes config.
I like this solution nf-core has proposed, precisely because you set fasta, gtf, bwa indexes and the like just by setting the --genome parameter. In the spirit of flexibility, though, it doesn't prevent users from having more control should they wish: those params are neither required nor cluttering the params page, where --genome is required instead.
I'll be happy to have a more general conversation with others on how to improve this, but also in light of all the other pipelines already released, I'd probably be happy with the solution we've implemented :)

@lescai
Collaborator

lescai commented Oct 24, 2022

Hi there,
thanks everyone and in particular @jfy133 @FriederikeHanssen for their contribution and minute review of this PR.
I'm taking the liberty to merge, because we do need the release to be linked in the manuscript we are about to submit, and this now takes priority.
I believe all suggestions have been discussed in here and on Slack, and thoroughly addressed by @carpanz, particularly in the documentation, which I hope can now serve as an example for the work of others in nf-core.
Any outstanding suggestions have been listed and agreed for prioritisation in the next release.
I'm very grateful for the team work and the fruitful exchange during this process.

@lescai lescai merged commit fa9b7c1 into master Oct 24, 2022
@jfy133
Member

jfy133 commented Oct 24, 2022

Spoken in slack (unfortunately before the merge): I will do one last check before we do the actual release

Member

@jfy133 jfy133 left a comment

Last few things:

  • README introduction: the pipeline background is still maybe a bit too detailed; now that it's not Friday, what about:

    nf-core/hgtseq is a bioinformatics best-practice analysis pipeline built to investigate horizontal 
    gene transfer from NGS data. 
    
    The pipeline uses metagenomic classification of paired-read alignments against a reference 
    genome to identify discrepancies in species assignment within read pairs, and thereby identify 
    potential integration sites into the host genome.  
    

    This will also make the pipeline summary a bit clearer, I think.

  • README: The metromap doesn't fully work in darkmode (but not important)

  • USAGE: Does the pipeline work with single-end reads? My understanding is that it requires pairs, in which case the first line of Input Formats is incorrect.
  • USAGE/Nextflow_schema: make sure you mention in both places that the krona/kraken2 databases can be tar.gz'd (currently only in parameters)
  • We have broken markdown links under 'pipeline arguments' https://nf-co.re/hgtseq/dev/usage#pipeline-arguments, likely due to the backticks; I would remove this formatting.
  • citations.md: missing - bamtools (gh repo only), bwamem2
  • JSON schema: given you have a fixed list of aligners, you should define this as an enum (press the cog next to the parameter when in the schema builder). I think you also say you support aln, so that should be added?

Otherwise once these are in I can give you a retroactive ✅ and we can complete the rest of the release @carpanz @lescai
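
For readers unfamiliar with the enum request in the list above, here is a minimal sketch of what restricting a parameter to a fixed set of values buys you. Everything in it is an assumption made for illustration: the parameter name `aligner`, the allowed values, and the helper `validate_choice` are hypothetical and not taken from the pipeline's nextflow_schema.json, and the check is plain Python rather than the validation nf-core actually runs.

```python
import json

# Hypothetical schema fragment: the parameter name and the allowed values
# are assumptions for illustration, not copied from the pipeline's schema.
SCHEMA_FRAGMENT = json.loads("""
{
  "aligner": {
    "type": "string",
    "enum": ["bwa-mem", "bwa-mem2", "aln"]
  }
}
""")


def validate_choice(schema, name, value):
    """Return None if `value` is allowed for `name`, else an error message."""
    allowed = schema[name].get("enum")
    if allowed is not None and value not in allowed:
        return f"--{name} must be one of {allowed}, got {value!r}"
    return None


if __name__ == "__main__":
    # An allowed value passes silently; an unknown one is reported up front.
    print(validate_choice(SCHEMA_FRAGMENT, "aligner", "aln"))
    print(validate_choice(SCHEMA_FRAGMENT, "aligner", "bowtie2"))
```

The point of the enum is that a mistyped or unsupported aligner is rejected at parameter validation, before any process starts, instead of failing somewhere mid-run.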

@lescai
Collaborator

lescai commented Oct 24, 2022

small adjustment to otherwise wonderful summary :)

The pipeline uses metagenomic classification of paired-read alignments against a reference 
genome to identify the presence of non-host microbial sequences within read pairs, and to infer 
potential integration sites into the host genome.
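
To make the idea in this summary concrete, here is a toy sketch of flagging read pairs in which one mate is assigned to the host and the other to a non-host taxon, and reporting the host mate's coordinates as a candidate site. It is an illustration only and not the pipeline's implementation: the `Mate` record, the `HOST_TAXID` constant, the `candidate_sites` helper and the example reads are all assumptions introduced here, and a real classifier assigns taxa at varying ranks rather than a single ID that can be compared for equality.

```python
from dataclasses import dataclass
from typing import Optional

HOST_TAXID = 9606  # assumption: a human host (NCBI taxon ID for Homo sapiens)


@dataclass
class Mate:
    read_name: str
    taxid: int                      # taxon assigned to this mate (hypothetical input)
    chrom: Optional[str] = None     # host alignment chromosome, if aligned
    position: Optional[int] = None  # host alignment position, if aligned


def candidate_sites(pairs):
    """Yield (read_name, chrom, position, non_host_taxid) for discordant pairs.

    A pair is discordant when exactly one mate is assigned to the host and
    that host mate has an alignment position on the reference genome.
    """
    for first, second in pairs:
        for host, other in ((first, second), (second, first)):
            if (
                host.taxid == HOST_TAXID
                and other.taxid != HOST_TAXID
                and host.chrom is not None
            ):
                yield (host.read_name, host.chrom, host.position, other.taxid)


if __name__ == "__main__":
    example_pairs = [
        # host mate on chr1, other mate assigned to taxon 562 (Escherichia coli)
        (Mate("r1", 9606, "chr1", 1000), Mate("r1", 562)),
        # both mates assigned to the host: not reported
        (Mate("r2", 9606, "chr2", 500), Mate("r2", 9606, "chr2", 700)),
    ]
    for site in candidate_sites(example_pairs):
        print(site)
```

Pairs where both mates are host, or both are non-host, yield nothing; only the mixed pairs carry positional evidence for a possible integration.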

@jfy133
Member

jfy133 commented Oct 24, 2022

OK, my final requests have been merged in with #23 #24 etc.! So imagine a ✅
