Release PR 1.0.0 #16

carpanz · 2022-10-13T14:32:45Z

This is a PR to make the first release.

We have successfully tested the pipeline on eight 10-exome datasets from different species (2 Human cohorts, Mouse, Macaca mulatta, Macaca fascicularis, Pan paniscus, Bos taurus and Canis familiaris) and on an additional, larger dataset of 88 Human exomes.

PR checklist

This comment contains a description of changes (with reason).
If you've fixed a bug or added code that should be tested, add tests!
If you've added a new tool - have you followed the pipeline conventions in the contribution docs?
If necessary, also make a PR on the nf-core/hgtseq branch on the nf-core/test-datasets repository.
Make sure your code lints (nf-core lint).
Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
Usage Documentation in docs/usage.md is updated.
Output Documentation in docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).

Documentation update as required in the release PR

carpanz · 2022-10-20T10:42:18Z

Thank you again for the super useful review!
As discussed offline on Slack, I have updated the documentation to address the outstanding requests in PR #18.

Important suggestions for improvement, such as:

Missing function from template
Tidyverse functions (bind_cols, bind_rows)
Expanding input file extensions (.fq)

are noted and will be implemented in a minor release to be prepared soon.
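
To make the `.fq` suggestion concrete, here is a minimal sketch of what accepting both spellings of the FASTQ extension could look like. It is not the pipeline's implementation: the suffix list, the case-insensitive matching and the function name `is_fastq` are all assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical list of accepted suffixes (an assumption, not taken from the
# pipeline schema): both spellings, with and without gzip compression.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def is_fastq(path: str) -> bool:
    """Return True if the file name ends with one of the accepted suffixes.

    Matching is done on the lower-cased file name, so 'Sample.FQ.GZ' is
    accepted too; that leniency is a design choice of this sketch.
    """
    name = Path(path).name.lower()
    return name.endswith(FASTQ_SUFFIXES)
```

With this check, `sample_R1.fq.gz` and `sample_R1.fastq.gz` are treated the same way, while a BAM file name is rejected.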

jfy133

Output and input documentation is much improved. I think the only thing missing now is a review of the parameters: for example, in your test profiles you use fasta, but the parameter documentation only shows genome...? Also, you've added help info to the trimgalore trimming parameters but these are still hidden, as are useful things such as bwaindex/bwamem2index etc.

CITATIONS.md

jfy133 · 2022-10-21T08:40:05Z

README.md


-<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
+<p align="center">
+<img src="docs/images/hgtseq_pipeline_metromap.png" alt="nf-core/circdna metromap" width="70%">


Ok some feedback on this, but this is much clearer now I think:

Neon green isn't great: it's very hard to read on a white background. I would suggest using the same grey as 'pre-aligned' after the BAM.

Are you missing an input FASTA route for the alignment step?

README.md

jfy133 · 2022-10-21T08:42:01Z

assets/analysis_report.Rmd

+  intsingle = read_tsv(integrations, col_names = c("read_name", "chr", "position"))
+  data_integrations = rbind(
+    data_integrations,
+    cbind(


For next release

conf/modules.config

docs/output.md

docs/usage.md

jfy133 · 2022-10-21T09:02:14Z

docs/usage.md

+### `--genome`
+
+The user must specify the genome of interest. A list of genomes is available in the pipeline under the folder `conf/igenomes.config`, that contains illumina iGenomes reference file paths


Or is it --fasta? or do you support both?

See above.
I prefer the usage to be simplified and handled through the iGenomes config.
In some of our tests where genomes are not in iGenomes, we created a config following the nf-core iGenomes page, and still passed only the --genome param to our pipeline.

modules/local/ranalysis/main.nf

Co-authored-by: James A. Fellows Yates <[email protected]>

lescai · 2022-10-21T12:28:51Z

Output and input documentation is much improved. I think the only thing missing now is a review of the parameters: for example, in your test profiles you use fasta, but the parameter documentation only shows genome...? Also, you've added help info to the trimgalore trimming parameters but these are still hidden, as are useful things such as bwaindex/bwamem2index etc.

Hi @jfy133 just to make sure I've replied to the correct comment, I'll probably repeat myself here.
Hiding parameters is a valid option, as you often want to expose settings at different levels of granularity. The params you mention are hidden (as, in my view, they should be) because they're set through the iGenomes config.
I like this solution nf-core has proposed, precisely because you set fasta, gtf, bwa indexes and the like just by setting the --genome parameter. In the spirit of flexibility, though, it doesn't prevent users from having more control, should they wish: those params are neither required, nor do they clutter the params page, where --genome is instead required.
I'll be happy to have a more general conversation with others on how to improve this, but also in light of all the other pipelines already released, I'd probably be happy with the solution we've implemented :)
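
The behaviour described above (a single --genome key filling in the reference paths, with explicit values still able to override them) can be sketched as follows. This is only an illustration of the lookup logic, not how Nextflow or the pipeline actually resolves parameters; the genome entry, the placeholder paths and the function name `resolve_references` are hypothetical.

```python
# Hypothetical stand-in for the entries found in conf/igenomes.config.
# The genome key and all paths below are placeholders for illustration only.
IGENOMES = {
    "GRCh38": {
        "fasta": "/refs/GRCh38/genome.fa",
        "gtf": "/refs/GRCh38/genes.gtf",
        "bwaindex": "/refs/GRCh38/bwa/",
    },
}


def resolve_references(genome, overrides=None):
    """Return reference paths for `genome`, letting explicit values win.

    `overrides` mimics hidden, optional params such as fasta or bwaindex:
    only keys with a non-None value replace the per-genome defaults.
    """
    if genome not in IGENOMES:
        raise ValueError(f"Unknown genome key: {genome!r}")
    resolved = dict(IGENOMES[genome])  # copy, so the defaults stay untouched
    for key, value in (overrides or {}).items():
        if value is not None:
            resolved[key] = value
    return resolved
```

Calling `resolve_references("GRCh38")` returns the configured defaults, while `resolve_references("GRCh38", {"fasta": "/custom/genome.fa"})` swaps only the FASTA path and keeps the rest.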

small changes in docs

added short guideline to genome parameter

Forgot to fix metromap as suggested in release PR review

lescai · 2022-10-24T06:08:15Z

Hi there,
thanks everyone and in particular @jfy133 @FriederikeHanssen for their contribution and minute review of this PR.
I'm taking the liberty of merging, because we need the release to be linked in the manuscript we are about to submit, and this now takes priority.
I believe all suggestions have been discussed in here and on Slack, and thoroughly addressed by @carpanz, particularly in the documentation, which I hope can now serve as an example for the work of others in nf-core.
Any outstanding suggestions have been listed and agreed for prioritisation in the next release.
I'm very grateful for the team work and the fruitful exchange during this process.

jfy133 · 2022-10-24T06:29:50Z

As discussed on Slack (unfortunately before the merge): I will do one last check before we do the actual release.

jfy133

Last few things:

README introduction: the pipeline background is still maybe a bit too detailed. Now that it's not Friday, what about:

nf-core/hgtseq is a bioinformatics best-practice analysis pipeline built to investigate horizontal
gene transfer from NGS data.

The pipeline uses metagenomic classification of paired-read alignments against a reference
genome to identify discrepancies in species assignment within read pairs, and so to identify
potential integration sites into the host genome.

This will also make the pipeline summary a bit clearer, I think.

README: The metromap doesn't fully work in dark mode (but not important)

USAGE: Does the pipeline work with single-end reads? My understanding is that it requires pairs, in which case the first line of Input Formats is incorrect.
USAGE/Nextflow_schema: make sure you mention in both places that the krona/kraken2 databases can be tar.gz'd (currently only in parameters)
We have broken markdown links under 'pipeline arguments' https://nf-co.re/hgtseq/dev/usage#pipeline-arguments, likely due to the backticks; I would remove this formatting.
citations.md: missing - bamtools (gh repo only), bwamem2
JSON schema: given you have a fixed list of aligners, you should define this with an enum (press the cog next to the parameter when in the schema builder). I think you also say you support aln, so that should be added?
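
As a rough illustration of the enum request, the sketch below shows what restricting a parameter to a fixed list of values buys you. The `enum` keyword is standard JSON Schema; everything else here (the schema fragment written as a Python dict, the aligner names and the helper `validate_enum`) is a hypothetical example, not the pipeline's actual schema or validation code.

```python
# Hypothetical schema fragment for an aligner parameter. The allowed values
# are assumptions chosen for illustration, not the pipeline's real options.
ALIGNER_SCHEMA = {
    "type": "string",
    "enum": ["bwa-mem", "bwa-mem2", "bwa-aln"],
}


def validate_enum(value, schema):
    """Return `value` if it is listed under the schema's 'enum' keyword.

    Raises ValueError otherwise, so a typo in the parameter is caught
    before any alignment job would be launched.
    """
    allowed = schema["enum"]
    if value not in allowed:
        raise ValueError(f"{value!r} is not one of {allowed}")
    return value
```

For example, `validate_enum("bwa-mem2", ALIGNER_SCHEMA)` passes the value through, whereas an unlisted value such as `"bowtie2"` raises a `ValueError` that names the accepted options.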

Otherwise once these are in I can give you a retroactive ✅ and we can complete the rest of the release @carpanz @lescai

lescai · 2022-10-24T07:41:14Z

Small adjustment to an otherwise wonderful summary :)

The pipeline uses metagenomic classification of paired-read alignments against a reference 
genome to identify the presence of non-host microbial sequences within read pairs, and to infer 
potential integration sites into the host genome.

jfy133 · 2022-10-24T12:38:27Z

OK, my final requests have been merged in with #23 #24 etc.! So imagine a ✅

lescai and others added 30 commits June 10, 2022 19:06

debugging

c0ff31c

debugging

2950f4e

debugging

9bc7246

adding radius to report

720c31c

updated params schema

708f90e

adding parsing genome param gff from igenomes gft

57b88bc

change input file in test.config for test

5468126

replace with original input

b9065c9

deleting gff from nextflow.config

aee2da8

reshaped outdir structure

00c50f6

moved markdown file to assets and passing it as a channel

50dc002

fixed publishdir mode by default disabled

998b2a3

re-enabled multiqc

bc340b4

fixed fastq outdir missing

4bcb61d

trying to address untrimmed fastqc

37301e6

trying to address untrimmed fastqc

cb5d461

missing .out in subworkflow outputs

3e62922

fixing multiqc requiring two inputs with updated module version

a7286a7

adding ext.args in krona module to specify taxID column

1341f93

conditional multiqc on kraken if small test

8db6f6c

updated schema

72ff70c

fixing type with new schema

49b6129

fixed json schema for gff

7e4321d

singularity image is now signed

9d853cc

filtering reads in rmarkdown before plotting

cb9de25

testing bam input

f19c4ce

testing bam input

b400e2b

testing bam input

54577b6

testing bam input

a5b38b4

adding test_bam profile

7444723

carpanz and others added 7 commits October 19, 2022 18:48

updated usage docs

8d9303b

updated json schema

085a1c2

added a short description in usage docs

c0e2412

testing changes in json schema

31f247d

testing changes in json schema

ec52287

added qualimap citation

bbd7ab5

Merge pull request #18 from carpanz/dev

2b41937

Documentation update as required in the release PR

jfy133 reviewed Oct 21, 2022


carpanz and others added 2 commits October 21, 2022 11:26

Update docs/output.md

8032db5

Co-authored-by: James A. Fellows Yates <[email protected]>

Update docs/output.md

df67f52

Co-authored-by: James A. Fellows Yates <[email protected]>

carpanz and others added 12 commits October 21, 2022 14:54

fixed missing url and brackets in docs

10f9ba3

added a short description in main readme

226078f

Merge pull request #19 from carpanz/dev

73ca03f

small changes in docs

added short guideline to genome parameter

5988620

Merge branch 'nf-core:dev' into dev

01956d8

typos correction

cb14c39

Merge pull request #20 from carpanz/dev

94ca546

added short guideline to genome parameter

replaced metromap with the right one

184d3fc

replaced metromap with the right one

f781927

replaced metromap with the right one

6751109

replaced metromap with the right one

22c0da3

Merge pull request #21 from carpanz/dev

ffbdacb

Forgot to fix metromap as suggested in release PR review

lescai merged commit fa9b7c1 into master Oct 24, 2022

jfy133 reviewed Oct 24, 2022

