v.1.0.3 - Minor release containing template and documentation updates plus PrepareAA Docker build needed for next release #51
Conversation
…th - no relative path
…recisely. - unsure if it will help a lot
Python linting (
Important! Template update for nf-core/tools v2.8
I think this is generally fine.
Can you go through the scripts and check that they have authors and licences please.
See: https://nf-co.re/docs/contributing/pipeline_release_review_guidelines
Is there a license attached to this file?
Does it just come straight from the AA repo or have you modified it at all?
sys.exit(1)

rdList = hg.interval_list([r for r in rdList0 if float(r.info[-1]) > GAIN])
tempL = []
Do you need to change the header to note that you have modified the script too?
Looks like this change actually makes circdna's modified version of amplified_intervals.py closer to the current version of the amplified_intervals.py script from the AmpliconArchitect maintainer (me). Is it possible to use the unmodified version of amplified_intervals.py script directly from the jluebeck fork of AmpliconArchitect? This way it can be kept updated more easily.
Hmm. It sounds like the easiest thing would be if AmpliconArchitect was put into bioconda. Then these additional helper scripts would just be in the containers and not separate scripts in the pipeline.
Got it - we are planning to work on this, but I will prioritize it.
Great! On the nf-core slack there is a bioconda channel if you need help/reviews with that.
Hi Simon and Jens, yes, I will update the pipeline to use the unmodified version of the script from your fork and add citations and version information.
Can you please add an author and licence to this script?
And this one.
tag "$meta.id"
label 'process_low'

conda "conda-forge::python=2.7 bioconda::pysam=0.15.2 anaconda::flask=1.1.2 anaconda::cython=0.29.14 anaconda::numpy=1.16.6 anaconda::scipy=1.2.1 conda-forge::matplotlib=2.2.5 mosek::mosek=8.0.60 anaconda::future=0.18.2 anaconda::intervaltree=3.0.2"
Changing from Python 2 to Python 3 is a major change, isn't it (I'm not a Python developer)? Does the script still work?
Hi Simon, I'm the developer of PrepareAA - a while back we added support for both python2 and python3 to this collection of tools, so it can be run with either version of python.
Hi Simon and Jens,
I will update the prepareaa.nf containers from python2 to python3; it works with both. PrepareAA is not yet included in the pipeline, but will be in the next version update. It requires a new docker container, which will hopefully be built automatically after this version release. prepareaa will then be included and replace the helper scripts collect_seeds.nf and amplified_intervals.nf.
Hi Simon and Daniel,
I am working on making a Conda package which will wrap AmpliconSuite-pipeline (PrepareAA.py) and its modules - AmpliconArchitect and AmpliconClassifier. Hopefully will be able to release it by next week or so. I just joined the nf-core slack and circdna channel - happy to discuss there. May ask some questions about conda packages there as well. Thanks!
tag "$meta.id"
label 'process_low'

conda "conda-forge::python=2.7 bioconda::pysam=0.15.2 anaconda::flask=1.1.2 anaconda::cython=0.29.14 anaconda::numpy=1.16.6 anaconda::scipy=1.2.1 conda-forge::matplotlib=2.2.5 mosek::mosek=8.0.60 anaconda::future=0.18.2"
This script hasn't been updated for the newer versions, but the previous one has?
Hi Simon,
Good spot. I will leave it as is for now and update it in the next version, where it will be removed and replaced by PrepareAA. The versions need to be handled cautiously, as these packages have only a narrow range of versions that can all be installed together. Because it works as is, I prefer to stay at the current versions.
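The two `conda` directives quoted in this thread differ in exactly one pinned package. A minimal sketch, in plain Python, of how the two spec strings could be compared; `parse_conda_specs` is a hypothetical helper written for illustration and is not part of the pipeline or of conda itself:

```python
def parse_conda_specs(spec_string):
    """Map package name -> (channel, version) for a space-separated
    list of 'channel::package=version' specs (channel may be absent)."""
    parsed = {}
    for spec in spec_string.split():
        channel, sep, rest = spec.partition("::")
        if not sep:  # no explicit channel given
            channel, rest = None, spec
        name, _, version = rest.partition("=")
        parsed[name] = (channel, version)
    return parsed


# The two conda directives quoted in the review comments above.
first_env = parse_conda_specs(
    "conda-forge::python=2.7 bioconda::pysam=0.15.2 anaconda::flask=1.1.2 "
    "anaconda::cython=0.29.14 anaconda::numpy=1.16.6 anaconda::scipy=1.2.1 "
    "conda-forge::matplotlib=2.2.5 mosek::mosek=8.0.60 anaconda::future=0.18.2 "
    "anaconda::intervaltree=3.0.2"
)
second_env = parse_conda_specs(
    "conda-forge::python=2.7 bioconda::pysam=0.15.2 anaconda::flask=1.1.2 "
    "anaconda::cython=0.29.14 anaconda::numpy=1.16.6 anaconda::scipy=1.2.1 "
    "conda-forge::matplotlib=2.2.5 mosek::mosek=8.0.60 anaconda::future=0.18.2"
)

# Packages pinned in the first directive but absent from the second.
only_in_first = {k: v for k, v in first_env.items() if k not in second_env}

# Packages present in both directives but pinned differently.
version_mismatches = {
    k: (first_env[k], second_env[k])
    for k in first_env.keys() & second_env.keys()
    if first_env[k] != second_env[k]
}

print(only_in_first)       # {'intervaltree': ('anaconda', '3.0.2')}
print(version_mismatches)  # {}
```

The shared packages carry identical pins, so the only difference between the two environments is the additional `intervaltree` pin.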
…al python scripts
Hi @jluebeck and @SPPearce, PrepareAA is still not included and needs to be included as soon as the docker container has been generated. @pditommaso noted that nextflow has a wave option to generate containers on-the-fly. I will try to include it in this release, so I will wait to merge until it is clear how I can implement wave in the pipeline.
Generally, "ecDNA" seems to be the preferred term as it captures both the large megabase-scale oncogene-carrying extrachromosomal circular DNAs, and also includes the small circular extrachromosomal DNAs (which the field studying them call "eccDNAs"). The terminology is a bit confusing and it is not official in any regards, but my suggestion is that using the term "ecDNAs" instead of "eccDNAs" may be slightly less confusing for those who are coming to use the methods in the context of cancer samples (probably the majority of users?).
In my understanding, eccDNA was supposed to be the term for all circular DNAs, including both the large amplified ones and the small ones. ecDNA is mostly used when talking about the large amplified ones usually found in cancer. It seems that this varies from lab to lab, and I am not sure which abbreviation should be used from now on. However, I agree that most users will most likely be interested in identifying amplified ecDNAs, and therefore I will change eccDNAs to ecDNAs in the README.md.
I see. Perhaps, if the circdna workflow supports analysis of both the small and the megabase-scale ecDNA, it may be worth putting that information into the README and making it clear which modules are designed for which task. AA and its related tools are not at all designed for detection of small (< 5 kbp) eccDNAs, only the large, amplified structures.
That seems like a good idea. Will include.
Looks good from the AA side. To confirm - is the workflow now
- start with a bam file
- run PrepareAA.py to get AA_CNV_SEEDS.bed
- run AmpliconArchitect on bam file + AA_CNV_SEEDS.bed
- run the AmpliconClassification stages?
If so that is great - as calling amplified_intervals.py outside of the PrepareAA.py wrapper is not really supported and could cause issues.
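The stage order listed above can be written down as a small dependency check. This is only a sketch of the ordering: the stage labels and every file label other than `AA_CNV_SEEDS.bed` are hypothetical placeholders, and no real command-line calls are made.

```python
from collections import namedtuple

# A stage consumes some named files and produces others.
Stage = namedtuple("Stage", ["name", "inputs", "outputs"])

# Placeholder labels; only AA_CNV_SEEDS.bed is named in the thread.
STAGES = [
    Stage("PrepareAA.py", ["sample.bam"], ["AA_CNV_SEEDS.bed"]),
    Stage("AmpliconArchitect", ["sample.bam", "AA_CNV_SEEDS.bed"], ["aa_results"]),
    Stage("classification", ["aa_results"], ["classification_results"]),
]


def check_order(stages, initial_files):
    """Walk the stages in order and fail if a stage needs a file
    that no earlier stage (or the initial input) has provided."""
    available = set(initial_files)
    for stage in stages:
        missing = [f for f in stage.inputs if f not in available]
        if missing:
            raise ValueError(
                "%s is missing inputs: %s" % (stage.name, ", ".join(missing))
            )
        available.update(stage.outputs)
    return available


print(sorted(check_order(STAGES, ["sample.bam"])))
```

Running AmpliconArchitect before PrepareAA.py in this model raises a `ValueError`, because the seeds file has not been produced yet.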
The script that should be run to give users the information for similarity scores is called feature_similarity.py. amplicon_similarity.py is a more general function that is used by feature_similarity. Simply swapping the two script names should work in this case.
EDIT: You will also need to give feature_similarity.py the -f argument, the file for which is generated by amplicon_classification.py and ends with [sample]_features_to_graph.txt
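A minimal sketch of locating the classifier output and handing it to `feature_similarity.py` through `-f`, as described above. The helper names and the assumption that the file sits directly in one output directory are illustrative only; any other arguments the script needs, and how it is invoked, are not covered in this thread and are left to the caller.

```python
import glob
import os


def find_features_to_graph(classifier_outdir):
    """Return the single *_features_to_graph.txt file in a directory.

    Assumes (hypothetically) that the classifier output for one run
    lives directly in classifier_outdir.
    """
    pattern = os.path.join(classifier_outdir, "*_features_to_graph.txt")
    matches = sorted(glob.glob(pattern))
    if len(matches) != 1:
        raise FileNotFoundError(
            "expected exactly one *_features_to_graph.txt in %s, found %d"
            % (classifier_outdir, len(matches))
        )
    return matches[0]


def build_feature_similarity_cmd(features_file, extra_args=()):
    """Build an argument list; only the -f flag comes from the thread."""
    return ["feature_similarity.py", "-f", features_file] + list(extra_args)
```

The resulting list could be passed to `subprocess.run`, with any further required arguments supplied through `extra_args`.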
It is my fault that the distinction between these two functions is not clearer. I will work on improving our documentation.
Understood! This should be easily included in this release.
While optional in the older version of AC wrapped by nf, the argument --summary_map should also be given and it is required in the latest version of AC. That summary_map file is already generated by make_input.sh and is located alongside the .input file.
Also, the [sample]_run_metadata.json file produced by PrepareAA.py, should also be given to make_results_table.py via the --run_metadata_file argument.
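The two extra arguments mentioned above can be assembled as follows. This is a sketch under stated assumptions: the helper names are hypothetical, the summary map file name is supplied by the caller because the thread does not name it, and only the `--summary_map` and `--run_metadata_file` flags and the `[sample]_run_metadata.json` naming come from the comment.

```python
import os


def summary_map_args(input_file, summary_map_name):
    """Point --summary_map at a file located alongside the .input file.

    summary_map_name is caller-supplied; the thread only says the file
    is generated by make_input.sh and sits next to the .input file.
    """
    directory = os.path.dirname(input_file)
    return ["--summary_map", os.path.join(directory, summary_map_name)]


def run_metadata_args(sample, prepareaa_outdir):
    """Point --run_metadata_file at [sample]_run_metadata.json.

    prepareaa_outdir is a hypothetical name for the directory that
    holds the PrepareAA.py outputs.
    """
    path = os.path.join(prepareaa_outdir, sample + "_run_metadata.json")
    return ["--run_metadata_file", path]
```

The first list would be appended to the classifier call and the second to the `make_results_table.py` call.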
About PrepareAA.py: as I mentioned, it is not yet included in this release. AmpliconArchitect still runs with the external amplified intervals script and needs to be updated. However, I don't yet have a docker container for PrepareAA that includes intervaltree. I will try using the wave strategy from nextflow; if that does not work, a docker image build script is in place that should generate a docker container upon the next pipeline release (after this merge).
After the docker container is created, PrepareAA.py can be fully implemented. This can be done in a few days and will then also be released.
I will let you know whether I was able to implement wave. If not, I hope you will be available after this release to verify that prepareaa is implemented correctly and to review the pipeline again.
Thanks a lot for your time and wonderful insights.
@SPPearce Let me know if you identified issues or changes that need to be implemented. I hope I addressed all your change requests for a version release. I now stopped using
I think most of my comments will be best resolved once the whole of AmpliconArchitect is on bioconda; that will move a lot of the scripts from the pipeline here to just being present in the container.
I agree. Thanks for reviewing and all your suggestions!
v1.0.3 - [2023-03-23]
Added
Fixed
Dependencies
Deprecated
PR checklist
- … (nf-core lint).
- … (nextflow run . -profile test,docker --outdir <OUTDIR>).
- docs/usage.md is updated.
- docs/output.md is updated.
- CHANGELOG.md is updated.
- README.md is updated (including new tool citations and authors/contributors).