Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

@jpfeuffer
Copy link
Contributor

For repos that follow the nf-core template but need authentication.

PR checklist

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md is updated
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

@jpfeuffer jpfeuffer requested a review from fabianegli June 5, 2025 15:20
@github-actions

This comment was marked as outdated.

@jpfeuffer jpfeuffer changed the base branch from main to dev June 5, 2025 15:21
@jpfeuffer
Copy link
Contributor Author

Oops wrong base branch. Fixed.

@jpfeuffer
Copy link
Contributor Author

jpfeuffer commented Jun 6, 2025

Done. Test failures (most likely) unrelated or at least I have no idea what they mean.

@jpfeuffer jpfeuffer requested a review from mashehu June 6, 2025 09:29
@MatthiasZepper
Copy link
Member

Done. Test failures (most likely) unrelated or at least I have no idea what they mean.

There is a problem mapping your CLI arguments to the parameters of the DownloadWorkflow class init() function. It seems they are disordered, and therefore you get the TypeError("object of type 'bool' has no len()")

Apart from that, please mind that there is a major refactor of Downloads ongoing (#3634). Preferably, all new contributions would already be based on and point to this new structure.

@JulianFlesch
Copy link
Contributor

How does this fit with the pipeline downloads refactoring @MatthiasZepper @jpfeuffer?
Is this salvageable or maybe already covered by the PRs that were merged?

@jpfeuffer
Copy link
Contributor Author

jpfeuffer commented Aug 29, 2025

Good question. Who knows more about the refactor?
From what I gathered from @MatthiasZepper s comment this might not yet be covered.
This is mainly about the repo download, not the download of the individual containers etc.
(Which from what I could see was the main focus of the refactor but judging from its size probably not the only focus)

@jpfeuffer
Copy link
Contributor Author

Yes, I checked. And the mechanism for a download of the repo data/files is still the same.
The patch can be carried over as-is, just for the new folder structure.

@jpfeuffer
Copy link
Contributor Author

And well there needs to be a little fix for the CLI apparently. Note that I just added the CLI option because I still wanted to allow downloads from the Zip URLs because they will never be rate-limited. While non-authenticated downloads from the API can be.

@codecov
Copy link

codecov bot commented Sep 18, 2025

Codecov Report

❌ Patch coverage is 60.00000% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.26%. Comparing base (3afb111) to head (c113b81).
⚠️ Report is 7 commits behind head on dev.

Files with missing lines Patch % Lines
nf_core/pipelines/download/download.py 66.66% 5 Missing ⚠️
nf_core/commands_pipelines.py 0.00% 3 Missing ⚠️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jpfeuffer
Copy link
Contributor Author

jpfeuffer commented Sep 18, 2025

I quickly let copilot rebase the changes. (and fixed the cli bug)
If you want more code coverage you would need to think about a test case with a Git token. I am not familiar enough with your internal CI to suggest a way forward here.
@JulianFlesch

@jpfeuffer jpfeuffer requested review from JulianFlesch and MatthiasZepper and removed request for fabianegli and mashehu September 18, 2025 14:17
Copy link
Member

@MatthiasZepper MatthiasZepper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Julian has kindly taken over the maintainer role for Downloads, so I leave the ultimate decision to him, but I am leaning towards a few changes still.

default=4,
help="Number of allowed parallel tasks",
)
@click.option(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some reservations regarding the--api-downloadoption from a user experience perspective. The current solution exposes implementation details that users shouldn't need to think about. Instead, we could focus on what the user actually wants to achieve - authenticated vs. anonymous downloads.

Consider something like --authenticated and a help text like Enable authenticated download (with better rate limits, access to private repos, etc.) instead.

Potentially, even --auth-method <method> to future-proof it, if we want to support multiple authentication methods in the future. For example, the option to download pipelines not hosted on GitHub has also been a long-standing request and similar features could then be successively added without renaming / replacing too many CLI arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MatthiasZepper while I agree with the switch to authenticated, I am not sure if auth_method is helpful here, even in the future.
I think there is usually only one method for each SCM/git provider and even if there are more, you would require extensive logic to make sure that people do not use "github authentication" when actually wanting to download from gitlab.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we just use gh_api.get instead of requests.get in nf_core/pipelines/download/download.py, no new flag should be needed at all. Or am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vaguely recall that the problem was the need to authenticate at the API even for public repositories. For us developers, who anyway have some key-based authentication or token set up for GitHub, this is essentially unnoticeable.

But for ordinary users, who just get started with nf-core and would like to download their first pipeline, this represents a significant obstacle since they do barely understand the error message or might not even have a GitHub account.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes kind of. I think it is possible to use the API unauthenticated, too but you would be rate limited more easily (which is not the case for the public zip download url, afaik). Since I don't know how many requests the CI or some power user does in a short timeframe I decided to put it behind a flag.

url = requests.get(download_url)
with ZipFile(io.BytesIO(url.content)) as zipfile:
zipfile.extractall(self.outdir)
if not self.api_download:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the flow in this function could have a nicer flow for better maintainability of the code.

For example, the download_url for the anonymous download is assembled in line 467ff:

if not self.platform:
  for revision, wf_sha in self.wf_sha.items():
      # Set the download URL and return - only applicable for classic downloads
      self.wf_download_url = {
          **self.wf_download_url,
          revision: f"https://github.com/{self.pipeline}/archive/{wf_sha}.zip",
      }

That is logically where also the api_url should be created.

If you refactor the logic of the function, you can also reduce a bit code duplication in the ZipFile part and clearly group the topdir and request.get.content / gh_api.get.content closer together for a clearer logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I'll see what I can do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be no condition here at all. From what it looks like, gh_api can just be used in both conditions either way.

As Matthias points out, instead of creating a url, we should use download_url.

@github-project-automation github-project-automation bot moved this from Todo to In Progress in nf-core infrastructure projects Sep 23, 2025
@JulianFlesch JulianFlesch modified the milestones: 3.4.0, 3.5.0 Oct 7, 2025
- Rename --api-download to --authenticated for better UX
- Replace os.rename with Pathlib operations
- Refactor download_wf_files method to reduce code duplication
- Rename compress to compress_type consistently across codebase
- Update all references and tests accordingly
Copy link
Contributor

@JulianFlesch JulianFlesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking my time with this review. The repo download could probably be simplified further and I think we should get away without adding a new flag, if I am not missing something.

Is it also a target of this PR to add downloads from different sources (i.e. private repos)?

url = requests.get(download_url)
with ZipFile(io.BytesIO(url.content)) as zipfile:
zipfile.extractall(self.outdir)
if not self.api_download:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be no condition here at all. From what it looks like, gh_api can just be used in both conditions either way.

As Matthias points out, instead of creating a url, we should use download_url.

default=4,
help="Number of allowed parallel tasks",
)
@click.option(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we just use gh_api.get instead of requests.get in nf_core/pipelines/download/download.py, no new flag should be needed at all. Or am I missing something?

@jpfeuffer
Copy link
Contributor Author

Yep, private/internal (GitHub) repos should be supported by this. This is basically my use case.

The download_url unification should be addressed by my latest changes.

Comment on lines +574 to +586
# Fetch content and determine top-level directory based on authentication method
if self.authenticated:
# GitHub API download: fetch via API and get topdir from zip contents
content = gh_api.get(download_url).content
with ZipFile(io.BytesIO(content)) as zipfile:
topdir = zipfile.namelist()[0] # API zipballs have a generated directory name
zipfile.extractall(self.outdir)
else:
# Direct URL download: fetch and construct expected topdir name
content = requests.get(download_url).content
topdir = f"{self.pipeline}-{wf_sha if bool(wf_sha) else ''}".split("/")[-1]
with ZipFile(io.BytesIO(content)) as zipfile:
zipfile.extractall(self.outdir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Fetch content and determine top-level directory based on authentication method
if self.authenticated:
# GitHub API download: fetch via API and get topdir from zip contents
content = gh_api.get(download_url).content
with ZipFile(io.BytesIO(content)) as zipfile:
topdir = zipfile.namelist()[0] # API zipballs have a generated directory name
zipfile.extractall(self.outdir)
else:
# Direct URL download: fetch and construct expected topdir name
content = requests.get(download_url).content
topdir = f"{self.pipeline}-{wf_sha if bool(wf_sha) else ''}".split("/")[-1]
with ZipFile(io.BytesIO(content)) as zipfile:
zipfile.extractall(self.outdir)
# GitHub API download: fetch via API and get topdir from zip contents
content = gh_api.get(download_url).content
with ZipFile(io.BytesIO(content)) as zipfile:
topdir = zipfile.namelist()[0] # API zipballs have a generated directory name
zipfile.extractall(self.outdir)

Copy link
Contributor

@JulianFlesch JulianFlesch Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am talking about replacing the old way of downloading with your Github API urls. This works also with unauthenticated requests (within a quota) and that way we can reduce complexity and remove the new parameter.

What are your thoughts @MatthiasZepper ?

# allows to specify a container library / registry or a respective mirror to download images from
self.parallel = parallel
self.hide_progress = hide_progress
self.authenticated = authenticated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still be able to read the authenticated status from gh_api.auth instead of having the new flag.

Suggested change
self.authenticated = authenticated
if not gh_api.has_init:
gh_api.lazy_init()
self.authenticated = gh_api.auth is not None

@JulianFlesch
Copy link
Contributor

Ok, my revised suggestion:

  • Read the authenticated status from gh_api in favor of adding a new flag that uses gh_api anyway (this comment)
  • Keep the two wf_download_url to not cause new issues in workflows with quotas
  • Keep only one of the two redundant functions as suggested in this comment

What do you think?

Thanks for this feature @jpfeuffer and sorry for being a bit picky on these details, I am just trying to keep the complexity as low as can be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

4 participants