experimental_index_url slows down downloads by 25-50% #2849
I recall there was a change to avoid log spam when alternative indexes were available. I wonder if that is contributing? I think part of that fix was to change it to fetch an entire index, figure out what was left, then fetch again. The behavior previously was to immediately fetch the next "step" as soon as any given package finished. Don't quote me on this exactly, though -- Ignas would remember better. Relevant snippet from Slack:
This sounds vaguely familiar -- didn't we have a similar problem with our doc building using iblaze? In any case, I think a reasonable functionality request is to provide some basic restriction capability for how it traverses the indexes. E.g. if you simply don't care about musl (or Windows, or whatever), then it shouldn't try to follow any edges that are musl-specific, try to download any metadata for musl-specific wheels, etc. (As an aside, don't the uv.lock and pylock.toml formats have capabilities that would prevent this problem?)
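A toy model (my own sketch, not rules_python's actual scheduler) of why switching from per-package pipelining to whole-index rounds could slow things down: with round-based fetching the total time is the sum of each round's slowest fetch, while pipelining only pays for the slowest single package chain.

```python
def batched_time(chains):
    """Total time when every fetch in round r must finish before round r+1 starts.

    chains: per-package lists of fetch latencies, e.g. [[5, 1], [1, 5]].
    """
    depth = max(len(c) for c in chains)
    # Each round lasts as long as its slowest outstanding fetch.
    return sum(max((c[r] for c in chains if len(c) > r), default=0)
               for r in range(depth))


def pipelined_time(chains):
    """Total time when each package starts its next fetch immediately."""
    return max(sum(c) for c in chains)


# Three packages with uneven latencies: rounds cost 5 + 5 = 10 time units,
# while pipelining finishes in max(6, 6, 3) = 6.
chains = [[5, 1], [1, 5], [3]]
```

The gap between the two grows with the variance in fetch latencies, which would fit the observation below that the slowdown depends heavily on what else the repository is fetching.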
At the very least there is no index fetching, because every artifact in the lockfile is already pinned:

```toml
[[package]]
name = "markupsafe"
version = "3.0.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/b2/97/5d42485e71dfc078108a86d6de8fa46db44a1a9295e89c5d6d4a06e23a62/markupsafe-3.0.2.tar.gz", hash = "sha256:ee55d3edf80167e48ea11a923c7386f4669df67d7994554387f84e7d8b0a2bf0", size = 20537 }
wheels = [
    { url = "https://files.pythonhosted.org/packages/6b/28/bbf83e3f76936960b850435576dd5e67034e200469571be53f69174a2dfd/MarkupSafe-3.0.2-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:9025b4018f3a1314059769c7bf15441064b2207cb3f065e6ea1e7359cb46db9d", size = 14353 },
    { url = "https://files.pythonhosted.org/packages/6c/30/316d194b093cde57d448a4c3209f22e3046c5bb2fb0820b118292b334be7/MarkupSafe-3.0.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:93335ca3812df2f366e80509ae119189886b0f3c2b81325d39efdb84a1e2ae93", size = 12392 },
]
```
Could you please provide the following numbers for each of the cases:
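To make the platform-restriction idea from the earlier comment concrete, here is a hypothetical helper (not an existing rules_python API) that prunes wheels by platform tag before any of their metadata is fetched; tag parsing follows the PEP 427 wheel filename layout.

```python
def platform_tags(wheel_filename):
    """Extract the platform tag(s) from a PEP 427 wheel filename.

    Layout: name-version(-build)?-python-abi-platform.whl; the platform
    field is last, and compressed tags are joined with '.'.
    """
    stem = wheel_filename[:-len(".whl")]
    return stem.split("-")[-1].split(".")


def filter_wheels(filenames, skip_substrings=("musl", "win")):
    """Drop wheels whose platform tag matches any unwanted substring."""
    keep = []
    for fn in filenames:
        tags = platform_tags(fn)
        if any(s in t for t in tags for s in skip_substrings):
            continue  # never follow this edge / fetch its metadata
        keep.append(fn)
    return keep


wheels = [
    "MarkupSafe-3.0.2-cp311-cp311-macosx_11_0_arm64.whl",
    "MarkupSafe-3.0.2-cp311-cp311-musllinux_1_2_x86_64.whl",
    "MarkupSafe-3.0.2-cp311-cp311-win_amd64.whl",
]
```

With the defaults above, only the macOS wheel survives; the musl and Windows wheels are skipped without ever being downloaded.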
🐞 bug report
Affected Rule
pip.parse
Is this a regression?
No
Description
Through various issues it has been recommended that I use `experimental_index_url` to get improved behavior for `pip.parse`. My discovery after attempting to use this in our project is that fetching all deps for the first time slowed down significantly. Here are some of the stats I pulled in our project:

without `experimental_index_url`:
with `experimental_index_url`:
It seems like this is consistent; I tested about 10 times before assuming this was the cause. I imagine this is heavily dependent on what other repo activity exists in the project. I tried messing with the value of `--http_max_parallel_downloads` without any luck. To test this I did a `rm -rf ~/.cache/pip ~/.cache/bazel` between runs to make sure I was starting completely clean.

🔬 Minimal Reproduction
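For anyone reproducing this, a small harness I would use to time a clean fetch (the commands below are illustrative, not from the issue; adjust cache paths and the Bazel target to your project):

```python
import os
import subprocess
import sys
import time


def timed_run(cmd):
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - start


# Example usage (assumed commands; adapt to your setup):
#   timed_run(["rm", "-rf", os.path.expanduser("~/.cache/pip")])
#   timed_run(["rm", "-rf", os.path.expanduser("~/.cache/bazel")])
#   seconds = timed_run(["bazel", "fetch", "//..."])
```

Running the clean-cache-then-fetch cycle several times per configuration, as described above, averages out network noise before comparing the two settings.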
I tried reproducing this in a rules_python example but the difference was not as severe as in our project, which is what makes me think that it's heavily dependent on other http_archives etc that you have.
🌍 Your Environment
Operating System:
Output of `bazel version`:
Rules_python version:
Anything else relevant?
https://bazelbuild.slack.com/archives/CA306CEV6/p1745365100854859