Preliminary steps to save the CI infrastructure #39009
Before proceeding, it would be good to decide whether we would rather go with the other PR.
I don't think we necessarily need to test this. When errors occur during the compilation of sage packages, people seem to be pragmatic and recommend installing the system package (latest example).
I agree we should not test systems past their EOL (but this should not be the only criterion for dropping support).
We are testing sage packages on multiple levels:
for the best stability of the release, when built on the user's machine.
This is a failure of the multiple-level testing, since the testing is not perfect. Reducing such instances of build failure is exactly the purpose of our CI infrastructure. That helps people, including Dima, live a better life.
Do we still support Python 3.9? Where is the oldest Python version we support documented? We may remove more platforms if we do not support Python 3.9...
We still support python 3.9 according to https://doc-release--sagemath.netlify.app/html/en/reference/spkg/python3 |
I think we went through this argument: yes, the CI did its job (with an occasional fix needed, such as this one?), but we don't have enough resources to fix failures anyway, and/or the people who could fix them don't see the need because a workaround is available. (Your counterargument is that people who could fix it don't because testing is not perfect. Yes, it's true that testing is not perfect, but the implication wouldn't hold if people getting build issues are still redirected to install the system package even before CI fails.) Yes, I also agree that it's better to disable the code than to delete it, so that (hypothetically) if we get more resources in the future, it can simply be re-enabled.
has been dropped since #39251 (unfortunately there aren't enough links between the relevant parts of the source code and documentation, so people can forget to update one when another is updated)
Exactly. Each vendored package is more work down the road, not less. |
I think you mean people or developers by "resources". We cannot say definitely "people will fix it", "people won't fix it", "we have enough resource", or "we don't have enough resources". We don't know who will be interested in fixing some sage package for some platform. So what is the implication of your argument? Stop running CI (testing sage packages)?
I didn't argue that "people could fix it doesn't, because testing is not perfect". I said "build failures on user machines happen because testing (through CI) is not perfect".
OK. Then I will just leave the task of dropping other old platforms to the other PR. |
People should not be interested in fixing vendored packages that can be perfectly replaced by what systems/distros provide. We have a lot of really messy, old, broken code in sagelib; big chunks should be redone, for a variety of reasons. This is where the time should go, not elsewhere.
off-topic: then put them into |
As you can see from the latest test: https://github.com/kwankyu/sage/actions/runs/13692352790, the fixed CI Linux reports only failures due to the build system, except that some jobs (notably in "optional") fail for "out of runner space". I tried to fix it but found no good solution for the space problem. In my view, platforms on which both "standard" and "minimal" jobs fail deserve the most attention, such as "opensuse-tumbleweed-python3.10". However, this particular platform may be removed from CI if we do not support Python 3.10. I leave this task to the other PR. I still insist on merging this PR first; the "resource problem" can then be treated in the other PR. Let me know if your vote has changed.
@kwankyu Why did you set this to positive review? I have trouble reading the last comments on this PR (in particular #39009 (comment) and #39009 (comment)) as positive reviews/votes. From my side, removing deprecated/end-of-life systems as well as systems with outdated Python versions is okay. Could you please extract this part to a new PR so that we can get this in as soon as possible?
@tobiasdiez Have the previous comments addressed your concerns with running optional on standard instead of maximal?
I haven't found the time to really look at it, but the behavior described in #39009 (comment) sounds very much like a CI bug to me. Why do you first install a system package and check for its existence, only to overwrite it anyway with a newly compiled version?
Sounds about right. But then if that was also the behavior before the patch, it sounds reasonable to just leave a
somewhere there, then we should be good. As is, the new behavior is not worse than the old one. |
Sure that would be a temporary workaround. But if I understand the PR description correctly, then this issue is actually fixed in the "maximal" run. Not sure though since there are so many different things going on in this PR at the same time... |
I had trouble too. So I have requested Dima twice to be clear about his vote. He had enough time to react, but did not. Without Dima's response, there is no problem with setting this PR to positive review according to the rules for disputed PRs. I think your action of removing the positive review label is against the rules, and perhaps a CoC violation.
No. You rejected Dima's (and my) request to base your PR onto this one. I clearly objected to your PR removing CI parts. Now that your PR is merged, this PR is outdated. I will close this. I will prepare a new PR restoring the removed CI parts based on this PR. However, as there is not enough support for my view on how to maintain the CI infrastructure, I won't set it to needs review anytime soon.
Thanks for your attention and the discussion. During the discussion, this PR was improved and put the (now old) CI in good shape. As this PR is outdated, I am now closing it.
If you want to keep the CI you can just put commits to revert the old ones, can't you? If we agree in principle that things supported should be tested, then there would be nothing wrong with doing that. (on the other hand if the old build system is to be removed really soon and/or |
If you mean by "the old ones" the commits of the merged PR, yes that is what I will do in a new PR. However, as the other PR got merged just now, I won't hurry to propose a PR reverting it. |
Right.
I don't think so. To aim for it, we need community-wide approval. Even if we aim for it, we need a long deprecation period. |
As I've said before, I'm not against restoring parts of the CI that were removed in the other PR if all tests pass and someone is interested in the results (and keeps watching them). To gauge interest, I suggest you run the "to-be-restored part" (say the optional tests) and then track all occurring problems in new issues. Once the majority of these issues are fixed, we can reactivate the corresponding part in the CI of the sage repo. Would that be an acceptable path forward for you? |
Sage is a volunteer project. No one is obliged to keep watching CI results. The CI infrastructure is there because the project needs it. As long as it provides reliable info (whether the tests pass or not), the info may be used by someone to fix the failed platform. Do we remove some code from the sage library if we suspect that the code has not been used in the last 6 months? Do we have "Build & Test using Conda" running for every PR because you keep watching all of them? By the way, is it providing reliable and useful info to the PR author?
No. |
To improve the situation with the CI infrastructure, this PR:
added comments untangling obscure code in CI-related files, for those poor guys who ever attempt to read the files for whatever reason.
while making the cosmetic changes, a bug (about -uninstall targets) was found in build/make/Makefile.in, which is fixed here.
to test, do
fixed some jobs in the CI-linux workflow that fail because of duplicate artifact names.
removed ubuntu-lunar, ubuntu-mantic, conda-forge-python3.11, ubuntu-bionic-gcc_8-i386, debian-bullseye-i386 from the list of the default systems that CI runs on. This is how to properly modify the list: edit tox.ini (find DEFAULT_SYSTEM_FACTORS), then run tox -e update_docker_platforms.
removed old versions of linuxmint and added new versions.
removed old versions of fedora distributions.
"optional" and "experimental" jobs now run upon "standard" docker images, instead of "maximal" ones, to avoid "out of runner space" error.
renamed "Reusable workflow for Docker-based portability CI" to the shorter "Workflow for Linux portability CI", and made it runnable through the GitHub interface, to facilitate testing a specific platform, by adding "workflow-dispatch" calling docker.yml. Test: https://github.com/kwankyu/sage/actions/workflows/docker.yml
added helpful comments and updated the developer doc
reimplemented .ci/write-dockerfile.sh so that a simplified Dockerfile is generated, for present and future stability
turned off failing jobs in "CI Linux incremental"
removed the seemingly useless subprojects/factory directory to eliminate certain git warnings.
turned off "standard-sitepackages" and "standard-constraints_pkgs-norequirements" jobs as they fail on (almost) all platforms.
test CI run (as of 10.6.beta8): https://github.com/kwankyu/sage/actions/runs/13692352790
compare with the status quo: https://github.com/sagemath/sage/actions/runs/13596179432
test CI with a PR: kwankyu#82
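The duplicate-artifact-name failure mentioned in the list above can be illustrated with a small sketch. It is not the code of this PR: the naming scheme, the helper functions, and the platform names are hypothetical, and it only assumes that two jobs in the same workflow run must not upload artifacts under the same name.

```python
from collections import Counter
from itertools import product


def artifact_name(system: str, packages: str) -> str:
    """Build a name from every matrix factor that distinguishes a job (hypothetical scheme)."""
    return f"logs-{system}-{packages}"


def duplicate_names(names):
    """Return the sorted names that occur more than once."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical job matrix; not the real list of CI platforms.
systems = ["ubuntu-jammy", "fedora-40"]
package_sets = ["minimal", "standard"]

# Omitting one factor makes jobs on the same system collide.
bad = [f"logs-{system}" for system, _ in product(systems, package_sets)]
# Including every factor keeps the names unique.
good = [artifact_name(system, packages) for system, packages in product(systems, package_sets)]

print(duplicate_names(bad))
print(duplicate_names(good))
```

A check like this could be run over the job matrix before a workflow change is pushed, so that a collision shows up locally and not as a failed upload step.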
The main objective of this PR is to solve issues with the workflow "CI Linux" such that a failure on a platform reveals solely some problem of sage built on the platform, but not a problem of the CI infrastructure. After this PR, hopefully, each of failing platforms should be tackled individually. If a platform fails, perhaps we should
I suggest discontinuing support (at least in CI) for Linux releases that have been past their EOL (end of life or end of support by the distributor) for more than 2 years.
Only decent platforms according to the CI results should be listed in https://github.com/sagemath/sage/wiki/Sage-10.6-Release-Tour#availability-and-installation-help.
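The EOL rule suggested above (drop a release from CI once it is more than 2 years past its EOL) can be sketched as follows. This is only an illustration: the function names, the platform names, and the EOL dates are made up, and a rolling release is modeled by an EOL of None.

```python
from datetime import date
from typing import Optional

GRACE_YEARS = 2  # the "more than 2 years past EOL" threshold suggested above


def past_grace_period(eol: date, today: date, grace_years: int = GRACE_YEARS) -> bool:
    """Return True if today is more than grace_years after the EOL date."""
    try:
        cutoff = eol.replace(year=eol.year + grace_years)
    except ValueError:
        # EOL on Feb 29 and the target year is not a leap year.
        cutoff = eol.replace(year=eol.year + grace_years, day=28)
    return today > cutoff


def platforms_to_keep(eol_dates: dict[str, Optional[date]], today: date) -> list[str]:
    """Keep rolling releases (EOL of None) and releases still within the grace period."""
    return [
        name
        for name, eol in eol_dates.items()
        if eol is None or not past_grace_period(eol, today)
    ]


# Hypothetical platforms and EOL dates, for illustration only.
eol_dates = {
    "distro-a": date(2020, 6, 30),
    "distro-b": date(2024, 1, 31),
    "distro-rolling": None,
}

print(platforms_to_keep(eol_dates, today=date(2025, 3, 1)))
```

With these made-up dates, only distro-a is dropped, because its cutoff (EOL plus two years) lies before the chosen reference date.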
The following diagram shows how packages are installed for each of CI jobs:
where "S" represents system package and dash "-" represents Sage package. Hence