Preliminary steps to save the CI infrastructure #39009

Closed
wants to merge 48 commits

Conversation

kwankyu
Collaborator

@kwankyu kwankyu commented Nov 20, 2024

To improve the situation with the CI infrastructure, this PR:

  • added comments untangling obscure code in CI-related files, for those poor souls who ever attempt to read these files for whatever reason.

  • while doing the cosmetic changes, a bug (about -uninstall targets) was found in build/make/Makefile.in, which is fixed here.

    to test, do

    $ ./configure --enable-dot2tex | grep dot2tex
    $ make build | grep dot2tex
    $ ./configure --disable-dot2tex | grep dot2tex
    $ make build | grep dot2tex
    
  • fixed some jobs in the CI-linux workflow that fail because of duplicate artifact names.

  • removed ubuntu-lunar, ubuntu-mantic, conda-forge-python3.11, ubuntu-bionic-gcc_8-i386, debian-bullseye-i386 from the list of default systems that CI runs for. This is how to properly modify the list (see also the command sketch after this list):

    • first edit tox.ini (find DEFAULT_SYSTEM_FACTORS)
    • run tox -e update_docker_platforms
    • commit the changes
  • removed old versions of linuxmint and added new versions.

  • removed old versions of fedora distributions.

  • "optional" and "experimental" jobs now run upon "standard" docker images, instead of "maximal" ones, to avoid "out of runner space" error.

  • renamed "Reusable workflow for Docker-based portability CI" to "Workflow for Linux portability CI" for short name and made it runnable through github interface to facilitate testing specific platform by adding "workflow-dispatch" calling docker.yml.

    test: https://github.com/kwankyu/sage/actions/workflows/docker.yml

  • added helpful comments and updated the developer doc

  • reimplemented .ci/write-dockerfile.sh so that a simplified Dockerfile is generated, for present and future stability

  • turned off failing jobs in "CI Linux incremental"

  • removed seemingly useless subprojects/factory directory to eliminate certain git warnings.

  • turned off "standard-sitepackegs" and "standard-constraints_pkgs-norequirements" jobs as they fail on (almost) all platforms.

test CI run (as of 10.6.beta8): https://github.com/kwankyu/sage/actions/runs/13692352790
compare with the status quo: https://github.com/sagemath/sage/actions/runs/13596179432

test CI with a PR: kwankyu#82

The main objective of this PR is to solve issues with the workflow "CI Linux" such that a failure on a platform reveals only a problem with Sage as built on that platform, not a problem with the CI infrastructure itself. After this PR, hopefully, each failing platform can be tackled individually. If a platform fails, perhaps we should

  1. decide first whether to support the platform or not.
  2. if the platform is supported, open a GitHub issue for it.
  3. if the platform is not supported, then remove it from the "master list of supported linux platforms" in tox.ini.
  4. if a supported platform constantly fails but no PR for the issue is present, then we may turn it off (by commenting it out) until it is fixed.

I suggest discontinuing support (at least in CI) for Linux releases that have been past their EOL (end of life or end of support by the distributor) for more than 2 years.

Only decent platforms according to the CI results should be listed in https://github.com/sagemath/sage/wiki/Sage-10.6-Release-Tour#availability-and-installation-help.

The following diagram shows how packages are installed for each of CI jobs:

                 _prereq | standard package | optional package | experimental package
--------------------------------------------------------------------------------------
"minimal"        SSSSSS  | ---------------- |                  |
"standard"       SSSSSS  | SSSSSSSSSSS----- |                  |
"maximal"        SSSSSS  | SSSSSSSSSSS----- | SSSS------------ |
"optional"       SSSSSS  | SSSSSSSSSSS----- | ---------------- |
"experimental"   SSSSSS  | SSSSSSSSSSS----- |                  | ------------------

where "S" represents system package and dash "-" represents Sage package. Hence

  • In the test results of the "minimal" job, we can examine how well standard sage packages behave with sage.
  • In the test results of the "standard" job, we can examine how well standard system packages behave with sage.
  • In the test results of the "maximal" job, we can examine how well optional system packages behave with sage.
  • In the test results of the "optional" job, we can examine how well optional sage packages behave with sage.
  • In the test results of the "experimental" job, we can examine how well experimental sage packages behave with sage.
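
To make the table concrete, here is a hedged sketch, based on the tox/Docker conventions described in the Sage developer guide rather than on code from this PR, of running individual rows of this matrix locally (the platform "ubuntu-jammy" is an arbitrary example):

    # Each tox environment name combines a platform with one of the package-source
    # policies above; arguments after "--" are passed on as make targets.
    $ tox -e docker-ubuntu-jammy-minimal  -- build    # only _prereq from the system
    $ tox -e docker-ubuntu-jammy-standard -- build    # standard packages from the system where available
    $ tox -e docker-ubuntu-jammy-maximal  -- build    # optional system packages too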

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation and checked the documentation preview.

⌛ Dependencies

@kwankyu kwankyu changed the title Add comments untangling complicated code Add comments untangling obscure code in a few build-related files Nov 20, 2024

github-actions bot commented Nov 20, 2024

Documentation preview for this PR (built with commit 527a5ae; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

@kwankyu kwankyu force-pushed the p/add-comments-to-scripts branch from ff6bdcd to 7da5efe on November 20, 2024 13:47
@kwankyu kwankyu changed the title Add comments untangling obscure code in a few build-related files Add comments untangling obscure code in a few CI-related files Nov 21, 2024
@kwankyu kwankyu force-pushed the p/add-comments-to-scripts branch from db666ee to 6df7a6a on February 13, 2025 08:35
@kwankyu kwankyu changed the title Add comments untangling obscure code in a few CI-related files Add comments untangling obscure code in CI-related files Feb 13, 2025
@kwankyu kwankyu changed the title Add comments untangling obscure code in CI-related files Preliminary steps to save the CI infrastructure Feb 13, 2025
@kwankyu kwankyu marked this pull request as ready for review February 13, 2025 10:12
@kwankyu kwankyu mentioned this pull request Feb 13, 2025
@dimpase dimpase added the disputed label (PR is waiting for community vote, see https://groups.google.com/g/sage-devel/c/IgBYUJl33SQ) Feb 13, 2025
@dimpase
Member

dimpase commented Feb 13, 2025

Before proceeding, it's good to decide whether we'd rather go with the other PR.

@kwankyu kwankyu marked this pull request as draft February 15, 2025 00:39
@tobiasdiez
Contributor

The old "optional" already does that, and the case is useful to test optional sage packages. No?

I agree with you that your diagram is the most complete and most logical one; I just very much doubt that we have the resources to have all these tests pass. For this reason, I would start with "standard" and the old "optional", and once they are green most of the time and someone still has resources, we can activate "minimal" and your "optional".

In my view, "minimal" is the most important job because in it, all standard sage packages are built. How can they be tested on a platform without actually installing them on the platform?

I don't think we necessarily need to test this. In case of errors during the compilation of sage packages, people seem to be pragmatic and recommend installing the system package (latest example).

I removed some linux releases (ubuntu and fedora releases) from CI, according to

* discontinue support in CI for Linux releases that have been past their EOL (end of life or end of support by the distributor) for more than 2 years.

I suggest that as a guideline to decide which Linux releases we should run tests for in CI. I don't know if there is already a similar guideline in our documentation (or in sage-devel).

I agree we should not test systems past their EOL (but this should not be the only criterion for dropping support).

@kwankyu
Collaborator Author

kwankyu commented Mar 5, 2025

In my view, "minimal" is the most important job because in it, all standard sage packages are built. How can they be tested on a platform without actually installing them on the platform?

I don't think we necessarily need to test this.

We are testing sage packages on multiple levels:

  • the author of a PR modifying a sage package tests it on their own machine
  • the PR branch is tested incrementally on selected machines (CI incremental)
  • the release to which the PR branch is merged is tested on buildbots run by the release manager
  • the release is tested on CI Linux and CI macOS on a wider selection of machines, to give info to developers

all for the best stability of the release when built on the user's machine.

In case of errors during the compilation of sage packages, people seem to be pragmatic and recommend to install the system package (latest example).

This is a failure of the multiple-level testing, since the testing is not perfect. Reducing such instances of build failure is exactly the purpose of our CI infrastructure. That helps people, including Dima, live a better life.

@kwankyu
Collaborator Author

kwankyu commented Mar 5, 2025

Do we still support python 3.9? Where is the oldest python version we support documented?

We may remove more platforms if we do not support python 3.9...

@kwankyu
Collaborator Author

kwankyu commented Mar 6, 2025

Do we still support python 3.9? Where is the oldest python version we support documented?

We still support python 3.9 according to https://doc-release--sagemath.netlify.app/html/en/reference/spkg/python3

@user202729
Contributor

user202729 commented Mar 6, 2025

I don't think we necessarily need to test this.

In case of errors during the compilation of sage packages, people seem to be pragmatic and recommend to install the system package (latest example).

Reducing such instances of build failure is exactly the purpose of our CI infrastructure […]

I think we went through this argument: yes, the CI did its job (with an occasional fix needed, such as this one?), but we don't have enough resources to fix failures anyway, and/or the people who could fix them don't see the necessity because a workaround is available.

(Your counterargument is that people could fix it but don't because testing is not perfect. Yes, it's true that testing is not perfect, but that implication wouldn't hold if people hitting build issues are still redirected to install the system package even before CI fails.)

Yes, I also agree that it's better to disable the code than delete it, so that (hypothetically) if we get more resources in the future it can simply be re-enabled.

We still support python 3.9 according to https://doc-release--sagemath.netlify.app/html/en/reference/spkg/python3

It has been dropped since #39251 (unfortunately there aren't enough links between the relevant parts of the source code/documentation, so people can forget to update one when another is updated).

@dimpase
Member

dimpase commented Mar 6, 2025

failures anyway, and/or the people who could fix them don't see the necessity because a workaround is available.

Exactly. Each vendored package is more work down the road, not less.
E.g. I couldn't care less if we have a failure in some jupyter test on a particular platform.
Because pip install notebook is more robust than our lame efforts to vendor hundreds of packages it needs.
Why do we have these hundreds in Sage in the 1st place? Inertia, this is why. No other reason.

@kwankyu
Collaborator Author

kwankyu commented Mar 6, 2025

... but we don't have enough resources to fix failures anyway, and/or the people who could fix them don't see the necessity because a workaround is available.

I think you mean people or developers by "resources".

We cannot say definitively "people will fix it", "people won't fix it", "we have enough resources", or "we don't have enough resources". We don't know who will be interested in fixing some sage package for some platform.

So what is the implication of your argument? Stop running CI (testing sage packages)?

Your counterargument is that people could fix it but don't because testing is not perfect.

I didn't argue that "people could fix it but don't, because testing is not perfect".

I said "build failures on user machines happen because testing (through CI) is not perfect".

We still support python 3.9 according to https://doc-release--sagemath.netlify.app/html/en/reference/spkg/python3

It has been dropped since #39251 (unfortunately there aren't enough links between the relevant parts of the source code/documentation, so people can forget to update one when another is updated).

OK. Then I will just leave the task of dropping other old platforms to the other PR.

@dimpase
Member

dimpase commented Mar 6, 2025

We cannot say definitively "people will fix it", "people won't fix it", "we have enough resources", or "we don't have enough resources". We don't know who will be interested in fixing some sage package for some platform.

people should not be interested in fixing vendored packages which can be perfectly replaced by what's provided by systems/distros.

We have a lot of really messy old broken code in sagelib; big chunks should be redone, for a variety of reasons - this is where the time should go.

Not elsewhere

@kwankyu
Collaborator Author

kwankyu commented Mar 6, 2025

people should not be interested in fixing vendored packages which can be perfectly replaced by what's provided by systems/distros.

Off-topic: then put them into _prereq. Please do that in your own PR.

@kwankyu
Collaborator Author

kwankyu commented Mar 9, 2025

As you can see from the latest test (https://github.com/kwankyu/sage/actions/runs/13692352790), the fixed CI Linux reports only failures due to the build system, except that some jobs (notably in "optional") fail with "out of runner space". I tried to fix this, but I found no good solution for the space problem.

In my view, platforms on which both "standard" and "minimal" jobs fail deserve the most attention, such as "opensuse-tumbleweed-python3.10". However, this particular platform may be removed from CI if we do not support python 3.10. I leave this task to the other PR.

I still insist on merging this PR first; the "resource problem" can then be treated in the other PR.

Let me know if your vote changed.

@tobiasdiez
Contributor

@kwankyu Why did you set this to positive review?

I have trouble reading the last comments on this PR (in particular #39009 (comment) and #39009 (comment)) as positive reviews/votes.

From my side, removing deprecated/end-of-life systems as well as systems with outdated python versions is okay. Could you please extract this part to a new PR so that we can get this in as soon as possible?

@user202729
Contributor

@tobiasdiez Have the previous comments addressed your concerns with running optional on standard instead of maximal?

@tobiasdiez
Contributor

tobiasdiez commented Mar 9, 2025

@tobiasdiez Have the previous comments addressed your concerns with running optional on standard instead of maximal?

I haven't found the time to really look at it, but the behavior described in #39009 (comment) sounds very much like a CI bug to me. Why do you first install a system package and check for its existence, only to then overwrite it with a newly compiled version anyway?
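
To spell out the suspected behavior being questioned here, a purely illustrative sketch (the package name is hypothetical; this is not actual CI code): the system package is installed and detected by configure, yet the corresponding Sage package is compiled from source anyway.

    $ sudo apt-get install -y libfoo-dev    # hypothetical system package
    $ ./configure                           # reports that the system libfoo will be used
    $ make build                            # ...but the libfoo SPKG is still rebuilt from source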

@user202729
Contributor

sounds very much like a CI bug to me

Sounds about right. But then, if that was also the behavior before the patch, it sounds reasonable to just leave a

# known bug: packages are force-recompiled; should use the system version instead whenever available, to be fixed later
# current temporary workaround: stop installing the system package

somewhere there, and then we should be good. As is, the new behavior is not worse than the old one.

@tobiasdiez
Contributor

Sure, that would be a temporary workaround. But if I understand the PR description correctly, this issue is actually fixed in the "maximal" run. Not sure though, since there are so many different things going on in this PR at the same time...

@kwankyu
Collaborator Author

kwankyu commented Mar 9, 2025

I have trouble reading the last comments on this PR (in particular #39009 (comment) and #39009 (comment)) as positive reviews/votes.

I had trouble too. So I requested twice that Dima be clear about his vote. He had enough time to react, but did not.

Without Dima's response, there is no problem with setting this PR to positive review according to the rules for disputed PRs. I think your action of removing the positive review label is against the rules, and perhaps a CoC violation.

From my side, removing deprecated/end-of-life systems as well as systems with outdated python versions is okay. Could you please extract this part to a new PR so that we can get this in as soon as possible?

No. You rejected Dima's (and my) request to base your PR on this one. I clearly objected to your PR removing parts of the CI.

Now that your PR is merged, this PR is outdated. I will close this.

I will prepare a new PR restoring the removed CI parts based on this PR. However, as there is not enough support for my view on how to maintain the CI infrastructure, I won't set it to needs review anytime soon.

@kwankyu
Collaborator Author

kwankyu commented Mar 9, 2025

Thanks for the attention and discussion. During the discussion, this PR was improved and put the (now old) CI in good shape.

As this PR is outdated, I now close it.

@kwankyu kwankyu closed this Mar 9, 2025
@user202729
Contributor

If you want to keep the CI, you can just push commits reverting the old ones, can't you?

If we agree in principle that supported things should be tested, then there would be nothing wrong with doing that. (On the other hand, if the old build system is to be removed really soon and/or _prereq = standard, it doesn't matter that much, but will it?)

@kwankyu
Collaborator Author

kwankyu commented Mar 10, 2025

If you want to keep the CI, you can just push commits reverting the old ones, can't you?

If you mean by "the old ones" the commits of the merged PR, yes that is what I will do in a new PR.

However, as the other PR got merged just now, I won't hurry to propose a PR reverting it.

@kwankyu
Collaborator Author

kwankyu commented Mar 10, 2025

... if the old build system is to be removed really soon and/or _prereq = standard it doesn't matter that much,

Right.

but will it?

I don't think so. To aim for it, we need community-wide approval. Even if we aim for it, we need a long deprecation period.

@tobiasdiez
Contributor

I will prepare a new PR restoring the removed CI parts based on this PR. However, as there is not enough support for my view on how to maintain the CI infrastructure, I won't set it to needs review anytime soon.

As I've said before, I'm not against restoring parts of the CI that were removed in the other PR if all tests pass and someone is interested in the results (and keeps watching them).

To gauge interest, I suggest you run the "to-be-restored part" (say the optional tests) and then track all occurring problems in new issues. Once the majority of these issues are fixed, we can reactivate the corresponding part in the CI of the sage repo. Would that be an acceptable path forward for you?

@kwankyu
Collaborator Author

kwankyu commented Mar 10, 2025

As I've said before, I'm not against restoring parts of the CI that were removed in the other PR if all tests pass and someone is interested in the results (and keeps watching them).

Sage is a volunteer project. No one is obliged to keep watching CI results. The CI infrastructure is there because the project needs it. As long as it provides reliable info (whether the tests pass or not), the info may be used by someone to fix the failing platform.

Do we remove some code from the sage library if we suspect that the code has never been used in the last 6 months?

Do we have "Build & Test using Conda" running for every PR because you keep watching all of them? By the way, is it providing reliable and useful info to the PR author?

... Would that be an acceptable path forward for you?

No.
