Preliminary steps to save the CI infrastructure #39009

Closed
wants to merge 48 commits

Conversation

kwankyu
Collaborator

@kwankyu kwankyu commented Nov 20, 2024

To improve the situation with the CI infrastructure, this PR:

  • added comments untangling obscure code in CI-related files, for those poor souls who ever attempt to read these files for whatever reason.

  • while doing the cosmetic changes, a bug (about -uninstall targets) was found in build/make/Makefile.in, which is fixed here.

    to test, do

    $ ./configure --enable-dot2tex | grep dot2tex
    $ make build | grep dot2tex
    $ ./configure --disable-dot2tex | grep dot2tex
    $ make build | grep dot2tex
    
  • fixed some jobs in the CI-linux workflow that fail because of duplicate artifact names.

  • removed ubuntu-lunar, ubuntu-mantic, conda-forge-python3.11, ubuntu-bionic-gcc_8-i386, debian-bullseye-i386 from the list of default systems that CI runs for. This is how to properly modify the list (see also the command sketch after this list):

    • first edit tox.ini (find DEFAULT_SYSTEM_FACTORS)
    • run tox -e update_docker_platforms
    • commit the changes
  • removed old versions of linuxmint and added new versions.

  • removed old versions of fedora distributions.

  • "optional" and "experimental" jobs now run upon "standard" docker images, instead of "maximal" ones, to avoid "out of runner space" error.

  • renamed "Reusable workflow for Docker-based portability CI" to "Workflow for Linux portability CI" for short name and made it runnable through github interface to facilitate testing specific platform by adding "workflow-dispatch" calling docker.yml.

    test: https://github.com/kwankyu/sage/actions/workflows/docker.yml

  • added helpful comments and updated the developer doc

  • reimplemented .ci/write-dockerfile.sh so that a simplified Dockerfile is generated, for present and future stability

  • turned off failing jobs in "CI Linux incremental"

  • removed seemingly useless subprojects/factory directory to eliminate certain git warnings.

  • turned off "standard-sitepackegs" and "standard-constraints_pkgs-norequirements" jobs as they fail on (almost) all platforms.

test CI run (as of 10.6.beta8): https://github.com/kwankyu/sage/actions/runs/13692352790
compare with the status quo: https://github.com/sagemath/sage/actions/runs/13596179432

test CI with a PR: kwankyu#82

The main objective of this PR is to solve issues with the workflow "CI Linux" such that a failure on a platform reveals only a problem with Sage as built on that platform, not a problem with the CI infrastructure itself. After this PR, hopefully, each failing platform can be tackled individually. If a platform fails, perhaps we should

  1. decide first whether to support the platform or not.
  2. if the platform is supported, open a GitHub issue for it.
  3. if the platform is not supported, then remove it from the "master list of supported linux platforms" in tox.ini.
  4. if a supported platform constantly fails but no PR for the issue is present, then we may turn it off (by commenting it out) until it is fixed.

I suggest discontinuing support (at least in CI) for Linux releases that have been past their EOL (end of life or end of support by the distributor) for more than 2 years.

Only decent platforms according to the CI results should be listed in https://github.com/sagemath/sage/wiki/Sage-10.6-Release-Tour#availability-and-installation-help.

The following diagram shows how packages are installed for each of CI jobs:

                 _prereq | standard package | optional package | experimental package
--------------------------------------------------------------------------------------
"minimal"        SSSSSS  | ---------------- |                  |
"standard"       SSSSSS  | SSSSSSSSSSS----- |                  |
"maximal"        SSSSSS  | SSSSSSSSSSS----- | SSSS------------ |
"optional"       SSSSSS  | SSSSSSSSSSS----- | ---------------- |
"experimental"   SSSSSS  | SSSSSSSSSSS----- |                  | ------------------

where "S" represents system package and dash "-" represents Sage package. Hence

  • In the test results of the "minimal" job, we can examine how well standard sage packages behave with sage.
  • In the test results of the "standard" job, we can examine how well standard system packages behave with sage.
  • In the test results of the "maximal" job, we can examine how well optional system packages behave with sage.
  • In the test results of the "optional" job, we can examine how well optional sage packages behave with sage.
  • In the test results of the "experimental" job, we can examine how well experimental sage packages behave with sage.
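
To make the table concrete, here is a hedged sketch, based on the tox/Docker conventions described in the Sage developer guide rather than on code from this PR, of running individual rows of this matrix locally (the platform "ubuntu-jammy" is an arbitrary example):

    # Each tox environment name combines a platform with one of the package-source
    # policies above; arguments after "--" are passed on as make targets.
    $ tox -e docker-ubuntu-jammy-minimal  -- build    # only _prereq from the system
    $ tox -e docker-ubuntu-jammy-standard -- build    # standard packages from the system where available
    $ tox -e docker-ubuntu-jammy-maximal  -- build    # optional system packages too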

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation and checked the documentation preview.

⌛ Dependencies

@kwankyu kwankyu changed the title Add comments untangling complicated code Add comments untangling obscure code in a few build-related files Nov 20, 2024

github-actions bot commented Nov 20, 2024

Documentation preview for this PR (built with commit 527a5ae; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

@kwankyu kwankyu force-pushed the p/add-comments-to-scripts branch from ff6bdcd to 7da5efe on November 20, 2024 13:47
@kwankyu kwankyu changed the title Add comments untangling obscure code in a few build-related files Add comments untangling obscure code in a few CI-related files Nov 21, 2024
@kwankyu kwankyu force-pushed the p/add-comments-to-scripts branch from db666ee to 6df7a6a on February 13, 2025 08:35
@kwankyu kwankyu changed the title Add comments untangling obscure code in a few CI-related files Add comments untangling obscure code in CI-related files Feb 13, 2025
@kwankyu kwankyu changed the title Add comments untangling obscure code in CI-related files Preliminary steps to save the CI infrastructure Feb 13, 2025
@kwankyu kwankyu marked this pull request as ready for review February 13, 2025 10:12
@kwankyu kwankyu mentioned this pull request Feb 13, 2025
@dimpase dimpase added the disputed label (PR is waiting for community vote, see https://groups.google.com/g/sage-devel/c/IgBYUJl33SQ) Feb 13, 2025
@dimpase
Member

dimpase commented Feb 13, 2025

Before proceeding, it's good to decide whether we'd rather go with the other PR.

@kwankyu kwankyu marked this pull request as draft February 15, 2025 00:39
@tobiasdiez
Contributor

The old "optional" already does that, and the case is useful to test optional sage packages. No?

I agree with you that your diagram is the most complete and most logical one; I just very much doubt that we have the resources to have all these tests pass. For this reason, I would start with "standard" and the old "optional", and once they are green most of the time and someone still has resources, we can activate "minimal" and your "optional".

In my view, "minimal" is the most important job because in it, all standard sage packages are built. How can they be tested on a platform without actually installing them on the platform?

I don't think we necessarily need to test this. In case of errors during the compilation of sage packages, people seem to be pragmatic and recommend installing the system package (latest example).

I removed some linux releases (ubuntu and fedora releases) from CI, according to

* discontinue support in CI for Linux releases that have been past their EOL (end of life or end of support by the distributor) for more than 2 years.

I suggest that as a guideline to decide which Linux releases we should run tests for in CI. I don't know if there is already a similar guideline in our documentation (or in sage-devel).

I agree we should not test systems past their EOL (but this should not be the only criterion for dropping support).

@kwankyu
Collaborator Author

kwankyu commented Mar 5, 2025

In my view, "minimal" is the most important job because in it, all standard sage packages are built. How can they be tested on a platform without actually installing them on the platform?

I don't think we necessarily need to test this.

We are testing sage packages on multiple levels:

  • the author of a PR modifying a sage package tests it on their own machine
  • the PR branch is tested incrementally on selected machines (CI incremental)
  • the release to which the PR branch is merged is tested on buildbots run by the release manager
  • the release is tested on CI Linux and CI macOS on a wider selection of machines, to give info to developers

all for the best stability of the release when built on the user's machine.

In case of errors during the compilation of sage packages, people seem to be pragmatic and recommend to install the system package (latest example).

This is a failure of the multiple-level testing, since the testing is not perfect. Reducing such instances of build failure is exactly the purpose of our CI infrastructure. That helps people, including Dima, live a better life.

@kwankyu
Collaborator Author

kwankyu commented Mar 5, 2025

Do we still support python 3.9? Where is the oldest python version we support documented?

We may remove more platforms if we do not support python 3.9...

@kwankyu
Collaborator Author

kwankyu commented Mar 6, 2025

Do we still support python 3.9? Where is the oldest python version we support documented?

We still support python 3.9 according to https://doc-release--sagemath.netlify.app/html/en/reference/spkg/python3

@user202729
Contributor

user202729 commented Mar 6, 2025

I don't think we necessarily need to test this.

In case of errors during the compilation of sage packages, people seem to be pragmatic and recommend to install the system package (latest example).

Reducing such instances of build failure is exactly the purpose of our CI infrastructure […]

I think we went through this argument: yes, the CI did its job (with an occasional fix needed, such as this one?), but we don't have enough resources to fix failures anyway, and/or the people who could fix them don't see the necessity because a workaround is available.

(Your counterargument is that people could fix it but don't because testing is not perfect. Yes, it's true that testing is not perfect, but that implication wouldn't hold if people hitting build issues are still redirected to install the system package even before CI fails.)

Yes, I also agree that it's better to disable the code than delete it, so that (hypothetically) if we get more resources in the future it can simply be re-enabled.

We still support python 3.9 according to https://doc-release--sagemath.netlify.app/html/en/reference/spkg/python3

It has been dropped since #39251 (unfortunately there aren't enough links between the relevant parts of the source code/documentation, so people can forget to update one when another is updated).

@dimpase
Member

dimpase commented Mar 6, 2025

failures anyway, and/or the people who could fix them don't see the necessity because a workaround is available.

Exactly. Each vendored package is more work down the road, not less.
E.g. I couldn't care less if we have a failure in some jupyter test on a particular platform.
Because pip install notebook is more robust than our lame efforts to vendor hundreds of packages it needs.
Why do we have these hundreds in Sage in the 1st place? Inertia, this is why. No other reason.

@kwankyu
Collaborator Author

kwankyu commented Mar 6, 2025

... but we don't have enough resources to fix failures anyway, and/or the people who could fix them don't see the necessity because a workaround is available.

I think you mean people or developers by "resources".

We cannot say definitively "people will fix it", "people won't fix it", "we have enough resources", or "we don't have enough resources". We don't know who will be interested in fixing some sage package for some platform.

So what is the implication of your argument? Stop running CI (testing sage packages)?

Your counterargument is that people could fix it but don't because testing is not perfect.

I didn't argue that "people could fix it but don't, because testing is not perfect".

I said "build failures on user machines happen because testing (through CI) is not perfect".

We still support python 3.9 according to https://doc-release--sagemath.netlify.app/html/en/reference/spkg/python3

It has been dropped since #39251 (unfortunately there aren't enough links between the relevant parts of the source code/documentation, so people can forget to update one when another is updated).

OK. Then I will just leave the task of dropping other old platforms to the other PR.

@dimpase
Member

dimpase commented Mar 6, 2025

We cannot say definitively "people will fix it", "people won't fix it", "we have enough resources", or "we don't have enough resources". We don't know who will be interested in fixing some sage package for some platform.

people should not be interested in fixing vendored packages which can be perfectly replaced by what's provided by systems/distros.

We have a lot of really messy old broken code in sagelib; big chunks should be redone, for a variety of reasons - this is where the time should go.

Not elsewhere

@kwankyu
Collaborator Author

kwankyu commented Mar 6, 2025

people should not be interested in fixing vendored packages which can be perfectly replaced by what's provided by systems/distros.

Off-topic: then put them into _prereq. Please do that in your own PR.

@kwankyu
Collaborator Author

kwankyu commented Mar 9, 2025

As you can see from the latest test (https://github.com/kwankyu/sage/actions/runs/13692352790), the fixed CI Linux reports only failures due to the build system, except that some jobs (notably in "optional") fail with "out of runner space". I tried to fix this, but I found no good solution for the space problem.

In my view, platforms on which both "standard" and "minimal" jobs fail deserve the most attention, such as "opensuse-tumbleweed-python3.10". However, this particular platform may be removed from CI if we do not support python 3.10. I leave this task to the other PR.

I still insist on merging this PR first; the "resource problem" can then be treated in the other PR.

Let me know if your vote changed.

@tobiasdiez
Contributor

@kwankyu Why did you set this to positive review?

I have trouble reading the last comments on this PR (in particular #39009 (comment) and #39009 (comment)) as positive reviews/votes.

From my side, removing deprecated/end-of-life systems as well as systems with outdated python versions is okay. Could you please extract this part to a new PR so that we can get this in as soon as possible?

@user202729
Contributor

@tobiasdiez Have the previous comments addressed your concerns with running optional on standard instead of maximal?

@tobiasdiez
Contributor

tobiasdiez commented Mar 9, 2025

@tobiasdiez Have the previous comments addressed your concerns with running optional on standard instead of maximal?

I haven't found the time to really look at it, but the behavior described in #39009 (comment) sounds very much like a CI bug to me. Why do you first install a system package and check for its existence, only to then overwrite it with a newly compiled version anyway?
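
To spell out the suspected behavior being questioned here, a purely illustrative sketch (the package name is hypothetical; this is not actual CI code): the system package is installed and detected by configure, yet the corresponding Sage package is compiled from source anyway.

    $ sudo apt-get install -y libfoo-dev    # hypothetical system package
    $ ./configure                           # reports that the system libfoo will be used
    $ make build                            # ...but the libfoo SPKG is still rebuilt from source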

@user202729
Contributor

sounds very much like a CI bug to me

Sounds about right. But then, if that was also the behavior before the patch, it sounds reasonable to just leave a

# known bug: packages are force-recompiled; should use the system version instead whenever available, to be fixed later
# current temporary workaround: stop installing the system package

somewhere there, and then we should be good. As is, the new behavior is not worse than the old one.

@tobiasdiez
Contributor

Sure, that would be a temporary workaround. But if I understand the PR description correctly, this issue is actually fixed in the "maximal" run. Not sure though, since there are so many different things going on in this PR at the same time...

@kwankyu
Collaborator Author

kwankyu commented Mar 9, 2025

I have trouble reading the last comments on this PR (in particular #39009 (comment) and #39009 (comment)) as positive reviews/votes.

I had trouble too. So I requested twice that Dima be clear about his vote. He had enough time to react, but did not.

Without Dima's response, there is no problem with setting this PR to positive review according to the rules for disputed PRs. I think your action of removing the positive review label is against the rules, and perhaps a CoC violation.

From my side, removing deprecated/end-of-life systems as well as systems with outdated python versions is okay. Could you please extract this part to a new PR so that we can get this in as soon as possible?

No. You rejected Dima's (and my) request to base your PR on this one. I clearly objected to your PR removing parts of the CI.

Now that your PR is merged, this PR is outdated. I will close this.

I will prepare a new PR restoring the removed CI parts based on this PR. However, as there is not enough support for my view on how to maintain the CI infrastructure, I won't set it to needs review anytime soon.

@kwankyu
Collaborator Author

kwankyu commented Mar 9, 2025

Thanks for the attention and discussion. During the discussion, this PR was improved and put the (now old) CI in good shape.

As this PR is outdated, I now close it.

@kwankyu kwankyu closed this Mar 9, 2025
@user202729
Contributor

If you want to keep the CI, you can just push commits reverting the old ones, can't you?

If we agree in principle that supported things should be tested, then there would be nothing wrong with doing that. (On the other hand, if the old build system is to be removed really soon and/or _prereq = standard, it doesn't matter that much, but will it?)

@kwankyu
Collaborator Author

kwankyu commented Mar 10, 2025

If you want to keep the CI, you can just push commits reverting the old ones, can't you?

If you mean by "the old ones" the commits of the merged PR, yes that is what I will do in a new PR.

However, as the other PR got merged just now, I won't hurry to propose a PR reverting it.

@kwankyu
Collaborator Author

kwankyu commented Mar 10, 2025

... if the old build system is to be removed really soon and/or _prereq = standard it doesn't matter that much,

Right.

but will it?

I don't think so. To aim for it, we need community-wide approval. Even if we aim for it, we need a long deprecation period.

@tobiasdiez
Contributor

I will prepare a new PR restoring the removed CI parts based on this PR. However, as there is not enough support for my view on how to maintain the CI infrastructure, I won't set it to needs review anytime soon.

As I've said before, I'm not against restoring parts of the CI that were removed in the other PR if all tests pass and someone is interested in the results (and keeps watching them).

To gauge interest, I suggest you run the "to-be-restored part" (say the optional tests) and then track all occurring problems in new issues. Once the majority of these issues are fixed, we can reactivate the corresponding part in the CI of the sage repo. Would that be an acceptable path forward for you?

@kwankyu
Collaborator Author

kwankyu commented Mar 10, 2025

As I've said before, I'm not against restoring parts of the CI that were removed in the other PR if all tests pass and someone is interested in the results (and keeps watching them).

Sage is a volunteer project. No one is obliged to keep watching CI results. The CI infrastructure is there because the project needs it. As long as it provides reliable info (whether the tests pass or not), the info may be used by someone to fix the failing platform.

Do we remove some code from the sage library if we suspect that the code has never been used in the last 6 months?

Do we have "Build & Test using Conda" running for every PR because you keep watching all of them? By the way, is it providing reliable and useful info to the PR author?

... Would that be an acceptable path forward for you?

No.
