Conversation

@klihub
Contributor

@klihub klihub commented Feb 3, 2022

Fix a number of problems related to CI tests. Huge thanks to @haircommander for helping me narrow down the test setup ordering problem:

  • fix incorrect test setup order in cgroups BATS test
  • force BATS /usr/bin symlink in place (in case CI worker node has distro-specific BATS installed)
  • fix test cases trying to dereference unset variables (recent BATS runs tests with set -u)
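The third item can be illustrated with a minimal sketch of how a test script might guard an optional variable once `set -u` is active. The variable name `DEMO_OPTIONAL_FLAG` is hypothetical and used only for illustration; it does not come from the test suite.

```shell
#!/usr/bin/env bash
set -u   # expanding an unset variable is now an error

unset DEMO_OPTIONAL_FLAG   # hypothetical optional variable

# Unguarded: "$DEMO_OPTIONAL_FLAG" aborts the shell with "unbound variable".
# It runs in a subshell here so that only the subshell is terminated.
( : "$DEMO_OPTIONAL_FLAG" ) 2>/dev/null
unguarded_status=$?

# Guarded: "${VAR:-}" expands to an empty string when VAR is unset.
if [ -n "${DEMO_OPTIONAL_FLAG:-}" ]; then
    state="set"
else
    state="unset"
fi

echo "unguarded exit status: $unguarded_status, guarded result: $state"
```

The `${VAR:-}` form keeps the strict mode on for everything else while making the "may be unset" intent explicit at the point of use.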

What type of PR is this?

/kind failing-test

What this PR does / why we need it:

Evaluating $TESTDIR before calling setup_test causes cgroups test cases to share test files that are supposed to be private, which leads to random test failures. This typically manifests itself as a conflict between the "ctr with swap should be configured" and "ctr with swap should fail when swap is lower" tests, with either one failing on a seemingly impossible condition. In reality, one of those test cases overwrites the container configuration input file generated by the other before the failing test case gets to use that file.
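The ordering problem can be reproduced in isolation with a small sketch. The `setup_test` function below is a hypothetical stand-in that only assumes the real helper assigns a fresh private directory to `$TESTDIR`; the file name `container.json` and the other function names are likewise illustrative.

```shell
#!/usr/bin/env bash
# Illustration only: setup_test below is a hypothetical stand-in for the
# suite's helper; it gives each test case a private directory in $TESTDIR.
DEMO_ROOT="$(mktemp -d)"
trap 'rm -rf "$DEMO_ROOT"' EXIT

setup_test() {
    TESTDIR="$(mktemp -d "$DEMO_ROOT/test.XXXXXX")"
}

# Wrong order: the path is derived from $TESTDIR before setup_test runs,
# so every test case ends up with the same file name.
buggy_config_path() {
    local config="${TESTDIR:-}/container.json"
    setup_test
    echo "$config"
}

# Right order: set up first, then derive paths from the fresh $TESTDIR.
fixed_config_path() {
    setup_test
    echo "$TESTDIR/container.json"
}

unset TESTDIR
# Each $(...) runs in a subshell, mimicking independent test cases.
if [ "$(buggy_config_path)" = "$(buggy_config_path)" ]; then
    echo "wrong order: both test cases share one config file"
fi
if [ "$(fixed_config_path)" != "$(fixed_config_path)" ]; then
    echo "right order: each test case has its own config file"
fi
```

With parallel runs (the CI jobs in this thread use `bats --jobs 16`), a shared path like this is what would let one test overwrite the other's input file.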

Which issue(s) this PR fixes:

None

Special notes for your reviewer:

Does this PR introduce a user-facing change?

None

@klihub klihub requested review from mrunalp and runcom as code owners February 3, 2022 20:11
@openshift-ci openshift-ci bot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Feb 3, 2022
@klihub klihub force-pushed the fixes/cgroups-test-setup branch from ebb0aa1 to faceaf3 Compare February 3, 2022 20:15
@haircommander
Member

/approve

LGTM, thanks @klihub ! @cri-o/cri-o-maintainers PTAL

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 3, 2022
@klihub klihub force-pushed the fixes/cgroups-test-setup branch from faceaf3 to 65ab94b Compare February 3, 2022 20:20
@codecov

codecov bot commented Feb 3, 2022

Codecov Report

Merging #5596 (1855da9) into main (6c92b67) will increase coverage by 0.10%.
The diff coverage is n/a.

❗ Current head 1855da9 differs from pull request most recent head b6387cc. Consider uploading reports for the commit b6387cc to get more accurate results

@@            Coverage Diff             @@
##             main    #5596      +/-   ##
==========================================
+ Coverage   43.07%   43.18%   +0.10%     
==========================================
  Files         123      123              
  Lines       12295    12267      -28     
==========================================
+ Hits         5296     5297       +1     
+ Misses       6491     6462      -29     
  Partials      508      508              

@klihub klihub force-pushed the fixes/cgroups-test-setup branch from 65ab94b to acc76a6 Compare February 3, 2022 21:59
Contributor

@fgiudici fgiudici left a comment

@klihub , nice catch, thank you!
Overall LGTM.
It seems to me that the check-bats-setup.sh script only checks that the first occurrence of setup_test comes before the first use of TESTDIR in the file. That may help, but it does not verify that this holds for every test: it only works for bats files where setup_test is called in the setup() function (which, honestly, is always the case except for apparmor.bats). While this does no harm and may help, a more generic check may be a better option.
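The kind of check described in this comment can be sketched as follows. This is a hypothetical re-creation based only on the description above, not the contents of check-bats-setup.sh; the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "first occurrence" ordering check.

# Print the line number of the first line in file $2 matching pattern $1
# (prints nothing if there is no match).
first_line_of() {
    grep -n -m1 -e "$1" "$2" | cut -d: -f1
}

# Succeed if the file never uses TESTDIR, or if its first setup_test call
# appears on an earlier line than its first use of TESTDIR.
check_setup_order() {
    local file="$1" setup use
    setup="$(first_line_of 'setup_test' "$file")"
    use="$(first_line_of 'TESTDIR' "$file")"
    [ -z "$use" ] && return 0
    [ -n "$setup" ] && [ "$setup" -lt "$use" ]
}
```

A caller might run `check_setup_order some_file.bats || echo "suspicious ordering"`. As the comment points out, comparing only first occurrences says nothing about later test cases in the same file, so a per-test check would need to parse each `@test` block separately.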

@klihub klihub force-pushed the fixes/cgroups-test-setup branch 3 times, most recently from edb8f03 to 076a7ff Compare February 4, 2022 16:54
@klihub
Contributor Author

klihub commented Feb 5, 2022

/retest

@klihub
Contributor Author

klihub commented Feb 5, 2022

/retest

@klihub klihub force-pushed the fixes/cgroups-test-setup branch 7 times, most recently from 4d46651 to 4ef37af Compare February 7, 2022 11:14
Contributor

@fgiudici fgiudici left a comment

Thanks @klihub , the changes lgtm!
I found a few places where we double-check whether the variables are set, but didn't spot any missing ones. Thank you!

@klihub klihub force-pushed the fixes/cgroups-test-setup branch from 4ef37af to f345107 Compare February 7, 2022 12:54
@haircommander
Member

thanks for working through this @klihub

@kolyshkin
Collaborator

@klihub have you found where in CI we still have old bats?

@klihub
Contributor Author

klihub commented Feb 16, 2022

@klihub have you found where in CI we still have old bats?

@kolyshkin At least these are running older versions:

  • ci/openshift-jenkins/integration_crun_cgroupv2: ++ time bats --jobs 16 --tap . (using Bats 1.2.0-dev)
  • ci/openshift-jenkins/integration_crun : ++ time bats --jobs 16 --tap . (using Bats 1.2.1)
  • ci/openshift-jenkins/integration_fedora: ++ time bats --jobs 16 --tap . (using Bats 1.2.1)

I suspect ci/openshift-jenkins/integration_rhel is the same, but it looks like this time it failed earlier, so the corresponding logs were not collected.

@klihub klihub force-pushed the fixes/cgroups-test-setup branch 5 times, most recently from b66ebac to f8ad0b4 Compare February 20, 2022 21:06
@klihub
Contributor Author

klihub commented Feb 21, 2022

@klihub have you found where in CI we still have old bats?

@kolyshkin At least these are running older versions:

  • ci/openshift-jenkins/integration_crun_cgroupv2: ++ time bats --jobs 16 --tap . (using Bats 1.2.0-dev)
  • ci/openshift-jenkins/integration_crun : ++ time bats --jobs 16 --tap . (using Bats 1.2.1)
  • ci/openshift-jenkins/integration_fedora: ++ time bats --jobs 16 --tap . (using Bats 1.2.1)

I suspect ci/openshift-jenkins/integration_rhel is the same, but it looks like this time it failed earlier, so the corresponding logs were not collected.

@kolyshkin Looks like the really problematic ones run a particularly bad combination: CentOS-7 with Bats 1.2.1 (dcaec03e32e0b152f8ef9cf14b75296cf5caeaff) and bash 4.2.46(2)-release. The Bats version is insignificant here, because even the latest one is broken with that bash version (pre-4.4) if tests run with set -u. I've filed a fix for it but have no idea how long it might take to get accepted. Meanwhile, the only alternatives are to 1) update bash on those hosts (which might mean compiling and installing from sources), or 2) only turn on set -u if bash is new enough. I chose to go with the latter.
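Option 2 could look roughly like the sketch below, which gates `set -u` on the running bash version. The helper name `bash_at_least` is an assumption made for illustration; the actual change in the test suite may be written differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the running bash is at least
# version major.minor, based on the builtin BASH_VERSINFO array.
bash_at_least() {
    local major="$1" minor="$2"
    (( BASH_VERSINFO[0] > major ||
       (BASH_VERSINFO[0] == major && BASH_VERSINFO[1] >= minor) ))
}

# Only enable strict handling of unset variables on bash 4.4 or newer.
if bash_at_least 4 4; then
    set -u
fi
```

On a host with bash 4.2, such as the CentOS-7 workers mentioned above, the condition is false and the tests keep running without `set -u`.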

@klihub klihub force-pushed the fixes/cgroups-test-setup branch from f8ad0b4 to 7a567b1 Compare February 21, 2022 11:28
@klihub
Contributor Author

klihub commented Feb 21, 2022

/test e2e_features_fedora

@klihub klihub force-pushed the fixes/cgroups-test-setup branch from 7a567b1 to 9527d78 Compare February 21, 2022 18:19
@kolyshkin
Collaborator

@klihub can you please rebase this (ideally once #5663 is merged)?

@kolyshkin
Collaborator

/hold

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Feb 22, 2022
@klihub klihub force-pushed the fixes/cgroups-test-setup branch from 9527d78 to 66f01e5 Compare February 22, 2022 08:10
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 22, 2022
@klihub
Contributor Author

klihub commented Feb 22, 2022

@klihub can you please rebase this (ideally once #5663 is merged)?

@kolyshkin Rebased on latest main, #5663 included.

@klihub klihub force-pushed the fixes/cgroups-test-setup branch 2 times, most recently from 203a347 to 0728485 Compare February 22, 2022 08:57
Don't dereference $TESTDATA before the test case is set up.

Signed-off-by: Krisztian Litkey <[email protected]>
Run tests with 'set -u' treating any substitution of unset
variables (other than $* and $@) as errors. Try to fix all
existing test cases to adhere to this strict shell behavior.

Signed-off-by: Krisztian Litkey <[email protected]>
@klihub klihub force-pushed the fixes/cgroups-test-setup branch from 1855da9 to b6387cc Compare February 24, 2022 19:33
@openshift-ci
Contributor

openshift-ci bot commented Feb 24, 2022

@klihub: The following test failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-gcp | b6387cc | link | true | /test e2e-gcp

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@klihub
Contributor Author

klihub commented Apr 28, 2022

This has been split up into multiple PRs and merged piecemeal. I think everything present here has been covered by those. Closing this accordingly...

@klihub klihub closed this Apr 28, 2022

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/bug Categorizes issue or PR as related to a bug. release-note-none Denotes a PR that doesn't merit a release note.

5 participants