
Conversation

@joestringer
Member

Split out from original patch in #1975 so that this change could be independently reviewed and debated.

One of the pain points for developers using the runtime testsuites is that the directly relevant information about the failure is split in half, with a giant Cilium log in the middle, eg:

------------------------------
• Failure [33.489 seconds]
RunPolicies
/home/joe/work/src/github.com/cilium/cilium/test/runtime/Policies.go:340
  L4Policy Checks [It]
  /home/joe/work/src/github.com/cilium/cilium/test/runtime/Policies.go:524

  Client 'app1' can't ping to server '10.15.13.37'
  Expected
      <bool>: false
  to be true

  /home/joe/work/src/github.com/cilium/cilium/test/runtime/Policies.go:529
------------------------------
level=info msg=Starting test=RunPolicies
level=info msg="Docker: set target to 'runtime'" test=RunPolicies
level=info msg="Cilium: set target to 'runtime'" test=RunPolicies
level=info msg="Cilium status is true" test=RunPolicies
level=info msg="Endpoints are not ready valid='4' invalid='2'" EndpointWaitReady= test=RunPolicies
level=info msg="PolicyImport: /vagrant//runtime/manifests/Policies-l4-policy.json and current policy revision is '7'" test=RunPolicies
level=info msg="PolicyImport: finished '/vagrant//runtime/manifests/Policies-l4-policy.json' with revision '8'" test=RunPolicies
STEP: Client 'app1' pinging server 'httpd1' IPv4
StackTrace Begin

*** Cilium log ***

ENDPOINT   POLICY        IDENTITY   LABELS (source:key[=value])   IPv6            IPv4            STATUS
           ENFORCEMENT 
3978       Disabled      274        container:id.app2             f00d::a0f:0:0:f8a    10.15.251.95    ready
4314       Disabled      278        container:id.app3             f00d::a0f:0:0:10da   10.15.25.58     ready
15124      Disabled      279        container:id.httpd3           f00d::a0f:0:0:3b14   10.15.195.213   ready
                                    container:id.service1 
25729      Disabled      275        container:id.app1             f00d::a0f:0:0:6481   10.15.101.61    ready
48896      Enabled       276        container:id.httpd1           f00d::a0f:0:0:bf00   10.15.13.37     ready
                                    container:id.service1 
60670      Enabled       277        container:id.httpd2           f00d::a0f:0:0:ecfe   10.15.167.158   ready
                                    container:id.service1 
StackTrace Ends
SSSSSSSS

Summarizing 1 Failure:

[Fail] RunPolicies [It] L4Policy Checks 
/home/joe/work/src/github.com/cilium/cilium/test/runtime/Policies.go:529

Ran 1 of 41 Specs in 33.491 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 40 Skipped --- FAIL: TestTest (33.49s)
FAIL

Ginkgo ran 1 suite in 35.326556304s
Test Suite Failed

When running the ginkgo tests locally, the Cilium logs get in the way of seeing the actual error, for example Expected <bool> false to be true. In many cases the error itself is revealing enough that the developer working on the test can understand the problem without looking through the Cilium logs. Furthermore, when running locally, it's much easier for a developer to open the Cilium log separately in an editor with better search and highlighting.

There is currently a ginkgo PR open (onsi/ginkgo#383) to allow us to restrict the length of these Cilium logs to only those since the beginning of the current testrun (or, a particular point in time). This will independently be useful to allow us to filter out irrelevant logs from previous test runs. However, even with this kind of functionality, if we continue to print the logs directly to the terminal upon failure, then it is still possible for the Cilium logs to be very long, in which case we still have the developer pain point described above.

The proposed patch here replaces the printout of the actual log with a pointer about how to get the logs. It could be extended so that when running in a Jenkins environment, it gives different instructions (perhaps even with a link to the relevant path to fetch the logs, for example).
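As a rough illustration of the idea (not the code in this patch), the failure hook could emit a one-line pointer instead of the log, with different wording under Jenkins. The function name `logHint`, the `test/artifacts` directory, and the use of the `JENKINS_HOME` environment variable as the CI signal are all assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"os"
)

// logHint returns a short pointer to the Cilium logs instead of the logs
// themselves. Both the name and the artifactDir parameter are hypothetical.
func logHint(onJenkins bool, artifactDir string) string {
	if onJenkins {
		return fmt.Sprintf("Cilium logs omitted; see the build artifacts under %s", artifactDir)
	}
	return "Cilium logs omitted; run 'journalctl -au cilium' in the test VM to view them"
}

func main() {
	// Assumption: the presence of JENKINS_HOME indicates a Jenkins build.
	_, onJenkins := os.LookupEnv("JENKINS_HOME")
	fmt.Println(logHint(onJenkins, "test/artifacts"))
}
```

Keeping the message in one small function would make it easy to swap in a real artifact link later.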

When developers are trying to quickly iterate on tests or determine the
cause of runtime test failures, the Cilium logs are important but not
the first port of call. With long-running VMs, the Cilium log can be
thousands of lines of output between the actual error reported by Ginkgo
and where the terminal stops.

Rather than print the entire Cilium log, just print a simple message
describing how the developer/tester can retrieve the Cilium logs, if
necessary.

Signed-off-by: Joe Stringer <[email protected]>
@joestringer added the area/CI, kind/enhancement, and pending-review labels Nov 9, 2017
@joestringer requested a review from a team as a code owner November 9, 2017 01:49
@joestringer
Member Author

@eloycoto I welcome some discussion of what you'd like to see in terms of making the Cilium logs accessible in the CI. I think @ianvernon also suggested that they should be separate artifact files.

@ianvernon
Member

My two cents: I am against dumping a bunch of stuff to the console. It makes it hard to parse. Having the output of individual commands written to individual files, or having logs published as artifacts that are picked up by Jenkins, is much easier. Jenkins should specify what test failed, and not clog the console with a bunch of metadata, especially since the Cilium logs can be quite verbose as we run the daemon in debug mode in the CI.

aanm previously approved these changes Nov 9, 2017
@aanm
Member

aanm commented Nov 9, 2017

I think this can be solved if you print the journalctl output filtered with this command:

journalctl -au cilium | grep -Ei "err|warn" -A 1 -B 2 | head -n 20

This might cover 90% of the cases where the test fails, and the developer will immediately see the reason for it.
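For anyone who would rather do this filtering inside the Go test helpers than shell out, here is a minimal sketch of the same idea: keep lines mentioning "err" or "warn" (case-insensitive), with two lines of leading and one line of trailing context, capped at 20 lines. The function name `excerpt` is hypothetical, and unlike grep this version does not print `--` separators between non-adjacent groups.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// errOrWarn matches "err" or "warn" anywhere in a line, ignoring case.
var errOrWarn = regexp.MustCompile(`(?i)(err|warn)`)

// excerpt returns the matching lines plus `before` preceding and `after`
// following lines of context, truncated to at most `limit` lines.
func excerpt(lines []string, before, after, limit int) []string {
	keep := make([]bool, len(lines))
	for i, line := range lines {
		if !errOrWarn.MatchString(line) {
			continue
		}
		lo := i - before
		if lo < 0 {
			lo = 0
		}
		hi := i + after
		if hi > len(lines)-1 {
			hi = len(lines) - 1
		}
		for j := lo; j <= hi; j++ {
			keep[j] = true
		}
	}
	var out []string
	for i, line := range lines {
		if len(out) >= limit {
			break
		}
		if keep[i] {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	// Reads the log from stdin, e.g. piped from journalctl.
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	for _, line := range excerpt(lines, 2, 1, 20) {
		fmt.Println(line)
	}
}
```

It could be tried with `journalctl -au cilium | go run excerpt.go`, where `excerpt.go` is whatever file the sketch is saved as.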

@aanm dismissed their stale review November 9, 2017 23:35

Let's discuss it first before ACK

@joestringer
Member Author

The suggestion from @aanm sounds like a pretty reasonable middle ground to me.

@eloycoto
Member

@aanm's suggestion should be OK for small tests, but does it work for large ones? I mean the RunPolicies test case, for example.

I think that we should consider what Ian said, and export the full logs too. We already have a helper function, helpers.IsRunningOnJenkins(), to tell whether we are running on Jenkins, so writing the logs to artifacts should be easy.

@aanm
Member

aanm commented Nov 10, 2017

@eloycoto my suggestion was that only the "grep" output be printed, with the full log stored separately

@tgraf
Member

tgraf commented Nov 14, 2017

@eloycoto @joestringer @aanm @ianvernon What's the consensus here?

@ianvernon
Member

I'm fine with having a subset of the logs from the test failure for now. However, we still need all the logs when a test fails, in case the subset provided isn't enough. #2026 covers all logs we need from tests.

Member

@aanm left a comment


@joestringer grep command it is?

@ianvernon
Member

@joestringer grep command it is?

SGTM

@joestringer
Member Author

Superseded by cc12604.

@joestringer closed this Dec 6, 2017
@joestringer deleted the submit/ginkgo-logs branch December 6, 2017 01:16