Improve ginkgo test failure readability. #1999

joestringer · 2017-11-09T01:49:23Z

Split out from original patch in #1975 so that this change could be independently reviewed and debated.

One of the pain points for developers using the runtime test suites is that the directly relevant information about a failure is split in half, with a giant Cilium log in the middle, e.g.:

------------------------------
• Failure [33.489 seconds]
RunPolicies
/home/joe/work/src/github.com/cilium/cilium/test/runtime/Policies.go:340
  L4Policy Checks [It]
  /home/joe/work/src/github.com/cilium/cilium/test/runtime/Policies.go:524

  Client 'app1' can't ping to server '10.15.13.37'
  Expected
      <bool>: false
  to be true

  /home/joe/work/src/github.com/cilium/cilium/test/runtime/Policies.go:529
------------------------------
level=info msg=Starting test=RunPolicies
level=info msg="Docker: set target to 'runtime'" test=RunPolicies
level=info msg="Cilium: set target to 'runtime'" test=RunPolicies
level=info msg="Cilium status is true" test=RunPolicies
level=info msg="Endpoints are not ready valid='4' invalid='2'" EndpointWaitReady= test=RunPolicies
level=info msg="PolicyImport: /vagrant//runtime/manifests/Policies-l4-policy.json and current policy revision is '7'" test=RunPolicies
level=info msg="PolicyImport: finished '/vagrant//runtime/manifests/Policies-l4-policy.json' with revision '8'" test=RunPolicies
STEP: Client 'app1' pinging server 'httpd1' IPv4
StackTrace Begin

*** Cilium log ***

ENDPOINT   POLICY        IDENTITY   LABELS (source:key[=value])   IPv6            IPv4            STATUS
           ENFORCEMENT 
3978       Disabled      274        container:id.app2             f00d::a0f:0:0:f8a    10.15.251.95    ready
4314       Disabled      278        container:id.app3             f00d::a0f:0:0:10da   10.15.25.58     ready
15124      Disabled      279        container:id.httpd3           f00d::a0f:0:0:3b14   10.15.195.213   ready
                                    container:id.service1 
25729      Disabled      275        container:id.app1             f00d::a0f:0:0:6481   10.15.101.61    ready
48896      Enabled       276        container:id.httpd1           f00d::a0f:0:0:bf00   10.15.13.37     ready
                                    container:id.service1 
60670      Enabled       277        container:id.httpd2           f00d::a0f:0:0:ecfe   10.15.167.158   ready
                                    container:id.service1 
StackTrace Ends
SSSSSSSS

Summarizing 1 Failure:

[Fail] RunPolicies [It] L4Policy Checks 
/home/joe/work/src/github.com/cilium/cilium/test/runtime/Policies.go:529

Ran 1 of 41 Specs in 33.491 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 40 Skipped --- FAIL: TestTest (33.49s)
FAIL

Ginkgo ran 1 suite in 35.326556304s
Test Suite Failed

When running the ginkgo tests locally, the Cilium logs get in the way of seeing the actual error, for example "Expected <bool>: false to be true". In many cases the error itself is revealing enough that the developer working on the test can understand the problem without looking through the Cilium logs. Furthermore, when running locally, it is much easier for a developer to open the Cilium log separately in an editor with better search and highlighting capabilities.

There is currently a ginkgo PR open (onsi/ginkgo#383) that would let us restrict these Cilium logs to entries since the beginning of the current test run (or since a particular point in time). That is independently useful for filtering out irrelevant logs from previous test runs. However, even with that functionality, if we continue to print the logs directly to the terminal on failure, the Cilium logs can still be very long, and the developer pain point described above remains.

The proposed patch replaces the printout of the actual log with a pointer to how to retrieve the logs. It could be extended so that, when running in a Jenkins environment, it gives different instructions (perhaps even a link to the path from which the logs can be fetched).
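As a rough illustration of that idea (not the actual patch), the hint could be produced by a small helper like the one below. The function name `logHint`, the message wording, and the artifact path are all hypothetical. The `journalctl -au cilium` command is the one mentioned later in this thread.

```go
package main

import "fmt"

// logHint returns a short pointer to the Cilium logs instead of the logs
// themselves. Hypothetical helper: the name, wording, and artifactPath
// argument are assumptions made for this sketch.
func logHint(onJenkins bool, artifactPath string) string {
	if onJenkins {
		// In CI the developer cannot log in to the VM, so point at an artifact.
		return fmt.Sprintf("Cilium logs omitted; see build artifact %q", artifactPath)
	}
	// Locally, the developer can read the journal in the test VM directly.
	return "Cilium logs omitted; run 'journalctl -au cilium' in the test VM"
}

func main() {
	fmt.Println(logHint(false, ""))
	fmt.Println(logHint(true, "cilium-logs/RunPolicies.log"))
}
```

Keeping the message to one line means the Ginkgo failure text and the failure summary stay adjacent in the terminal.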

When developers are trying to quickly iterate on tests or determine the cause of runtime test failures, the Cilium logs are important but not the first port of call. With long-running VMs, the Cilium log can be thousands of lines of output between the actual error reported by Ginkgo and where the terminal stops. Rather than print the entire Cilium log, just print a simple message describing how the developer/tester can retrieve the Cilium logs, if necessary.

Signed-off-by: Joe Stringer <[email protected]>

joestringer · 2017-11-09T01:51:45Z

@eloycoto I welcome some discussion of how you'd like the Cilium logs to be made accessible in the CI. I think @ianvernon also raised the idea that they should be separate artifact files.

ianvernon · 2017-11-09T21:19:29Z

My two cents: I am against dumping a bunch of stuff to the console. It makes the output hard to parse. Having the output of individual commands written to individual files, or having logs published as artifacts that are picked up by Jenkins, is much easier. Jenkins should specify which test failed, and not clog the console with a bunch of metadata, especially since the Cilium logs can be quite verbose as we run the daemon in debug mode in the CI.

aanm · 2017-11-09T23:34:56Z

I think this can be solved if you print the journalctl output with this command:

journalctl -au cilium | grep -Ei "err|warn" -A 1 -B 2 | head -n 20

This might cover 90% of the cases where the test fails, and the developer will immediately see the reason for it.

Let's discuss it first before ACK
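For discussion, the same filtering could also be done in-process in Go rather than through a shell pipeline. The sketch below is only an approximation of the journalctl/grep/head command above: the function name `excerpt` is hypothetical, and unlike grep it does not print "--" separators between non-adjacent groups of lines.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// errOrWarn matches "err" or "warn" case-insensitively, like grep -Ei "err|warn".
var errOrWarn = regexp.MustCompile(`(?i)err|warn`)

// excerpt returns at most limit lines: each matching line plus `before`
// lines of leading context and `after` lines of trailing context.
// Hypothetical helper written for this sketch.
func excerpt(log string, before, after, limit int) []string {
	lines := strings.Split(strings.TrimRight(log, "\n"), "\n")
	keep := make([]bool, len(lines))
	for i, line := range lines {
		if !errOrWarn.MatchString(line) {
			continue
		}
		lo, hi := i-before, i+after
		if lo < 0 {
			lo = 0
		}
		if hi > len(lines)-1 {
			hi = len(lines) - 1
		}
		for j := lo; j <= hi; j++ {
			keep[j] = true
		}
	}
	var out []string
	for i, line := range lines {
		if keep[i] && len(out) < limit {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	log := "level=info msg=Starting\n" +
		"level=info msg=a\n" +
		"level=info msg=b\n" +
		"level=warning msg=\"policy import slow\"\n" +
		"level=info msg=c\n" +
		"level=info msg=d\n"
	// Equivalent in spirit to: grep -Ei "err|warn" -A 1 -B 2 | head -n 20
	for _, line := range excerpt(log, 2, 1, 20) {
		fmt.Println(line)
	}
}
```

With the sample input, only the warning line, the two lines before it, and the one line after it are printed.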

joestringer · 2017-11-10T01:18:36Z

The suggestion from @aanm sounds like a pretty reasonable middle ground to me.

eloycoto · 2017-11-10T09:41:38Z

@aanm's suggestion should be OK for small tests, but does it work for large tests? I mean the RunPolicies test case, for example.

I think that we should consider what Ian said, and export the full logs too. We already have a helper function, helpers.IsRunningOnJenkins(), to detect whether the tests are running on Jenkins, so reporting logs to artifacts should be easy.

aanm · 2017-11-10T11:30:46Z

@eloycoto my suggestion was to print only the "grep" output and to store the full log.

tgraf · 2017-11-14T00:46:00Z

@eloycoto @joestringer @aanm @ianvernon What's the consensus here?

ianvernon · 2017-11-14T01:24:05Z

I'm fine with having a subset of the logs from the test failure for now. However, we need all logs to be available when a test fails, in case the subset provided isn't enough. #2026 covers all the logs we need from tests.

aanm

@joestringer grep command it is?

ianvernon · 2017-11-14T17:16:30Z

> @joestringer grep command it is?

SGTM

joestringer · 2017-12-06T01:16:06Z

Superseded by cc12604.

joestringer added the area/CI, kind/enhancement, and pending-review labels Nov 9, 2017

joestringer requested review from eloycoto and ianvernon November 9, 2017 01:49

joestringer requested a review from a team as a code owner November 9, 2017 01:49

aanm previously approved these changes Nov 9, 2017

aanm requested changes Nov 14, 2017

joestringer added wip and removed pending-review labels Nov 16, 2017

joestringer closed this Dec 6, 2017

joestringer deleted the submit/ginkgo-logs branch December 6, 2017 01:16
