Minimize enable-run-disable overhead #13

pwaller · 2019-07-19T21:39:23Z

In the spirit of minimizing variance, this reduces the overhead of Measure()
from ~300 instructions to ~8 on amd64. In practice the overhead will be a
little bit higher due to the stack frame management of whatever you are
calling. Be sure to annotate the called function with //go:nosplit if you
really want to get it down to something with minimal variance.
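
A minimal sketch of the kind of function the description has in mind. The workload `workload` and the `sink` variable are hypothetical names chosen for illustration. The `//go:nosplit` directive drops the stack-growth check from the function's own prologue. For a tiny leaf function like this one, the compiler typically omits that check anyway, so the annotation matters more once the measured code grows or makes calls.

```go
package main

// sink is a package-level variable that keeps the compiler from discarding
// the loop's result.
var sink uint64

// workload is a hypothetical function to be measured. The //go:nosplit
// directive removes the stack-growth check from its prologue, so that check
// (and any stack copy it might trigger) cannot show up in the counts.
// Nosplit functions must use very little stack.
//
//go:nosplit
func workload() {
	var acc uint64
	for i := uint64(1); i <= 1000; i++ {
		acc += i
	}
	sink = acc
}
```

Given an already-opened `*perf.Event` named `ev`, the function value would be passed directly, as in `ev.Measure(workload)`. Wrapping it in a closure such as `func() { workload() }` adds another call, with its own prologue, to the measured region.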

One consideration for this PR is that it does the thing you should never do:
"ignore error returns". I believe it is justifiable in this case, since we call
Disable() and Reset() on those perf file descriptors just before going into the
path where we do not check errors, so I do not believe these should fail under
ordinary circumstances. Likely, if there is a failure, the subsequent
ReadCount will fail anyway.

I will not be offended if you have a reason to turn down this contribution.

Note: as it stands, this has a failing test. I haven't been able to figure out
why yet, and I have run out of time to look.

```
--- FAIL: TestReadRecord/SampleTracepointStack (0.02s)
    record_test.go:1099: Go (0x50676f) and kernel (0x51039f) PC differ
```

```diff
diff --git a/perf.go b/perf.go
index f58c25d..6538f6c 100644
--- a/perf.go
+++ b/perf.go
@@ -270,15 +270,9 @@ func (ev *Event) Measure(f func()) (Count, error) {
 	if err := ev.Reset(); err != nil {
 		return Count{}, err
 	}
-	if err := ev.Enable(); err != nil {
-		return Count{}, err
-	}
 
-	f()
+	doEnableRunDisable(uintptr(ev.perffd), f)
 
-	if err := ev.Disable(); err != nil {
-		return Count{}, err
-	}
 	return ev.ReadCount()
 }
 
@@ -290,15 +284,9 @@ func (ev *Event) MeasureGroup(f func()) (GroupCount, error) {
 	if err := ev.Reset(); err != nil {
 		return GroupCount{}, err
 	}
-	if err := ev.Enable(); err != nil {
-		return GroupCount{}, err
-	}
 
-	f()
+	doEnableRunDisable(uintptr(ev.perffd), f)
 
-	if err := ev.Disable(); err != nil {
-		return GroupCount{}, err
-	}
 	return ev.ReadGroupCount()
 }
 
diff --git a/perf_amd64.go b/perf_amd64.go
new file mode 100644
index 0000000..7006271
--- /dev/null
+++ b/perf_amd64.go
@@ -0,0 +1,7 @@
+package perf
+
+// doEnableRunDisable enables the counters, executes f, and disables them. It is
+// implemented in assembly to minimize non-deterministic overhead. It is assumed
+// that perfFD is known to be a valid file descriptor at the time of the call,
+// no error checking occurs.
+func doEnableRunDisable(perfFD uintptr, f func())
\ No newline at end of file
diff --git a/perf_amd64.s b/perf_amd64.s
new file mode 100644
index 0000000..b9aeb36
--- /dev/null
+++ b/perf_amd64.s
@@ -0,0 +1,23 @@
+#include "textflag.h"
+
+#define SYS_IOCTL 16
+#define PERF_EVENT_IOC_ENABLE 0x2400
+#define PERF_EVENT_IOC_DISABLE 0x2401
+
+TEXT ·doEnableRunDisable(SB),NOSPLIT,$0-16
+	MOVQ fd+0(FP), DI
+	MOVQ $PERF_EVENT_IOC_ENABLE, SI
+	MOVQ $SYS_IOCTL, AX
+	SYSCALL
+
+	// Overhead:
+	MOVQ f+8(FP), DX // 1
+	MOVQ 0(DX), AX // 2
+	CALL AX // 3, 4 (RET on the other side)
+
+	MOVQ fd+0(FP), DI // 5
+	MOVQ $PERF_EVENT_IOC_DISABLE, SI // 6
+	MOVQ $SYS_IOCTL, AX // 7
+	SYSCALL // 8
+
+	RET
diff --git a/perf_generic.go b/perf_generic.go
new file mode 100644
index 0000000..73357dd
--- /dev/null
+++ b/perf_generic.go
@@ -0,0 +1,17 @@
+//+build !amd64
+
+package perf
+
+import (
+	"golang.org/x/sys/unix"
+)
+
+// doEnableRunDisable enables the counters, executes f, and disables them. Where
+// possible it is implemented in assembly to minimize non-deterministic
+// overhead. It is assumed that perfFD is known to be a valid file descriptor at
+// the time of the call, no error checking occurs.
+func doEnableRunDisable(fd uintptr, f func()) {
+	unix.Syscall(unix.SYS_IOCTL, fd, uintptr(unix.PERF_EVENT_IOC_ENABLE), 0)
+	f()
+	unix.Syscall(unix.SYS_IOCTL, fd, uintptr(unix.PERF_EVENT_IOC_DISABLE), 0)
+}
```

acln0

This looks good. Great idea. I want to merge it, and I don't see any downsides. If you want to update the branch to use RawSyscall, please do; otherwise, I will do it myself when I context-switch into merging.

Thanks.
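
For reference, a minimal sketch of the generic path using `syscall.RawSyscall`, as suggested above. It is not necessarily the code that landed in 6e91a1f, and the name `doEnableRunDisableRaw` is hypothetical. `syscall.Syscall` wraps the system call in the runtime's `entersyscall`/`exitsyscall` bookkeeping, while `RawSyscall` does not, which removes work and variance around each ioctl. That is only acceptable because these ioctls do not block. The ioctl request values are hardcoded to keep the sketch stdlib-only. This is an assumption: the values match the `#define`s in perf_amd64.s and hold on architectures where `_IOC_NONE` is 0, such as amd64 and arm64. They do not hold on ppc64 or mips, where the `golang.org/x/sys/unix` constants should be used instead.

```go
package main

import "syscall"

// Assumed Linux ioctl request numbers, _IO('$', 0) and _IO('$', 1). These are
// only correct on architectures where _IOC_NONE is 0 (e.g. amd64, arm64).
const (
	perfEventIocEnable  = 0x2400
	perfEventIocDisable = 0x2401
)

// doEnableRunDisableRaw is a hypothetical generic fallback: enable the
// counters on fd, run f, then disable them. RawSyscall skips the runtime's
// entersyscall/exitsyscall hooks, which is safe here only because these
// ioctls never block. Errors are deliberately ignored, as in the PR.
func doEnableRunDisableRaw(fd uintptr, f func()) {
	syscall.RawSyscall(syscall.SYS_IOCTL, fd, perfEventIocEnable, 0)
	f()
	syscall.RawSyscall(syscall.SYS_IOCTL, fd, perfEventIocDisable, 0)
}
```

As in the PR, errors from both ioctls are discarded; a failure would surface in the subsequent ReadCount call.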


acln0 · 2019-07-20T10:12:11Z

Leaving open until we figure out the PC difference in the failing test.

The rule I was taught by another developer was "use Fatal if there is nothing useful you can further do in the test, otherwise use Error". This test in particular does do something useful after a mismatch: it logs the stacks it observed, which immediately made the problem clear to me. Before, we'd get just the first message; now we get all of this:

```
--- FAIL: TestReadRecord/SampleTracepointStack (0.04s)
    record_test.go:1101: Go (0x50a0ff) and kernel (0x513eff) PC differ
    record_test.go:1101: Go (0x513eff) and kernel (0x5043fb) PC differ
    record_test.go:1117: kernel callchain:
    record_test.go:1110: 0xfffffffffffffe00 <nil>
    record_test.go:1113: 0x5043fb /home/pwaller/go/pkg/mod/golang.org/x/[email protected]/unix/asm_linux_amd64.s:52 golang.org/x/sys/unix.RawSyscallNoError
    record_test.go:1113: 0x513eff /home/pwaller/.local/src/acln.ro/perf/perf_amd64.s:18 acln.ro/perf.doEnableRunDisable
    record_test.go:1113: 0x52681b /home/pwaller/.local/src/acln.ro/perf/record_test.go:1070 acln.ro/perf_test.testSampleTracepointStack
    record_test.go:1113: 0x4c0bb0 /snap/go/4098/src/testing/testing.go:868 testing.tRunner
    record_test.go:1113: 0x45aa41 /snap/go/4098/src/runtime/asm_amd64.s:1338 runtime.goexit
    record_test.go:1122:
    record_test.go:1124: Go stack:
    record_test.go:1113: 0x513eff /home/pwaller/.local/src/acln.ro/perf/perf_amd64.s:18 acln.ro/perf.doEnableRunDisable
    record_test.go:1113: 0x50a0ff /home/pwaller/.local/src/acln.ro/perf/perf.go:276 acln.ro/perf.(*Event).Measure
    record_test.go:1113: 0x52681b /home/pwaller/.local/src/acln.ro/perf/record_test.go:1070 acln.ro/perf_test.testSampleTracepointStack
    record_test.go:1113: 0x4c0bb0 /snap/go/4098/src/testing/testing.go:868 testing.tRunner
    record_test.go:1113: 0x45aa41 /snap/go/4098/src/runtime/asm_amd64.s:1338 runtime.goexit
```

---

I have looked at the usage of Fatalf elsewhere in the tests and decided there were too many of them for me to update them all quickly, and without making a mistake, late at night :)

Updates acln0#13.
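
The Error-versus-Fatal rule can be sketched with a hypothetical helper, `compareStacks`. It reports every mismatched PC with `Errorf`, which marks the test as failed but lets it continue, and then logs both stacks, rather than stopping at the first mismatch as `Fatalf` would. The `reporter` interface stands in for `*testing.T` so the sketch runs outside `go test`. The PC values in the check are borrowed from the log above purely for illustration, and the simple pairwise comparison is an assumption, not the actual logic in record_test.go.

```go
package main

import "fmt"

// reporter is the subset of *testing.T that the helper needs.
type reporter interface {
	Errorf(format string, args ...any)
	Logf(format string, args ...any)
}

// compareStacks is a hypothetical helper: it reports each differing PC pair
// with Errorf and, if anything differed, logs both stacks for diagnosis.
func compareStacks(t reporter, goPCs, kernelPCs []uintptr) {
	n := len(goPCs)
	if len(kernelPCs) < n {
		n = len(kernelPCs)
	}
	failed := false
	for i := 0; i < n; i++ {
		if goPCs[i] != kernelPCs[i] {
			// Errorf records the failure and keeps going, so later
			// mismatches and the stack dumps below still get reported.
			t.Errorf("Go (%#x) and kernel (%#x) PC differ", goPCs[i], kernelPCs[i])
			failed = true
		}
	}
	if !failed {
		return
	}
	t.Logf("kernel callchain:")
	for _, pc := range kernelPCs {
		t.Logf("%#x", pc)
	}
	t.Logf("Go stack:")
	for _, pc := range goPCs {
		t.Logf("%#x", pc)
	}
}

// fakeT records messages so the behavior can be checked without `go test`.
type fakeT struct {
	errors, logs []string
}

func (f *fakeT) Errorf(format string, args ...any) {
	f.errors = append(f.errors, fmt.Sprintf(format, args...))
}

func (f *fakeT) Logf(format string, args ...any) {
	f.logs = append(f.logs, fmt.Sprintf(format, args...))
}
```

In a real test, the `*testing.T` itself would be passed as the `reporter`.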

This fixes the SampleTracepointStack test, because the kernel stack trace wasn't able to 'see' the caller of doEnableRunDisable since we weren't maintaining the frame pointer. My original rationale for using NOSPLIT was to reduce the overhead of calling doEnableRunDisable, but that doesn't make sense since we're called with the counters inhibited anyway, so any stack splitting code won't be counted. The thing which needs to be NOSPLIT is the `f` to be called, which is under the control of the user.

pwaller · 2019-07-20T20:11:06Z

The PC difference issue is fixed in cf4a345, with complete description in the commit. I had borked the kernel stack trace machinery by not maintaining the frame pointer.

acln0 · 2019-07-20T20:12:56Z

Fantastic. I think this is ready to merge now.


acln0 reviewed Jul 20, 2019

perf_generic.go (outdated)

perf_amd64.go (outdated)

use syscall.RawSyscall in the generic doEnableRunDisable code path

6e91a1f

Signed-off-by: Andrei Tudor Călin <[email protected]>

pwaller mentioned this pull request Jul 20, 2019

Use Errorf instead of fatal in testSampleTracepointStack #18

Merged

acln0 merged commit 6861f4b into acln0:master Jul 20, 2019
