Minimize enable-run-disable overhead #13
In the spirit of minimizing variance, this reduces the overhead of Measure() from ~300 instructions to ~8 on amd64. In practice the overhead will be a little bit higher due to the stack frame management of whatever you are calling. Be sure to annotate the called function with `//go:nosplit` if you really want to get it down to something with minimal variance.

One consideration for this PR is that it does the thing you should never do: "ignore error returns". I believe it is justifiable in this case, since we call Disable() and Reset() on those perf file descriptors just before going into the path where we do not check errors, so I do not believe these should fail under ordinary circumstances. Likely, if there is a failure, the subsequent ReadCounter will fail anyway.

I will not be offended if you have a reason to turn down this contribution.

diff --git a/perf.go b/perf.go
index f58c25d..6538f6c 100644
--- a/perf.go
+++ b/perf.go
@@ -270,15 +270,9 @@ func (ev *Event) Measure(f func()) (Count, error) {
 	if err := ev.Reset(); err != nil {
 		return Count{}, err
 	}
-	if err := ev.Enable(); err != nil {
-		return Count{}, err
-	}
 
-	f()
+	doEnableRunDisable(uintptr(ev.perffd), f)
 
-	if err := ev.Disable(); err != nil {
-		return Count{}, err
-	}
 	return ev.ReadCount()
 }
 
@@ -290,15 +284,9 @@ func (ev *Event) MeasureGroup(f func()) (GroupCount, error) {
 	if err := ev.Reset(); err != nil {
 		return GroupCount{}, err
 	}
-	if err := ev.Enable(); err != nil {
-		return GroupCount{}, err
-	}
 
-	f()
+	doEnableRunDisable(uintptr(ev.perffd), f)
 
-	if err := ev.Disable(); err != nil {
-		return GroupCount{}, err
-	}
 	return ev.ReadGroupCount()
 }
 
diff --git a/perf_amd64.go b/perf_amd64.go
new file mode 100644
index 0000000..7006271
--- /dev/null
+++ b/perf_amd64.go
@@ -0,0 +1,7 @@
+package perf
+
+// doEnableRunDisable enables the counters, executes f, and disables them. It is
+// implemented in assembly to minimize non-deterministic overhead. It is assumed
+// that perfFD is known to be a valid file descriptor at the time of the call,
+// no error checking occurs.
+func doEnableRunDisable(perfFD uintptr, f func())
\ No newline at end of file
diff --git a/perf_amd64.s b/perf_amd64.s
new file mode 100644
index 0000000..b9aeb36
--- /dev/null
+++ b/perf_amd64.s
@@ -0,0 +1,23 @@
+#include "textflag.h"
+
+#define SYS_IOCTL 16
+#define PERF_EVENT_IOC_ENABLE 0x2400
+#define PERF_EVENT_IOC_DISABLE 0x2401
+
+TEXT ·doEnableRunDisable(SB),NOSPLIT,$0-16
+	MOVQ fd+0(FP), DI
+	MOVQ $PERF_EVENT_IOC_ENABLE, SI
+	MOVQ $SYS_IOCTL, AX
+	SYSCALL
+
+	// Overhead:
+	MOVQ f+8(FP), DX // 1
+	MOVQ 0(DX), AX // 2
+	CALL AX // 3, 4 (RET on the other side)
+
+	MOVQ fd+0(FP), DI // 5
+	MOVQ $PERF_EVENT_IOC_DISABLE, SI // 6
+	MOVQ $SYS_IOCTL, AX // 7
+	SYSCALL // 8
+
+	RET
diff --git a/perf_generic.go b/perf_generic.go
new file mode 100644
index 0000000..73357dd
--- /dev/null
+++ b/perf_generic.go
@@ -0,0 +1,17 @@
+//+build !amd64
+
+package perf
+
+import (
+	"golang.org/x/sys/unix"
+)
+
+// doEnableRunDisable enables the counters, executes f, and disables them. Where
+// possible it is implemented in assembly to minimize non-deterministic
+// overhead. It is assumed that perfFD is known to be a valid file descriptor at
+// the time of the call, no error checking occurs.
+func doEnableRunDisable(fd uintptr, f func()) {
+	unix.Syscall(unix.SYS_IOCTL, fd, uintptr(unix.PERF_EVENT_IOC_ENABLE), 0)
+	f()
+	unix.Syscall(unix.SYS_IOCTL, fd, uintptr(unix.PERF_EVENT_IOC_DISABLE), 0)
+}
acln0
left a comment
This looks good. Great idea. I want to merge, and I don't see any downsides. If you want to update the branch to use RawSyscall, please do, otherwise I will do it myself when I context switch into merging.
Thanks.
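A rough sketch of what the RawSyscall variant of the generic fallback might look like. It uses the standard library `syscall` package rather than `golang.org/x/sys/unix` so that it is self-contained, it is Linux-only, and the names `enableRunDisable`, `perfEventIOCEnable` and `perfEventIOCDisable` are made up for this example; the request numbers are the ones from the assembly in the diff. Unlike `Syscall`, `RawSyscall` does not notify the Go scheduler on entry and exit, which keeps extra runtime work out of the measured region.

```go
package main

import (
	"fmt"
	"syscall"
)

// ioctl request numbers, copied from the #define lines in perf_amd64.s.
const (
	perfEventIOCEnable  = 0x2400
	perfEventIOCDisable = 0x2401
)

// enableRunDisable is a hypothetical variant of doEnableRunDisable built on
// RawSyscall. As in the PR, the error returns are deliberately ignored.
func enableRunDisable(fd uintptr, f func()) {
	syscall.RawSyscall(syscall.SYS_IOCTL, fd, perfEventIOCEnable, 0)
	f()
	syscall.RawSyscall(syscall.SYS_IOCTL, fd, perfEventIOCDisable, 0)
}

func main() {
	ran := false
	// ^uintptr(0) is intentionally not a valid descriptor: both ioctls fail
	// and nothing is measured, but f still runs. A real caller would pass
	// the descriptor of an open perf event.
	enableRunDisable(^uintptr(0), func() { ran = true })
	fmt.Println("f ran:", ran)
}
```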
Signed-off-by: Andrei Tudor Călin <[email protected]>
Leaving open until we figure out the PC difference in the failing test.
The rule I was taught by another developer was "use Fatal if there is nothing
useful you can further do in the test, otherwise use Error". This test in
particular does do some useful things: it logs the stacks that it observed,
which immediately makes the problem clear to me.
Before, we'd just get the first message; now we get all of this:
```
--- FAIL: TestReadRecord/SampleTracepointStack (0.04s)
record_test.go:1101: Go (0x50a0ff) and kernel (0x513eff) PC differ
record_test.go:1101: Go (0x513eff) and kernel (0x5043fb) PC differ
record_test.go:1117: kernel callchain:
record_test.go:1110: 0xfffffffffffffe00 <nil>
record_test.go:1113: 0x5043fb /home/pwaller/go/pkg/mod/golang.org/x/[email protected]/unix/asm_linux_amd64.s:52 golang.org/x/sys/unix.RawSyscallNoError
record_test.go:1113: 0x513eff /home/pwaller/.local/src/acln.ro/perf/perf_amd64.s:18 acln.ro/perf.doEnableRunDisable
record_test.go:1113: 0x52681b /home/pwaller/.local/src/acln.ro/perf/record_test.go:1070 acln.ro/perf_test.testSampleTracepointStack
record_test.go:1113: 0x4c0bb0 /snap/go/4098/src/testing/testing.go:868 testing.tRunner
record_test.go:1113: 0x45aa41 /snap/go/4098/src/runtime/asm_amd64.s:1338 runtime.goexit
record_test.go:1122:
record_test.go:1124: Go stack:
record_test.go:1113: 0x513eff /home/pwaller/.local/src/acln.ro/perf/perf_amd64.s:18 acln.ro/perf.doEnableRunDisable
record_test.go:1113: 0x50a0ff /home/pwaller/.local/src/acln.ro/perf/perf.go:276 acln.ro/perf.(*Event).Measure
record_test.go:1113: 0x52681b /home/pwaller/.local/src/acln.ro/perf/record_test.go:1070 acln.ro/perf_test.testSampleTracepointStack
record_test.go:1113: 0x4c0bb0 /snap/go/4098/src/testing/testing.go:868 testing.tRunner
record_test.go:1113: 0x45aa41 /snap/go/4098/src/runtime/asm_amd64.s:1338 runtime.goexit
```
---
I have looked at the usage of Fatalf elsewhere in the tests and decided there
were too many of them for me to go and update them all quickly and without
making a mistake late at night :)
Updates acln0#13.
This fixes the SampleTracepointStack test, because the kernel stack trace wasn't able to 'see' the caller of doEnableRunDisable since we weren't maintaining the frame pointer. My original rationale for using NOSPLIT was to reduce the overhead of calling doEnableRunDisable, but that doesn't make sense since we're called with the counters inhibited anyway, so any stack splitting code won't be counted. The thing which needs to be NOSPLIT is the `f` to be called, which is under the control of the user.
The PC difference issue is fixed in cf4a345, with complete description in the commit. I had borked the kernel stack trace machinery by not maintaining the frame pointer.
Fantastic. I think this is ready to merge now.
Note: as it stands, this has a failing test. I haven't been able to figure out
why yet, and I have run out of time to look.