@pwaller
Contributor

@pwaller pwaller commented Jul 19, 2019

In the spirit of minimizing variance, this reduces the overhead of Measure()
from ~300 instructions to ~8 on amd64. In practice the overhead will be a
little bit higher due to the stack frame management of whatever you are
calling. Be sure to annotate the called function with `//go:nosplit` if you
really want to get it down to something with minimal variance.
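
As a minimal sketch of the `//go:nosplit` advice above: the directive goes directly above the function that will be measured, which drops the stack-growth check from its prologue. The names `workload`, `measure`, and `sink` are hypothetical and only for illustration; `measure` stands in for `(*Event).Measure` so that the sketch runs without a perf file descriptor.

```go
package main

import "fmt"

// sink keeps the result observable so the loop is not optimized away.
var sink uint64

// workload is the function whose events we want to count. The directive
// below removes the stack-growth check from its prologue.
//
//go:nosplit
func workload() {
	var sum uint64
	for i := uint64(0); i < 1000; i++ {
		sum += i
	}
	sink = sum
}

// measure is a hypothetical stand-in for (*Event).Measure: it only calls f.
// The real method enables the counters before f and disables them after.
func measure(f func()) {
	f()
}

func main() {
	measure(workload)
	fmt.Println(sink) // sum of 0..999
}
```

A `//go:nosplit` function cannot grow its stack, so this only suits small leaf-like functions.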

One consideration for this PR is that it does the thing you should never do:
"ignore error returns". I believe it is justifiable in this case, since we call
Disable() and Reset() on those perf file descriptors just before going into the
path where we do not check errors, so I do not believe these should fail under
ordinary circumstances. Likely, if there is a failure, the subsequent
ReadCounter will fail anyway.

I will not be offended if you have a reason to turn down this contribution.

Note: as it stands, this has a failing test. I haven't been able to figure out
why yet, and I have run out of time to look.

--- FAIL: TestReadRecord/SampleTracepointStack (0.02s)
    record_test.go:1099: Go (0x50676f) and kernel (0x51039f) PC differ


diff --git a/perf.go b/perf.go
index f58c25d..6538f6c 100644
--- a/perf.go
+++ b/perf.go
@@ -270,15 +270,9 @@ func (ev *Event) Measure(f func()) (Count, error) {
 	if err := ev.Reset(); err != nil {
 		return Count{}, err
 	}
-	if err := ev.Enable(); err != nil {
-		return Count{}, err
-	}

-	f()
+	doEnableRunDisable(uintptr(ev.perffd), f)

-	if err := ev.Disable(); err != nil {
-		return Count{}, err
-	}
 	return ev.ReadCount()
 }

@@ -290,15 +284,9 @@ func (ev *Event) MeasureGroup(f func()) (GroupCount, error) {
 	if err := ev.Reset(); err != nil {
 		return GroupCount{}, err
 	}
-	if err := ev.Enable(); err != nil {
-		return GroupCount{}, err
-	}

-	f()
+	doEnableRunDisable(uintptr(ev.perffd), f)

-	if err := ev.Disable(); err != nil {
-		return GroupCount{}, err
-	}
 	return ev.ReadGroupCount()
 }

diff --git a/perf_amd64.go b/perf_amd64.go
new file mode 100644
index 0000000..7006271
--- /dev/null
+++ b/perf_amd64.go
@@ -0,0 +1,7 @@
+package perf
+
+// doEnableRunDisable enables the counters, executes f, and disables them. It is
+// implemented in assembly to minimize non-deterministic overhead. It is assumed
+// that perfFD is known to be a valid file descriptor at the time of the call,
+// no error checking occurs.
+func doEnableRunDisable(perfFD uintptr, f func())
\ No newline at end of file
diff --git a/perf_amd64.s b/perf_amd64.s
new file mode 100644
index 0000000..b9aeb36
--- /dev/null
+++ b/perf_amd64.s
@@ -0,0 +1,23 @@
+#include "textflag.h"
+
+#define SYS_IOCTL 16
+#define PERF_EVENT_IOC_ENABLE  0x2400
+#define PERF_EVENT_IOC_DISABLE 0x2401
+
+TEXT ·doEnableRunDisable(SB),NOSPLIT,$0-16
+  MOVQ fd+0(FP), DI
+  MOVQ $PERF_EVENT_IOC_ENABLE, SI
+  MOVQ $SYS_IOCTL, AX
+  SYSCALL
+
+                                   // Overhead:
+  MOVQ f+8(FP), DX                 // 1
+  MOVQ 0(DX), AX                   // 2
+  CALL AX                          // 3, 4 (RET on the other side)
+
+  MOVQ fd+0(FP), DI                // 5
+  MOVQ $PERF_EVENT_IOC_DISABLE, SI // 6
+  MOVQ $SYS_IOCTL, AX              // 7
+  SYSCALL                          // 8
+
+  RET
diff --git a/perf_generic.go b/perf_generic.go
new file mode 100644
index 0000000..73357dd
--- /dev/null
+++ b/perf_generic.go
@@ -0,0 +1,17 @@
+//+build !amd64
+
+package perf
+
+import (
+	"golang.org/x/sys/unix"
+)
+
+// doEnableRunDisable enables the counters, executes f, and disables them. Where
+// possible it is implemented in assembly to minimize non-deterministic
+// overhead. It is assumed that perfFD is known to be a valid file descriptor at
+// the time of the call, no error checking occurs.
+func doEnableRunDisable(fd uintptr, f func()) {
+	unix.Syscall(unix.SYS_IOCTL, fd, uintptr(unix.PERF_EVENT_IOC_ENABLE), 0)
+	f()
+	unix.Syscall(unix.SYS_IOCTL, fd, uintptr(unix.PERF_EVENT_IOC_DISABLE), 0)
+}
Owner

@acln0 acln0 left a comment

This looks good. Great idea. I want to merge, and I don't see any downsides. If you want to update the branch to use RawSyscall, please do, otherwise I will do it myself when I context switch into merging.
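
A rough sketch of what the RawSyscall variant of the generic fallback could look like, presumably motivated by the fact that `RawSyscall` skips the scheduler bookkeeping that `Syscall` performs on entry and exit. It uses the standard library `syscall` package rather than `golang.org/x/sys/unix` to stay self-contained, and assumes Linux. The name `enableRunDisableRaw` and the constant names are hypothetical; the request codes are the ones from the assembly in this PR.

```go
package main

import (
	"fmt"
	"syscall"
)

// Request codes copied from the assembly file in this PR.
const (
	perfEventIOCEnable  = 0x2400
	perfEventIOCDisable = 0x2401
)

// enableRunDisableRaw is a hypothetical variant of doEnableRunDisable that
// uses RawSyscall. As in the PR, errors from ioctl are deliberately ignored.
func enableRunDisableRaw(fd uintptr, f func()) {
	syscall.RawSyscall(syscall.SYS_IOCTL, fd, perfEventIOCEnable, 0)
	f()
	syscall.RawSyscall(syscall.SYS_IOCTL, fd, perfEventIOCDisable, 0)
}

func main() {
	ran := false
	// ^uintptr(0) is not a valid descriptor, so both ioctl calls fail.
	// The failure is ignored and f still runs, which is the trade-off
	// discussed in the PR description.
	enableRunDisableRaw(^uintptr(0), func() { ran = true })
	fmt.Println(ran)
}
```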

Thanks.

@acln0
Owner

acln0 commented Jul 20, 2019

Leaving open until we figure out the PC difference in the failing test.

pwaller added a commit to pwaller/acln0-perf that referenced this pull request Jul 20, 2019
The rule I was taught by another developer was "use Fatal if there is nothing
useful you can further do in the test, otherwise use Error". This test in
particular does do some useful things, it logs the stacks that it observed,
which immediately makes the problem clear to me.
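
To illustrate the rule, here is a small sketch, not the actual test code. The comparison reports every mismatch through `Errorf` and keeps going, where `Fatalf` would stop at the first one. The `reporter` interface, the `recorder` type, `comparePCs`, and the PC values are all hypothetical; `*testing.T` satisfies `reporter`, and `recorder` exists only so the sketch runs outside `go test`.

```go
package main

import "fmt"

// reporter is the subset of *testing.T used below.
type reporter interface {
	Errorf(format string, args ...interface{})
}

// recorder collects failure messages instead of failing a test.
type recorder struct{ msgs []string }

func (r *recorder) Errorf(format string, args ...interface{}) {
	r.msgs = append(r.msgs, fmt.Sprintf(format, args...))
}

// comparePCs reports every differing pair rather than stopping at the
// first, so the caller can still log both stacks afterwards.
func comparePCs(t reporter, goPCs, kernelPCs []uintptr) {
	for i := range goPCs {
		if i >= len(kernelPCs) {
			t.Errorf("kernel callchain has only %d entries", len(kernelPCs))
			return
		}
		if goPCs[i] != kernelPCs[i] {
			t.Errorf("Go (%#x) and kernel (%#x) PC differ", goPCs[i], kernelPCs[i])
		}
	}
}

func main() {
	r := &recorder{}
	comparePCs(r, []uintptr{0x10, 0x20, 0x30}, []uintptr{0x10, 0x21, 0x31})
	for _, m := range r.msgs {
		fmt.Println(m)
	}
}
```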

Before we'd just get the first message, now we get all of this:

```
    --- FAIL: TestReadRecord/SampleTracepointStack (0.04s)
        record_test.go:1101: Go (0x50a0ff) and kernel (0x513eff) PC differ
        record_test.go:1101: Go (0x513eff) and kernel (0x5043fb) PC differ
        record_test.go:1117: kernel callchain:
        record_test.go:1110: 0xfffffffffffffe00 <nil>
        record_test.go:1113: 0x5043fb /home/pwaller/go/pkg/mod/golang.org/x/[email protected]/unix/asm_linux_amd64.s:52 golang.org/x/sys/unix.RawSyscallNoError
        record_test.go:1113: 0x513eff /home/pwaller/.local/src/acln.ro/perf/perf_amd64.s:18 acln.ro/perf.doEnableRunDisable
        record_test.go:1113: 0x52681b /home/pwaller/.local/src/acln.ro/perf/record_test.go:1070 acln.ro/perf_test.testSampleTracepointStack
        record_test.go:1113: 0x4c0bb0 /snap/go/4098/src/testing/testing.go:868 testing.tRunner
        record_test.go:1113: 0x45aa41 /snap/go/4098/src/runtime/asm_amd64.s:1338 runtime.goexit
        record_test.go:1122:
        record_test.go:1124: Go stack:
        record_test.go:1113: 0x513eff /home/pwaller/.local/src/acln.ro/perf/perf_amd64.s:18 acln.ro/perf.doEnableRunDisable
        record_test.go:1113: 0x50a0ff /home/pwaller/.local/src/acln.ro/perf/perf.go:276 acln.ro/perf.(*Event).Measure
        record_test.go:1113: 0x52681b /home/pwaller/.local/src/acln.ro/perf/record_test.go:1070 acln.ro/perf_test.testSampleTracepointStack
        record_test.go:1113: 0x4c0bb0 /snap/go/4098/src/testing/testing.go:868 testing.tRunner
        record_test.go:1113: 0x45aa41 /snap/go/4098/src/runtime/asm_amd64.s:1338 runtime.goexit
```

---

I have looked at the usage of Fatalf elsewhere in the tests and decided there
were too many of them for me to go and update them all quickly and without
making a mistake late at night :)

Updates acln0#13.
This fixes the SampleTracepointStack test, because the kernel stack trace
wasn't able to 'see' the caller of doEnableRunDisable since we weren't
maintaining the frame pointer.

My original rationale for using NOSPLIT was to reduce the overhead of calling
doEnableRunDisable, but that doesn't make sense since we're called with the
counters inhibited anyway, so any stack splitting code won't be counted.

The thing which needs to be NOSPLIT is the `f` to be called, which is under the
control of the user.
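
For context on the "Go stack" side of that comparison, the following is a sketch of how a Go program can capture and symbolize its own call stack with `runtime.Callers` and `runtime.CallersFrames`. It is an assumption that the test does something along these lines; the helper name `goStack` is hypothetical. The Go runtime unwinds using its own tables, whereas the kernel callchain in this PR depended on the frame pointer being maintained, which is why only the kernel side lost the caller.

```go
package main

import (
	"fmt"
	"runtime"
)

// goStack returns the function names on the current goroutine's stack,
// innermost first, starting with goStack itself.
func goStack() []string {
	pcs := make([]uintptr, 32)
	// skip=1 drops runtime.Callers, so the first entry is goStack.
	n := runtime.Callers(1, pcs)
	frames := runtime.CallersFrames(pcs[:n])
	var names []string
	for {
		fr, more := frames.Next()
		names = append(names, fr.Function)
		if !more {
			break
		}
	}
	return names
}

func main() {
	for _, name := range goStack() {
		fmt.Println(name)
	}
}
```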
@pwaller
Contributor Author

pwaller commented Jul 20, 2019

The PC difference issue is fixed in cf4a345, with complete description in the commit. I had borked the kernel stack trace machinery by not maintaining the frame pointer.

@acln0
Owner

acln0 commented Jul 20, 2019

Fantastic. I think this is ready to merge now.

acln0 pushed a commit that referenced this pull request Jul 20, 2019
@acln0 acln0 merged commit 6861f4b into acln0:master Jul 20, 2019