engine: support running ebpf programs at engine start #11548
Merged
eBPF was crucial in debugging #11545 and #11556. It also has quite a bit of potential for future debugging, monitoring, performance analysis, and possibly even user-facing tracing features (?)
Running it as a sidecar on my machine was very tedious; it's much nicer to have it in the engine itself and part of its logs. For the previous debugging I was shelling out to `bpftrace`, but getting that to build for wolfi was a nightmare and extremely slow, so I'm trying out `cilium/ebpf` as a library directly.

LLMs (claude code + codex) seem to be consistently very good at converting "here's what I want to trace" into working ebpf programs. The two programs checked in here (`ovl_inuse` for debugging overlay in-use warnings + `filetracer` for tracing all file syscalls) were entirely written by them.

- Added `engine/ebpf/AGENTS.md` to help out more here

We obviously don't want to always run these programs; currently they can be configured by:

- Setting `DAGGER_EBPF_PROG_<prog name>=y` (e.g. `DAGGER_EBPF_PROG_FILETRACER=y`)
- When `ExtraDebug` is on, progs can be enabled by default by uncommenting entries in the `defaultEbpfPrograms` map at the top of `cmd/engine/main.go`

Also added the ability to selectively enable progs per test workflow (add the names of the progs to the new arg added in `toolchains/test-split/engine-tests.gen.dang`), which is useful when trying to debug CI-only flakes.

Overall the overhead of tracing seems to be extremely low. The `filetracer` prog, which logs every file create/rename/delete/etc., doesn't seem to have much noticeable impact on test runtime (though the engine logs do end up with over 1 million lines).