interpreter: inline ebpf instruction decoding #101
Merged
This is a relatively crude hammer that brings the delta between the JIT and
the interpreter from 17x to 12x (as measured on top of #100) in the
`bench_jit_vs_interpreter_empty_for_loop` benchmark. On every iteration
we'd spend significant time calling into this small helper function, which
returns a non-trivial structure and therefore requires significant setup
at each call.

This change should also help JIT compilation speed quite a bit, but I
haven't measured it.
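As a rough sketch of what "inlining the decoder" means, here is a toy Rust interpreter. It assumes the standard 8-byte eBPF slot layout (opcode, 4-bit dst/src registers, 16-bit offset, 32-bit immediate, little-endian). `Insn`, `decode_insn`, `encode`, and `run` are hypothetical names, not this project's actual API, and only four opcodes are handled, just enough for an empty counting loop. The decoder is marked `#[inline(always)]` so the loop doesn't pay for a call and a struct return on every iteration. Note that it still decodes every field eagerly.

```rust
/// Hypothetical decoded form of one 8-byte eBPF instruction slot.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Insn {
    pub opc: u8,
    pub dst: u8,
    pub src: u8,
    pub off: i16,
    pub imm: i32,
}

pub const INSN_SIZE: usize = 8;

/// Decodes slot `pc`. `#[inline(always)]` asks rustc to expand this body in
/// the caller, so no call is emitted and no `Insn` is returned through the
/// calling convention.
#[inline(always)]
pub fn decode_insn(prog: &[u8], pc: usize) -> Insn {
    let b = &prog[pc * INSN_SIZE..(pc + 1) * INSN_SIZE];
    Insn {
        opc: b[0],
        dst: b[1] & 0x0f,
        src: b[1] >> 4,
        off: i16::from_le_bytes([b[2], b[3]]),
        imm: i32::from_le_bytes([b[4], b[5], b[6], b[7]]),
    }
}

/// Helper for building test programs: encodes one instruction slot.
pub fn encode(opc: u8, dst: u8, src: u8, off: i16, imm: i32) -> [u8; 8] {
    let o = off.to_le_bytes();
    let i = imm.to_le_bytes();
    [opc, (src << 4) | (dst & 0x0f), o[0], o[1], i[0], i[1], i[2], i[3]]
}

/// Toy interpreter loop (hypothetical). There is no verifier, so a malformed
/// program simply panics on an out-of-bounds index.
pub fn run(prog: &[u8]) -> u64 {
    let mut reg = [0u64; 11];
    let mut pc = 0usize;
    loop {
        let insn = decode_insn(prog, pc); // eagerly decodes every field
        pc += 1;
        let d = insn.dst as usize;
        let imm = insn.imm as i64 as u64; // sign-extend imm to 64 bits
        match insn.opc {
            0xb7 => reg[d] = imm,                      // mov64 dst, imm
            0x07 => reg[d] = reg[d].wrapping_add(imm), // add64 dst, imm
            0xa5 => {
                // jlt dst, imm, +off: unsigned compare; off is relative to the next slot
                if reg[d] < imm {
                    pc = (pc as isize + insn.off as isize) as usize;
                }
            }
            0x95 => return reg[0], // exit
            op => panic!("unsupported opcode {op:#04x}"),
        }
    }
}
```

For example, `r0 = 0; loop { r0 += 1 } while r0 < 10; exit` is encoded as `mov64 r0, 0`, `add64 r0, 1`, `jlt r0, 10, -2`, `exit`, and `run` returns 10 for it.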
The logic flow here still feels quite inefficient, however, and this
specific part of the code warrants further attention. In particular,
even after this change the machine code eagerly decodes the entire
instruction into its parts rather than grabbing just the opcode it needs
to pick the instruction to process. At that point there might be another
instruction word's worth of arguments to decode…
It might (or might not!) be better to decode just the opcode and leave
it up to the compiler how (and whether) to hoist decoding of the
arguments.
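The alternative floated above can be sketched as follows. This is a toy Rust interpreter that reads only the opcode byte up front, and each match arm then reads just the fields it uses through small inlined accessors. It also covers a case where there really is another instruction word of arguments: in eBPF, `lddw` (opcode `0x18`) spans two 8-byte slots, with the low 32 bits of its 64-bit immediate in the first slot and the high 32 bits in the second. All names (`opcode`, `dst`, `off`, `imm`, `encode`, `run_lazy`) are hypothetical. Whether this layout actually yields better machine code is exactly the open question above and has not been measured.

```rust
pub const INSN_SIZE: usize = 8;

// Field accessors (hypothetical): each reads only the bytes it needs from slot `pc`.
#[inline(always)]
fn opcode(prog: &[u8], pc: usize) -> u8 {
    prog[pc * INSN_SIZE]
}
#[inline(always)]
fn dst(prog: &[u8], pc: usize) -> usize {
    (prog[pc * INSN_SIZE + 1] & 0x0f) as usize
}
#[inline(always)]
fn off(prog: &[u8], pc: usize) -> i16 {
    let b = pc * INSN_SIZE;
    i16::from_le_bytes([prog[b + 2], prog[b + 3]])
}
#[inline(always)]
fn imm(prog: &[u8], pc: usize) -> i32 {
    let b = pc * INSN_SIZE;
    i32::from_le_bytes([prog[b + 4], prog[b + 5], prog[b + 6], prog[b + 7]])
}

/// Helper for building test programs: encodes one instruction slot.
pub fn encode(opc: u8, dst: u8, src: u8, off: i16, imm: i32) -> [u8; 8] {
    let o = off.to_le_bytes();
    let i = imm.to_le_bytes();
    [opc, (src << 4) | (dst & 0x0f), o[0], o[1], i[0], i[1], i[2], i[3]]
}

pub fn run_lazy(prog: &[u8]) -> u64 {
    let mut reg = [0u64; 11];
    let mut pc = 0usize;
    loop {
        // Only the opcode is read up front; each arm decodes what it uses.
        match opcode(prog, pc) {
            0x18 => {
                // lddw dst, imm64: occupies two slots. The low 32 bits come from
                // this slot's imm and the high 32 bits from the next slot's imm.
                let lo = imm(prog, pc) as u32 as u64;
                let hi = imm(prog, pc + 1) as u32 as u64;
                reg[dst(prog, pc)] = (hi << 32) | lo;
                pc += 2;
            }
            0xb7 => {
                // mov64 dst, imm (imm sign-extended to 64 bits)
                reg[dst(prog, pc)] = imm(prog, pc) as i64 as u64;
                pc += 1;
            }
            0x07 => {
                // add64 dst, imm
                let d = dst(prog, pc);
                reg[d] = reg[d].wrapping_add(imm(prog, pc) as i64 as u64);
                pc += 1;
            }
            0xa5 => {
                // jlt dst, imm, +off: unsigned compare; target is relative to the next slot
                let taken = reg[dst(prog, pc)] < imm(prog, pc) as i64 as u64;
                let next = pc as isize + 1;
                pc = if taken { (next + off(prog, pc) as isize) as usize } else { next as usize };
            }
            0x95 => return reg[0], // exit
            op => panic!("unsupported opcode {op:#04x}"),
        }
    }
}
```

In this shape the optimizer is free to schedule (or skip) each field read per arm instead of materializing a full decoded instruction first. Whether it does anything useful with that freedom depends on the compiler.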