A minimal, predictable CLI to record Apple‑Silicon PMU counters and compare multiple commands.
- Requires Zig 0.15.1
- Build with zig build --release=fast
- Running lauka record requires sudo (PMU counters need elevated access).
# Record a single command
lauka [options] -- <command>
# Compare two or more commands (aggregated per command)
lauka [options] -- '<cmd1>' '<cmd2>' ['<cmd3>' ...]
# Explicit subcommands (optional)
lauka record [options] -- '<cmd1>' '<cmd2>' ['<cmd3>' ...]
# List counters
lauka counters
lauka counters --details
Usage:
lauka [options] -- <command>
lauka [options] -- '<command_1>' '<command_2>' ['<command_3>' ...]
lauka <subcommand> [options]
Subcommands:
record    Run one or more commands; aggregated stats per command (deltas vs first when multiple)
counters  List available counters (names; optional descriptions and incompatibility flags)
version   Show version
help      Show help for any command
Options:
-n, --runs <N>             number of measured runs (default: 3, minimum: 3)
--warmup <N>               warmup runs before measuring (default: 0)
-m, --measurements <list>  comma-separated counters
--color <when>             auto (default), never, ansi
-h, --help                 show help
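To make the option semantics concrete, here is a minimal Python sketch of a parser for the top-level flags listed above. It is not lauka's implementation (lauka is written in Zig): LaukaArgs, runs_type, counter_list, and build_parser are hypothetical names, subcommands are not modelled, and routing usage errors to exit code 1 is an assumption based on the exit codes this README documents.

```python
import argparse
import sys


class LaukaArgs(argparse.ArgumentParser):
    """Hypothetical parser: argparse exits with 2 on bad usage; lauka documents 1."""

    def error(self, message: str):
        self.print_usage(sys.stderr)
        self.exit(1, f"error: {message}\n")


def runs_type(value: str) -> int:
    # Enforce the documented minimum of 3 measured runs.
    n = int(value)
    if n < 3:
        raise argparse.ArgumentTypeError(f"minimum runs is 3 (got {n})")
    return n


def counter_list(value: str) -> list[str]:
    # "-m a,b,c" -> ["a", "b", "c"]; empty items are dropped.
    return [name.strip() for name in value.split(",") if name.strip()]


def build_parser() -> argparse.ArgumentParser:
    p = LaukaArgs(prog="lauka")  # -h/--help is added by argparse itself
    p.add_argument("-n", "--runs", type=runs_type, default=3)
    p.add_argument("--warmup", type=int, default=0)
    p.add_argument("-m", "--measurements", type=counter_list, default=None)
    p.add_argument("--color", choices=["auto", "never", "ansi"], default="auto")
    # Everything after "--" lands here, one string per command.
    p.add_argument("commands", nargs="+")
    return p
```

With this sketch, parsing -n 2 -- ./app prints the usage line plus "error: argument -n/--runs: minimum runs is 3 (got 2)" and exits with status 1.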
Notes
- If any child exits non-zero, lauka forwards that exit code (for multiple commands, the first failing code).
- Everything after -- is passed verbatim to the child command list.
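The exit-code rule above can be sketched in Python as follows. This illustrates the stated behavior under assumptions and is not lauka's code: run_all is a hypothetical name, each command string is split with shlex.split instead of being handed to a shell, and later commands still run after an earlier one fails (the README only says which code is forwarded).

```python
import shlex
import subprocess
import sys


def run_all(commands: list[str]) -> int:
    """Run each command once; return 0 if all succeed, else the first non-zero exit code."""
    first_failure = 0
    for cmd in commands:
        sys.stdout.flush()  # keep our own output ordered before the child's
        # Assumption: tokenize like a POSIX shell, but do not run through one.
        code = subprocess.run(shlex.split(cmd)).returncode
        if code != 0 and first_failure == 0:
            first_failure = code  # remember only the first failure
    return first_failure
```

A wrapper would end with sys.exit(run_all(commands)), so the calling shell sees the failing child's status.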
lauka [options] -- '<cmd1>' ['<cmd2>' ['<cmd3>' ...]]
Behavior
- Run each command --warmup times (ignored in stats), then --runs measured times.
- Compute per metric: mean, stddev, min, max, and outliers.
- Default execution is sequential: run all iterations of the first command, then the second, and so on.
- Output shows a separate aggregated table for each command; for every command after the first, a delta column gives each metric's change vs the first (baseline) command.
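The default sequential schedule can be sketched as below, assuming a pluggable run_once callback that executes a command once and returns its metrics. benchmark and wall_time_once are hypothetical names, and this sketch measures wall time only, whereas lauka also collects peak RSS and PMU counters.

```python
import shlex
import subprocess
import time
from typing import Callable

Sample = dict[str, float]  # metric name -> value for one run


def benchmark(commands: list[str], runs: int, warmup: int,
              run_once: Callable[[str], Sample]) -> list[list[Sample]]:
    """Run all iterations of command 1, then command 2, and so on."""
    results: list[list[Sample]] = []
    for cmd in commands:
        for _ in range(warmup):
            run_once(cmd)  # warmup result is discarded and never enters the stats
        results.append([run_once(cmd) for _ in range(runs)])
    return results


def wall_time_once(cmd: str) -> Sample:
    # Hypothetical measurement: wall-clock seconds for one child run.
    start = time.perf_counter()
    subprocess.run(shlex.split(cmd))
    return {"wall_time": time.perf_counter() - start}
```

For example, benchmark(["./old --opt=0", "./new --opt=1"], runs=9, warmup=0, run_once=wall_time_once) returns one list of nine samples per command, in command order.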
Examples
# Two commands, sequential (default)
lauka -n 9 -m core_active_cycle,inst_all,branch_mispred_nonspec,l1d_cache_miss_ld_nonspec -- \
'./old --opt=0' \
'./new --opt=1'
# Three commands
a="prog --size 1e6 --mode=A"; b="prog --size 1e6 --mode=B"; c="prog --size 1e6 --mode=C"
lauka -n 7 -m l1d_cache_miss_ld_nonspec,branch_mispred_nonspec -- "$a" "$b" "$c"
lauka counters [--details] [--no-headers]
Behavior
- Default: print names only.
- With --details: include description and incompatibility flags for each counter.
Options
-d, --details   show name, description, and incompatibility flags
--no-headers    hide headers for --details output
Incompatibility flags (shown only with --details)
incompat: none  no known constraint
incompat: pair  has pairwise incompatibility constraints
incompat: quad  has quad incompatibility constraints
Examples
lauka counters
lauka counters --details
- One block per command.
- Columns: measurement, mean ± σ, min … max, outliers, and delta (compare mode only, for commands after the baseline).
- Colors follow --color: auto, never, ansi.
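The per-metric columns can be reproduced with a short Python sketch. Two details are assumptions because this README does not define them: σ is taken as the sample standard deviation (n - 1 denominator), and a run counts as an outlier when it falls outside Tukey's fences, Q1 - 1.5*IQR to Q3 + 1.5*IQR. summarize is a hypothetical name.

```python
import statistics


def summarize(samples: list[float]) -> dict[str, float]:
    """Values behind the columns mean ± σ, min … max, and outliers for one metric.

    Assumptions: sample standard deviation; outliers via Tukey's 1.5*IQR fences.
    """
    n = len(samples)
    if n < 3:
        raise ValueError(f"minimum runs is 3 (got {n})")
    q1, _, q3 = statistics.quantiles(samples, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = sum(1 for x in samples if x < low or x > high)
    return {
        "mean": statistics.fmean(samples),
        "stddev": statistics.stdev(samples),
        "min": min(samples),
        "max": max(samples),
        "outliers": outliers,
        "outlier_pct": 100.0 * outliers / n,
    }
```

Note that when most runs are identical the IQR collapses to zero, so under this rule any run that differs at all is flagged; a tool may choose a different rule for that reason.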
Example (two commands)
Benchmark 1 (9 runs): ./build-old
measurement                mean ± σ         min … max          outliers
wall_time                  591ms ± 7.6ms    583ms … 605ms      0 (0%)
peak_rss                   137MB ± 0.3MB    136.6MB … 137.4MB  0 (0%)
core_active_cycle          2.51G ± 22.1M    2.48G … 2.54G      0 (0%)
inst_all                   3.62G ± 23.9M    3.53G … 3.69G      0 (0%)
l1d_cache_miss_ld_nonspec  3.58M ± 31.7K    3.54M … 3.63M      0 (0%)
branch_mispred_nonspec     21.4M ± 58.2K    21.3M … 21.5M      0 (0%)
Benchmark 2 (9 runs): ./build-new -O2
measurement                mean ± σ         min … max          outliers  delta
wall_time                  130ms ± 8.3ms    125ms … 141ms      0 (0%)    ⚡ −78.0% ± 0.5%
peak_rss                   91.9MB ± 0.09MB  91.8MB … 92.1MB    0 (0%)    −32.9% ± 0.1%
core_active_cycle          507M ± 2.35M     503M … 511M        0 (0%)    −79.8% ± 0.1%
inst_all                   796M ± 10.7M     781M … 809M        0 (0%)    −78.0% ± 0.1%
l1d_cache_miss_ld_nonspec  352K ± 7.7K      318K … 355K        0 (0%)    −90.2% ± 0.1%
branch_mispred_nonspec     4.52M ± 11.5K    4.51M … 4.57M      2 (5%)    −78.9% ± 0.0%
The delta column is relative to the first command (baseline). Signs and glyphs may be colorized depending on --color.
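The delta percentage is the relative change of a metric's mean against the baseline command. Below is a minimal sketch with hypothetical names delta_pct and format_delta; the ± part of the column is an uncertainty estimate whose formula this README does not specify, so it is left out. Plugging in the wall_time means from the example (591 ms baseline, 130 ms) gives about −78.0%, matching the table.

```python
def delta_pct(baseline_mean: float, mean: float) -> float:
    """Change of a metric's mean relative to the first (baseline) command, in percent."""
    if baseline_mean == 0:
        raise ZeroDivisionError("baseline mean is 0; relative delta is undefined")
    return (mean - baseline_mean) / baseline_mean * 100.0


def format_delta(pct: float) -> str:
    # Hypothetical rendering: explicit sign, one decimal, Unicode minus for negatives.
    return f"{pct:+.1f}%".replace("-", "\u2212")
```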
- Default: names only (one per line).
- With --details: add description and incompatibility flags.
Example (--details):
name                       incompat   description
core_active_cycle          none       Cycles while the core was active
inst_all                   pair,quad  All retired instructions
l1d_cache_miss_ld_nonspec  quad       Retired loads that missed in the L1D
branch_mispred_nonspec     quad       Retired branches mispredicted
0 – success
1 – usage error (bad flags, missing command, quoting error)
2 – PMU scheduling/collection error
error: minimum runs is 3 (got 2)
fix: use -n 3 or higher
error: command contains spaces/metacharacters; wrap it in quotes
example: lauka -- 'myapp --flag 1'
error: requested measurements could not be scheduled on Apple M3
tip: remove one of the conflicting counters or try a smaller set
If lauka was interrupted or killed, the next run may print:
error(lauka): SetTimerCount
error(lauka): SetTimerPeriod with Action ID = 1
error(lauka): SetTimer with Action ID = 1 and Timer ID = 1
error(lauka): SetTimerPet with Timer ID = 1
error(lauka): SetLightweightPet
Workaround: ignore it; no change in recording behavior has been observed. To clear it, complete one successful lauka record run; subsequent runs should be clean.
This tool couldn't exist without the following projects and their authors:
- reverse-engineered kperf API by ibireme, which made it possible to access PMU counters on Apple Silicon.
- poop by Andrew Kelley, a great tool for monitoring PMU counters on Linux.
- scoop by tensorush, who ported the reverse-engineered kperf API to Zig, wrapped it in a nice library, and opened the PR that added support for reading CPU counters on Macs to poop.
This tool is basically a merge of poop and scoop, with a rewritten CLI and extended functionality.