circe-sanely-auto

Drop-in replacement for circe's derivation on Scala 3. Compile faster, run faster, change nothing else.

Scala 3.8.2+ | JVM + Scala.js | ✅ 327 tests passing

The circe problem

circe is the most widely used JSON library in the Scala ecosystem. It's well-designed, well-tested, and deeply embedded in production codebases everywhere — from startups to large enterprises. Frameworks like Tapir, http4s, and Pekko HTTP all integrate with circe. If you write Scala, you probably use circe.

But circe has two performance problems:

Slow compilation. circe-generic's auto derivation triggers a new round of implicit resolution for every nested type. For a codebase with hundreds of types, these rounds compound — each type waits for all its fields to be resolved, which wait for their fields, and so on. A clean compile of 350 types takes over 9 seconds just for derivation. In large monorepos with thousands of types, this adds minutes to every build.

Slow runtime. circe parses JSON into an intermediate Json AST, then traverses that AST to build your domain objects. Every request allocates a full tree of Json nodes only to throw them away immediately. This parse-allocate-traverse pipeline caps throughput at ~130K ops/sec and allocates ~28 KB per decode for a typical 1.4 KB payload.

The catch: you can't just switch libraries. circe is not merely a dependency — it's infrastructure. Production codebases have io.circe.Json as field types in domain models, cursor navigation in business logic, tree manipulation in API layers, and framework integrations wired to Encoder[T]/Decoder[T]. Migrating away from circe is a rewrite, not a refactor.

What this library does

circe-sanely-auto fixes both problems without changing anything else:

The contract: 100% circe compatibility, zero compromise. Same JSON output, same decoded values, same error messages, same behavior in every edge case. You swap the dependency and nothing changes except performance. Any difference is a bug. No exceptions, no "close enough", no subtle surprises. This is the promise that makes or breaks this library.

This contract is enforced by 327 tests — including 192 property-based tests auto-generated from circe's own test suite using the same types, same Arbitrary instances, and same property checks.

The numbers

Compile time — 2.9–3.6x faster

~350 types, measured with hyperfine:

	circe	circe-sanely-auto
Auto derivation	9.25s	2.56s	3.6x faster
Configured derivation	2.74s	1.27s	2.2x faster
Bytecode	3,384 KB	2,635 KB	-22%
Peak RSS	1,133 MB	893 MB	-21%

CI numbers (ubuntu-latest, GitHub Actions shared runners)

Shared runners are noisier than dedicated hardware. Absolute times are ~3x slower; ratios are the meaningful metric:

	circe	circe-sanely-auto
Auto derivation	21.0s	7.2s	2.9x faster
Configured derivation	6.6s	4.6s	1.4x faster
Peak RSS (auto)	1,041 MB	863 MB	-17%

The configured speedup (1.4x on CI vs 2.2x locally) is compressed by CI noise — sanely's σ is 1.26s on a 4.6s mean (27% CoV), while circe-generic's σ is 0.25s (4% CoV). The speedup is real but smaller workloads are harder to measure reliably on shared runners. Full CI results: BENCHMARK.md.

Runtime — 5.6x faster reads, 9.0x faster writes

~1.4 KB JSON payload, nested products, sealed traits, optional fields. JMH (1 fork, 5 warmup + 5 measurement iterations of 1s each):

Approach	Reading (ops/sec)		Writing (ops/sec)
circe + jawn (baseline)	133,568 ± 1,334	1.0x	132,620 ± 2,162	1.0x
circe + jsoniter bridge	208,474 ± 3,217	1.6x	125,410 ± 1,816	0.9x
sanely-jsoniter	753,375 ± 11,453	5.6x	1,194,849 ± 15,503	9.0x
jsoniter-scala native	686,731 ± 13,820	5.1x	989,189 ± 15,742	7.5x

sanely-jsoniter surpasses jsoniter-scala native — +10% reads, +21% writes — on Apple Silicon, while remaining competitive on x86. Both dramatically outperform circe-jawn.

Compile-time cost of adding sanely-jsoniter

Adding jsoniter codecs on top of existing circe codecs. 199 types, measured with hyperfine:

Module	Mean ± σ
circe codecs only (baseline)	809ms ± 39ms	1.0x
circe + jsoniter codecs	850ms ± 18ms	1.05x
Marginal cost	~42ms	~0.2ms/type

The jsoniter codec derivation adds negligible compile time — under 50ms for 199 types (~5% overhead). This is possible because sanely-jsoniter's macro resolves codecs in a single expansion pass, reusing the same caching and resolution infrastructure as the circe codec macro.

CI numbers (ubuntu-latest, GitHub Actions shared runners)

Approach	Reading (ops/sec)		Writing (ops/sec)
circe + jawn (baseline)	~90K	1.0x	~58K	1.0x
sanely-jsoniter	~370K	4.1x	~378K	6.5x
jsoniter-scala native	~362K	4.0x	~422K	7.3x

On CI (x86), jsoniter-scala is slightly faster on writes while sanely-jsoniter is slightly faster on reads. On Apple Silicon (M3 Max), sanely-jsoniter leads on both axes (+10% reads, +21% writes). The performance difference between the two is small and architecture-dependent — both are in the same league. Full CI results: BENCHMARK.md.

Benchmark methodology & cross-session stability

Local environment: Apple M3 Max (10P + 4E cores), 36 GB RAM, macOS 26.3, OpenJDK 25.0.2 (Homebrew, aarch64), Mill 1.1.2, Scala 3.8.2.

CI environment: ubuntu-latest (GitHub Actions shared runners), x86_64, OpenJDK 25. CPU governor set to performance, turbo boost disabled. Full results tracked in BENCHMARK.md.

Compile-time: Measured with hyperfine (bash bench.sh 10). Each run cleans only the benchmark module's output then recompiles ~350 types (auto) or ~230 types (configured). One untimed warmup run ensures the Mill daemon JVM is JIT-warm. Ten timed runs follow, with hyperfine randomizing execution order. Dependencies are pre-compiled and cached. Cross-session stability: speedup ranged 3.55x–3.61x across 20 runs in 2 separate sessions (local). The jsoniter marginal cost benchmark (bash bench.sh --jsoniter 10) compiles 199 types with circe codecs only vs circe + jsoniter codecs, measuring the additional compile-time cost of adopting sanely-jsoniter.

Runtime: JMH 1.37 via Mill's contrib.jmh module. 1 fork, 5 warmup + 5 measurement iterations of 1 second each, -Xms512m -Xmx512m. JMH provides fork isolation, compiler blackhole dead-code prevention, and statistical error reporting (99.9% CI). Run with ./mill benchmark-jmh.runJmh.

Fairness: Both libraries compile the same source files, same Scala version, same JVM, same Mill daemon, in the same hyperfine invocation. The only difference is the derivation import.

How to use it

Step 1: Faster compilation

Swap one dependency, change one import:

- mvn"io.circe::circe-generic:0.14.x"
+ mvn"io.github.nguyenyou::circe-sanely-auto:0.23.0"

- import io.circe.generic.auto._
+ import io.circe.generic.auto.given

That's it. Same JSON format, same API, same behavior. Everything else stays the same — semiauto (deriveEncoder, deriveDecoder, deriveCodec), configured (deriveConfiguredCodec, deriveEnumCodec), and all io.circe.derivation.Configuration options work identically.

Step 2: Faster runtime (optional)

For HTTP hot paths where runtime throughput matters, sanely-jsoniter generates JsonValueCodec[T] instances that skip the Json tree entirely — bytes go directly to domain objects:

mvn"io.github.nguyenyou::sanely-jsoniter:0.23.0"

import sanely.jsoniter.auto.given  // auto-derives JsonValueCodec for all types
import com.github.plokhotnyuk.jsoniter_scala.core.*

case class User(name: String, age: Int)

val json = writeToString(User("Alice", 30))  // {"name":"Alice","age":30}
val user = readFromString[User](json)        // User(Alice, 30)

Both circe codecs and jsoniter codecs coexist — they're different types, no conflicts. circe stays for everything that needs the Json AST (cursor navigation, tree merging, programmatic construction). Only the serialization hot path changes.

There's also a lighter option — pair circe-sanely-auto with the jsoniter-scala-circe bridge for a drop-in 1.5x decode speedup without changing any codec code.

See the sanely-jsoniter README for full documentation, configured derivation, and migration guide.

How it works

Compile time: one macro, no search chains

Based on Mateusz Kubuszok's sanely-automatic derivation technique. The core problem with circe-generic is how inline given and summonInline interact: every nested type triggers a new round of implicit resolution through the compiler, and each round must wait for all its dependencies. For 300 types with 5 fields average, that's 1,500+ implicit searches the compiler performs sequentially.

circe-sanely-auto replaces this with a single macro expansion:

An inline given autoEncoder[A] delegates to a macro
The macro uses Expr.summonIgnoring (Scala 3.7+) to search for existing instances — excluding our own auto-given from the search to prevent infinite loops
If a user-provided instance exists, it's used. Otherwise, the macro derives it recursively within the same expansion

One macro call derives everything. The compiler never re-enters implicit search. Seven optimizations on top of this reduce both macro expansion time and generated AST size:

Single-pass codec derivation — deriveConfiguredCodec resolves both encoder and decoder in one traversal, sharing one cache, one Mirror summon, and one builtin check per type (230 vs 460 expansions)
Builtin short-circuit — primitives resolve directly to circe instances without summonIgnoring, saving ~66% of implicit searches
Container composition — Option[String], List[Int], Map[String, T] etc. are composed directly from cached inner codecs (10 container types supported)
Factory method consolidation — 11 runtime factory methods replace per-expansion anonymous class generation, reducing generated AST size by ~30-50%
Sub-trait detection cache — O(1) set lookup replaces redundant summonIgnoring calls for sealed trait sub-type detection (-90% per-call time)
Codec-first summon — tries Codec.AsObject[T] before separate Encoder[T] + Decoder[T], reducing configured implicit searches by 30%
Cross-expansion cache — 75% hit rate in auto derivation, avoiding redundant derivations for repeated types

Runtime: no intermediate AST, direct streaming

circe's pipeline allocates a Json tree on every request:

bytes  →  Json tree (allocated)  →  Decoder[T]  →  T
T  →  Encoder[T]  →  Json tree (allocated)  →  bytes

sanely-jsoniter eliminates the tree entirely, using jsoniter-scala as the streaming engine:

bytes  →  JsonValueCodec[T]  →  T
T  →  JsonValueCodec[T]  →  bytes

The macro generates optimized codec bodies using TASTy API:

Typed local variables — var _0: Int = 0; var _1: String = null instead of Array[Any], keeping primitives unboxed on the stack during the decode loop
Hash-based field dispatch — for types with > 8 fields, (in.charBufToHashCode(l): @switch) match { ... } with compile-time pre-computed hashes; small types keep a linear if-else chain
Direct primitive read/write calls — in.readInt(), out.writeVal(x.age) instead of virtual dispatch through codec.decodeValue()/codec.encodeValue() for primitive fields
Inline container-of-primitive writes — Option[String], List[Int], etc. are written inline (if _o.isDefined then out.writeVal(_o.get) else out.writeNull()) instead of dispatching through container codecs, eliminating virtual calls for the most common container patterns
Direct constructor call — new P(_f0, _f1, ...) instead of mirror.fromProduct(ArrayProduct(Array(...))), eliminating primitive boxing and two intermediate allocations per product decode
Branchless product encoding — every field is unconditionally written in a straight-line sequence: write-key, write-value, write-key, write-value. No per-field conditional checks

This brings sanely-jsoniter to surpass jsoniter-scala native on both reads and writes on Apple Silicon, while remaining competitive on x86 — all while maintaining full circe wire compatibility, with 89–96% less allocation than circe-jawn.

Why sanely-jsoniter is competitive with jsoniter-scala native

Writes: jsoniter-scala's default configuration (transientNone, transientEmpty, transientDefault all enabled) generates per-field conditional branches. sanely-jsoniter writes every field unconditionally — the circe format requires null for None and always includes all fields. The macro emits a branchless straight-line sequence with inline expansion for Option[prim] and List[prim] writes (no virtual dispatch for common container patterns). On Apple Silicon, this gives sanely-jsoniter a +21% edge; on x86 CI runners, the difference is smaller and architecture-dependent.

Reads: Both use the same direct constructor pattern (new P(f0, f1, ...)), typed locals, and hash-based field dispatch. sanely-jsoniter's simpler field semantics (no transient/default field skipping, no transientEmpty checks during decode) means fewer branches in the hot loop. Performance is within a few percent of each other.

Compatibility

circe-sanely-auto provides the same packages and APIs as circe-generic:

Before	After
`mvn"io.circe::circe-generic:0.14.x"`	`mvn"io.github.nguyenyou::circe-sanely-auto:0.23.0"`
`import io.circe.generic.auto._`	`import io.circe.generic.auto.given`
`import io.circe.generic.semiauto._`	`import io.circe.generic.semiauto.*` (unchanged)

327 tests: 135 unit tests (munit, cross-compiled JVM + Scala.js) covering auto, semiauto, and configured derivation. Plus 192 compatibility tests (munit + discipline) auto-generated from circe's own DerivesSuite, SemiautoDerivationSuite, ConfiguredDerivesSuite, and ConfiguredEnumDerivesSuites via scripts/sync-circe-tests.py — same types, same Arbitrary instances, same property-based checks.

Requirement: Scala 3.8.2+. No Scala 2 support.

Features

Auto-derivation

import io.circe.*
import io.circe.syntax.*
import sanely.auto.given  // or: import io.circe.generic.auto.given

case class Address(street: String, city: String)
case class Person(name: String, age: Int, address: Address)

Person("Alice", 30, Address("123 Main St", "Springfield")).asJson
// {"name":"Alice","age":30,"address":{"street":"123 Main St","city":"Springfield"}}

Sum types use external tagging:

enum Shape:
  case Circle(radius: Double)
  case Rectangle(width: Double, height: Double)

Shape.Circle(5.0).asJson  // {"Circle":{"radius":5.0}}

Semiauto derivation

import io.circe.generic.semiauto.*

case class User(name: String, age: Int)
object User:
  given Decoder[User] = deriveDecoder
  given Encoder.AsObject[User] = deriveEncoder
  // or: given Codec.AsObject[User] = deriveCodec

Configured derivation

All 5 options from io.circe.derivation.Configuration:

import io.circe.*
import io.circe.derivation.Configuration
import io.circe.generic.semiauto.*

given Configuration = Configuration.default
  .withSnakeCaseMemberNames
  .withDiscriminator("type")
  .withDefaults

case class User(firstName: String, lastName: String, age: Int = 25)
given Codec.AsObject[User] = deriveConfiguredCodec

User("Alice", "Smith", 30).asJson
// {"first_name":"Alice","last_name":"Smith","age":30,"type":"User"}

Option	Effect
`transformMemberNames`	Rename JSON keys (`withSnakeCaseMemberNames`, `withKebabCaseMemberNames`, etc.)
`transformConstructorNames`	Rename ADT variant names (`withSnakeCaseConstructorNames`, etc.)
`useDefaults`	Use Scala default values for missing/null JSON fields
`discriminator`	`Some("type")` for flat encoding instead of external tagging
`strictDecoding`	Reject unexpected JSON keys

Enum string codec

Encode enums with only singleton cases as plain JSON strings:

enum Color:
  case Red, Green, Blue

given Configuration = Configuration.default.withSnakeCaseConstructorNames
given Codec[Color] = deriveEnumCodec

Color.Red.asJson  // "red"

Supports hierarchical sealed traits with diamond inheritance.

What works

Products, sum types, case objects, enums
Nested types, generic types (Box[A], Qux[A, B])
Recursive types (Option[Self], List[Self], Map[K, Self], etc.)
All standard containers: Option, List, Vector, Set, Seq, Map, Chain, NonEmptyList, NonEmptyVector, NonEmptySeq, NonEmptyChain
User-provided instances are respected (not overridden)
Sub-trait flattening in ADTs
Large types (33+ fields, 33+ variants)
Generic classes with default values
Cross-platform: JVM and Scala.js

Detailed benchmarks

Compile-time benchmarks

Two benchmark suites compare compile times against circe's native derivation. All numbers from M3 Max MacBook Pro, Mill 1.1.2, Scala 3.8.2.

bash bench.sh 5              # auto derivation (~350 types)
bash bench.sh --configured 5 # configured derivation (~230 types)

Results

Suite	circe-sanely-auto	circe baseline	Speedup
Auto derivation (~350 types)	2.56s ± 0.07s	9.25s ± 0.04s (circe-generic)	3.61x ± 0.09
Configured derivation (~230 types)	1.27s ± 0.03s	2.74s ± 0.04s (circe-core)	2.16x ± 0.05

Benchmark method

Measurements use hyperfine for statistical rigor. The harness (bench.sh) works as follows:

Dependency warm-up — sanely.jvm (and the configured compat shim for --configured) are compiled via the Mill daemon before any timed run, so dependency compilation is never included in the measurement.
--warmup 1 — one untimed warmup round ensures the Mill daemon JVM is JIT-warm and OS file caches are hot.
--prepare 'rm -rf out/…' — before each timed run, the benchmark module's output is deleted so every measurement is a clean recompilation of only the benchmark sources.
--runs N — N timed runs per command. Hyperfine randomizes execution order across runs to prevent systematic ordering bias, and reports mean ± σ with min/max range.

This measures what users actually experience: warm-daemon, incremental-dependency compilation of the benchmark types only.

JVM profiling (async-profiler)

JVM-level profiling with async-profiler shows where the Scala compiler spends time. Compiler-only samples (excluding JIT/GC cold-start overhead):

Auto derivation — compiler workload

Phase	sanely (825 samples)	circe-generic (1558 samples)	Delta
core (types, symbols, contexts)	298 (36%)	519 (33%)	-43%
ast (tree maps, accumulators)	89 (11%)	242 (16%)	-63%
typer	69 (8%)	90 (6%)	-23%
other (infra, denotations)	113 (14%)	148 (10%)	-24%
transform (erasure, lambdalift)	63 (8%)	78 (5%)	-19%
backend (bytecode, classfiles)	45 (5%)	70 (4%)	-36%
macro inlines	7 (1%)	71 (5%)	-90%
macro quoted	12 (1%)	7 (0%)	sanely only
typer implicits	14 (2%)	14 (1%)	0%
parsing	11 (1%)	7 (0%)	+57%
total compiler	825	1558	-47%

Configured derivation — compiler workload

Phase	sanely (621 samples)	circe-core (805 samples)	Delta
core (types, symbols, contexts)	204 (33%)	273 (34%)	-25%
ast (tree maps, accumulators)	64 (10%)	95 (12%)	-33%
typer	62 (10%)	63 (8%)	-2%
other (infra, denotations)	97 (16%)	88 (11%)	+10%
transform (erasure, lambdalift)	53 (9%)	62 (8%)	-15%
backend (bytecode, classfiles)	44 (7%)	58 (7%)	-24%
macro inlines	7 (1%)	30 (4%)	-77%
macro quoted	6 (1%)	5 (1%)	+20%
typer implicits	11 (2%)	12 (1%)	-8%
parsing	9 (1%)	7 (1%)	+29%
total compiler	621	805	-23%

Key pattern: Sanely trades inlining for quote reflection. circe-generic/circe-core's inline + summonInline approach forces the compiler to do heavy inlining work (71 samples for auto, 30 for configured). Sanely's Expr.summonIgnoring approach shifts that work to the quote reflection phase (12/6 samples) which is cheaper. Factory method consolidation eliminated the previous regression in transform+backend phases — sanely is now lighter than circe in all phases. For auto derivation, the total compiler workload is 47% lighter. For configured, the compiler workload is 23% lighter.

Memory profiling

Peak RSS via /usr/bin/time -l, allocation samples via async-profiler event=alloc.

Peak RSS

Suite	sanely	circe baseline	Delta
Auto derivation (~350 types)	893 MB	1,133 MB (circe-generic)	-21%
Configured derivation (~230 types)	752 MB	750 MB (circe-core)	~0%

Auto derivation uses 21% less peak memory thanks to fewer generated classes and smaller ASTs. Configured derivation uses comparable peak memory. RSS includes compiler heap, JIT code, metaspace, and OS buffers.

Allocation pressure — auto derivation

Category	sanely	circe-generic	Delta
total	4,822	11,571	-58%
compiler	3,942 (82%)	10,674 (92%)	-63%
compiler.core	1,444	4,069	-65%
compiler.core.types	465	1,477	-69%
compiler.ast	335	1,582	-79%
compiler.macro.inlines	70	721	-90%
compiler.typer	246	549	-55%
compiler.backend	204	351	-42%
mill / zinc / jvm	880	897	-2%

Allocation pressure — configured derivation

Category	sanely	circe-core	Delta
total	3,073	4,181	-26%
compiler	2,242 (73%)	3,316 (79%)	-32%
compiler.core	775	1,080	-28%
compiler.core.types	175	397	-56%
compiler.ast	162	405	-60%
compiler.macro.inlines	26	201	-87%
compiler.typer	125	182	-31%
compiler.backend	211	216	-2%
mill / zinc / jvm	831	865	-4%

Bytecode impact

Bytecode size and method count measured with javap across all compiled class files. Fewer methods means faster classloading, less JIT work, and smaller vtables. Measured with python3 .claude/skills/bytecode-impact/scripts/analyze_bytecode.py.

Auto derivation (~350 types)

Metric	circe-sanely-auto	circe-generic	Delta
Total bytecode	2,634,732 bytes	3,384,133 bytes	-22.1%
Total methods	12,433	13,761	-9.7%
LazyRef/lzyINIT/monitor	0	0	—

Configured derivation (~230 types)

Metric	circe-sanely-auto	circe-core	Delta
Total bytecode	2,775,933 bytes	3,097,489 bytes	-10.4%
Total methods	8,991	9,449	-4.8%
lzyINIT patterns	1,200	1,658	-27.6%

Auto derivation generates 22% less bytecode and 1,328 fewer methods thanks to SanelyRuntime factory methods (consolidating encoder/decoder construction into shared templates) and lazy-val elimination for non-recursive types. Configured derivation generates 10% less bytecode with 27% fewer lazy initialization patterns.

Macro profiling

Built-in compile-time profiling via SANELY_PROFILE=true tracks where time is spent inside our macros:

# Profile auto derivation (~350 types)
mkdir -p results/macro-profile-auto
rm -rf out/benchmark/sanely
SANELY_PROFILE=true ./mill --no-server benchmark.sanely.compile 2>&1 | tee results/macro-profile-auto/raw.txt
python3 .claude/skills/macro-profile/scripts/analyze_profile.py results/macro-profile-auto/raw.txt

# Profile configured derivation (~230 types)
mkdir -p results/macro-profile-configured
rm -rf out/benchmark-configured/sanely
SANELY_PROFILE=true ./mill --no-server benchmark-configured.sanely.compile 2>&1 | tee results/macro-profile-configured/raw.txt
python3 .claude/skills/macro-profile/scripts/analyze_profile.py results/macro-profile-configured/raw.txt

Auto derivation (398 expansions, 2.5s total macro time)

Category	Time	%	Calls	Avg
`summonIgnoring`	1340ms	53.7%	894	1.50ms
`derive`	916ms	36.7%	920	1.00ms
`tryBuiltin`	125ms	5.0%	1943	0.06ms
`summonMirror`	102ms	4.1%	920	0.11ms
`subTraitDetect`	45ms	1.8%	336	0.13ms
`cheapTypeKey`	3ms	0.1%	4342	0.00ms
`builtinHit`	—	—	907	—
`constructorNegHit`	—	—	142	—
cache hits	—	—	2399 (75%)	—

Configured derivation (230 expansions, 815ms total macro time)

Category	Time	%	Calls	Avg
`topDerive`*	708ms	86.9%	230	3.08ms
`summonIgnoring`	303ms	37.1%	205	1.48ms
`tryBuiltin`	30ms	3.7%	493	0.06ms
`resolveDefaults`	9ms	1.2%	214	0.04ms
`subTraitDetect`	2ms	0.2%	69	0.03ms
`cheapTypeKey`	1ms	0.2%	820	0.00ms
`builtinHit`	—	—	345	—
`codecHit`	—	—	118	—
cache hits	—	—	327	—

*topDerive is a container category that includes summonIgnoring, derive, summonMirror, subTraitDetect, and resolveDefaults. Percentages sum > 100% due to nesting.

Automated benchmarks

Every release automatically triggers a benchmark workflow that runs five benchmarks in parallel on ubuntu-latest:

Job	What it measures
compile-auto	Compile time — auto derivation (~350 types), sanely vs circe-generic
compile-configured	Compile time — configured derivation (~230 types), sanely vs circe-core
runtime	Encoding/decoding throughput — circe-jawn vs circe+jsoniter vs sanely-jsoniter vs jsoniter-scala
macro-profile-auto	Macro expansion profiling — auto derivation (398 expansions)
macro-profile-configured	Macro expansion profiling — configured derivation (230 expansions)

Results accumulate in BENCHMARK.md — each release adds a new section so you can track performance across versions. The workflow opens a PR with the updated results after each run.

Compile-time benchmark entries include hyperfine's statistical output (mean ± σ, min, max) so you can see exactly how the numbers were produced.

Benchmarking a PR: Maintainers can comment /benchmark on any pull request to run the full benchmark suite against that PR's code. Results are posted as a collapsible comment on the PR. Only repository collaborators can trigger this.

You can also trigger it manually from the Actions tab with custom parameters (number of runs, warmup iterations, etc.).

Building

Requires Mill 1.1.2+.

./mill sanely.jvm.compile    # compile (JVM)
./mill sanely.js.compile     # compile (Scala.js)
./mill sanely.jvm.test       # unit tests (JVM)
./mill sanely.js.test        # unit tests (Scala.js)
./mill compat.jvm.test       # circe compatibility tests (JVM)
./mill compat.js.test        # circe compatibility tests (Scala.js)
./mill demo.run              # run demo
./mill benchmark-jmh.runJmh  # JMH runtime benchmark (all 4 libraries, read + write)
bash bench-runtime.sh        # runtime benchmark (quick, hand-rolled harness)

How it's made

This entire library — every macro, every test, every line of build config, and this README — was written by Claude Code with Claude Opus 4.6. 100% vibe coded.

Roadmap

See ROADMAP.md.

Contributing

Since this project is 100% vibe coded, I follow a strict test-driven workflow to make sure everything actually works: every change starts with a test. Write the test first, then write the code that makes it pass.

If you find a bug:

Open an issue — even just reporting the problem is really appreciated
Submit a PR with a failing test — this is the best way to describe a bug precisely
Submit a PR with the fix too — even better! Add the failing test and the code that makes it pass

All contributions are welcome — issues, bug reports, feature requests, PRs. Even a quick "hey this didn't work for me" helps.

Thanks for trying this library. I love open source and have relied on open source projects throughout my career. This is my way of giving back and having fun. Welcome to the project!

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 297 Commits
.claude/skills		.claude/skills
.github/workflows		.github/workflows
benchmark-configured		benchmark-configured
benchmark-jmh		benchmark-jmh
benchmark-jsoniter		benchmark-jsoniter
benchmark-runtime		benchmark-runtime
benchmark		benchmark
compat		compat
demo/src/demo		demo/src/demo
docs		docs
results		results
sanely-jsoniter		sanely-jsoniter
sanely-zero		sanely-zero
sanely		sanely
scripts		scripts
tapir-test		tapir-test
upstream		upstream
zinc-test		zinc-test
.gitignore		.gitignore
.gitmodules		.gitmodules
BENCHMARK.md		BENCHMARK.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
ISSUE-x86-write-perf.md		ISSUE-x86-write-perf.md
LICENSE		LICENSE
README.md		README.md
ROADMAP.md		ROADMAP.md
WHY.md		WHY.md
bench-runtime.sh		bench-runtime.sh
bench.sh		bench.sh
build.mill		build.mill
mill		mill
test-zinc.sh		test-zinc.sh

Folders and files

Latest commit

History

Repository files navigation

circe-sanely-auto

The circe problem

What this library does

The numbers

Compile time — 2.9–3.6x faster

Runtime — 5.6x faster reads, 9.0x faster writes

Compile-time cost of adding sanely-jsoniter

How to use it

Step 1: Faster compilation

Step 2: Faster runtime (optional)

How it works

Compile time: one macro, no search chains

Runtime: no intermediate AST, direct streaming

Compatibility

Features

Auto-derivation

Semiauto derivation

Configured derivation

Enum string codec

What works

Detailed benchmarks

Compile-time benchmarks

Results

Benchmark method

JVM profiling (async-profiler)

Auto derivation — compiler workload

Configured derivation — compiler workload

Memory profiling

Peak RSS

Allocation pressure — auto derivation

Allocation pressure — configured derivation

Bytecode impact

Auto derivation (~350 types)

Configured derivation (~230 types)

Macro profiling

Auto derivation (398 expansions, 2.5s total macro time)

Configured derivation (230 expansions, 815ms total macro time)

Automated benchmarks

Building

How it's made

Roadmap

Contributing

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 24

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages