toon4s is the idiomatic Scala implementation
of Token-Oriented Object Notation (TOON),
a compact, LLM-friendly data format that blends YAML-style indentation with CSV-like tabular efficiency.
Save 30-60% on LLM token costs while maintaining full JSON compatibility.
What makes toon4s different: Most libraries prioritize features over architecture.
- Pure functional core: Zero mutations, total functions, referentially transparent
- Type safety first: sealed ADTs, exhaustive pattern matching, zero unsafe casts, VectorMap for deterministic ordering
- Stack-safe by design: @tailrec-verified functions, constant stack usage, handles arbitrarily deep structures
- Modern JVM ready: Virtual thread compatible (no ThreadLocal), streaming optimized, zero dependencies (491KB core JAR)
- Production hardened: 500+ passing tests, property-based testing, Either-based error handling, security limits
- Railway-oriented programming: For-comprehension error handling, no exceptions in happy paths, composable with Cats/ZIO/FS2
Example:
{ "tags": ["jazz","chill","lofi"] } → tags[3]: jazz,chill,lofi (40-60% token savings)
- Key features & Scala-first benefits
- Benchmarks at a glance
- Architecture & design principles
- Installation
- Quick start (library)
- CLI usage
- Format crash course
- Rules & guidelines
- API surface
- Type safety & conversions
- Using TOON in LLM prompts
- Limitations & gotchas
- Syntax cheatsheet
- Development & quality gates
- License
| Theme | What you get | Why it matters on the JVM |
|---|---|---|
| Spec‑complete | Targets TOON v2.1.0 and emits the v3 row-depth (+2) layout for tabular arrays in list-item first-field position; parity with toon (TS) and JToon (Java). | Mixed stacks behave the same; token math is consistent across platforms. |
| Typed APIs (2 & 3) | Scala 3 derivation for Encoder/Decoder; Scala 2.13 typeclasses via ToonTyped. | Compile‑time guarantees, no Any; safer refactors and zero-cost abstractions. |
| Pure & total | All encoders/decoders are pure functions; decode returns Either[DecodeError, JsonValue]. | Idiomatic FP: easy to compose in Cats/ZIO/FS2; referentially transparent. |
| Deterministic ADTs | JsonValue as a sealed ADT with VectorMap for objects; stable field ordering. | Exhaustive pattern matching; predictable serialization for testing/debugging. |
| Streaming visitors | foreachTabular and nested foreachArrays (tail‑recursive, stack-safe). | Validate/process millions of rows without building a full AST; constant memory usage. |
| Zero-overhead visitors | Composable visitor pattern for streaming + transformations in single pass; includes JSON repair for LLM output. Universal TreeWalker adapters for Jackson/Circe/Play JSON. | Apache Spark workloads: repair + filter + encode 1M rows with O(d) memory; encode Jackson JsonNode→TOON or decode TOON→JsonNode without JsonValue intermediate. |
| Zero‑dep core | Core library has zero dependencies beyond Scala stdlib; CLI uses only scopt + jtokkit. | Small footprint (491KB core JAR), simpler audits, no transitive dependency hell. |
| Strictness profiles | Strict (spec-compliant) vs Lenient (error-tolerant) modes with validation policies. | Safer ingestion of LLM outputs and human-edited data; configurable validation. |
| CLI with budgets | Built-in --stats (token counts), --optimize (delimiter selection); cross-platform. | Track token savings in CI/CD; pick optimal delimiter for your data shape. |
| Virtual thread ready | No ThreadLocal usage; compatible with Java 21+ Project Loom virtual threads. | Future-proof for modern JVM concurrency; scales to millions of concurrent tasks. |
| Production hardened | 500+ passing tests; property-based testing; strict mode validation; security limits. | Battle-tested edge cases; prevents DoS via depth/length limits; safe for production. |
toon4s is built on a layered architecture that separates concerns and enables composability:
flowchart TD
USER["User code"] --> API["Public API layer"]
API --> ENCODE["Encoder path"]
API --> DECODE["Decoder path"]
API --> VISITOR["Visitor path"]
ENCODE --> PRIMITIVES["Primitives module"]
ENCODE --> NORM["Normalize module"]
ENCODE --> WRITER["EncodeLineWriter"]
DECODE --> SCANNER["Scanner"]
SCANNER --> PARSER["Parser layer"]
PARSER --> CURSOR["Cursor + Validation"]
CURSOR --> JSON["JsonValue ADT"]
VISITOR --> TREEWALKER["TreeWalker"]
TREEWALKER --> VISITORS["Visitor implementations"]
VISITORS --> TRANSFORM["Streaming transform"]
style USER fill:#e1f5ff,stroke:#0066cc,color:#000
style API fill:#fff4e1,stroke:#cc8800,color:#000
style ENCODE fill:#e1ffe1,stroke:#2d7a2d,color:#000
style DECODE fill:#e1ffe1,stroke:#2d7a2d,color:#000
style VISITOR fill:#e1ffe1,stroke:#2d7a2d,color:#000
style JSON fill:#f0e1ff,stroke:#8800cc,color:#000
style TRANSFORM fill:#f0e1ff,stroke:#8800cc,color:#000
Decode path (decode/):
- Scanner: Tokenizes TOON text into structured lines with indentation tracking
- Parser: Converts tokens to the JsonValue ADT with strict/lenient modes
- Cursor: Stack-safe navigation through nested structures
- Validation: Depth, length, and size limit enforcement
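The Scanner step above can be pictured with a small, self-contained sketch. This is not the toon4s Scanner: `ScannerSketch`, `StructuredLine`, and the `indent` parameter are hypothetical names chosen for illustration, and the sketch only tracks space indentation.

```scala
// Hypothetical sketch, not the toon4s Scanner.
object ScannerSketch {
  // One non-blank input line: nesting depth plus content with indentation removed.
  final case class StructuredLine(depth: Int, content: String, lineNo: Int)

  // Splits text into lines and derives depth from leading spaces.
  // Returns Left when indentation is not a multiple of `indent`.
  def scan(input: String, indent: Int = 2): Either[String, Vector[StructuredLine]] = {
    val numbered = input.split("\n", -1).toVector.zipWithIndex
    numbered.foldLeft[Either[String, Vector[StructuredLine]]](Right(Vector.empty)) {
      case (Left(err), _) => Left(err) // keep the first error
      case (Right(acc), (raw, idx)) =>
        val spaces = raw.takeWhile(_ == ' ').length
        if (raw.trim.isEmpty) Right(acc) // skip blank lines
        else if (spaces % indent != 0)
          Left(s"line ${idx + 1}: indentation $spaces is not a multiple of $indent")
        else Right(acc :+ StructuredLine(spaces / indent, raw.drop(spaces), idx + 1))
    }
  }
}
```

Returning `Either` keeps the scan step composable with the later parse and validation steps.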
Encode path (encode/):
- Encoders: Pure functions from JsonValue to TOON format
- Primitives: Low-level string quoting and primitive encoding
- Normalize: Array/object structure analysis for optimal layout selection
Visitor pattern (visitor/):
- TreeWalker: Universal adapter for external JSON libraries (Jackson, Circe, Play)
- Streaming Visitors: O(d) memory transformations, where d is nesting depth (filter, repair, stringify)
- Composable: Chain multiple visitors in single pass
flowchart LR
START["JsonValue"] --> ANALYZE["Normalize.analyze"]
ANALYZE --> DECISION{"Array type?"}
DECISION -->|" Uniform objects "| TABULAR["Tabular format"]
DECISION -->|" Primitives "| INLINE["Inline format"]
DECISION -->|" Mixed/nested "| LIST["List format"]
TABULAR --> HEADER["Format header + rows"]
INLINE --> DELIM["Join with delimiter"]
LIST --> NESTED["Recursive encode"]
HEADER --> OUTPUT["TOON string"]
DELIM --> OUTPUT
NESTED --> OUTPUT
style START fill:#e1f5ff,stroke:#0066cc,color:#000
style ANALYZE fill:#fff4e1,stroke:#cc8800,color:#000
style DECISION fill:#f0e1ff,stroke:#8800cc,color:#000
style TABULAR fill:#e1ffe1,stroke:#2d7a2d,color:#000
style INLINE fill:#e1ffe1,stroke:#2d7a2d,color:#000
style LIST fill:#e1ffe1,stroke:#2d7a2d,color:#000
style OUTPUT fill:#90EE90,stroke:#2d7a2d,color:#000
flowchart LR
INPUT["TOON string"] --> SCANNER["Scanner.scan"]
SCANNER --> LINES["Structured lines"]
LINES --> PARSE["Parser.parse"]
PARSE --> VALIDATE["Validation"]
VALIDATE -->|" Valid "| SUCCESS["Right(JsonValue)"]
VALIDATE -->|" Invalid "| ERROR["Left(DecodeError)"]
SUCCESS --> TYPED["Optional: Decoder[T]"]
TYPED --> RESULT["T"]
style INPUT fill:#e1f5ff,stroke:#0066cc,color:#000
style SCANNER fill:#fff4e1,stroke:#cc8800,color:#000
style LINES fill:#fff4e1,stroke:#cc8800,color:#000
style PARSE fill:#fff4e1,stroke:#cc8800,color:#000
style VALIDATE fill:#f0e1ff,stroke:#8800cc,color:#000
style SUCCESS fill:#e1ffe1,stroke:#2d7a2d,color:#000
style ERROR fill:#ffe1e1,stroke:#cc0000,color:#000
style TYPED fill:#e1ffe1,stroke:#2d7a2d,color:#000
style RESULT fill:#90EE90,stroke:#2d7a2d,color:#000
flowchart TD
EXTERNAL["External JSON<br/>(Jackson/Circe/Play)"] --> WALKER["TreeWalker.dispatch"]
JSONVAL["JsonValue ADT"] --> WALKER
WALKER --> VISITOR["Visitor trait"]
VISITOR --> IMPL{"Implementation"}
IMPL -->|" StringifyVisitor "| STRINGIFY["TOON string"]
IMPL -->|" FilterKeysVisitor "| FILTER["Filtered JSON"]
IMPL -->|" JsonRepairVisitor "| REPAIR["Repaired JSON"]
IMPL -->|" ConstructionVisitor "| CONSTRUCT["JsonValue"]
FILTER --> CHAIN["Chain visitors"]
REPAIR --> CHAIN
CHAIN --> ONEPASS["Single-pass transform"]
style EXTERNAL fill:#e1f5ff,stroke:#0066cc,color:#000
style JSONVAL fill:#e1f5ff,stroke:#0066cc,color:#000
style WALKER fill:#fff4e1,stroke:#cc8800,color:#000
style VISITOR fill:#f0e1ff,stroke:#8800cc,color:#000
style STRINGIFY fill:#e1ffe1,stroke:#2d7a2d,color:#000
style FILTER fill:#e1ffe1,stroke:#2d7a2d,color:#000
style REPAIR fill:#e1ffe1,stroke:#2d7a2d,color:#000
style CONSTRUCT fill:#e1ffe1,stroke:#2d7a2d,color:#000
style ONEPASS fill:#90EE90,stroke:#2d7a2d,color:#000
toon4s roughly doubled its encode and decode throughput through systematic optimization:
Allocation reduction:
- Pre-allocated StringBuilder capacity based on estimated output size
- Single-pass parsing (combined quote-finding + unescaping)
- Cached common header patterns (array lengths 0-10)
- VectorBuilder + while loops instead of functional chains
Hot path optimization:
- Character.isWhitespace() instead of String.trim() allocation
- Pattern matching for delimiter dispatch
- Early exit with iterator.forall for uniform array detection
- Hoisted constants outside loops
Memory efficiency:
- Streaming visitors with O(d) memory (depth-dependent, not size-dependent)
- Tail-recursive iteration for large arrays
- Stack-safe cursor navigation
- No intermediate allocations in visitor chains
Benchmark results (encode_object: 287 → 600 ops/ms, decode_tabular: 417 → 874 ops/ms):
- P0 quick wins: 20-30% gain
- P1 high impact: 45-70% gain
- P2 optimizations: 4-15% additional gain
- Total: ~2x improvement while maintaining functional purity
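The "early exit with iterator.forall" item can be illustrated with a minimal sketch. It assumes rows are modelled as ordered key/value pairs and that "uniform" means identical keys in identical order; `UniformSketch` and `commonFields` are hypothetical names, not toon4s internals.

```scala
// Hypothetical sketch of uniform-array detection with early exit.
object UniformSketch {
  // Returns the shared field names when every row has exactly the same keys
  // in the same order, None otherwise. `forall` stops at the first mismatch,
  // so a non-uniform array is rejected without scanning the remaining rows.
  def commonFields(rows: Vector[Vector[(String, String)]]): Option[Vector[String]] =
    rows.headOption.flatMap { first =>
      val keys = first.map(_._1)
      if (rows.iterator.drop(1).forall(_.map(_._1) == keys)) Some(keys) else None
    }
}
```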
classDiagram
class JsonValue {
<<sealedtrait>>
}
class JNull {
<<caseobject>>
}
class JBool {
+Boolean value
}
class JNumber {
+BigDecimal value
}
class JString {
+String value
}
class JArray {
+Vector~JsonValue~ values
}
class JObj {
+VectorMap~String,JsonValue~ fields
}
JsonValue <|-- JNull
JsonValue <|-- JBool
JsonValue <|-- JNumber
JsonValue <|-- JString
JsonValue <|-- JArray
JsonValue <|-- JObj
JArray --> JsonValue: contains
JObj --> JsonValue: contains
style JsonValue fill:#f0e1ff,stroke:#8800cc,color:#000
style JNull fill:#e1ffe1,stroke:#2d7a2d,color:#000
style JBool fill:#e1ffe1,stroke:#2d7a2d,color:#000
style JNumber fill:#e1ffe1,stroke:#2d7a2d,color:#000
style JString fill:#e1ffe1,stroke:#2d7a2d,color:#000
style JArray fill:#fff4e1,stroke:#cc8800,color:#000
style JObj fill:#fff4e1,stroke:#cc8800,color:#000
classDiagram
class Visitor~R~ {
<<trait>>
+visitNull() R
+visitBool(Boolean) R
+visitNumber(BigDecimal) R
+visitString(String) R
+visitArray(Vector~R~) R
+visitObject(VectorMap~String,R~) R
}
class TreeWalker {
<<object>>
+dispatch(JsonValue, Visitor) R
+fromJackson(JsonNode, Visitor) R
+fromCirce(Json, Visitor) R
}
class StringifyVisitor {
+visitString(s) String
+visitArray(items) String
}
class FilterKeysVisitor {
+Set~String~ keysToKeep
+visitObject(fields) JsonValue
}
class ConstructionVisitor {
+visitNull() JsonValue
+visitArray(items) JArray
}
class JsonRepairVisitor {
+visitString(s) JsonValue
+visitNumber(n) JsonValue
}
Visitor <|.. StringifyVisitor
Visitor <|.. FilterKeysVisitor
Visitor <|.. ConstructionVisitor
Visitor <|.. JsonRepairVisitor
TreeWalker --> Visitor: uses
style Visitor fill:#f0e1ff,stroke:#8800cc,color:#000
style TreeWalker fill:#fff4e1,stroke:#cc8800,color:#000
style StringifyVisitor fill:#e1ffe1,stroke:#2d7a2d,color:#000
style FilterKeysVisitor fill:#e1ffe1,stroke:#2d7a2d,color:#000
style ConstructionVisitor fill:#e1ffe1,stroke:#2d7a2d,color:#000
style JsonRepairVisitor fill:#e1ffe1,stroke:#2d7a2d,color:#000
sequenceDiagram
participant User
participant Encoder
participant Normalize
participant Primitives
participant Writer
User ->> Encoder: encode(JsonValue)
Encoder ->> Normalize: analyze(array)
Normalize -->> Encoder: StructuralInfo
alt Tabular format
Encoder ->> Encoder: extractHeader + rows
Encoder ->> Writer: writeTabular
else Inline format
Encoder ->> Primitives: quoteAndEscape
Encoder ->> Writer: writeInline
else List format
Encoder ->> Encoder: recursive encode
Encoder ->> Writer: writeList
end
Writer -->> User: TOON string
Note over Normalize, Writer: Zero allocations in hot path
Note over Encoder: Pre-allocated StringBuilder
sequenceDiagram
participant User
participant Decoder
participant Scanner
participant Parser
participant Cursor
participant Validator
User ->> Decoder: decode(String)
Decoder ->> Scanner: scan(input)
Scanner -->> Decoder: Vector[StructuredLine]
Decoder ->> Parser: parse(lines)
Parser ->> Cursor: navigate structure
loop For each line
Cursor ->> Parser: peek + advance
Parser ->> Parser: parseValue
end
Parser ->> Validator: validate limits
alt Valid
Validator -->> User: Right(JsonValue)
else Invalid
Validator -->> User: Left(DecodeError)
end
Note over Cursor: Stack-safe navigation
Note over Parser: Single-pass parsing
graph TD
API["Public API<br/>(Encoder/Decoder)"]
CODEC["Codec Layer<br/>(ToonTyped)"]
JSON["JsonValue ADT"]
ENCODE["Encode module"]
DECODE["Decode module"]
VISITOR["Visitor module"]
PRIMITIVES["Primitives"]
NORMALIZE["Normalize"]
SCANNER["Scanner"]
PARSER["Parser"]
CURSOR["Cursor"]
VALIDATOR["Validator"]
TREEWALKER["TreeWalker"]
VISITORS["Visitor Impls"]
ERROR["Error types"]
BUILD["Builder pattern"]
API --> ENCODE
API --> DECODE
API --> VISITOR
API --> JSON
CODEC --> ENCODE
CODEC --> DECODE
ENCODE --> PRIMITIVES
ENCODE --> NORMALIZE
ENCODE --> JSON
DECODE --> SCANNER
DECODE --> PARSER
DECODE --> CURSOR
DECODE --> VALIDATOR
DECODE --> JSON
PARSER --> CURSOR
VISITOR --> TREEWALKER
VISITOR --> VISITORS
VISITORS --> JSON
ERROR --> API
BUILD --> API
style API fill:#e1f5ff,stroke:#0066cc,color:#000
style CODEC fill:#fff4e1,stroke:#cc8800,color:#000
style JSON fill:#f0e1ff,stroke:#8800cc,color:#000
style ENCODE fill:#e1ffe1,stroke:#2d7a2d,color:#000
style DECODE fill:#e1ffe1,stroke:#2d7a2d,color:#000
style VISITOR fill:#e1ffe1,stroke:#2d7a2d,color:#000
This is what sets toon4s apart: While most libraries compromise on architecture for convenience, toon4s demonstrates that you can have both production performance and functional purity. Every design decision prioritizes correctness, composability, and type safety, making toon4s a reference implementation for modern Scala projects.
Every function in toon4s is pure and total:
- Zero mutations: No vars / while loops
  - State threading pattern (pass state as parameters, return new state)
  - Accumulator-based tail recursion
  - Immutable builders (Vector, VectorMap)
- Total functions: No exceptions in happy paths
  - All encoders/decoders return Either[Error, Result]
  - Railway-oriented programming for error handling
  - Exhaustive pattern matching on sealed ADTs
- Referentially transparent: Same input → same output, always
  - No side effects in core logic
  - No global mutable state
  - Deterministic output (VectorMap preserves insertion order)
- Stack-safe recursion: functions with @tailrec
  - Compiler-verified tail call optimization
  - Can parse arbitrarily deep structures
  - Constant stack usage regardless of input size
Scala's type system is used to maximum effect.
Key type safety features:
- sealed ADTs: Exhaustive pattern matching catches missing cases at compile time
- No unsafe casts: asInstanceOf is avoided in production code (only 2 necessary casts, each with a safety comment)
- VectorMap everywhere: Ensures deterministic field ordering
- Compile-time derivation: Scala 3 derives generates type class instances at compile time
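To make the sealed-ADT point concrete, here is a minimal sketch of a JSON-like ADT with an exhaustive match. The type `J` and the `describe` function are hypothetical. The case names mirror the JsonValue class diagram above, but this is not the library's source.

```scala
import scala.collection.immutable.VectorMap

// Hypothetical miniature of a sealed JSON ADT.
object AdtSketch {
  sealed trait J
  case object JNull extends J
  final case class JBool(value: Boolean) extends J
  final case class JNumber(value: BigDecimal) extends J
  final case class JString(value: String) extends J
  final case class JArray(values: Vector[J]) extends J
  final case class JObj(fields: VectorMap[String, J]) extends J

  // Because J is sealed, the compiler warns if any case is missing here.
  def describe(j: J): String = j match {
    case JNull        => "null"
    case JBool(_)     => "boolean"
    case JNumber(_)   => "number"
    case JString(_)   => "string"
    case JArray(vs)   => s"array of ${vs.length}"
    case JObj(fields) => s"object with keys ${fields.keys.mkString(",")}" // insertion order
  }
}
```

Removing any one `case` line turns a silent runtime gap into a compile-time warning, which is the benefit the list above describes.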
State threading pattern
import scala.annotation.tailrec

// Excerpt: cursor, baseDepth, and decodeKeyValue come from the enclosing parser
@tailrec
def collectFields(
    targetDepth: Option[Int],
    acc: Vector[(String, JsonValue)] // Accumulator instead of var
): Vector[(String, JsonValue)] = {
  cursor.peek match {
    case None => acc
    case Some(line) if line.depth < baseDepth => acc
    case Some(line) =>
      val td = targetDepth.orElse(Some(line.depth))
      if (td.contains(line.depth)) {
        cursor.advance()
        val KeyValueParse(key, value, _) = decodeKeyValue(...)
        collectFields(td, acc :+ (key -> value)) // Recurse with new state
      } else acc
  }
}
Railway-oriented programming
// Either accumulation instead of var err: Error | Null = null
xs.foldLeft[Either[DecodeError, List[A]]](Right(Nil)) {
  (acc, j) =>
    for
      list <- acc   // Short-circuit on first error
      a <- d(j)     // Decode current element
    yield a :: list // Accumulate successes
}.map(_.reverse)
Visitor pattern for zero-overhead transformations
// Generic visitor trait with type parameter R (return type)
trait Visitor[R] {
  def visitNull(): R
  def visitBool(value: Boolean): R
  def visitNumber(value: BigDecimal): R
  def visitString(value: String): R
  def visitArray(items: Vector[R]): R
  def visitObject(fields: VectorMap[String, R]): R
}
// TreeWalker dispatches each JsonValue node to the matching visitor method
object TreeWalker {
  def dispatch[R](json: JsonValue, visitor: Visitor[R]): R = json match {
    case JNull         => visitor.visitNull()
    case JBool(b)      => visitor.visitBool(b)
    case JNumber(n)    => visitor.visitNumber(n)
    case JString(s)    => visitor.visitString(s)
    case JArray(items) => visitor.visitArray(items.map(dispatch(_, visitor)))
    case JObj(fields)  => visitor.visitObject(fields.map((k, v) => k -> dispatch(v, visitor)))
  }
}
// Apply visitors in sequence: filter keys first, then repair the filtered result
val filtered = TreeWalker.dispatch(json, FilterKeysVisitor(Set("id", "name")))
val repaired = TreeWalker.dispatch(filtered, JsonRepairVisitor())
Strategy pattern for encoding
// Different encoding strategies based on structure analysis
sealed trait EncodingStrategy
case object TabularStrategy extends EncodingStrategy
case object InlineStrategy extends EncodingStrategy
case object ListStrategy extends EncodingStrategy
// Normalize.analyze returns StructuralInfo with detected strategy
case class StructuralInfo(
    strategy: EncodingStrategy,
    commonFields: Option[List[String]],
    isUniform: Boolean
)
// Encoder dispatches based on strategy
def encodeArray(arr: JArray): String = {
  val info = Normalize.analyze(arr)
  info.strategy match {
    case TabularStrategy => encodeTabular(arr, info.commonFields.get)
    case InlineStrategy  => encodeInline(arr)
    case ListStrategy    => encodeList(arr)
  }
}
Builder pattern with phantom types
// Type-safe builder using phantom types
sealed trait BuilderState
sealed trait Empty extends BuilderState
sealed trait WithDelimiter extends BuilderState
sealed trait Complete extends BuilderState
class OptionsBuilder[S <: BuilderState] private (config: Map[String, Any]) {
  // Only callable in Empty state
  def delimiter(d: Delimiter)(implicit ev: S =:= Empty): OptionsBuilder[WithDelimiter] =
    new OptionsBuilder(config + ("delimiter" -> d))
  // Only callable in WithDelimiter state
  def strictness(s: Strictness)(implicit ev: S =:= WithDelimiter): OptionsBuilder[Complete] =
    new OptionsBuilder(config + ("strictness" -> s))
  // Only callable in Complete state
  def build()(implicit ev: S =:= Complete): Options =
    Options(
      delimiter = config("delimiter").asInstanceOf[Delimiter],
      strictness = config("strictness").asInstanceOf[Strictness]
    )
}
object OptionsBuilder {
  // Entry point used below; its definition is assumed here, since the excerpt does not show it
  def empty: OptionsBuilder[Empty] = new OptionsBuilder[Empty](Map.empty)
}
// Usage (type-safe at compile time)
val opts = OptionsBuilder.empty
  .delimiter(Delimiter.Comma)    // Must be first
  .strictness(Strictness.Strict) // Must be second
  .build()                       // Must be last
Typeclass pattern for derivation
// Encoder typeclass for type-safe serialization
trait Encoder[A] {
  def encode(value: A): JsonValue
}
// Decoder typeclass for type-safe deserialization
trait Decoder[A] {
  def decode(json: JsonValue): Either[DecodeError, A]
}
// Scala 3 automatic derivation
case class User(id: Int, name: String, email: String) derives Encoder, Decoder
// Usage (type-safe at compile time)
val user = User(1, "Alice", "alice@example.com")
val json = Encoder[User].encode(user)    // JsonValue
val decoded = Decoder[User].decode(json) // Either[DecodeError, User]
Adapter pattern for external libraries
// TreeWalker adapts external JSON libraries without conversion
// Needs: import scala.jdk.CollectionConverters._ (for .asScala)
object TreeWalker {
  // Jackson adapter
  def fromJackson[R](node: JsonNode, visitor: Visitor[R]): R = {
    if (node.isNull) visitor.visitNull()
    else if (node.isBoolean) visitor.visitBool(node.booleanValue())
    else if (node.isArray) {
      val items = node.elements().asScala.map(fromJackson(_, visitor)).toVector
      visitor.visitArray(items)
    }
    // ... dispatch to visitor directly without creating JsonValue
  }
  // Circe adapter
  def fromCirce[R](json: io.circe.Json, visitor: Visitor[R]): R = {
    json.fold(
      visitor.visitNull(),
      visitor.visitBool,
      n => visitor.visitNumber(BigDecimal(n.toString)),
      visitor.visitString,
      arr => visitor.visitArray(arr.map(fromCirce(_, visitor))),
      // toIterable keeps field insertion order; toMap would not guarantee it
      obj => visitor.visitObject(VectorMap.from(obj.toIterable.map((k, v) => k -> fromCirce(v, visitor))))
    )
  }
}
// Usage: zero-copy transformation from Jackson to TOON
val jacksonNode: JsonNode = objectMapper.readTree(input)
val toonString = TreeWalker.fromJackson(jacksonNode, StringifyVisitor(Options.default))
Factory pattern for parser creation
// Parser factory with configuration
object Parser {
  def create(options: Options): Parser = {
    val validator = Validator(
      maxDepth = options.maxDepth,
      maxLength = options.maxLength,
      maxSize = options.maxSize
    )
    new Parser(
      strictness = options.strictness,
      validator = validator,
      delimiter = options.delimiter
    )
  }
}
| Metric | Value | Meaning |
|---|---|---|
| Production code | 5,887 lines (56 files) | Well-organized, modular |
| Test coverage | 500+ tests, 100% passing | Comprehensive validation |
| Tail-recursive fns | With @tailrec | Stack-safe, verified |
| Sealed ADTs | traits/classes | Exhaustive matching |
| VectorMap usage | 32+ occurrences | Deterministic ordering |
| Mutable state | No vars in parsers | Pure functional |
| Unsafe casts | 2 (documented as safe) | Type-safe design |
Built for the future of JVM concurrency:
- Virtual thread ready: Zero ThreadLocal usage
  - Fully compatible with Java 21+ Project Loom
  - Can spawn millions of virtual threads without memory leaks
  - See core/src/main/scala/io/toonformat/toon4s/encode/Primitives.scala:60 for virtual thread design notes
- Streaming optimized: Constant-memory validation
  - Streaming.foreachTabular - process rows without full AST
  - Streaming.foreachArrays - validate nested arrays incrementally
  - Tail-recursive visitors with accumulator pattern
- Zero dependencies: 491KB core JAR
  - Pure Scala stdlib (no Jackson, Circe, Play JSON)
  - CLI only adds scopt + jtokkit
  - Minimal attack surface for security audits
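The idea behind row streaming (handle each row as it arrives, never collect them) can be sketched without the library. `StreamingSketch.foreachRow` below is a hypothetical helper, not the signature of `Streaming.foreachTabular`. It also splits naively on the delimiter and ignores quoting.

```scala
// Hypothetical sketch of constant-memory row processing.
object StreamingSketch {
  // Feeds each row to `onRow` without collecting rows in memory.
  // Returns the number of rows seen so callers can compare it with the
  // length declared in the array header.
  def foreachRow(lines: Iterator[String], delimiter: Char = ',')(
      onRow: Vector[String] => Unit
  ): Int = {
    @scala.annotation.tailrec
    def loop(count: Int): Int =
      if (!lines.hasNext) count
      else {
        onRow(lines.next().trim.split(delimiter).toVector)
        loop(count + 1) // tail call: stack depth stays constant
      }
    loop(0)
  }
}
```

Because the input is an `Iterator`, only the current row is held at any time; memory use does not grow with the number of rows.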
toon4s proves you don't have to choose between performance and purity:
| Traditional tradeoff | How toon4s achieves both |
|---|---|
| "Mutation is faster" | Tail recursion + accumulators match imperative performance while staying pure |
| "Exceptions are simpler" | Either + railway-oriented programming is just as ergonomic with for-comprehensions |
| "ThreadLocal is convenient" | State threading pattern works seamlessly with virtual threads (future-proof) |
| "Any/casting saves time" | Sealed ADTs + exhaustive matching catch bugs at compile time (saves debugging time) |
| "External libs add features" | Zero dependencies means zero CVEs, zero conflicts, minimal attack surface |
The result: A library that's both safer (pure FP, types) and faster to maintain (no surprises, composable).
This architecture makes toon4s ideal for:
- Production services - reliability and correctness are non-negotiable
- Functional stacks (Cats, ZIO, FS2) - pure functions compose without side effects
- Virtual thread workloads (Project Loom) - no ThreadLocal means no memory leaks
- High-throughput pipelines - roughly 570-870 ops/ms in the JMH runs below, with predictable, constant-memory streaming
- Type-safe domain modeling - sealed ADTs + derivation = compile-time guarantees
Bottom line: toon4s is what happens when you refuse to compromise. Use it for TOON encoding, or study it to learn how to build production-grade functional systems.
See also: SCALA-TOON-SPECIFICATION.md for encoding rules
See also: Encoding rules, Strict mode, Delimiters & headers
Be honest: token savings depend on your data. From our runs and community reports:
- Typical savings: 30-60% vs formatted JSON when arrays are uniform and values are short strings/numbers.
- Small example: { "tags": ["jazz","chill","lofi"] } → tags[3]: jazz,chill,lofi saved ~40-60% tokens across common GPT tokenizers.
- Deeply nested, irregular objects: savings narrow; sometimes JSON ties or wins. Measure in CI with --stats.
- Retrieval accuracy: some reports show JSON ≈ 70% vs TOON ≈ 65% on certain tasks. If accuracy matters more than cost, validate on your prompts.
Use the CLI or the benchmark runner to measure your payloads:
# Option A: CLI (quick)
toon4s-cli --encode payload.json --stats --tokenizer o200k -o payload.toon
# Option B: JMH runner (reproducible set)
sbt jmhDev # quick JMH runs
sbt jmhFull # heavy JMH runs
Throughput (JMH heavy, macOS M‑series, Java 21.0.9, Temurin OpenJDK; 5 warmup iterations × 2s, 5 measurement iterations × 2s):
Benchmark Mode Cnt Score Error Units
EncodeDecodeBench.decode_list thrpt 5 784.240 ± 3.439 ops/ms
EncodeDecodeBench.decode_nested thrpt 5 570.729 ± 0.844 ops/ms
EncodeDecodeBench.decode_tabular thrpt 5 874.285 ± 3.410 ops/ms
EncodeDecodeBench.encode_object thrpt 5 600.403 ± 1.240 ops/ms
Latest results with PR #42, #43 & #44 optimizations (2025-12-10). They represent a ~2x performance improvement over the PR #43 baseline through systematic hot-path optimization.
Performance highlights:
- Tabular decoding: ~874 ops/ms - optimized for CSV-like structures
- List decoding: ~784 ops/ms - fast array processing
- Nested decoding: ~571 ops/ms - efficient for deep object hierarchies
- Object encoding: ~600 ops/ms - consistent encoding performance
Note: numbers vary by JVM/OS/data shape. Run your own payloads with JMH for apples‑to‑apples comparison.
- Token savings: format‑driven and therefore similar across implementations. Expect ~30-60% on uniform/tabular data. Example: { "tags": ["jazz","chill","lofi"] } → tags[3]: jazz,chill,lofi.
- Accuracy: prompt‑ and data‑dependent. Community reports: JSON ≈ 70%, TOON ≈ 65% on some tasks. Measure on your prompts before switching.
- Throughput: toon4s encode throughput is on par with JToon on small/mid shapes (JMH: ~520 ops/ms). Decoding is implemented and fast in toon4s (tabular ~838 ops/ms, list ~745 ops/ms, nested ~538 ops/ms). If/when JToon adds decoding, compare like‑for‑like.
- Scala ergonomics: typed derivation (3.x), typeclasses (2.13), sealed ADTs, VectorMap ordering, streaming visitors, zero‑dep core.
- Guidance: use toon (TS) for Node stacks, JToon for Java codebases, toon4s for Scala codebases. Token savings are equivalent; choose by ecosystem fit.
Savings are model/tokenizer-sensitive; treat ranges as guidance, not guarantees.
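If you collect token counts yourself (for example from `--stats` output), the savings figure is simple arithmetic. The helper below is a hypothetical sketch, and the counts in the usage note are made-up inputs, not measurements.

```scala
// Hypothetical helper for turning two token counts into a savings percentage.
object SavingsSketch {
  // Percentage of tokens saved by TOON relative to JSON.
  // None when jsonTokens is not positive (the ratio would be undefined).
  def savingsPercent(jsonTokens: Int, toonTokens: Int): Option[Double] =
    if (jsonTokens <= 0) None
    else Some((jsonTokens - toonTokens).toDouble / jsonTokens * 100.0)
}
```

For instance, `SavingsSketch.savingsPercent(20, 10)` yields `Some(50.0)`. A negative result means TOON used more tokens than JSON for that payload.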
See also: Token benchmarks
// build.sbt
libraryDependencies += "com.vitthalmirji" %% "toon4s-core" % "0.1.0"
Prefer CLI only? Ship the staged script (diagram below):
sbt cli/stage # builds ./cli/target/universal/stage/bin/toon4s-cli
./cli/target/universal/stage/bin/toon4s-cli --encode sample.json -o sample.toon
import io.toonformat.toon4s._
val payload = Map(
  "users" -> Vector(
    Map("id" -> 1, "name" -> "Ada", "tags" -> Vector("reading", "gaming")),
    Map("id" -> 2, "name" -> "Bob", "tags" -> Vector("writing"))
  )
)
val toon = Toon.encode(payload, EncodeOptions(indent = 2)).fold(throw _, identity)
println(toon)
// users[2]{id,name,tags}:
//   1,Ada,[2]: reading,gaming
//   2,Bob,[1]: writing
val json = Toon.decode(toon).fold(throw _, identity)
println(json)
- Works with Scala 3.3.3 and Scala 2.13.14 (tested in CI).
- Accepts Scala collections, Java collections, java.time.*, Option, Either, Product (case classes, tuples), and IterableOnce.
- Deterministic ordering when encoding maps via VectorMap.
- Scala 3 derivation: codec.Encoder and codec.Decoder derive for case classes. Prefer typed ToonTyped.encode[A: Encoder] / ToonTyped.decodeAs[A: Decoder] over Any-based methods.
# Encode JSON -> TOON with 4-space indentation and tab delimiters
toon4s-cli --encode data.json --indent 4 --delimiter tab -o data.toon
# Decode TOON -> JSON (strict mode on by default; pass lenient if needed)
toon4s-cli --decode data.toon --strictness lenient -o roundtrip.json
Available flags:
| Flag | Description |
|---|---|
| `--encode` / `--decode` | Required: choose direction explicitly. |
| `--indent <n>` | Pretty-print indentation (default 2). |
| `--delimiter <comma\|tab\|pipe>` | Column delimiter for tabular arrays. |
| `--key-folding <off\|safe>` | Fold single-key object chains into dotted paths (safe mode respects quoting). |
| `--flatten-depth <n>` | Limit folding depth when `--key-folding safe` (default: unlimited). |
| `--expand-paths <off\|safe>` | Decode dotted keys into nested objects (safe mode keeps quoted literals). |
| `--strictness <strict\|lenient>` | Strict enforces spec errors; lenient tolerates recoverable issues. |
| `--optimize` | Auto-pick delimiter and folding for token savings (implies `--stats`). |
| `--stats` | Print input/output token counts and savings to stderr. |
| `--tokenizer <cl100k\|o200k\|p50k\|r50k>` | Select tokenizer for `--stats` (default cl100k). |
| `-o, --output <file>` | Target file (stdout when omitted). |
Use --stats to measure token impact. Choose a tokenizer with --tokenizer (e.g., o200k).
TOON borrows two big ideas:
- Indentation for structure (like YAML)
- Headers for uniform arrays (like CSV/TSV)
flowchart LR
scala["Scala data\nMap / Case Class / Iterable"]
norm["Normalize\n(JsonValue)"]
encoder["Encoders\n(pure)"]
toon["TOON text\n(headers)"]
llm["LLM prompt\n(token-efficient)"]
scala --> norm --> encoder --> toon --> llm
style scala fill:#e1f5ff,stroke:#0066cc,color:#000
style norm fill:#f0e1ff,stroke:#8800cc,color:#000
style encoder fill:#fff4e1,stroke:#cc8800,color:#000
style toon fill:#e1ffe1,stroke:#2d7a2d,color:#000
style llm fill:#ffe1e1,stroke:#cc0000,color:#000
Example:
orders[2]{id,user,total,items}:
  1001,ada,29.70,[3]{sku,qty,price}:
    A1,2,9.99
    B2,1,5.50
    C3,1,4.22
  1002,bob,15.00,[1]: gift-card
- orders[2] says “array length 2”.
- {id,user,...} declares columns for the following rows.
- Nested arrays either go inline ([2]: gift-card,store-credit) or open their own blocks.
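To show how much information a header line carries, here is a small sketch that pulls the key, declared length, and column names out of a header such as `orders[2]{id,user,total,items}:`. It is an illustration only: `HeaderSketch` is a hypothetical name, the regex assumes a comma delimiter and a simple unquoted key, and keyless nested headers like `[3]{sku,qty,price}:` are out of scope.

```scala
// Hypothetical sketch of tabular header parsing (comma delimiter only).
object HeaderSketch {
  final case class Header(key: String, length: Int, fields: Vector[String])

  // Shape: key[N]{f1,f2,...}:
  private val Tabular = """^([A-Za-z_][\w.]*)\[(\d{1,9})\]\{([^}]+)\}:$""".r

  def parse(line: String): Option[Header] = line.trim match {
    case Tabular(key, len, fields) =>
      Some(Header(key, len.toInt, fields.split(',').toVector.map(_.trim)))
    case _ => None // not a tabular header
  }
}
```

The declared length (`2` here) is what lets a consumer cross-check the number of rows that follow.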
Full spec reference: toon-format/spec.
See also: Encoding rules
- Strict indentation: use spaces (tabs rejected when strict=true). Indent levels must be multiples of DecodeOptions.indent.
- Quotes only when required: strings with spaces, delimiters, or structural characters need ".." wrapping.
- Array headers carry lengths: headers include the declared row count; strict mode validates it. Keep them intact in prompts to cross-check model output.
- Delimiters: choose comma (default), tab (token-efficient), or pipe (human-friendly). The delimiter is encoded in the header, so consumers know what to expect.
- Uniform rows: tabular arrays must have consistent field counts; strict mode enforces this.
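The length and uniform-row checks above boil down to two comparisons. The sketch below is a hypothetical stand-in for strict-mode validation, not the library's Validator; names and error messages are invented for illustration.

```scala
// Hypothetical sketch of strict-mode checks on a tabular array.
object StrictCheckSketch {
  // Left describes the first violation; Right returns the rows unchanged.
  def validate(
      declared: Int,
      fieldCount: Int,
      rows: Vector[Vector[String]]
  ): Either[String, Vector[Vector[String]]] =
    if (rows.length != declared)
      Left(s"expected $declared rows, found ${rows.length}")
    else
      rows.indexWhere(_.length != fieldCount) match {
        case -1 => Right(rows) // every row has the expected number of fields
        case i  => Left(s"row ${i + 1}: expected $fieldCount fields, found ${rows(i).length}")
      }
}
```

This is also why keeping headers intact in prompts is useful: a model that drops or invents a row produces output that fails the first check.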
Quoting vs. unquoted strings (encoder rules):
| Condition | Needs quotes? | Reason |
|---|---|---|
| Empty string | Yes | Ambiguous if unquoted. |
| Leading/trailing whitespace | Yes | Preserves spaces. |
| Contains `:` | Yes | Conflicts with key separators. |
| Contains delimiter (`,` / `\t` / `\|`) | Yes | Would be read as a field separator. |
| Contains `"` or `\` | Yes | Must be escaped inside quotes. |
| Contains `[` `]` `{` `}` | Yes | Structural tokens. |
| Contains `\n`, `\r`, `\t` | Yes | Control characters. |
| Starts with `-` at list depth | Yes | Could be parsed as list marker. |
| Boolean/Null literal: `true`/`false`/`null` | Yes | Avoids primitive coercion. |
| Looks numeric (e.g., `-12`, `1.2e5`, `01`) | Yes | Avoids numeric coercion; leading zeros are reserved. |
flowchart TD
s["string value"] --> check1{empty or trimmed != value?}
check1 -- yes --> q[quote]
check1 -- no --> check2{contains colon / delimiter?}
check2 -- yes --> q
check2 -- no --> check3{structural or control chars?}
check3 -- yes --> q
check3 -- no --> check4{boolean/null or numeric-like?}
check4 -- yes --> q
check4 -- no --> u[unquoted]
style s fill:#e1f5ff,stroke:#0066cc,color:#000
style q fill:#ffe1e1,stroke:#cc0000,color:#000
style u fill:#e1ffe1,stroke:#2d7a2d,color:#000
style check1 fill:#f0e1ff,stroke:#8800cc,color:#000
style check2 fill:#f0e1ff,stroke:#8800cc,color:#000
style check3 fill:#f0e1ff,stroke:#8800cc,color:#000
style check4 fill:#f0e1ff,stroke:#8800cc,color:#000
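The decision flow above translates almost directly into a predicate. The sketch below is a hypothetical approximation, not the encoder's actual rule set: it treats any leading `-` as needing quotes (the table limits that to list depth) and uses a simple regex for "numeric-like".

```scala
// Hypothetical sketch of the quoting decision.
object QuotingSketch {
  private val NumericLike = """^-?\d+(\.\d+)?([eE][+-]?\d+)?$""".r
  private val Literals = Set("true", "false", "null")
  private val Structural = Set(':', '"', '\\', '[', ']', '{', '}', '\n', '\r', '\t')

  def needsQuotes(s: String, delimiter: Char = ','): Boolean =
    s.isEmpty ||                                       // empty string
      s.trim != s ||                                   // leading/trailing whitespace
      s.exists(c => c == delimiter || Structural(c)) || // delimiter, structural, control chars
      s.startsWith("-") ||                             // conservative: could be a list marker
      Literals(s) ||                                   // would decode as boolean/null
      NumericLike.pattern.matcher(s).matches()         // would decode as a number
}
```

Note that the active delimiter is a parameter: `a|b` is safe under the comma delimiter but must be quoted when the pipe delimiter is selected.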
See also: Encoding rules
| Package | Purpose |
|---|---|
| `io.toonformat.toon4s` | Core types: Toon, JsonValue, EncodeOptions, DecodeOptions, Delimiter. Typed entry points live in ToonTyped: ToonTyped.encode[A: Encoder], ToonTyped.decodeAs[A: Decoder]. |
| `io.toonformat.toon4s.encode.*` | Encoders, primitive formatting helpers. |
| `io.toonformat.toon4s.decode.*` | Decoders, parser/validation utilities. |
| `io.toonformat.toon4s.decode.Streaming` | Streaming visitors for tabular arrays (foreachTabular) and nested arrays (foreachArrays). |
| `io.toonformat.toon4s.json.SimpleJson` | Lightweight JSON AST + parser/stringifier used in tests/CLI. |
| `io.toonformat.toon4s.cli.*` | CLI wiring (Main, token estimator). |
Most teams only interact with Toon.encode, Toon.decode, and JsonValue pattern matching. Lower-level modules stay
internal unless you are extending the format.
See also: JsonValue ADT, Encoding model, Decoding rules
| Scala type | TOON behaviour |
|---|---|
| String, Boolean, Byte/Short/Int/Long, Float/Double, BigDecimal | Direct primitives; floats/doubles silently drop NaN/Inf → null (to stay deterministic). |
| Option[A] | Some(a) → encode a; None → null. |
| Either[L, R] | Encoded as JSON-like objects ({"Left": ...}) via product encoding. Consider normalizing upstream. |
| Iterable, Iterator, Array | Encoded as TOON arrays, falling back to list syntax when not tabular. |
| Map[String, _], VectorMap | Preserve insertion order; keys auto-quoted when needed. |
| Product (case classes / tuples) | Converted through productElementNames + productIterator. |
| Java time (Instant, ZonedDateTime, etc.) | ISO‑8601 strings, UTC-normalized for deterministic prompts. |
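Two rows of the table lend themselves to a short illustration: how a case class exposes its fields through `productElementNames` and `productIterator`, and what "normalizing upstream" might look like for an `Either`. The sketch below uses only the Scala 2.13+ standard library; `ConversionSketch`, `fields`, `Outcome` and `normalize` are hypothetical names, not part of toon4s.

```scala
object ConversionSketch {
  final case class User(id: Int, name: String)

  // Field names paired with values, in declaration order (Scala 2.13+).
  def fields(p: Product): List[(String, Any)] =
    p.productElementNames.zip(p.productIterator).toList

  // One possible upstream normalization of an Either: a flat record with
  // explicit columns, so every row has the same shape.
  final case class Outcome(ok: Boolean, value: Option[Int], error: Option[String])

  def normalize(e: Either[String, Int]): Outcome =
    e.fold(
      err => Outcome(ok = false, value = None, error = Some(err)),
      v => Outcome(ok = true, value = Some(v), error = None)
    )
}
```

A flat `Outcome` keeps rows uniform, which is what tabular encoding needs; a raw `Either` would produce objects whose keys differ between `Left` and `Right`.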
Preferred (Scala 3): typed APIs with type classes.
import io.toonformat.toon4s._
import io.toonformat.toon4s.codec.{Encoder, Decoder}
case class User(id: Int, name: String) derives Encoder, Decoder
val s: String = Toon.encode(User(1, "Ada")).fold(throw _, identity)
val u: User = ToonTyped.decodeAs[User](s).fold(throw _, identity)

Fallbacks:
- Decoding always yields the JsonValue ADT; pattern-match it if you prefer.
- SimpleJson.toScala yields Any for quick-and-dirty interop.
Why another TOON for JVM/Scala?
- Ergonomics: native Scala APIs and derivation reduce boilerplate versus Java/TS bindings in Scala codebases.
- Footprint: zero-dep core minimizes transitive risk compared to libraries built atop general JSON stacks.
- Streaming: visitors let you validate/model-check row counts without paying for full tree allocation.
- Parity: same token savings as JToon/toon because the format drives savings, not the implementation.
- Throughput: competitive decode throughput (see JMH); encode throughput is solid and easy to reason about.
See also: Encoding model, JsonValue ADT
flowchart TD
raw["LLM response"]
parse["SimpleJson.parse"]
json["JsonValue\n(JObj/JArray…)"]
mapScala["Pattern match /\ncustom decoder"]
domain["Domain model\n(case class, DTO)"]
raw --> parse --> json --> mapScala --> domain
style raw fill:#e1f5ff,stroke:#0066cc,color:#000
style parse fill:#fff4e1,stroke:#cc8800,color:#000
style json fill:#f0e1ff,stroke:#8800cc,color:#000
style mapScala fill:#ffe1e1,stroke:#cc0000,color:#000
style domain fill:#e1ffe1,stroke:#2d7a2d,color:#000
Prompt scaffolding idea:
System: You are a precise data validator.
User:
Please read the following TOON payload describing purchase orders.
Return JSON with fields {id, total, status} for every order with total > 100.
Validate row counts against the headers.
Then attach:
orders[3]{id,total,status}:
  101,250.10,pending
  102,89.00,fulfilled
  103,140.00,review
Why it helps:
- Array headers give you a checksum (“model must return 3 rows”).
- Tabular headers reduce hallucinations (model sees explicit columns).
- Reduced tokens = cheaper prompts; faster iteration = cheaper eval runs.
For response validation, decode the model output using Toon.decode (if the LLM responds in TOON) or rehydrate JSON
responses and compare lengths/keys.
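The "header as checksum" idea can be sketched without the library. The block below is an illustration only: `RowCountCheck` and `check` are hypothetical names, and it handles a single flat tabular block with comma delimiters and no quoted values. For real payloads, Toon.decode or the streaming visitors are the intended tools.

```scala
object RowCountCheck {
  // key[N]{col1,col2,...}: captures the key, declared row count and column list.
  private val Header = """(\w+)\[(\d+)\]\{([^}]*)\}:""".r

  def check(payload: String): Either[String, Int] = {
    val lines = payload.linesIterator.map(_.trim).filter(_.nonEmpty).toList
    lines match {
      case Header(key, n, cols) :: rows =>
        val declared = n.toInt
        val width = cols.split(",", -1).length
        if (rows.length != declared)
          Left(s"$key: header declares $declared rows, found ${rows.length}")
        else if (rows.exists(_.split(",", -1).length != width))
          Left(s"$key: a row does not have $width columns")
        else Right(declared)
      case _ => Left("no tabular header found")
    }
  }
}
```

Returning `Either` mirrors the error-handling style used throughout the README, so a failed check composes with other validation steps in a for-comprehension.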
See also: Delimiters & headers, Strict mode
What we didn't compromise on: toon4s prioritizes correctness, type safety, and functional purity over convenience. All limitations below are honest tradeoffs we made consciously, not shortcuts.
These are inherent to the TOON specification, not toon4s:
- Irregular arrays: When rows differ in shape, TOON falls back to YAML-like list syntax; token savings shrink. This is by design: tabular encoding requires uniform structure.
- Binary blobs: TOON doesn't support binary data (spec limitation). Encode as Base64 strings manually before passing to toon4s.
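For the binary case, the manual step is a plain Base64 round trip with the JDK. A minimal sketch follows; `BlobSketch`, `encodeBlob` and `decodeBlob` are hypothetical helper names, and the resulting string would then be placed in whatever field your model uses for the blob.

```scala
import java.util.Base64

object BlobSketch {
  // Bytes to a Base64 string that can be carried as an ordinary string value.
  def encodeBlob(bytes: Array[Byte]): String =
    Base64.getEncoder.encodeToString(bytes)

  // The reverse step after decoding the document.
  def decodeBlob(text: String): Array[Byte] =
    Base64.getDecoder.decode(text)
}
```

Base64 inflates payloads by roughly a third, so large blobs can erase the token savings; passing a reference (path or ID) instead is often the cheaper choice.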
These are conscious design decisions:
- Full AST decode (v0.1.0): Toon.decode() and Toon.decodeFrom() read entire input into memory before parsing. This ensures:
  - Pure functions: Decode returns Either[DecodeError, JsonValue] with complete error context
  - Type safety: Full AST enables exhaustive pattern matching and sealed ADT validation
  - Referential transparency: No hidden state, no streaming cursors to manage

  For large files (>100MB), we provide streaming alternatives that maintain purity:
  - Streaming.foreachTabular - tail-recursive row-by-row validation (constant memory)
  - Streaming.foreachArrays - validate nested arrays incrementally (stack-safe)
  - Both use pure visitor pattern (no side effects, accumulator-based)

  Full streaming decode (incremental parsing of entire documents) is planned for v0.2.0 while maintaining functional purity (likely using FS2/ZIO Stream integration).
- Deterministic ordering: We use VectorMap instead of HashMap because predictable field ordering matters more than raw lookup speed. This aids debugging, testing, and spec compliance.
- No mutation: Immutability with tailrec. Trade: ~20% throughput decrease. Gain: zero race conditions, zero hidden state, composable functions.
- No external dependencies (core): Zero deps means you can't use Jackson/Circe codecs directly. Trade: manual integration. Gain: 491KB JAR, zero CVEs, zero conflicts.
- Locale-specific numbers: Encoder always uses `.` decimal separators (spec requirement). Normalize inputs beforehand.
- CLI tokenizer: TokenEstimator currently defaults to CL100K_BASE (GPT-4/3.5). Model-specific differences apply (easily configurable).
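The deterministic-ordering tradeoff is easy to see with the standard library alone. The sketch below shows only `scala.collection.immutable.VectorMap` behaviour (Scala 2.13+); `OrderingSketch` and its values are illustrative names, not toon4s internals.

```scala
import scala.collection.immutable.VectorMap

object OrderingSketch {
  val fields: VectorMap[String, String] =
    VectorMap("id" -> "1", "name" -> "Ada", "email" -> "ada@example.com")

  // Adding a new key appends it; existing keys keep their positions.
  val extended: VectorMap[String, String] = fields + ("role" -> "admin")

  // Iteration follows insertion order, so encoded output is reproducible.
  val keyOrder: List[String] = extended.keys.toList
}
```

A HashMap makes no such ordering promise, which is why two runs over the same data could otherwise emit columns in different orders and break snapshot tests.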
Philosophy: We refuse shortcuts that compromise type safety (Any, asInstanceOf), purity (var, while, null), or correctness (exceptions in happy paths). If a feature can't be implemented purely, we defer it until we find the right abstraction.
| Construct | Example | Notes |
|---|---|---|
| Object | user:\n id: 123\n name: Ada | Indentation defines nesting. |
| Inline primitives | tags[3]: reading,gaming,coding | Quotes only when needed. |
| Tabular array | users[2]{id,name}:\n 1,Ada\n 2,Bob | Header defines columns. |
| Nested tabular | orders[1]{id,items}:\n 1,[2]{sku,qty}: ... | Inner header scoped to nested block. |
| Header with delimiter | items[2\|]{sku\|… | |
| Empty array/object | prefs[0]: or prefs: {} | Choose whichever fits your schema. |
| Comments | (not part of spec - strip before encoding) | Keep prompts clean; TOON itself has no comment syntax. |
- CLI flag rename: --strict is deprecated; use --strictness strict|lenient. The old flag still works with a warning for now.
- Length markers: legacy [#N] headers are no longer emitted; headers remain [N]{...} with delimiter hints (e.g., [2|]{...}). Decoders stay lenient toward existing [#N] inputs.
- Row depth: tabular arrays that are the first field in list-item objects now emit rows at depth +2 (v3 layout). Decoders remain lenient to legacy depths.
- Path expansion & key folding: available via --expand-paths safe and --key-folding safe; defaults remain off for backward compatibility.
sbt scalafmtCheckAll # formatting
sbt +test # Scala 2.13 and 3.3 suites
./smoke-tests/run-smoke.sh

Releases are fully automated, but you must complete the prerequisites in docs/releasing.md (namespace approval + PGP key upload) before the GitHub Actions workflows can publish to Maven Central.

GitHub Actions runs:
- Quick checks: scalafmt and +compile on Ubuntu.
- Matrix tests: Linux/macOS/Windows × Scala 2.13 & 3.3, with test-report artifacts when a shard fails.
- Smoke: CLI round-trip script on Ubuntu.
- All checks pass “gate” job.
- Quick run (single iteration, small windows):
sbt "jmh/jmh:run -i 1 -wi 1 -r 500ms -w 500ms -f1 -t1 io.toonformat.toon4s.jmh.EncodeDecodeBench.*"
- Typical run:
sbt "jmh/jmh:run -i 5 -wi 5 -f1 -t1 io.toonformat.toon4s.jmh.EncodeDecodeBench.*"
Or use aliases:
sbt jmhDev # quick check
sbt jmhFull # heavy run
- Intent: publish indicative throughput numbers for common shapes (tabular, lists, nested objects) under reproducible settings.
- Harness: JMH via sbt-jmh 0.4.5. Single thread (-t1), single fork (-f1).
- Quick config: -i 1 -wi 1 -r 500ms -w 500ms (fast sanity; noisy but useful for local checks).
- Heavy config: -i 5 -wi 5 -r 2s -w 2s (more stable). CI runs this set with a soft 150s guard.
- Reporting: CI also emits JSON (-rf json -rff /tmp/jmh.json) and posts a summary table on PRs.
- Machine baseline (indicative): macOS Apple M‑series (M2/M3), Temurin Java 21, default power settings.
- Guidance: close heavy apps/IDEs, plug in AC power, warm JVM before measurement. Numbers vary by OS, JVM, and data shapes; treat them as relative, not absolute.
For Apache Spark-style workloads processing millions of rows, toon4s provides a composable visitor pattern that eliminates intermediate allocations:
import io.toonformat.toon4s.visitor._
// Compose: Repair LLM output → Filter sensitive keys → Encode
val visitor = new JsonRepairVisitor(
  new FilterKeysVisitor(
    Set("password", "ssn", "api_key"),
    new StringifyVisitor(indent = 2)
  )
)
// Single pass, zero intermediate trees
val cleanToon: String = Dispatch(llmJson, visitor)

Visitor composition flow:
flowchart LR
JSON["JsonValue Tree"] --> DISPATCH["Dispatch"]
DISPATCH --> REPAIR["JsonRepairVisitor"]
REPAIR --> FILTER["FilterKeysVisitor"]
FILTER --> STRINGIFY["StringifyVisitor"]
STRINGIFY --> OUTPUT["TOON String"]
style JSON fill:#e1f5ff,stroke:#0066cc,color:#000
style DISPATCH fill:#fff4e1,stroke:#cc8800,color:#000
style REPAIR fill:#f0e1ff,stroke:#8800cc,color:#000
style FILTER fill:#f0e1ff,stroke:#8800cc,color:#000
style STRINGIFY fill:#f0e1ff,stroke:#8800cc,color:#000
style OUTPUT fill:#e1ffe1,stroke:#2d7a2d,color:#000
Performance comparison:
flowchart TD
subgraph WITHOUT["Without visitors - O(n) space"]
W1["parse(row)"] --> W2["Tree 1"]
W2 --> W3["filter(tree1)"]
W3 --> W4["Tree 2"]
W4 --> W5["encode(tree2)"]
W5 --> W6["String"]
end
subgraph WITH["With visitors - O(d) space"]
V1["Dispatch(row, visitor)"] --> V2["Single Pass"]
V2 --> V3["String"]
end
style W2 fill:#ffe1e1,stroke:#cc0000,color:#000
style W4 fill:#ffe1e1,stroke:#cc0000,color:#000
style W6 fill:#e1ffe1,stroke:#2d7a2d,color:#000
style V1 fill:#f0e1ff,stroke:#8800cc,color:#000
style V2 fill:#fff4e1,stroke:#cc8800,color:#000
style V3 fill:#e1ffe1,stroke:#2d7a2d,color:#000
Dispatch algorithm (how visitor traversal works):
flowchart TD
START["Dispatch(json, visitor)"] --> MATCH{Pattern match JsonValue}
MATCH -->|" JString(s) "| VS["visitor.visitString(s)"]
MATCH -->|" JNumber(n) "| VN["visitor.visitNumber(n)"]
MATCH -->|" JBool(b) "| VB["visitor.visitBool(b)"]
MATCH -->|" JNull "| VNULL["visitor.visitNull()"]
MATCH -->|" JArray(elems) "| ARR["Map over elements:\nDispatch(elem, visitor)"]
MATCH -->|" JObj(fields) "| OBJ["visitor.visitObject()"]
ARR --> VARR["visitor.visitArray(results)"]
OBJ --> LOOP{"For each (key, value)"}
LOOP --> VKEY["objVisitor.visitKey(key)"]
VKEY --> VVAL["objVisitor.visitValue()"]
VVAL --> REC["Dispatch(value, newVisitor)"]
REC --> VVALRES["objVisitor.visitValue(result)"]
VVALRES --> LOOP
LOOP -->|" Done "| DONE["objVisitor.done()"]
VS --> RETURN["Return T"]
VN --> RETURN
VB --> RETURN
VNULL --> RETURN
VARR --> RETURN
DONE --> RETURN
style START fill:#e1f5ff,stroke:#0066cc,color:#000
style MATCH fill:#fff4e1,stroke:#cc8800,color:#000
style VS fill:#f0e1ff,stroke:#8800cc,color:#000
style VN fill:#f0e1ff,stroke:#8800cc,color:#000
style VB fill:#f0e1ff,stroke:#8800cc,color:#000
style VNULL fill:#f0e1ff,stroke:#8800cc,color:#000
style ARR fill:#f0e1ff,stroke:#8800cc,color:#000
style OBJ fill:#f0e1ff,stroke:#8800cc,color:#000
style VARR fill:#f0e1ff,stroke:#8800cc,color:#000
style LOOP fill:#fff4e1,stroke:#cc8800,color:#000
style VKEY fill:#f0e1ff,stroke:#8800cc,color:#000
style VVAL fill:#f0e1ff,stroke:#8800cc,color:#000
style REC fill:#fff4e1,stroke:#cc8800,color:#000
style VVALRES fill:#f0e1ff,stroke:#8800cc,color:#000
style DONE fill:#f0e1ff,stroke:#8800cc,color:#000
style RETURN fill:#e1ffe1,stroke:#2d7a2d,color:#000
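To make the traversal concrete, here is a deliberately simplified, self-contained sketch of the same idea. It is not toon4s's implementation: all names (`VisitorSketch`, `Json`, `Visitor`, `dispatch`, `FilterKeys`, `Render`) are hypothetical, objects are visited in one call instead of the visitKey/visitValue/done lifecycle shown above, the traversal is plainly recursive (not stack-safe), and the output is a JSON-like string without escaping rather than TOON.

```scala
object VisitorSketch {
  // A tiny stand-in ADT; toon4s has its own JsonValue type.
  sealed trait Json
  final case class JStr(value: String) extends Json
  final case class JNum(value: BigDecimal) extends Json
  final case class JBool(value: Boolean) extends Json
  case object JNull extends Json
  final case class JArr(items: Vector[Json]) extends Json
  final case class JObj(fields: Vector[(String, Json)]) extends Json

  // One callback per node kind; T is whatever the visitor produces.
  trait Visitor[T] {
    def visitString(s: String): T
    def visitNumber(n: BigDecimal): T
    def visitBool(b: Boolean): T
    def visitNull(): T
    def visitArray(items: Vector[T]): T
    def visitObject(fields: Vector[(String, T)]): T
  }

  // Pattern match on the node, recurse into children, hand results to the visitor.
  def dispatch[T](json: Json, v: Visitor[T]): T = json match {
    case JStr(s)      => v.visitString(s)
    case JNum(n)      => v.visitNumber(n)
    case JBool(b)     => v.visitBool(b)
    case JNull        => v.visitNull()
    case JArr(items)  => v.visitArray(items.map(dispatch(_, v)))
    case JObj(fields) => v.visitObject(fields.map { case (k, x) => k -> dispatch(x, v) })
  }

  // Intermediate visitor: drops blocked keys, forwards everything else downstream.
  final class FilterKeys[T](blocked: Set[String], down: Visitor[T]) extends Visitor[T] {
    def visitString(s: String): T = down.visitString(s)
    def visitNumber(n: BigDecimal): T = down.visitNumber(n)
    def visitBool(b: Boolean): T = down.visitBool(b)
    def visitNull(): T = down.visitNull()
    def visitArray(items: Vector[T]): T = down.visitArray(items)
    def visitObject(fields: Vector[(String, T)]): T =
      down.visitObject(fields.filterNot { case (k, _) => blocked(k) })
  }

  // Terminal visitor: renders a compact string (no escaping; illustration only).
  object Render extends Visitor[String] {
    def visitString(s: String): String = "\"" + s + "\""
    def visitNumber(n: BigDecimal): String = n.toString
    def visitBool(b: Boolean): String = b.toString
    def visitNull(): String = "null"
    def visitArray(items: Vector[String]): String = items.mkString("[", ",", "]")
    def visitObject(fields: Vector[(String, String)]): String =
      fields.map { case (k, v) => s"$k: $v" }.mkString("{", ", ", "}")
  }
}
```

Because `FilterKeys` wraps another `Visitor[T]`, filtering and rendering happen in the same traversal and no filtered copy of the tree is ever built. One difference worth noting: in this simplified form a blocked field's value is still computed before it is dropped, which a key-first lifecycle like the one in the diagram can avoid.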
ObjectVisitor lifecycle (zero-overhead secret):
sequenceDiagram
participant D as Dispatch
participant V as Visitor[T]
participant OV as ObjectVisitor[T]
participant DS as Downstream Visitor
Note over D, DS: Processing JObj({"name": "Ada", "age": 30})
D ->> V: visitObject()
V ->> OV: Create ObjectVisitor
OV -->> D: Return objVisitor
loop For each field
D ->> OV: visitKey("name")
Note over OV: Store key, no allocation yet
D ->> OV: visitValue()
OV ->> DS: Return new visitor for value
D ->> DS: Dispatch(JString("Ada"), visitor)
DS -->> D: Return result: T
D ->> OV: visitValue(result)
Note over OV: Forward (key, T) to downstream
end
D ->> OV: done()
OV -->> D: Return final T
Note over D, DS: Zero intermediate trees - results flow directly!
Key visitors:
- StringifyVisitor - Terminal visitor producing TOON strings
- ConstructionVisitor - Terminal visitor reconstructing JsonValue trees
- FilterKeysVisitor - Intermediate visitor removing sensitive fields
- JsonRepairVisitor - Fixes malformed LLM JSON (converts string "true" → JBool, normalizes keys, etc.)
- StreamingEncoder - Streams directly to Writer for large datasets
- TreeWalker[T] - Universal adapter for encoding from Jackson JsonNode, Circe Json, Play JSON, etc. without JsonValue conversion
- TreeConstructionVisitor[T] - Universal adapter for decoding to Jackson JsonNode, Circe Json, etc. without JsonValue intermediate
- VisitorConverter[T] - Typeclass for converting domain models to JsonValue with .toJsonValue syntax
Performance: O(n) time and O(d) space, where d is the nesting depth. Because memory grows with depth rather than input size, per-row memory stays effectively constant when processing millions of shallow rows.
Jackson/Circe interop (zero-overhead, typeclass pattern):
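import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
import com.fasterxml.jackson.databind.node.JsonNodeFactory
import io.toonformat.toon4s._
import io.toonformat.toon4s.visitor._
import io.toonformat.toon4s.visitor.TreeWalkerOps._
// Setup: copy the JacksonWalker adapter from the TreeWalker scaladocs
implicit val walker: TreeWalker[JsonNode] = JacksonWalker
// objectMapper (an ObjectMapper), apiResponse, and toonString are assumed to be defined by the caller
// Encode: Jackson JsonNode → TOON (zero JsonValue intermediate)
val jacksonNode: JsonNode = objectMapper.readTree(apiResponse)
val toon: String = jacksonNode.toToon(indent = 2)
val filtered: String = jacksonNode.toToonFiltered(Set("password"), indent = 2)
// Decode: TOON → Jackson JsonNode (decoded tree dispatched into a construction visitor)
val factory = JsonNodeFactory.instance
// Named decodedNode so it does not redefine jacksonNode in the same scope
val decodedNode: JsonNode = Toon.decode(toonString)
  .map(Dispatch(_, JacksonConstructionVisitor(factory)))
  .fold(throw _, identity)

See TreeWalker and TreeConstructionVisitor scaladocs for complete Jackson/Circe adapter examples (copy-paste ready).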
See also: io.toonformat.toon4s.visitor package docs and Li Haoyi's article.
- Tabular rows only:
import io.toonformat.toon4s.decode.Streaming
val reader = new java.io.StringReader(
"""
users[2]{id,name}:
  1,Ada
  2,Bob
""".stripMargin)
Streaming.foreachTabular(reader) { (key, fields, values) =>
// key = Some("users"), fields = List("id","name"), values = Vector("1","Ada") then Vector("2","Bob")
}

- Nested arrays with path:
val reader2 = new java.io.StringReader(
"""
orders[1]{id,items}:
  1001,[2]{sku,qty}:
    A1,2
    B2,1
""".stripMargin)
Streaming.foreachArrays(reader2)({ (path, header) =>
// path: Vector("orders") when header key is bound
})({ (path, header, values) =>
// values: Vector("A1","2"), then Vector("B2","1")
})

When to use streaming
- Validate/model‑check tabular sections quickly (row counts, required columns) without allocating a full AST.
- Pipe rows directly to sinks (CSV writers, database ingesters, online aggregation) for large payloads.
- Pre‑filter/transform rows on the fly before passing trimmed data to LLMs.
- Keep full Toon.decode for non‑tabular data or when you need the entire tree (e.g., complex nested edits).
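The constant-memory, accumulator-based style described above can be illustrated with a standalone fold over a Reader. This is a sketch of the pattern only, not the Streaming API: `RowFoldSketch` and `foldRows` are hypothetical names, and the parser assumes a single tabular block with a header line, comma delimiters and no quoted values.

```scala
import java.io.{BufferedReader, Reader}
import scala.annotation.tailrec

object RowFoldSketch {
  // Reads one line at a time and threads an accumulator through the rows,
  // so memory use does not grow with the number of rows.
  def foldRows[A](reader: Reader, zero: A)(f: (A, Vector[String]) => A): A = {
    val in = new BufferedReader(reader)

    @tailrec
    def loop(acc: A, headerSeen: Boolean): A =
      in.readLine() match {
        case null                      => acc                      // end of input
        case line if line.trim.isEmpty => loop(acc, headerSeen)    // skip blank lines
        case _ if !headerSeen          => loop(acc, headerSeen = true) // skip the header line
        case line                      => loop(f(acc, line.trim.split(",", -1).toVector), headerSeen)
      }

    loop(zero, headerSeen = false)
  }
}
```

Counting rows is then `foldRows(reader, 0)((n, _) => n + 1)`, and piping to a sink is a matter of choosing a different accumulator; the `@tailrec` annotation makes the compiler verify that the loop runs in constant stack space.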
MIT - see LICENSE.