TOON: Token-Oriented Object Notation

A Clojure/ClojureScript implementation of Token-Oriented Object Notation – a compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage.

TOON uses 49% fewer tokens than formatted JSON (28% fewer than compact JSON) while keeping an explicit structure that helps LLMs parse and validate data reliably. It is intended for LLM input as a lossless, drop-in representation of JSON data.

Specification: This library implements the TOON v3.0 specification.
Reference Implementation: TypeScript/JavaScript

Why TOON?

When working with Large Language Models, token efficiency directly impacts cost, context window usage, and processing speed. LLM tokens still cost money – and standard JSON is verbose and token-expensive.

TOON's sweet spot is uniform arrays of objects – multiple fields per row, same structure across items. It borrows YAML's indentation-based structure for nested objects and CSV's tabular format for uniform data rows, then optimizes both for token efficiency in LLM contexts.

Token Efficiency

Based on benchmarks using the GPT-5 o200k_base tokenizer:

49.1% reduction vs formatted JSON (2-space indentation)
28.0% reduction vs compact JSON (minified)
39.4% reduction vs YAML
56.0% reduction vs XML

Real-world examples:

GitHub repositories (100 items): 42.3% fewer tokens than JSON
Daily analytics (180 days): 58.9% fewer tokens than JSON
E-commerce orders: 35.4% fewer tokens than JSON

Key Features

💸 Token-efficient: Eliminates redundant punctuation and repeated keys
🤿 LLM-friendly guardrails: Explicit lengths and fields enable validation
🍱 Minimal syntax: Removes braces, brackets, and most quotes
📐 Indentation-based: Uses whitespace like YAML instead of braces
🧺 Tabular arrays: Declare keys once, stream data as rows

When to Use TOON

TOON excels at:

Uniform arrays of objects (same fields, primitive values)
Large datasets with consistent structure
Tabular data with multiple rows

JSON is better for:

Non-uniform data with varying field sets
Deeply nested structures
Mixed-type collections

CSV is more compact for:

Flat, uniform tables without any nesting
Data without nested objects or arrays
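
A rough test for TOON's sweet spot is to check that a collection is a non-empty sequence of maps that all share the same key set and hold only primitive values. The sketch below assumes exactly that rule. `tabular-eligible?` and `primitive?` are hypothetical helpers, not part of com.vadelabs.toon, and the library's own format selection may differ in edge cases.

```clojure
(defn primitive?
  "True for values that fit in a single TOON cell. Keywords are included
  because they normalize to strings."
  [v]
  (or (nil? v) (string? v) (number? v) (boolean? v) (keyword? v)))

(defn tabular-eligible?
  "Hypothetical helper: true when xs is a non-empty sequence of maps that
  all share the same key set and contain only primitive values."
  [xs]
  (boolean
   (and (seq xs)
        (every? map? xs)
        (apply = (map (comp set keys) xs))          ; same fields in every item
        (every? #(every? primitive? (vals %)) xs)))) ; no nested maps or vectors

(tabular-eligible? [{:id 1 :name "Alice"} {:id 2 :name "Bob"}])
;=> true
(tabular-eligible? [{:id 1} {:id 2 :extra true}])
;=> false
```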

Installation

Clojure CLI/deps.edn

com.vadelabs/toon {:mvn/version "2025.12.01-36"}

Leiningen/Boot

[com.vadelabs/toon "2025.12.01-36"]

Quick Start

(require '[com.vadelabs.toon.core :as toon])

;; Encode Clojure data to TOON
(toon/encode {:name "Alice" :age 30 :tags ["dev" "rust"]})
;=> "name: Alice\nage: 30\ntags[2]: dev,rust"

;; Decode TOON to Clojure data
(toon/decode "name: Alice\nage: 30\ntags[2]: dev,rust")
;=> {"name" "Alice", "age" 30.0, "tags" ["dev" "rust"]}

Format Examples

Objects

JSON:

{
  "name": "Alice",
  "age": 30,
  "active": true
}

TOON:

name: Alice
age: 30
active: true

Nested Objects

JSON:

{
  "user": {
    "name": "Alice",
    "email": "alice@example.com"
  }
}

TOON:

user:
  name: Alice
  email: alice@example.com
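
The indentation rule for nested objects fits in a few lines. `encode-object-sketch` is a hypothetical function, not the library's `encode`. It assumes maps of primitives that need no quoting and uses the default indent of two spaces per level.

```clojure
(require '[clojure.string :as str])

(defn scalar->toon
  "Renders a primitive without quoting; nil becomes null."
  [v]
  (if (nil? v) "null" (str v)))

(defn encode-object-sketch
  "Hypothetical sketch: each nesting level adds two spaces of indentation,
  and a nested map puts its key on its own line ending in a colon."
  ([m] (encode-object-sketch m 0))
  ([m depth]
   (let [pad (apply str (repeat (* 2 depth) \space))]
     (str/join "\n"
               (for [[k v] m]
                 (if (map? v)
                   (str pad (name k) ":\n" (encode-object-sketch v (inc depth)))
                   (str pad (name k) ": " (scalar->toon v))))))))

(encode-object-sketch {:name "Alice" :age 30 :active true})
;=> "name: Alice\nage: 30\nactive: true"
```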

Arrays of Primitives (Inline)

JSON:

{
  "tags": ["reading", "gaming", "coding"]
}

TOON:

tags[3]: reading,gaming,coding

Arrays of Objects (Tabular Format)

This is TOON's sweet spot – uniform arrays of objects with consistent fields:

JSON:

{
  "users": [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"}
  ]
}

TOON:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

The tabular format declares each key once in the header instead of repeating it in every object, so the token savings grow with the number of rows.
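
A minimal sketch of how a uniform vector of maps turns into a tabular block: the header carries the length and field names, and each row carries only values. `encode-tabular-sketch` is a hypothetical name, not the library's `encode`. It assumes every value is a primitive that needs no quoting and takes the field order from the first row.

```clojure
(require '[clojure.string :as str])

(defn encode-tabular-sketch
  "Hypothetical sketch: renders a uniform vector of maps as a TOON tabular
  array under key k. Assumes primitive values that need no quoting."
  [k rows]
  (let [fields (keys (first rows))                  ; header order = first row's key order
        cell   (fn [v] (if (nil? v) "null" (str v)))
        header (str (name k) "[" (count rows) "]{"
                    (str/join "," (map name fields)) "}:")
        line   (fn [row]
                 (str "  " (str/join "," (map #(cell (get row %)) fields))))]
    (str/join "\n" (cons header (map line rows)))))

(encode-tabular-sketch :users [{:id 1 :name "Alice" :role "admin"}
                               {:id 2 :name "Bob" :role "user"}])
;=> "users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user"
```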

Arrays of Mixed Items (List Format)

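For non-uniform data, TOON uses list format, in which each item starts with a "- " marker. The items below share the same keys, so an encoder would normally emit them in tabular form; they are shown in list format for illustration, which is equally valid TOON: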

TOON:

items[3]:
  - name: Laptop
    price: 999
  - name: Mouse
    price: 29
  - name: Keyboard
    price: 79

API Reference

`encode`

Encodes Clojure data structures to TOON format.

(encode input)
(encode input options)

Parameters:

input - Any Clojure value (normalized to JSON-compatible types)
options - Optional map:
- :indent - Spaces per indentation level (default: 2)
- :delimiter - Array value delimiter: "," (default), "\t", or "|"
- :key-collapsing - Key collapsing mode: :off (default) or :safe
- :flatten-depth - Max depth for key collapsing (default: Infinity)
- :replacer - Function (fn [key value path] ...) to transform/filter values

Returns: String in TOON format

Examples:

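;; Assumes encode is referred from the core namespace shown in Quick Start
(require '[com.vadelabs.toon.core :refer [encode]])

;; Basic encoding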
(encode {:name "Ada" :tags ["reading" "gaming"]})
;=> "name: Ada\ntags[2]: reading,gaming"

;; Custom delimiter
(encode {:tags ["a" "b" "c"]} {:delimiter "\t"})
;=> "tags[3\t]: a\tb\tc"

;; Tabular array format
(encode [{:id 1 :name "Alice"}
         {:id 2 :name "Bob"}])
;=> "[2]{id,name}:\n  1,Alice\n  2,Bob"

;; Using replacer to filter sensitive fields
(encode {:name "Alice" :password "secret"}
        {:replacer (fn [k v _] (when-not (= k "password") v))})
;=> "name: Alice"

;; Using replacer to transform values
(require '[clojure.string :as str])
(encode {:status "active"}
        {:replacer (fn [k v _] (if (string? v) (str/upper-case v) v))})
;=> "status: ACTIVE"

`decode`

Decodes TOON format to Clojure data structures.

(decode input)
(decode input options)

Parameters:

input - String in TOON format
options - Optional map:
- :indent - Spaces per indentation level (default: 2)
- :strict - Enable strict validation (default: true)

Returns: Clojure data structure (maps, vectors, primitives)

Examples:

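;; Assumes decode is referred from the core namespace shown in Quick Start
(require '[com.vadelabs.toon.core :refer [decode]])

;; Basic decoding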
(decode "name: Ada\ntags[2]: reading,gaming")
;=> {"name" "Ada", "tags" ["reading" "gaming"]}

;; Tabular array
(decode "[2]{id,name}:\n  1,Alice\n  2,Bob")
;=> [{"id" 1.0, "name" "Alice"} {"id" 2.0, "name" "Bob"}]

;; Inline array
(decode "[3]: 1,2,3")
;=> [1.0 2.0 3.0]

;; Relaxed mode (allows tabs, inconsistent indentation)
(decode "name: Ada" {:strict false})
;=> {"name" "Ada"}
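
For a single inline array line, decoding comes down to reading the header, splitting on the delimiter, and checking the declared length. The simplified sketch below assumes comma-delimited, unquoted values. `decode-inline-sketch` and `parse-scalar` are hypothetical names, not the library's `decode`, and numbers become doubles to match the decode examples above.

```clojure
(require '[clojure.string :as str])

(defn parse-scalar
  "Hypothetical: literals map to booleans/nil, numeric tokens to doubles,
  anything else stays a string."
  [s]
  (cond
    (= s "true")  true
    (= s "false") false
    (= s "null")  nil
    (re-matches #"-?\d+(\.\d+)?([eE][+-]?\d+)?" s) (Double/parseDouble s)
    :else s))

(defn decode-inline-sketch
  "Parses a line such as \"tags[2]: dev,rust\" into [key items],
  throwing when the item count disagrees with the declared length."
  [line]
  (when-let [[_ k n body] (re-matches #"([^\[]*)\[(\d+)\]: ?(.*)" line)]
    (let [items (if (str/blank? body) [] (mapv parse-scalar (str/split body #",")))]
      (when (not= (Long/parseLong n) (count items))
        (throw (ex-info "length mismatch" {:declared n :found (count items)})))
      [k items])))

(decode-inline-sketch "[3]: 1,2,3")
;=> ["" [1.0 2.0 3.0]]
```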

Format Specification

Primitives

string: Hello World
number: 42
float: 3.14
boolean: true
nil: null

Quoted Strings

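Strings are quoted when they contain special characters (such as the delimiter, a colon, or a newline) or when they would otherwise be read as another type, such as true or a number: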

comma: "a,b"
colon: "key:value"
reserved: "true"
newline: "line1\nline2"
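
The quoting decision can be sketched as a predicate plus an escaping step. This is a hypothetical, simplified subset of the rules; the TOON specification defines the complete list, and `needs-quotes?`, `quote-string`, and `encode-string` are illustrative names, not functions exported by the library.

```clojure
(require '[clojure.string :as str])

(defn needs-quotes?
  "Hypothetical, simplified subset of TOON's quoting rules."
  [s delimiter]
  (or (str/blank? s)                                  ; empty or whitespace-only
      (not= s (str/trim s))                           ; leading/trailing whitespace
      (contains? #{"true" "false" "null"} s)          ; would read as a literal
      (some? (re-matches #"-?\d+(\.\d+)?([eE][+-]?\d+)?" s)) ; would read as a number
      (str/includes? s delimiter)                     ; would split the value
      (some? (re-find #"[:\"\\\n\r\t]" s))))          ; colon, quote, backslash, control chars

(defn quote-string
  "Wraps s in double quotes, escaping backslashes first, then quotes and
  control characters."
  [s]
  (str "\""
       (-> s
           (str/replace "\\" "\\\\")
           (str/replace "\"" "\\\"")
           (str/replace "\n" "\\n")
           (str/replace "\r" "\\r")
           (str/replace "\t" "\\t"))
       "\""))

(defn encode-string [s delimiter]
  (if (needs-quotes? s delimiter) (quote-string s) s))

(encode-string "key:value" ",")
;=> "\"key:value\""
```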

Objects

Key-value pairs separated by colons:

name: Alice
age: 30

Nested objects use indentation:

user:
  name: Alice
  email: alice@example.com

Arrays

Inline format (primitives):

tags[3]: reading,gaming,coding

Tabular format (objects with same keys):

[3]{id,name}:
  1,Alice
  2,Bob
  3,Carol
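
Because the header states both the row count and the field list, a reader can check a tabular block before trusting it, which is the idea behind the explicit lengths listed under Key Features. The sketch below is a hypothetical validator for one comma-delimited block with unquoted values; `check-tabular-sketch` is an illustrative name and is not how the library implements its :strict option.

```clojure
(require '[clojure.string :as str])

(defn check-tabular-sketch
  "Hypothetical validator: returns a vector of problems for a single
  comma-delimited tabular block, empty when header and rows agree."
  [toon]
  (let [[header & rows] (str/split-lines toon)]
    (if-let [[_ n fields] (re-matches #"[\w.]*\[(\d+)\]\{([^}]*)\}:" header)]
      (let [n     (Long/parseLong n)
            width (count (str/split fields #","))
            ;; naive split: assumes no quoted values containing commas
            bad   (remove #(= width (count (str/split (str/trim %) #"," -1))) rows)]
        (cond-> []
          (not= n (count rows)) (conj (str "declared " n " rows, found " (count rows)))
          (seq bad)             (conj (str (count bad) " row(s) do not match " width " fields"))))
      ["not a comma-delimited tabular header"])))

(check-tabular-sketch "[3]{id,name}:\n  1,Alice\n  2,Bob")
;=> ["declared 3 rows, found 2"]
```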

List format (mixed items):

items[2]:
  - name: Laptop
    price: 999
  - name: Mouse
    price: 29

Options

Custom delimiter:

tags[3|]: a|b|c
tags[3\t]: a\tb\tc
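
The delimiter rule visible in these headers (comma is implicit, any other delimiter is echoed inside the brackets) can be sketched for inline primitive arrays. `encode-inline-sketch` is a hypothetical name, not the library's `encode`, and it assumes values that need no quoting.

```clojure
(require '[clojure.string :as str])

(defn encode-inline-sketch
  "Hypothetical sketch: renders a vector of primitives as an inline TOON
  array. A non-comma delimiter is repeated inside the brackets."
  ([k xs] (encode-inline-sketch k xs ","))
  ([k xs delimiter]
   (str (name k) "[" (count xs)
        (when (not= delimiter ",") delimiter)       ; comma is the implicit default
        "]:"
        (when (seq xs) (str " " (str/join delimiter xs))))))

(encode-inline-sketch :tags ["a" "b" "c"] "|")
;=> "tags[3|]: a|b|c"
```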

Length marker:

items[#3]: 1,2,3

Type Normalization

TOON normalizes Clojure types to JSON-compatible values:

Keywords → Strings: :name → "name"
Sets → Sorted vectors: #{3 1 2} → [1 2 3]
All numbers → Doubles: 42 → 42.0
Maps → String-keyed maps: {:a 1} → {"a" 1.0}
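
The normalization rules above can be expressed as one recursive function. This is a hypothetical sketch of those rules, not the library's implementation; `normalize` is an illustrative name, and sorting a set assumes its elements are mutually comparable.

```clojure
(defn normalize
  "Hypothetical sketch of the normalization rules listed above."
  [x]
  (cond
    (keyword? x)    (name x)
    (map? x)        (into {}
                          (map (fn [[k v]] [(if (keyword? k) (name k) (str k))
                                            (normalize v)]))
                          x)
    (set? x)        (vec (sort (map normalize x)))  ; assumes comparable elements
    (sequential? x) (mapv normalize x)
    (number? x)     (double x)
    :else           x))

(normalize {:a 1 :tags #{:b :a}})
;=> {"a" 1.0, "tags" ["a" "b"]}
```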

Testing

# Run all Clojure tests
bb test

# Run all tests (Clojure + Babashka)
bb test:all

# Run CI pipeline with tests
bb ci

# Generate test coverage report
bb coverage

The library includes:

340+ unit tests with 90%+ code coverage
Property-based tests using test.check
Comprehensive roundtrip testing
Edge case coverage

Coverage reports are generated in target/coverage/ including:

HTML report: target/coverage/index.html
Codecov JSON: target/coverage/codecov.json

Contributing

We welcome contributions! Please see CONTRIBUTING.md for:

Development setup
Coding guidelines
Testing requirements
Pull request process

Quick Contribution Guide

Fork the repository
Create a feature branch: git checkout -b feature/my-feature
Make your changes with tests
Run tests: bb test
Commit with clear messages: git commit -m "add feature X"
Push and create a pull request

Specification

This implementation follows the TOON v2.0 specification (2025-11-10).

For detailed format rules, edge cases, and conformance requirements, see:

Full Specification - Complete technical specification
Conformance Tests - Language-agnostic test fixtures
Examples - Example TOON files
Changelog - Spec version history

Benchmarks

Detailed benchmarks comparing TOON against JSON, YAML, XML, and CSV across multiple datasets and LLM models are available in the reference implementation repository.

Key findings:

Token efficiency: 49% fewer tokens than formatted JSON on average
Retrieval accuracy: 70.1% (TOON) vs 65.4% (JSON) across 4 LLMs
Best case: 58.9% reduction for uniform tabular data (daily analytics)

Token counts are measured using the GPT-5 o200k_base tokenizer. Actual savings vary by model and tokenizer.

Other Implementations

Official Implementations

TypeScript/JavaScript: toon-format/toon (reference implementation)
Python: toon-format/toon-python (in development)
Rust: toon-format/toon-rust (in development)

Community Implementations

.NET: ToonSharp
C++: ctoon
Crystal: toon-crystal
Dart: toon
Elixir: toon_ex
Gleam: toon_codec
Go: gotoon
Java: JToon
Lua/Neovim: toon.nvim
OCaml: ocaml-toon
PHP: toon-php
Python: python-toon
Ruby: toon-ruby
Swift: TOONEncoder

Note: When implementing TOON in other languages, follow the specification to ensure compatibility. The conformance tests provide language-agnostic validation.

Roadmap

Conformance test suite integration
Performance benchmarks vs JSON for Clojure
ClojureScript browser optimization
Streaming encoder/decoder
Custom type handlers

License

Distributed under the MIT License. See LICENSE for details.
