Thanks to visit codestin.com
Credit goes to github.com

Skip to content

vadelabs/toon

Repository files navigation

TOON: Token-Oriented Object Notation

Clojars Project CI SPEC v3.0 License: MIT

A Clojure/ClojureScript implementation of Token-Oriented Object Notation – a compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage.

TOON achieves 49% fewer tokens than formatted JSON (28% vs compact JSON) while maintaining explicit structure that helps LLMs parse and validate data reliably. It's intended for LLM input as a lossless, drop-in representation of JSON data.

Specification: This library implements TOON v3.0 specification Reference Implementation: TypeScript/JavaScript

Why TOON?

When working with Large Language Models, token efficiency directly impacts cost, context window usage, and processing speed. LLM tokens still cost money – and standard JSON is verbose and token-expensive.

TOON's sweet spot is uniform arrays of objects – multiple fields per row, same structure across items. It borrows YAML's indentation-based structure for nested objects and CSV's tabular format for uniform data rows, then optimizes both for token efficiency in LLM contexts.

Token Efficiency

Based on benchmarks using the GPT-5 o200k_base tokenizer:

  • 49.1% reduction vs formatted JSON (2-space indentation)
  • 28.0% reduction vs compact JSON (minified)
  • 39.4% reduction vs YAML
  • 56.0% reduction vs XML

Real-world examples:

  • GitHub repositories (100 items): 42.3% fewer tokens than JSON
  • Daily analytics (180 days): 58.9% fewer tokens than JSON
  • E-commerce orders: 35.4% fewer tokens than JSON

Key Features

  • 💸 Token-efficient: Eliminates redundant punctuation and repeated keys
  • 🤿 LLM-friendly guardrails: Explicit lengths and fields enable validation
  • 🍱 Minimal syntax: Removes braces, brackets, and most quotes
  • 📐 Indentation-based: Uses whitespace like YAML instead of braces
  • 🧺 Tabular arrays: Declare keys once, stream data as rows

When to Use TOON

TOON excels at:

  • Uniform arrays of objects (same fields, primitive values)
  • Large datasets with consistent structure
  • Tabular data with multiple rows

JSON is better for:

  • Non-uniform data with varying field sets
  • Deeply nested structures
  • Mixed-type collections

CSV is more compact for:

  • Flat, uniform tables without any nesting
  • Data without nested objects or arrays

Installation

Clojure CLI/deps.edn

com.vadelabs/toon {:mvn/version "2025.12.01-36"}

Leiningen/Boot

[com.vadelabs/toon "2025.12.01-36"]

Quick Start

(require '[com.vadelabs.toon.core :as toon])

;; Encode Clojure data to TOON
(toon/encode {:name "Alice" :age 30 :tags ["dev" "rust"]})
;=> "name: Alice\nage: 30\ntags[2]: dev,rust"

;; Decode TOON to Clojure data
(toon/decode "name: Alice\nage: 30\ntags[2]: dev,rust")
;=> {"name" "Alice", "age" 30.0, "tags" ["dev" "rust"]}

Format Examples

Objects

JSON:

{
  "name": "Alice",
  "age": 30,
  "active": true
}

TOON:

name: Alice
age: 30
active: true

Nested Objects

JSON:

{
  "user": {
    "name": "Alice",
    "email": "[email protected]"
  }
}

TOON:

user:
  name: Alice
  email: [email protected]

Arrays of Primitives (Inline)

JSON:

{
  "tags": ["reading", "gaming", "coding"]
}

TOON:

tags[3]: reading,gaming,coding

Arrays of Objects (Tabular Format)

This is TOON's sweet spot – uniform arrays of objects with consistent fields:

JSON:

{
  "users": [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"}
  ]
}

TOON:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

The tabular format eliminates repeated keys, providing significant token savings for large datasets.

Arrays of Mixed Items (List Format)

For non-uniform data, TOON uses list format:

TOON:

items[3]:
  - name: Laptop
    price: 999
  - name: Mouse
    price: 29
  - name: Keyboard
    price: 79

API Reference

encode

Encodes Clojure data structures to TOON format.

(encode input)
(encode input options)

Parameters:

  • input - Any Clojure value (normalized to JSON-compatible types)
  • options - Optional map:
    • :indent - Spaces per indentation level (default: 2)
    • :delimiter - Array value delimiter: "," (default), "\t", or "|"
    • :key-collapsing - Key collapsing mode: :off (default) or :safe
    • :flatten-depth - Max depth for key collapsing (default: Infinity)
    • :replacer - Function (fn [key value path] ...) to transform/filter values

Returns: String in TOON format

Examples:

;; Basic encoding
(encode {:name "Ada" :tags ["reading" "gaming"]})
;=> "name: Ada\ntags[2]: reading,gaming"

;; Custom delimiter
(encode {:tags ["a" "b" "c"]} {:delimiter "\t"})
;=> "tags[3\t]: a\tb\tc"

;; Tabular array format
(encode [{:id 1 :name "Alice"}
         {:id 2 :name "Bob"}])
;=> "[2]{id,name}:\n  1,Alice\n  2,Bob"

;; Using replacer to filter sensitive fields
(encode {:name "Alice" :password "secret"}
        {:replacer (fn [k v _] (when-not (= k "password") v))})
;=> "name: Alice"

;; Using replacer to transform values
(require '[clojure.string :as str])
(encode {:status "active"}
        {:replacer (fn [k v _] (if (string? v) (str/upper-case v) v))})
;=> "status: ACTIVE"

decode

Decodes TOON format to Clojure data structures.

(decode input)
(decode input options)

Parameters:

  • input - String in TOON format
  • options - Optional map:
    • :indent - Spaces per indentation level (default: 2)
    • :strict - Enable strict validation (default: true)

Returns: Clojure data structure (maps, vectors, primitives)

Examples:

;; Basic decoding
(decode "name: Ada\ntags[2]: reading,gaming")
;=> {"name" "Ada", "tags" ["reading" "gaming"]}

;; Tabular array
(decode "[2]{id,name}:\n  1,Alice\n  2,Bob")
;=> [{"id" 1.0, "name" "Alice"} {"id" 2.0, "name" "Bob"}]

;; Inline array
(decode "[3]: 1,2,3")
;=> [1.0 2.0 3.0]

;; Relaxed mode (allows tabs, inconsistent indentation)
(decode "name: Ada" {:strict false})
;=> {"name" "Ada"}

Format Specification

Primitives

string: Hello World
number: 42
float: 3.14
boolean: true
nil: null

Quoted Strings

Strings are quoted when they contain special characters:

comma: "a,b"
colon: "key:value"
reserved: "true"
newline: "line1\nline2"

Objects

Key-value pairs separated by colons:

name: Alice
age: 30

Nested objects use indentation:

user:
  name: Alice
  email: [email protected]

Arrays

Inline format (primitives):

tags[3]: reading,gaming,coding

Tabular format (objects with same keys):

[3]{id,name}:
  1,Alice
  2,Bob
  3,Carol

List format (mixed items):

items[2]:
  - name: Laptop
    price: 999
  - name: Mouse
    price: 29

Options

Custom delimiter:

tags[3|]: a|b|c
tags[3\t]: a\tb\tc

Length marker:

items[#3]: 1,2,3

Type Normalization

TOON normalizes Clojure types to JSON-compatible values:

  • Keywords → Strings: :name"name"
  • Sets → Sorted vectors: #{3 1 2}[1 2 3]
  • All numbers → Doubles: 4242.0
  • Maps → String-keyed maps: {:a 1}{"a" 1.0}

Testing

# Run all Clojure tests
bb test

# Run all tests (Clojure + Babashka)
bb test:all

# Run CI pipeline with tests
bb ci

# Generate test coverage report
bb coverage

The library includes:

  • 340+ unit tests with 90%+ code coverage
  • Property-based tests using test.check
  • Comprehensive roundtrip testing
  • Edge case coverage

Coverage reports are generated in target/coverage/ including:

  • HTML report: target/coverage/index.html
  • Codecov JSON: target/coverage/codecov.json

Contributing

We welcome contributions! Please see CONTRIBUTING.md for:

  • Development setup
  • Coding guidelines
  • Testing requirements
  • Pull request process

Quick Contribution Guide

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Make your changes with tests
  4. Run tests: bb test
  5. Commit with clear messages: git commit -m "add feature X"
  6. Push and create a pull request

Specification

This implementation follows the TOON v2.0 specification (2025-11-10).

For detailed format rules, edge cases, and conformance requirements, see:

Benchmarks

Detailed benchmarks comparing TOON against JSON, YAML, XML, and CSV across multiple datasets and LLM models are available in the reference implementation repository.

Key findings:

  • Token efficiency: 49% fewer tokens than formatted JSON on average
  • Retrieval accuracy: 70.1% (TOON) vs 65.4% (JSON) across 4 LLMs
  • Best case: 58.9% reduction for uniform tabular data (daily analytics)

Token counts are measured using the GPT-5 o200k_base tokenizer. Actual savings vary by model and tokenizer.

Other Implementations

Official Implementations

Community Implementations

Note: When implementing TOON in other languages, follow the specification to ensure compatibility. The conformance tests provide language-agnostic validation.

Roadmap

  • Conformance test suite integration
  • Performance benchmarks vs JSON for Clojure
  • ClojureScript browser optimization
  • Streaming encoder/decoder
  • Custom type handlers

License

Copyright © 2025 Vade Labs Pvt. Ltd.

Distributed under the MIT License. See LICENSE for details.

Links

About

No description, website, or topics provided.

Resources

License

Contributing

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •