A Ruby gem implementing TOON (Token-Oriented Object Notation), the compact, human-readable serialization format that slashes LLM token usage by 30-60% vs JSON while staying lossless.
Perfect for API responses, database exports, and LLM prompts!
Inspired by: This gem is based on the TOON format specification and provides a complete Ruby implementation.
graph LR
JSON[JSON: 100% tokens] -->|30-60% savings| TOON[TOON: 40-70% tokens]
TOON -->|lossless| JSON
subgraph LLM
Prompt[Your LLM Prompt]
end
TOON -.->|Cheaper/Faster| Prompt
Key Wins:
- Token Reduction: 30-60% fewer tokens for LLM contexts
- Bidirectional: `encode`/`decode` with 100% round-trip fidelity
- Smart Tabular Arrays: Auto-optimizes uniform data (e.g., DB records)
- Secure by Design: Depth limits, circular reference detection, no `eval`
- Fast: ~2x JSON speed
- CLI + Rails: Ready for production
Requirements:
- Ruby 3.0 or higher
- Tested on Ruby 3.0, 3.1, 3.2, 3.3, 3.4
Add to your Gemfile:
gem 'toon-format'
Then install:
bundle install
Or install directly:
gem install toon-format

require 'toon_format'
# Encode
data = { name: 'Alice', age: 30 }
toon = ToonFormat.encode(data)
# => "name: Alice\nage: 30"
# Decode
original = ToonFormat.decode(toon)
# => {:name=>"Alice", :age=>30}
# Tabular arrays
users = [{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }]
ToonFormat.encode(users)
# => "[2,]{id,name}:\n1,Alice\n2,Bob"

flowchart TD
Data[Ruby Data] --> Type{Check Type}
Type -->|Primitive| Prim["null/true/false/num/str"]
Type -->|Hash| Obj["key: value\n..."]
Type -->|Array| Tab{Uniform?<br/>All Hashes +<br/>Primitive Values?}
Tab -->|Yes| Table["[N,]{id,name,...}:\nrow1\nrow2"]
Tab -->|No| List["[N]:\n item1\n item2"]
Prim --> Output[TOON String]
Obj --> Output
Table --> Output
List --> Output
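
The array branch of the flowchart above can be sketched in plain Ruby. This is only an illustration of the decision, not the gem's encoder: `primitive?`, `layout_for`, and `encode_table` are hypothetical helper names, and the sketch assumes values need no quoting or escaping.

```ruby
# Hypothetical sketch of the "Uniform?" decision; not the gem's internals.

# True for values the flowchart treats as primitives.
def primitive?(value)
  case value
  when nil, true, false, Numeric, String, Symbol then true
  else false
  end
end

# :table when every element is a Hash with identical keys (same order)
# and only primitive values; :list otherwise.
def layout_for(array)
  return :list if array.empty? || !array.all?(Hash)

  keys = array.first.keys
  uniform = array.all? do |row|
    row.keys == keys && row.values.all? { |v| primitive?(v) }
  end
  uniform ? :table : :list
end

# Renders a uniform array as a header line plus one comma-separated row
# per record. Assumes no value contains a comma or newline.
def encode_table(array)
  keys = array.first.keys
  header = "[#{array.size},]{#{keys.join(',')}}:"
  rows = array.map do |row|
    row.values.map { |v| v.nil? ? 'null' : v.to_s }.join(',')
  end
  ([header] + rows).join("\n")
end

users = [{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }]
layout_for(users)   # => :table
encode_table(users) # => "[2,]{id,name}:\n1,Alice\n2,Bob"
```

Arrays that mix element types, have differing keys, or hold nested values fall through to the list layout.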
graph TB
subgraph 'Public API'
Main[lib/toon_format.rb<br/>encode/decode/estimate_savings]
end
subgraph 'Core'
Enc[encoder.rb]
Dec[decoder.rb]
Pars[parser.rb]
Val[validator.rb]
Err[errors.rb]
end
subgraph 'Integrations'
Rails[rails/extensions.rb<br/>ActiveRecord#to_toon]
CLI[exe/toon-format]
end
Main --> Enc
Main --> Dec
Dec --> Pars
Dec --> Val
Main -.-> Rails
Main -.-> CLI
stats = ToonFormat.estimate_savings(data)
# => {json_tokens: 1234, toon_tokens: 789, savings_percent: 36.1}

ToonFormat.encode(data, delimiter: '|', indent: 4, length_marker: false)

ToonFormat.decode(toon, strict: false) # Skip validation

Auto-extends ActiveRecord:
user.to_toon(only: [:id, :name])

# Encode JSON -> TOON
toon-format encode data.json > data.toon
# Decode
toon-format decode data.toon > data.json
# Stats
toon-format stats data.json
# JSON: 1,234 tokens | TOON: 789 | Savings: 36.1%
# Pipe it!
cat api.json | toon-format encode

Options: --output FILE --no-strict --delimiter '|' --indent 4 --no-length-marker
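
The savings figure in the stats output is plain percentage arithmetic on the two token counts. A minimal sketch, using the counts from the example above; `savings_percent` is a hypothetical helper, and how the gem itself counts tokens is not covered here.

```ruby
# Hypothetical helper: percentage of tokens saved relative to JSON.
def savings_percent(json_tokens, toon_tokens)
  return 0.0 if json_tokens.zero? # avoid dividing by zero on empty input

  ((json_tokens - toon_tokens) * 100.0 / json_tokens).round(1)
end

savings_percent(1234, 789) # => 36.1
```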
| Scenario | Speed vs JSON | Token Savings |
|---|---|---|
| Tabular Data (100 records) | 2-3x faster | ~52% |
| Simple Objects | 1-2x faster | ~14% |
| Nested Structures | Similar | ~22% |
| Large Datasets (1000+) | 1.5-2x faster | 40-70% |
We have 11 specialized benchmarks covering:
- Performance: Encode/decode speed, scalability (1-10k records)
- Comparisons: vs JSON, YAML, MessagePack, CSV
- Real-World: API responses, DB exports, LLM contexts
- Advanced: Memory usage, validation overhead, deep nesting
- Fidelity: Round-trip tests, data integrity
Run all benchmarks:
ruby benchmark/run_all_benchmarks.rb

Run individual benchmarks:
ruby benchmark/token_reduction_benchmark.rb # Token savings
ruby benchmark/scalability_benchmark.rb # 1-10k records
ruby benchmark/real_world_benchmark.rb # Practical scenarios
ruby benchmark/format_comparison_benchmark.rb # vs other formats

See benchmark/README.md for details.
- `MAX_DEPTH=100`
- `MAX_ARRAY_SIZE=100_000`
- Circular reference detection
- UTF-8 validation
- No `eval`
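
To make the depth limit and circular reference detection concrete, here is a minimal sketch of how such checks can be written in plain Ruby. It is an assumption-laden illustration, not the gem's validator: `check_structure!` is a hypothetical name, the default of 100 mirrors the `MAX_DEPTH` value listed above, and the error type is chosen only for the example.

```ruby
# Hypothetical sketch; the gem's own validator may work differently.
# Walks nested Hashes/Arrays, rejecting excessive depth and cycles.
def check_structure!(value, depth = 0, seen = {}.compare_by_identity, max_depth: 100)
  return unless value.is_a?(Hash) || value.is_a?(Array)

  raise ArgumentError, "nesting deeper than #{max_depth}" if depth > max_depth
  # `seen` tracks containers on the current path by object identity.
  raise ArgumentError, 'circular reference detected' if seen.key?(value)

  seen[value] = true
  children = value.is_a?(Hash) ? value.values : value
  children.each { |child| check_structure!(child, depth + 1, seen, max_depth: max_depth) }
  seen.delete(value) # a container shared by siblings is not a cycle
  nil
end

loop_array = []
loop_array << loop_array
begin
  check_structure!(loop_array)
rescue ArgumentError => e
  puts e.message # prints "circular reference detected"
end
```

Tracking only the containers on the current path, rather than every container ever visited, lets the same object appear twice in a structure without being mistaken for a cycle.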
- v0.1.0: Core features + 83% coverage (42+ specs)
- Next: Complex nesting, 95% coverage
- Fork & clone
- bin/setup
- bundle exec rspec
- bundle exec rubocop -a
- PR away!
See CONTRIBUTING.md for guidelines.
- TOON Format Repository - Original TOON format
- TOON Specification - Format specification
- This Ruby Implementation
- Changelog
- Contributing
- Benchmarks
- Architecture
This gem is inspired by and implements the TOON format specification, created to optimize token usage for LLM contexts. Special thanks to the TOON format community for developing this innovative serialization approach.
Star on GitHub & try it in your LLM pipelines!