osmanok/toon-format

Toon Format 🖼️📦

A Ruby gem implementing TOON (Token-Oriented Object Notation) – the compact, human-readable serialization format that slashes LLM token usage by 30-60% vs JSON while staying lossless.

Perfect for API responses, database exports, and LLM prompts!

💡 Inspired by: This gem is based on the TOON format specification and provides a complete Ruby implementation.

🚀 Why TOON Format?

graph LR
  JSON[JSON: 100% tokens] -->|30-60% savings| TOON[TOON: 40-70% tokens]
  TOON -->|lossless| JSON
  subgraph LLM
    Prompt[Your LLM Prompt]
  end
  TOON -.->|Cheaper/Faster| Prompt

Key Wins:

  • πŸ† Token Reduction: 30-60% fewer tokens for LLM contexts
  • πŸ”„ Bidirectional: encode/decode with 100% round-trip fidelity
  • πŸ“Š Smart Tabular Arrays: Auto-optimizes uniform data (e.g., DB records)
  • πŸ›‘οΈ Secure by Design: Depth limits, circular refs, no eval
  • ⚑ Fast: ~2x JSON speed
  • πŸŽ›οΈ CLI + Rails: Ready for production

📦 Installation

Requirements:

  • Ruby 3.0 or higher
  • Tested on Ruby 3.0, 3.1, 3.2, 3.3, 3.4

Add to your Gemfile:

gem 'toon-format'

Then install:

bundle install

Or install directly:

gem install toon-format

⚡ Quick Start

require 'toon_format'

# Encode
data = { name: 'Alice', age: 30 }
toon = ToonFormat.encode(data)
# => "name: Alice\nage: 30"

# Decode
original = ToonFormat.decode(toon)
# => {:name=>"Alice", :age=>30}

# Tabular magic ✨
users = [{id:1, name:'Alice'}, {id:2, name:'Bob'}]
ToonFormat.encode(users)
# => "[2,]{id,name}:\n1,Alice\n2,Bob"

πŸ› οΈ How It Works: Encoding Flow

flowchart TD
    Data[Ruby Data] --> Type{Check Type}
    Type -->|Primitive| Prim["null/true/false/num/str"]
    Type -->|Hash| Obj["key: value\n..."]
    Type -->|Array| Tab{Uniform?<br/>All Hashes +<br/>Primitive Values?}
    Tab -->|Yes| Table["[N,]{id,name,...}:\nrow1\nrow2"]
    Tab -->|No| List["[N]:\n  item1\n  item2"]
    Prim --> Output[TOON String]
    Obj --> Output
    Table --> Output
    List --> Output
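The array branch of the flow can be sketched in plain Ruby. This is a minimal illustration, not the gem's encoder: `primitive?`, `uniform_rows?`, and `encode_array_sketch` are hypothetical names, the comma delimiter is hard-coded, and quoting or escaping of values is ignored.

```ruby
# Hypothetical helpers for illustration; none of these are part of toon-format.
PRIMITIVES = [NilClass, TrueClass, FalseClass, Numeric, String, Symbol].freeze

def primitive?(value)
  PRIMITIVES.any? { |klass| value.is_a?(klass) }
end

# True when every element is a Hash with the same keys (in the same order)
# and only primitive values, i.e. the "Uniform?" check in the flowchart.
def uniform_rows?(array)
  return false if array.empty? || !array.all? { |row| row.is_a?(Hash) }

  keys = array.first.keys
  array.all? { |row| row.keys == keys && row.values.all? { |v| primitive?(v) } }
end

def encode_array_sketch(array)
  if uniform_rows?(array)
    # Tabular form: one header naming the fields, then one line per row.
    header = "[#{array.size},]{#{array.first.keys.join(',')}}:"
    rows = array.map { |row| row.values.join(',') }
    ([header] + rows).join("\n")
  else
    # List form: one indented item per line (primitive items only in this sketch).
    items = array.map { |item| "  #{item}" }
    (["[#{array.size}]:"] + items).join("\n")
  end
end

users = [{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }]
puts encode_array_sketch(users)
```

The tabular form states the field names once in the header instead of once per record, which is where most of the token savings on uniform data come from.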

πŸ—οΈ Architecture

graph TB
    subgraph 'Public API'
        Main[lib/toon_format.rb<br/>encode/decode/estimate_savings]
    end
    subgraph 'Core'
        Enc[encoder.rb]
        Dec[decoder.rb]
        Pars[parser.rb]
        Val[validator.rb]
        Err[errors.rb]
    end
    subgraph 'Integrations'
        Rails[rails/extensions.rb<br/>ActiveRecord#to_toon]
        CLI[exe/toon-format]
    end
    Main --> Enc
    Main --> Dec
    Dec --> Pars
    Dec --> Val
    Main -.-> Rails
    Main -.-> CLI

✨ Advanced Usage

Token Savings Estimator

stats = ToonFormat.estimate_savings(data)
# => {json_tokens: 1234, toon_tokens: 789, savings_percent: 36.1}
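The `savings_percent` figure is consistent with a plain relative reduction. A minimal sketch of that arithmetic follows; `savings_percent` here is a hypothetical standalone helper, and how the gem counts tokens in the first place is not covered.

```ruby
# Hypothetical helper: relative token reduction, rounded to one decimal place.
# The gem's own estimate_savings may compute or round this differently.
def savings_percent(json_tokens, toon_tokens)
  return 0.0 if json_tokens.zero?

  ((json_tokens - toon_tokens) * 100.0 / json_tokens).round(1)
end

puts savings_percent(1234, 789) # token counts taken from the example above
```

With the counts shown above, (1234 - 789) / 1234 comes to about 36.1%.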

Custom Options

ToonFormat.encode(data, delimiter: '|', indent: 4, length_marker: false)

Strict Decoding

ToonFormat.decode(toon, strict: false)  # strict: false skips validation
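The README does not say which checks strict mode performs. Purely as an illustration, the sketch below assumes that strict decoding of a tabular block verifies the declared row count and the number of cells per row. `parse_table_sketch` is a hypothetical helper, not the gem's parser, and it leaves every value as a string.

```ruby
# Hypothetical sketch of strict vs. lenient parsing for the tabular form
# "[N,]{field,...}:" shown in Quick Start. Not the gem's implementation.
def parse_table_sketch(text, strict: true)
  header, *rows = text.split("\n")
  match = header.to_s.match(/\A\[(\d+),\]\{([^}]*)\}:\z/)
  raise ArgumentError, 'unrecognized table header' unless match

  declared = match[1].to_i
  fields = match[2].split(',')

  # Assumed strict check 1: the declared length must match the actual row count.
  if strict && rows.size != declared
    raise ArgumentError, "declared #{declared} rows, found #{rows.size}"
  end

  rows.map do |row|
    cells = row.split(',', -1)
    # Assumed strict check 2: every row must have one cell per field.
    if strict && cells.size != fields.size
      raise ArgumentError, "expected #{fields.size} cells, got #{cells.size}"
    end
    fields.zip(cells).to_h
  end
end

p parse_table_sketch("[2,]{id,name}:\n1,Alice\n2,Bob")
```

In this sketch a length mismatch raises under `strict: true`, while `strict: false` returns whatever rows are present.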

🚂 Rails Integration

Auto-extends ActiveRecord:

user.to_toon(only: [:id, :name])

🔧 CLI Tool

# Encode JSON → TOON
toon-format encode data.json > data.toon

# Decode
toon-format decode data.toon > data.json

# Stats
toon-format stats data.json
# JSON: 1,234 tokens | TOON: 789 | Savings: 36.1%

# Pipe it!
cat api.json | toon-format encode

Options: --output FILE --no-strict --delimiter '|' --indent 4 --no-length-marker

📈 Benchmarks

Quick Results

| Scenario | Speed vs JSON | Token Savings |
|---|---|---|
| Tabular Data (100 records) | 2-3x faster | ~52% 🎯 |
| Simple Objects | 1-2x faster | ~14% |
| Nested Structures | Similar | ~22% |
| Large Datasets (1000+) | 1.5-2x faster | 40-70% 🚀 |

Comprehensive Benchmark Suite

We have 11 specialized benchmarks covering:

  • ⚡ Performance: Encode/decode speed, scalability (1-10k records)
  • 📊 Comparisons: vs JSON, YAML, MessagePack, CSV
  • 🌍 Real-World: API responses, DB exports, LLM contexts
  • 🔍 Advanced: Memory usage, validation overhead, deep nesting
  • 🔄 Fidelity: Round-trip tests, data integrity

Run all benchmarks:

ruby benchmark/run_all_benchmarks.rb

Run individual benchmarks:

ruby benchmark/token_reduction_benchmark.rb  # Token savings
ruby benchmark/scalability_benchmark.rb      # 1-10k records
ruby benchmark/real_world_benchmark.rb       # Practical scenarios
ruby benchmark/format_comparison_benchmark.rb # vs other formats

See benchmark/README.md for details.

πŸ›‘οΈ Security

  • MAX_DEPTH=100
  • MAX_ARRAY_SIZE=100_000
  • Circular reference detection
  • UTF-8 validation
  • No eval
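A minimal sketch of how the depth, array-size, and circular-reference guards could fit together is shown below. The two limits are the ones listed above; `StructureError` and `check_structure!` are hypothetical names, and the gem's actual checks (including UTF-8 validation, which is not covered here) may be organised differently.

```ruby
require 'set'

MAX_DEPTH = 100          # limit listed above
MAX_ARRAY_SIZE = 100_000 # limit listed above

# Hypothetical error class for this sketch only.
class StructureError < StandardError; end

# Walks nested Hashes and Arrays and raises if a limit is exceeded or a
# container contains itself. Hypothetical helper, not the gem's API.
def check_structure!(value, depth = 0, ancestors = Set.new)
  raise StructureError, "depth exceeds #{MAX_DEPTH}" if depth > MAX_DEPTH
  return true unless value.is_a?(Hash) || value.is_a?(Array)

  if value.is_a?(Array) && value.size > MAX_ARRAY_SIZE
    raise StructureError, "array exceeds #{MAX_ARRAY_SIZE} elements"
  end

  id = value.object_id
  raise StructureError, 'circular reference detected' if ancestors.include?(id)

  ancestors.add(id)
  children = value.is_a?(Hash) ? value.values : value
  children.each { |child| check_structure!(child, depth + 1, ancestors) }
  ancestors.delete(id) # only containers on the current path count as cycles
  true
end

cyclic = []
cyclic << cyclic
begin
  check_structure!(cyclic)
rescue StructureError => e
  puts "rejected: #{e.message}"
end
```

Tracking only the containers on the current path means an object that is shared by two siblings is accepted, while an object that contains itself is rejected.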

📊 Status

  • ✅ v0.1.0: Core features + 83% coverage (42+ specs)
  • 🔄 Next: Complex nesting, 95% coverage

🤝 Contributing

  1. Fork & clone
  2. bin/setup
  3. bundle exec rspec
  4. bundle exec rubocop -a
  5. PR away! 🎉

See CONTRIBUTING.md for guidelines.

🌐 Resources & Links

TOON Format

This Gem

πŸ™ Acknowledgments

This gem is inspired by and implements the TOON format specification, created to optimize token usage for LLM contexts. Special thanks to the TOON format community for developing this innovative serialization approach.

📄 License

MIT

⭐ Star on GitHub & try it in your LLM pipelines! 🚀
