ToonTokenizer for .NET

A complete .NET library for working with TOON (Token-Oriented Object Notation) - a compact, human-readable text format optimized for LLM prompts and structured data interchange.

What is TOON?

TOON is a line-oriented, indentation-based notation that encodes JSON data with explicit structure and minimal quoting. Think of it as:

More compact than JSON for arrays of uniform objects (no repeated keys)
More structured than CSV with nesting, types, and field names
More deterministic than YAML with explicit array lengths and fixed formatting rules

Perfect for LLM prompts, configuration files, and data interchange where token efficiency and readability matter.

Features

✅ Complete lexical analyzer with all TOON token types
✅ Full AST parser with resilient error recovery
✅ Token-to-AST navigation - easily map between tokens and syntax nodes
✅ Tokens included in parse results - no separate tokenization call needed
✅ Position tracking for every token and AST node (line, column, span)
✅ Resilient parsing - continues after errors, returns partial AST
✅ Rich error reporting - collects all errors with precise locations and error codes
✅ Standardized error codes - 20+ error codes (TOON1xxx-9xxx) for programmatic handling
✅ Context-aware error messages - every error explains what, why, and how to fix
✅ Visitor pattern for AST traversal and transformation
✅ Extension methods for syntax highlighting and IDE integration
✅ TOON spec §6.1 compliance - array size validation (detects size mismatches)
✅ Battle-tested with 637 unit tests (100% passing)

Targets: .NET Standard 2.0 (maximum compatibility)

Installation

dotnet add package ToonTokenizer

Or via Package Manager Console:

Install-Package ToonTokenizer

Quick Start

Parse TOON to AST

using ToonTokenizer;

var source = @"
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
";

// Parse returns: Document (AST), Errors (if any), and Tokens
var result = Toon.Parse(source);

if (result.IsSuccess)
{
    // Access the parsed document
    foreach (var property in result.Document.Properties)
    {
        Console.WriteLine($"{property.Key}: {property.Value}");
    }
    
    // Access tokens for syntax highlighting
    foreach (var token in result.Tokens)
    {
        Console.WriteLine($"{token.Type}: '{token.Value}' at {token.Line}:{token.Column}");
    }
}
else
{
    // Resilient parsing: you still get a partial AST + all errors
    Console.WriteLine($"Found {result.Errors.Count} error(s):");
    foreach (var error in result.Errors)
    {
        Console.WriteLine($"  Line {error.Line}: {error.Message}");
    }
}

Validate TOON

if (Toon.TryParse(source, out var result))
{
    if (result.IsSuccess)
        Console.WriteLine("✓ Valid TOON");
    else
        Console.WriteLine($"✗ {result.Errors.Count} error(s) found");
}

Access Tokens Only

// Get tokens without parsing
var tokens = Toon.Tokenize(source);

foreach (var token in tokens)
{
    Console.WriteLine($"{token.Type}: {token.Value}");
}

Navigate from Tokens to AST

var source = "name: John\nage: 30";
var result = Toon.Parse(source);

// Get a token and find which AST node it belongs to
var token = result.Tokens.Find(t => t.Value == "30");
var property = token.GetPropertyNode(result.Document);

Console.WriteLine($"Token '{token.Value}' belongs to property: {property.Key}");
// Output: Token '30' belongs to property: age

// Or find property at a specific line/column
var prop = result.GetPropertyAt(line: 2, column: 1);
Console.WriteLine($"Property at line 2: {prop.Key}");
// Output: Property at line 2: age

// Find nested properties by path
var theme = result.FindPropertyByPath("user.settings.theme");
if (theme?.Value is StringValueNode str)
{
    Console.WriteLine($"Theme: {str.Value}");
}

TOON Language Examples

Simple Properties

name: John Doe
age: 30
active: true
email: [email protected]

Nested Objects

user:
  name: Jane Smith
  email: [email protected]
  settings:
    theme: dark
    notifications: true

Arrays - Inline (Primitives)

colors[3]: red,green,blue
scores[5]: 95,87,92,88,91
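
The number in brackets is the declared element count, which is what the array size validation listed under Features checks. As a rough illustration only, the sketch below compares the declared count with the actual number of comma-separated items on one inline array line. `InlineArrayCheck` and `LengthMatches` are hypothetical names and are not part of ToonTokenizer; the sketch ignores quoted strings and the tab and pipe delimiters, which the real parser handles.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the ToonTokenizer API.
public static class InlineArrayCheck
{
    // Matches lines such as "colors[3]: red,green,blue".
    private static readonly Regex Header =
        new Regex(@"^(?<key>\w+)\[(?<size>\d+)\]:\s*(?<items>.*)$");

    // Returns true when the declared length equals the number of
    // comma-separated items. Returns false for lines that are not
    // inline arrays or whose counts differ.
    public static bool LengthMatches(string line, out int declared, out int actual)
    {
        declared = 0;
        actual = 0;

        var match = Header.Match(line.Trim());
        if (!match.Success)
            return false;

        declared = int.Parse(match.Groups["size"].Value);

        // Naive split: assumes no item contains a quoted comma.
        var items = match.Groups["items"].Value;
        actual = items.Length == 0 ? 0 : items.Split(',').Length;

        return declared == actual;
    }
}
```

In practice, `Toon.Parse` reports this kind of mismatch as a validation error, so a hand-rolled check like this is only useful for understanding the rule.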

Arrays - Tabular (Uniform Objects)

The killer feature! No repeated keys:

users[3]{id,name,email,active}:
  1,Alice,[email protected],true
  2,Bob,[email protected],false
  3,Charlie,[email protected],true

Compare to JSON:

{
  "users": [
    {"id": 1, "name": "Alice", "email": "[email protected]", "active": true},
    {"id": 2, "name": "Bob", "email": "[email protected]", "active": false},
    {"id": 3, "name": "Charlie", "email": "[email protected]", "active": true}
  ]
}

About 60% fewer tokens for this example; actual savings depend on the data and the tokenizer. 🎉
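
To make the tabular layout concrete, here is a minimal sketch that writes uniform records in the `name[N]{fields}:` form shown above. `TabularSketch.Encode` is a hypothetical helper, not the library's encoder, and it assumes that no value contains a comma, quote, or newline, so it does no quoting or escaping.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the ToonTokenizer API.
public static class TabularSketch
{
    // Builds a header line "name[N]{f1,f2,...}:" followed by one
    // two-space-indented, comma-separated row per record.
    // Assumes values never need quoting.
    public static string Encode(
        string name,
        IReadOnlyList<string> fields,
        IReadOnlyList<IReadOnlyList<string>> rows)
    {
        var header = name + "[" + rows.Count + "]{" + string.Join(",", fields) + "}:";
        var lines = new List<string> { header };

        foreach (var row in rows)
        {
            // Every row must supply exactly one value per declared field.
            if (row.Count != fields.Count)
                throw new ArgumentException("Row width must match the field list.");

            lines.Add("  " + string.Join(",", row));
        }

        return string.Join("\n", lines);
    }
}
```

The keys appear once in the header instead of once per object, which is where the savings over JSON come from.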

Real-World Example

context:
  task: Favorite hiking trails
  location: Boulder, CO
  season: Spring 2025

friends[3]: Ana,Luis,Sam

hikes[3]{id,name,distance,elevation,companion,sunny}:
  1,Blue Lake Trail,7.5,320,Ana,true
  2,Ridge Overlook,9.2,540,Luis,false
  3,Wildflower Loop,5.1,180,Sam,true

notes:
  best: Ridge Overlook has amazing views!
  bring: Water, snacks, sunscreen

API Reference

Main Entry Point: `Toon`

// Parse TOON source (returns Document, Errors, and Tokens)
ToonParseResult Parse(string source)

// Parse without throwing (returns true when parsing completes, even if errors were collected; check result.IsSuccess)
bool TryParse(string source, out ToonParseResult result)

// Tokenize only
List<Token> Tokenize(string source)

Parse Result

public class ToonParseResult
{
    public ToonDocument Document { get; }      // Always available (even with errors)
    public List<ToonError> Errors { get; }     // Empty if no errors
    public List<Token> Tokens { get; }         // All tokens from lexing
    
    public bool IsSuccess => Errors.Count == 0;
    public bool HasErrors => Errors.Count > 0;
}

Errors with Precise Locations

public class ToonError
{
    public string Message { get; }
    public string? Code { get; }      // Error code (e.g., "TOON1001")
    public int Position { get; }      // 0-based character offset
    public int Length { get; }        // Length of error span
    public int Line { get; }          // 1-based line number
    public int Column { get; }        // 1-based column number
    public int EndPosition { get; }   // Position + Length
}
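
Because `Position` is a 0-based offset while `Line` and `Column` are 1-based, it can help to see how the two relate. The sketch below is an assumption-laden illustration: `ErrorSpan`, `Slice`, and `ToLineColumn` are hypothetical names, and it counts only `\n` as a line break, which may differ from how the library treats `\r\n`.

```csharp
using System;

// Hypothetical helpers for illustration; not part of the ToonTokenizer API.
public static class ErrorSpan
{
    // Returns the source text covered by an error, given the values of
    // ToonError.Position and ToonError.Length. Clamps to the source bounds
    // so a span reported at end-of-file does not throw.
    public static string Slice(string source, int position, int length)
    {
        int start = Math.Max(0, Math.Min(position, source.Length));
        int end = Math.Max(start, Math.Min(position + length, source.Length));
        return source.Substring(start, end - start);
    }

    // Converts a 0-based character offset to a 1-based (line, column) pair,
    // treating '\n' as the only line terminator.
    public static (int Line, int Column) ToLineColumn(string source, int position)
    {
        int line = 1, column = 1;
        for (int i = 0; i < position && i < source.Length; i++)
        {
            if (source[i] == '\n') { line++; column = 1; }
            else column++;
        }
        return (line, column);
    }
}
```

For example, with the source `"name: John\nage: 30"`, offset 16 with length 2 covers the text `30`, which sits at line 2, column 6.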

Error Codes for Programmatic Handling

All errors include standardized error codes for programmatic handling and filtering:

var result = Toon.Parse(source);

foreach (var error in result.Errors)
{
    // Errors include descriptive messages with fix suggestions
    Console.WriteLine($"[{error.Code}] {error.Message}");
    
    // Filter by error type
    if (error.Code?.StartsWith("TOON1") == true)
        Console.WriteLine("  → Lexer/tokenization error");
    else if (error.Code?.StartsWith("TOON2") == true)
        Console.WriteLine("  → Parser structural error");
    else if (error.Code?.StartsWith("TOON3") == true)
        Console.WriteLine("  → Validation error");
}

Error Code Categories:

Category	Range	Description	Examples
Lexer	TOON1xxx	Tokenization errors	`TOON1001` Unterminated string; `TOON1002` Invalid escape sequence; `TOON1003` Invalid character
Parser	TOON2xxx	Structural errors	`TOON2001` Expected property key; `TOON2002` Expected colon; `TOON2003` Expected right bracket; `TOON2004` Expected value; `TOON2005` Expected delimiter
Validation	TOON3xxx	Semantic errors	`TOON3001` Array size mismatch; `TOON3002` Table array size mismatch; `TOON3003` Table row field mismatch
Delimiters	TOON4xxx	Delimiter issues	`TOON4001` Mixed delimiters; `TOON4002` Delimiter marker misplaced
Indentation	TOON5xxx	Indentation problems	`TOON5001` Unexpected indentation; `TOON5002` Inconsistent indentation
Internal	TOON9xxx	Library bugs	`TOON9001` Infinite loop detected
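
If you need all six categories rather than the three shown in the filtering example, a small lookup on the digit after the `TOON` prefix covers the table. `ErrorCategories.Categorize` below is a hypothetical helper written for this README, not a library method; it relies only on the code ranges listed in the table.

```csharp
using System;

// Hypothetical helper for illustration; not part of the ToonTokenizer API.
public static class ErrorCategories
{
    // Maps a code such as "TOON3001" to its category name from the table.
    // Returns "Unknown" for null, malformed, or unlisted codes.
    public static string Categorize(string code)
    {
        if (string.IsNullOrEmpty(code)
            || code.Length < 5
            || !code.StartsWith("TOON", StringComparison.Ordinal))
        {
            return "Unknown";
        }

        // The first digit after "TOON" selects the range (1xxx, 2xxx, ...).
        switch (code[4])
        {
            case '1': return "Lexer";
            case '2': return "Parser";
            case '3': return "Validation";
            case '4': return "Delimiters";
            case '5': return "Indentation";
            case '9': return "Internal";
            default:  return "Unknown";
        }
    }
}
```

A typical use would be grouping `result.Errors` by `Categorize(error.Code)` before displaying them.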

Context-Aware Error Messages:

Every error includes:

✅ What went wrong - Clear description of the problem
✅ Why it's wrong - Explanation of the rule that was violated
✅ How to fix it - Actionable suggestions for correction

Example error messages:

// Unterminated string
[TOON1001] Unterminated double-quoted string at line 5, column 10. 
String reached end of line without closing " character. 
Fix: Add closing " before the end of the line

// Invalid escape sequence
[TOON1002] Invalid escape sequence '\x' at line 3, column 15. 
Valid escape sequences: \n, \r, \t, \\, \", \'. 
Fix: Use a valid escape sequence or remove the backslash

// Array size mismatch
[TOON3001] Array size mismatch: declared 5 elements, but found 3. 
Missing 2 elements. Check if array is incomplete or elements are on wrong indentation level. 
Fix: Either add 2 more elements or change the size declaration [5]→[3]

// Table size mismatch with helpful hint
[TOON3002] Table array size mismatch: declared 10 rows, but found 8. 
Missing 2 rows. Check if rows are incomplete or have incorrect indentation. 
Fix: Either add 2 more rows or update the size [10]→[8]

Token Types

public enum TokenType
{
    // Values
    String, Number, True, False, Null, Identifier,
    
    // Structure
    Colon, Comma, Pipe,
    LeftBracket, RightBracket,
    LeftBrace, RightBrace,
    
    // Formatting
    Newline, Indent, Dedent, Whitespace,
    
    // Special
    Comment, EndOfFile, Invalid
}

AST Nodes

All inherit from AstNode with position tracking:

// Document root
ToonDocument              // Contains Properties[]

// Structural
PropertyNode              // Key + Value
ObjectNode                // Nested object with Properties[]

// Arrays
ArrayNode                 // Simple array with Elements[]
TableArrayNode            // Tabular with Schema[] and Rows[][]

// Values
StringValueNode           // String literal
NumberValueNode           // Numeric (integer or float)
BooleanValueNode          // true/false
NullValueNode             // null

Every node includes:

int StartLine, StartColumn, StartPosition
int EndLine, EndColumn, EndPosition

Advanced Features

Resilient Parsing

The parser continues after errors, returning a partial AST and all error locations:

var source = @"
name: John
invalid line here
city: Boulder
";

var result = Toon.Parse(source);

// result.Document has 2 valid properties (name, city)
// result.Errors has 1 error (line 3)

Perfect for:

IDE integration (IntelliSense on valid parts)
Error highlighting (show all errors at once)
Language servers
Linters and validators

Extension Methods

Token Extensions

using ToonTokenizer;

var tokens = Toon.Tokenize(source);

// Get tokens on specific line
var lineTokens = tokens.GetTokensOnLine(5);

// Find token at position
var token = tokens.GetTokenAt(line: 3, column: 10);

// Filter by type
var strings = tokens.GetTokensByType(TokenType.String);

// Syntax highlighting classification
foreach (var t in tokens)
{
    string cssClass = t.GetClassification();
    // Returns: "keyword", "string", "number", "comment", etc.
}

// Check categories
bool isKeyword = token.IsKeyword();        // true, false, null
bool isStructural = token.IsStructural();  // :, [, ], {, }, ,
bool isValue = token.IsValue();            // strings, numbers, booleans

Token to AST Navigation

using ToonTokenizer;
using ToonTokenizer.Ast;

var result = Toon.Parse(source);

// From token to AST node
var token = result.Tokens.GetTokenAt(line: 5, column: 3);
var node = token.GetAstNode(result.Document);
var property = token.GetPropertyNode(result.Document);

// From parse result directly
var nodeAtPosition = result.GetNodeAtPosition(42);
var nodeForToken = result.GetNodeForToken(myToken);
var propertyAt = result.GetPropertyAt(line: 3, column: 5);

// Get all properties (including nested)
var allProps = result.GetAllProperties();

// Find by path (dot notation)
var theme = result.FindPropertyByPath("user.settings.theme");
var email = result.FindPropertyByPath("user.email");

if (theme?.Value is StringValueNode str)
{
    Console.WriteLine($"Theme: {str.Value}");
}

Visitor Pattern

Implement custom AST processing:

using ToonTokenizer.Ast;

public class MyVisitor : IAstVisitor<string>
{
    public string VisitDocument(ToonDocument node)
    {
        var results = node.Properties.Select(p => p.Accept(this));
        return string.Join(", ", results);
    }
    
    public string VisitProperty(PropertyNode node)
    {
        return $"{node.Key} = {node.Value.Accept(this)}";
    }
    
    public string VisitStringValue(StringValueNode node)
    {
        return $"\"{node.Value}\"";
    }
    
    // ... implement other Visit methods
}

// Use it
var doc = Toon.Parse(source).Document;
var output = doc.Accept(new MyVisitor());

Use Cases

1. Visual Studio Extension (Syntax Highlighting)

public IEnumerable<ClassificationSpan> GetClassificationSpans(SnapshotSpan span)
{
    var source = span.GetText();
    var result = Toon.Parse(source);  // Gets tokens + AST in one call
    
    foreach (var token in result.Tokens)
    {
        var classification = token.GetClassification();
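        // Token positions are relative to the parsed text, so offset them by the span start
        var tokenSpan = new SnapshotSpan(
            span.Snapshot,
            span.Start.Position + token.Position,
            token.Length
        );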
        
        yield return new ClassificationSpan(
            tokenSpan, 
            GetClassificationType(classification)
        );
    }
}

2. Language Server (IntelliSense)

public IEnumerable<Completion> GetCompletions(int line, int column)
{
    var result = Toon.Parse(documentText);
    
    // Find the property we're currently in
    var property = result.GetPropertyAt(line, column);
    
    if (property != null)
    {
        // Context-aware suggestions based on property type
        if (property.Value is ObjectNode)
        {
            // Suggest nested property names
            yield return new Completion("theme");
            yield return new Completion("enabled");
        }
        else if (property.Value is ArrayNode)
        {
            // Suggest array-specific completions
            yield return new Completion("[size]");
        }
    }
    
    var token = result.Tokens.GetTokenAt(line, column);
    if (token?.Type == TokenType.Colon)
    {
        // After colon: suggest value types
        yield return new Completion("true");
        yield return new Completion("false");
        yield return new Completion("null");
    }
}

3. Error Diagnostics

public IEnumerable<Diagnostic> GetDiagnostics()
{
    var result = Toon.Parse(documentText);
    
    foreach (var error in result.Errors)
    {
        yield return new Diagnostic
        {
            Severity = DiagnosticSeverity.Error,
            Message = error.Message,
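            // Line and Column are 1-based; editor APIs usually expect 0-based values.
            // EndPosition is a 0-based character offset, not a column.
            Range = new Range(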
                error.Line - 1,
                error.Column - 1,
                error.EndPosition
            )
        };
    }
}

4. Code Folding / Outlining

public IEnumerable<FoldingRange> GetFoldingRanges()
{
    var result = Toon.Parse(documentText);
    
    foreach (var property in result.Document.Properties)
    {
        if (property.Value is ObjectNode obj && obj.Properties.Count > 0)
        {
            yield return new FoldingRange
            {
                StartLine = obj.StartLine,
                EndLine = obj.EndLine,
                Kind = FoldingRangeKind.Region
            };
        }
        else if (property.Value is TableArrayNode table && table.Rows.Count > 5)
        {
            yield return new FoldingRange
            {
                StartLine = table.StartLine,
                EndLine = table.EndLine,
                Kind = FoldingRangeKind.Region
            };
        }
    }
}

5. LLM Prompt Optimization

// Convert verbose JSON to compact TOON for token savings
var jsonData = GetDataFromApi();
var toonEncoder = new ToonEncoder();
var compactPrompt = toonEncoder.Encode(jsonData);

// Use in prompt
var prompt = $@"
Analyze this data:
{compactPrompt}

What insights can you provide?
";

Why Choose ToonTokenizer?

✨ Feature Complete

Full TOON v3.0 specification support
Handles all array types (inline, tabular, nested)
Complete delimiter support (comma, tab, pipe)
Resilient parsing with error recovery

🎯 Production Ready

637 unit tests covering edge cases (100% passing)
Battle-tested on complex real-world data
Handles malformed input gracefully
Comprehensive error reporting with standardized error codes
Context-aware error messages with actionable fix suggestions

🚀 Performance Focused

Efficient single-pass lexer
Minimal allocations
Streaming-friendly design
.NET Standard 2.0 for maximum compatibility

🛠️ Developer Friendly

Rich IntelliSense support
Extensive XML documentation
Position tracking on everything
Extension methods for common tasks

🏗️ Extensible

Visitor pattern for AST traversal
Hook points for custom behavior
Clean separation of concerns
Easy to integrate into larger systems

Specification Compliance

This library implements the TOON v3.0 specification. The full spec is included in spec.md.

Key features:

✅ Deterministic encoding
✅ Lossless round-tripping
✅ Strict and lenient parsing modes
✅ Position tracking for all tokens
✅ Table array detection
✅ Delimiter scoping rules
✅ Escape sequence handling
✅ Array size validation per §6.1 (detects undersized arrays)

Platform Support

Platform	Support
.NET Core 2.0+	✅
.NET Framework 4.6.1+	✅
.NET 5, 6, 7, 8, 9, 10	✅
Mono	✅
Xamarin	✅
Unity	✅ (via .NET Standard 2.0)

Performance

Typical parse performance on modern hardware:

Document Size	Parse Time	Tokens/sec
1 KB	< 1 ms	500K
10 KB	2-5 ms	400K
100 KB	20-40 ms	350K
1 MB	200-300 ms	300K

Benchmarks vary based on document structure and hardware.

Documentation

Quick Start: See examples above
Token Access: Examples/TokensInParseResult.md
API Documentation: XML docs included in package

Examples

Check out the Examples directory for:

Basic parsing examples
Syntax highlighter implementation
Error handling patterns
AST visitor examples
Token manipulation

Contributing

Contributions welcome! Please:

Follow existing code style
- Use the .editorconfig settings
- Keep methods focused and well-named
- Add XML documentation for public APIs
Write tests
- Add tests for new features
- Ensure all existing tests pass
- Aim for high code coverage
Update documentation
- Update README for user-facing changes
- Add examples for new features
- Keep spec compliance notes current

Testing

Run the full test suite:

dotnet test

Test coverage:

Lexer: Token generation, escape sequences, position tracking
Parser: All node types, error recovery, edge cases
Validation: Array size validation, string format validation, number format validation
Extensions: Helper methods, visitor pattern
Integration: Round-trip encoding/decoding

License

Apache License 2.0 - See LICENSE.txt file for details.

This library is independent from the TOON specification but implements it faithfully. The specification itself is MIT licensed.

Links

TOON Specification: https://github.com/toon-format/spec
Reference Implementation (TypeScript): https://github.com/toon-format/toon
This Library: https://github.com/madskristensen/ToonTokenizer
NuGet Package: https://www.nuget.org/packages/ToonTokenizer/

Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Spec Questions: TOON Spec Repo

Author

Mads Kristensen - GitHub | Twitter

Implementing the TOON specification by Johann Schopplich - @johannschopplich

Made with ❤️ for the .NET community
