Semantic diff for structured data: focus on what matters, not formatting
A next-generation diff tool that understands the structure and meaning of your data, not just its text. Works with JSON, YAML, TOML, XML, INI, and CSV files.
# Traditional diff shows formatting noise (key order, comma placement)
$ diff config_v1.json config_v2.json
2,3c2,3
<   "name": "myapp",
<   "version": "1.0"
---
>   "version": "1.1",
>   "name": "myapp"
# diffx shows only semantic changes
$ diffx config_v1.json config_v2.json
~ version: "1.0" -> "1.1"
- Semantic Awareness: Ignores formatting, key order, whitespace, and trailing commas
- Multiple Formats: JSON, YAML, TOML, XML, INI, and CSV support
- AI-Friendly: Clean, parseable CLI output for automation and AI analysis
- Fast: Built in Rust and compiled to native code
- Meta-Chaining: Compare diff reports to track change evolution
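To make "semantic awareness" concrete, the following is a minimal Python sketch of a recursive structural diff. It is an illustration only, not diffx's Rust implementation. Both inputs are parsed into dictionaries first, so key order and whitespace never reach the comparison step. The function name semantic_diff and its (kind, path, old, new) result tuples are hypothetical.

```python
import json


def semantic_diff(old, new, path=""):
    """Hypothetical sketch: return (kind, path, old, new) tuples for semantic changes."""
    changes = []
    if isinstance(old, dict) and isinstance(new, dict):
        # Walk the union of keys in sorted order, so source key order is irrelevant.
        for key in sorted(old.keys() | new.keys()):
            child = f"{path}.{key}" if path else key
            if key not in new:
                changes.append(("removed", child, old[key], None))
            elif key not in old:
                changes.append(("added", child, None, new[key]))
            else:
                changes.extend(semantic_diff(old[key], new[key], child))
    elif type(old) is not type(new):
        # e.g. "8080" (string) vs 8080 (number)
        changes.append(("type_changed", path, old, new))
    elif old != new:
        # Scalars, and (in this simplified sketch) whole arrays, are compared by value.
        changes.append(("modified", path, old, new))
    return changes


# The two files from the example above: different key order and layout.
v1 = json.loads('{\n  "name": "myapp",\n  "version": "1.0"\n}')
v2 = json.loads('{"version": "1.1", "name": "myapp"}')

for kind, path, before, after in semantic_diff(v1, v2):
    print(kind, path, before, after)  # prints: modified version 1.0 1.1
```

Arrays are compared as whole values here; a real tool needs an element-matching strategy, such as matching by position or by an ID field.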
Benchmark results on an AMD Ryzen 5 PRO 4650U:
# Test files: ~600-byte JSON files with nested config
$ time diff large_test1.json large_test2.json # Shows 15+ lines of noise
$ time diffx large_test1.json large_test2.json # Shows 3 semantic changes
# Results:
Traditional diff: ~0.002s (but with formatting noise)
diffx: ~0.005s (clean semantic output)
Why CLI matters for the AI era: As AI tools become essential in development workflows, structured, machine-readable diff output becomes crucial. diffx provides clean, parseable results that AI tools can understand and reason about, which makes it well suited to automated code review, configuration management, and intelligent deployment pipelines.
Traditional diff tools show you formatting noise. diffx shows you what actually changed.
- Focus on meaning: Ignores key order, whitespace, and formatting
- Multiple formats: Works with JSON, YAML, TOML, XML, INI, CSV
- Clean output: Perfect for humans, scripts, and AI analysis
Supported formats:
- JSON
- YAML
- TOML
- XML
- INI
- CSV
Detected difference types:
- Key addition/deletion
- Value change
- Array insertion/deletion/modification
- Nested structure differences
- Value type change
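Array changes are the trickiest case on this list, because "the same element" has to be defined before an insertion can be told apart from a modification. The Python sketch below contrasts matching elements by position with matching them by an identifier field, which is the idea the --array-id-key option in the usage examples suggests. The function names and the [id=...] path notation are hypothetical, and diffx's actual matching rules may differ.

```python
def diff_by_index(old, new):
    """Hypothetical sketch: pair array elements by position."""
    changes = []
    for i in range(max(len(old), len(new))):
        if i >= len(new):
            changes.append(("removed", f"[{i}]", old[i], None))
        elif i >= len(old):
            changes.append(("added", f"[{i}]", None, new[i]))
        elif old[i] != new[i]:
            changes.append(("modified", f"[{i}]", old[i], new[i]))
    return changes


def diff_by_id(old, new, id_key):
    """Hypothetical sketch: pair elements by the value of id_key (assumed unique and sortable)."""
    old_by_id = {item[id_key]: item for item in old}
    new_by_id = {item[id_key]: item for item in new}
    changes = []
    for ident in sorted(old_by_id.keys() | new_by_id.keys()):
        label = f"[{id_key}={ident}]"
        if ident not in new_by_id:
            changes.append(("removed", label, old_by_id[ident], None))
        elif ident not in old_by_id:
            changes.append(("added", label, None, new_by_id[ident]))
        elif old_by_id[ident] != new_by_id[ident]:
            changes.append(("modified", label, old_by_id[ident], new_by_id[ident]))
    return changes


users_v1 = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
# v2 inserts Carol at the front and renames Bob.
users_v2 = [{"id": 3, "name": "Carol"}, {"id": 1, "name": "Alice"}, {"id": 2, "name": "Robert"}]

print(len(diff_by_index(users_v1, users_v2)))     # 3: every position looks changed
print(len(diff_by_id(users_v1, users_v2, "id")))  # 2: one rename, one addition
```

With positional matching, inserting one element at the front shifts everything after it; matching on an ID reports only the two real changes.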
diffx outputs differences in the diffx format by default: a semantic diff representation designed specifically for structured data. The diffx format gives the richest expression of structural differences, and machine-readable JSON and YAML formats are available for integration:
- diffx Format (Default)
  - The diffx format is a human-readable, semantic diff representation that clearly displays structural differences (additions, changes, deletions, type changes, etc.) using intuitive symbols and hierarchical paths.
  - Differences are represented by + (addition), - (deletion), ~ (change), and ! (type change) symbols with full path context (e.g., database.connection.host).
  - Core Feature: Focuses on semantic changes in data, ignoring changes in key order, whitespace, and formatting. This semantic focus is the fundamental value of both the tool and the diffx format.
- JSON Format
  - Machine-readable format, used for CI/CD and integration with other programs.
  - Differences detected by diffx are output as a JSON array.
- YAML Format
  - Machine-readable format, used for CI/CD and integration with other programs, like JSON.
  - Differences detected by diffx are output as a YAML array.
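To show how the symbol notation maps onto change records, here is a hypothetical Python formatter. The +, -, ~, and ! symbols and the "path: old -> new" layout follow the example output at the top of this README; the tuple input, the function names to_diffx_lines and to_json_array, and the JSON field names (kind, path, old, new) are assumptions, not diffx's actual output schema.

```python
import json

# Symbols as described above; the mapping from "kind" names is hypothetical.
SYMBOLS = {"added": "+", "removed": "-", "modified": "~", "type_changed": "!"}


def to_diffx_lines(changes):
    """Render (kind, path, old, new) tuples as diffx-style text lines."""
    lines = []
    for kind, path, old, new in changes:
        symbol = SYMBOLS[kind]
        if kind == "added":
            lines.append(f"{symbol} {path}: {json.dumps(new)}")
        elif kind == "removed":
            lines.append(f"{symbol} {path}: {json.dumps(old)}")
        else:
            # Changes and type changes show both the old and the new value.
            lines.append(f"{symbol} {path}: {json.dumps(old)} -> {json.dumps(new)}")
    return lines


def to_json_array(changes):
    """Render the same records as a JSON array (illustrative field names)."""
    records = [{"kind": k, "path": p, "old": o, "new": n} for k, p, o, n in changes]
    return json.dumps(records)


# Sample change records (made-up data for illustration).
changes = [
    ("modified", "version", "1.0", "1.1"),
    ("added", "database.connection.host", None, "db.internal"),
    ("type_changed", "database.port", "5432", 5432),
]
print("\n".join(to_diffx_lines(changes)))
# ~ version: "1.0" -> "1.1"
# + database.connection.host: "db.internal"
# ! database.port: "5432" -> 5432
```

Keeping one internal list of change records and rendering it several ways is what lets a single diff run feed both a terminal and a CI script.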
graph TB
subgraph Core["diffx-core"]
B[Format Parsers]
C[Semantic Diff Engine]
D[Output Formatters]
B --> C --> D
end
E[CLI Tool] --> Core
F[NPM Package] --> E
G[Python Package] --> E
H[JSON] --> B
I[YAML] --> B
J[TOML] --> B
K[XML] --> B
L[INI] --> B
M[CSV] --> B
D --> N[CLI Display]
D --> O[JSON Output]
D --> P[YAML Output]
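The diagram separates parsing from diffing: every format is first turned into one common tree, so a single engine can compare any of them. As a rough illustration of the "Format Parsers" box, here is a Python sketch using only standard-library parsers (so YAML, TOML, and XML are left out; diffx itself uses the Rust crates listed below). Choosing the parser by file extension, and the PARSERS table and parse function, are assumptions of this sketch.

```python
import configparser
import csv
import io
import json
from pathlib import PurePath


def parse_ini(text):
    # INI has no value types: every value comes back as a string.
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def parse_csv(text):
    # Each row becomes a dict keyed by the header line.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Hypothetical dispatch table: file extension -> parser returning dicts/lists.
PARSERS = {".json": json.loads, ".ini": parse_ini, ".csv": parse_csv}


def parse(filename, text):
    suffix = PurePath(filename).suffix.lower()
    if suffix not in PARSERS:
        raise ValueError(f"unsupported format: {suffix!r}")
    return PARSERS[suffix](text)


# The port is a string in JSON here because INI values are always strings.
json_text = '{"server": {"host": "localhost", "port": "8080"}}'
ini_text = "[server]\nport = 8080\nhost = localhost\n"

# The same data in two formats normalizes to the same structure.
print(parse("app.json", json_text) == parse("app.ini", ini_text))  # True
```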
diffx/
├── diffx-core/       # Diff extraction library (Crate)
├── diffx-cli/        # CLI wrapper
├── tests/            # All test-related files
│   ├── fixtures/     # Test input data
│   ├── integration/  # CLI integration tests
│   ├── unit/         # Core library unit tests
│   └── output/       # Test intermediate files
├── docs/             # Documentation and specifications
└── ...
- Rust (fast, safe, cross-platform)
- serde_json, serde_yml, toml, configparser, quick-xml, csv (parsers)
- clap (CLI argument parsing)
- colored (CLI output coloring)
- similar (Unified Format output)
Compare diff reports to track how changes evolve over time:
graph LR
A[config_v1.json] --> D1[diffx]
B[config_v2.json] --> D1
D1 --> R1[diff_report_v1.json]
B --> D2[diffx]
C[config_v3.json] --> D2
D2 --> R2[diff_report_v2.json]
R1 --> D3[diffx]
R2 --> D3
D3 --> M[Meta-Diff Report]
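Meta-chaining works because a diff report is itself structured data. The Python sketch below imitates the chain in the diagram: two reports are produced from three config versions, and the same diff function is then applied to the reports. Storing a report as a dict keyed by path (rather than the JSON array diffx emits) is an assumption made so one small function can handle both steps; the diff function and the sample configs are hypothetical.

```python
def diff(old, new, path=""):
    """Hypothetical sketch: recursively diff two dicts into {path: {"old": ..., "new": ...}}.

    Missing keys are reported as None, a simplification for this example.
    """
    report = {}
    for key in sorted(old.keys() | new.keys()):
        child = f"{path}.{key}" if path else key
        a, b = old.get(key), new.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            report.update(diff(a, b, child))
        elif a != b:
            report[child] = {"old": a, "new": b}
    return report


config_v1 = {"name": "myapp", "version": "1.0", "debug": True}
config_v2 = {"name": "myapp", "version": "1.1", "debug": True}
config_v3 = {"name": "myapp", "version": "1.2", "debug": False}

report1 = diff(config_v1, config_v2)  # what changed from v1 to v2
report2 = diff(config_v2, config_v3)  # what changed from v2 to v3

# A report is just structured data, so the same function can diff two reports.
meta = diff(report1, report2)
for path, change in meta.items():
    print(path, change)
```

The meta-diff shows that debug only started changing in the second step, and that the version bump moved on from 1.0 to 1.1 to the next step, 1.1 to 1.2.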
$ diffx config_v1.json config_v2.json --output json > report1.json
$ diffx config_v2.json config_v3.json --output json > report2.json
$ diffx report1.json report2.json # Compare the changes themselves!
# Rust (recommended - native performance)
cargo install diffx
# Node.js ecosystem (offline-ready with all platform binaries)
npm install diffx-js
# Python ecosystem (self-contained wheel with embedded binary)
pip install diffx-python
# Or download pre-built binaries from GitHub Releases
For detailed usage and examples, see the documentation.
- Getting Started - Learn the basics
- Installation Guide - Platform-specific setup
- CLI Reference - Complete command reference
- Real-World Examples - Industry use cases
- Integration Guide - CI/CD and automation
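Several options in the usage examples that follow, such as --ignore-keys-regex and --epsilon, narrow what counts as a change. This Python sketch shows one plausible reading of those two ideas. Whether diffx matches the regex against key names or full paths, and whether its epsilon is absolute or relative, are assumptions here, and the diff_filtered function and sample metrics are hypothetical.

```python
import math
import re


def diff_filtered(old, new, ignore_keys=None, epsilon=0.0, path=""):
    """Hypothetical sketch: list changed paths, skipping ignored keys and tiny float deltas."""
    pattern = re.compile(ignore_keys) if ignore_keys else None
    changes = []
    for key in sorted(old.keys() | new.keys()):
        if pattern is not None and pattern.search(key):
            continue  # the key name matches the ignore regex, so skip it entirely
        child = f"{path}.{key}" if path else key
        if key not in old or key not in new:
            changes.append(child)  # key added or removed
            continue
        a, b = old[key], new[key]
        if isinstance(a, dict) and isinstance(b, dict):
            changes.extend(diff_filtered(a, b, ignore_keys, epsilon, child))
        elif isinstance(a, float) and isinstance(b, float):
            # Floats count as equal when they differ by at most epsilon (absolute tolerance).
            if not math.isclose(a, b, rel_tol=0.0, abs_tol=epsilon):
                changes.append(child)
        elif a != b:
            changes.append(child)
    return changes


metrics_v1 = {"timestamp": "2024-01-01T00:00:00Z", "_cache": {"hits": 10}, "latency": 0.1234, "region": "eu"}
metrics_v2 = {"timestamp": "2024-01-02T00:00:00Z", "_cache": {"hits": 42}, "latency": 0.1237, "region": "us"}

print(diff_filtered(metrics_v1, metrics_v2))
# ['_cache.hits', 'latency', 'region', 'timestamp']
print(diff_filtered(metrics_v1, metrics_v2, ignore_keys=r"^timestamp$|^_.*", epsilon=0.001))
# ['region']
```

Filtering during the walk, rather than post-processing the output, means ignored subtrees such as _cache are never descended into at all.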
# Compare JSON files
diffx file1.json file2.json
# Compare with different output formats
diffx config.yaml config_new.yaml --output json
diffx data.toml data_updated.toml --output yaml
# Advanced filtering options
diffx large.json large_v2.json --ignore-keys-regex "^timestamp$|^_.*"
diffx users.json users_v2.json --array-id-key "id"
diffx metrics.json metrics_v2.json --epsilon 0.001
# Commonly used options
diffx config.yaml config_new.yaml --ignore-case # Ignore case differences
diffx api.json api_formatted.json --ignore-whitespace # Ignore whitespace changes
diffx large.json large_v2.json --output json # JSON output for automation
diffx file1.json file2.json --quiet && echo "Files identical" # Script automation
diffx dir1/ dir2/ --brief # Quick directory change check (recurses automatically)
# Large files: same command, no extra flags
diffx huge_dataset.json huge_dataset_v2.json
# Directory comparison (automatic recursive detection)
diffx config_dir1/ config_dir2/
# Meta-chaining for change tracking
diffx config_v1.json config_v2.json --output json > diff1.json
diffx config_v2.json config_v3.json --output json > diff2.json
diffx diff1.json diff2.json # Compare the changes themselves!
CI/CD Pipeline:
- name: Check configuration changes
  run: |
    diffx config/prod.yaml config/staging.yaml --output json > changes.json
    # Process changes.json for deployment validation
- name: Quick file change detection
  run: |
    if ! diffx config/current.json config/new.json --quiet; then
      echo "Configuration changed, triggering deployment"
    fi
- name: Compare with ignore options for cleaner diffs
  run: |
    diffx api_old.json api_new.json --ignore-case --ignore-whitespace --output json > api_changes.json
    # Focus on semantic changes, ignore formatting
- name: Compare large datasets efficiently
  run: |
    diffx large_prod_data.json large_staging_data.json --output json > data_changes.json
    # Optimized processing for large files in CI
Git Hook:
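#!/bin/bash
# pre-commit hook: compare the last committed package.json with the staged one
# (git revision syntax like HEAD:package.json is not a file path, so extract both first)
tmpdir=$(mktemp -d)
git show HEAD:package.json > "$tmpdir/old.json"
git show :package.json > "$tmpdir/new.json"
if diffx "$tmpdir/old.json" "$tmpdir/new.json" --output json | jq -e '.[] | select(.Added)' > /dev/null; then
  echo "New dependencies detected, running security audit..."
fi
rm -rf "$tmpdir"
diffx is available across multiple ecosystems: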
# Rust (native CLI)
cargo install diffx
# Node.js wrapper
npm install diffx-js
# Python wrapper
pip install diffx-python
All packages provide the same semantic diff capabilities:
- Rust: Source-based compilation
- npm: Universal package with all platform binaries (offline-ready)
- Python: Self-contained wheels with embedded binaries
- Interactive TUI (diffx-tui): A powerful viewer showcasing diffx capabilities with side-by-side data display
- AI agent integration: Automated diff summarization and explanation
- Web UI version (diffx-web)
- VSCode extension (diffx-vscode)
- Advanced CI/CD templates: Pre-built workflows for common use cases
We welcome contributions! See CONTRIBUTING.md for guidelines.
MIT License - see LICENSE for details.