A high-performance Rust library and CLI tool for CSV data analysis, featuring automatic type inference, statistical analysis, and professional reporting capabilities.
This project provides both:
- 📚 Rust Library - For embedding CSV analysis in your applications
- 🖥️ CLI Tool - For command-line data analysis
Features:

- Automatic Type Inference: Intelligently detects integers, floats, booleans, and strings
- Missing Value Analysis: Comprehensive NA/null detection and reporting
- Statistical Operations: Built-in sum, mean, min, max calculations for all numeric types
- JSON Export: Native JSON serialization with multiple orientations (Columns, Records, Values)
- Professional Output: Formatted tables and statistical reports
- Fast Processing: Rust-powered performance for large CSV files
- Self-Analyzing Columns: Each column type implements its own statistical operations
- Comprehensive Testing: 37+ tests ensuring reliability
Add to your `Cargo.toml`:

```toml
[dependencies]
csv_processor = "0.1.0"
```
Or install the CLI tool:

```sh
cargo install csv_processor

# Or build from source
git clone https://github.com/kkruglik/csv_processor
cd csv_processor
cargo build --release
```
```rust
use csv_processor::{DataFrame, JsonExportOrient, reporter::{generate_info_report, generate_na_report}};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load CSV file
    let df = DataFrame::from_csv("data.csv")?;

    // Generate statistical report
    let stats_report = generate_info_report(&df);
    println!("Statistics:\n{}", stats_report);

    // Generate NA analysis report
    let na_report = generate_na_report(&df);
    println!("Missing Values:\n{}", na_report);

    // Export to JSON with different orientations
    let json_columns = df.to_json(JsonExportOrient::Columns)?;
    println!("JSON (Columns): {}", json_columns);

    let json_records = df.to_json(JsonExportOrient::Records)?;
    println!("JSON (Records): {}", json_records);

    let json_values = df.to_json(JsonExportOrient::Values)?;
    println!("JSON (Values): {}", json_values);

    // Access individual columns for custom analysis
    if let Some(column) = df.get_column(0) {
        println!("Column mean: {:?}", column.mean());
        println!("Column nulls: {}", column.null_count());

        let column_json = column.to_json();
        println!("Column as JSON: {:?}", column_json);
    }

    Ok(())
}
```
```sh
# Check for missing values
csv_processor na sample.csv

# Calculate comprehensive statistics
csv_processor info sample.csv

# Get help
csv_processor --help
```
Development Usage:

```sh
# When developing/building from source
cargo run --bin csv_processor -- na sample.csv
cargo run --bin csv_processor -- info sample.csv
```
When loading a CSV file, data is displayed in a formatted table:
```
┌──────────────┬──────┬──────────┬─────────────┬────────┬───────┬─────┬─────┐
│ name         │ age  │ salary   │ department  │ active │ score │ ... │ ... │
├──────────────┼──────┼──────────┼─────────────┼────────┼───────┼─────┼─────┤
│ Alice Smith  │ 28   │ 75000.5  │ Engineering │ true   │ 8.7   │ ... │ ... │
│ Bob Johnson  │ null │ 65000    │ Marketing   │ false  │ null  │ ... │ ... │
│ Carol Davis  │ 35   │ null     │ Engineering │ true   │ 9.2   │ ... │ ... │
│ null         │ 29   │ 58000.75 │ Sales       │ true   │ 7.8   │ ... │ ... │
│ ⋮            │ ⋮    │ ⋮        │ ⋮           │ ⋮      │ ⋮     │ ⋮   │ ⋮   │
│ Henry Taylor │ 38   │ 82000    │ Engineering │ false  │ 7.5   │ ... │ ... │
└──────────────┴──────┴──────────┴─────────────┴────────┴───────┴─────┴─────┘
10 rows × 8 columns
```
```
┌────────┬─────────┬──────────┬──────────┬─────────┐
│ column │ mean    │ sum      │ min      │ max     │
├────────┼─────────┼──────────┼──────────┼─────────┤
│ id     │ 5.5     │ 55.0     │ 1.0      │ 10.0    │
│ age    │ 31.29   │ 250.33   │ 26.0     │ 42.0    │
│ salary │ 72571.5 │ 507000.5 │ 58000.75 │ 95000.0 │
│ active │ 0.8     │ 8.0      │ 0.0      │ 1.0     │
│ score  │ 8.06    │ 56.4     │ 6.9      │ 9.2     │
└────────┴─────────┴──────────┴──────────┴─────────┘
5 rows × 5 columns
```
Column Analysis:
- id: 0 missing values (0.0%)
- name: 2 missing values (20.0%)
- age: 2 missing values (20.0%)
- salary: 3 missing values (30.0%)
- department: 1 missing values (10.0%)
- active: 1 missing values (10.0%)
- start_date: 2 missing values (20.0%)
- score: 3 missing values (30.0%)
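The percentages in this report are simply each column's null count divided by the row count. As a minimal, hypothetical sketch of that calculation (the function name and input shape are assumptions for illustration, not csv_processor internals):

```rust
// Compute missing-value percentages per column.
// `counts` pairs each column name with its null count; `rows` is the row total.
// Illustrative sketch only, not the library's actual implementation.
fn na_percentages(counts: &[(&str, usize)], rows: usize) -> Vec<(String, f64)> {
    counts
        .iter()
        .map(|(name, nulls)| (name.to_string(), 100.0 * *nulls as f64 / rows as f64))
        .collect()
}

fn main() {
    let counts = [("id", 0), ("name", 2), ("salary", 3)];
    for (name, pct) in na_percentages(&counts, 10) {
        println!("- {}: {:.1}% missing", name, pct);
    }
}
```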
The library supports three JSON export orientations:
Columns Format (Analytics-Optimized):

```json
{
  "headers": ["id", "name", "age", "salary", "active"],
  "columns": [
    [1, 2, 3, 4, 5],
    ["Alice", "Bob", null, "David", "Emma"],
    [28, 35, null, 42, 31],
    [75000.5, 65000, null, 82000, 71500],
    [true, false, true, false, true]
  ]
}
```

Records Format (Row-Oriented):

```json
[
  {"id": 1, "name": "Alice", "age": 28, "salary": 75000.5, "active": true},
  {"id": 2, "name": "Bob", "age": 35, "salary": 65000, "active": false},
  {"id": 3, "name": null, "age": null, "salary": null, "active": true}
]
```

Values Format (Indexed):

```json
[
  {"0": 1, "1": "Alice", "2": 28, "3": 75000.5, "4": true},
  {"0": 2, "1": "Bob", "2": 35, "3": 65000, "4": false},
  {"0": 3, "1": null, "2": null, "3": null, "4": true}
]
```
```rust
use csv_processor::{DataFrame, ColumnArray, CellValue, JsonExportOrient, reporter};

// Main data container
let df = DataFrame::from_csv("data.csv")?;

// Access columns polymorphically
let column: &dyn ColumnArray = df.get_column(0).unwrap();

// Statistical operations (all return Option<f64>)
let mean = column.mean();
let sum = column.sum();
let min = column.min();
let max = column.max();
let nulls = column.null_count();

// JSON export with multiple orientations
let json_columns = df.to_json(JsonExportOrient::Columns)?;
let json_records = df.to_json(JsonExportOrient::Records)?;
let json_values = df.to_json(JsonExportOrient::Values)?;
let column_json = column.to_json();

// Generate reports
let stats_report = reporter::generate_info_report(&df);
let na_report = reporter::generate_na_report(&df);
```
- `ColumnArray` - Unified interface for column data, statistical operations, and JSON export
- `Display` - Formatted output for DataFrames and reports
```
src/
├── lib.rs               # Library interface with documentation
├── bin/
│   └── csv_processor.rs # CLI binary
├── series/              # Column-oriented data structures (Polars-style)
│   └── array.rs         # ColumnArray trait with statistical operations
├── frame/               # DataFrame operations and CSV I/O
│   └── mod.rs           # Main DataFrame implementation
├── scalar/              # Cell-level operations and values
├── reporter.rs          # Statistical report generation
└── config.rs            # CLI parsing (exported for advanced use)
```
- Library First: Clean API for embedding in applications
- Self-Analyzing Columns: Statistical operations embedded in column types
- Functional Design: Pure functions over object-oriented patterns
- Rust Idioms: Leverage ownership system and proper error handling
- DataFrame: Main container with typed columns and display formatting
- ColumnArray: Unified trait for data access AND statistical operations
- Column Types: `IntegerColumn`, `FloatColumn`, `StringColumn`, `BooleanColumn`
- CellValue: Enum for individual cell values with type information
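To illustrate the "self-analyzing columns" idea, here is a minimal, hypothetical sketch of a column trait whose implementors compute their own statistics. The names `Column` and `IntColumn` are placeholders chosen to avoid implying this is the library's actual code:

```rust
// Hypothetical sketch: each column type implements its own statistics.
// Missing cells are modeled as `None`; numeric stats return `Option<f64>`
// (None when no non-null values exist).
trait Column {
    fn null_count(&self) -> usize;
    fn mean(&self) -> Option<f64>;
}

struct IntColumn {
    values: Vec<Option<i64>>, // None represents a missing cell
}

impl Column for IntColumn {
    fn null_count(&self) -> usize {
        self.values.iter().filter(|v| v.is_none()).count()
    }

    fn mean(&self) -> Option<f64> {
        // Collect only present values, then average them.
        let present: Vec<i64> = self.values.iter().flatten().copied().collect();
        if present.is_empty() {
            None
        } else {
            Some(present.iter().sum::<i64>() as f64 / present.len() as f64)
        }
    }
}

fn main() {
    let col = IntColumn { values: vec![Some(28), None, Some(35), Some(29)] };
    println!("nulls = {}", col.null_count());
    println!("mean  = {:?}", col.mean());
}
```

Keeping the statistics behind a trait is what lets a `DataFrame` hold heterogeneous columns and still call `mean()` or `null_count()` polymorphically.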
```sh
# Build the project
cargo build

# Run all tests (37+ test suite)
cargo test

# Run specific test suite
cargo test frame_tests
cargo test columns_tests

# Check code quality
cargo clippy

# Format code
cargo fmt

# Check without building
cargo check
```
- Fast Type Inference: Automatic detection of optimal column types
- Memory Efficient: Column-oriented storage following Apache Arrow patterns
- Zero-Cost Abstractions: Rust's performance with high-level ergonomics
- Parallel Processing Ready: Architecture designed for future parallelization
The tool handles various data types and missing values:
```csv
id,name,age,salary,department,active,start_date,score
1,Alice Smith,28,75000.50,Engineering,true,2021-03-15,8.7
2,Bob Johnson,,65000,Marketing,false,2020-11-22,
3,Carol Davis,35,NA,Engineering,true,,9.2
```
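Type inference of this kind typically tries the most specific parse first and falls back to string. A hypothetical per-cell sketch (the enum and the rule that `""`/`NA` mean missing are assumptions for illustration, not the library's actual inference logic):

```rust
// Hypothetical sketch of per-cell type inference: try integer, then float,
// then boolean, falling back to text. "" and "NA" are treated as missing.
#[derive(Debug, PartialEq)]
enum Inferred {
    Null,
    Integer(i64),
    Float(f64),
    Boolean(bool),
    Text(String),
}

fn infer_cell(raw: &str) -> Inferred {
    let s = raw.trim();
    if s.is_empty() || s == "NA" {
        return Inferred::Null;
    }
    if let Ok(i) = s.parse::<i64>() {
        return Inferred::Integer(i);
    }
    if let Ok(f) = s.parse::<f64>() {
        return Inferred::Float(f);
    }
    if let Ok(b) = s.parse::<bool>() {
        return Inferred::Boolean(b);
    }
    Inferred::Text(s.to_string())
}

fn main() {
    for cell in ["28", "75000.50", "true", "NA", "Alice Smith"] {
        println!("{:?} -> {:?}", cell, infer_cell(cell));
    }
}
```

A real implementation would infer a single type per column (e.g. promoting an integer column to float when a decimal appears), but the per-cell ordering is the core idea.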
CLI Usage:

```sh
# Analyze missing values
csv_processor na employee_data.csv

# Generate statistical report (includes JSON export demonstration)
csv_processor info sales_data.csv

# For development (building from source)
cargo run --bin csv_processor -- na employee_data.csv
```
Library Usage:

```rust
use csv_processor::{DataFrame, JsonExportOrient, reporter::generate_info_report};

let df = DataFrame::from_csv("sales_data.csv")?;
let report = generate_info_report(&df);
println!("{}", report);

// Export to different JSON formats
let json_columns = df.to_json(JsonExportOrient::Columns)?;
let json_records = df.to_json(JsonExportOrient::Records)?;
```
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Write tests for your changes
- Run the test suite (`cargo test`)
- Ensure code quality (`cargo clippy`)
- Commit your changes (`git commit -am 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.