A Rust library for analyzing data lake table health — checking the pulse — across multiple formats (Delta Lake, Apache Iceberg, Apache Hudi, Lance) and storage providers (AWS S3, Azure Data Lake, GCS, Local).
Lake Pulse provides comprehensive health metrics for your data lake tables, including:
- File organization and compaction opportunities
- Metadata analysis and schema evolution
- Partition statistics
- Time travel/snapshot metrics
- Storage efficiency insights
```rust
use lake_pulse::{Analyzer, StorageConfig};

#[tokio::main]
async fn main() {
    let storage_config = StorageConfig::aws()
        .with_option("bucket", "my-bucket-1234")
        .with_option("region", "us-east-1")
        .with_option("access_key_id", "the_access_key_id")
        .with_option("secret_access_key", "the_secret_access_key")
        .with_option("session_token", "session_token_if_needed");

    let analyzer = Analyzer::builder(storage_config).build().await.unwrap();

    // Generate report
    let report = analyzer.analyze("my/table/path").await.unwrap();

    // Print pretty report
    println!("{}", report);
}
```

- Delta Lake - Full support for transaction logs, deletion vectors, and Delta-specific metrics
- Apache Iceberg - Metadata analysis, snapshot management, and Iceberg-specific features
- Apache Hudi - Basic support for Hudi table structure analysis and metrics
- Lance - Modern columnar format with vector search capabilities
Lake Pulse uses the object_store crate for cloud storage access. Configuration options are passed through to the underlying storage provider.
Common options for S3 (see object_store AWS documentation):
- `bucket` - S3 bucket name
- `region` - AWS region (e.g., "us-east-1")
- `access_key_id` - AWS access key ID
- `secret_access_key` - AWS secret access key
- `session_token` - Optional session token for temporary credentials
- `endpoint` - Optional custom endpoint URL
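The `endpoint` option lets the analyzer talk to an S3-compatible store instead of AWS itself. A minimal sketch, assuming a locally hosted S3-compatible service (for example MinIO); the bucket, credentials, and endpoint URL are placeholders:

```rust
use lake_pulse::{Analyzer, StorageConfig};

#[tokio::main]
async fn main() {
    // Placeholder bucket, credentials, and endpoint for an S3-compatible store.
    let storage_config = StorageConfig::aws()
        .with_option("bucket", "my-bucket-1234")
        .with_option("region", "us-east-1")
        .with_option("access_key_id", "the_access_key_id")
        .with_option("secret_access_key", "the_secret_access_key")
        // Route requests to a custom endpoint instead of the default AWS one.
        .with_option("endpoint", "http://localhost:9000");

    let analyzer = Analyzer::builder(storage_config).build().await.unwrap();
    let report = analyzer.analyze("my/table/path").await.unwrap();
    println!("{}", report);
}
```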
Common options for Azure (see object_store Azure documentation):
- `container` - Azure container name
- `account_name` - Storage account name
- `tenant_id` - Azure tenant ID
- `client_id` - Service principal client ID
- `client_secret` - Service principal client secret
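A minimal sketch of an Azure Data Lake configuration built from the options above. It assumes a `StorageConfig::azure()` constructor analogous to `StorageConfig::aws()`; the container, account, and service principal values are placeholders:

```rust
use lake_pulse::{Analyzer, StorageConfig};

#[tokio::main]
async fn main() {
    // Assumes an azure() constructor mirroring aws(); all values are placeholders.
    let storage_config = StorageConfig::azure()
        .with_option("container", "my-container")
        .with_option("account_name", "mystorageaccount")
        .with_option("tenant_id", "the_tenant_id")
        .with_option("client_id", "the_client_id")
        .with_option("client_secret", "the_client_secret");

    let analyzer = Analyzer::builder(storage_config).build().await.unwrap();
    let report = analyzer.analyze("my/table/path").await.unwrap();
    println!("{}", report);
}
```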
Common options for GCP (see object_store GCP documentation):
- `bucket` - GCS bucket name
- `service_account_key` - Path to service account JSON key file
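Likewise for GCS, a sketch assuming a `StorageConfig::gcp()` constructor mirroring `StorageConfig::aws()`; the bucket name and key file path are placeholders:

```rust
use lake_pulse::{Analyzer, StorageConfig};

#[tokio::main]
async fn main() {
    // Assumes a gcp() constructor mirroring aws(); values are placeholders.
    let storage_config = StorageConfig::gcp()
        .with_option("bucket", "my-gcs-bucket")
        .with_option("service_account_key", "/path/to/service-account.json");

    let analyzer = Analyzer::builder(storage_config).build().await.unwrap();
    let report = analyzer.analyze("my/table/path").await.unwrap();
    println!("{}", report);
}
```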
For tables on the local filesystem:

```rust
let storage_config = StorageConfig::local();
let analyzer = Analyzer::builder(storage_config).build().await.unwrap();
let report = analyzer.analyze("/path/to/table").await.unwrap();
```

See the `examples/` directory for more detailed usage examples:
- `s3_store.rs` - AWS S3 example
- `adl_store.rs` - Azure Data Lake example
- `local_store.rs` - Local filesystem example
- `local_store_iceberg.rs` - Iceberg table example
- `local_store_hudi.rs` - Hudi table example
- `local_store_lance.rs` - Lance table example
Run examples with:

```bash
cargo run --example s3_store
```

For detailed information on configuration options, refer to the object_store crate documentation.
This crate requires Rust 1.88 or later.
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
See LICENSE files for details.