adobe/lake-pulse

Lake Pulse

License: MIT or Apache-2.0

A Rust library for analyzing data lake table health — checking the pulse — across multiple formats (Delta Lake, Apache Iceberg, Apache Hudi, Lance) and storage providers (AWS S3, Azure Data Lake, GCS, Local).

Overview

Lake Pulse provides comprehensive health metrics for your data lake tables, including:

  • File organization and compaction opportunities
  • Metadata analysis and schema evolution
  • Partition statistics
  • Time travel/snapshot metrics
  • Storage efficiency insights
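
As a rough illustration of what a "compaction opportunity" can mean, the sketch below counts the files that fall under a size threshold. It is not Lake Pulse's implementation: `SmallFileSummary`, `summarize_small_files`, `small_file_ratio`, and the use of a single byte threshold are all assumptions made for this example.

```rust
/// Hypothetical summary of the file sizes in one table or partition.
#[derive(Debug, PartialEq)]
pub struct SmallFileSummary {
    pub total_files: usize,
    pub small_files: usize,
    pub small_bytes: u64,
}

/// Counts the files whose size is strictly below `threshold_bytes`
/// and sums the bytes they hold.
pub fn summarize_small_files(sizes: &[u64], threshold_bytes: u64) -> SmallFileSummary {
    let mut small_files = 0;
    let mut small_bytes = 0u64;
    for &size in sizes {
        if size < threshold_bytes {
            small_files += 1;
            small_bytes += size;
        }
    }
    SmallFileSummary {
        total_files: sizes.len(),
        small_files,
        small_bytes,
    }
}

/// Fraction of files that are small; 0.0 for an empty listing.
pub fn small_file_ratio(summary: &SmallFileSummary) -> f64 {
    if summary.total_files == 0 {
        0.0
    } else {
        summary.small_files as f64 / summary.total_files as f64
    }
}
```

A high ratio of small files is a common sign that a table would benefit from compaction. A suitable threshold depends on the query engine and workload.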

Quick Start

Basic Example - Analyzing a Delta Lake table on AWS S3

use lake_pulse::{Analyzer, StorageConfig};

#[tokio::main]
async fn main() {
    let storage_config = StorageConfig::aws()
        .with_option("bucket", "my-bucket-1234")
        .with_option("region", "us-east-1")
        .with_option("access_key_id", "the_access_key_id")
        .with_option("secret_access_key", "the_secret_access_key")
        .with_option("session_token", "session_token_if_needed");
    
    let analyzer = Analyzer::builder(storage_config).build().await.unwrap();

    // Generate report
    let report = analyzer.analyze("my/table/path").await.unwrap();

    // Print pretty report
    println!("{}", report);
}

Supported Table Formats

  • Delta Lake - Full support for transaction logs, deletion vectors, and Delta-specific metrics
  • Apache Iceberg - Metadata analysis, snapshot management, and Iceberg-specific features
  • Apache Hudi - Basic support for Hudi table structure analysis and metrics
  • Lance - Modern columnar format with vector search capabilities
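
Each of these formats leaves a recognizable marker under the table root: `_delta_log` for Delta Lake, `metadata` for Iceberg, `.hoodie` for Hudi, and `_versions` for Lance. The sketch below guesses a format from a directory listing. How Lake Pulse itself detects the format is not described here, so `TableFormat` and `guess_format` are hypothetical names used only for illustration.

```rust
/// The table formats named in this README.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum TableFormat {
    Delta,
    Iceberg,
    Hudi,
    Lance,
}

/// Guesses the format from the names of the entries directly under the
/// table root. A trailing '/' on an entry name is ignored.
pub fn guess_format(root_entries: &[&str]) -> Option<TableFormat> {
    let has = |name: &str| {
        root_entries
            .iter()
            .any(|entry| entry.trim_end_matches('/') == name)
    };

    if has("_delta_log") {
        Some(TableFormat::Delta)
    } else if has(".hoodie") {
        Some(TableFormat::Hudi)
    } else if has("_versions") {
        Some(TableFormat::Lance)
    } else if has("metadata") {
        // Checked last because "metadata" is the least distinctive name.
        Some(TableFormat::Iceberg)
    } else {
        None
    }
}
```

This is a heuristic only. A directory named `metadata` does not prove that a table is Iceberg, so a real check would also inspect the files inside it.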

Storage Configuration

Lake Pulse uses the object_store crate for cloud storage access. Configuration options are passed through to the underlying storage provider.

AWS S3 Configuration Options

Common options for S3 (see object_store AWS documentation):

  • bucket - S3 bucket name
  • region - AWS region (e.g., "us-east-1")
  • access_key_id - AWS access key ID
  • secret_access_key - AWS secret access key
  • session_token - Optional session token for temporary credentials
  • endpoint - Optional custom endpoint URL
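
Hard-coding credentials, as the Quick Start does for brevity, is rarely what you want in practice. The sketch below collects the S3 options from the conventional AWS environment variables instead. It uses only the standard library; `S3_OPTION_SOURCES`, `s3_options_from`, and `s3_options_from_env` are hypothetical helpers and not part of Lake Pulse, and treating the region and key pair as required is an assumption of this example.

```rust
use std::env;

/// (option key from the list above, conventional AWS environment variable, required?)
const S3_OPTION_SOURCES: [(&str, &str, bool); 4] = [
    ("region", "AWS_REGION", true),
    ("access_key_id", "AWS_ACCESS_KEY_ID", true),
    ("secret_access_key", "AWS_SECRET_ACCESS_KEY", true),
    ("session_token", "AWS_SESSION_TOKEN", false),
];

/// Collects S3 options through `lookup`. On failure, returns the name of
/// the first required variable that `lookup` could not resolve.
pub fn s3_options_from<F>(lookup: F) -> Result<Vec<(&'static str, String)>, &'static str>
where
    F: Fn(&str) -> Option<String>,
{
    let mut options = Vec::new();
    for (key, var, required) in S3_OPTION_SOURCES {
        match lookup(var) {
            Some(value) => options.push((key, value)),
            None if required => return Err(var),
            None => {} // optional variable not set: skip it
        }
    }
    Ok(options)
}

/// Convenience wrapper that reads from the process environment.
pub fn s3_options_from_env() -> Result<Vec<(&'static str, String)>, &'static str> {
    s3_options_from(|var| env::var(var).ok())
}
```

Each returned pair can then be passed to `with_option` on the `StorageConfig` built in the Quick Start. Taking the lookup as a closure keeps the helper easy to test without touching the real environment.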

Azure Configuration Options

Common options for Azure (see object_store Azure documentation):

  • container - Azure container name
  • account_name - Storage account name
  • tenant_id - Azure tenant ID
  • client_id - Service principal client ID
  • client_secret - Service principal client secret

GCP Configuration Options

Common options for GCP (see object_store GCP documentation):

  • bucket - GCS bucket name
  • service_account_key - Path to service account JSON key file

Local Filesystem

let storage_config = StorageConfig::local();
let analyzer = Analyzer::builder(storage_config).build().await.unwrap();
let report = analyzer.analyze("/path/to/table").await.unwrap();

Examples

See the examples/ directory for more detailed usage examples:

  • s3_store.rs - AWS S3 example
  • adl_store.rs - Azure Data Lake example
  • local_store.rs - Local filesystem example
  • local_store_iceberg.rs - Iceberg table example
  • local_store_hudi.rs - Hudi table example
  • local_store_lance.rs - Lance table example

Run examples with:

cargo run --example s3_store

Documentation

For detailed information on configuration options, refer to the object_store crate documentation for the AWS, Azure, and GCP providers.

Minimum Supported Rust Version (MSRV)

This crate requires Rust 1.88 or later.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

Lake Pulse is available under either the MIT or the Apache-2.0 license. See LICENSE-MIT and LICENSE-APACHE for details.
