Real-time event streaming platform. Streaming CDC, stream processing, low-latency serving, and Iceberg management.
Updated Sep 13, 2025 - Rust
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Implementing best practices for PySpark ETL jobs and applications.
A few projects related to Data Engineering, including Data Modeling, infrastructure setup on the cloud, Data Warehousing, and Data Lake development.
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Enterprise-grade and API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, prompt playground, and more!
A Clojure high performance data processing system
A blazingly fast general-purpose blockchain analytics engine specialized in systematic MEV detection
A simplified, lightweight ETL Framework based on Apache Spark
The agentic AI platform for enterprise. Built by data engineers for data engineers. Complete context engineering and LLM orchestration infrastructure. Run anywhere - local, cloud, or bare metal.
The Supabase of AI era. A modular, open-source backend for building AI-native software — designed for knowledge, not static data.
A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.
concurrent & fluent interface for (async) iterables
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)
A simple Spark-powered ETL framework that just works 🍺
Jayvee is a domain-specific language and runtime for automated processing of data pipelines
Regular practice in Data Science, Machine Learning, and Deep Learning, solving ML project problems and analytical issues to regularly build up my knowledge. The goal is to help learners with learning resources in the Data Science field.