etl-pipeline

Here are 1,631 public repositories matching this topic...

risingwavelabs / risingwave

Real-time event streaming platform. Streaming CDC, stream processing, low-latency serving, and Iceberg management.

rust database kafka postgresql stream-processing data-engineering materialized-view etl-pipeline apache-iceberg elt-pipeline

Updated Sep 13, 2025
Rust

Zipstack / unstract

No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents

unstructured-data etl-pipeline llm-platform

Updated Sep 12, 2025
Python

apache / streampark

Make stream processing easier! Easy-to-use streaming application development framework and operation platform.

streaming apache easy-to-use etl-pipeline development-framework streampark operation-platform

Updated Sep 10, 2025
Java

apache / hamilton

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

python data-science machine-learning etl pandas orchestration data-engineering data-analysis software-engineering feature-engineering dataframe hacktoberfest dag lineage etl-framework etl-pipeline rag mlops llmops

Updated Sep 12, 2025
Jupyter Notebook

AlexIoannides / pyspark-example-project

Implementing best practices for PySpark ETL jobs and applications.

python data-science spark etl pyspark data-engineering etl-pipeline etl-job

Updated Jan 1, 2023
Python

san089 / Udacity-Data-Engineering-Projects

A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.

Updated Aug 26, 2022
Python

san089 / goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Updated Mar 9, 2020
Python

Open-Source-Legal / OpenContracts

Enterprise-grade and API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, prompt playground, and more!

agent etl unstructured-data etl-pipeline vector-database llm prompt-engineering agentic-ai

Updated Sep 13, 2025
TypeScript

techascent / tech.ml.dataset

A Clojure high performance data processing system

java machine-learning clojure csv xlsx datascience dataset dataframe etl-pipeline

Updated Sep 12, 2025
Clojure

SorellaLabs / brontes

A blazingly fast general-purpose blockchain analytics engine specialized in systematic MEV detection

rust ethereum evm etl-pipeline mev

Updated Jul 28, 2025
Rust

YotpoLtd / metorikku

A simplified, lightweight ETL Framework based on Apache Spark

scala sql big-data spark etl distributed-computing etl-framework etl-pipeline

Updated Jan 24, 2024
Scala

trustgraph-ai / trustgraph

The agentic AI platform for enterprise. Built by data engineers for data engineers. Complete context engineering and LLM orchestration infrastructure. Run anywhere - local, cloud, or bare metal.

data context data-engineering data-extraction data-sovereignty model-serving etl-pipeline ai-native llm-deployment context-management graphrag llm-orchestration agentic-rag agentic-ai agentic-ai-development knowledge-core trustgraph agentic-graphrag context-engineering

Updated Sep 11, 2025
Python

unbody-io / unbody

The Supabase of AI era. A modular, open-source backend for building AI-native software — designed for knowledge, not static data.

backend chatbot developer-tools knowledge-base data-ingestion etl-pipeline rag data-enhancement vector-database llm ai-native generative-ai agentic-ai supabase-alternative

Updated Jun 5, 2025
TypeScript

DataWithBaraa / sql-data-warehouse-project

A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.

Updated Apr 23, 2025
TSQL

ebonnal / streamable

concurrent & fluent interface for (async) iterables

Updated Sep 13, 2025
Python

airscholar / e2e-data-engineering

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.

docker big-data cassandra apache-spark data-storage postgresql data-engineering apache-kafka data-processing data-pipeline real-time-analytics containerization apache-zookeeper apache-airflow etl-pipeline

Updated Feb 14, 2025
Python

jitsucom / bulker

Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)

pipeline etl data-engineering ingestion datawarehouse etl-pipeline

Updated Sep 10, 2025
Go

SETL-Framework / setl

A simple Spark-powered ETL framework that just works 🍺

data-science machine-learning framework scala big-data spark pipeline etl data-transformation data-engineering dataset data-analysis modularization setl etl-pipeline

Updated Jul 29, 2025
Scala

jvalue / jayvee

Jayvee is a domain-specific language and runtime for automated processing of data pipelines

data-science typescript data-engineering domain-specific-language data-pipeline etl-pipeline

Updated Sep 11, 2025
TypeScript

imsanjoykb / Data-Science-Regular-Bootcamp

Regular practice on Data Science, Machine Learning, and Deep Learning, solving ML project problems and analytical issues to regularly boost my knowledge. The goal is to help learners with learning resources in the Data Science field.

Updated Jan 29, 2023
Jupyter Notebook
