OLake

OLake is a high-performance, open-source data ingestion engine for replicating databases, S3, and Kafka into Apache Iceberg (or plain Parquet).
Built for scalable, real-time pipelines, OLake provides a simple web UI and a CLI for ingesting data into Iceberg tables that are free of vendor lock-in and can be read by any Iceberg-compatible query engine or warehouse.

Read the docs and benchmarks at olake.io/docs. Join our active community on Slack.

OLake — Super-fast Sync to Apache Iceberg

OLake supports replication from transactional databases such as PostgreSQL, MySQL, MongoDB, Oracle, DB2, and MSSQL, event-streaming systems such as Apache Kafka, and object stores such as S3 into open data lakehouse formats such as Apache Iceberg or plain Parquet, delivering blazing-fast performance with minimal infrastructure cost.

🚀 Why OLake?

🧠 Smart sync: Full + CDC replication with automatic schema discovery & schema evolution
⚡ High throughput: 580K rows per second (RPS) for Postgres and 338K RPS for MySQL on full load
➡️ Exactly-once delivery & Arrow writes: accuracy with speed
💾 Iceberg-native: Supports Glue, Hive, JDBC, REST catalogs
🖥️ Self-serve UI: Deploy via Docker Compose and sync in minutes
💸 Infra-light: No Spark, no Flink, no Kafka, no Debezium
🗜️ Iceberg Table Optimization (Coming soon): Compaction tailored for CDC ingestion

📊 Benchmarks & possible connections

Full Load

Source → Destination	Throughput (Full Load)	Relative Performance (Full Load)	Full Report
Postgres → Iceberg	580,113 RPS	12.5× faster than Fivetran	Full Report
MySQL → Iceberg	338,005 RPS	2.83× faster than Fivetran	Full Report
MongoDB → Iceberg	37,879 RPS	-	Full Report
Oracle → Iceberg	526,337 RPS	-	Full Report
Kafka → Iceberg	154,320 RPS (Bounded Incremental)	1.8× faster than Flink	Full Report

CDC

Source → Destination	Throughput (CDC)	Relative Performance (CDC)	Full Report
Postgres → Iceberg	41,390 RPS	1.5× faster than Fivetran	Full Report
MySQL → Iceberg	51,867 RPS	1.85× faster than Fivetran	Full Report
MongoDB → Iceberg	-	-	Full Report
Oracle → Iceberg	-	-	Full Report

*These are preliminary results. Fully reproducible benchmark scores will be published soon.
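
The relative-performance column can be read as a ratio: dividing OLake's throughput by the quoted speedup gives the throughput implied for the comparison tool. The sketch below does that arithmetic with the figures from the tables above. It assumes "N× faster" means N times the baseline throughput; the names `BENCHMARKS` and `implied_baseline_rps` are illustrative, and the results are derived estimates, not measured or published numbers.

```python
# Throughput (RPS) and speedup factors copied from the benchmark tables.
# Only rows that quote a speedup are included.
BENCHMARKS = {
    ("Postgres", "full load"): (580_113, 12.5),
    ("MySQL", "full load"): (338_005, 2.83),
    ("Postgres", "CDC"): (41_390, 1.5),
    ("MySQL", "CDC"): (51_867, 1.85),
}


def implied_baseline_rps(olake_rps: int, speedup: float) -> float:
    """Return the baseline throughput implied by 'speedup x faster'.

    Assumption: 'N x faster' means olake_rps == N * baseline_rps.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return olake_rps / speedup


def report() -> None:
    """Print the implied baseline for every benchmark row."""
    for (source, mode), (rps, speedup) in BENCHMARKS.items():
        baseline = implied_baseline_rps(rps, speedup)
        print(f"{source} {mode}: OLake {rps:,} RPS, implied baseline ~{baseline:,.0f} RPS")
```

Calling `report()` prints one line per row. Because the published results are marked preliminary, treat the implied baselines as rough orientation only.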

🔧 Supported Sources and Destinations

Sources (Databases and S3)

Source	Full Load	CDC	Incremental	Notes	Documentation
PostgreSQL	✅	✅ `pgoutput`	✅	`wal2json` deprecated	Postgres Docs
MySQL	✅	✅	✅	Binlog-based CDC	MySQL Docs
MongoDB	✅	✅	✅	Oplog-based CDC	MongoDB Docs
Oracle	✅	WIP	✅	JDBC-based Full Load & Incremental	Oracle Docs
DB2	✅	-	✅	JDBC-based Full Load & Incremental	DB2 Docs
MSSQL	✅	✅	✅	Full Load, CDC & Incremental	MSSQL Docs
S3	✅	-	✅	Ingests from Amazon S3 or S3-compatible stores (MinIO, LocalStack)	S3 Docs

Sources (Kafka)

Source	Bounded Incremental	Notes	Documentation
Kafka	✅	Latest offset bounded incremental sync	Kafka Docs

Destinations

Destination	Format	Supported Catalogs
Iceberg	✅	Glue, Hive, JDBC, REST (Nessie, Polaris, Unity, Lakekeeper, AWS S3 Tables)
Parquet	✅	Filesystem
Other formats	🔜	Planned: Delta Lake, Hudi

Writer Docs

Apache Iceberg Docs
1. Catalogs
2. Azure ADLS Gen2
3. Google Cloud Storage (GCS)
4. MinIO (local)
5. Iceberg Table Management
  1. S3 Tables Supported
Parquet Writer

🧪 Quickstart (UI + Docker)

OLake UI is a web-based interface for managing OLake jobs, sources, destinations, and configurations. You can run the entire OLake stack (UI, Backend, and all dependencies) using Docker Compose. This is the recommended way to get started. Run the UI, connect your source DB, and start syncing in minutes.

curl -sSL https://raw.githubusercontent.com/datazip-inc/olake-ui/master/docker-compose.yml | docker compose -f - up -d

Access the UI:
OLake UI: http://localhost:8000
Log in with the default credentials: admin / password.

A detailed getting-started guide for the OLake UI can be found here.
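
After `docker compose ... up -d` returns, the containers may still need some time before the UI answers on http://localhost:8000. A minimal sketch of a readiness check for scripts, using only the Python standard library, is shown below. The function name `wait_for_ui` and the timeout values are assumptions for illustration and are not part of OLake.

```python
import time
import urllib.error
import urllib.request

# Talk to the local UI directly, ignoring any HTTP proxy set in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_ui(url: str = "http://localhost:8000",
                timeout_s: float = 120.0,
                interval_s: float = 2.0) -> bool:
    """Poll `url` until it answers over HTTP or `timeout_s` elapses.

    Returns True once the server responds with a non-5xx status,
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with _OPENER.open(url, timeout=5) as resp:
                if resp.status < 500:
                    return True
        except urllib.error.HTTPError as err:
            # The server answered; a 4xx still means it is up.
            if err.code < 500:
                return True
        except (urllib.error.URLError, OSError):
            # Connection refused or reset: not accepting requests yet.
            pass
        time.sleep(interval_s)
    return False
```

A script could call `wait_for_ui()` right after the Compose command and only print the login hint once it returns `True`.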

Creating Your First Job

With the UI running, you can create a data pipeline in a few steps:

Create a Job: Navigate to the Jobs tab and click Create Job.
Configure Source: Set up your source connection (e.g., PostgreSQL, MySQL, MongoDB).
Configure Destination: Set up your destination (e.g., Apache Iceberg with a Glue, REST, Hive, or JDBC catalog).
Select Streams: Choose which tables to sync and configure their sync mode (CDC or Full Refresh).
Configure & Run: Give your job a name, set a schedule, and click Create Job to finish.

For a detailed walkthrough, refer to the Jobs documentation.

🛠️ CLI Usage (Advanced)

For advanced users and automation, OLake's core logic is exposed via a powerful CLI. The core framework handles state management, configuration validation, logging, and type detection. It interacts with drivers using four main commands:

spec: Returns a renderable JSON Schema for a connector's configuration.
check: Validates connection configurations for sources and destinations.
discover: Returns all available streams (e.g., tables) and their schemas from a source.
sync: Executes the data replication job, extracting from the source and writing to the destination.

Find out more about the CLI here.
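
For automation, the four commands above can be wrapped in a small helper that validates the command name and assembles an argument list for `subprocess.run`. This is only a sketch: the helper `build_argv`, the binary path `./olake-driver`, and the `--config` flag in the usage note are hypothetical placeholders, so check the CLI documentation for the real invocation and flags.

```python
from typing import Dict, List, Optional

# The four commands described above, with their documented purpose.
COMMANDS: Dict[str, str] = {
    "spec": "return a JSON Schema for a connector's configuration",
    "check": "validate source and destination connection configurations",
    "discover": "list available streams and their schemas",
    "sync": "run the replication job from source to destination",
}


def build_argv(binary: str, command: str,
               options: Optional[Dict[str, str]] = None) -> List[str]:
    """Assemble an argument list of the form [binary, command, flag, value, ...].

    `binary` and every flag in `options` are supplied by the caller;
    nothing here encodes real OLake flag names.
    """
    if command not in COMMANDS:
        raise ValueError(f"unknown command {command!r}; expected one of {sorted(COMMANDS)}")
    argv = [binary, command]
    for flag, value in (options or {}).items():
        argv.extend([flag, value])
    return argv
```

With the placeholder names, `build_argv("./olake-driver", "discover", {"--config": "source.json"})` returns `["./olake-driver", "discover", "--config", "source.json"]`, which can be passed to `subprocess.run` once the real binary and flags are substituted.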

Install OLake

Below are other ways to run OLake:

Playground

🌍 Use Cases

✅ Migrate from OLTP to Iceberg without Spark or Flink
✅ Enable BI over fresh CDC data using Athena, StarRocks, Trino, Presto, Dremio, Databricks, Snowflake and more!
✅ Build a near real-time data lakehouse on cost-efficient cloud object stores
✅ Move away from warehouses and tools with vendor lock-in to an open data lakehouse
✅ Keep a single copy of data for both analytics & machine learning

🧭 Roadmap Highlights

Oracle Full Load Support
Oracle Incremental
Filters for Full Load and Incremental
Compaction & other table optimisations (In-progress)
Iceberg V3 Support

📌 Check out our GitHub Project Roadmap and the Upcoming OLake Roadmap to track what's next. If you have ideas or feedback, please share them in our GitHub Discussions or by opening an issue.

🤝 Contributing

We ❤️ contributions, big or small!

Check out our Bounty Program. A huge thanks to all our amazing contributors!

To contribute to the OLake core, see CONTRIBUTING.md.
To contribute to the UI, visit the OLake UI Repository.
To contribute to our website and documentation, visit the OLake Docs Repository.

