A minimal, LLM-powered prototype inspired by Datafold’s Migration Agent (DMA) — built to translate SQL dialects and validate data parity between outputs using DuckDB.
This project is a hands-on demo inspired by the vision behind Datafold’s DMA.
I wanted to replicate a small, functional version of what Datafold is doing — SQL translation + data validation — using LLMs and lightweight tools.
- LLM-powered SQL translation using OpenAI
- Runs SQL queries on DuckDB using CSV inputs
- Compares outputs from legacy and translated queries
- Generates detailed diff reports with timestamps
- Clean project structure for modularity and clarity
ai_sql_migrator/
│
├—— data/
│ ├— query.csv ← Input data (acts as source table)
│
├—— sql/
│ ├— legacy_query.sql ← Original SQL (legacy dialect)
│ └— translated_query.sql ← LLM-translated SQL (modern dialect)
│
├—— outputs/
│ ├— diff_report.txt ← Latest diff result
│ └— diff_reports/ ← All historical diff logs (timestamped)
│
├— .env ← Your OpenAI API key lives here
├— main.py ← Loads data into DuckDB
├— translate_sql.py ← Translates legacy SQL using OpenAI
├— diff_checker.py ← Compares outputs and generates reports
├— requirements.txt ← Required libraries
└— README.md ← You're here :)
- (Optional) Create a virtual environment:
python -m venv venv
venv\Scripts\activate # for Windows
source venv/bin/activate # for Mac/Linux
- Install dependencies:
pip install -r requirements.txt
- Create
.envfile and add your API key:
OPENAI_API_KEY=your_openai_key_here
- Load data into DuckDB:
python main.py
- Translate legacy SQL using OpenAI:
python translate_sql.py
- Compare results and generate diff report:
python diff_checker.py
🧠 Translating SQL with OpenAI...
✅ Translation saved to sql/translated_query.sql
🔁 Running legacy SQL...
🌟 Running translated SQL...
🔍 Comparing results...
✅ Query outputs match! Data parity confirmed.
📄 Diff report saved to: outputs/diff_reports/diff_YYYY-MM-DD_HH-MM-SS.txt
Hi — I’m Srinath, a Data Engineer deeply interested in automation and AI for infra.
I admire what Datafold is building, and this project is a way for me to say:
- I understand the problem you're solving
- I can build fast and stay focused on real impact
- I'm genuinely interested in contributing to your mission
Would love to connect if this aligns with your team’s goals.
Let’s build something together 🚀
—
Srinath
[email protected]