A data migration tool for moving from a traditional database to a modern target database, chosen according to the project's structure and requirements. It is a lightweight prototype that uses an OpenAI API key to generate queries for the target database automatically, so migration takes minimal effort.

srinathmyana/AI_SQL_Migrator

AI SQL Migrator 🧠⚙️

A minimal, LLM-powered prototype inspired by Datafold’s Migration Agent (DMA). It translates SQL from one dialect to another, then uses DuckDB to check data parity between the legacy and translated query outputs.


🚀 Why this project?

This project is a hands-on demo inspired by the vision behind Datafold’s DMA.
I wanted to replicate a small, functional version of what Datafold is doing — SQL translation + data validation — using LLMs and lightweight tools.


🧹 Features

  • LLM-powered SQL translation using OpenAI
  • Runs SQL queries on DuckDB using CSV inputs
  • Compares outputs from legacy and translated queries
  • Generates detailed diff reports with timestamps
  • Clean project structure for modularity and clarity
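
The comparison feature can be illustrated with an order-insensitive check on two result sets. This is a minimal sketch, not the repository's `diff_checker.py`: the function name `compare_rows` is hypothetical, and it assumes each query result is available as a list of tuples (the shape a DB-API style `fetchall()` returns).

```python
from collections import Counter


def compare_rows(legacy_rows, translated_rows):
    """Compare two result sets as multisets of rows (hypothetical helper).

    Returns (missing, extra):
      missing -> rows produced by the legacy query only
      extra   -> rows produced by the translated query only
    Row order is ignored, but duplicate rows are counted.
    """
    legacy = Counter(legacy_rows)
    translated = Counter(translated_rows)
    # Counter subtraction keeps only positive counts, i.e. the surplus rows.
    # Sorting by repr keeps the output stable even when rows contain None.
    missing = sorted((legacy - translated).elements(), key=repr)
    extra = sorted((translated - legacy).elements(), key=repr)
    return missing, extra


if __name__ == "__main__":
    legacy = [(1, "a"), (2, "b")]
    translated = [(2, "b"), (1, "a")]
    missing, extra = compare_rows(legacy, translated)
    print("parity" if not missing and not extra else "mismatch")
```

Treating results as multisets avoids false mismatches when the translated dialect returns the same rows in a different order, at the cost of not detecting `ORDER BY` regressions.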

📁 Folder Structure

ai_sql_migrator/
│
├── data/
│   └── query.csv              ← Input data (acts as source table)
│
├── sql/
│   ├── legacy_query.sql       ← Original SQL (legacy dialect)
│   └── translated_query.sql   ← LLM-translated SQL (modern dialect)
│
├── outputs/
│   ├── diff_report.txt        ← Latest diff result
│   └── diff_reports/          ← All historical diff logs (timestamped)
│
├── .env                       ← Your OpenAI API key lives here
├── main.py                    ← Loads data into DuckDB
├── translate_sql.py           ← Translates legacy SQL using OpenAI
├── diff_checker.py            ← Compares outputs and generates reports
├── requirements.txt           ← Required libraries
└── README.md                  ← You're here :)


⚙️ Setup

  1. (Optional) Create and activate a virtual environment:
     python -m venv venv
     venv\Scripts\activate      # Windows
     source venv/bin/activate   # macOS/Linux
  2. Install dependencies:
     pip install -r requirements.txt
  3. Create a .env file in the project root and add your API key:
     OPENAI_API_KEY=your_openai_key_here
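
For readers curious how a script might pick up the key from `.env`, here is a minimal standard-library sketch. It is an assumption, not the repository's code: the project may well rely on a dedicated library listed in `requirements.txt`, and the helper name `load_env_file` is hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file (hypothetical helper).

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Values already set in the real environment are NOT
    overwritten. Returns a dict of the pairs found in the file.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        os.environ.setdefault(key, value)  # existing variables take precedence
        loaded[key] = value
    return loaded


if __name__ == "__main__":
    load_env_file()
    if not os.environ.get("OPENAI_API_KEY"):
        raise SystemExit("OPENAI_API_KEY is not set; add it to .env")
```

Keep `.env` out of version control so the key is never committed.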

▶️ How to Run

  1. Load data into DuckDB:
     python main.py
  2. Translate legacy SQL using OpenAI:
     python translate_sql.py
  3. Compare results and generate diff report:
     python diff_checker.py

✅ Sample Output

🧠 Translating SQL with OpenAI...
✅ Translation saved to sql/translated_query.sql

🔁 Running legacy SQL...
🌟 Running translated SQL...
🔍 Comparing results...
✅ Query outputs match! Data parity confirmed.
📄 Diff report saved to: outputs/diff_reports/diff_YYYY-MM-DD_HH-MM-SS.txt
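
The timestamped report path in the sample output follows the pattern `diff_YYYY-MM-DD_HH-MM-SS.txt`. Below is a minimal sketch of how such a report could be written, using only the standard library. The function name `write_diff_report`, its parameters, and the `-`/`+` line format are assumptions for illustration; the actual report layout produced by `diff_checker.py` is not shown in this README.

```python
from datetime import datetime
from pathlib import Path


def write_diff_report(missing, extra, out_dir="outputs/diff_reports", now=None):
    """Write a timestamped diff report and return its path (hypothetical helper).

    missing -> rows found only in the legacy query output
    extra   -> rows found only in the translated query output
    now     -> optional datetime, injectable so the filename is testable
    """
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the folder on first run
    path = out / f"diff_{stamp}.txt"

    if not missing and not extra:
        lines = ["Query outputs match. Data parity confirmed."]
    else:
        # '-' marks legacy-only rows, '+' marks translated-only rows.
        lines = [f"- {row!r}" for row in missing]
        lines += [f"+ {row!r}" for row in extra]

    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    report = write_diff_report([], [])
    print(f"Diff report saved to: {report}")
```

Including seconds in the filename keeps successive runs from overwriting each other, which is what makes the `diff_reports/` folder usable as a history.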

🙇 Why I Built This (For Datafold)

Hi — I’m Srinath, a Data Engineer deeply interested in automation and AI for infra.
I admire what Datafold is building, and this project is a way for me to say:

  • I understand the problem you're solving
  • I can build fast and stay focused on real impact
  • I'm genuinely interested in contributing to your mission

Would love to connect if this aligns with your team’s goals.

Let’s build something together 🚀

Srinath
[email protected]
