DataToVisual

A data-to-visualization system where users ask business questions in plain English and instantly see the answer as a chart.

Overview

Users ask questions like "Show total sales by category" and get an instant chart visualization.

Screenshots: bar chart, line chart, pie chart.

The system uses:

MCP (Model Context Protocol) for AI-tool interaction
Gemini AI for natural language understanding
Chart.js for visualization

Architecture

Sequence Diagram

Why HTTP/SSE?

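Docker-friendly: each service runs in its own container and they talk over the network (the stdio transport would require the backend to spawn the MCP server as a local subprocess)
Scalable and debuggable: services can be scaled, restarted, and inspected independently (e.g. with MCP Inspector)
Production-ready: plain HTTP works with standard proxies, load balancers, and logging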

Tech Stack

Component	Technology	Purpose
Backend	FastAPI	Async API server
MCP Server	Starlette + SSE	Tool execution
Database	SQLAlchemy	Multi-DB support
AI	Gemini API	Natural language processing
Frontend	Chart.js	Chart rendering
Container	Docker Compose	Deployment

Setup

Prerequisites

Docker & Docker Compose
Gemini API key (free tier available from Google AI Studio)

Quick Start

# 1. Clone
git clone <repo-url>
cd DataToVisual

# 2. Configure
cp .env.example .env
# Edit .env and add GEMINI_API_KEY

# 3. Start all services
make up

# 4. Seed database
make seed

# 5. Open frontend
open http://localhost:5500   # macOS; on Linux use xdg-open, or open the URL in a browser

Run Each Service

Database (PostgreSQL)

# Start database only
docker compose up -d postgres

# Check database is ready
docker compose exec postgres pg_isready

# Connect to database
docker compose exec postgres psql -U postgres -d datatovisual

Backend (FastAPI)

# Start backend (requires database)
docker compose up -d postgres backend

# View backend logs
make logs s=backend

# Run tests
make test

MCP Server

# Start MCP server (requires database)
docker compose up -d postgres mcp-server

# View MCP server logs
make logs s=mcp-server

Frontend (Chart.js)

# Start frontend
docker compose up -d frontend

# Or run locally without Docker
cd frontend && python -m http.server 5500

URLs

Service	URL
Frontend	http://localhost:5500
API	http://localhost:8000
API Docs	http://localhost:8000/docs
MCP Server	http://localhost:3001

MCP Inspector

Debug and test MCP server tools, resources, and prompts:

# Launch MCP Inspector (npx downloads and runs it; no install needed)
npx @modelcontextprotocol/inspector

# In the Inspector, select the SSE transport and connect to:
# http://localhost:3001/sse

Features:

List and test tools (simple_query, advanced_query)
Read resources (schema://database)
Get prompts (data_analyst)
View request/response payloads

Make Commands

Command	Description
`make up`	Start all services
`make down`	Stop all services
`make rebuild`	Rebuild and restart
`make test`	Run all tests
`make test-unit`	Run unit tests only
`make test-int`	Run integration tests only
`make seed`	Seed the database
`make logs s=backend`	Show logs for a service
`make status`	Show service status
`make clean`	Remove containers and volumes

Usage

Example Questions

Question	Chart Type
Compare total sales in 2026 vs 2022	Bar
How are our sales trending over the last 5 years?	Line
What are our top 5 selling products?	Bar
Show sales by product category	Bar
What is the average order value by category?	Bar
Show monthly revenue for 2025	Line
Compare Q1 vs Q4 sales in 2024	Bar
Show total quantity sold per product	Bar
What is total revenue by product name?	Bar
Show sales distribution by category	Pie

API Reference

POST /api/v1/query

Request:

{
  "question": "Show total sales by category"
}

Response:

{
  "question": "Show total sales by category",
  "chart_type": "bar",
  "rows": [
    { "label": "Electronics", "value": 17059236.4 },
    { "label": "Home", "value": 3001309.34 }
  ]
}
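The response's `rows` map directly onto a Chart.js config: labels become `data.labels` and values become a single dataset. The sketch below shows one possible mapping. It is written in Python for illustration only; the project's frontend (`frontend/app.js`) does its own rendering in JavaScript and may structure this differently. The function name `to_chartjs_config` is hypothetical.

```python
from typing import Any


def to_chartjs_config(response: dict[str, Any]) -> dict[str, Any]:
    """Reshape a /api/v1/query response into a Chart.js config object.

    Hypothetical helper: the real frontend does this in app.js.
    """
    rows = response["rows"]
    return {
        "type": response["chart_type"],  # "bar", "line" or "pie"
        "data": {
            "labels": [row["label"] for row in rows],
            "datasets": [
                {
                    # Use the question as the dataset legend text.
                    "label": response["question"],
                    "data": [row["value"] for row in rows],
                }
            ],
        },
    }
```

Passing the example response above yields labels `["Electronics", "Home"]` and a single dataset holding the two values.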

MCP Tools

simple_query

Structured query for single-table operations. SQL is built from validated parameters.

{
  "table": "sales",
  "label_column": "product_id",
  "value_column": "total_amount",
  "aggregation": "SUM",
  "filters": [{"column": "sale_date", "operator": ">=", "value": "2024-01-01"}],
  "order_by": "value_desc",
  "limit": 10,
  "chart_type": "bar"
}
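To make "SQL is built from validated parameters" concrete, here is a minimal sketch of such a builder. It assumes a hypothetical column whitelist and hypothetical ordering names besides `value_desc`. Identifiers are only ever taken from the whitelist, and filter values are passed as named bind parameters (the `:name` style accepted by SQLAlchemy's `text()`), never spliced into the string. It is not the project's `sql_builder.py`.

```python
from typing import Any

# Hypothetical whitelist of tables and their queryable columns.
ALLOWED_COLUMNS = {
    "sales": {"product_id", "quantity", "total_amount", "sale_date"},
    "products": {"id", "name", "category", "price"},
}
AGGREGATIONS = {"SUM", "COUNT", "AVG", "MIN", "MAX"}
OPERATORS = {"=", "!=", ">", ">=", "<", "<="}
# Only "value_desc" appears in the docs; the other names are assumptions.
ORDERINGS = {"value_desc": "value DESC", "value_asc": "value ASC", "label_asc": "label ASC"}
MAX_ROWS = 1000


def build_simple_query(params: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return (sql, bind_params) built only from whitelisted identifiers."""
    table = params["table"]
    if table not in ALLOWED_COLUMNS:
        raise ValueError(f"table not allowed: {table}")
    columns = ALLOWED_COLUMNS[table]
    label, value = params["label_column"], params["value_column"]
    for col in (label, value):
        if col not in columns:
            raise ValueError(f"column not allowed: {col}")
    agg = params["aggregation"].upper()
    if agg not in AGGREGATIONS:
        raise ValueError(f"aggregation not allowed: {agg}")

    sql = f"SELECT {label} AS label, {agg}({value}) AS value FROM {table}"
    binds: dict[str, Any] = {}
    clauses = []
    for i, f in enumerate(params.get("filters", [])):
        if f["column"] not in columns or f["operator"] not in OPERATORS:
            raise ValueError(f"invalid filter: {f}")
        clauses.append(f"{f['column']} {f['operator']} :p{i}")
        binds[f"p{i}"] = f["value"]  # value is bound, never interpolated
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += f" GROUP BY {label}"

    order = params.get("order_by")
    if order is not None:
        if order not in ORDERINGS:
            raise ValueError(f"order_by not allowed: {order}")
        sql += f" ORDER BY {ORDERINGS[order]}"
    # Clamp the limit into [1, MAX_ROWS].
    limit = min(max(int(params.get("limit", MAX_ROWS)), 1), MAX_ROWS)
    sql += f" LIMIT {limit}"
    return sql, binds
```

For the example parameters above it returns `SELECT product_id AS label, SUM(total_amount) AS value FROM sales WHERE sale_date >= :p0 GROUP BY product_id ORDER BY value DESC LIMIT 10` with binds `{"p0": "2024-01-01"}`.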

advanced_query

Raw SQL for complex queries (JOINs, subqueries). Includes SQL sanitization.

{
  "sql": "SELECT p.category AS label, SUM(s.total_amount) AS value FROM sales s JOIN products p ON s.product_id = p.id GROUP BY p.category",
  "chart_type": "bar"
}
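Both tools are invoked the same way over MCP: the client sends a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. MCP client libraries normally build this message for you; the hypothetical helper below only shows its shape.

```python
import json
from itertools import count
from typing import Any

# JSON-RPC request ids must be unique within a session.
_request_ids = count(1)


def tools_call_message(name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request (hypothetical helper)."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )
```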

Database

Schema

Multi-Database Support

Database	DATABASE_TYPE	DATABASE_URL
PostgreSQL	`postgresql`	`postgresql://user:pass@host/db`
MySQL	`mysql`	`mysql://user:pass@host/db`
SQLite	`sqlite`	`sqlite:///path/to/file.db`
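Because `DATABASE_TYPE` and `DATABASE_URL` are set separately, they can drift apart. A hypothetical startup check (not part of the documented project) could compare the type with the URL's dialect. SQLAlchemy URLs use the form `dialect+driver://user:pass@host/db`, so the sketch ignores any `+driver` suffix.

```python
from urllib.parse import urlsplit

# The DATABASE_TYPE values listed in the table above.
SUPPORTED_TYPES = {"postgresql", "mysql", "sqlite"}


def check_database_config(db_type: str, url: str) -> str:
    """Raise if DATABASE_TYPE and DATABASE_URL disagree; return the dialect."""
    if db_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported DATABASE_TYPE: {db_type}")
    scheme = urlsplit(url).scheme        # e.g. "postgresql+asyncpg"
    dialect = scheme.split("+", 1)[0]    # drop the optional driver part
    if dialect != db_type:
        raise ValueError(f"DATABASE_URL dialect {dialect!r} != DATABASE_TYPE {db_type!r}")
    return dialect
```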

Design Decisions

1. Two-Tool Design (simple_query + advanced_query)

Tool	Use Case	Safety
`simple_query`	Single-table aggregations	SQL built from validated structure
`advanced_query`	JOINs, complex queries	Raw SQL with sanitization

Why only 2 tools?

Approach	Pros	Cons
1 tool (raw SQL only)	Maximum flexibility	High security risk; AI more prone to SQL syntax errors
2 tools (our choice)	Balance of safety + flexibility	AI must choose correctly
Many tools (per operation)	Very safe, predictable	Limited queries, complex tool selection

Trade-offs of 2-tool approach:

Trade-off	Our Choice	Alternative
Safety vs Flexibility	Validated structured queries by default (simple), raw SQL only when needed (advanced)	All raw SQL: fully flexible but riskier
AI Complexity	AI chooses between 2 tools	Many tools = harder for AI to choose
Query Coverage	Covers most business questions	Single tool = limited or unsafe
Maintenance	2 tools to maintain	Many tools = more code to maintain

When AI chooses which tool:

Question Type	Tool	Why
"Count products per category"	`simple_query`	Single table, basic aggregation
"Total sales by category"	`advanced_query`	Needs JOIN (sales + products)
"Top 5 products by revenue"	`advanced_query`	Needs JOIN (sales + products) to show product names
"Average price per category"	`simple_query`	Single table (products)

2. SQL Safety Layers

Layer	Protection
Keyword blocking	DROP, DELETE, UPDATE, INSERT, ALTER, TRUNCATE
Pattern blocking	`;` (multi-statement), `--` (comments), `UNION`
Row limit	Auto-adds `LIMIT 1000`
Timeout	30-second query timeout
Table whitelist	Only allowed tables in `simple_query`
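The keyword, pattern, and row-limit layers can be sketched as a single function. This is an illustrative version of the table above, not the code in `sql_builder.py`. Text matching like this is deliberately coarse (a string literal containing `delete` is also rejected), which is why the timeout and whitelist layers still matter. Capping an explicit `LIMIT` above 1000 is an assumption beyond the table.

```python
import re

# Mirrors the "SQL Safety Layers" table (illustrative, not the project's code).
BLOCKED_KEYWORDS = ("DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "TRUNCATE", "UNION")
BLOCKED_PATTERNS = {";": "multiple statements", "--": "SQL comments"}
MAX_ROWS = 1000
_TRAILING_LIMIT = re.compile(r"\bLIMIT\s+(\d+)\s*$", re.IGNORECASE)


def sanitize_sql(sql: str) -> str:
    """Reject unsafe SQL; otherwise return it with a row limit enforced."""
    s = sql.strip()
    for pattern, reason in BLOCKED_PATTERNS.items():
        if pattern in s:
            raise ValueError(f"blocked pattern {pattern!r} ({reason})")
    for kw in BLOCKED_KEYWORDS:
        # \b keeps identifiers such as "updated_at" from matching UPDATE
        if re.search(rf"\b{kw}\b", s, re.IGNORECASE):
            raise ValueError(f"blocked keyword: {kw}")
    match = _TRAILING_LIMIT.search(s)
    if match is None:
        s = f"{s} LIMIT {MAX_ROWS}"                      # add a limit
    elif int(match.group(1)) > MAX_ROWS:
        s = s[: match.start()] + f"LIMIT {MAX_ROWS}"     # cap a larger one
    return s
```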

3. Dynamic Schema Discovery

The schema is read from the database at runtime and exposed as an MCP resource (`schema://database`), so it is not hardcoded.
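A schema resource boils down to listing tables and their columns. The project reads the database through SQLAlchemy, where `sqlalchemy.inspect(engine)` offers `get_table_names()` and `get_columns()` for every supported database. The sketch below swaps in the standard-library `sqlite3` module so it runs without dependencies; the function name and the text format it produces are assumptions.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> str:
    """Render every user table as 'table(col TYPE, ...)', one per line."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        col_desc = ", ".join(f"{name} {ctype}" for _, name, ctype, *_ in cols)
        lines.append(f"{table}({col_desc})")
    return "\n".join(lines)
```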

AI Reliability

Validation

The AI's tool calls are validated dynamically against the tool schemas published by the MCP server:

Check	Description
Tool name	Must exist in MCP server's tool list
Required fields	All required fields from inputSchema
Enum values	Must match allowed values from schema
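The three checks map onto fields of the JSON Schema that each MCP tool publishes as `inputSchema` in the `tools/list` result. A minimal sketch, assuming the tool list has already been fetched; the function name is hypothetical, and the project's own implementation is in `backend/app/mcp/clients/base.py`.

```python
from typing import Any


def validate_tool_call(
    tools: list[dict[str, Any]], name: str, arguments: dict[str, Any]
) -> list[str]:
    """Return a list of problems; an empty list means the call passes."""
    by_name = {tool["name"]: tool for tool in tools}
    if name not in by_name:
        return [f"unknown tool: {name}"]
    schema = by_name[name].get("inputSchema", {})
    errors = []
    # Required fields must all be present.
    for field in schema.get("required", []):
        if field not in arguments:
            errors.append(f"missing required field: {field}")
    # Enum-constrained fields must use an allowed value.
    for field, spec in schema.get("properties", {}).items():
        if "enum" in spec and field in arguments and arguments[field] not in spec["enum"]:
            errors.append(f"{field}={arguments[field]!r} not in {spec['enum']}")
    return errors
```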

Retry with Fallback

simple_query fails → AI generates raw SQL → Execute advanced_query
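The fallback chain can be expressed as a small function. In this sketch the three callables are hypothetical stand-ins for the Gemini call and the MCP tool execution, and treating any exception from the first attempt (validation or execution) as "fails" is an assumption.

```python
from typing import Any, Callable

Result = dict[str, Any]


def query_with_fallback(
    question: str,
    run_simple: Callable[[str], Result],    # AI picks simple_query params, then executes
    generate_sql: Callable[[str], str],     # AI writes raw SQL for the question
    run_advanced: Callable[[str], Result],  # executes advanced_query with that SQL
) -> Result:
    """Try simple_query first; on any failure fall back to advanced_query."""
    try:
        return run_simple(question)
    except Exception:
        sql = generate_sql(question)
        return run_advanced(sql)
```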

Implementation: backend/app/mcp/clients/base.py

Project Structure

DataToVisual/
├── docker-compose.yml
├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI app
│   │   ├── config.py            # Environment config
│   │   ├── routers/query.py     # API endpoint
│   │   ├── mcp/
│   │   │   ├── server.py        # MCP Server
│   │   │   ├── sql_builder.py   # SQL building & validation
│   │   │   └── clients/
│   │   │       ├── base.py      # MCP client + reliability
│   │   │       └── gemini.py    # Gemini integration
│   │   └── db/
│   │       └── database.py      # SQLAlchemy
│   └── tests/
│       ├── unit/
│       └── integration/
└── frontend/
    ├── index.html
    ├── style.css
    └── app.js

Future Improvements

Security

HTTPS (nginx/Cloudflare)
User authentication
Rate limiting middleware

Performance

Redis caching for repeated queries
Database read replicas

Operations

CI/CD pipeline (GitHub Actions)
Prometheus metrics + Grafana
Secrets manager (Vault/AWS Secrets)

Features

Query history
Export charts as images

trinh31201/DataToVisual
