A data-to-visualization system where users ask business questions in plain English and instantly see the answer as a chart.
- Overview
- Architecture
- Tech Stack
- Setup
- Usage
- API Reference
- MCP Tools
- Database
- Design Decisions
- AI Reliability
- Project Structure
- Future Improvements
Users ask questions like "Show total sales by category" and get an instant chart visualization.
Screenshots: Bar Chart, Line Chart, Pie Chart.
The system uses:
- MCP (Model Context Protocol) for AI-tool interaction
- Gemini AI for natural language understanding
- Chart.js for visualization
The architecture is:
- Docker-friendly (each service runs in its own container)
- Scalable and debuggable
- Production-ready
| Component | Technology | Purpose |
|---|---|---|
| Backend | FastAPI | Async API server |
| MCP Server | Starlette + SSE | Tool execution |
| Database | SQLAlchemy | Multi-DB support |
| AI | Gemini API | Natural language processing |
| Frontend | Chart.js | Chart rendering |
| Container | Docker Compose | Deployment |
- Docker & Docker Compose
- Gemini API key (a free key is available)
# 1. Clone
git clone <repo-url>
cd DataToVisual
# 2. Configure
cp .env.example .env
# Edit .env and add GEMINI_API_KEY
# 3. Start all services
make up
# 4. Seed database
make seed
# 5. Open frontend
open http://localhost:5500

# Start database only
docker compose up -d postgres
# Check database is ready
docker compose exec postgres pg_isready
# Connect to database
docker compose exec postgres psql -U postgres -d datatovisual

# Start backend (requires database)
docker compose up -d postgres backend
# View backend logs
make logs s=backend
# Run tests
make test

# Start MCP server (requires database)
docker compose up -d postgres mcp-server
# View MCP server logs
make logs s=mcp-server

# Start frontend
docker compose up -d frontend
# Or run locally without Docker
cd frontend && python -m http.server 5500

| Service | URL |
|---|---|
| Frontend | http://localhost:5500 |
| API | http://localhost:8000 |
| API Docs | http://localhost:8000/docs |
| MCP Server | http://localhost:3001 |
Debug and test MCP server tools, resources, and prompts:
# Run MCP Inspector (npx downloads it on first use)
npx @modelcontextprotocol/inspector
# Connect to MCP server
# URL: http://localhost:3001/sse

Features:
- List and test tools (`simple_query`, `advanced_query`)
- Read resources (`schema://database`)
- Get prompts (`data_analyst`)
- View request/response payloads
| Command | Description |
|---|---|
| `make up` | Start all services |
| `make down` | Stop all services |
| `make rebuild` | Rebuild and restart |
| `make test` | Run all tests |
| `make test-unit` | Run unit tests only |
| `make test-int` | Run integration tests only |
| `make seed` | Seed the database |
| `make logs s=backend` | Show logs for a service |
| `make status` | Show service status |
| `make clean` | Remove containers and volumes |
| Question | Chart Type |
|---|---|
| Compare total sales in 2026 vs 2022 | Bar |
| How are our sales trending over the last 5 years? | Line |
| What are our top 5 selling products? | Bar |
| Show sales by product category | Bar |
| What is the average order value by category? | Bar |
| Show monthly revenue for 2025 | Line |
| Compare Q1 vs Q4 sales in 2024 | Bar |
| Show total quantity sold per product | Bar |
| What is total revenue by product name? | Bar |
| Show sales distribution by category | Pie |
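As a rough sketch of how a client might submit one of the questions above and reshape the answer for charting, the snippet below builds the JSON request body and splits the returned `rows` into parallel label and value lists. The endpoint path `/api/query` and the helper names `build_request`, `to_series`, and `ask` are assumptions for illustration only; the API docs at http://localhost:8000/docs show the actual route.

```python
import json
import urllib.request

# Assumed path, not confirmed by this README; check /docs for the real route.
API_URL = "http://localhost:8000/api/query"


def build_request(question: str) -> urllib.request.Request:
    """Wrap a plain-English question in a JSON POST request."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def to_series(response: dict) -> tuple[list[str], list[float]]:
    """Split response rows into parallel label/value lists for a chart."""
    rows = response.get("rows", [])
    labels = [str(row["label"]) for row in rows]
    values = [float(row["value"]) for row in rows]
    return labels, values


def ask(question: str) -> dict:
    """Send the question to a running backend and decode the JSON reply."""
    with urllib.request.urlopen(build_request(question), timeout=30) as resp:
        return json.load(resp)
```

With the stack running, `to_series(ask("Show total sales by category"))` would give the two lists a charting library needs for its labels and data.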
Request:
{
"question": "Show total sales by category"
}

Response:
{
"question": "Show total sales by category",
"chart_type": "bar",
"rows": [
{ "label": "Electronics", "value": 17059236.4 },
{ "label": "Home", "value": 3001309.34 }
]
}

simple_query: structured query for single-table operations. The SQL is built from validated parameters.
{
"table": "sales",
"label_column": "product_id",
"value_column": "total_amount",
"aggregation": "SUM",
"filters": [{"column": "sale_date", "operator": ">=", "value": "2024-01-01"}],
"order_by": "value_desc",
"limit": 10,
"chart_type": "bar"
}

advanced_query: raw SQL for complex queries (JOINs, subqueries). The SQL is sanitized before execution.
{
"sql": "SELECT p.category AS label, SUM(s.total_amount) AS value FROM sales s JOIN products p ON s.product_id = p.id GROUP BY p.category",
"chart_type": "bar"
}

| Database | DATABASE_TYPE | DATABASE_URL |
|---|---|---|
| PostgreSQL | `postgresql` | `postgresql://user:pass@host/db` |
| MySQL | `mysql` | `mysql://user:pass@host/db` |
| SQLite | `sqlite` | `sqlite:///path/to/file.db` |
| Tool | Use Case | Safety |
|---|---|---|
| `simple_query` | Single-table aggregations | SQL built from validated structure |
| `advanced_query` | JOINs, complex queries | Raw SQL with sanitization |
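To illustrate what "SQL built from validated structure" could look like for `simple_query`, here is a minimal sketch. It is not the project's `sql_builder.py`: the function name `build_simple_query`, the whitelist contents, the set of operators, and the `value_asc` ordering are all assumptions. Only the parameter names come from the `simple_query` example in this README.

```python
# Hypothetical whitelist; the real allowed tables and columns are not listed here.
ALLOWED_TABLES = {"sales": {"product_id", "total_amount", "sale_date"}}
ALLOWED_AGGREGATIONS = {"SUM", "AVG", "COUNT", "MIN", "MAX"}
ALLOWED_OPERATORS = {"=", "!=", ">", ">=", "<", "<="}
ORDERINGS = {"value_desc": "value DESC", "value_asc": "value ASC"}
MAX_ROWS = 1000


def build_simple_query(params: dict) -> tuple[str, dict]:
    """Build a SELECT from whitelisted identifiers; filter values become bind params."""
    table = params["table"]
    columns = ALLOWED_TABLES.get(table)
    if columns is None:
        raise ValueError(f"table not allowed: {table}")

    label, value = params["label_column"], params["value_column"]
    for column in (label, value):
        if column not in columns:
            raise ValueError(f"column not allowed: {column}")

    aggregation = params["aggregation"].upper()
    if aggregation not in ALLOWED_AGGREGATIONS:
        raise ValueError(f"aggregation not allowed: {aggregation}")

    clauses, binds = [], {}
    for i, flt in enumerate(params.get("filters", [])):
        if flt["column"] not in columns or flt["operator"] not in ALLOWED_OPERATORS:
            raise ValueError(f"filter not allowed: {flt}")
        # The value never enters the SQL string; it travels as a named bind parameter.
        clauses.append(f"{flt['column']} {flt['operator']} :p{i}")
        binds[f"p{i}"] = flt["value"]

    sql = f"SELECT {label} AS label, {aggregation}({value}) AS value FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += f" GROUP BY {label}"

    order_by = params.get("order_by")
    if order_by is not None:
        if order_by not in ORDERINGS:
            raise ValueError(f"order_by not allowed: {order_by}")
        sql += " ORDER BY " + ORDERINGS[order_by]

    limit = min(max(int(params.get("limit", MAX_ROWS)), 1), MAX_ROWS)
    return sql + f" LIMIT {limit}", binds
```

The `:p0` placeholder style matches what SQLAlchemy's `text()` construct accepts, so the returned pair could be passed to `connection.execute(text(sql), binds)`. Because identifiers come only from the whitelist, the model cannot inject arbitrary SQL through this path.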
Why only 2 tools?
| Approach | Pros | Cons |
|---|---|---|
| 1 tool (raw SQL only) | Maximum flexibility | High security risk, AI errors in SQL syntax |
| 2 tools (our choice) | Balance of safety + flexibility | AI must choose correctly |
| Many tools (per operation) | Very safe, predictable | Limited queries, complex tool selection |
Trade-offs of 2-tool approach:
| Trade-off | Our Choice | Alternative |
|---|---|---|
| Safety vs Flexibility | 80% safe (simple), 20% flexible (advanced) | All raw SQL = 100% flexible but risky |
| AI Complexity | AI chooses between 2 tools | Many tools = harder for AI to choose |
| Query Coverage | Covers most business questions | Single tool = limited or unsafe |
| Maintenance | 2 tools to maintain | Many tools = more code to maintain |
Which tool the AI chooses:
| Question Type | Tool | Why |
|---|---|---|
| "Count products per category" | `simple_query` | Single table, basic aggregation |
| "Total sales by category" | `advanced_query` | Needs JOIN (sales + products) |
| "Top 5 products by revenue" | `advanced_query` | Needs JOIN + ORDER BY + LIMIT |
| "Average price per category" | `simple_query` | Single table (products) |
| Layer | Protection |
|---|---|
| Keyword blocking | DROP, DELETE, UPDATE, INSERT, ALTER, TRUNCATE |
| Pattern blocking | ; (multi-statement), -- (comments), UNION |
| Row limit | Auto-adds LIMIT 1000 |
| Timeout | 30-second query timeout |
| Table whitelist | Only allowed tables in simple_query |
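The keyword, pattern, and row-limit layers in the table above could be sketched roughly as follows. This is an illustration under assumptions, not the project's sanitizer: the name `sanitize_sql` is hypothetical, and the query timeout is left out because it is normally enforced at the database connection rather than on the SQL text.

```python
import re

BLOCKED_KEYWORDS = ("DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "TRUNCATE", "UNION")
BLOCKED_PATTERNS = (";", "--")
MAX_ROWS = 1000

# Word boundaries keep column names such as "updated_at" from matching UPDATE.
_KEYWORD_RE = re.compile(r"\b(" + "|".join(BLOCKED_KEYWORDS) + r")\b", re.IGNORECASE)
_LIMIT_RE = re.compile(r"\bLIMIT\s+\d+\b", re.IGNORECASE)


def sanitize_sql(sql: str) -> str:
    """Reject blocked keywords and patterns, then append a LIMIT if none is present."""
    cleaned = sql.strip()

    match = _KEYWORD_RE.search(cleaned)
    if match:
        raise ValueError(f"blocked keyword: {match.group(1).upper()}")

    for pattern in BLOCKED_PATTERNS:
        if pattern in cleaned:
            raise ValueError(f"blocked pattern: {pattern}")

    # Simplification: a LIMIT anywhere in the text (even in a subquery) counts.
    if not _LIMIT_RE.search(cleaned):
        cleaned += f" LIMIT {MAX_ROWS}"
    return cleaned
```

A text-level filter like this also rejects harmless queries that mention a blocked word inside a string literal, so it works best alongside a read-only database role rather than as the only defence.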
Schema is fetched from database at runtime via MCP Resource, not hardcoded.
AI responses are validated against MCP tool schemas dynamically:
| Check | Description |
|---|---|
| Tool name | Must exist in MCP server's tool list |
| Required fields | All required fields from inputSchema |
| Enum values | Must match allowed values from schema |
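The three checks in the table above can be sketched as a small validator that takes the tool name, the arguments proposed by the AI, and a mapping of tool names to their `inputSchema` objects as returned by the MCP server's tool listing. The function name `validate_tool_call` and the sample schema are hypothetical; the real schemas come from the server at runtime.

```python
# Hypothetical sample of what a tool listing might contain for one tool.
SAMPLE_TOOLS = {
    "advanced_query": {
        "type": "object",
        "properties": {
            "sql": {"type": "string"},
            "chart_type": {"type": "string", "enum": ["bar", "line", "pie"]},
        },
        "required": ["sql", "chart_type"],
    }
}


def validate_tool_call(name: str, arguments: dict, tools: dict) -> list[str]:
    """Return a list of problems with a proposed tool call; empty means valid."""
    schema = tools.get(name)
    if schema is None:
        return [f"unknown tool: {name}"]

    errors = []
    # Required fields: every name in the schema's "required" list must be present.
    for field in schema.get("required", []):
        if field not in arguments:
            errors.append(f"missing required field: {field}")

    # Enum values: a supplied argument must be one of the schema's allowed values.
    for field, spec in schema.get("properties", {}).items():
        allowed = spec.get("enum")
        if allowed is not None and field in arguments and arguments[field] not in allowed:
            errors.append(f"invalid value for {field}: {arguments[field]!r}")
    return errors
```

Returning a list of messages instead of raising on the first problem makes it easy to feed every error back to the model in a single retry prompt.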
Fallback: simple_query fails → AI generates raw SQL → advanced_query executes it
Implementation: backend/app/mcp/clients/base.py
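The fallback flow above might be sketched as follows. This is not the code in `base.py`: `run_with_fallback`, `plan`, and `execute` are hypothetical names, with `plan` standing in for the AI call that returns a tool call (given the question and an optional hint) and `execute` standing in for the MCP tool invocation.

```python
from typing import Any, Callable, Dict, List, Optional

ToolCall = Dict[str, Any]  # e.g. {"name": "simple_query", "arguments": {...}}


def run_with_fallback(
    question: str,
    plan: Callable[[str, Optional[str]], ToolCall],
    execute: Callable[[ToolCall], List[dict]],
) -> List[dict]:
    """Run the first tool call; if simple_query fails, retry once via advanced_query."""
    call = plan(question, None)
    try:
        return execute(call)
    except Exception as exc:
        # Only a failed simple_query triggers the fallback; other failures propagate.
        if call["name"] != "simple_query":
            raise
        hint = f"simple_query failed: {exc}. Use advanced_query with raw SQL."
        retry = plan(question, hint)
        if retry["name"] != "advanced_query":
            raise RuntimeError("fallback did not produce an advanced_query call") from exc
        return execute(retry)
```

Limiting the retry to a single attempt keeps latency bounded and avoids a loop if the model keeps producing failing calls.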
DataToVisual/
├── docker-compose.yml
├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI app
│   │   ├── config.py            # Environment config
│   │   ├── routers/query.py     # API endpoint
│   │   ├── mcp/
│   │   │   ├── server.py        # MCP Server
│   │   │   ├── sql_builder.py   # SQL building & validation
│   │   │   └── clients/
│   │   │       ├── base.py      # MCP client + reliability
│   │   │       └── gemini.py    # Gemini integration
│   │   └── db/
│   │       └── database.py      # SQLAlchemy
│   └── tests/
│       ├── unit/
│       └── integration/
└── frontend/
    ├── index.html
    ├── style.css
    └── app.js
Security
- HTTPS (nginx/Cloudflare)
- User authentication
- Rate limiting middleware
Performance
- Redis caching for repeated queries
- Database read replicas
Operations
- CI/CD pipeline (GitHub Actions)
- Prometheus metrics + Grafana
- Secrets manager (Vault/AWS Secrets)
Features
- Query history
- Export charts as images






