AI-Native application for discovering local restaurants. Community driven, powered by AI.
- Upload: API endpoint to accept an image + metadata (user ID, location). Store image in object storage.
- OCR Processing: Extract text from the image (items, prices). Store raw text + structured JSON in Postgres.
- Embeddings & RAG: Generate embeddings for menu items. Store embeddings in a vector DB (e.g., Qdrant).
- Provide an /ask endpoint:
  - Input: natural language query (e.g., “Where can I get jollof rice near me?”)
  - Process: retrieve relevant menus via embeddings + geo filter
  - Output: structured JSON with results (not just raw text).
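The /ask flow described above (embed the query, filter by distance, rank by similarity) can be sketched without any external services. This is a minimal illustration rather than the project's implementation: the IndexedItem record, the ask function, the 5 km default radius, and the in-memory list standing in for Qdrant are all assumptions, and the query vector is assumed to come from the same embedding model used at indexing time.

```python
import math
from dataclasses import dataclass


@dataclass
class IndexedItem:
    """Hypothetical record: one embedded menu item plus where it is served."""
    item: str
    restaurant: str
    lat: float
    lon: float
    vector: list[float]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ask(query_vector, user_lat, user_lon, index, radius_km=5.0, top_k=3):
    """Apply the geo filter, rank the survivors by similarity, return structured JSON-ready data."""
    candidates = []
    for entry in index:
        distance = haversine_km(user_lat, user_lon, entry.lat, entry.lon)
        if distance <= radius_km:  # geo filter before ranking
            candidates.append((cosine(query_vector, entry.vector), distance, entry))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return {
        "results": [
            {
                "item": entry.item,
                "restaurant": entry.restaurant,
                "distance_km": round(distance, 2),
                "score": round(score, 3),
            }
            for score, distance, entry in candidates[:top_k]
        ]
    }
```

In a real deployment the distance filter and similarity ranking would more likely be delegated to the vector database than done in Python; the sketch only shows the shape of the input, process, and output.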
This project implements a backend service designed to process restaurant menu images, extract structured information using Optical Character Recognition (OCR), and make that data searchable.
- Menu Image Upload: An API endpoint to accept restaurant menu images along with basic metadata (user ID, location).
- Object Storage: Storage of uploaded images. Currently this is the local file system, but it is designed for easy integration with cloud object storage such as S3 or GCS.
- OCR Processing: Automatic extraction of raw text and structured menu items (item name, price, description) from images using Google Gemini 2.5 Flash.
- PostgreSQL Persistence: Storage of OCR results (raw text and structured JSON) in a PostgreSQL database.
- Embedding Generation: Creation of vector embeddings for menu items (or raw text as a fallback) using Google's gemini-embedding-001 model.
- Vector Database (Qdrant): Storage of embeddings and their associated metadata (menu ID, user ID, item details) in Qdrant for efficient semantic search.
- RAG Query Endpoint (Planned): An /ask endpoint (to be implemented) that will leverage the vector database and potentially a Large Language Model (LLM) to answer natural language queries about menu items.
The food_rag service is built as a FastAPI application, orchestrating several key components:
- FastAPI Application (main.py): The entry point for the API, handling incoming requests and coordinating the workflow.
- Image Storage (storage.py): Manages saving uploaded menu images.
- OCR Module (ocr.py): Interfaces with Google Gemini for text extraction and structuring.
- PostgreSQL Database: Stores metadata about uploaded menus and OCR results.
- Embedding Module (embeddings.py): Generates vector representations of menu text.
- Qdrant Vector Database (qdrant_service.py): Stores and indexes the generated embeddings for fast retrieval.
The typical flow for an uploaded menu is as follows:
- User uploads an image via the /upload endpoint.
- The image is saved to local storage.
- OCR is performed to extract raw text and structured menu items.
- OCR results and metadata are stored in PostgreSQL.
- Embeddings are generated for structured menu items (or raw text).
- Embeddings and relevant payload are uploaded to Qdrant.
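The last two steps of this flow, choosing what text to embed and what payload to store beside each vector, can be sketched as follows. This is an illustration under assumptions: the function name build_embedding_inputs, the " | " separator, and the "source" payload key are hypothetical, while the menu_items, item, price, and description keys follow the structured OCR output shown in the /upload response.

```python
def build_embedding_inputs(menu_id, user_id, structured, raw_text):
    """Return (text, payload) pairs, one per vector to embed.

    Structured menu items are preferred; the raw OCR text is used only
    when no usable item was extracted.
    """
    items = (structured or {}).get("menu_items") or []
    pairs = []
    for entry in items:
        name = str(entry.get("item") or "").strip()
        if not name:
            continue  # skip entries the OCR step could not name
        parts = [name, entry.get("description"), entry.get("price")]
        cleaned = [str(p).strip() for p in parts if p is not None]
        text = " | ".join(c for c in cleaned if c)
        payload = {
            "menu_id": menu_id,
            "user_id": user_id,
            "item": name,
            "price": entry.get("price"),
            "description": entry.get("description"),
        }
        pairs.append((text, payload))

    # Fallback: embed the raw OCR text as a single vector.
    if not pairs and raw_text and raw_text.strip():
        pairs.append(
            (raw_text.strip(), {"menu_id": menu_id, "user_id": user_id, "source": "raw_text"})
        )
    return pairs
```

Each text would then be sent to the embedding model, and the resulting vector stored in Qdrant together with its payload so that search hits can be traced back to a menu and user.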
- FastAPI: Modern, fast (high-performance) web framework for building APIs with Python 3.11+.
- SQLAlchemy: Python SQL toolkit and Object Relational Mapper (ORM) for interacting with PostgreSQL.
- Pydantic: Data validation and settings management using Python type hints.
- PostgreSQL: Robust, open-source relational database.
- Alembic: Lightweight database migration tool for SQLAlchemy.
- Google Generative AI SDK: Used for accessing Google Gemini 2.5 Flash for OCR and gemini-embedding-001 for text embeddings.
- Qdrant: High-performance, scalable vector database for similarity search.
- Docker & Docker Compose: For containerization and orchestration of the Qdrant service.
- Python 3.11+
This section guides you through setting up the food_rag project for development and local testing.
Before you begin, ensure you have the following installed:
- Python 3.11+
- pip (Python package installer)
- Docker & Docker Compose: Required to run the Qdrant vector database.
Start by cloning the project repository to your local machine:
git clone https://github.com/ugochimbo/mucho.git
cd mucho
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
pip install -r requirements.txt
The application requires a Google Gemini API key and Qdrant host/port configuration. Create a .env file in the root of your project or set these as environment variables in your shell.
# .env file example
GEMINI_API_KEY="YOUR_GOOGLE_GEMINI_API_KEY"
QDRANT_HOST="localhost"
QDRANT_PORT=6333
# For PostgreSQL connection (replace with your actual database URL)
DATABASE_URL="postgresql://user:password@host:port/database_name"
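One way to read these values at startup and fail early when a required one is missing is sketched below. It uses only the standard library to stay dependency-free; the Settings class and load_settings function are hypothetical names, and the project itself may well use Pydantic for this instead. Note that the sketch reads environment variables only, so a .env file would first need to be loaded into the environment by a separate tool.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in the .env example."""
    gemini_api_key: str
    database_url: str
    qdrant_host: str = "localhost"
    qdrant_port: int = 6333


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [k for k in ("GEMINI_API_KEY", "DATABASE_URL") if not env.get(k)]
    if missing:
        # Fail at startup rather than on the first OCR or database call.
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        gemini_api_key=env["GEMINI_API_KEY"],
        database_url=env["DATABASE_URL"],
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
    )
```

Accepting the mapping as a parameter keeps the function easy to exercise in tests without touching the real environment.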
This project uses Qdrant as its vector database. You can run it easily using Docker Compose.
Ensure Docker is running on your system, then navigate to the project root directory and execute:
docker-compose up -d
This command will download the qdrant/qdrant:latest image and start the Qdrant service in the background, mapping port 6333 (REST API & Web UI) and 6334 (gRPC API) to your host. A Docker volume qdrant_data will be created to persist Qdrant's data.
This project uses Alembic for managing PostgreSQL database migrations.
- Ensure PostgreSQL is running and accessible at the DATABASE_URL you configured.
- Initialize Alembic (if not already done, usually once per project):
  alembic init food_rag/alembic
- Update alembic.ini and env.py: Configure alembic.ini with your sqlalchemy.url (from the DATABASE_URL environment variable) and env.py to import your Base from models.py.
  - In alembic.ini, uncomment and set sqlalchemy.url = <your_database_url>
  - In food_rag/alembic/env.py, import Base and set target_metadata = Base.metadata.
# food_rag/alembic/env.py
# ... other imports
from food_rag.models import Base
target_metadata = Base.metadata
- Generate Migration Script: After making changes to models.py, generate a new migration:
  alembic revision --autogenerate -m "Initial migration"
- Apply Migrations: Apply the pending migrations to your database:
  alembic upgrade head
Once all services (Qdrant, PostgreSQL) are running and the Python environment is set up, you can start the FastAPI application using Uvicorn:
uvicorn main:app --reload --app-dir food_rag
The --reload flag will automatically restart the server on code changes, which is useful for development. The --app-dir food_rag ensures Uvicorn looks for main.py inside the food_rag directory.
The API will be available at http://127.0.0.1:8000. You can access the FastAPI interactive documentation (Swagger UI) at http://127.0.0.1:8000/docs.
Uploads a restaurant menu image, processes it with OCR, stores results, generates embeddings, and indexes them in Qdrant.
- URL: /upload
- Method: POST
- Request Body (Form Data):
  - image: File (required) - The menu image file (e.g., JPEG, PNG).
  - user_id: str (required) - The ID of the user uploading the menu.
  - location: str (optional) - The geographical location associated with the menu/restaurant.
Response (200 OK):
{
  "message": "Menu uploaded, OCR processed, and embeddings stored.",
  "menu_id": 1, # The ID assigned in the PostgreSQL database
  "image_path": "/path/to/saved/image.jpg",
  "ocr_raw_text": "Extracted raw text from the menu...",
  "ocr_structured_json": {
    "menu_items": [
      {
        "item": "Jollof Rice",
        "price": "$15.00",
        "description": "A traditional West African rice dish."
      }
    ]
  },
  "qdrant_points_uploaded": 1 # Number of points uploaded to Qdrant
}
Error Responses:
- 422 Unprocessable Entity: If validation fails for user_id or image.
- 500 Internal Server Error: For issues during image saving, OCR processing, database errors, or Qdrant upload.
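A client consuming the 200 response might reduce it to the fields it needs, as in the sketch below. The helper names summarize_upload and parse_price are hypothetical, and the price parsing is deliberately naive: it assumes a dot as the decimal separator, as in the "$15.00" example, and returns None for anything it cannot read.

```python
import json
from decimal import Decimal, InvalidOperation


def parse_price(text):
    """Convert a price string such as "$15.00" to Decimal, or None if unparseable."""
    cleaned = "".join(ch for ch in str(text) if ch.isdigit() or ch == ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def summarize_upload(body):
    """Extract the menu ID, point count, and (item, price) pairs from a response body."""
    data = json.loads(body)
    items = (data.get("ocr_structured_json") or {}).get("menu_items") or []
    return {
        "menu_id": data["menu_id"],
        "points": data["qdrant_points_uploaded"],
        "items": [(entry["item"], parse_price(entry.get("price"))) for entry in items],
    }
```

The body passed in must be plain JSON as returned by the server; the # annotations in the documented example are explanatory only and are not part of the payload.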
This section describes the data structures used within the application.
Defined in food_rag/models.py, this model represents a record in the menu_ocr_result table, storing the outcome of menu image processing.