
ugochimbo/mucho


Mucho Api

AI-Native application for discovering local restaurants. Community driven, powered by AI.

Core Flow

  • Upload: API endpoint to accept an image + metadata (user ID, location). Store image in object storage.

  • OCR Processing: Extract text from the image (items, prices). Store raw text + structured JSON in Postgres.

  • Embeddings & RAG: Generate embeddings for menu items. Store embeddings in a vector DB (e.g., Qdrant).

  • Provide an /ask endpoint:
    • Input: natural language query (e.g., “Where can I get jollof rice near me?”)
    • Process: retrieve relevant menus via embeddings + geo filter
    • Output: structured JSON with results (not just raw text).
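The /ask flow (embedding retrieval plus a geo filter, returning structured JSON) can be illustrated with a small in-memory sketch. This is not the project's implementation: the MenuItemHit record, the ask function, the latitude/longitude fields, and the 5 km default radius are all assumptions made for illustration, and a real deployment would presumably delegate both the vector search and the geo filter to Qdrant.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class MenuItemHit:
    """Hypothetical record: one indexed menu item with its embedding and coordinates."""
    item: str
    restaurant: str
    lat: float
    lon: float
    vector: list[float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def ask(query_vector, user_lat, user_lon, items, radius_km=5.0, top_k=3):
    """Keep items within radius_km of the user, then rank them by similarity."""
    scored = []
    for it in items:
        distance = haversine_km(user_lat, user_lon, it.lat, it.lon)
        if distance > radius_km:
            continue  # geo filter: drop anything outside the search radius
        scored.append((cosine(query_vector, it.vector), distance, it))
    scored.sort(key=lambda t: t[0], reverse=True)
    return {
        "results": [
            {
                "item": it.item,
                "restaurant": it.restaurant,
                "score": round(score, 4),
                "distance_km": round(distance, 2),
            }
            for score, distance, it in scored[:top_k]
        ]
    }
```

The return value is a plain dictionary so that it serializes directly to the structured JSON the flow calls for. In practice the query vector would come from the same embedding model used at indexing time.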


This project implements a backend service designed to process restaurant menu images, extract structured information using Optical Character Recognition (OCR), and make that data searchable.

1. Core Features

  • Menu Image Upload: An API endpoint to accept restaurant menu images along with basic metadata (user ID, location).
  • Object Storage: Secure storage of uploaded images. (Currently, this is a local file system storage, but designed for easy integration with cloud object storage like S3 or GCS).
  • OCR Processing: Automatic extraction of raw text and structured menu items (item name, price, description) from images using Google Gemini 2.5 Flash.
  • PostgreSQL Persistence: Storage of OCR results (raw text and structured JSON) in a PostgreSQL database.
  • Embedding Generation: Creation of vector embeddings for menu items (or raw text as a fallback) using Google's gemini-embedding-001 model.
  • Vector Database (Qdrant): Storage of embeddings and their associated metadata (menu ID, user ID, item details) in Qdrant for efficient semantic search.
  • RAG Query Endpoint (Planned): An /ask endpoint (to be implemented) that will leverage the vector database and potentially a Large Language Model (LLM) to answer natural language queries about menu items.
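The "menu items, or raw text as a fallback" rule for embedding generation might look like the sketch below. The function name build_embedding_inputs and the " | " joining format are assumptions made for illustration. The only thing taken from this README is the shape of the structured OCR output (a menu_items list with item, price, and description).

```python
from __future__ import annotations


def build_embedding_inputs(structured: dict | None, raw_text: str) -> list[str]:
    """Return one text per menu item, or the raw OCR text if no items were parsed."""
    items = (structured or {}).get("menu_items") or []
    texts = []
    for entry in items:
        name = (entry.get("item") or "").strip()
        if not name:
            continue  # skip entries the OCR step could not name
        parts = [name]
        if entry.get("price"):
            parts.append(f"price: {entry['price']}")
        if entry.get("description"):
            parts.append(entry["description"])
        texts.append(" | ".join(parts))
    if texts:
        return texts
    # Fallback: embed the whole raw text as a single document, if there is any.
    raw = raw_text.strip()
    return [raw] if raw else []
```

Each returned string would then be sent to the embedding model, so the number of strings should match the number of points uploaded to Qdrant.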

2. Architecture

The food_rag service is built as a FastAPI application, orchestrating several key components:

  1. FastAPI Application (main.py): The entry point for the API, handling incoming requests and coordinating the workflow.
  2. Image Storage (storage.py): Manages saving uploaded menu images.
  3. OCR Module (ocr.py): Interfaces with Google Gemini for text extraction and structuring.
  4. PostgreSQL Database: Stores metadata about uploaded menus and OCR results.
  5. Embedding Module (embeddings.py): Generates vector representations of menu text.
  6. Qdrant Vector Database (qdrant_service.py): Stores and indexes the generated embeddings for fast retrieval.
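Because image storage is currently local but meant to be swappable for S3 or GCS, one plausible shape for storage.py is a small interface with a local implementation behind it. The names ImageStorage and LocalImageStorage are hypothetical and the actual module may be organized differently. This is only a sketch of the design idea.

```python
from __future__ import annotations

import uuid
from pathlib import Path
from typing import Protocol


class ImageStorage(Protocol):
    """Hypothetical interface: anything that can persist bytes and return a locator."""

    def save(self, data: bytes, filename: str) -> str: ...


class LocalImageStorage:
    """Stores images on the local file system under a root directory."""

    def __init__(self, root: str | Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, data: bytes, filename: str) -> str:
        # Generate a fresh name so uploads never overwrite each other,
        # keeping only the (lower-cased) extension of the original file.
        suffix = Path(filename).suffix.lower()
        target = self.root / f"{uuid.uuid4().hex}{suffix}"
        target.write_bytes(data)
        return str(target)
```

A cloud-backed class exposing the same save method could replace LocalImageStorage without changes to the calling code.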

The typical flow for an uploaded menu is as follows:

  • User uploads an image via the /upload endpoint.
  • The image is saved to local storage.
  • OCR is performed to extract raw text and structured menu items.
  • OCR results and metadata are stored in PostgreSQL.
  • Embeddings are generated for structured menu items (or raw text).
  • Embeddings and relevant payload are uploaded to Qdrant.
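The steps above can be sketched as a single orchestration function. Everything here is illustrative: process_upload and its five injected callables (save_image, run_ocr, persist, embed, index) are hypothetical stand-ins for storage.py, ocr.py, the PostgreSQL layer, embeddings.py, and qdrant_service.py, whose real signatures are not documented in this README.

```python
from __future__ import annotations

from typing import Callable


def process_upload(
    image_bytes: bytes,
    filename: str,
    user_id: str,
    location: str | None,
    *,
    save_image: Callable,
    run_ocr: Callable,
    persist: Callable,
    embed: Callable,
    index: Callable,
) -> dict:
    """Run the upload flow in order: save, OCR, persist, embed, index."""
    image_path = save_image(image_bytes, filename)
    raw_text, structured = run_ocr(image_path)
    menu_id = persist(user_id, location, image_path, raw_text, structured)

    # Embed each structured item, falling back to the raw text if none were parsed.
    items = (structured or {}).get("menu_items") or []
    texts = [entry["item"] for entry in items] or [raw_text]
    vectors = embed(texts)

    # One payload per vector, so search hits can be traced back to the menu.
    payloads = [{"menu_id": menu_id, "user_id": user_id, "text": t} for t in texts]
    uploaded = index(vectors, payloads)

    return {
        "menu_id": menu_id,
        "image_path": image_path,
        "ocr_raw_text": raw_text,
        "ocr_structured_json": structured,
        "qdrant_points_uploaded": uploaded,
    }
```

Passing the collaborators in as arguments keeps the ordering logic testable without a database, a Qdrant instance, or a Gemini API key.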

3. Technologies Used

  • FastAPI: Modern, fast (high-performance) web framework for building APIs with Python 3.11+.
  • SQLAlchemy: Python SQL toolkit and Object Relational Mapper (ORM) for interacting with PostgreSQL.
  • Pydantic: Data validation and settings management using Python type hints.
  • PostgreSQL: Robust, open-source relational database.
  • Alembic: Lightweight database migration tool for SQLAlchemy.
  • Google Generative AI SDK: Used for accessing Google Gemini 2.5 Flash for OCR and gemini-embedding-001 for text embeddings.
  • Qdrant: High-performance, scalable vector database for similarity search.
  • Docker & Docker Compose: For containerization and orchestration of the Qdrant service.
  • Python 3.11+

4. Setup Instructions

This section guides you through setting up the food_rag project for development and local testing.

Prerequisites

Before you begin, ensure you have the following installed:

  • Python 3.11+
  • PostgreSQL
  • Docker and Docker Compose
  • Git

1. Clone the Repository

Start by cloning the project repository to your local machine:

git clone https://github.com/ugochimbo/mucho.git
cd mucho

2. Set up virtual environment

python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
pip install -r requirements.txt

3. Configure Environment Variables

The application requires a Google Gemini API key and Qdrant host/port configuration. Create a .env file in the root of your project or set these as environment variables in your shell.

# .env file example
GEMINI_API_KEY="YOUR_GOOGLE_GEMINI_API_KEY"
QDRANT_HOST="localhost"
QDRANT_PORT=6333
# For PostgreSQL connection (replace with your actual database URL)
DATABASE_URL="postgresql://user:password@host:port/database_name"
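How the application reads these values is not shown in this README. As a minimal sketch, the four variables could be collected into a typed settings object with early validation. The Settings class and load_settings function are hypothetical names, and the sketch reads from the process environment only. Loading the .env file itself would need a separate step (for example a dotenv loader), which is assumed rather than shown.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in the .env example."""
    gemini_api_key: str
    database_url: str
    qdrant_host: str = "localhost"
    qdrant_port: int = 6333


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ), failing fast on gaps."""
    env = os.environ if env is None else env
    missing = [key for key in ("GEMINI_API_KEY", "DATABASE_URL") if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        gemini_api_key=env["GEMINI_API_KEY"],
        database_url=env["DATABASE_URL"],
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        # Environment values are strings, so the port is converted explicitly.
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
    )
```

Failing at startup when a key is missing tends to be easier to diagnose than an authentication or connection error raised in the middle of an upload.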

4. Run Qdrant with Docker Compose

This project uses Qdrant as its vector database. You can run it easily using Docker Compose.

Ensure Docker is running on your system, then navigate to the project root directory and execute:

docker-compose up -d

This command will download the qdrant/qdrant:latest image and start the Qdrant service in the background, mapping port 6333 (REST API & Web UI) and 6334 (gRPC API) to your host. A Docker volume qdrant_data will be created to persist Qdrant's data.

5. Database Setup (Alembic Migrations)

This project uses Alembic for managing PostgreSQL database migrations.

  1. Ensure PostgreSQL is running and accessible at the DATABASE_URL you configured.
  2. Initialize Alembic (if not already done, usually once per project):
    alembic init food_rag/alembic
    
  3. Update alembic.ini and env.py: Configure alembic.ini with your sqlalchemy.url (the value of the DATABASE_URL environment variable) and env.py to import your Base from models.py.
    • In alembic.ini, uncomment and set sqlalchemy.url = <your_database_url>
    • In food_rag/alembic/env.py, import Base and set target_metadata = Base.metadata.
    # food_rag/alembic/env.py
    # ... other imports
    from food_rag.models import Base
    target_metadata = Base.metadata
  4. Generate Migration Script: After making changes to models.py, generate a new migration:
    alembic revision --autogenerate -m "Initial migration"
    
  5. Apply Migrations: Apply the pending migrations to your database:
    alembic upgrade head
    

6. Running the Application

Once all services (Qdrant, PostgreSQL) are running and the Python environment is set up, you can start the FastAPI application using Uvicorn:

uvicorn main:app --reload --app-dir food_rag

The --reload flag will automatically restart the server on code changes, which is useful for development. The --app-dir food_rag ensures Uvicorn looks for main.py inside the food_rag directory.

The API will be available at http://127.0.0.1:8000. You can access the FastAPI interactive documentation (Swagger UI) at http://127.0.0.1:8000/docs.

7. API Endpoints

POST /upload

Uploads a restaurant menu image, processes it with OCR, stores results, generates embeddings, and indexes them in Qdrant.

  • URL: /upload
  • Method: POST
  • Request Body (Form Data):
    • image: File (required) - The menu image file (e.g., JPEG, PNG).
    • user_id: str (required) - The ID of the user uploading the menu.
    • location: str (optional) - The geographical location associated with the menu/restaurant.

Response (200 OK):

    {
      "message": "Menu uploaded, OCR processed, and embeddings stored.",
      "menu_id": 1,
      "image_path": "/path/to/saved/image.jpg",
      "ocr_raw_text": "Extracted raw text from the menu...",
      "ocr_structured_json": {
        "menu_items": [
          {
            "item": "Jollof Rice",
            "price": "$15.00",
            "description": "A traditional West African rice dish."
          }
        ]
      },
      "qdrant_points_uploaded": 1
    }

Here menu_id is the ID assigned in the PostgreSQL database, and qdrant_points_uploaded is the number of points uploaded to Qdrant.

Error Responses:

  • 422 Unprocessable Entity: If validation fails for user_id or image.
  • 500 Internal Server Error: For issues during image saving, OCR processing, database errors, or Qdrant upload.
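A client consuming /upload might branch on the status codes listed above roughly as follows. The function name summarize_upload_response and the "retry" hint are illustrative assumptions, not part of the API. The only details taken from this README are the response fields and the meaning of the 422 and 500 codes.

```python
from __future__ import annotations

import json


def summarize_upload_response(status_code: int, body: str) -> dict:
    """Reduce an /upload response to the fields a client is likely to care about."""
    if status_code == 200:
        data = json.loads(body)
        items = (data.get("ocr_structured_json") or {}).get("menu_items") or []
        return {
            "ok": True,
            "menu_id": data["menu_id"],
            "items": [entry["item"] for entry in items],
            "points": data.get("qdrant_points_uploaded", 0),
        }
    if status_code == 422:
        # The request itself was invalid; resending it unchanged will not help.
        return {"ok": False, "retry": False, "reason": "validation failed for user_id or image"}
    if status_code == 500:
        # Storage, OCR, database, or Qdrant failed; a later retry may succeed.
        return {"ok": False, "retry": True, "reason": "server-side processing failure"}
    return {"ok": False, "retry": False, "reason": f"unexpected status {status_code}"}
```

The status code and body would come from whichever HTTP client sends the multipart form request containing image, user_id, and the optional location.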

8. Data Models

This section describes the data structures used within the application.

MenuOCRResult (SQLAlchemy Model)

Defined in food_rag/models.py, this model represents a record in the menu_ocr_result table, storing the outcome of menu image processing.
