A full-stack MVP for multimodal (audio + text) sequence modeling and inference, featuring model training, ONNX export, gRPC-based serving, a Next.js frontend, and monitoring with Prometheus and Grafana.
## Table of Contents

- Project Structure
- Features
- Setup & Installation
- Training Pipeline
- Model Architecture
- Inference & Serving
- Frontend
- Monitoring
- Deployment (Docker Compose)
- Directory Overview
- Requirements
- License
## Project Structure

```
ssm-multimodal-mvp/
├── data/              # Processed datasets (e.g., synthetic_processed.pt)
├── deployment/        # Dockerfiles, docker-compose, monitoring configs
├── frontend/          # Next.js client app
├── models/            # Model checkpoints and ONNX exports
├── monitoring/        # (empty or for future monitoring scripts)
├── protoc-25.3/       # Protobuf tools
├── serving/           # Go gRPC server and proto definitions
├── training/          # Model training, data prep, and evaluation scripts
├── requirements.txt   # Python dependencies
└── ...
```
## Features

- Multimodal Model: Audio (mel spectrogram) + text input, trained with Mamba SSM blocks.
- Training Pipeline: Data preparation, training, evaluation, and ONNX export.
- gRPC Inference Server: Go-based server loads ONNX model and exposes gRPC API.
- gRPC-Web Proxy: Bridges browser/frontend to gRPC backend.
- Next.js Frontend: User interface for inference.
- Monitoring: Prometheus metrics and Grafana dashboards.
- Dockerized: All components containerized for easy deployment.
## Setup & Installation

Prerequisites:

- Docker & Docker Compose
- NVIDIA GPU + drivers (for training)
- Python 3.8+ (for local training)
- Node.js 18+ (for frontend dev)

```bash
cd deployment
docker-compose up --build
```

Once the stack is running:

- Frontend: http://localhost:3000
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3001 (admin/admin)
## Training Pipeline

- Data Preparation: `training/prepare_data.py` downloads and processes LibriSpeech audio, extracts log-mel spectrograms, and tokenizes text with the GPT-2 tokenizer (see the sketch below).
- Model Training: `training/train_model.py` trains the `MultimodalMambaModel` (see below) on the processed data, logs metrics to TensorBoard, and saves checkpoints.
- Quick Training: `training/quick_train.py` provides a minimal example for rapid prototyping.
- ONNX Export: Export trained models to ONNX format for serving (see the export sketch after the commands below).
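
For orientation, the feature extraction performed by `prepare_data.py` amounts to something like the following sketch. The mel parameters, sample rate, and the use of `GPT2TokenizerFast` are illustrative assumptions, not the script's actual settings:

```python
# Hedged sketch of the prepare_data.py feature extraction; sample rate,
# n_mels, and hop length are illustrative assumptions.
import torch
import torchaudio
from transformers import GPT2TokenizerFast

SAMPLE_RATE = 16000  # LibriSpeech native rate
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE, n_fft=400, hop_length=160, n_mels=80
)
to_db = torchaudio.transforms.AmplitudeToDB()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def process_example(waveform: torch.Tensor, transcript: str):
    """Turn one (waveform, transcript) pair into model inputs."""
    # (channels, time) -> (n_mels, frames), then log scale
    log_mel = to_db(mel(waveform)).squeeze(0)
    # GPT-2 BPE token ids for the transcript
    token_ids = tokenizer(transcript, return_tensors="pt").input_ids.squeeze(0)
    return log_mel, token_ids
```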

```bash
cd training
python prepare_data.py
python train_model.py
python export_onnx.py
```
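
The export step in `export_onnx.py` is not reproduced here, but exporting a checkpoint of this model to ONNX typically looks like the sketch below. The checkpoint path, constructor arguments, dummy input shapes, and axis names are assumptions for illustration:

```python
# Hedged sketch of an ONNX export; paths, shapes, and opset are assumptions,
# not export_onnx.py verbatim.
import torch
from multimodal_mamba import MultimodalMambaModel  # training/multimodal_mamba.py

model = MultimodalMambaModel()  # hypothetical: actual constructor args not shown here
model.load_state_dict(torch.load("../models/checkpoints/best.pt", map_location="cpu"))
model.eval()

dummy_audio = torch.randn(1, 80, 200)           # (batch, n_mels, frames)
dummy_text = torch.randint(0, 50257, (1, 32))   # (batch, tokens), GPT-2 vocab size

torch.onnx.export(
    model,
    (dummy_audio, dummy_text),
    "../models/onnx/multimodal_mamba.onnx",
    input_names=["audio_features", "text_tokens"],
    output_names=["logits"],
    dynamic_axes={"audio_features": {0: "batch", 2: "frames"},
                  "text_tokens": {0: "batch", 1: "tokens"}},
    opset_version=17,
)
```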
## Model Architecture

Defined in `training/multimodal_mamba.py` (a fuller sketch follows the skeleton below):

- Audio Encoder: 1D Conv layers project mel spectrograms to the model dimension.
- Text Embedding: Standard embedding layer for tokenized text.
- Mamba SSM Blocks: Two stacked Mamba blocks for sequence modeling.
- Output Projection: Linear layer to vocabulary size.

```python
class MultimodalMambaModel(nn.Module):
    ...
    def forward(self, audio_features, text_tokens):
        # Encode audio and text, concatenate, pass through Mamba blocks, project to vocab
        ...
```
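
The class body is elided above. A minimal sketch of the same structure, assuming the `mamba_ssm` package's `Mamba` block and illustrative dimensions (the class name, `d_model`, vocabulary size, kernel sizes, and residual wiring are assumptions, not the repository's actual configuration), could look like:

```python
# Hedged sketch of the architecture described above; hyperparameters and the
# residual wiring are illustrative assumptions, not the repo's actual config.
import torch
import torch.nn as nn
from mamba_ssm import Mamba

class MultimodalMambaSketch(nn.Module):
    def __init__(self, n_mels=80, d_model=256, vocab_size=50257):
        super().__init__()
        # Audio encoder: 1D convs project (batch, n_mels, frames) to d_model
        self.audio_encoder = nn.Sequential(
            nn.Conv1d(n_mels, d_model, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1),
        )
        # Text embedding for GPT-2 token ids
        self.text_embedding = nn.Embedding(vocab_size, d_model)
        # Two stacked Mamba SSM blocks
        self.mamba_blocks = nn.ModuleList([Mamba(d_model=d_model) for _ in range(2)])
        # Output projection to vocabulary logits
        self.output_proj = nn.Linear(d_model, vocab_size)

    def forward(self, audio_features, text_tokens):
        audio = self.audio_encoder(audio_features).transpose(1, 2)  # (B, T_a, D)
        text = self.text_embedding(text_tokens)                     # (B, T_t, D)
        x = torch.cat([audio, text], dim=1)                         # concat along time
        for block in self.mamba_blocks:
            x = x + block(x)  # residual around each Mamba block (assumed)
        return self.output_proj(x)                                  # (B, T, vocab)
```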
## Inference & Serving

- Go gRPC Server (`serving/go-server/`): Loads the ONNX model, exposes a gRPC API for inference and health checks, and serves Prometheus metrics.
- gRPC-Web Proxy: Converts HTTP requests from the frontend to gRPC calls.
- ONNX Models: Place exported models in `models/onnx/`.
API endpoints:

- `/api/inference` (POST): Accepts audio data and a text prompt, returns generated text and confidence.
- `/api/health` (GET): Health check endpoint.
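
As a rough usage illustration, a client could call the inference endpoint as in the sketch below. The port, JSON field names (`audio`, `prompt`), and base64 encoding are assumptions about the payload, not the server's documented schema:

```python
# Hedged sketch of calling the inference endpoint; the port, field names, and
# audio encoding are assumptions, not the documented request format.
import base64
import requests

with open("sample.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    "http://localhost:8080/api/inference",  # hypothetical proxy port
    json={"audio": audio_b64, "prompt": "Transcribe:"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # expected to contain generated text and a confidence score
```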
## Frontend

- Framework: Next.js (TypeScript, Tailwind CSS)
- Location: `frontend/nextjs-client/`
- Dev Start:

  ```bash
  cd frontend/nextjs-client
  npm install
  npm run dev
  ```

- Build & Serve (Docker): Handled by `Dockerfile.frontend` and docker-compose.
## Monitoring

- Prometheus: Scrapes metrics from the inference server.
- Grafana: Pre-configured dashboards for inference latency, request counts, etc.
- Config: See `deployment/prometheus.yml` and `deployment/grafana/`.
## Deployment (Docker Compose)

All services are orchestrated via `deployment/docker-compose.yml`:

- `inference-server`: gRPC server (Go, ONNX)
- `grpc-web-proxy`: HTTP-to-gRPC bridge
- `frontend`: Next.js app
- `training`: Model training (GPU required)
- `prometheus`: Metrics collection
- `grafana`: Visualization

```bash
cd deployment
docker-compose up --build
```

## Directory Overview

- data/: Processed datasets (e.g., `synthetic_processed.pt`)
- models/onnx/: Exported ONNX models for serving
- models/checkpoints/: PyTorch model checkpoints
- training/: All training, data prep, and evaluation scripts
- serving/go-server/: Go gRPC server and proxy
- frontend/nextjs-client/: Next.js frontend app
- deployment/: Dockerfiles, docker-compose, monitoring configs
## Requirements

See `requirements.txt` for Python dependencies, including:
- torch, torchvision, torchaudio
- mamba-ssm
- transformers, datasets
- librosa, soundfile
- onnx, onnxruntime
- tensorboard, prometheus-client
- jupyter, ipywidgets
## License

MIT License (or specify your license here)
For more details, see the code in each subdirectory and the comments in the scripts.