Gen AI Training Models and API

A comprehensive Generative AI platform that provides model training scripts and a RESTful API for:

Language Models (GPT-2 fine-tuning, RL-based formatting)
Computer Vision (CNN, GAN, Diffusion, Energy-based models)
Natural Language Processing (Bigram models, embeddings, text generation)

Quick Start

🍎 Mac Users (Recommended - Native MPS Support)

For Mac users, we recommend using command-line training scripts directly to take advantage of Apple Silicon (M1/M2/M3) GPU acceleration via MPS (Metal Performance Shaders). Docker does not support native MPS acceleration.

Setup Python Environment:

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies (run from the repository root so the genai/... paths used later resolve)
pip install -e ./genai

Train Models Using Commands (see training sections below):

# Example: Train CNN on CIFAR-10 (uses MPS automatically)
python genai/commands/train_cnn.py --epochs 10 --batch_size 64 --dataset cifar10

# Example: Fine-tune GPT-2 (uses MPS automatically)
python genai/commands/fine_tune_gpt2.py --epochs 3 --batch_size 8

Note: Every training script selects its device automatically, preferring MPS on Mac, then CUDA, then CPU.

🐳 Docker Users (Linux/Windows)

For Linux/Windows users or when you need the API server:

# Start all services
docker-compose up --build -d

# Access the services:
# FastAPI: http://localhost:8888
# FastAPI Docs: http://localhost:8888/docs
# Jupyter: http://localhost:8889

Important Docker Limitations:

⚠️ MPS (Apple Silicon GPU) is NOT supported in Docker - Docker containers cannot access Mac's native GPU
✅ CUDA works fine in Docker on Linux/Windows systems with NVIDIA GPUs
⚠️ CPU training is very slow for LLM and CNN models - not recommended for production training
💡 Mac users should use native command-line training for best performance

📦 Automatic Downloads

Note: This repository does not include large dataset files or pre-trained model checkpoints. All datasets (CIFAR-10, MNIST, etc.) and model weights will be automatically downloaded when you run the training scripts or use the API endpoints. The first run may take longer as datasets are downloaded.

Datasets: Automatically downloaded from official sources (PyTorch datasets) on first use
Pre-trained Models: Download from HuggingFace or train from scratch using the provided scripts
Model Checkpoints: Created automatically during training and saved to genai/checkpoints/

Project Structure

This repository contains the following structure:

gen-ai/
├── .gitignore                          # Git ignore file
├── .python-version                     # Python version specification
├── .venv/                              # Virtual environment directory
├── README.md                           # Project documentation
├── docker-compose.yml                  # Docker Compose configuration for multi-service setup
├── pytest.ini                          # Pytest configuration file
├── start_jupyter.bat                   # Windows Jupyter startup script
├── start_jupyter.py                    # Python Jupyter startup script
├── start_jupyter.sh                    # Unix/Linux Jupyter startup script
├── jupyter/                            # Jupyter Notebook service
│   ├── Dockerfile                      # Docker container configuration for Jupyter
│   ├── README.md                       # Jupyter service documentation
│   ├── requirements.txt                # Python dependencies for Jupyter notebooks
│   └── workspace/                      # Jupyter workspace directory
└── genai/                              # Main application directory
    ├── .dockerignore                   # Docker ignore file
    ├── .env.example                    # Environment variables template
    ├── API_DOCUMENTATION.md            # API documentation
    ├── Dockerfile                      # Docker container configuration
    ├── README.md                       # Application-specific documentation
    ├── bigram_model.py                 # Bigram language model implementation
    ├── main.py                         # FastAPI application entry point (router-based)
    ├── pyproject.toml                  # Python project configuration and dependencies
    ├── start.sh                        # Application startup script
    ├── commands/                       # Command-line training scripts
    │   ├── train_cnn.py                # Script to train CNN models
    │   ├── train_gan.py                # Script to train GAN models
    │   ├── train_diffusion.py          # Script to train Diffusion model (CIFAR-10)
    │   ├── train_energy.py             # Script to train Energy-Based model (CIFAR-10)
    │   ├── train_llm_finetune.py       # Script to fine-tune LLM on custom datasets
    │   ├── train_llm_format.py         # Script to post-train LLM with RL for formatting
    │   ├── train_simple_lm.py          # Script to train simple GPT-style language model
    │   └── fine_tune_gpt2.py           # Fine-tune GPT-2 on SQuAD QA pairs
    ├── checkpoints/                    # Model checkpoints and saved models
    │   ├── cnn_{dataset}/              # CNN model checkpoints by dataset
    │   ├── gan_{dataset}/              # GAN model checkpoints by dataset
    │   ├── diffusion_cifar/            # Diffusion model checkpoints (latest + final)
    │   ├── energy_cifar/               # Energy model checkpoints (latest + final)
    │   ├── llm_finetuned/              # Fine-tuned LLM checkpoints
    │   ├── llm_rl/                     # RL-trained language model checkpoints
    │   └── rl_gpt_model.pth            # RL-fine-tuned GPT model
    ├── results/                        # Generated sample images
    │   ├── diffusion_cifar/            # Diffusion outputs (grid + individual + originals)
    │   └── energy_cifar/               # Energy outputs (grid + individual + originals)
    ├── result/                         # Result folder for output from functions
    ├── help_lib/                       # Core helper modules (replaces legacy functions/)
    │   ├── __init__.py                 # Package initialization
    │   ├── checkpoints.py              # Model checkpoint utilities
    │   ├── cifar10_classifier.py       # CIFAR-10 training/prediction orchestration
    │   ├── data_loader.py              # Dataset and dataloader utilities (CIFAR-10)
    │   ├── embeddings.py               # Text embedding helpers
    │   ├── evaluator.py                # Training/evaluation loops
    │   ├── generator.py                # Text generation helpers
    │   ├── model.py                    # Model factory and optimizer setup
    │   ├── neural_networks.py          # Simple/Enhanced/Assignment CNNs and utils
    │   ├── probability.py              # Probability utilities
    │   ├── text_processing.py          # Text preprocessing helpers
    │   ├── trainer.py                  # Generic train/eval history collection
    │   └── utils.py                    # Shared utilities
    ├── models/                         # API data models and CNN model factory
    │   ├── __init__.py                 # Package initialization
    │   ├── requests.py                 # Request data models
    │   ├── responses.py                # Response data models
    │   ├── cnn_models.py               # Practical CNN architectures and factory
    │   └── energy_diffusion_models.py  # Diffusion UNet + Energy model and trainers
    └── routers/                        # FastAPI routers for organized API endpoints
        ├── __init__.py                 # Package initialization
        ├── probability.py              # Probability and statistics API endpoints
        ├── embedding.py                # Text processing and embedding API endpoints
        ├── neural_networks.py          # Neural networks, CIFAR-10 & GAN API endpoints
        └── llm.py                      # Language model training and generation API endpoints

Key Files Description

docker-compose.yml: Orchestrates multiple services including the FastAPI application and Jupyter notebook server
pytest.ini: Configuration file for pytest testing framework
.python-version: Specifies the Python version for the project
.venv/: Virtual environment directory for local development
start_jupyter.*: Cross-platform scripts for starting Jupyter notebook server
tests/: Directory for unit tests (pytest compatible)
jupyter/: Jupyter notebook service containing:
- Dockerfile: Container configuration for Jupyter Lab/Notebook environment
- requirements.txt: Python dependencies for data science and machine learning
- workspace/: Mounted workspace directory
- README.md: Jupyter service specific documentation
genai/main.py: Main FastAPI application entry point with router-based architecture
genai/bigram_model.py: Implementation of a bigram language model for text generation
genai/pyproject.toml: Modern Python project configuration with dependencies and build settings
genai/Dockerfile: Container configuration for building the FastAPI application image
genai/start.sh: Shell script for starting the application
genai/.env.example: Template for environment variables configuration
genai/API_DOCUMENTATION.md: Comprehensive API documentation
genai/test_routers.py: Test script for router-based API endpoints
genai/commands/: Command-line training scripts:
- train_cnn.py: Script to train CNN models on CIFAR-10
- train_gan.py: Script to train GAN models for image generation
- train_diffusion.py: Train a UNet-based diffusion model on CIFAR-10
- train_energy.py: Train an energy-based model on CIFAR-10
- train_simple_lm.py: Train a simple GPT-style word-level language model (Transformer)
genai/checkpoints/: Model checkpoints and saved models:
- cnn_{dataset}/: CNN model checkpoints by dataset
- gan_{dataset}/: GAN model checkpoints by dataset
- diffusion_cifar/: Diffusion model (latest_checkpoint.pth, diffusion_cifar.pth)
- energy_cifar/: Energy model (latest_checkpoint.pth, energy_cifar.pth)
genai/help_lib/: Core helper modules for NLP and vision:
- data_loader.py: CIFAR-10 transforms and dataloaders
- neural_networks.py: Includes SimpleCNN, EnhancedCNN, and custom CNN architectures
- model.py: Model factory for CNN architectures
- trainer.py, evaluator.py: Training loops and metrics history
- embeddings.py, text_processing.py, probability.py, utils.py
genai/models/: Data models and CNN factory for API:
- requests.py: API request data structures
- responses.py: API response data structures
- cnn_models.py: Practical CNNs (Simple, Enhanced, Flexible) and factory
- energy_diffusion_models.py: Diffusion UNet, Energy model, trainers, and dataloaders
- simple_lm.py: GPT-style word-level language model (Transformer)
- rl_gpt_model.py: RL environment and policy for GPT formatting
genai/routers/: FastAPI routers for organized API endpoints:
- probability.py: Probability and statistics API endpoints (/probability/*)
- embedding.py: Text processing and embedding API endpoints (/embedding/*)
- neural_networks.py: Neural networks, CIFAR-10 & GAN API endpoints (/neural-networks/*)
- llm.py: Language model training and generation API endpoints (/llm/*)

Training Models

🖼️ Computer Vision Models

Train CNN on CIFAR-10

# Mac (uses MPS automatically)
python genai/commands/train_cnn.py \
  --model_type simple \
  --dataset cifar10 \
  --epochs 10 \
  --batch_size 64 \
  --lr 0.001

# Specify device manually (optional)
python genai/commands/train_cnn.py --epochs 10 --device mps  # Mac
python genai/commands/train_cnn.py --epochs 10 --device cuda  # Linux/Windows with NVIDIA GPU

Outputs:

Checkpoints: genai/checkpoints/cnn_cifar10/
Model file: genai/checkpoints/cnn_cifar10/cnn_cifar10.pth

Train GAN on MNIST

python genai/commands/train_gan.py \
  --dataset mnist \
  --epochs 20 \
  --batch_size 64 \
  --device mps  # Optional: auto-detected on Mac when omitted

Outputs:

Checkpoints: genai/checkpoints/gan_mnist/
Generated samples saved during training

Train Diffusion Model (CIFAR-10)

python genai/commands/train_diffusion.py \
  --epochs 50 \
  --batch_size 128 \
  --lr 0.001 \
  --diffusion_steps 200

Outputs:

Checkpoints: genai/checkpoints/diffusion_cifar/
Samples: genai/results/diffusion_cifar/ (grid: generated_samples.png, individuals in individual/, and originals in original/ + original_128/)

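Note: Increase --diffusion_steps (e.g., 500–1000) for higher-quality samples. Sampling typically gets slower in proportion, since each generated image needs one denoising pass per step.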
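
To see why the step count matters, here is a small sketch of a linear noise schedule in plain Python. It assumes the common DDPM-style setup (betas rising linearly from 1e-4 to 0.02). The schedule actually used by train_diffusion.py is not documented here, so the endpoints and the alpha_bars function name are assumptions. The cumulative product alpha_bar_t is the fraction of the original signal variance left after t noising steps.

```python
def alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    out = []
    running = 1.0
    for t in range(num_steps):
        # Linear interpolation between the two assumed endpoints.
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        out.append(running)
    return out


for steps in (200, 500, 1000):
    print(f"{steps:>4} steps: final alpha_bar = {alpha_bars(steps)[-1]:.5f}")
```

With these assumed endpoints, a shorter schedule leaves more of the original image in the final noised sample, so sampling starts from something that is not quite pure noise.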

Train Energy Model (CIFAR-10)

python genai/commands/train_energy.py \
  --epochs 50 \
  --batch_size 128 \
  --lr 0.0001

Outputs:

Checkpoints: genai/checkpoints/energy_cifar/
Samples: genai/results/energy_cifar/ (grid + individual + originals)

Device Auto-Detection:

Mac: Automatically uses mps (Metal Performance Shaders) for GPU acceleration
Linux/Windows: Uses cuda if NVIDIA GPU available, otherwise cpu
CPU training is very slow for CNN and LLM models - use GPU when possible
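
The detection order described above can be sketched as follows. This is an illustration under assumptions, not the code the training scripts actually run: pick_device and detect_device are hypothetical names, and the sketch falls back to "cpu" when PyTorch is not installed so that it runs anywhere.

```python
def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Apply the documented preference order: mps, then cuda, then cpu."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch, so nothing to accelerate
    # torch.backends.mps is missing on older PyTorch builds, hence getattr.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = mps is not None and mps.is_available()
    return pick_device(mps_ok, torch.cuda.is_available())


print(detect_device())
```

Keeping the preference order in a pure function makes it easy to check without any GPU present.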

📝 Language Models

Simple Word-level Language Model (Baseline)

Train a tiny GPT-style Transformer LM from scratch on plain text (word-level). Good for quick experiments and understanding training loops.

python genai/commands/train_simple_lm.py \
  --data_file path/to/text.txt \
  --epochs 5 \
  --seq_len 30 \
  --batch_size 64 \
  --vocab_size 8000 \
  --embedding_dim 256 \
  --hidden_dim 512 \
  --num_layers 1 \
  --dropout 0.1 \
  --learning_rate 3e-4 \
  --output_dir genai/checkpoints/simple_lm

Outputs:

Checkpoint: genai/checkpoints/simple_lm/simple_lm.pt
Vocabulary: genai/checkpoints/simple_lm/vocab.json

Notes:

If --data_file is omitted, a small built‑in sample text is used
Preprocessing: lowercase, remove punctuation (keep spaces), whitespace tokenization
Device auto-detect: prefers Apple mps, then cuda, else cpu
CPU training is very slow - use GPU (MPS/CUDA) when available
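
The preprocessing steps listed above, followed by a capped vocabulary, might look like this. It is a sketch under assumptions: the build_vocab helper and the reserved ids (<PAD> = 0, <UNK> = 1) are illustrative and may differ from what train_simple_lm.py writes to vocab.json.

```python
import re
from collections import Counter


def preprocess(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, split on whitespace."""
    text = re.sub(r"[^a-zA-Z0-9\s]", " ", text)
    return text.lower().split()


def build_vocab(tokens: list[str], vocab_size: int) -> dict[str, int]:
    """Keep the most frequent tokens; ids 0 and 1 are reserved (assumed layout)."""
    vocab = {"<PAD>": 0, "<UNK>": 1}
    room = max(vocab_size - len(vocab), 0)
    for token, _count in Counter(tokens).most_common(room):
        vocab[token] = len(vocab)
    return vocab


tokens = preprocess("The cat sat. The cat ran! The end.")
vocab = build_vocab(tokens, vocab_size=4)
# Words that did not make it into the vocabulary map to <UNK>.
ids = [vocab.get(t, vocab["<UNK>"]) for t in tokens]
print(vocab)
print(ids)
```

Capping the vocabulary this way is what --vocab_size plausibly controls: rare words collapse into a single unknown token, which keeps the embedding table small.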

Quick greedy generation example (Python):

import json, torch
from genai.models.simple_lm import SimpleLanguageModel

ckpt_dir = "genai/checkpoints/simple_lm"
with open(f"{ckpt_dir}/vocab.json", "r", encoding="utf-8") as f:
    vocab = json.load(f)
id_to_token = {i: t for t, i in vocab.items()}

# Use the SAME dims you trained with (defaults shown)
model = SimpleLanguageModel(
    vocab_size=len(vocab),
    embedding_dim=256,
    hidden_dim=512,
    num_layers=1,
    dropout=0.1,
)
model.load_state_dict(torch.load(f"{ckpt_dir}/simple_lm.pt", map_location="cpu"))
model.eval()

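def preprocess(text: str):
    import re
    # Replace anything that is not a letter, digit, or whitespace with a space
    text = re.sub(r"[^a-zA-Z0-9\s]", " ", text)
    # Collapse runs of whitespace into a single space
    text = re.sub(r"\s+", " ", text)
    return text.lower().strip().split()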

def encode(tokens): return [vocab.get(t, vocab.get("<UNK>", 1)) for t in tokens]
def decode(ids):    return " ".join(id_to_token.get(i, "<UNK>") for i in ids)

def generate(prompt: str, max_new_tokens: int = 30):
    tokens = encode(preprocess(prompt))
    for _ in range(max_new_tokens):
        inp = torch.tensor([tokens], dtype=torch.long)
        with torch.no_grad():
            logits = model(inp)              # (1, T, V)
            next_id = int(logits[0, -1].argmax().item())
        tokens.append(next_id)
    return decode(tokens)

print(generate("in the beginning"))

GPT-2 Fine-Tuning on SQuAD (Question-Answering)

Fine-tune GPT-2 on the SQuAD dataset for question-answering tasks. Recommended for Mac users, since the script uses MPS automatically for faster training.

# Mac (uses MPS automatically - recommended)
python genai/commands/fine_tune_gpt2.py \
  --model_name openai-community/gpt2 \
  --dataset_name squad \
  --train_samples 10000 \
  --eval_samples 2000 \
  --epochs 3 \
  --batch_size 8 \
  --output_dir genai/checkpoints/gpt2_squad

Outputs:

Checkpoints: genai/checkpoints/gpt2_squad/
Best model: genai/checkpoints/gpt2_squad/best/
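
Because GPT-2 is a causal language model, each SQuAD question-answer pair has to be flattened into one training string. The sketch below shows one plausible way to do that. The "Question: ... Answer: ..." template and the format_qa_example helper are assumptions, not necessarily what fine_tune_gpt2.py does; the record layout follows the Hugging Face squad dataset (question, answers.text), and the sample record is made up.

```python
def format_qa_example(record: dict, eos_token: str = "<|endoftext|>") -> str:
    """Turn one SQuAD-style record into a single training string (assumed template)."""
    question = record["question"].strip()
    answers = record["answers"]["text"]
    # SQuAD stores a list of acceptable answers; use the first one if present.
    answer = answers[0].strip() if answers else ""
    return f"Question: {question}\nAnswer: {answer}{eos_token}"


# Made-up record in the SQuAD field layout, for illustration only.
sample = {
    "question": "Which framework serves the API? ",
    "answers": {"text": ["FastAPI"], "answer_start": [0]},
}
print(format_qa_example(sample))
```

Ending each example with GPT-2's end-of-text token gives the model a signal for where an answer stops.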

Performance Notes:

⚡ MPS (Mac): Fast training on Apple Silicon
⚡ CUDA (Linux/Windows): Fast training on NVIDIA GPUs
⚠️ CPU: Very slow - not recommended for LLM training

LLM Fine-Tuning on Custom Dataset

Fine-tune any HuggingFace model on custom datasets:

python genai/commands/train_llm_finetune.py \
  --model_name gpt2 \
  --epochs 3 \
  --batch_size 4 \
  --lr 5e-5 \
  --output_dir genai/checkpoints/llm_finetuned

RL Post-Training With Format Control

Train a model to enforce a specific answer format using reinforcement learning:

python genai/commands/train_llm_format.py \
  --epochs 150 \
  --lr 1e-5 \
  --base_model_path genai/checkpoints/gpt2_squad \
  --target_prefix "That is a great question" \
  --target_suffix "let me know if you have any other questions"

Outputs:

Checkpoint: genai/checkpoints/rl_gpt_formatting.pth
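
A reward signal for this kind of format control can be as simple as checking the generated text against the target prefix and suffix. The function below is a hypothetical sketch of that idea, not the reward used by train_llm_format.py: the format_reward name, the 0.5 weights, and the case-insensitive matching are all assumptions.

```python
def format_reward(text: str, target_prefix: str, target_suffix: str) -> float:
    """Return 0.0, 0.5, or 1.0 depending on how many format checks pass."""
    normalized = text.strip().lower()
    score = 0.0
    if normalized.startswith(target_prefix.lower()):
        score += 0.5
    # Ignore trailing sentence punctuation when checking the suffix.
    if normalized.rstrip(".!?").endswith(target_suffix.lower()):
        score += 0.5
    return score


prefix = "That is a great question"
suffix = "let me know if you have any other questions"
good = "That is a great question. Paris. Let me know if you have any other questions."
print(format_reward(good, prefix, suffix))
print(format_reward("Paris.", prefix, suffix))
```

Giving partial credit for each half of the format tends to provide a denser learning signal than an all-or-nothing reward, which is why the sketch splits the score.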

🌐 API Endpoints (Docker/Linux/Windows)

For API-based training and inference, use the Docker setup (see the Docker Setup section below). Note that:

API endpoints work fine with CUDA on Linux/Windows
CPU-based API training is very slow for LLM and CNN models
Mac users should prefer command-line training for best performance

API endpoints available:

POST /llm/gpt2-finetune – GPT-2 QA fine-tuning
POST /llm/gpt2-generate – Generate from fine-tuned GPT-2
POST /llm/rl-train – RL post-training
POST /llm/rl-generate – Generate using RL-formatted policy
See genai/API_DOCUMENTATION.md for full API reference
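
As a rough illustration of calling one of these endpoints from Python, the sketch below builds a JSON POST request with the standard library. The endpoint path and port come from this README, but the payload key ("prompt") and the build_json_post helper are placeholders; check genai/API_DOCUMENTATION.md for the real request schema.

```python
import json
import urllib.request


def build_json_post(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for an API endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The payload key is a placeholder, not the documented schema.
request = build_json_post(
    "http://localhost:8888",
    "/llm/gpt2-generate",
    {"prompt": "What is a diffusion model?"},
)
print(request.get_method(), request.full_url)

# To actually send it once the API container is running:
# with urllib.request.urlopen(request, timeout=60) as response:
#     print(json.loads(response.read()))
```

Building the request separately from sending it makes it easy to inspect the URL and body before the server is up.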

Development Setup

🍎 Mac Setup (Recommended for Training)

Python Environment Setup:

# Create virtual environment
python -m venv .venv

# Activate virtual environment
source .venv/bin/activate

# Install dependencies (run from the repository root so the genai/... paths used later resolve)
pip install -e ./genai

Train Models (see training sections above):

# All scripts automatically use MPS on Mac
python genai/commands/train_cnn.py --epochs 10
python genai/commands/fine_tune_gpt2.py --epochs 3

Start Jupyter Notebook (optional):

./start_jupyter.sh
# Or: python start_jupyter.py

🐧 Linux/Windows Setup

Python Environment Setup:

# Create virtual environment
python -m venv .venv

# Activate virtual environment
# On Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

# Install dependencies (run from the repository root so the genai/... paths used later resolve)
pip install -e ./genai

Train Models:

# CUDA is auto-detected when available (CPU otherwise); --device cuda requests it explicitly
python genai/commands/train_cnn.py --epochs 10 --device cuda

Note: CPU training is very slow for LLM and CNN models. Use CUDA (NVIDIA GPU) when available.

Python Version

This project uses Python as specified in .python-version. Make sure you have the correct Python version installed.
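
One quick way to compare the running interpreter with .python-version is sketched below. It assumes the file holds a plain numeric version such as 3.11 or 3.11.4 (the usual pyenv format); the version_matches helper is hypothetical and simply treats anything else as a mismatch.

```python
import sys
from pathlib import Path


def version_matches(required: str, current: tuple[int, ...]) -> bool:
    """True when `current` starts with the numeric components in `required`."""
    try:
        parts = tuple(int(p) for p in required.strip().split("."))
    except ValueError:
        return False  # non-numeric entries are out of scope for this sketch
    return current[: len(parts)] == parts


version_file = Path(".python-version")
if version_file.exists():
    required = version_file.read_text(encoding="utf-8").strip()
    ok = version_matches(required, tuple(sys.version_info[:3]))
    print(f"required {required}, running {sys.version.split()[0]}: {'OK' if ok else 'MISMATCH'}")
else:
    print(".python-version not found; run this from the repository root")
```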

Docker Setup

Multi-Service Docker Compose

The project uses Docker Compose to orchestrate multiple services:

FastAPI Service (genai-fast-api): API server on port 8888
Jupyter Service (genai-jupyter): Notebook server on port 8889

# Start all services
docker-compose up --build -d

# Access the services:
# FastAPI: http://localhost:8888
# FastAPI Docs: http://localhost:8888/docs
# Jupyter: http://localhost:8889
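
After the containers start, a small standard-library check can confirm that both ports answer. This is a convenience sketch rather than part of the project: the is_up helper is hypothetical, and it treats any HTTP response (including an error status or a login redirect) as "up".

```python
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server responded, even if with an error status
    except OSError:
        return False  # connection refused, timeout, DNS failure, and similar


for name, url in {
    "FastAPI docs": "http://localhost:8888/docs",
    "Jupyter": "http://localhost:8889",
}.items():
    print(f"{name}: {'up' if is_up(url) else 'not reachable'}")
```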

⚠️ Important Docker Limitations

For Mac Users:

❌ MPS (Apple Silicon GPU) is NOT supported in Docker
Docker containers cannot access Mac's native GPU acceleration
Recommendation: Use native command-line training scripts for best performance on Mac

For Linux/Windows Users:

✅ CUDA works fine in Docker on systems with NVIDIA GPUs
⚠️ CPU training is very slow for LLM and CNN models
Use the --gpus all flag when running Docker containers for GPU access (the host needs the NVIDIA Container Toolkit installed):
```
docker run --gpus all -p 8888:8888 genai-fastapi
```

Performance Comparison:

MPS (Mac native): ⚡ Fast
CUDA (Linux/Windows): ⚡ Fast
CPU (Docker or no GPU): 🐌 Very slow - not recommended for training

Build the FastAPI Docker Image

You can build the FastAPI Docker image from different locations:

Option 1: Build from the project root directory (recommended):

cd /path/to/gen-ai
docker build -t genai-fastapi -f ./genai/Dockerfile ./genai

Option 2: Build from the genai directory:

cd /path/to/gen-ai/genai
docker build -t genai-fastapi .

Option 3: Build from root with different context:

cd /path/to/gen-ai
docker build -t genai-fastapi --build-arg BUILD_CONTEXT=root -f ./genai/Dockerfile .

Run the FastAPI Container

To run the FastAPI container directly:

docker run -p 8888:8888 genai-fastapi

This will start the FastAPI app on port 8888. You can access the API at:

http://localhost:8888

And view the interactive API documentation at:

http://localhost:8888/docs

Jupyter Docker Setup

Build the Jupyter Docker Image

# From project root
docker build -t genai-jupyter -f ./jupyter/Dockerfile ./jupyter

# Or from jupyter directory
cd jupyter
docker build -t genai-jupyter .

Run the Jupyter Container

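docker run -p 8889:8888 -v "$(pwd)/jupyter/workspace:/home/jovyan" genai-jupyter

Run this from the project root so that $(pwd)/jupyter/workspace resolves. The Jupyter server is published on host port 8889 (container port 8888). You can access it at: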

http://localhost:8889

Stopping the Services

To stop all running containers:

# Using Docker Compose
docker-compose down

# Using Make (only if your checkout provides a Makefile with a down target)
make down

If a build fails, check that pyproject.toml is inside the build context (it lives in genai/) and that the Dockerfile path passed with -f matches the context directory you give to docker build.
