A fitness assistant chatbot with Retrieval-Augmented Generation (RAG). Upload your own workout plans, training programs, or nutrition guides and the assistant will ground its answers in your documents — citing the source inline.
Built with Next.js 16, LangChain, PostgreSQL + pgvector, and the Vercel AI SDK.
- Demo
- Features
- Architecture
- Project Structure
- Database Schema
- API Reference
- RAG Pipeline
- Prompts System
- UI Components
- Getting Started
- Configuration
- Scripts
- Tech Stack
- Streaming chat powered by any OpenRouter model (defaults to Llama 3.3 70B Instruct free tier)
- RAG pipeline — PDF, TXT, and Markdown files are chunked, embedded locally with a HuggingFace model, and stored in PostgreSQL with pgvector
- Source citations — each assistant reply shows which uploaded documents were used as source badges
- Multi-turn retrieval — the last three user turns are combined into the retrieval query so follow-up questions still find the right context
- Knowledge base management — upload and delete documents via a dedicated UI page
- Drag-and-drop upload with real-time progress tracking (upload phase + embedding phase)
- Markdown rendering — assistant replies render full markdown: lists, code blocks, tables, blockquotes
- Local embeddings — no external embedding API needed; `Xenova/all-MiniLM-L6-v2` runs in-process via `@huggingface/transformers`
- Graceful degradation — if the database is unavailable, chat still works using the model's general knowledge
User message
        │
        ▼
┌─────────────────────────────────────────────────────────────────┐
│  POST /api/chat                                                 │
│                                                                 │
│  1. Build retrieval query from last 3 user turns                │
│  2. retrieveRelevantChunks() ◄── pgvector HNSW cosine search    │
│       top-5 chunks, min similarity score 0.3                    │
│  3. buildSystemPrompt() ◄── injects context into template       │
│  4. streamText() via OpenRouter (Llama 3.3 70B or custom model) │
│  5. Write source-document stream parts (Vercel AI SDK v6)       │
│       + merge text delta stream                                 │
└─────────────────────────────────────────────────────────────────┘
        │
        ▼  (UIMessage stream to client)
┌─────────────────────────────────────────────────────────────────┐
│  Client                                                         │
│  - Source badges rendered from source-document parts            │
│  - Markdown response rendered via react-markdown + remark-gfm   │
│  - Typing indicator during streaming                            │
└─────────────────────────────────────────────────────────────────┘
File upload (PDF / TXT / MD)
        │
        ▼
PDFLoader (LangChain) or plain-text parser
        │
        ▼
RecursiveCharacterTextSplitter
  chunk size: 1000 chars | overlap: 200 chars
        │
        ├──► Insert metadata row into `documents` (Drizzle)
        │
        ▼
Xenova/all-MiniLM-L6-v2  ◄── runs locally, no API key
  384-dimensional float32 vectors
        │
        ▼
PGVectorStore → `langchain_chunks` table (PostgreSQL)
  HNSW index on embedding column (cosine distance)
        │
        └── on failure: rollback `documents` row (no orphaned records)
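
To make the chunk size and overlap figures concrete, here is a minimal sketch of a fixed-window chunker. It is a simplification and not LangChain's `RecursiveCharacterTextSplitter`, which also prefers to break on paragraph and sentence separators. The function names `chunkWithOverlap` and `toChunkRecords` are hypothetical and do not come from this repository.

```typescript
// Simplified fixed-window chunker. It is NOT LangChain's
// RecursiveCharacterTextSplitter, which also tries to break on paragraph
// and sentence separators before falling back to raw character windows.
function chunkWithOverlap(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("expected chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const stride = chunkSize - overlap; // 800 characters with the defaults
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += stride) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Attach the metadata shape described in this README to each chunk.
function toChunkRecords(documentId: string, documentName: string, chunks: string[]) {
  return chunks.map((content, chunkIndex) => ({
    content,
    metadata: { documentId, documentName, chunkIndex },
  }));
}
```

With the defaults, consecutive chunks start 800 characters apart, so the last 200 characters of one chunk repeat at the start of the next. That repetition is what keeps a sentence that straddles a boundary retrievable from at least one chunk.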
team-sheep-ai/
├── src/
│   ├── app/
│   │   ├── api/
│   │   │   ├── chat/
│   │   │   │   └── route.ts              # Streaming chat endpoint
│   │   │   └── knowledge/
│   │   │       ├── upload/route.ts       # Multipart file upload + ingest
│   │   │       ├── list/route.ts         # List documents
│   │   │       └── delete/route.ts       # Delete document + chunks
│   │   ├── knowledge/
│   │   │   └── page.tsx                  # Knowledge management page (server component)
│   │   ├── layout.tsx                    # Root layout (sidebar nav, fonts)
│   │   ├── page.tsx                      # Chat page
│   │   └── globals.css                   # Tailwind v4 + CSS variables (oklch theme)
│   ├── components/
│   │   ├── chat/
│   │   │   ├── ChatInterface.tsx         # Main chat container (useChat hook)
│   │   │   ├── MessageList.tsx           # Message renderer (markdown + source badges)
│   │   │   └── ChatInput.tsx             # Textarea input with send/stop controls
│   │   ├── knowledge/
│   │   │   ├── DocumentUpload.tsx        # Drag-and-drop uploader with progress
│   │   │   └── DocumentList.tsx          # Document table with delete buttons
│   │   └── ui/                           # shadcn-based primitives
│   │       ├── button.tsx
│   │       ├── input.tsx
│   │       ├── textarea.tsx
│   │       ├── card.tsx
│   │       ├── badge.tsx
│   │       └── separator.tsx
│   └── lib/
│       ├── ai/
│       │   ├── client.ts                 # OpenRouter provider configuration
│       │   └── prompts.ts                # System prompt builder (template + JSON config)
│       ├── db/
│       │   ├── index.ts                  # pg.Pool + Drizzle singleton
│       │   ├── schema.ts                 # Drizzle schema (documents table)
│       │   └── migrate.ts                # Migration runner (pgvector ext + Drizzle)
│       ├── rag/
│       │   ├── embeddings.ts             # HuggingFace Transformers singleton
│       │   ├── vectorStore.ts            # PGVectorStore (lazy init, retry on failure)
│       │   ├── ingest.ts                 # Ingestion pipeline (load → split → embed → store)
│       │   └── retrieval.ts              # Similarity search + context formatter
│       └── utils.ts                      # cn() helper (clsx + tailwind-merge)
├── prompts/
│   ├── answerPrompt.json                 # Prompt configuration (role, instructions, tone)
│   └── template.txt                      # System prompt template with {placeholders}
├── drizzle/
│   └── 0000_stiff_electro.sql            # Initial migration (documents table, HNSW index)
├── docker-compose.yml                    # PostgreSQL + pgvector service
├── drizzle.config.ts                     # Drizzle Kit configuration
├── next.config.ts                        # Next.js config (serverExternalPackages)
└── .env.local.example                    # Environment variable template
`documents` table:

| Column | Type | Notes |
|---|---|---|
| `id` | `uuid` | Primary key, auto-generated |
| `name` | `text` | Original filename |
| `file_type` | `text` | MIME type (e.g. `application/pdf`) |
| `size_bytes` | `integer` | File size in bytes |
| `created_at` | `timestamp` | Upload time, defaults to `now()` |

`langchain_chunks` table:

| Column | Type | Notes |
|---|---|---|
| `id` | `uuid` | Primary key |
| `content` | `text` | Chunk text |
| `embedding` | `vector(384)` | 384-dimensional float32 embedding |
| `metadata` | `jsonb` | `{ documentId, documentName, chunkIndex }` |
| `created_at` | `timestamp` | Auto-set on insert |

An HNSW index is created on the embedding column using cosine distance for fast approximate nearest-neighbour search.
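
For readers unfamiliar with the metric, the following is a minimal sketch of cosine similarity and the matching cosine distance (1 minus similarity), which is what pgvector's `<=>` operator computes. The function names are illustrative and not part of this repository. The HNSW index approximates a nearest-neighbour search under this metric instead of evaluating it against every row.

```typescript
// Cosine similarity between two equal-length vectors: 1 means same direction,
// 0 means orthogonal, -1 means opposite.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new RangeError(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // convention chosen for this sketch
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Cosine distance as used for ordering: smaller means more similar.
function cosineDistance(a: readonly number[], b: readonly number[]): number {
  return 1 - cosineSimilarity(a, b);
}
```

Because the measure depends only on direction, two chunks of very different length can still score as close if their embeddings point the same way.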
`POST /api/chat` streams a chat response with source citations.
Request body:
{
"messages": [
{ "id": "1", "role": "user", "parts": [{ "type": "text", "text": "What are the best recovery exercises?" }] }
]
}

Response: A `UIMessageStream` (Vercel AI SDK v6) that emits:
- `source-document` parts — one per unique source document referenced
- Text delta parts — streamed assistant response

Error handling:
- Rate-limited → `429` with friendly message
- Timeout → `408` with friendly message
- Context too long → `413` with friendly message
- DB unavailable → chat continues with general model knowledge (graceful degradation)

Max duration: 60 seconds
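
The status codes listed above can be captured in a small lookup. This is only a sketch: the error category names, the message wording, and the `toErrorPayload` helper are assumptions made for illustration, and this README does not show how the real route detects each condition.

```typescript
// Hypothetical error categories; how the real route classifies upstream
// failures is not documented in this README.
type ChatErrorKind = "rate_limited" | "timeout" | "context_too_long";

// Status codes taken from the error-handling list above.
const STATUS_BY_KIND: Record<ChatErrorKind, number> = {
  rate_limited: 429,
  timeout: 408,
  context_too_long: 413,
};

// Placeholder wording: the actual user-facing messages are not specified here.
const MESSAGE_BY_KIND: Record<ChatErrorKind, string> = {
  rate_limited: "Too many requests right now. Please try again in a moment.",
  timeout: "The model took too long to respond. Please try again.",
  context_too_long: "This conversation is too long. Please start a new chat.",
};

function toErrorPayload(kind: ChatErrorKind): { status: number; body: { error: string } } {
  return { status: STATUS_BY_KIND[kind], body: { error: MESSAGE_BY_KIND[kind] } };
}
```

In a route handler, a payload like this could be returned with the standard `Response.json(payload.body, { status: payload.status })`.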
`/api/knowledge/upload` uploads and ingests a document into the knowledge base.
Request: multipart/form-data with a file field.
Constraints:
- Accepted types: `application/pdf`, `text/plain`, `text/markdown`
- Max size: 10 MB

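
The constraints above amount to a type check and a size check. The sketch below shows one way to express them; `validateUpload` is a hypothetical helper, and reading "10 MB" as 10 × 1024 × 1024 bytes is an assumption (the real limit may be defined as 10,000,000 bytes).

```typescript
const ACCEPTED_TYPES: ReadonlySet<string> = new Set([
  "application/pdf",
  "text/plain",
  "text/markdown",
]);

// Assumption: "10 MB" is interpreted as 10 MiB in this sketch.
const MAX_SIZE_BYTES = 10 * 1024 * 1024;

type UploadCheck = { ok: true } | { ok: false; reason: string };

// Accepts anything with a MIME type and a byte size, e.g. a browser File.
function validateUpload(file: { type: string; size: number }): UploadCheck {
  if (!ACCEPTED_TYPES.has(file.type)) {
    return { ok: false, reason: `Unsupported file type: ${file.type || "unknown"}` };
  }
  if (file.size > MAX_SIZE_BYTES) {
    return { ok: false, reason: `File exceeds ${MAX_SIZE_BYTES} bytes` };
  }
  return { ok: true };
}
```

Running the same check on the client before upload and again on the server keeps the drag-and-drop UI responsive without trusting the browser.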
Response:
{
"success": true,
"documentId": "550e8400-e29b-41d4-a716-446655440000",
"chunkCount": 42
}

`/api/knowledge/list` returns all documents in the knowledge base, newest first.
Response:
{
"documents": [
{
"id": "550e8400-...",
"name": "stronglifts-5x5.pdf",
"fileType": "application/pdf",
"sizeBytes": 204800,
"createdAt": "2026-03-13T10:00:00.000Z"
}
]
}

`/api/knowledge/delete` deletes a document and all its associated vector chunks.
Request body:
{ "id": "550e8400-e29b-41d4-a716-446655440000" }

Response:
{ "success": true }

Chunks are deleted via a raw SQL `DELETE ... WHERE metadata->>'documentId' = $1` query against `langchain_chunks`, then the `documents` row is removed.
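
The delete order described above (chunks first, then the metadata row) can be sketched as follows. The query runner is injected so the example stays self-contained; `RunQuery` and `deleteDocumentAndChunks` are hypothetical names, and the second statement is an assumption, since the project may remove the `documents` row through Drizzle instead of raw SQL.

```typescript
// Minimal query-runner shape so the sketch does not depend on a database driver.
type RunQuery = (sql: string, params: readonly unknown[]) => Promise<void>;

async function deleteDocumentAndChunks(runQuery: RunQuery, documentId: string): Promise<void> {
  // 1. Remove every chunk whose JSONB metadata points at this document.
  await runQuery(
    "DELETE FROM langchain_chunks WHERE metadata->>'documentId' = $1",
    [documentId],
  );
  // 2. Remove the metadata row itself (assumed SQL; see the note above).
  await runQuery("DELETE FROM documents WHERE id = $1", [documentId]);
}
```

Deleting chunks first means a failure between the two steps leaves a document row with no chunks, which is still visible in the list and can be deleted again, instead of invisible orphaned vectors.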
Ingestion (on upload):
1. Load — `PDFLoader` (LangChain + pdf-parse) for `.pdf` files; plain `Buffer.toString('utf-8')` for `.txt`/`.md` files
2. Split — `RecursiveCharacterTextSplitter` with 1000-char chunk size and 200-char overlap; each chunk carries `{ documentId, documentName, chunkIndex }` metadata
3. Persist metadata — A row is inserted into the `documents` table via Drizzle before embedding starts
4. Embed — `Xenova/all-MiniLM-L6-v2` (384 dimensions) runs locally via `@huggingface/transformers`; no external API call required
5. Store vectors — `PGVectorStore.addDocuments()` writes chunks + embeddings to `langchain_chunks`
6. Rollback on failure — if step 4 or 5 fails, the `documents` row inserted in step 3 is deleted to keep state consistent

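
The rollback behaviour in the ingestion steps is a compensating delete, not a database transaction, because the metadata row and the vectors are written through different code paths. A minimal sketch of that pattern, with hypothetical dependency names (`IngestDeps`, `ingestWithRollback`) standing in for the real Drizzle and `PGVectorStore` calls:

```typescript
// Hypothetical dependencies; in the real project these would wrap Drizzle
// and PGVectorStore. They are injected here to keep the sketch runnable.
interface IngestDeps {
  insertDocument(name: string): Promise<string>; // resolves to the new document id
  embedAndStore(documentId: string, chunks: string[]): Promise<void>;
  deleteDocument(documentId: string): Promise<void>;
}

async function ingestWithRollback(
  deps: IngestDeps,
  name: string,
  chunks: string[],
): Promise<{ documentId: string; chunkCount: number }> {
  // Metadata row is written before embedding starts.
  const documentId = await deps.insertDocument(name);
  try {
    // Embedding and vector storage.
    await deps.embedAndStore(documentId, chunks);
  } catch (error) {
    // Compensate: remove the metadata row so no orphaned record remains.
    await deps.deleteDocument(documentId);
    throw error;
  }
  return { documentId, chunkCount: chunks.length };
}
```

The original error is rethrown after the cleanup so the caller can still report why the upload failed.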
Retrieval (on each chat request):
1. Build query — Concatenates the content of the last three user messages to preserve conversational context for follow-up questions
2. Similarity search — `PGVectorStore.similaritySearchWithScore()` runs an HNSW cosine-similarity search against all stored vectors
3. Filter — Only chunks with similarity score ≥ 0.3 are kept; top 5 chunks are returned
4. Format context — Chunks are formatted as numbered excerpts with source attribution and injected into the system prompt under `<context>…</context>`
5. Source deduplication — Unique document names are extracted and written as `source-document` stream parts before text streaming begins, so the client can render source badges immediately

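
The retrieval steps other than the vector search itself are plain data transformations, sketched below. All names (`ChatMessage`, `ScoredChunk`, `buildRetrievalQuery`, `selectChunks`, `formatContext`, `uniqueSources`) are hypothetical, the newline separator and excerpt layout are assumptions, and the real code may take the top 5 before filtering instead of after.

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "system";
  parts: { type: string; text?: string }[];
}

interface ScoredChunk {
  content: string;
  documentName: string;
  score: number; // similarity: higher is better
}

// Step 1: join the text of the last `turns` user messages.
function buildRetrievalQuery(messages: ChatMessage[], turns = 3): string {
  return messages
    .filter((m) => m.role === "user")
    .slice(-turns)
    .map((m) => m.parts.filter((p) => p.type === "text").map((p) => p.text ?? "").join(" "))
    .join("\n");
}

// Step 3: keep chunks at or above the threshold, best first, at most topK.
function selectChunks(chunks: ScoredChunk[], minScore = 0.3, topK = 5): ScoredChunk[] {
  return chunks
    .filter((c) => c.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Step 4: numbered excerpts with source attribution (layout is an assumption).
function formatContext(chunks: ScoredChunk[]): string {
  return chunks.map((c, i) => `[${i + 1}] (source: ${c.documentName})\n${c.content}`).join("\n\n");
}

// Step 5: unique document names, in order of first appearance.
function uniqueSources(chunks: ScoredChunk[]): string[] {
  return Array.from(new Set(chunks.map((c) => c.documentName)));
}
```

Including earlier user turns in the query is what lets a follow-up such as "and how many sets?" still retrieve the program discussed two messages before.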
The system prompt is built from two files in the prompts/ directory:
`prompts/answerPrompt.json` defines the assistant's persona and behaviour:
{
"metadata": { "app": "Team Sheep AI", "version": "1.0", "category": "fitness" },
"role": "Expert fitness assistant specialising in workouts, training programs, and nutrition",
"task": "Answer questions about fitness, workouts, training, and nutrition",
"instructions": [
"Answer exclusively about fitness topics",
"Politely redirect non-fitness questions",
"Prioritise context from uploaded documents over general knowledge",
"Cite document sources when using uploaded context",
"Use structured format (bullets, numbered lists) for workout plans",
"Include sets/reps for workouts and portions for nutrition advice",
"Be encouraging and ground answers in exercise science"
],
"constraints": {
"language": "Match the user's language, default to English",
"tone": "encouraging and professional",
"format": "Clear text with structured formatting"
}
}

`prompts/template.txt` is the string template that `src/lib/ai/prompts.ts` populates at request time:
You are {role}.
**Task:** {task}
**Tone and format:** {tone}. Use {format}.
**Instructions:**
{instructions}
**Knowledge base context (from uploaded documents):**
<context>
{context}
</context>
Answer using context when relevant. If context is not relevant, answer from general fitness knowledge.
If no relevant chunks are retrieved, {context} is replaced with (No relevant excerpts from uploaded documents.) and the model falls back to its general fitness knowledge.
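
Filling the template is a single-pass placeholder substitution. The sketch below is one possible shape, not the code in `src/lib/ai/prompts.ts`: `fillTemplate`, `renderSystemPrompt`, and `PromptConfig` are hypothetical names, and rendering the instructions as a bulleted list is an assumption.

```typescript
const NO_CONTEXT = "(No relevant excerpts from uploaded documents.)";

// Replace each {name} with values[name]; unknown placeholders are left as-is.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (placeholder: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : placeholder,
  );
}

// Subset of the JSON configuration shown above.
interface PromptConfig {
  role: string;
  task: string;
  instructions: string[];
  constraints: { tone: string; format: string };
}

function renderSystemPrompt(template: string, config: PromptConfig, context: string): string {
  return fillTemplate(template, {
    role: config.role,
    task: config.task,
    tone: config.constraints.tone,
    format: config.constraints.format,
    // Assumption: one bullet per instruction.
    instructions: config.instructions.map((line) => `- ${line}`).join("\n"),
    // Empty retrieval result falls back to the placeholder sentence.
    context: context.trim() === "" ? NO_CONTEXT : context,
  });
}
```

Because the substitution runs in one pass, braces that happen to appear inside an uploaded document's text are inserted verbatim and are not expanded a second time.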
Chat (`src/components/chat/`):

| Component | Description |
|---|---|
| `ChatInterface` | Owns the `useChat()` state; handles auto-scroll and streaming status |
| `MessageList` | Renders user and assistant turns; parses `source-document` parts into badges; shows typing indicator (three bouncing dots) while streaming |
| `ChatInput` | Textarea with Enter-to-send (Shift+Enter for newline); shows a Stop button while streaming; includes an AI disclaimer |

Markdown support in assistant messages: headings, bold/italic, unordered/ordered lists, inline code, fenced code blocks, tables, and blockquotes — all styled with Tailwind.

Knowledge base (`src/components/knowledge/`):

| Component | Description |
|---|---|
| `DocumentUpload` | Drag-and-drop zone + click-to-browse; uses `XMLHttpRequest` for upload progress tracking; two-phase progress bar (uploading → processing/embedding) |
| `DocumentList` | Table of uploaded documents showing name, type badge, size, upload date, and a delete button |

The knowledge page is a server component that queries the database directly and passes document data to the client components as props. If the database is unavailable it returns an empty list instead of erroring.
- Node.js 20+
- Docker (for PostgreSQL + pgvector)
- An OpenRouter API key
git clone <repo-url>
cd team-sheep-ai
npm install

cp .env.local.example .env.local

Edit `.env.local`:
# Required
OPENROUTER_API_KEY=your_openrouter_api_key
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/team_sheep_ai
# Optional — defaults shown
OPENROUTER_MODEL=meta-llama/llama-3.3-70b-instruct:free
EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2

docker compose up -d

This starts a `pgvector/pgvector:pg17` container on port 5432 with a persistent `pgdata` volume.

npm run db:migrate

Creates the pgvector extension, the `documents` table, and the HNSW index. The `langchain_chunks` table is created automatically by `PGVectorStore` on first use.

npm run dev

Open http://localhost:3000.
- Go to Knowledge → upload a PDF or text document (workout plan, nutrition guide, etc.)
- Return to Chat and ask a question — the assistant will answer using your documents and cite them inline
| Variable | Required | Default | Description |
|---|---|---|---|
| `OPENROUTER_API_KEY` | Yes | — | API key from openrouter.ai |
| `DATABASE_URL` | Yes | — | PostgreSQL connection string |
| `OPENROUTER_MODEL` | No | `meta-llama/llama-3.3-70b-instruct:free` | Any model available on OpenRouter |
| `EMBEDDING_MODEL` | No | `Xenova/all-MiniLM-L6-v2` | Any ONNX-compatible HuggingFace sentence-transformer |

Set OPENROUTER_MODEL to any model slug from the OpenRouter models list. Examples:
OPENROUTER_MODEL=openai/gpt-4o
OPENROUTER_MODEL=anthropic/claude-3.5-sonnet
OPENROUTER_MODEL=google/gemini-flash-1.5

The embedding model must be compatible with `@huggingface/transformers` (ONNX format) and must output vectors of a fixed dimension. If you change the model after documents have already been ingested, you will need to re-upload them: vectors produced by different models are not comparable, and a model with a different output dimension will not match the stored `vector(384)` embeddings.
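
A cheap guard can surface a dimension mismatch early instead of failing deep inside a vector insert. This is a sketch under assumptions: `assertEmbeddingDimension` is a hypothetical helper, and 384 is taken from the `vector(384)` column described in the schema.

```typescript
// 384 matches the vector(384) column described in the database schema.
const EXPECTED_DIMENSION = 384;

// Throws if a freshly computed embedding does not fit the stored column.
function assertEmbeddingDimension(
  vector: readonly number[],
  expected: number = EXPECTED_DIMENSION,
): void {
  if (vector.length !== expected) {
    throw new Error(
      `Embedding has ${vector.length} dimensions, expected ${expected}. ` +
        "Re-ingest documents after changing EMBEDDING_MODEL.",
    );
  }
}
```

Note that this check only catches a size mismatch. A different model with the same output size passes the guard but still produces vectors that are not comparable with the ones already stored, so re-uploading remains necessary.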

| Command | Description |
|---|---|
| `npm run dev` | Start Next.js development server |
| `npm run build` | Production build |
| `npm run start` | Start production server |
| `npm run lint` | Run ESLint |
| `npm run db:generate` | Generate a new Drizzle migration from schema changes |
| `npm run db:migrate` | Apply all pending migrations |

| Layer | Technology |
|---|---|
| Framework | Next.js 16 (App Router) |
| Language | TypeScript 5 (strict mode) |
| AI streaming | Vercel AI SDK 6 (@ai-sdk/react, ai) |
| LLM provider | OpenRouter via @openrouter/ai-sdk-provider |
| Embeddings | @huggingface/transformers — runs locally, no API key |
| RAG framework | LangChain (langchain, @langchain/community, @langchain/core) |
| Database | PostgreSQL 17 + pgvector |
| ORM | Drizzle ORM + Drizzle Kit |
| UI primitives | shadcn/ui (Radix-based) |
| Styling | Tailwind CSS 4 + oklch color system |
| Markdown | react-markdown + remark-gfm |
| Icons | Lucide React |
| PDF parsing | pdf-parse (via LangChain PDFLoader) |
| Containerisation | Docker Compose (pgvector/pgvector:pg17) |
