Qwen3-Embedding has topped embedding benchmarks, easily beating both open- and closed-source models. This project provides tools to optimize any Qwen3-Embedding GGUF model downloaded through Ollama, with an OpenAI-compatible API wrapper and an optimized Qdrant vector store.
🎯 Fully RooCode Compatible! - Works seamlessly with the Cline forks Roo Code & Kilo Code.
Automated Setup (Recommended)
# One-command setup: downloads model, optimizes, and configures everything
./setup.sh

This automated script:
- Downloads the Qwen3-Embedding-0.6B model (Q8_0 quantization) via Ollama
- Extracts and optimizes the GGUF model from Ollama storage
- Creates an optimized Ollama model for embedding-only usage
- Installs Python dependencies and starts all services
- Sets up the Qdrant vector database with proper configuration
Manual Setup
# 1. Download and optimize the Qwen3 model (0.6B at Q8_0 recommended; 4B at Q4 for best quality)
ollama pull hf.co/Qwen/Qwen3-Embedding-0.6B-GGUF:Q8_0
python optimize_gguf.py hf.co/Qwen/Qwen3-Embedding-0.6B-GGUF:Q8_0 qwen3-embedding
# 2. Install dependencies and start services
pip install -r requirements.txt
docker run -d --name qdrant -p 6333:6333 \
  -e QDRANT__SERVICE__API_KEY="your-super-secret-qdrant-api-key" \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant
python qdrantsetup.py
python qwen3-api.py

Ready to use with RooCode! The setup script displays the exact configuration values needed.
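Before pointing RooCode at the wrapper, you can verify the optimized model responds directly through Ollama. A quick smoke test (assumes an Ollama version recent enough to expose /api/embed):

import requests

# Ask Ollama's native embedding endpoint for a vector from the optimized model
r = requests.post("http://localhost:11434/api/embed",
                  json={"model": "qwen3-embedding", "input": "hello world"})
r.raise_for_status()
print(len(r.json()["embeddings"][0]), "dimensions")  # expect 1024 for the 0.6B model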
This setup provides a complete, optimized embedding pipeline following Qwen's developer recommendations:
- GGUF Model Optimizer: optimize_gguf.py - Extracts and optimizes any Qwen3 model from Ollama
- Instruction-Aware Embedding: Task-specific instructions for a 1-5% performance improvement
- MRL Support: Matryoshka Representation Learning with 512, 768, and 1024 dimensions
- OpenAI-Compatible API: qwen3-api.py wrapper with RooCode base64 encoding support
- Optimized Qdrant Vector Store: qdrantsetup.py with performance tuning for 1024-dimensional vectors
- Task-Specific Templates: Code search, document retrieval, Q&A, clustering, and more
- Complete RooCode Integration: Ready to use with proper API keys and endpoints
- Ollama: http://localhost:11434 (serving the optimized GGUF model)
- Qwen3-0.6B API: http://localhost:8000 (OpenAI-compatible FastAPI wrapper, RooCode compatible)
- Qdrant Vector DB: http://localhost:6333 (Docker container with optimizations)
After running the setup script, you'll see the exact configuration values needed for RooCode integration:
# RooCode Configuration (displayed by setup script)
Embeddings Provider: OpenAI-compatible
Base URL: http://localhost:8000
API Key: your-super-secret-qdrant-api-key
Model: qwen3-embedding
Embedding Dimension: 1024 # 4B supports up to 2560; 8B up to 4096
# Vector Database Configuration
Qdrant URL: http://localhost:6333
Qdrant API Key: your-super-secret-qdrant-api-key
Collection Name: qwen3_embedding

🚀 Instruction-Aware Embedding
- Automatic task-specific instruction formatting
- 9 optimized instruction templates for different use cases
🎯 MRL (Matryoshka Representation Learning)
- Support for 512, 768, and 1024 dimensions
- Smaller embeddings for faster search when full precision isn't needed
- Maintains quality with reduced dimensionality
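When you request fewer dimensions through the API (see the examples below), the standard Matryoshka recipe is to truncate the full vector and re-normalize. A minimal sketch of that operation, assuming the full embedding is 1024-dimensional:

import numpy as np

def mrl_truncate(embedding, dimensions):
    # Keep the first `dimensions` components, then re-normalize to unit length.
    v = np.asarray(embedding, dtype=np.float32)[:dimensions]
    return v / np.linalg.norm(v)

full = np.random.rand(1024).astype(np.float32)  # stand-in for a real 1024-dim embedding
short = mrl_truncate(full, 512)
print(short.shape)  # (512,)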
⚡ Optimized Configuration
- Memory-mapped file loading for faster startup (1-3ms load time)
- Multi-threaded processing for better performance
- Optimal context window and RoPE frequency settings
📊 Benchmarked Performance
- Q8_0 quantization: Best quality/size balance (610MB)
- 1024-dimensional embeddings with high semantic accuracy
- Fast inference optimized for codebase indexing workflows
Automatic instruction formatting is applied for the supported tasks:
"text_search",         # General semantic search (default)
"code_search",         # Code and programming tasks
"document_retrieval",  # Document and text retrieval
"question_answering",  # Q&A systems
"clustering",          # Text clustering and categorization
"classification",      # Classification tasks
"similarity",          # Semantic similarity comparison
"general"              # General purpose embedding
Each task automatically applies the optimal instruction format
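Under the hood this amounts to prepending a task description in Qwen's recommended "Instruct: ... Query: ..." query format. A minimal sketch of the idea (the per-task instruction wording below is illustrative; see qwen3-api.py for the exact templates):

# Qwen3-Embedding query format: "Instruct: {task_description}\nQuery: {text}"
TASK_INSTRUCTIONS = {
    "text_search": "Given a web search query, retrieve relevant passages that answer the query",
    "code_search": "Given a code search query, retrieve relevant code snippets",
}

def format_for_embedding(text: str, task: str = "text_search") -> str:
    return f"Instruct: {TASK_INSTRUCTIONS[task]}\nQuery: {text}"

print(format_for_embedding("binary search in Python", task="code_search"))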
- POST /v1/embeddings - Create embeddings (OpenAI compatible)
- GET /v1/models - List available models
- GET /health - Health check
- GET / - API information
OpenAI-Compatible Embedding API
import requests
# Basic embedding (uses default "text_search" task)
response = requests.post("http://localhost:8000/v1/embeddings", json={
"input": "Your text to embed",
"model": "qwen3-embedding",
"encoding_format": "float"
})
# Task-specific embedding (1-5% performance improvement)
response = requests.post("http://localhost:8000/v1/embeddings", json={
"input": "def fibonacci(n): return n if n <= 1 else fibonacci(n-1) + fibonacci(n-2)",
"model": "qwen3-embedding",
"task": "code_search", # Optimized for code
"encoding_format": "float"
})
# Custom instruction embedding (maximum performance)
response = requests.post("http://localhost:8000/v1/embeddings", json={
"input": "Advanced machine learning concepts",
"model": "qwen3-embedding",
"instruction": "Represent this text for academic research and similarity:",
"encoding_format": "float"
})
# MRL - Custom dimensions (Matryoshka Representation Learning)
response = requests.post("http://localhost:8000/v1/embeddings", json={
"input": "Text for lower-dimensional embedding",
"model": "qwen3-embedding",
"dimensions": 768, # Instead of default 1024
"encoding_format": "float"
})
embeddings = response.json()["data"][0]["embedding"]
print(f"Generated {len(embeddings)}-dimensional embedding")# Test individual components if needed
# Test individual components if needed
curl http://localhost:8000/health # API health check
curl http://localhost:6333/healthz # Qdrant health check
curl http://localhost:11434/api/tags # List Ollama models
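To exercise the whole pipeline, embed a snippet through the wrapper and store it in the qwen3_embedding collection created by qdrantsetup.py. A sketch using the qdrant-client package (URL, API key, and collection name match the configuration shown above):

import requests
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

# 1. Embed through the OpenAI-compatible wrapper
emb = requests.post("http://localhost:8000/v1/embeddings", json={
    "input": "def quicksort(arr): ...",
    "model": "qwen3-embedding",
    "task": "code_search",
}).json()["data"][0]["embedding"]

# 2. Upsert into the collection created by qdrantsetup.py
client = QdrantClient(url="http://localhost:6333",
                      api_key="your-super-secret-qdrant-api-key")
client.upsert(collection_name="qwen3_embedding",
              points=[PointStruct(id=1, vector=emb, payload={"path": "sort.py"})])

# 3. Query the collection with the same vector
hits = client.search(collection_name="qwen3_embedding", query_vector=emb, limit=3)
print(hits)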