A comprehensive Model Context Protocol (MCP) server that provides intelligent orchestration, queuing, and advanced features for LM Studio integration.
- Intelligent Model Selection: Automatically selects the best model based on task complexity, requirements, and performance metrics
- Task Queuing & Orchestration: Asynchronous task processing with priority queues and intelligent scheduling
- RAG Integration: Retrieval-Augmented Generation with vector stores and similarity search
- LangChain Integration: Advanced text processing, chunking, and embeddings
- Agentic Flows: Support for autonomous tool-calling workflows
- Conversation Context Management: Persistent conversation contexts with automatic cleanup
- Performance Monitoring: Real-time model performance tracking and optimization
- Multi-Model Support: Seamless switching between LLM and embedding models
- Streaming Responses: Real-time streaming for chat completions
- Vision Model Support: Integration with vision-language models
- Tool Integration: Custom tool definitions for agentic workflows
- Vector Stores: Create and manage multiple vector stores for RAG
- Similarity Search: Advanced semantic search capabilities
- Configuration Management: Flexible configuration with environment variables and files
```
 ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
 │   MCP Client    │◄──►│   MCP Server    │◄──►│    LM Studio    │
 └─────────────────┘    └─────────────────┘    └─────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │  Orchestrator   │
                        └────────┬────────┘
                                 │
                ┌────────────────┼────────────────┐
                ▼                ▼                ▼
         ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
         │  Task Queue  │ │ RAG Service  │ │  LangChain   │
         └──────────────┘ └──────────────┘ └──────────────┘
                │                │                │
                ▼                ▼                ▼
         ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
         │ Redis Queue  │ │ Vector Store │ │ Text Splitter│
         └──────────────┘ └──────────────┘ └──────────────┘
```
- Node.js 18+
- LM Studio running with API enabled
- Redis server (for queuing)
- TypeScript
- Clone and install dependencies:

```bash
cd servers/lmstudio-server
npm install
```

- Configure environment:

```bash
cp .env.example .env
# Edit .env with your settings
```

- Build the server:

```bash
npm run build
```
- Start LM Studio:
  - Launch LM Studio
  - Load your preferred models
  - Enable the API server (typically on port 1234)
- Start Redis (required for queuing):

```bash
# On Windows with Redis installed
redis-server

# On macOS with Homebrew
brew services start redis

# On Ubuntu/Debian
sudo systemctl start redis-server
```

- Run the server:

```bash
npm start
```

See `.env.example` for all available configuration options:
- LM Studio: Connection settings, timeouts, retry logic
- Database: SQLite, PostgreSQL, or MongoDB for persistence
- Queue: Redis configuration and concurrency settings
- RAG: Vector store settings, chunk sizes, similarity thresholds
- Model Selection: Strategy and auto-loading preferences
- Monitoring: Logging levels and metrics collection
Alternatively, create a config.json file:
```json
{
  "lmstudio": {
    "baseUrl": "http://localhost:1234",
    "timeout": 60000
  },
  "langchain": {
    "enableRAG": true,
    "embeddingModel": "nomic-embed-text-v1.5",
    "chunkSize": 1000
  },
  "modelSelection": {
    "strategy": "performance",
    "autoLoad": true
  }
}
```

The server exposes the following MCP tools:

- `chat_completion`: Intelligent chat with context management
- `text_completion`: Text completion with model selection
- `rag_query`: RAG-powered question answering
- `load_model`: Load models with intelligent selection
- `unload_model`: Unload specific models
- `list_models`: List available and loaded models
- `get_model_info`: Get detailed model information
- `get_model_performance`: Performance metrics for all models
- `create_vector_store`: Create new vector stores
- `add_documents_to_vector_store`: Add documents from text or files
- `search_vector_store`: Semantic similarity search
- `list_vector_stores`: List all vector stores
- `delete_vector_store`: Remove vector stores
- `start_agentic_flow`: Begin autonomous tool-calling workflows
- `get_task_status`: Monitor task progress
- `cancel_task`: Cancel running tasks
- `create_embeddings`: Generate embeddings
- `similarity_search`: Calculate semantic similarity
- `count_tokens`: Token counting for any text
- `get_system_status`: Comprehensive system status
- `get_queue_stats`: Queue statistics and health
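All of these tools are invoked through a standard MCP client. As a minimal sketch, the snippet below uses the official TypeScript MCP SDK (`@modelcontextprotocol/sdk`) over stdio; the client name and the `dist/index.js` entry point are assumptions about this repository's build output, not confirmed paths. The JSON blocks that follow show example `arguments` payloads for individual tools.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the built server over stdio (entry point path is assumed).
const transport = new StdioClientTransport({
  command: "node",
  args: ["dist/index.js"],
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Call one of the tools listed above.
const result = await client.callTool({
  name: "chat_completion",
  arguments: {
    messages: "Explain quantum computing in simple terms",
    temperature: 0.7,
    maxTokens: 500,
  },
});
console.log(result.content);

await client.close();
```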
```json
{
  "tool": "chat_completion",
  "arguments": {
    "messages": "Explain quantum computing in simple terms",
    "temperature": 0.7,
    "maxTokens": 500,
    "modelHints": {
      "complexity": "medium"
    }
  }
}
```

```json
{
  "tool": "rag_query",
  "arguments": {
    "query": "What are the benefits of renewable energy?",
    "vectorStoreId": "environmental_docs",
    "maxTokens": 800,
    "retrievalConfig": {
      "maxRelevantChunks": 5,
      "similarityThreshold": 0.8
    }
  }
}
```

```json
{
  "tool": "create_vector_store",
  "arguments": {
    "id": "company_docs",
    "name": "Company Documentation",
    "filePaths": [
      "./docs/handbook.txt",
      "./docs/policies.md",
      "./docs/procedures.pdf"
    ]
  }
}
```

```json
{
  "tool": "start_agentic_flow",
  "arguments": {
    "name": "Data Analysis Flow",
    "description": "Analyze data and create visualizations",
    "initialPrompt": "Analyze the sales data and create a summary report",
    "tools": [
      {
        "name": "read_file",
        "description": "Read data from a file",
        "parameters": { "path": "string" },
        "implementation": "return fs.readFileSync(args.path, 'utf-8');"
      },
      {
        "name": "calculate_stats",
        "description": "Calculate basic statistics",
        "parameters": { "data": "array" },
        "implementation": "return { mean: args.data.reduce((a,b) => a+b)/args.data.length };"
      }
    ],
    "maxRounds": 10
  }
}
```

```json
{
  "tool": "load_model",
  "arguments": {
    "taskType": "code_generation",
    "complexity": "high",
    "requiresToolUse": true,
    "config": {
      "contextLength": 16384,
      "gpu": { "ratio": 0.8 }
    }
  }
}
```

The server automatically selects the best model for each task based on:
- Task complexity (auto-detected from input)
- Required capabilities (vision, tool use, etc.)
- Model performance metrics
- Resource availability
- User preferences
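The exact weighting is internal to the orchestrator, but the criteria above can be pictured as a simple scoring pass over candidate models. The sketch below is illustrative only; the metric fields and weights are hypothetical, not the server's actual types.

```typescript
// Illustrative scoring sketch for model selection; fields and weights are
// hypothetical, not the server's internal implementation.
interface ModelMetrics {
  avgLatencyMs: number;
  tokensPerSecond: number;
  successRate: number;      // 0..1
  isLoaded: boolean;
  capabilities: string[];   // e.g. ["vision", "tool_use"]
}

function scoreModel(m: ModelMetrics, required: string[]): number {
  // A model missing a required capability is never selected.
  if (!required.every((c) => m.capabilities.includes(c))) return -Infinity;
  let score = m.successRate * 100;   // reliability dominates
  score += m.tokensPerSecond / 10;   // reward throughput
  score -= m.avgLatencyMs / 1000;    // penalize slow models
  if (m.isLoaded) score += 20;       // prefer models that are already loaded
  return score;
}

function pickModel(models: Map<string, ModelMetrics>, required: string[] = []): string | undefined {
  let best: string | undefined;
  let bestScore = -Infinity;
  for (const [id, metrics] of models) {
    const s = scoreModel(metrics, required);
    if (s > bestScore) {
      bestScore = s;
      best = id;
    }
  }
  return best;
}
```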
- Priority Queues: HIGH, MEDIUM, LOW priority levels
- Concurrent Processing: Configurable concurrency per queue type
- Retry Logic: Exponential backoff for failed tasks
- Progress Tracking: Real-time task status monitoring
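These behaviors map naturally onto a Redis-backed job queue. The sketch below assumes a BullMQ-style queue; the queue name, payload shape, and retry settings are illustrative, and the server's actual queue implementation may differ.

```typescript
import { Queue } from "bullmq";

// Redis-backed task queue (connection settings are illustrative).
const taskQueue = new Queue("lmstudio-tasks", {
  connection: { host: "localhost", port: 6379 },
});

// In BullMQ, a lower priority number is processed first.
const PRIORITY = { HIGH: 1, MEDIUM: 5, LOW: 10 } as const;

await taskQueue.add(
  "chat_completion",
  { messages: "Summarize the quarterly report" },
  {
    priority: PRIORITY.HIGH,
    attempts: 3,                                   // retry failed tasks
    backoff: { type: "exponential", delay: 2000 }, // exponential backoff between retries
  },
);
```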
- Persistent Contexts: Maintain conversation state across requests
- Automatic Cleanup: Remove old contexts to manage memory
- Token Tracking: Monitor context length to prevent overflow
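Conceptually, context management amounts to a keyed store of message histories with token accounting and idle-time eviction. The sketch below is a simplified illustration; the limits and names are hypothetical, not the server's actual data structures.

```typescript
interface StoredMessage {
  role: "system" | "user" | "assistant";
  content: string;
  tokens: number;
}

interface ConversationContext {
  messages: StoredMessage[];
  tokenCount: number;
  lastUsed: number; // epoch ms
}

const contexts = new Map<string, ConversationContext>();
const MAX_CONTEXT_TOKENS = 8192;        // hypothetical model window
const CONTEXT_TTL_MS = 30 * 60 * 1000;  // evict after 30 minutes idle

function appendMessage(conversationId: string, message: StoredMessage): void {
  const ctx = contexts.get(conversationId) ??
    { messages: [], tokenCount: 0, lastUsed: Date.now() };
  ctx.messages.push(message);
  ctx.tokenCount += message.tokens;
  ctx.lastUsed = Date.now();

  // Drop the oldest messages when the context would overflow the model window.
  while (ctx.tokenCount > MAX_CONTEXT_TOKENS && ctx.messages.length > 1) {
    const dropped = ctx.messages.shift()!;
    ctx.tokenCount -= dropped.tokens;
  }
  contexts.set(conversationId, ctx);
}

// Periodic cleanup of idle conversations.
setInterval(() => {
  const now = Date.now();
  for (const [id, ctx] of contexts) {
    if (now - ctx.lastUsed > CONTEXT_TTL_MS) contexts.delete(id);
  }
}, 60_000);
```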
Get comprehensive system information:
```json
{
  "tool": "get_system_status",
  "arguments": {}
}
```

Returns:
- Queue statistics
- Loaded models
- Performance metrics
- RAG statistics
- Active conversations
- Memory usage
```json
{
  "tool": "get_model_performance",
  "arguments": {}
}
```

Tracks:
- Average latency per model
- Tokens per second
- Success rates
- Memory usage
- Usage patterns
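The sketch below shows one way such metrics can be accumulated per model; the class and field names are hypothetical, not the server's actual implementation.

```typescript
interface ModelStats {
  requests: number;
  failures: number;
  totalLatencyMs: number;
  totalTokens: number;
}

class PerformanceTracker {
  private stats = new Map<string, ModelStats>();

  record(modelId: string, latencyMs: number, tokens: number, ok: boolean): void {
    const s = this.stats.get(modelId) ??
      { requests: 0, failures: 0, totalLatencyMs: 0, totalTokens: 0 };
    s.requests += 1;
    if (!ok) s.failures += 1;
    s.totalLatencyMs += latencyMs;
    s.totalTokens += tokens;
    this.stats.set(modelId, s);
  }

  report(modelId: string) {
    const s = this.stats.get(modelId);
    if (!s || s.requests === 0) return undefined;
    return {
      avgLatencyMs: s.totalLatencyMs / s.requests,
      tokensPerSecond: s.totalTokens / (s.totalLatencyMs / 1000),
      successRate: 1 - s.failures / s.requests,
    };
  }
}
```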
Configurable log levels:
- `debug`: Detailed execution information
- `info`: General operational info
- `warn`: Warnings and degraded performance
- `error`: Errors and failures
Logs are written to:
- Console (with colors)
- `./logs/combined.log` (all logs)
- `./logs/error.log` (errors only)
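A logger configuration matching these outputs might look like the sketch below, assuming the server uses winston (an assumption; adapt to the actual logging setup).

```typescript
import winston from "winston";

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL ?? "info",
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.json(),
  ),
  transports: [
    // Console output with colors
    new winston.transports.Console({
      format: winston.format.combine(
        winston.format.colorize(),
        winston.format.simple(),
      ),
    }),
    // All logs
    new winston.transports.File({ filename: "./logs/combined.log" }),
    // Errors only
    new winston.transports.File({ filename: "./logs/error.log", level: "error" }),
  ],
});

logger.info("Server started");
logger.debug("Detailed execution information");
```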
- Performance: Select based on speed and accuracy metrics
- Availability: Use currently loaded models when possible
- Cost: Optimize for computational efficiency
- Chunk Sizes: Adjust for your document types
- Overlap: Balance context preservation vs. performance
- Similarity Thresholds: Fine-tune relevance filtering
- Reranking: Enable for improved result quality
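Chunk size and overlap correspond directly to LangChain's text splitter options. A minimal sketch follows; the import path may vary by LangChain version (e.g. `@langchain/textsplitters` in newer releases), and the file path reuses the example above.

```typescript
import { readFileSync } from "node:fs";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,    // matches chunkSize in the config.json example above
  chunkOverlap: 200,  // balances context preservation vs. performance
});

const text = readFileSync("./docs/handbook.txt", "utf-8");
const chunks = await splitter.splitText(text);
console.log(`Produced ${chunks.length} chunks`);
```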
- Concurrency: Balance throughput vs. resource usage
- TTL: Automatic model unloading after idle time
- Cleanup: Regular maintenance of completed tasks
- Connection Refused: Ensure LM Studio API is running on the correct port
- Redis Connection: Verify Redis server is running and accessible
- Model Loading: Check available models in LM Studio
- Memory Issues: Adjust concurrency and model limits
- Performance: Monitor queue stats and adjust priorities
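For the first two issues, a quick connectivity check can confirm that both dependencies are reachable. This sketch assumes LM Studio's OpenAI-compatible API on port 1234 and the `ioredis` client for Redis.

```typescript
import Redis from "ioredis";

// LM Studio: the OpenAI-compatible /v1/models endpoint lists available models.
const res = await fetch("http://localhost:1234/v1/models");
console.log("LM Studio reachable:", res.ok);

// Redis: a PING should answer "PONG".
const redis = new Redis({ host: "localhost", port: 6379 });
console.log("Redis reachable:", (await redis.ping()) === "PONG");
await redis.quit();
```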
For deeper debugging, run the server with verbose logging:

```bash
LOG_LEVEL=debug npm start
```

and inspect the server state with the `get_system_status` tool:

```json
{
  "tool": "get_system_status",
  "arguments": {}
}
```

To contribute:

- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
For issues and questions:
- Check the troubleshooting section
- Review logs for error details
- Open an issue with reproduction steps
- Include system status output when reporting bugs