A comprehensive Model Context Protocol (MCP) server that provides intelligent orchestration, queuing, and advanced features for LM Studio integration.
- Intelligent Model Selection: Automatically selects the best model based on task complexity, requirements, and performance metrics
- Task Queuing & Orchestration: Asynchronous task processing with priority queues and intelligent scheduling
- RAG Integration: Retrieval-Augmented Generation with vector stores and similarity search
- LangChain Integration: Advanced text processing, chunking, and embeddings
- Agentic Flows: Support for autonomous tool-calling workflows
- Conversation Context Management: Persistent conversation contexts with automatic cleanup
- Performance Monitoring: Real-time model performance tracking and optimization
- Multi-Model Support: Seamless switching between LLM and embedding models
- Streaming Responses: Real-time streaming for chat completions
- Vision Model Support: Integration with vision-language models
- Tool Integration: Custom tool definitions for agentic workflows
- Vector Stores: Create and manage multiple vector stores for RAG
- Similarity Search: Advanced semantic search capabilities
- Configuration Management: Flexible configuration with environment variables and files
```
 ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
 │   MCP Client    │◄──►│   MCP Server    │◄──►│    LM Studio    │
 └─────────────────┘    └─────────────────┘    └─────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │  Orchestrator   │
                        └────────┬────────┘
                                 │
                ┌────────────────┼────────────────┐
                ▼                ▼                ▼
         ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
         │  Task Queue  │ │ RAG Service  │ │  LangChain   │
         └──────────────┘ └──────────────┘ └──────────────┘
                │                │                │
                ▼                ▼                ▼
         ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
         │ Redis Queue  │ │ Vector Store │ │ Text Splitter│
         └──────────────┘ └──────────────┘ └──────────────┘
```
- Node.js 18+
- LM Studio running with API enabled
- Redis server (for queuing)
- TypeScript
- Clone and install dependencies:

```bash
cd servers/lmstudio-server
npm install
```

- Configure environment:

```bash
cp .env.example .env
# Edit .env with your settings
```

- Build the server:

```bash
npm run build
```
- Start LM Studio:
  - Launch LM Studio
  - Load your preferred models
  - Enable the API server (typically on port 1234)
- Start Redis (required for queuing):

```bash
# On Windows with Redis installed
redis-server

# On macOS with Homebrew
brew services start redis

# On Ubuntu/Debian
sudo systemctl start redis-server
```

- Run the server:

```bash
npm start
```

See `.env.example` for all available configuration options:
- LM Studio: Connection settings, timeouts, retry logic
- Database: SQLite, PostgreSQL, or MongoDB for persistence
- Queue: Redis configuration and concurrency settings
- RAG: Vector store settings, chunk sizes, similarity thresholds
- Model Selection: Strategy and auto-loading preferences
- Monitoring: Logging levels and metrics collection
Alternatively, create a config.json file:
```json
{
  "lmstudio": {
    "baseUrl": "http://localhost:1234",
    "timeout": 60000
  },
  "langchain": {
    "enableRAG": true,
    "embeddingModel": "nomic-embed-text-v1.5",
    "chunkSize": 1000
  },
  "modelSelection": {
    "strategy": "performance",
    "autoLoad": true
  }
}
```

The server exposes the following MCP tools:

- `chat_completion`: Intelligent chat with context management
- `text_completion`: Text completion with model selection
- `rag_query`: RAG-powered question answering
- `load_model`: Load models with intelligent selection
- `unload_model`: Unload specific models
- `list_models`: List available and loaded models
- `get_model_info`: Get detailed model information
- `get_model_performance`: Performance metrics for all models
- `create_vector_store`: Create new vector stores
- `add_documents_to_vector_store`: Add documents from text or files
- `search_vector_store`: Semantic similarity search
- `list_vector_stores`: List all vector stores
- `delete_vector_store`: Remove vector stores
- `start_agentic_flow`: Begin autonomous tool-calling workflows
- `get_task_status`: Monitor task progress
- `cancel_task`: Cancel running tasks
- `create_embeddings`: Generate embeddings
- `similarity_search`: Calculate semantic similarity
- `count_tokens`: Token counting for any text
- `get_system_status`: Comprehensive system status
- `get_queue_stats`: Queue statistics and health
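All of these tools are invoked through a standard MCP client. As a minimal sketch, the snippet below uses the official TypeScript MCP SDK (`@modelcontextprotocol/sdk`) over stdio; the client name and the `dist/index.js` entry point are assumptions about this repository's build output, not confirmed paths. The JSON blocks that follow show example `arguments` payloads for individual tools.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the built server over stdio (entry point path is assumed).
const transport = new StdioClientTransport({
  command: "node",
  args: ["dist/index.js"],
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Call one of the tools listed above.
const result = await client.callTool({
  name: "chat_completion",
  arguments: {
    messages: "Explain quantum computing in simple terms",
    temperature: 0.7,
    maxTokens: 500,
  },
});
console.log(result.content);

await client.close();
```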
```json
{
  "tool": "chat_completion",
  "arguments": {
    "messages": "Explain quantum computing in simple terms",
    "temperature": 0.7,
    "maxTokens": 500,
    "modelHints": {
      "complexity": "medium"
    }
  }
}
```

```json
{
  "tool": "rag_query",
  "arguments": {
    "query": "What are the benefits of renewable energy?",
    "vectorStoreId": "environmental_docs",
    "maxTokens": 800,
    "retrievalConfig": {
      "maxRelevantChunks": 5,
      "similarityThreshold": 0.8
    }
  }
}
```

```json
{
  "tool": "create_vector_store",
  "arguments": {
    "id": "company_docs",
    "name": "Company Documentation",
    "filePaths": [
      "./docs/handbook.txt",
      "./docs/policies.md",
      "./docs/procedures.pdf"
    ]
  }
}
```

```json
{
  "tool": "start_agentic_flow",
  "arguments": {
    "name": "Data Analysis Flow",
    "description": "Analyze data and create visualizations",
    "initialPrompt": "Analyze the sales data and create a summary report",
    "tools": [
      {
        "name": "read_file",
        "description": "Read data from a file",
        "parameters": { "path": "string" },
        "implementation": "return fs.readFileSync(args.path, 'utf-8');"
      },
      {
        "name": "calculate_stats",
        "description": "Calculate basic statistics",
        "parameters": { "data": "array" },
        "implementation": "return { mean: args.data.reduce((a,b) => a+b)/args.data.length };"
      }
    ],
    "maxRounds": 10
  }
}
```

```json
{
  "tool": "load_model",
  "arguments": {
    "taskType": "code_generation",
    "complexity": "high",
    "requiresToolUse": true,
    "config": {
      "contextLength": 16384,
      "gpu": { "ratio": 0.8 }
    }
  }
}
```

The server automatically selects the best model for each task based on:
- Task complexity (auto-detected from input)
- Required capabilities (vision, tool use, etc.)
- Model performance metrics
- Resource availability
- User preferences
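The exact weighting is internal to the orchestrator, but the criteria above can be pictured as a simple scoring pass over candidate models. The sketch below is illustrative only; the metric fields and weights are hypothetical, not the server's actual types.

```typescript
// Illustrative scoring sketch for model selection; fields and weights are
// hypothetical, not the server's internal implementation.
interface ModelMetrics {
  avgLatencyMs: number;
  tokensPerSecond: number;
  successRate: number;      // 0..1
  isLoaded: boolean;
  capabilities: string[];   // e.g. ["vision", "tool_use"]
}

function scoreModel(m: ModelMetrics, required: string[]): number {
  // A model missing a required capability is never selected.
  if (!required.every((c) => m.capabilities.includes(c))) return -Infinity;
  let score = m.successRate * 100;   // reliability dominates
  score += m.tokensPerSecond / 10;   // reward throughput
  score -= m.avgLatencyMs / 1000;    // penalize slow models
  if (m.isLoaded) score += 20;       // prefer models that are already loaded
  return score;
}

function pickModel(models: Map<string, ModelMetrics>, required: string[] = []): string | undefined {
  let best: string | undefined;
  let bestScore = -Infinity;
  for (const [id, metrics] of models) {
    const s = scoreModel(metrics, required);
    if (s > bestScore) {
      bestScore = s;
      best = id;
    }
  }
  return best;
}
```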
- Priority Queues: HIGH, MEDIUM, LOW priority levels
- Concurrent Processing: Configurable concurrency per queue type
- Retry Logic: Exponential backoff for failed tasks
- Progress Tracking: Real-time task status monitoring
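These behaviors map naturally onto a Redis-backed job queue. The sketch below assumes a BullMQ-style queue; the queue name, payload shape, and retry settings are illustrative, and the server's actual queue implementation may differ.

```typescript
import { Queue } from "bullmq";

// Redis-backed task queue (connection settings are illustrative).
const taskQueue = new Queue("lmstudio-tasks", {
  connection: { host: "localhost", port: 6379 },
});

// In BullMQ, a lower priority number is processed first.
const PRIORITY = { HIGH: 1, MEDIUM: 5, LOW: 10 } as const;

await taskQueue.add(
  "chat_completion",
  { messages: "Summarize the quarterly report" },
  {
    priority: PRIORITY.HIGH,
    attempts: 3,                                   // retry failed tasks
    backoff: { type: "exponential", delay: 2000 }, // exponential backoff between retries
  },
);
```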
- Persistent Contexts: Maintain conversation state across requests
- Automatic Cleanup: Remove old contexts to manage memory
- Token Tracking: Monitor context length to prevent overflow
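Conceptually, context management amounts to a keyed store of message histories with token accounting and idle-time eviction. The sketch below is a simplified illustration; the limits and names are hypothetical, not the server's actual data structures.

```typescript
interface StoredMessage {
  role: "system" | "user" | "assistant";
  content: string;
  tokens: number;
}

interface ConversationContext {
  messages: StoredMessage[];
  tokenCount: number;
  lastUsed: number; // epoch ms
}

const contexts = new Map<string, ConversationContext>();
const MAX_CONTEXT_TOKENS = 8192;        // hypothetical model window
const CONTEXT_TTL_MS = 30 * 60 * 1000;  // evict after 30 minutes idle

function appendMessage(conversationId: string, message: StoredMessage): void {
  const ctx = contexts.get(conversationId) ??
    { messages: [], tokenCount: 0, lastUsed: Date.now() };
  ctx.messages.push(message);
  ctx.tokenCount += message.tokens;
  ctx.lastUsed = Date.now();

  // Drop the oldest messages when the context would overflow the model window.
  while (ctx.tokenCount > MAX_CONTEXT_TOKENS && ctx.messages.length > 1) {
    const dropped = ctx.messages.shift()!;
    ctx.tokenCount -= dropped.tokens;
  }
  contexts.set(conversationId, ctx);
}

// Periodic cleanup of idle conversations.
setInterval(() => {
  const now = Date.now();
  for (const [id, ctx] of contexts) {
    if (now - ctx.lastUsed > CONTEXT_TTL_MS) contexts.delete(id);
  }
}, 60_000);
```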
Get comprehensive system information:
```json
{
  "tool": "get_system_status",
  "arguments": {}
}
```

Returns:
- Queue statistics
- Loaded models
- Performance metrics
- RAG statistics
- Active conversations
- Memory usage
```json
{
  "tool": "get_model_performance",
  "arguments": {}
}
```

Tracks:
- Average latency per model
- Tokens per second
- Success rates
- Memory usage
- Usage patterns
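The sketch below shows one way such metrics can be accumulated per model; the class and field names are hypothetical, not the server's actual implementation.

```typescript
interface ModelStats {
  requests: number;
  failures: number;
  totalLatencyMs: number;
  totalTokens: number;
}

class PerformanceTracker {
  private stats = new Map<string, ModelStats>();

  record(modelId: string, latencyMs: number, tokens: number, ok: boolean): void {
    const s = this.stats.get(modelId) ??
      { requests: 0, failures: 0, totalLatencyMs: 0, totalTokens: 0 };
    s.requests += 1;
    if (!ok) s.failures += 1;
    s.totalLatencyMs += latencyMs;
    s.totalTokens += tokens;
    this.stats.set(modelId, s);
  }

  report(modelId: string) {
    const s = this.stats.get(modelId);
    if (!s || s.requests === 0) return undefined;
    return {
      avgLatencyMs: s.totalLatencyMs / s.requests,
      tokensPerSecond: s.totalTokens / (s.totalLatencyMs / 1000),
      successRate: 1 - s.failures / s.requests,
    };
  }
}
```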
Configurable log levels:
- `debug`: Detailed execution information
- `info`: General operational info
- `warn`: Warnings and degraded performance
- `error`: Errors and failures
Logs are written to:
- Console (with colors)
- `./logs/combined.log` (all logs)
- `./logs/error.log` (errors only)
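A logger configuration matching these outputs might look like the sketch below, assuming the server uses winston (an assumption; adapt to the actual logging setup).

```typescript
import winston from "winston";

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL ?? "info",
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.json(),
  ),
  transports: [
    // Console output with colors
    new winston.transports.Console({
      format: winston.format.combine(
        winston.format.colorize(),
        winston.format.simple(),
      ),
    }),
    // All logs
    new winston.transports.File({ filename: "./logs/combined.log" }),
    // Errors only
    new winston.transports.File({ filename: "./logs/error.log", level: "error" }),
  ],
});

logger.info("Server started");
logger.debug("Detailed execution information");
```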
- Performance: Select based on speed and accuracy metrics
- Availability: Use currently loaded models when possible
- Cost: Optimize for computational efficiency
- Chunk Sizes: Adjust for your document types
- Overlap: Balance context preservation vs. performance
- Similarity Thresholds: Fine-tune relevance filtering
- Reranking: Enable for improved result quality
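Chunk size and overlap correspond directly to LangChain's text splitter options. A minimal sketch follows; the import path may vary by LangChain version (e.g. `@langchain/textsplitters` in newer releases), and the file path reuses the example above.

```typescript
import { readFileSync } from "node:fs";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,    // matches chunkSize in the config.json example above
  chunkOverlap: 200,  // balances context preservation vs. performance
});

const text = readFileSync("./docs/handbook.txt", "utf-8");
const chunks = await splitter.splitText(text);
console.log(`Produced ${chunks.length} chunks`);
```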
- Concurrency: Balance throughput vs. resource usage
- TTL: Automatic model unloading after idle time
- Cleanup: Regular maintenance of completed tasks
- Connection Refused: Ensure LM Studio API is running on the correct port
- Redis Connection: Verify Redis server is running and accessible
- Model Loading: Check available models in LM Studio
- Memory Issues: Adjust concurrency and model limits
- Performance: Monitor queue stats and adjust priorities
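For the first two issues, a quick connectivity check can confirm that both dependencies are reachable. This sketch assumes LM Studio's OpenAI-compatible API on port 1234 and the `ioredis` client for Redis.

```typescript
import Redis from "ioredis";

// LM Studio: the OpenAI-compatible /v1/models endpoint lists available models.
const res = await fetch("http://localhost:1234/v1/models");
console.log("LM Studio reachable:", res.ok);

// Redis: a PING should answer "PONG".
const redis = new Redis({ host: "localhost", port: 6379 });
console.log("Redis reachable:", (await redis.ping()) === "PONG");
await redis.quit();
```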
For deeper debugging, run the server with verbose logging:

```bash
LOG_LEVEL=debug npm start
```

and inspect the server state with the `get_system_status` tool:

```json
{
  "tool": "get_system_status",
  "arguments": {}
}
```

To contribute:

- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
For issues and questions:
- Check the troubleshooting section
- Review logs for error details
- Open an issue with reproduction steps
- Include system status output when reporting bugs