Advanced LM Studio MCP Server

A comprehensive Model Context Protocol (MCP) server that provides intelligent orchestration, queuing, and advanced features for LM Studio integration.

Features

🚀 Core Capabilities

  • Intelligent Model Selection: Automatically selects the best model based on task complexity, requirements, and performance metrics
  • Task Queuing & Orchestration: Asynchronous task processing with priority queues and intelligent scheduling
  • RAG Integration: Retrieval-Augmented Generation with vector stores and similarity search
  • LangChain Integration: Advanced text processing, chunking, and embeddings
  • Agentic Flows: Support for autonomous tool-calling workflows
  • Conversation Context Management: Persistent conversation contexts with automatic cleanup
  • Performance Monitoring: Real-time model performance tracking and optimization

🔧 Advanced Features

  • Multi-Model Support: Seamless switching between LLM and embedding models
  • Streaming Responses: Real-time streaming for chat completions
  • Vision Model Support: Integration with vision-language models
  • Tool Integration: Custom tool definitions for agentic workflows
  • Vector Stores: Create and manage multiple vector stores for RAG
  • Similarity Search: Advanced semantic search capabilities
  • Configuration Management: Flexible configuration with environment variables and files

Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   MCP Client    │◄──►│  MCP Server     │◄──►│   LM Studio     │
│                 │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                              │
                              ▼
                    ┌─────────────────┐
                    │  Orchestrator   │
                    │                 │
                    └─────┬───────────┘
                          │
            ┌─────────────┼─────────────┐
            ▼             ▼             ▼
    ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
    │ Task Queue   │ │ RAG Service  │ │ LangChain    │
    │              │ │              │ │              │
    └──────────────┘ └──────────────┘ └──────────────┘
            │             │             │
            ▼             ▼             ▼
    ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
    │ Redis Queue  │ │ Vector Store │ │ Text Splitter│
    │              │ │              │ │              │
    └──────────────┘ └──────────────┘ └──────────────┘

Installation

Prerequisites

  • Node.js 18+
  • LM Studio running with API enabled
  • Redis server (for queuing)
  • TypeScript

Setup

  1. Clone and install dependencies:

cd servers/lmstudio-server
npm install

  2. Configure environment:

cp .env.example .env
# Edit .env with your settings

  3. Build the server:

npm run build

  4. Start LM Studio:

    • Launch LM Studio
    • Load your preferred models
    • Enable the API server (typically on port 1234)

  5. Start Redis (required for queuing):

# On Windows with Redis installed
redis-server

# On macOS with Homebrew
brew services start redis

# On Ubuntu/Debian
sudo systemctl start redis-server

  6. Run the server:

npm start
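
To call the server from an MCP client, register it in the client's MCP configuration. A minimal sketch, assuming the build output lives at dist/index.js and the client uses the common mcpServers layout (the path and the environment variable name are illustrative; check .env.example and your client's documentation for the actual keys):

{
  "mcpServers": {
    "lmstudio": {
      "command": "node",
      "args": ["./servers/lmstudio-server/dist/index.js"],
      "env": {
        "LMSTUDIO_BASE_URL": "http://localhost:1234"
      }
    }
  }
}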

Configuration

Environment Variables

See .env.example for all available configuration options:

  • LM Studio: Connection settings, timeouts, retry logic
  • Database: SQLite, PostgreSQL, or MongoDB for persistence
  • Queue: Redis configuration and concurrency settings
  • RAG: Vector store settings, chunk sizes, similarity thresholds
  • Model Selection: Strategy and auto-loading preferences
  • Monitoring: Logging levels and metrics collection

Configuration File

Alternatively, create a config.json file:

{
  "lmstudio": {
    "baseUrl": "http://localhost:1234",
    "timeout": 60000
  },
  "langchain": {
    "enableRAG": true,
    "embeddingModel": "nomic-embed-text-v1.5",
    "chunkSize": 1000
  },
  "modelSelection": {
    "strategy": "performance",
    "autoLoad": true
  }
}

Available Tools

Chat & Completion

  • chat_completion: Intelligent chat with context management
  • text_completion: Text completion with model selection
  • rag_query: RAG-powered question answering

Model Management

  • load_model: Load models with intelligent selection
  • unload_model: Unload specific models
  • list_models: List available and loaded models
  • get_model_info: Get detailed model information
  • get_model_performance: Performance metrics for all models

Vector Stores & RAG

  • create_vector_store: Create new vector stores
  • add_documents_to_vector_store: Add documents from text or files
  • search_vector_store: Semantic similarity search
  • list_vector_stores: List all vector stores
  • delete_vector_store: Remove vector stores

Agentic Flows

  • start_agentic_flow: Begin autonomous tool-calling workflows
  • get_task_status: Monitor task progress
  • cancel_task: Cancel running tasks

Utilities

  • create_embeddings: Generate embeddings
  • similarity_search: Calculate semantic similarity
  • count_tokens: Token counting for any text
  • get_system_status: Comprehensive system status
  • get_queue_stats: Queue statistics and health
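
All tools are invoked with the same JSON request shape shown in the usage examples below. As a hedged sketch, a count_tokens call might look like this (the text argument name is an assumption; check the tool schema the server exposes):

{
  "tool": "count_tokens",
  "arguments": {
    "text": "How many tokens does this sentence use?"
  }
}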

Usage Examples

Basic Chat Completion

{
  "tool": "chat_completion",
  "arguments": {
    "messages": "Explain quantum computing in simple terms",
    "temperature": 0.7,
    "maxTokens": 500,
    "modelHints": {
      "complexity": "medium"
    }
  }
}
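
Streaming responses are listed among the core capabilities; a hedged variant of the same call with streaming enabled (the stream flag name is an assumption, not confirmed by the source):

{
  "tool": "chat_completion",
  "arguments": {
    "messages": "Summarize the latest sprint retrospective in three bullet points",
    "stream": true,
    "temperature": 0.7
  }
}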

RAG Query with Context

{
  "tool": "rag_query",
  "arguments": {
    "query": "What are the benefits of renewable energy?",
    "vectorStoreId": "environmental_docs",
    "maxTokens": 800,
    "retrievalConfig": {
      "maxRelevantChunks": 5,
      "similarityThreshold": 0.8
    }
  }
}

Create Vector Store from Documents

{
  "tool": "create_vector_store",
  "arguments": {
    "id": "company_docs",
    "name": "Company Documentation",
    "filePaths": [
      "./docs/handbook.txt",
      "./docs/policies.md",
      "./docs/procedures.pdf"
    ]
  }
}
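
Once a store exists, documents can be appended with add_documents_to_vector_store. A hedged sketch, assuming it accepts the same vectorStoreId and filePaths names used elsewhere in this document plus a texts array (the texts key is an assumption):

{
  "tool": "add_documents_to_vector_store",
  "arguments": {
    "vectorStoreId": "company_docs",
    "texts": [
      "Remote-work requests are reviewed by the team lead each quarter."
    ],
    "filePaths": [
      "./docs/onboarding.md"
    ]
  }
}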

Start Agentic Flow with Tools

{
  "tool": "start_agentic_flow",
  "arguments": {
    "name": "Data Analysis Flow",
    "description": "Analyze data and create visualizations",
    "initialPrompt": "Analyze the sales data and create a summary report",
    "tools": [
      {
        "name": "read_file",
        "description": "Read data from a file",
        "parameters": { "path": "string" },
        "implementation": "return fs.readFileSync(args.path, 'utf-8');"
      },
      {
        "name": "calculate_stats",
        "description": "Calculate basic statistics",
        "parameters": { "data": "array" },
        "implementation": "return { mean: args.data.reduce((a,b) => a+b)/args.data.length };"
      }
    ],
    "maxRounds": 10
  }
}
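
Agentic flows run asynchronously through the task queue, so progress is checked with get_task_status. A sketch, assuming the flow call returns a task identifier and the argument is named taskId (both are assumptions):

{
  "tool": "get_task_status",
  "arguments": {
    "taskId": "task_123"
  }
}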

Intelligent Model Loading

{
  "tool": "load_model",
  "arguments": {
    "taskType": "code_generation",
    "complexity": "high",
    "requiresToolUse": true,
    "config": {
      "contextLength": 16384,
      "gpu": { "ratio": 0.8 }
    }
  }
}

Performance Features

Intelligent Model Selection

The server automatically selects the best model for each task based on:

  • Task complexity (auto-detected from input)
  • Required capabilities (vision, tool use, etc.)
  • Model performance metrics
  • Resource availability
  • User preferences

Queue Management

  • Priority Queues: HIGH, MEDIUM, LOW priority levels
  • Concurrent Processing: Configurable concurrency per queue type
  • Retry Logic: Exponential backoff for failed tasks
  • Progress Tracking: Real-time task status monitoring
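
Queue health can be inspected at any time with get_queue_stats; assuming it follows the same no-argument pattern as get_system_status, a call looks like this:

{
  "tool": "get_queue_stats",
  "arguments": {}
}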

Context Management

  • Persistent Contexts: Maintain conversation state across requests
  • Automatic Cleanup: Remove old contexts to manage memory
  • Token Tracking: Monitor context length to prevent overflow
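
A persistent context is reused by passing an identifier on subsequent chat_completion calls. A hedged sketch (conversationId is a hypothetical argument name; verify it against the tool schema):

{
  "tool": "chat_completion",
  "arguments": {
    "messages": "Continue from where we left off and list the next steps",
    "conversationId": "onboarding-session-1"
  }
}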

Monitoring & Debugging

System Status

Get comprehensive system information:

{
  "tool": "get_system_status",
  "arguments": {}
}

Returns:

  • Queue statistics
  • Loaded models
  • Performance metrics
  • RAG statistics
  • Active conversations
  • Memory usage

Performance Metrics

{
  "tool": "get_model_performance",
  "arguments": {}
}

Tracks:

  • Average latency per model
  • Tokens per second
  • Success rates
  • Memory usage
  • Usage patterns

Logging

Configurable log levels:

  • debug: Detailed execution information
  • info: General operational info
  • warn: Warnings and degraded performance
  • error: Errors and failures

Logs are written to:

  • Console (with colors)
  • ./logs/combined.log (all logs)
  • ./logs/error.log (errors only)

Advanced Configuration

Model Selection Strategies

  1. Performance: Select based on speed and accuracy metrics
  2. Availability: Use currently loaded models when possible
  3. Cost: Optimize for computational efficiency

RAG Configuration

  • Chunk Sizes: Adjust for your document types
  • Overlap: Balance context preservation vs. performance
  • Similarity Thresholds: Fine-tune relevance filtering
  • Reranking: Enable for improved result quality
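
These knobs map onto the langchain block of config.json. A hedged sketch extending the earlier example (chunkOverlap, similarityThreshold, and enableReranking are illustrative key names; verify them against .env.example):

{
  "langchain": {
    "enableRAG": true,
    "chunkSize": 1000,
    "chunkOverlap": 200,
    "similarityThreshold": 0.8,
    "enableReranking": false
  }
}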

Queue Optimization

  • Concurrency: Balance throughput vs. resource usage
  • TTL: Automatic model unloading after idle time
  • Cleanup: Regular maintenance of completed tasks

Troubleshooting

Common Issues

  1. Connection Refused: Ensure LM Studio API is running on the correct port
  2. Redis Connection: Verify Redis server is running and accessible
  3. Model Loading: Check available models in LM Studio
  4. Memory Issues: Adjust concurrency and model limits
  5. Performance: Monitor queue stats and adjust priorities

Debug Mode

LOG_LEVEL=debug npm start

Health Checks

{
  "tool": "get_system_status",
  "arguments": {}
}

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For issues and questions:

  • Check the troubleshooting section
  • Review logs for error details
  • Open an issue with reproduction steps
  • Include system status output when reporting bugs
