Reliable model swapping for any local OpenAI/Anthropic-compatible server (llama.cpp, vLLM, etc.)
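A rough sketch of how a client typically drives such a swapping proxy: requests use the standard OpenAI-style chat completions endpoint, and the `model` field decides which backend the proxy loads, so "swapping" is just changing that field between requests. The proxy address and model names below are illustrative assumptions, not values from any specific project.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// chatRequest mirrors the minimal OpenAI-compatible chat completion payload.
type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

func main() {
	// Hypothetical proxy address; the proxy loads or unloads the matching
	// backend based on the model named in each request.
	const proxyURL = "http://localhost:8080/v1/chat/completions"

	for _, model := range []string{"llama-3.1-8b", "qwen2.5-coder-7b"} {
		body, _ := json.Marshal(chatRequest{
			Model:    model,
			Messages: []message{{Role: "user", Content: "Say hello."}},
		})

		resp, err := http.Post(proxyURL, "application/json", bytes.NewReader(body))
		if err != nil {
			fmt.Println("request failed:", err)
			continue
		}
		out, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Printf("%s -> %s\n", model, out)
	}
}
```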
Go library for embedded vector search and semantic embeddings using llama.cpp
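To show what embedded vector search boils down to, here is a minimal, library-agnostic Go sketch: brute-force cosine similarity over in-memory embeddings. The hard-coded vectors stand in for whatever an embedding model (e.g. via llama.cpp) would return; nothing here is taken from the listed library's actual API.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

type doc struct {
	Text string
	Vec  []float32 // embedding produced elsewhere, e.g. by a llama.cpp model
}

func main() {
	// Tiny hand-made "embeddings" purely for demonstration.
	corpus := []doc{
		{"grpc server setup", []float32{0.9, 0.1, 0.0}},
		{"baking sourdough", []float32{0.1, 0.8, 0.3}},
		{"http load balancing", []float32{0.8, 0.0, 0.2}},
	}
	query := []float32{0.85, 0.05, 0.1}

	// Rank documents by similarity to the query vector.
	sort.Slice(corpus, func(i, j int) bool {
		return cosine(corpus[i].Vec, query) > cosine(corpus[j].Vec, query)
	})
	for _, d := range corpus {
		fmt.Printf("%.3f  %s\n", cosine(d.Vec, query), d.Text)
	}
}
```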
☸️ Easy-to-use, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
🚢 Yet another operator for running large language models on Kubernetes with ease. Powered by Ollama! 🐫
High-performance lightweight proxy and load balancer for LLM infrastructure. Intelligent routing, automatic failover and unified model discovery across local and remote inference backends.
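A simplified sketch of the failover idea such a proxy implements: buffer the request, try the preferred backend, and transparently retry against the next one if it is unreachable. The backend URLs are assumptions for illustration; a real load balancer would add health caching, per-backend timeouts, and routing policy.

```go
package main

import (
	"bytes"
	"io"
	"log"
	"net/http"
	"time"
)

// backends is an assumed list of upstream inference servers, in priority order.
var backends = []string{
	"http://127.0.0.1:8081",
	"http://127.0.0.1:8082",
}

var client = &http.Client{Timeout: 120 * time.Second}

// handler buffers the request body and replays it against each backend
// until one responds, giving simple priority-ordered failover.
func handler(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "bad request body", http.StatusBadRequest)
		return
	}
	for _, base := range backends {
		req, _ := http.NewRequest(r.Method, base+r.URL.RequestURI(), bytes.NewReader(body))
		req.Header = r.Header.Clone()

		resp, err := client.Do(req)
		if err != nil {
			log.Printf("backend %s failed: %v, trying next", base, err)
			continue // fail over to the next backend
		}
		defer resp.Body.Close()
		for k, vs := range resp.Header {
			for _, v := range vs {
				w.Header().Add(k, v)
			}
		}
		w.WriteHeader(resp.StatusCode)
		io.Copy(w, resp.Body)
		return
	}
	http.Error(w, "all backends unavailable", http.StatusBadGateway)
}

func main() {
	log.Fatal(http.ListenAndServe(":8080", http.HandlerFunc(handler)))
}
```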
Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard.
Inference Hub for AI at Scale
Eternal is an experimental platform for machine learning models and workflows.
Go package and example utilities for using Ollama / LLMs
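For context on what such utilities wrap, here is a minimal direct call to Ollama's HTTP API (`/api/generate`) from plain Go; the model name is a placeholder and no package from the listed repository is used.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// generateRequest / generateResponse match Ollama's /api/generate endpoint.
type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"`
}

type generateResponse struct {
	Response string `json:"response"`
}

func main() {
	// "llama3.2" is a placeholder; use whatever model is pulled locally.
	body, _ := json.Marshal(generateRequest{
		Model:  "llama3.2",
		Prompt: "In one sentence, what is a goroutine?",
		Stream: false,
	})

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("ollama not reachable:", err)
		return
	}
	defer resp.Body.Close()

	var out generateResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(out.Response)
}
```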
Fast LLM swapping with sleep/wake support, compatible with vLLM, llama.cpp, etc. A fork of llama-swap.
Local LLM proxy, DevOps friendly
A Model Context Protocol (MCP) server written in Go that provides text completion capabilities using local llama.cpp models. The server exposes a single MCP tool that accepts text prompts and returns AI-generated completions from locally hosted language models.
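To make the "single MCP tool" concrete, this is a hedged sketch of the JSON-RPC 2.0 `tools/call` message shape that MCP clients send, expressed as Go structs. The tool name `complete` and the `prompt` argument are assumptions for illustration, since the description does not name them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCallRequest is the JSON-RPC 2.0 envelope MCP uses for "tools/call".
type toolCallRequest struct {
	JSONRPC string         `json:"jsonrpc"`
	ID      int            `json:"id"`
	Method  string         `json:"method"`
	Params  toolCallParams `json:"params"`
}

type toolCallParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

func main() {
	// Hypothetical tool name and argument; a real server documents its own.
	req := toolCallRequest{
		JSONRPC: "2.0",
		ID:      1,
		Method:  "tools/call",
		Params: toolCallParams{
			Name:      "complete",
			Arguments: map[string]any{"prompt": "Write a haiku about Go."},
		},
	}
	out, _ := json.MarshalIndent(req, "", "  ")
	fmt.Println(string(out)) // this JSON is what the client writes to the server
}
```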
DisCEdge: Distributed Context Management for LLMs at the Edge