Intelligent load balancer for distributed vLLM server clusters
agentsculptor is an experimental AI-powered development agent designed to analyze, refactor, and extend Python projects automatically. It uses an OpenAI-like planner-executor loop on top of a vLLM backend, combining project context analysis, structured tool calls, and iterative refinement. It has only been tested with gpt-oss-120b via vLLM.
A curated list of plugins built on top of vLLM
Deploy the Magistral-Small-2506 model using vLLM and Modal
Wheels & Docker images for running vLLM on CPU-only systems, optimized for different CPU instruction sets
Performant LLM inferencing on Kubernetes via vLLM
Terraform configuration for the vLLM production stack on cloud-managed Kubernetes
[KAIST CS632] Road damage detection using YOLOv8 on a Xilinx FPGA, repair estimation with vLLM-served Phi-3.5 and FAISS RAG, and data management via GS1 EPCISv2 and a React dashboard
A production-ready RAG (Retrieval-Augmented Generation) API built on FastAPI and powered by the high-performance vLLM engine
Load testing openai/gpt-oss-20b with vLLM and Docker
Production-grade vLLM serving with an OpenAI-compatible API, per-request LoRA routing, KEDA autoscaling on Prometheus metrics, Grafana/OTel observability, and a benchmark comparing AWQ vs GPTQ vs GGUF.
A simple app to generate captions for your Instagram posts using the `JoyCaption` model hosted on RunPod.io
Fine-tuned LLM for a domain use case, with inference via vLLM and serving on Ollama
An OCR model fine-tuned from Vintern1B (InternVL 1B), with 1 billion parameters. The model can recognize text in a variety of contexts, including handwriting, printed text, and text on real-world objects.
[2024 Elice AI Hellothon Excellence Award (2nd Place)] Saem, Sam: a lesson-guide creator for caregiver cognitive activities and an interactive AI drawing-diary service for the elderly
A comprehensive framework for multi-node, multi-GPU scalable LLM inference on HPC systems using vLLM and Ollama. Includes distributed deployment templates, benchmarking workflows, and chatbot/RAG pipelines for high-throughput, production-grade AI services
Project to set up a UI for users to interact with an LLM served via vLLM
Open WebUI with the vLLM engine
This repository provides a Docker image for vLLM with transformers>=5.0.0rc0 pre-installed to support newer models.
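Several of the entries above serve models through vLLM's OpenAI-compatible API, where per-request LoRA routing is expressed by putting the adapter's served name in the standard `model` field of a Chat Completions request. A minimal sketch of building such a request; the endpoint URL and adapter name here are hypothetical, and an actual deployment would point at a running vLLM server:

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> bytes:
    """Build an OpenAI-style Chat Completions payload as JSON bytes.

    Against vLLM's OpenAI-compatible server, `model` can name either the
    base model or a served LoRA adapter, which is how per-request LoRA
    routing is typically selected.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload).encode("utf-8")


# Hypothetical adapter name and local endpoint for illustration only.
body = build_chat_request("my-lora-adapter", "Summarize vLLM in one line.")
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # assumed local vLLM server
    data=body,
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send the request once a server is running.
```

The same request shape works for the base model by passing its served name instead of an adapter name.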