Intelligent load balancer for distributed vLLM server clusters
agentsculptor is an experimental AI-powered development agent designed to analyze, refactor, and extend Python projects automatically. It uses an OpenAI-like planner-executor loop on top of a vLLM backend, combining project context analysis, structured tool calls, and iterative refinement. It has only been tested with gpt-oss-120b via vLLM.
A curated list of plugins built on top of vLLM
Deploy the Magistral-Small-2506 model using vLLM and Modal
Wheels & Docker images for running vLLM on CPU-only systems, optimized for different CPU instruction sets
Performant LLM inferencing on Kubernetes via vLLM
Terraform configuration for the vLLM production stack on cloud-managed Kubernetes
[KAIST CS632] Road damage detection using YOLOv8 on a Xilinx FPGA, repair estimation with vLLM-served Phi-3.5 and FAISS RAG, and data management via GS1 EPCISv2 and a React dashboard
A production-ready RAG (Retrieval-Augmented Generation) API built on FastAPI and powered by the high-performance vLLM engine
Load testing openai/gpt-oss-20b with vLLM and Docker
Production-grade vLLM serving with an OpenAI-compatible API, per-request LoRA routing, KEDA autoscaling on Prometheus metrics, Grafana/OTel observability, and a benchmark comparing AWQ vs GPTQ vs GGUF.
A simple app to generate captions for your Instagram posts using the `JoyCaption` model hosted on RunPod.io
Fine-tuned LLM for a domain use case, with inference via vLLM and serving on Ollama
An OCR model fine-tuned from Vintern1B (InternVL 1B), with 1 billion parameters. The model can recognize text in a variety of contexts, including handwriting, printed text, and text on real-world objects.
[2024 Elice AI Hellothon Excellence Award (2nd Place)] Saem, Sam: a lesson-guide creator for caregiver cognitive activities and an interactive AI drawing-diary service for the elderly
A comprehensive framework for multi-node, multi-GPU scalable LLM inference on HPC systems using vLLM and Ollama. Includes distributed deployment templates, benchmarking workflows, and chatbot/RAG pipelines for high-throughput, production-grade AI services
Project to set up a UI for users to interact with an LLM served via vLLM
Open WebUI with the vLLM engine
This repository provides a Docker image for vLLM with transformers>=5.0.0rc0 pre-installed to support newer models.
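Several of the entries above serve models through vLLM's OpenAI-compatible API, where per-request LoRA routing is expressed by putting the adapter's served name in the standard `model` field of a Chat Completions request. A minimal sketch of building such a request; the endpoint URL and adapter name here are hypothetical, and an actual deployment would point at a running vLLM server:

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> bytes:
    """Build an OpenAI-style Chat Completions payload as JSON bytes.

    Against vLLM's OpenAI-compatible server, `model` can name either the
    base model or a served LoRA adapter, which is how per-request LoRA
    routing is typically selected.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload).encode("utf-8")


# Hypothetical adapter name and local endpoint for illustration only.
body = build_chat_request("my-lora-adapter", "Summarize vLLM in one line.")
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # assumed local vLLM server
    data=body,
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send the request once a server is running.
```

The same request shape works for the base model by passing its served name instead of an adapter name.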