Nano-vLLM-for-Older-GPU

A lightweight vLLM implementation built from scratch.

The original implementation is incompatible with older GPU architectures such as V100 or T4. This repository provides compatibility by leveraging the torch_native attention backend.

Key Features

🚀 Fast offline inference - Comparable inference speeds to vLLM
📖 Readable codebase - Clean implementation in ~ 1,200 lines of Python code
⚡ Optimization Suite - Prefix caching, Tensor Parallelism, Torch compilation, CUDA graph, etc.

Installation

pip install git+https://github.com/GeeeekExplorer/nano-vllm.git

Manual Download

If you prefer to download the model weights manually, use the following command:

huggingface-cli download --resume-download Qwen/Qwen3-0.6B \
  --local-dir ~/huggingface/Qwen3-0.6B/ \
  --local-dir-use-symlinks False

Quick Start

See example.py for usage. The API mirrors vLLM's interface with minor differences in the LLM.generate method:

from nanovllm import LLM, SamplingParams
llm = LLM("/YOUR/MODEL/PATH", enforce_eager=True, tensor_parallel_size=1)
sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
prompts = ["Hello, Nano-vLLM."]
outputs = llm.generate(prompts, sampling_params)
outputs[0]["text"]

Benchmark

See bench.py for benchmark.

Test Configuration:

Hardware: RTX 4070 Laptop (8GB)
Model: Qwen3-0.6B
Total Requests: 256 sequences
Input Length: Randomly sampled between 100–1024 tokens
Output Length: Randomly sampled between 100–1024 tokens

Performance Results:

Inference Engine	Output Tokens	Time (s)	Throughput (tokens/s)
vLLM	133,966	98.37	1361.84
Nano-vLLM	133,966	93.41	1434.13

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
nanovllm		nanovllm
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
bench.py		bench.py
example.py		example.py
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!

Repository files navigation

Nano-vLLM-for-Older-GPU

Key Features

Installation

Manual Download

Quick Start

Benchmark

Star History

About

Uh oh!

Releases

Packages

Languages

Uh oh!

License

Uh oh!

kaka-zb/nano-vllm-for-older-gpu

Folders and files

Latest commit

History

Repository files navigation

Nano-vLLM-for-Older-GPU

Key Features

Installation

Manual Download

Quick Start

Benchmark

Star History

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages