ArcticInference

Latest news

[2025/05] - Fastest Speculative Decoding in vLLM with Arctic Inference and Arctic Training
[2025/04] - Low-Latency and High-Throughput Inference for Long Context with Sequence Parallelism (aka Ulysses)

ArcticInference

ArcticInference is a new library from Snowflake AI Research that hosts the LLM inference optimizations developed at Snowflake, both current and upcoming. It integrates with vLLM v0.8.1 through vLLM’s custom plugin feature, which lets us develop inference optimizations quickly, integrate them into vLLM, and make them available to the community.

Once installed, ArcticInference automatically patches vLLM to use Arctic Ulysses and the other optimizations it implements, so users can keep using the familiar vLLM APIs and CLI. It’s easy to get started!
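Because the patching is described as automatic, an ordinary vLLM script should need no ArcticInference-specific code. The sketch below only uses vLLM's standard offline API (LLM, SamplingParams, generate); the helper name generate_with_vllm and the choice of greedy sampling are illustrative assumptions, not part of ArcticInference, and no Arctic-specific options are shown because the text above does not name any.

```python
def generate_with_vllm(prompts, model, max_tokens=64):
    """Run offline generation through vLLM's standard Python API.

    Hypothetical helper for illustration: `model` is any model id or local
    path that vLLM can load on your machine.
    """
    # Imported lazily so this file can be loaded on machines without vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model)
    # temperature=0.0 selects greedy decoding, which keeps runs comparable.
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate(prompts, params)
    # Each request returns one or more completions; keep the first one.
    return [request.outputs[0].text for request in outputs]
```

Calling generate_with_vllm(["Hello, my name is"], model=...) with a model you have access to returns one completion string per prompt. Whether a given optimization is active for that run depends on the installed ArcticInference version and its configuration, which this sketch does not inspect.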

Installation

$ pip install arctic-inference[vllm]
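A quick way to confirm the install is to query the package metadata from Python. The sketch below uses only the standard library. The distribution name arctic-inference comes from the pip command above; the entry-point group vllm.general_plugins is the group vLLM documents for general plugins, and it is an assumption here that ArcticInference registers itself under that group.

```python
from importlib.metadata import PackageNotFoundError, entry_points, version


def installed_version(dist_name="arctic-inference"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def plugin_names(group="vllm.general_plugins"):
    """List the entry-point names registered under a plugin group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


print("arctic-inference:", installed_version())
print("vllm:", installed_version("vllm"))
print("vLLM general plugins:", plugin_names())
```

A value of None for either package means pip did not install it into the active environment. An empty plugin list means no package in the environment registers under that group.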

Projects

To better understand what features ArcticInference supports, please refer to the projects we have released under this framework (see the projects directory in the repository).

