
Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache

Official PyTorch implementation for the paper Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache


Demo


Hogwild! Inference (demo animation).

Inference with a shared cache (demo animation).

Dependencies

Install packages from requirements.txt:

    pip install -r requirements.txt
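The notebooks load models through PyTorch and, judging by the Llama-3.2 Colab example below, Hugging Face Transformers. As a quick post-install sanity check (a minimal sketch, assuming a CUDA-capable GPU; CPU-only setups will print False), you can run:

    python -c "import torch, transformers; print(torch.__version__, transformers.__version__, torch.cuda.is_available())"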

Run with multiple workers

To try the inference described in the paper, run the Jupyter notebooks in the notebooks/ folder:

Simple example with a minimal prompt: basic_example.ipynb

Hogwild! Inference with the full prompt: full_example.ipynb

Minimal Colab example with Llama-3.2 3B and very limited collaboration: colab_example.ipynb
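For a rough picture of what the notebooks set up, below is a minimal conceptual sketch of round-robin collaboration between two workers. Each worker's turn is conditioned on the other's partial output by re-encoding it in the prompt; this only illustrates the collaboration pattern, not the paper's concurrent attention cache (which lets workers attend to each other's KV-cache entries without re-encoding). The model name, prompts, and worker roles are placeholders; see the notebooks for the actual prompts and cache wiring.

    # Conceptual sketch only: naive round-robin collaboration via prompt re-encoding.
    # It does NOT implement the paper's shared KV cache; see notebooks/ for that.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "meta-llama/Llama-3.2-3B-Instruct"  # placeholder; any chat-tuned model works here
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).cuda()

    problem = "Compute 17 * 24 and explain the steps."
    workers = ["Alice", "Bob"]
    transcripts = {w: "" for w in workers}  # each worker's running output

    for _ in range(3):  # a few short interleaved turns
        for w in workers:
            other = next(o for o in workers if o != w)
            # Each turn sees the other worker's partial output, so the two threads
            # can coordinate; the actual method achieves this by letting workers
            # attend to a shared cache instead of re-reading each other's text.
            prompt = (
                f"Problem: {problem}\n"
                f"{other} has written so far:\n{transcripts[other]}\n\n"
                f"{w}, continue your own reasoning without repeating {other}:\n"
                f"{transcripts[w]}"
            )
            inputs = tok(prompt, return_tensors="pt").to(model.device)
            out = model.generate(**inputs, max_new_tokens=48, do_sample=False)
            new_text = tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
            transcripts[w] += new_text

    for w in workers:
        print(f"=== {w} ===\n{transcripts[w]}\n")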

Cite

If you found this work useful, please consider citing:

@misc{rodionov2025hogwildinferenceparallelllm,
      title={Hogwild! Inference: Parallel LLM Generation via Concurrent Attention}, 
      author={Gleb Rodionov and Roman Garipov and Alina Shutova and George Yakushev and Vage Egiazarian and Anton Sinitsin and Denis Kuznedelev and Dan Alistarh},
      year={2025},
      eprint={2504.06261},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2504.06261}, 
}
