Official PyTorch implementation of the paper [Hogwild! Inference: Parallel LLM Generation via Concurrent Attention](https://arxiv.org/abs/2504.06261).
Install the required packages from `requirements.txt`:

```
pip install -r requirements.txt
```
To try the inference setups described in the paper, run the Jupyter notebooks in the `notebooks/` folder:

- Simple example with a minimal prompt: `basic_example.ipynb`
- Hogwild! Inference with the full prompt: `full_example.ipynb`
- Minimal Colab example with Llama-3.2 3B and very limited collaboration: `colab_example.ipynb`
If you find this work useful, please consider citing:
```bibtex
@misc{rodionov2025hogwildinferenceparallelllm,
  title={Hogwild! Inference: Parallel LLM Generation via Concurrent Attention},
  author={Gleb Rodionov and Roman Garipov and Alina Shutova and George Yakushev and Vage Egiazarian and Anton Sinitsin and Denis Kuznedelev and Dan Alistarh},
  year={2025},
  eprint={2504.06261},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2504.06261},
}
```