
Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache

Official PyTorch implementation for the paper Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache


Demo


Hogwild! Inference (demo animation).

Inference with a shared cache (demo animation).

Dependencies

Install packages from requirements.txt:

    pip install -r requirements.txt
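The notebooks load models through PyTorch and, judging by the Llama-3.2 Colab example below, Hugging Face Transformers. As a quick post-install sanity check (a minimal sketch, assuming a CUDA-capable GPU; CPU-only setups will print False), you can run:

    python -c "import torch, transformers; print(torch.__version__, transformers.__version__, torch.cuda.is_available())"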

Run with multiple workers

To try the inference described in the paper, run the Jupyter notebooks in the notebooks/ folder:

Simple example with a minimal prompt: basic_example.ipynb

Hogwild! Inference with the full prompt: full_example.ipynb

Minimal Colab example with Llama-3.2 3B and very limited collaboration: colab_example.ipynb
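For a rough picture of what the notebooks set up, below is a minimal conceptual sketch of round-robin collaboration between two workers. Each worker's turn is conditioned on the other's partial output by re-encoding it in the prompt; this only illustrates the collaboration pattern, not the paper's concurrent attention cache (which lets workers attend to each other's KV-cache entries without re-encoding). The model name, prompts, and worker roles are placeholders; see the notebooks for the actual prompts and cache wiring.

    # Conceptual sketch only: naive round-robin collaboration via prompt re-encoding.
    # It does NOT implement the paper's shared KV cache; see notebooks/ for that.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "meta-llama/Llama-3.2-3B-Instruct"  # placeholder; any chat-tuned model works here
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).cuda()

    problem = "Compute 17 * 24 and explain the steps."
    workers = ["Alice", "Bob"]
    transcripts = {w: "" for w in workers}  # each worker's running output

    for _ in range(3):  # a few short interleaved turns
        for w in workers:
            other = next(o for o in workers if o != w)
            # Each turn sees the other worker's partial output, so the two threads
            # can coordinate; the actual method achieves this by letting workers
            # attend to a shared cache instead of re-reading each other's text.
            prompt = (
                f"Problem: {problem}\n"
                f"{other} has written so far:\n{transcripts[other]}\n\n"
                f"{w}, continue your own reasoning without repeating {other}:\n"
                f"{transcripts[w]}"
            )
            inputs = tok(prompt, return_tensors="pt").to(model.device)
            out = model.generate(**inputs, max_new_tokens=48, do_sample=False)
            new_text = tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
            transcripts[w] += new_text

    for w in workers:
        print(f"=== {w} ===\n{transcripts[w]}\n")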

Cite

If you found this work useful, please consider citing:

@misc{rodionov2025hogwildinferenceparallelllm,
      title={Hogwild! Inference: Parallel LLM Generation via Concurrent Attention}, 
      author={Gleb Rodionov and Roman Garipov and Alina Shutova and George Yakushev and Vage Egiazarian and Anton Sinitsin and Denis Kuznedelev and Dan Alistarh},
      year={2025},
      eprint={2504.06261},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2504.06261}, 
}
