Thanks to visit codestin.com
Credit goes to github.com

Skip to content

ThunderAgent-org/ThunderAgent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

123 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ThunderAgent

Fast, simple and program-aware agentic inference system.

| Wiki | Documentation | Blog | Paper |


About

ThunderAgent is a fast and easy-to-use library for agentic inference and rollout.

ThunderAgent is fast with:

  • Agentic program-aware scheduler that increases KV-cache hit rate and reduces memory imbalance across nodes, increasing agentic inference throughputs 1.5-3.6x across multiple agentic workflows.
  • Tool-call lifecycle management with automatic resource reclaim for more stable and reliable long-running rollouts

ThunderAgent is flexible and easy to use with:

  • OpenAI-compatible API passthrough with only one changing, adding Program_id to the sending API.

  • Multiple inference support for vLLM and SGLang

  • Multiple agentic RL training example like Search-R1 agent with slime and mini-swe-agent with SkyRL.

  • Real-time visualization of agentic trajectory metrics including total tokens, tool-use time, and per-program profiling.

Overview

ThunderAgent sits between agent clients and the infrastructure layer as an agentic workflow scheduler. On one hand, it improves inference throughput of vLLM/SGLang across multiple GPU nodes through program-aware scheduling. On the other hand, it provides a unified tool management interface for resources like Docker containers and remote APIs.

ThunderAgent Architecture

Inference & Evaluation Results

ThunderAgent improves vLLM throughput by 1.5–3.6× across diverse agentic workloads including SWE-Agent, OpenHands, and ToolOrchestra.

Inference Pipeline Results

Getting Started

Install ThunderAgent from source:

git clone [email protected]:HaoKang-Timmy/ThunderAgent.git
cd ThunderAgent
pip install -e .

How to use? Choose one backend you like, for example vllm.

uv pip install vllm --torch-backend=auto # install vllm

vllm serve Qwen/Qwen3-32B --port 8000 # serve a model

thunderagent --backend-type vllm --backends http://localhost:8000 --port 9000 --metrics --profile # launch ThunderAgent, make sure to send request through 9000.

How to embed with your own agentic workflow?

# original openai sender
openai.client.chat.completions.create(
            model=self.config.model_name,
            messages=messages,
          )
# ThunderAgent openai sender
extra_body = {}
extra_body["program_id"] = "unique_id"
# if you use docker for your agentic workflow
# extra_body["docker_ids"] = ["docker_id1", "docker_id2", ...]
openai.client.chat.completions.create(
            model=self.config.model_name,
            messages=messages,
            extra_body = extra_body
          )

Contributing

We welcome and value any contributions and collaborations. Please create a pull request.

Citation

If you use ThunderAgent for your research, please cite our paper:

@misc{kang2026thunderagentsimplefastprogramaware,
      title={ThunderAgent: A Simple, Fast and Program-Aware Agentic Inference System}, 
      author={Hao Kang and Ziyang Li and Xinyu Yang and Weili Xu and Yinfang Chen and Junxiong Wang and Beidi Chen and Tushar Krishna and Chenfeng Xu and Simran Arora},
      year={2026},
      eprint={2602.13692},
      archivePrefix={arXiv},
      primaryClass={cs.OS},
      url={https://arxiv.org/abs/2602.13692}, 
}

Contact Us

For enterprises interested in adopting or deploying ThunderAgent at scale, including technical consulting, sponsorship opportunities, or partnership inquiries, please contact us at [email protected].

License

This repository is available under the MIT license. See the LICENSE.md file for details.

About

A simple, fast and robust program-aware agentic inference system.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages