
SCOOT: SLO-Oriented Performance Tuning for LLM Inference Engines

This repository contains the implementation of the WWW 2025 oral paper "SCOOT: SLO-Oriented Performance Tuning for LLM Inference Engines".

Overview

SCOOT is an automatic performance tuning system that optimizes the service-level objectives (SLOs) of each LLM inference service by tuning the parameters of its inference engine. It jointly uses single-objective and multi-objective Bayesian optimization (BO) to handle different optimization objectives, balancing exploration and exploitation. To reduce wasted trials on invalid configurations, SCOOT prunes the search space with known constraints and trains a random forest during tuning to learn hidden constraints. This allows it to improve the performance of the LLM inference engine efficiently.
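
When several SLO metrics are optimized at once (for example, lowering tail latency while raising throughput), a multi-objective tuner yields a set of Pareto-optimal configurations rather than a single best one. The sketch below is not taken from the SCOOT code; it only shows the non-dominance rule that defines such a set. The objective names `p99_ttft_ms` and `throughput_rps` and all numbers are hypothetical.

```python
from typing import Dict, List

Point = Dict[str, float]


def dominates(a: Point, b: Point) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one.

    Hypothetical objectives: minimize p99_ttft_ms, maximize throughput_rps.
    """
    no_worse = a["p99_ttft_ms"] <= b["p99_ttft_ms"] and a["throughput_rps"] >= b["throughput_rps"]
    better = a["p99_ttft_ms"] < b["p99_ttft_ms"] or a["throughput_rps"] > b["throughput_rps"]
    return no_worse and better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (O(n^2), fine for tuning logs)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]


observations = [
    {"name": "cfg_a", "p99_ttft_ms": 420.0, "throughput_rps": 9.5},
    {"name": "cfg_b", "p99_ttft_ms": 380.0, "throughput_rps": 8.0},
    {"name": "cfg_c", "p99_ttft_ms": 450.0, "throughput_rps": 9.0},  # dominated by cfg_a
    {"name": "cfg_d", "p99_ttft_ms": 300.0, "throughput_rps": 7.2},
]

front = pareto_front(observations)
print([p["name"] for p in front])  # ['cfg_a', 'cfg_b', 'cfg_d']
```

No point on the front is better than another on every objective, so choosing among them depends on which SLO the service treats as binding.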

Quick Start

bo_scoot.py is the script that implements the whole tuning pipeline.

The shell script tune_entry.sh is used to reproduce the main results in the paper.
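
Conceptually, each tuning iteration asks the optimizer for a configuration, benchmarks the engine with it, and reports the result back. The sketch below illustrates only that loop and is not taken from bo_scoot.py: random sampling stands in for Bayesian optimization, the suggest/observe split mirrors the interface of HEBO-style optimizers, and `RandomSuggester`, `tune`, `fake_benchmark`, and the two engine knobs are hypothetical names chosen for illustration.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, int]

# Hypothetical two-knob search space; not the space SCOOT actually tunes.
SPACE: Dict[str, List[int]] = {
    "max_num_seqs": [64, 128, 256],
    "max_num_batched_tokens": [2048, 4096, 8192],
}


class RandomSuggester:
    """Random sampling behind a suggest/observe interface, standing in for a BO optimizer."""

    def __init__(self, space: Dict[str, List[int]], seed: int = 0) -> None:
        self.space = space
        self.rng = random.Random(seed)
        self.history: List[Tuple[Config, Optional[float]]] = []

    def suggest(self) -> Config:
        return {name: self.rng.choice(values) for name, values in self.space.items()}

    def observe(self, cfg: Config, y: Optional[float]) -> None:
        # y is None when the trial failed, e.g. the engine could not start.
        self.history.append((cfg, y))


def tune(benchmark: Callable[[Config], Optional[float]], budget: int = 10) -> Tuple[Config, float]:
    opt = RandomSuggester(SPACE)
    for _ in range(budget):
        cfg = opt.suggest()
        # A real run would launch the server with cfg, replay requests, and read the SLO metric.
        opt.observe(cfg, benchmark(cfg))
    valid = [(c, y) for c, y in opt.history if y is not None]
    if not valid:
        raise RuntimeError("no configuration ran successfully")
    best_cfg, best_y = min(valid, key=lambda cy: cy[1])  # lower is better, e.g. p99 latency
    return best_cfg, best_y


def fake_benchmark(cfg: Config) -> Optional[float]:
    """Toy objective; pretends the largest setting fails (e.g. out of memory)."""
    if cfg["max_num_seqs"] == 256 and cfg["max_num_batched_tokens"] == 8192:
        return None
    return 1000.0 / cfg["max_num_seqs"] + cfg["max_num_batched_tokens"] / 1000.0


best_cfg, best_y = tune(fake_benchmark, budget=20)
print(best_cfg, round(best_y, 3))
```

Failed trials are kept in the history instead of being dropped, because in a constraint-aware tuner those failures are the data from which hidden constraints are learned.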

The Python scripts in the clients directory (api_server.py, backend_request_func.py and benchmark_serving.py) are forked from vLLM. They are used to start the server, send client requests, and run the benchmark, respectively.
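
As a hedged illustration of how per-request measurements from such a benchmark become SLO metrics, the sketch below computes p99 time to first token (TTFT), p99 time per output token (TPOT), and request throughput. TPOT follows the definition used in vLLM's benchmark_serving.py, (latency - TTFT) / (output tokens - 1); the `RequestRecord` type and the `summarize` helper are hypothetical and not part of this repository.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestRecord:
    ttft_s: float     # time to first token, seconds
    latency_s: float  # end-to-end latency, seconds
    output_len: int   # number of generated tokens


def percentile(values: List[float], q: float) -> float:
    """Linear-interpolation percentile (the same rule as NumPy's default)."""
    if not values:
        raise ValueError("percentile of an empty list")
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def tpot_s(r: RequestRecord) -> float:
    """Average time per output token after the first one."""
    return (r.latency_s - r.ttft_s) / (r.output_len - 1)


def summarize(records: List[RequestRecord], duration_s: float) -> Dict[str, float]:
    decoded = [r for r in records if r.output_len > 1]  # TPOT is undefined for 1-token outputs
    return {
        "p99_ttft_ms": 1000 * percentile([r.ttft_s for r in records], 99),
        "p99_tpot_ms": 1000 * percentile([tpot_s(r) for r in decoded], 99),
        "request_throughput": len(records) / duration_s,
    }


records = [
    RequestRecord(ttft_s=0.10, latency_s=1.10, output_len=101),
    RequestRecord(ttft_s=0.20, latency_s=2.20, output_len=201),
    RequestRecord(ttft_s=0.30, latency_s=0.30, output_len=1),
]
print(summarize(records, duration_s=3.0))
```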

We also implement modules for handling hidden and hard constraints in the BO search on top of HEBO; this code is in the hebo directory. Specifically, the constraint-handling modules are incorporated into the acquisition functions and the optimizers, i.e., /hebo/acquisitions/acq.py and /hebo/optimizers/util.py.
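
One common way to fold constraints into the acquisition step, offered here only as an assumption about the general idea and not as the code in acq.py or util.py, is to discard candidates that violate known (hard) constraints and scale the acquisition value of the rest by an estimated probability of feasibility learned from past trials. The sketch below uses a k-nearest-neighbour vote as a standard-library stand-in for the random forest; `satisfies_hard_constraints`, `feasibility_prob`, `pick_next`, the constraint itself, and the parameter names are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def satisfies_hard_constraints(cfg: Config) -> bool:
    # Hypothetical known constraint, assumed for illustration:
    # the per-step token budget must cover every scheduled sequence.
    return cfg["max_num_batched_tokens"] >= cfg["max_num_seqs"]


def feasibility_prob(cfg: Config, history: List[Tuple[Config, bool]], k: int = 3) -> float:
    """Fraction of the k nearest evaluated configs that ran successfully.

    A k-NN stand-in for a random-forest classifier; real code should normalize
    parameters before measuring distance.
    """
    if not history:
        return 1.0  # no evidence yet: treat every candidate as feasible

    def dist(a: Config, b: Config) -> float:
        return sum((a[key] - b[key]) ** 2 for key in a) ** 0.5

    nearest = sorted(history, key=lambda h: dist(cfg, h[0]))[:k]
    return sum(ok for _, ok in nearest) / len(nearest)


def pick_next(candidates: List[Config], acq: Callable[[Config], float],
              history: List[Tuple[Config, bool]]) -> Config:
    """Drop hard-constraint violators, then maximize acq(x) * P(feasible | x)."""
    valid = [c for c in candidates if satisfies_hard_constraints(c)]
    return max(valid, key=lambda c: acq(c) * feasibility_prob(c, history))


def toy_acq(cfg: Config) -> float:
    """Toy acquisition that simply favors larger batches."""
    return cfg["max_num_seqs"] / 100.0


history = [  # (config, ran successfully?)
    ({"max_num_seqs": 256, "max_num_batched_tokens": 8192}, False),
    ({"max_num_seqs": 256, "max_num_batched_tokens": 6144}, False),
    ({"max_num_seqs": 256, "max_num_batched_tokens": 4096}, False),
    ({"max_num_seqs": 64, "max_num_batched_tokens": 2048}, True),
    ({"max_num_seqs": 128, "max_num_batched_tokens": 4096}, True),
]
candidates = [
    {"max_num_seqs": 256, "max_num_batched_tokens": 8192},  # high acq, but near known failures
    {"max_num_seqs": 128, "max_num_batched_tokens": 2048},
    {"max_num_seqs": 512, "max_num_batched_tokens": 256},   # violates the hard constraint
]
print(pick_next(candidates, toy_acq, history))  # {'max_num_seqs': 128, 'max_num_batched_tokens': 2048}
```

Here the candidate with the highest raw acquisition value is never evaluated because it breaks the known constraint, and the next one is skipped because every nearby trial failed, which is the kind of wasted exploration the constraint handling aims to avoid.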

Citation

@inproceedings{cheng2025scoot,
  title={SCOOT: SLO-Oriented Performance Tuning for LLM Inference Engines},
  author={Cheng, Ke and Wang, Zhi and Hu, Wen and Yang, Tiannuo and Li, Jianguo and Zhang, Sheng},
  booktitle={Proceedings of the ACM on Web Conference 2025},
  pages={829--839},
  year={2025},
  publisher={Association for Computing Machinery}
}

