
ServeGen

ServeGen is a framework for generating realistic large language model (LLM) serving workloads. Built on an analysis of billions of inference requests across 12 production models on Alibaba Cloud Model Studio (百炼), ServeGen replicates the nuanced complexity of real-world workloads, such as:

  • Bursty request arrivals beyond simple Poisson models (see the sketch after this list)
  • Shifting input/output length distributions over days and weeks
  • Heterogeneous data composition in multimodal workloads (Qwen-VL)
  • Bimodal reasoning length distribution in reasoning workloads (DeepSeek-R1)
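
For intuition on the first bullet, burstier-than-Poisson arrivals can be modeled with Gamma-distributed inter-arrival times whose coefficient of variation (CV) exceeds 1. The sketch below is illustrative only and is not ServeGen's actual generator:

import numpy as np

def gamma_arrivals(rate, cv, duration, seed=0):
    """Arrival times with Gamma inter-arrivals: cv == 1 recovers a
    Poisson process, while cv > 1 yields burstier traffic."""
    rng = np.random.default_rng(seed)
    shape = 1.0 / cv**2   # CV of Gamma(shape, scale) is 1 / sqrt(shape)
    scale = cv**2 / rate  # chosen so the mean inter-arrival time is 1 / rate
    t, arrivals = 0.0, []
    while True:
        t += rng.gamma(shape, scale)
        if t >= duration:
            return arrivals
        arrivals.append(t)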

We hope ServeGen can become a data-driven bridge between frontier research and production realities when designing and deploying new LLM serving systems.

For more detailed analysis results, check out our characterization paper!

Requirements

ServeGen requires Python 3.8 or higher and the following dependencies:

  • numpy (>=1.20.0): For numerical computations and array operations
  • scipy (>=1.7.0): For statistical distributions and sampling
  • pytest (>=7.0.0): For running tests (optional, only needed for development)

Install the dependencies and the package itself with pip:

pip install -r requirements.txt
pip install -e .

Examples

Basic Usage

from servegen import Category, ClientPool
from servegen.construct import generate_workload

# Load client data
pool = ClientPool(Category.LANGUAGE, "m-large")

# Generate workload
rate_fn = {0: 100.0, 600: 150.0}  # requests per second
requests = generate_workload(pool, rate_fn, duration=1200)
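
As basic_usage.py demonstrates, the generated requests can then be saved to CSV. A minimal sketch follows; the attribute names on the request objects (timestamp, input_tokens, output_tokens) are assumptions here, so check the actual Request fields in your checkout:

import csv

with open("workload.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "input_tokens", "output_tokens"])
    for req in requests:
        # Field names are assumed for illustration; adapt to the real schema.
        writer.writerow([req.timestamp, req.input_tokens, req.output_tokens])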

Custom Workloads

# Create custom clients with different arrival/length patterns
# (create_bursty_client / create_stable_client are helper constructors;
#  see examples/generate_custom.py)
bursty_client = create_bursty_client(1)  # High CV, concentrated distributions
stable_client = create_stable_client(2)  # Low CV, Pareto/exponential distributions

# Generate workload with a custom rate function
# (pool here is a ClientPool built over the custom clients; see the full
#  construction in examples/generate_custom.py)
rate_fn = {0: 10.0, 60: 15.0, 120: 8.0}
requests = generate_workload(pool, rate_fn, duration=180)
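
The rate_fn dictionaries above read naturally as step functions: each key is a time offset in seconds, and each value is the request rate (req/s) in effect from that offset onward. The hypothetical helper below makes this interpretation concrete; generate_workload's actual handling (e.g. interpolation) may differ:

def rate_at(rate_fn, t):
    """Rate in effect at time t, reading rate_fn as a step function."""
    keys = [k for k in sorted(rate_fn) if k <= t]
    return rate_fn[keys[-1]] if keys else 0.0

assert rate_at({0: 10.0, 60: 15.0, 120: 8.0}, 90) == 15.0  # 15 req/s within [60, 120)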

Multimodal and Reasoning Workloads

# Generate multimodal workload
pool = ClientPool(Category.MULTIMODAL, "mm-image")
requests = generate_workload(pool, rate_fn, duration=600)

# Generate reasoning workload
pool = ClientPool(Category.REASON, "deepseek-r1")
requests = generate_workload(pool, rate_fn, duration=3600)
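
The bimodal reasoning-length behavior noted earlier can be pictured as a two-component mixture. The sketch below uses invented lognormal parameters purely for illustration; ServeGen itself samples from the real DeepSeek-R1 distributions shipped under data/:

import numpy as np

def bimodal_output_lengths(n, seed=0):
    """Illustrative two-mode output-length sampler; parameters are made up,
    not fitted to the DeepSeek-R1 data that ServeGen ships."""
    rng = np.random.default_rng(seed)
    short = rng.lognormal(mean=5.0, sigma=0.5, size=n)  # mode near ~150 tokens
    long = rng.lognormal(mean=7.5, sigma=0.4, size=n)   # mode near ~1800 tokens
    pick_long = rng.random(n) < 0.4                     # assumed 40% long responses
    return np.where(pick_long, long, short).astype(int)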

See examples/ for more detailed examples:

  • basic_usage.py: Basic workload generation and saving to CSV
  • generate_custom.py: Custom workload patterns
  • generate_realistic.py: Realistic workload generation
  • generate_advanced.py: Multimodal and reasoning workloads
  • clientpool_example.py: Client pool analysis and filtering

Filtering Client Data and Getting CDFs

from servegen import Category, ClientPool
import numpy as np

# Load client pool
pool = ClientPool(Category.LANGUAGE, "m-large")

# Filter clients by various criteria
filtered_view = (
    pool
    .span(72000, 75600)  # 20:00-21:00, in seconds of the day
    .filter_by_cv(0.5, 1.5)  # Filter by coefficient of variation
    .filter_by_avg_input_len(100, 1000)  # Filter by average input length
    .filter_by_max_output_len(2000)  # Filter by maximum output length
)

# Get CDFs of client behaviors
cdfs = filtered_view.get_cdfs()

# Print information about available CDFs
print("\nAvailable CDFs:")
for field in cdfs:
    if field in ["rate", "cv"]:
        timestamps = sorted(cdfs[field].keys())
        print(f"  {field}: {len(timestamps)} timestamps")
    else:
        stats = cdfs[field].keys()
        print(f"  {field}: {len(stats)} statistics")

# Print detailed information for the first timestamp
first_ts = min(cdfs["rate"].keys())
values, probs = cdfs["rate"][first_ts]
print(f"\nRate CDF at timestamp {first_ts}:")
print(f"  Values: {values}")
print(f"  Probabilities: {probs}")

# Print statistics for input tokens
print("\nInput token statistics:")
for stat in ["avg", "p50", "p95", "p99"]:
    if stat in cdfs["input_tokens"] and first_ts in cdfs["input_tokens"][stat]:
        values, probs = cdfs["input_tokens"][stat][first_ts]
        print(f"  {stat.upper()}:")
        print(f"    Values: {values}")
        print(f"    Probabilities: {probs}")

Data Structure

The framework comes with data organized as follows:

data/
├── language/
│   ├── m-large/
│   │   ├── chunk-1-dataset.json
│   │   ├── chunk-1-trace.csv
│   │   ├── chunk-2-dataset.json
│   │   └── chunk-2-trace.csv
│   ├── m-mid/
│   │   └── ...
│   └── m-small/
│       └── ...
├── reason/
│   └── deepseek-r1/
│       ├── chunk-1-dataset.json
│       └── chunk-1-trace.csv
└── multimodal/
    └── mm-image/
        ├── chunk-1-dataset.json
        └── chunk-1-trace.csv

Each category (LANGUAGE, REASON, MULTIMODAL) contains model-specific data with:

  • chunk-i-dataset.json: Request data distributions
  • chunk-i-trace.csv: Rate and arrival pattern information
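
The schemas of these files are not documented above, so the sketch below only peeks at the raw contents rather than assuming any field names:

import csv
import json

# Inspect one model's data files without assuming their schema.
with open("data/language/m-large/chunk-1-dataset.json") as f:
    dataset = json.load(f)
print(type(dataset).__name__)   # top-level JSON structure

with open("data/language/m-large/chunk-1-trace.csv") as f:
    print(next(csv.reader(f)))  # first row of the trace (likely a header)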

Citation

If you find our work helpful, please consider citing our paper.

@misc{servegen,
    title={ServeGen: Workload Characterization and Generation of Large Language Model Serving in Production}, 
    author={Yuxing Xiang and Xue Li and Kun Qian and Wenyuan Yu and Ennan Zhai and Xin Jin},
    year={2025},
    eprint={2505.09999},
    archivePrefix={arXiv},
    primaryClass={cs.DC},
    url={https://arxiv.org/abs/2505.09999}, 
}
