SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks
UIUC, Purdue University
Paper | Leaderboard | Dataset
SEC-bench is a comprehensive benchmarking framework designed to evaluate Large Language Model (LLM) agents on real-world software security tasks. It provides automated tools for collecting vulnerability data, building reproducible vulnerability instances, and evaluating agent performance on security-related tasks.
- Automated Benchmark Generation: Builds benchmarks automatically from the OSV database and CVE records using a multi-agent system
- Containerized Environments: Docker-based reproducible vulnerability instances
- Agent-oriented Evaluation: Evaluates agents on critical software security tasks (SWE-agent, OpenHands, and Aider are supported)
- Comprehensive Security Assessment: Assesses both PoC generation and vulnerability patching, with extensibility to other tasks (e.g., fuzzing, static analysis)
- Rich Reporting: Detailed progress tracking and result visualization with rich terminal output
- Python: 3.12 or higher
- Docker: Latest version with sufficient disk space (>200GB recommended)
- Git: For repository cloning and submodule management
- Conda: For environment management (recommended)
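Before installing, a quick way to confirm the toolchain is in place (a convenience check, not part of SEC-bench itself):
# Verify the prerequisites listed above
python --version   # expect 3.12 or higher
docker --version
git --version
conda --version
df -h /var/lib/docker   # free space at Docker's default data directory on Linux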
git clone --recurse-submodules https://github.com/SEC-bench/SEC-bench.git
cd SEC-bench
conda create -n secb python=3.12
conda activate secb
pip install -r requirements.txt
Configure the following environment variables in your shell profile or .env file:
# Required API tokens
export GITHUB_TOKEN=<your_github_token>
export GITLAB_TOKEN=<your_gitlab_token>
export OPENAI_API_KEY=<your_openai_api_key>
export ANTHROPIC_API_KEY=<your_anthropic_api_key>
# Hugging Face configuration
export HF_TOKEN_PATH=$HOME/.cache/hf_hub_token
export HF_HOME=<path/to/huggingface>
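HF_TOKEN_PATH follows the huggingface_hub convention of pointing at a file that contains your access token; one way to create that file (path and permissions here are only a suggestion):
# Store the Hugging Face token where HF_TOKEN_PATH points
echo "<your_hf_token>" > $HOME/.cache/hf_hub_token
chmod 600 $HOME/.cache/hf_hub_token   # keep the token private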
The data collection process involves three main steps: seed generation, report extraction, and project configuration.
Extract metadata from OSV database files:
python -m secb.preprocessor.seed \
--input-dir [OSV_DIR] \
--output-file [SEED_OUTPUT_FILE_PATH] \
--verbose
Extract bug reports from reference URLs:
python -m secb.preprocessor.report \
--input-file [SEED_OUTPUT_FILE_PATH] \
--output-file [REPORT_OUTPUT_FILE_PATH] \
--reports-dir [REPORTS_DIR] \
--lang [LANGUAGE] \
--type [TYPE] \
--whitelist [WHITELIST_PROJECTS] \
--blacklist [BLACKLIST_PROJECTS] \
--oss-fuzz
Generate project configurations for vulnerability reproduction:
python -m secb.preprocessor.project \
--input-file [REPORT_OUTPUT_FILE_PATH] \
--output-file [PROJECT_OUTPUT_FILE_PATH] \
--tracking-file [TRACKING_FILE_PATH] \
--verbose
Use the provided script for streamlined processing:
./run_preprocessor.sh <mode> [options]
Available modes:
- seed: Parse CVE/OSV files and extract relevant information
- report: Extract bug descriptions from reference URLs
- project: Generate project configurations for reproducing vulnerabilities
Example workflows:
Note
Download the OSV database and place it in the output/osv directory; one way to fetch it is sketched below.
The following example workflows are for C/C++ vulnerabilities. For other languages, specify the language and type explicitly.
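A minimal fetch, assuming you use OSV's public all.zip export (any mirror works as long as the JSON files end up under output/osv):
# Download and unpack the OSV data dump
mkdir -p ./output/osv
curl -L -O https://osv-vulnerabilities.storage.googleapis.com/all.zip
unzip -q all.zip -d ./output/osv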
# Basic seed generation
./run_preprocessor.sh seed --input-dir ./output/osv --output-file ./output/seed.jsonl
# Filter for C/C++ CVEs in OSS-Fuzz projects
./run_preprocessor.sh report \
--input-file ./output/seed.jsonl \
--type CVE \
--oss-fuzz \
--lang C,C++
# Generate minimal project configurations
./run_preprocessor.sh project \
--input-file ./output/report-cve-oss-c-cpp.jsonl \
--sanitizer-only \
--minimal
Create foundational Docker images:
python -m secb.preprocessor.build_base_images
Create specific vulnerability instances:
# Build specific instance
python -m secb.preprocessor.build_instance_images \
--input-file [PROJECT_OUTPUT_FILE] \
--ids [INSTANCE_IDS]
# Example: Build OpenJPEG CVE instance
python -m secb.preprocessor.build_instance_images \
--input-file ./output/project-cve-oss-c-cpp-sanitizer-minimal.jsonl \
--ids openjpeg.cve-2024-56827
# Example: Build all GPAC CVE instances
python -m secb.preprocessor.build_instance_images \
--input-file ./output/project-cve-oss-c-cpp-sanitizer-minimal.jsonl \
--filter gpac.cve
Verify built instances using the SecVerifier repository. This step ensures that vulnerability instances are correctly configured and reproducible.
Access verified evaluation images from Docker Hub:
docker pull hwiwonlee/secb.eval.x86_64.[instance_name]
Or build evaluation instances from locally verified instances:
python -m secb.evaluator.build_eval_instances \
--input-dir [VERIFIED_INSTANCE_DIR]
Run the evaluation on agent outputs:
python -m secb.evaluator.eval_instances \
--input-dir [AGENT_OUTPUT_DIR] \
--type [TYPE] \
--split [SPLIT] \
--agent [AGENT] \
--num-workers [NUM_WORKERS] \
--output-dir [OUTPUT_DIR]
Parameters:
- type: Evaluation type (patch or poc)
- split: Dataset split to evaluate
- agent: Agent type (swea, oh, aider)
- num-workers: Number of parallel workers
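For instance, a patch evaluation over SWE-agent outputs might look like this (paths, split name, and worker count are illustrative, not fixed values):
# Example: evaluate SWE-agent patch attempts with 4 parallel workers
python -m secb.evaluator.eval_instances \
--input-dir ./output/agent-runs \
--type patch \
--split test \
--agent swea \
--num-workers 4 \
--output-dir ./output/eval-results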
View patch evaluation results:
python -m secb.evaluator.view_patch_results \
--agent [AGENT] \
--input-dir [EVALUATION_OUTPUT_DIR]
View PoC evaluation results:
python -m secb.evaluator.view_poc_results \
--agent [AGENT] \
--input-dir [EVALUATION_OUTPUT_DIR]
Note
In the Docker evaluation images, a secb harness is available with subcommands such as build, repro, and patch.
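Inside a container, a session might look like the following sketch (the subcommand names come from the note above; exact flags and output are not guaranteed):
# Open a shell in an evaluation image, then exercise the harness
docker run -it --rm hwiwonlee/secb.eval.x86_64.mruby.cve-2022-0240 bash
secb build   # build the vulnerable target
secb repro   # attempt to reproduce the vulnerability
secb patch   # apply and test a candidate patch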
SEC-bench uses a hierarchical Docker image structure:
- Base Images: hwiwonlee/secb.base:* - Foundation images with build tools
- Instance Images: hwiwonlee/secb.x86_64.* - Vulnerability-specific environments
- Evaluation Images: hwiwonlee/secb.eval.x86_64.* - Verified evaluation instances
Evaluation images follow the naming convention:
hwiwonlee/secb.eval.x86_64.[project].[vulnerability_id]
Example: hwiwonlee/secb.eval.x86_64.mruby.cve-2022-0240
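As a quick sanity check that an image is usable locally (using the example instance above; the wildcard listing assumes a Docker CLI that supports repository globs):
# Pull the example instance and list all local SEC-bench images
docker pull hwiwonlee/secb.eval.x86_64.mruby.cve-2022-0240
docker images "hwiwonlee/secb*"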
If you use SEC-bench in your research, please cite our paper:
@article{lee2025sec,
title={SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks},
author={Lee, Hwiwon and Zhang, Ziqi and Lu, Hanxiao and Zhang, Lingming},
journal={arXiv preprint arXiv:2506.11791},
year={2025}
}