SEC-bench

Automated Benchmarking of LLM Agents on Real-World Software Security Tasks

Hwiwon Lee¹, Ziqi Zhang¹, Hanxiao Lu², Lingming Zhang¹

¹UIUC, ²Purdue University

📄 Paper 📊 Leaderboard 🤗 Dataset


🎯 Overview

SEC-bench is a comprehensive benchmarking framework designed to evaluate Large Language Model (LLM) agents on real-world software security tasks. It provides automated tools for collecting vulnerability data, building reproducible vulnerability instances, and evaluating agent performance on security-related tasks.

✨ Features

  • πŸ” Automated Benchmark Generation: Automated benchmark generation from OSV database and CVE records by using multi-agentic system
  • 🐳 Containerized Environments: Docker-based reproducible vulnerability instances
  • πŸ€– Agent-oriented Evaluation: Evaluate agents on critical software security tasks (SWE-agent, OpenHands, and Aider are supported)
  • πŸ“Š Comprehensive Security Assessment: Both PoC generation and vulnerability patching assessment with extensibility to other tasks (e.g., fuzzing, static analysis, etc.)
  • πŸ“ˆ Rich Reporting: Detailed progress tracking and result visualization with rich terminal output

🔧 Prerequisites

  • Python: 3.12 or higher
  • Docker: Latest version with sufficient disk space (>200GB recommended)
  • Git: For repository cloning and submodule management
  • Conda: For environment management (recommended)
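A quick sanity check that these tools are available (any output meeting the minimums above is fine):

python3 --version   # should report 3.12 or higher
docker --version
git --version
conda --version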

🚀 Installation

1. Clone the Repository

git clone --recurse-submodules https://github.com/SEC-bench/SEC-bench.git
cd SEC-bench

2. Create Python Environment

conda create -n secb python=3.12
conda activate secb

3. Install Dependencies

pip install -r requirements.txt

🔑 Environment Setup

Configure the following environment variables in your shell profile or .env file:

# Required API tokens
export GITHUB_TOKEN=<your_github_token>
export GITLAB_TOKEN=<your_gitlab_token>
export OPENAI_API_KEY=<your_openai_api_key>
export ANTHROPIC_API_KEY=<your_anthropic_api_key>

# Hugging Face configuration
export HF_TOKEN_PATH=$HOME/.cache/hf_hub_token
export HF_HOME=<path/to/huggingface>
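If you keep these settings in a .env file, one minimal way to load them into your current shell before running SEC-bench commands is sketched below (this assumes a POSIX shell; whether SEC-bench also reads the file on its own is not documented here):

# Load the variables defined in .env into the current shell.
# With the `export ...` lines shown above, sourcing the file is enough;
# `set -a` additionally covers plain KEY=value lines without `export`.
set -a
source .env
set +a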

📖 Usage

πŸ—‚οΈ Data Collection

The data collection process involves three main steps: seed generation, report extraction, and project configuration.

Step 1: Seed Generation

Extract metadata from OSV database files:

python -m secb.preprocessor.seed \
    --input-dir [OSV_DIR] \
    --output-file [SEED_OUTPUT_FILE_PATH] \
    --verbose

Step 2: Report Extraction

Extract bug reports from reference URLs:

python -m secb.preprocessor.report \
    --input-file [SEED_OUTPUT_FILE_PATH] \
    --output-file [REPORT_OUTPUT_FILE_PATH] \
    --reports-dir [REPORTS_DIR] \
    --lang [LANGUAGE] \
    --type [TYPE] \
    --whitelist [WHITELIST_PROJECTS] \
    --blacklist [BLACKLIST_PROJECTS] \
    --oss-fuzz

Step 3: Project Configuration

Generate project configurations for vulnerability reproduction:

python -m secb.preprocessor.project \
    --input-file [REPORT_OUTPUT_FILE_PATH] \
    --output-file [PROJECT_OUTPUT_FILE_PATH] \
    --tracking-file [TRACKING_FILE_PATH] \
    --verbose

πŸ› οΈ Simplified Collection with Script

Use the provided script for streamlined processing:

./run_preprocessor.sh <mode> [options]

Available modes:

  • seed: Parse CVE/OSV files and extract relevant information
  • report: Extract bug descriptions from reference URLs
  • project: Generate project configurations for reproducing vulnerabilities

Example workflows:

Note

Download the OSV database and place it in the output/osv directory. The example workflows below target C/C++ vulnerabilities; for other languages, pass the appropriate --lang and --type values.

# Basic seed generation
./run_preprocessor.sh seed --input-dir ./output/osv --output-file ./output/seed.jsonl

# Filter for C/C++ CVEs in OSS-Fuzz projects
./run_preprocessor.sh report \
    --input-file ./output/seed.jsonl \
    --type CVE \
    --oss-fuzz \
    --lang C,C++

# Generate minimal project configurations
./run_preprocessor.sh project \
    --input-file ./output/report-cve-oss-c-cpp.jsonl \
    --sanitizer-only \
    --minimal

πŸ—οΈ Instance Building

Build Base Images

Create foundational Docker images:

python -m secb.preprocessor.build_base_images

Build Instance Images

Create specific vulnerability instances:

# Build specific instance
python -m secb.preprocessor.build_instance_images \
    --input-file [PROJECT_OUTPUT_FILE] \
    --ids [INSTANCE_IDS]

# Example: Build OpenJPEG CVE instance
python -m secb.preprocessor.build_instance_images \
    --input-file ./output/project-cve-oss-c-cpp-sanitizer-minimal.jsonl \
    --ids openjpeg.cve-2024-56827

# Example: Build all GPAC CVE instances
python -m secb.preprocessor.build_instance_images \
    --input-file ./output/project-cve-oss-c-cpp-sanitizer-minimal.jsonl \
    --filter gpac.cve

✅ Verification

Verify built instances using the SecVerifier repository. This step ensures that vulnerability instances are correctly configured and reproducible.
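A minimal sketch of that step, assuming SecVerifier lives under the same GitHub organization (the exact URL and verification workflow are documented in the SecVerifier repository itself):

# Hypothetical clone location; follow SecVerifier's own README for the actual workflow
git clone https://github.com/SEC-bench/SecVerifier.git
cd SecVerifier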

🧪 Evaluation

Option 1: Use Pre-built Images

Access verified evaluation images from Docker Hub:

docker pull hwiwonlee/secb.eval.x86_64.[instance_name]
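For example, pulling the evaluation image named later in this README:

docker pull hwiwonlee/secb.eval.x86_64.mruby.cve-2022-0240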

Option 2: Build Evaluation Images

python -m secb.evaluator.build_eval_instances \
    --input-dir [VERIFIED_INSTANCE_DIR]

Run Evaluation

python -m secb.evaluator.eval_instances \
    --input-dir [AGENT_OUTPUT_DIR] \
    --type [TYPE] \
    --split [SPLIT] \
    --agent [AGENT] \
    --num-workers [NUM_WORKERS] \
    --output-dir [OUTPUT_DIR]

Parameters:

  • type: Evaluation type (patch or poc)
  • split: Dataset split to evaluate
  • agent: Agent type (swea, oh, aider)
  • num-workers: Number of parallel workers
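For example, a hypothetical invocation that evaluates PoC results produced by SWE-agent (the directory names and split value below are placeholders, not paths or splits defined by SEC-bench):

python -m secb.evaluator.eval_instances \
    --input-dir ./output/swea_trajectories \
    --type poc \
    --split test \
    --agent swea \
    --num-workers 4 \
    --output-dir ./output/eval_results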

📊 Results Viewing

Patch Results

python -m secb.evaluator.view_patch_results \
    --agent [AGENT] \
    --input-dir [EVALUATION_OUTPUT_DIR]

PoC Results

python -m secb.evaluator.view_poc_results \
    --agent [AGENT] \
    --input-dir [EVALUATION_OUTPUT_DIR]

🐳 Docker Images

Note

The Docker evaluation images ship with a secb harness that provides options such as build, repro, and patch.

SEC-bench uses a hierarchical Docker image structure:

  • Base Images: hwiwonlee/secb.base:* - Foundation images with build tools
  • Instance Images: hwiwonlee/secb.x86_64.* - Vulnerability-specific environments
  • Evaluation Images: hwiwonlee/secb.eval.x86_64.* - Verified evaluation instances

Image Naming Convention

hwiwonlee/secb.eval.x86_64.[project].[vulnerability_id]

Example: hwiwonlee/secb.eval.x86_64.mruby.cve-2022-0240
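A minimal sketch of inspecting the harness interactively, assuming the evaluation image provides a shell, that secb is on PATH, and that build, repro, and patch are invoked as subcommands (none of which is documented in detail here):

# Start an interactive shell inside a verified evaluation image
docker run -it --rm hwiwonlee/secb.eval.x86_64.mruby.cve-2022-0240 /bin/bash

# Inside the container (subcommand behavior assumed from the note above):
secb build    # build the vulnerable target
secb repro    # attempt to reproduce the vulnerability
secb patch    # exercise the patching workflow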


📚 Citation

If you use SEC-bench in your research, please cite our paper:

@article{lee2025sec,
  title={SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks},
  author={Lee, Hwiwon and Zhang, Ziqi and Lu, Hanxiao and Zhang, Lingming},
  journal={arXiv preprint arXiv:2506.11791},
  year={2025}
}
