
LongBench Evaluation with TensorRT-LLM and Sparse Attention

This directory contains evaluation scripts for the LongBench v1 datasets using the TensorRT-LLM backend.

Note: LongBench v2 evaluation has been integrated into trtllm-eval. See tensorrt_llm/evaluate/longbench_v2.py for details.

Environment Setup

1. Clone LongBench Repository

First, clone the LongBench repository, which contains the datasets and evaluation utilities:

git clone https://github.com/THUDM/LongBench.git

2. Install Requirements

Install the required dependencies:

pip install -r requirements.txt

3. Directory Structure

After cloning, your directory structure should look like this:

sparse_attention/
├── eval_longbench_v1.py          # LongBench v1 evaluation script
├── README.md                     # This file
└── LongBench/                    # Cloned LongBench repository
    ├── LongBench/                # LongBench v1 data and configs
    │   ├── config/
    │   └── ...
    ├── ...
    └── requirements.txt
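
A quick way to confirm the layout before running an evaluation is to check that the paths from the tree above exist. The following is a minimal sketch; the helper name missing_paths and the choice of which paths to check are illustrative assumptions, not part of the evaluation script.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to the directory
# that holds eval_longbench_v1.py. The selection is illustrative.
EXPECTED_PATHS = (
    "eval_longbench_v1.py",
    "LongBench/LongBench/config",
    "LongBench/requirements.txt",
)


def missing_paths(root="."):
    """Return the expected paths that do not exist under root (hypothetical helper)."""
    base = Path(root)
    return [p for p in EXPECTED_PATHS if not (base / p).exists()]
```

Calling missing_paths() from the script directory returns an empty list when the clone is in place, and otherwise lists what still needs to be set up.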

Scripts Overview

The script eval_longbench_v1.py evaluates models on the LongBench v1 dataset, which includes tasks such as narrativeqa, qasper, and multifieldqa. Key features:

Dataset: LongBench v1 with task-specific evaluation
Tasks: Support for 20+ different long-context tasks
Prompts: Task-specific prompts from LongBench v1 configuration
Metrics: Task-specific metrics (F1, ROUGE, classification scores, etc.)
Output: Task-level results with comprehensive summary statistics
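
To make the F1 metric mentioned above concrete, here is a simplified sketch of a token-level F1 score of the kind commonly used for QA tasks. It is an illustration only: the function name token_f1 is hypothetical, and the metric code that ships with LongBench may apply additional answer normalization (for example, punctuation or article handling) that is omitted here.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-overlap F1 between two strings (simplified, illustrative)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For instance, a prediction of "the cat" against a reference of "the cat sat" has precision 1 and recall 2/3, which gives an F1 of 0.8.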

Usage Examples

Basic Usage (Standard Attention)

python eval_longbench_v1.py \
    --model_path "/path/to/your/model" \
    --longbench_path ./LongBench \
    --output_dir results/v1_vanilla \
    --attention_backend VANILLA \
    --backend pytorch

Specific Tasks with Sparse Attention (RocketKV)

python eval_longbench_v1.py \
    --model_path "/path/to/your/model" \
    --longbench_path ./LongBench \
    --dataset narrativeqa qasper \
    --output_dir results/v1_rocket \
    --attention_backend VANILLA \
    --backend pytorch \
    --rocket_sparse
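
When comparing standard and sparse attention on the same tasks, it can help to generate both command lines from one place. The sketch below assembles the argument list using only the flags shown in the two examples above; the function build_eval_command and its parameter names are hypothetical, and the script may accept options not covered here.

```python
import shlex


def build_eval_command(model_path, output_dir, tasks=None,
                       rocket_sparse=False, longbench_path="./LongBench"):
    """Assemble an eval_longbench_v1.py command line (hypothetical helper)."""
    cmd = [
        "python", "eval_longbench_v1.py",
        "--model_path", model_path,
        "--longbench_path", longbench_path,
    ]
    if tasks:
        # --dataset takes one or more task names, as in the example above.
        cmd += ["--dataset", *tasks]
    cmd += [
        "--output_dir", output_dir,
        "--attention_backend", "VANILLA",
        "--backend", "pytorch",
    ]
    if rocket_sparse:
        cmd.append("--rocket_sparse")
    return cmd


def as_shell(cmd):
    """Render the argument list as a single shell-quoted string."""
    return shlex.join(cmd)
```

The returned list can be passed to subprocess.run, or printed with as_shell for copy and paste.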

Output Structure

results/v1_experiment/
├── config.json                          # Experiment configuration
├── overall_summary.json                 # Overall experiment summary
├── narrativeqa/
│   ├── narrativeqa_results.jsonl       # Detailed results
│   ├── narrativeqa_summary.json        # Task summary
│   └── pred/
│       └── narrativeqa.jsonl           # Predictions in LongBench format
├── qasper/
│   └── ...
└── ...
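
The per-task summary files in this layout can be gathered into one dictionary for side-by-side inspection. This is a minimal sketch that relies only on the file naming shown in the tree above; the helper load_task_summaries is hypothetical, and it makes no assumption about the keys inside each JSON file.

```python
import json
from pathlib import Path


def load_task_summaries(output_dir):
    """Map task name -> parsed <task>_summary.json (hypothetical helper)."""
    summaries = {}
    # Only files one level down match, so the top-level
    # overall_summary.json is not picked up.
    for path in sorted(Path(output_dir).glob("*/*_summary.json")):
        with path.open(encoding="utf-8") as f:
            summaries[path.parent.name] = json.load(f)
    return summaries
```

For example, load_task_summaries("results/v1_experiment") would return one entry per task directory that contains a summary file.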
