The Compressed Medical LLM Benchmark (CMedBench) is a comprehensive benchmark designed to evaluate the performance of compressed large language models (LLMs) in medical applications. It provides an in-depth analysis across multiple tracks to assess model efficiency, accuracy, and trustworthiness in medical contexts.
Clone the repository and set up the environment:
```bash
git clone https://github.com/Tabrisrei/CMedBench.git
cd CMedBench
conda create -n cmedbench python=3.10
conda activate cmedbench
cd TrustLLM/trustllm_pkg
pip install -e .
cd ../../opencompass
pip install -e .
pip install vllm pynvml
```
- Add your API token to `PycrawlersDownload.py`.
- Run the download script:
```bash
python PycrawlersDownload.py
```
Tips:
- None of the MMLU dataset copies on Hugging Face parse correctly, so we use the OpenCompass dataset reader. Please download the dataset from https://people.eecs.berkeley.edu/~hendrycks/data.tar (see the example below for one way to fetch and unpack it).
- Alternatively, download the dataset zip file from our GitHub repository and unzip it in the project folder to access the Track 1, 2, and 4 datasets.
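For example, a minimal way to fetch and unpack the MMLU archive; the `data/` target directory here is an assumption, so extract it to wherever your dataset paths point:

```bash
# Fetch the raw MMLU archive and unpack it.
# The data/ destination is an assumption; use the directory that your
# OpenCompass dataset configuration expects.
wget https://people.eecs.berkeley.edu/~hendrycks/data.tar
mkdir -p data
tar -xf data.tar -C data
```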
Unzip the trustworthiness dataset:
```bash
unzip TrustLLM/dataset/dataset.zip
```

This repository includes scripts to evaluate LLMs across five tracks. Ensure the LLM to be tested is prepared before running evaluations.
- Update the dataset and model paths in the configuration file: `opencompass/configs/xperiments`
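If it helps, a quick way to locate the hard-coded locations to edit; this assumes the relevant config entries contain the substring "path":

```bash
# List hard-coded dataset/model locations in the experiment configs.
# Assumes the relevant entries contain the substring "path".
grep -rn "path" opencompass/configs/
```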
- Modify the log and result paths in: `opencompass/scripts/launcher.sh`
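A hypothetical sketch of the kind of assignments to look for; the actual variable names in `launcher.sh` may differ:

```bash
# Hypothetical path variables inside scripts/launcher.sh; the real
# script may name or structure them differently.
LOG_DIR=/your/path/to/logs
RESULT_DIR=/your/path/to/results
```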
- Run the evaluation:
```bash
cd opencompass
bash scripts/launcher.sh
```
- Update the paths in the generation and evaluation scripts: `TrustLLM/run_generation.py` and `TrustLLM/run_evaluation.py`
- Generate LLM results:
```bash
cd TrustLLM
python run_generation.py
```
- After generation completes, calculate metrics:
```bash
python run_evaluation.py
```
Note: The generation process may take significant time. Consider using `nohup` or `tmux` to run it in the background.
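For instance, with `nohup` (the log filename here is an arbitrary choice):

```bash
# Run generation in the background and keep its output in a log file;
# "generation.log" is an arbitrary filename.
nohup python run_generation.py > generation.log 2>&1 &
```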
- Update the path in the efficiency evaluation script: `track5_efficiency.py`
- Run the script to evaluate model efficiency.
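A minimal invocation, assuming the script takes no required command-line arguments (check `track5_efficiency.py` for any configurable options):

```bash
# Track 5: evaluate model efficiency. Assumes no required CLI arguments;
# inspect track5_efficiency.py for configurable options.
python track5_efficiency.py
```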