Alab-NII/llm-judge-extract-qa


LLM-as-a-Judge for Extractive QA Datasets

This repository accompanies the paper: LLM-as-a-Judge: Reassessing the Performance of LLMs in Extractive QA

Reproduction of Results

Table 2: Pearson correlation coefficients

python3 get_correlation_score.py

Table 3: Automatic evaluation scores (EM and F1) and LLM-as-a-judge scores

python3 run_eval.py

Running the pipeline

Step 1: Run the QA Task

python3 run_qa.py

Post-process the generated responses from the QA task:

python3 postprocess_qa.py

Step 2: Run LLM-as-a-judge

python3 run_judge.py
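run_judge.py's prompt and parsing are defined in the repository; purely as an illustrative sketch of the LLM-as-a-judge pattern, a judge call usually builds a prompt from the question, gold answer, and prediction, then maps the model's reply to a binary verdict. The prompt wording and function names below are hypothetical, not the paper's actual prompt.

```python
def build_judge_prompt(question: str, gold: str, prediction: str) -> str:
    """Assemble a hypothetical grading prompt for the judge model."""
    return (
        "You are grading an extractive QA system.\n"
        f"Question: {question}\n"
        f"Gold answer: {gold}\n"
        f"Predicted answer: {prediction}\n"
        "Does the prediction convey the same answer as the gold answer? "
        "Reply with exactly one word: correct or incorrect."
    )

def parse_judge_verdict(reply: str) -> int:
    """Map the judge's free-text reply to 1 (correct) or 0 (incorrect).

    Checks for 'incorrect' first, since 'correct' is a substring of it.
    """
    reply = reply.lower()
    if "incorrect" in reply:
        return 0
    return 1 if "correct" in reply else 0
```

The substring-order check in `parse_judge_verdict` matters because a naive `"correct" in reply` test would also match the reply "incorrect".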

Step 3: Evaluation

python3 run_eval.py
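run_eval.py reports EM and F1 alongside the judge scores. For reference, the standard SQuAD-style definitions of these metrics can be sketched as follows (this is the conventional formulation, which the repository's implementation may refine):

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation,
    drop articles (a/an/the), and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))

def f1_score(pred: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_toks = normalize(pred).split()
    gold_toks = normalize(gold).split()
    if not pred_toks or not gold_toks:
        return float(pred_toks == gold_toks)
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```

For example, `exact_match("The Eiffel Tower", "eiffel tower")` is 1.0 because normalization removes the article and case, while `f1_score` gives partial credit for token overlap.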

Calculate correlation scores:

python3 get_correlation_score.py
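get_correlation_score.py reports Pearson correlation coefficients (Table 2). As a self-contained sketch of the underlying computation, the Pearson r between two equal-length score lists (e.g. automatic scores vs. judge scores per example) is:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length
    sequences of scores: covariance divided by the product of
    standard deviations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For example, perfectly linearly related scores give r = 1.0, and perfectly anti-correlated scores give r = -1.0; the repository's script presumably wraps an equivalent computation (e.g. via scipy.stats.pearsonr) over its score files.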

Data files include:
