Project for an advanced lab investigating LLM benchmarks from an information retrieval (IR) perspective. Instead of focusing on model performance, we evaluated benchmark robustness: which questions truly differentiate models, and whether leaderboard rankings reflect real differences or are dominated by easy, high-hubness items.
g4ix/advLab1-HITS
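The repository name suggests the HITS (hubs and authorities) algorithm, so here is a minimal sketch of how HITS could be run on a bipartite model-question graph. This is an illustration under stated assumptions, not the project's actual code: the function name `hits_scores`, the encoding of an edge as "model answered the question correctly", and the toy matrix are all hypothetical.

```python
import math


def hits_scores(correct, n_iter=100, tol=1e-12):
    """Power-iteration HITS on a bipartite model-question graph.

    Assumed encoding (hypothetical): correct[m][q] is 1 if model m answered
    question q correctly, else 0. Models play the hub role and questions
    the authority role. Returns (model_scores, question_scores), each
    L2-normalised.
    """
    n_models = len(correct)
    n_questions = len(correct[0]) if correct else 0
    hubs = [1.0] * n_models
    auths = [1.0] * n_questions

    for _ in range(n_iter):
        # Authority update: a question's score is the sum of the scores
        # of the models that answered it correctly.
        new_auths = [
            sum(correct[m][q] * hubs[m] for m in range(n_models))
            for q in range(n_questions)
        ]
        # Hub update: a model's score is the sum of the scores of the
        # questions it answered correctly.
        new_hubs = [
            sum(correct[m][q] * new_auths[q] for q in range(n_questions))
            for m in range(n_models)
        ]
        a_norm = math.sqrt(sum(a * a for a in new_auths))
        h_norm = math.sqrt(sum(h * h for h in new_hubs))
        if a_norm == 0 or h_norm == 0:
            break  # graph has no edges; keep the previous scores
        new_auths = [a / a_norm for a in new_auths]
        new_hubs = [h / h_norm for h in new_hubs]

        delta = max(
            abs(x - y) for x, y in zip(new_hubs + new_auths, hubs + auths)
        )
        hubs, auths = new_hubs, new_auths
        if delta < tol:
            break

    return hubs, auths


# Toy, made-up data: 3 models (rows) x 4 questions (columns).
correct = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
model_scores, question_scores = hits_scores(correct)
```

Under this encoding, a question that every model answers correctly collects the highest score, which is one way an easy item could come to dominate a ranking. A question that no model answers scores zero, and so contributes nothing to separating the models.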