Hello! We are implementing a benchmark evaluation using medmcqa, and we believe we have found an issue with the underlying dataset that we wanted to bring to your attention: many multiple-choice questions have duplicate choices.
We found 692 examples from the training set and 21 examples from the test set that have this issue.
Here are 10 ids of offending examples from the training set:
- 476a3ecd-7c42-4c85-9982-1ce80c95ab82
- 9f553c15-928f-41f8-8e94-021521702b9b
- 6d893f23-4404-4711-97df-e266c407ecdc
- 1154e512-eec5-4eae-b944-3de530532c4e
- deb53386-ca4b-48e0-b6de-489537df647b
- 5ba3d7de-9e3f-42cf-9ba8-7330fd1c1701
- 6c110742-768c-4dbd-8d12-7fa08d8d7d9c
- 67fddd3c-c80c-46e1-b28e-90b27214be8d
- 0be9175d-db8c-4da5-840c-38cb1060028d
- c5b72144-dbe1-47a6-8312-cdb42994bb01
In some cases, the correct answer is duplicated.
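For anyone who wants to reproduce a check like this, here is a minimal sketch of one way to flag such examples. It is not necessarily the exact procedure we used. It assumes each example is a dict with an `id` field, four option fields named `opa`, `opb`, `opc`, `opd`, and a 0-based integer answer index in `cop`; those field names and the normalization (whitespace collapsing plus case folding) are assumptions, so counts may differ from the numbers above depending on how strictly "duplicate" is defined.

```python
from typing import Iterable, List, Mapping, Sequence

# Assumed field names; adjust if your copy of the dataset differs.
OPTION_FIELDS = ("opa", "opb", "opc", "opd")
ANSWER_FIELD = "cop"


def normalize(text) -> str:
    """Collapse whitespace and fold case so trivially different strings match."""
    return " ".join(str(text).split()).casefold()


def normalized_options(example: Mapping, fields: Sequence[str] = OPTION_FIELDS) -> List[str]:
    return [normalize(example[f]) for f in fields]


def has_duplicate_choices(example: Mapping) -> bool:
    """True if at least two options are identical after normalization."""
    options = normalized_options(example)
    return len(set(options)) < len(options)


def correct_answer_is_duplicated(example: Mapping) -> bool:
    """True if the option marked correct also appears under another letter.

    Returns False when the answer index is missing or out of range
    (for example, an unlabeled split).
    """
    options = normalized_options(example)
    index = example.get(ANSWER_FIELD)
    if not isinstance(index, int) or not 0 <= index < len(options):
        return False
    return options.count(options[index]) > 1


def find_offending_ids(examples: Iterable[Mapping]) -> List[str]:
    """Collect the ids of all examples with duplicate choices."""
    return [ex["id"] for ex in examples if has_duplicate_choices(ex)]


if __name__ == "__main__":
    # Hypothetical rows for illustration only; not taken from the dataset.
    rows = [
        {"id": "row-1", "opa": "Aspirin", "opb": "aspirin ", "opc": "Ibuprofen", "opd": "Codeine", "cop": 0},
        {"id": "row-2", "opa": "Liver", "opb": "Kidney", "opc": "Spleen", "opd": "Heart", "cop": 2},
    ]
    print(find_offending_ids(rows))
```

The same functions can be applied to rows loaded by any means (for instance, iterating over a split loaded with the Hugging Face `datasets` library), as long as each row exposes the assumed fields.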