Dataset Issue: 713 examples have duplicate choices #5

@bviggiano

Hello! We are implementing a benchmark evaluation using medmcqa, and we believe we have found an issue with the underlying dataset that we wanted to bring to your attention: many multiple-choice questions have duplicate choices.

We found 692 examples from the training set and 21 examples from the test set that have this issue.

Here are 10 ids of offending examples from the training set:

- 476a3ecd-7c42-4c85-9982-1ce80c95ab82
- 9f553c15-928f-41f8-8e94-021521702b9b
- 6d893f23-4404-4711-97df-e266c407ecdc
- 1154e512-eec5-4eae-b944-3de530532c4e
- deb53386-ca4b-48e0-b6de-489537df647b
- 5ba3d7de-9e3f-42cf-9ba8-7330fd1c1701
- 6c110742-768c-4dbd-8d12-7fa08d8d7d9c
- 67fddd3c-c80c-46e1-b28e-90b27214be8d
- 0be9175d-db8c-4da5-840c-38cb1060028d
- c5b72144-dbe1-47a6-8312-cdb42994bb01

In some cases, the correct answer is duplicated.
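
For anyone who wants to reproduce a check like this, here is a minimal sketch of one way to flag such examples. It makes several assumptions that are not confirmed in this issue: that each example is a dict-like row, that the four options are stored in fields named `opa`, `opb`, `opc`, `opd`, and that `cop` holds a 0-based index of the correct option. The function names are hypothetical. The default is exact string matching; the optional `normalize` flag also treats choices that differ only in case or surrounding whitespace as duplicates, so counts may differ from the ones reported above depending on which rule is used.

```python
from collections import Counter

# Assumed field names for the four options; adjust to the actual schema.
OPTION_FIELDS = ("opa", "opb", "opc", "opd")


def option_texts(example, option_fields=OPTION_FIELDS, normalize=False):
    """Return the option strings of one example, optionally normalized."""
    texts = [str(example[field]) for field in option_fields]
    if normalize:
        # Optional: ignore case and surrounding whitespace when comparing.
        texts = [text.strip().casefold() for text in texts]
    return texts


def duplicated_choices(example, option_fields=OPTION_FIELDS, normalize=False):
    """Return the set of option texts that appear more than once."""
    counts = Counter(option_texts(example, option_fields, normalize))
    return {text for text, n in counts.items() if n > 1}


def find_duplicate_examples(examples, id_field="id", **kwargs):
    """Return the ids of examples that have at least one repeated choice."""
    return [ex[id_field] for ex in examples if duplicated_choices(ex, **kwargs)]


def correct_answer_is_duplicated(example, answer_field="cop",
                                 option_fields=OPTION_FIELDS, normalize=False):
    """Return True if the correct option's text also appears as another option.

    Assumes `answer_field` is a 0-based index into `option_fields`.
    """
    texts = option_texts(example, option_fields, normalize)
    answer = texts[example[answer_field]]
    return texts.count(answer) > 1


if __name__ == "__main__":
    # Made-up toy rows for illustration only; these are not from the dataset.
    toy_rows = [
        {"id": "toy-1", "opa": "Aspirin", "opb": "Ibuprofen",
         "opc": "Aspirin", "opd": "Paracetamol", "cop": 0},
        {"id": "toy-2", "opa": "Heart", "opb": "Lung",
         "opc": "Liver", "opd": "Kidney", "cop": 1},
    ]
    print(find_duplicate_examples(toy_rows))
    print([correct_answer_is_duplicated(row) for row in toy_rows])
```

Running the same functions over each split separately would give per-split counts to compare with the numbers above.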
