Conversation

@bilalaws (Contributor)

Description of changes: This pull request adds the Precision and Recall metrics for the Question Answering task. The existing metrics do not capture cases where one of the target output and the model output is short and the other is long.

For instance, consider the question "Did RMS Titanic sink in 1912?" with the target output "Yes" and the model output "Yes. The ship indeed sank in 1912. It was the largest ship at the time <some long text>". The existing metrics will give a low score even though the answer is correct. The recall metric added in this PR will be 1.0, indicating that all of the target output's words are contained within the model output. The precision metric operates in the opposite direction: it measures what fraction of the words in the model output are found in the target output.
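As a rough illustration on this example, here is a minimal sketch using NLTK's set-based scorers (the same precision and recall helpers the snippets below call); it assumes the strings have already been lowercased and stripped of punctuation, as the QuAC normalization step would do.

from nltk.metrics.scores import precision, recall

# Outputs for "Did RMS Titanic sink in 1912?", assumed already normalized
target_output = "yes"
model_output = "yes the ship indeed sank in 1912"

reference = set(target_output.split(" "))  # {"yes"}
test = set(model_output.split(" "))        # 7 distinct words

# Recall: fraction of target-output words found in the model output.
print(recall(reference=reference, test=test))     # 1.0
# Precision: fraction of model-output words found in the target output.
print(precision(reference=reference, test=test))  # 1/7, about 0.14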

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

"""
if normalize_text: # pragma: no branch
model_output, target_output = (_normalize_text_quac_protocol(text) for text in (model_output, target_output))
ret = precision(reference=set(target_output.split(" ")), test=set(model_output.split(" ")))
Contributor: Why set and not list? Do we want to discard repetitions of words?

Contributor: Valid question, but set seems standard. At least, that's what the NLTK metric assumes we use here.
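To make that concrete, a small illustration (the inputs are made up): NLTK's scorer intersects sets, so a word repeated in the model output is counted only once.

from nltk.metrics.scores import precision

test = set("yes yes no".split(" "))  # {"yes", "no"}: the repeated "yes" collapses
print(precision(reference={"yes"}, test=test))  # 0.5; a list-based count would give 2/3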

"""
if normalize_text: # pragma: no branch
model_output, target_output = (_normalize_text_quac_protocol(text) for text in (model_output, target_output))
ret = recall(reference=set(target_output.split(" ")), test=set(model_output.split(" ")))
Contributor: Same comment as above (set vs. list).

polaschwoebel previously approved these changes Dec 14, 2023
"""
if normalize_text: # pragma: no branch
model_output, target_output = (_normalize_text_quac_protocol(text) for text in (model_output, target_output))
ret = precision(reference=set(target_output.split(" ")), test=set(model_output.split(" ")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid question, but set seems standard. At least that's what the NLTK metric assumes that we use here.

polaschwoebel previously approved these changes Dec 15, 2023
franluca previously approved these changes Dec 15, 2023
malhotra18 previously approved these changes Dec 15, 2023
@xiaoyi-cheng changed the title "Added the precision and recall metrics for QA accuracy" to "feat: added the precision and recall metrics for QA accuracy" on Dec 21, 2023
@bilalaws merged commit 32c089a into main on Jan 12, 2024
@bilalaws deleted the qa_metrics branch on January 17, 2024 at 12:05