feat: add quasi_exact_inclusion metric to factual knowledge; change factual_knowledge score name to exact_inclusion
#302
Conversation
Nice!
Regarding naming: Should we rename the scores FACTUAL_KNOWLEDGE --> EXACT_INCLUSION and FACTUAL_KNOWLEDGE_FUZZY --> QUASI_EXACT_INCLUSION, to be consistent with the rest of the codebase and specifically with QA accuracy?
This way, now that we have more than one score for the factual knowledge evaluation, we would not call both the evaluation itself and the score "factual knowledge".
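For concreteness, a minimal sketch of the proposed renaming, assuming the scores are exposed as module-level string constants; the actual constant names and locations in fmeval may differ:

```python
# Hypothetical sketch of the renaming discussed above; the real constants in
# fmeval may live elsewhere and be spelled differently.

# Old: the evaluation and its scores share the "factual knowledge" name.
FACTUAL_KNOWLEDGE = "factual_knowledge"
FACTUAL_KNOWLEDGE_FUZZY = "factual_knowledge_fuzzy"

# Proposed: score names describe the metric itself, mirroring QA accuracy.
EXACT_INCLUSION = "exact_inclusion"
QUASI_EXACT_INCLUSION = "quasi_exact_inclusion"
```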
```python
from fmeval.eval_algorithms.util import (
    validate_dataset,
    get_dataset_configs,
    normalize_text_quac_protocol,
```
Good idea to move shared functions out, let's do more of this as we continue to modularize the code base. I suggest we add a metrics.py in addition to utils.py for all metric-related computations going forward.
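A minimal sketch of what such a metrics.py could hold, assuming the shared normalization follows the usual QuAC/SQuAD recipe (lowercasing, stripping punctuation and articles, collapsing whitespace); the helper names and exact normalization steps here are illustrative, not the actual fmeval implementation:

```python
# metrics.py (illustrative): shared, dataset-agnostic metric helpers that both
# qa_accuracy and factual_knowledge could import instead of duplicating logic.
import re
import string


def normalize_text(text: str) -> str:
    """Approximate QuAC-style normalization: lowercase, drop punctuation and
    articles, and collapse whitespace. The real normalize_text_quac_protocol
    may differ in details."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_inclusion(model_output: str, target_output: str) -> float:
    """1.0 if the target appears verbatim (case-insensitively) in the model output."""
    return float(target_output.lower() in model_output.lower())


def quasi_exact_inclusion(model_output: str, target_output: str) -> float:
    """1.0 if the normalized target appears in the normalized model output."""
    return float(normalize_text(target_output) in normalize_text(model_output))
```

Keeping helpers like these in one module would let both eval algorithms (and any future scores) compute their values the same way.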
Overall looks good! Please make the naming change and also address Pola's comments.
Thanks for adding the score descriptions. Could you update this list for reporting to use the new factual knowledge score names as well?
Looks good! See small comment regarding score descriptions.
1aee116
Description of changes: I added a new metric to factual knowledge called "quasi_exact_inclusion", which checks whether the target output is included in the model output after both outputs are normalized. Since qa_accuracy and factual_knowledge.py had similar logic in their normalization functions, I moved that logic to util and imported it from there. I updated the EvalScores and outputs to report both metrics, updated the existing unit tests to check both metrics, and added tests for the second metric's output. I also added unit tests covering the behavior of private functions in both qa_accuracy and factual_knowledge, and updated the integration tests for factual_knowledge so they check the second metric as well. Finally, I reviewed the code and updated the documentation of the affected functions and files so that it covers both metrics.
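As an illustration of the intended behavior, a hedged sketch of a single-sample call, assuming FactualKnowledge.evaluate_sample keeps its current signature and now returns both scores under the new names; the score names and values shown are what the description above implies, not verified output:

```python
from fmeval.eval_algorithms.factual_knowledge import FactualKnowledge, FactualKnowledgeConfig

# "<OR>" delimits multiple acceptable target answers.
eval_algo = FactualKnowledge(FactualKnowledgeConfig(target_output_delimiter="<OR>"))

scores = eval_algo.evaluate_sample(
    target_output="the Eiffel Tower",
    model_output="Eiffel Tower, located in Paris.",
)
# Expected shape after this change (illustrative):
# [EvalScore(name="exact_inclusion", value=0.0),        # "the eiffel tower" is not a verbatim substring
#  EvalScore(name="quasi_exact_inclusion", value=1.0)]  # articles/punctuation stripped before checking inclusion
```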
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.