Conversation


@kirupang-code kirupang-code commented Jul 3, 2024

Description of changes: I added a new metric to factual knowledge, "quasi_exact_inclusion", which checks whether the target output is included in the model output after both outputs are normalized. Since qa_accuracy and factual_knowledge.py had similar logic in their normalization functions, I moved this logic to util and imported the function from there. I updated EvalScores and the outputs so they report both metrics. I also updated the unit tests to check for both metrics and added more tests covering the second metric's output, then added further unit tests for the private functions in both qa_accuracy and factual_knowledge. Lastly, I updated the integration tests for factual_knowledge so that they also check the second metric. I reviewed the code and updated the documentation of the affected functions/files to cover both metrics.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
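The metric described above could be sketched roughly as follows. The normalization here approximates the QuAC-style protocol referenced in the diff (lowercasing, stripping punctuation, removing articles, collapsing whitespace); the function names and exact normalization steps are illustrative assumptions, not fmeval's actual implementation.

```python
import re
import string


def normalize_text(text: str) -> str:
    """QuAC-style normalization (assumed): lowercase, strip punctuation,
    drop articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def quasi_exact_inclusion(model_output: str, target_output: str) -> float:
    """Return 1.0 if the normalized target appears anywhere in the
    normalized model output, else 0.0."""
    return float(normalize_text(target_output) in normalize_text(model_output))
```

With this sketch, a verbose answer like "The answer is Paris, France." would still score 1.0 against the target "Paris", which exact-match scoring would miss.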

@kirupang-code kirupang-code changed the title Added metric to factual knowledge + unit/integration tests [Feature Request] Added metric to factual knowledge + unit/integration tests Jul 3, 2024
polaschwoebel previously approved these changes Jul 5, 2024

@polaschwoebel polaschwoebel left a comment


Nice!

Regarding naming: Should we rename the scores

  • FACTUAL_KNOWLEDGE --> EXACT_INCLUSION and
  • FACTUAL_KNOWLEDGE_FUZZY --> QUASI_EXACT_INCLUSION

to be consistent with the rest of the codebase and specifically QA accuracy?
This way, now that we have more than one score for the factual knowledge evaluation, we would not call both the evaluation itself and the score "factual knowledge".

from fmeval.eval_algorithms.util import (
    validate_dataset,
    get_dataset_configs,
    normalize_text_quac_protocol,
)

Good idea to move shared functions out, let's do more of this as we continue to modularize the code base. I suggest we add a metrics.py in addition to utils.py for all metric-related computations going forward.

@xiaoyi-cheng xiaoyi-cheng changed the title [Feature Request] Added metric to factual knowledge + unit/integration tests feat: Added metric to factual knowledge + unit/integration tests Jul 8, 2024

@xiaoyi-cheng xiaoyi-cheng left a comment


Overall looks good! Please make the naming change and also address Pola's comments.

@xiaoyi-cheng xiaoyi-cheng changed the title feat: Added metric to factual knowledge + unit/integration tests feat: add quasi_exact_inclusion metric to factual knowledge; change factual_knowledge score name to exact_inclusion Jul 9, 2024
xiaoyi-cheng previously approved these changes Jul 9, 2024
oyangz previously approved these changes Jul 9, 2024
@kirupang-code kirupang-code dismissed stale reviews from oyangz and xiaoyi-cheng via d7e5fa5 July 9, 2024 17:53

oyangz commented Jul 9, 2024

Thanks for adding the score descriptions, could you update this list for reporting to use the new factual knowledge score names as well?

xiaoyi-cheng previously approved these changes Jul 9, 2024
oyangz previously approved these changes Jul 10, 2024
polaschwoebel previously approved these changes Jul 10, 2024

@polaschwoebel polaschwoebel left a comment


Looks good! See small comment regarding score descriptions.
