feat: add quasi_exact_inclusion metric to factual knowledge; change factual_knowledge score name to exact_inclusion
#302
Conversation
Nice!
Regarding naming: Should we rename the scores FACTUAL_KNOWLEDGE --> EXACT_INCLUSION and FACTUAL_KNOWLEDGE_FUZZY --> QUASI_EXACT_INCLUSION, to be consistent with the rest of the codebase and specifically with QA accuracy?
This way, now that we have more than one score for the factual knowledge evaluation, we would not call both the evaluation itself and the score "factual knowledge".
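For concreteness, a minimal sketch of the proposed renaming, assuming the scores are exposed as module-level string constants; the actual constant names and locations in fmeval may differ:

```python
# Hypothetical sketch of the renaming discussed above; the real constants in
# fmeval may live elsewhere and be spelled differently.

# Old: the evaluation and its scores share the "factual knowledge" name.
FACTUAL_KNOWLEDGE = "factual_knowledge"
FACTUAL_KNOWLEDGE_FUZZY = "factual_knowledge_fuzzy"

# Proposed: score names describe the metric itself, mirroring QA accuracy.
EXACT_INCLUSION = "exact_inclusion"
QUASI_EXACT_INCLUSION = "quasi_exact_inclusion"
```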
```python
from fmeval.eval_algorithms.util import (
    validate_dataset,
    get_dataset_configs,
    normalize_text_quac_protocol,
```
Good idea to move shared functions out, let's do more of this as we continue to modularize the code base. I suggest we add a metrics.py in addition to utils.py for all metric-related computations going forward.
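A minimal sketch of what such a metrics.py could hold, assuming the shared normalization follows the usual QuAC/SQuAD recipe (lowercasing, stripping punctuation and articles, collapsing whitespace); the helper names and exact normalization steps here are illustrative, not the actual fmeval implementation:

```python
# metrics.py (illustrative): shared, dataset-agnostic metric helpers that both
# qa_accuracy and factual_knowledge could import instead of duplicating logic.
import re
import string


def normalize_text(text: str) -> str:
    """Approximate QuAC-style normalization: lowercase, drop punctuation and
    articles, and collapse whitespace. The real normalize_text_quac_protocol
    may differ in details."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_inclusion(model_output: str, target_output: str) -> float:
    """1.0 if the target appears verbatim (case-insensitively) in the model output."""
    return float(target_output.lower() in model_output.lower())


def quasi_exact_inclusion(model_output: str, target_output: str) -> float:
    """1.0 if the normalized target appears in the normalized model output."""
    return float(normalize_text(target_output) in normalize_text(model_output))
```

Keeping helpers like these in one module would let both eval algorithms (and any future scores) compute their values the same way.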
Overall looks good! Please make the naming change and also address Pola's comments.
Thanks for adding the score descriptions. Could you update this list for reporting to use the new factual knowledge score names as well?
Looks good! See small comment regarding score descriptions.
1aee116
Description of changes: I added a new metric to factual knowledge called "quasi_exact_inclusion", which checks whether the target output is included in the model output after both outputs are normalized. Since qa_accuracy and factual_knowledge.py had similar logic in their normalization functions, I moved that logic to util and imported it from there. I updated the EvalScores and outputs to report both metrics, updated the existing unit tests to check both metrics, and added tests for the second metric's output. I also added unit tests covering the behavior of private functions in both qa_accuracy and factual_knowledge, and updated the integration tests for factual_knowledge so they check the second metric as well. Finally, I reviewed the code and updated the documentation of the affected functions and files so that it covers both metrics.
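As an illustration of the intended behavior, a hedged sketch of a single-sample call, assuming FactualKnowledge.evaluate_sample keeps its current signature and now returns both scores under the new names; the score names and values shown are what the description above implies, not verified output:

```python
from fmeval.eval_algorithms.factual_knowledge import FactualKnowledge, FactualKnowledgeConfig

# "<OR>" delimits multiple acceptable target answers.
eval_algo = FactualKnowledge(FactualKnowledgeConfig(target_output_delimiter="<OR>"))

scores = eval_algo.evaluate_sample(
    target_output="the Eiffel Tower",
    model_output="Eiffel Tower, located in Paris.",
)
# Expected shape after this change (illustrative):
# [EvalScore(name="exact_inclusion", value=0.0),        # "the eiffel tower" is not a verbatim substring
#  EvalScore(name="quasi_exact_inclusion", value=1.0)]  # articles/punctuation stripped before checking inclusion
```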
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.