feat: add support for non-deterministic models in GeneralSemanticRobustness and add BERTScore Dissimilarity #184
Conversation
…void out of memory errors
…Add the normalization factor for stochastic models.
I approve. We had multiple rounds of offline discussion; no additional comments.
> without a change in the input. So this evaluation normalizes the robustness score to account for
> the baseline non-determinism. Specifically, if d is a score (Word Error Rate or BERTScore
> Dissimilarity), then the evaluation reports max(0, d - d_base), where d_base measures the
> differences between the model outputs on the same input.
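A minimal sketch of the normalization described in the quoted docs; the function name and arguments are illustrative, not the eval's actual API:

```python
def normalized_robustness_score(perturbed_score: float, baseline_score: float) -> float:
    """Report max(0, d - d_base): d is the dissimilarity (Word Error Rate or
    BERTScore Dissimilarity) between outputs for the original vs. perturbed input,
    and d_base is the dissimilarity between two outputs generated from the same input."""
    return max(0.0, perturbed_score - baseline_score)


# Example: a deterministic model has baseline_score == 0, so the score is unchanged;
# a stochastic model's baseline drift is subtracted out.
normalized_robustness_score(0.35, 0.10)  # -> 0.25
```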
Nice explanation!
…ents of get_meteor_score and get_bert_score out of kwargs.
No scientific changes since last review.
Could you add a short metric description in the constants here for reporting?
Discussed with Bilal offline; will add the description in a follow-up PR.
Description of changes:
This PR implements the following changes to the `GeneralSemanticRobustness` eval:

- In addition to the `Word Error Rate` metric, which measures syntactic differences, we add the `BERTScore Dissimilarity` metric, which measures semantic differences. We use `BERTScore Dissimilarity = 1 - BERTScore` (a dissimilarity metric) instead of `BERTScore` (a similarity metric). We use dissimilarity to be consistent with `Word Error Rate` and the rest of the `SemanticRobustness` evals, which measure dissimilarities.
- We normalize both `BERTScore Dissimilarity` and `Word Error Rate` when the model is non-deterministic.
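As a rough sketch of the new metric (not the PR's actual implementation, which goes through the repo's `get_bert_score` helper), the dissimilarity could be computed with the open-source `bert_score` package like this:

```python
from bert_score import score  # open-source BERTScore implementation


def bertscore_dissimilarity(model_outputs, reference_outputs, lang="en"):
    """BERTScore F1 is a similarity score, so the reported metric is its
    complement: higher values indicate larger semantic differences."""
    _, _, f1 = score(model_outputs, reference_outputs, lang=lang)
    return (1.0 - f1).mean().item()
```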