@bilalaws commented Feb 7, 2024

Description of changes:

This PR implements the following changes to GeneralSemanticRobustness eval:

  1. To complement the Word Error Rate metric, which measures syntactic differences, we add the BERTScore Dissimilarity metric, which measures semantic differences. We report BERTScore Dissimilarity = 1 - BERTScore (a dissimilarity) rather than BERTScore (a similarity) to stay consistent with Word Error Rate and the other SemanticRobustness evals, which also measure dissimilarities.
  2. We normalize the BERTScore Dissimilarity and Word Error Rate when the model is non-deterministic.
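
As a rough illustration of the two metrics listed above, the sketch below computes a word-level error rate via edit distance and converts a similarity score into a dissimilarity. The function names are hypothetical and are not taken from this PR; the BERTScore value is passed in as a plain number rather than computed by a model, and the handling of an empty reference is a convention chosen only for this sketch.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        # Assumed convention for this sketch: 0.0 if both are empty, else 1.0.
        return float(bool(hyp))
    # Standard Levenshtein dynamic program, kept to two rows.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,  # deletion
                    curr[j - 1] + 1,  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


def bertscore_dissimilarity(bertscore: float) -> float:
    """Turn a similarity score into a dissimilarity: 1 - BERTScore."""
    return 1.0 - bertscore
```

With both metrics expressed as dissimilarities, a larger value always means the perturbed output drifted further from the original output.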

@xiaoyi-cheng changed the title from "Add the BERTScore computation to general semantic robustness" to "feat: add the BERTScore computation to general semantic robustness" on Feb 8, 2024
@bilalaws changed the title from "feat: add the BERTScore computation to general semantic robustness" to "feat: add support for non-deterministic models in GeneralSemanticRobustness and add BERTScore Dissimilarity" on Feb 15, 2024
polaschwoebel previously approved these changes Feb 16, 2024

@polaschwoebel left a comment

I approve. We had multiple rounds of offline discussion, no additional comments.

without a change in the input. So this evaluation normalizes the robustness score to account for
the baseline non-determinism. Specifically, if d is a score (Word Error Rate or BERTScore
Dissimilarity), then the evaluation reports max(0, d - d_base) where d_base measures the
differences between the model output on the same input.

Nice explanation!
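
The normalization in the excerpt above, max(0, d - d_base), can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, and averaging the metric over all pairs of repeated outputs is one plausible way to obtain d_base, not necessarily how this PR computes it.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Sequence


def baseline_dissimilarity(
    outputs: Sequence[str], metric: Callable[[str, str], float]
) -> float:
    """Assumed aggregation: mean metric over all pairs of outputs for one input."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 0.0  # fewer than two outputs: no baseline variation to measure
    return mean(metric(a, b) for a, b in pairs)


def normalized_score(d: float, d_base: float) -> float:
    """Subtract the baseline non-determinism and clip at zero."""
    return max(0.0, d - d_base)
```

Clipping at zero keeps the reported score non-negative when the outputs on the unperturbed input already differ more than the outputs on the perturbed input.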

…ents of get_meteor_score and get_bert_score out of kwargs.
@bilalaws removed the request for review from franluca February 19, 2024 14:49

@polaschwoebel left a comment

No scientific changes since last review.

oyangz previously requested changes Feb 19, 2024

@oyangz left a comment

Could you add a short metric description in the constants here for reporting?

@oyangz dismissed their stale review February 19, 2024 20:35

Discussed with Bilal offline, will add description in a followup PR.

@malhotra18 merged commit 7982a0f into aws:main Feb 19, 2024