feat: add support for non-deterministic models in GeneralSemanticRobustness and add BERTScore Dissimilarity #184
Conversation
…void out of memory errors
…Add the normalization factor for stochastic models.
I approve. We had multiple rounds of offline discussion; no additional comments.
> without a change in the input. So this evaluation normalizes the robustness score to account for
> the baseline non-determinism. Specifically, if d is a score (Word Error Rate or BERTScore
> Dissimilarity), then the evaluation reports max(0, d - d_base), where d_base measures the
> differences between the model outputs on the same input.
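A minimal sketch of the normalization described in the quoted docs; the function name and arguments are illustrative, not the eval's actual API:

```python
def normalized_robustness_score(perturbed_score: float, baseline_score: float) -> float:
    """Report max(0, d - d_base): d is the dissimilarity (Word Error Rate or
    BERTScore Dissimilarity) between outputs for the original vs. perturbed input,
    and d_base is the dissimilarity between two outputs generated from the same input."""
    return max(0.0, perturbed_score - baseline_score)


# Example: a deterministic model has baseline_score == 0, so the score is unchanged;
# a stochastic model's baseline drift is subtracted out.
normalized_robustness_score(0.35, 0.10)  # -> 0.25
```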
Nice explanation!
…ents of get_meteor_score and get_bert_score out of kwargs.
No scientific changes since last review.
Could you add a short metric description in the constants here for reporting?
Discussed with Bilal offline; will add the description in a follow-up PR.
Description of changes:
This PR implements the following changes to the `GeneralSemanticRobustness` eval:

- In addition to the `Word Error Rate` metric, which measures syntactic differences, we add the `BERTScore Dissimilarity` metric, which measures semantic differences. We use `BERTScore Dissimilarity = 1 - BERTScore` (a dissimilarity metric) instead of `BERTScore` (a similarity metric). We use dissimilarity to be consistent with `Word Error Rate` and the rest of the `SemanticRobustness` evals, which measure dissimilarities.
- We normalize both `BERTScore Dissimilarity` and `Word Error Rate` when the model is non-deterministic.
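As a rough sketch of the new metric (not the PR's actual implementation, which goes through the repo's `get_bert_score` helper), the dissimilarity could be computed with the open-source `bert_score` package like this:

```python
from bert_score import score  # open-source BERTScore implementation


def bertscore_dissimilarity(model_outputs, reference_outputs, lang="en"):
    """BERTScore F1 is a similarity score, so the reported metric is its
    complement: higher values indicate larger semantic differences."""
    _, _, f1 = score(model_outputs, reference_outputs, lang=lang)
    return (1.0 - f1).mean().item()
```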