feat: update implementation of SummarizationAccuracySemanticRobustness to use Transform-based approach #233
…s to use Transform-based approach
…fix other unit tests that use invoke_model instead of get_helper_scores
| class TestSummarizationAccuracySemanticRobustness: | ||
| @pytest.mark.parametrize( | ||
| "config, expected_evaluate_sample_scores, expected_evaluate_scores", | ||
| "config, evaluate_sample_scores, evaluate_scores", |
Note that as with GSR, I have verified that the SASR logic remains the same as before by temporarily using the old numpy APIs in the semantic perturbation code, and then running the test cases using the old expected scores.
…n't get loaded preemptively
…be created from them
| self._bertscore = hf_evaluate.load("bertscore") | ||
| self._model_type = model_type | ||
| # Dummy call to download the model within constructor |
Deleting this because loading the model into memory during the evaluate_sample integ tests causes the CodeBuild integ tests to fail (the job just hangs or crashes). Note that I never had any issues locally, even with a several-year-old MacBook Pro.
Note that this doesn't impact the correctness of the algo/helper model at all, as we're going to call compute "for real" when we obtain the first scores anyway.
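For readers following along, the deferred-loading idea can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in this PR: `LazyBertScore`, the `loader` argument, and `fake_loader` are hypothetical names made up for the example, and the stub stands in for whatever `hf_evaluate.load("bertscore")` would return so the snippet runs without downloading a model.

```python
from typing import Callable, Dict, List, Optional

# Type of a compute function shaped like a BERTScore-style metric:
# keyword arguments in, dict of score lists out (assumed shape for this sketch).
ComputeFn = Callable[..., Dict[str, List[float]]]


class LazyBertScore:
    """Hypothetical helper that defers the expensive load until first use."""

    def __init__(self, model_type: str, loader: Callable[[], ComputeFn]):
        self._model_type = model_type
        self._loader = loader
        # Nothing is loaded here: no dummy call in the constructor.
        self._compute: Optional[ComputeFn] = None

    def get_helper_scores(self, target_output: str, model_output: str) -> float:
        # The first real scoring call triggers the load; later calls reuse it.
        if self._compute is None:
            self._compute = self._loader()
        result = self._compute(
            predictions=[model_output],
            references=[target_output],
            model_type=self._model_type,
        )
        return result["f1"][0]


load_calls: List[str] = []


def fake_loader() -> ComputeFn:
    """Stand-in loader (assumption): records each load, returns a toy metric."""
    load_calls.append("load")

    def compute(predictions, references, model_type):
        # Toy score: 1.0 on exact match, else 0.0. Not real BERTScore.
        return {"f1": [1.0 if p == r else 0.0 for p, r in zip(predictions, references)]}

    return compute
```

With this shape, constructing the object is cheap, and the cost of loading is paid once, on the first call to `get_helper_scores`.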
| "Missing required input: model_input, for SummarizationAccuracySemanticRobustness evaluate_sample", | ||
| ) | ||
| util.require( | ||
| transforms = get_model_responses_from_perturbed_inputs( |
Ugh, I left this extraneous variable in from my old code. I will get rid of it in a followup PR, as we need to get the other PRs for the rest of the algos merged asap.
Description of changes:
See title. This PR also fixes a bug introduced in the previous PR where the invoke_model call in BertScore was not replaced with get_helper_scores.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.