feat: update implementation of SummarizationAccuracySemanticRobustness to use Transform-based approach #233

danielezhu · 2024-03-27T01:57:25Z

Description of changes:
See title. This PR also fixes a bug introduced in the previous PR where the invoke_model call in BertScore was not replaced with get_helper_scores.
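To show the kind of regression this fix (and the related unit-test fixes) guards against, here is a minimal sketch of a BertScore-style step that delegates to a helper model's `get_helper_scores` instead of calling `invoke_model`. It also includes a mock that fails loudly if the old call path is still used. `BertScoreStep` and the `get_helper_scores(target_output, model_output)` signature are hypothetical illustrations, not fmeval's actual API.

```python
from unittest.mock import Mock


class BertScoreStep:
    """Hypothetical scoring step that delegates to a BERTScore helper model."""

    def __init__(self, helper_model):
        self.helper_model = helper_model

    def score(self, target_output: str, model_output: str) -> float:
        # Delegate to the helper model's scoring method. The bug described
        # above was a code path that still called invoke_model instead.
        return self.helper_model.get_helper_scores(target_output, model_output)


# spec restricts the mock to get_helper_scores, so any lingering
# invoke_model call raises AttributeError instead of silently passing.
helper = Mock(spec=["get_helper_scores"])
helper.get_helper_scores.return_value = 0.87

step = BertScoreStep(helper)
result = step.score("the cat sat", "a cat sat")
helper.get_helper_scores.assert_called_once_with("the cat sat", "a cat sat")
```

Restricting the mock with `spec` is what lets the test catch the bug. With a plain `Mock`, an `invoke_model` call would auto-create the attribute and succeed silently.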

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

danielezhu · 2024-03-27T02:50:21Z

test/integration/test_summarization_accuracy_semantic_robustness.py

 class TestSummarizationAccuracySemanticRobustness:
    @pytest.mark.parametrize(
-        "config, expected_evaluate_sample_scores, expected_evaluate_scores",
+        "config, evaluate_sample_scores, evaluate_scores",


Note that, as with GSR (GeneralSemanticRobustness), I have verified that the SASR logic is unchanged: I temporarily used the old numpy APIs in the semantic perturbation code and then ran the test cases against the old expected scores.
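The verification described above is essentially a parity test: drive the old and new code paths from the same perturbations and assert that their scores agree. A minimal sketch follows. `perturb`, `toy_score`, `old_scores`, and `new_scores` are hypothetical stand-ins (not fmeval functions), the list of lambdas is only a toy analog of a Transform pipeline, and a seeded `random.Random` plays the role of the pinned perturbation RNG.

```python
import math
import random
from typing import Callable, Dict, List


def perturb(text: str, rng: random.Random, num_perturbations: int = 3, prob: float = 0.3) -> List[str]:
    """Hypothetical perturbation: uppercase each word with probability `prob`."""
    words = text.split()
    return [
        " ".join(w.upper() if rng.random() < prob else w for w in words)
        for _ in range(num_perturbations)
    ]


def toy_score(original: str, perturbed: str) -> float:
    """Stand-in metric: fraction of words left unchanged."""
    a, b = original.split(), perturbed.split()
    return sum(x == y for x, y in zip(a, b)) / len(a)


def old_scores(text: str, seed: int) -> Dict[str, float]:
    # "Old" path: an explicit loop over the perturbed inputs.
    rng = random.Random(seed)
    values = []
    for p in perturb(text, rng):
        values.append(toy_score(text, p))
    return {"delta": 1.0 - sum(values) / len(values)}


def new_scores(text: str, seed: int) -> Dict[str, float]:
    # "New" path: the same computation expressed as a chain of steps.
    rng = random.Random(seed)
    steps: List[Callable] = [
        lambda t: perturb(t, rng),
        lambda ps: [toy_score(text, p) for p in ps],
        lambda vs: {"delta": 1.0 - sum(vs) / len(vs)},
    ]
    out = text
    for step in steps:
        out = step(out)
    return out


def scores_match(actual: Dict[str, float], expected: Dict[str, float], rel_tol: float = 1e-9) -> bool:
    """True if both dicts have the same keys and numerically close values."""
    return actual.keys() == expected.keys() and all(
        math.isclose(actual[k], expected[k], rel_tol=rel_tol) for k in expected
    )
```

Parity holds only because both paths draw from the RNG in the same order. Switching to a different random-number API changes the perturbations, and with them the expected scores. That is presumably why the old numpy APIs had to be restored temporarily before comparing against the old expected scores.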

danielezhu · 2024-03-27T15:21:43Z

src/fmeval/eval_algorithms/helper_models/helper_model.py

        self._bertscore = hf_evaluate.load("bertscore")
        self._model_type = model_type

-        # Dummy call to download the model within constructor


Deleting this because loading the model into memory during the evaluate_sample integ tests causes the CodeBuild integ tests to fail (the job just hangs or crashes). I never had any issues locally, even on a several-year-old MacBook Pro.

This doesn't affect the correctness of the algo or the helper model at all, since we call compute "for real" when we obtain the first scores anyway.
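A minimal sketch of the deferred loading described above. `LazyScorer` and the `loader` callable are hypothetical. The loader stands in for the expensive model setup that the removed dummy call used to trigger in the constructor; in this sketch, that setup happens on the first scoring call instead.

```python
from typing import Callable, Optional

Scorer = Callable[[str, str], float]


class LazyScorer:
    """Hypothetical helper that defers loading its model until first use."""

    def __init__(self, loader: Callable[[], Scorer]):
        self._loader = loader
        self._model: Optional[Scorer] = None
        self.load_count = 0  # for illustration: how many times the loader ran

    def _ensure_loaded(self) -> Scorer:
        # Load on the first call only; later calls reuse the cached model.
        # Note: this simple check is not thread-safe.
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        return self._model

    def get_helper_scores(self, target_output: str, model_output: str) -> float:
        return self._ensure_loaded()(target_output, model_output)


def fake_loader() -> Scorer:
    # Stand-in for an expensive model download/initialization.
    return lambda target, output: float(target == output)
```

The trade-off is that the first scoring call pays the load latency. The scores themselves are the same either way.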

danielezhu · 2024-03-27T15:34:48Z

src/fmeval/eval_algorithms/summarization_accuracy_semantic_robustness.py

-            "Missing required input: model_input, for SummarizationAccuracySemanticRobustness evaluate_sample",
-        )
-        util.require(
+        transforms = get_model_responses_from_perturbed_inputs(


Ugh, I left this extraneous variable in from my old code. I will remove it in a follow-up PR, since we need to get the other PRs for the rest of the algos merged ASAP.

Daniel Zhu added 3 commits March 26, 2024 18:49

feat: update implementation of SummarizationAccuracySemanticRobustness to use Transform-based approach

7e7200c

fix: replace invoke_model with get_helper_scores in BertScore logic

b3c8165

fix: restore accidentally-deleted BertscoreHelperModel unit test and fix other unit tests that use invoke_model instead of get_helper_scores

b1de05f

danielezhu commented Mar 27, 2024

Daniel Zhu added 7 commits March 26, 2024 20:24

remove dummy call in BertscoreHelperModel init so that the model doesn't get loaded preemptively

61c3bc3

Implement serializers for helper models so that shared resources can be created from them

d6dc085

run only SA and SASR integ tests

e7bcf7b

try separating evaluate_sample and evaluate tests

c5abe22

Try using shared resource for both evaluate_sample and evaluate

7f058f3

fix linting and re-enable all integ tests

a98c204

Use original code but run evaluate_sample and evaluate integ tests separately

72d778b
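The serializer commit (d6dc085) is about making helper models reconstructible so that shared resources can be created from them. One common Python technique for this is `__reduce__`, which serializes only the constructor arguments. The heavy state is then rebuilt on the receiving side instead of being pickled. Below is a minimal sketch of that technique; `HelperModel`, `model_type`, and the lambda placeholder are hypothetical, and this is not necessarily how fmeval implements its serializers.

```python
import pickle
from typing import Tuple


class HelperModel:
    """Hypothetical helper model whose heavy state is rebuilt, not pickled."""

    def __init__(self, model_type: str):
        self.model_type = model_type
        # Stand-in for a loaded model. Lambdas cannot be pickled, so default
        # pickling of this object would fail without __reduce__.
        self._model = lambda text: len(text)

    def __reduce__(self) -> Tuple[type, Tuple[str]]:
        # Serialize only the constructor arguments; unpickling calls
        # HelperModel(model_type), which recreates self._model from scratch.
        return (self.__class__, (self.model_type,))

    def score(self, text: str) -> int:
        return self._model(text)


original = HelperModel("example-model-type")
restored = pickle.loads(pickle.dumps(original))
```

Rebuilding from constructor arguments keeps the serialized payload small and sidesteps unpicklable members. The cost is that every deserialization re-runs the constructor, including any loading it does.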

danielezhu commented Mar 27, 2024

nathanng17 approved these changes Mar 27, 2024

malhotra18 approved these changes Mar 27, 2024

danielezhu merged commit 5c931fe into aws:main Mar 27, 2024

danielezhu deleted the sasr branch March 27, 2024 15:50

3 participants
