feat: update implementation of QAAccuracySemanticRobustness to use Transform-based approach #235
Conversation
…ansform-based approach

    (test_case.original_model_output,),
    (test_case.perturbed_model_output_1,),
    (test_case.perturbed_model_output_2,),
    (test_case.original_model_output, None),
The model runner's predict method should return a two-element tuple, so these mocked outputs need a second element.
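For context, a minimal sketch of that contract, assuming the fmeval-style `ModelRunner.predict` return value of `(model_output, log_probability)`; the `None` second element stands in for a log probability the test doesn't use:

```python
from unittest.mock import Mock

# Sketch only: predict() is assumed to return a two-element tuple of
# (model_output, log_probability), so mocked outputs need a second
# element, e.g. (output, None), rather than a one-element tuple (output,).
model_runner = Mock()
model_runner.predict.return_value = ("Some model output.", None)

output, log_probability = model_runner.predict("Some prompt.")
assert output == "Some model output."
assert log_probability is None
```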

    mock_get_results_path.return_value = "/path/to/results"
    model_runner = Mock()

    @pytest.mark.parametrize(
All invalid input cases are handled by evaluate_dataset now.
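A self-contained sketch of the delegation pattern this implies; the names below (`evaluate_dataset`, `EvalAlgorithmClientError`) mirror fmeval's, but the bodies are illustrative stand-ins, not the real implementation:

```python
from typing import Any, Optional

class EvalAlgorithmClientError(ValueError):
    """Stand-in for fmeval's client error type."""

def evaluate_dataset(dataset: Any, model: Optional[Any]) -> list:
    # The shared utility owns all input validation, so individual
    # algorithms no longer need their own invalid-input test cases.
    if model is None:
        raise EvalAlgorithmClientError("A ModelRunner must be provided.")
    if dataset is None:
        raise EvalAlgorithmClientError("A dataset must be provided.")
    return []  # real scoring would happen here

class QAAccuracySemanticRobustnessSketch:
    def evaluate(self, model: Optional[Any], dataset: Any = None) -> list:
        # Delegates validation rather than duplicating it per algorithm.
        return evaluate_dataset(dataset, model)
```

Centralizing validation this way means the invalid-input unit tests live once, next to evaluate_dataset, instead of being repeated in every algorithm's test file.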

            ),
        ],
    )
    def test_qa_accuracy_semantic_robustness_evaluate_sample_with_model_output(self, test_case):
I removed model_output as an argument to evaluate_sample, to be consistent with evaluate. Since semantic robustness algorithms require a model and model inputs anyway, there's no need to complicate things by letting users invoke their model first to get the model output and then pass that output here. We should just get the model output ourselves, with their model.
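A rough sketch of the resulting shape of evaluate_sample, under the assumption that the algorithm now invokes the supplied model runner itself; the perturbations and scoring below are trivial stand-ins:

```python
from typing import List
from unittest.mock import Mock

def evaluate_sample(model_input: str, target_output: str, model) -> List[float]:
    # No precomputed model_output parameter: the algorithm gets the
    # original output itself, which it must do anyway to compare against
    # the outputs for the perturbed inputs.
    original_output, _ = model.predict(model_input)  # two-element tuple
    perturbed_inputs = [model_input + "!", model_input.upper()]  # stand-ins
    perturbed_outputs = [model.predict(p)[0] for p in perturbed_inputs]
    # Trivial stand-in score: agreement of each perturbed output with the original.
    return [float(out == original_output) for out in perturbed_outputs]

model = Mock()
model.predict.return_value = ("London", None)
print(evaluate_sample("What is the capital of England?", "London", model))
```

This keeps evaluate_sample symmetric with evaluate: both accept a ModelRunner and neither accepts a precomputed output.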

    user_provided_prompt_template: Optional[str]
    dataset_prompt_template: str

    @pytest.mark.parametrize(
These test cases don't actually validate the numerical values; they only ensure that the correct function calls are made. evaluate_dataset now handles all of this logic, so we can get rid of them. Notice how all of the scores are just 0.0, since we're mocking everything.
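To illustrate the pattern the removed tests followed, here is a minimal sketch (the helper below is hypothetical): with every collaborator mocked, the assertions can only verify call wiring, and the 0.0 score carries no information about real scoring logic:

```python
from unittest.mock import Mock

def run_evaluation(model, scorer):
    output, _ = model.predict("some input")  # two-element tuple contract
    return scorer(output)

def test_run_evaluation_checks_wiring_only():
    model = Mock()
    model.predict.return_value = ("mocked output", None)
    scorer = Mock(return_value=0.0)  # mocked scorer, so every score is 0.0
    score = run_evaluation(model, scorer)
    model.predict.assert_called_once_with("some input")  # wiring is verified
    scorer.assert_called_once_with("mocked output")
    assert score == 0.0  # but this asserts nothing about real scoring
```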
…om evaluate integ test
Description of changes:
Title
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.