feat: update implementation of QAAccuracySemanticRobustness to use Transform-based approach #235
Conversation
…ansform-based approach

    (test_case.original_model_output,),
    (test_case.perturbed_model_output_1,),
    (test_case.perturbed_model_output_2,),
    (test_case.original_model_output, None),
The model runner's predict method should return a two-element tuple, so these mocked outputs need a second element.
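For context, a minimal sketch of that contract, assuming the fmeval-style `ModelRunner.predict` return value of `(model_output, log_probability)`; the `None` second element stands in for a log probability the test doesn't use:

```python
from unittest.mock import Mock

# Sketch only: predict() is assumed to return a two-element tuple of
# (model_output, log_probability), so mocked outputs need a second
# element, e.g. (output, None), rather than a one-element tuple (output,).
model_runner = Mock()
model_runner.predict.return_value = ("Some model output.", None)

output, log_probability = model_runner.predict("Some prompt.")
assert output == "Some model output."
assert log_probability is None
```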

    mock_get_results_path.return_value = "/path/to/results"
    model_runner = Mock()

    @pytest.mark.parametrize(
All invalid input cases are handled by evaluate_dataset now.
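A self-contained sketch of the delegation pattern this implies; the names below (`evaluate_dataset`, `EvalAlgorithmClientError`) mirror fmeval's, but the bodies are illustrative stand-ins, not the real implementation:

```python
from typing import Any, Optional

class EvalAlgorithmClientError(ValueError):
    """Stand-in for fmeval's client error type."""

def evaluate_dataset(dataset: Any, model: Optional[Any]) -> list:
    # The shared utility owns all input validation, so individual
    # algorithms no longer need their own invalid-input test cases.
    if model is None:
        raise EvalAlgorithmClientError("A ModelRunner must be provided.")
    if dataset is None:
        raise EvalAlgorithmClientError("A dataset must be provided.")
    return []  # real scoring would happen here

class QAAccuracySemanticRobustnessSketch:
    def evaluate(self, model: Optional[Any], dataset: Any = None) -> list:
        # Delegates validation rather than duplicating it per algorithm.
        return evaluate_dataset(dataset, model)
```

Centralizing validation this way means the invalid-input unit tests live once, next to evaluate_dataset, instead of being repeated in every algorithm's test file.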

            ),
        ],
    )
    def test_qa_accuracy_semantic_robustness_evaluate_sample_with_model_output(self, test_case):
I removed model_output as an argument to evaluate_sample, to be consistent with evaluate. Since semantic robustness algorithms require a model and model inputs anyway, there's no need to complicate things by letting users invoke their model first to get the model output and then pass that output here. We should just get the model output ourselves, with their model.
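A rough sketch of the resulting shape of evaluate_sample, under the assumption that the algorithm now invokes the supplied model runner itself; the perturbations and scoring below are trivial stand-ins:

```python
from typing import List
from unittest.mock import Mock

def evaluate_sample(model_input: str, target_output: str, model) -> List[float]:
    # No precomputed model_output parameter: the algorithm gets the
    # original output itself, which it must do anyway to compare against
    # the outputs for the perturbed inputs.
    original_output, _ = model.predict(model_input)  # two-element tuple
    perturbed_inputs = [model_input + "!", model_input.upper()]  # stand-ins
    perturbed_outputs = [model.predict(p)[0] for p in perturbed_inputs]
    # Trivial stand-in score: agreement of each perturbed output with the original.
    return [float(out == original_output) for out in perturbed_outputs]

model = Mock()
model.predict.return_value = ("London", None)
print(evaluate_sample("What is the capital of England?", "London", model))
```

This keeps evaluate_sample symmetric with evaluate: both accept a ModelRunner and neither accepts a precomputed output.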

    user_provided_prompt_template: Optional[str]
    dataset_prompt_template: str

    @pytest.mark.parametrize(
These test cases don't actually validate the numerical values; they only ensure that the correct function calls are made. evaluate_dataset now handles all of this logic, so we can get rid of them. Notice how all of the scores are just 0.0, since we're mocking everything.
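To illustrate the pattern the removed tests followed, here is a minimal sketch (the helper below is hypothetical): with every collaborator mocked, the assertions can only verify call wiring, and the 0.0 score carries no information about real scoring logic:

```python
from unittest.mock import Mock

def run_evaluation(model, scorer):
    output, _ = model.predict("some input")  # two-element tuple contract
    return scorer(output)

def test_run_evaluation_checks_wiring_only():
    model = Mock()
    model.predict.return_value = ("mocked output", None)
    scorer = Mock(return_value=0.0)  # mocked scorer, so every score is 0.0
    score = run_evaluation(model, scorer)
    model.predict.assert_called_once_with("some input")  # wiring is verified
    scorer.assert_called_once_with("mocked output")
    assert score == 0.0  # but this asserts nothing about real scoring
```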
…om evaluate integ test
Description of changes:
Title
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.