feat: update implementation of QAAccuracy to use Transform-based approach #234
Conversation
```python
    SemanticRobustnessConfig,
    get_perturbation_transform,
    get_model_responses_from_perturbed_inputs,
    get_model_outputs_from_perturbed_inputs,
```
I see `use_ray` is still present in this file; could you please remove it? Also check whether it is present in any other files.
Will handle this in a separate PR dedicated to GSR changes.
```python
target_output_key = self.transform.target_output_key
model_output_key = self.transform.model_output_key
sample = {target_output_key: target_output, model_output_key: model_output}
pipeline = TransformPipeline([self.transform])
```
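For context, the pattern under discussion routes `evaluate_sample` through the same pipeline machinery that `evaluate` uses: build a one-record dict keyed by the transform's configured keys, then execute the pipeline on it. Below is a minimal, self-contained sketch of that pattern; the `Transform`, `TransformPipeline`, and `execute_record` definitions here are simplified stand-ins for illustration, not fmeval's actual classes, and `exact_match` is a hypothetical scoring function.

```python
# Toy stand-ins mimicking (but not reproducing) a Transform-based design.
from typing import Callable, Dict, List


class Transform:
    """Reads input keys from a record dict and writes one output key."""

    def __init__(self, input_keys: List[str], output_key: str,
                 fn: Callable[..., float]):
        self.input_keys = input_keys
        self.output_key = output_key
        self.fn = fn

    def __call__(self, record: Dict) -> Dict:
        record[self.output_key] = self.fn(*(record[k] for k in self.input_keys))
        return record


class TransformPipeline:
    """Applies each transform to a single record, in order."""

    def __init__(self, transforms: List[Transform]):
        self.transforms = transforms

    def execute_record(self, record: Dict) -> Dict:
        for transform in self.transforms:
            record = transform(record)
        return record


def exact_match(target: str, output: str) -> float:
    # Hypothetical metric: 1.0 on an exact (whitespace-stripped) match.
    return 1.0 if target.strip() == output.strip() else 0.0


def evaluate_sample(target_output: str, model_output: str) -> Dict:
    # Mirrors the diff above: construct a one-record sample keyed by the
    # transform's keys, then run the same pipeline evaluate() would use.
    transform = Transform(["target_output", "model_output"],
                          "exact_match_score", exact_match)
    sample = {"target_output": target_output, "model_output": model_output}
    return TransformPipeline([transform]).execute_record(sample)


print(evaluate_sample("Paris", "Paris"))
```

The trade-off debated in this thread is visible even in the sketch: the pipeline indirection adds a few hops for a single-sample call, but keeps batch and single-sample evaluation on one code path.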
It feels like this makes evaluate_sample more complicated. Was this concern raised earlier? I understand it is too late to discuss this now, but can you share the conclusion or thread if this was discussed before?
It does make things slightly more complicated, but I think it's best to keep things consistent across all algos, so that it's clear that evaluate_sample and evaluate both execute the algo's pipeline. Plus, in terms of raw lines of code, there's hardly any increase.
One of the major tenets of evaluate_sample has been to keep it simple and readable, and I feel we are moving against that by adding transforms to the method. Yeah, let's capture a backlog SIM for it; we can discuss with the team then and make a call.
I am in favor of not having so many hops for the customer in the evaluate_sample code.
Description of changes:
Title
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.