feat: update implementation of SummarizationAccuracy to use Transform-based approach #214
Conversation
overall looks great! I think we can still improve on a couple of points.
```diff
-        else:
-            dataset_configs = [DATASET_CONFIGS[dataset_name] for dataset_name in EVAL_DATASETS[self.eval_name]]
+        dataset_configs = get_dataset_configs(dataset_config, self.eval_name)
```
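For readers following along, here is a minimal sketch of what the new `get_dataset_configs` helper presumably does, assuming it folds the user-supplied config and the built-in fallback from the removed line into one place. The exact signature is an assumption; `DATASET_CONFIGS` and `EVAL_DATASETS` are the existing module-level registries referenced in the removed code, so this would live inside that module.

```python
from typing import List, Optional

def get_dataset_configs(dataset_config: Optional["DataConfig"], eval_name: str) -> List["DataConfig"]:
    """Sketch: return the user-supplied config if one was passed; otherwise
    fall back to the built-in dataset configs registered for this evaluation."""
    if dataset_config is not None:
        return [dataset_config]
    # Same fallback as the removed inline code above.
    return [DATASET_CONFIGS[dataset_name] for dataset_name in EVAL_DATASETS[eval_name]]
```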
I'm not following here. What if `dataset_config` has been passed as an argument?
Also, I think it would be much cleaner to have a generic implementation under the hood, with the dataset taken as a required argument, and a built-in version that calls the generic version on the built-in datasets. Is this possible?
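One hedged way to read this suggestion is sketched below. `_evaluate_single_dataset`, `get_dataset`, and `self.pipeline` are hypothetical/assumed names rather than the existing API; `get_dataset_configs`, `aggregate_evaluation_scores`, `METRIC_NAMES`, and `MEAN` are the identifiers quoted in the diffs on this page.

```python
# Sketch only: both methods would live on the eval algorithm class.

def _evaluate_single_dataset(self, dataset_config):
    """Generic core: takes exactly one dataset config as a required argument."""
    dataset = get_dataset(dataset_config)        # load just this dataset (hypothetical loader)
    dataset = self.pipeline.execute(dataset)     # run the Transform pipeline
    dataset_scores, category_scores = aggregate_evaluation_scores(
        dataset, METRIC_NAMES, agg_method=MEAN
    )
    return dataset_scores, category_scores       # or an EvalOutput built from these

def evaluate(self, dataset_config=None):
    """Public entry point: evaluates the passed config if given, otherwise
    fans out over the built-in datasets via get_dataset_configs."""
    configs = get_dataset_configs(dataset_config, self.eval_name)
    return [self._evaluate_single_dataset(config) for config in configs]
```

The wrapper stays thin, and the core logic only ever sees a single, required dataset config.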
```diff
-            dataset, [METEOR_SCORE, ROUGE_SCORE, BERT_SCORE], agg_method=MEAN
-        )
+        dataset = pipeline.execute(dataset)
+        dataset_scores, category_scores = aggregate_evaluation_scores(dataset, METRIC_NAMES, agg_method=MEAN)
```
Can we make this, and also the save operation, part of standard transforms?
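For what it's worth, here is a rough, speculative sketch of what folding the aggregation into the pipeline could look like. Because the record-level `Transform` interface in this PR operates on individual records, a dataset-level step like this would need its own hook; all names below are hypothetical.

```python
class AggregateScores:
    """Hypothetical dataset-level pipeline step wrapping aggregate_evaluation_scores."""

    def __init__(self, metric_names, agg_method=MEAN):
        self.metric_names = metric_names
        self.agg_method = agg_method

    def __call__(self, dataset):
        # Unlike a record-level Transform, this consumes the whole dataset and
        # produces dataset- and category-level scores, which is why it doesn't
        # slot into the current per-record pipeline as-is.
        return aggregate_evaluation_scores(dataset, self.metric_names, agg_method=self.agg_method)
```

The save operation would presumably have the same shape: a dataset-level step at the end of the pipeline rather than a per-record transform.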
```python
        :param eval_algorithm_config: Summarization Accuracy eval algorithm config.

    def __init__(
        self, eval_algorithm_config: SummarizationAccuracyConfig = SummarizationAccuracyConfig(), use_ray: bool = True
```
love the use_ray flag :)
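For illustration, how the flag might be used (a sketch; only the constructor signature quoted above is assumed):

```python
# Default: distribute the Transform pipeline with Ray.
eval_algo = SummarizationAccuracy(SummarizationAccuracyConfig())

# Opt out of Ray, e.g. for lightweight local runs or unit tests.
eval_algo_local = SummarizationAccuracy(SummarizationAccuracyConfig(), use_ray=False)
```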
Description of changes:
- Updated the `SummarizationAccuracy` evaluation algorithm so that it uses the `Transform`/`TransformPipeline` approach.
- Added `get_dataset_configs` and `execute_record` alongside their corresponding unit tests.
- Removed `deepcopy` from `Transform.__init__`, as it conflicts with Ray serialization and also didn't provide much value to begin with (and was potentially unexpected/unintuitive).

The largest diff for this PR is in the unit tests for `SummarizationAccuracy`. Because a lot of code was edited, deleted, or moved, reading the diff may not be the best way to view the changes. I'd suggest jumping to the file directly and reading through the tests from a fresh perspective, since so much has been changed.

The unit tests for `get_meteor_score` and `get_rouge_score` have been moved to `test_summarization_accuracy_metrics.py` and adapted to test the `MeteorScore` and `RougeScore` transforms. Note that `BertScore` isn't tested because unit tests that validate numerical values already exist for the `BertscoreModel` helper model.

Also note that there was some unintended behavior in the original unit tests for `get_rouge_score`: the same config was passed for every parametrized test case, so the `rouge_type` was always `"rouge2"` instead of varying based on the test case. This has been fixed in my new unit tests (see the sketch below), and as a result, some of the expected values have changed.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
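As referenced above, a sketch of the shape of the parametrization fix. The test name and the list of ROUGE variants are assumptions for illustration, not the actual contents of `test_summarization_accuracy_metrics.py`; the key point is that a fresh config is built per case.

```python
import pytest

@pytest.mark.parametrize("rouge_type", ["rouge1", "rouge2", "rougeL"])
def test_rouge_config_varies_per_case(rouge_type):
    # A fresh config is built for each parametrized case, so rouge_type actually
    # varies; previously one shared config meant every case ran with "rouge2".
    config = SummarizationAccuracyConfig(rouge_type=rouge_type)
    assert config.rouge_type == rouge_type
```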