Conversation

@danielezhu (Contributor) commented Mar 13, 2024

Description of changes:

  1. Update the implementation of the SummarizationAccuracy evaluation algorithm so that it uses the Transform/TransformPipeline approach (a rough sketch of the pattern is included below).
  2. Add minor new functions such as get_dataset_configs and execute_record, along with their corresponding unit tests.
  3. Remove deepcopy from Transform __init__, since it conflicts with Ray serialization, didn't provide much value to begin with, and was potentially unexpected/unintuitive.
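
For readers unfamiliar with the pattern, here is a rough, self-contained sketch of what the Transform/TransformPipeline approach looks like. The class names, signatures, and the toy WordCount transform are simplified stand-ins for illustration, not fmeval's actual API:

```python
# Simplified, hypothetical sketch of the Transform/TransformPipeline pattern.
from typing import Any, Dict, List


class Transform:
    """A named step that reads some keys from a record and writes new ones."""

    def __init__(self, input_keys: List[str], output_keys: List[str]):
        # Note: no deepcopy of constructor args here, mirroring point 3 above
        # (deepcopy conflicted with Ray serialization).
        self.input_keys = input_keys
        self.output_keys = output_keys

    def __call__(self, record: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError


class WordCount(Transform):
    """Toy transform: counts whitespace-separated words in the model output."""

    def __call__(self, record: Dict[str, Any]) -> Dict[str, Any]:
        record[self.output_keys[0]] = len(record[self.input_keys[0]].split())
        return record


class TransformPipeline:
    """Applies a list of transforms to a single record, in order."""

    def __init__(self, transforms: List[Transform]):
        self.transforms = transforms

    def execute_record(self, record: Dict[str, Any]) -> Dict[str, Any]:
        for transform in self.transforms:
            record = transform(record)
        return record


pipeline = TransformPipeline([WordCount(["model_output"], ["word_count"])])
print(pipeline.execute_record({"model_output": "a short summary"}))
# {'model_output': 'a short summary', 'word_count': 3}
```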

The largest diff for this PR is in the unit tests for SummarizationAccuracy. Because a lot of code was edited, deleted, or moved, reading the diff may not be the best way to view the changes. I'd suggest jumping to the file directly and reading through the tests from a fresh perspective, since so much has changed.

The unit tests for get_meteor_score and get_rouge_score have been moved to test_summarization_accuracy_metrics.py and have been adapted to test the MeteorScore and RougeScore transforms. Note that BertScore isn't tested because unit tests that validate numerical values already exist for the BertscoreModel helper model.

Also note that there was some unintended behavior in the original unit tests for get_rouge_score: we were passing the same config for every parametrized test case, so the rouge_type was always "rouge2" instead of varying with the test case. This has been fixed in my new unit tests, and as a result, some of the expected values have changed.
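
For illustration only, the before/after of that parametrization looks roughly like this; the test names and the way the config is built are simplified placeholders, not the repo's actual test code:

```python
import pytest

# Before (simplified): every case silently reused one shared config, so
# rouge_type was always "rouge2" no matter what was parametrized.
SHARED_CONFIG = {"rouge_type": "rouge2"}


@pytest.mark.parametrize("rouge_type", ["rouge1", "rouge2", "rougeL"])
def test_rouge_score_before(rouge_type):
    config = SHARED_CONFIG  # bug: ignores the parametrized rouge_type
    assert config["rouge_type"] == "rouge2"  # passes even when rouge_type is "rouge1"


# After (simplified): the config is built from the parametrized value, so each
# case exercises a different ROUGE variant and the expected scores differ.
@pytest.mark.parametrize("rouge_type", ["rouge1", "rouge2", "rougeL"])
def test_rouge_score_after(rouge_type):
    config = {"rouge_type": rouge_type}
    assert config["rouge_type"] == rouge_type
```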

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@danielezhu merged commit db7b5b7 into aws:main on Mar 14, 2024
@danielezhu deleted the summ_acc branch on March 14, 2024 at 22:25

@lucfra left a comment

Overall looks great! I think we can still improve on a couple of points.

else:
dataset_configs = [DATASET_CONFIGS[dataset_name] for dataset_name in EVAL_DATASETS[self.eval_name]]

dataset_configs = get_dataset_configs(dataset_config, self.eval_name)

I'm not following here. What if dataset_config has been passed as an argument?
Also, I think it would be much cleaner to have a generic implementation under the hood, with the dataset taken as a required argument, and a built-in version that calls the generic one on the built-in datasets. Is this possible?
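
A rough sketch of the shape being suggested here, with stand-in registries and helpers rather than the library's actual objects:

```python
from typing import Any, Dict, List

# Stand-in registries, loosely modeled on the EVAL_DATASETS / DATASET_CONFIGS
# lookups in the snippet above.
EVAL_DATASETS = {"summarization_accuracy": ["dataset_a", "dataset_b"]}
DATASET_CONFIGS = {"dataset_a": {"name": "dataset_a"}, "dataset_b": {"name": "dataset_b"}}


def evaluate_on_datasets(dataset_configs: List[Dict[str, Any]]) -> List[str]:
    """Generic path: the dataset configs are a required argument."""
    return [f"evaluated {config['name']}" for config in dataset_configs]


def evaluate_builtin(eval_name: str) -> List[str]:
    """Built-in path: resolve the registered datasets, then call the generic path."""
    configs = [DATASET_CONFIGS[name] for name in EVAL_DATASETS[eval_name]]
    return evaluate_on_datasets(configs)


print(evaluate_builtin("summarization_accuracy"))
```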

dataset, [METEOR_SCORE, ROUGE_SCORE, BERT_SCORE], agg_method=MEAN
)
dataset = pipeline.execute(dataset)
dataset_scores, category_scores = aggregate_evaluation_scores(dataset, METRIC_NAMES, agg_method=MEAN)

Can we make this, and also the save operation, part of the standard transforms?
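
Purely as an illustration of the idea, a dataset-level aggregation step could be given the same execute-style shape as the per-record transforms so it slots onto the end of a pipeline. The names and interface here are hypothetical:

```python
from typing import Any, Dict, List


class MeanAggregation:
    """Hypothetical dataset-level step: averages a score column over all records."""

    def __init__(self, score_key: str):
        self.score_key = score_key

    def execute(self, records: List[Dict[str, Any]]) -> float:
        scores = [record[self.score_key] for record in records]
        return sum(scores) / len(scores)


records = [{"meteor": 0.4}, {"meteor": 0.6}]
print(MeanAggregation("meteor").execute(records))  # 0.5
```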


:param eval_algorithm_config: Summarization Accuracy eval algorithm config.
def __init__(
self, eval_algorithm_config: SummarizationAccuracyConfig = SummarizationAccuracyConfig(), use_ray: bool = True

love the use_ray flag :)
