feat: update implementation of SummarizationAccuracy to use Transform-based approach #214
Conversation
overall looks great! I think we can still improve on a couple of points.
```diff
-        else:
-            dataset_configs = [DATASET_CONFIGS[dataset_name] for dataset_name in EVAL_DATASETS[self.eval_name]]
+        dataset_configs = get_dataset_configs(dataset_config, self.eval_name)
```
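For readers following along, here is a minimal sketch of what the new `get_dataset_configs` helper presumably does, assuming it folds the user-supplied config and the built-in fallback from the removed line into one place. The exact signature is an assumption; `DATASET_CONFIGS` and `EVAL_DATASETS` are the existing module-level registries referenced in the removed code, so this would live inside that module.

```python
from typing import List, Optional

def get_dataset_configs(dataset_config: Optional["DataConfig"], eval_name: str) -> List["DataConfig"]:
    """Sketch: return the user-supplied config if one was passed; otherwise
    fall back to the built-in dataset configs registered for this evaluation."""
    if dataset_config is not None:
        return [dataset_config]
    # Same fallback as the removed inline code above.
    return [DATASET_CONFIGS[dataset_name] for dataset_name in EVAL_DATASETS[eval_name]]
```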
I'm not following here. What if `dataset_config` has been passed as an argument?
Also, I think it would be much cleaner to have a generic implementation under the hood, with the dataset taken as a required argument, and a built-in version that calls the generic version on the built-in datasets. Is this possible?
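One hedged way to read this suggestion is sketched below. `_evaluate_single_dataset`, `get_dataset`, and `self.pipeline` are hypothetical/assumed names rather than the existing API; `get_dataset_configs`, `aggregate_evaluation_scores`, `METRIC_NAMES`, and `MEAN` are the identifiers quoted in the diffs on this page.

```python
# Sketch only: both methods would live on the eval algorithm class.

def _evaluate_single_dataset(self, dataset_config):
    """Generic core: takes exactly one dataset config as a required argument."""
    dataset = get_dataset(dataset_config)        # load just this dataset (hypothetical loader)
    dataset = self.pipeline.execute(dataset)     # run the Transform pipeline
    dataset_scores, category_scores = aggregate_evaluation_scores(
        dataset, METRIC_NAMES, agg_method=MEAN
    )
    return dataset_scores, category_scores       # or an EvalOutput built from these

def evaluate(self, dataset_config=None):
    """Public entry point: evaluates the passed config if given, otherwise
    fans out over the built-in datasets via get_dataset_configs."""
    configs = get_dataset_configs(dataset_config, self.eval_name)
    return [self._evaluate_single_dataset(config) for config in configs]
```

The wrapper stays thin, and the core logic only ever sees a single, required dataset config.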
```diff
-            dataset, [METEOR_SCORE, ROUGE_SCORE, BERT_SCORE], agg_method=MEAN
-        )
+        dataset = pipeline.execute(dataset)
+        dataset_scores, category_scores = aggregate_evaluation_scores(dataset, METRIC_NAMES, agg_method=MEAN)
```
Can we make this, and also the save operation, part of standard transforms?
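For what it's worth, here is a rough, speculative sketch of what folding the aggregation into the pipeline could look like. Because the record-level `Transform` interface in this PR operates on individual records, a dataset-level step like this would need its own hook; all names below are hypothetical.

```python
class AggregateScores:
    """Hypothetical dataset-level pipeline step wrapping aggregate_evaluation_scores."""

    def __init__(self, metric_names, agg_method=MEAN):
        self.metric_names = metric_names
        self.agg_method = agg_method

    def __call__(self, dataset):
        # Unlike a record-level Transform, this consumes the whole dataset and
        # produces dataset- and category-level scores, which is why it doesn't
        # slot into the current per-record pipeline as-is.
        return aggregate_evaluation_scores(dataset, self.metric_names, agg_method=self.agg_method)
```

The save operation would presumably have the same shape: a dataset-level step at the end of the pipeline rather than a per-record transform.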
```python
        :param eval_algorithm_config: Summarization Accuracy eval algorithm config.

    def __init__(
        self, eval_algorithm_config: SummarizationAccuracyConfig = SummarizationAccuracyConfig(), use_ray: bool = True
```
love the use_ray flag :)
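For illustration, how the flag might be used (a sketch; only the constructor signature quoted above is assumed):

```python
# Default: distribute the Transform pipeline with Ray.
eval_algo = SummarizationAccuracy(SummarizationAccuracyConfig())

# Opt out of Ray, e.g. for lightweight local runs or unit tests.
eval_algo_local = SummarizationAccuracy(SummarizationAccuracyConfig(), use_ray=False)
```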
Description of changes:
- Updated the `SummarizationAccuracy` evaluation algorithm so that it uses the `Transform`/`TransformPipeline` approach.
- Added `get_dataset_configs` and `execute_record` alongside their corresponding unit tests.
- Removed `deepcopy` from `Transform.__init__`, as it conflicts with Ray serialization and also didn't provide much value to begin with (and was potentially unexpected/unintuitive).

The largest diff for this PR is in the unit tests for `SummarizationAccuracy`. Because a lot of code was edited, deleted, or moved, reading the diff may not be the best way to view the changes. I'd suggest jumping to the file directly and reading through the tests from a fresh perspective, since so much has been changed.

The unit tests for `get_meteor_score` and `get_rouge_score` have been moved to `test_summarization_accuracy_metrics.py` and adapted to test the `MeteorScore` and `RougeScore` transforms. Note that `BertScore` isn't tested because unit tests that validate numerical values already exist for the `BertscoreModel` helper model.

Also note that there was some unintended behavior in the original unit tests for `get_rouge_score`: the same config was passed for every parametrized test case, so the `rouge_type` was always `"rouge2"` instead of varying based on the test case. This has been fixed in my new unit tests (see the sketch below), and as a result, some of the expected values have changed.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
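As referenced above, a sketch of the shape of the parametrization fix. The test name and the list of ROUGE variants are assumptions for illustration, not the actual contents of `test_summarization_accuracy_metrics.py`; the key point is that a fresh config is built per case.

```python
import pytest

@pytest.mark.parametrize("rouge_type", ["rouge1", "rouge2", "rougeL"])
def test_rouge_config_varies_per_case(rouge_type):
    # A fresh config is built for each parametrized case, so rouge_type actually
    # varies; previously one shared config meant every case ran with "rouge2".
    config = SummarizationAccuracyConfig(rouge_type=rouge_type)
    assert config.rouge_type == rouge_type
```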