[espnet3-8] Bugfix for recipe#6270
Code Review
This pull request introduces a significant and valuable refactoring of the collect_stats functionality, replacing separate local and parallel implementations with a more maintainable Runner/Provider pattern. The changes improve code structure, enhance user experience by adding a progress bar for parallel runs, and fix important configuration handling issues in espnet3/utils/config.py.
My review focuses on the new implementation in espnet3/collect_stats.py. I've identified a couple of areas for improvement: a leftover debug statement and an opportunity to reduce code duplication, which will further enhance the maintainability of the new architecture. Overall, this is a solid improvement.
espnet3/collect_stats.py
Outdated
        else:
            model = instantiate(model_config)

        print(model)

    def build_env_local(self) -> Dict[str, Any]:
        env = super().build_env_local()
        collate_fn = _build_collate_fn(self.config.dataloader_config)
        env["collate_fn"] = collate_fn

        dataset = env.get("dataset")
        if hasattr(dataset, "use_espnet_collator"):
            dataset.use_espnet_collator = isinstance(collate_fn, CommonCollateFn)
            env["dataset"] = dataset

        device = env.get("device")
        if device is None:
            device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
            env["device"] = device

        env["model"] = env["model"].to(device).eval()
        env.setdefault("write_collected_feats", False)
        env.setdefault("collect_stats_kwargs", None)
        return env

    def make_worker_setup_fn(self):
        base_setup = super().make_worker_setup_fn()
        dataloader_config = self.config.dataloader_config

        def setup():
            env = base_setup()
            collate_fn = _build_collate_fn(dataloader_config)
            env["collate_fn"] = collate_fn

            dataset = env.get("dataset")
            if hasattr(dataset, "use_espnet_collator"):
                dataset.use_espnet_collator = isinstance(collate_fn, CommonCollateFn)
                env["dataset"] = dataset

            device = env.get("device")
            if device is None:
                device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
                env["device"] = device

            env["model"] = env["model"].to(device).eval()
            env.setdefault("write_collected_feats", False)
            env.setdefault("collect_stats_kwargs", None)
            return env

        return setup
The methods build_env_local and the inner setup function within make_worker_setup_fn contain nearly identical logic for setting up the environment (e.g., creating collate_fn, determining device, preparing the model). This code duplication can make future maintenance more difficult, as changes might not be consistently applied to both places.
Consider extracting this common logic into a private helper method to be called by both build_env_local and setup. This would centralize the environment setup and improve code clarity and maintainability.
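One possible shape for that helper is sketched below. This is only an illustration under stated assumptions, not the code in this PR: `BaseProvider`, `CollectStatsProvider`, `DummyModel`, `_default_device`, and `_finalize_env` are hypothetical names, the `CommonCollateFn` and `_build_collate_fn` defined here are stand-ins for the real ones, and the device is a plain string so the sketch runs without torch.

```python
from typing import Any, Callable, Dict


class CommonCollateFn:
    """Stand-in for the real collator; only needed for the isinstance check."""

    def __call__(self, batch):
        return batch


class DummyModel:
    """Stand-in for a torch.nn.Module exposing .to() and .eval()."""

    def __init__(self):
        self.device = None
        self.training = True

    def to(self, device):
        self.device = device
        return self

    def eval(self):
        self.training = False
        return self


def _build_collate_fn(dataloader_config: Dict[str, Any]) -> Callable:
    # Stand-in: the real function would read the dataloader config.
    return CommonCollateFn()


def _default_device() -> str:
    # Stand-in for:
    # torch.device("cuda" if torch.cuda.is_available() else "cpu")
    return "cpu"


class BaseProvider:
    """Hypothetical base class standing in for the real provider base."""

    def __init__(self, dataloader_config: Dict[str, Any]):
        self.dataloader_config = dataloader_config

    def build_env_local(self) -> Dict[str, Any]:
        return {"model": DummyModel()}

    def make_worker_setup_fn(self) -> Callable[[], Dict[str, Any]]:
        return lambda: {"model": DummyModel()}


class CollectStatsProvider(BaseProvider):
    @staticmethod
    def _finalize_env(env: Dict[str, Any], dataloader_config) -> Dict[str, Any]:
        """Shared setup used by both the local and the worker path."""
        collate_fn = _build_collate_fn(dataloader_config)
        env["collate_fn"] = collate_fn

        dataset = env.get("dataset")
        if hasattr(dataset, "use_espnet_collator"):
            dataset.use_espnet_collator = isinstance(collate_fn, CommonCollateFn)

        device = env.get("device")
        if device is None:
            device = _default_device()
            env["device"] = device

        env["model"] = env["model"].to(device).eval()
        env.setdefault("write_collected_feats", False)
        env.setdefault("collect_stats_kwargs", None)
        return env

    def build_env_local(self) -> Dict[str, Any]:
        return self._finalize_env(super().build_env_local(), self.dataloader_config)

    def make_worker_setup_fn(self) -> Callable[[], Dict[str, Any]]:
        base_setup = super().make_worker_setup_fn()
        dataloader_config = self.dataloader_config  # closure does not capture self

        def setup() -> Dict[str, Any]:
            return CollectStatsProvider._finalize_env(base_setup(), dataloader_config)

        return setup
```

Making the helper a static method keeps `setup` free of any reference to `self`, which mirrors how the diff copies `dataloader_config` into a local variable before defining the closure.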
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           espnet3    #6270      +/-   ##
===========================================
- Coverage    70.17%   70.17%   -0.01%
===========================================
  Files          754      752       -2
  Lines        69240    69300      +60
===========================================
+ Hits         48589    48630      +41
- Misses       20651    20670      +19
- I'm not sure what process_batch_batching means. The function name is also confusing. Please clarify it and rename it appropriately.
- Is the newly added collect-stats function covered by the unit tests?
@Emrys365, can you review this?
For the collect-stats part, I realized the current fix is not enough.
The current commits are enough to run the recipe, but I will address the items above to improve this PR.
LGTM
We need careful documentation that explains what a runner is and what a provider is.
Thank you. I added a TODO comment, and I will open a PR for the documentation. I had prepared it but could not open a PR while we had multiple changes to the interfaces; I think now is a good time to do so.
…spnet into espnet3/update_and_bugfix
I have added documentation for parallel processing and the runner/provider classes.
Emrys365 left a comment
Thanks @Masao-Someki. The modifications make the code more compact. With more detailed documentation (as mentioned in the TODOs), it would be much clearer to read and use.
@Emrys365
What did you change?
- Updated espnet3/collect_stats.py with Runner/Provider classes.
- Bug fixed: base_runner.py: parallel_for() with tqdm progress bar for visual feedback during distributed runs.
- Bug fixed: espnet3/utils/config.py: ListConfig instead of plain list in load_line(); OmegaConf.resolve() after merging configs.
- Bug fixed: espnet3/trainer/model.py: imported Path for consistency with file I/O utilities.
Why did you make this change?
The previous collect_stats system used separate implementations (collect_stats_local, collect_stats_parallel, etc.), which made maintenance and testing difficult.
In [espnet3-6] Add evaluation scripts #6178, we implemented Runner/Provider classes that integrate parallel processing and local execution seamlessly.
That design is much easier to maintain, so I migrated collect_stats from the separate parallel implementation to the runner/provider classes.
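As a rough illustration of the idea, a provider builds the environment a job needs and a runner applies the same per-item function either locally or in parallel. The sketch below is a toy, not espnet3's implementation: `SquareProvider`, `SquareRunner`, `forward`, and `num_workers` are hypothetical names, a thread pool stands in for the Dask-based execution, and the environment is rebuilt per task for brevity. It only borrows the method names `build_env_local` and `make_worker_setup_fn` from the diff, and it wraps the completed futures in tqdm when tqdm is installed.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Iterable, List

try:
    from tqdm import tqdm
except ImportError:  # no-op fallback so the sketch runs without tqdm
    def tqdm(iterable, **kwargs):
        return iterable


class SquareProvider:
    """Hypothetical provider: knows how to build the environment for a job."""

    def build_env_local(self) -> Dict[str, Any]:
        return {"scale": 2}

    def make_worker_setup_fn(self) -> Callable[[], Dict[str, Any]]:
        # Return a plain function so a worker can rebuild the env on its own.
        return lambda: {"scale": 2}


class SquareRunner:
    """Hypothetical runner: applies forward() to each index."""

    def __init__(self, provider: SquareProvider, num_workers: int = 0):
        self.provider = provider
        self.num_workers = num_workers

    @staticmethod
    def forward(idx: int, env: Dict[str, Any]) -> int:
        return env["scale"] * idx * idx

    def __call__(self, indices: Iterable[int]) -> List[int]:
        items = list(indices)

        # Local path: build the environment once and loop in-process.
        if self.num_workers <= 0:
            env = self.provider.build_env_local()
            return [self.forward(i, env) for i in items]

        # Parallel path: each task builds its env via the setup function.
        setup = self.provider.make_worker_setup_fn()

        def task(i: int) -> int:
            return self.forward(i, setup())

        results: List[Any] = [None] * len(items)
        with ThreadPoolExecutor(max_workers=self.num_workers) as pool:
            futures = {pool.submit(task, i): pos for pos, i in enumerate(items)}
            # The progress bar advances as each future completes.
            for fut in tqdm(as_completed(futures), total=len(futures)):
                results[futures[fut]] = fut.result()
        return results
```

Both paths call the same `forward`, so `SquareRunner(SquareProvider(), num_workers=2)(range(5))` returns the same list as the local run with `num_workers=0`.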
Added tqdm progress monitoring to improve user experience during long Dask-based runs.
The config.py changes improve Hydra compatibility and ensure proper type resolution in nested configurations.
Is your PR small enough?
✅ Yes.
collect_stats workflow.
Additional Context