[espnet3-8] Bugfix for recipe #6270

Masao-Someki · 2025-10-24T01:57:26Z

What did you change?

Updated espnet3/collect_stats.py to use the Runner/Provider classes:
- This also includes a bug fix in collect_stats.
Bug fix in base_runner.py:
- Wrapped parallel_for() with a tqdm progress bar for visual feedback during distributed runs.
Bug fixes in espnet3/utils/config.py:
- load_line() now returns a ListConfig instead of a plain list.
- Added OmegaConf.resolve() after merging configs.
Bug fix in espnet3/trainer/model.py:
- Imported Path for consistency with the file I/O utilities.

Why did you make this change?

The previous collect_stats system used separate implementations (collect_stats_local, collect_stats_parallel, etc.), which made maintenance and testing difficult.
In [espnet3-6] Add evaluation scripts #6178, we implemented Runner/Provider classes that let the same code run either locally or with parallel processing.
That design is much easier to maintain, so I migrated collect_stats from its dedicated parallel-processing code to Runner/Provider.
The tqdm progress bar improves the user experience during long Dask-based runs.
The config.py changes improve Hydra compatibility and ensure proper type resolution in nested configurations.
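
To make the Runner/Provider split concrete, here is a minimal sketch of the idea, not the actual ESPnet-3 implementation. The method names build_env_local and make_worker_setup_fn are taken from the diff in this PR; everything else (ToyProvider, ToyRunner, the n_workers argument, and the thread pool standing in for Dask) is hypothetical and only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


class ToyProvider:
    """Hypothetical provider: knows how to build the environment (dataset, model, ...)."""

    def __init__(self, data: List[float]):
        self.data = data

    def build_env_local(self) -> Dict[str, Any]:
        # Environment for in-process (local) execution.
        return {"dataset": self.data}

    def make_worker_setup_fn(self) -> Callable[[], Dict[str, Any]]:
        # Return a closure that a worker can call to build its own environment.
        data = self.data

        def setup() -> Dict[str, Any]:
            return {"dataset": data}

        return setup


class ToyRunner:
    """Hypothetical runner: applies `forward` to every index, locally or in parallel."""

    def __init__(self, provider: ToyProvider, n_workers: int = 0):
        self.provider = provider
        self.n_workers = n_workers

    @staticmethod
    def forward(idx: int, env: Dict[str, Any]) -> float:
        # Per-item work; here simply the square of one sample.
        return env["dataset"][idx] ** 2

    def __call__(self, indices: List[int]) -> List[float]:
        if self.n_workers <= 0:
            env = self.provider.build_env_local()
            return [self.forward(i, env) for i in indices]
        # A thread pool stands in for a Dask cluster in this sketch.
        # For brevity the environment is built once; real workers would each call setup().
        env = self.provider.make_worker_setup_fn()()
        with ThreadPoolExecutor(max_workers=self.n_workers) as pool:
            return list(pool.map(lambda i: self.forward(i, env), indices))


provider = ToyProvider([1.0, 2.0, 3.0])
local = ToyRunner(provider)([0, 1, 2])
parallel = ToyRunner(provider, n_workers=2)([0, 1, 2])
```

In this sketch the per-item logic lives in one place (forward), and switching between local and parallel execution only changes how the environment is built and how indices are dispatched.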

Is your PR small enough?

✅ Yes.

5 files changed
~450 insertions, ~20 deletions
All changes are localized and focused on refactoring the collect_stats workflow.

Additional Context

Development plan for ESPnet-3 #6133

gemini-code-assist

Code Review

This pull request introduces a significant and valuable refactoring of the collect_stats functionality, replacing separate local and parallel implementations with a more maintainable Runner/Provider pattern. The changes improve code structure, enhance user experience by adding a progress bar for parallel runs, and fix important configuration handling issues in espnet3/utils/config.py.

My review focuses on the new implementation in espnet3/collect_stats.py. I've identified a couple of areas for improvement: a leftover debug statement and an opportunity to reduce code duplication, which will further enhance the maintainability of the new architecture. Overall, this is a solid improvement.

gemini-code-assist · 2025-10-24T01:58:27Z

espnet3/collect_stats.py

+        else:
+            model = instantiate(model_config)
+
+        print(model)


This print(model) statement appears to be a leftover from debugging. It should be removed to avoid polluting logs, especially in distributed environments where it could generate a large amount of output.

gemini-code-assist · 2025-10-24T01:58:27Z

espnet3/collect_stats.py

+    def build_env_local(self) -> Dict[str, Any]:
+        env = super().build_env_local()
+        collate_fn = _build_collate_fn(self.config.dataloader_config)
+        env["collate_fn"] = collate_fn
+
+        dataset = env.get("dataset")
+        if hasattr(dataset, "use_espnet_collator"):
+            dataset.use_espnet_collator = isinstance(collate_fn, CommonCollateFn)
+            env["dataset"] = dataset
+
+        device = env.get("device")
+        if device is None:
+            device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+            env["device"] = device
+
+        env["model"] = env["model"].to(device).eval()
+        env.setdefault("write_collected_feats", False)
+        env.setdefault("collect_stats_kwargs", None)
+        return env
+
+    def make_worker_setup_fn(self):
+        base_setup = super().make_worker_setup_fn()
+        dataloader_config = self.config.dataloader_config
+
+        def setup():
+            env = base_setup()
+            collate_fn = _build_collate_fn(dataloader_config)
+            env["collate_fn"] = collate_fn
+
+            dataset = env.get("dataset")
+            if hasattr(dataset, "use_espnet_collator"):
+                dataset.use_espnet_collator = isinstance(collate_fn, CommonCollateFn)
+                env["dataset"] = dataset
+
+            device = env.get("device")
+            if device is None:
+                device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+                env["device"] = device
+
+            env["model"] = env["model"].to(device).eval()
+            env.setdefault("write_collected_feats", False)
+            env.setdefault("collect_stats_kwargs", None)
+            return env
+
+        return setup


The methods build_env_local and the inner setup function within make_worker_setup_fn contain nearly identical logic for setting up the environment (e.g., creating collate_fn, determining device, preparing the model). This code duplication can make future maintenance more difficult, as changes might not be consistently applied to both places.

Consider extracting this common logic into a private helper method to be called by both build_env_local and setup. This would centralize the environment setup and improve code clarity and maintainability.
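
One way the suggested extraction could look is sketched below. This is an illustration under stated assumptions, not the change that was eventually merged: StubModel, BaseProviderStub, CollectStatsProviderSketch, and _finalize_env are hypothetical names, _build_collate_fn is stubbed out, torch is replaced by a stand-in so the sketch runs on its own, and the use_espnet_collator handling from the diff is omitted for brevity.

```python
from typing import Any, Callable, Dict


class StubModel:
    """Stand-in for a torch.nn.Module so the sketch runs without torch."""

    def __init__(self) -> None:
        self.device = None
        self.training = True

    def to(self, device: str) -> "StubModel":
        self.device = device
        return self

    def eval(self) -> "StubModel":
        self.training = False
        return self


def _build_collate_fn(dataloader_config: Dict[str, Any]) -> Callable:
    # Stub: the real function builds a collate function from the config.
    return lambda batch: batch


def _finalize_env(env: Dict[str, Any], dataloader_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical shared helper used by both the local and the worker path."""
    env["collate_fn"] = _build_collate_fn(dataloader_config)
    if env.get("device") is None:
        env["device"] = "cpu"  # the diff picks CUDA when it is available
    env["model"] = env["model"].to(env["device"]).eval()
    env.setdefault("write_collected_feats", False)
    env.setdefault("collect_stats_kwargs", None)
    return env


class BaseProviderStub:
    """Hypothetical stand-in for the parent provider class."""

    def __init__(self, config: Dict[str, Any]):
        self.config = config

    def build_env_local(self) -> Dict[str, Any]:
        return {"model": StubModel(), "dataset": None}

    def make_worker_setup_fn(self) -> Callable[[], Dict[str, Any]]:
        return lambda: {"model": StubModel(), "dataset": None}


class CollectStatsProviderSketch(BaseProviderStub):
    def build_env_local(self) -> Dict[str, Any]:
        return _finalize_env(super().build_env_local(), self.config["dataloader_config"])

    def make_worker_setup_fn(self) -> Callable[[], Dict[str, Any]]:
        base_setup = super().make_worker_setup_fn()
        dataloader_config = self.config["dataloader_config"]  # capture the config, not self

        def setup() -> Dict[str, Any]:
            return _finalize_env(base_setup(), dataloader_config)

        return setup


provider = CollectStatsProviderSketch({"dataloader_config": {}})
local_env = provider.build_env_local()
worker_env = provider.make_worker_setup_fn()()
```

The helper is written here as a module-level function rather than a method so that the worker closure captures only dataloader_config, as in the original diff; whether that matters depends on how the setup function is shipped to workers.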

codecov · 2025-10-24T02:17:46Z

Codecov Report

❌ Patch coverage is 83.17757% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.17%. Comparing base (ce649fa) to head (8de525a).
⚠️ Report is 14 commits behind head on espnet3.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| espnet3/collect_stats.py | 83.25% | 35 Missing ⚠️ |
| espnet3/runner/base_runner.py | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@             Coverage Diff             @@
##           espnet3    #6270      +/-   ##
===========================================
- Coverage    70.17%   70.17%   -0.01%     
===========================================
  Files          754      752       -2     
  Lines        69240    69300      +60     
===========================================
+ Hits         48589    48630      +41     
- Misses       20651    20670      +19

| Flag | Coverage Δ | |
|---|---|---|
| test_integration_espnet2 | `47.88% <ø> (ø)` | |
| test_python_espnet2 | `62.54% <0.00%> (-0.06%)` | ⬇️ |
| test_python_espnet3 | `16.20% <83.17%> (+0.04%)` | ⬆️ |
| test_utils | `62.54% <0.00%> (-0.06%)` | ⬇️ |

sw005320

I'm not sure what process_batch_batching means. The function name is also confusing. Please clarify its purpose and rename it appropriately.
Is the newly added collect-stats function covered by the unit tests?

@Emrys365, can you review this?

Masao-Someki · 2025-10-24T16:17:21Z

For the collect-stats part, I realized the current fix is not enough. I still need to:

- Remove the collect-stats files in utils/
- Update the unit tests
- Refactor the collect-stats provider; it is very complicated and can be made simpler

The current commits are enough to run the recipe, but I will address the items above to make this PR better.

jctian98 · 2025-10-28T03:36:13Z

LGTM

sw005320 · 2025-10-28T14:44:23Z

We need careful documentation and an explanation of what a runner is and what a provider is.

- Please embed this information in the source code.
- Please add a TODO to explain the runner and provider in our documentation for parallel processing.

Masao-Someki · 2025-10-29T19:53:19Z

Thank you, I added a TODO comment, and I will create a PR for the documentation! I had already prepared it but could not open a PR because the interfaces were still changing in several places. I think now is a good time to open it!

Masao-Someki · 2025-10-31T12:19:33Z

I have added the documentation for parallel processing and the runner/provider classes.
It is not in a PR yet, but you can check it from here:

parallel processing: https://github.com/Masao-Someki/espnet/blob/espnet3/documents/doc/espnet3/parallel.md
runner and provider: https://github.com/Masao-Someki/espnet/blob/espnet3/documents/doc/espnet3/provider_runner.md

Emrys365

Thanks @Masao-Someki. The modifications make the code more compact. With more detailed documentation (as mentioned in the TODOs), it would be much clearer to read and use.

espnet3/collect_stats.py

tools/setup_anaconda.sh

tools/setup_miniforge.sh

tools/setup_python.sh

tools/setup_venv.sh

Masao-Someki · 2025-11-09T23:05:24Z

@Emrys365
I'm sorry, I only just noticed these review comments and have now addressed them!
For the pip version, I think we can focus on this later. I have put it on the roadmap.

Masao-Someki added 3 commits October 23, 2025 20:55

Bug fixed

d3a338a

Update collect_stats with runner/provider

81cf0b9

Apply format

9fa75a5

dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Oct 24, 2025

Masao-Someki changed the title ~~[espnet3-8]Espnet3/bugfix~~ [espnet3-8] Bugfix for recipe Oct 24, 2025

Masao-Someki requested a review from Emrys365 October 24, 2025 01:57

dosubot bot added Bugfix ESPnet3 labels Oct 24, 2025

gemini-code-assist bot reviewed Oct 24, 2025

Masao-Someki mentioned this pull request Oct 24, 2025

Development plan for ESPnet-3 #6133

Open

52 tasks

Removed debug code based on Gemini review

d840a34

Masao-Someki mentioned this pull request Oct 24, 2025

[espnet3-9] Add Librispeech-100h ASR recipe #6271

Closed

sw005320 approved these changes Oct 24, 2025

dosubot bot added the lgtm This PR has been approved by a maintainer label Oct 24, 2025

Fixed collect stats and unit test

bf7b8cd

dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Oct 24, 2025

Fhrozen added this to the v.202512 milestone Oct 26, 2025

Add TODO to enhance docstrings and document

c175b04

pre-commit-ci bot and others added 3 commits October 29, 2025 19:56

[pre-commit.ci] auto fixes from pre-commit.com hooks

0f743f1

for more information, see https://pre-commit.ci

Set pip version to 25.2 for CI as temporal workaround

fdf59a9

Merge branch 'espnet3/update_and_bugfix' of github.com:Masao-Someki/e…

29118cb

…spnet into espnet3/update_and_bugfix

dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Oct 31, 2025

mergify bot added the Installation label Oct 31, 2025

Emrys365 approved these changes Oct 31, 2025

Masao-Someki added 2 commits November 9, 2025 16:50

Fixed based on Emrys365's review

0f92627

Added workaround on pip version

f3afadd

Fixed unit test

8de525a

sw005320 merged commit 99697da into espnet:espnet3 Nov 12, 2025
28 checks passed

Fhrozen modified the milestones: v.202512, v.202511 Nov 14, 2025

Masao-Someki mentioned this pull request Nov 17, 2025

[espnet3-10] Merge espnet3 branch into master #6304

Merged
