Fhrozen · 2025-09-13T04:17:03Z

Note

This PR adds LLM-generated documentation so that the flake8 docstring checks can be enabled in espnetez and espnet2.
For espnet2, I will split the work as much as possible, because I observed more than 1K issues related to missing documentation.

What did you change?

This pull request focuses on code quality improvements, documentation formatting, and minor dependency updates across several scripts and configuration files. The main themes are: (1) code style and formatting consistency, (2) enhancements to documentation tooling and configuration, and (3) minor dependency and CI workflow adjustments.

Code style and formatting consistency:

Standardized string quoting and improved formatting in Python scripts, especially in doc/argparse2rst.py, doc/convert_custom_tags_to_html.py, doc/convert_md_to_homepage.py, and doc/make_release_note_from_milestone.py, making the codebase more consistent and readable.

Documentation tooling and configuration:

Updated doc/conf.py to ensure proper imports, theme configuration, and string formatting for Sphinx documentation, improving compatibility and appearance.
Improved Markdown and HTML conversion logic in documentation scripts for better custom tag handling and header formatting.

Dependency and CI workflow adjustments:

Added libjpeg-dev to the Dockerfile dependencies and commented out the Miniconda installation block, likely to streamline the build process or address build issues. (.devcontainer/ci_cpu/espnet.dockerfile)
Enabled flake8 and pycodestyle linting in CI scripts to enforce code quality standards.

These changes collectively improve maintainability, readability, and the developer experience for the project.

gemini-code-assist

Code Review

This pull request introduces a substantial amount of AI-generated documentation to the espnetez package, significantly enhancing code clarity and maintainability. A key addition is a new script for automatically generating docstrings using LLMs. The changes also enable flake8 in CI and include various code style improvements. My review primarily focuses on the quality of the new documentation, and I have identified a couple of areas where the generated content is either incorrect or has formatting issues that impact readability.

espnetez/preprocess/tokenizer.py

codecov · 2025-09-13T05:14:16Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.49%. Comparing base (7c0bd1e) to head (c40a9dc).
⚠️ Report is 26 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6241      +/-   ##
==========================================
- Coverage   56.49%   56.49%   -0.01%     
==========================================
  Files         896      896              
  Lines       84820    84814       -6     
==========================================
- Hits        47920    47914       -6     
  Misses      36900    36900
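
The percentages in the diff above can be recomputed from the Hits and Lines rows. The sketch below is only a reading aid: the helper names `coverage_pct` and `floor2` are hypothetical, and truncating to two decimals is an assumption inferred from the displayed numbers, not a statement about how Codecov formats its report.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage: covered lines divided by total lines."""
    return 100.0 * hits / lines


def floor2(value: float) -> float:
    """Round toward negative infinity at two decimals (assumed display rule)."""
    return math.floor(value * 100) / 100


# Hits and Lines taken from the coverage diff: base (master) and head (#6241).
base = coverage_pct(hits=47920, lines=84820)
head = coverage_pct(hits=47914, lines=84814)

# Both sides display as 56.49 and the delta as -0.01 under the assumed rule.
print(floor2(base), floor2(head), floor2(head - base))
```

The unrounded change is roughly -0.003 percentage points, which is why both columns show the same 56.49% while the delta column is still negative.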

| Flag | Coverage Δ | |
|---|---|---|
| test_integration_espnet2 | `46.81% <ø> (ø)` | |
| test_integration_espnetez | `36.93% <100.00%> (-0.01%)` | ⬇️ |
| test_python_espnet2 | `50.93% <0.00%> (+<0.01%)` | ⬆️ |
| test_python_espnetez | `12.73% <100.00%> (-0.01%)` | ⬇️ |
| test_utils | `18.77% <ø> (ø)` | |

Flags with carried forward coverage won't be shown.

sw005320 · 2025-09-13T14:27:47Z

@Masao-Someki, can you check the espnetez part?

Masao-Someki

Thank you, I’ve reviewed the documents. For files like __init__.py, I think it’s good to include scripts within the same package.

espnetez/__init__.py

espnetez/config.py

espnetez/data/__init__.py

espnetez/preprocess/__init__.py

Masao-Someki · 2025-09-13T23:33:34Z

espnetez/trainer.py

+                datasets.  The structure can be either:
+                ``{"train": {...}, "valid": {...}}`` or a flat mapping where the same
+                items are used for both splits.  Each value must be a tuple
+                ``(file_name, name, type)``.


I think it would be good to link to where the available types are defined.

espnet/espnet2/train/dataset.py

Line 248 in eaf9a83

DATA_TYPES = {
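
To make the two layouts described in the quoted docstring concrete, here is a minimal sketch. It assumes only what the docstring states: either a `{"train": ..., "valid": ...}` mapping or a flat mapping reused for both splits, with each value a `(file_name, name, type)` tuple. The helper `normalize_data_info`, the keys `"speech"` and `"text"`, and the file names are hypothetical and are not part of espnetez; the type strings should be checked against `DATA_TYPES` in espnet2/train/dataset.py.

```python
from typing import Dict, Mapping, Tuple

# (file_name, name, type), following the wording of the docstring.
Entry = Tuple[str, str, str]


def normalize_data_info(data_info: Mapping) -> Dict[str, Dict[str, Entry]]:
    """Hypothetical helper: expand either layout into explicit train/valid splits."""
    if set(data_info) == {"train", "valid"}:
        # Nested layout: each split carries its own entries.
        splits = {split: dict(items) for split, items in data_info.items()}
    else:
        # Flat layout: the same entries are used for both splits.
        splits = {"train": dict(data_info), "valid": dict(data_info)}

    for split, items in splits.items():
        for key, value in items.items():
            if not (isinstance(value, tuple) and len(value) == 3):
                raise ValueError(
                    f"{split}/{key}: expected (file_name, name, type), got {value!r}"
                )
    return splits


# Flat layout; keys, file names, and type strings are illustrative assumptions.
flat = {
    "speech": ("wav.scp", "speech", "sound"),
    "text": ("text", "text", "text"),
}
nested = normalize_data_info(flat)
print(sorted(nested))  # ['train', 'valid']
```

Validating the tuple length up front gives a clearer error than a failure deep inside dataset construction, which is also where a pointer to the list of supported types would help.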

Fhrozen · 2025-09-15T00:51:40Z

@Masao-Someki Thank you for the comments.
A few minor details.
Currently, the generation code uses an open-source model, so any user can generate docs from a self-hosted service.
Additionally, the code currently locates the missing documentation from the flake8 output, which only provides the file and line where the documentation is missing. Cases like sentencepiece.py or the trainer files may require passing a longer context to the LLM. I will fix the issues you mentioned. However, if a doc is largely incorrect, I will delete it and keep the docstring as minimal as possible. I will also check whether it is possible to improve how a larger context is provided to the LLM. I think that using Copilot Chat may help, but I will need to test the quality of the generated docs.
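
As a rough illustration of the workflow described above (this is not the script added in this PR), the sketch below parses flake8's default `path:row:col: CODE message` output for missing-docstring codes and then uses `ast` to recover the whole definition at a reported line, which is one way to hand a model more than a single line of context. The names `parse_missing_docstrings` and `enclosing_source`, and the sample report and source, are hypothetical.

```python
from __future__ import annotations

import ast
import re
from collections import defaultdict

# flake8's default output format: path:row:col: CODE message
FLAKE8_LINE = re.compile(
    r"^(?P<path>.+?):(?P<row>\d+):(?P<col>\d+): (?P<code>D\d{3}) (?P<text>.*)$"
)


def parse_missing_docstrings(report: str) -> dict[str, list[tuple[int, str]]]:
    """Group D1xx (missing docstring) findings by file path."""
    found: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for line in report.splitlines():
        match = FLAKE8_LINE.match(line.strip())
        if match and match["code"].startswith("D1"):
            found[match["path"]].append((int(match["row"]), match["code"]))
    return dict(found)


def enclosing_source(source: str, row: int) -> str | None:
    """Return the full text of the def/class whose first line is `row`, if any."""
    wanted = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, wanted) and node.lineno == row:
            return ast.get_source_segment(source, node)
    return None


# Illustrative report and source; not taken from the actual CI output.
sample_report = """\
espnetez/config.py:1:1: D100 Missing docstring in public module
espnetez/config.py:12:1: D103 Missing docstring in public function
espnetez/trainer.py:40:5: E501 line too long (92 > 88 characters)
"""
sample_source = "import os\n\n\ndef load(path):\n    return os.path.exists(path)\n"

issues = parse_missing_docstrings(sample_report)
context = enclosing_source(sample_source, 4)
print(issues)
print(context)
```

Module-level findings such as D100 have no enclosing definition, so `enclosing_source` returns `None` for them and a caller would need to fall back to the file header or the whole file.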

sw005320

OK for me.
If @Masao-Someki approves, we can merge this PR.

Fhrozen added 11 commits September 13, 2025 01:13

add generation code by llm

f723522

update generate code

bfc2fa6

timeout fix

8f22f66

add documentation to espnetez

b500c86

enable flake8 on espnetez

d50b08f

add doc on trainer

b892bba

removing additional quotation

d162d93

managing d104 to d107

fa2370a

Additional documentation

6d65fa6

flake8 fixes

45debf7

apply codestyle to doc scripts

c6d1a7a

dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Sep 13, 2025

mergify bot added Documentation CI Travis, Circle CI, etc labels Sep 13, 2025

undo flake8 in espnet2

699e9bd

gemini-code-assist bot reviewed Sep 13, 2025

espnetez/preprocess/tokenizer.py (resolved)

Fhrozen added 2 commits September 13, 2025 04:19

undo dockerfile unrelated

ceb0010

fix docstring

84e2699

fix flake8 issue

ebcf68a

Masao-Someki reviewed Sep 13, 2025

Fhrozen added this to the v.202512 milestone Sep 23, 2025

Fhrozen added 3 commits September 28, 2025 12:06

fix flake8 for generate code

7b184d1

reflect comments of doc on espnetez

f393cdf

flake fixes

538586a

Fhrozen requested a review from sw005320 September 29, 2025 02:07

sw005320 approved these changes Sep 29, 2025

dosubot bot added the lgtm This PR has been approved by a maintainer label Sep 29, 2025

Masao-Someki approved these changes Oct 7, 2025

Merge branch 'master' into pr-doc

c40a9dc

Fhrozen merged commit 53e0976 into espnet:master Oct 24, 2025
7 of 9 checks passed

Fhrozen deleted the pr-doc branch October 24, 2025 02:26

Fhrozen modified the milestones: v.202512, v.202511 Nov 14, 2025

[Doc 1] Add AI-gen documentation to espnetez #6241

Fhrozen merged 19 commits into espnet:master from Fhrozen:pr-doc

3 participants
