[Doc 1] Add AI-gen documentation to espnetez#6241

Merged
Fhrozen merged 19 commits into espnet:master from Fhrozen:pr-doc
Oct 24, 2025

Conversation

@Fhrozen
Member

@Fhrozen Fhrozen commented Sep 13, 2025

Note

This PR adds LLM-generated documentation so that the flake8 test can be enabled for espnetez and espnet2.
For espnet2, I will split the work as much as possible, because I observed more than 1K issues related to missing documentation.

What did you change?

This pull request focuses on code quality improvements, documentation formatting, and minor dependency updates across several scripts and configuration files. The main themes are: (1) code style and formatting consistency, (2) enhancements to documentation tooling and configuration, and (3) minor dependency and CI workflow adjustments.

Code style and formatting consistency:

  • Standardized string quoting and improved formatting in Python scripts, especially in doc/argparse2rst.py, doc/convert_custom_tags_to_html.py, doc/convert_md_to_homepage.py, and doc/make_release_note_from_milestone.py, making the codebase more consistent and readable.

Documentation tooling and configuration:

  • Updated doc/conf.py to ensure proper imports, theme configuration, and string formatting for Sphinx documentation, improving compatibility and appearance.
  • Improved Markdown and HTML conversion logic in documentation scripts for better custom tag handling and header formatting.

Dependency and CI workflow adjustments:

  • Added libjpeg-dev to the Dockerfile dependencies and commented out the Miniconda installation block, likely to streamline the build process or address build issues. (.devcontainer/ci_cpu/espnet.dockerfile)
  • Enabled flake8 and pycodestyle linting in CI scripts to enforce code quality standards.

These changes collectively improve maintainability, readability, and the developer experience for the project.

@dosubot dosubot bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) label Sep 13, 2025
@mergify mergify bot added the Documentation and CI (Travis, Circle CI, etc) labels Sep 13, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a substantial amount of AI-generated documentation to the espnetez package, significantly enhancing code clarity and maintainability. A key addition is a new script for automatically generating docstrings using LLMs. The changes also enable flake8 in CI and include various code style improvements. My review primarily focuses on the quality of the new documentation, and I have identified a couple of areas where the generated content is either incorrect or has formatting issues that impact readability.

@codecov

codecov bot commented Sep 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.49%. Comparing base (7c0bd1e) to head (c40a9dc).
⚠️ Report is 26 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6241      +/-   ##
==========================================
- Coverage   56.49%   56.49%   -0.01%     
==========================================
  Files         896      896              
  Lines       84820    84814       -6     
==========================================
- Hits        47920    47914       -6     
  Misses      36900    36900              
Flag Coverage Δ
test_integration_espnet2 46.81% <ø> (ø)
test_integration_espnetez 36.93% <100.00%> (-0.01%) ⬇️
test_python_espnet2 50.93% <0.00%> (+<0.01%) ⬆️
test_python_espnetez 12.73% <100.00%> (-0.01%) ⬇️
test_utils 18.77% <ø> (ø)

@sw005320
Contributor

@Masao-Someki, can you check the espnetez part?

Contributor

@Masao-Someki Masao-Someki left a comment

Thank you, I’ve reviewed the documents. For files like __init__.py, I think it’s good to include scripts within the same package.

datasets. The structure can be either:
``{"train": {...}, "valid": {...}}`` or a flat mapping where the same
items are used for both splits. Each value must be a tuple
``(file_name, name, type)``.
Contributor

I think it's good to link where we can find the types.

DATA_TYPES = {
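
The two shapes described in the quoted docstring can be illustrated with a minimal sketch. The helper `normalize_data_info` below is hypothetical and is not part of espnetez; the file names are placeholders, and the type strings `"sound"` and `"text"` are illustrative values that should be checked against `DATA_TYPES`.

```python
from typing import Dict, Tuple

# (file_name, name, type), as the quoted docstring describes each value.
Entry = Tuple[str, str, str]


def normalize_data_info(data_info: dict) -> Dict[str, Dict[str, Entry]]:
    """Hypothetical helper: expand either accepted shape into train/valid splits."""
    if set(data_info) == {"train", "valid"}:
        # Nested shape: each split carries its own items.
        splits = {name: dict(data_info[name]) for name in ("train", "valid")}
    else:
        # Flat shape: the same items are used for both splits.
        splits = {"train": dict(data_info), "valid": dict(data_info)}

    for split, entries in splits.items():
        for key, value in entries.items():
            if not (isinstance(value, tuple) and len(value) == 3):
                raise ValueError(
                    f"{split}/{key}: expected (file_name, name, type), got {value!r}"
                )
    return splits


# Flat mapping: one set of items shared by both splits (placeholder values).
flat = {
    "speech": ("wav.scp", "speech", "sound"),
    "text": ("text", "text", "text"),
}

# Nested mapping: separate items per split (placeholder values).
nested = {
    "train": {"text": ("train/text", "text", "text")},
    "valid": {"text": ("valid/text", "text", "text")},
}

flat_splits = normalize_data_info(flat)
nested_splits = normalize_data_info(nested)
```

Validating the tuple length up front is a design choice of this sketch: it surfaces a malformed entry with the split and key that caused it, rather than failing later inside a data loader.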

@Fhrozen
Member Author

Fhrozen commented Sep 15, 2025

@Masao-Someki Thank you for the comments.
A few minor details:
Currently, the code uses an open-source model, so any user can generate docs with a self-hosted service.
In addition, the code currently collects the missing-documentation locations from the flake8 test, which reports only the file and line where documentation is missing. Files such as sentencepiece.py or the trainer files may therefore require passing a longer context to the LLM. I will fix the issues you mentioned, but if a doc is largely incorrect, I will delete it and keep the documentation as minimal as possible. I will also check whether the way a larger context is provided to the LLM can be improved. I think Copilot chat may help, but I will need to test the quality of the generated docs.
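
As an illustration of the file-and-line extraction mentioned in this comment, here is a minimal sketch. It is not the script from this PR. It assumes flake8's default `path:row:col: CODE message` output format and that missing docstrings are reported with `D10x` codes (as the flake8-docstrings plugin does); the names `MissingDoc` and `parse_missing_docs` are hypothetical.

```python
import re
from typing import List, NamedTuple


class MissingDoc(NamedTuple):
    """Location of one missing-docstring report (hypothetical record type)."""

    path: str
    line: int
    code: str


# Assumed default flake8 line format: "path:row:col: CODE message".
# Only D10x codes (missing docstring) are kept; other codes are ignored.
_PATTERN = re.compile(r"^(?P<path>.+?):(?P<line>\d+):(?P<col>\d+): (?P<code>D10\d) ")


def parse_missing_docs(report: str) -> List[MissingDoc]:
    """Collect (path, line, code) for every missing-docstring entry in a report."""
    found = []
    for raw in report.splitlines():
        match = _PATTERN.match(raw.strip())
        if match:
            found.append(
                MissingDoc(match["path"], int(match["line"]), match["code"])
            )
    return found


# Sample report text written for this sketch, not taken from the PR's CI logs.
sample_report = (
    "espnetez/task.py:10:1: D103 Missing docstring in public function\n"
    "espnetez/task.py:20:80: E501 line too long (90 > 79 characters)\n"
)
missing = parse_missing_docs(sample_report)
```

Because each record holds only a path and a line number, a follow-up step would still have to read the surrounding source to give the LLM enough context, which is the limitation described above.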

@Fhrozen Fhrozen added this to the v.202512 milestone Sep 23, 2025
@Fhrozen Fhrozen requested a review from sw005320 September 29, 2025 02:07
Contributor

@sw005320 sw005320 left a comment

OK for me.
If @Masao-Someki approves, we can merge this PR.

@dosubot dosubot bot added the lgtm (This PR has been approved by a maintainer) label Sep 29, 2025
@Fhrozen Fhrozen merged commit 53e0976 into espnet:master Oct 24, 2025
7 of 9 checks passed
@Fhrozen Fhrozen deleted the pr-doc branch October 24, 2025 02:26
@Fhrozen Fhrozen modified the milestones: v.202512, v.202511 Nov 14, 2025
Labels

CI (Travis, Circle CI, etc), Documentation, lgtm (This PR has been approved by a maintainer), size:XXL (This PR changes 1000+ lines, ignoring generated files.)

3 participants