Codec codebase bug fixes: detach() in RVQ residual and target_bandwidth in inference #6268

Merged
ftshijt merged 2 commits into espnet:master from whr-a:pr_codec on Oct 22, 2025

@whr-a
Contributor

@whr-a whr-a commented Oct 20, 2025

  • Fix 2 bugs:
    • When calculating the residual in RVQ, detach() should be applied. Refer to Cisco's fix: core_vq.py, and Moshi's implementation: core_vq.py
    • Bug in passing target_bandwidth during inference
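
The effect of the first fix can be illustrated with a toy sketch. This is not ESPnet's code: `quantize_ste`, `rvq_forward`, `second_layer_input_grad`, and the random codebooks are hypothetical names introduced here. The sketch assumes each quantizer layer uses a straight-through estimator, so a layer's output has an identity gradient with respect to its input. Under that assumption, subtracting the non-detached output makes the gradient of the next residual with respect to the original input cancel to zero, while subtracting a detached copy keeps it.

```python
import torch


def quantize_ste(x, codebook):
    """Nearest-neighbour codebook lookup with a straight-through estimator."""
    # Squared Euclidean distance from every input row to every codebook entry.
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(dim=-1)
    nearest = codebook[dists.argmin(dim=-1)]
    # Forward value is `nearest`; the backward pass treats the layer as identity.
    return x + (nearest - x).detach()


def rvq_forward(x, codebooks, detach=True):
    """Return the summed quantized output and the residual fed to each layer."""
    residual = x
    quantized_out = torch.zeros_like(x)
    layer_inputs = []
    for codebook in codebooks:
        layer_inputs.append(residual)
        quantized = quantize_ste(residual, codebook)
        quantized_out = quantized_out + quantized
        # The fix under discussion: subtract a detached copy of the layer output.
        residual = residual - (quantized.detach() if detach else quantized)
    return quantized_out, layer_inputs


def second_layer_input_grad(detach):
    """Gradient of the second layer's (summed) input w.r.t. the original input."""
    torch.manual_seed(0)
    x = torch.randn(4, 3, requires_grad=True)
    codebooks = [torch.randn(8, 3), torch.randn(8, 3)]
    _, layer_inputs = rvq_forward(x, codebooks, detach=detach)
    (grad,) = torch.autograd.grad(layer_inputs[1].sum(), x)
    return grad
```

In this toy setting, `second_layer_input_grad(detach=False)` is all zeros and `second_layer_input_grad(detach=True)` is all ones, so any loss computed on the second layer's input only reaches the encoder when the residual is built from a detached output. How this plays out in the actual ESPnet layers depends on their loss definitions, which are not shown in this thread.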

@dosubot dosubot bot added the size:S (This PR changes 10-29 lines, ignoring generated files.) and Bug (bug should be fixed) labels Oct 20, 2025
@mergify mergify bot added the ESPnet2 label Oct 20, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces two important bug fixes for the codec codebase. The first fix correctly applies .detach() when calculating the residual in the Residual Vector Quantizer, which is crucial for stable training. The second fix enables passing target_bandwidth during inference.

The changes for both fixes are well-implemented. However, I've identified a potential issue in the inference methods of both DAC and SoundStream models. Passing **kwargs directly to the generator's encode method could cause a TypeError if unexpected arguments are provided. I've suggested a small refactoring to call the model's own encode method instead, which handles keyword arguments safely and improves code reuse. Overall, this is a good set of fixes.


  """
- codec = self.generator.encode(x)
+ codec = self.generator.encode(x, **kwargs)

high

Calling self.generator.encode(x, **kwargs) directly is risky as it will raise a TypeError if kwargs contains any key other than target_bw. The inference method's signature allows for any keyword arguments, but DACGenerator.encode is more restrictive. It's safer to call self.encode(x, **kwargs), which correctly filters the keyword arguments, ensuring only target_bw is passed along. This also improves code reuse.

Suggested change
- codec = self.generator.encode(x, **kwargs)
+ codec = self.encode(x, **kwargs)
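
The reviewer's concern can be reproduced with a minimal sketch. `ToyGenerator`, `ToyCodec`, `inference_direct`, and `inference_filtered` are hypothetical stand-ins, not the ESPnet classes; the only assumptions taken from the comment are that the generator's `encode` accepts just `target_bw` and that the model-level `encode` forwards only that key.

```python
class ToyGenerator:
    """Hypothetical stand-in for a generator whose encode() accepts only target_bw."""

    def encode(self, x, target_bw=None):
        return {"input": x, "target_bw": target_bw}


class ToyCodec:
    """Hypothetical wrapper model with permissive **kwargs entry points."""

    def __init__(self):
        self.generator = ToyGenerator()

    def encode(self, x, **kwargs):
        # Forward only the keyword the generator understands; ignore the rest.
        return self.generator.encode(x, target_bw=kwargs.get("target_bw"))

    def inference_direct(self, x, **kwargs):
        # Forwards everything: any unexpected key raises TypeError.
        return self.generator.encode(x, **kwargs)

    def inference_filtered(self, x, **kwargs):
        # Goes through self.encode, which filters the keywords first.
        return self.encode(x, **kwargs)
```

With these toy classes, `ToyCodec().inference_direct(x, target_bw=6.0, extra=1)` raises `TypeError`, while `inference_filtered` with the same arguments succeeds and passes `target_bw=6.0` through. Silently dropping unknown keys is itself a design trade-off, since it can hide misspelled argument names.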


  """
- codec = self.generator.encode(x)
+ codec = self.generator.encode(x, **kwargs)

high

Calling self.generator.encode(x, **kwargs) directly can lead to a TypeError if kwargs includes keys not expected by SoundStreamGenerator.encode (which only accepts target_bw). To prevent potential crashes and improve code reuse, it's better to call self.encode(x, **kwargs). The self.encode method is designed to safely handle arbitrary keyword arguments by extracting only the relevant ones.

Suggested change
- codec = self.generator.encode(x, **kwargs)
+ codec = self.encode(x, **kwargs)

@codecov

codecov bot commented Oct 20, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.77%. Comparing base (333b6f7) to head (c5e3bdf).
⚠️ Report is 722 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| espnet2/gan_codec/dac/dac.py | 0.00% | 3 Missing ⚠️ |
| espnet2/gan_codec/soundstream/soundstream.py | 66.66% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #6268       +/-   ##
===========================================
+ Coverage   46.53%   56.77%   +10.24%     
===========================================
  Files         542      889      +347     
  Lines       49601    84363    +34762     
===========================================
+ Hits        23080    47899    +24819     
- Misses      26521    36464     +9943     
| Flag | Coverage Δ |
|---|---|
| test_integration_espnet2 | 46.80% <37.50%> (+0.26%) ⬆️ |
| test_integration_espnetez | 36.92% <ø> (?) |
| test_python_espnet2 | 51.20% <25.00%> (?) |
| test_python_espnetez | 12.81% <0.00%> (?) |
| test_utils | 18.77% <ø> (?) |

@sw005320
Contributor

Oh, this sounds critical.
Thanks for catching it.
@ftshijt, can you review this PR?

@sw005320 sw005320 requested a review from ftshijt October 21, 2025 11:16
@sw005320 sw005320 added this to the v.202512 milestone Oct 21, 2025
@ftshijt
Collaborator

ftshijt commented Oct 22, 2025

Thanks for the fix! The changes look great to me.

@ftshijt ftshijt merged commit 7bbb72f into espnet:master Oct 22, 2025
32 checks passed
@Fhrozen Fhrozen modified the milestones: v.202512, v.202511 Nov 14, 2025