[ESPnet-3] Merge master into espnet3 branch #6263

Merged
Masao-Someki merged 138 commits into espnet:espnet3 from Masao-Someki:merge_master
Oct 18, 2025

Conversation

@Masao-Someki
Contributor

What did you change?

Merged master branch into espnet3 branch


Why did you make this change?

To fix the CI issue.
See #6178 for details


Is your PR small enough?

no, but this is just a merge PR


Additional Context

@dosubot dosubot bot added the size:XL (This PR changes 500-999 lines, ignoring generated files.) and CI (Travis, Circle CI, etc) labels Oct 17, 2025
@Masao-Someki
Contributor Author

@sw005320
I just created a PR that simply merges master into the espnet3 branch.
I think we can merge this PR once CI has passed.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request merges a significant number of changes from the master branch, introducing new features like Language Identification (LID) and SpeechLM tasks, along with forced alignment capabilities and new data samplers. The updates also include robustness improvements, such as better dependency handling and support for Apple's MPS devices. While the majority of the changes appear solid and well-integrated with new tests, I've identified a critical issue in the new force_align.py script that could lead to incorrect results by silently truncating input audio. My review focuses on this critical point to ensure the script's correctness.

Comment on lines +58 to +90
def prepare_speech(speech, model, device):
    """
    Prepare speech tensor for model input.

    Args:
        speech: Audio waveform (numpy array or torch tensor)
        model: Speech2Text model instance
        device: Device to place tensor on

    Returns:
        Tuple of (speech_tensor, speech_lengths)
    """
    if isinstance(speech, np.ndarray):
        speech = torch.tensor(speech)

    if speech.dim() > 1:
        assert (
            speech.dim() == 2 and speech.size(1) == 1
        ), f"Speech of size {speech.size()} is not supported"
        speech = speech.squeeze(1)

    speech_length = int(
        model.preprocessor_conf["fs"] * model.preprocessor_conf["speech_length"]
    )
    original_length = speech.size(-1)

    if original_length >= speech_length:
        speech = speech[:speech_length]
    else:
        speech = F.pad(speech, (0, speech_length - original_length))
    speech = speech.unsqueeze(0).to(getattr(torch, model.dtype))
    speech_lengths = speech.new_full([1], dtype=torch.long, fill_value=speech.shape[1])
    return speech, speech_lengths

critical

The current implementation of prepare_speech truncates or pads the input audio to a fixed length derived from the model's training configuration. This will cause any audio file longer than the configured speech_length to be silently truncated, leading to incomplete and incorrect forced alignment results. An alignment utility should process the entire audio file to be useful in a general context.

The suggested change removes this fixed-length processing, ensuring that the entire audio waveform is passed to the model for alignment.

def prepare_speech(speech, model, device):
    """
    Prepare speech tensor for model input.

    Args:
        speech: Audio waveform (numpy array or torch tensor)
        model: Speech2Text model instance
        device: Device to place tensor on

    Returns:
        Tuple of (speech_tensor, speech_lengths)
    """
    if isinstance(speech, np.ndarray):
        speech = torch.tensor(speech)

    if speech.dim() > 1:
        assert (
            speech.dim() == 2 and speech.size(1) == 1
        ), f"Speech of size {speech.size()} is not supported"
        speech = speech.squeeze(1)

    speech = speech.unsqueeze(0).to(getattr(torch, model.dtype))
    speech_lengths = speech.new_full([1], dtype=torch.long, fill_value=speech.shape[1])
    return speech, speech_lengths
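
To make the impact concrete, here is a minimal sketch in plain Python (no torch) of the truncate-or-pad step in the original function. The helper name `fit_to_fixed_length` is hypothetical, and the values `fs = 16000` and `speech_length_sec = 30` are assumptions chosen for illustration, not read from any actual model configuration.

```python
def fit_to_fixed_length(samples, fs, speech_length_sec):
    """Mimic the truncate-or-pad step of the original prepare_speech.

    Hypothetical helper for illustration only; works on a plain list
    of samples instead of a torch tensor.
    """
    target = int(fs * speech_length_sec)  # fixed length in samples
    original = len(samples)
    if original >= target:
        fitted = samples[:target]  # anything past `target` is discarded
    else:
        fitted = samples + [0.0] * (target - original)  # zero-pad the tail
    dropped = max(0, original - target)  # samples that never reach the model
    return fitted, dropped


# Assumed values, for illustration only.
fs = 16000
speech_length_sec = 30
clip = [0.1] * (fs * 45)  # a 45-second clip

fitted, dropped = fit_to_fixed_length(clip, fs, speech_length_sec)
print(len(fitted))   # 480000
print(dropped / fs)  # 15.0
```

Under these assumed settings the last 15 seconds of the clip are dropped without any warning, so any text spoken in that span could not be aligned to audio the model actually received.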

@codecov

codecov bot commented Oct 17, 2025

Codecov Report

❌ Patch coverage is 70.74236% with 67 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.14%. Comparing base (4531bcc) to head (c5aea57).
⚠️ Report is 139 commits behind head on espnet3.

Files with missing lines                      Patch %   Lines
espnet2/bin/lid_inference.py                  70.22%    53 Missing ⚠️
espnet2/iterators/category_iter_factory.py    30.00%    7 Missing ⚠️
espnet2/tasks/abs_task.py                     57.14%    3 Missing ⚠️
espnet2/bin/asr_align.py                      50.00%    2 Missing ⚠️
espnet2/bin/s2t_ctc_align.py                  50.00%    2 Missing ⚠️

Additional details and impacted files
@@             Coverage Diff              @@
##           espnet3    #6263       +/-   ##
============================================
+ Coverage         0   70.14%   +70.14%     
============================================
  Files            0      751      +751     
  Lines            0    69057    +69057     
============================================
+ Hits             0    48441    +48441     
- Misses           0    20616    +20616     

Flag                        Coverage Δ
test_integration_espnet2    47.88% <43.43%> (?)
test_python_espnet2         62.76% <58.51%> (?)
test_python_espnet3         15.98% <1.74%> (?)
test_utils                  62.76% <58.51%> (?)

@Masao-Someki Masao-Someki changed the title from "Merge master" to "[ESPnet-3] Merge master into espnet3 branch" Oct 18, 2025
@Masao-Someki Masao-Someki merged commit 8b3fea3 into espnet:espnet3 Oct 18, 2025
28 checks passed
@Fhrozen Fhrozen added this to the v.202512 milestone Oct 26, 2025
@Fhrozen Fhrozen modified the milestones: v.202512, v.202511 Nov 14, 2025
@Masao-Someki Masao-Someki deleted the merge_master branch November 26, 2025 18:25
Labels

CI (Travis, Circle CI, etc), ESPnet1, ESPnet2, Installation, README, size:XL (This PR changes 500-999 lines, ignoring generated files.)

Projects

None yet

8 participants