
Implement unified batch decode interface for OWSM-CTC #6007

Merged: sw005320 merged 5 commits into espnet:master from pyf98:owsmctc-test on Jan 13, 2025

pyf98 (Collaborator) commented Jan 6, 2025

What?

  • Implemented a unified batch decoding interface for OWSM-CTC greedy search. The decode_batch method decodes a batch of audio inputs, which can be either short-form or long-form. Each input can be given as a file path, a 1-D numpy array, or a 1-D torch tensor, which makes usage more flexible.
  • Enabled flash attention for inference. With mixed precision and flash attention, about 200 samples can be decoded at the same time on a GPU with 96 GB of memory.
  • Added a stand-alone utility script for averaging model checkpoints.
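
To illustrate the first point, here is a minimal sketch of how mixed inputs (a path, a 1-D numpy array, or a 1-D torch tensor) could be normalized to waveforms and padded into one batch. This is not the code from this PR: `to_waveform`, `pad_batch`, and the `load_fn` argument are hypothetical names, torch tensors are detected by duck typing so that the sketch only needs numpy, and long-form chunking is left out entirely.

```python
from pathlib import Path
from typing import Callable, Sequence, Tuple

import numpy as np


def to_waveform(audio, load_fn: Callable[[str], np.ndarray]) -> np.ndarray:
    """Return a 1-D float32 array for a path, a numpy array, or a torch-like tensor.

    `load_fn` is a hypothetical caller-supplied audio reader (path -> samples).
    """
    if isinstance(audio, (str, Path)):
        wav = load_fn(str(audio))
    elif hasattr(audio, "detach"):
        # Assumption: anything with .detach() behaves like a torch.Tensor.
        wav = audio.detach().cpu().numpy()
    else:
        wav = audio
    wav = np.asarray(wav, dtype=np.float32)
    if wav.ndim != 1:
        raise ValueError(f"expected a 1-D waveform, got shape {wav.shape}")
    return wav


def pad_batch(waves: Sequence[np.ndarray]) -> Tuple[np.ndarray, np.ndarray]:
    """Zero-pad 1-D waveforms to a (batch, max_len) array and return their lengths."""
    if len(waves) == 0:
        raise ValueError("cannot batch an empty list of waveforms")
    lengths = np.array([len(w) for w in waves], dtype=np.int64)
    batch = np.zeros((len(waves), int(lengths.max())), dtype=np.float32)
    for i, w in enumerate(waves):
        batch[i, : len(w)] = w
    return batch, lengths


def fake_loader(path: str) -> np.ndarray:
    """Stand-in for a real audio reader; returns five samples of ones."""
    return np.ones(5)


# Usage: one "file" and one in-memory array end up in the same padded batch.
waves = [to_waveform(x, fake_loader) for x in ["a.wav", np.arange(3)]]
batch, lengths = pad_batch(waves)
```

Returning the original lengths alongside the padded array lets a downstream model mask out the padding; how decode_batch itself handles this is not shown in this thread.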

mergify bot added the ESPnet2 label on Jan 6, 2025
pyf98 changed the title from "Add a utility method to average checkpoints" to "Implement unified batch decode interface for OWSM-CTC" on Jan 8, 2025
pyf98 (Collaborator, Author) commented Jan 11, 2025

This PR is ready.

sw005320 added this to the v.202503 milestone on Jan 12, 2025
Contributor

Why do you need this change?

Collaborator Author

The previous code used flash attention only during training, but it can also be used for inference.

Contributor

Can you add a test for this batch decoding?

Collaborator Author

Added.

codecov bot commented Jan 12, 2025

Codecov Report

Attention: Patch coverage is 0.91743% with 108 lines in your changes missing coverage. Please review.

Project coverage is 14.52%. Comparing base (522891b) to head (4474435).
Report is 95 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| espnet2/bin/s2t_inference_ctc.py | 0.00% | 106 Missing ⚠️ |
| ...pnet/nets/pytorch_backend/transformer/attention.py | 33.33% | 2 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (522891b) and HEAD (4474435).

HEAD has 8 uploads less than BASE

| Flag | BASE (522891b) | HEAD (4474435) |
|---|---|---|
| test_integration_espnet2 | 8 | 0 |

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #6007       +/-   ##
===========================================
- Coverage   47.49%   14.52%   -32.97%     
===========================================
  Files         529      854      +325     
  Lines       47850    80268    +32418     
===========================================
- Hits        22727    11660    -11067     
- Misses      25123    68608    +43485     

| Flag | Coverage Δ |
|---|---|
| test_integration_espnet2 | ? |
| test_python_espnetez | 12.72% <0.00%> (?) |
| test_utils | 20.64% <33.33%> (?) |

Flags with carried forward coverage won't be shown.

sw005320 merged commit b927b00 into espnet:master on Jan 13, 2025
38 of 39 checks passed
sw005320 (Contributor) commented

Thanks!

pyf98 deleted the owsmctc-test branch on January 13, 2025, 22:53
Shikhar-S pushed a commit to Shikhar-S/espnet that referenced this pull request Mar 13, 2025
Implement unified batch decode interface for OWSM-CTC

2 participants