Unified inference of streaming ASR #14817
Merged
added 28 commits
September 26, 2025 15:05
Signed-off-by: naymaraq <[email protected]>
5299973
artbataev previously approved these changes on Oct 31, 2025
Signed-off-by: naymaraq <[email protected]>
chtruong814 approved these changes on Nov 1, 2025
Collaborator
Last test failure was unrelated to this change. Moving forward with merging it.
quapham pushed a commit to quapham/NeMo that referenced this pull request on Dec 16, 2025
* init inference folders
* added base asr inference
* add ctc and rnnt inference classes
* small changes for ctc/rnnt inference
* add cache aware ctc/rnnt inference classes
* finilize asr inference part
* add word class
* add enums file
* add alignment preserving itn
* add punctuation/capitalization model
* add audio_io and progressbar files
* add framing and buffering files
* mv common/inference/utils into asr/inference/utils
* add StreamingState objects
* temporary rm enhancement stuff
* rm common/inference
* add greedy decoders for CTC/RNNT
* add endpointing files
* add text processing
* mv itn_utils into utils
* add bpe_decoder, context_manager for cache aware, recognizer_utils
* add base_recognizer and recognizer interface files
* add recognizers
* add factory
* add inference example and asr_client.py
* minor fix
* minor fixes
* add example usage
* add jsonl support
* rm niva prefix
* fix docstrings
* mv RequestType into enums.py
* rm redundant setters
* add a log_level to config.yaml
* setup log_level in RecognizerBuilder
* add comments in multi stream and fix docstrings in buffering
* conditional import for diskcache
* set log level to INFO
* add MPS device support
* add tests
* move inference into examples/asr/asr_chunked_inference/ctc
* rm duplicated create_partial_transcript method
* Apply isort and black reformatting
* resolve flake8 errors
* resolve return type
* fix imports in tests
* optimize bpe_decoder
* optimize log prob normalization
* optimize split_text function
* fix parital batching, improved GPU utilization
* simplify ctc greedy decoder
* add a method to perform ITN on a list of texts
* remove duplicated code in enums
* remove unnecessary pad_to logging
* modified update_punctuation_and_language_tokens_timestamps function to ensure correct global timestamps for eou calculation
* Apply isort and black reformatting
* [refactor: segment-level output] conditional import for pynini and nemo_text_processing
* [refactor: segment-level output] fix configs, added asr_output_granularity
* [refactor: segment-level output] write segment/word level output into json instead of ctm
* [refactor: segment-level output] add output granuality to request options
* [refactor: segment-level output] add segment related fields to state
* [refactor: segment-level output] add remove repeated punctuation function
* [refactor: segment-level output] add TextSegment class
* [refactor: segment-level output] update bpe decoder to support text segment
* [refactor: segment-level output] update recognizers
* [refactor: segment-level output] update text processing to support segment-level output
* rm unused and duplicated code
* Apply isort and black reformatting
* code cleanup
* Apply isort and black reformatting
* rm unused code and code cleanup
* Set num_slots to 1024 and add a num_slots parameter to the config files
* removed hyp.alignment processing codes
* disable amp
* mv diskcache req into requirements_asr.txt
* set use_amp to true and make typing consistent
* use match/case for readability
* rm lambdas from punctuation_capitalization_config.py
* rm detect_eou method from RNNTGreedyEndpointing
* reuse read_manifest from manifest_utils
* use librosa instead of soundfile
* unfreeze ASRRequestOptions dataclass
* set use_amp to false for buffered CTC/RNNT recognizers, improved throughput
* change matmul precision to high for cache aware models
* optimized audio buffer shifting
* Move running scripts and YAML files out of the ctc folder
* reorganize file structure
* Apply isort and black reformatting
* Minor code simplifications
* rm duplicated initializations from recognizers
* remove package version for diskcache
* move tqdm import to the top
* simplify millisecond_to_frames function
* raise a ValueError in case of stream_id > n_audio_files
* fix return types
* use list/dict/... instead of List/Dict/...
* use keyword argument passing to create CacheFeatureBufferer
* clean up state resetting logic
* reuse normalize_batch
* rename verbatim_transcripts and automatic_punctuation
* rename recognizers to pipelines
* rename asr/*_inference -> model_wrappers/*_inference_wrapper
* Apply isort and black reformatting
* reorgonize pnc, itn, text_processing params
* improved code readability in pipeline initializations
* Apply isort and black reformatting
* add CI script for testing
* add output_dir in CI test
* move python running script into new folder
* renamed asr_streaming_infer -> asr_streaming_inference
* correct path in CI test
* fix: variable may be used before it is initialized
* fix docstring in itn/ folder
* fix docstring in model_wrappers/ folder
* fix docstring in utils/ folder
* fix docstring in pipelines/ folder
* fix docstring in streaming/ folder
* remove PnC codes since nlp models are no longer supported
* minor changes
* return step output from transcribe_step method
* Apply isort and black reformatting
* fix functional_test
* increase timeout for L0_Unit_Tests_CPU_ASR
* rm cache aware inference from functional test

---------

Signed-off-by: naymaraq <[email protected]>
Co-authored-by: naymaraq <[email protected]>
Signed-off-by: quanpham <[email protected]>
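One of the squashed commits above is "simplify ctc greedy decoder". For readers unfamiliar with the technique, here is a minimal sketch of greedy CTC decoding over per-frame argmax token IDs. It is not the code from this PR: the function name `ctc_greedy_decode` and the `blank_id` parameter are hypothetical names chosen for illustration.

```python
def ctc_greedy_decode(frame_ids: list[int], blank_id: int) -> list[int]:
    """Collapse consecutive repeated IDs, then drop blanks (standard greedy CTC rule).

    frame_ids: the argmax token ID for each acoustic frame (assumed precomputed).
    blank_id: the ID of the CTC blank symbol (hypothetical parameter name).
    """
    decoded: list[int] = []
    prev = blank_id
    for tok in frame_ids:
        # Emit a token only when it is not blank and differs from the previous frame.
        if tok != blank_id and tok != prev:
            decoded.append(tok)
        prev = tok
    return decoded


if __name__ == "__main__":
    # A blank between two identical tokens keeps both of them.
    print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0], blank_id=0))
```

In a streaming setting the same rule would also need the last frame of the previous chunk as the initial `prev`; this sketch omits that for brevity.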
nune-tadevosyan pushed a commit to nune-tadevosyan/NeMo that referenced this pull request on Mar 13, 2026
Important
The Update branch button must only be pressed on very rare occasions. An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do?
This PR adds inference support for streaming ASR, including buffered CTC/RNN-T/TDT and cache-aware CTC/RNN-T.
Collection: [ASR]
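To illustrate what "buffered" streaming means in general, the sketch below cuts an audio signal into fixed-size chunks and surrounds each chunk with left and right context, so a model sees neighbouring audio but only the outputs for the central chunk are kept. This is a generic, standard-library-only illustration and not the API added by this PR: `iter_buffered_chunks` and all of its parameters are hypothetical names.

```python
from collections.abc import Iterator


def iter_buffered_chunks(
    samples: list[float],
    chunk_size: int,
    left_context: int,
    right_context: int,
) -> Iterator[tuple[list[float], int, int]]:
    """Yield (buffer, start, end) for each chunk of `samples`.

    `buffer` is the chunk plus up to `left_context` samples before it and up to
    `right_context` samples after it. `start` and `end` are the offsets of the
    chunk inside `buffer`, i.e. the region whose outputs should be kept.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = len(samples)
    for chunk_start in range(0, total, chunk_size):
        chunk_end = min(chunk_start + chunk_size, total)
        # Clamp the context window to the boundaries of the signal.
        buf_start = max(0, chunk_start - left_context)
        buf_end = min(total, chunk_end + right_context)
        yield samples[buf_start:buf_end], chunk_start - buf_start, chunk_end - buf_start


if __name__ == "__main__":
    audio = [float(i) for i in range(10)]
    for buffer, start, end in iter_buffered_chunks(audio, chunk_size=4, left_context=2, right_context=1):
        print(buffer, "keep", buffer[start:end])
```

Concatenating the kept regions reproduces the original signal exactly, which is the property that lets per-chunk transcripts be stitched together. Cache-aware models avoid recomputing the left context by carrying encoder state between chunks instead, which this sketch does not model.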
Changelog
Usage
# Add a code snippet demonstrating how to use this
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information