Initial Chunking by nune-tadevosyan · Pull Request #14321 · NVIDIA-NeMo/NeMo · GitHub


@nune-tadevosyan
Collaborator

@nune-tadevosyan nune-tadevosyan commented Jul 24, 2025

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do?

  • Adds a dynamic chunking mechanism for AED models.

Collection: [ASR]

Changelog

  • Add dynamic chunking to .transcribe()
  • Support dynamic chunking with timestamps
  • Add unit tests

Usage

Dynamic chunking is enabled automatically when .transcribe() is called on a single audio file, or when batch_size=1 is used with multiple audio files that are longer than 40 seconds.

from nemo.collections.asr.models import EncDecMultiTaskModel
canary_model = EncDecMultiTaskModel.from_pretrained('nvidia/canary-1b-v2')
output = canary_model.transcribe(['<file_path>'], timestamps=True)  # set timestamps=False for transcription without timestamps

print("predicted text:", output[0].text)
# word level timestamps
print(output[0].timestamp['word'])

# segment level timestamps
print(output[0].timestamp['segment'])
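With dynamic chunking, timestamps produced for each chunk are relative to that chunk, so they must be shifted onto the full-file timeline before the per-chunk results are joined. The sketch below illustrates only that step. It is hypothetical and not NeMo's implementation: it assumes each word entry is a dict with "word", "start" and "end" keys in seconds (an assumption about the entry layout), and the helper name merge_chunk_word_timestamps is made up for illustration. Overlapping regions are resolved with a simple rule that keeps the output monotonic: a word that starts before the last kept word ends is dropped.

```python
from typing import Dict, List, Sequence, Union

# Assumed word-entry layout: {"word": str, "start": float, "end": float} (seconds).
WordStamp = Dict[str, Union[str, float]]


def merge_chunk_word_timestamps(
    chunk_words: Sequence[Sequence[WordStamp]],
    chunk_starts: Sequence[float],
) -> List[WordStamp]:
    """Shift chunk-relative word timestamps to absolute time and merge them.

    Hypothetical helper for illustration; not part of NeMo's API.
    """
    if len(chunk_words) != len(chunk_starts):
        raise ValueError("need exactly one start offset per chunk")

    merged: List[WordStamp] = []
    for words, offset in zip(chunk_words, chunk_starts):
        for w in words:
            # Convert from chunk-relative to file-absolute seconds.
            start = float(w["start"]) + offset
            end = float(w["end"]) + offset
            # Skip words that begin before the last kept word ends; with
            # overlapping chunks these are usually repeats of the same word.
            if merged and start < float(merged[-1]["end"]):
                continue
            merged.append({"word": w["word"], "start": start, "end": end})
    return merged
```

For example, if the second chunk starts at 38.0 s, a word reported at 2.0 s inside that chunk lands at 40.0 s in the merged output. Real merging logic may also need to handle words split at a boundary and segment-level entries; this sketch covers word entries only.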

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre-checks:

  • Make sure you read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex, etc.)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

monica-sekoyan and others added 12 commits July 3, 2025 04:45
Signed-off-by: Monica Sekoyan <msekoyan@nvidia.com>
Signed-off-by: Monica Sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
@github-actions github-actions bot added the ASR label Jul 24, 2025
monica-sekoyan and others added 5 commits July 27, 2025 08:48
Signed-off-by: Monica Sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Signed-off-by: Monica Sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
@nune-tadevosyan nune-tadevosyan force-pushed the nune/canary_chunking branch 5 times, most recently from 5df3f2f to 90c3bec Compare August 3, 2025 13:05
@nune-tadevosyan nune-tadevosyan force-pushed the nune/canary_chunking branch 5 times, most recently from 3dfcf67 to af3df5e Compare August 7, 2025 17:57
for i in range(audio.shape[0]):
    waveform = audio[i, : audio_lens[i]]
    # Split the waveform into chunks and get their lengths.
    chunks, chunk_lens = self._chunk_waveform(waveform)
Collaborator

Shouldn't there be an option for overlap control here?
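One way to address the overlap question is to give the chunker an explicit overlap parameter. The following is a minimal, hypothetical sketch: the name chunk_waveform and its chunk_size/overlap parameters are illustrative assumptions, not the signature of NeMo's _chunk_waveform. It splits a 1-D waveform into fixed-size windows, zero-pads the last one, and returns the unpadded length of each chunk, mirroring the chunks, chunk_lens pair in the review snippet above.

```python
from typing import List, Sequence, Tuple


def chunk_waveform(
    waveform: Sequence[float], chunk_size: int, overlap: int = 0
) -> Tuple[List[List[float]], List[int]]:
    """Split a 1-D waveform into fixed-size, optionally overlapping chunks.

    Hypothetical helper for illustration only; not NeMo's API.
    Returns (chunks, chunk_lens): every chunk is zero-padded to chunk_size,
    and chunk_lens holds the number of real (unpadded) samples in each.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # hop between consecutive chunk starts
    total = len(waveform)
    chunks: List[List[float]] = []
    chunk_lens: List[int] = []
    start = 0
    while start < total:
        piece = list(waveform[start : start + chunk_size])
        chunk_lens.append(len(piece))
        # Zero-pad the final chunk so all chunks share the same length.
        piece.extend([0.0] * (chunk_size - len(piece)))
        chunks.append(piece)
        if start + chunk_size >= total:
            break  # this chunk already reaches the end of the audio
        start += step
    return chunks, chunk_lens
```

Assuming 16 kHz audio, 40-second chunks with a 1-second overlap would be chunk_waveform(samples, chunk_size=40 * 16000, overlap=16000). A larger overlap gives the merge step more context for words cut at chunk boundaries, at the cost of decoding the overlapping audio twice.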

Base automatically changed from msekoyan/canary2_timestamps to main August 13, 2025 20:27
@nithinraok nithinraok dismissed their stale review August 13, 2025 20:27

The base branch was changed.

@nithinraok nithinraok changed the base branch from main to msekoyan/canary2_timestamps August 13, 2025 22:01
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
@github-actions github-actions bot added core Changes to NeMo Core NLP labels Aug 13, 2025
@ko3n1g ko3n1g added Run CICD and removed Run CICD labels Aug 13, 2025
@nithinraok nithinraok changed the base branch from msekoyan/canary2_timestamps to main August 13, 2025 22:24
@nithinraok nithinraok marked this pull request as ready for review August 13, 2025 22:25
nithinraok
nithinraok previously approved these changes Aug 13, 2025
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
@chtruong814 chtruong814 enabled auto-merge (squash) August 14, 2025 03:32
@github-actions github-actions bot removed the Run CICD label Aug 14, 2025
@github-actions
Contributor

[🤖]: Hi @nune-tadevosyan 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

//cc @chtruong814 @ko3n1g @pablo-garay @thomasdhc

@chtruong814 chtruong814 merged commit e503a6e into main Aug 14, 2025
238 of 250 checks passed
@chtruong814 chtruong814 deleted the nune/canary_chunking branch August 14, 2025 03:46
guyueh1 pushed a commit to guyueh1/NeMo that referenced this pull request Aug 25, 2025
* adding nfa to canary

Signed-off-by: Monica Sekoyan <msekoyan@nvidia.com>

* remove comments

Signed-off-by: Monica Sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* modify external model loading

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* fix audio padding

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* reseting

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>

* handle non-possible alignment

Signed-off-by: Monica Sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* add offset refinement

Signed-off-by: Monica Sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* Initial Chunking

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Adding comments and docstrings

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Changes in docstrings

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Changes in docstrings

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Updates to the algorithm

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Update with timestamps

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Remove join_text

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Final

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Remove pdb

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Adjust timestamps

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Adjust timestamps

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Support for long audio

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Refactoring to keep model clean

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Small changes

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Removing changes from mixin

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* small updates

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Back to main for mixin

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Fix for hypotheses

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Revert "Fix for hypotheses"

This reverts commit 61fb893.

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Fix for hypotheses

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Revert "Revert "Fix for hypotheses""

This reverts commit 3c62a2d.

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Resolve

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Allowing user to control chunking

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Doc changes

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Forcing true for chunking

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Revert "reseting"

This reverts commit 6d74ad0.

Signed-off-by: monica-sekoyan <msekoyan@vidia.com>

* Revert "Apply isort and black reformatting"

This reverts commit 1d8c363.

Signed-off-by: monica-sekoyan <msekoyan@vidia.com>

* handle merge case for timestamps

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add timestamp_type

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* add timestamps support chunked inference

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* refactor ctc timestamps to use utils

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* correct restore_token_cased with unk_token

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* use timestamps utils in rnnt_decoding

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* change external timestamps asr model loading

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* add forced aligned method tests

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* modify nfa to match new setup and utils

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* remove unused imports

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* merge conflicts

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* remove unused errors

Signed-off-by: monica-sekoyan <msekoyan@vidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* remove unused import

Signed-off-by: monica-sekoyan <msekoyan@vidia.com>

* addressing comments, linting and flake8

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* handle decode_ids_to_str change

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* correct usage of decode_tokens_to_str

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>

* update nfa docs

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* revert jupyter settings

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Merge and Tests

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>

* Unit tests

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>

* change decoding_tokens_to_str

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>

* change decoding_tokens_to_str

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>

* Update

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Doc updates

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>

* Doc updates

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Doc change for speech_to_text_aed_chunked_infer

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Remove some import

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Copyright

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Remove some import

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* correct  description

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* make  private

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* rewrite restore_timestamps_asr_model

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* Update timestamps

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>

* Small updates

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>

* fix word offset logic

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>

* Tests update after the fix

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Cases for monotonicity

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>

* Tests fix

Signed-off-by: Nune <ntadevosyan@nvidia.com>

* Increase L0_Unit_Tests_GPU_ASR timeout to 30

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

---------

Signed-off-by: Monica Sekoyan <msekoyan@nvidia.com>
Signed-off-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>
Signed-off-by: Nune <ntadevosyan@nvidia.com>
Signed-off-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
Signed-off-by: monica-sekoyan <msekoyan@vidia.com>
Signed-off-by: nune-tadevosyan <152167970+nune-tadevosyan@users.noreply.github.com>
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: Monica Sekoyan <msekoyan@nvidia.com>
Co-authored-by: monica-sekoyan <monica-sekoyan@users.noreply.github.com>
Co-authored-by: nune-tadevosyan <nune-tadevosyan@users.noreply.github.com>
Co-authored-by: monica-sekoyan <msekoyan@vidia.com>
Co-authored-by: nithinraok <nithinrao.koluguri@gmail.com>
Co-authored-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>

5 participants