Improve NEST GPU Utilization 2/N #14089
```diff
 mask_value = self.mask_embedding.unsqueeze(-1)
 masks = torch.zeros_like(input_feats)
-maksed_feats = input_feats.clone()
+masked_feats = input_feats
```
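For context on the diff: dropping the `.clone()` means `masked_feats` starts out as an alias of `input_feats`, which is only safe if the subsequent masking is applied out-of-place. A minimal sketch of the aliasing behavior (hypothetical tensors, not the PR code):

```python
import torch

feats = torch.randn(1, 80, 1000)

alias = feats           # no copy: both names share the same storage
copy = feats.clone()    # independent copy, at the cost of an extra allocation

alias[0, 0, 0] = 0.0    # an in-place write through the alias changes feats too
assert feats[0, 0, 0] == 0.0
assert copy[0, 0, 0] != 0.0  # the clone is unaffected
```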
Will this also change input_feats when masked_feats is updated? Ideally we should return masked_feats while keeping input_feats unchanged.
I did confirm that by:

```python
import torch
# assumes RandomBlockMasking is imported from NeMo's SSL masking module
module = RandomBlockMasking(80, allow_overlap=False, mask_prob=0.01, block_size=40).cuda()
for i in range(100):
    input = torch.randn(1, 80, 1000).cuda()
    input_len = torch.tensor([1000]).cuda()
    masked_feats, masks = module(input, input_len)
    # would also fail if masked_feats merely aliased input, since allclose(x, x) is True
    assert not torch.allclose(masked_feats, input)
```
Awesome, thanks
@MahmoudAshraf97 could you please fix the DCO error?
Force-pushed 5559d1a to 2e5e1b9
Force-pushed 2e5e1b9 to 545b55f
@stevehuang52 Fixed

@stevehuang52 Can you rerun CICD?
Looks good to me, thanks for the improvement!
This PR is part of a series aimed at improving the training speed and GPU utilization of NEST models (#13619).

Profiling showed that the masking module takes a sizable chunk of the total training-loop time while performing essentially no computation, which lowers GPU utilization. This PR removes Python for-loops wherever possible and converts the masking operation into a single function call, as sketched below.
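A minimal sketch of the vectorized pattern (not the exact PR code; the function name and shapes are illustrative):

```python
import torch

def apply_block_mask(input_feats, masks, mask_value):
    """Replace masked positions in one fused op instead of a Python loop.

    input_feats: (B, D, T) features
    masks:       (B, D, T) boolean mask
    mask_value:  (D, 1) learned mask embedding, broadcast over batch and time
    """
    return torch.where(masks, mask_value.to(input_feats.dtype), input_feats)
```

A single `torch.where` launches one kernel over the whole batch, whereas a per-utterance loop that writes mask blocks individually issues many small kernels and leaves the GPU idle between them.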
As shown in the following graph, this PR together with #14086 reduces the training step time by almost 17%:

The green trace is the run with the PRs applied.
cc @stevehuang52