Improve NEST GPU Utilization 2/N #14089
```diff
 mask_value = self.mask_embedding.unsqueeze(-1)
 masks = torch.zeros_like(input_feats)
-maksed_feats = input_feats.clone()
+masked_feats = input_feats
```
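For context on the diff: dropping the `.clone()` means `masked_feats` starts out as an alias of `input_feats`, which is only safe if the subsequent masking is applied out-of-place. A minimal sketch of the aliasing behavior (hypothetical tensors, not the PR code):

```python
import torch

feats = torch.randn(1, 80, 1000)

alias = feats           # no copy: both names share the same storage
copy = feats.clone()    # independent copy, at the cost of an extra allocation

alias[0, 0, 0] = 0.0    # an in-place write through the alias changes feats too
assert feats[0, 0, 0] == 0.0
assert copy[0, 0, 0] != 0.0  # the clone is unaffected
```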
Will this also change input_feats when masked_feats is updated? Ideally we should return masked_feats while keeping input_feats unchanged.
I did confirm that by:

```python
import torch
# assumes RandomBlockMasking is imported from NeMo's SSL masking module
module = RandomBlockMasking(80, allow_overlap=False, mask_prob=0.01, block_size=40).cuda()
for i in range(100):
    input = torch.randn(1, 80, 1000).cuda()
    input_len = torch.tensor([1000]).cuda()
    masked_feats, masks = module(input, input_len)
    # would also fail if masked_feats merely aliased input, since allclose(x, x) is True
    assert not torch.allclose(masked_feats, input)
```
Awesome, thanks
@MahmoudAshraf97 could you please fix the DCO error?
Force-pushed 5559d1a to 2e5e1b9
Force-pushed 2e5e1b9 to 545b55f
@stevehuang52 Fixed

@stevehuang52 Can you rerun CICD?
Looks good to me, thanks for the improvement!
This PR is part of a series aimed at improving the training speed and GPU utilization of NEST models (#13619).

Profiling showed that the masking module takes a sizable chunk of the total training-loop time while performing essentially no computation, which lowers GPU utilization. This PR removes Python for-loops wherever possible and converts the masking operation into a single function call, as sketched below.
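A minimal sketch of the vectorized pattern (not the exact PR code; the function name and shapes are illustrative):

```python
import torch

def apply_block_mask(input_feats, masks, mask_value):
    """Replace masked positions in one fused op instead of a Python loop.

    input_feats: (B, D, T) features
    masks:       (B, D, T) boolean mask
    mask_value:  (D, 1) learned mask embedding, broadcast over batch and time
    """
    return torch.where(masks, mask_value.to(input_feats.dtype), input_feats)
```

A single `torch.where` launches one kernel over the whole batch, whereas a per-utterance loop that writes mask blocks individually issues many small kernels and leaves the GPU idle between them.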
As shown in the following graph, this PR together with #14086 reduces the training step time by almost 17%:

The green trace is the run with the PRs applied.
cc @stevehuang52