Fix an example: Resolve broadcasting error in attn_bias and attn_mask addition, fix device assignment for newly created variables in method #130209
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/130209
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 81dfdda with merge base 9983242.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
fyi @drisspg
A more elegant implementation of the device assignment from @mikaylagawarecki. Co-authored-by: mikaylagawarecki <mikaylagawarecki@gmail.com>
@pytorchbot merge
Merge failed
Reason: This PR needs a If not, please add the To add a label, you can comment to pytorchbot, for example For more information, see
Details for Dev Infra team: Raised by workflow job
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Fix an example: Resolve broadcasting error in attn_bias and attn_mask addition, fix device assignment for newly created variables in method (pytorch#130209)

1. `attn_bias += attn_mask` causes a broadcasting error. `attn_bias` has shape (L, S), so the result of the in-place addition is expected to have shape (L, S) as well. When the input has shape (N, num_heads, L, S), broadcasting is triggered and the result has shape (N, num_heads, L, S), which does not match.
2. `attn_bias` is a newly created variable in the method and is not assigned a device.

**This is my retry of pytorch#130200.** I used the wrong account in that PR.

Co-authored-by: mikaylagawarecki <mikaylagawarecki@gmail.com>
Pull Request resolved: pytorch#130209
Approved by: https://github.com/mikaylagawarecki
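The shape argument in point 1 can be checked without any tensor library. The following is a minimal sketch of the usual trailing-dimension broadcasting rule; the helper names `broadcast_shape` and `inplace_add_allowed` are hypothetical and are not PyTorch APIs, and the sizes are illustrative.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Hypothetical helper: broadcast two shapes, aligning from the last dimension."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        # A size-1 dimension stretches to match the other operand.
        out.append(y if x == 1 else x)
    return tuple(reversed(out))


def inplace_add_allowed(target, other):
    """An in-place add writes into `target`, so the broadcast result must keep its shape."""
    return broadcast_shape(target, other) == tuple(target)


# Illustrative sizes, assumed for this sketch only.
N, num_heads, L, S = 2, 3, 4, 5
attn_bias_shape = (L, S)
attn_mask_shape = (N, num_heads, L, S)

result_shape = broadcast_shape(attn_bias_shape, attn_mask_shape)
can_add_in_place = inplace_add_allowed(attn_bias_shape, attn_mask_shape)
```

Under these assumptions the broadcast result is four-dimensional while the in-place target stays two-dimensional, which is the mismatch the commit message describes.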
This should be reopened and merged again, because the code has been overwritten. @mikaylagawarecki Thank you.
Fix example: Address broadcasting error in the addition of `attn_bias` and `attn_mask`, and correct device assignment for newly created variables in the method. (#135427)

1. Adding `attn_bias += attn_mask` results in a broadcasting error. The expected shape of `attn_bias` is (L, S), so the output should also have the shape (L, S). However, when the input shape is (N, num_heads, L, S), broadcasting occurs, leading to an output shape of (N, num_heads, L, S), which is not desired.
2. `attn_bias` is a newly created variable within the method, but it is not assigned to the correct device.

**This is my retry of PR #130209. The PR has been merged into commit `d4a79d4a7c746068d25fe5cf9333495561f4ce1f`, but the modifications were overwritten by subsequent commits.**

Co-authored-by: mikaylagawarecki <mikaylagawarecki@gmail.com>

@mikaylagawarecki provided a more elegant implementation.

Pull Request resolved: #135427
Approved by: https://github.com/ezyang
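As a rough illustration of the two points above, the sketch below builds the bias on the query's device and combines it with the mask out of place. The helper `build_attn_bias` and the tensor sizes are assumptions made for this example, not the code merged in the PR, and only the additive (non-boolean) mask path is covered.

```python
import torch


def build_attn_bias(query, key, attn_mask=None):
    """Hypothetical helper: zero (L, S) bias plus an optional additive mask."""
    L, S = query.size(-2), key.size(-2)
    # Create the new tensor with the query's dtype and on the query's device.
    attn_bias = torch.zeros(L, S, dtype=query.dtype, device=query.device)
    if attn_mask is not None:
        # Out-of-place add: the result is free to take the broadcast shape.
        attn_bias = attn_mask + attn_bias
    return attn_bias


# Illustrative sizes: N=2, num_heads=3, L=4, S=5, embedding dim 8.
query = torch.randn(2, 3, 4, 8)
key = torch.randn(2, 3, 5, 8)
attn_mask = torch.randn(2, 3, 4, 5)

bias = build_attn_bias(query, key, attn_mask)

# The in-place form cannot grow a (4, 5) tensor to (2, 3, 4, 5).
try:
    torch.zeros(4, 5).add_(attn_mask)
    inplace_failed = False
except RuntimeError:
    inplace_failed = True
```

Rebinding `attn_bias` to the sum, rather than mutating it, is one way to let a batched mask of shape (N, num_heads, L, S) pass through; passing `device=query.device` at creation avoids a device mismatch when the inputs live on an accelerator.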
Fix an example: Resolve broadcasting error in attn_bias and attn_mask addition, fix device assignment for newly created variables in method

1. `attn_bias += attn_mask` causes a broadcasting error. `attn_bias` has shape (L, S), so the result of the in-place addition is expected to have shape (L, S) as well. When the input has shape (N, num_heads, L, S), broadcasting is triggered and the result has shape (N, num_heads, L, S), which does not match.
2. `attn_bias` is a newly created variable in the method and is not assigned a device.

This is my retry of #130200. I used the wrong account in that PR.