set thread_work_size to 4 for unrolled kernel #152396
Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/152396
Note: Links to docs will display an error until the docs builds have been completed.
1 cancelled job as of commit e1d4fb8 with merge base eb69f4e. The following job was cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict.
Successfully rebased
9d00366 to bbb6d6e
bbb6d6e to ea98fe6
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict.
Successfully rebased
ea98fe6 to e1d4fb8
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Adding a script to run a benchmark would be nice
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
lgtm
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 Hours).
Merge failed. Reason: 1 job has failed: trunk / macos-py3-arm64 / build
@pytorchbot merge -f "failure unrelated"
Merge started: Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
The biggest users of the unrolled kernel are conversion ops that are not specialized (we specialize float -> bfloat16, but bfloat16 -> float still goes through the generic path, and tbh I'm reluctant to specialize it). x+1 is included just for comparison with fully vectorized kernel perf.
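For anyone who wants to reproduce the comparison described above, here is a minimal sketch of a benchmark script. It is not the script used for this PR: the tensor size, the `run_benchmark` and `effective_bandwidth_gbps` names, and the choice of reporting effective bandwidth are all assumptions made for illustration. It times a bfloat16 -> float conversion (the generic path mentioned above) next to `x + 1` on a float tensor.

```python
def effective_bandwidth_gbps(numel, in_bytes, out_bytes, seconds):
    """Bytes read plus bytes written per element, over elapsed time, in GB/s."""
    return numel * (in_bytes + out_bytes) / seconds / 1e9


def run_benchmark(numel=2**26):
    """Time the two ops on a CUDA device (hypothetical helper, needs a GPU)."""
    # Imported lazily so the helper above can be used without PyTorch installed.
    import torch
    from torch.utils import benchmark

    x_bf16 = torch.randn(numel, device="cuda", dtype=torch.bfloat16)
    x_f32 = torch.randn(numel, device="cuda", dtype=torch.float32)

    # label -> (statement, input tensor, input bytes/elem, output bytes/elem)
    cases = {
        "bfloat16 -> float": ("x.to(torch.float32)", x_bf16, 2, 4),
        "float x + 1": ("x + 1", x_f32, 4, 4),
    }
    for label, (stmt, x, in_bytes, out_bytes) in cases.items():
        # Timer handles CUDA synchronization around the timed statement.
        timer = benchmark.Timer(stmt=stmt, globals={"x": x, "torch": torch})
        m = timer.blocked_autorange(min_run_time=1.0)
        gbps = effective_bandwidth_gbps(numel, in_bytes, out_bytes, m.median)
        print(f"{label}: {m.median * 1e6:.1f} us, {gbps:.0f} GB/s")
```

Calling `run_benchmark()` on a build before and after the change, on the same GPU, should give a rough before/after comparison; absolute numbers will depend on the device.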
@pytorchbot cherry-pick --onto release/2.7 -c critical
Cherry picking #152396: The cherry pick PR is at #154541 and it is recommended to link a critical cherry pick PR with an issue.

set thread_work_size to 4 for unrolled kernel (#152396)

Previous PRs enabling 8-vectorization inadvertently regressed unrolled kernel perf.

Pull Request resolved: #152396
Approved by: https://github.com/BoyuanFeng, https://github.com/msaroufim, https://github.com/malfet, https://github.com/Aidyn-A, https://github.com/atalman
(cherry picked from commit adebb8b)
Co-authored-by: Natalia Gimelshein <ngimel@meta.com>
Previous PRs enabling 8-vectorization inadvertently regressed unrolled kernel perf.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov
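To make the effect of the title's change concrete, here is a small sketch of how a per-thread work size feeds into launch geometry for an elementwise kernel. It is a simplified model, not the PyTorch source: the `launch_shape` helper is hypothetical, and the 128 threads per block is an assumed value. It only shows the arithmetic and does not by itself explain the regression.

```python
def launch_shape(numel, thread_work_size, threads_per_block=128):
    """Return (elements per block, number of blocks) for an elementwise launch.

    Assumes each thread handles `thread_work_size` elements and the grid is
    sized by rounding up numel / block_work_size (illustrative model only).
    """
    block_work_size = thread_work_size * threads_per_block
    grid = (numel + block_work_size - 1) // block_work_size  # ceiling division
    return block_work_size, grid


for tws in (4, 8):
    bws, grid = launch_shape(1_000_000, tws)
    print(f"thread_work_size={tws}: block_work_size={bws}, grid={grid}")
```

Under these assumptions, halving the per-thread work size from 8 to 4 halves the elements each block covers and roughly doubles the number of blocks launched for the same tensor.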