set thread_work_size to 4 for unrolled kernel by pytorchbot · Pull Request #154541 · pytorch/pytorch · GitHub

Conversation

pytorchbot
Collaborator

Previous PRs that enabled 8-wide vectorization inadvertently regressed the performance of the unrolled kernel. This PR sets thread_work_size to 4 for the unrolled kernel.

Pull Request resolved: #152396
Approved by: https://github.com/BoyuanFeng, https://github.com/msaroufim, https://github.com/malfet, https://github.com/Aidyn-A, https://github.com/atalman

(cherry picked from commit adebb8b)
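
For context, here is a minimal sketch of what thread_work_size controls. It assumes the common convention that each thread of an elementwise kernel processes thread_work_size elements, so one block covers num_threads * thread_work_size elements. The function name blocks_needed, the block size of 128, and the element count are hypothetical values chosen for illustration. None of them appear in this PR, and the sketch does not claim to explain the regression.

```python
def blocks_needed(numel: int, num_threads: int, thread_work_size: int) -> int:
    """Return how many thread blocks cover numel elements.

    Each thread handles thread_work_size elements, so one block
    covers num_threads * thread_work_size elements.
    """
    block_work_size = num_threads * thread_work_size
    # Ceiling division: a partially filled last block still needs a launch slot.
    return (numel + block_work_size - 1) // block_work_size


if __name__ == "__main__":
    NUM_THREADS = 128   # assumed block size, for illustration only
    NUMEL = 1_000_000   # assumed tensor size, for illustration only
    for tws in (4, 8):
        print(f"thread_work_size={tws}: {blocks_needed(NUMEL, NUM_THREADS, tws)} blocks")
```

Under these assumptions, halving thread_work_size from 8 to 4 roughly doubles the number of blocks launched for the same tensor, while each thread does half as much work.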
pytorchbot requested review from eqy and syed-ahmed as code owners May 28, 2025 18:34
pytorch-bot bot added the "release notes: cuda" label (release notes category) May 28, 2025

pytorch-bot bot commented May 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/154541

Note: Links to docs will display an error until the docs builds have been completed.

❌ 28 New Failures, 44 Pending, 1 Unrelated Failure

As of commit da6574c with merge base 924a247:

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

atalman added the "ciflow/binaries_wheel" label (Trigger binary build and upload jobs for wheel on the PR) May 28, 2025
Contributor

atalman left a comment

lgtm

atalman merged commit e2d141d into release/2.7 May 29, 2025
265 of 338 checks passed
github-actions bot deleted the cherry-pick-152396-by-pytorch_bot_bot_ branch June 29, 2025 02:23

Labels

ciflow/binaries_wheel (Trigger binary build and upload jobs for wheel on the PR)
open source
release notes: cuda (release notes category)

3 participants