set thread_work_size to 4 for unrolled kernel by ngimel · Pull Request #152396 · pytorch/pytorch

ngimel
Collaborator

@ngimel ngimel commented Apr 29, 2025

@ngimel ngimel requested review from eqy and syed-ahmed as code owners April 29, 2025 02:05
@pytorch-bot

pytorch-bot bot commented Apr 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152396

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Cancelled Job

As of commit e1d4fb8 with merge base eb69f4e (image):

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: cuda release notes category label Apr 29, 2025
@ngimel
Collaborator Author

ngimel commented Apr 29, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased ngimel/unroll_kernel onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout ngimel/unroll_kernel && git pull --rebase)

@malfet
Contributor

malfet commented Apr 29, 2025

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased ngimel/unroll_kernel onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout ngimel/unroll_kernel && git pull --rebase)

@msaroufim msaroufim self-requested a review April 30, 2025 01:19
@ngimel
Collaborator Author

ngimel commented Apr 30, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Apr 30, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Contributor

@malfet malfet left a comment


Adding a script to run a benchmark would be nice

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

Contributor

@atalman atalman left a comment


lgtm

@ngimel
Collaborator Author

ngimel commented Apr 30, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started


@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64 / build

Details for Dev Infra team Raised by workflow job

@ngimel
Collaborator Author

ngimel commented Apr 30, 2025

@pytorchbot merge -f "failure unrelated"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.


@ngimel
Collaborator Author

ngimel commented Apr 30, 2025

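import triton
from triton.testing import do_bench
import torch

# 16M bfloat16 elements on the GPU
x = torch.randn(1024 * 1024 * 16, device="cuda", dtype=torch.bfloat16)

with torch.profiler.profile() as p:
    for _ in range(10):
        # bfloat16 -> float takes the generic (unrolled) path; float -> bfloat16 is specialized
        x.float().bfloat16()
        # fully vectorized kernel, for comparison
        x + 1
p.export_chrome_trace("trace.json")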
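
One way to read the resulting `trace.json` without opening a trace viewer is to total the GPU kernel durations per kernel name. This is a minimal sketch, not part of the PR: the helper names `summarize_kernels` and `load_and_summarize` are hypothetical, and it assumes the exported Chrome trace stores GPU kernels as complete events (`"ph": "X"`) with `"cat": "kernel"` and a `"dur"` field in microseconds.

```python
import json
from collections import defaultdict


def summarize_kernels(trace):
    """Return {kernel name: total duration in microseconds} for GPU kernel events."""
    totals = defaultdict(float)
    for event in trace.get("traceEvents", []):
        # Assumption: GPU kernels are complete events ("ph" == "X") tagged "cat" == "kernel".
        if event.get("ph") == "X" and event.get("cat") == "kernel":
            totals[event["name"]] += event.get("dur", 0)
    return dict(totals)


def load_and_summarize(path="trace.json"):
    """Load a Chrome trace file and summarize its kernel events."""
    with open(path) as f:
        return summarize_kernels(json.load(f))
```

Sorting the returned dictionary by value puts the most expensive kernels first, which makes it easy to compare the conversion kernels against the `x + 1` kernel before and after the change.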

The biggest user of the unrolled kernel is conversion ops that are not specialized (we specialize float -> bfloat16, but bfloat16 -> float still goes through the generic path, and tbh I'm reluctant to specialize it). x + 1 is there just for comparison with fully vectorized kernel perf.
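
To compare the three kernels on equal footing, their measured durations can be converted into effective memory bandwidth. The sketch below is an illustration under stated assumptions, not something from the PR: `effective_bandwidth_gbs` is a hypothetical helper, and the byte counts assume each op reads its input once and writes its output once (2 bytes per bfloat16 element, 4 bytes per float element).

```python
N = 1024 * 1024 * 16  # element count used in the benchmark script

# Bytes read + bytes written per element for each op in the profiled loop.
BYTES_PER_ELEMENT = {
    "bfloat16 -> float": 2 + 4,
    "float -> bfloat16": 4 + 2,
    "x + 1 (bfloat16)": 2 + 2,
}


def effective_bandwidth_gbs(bytes_per_element, kernel_time_us, numel=N):
    """Effective bandwidth in GB/s for a kernel that took kernel_time_us microseconds."""
    total_bytes = bytes_per_element * numel
    return total_bytes / (kernel_time_us * 1e-6) / 1e9
```

Plugging in the per-kernel durations from the profiler trace shows how far the generic conversion path is from the fully vectorized `x + 1` kernel, independent of how many bytes each op moves.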

@atalman
Contributor

atalman commented May 28, 2025

@pytorchbot cherry-pick --onto release/2.7 -c critical

pytorchbot pushed a commit that referenced this pull request May 28, 2025
Previous PRs enabling 8-vectorization inadvertently regressed unrolled kernel perf.

Pull Request resolved: #152396
Approved by: https://github.com/BoyuanFeng, https://github.com/msaroufim, https://github.com/malfet, https://github.com/Aidyn-A, https://github.com/atalman

(cherry picked from commit adebb8b)
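
For readers unfamiliar with the setting in the title, the arithmetic below shows how a per-thread work size changes the launch geometry of an elementwise kernel. It is a rough sketch and not the PyTorch implementation: `launch_geometry` is a hypothetical helper, the default of 128 threads per block is an assumption chosen for illustration, and the sketch says nothing about why one setting performs better than the other.

```python
def launch_geometry(numel, thread_work_size, num_threads=128):
    """Return (elements per block, number of blocks) for an elementwise launch.

    Each thread handles thread_work_size elements, so one block covers
    thread_work_size * num_threads elements.
    """
    block_work_size = thread_work_size * num_threads
    grid = (numel + block_work_size - 1) // block_work_size  # ceiling division
    return block_work_size, grid


numel = 1024 * 1024 * 16  # tensor size from the benchmark script
for tws in (4, 8):
    block_work, grid = launch_geometry(numel, tws)
    print(f"thread_work_size={tws}: {block_work} elements/block, {grid} blocks")
```

Under these assumptions, doubling the per-thread work size halves the number of blocks launched for the same tensor.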
@pytorchbot
Collaborator

Cherry picking #152396

The cherry pick PR is at #154541 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:

Details for Dev Infra team Raised by workflow job

atalman pushed a commit that referenced this pull request May 29, 2025
set thread_work_size to 4 for unrolled kernel (#152396)

Previous PRs enabling 8-vectorization inadvertently regressed unrolled kernel perf.

Pull Request resolved: #152396
Approved by: https://github.com/BoyuanFeng, https://github.com/msaroufim, https://github.com/malfet, https://github.com/Aidyn-A, https://github.com/atalman

(cherry picked from commit adebb8b)

Co-authored-by: Natalia Gimelshein <ngimel@meta.com>
@github-actions github-actions bot deleted the ngimel/unroll_kernel branch June 29, 2025 02:20

Labels

ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
release notes: cuda (release notes category)
