KEMBAR78
Dont use constant mask if ynumel potentially overflows ygrids by kundaMwiza · Pull Request #139751 · pytorch/pytorch · GitHub
Skip to content

Conversation

@kundaMwiza
Copy link
Collaborator

@kundaMwiza kundaMwiza commented Nov 5, 2024

If (ynumel / YBLOCK) > get_max_ygrids(), the z dimension will be used if znumel is None. However, if (ynumel / YBLOCK) % get_max_ygrids() != 0, there will be program launches with inputs that require masking, and so this needs to be considered when determining if the y dimension has a constant mask.

Fixes #ISSUE_NUMBER

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@pytorch-bot
Copy link

pytorch-bot bot commented Nov 5, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139751

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 26e745b with merge base 5deca07 (image):

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@kundaMwiza
Copy link
Collaborator Author

@pytorchbot label 'topic: not user facing'

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Nov 5, 2024
@aakhundov aakhundov changed the title Dont use constant mask if ynumel potentially overlows ygrids Dont use constant mask if ynumel potentially overflows ygrids Nov 5, 2024
@bdhirsh bdhirsh added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Nov 8, 2024
@shunting314
Copy link
Contributor

This should work but I'll let @eellison to comment if there can be better ways to handle constant ymask.

Copy link
Contributor

@eellison eellison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests for new conditions would be nice, one comment

@kundaMwiza kundaMwiza force-pushed the mwizak/fix-constant-mask-large-triton-grids branch from 260d7d9 to 17743a9 Compare November 20, 2024 08:58
@kundaMwiza kundaMwiza requested a review from eellison November 20, 2024 09:12
@kundaMwiza
Copy link
Collaborator Author

@pytorchbot merge

@pytorch-bot
Copy link

pytorch-bot bot commented Nov 20, 2024

Pull workflow has not been scheduled for the PR yet. It could be because author doesn't have permissions to run those or skip-checks keywords were added to PR/commits, aborting merge. Please get/give approval for the workflows and/or remove skip ci decorators before next merge attempt. If you think this is a mistake, please contact PyTorch Dev Infra.

@kundaMwiza
Copy link
Collaborator Author

@eellison Can you approve the workflows? Thanks

@kundaMwiza
Copy link
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 21, 2024
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: linux-binary-manywheel / manywheel-py3_9-cuda12_6-test / test

Details for Dev Infra team Raised by workflow job

@kundaMwiza
Copy link
Collaborator Author

@eellison Looks like an unrelated failure

@kundaMwiza
Copy link
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: linux-binary-manywheel / manywheel-py3_9-cuda12_6-test / test

Details for Dev Infra team Raised by workflow job

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 3 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@eellison
Copy link
Contributor

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch push -f https://github.com/graphcore/pytorch-fork.git pull/139751/head:mwizak/fix-constant-mask-large-triton-grids returned non-zero exit code 128

remote: Permission to graphcore/pytorch-fork.git denied to pytorchmergebot.
fatal: unable to access 'https://github.com/graphcore/pytorch-fork.git/': The requested URL returned error: 403

This is likely because the author did not allow edits from maintainers on the PR or because the repo has additional permissions settings that mergebot does not qualify.
Raised by https://github.com/pytorch/pytorch/actions/runs/12055871747

@kundaMwiza kundaMwiza force-pushed the mwizak/fix-constant-mask-large-triton-grids branch from 863a623 to da97530 Compare November 28, 2024 14:01
kundaMwiza and others added 3 commits December 2, 2024 08:11
Update torch/_inductor/codegen/triton.py

Co-authored-by: George White <georgew@graphcore.ai>
@kundaMwiza kundaMwiza force-pushed the mwizak/fix-constant-mask-large-triton-grids branch from da97530 to 26e745b Compare December 2, 2024 08:11
@kundaMwiza
Copy link
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 3 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@eellison
Copy link
Contributor

eellison commented Dec 3, 2024

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
…h#139751)

If (ynumel / YBLOCK)  > get_max_ygrids(), the z dimension will be used if znumel is None. However, if (ynumel / YBLOCK) % get_max_ygrids() != 0, there will be program launches with inputs that require masking, and so this needs to be considered when determining if the y dimension has a constant mask.

Fixes #ISSUE_NUMBER

Pull Request resolved: pytorch#139751
Approved by: https://github.com/eellison

Co-authored-by: George White <georgew@graphcore.ai>
AmdSampsa pushed a commit to AmdSampsa/pytorch that referenced this pull request Dec 9, 2024
…h#139751)

If (ynumel / YBLOCK)  > get_max_ygrids(), the z dimension will be used if znumel is None. However, if (ynumel / YBLOCK) % get_max_ygrids() != 0, there will be program launches with inputs that require masking, and so this needs to be considered when determining if the y dimension has a constant mask.

Fixes #ISSUE_NUMBER

Pull Request resolved: pytorch#139751
Approved by: https://github.com/eellison

Co-authored-by: George White <georgew@graphcore.ai>
pytorchmergebot pushed a commit that referenced this pull request Jun 20, 2025
Apply the same logic introduced in #139751 to triton kernels using block ptrs. Here, if ynumel / YBLOCK > max_y_grids, dimensions dependent on YBLOCK need to be boundary checked, even if the block shape in such dimensions is a multiple of an expression in YBLOCK. This is because ynumel / YBLOCK % get_max_y_grids() may not be zero, so redundant programs will be launched that will attempt to read / write OOB.

Pull Request resolved: #149504
Approved by: https://github.com/blaine-rister

Co-authored-by: blaine-rister <145300525+blaine-rister@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged module: inductor open source topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants