[ROCm] Bug fix for flex attention configs avoiding ROCm path #140270
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140270
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
✅ No Failures as of commit 1b16483 with merge base 62eea62.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Adding priority label to enable testing for triton 3.2/pytorch 2.6
```diff
     default_config = None

-    if head_dim <= 256 and torch.cuda.get_device_capability() >= (9, 0):  # H100
+    if head_dim <= 256 and torch.version.hip:
```
I think it might be cleaner to instead create a `_get_config_rocm` and branch at the call site, even if there is some overlap.
TBH this flow is already hard to grok.
We still need the conditional on head_dim at least.
Do you have any specific recommendations, @drisspg? I'd like to get this bugfix in ASAP for triton 3.2 debugging, but the config selection in this file definitely does need refactoring.
I think we should copy and paste the entire function and create two versions, one for ROCm and one for CUDA. It feels to me that although you need branching on head_dims, the branching should likely be different given the reduced smem constraints and increased warp size on AMD GPUs.
I think let's just make two funcs and copy and paste as much as you need, until the dust settles some more.
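To make the suggestion above concrete, here is a minimal sketch of the two-function split with call-site dispatch. The function names, head_dim cutoffs, and config tuples are placeholders chosen for illustration, not the PR's actual values:

```python
import torch

# Hypothetical names and placeholder values, not the PR's actual code.
def _get_fwd_config_rocm(head_dim: int, dtype: torch.dtype):
    # ROCm branching can differ from CUDA: 64-wide wavefronts and different
    # shared-memory limits generally favor different block sizes / num_warps.
    if head_dim <= 128:
        return (128, 64, 8, 1)  # (BLOCK_M, BLOCK_N, num_warps, num_stages)
    return (64, 64, 4, 1)

def _get_fwd_config_cuda(head_dim: int, dtype: torch.dtype):
    if head_dim <= 256 and torch.cuda.get_device_capability() >= (9, 0):  # H100
        return (128, 64, 4, 3)
    return (64, 64, 4, 2)

def _get_fwd_config(head_dim: int, dtype: torch.dtype):
    # Branch once at the call site instead of mixing HIP checks into the CUDA path.
    if torch.version.hip:
        return _get_fwd_config_rocm(head_dim, dtype)
    return _get_fwd_config_cuda(head_dim, dtype)
```

Keeping the dispatch at a single point means later tuning of the ROCm block sizes would not risk perturbing the H100/A100 paths.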
Looks like this has addressed @drisspg's concerns so I'm good to merge
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
There are associated failures, likely due to the new structure on the NV side... let me sanity check, and when we get a green signal I'll merge. cc: @bertmaher
Rebasing failure seems unrelated
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased: cb09132 to 1b16483
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…#140270)
Fixes pytorch#139755 pytorch#139621
Follow-up fix to pytorch#139883, which made the bulk of the changes required, but a logic error resulted in ROCm still using H100 configurations.
Pull Request resolved: pytorch#140270
Approved by: https://github.com/bertmaher
Fixes #139755 #139621
Follow-up fix to #139883, which made the bulk of the changes required, but a logic error resulted in ROCm still using H100 configurations.
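For context, a hedged illustration of the logic error described above (placeholder code, not the file's actual implementation). Under ROCm, `torch.cuda.get_device_capability()` reports the AMD gfx major/minor, which on Instinct-class GPUs can satisfy >= (9, 0), so a guard written for H100 also fires on HIP unless `torch.version.hip` is checked first:

```python
import torch

head_dim = 128  # example value

# Buggy shape: on a ROCm build whose device reports capability >= (9, 0),
# this branch still selects the H100-tuned configs.
if head_dim <= 256 and torch.cuda.get_device_capability() >= (9, 0):  # H100
    config = "h100_config"  # placeholder

# Fixed shape: test for HIP first so ROCm never falls into the H100 branch.
if head_dim <= 256 and torch.version.hip:
    config = "rocm_config"  # placeholder
elif head_dim <= 256 and torch.cuda.get_device_capability() >= (9, 0):  # H100
    config = "h100_config"  # placeholder
```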
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @hongxiayang @naromero77amd @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov