[ROCm] Support new AMD triton stream pipeliner #139881
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139881
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 31093c7 with merge base d031d1b.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot rebase -s
^^ I want to see if the "0 active drivers" failures go away with a rebase.
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
We're green on CI now. @bertmaher @davidberard98 @aakhundov, please take a look if you agree this solution is suitable.
```diff
  {"config": (32, 32, 16, 1, 2), "cond": True},
- {"config": (32, 32, 128, 2, 4), "cond": torch.version.hip is None},
+ {"config": (32, 32, 128, 2, 4), "cond": True},
```
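The `cond` flags above gate which autotune configs are even considered. A minimal sketch of that filtering pattern — the tuple layout `(BLOCK_M, BLOCK_N, BLOCK_K, num_stages, num_warps)` mirrors the snippet, but the helper name and the third entry are illustrative, not the actual inductor code:

```python
# Hypothetical sketch of conditional autotune-config filtering.
# Tuples are (BLOCK_M, BLOCK_N, BLOCK_K, num_stages, num_warps).
configs = [
    {"config": (32, 32, 16, 1, 2), "cond": True},
    {"config": (32, 32, 128, 2, 4), "cond": True},
    {"config": (64, 64, 32, 2, 4), "cond": False},  # e.g. disabled on this platform
]

def enabled_configs(candidates):
    """Keep only the config tuples whose gating condition is truthy."""
    return [c["config"] for c in candidates if c["cond"]]

print(enabled_configs(configs))
```

Re-enabling a config in this scheme is just flipping its `cond` from a platform check to `True`, which is exactly what the diff above does.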
Curious: this config (and the two below) didn't work before but are going to work now? Also with the old Triton version?
We disabled these a long time ago because we were seeing runtime errors whenever a config hit a shared-memory (shmem) OOM. The autotune implementation now seems safe enough not to error out completely when it hits a config that throws OOM issues for us. CI is still green here on the older Triton, so I think we're safe.
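The defensive behavior described above — skipping configs that fail instead of aborting the whole autotune pass — can be sketched roughly like this (structure and names are assumptions for illustration, not the actual inductor autotuner):

```python
# Illustrative sketch: benchmark each candidate config and skip any that
# raise a runtime error (e.g. a shared-memory OOM), rather than letting
# one bad config abort the entire autotune pass.
def autotune(configs, benchmark):
    timings = {}
    for cfg in configs:
        try:
            timings[cfg] = benchmark(cfg)
        except RuntimeError:
            continue  # config is unusable on this hardware; move on
    if not timings:
        raise RuntimeError("all autotune configs failed")
    return min(timings, key=timings.get)

# Hypothetical benchmark where one config exhausts shared memory.
def fake_benchmark(cfg):
    if cfg == "big":
        raise RuntimeError("out of resource: shared memory")
    return {"small": 1.5, "medium": 1.2}[cfg]

print(autotune(["small", "medium", "big"], fake_benchmark))
```

Under this scheme a previously crashing config simply loses the benchmark race instead of taking down compilation.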
torch/_inductor/utils.py (Outdated)

```python
from .runtime.triton_helpers import get_backend_options

options = get_backend_options()
return options.get("num_stages", 2 if torch.version.hip else None)
```
Why is the default 2 here? Will this work with the old Triton version (pre-3.2) on ROCm?
We grab the default num_stages from the AMD backend via HIPOptions; pre-3.2 Triton will still return num_stages=0 as expected. https://github.com/triton-lang/triton/pull/4845/files#diff-33c9a103282c05c9d9d213b94450ae7481b6db8c3c6d810f54f175b4735a3c72
I just provided a default case for safety here, maybe some future-proofing in case the dict changes (pretty unlikely num_stages will be renamed, though...). 2 is the new recommended default for GEMMs.
I see, thanks for the explanation. This helper is not specifically marked / named for ROCm. So, if folks end up using this for NV, I assume it will still work as expected (i.e., will options contain num_stages on NV, too)? Also, should we @lru_cache this, as the value shouldn't change through one run?
Yeah, the same should work on NV here too, via CUDAOptions:
https://github.com/triton-lang/triton/blob/main/third_party/nvidia/backend/compiler.py#L98
Adding lru_cache makes sense; I'll make that change.
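The suggested memoization is just `functools.lru_cache` around the query. A small sketch of why it helps — the body below is a hypothetical stand-in for the backend lookup, which shouldn't change within one run:

```python
import functools

# Sketch: memoize the backend query so the (possibly expensive) driver
# lookup runs at most once per process. The function body is a
# hypothetical stand-in for the real Triton backend-options call.
call_count = 0

@functools.lru_cache(None)
def get_backend_num_stages():
    global call_count
    call_count += 1  # would be the actual backend query
    return 2         # assumed ROCm default

get_backend_num_stages()
get_backend_num_stages()
print(call_count)  # the second call is served from the cache
```

Since the backend options are fixed for the lifetime of the process, caching the result is safe.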
We could also return 3 as the default on the NV side rather than returning None.
@pytorchbot merge
```python
threads = torch.get_num_threads()
return threads


@functools.lru_cache(None)
```
Oops, linter needs one more blank line above this (sigh)
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes pytorch#139182
In Triton 3.2, num_stages=0 will be deprecated in Triton's AMD backend. Let's query the default num_stages from the relevant Triton backend.
Pull Request resolved: pytorch#139881
Approved by: https://github.com/bertmaher