[ROCm] fastSpecializedAtomicAdd for MI300 #135770
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135770
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
✅ No Failures as of commit 6df3cf1 with merge base 31c0467.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@jianyuh has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Hi Jeff, I tested it internally on ROCm 6.2.0 and the performance looks great—thanks! However, I noticed that the code specifies ROCM_VERSION >= 60201. Is this a requirement, or should it also work with 6.2.0?
@Mellonta it's very possible that our internal clang compiler is newer than the clang rpm in 6.2.0 |
While working on this PR I discovered a bug in our ROCm 6.2 compiler. To better support you, I got the release team to push the fix as a patch into ROCm 6.2.1. On ROCm 6.2.0 the code will still compile, but index_add will produce garbage results for bf16 and fp16 types. That's why I guard it as requiring 6.2.1 or newer.
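For context, here is a minimal sketch of the guard being described, assuming PyTorch's usual ROCM_VERSION encoding (major*10000 + minor*100 + patch, so 6.2.1 becomes 60201) and the hipcc-defined __gfx942__ architecture macro for MI300; it is illustrative only, not the actual PyTorch code.

```cpp
#include <hip/hip_runtime.h>

// Sketch of the guard shape only: the packed bf16/fp16 fast path is compiled in
// only when targeting MI300 (gfx942) AND building with ROCm >= 6.2.1, the first
// release carrying the compiler fix. ROCm 6.2.0 would compile the fast path but
// miscompile it, so those builds must fall back to the generic atomic add.
__device__ inline bool use_packed_fp16_atomics() {
#if defined(USE_ROCM) && defined(__gfx942__) && defined(ROCM_VERSION) && \
    (ROCM_VERSION >= 60201)
  return true;   // safe to emit packed 2x16-bit hardware atomics
#else
  return false;  // generic fallback path
#endif
}
```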
@xw285cornell has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
The failed tests don't seem relevant.
@jianyuh @xw285cornell Build should be fixed now.

@xw285cornell has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge -f 'Landed internally' (Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
MI300 adds HW support for packed bfloat16 and fp16. Enable via existing fastSpecializedAtomicAdd. Pull Request resolved: pytorch#135770 Approved by: https://github.com/xw285cornell, https://github.com/jianyuh Co-authored-by: Jeff Daily <jeff.daily@amd.com>

MI300 adds HW support for packed bfloat16 and fp16. Enable via existing fastSpecializedAtomicAdd. Pull Request resolved: pytorch#135770 Approved by: https://github.com/xw285cornell, https://github.com/jianyuh (cherry picked from commit d33a5e2)

…) (#1746) MI300 adds HW support for packed bfloat16 and fp16. Enable via existing fastSpecializedAtomicAdd. Helps with improving [torch.scatter_add_ performance](https://ontrack-internal.amd.com/browse/SWDEV-497013), among others. Pull Request resolved: pytorch#135770 Co-authored-by: Jeff Daily <jeff.daily@amd.com>
MI300 adds HW support for packed bfloat16 and fp16. Enable via existing fastSpecializedAtomicAdd.
cc @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd
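To make the description above concrete, here is a hedged sketch in HIP C++ of the general idea behind a specialized fp16 atomic add: pair the target 16-bit element with its neighbor inside the aligned 32-bit word, put zero in the untouched lane, and issue one packed atomic where the hardware supports it. The helper name, the lane handling, and the availability of a __half2 atomicAdd overload on the packed path are assumptions for illustration; this is not the fastSpecializedAtomicAdd implementation the PR modifies.

```cpp
#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>

// Illustrative helper (hypothetical name): atomically add `value` to base[index]
// for an at-least-4-byte-aligned __half buffer.
__device__ inline void sketch_fast_half_atomic_add(__half* base, size_t index,
                                                   __half value) {
  // Aligned 32-bit word holding elements (index & ~1) and (index | 1).
  __half2* target = reinterpret_cast<__half2*>(base + (index & ~size_t(1)));
  // Put `value` in the lane that corresponds to `index`, zero in the other lane,
  // so the neighboring element is left unchanged by the packed add.
  const __half zero = __float2half(0.0f);
  const __half2 packed = (index % 2 == 0) ? __halves2half2(value, zero)
                                          : __halves2half2(zero, value);
#if defined(USE_ROCM) && defined(__gfx942__) && defined(ROCM_VERSION) && \
    (ROCM_VERSION >= 60201)
  // Assumption: a __half2 atomicAdd overload is available and lowers to MI300's
  // packed 2x16-bit atomic (CUDA exposes this overload on sm_60+; the HIP
  // equivalent is treated as an assumption here).
  atomicAdd(target, packed);
#else
  // Portable fallback: compare-and-swap loop on the containing 32-bit word.
  unsigned int* word = reinterpret_cast<unsigned int*>(target);
  unsigned int old = *word, assumed;
  do {
    assumed = old;
    __half2 cur = *reinterpret_cast<__half2*>(&assumed);
    __half2 sum = __hadd2(cur, packed);
    old = atomicCAS(word, assumed, *reinterpret_cast<unsigned int*>(&sum));
  } while (old != assumed);
#endif
}
```

Updating two 16-bit lanes with one packed atomic matches the new MI300 instructions and avoids a per-element compare-and-swap loop, which is presumably where the index_add/scatter_add speedups for bf16 and fp16 come from.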