[ROCm] fastSpecializedAtomicAdd for MI300 by jeffdaily · Pull Request #135770 · pytorch/pytorch · GitHub

Conversation

@jeffdaily
Collaborator

@jeffdaily jeffdaily commented Sep 11, 2024

MI300 adds HW support for packed bfloat16 and fp16. Enable via existing fastSpecializedAtomicAdd.
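
For readers skimming the diff, here is a minimal sketch of the kind of packed atomic this path enables. The guard macros, overloads, and kernel are illustrative assumptions for MI300 (gfx942), not the code actually changed by this PR:

```cpp
// Illustrative sketch only: guard macros and overloads are assumptions,
// not the exact code touched by this PR.
#include <hip/hip_fp16.h>
#include <hip/hip_runtime.h>

__global__ void reduce_half2(__half2* out, const __half2* vals, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
#if defined(__gfx942__) && defined(ROCM_VERSION) && ROCM_VERSION >= 60201
  // MI300 (gfx942) has hardware packed fp16/bf16 atomics, so both lanes of
  // a __half2 can be accumulated with a single atomic (assuming HIP mirrors
  // CUDA's atomicAdd overload for __half2).
  atomicAdd(out, vals[i]);
#else
  // Fallback: two scalar fp16 atomic adds on GPUs without packed support.
  atomicAdd(reinterpret_cast<__half*>(out), __low2half(vals[i]));
  atomicAdd(reinterpret_cast<__half*>(out) + 1, __high2half(vals[i]));
#endif
}
```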

cc @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

@pytorch-bot pytorch-bot bot added the ciflow/rocm (Trigger "default" config CI on ROCm), module: rocm (AMD GPU support for Pytorch), and release notes: cuda (release notes category) labels Sep 11, 2024
@pytorch-bot

pytorch-bot bot commented Sep 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135770

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 6df3cf1 with merge base 31c0467 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

@jianyuh has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@janeyx99 janeyx99 added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Sep 16, 2024
@Mellonta
Contributor

Hi Jeff, I tested it internally on ROCm 6.2.0 and the performance looks great—thanks! However, I noticed that the code specifies ROCM_VERSION >= 60201. Is this a requirement, or should it also work with 6.2.0?

@xw285cornell
Contributor

@Mellonta it's very possible that our internal clang compiler is newer than the clang rpm in 6.2.0

@jeffdaily
Collaborator Author

> Hi Jeff, I tested it internally on ROCm 6.2.0 and the performance looks great—thanks! However, I noticed that the code specifies ROCM_VERSION >= 60201. Is this a requirement, or should it also work with 6.2.0?

When working on this PR I discovered a bug in our ROCm 6.2 compiler. To better support you, I got the release team to push the fix as a patch in ROCm 6.2.1. The compilation will succeed on ROCm 6.2, but your results when using index_add will be garbage for bf16 and fp16 types. That's why I guard it as needing 6.2.1 or newer.
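
For anyone mapping that number to a release: PyTorch's ROCm builds encode the version as MAJOR*10000 + MINOR*100 + PATCH, so the guard admits 6.2.1 (60201) and excludes 6.2.0 (60200). A sketch of the guard pattern (not the exact hunk from this PR):

```cpp
// Sketch of the guard pattern, assuming PyTorch's ROCM_VERSION macro
// encoding (MAJOR*10000 + MINOR*100 + PATCH, so ROCm 6.2.1 -> 60201).
#if defined(USE_ROCM) && ROCM_VERSION >= 60201
  // ROCm 6.2.1 ships the compiler fix, so the packed bf16/fp16 atomic
  // path is safe to take.
#else
  // ROCm 6.2.0 compiles this path but miscompiles it (garbage results for
  // bf16/fp16 index_add), so stay on the generic atomic path.
#endif
```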

@facebook-github-bot
Contributor

@xw285cornell has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@xw285cornell xw285cornell left a comment


The failed tests don't seem relevant.

@jeffdaily
Collaborator Author

@jianyuh @xw285cornell Build should be fixed now.

@jeffdaily jeffdaily requested a review from jianyuh September 26, 2024 17:15
@jeffdaily jeffdaily added the release notes: rocm (mandatorylabel) label and removed the release notes: cuda (release notes category) label Sep 26, 2024
@pruthvistony pruthvistony added the rocm label (This tag is for PRs from ROCm team) Sep 27, 2024
@facebook-github-bot
Contributor

@xw285cornell has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@pytorchbot merge -f 'Landed internally'

(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

jerrymannil pushed a commit to ROCm/pytorch that referenced this pull request Oct 15, 2024
MI300 adds HW support for packed bfloat16 and fp16. Enable via existing fastSpecializedAtomicAdd.

Pull Request resolved: pytorch#135770
Approved by: https://github.com/xw285cornell, https://github.com/jianyuh
jerrymannil pushed a commit to ROCm/pytorch that referenced this pull request Nov 5, 2024
MI300 adds HW support for packed bfloat16 and fp16. Enable via existing fastSpecializedAtomicAdd.

Pull Request resolved: pytorch#135770
Approved by: https://github.com/xw285cornell, https://github.com/jianyuh
pruthvistony pushed a commit to ROCm/pytorch that referenced this pull request Nov 5, 2024
MI300 adds HW support for packed bfloat16 and fp16. Enable via existing
fastSpecializedAtomicAdd.

Pull Request resolved: pytorch#135770
Approved by: https://github.com/xw285cornell, https://github.com/jianyuh

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
jerrymannil pushed a commit to ROCm/pytorch that referenced this pull request Nov 19, 2024
MI300 adds HW support for packed bfloat16 and fp16. Enable via existing fastSpecializedAtomicAdd.

Pull Request resolved: pytorch#135770
Approved by: https://github.com/xw285cornell, https://github.com/jianyuh
jithunnair-amd pushed a commit to ROCm/pytorch that referenced this pull request Nov 23, 2024
MI300 adds HW support for packed bfloat16 and fp16. Enable via existing fastSpecializedAtomicAdd.

Pull Request resolved: pytorch#135770
Approved by: https://github.com/xw285cornell, https://github.com/jianyuh

(cherry picked from commit d33a5e2)
jithunnair-amd added a commit to ROCm/pytorch that referenced this pull request Nov 23, 2024
…) (#1746)

MI300 adds HW support for packed bfloat16 and fp16. Enable via existing
fastSpecializedAtomicAdd.

Helps with improving [torch.scatter_add_ performance](https://ontrack-internal.amd.com/browse/SWDEV-497013), among others.

Pull Request resolved: pytorch#135770

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
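
The scatter_add_ mention above is where the packed atomics show up in practice: scatter_add_ on half and bfloat16 GPU tensors resolves colliding indices with atomic adds. A generic libtorch usage example that exercises that path (not code from this PR):

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // Generic usage example, not code from this PR: scatter_add_ on fp16
  // tensors resolves colliding indices with atomic adds, which is the
  // operation the packed MI300 atomics accelerate.
  auto opts  = torch::dtype(torch::kHalf).device(torch::kCUDA);
  auto self  = torch::zeros({8}, opts);
  auto src   = torch::ones({1024}, opts);
  auto index = torch::randint(0, 8, {1024},
                              torch::dtype(torch::kLong).device(torch::kCUDA));
  self.scatter_add_(/*dim=*/0, index, src);
  std::cout << self << std::endl;
}
```
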
jithunnair-amd pushed a commit to ROCm/pytorch that referenced this pull request Mar 17, 2025
MI300 adds HW support for packed bfloat16 and fp16. Enable via existing
fastSpecializedAtomicAdd.

Pull Request resolved: pytorch#135770
Approved by: https://github.com/xw285cornell, https://github.com/jianyuh

Co-authored-by: Jeff Daily <jeff.daily@amd.com>

Labels

ciflow/rocm (Trigger "default" config CI on ROCm), Merged, module: rocm (AMD GPU support for Pytorch), open source, release notes: rocm (mandatorylabel), rocm (This tag is for PRs from ROCm team), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)


10 participants