[Distributed] add FP8 support to NaN checker #135891

kwen2501 · 2024-09-12T20:30:05Z

Stack from ghstack (oldest at bottom):

Adding support for `torch.float8_e4m3fn` and `torch.float8_e5m2`

cc @XilunWu @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

[ghstack-poisoned]
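For reference, a NaN check on these dtypes has to recognize each format's NaN bit patterns. The Python sketch below is an illustration only, not the PyTorch C++/CUDA implementation, and the helper names `classify_e4m3fn` and `classify_e5m2` are hypothetical. It assumes the standard layouts. `float8_e4m3fn` has 1 sign, 4 exponent and 3 mantissa bits, has no infinities, and uses an all-ones exponent plus an all-ones mantissa for NaN. `float8_e5m2` has 1 sign, 5 exponent and 2 mantissa bits and follows IEEE 754 conventions, so an all-ones exponent means Inf when the mantissa is zero and NaN otherwise.

```python
def classify_e4m3fn(byte: int) -> str:
    """Classify one float8_e4m3fn byte: bit 7 sign, bits 6-3 exponent, bits 2-0 mantissa.

    e4m3fn has no infinities; NaN is exponent == 0b1111 with mantissa == 0b111.
    """
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return "nan"
    return "finite"


def classify_e5m2(byte: int) -> str:
    """Classify one float8_e5m2 byte: bit 7 sign, bits 6-2 exponent, bits 1-0 mantissa.

    IEEE-style: an all-ones exponent is Inf (mantissa 0) or NaN (mantissa != 0).
    """
    exp = (byte >> 2) & 0x1F
    man = byte & 0x3
    if exp == 0x1F:
        return "nan" if man else "inf"
    return "finite"


# Enumerate all 256 bit patterns to list each format's NaN encodings.
E4M3FN_NANS = [b for b in range(256) if classify_e4m3fn(b) == "nan"]
E5M2_NANS = [b for b in range(256) if classify_e5m2(b) == "nan"]
print([hex(b) for b in E4M3FN_NANS])  # ['0x7f', '0xff']
print([hex(b) for b in E5M2_NANS])    # ['0x7d', '0x7e', '0x7f', '0xfd', '0xfe', '0xff']
```

Ignoring the sign bit, `float8_e4m3fn` has a single NaN pattern (0x7F) while `float8_e5m2` has three (0x7D to 0x7F), so the two dtypes need different checks.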

pytorch-bot · 2024-09-12T20:30:10Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135891

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 27e3426 with merge base 0216936:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 8054946 Pull Request resolved: #135891

rghosh08 · 2024-09-12T20:51:51Z

PR Reviewer Guide 🔍

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Key issues to review
Code Smell: The comment at line 370 suggests that filling values into an FP8 tensor is currently not supported. This might be worth considering for future improvements.
Code Smell: The `AT_DISPATCH_FLOATING_TYPES_AND4` macro now includes support for FP8 types. However, the comment above it still mentions support for only Half and BFloat16 types. This might be worth updating for consistency.

wconstab

LGTM, except for the missing optimized template for the f8 kernel. Can you determine whether it's worth adding before deciding to land this as is or optimize further?

Adding support for `torch.float8_e4m3fn` and `torch.float8_e5m2` cc XilunWu H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o [ghstack-poisoned]

kwen2501 · 2024-09-13T06:14:01Z

@pytorchbot merge

pytorchmergebot · 2024-09-13T06:15:51Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here.

We check 8 FP8 values simultaneously, packed into 8 bytes.
Pull Request resolved: #135961
Approved by: https://github.com/yifuwang, https://github.com/Skylion007
ghstack dependencies: #135891
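The commit message above describes checking eight FP8 values at once by loading 8 bytes as one 64-bit word. Below is a minimal Python sketch of one way to do that for `float8_e4m3fn`, assuming an add-then-test-high-bit ("overflow") trick. The function names are hypothetical, and the constants are derived from the format's NaN encoding rather than copied from the PyTorch kernel.

```python
# Masks for 8 byte lanes packed into one 64-bit word.
LOW7 = 0x7F7F7F7F7F7F7F7F  # clears the sign bit (bit 7) of every lane
ONES = 0x0101010101010101  # adds 1 to every lane
HIGH = 0x8080808080808080  # selects bit 7 of every lane


def has_nan_e4m3fn_x8(word: int) -> bool:
    """True if any of the 8 float8_e4m3fn bytes packed in `word` is NaN.

    A lane is NaN iff its low 7 bits are 0x7F. After masking, every lane is at
    most 0x7F, so adding 1 per lane never carries into the next lane, and bit 7
    of a lane becomes set only when that lane was exactly 0x7F.
    """
    return (((word & LOW7) + ONES) & HIGH) != 0


def buffer_has_nan_e4m3fn(buf: bytes) -> bool:
    """Scan a raw float8_e4m3fn buffer 8 bytes at a time, then the tail."""
    n_full = len(buf) - len(buf) % 8
    for i in range(0, n_full, 8):
        # Byte order does not matter: we only ask whether *any* lane is NaN.
        if has_nan_e4m3fn_x8(int.from_bytes(buf[i:i + 8], "little")):
            return True
    # Fewer than 8 bytes remain; check them one at a time.
    return any((b & 0x7F) == 0x7F for b in buf[n_full:])


print(buffer_has_nan_e4m3fn(bytes([0x00, 0x7E, 0xFE] * 3)))  # False
print(buffer_has_nan_e4m3fn(bytes(9) + b"\xff"))              # True
```

The tail loop covers buffers whose length is not a multiple of 8; a real kernel would also have to handle memory alignment, which this sketch ignores.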

Add support for Float8_e5m2, following a similar algorithm to the one used for Float8_e4m3fn (i.e., an overflow check). Made `HasNanFP8x8` a template so that it can be extended per dtype.
Pull Request resolved: #136115
Approved by: https://github.com/Skylion007
ghstack dependencies: #135891, #135961
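Extending the packed check to `float8_e5m2`, with the dtype-specific part kept separate in the spirit of templating `HasNanFP8x8`, can be sketched in Python by reducing each dtype to one per-lane addend. This is a hypothetical analogue: `has_nan_fp8x8` and `NAN_BIAS` are illustrative names, and the addends are derived from each format's NaN encodings, not taken from the PyTorch source.

```python
LOW7 = 0x7F7F7F7F7F7F7F7F   # clears the sign bit of every byte lane
HIGH = 0x8080808080808080   # bit 7 of every byte lane
LANES = 0x0101010101010101  # multiplying a byte by this copies it into all 8 lanes

# Hypothetical per-dtype addend: the value that, added to a lane's low 7 bits,
# pushes exactly the NaN encodings past 0x7F.
#   float8_e4m3fn: NaN iff low 7 bits == 0x7F       -> add 0x01
#   float8_e5m2:   NaN iff low 7 bits in 0x7D..0x7F -> add 0x03
#                  (Inf is 0x7C; 0x7C + 0x03 = 0x7F, so Inf is not flagged)
NAN_BIAS = {"float8_e4m3fn": 0x01, "float8_e5m2": 0x03}


def has_nan_fp8x8(word: int, dtype: str) -> bool:
    """True if any of the 8 FP8 bytes packed in `word` is NaN for `dtype`.

    Lanes stay independent: a masked lane is at most 0x7F and the addend is at
    most 0x03, so no lane exceeds 0x82 or carries into its neighbor.
    """
    bias = NAN_BIAS[dtype] * LANES
    return (((word & LOW7) + bias) & HIGH) != 0


# 0x7C is +Inf in e5m2 and finite in e4m3fn; 0x7D is NaN only in e5m2.
print(has_nan_fp8x8(0x7C, "float8_e5m2"))    # False
print(has_nan_fp8x8(0x7D, "float8_e5m2"))    # True
print(has_nan_fp8x8(0x7D, "float8_e4m3fn"))  # False
```

Masking the sign bit first makes the test sign-agnostic, and keeping every lane at or below 0x82 is what lets a single 64-bit addition test all eight values without cross-lane carries.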

Adding support for `torch.float8_e4m3fn` and `torch.float8_e5m2`
Pull Request resolved: pytorch#135891
Approved by: https://github.com/wconstab

ghstack-source-id: a9fb66a Pull Request resolved: pytorch/pytorch#135891

[Distributed] add FP8 support to NaN checker

af1e029

[ghstack-poisoned]

pytorch-bot bot added the oncall: distributed and release notes: distributed (c10d) labels Sep 12, 2024

kwen2501 added a commit that referenced this pull request Sep 12, 2024

[Distributed] add FP8 support to NaN checker

d9532a3

ghstack-source-id: 8054946 Pull Request resolved: #135891

kwen2501 requested review from shuqiangzhang and wconstab September 12, 2024 20:33

wconstab approved these changes Sep 12, 2024

Update on "[Distributed] add FP8 support to NaN checker"

27e3426

Adding support for `torch.float8_e4m3fn` and `torch.float8_e5m2` cc XilunWu H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o [ghstack-poisoned]

kwen2501 mentioned this pull request Sep 13, 2024

[Distributed] add pack-check method for float8_e4m3fn #135961

Closed

pytorch-bot bot added the ciflow/trunk label Sep 13, 2024

pytorchmergebot added the merging label Sep 13, 2024

pytorchmergebot added the Merged label Sep 13, 2024

pytorchmergebot closed this in 31007cf Sep 13, 2024

pytorchmergebot removed the merging label Sep 13, 2024

kwen2501 mentioned this pull request Sep 15, 2024

[Distributed] add pack-check method for float8_e5m2 #136115

Closed

github-actions bot deleted the gh/kwen2501/59/head branch October 14, 2024 06:24

KnAwnime pushed a commit to KnAwnime/Biblioteka that referenced this pull request Oct 16, 2024

[Distributed] add FP8 support to NaN checker

00e0491

ghstack-source-id: a9fb66a Pull Request resolved: pytorch/pytorch#135891

4 participants
