Add support for float8_e4m3fnuz and _e5m2fnuz by galexite · Pull Request #107586 · pytorch/pytorch · GitHub

Conversation

@galexite
Copy link
Collaborator

@galexite galexite commented Aug 21, 2023

This PR implements the feature described in this feature submission. It is based on #104242, which adds similar float8 types.

The new types added in this PR are described in the paper at https://arxiv.org/abs/2206.02915. A brief description of the types, and a comparison with other float8 types, can also be found in the OpenXLA RFC.
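For quick orientation, here is a sketch of the two formats' key constants as described in the paper; the constant names below are illustrative, not identifiers from this PR.

```cpp
#include <cstdint>

// Key properties of the two "fnuz" formats, per https://arxiv.org/abs/2206.02915.
// "fnuz" = finite, no negative zero: there are no infinities, no -0, and a
// single NaN whose encoding is sign bit set with all other bits zero (0x80).

// float8_e4m3fnuz: 1 sign bit, 4 exponent bits, 3 mantissa bits, bias 8.
constexpr int kE4M3FNUZBias = 8;
constexpr float kE4M3FNUZMax = 240.0f;    // 0x7F = 1.875 * 2^7
constexpr uint8_t kFNUZNaN = 0x80;        // the only NaN bit pattern

// float8_e5m2fnuz: 1 sign bit, 5 exponent bits, 2 mantissa bits, bias 16.
constexpr int kE5M2FNUZBias = 16;
constexpr float kE5M2FNUZMax = 57344.0f;  // 0x7F = 1.75 * 2^15
```

The exponent biases are one larger than in the e4m3fn and e5m2 formats (8 vs 7, 16 vs 15), which, together with repurposing the negative-zero bit pattern as the sole NaN, is what distinguishes the fnuz variants.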

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @EikanWang @albanD

@pytorch-bot
Copy link

pytorch-bot bot commented Aug 21, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107586

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 89852c6 with merge base 79e3833:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: linalg_frontend label Aug 21, 2023
@github-actions github-actions bot added the module: cpu, NNC, and release notes: quantization labels Aug 21, 2023
@galexite galexite force-pushed the georgew/upstream_fp8 branch from 49b0849 to 34871d7 Compare August 21, 2023 13:12
@lezcano lezcano removed their request for review August 21, 2023 14:17
@IvanYashchuk IvanYashchuk removed their request for review August 21, 2023 14:45
@galexite galexite force-pushed the georgew/upstream_fp8 branch 2 times, most recently from cc7fd13 to ed84796 Compare August 22, 2023 10:27
@galexite
Copy link
Collaborator Author

@albanD @seemethere – would it be possible to get an exception to the 2000 LOC limit? I'm only 6 lines over, and quite a lot of it is just registering the types.

@galexite galexite force-pushed the georgew/upstream_fp8 branch from ed84796 to f889d6d Compare August 22, 2023 12:37
@galexite
Copy link
Collaborator Author

galexite commented Aug 22, 2023

I've removed my changes to add these and existing types to torch.finfo to fit inside the LOC limit – I'll commit them as a separate PR.

@nikitaved nikitaved removed their request for review August 22, 2023 13:47
@cpuhrsch cpuhrsch added the triaged label Aug 23, 2023
@vkuzo
Copy link
Contributor

vkuzo commented Aug 24, 2023

Thanks for working on this! At a high level this looks good, and we will need to do a more detailed review. Given that the branch cut for v2.1 is only a few days away, the high LOC of this change, and our experience landing previous "new dtype" PRs, I would expect PyTorch v2.2 to be a reasonable target for eventually getting this in.

@vkuzo vkuzo requested review from malfet and vkuzo August 24, 2023 18:43
Copy link
Contributor


This looks reasonable. Just curious: what made you choose a LUT over bit shifting?

Also, do we expect the hardware to support an accelerated version of these?

Copy link
Collaborator Author

There's no particular reason to use a LUT here; I can change to bit shifting if needed!

Yep, Graphcore's C600 hardware has dedicated instructions which can be used to convert to and from both of these FP8 types.
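For comparison, a shift-based decode might look roughly like the sketch below. This is illustrative only, not the PR's code, and the function name is made up:

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Sketch of a branchy, shift-based float8_e4m3fnuz -> float decode,
// as an alternative to a 256-entry LUT.
inline float e4m3fnuz_to_float(uint8_t bits) {
  if (bits == 0x80) {
    // The single NaN encoding: sign bit set, all other bits zero.
    return std::numeric_limits<float>::quiet_NaN();
  }
  const float sign = (bits & 0x80) ? -1.0f : 1.0f;
  const int exp_field = (bits >> 3) & 0xF;  // 4 exponent bits, bias 8
  const int mantissa = bits & 0x7;          // 3 mantissa bits
  if (exp_field == 0) {
    // Subnormal: (mantissa / 8) * 2^(1 - 8) == mantissa * 2^-10.
    return sign * std::ldexp(static_cast<float>(mantissa), -10);
  }
  // Normal: (1 + mantissa / 8) * 2^(exp_field - 8).
  return sign * std::ldexp(8.0f + static_cast<float>(mantissa), exp_field - 11);
}
```

A production version would typically assemble the float's bit pattern directly rather than branching, but the arithmetic is the same.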

Copy link
Contributor

Did we run any perf benchmarks, or consider the binary size impact, of having this lookup table embedded over and over in every op that needs to convert F8E4M3FNUZ to float?

Copy link
Collaborator Author

I will move the LUT into the .cpp file.
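Something along these lines (a sketch; the names are hypothetical) gives every translation unit a single shared copy of the table instead of embedding it wherever the header is included:

```cpp
// In the header (declaration only):
extern const float e4m3fnuz_to_float_lut[256];

// In the .cpp file (the one and only definition):
const float e4m3fnuz_to_float_lut[256] = {
    0.0f,           // 0x00: +0 (there is no -0 in the fnuz formats)
    0.0009765625f,  // 0x01: smallest subnormal, 2^-10
    // ... remaining entries ...
};
```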

@vkuzo
Copy link
Contributor

vkuzo commented Sep 18, 2023

The dtype pieces look reasonable to me! Wondering if you could share some context on which hardware supports these float8 flavors now, and which hardware is expected to support them in the future?

cc @malfet, would you be up for a more detailed review of the framework pieces and for figuring out the best way to land this?

@galexite
Copy link
Collaborator Author

galexite commented Sep 19, 2023

Thanks for the review @vkuzo! I'll make the change to pull that list of dtypes for the test parameters out into a constant.

The dtype pieces look reasonable to me! Wondering if you could share some context on which hardware supports these float8 flavors now, and which hardware is expected to support them in the future?

Graphcore's current C600 card supports these types at the hardware level. It has instructions in the Tile ISA to perform common operations directly on FP8 data, as well as to convert between types.

@galexite galexite force-pushed the georgew/upstream_fp8 branch 2 times, most recently from 1dd6bd9 to cac8d42 Compare September 21, 2023 09:19
@galexite
Copy link
Collaborator Author

I've rebased and added back my TypeInfo changes, as I see that TypeInfo support has been added for the other FP8 types too. This might push the PR over the LOC limit again, though, if the number of lines removed is also included in that count.


```cpp
if (f_bits >= fnuz_max) {
  // NaN -- sign bit set to 1, rest 0s
  return 0x80;
}
```
Copy link

@umangyadav umangyadav Nov 14, 2023

The cast table here clips float values greater than FNUZ_MAX to FLT_MAX:
https://onnx.ai/onnx/technical/float8.html#cast

Is there a reason behind using NaNs instead?

Copy link
Collaborator Author

The reason is that the existing casting code for the e5m2 and e4m3fn types is also implemented without any saturation, so I chose to do the same to match that behaviour.

I fear it would lead to having more NaNs when doing inference. Ideally, there should be a flag for users to set which behaviour they want.
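For illustration, such a flag might look like the sketch below (hypothetical, not code from this PR), choosing between the PR's non-saturating behaviour and ONNX-style saturation:

```cpp
#include <cstdint>

// Hypothetical overflow handling for e4m3fnuz encoding. With saturate=false
// this matches the PR (out-of-range values become NaN, like the existing
// e4m3fn/e5m2 casts); with saturate=true it clamps to +/-240, the largest
// finite e4m3fnuz magnitude, as the ONNX cast tables do.
inline uint8_t encode_e4m3fnuz_overflow(bool negative, bool saturate) {
  if (saturate) {
    return negative ? 0xFF : 0x7F;  // -240.0 : +240.0
  }
  return 0x80;  // the single NaN encoding
}
```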

Copy link
Contributor

@alugorey alugorey left a comment

A couple of comments to help with your ROCm CI failure. Hope this helps! :)

@malfet
Copy link
Contributor

malfet commented Nov 15, 2023

@pytorchbot merge -i

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged while ignoring the following check: pull / linux-focal-cuda11.8-py3.10-gcc9 / test (distributed, 1, 3, linux.8xlarge.nvidia.gpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@galexite galexite deleted the georgew/upstream_fp8 branch November 15, 2023 15:29
jeffdaily added a commit to ROCm/pytorch that referenced this pull request Dec 6, 2023
jeffdaily added a commit to ROCm/pytorch that referenced this pull request Dec 20, 2023
pytorchmergebot pushed a commit to ROCm/pytorch that referenced this pull request Dec 21, 2023
pytorchmergebot pushed a commit to ROCm/pytorch that referenced this pull request Jan 4, 2024
jeffdaily added a commit to ROCm/pytorch that referenced this pull request Jan 10, 2024
pytorchmergebot pushed a commit to ROCm/pytorch that referenced this pull request Jan 18, 2024
pytorchmergebot pushed a commit that referenced this pull request Jan 19, 2024
jeffdaily added a commit to ROCm/pytorch that referenced this pull request Jan 22, 2024

Labels

ciflow/trunk - Trigger trunk jobs on your pull request
Merged
module: cpu - CPU specific problem (e.g., perf, algorithm)
NNC
open source
release notes: linalg_frontend - release notes category
release notes: quantization - release notes category
skip-pr-sanity-checks
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
