ln + amax + fp8 quant inductor enablement #109301
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/109301
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 0c1bfbc with merge base 59592ce. This comment was automatically generated by Dr. CI and updates every 15 minutes.
test/inductor/test_fp8.py
```python
# Utility functions are copied from
# https://github.com/pytorch-labs/float8_playground/blob/main/float8_playground/float8_utils.py.
```
nit: maybe we can remove the link, since this repo isn't public yet? I think it's ok to not cite it in the landed version
```python
def ln_fp8(x: Tensor, scale: float, amax_buffer: Tensor):
    x = torch.nn.functional.layer_norm(x, [hidden_size], weight=None, bias=None, eps=1e-05)
    amax_buffer.fill_(torch.max(torch.abs(x)))
```
if it helps, this function can be moved to after the pointwise stuff in user code. The way we originally wrote this code isn't super intuitive
I feel it doesn't matter. This requires a reduction kernel anyways.
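For context, here is a minimal, self-contained sketch of the layer_norm → amax → fp8-quant pattern under discussion. It is not the exact test code from this PR: the value of `hidden_size`, the scale value, the use of `float8_e5m2`, and the final scale-and-cast lines are assumptions, and it requires a CUDA device with fp8 support in the installed PyTorch build.

```python
import torch
from torch import Tensor

hidden_size = 4096  # assumed size of the normalized dimension

def ln_fp8(x: Tensor, scale: float, amax_buffer: Tensor) -> Tensor:
    # layer norm over the last dimension, no affine parameters
    x = torch.nn.functional.layer_norm(
        x, [hidden_size], weight=None, bias=None, eps=1e-05
    )
    # record the absolute max for delayed scaling -- this is the reduction
    amax_buffer.fill_(torch.max(torch.abs(x)))
    # scale and cast to fp8 -- the pointwise part (assumed dtype: e5m2)
    return (x * scale).to(torch.float8_e5m2)

compiled_ln_fp8 = torch.compile(ln_fp8)
x = torch.randn(16, hidden_size, device="cuda", dtype=torch.float16)
amax_buffer = torch.zeros((), device="cuda", dtype=torch.float16)
out = compiled_ln_fp8(x, 1.0, amax_buffer)
```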
```python
    and len(self.numels) == 2
    and self.numels[-1] >= 256
)
# self.no_x_dim = (
```
Note: the no_x_dim-related logic needs to be removed so that XBLOCK is configurable.
However, according to @jansel there could be perf regressions if XBLOCK is added.
torch/_inductor/codegen/triton.py
```diff
     return False
 threshold = {
-    ReductionHint.INNER: 1024,
+    ReductionHint.INNER: 256,
```
Note: the threshold for persistent kernels needs to be decreased so that, for cases like max(x).to(fp8):
- the XBLOCK size fulfills the fp8 min_element_per_thread requirement. For example, for fp8_e5m2 the min_element_per_thread is 4, so XBLOCK size = 4 * NUM_WARPS (which is RBLOCK size / 128 by default) * 32 (warp_size);
- XBLOCK * RBLOCK < 131072, which is Triton's maximum tensor numel.
So XBLOCK * RBLOCK = RBLOCK_SIZE * RBLOCK_SIZE < 131072, and the max RBLOCK_SIZE is 256.
There are lots of things that could potentially be tuned, e.g.:
- We may only update this rule when an fp8 conversion is followed by a reduction. This would need some code refactoring, as there seems to be no way to get this information without actually running ops.to_dtype().
- We could also reduce NUM_WARPS. However, this would reduce parallelism, which doesn't seem ideal.
- For a normal reduction kernel (i.e. not a persistent reduction kernel), we may also want to decrease the split-reduction threshold so that each block handles a smaller number of elements.
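As a sanity check on the numbers in the comment above, a small back-of-the-envelope script is sketched here; the constants are the assumed values quoted in the comment, not read from the inductor source.

```python
# Assumed constants, taken from the review comment above.
TRITON_MAX_TENSOR_NUMEL = 131072  # Triton's maximum tensor numel
WARP_SIZE = 32
MIN_ELEM_PER_THREAD = 4           # e.g. fp8_e5m2

def xblock_for(rblock: int) -> int:
    num_warps = rblock // 128     # assumed default: RBLOCK / 128
    return MIN_ELEM_PER_THREAD * num_warps * WARP_SIZE

for rblock in (128, 256, 512, 1024):
    xblock = xblock_for(rblock)
    fits = xblock * rblock < TRITON_MAX_TENSOR_NUMEL
    print(f"RBLOCK={rblock:5d} XBLOCK={xblock:5d} numel={xblock * rblock:7d} fits={fits}")

# Under these assumptions XBLOCK == RBLOCK, so XBLOCK * RBLOCK = RBLOCK**2,
# and the largest power-of-two RBLOCK with RBLOCK**2 < 131072 is 256.
```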
Hi @jansel @eellison @Chillee please help review this PR, thanks!
```python
self.min_elem_per_thread_reduction_block = 0
self.min_elem_per_thread_non_reduction_block = 0
```
Perhaps make this a list with the same length as len(groups)?
Today we have XBLOCK, YBLOCK, RBLOCK, and a (disabled by config) ZBLOCK.
You might also want to test this on a pointwise kernel that gets 2D tiling. You can trigger that codepath with a pointwise kernel on transposed inputs.
In the future we may have tiled reduction kernels as well.
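A minimal sketch of the kind of test case suggested here: a pointwise op that mixes a contiguous tensor with a transposed one, which is the usual way to get inductor to pick a 2D-tiled pointwise kernel. The device and sizes are assumptions, not taken from this PR.

```python
import torch

def pointwise_transposed(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # a is contiguous, b.t() is not: the two iteration orders disagree,
    # which typically makes inductor emit a 2D-tiled pointwise kernel.
    return a + b.t()

compiled = torch.compile(pointwise_transposed)
a = torch.randn(1024, 2048, device="cuda")
b = torch.randn(2048, 1024, device="cuda")
out = compiled(a, b)
```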
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov