[Inductor] fix broadcast logic for Triton #141027

nandesuka · 2024-11-19T15:40:57Z

Summary: Fix the logic for inserting a broadcast in kernels where a load goes directly to a store. In that case, we now insert a tl.broadcast on the store regardless of the block size of the load. When no broadcast is required, the downstream Triton compiler is expected to remove the no-op broadcast instruction.

Test Plan: Added tests under test_torchinductor_strided_blocks.py:test_expand_broadcast in OSS and internal test cases.

Reviewed By: blaine-rister

Differential Revision: D65518033
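
To illustrate the rule in the Summary above, here is a minimal sketch in plain Python, not the actual Inductor code. The helper `emit_store` and its parameters are hypothetical names invented for this example. The Summary says `tl.broadcast`; the sketch assumes the shape-taking Triton call `tl.broadcast_to`, and it omits masks and other store arguments.

```python
def emit_store(ptr: str, value: str, block_shape: list[str], direct_from_load: bool) -> str:
    """Return one line of Triton-like source for a block store.

    Hypothetical helper: a simplified stand-in for the codegen decision
    described in the Summary, not Inductor's real implementation.
    """
    if direct_from_load:
        # Always wrap the value, regardless of the load's block size.
        # If the loaded block already has the store's shape, this is a
        # no-op that the Triton compiler is expected to remove.
        shape = ", ".join(block_shape)
        value = f"tl.broadcast_to({value}, [{shape}])"
    return f"tl.store({ptr}, {value})"


# A load feeding a store directly gets the broadcast unconditionally.
print(emit_store("out_ptr0", "tmp0", ["XBLOCK", "YBLOCK"], direct_from_load=True))
# prints: tl.store(out_ptr0, tl.broadcast_to(tmp0, [XBLOCK, YBLOCK]))
```

The point of the sketch is that the wrapper no longer depends on the load's block size: correctness comes from always emitting the broadcast, and removing the redundant case is left to the Triton compiler.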

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

pytorch-bot · 2024-11-19T15:41:01Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141027

❌ 2 New Failures, 3 Unrelated Failures

As of commit b4276da with merge base 9dd3b85:

NEW FAILURES - The following jobs have failed:

inductor-rocm / rocm6.2-py3.10-inductor / test (inductor, 1, 2, linux.rocm.gpu.2) (gh)
##[error]Credentials could not be loaded, please check your action inputs: Could not load credentials from any providers
inductor-rocm / rocm6.2-py3.10-inductor / test (inductor, 2, 2, linux.rocm.gpu.2) (gh)
##[error]Credentials could not be loaded, please check your action inputs: Could not load credentials from any providers

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_timm, 1, 2, linux.g5.4xlarge.nvidia.gpu) (gh) (similar failure)
convnext_base
pull / linux-jammy-py3.10-clang15-asan / test (default, 1, 6, lf.linux.4xlarge) (gh) (detected as infra flaky with no log or failing log classifier)

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

inductor / cuda12.4-py3.10-gcc9-sm86 / test (inductor_timm, 1, 2, linux.g5.4xlarge.nvidia.gpu, unstable) (gh) (#141498)
convnext_base

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2024-11-19T15:41:01Z

The committers listed above are authorized under a signed CLA.

✅ login: nandesuka / name: nanzha (b4276da)

facebook-github-bot · 2024-11-19T15:41:05Z

This pull request was exported from Phabricator. Differential Revision: D65518033

nandesuka · 2024-11-20T04:14:32Z

@pytorchbot label "topic: not user facing"

blaine-rister

LGTM, thanks for the fix. I left a comment about one of the tests being changed. Also it looks like the expected code for one of the CI tests needs to be updated. Once the CI is green, this looks ready to go :)

blaine-rister · 2024-11-20T16:33:19Z

test/inductor/test_torchinductor_strided_blocks.py

+            y
+        )
+
+    @parametrize("prefer_nd_tiling", [True])


Seems like this change was accidental? This used to be

@parametrize("prefer_nd_tiling", [False, True])

nandesuka · 2024-11-20

Yup, good catch. Will update.

nandesuka · 2024-11-26T19:35:17Z

@pytorchbot rebase

pytorchmergebot · 2024-11-26T19:36:54Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict.

pytorchmergebot · 2024-11-26T19:36:57Z

Successfully rebased export-D65518033 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout export-D65518033 && git pull --rebase)

Summary: Fix logic for inserting broadcast on kernel with load going directly to store. In the case where load is going directly to store, we insert a tl.broadcast on the store, regardless of the block size on the load. In the case where a broadcast is not required, the downstream Triton compiler is expected to remove this no-op broadcast instruction.
Test Plan: Added tests under test_torchinductor_strided_blocks.py:test_expand_broadcast in OSS and internal test cases.
Reviewed By: blaine-rister
Differential Revision: D65518033
Pull Request resolved: #141693
Approved by: https://github.com/blaine-rister

pytorch-bot bot added ciflow/inductor module: inductor labels Nov 19, 2024

facebook-github-bot added the fb-exported label Nov 19, 2024

nandesuka requested review from blaine-rister and eellison November 19, 2024 16:13

nandesuka force-pushed the export-D65518033 branch from a629d1d to d7e2280 Compare November 19, 2024 19:11

nandesuka force-pushed the export-D65518033 branch from d7e2280 to b5b9898 Compare November 20, 2024 02:40

pytorch-bot bot added the topic: not user facing topic category label Nov 20, 2024

blaine-rister approved these changes Nov 20, 2024

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 20, 2024

nandesuka force-pushed the export-D65518033 branch from b5b9898 to 0f2a82d Compare November 20, 2024 16:42

nandesuka force-pushed the export-D65518033 branch 2 times, most recently from 79d197f to 929d49a Compare November 25, 2024 20:20

nandesuka force-pushed the export-D65518033 branch from 929d49a to b1991d3 Compare November 26, 2024 02:13

nandesuka force-pushed the export-D65518033 branch from b1991d3 to 124e2b7 Compare November 26, 2024 02:15

pytorchmergebot force-pushed the export-D65518033 branch from 124e2b7 to b4276da Compare November 26, 2024 19:36

nandesuka closed this Nov 27, 2024

nandesuka deleted the export-D65518033 branch November 27, 2024 14:51
