Add option to tweak inductor stride settings for user-defined triton kernels #135530

zou3519 · 2024-09-09T20:29:40Z

Stack from ghstack (oldest at bottom):

Flip triton kernel default layout constraint to "needs_fixed_stride_order" #135581
-> Add option to tweak inductor stride settings for user-defined triton kernels #135530

Previously, Inductor was allowed to modify the stride/storage_offset
(layout) for inputs to user-defined triton kernels. This can cause
silent incorrectness because most triton kernels are written for a
specific striding pattern (usually contiguous).
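To make the failure mode concrete, here is a pure-Python simulation. No Triton is involved, and the function names are hypothetical. A "kernel" that hard-codes row-major contiguous indexing reads the right storage only when the input really is contiguous. If the same logical tensor arrives in a column-major layout, it returns wrong numbers and raises no error.

```python
def naive_row_sums(storage, nrows, ncols):
    # Assumes a row-major contiguous layout: element (i, j) lives at i * ncols + j.
    return [sum(storage[i * ncols + j] for j in range(ncols)) for i in range(nrows)]


def strided_row_sums(storage, nrows, ncols, strides):
    # Honors the actual layout: element (i, j) lives at i * strides[0] + j * strides[1].
    s0, s1 = strides
    return [sum(storage[i * s0 + j * s1] for j in range(ncols)) for i in range(nrows)]


# The same logical 2x3 tensor [[1, 2, 3], [4, 5, 6]] stored two ways.
row_major = [1, 2, 3, 4, 5, 6]  # strides (3, 1), contiguous
col_major = [1, 4, 2, 5, 3, 6]  # strides (1, 2), column-major

print(naive_row_sums(row_major, 2, 3))            # [6, 15]  correct
print(naive_row_sums(col_major, 2, 3))            # [7, 14]  silently wrong
print(strided_row_sums(col_major, 2, 3, (1, 2)))  # [6, 15]  correct
```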

This PR adds a config option (triton_kernel_default_layout_constraint)
that lets the user choose Inductor's behavior here. The options are:

- "flexible_layout" (default): Inductor may change the layout of inputs
  to user-defined triton kernels as much as it wants.
- "needs_fixed_stride_order": Inductor must preserve the stride order
  (compared to tracing) of inputs to user-defined triton kernels.

This matches our handling for custom operators. In the future, we'll
want a "needs_exact_strides" option (this is the safest option).
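A minimal sketch of what each option permits, written as plain Python over stride tuples. The helpers stride_order and layout_allowed are hypothetical and are not Inductor's implementation. The sketch ignores storage_offset and ties from size-1 dimensions. "needs_exact_strides" is the future option mentioned above, not something this PR adds.

```python
def stride_order(strides):
    # Dimensions listed from innermost (smallest stride) to outermost.
    return sorted(range(len(strides)), key=lambda d: strides[d])


def layout_allowed(constraint, traced_strides, new_strides):
    """Return True if new_strides is acceptable under constraint.

    Hypothetical helper: compares against the strides seen during tracing.
    """
    if constraint == "flexible_layout":
        # Any layout is allowed; the kernel must cope with whatever it gets.
        return True
    if constraint == "needs_fixed_stride_order":
        # Strides may change (e.g. padding) but their relative order may not.
        return stride_order(traced_strides) == stride_order(new_strides)
    if constraint == "needs_exact_strides":
        # Future, strictest option: strides must match exactly.
        return tuple(traced_strides) == tuple(new_strides)
    raise ValueError(f"unknown layout constraint: {constraint!r}")


# A 3x4 input that was contiguous when traced, plus two alternative layouts.
traced = (4, 1)
padded = (8, 1)       # rows padded to 8 elements; same stride order
transposed = (1, 3)   # column-major; stride order flipped

for c in ("flexible_layout", "needs_fixed_stride_order", "needs_exact_strides"):
    print(c, layout_allowed(c, traced, padded), layout_allowed(c, traced, transposed))
# flexible_layout True True
# needs_fixed_stride_order True False
# needs_exact_strides False False
```

In this sketch the padded layout keeps the traced stride order, so "needs_fixed_stride_order" accepts it while rejecting the transposed one. Only the exact-strides check also rules out the padding.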

Test Plan:

new test

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

pytorch-bot · 2024-09-09T20:29:43Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135530

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 448c377 with merge base 5a9ac83:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 2, 5, linux.g5.4xlarge.nvidia.gpu) (gh) (disabled by #135102)
dynamo/test_modes.py::TorchFunctionModeTests::test_torch_function_mode_guards_ignored_types_py

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…kernels Previously, Inductor was allowed to modify the stride/storage_offset (layout) for inputs to user-defined triton kernels. This can cause silent incorrectness because most triton kernels are written for a specific striding pattern (usually contiguous). This PR adds a config to allow the user to choose Inductor's behavior on this. The options are: - "flexible_layout" (default): Inductor can modify the layout for inputs to user-defined triton kernels as much as it wants. - "needs_fixed_stride_order": Inductor must preserve the stride order (when compared to tracing) for inputs to user-defined triton kernels. This matches our handling for custom operators. In the future, we'll want a "needs_exact_strides" option (this is the safest option). Test Plan: - new test ghstack-source-id: a67d19e Pull Request resolved: #135530

FindHao

LGTM

FindHao · 2024-09-09T22:13:03Z

torch/_inductor/graph.py

+                debug("user_defined_triton_kernel_layout_constraints")
+                if (
+                    config.triton_kernel_default_layout_constraint
+                    == "needs_fixed_stride_order"


Will the strings needs_fixed_stride_order and flexible_layout be used in other places? If so, can we avoid hardcoding them?

I don't think so; I'll turn it into an enum if I end up needing it in more places. I wasn't sure if enums were valid Inductor config options (there's a limited set of types it accepts).
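If the option strings did need to be shared, one hypothetical shape is a str-backed Enum, sketched below. The class and function names are made up. Whether Inductor's config system accepts such a type is the open question above, and the PR itself keeps plain strings. Because the members subclass str, they still compare equal to the literal strings, so a comparison like the one in graph.py would keep working.

```python
from enum import Enum


class TritonKernelLayoutConstraint(str, Enum):
    """Hypothetical enum for the layout-constraint config values."""

    FLEXIBLE_LAYOUT = "flexible_layout"
    NEEDS_FIXED_STRIDE_ORDER = "needs_fixed_stride_order"


def parse_constraint(value: str) -> TritonKernelLayoutConstraint:
    # Lookup by value raises ValueError on a typo such as "needs_fixed_stride",
    # instead of silently matching none of the hardcoded string comparisons.
    return TritonKernelLayoutConstraint(value)


default = parse_constraint("flexible_layout")
```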

…kernels Previously, Inductor was allowed to modify the stride/storage_offset (layout) for inputs to user-defined triton kernels. This can cause silent incorrectness because most triton kernels are written for a specific striding pattern (usually contiguous). This PR adds a config to allow the user to choose Inductor's behavior on this. The options are: - "flexible_layout" (default): Inductor can modify the layout for inputs to user-defined triton kernels as much as it wants. - "needs_fixed_stride_order": Inductor must preserve the stride order (when compared to tracing) for inputs to user-defined triton kernels. This matches our handling for custom operators. In the future, we'll want a "needs_exact_strides" option (this is the safest option). Test Plan: - new test ghstack-source-id: 7c7bdd2 Pull Request resolved: #135530

eellison

Do we still want stride order as the api now that we support matching exact strides ?

zou3519 · 2024-09-10T20:00:49Z

Do we still want stride order as the api now that we support matching exact strides ?

We haven't put the exact strides logic in yet. The API should be {flexible_layout, match_stride_order, match_exact_strides}, and the ultimate default we want is match_exact_strides. We're moving there, but "match_stride_order" is a good middle ground.

zou3519 · 2024-09-10T20:01:37Z

@pytorchbot merge

pytorchmergebot · 2024-09-10T20:03:20Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

…ntroduced in #135530 [ghstack-poisoned]

…ntroduced in #135530 ghstack-source-id: 491759e Pull Request resolved: #135656

…rder" (#135581) This is to match the default layout constraint for custom operators. By default, Inductor should match the stride order of inputs to a triton kernel. Test Plan: - existing tests Pull Request resolved: #135581 Approved by: https://github.com/eellison ghstack dependencies: #135530

…ntroduced in #135530 ghstack-source-id: 7d0e14c Pull Request resolved: #135656

…ntroduced in #135530 (#135656) Pull Request resolved: #135656 Approved by: https://github.com/EikanWang, https://github.com/zou3519

…kernels (pytorch#135530) Previously, Inductor was allowed to modify the stride/storage_offset (layout) for inputs to user-defined triton kernels. This can cause silent incorrectness because most triton kernels are written for a specific striding pattern (usually contiguous). This PR adds a config to allow the user to choose Inductor's behavior on this. The options are: - "flexible_layout" (default): Inductor can modify the layout for inputs to user-defined triton kernels as much as it wants. - "needs_fixed_stride_order": Inductor must preserve the stride order (when compared to tracing) for inputs to user-defined triton kernels. This matches our handling for custom operators. In the future, we'll want a "needs_exact_strides" option (this is the safest option). Test Plan: - new test Pull Request resolved: pytorch#135530 Approved by: https://github.com/FindHao, https://github.com/oulgen

…rder" (pytorch#135581) This is to match the default layout constraint for custom operators. By default, Inductor should match the stride order of inputs to a triton kernel. Test Plan: - existing tests Pull Request resolved: pytorch#135581 Approved by: https://github.com/eellison ghstack dependencies: pytorch#135530

…ntroduced in pytorch#135530 (pytorch#135656) Pull Request resolved: pytorch#135656 Approved by: https://github.com/EikanWang, https://github.com/zou3519

pytorch-bot bot added the ciflow/inductor and module: inductor labels Sep 9, 2024

zou3519 added the keep-going (Don't stop on first failure, keep running tests until the end), ci-no-td (Do not run TD on this PR), and release notes: inductor labels Sep 9, 2024

zou3519 requested review from FindHao, eellison and oulgen September 9, 2024 21:38

FindHao approved these changes Sep 9, 2024

oulgen approved these changes Sep 9, 2024

zou3519 mentioned this pull request Sep 10, 2024

Flip triton kernel default layout constraint to "needs_fixed_stride_order" #135581

Closed

eellison reviewed Sep 10, 2024

pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Sep 10, 2024

pytorchmergebot added the merging label Sep 10, 2024

pytorchmergebot added the Merged label Sep 11, 2024

pytorchmergebot closed this in 29408ea Sep 11, 2024

pytorchmergebot removed the merging label Sep 11, 2024

etaf mentioned this pull request Sep 11, 2024

Update torch-xpu-ops pin (ATen XPU implementation) #135647

Closed

etaf added a commit that referenced this pull request Sep 11, 2024

[Inductor UT] Generalize device-bias code in test_triton_kernels.py i…

f87f55f

…ntroduced in #135530 [ghstack-poisoned]

etaf added a commit that referenced this pull request Sep 11, 2024

[Inductor UT] Generalize device-bias code in test_triton_kernels.py i…

cf7d97a

…ntroduced in #135530 ghstack-source-id: 491759e Pull Request resolved: #135656

alexbaden mentioned this pull request Sep 12, 2024

[Inductor] Fix needs_fixed_stride_order test on XPU #135779

Closed

pytorchmergebot pushed a commit that referenced this pull request Sep 12, 2024

[Inductor UT] Generalize device-bias code in test_triton_kernels.py i…

dc95e50

…ntroduced in #135530 ghstack-source-id: 7d0e14c Pull Request resolved: #135656

github-actions bot deleted the gh/zou3519/1066/head branch October 12, 2024 02:06

5 participants
