[inductor] force strides for efficient attn bwd by shunting314 · Pull Request #138879 · pytorch/pytorch · GitHub

Conversation

@shunting314
Contributor

@shunting314 shunting314 commented Oct 25, 2024

Stack from ghstack (oldest at bottom):

Try to fix #138772.

aten._scaled_dot_product_efficient_attention_backward requires out and gradient_out to have stride order (3, 1, 2, 0). When Inductor layout optimization is enabled, Inductor may change tensor strides as long as they are not user visible. For efficient_attention_backward, Inductor tries to follow the eager strides, but the eager strides it sees for the backward graph may already be the post-optimization ones. There are a few possible fixes:

  1. Change the kernel to allow stride orders other than (3, 1, 2, 0). This is probably hard.
  2. Back out https://github.com/pytorch/pytorch/pull/112045/files and skip layout optimization when the model contains efficient attention.
  3. Force stride order (3, 1, 2, 0) for the relevant tensors.
  4. Pass the original eager layouts to Inductor for the backward graph, and let Inductor follow them for tensors with extra layout requirements.

The PR implements option 3. Option 4 looks more general to me; I think we can do it in the long term.
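For reference, the stride order here follows Inductor's convention: each dimension is ranked by its stride, ascending, so rank 0 marks the fastest-varying dimension. Below is a minimal sketch in plain PyTorch of what stride order (3, 1, 2, 0) means for a (batch, heads, seq, head_dim) attention tensor; the stride_order helper and the shapes are made up for illustration, not Inductor's own code:

```python
import torch

def stride_order(t: torch.Tensor) -> tuple:
    # Rank each dimension by its stride, ascending:
    # rank 0 = smallest stride = fastest-varying dimension.
    ranks = [0] * t.dim()
    for rank, dim in enumerate(sorted(range(t.dim()), key=t.stride)):
        ranks[dim] = rank
    return tuple(ranks)

B, H, S, D = 64, 8, 256, 64  # made-up attention shapes

# A contiguous (B, H, S, D) tensor has stride order (3, 2, 1, 0).
x = torch.empty(B, H, S, D)
assert stride_order(x) == (3, 2, 1, 0)

# Stride order (3, 1, 2, 0) is the layout you get by allocating
# (B, S, H, D) contiguously and transposing the head/seq dims:
# head_dim still varies fastest, but heads now vary faster than
# sequence positions.
y = torch.empty(B, S, H, D).transpose(1, 2)  # shape (B, H, S, D)
assert stride_order(y) == (3, 1, 2, 0)
```

Option 3 amounts to constraining the kernel's inputs to this layout at lowering time, inserting a copy when they do not already match.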

I tried to add a test but failed to repro: https://gist.github.com/shunting314/fe37a246aad269de9ea00199446688f6

Here is the original command to repro the issue:

TORCHINDUCTOR_LAYOUT_OPTIMIZATION=1 PYTORCH_NO_CUDA_MEMORY_CACHING=1 CUDA_LAUNCH_BLOCKING=1 time python benchmark.py --model maxvit_nano_rw_256 --precision bfloat16 --torchcompile --bench train --no-retry -b 64

benchmark.py is https://github.com/huggingface/pytorch-image-models/blob/main/benchmark.py

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@pytorch-bot

pytorch-bot bot commented Oct 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138879

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 15e9cbf with merge base 889717a:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

shunting314 added a commit that referenced this pull request Oct 25, 2024
ghstack-source-id: 95d605d
Pull Request resolved: #138879

shunting314 added a commit that referenced this pull request Oct 25, 2024
ghstack-source-id: 6a46119
Pull Request resolved: #138879
@shunting314 shunting314 added the topic: not user facing label Oct 25, 2024
@shunting314
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Oct 25, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

Contributor

@eellison eellison left a comment


Sorry, late comment, but a test would be nice. I guess we couldn't make a smaller repro…

@shunting314
Contributor Author

> a test would be nice. I guess we couldn't make a smaller repro…

Right, I tried something but could not repro: https://gist.github.com/shunting314/fe37a246aad269de9ea00199446688f6

The fixed scenario arises from a certain interaction between efficient attention and convolution.
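For context, here is a minimal sketch (hypothetical, not a confirmed repro; the module, shapes, and names below are made up) of the kind of conv-plus-attention mix involved: under TORCHINDUCTOR_LAYOUT_OPTIMIZATION=1, Inductor may flip the conv activations to channels-last, and those strides can flow into the tensors whose backward the efficient-attention kernel expects in (3, 1, 2, 0) order:

```python
import torch
import torch.nn.functional as F

class ConvAttn(torch.nn.Module):
    # Hypothetical block mixing convolution and attention, loosely in the
    # spirit of maxvit's conv + attention stages; not the failing model.
    def __init__(self, dim: int = 64, heads: int = 8):
        super().__init__()
        self.conv = torch.nn.Conv2d(dim, dim, 3, padding=1)
        self.heads = heads

    def forward(self, x):                    # x: (B, C, H, W)
        y = self.conv(x)                     # layout opt may make this channels-last
        B, C, H, W = y.shape
        q = y.flatten(2).transpose(1, 2)     # (B, H*W, C)
        q = q.reshape(B, H * W, self.heads, C // self.heads).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, q, q)  # may hit the efficient backend
        return out.transpose(1, 2).reshape(B, H, W, C).permute(0, 3, 1, 2)

if torch.cuda.is_available():
    m = torch.compile(ConvAttn().cuda().bfloat16())
    x = torch.randn(64, 64, 16, 16, device="cuda", dtype=torch.bfloat16)
    m(x).sum().backward()  # backward is where the stride requirement bites
```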

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job

Failing merge rule: Core Maintainers

@shunting314
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job

Failing merge rule: Core Maintainers

@shunting314
Contributor Author

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 2 checks: pull / linux-focal-py3.12-clang10 / test (default, 2, 4, linux.4xlarge), pull / linux-focal-py3.11-clang10 / test (dynamo, 2, 3, linux.2xlarge)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here
