Use different conv layout optimization heuristics for inference #114600
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114600
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (5 unrelated failures) As of commit 70140dc with merge base 56a95af:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk.
BROKEN TRUNK - The following jobs failed but were also failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…rence" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
…rence" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
…rence" While many models regress in training when converted to channels last, in inference the results are quite different. Almost all of the models experienced a speedup when converted to channels last. There were a few big regressions in torchbench - `timm_regnet` from `1.4343 → 1.0573` and `timm_resnet` from `1.7484 → 1.2868`. I used a modified script of the operator benchmarks [here](https://gist.github.com/eellison/e11dc645412f52e8b45fb26ba6f9f6a1) to measure the average speedup of convolutions across all of the input shapes found in torchbench according to the existing classifications that shunting314 used - grouped convs, small channel convs, convolution with larger in-channel than out-channel. Only grouped convolutions benchmarked as a slowdown in inference. I updated the inference heuristic to multiply the flops of each conv with its predicted speedup/slowdown in channels last. With this heuristic the two previously regressing models no longer regress. Speeds up inference for torchbench ~8% and timm ~6%. The motivating model here was SDXL which now hits channels last and improves 10%. There were some models that were sped up in training when forcing channels last (along with a number of regressions), it's possible there is some speedup there to be had. We could also have more granular classification/predictions. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
…rence" While many models regress in training when converted to channels last, in inference the results are quite different. Almost all of the models experienced a speedup when converted to channels last. There were a few big regressions in torchbench - `timm_regnet` from `1.4343 → 1.0573` and `timm_resnet` from `1.7484 → 1.2868`. I used a modified script of the operator benchmarks [here](https://gist.github.com/eellison/e11dc645412f52e8b45fb26ba6f9f6a1) to measure the average speedup of convolutions across all of the input shapes found in torchbench according to the existing classifications that shunting314 used - grouped convs, small channel convs, convolution with larger in-channel than out-channel. Only grouped convolutions benchmarked as a slowdown in inference. I updated the inference heuristic to multiply the flops of each conv with its predicted speedup/slowdown in channels last. With this heuristic the two previously regressing models no longer regress. Speeds up inference for torchbench ~8% and timm ~6%. The motivating model here was SDXL which now hits channels last and improves 10%. There were some models that were sped up in training when forcing channels last (along with a number of regressions), it's possible there is some speedup there to be had. We could also have more granular classification/predictions. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
…rence" While many models regress in training when converted to channels last, in inference the results are quite different. Almost all of the models experienced a speedup when converted to channels last. There were a few big regressions in torchbench - `timm_regnet` from `1.4343 → 1.0573` and `timm_resnet` from `1.7484 → 1.2868`. I used a modified script of the operator benchmarks [here](https://gist.github.com/eellison/e11dc645412f52e8b45fb26ba6f9f6a1) to measure the average speedup of convolutions across all of the input shapes found in torchbench according to the existing classifications that shunting314 used - grouped convs, small channel convs, convolution with larger in-channel than out-channel. Only grouped convolutions benchmarked as a slowdown in inference. I updated the inference heuristic to multiply the flops of each conv with its predicted speedup/slowdown in channels last. With this heuristic the two previously regressing models no longer regress. Speeds up inference for torchbench ~8% and timm ~6%. The motivating model here was SDXL which now hits channels last and improves 10%. There were some models that were sped up in training when forcing channels last (along with a number of regressions), it's possible there is some speedup there to be had. We could also have more granular classification/predictions. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
| "test_kwargs_dynamic_shapes": TestFailure(("cpu",)), | ||
| # calling div on only symint args | ||
| "test_AllenaiLongformerBase_repro_dynamic_shapes": TestFailure(("cpu", "cuda")), | ||
| "test_conv_inference_heuristics_dynamic_shapes": TestFailure("cuda"), |
why?
We turn off the channels last optimization for dynamic shapes. That is blocked on persistent reduction perf not being enabled for dynamic shapes. CC @peterbell10 @shunting314 - was one of you working on persistent reductions with dynamic shapes?
…rence" While many models regress in training when converted to channels last, in inference the results are quite different. Almost all of the models experienced a speedup when converted to channels last. There were a few big regressions in torchbench - `timm_regnet` from `1.4343 → 1.0573` and `timm_resnet` from `1.7484 → 1.2868`. I used a modified script of the operator benchmarks [here](https://gist.github.com/eellison/e11dc645412f52e8b45fb26ba6f9f6a1) to measure the average speedup of convolutions across all of the input shapes found in torchbench according to the existing classifications that shunting314 used - grouped convs, small channel convs, convolution with larger in-channel than out-channel. Only grouped convolutions benchmarked as a slowdown in inference. I updated the inference heuristic to multiply the flops of each conv with its predicted speedup/slowdown in channels last. With this heuristic the two previously regressing models no longer regress. Speeds up inference for torchbench ~8% and timm ~6%. The motivating model here was SDXL which now hits channels last and improves 10%. There were some models that were sped up in training when forcing channels last (along with a number of regressions). It's possible there is some speedup in training to be had with additional heuristics. We could also have more granular classification/predictions which might benefit both training and inference. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
Is this speedup from using the new inference-specific heuristics vs. using the previous heuristics for both training and inference?
@shunting314 The speedup is for inference, relative to the main branch.
@pytorchbot merge
Merge failed. Reason: this PR needs a required label. To add a label, you can comment to pytorchbot. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Is there any reason why this speedup/slowdown occurs, and why the behavior differs between training and inference?
Stack from ghstack (oldest at bottom):
While many models regress in training when converted to channels last, in inference the results are quite different. Almost all of the models experienced a speedup when converted to channels last. There were a few big regressions in torchbench -
`timm_regnet` from `1.4343 → 1.0573` and `timm_resnet` from `1.7484 → 1.2868`.

I used a modified script of the operator benchmarks [here](https://gist.github.com/eellison/e11dc645412f52e8b45fb26ba6f9f6a1) to measure the average speedup of convolutions across all of the input shapes found in torchbench, according to the existing classifications that @shunting314 used - grouped convs, small-channel convs, and convolutions with more in-channels than out-channels. Only grouped convolutions benchmarked as a slowdown in inference.
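As a rough illustration of the per-shape layout benchmarking described above, here is a minimal sketch using `torch.utils.benchmark` (the function name, defaults, and example shape are illustrative assumptions, not the actual gist script):

```python
import torch
from torch.utils import benchmark

def conv_layout_speedup(in_ch, out_ch, kernel, stride, groups, hw, bs=8,
                        dtype=torch.float16):
    """Return median contiguous time / median channels-last time for one conv shape."""
    conv = torch.nn.Conv2d(in_ch, out_ch, kernel, stride=stride, groups=groups,
                           bias=False, device="cuda", dtype=dtype)
    x = torch.randn(bs, in_ch, hw, hw, device="cuda", dtype=dtype)

    def median_time(fmt):
        m = conv.to(memory_format=fmt)
        inp = x.contiguous(memory_format=fmt)
        timer = benchmark.Timer("m(inp)", globals={"m": m, "inp": inp})
        return timer.blocked_autorange(min_run_time=1.0).median

    return median_time(torch.contiguous_format) / median_time(torch.channels_last)

# Hypothetical grouped-conv shape; a ratio < 1 means channels last is slower.
print(conv_layout_speedup(in_ch=224, out_ch=224, kernel=3, stride=1,
                          groups=112, hw=28))
```

Averaging this ratio over the shapes in each category (grouped, small-channel, shrinking-channel) gives the kind of per-category factors the flops-weighted heuristic below relies on.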
I updated the inference heuristic to multiply the flops of each conv by its predicted speedup/slowdown in channels last. With this heuristic, the two previously regressing models no longer regress.
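A minimal sketch of the flops-weighted scoring idea (the constants, field names, and thresholds are hypothetical placeholders, not the actual inductor heuristic):

```python
# Assumed per-category channels-last factors; real values would come from the
# benchmarking step above. A factor > 1 means channels last is faster.
GROUPED_FACTOR = 0.8         # grouped convs benchmarked as a slowdown
SMALL_CHANNEL_FACTOR = 1.2   # e.g. 3-channel stems
SHRINKING_FACTOR = 1.05      # more in-channels than out-channels
DEFAULT_FACTOR = 1.15

def conv_flops(bs, in_ch, out_ch, out_h, out_w, kh, kw, groups):
    # 2 * output elements * multiply-accumulates per output element
    return 2 * bs * out_ch * out_h * out_w * (in_ch // groups) * kh * kw

def predicted_factor(in_ch, out_ch, groups):
    if groups > 1:
        return GROUPED_FACTOR
    if in_ch <= 4:
        return SMALL_CHANNEL_FACTOR
    if in_ch > out_ch:
        return SHRINKING_FACTOR
    return DEFAULT_FACTOR

def prefer_channels_last(convs):
    """convs: list of dicts with the conv_flops fields above."""
    weighted = sum(conv_flops(**c) * predicted_factor(c["in_ch"], c["out_ch"],
                                                      c["groups"])
                   for c in convs)
    # Channels last wins when the flops-weighted average factor exceeds 1,
    # i.e. big convs that speed up outweigh small convs that slow down.
    return weighted > sum(conv_flops(**c) for c in convs)
```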
Speeds up inference for torchbench by ~8% and timm by ~6%. The motivating model here was SDXL, which now hits channels last and improves 10%.
There were some models that were sped up in training when forcing channels last (along with a number of regressions). It's possible there is some speedup in training to be had with additional heuristics. We could also have more granular classification/predictions which might benefit both training and inference.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler