[inductor] modify the heuristic for loop split optimization by jiayisunx · Pull Request #137550 · pytorch/pytorch · GitHub

@jiayisunx
Collaborator

@jiayisunx jiayisunx commented Oct 9, 2024

Stack from ghstack (oldest at bottom):

Summary

  1. Improve the heuristic for the loop split optimization: the divisor must be an integer and must not be too small (it must be greater than 8; this threshold has been tuned).
  2. Improve the heuristic for disabling vectorization: add a quantity_threshold and relax the ratio_threshold for the number of non-contiguous load/store/index_expr operations in the loop body.

This PR brings performance improvements for two torchbench models (functorch_dp_cifar10, opacus_cifar10) and one timm model (sebotnet33ts_256).
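
For readers who want the two heuristics in concrete form, here is a minimal sketch of how they might look. It is not the code from this PR: the function names `should_split_loop` and `should_disable_vectorization` are hypothetical, the divisor is modeled as a plain Python value rather than a symbolic expression, and the way the two thresholds combine (both must be reached before vectorization is disabled) is an assumption. Only the "greater than 8" bound comes from the summary above; the threshold values passed in the usage lines are illustrative.

```python
# Hypothetical sketch of the two heuristics described in the summary.
# Names and structure are assumptions, not the actual inductor code.

MIN_SPLIT_DIVISOR = 8  # from the summary: the divisor must be greater than 8


def should_split_loop(divisor: object) -> bool:
    """Allow loop splitting only for a concrete integer divisor > 8."""
    # Reject anything that is not a plain integer (e.g. a symbolic size,
    # modeled here as a string). bool is excluded because it subclasses int.
    if isinstance(divisor, bool) or not isinstance(divisor, int):
        return False
    return divisor > MIN_SPLIT_DIVISOR


def should_disable_vectorization(
    non_contig_ops: int,
    total_ops: int,
    quantity_threshold: int,
    ratio_threshold: float,
) -> bool:
    """Disable vectorization only when non-contiguous accesses are both
    numerous (quantity) and a large share of the loop body (ratio).

    Assumption: both conditions must hold. The summary only says that a
    quantity threshold was added and the ratio threshold was relaxed.
    """
    if total_ops <= 0:
        return False
    enough_in_absolute_terms = non_contig_ops >= quantity_threshold
    enough_in_relative_terms = non_contig_ops / total_ops >= ratio_threshold
    return enough_in_absolute_terms and enough_in_relative_terms


# Usage with illustrative threshold values (not taken from the PR):
split_ok = should_split_loop(16)
split_too_small = should_split_loop(8)
disable = should_disable_vectorization(
    40, 100, quantity_threshold=35, ratio_threshold=0.12
)
```

Under this reading, a small loop body with a few non-contiguous accesses keeps vectorization even if its ratio is high, because it never reaches the quantity threshold.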

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@pytorch-bot

pytorch-bot bot commented Oct 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137550

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 0cd0434 with merge base d0fd42e:

FLAKY - The following job failed, but this was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jiayisunx added a commit that referenced this pull request Oct 9, 2024
@jiayisunx jiayisunx marked this pull request as draft October 9, 2024 02:30
@jiayisunx jiayisunx added the topic: not user facing label Oct 9, 2024
@jiayisunx jiayisunx added the ciflow/trunk label and removed the open source label Oct 9, 2024
jiayisunx added a commit that referenced this pull request Oct 10, 2024
jiayisunx added a commit that referenced this pull request Oct 21, 2024
@jiayisunx jiayisunx marked this pull request as ready for review October 21, 2024 02:15
Collaborator

@jgong5 jgong5 left a comment

Do you have perf numbers to support these heuristics?

jiayisunx added a commit that referenced this pull request Oct 29, 2024
jiayisunx added a commit that referenced this pull request Nov 4, 2024
jiayisunx added a commit to jiayisunx/pytorch that referenced this pull request Nov 14, 2024
jiayisunx added a commit that referenced this pull request Nov 18, 2024
jiayisunx added a commit that referenced this pull request Nov 21, 2024
@jiayisunx jiayisunx requested a review from jansel November 21, 2024 14:50
@jiayisunx
Collaborator Author

> Do you have perf numbers to support these heuristics?

Yes, this PR brings a performance improvement of about 20% for two torchbench models (functorch_dp_cifar10, opacus_cifar10) and one timm model (sebotnet33ts_256).

@jiayisunx
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

jiayisunx added a commit that referenced this pull request Nov 25, 2024
@jiayisunx
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Ryo-not-rio pushed a commit to Ryo-not-rio/pytorch that referenced this pull request Dec 2, 2024
…137550)

### Summary

1. Improve the heuristic for the loop split optimization: the divisor must be an integer and must not be too small (it must be greater than 8; this threshold has been tuned).
2. Improve the heuristic for disabling vectorization: add a quantity_threshold and relax the ratio_threshold for the number of non-contiguous load/store/index_expr operations in the loop body.

This PR brings performance improvements for two torchbench models (functorch_dp_cifar10, opacus_cifar10) and one timm model (sebotnet33ts_256).

Pull Request resolved: pytorch#137550
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/jansel
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
@github-actions github-actions bot deleted the gh/jiayisunx/32/head branch December 26, 2024 02:04