-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[AOTI] don't allow int32 indices if {non-inf, > int32_max} upper bound is provided #159433
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AOTI] don't allow int32 indices if {non-inf, > int32_max} upper bound is provided #159433
Conversation
…d is provided [ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159433
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 Unrelated Failure)As of commit e9a9ff1 with merge base 9a680e1 ( UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
torch/_inductor/utils.py
Outdated
|
||
if V.aot_compilation: | ||
# check whether value has an upper bound | ||
if V.graph.sizevars.statically_known_true(e < 1e100): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2^64 (~1e20) should be good enough?
… upper bound is provided" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben Differential Revision: [D79220301](https://our.internmc.facebook.com/intern/diff/D79220301) [ghstack-poisoned]
… upper bound is provided" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben Differential Revision: [D79220301](https://our.internmc.facebook.com/intern/diff/D79220301) [ghstack-poisoned]
… upper bound is provided" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben Differential Revision: [D79220301](https://our.internmc.facebook.com/intern/diff/D79220301) [ghstack-poisoned]
@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
…pper bound is provided" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben Differential Revision: [D79220301](https://our.internmc.facebook.com/intern/diff/D79220301) [ghstack-poisoned]
@pytorchbot rebase |
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
Successfully rebased |
@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
I ran an A100 benchmark and there's no significant performance regression. LGTM! |
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
…d is provided (pytorch#159433) **Motivation / Context**: (what I _think_ is happening here) In "eager"/just-in-time PT2 usage, dynamo/inductor will guard on whether indices fit in int32 or not. So it's generally safe in Inductor code to rely on the example values for symbolic ints in order to determine whether indices fit in int32, because the indices will be guarded on anyway; and if the inputs ever increase to `>int32_max`, dynamo will cause a recompilation. But with AOTI, those int32 guards aren't respected; so if the example input is `< int32_max` but can be `> int32_max` during future execution, then the future execution might fail / IMA. **Solution space** Export allows users to specify which dimension are dynamic, and to provide **ranges of valid sizes**. One solution idea is to always respect the upper bound of the dynamic shape range when doing AOTI; if the index's range includes values `>int32_max`, then don't use the hint and assume that this index doesn't fit in int32. However, the problem with this is that many users may specify dynamism without specifying a range of values - the upper bound of the range will be set to the default of `inf`. Such use cases could potentially experience a perf regression if we implemented the idea above. To prevent any such regressions, this implementation will rely solely on the specified range only if the upper bound of the range isn't inf. In other words, we'll ignore the hints/example values for AOTI (and rely only on the specified range) only if the upper bound of the range isn't inf - if users explicitly specify a range that extends past int32, we can be fairly sure that they actually do need values `>int32_max`. If we continue to see correctness issues even with this implementation, we could consider more aggressively relying on the ranges. Differential Revision: [D79220301](https://our.internmc.facebook.com/intern/diff/D79220301) Pull Request resolved: pytorch#159433 Approved by: https://github.com/jingsh, https://github.com/ColinPeppler
Stack from ghstack (oldest at bottom):
Motivation / Context: (what I think is happening here)
In "eager"/just-in-time PT2 usage, dynamo/inductor will guard on whether indices fit in int32 or not. So it's generally safe in Inductor code to rely on the example values for symbolic ints in order to determine whether indices fit in int32, because the indices will be guarded on anyway; and if the inputs ever increase to
>int32_max
, dynamo will cause a recompilation.But with AOTI, those int32 guards aren't respected; so if the example input is
< int32_max
but can be> int32_max
during future execution, then the future execution might fail / IMA.Solution space
Export allows users to specify which dimension are dynamic, and to provide ranges of valid sizes.
One solution idea is to always respect the upper bound of the dynamic shape range when doing AOTI; if the index's range includes values
>int32_max
, then don't use the hint and assume that this index doesn't fit in int32.However, the problem with this is that many users may specify dynamism without specifying a range of values - the upper bound of the range will be set to the default of
inf
. Such use cases could potentially experience a perf regression if we implemented the idea above.To prevent any such regressions, this implementation will rely solely on the specified range only if the upper bound of the range isn't inf. In other words, we'll ignore the hints/example values for AOTI (and rely only on the specified range) only if the upper bound of the range isn't inf - if users explicitly specify a range that extends past int32, we can be fairly sure that they actually do need values
>int32_max
.If we continue to see correctness issues even with this implementation, we could consider more aggressively relying on the ranges.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben
Differential Revision: D79220301