[AOTI] don't allow int32 indices if {non-inf, > int32_max} upper bound is provided by davidberard98 · Pull Request #159433 · pytorch/pytorch · GitHub

Conversation

@davidberard98
Contributor

@davidberard98 davidberard98 commented Jul 29, 2025

Stack from ghstack (oldest at bottom):

Motivation / Context: (what I think is happening here)

In "eager"/just-in-time PT2 usage, dynamo/inductor will guard on whether indices fit in int32 or not. So it's generally safe in Inductor code to rely on the example values for symbolic ints in order to determine whether indices fit in int32, because the indices will be guarded on anyway; and if the inputs ever increase to >int32_max, dynamo will cause a recompilation.

But with AOTI, those int32 guards aren't respected; so if the example input is < int32_max but can grow past int32_max during a future execution, that execution might fail or hit an illegal memory access (IMA).
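To make the failure mode concrete, here is a minimal sketch (hypothetical function names, not the actual dynamo/inductor internals) of why relying on the example value is safe under JIT guarding but not under AOTI:

```python
INT32_MAX = 2**31 - 1

def jit_compile(example_index, installed_guards):
    """Eager/JIT PT2: pick an index dtype from the example value and
    install a guard that re-checks the decision on every call."""
    fits = example_index <= INT32_MAX
    # If a later input flips this predicate, the guard fails and dynamo
    # recompiles, so the hint-based choice is always revalidated.
    installed_guards.append(lambda idx: (idx <= INT32_MAX) == fits)
    return "int32" if fits else "int64"

def aoti_compile(example_index):
    """AOTI: the same hint-based choice, but no guard runs at execution
    time. An input > int32_max silently runs with int32 indices and can
    fail or trigger an illegal memory access (IMA)."""
    return "int32" if example_index <= INT32_MAX else "int64"
```
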

Solution space

Export allows users to specify which dimensions are dynamic, and to provide ranges of valid sizes.

One solution idea is to always respect the upper bound of the dynamic shape range when doing AOTI; if the index's range includes values >int32_max, then don't use the hint and assume that this index doesn't fit in int32.

However, the problem with this is that many users may specify dynamism without specifying a range of values - the upper bound of the range will be set to the default of inf. Such use cases could potentially experience a perf regression if we implemented the idea above.

To prevent any such regressions, this implementation ignores the hints/example values for AOTI, and relies solely on the specified range, only when the upper bound of that range isn't inf. If users explicitly specify a range that extends past int32_max, we can be fairly sure that they actually do need values > int32_max.

If we continue to see correctness issues even with this implementation, we could consider more aggressively relying on the ranges.
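The resulting decision rule can be sketched as follows (a simplified model with hypothetical names; the real logic lives in Inductor's sizevars handling):

```python
import math

INT32_MAX = 2**31 - 1

def use_int32_indices(hint, upper_bound, aot_compile):
    """Decide whether int32 indices are safe for a symbolic size.

    hint: example value observed at trace time.
    upper_bound: upper bound of the user-specified dynamic range
        (math.inf when no range was given).
    aot_compile: True for AOTI, where int32 guards are not enforced.
    """
    if aot_compile and not math.isinf(upper_bound):
        # An explicit, finite bound was provided: trust it instead of
        # the hint, since AOTI will never recompile for larger inputs.
        return upper_bound <= INT32_MAX
    # JIT path, or AOTI with the default inf bound: fall back to the
    # hint to avoid regressing users who never specified a range.
    return hint <= INT32_MAX
```

Under this rule, an AOTI export that declares a dimension with an explicit max of, say, 2**40 gets int64 indices even if the example input is small, while an unbounded dimension keeps the existing hint-based behavior.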

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

Differential Revision: D79220301

@pytorch-bot

pytorch-bot bot commented Jul 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159433

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit e9a9ff1 with merge base 9a680e1:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

davidberard98 added a commit that referenced this pull request Jul 29, 2025
…d is provided

ghstack-source-id: dcf376c
Pull Request resolved: #159433
@davidberard98
Contributor Author

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 30, 2025

```python
if V.aot_compilation:
    # check whether value has an upper bound
    if V.graph.sizevars.statically_known_true(e < 1e100):
```

A reviewer (Contributor) commented on this check:

> 2^64 (~1e20) should be good enough?

@davidberard98 davidberard98 marked this pull request as ready for review August 2, 2025 04:47
@davidberard98
Contributor Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased gh/davidberard98/393/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/159433)

@ColinPeppler
Contributor

I ran an A100 benchmark and there's no significant performance regression. LGTM!

@davidberard98
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@github-actions github-actions bot deleted the gh/davidberard98/393/head branch September 4, 2025 02:07
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…d is provided (pytorch#159433)

Differential Revision: [D79220301](https://our.internmc.facebook.com/intern/diff/D79220301)
Pull Request resolved: pytorch#159433
Approved by: https://github.com/jingsh, https://github.com/ColinPeppler