KEMBAR78
[AOTI] Assert misaligned input by desertfire · Pull Request #142136 · pytorch/pytorch · GitHub
Skip to content

Conversation

@desertfire
Copy link
Contributor

@desertfire desertfire commented Dec 5, 2024

Stack from ghstack (oldest at bottom):

Summary: Fixes #141891. JIT Inductor relies on copy_misaligned_inputs to fix misaligned inputs. For AOTInductor's use scenario, this is an unacceptable performance hit, so we codegen input alignment check at the entry point and throws an error if any misalignment exists.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @chauhang @aakhundov

Differential Revision: D66881038

Summary: Fixes #141891. JIT Inductor relies on copy_misaligned_inputs to fix misaligned inputs. For AOTInductor's use scenario, this is an unacceptable performance hit, so we codegen input alignment check at the entry point and throws an error if any misalignment exists.

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Dec 5, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/142136

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4f8fd16 with merge base 91d3054 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Summary: Fixes #141891. JIT Inductor relies on copy_misaligned_inputs to fix misaligned inputs. For AOTInductor's use scenario, this is an unacceptable performance hit, so we codegen input alignment check at the entry point and throws an error if any misalignment exists.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames chauhang aakhundov

[ghstack-poisoned]
@desertfire desertfire requested review from eellison and ezyang December 5, 2024 21:02
Copy link
Contributor

@eellison eellison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know that we should error here vs not error in torch.compile.. but this is still an improvement, and i think a rare case, given our lack of imas so far.

Comment on lines 224 to 226
if ((long({name}.data_ptr()) & ({GPU_ALIGN_BYTES} -1)) != 0) {{
throw std::runtime_error("{name} is not aligned to {GPU_ALIGN_BYTES} bytes");
}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within inductor, we don't necessarily specialize on all inputs being aligned. We'll infer alignment from first invocation. It's taken from inputs_to_check.

# JIT Inductor does not guard on input alignment. It relies on copy_misaligned_inputs to
# copy misaligned inputs to aligned buffers. For AOTInductor, we expect users to use it
# as non-Python deployment for its best performance, so implicitly copying misaligned inputs
# to aligned buffers is going to bring a surprising performance hit. Instead, we check input
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not convinced it's that much of a performance hit. a single copy is usually not that slow..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chenyang78 , soliciting your opinion. Which one do you think makes more sense for production scenario, clone or assert?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off the top of my head, I don't recall any production model with misaligned inputs. I slightly prefer adding checks rather than creating clones. If we hit any assertion failures in a production model, we can conduct a real benchmark to check the performance impact of using clones.

Copy link
Contributor

@ezyang ezyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must not do this test on inputs that don't have alignment requirement

[ghstack-poisoned]
@desertfire
Copy link
Contributor Author

@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 6, 2024
[ghstack-poisoned]
@desertfire
Copy link
Contributor Author

@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

[ghstack-poisoned]
@desertfire
Copy link
Contributor Author

@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

[ghstack-poisoned]
desertfire added a commit that referenced this pull request Dec 8, 2024
Summary: Fixes #141891. JIT Inductor relies on copy_misaligned_inputs to fix misaligned inputs. For AOTInductor's use scenario, this is an unacceptable performance hit, so we codegen input alignment check at the entry point and throws an error if any misalignment exists.

ghstack-source-id: bba4c68
Pull Request resolved: #142136
@desertfire
Copy link
Contributor Author

@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

AmdSampsa pushed a commit to AmdSampsa/pytorch that referenced this pull request Dec 9, 2024
Summary: Fixes pytorch#141891. JIT Inductor relies on copy_misaligned_inputs to fix misaligned inputs. For AOTInductor's use scenario, this is an unacceptable performance hit, so we codegen input alignment check at the entry point and throws an error if any misalignment exists.

Differential Revision: [D66881038](https://our.internmc.facebook.com/intern/diff/D66881038)
Pull Request resolved: pytorch#142136
Approved by: https://github.com/eellison, https://github.com/ezyang
ghstack dependencies: pytorch#142133
desertfire added a commit that referenced this pull request Dec 13, 2024
Summary: #142136 added a runtime alignment assertion. But the assumption is probably too strict for more flexible use cases of AOTI, e.g. python deployment, see a recent error torchchat ran into for more details, https://github.com/pytorch/torchchat/actions/runs/12322072267/job/34394851280 . This PR relaxes the runtime check and implements copy_misaligned_inputs in cpp instead.

[ghstack-poisoned]
desertfire added a commit that referenced this pull request Dec 13, 2024
Summary: #142136 added a runtime alignment assertion. But the assumption is probably too strict for more flexible use cases of AOTI, e.g. python deployment, see a recent error torchchat ran into for more details, https://github.com/pytorch/torchchat/actions/runs/12322072267/job/34394851280 . This PR relaxes the runtime check and implements copy_misaligned_inputs in cpp instead.

ghstack-source-id: 76513df
Pull Request resolved: #143236
desertfire added a commit that referenced this pull request Dec 16, 2024
Summary: #142136 added a runtime alignment assertion. But the assumption is probably too strict for more flexible use cases of AOTI, e.g. python deployment, see a recent error torchchat ran into for more details, https://github.com/pytorch/torchchat/actions/runs/12322072267/job/34394851280 . This PR relaxes the runtime check and implements copy_misaligned_inputs in cpp instead.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames chauhang aakhundov

[ghstack-poisoned]
desertfire added a commit that referenced this pull request Dec 16, 2024
Summary: #142136 added a runtime alignment assertion. But the assumption is probably too strict for more flexible use cases of AOTI, e.g. python deployment, see a recent error torchchat ran into for more details, https://github.com/pytorch/torchchat/actions/runs/12322072267/job/34394851280 . This PR relaxes the runtime check and implements copy_misaligned_inputs in cpp instead.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames chauhang aakhundov

[ghstack-poisoned]
desertfire added a commit that referenced this pull request Dec 16, 2024
Summary: #142136 added a runtime alignment assertion. But the assumption is probably too strict for more flexible use cases of AOTI, e.g. python deployment, see a recent error torchchat ran into for more details, https://github.com/pytorch/torchchat/actions/runs/12322072267/job/34394851280 . This PR relaxes the runtime check and implements copy_misaligned_inputs in cpp instead.

ghstack-source-id: 989e60a
Pull Request resolved: #143236
pytorchmergebot pushed a commit that referenced this pull request Dec 17, 2024
Summary: #142136 added a runtime alignment assertion. But the assumption is probably too strict for more flexible use cases of AOTI, e.g. python deployment, see a recent error torchchat ran into for more details, https://github.com/pytorch/torchchat/actions/runs/12322072267/job/34394851280 . This PR relaxes the runtime check and implements copy_misaligned_inputs in cpp instead.

Differential Revision: [D67287922](https://our.internmc.facebook.com/intern/diff/D67287922)
Pull Request resolved: #143236
Approved by: https://github.com/malfet, https://github.com/chenyang78
@github-actions github-actions bot deleted the gh/desertfire/518/head branch January 8, 2025 02:05
@fxmarty-amd
Copy link

+1 @desertfire, having dynamo/inductor silently copying inputs is kind of nuts, and from what I have seen in my workload it is a serious overhead for small compiled functions.

Is there a way exposed to users to assert (raise an error) instead of copy when using the high-level torch.compile?

@ezyang
Copy link
Contributor

ezyang commented Jun 25, 2025

@fxmarty-amd yes, this should even be easy, not sure if there's already an issue, if there isn't one can you file us one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants