[AOTI] Add a multi_arch_kernel_binary option by desertfire · Pull Request #154413 · pytorch/pytorch · GitHub

Conversation

@desertfire
Contributor

@desertfire desertfire commented May 27, 2025

Stack from ghstack (oldest at bottom):

Summary: CUDA supports multi-arch device code via the fatbin format. Add a multi_arch_kernel_binary option so the compiled model binary can run across different GPU archs.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

Differential Revision: D75452094
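For context, a fatbin bundles compiled device code (SASS) for several concrete architectures, typically plus PTX for forward compatibility, into one binary. A minimal sketch of how such a multi-arch compile command could be assembled; this is illustrative only, not the actual Inductor codegen, and `fatbin_compile_cmd` is a hypothetical helper (the `nvcc` flags follow NVIDIA's documented `--generate-code` syntax):

```python
def fatbin_compile_cmd(src: str, out: str, archs: list[str]) -> list[str]:
    """Build an nvcc command that embeds SASS for each requested arch,
    plus PTX for the newest one so the fatbin can JIT on future GPUs.
    Hypothetical helper for illustration; not PyTorch's actual codegen."""
    cmd = ["nvcc", "-fatbin", src, "-o", out]
    for arch in archs:
        # SASS (cubin) for each concrete architecture
        cmd.append(f"--generate-code=arch=compute_{arch},code=sm_{arch}")
    # Forward-compatible PTX for the newest requested arch
    newest = max(archs, key=int)
    cmd.append(f"--generate-code=arch=compute_{newest},code=compute_{newest}")
    return cmd

print(" ".join(fatbin_compile_cmd("kernel.cu", "kernel.fatbin", ["80", "90"])))
```

A single fatbin produced this way can be loaded on, say, both A100 (sm_80) and H100 (sm_90) hardware, which is what lets the compiled model binary run across GPU archs.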

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented May 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/154413

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 73ff440 with merge base ef6306e:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]
@desertfire
Contributor Author

@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 27, 2025
[ghstack-poisoned]

@pytorchmergebot
Collaborator

Starting merge as part of PR stack under #154414


pytorchmergebot pushed a commit that referenced this pull request May 28, 2025
Summary: Add support for multi_arch_kernel_binary in package_cpp_only mode. More specifically, generate CMake targets that compile .ptx to .fatbin and embed them in the final shared library or binary.

Differential Revision: [D75452096](https://our.internmc.facebook.com/intern/diff/D75452096)
Pull Request resolved: #154414
Approved by: https://github.com/angelayi
ghstack dependencies: #154412, #154413
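To make the .ptx-to-.fatbin step in the follow-up PR concrete, here is a hedged sketch of what such a generated CMake rule could look like. The emitter function and all target/file names are hypothetical, this is not the code AOTInductor actually generates; it only illustrates the idea of a custom command invoking nvcc on a .ptx input with multiple `--generate-code` targets:

```python
def ptx_to_fatbin_rule(kernel: str, archs: list[str]) -> str:
    """Emit an add_custom_command snippet compiling <kernel>.ptx into a
    multi-arch <kernel>.fatbin. Illustrative sketch only; names and
    structure are assumptions, not AOTInductor's actual generated CMake."""
    gencode = " ".join(
        f"--generate-code=arch=compute_{a},code=sm_{a}" for a in archs
    )
    return (
        "add_custom_command(\n"
        f"  OUTPUT {kernel}.fatbin\n"
        f"  COMMAND nvcc -fatbin {gencode} {kernel}.ptx -o {kernel}.fatbin\n"
        f"  DEPENDS {kernel}.ptx)\n"
    )

print(ptx_to_fatbin_rule("triton_kernel_0", ["80", "90"]))
```

The resulting .fatbin files can then be embedded (e.g. as object files or byte arrays) into the final shared library, which is the packaging step the commit message describes.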
desertfire added a commit that referenced this pull request May 28, 2025
ghstack-source-id: 55d13e3
etaf added a commit that referenced this pull request Jun 3, 2025
…l_binary option for XPU."


Following the design of #154413, this PR adds XPU support for generating kernel binary files that support multiple archs.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Jun 3, 2025
…154514)

Following the design of #154413, this PR adds XPU support for generating kernel binary files that support multiple archs.

Fixes #154682, Fixes #154683, Fixes #154689, Fixes #154685, Fixes #154690, Fixes #154681

Pull Request resolved: #154514
Approved by: https://github.com/desertfire, https://github.com/EikanWang
iupaikov-amd pushed a commit to ROCm/pytorch that referenced this pull request Jun 4, 2025
angelayi pushed a commit to angelayi/pytorch that referenced this pull request Jun 5, 2025
@github-actions github-actions bot deleted the gh/desertfire/578/head branch June 27, 2025 02:22
