[AOTI] Fix a two-pass kernel mismatch #141041

desertfire · 2024-11-19T16:43:04Z

Stack from ghstack (oldest at bottom):

-> [AOTI] Fix a two-pass kernel mismatch #141041

Summary: Fixes #140766. In AOTI's two-pass codegen, the first pass generates triton_per_fused_add_native_layer_norm_4, while the second pass generates triton_red_fused_add_native_layer_norm_4. Although this problem will go away with the incoming one-pass implementation, further debugging reveals that has_non_contiguous_pw_in_reduction_kernel differs between the two passes because of a symbol comparison problem in stride1_for_last_dim.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @chauhang @aakhundov

Differential Revision: D66203298
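
The summary above does not spell out what kind of symbol comparison goes wrong. As a minimal sketch, assuming the problem is an identity check (`is`) applied to symbols that are equal but not the same object across passes, the effect could look like the following. `Sym` and both helper functions are hypothetical stand-ins for illustration, not Inductor code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    """Hypothetical stand-in for a symbolic index variable (not an Inductor class)."""
    name: str


def last_dim_found_by_identity(terms, last_sym):
    # True only if the very same object appears among the index terms.
    return any(term is last_sym for term in terms)


def last_dim_found_by_equality(terms, last_sym):
    # True whenever a structurally equal symbol appears among the index terms.
    return any(term == last_sym for term in terms)


# Assumption: one pass reuses the symbol object, the other rebuilds it.
r0 = Sym("r0")
first_pass_terms = [Sym("x0"), r0]          # contains the same object as r0
second_pass_terms = [Sym("x0"), Sym("r0")]  # equal to r0, but a fresh object

by_identity = (
    last_dim_found_by_identity(first_pass_terms, r0),
    last_dim_found_by_identity(second_pass_terms, r0),
)
by_equality = (
    last_dim_found_by_equality(first_pass_terms, r0),
    last_dim_found_by_equality(second_pass_terms, r0),
)

print(by_identity)  # (True, False): the two passes disagree
print(by_equality)  # (True, True): the two passes agree
```

Under that assumption, any heuristic fed by the identity-based answer can pick a different kernel in each pass, whereas an equality-based check gives both passes the same answer.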

ghstack-source-id: e431769
Pull Request resolved: #141041

pytorch-bot · 2024-11-19T16:43:09Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141041

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

[DomainsOnly] Jobs fail with GLIBC version not found

✅ No Failures

As of commit cbe5448 with merge base b379a28:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

desertfire · 2024-11-19T23:39:54Z

@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

ghstack-source-id: 4fbeb19
Pull Request resolved: #141041

desertfire · 2024-11-20T15:16:28Z

@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-11-20T23:27:02Z

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

pytorchmergebot · 2024-11-20T23:28:42Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Summary: Fixes pytorch#140766. In AOTI's two-pass codegen, the first pass generates triton_per_fused_add_native_layer_norm_4, while the second pass generates triton_red_fused_add_native_layer_norm_4. Although this problem will go away with the incoming one-pass implementation, further debugging reveals that has_non_contiguous_pw_in_reduction_kernel differs between the two passes because of a symbol comparison problem in stride1_for_last_dim. Differential Revision: [D66203298](https://our.internmc.facebook.com/intern/diff/D66203298) Pull Request resolved: pytorch#141041 Approved by: https://github.com/shunting314

pytorch-bot bot added the ciflow/inductor and module: inductor labels Nov 19, 2024

desertfire added the topic: bug fixes and release notes: inductor labels Nov 19, 2024

desertfire requested review from angelayi and shunting314 November 19, 2024 17:58

shunting314 approved these changes Nov 19, 2024

pytorch-bot bot added the ciflow/trunk label Nov 19, 2024

pytorchmergebot added the merging label Nov 20, 2024

pytorchmergebot added the Merged label Nov 20, 2024

pytorchmergebot closed this in 040af30 Nov 20, 2024

pytorchmergebot removed the merging label Nov 20, 2024

github-actions bot deleted the gh/desertfire/509/head branch December 21, 2024 02:04
