[export] Modify SDPA decomposition to decompose _scaled_dot_product_flash_attention_for_cpu #117097
Conversation
…lash_attention_for_cpu Summary: As titled. #115913 added `_scaled_dot_product_flash_attention_for_cpu` and the export result of `scaled_dot_product_attention` includes this op. Adding this decomposition so that it is decomposed the same way as `_scaled_dot_product_attention_math`. Test Plan: Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
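For reference, a minimal sketch of what registering such a decomposition can look like; the function name and the simplified `logsumexp` handling are illustrative assumptions, not the PR's exact diff:

```python
import math

import torch
from torch._decomp import register_decomposition

aten = torch.ops.aten

# Sketch only: recent PyTorch already ships this decomposition, so running
# this registration as-is may conflict with the built-in entry.
@register_decomposition(aten._scaled_dot_product_flash_attention_for_cpu)
def sdpa_flash_for_cpu(query, key, value, dropout_p=0.0, is_causal=False,
                       *, attn_mask=None, scale=None):
    # Delegate to the math reference op so both entry points lower to the
    # same decomposed graph.
    output, _ = aten._scaled_dot_product_attention_math(
        query, key, value, attn_mask=attn_mask, dropout_p=dropout_p,
        is_causal=is_causal, scale=scale,
    )
    # The real op also returns the logsumexp of the attention scores; this
    # recomputation ignores masking and dropout and assumes (B, H, L, E)
    # inputs, purely for illustration.
    s = scale if scale is not None else 1.0 / math.sqrt(query.size(-1))
    logsumexp = torch.logsumexp(query @ key.transpose(-2, -1) * s, dim=-1)
    return output, logsumexp
```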
🔗 See artifacts and rendered test results at hud.pytorch.org/pr/117097. ✅ No failures as of commit 6a34f62 with merge base 19e93b8. (Comment generated by Dr. CI.)
…lash_attention_for_cpu Summary: As titled. #115913 added `_scaled_dot_product_flash_attention_for_cpu` and the export result of `scaled_dot_product_attention` includes this op. Adding this decomposition so that it is decomposed the same way as `_scaled_dot_product_attention_math`. Test Plan: python test/test_decomp.py -k test_aten_core_operators Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 8651762 Pull Request resolved: #117097
Don't we want to keep the decomp of `_scaled_dot_product_flash_attention`?
I have been wanting to fix this in a more correct way, which is probably non-trivial. The correct way is really to define a decomp for the aten sdpa op. I remember discussing this with you, and I still think we should look into decomposing aten sdpa. One not-so-nice ExecuTorch-specific workaround lives inside to_edge: before calling run_decomposition, we manually replace instances of aten sdpa with its decomposed version.
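A hedged sketch of what that kind of pre-decomposition rewrite could look like as an FX pass; `replace_sdpa_with_math` is a hypothetical helper, and the argument pass-through glosses over small signature differences between the two ops:

```python
import operator

import torch
from torch.fx import GraphModule

aten = torch.ops.aten

def replace_sdpa_with_math(gm: GraphModule) -> GraphModule:
    """Rewrite aten.scaled_dot_product_attention calls to the math variant."""
    g = gm.graph
    for node in list(g.nodes):
        if (node.op == "call_function"
                and node.target is aten.scaled_dot_product_attention.default):
            with g.inserting_before(node):
                # Assumes the args line up; the two schemas differ slightly
                # (e.g. dropout_mask), so a real pass would remap explicitly.
                math_node = g.call_function(
                    aten._scaled_dot_product_attention_math.default,
                    node.args, node.kwargs,
                )
                # The math op returns (output, attn_weights); keep the output.
                out = g.call_function(operator.getitem, (math_node, 0))
            node.replace_all_uses_with(out)
            g.erase_node(node)
    g.lint()
    gm.recompile()
    return gm
```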
Why flash attention?
This PR removes one decomposition and registers a decomposition in one of the functions inside it. Why don't we keep the original decomposition, expressed in terms of this new one?
Oh I see, sorry, I didn't follow the first time around. Yeah, makes sense. Basically you are suggesting keeping definitions for both decomps, with one of them just calling the other. @larryliu0820?
Yep, although looking at the removed code, the previous decomposition was completely wrong, with variables like `logsumexp = torch.empty([batchSize, qSize, num_head, headSize], dtype=torch.float)` that are never filled in, so it may be alright to remove it outright.
Yeah, the previous decomposition was pretty awkward because …
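A quick illustration of that point, with made-up shapes: `torch.empty` returns uninitialized values, whereas the op's contract is the log-sum-exp of the attention scores, from which the softmax is recoverable:

```python
import torch

scores = torch.randn(2, 4, 8, 8)  # stand-in for q @ k^T * scale
logsumexp = torch.logsumexp(scores, dim=-1)
# softmax(s)_i == exp(s_i - logsumexp), so a correct logsumexp reproduces
# the attention weights; an uninitialized tensor cannot.
softmax = torch.exp(scores - logsumexp.unsqueeze(-1))
assert torch.allclose(softmax, torch.softmax(scores, dim=-1), atol=1e-6)
```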
@pytorchbot merge |
@larryliu0820 can you please deactivate the test for CUDA? |
oh sorry I'm on it |
@pytorchbot merge |
Merge failed. Reason: this PR is missing a required label; please add the appropriate label. To add a label, you can comment to pytorchbot. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Summary: A follow-up for #117097. In that PR I didn't add `_scaled_dot_product_flash_attention_for_cpu` to the core_aten_decomposition table. This PR does that and also adds a unit test. Test Plan: python test/export/test_export.py -k test_scaled_dot_product_attention Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
A follow-up for #117097. In that PR I didn't add `_scaled_dot_product_flash_attention_for_cpu` to the core_aten_decomposition table. This PR does that and also adds a unit test. Pull Request resolved: #117390 Approved by: https://github.com/drisspg Differential Revision: [D52788012](https://our.internmc.facebook.com/intern/diff/D52788012/) ghstack-source-id: 212131226
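A sketch of the flow that follow-up test exercises (the module, shapes, and printout are illustrative, not the test's exact code):

```python
import torch
import torch.nn.functional as F
from torch.export import export

class Attn(torch.nn.Module):
    def forward(self, q, k, v):
        return F.scaled_dot_product_attention(q, k, v)

q, k, v = (torch.randn(1, 2, 8, 16) for _ in range(3))
ep = export(Attn(), (q, k, v))
# With the op in the core ATen decomposition table, run_decompositions()
# lowers the CPU flash-attention variant instead of keeping it opaque.
ep = ep.run_decompositions()
print(ep.graph_module.code)
```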
Stack from ghstack (oldest at bottom):
Summary: As titled. #115913 added `_scaled_dot_product_flash_attention_for_cpu` and the export result of `scaled_dot_product_attention` includes this op. Adding this decomposition so that it is decomposed the same way as `_scaled_dot_product_attention_math`.

Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags: