Enable XPU path for FlexAttention by liangan1 · Pull Request #143553 · pytorch/pytorch · GitHub

Conversation

@liangan1
Contributor

@liangan1 liangan1 commented Dec 19, 2024

RFC: #153024

Motivation

  1. Attention has become the critical performance bottleneck in current LLM models, and FlexAttention is a good choice to cover the broad range of attention variants in the transformers family of models. With FlexAttention, it is easy to enable paged attention and fused SDPA in the transformers repo on the XPU device. It also provides a candidate path for attention in LLM ecosystem libraries such as vLLM and SGLang on the XPU device.
  2. FlexAttention is a good starting point for maturing the Intel Triton-based GEMM kernels. FlexAttention provides both a flexattention kernel and a flexdecoding kernel to cover compute-bound and memory-bound GEMM computation, and different shapes must also be supported to serve LLM inference, e.g., head_dim = 64, 96, 128, 256.

What does this PR do?

  1. Enable the XPU device type for the FlexAttention kernel and its UTs, and ensure all important UTs pass on the XPU device.
  2. For E2E model inference, ensure that LLM model inference with FlexAttention is functional.
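
As a usage illustration (not part of this PR's test suite), here is a minimal sketch of running FlexAttention on an XPU device; the shapes, dtype, and the causal `score_mod` below are assumptions chosen for illustration:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Illustrative shapes and dtype (assumptions, not taken from this PR).
B, H, S, D = 2, 8, 1024, 64
device = "xpu" if torch.xpu.is_available() else "cpu"
q, k, v = (torch.randn(B, H, S, D, device=device, dtype=torch.float16) for _ in range(3))

def causal(score, b, h, q_idx, kv_idx):
    # Standard causal score_mod: mask out keys that come after the query position.
    return torch.where(q_idx >= kv_idx, score, float("-inf"))

# torch.compile lowers flex_attention to the Triton templates; on an XPU build
# this exercises the kernels enabled by this PR.
flex = torch.compile(flex_attention)
out = flex(q, k, v, score_mod=causal)
```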

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela @yf225 @ColinPeppler @desertfire

@pytorch-bot

pytorch-bot bot commented Dec 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/143553

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 4 Unrelated Failures

As of commit 29dbb36 with merge base d153af7:

NEW FAILURE - The following job has failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla

linux-foundation-easycla bot commented Dec 19, 2024

@liangan1 liangan1 marked this pull request as draft December 19, 2024 04:39
@EikanWang EikanWang added topic: not user facing topic category triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module ciflow/xpu Run XPU CI tasks labels Dec 24, 2024
@pytorch-bot

pytorch-bot bot commented Dec 24, 2024

To add the ciflow label ciflow/xpu please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Dec 24, 2024
@EikanWang EikanWang self-requested a review December 24, 2024 02:14
@EikanWang EikanWang added the ciflow/xpu Run XPU CI tasks label Dec 24, 2024
@pytorch-bot

pytorch-bot bot commented Dec 24, 2024

To add the ciflow label ciflow/xpu please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Dec 24, 2024
@liangan1
Contributor Author

@pytorchbot rebase

@pytorch-bot

pytorch-bot bot commented Feb 10, 2025

You don't have permissions to rebase this PR since you are a first time contributor. If you think this is a mistake, please contact PyTorch Dev Infra.

@hoshibara
Contributor

@pytorchbot label ciflow/xpu

@pytorch-bot pytorch-bot bot added the ciflow/xpu Run XPU CI tasks label Aug 28, 2025
@hoshibara
Contributor

@pytorchbot merge

@pytorch-bot

pytorch-bot bot commented Aug 28, 2025

Pull workflow has not been scheduled for the PR yet. It could be because the author doesn't have permissions to run those, or skip-checks keywords were added to the PR/commits; aborting merge. Please get/give approval for the workflows and/or remove skip-ci decorators before the next merge attempt. If you think this is a mistake, please contact PyTorch Dev Infra.

@hoshibara
Contributor

@pytorchbot label ciflow/trunk

@pytorch-bot

pytorch-bot bot commented Aug 28, 2025

To add these label(s) (ciflow/trunk) to the PR, please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Aug 29, 2025
@ZhiweiYan-96 ZhiweiYan-96 added the ciflow/xpu Run XPU CI tasks label Aug 29, 2025
enable tma case for xpu
@pytorch-bot pytorch-bot bot removed the ciflow/xpu Run XPU CI tasks label Aug 29, 2025
@etaf etaf added the ciflow/xpu Run XPU CI tasks label Aug 29, 2025
@hoshibara
Contributor

Hi @EikanWang,
The TMA code has been reverted. The TMA-related UT shows that the USE_TMA flag can be interpreted correctly on XPU.

@EikanWang
Collaborator

@pytorchbot merge -i

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 29, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 4 checks: xpu / linux-jammy-xpu-n-py3.10 / test (default, 1, 8, linux.idc.xpu), xpu / linux-jammy-xpu-n-py3.10 / test (default, 5, 8, linux.idc.xpu), xpu / linux-jammy-xpu-n-py3.10 / test (default, 3, 8, linux.idc.xpu), xpu / linux-jammy-xpu-n-py3.10 / test (default, 6, 8, linux.idc.xpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.


-    if not _has_sufficient_memory(_device, size_bytes):
+    # TODO: Memory availability checks for Intel GPU
+    if device != "xpu" and not _has_sufficient_memory(_device, size_bytes):
Collaborator

@guangyey guangyey Sep 2, 2025

This changed the logic of largeTensorTest: it disabled largeTensorTest on the XPU device, which results in the failure of `python test/dynamo/test_aot_autograd_cache.py AOTAutogradCacheTests.test_autograd_inductor_guards_device_xpu_float16_requires_grad_True`.

Contributor

In e6ae1ed, we attempted to complete the sufficient-memory check for XPU, but it caused some previously skipped cases to fail. A new PR is needed to fix this issue.
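
For context, this is roughly the kind of check being discussed; a minimal sketch assuming the build exposes `torch.xpu.mem_get_info` (the helper name and the 90% headroom margin are illustrative assumptions, not the code from e6ae1ed):

```python
import torch

def _has_sufficient_xpu_memory(device, size_bytes: int) -> bool:
    # Sketch only: report whether an allocation of size_bytes plausibly fits
    # in the free device memory reported by the runtime.
    if not torch.xpu.is_available():
        return False
    free_bytes, _total_bytes = torch.xpu.mem_get_info(device)
    # Keep headroom for the caching allocator and kernel workspaces.
    return size_bytes < free_bytes * 0.9
```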

Collaborator

@hoshibara Please fix those failures ASAP.

Contributor

Raised #162034 to fix this case.

Contributor

Guangye's PR #161988 will fix this issue.

Collaborator

@hoshibara Thanks. PR landed.

whitneywhtsang added a commit to intel/intel-xpu-backend-for-triton that referenced this pull request Sep 4, 2025
The tensor descriptor implementation is not used without this patch. The change in `flex_attention.py` was removed from pytorch/pytorch#143553 before merging. The requirement in `can_use_tma` was too restrictive for using tensor descriptors.

Fixes #5036

Benchmark CI:
https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/17456013457

Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>
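
As a rough illustration of what a `can_use_tma`-style eligibility gate checks, a hypothetical sketch follows; the conditions (dtype whitelist, contiguous innermost dimension, 16-byte alignment) are assumptions for illustration, not the actual Inductor or intel-xpu-backend-for-triton logic:

```python
import torch

def can_use_tensor_descriptor(t: torch.Tensor) -> bool:
    # Hypothetical gate: descriptor/TMA-style loads generally want a supported
    # element type, a contiguous innermost dimension, and a 16-byte-aligned
    # innermost block.
    supported = {torch.float16, torch.bfloat16}
    head_dim = t.shape[-1]
    return (
        t.dtype in supported
        and t.stride(-1) == 1
        and (head_dim * t.element_size()) % 16 == 0
    )
```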
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025

Pull Request resolved: pytorch#143553
Approved by: https://github.com/EikanWang, https://github.com/drisspg

Co-authored-by: Mao Yunfei <yunfei.mao@intel.com>
Co-authored-by: Xingyuan Li <xingyuan.li@intel.com>
Co-authored-by: majing <jing1.ma@intel.com>
Co-authored-by: Xiao, Wang <wang.xiao@intel.com>
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025

Labels

ciflow/trunk Trigger trunk jobs on your pull request
ciflow/xpu Run XPU CI tasks
keep-going Don't stop on first failure, keep running tests until the end
Merged
module: dynamo
module: inductor
open source
topic: not user facing topic category
triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.