[ROCm] Prevent accidental enablement of efficient attention. (#134531) #1565
[ROCm] Prevent accidental enablement of efficient attention. (pytorch#133331)
Currently, Efficient Attention and Flash Attention share the same set of GPU kernels on ROCm and therefore have the same limitations on head sizes.
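For context, here is a minimal sketch (not part of this PR) of how the shared head-size limit can be observed from Python. It assumes a ROCm build of PyTorch recent enough to provide `torch.nn.attention.sdpa_kernel` (added in PyTorch 2.3); the specific head sizes probed are illustrative, not the actual kernel limits. Because both backends dispatch to the same kernels on ROCm, a head size rejected by Flash Attention should also be rejected by Efficient Attention:

```python
# Sketch: probe whether each SDPA backend accepts a given head size.
# On ROCm, FLASH_ATTENTION and EFFICIENT_ATTENTION share kernels, so
# they are expected to agree on which head sizes are supported.
import torch
from torch.nn.attention import sdpa_kernel, SDPBackend

def backend_accepts(backend: SDPBackend, head_dim: int) -> bool:
    # Shapes: (batch, num_heads, seq_len, head_dim)
    q = torch.randn(1, 4, 128, head_dim, device="cuda", dtype=torch.float16)
    k = torch.randn_like(q)
    v = torch.randn_like(q)
    try:
        # Restrict dispatch to a single backend; if it cannot handle this
        # configuration, scaled_dot_product_attention raises a RuntimeError.
        with sdpa_kernel(backend):
            torch.nn.functional.scaled_dot_product_attention(q, k, v)
        return True
    except RuntimeError:
        return False

# Hypothetical head sizes chosen for illustration only.
for head_dim in (64, 128, 512):
    flash_ok = backend_accepts(SDPBackend.FLASH_ATTENTION, head_dim)
    efficient_ok = backend_accepts(SDPBackend.EFFICIENT_ATTENTION, head_dim)
    print(f"head_dim={head_dim}: flash={flash_ok}, efficient={efficient_ok}")
```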
Pull Request resolved: pytorch#133331
Approved by: https://github.com/malfet, https://github.com/jithunnair-amd
(cherry picked from commit 46ecc67)
Fixes pytorch#132004