
multi_head_attention_forward generates different values on MPS compared to CPU #111479

@CaoE


🐛 Describe the bug

multi_head_attention_forward generates different values on MPS than on CPU with the same inputs.
I don't have an MPS machine to reproduce this issue; you can refer to https://github.com/pytorch/pytorch/actions/runs/6561612634/job/17822025576.
scaled_dot_product_attention should have the same issue (a similar check is sketched after the outputs below):
https://github.com/pytorch/pytorch/actions/runs/6610481092/job/17952919631.
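
Since I can't run it locally, here is a minimal sketch of the kind of CPU-vs-MPS comparison involved, not the exact CI reproducer: the shapes, seed, and num_heads are assumptions chosen to match the output shapes below, and the real failing inputs are in the linked job. nn.MultiheadAttention dispatches to multi_head_attention_forward internally:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed sizes (picked to match the output shapes below; the real CI inputs differ):
# embed_dim=2, num_heads=2, target length 3, source length 4, batch 2.
embed_dim, num_heads = 2, 2
mha = nn.MultiheadAttention(embed_dim, num_heads)

query = torch.randn(3, 2, embed_dim)  # (L, N, E)
key = torch.randn(4, 2, embed_dim)    # (S, N, E)
value = torch.randn(4, 2, embed_dim)  # (S, N, E)

# CPU reference pass, then the same module and inputs on MPS.
out_cpu, weights_cpu = mha(query, key, value)
mha_mps = mha.to("mps")
out_mps, weights_mps = mha_mps(query.to("mps"), key.to("mps"), value.to("mps"))

# Both pairs should agree within FP32 tolerance; the CI run shows they do not.
torch.testing.assert_close(out_cpu, out_mps.cpu())
torch.testing.assert_close(weights_cpu, weights_mps.cpu())
```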

FP32 output on CPU:

(tensor([[[-6.5419e+02, -8.7080e+01],
[ 1.2814e+02, -1.7165e+02]],
[[-1.3241e+03, -1.7267e+02],
[ 1.2814e+02, -1.7165e+02]],
[[-1.4078e+03, -3.3899e+02],
[-2.6367e-02, -3.5078e+00]]], grad_fn=<ViewBackward0>), tensor([[[1.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
[3.0850e-09, 1.0000e+00, 0.0000e+00, 1.8921e-10],
[0.0000e+00, 1.0000e+00, 1.0000e+00, 0.0000e+00]],
[[0.0000e+00, 2.0000e+00, 0.0000e+00, 0.0000e+00],
[0.0000e+00, 2.0000e+00, 0.0000e+00, 0.0000e+00],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]]],
grad_fn=<MeanBackward1>))

FP32 output on MPS:

(tensor([[[-2.9954e+02, -5.9902e+02],
[-2.6367e-02, -3.5078e+00]],
[[-1.3241e+03, -1.7267e+02],
[-9.9200e+01, -2.0069e+02]],
[[-1.3241e+03, -1.7267e+02],
[-9.2043e+02, -3.0561e+02]]], device='mps:0', grad_fn=<ViewBackward0>), tensor([[[0.0000e+00, 1.0000e+00, 0.0000e+00, 0.0000e+00],
[0.0000e+00, 1.0000e+00, 0.0000e+00, 1.8921e-10],
[0.0000e+00, 1.0000e+00, 0.0000e+00, 0.0000e+00]],
[[0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],
[0.0000e+00, 1.0000e+00, 0.0000e+00, 0.0000e+00],
[0.0000e+00, 1.0000e+00, 1.0000e+00, 0.0000e+00]]], device='mps:0',
grad_fn=<MeanBackward1>))
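
For the scaled_dot_product_attention case, a similar sketch can be used; the shapes and seed here are again assumptions, and the actual failing inputs are in the second linked job:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Assumed (batch, heads, target_len, head_dim) query and
# (batch, heads, source_len, head_dim) key/value shapes.
q = torch.randn(2, 2, 3, 4)
k = torch.randn(2, 2, 4, 4)
v = torch.randn(2, 2, 4, 4)

out_cpu = F.scaled_dot_product_attention(q, k, v)
out_mps = F.scaled_dot_product_attention(q.to("mps"), k.to("mps"), v.to("mps"))

# Expected to match within FP32 tolerance on a correct MPS build.
torch.testing.assert_close(out_cpu, out_mps.cpu())
```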

Versions

PyTorch version: 2.2.0a0+git5fa0c13
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: CentOS Linux release 8.5.2111 (x86_64)
GCC version: (GCC) 11.2.1 20210728 (Red Hat 11.2.1-1)
Clang version: 16.0.0 (Red Hat 16.0.0-2.module_el8+405+25122a8c)
CMake version: version 3.21.4
Libc version: glibc-2.28

Python version: 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-5.16.0-rc1-intel-next-00543-g5867b0a2a125-x86_64-with-glibc2.10
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] flake8==3.8.2
[pip3] flake8-bugbear==20.1.4
[pip3] flake8-coding==1.3.3
[pip3] flake8-comprehensions==3.3.0
[pip3] flake8-executable==2.0.4
[pip3] flake8-pyi==20.5.0
[pip3] intel-extension-for-pytorch==2.2.0+gite7090c6
[pip3] mypy==1.4.1
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.22.4
[pip3] onnx==1.14.1
[pip3] onnxruntime==1.15.1
[pip3] onnxscript==0.1.0.dev20230830
[pip3] torch==2.2.0a0+git29048be
[pip3] torchvision==0.16.0a0+fb115c2
[pip3] triton==2.0.0
[conda] intel-extension-for-pytorch 2.2.0+gite7090c6 dev_0
[conda] mkl 2022.1.0 hc2b9512_224
[conda] mkl-include 2023.1.0 pypi_0 pypi
[conda] mkl-static 2023.1.0 pypi_0 pypi
[conda] numpy 1.22.4 pypi_0 pypi
[conda] torch 2.2.0a0+git29048be dev_0
[conda] torchvision 0.16.0a0+fb115c2 dev_0
[conda] triton 2.0.0 pypi_0 pypi

cc @kulinseth @albanD @malfet @DenisVieriu97 @jhavukainen @razarmehr @abhudev

    Labels

    module: correctness (silent) - issue that returns an incorrect result silently
    module: mps - Related to Apple Metal Performance Shaders framework
    triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module