
ROCm context parallel backward lse not scaled #163958

@vexilligera


🐛 Describe the bug

Based on issue #156012 and PR #156903:

The fix patched the forward pass but did not patch the backward pass. To patch the backward pass, add

logsumexp /= 0.6931471805599453

at
https://github.com/ROCm/pytorch/blob/cfa0de7c5151cfd4d036b2b4ee6d35a37bd7a983/torch/distributed/tensor/experimental/_attention.py#L498
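For context, 0.6931471805599453 is ln 2, so the division converts a natural-log logsumexp into its base-2 equivalent (ln x / ln 2 = log2 x), presumably mirroring what the forward-pass fix in #156903 does for the ROCm kernel's convention. A minimal sketch of what such a rescale could look like, assuming a hypothetical helper name and gating on torch.version.hip to detect ROCm builds (illustrative only, not the actual patch):

```python
import torch

LN2 = 0.6931471805599453  # natural log of 2

def _rescale_lse_for_rocm_backward(logsumexp: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: on ROCm builds, convert the natural-log logsumexp
    # saved by the forward pass into base-2 form before the backward call,
    # since ln(x) / ln(2) == log2(x). On other builds, return it unchanged.
    if torch.version.hip is not None:
        return logsumexp / LN2
    return logsumexp

# Quick sanity check on the constant itself: dividing a natural-log
# logsumexp by ln(2) reproduces the base-2 logsumexp.
x = torch.randn(4, 8)
lse_nat = torch.logsumexp(x, dim=-1)
assert torch.allclose(lse_nat / LN2, torch.log2(x.exp().sum(dim=-1)))
```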

Versions

Before patching, the gradient difference is on the order of 1e-1; after patching, it is roughly 1e-7.

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

Labels: module: rocm, triaged
