[CUDA] fix nansum in non-JIT build by pzzp · Pull Request #158633 · pytorch/pytorch · GitHub

Conversation

@pzzp
Contributor

@pzzp pzzp commented Jul 18, 2025

This change fixes a crash in the following snippet:
```
import torch
a = torch.tensor([[1, 2]], dtype=torch.complex32).to('cuda')
b = torch.nansum(a, dim=0)
print(b)
```
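
As an aside (not part of this PR): the same reduction in complex64, where the accumulate and output types already coincide, is not expected to hit this crash, which points at the complex-half path. A contrast snippet, assuming a CUDA device is available:

```
import torch

# complex64 contrast case: scalar_t, acc_t, and out_scalar_t are all complex
# float here, so the output-type mismatch fixed by this PR cannot occur.
a64 = torch.tensor([[1, 2]], dtype=torch.complex64).to('cuda')
print(torch.nansum(a64, dim=0))  # expected: tensor([1.+0.j, 2.+0.j], device='cuda:0')
```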
@pzzp pzzp requested review from eqy and syed-ahmed as code owners July 18, 2025 09:57
@pytorch-bot

pytorch-bot bot commented Jul 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158633

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 7eb797b with merge base 32aade9:

UNSTABLE - One job is marked as unstable, possibly due to flakiness on trunk.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla

linux-foundation-easycla bot commented Jul 18, 2025

CLA Signed


The committers listed above are authorized under a signed CLA.

@malfet
Contributor

malfet commented Jul 18, 2025

@pzzp can you sign the CLA please?

@pzzp
Contributor Author

pzzp commented Jul 18, 2025

@malfet ✅️

@thenumberouscode
Contributor

Hi, I’m just curious why changing acc_t to scalar_t fixes this bug.

@pzzp
Contributor Author

pzzp commented Jul 21, 2025

@thenumberouscode
The intended flow is: read the input as scalar_t (complex half), accumulate with acc_t (complex float), and write the result as out_scalar_t (complex half). The bug was that out_scalar_t was set to the wrong type, so the output size was computed incorrectly, which resulted in unaligned memory access.

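For illustration, a minimal post-fix sanity check along these lines (assuming a CUDA device is available; this snippet is not taken from the PR's test changes):

```
import torch

# After the fix: the kernel reads complex32 (scalar_t), accumulates in complex
# float (acc_t), and writes complex32 (out_scalar_t), so the output keeps the
# input dtype and the values match a complex64 reference.
a = torch.tensor([[1, 2]], dtype=torch.complex32, device='cuda')
out = torch.nansum(a, dim=0)
ref = torch.nansum(a.to(torch.complex64), dim=0)

assert out.dtype == torch.complex32
torch.testing.assert_close(out.to(torch.complex64), ref)
print(out)
```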

@pzzp
Contributor Author

pzzp commented Jul 27, 2025

@ngimel hi, how can I merge this change?

@ngimel
Collaborator

ngimel commented Jul 28, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label Jul 28, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

yangw-dev pushed a commit that referenced this pull request Aug 1, 2025
This change fixes a crash in the following snippet:
```
import torch
a = torch.tensor([[1, 2]], dtype=torch.complex32).to('cuda')
b = torch.nansum(a, dim=0)
print(b)
```

Pull Request resolved: #158633
Approved by: https://github.com/ngimel

Labels

ciflow/trunk · Merged · open source · release notes: cuda


6 participants