Closed
Labels
module: cpu - CPU specific problem (e.g., perf, algorithm)
module: linear algebra - Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Description
🐛 Describe the bug
NaNs in the tensor passed as the `out` argument to `baddbmm` seem to persist after the op runs, even though `out` is supposedly a write-only argument.
This implies that the op reads the contents of `out`, which is wasted memory bandwidth at the very least.
It also makes it impossible to trust tensors obtained through, for example, `torch.empty` or `empty_like` when they are passed as `out`. A naive user would expect them to be fully overwritten, as in all other ops that have an `out` argument.
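The step from "NaNs survive" to "the op reads `out`" can be illustrated without PyTorch. The sketch below is plain Python and is not PyTorch's actual kernel; the function names and the `beta * out + value` update are assumptions made purely for illustration. A write-only update cannot leave a NaN behind, while any update that does arithmetic on the old contents will, because NaN multiplied by 0 is still NaN under IEEE 754.

```python
import math


def overwrite(out, values):
    """Write-only update: the old contents of `out` are never read."""
    for i, v in enumerate(values):
        out[i] = v


def scale_then_accumulate(out, values, beta):
    """Read-modify-write update (hypothetical): old contents are scaled by beta."""
    for i, v in enumerate(values):
        out[i] = beta * out[i] + v


# Stand-in for alpha * (batch1 @ batch2); the numbers are arbitrary.
product = [1.0, 2.0, 3.0]

buf = [math.nan] * 3
overwrite(buf, product)
nans_after_overwrite = sum(math.isnan(x) for x in buf)

buf = [math.nan] * 3
scale_then_accumulate(buf, product, beta=0.0)
nans_after_rmw = sum(math.isnan(x) for x in buf)

print(nans_after_overwrite, nans_after_rmw)  # 0 3
```

Whether `baddbmm` does anything like the second function internally is not established here; the point is only that a NaN left in the buffer is evidence that the old value took part in the computation.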
Minimal reproduction:
import torch

a, b, c, z = [torch.rand((3, 2, 2)) for _ in range(4)]
z[:] = torch.nan
torch.addcmul(c, a, b, out=z)
print(z.isnan().sum())  # -> tensor(0), NaNs overwritten, great

z[:] = torch.nan
torch.baddbmm(c, a, b, alpha=1, beta=0, out=z)
print(z.isnan().sum())  # -> tensor(12), `z` is all NaNs

z = c  # note: `z` now aliases `c`
z[1,1,1] = z[0,0,0] = torch.nan  # plant two NaNs
torch.baddbmm(c, a, b, alpha=1, beta=0, out=z)
print(z.isnan().sum())  # -> tensor(2), two NaNs preserved

BTW it's also annoying that torch.baddbmm(None, a, b, alpha=1, beta=0) raises TypeError; it should ignore input when beta is 0.
Anyway, the bug above does not depend on alpha or beta.
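To check other ops or other builds, the reproduction above can be folded into a small helper. This is only a sketch: `count_surviving_nans` is a hypothetical name, not a PyTorch API, and the result for `baddbmm` will depend on the installed version and backend, so no expected value is given for it.

```python
import torch


def count_surviving_nans(op, out):
    """Fill `out` with NaN, run `op(out)`, and count the NaNs left in `out`.

    `op` is any callable that writes its result into the tensor it is given.
    A result of 0 means every element of `out` was overwritten.
    """
    out.fill_(float("nan"))
    op(out)
    return int(out.isnan().sum())


a, b, c = (torch.rand(3, 2, 2) for _ in range(3))
z = torch.empty_like(c)

# addcmul overwrites `out` completely, as the report observes.
n_addcmul = count_surviving_nans(
    lambda o: torch.addcmul(c, a, b, out=o), z
)

# baddbmm is the op under suspicion; the count is build-dependent.
n_baddbmm = count_surviving_nans(
    lambda o: torch.baddbmm(c, a, b, alpha=1, beta=0, out=o), z
)

print("addcmul:", n_addcmul)
print("baddbmm:", n_baddbmm)
```

Unlike the third case in the reproduction, `z` here is a separate tensor from `c`, so any surviving NaNs cannot be attributed to aliasing between `input` and `out`.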
Versions
Collecting environment information...
PyTorch version: 1.13.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.9.16 (main, Mar 1 2023, 18:22:10) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Vendor ID: AuthenticAMD
CPU family: 25
Model: 80
Model name: AMD Ryzen 9 5900HS with Radeon Graphics
Stepping: 0
CPU MHz: 3293.727
BogoMIPS: 6587.45
Virtualization: AMD-V
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 256 KiB
L1i cache: 256 KiB
L2 cache: 4 MiB
L3 cache: 16 MiB
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr arat npt nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload umip vaes vpclmulqdq rdpid fsrm
Versions of relevant libraries:
[pip3] torch==1.13.1
[conda] blas 1.0 mkl
[conda] cpuonly 2.0 0 pytorch
[conda] mkl 2023.0.0 h6d00ec8_25399
[conda] pytorch 1.13.1 py3.9_cpu_0 pytorch
[conda] pytorch-mutex 1.0 cpu pytorch
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @Lezcano