Open
Labels
oncall: jit (Add this issue/PR to JIT oncall triage queue)
Description
🐛 Bug
A traced model seems to handle int8 overflow differently on the GPU: when int8 tensors are added or subtracted and the result is converted to int32, the CUDA output does not match the CPU output.
To Reproduce
import torch
import torch.nn as nn

class AddSubNet(nn.Module):
    def __init__(self, *args):
        # args[0] is a pair of output dtypes, one per returned tensor.
        self.torch_output0_dtype = args[0][0]
        self.torch_output1_dtype = args[0][1]
        super(AddSubNet, self).__init__()

    def forward(self, input0, input1):
        # Add/subtract in the input dtype (int8), then cast the results.
        return (input0 + input1).to(self.torch_output0_dtype), \
            (input0 - input1).to(self.torch_output1_dtype)

device = 'cpu'
model = AddSubNet((torch.int32, torch.int32)).to(device)
x1 = torch.randint(-127, 128, (16,)).to(torch.int8).to(device)
x2 = torch.randint(-127, 128, (16,)).to(torch.int8).to(device)
model_cpu = torch.jit.trace(model, (x1, x2))
print('input')
print(x1)
print(x2)
out1_cpu, out2_cpu = model_cpu(x1, x2)
print('cpu output')
print(out1_cpu)
print(out2_cpu)
device = 'cuda'
model.to(device)
x1 = x1.to(device)
x2 = x2.to(device)
model_gpu = torch.jit.trace(model, (x1, x2))
#print(model_gpu.graph)
out1, out2 = model_gpu(x1, x2)
print('cuda output')
print(out1)
print(out2)
Output:
input
tensor([ -14, 127, -24, 9, -115, -24, -102, -5, 5, 93, 45, -69,
-74, 46, 109, -90], dtype=torch.int8)
tensor([ 32, -46, 13, 78, 109, -84, -104, 76, 29, -97, -90, 73,
17, -105, 34, 117], dtype=torch.int8)
cpu output
tensor([ 18, 81, -11, 87, -6, -108, 50, 71, 34, -4, -45, 4,
-57, -59, -113, 27], dtype=torch.int32)
tensor([ -46, -83, -37, -69, 32, 60, 2, -81, -24, -66, -121, 114,
-91, -105, 75, 49], dtype=torch.int32)
cuda output
tensor([ 18, 81, -11, 87, -6, -108, -206, 71, 34, -4, -45, 4,
-57, -59, 143, 27], device='cuda:0', dtype=torch.int32)
tensor([ -46, 173, -37, -69, -224, 60, 2, -81, -24, 190, 135, -142,
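-91, 151, 75, -207], device='cuda:0', dtype=torch.int32)
The mismatches look like overflowing values (e.g. 50 vs. -206). Every mismatched pair differs by exactly 256: the CPU result stays within the int8 range, while the CUDA result equals the un-wrapped sum or difference (e.g. -102 + -104 = -206).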
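The relationship between the two outputs can be checked without a GPU. The following is a minimal sketch in plain Python, using only the values printed above; `wrap_int8` is a hypothetical helper introduced here for illustration and is not part of PyTorch. It assumes two's-complement wraparound for int8.

```python
def wrap_int8(value):
    """Wrap a Python int into the int8 range [-128, 127] (two's complement)."""
    return (value + 128) % 256 - 128


# Inputs and outputs copied from the report above.
x1 = [-14, 127, -24, 9, -115, -24, -102, -5, 5, 93, 45, -69, -74, 46, 109, -90]
x2 = [32, -46, 13, 78, 109, -84, -104, 76, 29, -97, -90, 73, 17, -105, 34, 117]

cpu_sum = [18, 81, -11, 87, -6, -108, 50, 71, 34, -4, -45, 4, -57, -59, -113, 27]
cpu_diff = [-46, -83, -37, -69, 32, 60, 2, -81, -24, -66, -121, 114, -91, -105, 75, 49]
cuda_sum = [18, 81, -11, 87, -6, -108, -206, 71, 34, -4, -45, 4, -57, -59, 143, 27]
cuda_diff = [-46, 173, -37, -69, -224, 60, 2, -81, -24, 190, 135, -142, -91, 151, 75, -207]

# Arithmetic carried out in a wide integer type (no int8 wraparound).
wide_sum = [a + b for a, b in zip(x1, x2)]
wide_diff = [a - b for a, b in zip(x1, x2)]

# The CUDA output equals the wide result ...
cuda_matches_wide = (wide_sum == cuda_sum) and (wide_diff == cuda_diff)
# ... and the CPU output equals the same result wrapped into int8.
cpu_matches_wrapped = (
    [wrap_int8(v) for v in wide_sum] == cpu_sum
    and [wrap_int8(v) for v in wide_diff] == cpu_diff
)

print("CUDA output == un-wrapped arithmetic:", cuda_matches_wide)
print("CPU output == int8-wrapped arithmetic:", cpu_matches_wrapped)
```

Both checks hold for the values above, which suggests the CUDA path performs the arithmetic in a wider type before the cast, whereas the CPU path wraps in int8 first.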
Environment
PyTorch version: 1.11.0.dev20211019+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.3
Libc version: glibc-2.31
Reproduced on different GPUs.
Additional information
Enabling nvfuser via:
torch._C._jit_set_nvfuser_enabled(True)
torch._C._jit_set_texpr_fuser_enabled(False)
torch._C._jit_set_profiling_executor(True)
torch._C._jit_set_profiling_mode(True)
torch._C._jit_override_can_fuse_on_cpu(False)
torch._C._jit_override_can_fuse_on_gpu(False)
torch._C._jit_set_bailout_depth(20)
yields matching values.
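To narrow down which execution path deviates, it may help to compare the traced module against eager mode on each device. The following is only a sketch: `AddSub` is a reduced stand-in for the module in the report, and `eager_matches_traced` and its `warmup` parameter are hypothetical names introduced here. It assumes that fusion may only kick in after a few calls of the traced module, so the module is called several times before comparing.

```python
import torch
import torch.nn as nn


class AddSub(nn.Module):
    """Reduced stand-in for AddSubNet: same arithmetic, fixed int32 outputs."""

    def forward(self, a, b):
        return (a + b).to(torch.int32), (a - b).to(torch.int32)


def eager_matches_traced(model, inputs, warmup=3):
    """Return one bool per output: does the traced result equal eager mode?

    `warmup` is an assumption, not a documented requirement: the traced
    module is called several times in case optimizations apply only later.
    """
    traced = torch.jit.trace(model, inputs)
    with torch.no_grad():
        expected = model(*inputs)
        for _ in range(max(1, warmup)):
            actual = traced(*inputs)
    return [torch.equal(e, a) for e, a in zip(expected, actual)]


# Two input pairs from the report whose int8 sum overflows.
x1 = torch.tensor([-102, 109], dtype=torch.int8)
x2 = torch.tensor([-104, 34], dtype=torch.int8)

# Eager mode adds in int8 (wrapping around) and only then casts to int32.
eager_sum, _ = AddSub()(x1, x2)
print("eager sum:", eager_sum.tolist())

devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
for device in devices:
    result = eager_matches_traced(
        AddSub().to(device), (x1.to(device), x2.to(device))
    )
    print(device, "traced matches eager (sum, diff):", result)
```

A `False` entry for a device would indicate that the traced module on that device deviates from eager mode for the corresponding output.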
CC @malfet as we've discussed this issue. (Initially I thought it would be ARM-specific, but that turns out to be wrong.)