JIT vs. eager mismatches for jit.traced `int8` to `int32` casting #66930

## 🐛 Bug

A traced model that adds or subtracts `int8` tensors and casts the result to `int32` seems to return different values on the GPU than on the CPU whenever the intermediate `int8` result overflows.

## To Reproduce

```python
import torch
import torch.nn as nn

class AddSubNet(nn.Module):
    def __init__(self, output_dtypes):
        super().__init__()
        # output_dtypes: (dtype of the sum, dtype of the difference)
        self.torch_output0_dtype = output_dtypes[0]
        self.torch_output1_dtype = output_dtypes[1]

    def forward(self, input0, input1):
        # Add/subtract in the input dtype (int8), then cast each result.
        return (input0 + input1).to(self.torch_output0_dtype), \
               (input0 - input1).to(self.torch_output1_dtype)

device = 'cpu'
model = AddSubNet((torch.int32, torch.int32)).to(device)
x1 = torch.randint(-127, 128, (16,)).to(torch.int8).to(device)
x2 = torch.randint(-127, 128, (16,)).to(torch.int8).to(device)
model_cpu = torch.jit.trace(model, (x1, x2))

print('input')
print(x1)
print(x2)

out1_cpu, out2_cpu = model_cpu(x1, x2)
print('cpu output')
print(out1_cpu)
print(out2_cpu)

device = 'cuda'
model.to(device)
x1 = x1.to(device)
x2 = x2.to(device)

model_gpu = torch.jit.trace(model, (x1, x2))
#print(model_gpu.graph)

out1, out2 = model_gpu(x1, x2)
print('cuda output')
print(out1)
print(out2)
```
Output:
```
input
tensor([ -14,  127,  -24,    9, -115,  -24, -102,   -5,    5,   93,   45,  -69,
         -74,   46,  109,  -90], dtype=torch.int8)
tensor([  32,  -46,   13,   78,  109,  -84, -104,   76,   29,  -97,  -90,   73,
          17, -105,   34,  117], dtype=torch.int8)
cpu output
tensor([  18,   81,  -11,   87,   -6, -108,   50,   71,   34,   -4,  -45,    4,
         -57,  -59, -113,   27], dtype=torch.int32)
tensor([ -46,  -83,  -37,  -69,   32,   60,    2,  -81,  -24,  -66, -121,  114,
         -91, -105,   75,   49], dtype=torch.int32)
cuda output
tensor([  18,   81,  -11,   87,   -6, -108, -206,   71,   34,   -4,  -45,    4,
         -57,  -59,  143,   27], device='cuda:0', dtype=torch.int32)
tensor([ -46,  173,  -37,  -69, -224,   60,    2,  -81,  -24,  190,  135, -142,
         -91,  151,   75, -207], device='cuda:0', dtype=torch.int32)
```
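The mismatches occur exactly where the `int8` sum or difference overflows. For example, `-102 + -104 = -206` wraps to `50` in `int8` (the CPU result), while the CUDA output contains the unwrapped `-206`.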
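
The `50` vs. `-206` mismatch can be checked without PyTorch. The sketch below is plain Python, and `wrap_int8` is a hypothetical helper (not a PyTorch function) that emulates two's-complement `int8` wraparound. It uses the input values printed above: wrapping the sums and differences reproduces the CPU output, while leaving them unwrapped reproduces the CUDA output. This suggests, but does not prove, that the CUDA path skips the intermediate `int8` result before casting.

```python
# Inputs copied from the printed tensors above.
x1 = [-14, 127, -24, 9, -115, -24, -102, -5, 5, 93, 45, -69, -74, 46, 109, -90]
x2 = [32, -46, 13, 78, 109, -84, -104, 76, 29, -97, -90, 73, 17, -105, 34, 117]


def wrap_int8(value):
    """Hypothetical helper: wrap an integer into [-128, 127] (two's complement)."""
    return (value + 128) % 256 - 128


# Results computed in a wide integer type (no int8 overflow).
wide_sum = [a + b for a, b in zip(x1, x2)]
wide_diff = [a - b for a, b in zip(x1, x2)]

# Results as they would look after being stored in int8 first.
wrapped_sum = [wrap_int8(v) for v in wide_sum]
wrapped_diff = [wrap_int8(v) for v in wide_diff]

print(wrapped_sum[6], wide_sum[6])    # 50 -206
print(wrapped_diff[1], wide_diff[1])  # -83 173
```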

## Environment

```
PyTorch version: 1.11.0.dev20211019+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.3
Libc version: glibc-2.31
```
Reproduced on different GPUs.

## Additional information

Enabling `nvfuser` via:
```python
torch._C._jit_set_nvfuser_enabled(True)
torch._C._jit_set_texpr_fuser_enabled(False)
torch._C._jit_set_profiling_executor(True)
torch._C._jit_set_profiling_mode(True)
torch._C._jit_override_can_fuse_on_cpu(False)
torch._C._jit_override_can_fuse_on_gpu(False)
torch._C._jit_set_bailout_depth(20)
```
yields matching values.
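
The repro above compares traced CPU output with traced CUDA output. To check a given fuser configuration against eager mode on a single device, something like the following sketch might help. `add_then_cast` and `compare_eager_and_traced` are hypothetical names, the two input pairs are chosen so that both sums overflow `int8`, and the number of warm-up calls is an assumption rather than a documented requirement.

```python
import torch


def add_then_cast(a, b):
    # Same pattern as AddSubNet.forward: int8 add, then cast to int32.
    return (a + b).to(torch.int32)


def compare_eager_and_traced(device, warmup_calls=3):
    """Hypothetical helper: run the int8 add + cast eagerly and traced on one device."""
    # Both sums overflow int8: -102 + -104 = -206 and 109 + 34 = 143.
    a = torch.tensor([-102, 109], dtype=torch.int8, device=device)
    b = torch.tensor([-104, 34], dtype=torch.int8, device=device)

    eager = add_then_cast(a, b)

    # Skip trace's own consistency check; the comparison is done explicitly below.
    traced_fn = torch.jit.trace(add_then_cast, (a, b), check_trace=False)
    # Assumption: a few repeated calls give the JIT a chance to use an optimized graph.
    for _ in range(warmup_calls):
        traced = traced_fn(a, b)

    return {
        "eager": eager.tolist(),
        "traced": traced.tolist(),
        "match": torch.equal(eager, traced),
    }


devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
for device in devices:
    print(device, compare_eager_and_traced(device))
```

If `match` is `False` on one device only, that would narrow the mismatch down to the traced path on that device under the current fuser settings.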

CC @malfet, as we've discussed this issue. (Initially I thought it would be ARM-specific, but that turned out to be wrong.)
