TestCommonCUDA.test_dtypes_matmul_cuda fails · Issue #60443 · pytorch/pytorch · GitHub


Description

@xwang233

🐛 Bug

TestCommonCUDA.test_dtypes_matmul_cuda fails

This test failure is also seen on V100, A100, and RTX 3090 GPUs. The earliest failure was seen on 6/18/2021.

Line 5944 seems related; see #60157.

OpInfo('matmul',
       dtypes=floating_types(),
       dtypesIfCPU=all_types_and_complex(),
       dtypesIfCUDA=floating_and_complex_types_and(torch.float16,
                                                   *[torch.bfloat16] if CUDA11OrLater else []),
       dtypesIfROCM=floating_types_and(torch.half, torch.bfloat16),
       backward_dtypesIfCUDA=floating_and_complex_types_and(torch.float16),
       assert_autodiffed=True,
       sample_inputs_func=sample_inputs_matmul,
       skips=(
           # FIXME: bfloat16 backward support likely depends on CUDA11+
           # and SM53+
           SkipInfo('TestCommon', 'test_dtypes', active_if=IS_WINDOWS),
           # matmul does not correctly warn when resizing out= inputs
           SkipInfo('TestCommon', 'test_out'),
           SkipInfo('TestCommon', 'test_conj_view', device_type='cpu'),
       )),
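A plausible fix (a sketch only, not a confirmed patch) would be to advertise bfloat16 backward support under the same `CUDA11OrLater` condition that the forward `dtypesIfCUDA` entry already uses:

```python
# Sketch of a possible OpInfo change (assumes the same helper functions and
# CUDA11OrLater flag used elsewhere in common_methods_invocations.py);
# the FIXME in the skips above notes that SM53+ gating may also be needed.
backward_dtypesIfCUDA=floating_and_complex_types_and(
    torch.float16, *[torch.bfloat16] if CUDA11OrLater else []),
```

This would make the claimed backward dtypes match what the test detects on CUDA 11 builds.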

To Reproduce

Steps to reproduce the behavior:

python test/test_ops.py -v -k test_dtypes_matmul_cuda

Error message:

$ python test/test_ops.py -v -k test_dtypes_matmul_cuda
Test results will be stored in test-reports/python-unittest/.home.xwang.Developer.pytorch.test.test_ops

Running tests...
----------------------------------------------------------------------
  test_dtypes_matmul_cuda (__main__.TestCommonCUDA) ... FAIL (2.224s)

======================================================================
ERROR [2.224s]: test_dtypes_matmul_cuda (__main__.TestCommonCUDA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/xwang/Developer/pytorch/torch/testing/_internal/common_utils.py", line 1051, in wrapper
    method(*args, **kwargs)
  File "/home/xwang/Developer/pytorch/torch/testing/_internal/common_utils.py", line 1051, in wrapper
    method(*args, **kwargs)
  File "/home/xwang/Developer/pytorch/torch/testing/_internal/common_device_type.py", line 380, in instantiated_test
    result = test_fn(self, *args)
  File "/home/xwang/Developer/pytorch/torch/testing/_internal/common_device_type.py", line 354, in test_wrapper
    return test(*args, **kwargs)
  File "/home/xwang/Developer/pytorch/torch/testing/_internal/common_device_type.py", line 746, in dep_fn
    return fn(slf, device, *args, **kwargs)
  File "/home/xwang/Developer/pytorch/torch/testing/_internal/common_device_type.py", line 894, in only_fn
    return fn(self, device, *args, **kwargs)
  File "/home/xwang/Developer/pytorch/test/test_ops.py", line 166, in test_dtypes
    self.assertEqual(supported_backward_dtypes, claimed_backward_supported, msg=msg)
  File "/home/xwang/Developer/pytorch/torch/testing/_internal/common_utils.py", line 1386, in assertEqual
    super().assertEqual(x, y, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: Items in the first set but not the second:
torch.bfloat16 : Attempted to compare [set] types: Expected: {torch.complex128, torch.bfloat16, torch.float16, torch.float32, torch.float64, torch.complex64}; Actual: {torch.complex128, torch.float16, torch.float32, torch.complex64, torch.float64}.
The supported backward dtypes for matmul on cuda according to its OpInfo are
        {torch.complex128, torch.float16, torch.float32, torch.complex64, torch.float64}, but the detected supported backward dtypes are {torch.complex128, torch.bfloat16, torch.float16, torch.float32, torch.float64, torch.complex64}.
        The following backward dtypes should be added to the OpInfo: {torch.bfloat16}. 

----------------------------------------------------------------------
Ran 1 test in 2.225s

FAILED (errors=1)

Generating XML reports...
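The assertion boils down to a set comparison between the backward dtypes the OpInfo claims and the dtypes the test detects at runtime. A minimal stand-alone illustration of that comparison (using strings as stand-ins for the actual `torch.dtype` objects):

```python
# Backward dtypes matmul's OpInfo claims on CUDA (per backward_dtypesIfCUDA)
claimed = {"complex128", "float16", "float32", "complex64", "float64"}

# Backward dtypes the test detected by actually running backward
detected = {"complex128", "bfloat16", "float16", "float32",
            "float64", "complex64"}

# test_dtypes fails because detection found a dtype the OpInfo omits
missing_from_opinfo = detected - claimed
print(sorted(missing_from_opinfo))  # ['bfloat16']
```

This matches the failure text: the detected set contains torch.bfloat16, the claimed set does not.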

Expected behavior

The test should pass.

Environment

Collecting environment information...
PyTorch version: 1.10.0a0+git01e0296
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Manjaro Linux (x86_64)
GCC version: (GCC) 10.2.0
Clang version: Could not collect
CMake version: version 3.20.2
Libc version: glibc-2.33

Python version: 3.9 (64-bit runtime)
Python platform: Linux-5.10.36-2-MANJARO-x86_64-with-glibc2.33
Is CUDA available: True
CUDA runtime version: 11.3.58
GPU models and configuration: 
GPU 0: GeForce RTX 2070 SUPER
GPU 1: GeForce GTX 1070 Ti

Nvidia driver version: 460.80
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.8.2.0
/usr/lib/libcudnn_adv_infer.so.8.2.0
/usr/lib/libcudnn_adv_train.so.8.2.0
/usr/lib/libcudnn_cnn_infer.so.8.2.0
/usr/lib/libcudnn_cnn_train.so.8.2.0
/usr/lib/libcudnn_ops_infer.so.8.2.0
/usr/lib/libcudnn_ops_train.so.8.2.0
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.10.0a0+git01e0296
[pip3] torchvision==0.10.0a0+7d955df
[conda] Could not collect

Additional context

cc @ngimel @mruberry @VitalyFedyunin @walterddr @ptrblck

Metadata

Assignees

No one assigned

    Labels

    module: cuda (Related to torch.cuda, and CUDA support in general)
    module: tests (Issues related to tests, not the torch.testing module)
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
