Status: Closed
Labels: module: cuda, module: tests, triaged
Description
🐛 Bug
TestCommonCUDA.test_dtypes_matmul_cuda fails.
The failure is also seen on V100, A100, and RTX 3090 GPUs. The earliest observed failure was on 6/18/2021.
Line 5944 seems related to #60157:
pytorch/torch/testing/_internal/common_methods_invocations.py, lines 5939 to 5954 at af3f7a2:

```python
OpInfo('matmul',
       dtypes=floating_types(),
       dtypesIfCPU=all_types_and_complex(),
       dtypesIfCUDA=floating_and_complex_types_and(torch.float16, *[torch.bfloat16] if CUDA11OrLater else []),
       dtypesIfROCM=floating_types_and(torch.half, torch.bfloat16),
       backward_dtypesIfCUDA=floating_and_complex_types_and(torch.float16),
       assert_autodiffed=True,
       sample_inputs_func=sample_inputs_matmul,
       skips=(
           # FIXME: bfloat16 backward support likely depends on CUDA11+
           # and SM53+
           SkipInfo('TestCommon', 'test_dtypes', active_if=IS_WINDOWS),
           # matmul does not correctly warn when resizing out= inputs
           SkipInfo('TestCommon', 'test_out'),
           SkipInfo('TestCommon', 'test_conj_view', device_type='cpu'),
       )),
```
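Since the `dtypesIfCUDA` line already gates `torch.bfloat16` on `CUDA11OrLater`, one plausible fix is to gate the backward dtypes the same way. This is only a sketch of a possible patch, not a confirmed fix:

```python
# Hypothetical patch sketch: gate bfloat16 in the backward dtypes the same
# way the forward dtypesIfCUDA entry already does. The SM53+ condition
# mentioned in the FIXME above may also need its own runtime check.
backward_dtypesIfCUDA=floating_and_complex_types_and(
    torch.float16, *[torch.bfloat16] if CUDA11OrLater else []),
```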
To Reproduce
Steps to reproduce the behavior:
python test/test_ops.py -v -k test_dtypes_matmul_cuda
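Independent of the test harness, a minimal standalone probe (my own sketch, not test-suite code) can confirm whether matmul's bfloat16 backward actually runs on a given GPU:

```python
# Standalone probe: does matmul backward run for a given dtype on CUDA?
import torch

def backward_runs(dtype, device="cuda"):
    a = torch.randn(2, 3, device=device, dtype=dtype, requires_grad=True)
    b = torch.randn(3, 4, device=device, dtype=dtype, requires_grad=True)
    try:
        # .abs() keeps the scalar real, so backward() needs no explicit grad
        torch.matmul(a, b).sum().abs().backward()
        return True
    except RuntimeError:
        return False

# Prints True on CUDA 11 with an SM53+ GPU, matching the "detected"
# backward dtypes reported in the error message below.
print(backward_runs(torch.bfloat16))
```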
Error message:
$ python test/test_ops.py -v -k test_dtypes_matmul_cuda
Test results will be stored in test-reports/python-unittest/.home.xwang.Developer.pytorch.test.test_ops
Running tests...
----------------------------------------------------------------------
test_dtypes_matmul_cuda (__main__.TestCommonCUDA) ... FAIL (2.224s)
======================================================================
ERROR [2.224s]: test_dtypes_matmul_cuda (__main__.TestCommonCUDA)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/xwang/Developer/pytorch/torch/testing/_internal/common_utils.py", line 1051, in wrapper
method(*args, **kwargs)
File "/home/xwang/Developer/pytorch/torch/testing/_internal/common_utils.py", line 1051, in wrapper
method(*args, **kwargs)
File "/home/xwang/Developer/pytorch/torch/testing/_internal/common_device_type.py", line 380, in instantiated_test
result = test_fn(self, *args)
File "/home/xwang/Developer/pytorch/torch/testing/_internal/common_device_type.py", line 354, in test_wrapper
return test(*args, **kwargs)
File "/home/xwang/Developer/pytorch/torch/testing/_internal/common_device_type.py", line 746, in dep_fn
return fn(slf, device, *args, **kwargs)
File "/home/xwang/Developer/pytorch/torch/testing/_internal/common_device_type.py", line 894, in only_fn
return fn(self, device, *args, **kwargs)
File "/home/xwang/Developer/pytorch/test/test_ops.py", line 166, in test_dtypes
self.assertEqual(supported_backward_dtypes, claimed_backward_supported, msg=msg)
File "/home/xwang/Developer/pytorch/torch/testing/_internal/common_utils.py", line 1386, in assertEqual
super().assertEqual(x, y, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: Items in the first set but not the second:
torch.bfloat16 : Attempted to compare [set] types: Expected: {torch.complex128, torch.bfloat16, torch.float16, torch.float32, torch.float64, torch.complex64}; Actual: {torch.complex128, torch.float16, torch.float32, torch.complex64, torch.float64}.
The supported backward dtypes for matmul on cuda according to its OpInfo are
{torch.complex128, torch.float16, torch.float32, torch.complex64, torch.float64}, but the detected supported backward dtypes are {torch.complex128, torch.bfloat16, torch.float16, torch.float32, torch.float64, torch.complex64}.
The following backward dtypes should be added to the OpInfo: {torch.bfloat16}.
----------------------------------------------------------------------
Ran 1 test in 2.225s
FAILED (errors=1)
Generating XML reports...

Expected behavior
The test should pass.
Environment
Collecting environment information...
PyTorch version: 1.10.0a0+git01e0296
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Manjaro Linux (x86_64)
GCC version: (GCC) 10.2.0
Clang version: Could not collect
CMake version: version 3.20.2
Libc version: glibc-2.33
Python version: 3.9 (64-bit runtime)
Python platform: Linux-5.10.36-2-MANJARO-x86_64-with-glibc2.33
Is CUDA available: True
CUDA runtime version: 11.3.58
GPU models and configuration:
GPU 0: GeForce RTX 2070 SUPER
GPU 1: GeForce GTX 1070 Ti
Nvidia driver version: 460.80
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.8.2.0
/usr/lib/libcudnn_adv_infer.so.8.2.0
/usr/lib/libcudnn_adv_train.so.8.2.0
/usr/lib/libcudnn_cnn_infer.so.8.2.0
/usr/lib/libcudnn_cnn_train.so.8.2.0
/usr/lib/libcudnn_ops_infer.so.8.2.0
/usr/lib/libcudnn_ops_train.so.8.2.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.10.0a0+git01e0296
[pip3] torchvision==0.10.0a0+7d955df
[conda] Could not collect