RNN Transducer Loss Autograd Test #1532
Conversation
Investigation with autograd test from carolineechen#2.
Force-pushed 65eb141 to 69029b1.
Patch: Before: After:

Patch with same result as comment above.
FIX: The fix for the numpy transducer is to not use in-place operations, as shown in 8129432:

    def backward(ctx, grad_output):
        grad_output = grad_output.view(-1, 1, 1, 1).to(ctx.grads)
        return ctx.grads.mul(grad_output), None, None, None, None, None, None, None, None

BUG: warp-transducer and warp-rnnt modify the gradient in place, which can make the backward pass non-reentrant: calling backward on the loss multiple times without calling forward again can return different gradients. The torchaudio C++ custom autograd function is fine thanks to #1507, see here.

BUG: the numpy transducer issue is different: the Jacobian-gradient product is never formed and only the stored gradient is returned (i.e. grad_output is ignored).
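To make the reentrancy point concrete, here is a minimal, self-contained sketch (a toy autograd Function, not the actual transducer bindings) contrasting the in-place pattern with the out-of-place fix; the in-place version returns a different gradient on the second backward call:

```python
import torch

class InplaceBackward(torch.autograd.Function):
    """Buggy pattern: backward scales the cached gradient in place, so a
    second backward call starts from an already-scaled ctx.grads."""

    @staticmethod
    def forward(ctx, x):
        ctx.grads = 2.0 * torch.ones_like(x)   # stand-in for precomputed RNNT gradients
        return (2.0 * x).sum()

    @staticmethod
    def backward(ctx, grad_output):
        ctx.grads.mul_(grad_output)            # in place: mutates the cached state
        return ctx.grads

class OutOfPlaceBackward(torch.autograd.Function):
    """Fixed pattern: the Jacobian-gradient product is formed out of place."""

    @staticmethod
    def forward(ctx, x):
        ctx.grads = 2.0 * torch.ones_like(x)
        return (2.0 * x).sum()

    @staticmethod
    def backward(ctx, grad_output):
        return ctx.grads.mul(grad_output)      # fresh tensor on every call

x = torch.randn(3, requires_grad=True)
v = torch.tensor(2.0)                          # non-trivial incoming gradient
for fn in (InplaceBackward, OutOfPlaceBackward):
    y = fn.apply(x)
    # snapshot the first gradient, since the buggy backward returns its internal buffer
    g1 = torch.autograd.grad(y, x, grad_outputs=v, retain_graph=True)[0].clone()
    g2 = torch.autograd.grad(y, x, grad_outputs=v)[0]
    print(fn.__name__, "reentrant:", torch.allclose(g1, g2))
# Expected: InplaceBackward is not reentrant (the second call returns twice the
# first result); OutOfPlaceBackward is.
```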
Trying to reproduce gradcheck.

gradcheck is passing. The option …
Great find! See my comment about fused_log_softmax vs. reuse_logits_for_grads set to False.
Force-pushed 80534ac to 619fd57.
Failing gradcheck means that this test will fail.
    (get_numpy_data_B2_T4_U3_D3, ),
    (get_numpy_data_B1_T2_U3_D5, ),
])
def test_RNNTLoss_gradcheck(self, data_func):
great find, and thanks for looking into this! can you add autograd tests for the functional version as well?
good point, added :)
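For illustration, such a functional-version gradcheck test might look like the following sketch. It is an assumed example, not the exact code added in this PR: the import path and keyword arguments of the loss may differ in this torchaudio version (the loss lived in a prototype namespace at the time), and the data helper is inlined here.

```python
import torch
from torch.autograd import gradcheck

# Import path assumed; in the version under review the loss may live under a
# prototype namespace rather than torchaudio.functional.
from torchaudio.functional import rnnt_loss

def test_rnnt_loss_functional_gradcheck():
    # Tiny problem matching the B1_T2_U3_D5 helper: batch=1, T=2, U=3, D=5.
    torch.manual_seed(0)
    logits = torch.rand(1, 2, 3, 5, dtype=torch.float32, requires_grad=True)
    targets = torch.randint(0, 4, (1, 2), dtype=torch.int32)
    logit_lengths = torch.tensor([2], dtype=torch.int32)
    target_lengths = torch.tensor([2], dtype=torch.int32)

    def loss(logits):
        # Only the logits are perturbed by gradcheck; the integer tensors are captured.
        return rnnt_loss(logits, targets, logit_lengths, target_lengths)

    # float32 kernel, hence the loose eps/atol (see the tolerance discussion below).
    assert gradcheck(loss, (logits,), eps=1e-3, atol=1e-3, rtol=1e-3, nondet_tol=0.)
```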
Left a minor comment but otherwise lgtm, thanks for working on this!
        if enable_all_grad:
            i.requires_grad = True
        inputs_.append(i)
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-03, rtol=1e-03, nondet_tol=0.)
can you check if atol can be reduced any further?
This needs to be documented in the test code. This looks much lower precision than other autograd tests we have in torchaudio.
can you check if atol can be reduced any further?
unfortunately, that's the lowest for atol and eps as expected for float32. rtol can be reduced to 0, but this seems too good, so let's use the default value (0.001).
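An illustrative aside (not from the PR itself): gradcheck estimates the Jacobian with central differences, and in float32 the round-off in f(x ± eps) limits how small eps, and therefore atol, can usefully be. A toy experiment on a function with a known derivative shows the effect:

```python
import math
import torch

def central_diff_error(dtype, eps):
    # Central difference of exp(x) at x = 1.0; the exact derivative is e.
    x = torch.tensor(1.0, dtype=dtype)
    h = torch.tensor(eps, dtype=dtype)
    est = (torch.exp(x + h) - torch.exp(x - h)) / (2 * h)
    return abs(est.item() - math.e)

for dtype in (torch.float32, torch.float64):
    errors = {eps: central_diff_error(dtype, eps) for eps in (1e-2, 1e-3, 1e-4, 1e-5)}
    print(dtype, errors)

# Typical outcome: in float32 the error stops improving somewhere around
# eps ~ 1e-3..1e-2 and grows again for smaller steps (round-off dominates),
# while in float64 it keeps shrinking. Hence eps = atol = 1e-3 is roughly the
# practical floor when the kernel only supports float32.
```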
This needs to be documented in the test code. This looks much lower precision than other autograd tests we have in torchaudio.
As commented here, this is due to the use of float32. Added a comment above the line.
This PR adds autograd (gradcheck) tests for the RNN transducer loss, run with reuse_logits_for_grads=False and float32. Follow-up noted below:

Keep reuse_logits_for_grads=True for now; see "remove reuse logits for grads from python interface" #1536. This is used for performance but requires mark_dirty, see "Autograd Follow-Up" vincentqb/audio#10.
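For context on the mark_dirty requirement mentioned above, a minimal toy sketch (assumed, not the torchaudio binding): when a custom autograd Function mutates one of its inputs in forward, as reusing the logits buffer for gradients would, the mutated tensor has to be registered with ctx.mark_dirty so autograd tracks the in-place change.

```python
import torch

class ScaleInPlace(torch.autograd.Function):
    """Toy Function that overwrites its input buffer in forward,
    roughly the way reusing the logits buffer for gradients would."""

    @staticmethod
    def forward(ctx, x):
        x.mul_(2.0)          # the input tensor is modified in place
        ctx.mark_dirty(x)    # required whenever forward mutates an input in place
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * 2.0

x = torch.randn(4, requires_grad=True)
# Apply to a clone: a leaf that requires grad may not be mutated in place.
y = ScaleInPlace.apply(x.clone())
y.sum().backward()
print(x.grad)  # all twos, consistent with y = 2 * x
```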