
Dynamic GRU weight #15749

@PetrochukM

🐛 Bug

For weight norm or pruning, this library needs to support dynamically updating weight tensors. My attempt to do so brought up several interesting stack traces, reproduced below.
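For context, the update pattern in the repro imitates what torch.nn.utils.weight_norm does: it deregisters the named parameter and reassigns a plain tensor before each forward pass. A minimal sketch of that usage (hedged; applying it to a cuDNN GRU is exactly the situation that runs into the errors below):

import torch

# Hedged context sketch: weight_norm splits the named parameter into
# `weight_hh_l0_g` / `weight_hh_l0_v` and re-registers `weight_hh_l0` as a
# plain tensor recomputed from them via a pre-forward hook.
gru = torch.nn.GRU(16, 16)
gru = torch.nn.utils.weight_norm(gru, name='weight_hh_l0')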

To Reproduce

Run this script:

import torch

# Constants
size = 16
batch_size = 4
seq_len = 8
device = torch.device('cuda')
input_ = torch.randn(seq_len, batch_size, size).to(device)
hidden = torch.randn(1, batch_size, size).to(device)

gru = torch.nn.GRU(size, size).to(device)

# Update weight with a `torch.tensor`
# NOTE: Similar weight update to torch.nn.utils.weight_norm
data = gru.weight_hh_l0.data
del gru._parameters['weight_hh_l0']
setattr(gru, 'weight_hh_l0', torch.tensor(data))

# Optional call to resolve parameter shapes
gru.flatten_parameters()

# Run forward pass
_, output = gru(input_, hidden)

Without gru.flatten_parameters():

UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  setattr(gru, 'weight_hh_l0', torch.tensor(data))
Traceback (most recent call last):
  File "ddd.py", line 15, in <module>
    _, output = gru(input_, hidden)
  File "/home/michaelp/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/michaelp/.local/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 179, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: num_ptrs == (num_parameters * (has_biases ? 1 : 2)) ASSERT FAILED at /pytorch/aten/src/ATen/native/cudnn/RNN.cpp:1190, please report a bug to PyTorch.
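Following the UserWarning's recommended construction gives a hedged variant of the update above; it silences the copy warning, but the pointer-count assertion presumably persists, since weight_hh_l0 is still absent from gru._parameters:

# Hedged variant per the UserWarning: clone().detach() instead of torch.tensor.
# This only removes the warning; the cuDNN assertion above is unaffected.
data = gru.weight_hh_l0.data
del gru._parameters['weight_hh_l0']
setattr(gru, 'weight_hh_l0', data.clone().detach())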

With gru.flatten_parameters():

UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  setattr(gru, 'weight_hh_l0', torch.tensor(data))
Traceback (most recent call last):
  File "ddd.py", line 14, in <module>
    gru.flatten_parameters()
  File "/home/michaelp/.local/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 113, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: MatrixRef: ArrayRef size 3 not divisible by stride 4
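The "ArrayRef size 3 not divisible by stride 4" message lines up with the parameter count: deleting weight_hh_l0 from _parameters leaves three per-layer tensors where flatten_parameters expects four. A hedged diagnostic:

# Hedged diagnostic: the reassigned weight lives on the module as a plain
# attribute, not in _parameters, so flatten_parameters sees 3 tensors, not 4.
print(list(gru._parameters.keys()))
# e.g. ['weight_ih_l0', 'bias_ih_l0', 'bias_hh_l0']  -- weight_hh_l0 missing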

Expected behavior

That I can update the GRU weight with a new torch.tensor without an error.
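Until then, a hedged stopgap is to re-register the updated tensor as an nn.Parameter, which restores the four-per-layer count. This defeats the point for weight norm, though, where the weight must remain a plain tensor recomputed each forward:

# Hedged workaround: wrapping in nn.Parameter puts the tensor back into
# _parameters, so flatten_parameters and the cuDNN path see 4 weights again.
gru.weight_hh_l0 = torch.nn.Parameter(data.clone())
gru.flatten_parameters()
_, output = gru(input_, hidden)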

Environment

Collecting environment information...
PyTorch version: 1.0.0
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 18.04.1 LTS
GCC version: (Ubuntu 7.3.0-16ubuntu3) 7.3.0
CMake version: version 3.10.2

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Tesla P100-PCIE-16GB
GPU 1: Tesla P100-PCIE-16GB
GPU 2: Tesla P100-PCIE-16GB
GPU 3: Tesla P100-PCIE-16GB

Nvidia driver version: 390.30
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.1.3

Versions of relevant libraries:
[pip] Could not collect
[conda] Could not collect
