Add MKL-DNN Tensor #17748
Conversation
@bddppq @gchanan @dzhulgakov @apaszke @cpuhrsch This is the PR for the MKL-DNN tensor for C10. Your feedback is welcome.
This addresses pytorch#16038.
1. A new layout "Mkldnn" is added.
2. An MKL-DNN tensor is added with the new "Mkldnn" layout, similar to the sparse tensor. For now it only supports the CPU device (with tensor id MkldnnCPU) and the fp32 data type. MKLDNNTensorImpl, a subclass of TensorImpl, is added accordingly, similar to SparseTensorImpl. Internally, MKLDNNTensorImpl delegates its functionality to an IDEEP tensor.
3. On the PyTorch frontend, conversion methods "to_mkldnn" and "to_plainfmt" are added to convert between a strided CPU tensor and an MKL-DNN tensor, similar to the "to_sparse" and "to_dense" calls. The corresponding backward conversions are also added. A simple test case covers the conversion; a sketch of the resulting API is shown below.
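A minimal sketch of the round trip, assuming a PyTorch build with MKL-DNN enabled (the reverse conversion ended up exposed as to_dense() rather than the earlier to_plainfmt() name used above):

```python
import torch

x = torch.randn(4, 3, 224, 224)   # strided CPU tensor, fp32
y = x.to_mkldnn()                 # strided -> "Mkldnn" layout
z = y.to_dense()                  # "Mkldnn" -> strided CPU tensor
assert torch.equal(x, z)          # values survive the round trip
```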
SparseTensorImpl isn't in c10, why does MKLDNNTensorImpl need to be?
@gchanan Thanks for the question. Since both the ATen and Caffe2 backends need MKL-DNN optimization and are merging into c10, it would be beneficial to let them share the same MKL-DNN tensor definition, the same way CPU/GPU tensors are being unified (into c10::TensorImpl). That's why the c10 folder is a better place to hold MKLDNNTensorImpl.
2. Do not support size(), sizes() and dim() in MKLDNNTensorImpl; query the size/dim from the IDEEP tensor directly.
…rge conflict with pytorch#17782." This reverts commit 4b53b12.
I have a general question. When would you expect and/or plan to have mkldnn fully available in pytorch?
@ykim362 That depends on how you define "fully available". On the caffe2 backend, the 2D CNN features are complete. In case you have further questions, @jgong5 has the answer.
Thanks for the answer @mingfeima!
@pytorchbot rebase this please
Is this currently useful to expose to the eager frontend? Are there ops you can string together where converting from/to the layout inside the op isn't sufficient?
…e opaque handle design.
Conflicts: aten/src/ATen/gen.py
@gchanan Thanks for the question. Yes, it is important to the performance of the eager frontend. It avoids the conversion overhead of activations and weights in ops related to both CNNs and RNNs. As an example, the conversion overhead of activations accounts for 20%-50% of the training workload with ResNet-50 and UNet3D. The weight cache can speed up a particular LSTM inference workload by 1.2X to 5-6X (see #15914).
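To make the call pattern concrete, a hypothetical sketch of what exposing the layout enables; the MKL-DNN-layout ops themselves only land in the follow-up PRs, so the middle of the chain is indicated in comments:

```python
import torch

x = torch.randn(32, 64, 56, 56)   # e.g. an activation in a CNN

# Without frontend MKL-DNN tensors, each op reorders at its boundaries:
#   strided -> mkldnn -> compute -> strided, repeated for every op.
# With them, the reorder cost is paid once per chain:
y = x.to_mkldnn()                 # single reorder on entry
# ... chain of mkldnn-layout ops here (enabled by the follow-up PRs) ...
z = y.to_dense()                  # single reorder on exit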
…into mkldnn_tensor_dev
@bddppq has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Conflicts:
	aten/src/ATen/native_parse.py
	c10/core/TensorImpl.cpp
	c10/core/TensorImpl.h
Main changes:
1) Introduce OpaqueTensorImpl, which is like SparseTensorImpl without the sparse-specific code, and which holds a templatized OpaqueHandler.
2) OpaqueTensorImpl isn't forced to hold an intrusive_ptr_target; instead, introduce an IntrusivePtrTargetWrapper internal to MKLDNN.
3) Nothing in c10.
4) Add a test for detach.
5) Assert there's no extra twin in an ideep tensor.
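A self-contained toy of the OpaqueTensorImpl idea, in Python with illustrative names rather than the actual C++ definitions: keep only metadata plus an opaque handle, and refuse the storage-backed queries a strided impl would answer.

```python
class ToyOpaqueTensorImpl:
    """Illustrative stand-in for OpaqueTensorImpl; not the real C++ class."""

    def __init__(self, sizes, dtype, device, opaque_handle):
        self._sizes = tuple(sizes)    # metadata is queryable but fixed
        self._dtype = dtype
        self._device = device
        self._handle = opaque_handle  # e.g. an ideep::tensor in the MKL-DNN case

    def sizes(self):
        return self._sizes

    def dim(self):
        return len(self._sizes)

    def is_contiguous(self):
        return False  # "contiguous" is not well-defined for an opaque layout

    def strides(self):
        raise RuntimeError("opaque tensor impls do not carry strides")

    def data_ptr(self):
        raise RuntimeError("no strided storage; use the opaque handle instead")

    def unsafe_opaque_handle(self):
        return self._handle
```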
@gchanan I noticed you disabled the access to data_ptr.
@jgong5 data_ptr is in the storage and an opaque TensorImpl does not have storage, so I don't think we will turn the access to data_ptr back on.
@bddppq I planned to expose the MKL-DNN tensor in a way that supports element-wise operations on its buffer. Take C10D as an example: I hoped to pass an MKL-DNN tensor to C10D directly without extra format conversion. Inside C10D, it uses data_ptr to operate on the raw buffer.
@jgong5 Hmm, but that piece of code in c10 assumes the underlying storage is contiguous; I don't think that holds for an opaque TensorImpl. A better option would be to define an interface for remote serialization of tensors and update c10d to use that.
@bddppq Yes, I realized that point, but this is benign for element-wise operations on MKL-DNN tensors as long as they have the same format. I agree with you that abstracting out an interface for opaque tensors would be more general. But how can we handle model code that uses data_ptr directly?
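A small illustration (a numpy stand-in, not the real c10d path) of why element-wise ops on raw buffers are safe when both operands share the same opaque format: the layout amounts to the same permutation of elements on both sides, so pointwise math commutes with it.

```python
import numpy as np

perm = np.random.permutation(8)   # stand-in for an opaque blocked layout
a = np.arange(8.0)
b = np.arange(8.0, 16.0)
a_opaque = a[perm]                # both tensors stored in the same "format"
b_opaque = b[perm]

# Pointwise add over the raw (permuted) buffers equals the logical result
# laid out in that same format:
assert np.array_equal(a_opaque + b_opaque, (a + b)[perm])
```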
Summary: This is a minimalist PR to add MKL-DNN tensor per discussion from GitHub issue: pytorch/pytorch#16038. Ops with MKL-DNN tensor will be supported in follow-up PRs to speed up the imperative path.
Pull Request resolved: pytorch/pytorch#17748
Reviewed By: dzhulgakov
Differential Revision: D14614640
Pulled By: bddppq
fbshipit-source-id: c58de98e244b0c63ae11e10d752a8e8ed920c533
I think we could enable data_ptr. But to @bddppq's point, is there a specific use case you have code to support if data_ptr were enabled now?
cpu2mkldnn:
cpu_tensor.to_mkldnn()
mkldnn2cpu:
mkldnn_tensor.to_dense()
get ideep::tensor from at::tensor:
get_mkldnn_itensor(atensor)
this commit duplicates work from pytorch#17748
so that subsequent work won't be blocked; code in ATen is also re-organized,
making it more "ATen".
This is a minimalist PR to add MKL-DNN tensor per discussion from GitHub issue #16038.
Changes include:
- An opaque handle is added to TensorImpl, i.e. TensorImpl::opaque_handle_. It allows a customized data layout other than the default strided layout. Storage is initialized, and it is valid to get the data pointer and range via data() and numel(). Metadata like device, dtype and dims can be queried like a normal tensor with strided layout but cannot be changed. Therefore allow_tensor_metadata_change() always returns false and set_allow_tensor_metadata_change() is a no-op. strides() is not supported. Calling any method that changes the metadata or storage will fail. Calling is_contiguous() always returns false, since "contiguous" is not well-defined for a customized layout.
- An OpaqueHandle template class is added, templated on an arbitrary opaque handle class.
- The MKL-DNN tensor is implemented as OpaqueHandle<ideep::tensor> plugged into TensorImpl::opaque_handle_. For now it only supports the CPU device (with tensor id MkldnnCPU) and the fp32 data type.
- Ops with MKL-DNN tensors will be supported in follow-up PRs to speed up the imperative path.
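A short sketch exercising the behaviors listed above (assumes a PyTorch build with MKL-DNN support; exact behaviors and error messages may differ across versions):

```python
import torch

x = torch.randn(2, 3).to_mkldnn()

print(x.size(), x.dtype, x.device)   # metadata queries work as usual
print(x.is_contiguous())             # False per the description above

try:
    x.stride()                       # strides are not supported
except RuntimeError as err:
    print("stride() failed as expected:", err)

y = x.to_dense()                     # back to a strided CPU tensor
```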