Add MKL-DNN Tensor by jgong5 · Pull Request #17748 · pytorch/pytorch · GitHub

Conversation

@jgong5
Collaborator

@jgong5 jgong5 commented Mar 7, 2019

This is a minimalist PR to add an MKL-DNN tensor, per the discussion in GitHub issue #16038

Changes include:

  1. An opaque handle is added to TensorImpl, i.e. TensorImpl::opaque_handle_. It allows a customized data layout other than the default strided layout. Storage is initialized, and it is valid to get the data pointer and range via data() and numel(). Metadata like device, dtype and dims can be queried like a normal tensor with strided layout but cannot be changed. Therefore allow_tensor_metadata_change() always returns false and set_allow_tensor_metadata_change() is a no-op. strides() is not supported. Calling any method that changes the metadata or storage fails. is_contiguous() always returns false since "contiguous" is not well-defined for a customized layout.
  2. An OpaqueHandle template class is added, templated on an arbitrary opaque handle class.
  3. A new "Mkldnn" opaque layout is added.
  4. The MKL-DNN tensor is added with the new "Mkldnn" layout via an OpaqueHandle<ideep::tensor> plugged into TensorImpl::opaque_handle_. For now it only supports the CPU device (with tensor id MkldnnCPU) and the fp32 data type.
  5. On the PyTorch frontend, a conversion method "to_mkldnn" is added to convert a strided CPU tensor to an MKL-DNN tensor, similar to the "to_sparse" call. "to_dense" is reused to convert back. Corresponding backward conversions are also added.
  6. A simple test case is added to test the conversion and the query of tensor device, dtype and dim info (see the usage sketch after this description).

Ops with MKL-DNN tensor will be supported in follow-up PRs to speed up the imperative path.
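
A minimal usage sketch of the conversions described in items 5 and 6, written against this PR's intended frontend API (the exact behaviour of the landed code may differ slightly):

  import torch

  x = torch.randn(2, 3)          # strided (dense) CPU fp32 tensor
  y = x.to_mkldnn()              # convert to the opaque MKL-DNN layout

  # Metadata can be queried like a normal tensor, but not changed.
  print(y.device, y.dtype, y.size())

  z = y.to_dense()               # convert back to a strided CPU tensor
  print(torch.equal(x, z))       # round-trip preserves the values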

@jgong5
Collaborator Author

jgong5 commented Mar 7, 2019

@Jianhui-Li @bddppq @gchanan @dzhulgakov @apaszke @cpuhrsch This is the PR for the MKL-DNN tensor for C10. Your feedback is welcome.

jgong5 added 2 commits March 8, 2019 06:46
pytorch#16038

1. A new layout "Mkldnn" is added.
2. MKL-DNN tensor is added with a new "Mkldnn" layout, similar to the Sparse tensor. For now it only supports the CPU device (with tensor id MkldnnCPU) and the fp32 data type. MKLDNNTensorImpl, a subclass of TensorImpl, is added accordingly, similar to SparseTensorImpl. Internally, MKLDNNTensorImpl delegates its functionality to an IDEEP tensor.
3. On the PyTorch frontend, conversion methods "to_mkldnn" and "to_plainfmt" are added to convert between a strided CPU tensor and an MKL-DNN tensor, similar to the "to_sparse" and "to_dense" calls. Corresponding backward conversions are also added. A simple test case is added to test the conversion.
@gchanan
Contributor

gchanan commented Mar 7, 2019

MKLDNNTensorImpl should be put under c10 folder but since it would require c10 build including IDEEP, we will do it later as well.

SparseTensorImpl isn't in c10, why does MKLDNNTensorImpl need to be?

@jgong5
Collaborator Author

jgong5 commented Mar 8, 2019

MKLDNNTensorImpl should be put under c10 folder but since it would require c10 build including IDEEP, we will do it later as well.

SparseTensorImpl isn't in c10, why does MKLDNNTensorImpl need to be?

@gchanan Thanks for the question. Since both the ATen and Caffe2 backends need MKL-DNN optimization and are merging into c10, it would be beneficial to allow them to share the same MKL-DNN tensor definition, the same way CPU/GPU tensors are being unified (into c10::TensorImpl). That's why the c10 folder is a better place to hold MKLDNNTensorImpl.

jgong5 added 4 commits March 8, 2019 23:12
2. Do not support size(), sizes() and dim() in MKLDNNTensorImpl; query the size/dim from the IDEEP tensor directly.
@ykim362

ykim362 commented Mar 8, 2019

I have a general question. When do you expect and/or plan to have MKL-DNN fully available in PyTorch?

@mingfeima
Collaborator

@ykim362 That depends on how you define "fully".

On the Caffe2 backend, the 2D CNN features are complete. In case you have further questions, jgong5 has the answer.
On the ATen backend, conv2d is already there and the RNN is currently under review. Most of the other operators (bn, relu, pooling, and the 3D ones) are already coded, but I may still need to refactor some of them (currently ATen calls MKL-DNN directly while Caffe2 calls the ideep bridge).

@ykim362

ykim362 commented Mar 9, 2019

Thanks for the answer @mingfeima !
Yes, I am referring to the ATen backend. Would the FC layer also be included?
Thanks.

@ezyang
Contributor

ezyang commented Mar 11, 2019

@pytorchbot rebase this please

@gchanan
Contributor

gchanan commented Mar 11, 2019

is this currently useful to expose to the eager frontend? Are there ops you can string together where converting from/to the layout inside the op isn't sufficient?

@jgong5
Collaborator Author

jgong5 commented Mar 12, 2019

is this currently useful to expose to the eager frontend? Are there ops you can string together where converting from/to the layout inside the op isn't sufficient?

@gchanan Thanks for the question. Yes, it is important to the performance of the eager frontend. It avoids the conversion overhead of activations and weights in both CNN- and RNN-related ops. As an example, activation conversion overhead accounts for 20%-50% of the training workload with ResNet-50 and UNet3D. Weight caching can speed up a particular LSTM inference workload by 1.2X to 5-6X (see #15914).
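
A schematic sketch of the point about avoiding per-op conversions (illustration only; the MKL-DNN-backed ops it alludes to land in follow-up PRs, not in this one):

  import torch

  x = torch.randn(32, 3, 224, 224)

  # If the layout were hidden inside each op, every MKL-DNN-backed op would
  # convert its inputs dense -> MKL-DNN and its outputs back to dense,
  # paying the layout-conversion cost on every call.

  # With the layout exposed to eager mode, the model converts once at the
  # boundary, keeps activations (and cached weights) in the opaque layout
  # across a chain of ops, and converts back only at the end:
  y = x.to_mkldnn()
  # ... chain of MKL-DNN-backed ops on `y` (added in follow-up PRs) ...
  out = y.to_dense()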

Contributor

@facebook-github-bot facebook-github-bot left a comment

@bddppq has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

jgong5 and others added 4 commits April 9, 2019 06:54
Conflicts:
	aten/src/ATen/native_parse.py
	c10/core/TensorImpl.cpp
	c10/core/TensorImpl.h
Main changes:
1) Introduce OpaqueTensorImpl, which is like SparseTensorImpl without sparse-specific code, and which holds a templatized OpaqueHandler.
2) OpaqueTensorImpl isn't forced to hold an intrusive_ptr_target; instead, introduce IntrusivePtrTargetWrapper internal to MKLDNN.
3) Nothing in c10.
4) Add a test for detach.
5) Assert there's no extra twin in an ideep tensor.
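
A sketch of roughly what the detach test in item 4 could look like (assumed shape of the test, not the actual test code from this PR):

  import torch

  x = torch.randn(4, 5).to_mkldnn()
  d = x.detach()

  # detach should return another MKL-DNN (opaque-layout) tensor rather than
  # materializing a strided copy.
  print(d.layout)                                   # expected: torch._mkldnn
  print(torch.equal(x.to_dense(), d.to_dense()))    # expected: True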
Contributor

@facebook-github-bot facebook-github-bot left a comment

@bddppq has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jgong5
Collaborator Author

jgong5 commented Apr 9, 2019

@gchanan I noticed you disabled access to data_ptr for opaque tensors. Will you turn it on in the future? I think we can leave opaque tensors the flexibility to leverage storage, and therefore enable data_ptr as well. I did it that way in my previous changes. My intention is to avoid additional changes to client code, e.g. model code in fairseq and C10D.

@bddppq
Contributor

bddppq commented Apr 9, 2019

@jgong5 data_ptr is in the storage and the opaque TensorImpl does not have storage, so I don't think we will turn access to data_ptr back on.
How did you plan to share the data_ptr of mkldnn tensors?

@jgong5
Collaborator Author

jgong5 commented Apr 9, 2019

How did you plan to share the data_ptr of mkldnn tensors?

@bddppq I planned to expose the MKL-DNN tensor in a way that supports element-wise operations on its buffer. Take C10D as an example: I hoped to pass an MKL-DNN tensor to C10D directly without extra format conversion. Inside C10D, it uses data_ptr and numel to get the buffer for communication:
https://github.com/pytorch/pytorch/blob/master/torch/lib/c10d/ProcessGroupGloo.cpp#L1417

@bddppq
Contributor

bddppq commented Apr 9, 2019

@jgong5 Hmm, but that piece of code in c10d assumes the underlying storage is contiguous:

  auto ptr = tensor.data_ptr();
  auto size = tensor.numel() * tensor.element_size();

  // Construct unbound buffer.
  auto& context = contexts_[0];
  auto buf = context->createUnboundBuffer(ptr, size);
  buf->send(dstRank, utag);

I don't think that holds for the opaque TensorImpl. A better option would be to define an interface for remote serialization of tensors and update c10d to use that.

@jgong5
Collaborator Author

jgong5 commented Apr 9, 2019

@jgong5 Hmm, but that piece of code in c10d assumes the underlying storage is contiguous:
I don't think that holds for the opaque TensorImpl. A better option would be to define an interface for remote serialization of tensors and update c10d to use that.

@bddppq Yes, I realized that point, but this is benign for element-wise operations on MKL-DNN tensors as long as they have the same format. I agree with you that abstracting out an interface for opaque tensors would be more general.

But how can we handle the model code using data_ptr to check tensor equality?
https://github.com/pytorch/fairseq/blob/master/fairseq/modules/multihead_attention.py#L75
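
For reference, a paraphrase of the kind of buffer-identity check the linked fairseq code performs (not an exact copy of that file):

  import torch

  # Self-attention is detected by comparing the buffer addresses of
  # query/key/value via data_ptr().
  query = torch.randn(2, 8, 16)
  key = value = query                  # self-attention: all three share one buffer

  qkv_same = query.data_ptr() == key.data_ptr() == value.data_ptr()
  kv_same = key.data_ptr() == value.data_ptr()
  print(qkv_same, kv_same)             # True True in the self-attention case

  # With an opaque MKL-DNN tensor, data_ptr() is disabled by this PR, so client
  # code like this would need either a to_dense() conversion or a different
  # identity check (e.g. `key is value`).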

zdevito pushed a commit to zdevito/ATen that referenced this pull request Apr 9, 2019
Summary:
This is a minimalist PR to add an MKL-DNN tensor, per the discussion in GitHub issue: pytorch/pytorch#16038

Ops with MKL-DNN tensor will be supported in follow-up PRs to speed up the imperative path.
Pull Request resolved: pytorch/pytorch#17748

Reviewed By: dzhulgakov

Differential Revision: D14614640

Pulled By: bddppq

fbshipit-source-id: c58de98e244b0c63ae11e10d752a8e8ed920c533
@gchanan
Contributor

gchanan commented Apr 9, 2019

I think we could enable data_ptr in the future. To support that, I'd want to figure out which places are currently calling data_ptr() -- essentially you want two different APIs: "opaque data pointer" (e.g. I just want to copy over this buffer) and "data pointer where indexing is valid." At the Python level we can probably just use the same API, but some sanity at the C++ API level would be good.

But to @bddppq's point, data_ptr, numel, contiguous, etc. aren't sufficient for writing a serializer, because that approach already doesn't work for sparse. CC @VitalyFedyunin.

Is there a specific use case you have code to support if data_ptr were enabled now?

mingfeima added a commit to mingfeima/pytorch that referenced this pull request Apr 10, 2019
cpu2mkldnn:
    cpu_tensor.to_mkldnn()

mkldnn2cpu:
    mkldnn_tensor.to_dense()

get ideep::tensor from at::tensor:
    get_mkldnn_itensor(atensor)

this commit duplicates work from pytorch#17748
so that subsequent work won't be blocked; code in ATen is also re-organized,
making it more "ATen".
zhangguanheng66 pushed a commit to zhangguanheng66/pytorch that referenced this pull request May 6, 2019
Summary:
This is a minimalist PR to add an MKL-DNN tensor, per the discussion in GitHub issue: pytorch#16038

Ops with MKL-DNN tensor will be supported in follow-up PRs to speed up the imperative path.
Pull Request resolved: pytorch#17748

Reviewed By: dzhulgakov

Differential Revision: D14614640

Pulled By: bddppq

fbshipit-source-id: c58de98e244b0c63ae11e10d752a8e8ed920c533