Add MKL-DNN Tensor #17748
Conversation
@bddppq @gchanan @dzhulgakov @apaszke @cpuhrsch This is the PR for the MKL-DNN tensor for C10. Your feedback is welcome.
This addresses pytorch#16038.
1. A new layout "Mkldnn" is added.
2. An MKL-DNN tensor is added with the new "Mkldnn" layout, similar to the sparse tensor. For now it only supports the CPU device (with tensor id MkldnnCPU) and the fp32 data type. MKLDNNTensorImpl, a subclass of TensorImpl, is added accordingly, similar to SparseTensorImpl. Internally, MKLDNNTensorImpl delegates its functionality to an IDEEP tensor.
3. On the PyTorch frontend, conversion methods "to_mkldnn" and "to_plainfmt" are added to convert between a strided CPU tensor and an MKL-DNN tensor, similar to the "to_sparse" and "to_dense" calls. The corresponding backward conversions are also added. A simple test case covers the conversion; a sketch of the resulting API is shown below.
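A minimal sketch of the round trip, assuming a PyTorch build with MKL-DNN enabled (the reverse conversion ended up exposed as to_dense() rather than the earlier to_plainfmt() name used above):

```python
import torch

x = torch.randn(4, 3, 224, 224)   # strided CPU tensor, fp32
y = x.to_mkldnn()                 # strided -> "Mkldnn" layout
z = y.to_dense()                  # "Mkldnn" -> strided CPU tensor
assert torch.equal(x, z)          # values survive the round trip
```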
SparseTensorImpl isn't in c10, why does MKLDNNTensorImpl need to be?
@gchanan Thanks for the question. Since both the ATen and Caffe2 backends need MKL-DNN optimization and are merging into c10, it would be beneficial to let them share the same MKL-DNN tensor definition, the same way CPU/GPU tensors are being unified (into c10::TensorImpl). That's why the c10 folder is a better place to hold MKLDNNTensorImpl.
2. Do not support size(), sizes() and dim() in MKLDNNTensorImpl; query the size/dim from the IDEEP tensor directly.
…rge conflict with pytorch#17782." This reverts commit 4b53b12.
I have a general question. When would you expect and/or plan to have mkldnn fully available in pytorch?
@ykim362 That depends on how you define "fully available". On the caffe2 backend, the 2D CNN features are complete. In case you have further questions, @jgong5 has the answer.
Thanks for the answer @mingfeima!
@pytorchbot rebase this please
Is this currently useful to expose to the eager frontend? Are there ops you can string together where converting from/to the layout inside the op isn't sufficient?
…e opaque handle design.
Conflicts: aten/src/ATen/gen.py
@gchanan Thanks for the question. Yes, it is important to the performance of the eager frontend. It avoids the conversion overhead of activations and weights in ops related to both CNNs and RNNs. As an example, the conversion overhead of activations accounts for 20%-50% of the training workload with ResNet-50 and UNet3D. The weight cache can speed up a particular LSTM inference workload by 1.2X to 5-6X (see #15914).
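To make the call pattern concrete, a hypothetical sketch of what exposing the layout enables; the MKL-DNN-layout ops themselves only land in the follow-up PRs, so the middle of the chain is indicated in comments:

```python
import torch

x = torch.randn(32, 64, 56, 56)   # e.g. an activation in a CNN

# Without frontend MKL-DNN tensors, each op reorders at its boundaries:
#   strided -> mkldnn -> compute -> strided, repeated for every op.
# With them, the reorder cost is paid once per chain:
y = x.to_mkldnn()                 # single reorder on entry
# ... chain of mkldnn-layout ops here (enabled by the follow-up PRs) ...
z = y.to_dense()                  # single reorder on exit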
…into mkldnn_tensor_dev
@bddppq has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Conflicts:
	aten/src/ATen/native_parse.py
	c10/core/TensorImpl.cpp
	c10/core/TensorImpl.h
Main changes:
1) Introduce OpaqueTensorImpl, which is like SparseTensorImpl without the sparse-specific code, and which holds a templatized OpaqueHandler.
2) OpaqueTensorImpl isn't forced to hold an intrusive_ptr_target; instead, introduce an IntrusivePtrTargetWrapper internal to MKLDNN.
3) Nothing in c10.
4) Add a test for detach.
5) Assert there's no extra twin in an ideep tensor.
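A self-contained toy of the OpaqueTensorImpl idea, in Python with illustrative names rather than the actual C++ definitions: keep only metadata plus an opaque handle, and refuse the storage-backed queries a strided impl would answer.

```python
class ToyOpaqueTensorImpl:
    """Illustrative stand-in for OpaqueTensorImpl; not the real C++ class."""

    def __init__(self, sizes, dtype, device, opaque_handle):
        self._sizes = tuple(sizes)    # metadata is queryable but fixed
        self._dtype = dtype
        self._device = device
        self._handle = opaque_handle  # e.g. an ideep::tensor in the MKL-DNN case

    def sizes(self):
        return self._sizes

    def dim(self):
        return len(self._sizes)

    def is_contiguous(self):
        return False  # "contiguous" is not well-defined for an opaque layout

    def strides(self):
        raise RuntimeError("opaque tensor impls do not carry strides")

    def data_ptr(self):
        raise RuntimeError("no strided storage; use the opaque handle instead")

    def unsafe_opaque_handle(self):
        return self._handle
```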
@gchanan I noticed you disabled the access to data_ptr.
@jgong5 data_ptr is in the storage and an opaque TensorImpl does not have storage, so I don't think we will turn the access to data_ptr back on.
@bddppq I planned to expose the MKL-DNN tensor in a way that supports element-wise operations on its buffer. Take C10D as an example: I hoped to pass an MKL-DNN tensor to C10D directly without extra format conversion. Inside C10D, it uses data_ptr to operate on the raw buffer.
@jgong5 Hmm, but that piece of code in c10 assumes the underlying storage is contiguous; I don't think that holds for an opaque TensorImpl. A better option would be to define an interface for remote serialization of tensors and update c10d to use that.
@bddppq Yes, I realized that point, but this is benign for element-wise operations on MKL-DNN tensors as long as they have the same format. I agree with you that abstracting out an interface for opaque tensors would be more general. But how can we handle model code that uses data_ptr directly?
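A small illustration (a numpy stand-in, not the real c10d path) of why element-wise ops on raw buffers are safe when both operands share the same opaque format: the layout amounts to the same permutation of elements on both sides, so pointwise math commutes with it.

```python
import numpy as np

perm = np.random.permutation(8)   # stand-in for an opaque blocked layout
a = np.arange(8.0)
b = np.arange(8.0, 16.0)
a_opaque = a[perm]                # both tensors stored in the same "format"
b_opaque = b[perm]

# Pointwise add over the raw (permuted) buffers equals the logical result
# laid out in that same format:
assert np.array_equal(a_opaque + b_opaque, (a + b)[perm])
```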
Summary: This is a minimalist PR to add MKL-DNN tensor per discussion from GitHub issue: pytorch/pytorch#16038. Ops with MKL-DNN tensor will be supported in follow-up PRs to speed up the imperative path.
Pull Request resolved: pytorch/pytorch#17748
Reviewed By: dzhulgakov
Differential Revision: D14614640
Pulled By: bddppq
fbshipit-source-id: c58de98e244b0c63ae11e10d752a8e8ed920c533
I think we could enable data_ptr. But to @bddppq's point, is there a specific use case you have code to support if data_ptr were enabled now?
cpu2mkldnn:
cpu_tensor.to_mkldnn()
mkldnn2cpu:
mkldnn_tensor.to_dense()
get ideep::tensor from at::tensor:
get_mkldnn_itensor(atensor)
this commit duplicates work from pytorch#17748
so that subsequent work won't be blocked; code in ATen is also re-organized,
making it more "ATen".
This is a minimalist PR to add MKL-DNN tensor per discussion from GitHub issue #16038.
Changes include:
- An opaque handle is added to TensorImpl, i.e. TensorImpl::opaque_handle_. It allows a customized data layout other than the default strided layout. Storage is initialized, and it is valid to get the data pointer and range via data() and numel(). Metadata like device, dtype and dims can be queried like a normal tensor with strided layout but cannot be changed. Therefore allow_tensor_metadata_change() always returns false and set_allow_tensor_metadata_change() is a no-op. strides() is not supported. Calling any method that changes the metadata or storage will fail. Calling is_contiguous() always returns false, since "contiguous" is not well-defined for a customized layout.
- An OpaqueHandle template class is added, templated on an arbitrary opaque handle class.
- The MKL-DNN tensor is implemented as OpaqueHandle<ideep::tensor> plugged into TensorImpl::opaque_handle_. For now it only supports the CPU device (with tensor id MkldnnCPU) and the fp32 data type.
- Ops with MKL-DNN tensors will be supported in follow-up PRs to speed up the imperative path.
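A short sketch exercising the behaviors listed above (assumes a PyTorch build with MKL-DNN support; exact behaviors and error messages may differ across versions):

```python
import torch

x = torch.randn(2, 3).to_mkldnn()

print(x.size(), x.dtype, x.device)   # metadata queries work as usual
print(x.is_contiguous())             # False per the description above

try:
    x.stride()                       # strides are not supported
except RuntimeError as err:
    print("stride() failed as expected:", err)

y = x.to_dense()                     # back to a strided CPU tensor
```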