Make pin_memory/is_pinned use BackendSelect by ezyang · Pull Request #60547 · pytorch/pytorch

Conversation

ezyang (Contributor) commented Jun 23, 2021

Stack from ghstack:

These now dispatch on the optional Device argument, which specifies
what device you want to pin for. We now directly register pinned
memory implementations for CUDA specifically, eliminating the need
for extra virtual methods.

This makes it possible for other backends to override the behavior
of pinned memory, cf. #59291

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D29331881
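
To make the new surface concrete, here is a minimal usage sketch against the ATen C++ API, assuming a CUDA-enabled build; when the optional Device argument is omitted, both calls default to CUDA, so existing callers behave as before:

#include <ATen/ATen.h>

int main() {
  at::Tensor t = at::ones({3});      // ordinary pageable CPU tensor
  bool was_pinned = t.is_pinned();   // false: nothing pinned yet
  at::Tensor p = t.pin_memory();     // copy into page-locked host memory
  bool now_pinned = p.is_pinned();   // true on a CUDA-enabled build
  return (!was_pinned && now_pinned) ? 0 : 1;
}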

facebook-github-bot (Contributor) commented Jun 23, 2021

💊 CI failures summary and remediations

As of commit fc160f5 (more details on the Dr. CI page and at hud.pytorch.org/pr/60547):


  • 4/4 failures possibly* introduced in this PR
    • 1/4 non-scanned failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test2 (1/1)

Step: "Run tests"

Jul 07 20:50:59 SUMMARY: UndefinedBehaviorSanit.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in
Jul 07 20:50:59     #9 0x55c1bf2be8f2 in PyEval_EvalCode /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/ceval.c:731
Jul 07 20:50:59     #10 0x55c1bf326cd5 in run_mod /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:1025
Jul 07 20:50:59     #11 0x55c1bf328d5d in PyRun_StringFlags /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:949
Jul 07 20:50:59     #12 0x55c1bf328dbb in PyRun_SimpleStringFlags /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:445
Jul 07 20:50:59     #13 0x55c1bf329926 in run_command /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Modules/main.c:301
Jul 07 20:50:59     #14 0x55c1bf329926 in Py_Main /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Modules/main.c:749
Jul 07 20:50:59     #15 0x55c1bf263196 in main /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Programs/python.c:69
Jul 07 20:50:59     #16 0x7ffabe58483f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291
Jul 07 20:50:59     #17 0x55c1bf2f333d in _start (/opt/conda/bin/python3.6+0x1a733d)
Jul 07 20:50:59 
Jul 07 20:50:59 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in 
Jul 07 20:50:59 + retcode=1
Jul 07 20:50:59 + set -e
Jul 07 20:50:59 + return 1
Jul 07 20:50:59 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 == *-NO_AVX-* ]]
Jul 07 20:50:59 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 == *-NO_AVX2-* ]]
Jul 07 20:50:59 + '[' -n https://github.com/pytorch/pytorch/pull/60547 ']'
Jul 07 20:50:59 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 != *coverage* ]]
Jul 07 20:50:59 ++ mktemp
Jul 07 20:50:59 + DETERMINE_FROM=/tmp/tmp.WmmDc3XKdA
Jul 07 20:50:59 + file_diff_from_base /tmp/tmp.WmmDc3XKdA

2 failures not recognized by patterns:

  • Job: GitHub Actions Windows CI (pytorch-win-vs2019-cuda10-cudnn7-py3) / build — Step: Install Visual Studio 2019 toolchain
  • Job: GitHub Actions Windows CI (pytorch-win-vs2019-cpu-py3) / test (default, 1, 2, windows.4xlarge) — Step: Install Visual Studio 2019 toolchain

Preview docs built from this PR

This comment was automatically generated by Dr. CI.

ezyang added a commit that referenced this pull request Jun 23, 2021 (ghstack-source-id: 7406e46; Pull Request resolved: #60547)
ezyang requested review from bdhirsh and wconstab June 23, 2021 15:42
ezyang (Contributor Author) commented Jun 23, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang (Contributor Author) commented Jun 23, 2021

cc @sujoysaraswati

ezyang requested a review from ngimel June 23, 2021 15:42
ezyang added a commit that referenced this pull request Jun 23, 2021 (ghstack-source-id: a4db648; Pull Request resolved: #60547)
ezyang (Contributor Author) commented Jun 24, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


- func: is_pinned(Tensor self, Device? device=None) -> bool
  variants: method
  dispatch:
Contributor:

Does everything just work when we change an op from being math-composite (CompositeImplicitAutograd) to being backend-specific?

My first thought was that we'd need to add an entry to derivatives.yaml, but I see some other ops that are backend-specific but do not have a derivatives.yaml entry (e.g. aten::is_set_to), and it looks like the autograd codegen still creates dummy autograd kernels for them.

Contributor Author:

I checked the generated code for is_pinned and it looks fine:

bool is_pinned(c10::DispatchKeySet ks, const at::Tensor & self, c10::optional<at::Device> device) {
  auto& self_ = unpack(self, "self", 0);
  #ifndef NDEBUG
  c10::optional<Storage> self__storage_saved =
    self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
  c10::intrusive_ptr<TensorImpl> self__impl_saved;
  if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
  #endif
  auto _tmp = ([&]() {
    at::AutoDispatchBelowADInplaceOrView guard;
    return at::redispatch::is_pinned(ks & c10::after_autograd_keyset, self_, device);
  })();
  auto result = _tmp;
  #ifndef NDEBUG
  if (self__storage_saved.has_value())
    AT_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
  if (self__impl_saved) AT_ASSERT(self__impl_saved == self_.getIntrusivePtr());
  #endif
  TORCH_CHECK_NOT_IMPLEMENTED(!(isFwGradDefined(self)), "Trying to use forward AD with is_pinned that does not support it.");
  return result;
} 

These functions are weird because they don't have any meaningful autograd content (they don't return a Tensor). Effectively, all the generated wrapper does is exclude the autograd key and then redispatch. I think this is just the AD codegen being smart enough not to error when it sees that the function doesn't return a Tensor. cc @albanD for any further thoughts.

Collaborator:

This is exactly what happens: we only generate actual code for ops that have either one differentiable input or output:

requires_derivative = (not undifferentiable) and (len(differentiable_inputs) > 0) and (len(differentiable_outputs) > 0)

Note that if you have differentiable input/output but you don't specify anything in derivatives.yaml, it is equivalent to doing:

- name: my_fn(Tensor foo, Tensor bar, int baz) -> Tensor
  foo: not_implemented("my_fn")
  bar: not_implemented("my_fn")

SujoysGithub commented Jun 25, 2021:

@ezyang I have a question on how this would work for a non-CUDA backend. Since empty_cpu() uses a pinned-memory allocator for CUDA alone, are the other backends explicitly required to call tensor.pin_memory() afterwards?

Contributor Author:

Yes. We could fix this by changing the pin_memory argument on empty to take a Device rather than a bool, but that is a very complicated and very BC-breaking change without very careful handling, so you shouldn't count on it happening in the near future.
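
A minimal sketch of the two-step pattern this implies, written against the ATen C++ API; the comment about the optional Device argument is an assumption about how a backend would use the new schema, not something this PR mandates:

#include <ATen/ATen.h>

// empty(..., pin_memory=true) only pins for CUDA, so a non-CUDA backend's
// user first allocates pageable CPU memory, then pins it explicitly.
at::Tensor make_pinned_host_buffer() {
  at::Tensor t = at::empty({1024});  // pageable CPU allocation
  // pin_memory() defaults to pinning for CUDA; after this PR a backend
  // that registers its own pin_memory kernel can in principle be selected
  // via the optional Device argument instead.
  return t.pin_memory();
}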


TORCH_LIBRARY_IMPL(aten, BackendSelect, m) {
  ${backend_select_function_registrations};
  m.impl("is_pinned", is_pinned);
Contributor:

It looks like the reason you hardcoded these is that the codegen only generates backend-select kernels for ops with a TensorOptions arg. Is your thinking that manually writing the code here is worth avoiding the confusion of having people pass in a TensorOptions arg that ignores everything but the device?

I could see that, but probably worth an explicit comment at least

Contributor Author:

Yeah, I much prefer only a device argument; all the other stuff is completely unnecessary. It honestly didn't occur to me that anyone would possibly want to do this, which is why there's no comment.

Contributor:

ok fair 😛

bool is_pinned(const Tensor& self, c10::optional<at::Device> device) {
  // Only CPU tensors can be pinned
  if (!self.is_cpu()) {
    return false;
Contributor:

Kinda confused by this. Won't this mean that if you call is_pinned(my_tensor, kCUDA), we'll just short-circuit here and return false instead of dispatching into is_pinned_cuda?

Also, if you want any short-circuiting here for not-allowed devices, isn't that redundant to the is_pinned_default kernel?

Contributor:

oh, do we need this because of static dispatch? We want is_pinned(_, kCPU) to return false, not throw an error. Although another option would be to register is_pinned_default to both CPU and CompositeExplicitAutograd - not sure if that's better though.

Contributor Author:

Look carefully: the test is on self.is_cpu(), not device. If you pass a CUDA tensor to is_pinned, we always return false, because pinning means taking CPU memory and putting it in a place where the CUDA driver can read it out directly. So in your example, as long as my_tensor is CPU, we will not short-circuit here.

Contributor:

bleh yeah, that makes sense. thanks!
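
For orientation, here is a minimal sketch of how a BackendSelect kernel of this shape can work: compute a dispatch key from the optional device and redispatch. This is an illustration written against the c10 dispatcher, not a verbatim copy of the PR; in particular, using computeDispatchKey and the redispatch entry point this way is an assumption.

// Sketch only; helper usage here is an assumption, not the PR's exact code.
bool is_pinned(const at::Tensor& self, c10::optional<at::Device> device) {
  // Only CPU tensors can ever be pinned: pinning page-locks host memory
  // so a device driver can read from it directly.
  if (!self.is_cpu()) {
    return false;
  }
  // Pick the backend to ask (default: CUDA) and redispatch so that
  // backend's is_pinned kernel gives the real answer.
  c10::DispatchKeySet dk(c10::computeDispatchKey(
      /*dtype=*/c10::nullopt, self.layout(), device.value_or(at::kCUDA)));
  return at::redispatch::is_pinned(dk, self, device);
}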

ezyang (Contributor Author) commented Jun 24, 2021

cc @bhosmer

bdhirsh (Contributor) left a comment:

thanks for the clarifications!

ezyang added a commit that referenced this pull request Jun 24, 2021 (ghstack-source-id: c07b281; Pull Request resolved: #60547)
ezyang (Contributor Author) commented Jun 24, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang added a commit that referenced this pull request Jun 25, 2021 (ghstack-source-id: 0236731; Pull Request resolved: #60547)
ezyang (Contributor Author) commented Jun 25, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang requested a review from soulitzer as a code owner June 28, 2021 16:52
ezyang added a commit that referenced this pull request Jun 28, 2021 (ghstack-source-id: e669690; Pull Request resolved: #60547)
- name: segment_reduce(Tensor data, str reduce, *, Tensor? lengths=None, Tensor? indices=None, int axis=0, bool unsafe=False, Scalar? initial=None) -> Tensor
  data: _segment_reduce_backward(grad, result, data, reduce, lengths)

- name: pin_memory(Tensor(a) self, Device? device=None) -> Tensor(a)
Collaborator:

pin_memory is a view op?
If it is a no-op in some cases (returning self as-is), this is not OK for autograd. You will have to keep this one composite and use another function to do the pinning that always returns different memory.

Contributor Author:

whoops. OK, yes, will need to fix this. I wonder if any tests are gonna catch this...

Collaborator:

#60286 should once it lands haha
If you want to test this function only, you can give it an input that requires grad and is already pinned; that should fail as well.
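
A minimal sketch of the composite pattern albanD is describing, under the assumption of a helper op (named _pin_memory here, hypothetical in this sketch) that always allocates fresh pinned memory; the wrapper may then return self unchanged because composite ops are traced through by autograd:

// Sketch only: _pin_memory is a hypothetical backend-dispatched helper
// that always returns a new, pinned tensor.
at::Tensor pin_memory(const at::Tensor& self, c10::optional<at::Device> device) {
  if (self.is_pinned(device)) {
    // Returning self unchanged is safe here only because this wrapper is
    // composite (CompositeImplicitAutograd): autograd traces through it.
    return self;
  }
  return at::_pin_memory(self, device);  // always allocates new memory
}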

ezyang (Contributor Author) commented Jun 28, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang added a commit that referenced this pull request Jun 28, 2021 (ghstack-source-id: e784039; Pull Request resolved: #60547)
ezyang (Contributor Author) commented Jun 28, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang added a commit that referenced this pull request Jun 29, 2021 (ghstack-source-id: 5a554cd; Pull Request resolved: #60547)
albanD (Collaborator) left a comment:

LGTM!

ezyang (Contributor Author) commented Jun 29, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang added a commit that referenced this pull request Jul 7, 2021 (ghstack-source-id: 42b9a6b; Pull Request resolved: #60547)
ezyang (Contributor Author) commented Jul 7, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang (Contributor Author) commented Jul 7, 2021

cc @albanD I redid this PR to exactly preserve old autograd semantics

ezyang (Contributor Author) commented Jul 7, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang added a commit that referenced this pull request Jul 7, 2021 (ghstack-source-id: 138c291; Pull Request resolved: #60547)
ezyang (Contributor Author) commented Jul 7, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor) commented:
@ezyang merged this pull request in bacf8ec.

facebook-github-bot deleted the gh/ezyang/1044/head branch July 16, 2021 14:17