Make pin_memory/is_pinned use BackendSelect by ezyang · Pull Request #60547 · pytorch/pytorch

Conversation

ezyang (Contributor) commented Jun 23, 2021

Stack from ghstack:

These now dispatch on the optional Device argument, which specifies
what device you want to pin for. We now directly register pinned
memory implementations for CUDA specifically, eliminating the need
for extra virtual methods.

This makes it possible for other backends to override the behavior
of pinned memory, cf. #59291

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D29331881
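
To make the new surface concrete, here is a minimal usage sketch against the ATen C++ API, assuming a CUDA-enabled build; when the optional Device argument is omitted, both calls default to CUDA, so existing callers behave as before:

#include <ATen/ATen.h>

int main() {
  at::Tensor t = at::ones({3});      // ordinary pageable CPU tensor
  bool was_pinned = t.is_pinned();   // false: nothing pinned yet
  at::Tensor p = t.pin_memory();     // copy into page-locked host memory
  bool now_pinned = p.is_pinned();   // true on a CUDA-enabled build
  return (!was_pinned && now_pinned) ? 0 : 1;
}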

facebook-github-bot (Contributor) commented Jun 23, 2021

💊 CI failures summary and remediations

As of commit fc160f5 (more details on the Dr. CI page and at hud.pytorch.org/pr/60547):


  • 4/4 failures possibly* introduced in this PR
    • 1/4 non-scanned failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test2 (1/1)

Step: "Run tests"

Jul 07 20:50:59 SUMMARY: UndefinedBehaviorSanit.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in
Jul 07 20:50:59     #9 0x55c1bf2be8f2 in PyEval_EvalCode /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/ceval.c:731
Jul 07 20:50:59     #10 0x55c1bf326cd5 in run_mod /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:1025
Jul 07 20:50:59     #11 0x55c1bf328d5d in PyRun_StringFlags /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:949
Jul 07 20:50:59     #12 0x55c1bf328dbb in PyRun_SimpleStringFlags /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:445
Jul 07 20:50:59     #13 0x55c1bf329926 in run_command /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Modules/main.c:301
Jul 07 20:50:59     #14 0x55c1bf329926 in Py_Main /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Modules/main.c:749
Jul 07 20:50:59     #15 0x55c1bf263196 in main /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Programs/python.c:69
Jul 07 20:50:59     #16 0x7ffabe58483f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291
Jul 07 20:50:59     #17 0x55c1bf2f333d in _start (/opt/conda/bin/python3.6+0x1a733d)
Jul 07 20:50:59 
Jul 07 20:50:59 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in 
Jul 07 20:50:59 + retcode=1
Jul 07 20:50:59 + set -e
Jul 07 20:50:59 + return 1
Jul 07 20:50:59 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 == *-NO_AVX-* ]]
Jul 07 20:50:59 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 == *-NO_AVX2-* ]]
Jul 07 20:50:59 + '[' -n https://github.com/pytorch/pytorch/pull/60547 ']'
Jul 07 20:50:59 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 != *coverage* ]]
Jul 07 20:50:59 ++ mktemp
Jul 07 20:50:59 + DETERMINE_FROM=/tmp/tmp.WmmDc3XKdA
Jul 07 20:50:59 + file_diff_from_base /tmp/tmp.WmmDc3XKdA

2 failures not recognized by patterns:

  • Job: GitHub Actions Windows CI (pytorch-win-vs2019-cuda10-cudnn7-py3) / build — Step: Install Visual Studio 2019 toolchain
  • Job: GitHub Actions Windows CI (pytorch-win-vs2019-cpu-py3) / test (default, 1, 2, windows.4xlarge) — Step: Install Visual Studio 2019 toolchain

Preview docs built from this PR

This comment was automatically generated by Dr. CI.

ezyang added a commit that referenced this pull request Jun 23, 2021 (ghstack-source-id: 7406e46; Pull Request resolved: #60547)
ezyang requested review from bdhirsh and wconstab June 23, 2021 15:42
ezyang (Contributor Author) commented Jun 23, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang (Contributor Author) commented Jun 23, 2021

cc @sujoysaraswati

ezyang requested a review from ngimel June 23, 2021 15:42
ezyang added a commit that referenced this pull request Jun 23, 2021 (ghstack-source-id: a4db648; Pull Request resolved: #60547)
ezyang (Contributor Author) commented Jun 24, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


- func: is_pinned(Tensor self, Device? device=None) -> bool
  variants: method
  dispatch:
Contributor:

Does everything just work when we change an op from being math-composite (CompositeImplicitAutograd) to being backend-specific?

My first thought was that we'd need to add an entry to derivatives.yaml, but I see some other ops that are backend-specific but do not have a derivatives.yaml entry (e.g. aten::is_set_to), and it looks like the autograd codegen still creates dummy autograd kernels for them.

Contributor Author:

I checked the generated code for is_pinned and it looks fine:

bool is_pinned(c10::DispatchKeySet ks, const at::Tensor & self, c10::optional<at::Device> device) {
  auto& self_ = unpack(self, "self", 0);
  #ifndef NDEBUG
  c10::optional<Storage> self__storage_saved =
    self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
  c10::intrusive_ptr<TensorImpl> self__impl_saved;
  if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
  #endif
  auto _tmp = ([&]() {
    at::AutoDispatchBelowADInplaceOrView guard;
    return at::redispatch::is_pinned(ks & c10::after_autograd_keyset, self_, device);
  })();
  auto result = _tmp;
  #ifndef NDEBUG
  if (self__storage_saved.has_value())
    AT_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
  if (self__impl_saved) AT_ASSERT(self__impl_saved == self_.getIntrusivePtr());
  #endif
  TORCH_CHECK_NOT_IMPLEMENTED(!(isFwGradDefined(self)), "Trying to use forward AD with is_pinned that does not support it.");
  return result;
} 

These functions are weird because they don't have any meaningful autograd content (they don't return a Tensor). Effectively, all the generated wrapper does is exclude the autograd key and then redispatch. I think this is just the AD codegen being smart enough not to error when it sees that the function doesn't return a Tensor. cc @albanD for any further thoughts.

Collaborator:

This is exactly what happens: we only generate actual code for ops that have either one differentiable input or output:

requires_derivative = (not undifferentiable) and (len(differentiable_inputs) > 0) and (len(differentiable_outputs) > 0)

Note that if you have differentiable input/output but you don't specify anything in derivatives.yaml, it is equivalent to doing:

- name: my_fn(Tensor foo, Tensor bar, int baz) -> Tensor
  foo: not_implemented("my_fn")
  bar: not_implemented("my_fn")

SujoysGithub commented Jun 25, 2021:

@ezyang I have a question on how this would work for a non-CUDA backend. Since empty_cpu() uses a pinned-memory allocator for CUDA alone, are the other backends explicitly required to call tensor.pin_memory() afterwards?

Contributor Author:

Yes. We could fix this by changing the pin_memory argument on empty to take a Device rather than a bool, but that is a very complicated and very BC-breaking change without very careful handling, so you shouldn't count on it happening in the near future.
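
A minimal sketch of the two-step pattern this implies, written against the ATen C++ API; the comment about the optional Device argument is an assumption about how a backend would use the new schema, not something this PR mandates:

#include <ATen/ATen.h>

// empty(..., pin_memory=true) only pins for CUDA, so a non-CUDA backend's
// user first allocates pageable CPU memory, then pins it explicitly.
at::Tensor make_pinned_host_buffer() {
  at::Tensor t = at::empty({1024});  // pageable CPU allocation
  // pin_memory() defaults to pinning for CUDA; after this PR a backend
  // that registers its own pin_memory kernel can in principle be selected
  // via the optional Device argument instead.
  return t.pin_memory();
}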


TORCH_LIBRARY_IMPL(aten, BackendSelect, m) {
  ${backend_select_function_registrations};
  m.impl("is_pinned", is_pinned);
Contributor:

It looks like the reason you hardcoded these is that the codegen only generates backend-select kernels for ops with a TensorOptions arg. Is your thinking that manually writing the code here is worth avoiding the confusion of having people pass in a TensorOptions arg that ignores everything but the device?

I could see that, but probably worth an explicit comment at least

Contributor Author:

Yeah, I much prefer only a device argument; all the other stuff is completely unnecessary. It honestly didn't occur to me that anyone would possibly want to do this, which is why there's no comment.

Contributor:

ok fair 😛

bool is_pinned(const Tensor& self, c10::optional<at::Device> device) {
  // Only CPU tensors can be pinned
  if (!self.is_cpu()) {
    return false;
Contributor:

Kinda confused by this. Won't this mean that if you call is_pinned(my_tensor, kCUDA), we'll just short-circuit here and return false instead of dispatching into is_pinned_cuda?

Also, if you want any short-circuiting here for not-allowed devices, isn't that redundant to the is_pinned_default kernel?

Contributor:

oh, do we need this because of static dispatch? We want is_pinned(_, kCPU) to return false, not throw an error. Although another option would be to register is_pinned_default to both CPU and CompositeExplicitAutograd - not sure if that's better though.

Contributor Author:

Look carefully: the test is on self.is_cpu(), not device. If you pass a CUDA tensor to is_pinned, we always return false, because pinning means taking CPU memory and putting it in a place where the CUDA driver can read it out directly. So in your example, as long as my_tensor is CPU, we will not short-circuit here.

Contributor:

bleh yeah, that makes sense. thanks!
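
For orientation, here is a minimal sketch of how a BackendSelect kernel of this shape can work: compute a dispatch key from the optional device and redispatch. This is an illustration written against the c10 dispatcher, not a verbatim copy of the PR; in particular, using computeDispatchKey and the redispatch entry point this way is an assumption.

// Sketch only; helper usage here is an assumption, not the PR's exact code.
bool is_pinned(const at::Tensor& self, c10::optional<at::Device> device) {
  // Only CPU tensors can ever be pinned: pinning page-locks host memory
  // so a device driver can read from it directly.
  if (!self.is_cpu()) {
    return false;
  }
  // Pick the backend to ask (default: CUDA) and redispatch so that
  // backend's is_pinned kernel gives the real answer.
  c10::DispatchKeySet dk(c10::computeDispatchKey(
      /*dtype=*/c10::nullopt, self.layout(), device.value_or(at::kCUDA)));
  return at::redispatch::is_pinned(dk, self, device);
}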

ezyang (Contributor Author) commented Jun 24, 2021

cc @bhosmer

bdhirsh (Contributor) left a comment:

thanks for the clarifications!

ezyang added a commit that referenced this pull request Jun 24, 2021 (ghstack-source-id: c07b281; Pull Request resolved: #60547)
ezyang (Contributor Author) commented Jun 24, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang added a commit that referenced this pull request Jun 25, 2021 (ghstack-source-id: 0236731; Pull Request resolved: #60547)
ezyang (Contributor Author) commented Jun 25, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang requested a review from soulitzer as a code owner June 28, 2021 16:52
ezyang added a commit that referenced this pull request Jun 28, 2021 (ghstack-source-id: e669690; Pull Request resolved: #60547)
- name: segment_reduce(Tensor data, str reduce, *, Tensor? lengths=None, Tensor? indices=None, int axis=0, bool unsafe=False, Scalar? initial=None) -> Tensor
  data: _segment_reduce_backward(grad, result, data, reduce, lengths)

- name: pin_memory(Tensor(a) self, Device? device=None) -> Tensor(a)
Collaborator:

pin_memory is a view op?
If it is a no-op in some cases (returning self as-is), this is not OK for autograd. You will have to keep this one composite and use another function to do the pinning that always returns different memory.

Contributor Author:

whoops. OK, yes, will need to fix this. I wonder if any tests are gonna catch this...

Collaborator:

#60286 should once it lands haha
If you want to test this function only, you can give it an input that requires grad and is already pinned; that should fail as well.
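
A minimal sketch of the composite pattern albanD is describing, under the assumption of a helper op (named _pin_memory here, hypothetical in this sketch) that always allocates fresh pinned memory; the wrapper may then return self unchanged because composite ops are traced through by autograd:

// Sketch only: _pin_memory is a hypothetical backend-dispatched helper
// that always returns a new, pinned tensor.
at::Tensor pin_memory(const at::Tensor& self, c10::optional<at::Device> device) {
  if (self.is_pinned(device)) {
    // Returning self unchanged is safe here only because this wrapper is
    // composite (CompositeImplicitAutograd): autograd traces through it.
    return self;
  }
  return at::_pin_memory(self, device);  // always allocates new memory
}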

ezyang (Contributor Author) commented Jun 28, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang added a commit that referenced this pull request Jun 28, 2021 (ghstack-source-id: e784039; Pull Request resolved: #60547)
ezyang (Contributor Author) commented Jun 28, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang added a commit that referenced this pull request Jun 29, 2021 (ghstack-source-id: 5a554cd; Pull Request resolved: #60547)
albanD (Collaborator) left a comment:

LGTM!

ezyang (Contributor Author) commented Jun 29, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang added a commit that referenced this pull request Jul 7, 2021 (ghstack-source-id: 42b9a6b; Pull Request resolved: #60547)
ezyang (Contributor Author) commented Jul 7, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang (Contributor Author) commented Jul 7, 2021

cc @albanD I redid this PR to exactly preserve old autograd semantics

ezyang (Contributor Author) commented Jul 7, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang added a commit that referenced this pull request Jul 7, 2021 (ghstack-source-id: 138c291; Pull Request resolved: #60547)
ezyang (Contributor Author) commented Jul 7, 2021

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor) commented:
@ezyang merged this pull request in bacf8ec.

facebook-github-bot deleted the gh/ezyang/1044/head branch July 16, 2021 14:17