Make pin_memory/is_pinned use BackendSelect #60547
Conversation
These now dispatch on the optional Device argument, which specifies what device you want to pin for. We now directly register pinned memory implementations for CUDA specifically, eliminating the need for extra virtual methods. This makes it possible for other backends to override the behavior of pinned memory, cf. #59291.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: [D29331881](https://our.internmc.facebook.com/intern/diff/D29331881)
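For context, a rough sketch of what this enables for an out-of-tree backend. The PrivateUse1 dispatch key, the kernel names, and the exact registration below are illustrative assumptions, not code from this PR; the point is only that once is_pinned/pin_memory dispatch per backend, a backend can register its own kernels for them.

#include <ATen/ATen.h>
#include <torch/library.h>

namespace {

// Hypothetical kernels for a custom backend; what "pinned" means is up to the backend.
bool my_is_pinned(const at::Tensor& self, c10::optional<at::Device> device) {
  // Report whether this CPU tensor already lives in memory the custom device can access directly.
  return false;  // placeholder
}

at::Tensor my_pin_memory(const at::Tensor& self, c10::optional<at::Device> device) {
  // Copy `self` into host memory obtained from the backend's own pinned-memory allocator.
  return self.clone();  // placeholder for allocate-and-copy
}

} // namespace

TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) {
  m.impl("is_pinned", my_is_pinned);
  m.impl("pin_memory", my_pin_memory);
}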
💊 CI failures summary and remediations: as of commit fc160f5 (more details on the Dr. CI page and at hud.pytorch.org/pr/60547), 🕵️ 1 new failure was recognized by patterns and does not appear to be due to an upstream breakage: the "Install Visual Studio 2019 toolchain" step. This comment was automatically generated by Dr. CI.
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
- func: is_pinned(Tensor self, Device? device=None) -> bool
  variants: method
  dispatch:
Does everything just work when we change an op from being math-composite (CompositeImplicitAutograd) to being backend-specific?
My first thought was that we'd need to add an entry to derivatives.yaml, but I see some other ops that are backend-specific but do not have a derivatives.yaml entry (e.g. aten::is_set_to), and it looks like the autograd codegen still creates dummy autograd kernels for them.
I checked the generated code for is_pinned and it looks fine:
bool is_pinned(c10::DispatchKeySet ks, const at::Tensor & self, c10::optional<at::Device> device) {
  auto& self_ = unpack(self, "self", 0);
  #ifndef NDEBUG
  c10::optional<Storage> self__storage_saved =
    self_.has_storage() ? c10::optional<Storage>(self_.storage()) : c10::nullopt;
  c10::intrusive_ptr<TensorImpl> self__impl_saved;
  if (self_.defined()) self__impl_saved = self_.getIntrusivePtr();
  #endif
  auto _tmp = ([&]() {
    at::AutoDispatchBelowADInplaceOrView guard;
    return at::redispatch::is_pinned(ks & c10::after_autograd_keyset, self_, device);
  })();
  auto result = _tmp;
  #ifndef NDEBUG
  if (self__storage_saved.has_value())
    AT_ASSERT(self__storage_saved.value().is_alias_of(self_.storage()));
  if (self__impl_saved) AT_ASSERT(self__impl_saved == self_.getIntrusivePtr());
  #endif
  TORCH_CHECK_NOT_IMPLEMENTED(!(isFwGradDefined(self)), "Trying to use forward AD with is_pinned that does not support it.");
  return result;
}
These functions are weird because they don't have any meaningful autograd content (as they don't return a Tensor). Effectively, all the wrapper does is exclude the autograd keys and then redispatch. I think this is just the AD codegen being smart enough not to error when it sees that the function doesn't return a Tensor. cc @albanD for any further thoughts.
This is exactly what happens: we only generate actual autograd code for ops that have at least one differentiable input and at least one differentiable output:
pytorch/tools/autograd/gen_variable_type.py, line 461 in b7298f4:
requires_derivative = (not undifferentiable) and (len(differentiable_inputs) > 0) and (len(differentiable_outputs) > 0)
Note that if you have differentiable input/output but you don't specify anything in derivatives.yaml, it is equivalent to doing:
- name: my_fn(Tensor foo, Tensor bar, int baz) -> Tensor
foo: not_implemented("my_fn")
bar: not_implemented("my_fn")
@ezyang I have a question on how this would work for a non-CUDA backend. Since empty_cpu() uses the pinned memory allocator for CUDA alone, are other backends explicitly required to call tensor.pin_memory() afterwards?
Yes. We could then fix this by changing the pin_memory argument on empty to take a Device rather than a bool, but that is a very complicated change and very BC-breaking without very careful handling, so you shouldn't count on it happening in the near future.
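To make that concrete, a minimal usage sketch under the new schema (assuming the method variants pick up the optional device argument, as the is_pinned schema above suggests): the backend allocates ordinary CPU memory first and then pins it explicitly for the device that will consume it.

#include <ATen/ATen.h>

void pin_for_device_example() {
  // Plain pageable CPU allocation; empty_cpu only consults the pinned allocator for CUDA.
  at::Tensor t = at::empty({1024}, at::TensorOptions().device(at::kCPU));
  // Explicitly pin for the target backend afterwards (CUDA used here as a stand-in).
  at::Tensor pinned = t.pin_memory(at::Device(at::kCUDA, 0));
  TORCH_CHECK(pinned.is_pinned(at::Device(at::kCUDA, 0)));
}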
TORCH_LIBRARY_IMPL(aten, BackendSelect, m) {
  ${backend_select_function_registrations};
  m.impl("is_pinned", is_pinned);
It looks like the reason you hardcoded these is that the codegen only generates BackendSelect kernels for ops with a TensorOptions arg. Is your thinking that writing the code here manually is worth it to avoid the confusion of having people pass in a TensorOptions arg that ignores everything but the device?
I could see that, but it's probably worth an explicit comment at least.
Yeah, I much prefer only a device argument, all the other stuff is completely unnecessary. It honestly didn't occur to me that anyone would possibly want to do this, which is why there's no comment.
ok fair 😛
bool is_pinned(const Tensor& self, c10::optional<at::Device> device) {
  // Only CPU tensors can be pinned
  if (!self.is_cpu()) {
    return false;
Kinda confused by this. Won't this mean that if you call is_pinned(my_tensor, kCUDA), we'll just short-circuit here and return false instead of dispatching into is_pinned_cuda?
Also, if you want any short-circuiting here for not-allowed devices, isn't that redundant to the is_pinned_default kernel?
oh, do we need this because of static dispatch? We want is_pinned(_, kCPU) to return false, not throw an error. Although another option would be to register is_pinned_default to both CPU and CompositeExplicitAutograd - not sure if that's better though.
Look carefully: the test is on self.is_cpu(), not device. If you pass a CUDA tensor to is_pinned, we always return false, because pinning means taking CPU memory and putting it in a place where the CUDA driver can read it out directly. So in your example, as long as my_tensor is CPU, we will not short-circuit here.
bleh yeah, that makes sense. thanks!
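To round out the thread, here is a guess at what the rest of the hand-written BackendSelect kernel looks like after that short-circuit. The computeDispatchKey/redispatch plumbing mirrors what the generated BackendSelect kernels do; treat it as a sketch rather than the exact code in this PR. The requested device, defaulting to CUDA, picks the backend we redispatch to.

bool is_pinned(const at::Tensor& self, c10::optional<at::Device> device) {
  // Only CPU tensors can be pinned; anything else is never "pinned".
  if (!self.is_cpu()) {
    return false;
  }
  // Build a dispatch key for the backend we want to pin for (default: CUDA)
  // and redispatch, so the backend-specific kernel answers the question.
  c10::DispatchKeySet dk(c10::computeDispatchKey(
      /*dtype=*/c10::nullopt, self.layout(), device.value_or(at::kCUDA)));
  return at::redispatch::is_pinned(dk, self, device);
}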
cc @bhosmer
thanks for the clarifications!
tools/autograd/derivatives.yaml (outdated)
- name: segment_reduce(Tensor data, str reduce, *, Tensor? lengths=None, Tensor? indices=None, int axis=0, bool unsafe=False, Scalar? initial=None) -> Tensor
  data: _segment_reduce_backward(grad, result, data, reduce, lengths)

- name: pin_memory(Tensor(a) self, Device? device=None) -> Tensor(a)
pin_memory is a view op?
If it is a no-op in some cases (returning self as-is), this is not OK for autograd. You will have to keep this one as composite and use another function, one that always returns fresh memory, to do the actual pinning.
whoops. OK, yes, I will need to fix this. I wonder if any tests are gonna catch this...
#60286 should, once it lands haha
If you want to test just this function, you can give it an input that requires grad and is already pinned; that should fail as well.
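For the record, one shape the fix could take (a sketch of the approach being suggested here, not necessarily the code that eventually landed; the _pin_memory name is hypothetical): keep pin_memory composite so autograd only ever sees a plain function, and route the actual pinning through a backend-dispatched op that always returns fresh memory.

// CompositeImplicitAutograd wrapper: safe for autograd even when it returns
// `self` unchanged, because it is not registered as a backend/view kernel.
at::Tensor pin_memory(const at::Tensor& self, c10::optional<at::Device> device) {
  if (self.is_pinned(device)) {
    return self;
  }
  // Hypothetical backend-dispatched op that always allocates new pinned memory and copies.
  return at::_pin_memory(self, device);
}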
LGTM!
cc @albanD I redid this PR to exactly preserve old autograd semantics