[structured] Preserve computed elements from meta func to impl by SplitInfinity · Pull Request #61746 · pytorch/pytorch · GitHub

Conversation

@SplitInfinity commented Jul 16, 2021

Stack from ghstack:

Summary
This commit introduces a new feature for structured kernels that allows
kernels to declare quantities as "precomputed" in
native_functions.yaml, compute them once in the meta function and
reuse them again in the impl. The names and types of these quantities
are used to generate code for a struct containing them that the meta
function must return. In the case of a handful of surveyed kernels
(all, any, avg_pool2d), these quantities that are used both in
the meta and impl have the same meaning as certain kernel arguments
and in fact supersede them. Accordingly, the correspondence between a
kernel argument and the precomputed elements that supersede it is also
captured in native_functions.yaml. This information is used to unpack
the struct returned by meta and pass its contents correctly to the
impl function.

The primary goal is to avoid recompute and enhance developer experience
(e.g. sometimes people can forget to compute these elements while
porting a kernel).
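
To make the flow concrete, here is a sketch for all, assembled from the excerpts quoted later in this conversation; the TORCH_PRECOMPUTE_STRUCT2 spelling and the trimmed impl signature are assumptions rather than verbatim PR code:

TORCH_PRECOMPUTE_META_FUNC2(all, dim)(const Tensor& self, int64_t dim, bool keepdim) {
  check_allany_for_meta(*this, "all", self, dim, keepdim);
  // Wrap dim once here; the generated struct carries it to the impl.
  return TORCH_PRECOMPUTE_STRUCT2(all, dim)().set_dim(maybe_wrap_dim(dim, self.dim()));
}

// The codegen unpacks the struct, so the impl receives the already-wrapped dim
// in place of the original argument and never re-derives it.
TORCH_IMPL_FUNC(all_out)(const Tensor& self, int64_t dim, bool keepdim, const Tensor& result) {
  // ... kernel body uses dim directly ...
}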

Differential Revision: D30407831

@facebook-github-bot (Contributor) commented Jul 16, 2021


💊 CI failures summary and remediations

As of commit 7d56ae6 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-py3.6-gcc5.4 / build (1/1)

Step: "Unknown"

2021-08-31T19:13:01.5992168Z ++++ extract_trap_cmd
2021-08-31T19:13:01.5992872Z ++++ printf '%s\n' ''
2021-08-31T19:13:01.5993332Z +++ printf '%s\n' cleanup
2021-08-31T19:13:01.5993957Z ++ trap -- '
2021-08-31T19:13:01.5994329Z cleanup' EXIT
2021-08-31T19:13:01.5996333Z ++ [[ linux-xenial-py3.6-gcc5.4 != *win-* ]]
2021-08-31T19:13:01.5996762Z ++ which sccache
2021-08-31T19:13:01.6005690Z ++ sccache --stop-server
2021-08-31T19:13:01.6027114Z ++ true
2021-08-31T19:13:01.6027502Z ++ rm /var/lib/jenkins/sccache_error.log
2021-08-31T19:13:01.6034807Z rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory
2021-08-31T19:13:01.6035867Z ++ true
2021-08-31T19:13:01.6036530Z ++ [[ -n 1 ]]
2021-08-31T19:13:01.6037380Z ++ echo 'Skipping sccache server initialization, setting environment variables'
2021-08-31T19:13:01.6038118Z Skipping sccache server initialization, setting environment variables
2021-08-31T19:13:01.6038659Z ++ export SCCACHE_IDLE_TIMEOUT=1200
2021-08-31T19:13:01.6039043Z ++ SCCACHE_IDLE_TIMEOUT=1200
2021-08-31T19:13:01.6039516Z ++ export SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2021-08-31T19:13:01.6040034Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2021-08-31T19:13:01.6040997Z ++ export RUST_LOG=sccache::server=error
2021-08-31T19:13:01.6041425Z ++ RUST_LOG=sccache::server=error


SplitInfinity pushed a commit that referenced this pull request Jul 16, 2021
ghstack-source-id: 969136c
Pull Request resolved: #61746
@SplitInfinity SplitInfinity removed the request for review from ezyang July 16, 2021 00:32
SplitInfinity pushed a commit that referenced this pull request Jul 16, 2021
ghstack-source-id: 7c90e5d
Pull Request resolved: #61746

TORCH_META_FUNC2(all, dim)(const Tensor& self, int64_t dim, bool keepdim) {
  check_allany_for_meta(*this, "all", self, dim, keepdim);
  this->dim = maybe_wrap_dim(dim, self.dim());
Contributor:

mmmm. Let's enforce the convention that all members have trailing underscores; it will make it more obvious what's going on here.

@ezyang (Contributor) commented Jul 16, 2021

Some more philosophical questions that I've wondered about for this task (they may be insoluble, just trying to put some seeds in your brain):

  • Is preservation of computed elements in this way an indication that the level of abstraction for the arguments in the function is incorrect? For example, you could imagine rewriting the example function "all" here into two functions: "all" (with the same semantics as before) and "_explicit_all" (where dim is guaranteed to be wrapped correctly); see the sketch after this list. What are the downsides of doing this? (Overhead?) Which one ought to be exposed to external backends? Would this work in all cases?
  • Is there a way to do this computed element preservation without forcing them to be spilled to memory (as is the case for class members, due to class ABI requirements)?
  • Is there a way to make sure implementations use this->dim (in your example) and not the explicitly passed in dim?
  • When is it appropriate vs inappropriate to add a field like this?
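
A toy sketch of the two-function decomposition from the first bullet (self-contained stand-ins, not ATen code; the hypothetical "_explicit_all" is spelled explicit_all here to avoid a reserved identifier):

#include <cstdint>

// Stand-in for ATen's dim wrapping: negative dims count from the end.
int64_t maybe_wrap_dim(int64_t dim, int64_t ndim) {
  return dim < 0 ? dim + ndim : dim;
}

// "_explicit_all": the caller guarantees dim is already wrapped.
bool explicit_all(int64_t wrapped_dim, bool keepdim);

// "all": same user-facing semantics as before; wraps once, then delegates.
bool all(int64_t dim, int64_t ndim, bool keepdim) {
  return explicit_all(maybe_wrap_dim(dim, ndim), keepdim);
}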

SplitInfinity pushed a commit that referenced this pull request Jul 19, 2021
ghstack-source-id: a80ae35
Pull Request resolved: #61746
SplitInfinity pushed a commit that referenced this pull request Jul 23, 2021
ghstack-source-id: 6da6175
Pull Request resolved: #61746
SplitInfinity pushed a commit that referenced this pull request Aug 5, 2021
ghstack-source-id: 3e9679b
Pull Request resolved: #61746
SplitInfinity pushed a commit that referenced this pull request Aug 5, 2021
ghstack-source-id: 42a0221
Pull Request resolved: #61746
SplitInfinity pushed a commit that referenced this pull request Aug 6, 2021
ghstack-source-id: c389fa5
Pull Request resolved: #61746
#define TORCH_META_FUNC2(name, overload) void structured_##name##_##overload::meta

#define TORCH_PRECOMPUTE_META_FUNC(name) structured_##name::precompute_out structured_##name::meta
#define TORCH_PRECOMPUTE_META_FUNC2(name, overload) structured_##name##_##overload::precompute_out structured_##name##_##overload::meta
Contributor:

I wonder if there actually is any reason to change the naming convention here, since, IIUC, precompute meta and meta are mutually exclusive (so it wouldn't be like you would have a naming conflict)

Contributor:

reupping

Author:

Which naming convention did I change? New macros are required because the return type for meta functions of kernels that have precomputed elements is different.

Contributor:

oh true. I guess we could unconditionally define structured_##name::precompute_out (maybe as return_type) and then just typedef it to void when it's not precomputed he he
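
A toy sketch of that alternative (hypothetical structured_foo/structured_bar, outside ATen): aliasing precompute_out to void for kernels without precomputed elements would make the return-type spelling uniform across both macro flavors.

#include <cstdint>

// Kernel WITH precomputed elements: precompute_out is a real struct.
struct structured_bar {
  struct precompute_out { int64_t dim; };
  precompute_out meta(int64_t dim) { return {dim}; }
};

// Kernel WITHOUT precomputed elements: alias the same name to void.
struct structured_foo {
  using precompute_out = void;
  precompute_out meta(int64_t /*dim*/) {}  // legal: the alias names void
};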

Contributor:

After your last comment, I don't feel too strongly about this, and wouldn't block on this

}

return {
  kH=kH,
Contributor:

huh, the dots are optional? Should we prefer that people specify the dot or not?

Author:

No, I think what's happening is that the kH declared above is being assigned to itself, and the result is then used to initialize the struct. The dots are definitely required. But on that note, I can't reproduce the error that is supposed to be emitted with this syntax: https://godbolt.org/z/Y6dxh1Eqd

I remember clearly that we managed to write code that produced that error during a video call, but I can't seem to do it again. I tried both gcc and clang.
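
For anyone reading along, a self-contained illustration of what kH=kH does without the dots (toy struct, not the generated one):

struct Out {
  int kH;
  int kW;
};

Out make(int kH, int kW) {
  // Without the leading dots these are ordinary self-assignments: each
  // expression evaluates to the (unchanged) local, and the results then
  // initialize the aggregate positionally, so this compiles cleanly.
  return {kH = kH, kW = kW};
}

// With C++20 designated initializers the members are named explicitly:
// Out make2(int kH, int kW) { return {.kH = kH, .kW = kW}; }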

Contributor:

blegh. Well I guess we'll just have to write a little lint rule for this. Maybe clang-tidy can catch it.

SplitInfinity pushed a commit that referenced this pull request Aug 25, 2021
ghstack-source-id: 74eca07
Pull Request resolved: #61746
@SplitInfinity (Author) commented Aug 25, 2021

Okay, so I tried the fluent approach with a templated class. It went pretty smoothly for the most part, but I commented on the diff with some questions I have. Here's an example of a precompute_out struct generated by this version of the PR, with some inline comments added by me to explain what the code does:

struct TORCH_API structured_avg_pool2d : public at::impl::MetaBase {
  template <bool A = false, bool B = false, bool C = false, bool D = false, bool E = false, bool F = false>
  struct TORCH_API precompute_out {
    precompute_out<true, B, C, D, E, F> set_kH(int64_t value) {
      // Make sure people only try to set kH once.
      TORCH_INTERNAL_ASSERT(A == false);

      // Create a new precompute_out instance with the same state as *this,
      // but with kH set to its new value. Set the template parameter corresponding
      // to kH to true as well to indicate that it has been set.
      precompute_out<true, B, C, D, E, F> ret;
      ret.kH = value;
      ret.kW = this->kW;  
      ret.dH = this->dH;  
      ret.dW = this->dW;
      ret.padH = this->padH;
      ret.padW = this->padW;
      return ret;             
    }   
    
    precompute_out<A, true, C, D, E, F> set_kW(int64_t value) {
      TORCH_INTERNAL_ASSERT(B == false);
      precompute_out<A, true, C, D, E, F> ret;
      ret.kH = this->kH;
      ret.kW = value;
      ret.dH = this->dH;
      ret.dW = this->dW;
      ret.padH = this->padH;
      ret.padW = this->padW;
      return ret;
    }

    ...

    int64_t kH;
    int64_t kW;
    int64_t dH;
    int64_t dW;
    int64_t padH;
    int64_t padW;
  };
};

I left the other setters out but I think you get the idea.

meta_return = "void"
precomputed = g.out.precomputed if g.structured else None

if precomputed:
Author:

The code added below looks really ugly:

struct TORCH_API structured_avg_pool2d : public at::impl::MetaBase {
  
                template <bool A = false, bool B = false, bool C = false, bool D
                struct TORCH_API precompute_out {
                    
                    precompute_out<true, B, C, D, E, F> set_kH(int64_t value) {
                        TORCH_INTERNAL_ASSERT(A == false);
                        precompute_out<true, B, C, D, E, F> ret;
ret.kH = value;     
ret.kW = this->kW;      
ret.dH = this->dH;      
ret.dW = this->dW;
ret.padH = this->padH;
ret.padW = this->padW;  
return ret;
                    }

Is there a way to indent these blocks properly (other than adding tab characters manually)?

Contributor:

no, sorry :( I kind of want to run all of the generated code through an autoformatter but never have gotten around to it.

@SplitInfinity SplitInfinity requested a review from ezyang August 25, 2021 05:30

@SplitInfinity has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@SplitInfinity (Author):

Internal tests on D30407831 look good, so that means Windows is happy.

@ezyang (Contributor) commented Aug 26, 2021

What do the error messages look like when you miss a parameter?

@SplitInfinity (Author):

> What do the error messages look like when you miss a parameter?

I removed .set_padW(padW) from the meta for avg_pool2d and got:

../aten/src/ATen/native/AveragePool2d.cpp: In member function ‘at::meta::structured_avg_pool2d::meta_return_ty at::meta::structured_avg_pool2d::meta(const at::Tensor&, c10::IntArrayRef, c10::IntArrayRef, c10::IntArrayRef, bool, bool, c10::optional<long int>)’:
../aten/src/ATen/native/AveragePool2d.cpp:85:100: error: could not convert ‘at::meta::structured_avg_pool2d::precompute_out<>{0, 0, 0, 0, 0, 0}.at::meta::structured_avg_pool2d::precompute_out<>::set_kH(((int64_t)kH)).at::meta::structured_avg_pool2d::precompute_out<true, false, false, false, false, false>::set_kW(((int64_t)kW)).at::meta::structured_avg_pool2d::precompute_out<true, true, false, false, false, false>::set_dH(((int64_t)dH)).at::meta::structured_avg_pool2d::precompute_out<true, true, true, false, false, false>::set_dW(((int64_t)dW)).at::meta::structured_avg_pool2d::precompute_out<true, true, true, true, false, false>::set_padH(((int64_t)padH))’ from ‘precompute_out<[...],[...],[...],[...],[...],false>’ to ‘precompute_out<[...],[...],[...],[...],[...],true>’
   return TORCH_PRECOMPUTE_STRUCT(avg_pool2d)().set_kH(kH).set_kW(kW).set_dH(dH).set_dW(dW).set_padH(padH);

Hmm, if I didn't know what I know about the internals of structured kernels, I wouldn't know what to do here without digging into the autogenerated code. The codegen (purposely) writes out the return type of the meta as precompute_out<true, true, true, ..., true>, which is what generates this error.
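
For reference, a reduced sketch of the mechanism behind this error (two flags instead of six; not the literal generated code):

#include <cstdint>

// Each template flag records whether the corresponding setter was called.
template <bool A = false, bool B = false>
struct precompute_out {
  precompute_out<true, B> set_kH(int64_t v) { return {v, kW}; }
  precompute_out<A, true> set_kW(int64_t v) { return {kH, v}; }
  int64_t kH = 0;
  int64_t kW = 0;
};

// The codegen pins meta's return type with every flag set to true.
using meta_return_ty = precompute_out<true, true>;

meta_return_ty meta(int64_t kH, int64_t kW) {
  return precompute_out<>().set_kH(kH).set_kW(kW);  // OK: both flags true
  // return precompute_out<>().set_kH(kH);  // error: precompute_out<true, false>
  //     has no conversion to precompute_out<true, true>, as in the log above
}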

@ezyang (Contributor) commented Aug 26, 2021

> Hmm, if I didn't know what I know about the internals of structured kernels, I wouldn't know what to do here without digging into the autogenerated code. The codegen (purposely) writes out the return type of the meta as precompute_out<true, true, true, ..., true>, which is what generates this error.

I'm going to hope that something like

static_assert(KH, "kh field wasn't set");
static_assert(KW, "kw field wasn't set");
...

will make this error message better.

Actually looks like this doesn't work, because we need to set the return type of the function directly.

else:
    construction_stmts.append(f"ret.{elem.name} = this->{elem.name};")

construction_stmts.append("return ret;")
Contributor:

Do you know if the compiler is smart enough to optimize all of the repeated reconstruction of the precompute struct here?

Author:

No; how would I go about determining that?


# Generate a string containing declarations of all precomputed elements.
precomputed_elements_with_cpp_types = [
    structured.argument_type(elem, binds=elem.name)
Contributor:

If someone ever precomputes tensors (or other non-value types), we will have to be a bit careful here, because reference types (which can be produced by argument_type) are not assignable, so this will choke to death. Technically, we probably need another api module for this, and use translate to get from the argument type to the struct type, but I guess for now this is OK.
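
A self-contained illustration of the hazard (toy struct; const int& stands in for something like const Tensor&):

struct Precomputed {
  const int& ref;  // a reference member, as argument_type can produce
};

int main() {
  int x = 1, y = 2;
  Precomputed a{x};
  Precomputed b{y};
  // a = b;          // error: copy assignment is implicitly deleted, so the
  //                 // setter-style "ret.field = this->field" codegen breaks
  // Precomputed c;  // error: a reference member must be initialized
}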

@ezyang (Contributor) commented Aug 26, 2021

You already have my approval, but the modified API looks great!

…impl"

Differential Revision: [D30407831](https://our.internmc.facebook.com/intern/diff/D30407831)

[ghstack-poisoned]
SplitInfinity pushed a commit that referenced this pull request Aug 31, 2021
ghstack-source-id: 1a553d2
Pull Request resolved: #61746

@SplitInfinity has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

…impl"

Differential Revision: [D30407831](https://our.internmc.facebook.com/intern/diff/D30407831)

[ghstack-poisoned]
SplitInfinity pushed a commit that referenced this pull request Aug 31, 2021
ghstack-source-id: 6159291
Pull Request resolved: #61746

@SplitInfinity has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@SplitInfinity merged this pull request in 968d7ee.
