[jit]maskrcnn & bert AD coverage part 1 by ailzhang · Pull Request #16689 · pytorch/pytorch

Conversation

@ailzhang (Contributor) commented Feb 2, 2019

  • Moved a few functions from the autograd namespace to the aten namespace so they are visible from the JIT nativeResolver.
  • Added a hack to look up keyword-only arguments; proper support for keyword-only arguments will come later.
  • Simulated function overloading in aten by appending _<number> to the function name.
  • Even when forward returns multiple outputs, as in kthvalue, we currently support at most one output that requires grad.
  • Removed the TensorList-related ops, since partial TensorList support is prone to bugs. Our symbolic diff for cat was never tested with autodiff, and it seems broken. We need another proper way to support these ops (either by properly supporting TensorList or something like prim::ConstantChunk) and will leave them for the next PR.

Ops supported in this PR:

erf
expand_as
index
kthvalue
mean
permute
pow
rsub
select
sqrt
squeeze
t
to
topk
transpose
view
var
embedding
logsumexp
// grad is None
_dim_arange
contiguous
nonzero
ones_like

@facebook-github-bot added the oncall: jit label (Add this issue/PR to JIT oncall triage queue) Feb 2, 2019
@ailzhang (Contributor, Author) commented Feb 5, 2019

The failed tests are unrelated. Can I get a review on this, @zdevito @apaszke? Thanks!

@zdevito (Contributor) left a comment:

This looks good. I have a few nits, most important of which is to avoid saving self if all that is needed is self.sizes(). Saving something just for its sizes potentially uses a lot of memory. I tried to find all the places where just the sizes could be saved.
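To make the memory concern concrete, here is a minimal sketch of what each capture strategy costs (the struct names are hypothetical, not the actual autograd codegen):

```cpp
#include <ATen/ATen.h>

#include <cstdint>
#include <vector>

// Capturing the whole tensor keeps its storage alive until backward runs,
// even when backward only ever reads the shape.
struct SavesSelf {
  at::Tensor self_;  // e.g. a 1000x1000 float tensor pins ~4 MB
};

// Capturing only the sizes costs a handful of int64_t values.
struct SavesSizes {
  std::vector<int64_t> self_sizes_;  // e.g. {1000, 1000}: 16 bytes of payload
};
```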


```yaml
- name: mean(Tensor self, IntList dim, bool keepdim)
  # before: _safe_size(self.sizes(), dim)
  self: sum_backward(grad, self.sizes(), dim, keepdim) / at::_safe_size(self.sizes(), dim)
```
Contributor: Discussed in person: microbenchmark that moving these functions into at:: did not add any overhead, e.g. by just running mean forward/backward on small sizes.
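A sketch of one way to run such a microbenchmark with the libtorch C++ API (the tensor size and iteration counts are illustrative assumptions, not what was actually run):

```cpp
#include <torch/torch.h>

#include <chrono>
#include <iostream>

int main() {
  torch::Tensor x = torch::randn({8, 8}, torch::requires_grad());

  auto run_once = [&] {
    torch::Tensor y = x.mean();  // forward
    y.backward();                // backward
    x.grad().zero_();            // keep gradients from accumulating
  };

  for (int i = 0; i < 1000; ++i) run_once();  // warm-up

  constexpr int kIters = 100000;
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < kIters; ++i) run_once();
  auto end = std::chrono::steady_clock::now();

  auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
  std::cout << us.count() / static_cast<double>(kIters) << " us/iter\n";
  return 0;
}
```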

```cpp
auto pos = schema_name.find_last_of('_');
auto schema_name_suffix = schema_name.substr(pos + 1);
std::string key = canonicalSchemaString(actual_schema);
if (!schema_name_suffix.empty() && schema_name_suffix.find_first_not_of("0123456789") == string::npos) {
```
Contributor: Can you refactor the string processing into a function?

@apaszke (Contributor) left a comment:

Why are we adding a bunch of backward ops instead of putting them as Python code in AD? From what I understand, the plan is to deduplicate them later and start using Python for everything, so it doesn't seem necessary to do that in every case.

```python
return torch.mean(self), backward
def mean_1(self,
```
Contributor: What are mean_0 and mean_1? How do we match those up with the definitions later?

@ailzhang (Contributor, Author): These suffixes are stripped later when looking up the function schema cache. This is a hack to simulate the function overloading we have in aten. https://github.com/pytorch/pytorch/pull/16689/files/1a3d72ec52ddc4755008dfe920d4c0a25b8a1e13#diff-758cb565d2c84fc5268de0b63958f78dR330
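Roughly, the lookup can strip the suffix with a helper like the following (a sketch with a hypothetical name, not the code that landed). This is also one way to address the refactoring request above, and it shows how mean_0 and mean_1 both match back up with the mean schema:

```cpp
#include <string>

// If `name` ends in "_<number>" (the suffix convention used to simulate
// overloading), return the base name; otherwise return `name` unchanged.
// With this, "mean_0" and "mean_1" both look up the cached "mean" schema.
static std::string stripOverloadSuffix(const std::string& name) {
  auto pos = name.find_last_of('_');
  if (pos == std::string::npos) {
    return name;
  }
  std::string suffix = name.substr(pos + 1);
  bool all_digits = !suffix.empty() &&
      suffix.find_first_not_of("0123456789") == std::string::npos;
  return all_digits ? name.substr(0, pos) : name;
}
```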

@wanchaol (Collaborator):
  • name: mean(Tensor self)

  • name: mean(Tensor self, ScalarType dtype)

  • name: mean(Tensor self, IntList dim, bool keepdim)

  • name: mean(Tensor self, IntList dim, ScalarType dtype)

  • name: mean(Tensor self, IntList dim, bool keepdim, ScalarType dtype)

There are mean overloads that have dtype in the signature; should we also include those overloads?

@ailzhang (Contributor, Author):
@wanchaol Yeah, but this dtype option is not exposed in our public API, so these ops are not specifically tested in common_methods_invocations.py. Do you know of any ops that call into the dtype version of mean? I didn't want to add ops without proper tests, which is why those are not in this PR.

@wanchaol (Collaborator) commented Feb 14, 2019:
I found one callsite here, but it's only in a test; I found no ops that call mean with dtype, so I guess it should be fine.

@ailzhang (Contributor, Author):
Aha, this option is actually exposed but not documented :( a = torch.mean(a, dtype=torch.double) works. I can add AD & tests for them.

@ailzhang (Contributor, Author):
Actually, I realized all the dtypes here are keyword-only arguments, and since we don't document them I assume they are rarely used in real models :P
I would prefer to add them together with a proper keyword-only argument parser so that we are less hacky about this. But if there's a use case that needs this AD formula, let me know so that I can prioritize. :D

@wanchaol (Collaborator):
I don't think we have many use cases for this either. Let's add it together with the keyword argument parser.

@ailzhang (Contributor, Author) commented Feb 8, 2019

@apaszke I added those backward functions since they already existed in the autograd namespace with the correct behavior, so I tried to avoid duplicating implementations.

In general, if we want Python for everything in the future, maybe we can move them after we enrich TorchScript with a larger subset of the Python language (e.g. list comprehensions)?

@ailzhang changed the title from "[jit]maskrcnn & bert AD coverage 1/2" to "[jit]maskrcnn & bert AD coverage part 1" Feb 11, 2019
@facebook-github-bot left a comment: @ailzhang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@zdevito (Contributor) commented Feb 13, 2019

6.277 vs 6.821 is about an 8% slowdown. Is that number stable across runs (in which case it seems bad), or is it just noise? The test of the mean operator should also be compared against the base revision, since we want to know whether adding the aten op regressed backward speed.

@ailzhang (Contributor, Author):
My old benchmarks were run with DEBUG=1 and the timings vary a lot between runs; deleting those. From my current observation with cset, the resnet benchmark is pretty stable and consistent with the master branch.
The JIT run time for the lstm benchmark is volatile even on master: it ranges from 6.0 to 6.3 over 5 runs, with a large variance of 0.26. With this PR it is noticeably a little slower than master, the range moving to 6.1 to 6.4 over 5 runs. With this benchmark, though, it's pretty hard to debug where the slowness comes from.

@facebook-github-bot left a comment: @ailzhang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Feb 14, 2019
Summary: same as the PR description above.
Pull Request resolved: pytorch/pytorch#16689

Differential Revision: D14020806

Pulled By: ailzhang

fbshipit-source-id: a5e2c144a7be5a0d39d7ac5f93cb402ec12503a5
```cpp
auto schema_name_suffix = schema_name.substr(pos + 1);
std::string schema_string = canonicalSchemaString(schema);
if (!schema_name_suffix.empty()
    && schema_name_suffix.find_first_not_of("0123456789") == string::npos) {
```
@sinkingsugar (Contributor):
The std:: before string::npos is missing; this is failing on all my builds :/ Weird that no CI caught this. @ailzhang

@ailzhang (Contributor, Author):
@sinkingsugar Would you mind running collect_env.py (https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py) and letting me know your environment?
This is indeed a typo on my side, but it was hidden by a header file that is included indirectly: https://github.com/pytorch/pytorch/blob/master/c10/util/string_utils.h#L6. We likely want to fix this on our side, but I'm curious why your build failed. Thanks!
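For reference, the corrected check fully qualifies the constant (wrapped in a hypothetical helper purely for illustration; only the std:: qualification is the actual fix):

```cpp
#include <string>

// std::string::npos is spelled out explicitly, so the code no longer
// depends on a using-declaration pulled in through an unrelated header.
static bool isNumericSuffix(const std::string& suffix) {
  return !suffix.empty() &&
      suffix.find_first_not_of("0123456789") == std::string::npos;
}
```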

@ezyang (Contributor):
At a guess, probably some ifdefs toggled differently, causing the header not to be included.

@ailzhang (Contributor, Author):
@ezyang Yeah, agreed. I'd love to know @sinkingsugar's build environment; maybe it's worth adding to our CI?

Contributor:
We can probably figure it out just by tracing the include path ourselves :)

@sinkingsugar (Contributor):
@ailzhang Well, I didn't have time to run the script, but I'll just give you the full Dockerfile 😄
https://gist.github.com/sinkingsugar/81a42c6104bf01efd00e1d4ff9358c9b

@ailzhang (Contributor, Author):
@sinkingsugar Interestingly the Dockerfile built successfully on my end :(
(screenshot: successful build output, 2019-02-20)

@sinkingsugar (Contributor):
Hmm, weird :) I'm pretty sure I gave you the right one! But it's fixed on master now anyway, so all good I suppose!

@ailzhang ailzhang mentioned this pull request Feb 19, 2019
facebook-github-bot pushed a commit that referenced this pull request Feb 21, 2019
Summary:
Adds the missing std:: introduced by #16689. Investigating why this wasn't caught in CI (nor in my local dev environment).
Pull Request resolved: #17263

Reviewed By: ezyang

Differential Revision: D14134556

Pulled By: ailzhang

fbshipit-source-id: 6f0753fa858d3997e654924779646228d6d49838
facebook-github-bot pushed a commit that referenced this pull request Feb 22, 2019
Summary:
This PR stops passing the size of `self` from the forward pass to the backward pass in a few cases where `self` itself is already required in the backward pass. This could be the cause of the potential slowdown in #16689. I will attach a few perf numbers (still a bit volatile between runs, though) in a comment.
Pull Request resolved: #17187

Differential Revision: D14179512

Pulled By: ailzhang

fbshipit-source-id: 5f3b1f6f26a3fef6dec15623b940380cc13656fa
zou3519 added a commit to zou3519/pytorch that referenced this pull request Mar 12, 2019
Summary: same as the PR description above.
facebook-github-bot pushed a commit that referenced this pull request Apr 11, 2019

Summary:
It is done by flattening all tensor lists that are inputs/outputs to the
graph into the inputs/outputs list in the autograd graph.

This is less desirable than simply allowing IValues to exist in the
inputs/outputs of autograd::Function but it is substantially less
intrusive.

CaptureList describes the variables captured for backward in a single class.
UnpackInstructs describes how the flattened inputs to backward are re-packed into lists.
@ailzhang

This PR is also part 2 of covering maskrcnn & bert AD formulas, following #16689.

Ops added in this PR:
```
cat
index
meshgrid
reshape
split
split_with_sizes
stack
unbind
```
I will also add a few perf numbers here.
Pull Request resolved: #16784

Differential Revision: D14104063

Pulled By: ailzhang

fbshipit-source-id: 5ceadadfd67ccaac60c5fd6740786c5354e252b9
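The flattening approach described in that summary can be sketched roughly as follows (illustrative only; the real CaptureList/UnpackInstructs interfaces in #16784 differ):

```cpp
#include <ATen/ATen.h>

#include <cstddef>
#include <vector>

// Flatten a sequence of tensor lists into one flat vector, recording how
// many tensors each original list contributed ("unpack instructions").
struct Flattened {
  std::vector<at::Tensor> flat;
  std::vector<size_t> counts;  // counts[i] == tensors taken from list i
};

Flattened flatten(const std::vector<std::vector<at::Tensor>>& lists) {
  Flattened out;
  for (const auto& list : lists) {
    out.counts.push_back(list.size());
    out.flat.insert(out.flat.end(), list.begin(), list.end());
  }
  return out;
}

// Re-pack the flat vector into lists using the recorded counts, as the
// backward pass must do before invoking the user-visible backward.
std::vector<std::vector<at::Tensor>> unpack(const Flattened& f) {
  std::vector<std::vector<at::Tensor>> lists;
  size_t offset = 0;
  for (size_t n : f.counts) {
    lists.emplace_back(f.flat.begin() + offset, f.flat.begin() + offset + n);
    offset += n;
  }
  return lists;
}
```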
zhangguanheng66 pushed a commit to zhangguanheng66/pytorch that referenced this pull request May 6, 2019
@ezyang added the merged label Jun 25, 2019