[JIT][AD]matmul/dropout by ailzhang · Pull Request #17523 · pytorch/pytorch · GitHub

Conversation

@ailzhang
Contributor

@ailzhang commented Feb 27, 2019

```
dim = 3
x = range(dim)
# x = [0, 1, 2] and is List[int]
```
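For reference, a minimal runnable sketch of using range inside TorchScript (the function name sum_to and the torch.jit.script wrapper are illustrative, not from the PR; the variable-binding form shown above is what prim::range adds):

```python
import torch

@torch.jit.script
def sum_to(dim: int) -> int:
    total = 0
    # range() driving a scripted for-loop; with this PR, range(dim) can also
    # be bound to a variable as a List[int] ([0, 1, 2] for dim = 3).
    for i in range(dim):
        total += i
    return total

print(sum_to(3))  # 0 + 1 + 2 == 3
```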

@facebook-github-bot added the oncall: jit label Feb 27, 2019
@ailzhang requested review from apaszke and zdevito February 27, 2019 05:05
Contributor

@zdevito left a comment


The non-convolution changes look good. In person: let's merge the non-convolution parts of this PR but hold off on changing _convolution. It is not clear that dropping the really complicated _convolution op into the differentiation will allow any backends to do meaningful optimization.

// impl_index 6 - 12 are in _convolution_nogroup
auto returned_tuple = at::_convolution_nogroup(
    input, weight, bias, params.stride, params.padding, params.dilation,
    params.transposed, params.output_padding);
output = std::get<0>(returned_tuple);
Contributor


std::tie would be nicer

Contributor

@facebook-github-bot left a comment


@ailzhang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ailzhang changed the title from [JIT][AD]convolution/matmul/dropout to [JIT][AD]matmul/dropout Feb 28, 2019
zdevito pushed a commit to zdevito/ATen that referenced this pull request Feb 28, 2019
Summary:
* Add AD formula for _convolution & matmul & dropout
* add prim::range, fixes #17483
Example:
```
dim = 3
x = range(dim)
```
Pull Request resolved: pytorch/pytorch#17523

Differential Revision: D14254002

Pulled By: ailzhang

fbshipit-source-id: ba60d77b047db347929b72beca2623fb26aec957

if (n->matches(
        "aten::dropout(Tensor input, float p, bool train) -> Tensor")) {
  auto train = n->get<bool>(attr::train).value();
Contributor


You never checked that this is a constant. If train is a computed value, this will throw. You have to do n->matches("...", attr::train) above.

Contributor


also nit: no need to assign to a local just to return

};
}),
Operator(
"prim::range(int n) -> int[]",
Contributor


Hmm I'm a bit ambivalent about adding this, since those objects have completely different characteristics in Python.

for i in range(2 ** 64):
    ...

is fine in Python 3, but we're using Python 2 semantics in TorchScript? That doesn't seem like a good choice.
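For context, a plain-Python sketch of the semantic difference being pointed out (the 2 ** 64 bound is taken from the snippet above; the 10 ** 6 bound is just an illustrative size):

```python
import sys

# Python 3: range() is a lazy sequence that stores only start/stop/step,
# so a huge bound costs almost nothing until you iterate over it.
lazy = range(2 ** 64)
print(sys.getsizeof(lazy))    # a few dozen bytes, independent of the bound

# Python 2's range() (and a prim::range returning int[]) materializes the
# whole list up front, so the same 2 ** 64 bound would exhaust memory.
eager = list(range(10 ** 6))  # memory grows linearly with the bound
print(len(eager))
```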

Contributor Author


Yeah, I wasn't sure about this, but it seems very useful at the moment since it saves me from writing unnecessary for loops in TorchScript.
An alternative I thought about was list comprehensions, which I think are also super useful in general, but I went for prim::range because it's quicker to implement. Possibly we could leave this feature undocumented and remove it once list comprehensions are ready?

Contributor


I'm fine with adding a special primitive, just don't call it range. If we use Python names we should match Python semantics.

Contributor Author


Yeah, that makes sense, I will rename it in a follow-up PR. :) Thanks!

train: bool):
mask = torch.empty_like(input)
mask.bernoulli_(1 - p)
res = mask * input / (1.0 - p)
Contributor


Why are we replacing the forward code with this? It's not necessary, and avoiding it might open up more optimization opportunities and is less bug-prone.

Contributor Author


When I wrote this I had two options:

I think the design question here is: do we want to keep the AD formula as close as possible to its ATen counterpart (for consistency and easier future deduplication), or do we want to keep the AD formula math-only and simple? In some cases our ATen functions consider backend-specific paths, special sizes, etc., and I'm not certain whether we'd want to include those in the AD formula. Let me know your thoughts on this; I'd love to have some guidelines on it. Thanks!

Contributor


Fair enough, I forgot that you need the mask! Let's keep it like that.
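For context, a minimal plain-PyTorch sketch of the inverted-dropout formula under discussion and why the sampled mask has to be available to the backward (an illustration of the math, not the actual symbolic_script code; the helper names are made up):

```python
import torch

def dropout_fwd(input: torch.Tensor, p: float):
    # Inverted dropout: zero elements with probability p and rescale the
    # survivors by 1 / (1 - p) so the expected value is unchanged.
    mask = torch.empty_like(input).bernoulli_(1 - p)
    return mask * input / (1.0 - p), mask

def dropout_bwd(grad_output: torch.Tensor, mask: torch.Tensor, p: float):
    # The backward reuses the exact mask sampled in the forward, which is
    # why the AD formula re-implements the forward instead of calling
    # aten::dropout (which does not expose its mask).
    return grad_output * mask / (1.0 - p)

x = torch.randn(4)
out, mask = dropout_fwd(x, 0.5)
grad_in = dropout_bwd(torch.ones_like(out), mask, 0.5)
```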


  auto element = getBoolItem(list->elements(), idx);
- push(stack, std::move(element));
+ push(stack, element);
Contributor


Why did we lose the move here? Seems like a regression?

Contributor Author


Hmm, IIRC this was done by the clang-tidy hook. It suggested that moving a bool doesn't make a perf difference, and I thought that made sense. I could bring it back if it might cause a regression.

Contributor


Ohhh, I hadn't noticed it's a bool. Yeah, in that case it shouldn't be there.

@apaszke
Contributor

apaszke commented Mar 2, 2019

@zdevito FWIW adding convolution might still be nice because we could wrap the whole CNN in a single DifferentiableGraph and save up on constructing autograd graphs (assuming our DifferentiableGraphFunction is fast enough). But it's a lot of work.

facebook-github-bot pushed a commit that referenced this pull request Mar 6, 2019
Summary:
fixes #17669
Address apaszke 's comments in #17523
Pull Request resolved: #17691

Differential Revision: D14328083

Pulled By: ailzhang

fbshipit-source-id: 9ec4a54f13bfd1aaf4b1821dd00c31793ac07a44

Labels

oncall: jit Add this issue/PR to JIT oncall triage queue


Development

Successfully merging this pull request may close these issues.

[JIT] support range of int

5 participants