Generalize catArray for contiguous inputs and dim != 0 #17032
Conversation
Do our tests explicitly exercise this branch? If not, please add.

I believe this test exercises the branch: https://github.com/pytorch/pytorch/blob/master/test/test_torch.py#L4203
EDIT: looks like the inputs may be noncontiguous, let me dig deeper.
EDIT 2: I put a print in the branch and ran that test, and it printed out, so it looks like the branch is tested.

Looks like it. Maybe double-check that the edge case you mentioned is covered as well. In general we prefer to avoid modifying TH and instead port the function over to ATen. Could I ask you to spend a bit of time seeing how feasible that is?

I have a slight preference not to port this at the same time as we make changes. It's harder to review, and it's less obvious where the problem is if there's a bug report.
```
int64_t outer = 1, inner = 1;

// Outer is the product of dimensions from the left up to (and not
// including the concatenation dimension). This becomes the number of times
```
nit: you want the ')' after including.
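For illustration, here is a minimal Python sketch of the outer/inner decomposition that comment describes, assuming all inputs are contiguous, share a dtype, and `dim` is non-negative. The function and variable names are hypothetical and this is not the actual TH implementation:

```
import torch

def cat_contiguous_sketch(tensors, dim):
    # Illustrative sketch of the contiguous fast path: concatenate along `dim`
    # using `outer` block copies per input instead of element-by-element loops.
    assert all(t.is_contiguous() for t in tensors)
    out_shape = list(tensors[0].shape)
    out_shape[dim] = sum(t.shape[dim] for t in tensors)
    out = torch.empty(out_shape, dtype=tensors[0].dtype)

    # outer: product of sizes to the left of `dim` -> number of copies per input
    outer = 1
    for s in out_shape[:dim]:
        outer *= s

    out_flat = out.view(outer, -1)   # each row is one "outer" block of the output
    offset = 0
    for t in tensors:
        inner = t.numel() // outer   # elements each input contributes per block
        # In the C fast path, this slice assignment corresponds to `outer`
        # memcpy calls of `inner * elementSize` bytes each.
        out_flat[:, offset:offset + inner] = t.view(outer, inner)
        offset += inner
    return out
```

For the benchmark shapes in the summary below (`torch.rand(6, 2, 1024)` concatenated along `dim=1`), `outer` is 6 and each input contributes `inner = 2 * 1024` elements per block, so the output is filled with 5 × 6 block copies rather than per-element loops.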
@jamesr66a is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary:
I noticed that we were sinking a lot of time into `cat` operations in machine translation on CPU, and drilled down to find that we were doing the cat element by element, even though all the inputs were contiguous. The reason was that we were doing the cat along a dimension other than 0, which prevented use of the fast `memcpy` branch. This PR generalizes that branch.
Quick benchmark script:
```
import torch, time
tensors = [torch.rand(6, 2, 1024) for i in range(5)]
NITER = 1000
s = time.time()
for i in range(NITER):
    torch.cat(tensors, dim=1)
print('time per iter ', (time.time() - s) / NITER)
```
Before:
```
time per iter 8.089399337768554e-05
```
After:
```
time per iter 2.183413505554199e-05
```
Pull Request resolved: pytorch/pytorch#17032
Differential Revision: D14090038
Pulled By: jamesr66a
fbshipit-source-id: 2c733a84915896008ac95f2233f44894bd2573de
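As a quick, illustrative sanity check that the fast path preserves semantics for contiguous inputs and `dim != 0` (not part of the PR's test suite; shapes match the benchmark above):

```
import torch

tensors = [torch.rand(6, 2, 1024) for _ in range(5)]
out = torch.cat(tensors, dim=1)
# Slicing the result back out along dim=1 should recover each input exactly.
for i, t in enumerate(tensors):
    assert torch.equal(out[:, 2 * i:2 * (i + 1), :], t)
```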