Support torch.concat alias, add cat OpInfo & remove OpInfo test_out skips {cat, stack, hstack, vstack, dstack}
#62560
Conversation
💊 CI failures summary and remediations
As of commit 95f9736 (more details on the Dr. CI page):
🕵️ 3 new failures recognized by patterns
The following CI failures do not appear to be due to upstream breakages:
This is cool @AnirudhDagar and the implementation looks correct. This does need to add an OpInfo for torch.cat and then list torch.concat as an alias of it, however, because otherwise the alias isn't tested. I would just extend this PR with the OpInfo. @krshrimali and @kshitij12345 should be able to help you. I may be unavailable for the rest of the week, but if you need help merging this PR you can ping @heitorschueroff and @anjali411 and one of them should be available to help you.
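For reference, a rough sketch of what registering the alias on an OpInfo entry could look like; the sample-inputs helper and its contents are illustrative assumptions, not the code this PR actually adds, and it assumes a PyTorch development checkout where torch.testing._internal is importable:

import torch
from torch.testing._internal.common_methods_invocations import OpInfo, SampleInput

def sample_inputs_cat_concat(op_info, device, dtype, requires_grad, **kwargs):
    # Minimal illustrative sample inputs: pairs of tensors concatenated along dim 0.
    def make(shape):
        return torch.randn(shape, device=device, dtype=dtype, requires_grad=requires_grad)
    return [
        SampleInput((make((2, 3)), make((2, 3))), kwargs=dict(dim=0)),
        SampleInput((make((2, 3)), make((4, 3))), kwargs=dict(dim=0)),
    ]

cat_opinfo = OpInfo(
    'cat',
    aliases=('concat',),  # the alias is then exercised by the generic OpInfo tests
    sample_inputs_func=sample_inputs_cat_concat,
)

Listing the alias on the OpInfo is what lets the shared test machinery (test_out, alias consistency checks, etc.) cover torch.concat without writing separate tests for it.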
Thanks for the review @mruberry! No worries, I'll add the OpInfo.
Sounds good! It'll be great to have more OpInfos!
I've added an OpInfo for cat/concat. A few skips were required though, namely:
Thanks, @AnirudhDagar, for working on this. Looks like a great start; I've left a few suggestions for the sample inputs.
Codecov Report
@@ Coverage Diff @@
## master #62560 +/- ##
===========================================
+ Coverage 47.37% 59.88% +12.50%
===========================================
Files 660 683 +23
Lines 86440 88279 +1839
===========================================
+ Hits 40951 52864 +11913
+ Misses 45489 35415 -10074
@krshrimali thanks for reviewing and sharing useful suggestions.
That's a good idea; I've updated the sample inputs accordingly. I was also able to remove one of the skips. cc @anjali411 @kshitij12345
Title changed from "torch.concat alias to torch.cat" to "torch.concat alias to torch.cat & add cat OpInfo"
I think you can remove this test as it should be covered by test_out in test_ops.py
pytorch/test/test_tensor_creation_ops.py
Lines 711 to 754 in 3df4870
@onlyCUDA
def test_cat_stack_cross_devices(self, device):
    cuda = torch.randn((3, 3), device=device)
    cpu = torch.randn((3, 3), device='cpu')
    out_cpu = cpu.clone()
    out_cuda = cuda.clone()
    with self.assertRaisesRegex(RuntimeError,
                                "Expected all tensors to be on the same device"):
        torch.cat((cuda, cpu))
    with self.assertRaisesRegex(RuntimeError,
                                "Expected all tensors to be on the same device"):
        torch.cat((cpu, cuda))
    with self.assertRaisesRegex(RuntimeError,
                                "Expected all tensors to be on the same device"):
        torch.cat((cpu, cuda), out=out_cuda)
    with self.assertRaisesRegex(RuntimeError,
                                "Expected all tensors to be on the same device"):
        torch.cat((cpu, cpu), out=out_cuda)
    with self.assertRaisesRegex(RuntimeError,
                                "Expected all tensors to be on the same device"):
        torch.cat((cuda, cuda), out=out_cpu)
    # Stack
    with self.assertRaisesRegex(RuntimeError,
                                "Expected all tensors to be on the same device"):
        torch.stack((cuda, cpu))
    with self.assertRaisesRegex(RuntimeError,
                                "Expected all tensors to be on the same device"):
        torch.stack((cpu, cuda))
    with self.assertRaisesRegex(RuntimeError,
                                "Expected all tensors to be on the same device"):
        torch.stack((cpu, cuda), out=out_cuda)
    with self.assertRaisesRegex(RuntimeError,
                                "Expected all tensors to be on the same device"):
        torch.stack((cpu, cpu), out=out_cuda)
    with self.assertRaisesRegex(RuntimeError,
                                "Expected all tensors to be on the same device"):
        torch.stack((cuda, cuda), out=out_cpu)
cc: @mruberry
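For intuition, here is a simplified, hypothetical sketch of the kind of cross-device out= check that the OpInfo-driven test_out performs; this is not the actual test_ops.py code, and the function and variable names are made up:

import torch

def check_out_device_mismatch(op, sample, wrong_device):
    # Call the op normally, then again with an out= tensor allocated on a
    # different device; the op is expected to raise a RuntimeError about
    # tensors not being on the same device.
    expected = op(sample)
    wrong_out = torch.empty(0, device=wrong_device, dtype=expected.dtype)
    try:
        op(sample, out=wrong_out)
    except RuntimeError:
        return True
    return False

# Usage sketch (assumes a CUDA device is available):
# tensors = (torch.randn(3, 3, device='cuda'), torch.randn(3, 3, device='cuda'))
# assert check_out_device_mismatch(torch.cat, tensors, 'cpu')

Because this check runs for every op with an OpInfo entry, a dedicated cross-device test for cat/stack becomes redundant once their OpInfos exist.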
This looks really good, @AnirudhDagar; I have just a couple of questions for your review. Looking forward to hearing your thoughts!
Problem with cat_out memory format behaviour

Replying to this review comment here because it became extremely large while explaining; apologies for that. @mruberry this seems more interesting than what I expected initially. The reason is that the CPU and CUDA implementations of cat pick the memory format of the out= tensor differently when the inputs have mixed memory formats.
So I wrote a separate test (see below) specifically for testing and making sure that the out= memory format behaviour matches between the CPU and CUDA implementations.

The first test has inputs with different memory formats.

Click to expand: FAILING test_cat_out_memory_format

def test_cat_out_memory_format(self, device):
    inp_size = (4, 4, 4, 4)
    x_cuda = torch.randn(inp_size, device=device).contiguous(memory_format=torch.channels_last)
    x_cpu = torch.randn(inp_size, device='cpu').contiguous(memory_format=torch.channels_last)
    y_cuda = torch.randn(inp_size, device=device)
    y_cpu = torch.randn(inp_size, device='cpu')
    # Case 1: if out= is the correct shape then the memory format of out= is respected
    expected_size = (8, 4, 4, 4)
    out_cuda = torch.empty(expected_size, device=device).contiguous(memory_format=torch.contiguous_format)
    res1_cuda = torch.cat((x_cuda, y_cuda), out=out_cuda)
    out_cpu = torch.empty(expected_size, device='cpu').contiguous(memory_format=torch.contiguous_format)
    res1_cpu = torch.cat((x_cpu, y_cpu), out=out_cpu)
    self.assertTrue(res1_cuda.is_contiguous(memory_format=torch.contiguous_format))
    self.assertTrue(res1_cpu.is_contiguous(memory_format=torch.contiguous_format))
    # Case 2: if out= is not the correct shape then the memory format is that of the first tensor
    # output size specifically set wrong so that it is reshaped internally in cat
    out_cuda = torch.empty((0), device=device).contiguous(memory_format=torch.contiguous_format)
    res2_cuda = torch.cat((x_cuda, y_cuda), out=out_cuda)
    out_cpu = torch.empty((0), device='cpu').contiguous(memory_format=torch.contiguous_format)
    res2_cpu = torch.cat((x_cpu, y_cpu), out=out_cpu)
    self.assertTrue(res2_cuda.is_contiguous(memory_format=torch.channels_last))
    self.assertTrue(res2_cpu.is_contiguous(memory_format=torch.channels_last))

This one has inputs with the same memory_format.

Click to expand: PASSING test_cat_out_memory_format

def test_cat_out_memory_format(self, device):
    inp_size = (4, 4, 4, 4)
    x_cuda = torch.randn(inp_size, device=device).contiguous(memory_format=torch.channels_last)
    x_cpu = torch.randn(inp_size, device='cpu').contiguous(memory_format=torch.channels_last)
    # Making sure all input tensors follow the same memory_format makes the whole test happy again
    y_cuda = torch.randn(inp_size, device=device).contiguous(memory_format=torch.channels_last)
    y_cpu = torch.randn(inp_size, device='cpu').contiguous(memory_format=torch.channels_last)
    # Case 1: if out= is the correct shape then the memory format of out= is respected
    expected_size = (8, 4, 4, 4)
    out_cuda = torch.empty(expected_size, device=device).contiguous(memory_format=torch.contiguous_format)
    res1_cuda = torch.cat((x_cuda, y_cuda), out=out_cuda)
    out_cpu = torch.empty(expected_size, device='cpu').contiguous(memory_format=torch.contiguous_format)
    res1_cpu = torch.cat((x_cpu, y_cpu), out=out_cpu)
    self.assertTrue(res1_cuda.is_contiguous(memory_format=torch.contiguous_format))
    self.assertTrue(res1_cpu.is_contiguous(memory_format=torch.contiguous_format))
    # Case 2: if out= is not the correct shape then the memory format is that of the first tensor
    # output size specifically set wrong so that it is reshaped internally in cat
    out_cuda = torch.empty((0), device=device).contiguous(memory_format=torch.contiguous_format)
    res2_cuda = torch.cat((x_cuda, y_cuda), out=out_cuda)
    out_cpu = torch.empty((0), device='cpu').contiguous(memory_format=torch.contiguous_format)
    res2_cpu = torch.cat((x_cpu, y_cpu), out=out_cpu)
    self.assertTrue(res2_cuda.is_contiguous(memory_format=torch.channels_last))
    self.assertTrue(res2_cpu.is_contiguous(memory_format=torch.channels_last))

My Opinion

We should fix this behaviour and make both the CUDA and CPU variants behave the same way. We'll need to decide which behaviour we go ahead with. I believe all this (including the tests to be added) should be carried out in a separate issue and PR, because this might be BC-breaking. If you feel the same, I can file an issue and send a subsequent PR for this later.

Ps. I know that … Please let me know if something is unclear.

Additional Comments

@mruberry @kshitij12345 when digging deeper into memory_format, I found that PyTorch supports three of these in total, other than the default torch.contiguous_format.
We should add these memory formats to the documentation; I can add this to the docs in a separate PR.
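For illustration, here is a small self-contained snippet showing those memory formats on the tensor ranks that support them; this is background only, not code from the PR:

import torch

# Besides the default torch.contiguous_format, PyTorch exposes
# torch.channels_last (4-D tensors), torch.channels_last_3d (5-D tensors),
# and torch.preserve_format (a hint for ops like clone/to, not a layout).
x = torch.randn(2, 3, 4, 5)                                    # contiguous NCHW
nhwc = x.contiguous(memory_format=torch.channels_last)         # NHWC strides
ndhwc = torch.randn(2, 3, 4, 5, 6).contiguous(memory_format=torch.channels_last_3d)

assert x.is_contiguous()
assert nhwc.is_contiguous(memory_format=torch.channels_last)
assert ndhwc.is_contiguous(memory_format=torch.channels_last_3d)

# preserve_format keeps whatever layout the source tensor already has.
y = nhwc.clone(memory_format=torch.preserve_format)
assert y.is_contiguous(memory_format=torch.channels_last)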
I had a chance to sync with @ngimel, @kimishpatel, and @VitalyFedyunin, who previously worked on the out= behavior. I'm worried about changing the behavior so close to our PyTorch 1.10 branch cut, so I'd like to propose the following:
Then...
How does that sound, @AnirudhDagar? Great discovery!
Thanks, @mruberry, for the discussion on this. I absolutely agree with everything you mentioned; it is definitely a good idea to hold this until after the 1.10 branch cut. Just a small question:
Should we add this test here, or in the PR which will finally implement the consistent behaviour after the 1.10 cut? I believe the test has nothing to do with this PR in particular, so we could add it later as well. If we do add it now, then once the behaviour is changed the other PR will also include the relevant updates to the test. Other than this test (which we might want to add or leave out here), everything is ready for this PR. Ps. I already have the test ready to go: the second code snippet ("Click to expand: PASSING test_cat_out_memory_format") in my last comment. It is just a matter of adding it here or leaving it for later.
I'd prefer adding the test in this PR (it's pretty simple) just to make the current behavior clear (the test can also link to the relevant issue), but it's OK; we can follow up on it separately as long as it's clear the behavior isn't changing in this PR.
@mruberry I've added the test in my recent commits. Let me know if that looks good or needs any improvement. I've also raised issue #63998 describing the inconsistent behaviour. Ps. I know that …
self.assertTrue(res1_cuda.is_contiguous(memory_format=torch.contiguous_format))
self.assertTrue(res1_cpu.is_contiguous(memory_format=torch.contiguous_format))

# Case 2: if out= is not the correct shape then the output is resized internally
Really nice comments
Nice work, @AnirudhDagar! This was a very challenging PR because it required understanding the new out= behavior, identifying a very subtle discrepancy between cat's CPU and CUDA implementations, and then updating the existing code and tests to account for that.
@krshrimali would you like to take another look?
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
LGTM, great work @AnirudhDagar 🎉 Thanks!
Fixes #61767
Changes
- Add torch.concat alias to torch.cat
- Remove test_out skips (use at::native::resize_output or at::native::resize_output_check) for:
  - cat/concat
  - stack
  - hstack
  - dstack
  - vstack/row_stack
  - cat/stack

I've not added cat/concat to the OpInfo op_db yet, since cat is a little more tricky than other OpInfos (it should have a lot of tests) and currently there are no OpInfos for that. I can try to add that in a subsequent PR or maybe here itself, whatever is suggested.

Edit: the cat/concat OpInfo has been added.
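For readers unfamiliar with the alias, a minimal usage sketch (not taken from the PR's tests) showing that torch.concat behaves exactly like torch.cat:

import torch

a = torch.randn(2, 3)
b = torch.randn(2, 3)

# torch.concat is an alias of torch.cat, so both accept the same
# dim= and out= arguments and produce identical results.
assert torch.equal(torch.cat((a, b), dim=0), torch.concat((a, b), dim=0))

out = torch.empty(4, 3)
torch.concat((a, b), dim=0, out=out)
assert torch.equal(out, torch.cat((a, b), dim=0))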
Note: I've added named tensor support for the concat alias as well; maybe that's out of spec in the array-api, but it is still useful for consistency in PyTorch.

Thanks to @krshrimali for guidance on my first PR :))
cc @mruberry @rgommers @pmeier @asmeurer @leofang @AnirudhDagar @asi1024 @emcastillo @kmaehashi @heitorschueroff @krshrimali