[NT] Backward support for broadcasting binary ops #112519

Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112519
Note: Links to docs will display an error until the doc builds have completed.
✅ No Failures as of commit 08d2e23 with merge base 1855153. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Thanks for the PR! Got some minor comments and questions.
torch/nested/_internal/ops.py (outdated)

    # sum_dim_IntList can produce a NT or a T depending on whether the ragged dim
    # is summed over
Hm, I'm a little wary of this, although it does make sense. I believe this is the first op operating on NTs that can produce a dense T. In general, we should have consistent semantics for ops that reduce out the raggedness, whichever way we decide. @cpuhrsch any opinion on whether conditionally returning a dense T here is the way to go?
If we return a T here and the user wants to go back to NT land, a non-copying as_nested_tensor(t) seems useful.
I think returning a plain T has more known benefits, and it's not clear that returning NT today would give us more flexibility with respect to BC either. If we return NT today and make users grab the values explicitly, that would no longer work if we choose to return T later.
Pros:
- avoids having additional state on NT to track whether we are a dense NT or not
- less additional logic within autograd to handle the conversion from dense NT to dense
- users would not have to manually convert the dense NT back to dense to avoid subclass overhead
Cons:
- some ops on NT unexpectedly(?) return non-NTs. I'm not sure how problematic that actually is; on the NT front, as long as a dense T can freely promote back to NT, doing any NT-related ops with this dense T should not be problematic (see the sketch below).
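A minimal sketch of the semantics under discussion, assuming the jagged-layout NT constructor (`torch.nested.nested_tensor(..., layout=torch.jagged)`); exact op coverage may differ across versions:

```python
import torch

# Two variable-length sequences with a common feature dim of 5.
nt = torch.nested.nested_tensor(
    [torch.randn(2, 5), torch.randn(3, 5)],
    layout=torch.jagged,
)  # logical shape: (B=2, j0, 5)

# Summing over the ragged dim reduces the raggedness away, so under the
# proposal discussed here the result is a plain dense tensor of shape (2, 5).
dense = nt.sum(dim=1)
print(type(dense), dense.shape)

# Summing over a non-ragged dim keeps the raggedness, so an NT comes back.
still_nt = nt.sum(dim=2)
print(type(still_nt))
```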
cc @albanD
From offline discussion @albanD thinks this seems fine as long as there are no silent correctness issues.
Cool, I'm good with it then :) I'm finishing up an impl of as_nested_tensor(t), so that should help as well.
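For the "go back to NT land" direction, a small sketch assuming the `torch.nested.as_nested_tensor` API referenced above (with the jagged layout, this conversion is intended to be a non-copying view, per the discussion):

```python
import torch

# A dense tensor where dim 0 plays the role of the batch dim.
t = torch.randn(4, 3, 5)

# View it as a nested tensor again (uniform lengths; no copy is expected
# for the jagged layout, per the discussion above).
nt = torch.nested.as_nested_tensor(t, layout=torch.jagged)
print(nt.is_nested, nt.size(0))  # True 4
```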
torch/nested/_internal/ops.py (outdated)

    t_shape = t.shape
    extra = 0
    for s in t_shape:
        if s == 1:
            if t.dim() > nt.dim():
                extra += 1
            t = t.squeeze(0)
    return t, extra
does this all work within torch.compile without unwanted guards?
Hm let me check
I do think we do want the extra guards here.
hm okay, which guards show up?
> I do think we do want the extra guards here.

Whoops, this is actually wrong. What is happening is that there ARE always guards testing whether the given inputs are zero/one (due to zero-one specialization), which is indeed what we want; and as a result, there are also no "extra" guards from this s == 1 test here.
So there are actually two cases: (1) the case where these inputs are also inputs to the entire program, and (2) the case where they are not.
In (1), what I wrote about zero-one specialization holds. Otherwise, the "one" would have needed to be created somewhere, and likely not as a symint, so there are no extra guards either way.
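As an illustration of the zero/one specialization being described (a hedged sketch, not the PR's code): under `torch.compile` with dynamic shapes, sizes 0 and 1 are specialized, so a `size == 1` test is resolved at compile time per size class (at the cost of a recompile when the size class changes) rather than adding a new symbolic guard.

```python
import torch

def maybe_squeeze_leading_one(x):
    # The `== 1` check mirrors the `s == 1` test discussed above; with
    # 0/1 specialization it is decided at compile time per size class.
    if x.shape[0] == 1:
        return x.squeeze(0)
    return x

compiled = torch.compile(maybe_squeeze_leading_one, dynamic=True)
print(compiled(torch.randn(1, 4)).shape)  # torch.Size([4])
print(compiled(torch.randn(3, 4)).shape)  # torch.Size([3, 4])
```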
torch/nested/_internal/ops.py (outdated)

    # ex: (B, j0, ?, ?) + (1, 1, ?, ?) -> (B, j0, ?, ?)
    nt, t = (a, b) if a_is_nt else (b, a)
    # See Note: [ Squeezing leading ones ]
    t_squeezed, extra = squeeze_leading_ones_get_extra(t, nt)
dumb Q: can you explain a bit more the purpose behind this change? I'm unclear as to what it's solving in the context of this PR
Previously we were too relaxed when checking whether the pre-existing broadcasting logic is valid for the given NTs. E.g. someone could try JT: [B, *, D] + T: [sum(*), D] and still go through the easy path and produce some output, even though broadcasting didn't really make sense in the first place.
What we want to do is make sure that the JT has a dim that is two greater than that of the other tensor, so that we have something like (B, j0, a0, ..., an) + (b0, ..., bm) where m < n, and B and j0 are always broadcast over uniformly (as in, every value in a given batch acquires the same values) during broadcasting. This is important because otherwise we would fall into the unbind case below.
We want this squeeze_leading_ones_get_extra helper because naively checking nt.dim() > t.dim() + 2 doesn't quite work: t might have leading ones in its shape that shouldn't disqualify it from the easy case.
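To make the shape rule concrete, here is a hypothetical standalone mirror of the check (plain Python over shape tuples; an illustration only, not the PR's actual `squeeze_leading_ones_get_extra` helper):

```python
def qualifies_for_easy_broadcast(nt_shape, t_shape):
    # Drop leading ones on the dense operand; they broadcast trivially and
    # should not disqualify the easy path.
    while len(t_shape) > 0 and t_shape[0] == 1:
        t_shape = t_shape[1:]
    # Easy path: the NJT must have at least two more dims than the squeezed
    # dense operand, so that (B, j0) are only ever broadcast over uniformly.
    return len(nt_shape) >= len(t_shape) + 2

# (B, j0, D) + (1, 1, D): leading ones squeeze away -> easy path
print(qualifies_for_easy_broadcast(("B", "j0", "D"), (1, 1, 8)))  # True
# (B, j0, D) + (sum(j0), D): would need per-batch handling -> not the easy path
print(qualifies_for_easy_broadcast(("B", "j0", "D"), (12, 8)))    # False
```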
cool, thanks for the explanation :)
Note that if we see t.dim() > nt.dim(), we know we're in an unsupported case, even if t is all ones. If we bail out early for this, it might be a little more readable.
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 1 check: trunk / win-vs2019-cuda11.8-py3 / build. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch#112519 Approved by: https://github.com/jbschlosser ghstack dependencies: pytorch#113031
Fixes #112845 Pull Request resolved: #113091 Approved by: https://github.com/jbschlosser ghstack dependencies: #113031, #112519
…ded_tensor msg (pytorch#113162) Improvements: improves to_padded_tensor error message when passed a NT with zero numel Pull Request resolved: pytorch#113162 Approved by: https://github.com/jbschlosser ghstack dependencies: pytorch#113031, pytorch#112519, pytorch#113091
This PR solves two problems with `sum()` support in NJT:
* `sum()` over a dim with `keepdim=True` returns the wrong shape (i.e. it'll keep the wrong dim). This is a long-standing bug from way back in #112519.
* Historically, we've only supported `sum()` over a dim and not a full reduction. This PR adds the full reduction form (forward only, backward still fails).

Pull Request resolved: #131945 Approved by: https://github.com/davidberard98, https://github.com/jananisriram
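A small sketch of the intended `keepdim` behavior described here (assuming the jagged-layout NT constructor; op coverage varies by version):

```python
import torch

nt = torch.nested.nested_tensor(
    [torch.randn(2, 5), torch.randn(3, 5)],
    layout=torch.jagged,
)  # logical shape: (B=2, j0, 5)

# Reducing over the last (non-ragged) dim with keepdim=True should keep
# that dim as size 1: (B, j0, 5) -> (B, j0, 1).
out = nt.sum(dim=2, keepdim=True)
print(out.is_nested, out.size(-1))  # True 1
```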