Initial vmap + NT support with unbind fallback by jbschlosser · Pull Request #106786 · pytorch/pytorch · GitHub

Conversation

@jbschlosser
Contributor

@jbschlosser jbschlosser commented Aug 8, 2023

Stack from ghstack (oldest at bottom):

PoC demonstrating vmap + NT based on the [design doc](https://docs.google.com/document/d/1dVVk6TOqz93PLTIneU2T3xaxCs9qZ0MaJyCvOAp_bC0). This PR:

  • Allows `BatchedTensorImpl`s to contain NTs
  • Introduces a `BatchedNestedTensor` dispatch key for NT-specific batching rules
  • Provides a batching rule fallback that unbinds the NTs -> performs the computation on the constituents -> rebinds the results into an NT (a minimal sketch follows this description)

Restrictions:

  • Only supports one level of vmap
  • Only supports vmapping over dim=0 for NTs
    • For operations with mixed NT / dense inputs, support is also limited to dim=0 for the dense inputs

[ghstack-poisoned]
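To make the fallback concrete, here is a minimal Python-level sketch of the unbind -> compute -> rebind strategy described above. This is not the PR's implementation (which lives in the C++ batching machinery); `unbind_fallback` is a hypothetical name used only for illustration:

import torch
from torch.func import vmap

# Hypothetical illustration of the fallback: unbind the NT into its
# constituents, run the op on each one, then rebind the results into a new NT.
def unbind_fallback(op, nt, *args, **kwargs):
    results = [op(t, *args, **kwargs) for t in nt.unbind()]
    return torch.nested.nested_tensor(results)

nt = torch.nested.nested_tensor([torch.randn(2, 5), torch.randn(3, 5)])
out = unbind_fallback(torch.nn.functional.relu, nt)

# With this PR applied, vmapping over dim=0 of an NT is intended to behave
# equivalently (subject to the restrictions listed above):
out_vmap = vmap(torch.nn.functional.relu)(nt)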
@pytorch-bot

pytorch-bot bot commented Aug 8, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106786

Note: Links to docs will display an error until the docs builds have been completed.

❌ 14 New Failures, 1 Unrelated Failure

As of commit c53b848 with merge base e58d3ed:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jbschlosser added a commit that referenced this pull request Aug 8, 2023
ghstack-source-id: a18e4b6
Pull Request resolved: #106786
@ezyang
Contributor

ezyang commented Aug 8, 2023

Is this enough to make SAM work?

@ezyang ezyang requested a review from zou3519 August 8, 2023 15:52
Contributor

@zou3519 zou3519 left a comment


This looks pretty good. Next steps sound like we should try to get a batching rule working?

If we want to turn this PoC into more than a PoC, then at some point we should:

  • figure out what limitations we're putting on vmap + nestedtensor and then raise error messages when we go beyond them. E.g. maybe we only support a single vmap on a nestedtensor that has a single batch dimension (see the sketch below).
  • figure out if we're banning size/strides/etc. correctly on BatchedTensor(NestedTensor)
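A hedged sketch of the kind of restriction check being suggested (the function name and signature below are made up for illustration; any real checks would live in the PR's C++ batching logic):

import torch

# Hypothetical guard, not part of this PR: reject cases outside the
# supported vmap + nested tensor envelope.
def check_nt_vmap_supported(t: torch.Tensor, in_dim: int, vmap_level: int) -> None:
    if not t.is_nested:
        return
    if vmap_level > 1:
        raise RuntimeError("nested tensors currently support only a single level of vmap")
    if in_dim != 0:
        raise RuntimeError("vmap over nested tensors is only supported for in_dims=0")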

@jbschlosser
Contributor Author

jbschlosser commented Aug 8, 2023

Is this enough to make SAM work?

Almost, wrapping this up

Edit: no (internal link)

@zou3519 zou3519 requested a review from kshitij12345 August 10, 2023 15:05
@jbschlosser
Contributor Author

@zou3519 I think there's just one thing outstanding wrt error messages: #106786 (comment)

the error messages after we change to sizes_custom/strides_custom for the non-nestedtensor case. Ideally these would be the same before/after this PR

AFAICT these are the same. After my changes, the error message for this example rightfully mentions 2D bounds (inside vmap, the batch dim is hidden, so the (2, 3, 4) input presents as 2D):

import torch
from torch.func import vmap

def f(x):
    x.size(5)
    return x

x = torch.randn(2, 3, 4)

# IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 5)
output = vmap(f)(x)
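For comparison (not part of the PR, just regular eager behavior): the same call outside of vmap, on the full 3D tensor, reports 3D bounds, since no batch dim is hidden:

import torch

x = torch.randn(2, 3, 4)

# IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 5)
x.size(5)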

@zou3519
Contributor

zou3519 commented Aug 30, 2023

AFAICT these are the same. After my changes, the error message for this example rightfully mentions 2D bounds:

Thanks for checking. I think we're good here then, let me give this a quick re-read

Contributor

@zou3519 zou3519 left a comment


LGTM, thank you!

@jbschlosser
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 31, 2023
@jbschlosser jbschlosser added topic: not user facing topic category release notes: nested tensor Changes that have a direct impact on nested tensors labels Aug 31, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team (raised by workflow job)

Failing merge rule: Core Maintainers

@jbschlosser
Contributor Author

@pytorchbot merge -f "ignore spurious failure"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x b9c5dfbea029058de432735fa801da06c081ca41 returned non-zero exit code 1

Auto-merging aten/src/ATen/functorch/BatchedTensorImpl.cpp
CONFLICT (content): Merge conflict in aten/src/ATen/functorch/BatchedTensorImpl.cpp
Auto-merging aten/src/ATen/functorch/BatchedTensorImpl.h
Auto-merging test/functorch/test_vmap.py
Auto-merging torch/csrc/functorch/init.cpp
Auto-merging torchgen/model.py
error: could not apply b9c5dfbea02... Initial vmap + NT support with unbind fallback
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
Details for Dev Infra team (raised by workflow job)

@jbschlosser
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@facebook-github-bot facebook-github-bot deleted the gh/jbschlosser/87/head branch September 10, 2023 14:22