fix correctness for dynamo inlining RangeVariable __contains__ #122751

bdhirsh · 2024-03-27T00:03:42Z

It looks like `iter_contains()` in dynamo expects to take in something like `iter_contains(List[VariableTracker], VariableTracker)`. Previously, when we called this function where the list in question was a `RangeVariable`, we would pass in `RangeVariable.items` as our list.

This is wrong, though, since `RangeVariable.items` just contains the underlying `[start, stop, step]`. It looks like `unpack_var_sequence` does the right thing of "materializing" the range into a list of `VariableTracker`s, so I used that instead.
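To make the bug concrete, here is a toy model in plain Python. Every class and function below is a hypothetical, heavily simplified stand-in (the real `VariableTracker` classes live under `torch/_dynamo/variables/` and take different constructor arguments); only the names mirror the description above. Searching `items` compares against the range's arguments, while searching the result of `unpack_var_sequence` compares against its elements.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical, heavily simplified stand-ins for dynamo's VariableTracker
# subclasses. The real classes take different constructor arguments.
@dataclass
class ConstantVariable:
    value: int


class RangeVariable:
    def __init__(self, start: int, stop: int, step: int = 1) -> None:
        # As described above, `items` holds the range's arguments,
        # not its elements.
        self.items = [
            ConstantVariable(start),
            ConstantVariable(stop),
            ConstantVariable(step),
        ]

    def unpack_var_sequence(self) -> List[ConstantVariable]:
        # "Materialize" the range: one tracker per element.
        start, stop, step = (item.value for item in self.items)
        return [ConstantVariable(i) for i in range(start, stop, step)]


def iter_contains(items: List[ConstantVariable], search: ConstantVariable) -> bool:
    # Simplified membership test: compares constant values only.
    return any(item.value == search.value for item in items)


r = RangeVariable(1, 3)  # models range(1, 3), whose elements are [1, 2]
needle = ConstantVariable(3)

buggy = iter_contains(r.items, needle)                  # searches [1, 3, 1]
fixed = iter_contains(r.unpack_var_sequence(), needle)  # searches [1, 2]
print(buggy, fixed, 3 in range(1, 3))  # True False False
```

In this toy model, the old path answers `True` for `3 in range(1, 3)` (3 is the `stop` argument) and `False` for `2 in range(1, 3)`, both the opposite of what Python itself returns.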

Stack from ghstack (oldest at bottom):

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang

[ghstack-poisoned]

pytorch-bot · 2024-03-27T00:03:45Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/122751

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 81edfe3 with merge base 69c6e0b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 854d9e1 Pull Request resolved: #122751

bdhirsh · 2024-03-27T01:32:22Z

test/dynamo/test_repros.py

```diff
+    def test_contains_range_constprop(self):
+        def fn(x):
+            # dynamo should const prop to False
```

whoops, True*

…ns__" Fixes #122379

It looks like `iter_contains()` in dynamo expects to take in something like `iter_contains(List[VariableTracker], VariableTracker)`. Previously, when we called this function where the list in question was a `RangeVariable`, we would pass in `RangeVariable.items` as our list. This is wrong, though, since `RangeVariable.items` just contains the underlying [start, stop, step]. It looks like `unpack_var_sequence` does the right thing of "materializing" the range into a list of `VariableTracker`s, so I used that instead.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng chauhang

[ghstack-poisoned]

@voznesenskym

Fixes #123298

I was also seeing some crashes in torchtrain due to dynamic shapes, even when I set `compile(dynamic=False)` (cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @wanchaol). This doesn't fix the underlying dynamic shape issues with compile + DTensor, but it does prevent dynamic shapes from leaking in.

Pull Request resolved: #123348
Approved by: https://github.com/ezyang
ghstack dependencies: #122502, #122751

@anijain2305

…#123347) Fixes #122459, pytorch/torchtitan#61

Even with the previous PR ("support DTensor/subclass constructors directly in the graph"), I still see some errors when running the repro above, starting with some logs showing that dynamo is inlining `__new__`. I noticed that putting `@torch._dynamo.disable` on DTensor's `__new__` makes the entire repro pass.

Why does having dynamo try to inline `Subclass.__new__` run into problems? Morally, dynamo probably shouldn't be inlining `__new__` ("creating a subclass" is a black-box operation that AOTAutograd can trace through anyway). But concretely, we can end up with a node in the dynamo FX graph that has a "partially initialized tensor subclass" as its example value, because the subclass has been created but its fields have not been assigned yet. This breaks a bunch of invariants throughout dynamo: there are many places where, if we have a tensor subclass node, we want to look at its inner tensors to see whether they are FakeTensors, what their FakeTensorMode is, and whether they have dynamic shapes.

One option is to decide that "uninitialized subclass" is a first-class thing that anyone looking at the example values of FX nodes on the dynamo graph needs to handle, but this seems like a lot of work when in reality we don't need dynamo to trace `__new__` at all. Hence the `torch._dynamo.disable`.

I still wasn't very satisfied, since it was unclear to me **why** dynamo was inlining the `__new__` call instead of interposing on the `DTensor()` constructor directly. After a long chat with @anijain2305, he explained that with code like this:

```
@torch._dynamo.disable(recursive=False)
def f(x):
    out = SubclassConstructor(x)
```

Dynamo will never get the chance to interpose on the subclass constructor. Instead, what will happen is:

(1) Dynamo hands control back to CPython to run `f()`, since we disabled that frame
(2) `SubclassConstructor(x)` is run in eager mode
(3) `SubclassConstructor(x)` eventually calls `SubclassConstructor.__new__`
(4) this is a new frame, which CPython then allows dynamo to intercept and start compiling

So it looks like we are basically forced to handle the situation where dynamo might directly start compiling `Subclass.__new__`.

All of the above does not explain the story for `__torch_dispatch__`, though. Empirically, I have a repro in torchtrain where, looking at the dynamo logs, we see dynamo try to inline `__torch_dispatch__`:

```
[rank0]:DEBUG: Skipping frame because no content in function call _prepare_output_fn /data/users/hirsheybar/b/pytorch/torch/distributed/tensor/parallel/style.py 318
[rank0]:DEBUG: torchdynamo start compiling __torch_dispatch__ /data/users/hirsheybar/b/pytorch/torch/distributed/_tensor/api.py:297, stack (elided 5 frames):
```

I haven't been able to create a smaller repro of the problem (even using `_dynamo.disable(recursive=False)`), although in theory, if there is a `torch.*` op that you were to inline (where one of the inputs is a subclass), the next frame would likely be `__torch_dispatch__`. Dynamo always treats `torch.*` operations as not inlinable, though, so in theory we shouldn't ever see dynamo inline `__torch_dispatch__`; empirically, a `_dynamo.disable()` still fixes the problem.

I asked Animesh if we can have dynamo automatically apply this behavior to subclasses instead of needing it to be added explicitly. He pointed out that for `disable(recursive=False)`, we can't really do this within dynamo.

Pull Request resolved: #123347
Approved by: https://github.com/zou3519
ghstack dependencies: #122502, #122751, #123348
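The frame sequence in steps (1) through (4) can be mimicked in plain Python. This is only an analogy, not dynamo's machinery: it uses `sys.settrace` (a global trace function that CPython calls with a `"call"` event whenever a new Python frame starts) as a stand-in for dynamo's PEP 523 frame-evaluation hook, and `SubclassConstructor` here is a hypothetical plain class, not a tensor subclass. It shows that `__new__` runs in its own frame, entered after `f`'s frame, and that anything observing the instance inside `__new__` sees it before its fields are assigned.

```python
import sys

partially_initialized = []
frames_entered = []


# Hypothetical stand-in for a tensor subclass such as DTensor (no torch here).
class SubclassConstructor:
    def __new__(cls, x):
        obj = super().__new__(cls)
        # Anything inspecting `obj` at this point sees a partially
        # initialized instance: `_inner` has not been assigned yet.
        partially_initialized.append(not hasattr(obj, "_inner"))
        obj._inner = x
        return obj


def f(x):
    # Stand-in for the frame skipped via disable(recursive=False):
    # its body runs as ordinary CPython code.
    return SubclassConstructor(x)


def on_new_frame(frame, event, arg):
    # Global trace function: receives "call" each time a new Python frame
    # starts, loosely analogous to a per-frame interception hook.
    if event == "call":
        frames_entered.append(frame.f_code.co_name)
    return None  # no per-line tracing inside the frame


sys.settrace(on_new_frame)
try:
    out = f(1)
finally:
    sys.settrace(None)

print(frames_entered)         # ['f', '__new__']
print(partially_initialized)  # [True]
```

The `__new__` frame is entered on its own, after `f`, which is the point in step (4) where a frame-level hook gets its chance to intercept.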

…ch#122751) Fixes pytorch#122379

It looks like `iter_contains()` in dynamo expects to take in something like `iter_contains(List[VariableTracker], VariableTracker)`. Previously, when we called this function where the list in question was a `RangeVariable`, we would pass in `RangeVariable.items` as our list. This is wrong, though, since `RangeVariable.items` just contains the underlying [start, stop, step]. It looks like `unpack_var_sequence` does the right thing of "materializing" the range into a list of `VariableTracker`s, so I used that instead.

Pull Request resolved: pytorch#122751
Approved by: https://github.com/anijain2305, https://github.com/jansel
ghstack dependencies: pytorch#122502


fix correctness for dynamo inlining RangeVariable __contains__

22022f6

[ghstack-poisoned]

This was referenced Mar 22, 2024

inductor: fix for functional_collectives.wait() followed by view() #118802

Closed

compile: ban mutations on non-compositional uses of as_strided #122502

Closed

pytorch-bot added the ciflow/inductor and module: dynamo labels Mar 27, 2024

bdhirsh added a commit that referenced this pull request Mar 27, 2024

fix correctness for dynamo inlining RangeVariable __contains__

a151068

ghstack-source-id: 854d9e1 Pull Request resolved: #122751

github-actions bot requested review from SherlockNoMad, albanD, antoniojkim, ezyang and miladm March 27, 2024 00:03

bdhirsh mentioned this pull request Mar 27, 2024

[torch.compile] split_cat_norm and cat_mutated cause the optimized model to return a tensor with wrong shape #122379

Closed

bdhirsh commented Mar 27, 2024


albanD removed their request for review March 27, 2024 22:21

ezyang requested review from anijain2305 and jansel and removed request for ezyang March 28, 2024 13:20

bdhirsh added 3 commits April 4, 2024 08:45

anijain2305 approved these changes Apr 11, 2024


jansel approved these changes Apr 11, 2024


bdhirsh added the release notes: dynamo label Apr 11, 2024

pytorchmergebot added the Merged label Apr 12, 2024

pytorchmergebot closed this in 96fe3c5 Apr 12, 2024

4 participants
