fix FakeTensor creation on noncontiguous subclasses by bdhirsh · Pull Request #124399 · pytorch/pytorch · GitHub

Conversation


pytorch-bot bot commented Apr 18, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/124399

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0aac007 with merge base e16f1ee:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

…ion on noncontiguous subclasses"

[ghstack-poisoned]
for it: we need the FakeTensor to have accurate is_leaf information,
even though we don't actually plan to run autograd through the graph input.
"""
torch._C._forbid_in_autograd(tensor)
Contributor Author

@albanD does this look ok to you as a public API, plus the docs I wrote? Let me know if you think I should make it clearer that it's a sharp-edged API

Collaborator

albanD left a comment

That sounds fair to me as a public API. I'll let @soulitzer give his opinion in case he prefers to keep it private.
Either way, it needs testing in test_autograd!
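For illustration, a rough sketch of the kind of test_autograd coverage being asked for, written against the API as it was proposed at this point in the PR (the test name is made up, and since the new API was ultimately dropped in favor of DelayedError, this is a sketch rather than the test that landed):

```python
import torch
from torch.testing._internal.common_utils import TestCase, run_tests


class TestForbidInAutograd(TestCase):
    def test_backward_through_forbidden_tensor_raises(self):
        # Build a non-leaf tensor (it has a CloneBackward grad_fn), then
        # forbid autograd from ever backpropagating through it, using the
        # binding proposed (and later dropped) in this PR.
        base = torch.randn(3, requires_grad=True)
        x = base.clone()
        torch._C._forbid_in_autograd(x)

        # Any backward pass that reaches x's (now Error) grad_fn should raise.
        out = x.sum()
        with self.assertRaises(RuntimeError):
            out.backward()


if __name__ == "__main__":
    run_tests()
```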

}

void forbid_in_autograd(const Variable& self) {
  TORCH_CHECK(
Collaborator

You need a bit more error checking here to ensure that:

  • This is a leaf (no existing grad_fn)
  • This is not already part of a graph (no grad_accumulator)

      self.defined(), "cannot call forbid_in_autograd() on undefined tensor");
  auto new_grad_fn = std::shared_ptr<torch::autograd::Error>(
      new torch::autograd::Error(
          "Cannot backprop through Error node, file a bug in PyTorch"),
Collaborator

Cannot backprop through a Tensor that was marked as forbidden in backward.
Or something similar

torch._C._increment_version(tensor)


def forbid_in_autograd(tensor):
Collaborator

forbid_in_backward() ?
It's not forbidden in forward mode AD or from interacting in a non-differentiable way with autograd.



def forbid_in_autograd(tensor):
"""Replaces the current tensor's grad_fn with an Error node.
Collaborator

While I like the factuality and it makes it clear to me what this does, I think we need a bit more sugar coating for end users. See below.

Also, maybe add a small note that this is an advanced API that we don't expect most users to need, and that detach() and no_grad() should be used by most users to locally disable autograd, as discussed in https://pytorch.org/docs/stable/notes/autograd.html#locally-disabling-gradient-computation
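For reference, these are the standard, documented ways to locally disable autograd that such a note would point users to (nothing here is specific to this PR):

```python
import torch

x = torch.randn(3, requires_grad=True)

# detach(): a new tensor sharing the same storage but cut off from the graph.
y = x.detach()
assert not y.requires_grad

# no_grad(): operations run inside the context are not recorded by autograd.
with torch.no_grad():
    z = x * 2
assert not z.requires_grad
```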

Comment on lines 211 to 212
If the tensor was originally an autograd leaf (tensor.is_leaf == False),
setting the tensor's grad_fn to an error node will flip tensor.is_leaf to True.
Collaborator

You inverted the True/False here

def forbid_in_autograd(tensor):
"""Replaces the current tensor's grad_fn with an Error node.
This effectively forbids the tensor from having a gradient computed during backward.
Collaborator

This sounds like the right intro for just above

If the tensor was originally an autograd leaf (tensor.is_leaf == False),
setting the tensor's grad_fn to an error node will flip tensor.is_leaf to True.
This is a convenient API used in torch.compile internals, when we need to
Collaborator

.. note:: One example where this API is used is ...

Contributor

soulitzer commented Apr 19, 2024

Public API sounds okay, but preferably with a clear story around when we'd use this over existing APIs that do similar "forbid_in_autograd" things like .detach() or setting .requires_grad=False. (We also have the private torch._C._functions.DelayedError, which is an out-of-place version of this.) If it is difficult to have such examples, I don't mind keeping it private either for now, but no strong preference.
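For context, a rough sketch of what the out-of-place DelayedError pattern looks like next to the in-place proposal. The two-argument constructor (error message, number of inputs) is my reading of the private binding and may vary between versions, so treat this as illustrative:

```python
import torch

x = torch.randn(3, requires_grad=True)

# Out-of-place: DelayedError returns a *new* tensor whose grad_fn raises the
# given message if backward ever reaches it; x itself is left untouched.
# NOTE: the (msg, num_inputs) signature is my reading of the private binding.
y = torch._C._functions.DelayedError("backward through y is forbidden", 1)(x)

# The proposed forbid_in_autograd(x) would instead mutate x's own grad_fn in place.
try:
    y.sum().backward()
except RuntimeError as e:
    print(e)
```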

…ion on noncontiguous subclasses"

Fixes #124090, context on the issue




[ghstack-poisoned]
@bdhirsh changed the title from "add forbid_in_autograd api, use it to fix FakeTensor creation on noncontiguous subclasses" to "fix FakeTensor creation on noncontiguous subclasses" on May 1, 2024
Contributor Author

bdhirsh commented May 1, 2024

I updated the PR with @soulitzer's idea to not add a new API: it seems like torch._C._functions.DelayedError should do the job.

Adding that ErrorNode also caused some test failures, which made me realize that autograd.backward is broken in dynamo: #125287. I just fixed it directly in this PR.
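Roughly, the shape of the DelayedError-based approach on a fakified graph input (a sketch of the idea only, not the exact code in this PR; the helper name and message are made up):

```python
import torch


def _match_leafness(fake, original):
    # Hypothetical helper: if the original tensor was not an autograd leaf,
    # give the fake tensor a grad_fn too so fake.is_leaf matches, but make it
    # an error node since we never intend to actually backprop through it.
    if original.is_leaf:
        return fake
    if not fake.requires_grad:
        fake.requires_grad_(True)  # non-leaf tensors always require grad
    return torch._C._functions.DelayedError(
        "Tried to backward through a graph input that was fakified as a non-leaf",
        1,
    )(fake)
```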

@ezyang ezyang removed their request for review May 1, 2024 02:54
Contributor Author

bdhirsh commented May 1, 2024

Hmm... the errors I'm getting are because `y = torch._C._functions.DelayedError(1)(x)` doesn't "propagate" the fact that x is a fake tensor (it returns a plain tensor)

Contributor Author

bdhirsh commented May 1, 2024

oh it's probably because the SparseTensorImpl C++ subclasses don't override shallow_copy_and_detach properly...

We probably are not able to handle "tensor subclass holding a fake sparse tensor" today for other reasons, so I'm going to leave the sparse fakify logic alone and have it continue using clone for now.

bdhirsh added 2 commits May 1, 2024 08:56
Fixes #125287

Fixes #124090, context on the issue




cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k ezyang msaroufim anijain2305 voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx chenyang78 kadeng

[ghstack-poisoned]
@bdhirsh bdhirsh added the release notes: composability release notes category label May 1, 2024
pytorchmergebot pushed a commit that referenced this pull request May 1, 2024
pytorchmergebot pushed a commit that referenced this pull request May 1, 2024
…spatch__ (#123347)" (#125288)

Re-land of #123347.

The original PR broke internal because of a circular import due to importing dynamo in the DTensor code. The new version uses `torch._dynamo_disable` to work around

This reverts commit 9d88339.

Pull Request resolved: #125288
Approved by: https://github.com/ezyang, https://github.com/yanboliang, https://github.com/yoyoyocmu, https://github.com/anijain2305, https://github.com/fegin
ghstack dependencies: #124398, #124399, #124400
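For readers unfamiliar with the workaround being described: the public way to keep dynamo from tracing a function is the decorator sketched below. Whether this is exactly the helper the commit message refers to as `torch._dynamo_disable` I can't confirm, so take it as an illustration of the idea:

```python
import torch
import torch._dynamo


@torch._dynamo.disable
def _helper_not_to_be_traced(x):
    # dynamo calls this eagerly instead of tracing into it, which avoids
    # tracing (and importing) code that would otherwise cause problems.
    return x + 1


@torch.compile
def f(x):
    return _helper_not_to_be_traced(x) * 2
```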
petrex pushed a commit to petrex/pytorch that referenced this pull request May 3, 2024
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
@github-actions github-actions bot deleted the gh/bdhirsh/555/head branch June 4, 2024 02:01