[dtensor] full_tensor to return synchronously #113322
Conversation
The full_tensor API should return synchronously instead of returning an AsyncCollectiveTensor; if the return value is an AsyncCollectiveTensor, we wait on it directly. This makes the full_tensor API more precise. [ghstack-poisoned]
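To make the intended behavior concrete, here is a minimal usage sketch. The 4-rank CPU mesh and the distribute_tensor setup are illustrative assumptions (a process group launched via torchrun is presumed), not part of this PR:

```python
import torch
import torch.distributed._functional_collectives as funcol
from torch.distributed._tensor import DeviceMesh, Shard, distribute_tensor

# Assumption: a 4-rank process group is already initialized (e.g. via torchrun).
mesh = DeviceMesh("cpu", list(range(4)))  # 1-D mesh over 4 ranks
dtensor = distribute_tensor(torch.randn(8, 8), mesh, [Shard(0)])

full = dtensor.full_tensor()
# With this PR, full_tensor waits on any pending collective before returning,
# so the result is a plain tensor rather than an AsyncCollectiveTensor wrapper.
assert not isinstance(full, funcol.AsyncCollectiveTensor)
```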
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/113322
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit e098488 with merge base 84d64d7.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
LGTM!
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The full_tensor API should return synchronously instead of returning an AsyncCollectiveTensor; if the return value is an AsyncCollectiveTensor, we wait on it directly. This makes the full_tensor API more precise.
Pull Request resolved: pytorch#113322
Approved by: https://github.com/wz337
```python
def forward(  # type: ignore[override]
    ctx,
    input: "DTensor",
    grad_placements: Optional[Sequence[Placement]],
    async_output: bool,
):
    ctx.dtensor_spec = input._spec
    ctx.grad_placements = grad_placements
    local_tensor = input._local_tensor
    if not async_output and isinstance(local_tensor, funcol.AsyncCollectiveTensor):
        # synchronously wait for any pending collectives to get the result tensor
        local_tensor = local_tensor.trigger_wait()
        local_tensor = local_tensor.elem  # type: ignore[attr-defined]
```
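Read on its own, the branch above amounts to a small "wait and unwrap" step. A hedged sketch of that step as a standalone helper, using only the trigger_wait/.elem API shown in the diff (the helper name wait_if_async is illustrative, not part of the PR):

```python
import torch
import torch.distributed._functional_collectives as funcol

def wait_if_async(t: torch.Tensor) -> torch.Tensor:
    """Illustrative helper (not in the PR): block on any pending collective
    and unwrap the result, mirroring the branch in the diff above."""
    if isinstance(t, funcol.AsyncCollectiveTensor):
        t = t.trigger_wait()  # synchronously wait for the pending collective
        t = t.elem            # unwrap to the underlying torch.Tensor
    return t
```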
I don't have a strong opinion on this, but it might bring some perf regression, because now we wait on every .to_local call. But if we think this is the defined or expected behavior, then that's fine.