[DTensor] Support matmul in inference_mode #142197

kwen2501 · 2024-12-06T01:22:56Z

Stack from ghstack (oldest at bottom):

-> [DTensor] Support matmul in inference_mode #142197

The solution is to add a `decompose_handler` for `aten.matmul`, similar to how we handle `aten.linear`.
With the decomposition, `aten.matmul` becomes `aten.mm`, which has a sharding strategy registered with DTensor.
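The routing idea described above can be sketched with a toy dispatcher. This is a pure-Python illustration, not DTensor's actual code: the names `SHARDING_STRATEGIES`, `DECOMPOSE_HANDLERS`, `decompose_matmul`, and `dispatch` are hypothetical, and plain nested lists stand in for tensors. It only shows how an op with no registered strategy can be rewritten into one that has one.

```python
# Toy model of "decompose an unsupported op into a supported one".
# All names below are hypothetical and do not mirror DTensor internals.

def mm(a, b):
    """Multiply two 2-D matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# Ops the toy dispatcher can run directly (stand-in for ops that have
# a sharding strategy registered).
SHARDING_STRATEGIES = {"mm": mm}

def decompose_matmul(a, b):
    """Rewrite matmul in terms of mm, for 1-D or 2-D left operands."""
    if not isinstance(a[0], list):
        # 1-D left operand: promote to a 1-row matrix, then drop the row dim.
        return dispatch("mm", [a], b)[0]
    return dispatch("mm", a, b)

# Ops that are rewritten before strategy lookup.
DECOMPOSE_HANDLERS = {"matmul": decompose_matmul}

def dispatch(op, *args):
    """Run op directly if supported, otherwise try its decomposition."""
    if op in DECOMPOSE_HANDLERS:
        return DECOMPOSE_HANDLERS[op](*args)
    if op not in SHARDING_STRATEGIES:
        raise NotImplementedError(f"no strategy registered for {op}")
    return SHARDING_STRATEGIES[op](*args)

result = dispatch("matmul", [[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Without the `"matmul"` entry in `DECOMPOSE_HANDLERS`, the same call would fall through to the `NotImplementedError` branch, which is the toy analogue of an op reaching the dispatcher with no strategy available.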

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

[ghstack-poisoned]

pytorch-bot · 2024-12-06T01:22:59Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/142197

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ac1c07d with merge base 61dc5e9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 3dd1f5f Pull Request resolved: #142197

XilunWu

LGTM except the var naming.

XilunWu · 2024-12-06T01:42:08Z

test/distributed/_tensor/test_matrix_ops.py

+        dx = distribute_tensor(x, device_mesh, [Replicate()])
+        dA = distribute_tensor(A, device_mesh, [Shard(0)])


I remember people complaining that variable names like `dx` and `dA` suggest gradients. Names such as `x_dist` and `A_dist` are preferred. cc @awgu

kwen2501 · 2024-12-06T02:44:15Z

@pytorchbot merge

pytorchmergebot · 2024-12-06T02:45:53Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

wz337

Thanks. I learned how to let DTensor decompose through this diff.

Fixes #142190.

The solution is to add a `decompose_handler` for `aten.matmul`, similar to how we handle `aten.linear`. With the decomposition, `aten.matmul` becomes `aten.mm`, which has a sharding strategy registered with DTensor.

Pull Request resolved: #142197
Approved by: https://github.com/XilunWu, https://github.com/wz337

[DTensor] Support matmul in inference_mode

ac1c07d

[ghstack-poisoned]

pytorch-bot bot added the oncall: distributed label Dec 6, 2024

kwen2501 added a commit that referenced this pull request Dec 6, 2024

[DTensor] Support matmul in inference_mode

f02625a

ghstack-source-id: 3dd1f5f Pull Request resolved: #142197

kwen2501 requested review from bdhirsh and wz337 December 6, 2024 01:28

kwen2501 added the release notes: distributed (dtensor) label Dec 6, 2024

XilunWu approved these changes Dec 6, 2024

pytorch-bot bot added the ciflow/trunk label Dec 6, 2024

pytorchmergebot added the merging label Dec 6, 2024

wz337 approved these changes Dec 6, 2024

pytorchmergebot added the Merged label Dec 6, 2024

pytorchmergebot closed this in 8bdcdae Dec 6, 2024

pytorchmergebot removed the merging label Dec 6, 2024

github-actions bot deleted the gh/kwen2501/110/head branch January 6, 2025 02:08

kwen2501 mentioned this pull request Feb 3, 2025

Add sharding strategy for torch.distributed.tensor.parallel.ParallelStyle with inference_mode #145725

Closed


4 participants
