Making batching rule for F.embedding DTensor-aware by zou3519 · Pull Request #162117 · pytorch/pytorch · GitHub

Conversation


@zou3519 zou3519 commented Sep 4, 2025

Stack from ghstack (oldest at bottom):

`vmap(F.embedding)(DTensor, DTensor)` was failing because F.embedding's
batching rule creates a new tensor via at::arange, at::arange produces a
regular Tensor, and DTensor rightfully errors on mixed DTensor/regular-Tensor
operations.
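
A minimal repro sketch of the failing pattern (assumes a 2-rank run with a 1-D device mesh; the shapes and mapping over dim 0 of both arguments are illustrative):

```python
import torch
import torch.nn.functional as F
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Replicate

# Assumes torch.distributed is initialized across 2 ranks.
mesh = init_device_mesh("cpu", (2,))
indices = distribute_tensor(torch.randint(0, 10, (4, 5)), mesh, [Replicate()])  # (B, S)
weight = distribute_tensor(torch.randn(4, 10, 3), mesh, [Replicate()])          # (B, N, D)

# Before this PR: the batching rule's internal at::arange produced a plain
# Tensor, so a subsequent DTensor + Tensor op raised a mixed-tensor error.
out = torch.vmap(F.embedding)(indices, weight)
```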

This PR fixes the problem by activating DTensor implicit replication on
just the at::arange and the subsequent add operation.

In order to accomplish this I move the DTensor implicit replication flag
to C++ (most batching rules are in C++).
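
For reference, the Python-level counterpart of this flag is the `implicit_replication` context manager; below is a sketch of the scoping pattern applied to just those two ops (variable names are illustrative; the actual fix lives in the C++ batching rule):

```python
import torch
from torch.distributed.tensor.experimental import implicit_replication

def shift_indices(indices, batch_size, num_embeddings):
    # `indices` is a DTensor; the arange below is a plain Tensor.
    with implicit_replication():
        # Inside the guard, DTensor treats the plain Tensor as replicated,
        # so the mixed DTensor/Tensor add no longer raises.
        offsets = torch.arange(batch_size) * num_embeddings
        return indices + offsets.view(-1, 1)
```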

Test Plan:

  • new test

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @ezyang @msaroufim

`vmap(F.embedding)(DTensor, DTensor)` was failing because F.embedding's
batching rule generates a new tensor via at::arange, at::arange
generates a regular tensor, and DTensor rightfully errors on mixed
DTensor-regular Tensor operations.

This PR fixes the problem by activating DTensor implicit replication on
every non-randomness batching rule. This is safe to do because batching
rules must return tensors of the same shape, and the factory functions they
call produce the same values everywhere, so the tensors they create can
safely be treated as replicated.

In order to accomplish this I move the DTensor implicit replication flag
to C++ (most batching rules are in C++).

Test Plan:
- new test

[ghstack-poisoned]

pytorch-bot bot commented Sep 4, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162117

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0bb02bb with merge base f4c33cd:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the ciflow/inductor and oncall: distributed labels Sep 4, 2025
zou3519 added a commit that referenced this pull request Sep 4, 2025
ghstack-source-id: 911c4af
Pull Request resolved: #162117
zou3519 added a commit that referenced this pull request Sep 4, 2025
ghstack-source-id: 2ad3045
Pull Request resolved: #162117
@zou3519 zou3519 requested review from bdhirsh and tianyu-l September 4, 2025 13:29
@zou3519 zou3519 changed the title from "Batching rules assume DTensor implicit replication" to "Making batching rule for F.embedding DTensor-aware" Sep 4, 2025
zou3519 added a commit that referenced this pull request Sep 4, 2025
ghstack-source-id: 58b403f
Pull Request resolved: #162117
@zou3519 zou3519 requested review from bdhirsh and ezyang September 5, 2025 13:27

zou3519 commented Sep 5, 2025

Updated to a more local workaround since the previous approach didn't work

"aten/src/ATen/DeviceAccelerator.cpp",
"aten/src/ATen/Context.cpp",
"aten/src/ATen/DLConvertor.cpp",
"aten/src/ATen/DTensorState.cpp",
oh god non-globbed ATen file sources

@bdhirsh bdhirsh left a comment

sgtm!

@zou3519 zou3519 added the ciflow/trunk and release notes: distributed (dtensor) labels Sep 5, 2025

zou3519 commented Sep 5, 2025

@pytorchbot merge

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

daisyden pushed a commit to daisyden/pytorch that referenced this pull request Sep 8, 2025
Pull Request resolved: pytorch#162117
Approved by: https://github.com/bdhirsh
pytorchmergebot pushed a commit that referenced this pull request Sep 8, 2025
F.one_hot(dtensor) used to run into a mixed DTensor-Tensor operation due
to an arange call creating a new Tensor (not DTensor). This PR fixes it
by allowing implicit replication of Tensors for the arange call and the
one consumer of the arange call (the at::eq call).

Test Plan:
- new test. Also, F.one_hot(num_classes=-1) is broken so we skip that.

Pull Request resolved: #162307
Approved by: https://github.com/ezyang
ghstack dependencies: #162117
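
Under the same assumptions as the sketch above (2-rank run, 1-D mesh), the now-working call might look like this; note the explicit num_classes, since num_classes=-1 is skipped as broken:

```python
import torch
import torch.nn.functional as F
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Replicate

mesh = init_device_mesh("cpu", (2,))
labels = distribute_tensor(torch.tensor([0, 2, 1, 3]), mesh, [Replicate()])

# one_hot compares labels against an arange(num_classes) Tensor via at::eq;
# with implicit replication scoped to those two calls, the mixed
# DTensor/Tensor comparison no longer errors.
onehot = F.one_hot(labels, num_classes=4)
```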
pytorchmergebot pushed a commit that referenced this pull request Sep 18, 2025
…or operations (#162651)

Also updates the error message to point to the guide.

Pull Request resolved: #162651
Approved by: https://github.com/ezyang
ghstack dependencies: #162117, #162307
pytorchmergebot pushed a commit that referenced this pull request Sep 19, 2025
This PR adds an experimental way to register a custom rule for whether
inductor should partition the graph around an operator.

Test Plan:
- new test

Pull Request resolved: #163310
Approved by: https://github.com/ProExpertProg, https://github.com/BoyuanFeng, https://github.com/eellison
ghstack dependencies: #162117, #162307, #162651
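
A hedged usage sketch; the import path, helper name, and rule signature below are assumptions based on this summary, not a confirmed API (see #163310 for the real one):

```python
import torch
# Assumed location/name of the experimental registration helper.
from torch._inductor.scheduler import register_should_partition_rule

def never_partition(*args, **kwargs) -> bool:
    # Returning False asks inductor not to split the compiled graph
    # around this operator (signature assumed).
    return False

register_should_partition_rule(torch.ops.aten.mm.default, never_partition)
```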