[DTensor] dispatch to sharding prop over decomps by wconstab · Pull Request #159324 · pytorch/pytorch

Conversation

@wconstab
Contributor

@wconstab wconstab commented Jul 29, 2025
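
The PR title describes preferring sharding propagation over decompositions when dispatching a DTensor op. A purely illustrative sketch of that dispatch order follows; all names here are hypothetical and are not the real DTensor internals:

```python
# Hypothetical sketch of "try sharding propagation first, decompose only
# when no rule applies". Not the actual DTensor dispatch code.

SHARDING_RULES = {}   # op name -> function implementing the sharded op
DECOMPOSITIONS = {}   # op name -> fallback that decomposes into smaller ops

def dispatch(op, *args):
    # Prefer a registered sharding rule: the op runs as one unit and can
    # choose its own (usually cheaper) communication pattern.
    rule = SHARDING_RULES.get(op)
    if rule is not None:
        return rule(*args)
    # Otherwise fall back to a decomposition into ops that do have rules,
    # which may issue more collectives than a dedicated rule would.
    decomp = DECOMPOSITIONS.get(op)
    if decomp is None:
        raise NotImplementedError(f"no sharding rule or decomposition for {op}")
    return decomp(*args)

# Example: an op with only a decomposition takes the fallback path ...
DECOMPOSITIONS["rms_norm"] = lambda x: ("decomposed", x)
assert dispatch("rms_norm", 1) == ("decomposed", 1)

# ... but once a rule is registered, dispatch picks it over the decomposition.
SHARDING_RULES["rms_norm"] = lambda x: ("fused", x)
assert dispatch("rms_norm", 1) == ("fused", 1)
```

This ordering matters for the rms_norm discussion below: a dedicated rule only gets exercised once dispatch actually prefers it over the decomposed path.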

@pytorch-bot

pytorch-bot bot commented Jul 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159324

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit d175ad9 with merge base 1abff80:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/inductor oncall: distributed Add this issue/PR to distributed oncall triage queue labels Jul 29, 2025
wconstab added a commit that referenced this pull request Jul 29, 2025
Fixes #159110

ghstack-source-id: 39a25be
Pull Request resolved: #159324

cc H-Huang awgu wanchaol fegin fduwjj wz337 d4l3k pragupta

[ghstack-poisoned]
wconstab added a commit that referenced this pull request Jul 29, 2025
Fixes #159110

ghstack-source-id: 303e356
Pull Request resolved: #159324
@wconstab wconstab added the release notes: distributed (dtensor) release notes category label Jul 29, 2025
@eqy
Collaborator

eqy commented Jul 29, 2025

Does that mean this needs to be updated if rms_norm will now go to the fused path?

expected_fwd_comm = 0 if shard_dim < norm_idx else 2

CC @AaronWang04
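
For context, the quoted expectation encodes that, under the decomposed path, sharding on a dimension before the normalized ones needs no forward collective, while sharding on a normalized dimension costs two. A small sketch of that logic; the `fused` flag is hypothetical, illustrating how the count would change if a one-collective fused rule took over:

```python
# Sketch of the test expectation quoted above. The `fused` parameter is a
# hypothetical extension, not part of the existing test.
def expected_fwd_comm(shard_dim, norm_idx, fused=False):
    if shard_dim < norm_idx:
        return 0          # sharded on a batch-like dim: no forward comm
    return 1 if fused else 2  # sharded on a normalized dim
```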

@AaronWang04
Contributor

@eqy I didn't merge in a sharding rule for the forward pass of rms_norm since it never got triggered. I'll add a PR for that and update the test after this one gets merged.

@wconstab
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 29, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@tianyu-l
Contributor

@AaronWang04
I'm afraid this change will break torchtitan again.
So it'd be great if you could help with the forward op strategy soon.

@AaronWang04
Contributor

AaronWang04 commented Jul 30, 2025

@tianyu-l I planned to add the forward op sharding strategy once this PR is stable. Why would this change break torchtitan? I expect this change to be able to fall back on the composite decomposition.

yangw-dev pushed a commit that referenced this pull request Aug 1, 2025
pytorchmergebot pushed a commit that referenced this pull request Aug 12, 2025
Reduces collective calls in the forward pass from 2 to 1

In #158716 I added the sharding rule for the backward pass but didn't add the forward pass as it didn't get dispatched. After #159324 this should get properly dispatched hence I am adding it now.

Pull Request resolved: #159692
Approved by: https://github.com/tianyu-l
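
The 2→1 reduction comes from the fact that a dedicated forward rule can compute RMSNorm with a single reduction when the input is sharded along the normalized dimension. An illustrative sketch in plain Python (no torch; `allreduce_sum` stands in for the collective):

```python
# Illustrative sketch: each "rank" reduces its local sum of squares, and one
# all-reduce yields the global mean square, so the whole forward needs a
# single collective. Plain Python lists stand in for shards of one row.
import math

def allreduce_sum(partials):
    # stands in for a single all-reduce collective across ranks
    return sum(partials)

def sharded_rms_norm(shards, eps=1e-6):
    local_sq = [sum(x * x for x in s) for s in shards]  # per-rank, no comm
    n = sum(len(s) for s in shards)
    ms = allreduce_sum(local_sq) / n                    # the one collective
    scale = 1.0 / math.sqrt(ms + eps)
    return [[x * scale for x in s] for s in shards]

# Matches the unsharded computation on the concatenated row.
full = [1.0, 2.0, 3.0, 4.0]
out = sharded_rms_norm([full[:2], full[2:]])
ms = sum(x * x for x in full) / len(full)
ref = [x / math.sqrt(ms + 1e-6) for x in full]
assert all(math.isclose(a, b) for a, b in zip(out[0] + out[1], ref))
```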
chuanhaozhuge pushed a commit that referenced this pull request Aug 14, 2025
chuanhaozhuge pushed a commit that referenced this pull request Aug 18, 2025
can-gaa-hou pushed a commit to can-gaa-hou/pytorch that referenced this pull request Aug 22, 2025
@github-actions github-actions bot deleted the gh/wconstab/432/head branch August 30, 2025 02:07
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025