[PP] Allow intermediate nodes in ZB to have multiple grads by H-Huang · Pull Request #159084 · pytorch/pytorch · GitHub

Conversation

@H-Huang
Member

@H-Huang H-Huang commented Jul 24, 2025

Stack from ghstack (oldest at bottom):

Fixes a ZB regression (https://github.com/pytorch/torchtitan/actions/runs/16478292562/job/46585646792)

Previously we only allowed an intermediate node to have 1 gradient. Recently a torchtitan ZB test started failing, and I tracked it back to the FusedRMSNorm grad_fn having two values (grad, None) (see #153666), which started breaking our ZB tests.

This PR allows stage_backward_weight intermediate nodes to have multiple grads: it sums them together, and if a grad value is None it ignores it. Here is an example where the backward would have two grad values (gI1, gI2):

```python
class Func(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x, 2

    @staticmethod
    def backward(ctx, gI1, gI2):
        assert gI2 is None
        return gI1
```
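Below is a minimal sketch of the accumulation behavior described above; `accumulate_intermediate_grads` is a hypothetical helper for illustration, not the actual `stage_backward_weight` implementation:

```python
import torch

def accumulate_intermediate_grads(grads):
    # Sum the gradients an intermediate node received across its outputs,
    # skipping None entries (e.g. the None produced for Func's second output).
    total = None
    for g in grads:
        if g is None:
            continue
        total = g.clone() if total is None else total + g
    return total

# With Func above, the backward sees (gI1, gI2) = (grad, None):
gI1, gI2 = torch.ones(3), None
print(accumulate_intermediate_grads([gI1, gI2]))  # tensor([1., 1., 1.])
```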

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta

@pytorch-bot

pytorch-bot bot commented Jul 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159084

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit cc48dfd with merge base 70b4a88:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Jul 24, 2025
H-Huang added a commit that referenced this pull request Jul 24, 2025
```python
        weight_grads.append(weight.grad)

    for param_group in param_groups:
        # TODO: Handle case where intermediate can have multiple outputs
```

@H-Huang (Member Author) commented on this diff:

This PR fixes this TODO

@H-Huang H-Huang added pipeline parallelism Issues related to https://pytorch.org/docs/master/pipeline.html release notes: distributed (pipeline) release notes category labels Jul 24, 2025
H-Huang added a commit that referenced this pull request Jul 24, 2025
Contributor

@tianyu-l tianyu-l left a comment


The regression came from recent support of fused RMSNorm kernels #153666

I know that in the past it went through some decomposition at the aten level, but I didn't look closely at why we are now seeing more fields passed around.

If this fix is a general improvement, rather than one specifically tailored to this regression, then it sounds good to me.

@H-Huang
Member Author

H-Huang commented Jul 25, 2025

The regression came from recent support of fused RMSNorm kernels #153666

Thanks for providing this context.

Yep, this fix is a general improvement; that change just happened to surface it!

Contributor

@tianyu-l tianyu-l left a comment


stamp to unblock

@H-Huang H-Huang added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 25, 2025
@AaronWang04
Contributor

The regression came from recent support of fused RMSNorm kernels #153666

I know that in the past it went through some decomposition at the aten level, but I didn't look closely at why we are now seeing more fields passed around.

If this fix is a general improvement, rather than one specifically tailored to this regression, then it sounds good to me.

Just to add some context: an unfused composite implementation relies on autograd to save intermediate values for backprop.

In a fused implementation, autograd has no visibility into what happens inside the fused kernel, so we have to return some intermediate values from the forward so they can be reused rather than recomputed during the backward pass.
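As a toy illustration of this, here is a hypothetical fused-style `autograd.Function` (not the actual FusedRMSNorm kernel) whose forward returns an extra intermediate tensor; since nothing downstream differentiates through it, its gradient slot arrives as None, which is exactly the (grad, None) shape discussed above:

```python
import torch

class FusedLikeOp(torch.autograd.Function):
    # Toy stand-in for a fused kernel: forward returns the main result plus an
    # intermediate statistic so a later kernel could reuse it during backward.
    @staticmethod
    def forward(ctx, x):
        ctx.set_materialize_grads(False)  # unused output grads arrive as None
        out = x * 2.0
        stat = x.mean()  # extra intermediate returned alongside the result
        return out, stat

    @staticmethod
    def backward(ctx, grad_out, grad_stat):
        # Nothing differentiates through `stat`, so grad_stat is None here --
        # the (grad, None) pattern this PR makes the ZB backward tolerate.
        assert grad_stat is None
        return grad_out * 2.0

x = torch.randn(4, requires_grad=True)
out, stat = FusedLikeOp.apply(x)
out.sum().backward()
assert torch.equal(x.grad, torch.full_like(x, 2.0))
```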

@H-Huang
Member Author

H-Huang commented Jul 27, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

H-Huang added a commit to pytorch/torchtitan that referenced this pull request Jul 29, 2025
These should be fixed now that pytorch/pytorch#159084 has landed.
yangw-dev pushed a commit that referenced this pull request Aug 1, 2025
Pull Request resolved: #159084
Approved by: https://github.com/tianyu-l
bentherien pushed a commit to bentherien/torchtitan_ that referenced this pull request Aug 5, 2025
joellidin pushed a commit to one-covenant/torchtitan that referenced this pull request Aug 8, 2025
@github-actions github-actions bot deleted the gh/H-Huang/198/head branch August 27, 2025 02:11

Labels

ciflow/trunk Trigger trunk jobs on your pull request
Merged
oncall: distributed Add this issue/PR to distributed oncall triage queue
pipeline parallelism Issues related to https://pytorch.org/docs/master/pipeline.html
release notes: distributed (pipeline) release notes category
