[pipelining] add type checking to _backward functions by H-Huang · Pull Request #140019 · pytorch/pytorch · GitHub
@H-Huang H-Huang commented Nov 7, 2024

pytorch-bot bot commented Nov 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/140019

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c31dde6 with merge base 0a0915f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

H-Huang added a commit that referenced this pull request Nov 7, 2024
ghstack-source-id: 760c097
Pull Request resolved: #140019
@pytorch-bot pytorch-bot bot added the "oncall: distributed" label (Add this issue/PR to distributed oncall triage queue) Nov 7, 2024
@H-Huang H-Huang added the "release notes: distributed (pipeline)" label (release notes category) Nov 7, 2024
@H-Huang H-Huang requested review from kwen2501 and wconstab and removed request for kwen2501 November 12, 2024 16:15
@H-Huang H-Huang marked this pull request as ready for review November 12, 2024 16:15
@H-Huang H-Huang requested a review from kwen2501 November 12, 2024 16:16
fix #139405


cc awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
H-Huang added a commit that referenced this pull request Nov 12, 2024
ghstack-source-id: 944b282
Pull Request resolved: #140019
handles.append(handle)

# Stage 0 inputs do not require grads? Should we skip in that case?
if all(tensor.requires_grad for tensor in input_values):
Contributor

how come this if condition can be removed now?

Member Author

we no longer call stage_backward_input for the first stage
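
A rough sketch of the control flow this reply describes, in plain Python. Everything in it is hypothetical: the `plan_backward` helper, the `is_first` flag, and the step labels are illustrative names, not the pipelining code. It only shows why a `requires_grad` guard inside the input-gradient function becomes unnecessary once the caller skips that step for the first stage.

```python
from typing import List


def plan_backward(stage_index: int) -> List[str]:
    """Return the backward steps a stage would run (hypothetical labels)."""
    is_first = stage_index == 0
    steps: List[str] = []
    if not is_first:
        # Only later stages have an upstream stage waiting for input gradients,
        # so the first stage never reaches the input-gradient step at all.
        steps.append("backward_input")
    # Assumption: every stage still computes gradients for its own weights.
    steps.append("backward_weight")
    return steps


for index in range(3):
    print(index, plan_backward(index))
```

With the decision made by the caller, the callee no longer needs to inspect whether its inputs require gradients.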

weight.grad += dw
# return grads in the original order weights were provided in
- return weight_grads
+ return tuple(weight_grads)
Contributor

these changes look pretty innocuous to me, but can you convince me that this change doesn't add any restriction or limitation to the user code?

Member Author

I can't think of any restrictions unless the user was explicitly checking that the type is list. But in terms of consistency, the autograd.grad() API (https://pytorch.org/docs/stable/generated/torch.autograd.grad.html#torch-autograd-grad) also returns a tuple, so this change matches it better.
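
To make the compatibility point concrete, here is a minimal sketch in plain Python. The function names (`grads_as_list`, `grads_as_tuple`, `sum_defined_grads`, `strict_caller`) and the float stand-ins for gradient tensors are hypothetical and not taken from the PR. The sketch only illustrates that iteration and indexing behave the same for a list and a tuple, while an explicit `isinstance(..., list)` check does not.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical stand-in: a float plays the role of a gradient tensor,
# and None plays the role of a weight that received no gradient.
Grad = Optional[float]


def grads_as_list(grads: Sequence[Grad]) -> List[Grad]:
    """Old-style return value: a list."""
    return list(grads)


def grads_as_tuple(grads: Sequence[Grad]) -> Tuple[Grad, ...]:
    """New-style return value: a tuple."""
    return tuple(grads)


def sum_defined_grads(grads: Sequence[Grad]) -> float:
    """A caller that only iterates; it accepts a list or a tuple."""
    return sum(g for g in grads if g is not None)


def strict_caller(grads: object) -> bool:
    """A caller that checks for list explicitly; the only kind affected."""
    return isinstance(grads, list)


raw = [0.5, None, 1.5]
old = grads_as_list(raw)
new = grads_as_tuple(raw)

print(sum_defined_grads(old) == sum_defined_grads(new))
print(strict_caller(old), strict_caller(new))
```

Callers written against `Sequence` keep working unchanged; only code that depends on the concrete list type would need updating.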

Contributor

@wconstab wconstab left a comment

lgtm, thanks!

pytorchmergebot pushed a commit that referenced this pull request Nov 12, 2024
Clean up methods related to stage input/output shape verification which are no longer needed

Pull Request resolved: #140418
Approved by: https://github.com/wconstab
ghstack dependencies: #140019
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Clean up methods related to stage input/output shape verification which are no longer needed

Pull Request resolved: pytorch#140418
Approved by: https://github.com/wconstab
ghstack dependencies: pytorch#140019
@github-actions github-actions bot deleted the gh/H-Huang/153/head branch December 14, 2024 02:13
Labels

Merged oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (pipeline) release notes category

3 participants