Refactor commonalities between two approaches #62624
💊 CI failures summary and remediations
As of commit 30ebafb (more details on the Dr. CI page): 1 failure not recognized by patterns.
This comment was automatically generated by Dr. CI.
@andwgu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
LGTM! Thanks for improving code readability.
```python
assert bucket_index in overlap_info.offsets, \
    f"Bucket index {bucket_index} was not assigned to rank {rank}"
offset = overlap_info.offsets[bucket_index]
bucket_gradients = bucket.gradients()
```
nit: `bucket_gradients` seems to be used only once, so it looks like we don't need a separate variable for it?
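To make the nit concrete, here is a minimal, self-contained sketch of the same lookup with `bucket.gradients()` inlined into the loop that consumes it. It does not use `torch.distributed`: `OverlapInfo`, `FakeBucket`, and `fill_local_gradients` are hypothetical stand-ins, and the sketch assumes (the PR does not confirm this) that `overlap_info.offsets[bucket_index]` is the position where the bucket's gradients start in the rank's local gradient list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class OverlapInfo:
    # Hypothetical stand-in: bucket index -> start position of that bucket's
    # gradients in this rank's local gradient list (an assumption).
    offsets: Dict[int, int] = field(default_factory=dict)


class FakeBucket:
    # Hypothetical stand-in for dist.GradBucket, exposing index() and gradients().
    def __init__(self, index: int, grads: List[float]) -> None:
        self._index = index
        self._grads = grads

    def index(self) -> int:
        return self._index

    def gradients(self) -> List[float]:
        return self._grads


def fill_local_gradients(
    bucket: FakeBucket,
    overlap_info: OverlapInfo,
    rank: int,
    gradients: List[Optional[float]],
) -> None:
    """Copy a bucket's gradients into the rank-local list at the bucket's offset."""
    bucket_index = bucket.index()
    assert bucket_index in overlap_info.offsets, \
        f"Bucket index {bucket_index} was not assigned to rank {rank}"
    offset = overlap_info.offsets[bucket_index]
    # bucket.gradients() is consumed only here, so no intermediate variable.
    for i, grad in enumerate(bucket.gradients()):
        gradients[offset + i] = grad
```

For example, with `offsets={0: 0, 2: 2}` and a four-slot list, passing bucket 2 with gradients `[0.5, 1.5]` fills slots 2 and 3, while passing a bucket index that is not in `offsets` trips the assertion.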
```python
bucket: dist.GradBucket,
zero: ZeroRedundancyOptimizer,
rank: int,
rank_to_update: int,
```
Maybe rename `rank_to_update` to `owner_rank`?
Also, am I correct in assuming that `owner_rank` is always the same as the `source_rank` argument in `_broadcast_bucket`, since the owner of those params should both update and broadcast them? If so, let's consolidate these two args under the same name.
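The reviewer's point, that the rank owning a bucket's parameters is the one that both updates and broadcasts them, can be illustrated with a small single-process simulation. This is a sketch under stated assumptions, not the PR's code and not the `torch.distributed` API: each rank's copy of a bucket is a plain Python list, plain SGD stands in for the wrapped optimizer, and `update_bucket`, `broadcast_bucket`, and `step_and_sync` are hypothetical names that only mirror the roles of the update step and `_broadcast_bucket`.

```python
from typing import List

# Hypothetical simulation: world[r] is rank r's copy of one bucket's parameters.
# No process group is involved; "broadcast" is just a list copy.
WorldParams = List[List[float]]


def update_bucket(world: WorldParams, grads: List[float], owner_rank: int, lr: float) -> None:
    """Only the owner rank applies the optimizer step (plain SGD as a stand-in)."""
    params = world[owner_rank]
    for i, g in enumerate(grads):
        params[i] -= lr * g


def broadcast_bucket(world: WorldParams, owner_rank: int) -> None:
    """The owner rank is also the broadcast source, so one argument suffices."""
    src = world[owner_rank]
    for rank in range(len(world)):
        if rank != owner_rank:
            world[rank] = list(src)


def step_and_sync(world: WorldParams, grads: List[float], owner_rank: int, lr: float = 0.1) -> None:
    # A single owner_rank drives both steps, so they cannot disagree.
    update_bucket(world, grads, owner_rank, lr)
    broadcast_bucket(world, owner_rank)
```

Passing one `owner_rank` to both helpers is the consolidation being proposed; with two separately named arguments, a caller could accidentally broadcast from a rank that never applied the update.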
**Overview:**
This refactors the code shared by the two approaches to overlapping DDP with ZeRO. It also partially addresses this comment: #62157 (comment)

**Test Plan:**
```
gpurun4 python test/distributed/optim/test_zero_redundancy_optimizer.py
```

Differential Revision: [D30058543](https://our.internmc.facebook.com/intern/diff/D30058543)