Refactor commonalities between two approaches by awgu · Pull Request #62624 · pytorch/pytorch

Conversation

@awgu
Collaborator

@awgu awgu commented Aug 2, 2021

**Overview:**
This refactors some commonalities between the two approaches to overlapping DDP with ZeRO. It also partially addresses this comment: #62157 (comment)
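For context, a deliberately simplified, hypothetical sketch of the refactor pattern: both overlap approaches need the same per-bucket bookkeeping, so that logic can live in a single shared helper that each hook variant calls. None of the names below are the actual PyTorch internals.

```python
from typing import Dict

# Hypothetical sketch only: the shared per-bucket logic lives in one helper
# instead of being duplicated inside each of the two hook variants.

def _bucket_offset_for_rank(offsets: Dict[int, int], bucket_index: int, rank: int) -> int:
    """Common bookkeeping: map a DDP gradient bucket to this rank's shard offset."""
    assert bucket_index in offsets, (
        f"Bucket index {bucket_index} was not assigned to rank {rank}"
    )
    return offsets[bucket_index]

def hook_variant_a(offsets: Dict[int, int], bucket_index: int, rank: int) -> int:
    # First overlap approach: reuses the shared helper.
    return _bucket_offset_for_rank(offsets, bucket_index, rank)

def hook_variant_b(offsets: Dict[int, int], bucket_index: int, rank: int) -> int:
    # Second overlap approach: same helper, different surrounding logic.
    return _bucket_offset_for_rank(offsets, bucket_index, rank)
```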

**Test Plan:**
```
gpurun4 python test/distributed/optim/test_zero_redundancy_optimizer.py
```

Stack from ghstack:

Differential Revision: D30058543

@facebook-github-bot
Contributor

facebook-github-bot commented Aug 2, 2021

💊 CI failures summary and remediations

As of commit 30ebafb (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

1 failure not recognized by patterns:

| Job | Step |
| --- | --- |
| GitHub Actions Lint / flake8-py3 | Fail if there were any warnings |

This comment was automatically generated by Dr. CI.

awgu pushed a commit that referenced this pull request Aug 3, 2021
ghstack-source-id: 3bf8cdf
Pull Request resolved: #62624
@awgu
Collaborator Author

awgu commented Aug 3, 2021

@andwgu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mrshenli mrshenli left a comment
Contributor

LGTM! Thanks for improving code readability.

```python
assert bucket_index in overlap_info.offsets, \
    f"Bucket index {bucket_index} was not assigned to rank {rank}"
offset = overlap_info.offsets[bucket_index]
bucket_gradients = bucket.gradients()
```

nit: bucket_gradients seems to be used only once; it looks like we don't need to create a variable for it?
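For illustration, a minimal, self-contained sketch of the suggested inlining; FakeBucket and the function names are hypothetical stand-ins, not the actual hook internals:

```python
class FakeBucket:
    """Stand-in for dist.GradBucket, just for this sketch."""
    def __init__(self, grads):
        self._grads = grads

    def gradients(self):
        return self._grads


def offset_and_grads_verbose(bucket, offsets, bucket_index):
    offset = offsets[bucket_index]
    bucket_gradients = bucket.gradients()  # single-use local
    return offset, bucket_gradients


def offset_and_grads_inlined(bucket, offsets, bucket_index):
    # Per the review nit: the single-use local is inlined at its only call site.
    return offsets[bucket_index], bucket.gradients()


bucket = FakeBucket([1.0, 2.0])
assert offset_and_grads_verbose(bucket, {0: 4}, 0) == offset_and_grads_inlined(bucket, {0: 4}, 0)
```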

```python
bucket: dist.GradBucket,
zero: ZeroRedundancyOptimizer,
rank: int,
rank_to_update: int,
```

owner_rank?

And also, would I be correct to assume that owner_rank is always the same as the source_rank argument in _broadcast_bucket, since the owner of those params should both update and broadcast? If yes, let's consolidate these two args to use the same name.
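For illustration only, a hypothetical sketch of the consolidation being suggested: if the rank that updates a bucket is always the rank that broadcasts it, both call sites can share a single source_rank parameter instead of one using rank_to_update and the other source_rank. Names and signatures below are illustrative, not the actual PyTorch internals:

```python
def perform_local_optimizer_step(bucket_index: int, rank: int, source_rank: int) -> bool:
    """Only the owning rank (source_rank) runs the optimizer step for this bucket."""
    # Placeholder for the real per-bucket optimizer step.
    return rank == source_rank


def broadcast_bucket_params(bucket_index: int, source_rank: int) -> str:
    """The same owning rank then broadcasts the updated parameters."""
    # In the real hook this would broadcast from src=source_rank; here it is a
    # no-op placeholder so the sketch stays self-contained.
    return f"bucket {bucket_index}: broadcast from rank {source_rank}"


# Both helpers now agree on the single name `source_rank`.
if perform_local_optimizer_step(bucket_index=0, rank=0, source_rank=0):
    print(broadcast_bucket_params(bucket_index=0, source_rank=0))
```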

awgu pushed a commit that referenced this pull request Aug 3, 2021
ghstack-source-id: d6043f8
Pull Request resolved: #62624
@awgu
Collaborator Author

awgu commented Aug 3, 2021

@andwgu has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@andwgu merged this pull request in 43327cc.

@facebook-github-bot deleted the gh/andwgu/14/head branch August 7, 2021 14:17

Labels

cla signed · Merged · oncall: distributed
