-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[DDP] Custom buffer reduction #64513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Proposal: #63041 Support custom buffer reduction in DDP via hook Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/) [ghstack-poisoned]
🔗 Helpful links
💊 CI failures summary and remediationsAs of commit fe5d59b (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
Proposal: #63041 Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
| state, | ||
| hook: callable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: make state after hook? following up (func, args) convention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out!
I followed the same order as comm hook (state, hook): https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/distributed.py#L1224.
Should we try to keep it the same or change the order here?
|
|
||
| @dataclass | ||
| class _BufferCommHook: | ||
| buffer_comm_hook: Callable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: document how the signature of a buffer_comm_hook looks like:
def buffer_comm_hook(state, self.name_buffers)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the documentation to the function in the PR at the top of this stack: https://github.com/pytorch/pytorch/pull/64515/files, will just leave it there if you're okay (plan is to land the whole stack together).
Proposal: #63041 Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
Proposal: #63041 Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
Proposal: #63041 Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
|
|
||
| # sync params according to location (before/after forward) user | ||
| # specified as part of hook, if hook was specified. | ||
| if ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to helper fn in the next diff
Proposal: #63041 Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
Pull Request resolved: #64513 Proposal: #63041 Support custom buffer reduction in DDP via hook ghstack-source-id: 138419449 Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/)
Proposal: #63041 Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
Proposal: #63041 Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/) cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23 [ghstack-poisoned]
|
This pull request has been merged in ce5981e. |
Stack from ghstack:
Proposal: #63041
Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios.
Differential Revision: D30751152
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu @gcramer23