KEMBAR78
[DDP] Custom buffer reduction by rohan-varma · Pull Request #64513 · pytorch/pytorch · GitHub
Skip to content

Conversation

@rohan-varma
Copy link
Contributor

@rohan-varma rohan-varma commented Sep 5, 2021

Stack from ghstack:

Proposal: #63041
Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios.

Differential Revision: D30751152

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu @gcramer23

Proposal: #63041
Support custom buffer reduction in DDP via hook

Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/)

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Sep 5, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit fe5d59b (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-bionic-py3.8-gcc9-coverage / test (default, 1, 2, linux.2xlarge) (1/1)

Step: "Unknown" (full log | diagnosis details | 🔁 rerun)

2021-09-21T21:26:55.2910962Z CONTINUE_THROUGH_ERROR: false
  "oncall: distributed",
  "cla signed"
]
2021-09-21T21:26:55.2906556Z   GITHUB_TOKEN: ***
2021-09-21T21:26:55.2907472Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-py3.8-gcc9:74e757e8b0cf750d2f91db6aa4c29640abce32ea
2021-09-21T21:26:55.2908630Z   JOB_BASE_NAME: linux-bionic-py3.8-gcc9-coverage-test
2021-09-21T21:26:55.2909183Z   TEST_CONFIG: default
2021-09-21T21:26:55.2909496Z   SHARD_NUMBER: 1
2021-09-21T21:26:55.2909801Z   NUM_TEST_SHARDS: 2
2021-09-21T21:26:55.2910150Z   PYTORCH_IGNORE_DISABLED_ISSUES: 
2021-09-21T21:26:55.2910962Z   CONTINUE_THROUGH_ERROR: false
2021-09-21T21:26:55.2911284Z   SHM_SIZE: 1g
2021-09-21T21:26:55.2911567Z   PR_NUMBER: 64513
2021-09-21T21:26:55.2911826Z   IS_GHA: 1
2021-09-21T21:26:55.2912123Z   CIRCLE_BRANCH: pull/64513
2021-09-21T21:26:55.2912455Z   CIRCLE_PR_NUMBER: 64513
2021-09-21T21:26:55.2912925Z   CIRCLE_SHA1: fe5d59bf8fa514183e44f65287847a601cd643e0
2021-09-21T21:26:55.2913443Z   AWS_DEFAULT_REGION: us-east-1
2021-09-21T21:26:55.2913778Z ##[endgroup]
2021-09-21T21:27:08.4533513Z Processing ./dist/torch-1.10.0a0+gitb0b4960-cp38-cp38-linux_x86_64.whl
2021-09-21T21:27:08.4797762Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.8/site-packages (from torch==1.10.0a0+gitb0b4960) (3.10.0.2)

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

Proposal: #63041
Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. 

Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/)

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23

[ghstack-poisoned]
Comment on lines +1249 to +1250
state,
hook: callable,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: make state after hook? following up (func, args) convention?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out!

I followed the same order as comm hook (state, hook): https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/distributed.py#L1224.

Should we try to keep it the same or change the order here?


@dataclass
class _BufferCommHook:
buffer_comm_hook: Callable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: document how the signature of a buffer_comm_hook looks like:

def buffer_comm_hook(state, self.name_buffers)

Copy link
Contributor Author

@rohan-varma rohan-varma Sep 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the documentation to the function in the PR at the top of this stack: https://github.com/pytorch/pytorch/pull/64515/files, will just leave it there if you're okay (plan is to land the whole stack together).

Proposal: #63041
Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. 

Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/)

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23

[ghstack-poisoned]
Proposal: #63041
Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. 

Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/)

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23

[ghstack-poisoned]
Proposal: #63041
Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. 

Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/)

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23

[ghstack-poisoned]

# sync params according to location (before/after forward) user
# specified as part of hook, if hook was specified.
if (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to helper fn in the next diff

Proposal: #63041
Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. 

Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/)

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request Sep 18, 2021
Pull Request resolved: #64513

Proposal: #63041
Support custom buffer reduction in DDP via hook
ghstack-source-id: 138419449

Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/)
Proposal: #63041
Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. 

Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/)

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23

[ghstack-poisoned]
Proposal: #63041
Support custom buffer reduction in DDP via hook. This is required for an internal training use case and could be more broadly applicable to OSS scenarios. 

Differential Revision: [D30751152](https://our.internmc.facebook.com/intern/diff/D30751152/)

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski @SciPioneer @H-Huang mrzzd cbalioglu gcramer23

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in ce5981e.

@facebook-github-bot facebook-github-bot deleted the gh/rohan-varma/404/head branch September 26, 2021 14:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed Merged oncall: distributed Add this issue/PR to distributed oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants