refactor ps benchmark #60784
Conversation
💊 CI failures summary and remediations: as of commit c52cc12 (more details on the Dr. CI page and at hud.pytorch.org/pr/60784):

🕵️ 2 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakages:
| Job | Step | Action |
|---|---|---|
| | Chown workspace | 🔁 rerun |
Codecov Report

```diff
@@               Coverage Diff                @@
##  gh/gcramer23/16/base   #60784       +/-  ##
========================================================
- Coverage        76.22%   76.22%    -0.01%
========================================================
  Files             2061     2061
  Lines           205068   205068
========================================================
- Hits            156316   156307        -9
- Misses           48752    48761        +9
```
This PR refactors the parameter server (PS) benchmark to use modular trainers.
Three general comments:
- Let's modularize only the components that we expect users to override (i.e., the ones whose overrides have an impact specific to PS training efficiency).
- Let's try to improve the file structure a bit, to avoid tiny and fragmented files.
- The rest of PyTorch is usually very careful about introducing new arguments to APIs, because more arguments usually means more confusion. Let's apply the same spirit here as well.
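The modular-component idea behind these comments can be sketched as a small registry that maps the benchmark's config strings to pluggable factories, so users override only the pieces that matter for PS training efficiency. This is a hypothetical illustration, not the PR's actual code: the registry helpers, placeholder return values, and `build_trainer` signature are assumptions; only the component names (`cel`, `sgd_optimizer`) mirror file names visible in this review.

```python
# Hedged sketch (hypothetical names) of the modular-trainer pattern discussed
# above: component factories are registered under config-string keys, so
# adding a new criterion or optimizer is one registered function rather than
# changes scattered across the trainer.

CRITERIONS = {}
OPTIMIZERS = {}

def register(registry, name):
    """Decorator that registers a component factory under a config name."""
    def wrap(fn):
        registry[name] = fn
        return fn
    return wrap

@register(CRITERIONS, "cel")
def cel(rank):
    # Placeholder for a cross-entropy criterion factory
    # (mirrors criterion_functions/cel.py in this PR).
    return f"CrossEntropyLoss(rank={rank})"

@register(OPTIMIZERS, "sgd")
def sgd_optimizer(params, lr):
    # Placeholder for an SGD optimizer factory
    # (mirrors optimizer_functions/sgd_optimizer.py in this PR).
    return f"SGD(lr={lr})"

def build_trainer(criterion_name, optimizer_name, rank, lr):
    # The trainer looks components up by name instead of hard-coding them;
    # the benchmark CLI would pass these names through from its config.
    return {
        "criterion": CRITERIONS[criterion_name](rank),
        "optimizer": OPTIMIZERS[optimizer_name](None, lr),
    }

trainer = build_trainer("cel", "sgd", rank=0, lr=0.1)
```

The trade-off the reviewer raises applies here too: every new registry and argument is surface area, so only components whose overrides genuinely affect PS training efficiency should be made pluggable.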
Resolved review threads:
- benchmarks/distributed/rpc/parameter_server/criterion_functions/cel.py (outdated)
- benchmarks/distributed/rpc/parameter_server/preprocess_data_functions/preprocess_dummy_data.py
- benchmarks/distributed/rpc/parameter_server/optimizer_functions/sgd_optimizer.py (outdated)
- benchmarks/distributed/rpc/parameter_server/trainers/DdpTrainer.py (3 threads, outdated)
Resolved review thread:
- benchmarks/distributed/rpc/parameter_server/trainer/criterions.py (outdated)
@gcramer23 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@gcramer23 merged this pull request in 304c02e.
Stack from ghstack:
This PR refactors the parameter server (PS) benchmark to use modular trainers.
Differential Revision: D29697291