KEMBAR78
refactor ps benchmark by gcramer23 · Pull Request #60784 · pytorch/pytorch · GitHub
Skip to content

Conversation

@gcramer23
Copy link
Contributor

@gcramer23 gcramer23 commented Jun 25, 2021

Stack from ghstack:

This pr refactors the ps benchmark for modular trainers.

Differential Revision: D29697291

[ghstack-poisoned]
@facebook-github-bot facebook-github-bot added oncall: distributed Add this issue/PR to distributed oncall triage queue cla signed labels Jun 25, 2021
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Jun 25, 2021

💊 CI failures summary and remediations

As of commit c52cc12 (more details on the Dr. CI page and at hud.pytorch.org/pr/60784):


  • 4/4 failures possibly* introduced in this PR
    • 1/4 non-scanned failure(s)

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/2)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py
Auto-merging .circleci/cimodel/data/pytorch_build_data.py
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/build-verify-publish-template-win.yml
Auto-merging .azure_pipelines/job_templates/build-verify-publish-template-win.yml
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/build-verify-publish-template-unix.yml
Auto-merging .azure_pipelines/job_templates/build-verify-publish-template-unix.yml
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (2/2)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py
Auto-merging .circleci/cimodel/data/pytorch_build_data.py
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/build-verify-publish-template-win.yml
Auto-merging .azure_pipelines/job_templates/build-verify-publish-template-win.yml
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/build-verify-publish-template-unix.yml
Auto-merging .azure_pipelines/job_templates/build-verify-publish-template-unix.yml
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1


1 failure not recognized by patterns:

Job Step Action
GitHub Actions Linux CI (pytorch-linux-xenial-py3.6-gcc5.4) / pytorch_python_doc_build Chown workspace 🔁 rerun

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

This was referenced Jun 25, 2021
@gcramer23 gcramer23 requested review from H-Huang and mrshenli June 25, 2021 20:55
@codecov
Copy link

codecov bot commented Jun 26, 2021

Codecov Report

Merging #60784 (33f39ac) into gh/gcramer23/16/base (e1bd496) will decrease coverage by 0.00%.
The diff coverage is n/a.

❗ Current head 33f39ac differs from pull request most recent head c49555e. Consider uploading reports for the commit c49555e to get more accurate results

@@                   Coverage Diff                    @@
##           gh/gcramer23/16/base   #60784      +/-   ##
========================================================
- Coverage                 76.22%   76.22%   -0.01%     
========================================================
  Files                      2061     2061              
  Lines                    205068   205068              
========================================================
- Hits                     156316   156307       -9     
- Misses                    48752    48761       +9     

This pr refactors the ps benchmark for modular trainers.

[ghstack-poisoned]
gcramer23 added a commit that referenced this pull request Jun 26, 2021
ghstack-source-id: fbfd100
Pull Request resolved: #60784
Copy link
Contributor

@mrshenli mrshenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three general comments

  1. let's just modularize components that we expect users to override (i.e., overriding those would have impact specific to PS training efficiency).
  2. let's try to improve the file structure a bit to avoid tiny and fragmented files.
  3. The rest of the PyTorch is usually very careful when introducing new arguments to APIs, because more arguments usually is more confusing. Let's try to apply the same spirit here as well.

This pr refactors the ps benchmark for modular trainers.

[ghstack-poisoned]
This pr refactors the ps benchmark for modular trainers.

[ghstack-poisoned]
gcramer23 added a commit that referenced this pull request Jun 30, 2021
ghstack-source-id: c6ebf91
Pull Request resolved: #60784
This pr refactors the ps benchmark for modular trainers.

[ghstack-poisoned]
This pr refactors the ps benchmark for modular trainers.

[ghstack-poisoned]
This pr refactors the ps benchmark for modular trainers.

[ghstack-poisoned]
This pr refactors the ps benchmark for modular trainers.

[ghstack-poisoned]
This pr refactors the ps benchmark for modular trainers.

[ghstack-poisoned]
This pr refactors the ps benchmark for modular trainers.

[ghstack-poisoned]
@gcramer23
Copy link
Contributor Author

@gcramer23 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@gcramer23 merged this pull request in 304c02e.

@gcramer23 gcramer23 deleted the gh/gcramer23/16/head branch July 14, 2021 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed Merged oncall: distributed Add this issue/PR to distributed oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants