[trainer] sharded _load_best_model by stas00 · Pull Request #17150 · huggingface/transformers · GitHub

Conversation

stas00 (Contributor) commented on May 10, 2022

Looks like a copy-and-paste issue. This code path is probably untested.
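
(For context only, not the actual diff in this PR: a minimal sketch of how a sharded best-checkpoint reload typically looks in `transformers`, using the public `load_sharded_checkpoint` helper. The function name `load_best_checkpoint` and its arguments are hypothetical, chosen just for illustration.)

```python
# Illustrative sketch only -- not the exact change made in this PR.
import os

import torch

from transformers.modeling_utils import load_sharded_checkpoint
from transformers.utils import WEIGHTS_INDEX_NAME, WEIGHTS_NAME


def load_best_checkpoint(model, best_model_checkpoint):
    """Reload the best checkpoint, whether it was saved as one file or as shards."""
    weights_file = os.path.join(best_model_checkpoint, WEIGHTS_NAME)
    if os.path.exists(weights_file):
        # Single pytorch_model.bin file.
        state_dict = torch.load(weights_file, map_location="cpu")
        model.load_state_dict(state_dict)
    elif os.path.exists(os.path.join(best_model_checkpoint, WEIGHTS_INDEX_NAME)):
        # Sharded checkpoint: pytorch_model.bin.index.json plus shard files.
        load_sharded_checkpoint(model, best_model_checkpoint)
```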

@sgugger

HuggingFaceDocBuilderDev commented on May 10, 2022

The documentation is not available anymore as the PR was closed or merged.

sgugger (Collaborator) left a comment


Thanks for fixing. It is currently untested, as there is no way to activate checkpoint sharding from the Trainer without training a very large model, which is unfeasible on any of the CI runners.
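
(For reference, checkpoint sharding can be reproduced outside the Trainer by lowering `max_shard_size` in `save_pretrained`; the tiny test model and shard size below are assumptions chosen only to force sharding for illustration.)

```python
# A minimal sketch: force sharding on a tiny model by shrinking max_shard_size.
# The Trainer only produces sharded checkpoints when the weights exceed the
# default shard size (10GB), which is why CI would need a very large model.
from transformers import AutoModel

model = AutoModel.from_pretrained("hf-internal-testing/tiny-random-bert")

# Any shard size smaller than the model's weights yields
# pytorch_model.bin.index.json plus several shard files.
model.save_pretrained("tiny-sharded-checkpoint", max_shard_size="20KB")
```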

stas00 (Contributor, Author) commented on May 10, 2022

Thank you for explaining why testing this path is complicated, Sylvain.

I think I can get this path partially tested by using ZeRO-3 without "stage3_gather_16bit_weights_on_model_save", which would make it fall through to this code and at least exercise that condition. I will be adding these tests in #17151.
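
(For reference, a minimal sketch of the DeepSpeed ZeRO-3 setting being referred to; the surrounding keys and the way the dict is passed to the Trainer are illustrative assumptions.)

```python
# Minimal sketch of the relevant ZeRO-3 setting; the rest of the config is
# illustrative. With the gather flag off, DeepSpeed does not write a
# consolidated 16-bit checkpoint at save time, so loading the best model
# falls through to the non-consolidated path.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "stage3_gather_16bit_weights_on_model_save": False,
    },
    "train_micro_batch_size_per_gpu": "auto",
}

# Passed to the Trainer via TrainingArguments(..., deepspeed=ds_config).
```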

stas00 merged commit 9aeacfe into main on May 10, 2022
stas00 deleted the stas00-patch-1 branch on May 10, 2022 at 14:58
ArthurZucker pushed a commit to ArthurZucker/transformers that referenced this pull request May 12, 2022
* [trainer] sharded _load_best_model

probably needs a test?

* undo delete
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
* [trainer] sharded _load_best_model

probably needs a test?

* undo delete
