KEMBAR78
Kill async pushes when calling push_to_hub with blocking=True by sgugger · Pull Request #16755 · huggingface/transformers · GitHub
Skip to content

Conversation

@sgugger
Copy link
Collaborator

@sgugger sgugger commented Apr 13, 2022

What does this PR do?

This PR fixes a bug that sometimes appear in the Trainer when push_to_hub=True: if one of the async pushes finishes after a regular non-async push, the history gets messed up and we end up with an error like this:

The push command with PID 1468 failed.
remote: error: cannot lock ref 'refs/heads/main': is at 07c85fd69cd46a7daee6323c5a5eefc3e6a886da but expected 1fd7f122ef725f8e340a14bc97d537812de44076      

To fix this, when the Trainer (or the user) calls push_to_hub with blocking=True, we interrupt any push in progress. The commit history will still be good since the commit don't take time.

cc @philschmid who had the error.

@sgugger sgugger requested a review from LysandreJik April 13, 2022 15:53
@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Apr 13, 2022

The documentation is not available anymore as the PR was closed or merged.

@LysandreJik
Copy link
Member

I think this is the right solution, but I wonder if we can't test it in a reproducible manner. Maybe we could do something like the following:

  • Commit a largish file, and push that
  • Reset the head without keeping the changes so that we're back on the previous commit
  • Commit a small file, and push that.

Now the first commit will likely fail with the error above. Unfortunately, this is likely a very heavy test, so I'm not sure it should be part of the CI. It can stil be used to test the validity of the solution above, though!

Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as @sgugger has tested it offline

@sgugger
Copy link
Collaborator Author

sgugger commented Apr 14, 2022

Actually @philschmid but let's say we are the same person ;-)

@sgugger sgugger merged commit 3e4eec4 into main Apr 14, 2022
@sgugger sgugger deleted the push_to_hub_fix_async branch April 14, 2022 14:02
@philschmid
Copy link
Contributor

For more context tested with roberta-large

elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants