fix _setup_devices in case where there is no torch.distributed package in build by dlwh · Pull Request #16821 · huggingface/transformers · GitHub


dlwh (Contributor) commented Apr 18, 2022

What does this PR do?

At least in some environments (e.g. a conda install on my M1), torch is built without distributed support. In that case torch.distributed.is_available() returns False, and calling torch.distributed.is_initialized() raises an exception. For this particular method, it's enough to skip the check when distributed support is not available.

The behavior that causes this crash was introduced in #16487.
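A minimal sketch of the kind of guard described above, assuming only the two torch.distributed functions the description names. The helper name distributed_is_initialized is hypothetical (this is not the actual diff in this PR), and the optional dist parameter exists only so the check can be exercised without a real torch install:

```python
from typing import Any, Optional


def distributed_is_initialized(dist: Optional[Any] = None) -> bool:
    """Hypothetical helper: True only if this torch build has distributed support
    AND a default process group has been initialized."""
    if dist is None:
        # Imported lazily; callers (e.g. tests) may pass a stand-in module instead.
        import torch.distributed as dist
    # `and` short-circuits: on builds without distributed support, is_available()
    # returns False and is_initialized() is never reached, so it cannot raise.
    return dist.is_available() and dist.is_initialized()
```

The ordering is the whole fix: is_available() must be evaluated first, so that on a build like the one described the expression stops at False instead of reaching the call that raises.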

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).

  • Did you read the contributor guideline,
    Pull Request section?

  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.

  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.

  • Did you write any new necessary tests?

    I could mock this out, but it seems painful and unnecessary. I can try if you want.
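If a test were wanted, the crash condition can be simulated without a distributed-less torch build by handing the guarded check a mock object. A sketch using only the standard library's unittest.mock; the function and test names are hypothetical, and the guard is re-declared here so the snippet runs on its own:

```python
import unittest
from unittest import mock


def distributed_is_initialized(dist) -> bool:
    # Hypothetical guarded check, same shape as the fix described in this PR.
    return dist.is_available() and dist.is_initialized()


class DistributedGuardTest(unittest.TestCase):
    def test_build_without_distributed(self):
        # Simulate a torch build without distributed support: is_available() is
        # False, and is_initialized() raises if anything calls it.
        fake_dist = mock.Mock()
        fake_dist.is_available.return_value = False
        fake_dist.is_initialized.side_effect = RuntimeError("distributed not available")
        self.assertFalse(distributed_is_initialized(fake_dist))
        fake_dist.is_initialized.assert_not_called()

    def test_initialized_process_group(self):
        # Normal case: distributed is available and a process group exists.
        fake_dist = mock.Mock()
        fake_dist.is_available.return_value = True
        fake_dist.is_initialized.return_value = True
        self.assertTrue(distributed_is_initialized(fake_dist))


if __name__ == "__main__":
    unittest.main()
```

Making is_initialized raise via side_effect is what turns this into a regression check: if the is_available() guard were dropped, the first test would error instead of passing.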

Who can review?

Seems like @sgugger is the best reviewer here?


HuggingFaceDocBuilderDev commented Apr 18, 2022

The documentation is not available anymore as the PR was closed or merged.

sgugger (Collaborator) left a comment

Looks good, thanks for fixing! Could you paste this in training_args_sm as well (just in case)?

dlwh (Contributor, Author) commented Apr 18, 2022

done!

sgugger merged commit 989a15d into huggingface:main Apr 18, 2022

sgugger (Collaborator) commented Apr 18, 2022

Thanks!

dlwh deleted the check_if_distributed_available branch April 18, 2022 22:53
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
…e in build (huggingface#16821)

* fix _setup_devices in case where there is not torch.distributed

* in training_args_sm.py as well
