wrap cudaStreamSynchronize calls #61889
💊 CI failures summary and remediations
As of commit 5febc92 (more details on the Dr. CI page and at hud.pytorch.org/pr/61889):
🕵️ 1 new failure recognized by patterns
The following CI failures do not appear to be due to upstream breakages:
Handling the ifdef in one place is a nice bonus. Will you add a lint to flag bare occurrences of these calls?
Yes, will add a lint!
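As a rough illustration of what such a lint could look like, here is a minimal sketch that scans source text for bare `cudaStreamSynchronize(` calls. The function name `find_bare_sync_calls` and the `NOLINT(bare-cuda-sync)` suppression marker are hypothetical and are not taken from this PR or from PyTorch's actual lint setup.

```python
import re

# Matches a direct call to the raw CUDA API (not a longer identifier ending in it).
BARE_SYNC_CALL = re.compile(r"\bcudaStreamSynchronize\s*\(")


def find_bare_sync_calls(source, allow_marker="NOLINT(bare-cuda-sync)"):
    """Return (line_number, stripped_line) pairs for unwrapped sync calls.

    `allow_marker` is a hypothetical suppression comment for the one place
    (the wrapper itself) where the raw call is expected.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # ignore anything after a // comment
        if BARE_SYNC_CALL.search(code) and allow_marker not in line:
            hits.append((lineno, line.strip()))
    return hits
```

A real lint would also need to handle block comments and string literals; this sketch only skips `//` comments.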
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@ngimel Any chance Also cc @ezyang @ailzhang @asuhan @JackCaoG, as we first identified this issue when supporting detection/segmentation models in PyTorch XLA, but then found that it mainly stems from a limitation in PyTorch core (tensor shapes must be concrete values).
Given the existing semantics of the operation, removing the synchronization is not possible. What we should do, however, is support JAX's extension to the nonzero API (https://jax.readthedocs.io/en/latest/_autosummary/jax.numpy.nonzero.html), where an explicit size can be specified, giving an upper bound on the number of nonzero entries that will be returned (zero-padded if there aren't enough). Can you file an issue for this?
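To make the proposed semantics concrete, here is a minimal pure-Python sketch of a size-bounded nonzero for a 1-D sequence. It only models the behaviour described above (a fixed-length result, padded when there are too few nonzero entries). The name `nonzero_padded` is hypothetical, and this is neither JAX's nor PyTorch's implementation.

```python
def nonzero_padded(values, size, fill_value=0):
    """Return exactly `size` indices of nonzero entries in a 1-D sequence.

    If fewer than `size` entries are nonzero, the result is padded with
    `fill_value`; if more are nonzero, only the first `size` are kept.
    """
    indices = [i for i, v in enumerate(values) if v != 0]
    indices = indices[:size]
    return indices + [fill_value] * (size - len(indices))
```

Because the caller fixes the output length up front, the result shape no longer depends on the data, which is what would let a device implementation avoid waiting on the count of nonzero entries.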
We would need this extension not only for nonzero but also for indexing ops with a mask; the most common situation in which people encounter this particular sync is
Raised in #62320
Summary: This is a first step towards creating a context manager that errors out on synchronizing calls.
Pull Request resolved: pytorch/pytorch#61889
Reviewed By: albanD
Differential Revision: D29805280
Pulled By: ngimel
fbshipit-source-id: b66400fbe0941b7daa51e6b30abe27b9cccd4e8a
This is a first step towards creating a context manager that errors out on synchronizing calls.
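The idea of routing every synchronization through one wrapper, so that a context manager can later make those calls fail, can be sketched in Python as follows. This is an illustrative model only: `forbid_sync`, `stream_synchronize`, `SyncError`, and the `raw_sync` callback are hypothetical names, and the real change lives in PyTorch's C++/CUDA code.

```python
import contextlib
import threading

_state = threading.local()  # per-thread flag: are synchronizing calls forbidden?


class SyncError(RuntimeError):
    """Raised when a synchronizing call is made inside forbid_sync()."""


@contextlib.contextmanager
def forbid_sync():
    """Make wrapped synchronizing calls raise while the block is active."""
    previous = getattr(_state, "forbid", False)
    _state.forbid = True
    try:
        yield
    finally:
        _state.forbid = previous  # restore, so nested blocks behave correctly


def stream_synchronize(stream, raw_sync):
    """Single choke point for synchronization.

    `raw_sync` stands in for the underlying call (for example
    cudaStreamSynchronize); it is passed in here only to keep the sketch
    self-contained.
    """
    if getattr(_state, "forbid", False):
        raise SyncError("synchronizing call made inside forbid_sync()")
    raw_sync(stream)
```

Since every call site goes through the one wrapper, the check only has to be written once, which is the same benefit the review comment notes for handling the ifdef in one place.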