A quick fix for Stream operation errors on non-current device by mrshenli · Pull Request #15689 · pytorch/pytorch

Conversation

@mrshenli (Contributor) commented Jan 3, 2019

see #15682

This is a quick fix implementing the simpler solution suggested by @colesbury. As the benchmark below shows, it slows down Stream.query() by ~20%. I would be happy to pursue the more complex solution of implementing this in C++/ATen, but I would still vote for merging this quick fix first to get rid of the bug sooner.

Test: added.

FYI @jeffreyksmithjr
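For context, the gist of the fix is to run the stream operation with the stream's own device made current. Below is a minimal Python-level sketch of that pattern, applied from the outside as a workaround; safe_stream_query is a hypothetical helper for illustration, not the actual change in this PR:

import torch

def safe_stream_query(stream):
    # Hypothetical helper illustrating the device-guard pattern: temporarily
    # make the stream's device the current device so that the underlying
    # cudaStreamQuery call is issued in the correct device context.
    with torch.cuda.device(stream.device):
        return stream.query()

# Usage sketch: querying a stream that lives on a non-current device.
# s1 = torch.cuda.Stream(device='cuda:1')
# print(safe_stream_query(s1))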

Benchmark

now

In [1]: def f():
   ...:     d0 = torch.device('cuda:0')
   ...:     d1 = torch.device('cuda:1')
   ...:     with torch.cuda.device(d0):
   ...:         s0 = torch.cuda.current_stream()
   ...:     with torch.cuda.device(d1):
   ...:         s1 = torch.cuda.current_stream()
   ...:     s0.query()
   ...:     s1.query()

In [4]: %timeit f()
38.1 µs ± 4.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [5]: %timeit f()
37.6 µs ± 2.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

before

In [4]: %timeit f()
28.5 µs ± 1.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [5]: %timeit f()
35.3 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

@ezyang (Contributor) left a review comment:

Someone else will need to merge

@facebook-github-bot: @mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mrshenli (Contributor, Author) commented Jan 3, 2019

Hi @ezyang, thanks for reviewing the code.

Could you please click the rerun button on the failed test? It was aborted for an unknown reason, and I don't have write permission to restart it myself. Thanks.

@ezyang (Contributor) commented Jan 3, 2019

@mrshenli If you go to 'oss pytorch' you can add yourself as a member which should give you access. (In any case I restarted your job.)

@gchanan (Contributor) commented Jan 3, 2019

Doesn't this issue exist on a bunch of functions (not just query)? Is there some more holistic approach we could take here?

@mrshenli (Contributor, Author) commented Jan 3, 2019

@gchanan

I am thinking about implementing something like torch._C._cuda_queryStream and replacing the cudaStreamQuery invocations with that new implementation (not sure yet). I initially wanted to change cudaStreamQuery directly, but it is a CUDA API that we cannot modify (maybe we can talk to NVIDIA?).
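For illustration only, one "more holistic" direction at the Python level would be to apply the same device-guard pattern to every Stream method via a decorator. This is a sketch of the idea, not what this PR or the proposed C++ change does; the decorator name is made up:

import functools
import torch

def _with_stream_device(method):
    # Hypothetical decorator: run a Stream method with the stream's own
    # device made current, so query(), synchronize(), etc. all issue their
    # CUDA calls against the right device without repeating the guard in
    # each method body.
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with torch.cuda.device(self.device):
            return method(self, *args, **kwargs)
    return wrapper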

@mrshenli (Contributor, Author) commented Jan 3, 2019

Hmm, the new test keeps hitting an error in one of the CI jobs...

@mrshenli (Contributor, Author) commented Jan 3, 2019

test_streams_multi_gpu_query (test_cuda.TestCuda) keeps failing (with no error message) on rocmdeb, so I will skip the test on that platform.

@facebook-github-bot: @mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@mrshenli deleted the stream branch on January 17, 2019.
@ezyang added the merged label on June 25, 2019.