-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Add mutex for CPU RNG and move TH to C++ #4041
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This will break the Windows build and is not exception safe. We should use |
|
To clarify on the exception issue: a lot of functions in I suspect that moving all of TH to C++ would be a pain. You can avoid the C++ types in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see @colesbury 's comments
|
@yf225 can you finish this this week, i think folks are blocked on it. |
|
@colesbury What do we think of using different generators for multi-threads (as @Stonesjtu suggested in #3794 (comment))? Perf-wise it might be faster than using mutex because we are not waiting for the lock at all, not sure about implementation yet. |
|
@yf225 the PR corresponding to that was closed. We should just finish this as it currently stands and merge. |
|
I think thread local generators are not a good idea because it’s easy to forget to seed a worker thread |
285e369 to
c76ce0d
Compare
aten/src/TH/THRandom.cpp
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
|
@colesbury It seems that if I do it as an opaque struct pointer and use I guess moving TH to C++ land might be a cleaner way to go, although the code change would be quite big. |
|
Another issue is that if you want to actually acquire the mutex from C you are going to have to work hard to get unwinding to actually free it when necessary. |
aten/src/TH/THRandom.cpp
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
|
This still has the issue that it's not exception safe. An exception thrown after the lock call will not release the lock, causing deadlocks the next time a random function is called. We should covert the C files to C++ and use |
|
We don't need to convert all of |
|
@colesbury I think if we put the |
|
@colesbury If we change Haven't figured out what's the appropriate solution to this yet, or if adding try-catch block to catch |
|
@yf225, yes. I think your current trick in As Adam Paszke points out, you can also simplify the definition of |
|
I don't think a try-catch block is a good solution. We should be moving towards using C++, even if it requires more work. |
|
After changing things to C++, https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/test/scalar_tensor_test.cpp#L214 seems to fail when The error message is: In the normal case, Any suggestions on how I should go about debugging this? EDIT: fixed in #4857. |
|
There are a lot of unused variable warnings in TH after moving things to C++. I can fix them here or submit another PR for it. |
|
@colesbury @apaszke Wondering is there anything else I should do for this PR? |
aten/src/TH/THRandom.cpp
Outdated
| self->left = 1; | ||
| self->seeded = 0; | ||
| self->normal_is_valid = 0; | ||
| self->mutex = new std::mutex(); |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
|
|
||
| void THTensor_(geometric)(THTensor *self, THGenerator *_generator, double p) | ||
| { | ||
| std::lock_guard<std::mutex> lock(*(_generator->mutex)); |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| THArgCheck(THTensor_(isContiguous)(self), 1, "RNG state needs to be contiguous"); | ||
| rng_state = (THGenerator *)THTensor_(data)(self); | ||
| THGenerator_copy(rng_state, _generator); | ||
| rng_state->mutex = NULL; // mutex should not be part of the generator state |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| _PS256_CONST_TYPE(inv_mant_mask, int, ~0x7f800000); | ||
|
|
||
| _PS256_CONST_TYPE(sign_mask, int, 0x80000000); | ||
| _PS256_CONST_TYPE(sign_mask, int, (int)0x80000000); |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| THArgCheck(THTensor_(isContiguous)(self), 1, "RNG state needs to be contiguous"); | ||
| rng_state = (THGenerator *)THTensor_(data)(self); | ||
| THGenerator_copy(rng_state, _generator); | ||
| rng_state->mutex = NULL; // mutex should not be part of the generator state |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good. Three comments inline.
| _PS256_CONST_TYPE(inv_mant_mask, int, ~0x7f800000); | ||
|
|
||
| _PS256_CONST_TYPE(sign_mask, int, 0x80000000); | ||
| _PS256_CONST_TYPE(sign_mask, int, (int)0x80000000); |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
aten/src/TH/THRandom.cpp
Outdated
| { | ||
| memcpy(self, from, sizeof(THGenerator)); | ||
| THGeneratorState_copy(&self->gen_state, &from->gen_state); | ||
| new (&self->mutex) std::mutex(); |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| { | ||
| static const size_t size = sizeof(THGenerator); | ||
| THGenerator *rng_state; | ||
| static const size_t size = sizeof(THGeneratorState); |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This reverts commit 96239dd.
* Revert "Clarify grad_input_mask documentation in derivatives.yaml (#4963)" This reverts commit 6f3266b. * Revert "fix triu and tril for zero-strided inputs on gpu (#4962)" This reverts commit 6c197c2. * Revert "Add mutex for CPU RNG and move TH to C++ (#4041)" This reverts commit 96239dd. * Revert "Support multivariate TransformedDistributions (#4937)" This reverts commit ca5071d. * Revert "Only check that arguments are Variables in VariableType (#4943)" This reverts commit d444379. * Revert "torch.set_num_threads sets MKL option too (#4949)" This reverts commit 2aaeec0.
|
I think this breaks C FFI plugins since those are built with C, rather than C++. Specifically, I get this error: |
|
CC @yf225 @colesbury @soumith ^^ |
|
Drat! I guess, if we want to support FFI, we're going to need some C extern blocks and a C-compatible header for FFI plugins to import T_T |
|
Yeah, I actually wanted to write my header file in C++, but parser didn't support |
|
There’s a new C++-based extensions system underway. It might be simpler to recommend moving to that, but we should figure out an easy way to migrate (since this is breaking) |
|
@apaszke, I'd love to try new C++ extension out when it's ready. That said, would you support the C way for a while for backward compatibility? |
|
yes we will support the C extension. We'll issue fixes shortly on this. |
|
@alsrgv now fixed in master |
|
@soumith, thanks for the quick turnaround! |
This fixes #3794.
Manually tested and there is no perf regression coming from the change.