-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Ensure thread id is valid in nested parallel regions #60183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fixes #59149 (comment) `parallel_for` will call the function directly if it would have run on only a single thread anyway. This is great for performance, but causes an issue in nested parallel regions because `get_thread_num` will reflect the parent parallel region instead of the current `parallel_for` call. I fix this by using a `thread_local` variable for the current thread id and manually setting it before each call to the user-provided function. [ghstack-poisoned]
💊 CI failures summary and remediationsAs of commit 6677a6e (more details on the Dr. CI page and at hud.pytorch.org/pr/60183):
🕵️ 4 new failures recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
| Job | Step | Action |
|---|---|---|
| Report results | 🔁 rerun |
1 job timed out:
pytorch_linux_bionic_py3_6_clang9_noarch_test
This comment was automatically generated by Dr. CI (expand for details).
Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions to the (internal) Dr. CI Users group.
|
Does only OpenMP need changes here? What about other parallel backends? No changes needed, but it's a pattern that you've eliminated recently in reduction, so now it became slightly more expensive. |
Fixes #59149 (comment) `parallel_for` will call the function directly if it would have run on only a single thread anyway. This is great for performance, but causes an issue in nested parallel regions because `get_thread_num` will reflect the parent parallel region instead of the current `parallel_for` call. I fix this by using a `thread_local` variable for the current thread id and manually setting it before each call to the user-provided function. [ghstack-poisoned]
Fixes #59149 (comment) `parallel_for` will call the function directly if it would have run on only a single thread anyway. This is great for performance, but causes an issue in nested parallel regions because `get_thread_num` will reflect the parent parallel region instead of the current `parallel_for` call. I fix this by using a `thread_local` variable for the current thread id and manually setting it before each call to the user-provided function. [ghstack-poisoned]
I think your right, I've made similar changes to the other backends as well. |
Fixes #59149 (comment) `parallel_for` will call the function directly if it would have run on only a single thread anyway. This is great for performance, but causes an issue in nested parallel regions because `get_thread_num` will reflect the parent parallel region instead of the current `parallel_for` call. I fix this by using a `thread_local` variable for the current thread id and manually setting it before each call to the user-provided function. [ghstack-poisoned]
Fixes #59149 (comment) `parallel_for` will call the function directly if it would have run on only a single thread anyway. This is great for performance, but causes an issue in nested parallel regions because `get_thread_num` will reflect the parent parallel region instead of the current `parallel_for` call. I fix this by using a `thread_local` variable for the current thread id and manually setting it before each call to the user-provided function. [ghstack-poisoned]
Fixes #59149 (comment) `parallel_for` will call the function directly if it would have run on only a single thread anyway. This is great for performance, but causes an issue in nested parallel regions because `get_thread_num` will reflect the parent parallel region instead of the current `parallel_for` call. I fix this by using a `thread_local` variable for the current thread id and manually setting it before each call to the user-provided function. [ghstack-poisoned]
Fixes #59149 (comment) `parallel_for` will call the function directly if it would have run on only a single thread anyway. This is great for performance, but causes an issue in nested parallel regions because `get_thread_num` will reflect the parent parallel region instead of the current `parallel_for` call. I fix this by using a `thread_local` variable for the current thread id and manually setting it before each call to the user-provided function. [ghstack-poisoned]
|
Great, can you please add a test for it (probably cpp test)? |
|
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Fixes #59149 (comment) `parallel_for` will call the function directly if it would have run on only a single thread anyway. This is great for performance, but causes an issue in nested parallel regions because `get_thread_num` will reflect the parent parallel region instead of the current `parallel_for` call. I fix this by using a `thread_local` variable for the current thread id and manually setting it before each call to the user-provided function. Differential Revision: [D29287816](https://our.internmc.facebook.com/intern/diff/D29287816) [ghstack-poisoned]
| TEST(TestParallel, TestParallel) { | ||
| manual_seed(123); | ||
| set_num_threads(1); | ||
| NumThreadsGuard guard(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was having the global side-effect of disabling multi-threading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this!
|
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Stack from ghstack:
Fixes #59149 (comment)
parallel_forwill call the function directly if it would have run on only asingle thread anyway. This is great for performance, but causes an issue in
nested parallel regions because
get_thread_numwill reflect the parentparallel region instead of the current
parallel_forcall.I fix this by using a
thread_localvariable for the current thread id andmanually setting it before each call to the user-provided function.
Differential Revision: D29287816