Ensure thread id is valid in nested parallel regions by peterbell10 · Pull Request #60183 · pytorch/pytorch · GitHub

Conversation

@peterbell10 (Collaborator) commented Jun 17, 2021

Stack from ghstack:

Fixes #59149 (comment)

`parallel_for` will call the function directly if it would have run on only a
single thread anyway. This is great for performance, but causes an issue in
nested parallel regions because `get_thread_num` will reflect the parent
parallel region instead of the current `parallel_for` call.

I fix this by using a `thread_local` variable for the current thread id and
manually setting it before each call to the user-provided function.

Differential Revision: D29287816
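
For reference, a minimal standalone sketch of this approach (not the actual ATen implementation; the names `thread_num_` and `ThreadIdGuard`, and the simplified serial-path condition, are assumptions made for illustration):

#include <cstdint>

namespace sketch {

// Hypothetical TLS slot holding the thread id of the current parallel region.
thread_local int thread_num_ = 0;

int get_thread_num() {
  return thread_num_;
}

// RAII helper: sets the id for the duration of one call into user code,
// then restores whatever the enclosing region had.
class ThreadIdGuard {
 public:
  explicit ThreadIdGuard(int id) : old_id_(thread_num_) { thread_num_ = id; }
  ~ThreadIdGuard() { thread_num_ = old_id_; }
 private:
  int old_id_;
};

template <typename F>
void parallel_for(int64_t begin, int64_t end, int64_t grain_size, const F& f) {
  if (begin >= end) {
    return;
  }
  if (end - begin <= grain_size) {
    // Serial fast path: f still runs inline on the calling thread, but the
    // id is set to 0 first, so get_thread_num() inside f reflects this
    // region rather than an enclosing parallel region.
    ThreadIdGuard guard(0);
    f(begin, end);
    return;
  }
  // Parallel path: a real implementation would split [begin, end) into
  // chunks and hand them to worker threads, each installing
  // ThreadIdGuard(worker_id) before calling f on its chunk. This sketch
  // simply runs the whole range serially as thread 0.
  ThreadIdGuard guard(0);
  f(begin, end);
}

}  // namespace sketch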

@facebook-github-bot (Contributor) commented Jun 17, 2021

💊 CI failures summary and remediations

As of commit 6677a6e (more details on the Dr. CI page and at hud.pytorch.org/pr/60183):


  • 5/5 failures introduced in this PR

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/4)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/scripts/windows_cuda_install.sh
Auto-merging .circleci/scripts/windows_cuda_install.sh
CONFLICT (add/add): Merge conflict in .circleci/scripts/setup_ci_environment.sh
Auto-merging .circleci/scripts/setup_ci_environment.sh
CONFLICT (add/add): Merge conflict in .circleci/docker/build.sh
Auto-merging .circleci/docker/build.sh
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (2/4)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/scripts/windows_cuda_install.sh
Auto-merging .circleci/scripts/windows_cuda_install.sh
CONFLICT (add/add): Merge conflict in .circleci/scripts/setup_ci_environment.sh
Auto-merging .circleci/scripts/setup_ci_environment.sh
CONFLICT (add/add): Merge conflict in .circleci/docker/build.sh
Auto-merging .circleci/docker/build.sh
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_macos_10_13_py3_test (3/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Jun 22 14:36:05 ERROR [0.004s]: test_poisson_sample (__main__.TestDistributions)
Jun 22 14:36:05   File "distributions/test_distributions.py", line 805, in _check_sampler_discrete
Jun 22 14:36:05     chisq, p = scipy.stats.chisquare(counts[msk], pmf[msk] * num_samples)
Jun 22 14:36:05   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/scipy/stats/stats.py", line 6853, in chisquare
Jun 22 14:36:05     lambda_="pearson")
Jun 22 14:36:05   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/scipy/stats/stats.py", line 6694, in power_divergence
Jun 22 14:36:05     raise ValueError(msg)
Jun 22 14:36:05 ValueError: For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are:
Jun 22 14:36:05 0.008265582255680495
Jun 22 14:36:05 
Jun 22 14:36:05 ======================================================================
Jun 22 14:36:05 ERROR [0.004s]: test_poisson_sample (__main__.TestDistributions)
Jun 22 14:36:05 ----------------------------------------------------------------------
Jun 22 14:36:05 Traceback (most recent call last):
Jun 22 14:36:05   File "distributions/test_distributions.py", line 1333, in test_poisson_sample
Jun 22 14:36:05     failure_rate=1e-3)
Jun 22 14:36:05   File "distributions/test_distributions.py", line 805, in _check_sampler_discrete
Jun 22 14:36:05     chisq, p = scipy.stats.chisquare(counts[msk], pmf[msk] * num_samples)
Jun 22 14:36:05   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/scipy/stats/stats.py", line 6853, in chisquare
Jun 22 14:36:05     lambda_="pearson")
Jun 22 14:36:05   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/scipy/stats/stats.py", line 6694, in power_divergence
Jun 22 14:36:05     raise ValueError(msg)

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test2 (4/4)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 22 15:50:32 ERROR [2.344s]: test_forward_sy...ipt (__main__.TensorPipeRemoteModuleTestWithSpawn)
Jun 22 15:50:27   test_send_remote_module_over_the_wire (__main__.TensorPipeThreeWorkersRemoteModuleTestWithSpawn) ... libgcov profiling error:/var/lib/jenkins/workspace/build/sleef/src/common/CMakeFiles/common.dir/common.c.gcda:Version mismatch - expected 9.3 (release) (A93*) got 9.4 (release) (A94*)
Jun 22 15:50:27 libgcov profiling error:/var/lib/jenkins/workspace/build/sleef/src/common/CMakeFiles/common.dir/common.c.gcda:Version mismatch - expected 9.3 (release) (A93*) got 9.4 (release) (A94*)
Jun 22 15:50:27 libgcov profiling error:/var/lib/jenkins/workspace/build/sleef/src/common/CMakeFiles/common.dir/common.c.gcda:Version mismatch - expected 9.3 (release) (A93*) got 9.4 (release) (A94*)
Jun 22 15:50:29 ok (2.859s)
Jun 22 15:50:30   test_send_remote_module_over_the_wire_script_not_supported (__main__.TensorPipeThreeWorkersRemoteModuleTestWithSpawn) ... libgcov profiling error:/var/lib/jenkins/workspace/build/sleef/src/common/CMakeFiles/common.dir/common.c.gcda:Version mismatch - expected 9.3 (release) (A93*) got 9.4 (release) (A94*)
Jun 22 15:50:30 libgcov profiling error:/var/lib/jenkins/workspace/build/sleef/src/common/CMakeFiles/common.dir/common.c.gcda:Version mismatch - expected 9.3 (release) (A93*) got 9.4 (release) (A94*)
Jun 22 15:50:30 libgcov profiling error:/var/lib/jenkins/workspace/build/sleef/src/common/CMakeFiles/common.dir/common.c.gcda:Version mismatch - expected 9.3 (release) (A93*) got 9.4 (release) (A94*)
Jun 22 15:50:32 ok (2.856s)
Jun 22 15:50:32 
Jun 22 15:50:32 ======================================================================
Jun 22 15:50:32 ERROR [2.344s]: test_forward_sync_script (__main__.TensorPipeRemoteModuleTestWithSpawn)
Jun 22 15:50:32 ----------------------------------------------------------------------
Jun 22 15:50:32 Traceback (most recent call last):
Jun 22 15:50:32   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 398, in wrapper
Jun 22 15:50:32     self._join_processes(fn)
Jun 22 15:50:32   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 590, in _join_processes
Jun 22 15:50:32     self._check_return_codes(elapsed_time)
Jun 22 15:50:32   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 633, in _check_return_codes
Jun 22 15:50:32     raise RuntimeError(error)
Jun 22 15:50:32 RuntimeError: Process 0 exited with error code 10 and exception:
Jun 22 15:50:32 Traceback (most recent call last):

1 failure not recognized by patterns:

Job Step Action
CircleCI pytorch_linux_bionic_py3_6_clang9_noarch_test Report results 🔁 rerun

1 job timed out:

  • pytorch_linux_bionic_py3_6_clang9_noarch_test

This comment was automatically generated by Dr. CI.

@ngimel (Collaborator) commented Jun 17, 2021

Does only OpenMP need changes here? What about other parallel backends?
Also, TLS accesses are expensive, and while it looks like one is unavoidable here, it also means that, unfortunately, for perf reasons people will keep writing at the call site:

if (there_is_not_a_lot_of_work) {
  f();
} else {
  at::parallel_for(...);
}

No changes needed, but it's a pattern that you've eliminated recently in reductions, so now it has become slightly more expensive.
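
Spelled out, the call-site pattern being described looks like this (a hedged sketch against the public `at::parallel_for` API; the function, the data, and the 4096 threshold are placeholders, not anything from this PR):

#include <ATen/Parallel.h>
#include <cstdint>

// Skip at::parallel_for entirely when the workload is tiny, so the (now
// slightly more expensive) parallel entry point is never paid for.
void scale_in_place(float* data, int64_t n) {
  auto body = [&](int64_t begin, int64_t end) {
    for (int64_t i = begin; i < end; ++i) {
      data[i] *= 2.0f;
    }
  };
  if (n < 4096) {                    // "there_is_not_a_lot_of_work"
    body(0, n);                      // call f() directly
  } else {
    at::parallel_for(0, n, 4096, body);
  }
}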

@peterbell10 (Collaborator, Author) replied:

> Does only OpenMP need changes here? What about other parallel backends?

I think you're right; I've made similar changes to the other backends as well.

@ngimel (Collaborator) commented Jun 21, 2021

Great, can you please add a test for it (probably a cpp test)?
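
For illustration only, a hypothetical gtest case along these lines, written against the public `at::parallel_for` / `at::get_thread_num` API (the test name, sizes, and exact assertion are assumptions, not necessarily what the PR ended up adding):

#include <ATen/Parallel.h>
#include <gtest/gtest.h>

TEST(TestParallel, NestedParallelThreadId) {
  at::init_num_threads();
  // Outer region: may be distributed across several threads.
  at::parallel_for(0, 16, 1, [](int64_t, int64_t) {
    // Inner region: a single element with a huge grain size, so it takes
    // the serial fast path and runs inline on the calling thread.
    at::parallel_for(0, 1, 10000, [](int64_t, int64_t) {
      // With the fix, the id reported here is valid for the inner region
      // (0), not the id inherited from the outer region.
      EXPECT_EQ(at::get_thread_num(), 0);
    });
  });
}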

@ngimel (Collaborator) commented Jun 22, 2021

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

TEST(TestParallel, TestParallel) {
  manual_seed(123);
-  set_num_threads(1);
+  NumThreadsGuard guard(1);
@peterbell10 (Collaborator, Author):

This test was having the global side-effect of disabling multi-threading.
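
For context, a rough sketch of what an RAII guard of this kind looks like; the real `NumThreadsGuard` in ATen's test helpers may differ in detail, but the point is that the thread-count change is scoped and undone on exit instead of leaking into every test that runs afterwards:

#include <ATen/Parallel.h>

// Scopes a change to the global intra-op thread count to one block: the
// previous value is captured in the constructor and restored in the
// destructor, so the rest of the test binary keeps its normal threading.
struct NumThreadsGuard {
  explicit NumThreadsGuard(int nthreads) : prev_(at::get_num_threads()) {
    at::set_num_threads(nthreads);
  }
  ~NumThreadsGuard() {
    at::set_num_threads(prev_);
  }
 private:
  int prev_;
};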

A reviewer (Collaborator) replied:

Thanks for fixing this!

@ngimel (Collaborator) commented Jun 22, 2021

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@ngimel merged this pull request in c75714e.

facebook-github-bot pushed a commit that referenced this pull request Jun 23, 2021
Summary:
Small typo in #60183

Pull Request resolved: #60532

Reviewed By: walterddr

Differential Revision: D29336173

Pulled By: ngimel

fbshipit-source-id: 57d753f21d484bbae26a23cb3eb35e497e25118a
@facebook-github-bot deleted the gh/peterbell10/76/head branch June 26, 2021 14:17

Labels: cla signed, Merged, module: openmp (Related to OpenMP (omp) support in PyTorch), open source

3 participants