Ensure thread id is valid in nested parallel regions by peterbell10 · Pull Request #60183 · pytorch/pytorch · GitHub

Conversation

@peterbell10 (Collaborator) commented Jun 17, 2021

Stack from ghstack:

Fixes #59149 (comment)

`parallel_for` will call the function directly if it would have run on only a
single thread anyway. This is great for performance, but causes an issue in
nested parallel regions because `get_thread_num` will reflect the parent
parallel region instead of the current `parallel_for` call.

I fix this by using a `thread_local` variable for the current thread id and
manually setting it before each call to the user-provided function.

Differential Revision: D29287816
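
For reference, a minimal standalone sketch of this approach (not the actual ATen implementation; the names `thread_num_` and `ThreadIdGuard`, and the simplified serial-path condition, are assumptions made for illustration):

#include <cstdint>

namespace sketch {

// Hypothetical TLS slot holding the thread id of the current parallel region.
thread_local int thread_num_ = 0;

int get_thread_num() {
  return thread_num_;
}

// RAII helper: sets the id for the duration of one call into user code,
// then restores whatever the enclosing region had.
class ThreadIdGuard {
 public:
  explicit ThreadIdGuard(int id) : old_id_(thread_num_) { thread_num_ = id; }
  ~ThreadIdGuard() { thread_num_ = old_id_; }
 private:
  int old_id_;
};

template <typename F>
void parallel_for(int64_t begin, int64_t end, int64_t grain_size, const F& f) {
  if (begin >= end) {
    return;
  }
  if (end - begin <= grain_size) {
    // Serial fast path: f still runs inline on the calling thread, but the
    // id is set to 0 first, so get_thread_num() inside f reflects this
    // region rather than an enclosing parallel region.
    ThreadIdGuard guard(0);
    f(begin, end);
    return;
  }
  // Parallel path: a real implementation would split [begin, end) into
  // chunks and hand them to worker threads, each installing
  // ThreadIdGuard(worker_id) before calling f on its chunk. This sketch
  // simply runs the whole range serially as thread 0.
  ThreadIdGuard guard(0);
  f(begin, end);
}

}  // namespace sketch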

@facebook-github-bot (Contributor) commented Jun 17, 2021

💊 CI failures summary and remediations

As of commit 6677a6e (more details on the Dr. CI page and at hud.pytorch.org/pr/60183):


  • 5/5 failures introduced in this PR

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/4)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/scripts/windows_cuda_install.sh
Auto-merging .circleci/scripts/windows_cuda_install.sh
CONFLICT (add/add): Merge conflict in .circleci/scripts/setup_ci_environment.sh
Auto-merging .circleci/scripts/setup_ci_environment.sh
CONFLICT (add/add): Merge conflict in .circleci/docker/build.sh
Auto-merging .circleci/docker/build.sh
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (2/4)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/scripts/windows_cuda_install.sh
Auto-merging .circleci/scripts/windows_cuda_install.sh
CONFLICT (add/add): Merge conflict in .circleci/scripts/setup_ci_environment.sh
Auto-merging .circleci/scripts/setup_ci_environment.sh
CONFLICT (add/add): Merge conflict in .circleci/docker/build.sh
Auto-merging .circleci/docker/build.sh
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_macos_10_13_py3_test (3/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Jun 22 14:36:05 ERROR [0.004s]: test_poisson_sample (__main__.TestDistributions)
Jun 22 14:36:05   File "distributions/test_distributions.py", line 805, in _check_sampler_discrete
Jun 22 14:36:05     chisq, p = scipy.stats.chisquare(counts[msk], pmf[msk] * num_samples)
Jun 22 14:36:05   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/scipy/stats/stats.py", line 6853, in chisquare
Jun 22 14:36:05     lambda_="pearson")
Jun 22 14:36:05   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/scipy/stats/stats.py", line 6694, in power_divergence
Jun 22 14:36:05     raise ValueError(msg)
Jun 22 14:36:05 ValueError: For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are:
Jun 22 14:36:05 0.008265582255680495
Jun 22 14:36:05 
Jun 22 14:36:05 ======================================================================
Jun 22 14:36:05 ERROR [0.004s]: test_poisson_sample (__main__.TestDistributions)
Jun 22 14:36:05 ----------------------------------------------------------------------
Jun 22 14:36:05 Traceback (most recent call last):
Jun 22 14:36:05   File "distributions/test_distributions.py", line 1333, in test_poisson_sample
Jun 22 14:36:05     failure_rate=1e-3)
Jun 22 14:36:05   File "distributions/test_distributions.py", line 805, in _check_sampler_discrete
Jun 22 14:36:05     chisq, p = scipy.stats.chisquare(counts[msk], pmf[msk] * num_samples)
Jun 22 14:36:05   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/scipy/stats/stats.py", line 6853, in chisquare
Jun 22 14:36:05     lambda_="pearson")
Jun 22 14:36:05   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/scipy/stats/stats.py", line 6694, in power_divergence
Jun 22 14:36:05     raise ValueError(msg)

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test2 (4/4)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 22 15:50:32 ERROR [2.344s]: test_forward_sy...ipt (__main__.TensorPipeRemoteModuleTestWithSpawn)
Jun 22 15:50:27   test_send_remote_module_over_the_wire (__main__.TensorPipeThreeWorkersRemoteModuleTestWithSpawn) ... libgcov profiling error:/var/lib/jenkins/workspace/build/sleef/src/common/CMakeFiles/common.dir/common.c.gcda:Version mismatch - expected 9.3 (release) (A93*) got 9.4 (release) (A94*)
Jun 22 15:50:27 libgcov profiling error:/var/lib/jenkins/workspace/build/sleef/src/common/CMakeFiles/common.dir/common.c.gcda:Version mismatch - expected 9.3 (release) (A93*) got 9.4 (release) (A94*)
Jun 22 15:50:27 libgcov profiling error:/var/lib/jenkins/workspace/build/sleef/src/common/CMakeFiles/common.dir/common.c.gcda:Version mismatch - expected 9.3 (release) (A93*) got 9.4 (release) (A94*)
Jun 22 15:50:29 ok (2.859s)
Jun 22 15:50:30   test_send_remote_module_over_the_wire_script_not_supported (__main__.TensorPipeThreeWorkersRemoteModuleTestWithSpawn) ... libgcov profiling error:/var/lib/jenkins/workspace/build/sleef/src/common/CMakeFiles/common.dir/common.c.gcda:Version mismatch - expected 9.3 (release) (A93*) got 9.4 (release) (A94*)
Jun 22 15:50:30 libgcov profiling error:/var/lib/jenkins/workspace/build/sleef/src/common/CMakeFiles/common.dir/common.c.gcda:Version mismatch - expected 9.3 (release) (A93*) got 9.4 (release) (A94*)
Jun 22 15:50:30 libgcov profiling error:/var/lib/jenkins/workspace/build/sleef/src/common/CMakeFiles/common.dir/common.c.gcda:Version mismatch - expected 9.3 (release) (A93*) got 9.4 (release) (A94*)
Jun 22 15:50:32 ok (2.856s)
Jun 22 15:50:32 
Jun 22 15:50:32 ======================================================================
Jun 22 15:50:32 ERROR [2.344s]: test_forward_sync_script (__main__.TensorPipeRemoteModuleTestWithSpawn)
Jun 22 15:50:32 ----------------------------------------------------------------------
Jun 22 15:50:32 Traceback (most recent call last):
Jun 22 15:50:32   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 398, in wrapper
Jun 22 15:50:32     self._join_processes(fn)
Jun 22 15:50:32   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 590, in _join_processes
Jun 22 15:50:32     self._check_return_codes(elapsed_time)
Jun 22 15:50:32   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 633, in _check_return_codes
Jun 22 15:50:32     raise RuntimeError(error)
Jun 22 15:50:32 RuntimeError: Process 0 exited with error code 10 and exception:
Jun 22 15:50:32 Traceback (most recent call last):

1 failure not recognized by patterns:

Job Step Action
CircleCI pytorch_linux_bionic_py3_6_clang9_noarch_test Report results 🔁 rerun

1 job timed out:

  • pytorch_linux_bionic_py3_6_clang9_noarch_test

This comment was automatically generated by Dr. CI.

@ngimel (Collaborator) commented Jun 17, 2021

Does only OpenMP need changes here? What about other parallel backends?
Also, TLS accesses are expensive, and while it looks like one is unavoidable here, it also means that, unfortunately, for perf reasons people will keep writing at the call site:

if (there_is_not_a_lot_of_work) {
  f();
} else {
  at::parallel_for(...);
}

No changes needed, but it's a pattern that you've eliminated recently in reductions, so now it has become slightly more expensive.
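
Spelled out, the call-site pattern being described looks like this (a hedged sketch against the public `at::parallel_for` API; the function, the data, and the 4096 threshold are placeholders, not anything from this PR):

#include <ATen/Parallel.h>
#include <cstdint>

// Skip at::parallel_for entirely when the workload is tiny, so the (now
// slightly more expensive) parallel entry point is never paid for.
void scale_in_place(float* data, int64_t n) {
  auto body = [&](int64_t begin, int64_t end) {
    for (int64_t i = begin; i < end; ++i) {
      data[i] *= 2.0f;
    }
  };
  if (n < 4096) {                    // "there_is_not_a_lot_of_work"
    body(0, n);                      // call f() directly
  } else {
    at::parallel_for(0, n, 4096, body);
  }
}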

@peterbell10 (Collaborator, Author) replied:

> Does only OpenMP need changes here? What about other parallel backends?

I think you're right; I've made similar changes to the other backends as well.

@ngimel (Collaborator) commented Jun 21, 2021

Great, can you please add a test for it (probably a cpp test)?
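
For illustration only, a hypothetical gtest case along these lines, written against the public `at::parallel_for` / `at::get_thread_num` API (the test name, sizes, and exact assertion are assumptions, not necessarily what the PR ended up adding):

#include <ATen/Parallel.h>
#include <gtest/gtest.h>

TEST(TestParallel, NestedParallelThreadId) {
  at::init_num_threads();
  // Outer region: may be distributed across several threads.
  at::parallel_for(0, 16, 1, [](int64_t, int64_t) {
    // Inner region: a single element with a huge grain size, so it takes
    // the serial fast path and runs inline on the calling thread.
    at::parallel_for(0, 1, 10000, [](int64_t, int64_t) {
      // With the fix, the id reported here is valid for the inner region
      // (0), not the id inherited from the outer region.
      EXPECT_EQ(at::get_thread_num(), 0);
    });
  });
}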

@ngimel (Collaborator) commented Jun 22, 2021

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

TEST(TestParallel, TestParallel) {
  manual_seed(123);
-  set_num_threads(1);
+  NumThreadsGuard guard(1);
@peterbell10 (Collaborator, Author):

This test was having the global side-effect of disabling multi-threading.
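
For context, a rough sketch of what an RAII guard of this kind looks like; the real `NumThreadsGuard` in ATen's test helpers may differ in detail, but the point is that the thread-count change is scoped and undone on exit instead of leaking into every test that runs afterwards:

#include <ATen/Parallel.h>

// Scopes a change to the global intra-op thread count to one block: the
// previous value is captured in the constructor and restored in the
// destructor, so the rest of the test binary keeps its normal threading.
struct NumThreadsGuard {
  explicit NumThreadsGuard(int nthreads) : prev_(at::get_num_threads()) {
    at::set_num_threads(nthreads);
  }
  ~NumThreadsGuard() {
    at::set_num_threads(prev_);
  }
 private:
  int prev_;
};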

A reviewer (Collaborator) replied:

Thanks for fixing this!

@ngimel (Collaborator) commented Jun 22, 2021

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@ngimel merged this pull request in c75714e.

facebook-github-bot pushed a commit that referenced this pull request Jun 23, 2021
Summary:
Small typo in #60183

Pull Request resolved: #60532

Reviewed By: walterddr

Differential Revision: D29336173

Pulled By: ngimel

fbshipit-source-id: 57d753f21d484bbae26a23cb3eb35e497e25118a
@facebook-github-bot deleted the gh/peterbell10/76/head branch June 26, 2021 14:17

Labels: cla signed, Merged, module: openmp (Related to OpenMP (omp) support in PyTorch), open source

3 participants