Use cpuinfo to determine c10::ThreadPool thread number #107010

cyyever · 2023-08-11T03:46:56Z

This PR prefers the logical processor count (the CPU cores shown in htop) returned by cpuinfo when determining the c10 thread count. If that fails, it falls back to hardware_concurrency, used as-is.
The motivation is that on an x86 host with 64 cores and Hyper-Threading disabled, the current behavior uses 32 threads, leaving half of the cores idle.
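
A minimal sketch of the selection order described above, not the actual PyTorch code. The helper names (pick_thread_count, old_x86_thread_count, default_thread_count) are hypothetical, the cpuinfo result is passed in as a plain number so the snippet builds without the cpuinfo library, and the halving in the old behavior is an assumption inferred from the 64 cores -> 32 threads figure.

```cpp
#include <cstddef>
#include <thread>

// Hypothetical helper (not PyTorch code): pick a thread count from two probes.
// cpuinfo_processors stands in for cpuinfo_get_processors_count(), where 0
// means cpuinfo could not report a value.
constexpr std::size_t pick_thread_count(std::size_t cpuinfo_processors,
                                        std::size_t hw_concurrency) {
  // Prefer cpuinfo's logical processor count; otherwise use
  // hardware_concurrency unchanged.
  return cpuinfo_processors > 0 ? cpuinfo_processors : hw_concurrency;
}

// Assumption: the previous x86 default halved hardware_concurrency, which
// matches the 64 -> 32 figure in the description.
constexpr std::size_t old_x86_thread_count(std::size_t hw_concurrency) {
  return hw_concurrency / 2;
}

// Runtime wrapper: a real build would pass the value obtained from cpuinfo.
inline std::size_t default_thread_count(std::size_t cpuinfo_processors) {
  return pick_thread_count(cpuinfo_processors,
                           std::thread::hardware_concurrency());
}
```

For the 64-core host with Hyper-Threading off, pick_thread_count(64, 64) yields 64, while old_x86_thread_count(64) yields 32.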

pytorch-bot · 2023-08-11T03:46:58Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107010

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 40d248c with merge base d392963:

NEW FAILURE - The following job has failed:

win-vs2019-cpu-py3 / test (default, 2, 3, windows.4xlarge.nonephemeral) (gh)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

cyyever · 2023-08-11T03:49:08Z

@pytorchbot label "topic: not user facing"

colesbury

This looks OK to me. @ezyang, what do you think? Is there someone who still maintains c10 ThreadPool?

I removed "topic: not user facing" because if we are changing the default number of threads, I think that's worth mentioning in release notes.

jgong5 · 2023-08-15T05:02:13Z

c10/core/thread_pool.cpp

+  size_t num_threads = cpuinfo_get_processors_count();
+  if (num_threads > 0) {
+    return num_threads;
+  }


Does it make more sense to invoke cpuinfo_get_cores_count instead? It is preferable to use only physical cores when HT is on. cpuinfo_get_cores_count makes sure we always use all physical CPU cores, whether HT is on or off.

cyyever · 2023-08-15

@jgong5 cpuinfo_get_processors_count should return the core count when HT is off. As a reference implementation, caffe2/utils/threadpool/ThreadPool.cc also uses cpuinfo_get_processors_count for its thread count.

jgong5 · 2023-08-16

I'm concerned about the change in behavior when HT is on. Previously, only the number of physical cores was used. Now it would be all logical cores, including HT ones, wouldn't it?
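
To make the difference under discussion concrete, here is a small sketch that models the two cpuinfo queries on a simplified host. HostTopology, processors_count, and cores_count are hypothetical names used only for illustration; they assume a uniform number of hardware threads per core and do not call cpuinfo itself.

```cpp
#include <cstddef>

// Hypothetical, simplified model of a host (illustration only).
struct HostTopology {
  std::size_t physical_cores;    // physical cores in the machine
  std::size_t threads_per_core;  // 1 with HT off, 2 with HT on (assumed uniform)
};

// Models what cpuinfo_get_processors_count() reports: all logical processors.
constexpr std::size_t processors_count(HostTopology h) {
  return h.physical_cores * h.threads_per_core;
}

// Models what cpuinfo_get_cores_count() reports: physical cores only.
constexpr std::size_t cores_count(HostTopology h) {
  return h.physical_cores;
}
```

Under this model a 64-core host gives 64 from both functions with HT off. With HT on, processors_count gives 128 and cores_count stays at 64, which is the behavior change raised in the review.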

ezyang

Yes, this seems OK. I don't think anyone is working full time on the thread pool, so if you're willing to keep pushing on these changes, this would be a big help.

cyyever · 2023-08-15T17:15:54Z

@pytorchmergebot merge

pytorchmergebot · 2023-08-15T17:26:17Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

cyyever · 2023-08-15T17:32:19Z

@ezyang My pleasure to help the community

izaitsevfb · 2023-08-16T02:18:22Z

@pytorchbot revert -m 'Breaks internal meta builds' -c ghfirst

Hi, @cyyever, I'm terribly sorry, but I have to revert this for now, as it breaks internal Meta builds with:

caffe2/c10/core/thread_pool.cpp:3:10: fatal error: 'cpuinfo.h' file not found
#include <cpuinfo.h>
         ^~~~~~~~~~~
1 error generated.
    When running <c++ preprocess_and_compile>.

I'll look into forward fixing the builds ASAP and help you reland this PR. Again, apologies for the inconvenience.

pytorchmergebot · 2023-08-16T02:20:26Z

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot · 2023-08-16T02:20:36Z

@cyyever your PR has been successfully reverted.

…)" This reverts commit ad04765. Reverted #107010 on behalf of https://github.com/izaitsevfb due to Breaks internal meta builds ([comment](#107010 (comment)))

cyyever · 2023-08-16T02:54:18Z

@izaitsevfb Maybe some bazel target is missing cpuinfo dependency

huydhn · 2023-08-16T03:20:55Z

Just FYI, this could be caught in OSS CI by adding the ciflow/periodic label to run the buck job there. Here is an example failure on the periodic trunk buck job: https://hud.pytorch.org/pytorch/pytorch/commit/22f5889753d38661aefeb8ea7110ff5787a1b5e9

cyyever changed the title ~~use cpuinfo to get thread num~~ Use cpuinfo to determine thread pool size Aug 11, 2023

pytorch-bot bot added the topic: not user facing topic category label Aug 11, 2023

cyyever changed the title ~~Use cpuinfo to determine thread pool size~~ Use cpuinfo to determine c10::ThreadPool size Aug 11, 2023

cyyever changed the title ~~Use cpuinfo to determine c10::ThreadPool size~~ Use cpuinfo to determine c10::ThreadPool thread number Aug 11, 2023

pytorchbot added the open source label Aug 11, 2023

cyyever force-pushed the cpu_core_count branch 2 times, most recently from afdf87f to b03c33f Compare August 11, 2023 04:07

cyyever marked this pull request as draft August 11, 2023 05:09

cyyever force-pushed the cpu_core_count branch 3 times, most recently from 3f88f7f to 434cfb2 Compare August 11, 2023 06:19

cyyever marked this pull request as ready for review August 12, 2023 01:54

cyyever force-pushed the cpu_core_count branch 3 times, most recently from 5fd145c to dd44c51 Compare August 14, 2023 01:16

use cpuinfo to determin thread num

40d248c

cyyever force-pushed the cpu_core_count branch from dd44c51 to 40d248c Compare August 14, 2023 01:17

colesbury added release notes: intel release notes category and removed topic: not user facing topic category labels Aug 14, 2023

colesbury requested a review from ezyang August 14, 2023 19:43

colesbury reviewed Aug 14, 2023

colesbury added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Aug 14, 2023

jgong5 requested changes Aug 15, 2023

cyyever requested a review from jgong5 August 15, 2023 05:24

ezyang approved these changes Aug 15, 2023

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 15, 2023

pytorchmergebot added the merging label Aug 15, 2023

pytorchmergebot added Merged and removed merging labels Aug 15, 2023

pytorchmergebot closed this in ad04765 Aug 15, 2023

pytorchmergebot added the Reverted label Aug 16, 2023

cyyever mentioned this pull request Aug 17, 2023

[Reland]Use cpuinfo to determine c10::ThreadPool thread number #107339

Closed

cyyever deleted the cpu_core_count branch September 13, 2023 02:00

pytorchmergebot pushed a commit that referenced this pull request Sep 14, 2023

[Reland]Use cpuinfo to determine c10::ThreadPool thread number (#107339)

8cb96f5

Relands PR #107010 and fixes BUCK builds. Pull Request resolved: #107339 Approved by: https://github.com/ezyang
