[Reland]Use cpuinfo to determine c10::ThreadPool thread number #107339
This PR needs a
|
|
@pytorchbot label ciflow/periodic |
|
❌ 🤖 pytorchbot command failed: Try |
|
@pytorchbot label ciflow/slow |
|
@pytorchbot label ciflow/trunk |
|
@pytorchbot label ciflow/inductor |
|
@izaitsevfb has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
I've imported the PR to get the signal from internal meta builds ahead of the merge. Will let you know the results as soon as they are ready. |
|
Unfortunately, the internal builds are still failing. I'm trying to see how to forward-fix them on Meta's side. |
|
Please don't merge yet, I'm still working on the internal build fix. |
|
@izaitsevfb What is the situation? |
I'm sorry for the delay; unfortunately, the fix is not trivial. We don't yet have a precedent for including third-party libraries in c10 core internally, and my attempts so far have been unsuccessful. But I'm still on it, and I'll try to solve this by EoW. If not, I'll escalate. |
|
@pytorchbot label ciflow/binaries |
|
@izaitsevfb If it is hard to add cpuinfo to c10, may we remove the cpuinfo test temporarily and try that in another PR? |
…r + internal patch Summary: Testing pytorch#107339 combined with internal patches. Differential Revision: D49109231
I think I finally found everything that needs to be patched internally. The following changes should be made to your PR (I can do this myself if you're ok with that) (see #109079):
I'll try to merge this PR tomorrow. Apologies again for the delay. |
|
@izaitsevfb Thank you so much for the efforts. I will add the changes. |
Force-pushed from b54f476 to ae44dfa.
|
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
|
Successfully rebased |
Force-pushed from ae44dfa to 59b327c.
|
@izaitsevfb has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Force-pushed from 62d1775 to 59b327c.
|
Ok, unfortunately |
|
@izaitsevfb I had a local bazel build and "//third_party/cpuinfo" could work. |
Yep, that's the way to go. I found how to support this variant internally as well, and all internal tests are passing now on the patched version. So the plan is as follows (mostly for the record):
|
|
@izaitsevfb has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
The corresponding diff was merged internally, so our tooling should merge this PR shortly (if it doesn't, I'll force-merge it). |
|
@pytorchbot merge -f 'Landed internally as D48443523' |
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
|
@izaitsevfb Thank you! |
|
@cyyever thank you for your patience, that was quite an adventure :)
This addresses a confusing bug on HUD and Dr.CI where a bunch of unrelated cancelled signals show up, forcing people to force merge. For example,

* Dr.CI pytorch/pytorch#107339
* HUD https://hud.pytorch.org/pr/107339
* Dr.CI pytorch/pytorch#105055
* HUD https://hud.pytorch.org/pr/105055

They are cancelled signals from the previous workflow run that had been retried successfully. The cancelled signals were there because the names are different, i.e. `manywheel-py3_10-cuda11_8-test (cancel)` became `manywheel-py3_10-cuda11_8-test / test (success)` after retrying.

The fix I have here is to use a trie search to check if a cancelled job has been retried successfully and remove it from the list accordingly.

### Testing

* https://torchci-git-fork-huydhn-remove-wrong-cancel-948947-fbopensource.vercel.app/pr/107339
* https://torchci-git-fork-huydhn-remove-wrong-cancel-948947-fbopensource.vercel.app/pr/105055

* **Dr.CI #107339**

<!-- drci-comment-start -->

## 🔗 Helpful Links

### 🧪 See artifacts and rendered test results at [hud.pytorch.org/pr/107339](https://hud.pytorch.org/pr/107339)

* 📄 Preview [Python docs built from this PR](https://docs-preview.pytorch.org/pytorch/pytorch/107339/index.html)
* 📄 Preview [C++ docs built from this PR](https://docs-preview.pytorch.org/pytorch/pytorch/107339/cppdocs/index.html)
* ❓ Need help or want to give feedback on the CI? Visit the [bot commands wiki](https://github.com/pytorch/pytorch/wiki/Bot-commands) or our [office hours](https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours)

Note: Links to docs will display an error until the docs builds have been completed.

## ❌ 1 New Failure, 6 Unrelated Failures

As of commit 59b327cdb6891318111fe98c0dbc72c7da0e7b95 with merge base 5531a23b204b4daa2c0bb3c52610e9a0ba79dacf (<sub><sub><img alt="image" width=70 src="https://img.shields.io/date/1694521453?label=&color=FFFFFF&style=flat-square"></sub></sub>):

<details open><summary><b>NEW FAILURE</b> - The following job has failed:</summary><p>

* [wheel-py3_10-cpu-test](https://hud.pytorch.org/pr/pytorch/pytorch/107339#16727706927) ([gh](https://github.com/pytorch/pytorch/actions/runs/6163427856/job/16727706927))

</p></details>

<details><summary><b>FLAKY</b> - The following job failed but was likely due to flakiness present on trunk:</summary><p>

* [macos-12-py3-x86-64 / test (default, 1, 4, macos-12)](https://hud.pytorch.org/pr/pytorch/pytorch/107339#16727674082) ([gh](https://github.com/pytorch/pytorch/actions/runs/6163427777/job/16727674082))

</p></details>

<details><summary><b>UNSTABLE</b> - The following jobs failed but were likely due to flakiness present on trunk and has been marked as unstable:</summary><p>

* [linux-focal-cuda11.8-py3.9-gcc9 / test (multigpu, 1, 1, linux.g5.12xlarge.nvidia.gpu, unstable)](https://hud.pytorch.org/pr/pytorch/pytorch/107339#16727757788) ([gh](https://github.com/pytorch/pytorch/actions/runs/6163427777/job/16727757788))
* [linux-focal-py3.11-clang10 / test (dynamo, 1, 2, linux.2xlarge, unstable)](https://hud.pytorch.org/pr/pytorch/pytorch/107339#16724273623) ([gh](https://github.com/pytorch/pytorch/actions/runs/6162428606/job/16724273623))
* [linux-focal-py3.11-clang10 / test (dynamo, 2, 2, linux.2xlarge, unstable)](https://hud.pytorch.org/pr/pytorch/pytorch/107339#16724273805) ([gh](https://github.com/pytorch/pytorch/actions/runs/6162428606/job/16724273805))
* [linux-focal-py3.8-clang10 / test (dynamo, 1, 2, linux.2xlarge, unstable)](https://hud.pytorch.org/pr/pytorch/pytorch/107339#16724267602) ([gh](https://github.com/pytorch/pytorch/actions/runs/6162428606/job/16724267602))
* [linux-focal-py3.8-clang10 / test (dynamo, 2, 2, linux.2xlarge, unstable)](https://hud.pytorch.org/pr/pytorch/pytorch/107339#16724267789) ([gh](https://github.com/pytorch/pytorch/actions/runs/6162428606/job/16724267789))

</p></details>

This comment was automatically generated by Dr. CI and updates every 15 minutes.

<!-- drci-comment-end -->

* **Dr.CI #105055**

<!-- drci-comment-start -->

## 🔗 Helpful Links

### 🧪 See artifacts and rendered test results at [hud.pytorch.org/pr/105055](https://hud.pytorch.org/pr/105055)

* 📄 Preview [Python docs built from this PR](https://docs-preview.pytorch.org/pytorch/pytorch/105055/index.html)
* 📄 Preview [C++ docs built from this PR](https://docs-preview.pytorch.org/pytorch/pytorch/105055/cppdocs/index.html)
* ❓ Need help or want to give feedback on the CI? Visit the [bot commands wiki](https://github.com/pytorch/pytorch/wiki/Bot-commands) or our [office hours](https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours)

Note: Links to docs will display an error until the docs builds have been completed.

## ✅ You can merge normally! (2 Unrelated Failures)

As of commit 1f762dfc92b46323950c3e6e95d99d9687741451 with merge base d0f8ee45bdd3d68895dfecf38b39c363ebf82483 (<sub><sub><img alt="image" width=70 src="https://img.shields.io/date/1692769911?label=&color=FFFFFF&style=flat-square"></sub></sub>):

<details><summary><b>FLAKY</b> - The following job failed but was likely due to flakiness present on trunk:</summary><p>

* [cuda12.1-py3.10-gcc9-sm86-periodic-dynamo-benchmarks / test (aot_eager_torchbench, 1, 1, linux.g5.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/105055#16135412390) ([gh](https://github.com/pytorch/pytorch/actions/runs/5949203435/job/16135412390))

</p></details>

<details><summary><b>UNSTABLE</b> - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:</summary><p>

* [linux-focal-rocm5.6-py3.8 / test (default, 1, 3, linux.rocm.gpu, unstable)](https://hud.pytorch.org/pr/pytorch/pytorch/105055#16135257231) ([gh](https://github.com/pytorch/pytorch/actions/runs/5949203290/job/16135257231))

</p></details>

This comment was automatically generated by Dr. CI and updates every 15 minutes.

<!-- drci-comment-end -->
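For readers unfamiliar with the trie approach mentioned in the commit message above, here is a minimal sketch of how such a check might look. It is not the actual torchci code: the function names, the `" / "` separator, and the `"cancelled"`/`"success"` conclusion labels are all assumptions made for illustration. The idea is to index successful job names by their path segments, then drop any cancelled job whose name is a prefix of a successful one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Maps one job-name segment to the next node (hypothetical structure).
    children: dict[str, "TrieNode"] = field(default_factory=dict)


def split_name(name: str) -> list[str]:
    # Assumption: workflow/job segments are joined with " / ".
    return [part.strip() for part in name.split(" / ") if part.strip()]


def build_trie(names) -> TrieNode:
    # Insert every name segment by segment.
    root = TrieNode()
    for name in names:
        node = root
        for part in split_name(name):
            node = node.children.setdefault(part, TrieNode())
    return root


def is_prefix_of_any(root: TrieNode, name: str) -> bool:
    # True if the segments of `name` form a path from the root of the trie.
    parts = split_name(name)
    if not parts:
        return False
    node = root
    for part in parts:
        next_node = node.children.get(part)
        if next_node is None:
            return False
        node = next_node
    return True


def drop_retried_cancellations(jobs):
    # jobs: list of (name, conclusion) pairs; labels are assumed, not torchci's.
    successes = build_trie(n for n, c in jobs if c == "success")
    return [
        (n, c)
        for n, c in jobs
        if not (c == "cancelled" and is_prefix_of_any(successes, n))
    ]


jobs = [
    ("manywheel-py3_10-cuda11_8-test", "cancelled"),
    ("manywheel-py3_10-cuda11_8-test / test", "success"),
    ("wheel-py3_10-cpu-test", "failure"),
    ("linux-build", "cancelled"),
]
filtered = drop_retried_cancellations(jobs)
```

Matching on whole segments rather than characters avoids treating a job such as `linux-build` as a prefix of an unrelated `linux-build-docs`; a cancelled job with no successful retry stays in the list.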
Relands PR #107010 and fixes BUCK builds.
cc @ezyang @izaitsevfb