fix cpp extension distributed warning spew #162764

msaroufim · 2025-09-11T23:08:52Z

With the new change we only log the warning if we're running non distributed code or if we're in rank 0. Unit testing that certain messages get printed on certain ranks only feels kinda jank so test plan is below instead

Test plan

# torchrun --nproc_per_node=2 demo_fix.py 

import os
import logging

logging.getLogger('torch.utils.cpp_extension').setLevel(logging.DEBUG)

import torch
if 'RANK' in os.environ:
    torch.distributed.init_process_group('nccl')

from torch.utils.cpp_extension import _get_cuda_arch_flags
_get_cuda_arch_flags()

print(f"Rank {os.environ.get('RANK', '0')} done")

Logs showing how how TORCH_CUDA_ARCH_LISTonly shows up once if we explicitly set the the logging level to logging.DEBUG. It also improves the debug message to explain what the actual behavior will be

(source) [marksaroufim@devgpu005]~% torchrun --nproc_per_node=2 demo_fix.py

W0911 18:30:16.594000 1315439 /home/marksaroufim/pytorch/torch/distributed/run.py:814] 
W0911 18:30:16.594000 1315439 /home/marksaroufim/pytorch/torch/distributed/run.py:814] *****************************************
W0911 18:30:16.594000 1315439 /home/marksaroufim/pytorch/torch/distributed/run.py:814] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W0911 18:30:16.594000 1315439 /home/marksaroufim/pytorch/torch/distributed/run.py:814] *****************************************
[rank0]:V0911 18:30:18.921000 1316753 pytorch/torch/utils/cpp_extension.py:2444] TORCH_CUDA_ARCH_LIST is not set, using TORCH_CUDA_ARCH_LIST='10.0+PTX' for visible GPU architectures. Set os.environ['TORCH_CUDA_ARCH_LIST'] to override.
Rank 0 done
Rank 1 done

But if we just use the default and comment out logging.getLogger('torch.utils.cpp_extension').setLevel(logging.DEBUG)

Then we get

(source) [marksaroufim@devgpu005]~% torchrun --nproc_per_node=2 demo_fix.py
W0911 18:14:33.926000 690759 /home/marksaroufim/pytorch/torch/distributed/run.py:814] 
W0911 18:14:33.926000 690759 /home/marksaroufim/pytorch/torch/distributed/run.py:814] *****************************************
W0911 18:14:33.926000 690759 /home/marksaroufim/pytorch/torch/distributed/run.py:814] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W0911 18:14:33.926000 690759 /home/marksaroufim/pytorch/torch/distributed/run.py:814] *****************************************
Rank 0 done
Rank 1 done
(source) [marksaroufim@devgpu005]~%

pytorch-bot · 2025-09-11T23:08:55Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162764

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit fb59b11 with merge base 468c1f9 ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang

Not sure why it got reordered though lol

msaroufim · 2025-09-12T03:27:10Z

Just makes it easier to show the user what architectures were detected, there's both arch_list and _arch_list. The older warning wasn't too helpful even when it did spew

msaroufim · 2025-09-12T03:27:15Z

@pytorchbot merge

pytorchmergebot · 2025-09-12T03:29:09Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

With the new change we only log the warning if we're running non distributed code or if we're in rank 0. Unit testing that certain messages get printed on certain ranks only feels kinda jank so test plan is below instead Test plan ```python # torchrun --nproc_per_node=2 demo_fix.py import os import logging logging.getLogger('torch.utils.cpp_extension').setLevel(logging.DEBUG) import torch if 'RANK' in os.environ: torch.distributed.init_process_group('nccl') from torch.utils.cpp_extension import _get_cuda_arch_flags _get_cuda_arch_flags() print(f"Rank {os.environ.get('RANK', '0')} done") ``` Logs showing how how `TORCH_CUDA_ARCH_LIST`only shows up once if we explicitly set the the logging level to `logging.DEBUG`. It also improves the debug message to explain what the actual behavior will be ``` (source) [marksaroufim@devgpu005]~% torchrun --nproc_per_node=2 demo_fix.py W0911 18:30:16.594000 1315439 /home/marksaroufim/pytorch/torch/distributed/run.py:814] W0911 18:30:16.594000 1315439 /home/marksaroufim/pytorch/torch/distributed/run.py:814] ***************************************** W0911 18:30:16.594000 1315439 /home/marksaroufim/pytorch/torch/distributed/run.py:814] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. W0911 18:30:16.594000 1315439 /home/marksaroufim/pytorch/torch/distributed/run.py:814] ***************************************** [rank0]:V0911 18:30:18.921000 1316753 pytorch/torch/utils/cpp_extension.py:2444] TORCH_CUDA_ARCH_LIST is not set, using TORCH_CUDA_ARCH_LIST='10.0+PTX' for visible GPU architectures. Set os.environ['TORCH_CUDA_ARCH_LIST'] to override. Rank 0 done Rank 1 done ``` But if we just use the default and comment out `logging.getLogger('torch.utils.cpp_extension').setLevel(logging.DEBUG)` Then we get ``` (source) [marksaroufim@devgpu005]~% torchrun --nproc_per_node=2 demo_fix.py W0911 18:14:33.926000 690759 /home/marksaroufim/pytorch/torch/distributed/run.py:814] W0911 18:14:33.926000 690759 /home/marksaroufim/pytorch/torch/distributed/run.py:814] ***************************************** W0911 18:14:33.926000 690759 /home/marksaroufim/pytorch/torch/distributed/run.py:814] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. W0911 18:14:33.926000 690759 /home/marksaroufim/pytorch/torch/distributed/run.py:814] ***************************************** Rank 0 done Rank 1 done (source) [marksaroufim@devgpu005]~% ``` Pull Request resolved: pytorch#162764 Approved by: https://github.com/ezyang, https://github.com/zou3519

Camyll · 2025-10-06T21:15:54Z

@pytorchbot cherry-pick --onto release/2.9 --c critical

With the new change we only log the warning if we're running non distributed code or if we're in rank 0. Unit testing that certain messages get printed on certain ranks only feels kinda jank so test plan is below instead Test plan ```python # torchrun --nproc_per_node=2 demo_fix.py import os import logging logging.getLogger('torch.utils.cpp_extension').setLevel(logging.DEBUG) import torch if 'RANK' in os.environ: torch.distributed.init_process_group('nccl') from torch.utils.cpp_extension import _get_cuda_arch_flags _get_cuda_arch_flags() print(f"Rank {os.environ.get('RANK', '0')} done") ``` Logs showing how how `TORCH_CUDA_ARCH_LIST`only shows up once if we explicitly set the the logging level to `logging.DEBUG`. It also improves the debug message to explain what the actual behavior will be ``` (source) [marksaroufim@devgpu005]~% torchrun --nproc_per_node=2 demo_fix.py W0911 18:30:16.594000 1315439 /home/marksaroufim/pytorch/torch/distributed/run.py:814] W0911 18:30:16.594000 1315439 /home/marksaroufim/pytorch/torch/distributed/run.py:814] ***************************************** W0911 18:30:16.594000 1315439 /home/marksaroufim/pytorch/torch/distributed/run.py:814] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. W0911 18:30:16.594000 1315439 /home/marksaroufim/pytorch/torch/distributed/run.py:814] ***************************************** [rank0]:V0911 18:30:18.921000 1316753 pytorch/torch/utils/cpp_extension.py:2444] TORCH_CUDA_ARCH_LIST is not set, using TORCH_CUDA_ARCH_LIST='10.0+PTX' for visible GPU architectures. Set os.environ['TORCH_CUDA_ARCH_LIST'] to override. Rank 0 done Rank 1 done ``` But if we just use the default and comment out `logging.getLogger('torch.utils.cpp_extension').setLevel(logging.DEBUG)` Then we get ``` (source) [marksaroufim@devgpu005]~% torchrun --nproc_per_node=2 demo_fix.py W0911 18:14:33.926000 690759 /home/marksaroufim/pytorch/torch/distributed/run.py:814] W0911 18:14:33.926000 690759 /home/marksaroufim/pytorch/torch/distributed/run.py:814] ***************************************** W0911 18:14:33.926000 690759 /home/marksaroufim/pytorch/torch/distributed/run.py:814] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. W0911 18:14:33.926000 690759 /home/marksaroufim/pytorch/torch/distributed/run.py:814] ***************************************** Rank 0 done Rank 1 done (source) [marksaroufim@devgpu005]~% ``` Pull Request resolved: #162764 Approved by: https://github.com/ezyang, https://github.com/zou3519 (cherry picked from commit f7e8321)

pytorchbot · 2025-10-06T21:21:02Z

Cherry picking #162764

The cherry pick PR is at #164785 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:

[v.2.9.0] Release Tracker #162497 (comment)

Details for Dev Infra team

Raised by workflow job

fix cpp extension distributed warning spew (#162764) With the new change we only log the warning if we're running non distributed code or if we're in rank 0. Unit testing that certain messages get printed on certain ranks only feels kinda jank so test plan is below instead Test plan ```python # torchrun --nproc_per_node=2 demo_fix.py import os import logging logging.getLogger('torch.utils.cpp_extension').setLevel(logging.DEBUG) import torch if 'RANK' in os.environ: torch.distributed.init_process_group('nccl') from torch.utils.cpp_extension import _get_cuda_arch_flags _get_cuda_arch_flags() print(f"Rank {os.environ.get('RANK', '0')} done") ``` Logs showing how how `TORCH_CUDA_ARCH_LIST`only shows up once if we explicitly set the the logging level to `logging.DEBUG`. It also improves the debug message to explain what the actual behavior will be ``` (source) [marksaroufim@devgpu005]~% torchrun --nproc_per_node=2 demo_fix.py W0911 18:30:16.594000 1315439 /home/marksaroufim/pytorch/torch/distributed/run.py:814] W0911 18:30:16.594000 1315439 /home/marksaroufim/pytorch/torch/distributed/run.py:814] ***************************************** W0911 18:30:16.594000 1315439 /home/marksaroufim/pytorch/torch/distributed/run.py:814] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. W0911 18:30:16.594000 1315439 /home/marksaroufim/pytorch/torch/distributed/run.py:814] ***************************************** [rank0]:V0911 18:30:18.921000 1316753 pytorch/torch/utils/cpp_extension.py:2444] TORCH_CUDA_ARCH_LIST is not set, using TORCH_CUDA_ARCH_LIST='10.0+PTX' for visible GPU architectures. Set os.environ['TORCH_CUDA_ARCH_LIST'] to override. Rank 0 done Rank 1 done ``` But if we just use the default and comment out `logging.getLogger('torch.utils.cpp_extension').setLevel(logging.DEBUG)` Then we get ``` (source) [marksaroufim@devgpu005]~% torchrun --nproc_per_node=2 demo_fix.py W0911 18:14:33.926000 690759 /home/marksaroufim/pytorch/torch/distributed/run.py:814] W0911 18:14:33.926000 690759 /home/marksaroufim/pytorch/torch/distributed/run.py:814] ***************************************** W0911 18:14:33.926000 690759 /home/marksaroufim/pytorch/torch/distributed/run.py:814] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. W0911 18:14:33.926000 690759 /home/marksaroufim/pytorch/torch/distributed/run.py:814] ***************************************** Rank 0 done Rank 1 done (source) [marksaroufim@devgpu005]~% ``` Pull Request resolved: #162764 Approved by: https://github.com/ezyang, https://github.com/zou3519 (cherry picked from commit f7e8321) Co-authored-by: Mark Saroufim <marksaroufim@meta.com>

fix cpp extension distributed warning spew

459f765

msaroufim added 4 commits September 11, 2025 17:55

update

9a746d6

update

82a87cf

update

2a81a41

simplify

0a97f7b

msaroufim marked this pull request as ready for review September 12, 2025 01:15

msaroufim requested review from ezyang, fmassa and malfet as code owners September 12, 2025 01:15

msaroufim requested a review from albanD September 12, 2025 01:15

msaroufim mentioned this pull request Sep 12, 2025

remove a newly added cpp_ext warning #161629

Closed

msaroufim added the release notes: cpp release notes category label Sep 12, 2025

msaroufim added 2 commits September 11, 2025 18:29

update

ab18adf

yolo

fb59b11

msaroufim requested a review from zou3519 September 12, 2025 02:23

ezyang approved these changes Sep 12, 2025

View reviewed changes

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 12, 2025

msaroufim linked an issue Sep 12, 2025 that may be closed by this pull request

remove a newly added cpp_ext warning #161629

Closed

pytorchmergebot added the merging label Sep 12, 2025

zou3519 approved these changes Sep 12, 2025

View reviewed changes

pytorchmergebot added the Merged label Sep 12, 2025

pytorchmergebot closed this in f7e8321 Sep 12, 2025

pytorchmergebot removed the merging label Sep 12, 2025

msaroufim deleted the fix-cpp-extension-warning-161629 branch September 12, 2025 15:21

msaroufim mentioned this pull request Sep 15, 2025

[RFC]: No Distributed Log Spew #162999

Open

pytorchbot mentioned this pull request Oct 6, 2025

[v.2.9.0] Release Tracker #162497

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix cpp extension distributed warning spew #162764

fix cpp extension distributed warning spew #162764

Uh oh!

msaroufim commented Sep 11, 2025 •

edited

Loading

Uh oh!

pytorch-bot bot commented Sep 11, 2025 •

edited

Loading

Uh oh!

ezyang left a comment

Uh oh!

msaroufim commented Sep 12, 2025 •

edited

Loading

Uh oh!

msaroufim commented Sep 12, 2025

Uh oh!

pytorchmergebot commented Sep 12, 2025

Uh oh!

Camyll commented Oct 6, 2025

Uh oh!

pytorchbot commented Oct 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

fix cpp extension distributed warning spew #162764

fix cpp extension distributed warning spew #162764

Uh oh!

Conversation

msaroufim commented Sep 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot bot commented Sep 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162764

✅ No Failures

Uh oh!

ezyang left a comment

Choose a reason for hiding this comment

Uh oh!

msaroufim commented Sep 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

msaroufim commented Sep 12, 2025

Uh oh!

pytorchmergebot commented Sep 12, 2025

Merge started

Uh oh!

Camyll commented Oct 6, 2025

Uh oh!

pytorchbot commented Oct 6, 2025

Cherry picking #162764

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

msaroufim commented Sep 11, 2025 •

edited

Loading

pytorch-bot bot commented Sep 11, 2025 •

edited

Loading

msaroufim commented Sep 12, 2025 •

edited

Loading