[C10D] Track pg name in c++. by kumpera · Pull Request #108813 · pytorch/pytorch · GitHub

@kumpera (Contributor) commented Sep 7, 2023

[ghstack-poisoned]
pytorch-bot commented Sep 7, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108813

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6456632 with merge base 9b3f582:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@wconstab (Contributor) left a comment

stamp based on the fixes you promised

@kumpera added the "topic: not user facing" label (topic category) Sep 14, 2023
@kumpera (Contributor, Author) commented Sep 14, 2023

@pytorchmergebot merge

pytorch-bot added the "ciflow/trunk" label (Trigger trunk jobs on your pull request) Sep 14, 2023
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@facebook-github-bot deleted the gh/kumpera/57/head branch September 18, 2023 14:24
pytorchmergebot pushed a commit that referenced this pull request Sep 20, 2023
#108814)

Collectives timing gates the tracking of when a collective starts on the device.

Currently it is enabled by setting the NCCL_ENABLE_TIMING environment variable.

The goal of this PR is to make it possible to enable that flag dynamically, so users of the PG hooks don't have to set the environment variable for their hooks to work.

The design is that, once the flag is set, all new collectives get this behavior, so we track it on each Work object.

We make enableTiming_ atomic in PGNCCL to avoid races on non-TSO hardware.

To ensure consistency, we copy its value during Work construction and replace all previous uses of the PG's enableTiming_ with the Work's copy, which is immutable.
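The snapshot pattern described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual ProcessGroupNCCL/WorkNCCL code: `ProcessGroupSketch`, `WorkSketch`, `enableTimingSketch()` and `makeWork()` are hypothetical names, and the initial flag value (which the commit says comes from NCCL_ENABLE_TIMING) is simply passed to the constructor. The process group keeps an atomic flag that any thread may flip, and each Work reads it exactly once at construction, so every later check on that Work sees the same value.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

// Hypothetical stand-in for a Work object (not the real WorkNCCL).
class WorkSketch {
 public:
  explicit WorkSketch(bool timingEnabled) : timingEnabled_(timingEnabled) {}

  // Later code paths (e.g. deciding whether to record a start event) consult
  // the Work's own copy, never the process group's live flag.
  bool timingEnabled() const { return timingEnabled_; }

 private:
  const bool timingEnabled_;  // Snapshot taken once; immutable afterwards.
};

// Hypothetical stand-in for a process group (not the real ProcessGroupNCCL).
class ProcessGroupSketch {
 public:
  // In the real PG the initial value would come from NCCL_ENABLE_TIMING;
  // here it is passed in to keep the sketch self-contained.
  explicit ProcessGroupSketch(bool timingFromEnv) : enableTiming_(timingFromEnv) {}

  // Hypothetical method: may be called from any thread, e.g. when a hook is
  // registered. std::atomic makes the cross-thread write/read well defined.
  void enableTimingSketch() { enableTiming_.store(true); }

  // Each new collective builds its Work here; the flag is read exactly once.
  std::shared_ptr<WorkSketch> makeWork() const {
    return std::make_shared<WorkSketch>(enableTiming_.load());
  }

 private:
  std::atomic<bool> enableTiming_;
};
```

With `ProcessGroupSketch pg(false)`, a Work created before `pg.enableTimingSketch()` reports `timingEnabled() == false` and one created afterwards reports `true`. Flipping the flag never changes a Work that already exists, which is the consistency property the commit message aims for.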

Pull Request resolved: #108814
Approved by: https://github.com/wconstab, https://github.com/fduwjj
ghstack dependencies: #108813

Labels: ciflow/trunk (Trigger trunk jobs on your pull request), Merged, release notes: distributed (c10d) (release notes category), topic: not user facing (topic category)


5 participants