KEMBAR78
[dist] expose unsafe_get_ptr for dist.ProcessGroupNCCL.NCCLConfig by youkaichao · Pull Request #161136 · pytorch/pytorch · GitHub
Skip to content

Conversation

@youkaichao
Copy link
Collaborator

@youkaichao youkaichao commented Aug 21, 2025

expose the pointer so that we can create the ncclConfig_t object from pytorch and use it elsewhere. this is useful to control the nccl communicator parameters for multiple nccl communicators.

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta

@pytorch-bot
Copy link

pytorch-bot bot commented Aug 21, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161136

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8f1a9e5 with merge base 7f201ba (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (c10d) release notes category labels Aug 21, 2025
@youkaichao youkaichao requested a review from kwen2501 August 21, 2025 03:50
Copy link
Contributor

@kwen2501 kwen2501 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, in Python (in particular CPython), you can always get an address of a python object by the built-in id(obj) method.

.def_readwrite("nvls_ctas", &ncclConfig_t::nvlsCTAs)
#endif
.def_property_readonly(
"ptr",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind naming it as unsafe_get_ptr? The reason is that in line 3382 above the object is created via std::make_unique thus this object could disappear after one gets the inner pointer. (With the rename, this entry would probably look more like a method than a property.)

.def_property_readonly(
"ptr",
[](const ncclConfig_t& self) {
return reinterpret_cast<int64_t>(&self);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use uintptr_t instead of int64_t?

cga_cluster_size: int
min_ctas: int
max_ctas: int
ptr: int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need if we make it a unsafe_get_ptr method instead of property.

@youkaichao
Copy link
Collaborator Author

@kwen2501 updated according to your comments.

@youkaichao youkaichao changed the title [dist] expose pointer for ncclConfig_t [dist] expose unsafe_get_ptr for dist.ProcessGroupNCCL.NCCLConfig Aug 21, 2025
@youkaichao
Copy link
Collaborator Author

@pytorchmergebot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 21, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@youkaichao youkaichao deleted the config_ptr branch August 21, 2025 10:58
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
…torch#161136)

expose the pointer so that we can create the `ncclConfig_t` object from pytorch and use it elsewhere. this is useful to control the nccl communicator parameters for multiple nccl communicators.

Pull Request resolved: pytorch#161136
Approved by: https://github.com/kwen2501
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged oncall: distributed Add this issue/PR to distributed oncall triage queue open source release notes: distributed (c10d) release notes category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants