[SymmMem] Add runtime detection of NVSHMEM by kwen2501 · Pull Request #156291 · pytorch/pytorch · GitHub


@kwen2501
Contributor

@kwen2501 kwen2501 commented Jun 18, 2025

Stack from ghstack (oldest at bottom):

This adds runtime detection of NVSHMEM so that we can pick the default backend for SymmetricMemory without relying entirely on the env var `TORCH_SYMMMEM` (`CUDA` or `NVSHMEM`).

On the Python side, the following API is added:
`torch.distributed._symmetric_memory.is_nvshmem_available()`
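A minimal sketch of how a caller might combine the env var with the new runtime check. `pick_symm_mem_backend` and `torch_nvshmem_probe` are hypothetical helpers, not part of PyTorch. Letting an explicit `TORCH_SYMMMEM` win, and preferring NVSHMEM whenever it is detected, are assumptions; the PR does not state which backend becomes the default.

```python
import os
from typing import Callable, Mapping, Optional

# Backend names taken from the TORCH_SYMMMEM values mentioned in this PR.
VALID_BACKENDS = ("CUDA", "NVSHMEM")


def pick_symm_mem_backend(
    nvshmem_available: Callable[[], bool],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: honor TORCH_SYMMMEM if set, else auto-detect."""
    env = os.environ if env is None else env
    requested = env.get("TORCH_SYMMMEM")
    if requested:
        # Assumption: an explicit env var still wins over runtime detection.
        if requested not in VALID_BACKENDS:
            raise ValueError(f"unsupported TORCH_SYMMMEM={requested!r}")
        if requested == "NVSHMEM" and not nvshmem_available():
            raise RuntimeError("TORCH_SYMMMEM=NVSHMEM but NVSHMEM is not available")
        return requested
    # No env var: fall back to runtime detection (assumed preference order).
    return "NVSHMEM" if nvshmem_available() else "CUDA"


def torch_nvshmem_probe() -> bool:
    """Calls the API added in this PR; reports False if it cannot be imported."""
    try:
        from torch.distributed._symmetric_memory import is_nvshmem_available
    except (ImportError, AttributeError):
        return False
    return bool(is_nvshmem_available())
```

Usage: `pick_symm_mem_backend(torch_nvshmem_probe)` returns `"CUDA"` or `"NVSHMEM"`. Passing the probe in as a callable keeps the selection logic testable on machines without an NVSHMEM-enabled build.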

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Jun 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156291

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 6ba7684 with merge base 4d9d884:

BROKEN TRUNK - The following job failed but was already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (c10d) release notes category labels Jun 18, 2025
@kwen2501 kwen2501 requested review from fduwjj, fegin and ngimel June 18, 2025 15:29
```cpp
py::arg("module"));
#endif

module.def(
```
Collaborator


Nit: `module` is a C++ keyword. Using it as a variable name is not good for future versions.

Contributor Author


Ah I see

In C++, `module` is a context-sensitive keyword introduced with C++20 for the modules feature, used in declarations like `export module MyModule;`.

I guess this file needs a big refactor then.

kwen2501 added a commit that referenced this pull request Jun 18, 2025
so that we can pick the default backend for SymmetricMemory without
relying on env var.

ghstack-source-id: 4588d63
Pull-Request-resolved: #156291
kwen2501 added a commit that referenced this pull request Jun 18, 2025
so that we can pick the default backend for SymmetricMemory without
relying on env var.

ghstack-source-id: b4cc209
Pull-Request-resolved: #156291
kwen2501 added a commit that referenced this pull request Jun 19, 2025
so that we can pick the default backend for SymmetricMemory without
relying on env var.

ghstack-source-id: de17da3
Pull-Request-resolved: #156291
kwen2501 added a commit that referenced this pull request Jun 19, 2025
so that we can pick the default backend for SymmetricMemory without
relying on env var.

ghstack-source-id: 2941a79
Pull-Request-resolved: #156291
@kwen2501 kwen2501 added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 19, 2025
@kwen2501
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


Labels: ciflow/trunk, Merged, oncall: distributed, release notes: distributed (c10d)
