-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[SymmetricMemory] introduce user-facing APIs empty() and rendezvous() #139677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139677
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ✅ No FailuresAs of commit 760b83f with merge base b86b534 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o [ghstack-poisoned]
cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o [ghstack-poisoned]
…rendezvous()"
Previously `SymmetricMemory` only had private pybind APIs:
```python
from torch.distributed._symmetric_memory import _SymmetricMemory
t = _SymmetricMemory.empty_strided_p2p(
size=(64,),
stride=(1,),
dtype=torch.float32,
device=device,
)
symm_mem_hdl = _SymmetricMemory.rendezvous(t, group_name=group.group_name)
```
This PR introduces user-facing APIs: empty() and rendezvous():
```python
import torch.distributed._symmetric_memory as symm_mem
t = symm_mem.empty(64, device="cuda")
symm_mem_hdl = symm_mem.rendezvous(t, group_name=group.group_name)
```
Notable differences compared to the pybind APIs:
- `empty()` now resembles `torch.empty()`:
- shape can either be an integer sequence or pack
- no need to/can't specify stride anymore
- device can either be `torch.device` or string
- `group_name` needs to be specified at rendezvous time as opposed to allocation time. See #139529 for the rationales. I feel the new semantic is superior, hence enforcing it in the public API.
- Currently, the pybind API still support specifying `group_name` at rendezvous time.
This PR does not change the behavior of the pybind APIs.
cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o
[ghstack-poisoned]
…endezvous()"
Previously `SymmetricMemory` only had private pybind APIs:
```python
from torch.distributed._symmetric_memory import _SymmetricMemory
t = _SymmetricMemory.empty_strided_p2p(
size=(64,),
stride=(1,),
dtype=torch.float32,
device=device,
)
symm_mem_hdl = _SymmetricMemory.rendezvous(t, group_name=group.group_name)
```
This PR introduces user-facing APIs empty() and rendezvous():
```python
import torch.distributed._symmetric_memory as symm_mem
t = symm_mem.empty(64, device="cuda")
symm_mem_hdl = symm_mem.rendezvous(t, group_name=group.group_name)
```
Notable differences compared to the pybind APIs:
- `empty()` now resembles `torch.empty()`:
- shape can either be an integer sequence or pack
- no need to/can't specify stride anymore
- device can either be `torch.device` or string
- `group_name` needs to be specified at rendezvous time as opposed to allocation time. See #139529 for the rationales. I feel the new semantic is superior, hence enforcing it in the public API.
- Currently, the pybind API still support specifying `group_name` at rendezvous time.
This PR does not change the behavior of the pybind APIs.
cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o
[ghstack-poisoned]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the API looks good, I believe it boils down to the same "verbs" as before (it used to be two steps, it is still two steps) but they have become somewhat simplified (no more explicit strides, one can pass PG instances, ...) and they have become public, which makes it feel more "ok" to call them :)
The only question I have is how do you intend them to be imported in user code:
- The function names (
emptyandrendezvous) are quite common so I guess one won't import them directly (i.e., nofrom torch.distributed._symmetric_memory import ...). - However, their fully qualified name is long, hence it's perhaps a bit unwieldy to invoke
torch.distributed._symmetric_memory.rendezvousdirectly. - Is the intended usage that people import the module with a shorthand (
import torch.distributed._symmetric_memory as symm_mem) and then use that to invoke the functions (symm_mem.empty(...))? This is what is done in the tests.
| _SymmetricMemory: the symmetric memory workspace associated with the | ||
| group. | ||
| """ | ||
| enable_symm_mem_for_group(group_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah this looks like a key change! Does it mean that now, if one calls the fused matmul+comms ops, they lazily initialize the SymmetricMemory on their own?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right! I realized it's redundant to require users to call enable_symm_mem_for_group(group_name) when they are already explicitly using the group for symmetric memory. This indirectly addresses the tracing issue you encountered.
The third bullet point is what I had in mind. My thinking is that, since the API is already scoped at the module level, it might be redundant to further distinguish it at the function level. Here are two approaches we can recommend to users for working with the API: import torch.distributed._symmetric_memory as symm_mem
t = symm_mem.empty(...)from torch.distributed._symmetric_memory import empty as symm_mem_empty
t = symm_mem_empty(...)This approach is also consistent with other Let me know if you have further suggestions! |
…endezvous()"
Previously `SymmetricMemory` only had private pybind APIs:
```python
from torch.distributed._symmetric_memory import _SymmetricMemory
t = _SymmetricMemory.empty_strided_p2p(
size=(64,),
stride=(1,),
dtype=torch.float32,
device=device,
)
symm_mem_hdl = _SymmetricMemory.rendezvous(t, group_name=group.group_name)
```
This PR introduces user-facing APIs empty() and rendezvous():
```python
import torch.distributed._symmetric_memory as symm_mem
t = symm_mem.empty(64, device="cuda")
symm_mem_hdl = symm_mem.rendezvous(t, group_name=group.group_name)
```
Notable differences compared to the pybind APIs:
- `empty()` now resembles `torch.empty()`:
- shape can either be an integer sequence or pack
- no need to/can't specify stride anymore
- device can either be `torch.device` or string
- `group_name` needs to be specified at rendezvous time as opposed to allocation time. See #139529 for the rationales. I feel the new semantic is superior, hence enforcing it in the public API.
- Currently, the pybind API still support specifying `group_name` at rendezvous time.
This PR does not change the behavior of the pybind APIs.
cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o
[ghstack-poisoned]
…endezvous()"
Previously `SymmetricMemory` only had private pybind APIs:
```python
from torch.distributed._symmetric_memory import _SymmetricMemory
t = _SymmetricMemory.empty_strided_p2p(
size=(64,),
stride=(1,),
dtype=torch.float32,
device=device,
)
symm_mem_hdl = _SymmetricMemory.rendezvous(t, group_name=group.group_name)
```
This PR introduces user-facing APIs empty() and rendezvous():
```python
import torch.distributed._symmetric_memory as symm_mem
t = symm_mem.empty(64, device="cuda")
symm_mem_hdl = symm_mem.rendezvous(t, group_name=group.group_name)
```
Notable differences compared to the pybind APIs:
- `empty()` now resembles `torch.empty()`:
- shape can either be an integer sequence or pack
- no need to/can't specify stride anymore
- device can either be `torch.device` or string
- `group_name` needs to be specified at rendezvous time as opposed to allocation time. See #139529 for the rationales. I feel the new semantic is superior, hence enforcing it in the public API.
- Currently, the pybind API still support specifying `group_name` at rendezvous time.
This PR does not change the behavior of the pybind APIs.
cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o
[ghstack-poisoned]
…endezvous()"
Previously `SymmetricMemory` only had private pybind APIs:
```python
from torch.distributed._symmetric_memory import _SymmetricMemory
t = _SymmetricMemory.empty_strided_p2p(
size=(64,),
stride=(1,),
dtype=torch.float32,
device=device,
)
symm_mem_hdl = _SymmetricMemory.rendezvous(t, group_name=group.group_name)
```
This PR introduces user-facing APIs empty() and rendezvous():
```python
import torch.distributed._symmetric_memory as symm_mem
t = symm_mem.empty(64, device="cuda")
symm_mem_hdl = symm_mem.rendezvous(t, group_name=group.group_name)
```
Notable differences compared to the pybind APIs:
- `empty()` now resembles `torch.empty()`:
- shape can either be an integer sequence or pack
- no need to/can't specify stride anymore
- device can either be `torch.device` or string
- `group_name` needs to be specified at rendezvous time as opposed to allocation time. See #139529 for the rationales. I feel the new semantic is superior, hence enforcing it in the public API.
- Currently, the pybind API still support specifying `group_name` at rendezvous time.
This PR does not change the behavior of the pybind APIs.
cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o
[ghstack-poisoned]
|
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
…pytorch#139677) Previously `SymmetricMemory` only had private pybind APIs: ```python from torch.distributed._symmetric_memory import _SymmetricMemory t = _SymmetricMemory.empty_strided_p2p( size=(64,), stride=(1,), dtype=torch.float32, device=device, ) symm_mem_hdl = _SymmetricMemory.rendezvous(t, group_name=group.group_name) ``` This PR introduces user-facing APIs empty() and rendezvous(): ```python import torch.distributed._symmetric_memory as symm_mem t = symm_mem.empty(64, device="cuda") symm_mem_hdl = symm_mem.rendezvous(t, group_name=group.group_name) ``` Notable differences compared to the pybind APIs: - `empty()` now resembles `torch.empty()`: - shape can either be an integer sequence or pack - no need to/can't specify stride anymore - device can either be `torch.device` or string - `group_name` needs to be specified at rendezvous time as opposed to allocation time. See pytorch#139529 for the rationales. I feel the new semantic is superior, hence enforcing it in the public API. - Currently, the pybind API still support specifying `group_name` at rendezvous time. This PR does not change the behavior of the pybind APIs. Pull Request resolved: pytorch#139677 Approved by: https://github.com/lw ghstack dependencies: pytorch#139529
Stack from ghstack (oldest at bottom):
Previously
SymmetricMemoryonly had private pybind APIs:This PR introduces user-facing APIs empty() and rendezvous():
Notable differences compared to the pybind APIs:
empty()now resemblestorch.empty():torch.deviceor stringgroup_nameneeds to be specified at rendezvous time as opposed to allocation time. See [SymmetricMemory] support specifying group_name at rendezvous time #139529 for the rationales. I feel the new semantic is superior, hence enforcing it in the public API.group_nameat rendezvous time.This PR does not change the behavior of the pybind APIs.
cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o