KEMBAR78
[SymmetricMemory] introduce user-facing APIs empty() and rendezvous() by yifuwang · Pull Request #139677 · pytorch/pytorch · GitHub
Skip to content

Conversation

@yifuwang
Copy link
Collaborator

@yifuwang yifuwang commented Nov 5, 2024

Stack from ghstack (oldest at bottom):

Previously SymmetricMemory only had private pybind APIs:

from torch.distributed._symmetric_memory import _SymmetricMemory
t = _SymmetricMemory.empty_strided_p2p(
    size=(64,),
    stride=(1,),
    dtype=torch.float32,
    device=device,
)
symm_mem_hdl = _SymmetricMemory.rendezvous(t, group_name=group.group_name)

This PR introduces user-facing APIs empty() and rendezvous():

import torch.distributed._symmetric_memory as symm_mem
t = symm_mem.empty(64, device="cuda")
symm_mem_hdl = symm_mem.rendezvous(t, group_name=group.group_name)

Notable differences compared to the pybind APIs:

  • empty() now resembles torch.empty():
    • shape can either be an integer sequence or pack
    • no need to/can't specify stride anymore
    • device can either be torch.device or string
  • group_name needs to be specified at rendezvous time as opposed to allocation time. See [SymmetricMemory] support specifying group_name at rendezvous time #139529 for the rationales. I feel the new semantic is superior, hence enforcing it in the public API.
    • Currently, the pybind API still support specifying group_name at rendezvous time.

This PR does not change the behavior of the pybind APIs.

cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

@pytorch-bot
Copy link

pytorch-bot bot commented Nov 5, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139677

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit 760b83f with merge base b86b534 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

yifuwang pushed a commit that referenced this pull request Nov 5, 2024
ghstack-source-id: da9afa9
Pull Request resolved: #139677
@yifuwang yifuwang changed the title [SymmetricMemory] initial user-facing API [SymmetricMemory] initial user-facing APIs Nov 5, 2024
cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
yifuwang pushed a commit that referenced this pull request Nov 14, 2024
ghstack-source-id: 34c4cab
Pull Request resolved: #139677
@yifuwang yifuwang changed the title [SymmetricMemory] initial user-facing APIs [SymmetricMemory] introduce user-facing APIs: empty() and rendezvous() Nov 14, 2024
@yifuwang yifuwang requested review from Chillee, lw and weifengpy November 14, 2024 20:37
…rendezvous()"


Previously `SymmetricMemory` only had private pybind APIs:
```python
from torch.distributed._symmetric_memory import _SymmetricMemory
t = _SymmetricMemory.empty_strided_p2p(
    size=(64,),
    stride=(1,),
    dtype=torch.float32,
    device=device,
)
symm_mem_hdl = _SymmetricMemory.rendezvous(t, group_name=group.group_name)
```

This PR introduces user-facing APIs: empty() and rendezvous():
```python
import torch.distributed._symmetric_memory as symm_mem
t = symm_mem.empty(64, device="cuda")
symm_mem_hdl = symm_mem.rendezvous(t, group_name=group.group_name)
```

Notable differences compared to the pybind APIs:
- `empty()` now resembles `torch.empty()`:
  - shape can either be an integer sequence or pack
  - no need to/can't specify stride anymore
  - device can either be `torch.device` or string
- `group_name` needs to be specified at rendezvous time as opposed to allocation time. See #139529 for the rationales. I feel the new semantic is superior, hence enforcing it in the public API.
  - Currently, the pybind API still support specifying `group_name` at rendezvous time. 

This PR does not change the behavior of the pybind APIs.

cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
@yifuwang yifuwang marked this pull request as ready for review November 14, 2024 20:52
@yifuwang yifuwang changed the title [SymmetricMemory] introduce user-facing APIs: empty() and rendezvous() [SymmetricMemory] introduce user-facing API empty() and rendezvous() Nov 14, 2024
@yifuwang yifuwang changed the title [SymmetricMemory] introduce user-facing API empty() and rendezvous() [SymmetricMemory] introduce user-facing APIs empty() and rendezvous() Nov 14, 2024
…endezvous()"


Previously `SymmetricMemory` only had private pybind APIs:
```python
from torch.distributed._symmetric_memory import _SymmetricMemory
t = _SymmetricMemory.empty_strided_p2p(
    size=(64,),
    stride=(1,),
    dtype=torch.float32,
    device=device,
)
symm_mem_hdl = _SymmetricMemory.rendezvous(t, group_name=group.group_name)
```

This PR introduces user-facing APIs empty() and rendezvous():
```python
import torch.distributed._symmetric_memory as symm_mem
t = symm_mem.empty(64, device="cuda")
symm_mem_hdl = symm_mem.rendezvous(t, group_name=group.group_name)
```

Notable differences compared to the pybind APIs:
- `empty()` now resembles `torch.empty()`:
  - shape can either be an integer sequence or pack
  - no need to/can't specify stride anymore
  - device can either be `torch.device` or string
- `group_name` needs to be specified at rendezvous time as opposed to allocation time. See #139529 for the rationales. I feel the new semantic is superior, hence enforcing it in the public API.
  - Currently, the pybind API still support specifying `group_name` at rendezvous time. 

This PR does not change the behavior of the pybind APIs.

cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
yifuwang pushed a commit that referenced this pull request Nov 14, 2024
Copy link
Contributor

@lw lw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the API looks good, I believe it boils down to the same "verbs" as before (it used to be two steps, it is still two steps) but they have become somewhat simplified (no more explicit strides, one can pass PG instances, ...) and they have become public, which makes it feel more "ok" to call them :)

The only question I have is how do you intend them to be imported in user code:

  • The function names (empty and rendezvous) are quite common so I guess one won't import them directly (i.e., no from torch.distributed._symmetric_memory import ...).
  • However, their fully qualified name is long, hence it's perhaps a bit unwieldy to invoke torch.distributed._symmetric_memory.rendezvous directly.
  • Is the intended usage that people import the module with a shorthand (import torch.distributed._symmetric_memory as symm_mem) and then use that to invoke the functions (symm_mem.empty(...))? This is what is done in the tests.

_SymmetricMemory: the symmetric memory workspace associated with the
group.
"""
enable_symm_mem_for_group(group_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this looks like a key change! Does it mean that now, if one calls the fused matmul+comms ops, they lazily initialize the SymmetricMemory on their own?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right! I realized it's redundant to require users to call enable_symm_mem_for_group(group_name) when they are already explicitly using the group for symmetric memory. This indirectly addresses the tracing issue you encountered.

@yifuwang
Copy link
Collaborator Author

The only question I have is how do you intend them to be imported in user code:

  • The function names (empty and rendezvous) are quite common so I guess one won't import them directly (i.e., no from torch.distributed._symmetric_memory import ...).
  • However, their fully qualified name is long, hence it's perhaps a bit unwieldy to invoke torch.distributed._symmetric_memory.rendezvous directly.
  • Is the intended usage that people import the module with a shorthand (import torch.distributed._symmetric_memory as symm_mem) and then use that to invoke the functions (symm_mem.empty(...))? This is what is done in the tests.

The third bullet point is what I had in mind.

My thinking is that, since the API is already scoped at the module level, it might be redundant to further distinguish it at the function level. Here are two approaches we can recommend to users for working with the API:

import torch.distributed._symmetric_memory as symm_mem

t = symm_mem.empty(...)
from torch.distributed._symmetric_memory import empty as symm_mem_empty

t = symm_mem_empty(...)

This approach is also consistent with other empty() variants in distributed (e.g. dtensor api).

Let me know if you have further suggestions!

…endezvous()"


Previously `SymmetricMemory` only had private pybind APIs:
```python
from torch.distributed._symmetric_memory import _SymmetricMemory
t = _SymmetricMemory.empty_strided_p2p(
    size=(64,),
    stride=(1,),
    dtype=torch.float32,
    device=device,
)
symm_mem_hdl = _SymmetricMemory.rendezvous(t, group_name=group.group_name)
```

This PR introduces user-facing APIs empty() and rendezvous():
```python
import torch.distributed._symmetric_memory as symm_mem
t = symm_mem.empty(64, device="cuda")
symm_mem_hdl = symm_mem.rendezvous(t, group_name=group.group_name)
```

Notable differences compared to the pybind APIs:
- `empty()` now resembles `torch.empty()`:
  - shape can either be an integer sequence or pack
  - no need to/can't specify stride anymore
  - device can either be `torch.device` or string
- `group_name` needs to be specified at rendezvous time as opposed to allocation time. See #139529 for the rationales. I feel the new semantic is superior, hence enforcing it in the public API.
  - Currently, the pybind API still support specifying `group_name` at rendezvous time. 

This PR does not change the behavior of the pybind APIs.

cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
yifuwang pushed a commit that referenced this pull request Nov 16, 2024
…endezvous()"


Previously `SymmetricMemory` only had private pybind APIs:
```python
from torch.distributed._symmetric_memory import _SymmetricMemory
t = _SymmetricMemory.empty_strided_p2p(
    size=(64,),
    stride=(1,),
    dtype=torch.float32,
    device=device,
)
symm_mem_hdl = _SymmetricMemory.rendezvous(t, group_name=group.group_name)
```

This PR introduces user-facing APIs empty() and rendezvous():
```python
import torch.distributed._symmetric_memory as symm_mem
t = symm_mem.empty(64, device="cuda")
symm_mem_hdl = symm_mem.rendezvous(t, group_name=group.group_name)
```

Notable differences compared to the pybind APIs:
- `empty()` now resembles `torch.empty()`:
  - shape can either be an integer sequence or pack
  - no need to/can't specify stride anymore
  - device can either be `torch.device` or string
- `group_name` needs to be specified at rendezvous time as opposed to allocation time. See #139529 for the rationales. I feel the new semantic is superior, hence enforcing it in the public API.
  - Currently, the pybind API still support specifying `group_name` at rendezvous time. 

This PR does not change the behavior of the pybind APIs.

cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
…endezvous()"


Previously `SymmetricMemory` only had private pybind APIs:
```python
from torch.distributed._symmetric_memory import _SymmetricMemory
t = _SymmetricMemory.empty_strided_p2p(
    size=(64,),
    stride=(1,),
    dtype=torch.float32,
    device=device,
)
symm_mem_hdl = _SymmetricMemory.rendezvous(t, group_name=group.group_name)
```

This PR introduces user-facing APIs empty() and rendezvous():
```python
import torch.distributed._symmetric_memory as symm_mem
t = symm_mem.empty(64, device="cuda")
symm_mem_hdl = symm_mem.rendezvous(t, group_name=group.group_name)
```

Notable differences compared to the pybind APIs:
- `empty()` now resembles `torch.empty()`:
  - shape can either be an integer sequence or pack
  - no need to/can't specify stride anymore
  - device can either be `torch.device` or string
- `group_name` needs to be specified at rendezvous time as opposed to allocation time. See #139529 for the rationales. I feel the new semantic is superior, hence enforcing it in the public API.
  - Currently, the pybind API still support specifying `group_name` at rendezvous time. 

This PR does not change the behavior of the pybind APIs.

cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o

[ghstack-poisoned]
yifuwang pushed a commit that referenced this pull request Nov 17, 2024
@yifuwang
Copy link
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 17, 2024
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
…pytorch#139677)

Previously `SymmetricMemory` only had private pybind APIs:
```python
from torch.distributed._symmetric_memory import _SymmetricMemory
t = _SymmetricMemory.empty_strided_p2p(
    size=(64,),
    stride=(1,),
    dtype=torch.float32,
    device=device,
)
symm_mem_hdl = _SymmetricMemory.rendezvous(t, group_name=group.group_name)
```

This PR introduces user-facing APIs empty() and rendezvous():
```python
import torch.distributed._symmetric_memory as symm_mem
t = symm_mem.empty(64, device="cuda")
symm_mem_hdl = symm_mem.rendezvous(t, group_name=group.group_name)
```

Notable differences compared to the pybind APIs:
- `empty()` now resembles `torch.empty()`:
  - shape can either be an integer sequence or pack
  - no need to/can't specify stride anymore
  - device can either be `torch.device` or string
- `group_name` needs to be specified at rendezvous time as opposed to allocation time. See pytorch#139529 for the rationales. I feel the new semantic is superior, hence enforcing it in the public API.
  - Currently, the pybind API still support specifying `group_name` at rendezvous time.

This PR does not change the behavior of the pybind APIs.

Pull Request resolved: pytorch#139677
Approved by: https://github.com/lw
ghstack dependencies: pytorch#139529
@github-actions github-actions bot deleted the gh/yifuwang/167/head branch December 19, 2024 02:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (c10d) release notes category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants