[DTensor][1/N] add forward layer norm support by XilunWu · Pull Request #113105 · pytorch/pytorch

Conversation

XilunWu (Contributor) commented Nov 7, 2023

Stack from ghstack (oldest at bottom):

**Summary**:
This PR adds a DTensor implementation for the ATen op `native_layer_norm`.

**Test**:
`pytest test/distributed/_tensor/test_dtensor_ops.py -s -k layer_norm`

cc @wanchaol
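A rough usage sketch of what this enables (the mesh size, shapes, and the `gloo`/`torchrun` setup below are illustrative assumptions, not taken from the PR): shard the input of `layer_norm` across a device mesh and let DTensor dispatch `aten.native_layer_norm`.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.distributed._tensor import DeviceMesh, Replicate, Shard, distribute_tensor

# e.g. launched with: torchrun --nproc_per_node=4 this_script.py
dist.init_process_group("gloo")
mesh = DeviceMesh("cpu", list(range(dist.get_world_size())))

x = torch.randn(16, 32, 64)      # (batch, seq, hidden), same tensor on every rank here
weight = torch.ones(64)
bias = torch.zeros(64)

# Shard the input along the batch dim; replicate the small affine params.
dx = distribute_tensor(x, mesh, [Shard(0)])
dw = distribute_tensor(weight, mesh, [Replicate()])
db = distribute_tensor(bias, mesh, [Replicate()])

# F.layer_norm lowers to aten.native_layer_norm, which this PR teaches DTensor to handle.
out = F.layer_norm(dx, normalized_shape=(64,), weight=dw, bias=db)
print(out.placements)            # expected to stay sharded on the batch dim
```

Since normalization runs over the last dimension, which is not sharded here, each rank should be able to compute its batch shard locally, without communication in the forward pass.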

pytorch-bot bot commented Nov 7, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/113105

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 400ef8f with merge base 56e514a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

XilunWu added a commit that referenced this pull request Nov 7, 2023
ghstack-source-id: 00df82a
Pull Request resolved: #113105
@wanchaol (Collaborator) left a comment

First pass, please see the inline comments. I think the semantics of replicate_dim_start_at look wrong; it might make all the tensor dims be replicated.

# must be in form of OpStrategy
assert isinstance(input_schema, OpStrategy)
assert isinstance(normalized_shape, (int, Sequence, torch.Size))
normalized_size = (
Collaborator

I thought you should call `normalize_to_torch_size`? That's why we refactored it.

XilunWu (Contributor, Author)

oops, the code was overridden in the rebase...

XilunWu (Contributor, Author)

note: #113244 forward-fixes `normalize_to_torch_size` to handle a size argument of type `int`, `Sequence`, or `torch.Size`. The fix also enables 3 DTensor op tests.
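For context, a minimal sketch of what such a normalization helper can look like (illustrative only, not necessarily the exact code merged in #113244): collapse the three accepted spellings of `size` into a single `torch.Size`.

```python
from typing import Sequence, Union

import torch


def normalize_to_torch_size(size: Union[int, Sequence[int], torch.Size]) -> torch.Size:
    """Convert an int, a sequence of ints, or a torch.Size into a torch.Size."""
    if isinstance(size, torch.Size):
        return size
    if isinstance(size, int):
        return torch.Size([size])
    return torch.Size(size)


# All three spellings of LayerNorm's normalized_shape collapse to the same value:
assert normalize_to_torch_size(64) == torch.Size([64])
assert normalize_to_torch_size([16, 64]) == torch.Size([16, 64])
assert normalize_to_torch_size(torch.Size([16, 64])) == torch.Size([16, 64])
```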

XilunWu added a commit that referenced this pull request Nov 7, 2023
ghstack-source-id: 36d25c9
Pull Request resolved: #113105
@wanchaol (Collaborator) left a comment

Have a few comments inlined; otherwise good to go!

@@ -1,5 +1,5 @@
 # Copyright (c) Meta Platforms, Inc. and affiliates
-from typing import cast, List, Optional, Tuple
+from typing import cast, List, Optional, Sequence, Tuple
Collaborator

In addition to the op-level tests, I think we should add a test in test_math_ops.py that uses distribute_module to partition an nn.LayerNorm and marks the input as sharded in prepare_input_fn, to make sure things work for the forward case when we apply this to nn.Module. This can be a follow-up PR.
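A rough sketch of that follow-up scenario (not the actual test that was later added; it assumes a `gloo` process group launched via `torchrun`, the two-argument `input_fn(inputs, device_mesh)` form that `distribute_module` accepted at the time, and illustrative function names):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed._tensor import (
    DeviceMesh, Replicate, Shard, distribute_module, distribute_tensor,
)

dist.init_process_group("gloo")
mesh = DeviceMesh("cpu", list(range(dist.get_world_size())))


def replicate_layer_norm_params(name, module, device_mesh):
    # LayerNorm's weight/bias are small, so replicate them across the mesh.
    if isinstance(module, nn.LayerNorm):
        for p_name, param in module.named_parameters(recurse=False):
            dist_param = nn.Parameter(distribute_tensor(param, device_mesh, [Replicate()]))
            module.register_parameter(p_name, dist_param)


def shard_input_on_batch(inputs, device_mesh):
    # Mark the runtime input as sharded along the batch dimension.
    return tuple(
        distribute_tensor(t, device_mesh, [Shard(0)]) if isinstance(t, torch.Tensor) else t
        for t in inputs
    )


ln = nn.LayerNorm(64)
dist_ln = distribute_module(
    ln, mesh, partition_fn=replicate_layer_norm_params, input_fn=shard_input_on_batch
)

x = torch.randn(16, 32, 64)   # same "global" input on every rank for this sketch
out = dist_ln(x)              # DTensor output; the forward layer norm runs on shards
print(out.placements)
```

The `input_fn` hook is just one way to "mark the input as sharded" at the module boundary; passing an already-sharded DTensor into the module directly would exercise the same forward path.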

XilunWu added a commit that referenced this pull request Nov 7, 2023
ghstack-source-id: 4e87fa4
Pull Request resolved: #113105
@wanchaol (Collaborator) left a comment

lgtm!

pytorchmergebot pushed a commit that referenced this pull request Nov 8, 2023
…ize to accept single int size (#113244)

**Summary**:
In #113105 I used the util function `normalize_to_torch_size` to unify the `size` argument, which may come in multiple formats. However, that function only handled input of type `Sequence[int]`, so this forward fix makes `normalize_to_torch_size` also handle a `size` argument of type `int` or `torch.Size`. A side product of this fix is that it also enables 3 DTensor op tests (see `test_dtensor_ops.py`).

**Test**:
`pytest test/distributed/_tensor/test_dtensor_ops.py`

Pull Request resolved: #113244
Approved by: https://github.com/wanchaol
ghstack dependencies: #113105
@facebook-github-bot deleted the gh/XilunWu/48/head branch November 12, 2023 15:23
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023

Pull Request resolved: pytorch#113105
Approved by: https://github.com/wanchaol
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023
…ize to accept single int size (pytorch#113244)


Pull Request resolved: pytorch#113244
Approved by: https://github.com/wanchaol
ghstack dependencies: pytorch#113105

Labels

ciflow/trunk · Merged · module: dtensor · release notes: distributed (dtensor)
