[DTensor][1/N] add forward layer norm support #113105
Conversation
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/113105
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 400ef8f with merge base 56e514a.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
First pass; please see the inline comments. I think the semantics of `replicate_dim_start_at` look wrong: it might make all the tensor dims replicated.
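For context, a minimal sketch of the semantics the reviewer appears to expect from such a helper (the signature and import path are assumptions, not the PR's actual code): only `Shard` placements on the normalized inner dims should be converted to `Replicate`, leaving outer-dim shardings intact.

```python
from typing import List, Sequence, Tuple

from torch.distributed._tensor.placement_types import Placement, Replicate, Shard


def replicate_dim_start_at(
    placements: Sequence[Placement], start_dim: int
) -> Tuple[Placement, ...]:
    # Hypothetical sketch: convert only Shard placements whose tensor dim is
    # a normalized (inner) dim into Replicate; shardings on outer dims are
    # kept. If the `placement.dim >= start_dim` check were missing, every
    # tensor dim would become replicated, which is the concern raised above.
    new_placements: List[Placement] = []
    for placement in placements:
        if isinstance(placement, Shard) and placement.dim >= start_dim:
            new_placements.append(Replicate())
        else:
            new_placements.append(placement)
    return tuple(new_placements)
```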
```python
# must be in form of OpStrategy
assert isinstance(input_schema, OpStrategy)
assert isinstance(normalized_shape, (int, Sequence, torch.Size))
normalized_size = (
```
I thought you should call `normalize_to_torch_size`; that's why we refactored it, right?
Oops, the code was overwritten in a rebase...
Note: forward-fixed `normalize_to_torch_size` in #113244 to handle the size argument of type `int`, `Sequence`, or `torch.Size`. This fix also enables 3 DTensor op tests.
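For reference, a minimal sketch of the behavior described by that forward fix (the real helper lives in DTensor's op utils; its exact body may differ):

```python
from typing import Sequence, Union

import torch


def normalize_to_torch_size(size: Union[int, Sequence[int], torch.Size]) -> torch.Size:
    # Accept an int, a Sequence[int], or a torch.Size and always return a
    # torch.Size. The pre-fix version only handled Sequence[int] input.
    if isinstance(size, torch.Size):
        return size
    if isinstance(size, int):
        return torch.Size([size])
    return torch.Size(size)
```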
**Summary**: This PR adds a DTensor implementation for the ATen op `native_layer_norm`.
**Test**: `pytest test/distributed/_tensor/test_dtensor_ops.py -s -k layer_norm`
[ghstack-poisoned]
Have a few comments inline; otherwise good to go!
```diff
@@ -1,5 +1,5 @@
 # Copyright (c) Meta Platforms, Inc. and affiliates
-from typing import cast, List, Optional, Tuple
+from typing import cast, List, Optional, Sequence, Tuple
```
In addition to the op-level tests, I think we should add a test in test_math_ops.py that uses `distribute_module` to partition an `nn.LayerNorm`, and marks the input as sharded in `prepare_input_fn`, to make sure things work for the forward case when we apply this to an `nn.Module`. This can be a follow-up PR; a rough sketch is below.
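A rough sketch of that follow-up test (the class name, mesh setup, and the `input_fn` signature are assumptions; some releases pass `input_fn(inputs, device_mesh)` without the module argument):

```python
import torch
from torch import nn
from torch.distributed._tensor import (
    DeviceMesh,
    Shard,
    distribute_module,
    distribute_tensor,
)
from torch.testing._internal.distributed._tensor.common_dtensor import (
    DTensorTestBase,
    with_comms,
)


class DistLayerNormTest(DTensorTestBase):
    @with_comms
    def test_layer_norm_fwd(self):
        device_mesh = DeviceMesh(self.device_type, list(range(self.world_size)))

        def prepare_input_fn(mod, inputs, device_mesh):
            # shard the activation on the batch dim; parameters stay replicated
            # (distribute_module replicates parameters when no partition_fn is given)
            return tuple(
                distribute_tensor(t, device_mesh, [Shard(0)]) for t in inputs
            )

        norm = nn.LayerNorm(normalized_shape=8, elementwise_affine=True)
        dist_norm = distribute_module(norm, device_mesh, input_fn=prepare_input_fn)

        x = torch.rand(16, 8)
        out = dist_norm(x)
        # the forward output should remain sharded on the batch dim
        self.assertEqual(out.placements, (Shard(0),))
```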
**Summary**: This PR adds a DTensor implementation for the ATen op `native_layer_norm`.
**Test**: `pytest test/distributed/_tensor/test_dtensor_ops.py -s -k layer_norm`
[ghstack-poisoned]
lgtm!
…ize to accept single int size (#113244)
**Summary**: In #113105 I used the util function `normalize_to_torch_size` to unify the `size` argument, which may come in multiple formats. However, that function only handled input of type `Sequence[int]`, so I submitted this forward fix to make `normalize_to_torch_size` also handle a size argument of type `int` or `torch.Size`. A side product of this fix is that it also enables 3 DTensor op tests (see `test_dtensor_ops.py`).
**Test**: `pytest test/distributed/_tensor/test_dtensor_ops.py`
Pull Request resolved: #113244
Approved by: https://github.com/wanchaol
ghstack dependencies: #113105
**Summary**: This PR adds a DTensor implementation for the ATen op `native_layer_norm`.
**Test**: `pytest test/distributed/_tensor/test_dtensor_ops.py -s -k layer_norm`
Pull Request resolved: pytorch#113105
Approved by: https://github.com/wanchaol
Stack from ghstack (oldest at bottom):
Summary:
This PR adds a DTensor implementation for the ATen op `native_layer_norm`.
Test: `pytest test/distributed/_tensor/test_dtensor_ops.py -s -k layer_norm`
cc @wanchaol