[TP] Enable embedding sharding in TP API #111177
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/111177
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit cd69e2d with merge base 35750bf. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Can you also update the docs of the ColwiseParallel and RowwiseParallel styles to explicitly mention which modules are supported?
We see use cases where embedding sharding is also needed in the TP API, so we enabled it in the API, since DTensor already supports colwise embedding sharding.
stamp to unblock, some suggestions about the doc
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 Hours).
Merge failed. Reason: 1 mandatory check(s) failed.
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 Hours).
@fduwjj can you fix the doc errors? It looks like the docs generated from this PR have multiple issues; see https://docs-preview.pytorch.org/pytorch/pytorch/111177/distributed.tensor.parallel.html#torch.distributed.tensor.parallel.style.ColwiseParallel All the examples in the doc (i.e. Colwise/Rowwise) are not properly formatted, and also
We see use cases where embedding sharding is also needed in the TP API, so we enabled it in the API, since DTensor already supports colwise embedding sharding. Pull Request resolved: pytorch#111177 Approved by: https://github.com/wanchaol ghstack dependencies: pytorch#111160, pytorch#111166, pytorch#111176
Pull Request resolved: pytorch#111346 Approved by: https://github.com/wanchaol ghstack dependencies: pytorch#111160, pytorch#111166, pytorch#111176, pytorch#111177
Stack from ghstack (oldest at bottom):
We see use cases where embedding sharding is also needed in the TP API, so we enabled it in the API, since DTensor already supports colwise embedding sharding.
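To make "colwise embedding sharding" concrete, here is a minimal single-process sketch of the idea. It is not the DTensor or TP API implementation. Plain Python lists stand in for tensors, and the helper names (`shard_columns`, `local_lookup`, `gather_columns`) are hypothetical. It assumes the embedding dimension splits evenly across ranks, and that each rank holds a contiguous slice of the embedding dimension for every row of the table.

```python
# Single-process sketch of column-wise ("colwise") embedding sharding.
# Plain Python lists stand in for tensors; no torch.distributed is involved.
# All helper names here are hypothetical and exist only for illustration.

def shard_columns(weight, world_size):
    """Split every row of `weight` into `world_size` contiguous column chunks."""
    embedding_dim = len(weight[0])
    assert embedding_dim % world_size == 0, "sketch assumes an even split"
    chunk = embedding_dim // world_size
    return [
        [row[rank * chunk:(rank + 1) * chunk] for row in weight]
        for rank in range(world_size)
    ]

def local_lookup(shard, token_ids):
    """Each simulated rank looks up the same token ids in its own column slice."""
    return [shard[t] for t in token_ids]

def gather_columns(partials):
    """Concatenate the per-rank outputs along the embedding dimension."""
    return [sum(pieces, []) for pieces in zip(*partials)]

# A 6 x 4 embedding table: 6 vocabulary entries, embedding dimension 4.
weight = [[float(10 * r + c) for c in range(4)] for r in range(6)]
token_ids = [5, 0, 3]

shards = shard_columns(weight, world_size=2)          # one shard per simulated rank
partials = [local_lookup(s, token_ids) for s in shards]
sharded_out = gather_columns(partials)

# The sharded lookup reproduces the unsharded lookup.
full_out = [weight[t] for t in token_ids]
assert sharded_out == full_out
```

Because every rank keeps all vocabulary rows, no token id needs remapping. Only the output has to be concatenated along the last dimension, or left sharded for the next layer.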