[FSDP2] Added `shard_placement_fn` arg by awgu · Pull Request #137496 · pytorch/pytorch · GitHub

Conversation


@awgu awgu commented Oct 8, 2024

Stack from ghstack (oldest at bottom):

Overview

This PR adds a `shard_placement_fn: Optional[Callable[[nn.Parameter], Optional[Shard]]]` arg to `fully_shard` that allows users to specify FSDP sharding on a nonzero tensor dim. If a parameter is sharded on a nonzero dim, then that dim's size must be divisible by the FSDP shard world size.

```
# Example: shard each parameter on its largest tensor dim
from typing import Optional

import torch.nn as nn
from torch.distributed.tensor import Shard  # torch.distributed._tensor on older versions

def shard_placement_fn(param: nn.Parameter) -> Optional[Shard]:
    largest_dim = largest_dim_size = -1
    for dim, dim_size in enumerate(param.shape):
        if dim_size > largest_dim_size:
            largest_dim = dim
            largest_dim_size = dim_size
    return Shard(largest_dim)

fully_shard(module, shard_placement_fn=shard_placement_fn)
```
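
For reference, here is a rough end-to-end sketch of what this produces. It is a hypothetical example, not from the PR: it assumes two shard ranks, an already initialized process group, and the `fully_shard` import path in use at the time of this PR (import paths may differ across releases). The managed parameters become DTensors whose placement matches the returned `Shard`:

```
# Rough usage sketch (not from the PR): assumes 2 shard ranks and an already
# initialized default process group; import paths may differ across releases.
import torch.nn as nn
from torch.distributed._composable.fsdp import fully_shard
from torch.distributed.tensor import Shard

def shard_largest_dim(param: nn.Parameter) -> Shard:
    # Same policy as the example above: shard each parameter on its largest dim.
    return Shard(max(range(param.ndim), key=lambda d: param.shape[d]))

model = nn.Linear(64, 16, bias=False)  # weight shape (16, 64): largest dim is 1
fully_shard(model, shard_placement_fn=shard_largest_dim)

# The weight is now a DTensor sharded on dim 1, so with 2 ranks each rank
# holds a (16, 32) local shard.
print(model.weight.placements)        # expected: (Shard(dim=1),)
print(model.weight.to_local().shape)  # expected: torch.Size([16, 32])
```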

Follow-Ups

  • Copy kernels: For all-gather copy-out, we currently copy out to temporaries and then do chunk-dim-0 -> cat-shard-dim, incurring an extra copy for parameters sharded on a nonzero tensor dim. Similarly, for reduce-scatter copy-in, we currently do chunk-shard-dim -> cat-dim-0, incurring an extra copy for gradients sharded on a nonzero tensor dim. @yifuwang has ideas for adding additional split-size args to the copy ops that would allow fusing these extra copies into the existing all-gather copy-out and reduce-scatter copy-in; a toy illustration follows below.
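
To make the extra copy concrete, here is a toy single-tensor sketch (hypothetical shapes and world size, not the actual FSDP2 copy kernels): the all-gather output is laid out rank by rank, i.e. the shards stacked on dim 0, so rebuilding a parameter sharded on dim 1 requires a chunk on dim 0 followed by a cat on the shard dim.

```
# Toy illustration, not the real FSDP2 kernels; world size and shapes are hypothetical.
import torch

world_size = 2
full = torch.arange(4 * 6, dtype=torch.float32).reshape(4, 6)  # full parameter
shards = full.chunk(world_size, dim=1)                         # per-rank (4, 3) shards on dim 1

# The all-gather output is laid out rank by rank, i.e. the shards stacked on dim 0:
gathered = torch.cat([s.contiguous() for s in shards], dim=0)  # (8, 3)

# Copy-out must therefore chunk on dim 0 and re-cat on the shard dim (the extra
# copy this follow-up wants to fuse into the existing copy-out):
rebuilt = torch.cat(gathered.chunk(world_size, dim=0), dim=1)
assert torch.equal(rebuilt, full)

# Reduce-scatter copy-in is the mirror image: chunk on the shard dim, cat on dim 0.
```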

cc @XilunWu @H-Huang @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o


pytorch-bot bot commented Oct 8, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137496

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8fb9a36 with merge base d1b87e2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the oncall: distributed and release notes: distributed (fsdp) labels Oct 8, 2024
awgu pushed a commit that referenced this pull request Oct 8, 2024
ghstack-source-id: 71ac63d
Pull Request resolved: #137496
@awgu awgu added the release notes: distributed (fsdp2) label and removed the release notes: distributed (fsdp) label Oct 8, 2024
@awgu awgu requested a review from weifengpy October 8, 2024 19:13
@awgu awgu marked this pull request as ready for review October 8, 2024 19:13
@awgu awgu requested a review from yifuwang October 8, 2024 19:13

@weifengpy weifengpy left a comment


epic work in such a short time! will sync with @yifuwang on copy kernels

@awgu awgu added the ciflow/trunk label Oct 9, 2024
awgu pushed a commit that referenced this pull request Oct 9, 2024
ghstack-source-id: 7d4c314
Pull Request resolved: #137496
@awgu awgu commented Oct 9, 2024

@pytorchbot merge

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@github-actions github-actions bot deleted the gh/awgu/649/head branch November 9, 2024 02:04