[DTensor] Used new placements for neg dim in `from_local` #114134

awgu · 2023-11-20T16:47:21Z

Stack from ghstack (oldest at bottom):

[ghstack-poisoned]

pytorch-bot · 2023-11-20T16:47:26Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114134

✅ No Failures

As of commit 43a9a1f with merge base 140c54e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

awgu · 2023-11-20T17:24:19Z

torch/distributed/_tensor/api.py

+            placements = list(placements)
+            for idx, placement in enumerate(placements):
+                # normalize shard dim to be positive
+                if placement.is_shard():


@wanchaol Should we converge to using placement.is_shard() or to using isinstance(placement, Shard)? The former calls the latter but allows for passing a dim arg to further check against, and the latter avoids having to use cast(Shard, placement).

It seems like is_shard() is a higher level construct and should be preferred, but I wanted to check.

wanchaol · 2023-11-20

That's exactly the trade-off you pointed out, lol. I would like to use the former uniformly, but mypy can't recognize it as a type check, so many redundant casts would be needed if we switched all call sites to it.

I think we can use whichever one is easier at a given call site. Maybe we can do this in the meantime:

- Prefer isinstance(placement, Shard) for a simple type check.
- Use is_shard(dim) with dim made non-optional, so that this API is only used as a util to check whether the placement is sharded on a certain tensor dim.
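To make the trade-off concrete, here is a minimal sketch using stand-in classes. `Placement`, `Shard`, and `Replicate` below are simplified assumptions written for illustration, not the real classes in `torch/distributed/_tensor/placement_types.py`, and the two helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, cast


class Placement:
    """Stand-in base class (an assumption, not the real DTensor class)."""

    def is_shard(self, dim: Optional[int] = None) -> bool:
        is_shard_instance = isinstance(self, Shard)
        if dim is not None and is_shard_instance:
            return cast(Shard, self).dim == dim
        return is_shard_instance


@dataclass(frozen=True)
class Shard(Placement):
    dim: int


@dataclass(frozen=True)
class Replicate(Placement):
    pass


def shard_dim_via_method(placement: Placement) -> Optional[int]:
    # A bool-returning method does not narrow the static type of
    # `placement`, so a cast is needed before accessing `.dim`.
    if placement.is_shard():
        return cast(Shard, placement).dim
    return None


def shard_dim_via_isinstance(placement: Placement) -> Optional[int]:
    # isinstance narrows `placement` to Shard, so no cast is needed.
    if isinstance(placement, Shard):
        return placement.dim
    return None
```

Both helpers return the same values at runtime; the difference is only in what a static type checker can infer.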

awgu · 2023-11-20T17:33:47Z

torch/distributed/_tensor/api.py

+                    placement = cast(Shard, placement)
+                    if placement.dim < 0:
+                        placements[idx] = Shard(placement.dim + local_tensor.ndim)



The conversion of placements to tuple is below:

pytorch/torch/distributed/_tensor/api.py
Lines 358 to 365 in 43a9a1f

        return _FromTorchTensor.apply(  # pyre-ignore[16]: autograd func
            local_tensor,
            device_mesh,
            tuple(placements),
            run_check,
            shape,
            stride,
        )
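The pattern in this diff (copy the placements into a list, replace any negative-dim `Shard` with a new instance instead of mutating it, then convert to a tuple) can be sketched in isolation as follows. The classes and the `normalize_placements` name are hypothetical stand-ins, and `ndim` plays the role of `local_tensor.ndim`.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Placement:
    """Stand-in base class (an assumption, not the real DTensor class)."""


@dataclass(frozen=True)
class Shard(Placement):
    dim: int


@dataclass(frozen=True)
class Replicate(Placement):
    pass


def normalize_placements(
    placements: Sequence[Placement], ndim: int
) -> Tuple[Placement, ...]:
    # Copy first so the caller's sequence is never modified.
    normalized: List[Placement] = list(placements)
    for idx, placement in enumerate(normalized):
        if isinstance(placement, Shard) and placement.dim < 0:
            # Frozen placements cannot be mutated in place,
            # so construct a new Shard with the positive dim.
            normalized[idx] = Shard(placement.dim + ndim)
    # A tuple is hashable and immutable, unlike the working list.
    return tuple(normalized)
```

For example, `normalize_placements([Shard(-1), Replicate()], ndim=3)` yields a tuple whose first element is `Shard(2)`, while the input list still holds `Shard(-1)`.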

wanchaol

cool!

Pull Request resolved: #113925 Approved by: https://github.com/wanchaol ghstack dependencies: #113919, #113924, #114134

…3930) Pull Request resolved: #113930 Approved by: https://github.com/wanchaol ghstack dependencies: #113919, #113924, #114134, #113925

This is a replacement for #113922. I think we can still leave the check for negative shard dimension in `compute_local_shape_and_global_offset` and replace the normalization logic with an assert. This should provide us a stack trace to see which user-facing API did not normalize the dim as expected.

Pull Request resolved: #114141 Approved by: https://github.com/wanchaol ghstack dependencies: #113919, #113924, #114134, #113925, #113930
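As an illustration of asserting instead of silently normalizing, here is a small sketch of a helper that expects its callers to have normalized the shard dim already. The function name, its signature, and the ceil-division chunking are assumptions made for this example; it is not the body of `compute_local_shape_and_global_offset`.

```python
from typing import Sequence, Tuple


def local_shape_for_shard(
    global_shape: Sequence[int], shard_dim: int, num_shards: int, rank: int
) -> Tuple[int, ...]:
    # Fail loudly (with a stack trace) if a caller forgot to normalize.
    assert shard_dim >= 0, (
        f"Shard dim must be normalized by the caller, got {shard_dim}"
    )
    assert shard_dim < len(global_shape), (
        f"Shard dim {shard_dim} out of range for {len(global_shape)} dims"
    )
    full = global_shape[shard_dim]
    # Ceil division: every rank gets `chunk` rows except possibly the last ones.
    chunk = -(-full // num_shards)
    start = min(rank * chunk, full)
    end = min(start + chunk, full)
    shape = list(global_shape)
    shape[shard_dim] = end - start
    return tuple(shape)
```

With this shape, a negative dim surfaces as an `AssertionError` at the point of misuse rather than being quietly corrected deep inside the helper.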

**Overview**
Generally, I think we can try to freeze as many of these classes used in DTensor sharding propagation as possible so that we can cache hashes. This PR targets hashing `DTensorSpec`, which turns out to be relatively expensive.

**Details**
It looks like `tensor_meta` is only updated in `_wrap_output_spec_tensor_meta`, which only runs if the propagation was not cached:
https://github.com/pytorch/pytorch/blob/ae94c7e491e22f58d3df66571c1a568e51d70acd/torch/distributed/_tensor/sharding_prop.py#L137
https://github.com/pytorch/pytorch/blob/ae94c7e491e22f58d3df66571c1a568e51d70acd/torch/distributed/_tensor/sharding_prop.py#L153

In that case, I think we can cache the hash for the `DTensorSpec` and only update it when one of the hashed attributes changes, which we only really expect to happen for `tensor_meta`.

To ensure correctness, we need that all hashed attributes are immutable.
- `DeviceMesh` caches its hash: https://github.com/pytorch/pytorch/blob/a9134fa99a8986adf478a12db2ea5729d24554db/torch/distributed/_device_mesh.py#L181
- This PR makes each `Placement` a frozen `dataclass`, making them immutable (relying on the fact that they do not have references to any mutable objects).
- `TensorMeta` is a `NamedTuple` of `torch.Size`, `Tuple[int, ...]`, and `torch.dtype`, so it is immutable: https://github.com/pytorch/pytorch/blob/9916d8a9eaaf2c05c131f2a2dbe9eabeeaa9dffc/torch/distributed/_tensor/placement_types.py#L369-L375

**Example**
For some simple small GPT model:
Before: 0.125 ms
After: 0.048 ms

The overall Adam CPU step time decreases from 7.647 ms to 6.451 ms.
Pull Request resolved: #113915 Approved by: https://github.com/wanchaol ghstack dependencies: #113919, #113924, #114134, #113925, #113930, #114141
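The caching idea described above can be sketched roughly as follows: compute the hash once, and recompute it only when `tensor_meta` is reassigned. `Spec` and the string-typed `dtype` are hypothetical stand-ins chosen so the example runs without PyTorch; this is not the actual `DTensorSpec` code.

```python
from typing import NamedTuple, Optional, Tuple


class TensorMeta(NamedTuple):
    shape: Tuple[int, ...]
    stride: Tuple[int, ...]
    dtype: str  # stand-in for torch.dtype (assumption)


class Spec:
    """Hypothetical stand-in for a spec with a cached hash."""

    def __init__(
        self,
        mesh: Tuple[int, ...],
        placements: Tuple[str, ...],
        tensor_meta: Optional[TensorMeta] = None,
    ) -> None:
        self.mesh = mesh
        self.placements = placements
        self.tensor_meta = tensor_meta
        self._hash = self._compute_hash()

    def _key(self):
        return (self.mesh, self.placements, self.tensor_meta)

    def _compute_hash(self) -> int:
        return hash(self._key())

    def __setattr__(self, name: str, value: object) -> None:
        super().__setattr__(name, value)
        # Refresh the cache only when tensor_meta changes after __init__
        # has finished (that is, once _hash already exists).
        if name == "tensor_meta" and hasattr(self, "_hash"):
            super().__setattr__("_hash", self._compute_hash())

    def __hash__(self) -> int:
        return self._hash

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Spec) and self._key() == other._key()
```

The cache is only safe because every hashed attribute is immutable: if a hashed member could change in place, the stored value would go stale without `__setattr__` ever being called.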

This is a nit change to save one `isinstance` call for when `dim` is not `None` but the placement is not `Shard`. Pull Request resolved: #114140 Approved by: https://github.com/Skylion007, https://github.com/wanchaol ghstack dependencies: #113919, #113924, #114134, #113925, #113930, #114141, #113915

This is a forward fix for #113781. We lazily compute the hash so that we do not try to compute the hash on `SymInt`s (for the stride) during Dynamo tracing.

Tested via:
```
python test/distributed/_tensor/test_dtensor_compile.py -k test_2d_fsdp_tp_ac_compile
```
Pull Request resolved: #114322 Approved by: https://github.com/wanchaol ghstack dependencies: #113919, #113924, #114134, #113925, #113930, #114141, #113915, #114140
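A rough sketch of the lazy variant: the hash is not touched at construction time and is computed only on the first `hash()` call. `LazySpec` and `NoHashDuringTracing` are hypothetical names; the latter is just an unhashable placeholder standing in for a value that should not be hashed while tracing, and makes no claim about how `SymInt` itself behaves.

```python
from typing import Optional, Tuple


class NoHashDuringTracing:
    """Placeholder for a value that must not be hashed eagerly (assumption)."""

    __hash__ = None  # type: ignore[assignment]


class LazySpec:
    """Hypothetical stand-in for a spec whose hash is computed on demand."""

    def __init__(self, placements: Tuple[str, ...], stride: Tuple[object, ...]):
        self.placements = placements
        self.stride = stride
        # Nothing is hashed here, so construction never fails on stride.
        self._hash: Optional[int] = None

    def __hash__(self) -> int:
        if self._hash is None:
            self._hash = hash((self.placements, self.stride))
        return self._hash


# Construction succeeds even though the stride holds an unhashable value.
traced = LazySpec(("S(0)",), (NoHashDuringTracing(), 1))

# With ordinary ints, the hash is computed once and then reused.
concrete = LazySpec(("S(0)",), (4, 1))
```

An eager version would raise as soon as `traced` is constructed; the lazy version defers any failure to the point where a hash is actually requested.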

[DTensor] Used new placements for neg dim in from_local

43a9a1f

[ghstack-poisoned]

This was referenced Nov 20, 2023

[DTensor] Made _Partial, Replicate frozen dataclasses #113919

Closed

[DTensor] Used new placements for neg dim in redistribute #113924

Closed

awgu commented Nov 20, 2023

awgu mentioned this pull request Nov 20, 2023

[DTensor] Replaced neg dim normalization with assert in helper #114141

Closed

awgu requested a review from wanchaol November 20, 2023 17:32

awgu commented Nov 20, 2023

awgu marked this pull request as ready for review November 20, 2023 17:34

wanchaol approved these changes Nov 20, 2023

awgu added the ciflow/trunk and release notes: distributed (dtensor) labels Nov 20, 2023

pytorchmergebot added the Merged label Nov 20, 2023

pytorchmergebot closed this in f4ffd46 Nov 20, 2023

pytorchmergebot pushed a commit that referenced this pull request Nov 20, 2023

[DTensor] Ensured grad_placements was tuple (#113925)

e2095a0

awgu mentioned this pull request Nov 22, 2023

[DTensor] Computed DTensorSpec hash lazily #114322

Closed

awgu mentioned this pull request Nov 22, 2023

[FSDP] Added DDP parity test for CPU training #114372

Closed

facebook-github-bot deleted the gh/awgu/462/head branch November 24, 2023 15:27


