Relax tensor contiguity requirement for P2P ops #114982
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114982
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (3 Unrelated Failures)
As of commit bf80d3f with merge base 624f202:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```
self.assertEqual(send_tensor, recv_tensor)

# Test with non-contiguous tensors.
send_tensor_view = send_tensor.t()
```
Can we keep this test and assert that the values are as expected on rank 1?
Good point! When we confirm that we can officially support this use case, we should add that test back.
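If that test is added back, here is a minimal sketch of what it could look like, assuming a two-rank NCCL process group is already initialized; the helper name `_test_send_recv_noncontiguous` and the shapes are illustrative, not the actual test code:

```
import torch
import torch.distributed as dist

def _test_send_recv_noncontiguous(rank: int):
    # Hypothetical test body; assumes dist.init_process_group("nccl", ...)
    # was called with world_size == 2 and one GPU per rank.
    device = torch.device(f"cuda:{rank}")
    send_tensor = torch.arange(6.0, device=device).reshape(2, 3)
    # .t() returns a non-contiguous view over the same storage.
    send_tensor_view = send_tensor.t()
    if rank == 0:
        dist.send(send_tensor_view, dst=1)
    else:
        # Receive into a view with the same (non-contiguous) layout,
        # then assert the values are what rank 0 sent.
        recv_tensor = torch.zeros(2, 3, device=device)
        recv_tensor_view = recv_tensor.t()
        dist.recv(recv_tensor_view, src=0)
        expected = torch.arange(6.0, device=device).reshape(2, 3).t()
        torch.testing.assert_close(recv_tensor_view, expected)
```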
## Description
This example requires PyTorch PR pytorch/pytorch#114982 to work, because stage 0 and stage 2 seem to be transmitting non-contiguous tensors.

## Test
https://gist.github.com/kwen2501/33fa5723496992691f8b1cc7daaadd89
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Needs pytorch/pytorch#114982 to work.
```
BlenderbotForCausalLM(
  (model): BlenderbotDecoderWrapper(
    (decoder): BlenderbotDecoder(
      (embed_tokens): Embedding(8008, 2560, padding_idx=0)
      (embed_positions): BlenderbotLearnedPositionalEmbedding(128, 2560)
      (layers): ModuleList(
        (0-23): 24 x BlenderbotDecoderLayer(
          (self_attn): BlenderbotAttention(
            (k_proj): Linear(in_features=2560, out_features=2560, bias=True)
            (v_proj): Linear(in_features=2560, out_features=2560, bias=True)
            (q_proj): Linear(in_features=2560, out_features=2560, bias=True)
            (out_proj): Linear(in_features=2560, out_features=2560, bias=True)
          )
          (activation_fn): GELUActivation()
          (self_attn_layer_norm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
          (encoder_attn): BlenderbotAttention(
            (k_proj): Linear(in_features=2560, out_features=2560, bias=True)
            (v_proj): Linear(in_features=2560, out_features=2560, bias=True)
            (q_proj): Linear(in_features=2560, out_features=2560, bias=True)
            (out_proj): Linear(in_features=2560, out_features=2560, bias=True)
          )
          (encoder_attn_layer_norm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
          (fc1): Linear(in_features=2560, out_features=10240, bias=True)
          (fc2): Linear(in_features=10240, out_features=2560, bias=True)
          (final_layer_norm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        )
      )
      (layer_norm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
    )
  )
  (lm_head): Linear(in_features=2560, out_features=8008, bias=False)
)
```
Needs pytorch/pytorch#114982 to work.
```
PLBartForCausalLM(
  (model): PLBartDecoderWrapper(
    (decoder): PLBartDecoder(
      (embed_tokens): Embedding(50005, 768, padding_idx=1)
      (embed_positions): PLBartLearnedPositionalEmbedding(1026, 768)
      (layers): ModuleList(
        (0-5): 6 x PLBartDecoderLayer(
          (self_attn): PLBartAttention(
            (k_proj): Linear(in_features=768, out_features=768, bias=True)
            (v_proj): Linear(in_features=768, out_features=768, bias=True)
            (q_proj): Linear(in_features=768, out_features=768, bias=True)
            (out_proj): Linear(in_features=768, out_features=768, bias=True)
          )
          (activation_fn): GELUActivation()
          (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (encoder_attn): PLBartAttention(
            (k_proj): Linear(in_features=768, out_features=768, bias=True)
            (v_proj): Linear(in_features=768, out_features=768, bias=True)
            (q_proj): Linear(in_features=768, out_features=768, bias=True)
            (out_proj): Linear(in_features=768, out_features=768, bias=True)
          )
          (encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (fc1): Linear(in_features=768, out_features=3072, bias=True)
          (fc2): Linear(in_features=3072, out_features=768, bias=True)
          (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        )
      )
      (layernorm_embedding): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    )
  )
  (lm_head): Linear(in_features=768, out_features=50005, bias=False)
)
```
Requires pytorch/pytorch#114982 to work.
```
TrOCRForCausalLM(
  (model): TrOCRDecoderWrapper(
    (decoder): TrOCRDecoder(
      (embed_tokens): Embedding(50265, 1024, padding_idx=1)
      (embed_positions): TrOCRLearnedPositionalEmbedding(514, 1024)
      (layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      (layers): ModuleList(
        (0-11): 12 x TrOCRDecoderLayer(
          (self_attn): TrOCRAttention(
            (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
          )
          (activation_fn): GELUActivation()
          (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (encoder_attn): TrOCRAttention(
            (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
            (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
          )
          (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (fc1): Linear(in_features=1024, out_features=4096, bias=True)
          (fc2): Linear(in_features=4096, out_features=1024, bias=True)
          (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        )
      )
    )
  )
  (output_projection): Linear(in_features=1024, out_features=50265, bias=False)
)
```
I hit the following error when performing pipeline parallel for T5:
```
return default_pg.send([tensor], dst, tag)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Tensors must be contiguous
```
In theory, we shouldn't require the tensors to be contiguous, especially for P2P ops, because the transfer is just a bit-wise copy of the tensor's underlying data.
Thus, this PR relaxes the requirement and instead calls out that it is the user's responsibility to guarantee that the source and destination tensors have the same contiguity setting.
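For illustration, a minimal sketch of the relaxed contract from the user's side, assuming the NCCL backend, a two-rank process group already initialized, and this PR applied; variable names here are illustrative:

```
import torch
import torch.distributed as dist

# Assumes dist.init_process_group("nccl", ...) has already run with world_size == 2.
rank = dist.get_rank()
x = torch.randn(4, 8, device=f"cuda:{rank}")
view = x.t()  # non-contiguous view over the same storage
assert not view.is_contiguous()

if rank == 0:
    # With the relaxed check, the transposed view can be passed to send()
    # directly; the bytes of its underlying storage are transmitted as-is.
    dist.send(view, dst=1)
else:
    # The receiving buffer must have the same layout for values to land in
    # the right places; otherwise, stage through a contiguous buffer and
    # copy_() into the view afterwards (also the workaround on older
    # versions that still raise "Tensors must be contiguous").
    buf = torch.empty(4, 8, device=f"cuda:{rank}").t()
    dist.recv(buf, src=0)
```

The trade-off is that the runtime no longer validates layouts for you: if the strides on the two ends don't match, the received values can end up in the wrong positions without any error.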
Pull Request resolved: pytorch#114982
Approved by: https://github.com/H-Huang
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @kiukchung @d4l3k @LucasLLC