[sparse] Fix semi-structured sparse shape mismatch bug #110420
Conversation
Summary:

Currently, PyTorch incorrectly calculates the size of the returned matrix when we pass a non-contiguous batched (>2d) input to the semi-structured sparse subclass. This is most common in MLP layers, where we have two linear layers back to back. This leads to an error like the following:

```
RuntimeError: shape '[20, 64, 64, 3072]' is invalid for input of size 62914560
```

where the size of the sparse matmul result is off by a factor of 4. I'm not sure exactly where this bug comes from, but I traced it to [this](https://github.com/pytorch/pytorch/blob/01b2f25ebda85d307b27847ad67efe2b5bb54265/aten/src/ATen/native/LinearAlgebra.cpp#L1959) function. Note that this error goes away in inference mode, since we avoid decomposing the aten.linear op.

This fix overloads `__torch_function__`, specifically for the F.linear op. The goal is to implement our own folding-to-2d / unfolding code so that we can avoid running into this issue. An alternative way to fix this issue is to set TORCH_FLATTEN_LINEAR_3D=True, which also fixes this error.

Test Plan:

```
python test/test_sparse_semi_structured.py -k test_mlp
```
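For readers unfamiliar with the mechanism: below is a minimal, self-contained sketch of the kind of `__torch_function__` interception described above. The class name, the small dimensions, and the fold/unfold details are illustrative assumptions, not the actual `SparseSemiStructuredTensor` code changed in this PR.

```python
import torch
import torch.nn.functional as F

class FoldingLinearTensor(torch.Tensor):
    """Toy subclass: intercept F.linear and do the 2d folding/unfolding ourselves."""

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is F.linear:
            input, weight, *rest = args
            bias = rest[0] if rest else kwargs.get("bias")
            # Fold the batched (>2d) input to 2d before the matmul, so the
            # decomposed aten.linear path never has to infer the batched shape.
            folded = input.reshape(-1, input.shape[-1])
            out = torch.mm(folded, weight.t())
            if bias is not None:
                out = out + bias
            # Unfold back to the original leading (batch) dimensions.
            return out.view(*input.shape[:-1], out.shape[-1])
        # Everything else falls through to the default tensor behavior.
        with torch._C.DisableTorchFunctionSubclass():
            return func(*args, **kwargs)

# Hypothetical usage: wrap the weight so F.linear routes through the override.
# (Dimensions are kept small here; in the reported error they were [20, 64, 64, ...].)
w = torch.randn(128, 64).as_subclass(FoldingLinearTensor)
x = torch.randn(2, 3, 4, 64)        # batched (>2d) input
print(F.linear(x, w).shape)         # torch.Size([2, 3, 4, 128])
```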
cc @albanD since I think this is a curious Tensor subclass "edge case"

LGTM (aside from the test failing at the moment because of an import).

I'm confused. Is this just a bug in the at::linear op? Why not just fix that instead of doing this?

cc @albanD I can create an issue to triage this more precisely and fix it, but this issue only arises for semi-structured sparse tensors and I'm not sure if it's a bug or a special case that needs to be handled. This fix seemed easier.
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict.

Rebase failed due to a command failure. Raised by https://github.com/pytorch/pytorch/actions/runs/6434285998
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours).

Merge failed. Reason: 1 job failed: linux-binary-libtorch-cxx11-abi / libtorch-cpu-shared-with-deps-cxx11-abi-build / build
@pytorchbot merge -f "unrelated failures"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
Pull Request resolved: #110420. Approved by: https://github.com/alexsamardzic, https://github.com/cpuhrsch
Stack from ghstack (oldest at bottom):
Summary:
Fixes: #110664
Currently, PyTorch incorrectly calculates the size of the returned
matrix when we pass a non-contiguous batched (>2d) input to the
semi-structured sparse subclass.
This is most common in MLP layers, where we have 2 linear layers back to back.
This will lead to an error like the following:
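```
RuntimeError: shape '[20, 64, 64, 3072]' is invalid for input of size 62914560
```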
Where the size of the sparse matmul result is off because we infer the
output shape with the wrong tensor shape.
This happens because of a bug where we did not update the subclass
tensor shape when doing transpose.
For semi-structured sparsity, transposing is a no-op where we just set
the boolean flag, but we forgot to also update the tensor shape.
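A rough sketch of the bookkeeping this refers to: when transpose is metadata-only, the wrapper's reported size has to be swapped along with the flag, otherwise downstream shape inference uses the stale shape. The `FakeSparseWrapper` stand-in below is illustrative, not the actual `SparseSemiStructuredTensor` internals.

```python
import torch
from dataclasses import dataclass

@dataclass
class FakeSparseWrapper:
    """Illustrative stand-in for a 2:4 sparse subclass: packed data plus metadata."""
    packed_data: torch.Tensor
    shape: torch.Size
    transposed: bool = False

def transpose_wrapper(w: FakeSparseWrapper) -> FakeSparseWrapper:
    rows, cols = w.shape
    return FakeSparseWrapper(
        packed_data=w.packed_data,        # no data movement: transpose is metadata-only
        shape=torch.Size([cols, rows]),   # <- the shape update the bug was missing
        transposed=not w.transposed,
    )

w = FakeSparseWrapper(torch.empty(3072 * 768 // 2), torch.Size([3072, 768]))
print(transpose_wrapper(w).shape)         # torch.Size([768, 3072])
```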
Note that this error goes away in inference mode, since we avoid
decomposing the aten.linear op and handle shape folding ourselves,
which changes the execution path.
An alternative way to fix this issue is to set
TORCH_FLATTEN_LINEAR_3D=True, which will also fix this error.
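For context, here is a rough reproduction sketch of the failing pattern: two back-to-back sparse linear layers fed a batched (>2d) input, where the intermediate activation is the non-contiguous batched tensor the description refers to. It assumes a CUDA device with 2:4 semi-structured sparse kernel support and the `torch.sparse.to_sparse_semi_structured` API; the dimensions and the 2:4 mask construction are illustrative, and this is not the PR's actual `test_mlp` test.

```python
import torch
import torch.nn as nn
from torch.sparse import to_sparse_semi_structured

def make_sparse_linear(in_f, out_f):
    # Prune the weight to a 2:4 pattern (keep 2 of every 4 elements), then
    # swap it for the semi-structured sparse subclass.
    lin = nn.Linear(in_f, out_f, bias=False).half().cuda()
    mask = torch.tensor([0, 0, 1, 1], device="cuda").tile(out_f, in_f // 4).bool()
    lin.weight = nn.Parameter(to_sparse_semi_structured(lin.weight.masked_fill(~mask, 0)))
    return lin

mlp = nn.Sequential(make_sparse_linear(768, 3072), make_sparse_linear(3072, 768))
x = torch.randn(20, 64, 64, 768, dtype=torch.half, device="cuda")

# The training-mode path (decomposed aten.linear) hit the shape mismatch before
# this fix; the inference-mode path did not, since shape folding is handled there.
out = mlp(x)
with torch.inference_mode():
    out = mlp(x)
```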
Test Plan:
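```
python test/test_sparse_semi_structured.py -k test_mlp
```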