[sparse] semi-structured sparse + torch.compile support by jcaip · Pull Request #111049 · pytorch/pytorch · GitHub

Conversation

jcaip
Contributor

@jcaip jcaip commented Oct 11, 2023

Stack from ghstack (oldest at bottom):

Summary:

This PR adds torch.compile support for semi-structured sparsity,
using the subclass tracing @bdhirsh added.

Based on whether we are using cuSPARSELt or CUTLASS, we return a
different representation of the inner tensors.
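For context, "semi-structured" here refers to the 2:4 sparsity pattern: at most 2 nonzeros in every group of 4 elements, stored as packed values plus index metadata. A minimal pure-Python sketch of the idea (illustration only; the actual cuSPARSELt/CUTLASS inner tensors use packed device formats, and these helper names are hypothetical):

```python
# Illustrative sketch of 2:4 ("semi-structured") compression: for every
# group of 4 values, keep the 2 largest-magnitude entries plus their
# in-group indices. The real backends pack values and metadata into
# dedicated device tensors; this only mirrors the idea.

def compress_2_4(row):
    assert len(row) % 4 == 0
    values, indices = [], []
    for g in range(0, len(row), 4):
        group = row[g:g + 4]
        # indices of the 2 largest-magnitude entries, kept in order
        keep = sorted(sorted(range(4), key=lambda i: -abs(group[i]))[:2])
        values.extend(group[i] for i in keep)
        indices.extend(keep)
    return values, indices

def decompress_2_4(values, indices, n):
    out = [0.0] * n
    for k, idx in enumerate(indices):
        out[(k // 2) * 4 + idx] = values[k]
    return out

row = [0.0, 3.0, -1.0, 0.0, 5.0, 0.0, 0.0, 2.0]
vals, meta = compress_2_4(row)
restored = decompress_2_4(vals, meta, len(row))
```

Round-tripping a row that is already 2:4 sparse recovers it exactly, which is the invariant the compressed representation relies on.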

Test Plan:

```
python test/test_sparse_semi_structured.py -k compile
```

Reviewers:

Subscribers:

Tasks:

Tags:

@pytorch-bot

pytorch-bot bot commented Oct 11, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/111049

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 575d075 with merge base 4b324a8:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: sparse release notes category label Oct 11, 2023
jcaip added a commit that referenced this pull request Oct 11, 2023
ghstack-source-id: 56dd0a8
Pull Request resolved: #111049
jcaip added a commit that referenced this pull request Oct 11, 2023
ghstack-source-id: 999ce27
Pull Request resolved: #111049
@jcaip
Contributor Author

jcaip commented Oct 12, 2023

So I think we will need some additional work before subclass tracing and 2:4 sparsity can work together. Loosely, I think the problem is that the op level at which inductor breaks things down before calling into the subclass is more decomposed than what we see in eager (which is what we currently support).

For example, fp16 addmm in eager is a simple swap with torch._cslt_sparse_mm, but in the compiled version it looks like addmm is being broken down into smaller pieces, including casts to fp32 for the computation type. See here for a trace: https://www.internalfb.com/phabricator/paste/view/P851580855

It looks like when compiling, inductor tries different combinations of ops to represent aten.linear:

- transpose -> addmm
- permute(0, 1) -> addmm
- reinterpret_tensor (flip strides) -> addmm.out (instead of addmm.default)

When I try linear -> contiguous -> relu, I also see it trying to run mm and mm.out.

So in order to support this, we'll either need to:

  1. Restrict the decompositions/codegen somehow? I'm not sure exactly what the issue is here, so I will need to dig deeper. It's not clear to me what part of the codebase is trying different combinations.
  2. Add op support for all of the possible generated ops. This is probably something we should do anyway, as some of these combinations may be faster. For example, adding support for passing in the output matrix to the sparse matmul.

permute(0,1) and reinterpret_tensor should be mapped to transpose.

fp32 casts can be no-ops.
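Taken together, the two notes above amount to a small normalization table over the decomposed ops. A torch-free sketch of that idea (op names, the `transposed` flag, and the handlers are illustrative stand-ins, not the actual SparseSemiStructuredTensor dispatch code):

```python
# Hypothetical sketch: normalize the ops inductor emits back down to
# the few operations the sparse subclass actually supports. All names
# here are illustrative.

def handle_transpose(tensor_state):
    # flip a "transposed" flag instead of materializing a transpose
    return {**tensor_state, "transposed": not tensor_state["transposed"]}

def handle_noop(tensor_state):
    # e.g. a cast to fp32 for the computation type: treat as a no-op,
    # since the sparse kernel fixes its own compute type
    return tensor_state

DISPATCH = {
    "aten.t": handle_transpose,
    "aten.permute_0_1": handle_transpose,          # permute(0, 1) == transpose
    "prims.reinterpret_flip_strides": handle_transpose,
    "prims.convert_element_type": handle_noop,     # fp32 cast -> no-op
}

def dispatch(op_name, tensor_state):
    if op_name not in DISPATCH:
        raise NotImplementedError(f"no sparse override for {op_name}")
    return DISPATCH[op_name](tensor_state)

state = {"transposed": False}
state = dispatch("aten.permute_0_1", state)            # behaves like transpose
state = dispatch("prims.convert_element_type", state)  # dropped
```

In the real subclass this logic would live behind __torch_dispatch__, with unsupported ops raising rather than silently falling through.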

Summary:

Placeholder PR for subclassing + 2:4 sparsity

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
jcaip added a commit that referenced this pull request Oct 12, 2023
ghstack-source-id: 743b278
Pull Request resolved: #111049
transposed=not args[0].transposed,
)

if func is torch.ops.prims.convert_element_type.default:
Contributor Author


@bdhirsh These are the ops I added to get this test to "work"; you can comment them out to see where it fails earlier in the chain.

jcaip added a commit that referenced this pull request Oct 13, 2023
ghstack-source-id: 7183856
Pull Request resolved: #111049
@jcaip jcaip changed the title [wip] semi-structured sparse + torch.compile support [sparse] semi-structured sparse + torch.compile support Oct 13, 2023
jcaip added a commit that referenced this pull request Oct 13, 2023
ghstack-source-id: eeba936
Pull Request resolved: #111049
@jcaip jcaip reopened this Oct 18, 2023
@jcaip
Contributor Author

jcaip commented Oct 18, 2023

This is kind of strange.

I was able to reproduce this error running locally, but I don't think it's related to my changes here.

Running the following command yields failures for the torch.compile tests; note that the other cuSPARSELt/CUTLASS tests do not report a memory leak.

```
PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse_semi_structured.py
```

............................................................................................
======================================================================
ERROR: test_conversions_all_patterns_backend_cutlass_cuda_float16 (__main__.TestSparseSemiStructuredCUDA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2452, in wrapper
    with policy():
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 1902, in __exit__
    raise RuntimeError(msg)
RuntimeError: CUDA driver API confirmed a leak in __main__.TestSparseSemiStructuredCUDA.test_conversions_all_patterns_backend_cutlass_cuda_float16! Caching allocator allocated memory was 1536 and is now reported as 2560 on device 0. CUDA driver allocated memory was 1044054016 and is now 1046151168.

To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse_semi_structured.py -k test_conversions_all_patterns_backend_cutlass_cuda_float16

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

======================================================================
ERROR: test_mlp_contiguous_relu_compile_backend_cusparselt_dense_input_shape_(1, 128)_cuda (__main__.TestSparseSemiStructuredCUDA)
Test nn.Linear + .contiguous() + nn.ReLU with SparseSemiStructuredTensor + torch.compile
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2452, in wrapper
    with policy():
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 1902, in __exit__
    raise RuntimeError(msg)
RuntimeError: CUDA driver API confirmed a leak in __main__.TestSparseSemiStructuredCUDA.test_mlp_contiguous_relu_compile_backend_cusparselt_dense_input_shape_(1, 128)_cuda! Caching allocator allocated memory was 2048 and is now reported as 53760 on device 0. CUDA driver allocated memory was 1438318592 and is now 1440415744.

To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse_semi_structured.py -k test_mlp_contiguous_relu_compile_backend_cusparselt_dense_input_shape_(1, 128)_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------------------------------------------------
Ran 168 tests in 110.646s

FAILED (errors=2)

Then I tried commenting out the call to to_sparse_semi_structured, which means we are just comparing torch.compile vs. eager mode. But I still see the compile tests fail; strangely, I see even more failures:

............................................................................................
======================================================================
ERROR: test_conversions_all_patterns_backend_cutlass_cuda_float16 (__main__.TestSparseSemiStructuredCUDA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2452, in wrapper
    with policy():
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 1902, in __exit__
    raise RuntimeError(msg)
RuntimeError: CUDA driver API confirmed a leak in __main__.TestSparseSemiStructuredCUDA.test_conversions_all_patterns_backend_cutlass_cuda_float16! Caching allocator allocated memory was 1536 and is now reported as 2560 on device 0. CUDA driver allocated memory was 1044054016 and is now 1046151168.

To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse_semi_structured.py -k test_conversions_all_patterns_backend_cutlass_cuda_float16

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

======================================================================
ERROR: test_mlp_contiguous_relu_compile_backend_cusparselt_dense_input_shape_(1, 128)_cuda (__main__.TestSparseSemiStructuredCUDA)
Test nn.Linear + .contiguous() + nn.ReLU with SparseSemiStructuredTensor + torch.compile
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2452, in wrapper
    with policy():
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 1902, in __exit__
    raise RuntimeError(msg)
RuntimeError: CUDA driver API confirmed a leak in __main__.TestSparseSemiStructuredCUDA.test_mlp_contiguous_relu_compile_backend_cusparselt_dense_input_shape_(1, 128)_cuda! Caching allocator allocated memory was 2048 and is now reported as 35328 on device 0. CUDA driver allocated memory was 1438318592 and is now 1440415744.

To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse_semi_structured.py -k test_mlp_contiguous_relu_compile_backend_cusparselt_dense_input_shape_(1, 128)_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

======================================================================
ERROR: test_mlp_contiguous_relu_compile_backend_cusparselt_dense_input_shape_(128, 128)_cuda (__main__.TestSparseSemiStructuredCUDA)
Test nn.Linear + .contiguous() + nn.ReLU with SparseSemiStructuredTensor + torch.compile
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2452, in wrapper
    with policy():
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 1902, in __exit__
    raise RuntimeError(msg)
RuntimeError: CUDA driver API confirmed a leak in __main__.TestSparseSemiStructuredCUDA.test_mlp_contiguous_relu_compile_backend_cusparselt_dense_input_shape_(128, 128)_cuda! Caching allocator allocated memory was 35328 and is now reported as 68608 on device 0. CUDA driver allocated memory was 1440415744 and is now 1442512896.

To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse_semi_structured.py -k test_mlp_contiguous_relu_compile_backend_cusparselt_dense_input_shape_(128, 128)_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

======================================================================
ERROR: test_mlp_contiguous_relu_compile_backend_cusparselt_dense_input_shape_(64, 128)_cuda (__main__.TestSparseSemiStructuredCUDA)
Test nn.Linear + .contiguous() + nn.ReLU with SparseSemiStructuredTensor + torch.compile
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2452, in wrapper
    with policy():
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 1902, in __exit__
    raise RuntimeError(msg)
RuntimeError: CUDA driver API confirmed a leak in __main__.TestSparseSemiStructuredCUDA.test_mlp_contiguous_relu_compile_backend_cusparselt_dense_input_shape_(64, 128)_cuda! Caching allocator allocated memory was 68608 and is now reported as 101888 on device 0. CUDA driver allocated memory was 1442512896 and is now 1444610048.
To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse_semi_structured.py -k test_mlp_contiguous_relu_compile_backend_cusparselt_dense_input_shape_(64, 128)_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

======================================================================
ERROR: test_mlp_contiguous_relu_compile_backend_cusparselt_dense_input_shape_(64, 128, 128)_cuda (__main__.TestSparseSemiStructuredCUDA)
Test nn.Linear + .contiguous() + nn.ReLU with SparseSemiStructuredTensor + torch.compile
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2452, in wrapper
    with policy():
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 1902, in __exit__
    raise RuntimeError(msg)
RuntimeError: CUDA driver API confirmed a leak in __main__.TestSparseSemiStructuredCUDA.test_mlp_contiguous_relu_compile_backend_cusparselt_dense_input_shape_(64, 128, 128)_cuda! Caching allocator allocated memory was 101888 and is now reported as 135168 on device 0. CUDA driver allocated memory was 1444610048 and is now 1446707200.

To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse_semi_structured.py -k test_mlp_contiguous_relu_compile_backend_cusparselt_dense_input_shape_(64, 128, 128)_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

======================================================================
ERROR: test_mlp_contiguous_relu_compile_backend_cutlass_dense_input_shape_(128, 128)_cuda (__main__.TestSparseSemiStructuredCUDA)
Test nn.Linear + .contiguous() + nn.ReLU with SparseSemiStructuredTensor + torch.compile
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2452, in wrapper
    with policy():
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 1902, in __exit__
    raise RuntimeError(msg)
RuntimeError: CUDA driver API confirmed a leak in __main__.TestSparseSemiStructuredCUDA.test_mlp_contiguous_relu_compile_backend_cutlass_dense_input_shape_(128, 128)_cuda! Caching allocator allocated memory was 167424 and is now reported as 200704 on device 0. CUDA driver allocated memory was 1446707200 and is now 1448804352.

To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse_semi_structured.py -k test_mlp_contiguous_relu_compile_backend_cutlass_dense_input_shape_(128, 128)_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

======================================================================
ERROR: test_mlp_contiguous_relu_compile_backend_cutlass_dense_input_shape_(64, 128)_cuda (__main__.TestSparseSemiStructuredCUDA)
Test nn.Linear + .contiguous() + nn.ReLU with SparseSemiStructuredTensor + torch.compile
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2453, in wrapper
    method(*args, **kwargs)
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 2452, in wrapper
    with policy():
  File "/home/jessecai/local/A/pytorch/torch/testing/_internal/common_utils.py", line 1902, in __exit__
    raise RuntimeError(msg)
RuntimeError: CUDA driver API confirmed a leak in __main__.TestSparseSemiStructuredCUDA.test_mlp_contiguous_relu_compile_backend_cutlass_dense_input_shape_(64, 128)_cuda! Caching allocator allocated memory was 200704 and is now reported as 233984 on device 0. CUDA driver allocated memory was 1448804352 and is now 1450901504.

To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/test_sparse_semi_structured.py -k test_mlp_contiguous_relu_compile_backend_cutlass_dense_input_shape_(64, 128)_cuda

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------------------------------------------------
Ran 168 tests in 119.071s

FAILED (errors=7)

cc @clee2000 @bdhirsh do y'all have any ideas what might be causing this? Could it be something funky with the tooling / torch.compile?

@bdhirsh
Contributor

bdhirsh commented Oct 20, 2023

Hey @jcaip - I messed around with the test locally, and this seems like a memory leak directly in cutlass / the SparseSemiStructuredTensor subclass. Here's a minimal repro:

```
import torch
import random
from torch.sparse.semi_structured import SparseSemiStructuredTensor, to_sparse_semi_structured

SparseSemiStructuredTensor._FORCE_CUTLASS = True

def f():
    mask_entries = [random.choice([[0, 1], [1, 0]]) for i in range(16384)]
    A = torch.tensor(mask_entries, dtype=torch.float16, device='cuda').reshape(128, 256).contiguous()
    A_sparse = to_sparse_semi_structured(A)

print(torch.cuda.memory_allocated())
f()
print(torch.cuda.memory_allocated())
```

prints:

```
0
512
```

@cpuhrsch
Contributor

@alexsamardzic - Alek can you take a look into this as well?

@alexsamardzic
Collaborator

This is a sort-of-known problem: the code doing the conversion is itself @torch.compile-d PyTorch code, and the memory leak seemingly appears because of this. Namely, the leak disappears if, in torch/sparse/_semi_structured_conversions.py, the line:

```
def sparse_semi_structured_from_dense_cutlass(dense, compile=True):
```

is changed to:

```
def sparse_semi_structured_from_dense_cutlass(dense, compile=False):
```

While trying to change the function in question to come up with a minimal self-contained example, I came up with something seemingly unrelated:

```
import torch

@torch.compile
def foo(A):
    return A

def f():
    A = torch.ones(128, dtype=torch.float16, device='cuda')
    B = foo(A)

print(torch.cuda.memory_allocated())
f()
print(torch.cuda.memory_allocated())
```

It will also print 512 instead of 0.

@lezcano @peterbell10

@jcaip
Contributor Author

jcaip commented Oct 23, 2023

@alexsamardzic What do you think about setting this flag to False, now that we have torch.compile support for subclasses? I think it can also cause issues if we call torch.compile again later.

@alexsamardzic
Collaborator

That's fine with me.

@jcaip
Contributor Author

jcaip commented Oct 23, 2023

Ah I believe the test is failing because I was not using torch._dynamo.test_case.TestCase for my tests.

Writing a new test for semi-structured sparse should fix this.
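For readers unfamiliar with why the base class matters: torch._dynamo.test_case.TestCase resets compiler state around each test, so cached compile artifacts from one test don't register as leaked memory in the next. A torch-free sketch of the same isolation pattern with plain unittest (the cache and reset function here are hypothetical stand-ins):

```python
# Illustrative sketch of per-test state isolation, the pattern that
# torch._dynamo.test_case.TestCase provides for dynamo's caches.
import unittest

_global_cache = {}  # stands in for a compiler's global caches

def reset_state():
    # analogous to resetting compiler state between tests
    _global_cache.clear()

class IsolatedTestCase(unittest.TestCase):
    def setUp(self):
        reset_state()

    def tearDown(self):
        reset_state()

class Example(IsolatedTestCase):
    def test_populates_cache(self):
        _global_cache["artifact"] = object()
        self.assertIn("artifact", _global_cache)

    def test_sees_clean_state(self):
        # without the reset, the previous test's artifact would linger
        # and could be misread as a leak
        self.assertEqual(_global_cache, {})
```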

jcaip added a commit that referenced this pull request Oct 23, 2023

ghstack-source-id: 8e127cc
Pull Request resolved: #111049
@jcaip
Contributor Author

jcaip commented Oct 24, 2023

@pytorchbot merge -f "passing ciflow/slow now and unrelated failing test"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@facebook-github-bot facebook-github-bot deleted the gh/jcaip/46/head branch October 27, 2023 14:25
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Nov 7, 2023
Pull Request resolved: pytorch#111049
Approved by: https://github.com/cpuhrsch
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023
Pull Request resolved: pytorch#111049
Approved by: https://github.com/cpuhrsch

Labels

ciflow/slow ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: sparse release notes category

7 participants