[Inductor CUTLASS backend] Step 2: CUDACodeCache #107847
Conversation
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/107847
Note: links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 2 Unrelated Failures as of commit 5fd7946 with merge base f9a250c:
- NEW FAILURE: the following job has failed.
- BROKEN TRUNK: the following job failed but was present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
- UNSTABLE: the following job failed but was likely due to flakiness present on trunk and has been marked as unstable.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
    def _cuda_compiler() -> str:
        return "nvcc"
Maybe we should use the same nvcc executable that is used and detected by torch's CMAKE build script, as that might not be the one that's on the path.
Do you refer to cmake/public/cuda.cmake? From my understanding, this is written in the CMake language and requires cmake to execute. I'm not familiar with CMake, and the integration doesn't look trivial. Since the cutlass integration is new, I'd like to make it work first and then update the surrounding pieces, e.g. env setup. For now I would just add a config in Inductor to let users specify the nvcc path if they need to.
This won't work since CMAKE often runs on a different machine than pytorch gets pip installed on.
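A runtime resolution along these lines sidesteps the CMake problem: the compiler is picked on the machine where PyTorch is installed, not where it was built. The helper below is a hypothetical sketch — `resolve_nvcc` and its config parameter are illustrative names, not the actual Inductor API — following the search order discussed in this thread (user config, then `CUDACXX`, then `CUDA_HOME`, then PATH):

```python
import os
import shutil
from typing import Optional


def resolve_nvcc(config_path: Optional[str] = None) -> str:
    """Pick an nvcc executable at runtime, on the install machine.

    Hypothetical helper: a user-supplied Inductor config value wins,
    then environment variables, then a PATH lookup.
    """
    # 1) explicit user config (e.g. an Inductor config option)
    if config_path:
        return config_path
    # 2) CUDACXX environment variable
    cudacxx = os.environ.get("CUDACXX")
    if cudacxx:
        return cudacxx
    # 3) nvcc under CUDA_HOME, if it exists
    cuda_home = os.environ.get("CUDA_HOME")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.exists(candidate):
            return candidate
    # 4) fall back to whatever "nvcc" is on PATH
    return shutil.which("nvcc") or "nvcc"
```

Because resolution happens at call time rather than at build time, a pip-installed wheel can still find a locally installed toolkit.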
    if f_dlclose is not None:
        f_dlclose.argtypes = [c_void_p]
        f_dlclose(self.DLL._handle)
I think if this returns, we should set self.DLL = None
All self.DLL access is guarded by self.is_open already.
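For reference, the pattern under discussion looks roughly like this: the `dlclose` symbol is looked up from the running process, the handle is closed once, and an `is_open` flag guards any later use. This is a sketch mirroring the review discussion, not the exact upstream class:

```python
import ctypes
from ctypes import CDLL, c_void_p


class DLLWrapper:
    """Sketch of the close pattern: dlclose the handle once, and guard
    all later access with `is_open` (names are illustrative)."""

    def __init__(self, lib_path: str):
        self.DLL = CDLL(lib_path)
        self.is_open = True

    def close(self) -> None:
        if not self.is_open:
            return  # already closed; self.DLL is never touched again
        f_dlclose = None
        # On Linux, the process itself exposes dlclose via libc/libdl.
        dl = ctypes.CDLL(None)
        if hasattr(dl, "dlclose"):
            f_dlclose = dl.dlclose
        if f_dlclose is not None:
            f_dlclose.argtypes = [c_void_p]
            f_dlclose(self.DLL._handle)
        self.is_open = False
```

Setting the flag after the dlclose call makes `close()` idempotent, which is the invariant the reply above relies on.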
| """ | ||
|
|
||
    cuda_command = repr(
        cuda_compile_command(["dummy_input"], "dummy_output", dst_file_ext)
    )
Is this "dummy_input" intentional?
Yes, this is intentional. It is used to generate a dummy command for code-hashing purposes.
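The idea can be illustrated with a small sketch (hypothetical helper, with an illustrative stand-in for `cuda_compile_command`): because the input and output names are fixed dummies, the resulting cache key changes only when the source or the compile flags change, never with the actual cache-directory paths:

```python
import hashlib


def code_hash_key(source_code: str, dst_file_ext: str) -> str:
    """Sketch of the caching idea, not the exact upstream code."""

    def dummy_compile_command(srcs, out, ext):
        # Stand-in for cuda_compile_command(); the flags are illustrative.
        flags = "-O3 --shared" if ext == "so" else "-O3 -c"
        return f"nvcc {flags} {' '.join(srcs)} -o {out}"

    # Fixed dummy names keep the key independent of real file locations.
    cuda_command = repr(
        dummy_compile_command(["dummy_input"], "dummy_output", dst_file_ext)
    )
    return hashlib.sha256((source_code + cuda_command).encode()).hexdigest()
```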
Thanks @kadeng !
| """ | ||
|
|
||
| cuda_command = repr( | ||
| cuda_compile_command(["dummy_input"], "dummy_output", dst_file_ext) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is intentional. This is used to generate a dummy command for code hash purpose.
|
|
||
| if f_dlclose is not None: | ||
| f_dlclose.argtypes = [c_void_p] | ||
| f_dlclose(self.DLL._handle) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All self.DLL access is guarded by self.is_open already.
torch/_inductor/config.py
    use_fast_math = False

    # Path to CUDA NVCC.
    # NVCC search priority:
Nit: "NVCC search order"?
torch/_inductor/config.py
    # 2) CUDACXX env
    # 3) CUDA_HOME env
Better expand to "environment variable", imho.
    from torch._inductor.exc import CUDACompileError
    from torch.testing._internal.common_utils import TestCase as TorchTestCase


    _SOURCE_CODE = r"""
This looks specific to the TestCUDACodeCache. Hide it within the class?
I feel it's okay, since this file is only for test_cudacodecache and the variable follows the underscore-prefix convention marking it as internal.
        return None


    def nvcc_exist(nvcc_path: str = "nvcc") -> bool:
Nit: Optional[str]. mypy will be unhappy otherwise.
Seems that it's okay.
torch/_inductor/codecache.py
            extra_ldflags.append("-lcuda")
            extra_ldflags.append("-lcudart")
        else:
            raise NotImplementedError("Unsupported env, failed to find cuda libs!")
If only Linux is supported at the moment, maybe let's be specific about it in the exception text?
    def _cuda_lib_options() -> List[str]:
        from torch.utils import cpp_extension
Just in case: the other day I found out that cpp_extension doesn't quite work internally (in my case: for inline compilation with ninja). Maybe for this particular use case it's fine, though.
By internally, do you mean fbcode or something else? Could you help explain more?
Yes, in fbcode. More context in this comment. Maybe irrelevant for this PR, just a heads-up.
    __attribute__((__visibility__("default")))
    int saxpy(int n, float a, float *x, float *y) {
        // Perform SAXPY
what is SAXPY? Does it stand for something?
Haha it's "Single-precision A*X Plus Y". This code is from https://developer.nvidia.com/blog/easy-introduction-cuda-c-and-c/.
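For readers unfamiliar with the term, here is a pure-Python reference of what the CUDA test kernel computes, i.e. `y[i] = a * x[i] + y[i]` over all elements:

```python
from typing import List


def saxpy(n: int, a: float, x: List[float], y: List[float]) -> List[float]:
    """Single-precision A*X Plus Y: updates y in place, element by element.

    Python reference for the CUDA kernel used in the test source.
    """
    for i in range(n):
        y[i] = a * x[i] + y[i]
    return y
```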
torch/_inductor/codecache.py
    def _cutlass_include_paths() -> List[str]:
        from torch.utils import cpp_extension

        cutlass_path = os.path.join(cpp_extension._TORCH_PATH, "../third_party/cutlass")
I don't think this will work when pytorch is pip installed, may need to copy headers.
Refined it a bit to fetch the cutlass path from the default inductor config and allow users to define a custom cutlass path.
looks good to me
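The refined resolution might look roughly like this (hypothetical signature; the include subdirectories shown follow the CUTLASS repository layout, and the fallback mirrors the in-tree third_party path quoted above):

```python
import os
from typing import List, Optional


def cutlass_include_paths(cutlass_dir: Optional[str], torch_path: str) -> List[str]:
    """Sketch: prefer a user-configured CUTLASS checkout, otherwise fall
    back to the third_party copy shipped next to the torch sources."""
    root = cutlass_dir or os.path.join(torch_path, "../third_party/cutlass")
    return [
        os.path.join(root, "include"),
        os.path.join(root, "tools/library/include"),
        os.path.join(root, "tools/util/include"),
    ]
```

Letting the config override the root is what makes pip installs work: the user points at any CUTLASS checkout on disk instead of relying on headers bundled in the wheel.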
    def nvcc_exist(nvcc_path: str = "nvcc") -> bool:
Add functools.lru_cache around this, since subprocess.call is expensive.
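The suggestion amounts to something like this (sketch; probing via `which` is illustrative of how the existence check might be done):

```python
import functools
import subprocess


@functools.lru_cache(maxsize=None)
def nvcc_exist(nvcc_path: str = "nvcc") -> bool:
    """Probe whether an nvcc executable is reachable at the given path.

    lru_cache memoizes the result per path, so the expensive subprocess
    call runs at most once for each distinct argument.
    """
    if nvcc_path is None:
        return False
    return (
        subprocess.call(
            ["which", nvcc_path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        == 0
    )
```

Since the default argument is hashable, repeated calls during a compile session hit the cache instead of spawning a process each time.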
@pytorchbot label "topic: not user facing"

@pytorchbot merge
Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed, first few of them are: trunk / macos-12-py3-arm64 / test (default, 3, 3, macos-m1-12). Details for Dev Infra team: raised by workflow job.
…kRequest (#107901) This is the step 3 to add cutlass as an alternative inductor backend. Full tests can be found from the last PR in the stack. Feature request: #106991. Pull Request resolved: #107901 Approved by: https://github.com/jansel, https://github.com/aakhundov, https://github.com/kadeng ghstack dependencies: #107802, #107847
This is the step 4 to add cutlass as an alternative inductor backend. Full tests can be found from the last PR in the stack. Feature request: #106991. Pull Request resolved: #107931 Approved by: https://github.com/aakhundov, https://github.com/jansel, https://github.com/kadeng ghstack dependencies: #107802, #107847, #107901
This is the step 5 to add cutlass as an alternative inductor backend. Feature request: #106991. Pull Request resolved: #108015 Approved by: https://github.com/kadeng, https://github.com/jansel, https://github.com/aakhundov ghstack dependencies: #107802, #107847, #107901, #107931
This is the step 2 to add cutlass as an alternative inductor backend.
Feature request: #106991.
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov