[Inductor CUTLASS backend] Step 2: CUDACodeCache by ipiszy · Pull Request #107847 · pytorch/pytorch · GitHub

Conversation

[ghstack-poisoned]

pytorch-bot bot commented Aug 24, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107847

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit 5fd7946 with merge base f9a250c:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ipiszy added a commit that referenced this pull request Aug 24, 2023
ghstack-source-id: f78a86d
Pull Request resolved: #107847
@ipiszy ipiszy changed the title CUDAcodecache [Inductor CUTLASS backend] Step 2: CUDACodeCache Aug 24, 2023


def _cuda_compiler() -> str:
return "nvcc"
Contributor

Maybe we should use the same nvcc executable that is used and detected by torch's CMAKE build script, as that might not be the one that's on the path.

Contributor Author

Do you refer to cmake/public/cuda.cmake? From my understanding, that is written in the CMake language and requires cmake to execute. I'm not familiar with CMake, and the integration doesn't look trivial. Since the cutlass integration is new, I'd like to make it work first and then update the surrounding pieces, e.g. env setup. For now I'll add a config in Inductor to let users specify the nvcc path if they need to.

Contributor

This won't work, since CMake often runs on a different machine than the one pytorch gets pip-installed on.


if f_dlclose is not None:
f_dlclose.argtypes = [c_void_p]
f_dlclose(self.DLL._handle)
Contributor

I think if this returns, we should set self.DLL = None

Contributor Author

All self.DLL access is guarded by self.is_open already.
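The guard pattern being discussed can be sketched as a minimal wrapper, assuming a Linux/glibc environment where `dlclose` is resolvable via `ctypes`; `DLLWrapper` here is a simplified stand-in for the class in the PR:

```python
import ctypes
from ctypes import c_void_p


class DLLWrapper:
    # Sketch of the pattern above: all use of self.DLL is guarded by
    # self.is_open, so close() is safe to call and idempotent.
    def __init__(self, lib_path):
        self.DLL = ctypes.CDLL(lib_path)
        self.is_open = True

    def close(self):
        if not self.is_open:
            return
        # ctypes does not expose dlclose directly; look it up among the
        # process's global symbols (works on glibc-based Linux).
        f_dlclose = getattr(ctypes.CDLL(None), "dlclose", None)
        if f_dlclose is not None:
            f_dlclose.argtypes = [c_void_p]
            f_dlclose(self.DLL._handle)
        self.is_open = False
```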

"""

cuda_command = repr(
cuda_compile_command(["dummy_input"], "dummy_output", dst_file_ext)
Contributor

Is this "dummy_input" intentional?

Contributor Author

Yes, this is intentional. It is used to generate a dummy command for code-hashing purposes.
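The idea above can be sketched like this: the placeholder input/output names keep the command string stable across real file paths, while the compiler flags still fold into the cache key. `cuda_compile_command` here is a simplified stand-in for the real helper, which assembles the full nvcc flag set:

```python
import hashlib


def cuda_compile_command(src_files, dst_file, dst_file_ext):
    # Simplified stand-in: only the shape of the command matters here.
    return f"nvcc {' '.join(src_files)} -o {dst_file}.{dst_file_ext}"


def cuda_code_hash(source_code, dst_file_ext):
    # "dummy_input"/"dummy_output" keep the command stable across real
    # file names, so only the source and the flags affect the key.
    cuda_command = repr(
        cuda_compile_command(["dummy_input"], "dummy_output", dst_file_ext)
    )
    return hashlib.sha256((source_code + cuda_command).encode()).hexdigest()
```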

ipiszy added 8 commits August 25, 2023 11:00
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ngimel yf225 chenyang78 kadeng muchulee8 aakhundov

[ghstack-poisoned]
@ipiszy ipiszy left a comment

Thanks @kadeng !

ipiszy added 2 commits August 26, 2023 12:19
This is the step 2 to add cutlass as an alternative inductor backend.
Feature request: #106991.
@ipiszy ipiszy requested review from aakhundov and jansel August 27, 2023 01:34
@ipiszy ipiszy marked this pull request as ready for review August 27, 2023 01:34
use_fast_math = False

# Path to CUDA NVCC.
# NVCC search priority:
Contributor

Nit: "NVCC search order"?

Comment on lines 433 to 434
# 2)CUDACXX env
# 3)CUDA_HOME env
Contributor

Better expand to "environment variable", imho.

from torch._inductor.exc import CUDACompileError
from torch.testing._internal.common_utils import TestCase as TorchTestCase

_SOURCE_CODE = r"""
Contributor

This looks specific to the TestCUDACodeCache. Hide it within the class?

Contributor Author

I feel it's okay since this file is only for test_cudacodecache, and the variable follows the underscore prefix convention to mark it as internal only.

return None


def nvcc_exist(nvcc_path: str = "nvcc") -> bool:
Contributor

Nit: Optional[str]. mypy will be unhappy otherwise.

Contributor Author

Seems that it's okay.

extra_ldflags.append("-lcuda")
extra_ldflags.append("-lcudart")
else:
raise NotImplementedError("Unsupported env, failed to find cuda libs!")
Contributor

If only Linux is supported at the moment, maybe let's be specific about it in the exception text?



def _cuda_lib_options() -> List[str]:
from torch.utils import cpp_extension
Contributor

Just in case: the other day I found out that cpp_extension doesn't quite work internally (in my case, for inline compilation with ninja). Maybe for this particular use case it's fine, though.

Contributor Author

By internally, do you mean fbcode or something else? Could you help explain more?

Contributor

Yes, in fbcode. More context in this comment. Maybe irrelevant for this PR, just a heads-up.

__attribute__((__visibility__("default")))
int saxpy(int n, float a, float *x, float *y) {
// Perform SAXPY
Contributor

what is SAXPY? Does it stand for something?

Contributor Author

Haha it's "Single-precision A*X Plus Y". This code is from https://developer.nvidia.com/blog/easy-introduction-cuda-c-and-c/.
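As a pure-Python reference for what the CUDA kernel in the test source computes, SAXPY is just an element-wise fused multiply-add:

```python
def saxpy(n, a, x, y):
    # SAXPY = "Single-precision A*X Plus Y": y[i] = a * x[i] + y[i].
    # Pure-Python reference for the CUDA kernel in the test source.
    for i in range(n):
        y[i] = a * x[i] + y[i]
    return y
```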




def _cutlass_include_paths() -> List[str]:
from torch.utils import cpp_extension

cutlass_path = os.path.join(cpp_extension._TORCH_PATH, "../third_party/cutlass")
Contributor

I don't think this will work when pytorch is pip-installed; we may need to copy the headers.

Contributor Author

Refined it a bit to fetch cutlass path from the default inductor config and allow users defining a custom cutlass path.

@kadeng kadeng left a comment

looks good to me

return None


def nvcc_exist(nvcc_path: str = "nvcc") -> bool:
Contributor

functools.lru_cache around this since subprocess.call is expensive.
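The suggested change might look like the following sketch (the probe itself, a `--version` call, is an assumption about how `nvcc_exist` checks for the binary):

```python
import functools
import subprocess
from typing import Optional


@functools.lru_cache(maxsize=None)
def nvcc_exist(nvcc_path: Optional[str] = "nvcc") -> bool:
    # Cache the result: probing spawns a subprocess, which is expensive,
    # and the answer will not change within one process lifetime.
    if nvcc_path is None:
        return False
    try:
        subprocess.check_output(
            [nvcc_path, "--version"], stderr=subprocess.STDOUT
        )
        return True
    except (FileNotFoundError, subprocess.CalledProcessError):
        return False
```

Typing the parameter as `Optional[str]` also addresses the mypy nit raised earlier in the review.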


ipiszy commented Sep 9, 2023

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Sep 9, 2023
ipiszy commented Sep 11, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 11, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-12-py3-arm64 / test (default, 3, 3, macos-m1-12)


pytorchmergebot pushed a commit that referenced this pull request Sep 12, 2023
…kRequest (#107901)

This is the step 3 to add cutlass as an alternative inductor backend.
Full tests can be found from the last PR in the stack.

Feature request: #106991.

Pull Request resolved: #107901
Approved by: https://github.com/jansel, https://github.com/aakhundov, https://github.com/kadeng
ghstack dependencies: #107802, #107847
pytorchmergebot pushed a commit that referenced this pull request Sep 12, 2023
This is the step 4 to add cutlass as an alternative inductor backend.
Full tests can be found from the last PR in the stack.

Feature request: #106991.

Pull Request resolved: #107931
Approved by: https://github.com/aakhundov, https://github.com/jansel, https://github.com/kadeng
ghstack dependencies: #107802, #107847, #107901
pytorchmergebot pushed a commit that referenced this pull request Sep 12, 2023
This is the step 5 to add cutlass as an alternative inductor backend.

Feature request: #106991.

Pull Request resolved: #108015
Approved by: https://github.com/kadeng, https://github.com/jansel, https://github.com/aakhundov
ghstack dependencies: #107802, #107847, #107901, #107931
@facebook-github-bot facebook-github-bot deleted the gh/ipiszy@gmail.com/2/head branch September 16, 2023 14:23

5 participants