[Inductor CUTLASS backend] Step 2: CUDACodeCache #107847
Conversation
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/107847
Note: links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 2 Unrelated Failures as of commit 5fd7946 with merge base f9a250c:
- NEW FAILURE: the following job has failed.
- BROKEN TRUNK: the following job failed but was present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
- UNSTABLE: the following job failed but was likely due to flakiness present on trunk and has been marked as unstable.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
    def _cuda_compiler() -> str:
        return "nvcc"
Maybe we should use the same nvcc executable that is used and detected by torch's CMAKE build script, as that might not be the one that's on the path.
Do you refer to cmake/public/cuda.cmake? From my understanding, this is written in the CMake language and requires cmake to execute. I'm not familiar with CMake, and the integration doesn't look trivial. Since the cutlass integration is new, I'd like to make it work first and then update the surrounding pieces, e.g. env setup. For now I would just add a config in Inductor to let users specify the nvcc path if they need to.
This won't work since CMAKE often runs on a different machine than pytorch gets pip installed on.
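A runtime resolution along these lines sidesteps the CMake problem: the compiler is picked on the machine where PyTorch is installed, not where it was built. The helper below is a hypothetical sketch — `resolve_nvcc` and its config parameter are illustrative names, not the actual Inductor API — following the search order discussed in this thread (user config, then `CUDACXX`, then `CUDA_HOME`, then PATH):

```python
import os
import shutil
from typing import Optional


def resolve_nvcc(config_path: Optional[str] = None) -> str:
    """Pick an nvcc executable at runtime, on the install machine.

    Hypothetical helper: a user-supplied Inductor config value wins,
    then environment variables, then a PATH lookup.
    """
    # 1) explicit user config (e.g. an Inductor config option)
    if config_path:
        return config_path
    # 2) CUDACXX environment variable
    cudacxx = os.environ.get("CUDACXX")
    if cudacxx:
        return cudacxx
    # 3) nvcc under CUDA_HOME, if it exists
    cuda_home = os.environ.get("CUDA_HOME")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.exists(candidate):
            return candidate
    # 4) fall back to whatever "nvcc" is on PATH
    return shutil.which("nvcc") or "nvcc"
```

Because resolution happens at call time rather than at build time, a pip-installed wheel can still find a locally installed toolkit.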
    if f_dlclose is not None:
        f_dlclose.argtypes = [c_void_p]
        f_dlclose(self.DLL._handle)
I think if this returns, we should set self.DLL = None
All self.DLL access is guarded by self.is_open already.
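For reference, the pattern under discussion looks roughly like this: the `dlclose` symbol is looked up from the running process, the handle is closed once, and an `is_open` flag guards any later use. This is a sketch mirroring the review discussion, not the exact upstream class:

```python
import ctypes
from ctypes import CDLL, c_void_p


class DLLWrapper:
    """Sketch of the close pattern: dlclose the handle once, and guard
    all later access with `is_open` (names are illustrative)."""

    def __init__(self, lib_path: str):
        self.DLL = CDLL(lib_path)
        self.is_open = True

    def close(self) -> None:
        if not self.is_open:
            return  # already closed; self.DLL is never touched again
        f_dlclose = None
        # On Linux, the process itself exposes dlclose via libc/libdl.
        dl = ctypes.CDLL(None)
        if hasattr(dl, "dlclose"):
            f_dlclose = dl.dlclose
        if f_dlclose is not None:
            f_dlclose.argtypes = [c_void_p]
            f_dlclose(self.DLL._handle)
        self.is_open = False
```

Setting the flag after the dlclose call makes `close()` idempotent, which is the invariant the reply above relies on.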
| """ | ||
|
|
||
    cuda_command = repr(
        cuda_compile_command(["dummy_input"], "dummy_output", dst_file_ext)
    )
Is this "dummy_input" intentional?
Yes, this is intentional. It is used to generate a dummy command for code-hashing purposes.
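The idea can be illustrated with a small sketch (hypothetical helper, with an illustrative stand-in for `cuda_compile_command`): because the input and output names are fixed dummies, the resulting cache key changes only when the source or the compile flags change, never with the actual cache-directory paths:

```python
import hashlib


def code_hash_key(source_code: str, dst_file_ext: str) -> str:
    """Sketch of the caching idea, not the exact upstream code."""

    def dummy_compile_command(srcs, out, ext):
        # Stand-in for cuda_compile_command(); the flags are illustrative.
        flags = "-O3 --shared" if ext == "so" else "-O3 -c"
        return f"nvcc {flags} {' '.join(srcs)} -o {out}"

    # Fixed dummy names keep the key independent of real file locations.
    cuda_command = repr(
        dummy_compile_command(["dummy_input"], "dummy_output", dst_file_ext)
    )
    return hashlib.sha256((source_code + cuda_command).encode()).hexdigest()
```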
Thanks @kadeng !
| """ | ||
|
|
||
| cuda_command = repr( | ||
| cuda_compile_command(["dummy_input"], "dummy_output", dst_file_ext) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is intentional. This is used to generate a dummy command for code hash purpose.
|
|
||
| if f_dlclose is not None: | ||
| f_dlclose.argtypes = [c_void_p] | ||
| f_dlclose(self.DLL._handle) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All self.DLL access is guarded by self.is_open already.
torch/_inductor/config.py
    use_fast_math = False

    # Path to CUDA NVCC.
    # NVCC search priority:
Nit: "NVCC search order"?
torch/_inductor/config.py
    # 2) CUDACXX env
    # 3) CUDA_HOME env
Better expand to "environment variable", imho.
    from torch._inductor.exc import CUDACompileError
    from torch.testing._internal.common_utils import TestCase as TorchTestCase


    _SOURCE_CODE = r"""
This looks specific to the TestCUDACodeCache. Hide it within the class?
I feel it's okay, since this file is only for test_cudacodecache and the variable follows the underscore-prefix convention marking it as internal.
        return None


    def nvcc_exist(nvcc_path: str = "nvcc") -> bool:
Nit: Optional[str]. mypy will be unhappy otherwise.
Seems that it's okay.
torch/_inductor/codecache.py
            extra_ldflags.append("-lcuda")
            extra_ldflags.append("-lcudart")
        else:
            raise NotImplementedError("Unsupported env, failed to find cuda libs!")
If only Linux is supported at the moment, maybe let's be specific about it in the exception text?
    def _cuda_lib_options() -> List[str]:
        from torch.utils import cpp_extension
Just in case: the other day I found out that cpp_extension doesn't quite work internally (in my case: for inline compilation with ninja). Maybe for this particular use case it's fine, though.
By internally, do you mean fbcode or something else? Could you help explain more?
Yes, in fbcode. More context in this comment. Maybe irrelevant for this PR, just a heads-up.
    __attribute__((__visibility__("default")))
    int saxpy(int n, float a, float *x, float *y) {
        // Perform SAXPY
what is SAXPY? Does it stand for something?
Haha it's "Single-precision A*X Plus Y". This code is from https://developer.nvidia.com/blog/easy-introduction-cuda-c-and-c/.
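For readers unfamiliar with the term, here is a pure-Python reference of what the CUDA test kernel computes, i.e. `y[i] = a * x[i] + y[i]` over all elements:

```python
from typing import List


def saxpy(n: int, a: float, x: List[float], y: List[float]) -> List[float]:
    """Single-precision A*X Plus Y: updates y in place, element by element.

    Python reference for the CUDA kernel used in the test source.
    """
    for i in range(n):
        y[i] = a * x[i] + y[i]
    return y
```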
torch/_inductor/codecache.py
    def _cutlass_include_paths() -> List[str]:
        from torch.utils import cpp_extension

        cutlass_path = os.path.join(cpp_extension._TORCH_PATH, "../third_party/cutlass")
I don't think this will work when pytorch is pip installed, may need to copy headers.
Refined it a bit to fetch the cutlass path from the default inductor config and allow users to define a custom cutlass path.
looks good to me
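The refined resolution might look roughly like this (hypothetical signature; the include subdirectories shown follow the CUTLASS repository layout, and the fallback mirrors the in-tree third_party path quoted above):

```python
import os
from typing import List, Optional


def cutlass_include_paths(cutlass_dir: Optional[str], torch_path: str) -> List[str]:
    """Sketch: prefer a user-configured CUTLASS checkout, otherwise fall
    back to the third_party copy shipped next to the torch sources."""
    root = cutlass_dir or os.path.join(torch_path, "../third_party/cutlass")
    return [
        os.path.join(root, "include"),
        os.path.join(root, "tools/library/include"),
        os.path.join(root, "tools/util/include"),
    ]
```

Letting the config override the root is what makes pip installs work: the user points at any CUTLASS checkout on disk instead of relying on headers bundled in the wheel.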
    def nvcc_exist(nvcc_path: str = "nvcc") -> bool:
Add functools.lru_cache around this, since subprocess.call is expensive.
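The suggestion amounts to something like this (sketch; probing via `which` is illustrative of how the existence check might be done):

```python
import functools
import subprocess


@functools.lru_cache(maxsize=None)
def nvcc_exist(nvcc_path: str = "nvcc") -> bool:
    """Probe whether an nvcc executable is reachable at the given path.

    lru_cache memoizes the result per path, so the expensive subprocess
    call runs at most once for each distinct argument.
    """
    if nvcc_path is None:
        return False
    return (
        subprocess.call(
            ["which", nvcc_path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        == 0
    )
```

Since the default argument is hashable, repeated calls during a compile session hit the cache instead of spawning a process each time.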
@pytorchbot label "topic: not user facing"

@pytorchbot merge
Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed, first few of them are: trunk / macos-12-py3-arm64 / test (default, 3, 3, macos-m1-12). Details for Dev Infra team: raised by workflow job.
…kRequest (#107901) This is the step 3 to add cutlass as an alternative inductor backend. Full tests can be found from the last PR in the stack. Feature request: #106991. Pull Request resolved: #107901 Approved by: https://github.com/jansel, https://github.com/aakhundov, https://github.com/kadeng ghstack dependencies: #107802, #107847
This is the step 4 to add cutlass as an alternative inductor backend. Full tests can be found from the last PR in the stack. Feature request: #106991. Pull Request resolved: #107931 Approved by: https://github.com/aakhundov, https://github.com/jansel, https://github.com/kadeng ghstack dependencies: #107802, #107847, #107901
This is the step 5 to add cutlass as an alternative inductor backend. Feature request: #106991. Pull Request resolved: #108015 Approved by: https://github.com/kadeng, https://github.com/jansel, https://github.com/aakhundov ghstack dependencies: #107802, #107847, #107901, #107931
This is the step 2 to add cutlass as an alternative inductor backend.
Feature request: #106991.
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov