-
Notifications
You must be signed in to change notification settings - Fork 25.7k
[MPS] Compile kernels into Metallib #138636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138636
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 New FailureAs of commit 01db75e with merge base 5e4c8b6 ( NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
Nice improvement! We rely on existing tests? |
Yes, existing tests should be sufficient, though I would need to figure out how to install Xcode on our runners, otherwise they'll have to use embedded codepath |
94e7c46 to
3a0a1f2
Compare
| return() | ||
| endif() | ||
|
|
||
| set(BFLOAT_METAL_CODE " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have metal code in the .cmake file? Can me move that inside some bfloat_utils.metal file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is not a real kernels, it just checks if host can compile something, this is a pretty standard practice to embed trivial test code in .cmake files, see
pytorch/cmake/Modules/FindAVX.cmake
Lines 5 to 14 in 86b45bd
| SET(AVX_CODE " | |
| #include <immintrin.h> | |
| int main() | |
| { | |
| __m256 a; | |
| a = _mm256_set1_ps(0); | |
| return 0; | |
| } | |
| ") |
Pros: - Faster runtime/validation, allows one to use shared headers Cons: - Needs a bit of an build system work - Needs XCode rather just a CommandLineTools to make it work
05e37d3 to
9f3fd2e
Compare
|
@pytorchbot merge -f "Builds and tests are green" |
Merge startedYour change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
|
started getting build errors on my mac about missing aten/src/ATen/metallib_dummy.cpp locally on mac running |
Line 688 in 3179eb1
|
Not sure how it works on some machines, but clean build fails for me after #138636 was landed, even though it works fine on another machine. Solution is to create an empty file when one adds a dependency, but later this dependency will be updated by the build rule
Not sure how it works on some machines, but clean build fails for me after #138636 was landed, even though it works fine on another machine. Solution is to create an empty file when one adds a dependency, but later this dependency will be updated by the build rule Pull Request resolved: #139651 Approved by: https://github.com/atalman
PyTorch MPS backend for the most part relies on MPSGraph to provide specific operations, but recently more and more often one had to implement custom kernel here that were simply embedded in the operator codebase and were compiled directly using [`- id<MTLLibrary>newLibraryWithSource:options:error:`](https://developer.apple.com/documentation/metal/mtldevice/1433431-newlibrarywithsource) (first metal kernel to MPS backend was added in pytorch#82307 ) Later on, as number of operator grew, those were refactored into `MetalShaderLibrary` convenience class (see pytorch#125550 ) But as number of kernels keeps growing, it's time to make a next step and properly compile them into `.metalib` This PR does exactly that by: - Moving shader sources into separate .metal files - Adds check on whether full Xcode installed or just DeveloperTools - If full Xcode is installed, compiles and links shaders into .metallib for Metal-3.0(Available on MacOS 13) and Metal-3.1 standard (available on MacOS 14, can use bfloat) and bundles both using `-sectcreate` linker option and `getsectiondata` API call. `metallib_dummy.cpp` file is used to properly express dependencies between metallib build and torch_cpu link stages. Logic for generating metallibraries is loosely based on https://github.com/ml-explore/mlx/blob/main/mlx/backend/metal/kernels/CMakeLists.txt. - If only DeveloperTools CLI is installed, automatically wraps .metal into `_metallib.h` that contains shader source wrapped in `MetalShaderLibrary` Bulk of changes introduced in this PR are just moving code around. I.e. for every file that contains non-templated shader definition in `aten/src/ATen/native/mps/operators` folder, corresponding `.metal` file is created in `aten/src/ATen/native/mps/kernels` folder and embedded shader definition is replaced with the following ```cpp #ifndef PYTORCH_JIT_COMPILE_SHADERS static auto& lib = MetalShaderLibrary::getBundledLibrary(); #else #include <ATen/native/mps/OpName_metallib.h> #endif ``` Some historical stats: | PyTorch Version | Number of shaders in MPS | Ops added | | ------------- | ------------- | ---- | | 1.12 | 0 | | | 1.13 | 2 | bitwise_ops and index.out | | 2.0 | 4 | cross repeat and view) | | 2.1 | 9 | unary_ops, histogram, renorm, binary_ops | | 2.2 | 11 | gamma and bucketization | | 2.3 | 12 | naive_matmul (to workaround crash) | | 2.4 | 13 | quantized_mm | | 2.5 | 14 | fused_adam | Pros: - Better code structure/readability - Eventually allows one to use shared headers (and implement something like `TensorIterator`) - Faster runtime (as compilation is done ahead of time) and perhaps better optimized compiled kernels Cons: - Build process is a bit more complicated that it used to be - Need to maintain two codepath (as our CI builders only has DeveloperTools installed) Pull Request resolved: pytorch#138636 Approved by: https://github.com/manuelcandales
PyTorch MPS backend for the most part relies on MPSGraph to provide specific operations, but recently more and more often one had to implement custom kernel here that were simply embedded in the operator codebase and were compiled directly using
- id<MTLLibrary>newLibraryWithSource:options:error:(first metal kernel to MPS backend was added in #82307 )Later on, as number of operator grew, those were refactored into
MetalShaderLibraryconvenience class (see #125550 )But as number of kernels keeps growing, it's time to make a next step and properly compile them into
.metalibThis PR does exactly that by:
-sectcreatelinker option andgetsectiondataAPI call.metallib_dummy.cppfile is used to properly express dependencies between metallib build and torch_cpu link stages. Logic for generating metallibraries is loosely based on https://github.com/ml-explore/mlx/blob/main/mlx/backend/metal/kernels/CMakeLists.txt._metallib.hthat contains shader source wrapped inMetalShaderLibraryBulk of changes introduced in this PR are just moving code around. I.e. for every file that contains non-templated shader definition in
aten/src/ATen/native/mps/operatorsfolder, corresponding.metalfile is created inaten/src/ATen/native/mps/kernelsfolder and embedded shader definition is replaced with the followingSome historical stats:
Pros:
TensorIterator)Cons:
cc @albanD