[Pytorch][Vulkan] Add bmm op #109360
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/109360
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 Unrelated Failures)
As of commit 6108108 with merge base 238fb66:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D49306279
6b4216e to ee0ed44
ee0ed44 to dc72f99
dc72f99 to b13eb1b
Reviewed By: yipjustin
Differential Revision: D49306279
b13eb1b to 6108108
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Summary:
BMM is built on top of the MM approach. The main difference is that the 1st input matrix changes from a standard Vulkan 2D tensor to a Vulkan 3D tensor, so the indexing is quite different. The matrices of a batch are stacked along the z-dimension and along the 4-element channel of each texel: batch index `i` maps to depth `i/4` and texel component `i % 4`.
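To make the batch mapping concrete, here is a minimal Python sketch, assuming (as the 3D indexing in this description suggests) that batch index `i` sits at depth `i // 4` and component `i % 4`, and taking `x` as the column and `y` as the row (an assumption). The helper names `bmm_input_extents`, `texel_coord`, and `unused_components` are hypothetical and do not come from the PyTorch codebase.

```python
import math


def bmm_input_extents(batch, height, width):
    """Image extents (x, y, z) for a batch x height x width input.

    Assumes 4 batch entries share each z-slice (one per texel component).
    """
    return (width, height, math.ceil(batch / 4))


def texel_coord(i, j, k):
    """Map element (batch i, row j, column k) to ((x, y, z), component)."""
    return (k, j, i // 4), i % 4


def unused_components(batch):
    """Empty components in the last z-slice when batch is not a multiple of 4."""
    return (-batch) % 4
```

For example, 6 matrices of size 3 x 5 would need a 5 x 3 x 2 image, and the second z-slice would use only components 0 and 1.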
The 2nd input matrix keeps the same packed format as in MM (an `H*W` matrix is fit into an `H/2 * W/2 * 1` 3D image texture by using all 4 values in each texel), but the additional matrices of the batch are appended along the z-dimension (which has only 1 element in the case of MM).

**Vulkan 2D Basic (1st input of MM & output):**
```
ivec3 pos = ivec3(j, i, 0);
float v = texelFetch(uInput, pos, 0)[0];  // no batch
```
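A rough texel-count comparison shows why the packed format fills all 4 texel components. This is only a sketch: it assumes the 2D basic layout stores one element per texel (reading only component `[0]`) and that odd dimensions in the packed format are rounded up with padding, which is an assumption rather than something this description states. The function names are hypothetical.

```python
import math


def texels_2d_basic(height, width):
    # Assumed: one matrix element per texel (only component 0 is used).
    return height * width


def texels_packed(height, width):
    # Each texel holds a 2 x 2 block in its 4 components.
    # Assumed: odd dimensions are rounded up (padded).
    return math.ceil(height / 2) * math.ceil(width / 2)


def packed_savings(height, width):
    """Ratio of texels used by the 2D basic layout to the packed layout."""
    return texels_2d_basic(height, width) / texels_packed(height, width)
```

A 4 x 6 matrix thus takes 24 texels in the 2D basic layout but 6 in the packed layout.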
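**Vulkan 3D Basic (1st input of BMM & output):**
```
ivec3 pos = ivec3(k, j, i / 4);
float v = texelFetch(uInput, pos, 0)[i % 4];  // i is the batch id
```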
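The 3D basic read path can be emulated on the CPU to sanity-check the indexing. In this sketch a nested list stands in for the 3D image, `texel_fetch` is a hypothetical stand-in for GLSL `texelFetch` at LOD 0, and unused components are zero-filled, which is an assumption for illustration rather than the actual Vulkan backend behavior.

```python
def pack_3d_basic(a):
    """Pack a batch x height x width nested list into a simulated 3D image.

    tex[z][y][x] is a 4-component texel; batch entry i goes to z = i // 4,
    component i % 4. Unused components stay 0.0 (assumed padding).
    """
    batch, height, width = len(a), len(a[0]), len(a[0][0])
    depth = (batch + 3) // 4
    tex = [[[[0.0] * 4 for _ in range(width)] for _ in range(height)]
           for _ in range(depth)]
    for i in range(batch):
        for j in range(height):
            for k in range(width):
                tex[i // 4][j][k][i % 4] = a[i][j][k]
    return tex


def texel_fetch(tex, pos):
    """Hypothetical CPU stand-in for texelFetch(uInput, pos, 0), pos = (x, y, z)."""
    x, y, z = pos
    return tex[z][y][x]


def read_3d_basic(tex, i, j, k):
    # Mirrors: ivec3 pos = ivec3(k, j, i / 4); texelFetch(uInput, pos, 0)[i % 4]
    return texel_fetch(tex, (k, j, i // 4))[i % 4]
```

With 5 batch entries, entries 0-3 fill the four components of z-slice 0 and entry 4 occupies component 0 of z-slice 1.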
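**Packed weights (2nd input of MM):**
```
ivec3 pos = ivec3(k_, j_, 0);
vec4 v = texelFetch(uInput, pos, 0);
// v.xyzw are 4 numbers from one matrix
// no batch
// (k_, j_) spans only 1/4 of the original matrix size (an H*W matrix => an H/2*W/2*1 3D image)
```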
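The packed weight format can be sketched the same way. This description does not say which element of each 2 x 2 block lands in which of `x`, `y`, `z`, `w`, so the row-major ordering below is an assumption, as is the zero padding for odd dimensions; only the halving of each dimension comes from the text. `pack_weights_2d` and `read_packed_2d` are hypothetical names.

```python
def pack_weights_2d(m):
    """Pack an H x W matrix into a ceil(H/2) x ceil(W/2) grid of 4-component texels.

    Assumed component order for the 2 x 2 block at texel (k_, j_):
      x = m[2j_][2k_],   y = m[2j_][2k_+1],
      z = m[2j_+1][2k_], w = m[2j_+1][2k_+1]
    Entries past an odd edge are zero-padded (also an assumption).
    """
    h, w = len(m), len(m[0])

    def at(r, c):
        return m[r][c] if r < h and c < w else 0.0

    # Result is indexed as tex[j_][k_], i.e. tex[y][x].
    return [[[at(2 * jj, 2 * kk), at(2 * jj, 2 * kk + 1),
              at(2 * jj + 1, 2 * kk), at(2 * jj + 1, 2 * kk + 1)]
             for kk in range((w + 1) // 2)]
            for jj in range((h + 1) // 2)]


def read_packed_2d(tex, row, col):
    """Recover m[row][col] from texel (k_, j_) = (col // 2, row // 2)."""
    return tex[row // 2][col // 2][2 * (row % 2) + (col % 2)]
```

A 3 x 5 matrix becomes a 2 x 3 grid of texels, i.e. 6 texels carrying 15 real values plus padding.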
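**Packed weights (2nd input of BMM):**
```
ivec3 pos = ivec3(k_, j_, i);
vec4 v = texelFetch(uInput, pos, 0);
// v.xyzw are 4 numbers from one matrix
// i is the batch id
```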
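For BMM, the same packing is repeated per batch entry and stacked along z, so the texel position gains the batch id as its third coordinate. This sketch assumes a row-major 2 x 2 component order and zero padding for odd dimensions, neither of which is stated in this description; `packed_bmm_coord` and `pack_weights_3d` are hypothetical names.

```python
def packed_bmm_coord(i, row, col):
    """Location of element (batch i, row, col) of the 2nd BMM input.

    Texel pos (k_, j_, i) = (col // 2, row // 2, i). The component index
    assumes a row-major 2 x 2 order: x, y, z, w = (0,0), (0,1), (1,0), (1,1).
    """
    return (col // 2, row // 2, i), 2 * (row % 2) + (col % 2)


def pack_weights_3d(b):
    """Pack a batch x H x W tensor: one packed grid per batch entry, stacked on z.

    tex[z][y][x] is a 4-component texel; padding for odd H or W is 0.0 (assumed).
    """
    batch, h, w = len(b), len(b[0]), len(b[0][0])
    tex = [[[[0.0] * 4 for _ in range((w + 1) // 2)]
            for _ in range((h + 1) // 2)]
           for _ in range(batch)]
    for i in range(batch):
        for r in range(h):
            for c in range(w):
                (x, y, z), comp = packed_bmm_coord(i, r, c)
                tex[z][y][x][comp] = b[i][r][c]
    return tex
```

Unlike the 1st input, each z-slice here holds exactly one matrix, because all 4 components are already used by the 2 x 2 block.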
Based on these indexing differences between MM and BMM, I modified the MM approach to produce the desired output image.
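As a correctness reference, BMM amounts to MM applied independently to each batch entry, with the result stored in the same 3D basic layout this description lists for the output. The sketch below is a plain CPU reference with `torch.bmm` shape semantics, `(B, n, m) @ (B, m, p) -> (B, n, p)`; it is not the shader's algorithm, and `matmul`, `bmm`, and `to_3d_basic` are illustrative names.

```python
def matmul(a, b):
    """Plain (n x m) @ (m x p) matrix product on nested lists."""
    m = len(b)
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions must match")
    p = len(b[0])
    return [[sum(row[t] * b[t][c] for t in range(m)) for c in range(p)]
            for row in a]


def bmm(a, b):
    """Batched product with torch.bmm shapes: (B, n, m) @ (B, m, p) -> (B, n, p)."""
    if len(a) != len(b):
        raise ValueError("batch sizes must match")
    return [matmul(ai, bi) for ai, bi in zip(a, b)]


def to_3d_basic(out):
    """Store the result in the 3D basic layout:
    element (i, j, c) goes to texel (c, j, i // 4), component i % 4."""
    batch, n, p = len(out), len(out[0]), len(out[0][0])
    tex = [[[[0.0] * 4 for _ in range(p)] for _ in range(n)]
           for _ in range((batch + 3) // 4)]
    for i in range(batch):
        for j in range(n):
            for c in range(p):
                tex[i // 4][j][c][i % 4] = out[i][j][c]
    return tex
```

A GPU result read back from the output image could be compared element-wise against `bmm(a, b)` computed this way.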
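Test Plan:
```
[ttingchulin@27298.od /data/sandcastle/boxes/fbsource (bmm)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin -- --gtest_filter="*<test>*"
eg. -- --gtest_filter="*mm*"
Building: finished in 0.1 sec (100%) 328/3361 jobs, 0/3361 updated
Total time: 0.1 sec
BUILD SUCCEEDED
Running main() from xplat/third-party/gmock/googletest-1.12.1/googletest/src/gtest_main.cc
Note: Google Test filter = *mm*
[==========] Running 8 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 8 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.addmm
[       OK ] VulkanAPITest.addmm (125 ms)
[ RUN      ] VulkanAPITest.addmm_expand
[       OK ] VulkanAPITest.addmm_expand (76 ms)
[ RUN      ] VulkanAPITest.addmm_expand2
[       OK ] VulkanAPITest.addmm_expand2 (0 ms)
[ RUN      ] VulkanAPITest.bmm
[       OK ] VulkanAPITest.bmm (152 ms)
[ RUN      ] VulkanAPITest.bmm_large
[       OK ] VulkanAPITest.bmm_large (4818 ms)
[ RUN      ] VulkanAPITest.bmm_small
[       OK ] VulkanAPITest.bmm_small (4 ms)
[ RUN      ] VulkanAPITest.bmm_one
[       OK ] VulkanAPITest.bmm_one (0 ms)
[ RUN      ] VulkanAPITest.mm
[       OK ] VulkanAPITest.mm (55 ms)
[----------] 8 tests from VulkanAPITest (5233 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 1 test suite ran. (5233 ms total)
[  PASSED  ] 8 tests.
```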
Differential Revision: D49306279