[Pytorch][Vulkan] Add bmm op #109360

tina134 · 2023-09-15T06:27:54Z

Summary:
BMM is built on top of the MM methodology. The main difference is that the 1st input matrix changes from a standard Vulkan 2D tensor to a Vulkan 3D tensor, so the indexing changes accordingly: the matrices of a batch are laid out along the z-dimension and across the 4 channels of each texel.

The 2nd input matrix keeps the same packed format (an H*W matrix is fit into an H/2 * W/2 * 1 3D image texture by using all 4 values in each texel), but the additional matrices in the batch are appended along the z-dimension (which has only 1 element in the case of MM).

Vulkan 2D Basic (1st input of MM & output):

ivec3 pos = ivec3(j, i, 0);
float v = texelFetch(uInput, pos, 0)[0];
// no batch

Vulkan 3D Basic (1st input of BMM & output):

ivec3 pos = ivec3(k, j, i / 4);
float v = texelFetch(uInput, pos, 0)[i % 4];
// i as batch id
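
As a CPU-side illustration of the 3D indexing above, the following Python sketch uses a nested list as a stand-in for the image texture and reads elements back with the same `(k, j, i / 4)` and `i % 4` arithmetic. The helper names `to_texture_3d` and `fetch_3d` are hypothetical and exist only for this example; zero-padding the unused components of the last slice is also an assumption.

```python
def to_texture_3d(batch):
    """Lay out B matrices (each H x W) as texels[z][y][x] -> list of 4 floats.

    Batch i lands in depth slice i // 4, component i % 4. Unused components
    in the last slice stay zero (an assumption made for this sketch).
    """
    b, h, w = len(batch), len(batch[0]), len(batch[0][0])
    depth = (b + 3) // 4  # ceil(B / 4) texel layers
    texels = [[[[0.0] * 4 for _ in range(w)] for _ in range(h)] for _ in range(depth)]
    for i in range(b):
        for j in range(h):
            for k in range(w):
                texels[i // 4][j][k][i % 4] = float(batch[i][j][k])
    return texels


def fetch_3d(texels, i, j, k):
    """Mirror of texelFetch(uInput, ivec3(k, j, i / 4), 0)[i % 4]."""
    x, y, z = k, j, i // 4
    return texels[z][y][x][i % 4]


# 5 batches of 2x3 matrices, each element encoding its own (i, j, k).
batch = [[[100 * i + 10 * j + k for k in range(3)] for j in range(2)] for i in range(5)]
texels = to_texture_3d(batch)
```

With 5 batches the sketch needs two depth slices: batches 0 to 3 share slice 0, and batch 4 sits alone in component 0 of slice 1.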

Packed weights (2nd input of MM):

ivec3 pos = ivec3(k_, j_, 0);
vec4 v = texelFetch(uInput, pos, 0);
// v.xyzw are 4 numbers in one matrix
// no batch
// k_ and j_ each cover half the range of the original matrix dimensions (an H*W matrix => an H/2 * W/2 * 1 3D image), so the image has 1/4 as many texels as the matrix has elements.
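
To make the packed format concrete, here is a small Python sketch that folds an H*W matrix into an H/2 * W/2 grid of 4-value texels. The description does not say which four elements share a texel, so the 2x2 block layout used here (x, y, z, w = top-left, top-right, bottom-left, bottom-right) is an assumption, as are the hypothetical helper names `pack_weights` and `fetch_packed` and the zero-padding of odd dimensions.

```python
def pack_weights(m):
    """Pack an H x W matrix into ceil(H/2) x ceil(W/2) texels of 4 floats.

    Assumed layout: each texel holds one 2x2 block of the matrix, stored
    row-major in components 0..3 (x, y, z, w).
    """
    h, w = len(m), len(m[0])
    ph, pw = (h + 1) // 2, (w + 1) // 2
    packed = [[[0.0] * 4 for _ in range(pw)] for _ in range(ph)]
    for r in range(h):
        for c in range(w):
            packed[r // 2][c // 2][2 * (r % 2) + (c % 2)] = float(m[r][c])
    return packed


def fetch_packed(packed, r, c):
    """Read matrix element (r, c) back out of the packed texels."""
    k_, j_ = c // 2, r // 2
    v = packed[j_][k_]  # stands in for: vec4 v = texelFetch(uInput, ivec3(k_, j_, 0), 0)
    return v[2 * (r % 2) + (c % 2)]


m = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
packed = pack_weights(m)
```

Under this assumed layout the 2x4 matrix above occupies a 1x2 grid of texels, `[1, 2, 5, 6]` and `[3, 4, 7, 8]`.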

Packed weights (2nd input of BMM):

ivec3 pos = ivec3(k_, j_, i);
vec4 v = texelFetch(uInput, pos, 0);
// v.xyzw are 4 numbers in one matrix
// i as batch id

Based on the different indexing of MM and BMM, I modified the MM methodology to produce the desired output image.

Test Plan:

[ttingchulin@27298.od /data/sandcastle/boxes/fbsource (bmm)]$ LD_LIBRARY_PATH=third-party/swiftshader/lib/linux-x64/ buck run fbcode/mode/dev-nosan //xplat/caffe2:pt_vulkan_api_test_bin -- --gtest_filter="*<test>*" eg.  -- --gtest_filter="*mm*"
Building: finished in 0.1 sec (100%) 328/3361 jobs, 0/3361 updated
  Total time: 0.1 sec
BUILD SUCCEEDED
Running main() from xplat/third-party/gmock/googletest-1.12.1/googletest/src/gtest_main.cc
Note: Google Test filter = *mm*
[==========] Running 8 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 8 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.addmm
[       OK ] VulkanAPITest.addmm (125 ms)
[ RUN      ] VulkanAPITest.addmm_expand
[       OK ] VulkanAPITest.addmm_expand (76 ms)
[ RUN      ] VulkanAPITest.addmm_expand2
[       OK ] VulkanAPITest.addmm_expand2 (0 ms)
[ RUN      ] VulkanAPITest.bmm
[       OK ] VulkanAPITest.bmm (152 ms)
[ RUN      ] VulkanAPITest.bmm_large
[       OK ] VulkanAPITest.bmm_large (4818 ms)
[ RUN      ] VulkanAPITest.bmm_small
[       OK ] VulkanAPITest.bmm_small (4 ms)
[ RUN      ] VulkanAPITest.bmm_one
[       OK ] VulkanAPITest.bmm_one (0 ms)
[ RUN      ] VulkanAPITest.mm
[       OK ] VulkanAPITest.mm (55 ms)
[----------] 8 tests from VulkanAPITest (5233 ms total)


[----------] Global test environment tear-down
[==========] 8 tests from 1 test suite ran. (5233 ms total)
[  PASSED  ] 8 tests.

Differential Revision: D49306279

pytorch-bot · 2023-09-15T06:27:56Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/109360

✅ You can merge normally! (2 Unrelated Failures)

As of commit 6108108 with merge base 238fb66:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

linux-focal-rocm5.6-py3.8 / test (default, 1, 3, linux.rocm.gpu) (gh)

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

linux-focal-cuda11.8-py3.9-gcc9 / test (multigpu, 1, 1, linux.g5.12xlarge.nvidia.gpu, unstable) (gh)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2023-09-15T06:28:04Z