Allow TernaryLogic to optimize down to BlendVariableMask where appropriate #97468
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Issue Details
Given the following code:
public static Vector512<float> M(Vector512<float> x, Vector512<float> y)
{
    return Vector512.ConditionalSelect(Vector512.Equals(x, y), x, y);
}
Before:
; Method Program:M(System.Runtime.Intrinsics.Vector512`1[float],System.Runtime.Intrinsics.Vector512`1[float]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
G_M44221_IG01: ;; offset=0x0000
vzeroupper
;; size=3 bbWeight=1 PerfScore 1.00
G_M44221_IG02: ;; offset=0x0003
vmovups zmm0, zmmword ptr [rdx]
vmovups zmm1, zmmword ptr [r8]
vcmpps k1, zmm0, zmm1, 0
vpmovm2d zmm2, k1
vpternlogd zmm2, zmm0, zmm1, -54
vmovups zmmword ptr [rcx], zmm2
mov rax, rcx
;; size=41 bbWeight=1 PerfScore 14.75
G_M44221_IG03: ;; offset=0x002C
vzeroupper
ret
;; size=4 bbWeight=1 PerfScore 2.00
; Total bytes of code: 48
After:
; Method Program:M(System.Runtime.Intrinsics.Vector512`1[float],System.Runtime.Intrinsics.Vector512`1[float]):System.Runtime.Intrinsics.Vector512`1[float] (FullOpts)
G_M44221_IG01: ;; offset=0x0000
vzeroupper
;; size=3 bbWeight=1 PerfScore 1.00
G_M44221_IG02: ;; offset=0x0003
vmovups zmm0, zmmword ptr [rdx]
vmovups zmm1, zmmword ptr [r8]
vcmpps k1, zmm0, zmm1, 0
vblendmps zmm0 {k1}, zmm1, zmm0
vmovups zmmword ptr [rcx], zmm0
mov rax, rcx
;; size=35 bbWeight=1 PerfScore 13.75
G_M44221_IG03: ;; offset=0x0026
vzeroupper
ret
;; size=4 bbWeight=1 PerfScore 2.00
; Total bytes of code: 42
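The `-54` control byte in the "Before" listing is `0xCA` when read as an unsigned byte. In the ternary-logic model that control value is a bitwise select: the first operand chooses between the second and the third. The sketch below is a scalar model written only to illustrate that, not JIT code; `TernaryLogic`, `BitwiseSelect`, `SelectControl` and `VerifySelectControl` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstdint>

// Models one 32-bit lane of a three-input ternary logic operation (the
// vpternlogd model): at each bit position the bits of a, b and c form a 3-bit
// index, with a as the most significant bit, that picks one bit of imm8.
std::uint32_t TernaryLogic(std::uint32_t a, std::uint32_t b, std::uint32_t c, std::uint8_t imm8)
{
    std::uint32_t result = 0;
    for (int bit = 0; bit < 32; ++bit)
    {
        const std::uint32_t index =
            (((a >> bit) & 1u) << 2) | (((b >> bit) & 1u) << 1) | ((c >> bit) & 1u);
        result |= ((static_cast<std::uint32_t>(imm8) >> index) & 1u) << bit;
    }
    return result;
}

// Reference bitwise select: bits of x where mask is set, bits of y elsewhere.
std::uint32_t BitwiseSelect(std::uint32_t mask, std::uint32_t x, std::uint32_t y)
{
    return (mask & x) | (~mask & y);
}

// The disassembly prints the control byte as a signed value; -54 is 0xCA.
const std::uint8_t SelectControl = static_cast<std::uint8_t>(-54);

// Walks all eight truth-table rows using single-bit inputs: with control 0xCA
// the result is b when a is set and c otherwise.
void VerifySelectControl()
{
    for (std::uint32_t a = 0; a <= 1; ++a)
        for (std::uint32_t b = 0; b <= 1; ++b)
            for (std::uint32_t c = 0; c <= 1; ++c)
                assert(TernaryLogic(a, b, c, SelectControl) == (a ? b : c));
}
```

Because the mask produced by a comparison is all-ones or all-zeros within each element, a bitwise select with that mask behaves as a per-element select, which is why a masked blend can stand in for it.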
You'd mentioned to me that such lowering is already being done in some cases. For my edification, how do the scenarios this newly covers differ from what was already covered? Is this about 512 specifically now?
Diff results for #97468
Assembly diffs
Assembly diffs for linux/x64 ran on windows/x64
Diffs are based on 2,512,262 contexts (977,780 MinOpts, 1,534,482 FullOpts).
Overall (-10,720 bytes)
MinOpts (-248 bytes)
FullOpts (-10,472 bytes)
Assembly diffs for windows/x64 ran on windows/x64
Diffs are based on 2,373,201 contexts (928,756 MinOpts, 1,444,445 FullOpts).
Overall (-2,856 bytes)
MinOpts (-248 bytes)
FullOpts (-2,608 bytes)
Assembly diffs for windows/x86 ran on windows/x86
Diffs are based on 2,299,121 contexts (840,463 MinOpts, 1,458,658 FullOpts). MISSED contexts: 7 (0.00%)
Overall (-2,931 bytes)
FullOpts (-2,931 bytes)
Assembly diffs for windows/x86 ran on linux/x86
Diffs are based on 2,299,121 contexts (840,463 MinOpts, 1,458,658 FullOpts). MISSED contexts: 7 (0.00%)
Overall (-3,338 bytes)
FullOpts (-3,338 bytes)
This is specifically about the V512 case: ensuring that the codegen is optimal for V512, where it was already handling this for V128/V256.
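To make the V512 case concrete: in the listings above the comparison writes a mask register (`k1`) with one bit per element, so the old sequence first had to expand that mask into a full vector (`vpmovm2d`) before the bitwise select, while the blend consumes the mask bits directly. The sketch below is a scalar model of both shapes over sixteen 32-bit lanes treated as raw bit patterns. It is an illustration under those assumptions, not the JIT's implementation, and `Lanes`, `SelectViaExpandedMask`, `MaskedBlend` and `CheckSameResult` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t LaneCount = 16; // a 512-bit vector holds sixteen 32-bit lanes

using Lanes = std::array<std::uint32_t, LaneCount>;

// "Before" shape: expand each mask bit to an all-ones or all-zeros lane
// (the vpmovm2d step), then do a bitwise select (the vpternlogd 0xCA step).
Lanes SelectViaExpandedMask(std::uint16_t mask, const Lanes& x, const Lanes& y)
{
    Lanes result{};
    for (std::size_t i = 0; i < LaneCount; ++i)
    {
        const std::uint32_t laneMask = ((mask >> i) & 1u) ? 0xFFFFFFFFu : 0u;
        result[i] = (laneMask & x[i]) | (~laneMask & y[i]);
    }
    return result;
}

// "After" shape: a masked blend reads the mask bit directly. The value used
// where the bit is clear comes first, so the select's value operands are
// swapped when it is rewritten as a blend.
Lanes MaskedBlend(const Lanes& whenClear, const Lanes& whenSet, std::uint16_t mask)
{
    Lanes result{};
    for (std::size_t i = 0; i < LaneCount; ++i)
    {
        result[i] = ((mask >> i) & 1u) ? whenSet[i] : whenClear[i];
    }
    return result;
}

// Asserts that select(mask, x, y) and blend(y, x, mask) agree lane by lane.
void CheckSameResult(std::uint16_t mask, const Lanes& x, const Lanes& y)
{
    assert(SelectViaExpandedMask(mask, x, y) == MaskedBlend(y, x, mask));
}
```

Calling `CheckSameResult` with any mask and inputs should pass, since both functions pick `x[i]` where the mask bit is set and `y[i]` otherwise. The swapped argument order in `MaskedBlend(y, x, mask)` mirrors the operand order visible in `vblendmps zmm0 {k1}, zmm1, zmm0`.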
In the diffs just linked, you can see an example of this where we can remove a few instructions:
- vpmovm2b zmm2, k1
- vxorps ymm3, ymm3, ymm3
- vpcmpub k1, zmm0, zmm3, 1
- vpmovm2b zmm3, k1
- vpternlogd zmm3, zmm0, zmm1, -54
- vpcmpub k1, zmm0, zmm1, 1
- vpmovm2b zmm4, k1
- vpternlogd zmm4, zmm0, zmm1, -54
- vpternlogd zmm2, zmm3, zmm4, -54
- vmovups zmmword ptr [rdi], zmm2
+ vxorps ymm2, ymm2, ymm2
+ vpcmpub k2, zmm0, zmm2, 1
+ vpblendmb zmm2 {k2}, zmm1, zmm0
+ vpcmpub k2, zmm0, zmm1, 1
+ vpblendmb zmm0 {k2}, zmm1, zmm0
+ vpblendmb zmm0 {k1}, zmm0, zmm2
+ vmovups zmmword ptr [rdi], zmm0
LGTM
src/coreclr/jit/morph.cpp
Outdated
#endif

#include "allocacheck.h" // for alloca
#include "hwintrinsic.h"
this is needed?
Ah, no. I had started by doing the change in morph and realized it wouldn't be a good fit due to the parameter swapping and needing to deal with potential side effects from commas, etc.
Diff results for #97468
Assembly diffs
Assembly diffs for windows/x64 ran on linux/x64
Diffs are based on 2,366,413 contexts (928,740 MinOpts, 1,437,673 FullOpts). MISSED contexts: 6,788 (0.29%)
Overall (-2,856 bytes)
MinOpts (-248 bytes)
FullOpts (-2,608 bytes)
Diff results for #97468
Assembly diffs
Assembly diffs for linux/x64 ran on windows/x64
Diffs are based on 2,505,358 contexts (977,766 MinOpts, 1,527,592 FullOpts). MISSED contexts: 6,904 (0.27%)
Overall (-10,720 bytes)
MinOpts (-248 bytes)
FullOpts (-10,472 bytes)
Assembly diffs for windows/x64 ran on windows/x64
Diffs are based on 2,366,413 contexts (928,740 MinOpts, 1,437,673 FullOpts). MISSED contexts: 6,788 (0.29%)
Overall (-2,856 bytes)
MinOpts (-248 bytes)
FullOpts (-2,608 bytes)
Assembly diffs for windows/x86 ran on windows/x86
Diffs are based on 2,292,278 contexts (840,452 MinOpts, 1,451,826 FullOpts). MISSED contexts: 6,850 (0.30%)
Overall (-3,338 bytes)
FullOpts (-3,338 bytes)