-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Implement LoadPairVector64 and LoadPairVector128 #64864
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement LoadPairVector64 and LoadPairVector128 #64864
Conversation
|
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
|
Tagging subscribers to this area: @JulieLeeMSFT Issue Detailsnull
|
…formNotSupported.cs
…dvSimd.cs AdvSimd.PlatformNotSupported.cs
…elpers.cs src/tests/JIT/HardwareIntrinsics/Arm/Shared/Helpers.tt
…nsic() in hwintrinsic.cpp
…src/coreclr/jit/hwintrinsicarm64.cpp
…insiccodegenarm64.cpp
…mStructVal() to allow intrinsics returning a struct in importer.cpp
…alues in multiple registers in lsra.h lsraarm64.cpp lsraxarch.cpp
…-reg intrinsic in src/coreclr/jit/morph.cpp
…eclr/jit/morphblock.cpp
|
@dotnet/jit-contrib @tannergooding PTAL |
src/coreclr/jit/gentree.cpp
Outdated
| unreached(); | ||
| } | ||
| #elif defined(TARGET_XARCH) | ||
| return 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would be good to explicitly cover the intrinsic IDs for xarch as well (definitely could be a separate PR).
(that is switch (intrinsicId) with a default: unreached())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, given that none are supported currently on X86 - I can replace this with unreached() and update with the switch when working on MultiplyNoFlags2 (or whatever name we decided)
src/coreclr/jit/gentree.cpp
Outdated
| case NI_AdvSimd_Arm64_LoadPairVector64: | ||
| case NI_AdvSimd_Arm64_LoadPairVector64NonTemporal: | ||
| case NI_AdvSimd_Arm64_LoadPairVector128: | ||
| case NI_AdvSimd_Arm64_LoadPairVector128NonTemporal: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll probably see later in the review, but under what conditions are these containable?
I didn't think Arm64 really had ins reg, [mem] operations outside atomic instructions; particularly for non-temporal operations...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't see where the containment handling is happening for this. I don't see any changes to lowering
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I marked them by mistake when experimenting with some ideas I had. Let me update this.
src/coreclr/jit/gentree.h
Outdated
| if (OperIsHWIntrinsic()) | ||
| { | ||
| return (TypeGet() == TYP_STRUCT); | ||
| return TypeIs(TYP_STRUCT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how well this will hold in the future? That is, are we expecting things that return TYP_STRUCT to always be multi-reg?
There are cases like say System.Half where we'll eventually need to add support and we'll need to treat it as TYP_HALF or recognize it some other way if we don't want it to be an issue.
I wonder if we should track this as a flag rather than by type just to be safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, looks like we have a flag. Maybe we should assert it or use it here instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can use a flag here.
src/coreclr/jit/gentree.h
Outdated
| if (OperIsHWIntrinsic()) | ||
| { | ||
| assert(TypeGet() == TYP_STRUCT); | ||
| #ifdef TARGET_ARM64 | ||
| const GenTreeHWIntrinsic* intrinsic = AsHWIntrinsic(); | ||
| const NamedIntrinsic intrinsicId = intrinsic->GetHWIntrinsicId(); | ||
|
|
||
| switch (intrinsicId) | ||
| { | ||
| // TODO-ARM64-NYI: Support hardware intrinsics operating on multiple contiguous registers. | ||
| case NI_AdvSimd_Arm64_LoadPairScalarVector64: | ||
| case NI_AdvSimd_Arm64_LoadPairScalarVector64NonTemporal: | ||
| case NI_AdvSimd_Arm64_LoadPairVector64: | ||
| case NI_AdvSimd_Arm64_LoadPairVector64NonTemporal: | ||
| case NI_AdvSimd_Arm64_LoadPairVector128: | ||
| case NI_AdvSimd_Arm64_LoadPairVector128NonTemporal: | ||
| return 2; | ||
|
|
||
| default: | ||
| unreached(); | ||
| } | ||
| #elif defined(TARGET_XARCH) | ||
| return 2; | ||
| } | ||
| #endif | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to be the same logic as in gentree.cpp above. Should it be factored out into a shared helper or are there conditions where they won't/shouldn't be in sync?
| else | ||
| { | ||
| assert(AsHWIntrinsic()->GetSimdSize() == 8); | ||
| return TYP_SIMD8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any cases of "arg size is 8" but "return size is 16" or vice-versa?
I know some instructions fit that bill, I'm not sure if any of the multi-reg cases will
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see an example on Arm64 when this wouldn't hold.
ld[1-4] should be similar to ldp.
As for tbl and tbx:
TBX <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B }, <Vm>.<Ta>
the return value is going to be single-reg but the first source operand is multi-reg and composed of Vector128<byte>.
|
Changes generally LGTM. Left some comments/questions on a couple bits on how we expect certain checks to work long term. Would be great to see a couple diffs/codegen examples particularly for the library changes. |
…tree.h src/coreclr/jit/gentree.cpp
@tannergooding There are suboptimalities in the code using To fix that I would need to implement #64863 and #64857. If I apply these changes on top of this PR the code diffs would look as expected: GetIndexOfFirstCharToEncodeAdvSimd64 @@ -69,10 +73,8 @@ G_M52035_IG03:
;; bbWeight=0.50 PerfScore 0.25
G_M52035_IG04:
add x6, x1, x4, LSL #1
- ld1 {v20.8h}, [x6]
+ ldp q20, q21, [x6]
sqxtun v20.8b, v20.8h
- add x6, x6, #16
- ld1 {v21.8h}, [x6]
sqxtun2 v20.16b, v21.8h
and v21.16b, v20.16b, v16.16b
tbl v21.16b, {v19.16b}, v21.16b
@@ -87,7 +89,7 @@ G_M52035_IG04:
add x4, x4, #16
cmp x4, x5
blo G_M52035_IG04
- ;; bbWeight=4 PerfScore 100.00
+ ;; bbWeight=4 PerfScore 86.00GetIndexOfFirstNonAsciiByte_Intrinsified @@ -208,9 +211,8 @@ G_M18966_IG11:
@@ -208,9 +211,7 @@ G_M18966_IG11:
sub x22, x0, #32
;; bbWeight=0.50 PerfScore 1.75
G_M18966_IG12:
- ld1 {v16.16b}, [x19]
- add x0, x19, #16
- ld1 {v10.16b}, [x0]
+ ldp q16, q9, [x19]
sshr v16.16b, v16.16b, #7
and v16.16b, v16.16b, v8.16b
addp v16.16b, v16.16b, v16.16bHere is my plan:
I will keep the changes to BitArray:CopyTo though @@ -413,18 +414,16 @@ G_M40488_IG18:
zip1 v19.16b, v18.16b, v18.16b
and v19.16b, v19.16b, v17.16b
umin v19.16b, v19.16b, v16.16b
- st1 {v19.16b}, [x3]
zip2 v18.16b, v18.16b, v18.16b
and v18.16b, v18.16b, v17.16b
umin v18.16b, v18.16b, v16.16b
- add x3, x3, #16
- st1 {v18.16b}, [x3]
+ stp q19, q18, [x3]
add w1, w1, #32
add w3, w1, #32
ldr w4, [x19,#16]
cmp w3, w4
bls G_M40488_IG18 |
|
@dotnet/jit-contrib PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I'm 23 hours late to this but: the changes still look good to me (although I don't know if me signing off on my own implementation is kosher) |
Resolves #39243