Implement LoadPairVector64 and LoadPairVector128 #64864

echesakov · 2022-02-06T01:54:30Z

Resolves #39243

ghost · 2022-02-06T01:54:35Z

Note regarding the new-api-needs-documentation label:

This serves as a reminder that when your PR modifies a ref *.cs file and adds or modifies public APIs, you should make sure the API implementation in the src *.cs file is documented with triple-slash comments, so the PR reviewers can sign off on that change.

ghost · 2022-02-06T01:54:37Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	echesakovMSFT
Assignees:	echesakovMSFT
Labels:	`arch-arm64`, `area-CodeGen-coreclr`
Milestone:	-

echesakov · 2022-02-09T03:28:50Z

@dotnet/jit-contrib @tannergooding PTAL

tannergooding · 2022-02-09T03:49:09Z

src/coreclr/jit/gentree.cpp

+                unreached();
+        }
+#elif defined(TARGET_XARCH)
        return 2;


nit: would be good to explicitly cover the intrinsic IDs for xarch as well (definitely could be a separate PR).

(that is, switch (intrinsicId) with default: unreached())

Right. Given that none are currently supported on x86, I can replace this with unreached() and add the switch when working on MultiplyNoFlags2 (or whatever name we decided on).

tannergooding · 2022-02-09T03:53:25Z

src/coreclr/jit/gentree.cpp

+        case NI_AdvSimd_Arm64_LoadPairVector64:
+        case NI_AdvSimd_Arm64_LoadPairVector64NonTemporal:
+        case NI_AdvSimd_Arm64_LoadPairVector128:
+        case NI_AdvSimd_Arm64_LoadPairVector128NonTemporal:


I'll probably see later in the review, but under what conditions are these containable?

I didn't think Arm64 really had ins reg, [mem] operations outside of atomic instructions, particularly for non-temporal operations...

Actually, I don't see where the containment handling happens for this; I don't see any changes to lowering.

I think I marked them by mistake when experimenting with some ideas I had. Let me update this.

tannergooding · 2022-02-09T03:59:35Z

src/coreclr/jit/gentree.h

+    if (OperIsHWIntrinsic())
    {
-        return (TypeGet() == TYP_STRUCT);
+        return TypeIs(TYP_STRUCT);


I wonder how well this will hold in the future? That is, are we expecting things that return TYP_STRUCT to always be multi-reg?

There are cases, such as System.Half, where we'll eventually need to add support, and we'll need to treat it as TYP_HALF or recognize it some other way if we don't want it to be an issue.

I wonder if we should track this as a flag rather than by type just to be safe?

Ah, looks like we have a flag. Maybe we should assert it or use it here instead?

Sure, I can use a flag here.
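A minimal, self-contained C++ sketch of the flag-based check discussed above. Everything here is a hypothetical simplification: the `Intrinsic` enum, the `HW_Flag_*` bit values, the `IntrinsicInfo` table and the `isMultiReg`/`isMultiRegByType` functions stand in for the JIT's real tables and are not its actual code. The sketch shows why a per-intrinsic flag is more robust than inferring multi-reg-ness from a struct return type: an illustrative struct-typed but single-register intrinsic (loosely modeled on the System.Half concern) is misclassified by the type-based check but not by the flag.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-ins for the JIT's NamedIntrinsic IDs.
enum class Intrinsic : uint8_t {
    AdvSimd_LoadVector128,           // single-register result
    AdvSimd_Arm64_LoadPairVector64,  // two-register result
    AdvSimd_Arm64_LoadPairVector128, // two-register result
    HypotheticalHalfReturn,          // struct-typed result that still fits in one register
    Count
};

// Hypothetical flag bits; only the multi-reg bit matters here.
enum HWFlag : uint32_t {
    HW_Flag_NoFlag   = 0,
    HW_Flag_MultiReg = 1u << 0,
};

struct IntrinsicInfo {
    Intrinsic id;
    bool      returnsStruct; // what a TYP_STRUCT-based check would inspect
    uint32_t  flags;
};

// One entry per intrinsic, in enum order.
constexpr IntrinsicInfo kInfo[] = {
    {Intrinsic::AdvSimd_LoadVector128, false, HW_Flag_NoFlag},
    {Intrinsic::AdvSimd_Arm64_LoadPairVector64, true, HW_Flag_MultiReg},
    {Intrinsic::AdvSimd_Arm64_LoadPairVector128, true, HW_Flag_MultiReg},
    {Intrinsic::HypotheticalHalfReturn, true, HW_Flag_NoFlag},
};

constexpr size_t kInfoCount = sizeof(kInfo) / sizeof(kInfo[0]);

// Compile-time check that the table is complete and indexed by enum value.
constexpr bool tableMatchesEnum() {
    if (kInfoCount != static_cast<size_t>(Intrinsic::Count)) return false;
    for (size_t i = 0; i < kInfoCount; ++i) {
        if (static_cast<size_t>(kInfo[i].id) != i) return false;
    }
    return true;
}
static_assert(tableMatchesEnum(), "kInfo must have one entry per Intrinsic, in order");

// Flag-based query: the approach suggested in review.
constexpr bool isMultiReg(Intrinsic id) {
    return (kInfo[static_cast<size_t>(id)].flags & HW_Flag_MultiReg) != 0;
}

// Type-based query, shown only for contrast.
constexpr bool isMultiRegByType(Intrinsic id) {
    return kInfo[static_cast<size_t>(id)].returnsStruct;
}
```

In this model, adding a new multi-register intrinsic means setting one flag in its table entry, rather than relying on every struct-returning intrinsic happening to be multi-register.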

tannergooding · 2022-02-09T04:00:50Z

src/coreclr/jit/gentree.h

+    if (OperIsHWIntrinsic())
    {
        assert(TypeGet() == TYP_STRUCT);
+#ifdef TARGET_ARM64
+        const GenTreeHWIntrinsic* intrinsic   = AsHWIntrinsic();
+        const NamedIntrinsic      intrinsicId = intrinsic->GetHWIntrinsicId();
+
+        switch (intrinsicId)
+        {
+            // TODO-ARM64-NYI: Support hardware intrinsics operating on multiple contiguous registers.
+            case NI_AdvSimd_Arm64_LoadPairScalarVector64:
+            case NI_AdvSimd_Arm64_LoadPairScalarVector64NonTemporal:
+            case NI_AdvSimd_Arm64_LoadPairVector64:
+            case NI_AdvSimd_Arm64_LoadPairVector64NonTemporal:
+            case NI_AdvSimd_Arm64_LoadPairVector128:
+            case NI_AdvSimd_Arm64_LoadPairVector128NonTemporal:
+                return 2;
+
+            default:
+                unreached();
+        }
+#elif defined(TARGET_XARCH)
        return 2;
-    }
 #endif
+    }


This looks to be the same logic as in gentree.cpp above. Should it be factored out into a shared helper or are there conditions where they won't/shouldn't be in sync?
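A sketch of factoring the duplicated switch into one helper that both call sites could share, with the default case routed to an unreached()-style failure as suggested in the first review comment. The names (`Intrinsic`, `multiRegCount`, `unreachedCase`) are hypothetical; this PR's commit list does include adding `HWIntrinsicInfo::GetMultiRegCount()`, but the code below is not that helper.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical, simplified stand-ins for the JIT's NamedIntrinsic IDs.
enum class Intrinsic {
    AdvSimd_Arm64_LoadPairScalarVector64,
    AdvSimd_Arm64_LoadPairScalarVector64NonTemporal,
    AdvSimd_Arm64_LoadPairVector64,
    AdvSimd_Arm64_LoadPairVector64NonTemporal,
    AdvSimd_Arm64_LoadPairVector128,
    AdvSimd_Arm64_LoadPairVector128NonTemporal,
    AdvSimd_Add, // single-register: asking for its register count is a caller bug
};

// Stand-in for the JIT's unreached(): report and terminate.
[[noreturn]] inline void unreachedCase(const char* message) {
    std::fprintf(stderr, "unreached: %s\n", message);
    std::abort();
}

// Single source of truth for the number of registers a multi-reg intrinsic defines.
// Both the header and the .cpp call site would call this instead of keeping
// their own copies of the switch in sync by hand.
inline unsigned multiRegCount(Intrinsic id) {
    switch (id) {
        case Intrinsic::AdvSimd_Arm64_LoadPairScalarVector64:
        case Intrinsic::AdvSimd_Arm64_LoadPairScalarVector64NonTemporal:
        case Intrinsic::AdvSimd_Arm64_LoadPairVector64:
        case Intrinsic::AdvSimd_Arm64_LoadPairVector64NonTemporal:
        case Intrinsic::AdvSimd_Arm64_LoadPairVector128:
        case Intrinsic::AdvSimd_Arm64_LoadPairVector128NonTemporal:
            return 2;
        default:
            unreachedCase("multiRegCount called on a single-register intrinsic");
    }
}
```

Because the default case fails loudly, a newly added multi-register intrinsic that is missing from the switch is caught the first time its register count is queried, instead of silently getting a wrong count at one of two call sites.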

tannergooding · 2022-02-09T04:02:37Z

src/coreclr/jit/gentree.h

+        else
+        {
+            assert(AsHWIntrinsic()->GetSimdSize() == 8);
+            return TYP_SIMD8;


Do we have any cases of "arg size is 8" but "return size is 16", or vice versa?

I know some instructions fit that bill; I'm not sure whether any of the multi-reg cases will.

I don't see an example on Arm64 where this wouldn't hold. ld[1-4] should be similar to ldp.

As for tbl and tbx:

TBX <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B }, <Vm>.<Ta>

The return value is single-register, but the first source operand is multi-register and composed of Vector128<byte> values.
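To make the multi-register source operand concrete, here is a scalar reference model of the three-register TBX form quoted above, written from the Arm architectural description: the 48 bytes of Vn, Vn+1 and Vn+2 form one lookup table, each byte of Vm selects a table byte, and an index of 48 or more leaves that destination byte unchanged (TBL would write zero instead). Only the 16B arrangement is modeled, and `Vec128`/`tbx3` are illustrative names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec128 = std::array<uint8_t, 16>;

// Reference model of: TBX Vd.16B, { Vn.16B, Vn+1.16B, Vn+2.16B }, Vm.16B
// The destination is a single register; the table operand spans three registers.
Vec128 tbx3(const Vec128& vd, const std::array<Vec128, 3>& table, const Vec128& vm) {
    Vec128 result = vd;  // TBX: lanes with out-of-range indices keep Vd's value
    for (size_t lane = 0; lane < 16; ++lane) {
        const size_t index = vm[lane];
        if (index < 3 * 16) {
            // Byte `index` of the 48-byte table formed by the three registers.
            result[lane] = table[index / 16][index % 16];
        }
    }
    return result;
}
```

Note that the destination also acts as an input for TBX (it supplies the out-of-range lanes), which is one more way this instruction differs from the ldp-style multi-reg results discussed above.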

tannergooding · 2022-02-09T04:12:56Z

Changes generally LGTM. Left some comments/questions on a couple of bits about how we expect certain checks to work long term.

Would be great to see a couple diffs/codegen examples particularly for the library changes.

SamMonoRT · 2022-02-09T14:39:39Z

cc @imhameed @fanyang-mono

echesakov · 2022-02-10T00:57:12Z

Would be great to see a couple diffs/codegen examples particularly for the library changes.

@tannergooding The library code that uses LoadPairVector currently has some codegen inefficiencies.

To fix them, I would need to implement #64863 and #64857.

If I apply those changes on top of this PR, the code diffs look as expected:

GetIndexOfFirstCharToEncodeAdvSimd64

@@ -69,10 +73,8 @@ G_M52035_IG03:
                                                ;; bbWeight=0.50 PerfScore 0.25
 G_M52035_IG04:
             add     x6, x1, x4, LSL #1
-            ld1     {v20.8h}, [x6]
+            ldp     q20, q21, [x6]
             sqxtun  v20.8b, v20.8h
-            add     x6, x6, #16
-            ld1     {v21.8h}, [x6]
             sqxtun2 v20.16b, v21.8h
             and     v21.16b, v20.16b, v16.16b
             tbl     v21.16b, {v19.16b}, v21.16b
@@ -87,7 +89,7 @@ G_M52035_IG04:
             add     x4, x4, #16
             cmp     x4, x5
             blo     G_M52035_IG04
-                                               ;; bbWeight=4    PerfScore 100.00
+                                               ;; bbWeight=4    PerfScore 86.00

GetIndexOfFirstNonAsciiByte_Intrinsified

@@ -208,9 +211,7 @@ G_M18966_IG11:
             sub     x22, x0, #32
                                                ;; bbWeight=0.50 PerfScore 1.75
 G_M18966_IG12:
-            ld1     {v16.16b}, [x19]
-            add     x0, x19, #16
-            ld1     {v10.16b}, [x0]
+            ldp     q16, q9, [x19]
             sshr    v16.16b, v16.16b, #7
             and     v16.16b, v16.16b, v8.16b
             addp    v16.16b, v16.16b, v16.16b
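As a rough scalar model of the change in both diffs: one `ldp qA, qB, [xN]` reads two adjacent 16-byte vectors starting at the base address, which previously took two `ld1` loads plus an `add` to advance the pointer. The sketch models only the data movement (it ignores alignment, the non-temporal variants and fault behavior), and `Vec128`, `loadPair128` and `loadTwice128` are illustrative names, not the .NET API.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <utility>

using Vec128 = std::array<uint8_t, 16>;

// Model of `ldp qA, qB, [xN]`: one instruction, two consecutive 128-bit loads.
std::pair<Vec128, Vec128> loadPair128(const uint8_t* address) {
    Vec128 first;
    Vec128 second;
    std::memcpy(first.data(), address, 16);        // bytes [0, 16)  -> qA
    std::memcpy(second.data(), address + 16, 16);  // bytes [16, 32) -> qB
    return {first, second};
}

// The pre-PR shape of the same work: load, advance the pointer, load again.
std::pair<Vec128, Vec128> loadTwice128(const uint8_t* address) {
    Vec128 first;
    Vec128 second;
    std::memcpy(first.data(), address, 16);   // ld1 {vA.16b}, [xN]
    address += 16;                            // add xN, xN, #16
    std::memcpy(second.data(), address, 16);  // ld1 {vB.16b}, [xN]
    return {first, second};
}
```

Both functions produce the same two vectors; what the diffs measure is the instruction count, which the first diff's PerfScore reflects (100.00 before, 86.00 after).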

Here is my plan:

1. Undo the changes to these methods
2. Merge this PR as is
3. Finish the two above-mentioned PRs
4. Follow up with the library changes

I will keep the changes to BitArray:CopyTo, though:

@@ -413,18 +414,16 @@ G_M40488_IG18:
             zip1    v19.16b, v18.16b, v18.16b
             and     v19.16b, v19.16b, v17.16b
             umin    v19.16b, v19.16b, v16.16b
-            st1     {v19.16b}, [x3]
             zip2    v18.16b, v18.16b, v18.16b
             and     v18.16b, v18.16b, v17.16b
             umin    v18.16b, v18.16b, v16.16b
-            add     x3, x3, #16
-            st1     {v18.16b}, [x3]
+            stp     q19, q18, [x3]
             add     w1, w1, #32
             add     w3, w1, #32
             ldr     w4, [x19,#16]
             cmp     w3, w4
             bls     G_M40488_IG18
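The BitArray:CopyTo diff is the store-side counterpart: zip1 and zip2 each build one 16-byte half of the output, and a single stp writes both halves to 32 consecutive bytes where the old code used st1, add, st1. Below is a scalar model of just the zip and store-pair steps; the and/umin masking is omitted, and `Vec128`, `zip1`, `zip2` and `storePair128` are illustrative names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Vec128 = std::array<uint8_t, 16>;

// zip1 Vd.16B, Vn.16B, Vm.16B: interleave the low 8 bytes of Vn and Vm.
Vec128 zip1(const Vec128& n, const Vec128& m) {
    Vec128 d{};
    for (size_t i = 0; i < 8; ++i) {
        d[2 * i] = n[i];
        d[2 * i + 1] = m[i];
    }
    return d;
}

// zip2 Vd.16B, Vn.16B, Vm.16B: same, but with the high 8 bytes.
Vec128 zip2(const Vec128& n, const Vec128& m) {
    Vec128 d{};
    for (size_t i = 0; i < 8; ++i) {
        d[2 * i] = n[8 + i];
        d[2 * i + 1] = m[8 + i];
    }
    return d;
}

// Model of `stp qA, qB, [xN]`: two 128-bit stores to consecutive addresses.
void storePair128(uint8_t* address, const Vec128& a, const Vec128& b) {
    std::memcpy(address, a.data(), 16);       // qA -> bytes [0, 16)
    std::memcpy(address + 16, b.data(), 16);  // qB -> bytes [16, 32)
}
```

In this model, zipping a vector with itself duplicates each byte, so the 16 input bytes expand in order into 32 output bytes, and the two halves land contiguously, which is what lets one stp replace the two st1 stores.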

echesakov · 2022-02-10T01:11:12Z

Could someone on the Mono team sign off on the relevant changes, please?

For context: 15e56a0 was implemented by @imhameed during my initial attempt to get these changes in (#52424).

echesakov · 2022-02-10T01:11:28Z

@dotnet/jit-contrib PTAL

BruceForstall

LGTM

imhameed · 2022-02-10T23:55:11Z

Could someone on the Mono team sign off on the relevant changes, please?

For context: 15e56a0 was implemented by @imhameed during my initial attempt to get these changes in (#52424).

I'm 23 hours late to this, but the changes still look good to me (although I don't know whether signing off on my own implementation is kosher).

echesakov added the arch-arm64 and area-CodeGen-coreclr labels Feb 6, 2022

echesakov self-assigned this Feb 6, 2022

ghost added the new-api-needs-documentation label Feb 6, 2022

echesakov mentioned this pull request Feb 7, 2022

Enable multi-register intrinsics support for Arm64 #64921

Closed


echesakov added 23 commits February 8, 2022 19:27

Add LoadPairVector64 and LoadPairVector128 in AdvSimd.cs AdvSimd.PlatformNotSupported.cs

97d2ef3

Add LoadPairScalarVector64 in AdvSimd.cs AdvSimd.PlatformNotSupported.cs

e7e4ef7

Add LoadPairVector64NonTemporal and LoadPairVector128NonTemporal in AdvSimd.cs AdvSimd.PlatformNotSupported.cs

fe3cae3

Add LoadPairScalarVector64NonTemporal in AdvSimd.cs AdvSimd.PlatformNotSupported.cs

e8acfcb

Update System.Runtime.Intrinsics.cs

9c7c982

Add LoadPairScalar() in src/tests/JIT/HardwareIntrinsics/Arm/Shared/Helpers.cs src/tests/JIT/HardwareIntrinsics/Arm/Shared/Helpers.tt

b3f9bef

Add LoadPairVectorTest.template

5109ecb

Add LoadPairVector64 and LoadPairVector128 in GenerateTests.csx

4fc3de7

Update src/tests/JIT/HardwareIntrinsics/Arm/AdvSimd.Arm64/*

120e1e3

Use AdvSimd.Arm64.LoadPairVector128 in ASCIIUtility.cs

dda86a1

Use AdvSimd.Arm64.StorePair in BitArray.cs

6d41f27

Use AdvSimd.Arm64.LoadPairVector128 in OptimizedInboxTextEncoder.AdvSimd64.cs

bfa5a9f

Add HW_Flag_MultiReg and HWIntrinsicInfo::IsMultiReg() in hwintrinsic.h

c22f924

Add LoadPairVector64 and LoadPairVector128 in hwintrinsiclistarm64.h

a62d2bc

Adjust asserts to support multireg intrinsics in Compiler::impHWIntrinsic() in hwintrinsic.cpp

e25774f

Implement LoadPairVector128/64 in Compiler::impSpecialIntrinsic() in src/coreclr/jit/hwintrinsicarm64.cpp

ccc3f2d

Implement LoadPairVector128/64 in CodeGen::genHWIntrinsic() in hwintrinsiccodegenarm64.cpp

1246824

Adjust asserts in Compiler::impAssignStructPtr() and Compiler::impNormStructVal() to allow intrinsics returning a struct in importer.cpp

f3f5d42

Support multi-register HW intrinsics on arm64 in gentree.h gentree.cpp

4d17799

Support multi-register HW intrinsics on arm64 in lsraAssignRegToTree in lsra.cpp

589742f

Extend LinearScan::BuildHWIntrinsic to support intrinsics returning values in multiple registers in lsra.h lsraarm64.cpp lsraxarch.cpp

b5898c7

Don't insert indirection when source of a block assignment is a multi-reg intrinsic in src/coreclr/jit/morph.cpp

c1cbc9a

Don't morph multireg intrinsic on rhs of a block assignment in src/coreclr/jit/morphblock.cpp

e974d4c

echesakov requested review from SamMonoRT and vargaz as code owners February 9, 2022 03:27

echesakov requested a review from tannergooding February 9, 2022 03:28

tannergooding reviewed Feb 9, 2022

echesakov added 3 commits February 9, 2022 10:33

Undo marking LoadPairVector64/128 as containable in /src/coreclr/jit/gentree.cpp

11e58a4

Add HWIntrinsicInfo::GetMultiRegCount() helper in src/coreclr/jit/hwintrinsic.h

3420e44

Use HWIntrinsicInfo::GetMultiRegCount() helper in src/coreclr/jit/gentree.h src/coreclr/jit/gentree.cpp

a88b96f

echesakov added 3 commits February 9, 2022 17:03

Use HWIntrinsicInfo::IsMultiReg() in GenTree::IsMultiRegNode() in src/coreclr/jit/gentree.cpp src/coreclr/jit/gentree.h

41b4adb

Revert "Use AdvSimd.Arm64.LoadPairVector128 in ASCIIUtility.cs"

ed5e9c0

This reverts commit dda86a1.

Revert "Use AdvSimd.Arm64.LoadPairVector128 in OptimizedInboxTextEncoder.AdvSimd64.cs"

936bfa0

This reverts commit bfa5a9f.

echesakov requested a review from tannergooding February 10, 2022 01:07

BruceForstall approved these changes Feb 10, 2022

tannergooding approved these changes Feb 10, 2022

echesakov merged commit 7814396 into dotnet:main Feb 10, 2022

echesakov deleted the Arm64-ASIMD-LoadPairVector64-LoadPairVector128 branch February 10, 2022 19:51

elinor-fung mentioned this pull request Feb 11, 2022

[linux][arm64] Assert failure: 'regNum >= 0 && regNum <= 30' at gcinfodecoder.cpp #65175

Closed

JulieLeeMSFT mentioned this pull request Mar 3, 2022

What's new in .NET 7 Preview 2 [WIP] dotnet/core#7107

Closed

tannergooding mentioned this pull request Mar 9, 2022

Add DivMod instrinct for intel x86/x64 #27292

Closed

ghost locked as resolved and limited conversation to collaborators Mar 13, 2022
