- 
                Notifications
    You must be signed in to change notification settings 
- Fork 5.2k
JIT: Unify arm64 and x64 GT_SELECT handling #82610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This unifies GT_SELECT/GT_SELECTCC handling between arm64 and x64. The arm64 backend no longer uses containment for compare chains; instead, there is a new GT_CCMP node that both produces and consumes flags, and lowering can lower GT_AND(op, relop) down to this node.
| { | ||
| child->AsOp()->SetContained(); | ||
| GenCondition cond2 = GenCondition::FromRelop(op2); | ||
| op2->SetOper(GT_CCMP); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I know this is done elsewhere), but why is this safe? op2 is moving from a GenTreeOp to a GenTreeCCMP. The latter class is bigger than the former. What's stopping memory overflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JIT has only two different node sizes: small nodes (of size TREE_NODE_SZ_SMALL) and large nodes (of size TREE_NODE_SZ_LARGE). Whenever a node is allocated, its size is rounded up to one of these. GenTree::InitNodeSize has various static asserts for the expected node sizes.
All the nodes here are small, so bashing between them is "safe" as far as size issues goes, though this ignores that this is not conformant C++.
Historically (before I worked on the JIT) the IR node hierarchy was not represented via inheritance, but was represented via unions instead, which made this pattern of bashing between node types less controversial. But as you've noted we still do it in various places due to the nice perf characteristics of changing nodes in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GenTree::SetOper also has asserts that the bashing done is safe as far as the size issue goes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, hoped there was something like that.
Looking at the static assert checks for TREE_NODE_SZ_SMALL / TREE_NODE_SZ_LARGE, there's nothing there for GenTreeConditional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added one for GenTreeConditional with the latest commits
| /azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn | 
| Azure Pipelines successfully started running 3 pipeline(s). | 
| Code looks fine to me. Are there any code diffs for this PR? | 
| 
 Yes, a few improvements: https://dev.azure.com/dnceng-public/public/_build/results?buildId=184319&view=ms.vss-build-web.run-extensions-tab The ones I looked at are new cases where we use  @@ -22,9 +22,6 @@ N012 ( 30, 12) [000003] DA-XG------                         ▌  STORE_LCL_VAR r
                [000037] -----------                            IL_OFFSET void   INLRT @ 0x007[E-]
 N001 (  1,  1) [000004] -----------                    t4 =    LCL_VAR   ref    V01 loc0         u:2 $82
 N002 (  1,  2) [000005] -c---------                    t5 =    CNS_INT   ref    null $VN.Null
-                                                            ┌──▌  t4     ref    
-                                                            ├──▌  t5     ref    
-N003 (  6,  4) [000006] -----------                    t6 = ▌  NE        int    $240
 N004 (  1,  1) [000007] -----------                    t7 =    LCL_VAR   ref    V01 loc0         u:2 (last use) $82
                                                             ┌──▌  t7     ref    
 N006 (  3,  4) [000035] -c---------                   t35 = ▌  LEA(b+96) byref 
@@ -35,11 +32,12 @@ N009 (  6,  6) [000033] -c---------                   t33 = ▌  LEA(b+104) byre
                                                             ┌──▌  t33    byref  
 N010 (  7,  5) [000019] ---XG------                   t19 = ▌  IND       int    <l:$244, c:$245>
 N011 (  1,  2) [000012] -c---------                   t12 =    CNS_INT   int    4 $44
+                                                            ┌──▌  t4     ref    
+                                                            ├──▌  t5     ref    
+N003 (  6,  4) [000006] -----------                         ▌  CMP       void  
                                                             ┌──▌  t19    int    
                                                             ├──▌  t12    int    
-N012 ( 12,  8) [000013] ---XG------                   t13 = ▌  EQ        int    <l:$249, c:$248>
-                                                            ┌──▌  t6     int    
-                                                            ├──▌  t13    int    
-N013 ( 19, 13) [000014] ---XG------                   t14 = ▌  AND       int    <l:$24d, c:$24c>
+N012 ( 12,  8) [000013] ---XG------                         ▌  CCMP      void   cond=UNE flags=0
+N013 ( 19, 13) [000014] ---XG------                   t14 =    SETCC     int    cond=UEQ
                                                             ┌──▌  t14    int    
 N014 ( 20, 14) [000015] ---XG------                         ▌  RETURN    int    <l:$14a, c:$149>Notice that  I also just realized that the  FWIW, with this change you will not see  [MethodImpl(MethodImplOptions.NoInlining)]
public static int Foo(int a, int b, int c)
{
    if (a < b & b < c & c < 10)
        return 42;
    return 13;
}Codegen:             cmp     w0, w1
            ccmp    w1, w2, z, lt
            ccmp    w2, #10, z, lt
            cset    x0, lt
            mov     w1, #13
            mov     w2, #42
            cmp     w0, #0
            csel    w0, w1, w2, eqExpected codegen is something like: cmp w0, w1
ccmp w1, w2, z, lt
ccmp w2, #10, z, lt
csel w0, w1, w2, ltI think the best fix would be to teach  | 
| 
 Agreed this makes sense to do. Is that something your planning to do? because if so I can hold off again from rebasing #79283. | 
| 
 Yep, I can do that in a follow-up. This PR should be ready, cc @dotnet/jit-contrib PTAL @kunalspathak @BruceForstall | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice cleanup. Added some questions/comments.
| // Returns: | ||
| // A flags immediate that, if those flags were set, would cause the specified condition to be true. | ||
| // | ||
| insCflags Lowering::TruthifyingFlags(GenCondition condition) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we repurpose InsCflagsForCcmp ? Move it to a GenTree, make it static and use it from codegen as well as lower?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need that function in codegen anymore -- this PR is removing it from codegenarm64.cpp (it is effectively the same function, except without reversing the condition). In a follow-up I will add the ability to generate ccmp for OR(relop, relop) too which will use TruthifyingFlags again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, for some reason I didn't noticed that you have removed InsCflagsForCcmp().
        
          
                src/coreclr/jit/lowerarmarch.cpp
              
                Outdated
          
        
      | } | ||
| else if (child->OperIsCmpCompare() && varTypeIsIntegral(child->gtGetOp1()) && varTypeIsIntegral(child->gtGetOp2())) | ||
|  | ||
| if (!op2->OperIsCmpCompare() || !varTypeIsIntegral(op2->gtGetOp1())) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering why this is also not !op2->operIsCompare(), just like you have !op1->OperIsCompare() above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The operands of op2 will be compared with the ccmp instruction, and we can only do that for the "basic" relops: GT_EQ, GT_NE, GT_LE, GT_LT, GT_GT, GT_GE. The remaining compare operators, GT_TEST_EQ and GT_TEST_NE (whose semantics are (a & b) == 0 and (a & b) != 0) cannot be implemented as a conditional compare -- it would need some kind of ctst instruction.
On the other hand, the first operand can be anything that has already set the condition flags in any way, including e.g. floating point compares or tst instruction (or, a previous conditional compare, which is how we end up with the chains).
Let me add a more detailed comment here about this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a more detailed comment. I also experimented with making this logic symmetric (allow ccmp for either op1 or op2), but diffs showed that it didn't matter. Might still do it as part of the OR follow up, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This unifies GT_SELECT/GT_SELECTCC handling between arm64 and x64. The arm64 backend no longer uses containment for compare chains; instead, there is a new GT_CCMP node that both produces and consumes flags, and lowering can lower GT_AND(op, relop) down to this node.
To do this, the JIT verifies invariance and reorders the
cmp/ccmpsto happen successively in order. For example, given C# code like:we see LIR:
When lowering the first
AND([000006]), we will notice that its operands are compares, and lower this as:I.e. we moved the compare
[000002]forward and turned it into aCMPnode, and[000005]was then turned into aCCMPnode.When we see the second
ANDnode ([000010]) we do the same thing again: verify invariance and move the entire compare chain forward:When we finally get to the
SELECTwe do the same thing -- move all the conditional compares in front of theSELECTand then change it toSELECTCC.cc @a74nh