JIT: Lift remaining cmov restrictions by introducing GT_SELECTCC #82235
This introduces GT_SELECTCC and unifies its handling with GT_JCC. We no longer use containment for GT_SELECT conditions in the xarch backend. Additionally teaches liveness DCE about GT_SETCC and GT_SELECTCC by allowing it to remove GTF_SET_FLAGS from the previous node when they are unused.
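To illustrate the liveness change described above, here is a minimal sketch using a hypothetical toy node type (`ToyNode` and `TryRemoveDeadFlagsConsumer` are made up for this example and are not the JIT's real data structures or API). It assumes the flags producer is the node immediately before the consumer in execution order.

```cpp
#include <cassert>

// Toy node for illustration only; not the JIT's GenTree.
struct ToyNode
{
    bool     setsFlags;     // loosely models GTF_SET_FLAGS
    bool     consumesFlags; // true for SETCC/SELECTCC-like nodes
    bool     valueUsed;     // whether any later node uses this node's value
    ToyNode* prev;          // previous node in execution order
};

// Returns true if 'node' is a dead flags consumer that can be removed.
// As a side effect, clears the set-flags marker on the preceding producer,
// since nothing reads those flags any more.
bool TryRemoveDeadFlagsConsumer(ToyNode* node)
{
    assert(node != nullptr);
    if (!node->consumesFlags || node->valueUsed)
    {
        return false;
    }
    if (node->prev != nullptr && node->prev->setsFlags)
    {
        node->prev->setsFlags = false;
    }
    return true;
}
```

Once the marker is cleared, the producer is an ordinary node again and can itself be considered for removal if its value is unused.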
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn
Azure Pipelines successfully started running 3 pipeline(s).
Failures look like #81688, #81901, #82252.
Diffs. As expected, not very impactful (the cases I highlighted are not that common), but it does unify […]. There is a small TP impact, but it is mostly related to […].
cc @dotnet/jit-contrib PTAL @BruceForstall
};

// Represents a node with two operands and a condition.
struct GenTreeOpCC final : public GenTreeOp
Is there any difference between a GenTreeOpCC and a GenTreeConditional?
If not, I assume it's here just to help with categorising nodes.
Could GenTreeOpCC be a subclass of GenTreeConditional (or vice versa?)
Yes -- GenTreeOpCC is a binary node with a condition code in it. As such, it only makes sense in the backend with explicit ordering where the flags definition can happen right before it. There is no explicit link to the flags definition (at least until #82355, though this is just an experiment for now).
So it's sort of a lowered version of GenTreeConditional.
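A rough sketch of that relationship, using hypothetical toy types (`ToyNode`, `LinkAfter` and `FindFlagsProducer` are invented for illustration and are not the real GenTree classes): the condition-carrying node has no operand edge to the compare, so the flags definition can only be found through execution order.

```cpp
#include <cassert>

// Toy stand-ins for illustration only; these are NOT the JIT's real types.
enum class ToyOper { Cmp, SelectCC, Other };
enum class ToyCond { EQ, NE, LT, GE, GT, LE };

struct ToyNode
{
    ToyOper  oper;
    bool     setsFlags; // loosely models GTF_SET_FLAGS
    ToyCond  cond;      // only meaningful for SelectCC
    ToyNode* prev;      // execution order, loosely models gtPrev
    ToyNode* next;

    ToyNode(ToyOper o, bool flags, ToyCond c = ToyCond::EQ)
        : oper(o), setsFlags(flags), cond(c), prev(nullptr), next(nullptr)
    {
    }
};

// Links 'node' directly after 'tail' in execution order.
void LinkAfter(ToyNode* tail, ToyNode* node)
{
    tail->next = node;
    node->prev = tail;
}

// Finds the flags definition for a condition-carrying node. There is no operand
// edge to follow, so the only candidate is the node that executes right before it.
ToyNode* FindFlagsProducer(const ToyNode* consumer)
{
    assert(consumer->oper == ToyOper::SelectCC);
    ToyNode* candidate = consumer->prev;
    return (candidate != nullptr && candidate->setsFlags) ? candidate : nullptr;
}
```

If anything that does not set flags is scheduled between the compare and the consumer, the lookup in this toy model returns null, which is why such nodes only make sense once ordering is explicit.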
I didn't spot this at first but GenTreeOpCC does not have a pointer to the genTree for the condition, except via gtPrev. I see that GenTreeCC works in the same way. That feels odd at first, but this is LIR, and it's the prev/next that matter. Ok, I'm happy with how that works now.
Thinking about the CCMP nodes, they would also use GenTreeOpCC. We'd need a CCMP node plus all 6 conditions (CCMP_EQ, CCMP_NE, CCMP_LT, etc).
Coming from HIR we have:
JTRUE(NE(AND(LT, GT), 0))
OptimizeConstCompare() will lower to a TEST_EQ:
JTRUE(TEST_EQ(LT, GT))
Then lowerCompare() will create a CCMP and CMP:
CMP
JTRUE(CCMP_LT[GT])
Finally LowerJtrue() will create the JCC:
CMP
CCMP[GT]
JCC[LT]
Which means I need to wait for this PR before I can continue with the CCMP work.
That sounds good to me. Feel free to lift any of the code from this PR if it helps.
> We'd need a CCMP node plus all 6 conditions (CCMP_EQ, CCMP_NE, CCMP_LT, etc).
Note that this PR is teaching TryLowerConditionToFlagsNode about GT_SETCC nodes. I think I had a note on it somewhere else, but I wouldn't expect the 6 combinations to be necessary due to that. I.e. we can represent CCMP_LT as CCMP + SETCC[LT]. The transformation done in this PR will automatically lower JTRUE(SETCC[LT]) into JCC[LT].
I think it's likely compare chains can end up being handled similarly to this, that is, something like AND(SETCC[cond], x) can be turned into some shape of CCMP + SETCC and then lowering will continue working on these successively (haven't worked out the details).
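As a rough illustration of the JTRUE(SETCC[cond]) to JCC[cond] step, here is a sketch over a hypothetical flat node list. `ToyNode` and `FoldSetccIntoJcc` are invented names, and the real lowering works on LIR ranges rather than a vector; this only shows the shape of the rewrite.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy LIR: a flat list of nodes in execution order.
enum class ToyOper { Cmp, CCmp, SetCC, JTrue, JCC };

struct ToyNode
{
    ToyOper     oper;
    std::string cond; // condition name; empty when the node carries none
};

// If the range ends in SETCC[cond] followed by JTRUE, replaces both with a
// single JCC[cond] that consumes the flags directly. Returns true on success.
bool FoldSetccIntoJcc(std::vector<ToyNode>& range)
{
    const std::size_t n = range.size();
    if (n < 2 || range[n - 1].oper != ToyOper::JTrue || range[n - 2].oper != ToyOper::SetCC)
    {
        return false;
    }
    const std::string cond = range[n - 2].cond;
    range.pop_back(); // drop JTRUE
    range.pop_back(); // drop SETCC
    range.push_back({ToyOper::JCC, cond});
    return true;
}
```

Applied to CMP, CCMP[GT], SETCC[LT], JTRUE, this leaves CMP, CCMP[GT], JCC[LT], with no per-condition CCMP node type needed in this toy model.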
With that scheme:
Coming from HIR we have:
JTRUE(NE(AND(LT, GT), 0))
OptimizeConstCompare() will lower to a TEST_EQ:
JTRUE(TEST_EQ(LT, GT))
Then lowerCompare() will create a CCMP, SETCC and CMP:
CMP
CCMP[GT]
JTRUE(SETCC[LT])
Finally LowerJtrue() will create the JCC:
CMP
CCMP[GT]
JCC[LT]
I'm happy with that setup.
It's not clear to me how much of the existing AND chains code will be left in codegen after this, but that can be checked later.
}
}

#ifdef TARGET_XARCH
X86 only - did you plan on doing this for the other architectures?
Yes, in a follow-up. As you can imagine it will be more involved to do for the compare chains. We might also need a new node type due to the "flags" immediate that is part of ccmp.
> Yes, in a follow-up. As you can imagine it will be more involved to do for the compare chains. We might also need a new node type due to the "flags" immediate that is part of ccmp.
Did you still plan on doing both SELECTCC for arm64 and compare chains to CCMP in a single PR?
I'm rebasing my AND chains, and it's going to conflict heavily if you're doing compare chains soon. I'm happy to hold off again if it's all in your plan.
Also, for CCMP nodes the cflags add a little bit of complexity. Consider:
cmp w0, #0
ccmp w1, #0, nzc, eq
cset x0, gt
The cflags nzc is created from the gt (inverting it then turning into flags). So that means the CCMP node needs to hold both the condition taken from the previous compare (eq) and its own condition now held by the cset (gt). Either CCMP would have to have two gtCondition fields, or we'd have to have CCMP_GT etc.
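To make the cflags point concrete, here is a small sketch that evaluates the six signed conditions against a 4-bit NZCV immediate and picks one immediate for which a given condition fails. The helper names (`Holds`, `FlagsThatFail`) are hypothetical, and the choice of the first failing pattern is an assumption of this sketch; the JIT may well pick a different immediate, such as the nzc in the example above.

```cpp
#include <cassert>
#include <cstdint>

// ARM64 ccmp takes a 4-bit flags immediate: N = bit 3, Z = bit 2, C = bit 1, V = bit 0.
enum class Cond { EQ, NE, LT, GE, GT, LE };

// Evaluates a condition code against an NZCV immediate.
bool Holds(Cond cond, std::uint8_t nzcv)
{
    const bool n = (nzcv & 8) != 0;
    const bool z = (nzcv & 4) != 0;
    const bool v = (nzcv & 1) != 0;
    switch (cond)
    {
        case Cond::EQ: return z;
        case Cond::NE: return !z;
        case Cond::LT: return n != v;
        case Cond::GE: return n == v;
        case Cond::GT: return !z && (n == v);
        case Cond::LE: return z || (n != v);
    }
    return false;
}

// Hypothetical helper: returns some NZCV immediate under which 'cond' is false.
// When the first compare of an AND chain fails, ccmp loads such flags so that
// the final condition also reads as false.
std::uint8_t FlagsThatFail(Cond cond)
{
    for (std::uint8_t nzcv = 0; nzcv < 16; nzcv++)
    {
        if (!Holds(cond, nzcv))
        {
            return nzcv;
        }
    }
    assert(!"every condition listed here has at least one failing pattern");
    return 0;
}
```

The immediate depends only on the condition tested after the ccmp (gt here), while the ccmp's own condition operand (eq) comes from the previous compare, which is why the node needs to carry both pieces of information.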
> Did you still plan on doing both SELECTCC for arm64 and compare chains to CCMP in a single PR?
Hmm, I suppose it will be necessary to change both at the same time. Let me prioritize working on this to unblock you and to avoid unnecessarily churning things multiple times.
> The cflags nzc is created from the gt (inverting it then turning into flags). So that means the CCMP node needs to hold both the condition taken from the previous compare (eq) and its own condition now held by the cset (gt).
Right, I think we can add a GenTreeCCMP node that has the flags immediate too. Or we could encode it as part of gtFlags. Or we could do what you said and have separate node types.
> Hmm, I suppose it will be necessary to change both at the same time. Let me prioritize working on this to unblock you and to avoid unnecessarily churning things multiple times.
Ta!
> Right, I think we can add a GenTreeCCMP node that has the flags immediate too. Or we could encode it as part of gtFlags. Or we could do what you said and have separate node types.
gtFlags feels bad because most of the flag space is in use, but I'm happy with any of the above.
LGTM
/azp run runtime-coreclr superpmi-diffs, runtime-coreclr superpmi-replay
Azure Pipelines successfully started running 2 pipeline(s).
Failure is #82397 and known timeouts.
GTNODE(BT , GenTreeOp ,0,(GTK_BINOP|GTK_NOVALUE|DBK_NOTHIR))
#endif
// Variant of SELECT that reuses flags computed by a previous node with the specified condition.
GTNODE(SELECTCC , GenTreeCC ,0,GTK_BINOP|DBK_NOTHIR)
Should SELECTCC here be a GenTreeOpCC ?
Yes, it should. Doesn't look like it has any impact since we only use this for some node measuring that can be turned on, but I will keep in mind to fix it in a follow-up.
Minor diffs expected; the additional cases are really not that common. The main benefit is that GT_SELECT is now fully on par with GT_JTRUE, and does not have any odd limitations. The code to handle conditions is completely shared.
cc @a74nh
Examples: [two before/after codegen comparisons, a diff for BitOperations.IsPow2, and an example LIR diff (base vs. diff); listings not included]