KEMBAR78
Add simple Else conditions to If Conversion by a74nh · Pull Request #77728 · dotnet/runtime · GitHub
Skip to content

Conversation

@a74nh
Copy link
Contributor

@a74nh a74nh commented Nov 1, 2022

For example:
if (x < 7) { a = 5; } else { a = 9; }
a = (cond) ? b : c;

The else condition must write to the same variable as the then statement.

This patch builds on top of #73472

It refactors a lot of the code in the If Conversion pass. Various parts are split into new functions. Following the style of other passes, I put everything inside it's own OptIfConversionDsc class too.

Detection is slightly reworked - It now finds a valid flow and then checks all the statements inside that flow are valid. I think this is slightly better for performance.

The existing if conversion logic should be unchanged - all the checks and modifications are the same.

This does not catch GT_RETURN cases, for example:
return (a==b) ? 1 : 0
It should be possible to add that on top of this.

For example:
if (x < 7) { a = 5; } else { a = 9; }
a = (cond) ? b : c;

The else condition must write to the same variable as the then
statement.
@ghost ghost added community-contribution Indicates that the PR has been added by a community member area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Nov 1, 2022
@ghost
Copy link

ghost commented Nov 1, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

For example:
if (x < 7) { a = 5; } else { a = 9; }
a = (cond) ? b : c;

The else condition must write to the same variable as the then statement.

This patch builds on top of #73472

It refactors a lot of the code in the If Conversion pass. Various parts are split into new functions. Following the style of other passes, I put everything inside it's own OptIfConversionDsc class too.

Detection is slightly reworked - It now finds a valid flow and then checks all the statements inside that flow are valid. I think this is slightly better for performance.

The existing if conversion logic should be unchanged - all the checks and modifications are the same.

This does not catch GT_RETURN cases, for example:
return (a==b) ? 1 : 0
It should be possible to add that on top of this.

Author: a74nh
Assignees: -
Labels:

area-CodeGen-coreclr, community-contribution

Milestone: -

@a74nh
Copy link
Contributor Author

a74nh commented Nov 1, 2022

Note, due to #77723, after rebasing on top of the latest head, I was no longer able to build anything. All changes are contained within optIfConvert, so I'm not expecting any issues due to the rebase.

@a74nh
Copy link
Contributor Author

a74nh commented Nov 1, 2022

The way I've done the SSA feels a little like a fudge, but I wasn't sure how else to do it.

Before optimisation we have two assignments and one jtrue. This gives us two SSA definitions

JTRUE(EQ (A,B))
ASG X = 1   [ssa def 1]
ASG X = 0   [ssa def 2]

After the optimisation, we now just have a single assignment:

ASG X = SELECT(EQ(A,B),1,0)

But, we need two SSA definitions. Especially so given that the next block will probably be a PHI.
To work around this I created a dummy assignment:

ASG X = SELECT(EQ(A,B),1,0)   [ssa def 1]
ASG X = X                     [ssa def 2]

This feels wrong, but the second assignment is eventually optimised away to nothing.

The alternative would be to find all uses of SSA DEF 2 across all blocks and update them to use SSA DEF 1 ? Then somehow remove SSA DEF 2 ?

@a74nh
Copy link
Contributor Author

a74nh commented Nov 1, 2022

@kunalspathak, @jakobbotsch

@SingleAccretion
Copy link
Contributor

How to handle SSA updates

Since these are currently not possible - move the phase to run after range check and declare it to invalidate SSA?

@kunalspathak
Copy link
Contributor

@dotnet/jit-contrib

@a74nh
Copy link
Contributor Author

a74nh commented Nov 2, 2022

Since these are currently not possible - move the phase to run after range check and declare it to invalidate SSA?

I tried moving optIfConversion to be after optOptimizeBools and removing all my SSA fix ups (included the ones in HEAD). With that I get no test failures.

Does the SSA state not matter after a certain point?

Is there something specific I should be calling to state that the SSA is invalid?

@SingleAccretion
Copy link
Contributor

Does the SSA state not matter after a certain point?

Yes, after the last phase that uses it. Currently this is range check.

Is there something specific I should be calling to state that the SSA is invalid?

No, a comment in compCompile will suffice.

@a74nh
Copy link
Contributor Author

a74nh commented Nov 2, 2022

I've been having a quick look at GT_RETURN. It's going to require a little reworking of the code in the PR, but otherwise looks fairly simple to add in. To avoid people re-reviewing another rewrite of the same bits of code, I was thinking it's probably best to get it ready then merge it with this PR.
(Maybe close this PR in the meantime?)

@AndyAyersMS
Copy link
Member

Does the SSA state not matter after a certain point?

Yes, after the last phase that uses it. Currently this is range check.

Is there something specific I should be calling to state that the SSA is invalid?

No, a comment in compCompile will suffice.

For now you can use assert(!fgDomsComputed) as we explicitly reset this after the SSA phases run.

@a74nh a74nh marked this pull request as draft November 7, 2022 17:11
@a74nh a74nh marked this pull request as ready for review November 8, 2022 09:09
@a74nh
Copy link
Contributor Author

a74nh commented Nov 8, 2022

Some spmidiff results on Arm64 Linux:

[09:09:17] Total bytes of base: 16748340 (overridden on cmd)
[09:09:17] Total bytes of diff: 16741204 (overridden on cmd)
[09:09:17] Total bytes of delta: -7136 (-0.04 % of base)
[09:09:46] Total bytes of base: 154583348 (overridden on cmd)
[09:09:46] Total bytes of diff: 154515596 (overridden on cmd)
[09:09:46] Total bytes of delta: -67752 (-0.04 % of base)
[09:11:41] Total bytes of base: 29368716 (overridden on cmd)
[09:11:41] Total bytes of diff: 29350440 (overridden on cmd)
[09:11:41] Total bytes of delta: -18276 (-0.06 % of base)

Regressions look like just the usual issues around 0s. That should improve once we starting using XZR register for csel.

@a74nh
Copy link
Contributor Author

a74nh commented Nov 8, 2022

I'm happy with this patch now. It catches return cases, eg: return (a==b) ? 1 : 0. Also added a lot of DisasmCheck test cases for all supported if conversion cases. Pass has been moved much later, so that all SSA modifications could be removed from the pass.

One test failure, but that looks like just a spurious build failure.

Not happy with the IsPreferredRelop() function, ideally it would use the GenConditionDesc map, but that's in Codegen, and I wasn't sure if was ok be calling into codegen from a front end pass. Essentially, just need to make sure the GenConditionDesc created in genCodeForSelect only requires a single condition (which is now guarded with an assert).

@jakobbotsch
Copy link
Member

I'm happy with this patch now. It catches return cases, eg: return (a==b) ? 1 : 0. Also added a lot of DisasmCheck test cases for all supported if conversion cases. Pass has been moved much later, so that all SSA modifications could be removed from the pass.

One test failure, but that looks like just a spurious build failure.

Sounds great! I'll try to get to this soon. In the meantime I will run some testing.

Not happy with the IsPreferredRelop() function, ideally it would use the GenConditionDesc map, but that's in Codegen, and I wasn't sure if was ok be calling into codegen from a front end pass. Essentially, just need to make sure the GenConditionDesc created in genCodeForSelect only requires a single condition (which is now guarded with an assert).

I assume this is just heuristically, right? The backend should be able to handle GT_SELECT even in other cases.

@jakobbotsch
Copy link
Member

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@a74nh
Copy link
Contributor Author

a74nh commented Nov 16, 2022

It looks like we are missing setting GTF_ORDER_SIDEEFF in some earlier phase.

What's happening here is there is a IND, which much earlier is created from a ARR_ADDR plus BOUNDS_CHECK. At creation of the IND it decides the IND cannot fault (setting GTF_IND_NONFAULTING) - because directly before the IND there is a JTRUE/LE which essentially does the bounds check.

Later during If Conversion, it hoists the IND before the JTRUE/LE, meaning the IND always gets executed.

I don't think we can or would want to recalculate boundary checks during if conversion, so the best I can do with the flags available is:

// Don't believe INDs cannot fault - the if conversion may move it before a check.
if (op2->OperIsIndir() && (op2->gtFlags & GTF_IND_NONFAULTING) != 0)
{
    return false;
}

Incidentally, I think boundary checks always have an else statement, which is why it's only coming up in this patch.

@jakobbotsch
Copy link
Member

jakobbotsch commented Nov 16, 2022

I don't think we can or would want to recalculate boundary checks during if conversion, so the best I can do with the flags available is:

I think the fix should be to make sure we set GTF_ORDER_SIDEEFF on the GT_IND node when we do that optimization. Can you submit a separate PR for it?

@a74nh
Copy link
Contributor Author

a74nh commented Nov 16, 2022

I think the fix should be to make sure we set GTF_ORDER_SIDEEFF on the GT_IND node when we do that optimization. Can you submit a separate PR for it?

Yes, I think that'd work. Not sure if a failing test case on main is possible - will give it a go.

@a74nh
Copy link
Contributor Author

a74nh commented Nov 29, 2022

Ok, so.... Patch has been rebased after #78698, all review comments should be covered and all the tests are passing.

@jakobbotsch
Copy link
Member

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch
Copy link
Member

The stress failures so far look like #78912, #78909, #78898

@a74nh
Copy link
Contributor Author

a74nh commented Nov 29, 2022

The stress failures so far look like #78912, #78909, #78898

Was just looking at these - then realised they are all X86 failures, so shouldn't be connected to this patch.

@jakobbotsch
Copy link
Member

I looked at some of the diffs:

8 (13.33 % of base) : 51635.dasm - IfStatements.IfStatements:AndInner(int,int):

-G_M14134_IG02:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
-            tbnz    w0, #0, G_M14134_IG04
-                                               ;; size=4 bbWeight=1    PerfScore 1.00
-G_M14134_IG03:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            mov     w2, #5
-            tst     w1, #1
-            csel    w0, w2, w0, eq
-                                               ;; size=12 bbWeight=0.50 PerfScore 0.75
-G_M14134_IG04:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+G_M14134_IG02:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+            and     w2, w0, #1
+            and     w3, w1, #1
+            orr     w2, w2, w3
+            mov     w3, #5
+            cmp     w2, #0
+            csel    w0, w3, w0, eq

Let's keep an eye on this one in the benchmarks.
(Side note, not related to this PR: ideally we would have some kind of peephole to get rid of the cmp in favor of orrs here)

8 (3.33 % of base) : 48815.dasm - LinqBenchmarks+<>c:<Where01LinqQueryX>b__7_0(Product):bool:this:

-            ldr     x0, [fp, #0x28]	// [V20 tmp18]
-            ldr     w1, [fp, #0x24]	// [V19 tmp17]
+            ldr     x0, [fp, #0x30]	// [V20 tmp18]
+            ldr     w1, [fp, #0x2C]	// [V19 tmp17]
             orr     x0, x0, x1
-            cbnz    x0, G_M58282_IG04
-            mov     w2, wzr
-            b       G_M58282_IG08
-						;; size=84 bbWeight=0.50 PerfScore 13.50
-G_M58282_IG04:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            ldr     w2, [fp, #0x20]	// [V18 tmp16]
-            asr     w0, w2, #31
-            orr     w2, w0, #1
-            b       G_M58282_IG08
+            ldr     w1, [fp, #0x28]	// [V18 tmp16]
+            asr     w1, w1, #31
+            orr     w1, w1, #1
+            cmp     x0, #0
+            csel    w19, w1, wzr, ne
+            b       G_M58282_IG07
+						;; size=96 bbWeight=0.50 PerfScore 15.00

Looks pretty good, the regression comes from different (worse) register allocation.

On the opposite end:

         -24 (-37.50 % of base) : 3407.dasm - System.Numerics.INumber`1[ushort]:Min(ushort,ushort):ushort
         -16 (-36.36 % of base) : 13042.dasm - System.Xml.XmlElement:get_LastNode():System.Xml.XmlLinkedNode:this
         -12 (-33.33 % of base) : 2804.dasm - System.Math:Max(int,int):int
         -12 (-33.33 % of base) : 2806.dasm - System.Math:Max(uint,uint):uint
         -12 (-33.33 % of base) : 2579.dasm - System.Math:Min(int,int):int
         -12 (-33.33 % of base) : 2805.dasm - System.Math:Min(uint,uint):uint

We should keep an eye on benchmarks that may be impacted by these (cc @dotnet/jit-contrib), but overall it seems great to me to use conditional moves for these primitive methods. The diffs look like:
-24 (-37.50 % of base) : 3407.dasm - System.Numerics.INumber1[ushort]:Min(ushort,ushort):ushort`:

 						;; size=8 bbWeight=1    PerfScore 1.50
-G_M1277_IG02:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
+G_M1277_IG02:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             uxth    w0, w0
             uxth    w1, w1
             cmp     w0, w1
-            beq     G_M1277_IG06
-						;; size=16 bbWeight=1    PerfScore 2.50
-G_M1277_IG03:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
+            csel    w2, w0, w1, lt
             cmp     w0, w1
-            blt     G_M1277_IG05
-            mov     w0, w1
-						;; size=12 bbWeight=0.50 PerfScore 1.00
-G_M1277_IG04:        ; , epilog, nogc, extend
+            csel    w0, w1, w2, eq
+						;; size=24 bbWeight=1    PerfScore 3.00
+G_M1277_IG03:        ; , epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
-						;; size=8 bbWeight=0.50 PerfScore 1.00
-G_M1277_IG05:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
-            ldp     fp, lr, [sp], #0x10
-            ret     lr
-						;; size=8 bbWeight=0.50 PerfScore 1.00
-G_M1277_IG06:        ; gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
-            mov     w0, w1
-						;; size=4 bbWeight=0.50 PerfScore 0.25
-G_M1277_IG07:        ; , epilog, nogc, extend
-            ldp     fp, lr, [sp], #0x10
-            ret     lr
-						;; size=8 bbWeight=0.50 PerfScore 1.00
+						;; size=8 bbWeight=1    PerfScore 2.00

The double compare looks like an instance of #78385.

-12 (-33.33 % of base) : 2804.dasm - System.Math:Max(int,int):int:

-G_M56171_IG01:        ; gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, nogc <-- Prolog IG
+G_M56171_IG01:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
             stp     fp, lr, [sp, #-0x10]!
             mov     fp, sp
 						;; size=8 bbWeight=1    PerfScore 1.50
-G_M56171_IG02:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
+G_M56171_IG02:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             cmp     w0, w1
-            bge     G_M56171_IG05
-						;; size=8 bbWeight=1    PerfScore 1.50
-G_M56171_IG03:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            mov     w0, w1
-						;; size=4 bbWeight=0.50 PerfScore 0.25
-G_M56171_IG04:        ; , epilog, nogc, extend
+            csel    w0, w0, w1, ge
+						;; size=8 bbWeight=1    PerfScore 1.00
+G_M56171_IG03:        ; , epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
-						;; size=8 bbWeight=0.50 PerfScore 1.00
-G_M56171_IG05:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
-            ldp     fp, lr, [sp], #0x10
-            ret     lr
-						;; size=8 bbWeight=0.50 PerfScore 1.00
+						;; size=8 bbWeight=1    PerfScore 2.00

Another one:
-12 (-30.00 % of base) : 3496.dasm - System.Text.ASCIIUtility:FirstCharInUInt32IsAscii(uint):bool:

-G_M6195_IG01:        ; gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, nogc <-- Prolog IG
+G_M6195_IG01:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
             stp     fp, lr, [sp, #-0x10]!
             mov     fp, sp
 						;; size=8 bbWeight=1    PerfScore 1.50
-G_M6195_IG02:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
+G_M6195_IG02:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+            mov     w1, #1
             tst     w0, #0xD1FFAB1E
-            beq     G_M6195_IG05
-						;; size=8 bbWeight=1    PerfScore 1.50
-G_M6195_IG03:        ; gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            mov     w0, wzr
-						;; size=4 bbWeight=0.50 PerfScore 0.25
-G_M6195_IG04:        ; , epilog, nogc, extend
+            csel    w0, w1, wzr, eq
+						;; size=12 bbWeight=1    PerfScore 1.50
+G_M6195_IG03:        ; , epilog, nogc, extend
             ldp     fp, lr, [sp], #0x10
             ret     lr
-						;; size=8 bbWeight=0.50 PerfScore 1.00
-G_M6195_IG05:        ; gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
-            mov     w0, #1
-						;; size=4 bbWeight=0.50 PerfScore 0.25
-G_M6195_IG06:        ; , epilog, nogc, extend
-            ldp     fp, lr, [sp], #0x10
-            ret     lr
-						;; size=8 bbWeight=0.50 PerfScore 1.00
+						;; size=8 bbWeight=1    PerfScore 2.00

Given the heuristics on if-conversion around loops there's probably not much to worry about here, and overall the diffs look great.

@jakobbotsch jakobbotsch merged commit 5356857 into dotnet:main Nov 30, 2022
@jakobbotsch
Copy link
Member

Thanks again!

@a74nh a74nh deleted the github_a74nh_csel_optimizer_else branch November 30, 2022 10:55
@ghost ghost locked as resolved and limited conversation to collaborators Dec 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants