KEMBAR78
Arm64: Add If conversion pass by a74nh · Pull Request #73472 · dotnet/runtime · GitHub
Skip to content

Conversation

a74nh
Copy link
Contributor

@a74nh a74nh commented Aug 5, 2022

This patch builds upon the work in #71705

A single condition can be optimised by replacing a JTRUE
with a SELECT and removing the control flow.

For example:

if (op1 > 0) { op1 = 5; }
IN0001: 000008                    mov     w2, dotnet/perf-autofiling-issues#5
IN0002: 00000C                    cmp     w0, #0
IN0003: 000010                    csel    w0, w0, w2, eq

Additional JTRUE statements can be replaced by placing AND
chains onto the conditional part of a SELECT.

For example:

if (op1 > 3 && op2 != 10 && op3 < 7) { op1 = 5; }
IN0001: 000008                    cmp     w2, dotnet/runtime#56402
IN0002: 00000C                    ccmp    w1, dotnet/perf-autofiling-issues#10, z, lt
IN0003: 000010                    ccmp    w0, dotnet/perf-autofiling-issues#3, nzc, ne
IN0004: 000014                    mov     w2, dotnet/perf-autofiling-issues#5
IN0005: 000018                    csel    w0, w2, w0, gt

Performance tests are here: dotnet/performance#2517

@ghost ghost added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member labels Aug 5, 2022
@ghost
Copy link

ghost commented Aug 5, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This patch builds upon the work in #71705

A single condition can be optimised by replacing a JTRUE
with a SELECT and removing the control flow.

For example:

if (op1 > 0) { op1 = 5; }
IN0001: 000008                    mov     w2, dotnet/perf-autofiling-issues#5
IN0002: 00000C                    cmp     w0, #0
IN0003: 000010                    csel    w0, w0, w2, eq

Additional JTRUE statements can be replaced by placing AND
chains onto the conditional part of a SELECT.

For example:

if (op1 > 3 && op2 != 10 && op3 < 7) { op1 = 5; }
IN0001: 000008                    cmp     w2, dotnet/runtime#56402
IN0002: 00000C                    ccmp    w1, dotnet/perf-autofiling-issues#10, z, lt
IN0003: 000010                    ccmp    w0, dotnet/perf-autofiling-issues#3, nzc, ne
IN0004: 000014                    mov     w2, dotnet/perf-autofiling-issues#5
IN0005: 000018                    csel    w0, w2, w0, gt

Performance tests are here: dotnet/performance#2517

Author: a74nh
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@a74nh
Copy link
Contributor Author

a74nh commented Aug 5, 2022

The first commit in this PR is a copy of #71705. Once that PR is merged I will remove that commit from this patch. This was just so I could get this out a little earlier.

@kunalspathak @jakobbotsch

Also, this probably needs an additional check in the optimizer pass to limit the max size of the chains.

@a74nh
Copy link
Contributor Author

a74nh commented Aug 10, 2022

PR is now rebased on top of HEAD, and is good for a review. Taking this out of draft.

I do have the one failure being hit a few times in the testsuite - debugging this now. However, I'm not expecting that to make big changes to the patch.

I'm away for two weeks from Monday. Getting this in before then feels unlikely. However, it'd be good to know how close it is.

@kunalspathak @jakobbotsch

@a74nh a74nh marked this pull request as ready for review August 10, 2022 11:09
Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the logic looks right on the surface to me, but I would like a review of the flow checks/changes from @AndyAyersMS.

The current implementation looks quite limited, seems like we could get more out of it, I've left a few comments about that. I'm ok with leaving that for future PRs.

The diffs look like size wise regressions, why is that? I would expect this pattern should give smaller code in most cases.

The code needs to be updated to follow our code style, in particular camelCase for variables and parentheses around all binary operations.

I'm away for two weeks from Monday. Getting this in before then feels unlikely. However, it'd be good to know how close it is.

Yes, I think it's unlikely. We will also need to do benchmarking and check regressions, plus run extended testing, and doing all of that before Friday seems unlikely.

Copy link
Contributor

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @jakobbotsch that this is unlikely to get merged by Friday provided code size diff is a concern and we want to see performance improvements on some small microbenchmarks. Our .NET 8 branch should open up in couple of weeks and we should merge this PR at that point. The previous related PRs still is giving us good code coverage and is part of .NET 7.

@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Aug 10, 2022
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For min/max reductions or sort inner loops seems like we might want to also if convert?

Also perhaps some idea of how costly it might be to evaluate the conditional input eagerly -- we are limited to just one tree so perhaps some cap on GetCostEx for that input.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a74nh are you working on this? (Note that GetCostSz() below does not capture the fact that the optimization changes some code to be eagerly evaluated, as Andy points out)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was avoiding using this as loops as csel can be slower inside loops. Specifically, it's slower when the loop iterator is being used as an input to the condition (therefore blocking it from executing). I'm not sure exactly how we would test for that.

@a74nh
Copy link
Contributor Author

a74nh commented Aug 11, 2022

The diffs look like size wise regressions, why is that? I would expect this pattern should give smaller code in most cases.

For all the cases where the code is longer that will be due to no longer using cbz/cbnz instructions:

cbnz x0, LABEL1
mov x1, 5
LABEL1:

now becomes:

cmp x0, 0
mov x2, 5
csel x1, x2, x1, eq

Which is one instruction longer. Assuming the branch taken/not taken is fairly random, then the new code will be faster. I should probably extend dotnet/performance#2517 to make sure those cases are being tested.

@kunalspathak kunalspathak added this to the 8.0.0 milestone Aug 11, 2022
@jakobbotsch
Copy link
Member

Assuming the branch taken/not taken is fairly random, then the new code will be faster.

I'm not sure this is a fair assumption in practice. Can you post some micro benchmarks for all cases (both randomly taken branches and biased branches)?

@jakobbotsch
Copy link
Member

/azp run Fuzzlyn

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch
Copy link
Member

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr gcstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch
Copy link
Member

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch
Copy link
Member

I'll take a look at the failures and see if they look related to this PR.

@jakobbotsch
Copy link
Member

Most of the failures looked to be about installing the .NET SDK. I've rerun them, but if they fail again I will just merge, since this is an arm64 change I think we had sufficient coverage in the arm64 jobs that actually were able to run.

@kunalspathak
Copy link
Contributor

Do you mind adding some disasm-checks tests as well?

@a74nh
Copy link
Contributor Author

a74nh commented Oct 31, 2022

Do you mind adding some disasm-checks tests as well?

Sure, happy to do that. If @jakobbotsch plans on merging this, then I could add it as a new PR.

@jakobbotsch
Copy link
Member

Sure, happy to do that. If @jakobbotsch plans on merging this, then I could add it as a new PR.

I'm ok with doing this in a follow-up PR (for example, @EgorBo and I are hoping for a follow up to handle ternaries that it could be added as part of :-) ). Does that sound fine to you @kunalspathak?

@a74nh
Copy link
Contributor Author

a74nh commented Oct 31, 2022

I'm ok with doing this in a follow-up PR (for example, @EgorBo and I are hoping for a follow up to handle ternaries that it could be added as part of :-) ). Does that sound fine to you @kunalspathak?

That patch is sat ready awaiting this :) Quite a bit of refactoring, but all the changes are in the one function/file.

@kunalspathak
Copy link
Contributor

Does that sound fine to you @kunalspathak?

Sounds good to me. I will go through the PR today (since it is awaiting my respond).

Copy link
Contributor

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the valuable feedback @jakobbotsch and thanks for your patience @a74nh .

//-----------------------------------------------------------------------------
// optIfConvert
//
// Find blocks representing simple if statements represented by conditional jumps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the wonderful summary.

{
#ifndef TARGET_ARM64
return false;
#else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#else
#endif

nit: No need of #else

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might give warnings due to unreachable code?
In any case, let's leave it for the follow-up PR so we don't need to rerun the CI.


// Reverse iterate through the blocks.
BasicBlock* block = fgLastBB;
while (block != nullptr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, but just a thought - Is there a quick way to check if the block doesn't definitely have conditional in the end that can be optimized? I am trying to see if have information that we can do a quick check here to eliminate calls to optIfConvert() and doing analysis only to find out that block cannot be optimized with if-convert.

@kunalspathak
Copy link
Contributor

System.Transactions.Tests.OleTxTests.ImplicitDistributedTransactions_cannot_be_changed_after_being_read_as_true seems from this PR?

superpmi-diff

Preliminerary scan seems an infra issue:

C:\h\w\AD3B0987\p\superpmi.py:600: DeprecationWarning: There is no current event loop

  loop = asyncio.get_event_loop()

Traceback (most recent call last):

  File "C:\h\w\AD3B0987\p\superpmi.py", line 1720, in create_one_artifact

    with open(item_path, 'r') as file_handle:

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\h\\w\\AD3B0987\\p\\artifacts\\spmi\\asm.coreclr_tests.run.Linux.arm64.checked\\base\\209719.dasm'



The above exception was the direct cause of the following exception:



Traceback (most recent call last):

  File "C:\h\w\AD3B0987\p\superpmi.py", line 4451, in <module>

    sys.exit(main(args))

  File "C:\h\w\AD3B0987\p\superpmi.py", line 4342, in main

    success = asm_diffs.replay_with_asm_diffs()

  File "C:\h\w\AD3B0987\p\superpmi.py", line 1736, in replay_with_asm_diffs

    subproc_helper.run_to_completion(create_replay_artifacts, self, mch_file, asm_dotnet_vars_full_env, text_differences, base_asm_location, diff_asm_location, ".dasm")

  File "C:\h\w\AD3B0987\p\superpmi.py", line 601, in run_to_completion

    loop.run_until_complete(self.__run_to_completion__(async_callback, *extra_args))

  File "C:\python3\lib\asyncio\base_events.py", line 649, in run_until_complete

    return future.result()

  File "C:\h\w\AD3B0987\p\superpmi.py", line 584, in __run_to_completion__

    await asyncio.gather(*tasks)

  File "C:\h\w\AD3B0987\p\superpmi.py", line 556, in __get_item__

    await async_callback(print_prefix, item, *extra_args)

  File "C:\h\w\AD3B0987\p\superpmi.py", line 1726, in create_replay_artifacts

    base_txt = await create_one_artifact(self.base_jit_path, base_location, flags + base_option_flags_for_diff_artifact)

  File "C:\h\w\AD3B0987\p\superpmi.py", line 1723, in create_one_artifact

    raise create_exception() from err

Exception: Failure while creating JitStdOutFile.

Exit code: 0

@jakobbotsch
Copy link
Member

jakobbotsch commented Nov 1, 2022

System.Transactions.Tests.OleTxTests.ImplicitDistributedTransactions_cannot_be_changed_after_being_read_as_true seems from this PR?

This one is #77241.

superpmi-diff

Preliminerary scan seems an infra issue:

Yeah, it's not related to this PR. It was fixed by #77504 but needed a new collection after that PR for the fix to kick in. It should work on a rerun now that we have new collections.

@jakobbotsch
Copy link
Member

/azp run runtime-coreclr superpmi-diffs

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch
Copy link
Member

Ah, well now superpmi-diffs conflicted with the JIT-EE GUID update in #77593 :-) ok, well this is ready anyway so I will merge it.
For posterity, the last fully successful diffs is at https://dev.azure.com/dnceng-public/public/_build/results?buildId=64480&view=ms.vss-build-web.run-extensions-tab.

Thanks for working with us on this @a74nh!

@jakobbotsch jakobbotsch merged commit 33be29e into dotnet:main Nov 1, 2022
@a74nh a74nh deleted the github_a74nh_csel_optimizer branch November 1, 2022 13:05
@AndyAyersMS
Copy link
Member

No, I don't think any were transferred.

@kunalspathak
Copy link
Contributor

#77984

@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants