KEMBAR78
[Arm64] Replace pairs of `str` with `stp` by kunalspathak · Pull Request #85032 · dotnet/runtime · GitHub
Skip to content

Conversation

@kunalspathak
Copy link
Contributor

Optimize pair of str with stp along with tracking the gc information.

Contributes to #55365 and #77010
Fixes: #35134, #35133 and #81278

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 19, 2023
@ghost
Copy link

ghost commented Apr 19, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

Optimize pair of str with stp along with tracking the gc information.

Contributes to #55365 and #77010
Fixes: #35134, #35133 and #81278

Author: kunalspathak
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@kunalspathak
Copy link
Contributor Author

Diffs looks good (as expected):

image

I will run gcstress job to make sure there are no gc holes.

@kunalspathak
Copy link
Contributor Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kunalspathak
Copy link
Contributor Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@kunalspathak
Copy link
Contributor Author

gcstress is green with few known failures.

@kunalspathak kunalspathak marked this pull request as ready for review April 20, 2023 16:40
@kunalspathak
Copy link
Contributor Author

@dotnet/jit-contrib , @BruceForstall

cc: @a74nh @SwapnilGaikwad

@BruceForstall
Copy link
Contributor

Diffs

@BruceForstall
Copy link
Contributor

BruceForstall commented Apr 21, 2023

This diffs look great. Interesting there are some zero-size diffs where we convert str/stp to stp/str.

fwiw, I noticed a case maybe we could do better in libraries.crossgen2 on win-arm64:

16194.dasm - System.Reflection.TypeNameParser:GetType(System.String,bool,bool,System.Reflection.Assembly):System.Type
...
             strb    wzr, [fp, #0x38]
             strb    wzr, [fp, #0x39]
             strb    wzr, [fp, #0x3A]
             strb    wzr, [fp, #0x3B]
             strb    wzr, [fp, #0x3C]

=>

             str     wzr, [fp, #0x38]
             strb    wzr, [fp, #0x3C]

?

Copy link
Contributor

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you show any diffs (including GC diffs) where both addresses are GC lcl vars?

(immediate).
*/

void emitter::emitIns_SS_R_R_R_I(instruction ins,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not a variant or extension of emitIns_S_S_R_R?

In that case, no reg3 is passed because it is determined based on the lclvar. In this case, there are 2 lclvar. Presumably they would both have the same FPbased value, thus determining the same address base register. In fact, you could probably determine / assert the offsets are correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have referred the code from emitIns_R_R_R_I() instead which operates on vector registers as well. For example, it operates on scale of 2, 3 and 4, whereas emitIns_S_S_R_R() operates just on scale == 3. Also, it has special handling for useRegForAdr which I assume is just for the stack stores. And yes, the reg3 is always FP or SP.

@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 21, 2023
@BruceForstall
Copy link
Contributor

I presume the MinOpts TP regression is due to extra checks related to instrDesc size, and possibly due to fewer constants fitting in a small instrDesc.

@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 21, 2023
@kunalspathak
Copy link
Contributor Author

Can you show any diffs (including GC diffs) where both addresses are GC lcl vars?

Sure, this is one example 4997 in coreclr_test.mch of windows.arm64 where I missed the gc tracking, and fixed it in 92b67de.

Here is the diff. Of course, we need to fix #84504
image

@kunalspathak
Copy link
Contributor Author

I presume the MinOpts TP regression is due to extra checks related to instrDesc size, and possibly due to fewer constants fitting in a small instrDesc.

correct.

@kunalspathak
Copy link
Contributor Author

fwiw, I noticed a case maybe we could do better in libraries.crossgen2 on win-arm64:

That is an interesting pattern coming from accessing 4~5 bool fields that are next to each other.

private bool _throwOnError;
private bool _ignoreCase;
private bool _extensibleParser;
private bool _requireAssemblyQualifiedName;
private bool _suppressContextualReflectionContext;

Unfortunately, the popular peephole optimization that we do today just check the last instruction. We could use the infrastructure that we have recently developed to see last N instructions and combine them, but honestly, I don't know if removing last N instructions is well tested and works. I would expect this optimization to happen higher up in morph or something, but should be tracked as a separate issue, although I am not sure how many of such patterns exist.

regNumber reg3,
ssize_t imm,
int varx1,
int offs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

offs here is the offset within a local var. Since there are 2 lclvars, there should be two offsets: offs1, offs2.

Copy link
Contributor Author

@kunalspathak kunalspathak Apr 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2nd offset would be just offs + TARGET_POINTER_SIZE so we don't need to pass it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not correct. The offset is within the local, not from the base register.

@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Apr 21, 2023
@kunalspathak
Copy link
Contributor Author

@BruceForstall - this should be ready to review again.

@kunalspathak
Copy link
Contributor Author

Latest diffs:

image

Seems like some good TPdiffs because of consolidation of ldp and stp instructions:

image

@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Apr 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ARM64: Optimize pair of "str reg, [fp]" to stp

3 participants