JIT: better addressing mode for floating point on arm64 by EgorBo · Pull Request #65468 · dotnet/runtime · GitHub

Conversation

@EgorBo
Member

@EgorBo EgorBo commented Feb 16, 2022

Closes #64819

float Test(float[] arr, int i) => arr[i];

codegen diff:

G_M60861_IG01:
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
G_M60861_IG02:
            ldr     w0, [x1,#8]
            cmp     w2, w0
            bhs     G_M60861_IG04
-           ubfiz   x0, x2, #2, #32
-           add     x0, x0, #16
-           ldr     s0, [x1, x0]
+           add     x0, x1, #16
+           ldr     s0, [x0, w2, UXTW #2]
G_M60861_IG03:
            ldp     fp, lr, [sp],#16
            ret     lr
G_M60861_IG04:
            bl      CORINFO_HELP_RNGCHKFAIL
            brk_windows #0
-; Total bytes of code: 48
+; Total bytes of code: 44
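
For reference, the new addressing mode [x0, w2, UXTW #2] zero-extends the 32-bit index in w2 and shifts it left by 2 (the float element size) as part of the load itself, which is what lets the separate ubfiz + add pair disappear. A rough unsafe-C# sketch of the two address computations (hypothetical names, for illustration only; the JIT emits the machine code directly):

static unsafe class AddressingSketch
{
    // "Before": the scaled index and the constant data offset are
    // materialized with separate instructions (ubfiz + add), then the
    // result is used as a plain register offset in the load.
    static float LoadBefore(byte* arr, int i)
    {
        ulong offset = ((ulong)(uint)i << 2) + 16; // ubfiz; add
        return *(float*)(arr + offset);            // ldr s0, [x1, x0]
    }

    // "After": the base + constant data offset is formed once, and the
    // zero-extension and scaling fold into the load's addressing mode.
    static float LoadAfter(byte* arr, int i)
    {
        byte* data = arr + 16;                          // add x0, x1, #16
        return *(float*)(data + ((ulong)(uint)i << 2)); // ldr s0, [x0, w2, UXTW #2]
    }
}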

Diffs

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 16, 2022
@ghost ghost assigned EgorBo Feb 16, 2022
@ghost

ghost commented Feb 16, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

@EgorBo EgorBo changed the title Add addressing mode support for floats JIT: better addressing mode for floating point on arm64 Feb 16, 2022
@EgorBo
Copy link
Member Author

EgorBo commented Feb 16, 2022

@kunalspathak PTAL

Contributor

@kunalspathak kunalspathak left a comment


LGTM

@kunalspathak
Contributor

It seems there are quite a few regressions, e.g. linux arm64 coreclr_tests. Can you double-check?

          52 (3.94 % of base) : 249396.dasm - Benchstone.BenchF.LLoops:Init():this
          48 (6.22 % of base) : 161900.dasm - testout1:Sub_Funclet_303():double
          44 (3.63 % of base) : 237914.dasm - SciMark2.LU:factor(System.Double[][],System.Int32[]):int

@EgorBo
Member Author

EgorBo commented Feb 17, 2022

I pushed one more change and it made the diffs much better.
Unfortunately, the way addressing modes currently work on ARM64 will always produce some regressions, because we have to decide early which shape we make CSE-friendly:

1) ArrayRef + (ConstDataOffset + Index)
2) (ArrayRef + ConstDataOffset) + Index

(We don't care about this on x64 because the whole expression can be contained.)

The 2nd is usually better for ARM64, but there are cases where it's not, e.g.:

static double[] x;
static double[] y;

double foo(int i)
{
    return x[i] + y[i];
}

Here we access two different arrays with the same index, so we'd rather CSE (ConstDataOffset + Index).
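
To make the trade-off concrete, here is a rough unsafe-C# model of the two shapes (hypothetical code; the JIT works on IR trees, not source like this):

static unsafe class CseShapes
{
    // Shape 1: ArrayRef + (ConstDataOffset + Index).
    // The (ConstDataOffset + Index) part is the CSE candidate, so it can
    // be shared when different arrays are accessed with the same index.
    static double Shape1(byte* x, byte* y, int i)
    {
        ulong offset = 16 + ((ulong)(uint)i << 3); // shared subexpression
        return *(double*)(x + offset) + *(double*)(y + offset);
    }

    // Shape 2: (ArrayRef + ConstDataOffset) + Index.
    // The (ArrayRef + ConstDataOffset) part is the CSE candidate, so it
    // can be shared when the same array is accessed with different
    // indices, and the scaled index folds into the load on ARM64.
    static double Shape2(byte* x, int i)
    {
        byte* data = x + 16; // shared subexpression
        return *(double*)(data + ((ulong)(uint)i << 3))
             + *(double*)(data + ((ulong)(uint)(i + 1) << 3));
    }
}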

I propose we merge this as-is (all collections except tests show improvements) and then think of a better way to address this problem; most likely it needs a special "Reassociate" phase or special-casing in CSE.

Non-floating-point addressing modes are also affected; I'll file an issue with more details.

@kunalspathak
Contributor

I propose we merge this as-is (all collections except tests show improvements)

Sounds good to me. Can you share what the regressions look like?

@EgorBo
Member Author

EgorBo commented Feb 17, 2022

I propose we merge this as-is (all collections except tests show improvements)

Sounds good to me. Can you share what the regressions look like?

Sure, here are both the good and bad cases:

static double[] x;
static double[] y;

static double Bad(int i) 
    => x[i] + y[i]; // (ArrayRef + ConstDataOffset) cannot be CSE'd, only the index can

static double Good(int i) 
    => x[i] + x[i + 1]; // (ArrayRef + ConstDataOffset) can be CSE'd, the index cannot

Diff: https://www.diffchecker.com/voeWEctL
We just guess which shape is better to expose to CSE. I don't think we can guess correctly in morph at all, so this all needs more thought and a more complex optimization.

Btw, other primitives are also affected

@kunalspathak
Contributor

Btw, other primitives are also affected

You mean with this change other primitives are also affected?

@EgorBo
Member Author

EgorBo commented Feb 17, 2022

Btw, other primitives are also affected

You mean with this change other primitives are also affected?

No, I mean other primitives already take the "prefer CSE for (ArrayRef + ConstDataOffset)" path; this PR basically aligns the behavior of floats with the other primitives.

@kunalspathak
Contributor

Makes sense.
