KEMBAR78
ARM64: Optimize Volatile.Read/Write for floats by EgorBo · Pull Request #101359 · dotnet/runtime · GitHub
Skip to content

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Apr 22, 2024

Closes #67254

This PR eliminates explicit full/load memory barriers for loads and stores for floating points. Example:

double _location = 0;

double Read() => Volatile.Read(ref _location);
void Write(double val) => Volatile.Write(ref _location, val);
; Method Tests:Read():double:this (FullOpts)
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
            add     x0, x0, #8
-           ldr     d0, [x0]
-           dmb     ishld
+           ldar    x0, [x0]
+           fmov    d0, x0
            ldp     fp, lr, [sp], #0x10
            ret     lr
; Total bytes of code: 28


; Method Tests:Write(double):this (FullOpts)
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
            add     x0, x0, #8
-           dmb     ish
-           str     d0, [x0]
+           mov     x1, v0.d[0]
+           stlr    x1, [x0]
            ldp     fp, lr, [sp], #0x10
            ret     lr
; Total bytes of code: 28

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 22, 2024
@EgorBo EgorBo marked this pull request as ready for review April 30, 2024 16:34
@EgorBo
Copy link
Member Author

EgorBo commented Apr 30, 2024

PTAL @kunalspathak @TIHan @VSadov @dotnet/jit-contrib

Mostly text diffs since the codegen size is the same. A couple of regressions because ldr + dmb can encode addressing modes, but when we switch to ldapr - we no longer can contain it, we can do it only with armv8.4 LDAPUR, but our SPMI collections don't have this ISA

@EgorBo
Copy link
Member Author

EgorBo commented May 2, 2024

ping @kunalspathak @TIHan

@kunalspathak
Copy link
Contributor

Not sure if I understand this...in the asmdiff above, we are loading/storing int value instead of float value?

-           ldr     d0, [x0]
-           dmb     ishld
+           ldar    x0, [x0]
+           fmov    d0, x0

@EgorBo
Copy link
Member Author

EgorBo commented May 2, 2024

@kunalspathak

Not sure if I understand this...in the asmdiff above, we are loading/storing int value instead of float value?

-           ldr     d0, [x0]
-           dmb     ishld
+           ldar    x0, [x0]
+           fmov    d0, x0

So

-           ldr     d0, [x0]
-           dmb     ishld

means "let's load a 4-byte float from x0 memory directly into simd(float) reg". Since there is no acq-release kind of ldr (you can't do ldar d0, [x0]) for SIMD destination, we have to emit an explicit memory barrier.

So in order to workaround it, we do this 4-byte load into a GPR reg (int) first, so we can avoid emitting a memory barrier since ldar gives us the needed semantics already, and then we just move it to a float reg, hence, fmov d0, x0

Effectively, we do:

float foo;

float LoadVolatile()
{
    return Unsafe.BitCast<int, float>
        (Volatile.Read(
            ref Unsafe.As<float, int>(ref foo)));
}

@EgorBo
Copy link
Member Author

EgorBo commented May 2, 2024

Native compiles use the same trick, e.g.: https://godbolt.org/z/KKTh79xbc

Copy link
Contributor

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@VSadov VSadov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! LGTM.

@EgorBo EgorBo merged commit 5da5873 into dotnet:main May 2, 2024
@EgorBo EgorBo deleted the volatile-stores-loads-floats branch May 2, 2024 19:18
@EgorBo
Copy link
Member Author

EgorBo commented May 9, 2024

michaelgsharp pushed a commit to michaelgsharp/runtime that referenced this pull request May 9, 2024
Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this pull request May 30, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jun 9, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[arm64] JIT: Volatile.Write emits full memory barrier for floats

3 participants