Closed
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
Description
When the JIT unrolls memset/memcpy, it makes suboptimal decisions for certain sizes, e.g. to memset 30 bytes:
    unsafe struct MyStruct
    {
        fixed byte a[30];
    }

    MyStruct Test()
    {
        MyStruct s = default;
        return s;
    }

Current codegen:

    xor      eax, eax
    vxorps   xmm0, xmm0
    vmovdqu  xmmword ptr [rdx], xmm0
    mov      qword ptr [rdx+10H], rax
    mov      qword ptr [rdx+16H], rax

So, to zero 30 bytes, it uses a GPR twice (two 8-byte stores at 10H and 16H that overlap each other by two bytes). It's better to keep using SIMD and overlap with the previously zeroed part:
    vxorps   xmm0, xmm0, xmm0
    vmovups  xmmword ptr [rdx+14], xmm0
    vmovups  xmmword ptr [rdx], xmm0

Etc. for other sizes.
PS: it seems that arm64 is doing the right thing here
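To make the suggestion concrete for sizes other than 30, here is a minimal sketch of the overlapping-store idea in C++ (the language the JIT is written in). It is not the JIT's actual implementation: `Store`, `PlanOverlappingStores`, and `ApplyPlan` are hypothetical names, the 16-byte vector width is an assumption, and `memset` merely stands in for each emitted store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One store in an unrolled memset: `width` zero bytes written at `offset`.
struct Store {
    std::size_t offset;
    std::size_t width;
};

// Hypothetical planner (an illustration, not the JIT's code): cover `size`
// bytes using only `simdWidth`-byte stores. The last store is allowed to
// overlap bytes that were already zeroed, instead of falling back to
// narrower GPR stores for the remainder.
std::vector<Store> PlanOverlappingStores(std::size_t size,
                                         std::size_t simdWidth = 16) {
    // Sizes smaller than one vector would need a scalar path (not shown).
    assert(size >= simdWidth);

    std::vector<Store> plan;
    std::size_t offset = 0;

    // Full, non-overlapping vector stores.
    for (; offset + simdWidth <= size; offset += simdWidth) {
        plan.push_back({offset, simdWidth});
    }

    // Remainder: one more vector store that ends exactly at `size`.
    if (offset < size) {
        plan.push_back({size - simdWidth, simdWidth});
    }
    return plan;
}

// Applies a plan to a buffer; memset stands in for each emitted store.
void ApplyPlan(std::uint8_t* dst, const std::vector<Store>& plan) {
    for (const Store& s : plan) {
        std::memset(dst + s.offset, 0, s.width);
    }
}
```

For `size = 30` the plan is two 16-byte stores at offsets 0 and 14, which corresponds to the two `vmovups` instructions suggested above. Sizes that are an exact multiple of the vector width get no overlapping store at all.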