Suboptimal codegen for memset/memcpy unrolling · Issue #83277 · dotnet/runtime · GitHub
Suboptimal codegen for memset/memcpy unrolling #83277

@EgorBo

Description

When the JIT unrolls memset/memcpy, it makes suboptimal decisions for certain sizes. For example, to memset 30 bytes:

unsafe struct MyStruct
{
    // fixed-size buffers are only allowed in an unsafe struct
    fixed byte a[30];
}

MyStruct Test()
{
    MyStruct s = default;
    return s;
}

Current codegen:

  xor      eax, eax
  vxorps   xmm0, xmm0
  vmovdqu  xmmword ptr [rdx], xmm0
  mov      qword ptr [rdx+10H], rax
  mov      qword ptr [rdx+16H], rax

So to zero 30 bytes it uses one 16-byte SIMD store followed by two 8-byte GPR stores. It would be better to keep using SIMD and overlap the second store with the previously zeroed part:

  vxorps   xmm0, xmm0, xmm0
  vmovups  xmmword ptr [rdx+0EH], xmm0
  vmovups  xmmword ptr [rdx], xmm0
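
For readers who want to see the overlapping-store idea outside of disassembly, here is a minimal hand-written C# sketch of the same two stores. It is not the JIT's implementation. The `OverlappedZero` class and `Zero30` method are hypothetical names chosen for illustration, and the sketch assumes a runtime where `System.Runtime.Intrinsics.Vector128<T>` is available.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

static class OverlappedZero
{
    // Hypothetical helper: zeroes the first 30 bytes of dst with two 16-byte stores.
    public static void Zero30(Span<byte> dst)
    {
        if (dst.Length < 30)
            throw new ArgumentException("need at least 30 bytes", nameof(dst));

        ref byte p = ref dst[0];

        // Store 1 covers bytes [0, 16).
        Unsafe.WriteUnaligned(ref p, Vector128<byte>.Zero);

        // Store 2 covers bytes [14, 30); bytes 14 and 15 are written twice,
        // which is harmless because both stores write zero.
        Unsafe.WriteUnaligned(ref Unsafe.Add(ref p, 14), Vector128<byte>.Zero);
    }
}
```

Calling `OverlappedZero.Zero30(buf)` on a 32-byte array pre-filled with `0xFF` leaves bytes 0 through 29 zeroed and bytes 30 and 31 untouched.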

The same applies to other sizes.
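
One way to picture how this generalizes is a small planner that picks 16-byte store offsets for an arbitrary size. The sketch below is an assumption-laden illustration, not the JIT's actual heuristic: `StorePlan` and `Offsets` are hypothetical names, only sizes of 16 bytes or more are handled, and it ignores whether a narrower tail store would be cheaper for some sizes.

```csharp
using System;
using System.Collections.Generic;

static class StorePlan
{
    const int VecSize = 16; // width of one xmm store, in bytes

    // Hypothetical planner: returns the offsets of the 16-byte stores
    // that together cover the byte range [0, size).
    public static List<int> Offsets(int size)
    {
        if (size < VecSize)
            throw new ArgumentOutOfRangeException(nameof(size), "sketch only handles size >= 16");

        var offsets = new List<int>();
        int offset = 0;

        // Full, non-overlapping stores while a whole vector still fits.
        for (; offset + VecSize <= size; offset += VecSize)
            offsets.Add(offset);

        // Remaining tail: one more store that ends exactly at 'size'
        // and overlaps the previously written bytes.
        if (offset < size)
            offsets.Add(size - VecSize);

        return offsets;
    }
}
```

For the 30-byte case, `StorePlan.Offsets(30)` yields offsets 0 and 14, matching the two `vmovups` stores shown earlier.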

PS: it seems that arm64 already does the right thing here.

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
