KEMBAR78
JIT: Intrinsify UTF16->UTF8 conversion for string literals (Encoding.UTF8.TryGetBytes) by EgorBo · Pull Request #85328 · dotnet/runtime · GitHub
Skip to content

Conversation

@EgorBo
Copy link
Member

@EgorBo EgorBo commented Apr 25, 2023

Closes #84659
The main goal is to make upcoming UTF8 string interpolation faster (to be on par with UTF16 one). If JIT sees that the source is a string literal (can be extended to cover RVA ROS<char>, "shifted" pointers to literal literals, static readonly string) and the data in it is ASCII only - it peforms the conversion in the JIT time and just copies it to the destination.

Benchmark:

byte[] _outputBuffer = new byte[1024];

[Benchmark]
public int TryGetBytes_const()
{
    Encoding.UTF8.TryGetBytes("hello", _outputBuffer.AsSpan(), out int written1);
    return written1;
}
Method Toolchain Mean Ratio
TryGetBytes_const \Core_Root\corerun.exe 0.3435 ns 0.06
TryGetBytes_const \Core_Root_base\corerun.exe 5.4439 ns 1.00

cc @stephentoub

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 25, 2023
@ghost ghost assigned EgorBo Apr 25, 2023
@ghost
Copy link

ghost commented Apr 25, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Closes #84659
The main goal is to make upcoming UTF8 string interpolation faster (to be on par with UTF16 one). If JIT sees that the source is a string literal (can be extended to cover RVA ROS, "shifted" pointers to literal literals, static readonly string) and the data in it is ASCII only - it peforms the conversion in the JIT time and just copies it to the destination.

Benchmark:

byte[] _outputBuffer = new byte[1024];

[Benchmark]
public int TryGetBytes_2B()
{
    Encoding.UTF8.TryGetBytes("Hi", _outputBuffer.AsSpan(), out int written); 
    return written;
}

[Benchmark]
public int TryGetBytes_27B()
{
    Encoding.UTF8.TryGetBytes("MemoryMarshal.GetReference", _outputBuffer.AsSpan(), out int written); 
    return written;
}

[Benchmark]
public int TryGetBytes_122B()
{
    Encoding.UTF8.TryGetBytes(
        ".NET Runtime uses third-party libraries or other resources" +
        "that may be distributed under licenses different than the .NET", 
        _outputBuffer.AsSpan(), out int written); 
    return written;
}
Method Toolchain Mean
TryGetBytes_2B \runtime-PR\corerun.exe 0.7416 ns
TryGetBytes_2B \runtime\corerun.exe 4.9567 ns
TryGetBytes_27B \runtime-PR\corerun.exe 0.8916 ns
TryGetBytes_27B \runtime\corerun.exe 6.5213 ns
TryGetBytes_122B \runtime-PR\corerun.exe 1.6277 ns
TryGetBytes_122B \runtime\corerun.exe 7.9679 ns

With DynamicPGO=1, results are worse in the default mode (up to 10ns)

codegen diff for TryGetBytes_122B: https://www.diffchecker.com/0udd0z9q/
The actual copying is done via:

;; length check
vmovups  zmm0, zmmword ptr [reloc @RWD00]
vmovups  zmmword ptr [r9], zmm0
vmovups  zmm0, zmmword ptr [reloc @RWD64]
vmovups  zmmword ptr [r9+38H], zmm0
mov      eax, 120 ; bytes written

	
RWD00   dq      6E75522054454E2Eh, 65737520656D6974h, 2D64726968742073h, 696C207974726170h, 2073656972617262h, 726568746F20726Fh, 6372756F73657220h, 6D20746168747365h
RWD64   dq      6D20746168747365h, 6964206562207961h, 6574756269727473h, 207265646E752064h, 7365736E6563696Ch, 6572656666696420h, 206E61687420746Eh, 54454E2E20656874h

in this case - via two avx-512 loads from the data section.

For now it needs DynamicPGO being enabled to workaround #85209, I'll land a fix for separately (or a workaround). tldr: None of Encoding.UTF8.X apis are ever inlined without PGO despite UTF8 returning something with an exact type.

cc @stephentoub

TODO:
Tests

Author: EgorBo
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@EgorBo
Copy link
Member Author

EgorBo commented Apr 26, 2023

@jakobbotsch @SingleAccretion yet another gtSplitTree use case PTAL 🙂 not urgent

I recommend to enable "ignore whitespaces" mode. There were a couple of "refactorings" + a pretty simple expansion for GetUtf8Bytes, I'm only concerned whether I preserve side-effects and their order correctly.

@jakobbotsch
Copy link
Member

@EgorBo Can you merge and rerun CI?

@jakobbotsch jakobbotsch self-requested a review May 1, 2023 14:01
EgorBo and others added 3 commits July 18, 2023 19:01
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JIT changes look good to me.

@EgorBo
Copy link
Member Author

EgorBo commented Jul 18, 2023

the penalty can be mitigated if we slightly improve IsKnownConstant in jit and will be able to detect cases where span points to a literal in advance

Is this complicated / expensive? If not, can we do that? I'd expect the common case for direct end-user usage of Encoding.UTF8.TryGetBytes to be with non-const inputs. The const input case I've been primarily interested in is with Utf8.TryWrite and the AppendLiteral method that'll be called by code emitted by the C# compiler for the literal portion of interpolated strings. If it's too complicated / expensive / etc. to avoid the overhead for non-const inputs with TryGetBytes, then we should just skip TryGetBytes altogether and have the Utf8.TryWriteInterpolatedStringHandler.AppendLiteral use this intrinsic directly.

@stephentoub It's tricky to implement but it should be doable. Calling this static method directly from interpolation should be a better solution because we won't have to rely on inlining just like Jan pointed out earlier, so I guess for now I'll remove its usage from TryGetValue till IsKnonwConstant is implemented

@stephentoub
Copy link
Member

It's tricky to implement but it should be doable. Calling this static method directly from interpolation should be a better solution because we won't have to rely on inlining just like Jan pointed out earlier, so I guess for now I'll remove its usage from TryGetValue till IsKnonwConstant is implemented

Ok, sounds good.

@EgorBo
Copy link
Member Author

EgorBo commented Jul 19, 2023

/azp run runtime-coreclr pgo

@azure-pipelines
Copy link

No commit pushedDate could be found for PR 85328 in repo dotnet/runtime

@stephentoub
Copy link
Member

@EgorBo, anything I can do to help land this?

@EgorBo
Copy link
Member Author

EgorBo commented Jul 23, 2023

@EgorBo, anything I can do to help land this?

I was thinking of filing a separate PR to improve IsKnownConstant first, but I think it can be landed as is too in case if you want to call this new static ReadUtf8 directly from the interpolation? (it's actually a preferrable option because NAOT can't inline Encoding.UTF8.TryGetBytes currently)

@stephentoub
Copy link
Member

if you want to call this new static ReadUtf8 directly from the interpolation

Yes, I thought that's what we agreed on above, using it from AppendLiteral and not from TryGetBytes yet.

@EgorBo EgorBo merged commit fe03904 into dotnet:main Jul 23, 2023
@EgorBo
Copy link
Member Author

EgorBo commented Jul 23, 2023

Merging then, I've verified that the API works as expected when it's used inside TryGetBytes and removed it from there so now this API is unused. The additional runtime tests for TryGetBytes won't hurt I guess.

@EgorBo EgorBo deleted the intrinsify-utf16-utf8-literal branch July 23, 2023 21:47
@stephentoub
Copy link
Member

I'll submit a pr. Thanks.

uint16_t ch = bufferU16[charIndex];
if (ch > 127)
{
// Only ASCII is supported.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

@ghost ghost locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Intrinsify Encoding.TryGetBytes for UTF16 -> UTF8 literals

7 participants