KEMBAR78
Clean up hex formatting, vectorize Guid.TryFormat for UTF8 by EgorBo · Pull Request #81666 · dotnet/runtime · GitHub
Skip to content

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Feb 5, 2023

This PR:
Vectorizes Utf8Formatter.Guid.TryFormat and unifies SIMD paths for Guid.TryFormat, Utf8Formatter.Guid.TryFormat and HexConverter.

JSON UTF-8 serialization benchmark:

[Benchmark]
[ArgumentsSource(nameof(JsonUtf8_TestData))]
public void JsonUtf8(Stream stream, MyEntity entity)
{
    JsonSerializer.Serialize(stream, entity, MyEntityContext.Default.MyEntity);
    stream.Position = 0;
}


public record MyEntity(Guid Id1, Guid Id2);
Method Toolchain Mean Error StdDev Ratio
JsonUtf8 \Core_Root_PR\corerun.exe 128.0 ns 0.73 ns 0.65 ns 0.78
JsonUtf8 \Core_Root_base\corerun.exe 163.3 ns 0.33 ns 0.26 ns 1.00

20% improvement for a DTO with two guid fields.

@ghost ghost assigned EgorBo Feb 5, 2023
@ghost
Copy link

ghost commented Feb 5, 2023

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@EgorBo
Copy link
Member Author

EgorBo commented Feb 5, 2023

Double checked that this PR doesn't regress performance for Guid.TryFormat and HexConverter. The only change is basically Utf8Formatter.TryFormat improvement.

Benchmarks:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Text.Json;
using System.Text.Json.Serialization;

[JsonSerializable(typeof(Program.MyEntity))]
internal partial class MyEntityContext : JsonSerializerContext { }

public class Program
{
    static void Main(string[] args)
    {
        BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
    }

    public record MyEntity(Guid Id1, Guid Id2);

    [Benchmark]
    [ArgumentsSource(nameof(JsonUtf8_TestData))]
    public void JsonUtf8(Stream stream, MyEntity entity)
    {
        JsonSerializer.Serialize(stream, entity, MyEntityContext.Default.MyEntity);
        stream.Position = 0;
    }

    public static IEnumerable<object[]> JsonUtf8_TestData()
    {
        yield return new object[]
        {
            new MemoryStream(new byte[1024]),
            new MyEntity(
                new Guid("{48352100-E036-4661-ADE7-07B35B4DDEAD}"),
                new Guid("{06A079E7-9B62-41F4-B5C0-41A8EC6129BD}"))
        };
    }
    public static IEnumerable<Guid> Guid_TestData()
    {
        yield return new Guid("{483BC244-A481-4D29-986F-8C7D647AA20D}");
    }

    [Benchmark]
    [ArgumentsSource(nameof(Guid_TestData))]
    public string Guid_ToStringD(Guid guid) => guid.ToString(); // Default - "D"

    [Benchmark]
    [ArgumentsSource(nameof(Guid_TestData))]
    public string Guid_ToStringN(Guid guid) => guid.ToString("N");

    [Benchmark]
    [ArgumentsSource(nameof(Guid_TestData))]
    public string Guid_ToStringB(Guid guid) => guid.ToString("B");

    [Benchmark]
    [ArgumentsSource(nameof(Guid_TestData))]
    public string Guid_ToStringP(Guid guid) => guid.ToString("P");

    [Benchmark]
    [ArgumentsSource(nameof(HexConverter_TestData))]
    public string Convert_ToHexString(byte[] data) => Convert.ToHexString(data);

    public static IEnumerable<byte[]> HexConverter_TestData()
    {
        yield return Enumerable.Range(0, 4).Select(i => (byte)i).ToArray();
        yield return Enumerable.Range(0, 5).Select(i => (byte)i).ToArray();
        yield return Enumerable.Range(0, 7).Select(i => (byte)i).ToArray();
        yield return Enumerable.Range(0, 9).Select(i => (byte)i).ToArray();
        yield return Enumerable.Range(0, 15).Select(i => (byte)i).ToArray();
        yield return Enumerable.Range(0, 16).Select(i => (byte)i).ToArray();
        yield return Enumerable.Range(0, 20).Select(i => (byte)i).ToArray();
        yield return Enumerable.Range(0, 32).Select(i => (byte)i).ToArray();
        yield return Enumerable.Range(0, 100).Select(i => (byte)i).ToArray();
        yield return Enumerable.Range(0, 5000).Select(i => (byte)i).ToArray();
    }
}

image

@ghost
Copy link

ghost commented Feb 6, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR:
Vectorizes Utf8Formatter.Guid.TryFormat and unifies SIMD paths for Guid.TryFormat, Utf8Formatter.Guid.TryFormat and HexConverter.

JSON UTF-8 serialization benchmark:

[Benchmark]
[ArgumentsSource(nameof(JsonUtf8_TestData))]
public void JsonUtf8(Stream stream, MyEntity entity)
{
    JsonSerializer.Serialize(stream, entity, MyEntityContext.Default.MyEntity);
    stream.Position = 0;
}


public record MyEntity(Guid Id1, Guid Id2);
Method Toolchain Mean Error StdDev Ratio
JsonUtf8 \Core_Root_PR\corerun.exe 128.0 ns 0.73 ns 0.65 ns 0.78
JsonUtf8 \Core_Root_base\corerun.exe 163.3 ns 0.33 ns 0.26 ns 1.00

20% improvement for a DTO with two guid fields.

Author: EgorBo
Assignees: EgorBo
Labels:

area-System.Runtime

Milestone: -


Vector128<byte> hexMap = casing == Casing.Upper ?
Vector128.Create("0123456789ABCDEF"u8) :
Vector128.Create("0123456789abcdef"u8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the before/after Guid formatting benchmark results on Mono?

Copy link
Member Author

@EgorBo EgorBo Feb 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you concerned about the codegen for this on Mono? I spent a couple of hours trying to setup LLVM-AOT benchmarks (was told to do that on iOS in the Mono channel) and was not able to get numbers. But managed to get the codegen for

[MethodImpl(MethodImplOptions.NoInlining)]
public static Vector128<byte> Egor1() => 
    Vector128.Create("0123456789ABCDEF"u8);

[MethodImpl(MethodImplOptions.NoInlining)]
public static Vector128<byte> Egor2() => 
    Vector128.Create((byte)42);

[MethodImpl(MethodImplOptions.NoInlining)]
public static bool Egor3() =>
    AdvSimd.IsSupported;

iOS-arm64 LLVM-AOT codegen:

00000000000001a8 <_Program_Program_Egor1>:
1b0: f4 4f be a9    stp  x20, x19, [sp, #-32]!
1b4: fd 7b 01 a9    stp  x29, x30, [sp, #16]
1b8: 08 00 00 90    adrp  x8, 0x0 <_Program_Program_Egor1+0x8>
1bc: 13 00 00 90    adrp  x19, 0x0 <_Program_Program_Egor1+0xc>
1c0: 08 01 40 f9    ldr  x8, [x8]
1c4: 74 02 40 39    ldrb  w20, [x19]
1c8: 08 01 40 f9    ldr  x8, [x8]
1cc: 08 01 00 b5    cbnz  x8, 0x1ec <_Program_Program_Egor1+0x3c>
1d0: 34 01 00 34    cbz  w20, 0x1f4 <_Program_Program_Egor1+0x44>
1d4: 08 00 00 90    adrp  x8, 0x0 <_Program_Program_Egor1+0x24>
1d8: fd 7b 41 a9    ldp  x29, x30, [sp, #16]
1dc: 08 01 40 f9    ldr  x8, [x8]
1e0: 00 05 40 a9    ldp  x0, x1, [x8]
1e4: f4 4f c2 a8    ldp  x20, x19, [sp], #32
1e8: c0 03 5f d6    ret
1ec: 00 00 00 94    bl  0x1ec <_Program_Program_Egor1+0x3c>
1f0: 34 ff ff 35    cbnz  w20, 0x1d4 <_Program_Program_Egor1+0x24>
1f4: 00 00 00 90    adrp  x0, 0x0 <_Program_Program_Egor1+0x44>
1f8: 00 00 00 91    add  x0, x0, #0
1fc: 00 00 00 94    bl  0x1fc <_Program_Program_Egor1+0x4c>
200: 28 00 80 52    mov  w8, #1
204: 68 02 00 39    strb  w8, [x19]
208: f3 ff ff 17    b  0x1d4 <_Program_Program_Egor1+0x24>

00000000000001f0 <_Program_Program_Egor2>:
1f0: 40 45 85 d2    mov  x0, #10794
1f4: 41 45 85 d2    mov  x1, #10794
1f8: 40 45 a5 f2    movk  x0, #10794, lsl #16
1fc: 41 45 a5 f2    movk  x1, #10794, lsl #16
200: 40 45 c5 f2    movk  x0, #10794, lsl #32
204: 41 45 c5 f2    movk  x1, #10794, lsl #32
208: 40 45 e5 f2    movk  x0, #10794, lsl #48
20c: 41 45 e5 f2    movk  x1, #10794, lsl #48
210: c0 03 5f d6    ret

0000000000000214 <_Program_Program_Egor3>:
214: 20 00 80 52    mov  w0, #1  ;;;  AdvSimd.IsSupported
218: c0 03 5f d6    ret

So presumably it's better to revert this part or we can leave as is and wait for https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_arm64_ubuntu%2020.04_LLVM%3dfalse_MonoAOT%3dtrue_MonoInterpreter%3dfalse_RunKind%3dmicro_mono%2fSystem.Tests.Perf_Guid.GuidToString.html benchmark results (which are triaged twice a week).

So far it seems Mono-LLVM has quite a few CQ issues around crossplat Vector128 APIs

Copy link
Member

@jkotas jkotas Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you concerned about the codegen for this on Mono?

Yes.

We had a fire drill at the end of .NET 7 where vectorization introduced major ship-stopper regressions on Mono.

The RCA (root cause analysis) report produced for this incident had number of action items to prevent the repeat of the situation. I can forward the report if you have not seen it. One of the action item was: "Libraries work that exercises new runtime features or patterns will be tested in more scenarios during implementation". This is using a new runtime feature, so I am asking to validate the perf impact on Mono according to what we said that we are going to do in the RCA.

cc @jeffhandley @adamsitnik @DrewScoggins @lewing @sblom

Copy link
Member Author

@EgorBo EgorBo Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas thanks! I've reverted the change, although, I feel like even with sub-optimal codegen it should still be faster than .NET 7.0, e.g. here are the wins from the initial vectorization in CoreCLR: #81650 (comment)

Also, it looks like we don't run benchmarks for LLVM JIT/AOT, do we? I don't see anything here https://aka.ms/HistoryIndex/ Presumably, WASM is interp there and Ubuntu Mono AOT aren't LLVM judging by the configs. So looks like Vector128 APIs aren't benchmarked at all since regular Mono (mini) doesn't yet support them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DrewScoggins @lewing @sblom Can you comment on benchmarks that are run on Mono?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EgorBo Yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EgorBo - there seems to be some reporting issues in the perf labs, which is not showing results from past 2 days. @LoopedBard3 is looking into it and we'll see if there is a way to get you reports manually.

Copy link
Member Author

@EgorBo EgorBo Feb 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

the data is updated but it didn't improve nor regressed from #81650 - it doesn't make sense to me unless AdvSimd.Arm64.IsSupported returned false

CoreCLR arm64:

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the data is updated but it didn't improve nor regressed from #81650 - it doesn't make sense to me unless AdvSimd.Arm64.IsSupported returned false

Is that a property of the runtime or something related to the physical hardware that the run is happening on? Both the CoreCLR and Mono runs are happening on the same hardware, so I would not expect one to have this property be true and another be false. It seems useful to track down why we are not seeing this win when we expect there to be one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I am 100% sure it either doesn't use LLVM actually or AdvSimd.Arm64.IsSupported is never true there. There is no way that benchmark wouldn't be affected otherwise

// TODO: remove once https://github.com/dotnet/runtime/pull/80963 is merged
[MethodImpl(MethodImplOptions.AggressiveInlining)]
static Vector128<byte> ShuffleUnsafe(Vector128<byte> value, Vector128<byte> mask)
=> Ssse3.IsSupported ? Ssse3.Shuffle(value, mask) : AdvSimd.Arm64.VectorTableLookup(value, mask);
Copy link
Member

@lewing lewing Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@radekdoulik could you take a look at this and the wasm implementation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: we already have this hand-made crossplat helper in 4 different places (and @MihaZupan unified them, it's that the PR is not yet merged)

@EgorBo
Copy link
Member Author

EgorBo commented Feb 8, 2023

Anything else needed from my side?

@stephentoub @MihaZupan @jkotas

@adamsitnik adamsitnik added the tenet-performance Performance related issue label Feb 9, 2023
@EgorBo
Copy link
Member Author

EgorBo commented Feb 19, 2023

Failure is #81123

@EgorBo EgorBo merged commit 5ac3de1 into dotnet:main Feb 19, 2023
@EgorBo EgorBo deleted the hex-formatting-cleanup branch February 19, 2023 10:06
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants