Optimize library code using arm64 intrinsics #33308

@BruceForstall

The following classes/functions in the libraries use Intel x86/x64 intrinsics; these are the places where _ISA_.IsSupported is checked. The list was collected manually and might not be complete. Some of the function names stand for many overloads. Vectorized helper methods are not shown here: in some cases a function checks IsSupported and then calls a specific helper (for SSE2 or AVX2, for example) to do the actual work. Other code uses Vector<T>, which arm64 already supports (though it should be verified that the arm64 Vector<T> code is complete and performant).

When an arm64-specific intrinsics optimization has been added for one of these, it should be "checked off".

The sections below are listed in the presumed order of implementation priority. (There is no assumed priority order for the individual functions within each section.)

It is expected that System.Collections.BitArray, System.Numerics, and System.SpanHelpers will be "arm64 intrinsi-fied" for .NET 5. If possible, System.Buffers and System.Text will as well, but that is not considered required.
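To make the pattern described above concrete, here is a minimal sketch of what "adding an arm64-specific intrinsics optimization" next to an existing x86/x64 path can look like, using a population count as the example. The class name `PopCountSketch` and the helper `SoftwareFallback` are hypothetical names chosen for illustration, and this is not necessarily how the library code is or will be written. It assumes .NET 5 or later, where `System.Runtime.Intrinsics.Arm.AdvSimd` is available.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical illustration class; not part of the libraries.
public static class PopCountSketch
{
    public static int PopCount(uint value)
    {
        // Existing x86/x64 path, guarded by the ISA check.
        if (Popcnt.IsSupported)
        {
            return (int)Popcnt.PopCount(value);
        }

        // arm64 path: AdvSimd counts bits per byte lane, so place the value
        // in a 64-bit vector, count per byte, then sum the lanes.
        if (AdvSimd.Arm64.IsSupported)
        {
            Vector64<uint> input = Vector64.CreateScalar(value);
            Vector64<byte> perByteCounts = AdvSimd.PopCount(input.AsByte());
            return AdvSimd.Arm64.AddAcross(perByteCounts).ToScalar();
        }

        return SoftwareFallback(value);
    }

    // Portable bit-twiddling fallback for platforms with neither ISA.
    private static int SoftwareFallback(uint value)
    {
        value -= (value >> 1) & 0x55555555u;
        value = (value & 0x33333333u) + ((value >> 2) & 0x33333333u);
        value = (((value + (value >> 4)) & 0x0F0F0F0Fu) * 0x01010101u) >> 24;
        return (int)value;
    }
}
```

Because each `IsSupported` check is treated as a constant by the JIT, the branches for unsupported instruction sets are removed from the generated code, so adding the arm64 branch should not cost anything on x86/x64.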

System.Collections.BitArray #33309

  • System.Collections.BitArray - constructor
  • System.Collections.BitArray.And()
  • System.Collections.BitArray.Or()
  • System.Collections.BitArray.Xor()
  • System.Collections.BitArray.Not()
  • System.Collections.BitArray.CopyTo()

System.Runtime.Intrinsics #33496

Vector64

  • As<T, U>()
  • AsInt64()
  • AsUInt64()
  • AsDouble()
  • CreateScalarUnsafe(int)
  • CreateScalarUnsafe(uint)
  • CreateScalarUnsafe(float)
  • CreateScalarUnsafe(byte)
  • CreateScalarUnsafe(sbyte)
  • CreateScalarUnsafe(short)
  • CreateScalarUnsafe(ushort)
  • CreateScalar(uint)
  • CreateScalar(float)
  • CreateScalar(sbyte)
  • CreateScalar(ushort)
  • CreateScalar(short)
  • CreateScalar(byte)
  • CreateScalar(int)
  • Create(sbyte, sbyte, sbyte, sbyte, sbyte, sbyte, sbyte, sbyte)
  • Create(byte, byte, byte, byte, byte, byte, byte, byte)
  • Create(ushort, ushort, ushort, ushort)
  • Create(short, short, short, short)
  • Create(float, float)
  • Create(int, int)
  • Create(ulong)
  • Create(uint)
  • Create(uint, uint)
  • Create(float)
  • Create(sbyte)
  • Create(long)
  • Create(int)
  • Create(short)
  • Create(double)
  • Create(byte)
  • Create(ushort)
  • GetElement(int index)
  • ToScalar()
  • ToVector128()
  • ToVector128Unsafe()
  • WithElement(Vector64<T>, int, T)

Vector128

  • As<T, U>()
  • AsVector()
  • AsVector4(Vector128<float>)
  • AsVector128(Vector<T>)
  • AsVector128(Vector4)
  • CreateScalarUnsafe(int)
  • CreateScalarUnsafe(uint)
  • CreateScalarUnsafe(float)
  • CreateScalarUnsafe(long)
  • CreateScalarUnsafe(ulong)
  • CreateScalarUnsafe(double)
  • CreateScalarUnsafe(byte)
  • CreateScalarUnsafe(sbyte)
  • CreateScalarUnsafe(short)
  • CreateScalarUnsafe(ushort)
  • CreateScalar(ulong)
  • CreateScalar(uint)
  • CreateScalar(ushort)
  • CreateScalar(sbyte)
  • CreateScalar(float)
  • CreateScalar(int)
  • CreateScalar(short)
  • CreateScalar(double)
  • CreateScalar(byte)
  • CreateScalar(long)
  • Create(sbyte, sbyte, sbyte, sbyte, sbyte, sbyte, sbyte, sbyte, sbyte, sbyte, sbyte, sbyte, sbyte, sbyte, sbyte, sbyte)
  • Create(byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte, byte)
  • Create(ushort, ushort, ushort, ushort, ushort, ushort, ushort, ushort)
  • Create(short, short, short, short, short, short, short, short)
  • Create(uint, uint, uint, uint)
  • Create(float, float, float, float)
  • Create(int, int, int, int)
  • Create(ulong, ulong)
  • Create(Vector64, Vector64)
  • Create(Vector64, Vector64)
  • Create(Vector64, Vector64)
  • Create(Vector64, Vector64)
  • Create(Vector64, Vector64)
  • Create(Vector64, Vector64)
  • Create(Vector64, Vector64)
  • Create(Vector64, Vector64)
  • Create(byte)
  • Create(double)
  • Create(short)
  • Create(int)
  • Create(long)
  • Create(Vector64, Vector64)
  • Create(float)
  • Create(sbyte)
  • Create(uint)
  • Create(ulong)
  • Create(double, double)
  • Create(long, long)
  • Create(Vector64, Vector64)
  • Create(ushort)
  • GetElement(int index)
  • GetLower()
  • GetUpper()
  • WithElement(Vector128<T>, int, T)
  • WithLower(Vector64)
  • WithUpper(Vector64)
  • ToScalar()

Vector256

  • Software fallback

System.Numerics

System.Numerics.BitOperations #33495

  • System.Numerics.BitOperations.LeadingZeroCount()
  • System.Numerics.BitOperations.Log2()
  • System.Numerics.BitOperations.PopCount()
  • System.Numerics.BitOperations.TrailingZeroCount()

System.Numerics.Matrix4x4 #33565

  • System.Numerics.Matrix4x4.Transpose()
  • System.Numerics.Matrix4x4.Lerp()
  • System.Numerics.Matrix4x4.operator-()
  • System.Numerics.Matrix4x4.operator+()
  • System.Numerics.Matrix4x4.operator*()
  • System.Numerics.Matrix4x4.operator==()
  • System.Numerics.Matrix4x4.operator!=()

System.SpanHelpers #33707

  • System.SpanHelpers.IndexOf(byte)
  • System.SpanHelpers.IndexOf(char)
  • System.SpanHelpers.IndexOfAny(byte)
    [ ] System.SpanHelpers.SequenceCompareTo(byte) (SIMD vector implementation is fast enough)
    [ ] System.SpanHelpers.SequenceEqual(byte) (SIMD vector implementation is fast enough)
    [ ] System.SpanHelpers.LocateFirstFoundByte() (Only used by SIMD version of IndexOf and IndexOfAny which are already optimized by ARM64 intrinsics)

System.Buffers #35033

(Not completed in 5.0.0; moved to 6.0.0)

  • System.Buffers.Text.Base64.DecodeFromUtf8()
  • System.Buffers.Text.Base64.EncodeToUtf8()

System.Text

System.Text.ASCIIUtility #35034

(Not completed in 5.0.0; #41292 contains the items moved to 6.0.0)

  • System.Text.ASCIIUtility.GetIndexOfFirstNonAsciiByte()
  • System.Text.ASCIIUtility.GetIndexOfFirstNonAsciiChar() - (Not completed in 5.0.0; moved to 6.0.0) - PR: "Get index of first non ascii char" #39507
  • System.Text.ASCIIUtility.NarrowFourUtf16CharsToAsciiAndWriteToBuffer()
  • System.Text.ASCIIUtility.NarrowUtf16ToAscii() - (Not completed in 5.0.0; moved to 6.0.0) - PR: "Narrow utf16 to ascii" #39509
  • System.Text.ASCIIUtility.WidenAsciiToUtf16()
  • System.Text.ASCIIUtility.WidenFourAsciiBytesToUtf16AndWriteToBuffer()
  • System.Text.ASCIIUtility.CountNumberOfLeadingAsciiBytesFromUInt32WithSomeNonAsciiData()

System.Text.Unicode #35035

  • System.Text.Unicode.Utf16Utility.GetPointerToFirstInvalidChar()
  • System.Text.Unicode.Utf8Utility.TranscodeToUtf8()
  • System.Text.Unicode.Utf8Utility.GetPointerToFirstInvalidByte()

System.Text.Encodings.Web #35036

  • System.Text.Encodings.Web.DefaultJavaScriptEncoder.FindFirstCharacterToEncodeUtf8()
  • System.Text.Encodings.Web.DefaultJavaScriptEncoderBasicLatin.FindFirstCharacterToEncode()
  • System.Text.Encodings.Web.DefaultJavaScriptEncoderBasicLatin.FindFirstCharacterToEncodeUtf8()
  • System.Text.Encodings.Web.TextEncoder.FindFirstCharacterToEncodeUtf8()
  • System.Text.Encodings.Web.UnsafeRelaxedJavaScriptEncoder.WillEncode()
  • System.Text.Encodings.Web.UnsafeRelaxedJavaScriptEncoder.FindFirstCharacterToEncodeUtf8()

Labels: Epic, arch-arm64, area-Meta, help wanted
