[wasm] SIMD support improvements #73289

radekdoulik · 2022-08-03T10:50:37Z

Add internal PackedSimd class with wasm specific SIMD intrinsics methods.

The llvm code generator works nicely with them.

The i64 values of the Constant method are switched so that C# WasmBase.Constant(0xff11ff22ff33ff44, 0xff55ff66ff77ff88) is compiled into wasm code v128.const 0xff11ff22ff33ff44ff55ff66ff77ff88 [SIMD]

Shuffle will need more work, as it crashes clang during the 'WebAssembly Instruction Selection' pass: WasmApp.Native.targets(353,5): error : 3. Running pass 'WebAssembly Instruction Selection' on function '@corlib_System_Runtime_Intrinsics_Wasm_WasmBase_Shuffle_System_Runtime_Intrinsics_Vector128_1_byte_System_Runtime_Intrinsics_Vector128_1_byte_System_Runtime_Intrinsics_Vector128_1_byte'

| measurement | no SIMD | SIMD |
|-:|-:|-:|
| Span, Reverse bytes | 0.0341ms | 0.0028ms |
| Span, Reverse chars | 0.0394ms | 0.0062ms |

tannergooding · 2022-08-03T14:22:06Z

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Byte.cs

                length -= numIters * numElements * 2;
            }
-            else if (Ssse3.IsSupported && (nuint)Vector128<byte>.Count * 2 <= length)
+            else if ((Ssse3.IsSupported || WasmBase.IsSupported) && (nuint)Vector128<byte>.Count * 2 <= length)


Same here, but using Vector128.Shuffle(tempFirst, Vector128.Create((byte)15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)); and Vector128.Shuffle(tempLast, Vector128.Create((byte)15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0));

The Create calls do need to be "inline" due to a JIT limitation in .NET 7, but they will get optimized and hoisted regardless since we have the relevant GT_CNS_VEC support in RyuJIT (Mono LLVM has something similar).
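For illustration, here is a minimal sketch of the byte-reversal shuffle suggested above, written against the cross-platform Vector128 API only. The ReverseSketch class, the Reverse16 helper, and the Main driver are hypothetical names that are not part of this PR, and the sketch assumes .NET 7 or later, where Vector128.Shuffle is available.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical illustration; not code from this PR.
static class ReverseSketch
{
    // Reverses the 16 bytes of a vector with the cross-platform Shuffle API.
    // The index vector is created inline, as the comment above recommends.
    public static Vector128<byte> Reverse16(Vector128<byte> value) =>
        Vector128.Shuffle(
            value,
            Vector128.Create((byte)15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0));

    public static void Main()
    {
        Vector128<byte> input =
            Vector128.Create((byte)0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
        Vector128<byte> reversed = Reverse16(input);

        // Element i of the result is element 15 - i of the input.
        for (int i = 0; i < Vector128<byte>.Count; i++)
        {
            Console.Write(reversed.GetElement(i) + " ");
        }
        Console.WriteLine();
    }
}
```

In the Span reverse path the same shuffle would presumably be applied to both tempFirst and tempLast, so the platform check on Ssse3 or WasmBase would no longer be needed there.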

tannergooding · 2022-08-03T14:25:36Z

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Wasm/WasmBase.cs

+namespace System.Runtime.Intrinsics.Wasm
+{
+    [Intrinsic]
+    internal abstract class WasmBase


We should take this (well #53730) to actual API review for .NET 8.

Provided that the relevant hookups exist for the xplat Vector128<T> (and Vector<T>) APIs, WASM will also be able to light up on any paths using those APIs prior to WASM support being directly exposed.

> Provided that the relevant hookups exist for the xplat Vector128<T> (and Vector<T>) APIs, WASM will also be able to light up on any paths using those APIs prior to WASM support being directly exposed.

IIRC we already handle xplat for Vector128<T> on wasm. Vector<T> is next for me to look into; I hope I can speed up string and other parts on wasm.

Thanks for the info!

We're looking at moving more code to use Vector128<T> xplat APIs where possible so that we can support more platforms with less code overall (and only using the platform specific APIs where it gives significant improvement or where functionality can't be represented using the xplat APIs).

So supporting Vector128<T> will allow more WASM light-up faster.

radekdoulik · 2022-08-04T10:39:39Z

> Any more extensive perf numbers for these changes?

I can add more wasm measurements, if that is what you are looking for?

> We have many paths which use the Vector128<T> (and several which still use Vector<T>) xplat APIs now and which could get WASM lightup provided the recognition existed.

We already have some of the Vector128<T> intrinsics in place for wasm.

tannergooding · 2022-08-04T16:21:28Z

> I can add more wasm measurements, if that is what you are looking for?

I misunderstood exactly what lightup this was bringing at first. I thought it was adding Vector128<T> light up as well and so was expecting perf numbers for more than just Span<T>.

Now that I understand it's just adding WasmBase and using that directly for Span<T> where we were already using xplat intrinsics, I think the numbers provided are fine.

radical · 2022-08-05T04:27:12Z

@radekdoulik btw, you can use /azp run runtime-wasm-perf to run all the wasm benchmarks from dotnet/performance with this PR.

tannergooding · 2022-08-09T19:14:07Z

...s/System.Private.CoreLib/src/System/Runtime/Intrinsics/Wasm/WasmBase.PlatformNotSupported.cs

+
+namespace System.Runtime.Intrinsics.Wasm
+{
+    internal abstract class WasmBase


Not critical for this PR, but just noting that the approved API surface differs slightly from what the proposal initially had: #53730 (comment)

WasmBase -> PackedSimd; Constant is cut, as you should just use Vector128.Create(...) instead; most other things just had the parameter names updated from x, a, b, and imm to things like left, right, value, index, etc.

> @radekdoulik btw, you can use /azp run runtime-wasm-perf to run all the wasm benchmarks from dotnet/performance with this PR.

Nice to have this pipeline. It would not measure these changes though, as they are hidden behind the WasmSIMD property, which is disabled by default.

ghost · 2022-08-11T04:48:46Z

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Issue Details

Add internal WasmBase class with wasm specific SIMD intrinsics methods.

Improve Span Reverse and IndexOf performance on wasm with SIMD

Measurements chrome/aot/amd64:

measurement	no SIMD	SIMD
Span, Reverse bytes	0.0332ms	0.0025ms
Span, Reverse chars	0.0332ms	0.0060ms
Span, IndexOf bytes	0.2068us	0.1002us
Span, IndexOf chars	0.0146ms	0.0028ms

Measurements firefox/aot/amd64:

measurement	no SIMD	SIMD
Span, Reverse bytes	0.0338ms	0.0022ms
Span, Reverse chars	0.0339ms	0.0048ms
Span, IndexOf bytes	0.2533us	0.1394us
Span, IndexOf chars	0.0201ms	0.0039ms

Code size reduction, the browser-bench sample: -5184 bytes (no SIMD 19539839 --> SIMD 19534655)

Author:	radekdoulik
Assignees:	radekdoulik
Labels:	`arch-wasm`, `area-System.Runtime.Intrinsics`
Milestone:	-

radekdoulik · 2022-08-19T10:58:26Z

Nice surprise to have the API approved! :-) I have updated the PR to reflect the changes and removed the span helper changes, as @adamsitnik's change uses Vector128 now. The perf is not yet where it was; I will address that in the next PR.

I am moving the old measurements here from PR description to keep them for later comparison.

Improve Span Reverse and IndexOf performance on wasm with SIMD

Measurements chrome/aot/amd64:

measurement	no SIMD	SIMD
Span, Reverse bytes	0.0332ms	0.0025ms
Span, Reverse chars	0.0332ms	0.0060ms
Span, IndexOf bytes	0.2068us	0.1002us
Span, IndexOf chars	0.0146ms	0.0028ms

Measurements firefox/aot/amd64:

measurement	no SIMD	SIMD
Span, Reverse bytes	0.0338ms	0.0022ms
Span, Reverse chars	0.0339ms	0.0048ms
Span, IndexOf bytes	0.2533us	0.1394us
Span, IndexOf chars	0.0201ms	0.0039ms

Code size reduction, the browser-bench sample: -5184 bytes (no SIMD 19539839 --> SIMD 19534655)

radekdoulik · 2022-08-25T18:04:28Z

I think I have addressed everything from the reviews. Is there anything else to change? I would like to merge it before it becomes outdated again ;-)

radical · 2022-08-25T18:21:58Z

/azp run runtime-wasm,runtime-wasm-perf

azure-pipelines · 2022-08-25T18:22:21Z

Azure Pipelines successfully started running 2 pipeline(s).

radical · 2022-08-25T18:22:21Z

Running runtime-wasm-perf just as a sanity check for the perf pipeline.

radekdoulik added 30 commits April 4, 2022 15:47

Initial wasm SIMD support

d2f12d0

Enable Vector intrinsic on wasm

68fe93a

The llvm code generator works nicely with them.

Add missing files

5b8576c

Make SIMD support conditional

bb233c8

Remove test code

3d290a1

Merge remote-tracking branch 'remotes/origin/main' into pr-wasm-simd-…

0a901ca

…wip-2

Fix debug build

e8d854d

Update after merge

1093fff

Add Splat and ExcractLane methods

5c86e35

Switch i64 values for Constant method

7f3dab5

So that C# WasmBase.Constant(0xff11ff22ff33ff44, 0xff55ff66ff77ff88) is compiled into wasm code v128.const 0xff11ff22ff33ff44ff55ff66ff77ff88 [SIMD]

Update PlatformNotSupported version of WasmBase

3bd8792

Fix CI build

d86aa45

Add ReplaceLane and Swizzle

7cf1640

Change WasmBase.Constant to get Vector128 as input

b7cb41d

Merge remote-tracking branch 'remotes/origin/main' into pr-wasm-simd-…

cadd77c

…wip-2

Handle SN_Shuffle

139a389

Fix crash in OP_STOREX_MEMBASE

b1f2dbf

Add build test

9237cc5

Merge remote-tracking branch 'remotes/origin/main' into pr-wasm-simd-…

8d1ab4f

…wip-2

Fix remaining conflict

ae0cbfa

Remove unused prop

45f6329

IsSupported should be static

2bd2f21

Handle SN_get_IsSupported

b157f36

Keep passing --enable-simd to wasm-opt

d3aa296

Add Span reverse measurements

efecdf9

Use WasmBase to improve perf of Span reverse

9cbe7d5

| measurement | no SIMD | SIMD |
|-:|-:|-:|
| Span, Reverse bytes | 0.0341ms | 0.0028ms |
| Span, Reverse chars | 0.0394ms | 0.0062ms |

Remove debug prints

74933e0

Merge remote-tracking branch 'remotes/origin/main' into pr-wasm-simd-…

75ff3f6

…wip-4

Do not expose WasmBase API

83a9e56

tannergooding reviewed Aug 3, 2022

Feedback, remove OP_WASM_SIMD_V128_CONST

6031e60

tannergooding reviewed Aug 9, 2022

jeffhandley added arch-wasm WebAssembly architecture area-System.Runtime.Intrinsics labels Aug 11, 2022

radekdoulik added 4 commits August 19, 2022 11:30

Merge remote-tracking branch 'remotes/origin/main' into pr-wasm-simd-…

cacd9de

…wip-4

Merge remote-tracking branch 'remotes/origin/main' into pr-wasm-simd-…

c35c4a9

…wip-4

s/WasmBase/PackedSimd

e24ad3b

And update parameter names and indentation

Remove Constant method implementation

ecc1e2a

radekdoulik added 3 commits August 19, 2022 13:01

Remove using clauses

7301c51

One more place to remove Constant method

a8b6a8a

Fix white space

48055ec

This was referenced Aug 22, 2022

Infra improvements for Helix #68176

Closed

Methodical_others test JIT/Methodical/Coverage/copy_prop_byref_to_native_int crashing #69832

Open

Long Running Test: Interop/MonoAPI/MonoMono/PInvokeDetach/PInvokeDetach.sh #73040

Closed

lewing requested review from tannergooding and vargaz August 30, 2022 16:17

lewing approved these changes Aug 31, 2022

lewing merged commit 4190ef8 into dotnet:main Aug 31, 2022

ghost locked as resolved and limited conversation to collaborators Sep 30, 2022
