-
Notifications
You must be signed in to change notification settings - Fork 5.2k
[wasm] SIMD support improvements #73289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The llvm code generator works nicely with them.
So that C#
WasmBase.Constant(0xff11ff22ff33ff44, 0xff55ff66ff77ff88)
is compiled into wasm code
v128.const 0xff11ff22ff33ff44ff55ff66ff77ff88 [SIMD]
This will need more work, as it crashes clang during 'WebAssembly
Instruction Selection' pass:
WasmApp.Native.targets(353,5): error : 3. Running pass 'WebAssembly Instruction Selection' on function '@corlib_System_Runtime_Intrinsics_Wasm_WasmBase_Shuffle_System_Runtime_Intrinsics_Vector128_1_byte_System_Runtime_Intrinsics_Vector128_1_byte_System_Runtime_Intrinsics_Vector128_1_byte'
| measurement | no SIMD | SIMD | |-:|-:|-:| | Span, Reverse bytes | 0.0341ms | 0.0028ms | | Span, Reverse chars | 0.0394ms | 0.0062ms |
| length -= numIters * numElements * 2; | ||
| } | ||
| else if (Ssse3.IsSupported && (nuint)Vector128<byte>.Count * 2 <= length) | ||
| else if ((Ssse3.IsSupported || WasmBase.IsSupported) && (nuint)Vector128<byte>.Count * 2 <= length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, but using Vector128.Shuffle(tempFirst, Vector128.Create((byte)15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)); and Vector128.Shuffle(tempLast, Vector128.Create((byte)15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0));
The Create calls do need to be "inline" due to a JIT limitation in .NET 7, but they will get optimized and hoisted regardless since we have the relevant GT_CNS_VEC support in RyuJIT (Mono LLVM has something similar).
| namespace System.Runtime.Intrinsics.Wasm | ||
| { | ||
| [Intrinsic] | ||
| internal abstract class WasmBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should take this (well #53730) to actual API review for .NET 8.
Provided that the relevant hookups exist for the xplat Vector128<T> (and Vector<T>) APIs, WASM will also be able to light up on any paths using those APIs prior to WASM support being directly exposed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provided that the relevant hookups exist for the xplat
Vector128<T>(andVector<T>) APIs, WASM will also be able to light up on any paths using those APIs prior to WASM support being directly exposed.
IIRC we already handle xplat for Vector128<T> on wasm. Vector<T> is next for me to look into, I hope I can speedup string and other parts on wasm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the info!
We're looking at moving more code to use Vector128<T> xplat APIs where possible so that we can support more platforms with less code overall (and only using the platform specific APIs where it gives significant improvement or where functionality can't be represented using the xplat APIs).
So supporting Vector128<T> will allow more WASM light-up faster.
I can add more wasm measurents, if that is what you are looking for?
We already have some of |
I misunderstood exactly what lightup this was bringing at first. I thought it was adding Now that I understand its just adding |
|
@radekdoulik btw, you can use |
|
|
||
| namespace System.Runtime.Intrinsics.Wasm | ||
| { | ||
| internal abstract class WasmBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not critical for this PR, but just noting that the approved API surface differs slightly from what the proposal initially had: #53730 (comment)
WasmBase -> PackedSimd, Constant is cut as you should just use Vector128.Create(...) instead, most other things just had the parameter names updated from x, a, b, and imm to be things like left, right, value, index, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@radekdoulik btw, you can use
/azp run runtime-wasm-perfto run all the wasm benchmarks fromdotnet/performancewith this PR.
nice to have this pipeline. it would not measure these changes though, as they are hidden behind WasmSIMD property, which is disabled by default.
|
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics Issue DetailsAdd internal WasmBase class with wasm specific SIMD intrinsics methods. Improve Span Reverse and IndexOf performance on wasm with SIMD Measurements chrome/aot/amd64:
Measurements firefox/aot/amd64:
Code size reduction, the browser-bench sample: -5184 bytes (no SIMD 19539839 --> SIMD 19534655)
|
And update parameter names and indentation
|
Nice surprise to have the API approved! :-) I have updated the PR to reflect the changes and removed the span helper changes, as @adamsitnik's change uses the Vector128 now. The perf is not yet where it was, I will address that with next PR. I am moving the old measurements here from PR description to keep them for later comparison. Improve Span Reverse and IndexOf performance on wasm with SIMD Measurements chrome/aot/amd64:
Measurements firefox/aot/amd64:
Code size reduction, the browser-bench sample: -5184 bytes (no SIMD 19539839 --> SIMD 19534655) |
|
I think I addressed all things from reviews. Is there anything else to change? I would like it to merge before it becomes outdated again ;-) |
|
/azp run runtime-wasm,runtime-wasm-perf |
|
Azure Pipelines successfully started running 2 pipeline(s). |
|
Running |
Add internal PackedSimd class with wasm specific SIMD intrinsics methods.