- 
                Notifications
    You must be signed in to change notification settings 
- Fork 5.2k
Description
UPDATED (Steve Harter):
Current plan is to avoid adding new emit-based logic (which should be considered a last resort) and instead use reflection with new, faster APIs. It is possible that we can also completely remove the usage of LINQ\compiled expressions pending perf and tradeoffs. NativeAOT would also use this faster reflection path (NativeAOT doesn't support emit and LINQ is not used since it is slower than standard reflection when LINQ is in "reflection fallback" mode).
Also note that when this issue was logged, there was not a reflection path. See #81262.
Steps in the plan:
- For the size regression (the main ask for this issue), change DI to use reflection instead of LINQ compiled expressions for Blazor in ActivatorUtilities. Also enable reflection to use emit in Blazor\WASM (currently not being activated). Interpreting the generated IL is faster than standard reflection (this was also shown to be true in System.Text.Json). Unblocks this issue so that ASP.NET can add their changes to use DI factories without a size regression in WASM.. Note that LINQ in DI will not be turned off in general at this point, just for WASM.
- Verify "before" benchmarks. Update: when the DI code is forced to use reflection instead of expressions, perf locally on Windows\CoreClr is ~5.4x slower (explicit args - specified by user) and ~2.4 slower (injected by DI) than compiled expressions before any perf improvements:
Forced to use reflection
|            Method |     Mean |    Error |   StdDev |   Median |      Min |      Max |   Gen0 | Allocated |
|------------------ |---------:|---------:|---------:|---------:|---------:|---------:|-------:|----------:|
| Factory_3Explicit | 54.40 ns | 0.278 ns | 0.247 ns | 54.33 ns | 54.04 ns | 54.85 ns | 0.02   |      88 B |
| Factory_3Injected | 74.10 ns | 0.585 ns | 0.489 ns | 73.95 ns | 73.26 ns | 74.98 ns | 0.0084 |      88 B |
Default behavior of using expressions
|            Method |     Mean |    Error |   StdDev |   Median |      Min |      Max |   Gen0 | Allocated |
|------------------ |---------:|---------:|---------:|---------:|---------:|---------:|-------:|----------:|
| Factory_3Explicit | 10.18 ns | 0.170 ns | 0.159 ns | 10.20 ns |  9.97 ns | 10.52 ns | 0.0038 |      40 B |
| Factory_3Injected | 31.32 ns | 0.131 ns | 0.110 ns | 31.37 ns | 31.10 ns | 31.45 ns | 0.0037 |      40 B |
- Have the DI code use the the new MethodInvoker APIs from New reflection invoke APIs [draft] designs#286 and For perf, use the new ConstructorInvoker APIs for ActivatorUtilities.CreateFactory #89573.
- Run the benchmarks again.
Update: below are CorClr\Windows with the "before" is compiled expressions in latest v8 and "after" is reflection in latest v8+PR above. The "injection" reflection scenarios are now ~1.1x (10%) slower than expressions and the "explicit" scenario is ~1.9x slower than expressions. So reflection went from 5.4->1.9x slower (explicit) and 2.4->1.1x (injected) slower than compiled expressions and comparing against the original above shows ~3.1 \ ~1.9x improvement and with no extra allocs. Disclaimer: perf improvements are "fast path" scenarios of <= 4 args and meet other constraints; otherwise perf gain will be less; also Mono+Blazor and NativeAot will have different perf characteristics, but all reflection optimizations were also applied there.
|                                 Method |        Job |              Toolchain |       Mean |      Error |     StdDev |     Median |        Min |        Max | Ratio | RatioSD |   Gen0 | Allocated | Alloc Ratio |
|--------------------------------------- |----------- |----------------------- |-----------:|-----------:|-----------:|-----------:|-----------:|-----------:|------:|--------:|-------:|----------:|------------:|
|                      Factory_3Explicit | Job-FWQKKQ |  \DI_AFTER\corerun.exe |  17.355 ns |  0.1731 ns |  0.1619 ns |  17.352 ns |  17.156 ns |  17.746 ns |  1.00 |    0.00 | 0.0038 |      40 B |        1.00 |
|                      Factory_3Explicit | Job-GYQEFS | \DI_BEFORE\corerun.exe |   9.322 ns |  0.1178 ns |  0.1102 ns |   9.277 ns |   9.151 ns |   9.538 ns |  0.54 |    0.01 | 0.0038 |      40 B |        1.00 |
|                                        |            |                        |            |            |            |            |            |            |       |         |        |           |             |
|                      Factory_3Injected | Job-FWQKKQ |  \DI_AFTER\corerun.exe |  39.187 ns |  0.1887 ns |  0.1673 ns |  39.154 ns |  38.938 ns |  39.474 ns |  1.00 |    0.00 | 0.0037 |      40 B |        1.00 |
|                      Factory_3Injected | Job-GYQEFS | \DI_BEFORE\corerun.exe |  35.648 ns |  0.3830 ns |  0.3395 ns |  35.623 ns |  35.209 ns |  36.312 ns |  0.91 |    0.01 | 0.0037 |      40 B |        1.00 |
However, the perf gains on Mono interpreter not as significant; they are ~1.3-1.5x faster:
Before (for 1,000,000 iterations) in ms:
Factory_3Injected: 1454
Factory_3Explicit: 588
After:
Factory_3Injected: 1140
Factory_3Explicit: 385
which are still 1.4x slower than the LINQ expressions for injection and 3.4x slower for the direct case:
Factory_3Injected: 833
Factory_3Explicit: 112
It appears overhead of lambda-capture methods is more significant on Mono interpreter than CoreClr; this needs more investigation. Reflection itself does not seem to be the bottleneck anymore.
- Completely remove the use of LINQ\compiled expressions in DI (for .NET 8.0) if perf is acceptable here and in the non-WASM server scenario. Update: at this point, keeping the compiled expression for non-Blazor and non-NativeAot makes sense since they are still faster.
- If perf is not acceptable for all cases, add a new feature switch that will use LINQ instead of reflection. It would be off by default in WASM, but could be turned on if LINQ is already used elsewhere in the app, for example. Update we need to verify against actual ASP.NET benchmarks, but perf is likely acceptable given the trade-off in Blazor is acceptable CPU perf tradeoff vs. having the large LINQ assembly.
Original
Additional Context: dotnet/aspnetcore#40521.
In 7.0-preview2, Blazor was updated to type activate component types using ActivatorUtilities.CreateFactory. The implementation of this API uses ExpressionTrees, and results in about a 100kb size increase to Blazor WebAssembly apps by default.
Given that ServiceProvider's default implementation in .NET (not-NativeAOT) apps is to use ILEmit, we could avoid the size hit to Blazor by updating ActivatorUtilities.CreateFactory to use ILEmit too when possible.
Note that Blazor is rolling back the change to type activate components until we can ensure it can be accommodated without the size regression i.e. until this issue is resolved.