KEMBAR78
Don't inline expand boxes with many GC pointers by MichalStrehovsky · Pull Request #101669 · dotnet/runtime · GitHub
Skip to content

Conversation

@MichalStrehovsky
Copy link
Member

No description provided.

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 29, 2024
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@MichalStrehovsky
Copy link
Member Author

/benchmark plaintext,json,fortunes aspnet-citrine-win runtime,libs

@pr-benchmarks
Copy link

pr-benchmarks bot commented Apr 29, 2024

Benchmark started for plaintext, json, fortunes on aspnet-citrine-win with runtime, libs. Logs: link

@pr-benchmarks
Copy link

pr-benchmarks bot commented Apr 29, 2024

plaintext - aspnet-citrine-win

application plaintext.base plaintext.pr
Max CPU Usage (%) 80 70 -12.50%
Max Cores usage (%) 2,233 1,961 -12.18%
Max Working Set (MB) 72 71 -1.39%
Max Private Memory (MB) 94 94 0.00%
Build Time (ms) 8,219 6,994 -14.90%
Start Time (ms) 358 352 -1.68%
Published Size (KB) 105,748 105,748 0.00%
Symbols Size (KB) 54 54 0.00%
.NET Core SDK Version 9.0.100-preview.5.24227.1 9.0.100-preview.5.24227.1
load plaintext.base plaintext.pr
Max CPU Usage (%) 83 82 -1.20%
Max Cores usage (%) 2,328 2,305 -0.99%
Max Working Set (MB) 46 46 0.00%
Max Private Memory (MB) 370 370 0.00%
Build Time (ms) 6,432 5,524 -14.12%
Start Time (ms) 0 0
Published Size (KB) 72,259 72,259 0.00%
Symbols Size (KB) 0 0
.NET Core SDK Version 8.0.204 8.0.204
First Request (ms) 90 89 -1.11%
Requests/sec 9,995,353 9,944,556 -0.51%
Requests 150,903,360 150,159,616 -0.49%
Mean latency (ms) 15.58 17.14 +10.01%
Max latency (ms) 551.67 442.85 -19.73%
Bad responses 0 0
Socket errors 0 0
Read throughput (MB/s) 1,198.08 1,198.08 0.00%
Latency 50th (ms) 0.90 0.90 0.00%
Latency 75th (ms) 8.77 8.99 +2.51%
Latency 90th (ms) 51.77 56.33 +8.81%
Latency 99th (ms) 177.66 214.30 +20.62%

json - aspnet-citrine-win

application json.base json.pr
Max CPU Usage (%) 77 79 +2.60%
Max Cores usage (%) 2,160 2,203 +1.99%
Max Working Set (MB) 61 62 +1.64%
Max Private Memory (MB) 83 84 +1.20%
Build Time (ms) 3,563 3,604 +1.15%
Start Time (ms) 362 353 -2.49%
Published Size (KB) 105,748 105,748 0.00%
Symbols Size (KB) 54 54 0.00%
.NET Core SDK Version 9.0.100-preview.5.24227.1 9.0.100-preview.5.24227.1
load json.base json.pr
Max CPU Usage (%) 78 78 0.00%
Max Cores usage (%) 2,185 2,176 -0.41%
Max Working Set (MB) 46 46 0.00%
Max Private Memory (MB) 363 363 0.00%
Build Time (ms) 3,288 3,329 +1.25%
Start Time (ms) 0 0
Published Size (KB) 72,259 72,259 0.00%
Symbols Size (KB) 0 0
.NET Core SDK Version 8.0.204 8.0.204
First Request (ms) 124 119 -4.03%
Requests/sec 1,163,836 1,148,431 -1.32%
Requests 17,573,830 17,341,261 -1.32%
Mean latency (ms) 2.39 2.34 -2.09%
Max latency (ms) 104.85 104.56 -0.28%
Bad responses 0 0
Socket errors 0 0
Read throughput (MB/s) 162.05 159.90 -1.33%
Latency 50th (ms) 0.28 0.29 +1.05%
Latency 75th (ms) 0.98 1.02 +4.08%
Latency 90th (ms) 7.82 7.36 -5.88%
Latency 99th (ms) 27.17 27.34 +0.63%

fortunes - aspnet-citrine-win

db fortunes.base fortunes.pr
Max CPU Usage (%) 31 32 +3.23%
Max Cores usage (%) 873 886 +1.49%
Max Working Set (MB) 53 53 0.00%
Build Time (ms) 1,194 1,185 -0.75%
Start Time (ms) 1,194 1,198 +0.34%
Published Size (KB) 421,392 421,392 0.00%
application fortunes.base fortunes.pr
Max CPU Usage (%) 89 88 -1.12%
Max Cores usage (%) 2,500 2,452 -1.92%
Max Working Set (MB) 530 561 +5.85%
Max Private Memory (MB) 540 589 +9.07%
Build Time (ms) 3,703 3,837 +3.62%
Start Time (ms) 2,244 2,136 -4.81%
Published Size (KB) 105,755 105,755 0.00%
Symbols Size (KB) 55 55 0.00%
.NET Core SDK Version 9.0.100-preview.5.24227.1 9.0.100-preview.5.24227.1
load fortunes.base fortunes.pr
Max CPU Usage (%) 39 38 -2.56%
Max Cores usage (%) 1,081 1,069 -1.11%
Max Working Set (MB) 46 46 0.00%
Max Private Memory (MB) 363 363 0.00%
Build Time (ms) 3,397 3,423 +0.77%
Start Time (ms) 0 0
Published Size (KB) 72,259 72,259 0.00%
Symbols Size (KB) 0 0
.NET Core SDK Version 8.0.204 8.0.204
First Request (ms) 106 105 -0.94%
Requests/sec 439,111 436,789 -0.53%
Requests 6,630,672 6,595,331 -0.53%
Mean latency (ms) 1.51 1.49 -1.32%
Max latency (ms) 40.65 24.59 -39.51%
Bad responses 0 0
Socket errors 0 0
Read throughput (MB/s) 577.48 574.43 -0.53%
Latency 50th (ms) 1.06 1.07 +0.94%
Latency 75th (ms) 1.32 1.32 0.00%
Latency 90th (ms) 1.93 1.92 -0.52%
Latency 99th (ms) 11.57 11.78 +1.82%

// After a certain number of GC pointers, the write barriers used
// in inline expansion stop being profitable.
ClassLayout* layout = typGetObjLayout(pResolvedToken->hClass);
if (layout->GetGCPtrCount() > 3)
Copy link
Member

@EgorBo EgorBo Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this might need more work, e.g. when we box a struct with 4 fields but all of them are null or e.g. nongc strings - this will be a regression. Also, I suspect this might ruin some "optimize boxing" optimizations.

I think we need to either fix all those places to deal with a helper call or we need to emit a bulk copy in codegen, just like I did in #99096 (but to do it for batched-copy, not batched-precise-barrier).

@jkotas
Copy link
Member

jkotas commented Apr 29, 2024

We have optimized JIT_Box helper variants only on Windows:

#ifdef TARGET_UNIX
SetJitHelperFunction(CORINFO_HELP_NEWSFAST, JIT_NewS_MP_FastPortable);
SetJitHelperFunction(CORINFO_HELP_NEWSFAST_ALIGN8, JIT_NewS_MP_FastPortable);
SetJitHelperFunction(CORINFO_HELP_NEWARR_1_VC, JIT_NewArr1VC_MP_FastPortable);
SetJitHelperFunction(CORINFO_HELP_NEWARR_1_OBJ, JIT_NewArr1OBJ_MP_FastPortable);
ECall::DynamicallyAssignFCallImpl(GetEEFuncEntryPoint(AllocateString_MP_FastPortable), ECall::FastAllocateString);
#else // TARGET_UNIX
// if (multi-proc || server GC)
if (GCHeapUtilities::UseThreadAllocationContexts())
{
SetJitHelperFunction(CORINFO_HELP_NEWSFAST, JIT_TrialAllocSFastMP_InlineGetThread);
SetJitHelperFunction(CORINFO_HELP_NEWSFAST_ALIGN8, JIT_TrialAllocSFastMP_InlineGetThread);
SetJitHelperFunction(CORINFO_HELP_BOX, JIT_BoxFastMP_InlineGetThread);
. We would need to implemented the optimized non-Windows JIT_Box helper variants on non-Windows too to make this optimization work well, using the same strategy as the optimized non-Windows allocation helpers are implemented.

@MichalStrehovsky
Copy link
Member Author

We would need to implemented the optimized non-Windows JIT_Box helper variants on non-Windows too to make this optimization work well, using the same strategy as the optimized non-Windows allocation helpers are implemented.

For ContainsPointers the optimized assembly helper needs to thunk out to C++ anyway so the optimization might not amount to much for the relevant cases. I'll add this to the issue. Not actually planning to work on this based on Egor's comments.

@MichalStrehovsky MichalStrehovsky deleted the gcptrbox branch April 29, 2024 21:55
@EgorBo
Copy link
Member

EgorBo commented Apr 29, 2024

Not actually planning to work on this based on Egor's comments.

I can try myself, but I think it should be more or less trivial, you just need to introduce a new helper (where you'll just do memcpyGC) and call it in genCodeForCpObj (so the actuall allocation for the box will be handled efficiently via the NEWS helper as is).

@MichalStrehovsky
Copy link
Member Author

I can try myself, but I think it should be more or less trivial, you just need to introduce a new helper

If we want to keep the NEWS, we might just be able to use the CORINFO_HELP_ASSIGN_STRUCT to do the copying.

@EgorBo
Copy link
Member

EgorBo commented Apr 29, 2024

I can try myself, but I think it should be more or less trivial, you just need to introduce a new helper

If we want to keep the NEWS, we might just be able to use the CORINFO_HELP_ASSIGN_STRUCT to do the copying.

Yep, that's what I meant, the helper is completely unused/unimplemented, so at least we won't have to bump R2R format 🙂

@github-actions github-actions bot locked and limited conversation to collaborators May 31, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants