Dynamic heap count by PeterSolMS · Pull Request #86245 · dotnet/runtime · GitHub

@PeterSolMS
Contributor

This is an initial implementation for changing the GC heap count dynamically in response to changing load conditions.

Using more heaps will increase memory footprint, but in most cases also improve throughput because more GC work is parallelized, and lock contention on the allocation code path is reduced by spreading the load.

The algorithm makes this tradeoff explicit by comparing the estimated percentage increase in throughput with the estimated percentage increase in memory footprint. It increases the heap count if throughput is estimated to increase by at least one percentage point more than the memory footprint, and decreases the heap count if the estimated reduction in memory footprint is at least one percentage point greater than the estimated decrease in throughput.
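A minimal C++ sketch of the decision rule described above, assuming the four percentage estimates are already available. The names `HeapCountChange` and `decide_heap_count_change` are hypothetical and do not come from the runtime; how the GC actually estimates throughput and footprint changes is not shown here.

```cpp
#include <cassert>

// Hypothetical names: illustrates the decision rule from the PR description,
// not the actual GC implementation.
enum class HeapCountChange { Decrease, Keep, Increase };

// All inputs are estimates in percent (3.0 means 3%).
HeapCountChange decide_heap_count_change(double throughput_gain_pct,   // from adding heaps
                                         double footprint_growth_pct,  // from adding heaps
                                         double footprint_saving_pct,  // from removing heaps
                                         double throughput_loss_pct)   // from removing heaps
{
    // A reduction cannot exceed 100% of the current value.
    assert(footprint_saving_pct <= 100.0 && throughput_loss_pct <= 100.0);

    const double margin_pct = 1.0; // "at least one percentage point"

    // Grow only if the throughput gain beats the footprint cost by the margin.
    if (throughput_gain_pct - footprint_growth_pct >= margin_pct)
        return HeapCountChange::Increase;

    // Shrink only if the footprint saving beats the throughput cost by the margin.
    if (footprint_saving_pct - throughput_loss_pct >= margin_pct)
        return HeapCountChange::Decrease;

    return HeapCountChange::Keep;
}
```

Checking the increase condition first is an arbitrary choice in this sketch; with consistent estimates, both conditions should rarely hold at the same time.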

Because the input data (GC pause times, etc.) are quite noisy, we apply a median-of-3 filter before the data is used to make decisions. Preliminary data suggests this is effective, but probably not sufficient on its own.
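A median-of-3 filter keeps the middle value of the last three samples, so a single outlier (say, one unusually long pause) cannot trigger a heap count change by itself. The sketch below is a hypothetical C++ illustration; `median_of_3` and `Median3Filter` are not names from the runtime.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Returns the middle of three values.
template <typename T>
T median_of_3(T a, T b, T c)
{
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Hypothetical rolling filter over the three most recent samples.
class Median3Filter
{
    double samples_[3] = {0.0, 0.0, 0.0};
    std::size_t count_ = 0;

public:
    // Records a sample, overwriting the oldest one once three are stored
    // (order does not matter for the median).
    void add(double sample)
    {
        samples_[count_ % 3] = sample;
        ++count_;
    }

    // The filtered value is only meaningful after three samples.
    bool ready() const { return count_ >= 3; }

    double value() const
    {
        assert(ready());
        return median_of_3(samples_[0], samples_[1], samples_[2]);
    }
};
```

For example, after samples 10, 200 and 12 the filter reports 12, ignoring the 200 outlier.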

PeterSolMS and others added 30 commits March 9, 2023 09:20
- park extra threads on gc_idle_thread_event
- update thread count in join
- null free lists and have background GC rebuild them
- have redistribute_regions call fix_allocation_contexts
- distribute free regions as well
- fix up ephemeral_heap_segment and generation_allocation_segment
Add hack to dynamically enable heap verify when we change the heap count.
- move finalization data between heaps
- update free list space per heap when rethreading free lists
- update allocation contexts so they don't reference decommissioned heaps
- allow redistribute_regions to fail
- allow enter_spin_lock to fail when called from try_allocate_more_space
- don't decommission heaps with a taken more space lock
- poison dynamic data and generation table for decommissioned heaps
- Add instrumentation
- Be more careful regarding signed vs. unsigned types to make GCC happy.
…ecking for containing heap to verify_free_lists, adding checking decommissioned heaps to verify_heap.
PeterSolMS requested review from mangod9 and mrsharm May 15, 2023 12:58
ghost assigned PeterSolMS May 15, 2023
ghost added the area-GC-coreclr label May 15, 2023

ghost commented May 15, 2023

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.


PeterSolMS added 10 commits May 15, 2023 15:00
- rename config to GCDynamicAdaptation
- change dynamic heap count dprintf level so we can print just those messages
- aim for a percentage overhead reading between 1 and 5% - if above 10%, ramp up aggressively, if above 5%, ramp up a step, if below 1% and significant space gains are possible, ramp down a step.
- make space cost computation per heap more realistic - use min gen 0 budget
…ze to 2.5 MB if COMPLUS_GCDynamicAdaption is enabled.
…mic_data - we had moved regions to other heaps, so total_gen_size became 0. We had adjusted generation_free_list_space for this when rethreading the free lists, but not generation_free_obj_space. As a result, dd_current_size became a large positive number.
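The overhead thresholds in the commit list above (target band of 1 to 5%) can be sketched as a hypothetical C++ function. The commit message does not say how large a "step" is or what "aggressively" means, so this sketch assumes a step is one heap and that aggressive ramp-up doubles the heap count; `next_heap_count`, `significant_space_gain` and `max_heaps` are illustrative names only.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch of the 1%..5% target band; not the runtime's code.
// Assumptions: one "step" = one heap, "aggressive" = doubling.
int next_heap_count(double gc_overhead_pct, bool significant_space_gain,
                    int current_heaps, int max_heaps)
{
    assert(current_heaps >= 1 && current_heaps <= max_heaps);

    int target = current_heaps;
    if (gc_overhead_pct > 10.0)
        target = current_heaps * 2;      // ramp up aggressively (assumed: double)
    else if (gc_overhead_pct > 5.0)
        target = current_heaps + 1;      // ramp up a step
    else if (gc_overhead_pct < 1.0 && significant_space_gain)
        target = current_heaps - 1;      // ramp down a step
    // Between 1% and 5%: inside the target band, keep the current count.

    // Keep the result within [1, max_heaps].
    return std::max(1, std::min(target, max_heaps));
}
```

Leaving the count unchanged inside the band gives the controller some hysteresis, so small fluctuations in the overhead reading do not cause the heap count to oscillate.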
Maoni0 (Member) left a comment


@PeterSolMS and I talked about this, and we do want to get this into Preview 5 so it gets more testing. We've already gone through the changes offline.

… for the assert, which is likely some faulty bookkeeping.
@PeterSolMS
Contributor Author

Regarding the assert failures in compute_new_dynamic_data, the cases I could repro were all for gen 1, where dd_current_size is actually irrelevant to the budget computation. Nonetheless, it seemed safer to set dd_current_size to 0 rather than to a huge positive value when total_gen_size < dd_fragmentation (dd). My guess is that when we empty gen 1, we don't actually zero generation_free_obj_space.
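The failure mode is ordinary unsigned wraparound. Assuming the sizes are unsigned (`size_t`), which matches the "large positive number" described in the earlier commit message, subtracting a fragmentation value larger than the generation size wraps around instead of going negative. The hypothetical helpers below contrast the naive subtraction with the clamp-to-0 guard described here; they are not the actual compute_new_dynamic_data code.

```cpp
#include <cstddef>

// Naive: if fragmentation > total_gen_size, unsigned subtraction wraps
// around to a huge positive value (well-defined, but wrong here).
std::size_t current_size_naive(std::size_t total_gen_size, std::size_t fragmentation)
{
    return total_gen_size - fragmentation;
}

// Guarded: report 0 when the bookkeeping is inconsistent instead of wrapping.
std::size_t current_size_clamped(std::size_t total_gen_size, std::size_t fragmentation)
{
    if (total_gen_size < fragmentation)
        return 0;
    return total_gen_size - fragmentation;
}
```

The guard hides the symptom rather than fixing the stale generation_free_obj_space bookkeeping, which matches the thread's framing of it as the safer default while the underlying issue is addressed separately.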

@mangod9
Member

mangod9 commented May 23, 2023

Is this something which needs to be fixed before merging?

@PeterSolMS
Contributor Author

No, this is a lurking issue that has nothing to do with dynamic heap count. It also happens with the feature disabled, and all the cases I looked at were on WKS GC and were benign, in the sense that they had no impact on the budget computation or on other aspects of correctness.

@mangod9
Member

mangod9 commented May 23, 2023

It looks like the failures are known issues, per Build-Analysis. Should be OK to merge.

PeterSolMS merged commit f6f7d89 into dotnet:main May 23, 2023
@PeterSolMS
Contributor Author

Agreed. Note that I have a separate PR out for the assert failure.
