KEMBAR78
Improve dynamic heap count by PeterSolMS · Pull Request #87619 · dotnet/runtime · GitHub
Skip to content

Conversation

PeterSolMS
Copy link
Contributor

Change the logic around check_heap_count so it's called on a single thread - this eliminates the cost of the join call from check_heap_count. When a change in heap count is found to be necessary, set the gc_start_event to wake up the other GC threads so they can participate in rethreading free list items.

Stress testing this with STRESS_DYNAMIC_HEAP_COUNT showed some issues (mostly assert failures) which are addressed as well.

- Fix hang bug caused by race condition in change_heap_count
- Change way we store dynamic heap input metrics to make it easier to surface them via ETW events.
- Refactor enter_spin_lock_msl into an inlineable part and a slower, more complex out-of-line part.
- Subtract time spent in safe_switch_to_thread and WiatLongerNoInstru from msl wait time - this makes this metric much less noisy.
- add more diagnostic output to check_heap_count and change_heap_count.
- add more spinning to EnterFinalizeLock to address slow suspensions in some ASP.NET benchmarks.
…ad and change_heap_count on n_heap threads.

This substantially reduces the time spent in check_heap_count because we can get rid of the join in that method.
…applies to the total budget, not the budget per heap.

This is equivalent, as long the heap count stays the same. If the heap count increases drastically, this change should reduce the amount of overshoot in total gen 0 budget that we see.
- fix assert in gc_thread_function that fired for inactive GC threads.
- fix build error in gc_thread_function for builds without dynamic heap count
- in trigger_gc_for_alloc, leave soh msl before triggering a collection
- fix STRESS_DYNAMIC_HEAP_COUNT issue where we tried to set n_heaps to 0.
- fixed initialization issue with dynamic_heap_count_data.new_n_heaps
- fixed assert failure for allocation of finalizable objects while heap count is changing.
- wrapped gc_lock around the body of GCHeap::GetTotalBytesInUse to ensure n_heaps wouldn't change while we iterate.
- removed the gc_lock around the body of GCHeap::ApproxTotalBytesInUse
@ghost ghost added the area-GC-coreclr label Jun 15, 2023
@ghost ghost assigned PeterSolMS Jun 15, 2023
@ghost
Copy link

ghost commented Jun 15, 2023

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Change the logic around check_heap_count so it's called on a single thread - this eliminates the cost of the join call from check_heap_count. When a change in heap count is found to be necessary, set the gc_start_event to wake up the other GC threads so they can participate in rethreading free list items.

Stress testing this with STRESS_DYNAMIC_HEAP_COUNT showed some issues (mostly assert failures) which are addressed as well.

Author: PeterSolMS
Assignees: PeterSolMS
Labels:

area-GC-coreclr

Milestone: -

{
assert (!gc_t_join.joined());
// inactive GC threads may observe gc_t_join.joined() being true here
assert ((n_heaps <= heap_number) || !gc_t_join.joined());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can joined be true for the inactive GC threads?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gc_t_join.joined() just returns the value of a static field that will flicker based on what the active GC threads are doing, so occasionally this will fire for the inactive threads.


void gc_heap::check_heap_count ()
{
dynamic_heap_count_data.new_n_heaps = n_heaps;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you are doing this here, you don't need to do it again here since it doesn't change inbetween

    if (gc_heap::background_running_p())
    {
        // can't have background gc running while we change the number of heaps
        // so it's useless to compute a new number of heaps here
        dynamic_heap_count_data.new_n_heaps = n_heaps;
    }
    else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Thanks for spotting!

Comment on lines 25082 to 25085
new_n_heaps = min (dynamic_heap_count_data.lowest_heap_with_msl_uoh, new_n_heaps);

// but not down to zero, obviously...
new_n_heaps = max (new_n_heaps, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
new_n_heaps = min (dynamic_heap_count_data.lowest_heap_with_msl_uoh, new_n_heaps);
// but not down to zero, obviously...
new_n_heaps = max (new_n_heaps, 1);
new_n_heaps = min ((dynamic_heap_count_data.lowest_heap_with_msl_uoh + 1), new_n_heaps);

since dynamic_heap_count_data.lowest_heap_with_msl_uoh is a heap index...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, no - the intention of this code is specifically to stress the situation where we are excluding a heap with a taken lock. But when the lowest heap is heap 0, we can't exclude that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh you are right, if it's heap#3, we'd want to exclude heap#3.

// to register the object for finalization on the heap it was allocated on
hp = acontext->get_alloc_heap()->pGenGCHeap;
assert ((newAlloc == nullptr) || (hp == gc_heap::heap_of ((uint8_t*)newAlloc)));
hp = (newAlloc == nullptr) ? acontext->get_alloc_heap()->pGenGCHeap : gc_heap::heap_of ((uint8_t*)newAlloc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like you've simply gotten rid of the assert. probably meant something like

            // the heap may have changed due to heap balancing - it's important
            // to register the object for finalization on the heap it was allocated on
            // but if that heap went out of service it wouldn't be the same 
            hp = acontext->get_alloc_heap()->pGenGCHeap;
            assert ((newAlloc == nullptr)
#ifndef DYNAMIC_HEAP_COUNT
                        || (hp == gc_heap::heap_of ((uint8_t*)newAlloc))
#endif //!DYNAMIC_HEAP_COUNT
             );

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have observed the assert firing, and I think it was because acontext->get_alloc_heap()->pGenGCHeap went out of service. It wouldn't be good to try to register an object for finalization on that heap.

Thus I think this part: hp = ... gc_heap::heap_of ((uint8_t*)newAlloc) is actually important.

I have #ifdef'd the code so we use the new version (without the assert) in the DYNAMIC_HEAP_COUNT case and the previous code otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh indeed we need to set the correct hp. I did not notice we have the finalization part below this code.

@Maoni0
Copy link
Member

Maoni0 commented Jun 16, 2023

now that the synchronization around changing the heap count becomes non trivial, it would be great if you could write a comment for gc_thread_function that talks about when n_heaps update is supposed to be observed by GC threads, when the decommissioned threads are supposed to wake up and etc.

@PeterSolMS
Copy link
Contributor Author

I agree with non-trivial, will add a comment as suggested.

}
#ifdef DYNAMIC_HEAP_COUNT
// check if we should adjust the number of heaps
check_heap_count();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that user threads can actually start executing during the call to check_heap_count. This implies that the more space lock can be taken again.

Copy link
Member

@Maoni0 Maoni0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@PeterSolMS PeterSolMS merged commit 1ffb77d into dotnet:main Jun 27, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jul 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants