KEMBAR78
Optimize decommit by PeterSolMS · Pull Request #35896 · dotnet/runtime · GitHub
Skip to content

Conversation

@PeterSolMS
Copy link
Contributor

Optimize decommitting GC heap memory pages:

  1. Move the decommit out of the GC pause for server GC
  2. Throttle decommit so the decommit rate stays reasonable
  3. Take gen 1 budget and free list space into account when planning what to decommit
  4. Add exponential smoothing for the decommit target to avoid decommit/recommit for short term fluctuations

Key ideas:
 - in decommit_ephemeral_segment_pages, instead of decommitting right there, just remember what we intended to decommit.
- in gc_thread function, have the thread with heap_number 0 periodically wake up and do gradual decommits for all heaps by calling the new method decommit_ephemeral_pages_step on them.
- decommit_ephemeral_pages_step will decommit a "reasonable" amount of space per heap and return how much it decomitted. A return value of 0 indicates no further progress.
Details:
- add code checking the assumption that nobody is changing the committed size or the desired allocation size while we do the decommitting
- fix logic error causing race between committing and decommitting
- add a decommit step at the end of GC to make sure decommit would still happen even for very frequent GCs.
- fix issue that crept in during pinned heap work
… smoothing of the decommit target when ramping down, limit decommit rate for workstation GC.
@ghost
Copy link

ghost commented May 6, 2020

Tagging subscribers to this area: @Maoni0
Notify danmosemsft if you want to be subscribed.

Copy link
Member

@Maoni0 Maoni0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other than those comments I just have a nit comment on the coding style - it appears the spacing is done randomly. in GC we (well I at least 😄) try to conform to the style that is

one space between method/macro and '(', and no space if the method/macro takes no argument.

however I've been thinking to eliminate the 2nd part since it just seems folks in general have trouble conforming to it. instead of having to tell everyone a longer rule I'd rather tell them a shorter rule. so I'm ok if you leave the space in gc_heap::decommit_step (). at some point we can convert all of them to having one space.

also there should always be () around an expression, even when it's simple just so that we are constant so this line

uint8_t* page_start = align_on_page ( max (decommit_target, committed - DECOMMIT_STEP_SIZE) );

should be changed to

uint8_t* page_start = align_on_page (max (decommit_target, (committed - DECOMMIT_STEP_SIZE)));

and

heap_segment_committed(ephemeral_heap_segment)

should be changed to

heap_segment_committed (ephemeral_heap_segment)

thanks for conforming to these to make our code base more consistent :)

@Maoni0
Copy link
Member

Maoni0 commented May 7, 2020

for testing, please make sure the perf testing is done as we discussed, thanks! I would also do a stress run on this (I'm paranoid) even though I don't think it would turn up anything.

@PeterSolMS
Copy link
Contributor Author

I also fixed the coding style issues.

@PeterSolMS PeterSolMS requested a review from Maoni0 May 15, 2020 11:04
decommit_size = max (decommit_size, (ptrdiff_t)(100 * OS_PAGE_SIZE));

// but don't try to do more than what's actually committed in the segment
decommit_size = min ((heap_segment_committed (ephemeral_heap_segment) - heap_segment_mem (ephemeral_heap_segment)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be

decommit_size = min ((heap_segment_committed (ephemeral_heap_segment) - heap_segment_allocated (ephemeral_heap_segment)),

however I think we should be able to refactor this to make it simpler... the changes involve so many different numbers when calculating the decommit size, plus the existing ones... that's a lot to keep track of. I'll have some suggestion to simplify it some.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your code works too, my intention was just to guard against the case of a large decommit size and a very low segment address where committed - decommit_size would overflow and wrap around.

@Maoni0
Copy link
Member

Maoni0 commented May 20, 2020

what do you think of this?

// get rid of decommit_heap_segment_pages_worker
size_t gc_heap::decommit_heap_segment_pages (heap_segment* seg,
                                             uint8_t* new_committed=0)
{
    if (use_large_pages_p)
        return;
    // Note that I got the extra 4 pages that you had in decommit_ephemeral_segment_pages_step; 
    // If there is a reason that it's 4 and not 2, you could change the 2 pages here to 4.
    uint8_t* page_start = align_on_page (new_committed ? new_committed : heap_segment_allocated (seg)) + (2 * OS_PAGE_SIZE);
    size_t size = 0;
    if (page_start < heap_segment_committed (seg))
    {
        size_t size = heap_segment_committed (seg) - page_start;
        if (size > (100 * OS_PAGE_SIZE))
        {
            virtual_decommit (page_start, size, heap_number);
            dprintf (3, ("Decommitting heap segment [%Ix, %Ix[(%d)",
                (size_t)page_start,
                (size_t)(page_start + size),
                size));
            heap_segment_committed (seg) = page_start;
            if (heap_segment_used (seg) > heap_segment_committed (seg))
            {
                heap_segment_used (seg) = heap_segment_committed (seg);
            }
        }
        else
            // didn't do any decommit, return 0.
            size = 0;
    }

    return size;
}


size_t gc_heap::decommit_ephemeral_segment_pages_step ()
{
    // we rely on desired allocation not being changed outside of GC
    assert (ephemeral_heap_segment->saved_desired_allocation == dd_desired_allocation (dynamic_data_of (0)));

    uint8_t* decommit_target = heap_segment_decommit_target (ephemeral_heap_segment);
    uint8_t* committed = heap_segment_committed (ephemeral_heap_segment);
    if (decommit_target < committed)
    {
        // we rely on other threads not messing with committed if we are about to trim it down
        assert (ephemeral_heap_segment->saved_committed == heap_segment_committed (ephemeral_heap_segment));

        // limit the decommit size to some reasonable value per time interval
        ptrdiff_t decommit_size = ((DECOMMIT_SIZE_PER_MILLISECOND * DECOMMIT_TIME_STEP_MILLISECONDS) / n_heaps);

        // figure out where the new committed should be
        uint8_t* new_committed = max (decommit_target, (committed - decommit_size));
        size_t size = decommit_heap_segment_pages (ephemeral_heap_segment, new_committed);

#ifdef _DEBUG
        ephemeral_heap_segment->saved_committed = committed - size;
#endif // _DEBUG

        return size;
    }
    return 0;
}

the decommit_heap_segment_pages we are calling for WKS will need to pass in an appropriate new_committed which is (allocated + slack_space).

@PeterSolMS
Copy link
Contributor Author

The problem with your suggested solution is that the additional 2 pages and the 100 page threshold in decommit_heap_segment_pages will apply to every decommit step. So if you have more than about 40 heaps, nothing will actually happen in decommit_heap_segment_pages when called by decommit_ephemeral_segment_pages_step because the step is smaller than 100 pages.

Of course you could just compensate for this in decommit_ephemeral_segment_pages_step by asking for at least 103 pages, but this is error prone because decommit_ephemeral_segment_pages_step would now knows what's going on in decommit_heap_segment_pages, and thus if one is changed but not the other, the code will malfuntion.

I also think the complications I put in make sense (or at least we should discuss in detail):

  • on machines with many heaps we should have a minimum step per heap (whether it should be a 100 pages or something else is of course debatable).
  • I think it also makes sense to guard against arithmetic overflow in the subtraction "committed - decommit_size".

@Maoni0
Copy link
Member

Maoni0 commented May 20, 2020

ahh, I misread then. I thought you also wanted 100pages minimum but now I read your last iteration and clearly that was not the case. this was the policy we had in the implementation before you change. I actually don't think 100 pages are that bad. but I'm also fine with fewer pages when the # of heaps is large. in which we can just change the 100 number and still simplify it because this new logic + the old logic is unnecessarily complicated.

would you be ok if we changed 100 to something like 20 pages (~ a large object size)?

I think it also makes sense to guard against arithmetic overflow in the subtraction "committed - decommit_size".

I noticed that too and the implementation I suggested does not have this problem.

@PeterSolMS
Copy link
Contributor Author

I noticed that too and the implementation I suggested does not have this problem.
Could you explain why not? I think it does have this problem in the case of a segment with a very low virtual address and very little committed memory.

The thing that bothers me about your solution is the implied coupling between decommit_ephemeral_segment_pages_step and decommit_heap_segment_pages in the sense that the former has to know about the internal workings of the latter. That's why I thought it cleaner to have a worker method has the common core that just does the mechanics exactly as it is being told.

- move computation of reasonable decommit step size to init code
- change value of EXTRA_SPACE in decommit_ephemeral_segment_pages_step to be consistent with decommit_heap_segment_pages
- reformulate computation of new_committed in decommit_ephemeral_segment_pages_step to eliminate possibility of arithmetic overflow.
Copy link
Member

@Maoni0 Maoni0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@PeterSolMS PeterSolMS merged commit 2641e23 into dotnet:master May 26, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants