KEMBAR78
Avoid extra `Array.Copy` in `List<T>.Insert(Range)` when the list is full by karakasa · Pull Request #90089 · dotnet/runtime · GitHub
Skip to content

Conversation

@karakasa
Copy link
Contributor

@karakasa karakasa commented Aug 7, 2023

Originally there are 3 Array.Copy during List<T>.Insert when Capacity is maxed out. Avoid one extra copy with a modified Grow.

Microbenchmark in following comments. List<T>.InsertRange also benefits from similar patterns. Performance is ~20% better for best cases.

Fix #89915

Originally there are 3 `Array.Copy` during `List<T>.Insert`
when `Capacity` is maxed out. Avoid this extra copy with a modified `Grow`.

Fix dotnet#89915
@ghost ghost added area-System.Collections community-contribution Indicates that the PR has been added by a community member labels Aug 7, 2023
@ghost
Copy link

ghost commented Aug 7, 2023

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

Originally there are 3 Array.Copy during List<T>.Insert when Capacity is maxed out. Avoid this extra copy with a modified Grow.

Microbenchmark at the issue page.

Fix #89915

Author: karakasa
Assignees: -
Labels:

area-System.Collections

Milestone: -

@karakasa karakasa changed the title Avoid extra copy when insertting on List at capacity Avoid extra copy when inserting to List at capacity Aug 7, 2023
@karakasa karakasa changed the title Avoid extra copy when inserting to List at capacity Avoid extra Array.Copy when inserting to List<T> at capacity Aug 7, 2023
@EgorBo
Copy link
Member

EgorBo commented Aug 7, 2023

If I understand it correctly, this gives us up to 3.5% imprvoement for the best case - insertion at the 0 index in a large array, right?

@karakasa
Copy link
Contributor Author

karakasa commented Aug 7, 2023

If I understand it correctly, this gives us up to 3.5% imprvoement for the best case - insertion at the 0 index in a large array, right?

The gain is 20%-ish for insertion at 0 into 10000-long list.

Length Type Method Time (ns) Ratio
100 Original Insert 74 1.00
100 Improved Insert 54 0.73
100 Original InsertRange 81 1.00
100 Improved InsertRange 68 0.84
10000 Original Insert 4340 1.00
10000 Improved Insert 3513 0.81
10000 Original InsertRange 5551 1.00
10000 Improved InsertRange 4475 0.81

(InsertRange: 100->100 or 10000->10000)

The correct benchmark should utilize [IterationSetup]. But for the sake of simplicity, I made two benchmarks (one generates the list only, one generates and inserts), and subtract the generation+insertion time with generation-only time.

@karakasa karakasa changed the title Avoid extra Array.Copy when inserting to List<T> at capacity Avoid extra Array.Copy when inserting into a full List<T> Aug 7, 2023
@karakasa karakasa changed the title Avoid extra Array.Copy when inserting into a full List<T> Avoid extra Array.Copy in List<T>.Insert(Range) when the list is full Aug 7, 2023
@karakasa karakasa marked this pull request as ready for review August 8, 2023 07:48
@karakasa karakasa requested a review from jkotas August 8, 2023 09:22
@jkotas
Copy link
Member

jkotas commented Aug 10, 2023

btw, should we replace Array.Copy with Span.Copy in List when possible? Since there's no type-safety issues. All types are T.

Maybe. It would be a tradeoff between steady state speed vs. and startup time and memory consumption. List<T> is one of the most popular generic types and it does not use span today, except in List.Sort that is very rarely used. If it were to start using span, each List<T> type instantiation would also need to load Span and ReadOnlySpan type instantiations that would regress startup time and increase memory cost of each List<T> instantiation.

This PR is a similar tradeoff at smaller scale. It makes each List<T> type instantiation more expensive to create and take more memory.

@karakasa
Copy link
Contributor Author

karakasa commented Aug 10, 2023

Maybe.

I just did a microbenchmark and it seems replacing Array.Copy with Span.CopyTo doesn't have big observable perf impact, even for 10000-long int list. So I removed that reply.

Method Size Mean Error StdDev
ImprovedInsertOne 100 205.8 ns 1.36 ns 1.13 ns
SpanInsertOne 100 198.0 ns 2.52 ns 2.36 ns
ImprovedInsertRange 100 215.8 ns 1.53 ns 1.35 ns
SpanInsertRange 100 232.6 ns 0.79 ns 0.70 ns
ImprovedInsertOne 10000 15,929.9 ns 50.44 ns 47.18 ns
SpanInsertOne 10000 15,896.6 ns 76.68 ns 71.73 ns
ImprovedInsertRange 10000 16,916.3 ns 63.25 ns 52.82 ns
SpanInsertRange 10000 16,902.4 ns 107.19 ns 100.27 ns

This PR is a similar tradeoff at smaller scale. It makes each List<T> type instantiation more expensive to create and take more memory.

Yes you are right.

@MichalPetryka
Copy link
Contributor

I just did a microbenchmark and it seems replacing Array.Copy with Span.CopyTo doesn't have big observable perf impact, even for 10000-long int list.

The bigger impact here would be with smaller collections, since Span.Copy then skips a few checks and allows unrolling.

@eiriktsarpalis eiriktsarpalis added this to the 9.0.0 milestone Aug 14, 2023
Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karajas Thank you for your contribution!

PTAL at my comments, apply the changes and share updated perf numbers. Thanks!

@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Oct 19, 2023
@karakasa karakasa marked this pull request as draft October 19, 2023 15:09
code formatting

Co-authored-by: Adam Sitnik <adam.sitnik@gmail.com>
@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Oct 19, 2023
@adamsitnik adamsitnik added the needs-author-action An issue or pull request that requires more info or actions from the author. label Oct 20, 2023
@adamsitnik adamsitnik self-assigned this Oct 20, 2023
Shared logics of getting new capacities in
Grow & GrowForInsertion are moved to
a separated method.
@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Oct 25, 2023
@karakasa
Copy link
Contributor Author

share updated perf numbers.

Method Toolchain Size Mean (ns) Ratio
Add 8 RC2 100 51.40 1.00
Add PR 100 51.30 1.00
Add PR-Reuse 100 50.40 0.98
AddRange 8 RC2 100 63.70 1.00
AddRange PR 100 61.70 0.97
AddRange PR-Reuse 100 63.60 1.00
InsertAt0 8 RC2 100 46.80 1.00
InsertAt0 PR 100 42.60 0.91
InsertAt0 PR-Reuse 100 41.00 0.88
InsertRangeAt0 8 RC2 100 70.70 1.00
InsertRangeAt0 PR 100 61.00 0.86
InsertRangeAt0 PR-Reuse 100 62.50 0.88
Add 8 RC2 10000 3,655.30 1.00
Add PR 10000 3,452.70 0.94
Add PR-Reuse 10000 3,478.50 0.95
AddRange 8 RC2 10000 3,849.00 1.00
AddRange PR 10000 3,695.20 0.96
AddRange PR-Reuse 10000 3,631.80 0.94
InsertAt0 8 RC2 10000 3,717.20 1.00
InsertAt0 PR 10000 2,674.50 0.72
InsertAt0 PR-Reuse 10000 2,654.10 0.71
InsertRangeAt0 8 RC2 10000 4,908.90 1.00
InsertRangeAt0 PR 10000 3,610.20 0.74
InsertRangeAt0 PR-Reuse 10000 3,666.20 0.75
code
public class InsertIntoExisting
{
    [Params(100, 10000)]
    public int Size { get; set; }
    private int[] _array;
    [GlobalSetup]
    public void InitArrayForInsertRange()
    {
        _array = new int[Size];
    }
    public List<int> PrepareList()
    {
        var list = new List<int>(Size);
        var size = Size;
        for (var i = 0; i < size; i++)
            list.Add(i);

        return list;
    }

    [Benchmark]
    public object Baseline()
    {
        return PrepareList();
    }
    [Benchmark]
    public object InsertAt0()
    {
        var list = PrepareList();
        list.Insert(0, 0);
        return list;
    }
    [Benchmark]
    public object InsertRangeAt0()
    {
        var list = PrepareList();
        list.InsertRange(0, _array);
        return list;
    }
    [Benchmark]
    public object Add()
    {
        var list = PrepareList();
        list.Add(0);
        return list;
    }
    [Benchmark]
    public object AddRange()
    {
        var list = PrepareList();
        list.AddRange(_array);
        return list;
    }
}
Method Toolchain Size Mean
Add_OneByOne 8 RC2 8 34.48 ns
Add_OneByOne PR 8 32.99 ns
Add_OneByOne PR-Reuse 8 32.49 ns
Prepend_OneByOne 8 RC2 8 90.57 ns
Prepend_OneByOne PR 8 87.40 ns
Prepend_OneByOne PR-Reuse 8 88.22 ns
Add_OneByOne 8 RC2 32 96.94 ns
Add_OneByOne PR 32 98.04 ns
Add_OneByOne PR-Reuse 32 97.90 ns
Prepend_OneByOne 8 RC2 32 433.29 ns
Prepend_OneByOne PR 32 418.13 ns
Prepend_OneByOne PR-Reuse 32 412.85 ns
code
public class OneByOneTest
{
    [Params(8, 32)]
    public int Size { get; set; }
    [Benchmark]
    public object Prepend_OneByOne()
    {
        var list = new List<int>();
        var size = Size;

        for (var i = 0; i < size; i++)
            list.Insert(0, 0);

        return list;
    }
    [Benchmark]
    public object Add_OneByOne()
    {
        var list = new List<int>();
        var size = Size;

        for (var i = 0; i < size; i++)
            list.Add(0);

        return list;
    }
}

@karakasa karakasa marked this pull request as ready for review October 25, 2023 05:57
@karakasa karakasa requested a review from adamsitnik October 25, 2023 07:12
Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for your contribution @karakasa !

@adamsitnik adamsitnik merged commit b2ec505 into dotnet:main Oct 25, 2023
@adamsitnik adamsitnik added the tenet-performance Performance related issue label Oct 25, 2023
@karakasa karakasa deleted the issue-89915 branch October 25, 2023 11:16
@karakasa
Copy link
Contributor Author

@adamsitnik btw do you have thoughts on how to benchmark this, which requires a new List<T> for every iteration? Obviously I cannot use [IterationSetup] because each iteration is too short (and cannot be extended through loop)

Currently I added a baseline function and subtract results, but it can be unstable and troublesome.

    [Benchmark]
    public object Baseline()
    {
        return PrepareList();
    }

    [Benchmark]
    public object InsertAt0()
    {
        var list = PrepareList();
        list.Insert(0, 0);
        return list;
    }

@adamsitnik
Copy link
Member

@karakasa this is tricky. You can combine [IterationSetup] with [InvocationCount] which tells how many times a benchmark should be invoked per iteration and then in every benchmark ivocation, access an n-th List that is pre-configured in the setup. We do that for sorting in dotnet/performance:

https://github.com/dotnet/performance/blob/bc0b657f6a759c4a07589a6b5d66818345813304/src/benchmarks/micro/libraries/System.Collections/Sort.cs#L19

https://github.com/dotnet/performance/blob/bc0b657f6a759c4a07589a6b5d66818345813304/src/benchmarks/micro/libraries/System.Collections/Sort.cs#L37-L45

@stephentoub
Copy link
Member

stephentoub commented Oct 26, 2023

It doesn't necessarily require a new List. For a small amount of overhead, it could do:

CollectionsMarshal.SetCount(list, 0);
list.Capacity = 0;

or similar to put it back to the desired configuration. It could even use UnsafeAccessor to poke at list internals directly.

@glenn-slayden
Copy link

Original issue: #89915 (by me)

liveans pushed a commit to liveans/dotnet-runtime that referenced this pull request Nov 9, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Dec 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.Collections community-contribution Indicates that the PR has been added by a community member tenet-performance Performance related issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

List<T> Insert() - optimize memory copying

8 participants