KEMBAR78
Further special-case arrays in Enumerable.Chunk by stephentoub · Pull Request #99256 · dotnet/runtime · GitHub
Skip to content

Conversation

@stephentoub
Copy link
Member

Chunk is already testing the input to see if it's an array, in order to special-case empty arrays. We can use that existing check to also make the processing of non-empty arrays faster. The implementation for an array is simple and doesn't contribute a meaningful maintenance burden, so while Enumerable.Chunk already isn't a high-perf API, a dedicated path for arrays is significantly faster than the generic enumerable path. From perusal of use via grep.app, arrays being the actual underlying type is also reasonably common, that it seems worthwhile to throw a little extra code at it.

Addresses #85653 (comment)

Method Toolchain Mean Ratio Allocated Alloc Ratio
Count \main\corerun.exe 470.9 ns 1.00 1.26 KB 1.00
Count \pr\corerun.exe 180.8 ns 0.39 1.08 KB 0.86
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);

[MemoryDiagnoser(false)]
[HideColumns("Job", "Error", "StdDev", "Median", "RatioSD")]
public partial class Tests
{
    private double[] _values = Enumerable.Range(0, 100).Select(_ => Random.Shared.NextDouble()).ToArray();

    [Benchmark]
    public int Count()
    {
        int count = 0;
        foreach (var chunk in _values.Chunk(10)) count += chunk.Length;
        return count;
    }
}

@stephentoub stephentoub added area-System.Linq tenet-performance Performance related issue labels Mar 4, 2024
@stephentoub stephentoub added this to the 9.0.0 milestone Mar 4, 2024
@ghost
Copy link

ghost commented Mar 4, 2024

Tagging subscribers to this area: @dotnet/area-system-linq
See info in area-owners.md if you want to be subscribed.

Issue Details

Chunk is already testing the input to see if it's an array, in order to special-case empty arrays. We can use that existing check to also make the processing of non-empty arrays faster. The implementation for an array is simple and doesn't contribute a meaningful maintenance burden, so while Enumerable.Chunk already isn't a high-perf API, a dedicated path for arrays is significantly faster than the generic enumerable path. From perusal of use via grep.app, arrays being the actual underlying type is also reasonably common, that it seems worthwhile to throw a little extra code at it.

Addresses #85653 (comment)

Method Toolchain Mean Ratio Allocated Alloc Ratio
Count \main\corerun.exe 470.9 ns 1.00 1.26 KB 1.00
Count \pr\corerun.exe 180.8 ns 0.39 1.08 KB 0.86
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);

[MemoryDiagnoser(false)]
[HideColumns("Job", "Error", "StdDev", "Median", "RatioSD")]
public partial class Tests
{
    private double[] _values = Enumerable.Range(0, 100).Select(_ => Random.Shared.NextDouble()).ToArray();

    [Benchmark]
    public int Count()
    {
        int count = 0;
        foreach (var chunk in _values.Chunk(10)) count += chunk.Length;
        return count;
    }
}
Author: stephentoub
Assignees: -
Labels:

area-System.Linq, tenet-performance

Milestone: 9.0.0

@ghost ghost assigned stephentoub Mar 4, 2024
Chunk is already testing the input to see if it's an array, in order to special-case empty arrays. We can use that existing check to also make the processing of non-empty arrays faster. The implementation for an array is simple and doesn't contribute a meaningful maintenance burden, so while Enumerable.Chunk already isn't a high-perf API, a dedicated path for arrays is significantly faster than the generic enumerable path.  From perusal of use via grep.app, arrays being the actual underlying type is also reasonably common, that it seems worthwhile to throw a little extra code at it.
if (source is TSource[] array)
{
return [];
// Special-case arrays, which have an immutable length. This enables us to not only do an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we extend that assumption to more types if this method was returning an enumerator instead of an enumerable? IIRC we're already doing this in a few places, notably Take(Range).

Copy link
Member Author

@stephentoub stephentoub Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not following. You mean implement a custom enumerable to type and length test in GetEnumerator? And if the length changes while enumerating, do what? Also, the benefit beyond array / List<> drops significantly due to not having a general efficient way to copy only a portion of the source.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

You mean implement a custom enumerable to type and length test in GetEnumerator? And if the length changes while enumerating, do what?

Undefined behavior, presumably. Currently it should throw an exception because of the version checks, but we are considering removing those altogether.

Copy link
Member Author

@stephentoub stephentoub Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add extra code for it, but I'm not sure it's worth it. We already have the check for arrays, in support of empty, and we would want to keep that; that check can't be made more general as it's only correct if the length is immutable. We could add a check for IList<T> and have a dedicated path for using its indexer, but there you're not able to copy en mass, so you're still going to be walking element-by-element. The main thing you'd gain is better presizing and simpler code for the enumeration (with one indexer access instead of two MoveNext/Currents), but you'd still need to factor in the possibility of the length changing out from under you.

@stephentoub stephentoub merged commit 445a3e8 into dotnet:main Mar 4, 2024
@stephentoub stephentoub deleted the chunkarray branch March 4, 2024 21:54
@github-actions github-actions bot locked and limited conversation to collaborators Apr 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.Linq tenet-performance Performance related issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants