-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Further special-case arrays in Enumerable.Chunk #99256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tagging subscribers to this area: @dotnet/area-system-linq Issue DetailsChunk is already testing the input to see if it's an array, in order to special-case empty arrays. We can use that existing check to also make the processing of non-empty arrays faster. The implementation for an array is simple and doesn't contribute a meaningful maintenance burden, so while Enumerable.Chunk already isn't a high-perf API, a dedicated path for arrays is significantly faster than the generic enumerable path. From perusal of use via grep.app, arrays being the actual underlying type is also reasonably common, that it seems worthwhile to throw a little extra code at it. Addresses #85653 (comment)
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);
[MemoryDiagnoser(false)]
[HideColumns("Job", "Error", "StdDev", "Median", "RatioSD")]
public partial class Tests
{
private double[] _values = Enumerable.Range(0, 100).Select(_ => Random.Shared.NextDouble()).ToArray();
[Benchmark]
public int Count()
{
int count = 0;
foreach (var chunk in _values.Chunk(10)) count += chunk.Length;
return count;
}
}
|
Chunk is already testing the input to see if it's an array, in order to special-case empty arrays. We can use that existing check to also make the processing of non-empty arrays faster. The implementation for an array is simple and doesn't contribute a meaningful maintenance burden, so while Enumerable.Chunk already isn't a high-perf API, a dedicated path for arrays is significantly faster than the generic enumerable path. From perusal of use via grep.app, arrays being the actual underlying type is also reasonably common, that it seems worthwhile to throw a little extra code at it.
| if (source is TSource[] array) | ||
| { | ||
| return []; | ||
| // Special-case arrays, which have an immutable length. This enables us to not only do an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we extend that assumption to more types if this method was returning an enumerator instead of an enumerable? IIRC we're already doing this in a few places, notably Take(Range).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not following. You mean implement a custom enumerable to type and length test in GetEnumerator? And if the length changes while enumerating, do what? Also, the benefit beyond array / List<> drops significantly due to not having a general efficient way to copy only a portion of the source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
You mean implement a custom enumerable to type and length test in GetEnumerator? And if the length changes while enumerating, do what?
Undefined behavior, presumably. Currently it should throw an exception because of the version checks, but we are considering removing those altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add extra code for it, but I'm not sure it's worth it. We already have the check for arrays, in support of empty, and we would want to keep that; that check can't be made more general as it's only correct if the length is immutable. We could add a check for IList<T> and have a dedicated path for using its indexer, but there you're not able to copy en mass, so you're still going to be walking element-by-element. The main thing you'd gain is better presizing and simpler code for the enumeration (with one indexer access instead of two MoveNext/Currents), but you'd still need to factor in the possibility of the length changing out from under you.
Chunk is already testing the input to see if it's an array, in order to special-case empty arrays. We can use that existing check to also make the processing of non-empty arrays faster. The implementation for an array is simple and doesn't contribute a meaningful maintenance burden, so while Enumerable.Chunk already isn't a high-perf API, a dedicated path for arrays is significantly faster than the generic enumerable path. From perusal of use via grep.app, arrays being the actual underlying type is also reasonably common, that it seems worthwhile to throw a little extra code at it.
Addresses #85653 (comment)