KEMBAR78
Improve `.Distinct().ToList()` and `.Union(e).ToList()` by Windows10CE · Pull Request #95224 · dotnet/runtime · GitHub
Skip to content

Conversation

@Windows10CE
Copy link
Contributor

Enumerable.HashSetToList fills the result list by copying from the set to the list itself, but this can be done faster (and more simply) by using the List<T>(IEnumerable<T>) constructor to call HashSet<T>.CopyTo(T[]).

public class Bench
{
    [Params(10, 1000, 1_000_000)]
    public int Count { get; set; }

    private int[] _arr;
    
    [GlobalSetup]
    public void Init()
    {
        _arr = new int[Count];
        new Random(3).NextBytes(MemoryMarshal.AsBytes(_arr.AsSpan()));
    }
    
    [Benchmark]
    public List<int> DistinctToList()
    {
        return _arr.Distinct().ToList();
    }
}

BenchmarkDotNet v0.13.10, Arch Linux
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK 8.0.100
  [Host]     : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
  Job-KQNFCO : .NET 8.0.1 (42.42.42.42424), X64 RyuJIT AVX2
  Job-BSODDZ : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2

Runtime=.NET 8.0  

Method Job Toolchain Count Mean Error StdDev Ratio RatioSD
DistinctToList Job-KQNFCO CoreRun 10 119.9 ns 2.42 ns 2.49 ns 0.86 0.02
DistinctToList Job-BSODDZ net8.0 10 138.2 ns 1.33 ns 1.11 ns 1.00 0.00
DistinctToList Job-KQNFCO CoreRun 1000 6,280.2 ns 60.54 ns 56.63 ns 0.80 0.01
DistinctToList Job-BSODDZ net8.0 1000 7,831.8 ns 47.20 ns 44.15 ns 1.00 0.00
DistinctToList Job-KQNFCO CoreRun 1000000 17,175,312.7 ns 341,877.48 ns 406,980.82 ns 0.94 0.02
DistinctToList Job-BSODDZ net8.0 1000000 18,496,042.9 ns 123,385.89 ns 96,331.59 ns 1.00 0.00

@ghost ghost added community-contribution Indicates that the PR has been added by a community member area-System.Linq labels Nov 25, 2023
@ghost
Copy link

ghost commented Nov 25, 2023

Tagging subscribers to this area: @dotnet/area-system-linq
See info in area-owners.md if you want to be subscribed.

Issue Details

Enumerable.HashSetToList fills the result list by copying from the set to the list itself, but this can be done faster (and more simply) by using the List<T>(IEnumerable<T>) constructor to call HashSet<T>.CopyTo(T[]).

public class Bench
{
    [Params(10, 1000, 1_000_000)]
    public int Count { get; set; }

    private int[] _arr;
    
    [GlobalSetup]
    public void Init()
    {
        _arr = new int[Count];
        new Random(3).NextBytes(MemoryMarshal.AsBytes(_arr.AsSpan()));
    }
    
    [Benchmark]
    public List<int> DistinctToList()
    {
        return _arr.Distinct().ToList();
    }
}

BenchmarkDotNet v0.13.10, Arch Linux
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK 8.0.100
  [Host]     : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
  Job-KQNFCO : .NET 8.0.1 (42.42.42.42424), X64 RyuJIT AVX2
  Job-BSODDZ : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2

Runtime=.NET 8.0  

Method Job Toolchain Count Mean Error StdDev Ratio RatioSD
DistinctToList Job-KQNFCO CoreRun 10 119.9 ns 2.42 ns 2.49 ns 0.86 0.02
DistinctToList Job-BSODDZ net8.0 10 138.2 ns 1.33 ns 1.11 ns 1.00 0.00
DistinctToList Job-KQNFCO CoreRun 1000 6,280.2 ns 60.54 ns 56.63 ns 0.80 0.01
DistinctToList Job-BSODDZ net8.0 1000 7,831.8 ns 47.20 ns 44.15 ns 1.00 0.00
DistinctToList Job-KQNFCO CoreRun 1000000 17,175,312.7 ns 341,877.48 ns 406,980.82 ns 0.94 0.02
DistinctToList Job-BSODDZ net8.0 1000000 18,496,042.9 ns 123,385.89 ns 96,331.59 ns 1.00 0.00
Author: Windows10CE
Assignees: -
Labels:

area-System.Linq, community-contribution

Milestone: -

@Windows10CE
Copy link
Contributor Author

@dotnet-policy-service agree

@huoyaoyuan
Copy link
Member

Can you compare with latest main branch too, and include memory diagnoser? HashSet has a struct enumerator and I'd expect it to be faster with direct foreach.

@Windows10CE
Copy link
Contributor Author

Windows10CE commented Nov 26, 2023

Can you compare with latest main branch too, and include memory diagnoser? HashSet has a struct enumerator and I'd expect it to be faster with direct foreach.

Ah, yes, thank you for saying this, it is indeed less of an improvement than that first bench shows, most of the improvements gained here were also gained with #86796 (when I had checked earlier, I had thought this PR was included in net8, which was why I just benched against that, but looking again it was not).

Here are the new results:

Method Job Toolchain Count Mean Error StdDev Ratio RatioSD Gen0 Gen1 Gen2 Allocated Alloc Ratio
DistinctToList Job-HVDLNJ .NET 8.0 10 132.3 ns 1.97 ns 1.85 ns 1.00 0.00 0.0291 - - 488 B 1.00
DistinctToList Job-GVVNDQ main 10 118.9 ns 0.36 ns 0.28 ns 0.90 0.01 0.0291 - - 488 B 1.00
DistinctToList Job-NUYUKR #95224 10 112.1 ns 1.75 ns 1.64 ns 0.85 0.02 0.0291 - - 488 B 1.00
DistinctToList Job-HVDLNJ .NET 8.0 1000 8,013.0 ns 151.11 ns 155.18 ns 1.00 0.00 1.2970 0.0763 - 21920 B 1.00
DistinctToList Job-GVVNDQ main 1000 6,902.9 ns 136.73 ns 162.76 ns 0.86 0.03 1.3046 0.0763 - 21920 B 1.00
DistinctToList Job-NUYUKR #95224 1000 6,623.9 ns 98.54 ns 92.17 ns 0.83 0.02 1.3046 0.0763 - 21920 B 1.00
DistinctToList Job-HVDLNJ .NET 8.0 1000000 19,286,187.6 ns 314,736.06 ns 262,818.86 ns 1.00 0.00 406.2500 406.2500 406.2500 22603123 B 1.00
DistinctToList Job-GVVNDQ main 1000000 17,551,160.4 ns 133,451.94 ns 111,438.41 ns 0.91 0.02 406.2500 406.2500 406.2500 22603123 B 1.00
DistinctToList Job-NUYUKR #95224 1000000 17,392,611.3 ns 108,270.07 ns 101,275.89 ns 0.90 0.01 343.7500 343.7500 343.7500 22603082 B 1.00

Still a very small improvement, but I apologize for the misleading benchmark in the original description 😅

Copy link
Member

@eiriktsarpalis eiriktsarpalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still a very small improvement, but I apologize for the misleading benchmark in the original description 😅

No worries at all. This simplifies the implementation while still doing better than main. Thank you for the contribution :-)

@eiriktsarpalis eiriktsarpalis merged commit ddd663d into dotnet:main Nov 27, 2023
@Windows10CE Windows10CE deleted the distinct-tolist-improvements branch November 27, 2023 11:49
@github-actions github-actions bot locked and limited conversation to collaborators Dec 28, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.Linq community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants