KEMBAR78
Frozen collection follow-ups · Issue #77891 · dotnet/runtime · GitHub
Skip to content

Frozen collection follow-ups #77891

@stephentoub

Description

@stephentoub

#77799 adds new FrozenDictionary and FrozenSet types. These are in decent shape but not yet ready to ship. Several follow-ups are required:

  • Frozen collection construction performance #87964
    The initial use cases for these types was for long-lived processes, where overall it's a better use of resources to spend more time at construction time to optimize the data structures and algorithms used for later data access. However, the current construction time is one or more orders of magnitude more than for the standard dictionary/set types. We need to a) invest in driving down those construction costs, and b) consider making the default performance of ToFrozenDictionary/Set as close that of Dictionary/Set as possible via an additional optional parameter to these methods, e.g.
public static FrozenSet<T> ToFrozenSet<T>(this IEnumerable<T> source, IEqualityComparer<T>? comparer = null,
+    double constructionOptimization = default // pick a better name
);

When 0.0, this would effectively default to creating a FrozenSet/Dictionary that just wrapped a HashSet/Dictionary, providing the same FrozenSet/Dictionary semantics but primarily just as a wrapper for a HashSet/Dictionary, such that the only additional costs were cloning of the data (assuming the Items/Keys/Values properties remain as they are today, they could be lazily-created on first need). On a scale from 0.0 to 1.0, the larger the double value, the more optimization would be performed at construction time, enabling the caller to choose how much time is spent at construction vs how fast subsequent data access can be made, with 1.0 meaning construction would take the longest in exchange for fastest data access. Having the default be fast construction would make this the least surprising for casual use and would avoid falling into pits of failure in the average case; consumers willing to spend more time at construction because they know their processes will be long-lived can then opt-in to longer startup times.

  • Frozen Collection Follow-up: Items/Keys/Values properties #86719 blocking-release
    The current design has Items/Keys/Values properties that return ImmutableArray<>. This is to enable access to the data as a ReadOnlySpan<>, indexibility, via refs, and with very fast enumeration. This does, however, place a requirement on implementations that they either store the keys/values/items separately and in contiguous memory or that implementations maintain a second copy of the data that does so. If we'd be willing to give up on the ReadOnlySpan<> and indexing capabilities, these types should instead be changed to some FrozenEnumerable<>-like type that would allow different implementations to vary how the data was stored.
  • Small collections
    There's currently an optimized implementation for empty collections, but not for other small sizes, e.g. <= 8 elements. We should investigate adding simple implementations that don't do any hashing or other complicated storage for small sizes. This includes looking at various code bases / sources of information to determine the most common sizes to be optimized for, e.g. is it worth having a dedicated implementation for collections of just 1 or 2 elements.
  • Other data structures and algorithms
    Would a perfect hashing implementation yield any benefits? Would tries be useful in some situations for strings? Should we consider doing reflection emit code gen in some circumstances?
  • Trimming friendliness
    The factory nature here means that every implementation needs to be kept by a trimmer regardless of whether it's actually used (though a trimmer should, even if not today, be able to get rid of implementations unique to Ts where it can see those Ts are never used). We thus need to be thoughtful about how many custom implementations there are, and potentially have one or more feature switches to exclude one or more that add only incremental value, potentially opting them out by default.
  • Source generation
    If/when Roslyn source generators support some kind of mechanism that would allow a source generator to look at constant data in the C# file and augment ToFrozenSet/Dictionary call sites in some manner, it would be beneficial if the optimizations could be performed at compile-time, feeding something into an optional parameter to ToFrozenSet/Dictionary to enable them to be fast, with all the work supplied at build time. This would likely also require more work on the API to make the existing types extensible, which is something we've actively avoided today.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions