EDITED by @stephentoub on 6/26/2023 with updated remaining APIs for review
main...stephentoub:runtime:immutablecollectionsbuilder
namespace System.Runtime.CompilerServices
{
+ [AttributeUsage(AttributeTargets.Class | AttributeTargets.Interface | AttributeTargets.Struct, Inherited = false, AllowMultiple = true)]
+ public sealed class CollectionBuilderAttribute : Attribute
+ {
+ public CollectionBuilderAttribute(Type builderType, string methodName);
+ public Type BuilderType { get; }
+ public string MethodName { get; }
+ }
}
namespace System.Collections.Immutable
{
+ [CollectionBuilder(typeof(ImmutableArray), "Create")]
public readonly struct ImmutableArray<T> { ... }
+ [CollectionBuilder(typeof(ImmutableHashSet), "Create")]
public sealed class ImmutableHashSet<T> { ... }
+ [CollectionBuilder(typeof(ImmutableList), "Create")]
public sealed class ImmutableList<T> { ... }
+ [CollectionBuilder(typeof(ImmutableQueue), "Create")]
public sealed class ImmutableQueue<T> { ... }
+ [CollectionBuilder(typeof(ImmutableSortedSet), "Create")]
public sealed class ImmutableSortedSet<T> { ... }
+ [CollectionBuilder(typeof(ImmutableStack), "Create")]
public sealed class ImmutableStack<T> { ... }
}
For C# 12 / .NET 8, the attribute will only recognize the pattern CollectionType Method(ReadOnlySpan<T>). However, the compiler may special-case system types it cares about and do something more efficient based on its knowledge of how they work. In particular, this applies to List<T> (which is otherwise supported through the compiler's collection initializer support) and ImmutableArray<T> (which would otherwise be supported, at the cost of a copy, via the attribute pointing at Create). We can add more supported patterns in the future.
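As a minimal sketch of this pattern (assuming C# 12 and .NET 8, where CollectionBuilderAttribute lives in System.Runtime.CompilerServices), a library can opt its own type into collection expressions by pointing the attribute at a static method of the form CollectionType Method(ReadOnlySpan<T>). MyCollection<T> and MyCollectionBuilder are hypothetical names used only for illustration. The type also exposes GetEnumerator, which the compiler uses to determine the element type.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical collection type: the attribute tells the compiler to build it via MyCollectionBuilder.Create.
[CollectionBuilder(typeof(MyCollectionBuilder), nameof(MyCollectionBuilder.Create))]
public sealed class MyCollection<T> : IEnumerable<T>
{
    private readonly T[] _items;

    internal MyCollection(T[] items) => _items = items;

    public int Count => _items.Length;

    public T this[int index] => _items[index];

    // The compiler uses the enumerator's element type (T) as the collection's element type.
    public IEnumerator<T> GetEnumerator() => ((IEnumerable<T>)_items).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// The builder is a non-generic type; Create has one type parameter, matching MyCollection<T>.
public static class MyCollectionBuilder
{
    // Copies the elements: the span is only valid for the duration of this call and must not be captured.
    public static MyCollection<T> Create<T>(ReadOnlySpan<T> items) => new MyCollection<T>(items.ToArray());
}
```

With these declarations, MyCollection<int> numbers = [1, 2, 3]; compiles to a call to MyCollectionBuilder.Create with a span over the three elements.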
Background and motivation
The C# design group is moving forward with a proposal for a lightweight syntax to construct collections: csharplang/collection-literals.md
We intend to ship this in C# 12 for linear collections (like List<T>, ImmutableArray<T>, HashSet<T>, etc.). We also intend to support this for map collections (like Dictionary<TKey, TValue>), though that may only be in 'preview' in C# 12.
Part of this proposal involves being able to efficiently construct certain collections (like List<T>), as well as to construct immutable collections (which have generally never worked with the existing new ImmutableXXX<int>() { 1, 2, 3 } form). To that end, working with @stephentoub and @captainsafia, we've come up with a set of API proposals we'd like to work through with the runtime team to allow these types to "light up" with this language feature. This would be expected to align with the release of C# 12.
API Proposal
The API shape is as follows (with all naming/shaping open to bikeshedding):
A new attribute to be placed on a type to specify where to find the method responsible for constructing it:
namespace System.Runtime.CompilerServices
{
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Struct | AttributeTargets.Interface, Inherited = false, AllowMultiple = true)]
public sealed class CollectionLiteralsBuilderAttribute : Attribute
{
/// <summary>Initialize the attribute to refer to the <paramref name="methodName"/> method on the <paramref name="builderType"/> type.</summary>
/// <param name="builderType">The type of the builder to use to construct the collection.</param>
/// <param name="methodName">
/// The method on the builder to use to construct the collection. This must refer to a static method,
/// or if the <paramref name="builderType"/> is also the type of the collection being constructed, it
/// may be an empty string to indicate a constructor should be used.
/// </param>
public CollectionLiteralsBuilderAttribute(Type builderType, string methodName)
{
BuilderType = builderType;
MethodName = methodName;
}
/// <summary>Gets the type of the builder to use to construct the collection.</summary>
public Type BuilderType { get; }
/// <summary>Gets the name of the method on the builder to use to construct the collection.</summary>
public string MethodName { get; }
}
}
For the purposes of this proposal, consider this attribute to be applied to every type for which the proposal suggests a factory method.
Pattern 1: Specialized entry points to efficiently construct List<T> and ImmutableArray<T>.
We would like to be able to construct these two types as efficiently as possible.
[CollectionLiteralsBuilder(typeof(CollectionsMarshal), "Create")]
public class List<T> { }
[CollectionLiteralsBuilder(typeof(CollectionsMarshal), "Create")]
public struct ImmutableArray<T> { }
public static class CollectionsMarshal
{
public static void Create<T>(int capacity, out List<T> list, out Span<T> storage);
public static void Create<T>(int capacity, out ImmutableArray<T> list, out Span<T> storage);
}
These APIs are placed in CollectionsMarshal because they are considered too advanced to expose directly on the types themselves.
Here, the compiler would ask the runtime to create an instance of one of these types with an explicit capacity, and would also be given direct access to the underlying storage backing it. The compiler would then translate code like so:
// User code:
ImmutableArray<string> values = ["a", "b", "c"];
// Translation:
CollectionsMarshal.Create<string>(capacity: 3, out ImmutableArray<string> values, out Span<string> __storage);
__storage[0] = "a";
__storage[1] = "b";
__storage[2] = "c";
This is practically the lowest overhead possible, and it would let users work with immutable collections as easily as with normal collections.
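For the List<T> half of this pattern, a similar effect can be approximated on .NET 8 with two existing methods, CollectionsMarshal.SetCount and CollectionsMarshal.AsSpan. The sketch below hand-writes the lowering of List<string> values = ["a", "b", "c"]. It is an approximation built from shipped APIs, not the Create API proposed here, and ListLowering and BuildAbc are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class ListLowering
{
    // Hand-written equivalent of: List<string> values = ["a", "b", "c"];
    public static List<string> BuildAbc()
    {
        // Allocate the backing array once, with the exact capacity.
        var list = new List<string>(capacity: 3);

        // Set Count to 3 without adding elements one at a time (.NET 8+).
        CollectionsMarshal.SetCount(list, 3);

        // Write directly into the list's backing storage.
        // The span is only valid until the list is next resized.
        Span<string> storage = CollectionsMarshal.AsSpan(list);
        storage[0] = "a";
        storage[1] = "b";
        storage[2] = "c";
        return list;
    }
}
```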
Note: the above API would incur a copy in code involving async/await. Because a Span<T> cannot be held across an await, the compiler would need to populate its own array and then copy it into the Span given by this API. We could avoid any copy overhead entirely if the above API were instead:
public static class CollectionsMarshal
{
public static void Create<T>(int capacity, out List<T> list, out T[] storage);
public static void Create<T>(int capacity, out ImmutableArray<T> list, out T[] storage);
}
However, there was some reservation about an API (even one in CollectionsMarshal) directly exposing the array to operate over. The language/compiler will use whatever API is provided here: if you are comfortable exposing the array, we'll use it; if not, we can always use the Span approach (with the additional cost in async contexts).
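To illustrate why the T[] variant avoids the copy in async code: a Span<T> local cannot be live across an await, but an array can. The sketch below fills an array on both sides of an await and then wraps it without copying using ImmutableCollectionsMarshal.AsImmutableArray, which exists in .NET 8. AsyncLowering, BuildAsync, and the getMiddle callback are hypothetical names.

```csharp
using System;
using System.Collections.Immutable;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

public static class AsyncLowering
{
    // Hand-written equivalent of: ImmutableArray<string> values = ["a", await getMiddle(), "c"];
    public static async Task<ImmutableArray<string>> BuildAsync(Func<Task<string>> getMiddle)
    {
        // Unlike a Span<T> local, an array can be written to before and after an await.
        var storage = new string[3];
        storage[0] = "a";
        storage[1] = await getMiddle();
        storage[2] = "c";

        // Wraps the array without copying. The ImmutableArray now shares the array,
        // so nothing may mutate it afterwards.
        return ImmutableCollectionsMarshal.AsImmutableArray(storage);
    }
}
```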
Pattern 2: ReadOnlySpan<T> entry points to construct existing collection types.
This would allow lower-overhead construction of certain types (no intermediate heap allocations), as well as provide a mechanism for constructing immutable types (without intermediate builder allocations).
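A sketch of the two forms side by side, assuming .NET 8 and C# 12, where ImmutableHashSet.Create<T>(ReadOnlySpan<T>) is available and ImmutableHashSet<T> carries the builder attribute. Pattern2Demo and its method names are hypothetical.

```csharp
using System;
using System.Collections.Immutable;

public static class Pattern2Demo
{
    // Collection expression: the compiler locates ImmutableHashSet.Create through the builder attribute.
    public static ImmutableHashSet<string> ViaCollectionExpression() => ["a", "b", "c"];

    // Explicit form: build a span, then make a single call to the span-based factory.
    public static ImmutableHashSet<string> ViaExplicitSpan()
    {
        ReadOnlySpan<string> items = ["a", "b", "c"];
        return ImmutableHashSet.Create<string>(items);
    }
}
```

The explicit type argument on Create<string> keeps overload resolution on the ReadOnlySpan<T> overload rather than the single-item or params overloads.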
// Note: attributes will be placed on the actual immutable types to point to these factory methods.
// These have been elided for brevity.
public static class ImmutableHashSet
{
+ public static System.Collections.Immutable.ImmutableHashSet<T> Create<T>(System.ReadOnlySpan<T> items);
}
public static class ImmutableList
{
+ public static System.Collections.Immutable.ImmutableList<T> Create<T>(System.ReadOnlySpan<T> items);
}
public static class ImmutableQueue
{
+ public static System.Collections.Immutable.ImmutableQueue<T> Create<T>(System.ReadOnlySpan<T> items);
}
public static class ImmutableSortedSet
{
+ public static System.Collections.Immutable.ImmutableSortedSet<T> Create<T>(System.ReadOnlySpan<T> items);
}
public static class ImmutableStack
{
+ public static System.Collections.Immutable.ImmutableStack<T> Create<T>(System.ReadOnlySpan<T> items);
}
+[CollectionLiteralsBuilder(typeof(Stack<>), "")] // demonstrates how the attribute can point to a constructor (empty method name).
public class Stack<T>
{
+ public Stack(ReadOnlySpan<T> collection);
}
+[CollectionLiteralsBuilder(typeof(Queue<>), "")]
public class Queue<T>
{
+ public Queue(ReadOnlySpan<T> collection);
}
API Usage
// User code:
ImmutableHashSet<string> values = ["a", "b", "c"];
// Translation:
ReadOnlySpan<string> storage = ["a", "b", "c"];
ImmutableHashSet<string> values = ImmutableHashSet.Create<string>(storage);
// User code:
ImmutableArray<string> values = ["a", "b", "c"];
// Translation:
CollectionsMarshal.Create<string>(capacity: 3, out ImmutableArray<string> values, out Span<string> __storage);
__storage[0] = "a";
__storage[1] = "b";
__storage[2] = "c";
Alternative Designs
- In the List<T>/ImmutableArray<T> APIs, we have the option of returning the underlying storage either as a span or as an array, e.g.:
public static Span<T> Create<T>(int capacity, out List<T> result);
public static T[] Create<T>(int capacity, out List<T> result);
The latter would work better in async contexts, but it has raised some concern about directly exposing the backing array, since a reference to it could then be stored on the heap. However, according to Stephen, this is already possible today, so that may be acceptable.
- In the List<T>/ImmutableArray<T> APIs, we could use multiple out parameters instead of an out parameter plus a return value, i.e.:
public static void Create<T>(int capacity, out List<T> result, out Span<T> storage); // or
public static void Create<T>(int capacity, out List<T> result, out T[] storage);
If only a single method is exposed, this is mainly a stylistic difference. If we wanted both methods, this shape would let them coexist as overloads, since C# does not allow overloading on return type alone.
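The overload point can be sketched concretely: two methods differing only in return type cannot coexist, but two methods differing in the type of an out parameter can, because an out argument must match its parameter type exactly. The sketch below implements both storage shapes for ImmutableArray<T>, assuming .NET 8's ImmutableCollectionsMarshal.AsImmutableArray. HypotheticalMarshal is an illustrative name, not a proposed type.

```csharp
using System;
using System.Collections.Immutable;
using System.Runtime.InteropServices;

public static class HypotheticalMarshal
{
    // Variant 1: storage exposed as a Span<T> over the backing array.
    public static void Create<T>(int capacity, out ImmutableArray<T> result, out Span<T> storage)
    {
        T[] backing = new T[capacity];
        result = ImmutableCollectionsMarshal.AsImmutableArray(backing);
        storage = backing;
    }

    // Variant 2: storage exposed as the array itself. This is a legal overload of
    // variant 1 because the out-parameter types differ.
    public static void Create<T>(int capacity, out ImmutableArray<T> result, out T[] storage)
    {
        storage = new T[capacity];
        result = ImmutableCollectionsMarshal.AsImmutableArray(storage);
    }
}
```

At a call site, the declared type of the out variable (Span<string> or string[]) selects the overload.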
Risks
Adding ReadOnlySpan<T> overloads to existing methods that take IEnumerable<T> introduces ambiguity errors with existing compilers. Specifically:
void M(IEnumerable<int> values); // existing method
void M(ReadOnlySpan<int> values); // new overload
M(new int[] { 1, 2, 3 }); // ambiguity error
The C# team is working on a proposal to make this not an error and to prefer the ReadOnlySpan overload. This matches our intuition that when overloads like these exist, they have the same semantics, with the latter simply having lower overhead.
However, this may still cause issues with older compilers, or with other compilers that haven't been updated. To address that, we may want to emit the new ReadOnlySpan overload with a modreq so that older compilers do not consider it viable. Newer compilers will accept it and will switch to the new overload when the calling code is recompiled.
This work is being tracked here: dotnet/csharplang#7276
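As a sketch of how callers can sidestep the ambiguity without waiting for a compiler change, an explicit conversion leaves exactly one applicable overload. OverloadDemo is a hypothetical type, and the strings it returns exist only to show which overload ran.

```csharp
using System;
using System.Collections.Generic;

public static class OverloadDemo
{
    public static string M(IEnumerable<int> values) => "IEnumerable<int>";  // existing method
    public static string M(ReadOnlySpan<int> values) => "ReadOnlySpan<int>"; // new overload

    public static (string First, string Second) CallBoth(int[] array) =>
        // Casting to the interface leaves only the IEnumerable<int> overload applicable;
        // wrapping in a ReadOnlySpan<int> leaves only the span overload applicable.
        (M((IEnumerable<int>)array), M(new ReadOnlySpan<int>(array)));
}
```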