Cache ROS constructed from arrays of constants (remaining types) by jcouv · Pull Request #69820 · dotnet/roslyn · GitHub

Conversation

jcouv
Member

@jcouv jcouv commented Sep 5, 2023

Closes #69472 by caching arrays of constants used to construct a ReadOnlySpan (ROS), for the remaining element types.

For something like:

    public static System.ReadOnlySpan<string> M()
        => new string[] { "hello", "world" };

we'll produce something like:

  new ReadOnlySpan<ElementType>(PrivateImplementationDetails.cachingField ??= new ElementType[] { ... constants ... })
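To make the transformation concrete, here is a hand-written C# approximation of the cached form. This is a sketch only, not actual compiler output: `cachedGreeting` is a hypothetical stand-in for the compiler-generated `<PrivateImplementationDetails>` field (a top-level local captured by a local function plays the role of a static field), and `M` mirrors the example above.

```csharp
#nullable enable
using System;

// Stand-in for the compiler-generated <PrivateImplementationDetails> caching field.
// In real code this would be a static field; it cannot be readonly, because it is
// assigned lazily outside a static constructor.
string[]? cachedGreeting = null;

// Hand-lowered approximation of:
//     static ReadOnlySpan<string> M() => new string[] { "hello", "world" };
// The array is allocated on the first call and reused on later calls.
ReadOnlySpan<string> M() =>
    new ReadOnlySpan<string>(cachedGreeting ??= new string[] { "hello", "world" });

ReadOnlySpan<string> first = M();
ReadOnlySpan<string> second = M();

Console.WriteLine($"{first[0]} {first[1]}");

// ReadOnlySpan<T>.operator == is true only when both spans cover the same memory
// with the same length, i.e. both calls returned views over the same cached array.
Console.WriteLine(first == second);
```

Note that after the first call the array stays reachable through the cache for the lifetime of the process, which is the trade-off debated later in this thread.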

@jcouv jcouv self-assigned this Sep 5, 2023
@ghost ghost added the untriaged Issues and PRs which have not yet been triaged by a lead label Sep 5, 2023
@jcouv jcouv force-pushed the cache-arrays branch 2 times, most recently from 3960ec4 to 81f7507 September 5, 2023 22:05
@jcouv jcouv changed the title Cached ROS constructed from arrays of constants (remaining types) Cache ROS constructed from arrays of constants (remaining types) Sep 5, 2023
@jcouv jcouv added this to the 17.8 milestone Sep 5, 2023
@CyrusNajmabadi
Member

CyrusNajmabadi commented Sep 5, 2023

Will this automatically work for collection expressions as well? Worth testing? #Resolved

@jcouv jcouv marked this pull request as ready for review September 6, 2023 00:05
@jcouv jcouv requested a review from a team as a code owner September 6, 2023 00:05
@jcouv jcouv marked this pull request as draft September 6, 2023 04:09
@jcouv jcouv marked this pull request as ready for review September 6, 2023 17:04
return true;
}

if (inPlaceTarget is not null)
Member Author

@jcouv jcouv Sep 6, 2023

📝 I'm not 100% sure about this part. Without it, we leave something on the stack, which triggers an assertion failure when the subsequent statement is emitted. #Resolved

Contributor

I'm not 100% sure about this part.

When we are relying on a constructor, I think we can handle the inPlaceTarget case the same way it is handled in if (specialElementType.SizeInBytes() == 1) above. We should also be able to handle it in emitAsCachedArrayFromBlob, but we never reach it due to a similar early check. Might be worth adjusting that as well.

@jcouv
Member Author

jcouv commented Sep 6, 2023

@dotnet/roslyn-compiler for review. Thanks

@jcouv
Member Author

jcouv commented Sep 11, 2023

@dotnet/roslyn-compiler for review. Thanks

return false;
}

if (inPlaceTarget is null && !used)
Contributor

@AlekseyTs AlekseyTs Sep 11, 2023

if (inPlaceTarget is null && !used)

Would it make sense to move this check above the previous if? #Closed

@AlekseyTs
Contributor

AlekseyTs commented Sep 11, 2023

    private bool TryEmitReadonlySpanAsBlobWrapper(NamedTypeSymbol spanType, BoundExpression wrappedExpression, bool used, BoundExpression inPlaceTarget, out bool avoidInPlace, BoundExpression? start = null, BoundExpression? length = null)

Perhaps the name should be adjusted now #Closed


Refers to: src/Compilers/CSharp/Portable/CodeGen/EmitArrayInitializer.cs:398 in 1f8253c.

var rosArrayCtor = (MethodSymbol?)Binder.GetWellKnownTypeMember(_module.Compilation, WellKnownMember.System_ReadOnlySpan_T__ctor_Array, _diagnostics, syntax: wrappedExpression.Syntax, isOptional: true);
if (rosArrayCtor is null)
// Emit: new ReadOnlySpan<T>(PrivateImplementationDetails.ArrayField ??= RuntimeHelpers.InitializeArray(new int[Length], PrivateImplementationDetails.DataField));
bool emitAsCachedArrayFromBlob(NamedTypeSymbol spanType, BoundExpression wrappedExpression, int elementCount, ImmutableArray<byte> data, ref ArrayTypeSymbol arrayType, TypeSymbol elementType)
Contributor

@AlekseyTs AlekseyTs Sep 11, 2023

emitAsCachedArrayFromBlob

It doesn't look like extraction of code into this local function is necessary. The only call site is above its definition. #Closed

Member Author

The main benefit of this extraction is that it makes the code flow in the main body of the method clearer: the cases are easier to see, and in particular this last branch becomes a single case. I've made sure to keep the diff very clean to minimize review overhead.


if (constants.IsEmpty)
{
emitEmptyReadonlySpan(spanType, arrayCreation, used, inPlaceTarget);
Contributor

@AlekseyTs AlekseyTs Sep 11, 2023

emitEmptyReadonlySpan

Does it mean that empty spans are optimized regardless of element type? If so, consider doing this optimization earlier and in one place, rather than having two different places checking for the same condition. #Closed

Member Author

Moving that optimization earlier in the existing code affects some scenarios. Since that is not the purpose of this PR, I kept the original optimization in its existing location.

Contributor

Moving that optimization earlier in the existing code affects some scenarios.

It doesn't look that way to me. We can discuss the details offline.
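As background for the question about empty spans: an empty ReadOnlySpan<T> carries no data, so an empty initializer needs neither an array nor a caching field, whatever the element type. A small sketch contrasting the two shapes; `EmptyViaArray` and `EmptyViaDefault` are hypothetical helpers, not compiler output.

```csharp
using System;

// Allocates a zero-length array on every call.
static ReadOnlySpan<T> EmptyViaArray<T>() => new ReadOnlySpan<T>(new T[0]);

// No allocation and no cache needed: the default span is already empty.
static ReadOnlySpan<T> EmptyViaDefault<T>() => default;

Console.WriteLine(EmptyViaArray<string>().IsEmpty);
Console.WriteLine(EmptyViaDefault<string>().IsEmpty);
Console.WriteLine(EmptyViaDefault<int>().Length);
```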

}

var initializers = initializer.Initializers;
if (initializers.Any(static init => init.ConstantValueOpt == null))
Contributor

@AlekseyTs AlekseyTs Sep 11, 2023

init.ConstantValueOpt == null

I am not sure caching an array of strings is such a good idea. This will hold on to the strings forever, even though this could be the only time they are used. #Resolved

Member Author

Tagging @stephentoub for thoughts

Member

@stephentoub stephentoub Sep 11, 2023

This will hold on to strings forever, at the same time this could be the only time they are used.

As string literals, won't they already be held on to forever as part of interning?

Even without that, this doesn't seem different to me from, say, lambda/delegate caching, where the first time a static lambda is used we cache a delegate to it, and we'll hold onto that delegate forever even if we never use it again.

Contributor

@AlekseyTs AlekseyTs Sep 11, 2023

As string literals, won't they already be held on to forever as part of interning?

To be honest, I do not know the answer, or whether the answer is the same for all flavors of frameworks out there.

this doesn't seem different to me from, say, lambda/delegate caching

Strings could be quite big, though, and there could be a lot of them in a single initialization. Also, we do not cache delegates when they are created using new, so there is some control over that form of caching.

Member

To be honest, I do not know the answer. And whether the answer is the same for all flavors of frameworks out there.

It should be true for both .NET Framework and .NET Core: every time we encounter a string literal, we add it to a global hash map (the string interning Stephen mentioned above), where it's essentially rooted forever. The exception is unloadable ALCs (AssemblyLoadContexts), but I don't expect that to be a problem here either. For example:

    void Test(bool cond)
    {
        if (cond)
            Console.WriteLine("true!!");
        else
            Console.WriteLine("false!!");
    }

When the JIT compiles this method (on its first execution), it creates string objects for both literals, even if one of them (e.g. "false!!") is never used. We might make this more efficient in the future, but it's the current behavior of both .NET Framework and .NET Core.
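The interning behavior described above can be observed directly. A minimal sketch, assuming a standard .NET Framework or .NET Core runtime; the literal is arbitrary. Note that interning roots the string objects themselves, while the cached array from this PR adds one more long-lived object (the array) on top.

```csharp
using System;

// Two occurrences of the same literal resolve to the same interned object.
string a = "hello";
string b = "hello";
Console.WriteLine(object.ReferenceEquals(a, b));

// A string built at run time is a distinct object with equal contents...
string built = new string(new[] { 'h', 'e', 'l', 'l', 'o' });
Console.WriteLine(object.ReferenceEquals(a, built));

// ...but string.IsInterned finds the pooled instance created for the literal.
Console.WriteLine(object.ReferenceEquals(string.IsInterned(built), a));
```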

Member

I would even say that the behavior change caused by extraction of a local, or an inline of a local, might come as a big surprise.

There are plenty of situations where that's the case, including in the brand new collection expressions feature.

Contributor

We would need to revisit every single use of this in dotnet/runtime where the source might be compiled downlevel, as this could regress all of them. Are you planning to do that?

If we decide that the behavior change (which, if I remember correctly, was introduced without much discussion at the time, and likely specifically for the benefit of a single component then in development) was a mistake and should be changed, then we will have to decide what to do with that component.

Contributor

There are plenty of situations where that's the case, including in the brand new collection expressions feature.

Doesn't mean it is a good thing. The nature of differences is not the same, and the impact is quite different. Each situation is somewhat unique.

Member

I agree there is a non-zero chance this causes unexpected behavior for a customer. There are definitely customers out there who generate large string[] arrays for initialization purposes that are effectively single-use. I cannot specifically remember a case where that would be combined with ReadOnlySpan<string> such that it would trigger this optimization, but it's certainly possible.

There are other optimizations we've taken in the past that had the potential to negatively impact customer scenarios. Even simple optimizations like increased method group to delegate caching broke partner teams. It is always going to be a trade off.

The criteria I usually consider is:

  1. Is this, on the whole, going to be an improvement? In this case I believe the answer is yes: overall it's going to produce significant wins compared to the potential downside.
  2. Is there a reasonable and documented way for the user to undo the optimization if it's found to be negative? For example, with method group caching the undo operation was simply to make the delegate allocation explicit: = new Action(Method) vs. = Method. What is the undo operation here? I believe assigning to an intermediate local would subvert the optimization. Is that the way we want to document it? Whatever the answer is, I would like it to be explicitly listed in the PR / issue for customers to see.
  3. Are we violating anything in the language specification? For method group to delegate allocation, we had to go back and confirm with LDM that they were okay with this change.

Assuming we have resolutions for (2) and (3), I would overall be in favor of moving forward. I think we should consider an entry in the breaking change list, though.
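To make point 2 concrete, here is a sketch of the two shapes, assuming (as suggested above) that routing the array through an intermediate local is what disables the caching. `Direct` and `ViaLocal` are hypothetical names. Whether `Direct()` is actually cached depends on the compiler version, so the sketch only reports it rather than relying on it.

```csharp
using System;

// Eligible shape: the array of constants flows straight into a ReadOnlySpan<string>.
// A compiler with this optimization may back it with a cached array.
static ReadOnlySpan<string> Direct() => new string[] { "hello", "world" };

// Suggested "undo": an intermediate local, which is expected to allocate
// a fresh array on every call.
static ReadOnlySpan<string> ViaLocal()
{
    string[] array = new string[] { "hello", "world" };
    return array;
}

// operator == on spans compares memory location and length, not contents,
// so it reports whether two calls shared the same backing array.
bool directShared = Direct() == Direct();
bool localShared = ViaLocal() == ViaLocal();
Console.WriteLine($"Direct shares storage: {directShared}, ViaLocal shares storage: {localShared}");
```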

Member

I was asked to make it clearer which optimization path I prefer: weak reference or not.

I would lean towards starting with the non-weak-reference approach. My rationale is:

  1. Customers who find the behavior undesirable can use the undo mechanism
  2. If this does produce enough negative customer feedback we could flip to the WeakReference approach in an update / servicing fix.
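For comparison, a rough sketch of what the WeakReference alternative could look like if lowered by hand. This is hypothetical and is not what this PR implements: `weakCache` and `GetValues` are illustrative names, and a top-level local stands in for a static field. The array can be reclaimed by the GC between calls and is rebuilt on demand.

```csharp
#nullable enable
using System;

// Hypothetical weak cache: the GC may reclaim the array when no one is using it.
WeakReference<string[]>? weakCache = null;

ReadOnlySpan<string> GetValues()
{
    // Reuse the array if it is still alive; otherwise rebuild it and re-cache it.
    if (weakCache is null || !weakCache.TryGetTarget(out string[]? values))
    {
        values = new string[] { "hello", "world" };
        if (weakCache is null)
            weakCache = new WeakReference<string[]>(values);
        else
            weakCache.SetTarget(values);
    }
    return values;
}

Console.WriteLine(GetValues()[0]);
```

The trade-off is an extra WeakReference object and a TryGetTarget call on every use, in exchange for not rooting large, single-use arrays.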


ImmutableArray<ConstantValue> constants = initializers.Select(static init => init.ConstantValueOpt!).ToImmutableArray();

if (constants.IsEmpty)
Contributor

@AlekseyTs AlekseyTs Sep 11, 2023

if (constants.IsEmpty)

It looks like we don't need rosArrayCtor to handle this case #Closed


public override bool Equals((ImmutableArray<ConstantValue> Constants, ushort Value) x, (ImmutableArray<ConstantValue> Constants, ushort Value) y) =>
x.Value == y.Value &&
ByteSequenceComparer.Equals(x.Constants, y.Constants);
Contributor

@AlekseyTs AlekseyTs Sep 11, 2023

ByteSequenceComparer.Equals(x.Constants, y.Constants);

I am not sure what method is called here. Could you clarify please? #Closed

Member Author

That was just wrong :-/ We're dealing with ConstantValue, not bytes, here. This was definitely not doing what I intended... Thanks for catching this.
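For reference, one possible shape of a key comparison that works element by element on the constants rather than on bytes. This is a hedged sketch: `object?` stands in for Roslyn's internal ConstantValue, and `KeyEquals` / `KeyHash` are hypothetical names, not the code that was merged.

```csharp
#nullable enable
using System;
using System.Collections.Immutable;

// Cache key compared element by element. object? is a stand-in for Roslyn's
// internal ConstantValue; boxed values compare by value and by runtime type.
static bool KeyEquals((ImmutableArray<object?> Constants, ushort Value) x,
                      (ImmutableArray<object?> Constants, ushort Value) y)
{
    if (x.Value != y.Value || x.Constants.Length != y.Constants.Length)
        return false;

    for (int i = 0; i < x.Constants.Length; i++)
    {
        // object.Equals(a, b) handles nulls and boxed value types.
        if (!object.Equals(x.Constants[i], y.Constants[i]))
            return false;
    }
    return true;
}

// Hash consistent with KeyEquals: equal keys produce equal hashes.
static int KeyHash((ImmutableArray<object?> Constants, ushort Value) key)
{
    var hash = new HashCode();
    hash.Add(key.Value);
    foreach (object? constant in key.Constants)
        hash.Add(constant);
    return hash.ToHashCode();
}

var a = (ImmutableArray.Create<object?>("x", 1, null), (ushort)3);
var b = (ImmutableArray.Create<object?>("x", 1, null), (ushort)3);
Console.WriteLine(KeyEquals(a, b));
```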

switch (constant.Discriminator)
{
case ConstantValueTypeDiscriminator.Nothing:
return _singleZeroByteArray;
Contributor

@AlekseyTs AlekseyTs Sep 11, 2023

return _singleZeroByteArray;

Is this branch reachable? #Closed

Member Author

Yes, for null constants. I'll add a comment

Contributor

for null constants. I'll add a comment

Consider using ConstantValueTypeDiscriminator.Null instead. "Nothing" is a VB term.

static byte[] getBytes(ConstantValue constant)
{
Debug.Assert(Enum.GetValues(typeof(ConstantValueTypeDiscriminator)).Cast<ConstantValueTypeDiscriminator>().Max()
== ConstantValueTypeDiscriminator.DateTime);
Contributor

@AlekseyTs AlekseyTs Sep 11, 2023

It doesn't look like we depend on this condition below. If we are not handling a case, the code is going to throw below regardless of underlying value of ConstantValueTypeDiscriminator.DateTime. Instead, it might be better to add a comment in ConstantValueTypeDiscriminator that new values are likely to need a special treatment in this function. #Closed

Member Author

Sure, I'll switch to a comment.
That said, this assertion would fail in existing tests as soon as a new ConstantValueTypeDiscriminator enum entry was added, even before the right test was crafted. That would also have raised the right alarm.

return Encoding.Unicode.GetBytes(constant.StringValue!);

case ConstantValueTypeDiscriminator.NInt:
return BitConverter.GetBytes(constant.Int32Value);
Contributor

@AlekseyTs AlekseyTs Sep 11, 2023

BitConverter.GetBytes(constant.Int32Value)

The order of bytes in the array returned by the GetBytes method depends on whether the computer architecture is little-endian or big-endian. So builds on different machines might produce different bytes for the same value. Is this a concern given the purpose of this helper? #Closed
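One way to address the endianness concern is to write the bytes with an explicit byte order instead of calling BitConverter.GetBytes, so the result is identical on every build machine. A minimal sketch using System.Buffers.Binary.BinaryPrimitives; `Int32ToBytes` and `Int64ToBytes` are hypothetical helper names. (Encoding.Unicode is always little-endian UTF-16, so the string case does not have this problem.)

```csharp
using System;
using System.Buffers.Binary;

// Endianness-independent encodings: always little-endian,
// whatever the architecture of the build machine.
static byte[] Int32ToBytes(int value)
{
    byte[] bytes = new byte[sizeof(int)];
    BinaryPrimitives.WriteInt32LittleEndian(bytes, value);
    return bytes;
}

static byte[] Int64ToBytes(long value)
{
    byte[] bytes = new byte[sizeof(long)];
    BinaryPrimitives.WriteInt64LittleEndian(bytes, value);
    return bytes;
}

// DateTime constants could be encoded the same way via their Ticks.
byte[] ticks = Int64ToBytes(new DateTime(2023, 9, 5).Ticks);
Console.WriteLine(BitConverter.ToString(Int32ToBytes(0x01020304)));
```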


return bytes;
case ConstantValueTypeDiscriminator.DateTime:
return BitConverter.GetBytes(constant.DateTimeValue.Ticks);
Contributor

@AlekseyTs AlekseyTs Sep 11, 2023

return BitConverter.GetBytes(constant.DateTimeValue.Ticks);

Is this code path covered by tests? #Closed

}
""";
var compilation = CreateCompilationWithMscorlibAndSpan(src);
var verifier = CompileAndVerify(compilation, verify: Verification.Skipped);
Contributor

@AlekseyTs AlekseyTs Sep 11, 2023

CompileAndVerify(compilation, verify: Verification.Skipped);

Consider verifying content of the array at runtime. #Closed

CompileAndVerify(compilation, expectedOutput: "1 42", verify: Verification.Skipped).VerifyDiagnostics();
}

[Fact, WorkItem("https://github.com/dotnet/roslyn/issues/69472")]
Contributor

@cston cston Dec 13, 2023

WorkItem("#69472")

Did the behavior of this test or the next test change with this PR? #Resolved

Member Author

No, these two tests behave the same as before. This was to ensure there was no impact on existing scenarios. We probably already cover these two in some fashion, but it was easier this way.

IL_000b: ret
}
""");
}
Contributor

@cston cston Dec 13, 2023

}

Consider testing new int?[] { null }. #Resolved

}
""";
var compilation = CreateCompilationWithMscorlibAndSpan(src);
var verifier = CompileAndVerify(compilation, expectedOutput: "ran", verify: Verification.Skipped);
Contributor

@AlekseyTs AlekseyTs Dec 14, 2023

verify: Verification.Skipped

Why is verification skipped for this scenario? #Closed

Member Author

@jcouv jcouv Dec 14, 2023

PEVerify failed for assembly 'C:\Users\jcouv\AppData\Local\Temp\RoslynTests':
[ : C::MString][mdToken=0x6000006][offset 0x00000010] Cannot change initonly field outside its .ctor.
[ : C::MObject][mdToken=0x6000007][offset 0x00000010] Cannot change initonly field outside its .ctor.
[ : C::MC][mdToken=0x6000008][offset 0x00000010] Cannot change initonly field outside its .ctor.
[ : C::MC][mdToken=0x6000008][offset 0x00000015][found ref array mdarray 'System.Object[]'][expected ref array mdarray 'C[]'] Unexpected array type on the stack.

and

System.Exception : IL Verify failed unexpectedly:
[MString]: Cannot change initonly field outside its .ctor. { Offset = 0x10 }
[MString]: Return type is ByRef, TypedReference, ArgHandle, or ArgIterator. { Offset = 0x1a }
[MObject]: Cannot change initonly field outside its .ctor. { Offset = 0x10 }
[MObject]: Return type is ByRef, TypedReference, ArgHandle, or ArgIterator. { Offset = 0x1a }
[MC]: Cannot change initonly field outside its .ctor. { Offset = 0x10 }
[MC]: Unexpected type on the stack. { Offset = 0x15, Found = ref 'object[]', Expected = ref '[02d07b78-95a0-409e-bff4-1cbc39fcff23]C[]' }
[MC]: Return type is ByRef, TypedReference, ArgHandle, or ArgIterator. { Offset = 0x1a }

Contributor

[ : C::MString][mdToken=0x6000006][offset 0x00000010] Cannot change initonly field outside its .ctor.

This looks like a real problem. It looks like we shouldn't be marking the field readonly.

[ : C::MC][mdToken=0x6000008][offset 0x00000015][found ref array mdarray 'System.Object[]'][expected ref array mdarray 'C[]'] Unexpected array type on the stack.

This is concerning as well.

  1. At the very least, it looks like we should pay attention to IsPeVerifyCompatEnabled after all (and test that as well).
  2. Are we confident that we are not going to run into trouble consuming the span? Could that trigger a hard runtime type-check failure in some scenarios, for example, when we try to store an element reference in a ref readonly local?

It looks like we should verify expected verification failure reasons in the tests targeting scenarios affected by this PR.
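Background for point 2: runtime type checks of this kind come from array covariance. A small sketch of standard .NET behavior (not specific to this PR): Span<T> validates the exact array type at construction because it permits writes, while ReadOnlySpan<T> can safely wrap a covariant array.

```csharp
using System;

string[] strings = { "a", "b" };
object[] covariant = strings; // legal thanks to array covariance

// ReadOnlySpan<object> over a string[] is fine, since the span never writes.
ReadOnlySpan<object> readOnly = new ReadOnlySpan<object>(covariant);
Console.WriteLine(readOnly.Length);

// Span<object> would allow storing a non-string into the string[], so its
// constructor checks the exact array type and throws ArrayTypeMismatchException.
bool spanRejected = false;
try
{
    Span<object> writable = new Span<object>(covariant);
    writable[0] = new object();
}
catch (ArrayTypeMismatchException)
{
    spanRejected = true;
}
Console.WriteLine(spanRejected);
```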

Member Author

I made the field writable and removed the arbitrary reference type scenario, so there's no PEVerify regression (hence I didn't add any tests targeting the IsPeVerifyCompatEnabled case).
I added some usage of the ROS with a ref readonly local.
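For illustration, consuming a span of this shape through a ref readonly local looks like the following; `Values` is a hypothetical method with the same shape as the scenarios in this PR, not the test that was added.

```csharp
using System;

// Hypothetical method returning a span over an array of string constants.
static ReadOnlySpan<string> Values() => new string[] { "hello", "world" };

ReadOnlySpan<string> span = Values();

// The indexer returns ref readonly T, so we can bind a read-only reference
// to the element instead of copying it out.
ref readonly string first = ref span[0];
Console.WriteLine(first);
```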

public class C
{
public static System.ReadOnlySpan<string> MString() => new string[] { null, null, null };
public static System.ReadOnlySpan<string> MString2() => new string[] { null, null, null };
Contributor

@AlekseyTs AlekseyTs Dec 14, 2023

MString2

Why are the second versions necessary? Should we verify IL for them too? #Closed

{
// Code size 27 (0x1b)
.maxstack 2
IL_0000: ldsfld "{{type}}[] <PrivateImplementationDetails>.709E80C88487A2411E1EE4DFB9F22A861492D20C4765150C0C794ABD70F8147C_B{{typeCode}}"
Member Author

@jcouv jcouv Dec 14, 2023

{{type}}

Looking at this scenario again, I do think there's a problem. The type of the PrivateImplementationDetails field should be object[] for all references. But it seems order-dependent. I'll take a look #Resolved

@AlekseyTs AlekseyTs dismissed their stale review December 14, 2023 16:34

It looks like there are some issues to follow up on.

@cston cston self-requested a review December 14, 2023 16:50
}

public override ImmutableArray<byte> MappedData => default(ImmutableArray<byte>);
public override bool IsReadOnly => false;
Member Author

@jcouv jcouv Dec 14, 2023

📝 Note this affects the caching field for blob wrappers too (existing logic; see tryEmitAsCachedArrayFromBlob, which involves an assignment to the caching field, so the field cannot be read-only). But somehow those writes were not flagged by ILVerify (see MultipleArrays_InPlaceAndUsed).

@jcouv jcouv requested a review from AlekseyTs December 15, 2023 00:22
[Main]: Unexpected type on the stack. { Offset = 0x8, Found = address of '<PrivateImplementationDetails>+__StaticArrayInitTypeSize=3', Expected = Native Int }
""";

var verifier = CompileAndVerify(compilation, expectedOutput: "3402", verify: Verification.Fails with { ILVerifyMessage = ilVerifyMessage, PEVerifyMessage = peVerifyMessage });
Contributor

Verification.Fails with { ILVerifyMessage = ilVerifyMessage, PEVerifyMessage = peVerifyMessage });

Are we expecting verify to fail here?

@AlekseyTs
Contributor

@jcouv It looks like the correctness legs fail due to a formatting error.


if (elementType.IsReferenceType && elementType.SpecialType != SpecialType.System_String)
{
return false;
Contributor

@AlekseyTs AlekseyTs Dec 15, 2023

return false;

Are we going to get here for object? Is this intentional? #Resolved

Member Author

Yes, we get here for object (see ReadOnlySpanFromArrayOfConstants_Null) and it's intentional
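A rough sketch of the reference-type filter discussed above, expressed with System.Type for illustration only. `PassesReferenceTypeFilter` is a hypothetical name; the real check operates on Roslyn symbols (IsReferenceType, approximated here by !Type.IsValueType) and is only one of several eligibility conditions.

```csharp
using System;

// Mirrors only the check above: reference types other than string are rejected,
// so object[] initializers keep the ordinary codegen. Value types pass this
// particular check, though other conditions may still reject them.
static bool PassesReferenceTypeFilter(Type elementType) =>
    elementType.IsValueType || elementType == typeof(string);

foreach (Type t in new[] { typeof(string), typeof(object), typeof(int) })
    Console.WriteLine($"{t.Name}: {PassesReferenceTypeFilter(t)}");
```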

Contributor

@AlekseyTs AlekseyTs left a comment

LGTM (commit 14)

@jcouv jcouv merged commit 6602cb4 into dotnet:main Dec 15, 2023
@ghost ghost modified the milestones: 17.9, Next Dec 15, 2023
@jcouv jcouv deleted the cache-arrays branch December 15, 2023 19:05
@Cosifne Cosifne modified the milestones: Next, 17.9 P3 Jan 9, 2024

Labels

Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Extend ReadOnlySpan optimization to all constant data

8 participants