KEMBAR78
JIT: Don't use write barriers for frozen objects by EgorBo · Pull Request #76135 · dotnet/runtime · GitHub
Skip to content

Conversation

@EgorBo
Copy link
Member

@EgorBo EgorBo commented Sep 24, 2022

This PR implements @jkotas's idea #76112 (comment) to skip write-barriers for frozen objects, e.g.:

string field;

void Test(string[] array)
{
    array[0] = "str1";

    field = "str";
    // and same for `typeof(X)` and all future uses of FOH
}

codegen diff: https://www.diffchecker.com/J1FH2GlX

jit-diffs/spmi diffs are promising (up to Total bytes of delta: -719913 (-0.62 % of base))

https://dev.azure.com/dnceng-public/public/_build/results?buildId=28997&view=ms.vss-build-web.run-extensions-tab

TODO:

  1. See if we can use VN here to handle locals (if they're of TYP_REF type and non-null) since CSE is enabled for const handles.
  2. Not now: during inlining if we pass a const string to a callee where that string is passed to a field we can give an additional boost (to avoid wb)

I decided to file a separate PR since it's not directly related to #76112 and also, it's not possible to run CI diffs in that PR due to JIT-EE changes.

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 24, 2022
@ghost ghost assigned EgorBo Sep 24, 2022
@ghost
Copy link

ghost commented Sep 24, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR implements @jkotas's idea #76112 (comment) to skip write-barriers for frozen objects, e.g.:

string field;

void Test(string[] array)
{
    array[0] = "str1";

    field = "str";
}

codegen diff: https://www.diffchecker.com/J1FH2GlX

jit-diffs/spmi diffs are promising (up to Total bytes of delta: -719913 (-0.62 % of base))

TODO:

  1. See if we can use VN here to handle locals (if they're of TYP_REF type and non-null) since CSE is enabled for const handles.
  2. Not now: during inlining if we pass a const string to a callee where that string is passed to a field we can give an additional boost (to avoid wb)

I decide to file a separate PR since it's not directly related to #76112 and also, it's not possible to run CI diffs in that PR due to JIT-EE changes.

Author: EgorBo
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@stephentoub
Copy link
Member

Does this have an impact on the memory model @VSadov just documented?

@EgorBo
Copy link
Member Author

EgorBo commented Sep 24, 2022

I assume it's purely an implementation detail as the frozen objects live forever, there is a thread going on where it's being discussed whether to expose it as a concept for users or not (hopefully, nobody will ever rely on string objects to be pinned by default always).

Or you ask about this case:

myIntField = 42;
myObjField = myObj;
myIntFieldInited = true;

that the middle assignment is suppose to work as a full memory barrier (as a side-effect)?

@jkotas
Copy link
Member

jkotas commented Sep 24, 2022

Do we have any runs with GC stress runs with HEAPVERIFY_BARRIERCHECK? It may need fixes to not complain about these.

@jkotas
Copy link
Member

jkotas commented Sep 24, 2022

cc @dotnet/gc

@jkotas
Copy link
Member

jkotas commented Sep 24, 2022

memory model

It is about this paragraph: https://github.com/dotnet/runtime/blob/aaeadfc1e1078139b657037dafd386212e06b94b/docs/design/specs/Memory-model.md#object-assignment

I do not think that what's written in this paragraph is going to hold for mutable frozen objects with this optimization. We have mutable frozen object in native AOT only currently, but we may want to have them in CoreCLR as well.

@VSadov Thoughts?

@VSadov
Copy link
Member

VSadov commented Sep 24, 2022

Does this have an impact on the memory model

we need to ensure that the thread which creates an immortal object performs a write fence before making the obj available to other threads.
I assume we do that, as we would see failures on ARM, but did not look closely.

The actual write barrier is unnecessary.

@EgorBo
Copy link
Member Author

EgorBo commented Sep 24, 2022

Do we have any runs with GC stress runs with HEAPVERIFY_BARRIERCHECK?

I personally don't see any pipelines but I assume GC team knows better, as for local testing - it always throws "Not enough memory to run HeapVerify level 2 for me 🤔

@VSadov
Copy link
Member

VSadov commented Sep 24, 2022

We have mutable frozen object in native AOT only currently

I thought that the constraint is - no writeable ref fields/elements.

Also the part that we do not trace into immortal objects implies that they cannot root regular objects.

@stephentoub
Copy link
Member

Or you ask about this case

Right. Specifically: "Object assignment to a location potentially accessible by other threads is a release with respect to write operations to the instance’s fields and metadata"

@EgorBo
Copy link
Member Author

EgorBo commented Sep 24, 2022

Or you ask about this case

Right. Specifically: "Object assignment to a location potentially accessible by other threads is a release with respect to write operations to the instance’s fields and metadata"

Good question, technically that statement is violated here, right @VSadov (I assume when JIT optimizes user's code and replaces field = obj; with field = null; where then WB is not emitted it also violated in .NET, am I correct?

@EgorBo
Copy link
Member Author

EgorBo commented Sep 24, 2022

SPMI diffs are ready:

image

@VSadov
Copy link
Member

VSadov commented Sep 24, 2022

field = null; is ok to do as ordinary assignment as null is not an object - it can’t miss a field assignment or a method table

@EgorBo
Copy link
Member Author

EgorBo commented Sep 24, 2022

field = null; is ok to do as ordinary assignment as null is not an object - it can’t miss a field assignment or a method table

I meant the initial user's code where it's not null, so user wrote code like this:

o.a = 42;
o.b = GetMyObject();
o.c = true;

where assignment to b is used as a commit (referring your doc) but after jit optimizations it turns out that GetMyObject is inlined as null

@cshung
Copy link
Contributor

cshung commented Sep 24, 2022

Do we have any runs with GC stress runs with HEAPVERIFY_BARRIERCHECK?

I personally don't see any pipelines but I assume GC team knows better, as for local testing - it always throws "Not enough memory to run HeapVerify level 2 for me 🤔

I vaguely remember the shadow heap for write barrier debugging is not implemented for regions, I might be wrong, it has been quite a while.

@EgorBo
Copy link
Member Author

EgorBo commented Sep 24, 2022

I vaguely remember the shadow heap for write barrier debugging is not implemented for regions,

Right, it attempts to reserve 256Gb of memory via VirtualAlloc, I'll check with the standalone one

@VSadov
Copy link
Member

VSadov commented Sep 24, 2022

o.a = 42;
o.b = GetMyObject();
o.c = true;

I am confused - what is returned by GetMyObject() ? Is it a null or an actual object?
Either way I do not think inlining changes anything.

We do not recommend to rely on sideeffects of object assignments for general purpose ordering as compiler might optimize and potentially reorder or coalesce ordinary assignments.
What we guarantee is that published object will be consistent with respect to writes made to the object, including setting up its object header. When another thread reads o.b, it will not see a corrupted instance.

if o.a = 42 must happen before o.b = GetMyObject() you need to do Volatile.Write(ref o.b, GetMyObject());, and most likely pair this with a read fence in on the consuming side.

@EgorBo
Copy link
Member Author

EgorBo commented Sep 24, 2022

@VSadov thanks for the explanation, this, basically, answers @stephentoub's question that this PR doesn't change anything in the memory model and still satisfies all the statements, am I right?

Regarding HEAPVERIFY_BARRIERCHECK - I tried to run several OSS projects I have locally with standalone gc and these:

DOTNET_HeapVerify=2
COMPlus_GCName=clrgc.dll
COMPlus_GCSegmentSize=800000
no asserts so far (Checked runtime)

@EgorBo EgorBo marked this pull request as ready for review September 24, 2022 23:11
@cshung
Copy link
Contributor

cshung commented Sep 24, 2022

We have mutable frozen object in native AOT only currently

I thought that the constraint is - no writeable ref fields/elements.

Also the part that we do not trace into immortal objects implies that they cannot root regular objects.

Yes, as of now, we do not allow mutable GC references on frozen objects for various reasons. In a test case, I commented on the restriction of having references on Frozen segments.

// It is not okay for a frozen object to reference another frozen object
// This is because the WriteBarrier code may (depending on the pointer
// value returned by AllocHGlobal) determine node2 to be an ephemeral object
// when it isn't.
// node1.next = node2;
// It is not okay for a frozen object to reference another object that is not frozen
// This is because we may miss the marking of the new Node or miss the relocation
// of the new Node.
// node2.next = new Node();

In view of this, we do have an opportunity. With regions only - we have a single immutable reserve range. We might be able to lift the first constraint with minimal effort. The impact could be significant, imagine someone could put their entire cache object graph frozen segments and thus basically ignored during any GC collections, that could be a huge saving for Gen2 time for those types of applications.

@VSadov
Copy link
Member

VSadov commented Sep 24, 2022

we need to ensure that the thread which creates an immortal object performs a write fence before making the obj available to other threads.
I assume we do that, as we would see failures on ARM, but did not look closely.

It looks like the whole string interning deal happens under m_HashTableCrstGlobal lock.
That should be sufficient for the purpose of "publishing must be a release"

@EgorBo
Copy link
Member Author

EgorBo commented Sep 24, 2022

we need to ensure that the thread which creates an immortal object performs a write fence before making the obj available to other threads.
I assume we do that, as we would see failures on ARM, but did not look closely.

It looks like the whole string interning deal happens under m_HashTableCrstGlobal lock. That should be sufficient for the purpose of "publishing must be a release"

FrozenObjectHeapManager also has its own lock

Yes, as of now, we do not allow mutable GC references on frozen objects for various reasons.

And CoreCLR currently doesn't allocate objects with mutable GC references on FOH (RuntimeType object has a gc field keepalive but it's always null in case of FOH). Not sure about NativeAOT's usage but presumably the same?

@EgorBo
Copy link
Member Author

EgorBo commented Sep 24, 2022

/azp run runtime-coreclr outerloop, runtime-coreclr gcstress0x3-gcstress0xc, runtime-extra-platforms

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@VSadov
Copy link
Member

VSadov commented Sep 24, 2022

FrozenObjectHeapManager also has its own lock

That may not be helpful if it protects only creation of the string as it would not guarantee order in which object and its content are committed. (so if you read something without a lock you cannot assume that having an instance reference implies the instance is consistent)

However m_HashTableCrstGlobal is taken by both readers and writers, so it will guarantee consistency of everything in the table.

@VSadov
Copy link
Member

VSadov commented Sep 25, 2022

Not sure about NativeAOT's usage but presumably the same?

NativeAOT freezes a readonly graph of frozen objects. They may refer to each other.

@VSadov
Copy link
Member

VSadov commented Sep 25, 2022

@VSadov thanks for the explanation, this, basically, answers @stephentoub's question that this PR doesn't change anything in the memory model and still satisfies all the statements, am I right?

Yes. We are ok here.

A string literal becomes shared with other threads when it is added to the literal map - that happens even before the string is exposed to managed code. Other threads may already use the string ref when JIT-ting other methods.

We are ok with this as the literal map is accessed under a lock. The first thing the StringLiteralMap::GetStringLiteral does is to take a lock.

CrstHolder gch(&(SystemDomain::GetGlobalStringLiteralMap()->m_HashTableCrstGlobal));

Note: it does not matter if a string was allocated as immortal or ordinary pinned. The same requirements and the same reason why we are ok.

@stephentoub
Copy link
Member

Since these are ordinary assignments it would be ok if they are reordered.

Prior to this PR, we wouldn't treat those writes into the array as having release semantics?

We've never documented our memory model (until your doc in flight now). As a result devs rely on observed behavior as well as correctly or incorrectly conveyed word of mouth.

@VSadov
Copy link
Member

VSadov commented Sep 25, 2022

It will break code as in my example, will it not?

Yes it might, because the example relies on implementation details that are not a part of the contract.

Many optimizations have the same problem (dependence on implementation details) and we should look carefully at the potential for breakage.
I think in this case the potential for breakage is very low.

@stephentoub
Copy link
Member

because the example relies on implementation details that are not a part of the contract.

My point is there is no contract because we never wrote one down.

@VSadov
Copy link
Member

VSadov commented Sep 25, 2022

My point is there is no contract because we never wrote one down.

I think if the doc existed before, we would still be in the same situation - thinking whether this might break a lot of code.

I think we are ok, since we allowed similar optimizations in the past (i.e. assignments of null do not do fences, local assignments do not call the barrier at all, and that would depend on inlining, etc.. ).

This seems similar to when we made readonly statics not modifiable. It also had potential to break some corner cases, but did we document that as a breaking change?

@VSadov
Copy link
Member

VSadov commented Sep 25, 2022

I think we are in agreement on main points, but correct me if I am wrong:

  1. This is not formally a breaking change. Well, you can't break the spec that was never written, but it fits the overall model/idea that we followed in the past and will not contradict the spec that we will publish.
  2. There is a small possibility of breaking existing code. Not having a spec just makes it a tiny bit more likely.
  3. It might be worth it to put a note in the breaking changes, just because of #2 above. - to warn people or to point to the breaking changes if something does break.

@jkotas
Copy link
Member

jkotas commented Sep 25, 2022

This seems similar to when we made readonly statics not modifiable. It also had potential to break some corner cases, but did we document that as a breaking change?

We did not have a formal breaking change documentation process at that time. We just tagged the breaking change PRs with a label. The readonly statics PR did have that label.

@VSadov VSadov added the breaking-change Issue or PR that represents a breaking API or functional change over a prerelease. label Sep 25, 2022
@ghost ghost added the needs-breaking-change-doc-created Breaking changes need an issue opened with https://github.com/dotnet/docs/issues/new?template=dotnet label Sep 25, 2022
@ghost
Copy link

ghost commented Sep 25, 2022

Added needs-breaking-change-doc-created label because this PR has the breaking-change label.

When you commit this breaking change:

  1. Create and link to this PR and the issue a matching issue in the dotnet/docs repo using the breaking change documentation template, then remove this needs-breaking-change-doc-created label.
  2. Ask a committer to mail the .NET Breaking Change Notification DL.

Tagging @dotnet/compat for awareness of the breaking change.

@jkotas
Copy link
Member

jkotas commented Sep 25, 2022

BTW: The GC has this code since forever:

            // Write barriers for stores of references to frozen objects may be optimized away.
            if (!gc_heap::frozen_object_p(*ptr))

It suggests that this optimization was implemented in the some existing runtime that used frozen objects (.NET Framework 2.0 or .NET Native for UWP).

@jkotas
Copy link
Member

jkotas commented Sep 25, 2022

I think we are in agreement on main points, but correct me if I am wrong:

I would like to see the memory model spec that you are working on updated to be more precise in this area.

Another example:

class MyType
{
    int counter;
    
    // Assume that this instance is allocated in frozen heap
    static readonly MyType singleton = new MyType();
     
    static MyType current;
        
    static void M()
    {
        singleton.counter++;
        
        // Can this be a regular store or does the model require it to be a release?
        current = singleton; 
    }
    
    ... some other uses of current and singleton ...
}

It might be worth it to put a note in the breaking changes,

It may be better to write the breaking change for the memory model spec instead of this detail. Say something like that there was a lot of ambiguity around the guarantees here and that the runtimes will only provide the guarantees written down going forward.

@stephentoub
Copy link
Member

It may be better to write the breaking change for the memory model spec instead of this detail. Say something like that there was a lot of ambiguity around the guarantees here and that the runtimes will only provide the guarantees written down going forward.

Sounds good.

@VSadov
Copy link
Member

VSadov commented Sep 25, 2022

Another example:

I see. We assume that a frozen obj is immutable and thus we can elide the fence, but what if we allow mutations?
Still no need for GC barrier, but perhaps we will need to emit stlr when publishing mutable frozen objects.

Also what about doing GetHashcode or taking a lock on a string literal before posting a ref to it to a shared location?
If the changes to syncblock are always full fences it could be ok.

@jkotas
Copy link
Member

jkotas commented Sep 25, 2022

Still no need for GC barrier, but perhaps we will need to emit stlr when publishing mutable frozen objects.

Right. The current change in this PR is not doing that. It will need augmenting to make it work. I think we will need a new method on the JIT interface that returns whether the frozen object is immutable.

Also what about doing GetHashcode or taking a lock on a string literal before posting a ref to it to a shared location?

These are always full fences. I do not see a problem with these.

@EgorBo
Copy link
Member Author

EgorBo commented Sep 25, 2022

Right. The current change in this PR is not doing that. It will need augmenting to make it work. I think we will need a new method on the JIT interface that returns whether the frozen object is immutable.

So if I understand it correctly, you want me to add a new method something like bool isObjectImmutable(void* obj) and if that returns true - put GTF_IND_VOLATILE on its store operation? (or a new flag that won't affect jit optimizations which currently give up when they see volatile flag)

PS: Correct me if I'm wrong but currently there are no frozen objects JIT works with which are mutable.

@VSadov
Copy link
Member

VSadov commented Sep 25, 2022

It would not need to be in this change as we do not have mutable frozen objects yet.

Yes, a new API is needed to detect mutables, once we have them. It should only affect the emit - should be stlr on arm64, dmb ishst on arm32, no effect on x64.
It should be ok to continue doing all the optimizations, if it is an ordinary write in the IL.

@jkotas
Copy link
Member

jkotas commented Sep 25, 2022

There are no mutable frozen objects exposed to the JIT as of this PR, so this PR is fine. You do not need to introduce the new JIT/EE interface as part of this PR.

There will be mutable frozen objects exposed to the JIT after #76112 (comment). For example, you can try #76135 (comment) . We will need the new JIT/EE interface method as part that change.

@EgorBo
Copy link
Member Author

EgorBo commented Sep 26, 2022

CI failures:

  1. Outerloop tests fail with FileCheck error: '__jit_disasm.out' is empty for outerloop tests #76150
  2. CriticalFinalizer is fixed via Disable frequently failing CriticalFinalizer test #76131
  3. Other gcstress failures fail in other PRs too and I'm looking at Methodical_r1 with GCStress=0x3 as it might be related to RuntimeType on FOH

@EgorBo
Copy link
Member Author

EgorBo commented Sep 26, 2022

Merging since #76112 depends on it (to introduce "is mutable" helper there)

Going to file the breaking-change note as was requested after. Hopefully, will be able to attach performance improvements from https://github.com/dotnet/perf-autofiling-issues/issues in a week.

@EgorBo EgorBo merged commit 315bdd4 into dotnet:main Sep 26, 2022
@VSadov
Copy link
Member

VSadov commented Sep 26, 2022

This particular PR does not need to be tracked as a breaking change, right?
As I understand the new plan is:

It may be better to write the breaking change for the memory model spec instead of this detail. Say something like that there was a lot of ambiguity around the guarantees here and that the runtimes will only provide the guarantees written down going forward.

@jkotas jkotas removed breaking-change Issue or PR that represents a breaking API or functional change over a prerelease. needs-breaking-change-doc-created Breaking changes need an issue opened with https://github.com/dotnet/docs/issues/new?template=dotnet labels Sep 26, 2022
jkotas added a commit to jkotas/runtime that referenced this pull request Sep 27, 2022
jkotas added a commit that referenced this pull request Sep 27, 2022
* Revert "Allocate RuntimeType objects on Frozen Object Heap (#75573)"

This reverts commit 1f1231c.

* Revert "don't use write barriers for frozen objects (#76135)"

This reverts commit 315bdd4.
EgorBo added a commit that referenced this pull request Oct 6, 2022
…en objects" (#76649)

This PR reverts #76235 with a few manual modifications:
The main issue why the initial PRs (#75573 and #76135) were reverted has just been resolved via #76251:

GC could collect some associated (with frozen objects) objects as unreachable, e.g. it could collect a SyncBlock, WeakReferences and Dependent handles associated with frozen objects which could (e.g. for a short period of time) be indeed unreachable but return back to life after.

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
@ghost ghost locked as resolved and limited conversation to collaborators Oct 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants