[EventPipe] Block EventPipeProvider Deletion for ongoing callbacks by mdh1418 · Pull Request #106040 · dotnet/runtime · GitHub

@mdh1418 mdh1418 commented Aug 6, 2024

Fixes #80666

This PR aligns EventPipe's Unregister logic with ETW's Unregister logic by blocking EventPipe's DeleteProvider while callbacks are in flight, so that the gchandle is not freed before the callback completes. (ETW has its own lock for ETW commands and callbacks.)

Our initial attempt to add a corresponding EventPipe lock showed that a lock should not be held while the callback is being invoked, because doing so breaks scenarios with concurrent callbacks.

In this PR, we track the EventPipeProvider's callbacks that have been prepared but not yet invoked (i.e. in-flight callbacks), and use a signal (set/wait) to block the EventPipeProvider's deferred deletion until those callbacks have completed.
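The counting-and-signal idea described above can be sketched roughly as follows. This is a minimal illustration using POSIX threads, not the code from this PR: every name here (`sketch_provider_t`, `sketch_prepare_callback`, `sketch_invoke_callback`, `sketch_provider_delete`, `sketch_run_demo`) is hypothetical, and a mutex plus condition variable stands in for whatever signal primitive the runtime actually uses. The callback itself runs with no lock held; only the counter update is done under the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a provider; NOT the runtime's EventPipeProvider. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  drained;           /* signalled when the pending count reaches 0 */
    int             callbacks_pending; /* prepared but not yet completed callbacks */
    bool            deleted;
} sketch_provider_t;

void sketch_provider_init(sketch_provider_t *p) {
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->drained, NULL);
    p->callbacks_pending = 0;
    p->deleted = false;
}

/* Record that a callback has been prepared. Fails if the provider is gone. */
bool sketch_prepare_callback(sketch_provider_t *p) {
    pthread_mutex_lock(&p->lock);
    bool ok = !p->deleted;
    if (ok)
        p->callbacks_pending++;
    pthread_mutex_unlock(&p->lock);
    return ok;
}

/* Invoke the callback WITHOUT holding the lock, then drop the pending count. */
void sketch_invoke_callback(sketch_provider_t *p, void (*cb)(void *), void *arg) {
    cb(arg);
    pthread_mutex_lock(&p->lock);
    assert(p->callbacks_pending > 0);
    if (--p->callbacks_pending == 0)
        pthread_cond_broadcast(&p->drained); /* "signal set" */
    pthread_mutex_unlock(&p->lock);
}

/* Deletion blocks ("signal wait") until no callbacks are in flight. */
void sketch_provider_delete(sketch_provider_t *p) {
    pthread_mutex_lock(&p->lock);
    while (p->callbacks_pending > 0)
        pthread_cond_wait(&p->drained, &p->lock);
    p->deleted = true;
    pthread_mutex_unlock(&p->lock);
}

/* Demo plumbing (also hypothetical). */
typedef struct { sketch_provider_t *provider; int callback_ran; } demo_ctx_t;

static void demo_callback(void *arg) { ((demo_ctx_t *)arg)->callback_ran = 1; }

static void *demo_thread(void *arg) {
    demo_ctx_t *ctx = arg;
    sketch_invoke_callback(ctx->provider, demo_callback, ctx);
    return NULL;
}

/* Returns 1 if the callback had finished by the time deletion returned. */
int sketch_run_demo(void) {
    sketch_provider_t provider;
    demo_ctx_t ctx = { &provider, 0 };
    pthread_t thread;

    sketch_provider_init(&provider);
    if (!sketch_prepare_callback(&provider))
        return -1;
    if (pthread_create(&thread, NULL, demo_thread, &ctx) != 0)
        return -1;

    sketch_provider_delete(&provider); /* waits for the in-flight callback */
    int ran = ctx.callback_ran;

    pthread_join(thread, NULL);
    pthread_cond_destroy(&provider.drained);
    pthread_mutex_destroy(&provider.lock);
    return ran;
}
```

Because the count is incremented when the callback is prepared rather than when it starts running, a deletion that begins between preparation and invocation still waits, which is the window exercised by the repro below.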


Repro

Reproduced the crash by:

using System;
using System.Diagnostics.Tracing;

Console.WriteLine("MIHW Ready to create TestEventSource");
Console.ReadKey();
var testEventSource = new TestEventSource();
Console.WriteLine("MIHW Ready to dispose TestEventSource");
Console.ReadKey();
testEventSource.Dispose();
Console.WriteLine("MIHW TestEventSource disposed.");

[EventSource(Name = "TestEventSource")]
public class TestEventSource : EventSource
...
  1. Launching the above sample app in WinDbg
  2. Connecting an EventPipe session via dotnet-trace collect --providers TestEventSource -p <pid of app from dotnet-trace ps>
  3. Setting a breakpoint in ep-provider.c provider_invoke_callback at the callback invocation
  4. Closing the EventPipe session (Enter or Ctrl+C) to hit that breakpoint, then freezing the thread
  5. Continuing the application logic by disposing the EventSource
  6. Unfreezing the previously frozen thread and continuing

This resulted in a NullReferenceException crash.

Testing

Performed the same steps as above with the changes in this PR; Dispose is now blocked until the callback completes.

jkotas commented Aug 6, 2024

Can you re-enable the disabled test?

jkotas commented Aug 6, 2024

The change LGTM, but I am not deeply familiar with EventPipe code.

@dotnet dotnet deleted a comment from azure-pipelines bot Aug 6, 2024

jkotas commented Aug 6, 2024

/azp run runtime-coreclr outerloop

@azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).

@noahfalk noahfalk left a comment

A few comments inline, but mostly looks good!

@lateralusX lateralusX self-requested a review August 7, 2024 07:32

mdh1418 commented Aug 7, 2024

/azp run runtime-coreclr outerloop

@azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).

mdh1418 added 4 commits August 7, 2024 13:32
Give the callback data access to the associated provider
so it can decrement the provider's callbacks counter
after the callback invocation is completed.
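The commit description above suggests a back-pointer from the callback data to its provider. A rough, single-threaded sketch of that shape follows; the struct and function names (`demo_provider_t`, `demo_callback_data_t`, `cbdata_init`, `cbdata_invoke_and_release`, `cbdata_demo`) are made up for illustration and are not the runtime's types, and the synchronization a real implementation would need around the counter is deliberately omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical provider with a count of prepared-but-unfinished callbacks. */
typedef struct {
    int callbacks_pending;
} demo_provider_t;

/* Hypothetical callback data carrying a back-pointer to its provider. */
typedef struct {
    void (*callback_func)(void *);
    void *callback_context;
    demo_provider_t *provider;
} demo_callback_data_t;

/* Preparing the callback data bumps the provider's pending count. */
void cbdata_init(demo_callback_data_t *data, demo_provider_t *provider,
                 void (*func)(void *), void *context) {
    data->callback_func = func;
    data->callback_context = context;
    data->provider = provider;
    provider->callbacks_pending++;
}

/* Run the callback, then use the back-pointer to drop the pending count. */
void cbdata_invoke_and_release(demo_callback_data_t *data) {
    if (data->callback_func != NULL)
        data->callback_func(data->callback_context);
    if (data->provider != NULL) {
        assert(data->provider->callbacks_pending > 0);
        data->provider->callbacks_pending--;
        data->provider = NULL; /* guard against a double decrement */
    }
}

static void count_call(void *context) { (*(int *)context)++; }

/* Returns the number of callbacks run if the count drained to 0, else -1. */
int cbdata_demo(void) {
    demo_provider_t provider = { 0 };
    demo_callback_data_t first, second;
    int calls = 0;

    cbdata_init(&first, &provider, count_call, &calls);
    cbdata_init(&second, &provider, count_call, &calls);
    assert(provider.callbacks_pending == 2);

    cbdata_invoke_and_release(&first);
    cbdata_invoke_and_release(&second);
    return provider.callbacks_pending == 0 ? calls : -1;
}
```

Keeping the provider pointer on the callback data means the code path that finishes the invocation needs no extra lookup to find which counter to decrement.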
mdh1418 added 3 commits August 7, 2024 13:32
Rename counter
Add more comments describing the blocking behavior
Add comments for potential deadlock scenario
@mdh1418 mdh1418 force-pushed the eventpipe_block_unregister_for_callbacks_counter_signal_impl branch from 7878c1d to d28ed8e August 7, 2024 17:32

mdh1418 commented Aug 7, 2024

/azp run runtime-coreclr outerloop

@azure-pipelines
Azure Pipelines successfully started running 1 pipeline(s).

mdh1418 commented Aug 7, 2024

/ba-g The failing tests not caught by build analysis are #104905 and #103347; not sure why build analysis didn't recognize the match in #103347.

Development

Successfully merging this pull request may close these issues.

tracing/eventpipe/eventsourceerror/eventsourceerror/eventsourceerror failure