KEMBAR78
JIT: Optimize redundant memory barriers on arm/arm64 by EgorBo · Pull Request #60219 · dotnet/runtime · GitHub
Skip to content

Conversation

@EgorBo
Copy link
Member

@EgorBo EgorBo commented Oct 9, 2021

Fixes #60023
Fixes #60021

This PR removes redundant load-only or full memory barriers if we already emitted one in current IG and there were no memory accesses after it. By "memory accesses" I mean all instructions in this table instrsarm64.h (and instrsarm.h for arm32) with "ST" or "LD" in the 3rd column.

Examples:

class Prog
{
    volatile int a;

    void Test1(Prog p) => p.a = a;

    void Test2() => a++;
}

Codegen diff

; Method Prog:Test1(Prog):this
G_M46938_IG01:
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
G_M46938_IG02:
            ldr     w0, [x0,#8]
-           dmb     ishld
            dmb     ish
            str     w0, [x1,#8]
G_M46938_IG03:
            ldp     fp, lr, [sp],#16
            ret     lr
-; Total bytes of code: 32
+; Total bytes of code: 28


; Method Prog:Test2():this
G_M48563_IG01:
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
G_M48563_IG02:
            ldr     w1, [x0,#8]
-           dmb     ishld
+           dmb     ish
            add     w1, w1, #1
-           dmb     ish
            str     w1, [x0,#8]
G_M48563_IG03:
            ldp     fp, lr, [sp],#16
            ret     lr
-; Total bytes of code: 36
+; Total bytes of code: 32

Diffs (aljit):

coreclr_tests.pmi.Linux.arm64.checked.mch:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 166112668 (overridden on cmd)
Total bytes of diff: 166112628 (overridden on cmd)
Total bytes of delta: -40 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.
Detail diffs


Top file improvements (bytes):
         -24 : 88442.dasm (-6.12% of base)
          -8 : 252713.dasm (-1.10% of base)
          -8 : 236119.dasm (-5.13% of base)

3 total files with Code Size differences (3 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -24 (-6.12% of base) : 88442.dasm - stfldstatic2:Main():int
          -8 (-1.10% of base) : 252713.dasm - CCSE:Main():int
          -8 (-5.13% of base) : 236119.dasm - ScenarioMonitor:ScenarioShouldContinueRunning():bool:this

Top method improvements (percentages):
         -24 (-6.12% of base) : 88442.dasm - stfldstatic2:Main():int
          -8 (-5.13% of base) : 236119.dasm - ScenarioMonitor:ScenarioShouldContinueRunning():bool:this
          -8 (-1.10% of base) : 252713.dasm - CCSE:Main():int

3 total methods with Code Size differences (3 improved, 0 regressed), 0 unchanged.


libraries.crossgen2.Linux.arm64.checked.mch:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 48198936 (overridden on cmd)
Total bytes of diff: 48198692 (overridden on cmd)
Total bytes of delta: -244 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.
Detail diffs


Top file improvements (bytes):
         -12 : 169237.dasm (-3.80% of base)
         -12 : 207967.dasm (-1.10% of base)
         -12 : 22180.dasm (-0.75% of base)
          -8 : 31114.dasm (-0.91% of base)
          -8 : 30586.dasm (-3.03% of base)
          -8 : 31115.dasm (-1.92% of base)
          -4 : 195940.dasm (-1.23% of base)
          -4 : 30540.dasm (-1.49% of base)
          -4 : 31145.dasm (-0.99% of base)
          -4 : 661.dasm (-0.20% of base)
          -4 : 14686.dasm (-3.70% of base)
          -4 : 30338.dasm (-0.84% of base)
          -4 : 42184.dasm (-1.79% of base)
          -4 : 201007.dasm (-0.19% of base)
          -4 : 22200.dasm (-1.49% of base)
          -4 : 30624.dasm (-1.10% of base)
          -4 : 42802.dasm (-0.68% of base)
          -4 : 22187.dasm (-11.11% of base)
          -4 : 30565.dasm (-2.27% of base)
          -4 : 31148.dasm (-0.68% of base)

52 total files with Code Size differences (52 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -12 (-3.80% of base) : 169237.dasm - Microsoft.Win32.RegistryKey:.ctor(Microsoft.Win32.SafeHandles.SafeRegistryHandle,bool,bool,bool,bool,int):this
         -12 (-0.75% of base) : 22180.dasm - System.Collections.Hashtable:Insert(System.Object,System.Object,bool):this
         -12 (-1.10% of base) : 207967.dasm - WorkStealingQueue:LocalPush(System.__Canon,byref):this
          -8 (-3.03% of base) : 30586.dasm - System.Threading.Tasks.Task:Dispose(bool):this
          -8 (-0.91% of base) : 31114.dasm - WorkStealingQueue:LocalFindAndPop(System.Object):bool:this
          -8 (-1.92% of base) : 31115.dasm - WorkStealingQueue:LocalPush_HandleTailOverflow():int:this
          -4 (-0.68% of base) : 42802.dasm - IdManager:GetId(bool):int:this
          -4 (-0.68% of base) : 31148.dasm - IdManager:GetId(bool):int:this
          -4 (-0.99% of base) : 31145.dasm - IdManager:ReturnId(int,bool):this
          -4 (-0.99% of base) : 42803.dasm - IdManager:ReturnId(int,bool):this
          -4 (-1.23% of base) : 195940.dasm - Microsoft.Extensions.Logging.LoggerFactory:Dispose():this
          -4 (-4.76% of base) : 207766.dasm - StaticIndexRangePartition`1:MoveNext():bool:this
          -4 (-1.41% of base) : 22201.dasm - System.Collections.Hashtable:Clear():this
          -4 (-1.49% of base) : 22200.dasm - System.Collections.Hashtable:Clone():System.Object:this
          -4 (-1.05% of base) : 22186.dasm - System.Collections.Hashtable:rehash(int):this
          -4 (-0.55% of base) : 22178.dasm - System.Collections.Hashtable:Remove(System.Object):this
          -4 (-11.11% of base) : 22187.dasm - System.Collections.Hashtable:UpdateVersion():this
          -4 (-1.43% of base) : 786.dasm - System.ComponentModel.Composition.Hosting.ApplicationCatalog:Dispose(bool):this
          -4 (-0.20% of base) : 661.dasm - System.ComponentModel.Composition.Hosting.ComposablePartExportProvider:Compose(System.ComponentModel.Composition.Hosting.CompositionBatch):this
          -4 (-0.63% of base) : 487.dasm - System.ComponentModel.Composition.Hosting.FilteredCatalog:Dispose(bool):this

Top method improvements (percentages):
          -4 (-11.11% of base) : 22187.dasm - System.Collections.Hashtable:UpdateVersion():this
          -4 (-4.76% of base) : 207766.dasm - StaticIndexRangePartition`1:MoveNext():bool:this
         -12 (-3.80% of base) : 169237.dasm - Microsoft.Win32.RegistryKey:.ctor(Microsoft.Win32.SafeHandles.SafeRegistryHandle,bool,bool,bool,bool,int):this
          -4 (-3.70% of base) : 14686.dasm - System.Net.Http.HttpClientHandler:Dispose(bool):this
          -4 (-3.57% of base) : 14813.dasm - System.Net.Http.DelegatingHandler:Dispose(bool):this
          -4 (-3.23% of base) : 14543.dasm - System.Net.Http.HttpMessageInvoker:Dispose(bool):this
          -4 (-3.23% of base) : 163790.dasm - System.Threading.Tasks.Dataflow.ActionBlock`1:Complete():this
          -8 (-3.03% of base) : 30586.dasm - System.Threading.Tasks.Task:Dispose(bool):this
          -4 (-2.86% of base) : 30588.dasm - System.Threading.Tasks.Task:set_CapturedContext(System.Threading.ExecutionContext):this
          -4 (-2.56% of base) : 30577.dasm - System.Threading.Tasks.Task:UpdateExceptionObservedStatus():this
          -4 (-2.27% of base) : 31372.dasm - System.Threading.ManualResetEventSlim:set_SpinCount(int):this
          -4 (-2.27% of base) : 30565.dasm - System.Threading.Tasks.Task:ExecuteEntryUnsafe(System.Threading.Thread):this
          -4 (-2.13% of base) : 30536.dasm - System.Threading.Tasks.Task:SetCancellationAcknowledged():this
          -8 (-1.92% of base) : 31115.dasm - WorkStealingQueue:LocalPush_HandleTailOverflow():int:this
          -4 (-1.85% of base) : 30623.dasm - System.Threading.Tasks.Task:AddNewChild():this
          -4 (-1.79% of base) : 14710.dasm - System.Net.Http.HttpClient:Dispose(bool):this
          -4 (-1.79% of base) : 42184.dasm - System.Threading.Tasks.Task`1:DangerousSetResult(System.Threading.Tasks.VoidTaskResult):this
          -4 (-1.72% of base) : 31505.dasm - System.Threading.CancellationTokenSource:Dispose(bool):this
          -4 (-1.69% of base) : 41930.dasm - System.Threading.Tasks.Task`1:DangerousSetResult(bool):this
          -4 (-1.69% of base) : 42228.dasm - System.Threading.Tasks.Task`1:DangerousSetResult(int):this

52 total methods with Code Size differences (52 improved, 0 regressed), 0 unchanged.


libraries.pmi.Linux.arm64.checked.mch:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 47235868 (overridden on cmd)
Total bytes of diff: 47235624 (overridden on cmd)
Total bytes of delta: -244 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.
Detail diffs


Top file improvements (bytes):
         -12 : 196881.dasm (-1.14% of base)
         -12 : 196872.dasm (-1.15% of base)
         -12 : 196880.dasm (-1.14% of base)
         -12 : 196863.dasm (-1.18% of base)
         -12 : 196884.dasm (-1.14% of base)
         -12 : 196882.dasm (-1.13% of base)
         -12 : 220988.dasm (-5.66% of base)
         -12 : 196883.dasm (-1.13% of base)
          -8 : 35858.dasm (-0.47% of base)
          -8 : 153848.dasm (-3.92% of base)
          -8 : 198764.dasm (-1.28% of base)
          -4 : 167270.dasm (-1.61% of base)
          -4 : 199232.dasm (-2.22% of base)
          -4 : 30213.dasm (-1.89% of base)
          -4 : 171945.dasm (-4.00% of base)
          -4 : 199225.dasm (-0.23% of base)
          -4 : 171961.dasm (-4.00% of base)
          -4 : 215812.dasm (-1.01% of base)
          -4 : 173316.dasm (-0.52% of base)
          -4 : 199094.dasm (-1.33% of base)

42 total files with Code Size differences (42 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -12 (-5.66% of base) : 220988.dasm - Microsoft.Win32.RegistryKey:.ctor(Microsoft.Win32.SafeHandles.SafeRegistryHandle,bool,bool,bool,bool,int):this
         -12 (-1.18% of base) : 196863.dasm - WorkStealingQueue[__Canon][System.__Canon]:LocalPush(System.__Canon,byref):this
         -12 (-1.15% of base) : 196872.dasm - WorkStealingQueue[Byte][System.Byte]:LocalPush(ubyte,byref):this
         -12 (-1.13% of base) : 196882.dasm - WorkStealingQueue[Double][System.Double]:LocalPush(double,byref):this
         -12 (-1.14% of base) : 196880.dasm - WorkStealingQueue[Int16][System.Int16]:LocalPush(short,byref):this
         -12 (-1.14% of base) : 196881.dasm - WorkStealingQueue[Int32][System.Int32]:LocalPush(int,byref):this
         -12 (-1.14% of base) : 196884.dasm - WorkStealingQueue[Int64][System.Int64]:LocalPush(long,byref):this
         -12 (-1.13% of base) : 196883.dasm - WorkStealingQueue[Vector`1][System.Numerics.Vector`1[System.Single]]:LocalPush(System.Numerics.Vector`1[Single],byref):this
          -8 (-0.47% of base) : 35858.dasm - Microsoft.Diagnostics.Symbols.SourceFile:GetSourceFromSrcServer():System.String:this
          -8 (-1.28% of base) : 198764.dasm - System.ComponentModel.Composition.ReflectionModel.ExportingMember:GetExportedValue(System.Object,System.Object):System.Object:this
          -8 (-3.92% of base) : 153848.dasm - System.Net.Http.HttpClient:Dispose(bool):this
          -4 (-0.88% of base) : 30147.dasm - IdManager[Byte][System.Byte]:GetId(bool):int:this
          -4 (-1.22% of base) : 30150.dasm - IdManager[Byte][System.Byte]:ReturnId(int,bool):this
          -4 (-1.01% of base) : 215812.dasm - Microsoft.Extensions.Logging.LoggerFactory:Dispose():this
          -4 (-3.12% of base) : 220987.dasm - Microsoft.Win32.RegistryKey:.ctor(Microsoft.Win32.SafeHandles.SafeRegistryHandle,bool,int):this
          -4 (-1.89% of base) : 220931.dasm - Microsoft.Win32.RegistryKey:FromHandle(Microsoft.Win32.SafeHandles.SafeRegistryHandle,int):Microsoft.Win32.RegistryKey
          -4 (-4.76% of base) : 197474.dasm - StaticIndexRangePartition`1[__Canon][System.__Canon]:MoveNext():bool:this
          -4 (-4.76% of base) : 197477.dasm - StaticIndexRangePartition`1[Byte][System.Byte]:MoveNext():bool:this
          -4 (-1.33% of base) : 199094.dasm - System.ComponentModel.Composition.Hosting.ApplicationCatalog:Dispose(bool):this
          -4 (-2.22% of base) : 199181.dasm - System.ComponentModel.Composition.Hosting.CatalogExportProvider:EnsureRunning():this

Top method improvements (percentages):
         -12 (-5.66% of base) : 220988.dasm - Microsoft.Win32.RegistryKey:.ctor(Microsoft.Win32.SafeHandles.SafeRegistryHandle,bool,bool,bool,bool,int):this
          -4 (-4.76% of base) : 197474.dasm - StaticIndexRangePartition`1[__Canon][System.__Canon]:MoveNext():bool:this
          -4 (-4.76% of base) : 197477.dasm - StaticIndexRangePartition`1[Byte][System.Byte]:MoveNext():bool:this
          -4 (-4.17% of base) : 153798.dasm - System.Net.Http.DelegatingHandler:Dispose(bool):this
          -4 (-4.00% of base) : 171945.dasm - System.Threading.Tasks.Dataflow.ActionBlock`1[__Canon][System.__Canon]:Complete():this
          -4 (-4.00% of base) : 171961.dasm - System.Threading.Tasks.Dataflow.ActionBlock`1[Byte][System.Byte]:Complete():this
          -8 (-3.92% of base) : 153848.dasm - System.Net.Http.HttpClient:Dispose(bool):this
          -4 (-3.70% of base) : 154080.dasm - System.Net.Http.HttpMessageInvoker:Dispose(bool):this
          -4 (-3.57% of base) : 153948.dasm - System.Net.Http.HttpClientHandler:Dispose(bool):this
          -4 (-3.33% of base) : 154716.dasm - System.Net.Http.Http2Connection:ProcessPingAck(long):this
          -4 (-3.12% of base) : 220987.dasm - Microsoft.Win32.RegistryKey:.ctor(Microsoft.Win32.SafeHandles.SafeRegistryHandle,bool,int):this
          -4 (-2.22% of base) : 199181.dasm - System.ComponentModel.Composition.Hosting.CatalogExportProvider:EnsureRunning():this
          -4 (-2.22% of base) : 199232.dasm - System.ComponentModel.Composition.Hosting.ComposablePartExportProvider:EnsureRunning():this
          -4 (-2.08% of base) : 171946.dasm - System.Threading.Tasks.Dataflow.ActionBlock`1[__Canon][System.__Canon]:System.Threading.Tasks.Dataflow.IDataflowBlock.Fault(System.Exception):this
          -4 (-2.08% of base) : 171962.dasm - System.Threading.Tasks.Dataflow.ActionBlock`1[Byte][System.Byte]:System.Threading.Tasks.Dataflow.IDataflowBlock.Fault(System.Exception):this
          -4 (-2.00% of base) : 30253.dasm - System.Threading.Tasks.Task`1[Double][System.Double]:DangerousSetResult(double):this
          -4 (-1.89% of base) : 220931.dasm - Microsoft.Win32.RegistryKey:FromHandle(Microsoft.Win32.SafeHandles.SafeRegistryHandle,int):Microsoft.Win32.RegistryKey
          -4 (-1.89% of base) : 30213.dasm - System.Threading.Tasks.Task`1[Byte][System.Byte]:DangerousSetResult(ubyte):this
          -4 (-1.89% of base) : 30246.dasm - System.Threading.Tasks.Task`1[Int16][System.Int16]:DangerousSetResult(short):this
          -4 (-1.61% of base) : 167252.dasm - System.Linq.Parallel.AsynchronousChannel`1[__Canon][System.__Canon]:EnqueueChunk(System.__Canon[]):this

42 total methods with Code Size differences (42 improved, 0 regressed), 0 unchanged.


libraries_tests.pmi.Linux.arm64.checked.mch:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 112411060 (overridden on cmd)
Total bytes of diff: 112410808 (overridden on cmd)
Total bytes of delta: -252 (-0.00 % of base)
    diff is an improvement.
    relative diff is an improvement.
Detail diffs


Top file improvements (bytes):
          -8 : 115030.dasm (-0.47% of base)
          -8 : 218162.dasm (-3.92% of base)
          -8 : 273881.dasm (-2.74% of base)
          -8 : 87601.dasm (-2.02% of base)
          -8 : 87562.dasm (-1.69% of base)
          -8 : 90484.dasm (-0.95% of base)
          -8 : 253194.dasm (-0.94% of base)
          -8 : 87552.dasm (-0.87% of base)
          -8 : 120219.dasm (-1.03% of base)
          -8 : 196277.dasm (-1.07% of base)
          -8 : 87561.dasm (-1.44% of base)
          -8 : 115444.dasm (-0.34% of base)
          -8 : 242383.dasm (-1.10% of base)
          -4 : 115035.dasm (-0.27% of base)
          -4 : 115070.dasm (-0.39% of base)
          -4 : 147545.dasm (-0.25% of base)
          -4 : 323862.dasm (-0.08% of base)
          -4 : 115081.dasm (-0.22% of base)
          -4 : 115227.dasm (-0.49% of base)
          -4 : 196132.dasm (-0.10% of base)

50 total files with Code Size differences (50 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
          -8 (-0.95% of base) : 90484.dasm - <SendAsync_NullRequest_ThrowsException>d__2:MoveNext():this
          -8 (-0.34% of base) : 115444.dasm - <TestLinkTo_MaxMessages>d__24:MoveNext():this
          -8 (-0.47% of base) : 115030.dasm - <TestSchedulerUsage>d__6:MoveNext():this
          -8 (-2.74% of base) : 273881.dasm - Microsoft.AspNetCore.Http.Features.FeatureCollection:set_Item(System.Type,System.Object):this
          -8 (-1.07% of base) : 196277.dasm - System.Data.SqlClient.TdsParserStateObject:SNIWritePacket(System.Data.SqlClient.PacketHandle,byref,bool,bool):System.Threading.Tasks.Task:this
          -8 (-1.10% of base) : 242383.dasm - System.Data.SqlClient.TdsParserStateObject:SNIWritePacket(System.Data.SqlClient.PacketHandle,byref,bool,bool):System.Threading.Tasks.Task:this
          -8 (-2.02% of base) : 87601.dasm - System.Net.Http.Functional.Tests.CustomHttpClientTest:Send_NullRequest_ThrowsException():this
          -8 (-1.44% of base) : 87561.dasm - System.Net.Http.Functional.Tests.HttpClientTest:DefaultRequestVersion_InitialValueExpected():this
          -8 (-1.69% of base) : 87562.dasm - System.Net.Http.Functional.Tests.HttpClientTest:DefaultRequestVersion_Roundtrips():this
          -8 (-0.87% of base) : 87552.dasm - System.Net.Http.Functional.Tests.HttpClientTest:Send_SingleThread_Succeeds(int):this
          -8 (-3.92% of base) : 218162.dasm - System.Net.Http.HttpClient:Dispose(bool):this
          -8 (-0.94% of base) : 253194.dasm - System.Net.Http.Json.Functional.Tests.JsonContentTestsBase:JsonContentThrowsOnIncompatibleTypeAsync():this
          -8 (-1.03% of base) : 120219.dasm - System.Net.Tests.HttpListenerTests:EndGetContext_AlreadyCalled_ThrowsInvalidOperationException():this
          -4 (-0.34% of base) : 250616.dasm - <>c:<HttpClient_ValidAuthentication_Success>b__1_0(System.String,System.String,System.String):this
          -4 (-0.99% of base) : 115752.dasm - <>c__DisplayClass0_0:<TestEtw>b__10():this
          -4 (-3.45% of base) : 115439.dasm - <>c__DisplayClass23_0:<TestLinkTo_Predicate>b__2(System.Threading.Tasks.Task):this
          -4 (-3.45% of base) : 115837.dasm - <>c__DisplayClass6_0:<WriteOnceToAction>b__3(System.Threading.Tasks.Task):this
          -4 (-0.08% of base) : 323862.dasm - <TestAsyncOutputStream_BeginCancelBeginOutputRead>d__7:MoveNext():this
          -4 (-0.15% of base) : 115026.dasm - <TestBasicMessageProcessing>d__5:MoveNext():this
          -4 (-0.27% of base) : 115035.dasm - <TestInputCount>d__7:MoveNext():this

Top method improvements (percentages):
          -4 (-4.17% of base) : 218131.dasm - System.Net.Http.DelegatingHandler:Dispose(bool):this
          -8 (-3.92% of base) : 218162.dasm - System.Net.Http.HttpClient:Dispose(bool):this
          -4 (-3.70% of base) : 88846.dasm - MockHandler:Dispose(bool):this
          -4 (-3.70% of base) : 218299.dasm - System.Net.Http.HttpMessageInvoker:Dispose(bool):this
          -4 (-3.45% of base) : 115439.dasm - <>c__DisplayClass23_0:<TestLinkTo_Predicate>b__2(System.Threading.Tasks.Task):this
          -4 (-3.45% of base) : 115837.dasm - <>c__DisplayClass6_0:<WriteOnceToAction>b__3(System.Threading.Tasks.Task):this
          -8 (-2.74% of base) : 273881.dasm - Microsoft.AspNetCore.Http.Features.FeatureCollection:set_Item(System.Type,System.Object):this
          -8 (-2.02% of base) : 87601.dasm - System.Net.Http.Functional.Tests.CustomHttpClientTest:Send_NullRequest_ThrowsException():this
          -8 (-1.69% of base) : 87562.dasm - System.Net.Http.Functional.Tests.HttpClientTest:DefaultRequestVersion_Roundtrips():this
          -4 (-1.64% of base) : 307097.dasm - System.ComponentModel.DataAnnotations.Tests.ValidationAttributeTests:TestNoThrowIfOverrideIsValid01()
          -4 (-1.54% of base) : 279329.dasm - System.Threading.Tasks.Tests.UserActionEnumerator`1[__Canon][System.__Canon]:MoveNext():bool:this
          -4 (-1.54% of base) : 279336.dasm - System.Threading.Tasks.Tests.UserActionEnumerator`1[Byte][System.Byte]:MoveNext():bool:this
          -8 (-1.44% of base) : 87561.dasm - System.Net.Http.Functional.Tests.HttpClientTest:DefaultRequestVersion_InitialValueExpected():this
          -4 (-1.41% of base) : 87769.dasm - System.Net.Http.Functional.Tests.MessageProcessingHandlerTest:Ctor_CreateDispose_Success():this
          -4 (-1.23% of base) : 87770.dasm - System.Net.Http.Functional.Tests.MessageProcessingHandlerTest:Ctor_CreateWithHandlerDispose_Success():this
          -8 (-1.10% of base) : 242383.dasm - System.Data.SqlClient.TdsParserStateObject:SNIWritePacket(System.Data.SqlClient.PacketHandle,byref,bool,bool):System.Threading.Tasks.Task:this
          -8 (-1.07% of base) : 196277.dasm - System.Data.SqlClient.TdsParserStateObject:SNIWritePacket(System.Data.SqlClient.PacketHandle,byref,bool,bool):System.Threading.Tasks.Task:this
          -8 (-1.03% of base) : 120219.dasm - System.Net.Tests.HttpListenerTests:EndGetContext_AlreadyCalled_ThrowsInvalidOperationException():this
          -4 (-0.99% of base) : 115752.dasm - <>c__DisplayClass0_0:<TestEtw>b__10():this
          -8 (-0.95% of base) : 90484.dasm - <SendAsync_NullRequest_ThrowsException>d__2:MoveNext():this

50 total methods with Code Size differences (50 improved, 0 regressed), 0 unchanged.


@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 9, 2021
@ghost
Copy link

ghost commented Oct 9, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #60023
Fixes #60021
This PR removes redundant load-only or full memory barriers if we already emitted one in current IG and there were no memory accesses. Examples:

class Prog
{
    volatile int a;

    void Test1(Prog p) => p.a = a;

    void Test2() => a++;
}

Codegen diff

; Method Prog:Test1(Prog):this
G_M46938_IG01:
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
						;; bbWeight=1    PerfScore 1.50

G_M46938_IG02:
            ldr     w0, [x0,#8]
-           dmb     ishld
            dmb     ish
            str     w0, [x1,#8]
						;; bbWeight=1    PerfScore 24.00

G_M46938_IG03:
            ldp     fp, lr, [sp],#16
            ret     lr
						;; bbWeight=1    PerfScore 2.00
-; Total bytes of code: 32
+; Total bytes of code: 28

; Method Prog:Test2():this
G_M48563_IG01:
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
						;; bbWeight=1    PerfScore 1.50

G_M48563_IG02:
            ldr     w1, [x0,#8]
-           dmb     ishld
+           dmb     ish
            add     w1, w1, #1
-           dmb     ish
            str     w1, [x0,#8]
						;; bbWeight=1    PerfScore 24.50

G_M48563_IG03:
            ldp     fp, lr, [sp],#16
            ret     lr
						;; bbWeight=1    PerfScore 2.00
-; Total bytes of code: 36
+; Total bytes of code: 32
Author: EgorBo
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@EgorBo
Copy link
Member Author

EgorBo commented Oct 9, 2021

/azp run runtime-coreclr jitstressregs, runtime-coreclr outerloop, runtime-coreclr jitstress2-jitstressregs, runtime-coreclr gcstress0x3-gcstress0xc

@dotnet dotnet deleted a comment from azure-pipelines bot Oct 9, 2021
@dotnet dotnet deleted a comment from azure-pipelines bot Oct 9, 2021
@EgorBo
Copy link
Member Author

EgorBo commented Oct 10, 2021

PTAL @dotnet/jit-contrib

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but may want to get another review.

@EgorBo
Copy link
Member Author

EgorBo commented Oct 11, 2021

cc @echesakovMSFT

@echesakov echesakov self-requested a review October 12, 2021 01:38
@echesakov
Copy link
Contributor

@EgorBo In a case when a volatile write followed by a volatile read (it is an inversion of your example) - we will still insert a barrier before the write and after the read to guarantee their release/acquire semantics. Is this correct interpretation of the change?

@EgorBo
Copy link
Member Author

EgorBo commented Oct 12, 2021

@EgorBo In a case when a volatile write followed by a volatile read (it is an inversion of your example) - we will still insert a barrier before the write and after the read to guarantee their release/acquire semantics. Is this correct interpretation of the change?

volatile int a;

int Test2()
{
    a = 42; // store
    return a; // load
}

In this case we emit:

            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
G_M18266_IG02:
            mov     w1, #42
            dmb     ish
            str     w1, [x0,#8]
            ldr     w0, [x0,#8]
            dmb     ishld
G_M18266_IG03:
            ldp     fp, lr, [sp],#16
            ret     lr

and nothing is optimized in this case since there are memory accesses in-between two barriers here.

Copy link
Contributor

@echesakov echesakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@EgorBo EgorBo merged commit 845e66c into dotnet:main Oct 13, 2021
#ifdef TARGET_ARMARCH
// Reset emitLastMemBarrier for new IG
emitLastMemBarrier = nullptr;
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should really be:

if (!emitAdd) {
  emitLastMemBarrier = nullptr;
}

because if we're saving this IG for the purpose of creating a new, "overflow" IG, we're not creating a label, so it should be treated as contiguous.

It's unfortunate that this "peephole optimization" is spread around the emitter instead of being handled entirely within codegen. E.g., instead of emitLastMemBarrier, couldn't there be a codegen variable that keeps track of the last memory barrier, and clear that at BasicBlock boundaries (not IG boundaries)? You're leveraging a convenient chokepoint in the emitter, appendToCurIG, to clear this if a load/store is found, but maybe there is something not much more difficult in codegen only (and not touching the emitter).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do that in codegen first but it didn't look safe to me - I wasn't sure if I remove "GTF_VOLATILE" flag from a load it won't be then optimized further in emitter or in other place of codegen. I chose emitter because on arm the only instructions we need memorybarrier for is ldr and str (and their variations) unlike x86 where it's way more complicated. And it's where other last-chance peepholes live.

because if we're saving this IG for the purpose of creating a new, "overflow" IG, we're not creating a label, so it should be treated as contiguous.

Yeah, I kept that in mind, but it didn't produce any diffs so I decided to leave it as is. I was mostly interested in optimizing things like volatileFiled++ or volatileFiled*=10 etc that we have in some places in Task. Fortunately, we don't emit many memory barriers on arm - for performance-critical code we mostly use Volatile.Read/Write that doesn't produce them.

@ghost ghost locked as resolved and limited conversation to collaborators Nov 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[JIT][arm64] double dmb [JIT][arm64] recognize Interlocked.Increment in volatile increment

4 participants