Improve ARM64 atomics for Clang by StephanTLavavej · Pull Request #4870 · microsoft/STL · GitHub

Conversation

StephanTLavavej
Member

This mirrors @mcfi's MSVC-PR-567635 "Leverage clang builtins __atomic_load_n/__atomic_store_n for more efficient acquired loads and released stores on Arm64" as of Iteration 13. His description:

Clang doesn't support __load_acquire/__ldar/__stlr intrinsics, so applications built with clang still generate full barriers for acquired loads and released stores. This PR changes the STL code to leverage clang builtins __atomic_load_n/__atomic_store_n to generate more efficient ldar/stlr for acquired loads and released stores.

This improved a benchmark score by ~2.8% on real hardware.

Resolves llvm/llvm-project#62103 because we're going to use Clang's builtins now.

Works towards #1133.
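As a rough illustration of the approach described above, the following sketch wraps the Clang/GCC builtins `__atomic_load_n` and `__atomic_store_n` with acquire and release ordering. The helper names (`load_acquire_u32`, `store_release_u32`, `publish_and_read`) are hypothetical and are not the identifiers used in the STL sources. Per the PR description, Clang targeting ARM64 is expected to lower these to `ldar`/`stlr` instead of a plain access plus a full barrier; that codegen is not verified here.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not the names used in microsoft/STL.

// Acquire load via the Clang/GCC builtin. On ARM64, Clang is expected to emit
// an ldar instruction here (assumption based on the PR description).
inline std::uint32_t load_acquire_u32(const std::uint32_t* ptr) noexcept {
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}

// Release store via the Clang/GCC builtin. On ARM64, Clang is expected to emit
// an stlr instruction here (same assumption).
inline void store_release_u32(std::uint32_t* ptr, std::uint32_t value) noexcept {
    __atomic_store_n(ptr, value, __ATOMIC_RELEASE);
}

// Single-threaded round trip: store a value with release ordering, then read
// it back with acquire ordering. This only exercises the calls; it does not
// demonstrate cross-thread synchronization.
inline std::uint32_t publish_and_read(std::uint32_t value) noexcept {
    std::uint32_t slot = 0;
    store_release_u32(&slot, value);
    return load_acquire_u32(&slot);
}
```

Compiling this with Clang for an ARM64 target and inspecting the assembly is one way to check whether `ldar`/`stlr` are actually generated for a given toolchain.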

StephanTLavavej added the "performance" (Must go faster) and "ARM64" (Related to the ARM64 architecture) labels on Jul 30, 2024
StephanTLavavej requested a review from a team as a code owner on July 30, 2024 05:30
StephanTLavavej merged commit a357ff1 into microsoft:main on Jul 31, 2024
StephanTLavavej deleted the arm64-atomics branch on July 31, 2024 03:50
@StephanTLavavej
Member Author

I overrode policy and merged, as MSVC-PR-567635 was merged to prod/fe.



Successfully merging this pull request may close these issues.

Consider implementing ARM64 __load_acquire/__stlr intrinsics
