Rework crash protection by apangin · Pull Request #1427 · async-profiler/async-profiler · GitHub

Conversation

@apangin
Member

@apangin apangin commented Aug 4, 2025

Description

This is a revised implementation of the mechanism for loading from potentially unsafe memory locations (aka SafeAccess). The previous implementation was fragile and behaved incorrectly in certain cases.

Summary of changes

  1. The previous implementation assumed that the load instruction was always within 16 bytes of the beginning of the function. This was not true for debug builds. Added a label that explicitly marks the end of the critical code.
  2. The label also protects against specialization of SafeAccess for constant arguments. This is needed when compiling with clang, which does not support the noclone attribute.
  3. Fixed a clang issue where a seemingly unused argument was optimized away.
  4. Unified the SafeAccess::load functions: now there are just two variants of load and a single skipLoad.
  5. Rewrote the implementation in inline assembly for x86 and arm64 to tackle code generation quirks of gcc and clang at different optimization levels and with gcov instrumentation.
  6. If a native application did not install a custom SIGSEGV handler, async-profiler could loop forever in its own handler. Now it restores SIG_DFL to pass control to the OS crash handler (see the sketch after this list).
  7. macOS can raise either SIGSEGV or SIGBUS. Previously, async-profiler handled SIGBUS only; now it handles both.
  8. Added unit tests for SafeAccess.
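
A minimal sketch of the idea in items 6 and 7, with hypothetical handler and setup names (not the actual async-profiler code):

#include <signal.h>

// If the fault is not one of the recoverable SafeAccess loads, restore the
// default disposition and re-raise, so the OS crash handler takes over
// instead of re-entering this handler forever.
static void crashHandler(int signo, siginfo_t* info, void* ucontext) {
    // ... try to recover a SafeAccess load here ...
    signal(signo, SIG_DFL);   // give up: fall back to the default action
    raise(signo);             // hand control to the OS crash handler
}

static void installCrashHandler() {
    struct sigaction sa = {};
    sa.sa_sigaction = crashHandler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, nullptr);
    sigaction(SIGBUS, &sa, nullptr);   // macOS may raise SIGBUS instead of SIGSEGV
}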

Related issues

#1412

How has this been tested?

Added SafeAccess unit tests.
Compiled with gcc and clang at different optimization levels.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

u32 SafeAccess::skipLoad(instruction_t* pc) {
    if ((pc >= (void*)load && pc < load_end) ||
        (pc >= (void*)load32 && pc < load32_end)) {
#if defined(__x86_64__) || defined(__i386__)
Contributor

What about push [mem]; pop reg?

Member Author

I now generate the desired instruction explicitly with inline assembler.

make COMMIT_TAG=$HASH FAT_BINARY=true release -j
;;
*)
make COMMIT_TAG=$HASH CC=/usr/local/musl/bin/musl-gcc release coverage -j
Contributor

We can keep the coverage build if we exclude the SafeAccess functions with __attribute__((no_profile_instrument_function)).
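
For reference, a sketch of that approach; the function below is a hypothetical illustration, not the actual SafeAccess code:

// Keep gcov/profile instrumentation out of code that must stay a single plain load.
__attribute__((no_profile_instrument_function))
void* plainLoad(void** ptr) {
    return *ptr;
}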

Member Author

Brought the coverage build back. Inline assembler solves the problem.

void* deadbeef = (void*)(uintptr_t)0xdeadbeefdeadbeefULL;

ASSERT_EQ(SafeAccess::load(bad_ptr), 0);
ASSERT_EQ(SafeAccess::load(&good_ptr), good_ptr);
Contributor

Can we take the address of another variable in the good_ptr declaration, so that bad_ptr and good_ptr are both void**? That might make these tests easier to read.
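
A possible shape of that suggestion, reusing the names from the test above (the declarations and expected values here are illustrative, not the final test code):

void* target = (void*)0x12345678;      // arbitrary pointee for the valid case
void** good_ptr = &target;             // valid pointer, same type as bad_ptr
void** bad_ptr = (void**)deadbeef;     // unmapped address

ASSERT_EQ(SafeAccess::load(bad_ptr), 0);         // unreadable location yields the default
ASSERT_EQ(SafeAccess::load(good_ptr), target);   // readable location yields the stored value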

Member Author

done

@apangin
Member Author

apangin commented Aug 6, 2025

OK, I gave up fighting compiler quirks and rewrote the implementation with inline assembler for x86 and arm64. Other architectures fall back to the C++ implementation, which is good enough for optimized builds (debug builds probably don't work on these architectures, but we do not officially support them anyway).
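
A rough sketch of that architecture split; the x86-64 branch mirrors the diff below, while the #else branch only illustrates the plain C++ fallback, not the exact code:

void* SafeAccess::load(void** ptr, void* default_value) {
#if defined(__x86_64__)
    void* ret;
    asm volatile("mov (%1), %0" : "=a"(ret) : "r"(ptr), "S"(default_value));
    return ret;
#else
    // Other architectures: plain C++; an optimized build compiles this to a
    // single load that the signal handler can recognize and skip over.
    return *ptr;
#endif
}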

asm volatile("movl (%1), %0" : "=a"(ret) : "r"(ptr), "S"(default_value));
#elif defined(__i386__)
int32_t ret;
asm volatile("movl (%1), %0" : "=a"(ret) : "r"(ptr), "a"(default_value));
Contributor

This can be mov, which is equivalent to movl on 32-bit x86.

Member Author

Yes, they are the same on x86-32.

void* SafeAccess::load(void** ptr, void* default_value) {
#if defined(__x86_64__)
    void* ret;
    asm volatile("mov (%1), %0" : "=a"(ret) : "r"(ptr), "S"(default_value));
Contributor

In the output, default_value/rsi is optimized away, and it is passed in esi, not rsi, from the caller:

load(void**, void*):
 mov    rax,QWORD PTR [rdi]
load_end:
 ret
...
 mov    esi,0x99
...
 call load

https://godbolt.org/z/6nvef3zsb

It seems this could lead to an incorrect default_value when the higher bits of rsi are not cleared, given that StackFrame::arg1 uses rsi.

Member Author

In the output default_value/rsi is optimized away

I intentionally use the register designated for the second argument by the ABI (rsi/esi), so that SafeAccess::load compiles to a single mov instruction.

It seems this could lead to incorrect default_value when higher bits of rsi

No: mov esi, int32 clears the high bits of rsi.
The compiler uses this instruction because it is smaller in size.
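
For readers less familiar with this x86-64 detail, a tiny self-contained illustration (x86-64 only; register values chosen arbitrarily):

#include <cstdint>
#include <cstdio>

int main() {
    uint64_t rsi_value;
    // Writing the 32-bit esi zero-extends into the full 64-bit rsi,
    // so no stale data can remain in the upper 32 bits.
    asm volatile("mov $-1, %%rsi\n\t"      // rsi = 0xffffffffffffffff
                 "mov $0x99, %%esi\n\t"    // esi = 0x99  =>  rsi = 0x0000000000000099
                 "mov %%rsi, %0"
                 : "=r"(rsi_value) : : "rsi");
    printf("%#llx\n", (unsigned long long)rsi_value);   // prints 0x99
}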

@apangin apangin merged commit a9e8c8d into master Aug 6, 2025
39 checks passed
@apangin apangin deleted the crash-handler branch August 6, 2025 19:07