KEMBAR78
add demangling of V0 Rust symbols by arielb1 · Pull Request #1070 · async-profiler/async-profiler · GitHub
Skip to content

Conversation

arielb1
Copy link
Collaborator

@arielb1 arielb1 commented Dec 1, 2024

Description

Uses the code from https://github.com/rust-lang/rustc-demangle/pull/75/files to allow demangling V0 ("new-style") Rust symbols

Motivation and context

Currently, async-profiler is unable to demangle "new-style" Rust symbols. This PR adds a way of demangling them.

How has this been tested?

Currently, the demangler has been fuzzed against the official Rust demangler and found to be byte-to-byte correct. Still waiting to test this PR in the real world (cc @Mark-Simulacrum).


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

src/demangle.cpp Outdated
struct demangle demangle;
rust_demangle_demangle(s, &demangle);
if (rust_demangle_is_known(&demangle)) {
return demangleRust(s, &demangle);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is demangling performed in two steps? Can be the logic encapsulated in one call?

Copy link
Collaborator Author

@arielb1 arielb1 Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because that's how it's done in the Rust code and the C code is very based on the Rust code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to keep this similarity? In any case, I'd expect to have this logic encapsulated in demangleRust rather than in a more generic Demangle::demangle.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

demangle.cpp is supposed to be in rust-lang/rustc-demangle#75

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to keep this similarity? In any case, I'd expect to have this logic encapsulated in demangleRust rather than in a more generic Demangle::demangle.

The code being this way means we don't try to demangle invalid Rust symbols as C++. I think that is correct.

Comment on lines 142 to 152
if (s_len >= 2 && !strncmp(s, "_R", strlen("_R"))) {
inner = s+2;
inner_len = s_len - 2;
} else if (s_len >= 1 && !strncmp(s, "R", strlen("R"))) {
inner = s+1;
inner_len = s_len - 1;
} else if (s_len >= 3 && !strncmp(s, "__R", strlen("__R"))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are 3 options? Don't V0 symbols always start with _R?

Copy link
Collaborator Author

@arielb1 arielb1 Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird MSVC/Mac behavior, not relevant for rust-profiler

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Async-profiler already strips the first _ when parsing Mach-O symbols. And it will not support MSVC any time soon.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to keep the code synced with rust-lang/rustc-demangle#75

src/demangle.cpp Outdated
return result;
free(result);
// demangling Rust failed, just return the original string
return strdup(s);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is supposed to return NULL if it's not a valid Rust symbol. Then the caller attempts to demangle it as a C++ symbol, and if it also fails, returns the original string.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will change

}
}

char* Demangle::demangle(const char* s, bool full_signature) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that Demangle::demangle is called only for symbols starting with _Z.
IIUC, it's not the case for V0 format.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new revision changes that

@arielb1 arielb1 force-pushed the demangle-rust branch 4 times, most recently from 5d39cf0 to 24b162a Compare December 2, 2024 16:17
@krk
Copy link
Contributor

krk commented Dec 2, 2024

Great work! We could add some tests, using the C++ test runner as in: https://github.com/async-profiler/async-profiler/blob/master/test/native/perfEventsLinuxTest.cpp

@arielb1
Copy link
Collaborator Author

arielb1 commented Dec 2, 2024

Great work! We could add some tests, using the C++ test runner as in: https://github.com/async-profiler/async-profiler/blob/master/test/native/perfEventsLinuxTest.cpp

Any pointers on this? The original code does not have unit/integration tests.

@arielb1
Copy link
Collaborator Author

arielb1 commented Dec 3, 2024

added tests

@arielb1
Copy link
Collaborator Author

arielb1 commented Dec 4, 2024

review ping

@krk
Copy link
Contributor

krk commented Dec 4, 2024

For the testing part of the PR:

Thank you for adding tests! While we don't currently have formal unit testing guidelines, I appreciate your contribution toward improving test coverage.

If needed, you can find coverage reports in the GHA checks as a link, such as in Test and Publish Nightly Builds / test (linux-x64,, corretto 21) (pull_request) under the Upload coverage report step.

We don’t have a specific coverage percentage target, but it would be useful to add tests exercising some of the functions with zero coverage, especially in the following areas:

  • Punycode-related code: punycode_decode, code_to_utf8, utf8_next_char.
  • Constants-related: printer_print_const, printer_print_const_str_literal.

@arielb1
Copy link
Collaborator Author

arielb1 commented Dec 4, 2024

We don’t have a specific coverage percentage target, but it would be useful to add tests exercising some of the functions with zero coverage, especially in the following areas:

There are good tests of that code in https://github.com/rust-lang/rustc-demangle including quite intensive fuzzing, which are written in Rust and therefore would be annoying to copy over

@arielb1
Copy link
Collaborator Author

arielb1 commented Dec 4, 2024

Copied some of these tests over to increase the coverage

src/demangle.cpp Outdated
}

char* Demangle::demangleRust(const char* s, const char* e) {
char* Demangle::demangleRust(const char *s, struct demangle const *demangle, bool full_signature) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like s argument is redundant here: it is only used for initial size estimation. struct demangle already has everything needed to estimate result size even better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I just remove the size estimation logic and start from size 64 and grow binary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is smarter

src/demangle.cpp Outdated
*r++ = ':';
char* result = NULL, *new_result;
for (size_t demangled_size = strlen(s) * 3 / 2 + 1; demangled_size < 1000000; demangled_size *= 2) {
new_result = (char *)realloc(result, demangled_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

realloc is not used for its intended purpose (no need to copy previous data).
The code will be even simpler if we use malloc here and free after rust_demangle_display_demangle.

src/demangle.cpp Outdated
}
// try to demangle as Rust
struct demangle demangle;
rust_demangle_demangle(s, &demangle);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This runs some non-trivial pre-parsing. I wonder if this may have performance impact, given that the code is also executed for every C++ symbol.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be fairly fast. most C++ symbols don't start with _ZN and a digit, and even if they do it just follows the list of lengths.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most C++ symbols don't start with _ZN and a digit

The majority of JVM symbols start with _ZN and a digit:

nm /usr/java/jdk-23.0.1/lib/server/libjvm.so | grep " _ZN[1-9]" | wc -l
34181

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these symbols are demangled as Rust producing a different result, e.g.:

_ZN12_GLOBAL__N_113single_threadE
Rust: _GLOBAL__N_1::single_thread
CPP: (anonymous namespace)::single_thread

src/demangle.h Outdated
#ifndef _DEMANGLE_H
#define _DEMANGLE_H

#include "rust-demangle.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How exactly Rust symbols are demangled is an implementation detail; I don't think Rust-specific headers should appear in the declaration part of Demangle class. In fact, nothing is really needed from rust-demangle.h here - you can just put a forward declaration of struct demangle.

src/demangle.h Outdated
private:
static char* demangleCpp(const char* s);
static char* demangleRust(const char* s, const char* e);
static char* demangleRust(const char *s, struct demangle const *demangle, bool full_signature);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will not ask you to restyle rust-demangle* sources (maybe, except for filenames), since they came from a different project, but please don't change coding style of the existing sources.

Comment on lines 1741 to 1746
// Advance by the length
for (size_t i = 0; i < len; i++) {
if (chars_len == 0) {
return DemangleInvalid;
}
c = *chars++;
chars_len--;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doing this char-by-char instead of just incrementing by len at once?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is slightly weird due to being copied from Rust code. changed it to be more idiomatic C

Copy link
Collaborator Author

@arielb1 arielb1 Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed it to be more C


// demangle calls _cxa_demangle, which should support demangling Itanium-mangled symbols across all Linux systems,
// but might have a different mangling format on some other platforms.
#ifdef __linux__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is Linux-specific here? Why can't it run on macOS?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the demangling scheme for MacOS is the same as the one for Linux.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It calls _cxa_demangle and I'm not sure how portable this is

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It certainly works on macOS, async-profiler uses it.

#ifdef __linux__
TEST_CASE(Demangle_test_demangle_cpp) {
const char *s = Demangle::demangle("_ZNSt15basic_streambufIwSt11char_traitsIwEE9pbackfailEj", false);
CHECK_EQ (strcmp(s, "std::basic_streambuf<wchar_t, std::char_traits<wchar_t> >::pbackfail"), 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CHECK_EQ supports comparing strings - no need for strcmp.
Also, redundant space before (.

Copy link
Collaborator Author

@arielb1 arielb1 Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't work for me for some reason and I tried it a few different ways. Do you have an example that works?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example is

CHECK_EQ(event_type->name, name_); \

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krk

I get errors like

        actual values: s = 94904844431968 (0x5650C18A3260), "std::basic_streambuf<wchar_t, std::char_traits<wchar_t> >::pbackfail" = 94904675958744 (0x5650B77F7FD8)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does work if you do it like this:

    char *s = Demangle::demangle("_ZNSt15basic_streambufIwSt11char_traitsIwEE9pbackfailEj", false);
    const char *e = "std::basic_streambuf<wchar_t, std::char_traits<wchar_t> >::pbackfail";
    CHECK_EQ(s, e);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to repro, I get this:

    const char *a = "asd";
    const char *b = "asdf";
    CHECK_EQ(a, b);

// Assertion failed: (a == b),
//         actual values: a = "asd", b = "asdf"

Could you share the code for the useless errors you got?

TEST_CASE(Demangle_test_demangle_cpp) {
const char *s = Demangle::demangle("_ZNSt15basic_streambufIwSt11char_traitsIwEE9pbackfailEj", false);
CHECK_EQ (strcmp(s, "std::basic_streambuf<wchar_t, std::char_traits<wchar_t> >::pbackfail"), 0);
free((void*)s);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just free(s)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't compile

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found how to fix it

@arielb1
Copy link
Collaborator Author

arielb1 commented Dec 5, 2024 via email

@krk
Copy link
Contributor

krk commented Dec 5, 2024

Thanks, fixed in #1072

@arielb1
Copy link
Collaborator Author

arielb1 commented Dec 5, 2024

Updated PR to include the fix + appropriately cleaned unit tests

@arielb1
Copy link
Collaborator Author

arielb1 commented Dec 6, 2024 via email

@arielb1
Copy link
Collaborator Author

arielb1 commented Dec 6, 2024 via email

@apangin
Copy link
Member

apangin commented Dec 6, 2024

if a symbol starts with _R or if it ends with a h17…E use the Rust demangling, otherwise use cxa_demangle

Sounds good to me

@arielb1
Copy link
Collaborator Author

arielb1 commented Dec 6, 2024 via email

@apangin
Copy link
Member

apangin commented Dec 6, 2024

Just a minor thing: perhaps, rename rust-demangle.* to rustDemangle.* or something (to match file naming convention of the project), and add a link to https://github.com/rust-lang/rustc-demangle in the source code to make it clear where the code originates from?

@arielb1
Copy link
Collaborator Author

arielb1 commented Dec 6, 2024 via email

@arielb1
Copy link
Collaborator Author

arielb1 commented Dec 6, 2024

updated to use the old (same as before this PR) Rust vs. C++ heuristic

@apangin apangin merged commit bce178e into async-profiler:master Dec 6, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants