-
Notifications
You must be signed in to change notification settings - Fork 937
add demangling of V0 Rust symbols #1070
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dfd4712 to
d9606a2
Compare
src/demangle.cpp
Outdated
| struct demangle demangle; | ||
| rust_demangle_demangle(s, &demangle); | ||
| if (rust_demangle_is_known(&demangle)) { | ||
| return demangleRust(s, &demangle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is demangling performed in two steps? Can be the logic encapsulated in one call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because that's how it's done in the Rust code and the C code is very based on the Rust code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to keep this similarity? In any case, I'd expect to have this logic encapsulated in demangleRust rather than in a more generic Demangle::demangle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
demangle.cpp is supposed to be in rust-lang/rustc-demangle#75
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to keep this similarity? In any case, I'd expect to have this logic encapsulated in demangleRust rather than in a more generic Demangle::demangle.
The code being this way means we don't try to demangle invalid Rust symbols as C++. I think that is correct.
| if (s_len >= 2 && !strncmp(s, "_R", strlen("_R"))) { | ||
| inner = s+2; | ||
| inner_len = s_len - 2; | ||
| } else if (s_len >= 1 && !strncmp(s, "R", strlen("R"))) { | ||
| inner = s+1; | ||
| inner_len = s_len - 1; | ||
| } else if (s_len >= 3 && !strncmp(s, "__R", strlen("__R"))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are 3 options? Don't V0 symbols always start with _R?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird MSVC/Mac behavior, not relevant for rust-profiler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Async-profiler already strips the first _ when parsing Mach-O symbols. And it will not support MSVC any time soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to keep the code synced with rust-lang/rustc-demangle#75
src/demangle.cpp
Outdated
| return result; | ||
| free(result); | ||
| // demangling Rust failed, just return the original string | ||
| return strdup(s); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function is supposed to return NULL if it's not a valid Rust symbol. Then the caller attempts to demangle it as a C++ symbol, and if it also fails, returns the original string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will change
| } | ||
| } | ||
|
|
||
| char* Demangle::demangle(const char* s, bool full_signature) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that Demangle::demangle is called only for symbols starting with _Z.
IIUC, it's not the case for V0 format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the new revision changes that
5d39cf0 to
24b162a
Compare
|
Great work! We could add some tests, using the C++ test runner as in: https://github.com/async-profiler/async-profiler/blob/master/test/native/perfEventsLinuxTest.cpp |
Any pointers on this? The original code does not have unit/integration tests. |
61e044c to
f5c6618
Compare
|
added tests |
|
review ping |
|
For the testing part of the PR: Thank you for adding tests! While we don't currently have formal unit testing guidelines, I appreciate your contribution toward improving test coverage. If needed, you can find coverage reports in the GHA checks as a link, such as in We don’t have a specific coverage percentage target, but it would be useful to add tests exercising some of the functions with zero coverage, especially in the following areas:
|
There are good tests of that code in https://github.com/rust-lang/rustc-demangle including quite intensive fuzzing, which are written in Rust and therefore would be annoying to copy over |
|
Copied some of these tests over to increase the coverage |
604df35 to
3a0d6cd
Compare
src/demangle.cpp
Outdated
| } | ||
|
|
||
| char* Demangle::demangleRust(const char* s, const char* e) { | ||
| char* Demangle::demangleRust(const char *s, struct demangle const *demangle, bool full_signature) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like s argument is redundant here: it is only used for initial size estimation. struct demangle already has everything needed to estimate result size even better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I just remove the size estimation logic and start from size 64 and grow binary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is smarter
src/demangle.cpp
Outdated
| *r++ = ':'; | ||
| char* result = NULL, *new_result; | ||
| for (size_t demangled_size = strlen(s) * 3 / 2 + 1; demangled_size < 1000000; demangled_size *= 2) { | ||
| new_result = (char *)realloc(result, demangled_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
realloc is not used for its intended purpose (no need to copy previous data).
The code will be even simpler if we use malloc here and free after rust_demangle_display_demangle.
src/demangle.cpp
Outdated
| } | ||
| // try to demangle as Rust | ||
| struct demangle demangle; | ||
| rust_demangle_demangle(s, &demangle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This runs some non-trivial pre-parsing. I wonder if this may have performance impact, given that the code is also executed for every C++ symbol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be fairly fast. most C++ symbols don't start with _ZN and a digit, and even if they do it just follows the list of lengths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
most C++ symbols don't start with _ZN and a digit
The majority of JVM symbols start with _ZN and a digit:
nm /usr/java/jdk-23.0.1/lib/server/libjvm.so | grep " _ZN[1-9]" | wc -l
34181
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of these symbols are demangled as Rust producing a different result, e.g.:
_ZN12_GLOBAL__N_113single_threadE
Rust: _GLOBAL__N_1::single_thread
CPP: (anonymous namespace)::single_thread
src/demangle.h
Outdated
| #ifndef _DEMANGLE_H | ||
| #define _DEMANGLE_H | ||
|
|
||
| #include "rust-demangle.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How exactly Rust symbols are demangled is an implementation detail; I don't think Rust-specific headers should appear in the declaration part of Demangle class. In fact, nothing is really needed from rust-demangle.h here - you can just put a forward declaration of struct demangle.
src/demangle.h
Outdated
| private: | ||
| static char* demangleCpp(const char* s); | ||
| static char* demangleRust(const char* s, const char* e); | ||
| static char* demangleRust(const char *s, struct demangle const *demangle, bool full_signature); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will not ask you to restyle rust-demangle* sources (maybe, except for filenames), since they came from a different project, but please don't change coding style of the existing sources.
| // Advance by the length | ||
| for (size_t i = 0; i < len; i++) { | ||
| if (chars_len == 0) { | ||
| return DemangleInvalid; | ||
| } | ||
| c = *chars++; | ||
| chars_len--; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doing this char-by-char instead of just incrementing by len at once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is slightly weird due to being copied from Rust code. changed it to be more idiomatic C
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed it to be more C
test/native/demangleTest.cpp
Outdated
|
|
||
| // demangle calls _cxa_demangle, which should support demangling Itanium-mangled symbols across all Linux systems, | ||
| // but might have a different mangling format on some other platforms. | ||
| #ifdef __linux__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is Linux-specific here? Why can't it run on macOS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if the demangling scheme for MacOS is the same as the one for Linux.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It calls _cxa_demangle and I'm not sure how portable this is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It certainly works on macOS, async-profiler uses it.
test/native/demangleTest.cpp
Outdated
| #ifdef __linux__ | ||
| TEST_CASE(Demangle_test_demangle_cpp) { | ||
| const char *s = Demangle::demangle("_ZNSt15basic_streambufIwSt11char_traitsIwEE9pbackfailEj", false); | ||
| CHECK_EQ (strcmp(s, "std::basic_streambuf<wchar_t, std::char_traits<wchar_t> >::pbackfail"), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHECK_EQ supports comparing strings - no need for strcmp.
Also, redundant space before (.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't work for me for some reason and I tried it a few different ways. Do you have an example that works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example is
| CHECK_EQ(event_type->name, name_); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get errors like
actual values: s = 94904844431968 (0x5650C18A3260), "std::basic_streambuf<wchar_t, std::char_traits<wchar_t> >::pbackfail" = 94904675958744 (0x5650B77F7FD8)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does work if you do it like this:
char *s = Demangle::demangle("_ZNSt15basic_streambufIwSt11char_traitsIwEE9pbackfailEj", false);
const char *e = "std::basic_streambuf<wchar_t, std::char_traits<wchar_t> >::pbackfail";
CHECK_EQ(s, e);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to repro, I get this:
const char *a = "asd";
const char *b = "asdf";
CHECK_EQ(a, b);
// Assertion failed: (a == b),
// actual values: a = "asd", b = "asdf"
Could you share the code for the useless errors you got?
test/native/demangleTest.cpp
Outdated
| TEST_CASE(Demangle_test_demangle_cpp) { | ||
| const char *s = Demangle::demangle("_ZNSt15basic_streambufIwSt11char_traitsIwEE9pbackfailEj", false); | ||
| CHECK_EQ (strcmp(s, "std::basic_streambuf<wchar_t, std::char_traits<wchar_t> >::pbackfail"), 0); | ||
| free((void*)s); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just free(s)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't compile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
found how to fix it
5bb6fa3 to
abecae2
Compare
|
Just inline one of the string parameters in the tests.On 5 Dec 2024, at 15:40, Kerem Kat ***@***.***> wrote:
@krk commented on this pull request.
In test/native/demangleTest.cpp:
+ CHECK_EQ(Demangle::needsDemangling("_ZN12panic_unwind3imp5panic17exception_cleanup17he4cf772173d90f46E"), true);
+ // Rust v0-mangled symbol
+ CHECK_EQ(Demangle::needsDemangling("_RNvCs6KtT2fMGqXk_8infiloop4main"), true);
+ // C++ symbol
+ CHECK_EQ(Demangle::needsDemangling("_ZNKSbIwSt11char_traitsIwESaIwEE4_Rep12_M_is_sharedEv"), true);
+ // C symbols
+ CHECK_EQ(Demangle::needsDemangling("malloc"), false);
+ CHECK_EQ(Demangle::needsDemangling("_malloc"), false);
+}
+
+// demangle calls _cxa_demangle, which should support demangling Itanium-mangled symbols across all Linux systems,
+// but might have a different mangling format on some other platforms.
+#ifdef __linux__
+TEST_CASE(Demangle_test_demangle_cpp) {
+ const char *s = Demangle::demangle("_ZNSt15basic_streambufIwSt11char_traitsIwEE9pbackfailEj", false);
+ CHECK_EQ (strcmp(s, "std::basic_streambuf<wchar_t, std::char_traits<wchar_t> >::pbackfail"), 0);
Trying to repro, I get this:
const char *a = "asd";
const char *b = "asdf";
CHECK_EQ(a, b);
// Assertion failed: (a == b),
// actual values: a = "asd", b = "asdf"
Could you share the code for the useless errors you got?
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: ***@***.***>
|
|
Thanks, fixed in #1072 |
1. include the hash when doing full-signature demangling 2. fix the C++ vs. Rust heuristic 3. add unit tests
abecae2 to
5cfe5e7
Compare
|
Updated PR to include the fix + appropriately cleaned unit tests |
|
So should I keep the old hash heuristic since it seems to work well enough?On 6 Dec 2024, at 19:12, Andrei Pangin ***@***.***> wrote:
@apangin commented on this pull request.
In src/demangle.cpp:
@@ -117,14 +60,19 @@ void Demangle::cutArguments(char* s) {
}
char* Demangle::demangle(const char* s, bool full_signature) {
- // Check if the mangled symbol ends with a Rust hash "17h<hex>E"
- const char* e = strrchr(s, 'E');
- if (e != NULL && e - s > 22 && e[-19] == '1' && e[-18] == '7' && e[-17] == 'h') {
- const char* h = e - 16;
- while ((*h >= '0' && *h <= '9') || (*h >= 'a' && *h <= 'f')) h++;
- if (h == e) {
- return demangleRust(s + 3, e - 19);
- }
+ // try to demangle as Rust
+ struct demangle demangle;
+ rust_demangle_demangle(s, &demangle);
Some of these symbols are demangled as Rust producing a different result, e.g.:
_ZN12_GLOBAL__N_113single_threadE
Rust: _GLOBAL__N_1::single_thread
CPP: (anonymous namespace)::single_thread
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: ***@***.***>
|
|
So if a symbol starts with _R or if it ends with a h17…E use the Rust demangling, otherwise use cxa_demangle.This will use cxa_demangle for legacy-mangled Rust symbols that have an LLVM termination, but I think it’s not too bad.On 6 Dec 2024, at 19:16, ***@***.*** wrote:So should I keep the old hash heuristic since it seems to work well enough?On 6 Dec 2024, at 19:12, Andrei Pangin ***@***.***> wrote:
@apangin commented on this pull request.
In src/demangle.cpp:
@@ -117,14 +60,19 @@ void Demangle::cutArguments(char* s) {
}
char* Demangle::demangle(const char* s, bool full_signature) {
- // Check if the mangled symbol ends with a Rust hash "17h<hex>E"
- const char* e = strrchr(s, 'E');
- if (e != NULL && e - s > 22 && e[-19] == '1' && e[-18] == '7' && e[-17] == 'h') {
- const char* h = e - 16;
- while ((*h >= '0' && *h <= '9') || (*h >= 'a' && *h <= 'f')) h++;
- if (h == e) {
- return demangleRust(s + 3, e - 19);
- }
+ // try to demangle as Rust
+ struct demangle demangle;
+ rust_demangle_demangle(s, &demangle);
Some of these symbols are demangled as Rust producing a different result, e.g.:
_ZN12_GLOBAL__N_113single_threadE
Rust: _GLOBAL__N_1::single_thread
CPP: (anonymous namespace)::single_thread
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: ***@***.***>
|
Sounds good to me |
|
Anything else?
|
|
Just a minor thing: perhaps, rename |
|
The C code now has a link - 9ac516a#diff-6e3cd5c659e8a31192524378ab2bf3b2a71ffd6693fa346418d045fb29cb3188R9
Will rename the file.
|
|
updated to use the old (same as before this PR) Rust vs. C++ heuristic |
fc4319c to
be35d14
Compare
9c70352 to
34f736f
Compare
Description
Uses the code from https://github.com/rust-lang/rustc-demangle/pull/75/files to allow demangling V0 ("new-style") Rust symbols
Motivation and context
Currently, async-profiler is unable to demangle "new-style" Rust symbols. This PR adds a way of demangling them.
How has this been tested?
Currently, the demangler has been fuzzed against the official Rust demangler and found to be byte-to-byte correct. Still waiting to test this PR in the real world (cc @Mark-Simulacrum).
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.