[libc++] <regex>: Character class [\W\D] fails to match alphabetic characters #131516

@muellerj2

The (ECMAScript) regular expression [\W\D] describes a character class that matches the union of (a) all non-alphanumeric characters and (b) all non-digits. Every non-alphanumeric character is also a non-digit, so the character class should effectively be equivalent to [\D] and match all non-digits. However, libc++'s regex implementation only matches non-alphanumeric characters.

Test case:

#include <iostream>
#include <regex>

using namespace std;

int main()
{
    regex re(R"([\W\D])");
    cout << "matches alphabetic: " << regex_match("a", re) << '\n'
         << "matches digit: " << regex_match("0", re) << '\n' 
         << "matches non-alphanumeric: " << regex_match(".", re);
    
    return 0;
}

https://godbolt.org/z/YdvY4Pb6a

This prints:

matches alphabetic: 0
matches digit: 0
matches non-alphanumeric: 1

But it should print (as MSVC STL and libstdc++ do here):

matches alphabetic: 1
matches digit: 0
matches non-alphanumeric: 1

The problem lies here:

_LIBCPP_HIDE_FROM_ABI void __add_neg_class(typename regex_traits<_CharT>::char_class_type __mask) {
    __neg_mask_ |= __mask;
}

The negated character classes are bitwise ORed, but De Morgan's law says that (not w) or (not d) = not (w and d), so the bit masks would really have to be bitwise ANDed.
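
The difference can be reproduced outside the regex engine with nothing but std::regex_traits<char>. The following is a minimal sketch, not libc++'s internal code: the two helper functions are hypothetical names introduced for illustration, and the default "C" locale is assumed. The first helper negates a test against the ORed mask, the second tests each negated class on its own.

```cpp
#include <regex>

// Hypothetical helpers for illustration only; these are not libc++ internals.
// Both assume the default "C" locale.

// Negating a test against the ORed mask yields (not w) AND (not d),
// so only characters outside both classes are accepted.
bool matches_with_ored_mask(char c) {
    std::regex_traits<char> traits;
    const char w[] = "w";
    const char d[] = "d";
    auto w_mask = traits.lookup_classname(w, w + 1);
    auto d_mask = traits.lookup_classname(d, d + 1);
    return !traits.isctype(c, w_mask | d_mask);
}

// Testing each negated class separately yields (not w) OR (not d),
// which is the union that [\W\D] describes.
bool matches_union_of_negations(char c) {
    std::regex_traits<char> traits;
    const char w[] = "w";
    const char d[] = "d";
    auto w_mask = traits.lookup_classname(w, w + 1);
    auto d_mask = traits.lookup_classname(d, d + 1);
    return !traits.isctype(c, w_mask) || !traits.isctype(c, d_mask);
}
```

For 'a', the first helper returns false and the second returns true, which mirrors the mismatch in the test case above. For '0' both return false, and for '.' both return true.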

But bitwise ANDing is problematic as well: the standard only guarantees that bitwise ORing works and doesn't state that bitwise ANDing corresponds to the intersection of the character classes (see [re.grammar/9]). ANDing might still work for libc++'s std::regex_traits<char> and std::regex_traits<wchar_t> traits classes (although I haven't checked that), but it might not do the right thing for some user-provided traits classes.
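
One way to avoid combining masks at all would be to keep every negated class as a separate entry and query isctype once per entry. The sketch below only illustrates that idea under stated assumptions: NegatedClassSet and matches_neg_w_or_neg_d are hypothetical names, and this is neither libc++'s data structure nor a fix proposed in this report. It relies only on isctype being called with a single mask returned by lookup_classname, so it makes no assumption about how a traits class encodes its masks.

```cpp
#include <regex>
#include <vector>

// Hypothetical sketch; not libc++'s actual data structure.
template <class CharT, class Traits = std::regex_traits<CharT>>
class NegatedClassSet {
public:
    using mask_type = typename Traits::char_class_type;

    // Record each negated class (such as \W or \D) on its own
    // instead of merging it into a single mask.
    void add_neg_class(mask_type mask) { neg_masks_.push_back(mask); }

    // A character is in the union of the negated classes if at least
    // one of the underlying classes does not contain it.
    bool matches(CharT ch, const Traits& traits) const {
        for (mask_type mask : neg_masks_) {
            if (!traits.isctype(ch, mask)) {
                return true;
            }
        }
        return false;
    }

private:
    std::vector<mask_type> neg_masks_;
};

// Builds the set corresponding to [\W\D] and tests a single character.
// Assumes the default "C" locale.
bool matches_neg_w_or_neg_d(char ch) {
    std::regex_traits<char> traits;
    const char w[] = "w";
    const char d[] = "d";
    NegatedClassSet<char> set;
    set.add_neg_class(traits.lookup_classname(w, w + 1));
    set.add_neg_class(traits.lookup_classname(d, d + 1));
    return set.matches(ch, traits);
}
```

The trade-off is one isctype call per negated class instead of a single call, which seems acceptable for the handful of class escapes a bracket expression typically contains.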

Labels: libc++, regex
