This page is a snapshot from the LWG issues list, see the Library Active Issues List for more information and the meaning of WP status.
char to sequences of wchar_t
Section: 28.5.6.4 [format.formatter.spec] Status: WP Submitter: Mark de Wever Opened: 2023-06-01 Last modified: 2024-07-08
Priority: 3
Discussion:
I noticed some interesting features introduced by the range-based formatters in C++23:
// Ill-formed in C++20 and C++23
const char* cstr = "hello";
char* str = const_cast<char*>(cstr);
std::format(L"{}", str);
std::format(L"{}", cstr);
// Ill-formed in C++20
// In C++23 they give L"['h', 'e', 'l', 'l', 'o']"
std::format(L"{}", "hello"); // A libc++ bug prevents this from working.
std::format(L"{}", std::string_view("hello"));
std::format(L"{}", std::string("hello"));
std::format(L"{}", std::vector{'h', 'e', 'l', 'l', 'o'});
An example is shown here. This only shows libc++ since libstdc++ and MSVC STL have not implemented the formatting ranges papers (P2286R8 and P2585R0) yet.
The difference between C++20 and C++23 is the existence of range formatters. These formatters use the formatter specialization formatter<char, wchar_t>, which converts the sequence of chars
to a sequence of wchar_ts.
In this conversion same_as<char, charT> is false, thus the requirements
of the range-type s and ?s ([tab:formatter.range.type]) aren't met. So
the following is ill-formed:
std::format(L"{:s}", std::string("hello")); // Not L"hello"
It is surprising that some string types can be formatted as a sequence
of wide characters, but others cannot. A sequence of characters can be a
sequence of UTF-8 code units. This is explicitly supported in the width
estimation of string types. The conversion of char to wchar_t will
convert the individual code units, which will give incorrect results for
multi-byte code points. It will not transcode UTF-8 to UTF-16/32. The
current behavior is not in line with the note in
28.5.6.4 [format.formatter.spec]/2:
[Note 1: Specializations such as
formatter<wchar_t, char> and formatter<const char*, wchar_t> that would require implicit multibyte / wide string or character conversion are disabled. — end note]
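To make the code-unit problem concrete, the following is a minimal sketch of what an element-wise char to wchar_t conversion amounts to. The helper name widen_code_units is hypothetical and not part of any library, and the cast through unsigned char is an assumption made here to keep the resulting values non-negative; it is not taken from any implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a library function): widens every char code
// unit on its own, without any UTF-8 decoding.
std::wstring widen_code_units(const std::string& narrow) {
    std::wstring wide;
    wide.reserve(narrow.size());
    for (char c : narrow) {
        // Assumption: go through unsigned char so that bytes >= 0x80
        // do not sign-extend on platforms where char is signed.
        wide.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    }
    return wide;
}
```

For the UTF-8 encoding of U+00E9 (the two bytes 0xC3 0xA9) this helper yields two wide characters with the values 0xC3 and 0xA9, rather than the single code point U+00E9 that a transcoding step would produce.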
Disabling this could be done by explicitly disabling the char to wchar_t
sequence formatter. Something along the lines of
template<ranges::input_range R>
requires(format_kind<R> == range_format::sequence &&
same_as<remove_cvref_t<ranges::range_reference_t<R>>, char>)
struct formatter<R, wchar_t> : __disabled_formatter {};
where __disabled_formatter satisfies 28.5.6.4 [format.formatter.spec]/5, would
do the trick. This disables the conversion for all sequences, not only
the string types, so vector, array, span, etc. would be disabled.
This does not disable the conversion in the range_formatter, which allows
users to explicitly opt in to this formatter for their own
specializations.
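As an illustration of what a base class like __disabled_formatter could look like, here is a minimal sketch. All names in it (disabled_formatter_sketch, char_sequence_wide_formatter_sketch, behaves_as_disabled) are hypothetical. It assumes only the five type-trait requirements that 28.5.6.4 [format.formatter.spec]/5 places on a disabled specialization, and it does not specialize std::formatter itself.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the exposition-only __disabled_formatter.
// Deleting these members makes every derived type non-constructible
// and non-assignable.
struct disabled_formatter_sketch {
    disabled_formatter_sketch() = delete;
    disabled_formatter_sketch(const disabled_formatter_sketch&) = delete;
    disabled_formatter_sketch& operator=(const disabled_formatter_sketch&) = delete;
};

// Hypothetical formatter-like type that inherits the disabled state.
struct char_sequence_wide_formatter_sketch : disabled_formatter_sketch {};

// True when F shows the five properties required of a disabled formatter.
template <class F>
constexpr bool behaves_as_disabled =
    !std::is_default_constructible_v<F> &&
    !std::is_copy_constructible_v<F> &&
    !std::is_move_constructible_v<F> &&
    !std::is_copy_assignable_v<F> &&
    !std::is_move_assignable_v<F>;

static_assert(behaves_as_disabled<char_sequence_wide_formatter_sketch>);
```

Deleting the copy operations in the base also suppresses the implicit move operations, so all five traits evaluate to false for the derived type without further declarations.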
An alternative would be to only disable this conversion for string type
specializations (28.5.6.4 [format.formatter.spec]/2.2) where char to
wchar_t is used:
template<size_t N> struct formatter<charT[N], charT>;
template<class traits, class Allocator> struct formatter<basic_string<charT, traits, Allocator>, charT>;
template<class traits> struct formatter<basic_string_view<charT, traits>, charT>;
Disabling the following two is not strictly required:
template<> struct formatter<char*, wchar_t>;
template<> struct formatter<const char*, wchar_t>;
However, if (const) char* becomes an input_range
in a future version of C++, these formatters would become enabled.
Disabling all five instead of the three required specializations seems like a
future-proof solution.
Since the specialization
template<> struct formatter<wchar_t, char>;
is disabled, there are no issues for wchar_t to char conversions.
Do we want to allow string types of chars to be formatted as
sequences of wchar_ts?
Do we want to allow non string type sequences of chars to be
formatted as sequences of wchar_ts?
Should we disable char to wchar_t conversion in the range_formatter?
SG16 has indicated they would like to discuss this issue during a telecon.
[2023-06-08; Reflector poll]
Set status to SG16 and priority to 3 after reflector poll.
[2023-07-26; Mark de Wever provides wording confirmed by SG16]
[2024-03-18; Tokyo: move to Ready]
[St. Louis 2024-06-29; Status changed: Voting → WP.]
Proposed resolution:
This wording is relative to N4950.
Modify 28.5.6.4 [format.formatter.spec] as indicated:
[Drafting note: The unwanted conversion happens due to the
formatter base class specialization (28.5.7.3 [format.range.fmtdef])
struct range-default-formatter<range_format::sequence, R, charT>
which is defined in the header <format>. Therefore the disabling is only needed in this header. — end drafting note]
-2- […]
The parse member functions of these formatters interpret the format specification as a std-format-spec as described in 28.5.2.2 [format.string.std].
[Note 1: Specializations such as formatter<wchar_t, char> and formatter<const char*, wchar_t> that would require implicit multibyte / wide string or character conversion are disabled. — end note]
-?- The header <format> provides the following disabled specializations:
(?.1) — The string type specializations
template<> struct formatter<char*, wchar_t>;
template<> struct formatter<const char*, wchar_t>;
template<size_t N> struct formatter<char[N], wchar_t>;
template<class traits, class Allocator> struct formatter<basic_string<char, traits, Allocator>, wchar_t>;
template<class traits> struct formatter<basic_string_view<char, traits>, wchar_t>;
-3- For any types T and charT for which neither the library nor the user provides an explicit or partial specialization of the class template formatter, formatter<T, charT> is disabled.