universal character names
Martin von Loewis
loewis at informatik.hu-berlin.de
Thu Mar 23 07:53:05 UTC 2000
I could not find an existing issue for this, but I think there needs
to be one. 2.2 [lex.charset]/2 allows universal-character-names in
C++ programs, notably in identifiers and strings. This gives us two
issues:
A) External names for identifiers containing Unicode letters; e.g.

    namespace newmath{
      const long double \u03A0 = 3.14159265358979;
    }

This is also an issue for C99, so it may be that the base ABI has a
specification; we'd have to follow that at least for extern "C"
names. If not, I propose that such names be encoded in UTF-8.
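
To make the UTF-8 option concrete, here is a minimal sketch of how a
tool might spell such an identifier in an object file. The helper
names utf8_encode and external_name are hypothetical, invented for
this illustration; nothing here is taken from an ABI document, and
any mangling around the name is left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Covers U+0000..U+10FFFF; it does not reject surrogates or
// out-of-range values, which a real implementation would have to do.
std::string utf8_encode(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: external spelling of an identifier given as
// a sequence of code points (universal-character-names resolved).
std::string external_name(const std::u32string& id) {
    std::string out;
    for (char32_t cp : id) {
        out += utf8_encode(cp);
    }
    return out;
}
```

With this sketch, the identifier \u03A0 from the example above comes
out as the two bytes 0xCE 0xA0.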
B) Object file representation of narrow and wide string literals
containing such characters, e.g.

    wchar_t MvL[]=L"Martin von L\u00F6wis";

First, what is sizeof(wchar_t) in the base ABI? I'll assume 4 for
the moment. Then, the question comes down to: What is the execution
character set, and the wide execution character set? 2.2/3 says
they are implementation-defined, so I guess we must define
them. Typically, people expect this to be a run-time setting (which
is a reasonable assumption), but that breaks down for string
literals, whose encoding is fixed when the object file is written.
Proposal: The wide execution character set is UCS-4. The
execution character set is "as-is", i.e. bytes from the source
character set are copied unmodified to the object
file. Universal-character-names appearing in narrow (i.e. char)
strings are not portable in this ABI (the other alternatives would
be to say they are Latin-1, or encoded as UTF-8, I guess).
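
As an illustration of the wide-string half of the proposal, the
following sketch computes what the object file would hold for a wide
literal if the wide execution character set is UCS-4 and
sizeof(wchar_t) is 4. Both are assumptions carried over from this
post, not settled ABI facts. The names kAssumedWcharSize,
wide_literal_units and wide_literal_bytes are hypothetical, and byte
order within each unit is left aside.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed width of wchar_t in the base ABI (the post's working
// assumption, not a confirmed value).
constexpr std::size_t kAssumedWcharSize = 4;

// Hypothetical helper: the 32-bit units stored for a wide literal
// under the UCS-4 proposal, one unit per code point plus the
// terminating zero.
std::vector<std::uint32_t> wide_literal_units(const std::u32string& text) {
    std::vector<std::uint32_t> units;
    for (char32_t cp : text) {
        units.push_back(static_cast<std::uint32_t>(cp));
    }
    units.push_back(0);  // terminating null character
    return units;
}

// Hypothetical helper: size in bytes of that literal's storage.
std::size_t wide_literal_bytes(const std::u32string& text) {
    return wide_literal_units(text).size() * kAssumedWcharSize;
}
```

For the MvL example, the text has 16 characters, so the literal
occupies 17 units (68 bytes under the assumed width), and the unit at
index 12 is 0x000000F6.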
Martin