universal character names
Martin von Loewis
loewis at informatik.hu-berlin.de
Tue Apr 11 17:28:12 UTC 2000
> UTF-8 is inappropriate for mangled names, as it uses values > 127 to
> encode non-ASCII characters.
Why is it not appropriate? AFAICT, the gABI has no restriction in that
respect. ch4.strtab.html says
# String table sections hold null-terminated character sequences,
# commonly called strings.
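To illustrate the point, here is a minimal sketch (the helper name and the sample identifier are made up for this example). It shows that a UTF-8 encoded name contains bytes above 127 but no embedded zero byte, so it still reads as an ordinary null-terminated string. This only demonstrates the encoding property; whether any given assembler, linker or debugger accepts such symbol names is a separate question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the bytes with the high bit set
   (values > 127) in a null-terminated string. */
static size_t count_high_bytes(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s)
        if ((unsigned char)*s > 127)
            ++n;
    return n;
}

/* Sample identifier "f<U+00FC>r" in UTF-8: 'f', 0xC3 0xBC, 'r'.
   The name itself is an assumption chosen for illustration. */
static const char utf8_name[] = "f\xC3\xBCr";
```

With these definitions, strlen(utf8_name) sees all four bytes up to the terminator, and count_high_bytes(utf8_name) reports the two bytes of the encoded U+00FC.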
I can see there are a number of alternatives. I think it is important
that there is agreement on the rules, in a way that is also
interoperable with C99 implementations. What those rules are is not
that important.
> GNU Java encodes names in UTF-8 internally. For the mangled name, if there
> are non-ASCII characters, it adds a 'U' to the beginning and encodes each
> such UCS-2 character as _%04x. See gcc/java/mangle.c.
In the C++ ABI, the natural adaptation of that approach would be to
mangle non-ASCII-containing identifiers as _U instead of _Z, right?
Unfortunately, that does not give a solution for C names. I believe
the GNU Java approach also cannot be extended to C99.
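For reference, the escaping described in the quoted text could be sketched roughly as follows. This is not the code from gcc/java/mangle.c; the function name, the buffer interface and the exact placement of the 'U' are assumptions based only on the description above. The sketch also copies ASCII characters through unchanged, so a literal underscore followed by four hex digits in the source name would be ambiguous in the output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, NOT the routine in gcc/java/mangle.c.
   Escapes the UCS-2 name in[0..n-1] into out (capacity cap bytes):
   if any character is non-ASCII, a 'U' is written first, and each
   non-ASCII character is written as "_%04x". ASCII characters are
   copied unchanged. Returns the length written (excluding the
   terminator), or -1 if out is too small. */
static int escape_ucs2_name(const uint16_t *in, size_t n,
                            char *out, size_t cap)
{
    size_t pos = 0;
    int non_ascii = 0;

    assert(in != NULL && out != NULL);
    if (cap == 0)
        return -1;

    /* First pass: decide whether the 'U' prefix is needed. */
    for (size_t i = 0; i < n; ++i)
        if (in[i] > 127)
            non_ascii = 1;

    if (non_ascii) {
        if (pos + 1 >= cap)
            return -1;
        out[pos++] = 'U';
    }

    /* Second pass: copy ASCII, escape everything else. */
    for (size_t i = 0; i < n; ++i) {
        if (in[i] > 127) {
            if (pos + 5 >= cap)
                return -1;
            snprintf(out + pos, cap - pos, "_%04x", (unsigned)in[i]);
            pos += 5;
        } else {
            if (pos + 1 >= cap)
                return -1;
            out[pos++] = (char)in[i];
        }
    }
    out[pos] = '\0';
    return (int)pos;
}
```

Under these assumptions, the three-character name 'f', U+00FC, 'r' becomes "Uf_00fcr", while a pure ASCII name is passed through without the prefix.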
Regards,
Martin