universal character names

Martin von Loewis loewis at informatik.hu-berlin.de
Tue Apr 11 17:28:12 UTC 2000


> UTF-8 is inappropriate for mangled names, as it uses values > 127 to
> encode non-ASCII characters.

Why is it not appropriate? AFAICT, the gABI has no restriction in that
respect. ch4.strtab.html says:

# String table sections hold null-terminated character sequences,
# commonly called strings.
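
To illustrate the point, here is a minimal sketch (not taken from any
real linker) of a string table laid out as the excerpt describes. The
helper names build_strtab and read_string are hypothetical. It assumes
only that UTF-8 never produces a zero byte for any character other than
U+0000, so a UTF-8 name stays a valid null-terminated sequence even
though it contains bytes above 127.

```python
def build_strtab(names):
    """Build an ELF-style string table: one leading NUL byte, then each
    name encoded as UTF-8 and terminated by a NUL byte."""
    table = bytearray(b"\x00")
    offsets = {}
    for name in names:
        encoded = name.encode("utf-8")
        if 0 in encoded:
            # Only U+0000 encodes to a zero byte; it cannot be stored.
            raise ValueError("name contains U+0000: %r" % name)
        offsets[name] = len(table)
        table += encoded + b"\x00"
    return bytes(table), offsets


def read_string(table, offset):
    """Read the null-terminated string starting at the given offset."""
    end = table.index(b"\x00", offset)
    return table[offset:end].decode("utf-8")


# Hypothetical symbol names, two of them with non-ASCII characters.
table, offsets = build_strtab(["main", "größe", "naïve_count"])
for name, offset in offsets.items():
    print(offset, read_string(table, offset))
```

The table contains bytes with values above 127, yet every lookup by
offset still ends at the right terminator.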

I can see a number of alternatives. I think what matters is that
there is agreement on the rules, and that they are interoperable with
C99 implementations. Which rules are chosen is not that important.

> GNU Java encodes names in UTF-8 internally.  For the mangled name, if there
> are non-ASCII characters, it adds a 'U' to the beginning and encodes each
> such UCS-2 character as _%04x.  See gcc/java/mangle.c.

In the C++ ABI, the natural adaptation of that approach would be to
mangle non-ASCII-containing identifiers as _U instead of _Z, right?
Unfortunately, that does not give a solution for C names. I believe
the GNU Java approach also cannot be extended to C99.
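
For reference, the scheme as described in the quoted text can be
sketched roughly as follows. This is an illustration built only from
that description, not a transcription of gcc/java/mangle.c. The
function name mangle_name is hypothetical, and the sketch assumes the
'U' goes directly in front of the identifier. It also does not deal
with identifiers that already contain an underscore followed by four
hex digits, which would be ambiguous on decoding.

```python
def mangle_name(name):
    """Escape non-ASCII characters as _%04x and prefix the result with
    'U'. Pure-ASCII names are returned unchanged."""
    if all(ord(ch) < 128 for ch in name):
        return name
    parts = ["U"]
    for ch in name:
        code = ord(ch)
        if code < 128:
            parts.append(ch)
        elif code <= 0xFFFF:
            # One UCS-2 character becomes an underscore and four hex digits.
            parts.append("_%04x" % code)
        else:
            # Outside UCS-2; the quoted description does not cover this case.
            raise ValueError("character outside UCS-2: U+%X" % code)
    return "".join(parts)


print(mangle_name("size"))
print(mangle_name("größe"))
```

The output of such a scheme is pure ASCII, which is the property the
quoted text is after; the price is that the escaped form has to be
agreed on by every tool that reads the symbol.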

Regards,
Martin



