mangling of UCNs

scott douglass sdouglass at arm.com
Thu Sep 19 12:36:35 UTC 2002


Hello,

The mangling of UCNs is not specified by the ABI.  I think it should be.  Perhaps the idea is that UCNs would be mangled the same way the implementation's C99 compiler encodes them?  The only implementation I know of that supports UCNs encodes them as "_unnnn", which makes "x\u00c0" mangle the same as the legal user identifier "x_u00c0".
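
To make the collision concrete, here is a small Python sketch.  The function name mangle_underscore is hypothetical; it only models the "_unnnn" scheme as described above and is not taken from any particular compiler.

```python
import re

# A \uXXXX universal character name as it is spelled in source text.
UCN4 = re.compile(r"\\u([0-9a-fA-F]{4})")

def mangle_underscore(identifier: str) -> str:
    # Hypothetical model of the "_unnnn" scheme: each \uXXXX becomes _uXXXX.
    return UCN4.sub(lambda m: "_u" + m.group(1).lower(), identifier)

# Two distinct source identifiers end up with the same encoded name.
print(mangle_underscore(r"x\u00c0"))   # x_u00c0
print(mangle_underscore("x_u00c0"))    # x_u00c0
```

Any scheme that encodes UCNs using only characters already legal in identifiers is open to this kind of clash.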

Anyway, here's a fairly simple proposal.  I'm imagining that allowing "$" in identifiers is a common extension.

"n" is a hex digit [0-9a-f].

Treat \U0000nnnn as \unnnn
Encode hex digits as lower case.
\u0024 => "$"
\u0040 => "@"
\unnnn => "\unnnn"
\Unnnnnnnn => "\Unnnnnnnn"

This assumes the linker is happy with symbols containing "$", "@" and "\".
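
A rough Python sketch of the rules above, to show how they compose.  The name encode_ucns and the regular expression are mine, for illustration only; the sketch assumes the input is the identifier exactly as spelled in source, with UCNs still in \unnnn or \Unnnnnnnn form.

```python
import re

# A UCN in source text: \u + 4 hex digits, or \U + 8 hex digits.
UCN = re.compile(r"\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})")

def encode_ucns(identifier: str) -> str:
    # Hypothetical helper: apply the proposed rules to every UCN in the name.
    def repl(m):
        digits = (m.group(1) or m.group(2)).lower()   # hex digits in lower case
        if len(digits) == 8 and digits.startswith("0000"):
            digits = digits[4:]                       # \U0000nnnn is treated as \unnnn
        if digits == "0024":
            return "$"
        if digits == "0040":
            return "@"
        prefix = "\\u" if len(digits) == 4 else "\\U"
        return prefix + digits
    return UCN.sub(repl, identifier)

print(encode_ucns(r"x\U000000C0"))   # x\u00c0
print(encode_ucns(r"a\u0024b"))      # a$b
print(encode_ucns(r"y\U00010000"))   # y\U00010000
```

Since "\" cannot appear in a user identifier, the encoded form of "x\u00c0" can no longer clash with "x_u00c0".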

As far as I know, "@" is rarely allowed in identifiers, so an alternative that avoids "\" is:

Treat \U0000nnnn as \unnnn
\u0024 => "$"
\unnnn => "@unnnn"
\Unnnnnnnn => "@Unnnnnnnn"
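
The same kind of sketch for the "@" variant.  Again the name encode_ucns_at is hypothetical, and I am assuming the lower-case rule from the first proposal carries over, since the list above does not restate it.

```python
import re

# A UCN in source text: \u + 4 hex digits, or \U + 8 hex digits.
UCN = re.compile(r"\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})")

def encode_ucns_at(identifier: str) -> str:
    # Hypothetical helper: "@" is the escape character, so \u0040 gets no special case.
    def repl(m):
        digits = (m.group(1) or m.group(2)).lower()   # assumption: lower case as before
        if len(digits) == 8 and digits.startswith("0000"):
            digits = digits[4:]                       # \U0000nnnn is treated as \unnnn
        if digits == "0024":
            return "$"
        return ("@u" if len(digits) == 4 else "@U") + digits
    return UCN.sub(repl, identifier)

print(encode_ucns_at(r"x\u00c0"))       # x@u00c0
print(encode_ucns_at(r"x\U00010000"))   # x@U00010000
```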

Other alternatives:
  You could pick a different escape character from the other non-identifier characters, e.g. "#", "!", ...
  You could use two escape characters instead of "@u" and "@U", e.g. "@00c0" and "!00010000".
  You could adopt a variable-width encoding similar to what other manglings use, e.g. "@c0_" and "@10000_".
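
For comparison, the last two alternatives applied to a single code point might look like this in Python.  Both function names are hypothetical, and the choice of "@" and "!" simply follows the examples above.

```python
def encode_two_escapes(code_point: int) -> str:
    # Hypothetical: "@" + 4 hex digits for short UCNs, "!" + 8 hex digits otherwise.
    if code_point <= 0xFFFF:
        return "@" + format(code_point, "04x")
    return "!" + format(code_point, "08x")

def encode_variable_width(code_point: int) -> str:
    # Hypothetical: "@", then as many lower-case hex digits as needed, then "_".
    return "@" + format(code_point, "x") + "_"

print(encode_two_escapes(0xC0))         # @00c0
print(encode_two_escapes(0x10000))      # !00010000
print(encode_variable_width(0xC0))      # @c0_
print(encode_variable_width(0x10000))   # @10000_
```

The variable-width form needs the trailing "_" so that a following hex-looking character in the identifier is not read as part of the code point.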

Interestingly, Annex E gives no legal UCNs that require the \Unnnnnnnn form.

Fire away.




More information about the cxx-abi-dev mailing list