unicode again

Thu Aug 31 05:08:28 UTC 2000

>>>>> Martin von Loewis <loewis at informatik.hu-berlin.de> writes:

 >> When a name includes extended characters, what do we put in the name length
 >> in the mangling?  The length in abstract characters, or in bytes?

 > The posted resolution of F-8 only allows for encoding the number of
 > characters. The reason is that you first have to put the length into
 > the resulting C symbol, and convert that into a byte sequence only
 > afterwards.

I suppose it's a question of implementation strategy.  I've been planning
to represent extended characters in UTF-8 internally, so we would need to
jump through hoops to get the number of characters back again.
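
To illustrate the "hoops": with names held as UTF-8, the byte length is free but the character count has to be recovered by scanning. A minimal sketch in Python (used here only for brevity; `utf8_char_count` is a made-up helper name, and the input is assumed to be well-formed UTF-8):

```python
def utf8_char_count(data: bytes) -> int:
    """Count abstract characters in well-formed UTF-8 without decoding it.

    Continuation bytes look like 10xxxxxx; every other byte starts a
    new character, so counting the non-continuation bytes is enough.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


# "naive" with U+00EF in place of the plain "i": five characters,
# but the extended character takes two bytes in UTF-8.
name = "na\u00efve".encode("utf-8")
print(len(name), utf8_char_count(name))  # prints: 6 5
```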

And, more significantly, the same concern applies to the demangler; if we
count characters, the demangler has to convert names from UTF-8 to UCS-4
one character at a time until it's seen the right number of characters.  If
we count bytes, it can ignore the contents of the name, and just feed the
entire demangled output to iconv at the end.  And we don't have to deal
with UCS-4 at all.
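
The difference on the demangler side can be sketched roughly as follows. This is an illustration under stated assumptions, not the ABI's actual rules: the length prefix is simplified to decimal digits followed directly by the UTF-8 name, the function names are hypothetical, and the input is assumed to be well-formed UTF-8.

```python
def split_length(sym: bytes):
    """Split the leading decimal length off a symbol fragment."""
    i = 0
    while i < len(sym) and sym[i:i + 1].isdigit():
        i += 1
    return int(sym[:i].decode("ascii")), sym[i:]


def read_name_counting_bytes(sym: bytes):
    """Length counts bytes: slice the name out without looking inside it."""
    n, rest = split_length(sym)
    return rest[:n], rest[n:]


def read_name_counting_chars(sym: bytes):
    """Length counts characters: walk the UTF-8 one character at a time."""
    n, rest = split_length(sym)
    j = 0
    for _ in range(n):
        lead = rest[j]
        if lead < 0x80:      # 0xxxxxxx: one-byte character
            j += 1
        elif lead < 0xE0:    # 110xxxxx: two-byte sequence
            j += 2
        elif lead < 0xF0:    # 1110xxxx: three-byte sequence
            j += 3
        else:                # 11110xxx: four-byte sequence
            j += 4
    return rest[:j], rest[j:]


name = "na\u00efve".encode("utf-8")   # 5 characters, 6 bytes
print(read_name_counting_bytes(b"6" + name + b"3foo"))
print(read_name_counting_chars(b"5" + name + b"3foo"))
```

Both calls recover the same name and leave `3foo` for the next component, but only the character-counting version has to inspect the bytes of the name itself.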

 > If you want to revert that, I guess we'd actually have to explicitly
 > specify the rules for encoding Unicode characters.

Seems to me that we have to do that anyway, so we get compatible
manglings.  Tom Tromey was suggesting that we just use UTF-8 and expect
binutils to deal appropriately, since this is a new platform.

Jason