[cxx-abi-dev] string constant mangling

John McCall rjmccall at apple.com
Mon Jan 9 20:07:15 UTC 2012


On Jan 9, 2012, at 11:49 AM, Richard Smith wrote:

> On Mon, Jan 9, 2012 at 11:04 AM, John McCall <rjmccall at apple.com> wrote:
> On Jan 9, 2012, at 9:55 AM, Richard Smith wrote:
>> On Fri, Jan 6, 2012 at 3:40 PM, John McCall <rjmccall at apple.com> wrote:
>> On Jan 6, 2012, at 7:53 AM, Jason Merrill wrote:
>> > On 03/08/2011 04:12 PM, David Vandevoorde wrote:
>> >> On Mar 8, 2011, at 11:43 AM, Jason Merrill wrote:
>> >>
>> >>> It occurs to me that now with constexpr, string constants can appear in a constant expression:
>> >>>
>> >>> template<typename T>  constexpr T f(const T* p) { return p[0]; }
>> >>> template<int>  struct N { };
>> >>> template<typename T>  N<f((const T*)"1")>  g(T);
>> >>> template<typename T>  N<f((const T*)"2")>  g(T);
>> >>>
>> >>> Here the two 'g's are different templates.
>> >>
>> >> Ouch :-(  I guess another tweak is needed then.
>> >
>> > So,
>> >
>> > L <string type> <value string> E
>> >
>> > where the string value is encoded in hex, omitting the terminal NUL?
>> 
>> This works for me.  Clarifications:
>>  - We don't need to distinguish "a" vs. u8"a" vs. R"a" because we're encoding
>>    the raw bytes as represented on the platform and because we're separately
>>    encoding the byte-length.
>>  - This implies platform endianness for multibyte encodings.
>>  - We should use lowercase hex to distinguish the terminal E.
>> 
>> > Maybe use an MD5 hash for strings longer than 16 bytes?
>> 
>> Probably a good idea.  Clarify as "more than 16 bytes of data,
>> excluding the implicit null on non-raw literals".
>> 
>> Preferably "more than 15 bytes" -- this won't make any encoding longer, and a program could plausibly use both a 16 byte string literal and another string literal containing the MD5 sum of the first.
> 
> I don't understand how this creates a collision.  The mangler doesn't magically let one of the strings through unmangled just because it happens to be an MD5 encoding.
> 
> Sorry, there was a typo in my description. I meant '[...] use both a >16 byte string literal and [...]'. The same 32 character sequence can then easily be produced by two distinct string literals:
> 
> #define STR1 "some string which is more than 16 bytes long"
> #define STR2 "\x22\xea\x22\x46\x30\xd1\xa3\xc9\x44\x97\xe0\x86\xd7\x21\xda\x7a" // md5 of STR1
> constexpr bool eq(const char *p1, const char *p2);
> template<typename T> typename std::enable_if<eq(STR1, T::s), void>::type f() { ... }
> template<typename T> typename std::enable_if<eq(STR2, T::s), void>::type f() { ... }
> 
> Under the proposed rule, STR1 gets md5sum'd, STR2 does not, and we create the same mangled name for both functions.

No.  The length of the literal is also mangled.  This was necessary before we considered constexpr.

John.
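A minimal C++ sketch of the scheme as proposed in this thread (not necessarily what the ABI finally adopted). It assumes narrow `char` literals, whose string type is the array type `const char[N]`, mangled as `A<N>_Kc`, where N counts the implicit NUL. The value is emitted as lowercase hex with the NUL omitted. The function names are hypothetical. MD5 is not computed here: for a long literal the caller passes the digest, and Richard's claim that STR2 holds the MD5 of STR1 is taken at face value. The sketch illustrates John's point that the array bound in the type keeps the two names apart even when the hex payloads coincide.

```cpp
#include <cstddef>
#include <string>

// Lowercase hex of the raw bytes. Lowercase keeps the closing 'E'
// from being read as a hex digit.
std::string to_lower_hex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char c : bytes) {
        out += digits[c >> 4];
        out += digits[c & 0x0f];
    }
    return out;
}

// Hypothetical: L <string type> <value string> E for a narrow literal.
// array_bound is N in const char[N] (it includes the implicit NUL).
// payload is either the literal's bytes without the NUL or, for a long
// literal, its MD5 digest supplied by the caller.
std::string mangle_narrow_literal(std::size_t array_bound,
                                  const std::string& payload) {
    return "L" + ("A" + std::to_string(array_bound) + "_Kc") +
           to_lower_hex(payload) + "E";
}

// Hypothetical convenience wrapper for literals short enough to be
// encoded directly: the bound is the byte count plus the implicit NUL.
std::string mangle_short_literal(const std::string& contents) {
    return mangle_narrow_literal(contents.size() + 1, contents);
}
```

For example, `mangle_short_literal("1")` yields `LA2_Kc31E`. STR1 is 44 bytes (`const char[45]`) and STR2 is 16 bytes (`const char[17]`). Even if STR1's digest has exactly STR2's bytes, the two names share the same 32 hex digits but begin with `LA45_Kc` and `LA17_Kc` respectively, so they cannot collide.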