[cxx-abi-dev] string constant mangling

Richard Smith richardsmith at googlers.com
Mon Jan 9 19:49:41 UTC 2012


On Mon, Jan 9, 2012 at 11:04 AM, John McCall <rjmccall at apple.com> wrote:

> On Jan 9, 2012, at 9:55 AM, Richard Smith wrote:
>
> On Fri, Jan 6, 2012 at 3:40 PM, John McCall <rjmccall at apple.com> wrote:
>
>> On Jan 6, 2012, at 7:53 AM, Jason Merrill wrote:
>> > On 03/08/2011 04:12 PM, David Vandevoorde wrote:
>> >> On Mar 8, 2011, at 11:43 AM, Jason Merrill wrote:
>> >>
>> >>> It occurs to me that now with constexpr, string constants can appear
>> >>> in a constant expression:
>> >>>
>> >>> template<typename T>  constexpr T f(const T* p) { return p[0]; }
>> >>> template<int>  struct N { };
>> >>> template<typename T>  N<f((const T*)"1")>  g(T);
>> >>> template<typename T>  N<f((const T*)"2")>  g(T);
>> >>>
>> >>> Here the two 'g's are different templates.
>> >>
>> >> Ouch :-(  I guess another tweak is needed then.
>> >
>> > So,
>> >
>> > L <string type> <value string> E
>> >
>> > where the string value is encoded in hex, omitting the terminal NUL?
>>
>> This works for me.  Clarifications:
>>  - We don't need to distinguish "a" vs. u8"a" vs. R"a" because we're
>>    encoding the raw bytes as represented on the platform and because
>>    we're separately encoding the byte-length.
>>  - This implies platform endianness for multibyte encodings.
>>  - We should use lowercase hex to distinguish the terminal E.
>>
>> > Maybe use an MD5 hash for strings longer than 16 bytes?
>>
>> Probably a good idea.  Clarify as "more than 16 bytes of data,
>> excluding the implicit null on non-raw literals".
>
>
> Preferably "more than 15 bytes" -- this won't make any encoding longer,
> and a program could plausibly use both a 16 byte string literal and another
> string literal containing the MD5 sum of the first.
>
>
> I don't understand how this creates a collision.  The mangler doesn't
> magically let one of the strings through unmangled just because it happens
> to be an MD5 encoding.
>

Sorry, there was a typo in my description. I meant '[...] use both a >16
byte string literal and [...]'. The same 32 character sequence can then
easily be produced by two distinct string literals:

#define STR1 "some string which is more than 16 bytes long"
// md5 of STR1
#define STR2 "\x22\xea\x22\x46\x30\xd1\xa3\xc9\x44\x97\xe0\x86\xd7\x21\xda\x7a"
constexpr bool eq(const char *p1, const char *p2);
template<typename T> std::enable_if<eq(STR1, T::s), void> f() { ... }
template<typename T> std::enable_if<eq(STR2, T::s), void> f() { ... }

Under the proposed rule, STR1 gets md5sum'd, STR2 does not, and we create
the same mangled name for both functions. Under my rule, both STR1 and STR2
get md5sum'd: every 32-character encoding is then the MD5 sum of a string
literal, so the user can't easily create collisions.
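
A minimal sketch of the threshold argument, not anything from an actual
mangler. The names encode_value, to_hex, toy_digest and max_raw are
hypothetical, and toy_digest is only a 16-byte stand-in for MD5 (it is NOT
MD5); the point is just how the cutoff decides whether a literal's raw
bytes or its digest end up in the lowercase hex encoding.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical 16-byte digest standing in for MD5 (NOT MD5; illustration only).
using Digest = std::array<unsigned char, 16>;

Digest toy_digest(const std::string& bytes) {
    Digest d{};
    std::uint64_t h = 1469598103934665603ULL;  // FNV-1a 64-bit offset basis
    for (int round = 0; round < 2; ++round) {
        for (unsigned char c : bytes) {
            h ^= c;
            h *= 1099511628211ULL;  // FNV-1a 64-bit prime
        }
        // Each round contributes 8 bytes of the 16-byte result.
        for (int i = 0; i < 8; ++i)
            d[round * 8 + i] = static_cast<unsigned char>(h >> (8 * i));
    }
    return d;
}

// Lowercase hex, so the uppercase terminal 'E' stays unambiguous.
std::string to_hex(const unsigned char* p, std::size_t n) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < n; ++i) {
        out += digits[p[i] >> 4];
        out += digits[p[i] & 0xF];
    }
    return out;
}

// Encodes the value part of a literal (terminal NUL already omitted):
// raw bytes in hex if there are at most max_raw of them, else hex of the digest.
std::string encode_value(const std::string& bytes, std::size_t max_raw) {
    if (bytes.size() > max_raw) {
        Digest d = toy_digest(bytes);
        return to_hex(d.data(), d.size());
    }
    return to_hex(reinterpret_cast<const unsigned char*>(bytes.data()),
                  bytes.size());
}
```

With max_raw = 16, a 16-byte literal holding the digest bytes of a longer
literal is emitted raw and so encodes identically to the longer literal.
With max_raw = 15, the 16-byte literal is itself hashed, so the two
encodings agree only if the digest function happens to collide.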

- Richard