[cxx-abi-dev] string constant mangling

Richard Smith richardsmith at googlers.com
Mon Jan 9 21:57:11 UTC 2012


On Mon, Jan 9, 2012 at 12:07 PM, John McCall <rjmccall at apple.com> wrote:

>
> On Jan 9, 2012, at 11:49 AM, Richard Smith wrote:
>
> On Mon, Jan 9, 2012 at 11:04 AM, John McCall <rjmccall at apple.com> wrote:
>
>> On Jan 9, 2012, at 9:55 AM, Richard Smith wrote:
>>
>> On Fri, Jan 6, 2012 at 3:40 PM, John McCall <rjmccall at apple.com> wrote:
>>
>>> On Jan 6, 2012, at 7:53 AM, Jason Merrill wrote:
>>> > On 03/08/2011 04:12 PM, David Vandevoorde wrote:
>>> >> On Mar 8, 2011, at 11:43 AM, Jason Merrill wrote:
>>> >>
>>> >>> It occurs to me that now with constexpr, string constants can appear in a constant expression:
>>> >>>
>>> >>> template<typename T>  constexpr T f(const T* p) { return p[0]; }
>>> >>> template<int>  struct N { };
>>> >>> template<typename T>  N<f((const T*)"1")>  g(T);
>>> >>> template<typename T>  N<f((const T*)"2")>  g(T);
>>> >>>
>>> >>> Here the two 'g's are different templates.
>>> >>
>>> >> Ouch :-(  I guess another tweak is needed then.
>>> >
>>> > So,
>>> >
>>> > L <string type> <value string> E
>>> >
>>> > where the string value is encoded in hex, omitting the terminal NUL?
>>>
>>> This works for me.  Clarifications:
>>>  - We don't need to distinguish "a" vs. u8"a" vs. R"(a)" because we're encoding
>>>    the raw bytes as represented on the platform and because we're separately
>>>    encoding the byte-length.
>>>  - This implies platform endianness for multibyte encodings.
>>>  - We should use lowercase hex so that no digit can be confused with the terminal E.
>>>
>>> > Maybe use an MD5 hash for strings longer than 16 bytes?
>>>
>>> Probably a good idea.  Clarify as "more than 16 bytes of data,
>>> excluding the implicit null on non-raw literals".
>>
>>
>> Preferably "more than 15 bytes" -- this won't make any encoding longer,
>> and a program could plausibly use both a 16-byte string literal and another
>> string literal containing the MD5 sum of the first.
>>
>>
>> I don't understand how this creates a collision.  The mangler doesn't
>> magically let one of the strings through unmangled just because it happens
>> to be an MD5 encoding.
>>
>
> Sorry, there was a typo in my description. I meant '[...] use both a >16-byte
> string literal and [...]'. The same 32-character sequence can then easily be
> produced by two distinct string literals:
>
> #define STR1 "some string which is more than 16 bytes long"
> #define STR2 "\x22\xea\x22\x46\x30\xd1\xa3\xc9\x44\x97\xe0\x86\xd7\x21\xda\x7a" // md5 of STR1
> constexpr bool eq(const char *p1, const char *p2);
> template<typename T> typename std::enable_if<eq(STR1, T::s), void>::type f() { ... }
> template<typename T> typename std::enable_if<eq(STR2, T::s), void>::type f() { ... }
>
> Under the proposed rule, STR1 gets md5sum'd, STR2 does not, and we create
> the same mangled name for both functions.
>
>
> No.  The length of the literal is also mangled.  This was necessary before
> we considered constexpr.
>

Ah, my apologies for the noise then! (I was led astray by the terminating E)

- Richard
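A minimal C++ sketch of the encoding proposed in this thread, for illustration only. It assumes the <string type> is the literal's array type mangled with the existing Itanium array-type production (A <length> _ <element type>, so const char[2] becomes A2_Kc), handles only narrow const char literals, and leaves the MD5 step to the caller. The helper names mangle_const_char_array and mangle_string_literal are hypothetical and do not come from any ABI document or compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: mangle the type "const char[n]" with the Itanium
// array-type production A <n> _ <element type>; "Kc" is const char.
std::string mangle_const_char_array(std::size_t n) {
  return "A" + std::to_string(n) + "_Kc";
}

// Hypothetical sketch of the scheme proposed in this thread:
//   L <string type> <value string> E
// array_len is the literal's full array length (including the NUL), so the
// literal's length is always part of the name. payload holds the bytes to
// encode: the literal's data without the terminating NUL, or, for long
// literals, a digest the caller computed (MD5 is not implemented here).
// Bytes are emitted in memory order as lowercase hex, so no hex digit can
// be mistaken for the terminating 'E'.
std::string mangle_string_literal(std::size_t array_len,
                                  const unsigned char* payload,
                                  std::size_t payload_len) {
  assert(payload != nullptr || payload_len == 0);
  static const char kHexDigits[] = "0123456789abcdef";
  std::string out = "L" + mangle_const_char_array(array_len);
  for (std::size_t i = 0; i < payload_len; ++i) {
    out += kHexDigits[payload[i] >> 4];    // high nibble
    out += kHexDigits[payload[i] & 0x0f];  // low nibble
  }
  out += 'E';
  return out;
}
```

With this sketch, the literals "1" and "2" from Jason's g example encode as LA2_Kc31E and LA2_Kc32E, so the two templates get different names. For Richard's case, passing the 16 digest bytes quoted in the thread (not re-verified here) as the payload for STR1 (array length 45), and STR2's own 16 bytes under the "more than 16 bytes" threshold (array length 17), yields LA45_Kc22ea224630d1a3c94497e086d721da7aE and LA17_Kc22ea224630d1a3c94497e086d721da7aE: the hex payload is identical, but the mangled array length keeps the names distinct, which is John's point.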