[cxx-abi-dev] string constant mangling

John McCall rjmccall at apple.com
Mon Jan 9 19:04:32 UTC 2012


On Jan 9, 2012, at 9:55 AM, Richard Smith wrote:
> On Fri, Jan 6, 2012 at 3:40 PM, John McCall <rjmccall at apple.com> wrote:
> On Jan 6, 2012, at 7:53 AM, Jason Merrill wrote:
> > On 03/08/2011 04:12 PM, David Vandevoorde wrote:
> >> On Mar 8, 2011, at 11:43 AM, Jason Merrill wrote:
> >>
> >>> It occurs to me that now with constexpr, string constants can appear in a constant expression:
> >>>
> >>> template<typename T>  constexpr T f(const T* p) { return p[0]; }
> >>> template<int>  struct N { };
> >>> template<typename T>  N<f((const T*)"1")>  g(T);
> >>> template<typename T>  N<f((const T*)"2")>  g(T);
> >>>
> >>> Here the two 'g's are different templates.
> >>
> >> Ouch :-(  I guess another tweak is needed then.
> >
> > So,
> >
> > L <string type> <value string> E
> >
> > where the string value is encoded in hex, omitting the terminal NUL?
> 
> This works for me.  Clarifications:
>  - We don't need to distinguish "a" vs. u8"a" vs. R"a" because we're encoding
>    the raw bytes as represented on the platform and because we're separately
>    encoding the byte-length.
>  - This implies platform endianness for multibyte encodings.
>  - We should use lowercase hex to distinguish the terminal E.
> 
> > Maybe use an MD5 hash for strings longer than 16 bytes?
> 
> Probably a good idea.  Clarify as "more than 16 bytes of data,
> excluding the implicit null on non-raw literals".
> 
> Preferably "more than 15 bytes" -- this won't make any encoding longer, and a program could plausibly use both a 16 byte string literal and another string literal containing the MD5 sum of the first.

I don't understand how this creates a collision.  The mangler doesn't magically let one of the strings through unmangled just because it happens to be an MD5 encoding.  A 16-byte string and its MD5 are both strings with 16 bytes of literal data.  Therefore the mangler applies the same algorithm to both, so either the first is mangled as hex("abcdef...") and the second is hex(md5("abcdef...")), or the first is hex(md5("abcdef...")) and the second is hex(md5(md5("abcdef..."))).  Both options create distinct manglings and require 32 characters in the encoding for both strings.  All your tweak accomplishes is a pair of spurious MD5-encodings.
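
To make that concrete, here is a rough C++ sketch of the argument, not a proposed implementation. Everything named in it is an assumption for illustration: `hex_lower`, `mangle_string_literal`, and the `max_plain_bytes` threshold parameter are hypothetical helpers, the `A17_Kc` type code in the usage note is just an illustrative array-of-const-char encoding, and `toy_digest16` is a trivial 16-byte stand-in that is NOT MD5 (the standard library has no MD5). The point is only that one rule is applied uniformly to every literal of a given length.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Encode raw bytes as lowercase hex, so an uppercase 'E' can only be
// the terminator of the literal encoding.
std::string hex_lower(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char c : bytes) {
        out.push_back(digits[c >> 4]);
        out.push_back(digits[c & 0x0f]);
    }
    return out;
}

// Any function mapping the literal's bytes to a fixed-size digest.
using Digest = std::function<std::string(const std::string&)>;

// Hypothetical helper following the shape discussed in this thread:
//   L <string type> <value string> E
// The value string is the hex of the raw bytes (no terminal NUL), or
// the hex of a digest when there are more than max_plain_bytes bytes.
std::string mangle_string_literal(const std::string& type_code,
                                  const std::string& bytes,
                                  std::size_t max_plain_bytes,
                                  const Digest& digest) {
    assert(digest && "a digest function is required");
    const std::string value = bytes.size() > max_plain_bytes
                                  ? hex_lower(digest(bytes))
                                  : hex_lower(bytes);
    return "L" + type_code + value + "E";
}

// Stand-in digest, NOT MD5: folds the input into 16 bytes with XOR.
// It exists only so the sketch compiles without an MD5 library.
std::string toy_digest16(const std::string& bytes) {
    std::string out(16, '\0');
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char mixed =
            static_cast<unsigned char>(out[i % 16]) ^
            static_cast<unsigned char>(bytes[i]) ^
            static_cast<unsigned char>(i + 1);
        out[i % 16] = static_cast<char>(mixed);
    }
    return out;
}
```

Take a 16-byte literal `s` and a second literal holding `toy_digest16(s)`. Calling `mangle_string_literal("A17_Kc", ..., 16, toy_digest16)` on both emits each one as plain hex; passing 15 instead sends both through the digest. Under either threshold the two results differ and each carries 32 hex characters of value, which is the case analysis above.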

John.
