Special characters in mangled names

Mon Nov 15 18:51:51 UTC 1999

Thanks for the clarification. This sounds like an interesting idea.  
I sure would appreciate decent link times with C++ :-)

Christophe

> >>>>> "Christophe" == Christophe de Dinechin <ddd at cup.hp.com> writes: 
>
>     Christophe> Can you please explain what is a "collision-resistant" 
>     Christophe> hash function? I doubt you would get the same
>     Christophe> collisions from two different translation units, so I 
>     Christophe> don't see how you could ensure a name that is
>     Christophe> identical between TUs and also avoids collisions.
>
> A collision-resistant hash function is a notion from cryptography. 
> (That's the world I spend a lot of my time in when I'm not doing
> compiler stuff.)
>
> Suppose you have an n-bit hash, so you have 2^n hash values.  A
> collision-resistant hash is one where the probability of two randomly 
> chosen strings hashing to the same value is (very close to) 1/(2^n). 
> A stronger notion of this is that finding strings that collide is
> computationally infeasible.
>
> Certainly, hashing introduces a probabilistic nature to things: it 
> becomes possible that two different functions could hash to the same 
> hash-mangled name.  However, by choosing a good hash function (and 
> provably good ones exist) and enough bits, you can make it
> considerably less likely that in the next hundred years and two
> distinct functions will hash to the same name than that cosmic rays 
> will cause unpredicatable linker errors.
>
>     Christophe> Also: There is a wide assumption that name mangling is 
>     Christophe> reversible.  In other words, there is a c++filt tool, 
>     Christophe> and hundreds of scripts use it somehow (combined with 
>     Christophe> nm, to process the error output of ld, etc.) I'm not 
>     Christophe> sure it is wise to give up this assumption.
>
> Yup, this is the biggest objection I can think of.
>
> We originally came up with this idea for our C++-to-C translator.  We 
> ship this to people with embedded systems whose linkers only support 
> 16-characters; by using a collision-resistant hash they can use C++. 
> Nobody has ever run into a collision.  We solved the c++-filt problem 
> by keeping a database mapping hashes back to mangled names.  (The
> probabilistic guarantee says that this database can actually be
> global; in our lifetime will never see two things with the same hash.) 
> So, it's still possible to make a c++-filt that works, but it is
> admittedly more difficult.
>
> The biggest advantage to this scheme is that you can put an upper
> bound on symbol lengths, even if the presence of truly huge template 
> usage.  (I've seen programs where mangled names approached a megabyte 
> in length.)  I would only suggest hashing long names; names under 100 
> characters, or even a thousand characters, say, could be left
> unhashed.
>
> --
> Mark Mitchell                   mark at codesourcery.com
> CodeSourcery, LLC               http://www.codesourcery.com
>
>