Special characters in mangled names

Mon Nov 15 18:40:00 UTC 1999

>>>>> "Christophe" == Christophe de Dinechin <ddd at cup.hp.com> writes:

    Christophe> Can you please explain what is a "collision-resistant"
    Christophe> hash function? I doubt you would get the same
    Christophe> collisions from two different translation units, so I
    Christophe> don't see how you could ensure a name that is
    Christophe> identical between TUs and also avoids collisions.

A collision-resistant hash function is a notion from cryptography.
(That's the world I spend a lot of my time in when I'm not doing
compiler stuff.)

Suppose you have an n-bit hash, so you have 2^n hash values.  A
collision-resistant hash is one where the probability of two randomly
chosen strings hashing to the same value is (very close to) 1/(2^n).
A stronger notion of this is that finding strings that collide is
computationally infeasible.

Certainly, hashing introduces a probabilistic nature to things: it
becomes possible that two different functions could hash to the same
hash-mangled name.  However, by choosing a good hash function (and
provably good ones exist) and enough bits, you can make it
considerably less likely that in the next hundred years and two
distinct functions will hash to the same name than that cosmic rays
will cause unpredicatable linker errors.

    Christophe> Also: There is a wide assumption that name mangling is
    Christophe> reversible.  In other words, there is a c++filt tool,
    Christophe> and hundreds of scripts use it somehow (combined with
    Christophe> nm, to process the error output of ld, etc.) I'm not
    Christophe> sure it is wise to give up this assumption.

Yup, this is the biggest objection I can think of.  

We originally came up with this idea for our C++-to-C translator.  We
ship this to people with embedded systems whose linkers only support
16-characters; by using a collision-resistant hash they can use C++.
Nobody has ever run into a collision.  We solved the c++-filt problem
by keeping a database mapping hashes back to mangled names.  (The
probabilistic guarantee says that this database can actually be
global; in our lifetime will never see two things with the same hash.)
So, it's still possible to make a c++-filt that works, but it is
admittedly more difficult.

The biggest advantage to this scheme is that you can put an upper
bound on symbol lengths, even if the presence of truly huge template
usage.  (I've seen programs where mangled names approached a megabyte
in length.)  I would only suggest hashing long names; names under 100
characters, or even a thousand characters, say, could be left
unhashed.

--
Mark Mitchell                   mark at codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com