demangling issues

Alex Samuel samuel at codesourcery.com
Thu Jun 22 10:06:34 UTC 2000


Hi,

Here are some issues we've run across while implementing the ABI
mangling scheme (and a demangler) in gcc over recent months, and how
we addressed them.  I apologize that this description isn't too
detailed; I've been rushing to tie up some loose ends in the
implementation in the last few days.  I'll supply more details as
necessary.

Our mangling implementation is checked into gcc's CVS trunk, in
case anyone wants to experiment with it.  A particularly easy way to
try out mangling on a particular case is with this online compilation
web form, which uses a recent snapshot build of gcc:

    http://www.codesourcery.com/gcc-compile.shtml

Enter your code into the form, and enter `-fnew-abi' as an additional
compiler flag.  Compile to assembly or object and check out your
mangled names.  I'm happy to help anyone who's interested build the
standalone demangler too.

Regards
Alex Samuel
CodeSourcery LLC


---

First, shortly before the hiatus in ABI committee meetings, I sent out
a proposed modification of the mangling grammar.  The modifications
corrected some errors and brought the substitution rules in line with
the committee's intent, I think.  They've also proven easier to
implement.  We've implemented this grammar, with slight modifications.
Here are the relevant productions:

    <mangled-name>      ::= _Z <encoding>

    <encoding>		::= <function name> <bare-function-type>
			::= <data name>
			::= <special-name>   # see below
			::= <substitution>

    <name>              ::= <unscoped-name>
                        ::= <unscoped-template-name> <template-args>
			::= <nested-name>
                        ::= <local-name>

    <unscoped-name>     ::= <unqualified-name>
			::= St <unqualified-name>   # ::std::

    <unscoped-template-name>    
                        ::= <unscoped-name>
                        ::= <substitution>

    <nested-name>       ::= N [<CV-qualifiers>] <prefix> <component> E
			::= N [<CV-qualifiers>] <template-prefix> 
			    <template-args> E

    <prefix>            ::= <prefix> <component>
                        ::= <template-prefix> <template-args>
			::= # empty
			::= <substitution>

    <template-prefix>   ::= <prefix> <template component>
                        ::= <substitution>

    <component>         ::= <unqualified-name>
                        ::= <local-name>

    <unqualified-name>  ::= <operator-name>
			::= <ctor-dtor-name>  
			::= <source-name>   

    <class-enum-type>   ::= <name>

---

We also clarified the mangling of numbers using

    <number> ::= [n] <positive-number>

    <positive-number> ::= <decimal integer>

The `n' is for negative numbers.

Then

    <source-name> ::= <length positive-number> <identifier>
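
As a rough illustration of the two productions above (a sketch only, not
the code in gcc; the function names are made up for this example), the
mangler side and the matching demangler step might look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// <number> ::= [n] <positive-number>
// Hypothetical helper: decimal digits, with a leading `n' for negatives.
std::string mangle_number (long value)
{
  if (value < 0)
    // Negate in unsigned arithmetic so the most negative long is safe.
    return "n" + std::to_string (0UL - static_cast<unsigned long> (value));
  return std::to_string (value);
}

// <source-name> ::= <length positive-number> <identifier>
std::string mangle_source_name (const std::string &identifier)
{
  assert (!identifier.empty ());
  return std::to_string (identifier.size ()) + identifier;
}

// Demangler side: read one <source-name> starting at `pos' and advance
// `pos' past it.  Returns false if the input is malformed.
bool parse_source_name (const std::string &s, std::size_t &pos,
                        std::string &out)
{
  std::size_t len = 0, p = pos;
  while (p < s.size () && s[p] >= '0' && s[p] <= '9')
    len = len * 10 + static_cast<std::size_t> (s[p++] - '0');
  if (p == pos || len == 0 || len > s.size () - p)
    return false;
  out = s.substr (p, len);
  pos = p + len;
  return true;
}
```

Because the length comes first, the demangler never has to guess where an
identifier ends, even when identifiers follow one another directly.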

---

There's no operator code for unary plus.  We used `pl'.  An
operator code for conversions is also needed.  We used this:

    <operator-name>     ::= cv <type>
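
For reference, here is a small made-up class that declares the operators
in question.  The struct and the checking function are purely
illustrative, and the comments only restate the codes mentioned above:

```cpp
#include <cassert>

struct S
{
  int v;
  S operator+ () const { return *this; }                 // unary plus: `pl'
  S operator+ (const S &o) const { return S{v + o.v}; }  // binary plus: `pl'
  operator int () const { return v; }                    // conversion: `cv <type>'
};

// Exercises all three operators declared above.
void check_operators ()
{
  S a{2}, b{3};
  assert (int (+a) == 2);
  assert (int (a + b) == 5);
}
```

Unary and binary plus share a source-level name and differ only in arity,
which the function's parameter types already encode.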


---

The mangling for unions was not mentioned.  We used <class-enum-type>.

---

Template template args and parameters were not explicitly mentioned,
except in examples.  We used these productions:

    <template-arg>      ::= <template-template-arg>

    <template-template-arg>
			::= <name>
			::= <substitution>

    <type>              ::= <template-template-param> <template-args>

    <template-template-param>
                        ::= <template-param> 
			::= <substitution>

---

We ran into trouble mangling some of the special objects like thunks
and guard variables, and concluded that these would make more sense if
the mangling didn't make them look like they were scoped in some
enclosing scope.  It doesn't really make sense to think of a typeinfo
variable, for instance, as scoped inside the type it describes --
especially if that type is a built-in type.  Instead, it would make
more sense to think of it as a global object, since it's fully
specified by the type it describes.  

So, we used these manglings:

    <special-name> ::= TV <type>    # virtual table
                   ::= TT <type>    # VTT
                   ::= TI <type>    # typeinfo structure
		   ::= TS <type>    # typeinfo name
                   ::= GV <name>    # guard variable
                   ::= Th <offset number> _ <base encoding>
                        # non-virtual base override thunk
                        # base is the nominal target function of thunk
                   ::= Tv <offset number> _ <vcall offset number> _ 
                         <base encoding>
                        # virtual base override thunk
                        # base is the nominal target function of thunk

That's why <special-name> is a production for <encoding>, above.  

The only special names left are ctors and dtors, so we used

    <unqualified-name> ::= <ctor-dtor-name>

where

    <ctor-dtor-name> ::= C1 # complete object (in-charge) ctor
		     ::= C2 # base object (not-in-charge) ctor
		     ::= C3 # complete object (in-charge) allocating ctor
		     ::= C4 # base object (not-in-charge) allocating ctor
		     ::= D0 # deleting (in-charge) dtor
		     ::= D1 # complete object (in-charge) dtor
		     ::= D2 # base object (not-in-charge) dtor

So, for instance, given

    namespace NS { class C { virtual void foo (); }; }

the virtual table for C would be mangled

    _ZTVN2NS1CE

and the typeinfo struct for C would be mangled

    _ZTIN2NS1CE


The typeinfo struct for int is mangled

    _ZTIi

and in the function

    int C::foo (int i) 
    {
      static int j = 0;
      return ++j + i;
    }

the guard variable for j is mangled

    _ZGVZN1C3fooEiE1j
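
To make the composition concrete, here is a minimal sketch that rebuilds
the examples above by string concatenation.  The helper names are
hypothetical, only plain identifiers are handled, and substitutions are
ignored entirely:

```cpp
#include <cassert>
#include <string>
#include <vector>

// <source-name>: length-prefixed identifier.
std::string source_name (const std::string &id)
{
  return std::to_string (id.size ()) + id;
}

// <nested-name> ::= N <prefix> <component> E, for plain identifiers only
// (no CV-qualifiers, template args, or substitutions).
std::string nested_name (const std::vector<std::string> &scopes)
{
  assert (scopes.size () >= 2);
  std::string out = "N";
  for (const std::string &s : scopes)
    out += source_name (s);
  return out + "E";
}

// <mangled-name> ::= _Z <encoding>, where <encoding> ::= <special-name>.
// `code' is TV, TT, TI, TS or GV; `operand' is an already-mangled type
// or name.
std::string special_name (const std::string &code,
                          const std::string &operand)
{
  return "_Z" + code + operand;
}
```

For example, special_name ("TV", nested_name ({"NS", "C"})) gives the
virtual table name shown above.  Note that the special-name code comes
first and the described type follows whole, rather than the code being
nested inside the type's scope.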

---

We need a way of talking about classes declared in function scopes, so
add this production:

    <class-enum-type> ::= <local-name>

---

The <local-name> production should be

    <local-name> ::= Z <function encoding> E <entity unqualified-name> 
                       [<discriminator>]

If <name> is used instead of <unqualified-name>, it's infinitely recursive.
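
A sketch of the mangler side of this production follows.  The function
name is invented for the example, both arguments are assumed to be
mangled already, and the optional discriminator is left out:

```cpp
#include <cassert>
#include <string>

// <local-name> ::= Z <function encoding> E <entity unqualified-name>
// `function_encoding' is the mangled enclosing function (without _Z);
// `entity' is the mangled unqualified name of the local entity.
std::string local_name (const std::string &function_encoding,
                        const std::string &entity)
{
  assert (!function_encoding.empty () && !entity.empty ());
  return "Z" + function_encoding + "E" + entity;
}
```

With the encoding N1C3fooEi for int C::foo (int) and the entity 1j, this
yields ZN1C3fooEiE1j, the name that follows GV in the guard variable
example.  Since the entity is an unqualified name, it cannot expand back
into another <local-name>.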

---

Literals in template parameters that are pointer-to-member constants
cause problems.  It is in fact not always possible to tell whether
they are pointer-to-member literals at all.  Given the template
instantiation

    template void f (C<S, &S::j>);

of

    template <class T> void f (C<T, &T::j>) {}

the second template parameter could be a pointer-to-member, or an
ordinary pointer to a static member function or static data member.
You can't tell which while mangling it without substituting S for T.

So, we added a new operator that denotes scope resolution (the ::
token), represented by the code `sr'.  The template instantiation
given above mangles to 

    _Z1fI1SEv1CIT_XadsrS2_1jEE
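
The ambiguity is easy to reproduce in ordinary code.  The sketch below
assumes a C++11 compiler (for decltype, constexpr and <type_traits>), and
the struct and function names are invented for the example:

```cpp
#include <cassert>
#include <type_traits>

struct A { int j; };          // j is a non-static data member
struct B { static int j; };   // j is a static data member
int B::j = 0;

// What `&T::j' denotes depends on T, so it cannot be classified until T
// is known.
template <class T>
constexpr bool names_pointer_to_member ()
{
  return std::is_member_pointer<decltype (&T::j)>::value;
}

static_assert (names_pointer_to_member<A> (), "&A::j is int A::*");
static_assert (!names_pointer_to_member<B> (), "&B::j is int *");

// The two forms are also used differently at run time.
void check_both_forms ()
{
  int A::*pm = &A::j;   // pointer-to-member: needs an object to apply to
  int *p = &B::j;       // ordinary pointer: dereferenced directly
  A a = { 5 };
  assert (a.*pm == 5);
  assert (*p == 0);
}
```

The same spelling, &T::j, yields two unrelated kinds of type, which is
why the mangling records the unresolved expression with `sr' instead of
trying to classify it.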




