Mangling: initial attempt

Daveed Vandevoorde daveed at edg.com
Wed Jan 19 05:31:13 UTC 2000


Hi all & Happy New Year 2000!

I've finally gotten around to putting together a name mangling proposal.
It's not complete, not tested, and not formally validated, but it's
a start derived from intuition.

Hopefully the description is sufficiently clear from the examples;
I admit the words are terse. (There are a bunch of examples at the
end.)

Feedback welcome,

	Daveed
-------------- next part --------------
Name mangling
=============

[ Notes:
   1) Most of the "special entities" spec is still missing.
   2) The truncation+hashing approach needs to be described if desired.
   3) A proof of nonambiguity is needed.
   4) Many things (e.g., the <prefix>) may need tweaking.
--end Notes ]

Entities with C linkage and file scope variables are not mangled.

General structure
-----------------

    <prefix><length><name><specialization>opt<type>opt<scope>opt

<prefix> is one of:
   . "_0" for namespace scope variables and static data members
   . "_1" for nontemplate nonoperator functions
   . "_2" for template nonoperator functions
   . "_3" for nontemplate operator functions
   . "_4" for template operator functions
   . "_5" for special entities (constructors etc; see below)

<length> is the decimal representation of the length of <name>.

<name> is one of:
   . the unqualified variable name for namespace scope variables
   . the unqualified member name for static data members
   . the unqualified function name for nonoperator nontemplate functions
   . the unqualified function template name for nonoperator template functions
   . an encoding of the operator for operator functions
   . a reserved encoding for special entities (constructors etc; see below)

<specialization> encodes the template arguments for function templates.

<type> is used to disambiguate overloaded functions, but also to
distinguish the various virtual tables associated with a given complete
class type.  For nontemplate functions, <type> lists the parameter types
only. For template functions, <type> lists the return type followed by
the parameter types. <type> is omitted for variables and static data
members.

<scope> encodes the class or namespace scope that the function or variable
belongs to.  For virtual tables and RTTI structures, <scope> encodes the type
for which they are defined.
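
As a rough illustration of the layout above, the following Python sketch
concatenates the segments for nonoperator names.  The helper name "assemble",
its parameters, and the "kind" keys are hypothetical, and the optional
segments are assumed to be strings that have already been encoded; this is
not a complete mangler.

```python
# Hypothetical helper: concatenates already-encoded segments in the order
# <prefix><length><name><specialization>opt<type>opt<scope>opt.
PREFIXES = {
    "variable": "_0",            # namespace scope variables, static data members
    "function": "_1",            # nontemplate nonoperator functions
    "template_function": "_2",   # template nonoperator functions
}


def assemble(kind, name, specialization="", type_="", scope=""):
    """Return the mangled name for a nonoperator entity of the given kind."""
    prefix = PREFIXES[kind]
    return f"{prefix}{len(name)}{name}{specialization}{type_}{scope}"
```

For instance, assemble("function", "foo", type_="c") gives "_13fooc", the
mangling this document states for "void foo(char)".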

Operator encodings
------------------

Operators appear as function names, but also in nontype template argument
expressions.
   new           nw
   new[]         na
   delete        dl
   delete[]      da
   - (unary)     ng
   & (unary)     ad
   * (unary)     de
   ~             co
   +             pl
   -             mi
   *             ml
   /             dv
   %             rm
   &             an
   |             or
   ^             eo
   =             aS
   +=            pL
   -=            mI
   *=            mL
   /=            dV
   %=            rM
   &=            aN
   |=            oR
   ^=            eO
   <<            ls
   >>            rs
   <<=           lS
   >>=           rS
   ==            eq
   !=            ne
   <             lt
   >             gt
   <=            le
   >=            ge
   !             nt
   &&            aa
   ||            oo
   ++            pp
   --            mm
   ,             cm
   ->*           pm
   ->            pt
   ()            cl
   []            ix
   ?             qu
   (cast)        cv

Unlike Cfront, unary and binary operators using the same symbol have
different encodings.  All operators are encoded using exactly two letters,
the first of which is lowercase.
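
The table can be transcribed into a lookup structure and the two properties
stated above checked mechanically.  The sketch below is in Python; the keys
"unary -", "unary &", "unary *" and "(cast)" are labels chosen here for
illustration, and "table_is_well_formed" is a hypothetical helper.

```python
# Two-letter operator codes copied from the table above.  The unary forms of
# -, & and * have their own codes (ng, ad, de), distinct from the binary ones.
OPERATOR_CODES = {
    "new": "nw", "new[]": "na", "delete": "dl", "delete[]": "da",
    "unary -": "ng", "unary &": "ad", "unary *": "de", "~": "co",
    "+": "pl", "-": "mi", "*": "ml", "/": "dv", "%": "rm",
    "&": "an", "|": "or", "^": "eo",
    "=": "aS", "+=": "pL", "-=": "mI", "*=": "mL", "/=": "dV", "%=": "rM",
    "&=": "aN", "|=": "oR", "^=": "eO",
    "<<": "ls", ">>": "rs", "<<=": "lS", ">>=": "rS",
    "==": "eq", "!=": "ne", "<": "lt", ">": "gt", "<=": "le", ">=": "ge",
    "!": "nt", "&&": "aa", "||": "oo", "++": "pp", "--": "mm",
    ",": "cm", "->*": "pm", "->": "pt", "()": "cl", "[]": "ix",
    "?": "qu", "(cast)": "cv",
}


def table_is_well_formed(codes):
    """Check: two letters each, first one lowercase, no code used twice."""
    values = list(codes.values())
    two_letters = all(len(c) == 2 and c.isalpha() for c in values)
    first_lower = all(c[0].islower() for c in values)
    unique = len(set(values)) == len(values)
    return two_letters and first_lower and unique
```

Note that the comparison is case-sensitive: "pl" (+) and "pL" (+=) are
distinct codes.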

Other special functions and entities
------------------------------------

   TV            virtual table
   TI            typeinfo structure
   C1            complete object constructor
   C2            base object constructor
   D0            deleting destructor
   D1            complete object destructor
   D2            base object destructor


Type encodings
--------------
Types are encoded as follows:

builtin types: (one letter)
   void                     v
   wchar_t                  w
   bool                     b
   char                     c
   signed char              a
   unsigned char            h
   short                    s
   unsigned short           t
   int                      i
   unsigned int             j
   long                     l
   unsigned long            m
   long long                x
   unsigned long long       y
   float                    f
   double                   d
   long double              e
   ellipsis                 z

classes & enums:
   <decimal length of unqualified name><unqualified-name>
   Class names can optionally be followed by the encoding of a template
   argument list (see below).

template params (including nontype parameters):
   T<param num>_
   TT<param num>_  // For template template parameters

other dependent names: (see below)
   N<qual 1>...<qual N><unqual name>E

template argument list: (see below)
   I<arg1>...<argN>E

function types:
   F<return type><param type 1>...<param type N>E

array types:
   A<dimension>_<element type encoding>

pointers, references:
   P<encoding pointed-to type>
   R<encoding pointed-to type>

pointer-to-member:
   M<class type encoding><member type encoding>

cv-qualifiers:
   K const
   V volatile
   r restrict
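
To make the composition rules concrete, here is a small recursive encoder
sketched in Python.  The tagged-tuple representation of types and the name
"encode_type" are assumptions made for this illustration.  It also assumes
that a cv-qualifier letter is written immediately before the encoding of the
type it qualifies, which is how "X const&" comes out as "RK1X" in the
examples.  Template parameters, dependent names and template argument lists
are left out.

```python
BUILTINS = {
    "void": "v", "wchar_t": "w", "bool": "b", "char": "c",
    "signed char": "a", "unsigned char": "h", "short": "s",
    "unsigned short": "t", "int": "i", "unsigned int": "j",
    "long": "l", "unsigned long": "m", "long long": "x",
    "unsigned long long": "y", "float": "f", "double": "d",
    "long double": "e", "...": "z",
}

# One-letter markers that wrap a single inner type.
WRAPPERS = {"ptr": "P", "ref": "R", "const": "K", "volatile": "V", "restrict": "r"}


def encode_type(t):
    """Encode a type given as a builtin name or a tagged tuple (hypothetical)."""
    if isinstance(t, str):
        return BUILTINS[t]
    tag = t[0]
    if tag in WRAPPERS:                     # ("ptr", inner), ("const", inner), ...
        return WRAPPERS[tag] + encode_type(t[1])
    if tag == "class":                      # classes & enums: <length><name>
        return f"{len(t[1])}{t[1]}"
    if tag == "array":                      # ("array", dimension, element)
        return f"A{t[1]}_" + encode_type(t[2])
    if tag == "memptr":                     # ("memptr", class, member type)
        return "M" + encode_type(t[1]) + encode_type(t[2])
    if tag == "func":                       # ("func", return, [params])
        params = "".join(encode_type(p) for p in t[2])
        return "F" + encode_type(t[1]) + params + "E"
    raise ValueError(f"unknown type tag: {tag!r}")
```

With this representation, ("ref", ("const", ("class", "X"))) stands for
"X const&" and encodes to "RK1X".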


Scope encoding
--------------
Namespace names are encoded like those of classes and enumerations.
The encoding for the <scope> segment (i.e., a qualifier) has the following
format:
   Q<qual 1>...<qual N>E
where each <qualJ> is the encoding of a class name or a namespace name.

Scope can also appear inside <type> to denote dependent types or bind
specific names as arguments. In that case the format is:
   N<qual 1>...<qual N><unqual name>E
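
Both formats are simple concatenations, as the Python sketch below shows.
The function names are hypothetical.  The sketch follows the formats as
written in this section, including the closing "E" of the "Q...E" form; note
that the examples at the end of this document write the <scope> segment
without that closing "E".

```python
def encode_name(name):
    """Class, enum and namespace names: <decimal length><name>."""
    return f"{len(name)}{name}"


def scope_segment(qualifiers):
    """<scope> segment in the written format Q<qual 1>...<qual N>E."""
    return "Q" + "".join(qualifiers) + "E"


def nested_name(qualifiers, unqualified):
    """Qualified name inside <type>: N<qual 1>...<qual N><unqual name>E."""
    return "N" + "".join(qualifiers) + unqualified + "E"
```

The qualifiers are passed in already encoded, so a template-id such as
"1AIT2_E" can be used as a qualifier unchanged.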


Template argument encoding
--------------------------
Template-ids are encoded by following the unqualified name with
	I<arguments>E
This is used for the <specialization> segment in particular, but also in the
<type> and <scope> segments.

Type arguments appear using their regular encoding.  For example, the
template class "A<char, float>" is encoded as "1AIcfE".  A slightly more
involved example might be a dependent function parameter type "A<T2>::X"
(T2 is the second template parameter) which is encoded as "N1AIT2_E1XE",
where the "N...E" construct is used to describe a qualified name.

Nontype arguments can be:
   a) a literal, e.g. "A<42L>": these are encoded as "L<num><type>E";
      hence "A<42L>" becomes "1AIL42lEE". (false is "0b"; true is "1b")
   b) a reference to an entity with external linkage: encoded with
      "L<mangled name>E".  For example:
          void foo(char); // mangled as _13fooc
          template<void (&)(char)> struct CB;
          // CB<foo> is encoded with "2CBIL_13foocEE"
   c) an expression, e.g., "B<(J+1)/2>" is encoded with a prefix traversal
      of the operators involved, delimited by "X...E".  The operators are
      encoded using their two letter mangled names.  For example, "B<(J+1)/2>"
      becomes "1BI Xdv pl T1_ L1iE L2iE E E" (the blanks were inserted to
      visualize the decomposition).
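
The three cases above can be reproduced with a few lines of Python.  This is
only a sketch: the function names and the tuple representation of expression
trees are assumptions made here, and operands such as "T1_" are passed in
already encoded.

```python
def literal(value, type_code):
    """Literal nontype argument: L<num><type>E, e.g. 42L -> L42lE."""
    return f"L{value}{type_code}E"


def prefix_walk(node):
    """Prefix traversal: operator nodes are (code, operand, ...); leaves are strings."""
    if isinstance(node, str):
        return node
    code, *operands = node
    return code + "".join(prefix_walk(o) for o in operands)


def expression(tree):
    """Expression nontype argument, delimited by X...E."""
    return "X" + prefix_walk(tree) + "E"


def template_id(name, args):
    """Template-id: <length><name>I<arguments>E."""
    return f"{len(name)}{name}I" + "".join(args) + "E"


# (J+1)/2 with J as the first template parameter: dv(pl(T1_, 1), 2)
half_rounded_up = ("dv", ("pl", "T1_", literal(1, "i")), literal(2, "i"))
```

Here template_id("A", ["c", "f"]) gives "1AIcfE", template_id("A",
[literal(42, "l")]) gives "1AIL42lEE", and template_id("B",
[expression(half_rounded_up)]) gives "1BIXdvplT1_L1iEL2iEEE", which is the
string from case c) with the blanks removed.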
    
Compression
-----------
The subsequence
   S<num>_
is used to repeat the num-th most recently encoded type (in right-to-left
order, starting at "1"), but only if "S<num>_" is strictly shorter than the
previous encoding.
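
A minimal Python sketch of this rule follows.  The function name "compress"
is hypothetical, and the text above does not pin down exactly which
substrings count as "encoded types", so the list of earlier encodings is
simply supplied by the caller, oldest first.

```python
def compress(encoded_types, new_encoding):
    """Return S<num>_ when new_encoding repeats an earlier type and the
    back-reference is strictly shorter; otherwise return new_encoding."""
    # Number the earlier encodings right to left, starting at 1.
    for num, earlier in enumerate(reversed(encoded_types), start=1):
        if earlier == new_encoding:
            backref = f"S{num}_"
            if len(backref) < len(new_encoding):
                return backref
            # An older duplicate would have a larger number, so it cannot
            # be shorter either.
            break
    return new_encoding
```

For example, compress(["RK1X"], "RK1X") returns "S1_", whereas
compress(["1X"], "1X") returns "1X" because "S1_" is longer than "1X".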


Truncation
----------
If the mangled name exceeds 255 characters in length, it is reduced as
follows:
    (description of strong hash and truncation)



Examples
--------

1) "f": The C function or variable "f" or a file scope variable "f".

2) "_11f": Ret? f();

3) "_11fi": Ret? f(int);

4) "_13foo3bar": Ret? foo(bar);

5) "_3rm1X1X": Ret? operator%(X, X);

6) "_3plR1XR1X": Ret? operator+(X&, X&);

7) "_3lsRK1XS1_": Ret? operator<<(X const&, X const&);
       (Note: strlen("S1_")<strlen("RK1X"))

8) "_21fIiE": void f<int>();

9) "_21fIiEvi": void f<int>(/*nondependent*/int);
       (Note: the return type is always explicitly encoded for template
              functions taking parameters.)

10) "_25firstI3DuoEvS2_": void first<Duo>(/*nondependent*/Duo);
       (Note: "S1_" would refer to the "void" return type.)

11) "_25firstI3DuoEvT1_": void first<Duo>(/*T1=*/Duo);

12) "_11fQ1N": Ret? N::f();

13) "_14beepQ6System5Sound": Ret? System::Sound::beep();

14) "_05levelQ5Arena": Type? Arena::level;

15) "_05levelQ5StackIiiE": Type? Stack<int, int>::level;

16) "_21fI1XEvPVN1AIT1_E1TE": void f<X>(A</*T1=*/X>::T volatile*);
                |         |
                |         `------> end dependent name encoding
                `----------------> start of dependent name A<T1>::T

17) "_4ngIL42iEEvN1AIXplT1_L2iEEE1TE": void operator-</*int J=*/42>(A<J+2>::T);
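
Reading the examples in the other direction, the leading segments of a
nonoperator name can be split off mechanically.  The Python sketch below is
an illustration only: "split_head" is a hypothetical helper, it handles just
the prefixes "_0", "_1" and "_2", and it relies on names never starting with
a digit so that the decimal length is unambiguous.  Operator and special
names are not handled.

```python
import re


def split_head(mangled):
    """Split a mangled nonoperator name into (prefix, name, rest).

    Returns None for unmangled names and for prefixes not handled here.
    """
    m = re.match(r"_([012])(\d+)", mangled)
    if m is None:
        return None
    length = int(m.group(2))
    start = m.end()
    name = mangled[start:start + length]
    if len(name) != length:     # the string is too short for the stated length
        return None
    return "_" + m.group(1), name, mangled[start + length:]
```

Applied to example 4, split_head("_13foo3bar") returns ("_1", "foo",
"3bar"); the unmangled name "f" from example 1 yields None.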


