Local name discriminators

Mon Jun 22 18:56:56 UTC 2009

5.1.6 "Scope Encoding" has this to say (among other things):

> Occasionally entities in local scopes must be mangled too (e.g.  
> because inlining or template compilation causes multiple translation  
> units to require access to that entity). The encoding for such  
> entities is as follows:
>
> <local-name> := Z <function encoding> E <entity name> [<discriminator>]
>              := Z <function encoding> E s [<discriminator>]
> <discriminator> := _ <non-negative number>
>
> The first production is used for named local static objects and  
> classes, which are identified by their declared names. The <entity  
> name> may itself be a compound name, but it is relative to the  
> closest enclosing function, i.e. none of the components of the  
> function encoding appear in the entity name.

This seems to suggest that the first production doesn't apply to  
member functions of local classes, nor to local enumeration types.

I assume that's unintentional since the next sentence says:

> It is possible to have nested function scopes, e.g. when dealing  
> with a member function in a local class. In such cases, the function  
> encoding will itself have <local-name> structure.

and the other production of <local-name> doesn't apply at all.

Now consider the following example:

	void x() {
	  { struct X {}; }
	  struct X {
	    void foo() { foo(); } // #1
	  } x1;
	  x1.foo();
	  { struct X {
	      void foo() { foo(); }  // #2
	    } x2;
	    x2.foo();
	  }
	}

g++ produces the following mangled names for the X::foo members:

	_ZZ1xvEN1X3fooE_0v  for #1
	_ZZ1xvEN1X3fooE_1v  for #2

Note that both have discriminators, for which the spec says:
> The discriminator is used only for the second and later occurrences  
> of the same name within a single function. In this case <number> is  
> n - 2, if this is the nth occurrence, in lexical order, of the given  
> name.

The "same name" here is X::foo and #1 is the first occurrence (no
discriminator needed) while #2 is the second occurrence (discriminator
value 0).  So I would've expected instead:

	_ZZ1xvEN1X3fooEv    for #1
	_ZZ1xvEN1X3fooE_0v  for #2

Is this correct?
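
For reference, the reading of the quoted rule that leads to those expected names can be sketched as follows. This is only an illustration of the "n - 2" wording; the helper names discriminator_suffix and local_name are hypothetical and do not come from the spec or from any compiler.

```cpp
#include <string>

// Hypothetical helper: suffix for the nth occurrence (1-based, in lexical
// order) of a given local name within a single function.
std::string discriminator_suffix(unsigned nth_occurrence) {
  if (nth_occurrence < 2) return "";                // first occurrence: no discriminator
  return "_" + std::to_string(nth_occurrence - 2);  // nth occurrence: _<n - 2>
}

// Hypothetical helper: builds
//   Z <function encoding> E <entity name> [<discriminator>]
std::string local_name(const std::string& function_encoding,
                       const std::string& entity_name,
                       unsigned nth_occurrence) {
  return "Z" + function_encoding + "E" + entity_name +
         discriminator_suffix(nth_occurrence);
}
```

Under this reading, "_Z" + local_name("1xv", "N1X3fooE", 1) + "v" gives _ZZ1xvEN1X3fooEv for #1, and passing 2 as the occurrence gives _ZZ1xvEN1X3fooE_0v for #2.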

(EDG has a different interpretation:

	_ZZ1xvEN1X_03fooEv  for #1

	_ZZ1xvEN1X_13fooEv  for #2

but that's a bug too.)

We could change the spec to ensure that g++'s approach is
"standard" (assuming that I understand what g++ really does here).
I.e., we could specify that the "discriminator" discriminates the
"top-level component" of colliding local names.  I think that would be
sufficient.

Now consider a different example:

	class C {} c;

	inline int g() {
	  { struct X {}; }
	  { struct X {}; }
	  struct X {} x;
	  struct Y { int f(X x, C c) { return f(x, c); } } y;
	  return y.f(x, c) + g();
	}

	int main() {
	  return g();
	}

(The unbounded recursion is just a quick hack to force compilers to
emit out-of-line copies of the inline functions.)

The mangling for g()::Y::f is

	ZZ1gvEN1Y1fEZ1gvE1X_11C
	                   ^^^

The problem here is that there is no delimiter after the discriminator  
"_1" to separate it from the "1" that indicates the length of the  
class name "C".  So this cannot in general be demangled.  (Such  
situations become more common in C++0x where local classes can be  
template arguments.)
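
To illustrate the missing delimiter, here is a minimal sketch of a discriminator reader that consumes digits greedily, which is the natural way to read "_ <non-negative number>". It assumes a demangler written this way; the type Discriminator and the function read_discriminator_greedy are hypothetical names, not taken from any existing demangler.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

struct Discriminator {
  unsigned long value;  // number read after the '_'
  std::string rest;     // input left unconsumed
};

// Hypothetical greedy reader for: <discriminator> := _ <non-negative number>
// Precondition (assumed, not checked): text starts with '_' and a digit.
Discriminator read_discriminator_greedy(const std::string& text) {
  std::size_t pos = 1;  // skip the leading '_'
  unsigned long value = 0;
  while (pos < text.size() &&
         std::isdigit(static_cast<unsigned char>(text[pos]))) {
    value = value * 10 + static_cast<unsigned long>(text[pos] - '0');
    ++pos;
  }
  return {value, text.substr(pos)};
}
```

Given the tail "_11C" of the name above, this reader returns the value 11 and leaves "C", whereas the intended reading is discriminator 1 followed by the source name "1C".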

Addressing this requires a change that is technically ABI breakage,  
but I think we can do it so that real-world programs are highly  
unlikely to break by saying that a <discriminator> is "_<n>" for <n>  
<= 9 (that's unchanged), but "__<n>_" when <n> >= 10 (I assume here  
that <n> >= 10 doesn't happen in real programs).
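
A minimal sketch of how the proposed encoding could be written and read back, assuming exactly the rule stated above. The names encode_discriminator and decode_discriminator are hypothetical, and rejecting the long form for values below 10 is my assumption rather than part of the proposal.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical encoder: "_<n>" for n <= 9, "__<n>_" for n >= 10.
std::string encode_discriminator(unsigned long n) {
  if (n <= 9) return "_" + std::to_string(n);
  return "__" + std::to_string(n) + "_";
}

// Hypothetical decoder: stores the value in n and returns the number of
// characters consumed, or 0 if text does not start with a discriminator.
// Overflow of very long digit runs is not handled in this sketch.
std::size_t decode_discriminator(const std::string& text, unsigned long& n) {
  if (text.size() < 2 || text[0] != '_') return 0;
  if (std::isdigit(static_cast<unsigned char>(text[1]))) {
    n = static_cast<unsigned long>(text[1] - '0');  // short form: one digit only
    return 2;
  }
  if (text[1] != '_') return 0;
  std::size_t pos = 2;
  unsigned long value = 0;
  while (pos < text.size() &&
         std::isdigit(static_cast<unsigned char>(text[pos]))) {
    value = value * 10 + static_cast<unsigned long>(text[pos] - '0');
    ++pos;
  }
  // The long form needs at least one digit and a closing '_'.
  if (pos == 2 || pos >= text.size() || text[pos] != '_') return 0;
  if (value < 10) return 0;  // assumption: long form is reserved for n >= 10
  n = value;
  return pos + 1;
}
```

With this reader, "_11C" decodes as discriminator 1 with "1C" left over, and a discriminator of 11 followed by the same class name would be written "__11_1C", so the two cases no longer collide.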

Any thoughts?

	Daveed

P.S.: There are related issues with unnamed local classes in C++0x,  
but I plan to address those along with closure types in a separate  
proposal.