[cxx-abi-dev] need mangling for string literals and lifetime-extended temporaries used in static constexpr member initializers and for string literals in constexpr functions

Fri May 24 08:10:43 UTC 2013

On May 23, 2013, at 11:29 PM, Richard Smith <richardsmith at google.com> wrote:
> On Thu, May 23, 2013 at 10:41 PM, John McCall <rjmccall at apple.com> wrote:
> On May 23, 2013, at 10:23 PM, Richard Smith <richardsmith at google.com> wrote:
> > So... this problem was not really new in C++11. In C++98 it can be witnessed for an inline function such as:
> >
> > inline const char *get() {
> >   static const char *str = "foo";
> >   return str;
> > }
> 
> How is this different from the following?
> 
>   inline const char *get_nostatic() { return "foo"; }
> 
> or
> 
>   inline const char *get_separate() {
>     const char *temp = "foo";
>     static const char *str = temp;
>     return str;
>   }
> 
> Please find or add something in the standard which will allow us to
> not export a symbol for every string literal(*) that happens to be used
> in a function with weak linkage.
> 
> Finding failed. In addition to the implications of the ODR, we have this:
> 
> [dcl.fct.spec]p4: "A string literal in the body of an extern inline function is the same object in different translation units."

This is a really terrible language requirement.  Does anyone actually do what's necessary for this?  I really can't imagine actually implementing it;  it would be a *ton* of new extern symbols.
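
To make the requirement concrete, here is a minimal single-file sketch of the two shapes under discussion. The helper check_get() is hypothetical and only there for illustration. A one-file program can show only the within-program behaviour; the cross-translation-unit identity that [dcl.fct.spec]p4 asks for needs at least two object files and is not checked here.

```cpp
#include <cassert>
#include <cstring>

// Same shape as the quoted example: the static pointer is initialized
// once, so every call returns whatever address was stored that first time.
inline const char *get() {
  static const char *str = "foo";
  return str;
}

// Returns the literal directly, with no static variable involved.
inline const char *get_nostatic() { return "foo"; }

// Hypothetical helper (not from the thread): checks only what a single
// translation unit can observe.
inline void check_get() {
  assert(get() == get());                           // stored pointer is stable
  assert(std::strcmp(get(), "foo") == 0);           // contents are as written
  assert(std::strcmp(get_nostatic(), "foo") == 0);  // contents, not address
}
```

Note that the sketch deliberately does not compare get() against get_nostatic(): whether two separate literals with the same contents share storage is up to the implementation.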

> On the adding front, perhaps the simplest way to avoid generating such extra symbols (at least, in most cases) would be to specify that a string literal expression may produce the address of a different (static storage duration) object each time it is evaluated.  However, even if we allow that, I don't think it's reasonable for an unchanging static storage duration pointer or reference to point at different objects depending on who is asking.

I agree;  I just really don't want to have to export unique symbols for every logging statement in an inline function.

So, let's see.  I see two basic language designs and implementation strategies.

1.  The first is that the source location of a string literal (function / initializer where it appears and its source order therewithin) is actually a crucial semantic property that compilers have to track/update through everything.  (Source order becomes a really interesting question when you consider default argument expressions.)  Not every string literal is blessed this way;  just the ones that show up in (1) inline functions or (2) initializers of (weak-linkage) constexpr variables with static storage duration.  This is a major implementation pain, and it becomes a bizarre new pervasive cost of C++ just to satisfy a requirement that very, very few people care about.  Hooray.

2.  The second is that we somehow limit this problem to just initializing an object of static storage duration.

There are three places where we can have initializers for the same object in different translation units:
  - constexpr static data members
  - static data members of a class template
  - static local variables in inline functions
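
For reference, a rough sketch of what those three cases look like in a header. All of the names (Config, Holder, local_name, check_initializers) are made up for illustration and the literals are arbitrary.

```cpp
#include <cassert>
#include <cstring>

// (1) constexpr static data member: the initializer sits in the class
// body, so every translation unit that includes the header sees it.
struct Config {
  static constexpr const char *name = "foo";
};

// (2) static data member of a class template: any translation unit
// that instantiates Holder<T>::label may emit the definition.
template <typename T>
struct Holder {
  static const char *label;
};
template <typename T>
const char *Holder<T>::label = "bar";

// (3) static local variable in an inline function.
inline const char *local_name() {
  static const char *str = "baz";
  return str;
}

// Hypothetical helper: reads each variable by value and checks contents.
inline void check_initializers() {
  assert(std::strcmp(Config::name, "foo") == 0);
  assert(std::strcmp(Holder<int>::label, "bar") == 0);
  assert(std::strcmp(local_name(), "baz") == 0);
}
```

In each case the variable is a single object across the program, but its initializer, string literal included, is compiled separately in every translation unit that sees it.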

The constexpr and non-constexpr cases are subtly different.

In the constexpr case, we know that everybody agrees that the initializer can be constant-evaluated, and we can assume that everybody evaluates it to the same constant.  This gives us a number of ways to stably identify sub-objects in the variable.  If we actually have to emit the definition, that's easy enough, too.

In the non-constexpr case, we don't know that, and we have to compile the code as if there was a possibility that somebody might have emitted it as a dynamic initializer.  So I think we can't make any assumptions about string-literal pointer values stored in the variable;  we always have to load them out, which is really unfortunate.

Also, this entire approach seems to make the presence of 'constexpr' affect ABI.  (It does get caught by ODR, so that's *legal*, but I don't know that it's *a good idea*.)

It's also unclear what *parts* of any given initializer will be constant-initialized vs. dynamically-initialized;  consider:
  inline const char *second(const char *a, const char *b) { return b; }
  inline const char *ident(const char *s) { return s; }
  ...
  inline void test() {
    static const char *strs[] = { second("a", "b"), ident("c"), "d" };
  }
The only part that's "guaranteed" to be constant-initialized is the third element, but a compiler which does constant-initialize this can get both of the others.  And note that the string literals we use aren't 1-1 with the string literals in the initializer;  the uniquing scheme needs to be positional within the initialized object to ensure that different translation units use the same thing.  (That is, "d" would have to be mangled as "_Z4testEv::strs[2]".)
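
A runnable sketch of the positional point, assuming a variant of test() that hands back the array so the slots can be inspected (test_strs and check_slots are hypothetical names, and the (void)a is only there to silence unused-parameter warnings):

```cpp
#include <cassert>
#include <cstring>

inline const char *second(const char *a, const char *b) { (void)a; return b; }
inline const char *ident(const char *s) { return s; }

// Hypothetical variant of test(): same initializer, but returns the array.
inline const char *const *test_strs() {
  static const char *strs[] = { second("a", "b"), ident("c"), "d" };
  return strs;
}

// The initializer mentions four literals, but only three end up stored,
// and which literal lands in which slot depends on evaluating the calls.
inline void check_slots() {
  const char *const *s = test_strs();
  assert(std::strcmp(s[0], "b") == 0);  // "a" is never stored
  assert(std::strcmp(s[1], "c") == 0);
  assert(std::strcmp(s[2], "d") == 0);
}
```

The literal "a" never reaches the array, so numbering literals by source order would not line up with the slots; naming them by slot (strs[0], strs[1], strs[2]) is the only thing every translation unit is sure to agree on.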

I think the right solution is to:
  - concede that (1) is the simpler language and implementation design but
  - nonetheless refuse to implement it due to an insufficient indignant-user count (and a reasonable suspicion of seeing a higher indignant-user count if we did).

In practice, I believe most linkers will coalesce common string values within a linkage unit, which is all that even the few people who care about this actually want.

John.