From rjmccall at apple.com Thu Feb 5 00:11:03 2015
From: rjmccall at apple.com (John McCall)
Date: Wed, 4 Feb 2015 16:11:03 -0800
Subject: [cxx-abi-dev] Layout of overaligned empty base classes

> On Apr 24, 2014, at 2:52 PM, Richard Smith wrote:
> 2.4/II/3 says:
>
> "If D is an empty proper base class: [...potentially misalign the D base class...] Note that nvalign(D) is 1, so no update of align(C) is needed."
>
> This is not true.
>
> struct A {};
> struct B : A { char c; };
> struct alignas(4) D : A {};
> struct C : B, D {};
>
> This puts a D object at offset 1 within C, and gives C nvalign of 1, which is obviously not right. Fortunately, GCC, Clang, and EDG all deviate from the ABI and instead do the natural thing here (put it at offset zero if you can, and otherwise allocate it like any other subobject). Looks like the wording only needs a little massaging here to say the right thing.

Catching up on my queue, and I found this very old email. Would you mind proposing an exact wording change?

John.

From richardsmith at google.com Thu Feb 5 00:26:41 2015
From: richardsmith at google.com (Richard Smith)
Date: Wed, 4 Feb 2015 16:26:41 -0800
Subject: [cxx-abi-dev] Layout of overaligned empty base classes

Change the final paragraph of 2.4/II/3 as follows:

"""
Once offset(D) has been chosen, update sizeof(C) to max (sizeof(C), offset(D)+sizeof(D))
, and align(C) to max (align(C), nvalign(D)).
Note that nvalign(D) is 1, so no update of align(C) is needed.
Similarly, since D is an empty base class, no update of dsize(C) is needed.
"""

On 4 February 2015 at 16:11, John McCall wrote:
> > On Apr 24, 2014, at 2:52 PM, Richard Smith wrote:
> > 2.4/II/3 says:
> >
> > "If D is an empty proper base class: [...potentially misalign the D base class...] Note that nvalign(D) is 1, so no update of align(C) is needed."
> >
> > This is not true.
> >
> > struct A {};
> > struct B : A { char c; };
> > struct alignas(4) D : A {};
> > struct C : B, D {};
> >
> > This puts a D object at offset 1 within C, and gives C nvalign of 1, which is obviously not right. Fortunately, GCC, Clang, and EDG all deviate from the ABI and instead do the natural thing here (put it at offset zero if you can, and otherwise allocate it like any other subobject). Looks like the wording only needs a little massaging here to say the right thing.
>
> Catching up on my queue, and I found this very old email. Would you mind proposing an exact wording change?
>
> John.

From rjmccall at apple.com Thu Feb 5 06:53:02 2015
From: rjmccall at apple.com (John McCall)
Date: Wed, 4 Feb 2015 22:53:02 -0800
Subject: [cxx-abi-dev] Mangling of reference temporaries
References: <30E56083-5E76-4D39-BA4B-AA477E93E36B@apple.com> <41A87A23-895A-4429-89A2-0C04073BF39D@apple.com>

> On Jan 9, 2015, at 5:14 PM, David Majnemer wrote:
>
> On Tue, May 6, 2014 at 4:46 PM, David Majnemer wrote:
> On Mon, May 5, 2014 at 1:36 PM, John McCall wrote:
> On May 5, 2014, at 1:32 PM, Richard Smith wrote:
>> On 5 May 2014 12:10, John McCall wrote:
>> On May 5, 2014, at 11:07 AM, Richard Smith wrote:
>>> On 5 May 2014 10:14, John McCall wrote:
>>> On May 5, 2014, at 10:02 AM, Richard Smith wrote:
>>>> On 5 May 2014 09:13, John McCall wrote:
>>>> On May 4, 2014, at 8:00 PM, David Majnemer wrote:
>>>> > The Itanium ABI does not seem to provide a mangling for reference temporaries.
>>>> >
>>>> > Consider the following:
>>>> > struct A { const int (&x)[3]; };
>>>> > struct B { const A (&x)[2]; };
>>>> > template B &&b = { { { { 1, 2, 3 } }, { { 4, 5, 6 } } } };
>>>> > B &temp = b;
>>>> >
>>>> > The temporaries created by instantiating b must be the same in all translation units.
>>>> >
>>>> > To satisfy this requirement, I propose that we mangle the temporaries in lexical order using a mangling similar to what GCC 4.9 uses and identical to what trunk clang uses.
>>>>
>>>> What does GCC do?
>>>>
>>>> GCC trunk seems to use
>>>>
>>>> ::= GR
>>>>
>>>> where the first reference temporary gets number 0, and so on. It appears to number them through a post-order tree walk of the expression. Older versions of GCC did not add a number, IIRC.
>>>
>>> Okay. So we have two different manglings out there that both look basically the same except for an off-by-one and a major semantic ordering difference. I think we should either standardize on one or the other or switch to a different prefix entirely.
>>>
>>> Looking at the GCC output again, I see:
>>> * GCC actually does seem to be using lexical order (of the start of the expression) after all (at least in the std::initializer_list array temporary case).
>>> * GCC emits these symbols with internal linkage.
>>>
>>> So I don't think there's any compatibility problem with GCC.
>>
>> Okay.
>>
>>> Has the clang mangling actually been used in a released compiler, or did it just get implemented?
>>>
>>> Sort of? Until very recently, Clang used the same mangling for all the temporaries, and added numbers to disambiguate, so we got the current proposal by accident (except the numbering starts from 1 instead of from 0) -- at least, in some cases: Clang would number the temporaries in a different order if they were initialized by constant expressions (because it happened to emit them in a different order).
>>
>> Yeah, we don't need to work to maintain compatibility with that.
>>
>>> Hmm. Putting a after a requires demangler lookahead, doesn't it?
>>>
>>> is self-delimiting, so a demangler can walk over it, then read digits until it sees a non-digit or end-of-mangled-name. (s are only nested if they appear within a , which has a terminating E.) Not sure if that addresses your concern, though.
>>
>> Ah, right, I was thinking of .
>>
>> Let's just follow the example of , which is basically what you're proposing except a instead of a and always followed by a _.
>>
>> Compared to the previous proposal (without the _), that's an ABI break for Clang in the overwhelmingly common case where a declaration lifetime-extends a single temporary, but I can live with it.
>
> Yeah, I'm comfortable with this.
>
>> Do you want someone to provide wording for the ABI document?
>
> Sure, might as well re-submit the proposal. It would be nice to get some feedback from someone not working on Clang, however.
>
> To implement support for mangling reference temporaries:
>
> 1. An additional non-terminal production should be added:
>
> ::= GR [ ] _ # Reference temporaries
>
> The is strictly the lexical order in which the reference temporary was written in the source.
>
> The following exists as a practical example:
>
> _ZGR1bIvE_ would be given to the 'B' object that 't' would refer to.
> _ZGR1bIvE0_ would be given to the array of 'A' object references
> _ZGR1bIvE1_ would be given to the object containing the first array of ints, {1, 2, 3}
> _ZGR1bIvE2_ would be given to the object containing the second array of ints, {4, 5, 6}
>
> 2. The text describing should probably refrain from mentioning substitutable entities.
>
> Does anything else need to happen to get this added to the ABI document?

It should be there now. Sorry for the delay.

John.
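The numbering scheme in David's proposal can be checked mechanically against his four example names. The sketch below is a minimal illustration, not any compiler's real mangler. It assumes the caller passes the object's own encoding without the leading `_Z` (`1bIvE` for `b<void>`, read off the examples above), that the optional sequence number uses the ABI's usual base-36 digits (0-9, then A-Z), and that the first temporary gets no number at all. `seq_id` and `reference_temporary_symbol` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Encode n as a base-36 sequence number using digits 0-9 then A-Z
// (the same digit set the ABI uses for substitution indices).
std::string seq_id(unsigned n) {
  static const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  std::string out;
  do {
    out.insert(out.begin(), digits[n % 36]);
    n /= 36;
  } while (n != 0);
  return out;
}

// Hypothetical helper: symbol for the index-th reference temporary
// (0-based, in lexical order) extended by the object whose encoding,
// without the leading "_Z", is object_encoding (e.g. "1bIvE").
// The first temporary gets no sequence number; later ones get seq_id(index - 1).
std::string reference_temporary_symbol(const std::string &object_encoding,
                                       unsigned index) {
  std::string symbol = "_ZGR" + object_encoding;
  if (index > 0) {
    symbol += seq_id(index - 1);
  }
  symbol += '_';
  return symbol;
}
```

Under these assumptions, indices 0 through 3 reproduce `_ZGR1bIvE_`, `_ZGR1bIvE0_`, `_ZGR1bIvE1_` and `_ZGR1bIvE2_`; index 36 gets `Z`, and index 37 is the first to need two digits (`10`).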

From rjmccall at apple.com Thu Feb 5 07:15:28 2015
From: rjmccall at apple.com (John McCall)
Date: Wed, 4 Feb 2015 23:15:28 -0800
Subject: [cxx-abi-dev] Layout of overaligned empty base classes

> On Feb 4, 2015, at 4:26 PM, Richard Smith wrote:
>
> Change the final paragraph of 2.4/II/3 as follows:
>
> """
> Once offset(D) has been chosen, update sizeof(C) to max (sizeof(C), offset(D)+sizeof(D))
> , and align(C) to max (align(C), nvalign(D)).
> Note that nvalign(D) is 1, so no update of align(C) is needed.
> Similarly, since D is an empty base class, no update of dsize(C) is needed.
> """

Applied, thanks!

John.

From richardsmith at google.com Wed Feb 18 19:46:37 2015
From: richardsmith at google.com (Richard Smith)
Date: Wed, 18 Feb 2015 11:46:37 -0800
Subject: [cxx-abi-dev] missing mangling for in

Consider these two cases:

template struct X { struct Y {}; };

template class U> decltype(X().~U()) f();
template class U> decltype(X::Y().U::Y::~Y()) g();

Neither of these function templates has a mangling. We get to for the destructor name, and find a template template parameter with template args, which we cannot mangle as an , and must not mangle as a (because the name of the template template parameter can change between redeclarations).

Suggested fix: U should be an . Replace

::=

with

::= [ ]

... which results, I think, in these manglings for f and g:

_Z1fI1XEDTcldtcvS0_IiE_EdnT_IiEEEv
_Z1gI1XEDTcldtcvNS0_IiE1YE_EsrNT_IiE1YEdn1YEEv

(Clang trunk implements this, but gets the g mangling wrong for other reasons.)

OK?
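The layout example from the 2.4/II/3 thread, whose wording change John applied above, can be compiled directly to see what a compiler actually does. The probe below measures where the `D` base lands inside `C` by converting a `C*` to a `D*`. It assumes an Itanium-ABI compiler such as GCC or Clang and deliberately checks only what the corrected rule guarantees: `D` cannot sit at offset 0 (its `A` base would then share an address with `B`'s `A` base), and both `offset(D)` and `align(C)` must respect `D`'s 4-byte alignment. Allocating `D` like any other subobject would place it at dsize(C) = 1 rounded up to nvalign(D) = 4, i.e. offset 4, rather than the offset 1 the old wording produced.

```cpp
#include <cassert>
#include <cstddef>

struct A {};
struct B : A { char c; };
struct alignas(4) D : A {};
struct C : B, D {};

// Byte offset of the D base subobject within a C object, measured by
// converting a C* to a D* and taking the pointer difference.
std::ptrdiff_t offset_of_D_in_C() {
  C obj{};
  const char *whole = reinterpret_cast<const char *>(&obj);
  const char *base =
      reinterpret_cast<const char *>(static_cast<const D *>(&obj));
  return base - whole;
}
```

Printing `offset_of_D_in_C()` and `alignof(C)` from a small `main` is an easy way to compare compilers.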

From rjmccall at apple.com Wed Feb 18 21:04:57 2015
From: rjmccall at apple.com (John McCall)
Date: Wed, 18 Feb 2015 13:04:57 -0800
Subject: [cxx-abi-dev] missing mangling for in

> On Feb 18, 2015, at 11:46 AM, Richard Smith wrote:
> Consider these two cases:
>
> template struct X { struct Y {}; };
>
> template class U> decltype(X().~U()) f();
> template class U> decltype(X::Y().U::Y::~Y()) g();
>
> Neither of these function templates has a mangling. We get to for the destructor name, and find a template template parameter with template args, which we cannot mangle as an , and must not mangle as a (because the name of the template template parameter can change between redeclarations).
>
> Suggested fix: U should be an . Replace
>
> ::=
>
> with
>
> ::= [ ]
>
> ... which results, I think, in these manglings for f and g:
>
> _Z1fI1XEDTcldtcvS0_IiE_EdnT_IiEEEv
> _Z1gI1XEDTcldtcvNS0_IiE1YE_EsrNT_IiE1YEdn1YEEv
>
> (Clang trunk implements this, but gets the g mangling wrong for other reasons.)
>
> OK?

I had to go and convince myself that an optional dangling production is fine here, but it does look like it can be unambiguously and unheroically demangled. There are several other major productions that use an optional dangling like this, most notably ; so while this is not my favorite way of designing a mangling, it's widely precedented in the grammar with this exact production, so the rest of the grammar has been designed to not collide with it. I did go ahead and verify that it's unambiguous anyway. So this looks good to me.

Is ~T::T() legal with a template parameter, or does that actually look up "T" in the template argument?

John.
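John's point that the optional trailing argument list can be demangled "unheroically" can be made concrete. The helper below is a hypothetical, simplified sketch, not taken from any real demangler. It reads a template parameter reference (`T_` for the first parameter, `T0_` for the second, and so on) and, if the next character is `I`, consumes the argument list that follows. It assumes C++17 (`std::optional`) and finds the end of the argument list by counting `I`/`E` nesting, which is only correct for simple arguments such as the `i` (int) in Richard's `dnT_IiE`; a real demangler parses each argument with the full grammar.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// A template parameter reference ("T_", "T0_", "T1_", ...) plus an optional
// trailing template argument list ("I ... E").
struct TemplateParamRef {
  unsigned index;             // 0 for T_, 1 for T0_, 2 for T1_, ...
  std::string template_args;  // e.g. "IiE"; empty if no argument list follows
  std::size_t end;            // position just past everything consumed
};

// Simplified sketch: the argument list is delimited by counting I/E nesting,
// which is only valid when the arguments contain no other E-terminated
// constructs (true for builtin types such as 'i').
std::optional<TemplateParamRef> parse_template_param(const std::string &s,
                                                     std::size_t pos) {
  if (pos >= s.size() || s[pos] != 'T') return std::nullopt;
  ++pos;
  unsigned index = 0;
  if (pos < s.size() && s[pos] != '_') {
    // T<number>_ names parameter number + 1 (decimal digits).
    bool saw_digit = false;
    unsigned number = 0;
    while (pos < s.size() && s[pos] >= '0' && s[pos] <= '9') {
      number = number * 10 + static_cast<unsigned>(s[pos] - '0');
      saw_digit = true;
      ++pos;
    }
    if (!saw_digit) return std::nullopt;
    index = number + 1;
  }
  if (pos >= s.size() || s[pos] != '_') return std::nullopt;
  ++pos;

  TemplateParamRef ref{index, std::string(), pos};
  // This sketch treats an 'I' right here as the start of template arguments.
  if (pos < s.size() && s[pos] == 'I') {
    const std::size_t start = pos;
    int depth = 0;
    do {
      if (s[pos] == 'I') ++depth;
      else if (s[pos] == 'E') --depth;
      ++pos;
    } while (pos < s.size() && depth > 0);
    if (depth != 0) return std::nullopt;  // unterminated argument list
    ref.template_args = s.substr(start, pos - start);
    ref.end = pos;
  }
  return ref;
}
```

Applied to Richard's mangling of `f`, parsing just after the `dn` marker yields parameter index 0 with arguments `IiE`, leaving the closing `EEv` for the enclosing productions.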

From richardsmith at google.com Wed Feb 18 21:45:00 2015
From: richardsmith at google.com (Richard Smith)
Date: Wed, 18 Feb 2015 13:45:00 -0800
Subject: [cxx-abi-dev] missing mangling for in

On 18 February 2015 at 13:04, John McCall wrote:
> > On Feb 18, 2015, at 11:46 AM, Richard Smith wrote:
> > Consider these two cases:
> >
> > template struct X { struct Y {}; };
> >
> > template class U> decltype(X().~U()) f();
> > template class U> decltype(X::Y().U::Y::~Y()) g();
> >
> > Neither of these function templates has a mangling. We get to for the destructor name, and find a template template parameter with template args, which we cannot mangle as an , and must not mangle as a (because the name of the template template parameter can change between redeclarations).
> >
> > Suggested fix: U should be an . Replace
> >
> > ::=
> >
> > with
> >
> > ::= [ ]
> >
> > ... which results, I think, in these manglings for f and g:
> >
> > _Z1fI1XEDTcldtcvS0_IiE_EdnT_IiEEEv
> > _Z1gI1XEDTcldtcvNS0_IiE1YE_EsrNT_IiE1YEdn1YEEv
> >
> > (Clang trunk implements this, but gets the g mangling wrong for other reasons.)
> >
> > OK?
>
> I had to go and convince myself that an optional dangling production is fine here, but it does look like it can be unambiguously and unheroically demangled. There are several other major productions that use an optional dangling like this, most notably ; so while this is not my favorite way of designing a mangling, it's widely precedented in the grammar with this exact production, so the rest of the grammar has been designed to not collide with it. I did go ahead and verify that it's unambiguous anyway. So this looks good to me.
>
> Is ~T::T() legal with a template parameter, or does that actually look up "T" in the template argument?

It depends on whether the base object has a dependent type. If x's type is not dependent, then x.T::~T() looks up the first T within the type and names the template parameter if T is not found within the type. If x's type is dependent, (the standard is not clear but) lookup within the class is deemed to fail and the first T always names the template parameter. In all cases, the second T is looked up in the same scope(s) as the first.

From rjmccall at apple.com Wed Feb 18 23:35:52 2015
From: rjmccall at apple.com (John McCall)
Date: Wed, 18 Feb 2015 15:35:52 -0800
Subject: [cxx-abi-dev] missing mangling for in

> On Feb 18, 2015, at 1:45 PM, Richard Smith wrote:
> On 18 February 2015 at 13:04, John McCall wrote:
> > On Feb 18, 2015, at 11:46 AM, Richard Smith wrote:
> > Consider these two cases:
> >
> > template struct X { struct Y {}; };
> >
> > template class U> decltype(X().~U()) f();
> > template class U> decltype(X::Y().U::Y::~Y()) g();
> >
> > Neither of these function templates has a mangling. We get to for the destructor name, and find a template template parameter with template args, which we cannot mangle as an , and must not mangle as a (because the name of the template template parameter can change between redeclarations).
> >
> > Suggested fix: U should be an . Replace
> >
> > ::=
> >
> > with
> >
> > ::= [ ]
> >
> > ... which results, I think, in these manglings for f and g:
> >
> > _Z1fI1XEDTcldtcvS0_IiE_EdnT_IiEEEv
> > _Z1gI1XEDTcldtcvNS0_IiE1YE_EsrNT_IiE1YEdn1YEEv
> >
> > (Clang trunk implements this, but gets the g mangling wrong for other reasons.)
> >
> > OK?
>
> I had to go and convince myself that an optional dangling production is fine here, but it does look like it can be unambiguously and unheroically demangled. There are several other major productions that use an optional dangling like this, most notably ; so while this is not my favorite way of designing a mangling, it's widely precedented in the grammar with this exact production, so the rest of the grammar has been designed to not collide with it. I did go ahead and verify that it's unambiguous anyway. So this looks good to me.
>
> Is ~T::T() legal with a template parameter, or does that actually look up "T" in the template argument?
>
> It depends on whether the base object has a dependent type. If x's type is not dependent, then x.T::~T() looks up the first T within the type and names the template parameter if T is not found within the type. If x's type is dependent, (the standard is not clear but) lookup within the class is deemed to fail and the first T always names the template parameter. In all cases, the second T is looked up in the same scope(s) as the first.

Okay, thanks. Do you agree that that's not something that needs to be preserved in the mangling? It seems like that rule allows us to uniformly decide on srT_dnT_ or sd1Tdn1T at parse time in the non-dependent case, and whether we're in the dependent or non-dependent case should always be reflected by the mangling of the base expression.

If the language required us to do the member-type lookup in the dependent case, we'd need a special kind of (and even crazier logic in function template redeclaration matching, because you wouldn't be able to match templates using different template parameter names when this happened...).

John.
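The lookup rule Richard describes can be exercised with ordinary code. In this sketch, the `Tracked` type and both function templates are illustrative assumptions, not code from the thread. In `destroy_nondependent` the object expression has a type that does not depend on `T`, so, per Richard's description, the first `T` in `p->T::~T()` is looked up in `Tracked` first and, finding no member named `T`, names the template parameter. In `destroy_dependent` the object type is dependent, so `T` names the template parameter directly. The sketch exercises only the source-level syntax, not the resulting mangling.

```cpp
#include <cassert>
#include <new>

// Illustrative type that counts destructor calls so they can be observed.
struct Tracked {
  static int destroyed;
  ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// Build a Tracked in caller-provided storage, to be destroyed explicitly.
Tracked *make_tracked(void *storage) { return new (storage) Tracked; }

// Non-dependent object expression: p is a Tracked* whatever T is. Tracked has
// no member named T, so the first T names the template parameter.
template <typename T>
void destroy_nondependent(Tracked *p) {
  p->T::~T();
}

// Dependent object expression: p's type depends on T.
template <typename T>
void destroy_dependent(T *p) {
  p->T::~T();
}
```

Calling `destroy_nondependent<Tracked>` and `destroy_dependent` on objects built with `make_tracked` runs `~Tracked()` once each.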

From richardsmith at google.com Wed Feb 18 23:54:03 2015
From: richardsmith at google.com (Richard Smith)
Date: Wed, 18 Feb 2015 15:54:03 -0800
Subject: [cxx-abi-dev] missing mangling for in

On 18 February 2015 at 15:35, John McCall wrote:
> On Feb 18, 2015, at 1:45 PM, Richard Smith wrote:
> On 18 February 2015 at 13:04, John McCall wrote:
>
>> > On Feb 18, 2015, at 11:46 AM, Richard Smith wrote:
>> > Consider these two cases:
>> >
>> > template struct X { struct Y {}; };
>> >
>> > template class U> decltype(X().~U()) f();
>> > template class U> decltype(X::Y().U::Y::~Y()) g();
>> >
>> > Neither of these function templates has a mangling. We get to for the destructor name, and find a template template parameter with template args, which we cannot mangle as an , and must not mangle as a (because the name of the template template parameter can change between redeclarations).
>> >
>> > Suggested fix: U should be an . Replace
>> >
>> > ::=
>> >
>> > with
>> >
>> > ::= [ ]
>> >
>> > ... which results, I think, in these manglings for f and g:
>> >
>> > _Z1fI1XEDTcldtcvS0_IiE_EdnT_IiEEEv
>> > _Z1gI1XEDTcldtcvNS0_IiE1YE_EsrNT_IiE1YEdn1YEEv
>> >
>> > (Clang trunk implements this, but gets the g mangling wrong for other reasons.)
>> >
>> > OK?
>>
>> I had to go and convince myself that an optional dangling production is fine here, but it does look like it can be unambiguously and unheroically demangled. There are several other major productions that use an optional dangling like this, most notably ; so while this is not my favorite way of designing a mangling, it's widely precedented in the grammar with this exact production, so the rest of the grammar has been designed to not collide with it. I did go ahead and verify that it's unambiguous anyway. So this looks good to me.
>>
>> Is ~T::T() legal with a template parameter, or does that actually look up "T" in the template argument?
>
>
> It depends on whether the base object has a dependent type. If x's type is not dependent, then x.T::~T() looks up the first T within the type and names the template parameter if T is not found within the type. If x's type is dependent, (the standard is not clear but) lookup within the class is deemed to fail and the first T always names the template parameter. In all cases, the second T is looked up in the same scope(s) as the first.
>
>
> Okay, thanks. Do you agree that that's not something that needs to be preserved in the mangling? It seems like that rule allows us to uniformly decide on srT_dnT_ or sd1Tdn1T at parse time in the non-dependent case, and whether we're in the dependent or non-dependent case should always be reflected by the mangling of the base expression.
>

Yes, I agree.

> If the language required us to do the member-type lookup in the dependent case, we'd need a special kind of (and even crazier logic in function template redeclaration matching, because you wouldn't be able to match templates using different template parameter names when this happened...).
>

If the language required that, I'd call it a defect in the specification.

From rjmccall at apple.com Thu Feb 19 00:04:44 2015
From: rjmccall at apple.com (John McCall)
Date: Wed, 18 Feb 2015 16:04:44 -0800
Subject: [cxx-abi-dev] missing mangling for in

> On Feb 18, 2015, at 3:54 PM, Richard Smith wrote:
> On 18 February 2015 at 15:35, John McCall wrote:
>> On Feb 18, 2015, at 1:45 PM, Richard Smith wrote:
>> On 18 February 2015 at 13:04, John McCall wrote:
>> > On Feb 18, 2015, at 11:46 AM, Richard Smith wrote:
>> > Consider these two cases:
>> >
>> > template struct X { struct Y {}; };
>> >
>> > template class U> decltype(X().~U()) f();
>> > template class U> decltype(X::Y().U::Y::~Y()) g();
>> >
>> > Neither of these function templates has a mangling. We get to for the destructor name, and find a template template parameter with template args, which we cannot mangle as an , and must not mangle as a (because the name of the template template parameter can change between redeclarations).
>> >
>> > Suggested fix: U should be an . Replace
>> >
>> > ::=
>> >
>> > with
>> >
>> > ::= [ ]
>> >
>> > ... which results, I think, in these manglings for f and g:
>> >
>> > _Z1fI1XEDTcldtcvS0_IiE_EdnT_IiEEEv
>> > _Z1gI1XEDTcldtcvNS0_IiE1YE_EsrNT_IiE1YEdn1YEEv
>> >
>> > (Clang trunk implements this, but gets the g mangling wrong for other reasons.)
>> >
>> > OK?
>>
>> I had to go and convince myself that an optional dangling production is fine here, but it does look like it can be unambiguously and unheroically demangled. There are several other major productions that use an optional dangling like this, most notably ; so while this is not my favorite way of designing a mangling, it's widely precedented in the grammar with this exact production, so the rest of the grammar has been designed to not collide with it. I did go ahead and verify that it's unambiguous anyway. So this looks good to me.
>>
>> Is ~T::T() legal with a template parameter, or does that actually look up "T" in the template argument?
>>
>> It depends on whether the base object has a dependent type. If x's type is not dependent, then x.T::~T() looks up the first T within the type and names the template parameter if T is not found within the type. If x's type is dependent, (the standard is not clear but) lookup within the class is deemed to fail and the first T always names the template parameter. In all cases, the second T is looked up in the same scope(s) as the first.
>
> Okay, thanks. Do you agree that that's not something that needs to be preserved in the mangling? It seems like that rule allows us to uniformly decide on srT_dnT_ or sd1Tdn1T at parse time in the non-dependent case, and whether we're in the dependent or non-dependent case should always be reflected by the mangling of the base expression.
>
> Yes, I agree.
>
> If the language required us to do the member-type lookup in the dependent case, we'd need a special kind of (and even crazier logic in function template redeclaration matching, because you wouldn't be able to match templates using different template parameter names when this happened...).
>
> If the language required that, I'd call it a defect in the specification.

Agreed on that, too.

Okay, I'll commit this in a week or so if nobody objects.

John.

From david.majnemer at gmail.com Thu Feb 19 22:51:19 2015
From: david.majnemer at gmail.com (David Majnemer)
Date: Thu, 19 Feb 2015 14:51:19 -0800
Subject: [cxx-abi-dev] Mangling string constants

Hi,

It seems that the ABI has no means to mangle the contents of string constants.

The cxx-abi-dev archives have a proposal http://sourcerytools.com/pipermail/cxx-abi-dev/2012-January/000032.html but it seems this was never integrated into the ABI document. Further, the proposal doesn't specify how to mangle UTF-16/UTF-32 string literals. Such a mangling would have to specify the endianness used to encode the code points.

At the moment, I am trying to figure out how we should mangle the string constant in:

struct X {
  static constexpr const char *p = "foo";
};

We are required to give the storage for the string the same name in all translation units in order to adhere to the ODR.

One idea I had was to treat it like a lifetime extended temporary but this might break compatibility with existing programs.

Are there any preferences as to what should be done?

-- 
David Majnemer

From dhandly at cup.hp.com Fri Feb 20 07:44:44 2015
From: dhandly at cup.hp.com (Dennis Handly)
Date: Thu, 19 Feb 2015 23:44:44 -0800
Subject: [cxx-abi-dev] Mangling string constants
Message-ID: <201502200744.t1K7iik25088@adlwrk06.cce.hp.com>

>From: David Majnemer
>It seems that the ABI has no means to mangle the contents of string constants.

Why is that needed?
The current scheme is to just number the constants in order.
And that handles both strings and wide strings.
And by the ODR rule the inlines must be the same.

>the proposal doesn't specify how to mangle UTF-16/UTF-32 string literals.
>Such a mangling would have to specify the endianness used to encode the code
>points.

Again why? We just need mangling to make sure they match addresses.
Do we really need to check code enforcement?
We don't do for narrow vs wide.

>I am trying to figure out how we should mangle the string constant in:
>struct X {
> static constexpr const char *p = "foo";
>};

I thought this was defined?
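David's remark about endianness is easy to see in code: the same `char16_t` or `char32_t` literal occupies different byte sequences on little- and big-endian targets, so a mangling based on string contents would have to fix a byte order rather than copy bytes out of memory. This sketch picks little-endian serialization purely for illustration; nothing in this thread settles on an order (and, as David notes, neither does the 2012 proposal), and `code_units_le` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Serialize each code unit of a UTF-16 or UTF-32 string as bytes in a fixed
// little-endian order, independent of the host's native byte order.
template <typename CharT>
std::vector<unsigned char> code_units_le(const std::basic_string<CharT> &s) {
  std::vector<unsigned char> out;
  out.reserve(s.size() * sizeof(CharT));
  for (CharT unit : s) {
    const unsigned long value = static_cast<unsigned long>(unit);
    // Emit the least significant byte first.
    for (std::size_t i = 0; i < sizeof(CharT); ++i) {
      out.push_back(static_cast<unsigned char>((value >> (8 * i)) & 0xFFu));
    }
  }
  return out;
}
```

Whether such a rule is needed at all is exactly what Dennis questions above; the sketch only shows what choosing one would involve.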

From rjmccall at apple.com Fri Feb 20 23:51:52 2015
From: rjmccall at apple.com (John McCall)
Date: Fri, 20 Feb 2015 15:51:52 -0800
Subject: [cxx-abi-dev] Mangling string constants
In-Reply-To: <201502200744.t1K7iik25088@adlwrk06.cce.hp.com>
References: <201502200744.t1K7iik25088@adlwrk06.cce.hp.com>
Message-ID: <2EF0943D-8D5D-4820-A4BF-CE2EFA2E2FDC@apple.com>

> On Feb 19, 2015, at 11:44 PM, Dennis Handly wrote:
>> From: David Majnemer
>> It seems that the ABI has no means to mangle the contents of string constants.
>
> Why is that needed?
> The current scheme is to just number the constants in order.
> And that handles both strings and wide strings.
> And by the ODR rule the inlines must be the same.

I think this is what David means by numbering like a reference temporary.

To the extent that this is needed, I agree with you that that's the right solution: string literals should be mangled in the same sequence as reference temporaries. (Which already applies to more than just reference temporaries anyway, since the same concept of lifetime extension applies to std::initializer_list temporaries.)

I have some of the same concerns here as I do with guaranteeing the uniqueness of string literals within inline functions: I want to make sure the language isn't accidentally promising something that grotesquely affects performance far out of proportion to its utility to the programmer. It would be very unfortunate if we, say, introduced thousands of new global weak symbols just to unique the strings used by assertions. We can take things like this back to the committee if necessary.

But if we can restrict this guarantee to string literals that appear in reference-temporary-like positions in constexpr initializers, I think it's reasonable enough.

John.

From rjmccall at apple.com Sat Feb 21 01:58:23 2015
From: rjmccall at apple.com (John McCall)
Date: Fri, 20 Feb 2015 17:58:23 -0800
Subject: [cxx-abi-dev] Mangling string constants
References: <201502200744.t1K7iik25088@adlwrk06.cce.hp.com> <2EF0943D-8D5D-4820-A4BF-CE2EFA2E2FDC@apple.com>
Message-ID: <7E307DDF-E2F7-492D-B45E-AD915D861D08@apple.com>

> On Feb 20, 2015, at 4:28 PM, Richard Smith wrote:
> On 20 February 2015 at 15:51, John McCall wrote:
> > On Feb 19, 2015, at 11:44 PM, Dennis Handly wrote:
> >> From: David Majnemer
> >> It seems that the ABI has no means to mangle the contents of string constants.
> >
> > Why is that needed?
> > The current scheme is to just number the constants in order.
> > And that handles both strings and wide strings.
> > And by the ODR rule the inlines must be the same.
>
> I think this is what David means by numbering like a reference temporary.
>
> To the extent that this is needed, I agree with you that that's the right solution: string literals should be mangled in the same sequence as reference temporaries. (Which already applies to more than just reference temporaries anyway, since the same concept of lifetime extension applies to std::initializer_list temporaries.)
>
> I have some of the same concerns here as I do with guaranteeing the uniqueness of string literals within inline functions: I want to make sure the language isn't accidentally promising something that grotesquely affects performance far out of proportion to its utility to the programmer. It would be very unfortunate if we, say, introduced thousands of new global weak symbols just to unique the strings used by assertions. We can take things like this back to the committee if necessary.
>
> But if we can restrict this guarantee to string literals that appear in reference-temporary-like positions in constexpr initializers, I think it's reasonable enough.
>
> We can't. Consider:
>
> constexpr const char *f(const char *p) { return p; }
> constexpr const char *g() { return "foo"; }
> struct X {
>   constexpr static const char *p = "foo", // ok
>       *q = f("foo"), // not in a "reference-temporary-like" position
>       *r = g(); // string literal is not even lexically within the initializer
> };

Yeah, I thought about this a bit too late. There are two ways to salvage the idea: mark string literals by position as they appear in the actual constexpr result, or just don't promise anything in this case.

Another concern with widespread string-literal mangling that occurs to me is whether it will completely defeat ordinary string-literal sharing. To do this feature optimally, we would need... in ELF terms, what, a COMDAT alias (?) into the string literal section? This might be pushing the boundaries of supported linker behavior a lot. If we have to emit separate, unmergeable string literal objects just because they were used in a constexpr, that would be a disaster.

John.

From richardsmith at googlers.com Thu Feb 19 23:04:46 2015
From: richardsmith at googlers.com (Richard Smith)
Date: Thu, 19 Feb 2015 15:04:46 -0800
Subject: [cxx-abi-dev] Mangling string constants

On 19 February 2015 at 14:51, David Majnemer wrote:
> Hi,
>
> It seems that the ABI has no means to mangle the contents of string constants.
>
> The cxx-abi-dev archives have a proposal http://sourcerytools.com/pipermail/cxx-abi-dev/2012-January/000032.html but it seems this was never integrated into the ABI document. Further, the proposal doesn't specify how to mangle UTF-16/UTF-32 string literals. Such a mangling would have to specify the endianness used to encode the code points.
>
> At the moment, I am trying to figure out how we should mangle the string constant in:
> struct X {
>   static constexpr const char *p = "foo";
> };
>
> We are required to give the storage for the string the same name in all translation units in order to adhere to the ODR.
>
> One idea I had was to treat it like a lifetime extended temporary but this might break compatibility with existing programs.
>
> Are there any preferences as to what should be done?
>

A related case:

inline constexpr const char *f() { return "foo"; }
struct X {
  static constexpr const char *p = f(), *q = f();
};

We've removed the language rule that required that p == q, but we still have a constraint that every translation unit sees the same value for p.

I think the simplest way to address this problem and David's original one is to give a mangling for string literals based on their contents. This mangling would be optional in all cases *except* where the string literal object must be the same across translation units, in which case the mangling must be used and the string literal must be emitted with vague linkage. I would expect there are few enough such cases that we don't need to worry about the implied extra symbols.

From richardsmith at googlers.com Fri Feb 20 21:22:32 2015
From: richardsmith at googlers.com (Richard Smith)
Date: Fri, 20 Feb 2015 13:22:32 -0800
Subject: [cxx-abi-dev] Mangling string constants
In-Reply-To: <201502200744.t1K7iik25088@adlwrk06.cce.hp.com>
References: <201502200744.t1K7iik25088@adlwrk06.cce.hp.com>

On 19 February 2015 at 23:44, Dennis Handly wrote:
> >From: David Majnemer
> >It seems that the ABI has no means to mangle the contents of string constants.
>
> Why is that needed?
>

See the linked message: http://sourcerytools.com/pipermail/cxx-abi-dev/2012-January/000032.html

This is about string literals in signatures, where we need to mangle the contents because the contents can be observed through evaluation of a value-dependent constant expression.

> The current scheme is to just number the constants in order.
> And that handles both strings and wide strings.
> And by the ODR rule the inlines must be the same.
>
> >the proposal doesn't specify how to mangle UTF-16/UTF-32 string literals.
> >Such a mangling would have to specify the endianness used to encode the code
> >points.
>
> Again why? We just need mangling to make sure they match addresses.
> Do we really need to check code enforcement?
> We don't do for narrow vs wide.
>
> >I am trying to figure out how we should mangle the string constant in:
> >struct X {
> > static constexpr const char *p = "foo";
> >};
>
> I thought this was defined?

Apparently not.

From richardsmith at googlers.com Sat Feb 21 00:28:00 2015
From: richardsmith at googlers.com (Richard Smith)
Date: Fri, 20 Feb 2015 16:28:00 -0800
Subject: [cxx-abi-dev] Mangling string constants
In-Reply-To: <2EF0943D-8D5D-4820-A4BF-CE2EFA2E2FDC@apple.com>
References: <201502200744.t1K7iik25088@adlwrk06.cce.hp.com> <2EF0943D-8D5D-4820-A4BF-CE2EFA2E2FDC@apple.com>

On 20 February 2015 at 15:51, John McCall wrote:
> > On Feb 19, 2015, at 11:44 PM, Dennis Handly wrote:
> >> From: David Majnemer
> >> It seems that the ABI has no means to mangle the contents of string constants.
> >
> > Why is that needed?
> > The current scheme is to just number the constants in order.
> > And that handles both strings and wide strings.
> > And by the ODR rule the inlines must be the same.
>
> I think this is what David means by numbering like a reference temporary.
>
> To the extent that this is needed, I agree with you that that's the right solution: string literals should be mangled in the same sequence as reference temporaries. (Which already applies to more than just reference temporaries anyway, since the same concept of lifetime extension applies to std::initializer_list temporaries.)
>
> I have some of the same concerns here as I do with guaranteeing the uniqueness of string literals within inline functions: I want to make sure the language isn't accidentally promising something that grotesquely affects performance far out of proportion to its utility to the programmer. It would be very unfortunate if we, say, introduced thousands of new global weak symbols just to unique the strings used by assertions. We can take things like this back to the committee if necessary.
>
> But if we can restrict this guarantee to string literals that appear in reference-temporary-like positions in constexpr initializers, I think it's reasonable enough.

We can't. Consider:

constexpr const char *f(const char *p) { return p; }
constexpr const char *g() { return "foo"; }
struct X {
  constexpr static const char *p = "foo", // ok
      *q = f("foo"), // not in a "reference-temporary-like" position
      *r = g(); // string literal is not even lexically within the initializer
};