[cxx-abi-dev] pointer-to-data-member representation for null pointer is not conforming

John McCall rjmccall at apple.com
Fri Dec 21 07:13:31 UTC 2012


On Dec 20, 2012, at 11:00 PM, Richard Smith <richardsmith at google.com> wrote:
> On Thu, Dec 20, 2012 at 10:48 PM, John McCall <rjmccall at apple.com> wrote:
> On Dec 20, 2012, at 10:32 PM, Richard Smith <richardsmith at google.com> wrote:
>> On Thu, Dec 20, 2012 at 10:02 PM, John McCall <rjmccall at apple.com> wrote:
>> On Dec 20, 2012, at 9:37 PM, Richard Smith <richardsmith at google.com> wrote:
>> > On Thu, Dec 20, 2012 at 8:53 PM, John McCall <rjmccall at apple.com> wrote:
>> > On Dec 20, 2012, at 7:09 PM, John McCall <rjmccall at apple.com> wrote:
>> >> On Dec 20, 2012, at 4:19 PM, Richard Smith <richardsmith at google.com> wrote:
>> >>> Consider the following:
>> >>>
>> >>> struct E {};
>> >>> struct X : E {};
>> >>> struct C : E, X { char x; };
>> >>>
>> >>> char C::*c1 = &C::x;
>> >>> char X::*x = (char(X::*))c1;
>> >>> char C::*c2 = x;
>> >>>
>> >>> int main() { return c2 != 0; }
>> >>>
>> >>> I believe this program is valid and has defined behavior; per [expr.static.cast]p12, we can convert a pointer to a member of a derived class to a pointer to a member of a base class, so long as the base class is a base class of the class containing the original member.
>> >>>
>> >>> Per the ABI, C::x is at offset 0, C::E is at offset 0, and C::X and C::X::E are at offset 1 (they can't go at 0 due to the collision of the empty E base class). So the value of c1 is 0. And the value of x is... -1. Whoops.
>> >>>
>> >>> Finally, the conversion from x to c2 preserves the -1 value (conversion of a null member pointer produces a null member pointer), giving the wrong value for c2, and resulting in main returning 0, where the standard requires it to return 1 (likewise, returning x != 0 would produce the wrong value).
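
The arithmetic described above can be modelled with plain integers. This is a minimal sketch, not compiler output: it assumes the representation the message describes (a data member pointer is a signed offset, with -1 reserved for null) and takes the offsets 0 for C::x and 1 for C::X from the message. The names MemPtr, kNull, to_base, to_derived and is_null are hypothetical helpers introduced only for this illustration.

```cpp
#include <cstddef>

// Model of an offset-based data member pointer, as described in the message:
// the value is the member's offset, and -1 is reserved to mean "null".
using MemPtr = std::ptrdiff_t;
constexpr MemPtr kNull = -1;

constexpr bool is_null(MemPtr p) { return p == kNull; }

// Derived-to-base conversion (e.g. char C::* -> char X::*):
// subtract the offset of the base subobject; null stays null.
constexpr MemPtr to_base(MemPtr p, std::ptrdiff_t base_offset) {
    return is_null(p) ? kNull : p - base_offset;
}

// Base-to-derived conversion (e.g. char X::* -> char C::*):
// add the offset of the base subobject; null stays null.
constexpr MemPtr to_derived(MemPtr p, std::ptrdiff_t base_offset) {
    return is_null(p) ? kNull : p + base_offset;
}

// Offsets taken from the message: C::x is at 0, the X base of C is at 1.
constexpr std::ptrdiff_t kOffsetOfXInC = 1;

constexpr MemPtr c1 = 0;                             // models &C::x
constexpr MemPtr x  = to_base(c1, kOffsetOfXInC);    // 0 - 1, same bits as null
constexpr MemPtr c2 = to_derived(x, kOffsetOfXInC);  // seen as null, so unchanged
```

In this model x is indistinguishable from a null member pointer, so the round trip yields a c2 that differs from c1 and compares equal to null.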
>> >>
>> >> Yep.
>> >>
>> >> Personally, I've been aware of this for a while and consider it an unfixable defect.  I don't know if it's generally known, though, and I can't find any prior discussion on the list.
>> >>
>> >> I'm not aware of any non-artificial code that the defect has ever broken;  there are some decent just-so stories for why that might be true:
>> >>   (1) Data member pointers provide a really awkward abstraction that just isn't used that much:
>> >>     (1a) They let you abstract over any member you want!
>> >>     (1b) As long as that member has exactly the right type, not something implicitly convertible to it!
>> >>     (1c) And as long as that member is actually stored in a field, not computed from it!
>> >>     (1d) And as long as that field is a field of the class or one of its bases, not a field of a field of the class!
>> >>   (2) Everything about the syntax of member pointers (making them, using them, writing their types) is kind of weird-looking, and many people don't like using them.
>> >>   (3) The sorts of low-level programmers who would use this strange abstraction are often more comfortable using offsetof and explicit char* manipulation anyway.
>> >>   (4) People usually use data member pointers on hierarchically boring types anyway — generally leaf classes.
>> >>   (5) People usually don't mix data member pointers from different levels of the class hierarchy, and therefore generally don't do hierarchy conversions on them.
>> >>   (6) People usually don't work with null member pointers — they use member pointers as a way of abstracting an access for some algorithm, and generally that doesn't admit a null value.
>> >>   (7) Vanishingly few non-empty subclasses are ever going to be laid out at an offset of 1:
>> >>     (7a) The base class must have an alignment of 1, meaning (for pretty much every platform out there) no virtual functions, no interesting data structures, no pointers, no ints; nothing but bools and chars and arrays thereof.
>> >>     (7b) The derived class cannot have any virtual functions or virtual bases.
>> >>     (7c) The derived class must have multiple base classes, the first of which has to be either empty (totally empty, lacking even virtual methods) or size 1.
>> >
>> > I went to dinner and realized that this point isn't as useful as I thought — you don't need a base class to be laid out at an offset of 1, you need a base class to be laid out immediately after a base A that has a field of size 1 at offset datasize(A)-1.
>> >
>> > You need the field to be in the derived class in order for this to be a problem; otherwise, the cast would have undefined behavior. Hence, the base class must be empty, and indeed must be a repeated empty base class (to not be at offset 0).
>> 
>> I think I see where you're getting that, but I'm not sure that's really
>> the intended meaning of the standard here.
>> 
>> To elaborate, you seem to be interpreting the following text to mean
>> that members of *other bases* of the derived class cannot be cast
>> to be members of a base class:
>>   If class B contains the original member, or is a base or derived
>>   class of the class containing the original member, the resulting
>>   pointer to member points to the original member.  Otherwise, the
>>   result of the cast is undefined.
>> 
>> It does seem to be generally true that "contains" means only direct
>> containment;  compare [intro.object]p3:
>>   For every object x, there is some object called the complete object
>>   of x, determined as follows:
>>     - If x is a complete object, then x is the complete object of x.
>>     - Otherwise, the complete object of x is the complete object of the
>>       (unique) object that contains x.
>> 
>> And the use of "contains" in the quote above does seem to imply
>> only direct containment, because otherwise it wouldn't need to
>> include the "base or derived" phrase.
>> 
>> On the other hand, the note immediately after this uses "contains"
>> more loosely:
>>   although class B need not contain the original member, the dynamic
>>   type of the object on which the pointer to member is dereferenced
>>   must contain the original member
>> 
>> So I'm not convinced that the standard should necessarily be read that
>> closely.
>> 
>> For...
>> 
>> struct A { int x; };
>> struct B { int y; };
>> struct C : A, B {};
>> 
>> int B::*p = (int(B::*))(int(C::*))&A::x;
>> 
>> ... the 'original member' is A::x, and 'the class containing the original member' is A, and B is neither a base class nor a derived class of A, so the result (ahem, behavior) is undefined. Since we're talking about *the* class containing the original member, the normative wording seems unambiguous to me (and the note is true but not precise, which is what we expect from notes...).
> 
> There's definitely no rule that the dynamic type — i.e. the type of the
> complete object, the most-derived class — directly contains the member to
> which the member pointer refers.  I don't see how this note can be "true".
> 
>> If it were as you described, wouldn't this have defined behavior:
>> 
>> struct D : B, A {} d;
>> int k = d.*p;
>> 
>> (Since, per [expr.mptr.oper]p4, the dynamic type of the LHS *does* contain the member, A::x, to which the RHS refers?) I'm also not sure which situations would reach the "Otherwise" case in your interpretation.
> 
> Good point;  my rule would need to be defined in terms of subobjects.
> On the other hand, I don't think you can avoid that.  Consider:
>   struct A { int x; };
>   struct B : A {};
>   struct C : A, C {};
> 
> I assume this was intended to be C : A, B ?

Yes.
 
>   int B::*b = &A::x;
>   int C::*c = b;
>   int A::*a = (int A::*) c;
> 
> That's an error; the 'A' base is ambiguous, and if you disambiguate it by adding an extra layer between A and C, you introduce UB.

You're right about the ambiguity, but the second is not true.

struct A { int x; };
struct B : A {};
struct D : A {};
struct C : D, B {};

int A::*a1 = &A::x;
int B::*b = a1;
int C::*c = b;
int D::*d = (int D::*) c;
int A::*a2 = (int A::*) d;

Note that D is a derived class of the class containing the original member.
That the original member was led along a path through a different class
type does not grant us the right to undefined behavior by your
interpretation;  for that, you really have to talk about subobjects, not about
classes.
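
To make the subobject point concrete, here is a rough sketch that follows the chain of conversions above as offset arithmetic. It deliberately does not perform the member-pointer casts themselves, since whether they are defined is exactly what is in dispute. It assumes an offset-based representation in which a base-to-derived conversion adds the base subobject's offset and a derived-to-base conversion subtracts it. The helper base_offset, the struct Chain and the function follow_chain are hypothetical names introduced for this illustration.

```cpp
#include <cstddef>

struct A { int x; };
struct B : A {};
struct D : A {};
struct C : D, B {};

// Byte offset of the (unambiguous) Base subobject inside a Derived object,
// measured on a local object with ordinary pointer conversions.
template <class Base, class Derived>
std::ptrdiff_t base_offset() {
    Derived obj{};
    return reinterpret_cast<char*>(static_cast<Base*>(&obj)) -
           reinterpret_cast<char*>(&obj);
}

// Offsets modelling each member pointer in the chain a1 -> b -> c -> d -> a2.
struct Chain { std::ptrdiff_t a1, b, c, d, a2; };

Chain follow_chain() {
    Chain r{};
    r.a1 = 0;                              // x is the first member of A
    r.b  = r.a1 + base_offset<A, B>();     // int A::* -> int B::*
    r.c  = r.b  + base_offset<B, C>();     // int B::* -> int C::*
    r.d  = r.c  - base_offset<D, C>();     // int C::* -> int D::* (other path)
    r.a2 = r.d  - base_offset<A, D>();     // int D::* -> int A::*
    return r;
}
```

Under these assumptions a2 does not come back to the value of a1: the pointer went up through the A inside B and came down through the A inside D, so the final offset is off by the distance between the B and D subobjects of C. Every individual cast names a class that is a base or derived class of A, which is why a rule phrased in terms of classes cannot rule the chain out.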

John.