[cxx-abi-dev] pointer-to-data-member representation for null pointer is not conforming
Richard Smith
richardsmith at google.com
Fri Dec 21 07:00:47 UTC 2012
On Thu, Dec 20, 2012 at 10:48 PM, John McCall <rjmccall at apple.com> wrote:
> On Dec 20, 2012, at 10:32 PM, Richard Smith <richardsmith at google.com>
> wrote:
>
> On Thu, Dec 20, 2012 at 10:02 PM, John McCall <rjmccall at apple.com> wrote:
>
>> On Dec 20, 2012, at 9:37 PM, Richard Smith <richardsmith at google.com>
>> wrote:
>> > On Thu, Dec 20, 2012 at 8:53 PM, John McCall <rjmccall at apple.com>
>> wrote:
>> > On Dec 20, 2012, at 7:09 PM, John McCall <rjmccall at apple.com> wrote:
>> >> On Dec 20, 2012, at 4:19 PM, Richard Smith <richardsmith at google.com>
>> wrote:
>> >>> Consider the following:
>> >>>
>> >>> struct E {};
>> >>> struct X : E {};
>> >>> struct C : E, X { char x; };
>> >>>
>> >>> char C::*c1 = &C::x;
>> >>> char X::*x = (char(X::*))c1;
>> >>> char C::*c2 = x;
>> >>>
>> >>> int main() { return c2 != 0; }
>> >>>
>> >>> I believe this program is valid and has defined behavior; per
>> [expr.static.cast]p12, we can convert a pointer to a member of a derived
>> class to a pointer to a member of a base class, so long as the base class
>> is a base class of the class containing the original member.
>> >>>
>> >>> Per the ABI, C::x is at offset 0, C::E is at offset 0, and C::X and
>> C::X::E are at offset 1 (they can't go at 0 due to the collision of the
>> empty E base class). So the value of c1 is 0. And the value of x is... -1.
>> Whoops.
>> >>>
>> >>> Finally, the conversion from x to c2 preserves the -1 value
>> (conversion of a null member pointer produces a null member pointer),
>> giving the wrong value for c2, and resulting in main returning 0, where the
>> standard requires it to return 1 (likewise, returning x != 0 would produce
>> the wrong value).
>> >>
>> >> Yep.
>> >>
>> >> Personally, I've been aware of this for a while and consider it an
>> unfixable defect. I don't know if it's generally known, though, and I
>> can't find any prior discussion on the list.
>> >>
>> >> I'm not aware of any non-artificial code that the defect has ever
>> broken; there are some decent just-so stories for why that might be true:
>> >> (1) Data member pointers provide a really awkward abstraction that
>> just isn't used that much:
>> >> (1a) They let you abstract over any member you want!
>> >> (1b) As long as that member has exactly the right type, not
>> something implicitly convertible to it!
>> >> (1c) And as long as that member is actually stored in a field, not
>> computed from it!
>> >> (1d) And as long as that field is a field of the class or one of
>> its bases, not a field of a field of the class!
>> >> (2) Everything about the syntax of member pointers (making them,
>> using them, writing their types) is kind of weird-looking, and many people
>> don't like using them.
>> >> (3) The sorts of low-level programmers who would use this strange
>> abstraction are often more comfortable using offsetof and explicit char*
>> manipulation anyway.
>> >> (4) People usually use data member pointers on hierarchically boring
>> types anyway — generally leaf classes.
>> >> (5) People usually don't mix data member pointers from different
>> levels of the class hierarchy, and therefore generally don't do
>> hierarchy conversions on them.
>> >> (6) People usually don't work with null member pointers — they use
>> member pointers as a way of abstracting an access for some algorithm, and
>> generally that doesn't admit a null value.
>> >> (7) Vanishingly few non-empty subclasses are ever going to be laid
>> out at an offset of 1:
>> >> (7a) The base class must have an alignment of 1, meaning (for
>> pretty much every platform out there) no virtual functions, no interesting
>> data structures, no pointers, no ints; nothing but bools and chars and
>> arrays thereof.
>> >> (7b) The derived class cannot have any virtual functions or
>> virtual bases.
>> >> (7c) The derived class must have multiple base classes, the first
>> of which has to be either empty (totally empty, lacking even virtual
>> methods) or size 1.
>> >
>> > I went to dinner and realized that this point isn't as useful as I
>> thought — you don't need a base class to be laid out at an offset of 1, you
>> need a base class to be laid out immediately after a base A that has a
>> field of size 1 at offset datasize(A)-1.
>> >
>> > You need the field to be in the derived class in order for this to be a
>> problem; otherwise, the cast would have undefined behavior. Hence, the base
>> class must be empty, and indeed must be a repeated empty base class (to not
>> be at offset 0).
>>
>> I think I see where you're getting that, but I'm not sure that's really
>> the intended meaning of the standard here.
>>
>> To elaborate, you seem to be interpreting the following text to mean
>> that members of *other bases* of the derived class cannot be cast
>> to members of the base class:
>> If class B contains the original member, or is a base or derived
>> class of the class containing the original member, the resulting
>> pointer to member points to the original member. Otherwise, the
>> result of the cast is undefined.
>>
>> It does seem to be generally true that "contains" means only direct
>> containment; compare [intro.object]p3:
>> For every object x, there is some object called the complete object
>> of x, determined as follows:
>> - If x is a complete object, then x is the complete object of x.
>> - Otherwise, the complete object of x is the complete object of the
>> (unique) object that contains x.
>>
>> And the use of "contains" in the quote above does seem to imply
>> only direct containment, because otherwise it wouldn't need to
>> include the "base or derived" phrase.
>>
>> On the other hand, the note immediately after this uses "contains"
>> more loosely:
>> although class B need not contain the original member, the dynamic
>> type of the object on which the pointer to member is dereferenced
>> must contain the original member
>>
>> So I'm not convinced that the standard should necessarily be read that
>> closely.
>
>
> For...
>
> struct A { int x; };
> struct B { int y; };
> struct C : A, B {};
>
> int B::*p = (int(B::*))(int(C::*))&A::x;
>
> ... the 'original member' is A::x, and 'the class containing the original
> member' is A, and B is neither a base class nor a derived class of A, so the
> result (ahem, behavior) is undefined. Since we're talking about *the* class
> containing the original member, the normative wording seems unambiguous to
> me (and the note is true but not precise, which is what we expect from
> notes...).
>
>
> There's definitely no rule that the dynamic type — i.e. the type of the
> complete object, the most-derived class — directly contains the member to
> which the member pointer refers. I don't see how this note can be "true".
>
> If it were as you described, wouldn't this have defined behavior:
>
> struct D : B, A {} d;
> int k = d.*p;
>
> (Since, per [expr.mptr.oper]p4, the dynamic type of the LHS *does* contain
> the member, A::x, to which the RHS refers?) I'm also not sure which
> situations would reach the "Otherwise" case in your interpretation.
>
>
> Good point; my rule would need to be defined in terms of subobjects.
> On the other hand, I don't think you can avoid that. Consider:
> struct A { int x; };
> struct B : A {};
> struct C : A, C {};
>
I assume this was intended to be C : A, B ?
> int B::*b = &A::x;
> int C::*c = b;
> int A::*a = (int A::*) c;
>
That's an error; the 'A' base is ambiguous, and if you disambiguate it by
adding an extra layer between A and C, you introduce UB.
> Clearly the first two conversions here are valid, and A contains the
> original member. Why does this have undefined behavior?
>
> And if it doesn't:
> 1. That's a way to produce the collision with offset -1: just make 'x' a
> char.
> 2. What's the legitimate language excuse for making this defined only
> when the other base class is a repeat?
>
> John.
>
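For anyone following the arithmetic in the original example, here is a
minimal sketch of it. It does not touch real member pointers; it only models
the representation discussed in this thread (a pointer to data member as a
ptrdiff_t offset, with -1 as the null value), and it takes the layout stated
above (C::x at offset 0, C::X at offset 1) as given. The names DataMemPtr,
kNullDataMemPtr, derived_to_base, base_to_derived and is_null are
hypothetical and exist only for this illustration.

```cpp
#include <cstddef>

// Assumed model: a pointer to data member is an offset, and null is -1.
using DataMemPtr = std::ptrdiff_t;
constexpr DataMemPtr kNullDataMemPtr = -1;

constexpr bool is_null(DataMemPtr p) { return p == kNullDataMemPtr; }

// Derived-to-base conversion: subtract the offset of the base subobject,
// leaving a null member pointer null.
constexpr DataMemPtr derived_to_base(DataMemPtr p, std::ptrdiff_t base_offset) {
    return is_null(p) ? kNullDataMemPtr : p - base_offset;
}

// Base-to-derived conversion: add the offset back, again preserving null.
constexpr DataMemPtr base_to_derived(DataMemPtr p, std::ptrdiff_t base_offset) {
    return is_null(p) ? kNullDataMemPtr : p + base_offset;
}

// Values taken from the example in the thread.
constexpr std::ptrdiff_t kOffsetOfXInC = 1;  // C::X is laid out at offset 1
constexpr DataMemPtr c1 = 0;                 // &C::x, with x at offset 0
constexpr DataMemPtr x  = derived_to_base(c1, kOffsetOfXInC);  // 0 - 1
constexpr DataMemPtr c2 = base_to_derived(x, kOffsetOfXInC);   // null in, null out

static_assert(!is_null(c1), "c1 points at a real member");
static_assert(is_null(x),   "x collides with the null representation");
static_assert(is_null(c2),  "the round trip does not recover c1");
```

In this model a non-null member pointer becomes indistinguishable from null
after the derived-to-base step, which is why the round trip cannot restore
the original value.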