[cxx-abi-dev] pointer-to-data-member representation for null pointer is not conforming

Fri Dec 21 06:32:59 UTC 2012

On Thu, Dec 20, 2012 at 10:02 PM, John McCall <rjmccall at apple.com> wrote:

> On Dec 20, 2012, at 9:37 PM, Richard Smith <richardsmith at google.com>
> wrote:
> > On Thu, Dec 20, 2012 at 8:53 PM, John McCall <rjmccall at apple.com> wrote:
> > On Dec 20, 2012, at 7:09 PM, John McCall <rjmccall at apple.com> wrote:
> >> On Dec 20, 2012, at 4:19 PM, Richard Smith <richardsmith at google.com>
> wrote:
> >>> Consider the following:
> >>>
> >>> struct E {};
> >>> struct X : E {};
> >>> struct C : E, X { char x; };
> >>>
> >>> char C::*c1 = &C::x;
> >>> char X::*x = (char(X::*))c1;
> >>> char C::*c2 = x;
> >>>
> >>> int main() { return c2 != 0; }
> >>>
> >>> I believe this program is valid and has defined behavior; per
> [expr.static.cast]p12, we can convert a pointer to a member of a derived
> class to a pointer to a member of a base class, so long as the base class
> is a base class of the class containing the original member.
> >>>
> >>> Per the ABI, C::x is at offset 0, C::E is at offset 0, and C::X and
> C::X::E are at offset 1 (they can't go at 0 due to the collision of the
> empty E base class). So the value of c1 is 0. And the value of x is... -1.
> Whoops.
> >>>
> >>> Finally, the conversion from x to c2 preserves the -1 value
> (conversion of a null member pointer produces a null member pointer),
> giving the wrong value for c2, and resulting in main returning 0, where the
> standard requires it to return 1 (likewise, returning x != 0 would produce
> the wrong value).
> >>
> >> Yep.
> >>
> >> Personally, I've been aware of this for a while and consider it an
> unfixable defect.  I don't know if it's generally known, though, and I
> can't find any prior discussion on the list.
> >>
> >> I'm not aware of any non-artificial code that the defect has ever
> broken;  there are some decent just-so stories for why that might be true:
> >>   (1) Data member pointers provide a really awkward abstraction that
> just isn't used that much:
> >>     (1a) They let you abstract over any member you want!
> >>     (1b) As long as that member has exactly the right type, not
> something implicitly convertible to it!
> >>     (1c) And as long as that member is actually stored in a field, not
> computed from it!
> >>     (1d) And as long as that field is a field of the class or one of
> its bases, not a field of a field of the class!
> >>   (2) Everything about the syntax of member pointers (making them,
> using them, writing their types) is kind of weird-looking, and many people
> don't like using them.
> >>   (3) The sorts of low-level programmers who would use this strange
> abstraction are often more comfortable using offsetof and explicit char*
> manipulation anyway.
> >>   (4) People usually use data member pointers on hierarchically boring
> types anyway — generally leaf classes.
> >>   (5) People usually don't mix data member pointers from different
> levels of the class hierarchy, and therefore generally don't do
> hierarchy conversions on them.
> >>   (6) People usually don't work with null member pointers — they use
> member pointers as a way of abstracting an access for some algorithm, and
> generally that doesn't admit a null value.
> >>   (7) Vanishingly few non-empty subclasses are ever going to be laid
> out at an offset of 1:
> >>     (7a) The base class must have an alignment of 1, meaning (for
> pretty much every platform out there) no virtual functions, no interesting
> data structures, no pointers, no ints; nothing but bools and chars and
> arrays thereof.
> >>     (7b) The derived class cannot have any virtual functions or virtual
> bases.
> >>     (7c) The derived class must have multiple base classes, the first
> of which has to be either empty (totally empty, lacking even virtual
> methods) or size 1.
> >
> > I went to dinner and realized that this point isn't as useful as I
> thought — you don't need a base class to be laid out at an offset of 1, you
> need a base class to be laid out immediately after a base A that has a
> field of size 1 at offset datasize(A)-1.
> >
> > You need the field to be in the derived class in order for this to be a
> problem; otherwise, the cast would have undefined behavior. Hence, the base
> class must be empty, and indeed must be a repeated empty base class (to not
> be at offset 0).
>
> I think I see where you're getting that, but I'm not sure that's really
> the intended meaning of the standard here.
>
> To elaborate, you seem to be interpreting the following text to mean
> that members of *other bases* of the derived class cannot be cast
> to members of a base class:
>   If class B contains the original member, or is a base or derived
>   class of the class containing the original member, the resulting
>   pointer to member points to the original member.  Otherwise, the
>   result of the cast is undefined.
>
> It does seem to be generally true that "contains" means only direct
> containment;  compare [intro.object]p3:
>   For every object x, there is some object called the complete object
>   of x, determined as follows:
>     - If x is a complete object, then x is the complete object of x.
>     - Otherwise, the complete object of x is the complete object of the
>       (unique) object that contains x.
>
> And the use of "contains" in the quote above does seem to imply
> only direct containment, because otherwise it wouldn't need to
> include the "base or derived" phrase.
>
> On the other hand, the note immediately after this uses "contains"
> more loosely:
>   although class B need not contain the original member, the dynamic
>   type of the object on which the pointer to member is dereferenced
>   must contain the original member
>
> So I'm not convinced that the standard should necessarily be read that
> closely.

For...

struct A { int x; };
struct B { int y; };
struct C : A, B {};

int B::*p = (int(B::*))(int(C::*))&A::x;

... the 'original member' is A::x, and 'the class containing the original
member' is A, and B is neither a base class nor a derived class of A, so the
result (ahem, behavior) is undefined. Since we're talking about *the* class
containing the original member, the normative wording seems unambiguous to
me (and the note is true but not precise, which is what we expect from
notes...).

If it were as you described, wouldn't this have defined behavior:

struct D : B, A {} d;
int k = d.*p;

(Since, per [expr.mptr.oper]p4, the dynamic type of the LHS *does* contain
the member, A::x, to which the RHS refers?) I'm also not sure which
situations would reach the "Otherwise" case in your interpretation.

> >  I *can* imagine a number of use cases that cause situations like this,
> so while most of my other points stand, it isn't quite as cut-and-dried as I
> made it out to be.
> >
> > #include <iostream>
> >
> > struct noncopyable {
> >   noncopyable() = default;
> >   noncopyable(const noncopyable&) = delete;
> > };
> > struct serializable : noncopyable {
> >   template<typename T> void serialize(T serializable::**members) {
> >     while (*members) std::cout << this->**members++ << std::endl;
> >   }
> > };
> > struct MyWonderfulType : noncopyable, serializable {
> >   char c = 'x';
> >   void serialize() {
> >     char serializable::*(CharMembers[]) = {
> >       (char(serializable::*))&MyWonderfulType::c, nullptr };
> >     serializable::serialize(CharMembers);
> >   }
> > };
>
> Cute.
>
> At any rate, it's not fixable.
>
> John.