[cxx-abi-dev] pointer-to-data-member representation for null pointer is not conforming

John McCall rjmccall at apple.com
Fri Dec 21 03:09:27 UTC 2012


On Dec 20, 2012, at 4:19 PM, Richard Smith <richardsmith at google.com> wrote:
> Consider the following:
> 
> struct E {};
> struct X : E {};
> struct C : E, X { char x; };
> 
> char C::*c1 = &C::x;
> char X::*x = (char(X::*))c1;
> char C::*c2 = x;
> 
> int main() { return c2 != 0; }
> 
> I believe this program is valid and has defined behavior; per [expr.static.cast]p12, we can convert a pointer to a member of a derived class to a pointer to a member of a base class, so long as the base class is a base class of the class containing the original member.
> 
> Per the ABI, C::x is at offset 0, C::E is at offset 0, and C::X and C::X::E are at offset 1 (they can't go at 0 due to the collision of the empty E base class). So the value of c1 is 0. And the value of x is... -1. Whoops.
> 
> Finally, the conversion from x to c2 preserves the -1 value (conversion of a null member pointer produces a null member pointer), giving the wrong value for c2, and resulting in main returning 0, where the standard requires it to return 1 (likewise, returning x != 0 would produce the wrong value).

Yep.
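
To make the arithmetic concrete, here is a minimal sketch that models the representation by hand rather than relying on what any particular compiler emits.  The names (MemPtr, kNull, kOffsetOfXInC, to_base, to_derived, simulated_main) are hypothetical and invented for illustration; the sketch assumes only what is stated above: a data member pointer is an offset, null is -1, and C::X sits at offset 1.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model: a data member pointer is just an offset, null is -1.
using MemPtr = std::ptrdiff_t;
constexpr MemPtr kNull = -1;

// Assumed offset of the X base subobject within C, as in the quoted example.
constexpr std::ptrdiff_t kOffsetOfXInC = 1;

constexpr bool is_null(MemPtr p) { return p == kNull; }

// C::* -> X::* : subtract the base offset, leaving null untouched.
constexpr MemPtr to_base(MemPtr p) {
    return is_null(p) ? kNull : p - kOffsetOfXInC;
}

// X::* -> C::* : add the base offset, leaving null untouched.
constexpr MemPtr to_derived(MemPtr p) {
    return is_null(p) ? kNull : p + kOffsetOfXInC;
}

// Mirrors main() from the quoted program: returns (c2 != 0).
constexpr int simulated_main() {
    MemPtr c1 = 0;               // &C::x, offset 0 in C
    MemPtr x  = to_base(c1);     // 0 - 1 == -1, which collides with null
    MemPtr c2 = to_derived(x);   // treated as null, so it stays -1
    return !is_null(c2);         // 0 here; the standard requires 1
}

static_assert(to_base(0) == kNull, "valid pointer becomes indistinguishable from null");
static_assert(simulated_main() == 0, "the round trip loses the member");
```

The null check in to_derived is what the standard demands for genuine null pointers; the trouble is only that a valid pointer has landed on the same bit pattern.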

Personally, I've been aware of this for a while and consider it an unfixable defect.  I don't know if it's generally known, though, and I can't find any prior discussion on the list.

I'm not aware of any non-artificial code that the defect has ever broken;  there are some decent just-so stories for why that might be true:
  (1) Data member pointers provide a really awkward abstraction that just isn't used that much:
    (1a) They let you abstract over any member you want!
    (1b) As long as that member has exactly the right type, not something implicitly convertible to it!
    (1c) And as long as that member is actually stored in a field, not computed from it!
    (1d) And as long as that field is a field of the class or one of its bases, not a field of a field of the class!
  (2) Everything about the syntax of member pointers (making them, using them, writing their types) is kind of weird-looking, and many people don't like using them.
  (3) The sorts of low-level programmers who would use this strange abstraction are often more comfortable using offsetof and explicit char* manipulation anyway.
  (4) People usually use data member pointers on hierarchically boring types anyway — generally leaf classes.
  (5) People usually don't mix data member pointers from different levels of the class hierarchy, and therefore generally don't do hierarchy conversions on them.
  (6) People usually don't work with null member pointers — they use member pointers as a way of abstracting an access for some algorithm, and generally that doesn't admit a null value.
  (7) Vanishingly few non-empty subclasses are ever going to be laid out at an offset of 1:
    (7a) The base class must have an alignment of 1, meaning (for pretty much every platform out there) no virtual functions, no interesting data structures, no pointers, no ints: nothing but bools and chars and arrays thereof.
    (7b) The derived class cannot have any virtual functions or virtual bases.
    (7c) The derived class must have multiple base classes, the first of which has to be either empty (totally empty, lacking even virtual methods) or size 1.

So it's a defect, to be sure, but I don't believe it has ever affected anyone, and it's not something that I feel merits any effort to pursue a fix for, even if I could think of a way to do so without outright breaking the ABI, which I can't.

As to *why* this defect exists, I have a couple of theories.

The less charitable one is that the Itanium committee just overlooked this possibility.  They could probably have used 0x800..000 instead — it's more awkward to actually produce as an immediate on many architectures, but it's still pretty easy to test for (decrement and check for signed overflow, or negate and test for equality), and it's unambiguous given some pretty reasonable assumptions.
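
As a rough sketch of what that alternative might have looked like (this is not what the ABI specifies; kAltNull and the helper names below are hypothetical), the null tests can be written portably in C++ using unsigned arithmetic, since signed overflow itself is undefined in the language.  The assumed offset of 1 is the one from Richard's example, and the "reasonable assumption" is that no real offset adjustment ever reaches the most negative value.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>

using MemPtr = std::ptrdiff_t;
using UMemPtr = std::make_unsigned_t<MemPtr>;

// Hypothetical alternative null value: 0x800...000, the most negative offset.
constexpr MemPtr kAltNull = std::numeric_limits<MemPtr>::min();

// Direct comparison against the sentinel.
constexpr bool is_null_compare(MemPtr p) { return p == kAltNull; }

// "Decrement and check for signed overflow": only the most negative value
// flips from negative to non-negative when decremented (done in unsigned).
constexpr bool is_null_decrement(MemPtr p) {
    UMemPtr u = static_cast<UMemPtr>(p);
    UMemPtr max_positive = static_cast<UMemPtr>(std::numeric_limits<MemPtr>::max());
    return p < 0 && static_cast<UMemPtr>(u - 1) <= max_positive;
}

// "Negate and test for equality": only 0 and 0x800...000 equal their own
// two's-complement negation, so zero has to be excluded explicitly.
constexpr bool is_null_negate(MemPtr p) {
    UMemPtr u = static_cast<UMemPtr>(p);
    return u != 0 && u == static_cast<UMemPtr>(UMemPtr{0} - u);
}

// Same conversions as in the example, with the assumed base offset of 1.
constexpr MemPtr to_base_alt(MemPtr p) { return is_null_compare(p) ? kAltNull : p - 1; }
constexpr MemPtr to_derived_alt(MemPtr p) { return is_null_compare(p) ? kAltNull : p + 1; }

// With this sentinel, offset -1 is an ordinary value and the round trip works.
static_assert(!is_null_compare(to_base_alt(0)), "-1 is no longer null");
static_assert(to_derived_alt(to_base_alt(0)) == 0, "round trip recovers offset 0");
```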

The more charitable one is that it's a casualty of the early flux in what conversions were going to be legal with member pointers.  A lot of early ABIs have it much worse than we do.  For example, the committee didn't originally ban converting member pointers across virtual-base boundaries, which really inflates both the size of a member pointer and the amount of code necessary for even a member-pointer downcast (which, recall, is the implicit, always-safe conversion).  For instance, turning an opaque member pointer of a class with virtual bases into a member pointer of a derived class across a virtual-base boundary potentially requires remapping the original virtual-base offset.  A lot of these early ABIs try to optimize the size of member pointers according to the known members of a class, which clearly doesn't work in the presence of upcasts (or in the presence of incomplete types, but that's the perennial evil of C++).

John.