vtable layout

Mon Aug 30 21:47:24 UTC 1999

> >>>>> thomson  <thomson at ca.ibm.com> writes:
>
>  > I don't see how this solves my diamond case
>
>  >    struct V1 { virtual void f();  virtual void g(); };
>  >    struct Other1 { virtual void ignore1(); };
>  >    struct X : Other1, virtual V1 { virtual void f(); };
>
>  >    struct Y : Other1, virtual V1 { virtual void g(); };
>
>  >    struct ZZ: X, Y {};
>
> You're right, I didn't think it through far enough.  On the sides of the
> diamonds, we decide where the adjustments go.  They end up in the first
> available slot, which is slot -1 in both classes.  But only one adjustment
> can be at that offset from the V1 vptr, so the adjustments from V1 to X and
> Y must be identical.  Which they're not, so this doesn't work.  It gets
> worse if the two classes have different numbers of virtual functions.
>
> Christophe?

I get it now, sorry for my previous post. I believe that this example
has been brought up earlier (two or three weeks ago). You are right,
that's one of the two cases where we still need to emit a thunk. We also
need a thunk in some cases of covariant return types (to perform a
"post" adjustment).
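
To make the conflict concrete, here is a small C++ sketch that measures, inside one ZZ object, the byte distance from the shared V1 subobject to the X subobject and to the Y subobject. The struct definitions follow the quoted example (with empty function bodies added so it links); the names Deltas, delta_from_v1 and measure are hypothetical helpers introduced only for this illustration. It does not inspect the vtable itself; it only shows that the two adjustments cannot be the same value, which is why a single slot at a fixed offset from the V1 vptr cannot serve both.

```cpp
#include <cassert>
#include <cstddef>

// Same hierarchy as the quoted diamond example; bodies are empty stubs.
struct V1     { virtual void f() {}  virtual void g() {} };
struct Other1 { virtual void ignore1() {} };
struct X : Other1, virtual V1 { void f() override {} };
struct Y : Other1, virtual V1 { void g() override {} };
struct ZZ : X, Y {};

// Hypothetical helper: byte distance from the V1 subobject to `target`.
std::ptrdiff_t delta_from_v1(const V1* v, const void* target) {
    return static_cast<const char*>(target) -
           reinterpret_cast<const char*>(v);
}

// Hypothetical result type: the two this-adjustments a V1* caller would need.
struct Deltas {
    std::ptrdiff_t to_x;  // V1 -> X, needed to reach X::f
    std::ptrdiff_t to_y;  // V1 -> Y, needed to reach Y::g
};

Deltas measure(const ZZ& zz) {
    const V1* v = &zz;  // the single shared virtual base
    const X*  x = &zz;  // first side of the diamond
    const Y*  y = &zz;  // second side of the diamond
    assert(static_cast<const void*>(x) != static_cast<const void*>(y));
    return { delta_from_v1(v, x), delta_from_v1(v, y) };
}
```

The actual values depend on the compiler's object layout, so none are quoted here; the only point is that to_x and to_y differ because X and Y occupy different addresses within ZZ.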

In terms of performance, the impact is limited, because it will occur
only if you use an A* to call f() or g(). With a B*, a C* or a D*, the
pair (vtable, offset) is unique. The same offset can be reused for f()
and g() and mean, in one case, "convert_to_X", in the other case,
"convert_to_Y". Same thing for non-virtual inheritance. Last, the thunk
generated in that case is no worse than the thunk that would be
generated otherwise: we win in other cases, and don't lose in this one.

>  >> This isn't an outrageous idea, it only works for nonvirtual inheritance
>  >> but we are already on a path where the solutions for the virtual and
>  >> nonvirtual cases have to be different.  We end up with more entry
>  >> points, but they are simpler than the reach-back-into-the-vtable ones.
>
When we discussed the problem for covariant returns, someone (Jason?)
pointed out that the ABI simply mandated the presence of the offsets in
the vtable, but that you can be ABI-compatible and generate thunks that
never use the offsets.

>  > ... And quite slower too ...
>
> Why?

A thunk approach means that your virtual calls will look like:

- Indirect branch, almost always mispredicted (probably well over 99%)
- I-cache miss on the thunk, since the thunk is quite "unique"
- Direct branch, almost always mispredicted, since prefetching did not
  have time to recover
- Possible I-cache miss on the target function

On the other hand, the method I proposed has the following benefits:

- The indirect branch mispredicts as before
- Once its target is known, the I-cache and pipeline are filled with
  useful information (the target function)
- D-cache misses on the vtable offsets are unlikely if any virtual
  function of the same class was called recently
- Call-site adjustment costs zero, in the sense that it is needed to get
  the vptr anyway
- If call-site adjustment is all that is needed, then the necessary
  adjustment is done at a place where scheduling is easier (the caller),
  rather than at a place where scheduling is impossible (the thunk)
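
The contrast between the two call sequences can be modelled by hand in C++. This is only a simplified sketch under my own assumptions: the Slot, Base2 and Derived types, the idea of storing the adjustment right next to the code address, and all function names are hypothetical and are not the layout from the proposal or from any real ABI. It shows the general shape of "branch to a thunk, which adjusts and branches again" versus "load an offset from the vtable, adjust at the caller, branch once".

```cpp
#include <cassert>
#include <cstddef>

struct Slot;  // one hand-rolled vtable entry (hypothetical layout)

// Stands in for a secondary base subobject; callers hold a Base2*.
struct Base2 { const Slot* vtbl; };

struct Slot {
    int (*entry)(void* self);  // code address stored in the vtable
    std::ptrdiff_t adjust;     // this-adjustment kept next to it
};

struct Derived {
    long  first_base_data;  // stands in for the primary base
    Base2 second;           // secondary base at a nonzero offset
    int   value;
};

// The real function: expects `self` to point at the full Derived.
int derived_f(void* self) { return static_cast<Derived*>(self)->value; }

// Thunk scheme: a separate piece of code adjusts `this`, then forwards.
int thunk_f(void* self) {
    char* full = static_cast<char*>(self) - offsetof(Derived, second);
    return derived_f(full);  // second branch, after the indirect one
}

// Vtable for the thunk scheme: entry is the thunk, offset unused.
const Slot thunk_slot = { &thunk_f, 0 };

// Vtable for the offset scheme: entry is the function itself.
const Slot offset_slot = {
    &derived_f, -static_cast<std::ptrdiff_t>(offsetof(Derived, second))
};

// Caller in the thunk scheme: one indirect call, adjustment hidden in thunk.
int call_via_thunk(Base2* b) { return b->vtbl->entry(b); }

// Caller in the offset scheme: adjust at the call site, then one branch.
int call_with_callsite_adjust(Base2* b) {
    const Slot* s = b->vtbl;  // the vptr load is needed in both schemes
    return s->entry(reinterpret_cast<char*>(b) + s->adjust);
}

// Builds an object using the given slot and calls through its Base2 part.
int demo(const Slot* slot, bool adjust_at_call_site) {
    Derived d{0, {slot}, 42};
    return adjust_at_call_site ? call_with_callsite_adjust(&d.second)
                               : call_via_thunk(&d.second);
}
```

Both paths reach derived_f with the same pointer; the difference is where the adjustment happens and how many branches sit between the caller and the target function. Nothing here measures the cache or branch-prediction effects listed above.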

For more details, see the complete code trail I sent with my initial
proposal.

Best regards
Christophe