vtable layout

Mon Aug 30 22:24:56 UTC 1999

>>>>> Christophe de Dinechin <ddd at cup.hp.com> writes:

 > In terms of performance, the impact is limited, because it will  
 > occur only if you use an A* to call f() or g(). With a B*, a C* or a  
 > D*, the pair (vtable, offset) is unique. The same offset can be  
 > reused for f() and g() and mean, in one case, "convert_to_X", in the  
 > other case, "convert_to_Y". Same thing for non-virtual inheritance.  

Yes, because it is non-virtual inheritance.

 > Last, the thunk generated in that case is no worse than the thunk  
 > that would be generated otherwise: we win in other cases, and don't  
 > lose in this one.

But it loses compared to the scheme Brian and I have been talking about,
which never requires a third-party thunk.

 >>  > ... And quite slower too ...
 >> 
 >> Why?

 > A thunk approach means that your virtual calls will look like:

 > - Indirect branch, almost always mispredicted (probably well over 99%)

Endemic to virtual functions; no way to get around this.

 > - I-Cache miss on thunk, since the thunk is quite "unique"
 > - Direct branch, almost always mispredicted, since prefetching did  
 > not have time to recover
 > - Possible I-Cache miss on target function

These effects are negated if the thunk is located immediately before the
target function, and you can use a pc-relative branch or just fall through.
And your method faces the same issues.  That's why I talk about third-party
thunks; they're the only ones that have performance problems.

 > On the other hand, the method I proposed has the following benefits:
 > - The indirect branch mispredicts as before
 >  - Once its target is known, the I-cache and pipeline are filled  
 > with useful information (the target function)

Why would this be any different with normal thunks?

 > - D-cache misses on the vtable offsets are unlikely if any virtual  
 > function of the same class was called recently

Even less likely if they aren't used...

 > - Call-site adjustment costs zero, in the sense that it is needed to  
 > get the vptr anyway.
 > - If call site adjustment is all that is needed, then the necessary  
 > adjustment is done at a place where scheduling is easier (the  
 > caller), rather than at a place where scheduling is impossible (the  
 > thunk)

Again, how is this any different?  All schemes will involve adjusting
'this' to point to a subobject of the appropriate base for the call.

Unless you're talking about loading the offset from the vtable and applying
it in the caller, but that doesn't work with this scheme anyway.

Do you have an implementation of your layout code?  It seems to me that in
large hierarchies, deciding how to lay out the slots so that all the
offsets match up would get very complex.

Jason