vtable layout

thomson at ca.ibm.com
Tue Aug 31 14:52:52 UTC 1999


Christophe:

>In terms of performance, the impact is limited, because it will
>occur only if you use an A* to call f() or g(). With a B*, a C* or a
>D*, the pair (vtable, offset) is unique. The same offset can be
>reused for f() and g() and mean, in one case, "convert_to_X", in the
>other case, "convert_to_Y". Same thing for non-virtual inheritance.
>Last, the thunk generated in that case is no worse than the thunk
>that would be generated otherwise: we win in other cases, and don't
>lose in this one.

I still don't fully understand.  What cases of virtual inheritance
will not require a thunk?  Is it when the virtual base appears only
once in the hierarchy (and so might as well not have been virtual)?
That is the only time when you can maintain the spatial relationship
between derived and virtual base vtables.
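The object-layout fact behind this question can be observed directly. The sketch below uses a hypothetical diamond (A, B, C and D are illustration names, not classes from this thread) and measures the distance from a B subobject to its virtual base A, once in a complete B and once inside a D. Layout is implementation-specific, so treat the numbers as something to inspect on your own compiler rather than as guaranteed values; on common ABIs the two distances differ, which is why the adjustment cannot be a single constant.

```cpp
#include <cstddef>

// Hypothetical hierarchy for illustration only.
struct A { virtual ~A() = default; virtual int f() { return 1; } int a = 0; };
struct B : virtual A { int b = 0; };
struct C : virtual A { int c = 0; };
struct D : B, C { int d = 0; };

// Byte distance from a B subobject to its virtual base A, measured at run time.
inline std::ptrdiff_t offset_to_A(B* b) {
    A* a = b;  // derived-to-virtual-base conversion, resolved through the object
    return reinterpret_cast<char*>(a) - reinterpret_cast<char*>(b);
}

// The same B-to-A distance, for B as a complete object...
inline std::ptrdiff_t offset_in_complete_B() {
    B b;
    return offset_to_A(&b);
}

// ...and for B as a base subobject of D, where A is shared with C.
inline std::ptrdiff_t offset_in_complete_D() {
    D d;
    return offset_to_A(&d);
}
```

Comparing offset_in_complete_B() with offset_in_complete_D() shows whether the B-to-A distance is stable across complete object types on a given implementation.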

And, as Jason points out, you are using the worst kind of thunk: it
probably isn't even on the same page as any of the other code, never mind
the same cache line.


Jason:

I think your response is complete except for one item:

>These effects are negated if the thunk is located immediately before the
>target function, and you can use a pc-relative branch or just fall through.
>And your method faces the same issues.  That's why I talk about third-party
>thunks; they're the only ones that have performance problems.

A thunk that can fall through has no penalty, but modern deeply
pipelined processors don't like taken branches even if they are correctly
predicted (as an unconditional, pc-relative branch would be).
Because prediction happens in a later stage, the prefetcher
normally assumes fall-through control flow and gets corrected a cycle or two
later when the predictor kicks in.  Whether the resulting "bubble" in
instruction issue actually ends up affecting throughput will depend on how
full the rest of the pipeline was.  I interpret some of Christophe's
earlier contributions as suggesting that we are likely to have just
suffered a mispredicted indirect branch, in which case, in words stolen
from Gulliver's Travels, which seems to have something to say about
almost any situation, we may find the pipeline "lank as a bladder".
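For readers who want the adjusting thunk spelled out, here is a minimal sketch of what it does in the non-virtual case. The classes X, Y, Z and the functions Z_g_body, Z_g_thunk and y_offset_in_z are hypothetical names for illustration. A real thunk is a few machine instructions emitted by the compiler; C++ source cannot express whether it is placed so that it falls through into the body or has to branch to it, so this only models the pointer adjustment itself.

```cpp
#include <cstddef>

// Hypothetical classes for illustration; none of these names come from the thread.
struct X { virtual ~X() = default; int x = 1; };
struct Y { virtual ~Y() = default; virtual int g() { return 0; } int y = 2; };
struct Z : X, Y {
    int z = 3;
    int g() override { return z; }  // the body expects `this` to point at the Z object
};

// Stand-in for the function body proper: it takes the full-object pointer.
inline int Z_g_body(Z* self) {
    return self->Z::g();  // qualified call, so no further virtual dispatch
}

// Stand-in for the thunk reached through Y's vtable slot: it receives a Y*,
// applies the fixed Y-in-Z adjustment, then transfers to the body.
inline int Z_g_thunk(Y* y_this) {
    Z* self = static_cast<Z*>(y_this);  // subtracts the constant offset of Y within Z
    return Z_g_body(self);
}

// The size of that constant adjustment in bytes, measured on a given object.
inline std::ptrdiff_t y_offset_in_z(Z& obj) {
    return reinterpret_cast<char*>(static_cast<Y*>(&obj)) -
           reinterpret_cast<char*>(&obj);
}
```

Calling Z_g_thunk through a Y* that points into a Z gives the same result as the ordinary virtual call y->g(); y_offset_in_z reports the adjustment the thunk has to apply.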

Let's see how well I can summarize this for non-virtual inheritance:

On the one hand we have Christophe's reach-back entry point which,
because of RAW dependencies, is intrinsically 3 cycles and may suffer
an extra D-cache miss, but which can always fall through.

On the other hand we have the thunks we have been discussing, which
are one cycle but only one of them can fall through.  Others will have
a taken branch penalty which may or may not affect throughput.

It looks to me as though our performance is better in the fall-through
case and, as long as the taken-branch penalty is 2 cycles or less, at
least as good in the other cases.  We also don't risk the extra D-cache
miss, and we avoid growing the vtables in a way that has a worst-case 2X
expansion.
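The arithmetic behind that comparison can be written down as a small model. It uses only the cycle counts quoted above (3 for the reach-back entry point, 1 for a thunk) and takes the taken-branch penalty as a parameter; the function and constant names are hypothetical, and the possible extra D-cache miss on the reach-back side is deliberately left out, so this is a back-of-envelope sketch rather than a measurement.

```cpp
// Cycle counts as quoted in this message; names are hypothetical.
constexpr int kReachBackCycles = 3;  // reach-back entry point, always falls through
constexpr int kThunkCycles = 1;      // adjusting thunk

// Cost of entering through a thunk: the thunk itself, plus the taken-branch
// penalty when this thunk is not the one that can fall through.
constexpr int thunk_cost(bool falls_through, int taken_branch_penalty) {
    return kThunkCycles + (falls_through ? 0 : taken_branch_penalty);
}

// Cost of entering through the reach-back entry point (D-cache miss ignored).
constexpr int reach_back_cost() {
    return kReachBackCycles;
}

// True when the thunk is at least as cheap as the reach-back entry point.
constexpr bool thunk_at_least_as_good(bool falls_through, int taken_branch_penalty) {
    return thunk_cost(falls_through, taken_branch_penalty) <= reach_back_cost();
}
```

Under this model the break-even point is 1 + penalty <= 3, that is, a penalty of 2 cycles or less, which matches the condition stated above.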




Brian Thomson
VisualAge C/C++ Chief Architect





