vtable layout

Tue Aug 31 23:41:27 UTC 1999

> > 1/ Misprediction penalty
>
> > All I can say is that the hypothesis that the penalty is 2 cycles or
> > less is way too optimistic
>
>Is there a term for the case when the branch predictor correctly predicts a
>branch but the pipeline stalls because the prefetcher assumed no branch?

Yes, this is the difference between the "taken branch penalty", which is
typically small (0 to 2 cycles), and the "mispredicted branch penalty",
which is much higher and, in an aggressively pipelined state-of-the-art
processor, could be 10 or 20 cycles or more.

> > Regarding whether the second branch would be correctly predicted or
> > not... The documentation I have is quite difficult to decipher, so
> > I'm not too sure. My impression is that at least on one
> > implementation, the branch would predict correctly and not cause an
> > additional penalty.
>
>What would be the excuse for mispredicting an unconditional forward
>pc-relative branch?

To be fair, this is not unheard of.  There was an AMD processor years
ago that did this, and more recently the Pentium had the same problem.
It used the BTB to predict all branches, even unconditional
pc-relative ones, so if the branch had not been encountered already,
and fairly recently, it would get the prediction wrong; I think the
penalty was 3 cycles.

It seems easy enough to fix: all you have to do is an add and you
can get the right answer.  But in hardware, doing an add somewhere
it wasn't done before means building an extra adder.  This was
finally judged worth doing in the P6, when the cost of misprediction
grew to about 12 cycles.  The "static predictor" reduces this to
5, while a BTB-predicted taken branch costs only 1.

So there is precedent, and Christophe has access to more information
about the implementations than I have, but it would surprise me, since
it certainly is a step in the wrong direction, especially given that
the architecture takes some pains to support static prediction as an
alternative to dynamic, to reduce contention for BTB resources.

With regard to the virtual base side of this whole issue, IBM has
skin in the game in virtual bases because of our support for
the CORBA programming model.  CORBA in C++ implies lots of virtual
bases and lots of calls through introducing classes, and that is
why the possibility of avoiding the wandering thunk is particularly
interesting.

Brian Thomson
VisualAge C/C++ Chief Architect