Virtual function calls
Christophe de Dinechin
ddd at cup.hp.com
Thu Jun 24 00:31:27 UTC 1999
Folks,
Here is a summary of the virtual function call mechanism.
OPEN ISSUES THAT BELONG TO THIS DISCUSSION (Coming later)
1/ Keeping all of a class in a single load module. The vtable
contains the target address and one copy of the target GP. This
implies that it is not in text, and that it is generated by dld.
2/ Detailed layout of the virtual table
3/ How can we share class offsets?
1/ Scope and "State of the Art"
The following proposal applies only to calls to virtual functions
when a this pointer adjustment is required from a base class to a
derived class. Essentially, this means multiple inheritance, and the
existence of two or more virtual table pointers (vptr) in the
complete object. The multiple vptrs are required so that the layout
of all bases is unchanged in the complete object. There will be one
additional vptr for each base class which already required a vptr,
but cannot be placed in the whole object so that it shares its vptr
with the whole object. Note: when the vptr is shared, the base class
is said to be the "primary base class", and there is only one such
class.
For the primary base class, no pointer adjustment is needed. For all
other bases, a pointer to the whole object is not a pointer to the
base class, so whenever a pointer to the base class is needed,
adjustment will occur.
In particular, when calling a virtual function, one does not know in
advance in which class the function was actually defined. Depending
on the actual class of the object pointed to, pointer adjustment may
be needed or not, and the pointer adjustment value may vary from
class to class. The existing solution is to have the vtable point not
to the function itself, but to a "thunk" which does pointer
adjustment when needed, and then jumps to the actual function.
Another possibility is to have an offset in the vtable, which is used
by the called function. However, more often than not, this implies
adding zero.
Virtual bases make things slightly more complicated. In that case,
the data layout is such that there is only one instance of the
virtual base in the whole object. Therefore, the offset from a this
pointer to a same virtual base may change along the inheritance tree.
This is solved by placing an offset in the virtual table, which is
used to adjust the this pointer to the virtual base.
2/ Proposal and Rationale
My proposal is to replace thunks with offsets, with two additional tricks:
- Give a virtual function two entry points, so as to bypass the
adjustment when it's known to be zero
- Moving the adjustment at call-site, where it can be scheduled more
easily, using a "reasonable" value, so that the adjustment is
bypassed even more often.
The thunks are believed to cost more on IA64 than they would on
other platforms. The reason is that they are small islands of code
spread throughout the code, where you cannot guarantee any cache
locality. Since they immediately follow an indirect branch, chances
are we will always encounter both a branch misprediction and a
I-cache miss in a row.
On the other hand, a virtual function call starts by reading the
virtual function address. Reading the offset immediately thereafter
should almost never cause a D-cache miss (cache locality should be
good). More often than not, no adjustment is needed, or the
adjustment will be done at call site correctly. In the worst case
scenario, we perform two adjustments, one static at call site, and
one dynamic in the callee, but this case should be really infrequent.
3/ New Calling Convention
The new calling convention requires that the 'this' pointer on entry
points to the class for which the virtual function is just defined.
That is, for A::f(), the pointer is an A* when the main entry of the
function is reached. If the actual pointer is not an A*, then an
adjusting entry point is used, which immediately precedes the
function.
In the following, we will assume the following examples:
struct A { virtual void f(); };
struct B { virtual void g(); };
struct C: A, B { }
struct D : C { virtual void f(); virtual void g(); }
struct E: Other, C { virtual void f(); virtual void g(); }
struct F: D, E { virtual void f(); }
void call_Cf(C *c) { c->f(); }
void call_Cg(C *c) { c->g(); }
void call_Df(D* d) { d->f(); }
void call_Dg(D* d) { d->g(); }
void call_Ef(E* e) { e->f(); }
void call_Eg(E* e) { e->g(); }
void call_Ff(F *ff) { ff->f(); }
void call_Fg(F *ff) { ff->g(); } // Invalid:
ambiguous
a) Call site:
The caller performs adjustment to match the class of the last
overrider of the given function.
+ call_Cf will assume that the pointer needs to be cast to an A*,
since C::f is actually A::f. Since A is the primary base class, no
adjustment is done at call site.
+ call_Cg is similar, but assumes that the actual type is a B*, and
performs the adjustment, since B is not the primary base class.
+ call_Df and call_Dg will assume that the pointer needs to be cast
to a D*, which is where D::f is defined. No adjustment is performed
at call site.
b) Callee
+ A::f and B::g are defined in classes where there is a single vptr.
They don't define a secondary entry point. Because of call-site
conventions, they expect to always be called with the correct type.
+ D::f is defined in a class where there is more than one vptr, so
it needs a secondary entry point and an entry 'convert_to_D' in the
vtable. That's because it can be potentially called with either an A*
or a B*. There are two vtables, one for A in D, one for B in D. The
D::f entry in A in D points to the non-adjusting entry point, since A
shares its vptr.
+ D::g requires a secondary entry point, that will read the same
offset 'convert_to_D' from the vtable.
+ E also will require a 'convert_to_E' entry in the vtable, but this
time, the vtable for A in C will have to point to an adjusting entry
point, since A no longer shares the vptr with E (assuming Other has
a vptr). This vtable is also the vtable of C in E.
c) Offsets in the vtable
Offsets have to be placed in the vtable at a position which does not
conflict with any offset in the inheritance tree.
convert_to_D and convert_to_E are likely to be at the same offset in
the vtable. This is not a problem, even if D and E are used in the
same class, such as F, because this is the same offset in different
vtables.
- call_Fg is invalid, because it is ambiguous.
- A notation such as ((E*) ff)->g() can be used to disambiguate, but
in that case, we don't use the same vtable (either the E in F or D
in F vtable). The E in F vtable uses that offset as 'convert_to_E',
whereas the D in F vtable uses that offset as 'convert_to_D'.
- Similarly, call_Cf called with an F object will actually be called
with the E in F or D in F, which disambiguates which C is actually
used. The actual C* passed will have been adjusted by the caller
unambiguously, or the call will be invalid.
- For functions overriden in F, an entry 'convert_to_F' is created
anyway. This entry will not overlap with either convert_to_E or
convert_to_D.
The fact that an offset is reserved does not mean that it is
actually used. A vtable need to contain the offset only if it refers
to a function that will use it. An offset of 0 is not needed, since
the function pointer will point to the non-adjusting entry point in
that case.
4/ Cases where adjustment is performed
+ For call_Cf: No adjustment is done at call site. No adjustment is
done at callee site if the dynamic type is C, or D, or D in F (that
is, F casted to an E).
+ For call_Cg: Adjustment to B* is done at call-site. No further
adjustment is needed if the dynamic type is C, D, or D in F. On the
other hand, a second adjustment may happen for an E or E in F,
because C is not their primary base.
In other words, adjustment is made only when necessary, and at a
place where it is better scheduled than with thunks. The only bad
case is double adjustment for call_Cg called with an E*. This case
can probably be considered rare enough, compared to calls such as
call_Cg called with a C*, where we now actually do the adjustment at
the call-site.
4/ Comparing the code trails
Currently, the sequence for a virtual function call in a shared
library will look as follows. I'm assuming +DD64, there would be some
additional addp4 in +DD32. The trail below is the dynamic execution
sequence. In bold and between #if/#endif, the affected code.
// Compute the address of the vptr in the object, from the
this pointer
// Optional, since vptroffset is often 0. This also adjusts
to the class
// of the final overrider
addi Rthis=vptroffset_of_final_overrider,Rthis
;;
// Load the vptr in a register
ld8 Rvptr=[Rthis]
;;
// Add the offset to get to the function descriptor pointer
in the vtable
// Never zero, this instruction is always generated
addi Rfndescr=fndescroffset,Rvptr
;;
// (Assuming inlined stub) Load the function address and new GP
ld8 Rfnaddr=[Rfndescr],8
;;
// Load the new GP
ld8 GP=[Rfndescr]
mov BRn=Rfnaddr
;;
// Perform the actual branch to the target
// For prediction to occur correctly on McKinley,
// at least 5 cycles between the mov to BR above and the use
// of the BR. Up to 16 bundles...
// This is very unlikely to be possible in most actual code, so...
// ... Branch misprediction almost always
// ... followed by I-Cache miss almost always if jumping to
a thunk
br.call B0=BRn
#if OLD_ADJUST
thunk_A::f_from_a_B:
// If the 'adjustment_from_B_to_A is the 'adjustment_to_A' above,
// then in the new case, the vtable directly points to A::f
addi Rthis,adjustment_from_B_to_A
// In most cases, we can probably generate a PC-relative
branch here
// It is unclear whether we would correctly predict that branch
// (since it is assumed that we arrive here immediately following
// a misprediction at call site)
br A::f
#endif // OLD_ADJUST
// This occurs less often than OLD_ADJUST
// (it does not happen when call-site adjustment is correct)
#if NEW_ADJUST
adjusting_entry_A::f
// Can't be executed in less than 3 cycles?
addi Rvptr=class_adjustment_offset,Rvptr
;;
// This loads data which is close to the fn descriptor,
// so it's likely to be in the D-cache
ld8 Rvptr=[Rvptr]
;;
add Rthis=Rthis,Rvptr
#endif
A::f:
alloc ...
Christophe
More information about the cxx-abi-dev
mailing list