incomplete rtti

Mon Apr 3 22:57:24 UTC 2000

> From: Nathan Sidwell <nathan at codesourcery.com>
> 
> >This assumption is untrue.  The problem is that weak types don't work
> >like you assume on most systems.  With the exception of Linux and Irix,
> >most systems do not distinguish between weak and "strong" symbols once
> >an object is linked.  Therefore, given a weak RTTI in the main
> >executable and a strong RTTI in a DSO, they would preempt the latter
> >with the former.  As a result, it is necessary to make our incomplete
> >class RTTI not just weak, but distinct.  Once it is distinct, the
> >pointer RTTI referencing it must be distinct from one referencing the
> >complete version, and so on up the pointer chain, and it is not
> >possible to compare them at any level.
> 
> Ok, thanks for the clarification about that. Let me just see if I
> understand DSO linking properly. If we have a loaded object file which
> referred to an undefined, weakly declared symbol, that object file will
> have resolved the symbol to zero. Loading a DSO which defines that
> symbol will not affect the already loaded object file, in which the
> symbol keeps the value zero. Ok?

That's correct on some of the Unix implementations.  On others (e.g.
Irix), it will be "fixed" by the DSO.

> >Our solution is to use the ABI-defined external mangled RTTI name only
> >for complete types.  RTTI generated for pointer-to-incomplete-type must
> >be different.  We leave it to the implementation to decide how, but two
> >workable approaches are (a) make it a local static, or (b) mangle it
> >differently and use COMDAT to remove duplicates; but at least one
> >incomplete RTTI would remain, and it would not be the same as the
> >complete one even after preemption.
> 
> Ok, this needs more documentation. There are three uses for the rtti
> information, which we _don't_ have to solve with the same rtti object.
> We have attempted to use a single type_info for all three uses, but
> that is not necessary, and in some cases we have separate objects (of
> the same type, __pointer_type_info) for the same type (T *...*, for
> incomplete T).
> 
> 1) A distinct lvalue for the typeid operator
> 2) A class hierarchy descriptor for dynamic_cast
> 3) A class hierarchy and pointer qualification descriptor for catch
> matching.
> 
> Consider a source file `foo' consisting of these snippets
>         struct A;
>         typeid (A *); //1
>         catch (A **); //2
> 1 would produce a __pointer_type_info of the following shape
> tf_P1A: comdat  ; call this foo.tf_P1A
>         name = "P1A"
>         flags = 0
>         target = weak tf_1A
> 
> The TU would not need to emit the (incomplete) type_info for A itself,
> but leave a dangling weak reference. That will become zero, unless
> another TU is linked in which provides it.
> 
> 2 must produce information about the entire pointer chain, from the
> layout doc that would be
> _tf_PP1A: static
>         name = "PP1A"
>         flags = 8
>         target = _tf_P1A
> _tf_P1A: static
>         name = "P1A"
>         flags = 8
>         target = _tf_1A
> _tf_1A: static
>         name = "1A"
> 
> Ok?
> 
> Now, if we have another source `baz' with a definition of A,
>         struct B {};
>         struct A : B {};
>         typeid (A *); //1a
>         catch (A **); //2a
>         throw (A *)NULL;  //3a
> 
> 1a would produce the same __pointer_type_info as before, but this time
> we would emit comdat typeinfo for the catch clause 2a, and throw 3a is
> also permitted.
> tf_PP1A: comdat
>         name = "PP1A"
>         flags = 0
>         target = tf_P1A
> tf_P1A: comdat  ; call this baz.tf_P1A
>         name = "P1A"
>         flags = 0
>         target = tf_1A
> tf_1A: comdat
>         name = "1A"
>         base = tf_1B
> tf_1B: comdat
>         name = "1B"
> 
> Of course, tf_P1A is the same object that 1a forced us to emit.
> 
> Now the $BIGNUM dollar question. What if `foo' is our executable and
> `baz' is the DSO? foo.tf_P1A will be the `active' definition of tf_P1A,
> and it will have a NULL target value. Therefore baz.tf_P1A is not
> selected and baz.tf_PP1A's target will resolve to foo.tf_P1A. Also the
> throw 3a will refer to tf_P1A, which will be resolved by foo.tf_P1A.
> That has an incomplete target type, so 3a won't have information about
> A being derived from B, and thus won't match a catch (B *) clause.

Right, which is why foo can't emit a global comdat for P1A with the
ABI mangled name and a weak pointer to nothing.  It must behave like
the low levels of your PP1A chain -- static, pointing to an
incomplete-type class_type_info, and it should have the incomplete
target flag set.  I think the rule in the document is clear:

    If the target type of the pointer is an incomplete class type,
    directly or indirectly,
    a dummy class RTTI is generated for the incomplete type
    that will not resolve to the final complete class RTTI
    (because the latter need not exist),
    possibly by making it a local static object,
    and the incomplete target type flag is set
    in each pointer RTTI that references it directly or indirectly.
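
As an illustration only, the flag propagation in that rule can be
sketched in C++.  The Rtti struct, the `complete' field, and
mark_incomplete() are hypothetical names made up for this sketch; they
are not the ABI's real layout or API.  The flag value 8 is simply the
one used in the dumps quoted earlier in this message.

```cpp
#include <cassert>

// Hypothetical stand-in for an RTTI record; NOT the ABI's real layout.
struct Rtti {
    const char* name;   // mangled name, e.g. "PP1A"
    unsigned flags;     // pointer-level flags
    Rtti* target;       // next level down; nullptr for a class type
    bool complete;      // meaningful only for class-type records
};

// Assumed flag value, taken from the "flags = 8" dumps in this thread.
constexpr unsigned kIncompleteTarget = 8;

// Walks down the pointer chain and sets the incomplete-target flag in
// every pointer record that reaches an incomplete class, directly or
// indirectly.  Returns true if the ultimate target is incomplete.
inline bool mark_incomplete(Rtti& r) {
    if (r.target == nullptr)
        return !r.complete;              // class level: report its state
    bool incomplete = mark_incomplete(*r.target);
    if (incomplete)
        r.flags |= kIncompleteTarget;    // flag this pointer level
    return incomplete;
}
```

Calling mark_incomplete() on the PP1A record of an incomplete A would
flag both PP1A and P1A and leave the dummy class record itself
untouched.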

> Perhaps I'm being dense, and there's still something about comdat
> linkage I don't understand. But to my understanding the previous
> paragraph is the same problem with DSOs I originally raised before
> this incomplete type info was addressed.

It is.  That's why we are NOT depending on weak linkage.  Only
complete-type RTTI are specified to have the defined mangled global
names, and everything else (i.e. the pointers to incomplete class types
and the target types themselves) is specified to be static or otherwise
conflict-free.  Such things are always accessed via pointers with the
incomplete-target flag set, which is the signal to do comparisons using
the mangled names in the ultimate target RTTI instead of comparing
addresses.
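
A minimal C++ sketch of that comparison rule follows, under stated
assumptions: the Rtti struct and same_pointer_type() are hypothetical
names for illustration, the flag value 8 is the one from the dumps
quoted above, and cv-qualification matching is left out entirely.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for an RTTI record; NOT the ABI's real layout.
struct Rtti {
    const char* name;    // mangled name, e.g. "P1A"
    unsigned flags;      // pointer-level flags
    const Rtti* target;  // next level down; nullptr for a class type
};

// Assumed flag value, taken from the "flags = 8" dumps in this thread.
constexpr unsigned kIncompleteTarget = 8;

// Compares two pointer RTTI records.  With no incomplete-target flag on
// either side, both are the unique global objects, so address identity
// decides.  Otherwise walk both chains to the ultimate target and
// compare the mangled names there.
inline bool same_pointer_type(const Rtti* a, const Rtti* b) {
    if (((a->flags | b->flags) & kIncompleteTarget) == 0)
        return a == b;
    while (a->target != nullptr && b->target != nullptr) {
        a = a->target;
        b = b->target;
    }
    if (a->target != nullptr || b->target != nullptr)
        return false;                        // different pointer depth
    return std::strcmp(a->name, b->name) == 0;
}
```

With foo's static PP1A chain (flags = 8) on one side and baz's global
PP1A chain (flags = 0) on the other, the addresses differ at every
level, but the walk ends at two records both named "1A", so the types
compare equal.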

Jim

-	    Jim Dehnert		dehnert at sgi.com
				(650)933-4272