C++ ABI Closed Issues

Revised 17 November 2000


Issue Status

In the following sections, the class of an issue attempts to classify it on the basis of what it likely affects. The identifiers used are:
call    Function call interface, i.e. call linkage
data    Data layout
lib     Runtime library support
lif     Library interface, i.e. API
g       Potential gABI impact
ps      Potential psABI impact
source  Source code conventions (i.e. API, not ABI)
tools   May affect how program construction tools interact


A. Object Layout Issues

#    Issue          Class  Status  Source  Opened  Closed
A-1  Vptr location  data   closed  SGI     990520  990624
Summary: Where is the Vptr stored in an object (first or last are the usual answers).

[990610 All] Given the absence of addressing modes with displacements on IA-64, the consensus is to answer this question with "first."

[990617 All] Given a Vptr and only non-polymorphic bases, which (Vptr or base) goes at offset 0?

Tentative decision: Vptr always goes at beginning.

[990624 All] Accepted tentative decision. Rename, close this issue, and open separate issue (B-6) for Vtable layout.

#    Issue                 Class  Status  Source  Opened  Closed
A-2  Virtual base classes  data   closed  SGI     990520  990624
Summary: Where are the virtual base subobjects placed in the class layout? How are data member accesses to them handled?

[990610 Matt] With regard to how data member accesses are handled, the choices are to store either a pointer or an offset in the Vtable. The consensus seems to be to prefer an offset.

[990617 All] Any number of empty virtual base subobjects (rare) will be placed at offset zero. If there are no non-virtual polymorphic bases, the first virtual base subobject with a Vpointer will be placed at offset zero. Finally, all other virtual base subobjects will be allocated at the end of the class, left-to-right, depth-first.

[990624 All] Define an empty object as one with no non-static, non-empty data members, no virtual functions, no virtual base classes, and no non-empty non-virtual base classes. Define a nearly empty object as one which contains only a Vptr. The above resolution is accepted, restated as follows:

Any number of empty virtual base subobjects (rare, because they cannot have virtual functions or bases themselves) will be placed at offset zero, subject to the conflict rules in A-3 (i.e. this cannot result in two objects of the same type at the same address). If there are no non-virtual polymorphic base subobjects, the first nearly empty virtual base subobject will be placed at offset zero. Any virtual base subobjects not thus placed at offset zero will be allocated at the end of the class, in left-to-right, depth-first declaration order.
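
As an illustration of the definitions above: the "empty object" definition lines up closely with what the standard library trait std::is_empty reports, so the classification can be sketched with compile-time checks. The class names below are hypothetical, and a C++11 compiler is assumed. There is no standard trait for "nearly empty," so that case appears only as an example of a class that is not empty.

```cpp
#include <type_traits>

// Hypothetical classes, named here only for illustration.
struct Empty {};                                   // no data, no virtuals, no bases
struct EmptyDerived : Empty {};                    // only an empty non-virtual base
struct HasData { int x; };                         // a non-static data member
struct NearlyEmpty { virtual ~NearlyEmpty() {} };  // contains only a Vptr
struct HasVirtualBase : virtual Empty {};          // has a virtual base class

// Each check mirrors one clause of the "empty object" definition.
static_assert(std::is_empty<Empty>::value, "no members at all");
static_assert(std::is_empty<EmptyDerived>::value, "an empty base keeps it empty");
static_assert(!std::is_empty<HasData>::value, "a data member makes it non-empty");
static_assert(!std::is_empty<NearlyEmpty>::value, "a virtual function makes it non-empty");
static_assert(!std::is_empty<HasVirtualBase>::value, "a virtual base makes it non-empty");
```

Under the placement rule above, only classes like Empty and EmptyDerived would be candidates for the "empty virtual base at offset zero" case, while a class like NearlyEmpty is the kind that may be placed at offset zero as the first nearly empty virtual base.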

#    Issue                 Class  Status  Source  Opened  Closed
A-3  Multiple inheritance  data   closed  SGI     990520  990701
Summary: Define the class layout in the presence of multiple base classes.

[990617 All] At offset zero is the Vptr whenever there is one, as well as the primary base class if any (see A-7). Also at offset zero is any number of empty base classes, as long as that does not place multiple subobjects of the same type at the same offset. If there are multiple empty base classes such that placing two of them at offset zero would violate this constraint, the first is placed there. (First means in declaration order.)

All other non-virtual base classes are laid out in declaration order at the beginning of the class. All other virtual base subobjects will be allocated at the end of the class, left-to-right, depth-first.

The above ignores issues of padding for alignment, and possible reordering of class members to fit in padding areas. See issue A-9.

[990624 All] There remains an issue concerning the selection of the primary base class (see A-7), but we are otherwise in agreement. We will attempt to close this on 1 July, modulo A-7.

[990701 All] This issue is closed. A full description of the class layout can be found in issue A-9. (At this time, A-7 remains to be closed, waiting for the Taligent rationale.)

#    Issue               Class  Status  Source  Opened  Closed
A-4  Empty base classes  data   closed  SGI     990520  990624
Summary: Where are empty base classes allocated? (An empty base class is one with no non-static data members, no virtual functions, no virtual base classes, and no non-empty non-virtual base classes.)

[990624 All] Closed as a duplicate of A-3.

#    Issue             Class  Status  Source  Opened  Closed
A-5  Empty parameters  data   closed  SGI     990520  001117
Summary: When passing a parameter with an empty class type by value, what is the convention?
Resolution: Except for cases of non-trivial copy constructors (see C-7), and parameters in the variable part of varargs lists, a single parameter slot will be allocated to empty parameters, as though they were a struct containing a single character.

[990623 SGI] We propose that no parameter slot be allocated to such parameters, i.e. that no register be used, and that no space in the parameter memory sequence be used. This implies that the callee must allocate storage at a unique address if the address is taken (which we expect to be rare).

[990624 All] In addition to the address-taken case, care is required if the object has a non-trivial copy constructor. HP observes that in (some?) such cases, they perform the construction at the call site and pass the object by reference.

[990625 SGI -- Jim] I understand that the Standard explicitly allows elimination of even non-trivial copy construction in some cases. Is this one of them? Where should I look? Also, of course, varargs processing for elided empty parameters would need to be careful.

I have opened a new issue (C-7) for passing copy-constructed parameters by reference. Since doing so would turn an empty value parameter into a non-empty reference parameter, this issue can ignore such cases.

[990701 All] An empty parameter will not occupy a slot in the parameter sequence unless:

  1. its type is a class with a non-trivial copy constructor; or
  2. it corresponds to the variable part of a varargs parameter list.

Daveed and Matt will pursue the question of when copy constructors may be ignored for parameters with the Core committee, and if they identify cases where the constructors may clearly be omitted, those (empty) parameters will also be elided.

[001109 CodeSourcery -- Mark] Both g++ and the HP compiler have great difficulty dealing with this, and prefer to reserve the parameter slot even for empty parameters. At the meeting, we tentatively decided to reverse our decision and allocate an integer parameter slot even for empty parameters. We will place no constraints on the data in the parameter slot, except that on IA-64, it must not be NaT data.

[001117 All -- Jim] There having been no objection to the proposed resolution, it is adopted. Results will be treated the same way.

#    Issue                   Class         Status  Source  Opened  Closed
A-6  RTTI .o representation  data call ps  closed  SGI     990520  991028
Summary: Define the data structure to be used for RTTI, that is:
  • for user type_info calls;
  • for dynamic_cast implementation; and
  • for exception-handling.
Resolution: Defined in the Draft C++ ABI for IA-64.

[990701 All] Daveed will put together a proposal by the 15th (action #13); the group will discuss it on the 22nd.

[990805 All] Daveed should have his proposal together for discussion. Michael Lam will look into the Sun dynamic cast algorithm.

It was noted that appropriate name selection along with the normal DSO global name resolution should be sufficient to produce a unique address for each class' RTTI struct, which address would then be a suitable identifier for comparisons.

[990812 Sun -- Michael] Sun has provided a description, in a separate page, describing their implementation. They are filing for a patent on the algorithms described.

[990819 EDG -- Daveed] (Proposal replaced by later version on 6 October.)

[990826 All] Discussion centered on whether the representation should include all base classes or just the direct ones, and in the former case how hashing might be handled. It was agreed that the __qualifier_type_info variant is not needed, and it is now stricken in the above proposal. Also, a pointer-to-member variant is needed. Christophe will provide a description of the HP hashing approach, and Daveed will update the specification.


[991006 EDG -- Daveed]

Run-time type information

The C++ programming language definition implies that information about types be available at run time for three distinct purposes:

  1. to support the typeid operator,
  2. to match an exception handler with a thrown object, and
  3. to implement the dynamic_cast operator.
Purpose (3) only requires type information about polymorphic class types, but (1) and (2) may apply to other types as well; for example, when a pointer to an int is thrown, it can be caught by a handler that catches "int const*".
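
The "int const*" handler case can be tried directly. The following is a minimal sketch, with hypothetical function names, showing that a thrown int* is matched by an int const* handler, which is why type information is needed for non-class types too.

```cpp
#include <cassert>

// Returns true when a thrown `int*` is caught by an `int const*` handler.
// Matching the handler requires run-time type information for a
// non-class, non-polymorphic type.
bool caught_as_const_pointer() {
    static int value = 42;       // illustrative object to point at
    try {
        throw &value;            // the thrown object has type int*
    } catch (int const* p) {     // matches via a qualification conversion
        return *p == 42;
    } catch (...) {
        return false;            // reached only if the handler above did not match
    }
}

// Hypothetical helper that exercises the function above.
void check_handler_match() {
    assert(caught_as_const_pointer());
}
```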

Deliberations

The following conclusions were arrived at by the attending members of the C++ IA-64 ABI group:

The full proposal has been incorporated in the Draft C++ ABI for IA-64.


[991014 all]

  1. Do we keep pointers to direct bases only, or to indirect bases as well? It is believed that keeping pointers to indirect bases speeds up dynamic_cast by a constant factor, but at the cost of extra space even when dynamic_cast is never used. There is a general preference for keeping direct bases only.

  2. The current proposal has a flag to differentiate single inheritance from multiple inheritance case. Jason suggests instead splitting the two cases into two separate classes, and there was general agreement that this is a good idea.

  3. The current proposal has separate classes for various kinds of non-class types. Jason suggests merging all non-class types into a single class. Nobody had strong feelings, or strong arguments either for or against this change. In the absence of a consensus in favor of this change, we'll keep the proposal as is.

  4. Minor changes: There's a typo in the pointer to member part, which Daveed will fix. Jason suggests flipping the sign on the offset, and nobody objected.

ACTION ITEMS: Daveed---make these changes. Jim---incorporate these changes into the open issues list. We are almost ready to close this issue; we intend to close it at the 28 October meeting, after we've all had a chance to go over the modified writeup.


[991028 all] The current definition, in the Draft C++ ABI for IA-64, has been updated with Daveed's changes, and is accepted. Note that we are back to using a pointer to RTTI in the vtable (see B-8), since we need uniqueness, and since we need an external symbol in any case, the ABI will make no statement about where RTTI is allocated. It is likely that implementations will use COMDAT for it.

#    Issue                                 Class  Status  Source  Opened  Closed
A-7  Vptr sharing with primary base class  data   closed  HP      990603  990729
Summary: It is in general possible to share the virtual pointer with a polymorphic base class (the primary base class). Which base class do we use for this?
Resolution: Share with the first non-virtual polymorphic base class, or if none with the first nearly empty virtual base class.

[990617 All] It will be shared with the first polymorphic non-virtual base class, or if none, with the first nearly empty polymorphic virtual base class. (See A-2 for the definition of nearly empty.)

[990624 All] HP noted that Taligent chooses a base class with virtual bases before one without as the primary base class, probably to avoid additional "this" pointer adjustments. SGI observed that such a rule would prevent users from controlling the choice by their ordering of the base classes in the declaration. The bias of the group remains the above resolution, but HP will attempt to find the Taligent rationale before this is decided.

[990729 All] Close with the agreed resolution. If a convincing Taligent rationale is found, we can reconsider.

#    Issue                           Class  Status  Source  Opened  Closed
A-8  (Virtual) base class alignment  data   closed  HP      990603  990624
Summary: A (virtual) base class may have a larger alignment constraint than a derived class. Do we agree to extend the alignment constraint to the derived class? (An alternative for virtual bases: allow the virtual base to move in the complete object.)

[990623 SGI] We propose that the alignment of a class be the maximum alignment of its virtual and non-virtual base classes, non-static data members, and Vptr if any.

[990624 All] Above proposal accepted. (SGI observation: the size of the class is rounded up to a multiple of this alignment, per the underlying psABI rules.)
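
The accepted rule can be checked against a compiler with a few compile-time assertions. This is a sketch only: the type names are hypothetical, C++11 alignas/alignof are assumed, and the checks reflect what common compilers do rather than anything the language standard guarantees.

```cpp
#include <cstddef>

// Hypothetical types for illustration.
struct alignas(16) Wide { char bytes[16]; };        // strictly aligned base
struct Narrow { int n; };                           // ordinary base
struct Derived : Wide, Narrow { char c; };          // non-virtual bases plus a member
struct WithVirtualBase : virtual Wide { int n; };   // virtual base plus a Vptr

// Larger of two alignments, usable in constant expressions.
constexpr std::size_t max_of(std::size_t a, std::size_t b) { return a > b ? a : b; }

// Class alignment is the maximum over bases and members.
static_assert(alignof(Derived) ==
                  max_of(alignof(Wide), max_of(alignof(Narrow), alignof(char))),
              "alignment is the maximum of the components");

// A virtual base contributes to the alignment as well.
static_assert(alignof(WithVirtualBase) >= alignof(Wide),
              "virtual base alignment extends to the derived class");

// The size is rounded up to a multiple of the alignment.
static_assert(sizeof(Derived) % alignof(Derived) == 0, "size is a multiple of alignment");
```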

#    Issue                                        Class  Status  Source  Opened  Closed
A-9  Sorting fields as allowed by [class.mem]/12  data   closed  HP      990603  990624
Summary: The standard constrains ordering of class members in memory only if they are not separated by an access clause. Do we use an access clause as an opportunity to fill the gaps left by padding?
Resolution: See separate writeup of Draft C++ ABI for IA-64.

[990610 all] Some participants want to avoid attempts to reorder members differently than the underlying C struct ABI rules. Others think there may be benefit in reordering later access sections to fill holes in earlier ones, or even in base classes.

[990617 all] There are several potential reordering questions, more or less independent:

  1. Do we reorder whole access regions relative to one another?
  2. Do we attempt to fill padding in earlier access regions with initial members from later regions?
  3. Do we fill the tail padding of non-POD base classes with members from the current class?
  4. Do we attempt to fill interior padding of non-POD base classes with later members?

There is no apparent support for (1), since no simple heuristic has been identified with obvious benefits. There is interest in (2), based on a simple heuristic which might sometimes help and will never hurt. However, it is not clear that it will help much, and Sun objects on grounds that they prefer to match C struct layout. Unless someone is interested enough to implement and run experiments, this will be hard to agree upon. G++ has implemented (3) as an option, based on specific user complaints. It clearly helps HP's example of a base class containing a word and flag, with a derived class adding more flags. Idea (4) has more problems, including some non-intuitive (to users) layouts, and possibly complicating the selection of bitwise copy in the compiler.

[990624 all] We will not do (1), (2), or (4). We will do (3). Specifically, allocation will be in modified declaration order as follows:

  1. Vptr if any, and the primary base class per A-7.
  2. Any empty base classes allocated at offset zero per A-3.
  3. Any remaining non-virtual base classes.
  4. Any non-static data members.
  5. Any remaining virtual base classes.
Each subobject allocated is placed at the next available position that satisfies its alignment constraints, as in the underlying psABI. This is interpreted with the following special cases:
  1. The "next available position" after a non-POD class subobject (base class or data member) with tail padding is at the beginning of the tail padding, not after it. (For POD objects, the tail padding is not "available.")
  2. Empty classes are considered to have alignment and size 1, consisting solely of one byte of tail padding.
  3. Placement on top of the tail padding of an empty class must avoid placing multiple subobjects of the same type at the same address.
After allocation is complete, the size is rounded up to a multiple of alignment (with tail padding).
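
The tail-padding rule can be modelled with a little arithmetic. The sketch below is a simplified model, not the ABI's layout algorithm: the Subobject record and function names are hypothetical, and the numbers assume the base class from HP's example holds an 8-byte word followed by a 1-byte flag (9 bytes of data, 16 bytes with tail padding).

```cpp
#include <cstddef>

// Round `offset` up to the next multiple of `align`.
constexpr std::size_t round_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Simplified description of an already laid-out subobject (hypothetical model).
struct Subobject {
    std::size_t data_size;  // bytes in use, excluding tail padding
    std::size_t size;       // full size, including tail padding
    bool is_pod;            // tail padding of a POD is not "available"
};

// First position available after subobject `s` placed at offset `start`.
constexpr std::size_t next_available(std::size_t start, Subobject s) {
    return start + (s.is_pod ? s.size : s.data_size);
}

// Offset of a member with alignment `align` placed right after a base at offset 0.
constexpr std::size_t offset_after_base(Subobject base, std::size_t align) {
    return round_up(next_available(0, base), align);
}

// Non-POD base: the derived class's extra flag lands in the tail padding.
static_assert(offset_after_base(Subobject{9, 16, false}, 1) == 9, "reuses tail padding");
static_assert(round_up(9 + 1, 8) == 16, "derived class stays at 16 bytes");

// POD base: the flag goes after the padding and the class grows.
static_assert(offset_after_base(Subobject{9, 16, true}, 1) == 16, "padding not available");
static_assert(round_up(16 + 1, 8) == 24, "derived class grows to 24 bytes");
```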

[990722 all] The precise placement of empty bases when they don't fit at offset zero remained imprecise in the original description. Accordingly, a precise layout algorithm is described in a separate writeup of Data Layout.

[990729 all] The layout writeup was accepted, with the first choice for empty base placement. That is, if placement at offset zero doesn't work, it will be placed like a normal base/member. The consensus was that this won't happen often, and such bases will often overlap with the preceding tail padding or following components anyway. Jim will modify the writeup accordingly.

#     Issue                          Class  Status  Source  Opened  Closed
A-10  Class parameters in registers  call   closed  HP      990603  990710
Summary: The C ABI specifies that structs are passed in registers. Does this apply to small non-POD C++ objects passed by value? What about the copy constructor and this pointer in that case?

[990701 all] A separate issue (C-7) deals with cases where a non-trivial copy constructor is required; we ignore those cases here. Our conclusion is that, without a non-trivial copy constructor, we need not be concerned about the class object moving in the process of being passed, and there is no need to use a mechanism different from the base ABI C struct mechanism. At the same time, if we do use the underlying C struct mechanism, the user has complete control of the passing technique, by choosing whether to pass by value or reference/pointer.

Therefore, except in cases identified by issue C-7 for different treatment, class parameters will be passed using the underlying C struct protocol.

#     Issue                         Class  Status  Source  Opened  Closed
A-11  Pointers to member functions  data   closed  Cygnus  990603  990812
Summary: How should pointers to member functions be represented?
Resolution: As a pair of values, described below.

[990729 All] Jason described the g++ implementation, which is a three-member struct:

  1. The adjustment to this.
  2. The Vtable index plus one of the function, or -1. (Zero is a NULL pointer.)
  3. If (2) is an index, the offset from the full object to the member function's Vtable. If -1, a pointer to the function (non-virtual).

A concern about covariant returns was raised. It was observed that, given our decision to use distinct Vtable entries for distinct return types, no further concern is required here. Others will describe their representations. IBM has an alternative, but it is believed to be patented by Microsoft.

[990805 All] It is agreed that a two-element struct will be used for a pointer to a member function, with elements as follows:

ptr:
For a non-virtual function, this field is a simple function pointer. (Under current base IA-64 psABI conventions, this is a pointer to a GP/function address pair.) For a virtual function, it is 1 plus twice the Vtable offset of the function. The value zero is a NULL pointer.

adj:
The required adjustment to this.

Although we agreed to close this, SGI suggests a minor modification. Since the Vtable offset of a virtual function will always be even, we suggest that it not be doubled before adding 1. This is because shifts are more restricted on many processors than other integer ALU operations (shifters are large structures), so an XOR or NAND will often be cheaper than a right shift.

[990812 All] Close this issue with the suggested modification.
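
The two-element representation, with SGI's modification (1 plus the undoubled Vtable offset), can be modelled as follows. This is an illustrative sketch: the struct and helper names are hypothetical, and it assumes that non-virtual function addresses are even, so that the low bit of ptr separates the two cases.

```cpp
#include <cstddef>
#include <cstdint>

// Model of the two-element representation (field names taken from the text).
struct MemFnPtr {
    std::intptr_t ptr;   // function address, or 1 + Vtable offset if virtual
    std::ptrdiff_t adj;  // required adjustment to `this`
};

// Build the representation for a virtual function at the given Vtable offset.
constexpr MemFnPtr make_virtual(std::ptrdiff_t offset, std::ptrdiff_t adj) {
    return MemFnPtr{static_cast<std::intptr_t>(1 + offset), adj};
}

// Zero in `ptr` is the NULL pointer to member function.
constexpr bool is_null(MemFnPtr p) { return p.ptr == 0; }

// Vtable offsets are even, so an odd `ptr` marks a virtual function.
constexpr bool is_virtual(MemFnPtr p) { return (p.ptr & 1) != 0; }

// Recover the Vtable offset without a shift.
constexpr std::ptrdiff_t vtable_offset(MemFnPtr p) { return p.ptr - 1; }

static_assert(is_virtual(make_virtual(16, 0)), "odd ptr means virtual");
static_assert(vtable_offset(make_virtual(16, 0)) == 16, "offset round-trips");
static_assert(is_null(MemFnPtr{0, 0}), "zero is NULL");
static_assert(!is_virtual(MemFnPtr{0x1000, 8}), "even ptr is a plain function address");
```

Because the offset is not doubled, decoding needs only a subtraction (or a bit clear) rather than a right shift, which is the point of the modification.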

#     Issue                      Class  Status  Source  Opened  Closed
A-12  Merging secondary vtables  data   closed  Sun     990610  990805
Summary: Sun merges the secondary Vtables for a class (i.e. those for non-primary base classes) with the primary Vtable by appending them. This allows their reference via the primary Vtable entry symbol, minimizing the number of external symbols required in linking, in the GOT, etc.
Resolution: Concatenate the Vtables associated with a class in the same order that the corresponding base subobjects are allocated in the object.

[990701 Michael Lam] Michael will check what the Sun ABI treatment is and report back.

[990729 All] A separate issue raised in conjunction with A-7 is whether to include Vfunc pointers in the primary Vtable for functions defined only in the base classes and not overridden. If the primary and secondary Vtables are concatenated, this is no longer an issue, since all can be referenced from the primary Vptr.

[990805 All] All of the Vtables associated with a class will be concatenated, and a single external symbol used (to be identified as part of the mangling issue F-1). The order of the tables will be the same as the order of base class subobjects in an object of the class, i.e. first the primary Vtable, then the non-virtual base classes in declaration order, and finally the virtual base classes in depth-first declaration order.

#     Issue                             Class  Status  Source  Opened  Closed
A-13  Parameter struct field promotion  call   closed  SGI     990603  990701
Summary: It is possible to pass small classes either as memory images, as is specified by the base ABI for C structs, or as a sequence of parameters, one for each member. Which should be done, and if the latter, what are the rules for identifying "small" classes?
Resolution: No special treatment will be specified by the ABI.

[990701 all] Define no special treatment for this case in the ABI. A translator with control over both caller and callee may choose to optimize.

#     Issue                     Class  Status  Source  Opened  Closed
A-14  Pointers to data members  data   closed  SGI     990729  990805
Summary: How should pointers to data members be represented?
Resolution: Represented as one plus the offset from the base address.

[990729 SGI] We suggest an offset from the base address of the class, represented as a ptrdiff_t.

[990805 All] Such pointers are represented as one plus the offset from the base address of the class, as a ptrdiff_t. NULL pointers are zero.
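
The encoding recorded in this resolution can be sketched as below. The function names are hypothetical, and the sketch models only the representation as stated in this issue, not what any particular compiler emits.

```cpp
#include <cstddef>

// Representation of a pointer to data member per this resolution:
// one plus the member's offset from the base address of the class.
constexpr std::ptrdiff_t encode_data_member(std::ptrdiff_t offset) {
    return offset + 1;
}

// NULL pointers to data members are represented as zero.
constexpr bool is_null_data_member(std::ptrdiff_t rep) { return rep == 0; }

// Recover the offset from a non-NULL representation.
constexpr std::ptrdiff_t decode_data_member(std::ptrdiff_t rep) { return rep - 1; }

// Adding one keeps a member at offset 0 distinct from NULL.
static_assert(encode_data_member(0) == 1, "offset 0 is encoded as 1");
static_assert(!is_null_data_member(encode_data_member(0)), "offset 0 is not NULL");
static_assert(decode_data_member(encode_data_member(24)) == 24, "offset round-trips");
```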

#     Issue             Class  Status  Source        Opened  Closed
A-15  Empty bit-fields  data   closed  CodeSourcery  991214  000106
Summary: How are zero-length bit-fields handled?
Resolution: Zero-length bit-fields do not prevent a class from being considered empty or nearly empty.

[991214 CodeSourcery -- Mark]

Question: Does the presence of a zero-width bit-field prevent a class from being empty?

Suggested Resolution: No. Amend the definition of an "empty class" to read:

A class with no non-static data members other than zero-width bitfields, no virtual functions, no virtual base classes, and no non-empty non-virtual base classes.

Amend the definition of a "nearly empty class" to read:

A class, the objects of which contain only a Vptr and zero-width bitfields.

[000106 All] Accept the CodeSourcery proposal.

#     Issue                       Class  Status  Source  Opened  Closed
A-16  Nearly empty virtual bases  data   closed  SGI     991228  000106
Summary: May a class with non-empty, non-primary, virtual base classes be treated as nearly empty (and thus eligible to be a primary base) if its only non-vptr data is in its virtual base classes?
Resolution: Virtual base classes do not prevent a class from being considered nearly empty.

[000106 All] Accept the proposal.

#     Issue                                     Class  Status  Source  Opened  Closed
A-17  Primary indirect virtual base allocation  data   closed  SGI     991228  000113
Summary: When a nearly empty virtual base class A is allocated as the primary base class of class B, and then B is allocated as a base class of C, should A (i.e. its vptr) be separately allocated in C, or should its first occurrence in a previously allocated base B be used as its allocation in C?
Resolution: Do not reallocate a nearly empty virtual base class that is the primary base class of any other base class, direct or indirect. Use the first primary base class instance in the inheritance hierarchy as its allocation, in the usual depth-first, left-to-right order.

[991228 SGI -- Jim] Specific wording for a proposed change is in the Draft C++ ABI for IA-64.

[000103 CodeSourcery -- Mark] I think the current proposal for allocating virtual bases is still a little suboptimal. In particular, given:

  struct A { void f(); };
  struct B : virtual public A { };
  struct C : virtual public A, virtual public B { };
we'll give `C' a larger size than for:
  struct C : virtual public B, virtual public A { };
where we'll reuse the `A' part of `B' rather than reallocating it.

I know that ordering can already affect size (principally because of alignment issues) but I think that in this case we might as well not punish programmers for choosing the "wrong" ordering.

I think we should change the green A-17 proposed resolution to indicate that if one of the virtual bases is a (direct or indirect) primary base of one of the other virtual bases then we need not allocate a fresh copy.

FWIW, it turns out to actually be easier in GCC to code the more generous version.

The algorithm to do this is linear in the size of the hierarchy: just iterate through the inheritance DAG marking all primary bases. Any virtual base classes that remain unmarked need to be allocated in step III. A slight formalization of this sentence might be a good way to express which bases to choose for III.
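
The marking pass described above might look roughly like the following. This is a sketch under stated assumptions, not GCC's code: ClassInfo, Marks, and the function names are hypothetical, each class's primary base is assumed to have been chosen already, and the example hierarchy is a variation on the one above in which A is nearly empty and a further virtual base D is added.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified class description.
struct ClassInfo {
    struct Base {
        const ClassInfo* cls;
        bool is_virtual;
    };
    std::string name;
    std::vector<Base> bases;  // direct bases in declaration order
    int primary_index;        // index into `bases` of the primary base, or -1
};

struct Marks {
    std::set<const ClassInfo*> visited;          // classes already walked
    std::set<const ClassInfo*> primary_virtual;  // virtual bases used as a primary base
    std::set<const ClassInfo*> seen_virtual;     // virtual bases already listed
    std::vector<const ClassInfo*> virtual_bases; // virtual bases in first-visit order
};

// Depth-first, left-to-right walk that handles each class once.
void mark(const ClassInfo& c, Marks& m) {
    if (!m.visited.insert(&c).second) return;
    assert(c.primary_index < static_cast<int>(c.bases.size()));
    if (c.primary_index >= 0 && c.bases[c.primary_index].is_virtual)
        m.primary_virtual.insert(c.bases[c.primary_index].cls);
    for (const ClassInfo::Base& b : c.bases) {
        if (b.is_virtual && m.seen_virtual.insert(b.cls).second)
            m.virtual_bases.push_back(b.cls);
        mark(*b.cls, m);
    }
}

// Names of the virtual bases that remain unmarked and so still need allocation.
std::vector<std::string> bases_needing_allocation(const ClassInfo& c) {
    Marks m;
    mark(c, m);
    std::vector<std::string> out;
    for (const ClassInfo* v : m.virtual_bases)
        if (m.primary_virtual.count(v) == 0) out.push_back(v->name);
    return out;
}

// C has virtual bases A, B, D; A is B's primary base and B is C's primary base.
std::vector<std::string> example_unallocated() {
    ClassInfo a{"A", {}, -1};
    ClassInfo b{"B", {ClassInfo::Base{&a, true}}, 0};
    ClassInfo d{"D", {}, -1};
    ClassInfo c{"C",
                {ClassInfo::Base{&a, true}, ClassInfo::Base{&b, true},
                 ClassInfo::Base{&d, true}},
                1};
    return bases_needing_allocation(c);
}
```

In this example A and B are both marked as primary bases, so only D is left to allocate, regardless of the order in which A and B are declared.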


[000113 All] Do not reallocate a nearly empty virtual base class that is the primary base class of any other base class, direct or indirect. Use the first primary base class instance in the inheritance hierarchy as its allocation, in the usual depth-first, left-to-right order.

#     Issue                   Class  Status  Source  Opened  Closed
A-18  Virtual base alignment  data   closed  SGI     991228  000113
Summary: Should virtual bases have a different effect on class alignment than other components?
Resolution: Yes. When allocating the non-virtual part of a base class, use its non-virtual alignment, i.e. ignoring its virtual bases' contributions.

[991228 SGI -- Jim] Since the allocation of virtual bases is "floating" relative to the classes in which they occur, it is possible for them to have independent alignment constraints. Specifically, when allocating a base class with a virtual base, we could treat its alignment as that obtained by ignoring the virtual base, and later allocate the virtual base with greater alignment.

Since the class with a virtual base already has a vptr, this only matters if the virtual base contains components more strictly aligned than a pointer. Thus, the benefit of doing so is probably not large. To get some idea of the effect on the layout definition, look at dsize and nvsize, and assume a similar pair of alignment values.

[000106 All] No strong opinions were expressed on this issue. We will decide it at the next meeting after people have a chance to think it over. The bias will be to keep the current simpler definition.

[000113 All] It turns out that both Compaq and someone else (Cygnus?) already do this, find it straightforward, and prefer to keep it. Therefore, accept the suggestion that when allocating the non-virtual part of a base class, we use its non-virtual alignment, i.e. ignoring its virtual bases' contributions.

#     Issue                                 Class  Status  Source  Opened  Closed
A-19  Primary indirect virtual base choice  data   closed  All     000106  000120
Summary: In allocating class C, when the first nearly empty virtual base class A is allocated as the primary base class of a later nearly empty virtual base class B, should A or B become the primary base class of C?
Resolution: Do not use a virtual base as primary if it is already a primary base of some other direct or indirect base, unless such are the only candidates. In either case, use the first candidate in depth-first, left-to-right order in the inheritance graph.

[000106 All] This issue was initially confused in the discussion with A-17, but is independent. Recall that non-virtual bases have priority over virtual bases for selection as the primary base. Assuming that no non-virtual base is suitable, this issue involves which virtual base should be selected. Our original decision was to use the first in left-to-right order.

The proposal here is that, if this initial candidate A is itself already a primary base class of a later virtual base B, then B will be used instead, unless it is already a primary base class of a later virtual base, and so on. See proposed wording in the ABI layout document.

No one can identify a case in which this approach is worse than the original definition.

[000113 All] The proposed resolution on the table is to use the following priority to choose the primary base class:

  1. The first (left-to-right declaration order) super-polymorphic non-virtual base class.
  2. The first (left-to-right declaration order) nearly empty virtual base class that is not a primary base class of any other base, direct or indirect.
  3. The first (left-to-right declaration order) nearly empty virtual base class.

[000113 All] Modify the above to use any virtual base in the inheritance graph, first one that is not already primary to some base if possible, or then any candidate, chosen as the first in a depth-first, left-to-right inheritance graph walk.
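
The resulting priority order can be sketched as a three-pass search. The model is deliberately simplified and its names are hypothetical: it assumes the candidate bases have already been collected into one list in depth-first, left-to-right order, with only direct bases listed as non-virtual candidates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one candidate base class.
struct Candidate {
    bool is_virtual;            // virtual base (anywhere in the graph)
    bool is_polymorphic;        // has a Vptr
    bool is_nearly_empty;       // contains only a Vptr
    bool is_primary_elsewhere;  // already the primary base of some other base
};

// Index of the chosen primary base, or -1 if there is none.
int choose_primary(const std::vector<Candidate>& bases) {
    const int n = static_cast<int>(bases.size());
    // 1. First polymorphic non-virtual base.
    for (int i = 0; i < n; ++i)
        if (!bases[i].is_virtual && bases[i].is_polymorphic) return i;
    // 2. First nearly empty virtual base not already primary to some base.
    for (int i = 0; i < n; ++i)
        if (bases[i].is_virtual && bases[i].is_nearly_empty &&
            !bases[i].is_primary_elsewhere)
            return i;
    // 3. Otherwise, the first nearly empty virtual base.
    for (int i = 0; i < n; ++i)
        if (bases[i].is_virtual && bases[i].is_nearly_empty) return i;
    return -1;
}

// The case from the summary: A is already the primary base of the later base B.
int example_choice() {
    std::vector<Candidate> bases = {
        {true, true, true, true},   // A: nearly empty, primary base of B
        {true, true, true, false},  // B: nearly empty, not primary elsewhere
    };
    int chosen = choose_primary(bases);
    assert(chosen == 1);  // B is preferred over A
    return chosen;
}
```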

#     Issue                       Class  Status  Source  Opened  Closed
A-20  Operator new array cookies  data   closed  All     000113  000120
Summary: When operator new is used to create a new dynamic-length array, a cookie must be stored to remember the allocated length so that it can be deallocated correctly.
Resolution: In principle, place cookie immediately before array, aligned naturally. Use no cookie for array element types without destructors. See the Draft C++ ABI for IA-64.

[000113 All] The proposed resolution is as follows:

This resolution has the following consequences:

[000120 All] Accept the above.

#     Issue                        Class  Status  Source  Opened  Closed
A-21  Placement new array cookies  data   closed  All     000113  000217
Summary: Same issue as A-20, except that for placement new, the user supplies already-allocated space. Therefore, there is a conflict between wanting to make delete() work on arrays created in this way, and wanting to avoid surprising users who haven't allocated enough space for the cookie. Also, are cookies allocated if there is no destructor?
Resolution: Use no cookie for element types with no destructors, nor for ::operator new(size_t, void*). Otherwise, use a cookie as in issue A-20. See the Draft C++ ABI for IA-64.

[000119 SGI -- Matt]

What the standard says (3.7.3.1, 5.3.4, and 18.4.1.3)

Array placement new has the form "new(ARGS) T[n]". The "(ARGS)" part is optional. If it's present then this is a placement new-expression, and we use a version of operator new[] with two or more arguments, otherwise it's an ordinary new-expression, and we use a version of operator new[] with one argument. For the purposes of this proposal, the distinction isn't all that important.

After finding the appropriate operator new[], a new-expression obtains storage with

void* p = operator new[](n1, ARGS),
where n1 >= n * sizeof(T). It then constructs n objects of type T starting at position p1, where p1 = p + delta. The return value is p1.

It is required (3.7.3.1/2) that the return value of any operator new[], whether it's built-in or provided by the user, must be suitably aligned for objects of any type.

If T is "char" or "unsigned char" the standard requires that delta is a nonnegative multiple of the most stringent alignment constraint for objects of size less than or equal to n (5.3.4/10). Otherwise the only restriction is that delta is nonnegative.

Some implementations store the number of elements in the array at a negative offset from p1. The standard neither requires nor forbids it.

There's a predefined placement version of array operator new,

::operator new[](size_t n1, void* p),
that does nothing but return p. p must be a pointer to the beginning of some array of size at least n1. The standard doesn't tell users how large an array they need. Many users probably assume that it's sufficient for the array to be of size n * sizeof(T), but there's no basis in the standard for that assumption.

IA-64 Specifics

On IA-64 long double is 80 bits. long double has 128-bit alignment, as do classes and unions containing long double, so sizeof(long double) is 16. All other types have at most 64-bit alignment.

What the ABI needs to specify

  1. Given n, T, sizeof(T), and alignof(T), what are n1 and delta?
    1. Are T=char and T=unsigned char special cases? (Or, perhaps, is sizeof(T)=1 a special case?)
    2. Is ::operator new[](size_t, void*) a special case?
    3. Is ::operator new[](size_t), which is used for non-placement new, a special case?
    4. Is ::operator new[](size_t, const nothrow_t&) a special case? I can't find anything in the standard guaranteeing that you can delete an array allocated with nothrow array new using an ordinary array delete-expression, but users probably expect it, and legitimately so.

  2. Do we store n at a negative offset from the return value of operator new[]? (This affects the answer to question 1.) If so, we need to specify precisely what that offset is.

Proposal A

No version of operator new[] is a special case. For any array new-expression we store the number of elements in the array, as a size_t, at an offset of -sizeof(size_t) from the pointer returned by the new-expression. For any type T other than char, unsigned char, long double, or a type containing a long double, n1 = n * sizeof(T) + sizeof(size_t). For those excepted types, since we need to preserve long double alignment, n1 = n * sizeof(T) + sizeof(long double).

Pseudocode for new(ARGS) T[n] under this proposal:

    if T = char or unsigned char, or if it has long double alignment,
      padding = sizeof(long double)
    else
      padding = sizeof(size_t)

    p = operator new[](n * sizeof(T) + padding, ARGS)

    p1 = (T*) (p + padding)
    *((size_t*) p1 - 1) = n

    for i = [0, n)
      create a T, using the default constructor, at p1[i]

    return p1
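
The Proposal A pseudocode can be sketched in C++ as follows. This is only an illustration under stated assumptions: the helper names (cookie_padding, allocate_with_cookie, element_count, release_with_cookie) are hypothetical, malloc stands in for operator new[], element construction is omitted, and "has long double alignment" is approximated by comparing alignof values. Over-aligned types are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical helper: bytes reserved in front of the array under Proposal A.
template <class T>
constexpr std::size_t cookie_padding() {
    return (std::is_same<T, char>::value ||
            std::is_same<T, unsigned char>::value ||
            alignof(T) >= alignof(long double))
               ? sizeof(long double)
               : sizeof(std::size_t);
}

// Obtains raw storage for n objects of T and records n just before them.
// malloc stands in for operator new[]; constructors are not run here.
template <class T>
T* allocate_with_cookie(std::size_t n) {
    const std::size_t padding = cookie_padding<T>();
    char* p = static_cast<char*>(std::malloc(n * sizeof(T) + padding));
    assert(p != nullptr);
    char* p1 = p + padding;
    // The element count lives at offset -sizeof(size_t) from p1.
    std::memcpy(p1 - sizeof(std::size_t), &n, sizeof n);
    return reinterpret_cast<T*>(p1);
}

// Reads the element count back from the cookie.
template <class T>
std::size_t element_count(const T* p1) {
    std::size_t n;
    std::memcpy(&n, reinterpret_cast<const char*>(p1) - sizeof(std::size_t),
                sizeof n);
    return n;
}

// Recovers the original allocation address and frees it.
template <class T>
void release_with_cookie(T* p1) {
    std::free(reinterpret_cast<char*>(p1) - cookie_padding<T>());
}
```

A delete-expression would use something like element_count to know how many destructors to run before handing the original pointer back to the deallocation function.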

Proposal B

::operator new[](size_t, void*) is a special case. For that version of operator new[] only, n1 = n * sizeof(T). We do not store the number of elements in such an array anywhere.

Pseudocode for new(ARGS) T[n] under this proposal:

    If the expression is new(p) T[n], and if overload resolution
    determines we're using ::operator new[](size_t, void*), then
      p1 = (T*) p

      for i = [0, n)
        create a T, using the default constructor, at p1[i]

      return p1

For all other cases, same as proposal A.

Proposal A is simpler, but proposal B probably conforms more closely to user expectations.


[000210 All -- Matt] We agreed that Proposal B, where ::operator new[](size_t, void*) is a special case with no cookie, is preferable to Proposal A, where all versions of array new get cookies.

We also agreed to the variation where we don't reserve space for a cookie if the type has no destructor. We're calling it Proposal C. We need a writeup, but we should be able to close this issue next week.


[000302 CodeSourcery -- Mark] I believe the resolution to A-20/A-21, dealing with array new, is incorrect with respect to the C++ standard. (In other words, I think we'll make it impossible to implement the behavior required by the standard.)

In particular, there are situations in which we do not allocate cookies, even when allocating arrays of class type. But, the standard guarantees that:

[class.free]

When a delete-expression is executed, the selected deallocation function shall be called with the address of the block of storage to be reclaimed as its first argument and (if the two-parameter style is used) the size of the block as its second argument.

That paragraph doesn't require that the class type have a non-trivial destructor.

I think that means the first bullet:

No cookie is required if the array element type T has a trivial destructor (C++ standard, 12.4/3).
should read:
No cookie is required if the array element type T has a trivial destructor ([class.dtor]) and the usual (array) deallocation function ([basic.stc.dynamic.deallocation]) does not take two arguments.

(Note: if the usual array deallocation function takes two arguments, then its second argument is of type size_t. The standard guarantees that this function will be passed the number of bytes allocated with the previous array new expression. See [class.free] for details.)


[000302 All] Modification accepted.
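
The cookie rule as amended can be summarized by a small predicate. This is a sketch, not ABI text: ArrayNewContext and needs_array_cookie are hypothetical names, and the two booleans stand for facts the compiler learns from overload resolution rather than anything computed here. Plain and WithDtor are example element types.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical summary of what overload resolution selected for one
// array new-expression.
struct ArrayNewContext {
    bool uses_global_placement_new;  // ::operator new[](size_t, void*)
    bool usual_delete_takes_size;    // operator delete[](void*, size_t)
};

// True when the allocation must carry an element-count cookie.
template <class T>
inline bool needs_array_cookie(ArrayNewContext ctx) {
    // Proposal B: the global placement form never gets a cookie.
    if (ctx.uses_global_placement_new) return false;
    // Proposal C as amended: a cookie can be skipped only if no destructor
    // has to run and no size has to be handed to the deallocation function.
    return !std::is_trivially_destructible<T>::value ||
           ctx.usual_delete_takes_size;
}

// Example element types for illustration only.
struct Plain { int x; };
struct WithDtor { ~WithDtor() {} };
```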

# Issue Class Status Source Opened Closed
A-22 RTTI for reference types data closed CodeSourcery 000119 000203
Summary: __reference_type_info does not appear to be necessary.
Resolution: Remove it.

[000119 CodeSourcery -- Nathan] When would a type_info of a reference ever be generated? (So why __ref_type_info?)

[000126 CodeSourcery -- Nathan]

[dcl.mptr] (8.3.3)/3
A pointer to member shall not point to ... a member with reference type

[000128 Cygnus -- Jason] Based on that, I definitely think reference type_info can go away.

[000203 All] Remove __ref_type_info.

# Issue Class Status Source Opened Closed
A-23 RTTI class descriptors data closed CodeSourcery 000124 000302
Summary: Resolve several questions about the RTTI representation of class types.
Resolution: See the Draft C++ ABI for IA-64.

[000124 CodeSourcery -- Nathan] si_class_type_info is for a single nonvirtual inheritance hierarchy. Presumably this single non-virtual inheritance is between the derived and the base (the base may or may not have multiple or virtual bases). An additional constraint is that, if the derived class is polymorphic, the base class is too. Rationale: if the derived class adds polymorphism, the base will be at a non-zero offset.

[000126 CodeSourcery -- Nathan] More useful for dynamic cast (and possibly catch matching) {than the current set of flags -- editor} would be the following flags:

Note that the virtual/non-virtual and public/non-public are not mutually exclusive. Also note that I have not actually implemented anything with these flags, so I could be wrong.

[class.mi] (clause 10.1) provides good examples of "diamond shaped." Paragraph 4 gives a non-diamond shaped graph with multiple base objects. At least one of the multiply inherited base objects must be non-virtual.

        struct L {};
        struct A : L {};
        struct B : L {};
        struct C : A, B {};

There are two distinct L base objects in C. C would have the non-diamond shaped multiple inheritance flag set. A, B and C would have the non-virtual base flag and public base flag set.

Paragraph 5 gives a diamond shaped graph. Such a multiply inherited base object must be virtual.

        struct V {};
        struct A : virtual V {};
        struct B : virtual V {};
        struct C : A, B {};

This time C would have the diamond shaped flag set. A, B & C would have the virtual base flag set and the public base flag set. C would also have the non-virtual base flag set.

Paragraph 6 gives a graph which contains both features. Here there is one non-virtual base and one virtual base.

        struct B {};
        struct X : virtual B {};
        struct Y : virtual B {};
        struct Z : B {};
        struct AA : X, Y, Z {};

In that example, AA would have both diamond and non-diamond flags set. All would have the public base flag set; AA & Z would have the non-virtual base flag set; AA, X & Y would have the virtual base flag set.
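
The two shape flags in the three examples above can be derived mechanically from an inheritance graph. The following is a rough sketch over a hypothetical in-memory model (ClassDesc, BaseSpec, ShapeFlags and classify are illustrative names, not ABI structures). It assumes "non-diamond" means that some base type occurs as more than one subobject, and "diamond" means that some virtual base is reached along more than one path.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct ClassDesc;  // hypothetical model of a class, not the ABI's RTTI layout

struct BaseSpec {
    const ClassDesc* type;
    bool is_virtual;
};

struct ClassDesc {
    std::string name;
    std::vector<BaseSpec> bases;  // direct bases only
};

struct ShapeFlags {
    bool repeated_base = false;  // "non-diamond": several subobjects of a type
    bool diamond = false;        // a virtual base shared along several paths
};

struct SubobjectCounts {
    std::map<const ClassDesc*, int> nonvirtual_subobjects;
    std::map<const ClassDesc*, int> virtual_paths;
};

inline void walk(const ClassDesc& cls, SubobjectCounts& counts) {
    for (const BaseSpec& base : cls.bases) {
        if (base.is_virtual) {
            // A virtual base is one shared subobject: descend only once.
            if (++counts.virtual_paths[base.type] == 1) walk(*base.type, counts);
        } else {
            // Every non-virtual path yields a distinct subobject.
            ++counts.nonvirtual_subobjects[base.type];
            walk(*base.type, counts);
        }
    }
}

inline ShapeFlags classify(const ClassDesc& complete) {
    SubobjectCounts counts;
    walk(complete, counts);
    ShapeFlags flags;
    for (const auto& entry : counts.virtual_paths)
        if (entry.second > 1) flags.diamond = true;
    for (const auto& entry : counts.nonvirtual_subobjects) {
        // Add the shared virtual subobject of the same type, if there is one.
        int total = entry.second +
                    (counts.virtual_paths.count(entry.first) ? 1 : 0);
        if (total > 1) flags.repeated_base = true;
    }
    return flags;
}
```

Applied to the paragraph 4, 5 and 6 hierarchies, this sets only the repeated-base flag, only the diamond flag, and both flags respectively.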

The above is treating the non-virtual and virtual base flags differently, they should have the following meaning:

Similarly the public and non-public flags mean:

My thinking is that for dynamic_cast, having such information will allow pruning parts of the inheritance graph walk. For instance, there can only be distinct multiple target base objects when the non-diamond shaped flag is set in the complete object. When we find them, the base sub-object started from can only be a common base for both of them, if the diamond shaped flag is set in the complete object. Alternatively, there can only be (at most) one instance of the target type when the non-diamond shaped flag is clear. When we find it via a non-public path, there could only be an alternative public path if the complete object has the diamond shaped flag set. Similar pruning should be possible for catch matching. Without such information, the graph walk has to be pessimistic, which I believe will slow down the common case.

[000126 CodeSourcery -- Nathan] __si_class_type_info is documented for a single non-virtual hierarchy, and __vmi_class_type_info for a class containing (directly or indirectly) a multiple or virtual inheritance component. My mistake was to use __si_class_type_info for a class with a single base, regardless of the hierarchy within the base (that is the current g++ behaviour).

__si_class_type_info is for both public and non-public inheritance (again, something I'd not noticed, thinking it was for public only). For this to work, the __class_type_info flag bit 0x8 'non-publicly inherited base' must mean `non-publicly inherited direct base'. Please can the wording about bases here explicitly say `direct base,' `indirect base,' or `direct or indirect base.' The descriptions currently use `contains' and `has' which are open to interpretation.

In dynamic casting, access is important. In a cross cast from base A via complete type C to another base B, both B and A must be publicly accessible from C. It might be that dynamic_cast locates B, and, knowing that C does not have multiply inherited subobjects, determines it need look no further. However, it must determine access. If C has no non-public direct or indirect bases, access must be OK, without further inspection. However the hint flag 0x8 can't be indicating that, as it is only for direct bases. (This was the one case where I was able to take advantage of these flags, but alas it seems I can't.)

[000127 All] We decided on Thursday that your "mistakes" are what we want. __si_class_type_info will be for any class with a single direct base at offset 0 which is public and non-virtual.

We also decided that the flags should move from __class_type_info into __vmi_class_type_info, and that the polymorphic flag should be removed.

[000126 CodeSourcery -- Nathan] I think this moving of the flags is a mistake. If I understood correctly, they indicated information about direct and indirect bases (whether there was virtuality anywhere in the hierarchy for instance). Such information can speed up dynamic cast. When walking the inheritance graph, we can take some early outs, if we know there are no multiple subobject types within the complete graph. With the flags in every class's type_info, it becomes easier to get hold of that info. With it only for vmi classes, we have to remember `unknown' when presented with a complete object of si type, and fill the information in when/if we find a vmi base.

Another case is in a potential cross-cast case, which I had in the previous email. Suppose we've found the target base, which we know is unique, but not found the source base (because we early outed, maybe). To be a valid cross-cast both the source and target base objects must be public in the complete object. If we know the complete hierarchy has no non-public bases, there's no need to search for the source base in this case.


[000129 Cygnus -- Jason] So what you're saying is if we try to dynamic_cast from A* to B*, where B has a unique A subobject and the A* does not actually point to part of a B, if we know that B has no multiple subobjects we can check the passed offset, see that it doesn't match, and return failure. Without that information, we would have to recurse up the single-inheritance chain until we either reach the A or a class with multiple or virtual bases.

I think I'd rather pay that small performance hit than add a word to the type_info for each class. Matt, would this affect locales?

... cross-casts only come up in the context of classes with multiple bases, so it wouldn't make sense to look for this in single inheritance classes anyway.


[000127 All] Note from the meeting: A proposed precise definition of a diamond-shaped object is one that has two different direct bases with the same virtual base, directly, indirectly, or vacuously (the direct base is the virtual base).


[000203 All] Move the flags from __class_type_info to __vmi_class_type_info. Share them with one byte from the __base_class_info offset field. Replace Daveed's set with Nathan's, but the first one isn't needed.


[000203 SGI -- Jim] The class type restructuring is a bit different than what I expected going in (could just be my confusion).

I moved the flags from __class_type_info to __vmi_class_type_info, discovering that they don't need to share space with the offset field in the __base_class_info records, but rather with the base class count. But, the __base_class_info has its own flags (virtual and public) which can reasonably share a doubleword, as we were discussing for the other flags this morning. So I specified that. Note that I put the flags in the low byte rather than the high byte. That is because the offset is signed, and it is likely that implementations will sign-extend (signed doubleword>>8), but not (doubleword & 0x00ffffffffffffffll).
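
The low-byte placement described above can be illustrated with a short sketch. The function names and the two flag constants are assumptions made for this example; the Draft C++ ABI for IA-64 is the authority on the actual values. The point is that a plain signed right shift recovers a negative offset, whereas flags in the high byte would need masking followed by explicit sign extension.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values for illustration only.
constexpr std::int64_t kVirtualBase = 0x1;
constexpr std::int64_t kPublicBase = 0x2;

// Signed offset in the upper 56 bits, flags in the low byte.
inline std::int64_t pack_offset_flags(std::int64_t offset, std::int64_t flags) {
    assert((flags & ~std::int64_t(0xff)) == 0);  // flags must fit in one byte
    return offset * 256 + flags;
}

inline std::int64_t unpack_offset(std::int64_t word) {
    // An arithmetic right shift keeps the sign. C++20 guarantees this;
    // earlier standards leave it implementation-defined, though mainstream
    // compilers behave this way.
    return word >> 8;
}

inline std::int64_t unpack_flags(std::int64_t word) { return word & 0xff; }
```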

After an exchange with Nathan, I reinstated his first flag (contains non-diamond multiple inheritance).


[000210 All -- Matt] Notes from the meeting:

Minor corrections to RTTI discussion in data layout document: In section 7c, which describes the vmi_flags, flag 0x01 is documented incorrectly. It says "class has non-diamond multiple inheritance", which isn't quite right. We're really talking more about repeated inheritance: having multiple subobjects of the same type.

Also in vmi_flags, Jason questions whether flags 0x04 and 0x08 are necessary. What do we really need "has virtual base(s)" and "has non-virtual base(s)" for? Jason has sent email to Nathan about this.

Naming issue: we decided to put all of our type_info subclasses in namespace abi, not namespace std. This means, of course, that they can't go in any of the standard headers. Rather than inventing multiple header names, we would like to put everything (unwinding longjmp, type_info subclasses, etc.) into one quasi-standard header. We propose the name . Everything in that header will be in namespace abi.

Issue A23 can almost be closed. The only thing we need to resolve is whether to keep the two flags that Jason is unsure about.


[000302 All -- Matt] We will tentatively keep the has-public-base flag. Nathan has an action item to validate its usefulness when he implements.

# Issue Class Status Source Opened Closed
A-24 RTTI for incomplete types data closed CodeSourcery 000126 000330
Summary: How does RTTI represent incomplete types?
Resolution: Use class_type_info distinct from the complete type copy, add a flag to pointer_type_info if it points to incomplete type RTTI, and do mangled name comparison if an incomplete pointer is involved.

[000126 CodeSourcery -- Nathan] The amended (25th Jan) RTTI specification says:

Note that the full structure described by an RTTI descriptor may include incomplete types not required by the Standard to be completed, although not in contexts where it would cause ambiguity.

I don't believe this is the case; the example I posted a couple of weeks back pointed this out. Here it is, in a slightly more compact form:

        #include <stdlib.h>  /* declares abort() */

        struct A;
        struct B;

        int main ()
        {
          try {
            throw (B **)0;
          } catch (A const * const *) {
            abort ();
          } catch (B const * const *) {
            ;//ok
          } catch (...) {
            abort ();
          }
        }

I believe this is well formed and should not abort. The RTTI document indicates that `typeid (A const * const *)' and `typeid (B const * const *)' will produce __pointer_type_info chains that end at a weak symbol reference for A and B respectively. These will both resolve to zero. How is catch matching able to determine the difference between `A const * const *' and `B const * const *' under these circumstances? If this is a shortcoming of the ABI, or considered a defect in the standard, it should be documented.

There seems to be no discussion of this case.


[000127 All] We decided on Thursday that this can be handled by not emitting info for A and B, just referring to them using weak references. The EH matcher will never look past the inner pointers.


[000128 CodeSourcery -- Nathan] I'm sorry, I'm just not getting this. The type_infos for `B **' and `B *' will be, (I'm using g++'s existing name mangling, but these are new-abi structures):

__tiPP1B:
        .long   __vt_19__pointer_type_info
        .long   .LC2
        .long   0
        .long   __tiP1B

__tiP1B:
        .long   __vt_19__pointer_type_info
        .long   .LC3
        .long   0
        .long   __ti1B  ;; not emitted, will resolve to zero

In the catch matching, the type_infos for `A const *const *' and `A const *' will be:

__tiPCPC1A:
        .long   __vt_19__pointer_type_info
        .long   .LC1
        .long   1
        .long   __tiPC1A

__tiPC1A:
        .long   __vt_19__pointer_type_info
        .long   .LC4
        .long   1
        .long   __ti1A ;; not emitted, will resolve to zero

and those for `B const *const *' and `B const *':

__tiPCPC1B:
        .long   __vt_19__pointer_type_info
        .long   .LC0
        .long   1
        .long   __tiPC1B

__tiPC1B:
        .long   __vt_19__pointer_type_info
        .long   .LC5
        .long   1
        .long   __ti1B ;; not emitted, will resolve to zero

I fail to see how the catch matcher can get different results comparing __tiPP1B to __tiPCPC1A as opposed to comparing __tiPP1B to __tiPCPC1B. They both look like qualification conversions of pointers to pointers to incomplete type. In the first case we'll end up comparing __tiP1B to __tiPC1A, which still is a valid qualification conversion, then have two NULL pointers for the pointed to types, which somehow we have to tell apart. In the second case we'll end up comparing __tiP1B to __tiPC1B, and again have two NULL pointers for the pointed to types, but this time we have to consider them the same type. I don't see anything in [conv.qual] saying that qualification conversions don't have to deal with incomplete types. N.B.: old-abi g++ seg faults on the above code because it does wander into the NULL pointers.


[000129 Cygnus -- Jason] Good point. I was forgetting about multi-level qualification conversions.

I think that leaves us with something like what EDG does now: namely, comparisons are done by comparing the addresses of one-byte commons rather than of the type_info nodes themselves. Then we could emit incomplete info in one file and complete info in another file and they would compare the same because both refer to the same ID proxy.

We could mangle the complete and incomplete versions differently, so they would not be combined by the linker.

This would also change how we refer to type_infos; under the current scheme, references to type_infos in the EH type table need to be via relocs that will be resolved by the dynamic linker at runtime. If we don't need to compare addresses, we could use gp-relative references. Of course, we'd still have the absolute references in the type_infos to the ID proxies, so we're no better off.


[000130 CodeSourcery -- Nathan] There's a bit of strangeness with loading & unloading a DSO which contains the complete definition of `struct A', into an executable which has the incomplete info. That too is in the original email. If both DSO and executable have __tiP1A (struct A *), they'll be merged, presumably with the DSO's copy ignored. However, the __tiP1A in the executable will point at the proxy incomplete A type_info (which will have already been filled with a weak NULL for its target). Somehow we have to arrange that the proxy is altered to now point at the __ti1A (struct A) type_info that the DSO supplied. If we don't do that, throwing `struct A *' in the DSO (which is valid, `cos the DSO source had complete information), will throw the __tiP1A in the executable which points to incomplete. Hence we won't find any base conversions if we're trying to catch a base of A.


[000203 All] We can't seem to get around the need for an EDG-style implementation, i.e. a proxy for the type RTTI which is resolved by name, e.g. a one-byte common block referenced from the RTTI. We need a specific proposal for putting the reference in the RTTI, and a mangling for the name.

Since all we need from the common block is a distinct address, we may want to float a base ABI proposal for a new symbol type which is resolved by the linkers to a unique address without allocating storage.


[000210 All -- Matt] The scheme we have been converging on: we extend __class_type_info by putting in a new field, id_proxy_ptr, of type char*. It points to a one-byte comdat which serves only as a unique address. (We don't see a strong need to ask the base ABI group to mandate a magic unique-address feature in the linker. We may want to get input from our linker people, though.)

A class's __class_type_info object and its comdat proxy both receive mangled names. We must make sure that the proxy's mangled name is the same for all complete and incomplete declarations of a class, that the mangled name of the __class_type_info object is the same for all complete declarations of a class, and that the mangled name of the __class_type_info object is different for incomplete declarations than for complete declarations. One way to achieve this is to make __class_type_info objects for incomplete declarations static.

We add a new flag to __pointer_type_info; let's say bit 0x4. If this is set, it means we have a pointer to an incomplete type (or pointer to pointer to incomplete type, etc.)

We compare two __class_type_infos for equality by pointer comparison of the id_proxy_ptr fields. We compare two __pointer_type_infos for equality by looking at the addresses of the type_info objects, *unless* the incomplete bit is set in at least one of them. If the incomplete bit is set, we have to compare the pointed-to types. For everything other than classes and pointers we can just use address equality of the type_info objects themselves.
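
The comparison rules in the preceding paragraph might look roughly like this. TypeDesc is a simplified stand-in for the RTTI objects, not their real layout, and the requirement that the remaining pointer flag bits match is an assumption added for the sketch.

```cpp
#include <cassert>

// Simplified stand-ins for the RTTI objects; names are illustrative.
enum class Kind { Other, Class, Pointer };

constexpr unsigned kIncompleteFlag = 0x4;  // "let's say bit 0x4"

struct TypeDesc {
    Kind kind;
    const char* id_proxy_ptr;  // Class only: address of the one-byte proxy
    unsigned flags;            // Pointer only: qualifier bits, incomplete bit
    const TypeDesc* target;    // Pointer only: pointed-to type
};

inline bool same_type(const TypeDesc* a, const TypeDesc* b) {
    assert(a != nullptr && b != nullptr);
    if (a == b) return true;  // one shared object always means one type
    if (a->kind != b->kind) return false;
    switch (a->kind) {
    case Kind::Class:
        // Complete and incomplete descriptors share the same proxy.
        return a->id_proxy_ptr == b->id_proxy_ptr;
    case Kind::Pointer:
        // Distinct objects can only denote one type when an incomplete
        // target kept the linker from merging them.
        if (((a->flags | b->flags) & kIncompleteFlag) == 0) return false;
        // Assumption: all other flag bits (qualifiers) must agree.
        if ((a->flags & ~kIncompleteFlag) != (b->flags & ~kIncompleteFlag))
            return false;
        return same_type(a->target, b->target);
    case Kind::Other:
        return false;  // address equality already failed above
    }
    return false;
}
```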

In response to Jason's 000129 question: we can't use gp-relative references for type_info objects because we're only using comdat proxies for __class_type_info, not for other kinds of type_info objects.

In response to Nathan's 000130 question: this is the reason to give the complete and incomplete __class_type_info objects different mangled names. That way a complete __class_type_info object in a DSO won't be overridden by an incomplete __class_type_info object in the executable.

At the very end of this meeting we got a suggestion from Christophe for a completely different mechanism. We agreed that we can't evaluate it without a writeup. The suggestion: abandon these comdat proxies altogether. Instead we have a new type_info class, __incomplete_class_type_info. Comparisons involving two __class_type_info objects use address equality, comparisons involving two __incomplete_class_type_info objects, or a __class_type_info and an __incomplete_class_type_info, do string comparison on the name. We still would have an incomplete bit in the __pointer_type_info class, which, again, we would use to determine whether two __pointer_type_info objects with different addresses might nevertheless represent the same pointer type.
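
As a rough illustration of the suggested alternative, class comparison would fall back to names only when an incomplete descriptor is involved. ClassTypeDesc and same_class are hypothetical names; a bool stands in for the distinction between __class_type_info and __incomplete_class_type_info.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a class type descriptor under this scheme.
struct ClassTypeDesc {
    const char* mangled_name;
    bool incomplete;  // true for an __incomplete_class_type_info-like object
};

inline bool same_class(const ClassTypeDesc& a, const ClassTypeDesc& b) {
    assert(a.mangled_name != nullptr && b.mangled_name != nullptr);
    // Two complete descriptors are unique per type: compare addresses.
    if (!a.incomplete && !b.incomplete) return &a == &b;
    // Otherwise at least one side is incomplete: compare by name.
    return std::strcmp(a.mangled_name, b.mangled_name) == 0;
}
```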


[000309 All] The group decided to go ahead and close this issue with the proxy solution. If Christophe comes up with a writeup of the alternate proposal, we can reopen.


[000314 SGI -- Jim] I've incorporated the chosen scheme into the Draft C++ ABI for IA-64. In working this out, though, I've remembered why SGI had an issue with the proxy commons, which is that, in large programs with lots of class types, they produce a lot of runtime relocation scattered through data. Matt and I think we understand the representation of Christophe's proposal, and will think about how to compare the mangled names.


[000330 All] Adopt the proposed scheme. Make sure Nathan understands it.

# Issue Class Status Source Opened Closed
A-25 Excess-width bitfields data closed IBM 000204 000217
Summary: C++ allows bitfields with a larger size specified than that required by the declared type, e.g. int f: 64. How should they be allocated?
Resolution: Allocate the field with alignment determined as though it were the largest integer type that fits in the specified size, and use the first bits available in the field (lowest order for little endian IA-64) for the actual data.

When the specified width of a bitfield exceeds the size of the declared type, the standard specifies that the accessible field is to be padded to the specified width, with the location of the padding implementation-defined. That is, the accessible field could be placed at the beginning, at the end, or in the middle of the specified bits. (Note that such declarations are explicitly disallowed by the C 2000 draft, so this is not a C ABI issue.)

[000204 SGI -- Jim] It seems to me that the situation that makes it interesting is the following:

        struct s {
          short s1;
          int i: 64;
          short s2;
        };
In this case, I don't want the accessible part of i at the beginning or the end -- I want it in the middle. Doing otherwise yields either a badly aligned i, or wasted space.

One could express this by the following rule:

Place the accessible part of the bitfield object as if it were a non-bitfield member of the declared type, i.e. at the next available offset of the appropriate alignment. Allocate the full bitfield at the earliest available offset where it will include the accessible part.

[000204 IBM -- Mark] I disagree. If the user wants the bitfield to be aligned in a certain place, he has the tools to do so. He can certainly pick a different size bitfield. I think that this should be aligned as if it is the same size as the type, and then the extra bits put somewhere. Putting them afterwards is probably simpler than before, or splitting it in the middle.

[000217 All] The rationale for the solution chosen is that the most likely reason for using this feature is to achieve a known allocation for an enum type when the user does not know how big compilers will make it. Thus, we want "enum ... e : 32;" to behave as though the compiler allocated a 32-bit int, even if it actually uses only 8 bits for the enum value.
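
The alignment part of the resolution can be written down as a small calculation. This sketch assumes the IA-64 integer sizes of 8, 16, 32 and 64 bits; the function names are hypothetical, and only the start of the field is modeled, not the rest of the class layout.

```cpp
#include <cassert>
#include <cstddef>

// Alignment, in bits, of the largest integer type that fits in `width` bits.
inline std::size_t excess_bitfield_align_bits(std::size_t width) {
    assert(width >= 8);  // an excess-width field is at least one byte wide
    return width >= 64 ? 64 : width >= 32 ? 32 : width >= 16 ? 16 : 8;
}

// Bit offset at which the field starts, given the next free bit in the
// enclosing object: next_free_bit rounded up to the alignment.
inline std::size_t excess_bitfield_offset_bits(std::size_t next_free_bit,
                                               std::size_t width) {
    const std::size_t align = excess_bitfield_align_bits(width);
    return (next_free_bit + align - 1) / align * align;
}
```

For the struct s discussed under this issue, s1 occupies bits 0 through 15, so "int i: 64" would start at bit 64 under this calculation, with the accessible 32 bits being the first bits of the field.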

# Issue Class Status Source Opened Closed
A-26 NULL pointers to member functions data closed CodeSourcery 000221 000302
Summary: How are NULL pointers to member functions represented?
Resolution: A NULL pointer is represented by a 0 value of ptr, and the value of adj is irrelevant.

[000221 CodeSourcery -- Mark] The ABI document says that a NULL pointer-to-member function has `ptr == 0'. It does not, however, say whether or not a NULL pointer-to-member function also has `adj == 0'.

I believe that this should be specified as well so that code generated to do comparison of pointers to members (of the same type) looks like:

p1->ptr == p2->ptr && p1->adj == p2->adj
and not:
p1->ptr == p2->ptr && (!p1->ptr || (p1->adj == p2->adj))

So, I would say:

If the pointer-to-member is NULL, both fields are zero. (Note: there are no non-NULL pointers-to-members for which the `ptr' field is zero.)

It's occurred to me that this imposes some overhead on casting pointers-to-members around: now when you convert from a base pointer to member to a derived version (or vice versa), you can't just adjust the `adj' member willy-nilly; instead, you have to check first whether or not the pointer is NULL.

So, I'm not sure any more which scheme is preferable -- but we definitely need to say clearly which we want.

[000222 CodeSourcery -- Mark] So, it would be helpful if we were to add:

(Note: the `adj' field is not necessarily zero even when the pointer-to-member is NULL. Therefore, casting a pointer-to-derived-member to a pointer-to-base-member (or vice versa) requires only an adjustment to the `adj' field. However, comparison of two pointers-to-members requires more than a bitwise comparison. Code equivalent to:
p1.ptr == p2.ptr && (!p1.ptr || (p1.adj == p2.adj))
is required since in the case that p1.ptr and p2.ptr are both zero, their `adj' fields are irrelevant.)
to the ABI document.

[000229 SGI -- Jim] Comparisons (5.10) of pointers to virtual member functions are unspecified. So, for pointer-to-function-member comparisons, we only need to worry about non-virtual members and null. Since the representation stores the actual address of the function descriptor, we should be able to just compare the pointers, and ignore the adjustment.

For conversions between base classes, it seems that we need only modify the adjustment, and then only if one is not primary for the other. For conversion to null, it seems that we need only set the pointer to 0, and can ignore the adjustment.

[000302 All] Represent NULL by a 0 pointer, with the adjustment unspecified.
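
A sketch of what this resolution implies for generated code, using a two-field model with the ptr and adj names from the discussion. MemFnPtr and the helper functions are illustrative, and the equality test follows the formula from the 000222 note rather than any mandated sequence.

```cpp
#include <cassert>
#include <cstddef>

// Model of the two-word pointer-to-member-function representation.
struct MemFnPtr {
    std::ptrdiff_t ptr;  // 0 means NULL
    std::ptrdiff_t adj;  // unspecified when ptr == 0
};

inline bool is_null(MemFnPtr p) { return p.ptr == 0; }

// Equality that ignores adj when both operands are NULL.
inline bool equal(MemFnPtr p1, MemFnPtr p2) {
    return p1.ptr == p2.ptr && (!p1.ptr || p1.adj == p2.adj);
}

// A base/derived conversion touches only adj, so no NULL check is needed.
inline MemFnPtr adjust(MemFnPtr p, std::ptrdiff_t delta) {
    return MemFnPtr{p.ptr, p.adj + delta};
}
```

The trade-off is visible here: adjust stays branch-free, while equal needs the extra test because two NULL values may carry different adj fields.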

# Issue Class Status Source Opened Closed
A-27 NULL pointers to data members data closed CodeSourcery 000222 000302
Summary: How are NULL pointers to member data represented?
Resolution: A NULL pointer is represented by the value -1.

[000222 CodeSourcery -- Mark] We haven't specified a way to represent a NULL pointer to data member. G++ presently adds one to the offset, allowing zero to serve as the NULL pointer to member.

[000223 CodeSourcery -- Mark] What is the value for the NULL pointer to data member? I guess -1 would do, unless there are cases I can't think of where the pointer to member would legitimately have a negative value. Maybe 0x8000000000000000 is better...

[000229 SGI -- Jim] From the Standard:

So we can conclude that, since we always allocate non-virtual bases before data members, any base object in a derivation chain will have its base address smaller than any of the data members declared in members of the chain. Therefore, the offset represented by a pointer-to-data-member will always be non-negative, even after the permitted conversions above.

So, we could either use -1 for NULL, or use 0 and increment the offset. 0x800...000 is an unnecessary complication.

[000302 All] Represent NULL by the value -1.
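
With -1 as the NULL value, offset 0 remains a valid pointer to data member, but conversions that add a base offset have to leave NULL alone. A minimal sketch, modeling the pointer to data member as a plain offset; the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// A pointer to data member modeled as an offset; -1 is the NULL value.
constexpr std::ptrdiff_t kNullDataMember = -1;

inline bool is_null(std::ptrdiff_t pm) { return pm == kNullDataMember; }

// Base-to-derived conversion: add the offset of the base subobject within
// the derived class, but keep NULL as NULL.
inline std::ptrdiff_t to_derived(std::ptrdiff_t pm, std::ptrdiff_t base_offset) {
    assert(pm >= kNullDataMember);  // valid offsets are non-negative
    return is_null(pm) ? kNullDataMember : pm + base_offset;
}
```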

# Issue Class Status Source Opened Closed
A-28 RTTI equality testing data closed CodeSourcery 000406 000504
Summary: Can we get back the ability to do a simple test for RTTI equality?
Resolution: Mangle the name NTBS for std::type_info separately, emit it in its own COMDAT, and use it instead of the RTTI struct, at least if the incomplete flags are set in pointer types.

[000406 CodeSourcery -- Nathan] The current RTTI proposal loses the property that all type_info objects can be compared for equality and orderability by address comparison. Instead, type_info::operator== must involve a virtual function call or an unconditional strcmp. (An alternative of testing the typeid of the polymorphic type_info objects results in infinite recursion!)

Here are two proposals which reinstate the address equality property. The first is rather different to the current scheme, but when I was done documenting it, I realised there was a minor modification to the current scheme, which partially reinstates the address equality. I present both for consideration. Feel free to shoot them down ...

Proposal A

  1. The typeid operator produces a std::type_info object for all types. No subclassing of std::type_info is done. The object has comdat linkage, and hence after linking and loading, only one object of that name is active. For typeid(X) it does not matter whether X is incomplete, or direct or indirect pointer to incomplete. The functionality required of typeid is to produce objects which can test for type equality and (implementation defined) type orderability. No information about the internal structure of the type is required.

  2. Dynamic_cast and catch matching require more information. Primarily the hierarchy of a class type, and the target of pointer types. To do this, a separate class hierarchy is used. These objects are also emitted with comdat linkage, and with a different name to the std::type_info objects produced by typeid. (It is not _necessary_ for these to have comdat linkage, but that will reduce overall program size.)

    The base class of these is:

    class abi::__type_info
    {
      std::type_info const *type; // pointer to typeid(foo) object.
      virtual ~__type_info ();
      ... other implementation defined member functions
    };
    
    

    This contains a pointer to the type_info object produced by the typeid operator, for whatever type this is describing. That will be a unique object.

    There are a number of necessary derivations of this type, which can be taken largely unaltered from the current proposal.

    It is necessary to distinguish function types, so that catch matching can distinguish a data pointer object from a function pointer object. Other types (fundamental, enum, array) need not be distinguished, and can be represented by an abi::__type_info object. (Or we could keep the current proposal of having separate derivations for these.)

    class abi::__function_type_info
      : public abi::__type_info
    {
      virtual ~__function_type_info ();
      ... other implementation defined member functions
    };
    
    

    Pointers are as they currently are, other than the base class change. We still need the incomplete target flag.

    class abi::__pointer_type_info
      : public abi::__type_info 
    {
      abi::__type_info const *target;   // target type of the pointer
      unsigned flags;                   // flags, as currently specified
      virtual ~__pointer_type_info ();
      ... other implementation defined member functions
    };
    
    

    Pointers to member could be a sibling class of non member pointers. However, they do share common functionality, and IMO it makes sense to derive from __pointer_type_info.

    class abi::__pointer_to_member_type_info
      : public abi::__pointer_type_info
    {
      abi::__class_type_info const *klass;  // class of the member
      virtual ~__pointer_to_member_type_info ();
      ... other implementation defined member functions
    };
    
    

    The __class_type_info, __si_class_type_info and __vmi_class_type_info are unchanged, other than the change to __class_type_info's base.

    class abi::__class_type_info
      : public abi::__type_info
    {
      ... as currently defined
    };
    
    

The vtable slot -1, (which currently holds a pointer to the std::type_info object for a class), points to the abi::__class_type_info object. To implement typeid(X), where X is polymorphic, involves an additional indirection through the abi::__type_info base to return the `type' member.

dynamic_cast uses the abi::__class_type_info object pointed to in the vtable. throwing and catch matching use the abi::__type_info object for the type being thrown or caught.

As with the current proposal, an incomplete type is represented by an abi::__class_type_info object. Note that its abi::__type_info base will point to the unique std::type_info object for that type, regardless of whether a DSO completes the type. This incomplete type is prevented from preempting the complete type information.

Also direct or indirect pointers to incomplete have their incomplete flag set, and are also prevented from preempting the equivalent pointer to complete object.

During catch matching, comparison of pointers can compare the abi::__pointer_type_info addresses, unless either has the incomplete flag set, in which case the std::type_info objects pointed to must be compared. (The std::type_info objects could be compared even when the incomplete flags are clear.)
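The comparison rule for catch matching under Proposal A can be sketched as follows. This is only an illustration: `TypeIdSketch`, `PointerInfoSketch`, `same_pointer_type`, the flag value, and the sample objects are all hypothetical stand-ins, not part of any proposal text.

```cpp
#include <cassert>

// Hypothetical flag value; the proposal only says an incomplete flag exists.
const unsigned incomplete_flag = 0x1;

// Stands in for the unique std::type_info object that typeid produces.
struct TypeIdSketch { const char* name; };

// Stands in for abi::__pointer_type_info, reduced to the fields used here.
struct PointerInfoSketch {
  const TypeIdSketch* type;  // plays the role of abi::__type_info::type
  unsigned flags;
};

// Compare the descriptor addresses, unless either side is flagged
// incomplete, in which case compare the unique typeid objects instead.
bool same_pointer_type(const PointerInfoSketch& a, const PointerInfoSketch& b) {
  if ((a.flags | b.flags) & incomplete_flag)
    return a.type == b.type;
  return &a == &b;
}

// Sample data (invented names): two descriptors for the same pointer type,
// one emitted where the target was incomplete, plus an unrelated type.
const TypeIdSketch id_ptr_x = {"PX"};
const TypeIdSketch id_ptr_y = {"PY"};
const PointerInfoSketch ptr_x_complete = {&id_ptr_x, 0};
const PointerInfoSketch ptr_x_incomplete = {&id_ptr_x, incomplete_flag};
const PointerInfoSketch ptr_y_complete = {&id_ptr_y, 0};
```

The two descriptors for the same type have different addresses, so only the fallback through the shared typeid object identifies them as equal.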

There are two or three naming schemes with this proposal:

  1. The naming of the std::type_info object produced by typeid.
  2. The naming of the abi::__type_info object required for dynamic cast and catch matching
  3. Optionally, the naming of the incomplete abi::__class_type_info and direct or indirect pointers to it. If that mangling is specified, we can emit those as comdat objects too, rather than forcing them to be statics.

Advantages of this proposal are:

The cost of this proposal is

Proposal B

The first proposal is essentially using the std::type_info objects as unique objects, via which incomplete types can be compared. We already have such a unique object candidate -- the NTBS name member of std::type_info. Currently we've not said anything about that. If, however, we give that NTBS comdat linkage, a unique name, and prevent it being commonized with other strings, we have a proxy. These features can be obtained by treating it as a `const char []' rather than a string constant. type_info equality and orderability can now use the address of this array, rather than the type_info objects themselves. We can do this in all cases, even though it is only necessary for the pointer to incomplete case, as that avoids a virtual function call. Here is an implementation of type_info::operator==

bool type_info::operator== (type_info const &other) const throw ()
{
  return name == other.name;
}

We need to specify the naming scheme for the NTBS.
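A minimal sketch of the idea behind Proposal B, assuming a stand-in class `TypeInfoSketch` (hypothetical, not the real std::type_info) and hand-written name arrays in place of the comdat NTBS a compiler would emit. The array names and their contents are invented for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for std::type_info.
class TypeInfoSketch {
public:
  explicit TypeInfoSketch(const char* n) : name_(n) {}
  // Equality compares the address of the name array, not its contents.
  bool operator==(const TypeInfoSketch& other) const { return name_ == other.name_; }
  bool operator!=(const TypeInfoSketch& other) const { return !(*this == other); }
  const char* name() const { return name_; }
private:
  const char* name_;
};

// Named arrays rather than string literals, so each has one definite
// address and cannot be merged with an unrelated string constant.
const char name_of_int[] = "i";
const char name_of_long[] = "l";

// Two separate descriptor objects for the same type (as could happen when
// one of them describes an incomplete type) still share one name array.
const TypeInfoSketch int_info_first(name_of_int);
const TypeInfoSketch int_info_second(name_of_int);
const TypeInfoSketch long_info(name_of_long);
```

Here `int_info_first` and `int_info_second` are distinct objects, yet they compare equal because both point at the same array; no virtual call or string comparison is involved.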

The advantages of this are

The costs over proposal A are


[000411 CodeSourcery -- Nathan]

Issue 2

The algorithm for collation order of type_infos, cannot simply compare addresses for non-pointer types, and complete pointer types. Using string collation only works when one of the types is a pointer with the incomplete_mask set. There are two difficulties. Firstly, we might be comparing a non-pointer type_info with a pointer type_info. We need to determine this and DTRT WRT the incomplete flag of the pointer type_info. To do that will require dynamic_cast or typeid'ing the type_infos. Secondly, assume we are just comparing pointer type_info's. We have two pointers to complete, Aptr and Bptr, and a third pointer to incomplete, Cptr.

  1. Aptr.before (Bptr) can just compare addresses.
  2. Bptr.before (Cptr) will compare names.
  3. Cptr.before (Aptr) will compare names.

There is nothing maintaining the consistency of the results of these three tests -- result 1 is uncorrelated with results 2 & 3.

Therefore type_info::before must be implemented as string compare on the type's names. We lose any advantage of commonizing the type_infos.
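The inconsistency can be reproduced with a small model. Everything here is an assumption made for illustration: the struct `PtrTypeInfoSketch`, the two comparison functions, and the one-letter names, which were chosen so that address order and name order disagree.

```cpp
#include <cassert>
#include <cstring>
#include <functional>

struct PtrTypeInfoSketch {
  const char* name;
  bool incomplete;  // models the incomplete flag being set
};

// The mixed scheme: addresses when both are complete, names otherwise.
bool mixed_before(const PtrTypeInfoSketch& a, const PtrTypeInfoSketch& b) {
  if (!a.incomplete && !b.incomplete)
    return std::less<const PtrTypeInfoSketch*>()(&a, &b);
  return std::strcmp(a.name, b.name) < 0;
}

// The conclusion of the text: always compare names.
bool name_before(const PtrTypeInfoSketch& a, const PtrTypeInfoSketch& b) {
  return std::strcmp(a.name, b.name) < 0;
}

// Array order fixes the address order: Aptr sits before Bptr.
const PtrTypeInfoSketch infos[3] = {
  {"Z", false},  // Aptr, pointer to complete
  {"X", false},  // Bptr, pointer to complete
  {"Y", true},   // Cptr, pointer to incomplete
};
```

With this data the mixed scheme reports Aptr before Bptr, Bptr before Cptr, and Cptr before Aptr, which is a cycle. Comparing names alone gives the single order Bptr, Cptr, Aptr.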

Issue 3

17.4.4.4 prevents an implementation adding member functions to one of the std classes, except in particular circumstances. About the only leeway given is whether a particular non-virtual function is inline or not. So I presume we're not permitted to add virtual member functions to std::type_info (18.5.1). The rules given in 17.4.4.4 specifying what member functions can be added look like applications of the as-if rule, but there must be something deeper going on, as if that was all, it wouldn't be mentioned. I'm not sure how a conforming program could tell whether additional functions had been added.

The abi requires us to add virtual functions to type_info. For instance the implementation of operator== will require it to deal with pointers to incomplete. G++ needs several for catch matching.

Issue 4

5.2.8 talks about typeid returning something derived from type_info, but the footnote mentioning extended_type_info implies to me that typeid always returns objects of the same type. Again, I'm not sure how a conforming program could tell.

The two proposals above resolve these issues. Proposal A resolves issues 2, 3 & 4, whilst proposal B resolves issue 2 only, and will leave us (slightly) non-conformant.


[000413 All] The Standard committee members in the group are quite sure that Issues 3 and 4 are not problems. Section 17.4.4.4 does not impose the suggested constraint (see footnote 173), and the intent of 5.2.8 is not to restrict typeid to returning a single class.

Proposal B resolves the remaining issue, and the group is inclined to accept it, while considering whether to go further with A. Jim will integrate (and has integrated) B into the Draft C++ ABI for IA-64.


[000504 All] It was decided to accept the current writeup. See the Draft C++ ABI for IA-64.

# Issue Class Status Source Opened Closed
A-29 RTTI pointer-to-member data closed CodeSourcery 000407 000504
Summary: Derive __pointer_to_member_type_info from __pointer_type_info.
Resolution: Derive __pointer_to_member_type_info and __pointer_type_info from a common base class __pbase_type_info. Add a new flag to __pbase_type_info indicating that the class of a pointer-to-member is incomplete (propagated up a chain of pointers).

[000407 CodeSourcery -- Nathan] __pointer_to_member_type_info is derived from type_info. I strongly recommend it be derived from __pointer_type_info, as it requires much of the same functionality, and has the same meanings of its flags. By subclassing __pointer_type_info, much code could be reused.

Thus point 8 of the rtti classes would become

The abi::__pointer_to_member_type_info type adds one field to abi::__pointer_type_info:


[000411 CodeSourcery -- Nathan] It is permissible in a pointer to member of X, for X to be an incomplete type [8.3.3]/2. This means that we need more than a single incomplete flag. The presence of such a ptr to member, will mean that it and all pointers to it will have their incomplete flag set, but its target might not be an incomplete chain. In implementing G++'s rtti runtime I found the following three flags useful, (this is with __pointer_to_member_type_info derived from __pointer_type_info):

incomplete_mask       = 0x8
incomplete_chain_mask = 0x10
incomplete_klass_mask = 0x20

incomplete_mask is an inclusive or of the other two flags. incomplete_klass_mask is only used by __pointer_to_member_type_info, and __pointer_type_info knows nothing about it (it simply examines the other two).

A __pointer_type_info or __pointer_to_member_type_info sets the incomplete_mask and incomplete_chain_mask, if the target is an incomplete type, or has its incomplete_mask set.

A __pointer_to_member_type_info sets the incomplete_mask and the incomplete_klass_mask, if the class of the member is incomplete.
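The two rules above can be written out as a short sketch. The mask values are the ones quoted in the text; the function names and parameters are hypothetical and only model how a compiler might compute the flags word.

```cpp
#include <cassert>

enum : unsigned {
  incomplete_mask       = 0x8,
  incomplete_chain_mask = 0x10,
  incomplete_klass_mask = 0x20
};

// Flags for a pointer (or the pointer part of a pointer to member).
// target_is_incomplete_type: the pointed-to type itself is incomplete.
// target_flags: flags of the target if it is itself a pointer, else 0.
unsigned pointer_flags(bool target_is_incomplete_type, unsigned target_flags) {
  bool chain = target_is_incomplete_type || (target_flags & incomplete_mask) != 0;
  return chain ? (incomplete_mask | incomplete_chain_mask) : 0u;
}

// A pointer to member adds the klass flag when the class of the member
// is incomplete.
unsigned pointer_to_member_flags(bool target_is_incomplete_type,
                                 unsigned target_flags,
                                 bool klass_is_incomplete) {
  unsigned flags = pointer_flags(target_is_incomplete_type, target_flags);
  if (klass_is_incomplete)
    flags |= incomplete_mask | incomplete_klass_mask;
  return flags;
}
```

For example, a pointer to member whose class is incomplete gets 0x28, and a plain pointer to that pointer to member then gets 0x18, since its target has incomplete_mask set.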


[000411 Ed.] I've tentatively incorporated both of these into the layout document, except that I just defined a second flag (in __pointer_type_info flags) for direct or indirect incomplete class type (in member pointers). Any pointer type inspections can check for both flags, even though only member pointers can cause one of them to be set up the chain.


[000413 All] Derive __pointer_to_member_type_info and __pointer_type_info from a common base class __pbase_type_info. Add a new flag to __pbase_type_info indicating that the class of a pointer-to-member is incomplete (propagated up a chain of pointers).

(Ed. note) I've added updates to the Draft C++ ABI for IA-64.


[000504 All] It was decided to accept the current writeup. See the Draft C++ ABI for IA-64.

# Issue Class Status Source Opened Closed
A-30 RTTI portability data closed HUB 001012 001109
Summary: What must be specified to produce RTTI portability? Are member layouts specified? Names? Virtual functions?
Resolution: Data members of the ABI-defined type_info derived classes must be allocated as specified, and their names are normative. Virtual functions, beyond the Standard-specified destructor, are implementation-specific, and may not be referenced outside the compiler and system vendors' runtime libraries.

[001012 all -- Jim] The issue here, raised originally by Martin, I will open as A-30. Implementations will generally need additional virtual functions associated with the type_info hierarchy to implement such functionality as dynamic cast. Gcc for instance has functions __is_function_p, __do_catch, __pointer_catch, ...

A program that is built from pieces from different compilers, where the pieces come from different implementations of the hierarchy, will see different structures, at least in the vtables, if we allow this extra material to be arbitrary, creating a problem if such programs actually make use of parts of the hierarchy.

We worked out the following possible solution:

Now an implementation can add an arbitrary set of functions to __cxa_aux_typeinfo, specialized to the derived class like a virtual function, without changing the external interface (to the user) of the hierarchy.

[001103 SGI -- Jim]

[...leaving out much discussion...]

So, after all the above, I suggest the following actions:


[001109 all] The current writeup is adequate. See the resolution in the issue header.

# Issue Class Status Source Opened Closed
A-31 Overlaying tail padding data closed CodeSourcery 001019 001109
Summary: Should we change the decision to overlay tail padding in class layout? For volatile members? In general?
Resolution: The overlaying of tail padding is eliminated, but we will retain the treatment of empty bases.

[001019 CodeSourcery -- Mark] I think I recall that the committee was intentionally trying to use the tail padding of one object to save space. For example, consider:

  struct A { short s; char c; };
  struct B { A a; char d; };
  

(These are PODs, but you can easily make an equivalent non-POD example).

Here, I think the committee wanted to give `B' size 4, by packing `d' into the tail padding of `A'.

I think this is a mistake. David Gross came up with the following example:

Code generator needs to copy dsize, not sizeof, unless it can prove that the object is in a context where tail padding isn't overlaid. Reason? Tail padding might be overlaid by a volatile field.

Hence, a non-POD that looks like

      struct S { short sh; char ch; };
  

requires ld2/st2/ld1/st1 for a copy instead of ld4/st4 because we might have

      struct T { S s; volatile char d; };
  

Similarly, people using memcpy to copy around POD components of non-PODs will get burned.
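The user expectation at stake can be sketched as below, using the S and T types from the example. The helper functions are hypothetical. Because S and T are PODs here, no overlay takes place, so this shows the behaviour users rely on rather than the failure itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct S { short sh; char ch; };     // data plus tail padding
struct T { S s; volatile char d; };  // d follows s

// Copy an S the way users commonly do: all sizeof(S) bytes,
// tail padding included.
void copy_s(S& dst, const S& src) { std::memcpy(&dst, &src, sizeof(S)); }

// True when d lies entirely after s, i.e. the tail padding of s is not reused.
bool tail_not_overlaid() { return offsetof(T, d) >= sizeof(S); }

// Copies into t.s and checks that the neighbouring field d is untouched.
bool d_survives_copy() {
  T t = {{1, 'a'}, 'x'};
  S other = {2, 'b'};
  copy_s(t.s, other);
  return t.d == 'x' && t.s.sh == 2 && t.s.ch == 'b';
}
```

If d were packed into the tail padding of s, the same memcpy would write over d, which is why the code generator would have to copy only dsize bytes.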

This completely breaks user expectation since people routinely expect to be able to stick a function or two into a POD without changing its layout.

I think we should make the following changes:

Note that this still permits the empty base optimization; nvsize will be zero, and sizeof will be 1.

There's an important difference between using the tail padding in an empty base and the tail padding in a generic object: you know that you never have to copy an empty base.


[001109 all] Although dealing with tail padding overlaying would be straightforward in a from-scratch compiler, getting the information to all the places in the back end of g++ or the HP compiler that would need it is a huge task (estimated at a widely scattered 1500 lines of code touched in g++). In addition, it is expected that some number of users moving back and forth between C and C++ and trying to match C structs with C++ non-POD classes will have problems, though there are questions about how many.

Therefore, we have decided to eliminate the overlaying of tail padding. Mark will provide alternate proposed wording for the ABI document.


B. Virtual Function Handling Issues

# Issue Class Status Source Opened Closed
B-1 Adjustment of "this" pointer (e.g. thunks) data call closed SGI 990520 991202
Summary: There are several methods for adjusting the this pointer for a member function call, including thunks or offsets located in the vtable. We need to agree on the mechanism used, and on the location of offsets, if any are needed. To maximize performance on IA64, a slightly unusual approach such as using secondary entry points to perform the adjustment may actually prove interesting.
Resolution: See the writeup in the Draft C++ ABI for IA-64.

[990623 HP -- Christophe]

Open Issues Relevant To This Discussion

  1. Keeping all of a class in a single load module. The vtable contains the target address and one copy of the target GP. This implies that it is not in text, and that it is generated by dld.

  2. Detailed layout of the virtual table.

  3. How can we share class offsets?

1. Scope and "State of the Art"

The following proposal applies only to calls to virtual functions when a this pointer adjustment is required from a base class to a derived class. Essentially, this means multiple inheritance, and the existence of two or more virtual table pointers (vptr) in the complete object. The multiple vptrs are required so that the layout of all bases is unchanged in the complete object. There will be one additional vptr for each base class which already required a vptr, but cannot be placed in the whole object so that it shares its vptr with the whole object. Note: when the vptr is shared, the base class is said to be the "primary base class", and there is only one such class.

For the primary base class, no pointer adjustment is needed. For all other bases, a pointer to the whole object is not a pointer to the base class, so whenever a pointer to the base class is needed, adjustment will occur.

In particular, when calling a virtual function, one does not know in advance in which class the function was actually defined. Depending on the actual class of the object pointed to, pointer adjustment may be needed or not, and the pointer adjustment value may vary from class to class. The existing solution is to have the vtable point not to the function itself, but to a "thunk" which does pointer adjustment when needed, and then jumps to the actual function. Another possibility is to have an offset in the vtable, which is used by the called function. However, more often than not, this implies adding zero.
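The two mechanisms mentioned above, a thunk with a fixed adjustment and an offset read from the vtable, can be modelled by hand. This is a rough sketch: it uses non-polymorphic structs and an explicit `VtableSketch` record (all names hypothetical) instead of compiler-generated vtables, and it relies on the raw pointer arithmetic that an ABI implementation would perform.

```cpp
#include <cassert>
#include <cstddef>

struct A { int a; };
struct B { int b; };
struct D : A, B { int d; };

// The real function: expects a pointer to the complete D.
int D_g(D* self) { return self->d; }

// Classic thunk: apply a fixed B* -> D* adjustment, then transfer to D_g.
int D_g_thunk_from_B(B* self) { return D_g(static_cast<D*>(self)); }

// Offset variant: the adjustment is stored next to the function pointer.
struct VtableSketch {
  std::ptrdiff_t convert_to_D;                // byte offset from B to D
  int (*g)(B* self, const VtableSketch* vt);  // slot for g
};

// Adjusting entry: read the offset from the vtable and apply it.
int D_g_adjusting_entry(B* self, const VtableSketch* vt) {
  char* raw = reinterpret_cast<char*>(self) + vt->convert_to_D;
  return D_g(reinterpret_cast<D*>(raw));
}

// Build the model vtable for the B subobject of a D.
VtableSketch make_B_in_D_vtable(D& d) {
  B* b = &d;
  VtableSketch vt;
  vt.convert_to_D = reinterpret_cast<char*>(&d) - reinterpret_cast<char*>(b);
  vt.g = &D_g_adjusting_entry;
  return vt;
}

int call_via_thunk() {
  D d; d.a = 1; d.b = 2; d.d = 3;
  B* b = &d;
  return D_g_thunk_from_B(b);
}

int call_via_vtable_offset() {
  D d; d.a = 1; d.b = 2; d.d = 3;
  B* b = &d;
  VtableSketch vt = make_B_in_D_vtable(d);
  return vt.g(b, &vt);
}
```

Both paths reach D_g with the same D pointer. The difference is where the adjustment value lives: in the thunk's code, or in data beside the function pointer.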

Virtual bases make things slightly more complicated. In that case, the data layout is such that there is only one instance of the virtual base in the whole object. Therefore, the offset from a this pointer to the same virtual base may change along the inheritance tree. This is solved by placing an offset in the virtual table, which is used to adjust the this pointer to the virtual base.

2. Proposal and Rationale

My proposal is to replace thunks with offsets, with two additional tricks:

The thunks are believed to cost more on IA64 than they would on other platforms. The reason is that they are small islands of code spread throughout the code, where you cannot guarantee any cache locality. Since they immediately follow an indirect branch, chances are we will always encounter both a branch misprediction and an I-cache miss in a row.

On the other hand, a virtual function call starts by reading the virtual function address. Reading the offset immediately thereafter should almost never cause a D-cache miss (cache locality should be good). More often than not, no adjustment is needed, or the adjustment will be done at call site correctly. In the worst case scenario, we perform two adjustments, one static at call site, and one dynamic in the callee, but this case should be really infrequent.

3. New Calling Convention

The new calling convention requires that the 'this' pointer on entry points to the class for which the virtual function is just defined. That is, for A::f(), the pointer is an A* when the main entry of the function is reached. If the actual pointer is not an A*, then an adjusting entry point is used, which immediately precedes the function.

In the following, we will assume the following examples:

    struct A { virtual void f(); };
    struct B { virtual void g(); };
    struct C: A, B { };
    struct D : C { virtual void f(); virtual void g(); };
    struct E: Other, C { virtual void f(); virtual void g(); };
    struct F: D, E { virtual void f(); };

    void call_Cf(C *c) { c->f(); }
    void call_Cg(C *c) { c->g(); }
    void call_Df(D* d) { d->f(); }
    void call_Dg(D* d) { d->g(); }
    void call_Ef(E* e) { e->f(); }
    void call_Eg(E* e) { e->g(); }
    void call_Ff(F *ff) { ff->f(); }
    void call_Fg(F *ff) { ff->g(); }	// Invalid: ambiguous

a) Call site:
The caller performs adjustment to match the class of the last overrider of the given function.

  • call_Cf will assume that the pointer needs to be cast to an A*, since C::f is actually A::f. Since A is the primary base class, no adjustment is done at call site.

  • call_Cg is similar, but assumes that the actual type is a B*, and performs the adjustment, since B is not the primary base class.

  • call_Df and call_Dg will assume that the pointer needs to be cast to a D*, which is where D::f is defined. No adjustment is performed at call site.
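The call-site adjustments for call_Cf and call_Cg can be observed in ordinary C++. In this sketch f and g are changed to return their `this` pointer so the adjustment becomes visible; that change and the two checking helpers are assumptions added for illustration. The zero offset of A holds on common ABIs where the primary base sits at the start of the object.

```cpp
#include <cassert>

struct A { virtual const void* f() { return this; } };
struct B { virtual const void* g() { return this; } };
struct C : A, B {};

const void* call_Cf(C* c) { return c->f(); }  // A is primary: no adjustment
const void* call_Cg(C* c) { return c->g(); }  // B is not: pointer is adjusted

// f receives the same address the caller held.
bool f_sees_unadjusted_pointer() {
  C c;
  return call_Cf(&c) == static_cast<const void*>(&c);
}

// g receives the address of the B subobject, which differs from &c.
bool g_sees_adjusted_pointer() {
  C c;
  const void* seen = call_Cg(&c);
  return seen == static_cast<const void*>(static_cast<B*>(&c)) &&
         seen != static_cast<const void*>(&c);
}
```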

b) Callee

  • A::f and B::g are defined in classes where there is a single vptr. They don't define a secondary entry point. Because of call-site conventions, they expect to always be called with the correct type.

  • D::f is defined in a class where there is more than one vptr, so it needs a secondary entry point and an entry 'convert_to_D' in the vtable. That's because it can be potentially called with either an A* or a B*. There are two vtables, one for A in D, one for B in D. The D::f entry in A in D points to the non-adjusting entry point, since A shares its vptr.

  • D::g requires a secondary entry point, that will read the same offset 'convert_to_D' from the vtable.

  • E also will require a 'convert_to_E' entry in the vtable, but this time, the vtable for A in C will have to point to an adjusting entry point, since A no longer shares the vptr with E (assuming Other has a vptr). This vtable is also the vtable of C in E.

c) Offsets in the vtable
Offsets have to be placed in the vtable at a position which does not conflict with any offset in the inheritance tree.

convert_to_D and convert_to_E are likely to be at the same offset in the vtable. This is not a problem, even if D and E are used in the same class, such as F, because this is the same offset in different vtables.

  • call_Fg is invalid, because it is ambiguous.

  • A notation such as ((E*) ff)->g() can be used to disambiguate, but in that case, we don't use the same vtable (either the E in F or D in F vtable). The E in F vtable uses that offset as 'convert_to_E', whereas the D in F vtable uses that offset as 'convert_to_D'.

  • Similarly, call_Cf called with an F object will actually be called with the E in F or D in F, which disambiguates which C is actually used. The actual C* passed will have been adjusted by the caller unambiguously, or the call will be invalid.

  • For functions overridden in F, an entry 'convert_to_F' is created anyway. This entry will not overlap with either convert_to_E or convert_to_D.

The fact that an offset is reserved does not mean that it is actually used. A vtable needs to contain the offset only if it refers to a function that will use it. An offset of 0 is not needed, since the function pointer will point to the non-adjusting entry point in that case.

4. Cases where adjustment is performed

In other words, adjustment is made only when necessary, and at a place where it is better scheduled than with thunks. The only bad case is double adjustment for call_Cg called with an E*. This case can probably be considered rare enough, compared to calls such as call_Cg called with a C*, where we now actually do the adjustment at the call-site.

5. Comparing the code trails

Currently, the sequence for a virtual function call in a shared library will look as follows. I'm assuming +DD64; there would be some additional addp4 in +DD32. The trail below is the dynamic execution sequence. The affected code is shown between #if/#endif.

        // Compute the address of the vptr in the object,
        // from the this pointer
        // Optional, since vptroffset is often 0.
        // This also adjusts to the class of the final overrider
        addi            Rthis=vptroffset_of_final_overrider,Rthis
        ;;
        // Load the vptr in a register
        ld8             Rvptr=[Rthis]
        ;;
        // Add the offset to get to the function descriptor pointer
        // in the vtable.  Never zero, this instruction is always generated
        addi            Rfndescr=fndescroffset,Rvptr
        ;;
        // (Assuming inlined stub) Load the function address and new GP
        ld8             Rfnaddr=[Rfndescr],8
        ;;
        // Load the new GP
        ld8             GP=[Rfndescr]
        mov             BRn=Rfnaddr
        ;;
        // Perform the actual branch to the target

        // ...
        // ... Branch misprediction almost always, followed by
        // ... I-Cache miss almost always if jumping to a thunk
        br.call B0=BRn

#if OLD_ADJUST
thunk_A::f_from_a_B:
        // If the 'adjustment_from_B_to_A' is the 'adjustment_to_A' above,
        // then in the new case, the vtable directly points to A::f
        addi            Rthis,adjustment_from_B_to_A

        // In most cases, we can probably generate a PC-relative branch here
        // It is unclear whether we would correctly predict that branch
        // (since it is assumed that we arrive here immediately following
        // a misprediction at call site)
        br              A::f
#endif // OLD_ADJUST

// This occurs less often than OLD_ADJUST
// (it does not happen when call-site adjustment is correct)
#if NEW_ADJUST
adjusting_entry_A::f:
        // Can't be executed in less than 3 cycles?
        addi            Rvptr=class_adjustment_offset,Rvptr
        ;;
        // This loads data which is close to the fn descriptor,
        // so it's likely to be in the D-cache
        ld8             Rvptr=[Rvptr]
        ;;
        add             Rthis=Rthis,Rvptr
#endif

A::f:
        alloc   ...

[990812 All] Discussion of B-6 raises questions of impact on the above approach. Christophe will look at the issues.

[990826 Cygnus -- Jason] [An alternative suggestion from Jason via email.]

Rather than per-function offsets, we have per-target type offsets. These offsets (if any) are stored at a negative index from the vptr. When a derived class D overrides a virtual function F from a base class B, if no previously allocated offset slot can be reused, we add one to the beginning of the vtable(s) of the closest base(s) which are non-virtually derived from B. In the case of non-virtual inheritance, that would be D's vtable; in simple virtual inheritance, it would be B's. The vtables are written out in one large block, laid out like an object of the class, so if B is a non-virtual base of D, we can find the D vtable from the B vptr.

D::f then receives a B*, loads the offset from the vtable, and makes the adjustment to get a D*. The plan is to also have a non-adjusting vtable entry in D's vtable, so we don't have to do two adjustments to call D::f with a D*; the implementation of this is up to the compiler. I expect that for g++, we will do the adjustment in a thunk which just falls into the main function.

The performance problems with classic thunks occur when the thunk is not close enough to the function it jumps to for a pc-relative branch. This cannot be avoided in certain cases of virtual inheritance, where a derived class must whip up a thunk for a new adjustment to a method it doesn't override.

In this case, we will only ever have one thunk per function, so we don't even have to jump. Except in the case of covariant returns, that is, where we will have one per return adjustment. But we know all necessary adjustments at the point of definition of the function, so they can all be within pc-relative branch range.

[Extensive discussion followed by email -- this suggestion is not completely correct, but may be the basis of a workable solution.]

[990831 Cygnus -- Ian] A couple of observations ...

On the state of the art:

The Microsoft approach is worth mentioning. (I haven't seen it discussed -- though perhaps that is because of the patent situation.)

It allows zero-adjusting (i.e. non-thunking) calls for (almost) every virtual function call in a non-virtual, multiple inheritance hierarchy.

For those that are unfamiliar, the idea is that all calls go via the base class vft and overriding functions expect a pointer to the base class type. (That is, if D::f overrides B::f, it expects the first parameter to be of type B*, not D*.) The callee does the necessary static adjustment to get to the derived class 'this' pointer as needed.

It avoids requiring a thunk, and it's often the case that the cost is zero in the callee because the this-adjustment can be folded into other offset computations.

On balance, it could well win over all the other approaches being discussed here. [Though, it may lose in some specific cases vs. Christophe's approach where one would create additional extra entries in the derived class vft.]

On when to make extra virtual function table entries for functions:

One of Christophe's suggestions is sort-of separate from the rest of the discussion: making extra entries in the derived class' vft for some overridden virtual functions. It has the benefit of giving you faster calls if you happen to be in (or near) the derived class -- at the expense of space in the vft.

Of course, you can always make the call through the introducing base class, so these extra entries are a pure space/time performance trade off (w/ some unpredictable D-cache effects) and the cost/benefit analysis will depend a little on what the rest of the strategy looks like.

The same idea is potentially applicable, no matter what strategy you actually use for vft layout, and different criteria for deciding what extra entries to make are possible. For example, creating an extra entry when overriding a function introduced in a virtual base has the added benefit of avoiding a cast to a virtual base at the call site.

[990909 All] We are getting closer -- understanding of the alternatives is improving, and Christophe may agree with the Jason/Brian proposal after more thought. To make sure we really understand what we're agreeing to, Jason and Christophe will write up more precise proposal(s).


[991111 jason]

Final virtual calling convention:

We have decided that for virtual functions not inherited from a virtual base, regular thunks will work fine, since we can emit them immediately before the function to avoid the indirect branch penalty; we will use offsets in the vtable for functions that come from a virtual base, because it is impossible to predict what the offset between the current class and its virtual base will be in classes derived from the current class.

The calling convention is as follows:


[991202 all] Adopt Jason's writeup.

# Issue Class Status Source Opened Closed
B-2 Covariant return types call closed SGI 990520 990722
Summary: There are several methods for adjusting the 'this' pointer of the returned value for member functions with covariant return types. We need to decide how this is done. Return thunks might be especially costly on IA64, so a solution based on returning multiple pointers may prove more interesting.
Resolution: Provide a separate Vtable entry for each return type.

[990610 Matt] One possibility is to have two Vtable entries, which might point to different functions, different entrypoints, or a real entrypoint and a thunk. Another is to return two result pointers (base/derived), and have the caller select the right one.

[990715 All] Daveed presented his multiple-return-value scheme, including an example that involved virtual base classes, return values that are pointers to nonpolymorphic classes, and other equally horrible things.

Consensus: we need to get the horrible cases correct, but speed only matters in the simple case. The simple case: class B has a virtual function f returning a B1* and class D has a virtual function f returning a D1*, where all four classes are polymorphic, B is a primary base of D, and B1 is a primary base of D1. (The really important case is where B1 is B and D1 is D, but that simplification doesn't make any difference.)

Jason: Would the usual multiple-entry-point scheme work just as well? That is, would it be just as fast as Daveed's scheme in the simple case, and still preserve enough information for the more complicated cases? It appears so, but we don't have a proof. Jason will try to provide one.

[990716 Cygnus -- Jason] Proof? You always know what types a given override must be able to return, and you know how to convert from the return type to those base types. You know from the entry point which type is desired. Seems pretty straightforward to me.

[990716 Cygnus -- Jason] The alternative I was talking about yesterday goes something like this:

When we have a non-trivial covariant return situation, we create a new entry in the vtable for the new return type. The caller chooses which vtable entry to use based on the type they want.

This could be implemented several ways, at the discretion of the vendor:

  1. Multiple entry points to one function, with an internal flag indicating which type to return.

  2. Thunks which intercept the function's return and modify the return value. Note that unlike the case of calling virtual functions, for covariant returns we always know which adjustments will be needed, so we don't have to pay for a long branch. We do, however, lose the 1-1 correspondence between calls and returns, which apparently affects performance on the Pentium Pro.

  3. Function duplication.

The advantage of this approach to the complex case is that we don't have to do a dynamic_cast when faced with multiple levels of virtual derivation. It is also strictly simpler; Daveed's model already requires something like this in cases of multiple inheritance.

Of course, we can always mix and match; we could choose to only do this in cases of virtual inheritance, or use Daveed's proposal and do this only in cases of repeated virtual inheritance. In that case, the multiple returns would just be an optimization for the single virtual inheritance case.

Since we don't seem to care about the performance of anything but single nonvirtual inheritance, it seems simpler not to bother with multiple returns.

The remaining question is how to handle the case of nontrivial nonvirtual inheritance: do we use multiple slots or have the caller do the adjustment? My inclination is to have the caller adjust.

WRT patents, the idea of having the function return the base-most class and having the caller adjust is parallel to the patented Microsoft scheme whereby they pass the base-most class as the 'this' argument to virtual functions, but the word 'return' does not appear anywhere in the patent, so it seems safe.

[990722 All] The group was generally agreed that the simplicity of multiple entries in the vtable outweighed any space/performance advantage of more complex schemes (e.g. the method Daveed described on 15 July). Discussion focussed on whether it is worthwhile to eliminate some of the entries in cases where they are unnecessary because the caller knows the required conversion, namely when the return type has a unique non-virtual subobject of the original return type.

Agreement was reached to avoid the complication of eliminating some of the Vtable entries. Thus, the Vtable will have one entry for each accessible return type of a covariant virtual function. These may be implemented in a variety of ways, e.g. duplicated functions, separate entrypoints, or stubs, and the ABI need not specify the choice. The location of the Vtable entries is part of the separate Vtable layout issue B-6.
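
For reference, the "simple case" named in the 990715 consensus can be written down as source code. The following is only a sketch: the class names B, B1, D, and D1 come from the notes above, while the tag() member and the two call_through_* helpers are hypothetical names added for illustration. It shows what the language requires of a covariant override, not how any particular vendor lays out the extra Vtable entries.

```cpp
#include <cassert>

// All four classes are polymorphic; B is the primary base of D and
// B1 is the primary base of D1, as in the simple case described above.
struct B1 { virtual ~B1() {} virtual int tag() const { return 1; } };
struct D1 : B1 { int tag() const override { return 2; } };

struct B {
    virtual ~B() {}
    virtual B1* f() { static B1 r; return &r; }
};

struct D : B {
    // Covariant override: returns D1* where B::f returns B1*.
    D1* f() override { static D1 r; return &r; }
};

// A caller that only knows B receives a B1*. Under the resolution it would
// use the Vtable entry for the B1* return type.
inline int call_through_base(B& b) { return b.f()->tag(); }

// A caller that knows D receives a D1* directly, without a cast.
inline int call_through_derived(D& d) { D1* p = d.f(); return p->tag(); }
```

In this single-inheritance case no pointer adjustment is needed between D1* and B1*, which is why the group considered it the only case where speed matters.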

# Issue Class Status Source Opened Closed
B-3 Allowed caching of vtable contents call closed HP 990603 990805
Summary: The contents of the vtable can sometimes be modified, but the consensus is that it is nonetheless always allowed to "cache" elements, i.e. to retain them in registers and reuse them, whenever it is really useful. However, this may sometimes break "beyond the standard" code, such as code loading a shared library that replaces a virtual function. Can we all agree when caching is allowed?
Resolution : Caching is allowed.

[990604 HP -- Christophe] Mike (Ball) gave me what I believe is an excellent definition of when caching is allowed. I'd like him to present it.

[990805 All] Christophe explained that the rule is simply that, within a call to a member function of the class, the class Vtable may not be modified. Between such calls, no assumption may be made. With this observation, the issue is closed.

[990812 All] The rule is even simpler. Once a program changes the type of a pointer's target, the pointer is invalidated, and its value may not be reused. Therefore, a code sequence which repeatedly refers to the same pointer value is invalid if the pointee's vtable has been changed.

# Issue Class Status Source Opened Closed
B-4 Function descriptors in vtable data closed HP 990603 990805
Summary: For a runtime architecture where the caller is expected to load the GP of the callee (if it is in, or may be in, a different DSO), e.g. HP/UX, what should vtable entries contain? One possibility is to put a function address/GP pair in the vtable. Another is to include only the address of a thunk which loads the GP before doing the actual call.
Resolution : The Vtable will contain a function address/GP pair.

[990624 All] Note that putting GP in the Vtable prevents putting it in shared memory. See B-7.

[990805 All] It was decided that special representations to accommodate shared memory would be expensive and therefore undesirable. Therefore, the decision is to put the function address/GP pair in the vtable, avoiding the cost of an extra indirection in using it.

[991007 IBM -- Brian] A while ago Jason was worried about COM compatibility. Part of that is to ensure that vtables can be expressed in C. But the resolution of issue B-4 says that a vtable contains function descriptors rather than function descriptor pointers.

From the standpoint of call performance that is a good thing, but the result can't be built in C. I know that we at least will also have to rewrite parts of our C++ runtime that hand-build vtables. Neither of these is critical for IBM but may be for others.
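
The distinction Brian draws between descriptors and descriptor pointers can be sketched as follows. This is an illustrative model only: FunctionDescriptor, InlineSlot, IndirectSlot, and load() are hypothetical names, and real IA-64 descriptors are produced by the linker rather than written as C++ structs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a function descriptor: the entry address plus the
// GP value the callee expects.
struct FunctionDescriptor {
    void* code;
    void* gp;
};

// Resolution of B-4: each Vtable slot holds the descriptor itself.
struct InlineSlot { FunctionDescriptor fd; };

// Alternative mentioned above: each slot holds a pointer to a descriptor,
// which costs one more load before the call.
struct IndirectSlot { const FunctionDescriptor* fd; };

// One memory access to reach code and gp.
inline const FunctionDescriptor& load(const InlineSlot& s) { return s.fd; }

// Two memory accesses: the slot, then the descriptor it points to.
inline const FunctionDescriptor& load(const IndirectSlot& s) { return *s.fd; }
```

The inline form doubles the size of each slot in exchange for removing the extra indirection, which is the trade-off the 990805 decision accepts.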

[991103 Cygnus -- Richard Henderson]

> The ia64 C++ ABI committee has decided to use the descriptors.
> If this doesn't make sense (i.e. if there's no way to express
> such a thing to the assembler), now's the time to let us know...:)

You mean you want the vtable to look like

      struct { void *code, *gp; } vtable[];

There are no suitable IA-64 relocations to express this.

[991106 SGI -- Jim] Richard Henderson of Cygnus points out that the IA-64 relocations don't support doing this (inserting a function descriptor in data). However, the R_IA_64_IPLT*SB relocations do perform the correct action. The problem is that they are currently specified to be valid only in executables and shared objects. I believe that the problem can be solved by simply removing this restriction. The static linker support required shouldn't be major -- it would presumably just pass the relocations through to the linked object and let the dynamic linker deal with them.

The above issue has been raised with the IA-64 base ABI group.

# Issue Class Status Source Opened Closed
B-5 Where are vtables emitted? data closed HP 990603 991118
Summary: In C++, there are various things with external linkage that can be defined in multiple translation units, while the ODR requires that the program behave as if there were only a single definition. From the user's standpoint, this applies to inlines and templates. From the implementation's perspective, it also applies to things like vtables and RTTI info. (We call this vague linkage.)
Resolution: Vtables will be emitted with the key function (first virtual function that is not inline at the point of class definition), if any. If no key function, emit everywhere used (i.e. referred to by name). Place in a comdat group in all cases.

[990624 Cygnus -- Jason] There are several ways of dealing with vague linkage items:

  1. Emit them everywhere and only use one.
  2. Use some heuristic to decide where to emit them.
  3. Use a database to decide where to emit them.
  4. Generate them at link time.

#3 and #4 are feasible for templates, but I consider them too heavyweight to be used for other things.

The typical heuristic for #2 is "with the first non-inline, non-abstract virtual function in the class". This works pretty well, but fails for classes that have no such virtual function, and for non-member inlines. Worse, the heuristic may produce different results in different translation units, as a method could be defined inline after being declared non-inline in the class body. So we have to handle multiple copies in some cases anyway.

The way to handle this in standard ELF is weak symbols. If all definitions are marked weak, the linker will choose one and the others will just sit there taking up space.

Christophe mentioned the other day that the HP compiler used the typical heuristic above, and handled the case of different results by encoding the key function in the vtable name. But this seems unnecessary when we can just choose one of multiple defns.

A better solution than weak symbols alone would be to set things up so that the linker will discard the extra copies. Various existing implementations of this are:

  1. The Microsoft PE/COFF defn includes support for COMDAT sections, which key off of the first symbol defined. One copy is chosen, others are discarded. You can specify conditions to the linker (must have same contents, must have same size).

  2. The IBM XCOFF platform includes a garbage-collecting linker; sections that are not referenced in a sweep from main are discarded. In xlC, template instantiations are emitted in separate sections, with encoded names; at link time, one copy is renamed to the real mangled name, and the others are discarded by garbage collection.

The GNU ELF toolchain does a variant of #1 here; any sections with names beginning with ".gnu.linkonce." are treated as COMDAT sections. It seems more sensible to me to key off of the section name than the first symbol name as in PE.

The GNU linker recently added support for garbage collection, and I've been thinking about changing our handling of vague linkage to make use of it, but haven't.

I propose that the ia64 base ABI be extended to provide for either COMDAT sections or garbage collection, and that we use that support for vague linkage.

I further propose that we not use heuristics to cut down the number of copies ahead of time; they usually work fine, but can cause problems in some situations, such as when not all of the class's members are in the same symbol space. Does the ia64 ABI provide for controlling which symbols are exported from a shared library?

A side issue: What do we want to do with dynamically-initialized variables? The same thing, or use COMMON? I propose COMMON.

See also G-3, for vague linkage of inlined routines and their static variables.


[990624 SGI summarizing others] HP uses COMDAT for many cases, keying from the symbol names. HP also uses some heuristics. HP observes that IA-64 objects will already be large. From the base ABI discussions, any use of WEAK or COMMON symbols will need to take care not to depend on vendor-specific treatment.

Defining a COMDAT mechanism doesn't preclude using heuristics to avoid some copies up front. A COMDAT mechanism should also specify how to get rid of associated sections like debugging info, unless the identical mechanism works.


[990629 HP -- Christophe] First, the "usual" heuristic (which is usual because it dates back to Cfront) is to emit vtables in the translation unit that contains the definition of the first non inline, non pure virtual function. That is, for:


        struct X {
                void a();
                virtual void f() { return; }
                virtual void g() = 0;
                virtual void h();
                virtual void i();
        };
the vtable is emitted only in the TU that contains the definition of h().

This breaks and becomes non-portable if:

Now, the COMDAT issue is as follows: a COMDAT section is, in some cases, slightly more difficult to handle (at least, that's the impression Jason gave me). For statics with runtime initialization, what you can do is reserve COMMON space ('easier'), then initialize that space at runtime. As I said, the problem is if two compilers disagree on whether this is a runtime or a compile time initialization, such as in :


	int f() { return 1; }
	int x = f();	// Static (COMDAT) or Dynamic (COMMON) initialization?

So I personally recommend that we put everything in COMDAT.

[990715 All] Consensus so far: use a heuristic for vtable and typeinfo emission, based on the definition of the key function (the first virtual function that is not declared inline in the class definition). The vtable must be emitted where the key function is defined; it may also be emitted in other translation units. If there is no key function, then the vtable must be emitted in any translation unit that refers to the vtable in any way.

Implication: the linker must be prepared to discard duplicate vtables. We want to use COMDAT sections for this (and for other entities with vague linkage.)

Open issue: the ELF format allows only 16 bits for section identifiers, and typically two of those bits are already taken up for other things. So we've only got 16k sections available, which is unacceptable if we're creating lots of small sections.

Jason - COMDATs disappear into text and data at link time, so the issue is really only serious if we've got more than 16k vtables (or template instantiations, etc.) in a single translation unit.

Daveed - HP has gotten around this problem by hacking their ELF files to steal another 8 bits from somewhere else.

Jack - a new kind of section table could be a viable solution. However, it would break everything if we did it for ia32. Is a solution that only works on ia64 acceptable? Note also that the ELF section table has its own string table, which we wouldn't be able to share with the new kind of section table. Index and link fields often point into the section table; we would have to figure out how to deal with this. (Jack is not opposed to the idea of an alternate section table, he is just pointing out some of the issues we will have to resolve.)


[990805 All] We need a specific proposed representation for COMDAT. IBM's version is restricted to one symbol per section. Jim will look for Microsoft's PECOFF definition. Anyone else with a usable definition should send it.


[revised 991012 SGI]

C++ ABI: COMDAT Proposal

Revisions

[991007] Change default to simply group; COMDAT semantics is option. Don't support removal based on duplication of non-COMDAT sections. Just remove symbols defined relative to removed sections.

Introduction

C++ has many situations where the compiler may need to emit code or data, but may not be able to identify a unique compilation unit where it should be emitted. The approach chosen by the C++ ABI group to deal with this problem is to allow the compiler to emit the required information in multiple compilation units, in a form which allows the linker to remove all but one copy. This is essentially the idea called COMDAT in several existing implementations.

Various other implementations (notably Windows NT) and proposals obtain more generality by varying the duplicate removal semantics. The most obviously useful variant supports grouping of sections for removal purposes, but treats duplication as an error, using it to support link-time removal of unreferenced sections. The proposal below treats this simple grouping as the default semantics, and provides duplicate removal as an option.

Our objectives include:

Proposal

The proposal below is based on the HP definition, with minor modifications and more precise definitions.

SHF_GROUP: Group Member Sections

A section which is part of a group, and is to be retained or discarded with the group as a whole, is identified by a new section header attribute:
SHF_GROUP
This section is a member (perhaps the only one) of a group of sections, and the linker should retain or discard all or none of the members. This section must be referenced in a SHT_GROUP section (see below).

This attribute flag may be set in any section header, and no other modification or indication is made in the grouped sections. All additional information is contained in the associated SHT_GROUP section (see below).

SHT_GROUP: Section Group Definition

Some sections occur in interrelated groups. For instance, an out-of-line definition of an inline function might require, in addition to its .text section, a read-only data section containing literals referenced, one or more debug information sections, and/or other informational sections. Furthermore, there may be internal references among these sections that would not make sense if one of them were removed or replaced by a duplicate from another object. Therefore, we assume that such groups are to be included or omitted from the linked object as a unit. (Except for the GRP_COMDAT flag described below, this definition does not specify the circumstances under which the members of a group might be discarded from the linked object.)

To facilitate this, we define a SHT_GROUP section:

The section header attributes of a Group Section are:

name          unspecified
sh_type       SHT_GROUP
sh_link       .symtab section index
sh_info       symbol index
sh_flags      none
sh_entsize    size of section indices (4)
requirements  may not be stripped

The section group's sh_link field identifies a symbol table section, and its sh_info field the index of a symbol in that section. The name of that symbol is treated as the identifier of the section group.

The section data of a SHT_GROUP section is a flag word followed by a sequence of section indices. The flag word may contain the following flags:

GRP_COMDAT (0x1)
This is a COMDAT group. It may duplicate another COMDAT group in another object file, where duplication is defined as having the same identifying symbol name. In such cases, only one of the duplicate groups should be retained by the linker, and the remaining groups should be discarded.

The section indices in the SHT_GROUP section identify the sections which make up the group.

The sh_size value is sh_entsize times one plus the number of sections in the group.
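
As a rough illustration of the section data layout and the sh_size rule just stated, the sketch below builds the contents of a group section in memory. GRP_COMDAT and the entry size of 4 come from the proposal; the names kGroupEntSize, make_group_data, and group_sh_size are hypothetical, and byte order and file output are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t GRP_COMDAT = 0x1;   // flag value from the proposal
constexpr std::uint32_t kGroupEntSize = 4;  // sh_entsize from the proposal

// Section data of a SHT_GROUP section: one flag word, then the section
// header indices of the group members.
inline std::vector<std::uint32_t> make_group_data(
        std::uint32_t flags, const std::vector<std::uint32_t>& members) {
    std::vector<std::uint32_t> data;
    data.reserve(members.size() + 1);
    data.push_back(flags);
    data.insert(data.end(), members.begin(), members.end());
    return data;
}

// sh_size = sh_entsize * (1 + number of sections in the group).
inline std::uint32_t group_sh_size(std::size_t member_count) {
    return kGroupEntSize * static_cast<std::uint32_t>(1 + member_count);
}
```

For a COMDAT group with three member sections this gives four 32-bit words, so sh_size is 16.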

The linker may choose to discard a section in a group, i.e. not include its data in the linked object, based on COMDAT duplicate semantics (above), or for other implementation-defined reasons (e.g. removing unreferenced code). If it does so, the group semantics requires that all of the group members be removed as a unit.

(Note, however, that this is not intended to imply that special-case behavior like removing debug information requires removing the sections to which it refers, even if they are in a group. We could clarify this issue by tying the removal semantics to the section which contains the identifying symbol, but this seems overly restrictive and unnecessary.)
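
The COMDAT duplicate semantics described above might be implemented in a linker along the following lines. This is a sketch under stated assumptions: the Group struct and retain_groups() are hypothetical, the proposal does not say which duplicate is retained (this sketch keeps the first one seen), and plain non-COMDAT groups are simply passed through without any duplicate check.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical linker-side view of one SHT_GROUP section.
struct Group {
    std::string signature;      // name of the identifying symbol
    bool comdat;                // true if GRP_COMDAT is set
    std::vector<int> sections;  // member sections (illustrative ids)
};

// Returns the groups to retain, in input order. A COMDAT group whose
// signature has already been seen is dropped together with all of its
// member sections, since a group is kept or discarded as a unit.
inline std::vector<Group> retain_groups(const std::vector<Group>& input) {
    std::set<std::string> seen;
    std::vector<Group> kept;
    for (const Group& g : input) {
        if (g.comdat && !seen.insert(g.signature).second) {
            continue;  // duplicate COMDAT group: discard the whole group
        }
        kept.push_back(g);
    }
    return kept;
}
```

Symbols defined relative to the discarded sections would also have to be removed, as the 991007 revision note says; that step is omitted here.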

Requirements

Questions


[revised 991012 SGI]

gABI: Section Indices

Revisions and Status

[991007] Change section/flag names, move ELF header extension to section header 0.

Background

SGI has long been concerned about the 64K limitation on the number of sections in an object file. Although this need not normally be a problem, there are purposes for which we would like to place distinct functions, and sometimes data items, in distinct sections. When one takes into account associated sections, e.g. relocation, debug information, etc., this leads to a limitation on the order of 16K units, and threatens to be a problem for some large compilation units such as machine-generated simulators.

C++ ABI efforts raise the same issue from another source. Various C++ structures are emitted under circumstances where the compiler cannot reliably identify a single compilation unit in which to emit them. Examples include common cases like class virtual tables, out-of-line copies of inline functions, and template instantiations. The favored solution is COMDAT sections, i.e. putting the potentially duplicated items in their own sections, and allowing the linker to remove the duplicates. Once again, though, this threatens to be a problem for very large compilation units.

The following proposal attempts to remove this limitation. Obviously, even if the problem is real, it will actually arise in very few compilation units. Therefore, the elements of the proposed solution are defined so as to leave unchanged object files which do not encounter the problem. We consider this compatibility objective as primary -- much more important than performance or clean definitions for the problematic object files -- particularly as it should allow vendors to merge the solution into existing tool chains at convenient times without disrupting existing programs.

Proposed ABI wording is in normal font; commentary is in italics. Section numbers are from the Intel IA-64 psABI.


Proposed gABI Changes

General Approach

The range of section indices from 0xff00 (SHN_LORESERVE) to 0xffff (SHN_HIRESERVE) is reserved for special purposes, and the gABI already forbids real sections with these indices. Our approach is to deal with situations where section indices cannot be compatibly expanded to a full 32 bits by using one of these indices as an escape value indicating that the actual index will be found elsewhere.

4.1 Elf Header

The ELF header has two relevant 16-bit fields: e_shnum contains the section count, and e_shstrndx the index of a string section. We modify their descriptions to include an overflow indicator, and put the actual values in the reserved section header at index 0 if necessary, as follows:

ElfXX_Half e_shnum;
This member holds the number of entries in the section header table. Thus the product of e_shentsize and e_shnum gives the section header table's size in bytes. If a file has no section header table, e_shnum holds the value zero.

If the number of sections is greater than SHN_LORESERVE (0xff00), this member has the value SHN_XINDEX (0xffff), and the actual number of section header table entries is in the member sh_size of the section header at index 0.

ElfXX_Half e_shstrndx;
This member holds the section header table index of the entry associated with the section name string table. If the file has no section name string table, this member holds the value SHN_UNDEF. See ``Sections'' and ``String Table'' below for more information.

If the section name string table index is greater than SHN_LORESERVE (0xff00), this member has the value SHN_XINDEX (0xffff), and the actual index of the section name string table is in the member sh_link of the section header at index 0.
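
A consumer reading these two header fields under the proposed wording could decode them roughly as shown below. The sketch follows the text of this proposal as written here, not any later revision of the gABI. ElfHeaderFields and SectionHeader0 are hypothetical trimmed-down structs holding only the members involved, and section_count() and shstrtab_index() are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint16_t SHN_LORESERVE = 0xff00;
constexpr std::uint16_t SHN_XINDEX    = 0xffff;  // escape value from the proposal

// Only the fields used by the escape mechanism (hypothetical structs).
struct ElfHeaderFields { std::uint16_t e_shnum; std::uint16_t e_shstrndx; };
struct SectionHeader0  { std::uint64_t sh_size; std::uint32_t sh_link; };

// Number of section header table entries: taken from e_shnum unless it
// holds the escape value, in which case section header 0 carries it.
inline std::uint64_t section_count(const ElfHeaderFields& eh,
                                   const SectionHeader0& sh0) {
    return eh.e_shnum == SHN_XINDEX ? sh0.sh_size
                                    : static_cast<std::uint64_t>(eh.e_shnum);
}

// Index of the section name string table, decoded the same way via sh_link.
inline std::uint32_t shstrtab_index(const ElfHeaderFields& eh,
                                    const SectionHeader0& sh0) {
    return eh.e_shstrndx == SHN_XINDEX ? sh0.sh_link
                                       : static_cast<std::uint32_t>(eh.e_shstrndx);
}
```

Files with fewer than SHN_LORESERVE sections never take the escape path, which is how the proposal leaves existing object files unchanged.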

4.2 Sections

We define a new special section index as an escape value for large section indices, as referenced above:

SHN_XINDEX (0xffff)
This special section index means, conventionally, that the actual section index is too large to fit in the field where it appears, and is to be found in another location (specific to the structure where it appears).

We note here that the section header contains two fields commonly used to hold section indices, sh_link and sh_info, but they are already defined as ElfXX_Word, and require no change.

A new section type is defined:

SHT_SYMTAB_SHNDX (17)
A section of this type is paired with an SHT_SYMTAB section, if any of the symbols in that section reference a section index larger than 16 bits. It contains a table of 32-bit section indices, one for each symbol in the symbol table section, in the same order.

The sh_link field of this section contains the index of the associated SHT_SYMTAB section.

A new special section name is defined:

.symtab_shndx
This section holds a section header index table for an associated .symtab section. The section's attributes will include the SHF_ALLOC bit if the associated .symtab section does; otherwise, that bit will be off.

There is no available field to point from the .symtab section to its associated .symtab_shndx section, so we use the sh_link field in the latter to point back. It is recommended (but not required) that implementations place each .symtab_shndx section immediately after its associated .symtab section (in the section header table) to make it easy for the linker to find.

4.x Symbol Table

The symbol table is the most problematic. It has no convenient location for an expanded section index. Therefore, we propose that the escape value imply redirection to a separate, parallel table containing full-size section indices.

Modify the definition of st_shndx as follows:

st_shndx
Every symbol table entry is defined in relation to some section. This member holds the relevant section header table index.

As the sh_link and sh_info interpretation table and the related text describe, section indexes in the range 0xff00 to 0xffff indicate special meanings. In particular, SHN_XINDEX (0xffff) indicates that the real index is too large to fit in this field, and must be found in the associated SHT_SYMTAB_SHNDX table (above).

If any of the st_shndx fields in a symbol table section contain the value SHN_XINDEX (0xffff), there must be an associated SHT_SYMTAB_SHNDX section, with a sh_link field containing the index of this SHT_SYMTAB section. That section contains an array of 32-bit section indices, matching the symbol table entries 1-1 in the same order. Entries corresponding to SHN_XINDEX (0xffff) values of st_shndx in the symbol table must contain the actual section header index to be used. Others should contain either the correct section header index (i.e. duplicating the value in st_shndx), or zero.
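
The lookup described for st_shndx can be sketched as a small helper. The two vectors stand in for the st_shndx column of a symbol table and the contents of its associated SHT_SYMTAB_SHNDX section; symbol_section() is a hypothetical name, and locating the paired section through sh_link is assumed to have been done already.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint16_t SHN_XINDEX = 0xffff;  // escape value from the proposal

// Returns the section header index for symbol number `sym`.
// Values other than SHN_XINDEX are returned unchanged; that includes the
// other reserved indices, which the caller must still interpret.
inline std::uint32_t symbol_section(
        const std::vector<std::uint16_t>& st_shndx,
        const std::vector<std::uint32_t>& shndx_table,
        std::size_t sym) {
    assert(sym < st_shndx.size());
    if (st_shndx[sym] != SHN_XINDEX) {
        return st_shndx[sym];
    }
    // The escape value requires the parallel table, matched 1-1 by position.
    assert(sym < shndx_table.size());
    return shndx_table[sym];
}
```

Because the parallel table is consulted only for escaped entries, it does not matter whether the remaining entries duplicate st_shndx or contain zero, both of which the proposal permits.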

The .dynsym section in a linked object is completely analogous to a .symtab section in a relocatable object, and could be handled in the same way with the addition of a dynamic tag to locate it. We have not specified handling here because we expect the linking process to remove most of the section duplication process which causes the problem, e.g. leaving only a small number of .text sections.

Compatibility

There should be no compatibility impact on existing environments, since only very large section counts require object file changes. Individual vendors can postpone implementation until convenient, with no impact on typical programs.

Note, however, that any ELF consumer applications that are currently storing section indices as 16-bit values must change.


[991014 All] Jim Dehnert will push these proposals to the base ABI committee.


[991118 All] A class vtable will be emitted with the key function (the first virtual function that is not inline at the point of class definition), if any. If there is no key function, it will be emitted in every compilation where used (i.e. referred to by name). It will be placed in a comdat group in all cases.
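
The key-function rule can be modelled with a few lines of code. This is a sketch only: VirtualFn and key_function() are hypothetical names, and skipping pure virtual functions is an assumption carried over from Christophe's 990629 description of the usual heuristic, since the resolution text itself mentions only "not inline".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one virtual function, in declaration order,
// as known at the end of the class definition.
struct VirtualFn {
    std::string name;
    bool inline_in_class;  // inline at the point of class definition
    bool pure;             // declared "= 0"
};

// Returns the position of the key function, or -1 if there is none, in
// which case the vtable is emitted wherever it is used.
inline int key_function(const std::vector<VirtualFn>& fns) {
    for (std::size_t i = 0; i < fns.size(); ++i) {
        if (!fns[i].inline_in_class && !fns[i].pure) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Applied to the virtual functions f, g, h, and i of struct X from the 990629 note, the function selects h, matching the statement there that the vtable is emitted in the translation unit defining h().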

# Issue Class Status Source Opened Closed
B-6 Virtual function table layout data closed SGI 990520 991028
Summary: What is the layout of the Vtable?
Resolution: See the Draft C++ ABI for IA-64, abi.html.

[990624] Issue split from A-1.

[990630 HP - Christophe]

The current full proposal has been incorporated in the Draft C++ ABI for IA-64.

[990701 All] The above arrived too late for everyone to read it carefully. It was agreed that we would consider it outside the meetings, discuss any issues noted by email, and attempt to close on 22 July. (Christophe is on vacation until that week, and Daveed leaves on vacation the next week.)

[990811 SGI -- Jim] I've put a reworked version of Christophe's writeup in the Draft C++ ABI for IA-64, along with a number of questions it raises.

[990812 All] Extensive discussion of this issue produced the observations that

Christophe will look at the implications of these observations. Others should too.

[990820 IBM -- Brian]

Re: vtable layout, sharing vtable offsets

I'm going to write the exam on this to see how well I am understanding the issue.

If I understand it correctly, the proposal under consideration is tied to the decision to replicate virtual function entries in vtables. It requires replicating in the vtable for base class B all virtual functions that are overridden in B; more replication than this implies will be wasted since a function is always called through a vtable of an introducing or overriding class.

When a non-pure virtual function X::f() is compiled it is possible to determine whether it requires a secondary entry point. It will require one if that function may be virtually called (i.e., is the final overrider) in any class in which f() appears in more than one vtable; this needs to be decidable knowing only X. A rule that works is: X::f() overrides one or more f()'s from base classes of X, and either one or more of those base classes are virtual or X fails to share its vptr with all instances of them.

[Though a virtual base may happen to share its vptr with X in an object of complete type X, that relationship may fail to hold in further derived classes, so we need to generate the secondary entry point just in case.] ["Sharing a vptr" is the condition under which no adjustment is necessary; if the bases involved are all nonvirtual then subsequent class derivation won't change this.]

Each vtable that requires a nonzero adjustment will have a "convert to X" offset mixed in with its virtual base offsets. It is necessary that a "convert to X" appears in the same position in each vtable that references X::f()'s secondary entry; it is desirable that the "convert to X" also be unique in each vtable.

Assume that X has nonvirtual nonprimary bases Nx (x=1,2,...), and virtual bases Vx, all of which have a virtual f(). Then vtables for Nx in X, or in any class derived from X that does not further override f(), will reference X::f()'s secondary entry. Vtables for Vx in X or any derived class where Vx does not share a vptr with X, will also reference X::f()'s secondary entry; note this will occur in a construction vtable even if the derived class does further override f().

The question, then, is whether a position for the "convert to X" offset can be chosen, knowing only X and its parentage, that can be used consistently in all those vtables and that won't collide with a "convert to Y" position chosen on account of some other hierarchy where Y::g() overrides an Nx::g() or Vx::g().

If Y derives from X, we will be able to select a "convert to Y" position that doesn't conflict, so we can restrict our attention to cases where X and Y are unrelated. Also, if the base involved is nonvirtual (Nx) then we are safe, because no instance of Nx will be a subobject of both X and Y, so no Nx vtable will require both "convert to X" and "convert to Y" offsets.

The remaining case is where X and Y are unrelated but both have a virtual base Vx:

struct V1 { virtual void f();  virtual void g(); };
struct Other1 { virtual void ignore1(); };
struct X : Other1, virtual V1 { virtual void f(); };

struct Y : Other1, virtual V1 { virtual void g(); };

struct ZZ: X, Y { };

The vtable for V1 in ZZ does require both offsets. The only way I see to accomplish this is to preallocate an adjustment slot for each virtual function in V1. That is, X::f() uses the first slot position, and Y::g() the second, based on the order that f() and g() are declared in V1. This only needs to be done in hierarchies where V1 is virtual, but the same offset has to be used for any Nx tables in X too.

Is this close?

Re: Concatenating vtables

I don't understand the comment that varying numbers of virtual base offsets make it impossible to concatenate vtables and refer to them via a single symbol. The only code that refers by name to X's vtable and the vtables of N1 in X etc. is X's constructor and destructor, and maybe some derived classes that find they are able to reuse some pieces. All that code is aware of X's declaration and can map out its tables. What am I missing?

[990826 All] There is still considerable confusion about what will work. Key questions are (1) whether member functions can share offsets to base classes, or each need their own; and (2) when we need a no-this-adjustment override entry.

[990901 SGI -- Jim] Being confused myself by all the discussion, I've constructed a new page containing (initially) an example of a class hierarchy supplied by Christophe, and attempted to identify possible function calls, the class data layout, and the class vtable layout based on Christophe's original proposal. Please provide corrections, and if you're proposing alternative vtable constructions, describing them for this example might help (me, at least). Also feel free to provide additional examples illustrating other points.

[990930 Cygnus -- Jason] Jason has updated the Vtable layout description in abi.html to reflect the approach from Cygnus and IBM.


[991014 all]

  1. Do we promote base offsets out of base class vtables? Answer: we promote them out of virtual bases, but we do not promote them out of nonvirtual bases. It's a time/space tradeoff. The time saving is large for virtual bases, but too small to bother with for nonvirtual bases.

  2. Do we have rtti fields for classes that have virtual bases but no virtual functions? The C++ standard regards such classes as nonpolymorphic, so performing rtti operations on them is undefined. Decision: we will keep the rtti fields themselves in the vtable, in the interest of having a uniform vtable format. The slot of offset to beginning of complete type will be filled in, and the slot for offset to typeinfo object will contain 0.

  3. When we discussed issue B-8, we agreed that we would have an offset to typeinfo object rather than a pointer to typeinfo object. This means that the typeinfo object is now part of the vtable. It will go at the very beginning, i.e. at a negative offset from where the vptr points. (Comment: We discussed B-6 before discussing B-8. Does making this change interfere with having a uniform vtable offset, since we won't have a typeinfo object at the beginning of a vtable for a nonpolymorphic class with virtual bases? Should we revisit decision (2) or (3), or am I just being paranoid?)

ACTION ITEMS: Jason---update writeup to reflect these three changes. Our decision on issue B-8 will require a one-sentence change. All of us: study the revised version. We are almost ready to close this issue, and if we agree with the revised version we can close it at the 21 October meeting.


[991028 all] It was agreed to accept the version currently in the Draft C++ ABI for IA-64, abi.html.

# Issue Class Status Source Opened Closed
B-7 Objects and Vtables in shared memory data closed HP 990624 990805
Summary: Is it possible to allocate objects in shared memory? For polymorphic objects, this implies that the Vtable must also be in shared memory.
Resolution: No special representation is useful in support of shared memory.

[990624 All] Note that putting GP in the Vtable prevents putting it in shared memory. This interacts with B-4.

[990624 HP -- Cary] For a C++ object to be placed into shared memory, its vtable pointer must be valid in all processes that are sharing that object.

  1. If the vtable can be placed in text, that would be fine, but the vtable contains function pointers (or descriptors) that require runtime relocation, so it must be in data.

  2. We can place the vtables in shared memory, but only if the function pointers/descriptors are valid in all processes. The entry point addresses, which refer to shared text, should be shareable, but the gp values may not be identical for all processes. (RTTI pointers are also an issue, and could be solved by putting the RTTI information in shared memory as well.)

  3. We can place the vtables in private memory, provided they are at the same address in all processes.

One way or another, we need a way of ensuring that a pointer from shared memory to private memory is valid in all processes, which means that we will need a means to ensure that certain shared library data segments can get mapped at the same address in all processes that load those certain libraries.

My wild idea a few years ago was to put the vtables in shared memory (by allocating and building them at load time, as Taligent did), and store a shared library index in place of the gp value in each function descriptor. Each process would have its own table of gp values, indexed by this shared library index, but the index space would be managed system-wide. The C++ runtime library would have been responsible for allocating a new index for each unique C++ shared library loaded on the system, then storing the process-local copy of the gp pointer in the appropriate slot of the table.
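As an illustration only, the indexing scheme described above might be modelled as follows. This is a sketch under stated assumptions, not part of any ABI: the names SharedDescriptor and ProcessGpTable are hypothetical, addresses are modelled as plain integers, and the system-wide allocation of indices is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical descriptor as it might sit in a vtable in shared memory:
// the entry point refers to shared text, and a system-wide library index
// takes the place of the per-process gp value.
struct SharedDescriptor {
    std::uintptr_t entry;          // entry point address (modelled as an integer)
    std::size_t    library_index;  // system-wide index of the owning shared library
};

// Hypothetical per-process table mapping a library index to this process's gp.
class ProcessGpTable {
public:
    // Called when a C++ shared library is loaded into this process.
    void record(std::size_t library_index, std::uintptr_t gp) {
        if (library_index >= gp_.size()) gp_.resize(library_index + 1, 0);
        gp_[library_index] = gp;
    }

    // Recover the process-local gp for a descriptor read from shared memory.
    std::uintptr_t gp_for(const SharedDescriptor& d) const {
        assert(d.library_index < gp_.size() && "library not loaded in this process");
        return gp_[d.library_index];
    }

private:
    std::vector<std::uintptr_t> gp_;
};
```

The same descriptor then yields a different gp in each process, which is the property the scheme relies on.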

[990628 SGI -- Jim] Note a further problem with vtables in shared memory (Cary's point 2). If a virtual function comes from another DSO, it may be pre-empted differently in different programs. Hence, the function pointer itself is a problem even if the GP isn't.

[990701 All] An extensive discussion boiled down to a few points:

These ideas are very fuzzy. Participants should think about the need and possibilities and attempt to identify more concrete approaches.

[990805 All] It was determined (largely based on consideration by Jason) that the only practical approach to putting objects in shared memory is to force the objects, Vtables, functions, etc. to the same addresses in the various processes involved. If this is done, data representation issues are irrelevant. Therefore, this issue is closed as moot.

Note that the base psABI defines a flag, EF_IA_64_ABSOLUTE, which forces an executable object to the addresses specified in ELF, so at least one method of representing this is already available.

# Issue Class Status Source Opened Closed
B-8 dynamic_cast data closed SGI 990628 991014
Summary: What information do we put in the vtable to enable (a) dynamic_cast from pointer-to-base to pointer-to-derived (including detection of ambiguous base classes) and (b) dynamic_cast to void*?
Resolution: The vtable will contain an offset to the beginning of the complete object, and a pointer to the typeinfo object.

[990701 All] This should be part of the proposal Daveed will put together by the 15th (action #13); the group will discuss it on the 22nd.

[990812 Sun -- Michael] Sun has provided a description, in a separate page, describing their implementation. They are filing for a patent on the algorithms described.

[991014 All] This is closely related to issues A-6 and B-6. It is agreed that what we need is an offset to the beginning of the complete object, and a pointer or offset to the typeinfo object. We choose to have an offset to the typeinfo object instead of a pointer, which effectively means that the typeinfo object is part of the vtable. We will put it at the very beginning, at a negative offset from the vptr.

[991027 SGI -- Matt] At the October 14 meeting we decided to include RTTI information as part of the vtable block, and to include an offset to RTTI information in the vtable rather than a pointer to RTTI information. (We decided on this change so that we would have fewer symbols to resolve at link time.)

Jim came up with a serious objection at the October 21 meeting: during construction we need different RTTI information at different points. A few of us talked about this at Kona, and my impression is that Jim's objection is fatal. We could imagine having base class typeinfo objects in every vtable block, but (1) this would kill any performance advantage we'd get by using an offset rather than a pointer; and (2) we'd lose the ability to use simple pointer identity as a way of telling whether two typeinfos represent the same type.

I propose that we abandon that decision, and go back to using pointers. Does everyone agree?

[991028 All] Agreed.
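A minimal sketch of how the two agreed fields could be used, assuming a simplified model: VtableRttiFields, ModelSubobject, and the helper functions are hypothetical names, and the vptr here points directly at the two fields rather than at the address point the ABI document defines.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the two RTTI-related words kept in a vtable.
struct VtableRttiFields {
    std::ptrdiff_t offset_to_top;  // offset from this subobject to the complete object
    const void*    typeinfo;       // pointer to the typeinfo object (the final decision)
};

// Hypothetical model of a subobject whose first word is its vptr.
// Simplification: the vptr points straight at the fields above.
struct ModelSubobject {
    const VtableRttiFields* vptr;
};

// The computation dynamic_cast<void*> needs: locate the complete object.
inline void* model_complete_object(ModelSubobject* sub) {
    assert(sub != nullptr && sub->vptr != nullptr);
    return reinterpret_cast<char*>(sub) + sub->vptr->offset_to_top;
}

// With pointers, "same type" reduces to pointer identity of the typeinfo.
inline bool model_same_type(const ModelSubobject& a, const ModelSubobject& b) {
    return a.vptr->typeinfo == b.vptr->typeinfo;
}
```

A secondary subobject at byte offset N within its complete object would carry offset_to_top equal to -N in this model.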

# Issue Class Status Source Opened Closed
B-9 Primary base vtable embedding data closed Cygnus 000217 000302
Summary: Resolve the embedding of the vtable for the primary base class in the derived class vtable.
Resolution: Any class with virtual bases shall contain vbase pointers for all of its virtual bases.

[000217 All] Jason noticed an issue today involving the layout of primary vtables.

Our chosen layout starts with the primary base class vtable layout (if any), and adds additional vbase/vcall offsets to the beginning, and additional vfunc pointers at the end. It is then followed by the secondary vtables, in inheritance graph order.

We have assumed, for instance in our decision not to propagate vbase offsets from non-virtual bases, that the secondary vtables were directly accessible at compile-time offsets from the primary vptr. However, this is not currently the case if we are dealing with a class that is the primary base of a derived class. The derived class's additional vfunc pointers will be added between the base class vtable and its secondary vtables for the base's base classes. Therefore, non-overridden base class member functions, at least, can't make assumptions about secondary vtable offsets.

One can, of course, get to the secondary vtable via the secondary vptr in the object, but that costs an additional load.

There is a "solution" that should work, but is a touch ugly. That would be to place the additional vfunc fields for the derived class not immediately after the primary base vtable, but after all of its non-virtual secondary vtables. If we don't think this is worthwhile, we should reconsider the decision about promoting vbase offsets.

[000302 All] It was decided that the simplest solution is to include vbase pointers for all virtual bases, even those with vbase pointers in direct base vtables. They may then be referenced via either the primary or the secondary vtable.

# Issue Class Status Source Opened Closed
B-10 Pure virtual runtime call closed CodeSourcery 000629 000706
Summary: Define a runtime proxy routine for pure virtual functions.
Resolution: Define such a runtime routine, with implementation-defined behavior.

[000629 CodeSourcery -- Mark] We need to have a standard entry point to put in vtables to indicate a pure virtual function. (Some compilers use __pure_virtual, for example.) I think we want:

  extern "C" void __cxa_pure_virtual ();

[000706 All] Accepted. We will not mandate behavior, since this will be called only in case of Standard-specified undefined behavior, but will comment that program termination is expected, possibly after an error message.
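The role of the proxy can be sketched as below. All names are hypothetical stand-ins: demo_pure_virtual plays the part of __cxa_pure_virtual (whose real name is reserved for the implementation and which has C linkage), and demo_slot_for models the compiler's choice of what to store in a vtable slot. The termination behaviour shown is one permitted choice, not a mandated one.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

using DemoVtableSlot = void (*)();

// Hypothetical stand-in for the runtime proxy routine: report and terminate.
inline void demo_pure_virtual() {
    std::fputs("pure virtual function called\n", stderr);
    std::abort();
}

// Model of filling a vtable slot: a pure virtual function with no
// definition gets the proxy, so the slot is never left empty.
inline DemoVtableSlot demo_slot_for(DemoVtableSlot definition) {
    return definition != nullptr ? definition : &demo_pure_virtual;
}

// Counter and sample function used only to make a normal call observable.
inline int& demo_defined_calls() {
    static int count = 0;
    return count;
}
inline void demo_defined() { ++demo_defined_calls(); }

// Every slot produced by demo_slot_for is callable.
inline void demo_call(DemoVtableSlot slot) {
    assert(slot != nullptr);
    slot();
}
```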


C. Object Construction/Destruction Issues

# Issue Class Status Source Opened Closed
C-1 Interaction with .init/.fini lif ps closed SGI 990520 991202
Summary: Static objects with dynamic constructors must be constructed at initialization time. This is done via the executable object initialization functions that are identified (in ELF) by the DT_INIT and DT_INIT_ARRAY dynamic tags. How should the compiler identify the constructors to be called in this way? One traditional mechanism is to put calls in a .init section. Another, used by HP, is to put function addresses in a .init_array section.

The dual question arises for static object destructors. Again, the extant mechanisms include putting calls in a .fini section, or putting function addresses in a .fini_array section.

Finally, which mechanism (DT_INIT or DT_INIT_ARRAY, or the FINI versions) should be used in linked objects? The gABI, and the IA-64 psABI, will support both, with DT_INIT being executed before the DT_INIT_ARRAY elements.

Resolution: Use .init_array and .fini_array sections.

[991202 All] It was decided to use the array forms for all required initialization or finalization entries, i.e. to put initialization entries into .init_array sections with ELF section type SHT_INIT_ARRAY, and finalization entries into .fini_array sections with ELF section type SHT_FINI_ARRAY. The static linker will combine them, and identify them to the dynamic linker using DT_INIT_ARRAY, DT_INIT_ARRAYSZ, DT_FINI_ARRAY, and DT_FINI_ARRAYSZ dynamic tags.
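From the source side, the entries in question arise from declarations such as the ones in this sketch. Tracer and init_log are hypothetical names; how many .init_array entries a compiler emits for them is an implementation matter that the code does not show. Within one translation unit the language guarantees construction in order of definition.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Log of construction order; a function-local static so that it exists
// before the first static object below records into it.
inline std::vector<std::string>& init_log() {
    static std::vector<std::string> log;
    return log;
}

// A class whose constructor has a visible side effect, so objects of this
// type need dynamic initialization before main is entered.
struct Tracer {
    explicit Tracer(const char* name) {
        assert(name != nullptr);
        init_log().push_back(name);
    }
};

// Static objects with dynamic constructors.
static Tracer first_object("first");
static Tracer second_object("second");
```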

# Issue Class Status Source Opened Closed
C-2 Order of ctors/dtors w.r.t. link lif ps closed HP 990603 000817
Summary: Given that the compiler has identified constructor/destructor calls for static objects in each relocatable object, in what order should the static linker combine them in the linked executable object? (The initialization order determines the finalization order, as its opposite.)
Resolution: Accepted method based on IBM's specification. See Draft C++ ABI for IA-64, Section 3.3.4.

[990610 All] Meeting consensus is that the desirable order is right to left on the link command line, i.e. last listed relocatable object is initialized first.


[990701 SGI] We propose that global constructors be handled as follows:

This does not address the global destructor problem. That solution needs to deal not only with the global objects seen by the compiler, but also interspersed local static objects. This treatment seems to be tied up in the question of how early unloading of DSOs is handled, and the data structure used for that purpose (issue C-3).


[990715 All] Cygnus scheme: priorities are 16-bit unsigned integers, lower numbers are higher priority. In each translation unit, there's a single initialization function for each priority. Anything that's prioritized has a higher priority than anything that isn't explicitly assigned a priority.

IBM scheme: priorities are 32-bit signed integers, higher numbers are higher priority. Something that isn't explicitly assigned a priority effectively gets a priority of 0.

Consensus: nobody is sure that negative priorities are very important, but also nobody can think of a reason not to allow them. We accept the idea that priorities are 32-bit signed integers. On a source level Cygnus will keep lower numbers as higher priority, but that's a source issue, not an ABI issue.

Status: No real technical issues, we have consensus on everything that matters. We need to write up the finicky details.


[990722 all] It was decided to follow the IBM approach, including:

To be resolved are the precise source pragma definition (possibly IBM's), and the ELF file representation.


[990729 all] SGI suggested an object representation involving (in relocatables) a new section type, containing pairs <priority, entry address>. The linker would merge all such sections, include any initialization entries specified by other means, and leave one or more DT_INIT_ARRAY entries for normal runtime initialization, either building a routine to call the entries, or referencing a standard runtime routine.

IBM noted that they combine their equivalent data structures in the linker, but don't sort them, leaving that to a runtime routine. This can be done without explicit linker support, but involves runtime overhead.

Cygnus suggested that if we are going to require linker sorting, we should make the facility more general.

Jim will write up a more precise proposal.


[990804 SGI -- Jim]

Proposal

My objectives are:

Object File Representation

Define a new section type, e.g. SHT_CXX_PRIORITY_INIT. Its elements are structs:

	typedef struct {
	  ElfXX_Word	pi_pri;
	  ElfXX_Addr	pi_addr;
	} ElfXX_Cxx_Priority_Init;

The semantics are that pi_addr is a function pointer, with an unsigned int priority parameter, which performs some initialization at priority pi_pri. Each of these functions will be called with the GP of the executable object containing the table. The section header field sh_entsize is 8 for ELF-32, or 16 for ELF-64.

Runtime Library Support

Each implementation shall provide a runtime library function with prototype:

  void __cxx_priority_init ( ElfXX_Cxx_Priority_Init *pi, int cnt );

It will be called with the address of a cnt-element (sub-)vector of the priority initialization entries, and will call each of them in order. It will be called with the GP of the initialization entries.
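The behaviour described for this routine might be sketched as follows. DemoPriorityInit and demo_priority_init are hypothetical stand-ins for ElfXX_Cxx_Priority_Init and __cxx_priority_init; GP handling is not modelled, and the recorder functions exist only to make the call order observable.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for ElfXX_Cxx_Priority_Init.
struct DemoPriorityInit {
    std::uint32_t pi_pri;            // models the ElfXX_Word priority
    void (*pi_addr)(unsigned int);   // models the function address
};

// Hypothetical stand-in for __cxx_priority_init: call each entry of the
// cnt-element (sub-)vector in order, passing the entry's priority.
inline void demo_priority_init(const DemoPriorityInit* pi, int cnt) {
    assert(cnt == 0 || pi != nullptr);
    for (int i = 0; i < cnt; ++i) {
        pi[i].pi_addr(pi[i].pi_pri);
    }
}

// Recorder used only for illustration.
inline std::vector<unsigned int>& demo_init_calls() {
    static std::vector<unsigned int> calls;
    return calls;
}
inline void demo_record_init(unsigned int pri) {
    demo_init_calls().push_back(pri);
}
```

Because the routine takes a pointer and a count, a caller can run any contiguous slice of an already sorted table, which is what the implementation schemes discussed later depend on.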

Linker Processing

The linker must take the collection of SHT_CXX_PRIORITY_INIT section entries from the relocatable object files being linked, and other initialization tasks specified in other ways (and treated as source priority 0 or object priority -MIN_INT), and produce an executable object file which executes the initialization tasks in priority order using only DT_INIT, DT_INIT_ARRAY, and __cxx_priority_init. Priority order is first according to the priority of the task, and then according to the order of relocatable objects and options in the link command. The order of tasks specified by other methods, relative to SHT_CXX_PRIORITY_INIT tasks of priority zero, is implementation defined. There are several possible implementations. Two extremes are:

Note that if one is linking ELF-32 objects into a 64-bit program, the entries must be expanded as part of this process.

Sorting Sections

Jason suggested that if we base this feature on sorting sections, we should provide a general mechanism. Following is a proposal for that purpose.

Define a new section header flag, SHF_SORT. If present, the linker is required to sort the elements of the concatenated sections of the same type, where the elements are determined by sh_entsize. The sort is controlled by fields in sh_info:

#define SH_INFO_KEYSIZE(info) (info & 0xff)
The size of the sort key (bytes).

#define SH_INFO_KEYSTART(info) ((info>>8) & 0xff)
The start byte of the sort key within element, from 0.

#define SH_INFO_SORTKIND(info) ((info>>16) & 0xf)
The kind of sort data: 0 for unsigned integer, 1 for signed integer.

The sort must be stable. The sort key must be naturally aligned.
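For illustration, the three sh_info fields can be decoded and packed as in this sketch. The accessor functions mirror the macros above; make_sort_info is a hypothetical helper that is not part of the proposal.

```cpp
#include <cassert>
#include <cstdint>

// Function forms of the three proposed macros.
constexpr std::uint32_t sh_info_keysize(std::uint32_t info)  { return info & 0xff; }
constexpr std::uint32_t sh_info_keystart(std::uint32_t info) { return (info >> 8) & 0xff; }
constexpr std::uint32_t sh_info_sortkind(std::uint32_t info) { return (info >> 16) & 0xf; }

// Hypothetical helper: pack key size, key start, and sort kind into sh_info.
inline std::uint32_t make_sort_info(std::uint32_t keysize,
                                    std::uint32_t keystart,
                                    std::uint32_t sortkind) {
    assert(keysize <= 0xff && keystart <= 0xff && sortkind <= 0xf);
    return keysize | (keystart << 8) | (sortkind << 16);
}
```

Under these definitions, a 4-byte signed key at byte 0 of each element (one plausible description of the priority field) would be encoded as make_sort_info(4, 0, 1), i.e. 0x10004.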

Other conceivable options would be to allow sorting strings (like SHF_MERGE, this would be indicated by setting SHF_STRING and putting the character size in sh_entsize), or floating point data. Also, note that if we don't anticipate using such a general mechanism, it becomes possible to avoid padding words in the ELF-64 format by separating the priority and address vectors.


[990810 HU-B -- Martin] Global destructor ordering must not only interleave with static locals, but also with atexit. This gives two problems: atexit is only guaranteed to support 32 functions; and dynamic unloading of DSOs breaks when functions are atexit registered.


[990810 SGI -- Matt] Yes, the interleaving is required by the C++ standard. It's a nuisance, and I don't think there's any good reason for it, but the requirement is quite explicit.

The relevant part of the C++ standard is section 3.6.3, paragraph 3:

"If a function is registered with atexit (see <cstdlib>, 18.3) then following the call to exit, any objects with static storage duration initialized prior to the registration of that function shall not be destroyed until the registered function is called from the termination process and has completed. For an object with static storage duration constructed after a function is registered with atexit, then following the call to exit, the registered function is not called until the execution of the object's destructor has completed. If atexit is called during the construction of an object, the complete object to which it belongs shall be destroyed before the registered function is called."

What this implies to me is that atexit, and the part of the runtime library that handles destructors for static objects, must know about each other.


[990812 All] Some people would prefer a sorting scheme based on the section name instead of the data, and also less linker impact. Jim will look into alternatives.


[991110 SGI -- Jim] I said I would revisit my proposal, looking at two questions:

  1. Can we get less linker impact?
  2. Can we sort based on section name instead of data?
I'll address them separately.

A) Linker impact

I believe the proposal as made need have almost no linker impact. Consider the second suggested implementation scheme, based on IBM's description of their approach.

A minimalist implementation (from the linker point of view) includes:

  1. The link components are bracketed (either by a driver constructing the command line, or by implicit arguments generated within the linker) by two INIT_ARRAY entries. The first calls

    __cxx_priority_init_begin()

    The one at the end calls

    __cxx_priority_init_end()

    These are both in the implementation runtime. The begin routine determines the address and size of the SHT_CXX_PRIORITY_INIT section (below). It sorts the section by priority, and calls __cxx_priority_init(addr,cnt) as described in the proposal with the address and count of the <=0 entries.

    __cxx_priority_init_end calls __cxx_priority_init(addr,cnt) with the address and count of >0 entries.

  2. The linker simply concatenates the SHT_CXX_PRIORITY_INIT sections, and emits markers (DT entries) that allow __cxx_priority_init_begin to find the section and its size. At the same time, it creates an init_array section from other (i.e. non-constructor) entries as it normally would, which of course gets bracketed by the entries described above.

  3. At runtime, when loading the executable object, the init_array entries are executed, thereby sorting the constructor entries, executing the <=0-priority entries, executing the non-constructor entries, and finally executing the >0-priority entries.

My original proposal did not describe the dynamic tags to delimit the section, nor the __cxx_priority_init_ routines. Given such an approach, it's hard for me to imagine much less linker impact.
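The work split between the begin and end routines in the minimalist scheme can be sketched as follows. The names DemoInitEntry, DemoInitPlan, and demo_plan_init are hypothetical; the sketch assumes the stored key is read as a signed 32-bit integer and sorted in ascending order, which matches the "<=0 first, >0 last" execution order described above but is otherwise an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one concatenated section entry.
struct DemoInitEntry {
    std::int32_t priority;  // priority key, read as signed
    int          id;        // stands in for the function address
};

// The two slices handed to the runtime: one for "begin", one for "end".
struct DemoInitPlan {
    std::vector<DemoInitEntry> before;  // priority <= 0, run by the begin routine
    std::vector<DemoInitEntry> after;   // priority > 0, run by the end routine
};

// Sort the concatenated entries by priority, keeping link order among
// equal priorities (stable sort), then split at the first positive key.
inline DemoInitPlan demo_plan_init(std::vector<DemoInitEntry> entries) {
    std::stable_sort(entries.begin(), entries.end(),
                     [](const DemoInitEntry& a, const DemoInitEntry& b) {
                         return a.priority < b.priority;
                     });
    auto split = std::find_if(entries.begin(), entries.end(),
                              [](const DemoInitEntry& e) { return e.priority > 0; });
    DemoInitPlan plan;
    plan.before.assign(entries.begin(), split);
    plan.after.assign(split, entries.end());
    assert(plan.before.size() + plan.after.size() == entries.size());
    return plan;
}
```

The non-constructor init_array entries would run between the two slices; they are outside the model.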

Now suppose you want to minimize runtime instead of linker impact -- the first suggested implementation scheme. There are at least two approaches:

One of my original objectives, and I think a key attribute of this proposal, is that this full range of possible implementations, from minimal linker impact to minimal runtime impact, makes absolutely no difference to the generated .o files -- compatibility between compilers does not depend on the chosen link-time implementation.

B) Sorting approach

Sorting is a more interesting issue. I see four possibilities:

  1. No sorting -- the low-linker-impact approach above.

  2. Implicit sorting -- the low-runtime approach above, with knowledge explicit in the linker about how to sort SHT_CXX_PRIORITY_INIT.

  3. Explicit sorting within a section, e.g. what my proposal described, based on an explicit sorting specification that describes the size of objects to be sorted and the key location.

  4. Explicit sorting of sections, based on a sort key encoded in the section name (for example).

I'll say up front that I think implicit sorting is adequate for the purpose at hand, and I'd like to understand other applications before I'd choose (3) or (4).

There are two differences between (3) and (4):

Either would work for the application at hand. Approach (3) would require only one SHT_CXX_PRIORITY_INIT section per .o file, while approach (4) would require up to one such section per constructor call (though only if the user used lots of different priorities). I personally think sorting based on a data vector that's already been concatenated should be much more efficient, but it probably doesn't matter much.

On the other hand, sorting an arbitrarily-sized section, based on an external key, is more flexible except that the keys may be more constrained. So, again, I think the choice comes down to other applications of the feature. Absent significant other demands, I'd just stick to implicit sorting (and optional at that) for now.


[991202 All] An extensive discussion failed to reach consensus, but clarified the issues.

The proposed alternative of sorting based on section name is specifically the Linux implementation of treating all section names containing a dollar sign ($) as being a section name before the dollar sign and a sort key after it. As mentioned above, this has the advantage of being more general, except with respect to the sort key, which isn't an issue here, and it is implemented in Linux.
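The naming convention can be illustrated with a small sketch. demo_split_section_name is a hypothetical helper; splitting at the first dollar sign (rather than the last) is an assumption, as is the example section name used in the usage note.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Split a section name into (name before '$', sort key after '$').
// A name without a dollar sign has an empty sort key.
// Assumption: the first '$' is the separator.
inline std::pair<std::string, std::string>
demo_split_section_name(const std::string& name) {
    assert(!name.empty());
    const std::string::size_type dollar = name.find('$');
    if (dollar == std::string::npos) {
        return {name, std::string()};
    }
    return {name.substr(0, dollar), name.substr(dollar + 1)};
}
```

For a made-up name such as ".demo_init$00100", the helper yields the section name ".demo_init" and the sort key "00100".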

The primary concern with the Linux approach is that some implementations must deal with static linkers which are under control of other groups or companies, and therefore can't depend on getting linker sorting implemented. IBM has been in that position, though it isn't clear whether it will be an issue on IA-64.

A secondary concern is a general objection from SGI to features that depend on section naming rather than section types and attributes.

Jim will attempt to frame the issue and get feedback from the base ABI group.

[000106 All] We will wait for base ABI feedback before deciding.


[000502 SGI -- Jim] The base ABI group is not particularly interested in this, because they are not getting pressure from their C++ people to worry about it. So, if we want to standardize this, we need to apply pressure within our companies.

We have three choices:

I don't think we should pursue the first unless we have vendors anxious to support it.


[000504 All] The sense of the meeting was that since multiple vendors are going to implement this capability, the ABI will be much healthier if we can agree on the implementation. Otherwise, object files cannot be mixed. We will pursue this further.

[000720 All] Jim reported that the psABI group agreed to allocate a section type for this purpose, and will add a writeup to the Draft ABI (section 3.3.4).

[000803 All] We will follow more closely the IBM pragma semantics: no variable names, applying until the next pragma or end of file. Rename the pragma simply "priority."

[000808 SGI -- Dehnert] I remembered why I changed the pragma name. I'm concerned about "priority" conflicting with more traditional uses of the term, e.g. for multiprocessing priority.

[000817 All] Accepted, changing pragma name from init_priority to priority. There is no conflict with OpenMP or pthreads.

# Issue Class Status Source Opened Closed
C-3 Order of ctors/dtors w.r.t. DSOs ps closed HP 990603 000504
Summary: Given the constructor/destructor calls for each executable object comprising a program, what is the order of execution between objects? For constructors, there is not much question: unless we choose some explicit means of control, file-scope objects will be initialized by the DT_INIT/DT_INIT_ARRAY functions in the order determined by the base ABI order rules, and local objects will be initialized in the order their containing scopes are entered.

For destructors, the Standard requires opposite-order destruction, which implies a runtime structure to keep track of the order. Furthermore, the potential for dynamic unloading of a DSO (e.g. by dlclose) requires a mechanism for early destruction of a subset.

Resolution: Accept SGI proposal for a simple API which registers destructors and atexit calls. Subsequently, accept proposal to eliminate call to __cxa_finalize when program exits.


[990804 SGI -- Jim]

Proposal

My objectives are:

Runtime Data Structure

The runtime library shall maintain a list of termination functions with the following information about each:

The representation of this structure is implementation defined. All references are via the API described below.

Runtime API

  1. Object construction:

    When a global or local static object is constructed, which will require destruction on exit, a termination function is registered as follows:

    int __cxa_atexit ( void (*f)(void *), void *p, dso_handle d );

    This registration, e.g. __cxa_atexit(f,p,d), is intended to cause the call f(p) when DSO d is unloaded, before all such termination calls registered before this one. It returns zero if registration is successful, nonzero on failure. Should we use exceptions instead?

    The registration function is called separately from the constructor.

  2. User atexit calls:

    When the user registers exit functions with atexit, they should be registered with NULL parameter and DSO handle, i.e.

    __cxa_atexit ( f, NULL, NULL );

    It is expected that implementations supporting both C and C++ will integrate this capability into the libc atexit implementation, so that C-only DSOs will nevertheless interact with C++ programs in a C++-standard-conforming manner. No user interface to __cxa_atexit is supported, so the user is not able to register an atexit function with a parameter or a home DSO.

  3. Termination:

    When linking any DSO containing a call to __cxa_atexit, the linker should define a hidden symbol __dso_handle, with a value which is an address in one of the object's segments. (It doesn't matter what address, as long as they are different in different DSOs.) It should also include a call to the following function in the FINI list (to be executed first):

    void __cxa_finalize ( dso_handle d );

    The parameter passed should be __dso_handle.

    Note that the above can be accomplished either by explicitly providing the symbol and call in the linker, or by implicitly including a relocatable object in the link with the necessary definitions, using a .fini_array section for the FINI call. Also, note that these can be omitted for an object with no calls to __cxa_atexit, but they can be safely included in all objects.

    Finally, a main program should be linked with a FINI call to __cxa_finalize with NULL parameter.

    When __cxa_finalize(d) is called, it should walk the termination function list, calling each in turn if d matches __dso_handle for the termination function entry. If d == NULL, it should call all of them. Multiple calls to __cxa_finalize should not result in calling termination function entries multiple times; the implementation may either remove entries or mark them finished.

    Issue: By passing a NULL-terminated vector of DSO handles to __cxa_finalize instead of one, we could deal with unloading multiple DSOs at once. However, dlclose closes one at a time, so I'm not sure the extra complexity is worthwhile.

Since __cxa_atexit and __cxa_finalize must both manipulate the same termination function list, they must be defined in the implementation's C++ runtime library, rather than in the individual linked objects.
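The registration and finalization rules above can be modelled in a few lines. This is a sketch only: the demo namespace and its functions are hypothetical stand-ins for __cxa_atexit and __cxa_finalize, the list representation is one arbitrary choice (the proposal leaves it implementation defined), and no locking is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace demo {

// One entry of the termination function list.
struct Termination {
    void (*fn)(void*);  // termination function f
    void* arg;          // its parameter p
    void* dso;          // home DSO handle d (null for user atexit calls)
    bool  done;         // marked once called, so it never runs twice
};

inline std::vector<Termination>& termination_list() {
    static std::vector<Termination> list;
    return list;
}

// Models __cxa_atexit(f, p, d): zero on success, nonzero on failure.
// Rejecting a null function is this sketch's only failure case.
inline int register_termination(void (*f)(void*), void* p, void* d) {
    if (f == nullptr) return -1;
    termination_list().push_back(Termination{f, p, d, false});
    return 0;
}

// Models __cxa_finalize(d): walk the list in reverse registration order,
// calling entries whose DSO handle matches d, or all entries if d is null.
inline void finalize(void* d) {
    std::vector<Termination>& list = termination_list();
    for (std::size_t i = list.size(); i-- > 0; ) {
        if (list[i].done) continue;
        if (d != nullptr && list[i].dso != d) continue;
        list[i].done = true;
        Termination t = list[i];  // copy: the call may register new entries
        assert(t.fn != nullptr);
        t.fn(t.arg);
    }
}

// Recorder used only to make the call order observable.
inline std::vector<int>& destroyed() {
    static std::vector<int> order;
    return order;
}
inline void record(void* p) { destroyed().push_back(*static_cast<int*>(p)); }

}  // namespace demo
```

Finalizing one DSO handle runs only that DSO's entries, most recently registered first; a later call with a null handle runs whatever remains, and repeated calls have no further effect.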


[991202 All] The proposal above is accepted, with three changes (integrated above):

During discussion, it was noted that this proposal will not deal effectively with DSOs which (a) have cross-DSO destructor interactions and (b) are unloaded dynamically. It is generally believed that such code would not reliably work on a variety of platforms today, and is not a robust methodology worthy of ABI support. However, note that if it becomes an issue, it would be possible to define a __cxa_finalize analog which takes a list of DSOs instead of a single DSO, and if the program or dynamic linker identifies a set of DSOs to be unloaded together, run their finalization entries in a single pass instead of one DSO at a time.


[991215 CodeSourcery -- Mark] Note that the type of "__dso_handle" above is not specified. Since the simplest implementation is for the static linker to resolve it into an arbitrary address in the DSO, define it as "void *".


[991216 CodeSourcery -- Mark]

What I'm suggesting (for exit finalization) is:


[991217 CodeSourcery -- Mark] I've attached the GNU libc source files. Basically, none of these routines are implemented in terms of the others; instead, they just share a common data structure. I think the source will make it clear; none of these files is more than 50 lines or so.

================================
=====  filename="cxa_atexit.c"
================================

/* Copyright (C) 1999 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB.  If not,
   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
   Boston, MA 02111-1307, USA.  */

#include <stdlib.h>
#include "exit.h"

/* Register a function to be called by exit or when a shared library
   is unloaded.  This function is only called from code generated by
   the C++ compiler.  */
int
__cxa_atexit (void (*func) (void *), void *arg, void *d)
{
  
  struct exit_function *new = __new_exitfn ();

  if (new == NULL)
    return -1;

  new->flavor = ef_cxa;
  new->func.cxa.fn = func;
  new->func.cxa.arg = arg;
  new->func.cxa.dso_handle = d;
  return 0;
}

================================
=====  filename="cxa_finalize.c"
================================

/* Copyright (C) 1999 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB.  If not,
   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
   Boston, MA 02111-1307, USA.  */

#include <stdlib.h>
#include "exit.h"

/* If D is non-NULL, call all functions registered with `__cxa_atexit'
   with the same dso handle.  Otherwise, if D is NULL, do nothing.  */

void
__cxa_finalize (void *d)
{
  struct exit_function_list *funcs;

  if (!d)
    return;

  for (funcs = __exit_funcs; funcs; funcs = funcs->next)
    {
      struct exit_function *f;

      for (f = &funcs->fns[funcs->idx - 1]; f >= &funcs->fns[0]; --f)
        {
          if (f->flavor == ef_cxa && d == f->func.cxa.dso_handle)
            {
              (*f->func.cxa.fn) (f->func.cxa.arg);
              /* We don't want to run this cleanup again.  */
              f->flavor = ef_free;
            }
        }
    }
}

===========================
=====  filename="atexit.c"
===========================

/* Copyright (C) 1991, 1996, 1999 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB.  If not,
   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
   Boston, MA 02111-1307, USA.  */

#include <bits/libc-lock.h>
#include <stdlib.h>
#include "exit.h"


/* Register FUNC to be executed by `exit'.  */
int
atexit (void (*func) (void))
{
  struct exit_function *new = __new_exitfn ();

  if (new == NULL)
    return -1;

  new->flavor = ef_at;
  new->func.at = func;
  return 0;
}


/* We change global data, so we need locking.  */
__libc_lock_define_initialized (static, lock)


static struct exit_function_list initial;
struct exit_function_list *__exit_funcs = &initial;

struct exit_function *
__new_exitfn (void)
{
  struct exit_function_list *l;
  size_t i = 0;

  __libc_lock_lock (lock);

  for (l = __exit_funcs; l != NULL; l = l->next)
    {
      for (i = 0; i < l->idx; ++i)
        if (l->fns[i].flavor == ef_free)
          break;
      if (i < l->idx)
        break;

      if (l->idx < sizeof (l->fns) / sizeof (l->fns[0]))
        {
          i = l->idx++;
          break;
        }
    }

  if (l == NULL)
    {
      l = (struct exit_function_list *)
        malloc (sizeof (struct exit_function_list));
      if (l != NULL)
        {
          l->next = __exit_funcs;
          __exit_funcs = l;

          l->idx = 1;
          i = 0;
        }
    }

  /* Mark entry as used, but we don't know the flavor now.  */
  if (l != NULL)
    l->fns[i].flavor = ef_us;

  __libc_lock_unlock (lock);

  return l == NULL ? NULL : &l->fns[i];
}

===========================
=====  filename="on_exit.c"
===========================

/* Copyright (C) 1991, 1996 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB.  If not,
   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
   Boston, MA 02111-1307, USA.  */

#include <stdlib.h>
#include "exit.h"

/* Register a function to be called by exit.  */
int
__on_exit (void (*func) (int status, void *arg), void *arg)
{
  struct exit_function *new = __new_exitfn ();

  if (new == NULL)
    return -1;

  new->flavor = ef_on;
  new->func.on.fn = func;
  new->func.on.arg = arg;
  return 0;
}
weak_alias (__on_exit, on_exit)

========================
=====  filename="exit.h"
========================

/* Copyright (C) 1991, 1996, 1997 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB.  If not,
   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
   Boston, MA 02111-1307, USA.  */

#ifndef _EXIT_H
#define _EXIT_H 1

struct exit_function
  {
    enum {
       ef_free, ef_us, ef_on, ef_at, ef_cxa } flavor;
		/* `ef_free' MUST be zero! */
    union
      {
        void (*at) (void);
        struct
          {
            void (*fn) (int status, void *arg);
            void *arg;
          } on;
        struct
          {
            void (*fn) (void *arg);
            void *arg;
            void *dso_handle;
          } cxa;
      } func;
  };
struct exit_function_list
  {
    struct exit_function_list *next;
    size_t idx;
    struct exit_function fns[32];
  };
extern struct exit_function_list *__exit_funcs;

extern struct exit_function *__new_exitfn (void);

#endif  /* exit.h  */


[991220 SGI -- Jim]

In the ELF context assumed by the base IA-64 ABI, I expect that a C++ program will typically be running with the C run-time library libc.so, the C++ runtime library libC.so, likely other system DSOs, and its own components.

In this context, achieving an integrated solution could be accomplished in a couple of ways. The obvious one is to replace the routines atexit, on_exit, and exit in the C run-time library with routines that are cognizant of the C++ __cxa_atexit and __cxa_finalize facilities. A less obvious method, but still generally usable, would be to insert C++-specific versions of them in the C++ runtime library, and depend on preemption to achieve the replacement. This works as long as libC.so precedes libc.so in the library list.

There are other possible non-integrated solutions, but given the assumption of the underlying IA-64 ABI, and the fact that the second solution above can work without changing the underlying C run-time library, it doesn't seem necessary to consider them.

What is an issue, however, is that the application could in theory be linked on a different system than the one where it ultimately runs, and therefore presumably on a different system than that which built the run-time library DSOs. It is that interface which we need to pin down, namely (a) what routines (with what interfaces and semantics) must be present in libC.so/libc.so, and (b) what sequences of calls the libraries may assume the program will make.

We appear to be agreed on the presence of __cxa_atexit and __cxa_finalize in libC.so, on the registration of C++ destructors and C atexit cleanup with __cxa_atexit, and on the use of __cxa_finalize for destructor execution upon early unloading. The open questions are (1) whether (or how) on_exit registration can be integrated, and (2) how the final cleanup is invoked.

The originally adopted proposal ignored (1) out of ignorance, and answered (2) by specifying a call to __cxa_finalize(NULL). If (1) is addressed by calling __cxa_atexit for on_exit with a parameter, and passing an additional exit code parameter to __cxa_finalize (and thence to all the finalization actions it invokes), this works, i.e. on_exit works as currently defined by Sun and is properly integrated into the finalization order. But that assumes that the exit code is available for passing to __cxa_finalize, which may imply calling it from exit if it's not available to a .fini_array routine (which was what the original proposal specified).

Mark points out that it works to just assume that exit does the call to __cxa_finalize, or performs the equivalent processing, eliminating the need for the explicit __cxa_finalize call in .fini_array. This is slightly simpler in that it doesn't require generation of the .fini_array entry, and the library implementation can coordinate features like on_exit without exposing the interfaces necessary to implement them. It also probably preserves more faithfully the traditional semantics that atexit routines are executed before the main program .fini_array, although doing __cxa_finalize first in the latter should produce the same effect.

Note that we can't just not choose -- one approach requires the builder of the main executable to insert a .fini_array entry, while the other doesn't -- unless we want to require the run-time to handle either, which doesn't seem useful.

My current preference is to proceed with Mark's proposal, requiring that exit handle the __cxa_atexit-registered calls (but _not_ requiring that anyone explicitly register __cxa_finalize or anything else to accomplish that). Upon re-reading all the mail, this seems quite workable. In any case, I'll re-open the issue and we can discuss it next time.


[000504 All] Accept Mark's proposal. Jim will add to Draft C++ ABI for IA-64.
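
The accepted arrangement can be illustrated with a toy model. The sketch below is not the glibc code shown above; `toy_cxa_atexit`, `toy_cxa_finalize`, `toy_exit_cleanup`, and the `ToyEntry` record are hypothetical names, and locking is omitted. It only shows the intended ordering: early unloading of one DSO runs that DSO's registrations in reverse order, and exit itself later runs whatever is still pending, with no separate .fini_array entry.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the library's internal list of registered cleanups.
struct ToyEntry {
    void (*fn)(void*);
    void* arg;
    void* dso_handle;
    bool pending;
};

static std::vector<ToyEntry> g_entries;
static std::string g_log;  // records the order in which cleanups ran

// Models __cxa_atexit: remember fn(arg) together with the registering DSO.
int toy_cxa_atexit(void (*fn)(void*), void* arg, void* dso_handle) {
    g_entries.push_back(ToyEntry{fn, arg, dso_handle, true});
    return 0;
}

// Runs pending entries in reverse registration order. A null handle selects
// every entry (the exit case); otherwise only entries of the given DSO run.
static void run_pending(void* dso_handle) {
    for (std::size_t i = g_entries.size(); i-- > 0;) {
        if (!g_entries[i].pending) continue;
        if (dso_handle != nullptr && g_entries[i].dso_handle != dso_handle) continue;
        ToyEntry e = g_entries[i];     // copy: a callback may grow the vector
        g_entries[i].pending = false;  // never run the same cleanup twice
        e.fn(e.arg);
    }
}

// Models __cxa_finalize(d) for early unloading of one DSO.
void toy_cxa_finalize(void* dso_handle) {
    if (dso_handle != nullptr) run_pending(dso_handle);
}

// Models the part of exit that handles the remaining registrations.
void toy_exit_cleanup() { run_pending(nullptr); }

static void note(void* arg) { g_log += static_cast<char*>(arg); }

std::string demo_registry() {
    static char a[] = "a", b[] = "b", c[] = "c";
    int dso1 = 0, dso2 = 0;  // their addresses serve as distinct DSO handles
    g_entries.clear();
    g_log.clear();
    toy_cxa_atexit(note, a, &dso1);
    toy_cxa_atexit(note, b, &dso2);
    toy_cxa_atexit(note, c, &dso1);
    toy_cxa_finalize(&dso1);  // early unload of the first DSO: runs c, then a
    toy_exit_cleanup();       // exit: runs what is left, here b
    return g_log;
}
```

Marking an entry as no longer pending before invoking it mirrors the `ef_free` assignment in the `__cxa_finalize` listing above, though the toy does it slightly earlier.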

# Issue Class Status Source Opened Closed
C-4 Construction vtables call closed Cygnus 990603 000504
Summary: When calling a virtual function from the constructor/destructor of a base subobject, the version specific to the base type is required, unlike the typical case when calling such a vfunc for the full object from some other context. Since the pointer for that vfunc in the subobject's sub-vtable of the full object's vtable is the full object version, some other means is required for accessing the correct vfunc.
Resolution: Accept Compaq proposal as currently documented in the Draft C++ ABI for IA-64.

[990630 HP -- Christophe] A rough idea from Christophe's original vtable layout proposal has been incorporated in the Draft C++ ABI for IA-64.

[000217 All] Coleen has generated a proposal.

[000308 All] Discussed and clarified the proposal. Jim will clarify the content descriptions. Coleen will describe how some of the base vtables can be eliminated from the construction vtable groups given vbase promotion. She should be out to California in two weeks, so we can finalize this issue.

[000323 All] Discussion clarified the two proposals and their relative merits:

It was decided that the space savings outweighed the lost optimizations, and proposal B was adopted. Jim will clean up the writeup for final adoption.

For the record, following are several issues that have been raised and resolved in the process of developing this proposal:


[000504 All] Modify VTT order to put everything in preorder, to match other aspects. Accept Compaq proposal as currently documented in the Draft C++ ABI for IA-64.

# Issue Class Status Source Opened Closed
C-5 Calling destructors call closed Sun 990603 991104
Summary: What is the calling convention for destructors? Do virtual destructors require special treatment? Is delete() integrated with the destructor call or separate? How is delete() handled when invoked on a base subobject?
Resolution: Destructors are called with a reference to this. Virtual destructors have two versions, and two entries in the vtable, one that deletes the object after destruction, and one that doesn't. There is a third version that does not delete the object, and is not in-charge, i.e. does not destroy any base objects; it is not called via the vtable.

[990729 all] Some implementations combine destructors with deletion, checking a flag in the destructor to determine whether to delete. This produces somewhat less code, especially if there are many delete() calls. However, it adds overhead to any destructor which does not require deletion, e.g. base and member objects, automatic objects. There is some concern that a runtime test is sometimes required, but no one has yet identified why.

[990819 Cygnus -- Jason] The [above] questions the usefulness of calling op delete from the destructor. But it's required by the language, in case the derived class defines its own op delete. This only applies to virtual dtors, of course.

One option would be to have two dtor slots, one which performs deletion and one which doesn't. The advantage of this sort of approach would be avoiding pulling in all the memory management code if you never actually touch the heap.

Microsoft has a patent on this device, but the old Sun ABI also talks about it, which seems to qualify as prior art.

[991014 all] One solution to the problem with destructors is to have two destructor entry points, and two destructor slots in the vtable. One entry point destroys the object and then calls operator delete, the other destroys the object without calling operator delete. We can use a similar solution for constructors (but without any impact on the vtable layout): one entry point for constructing a complete object, another for constructing a subobject.

Note that one of the entry points may call the other, but that's not an ABI issue and can be left to individual implementors.

There was general agreement that this is a promising idea. We don't have a detailed proposal yet. HP is working on a prototype implementation. Christophe will submit a writeup.


[991028 all] There are two options in destructors:

Since only the most-derived object calls delete(), and only the most-derived object does destruction for virtual bases, only three of the possible combinations arise: The first will be called for a base object when a derived object is being destroyed, and one of the other two for the most-derived object. Therefore, for any particular vtable, no more than two will be required. A vtable for a class with virtual destructors will contain two destructor entries, delete and no-delete, and they will both be the in-charge versions for the most-derived class in the structure. The no-delete, not in-charge destructors may be called from those, but always directly, so a global name is required but no vtable entry.

We distinguish the delete/no-delete cases by distinct entrypoints, so only a this parameter is required, and the standard calling conventions are used. The only special treatment of virtual destructors is the pair of vtable entries described above.
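
As a rough illustration of the three destructor versions, the following hand-written sketch uses free functions in place of compiler-generated entry points. The names `Derived_dtor_base`, `Derived_dtor_complete`, `Derived_dtor_deleting`, and `DtorSlots` are hypothetical, plain structs stand in for a class with a virtual base, and `std::free` stands in for operator delete.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

static std::string g_trace;  // records which steps ran, for illustration only

struct VBase   { int v; };
struct Derived { VBase vbase; int d; };  // stand-in for a class with a virtual base

// 1. Not in-charge, no delete: destroys members and non-virtual bases only.
//    Called directly by name, never through the vtable.
void Derived_dtor_base(Derived*) { g_trace += "body;"; }

// 2. In-charge, no delete: additionally destroys the virtual bases.
void Derived_dtor_complete(Derived* self) {
    Derived_dtor_base(self);
    g_trace += "vbase;";
}

// 3. In-charge, delete: destroys the complete object, then releases storage.
void Derived_dtor_deleting(Derived* self) {
    Derived_dtor_complete(self);
    std::free(self);  // stand-in for the class's operator delete
    g_trace += "delete;";
}

// Only the two in-charge versions occupy vtable slots.
struct DtorSlots {
    void (*complete)(Derived*);
    void (*deleting)(Derived*);
};
static const DtorSlots Derived_vtable_dtors = {Derived_dtor_complete,
                                               Derived_dtor_deleting};

std::string demo_destructors() {
    g_trace.clear();
    Derived automatic{};  // automatic object: destroyed without deletion
    Derived_vtable_dtors.complete(&automatic);
    Derived* heap = static_cast<Derived*>(std::malloc(sizeof(Derived)));
    assert(heap != nullptr);
    Derived_vtable_dtors.deleting(heap);  // what a delete-expression would select
    return g_trace;
}
```

Here one entry point calls another, which the 991014 note leaves to the implementor; only the existence of the entry points and the two vtable slots is an ABI matter. Each entry point takes nothing but the this pointer.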

# Issue Class Status Source Opened Closed
C-6 Extra parameters to constructors call closed Cygnus 990603 991104
Summary: When calling constructors for classes with virtual bases, what information about the treatment of virtual base subobjects in the full class, or about object allocation, must be transmitted to the constructor in parameters?
Resolution: None. Two versions, and two entrypoints, of the constructor will be created: one that calls the virtual base subobject constructor (in-charge), and one that does not. Object allocation will be done by the caller.

[991028 all] We will produce two constructor entries, one in-charge (constructing virtual bases) for a most-derived object, and one not in-charge for a base subobject. The object allocation will be the responsibility of the caller, so there will be no variation or parameters for that purpose.

# Issue Class Status Source Opened Closed
C-7 Passing value parameters by reference call closed All 990624 990805
Summary: It may be desirable in some cases where a type has a non-trivial copy constructor to pass value parameters of that type by performing the copy at the call site and passing a reference.
Resolution: Whenever a class type has a non-trivial copy constructor, pass value parameters of that type by performing the copy at the call site and passing a reference.

[990701 All] Daveed and Matt will attempt to pin down the copy requirements with the Core committee, i.e. when a non-trivial copy constructor may be elided. The relevant Standard requirement is 12.8/15, and there is an open defect report related to this question. For cases where the ctor may not be elided, we expect to perform the copy at the call site, and pass a reference.

[990729 All] Matt will produce a clear proposal for when the ABI will elide the constructor (and therefore pass the class object like a normal C struct), based on the Standard's exceptions.

[990805 All] There are no cases where a non-trivial copy constructor can be simply elided for all instances of a particular parameter. Therefore, we shall use the consistent convention that, if a value parameter's (class) type has a non-trivial copy constructor, the caller will allocate space for it, perform the copy, and pass a reference.

Note that the standard does allow the caller, if the value being passed is a temporary, to construct the temporary directly into the parameter memory and elide the copy constructor call.

# Issue Class Status Source Opened Closed
C-8 Returning classes with non-trivial copy constructors call closed All 990625 990722
Summary: How do we return classes with non-trivial copy constructors?
Resolution: The caller allocates space, and passes a pointer as an implicit first parameter (prior to the implicit this parameter).
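
A hand-lowered sketch of this convention follows. `Tracked` and `make_lowered` are hypothetical; a real compiler performs this transformation itself, and the lowered function here returns the typed pointer only so that the sketch can use the object without further casts.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

struct Tracked {
    std::string name;
    static int copies;  // counts copy-constructor calls
    explicit Tracked(std::string n) : name(std::move(n)) {}
    Tracked(const Tracked& o) : name(o.name) { ++copies; }  // non-trivial copy
};
int Tracked::copies = 0;

// Source-level function:   Tracked make() { return Tracked("result"); }
// Lowered form: the caller supplies the address of uninitialized storage as
// an implicit first parameter, and the result is constructed directly in it.
Tracked* make_lowered(void* result_buffer) {
    return new (result_buffer) Tracked("result");
}

std::string demo_return() {
    alignas(Tracked) unsigned char buffer[sizeof(Tracked)];  // caller allocates
    Tracked* result = make_lowered(buffer);
    std::string name = result->name;
    result->~Tracked();  // the caller owns the result and destroys it
    return name;
}
```

Because the object is built in place, no copy constructor runs on the return path in this sketch.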

# Issue Class Status Source Opened Closed
C-9 Passing parameters with ctors/dtors call closed All 991028 991104
Summary: Where do allocation, construction, destruction, and deallocation occur for value parameters?
Resolution: See the description in the closed issues list.

[991028 all] For value parameter types with a non-trivial copy constructor or destructor, a call handles the parameter as follows:

  1. Space is allocated in the caller for the temporary. If there is no non-trivial copy constructor, it is in the normal parameter-passing space (registers/stack); otherwise it is allocated on the stack or heap.

  2. The caller constructs the parameter in the space allocated, a simple copy to parameter space if there is no non-trivial copy constructor.

  3. The function is called, passing the parameter value (if no non-trivial copy constructor), or its address (if there is a non-trivial copy constructor).

  4. The callee calls any non-trivial destructor for the parameter before returning. (Note that, if there is no non-trivial copy constructor, this implies that the parameter was copied out of registers on entry so the destructor can be called with this in memory.)

  5. If necessary (e.g. if the parameter was allocated on the heap), the caller deallocates space after the return.
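
The five steps can be traced with a hand-lowered call for the case where the type has a non-trivial copy constructor. This is only a sketch: `Param`, `callee_lowered`, and the event log are hypothetical, and a stack slot is assumed for the temporary.

```cpp
#include <cassert>
#include <new>
#include <string>

static std::string g_events;  // order of constructor/destructor activity

struct Param {
    int value;
    explicit Param(int v) : value(v) { g_events += "ctor;"; }
    Param(const Param& o) : value(o.value) { g_events += "copy;"; }
    ~Param() { g_events += "dtor;"; }
};

// Source-level function:   int callee(Param p) { return p.value + 1; }
// Lowered form: the callee receives the address of the caller's temporary.
int callee_lowered(Param* p) {
    int result = p->value + 1;
    p->~Param();  // step 4: the callee destroys the parameter before returning
    return result;
}

int demo_call() {
    g_events.clear();
    Param original(41);
    alignas(Param) unsigned char slot[sizeof(Param)];  // step 1: caller allocates
    Param* temp = new (slot) Param(original);          // step 2: caller copies
    int r = callee_lowered(temp);                      // step 3: pass the address
    g_events += "returned;";  // step 5: a stack slot needs no explicit release
    return r;
}  // 'original' is destroyed here, after the call has completed
```

In the resulting log, the first "dtor;" comes from the callee destroying the temporary and the second from the caller's own object going out of scope.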

# Issue Class Status Source Opened Closed
C-10 Synthesized copy assignments call closed All 991028 991028
Summary: Should we specify special treatment for synthesized copy assignments, to avoid multiple copies of virtual bases?
Resolution: No.

[991028 all] For classes with virtual bases, the Standard allows a synthesized copy assignment to copy the virtual bases multiple times, but does not require it. The simplest approach, recursively copying the base objects, will cause multiple copies for virtual bases with multiple inheritance paths. This can be avoided by synthesizing a second copy assignment operator which does not copy virtual bases, to be called when assigning a subobject.

The decision was made not to do so, on grounds that the situation is rare, and virtual bases are often empty besides, so that the solution is not worth the resulting code bloat.
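
The effect being accepted here can be observed with a small diamond hierarchy. The class names are hypothetical, and the count is deliberately not stated as a fixed number, since the Standard leaves it unspecified whether the virtual base is assigned more than once.

```cpp
#include <cassert>

struct V {
    int assignments = 0;  // how many times this subobject was assigned to
    V& operator=(const V&) { ++assignments; return *this; }
};
struct A : virtual V {};
struct B : virtual V {};
struct D : A, B {};  // V is reachable through two inheritance paths

int count_vbase_assignments() {
    D x, y;
    x = y;  // synthesized D::operator= assigns the A and B subobjects in turn
    return x.assignments;
}
```

With the simple recursive approach described above, each of A's and B's synthesized operators assigns V, so the count would be 2; an implementation that used a separate subobject assignment operator would report 1.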

# Issue Class Status Source Opened Closed
C-11 Array constructors/destructors call closed Cygnus 000130 000309
Summary: How are constructors/destructors run for arrays? Many compilers use a __vec_new function; g++ doesn't, to allow for inlining of constructors.
Resolution: Define standard library entries for array construction/destruction. See the Draft C++ ABI for IA-64.

# Issue Class Status Source Opened Closed
C-12 Constructor return values call closed Cygnus 000130 000309
Summary: What is the return value of a constructor? Void, this, ...?
Resolution: Void.

[000130 Cygnus -- Jason] I don't see any reason to return a value from constructors, since we will always pass in the address of the object. g++ currently returns that address, for historical reasons (previously, to support assignment to 'this').

[000131 IBM -- Mark] Currently, we use the returned value from the ctor for cases like S().i. It wouldn't be hard to change the compiler, but we do need a decision one way or another.

[000308 All] Decided to return void. Open another issue (C-13) to consider alternate allocating constructors (low priority).

# Issue Class Status Source Opened Closed
C-13 Allocating constructors call closed HP 000309 000803
Summary: Should we define allocating constructors?
Resolution: Their use is optional. Their name mangling is specified. If used, they must be emitted everywhere referenced as a COMDAT group (Draft ABI section 5.2.5).

[000308 HP -- Christophe] We should consider defining alternate constructors which allocate the object before constructing it.

[000803 All] The definition in the Draft, section 5.2.5, is accepted.

# Issue Class Status Source Opened Closed
C-14 Local-scope dynamic constructors data closed all 000309 000511
Summary: The Standard requires that local static objects with dynamic constructors be initialized exactly once, the first time the containing scope is entered. This requires a data object to serve as a guard variable; define its content or interface.
Resolution: The size of the guard variable is 64 bits. The low-order byte shall contain a boolean initialization flag.

[000309 All] We have defined a mangling for the guard variable object (issue F-1), but we need to define at least its size and either its content or a library interface to it. This is tied up with multithreading issue G-4. If we want the initialization to be implicitly thread-safe, the object probably needs to contain both an initialized flag and a thread semaphore, and it is desirable that they be in different cache lines.

[000511 All] The size of the guard variable is 64 bits. The low-order byte shall contain the value 0 prior to initialization of the associated variable, and 1 after initialization is complete. Usage of the other bytes of the guard variable is implementation-defined.
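
A minimal sketch of how generated code might consult such a guard, assuming a single thread (the thread-safety question is issue G-4 and is ignored here). The names `counter_guard`, `counter_value`, and `expensive_init` are hypothetical; the real guard is a mangled symbol emitted by the compiler.

```cpp
#include <cassert>
#include <cstdint>

static std::uint64_t counter_guard = 0;  // 64-bit guard, zero before initialization
static int counter_value;                // storage for the local static itself
static int init_runs = 0;                // how often the initializer ran

int expensive_init() { ++init_runs; return 7; }

// Hand-lowered form of:
//   int& get_counter() { static int counter = expensive_init(); return counter; }
int& get_counter() {
    if ((counter_guard & 0xFF) == 0) {  // low-order byte is 0: not yet initialized
        counter_value = expensive_init();
        // Set the low-order byte to 1 and leave the other bytes alone, since
        // their use is implementation-defined.
        counter_guard = (counter_guard & ~std::uint64_t{0xFF}) | 1;
    }
    return counter_value;
}
```

Calling `get_counter()` repeatedly runs the initializer only on the first call; later calls see the flag byte set and skip it.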

# Issue Class Status Source Opened Closed
C-15 Alternate array allocators call closed CodeSourcery 000628 000720
Summary: Allow alternate allocators/deallocators to __cxa_vec_new and __cxa_vec_delete.
Resolution: Add two new allocators, and two new deallocators, with one of each pair using a simple user deallocator and one using a user deallocator requiring a size. See the Draft C++ ABI for IA-64.

[000628 CodeSourcery -- Mark] __cxa_vec_new and __cxa_vec_delete would be a lot more useful if they accepted pointers to the allocation and deallocation functions as well. As it is, they are hard-wired to use the `::operator new[]' and `::operator delete[]'. Since the whole purpose of these functions is to provide compilers a convenient way to manage construction and destruction, I think we should either add allocation/deallocation routine pointers to these functions, or add additional entry points. This additional flexibility would also be useful for C++-compatible allocations from other languages, etc.

[000706 All] We agreed to do this. Jim will write it up.

[000720 All] Accepted as documented.
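
The idea behind the proposal can be sketched as follows. `sketch_vec_new` is a hypothetical, simplified helper and is not the signature adopted in the Draft: it stores no element-count cookie and takes no padding size. It shows only how caller-supplied allocation and deallocation functions replace the hard-wired `::operator new[]` and `::operator delete[]`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Allocate count*size bytes with 'alloc' and construct each element. If a
// constructor throws, destroy what was built (in reverse), release the
// storage with 'dealloc', and rethrow.
void* sketch_vec_new(std::size_t count, std::size_t size,
                     void (*ctor)(void*), void (*dtor)(void*),
                     void* (*alloc)(std::size_t), void (*dealloc)(void*)) {
    char* base = static_cast<char*>(alloc(count * size));
    if (base == nullptr) return nullptr;
    std::size_t i = 0;
    try {
        if (ctor != nullptr)
            for (; i != count; ++i) ctor(base + i * size);
    } catch (...) {
        if (dtor != nullptr)
            while (i-- > 0) dtor(base + i * size);
        dealloc(base);
        throw;
    }
    return base;
}

// Example user allocator pair that counts its calls.
static int g_allocs = 0, g_frees = 0;
void* counting_alloc(std::size_t n) { ++g_allocs; return std::malloc(n); }
void counting_free(void* p) { ++g_frees; std::free(p); }
void init_int(void* p) { *static_cast<int*>(p) = 5; }

int demo_vec_new() {
    int* a = static_cast<int*>(sketch_vec_new(3, sizeof(int), init_int, nullptr,
                                              counting_alloc, counting_free));
    assert(a != nullptr);
    int sum = a[0] + a[1] + a[2];
    counting_free(a);  // the matching deallocator releases the array
    return sum;
}
```

Passing the functions as pointers is what would let a compiler route a class-specific `operator new[]`, or an allocator from another language, through the same helper.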

# Issue Class Status Source Opened Closed
C-16 Copy constructor runtime call closed CodeSourcery 000628 000720
Summary: Define a runtime support routine for copy constructors.
Resolution: Add a new runtime for vector copy construction. See the Draft C++ ABI for IA-64.

[000628 CodeSourcery -- Mark] I think we should also add a runtime support routine for copy constructors. Here's a sample definition:


  extern "C" void
  __cxa_vec_cctor (void *dest_array,
                   void *src_array,
                   size_t element_count,
                   size_t element_size,
                   void (*constructor) (void *, void *),
                   void (*destructor) (void *))
  {
    size_t ix = 0;
    char *dest_ptr = static_cast<char *> (dest_array);
    char *src_ptr = static_cast<char *> (src_array);

    try
      {
        if (constructor)
          for (; ix != element_count; 
               ix++, src_ptr += element_size, dest_ptr += element_size)
            constructor (dest_ptr, src_ptr);
      }
    catch (...)
      {
        __uncatch_exception ();
        __cxa_vec_dtor (dest_array, ix, element_size, destructor);
        throw;
      }
  }

This routine will be useful to compilers when copying a structure containing an array. The EDG front-end uses this method.

[000706 All] We agreed to do this. Jim will write it up.

[000720 All] Accepted as documented. NULL constructor is not allowed. An allocating version is not needed.

# Issue Class Status Source Opened Closed
C-18 Result buffers call closed SGI 000724 000817
Summary: Should buffers for results with non-trivial copy constructors be passed as a dummy first parameter, or in r8 as specified by the psABI for long structured results?
Resolution: All results with non-trivial copy constructors or destructors will be returned in buffers allocated by the caller, with their addresses passed as an implicit first parameter. Other structure results too large for the return registers are returned in a buffer created by the caller, with the buffer address passed in r8.

[000724 SGI -- Dehnert] I just noticed that the IA-64 psABI requires returning large aggregates (over 256 bits except for some floating point ones) via a buffer allocated by the caller and passed in r8. We have specified in the C++ ABI that class results with non-trivial copy constructors be returned in a buffer allocated by the caller and passed as an implicit first parameter (i.e. in out0, not in r8). I suggest that we make these two cases consistent, i.e. pass the buffer address in r8 instead of out0. (This would not affect non-IA-64 compilers.)

[000817 All] Accepted. In all cases where a result class object is returned in a buffer created by the caller, the buffer address will be passed in r8, and not like an implicit first parameter.

# Issue Class Status Source Opened Closed
C-19 NULL ctor/dtor API parameters call closed CodeSourcery 000806 000831
Summary: Allow NULL constructor/destructor parameters wherever it makes sense in the Section 3.3 APIs.
Resolution: Accepted as proposed.

[000806 CodeSourcery -- Mark] The ABI doesn't say whether or not the constructor and destructor parameters may be NULL for many of the functions. In some cases, it does say that the pointers may not be NULL.

I believe that a) the spec should explicitly specify this everywhere, and b) we should allow NULL pointers whenever it makes sense. These are convenience routines; why not make them convenient?

For example, why not allow __cxa_vec_new2 to be used with both NULL constructors and destructors? The caller should then pass in zero for the padding size, of course. There's no reason to try to make these routines go fast -- they're just there for convenience, and the memory allocation/function call indirection overhead will swamp a few conditionals on NULL parameters.


[000824 CodeSourcery -- Mark] Overall motivation: there is every reason to make these functions convenient for use by compilers and for manual use in various kinds of specialized reflection-like situations, including use in debuggers. There is virtually no speed penalty for allowing NULL pointers in these functions since the tests for NULL can be performed outside of the loop, and the loop itself will normally contain function calls.


[000831 All] Accepted this proposal as per Mark's list above.


D. Exception Handling Issues

# Issue Class Status Source Opened Closed
D-0 Exception handling framework lib ps closed SGI 990520 991216
Summary: Define the general framework for exception handling, including Level I (psABI unwinding API) and Level II (C++ ABI exception handling API).
Resolution: See the HP proposal, accepted as a working paper, and discussions in the closed issues page.

For reference, we have design information as follows:

[990902 All] We observed that there are three levels at which we can discuss EH compatibility.

The first, minimal level is effectively that of the definition in the IA-64 Software Conventions document. It describes a framework which can be used by an arbitrary implementation, with a complete definition of the stack unwind mechanism, but no significant constraints on the language-specific processing. In particular, it is not sufficient to guarantee that two object files compiled by different C++ compilers could interoperate, e.g. throwing an exception in one of them and catching it in the other.

The second level is the minimum that must be specified to allow interoperability in the sense described above. This level requires agreement on:

The third level is a specification sufficient to allow all compliant systems to share the relevant runtime implementation. It includes, in addition to the above:

The vocal attendees at the meeting wish to achieve the third level, and we will attempt to do so. Whether or not that is achieved, however, a second-level specification must be part of the ABI.

  • [990909 All/Jim] With much further discussion, we are starting to get better understanding of one another, but there are still obviously (in my mind) mismatched underlying assumptions. To resolve this, Christophe agreed to attempt to get us the HP APIs for the exception handling routines. I have also started a document on a more complete EH specification, though it hasn't gone beyond specifying more of the underlying base ABI part. I will go farther once I get back from my trip.

  • [990922 HP -- Christophe]

    Here is a quick description of the personality routine interface and semantics. This description is a slight extension of the existing personality routine implemented by HP for IA64. The extension is to allow multiple runtimes from possibly different vendors or for possibly different languages to cooperate in processing an exception.

    This document assumes that chapter 11 of the Intel/HP "IA-64 Software Conventions and Runtime Architecture" document is known to the reader.

    INTERFACE:

    The complete exception processing framework consists of at least the following routines: _RaiseException, _ResumeUnwind, _DeleteException, _Unwind_getGR, _Unwind_setGR, _Unwind_getIP, _Unwind_setIP, _Unwind_getLanguageSpecificData, _Unwind_getRegionStart. In addition, a language and vendor specific personality routine will be stored by the compiler in the unwind descriptor for the stack frames requiring exception processing.

    UNWIND RUNTIME ROUTINES:

    The unwind runtime routines have the following interface and semantics (all routines are extern "C"):

    uint64 _RaiseException(uint64 exception_class, void *exception_object);

    Raise an exception, passing along the given exception class and exception object. The exception object has been allocated by the language-specific runtime, and has a language-specific format. _RaiseException does not return, unless an error condition is found (such as no handler accepting the exception, bad stack format, etc).

    The first 4 words (32 bytes) of the exception object are allocated for use exclusively by the unwinder, and should not be written by the personality routine or other parts of the language-specific runtime. The first word is used to store the exception class. The second word points to the personality routine of the frame that threw the exception initially. The next two words are reserved for use by the unwinder. [Note: Typical use is to keep the state of the unwinder while executing user code, such as our current frame_handle pointer.]
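
    The layout just described might be pictured as below. The struct and field names are hypothetical; the text fixes only the number of words, their total size, and their roles. `ToyException` and `language_part` are likewise illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The four 64-bit words owned by the unwinder at the start of every
// exception object.
struct UnwinderHeader {
    std::uint64_t exception_class;  // word 0: identifies the throwing runtime
    std::uint64_t personality;      // word 1: personality routine of the throwing frame
    std::uint64_t reserved[2];      // words 2-3: private to the unwinder
};
static_assert(sizeof(UnwinderHeader) == 32, "header must occupy 4 words (32 bytes)");

// A language-specific exception object places its own data after the header.
struct ToyException {
    UnwinderHeader header;
    int payload;  // stands in for the language-specific part
};
static_assert(offsetof(ToyException, header) == 0, "header comes first");

// Address of the language-specific part, given a pointer to the whole object.
void* language_part(void* exception_object) {
    return static_cast<char*>(exception_object) + sizeof(UnwinderHeader);
}
```

    A language runtime would fill in only the part returned by `language_part` and leave the header to the unwinder.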

    void _ResumeUnwind (void *exception_object);

    Resume propagation of an existing exception. [Note: _ResumeUnwind should not be used to implement rethrowing. To the unwinding runtime, the catch code that rethrows was a handler, and the previous unwinding session was terminated before entering it.] [Note: Compared to HP runtime, the exception class and frame handle arguments have been removed. They also need no longer be passed to the landing pads. Instead, the unwinder will store the information in one of its 2 reserved words.]

    void _DeleteException(void *exception_object);

    If a given runtime resumes normal execution after catching a foreign exception, it will not know how to delete that exception. This exception will be deleted by calling _DeleteException, which in turn will delegate the task to the original personality routine (see EH_DELETE_EXCEPTION_OBJECT below).

    uint64 _Unwind_getGR(void *context, int index);
    uint64 _Unwind_getIP(void *context);
    void _Unwind_setGR(void *context, int index, uint64 new_value);
    void _Unwind_setIP(void *context, uint64 new_value);

    Get or set registers from the given unwinder context. The 'context' argument is the same argument passed to the personality routine (see below). [Note: Minor changes compared to the existing unwinding interface, mostly to hide the register classes.]

    uint64 _Unwind_getLanguageSpecificData(void *context);

    Get the address of the language-specific data area for the current stack frame. The 'context' argument is the same argument passed to the personality routine. [Note: This is not strictly required: it could be accessed through getIP using the documented format of the UnwindInfoBlock, but since this work has been done for finding the personality routine in the first place, it makes sense to cache the result in the context, as we currently do.]

    uint64 _Unwind_getRegionStart(void *context);

    Get the address of the beginning of the current procedure or region of code. [Note: This is required for us because we store data relative to the beginning of the code. So let's make it mandatory ;-]

    PERSONALITY ROUTINE:

    The personality routine is defined with the following interface:

    int PersonalityRoutine
        (int version,
         int phase,
         UInt64 exceptionClass,
         void *exceptionObject,
         void *context);

    [Note: the frame_handle argument has been removed: it was used only once in the runtime, and the cost of reading it back from the exception object is really minimal, compared to the cost of having to spill it in all landing pads... The context argument type has been made opaque]

    The arguments have the following role and meanings:

    UNWINDING PHASES

    Unwinding is a two-phase process.

    [Note: Cleanup code is code doing some user-defined cleanup such as destructors. Compensation code is code inserted by the compiler to compensate for an optimization that moved code past the throwing call. Handler code is user-defined code that possibly can resume normal execution]

    The unwinding phase argument to the personality routine is a bitwise or of the following constants:

    TRANSFERRING CONTROL TO A LANDING PAD:

    When the personality routine wants to transfer control to a landing pad, it sets up registers (including IP) to suitable values for entering the landing pad. Prior to executing code in the landing pad, registers not altered by the personality routine will be restored to the exact state they were in that frame before the call that threw the exception.

    The landing pad can either resume normal execution (as, for instance, at the end of a C++ catch), or resume unwinding by calling the _ResumeUnwind function and passing it the 'exceptionObject' argument received by the personality routine. _ResumeUnwind will never return.

    _ResumeUnwind should be called if and only if the personality routine did not return EH_HANDLER_FOUND during phase 1. In other words, the unwinder can allocate some resources (for instance memory) and keep track of them in the exception object reserved words. It should then free these resources before transferring control to the last (handler) landing pad. It does not need to free the resources before entering non-handler landing pads, since _ResumeUnwind will ultimately be called.

    The landing pad will receive various arguments from the runtime, typically passed in registers set using _Unwind_setGR by the personality routine. For a landing pad that can lead to _ResumeUnwind, one argument must be the exceptionObject pointer, which must be preserved to be passed to _ResumeUnwind. [Note: Thanks to the 4 reserved words in the exception object, 2 landing-pad arguments have been eliminated.] The landing pad may receive other arguments, for instance a 'switch value' indicating the type of the exception being caught.
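    The way a personality routine might prepare a landing pad can be sketched against a mock context. Everything here is an assumption made for illustration: MockContext stands in for the opaque unwinder context, the mock_* functions stand in for _Unwind_setGR, _Unwind_getGR, _Unwind_setIP and _Unwind_getIP, and the register numbers chosen for the two landing-pad arguments are arbitrary.

```cpp
#include <cassert>
#include <cstdint>

// Mock of the opaque unwinder context; a real one is private to the unwinder.
struct MockContext {
    std::uint64_t gr[128] = {};  // IA-64 has 128 general registers
    std::uint64_t ip = 0;
};

// Stand-ins for the _Unwind_* register accessors, operating on the mock.
void mock_setGR(void* context, int index, std::uint64_t new_value) {
    assert(index >= 0 && index < 128);
    static_cast<MockContext*>(context)->gr[index] = new_value;
}
std::uint64_t mock_getGR(void* context, int index) {
    assert(index >= 0 && index < 128);
    return static_cast<MockContext*>(context)->gr[index];
}
void mock_setIP(void* context, std::uint64_t new_value) {
    static_cast<MockContext*>(context)->ip = new_value;
}
std::uint64_t mock_getIP(void* context) {
    return static_cast<MockContext*>(context)->ip;
}

// Hypothetical register assignment for the landing-pad arguments.
constexpr int kExceptionObjectReg = 15;
constexpr int kSwitchValueReg = 16;

// What a personality routine might do before control reaches a landing pad:
// pass the exception object and a switch value in registers, then set IP.
void enter_landing_pad(void* context, void* exception_object,
                       std::uint64_t landing_pad, std::uint64_t switch_value) {
    mock_setGR(context, kExceptionObjectReg,
               reinterpret_cast<std::uintptr_t>(exception_object));
    mock_setGR(context, kSwitchValueReg, switch_value);
    mock_setIP(context, landing_pad);
}
```

    The landing pad would then keep the exception object pointer available so that it can pass it to _ResumeUnwind if it does not end in a handler.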

    RULES FOR CORRECT INTER-LANGUAGE OPERATION:

    The following rules must be observed for correct operation between languages and/or runtimes from different vendors:

    CATCHING FOREIGN EXCEPTIONS IN C++

    Foreign exceptions can be caught in a catch(...). They can also be caught as if they were of a __foreign_exception class, defined in <exception>. [Note: The __foreign_exception may have subclasses, such as __java_exception and __ada_exception, if the runtime is capable of identifying some of the foreign languages.]

    The behavior is undefined in the following cases:

    [Note: All these cases might involve accessing the C++ specific content of the thrown exception, for instance to chain active exceptions]

    Otherwise, a catch block catching a foreign exception is allowed:

    A catch-all block may be executed during forced unwinding. For instance, a setjmp may execute code in a catch(...) during stack unwinding. However, if this happens, unwinding will proceed at the end of the catch-all block, whether or not there is an explicit rethrow.

    Setting the low 4 bytes of exception class to C++\0 is reserved for use by C++ runtimes compatible with the common C++ ABI.
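    A runtime can tell native from foreign exceptions by inspecting the exception class. The following is a minimal sketch under two assumptions that the text does not settle: the four characters are packed with the first character in the most significant of the low 4 bytes, and the high 4 bytes carry some vendor tag. All function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Packs four characters into 32 bits, first character most significant
// (an assumed byte order; the text only names the characters).
constexpr std::uint32_t pack4(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}

// Language tag reserved for runtimes compatible with the common C++ ABI.
constexpr std::uint32_t kCxxLanguageTag = pack4('C', '+', '+', '\0');

// Builds a 64-bit exception class; using the high half for a vendor tag is
// an assumption made for this sketch.
constexpr std::uint64_t make_exception_class(std::uint32_t vendor_tag,
                                             std::uint32_t language_tag) {
    return (static_cast<std::uint64_t>(vendor_tag) << 32) | language_tag;
}

// True when the low 4 bytes identify a common-ABI C++ exception.
constexpr bool is_common_abi_cxx(std::uint64_t exception_class) {
    return static_cast<std::uint32_t>(exception_class & 0xFFFFFFFFu) == kCxxLanguageTag;
}
```

    A personality routine could use such a test to decide whether it may look at the C++ specific part of the exception object or must treat the exception as foreign.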


  • [990923 All] Extensive discussion at the meeting was generally positive about the HP proposal. Several changes came up, ranging from editorial to substantive. Christophe will modify the specification.


  • [991202 All] Since there was not much time for review of HP's revised proposal, discussion was limited to relatively minor comments. This remains the highest priority area, with ongoing implementations depending on resolution. We plan a thorough discussion next week, with adoption as soon as practical. Note that the consensus remains positive, with the expectation that the proposal will undergo only minor fixes before adoption, so implementations can proceed with the current document as a basis without great risk.


  • [991209 All] Several issues arose from the discussion of HP's exception handling specification:


  • [991216 All] The HP proposal is accepted as a working paper, subject to a number of minor issues which need to be resolved, and will be opened and tracked independently. SGI volunteered to do the necessary rework to put the material into a more ABI-oriented (rather than implementation-oriented) form. (This has been done for the base ABI unwind material as of 5 January.)

    HP management has agreed to release the C++ exception handling runtime, but does not consider its unwind library suitable for release. SGI has agreed to release their unwind library. SGI is now (5 Jan) working on ABI conformance in preparation for doing so.

    It was clarified (and should be in the document) that unwinding determines in Phase 1 that an exception will be uncaught, and calls terminate() before starting Phase 2.
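    The ordering clarified here (decide in Phase 1 whether the exception is caught, and only then start Phase 2) can be illustrated with a toy model. This is a sketch, not the unwinder: frames are reduced to two flags, the names ToyFrame, UnwindResult and toy_raise are hypothetical, and the terminate() call is represented by a return value so the example stays testable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of one stack frame; all names are illustrative.
struct ToyFrame {
    bool has_handler;  // this frame would catch the exception
    bool has_cleanup;  // this frame has destructors to run when unwound
    bool cleaned;      // set by phase 2 when the cleanup has run
};

enum class UnwindResult { HandlerEntered, Terminated };

// frames[0] is the throwing (innermost) frame, frames.back() the outermost.
UnwindResult toy_raise(std::vector<ToyFrame>& frames, std::size_t* handler_index) {
    assert(handler_index != nullptr);

    // Phase 1: search for a handler without modifying any frame.
    std::size_t found = frames.size();
    for (std::size_t i = 0; i < frames.size(); ++i) {
        if (frames[i].has_handler) { found = i; break; }
    }
    if (found == frames.size()) {
        // Uncaught: a real runtime would call terminate() here, before any
        // cleanup has run, so the stack is still intact.
        return UnwindResult::Terminated;
    }

    // Phase 2: run cleanups in the frames below the handler, innermost first.
    for (std::size_t i = 0; i < found; ++i) {
        if (frames[i].has_cleanup) frames[i].cleaned = true;
    }
    *handler_index = found;
    return UnwindResult::HandlerEntered;
}
```

    In the uncaught case no frame is touched, which is what allows terminate() to be called with the full stack still available.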

    # Issue Class Status Source Opened Closed
    D-2 Unwind personality routines lib ps closed SGI 990520 000106
    Summary: The IA-64 runtime conventions provide for a personality routine pointer for language-specific actions when unwinding the stack. They do not specify its interface. There are typically two required actions for C++: locating a handler (non-destructively) and destroying automatic objects while unwinding. This issue involves specification of the API (see also D-3).
    Resolution: See the exception handling specification, level 1, and the working paper.

    [990826 Intel/HP] The Software Conventions document is claimed to specify the interface, with the parameters indicating which action is required. (I can't find it, but this would be an acceptable solution -- Jim.)

    [991209 all] Observe that this issue is part of a level 1 specification, i.e. part of the base ABI. It is being described as part of the proposed common EH interface from HP.

    [000106 all] Closed -- specified as part of the accepted exception handling specification.

    # Issue Class Status Source Opened Closed
    D-3 Unwind process clarification lib ps closed SGI 990520 000106
    Summary: The IA-64 runtime conventions provide for a personality routine pointer for language-specific actions when unwinding the stack. However, they are quite muddy about the precise sequence of calls. This issue involves specification of unwind process (see also D-2).
    Resolution: See the exception handling specification.

    [991209 all] Observe that this issue is part of a level 1 specification, i.e. part of the base ABI. It is being described as part of the proposed common EH interface from HP.

    [000106 all] Closed -- specified as part of the accepted exception handling specification.

    # Issue Class Status Source Opened Closed
    D-4 Unwind routines nested? lib ps closed SGI 990520 991209
    Summary: The IA-64 runtime conventions call for the unwind personality routine to behave like a routine nested in the routine raising an exception. Is that the preferred definition?
    Resolution: This is not required, nor included in the proposed common implementation. However, a conforming implementation could add this option in the personality routine and tables.

    [990902 All] Discussion reveals that Intel and HP have very different models of how cleanup actions are handled.

    Intel builds one or more routines which are called from the unwind runtime, based on action descriptors in the unwind tables, and acting on the stack contents or objects to be destroyed without actually modifying the stack pointer until the final transfer of control to the user handler. This approach avoids actually restoring registers until the final transfer to the handler.

    HP transfers control back to a user landing pad whenever anything needs to be done -- descriptors or handlers -- and reenters the unwind runtime if further processing is required. They believe this approach to use much less space than the action descriptors would, and most importantly, that it allows arbitrary fixup for code motion around the call that throws.

    [991209 All] An implementation can conform with the proposed C++ personality routine interface and either support or not support nested handlers -- the only requirement is that the generated personality tables and routine collaborate. The proposed common EH interface from HP does not use nested functions as handlers, but could easily be extended.

    This issue is closed, with the immediate resolution of changing the base unwind ABI to not require nested function handlers.

    # Issue Class Status Source Opened Closed
    D-5 Interaction with other languages (e.g. Java) lib ps closed HP 990603 991007
    Summary: The IA64 exception handling framework is largely language independent. What is the behaviour of a C++ runtime receiving, for instance, an exception thrown from Java? Does it call terminate()? Does it allow the exception to pass through C++ code with destructors if there is no catch clause? Does it allow the exception to be caught in a catch(...) provided this catch(...) ends with a rethrow? Does it allow even more?
    Resolution: In general, foreign exceptions will cause normal destructor invocation and other cleanup in C++ code, and will pass through C++ frames except where explicit exception specifications do not allow them.

    [990908 SGI -- Jim] We propose that this be resolved by identifying the source language in the exception descriptor and specifying that the personality routine be able to perform cleanup actions during handling of foreign-language exceptions, but not attempt to catch them.

    [991006 All] The consensus of the group, from the discussion of the low-level exception API, is:

    [991007 All] In addition to the above, Christophe will define an exception __foreign_exception to be used by foreign-language code which wants to raise an exception that C++ can catch.

    Close this issue.

    # Issue Class Status Source Opened Closed
    D-6 Allow resumption in other languages? lib ps closed HP 990603 991007
    Summary: The exception handling framework requires the interaction of the runtime of all the languages "on the stack" during exception processing. Some of these languages may have very different exception handling semantics. What are the constraints we impose on the C++ exception handling runtime to preserve the relative language neutrality of the EH framework? Example: do we allow a handler to cleanup and resume at the point where the exception was thrown?
    Resolution: Moot -- resume-type exceptions are more appropriately handled by registering trap handlers and processing them in place. No interaction with stack traceback should be necessary.

    [990908 SGI -- Jim] The typical case of cleanup and resume is floating point trap handling, which is normally handled entirely in the original FP trap handler. Is there an example where stack walkback must occur to identify the handler, but resumption at the point-of-exception is required? I can't think of any, and I think the model of registering a trap handler is preferable for such purposes.

    # Issue Class Status Source Opened Closed
    D-7 Interaction with signals or asynch events lib ps closed HP 990603 991209
    Summary: The Standard says that the behavior of anything other than "pure C code" (POF) is implementation defined, and warns (in a note) against using EH in a signal handler. We should define what is supported, possibly explicitly stating that signal handler code must be a POF. We could allow any feature but exception handling to be used. We could allow some EH routines to be called (for instance, uncaught_exception()). Or we could allow even an exception to be thrown, if it does not exit the handler.
    Resolution: This ABI requires no support beyond the Standard requirements.

    [991006 All] This common ABI will not allow throwing exceptions from a signal handler.

    [991007 All] There remains concern about how to help customers (examples were presented of big database applications) for which raising exceptions from signal handlers for I/O failures is a highly desirable design. We will revisit this issue.

    [991209 All] Further discussion clarified the situation.

    The fundamental problem is that exceptions thrown from a signal handler (or otherwise asynchronously) may appear at arbitrary points in the program, where the unwind information is inadequate to reliably clean up, for instance because global variable updates have been moved across the point of exception.

    A second problem is that signals are often processed on their own stack, and making the transition to the main user stack might not happen automatically.

    As a result, it was generally agreed that dealing with exceptions raised asynchronously would require simply passing through the immediately enclosing stack frame (to avoid the first problem), and a special raise invocation (as a basis for addressing both).

    However, the only customer that has been adamant about supporting asynchronous exceptions has also been adamant that such a partial solution would not be adequate. Their intended application involves raising the exception in a simple routine that they expect to be inlined (for performance reasons) directly into a try block, which would be bypassed by the proposed solution. Since making this work would involve significant performance penalties elsewhere, the group's consensus is that there is inadequate benefit from an attempted solution.

    # Issue Class Status Source Opened Closed
    D-8 Interaction with threads packages lib ps closed SGI 990603 000106
    Summary: What happens when an exception is not caught in the thread where raised? What does uncaught_exception() return if another thread is currently processing an exception?
    Resolution: With one exception, exception handling is entirely per-thread -- exceptions must be caught in the thread where raised, and queries about them (e.g. uncaught_exception()) are answered only with respect to the thread doing the query. The only global exception behavior is handler registration -- see issue D-15.

    # Issue Class Status Source Opened Closed
    D-9 longjmp interaction lib ps closed IBM 990908 000113
    Summary: Does longjmp run destructors?
    Resolution: Define an alternate routine, longjmp_unwind in namespace abi, defined in new header cxxabi.h, which always does full cleanup during unwinding.

    [990908 IBM -- Mendell] Does longjmp run destructors? I believe that the C ABI makes this optional. I would like to propose that it does run destructors.

    [990908 SGI -- Wilkinson] The C++ standard, 18.7 paragraph 4, says a call to longjmp has undefined behavior if any automatic objects would have been destroyed by a throw/catch with the same source and destination. I don't see that this is something we need to fix.

    [990908 IBM -- Thomson] Yes it does, but ANSI is not my customer. Meeting the bare minimum of function that ANSI requires doesn't necessarily mean that users can build robust applications. How can they know to avoid longjmp in their C code, because some third party library they are using has C++ buried in it?

    [990908 SGI -- Dehnert] Implementation is a significant issue. The normal longjmp implementation is very simple -- setjmp stores the register/stack state, and longjmp copies it back and branches. There is normally no traceback involved, so what you suggest is a dramatic change, and probably would make C people very unhappy. Furthermore, C++ users have the option of using C++ exceptions, which have the effect you seek.
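    The contrast drawn here, that leaving a scope by a C++ exception destroys automatic objects, can be shown with a small sketch. The names Guard, inner and leave_by_exception are illustrative. The corresponding longjmp out of inner() is deliberately not shown, since the Standard leaves its behavior undefined when it would bypass such a destructor.

```cpp
#include <cassert>
#include <stdexcept>

// Counts destructor runs so the effect of unwinding is observable.
static int g_destructor_runs = 0;

struct Guard {
    ~Guard() { ++g_destructor_runs; }
};

void inner() {
    Guard g;  // automatic object that must be destroyed on the way out
    throw std::runtime_error("leave inner");
}

// Leaves inner() by an exception; returns true if the handler was reached.
bool leave_by_exception() {
    try {
        inner();
    } catch (const std::runtime_error&) {
        return true;  // Guard's destructor has already run at this point
    }
    return false;
}
```

    After a call to leave_by_exception(), the counter has been incremented exactly once for the Guard in inner().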

    [990908 SGI -- Boehm] The problem is that on the C side:

    1. A number of thread packages use setjmp/longjmp to perform context switches. In this case, the target sp is not on the same stack as the original sp, and there should not be any destructor invocations, since the original thread will be resumed, and the original sp will eventually be restored. (This isn't the optimal way to do thread switching, but it's the only one that's semi-portable, and hence it's moderately common.)

    2. Some variants of longjmp are often used to jump out of signal handlers, which may not be invoked on the original user stack (cf. sigaltstack on most Unix systems). Thus unwinding may have to cross stack boundaries.

    3. Setjmp is often used to capture the register state, e.g. for garbage collectors. (The collector I'm responsible for optionally does this. Last I looked, Guile did it unconditionally.) A straightforward stack-unwinding implementation of setjmp/longjmp would break this.

    I don't know whether it's possible to avoid breaking these clients while providing the stack-unwinding semantics.

    [990908 IBM -- Mendell/Thomson] [VisualAge C++] on OS/2 and Windows does do the unwinding. This is probably because unwinding support is in the OS. Also OS/390 and I believe AS/400 too. Our AIX implementation does not do the unwinding.

    [990909 DEC -- Brender] In addition to the systems already mentioned by others, these systems also do exception-handling compatible unwinding for C's setjmp/longjmp:

    If you believe in safe and compatible multi-language systems, there really is no choice but to do EH compatible unwinding for setjmp/longjmp -- at least by default.

    I suppose it would be OK for an implementation to offer an alternate setjmp/longjmp that could be linked in for those who either know that it is safe in particular cases or are happy to trade safety for speed...

    [990909 All] A brief discussion agreed that consensus is not absolutely necessary. An implementation could replace setjmp/longjmp with a version that either unwinds or just restores and jumps, without breaking any code except that which assumed one or the other. (Ed.: In fact, if setjmp stores enough information to either restore or to catch an exception, one could just swap longjmp, although that would not be optimal for the unwind and catch case, since setjmp doesn't need to save much information in that case as most of what is needed is in the unwind descriptors.)

    [990923 All] We agreed that:

    See the HP low-level exception writeup at the beginning of the exception issues section.

    [991216 All] Use the name longjmp_unwind for the alternate longjmp that always does full C++ unwinding. The issue of where to put it (namespace and header) remains.

    [000106 All] We agreed to define a new header for ABI definitions, initially containing this and the special exception objects agreed upon. SGI will create an initial version. We also agreed to put ABI-defined new features in an "abi" namespace. Therefore, for this issue, we have a prototype in cxxabi.h:

        namespace abi {
    	extern "C" void longjmp_unwind (jmp_buf env, int val);
        }
    

    [000113 All] Accept as described and close.

    # Issue Class Status Source Opened Closed
    D-10 psABI proposal lib ps closed all 991216 000120
    Summary: Solidify the Level I (psABI) specification and submit it to the base ABI group.
    Resolution: See the exception handling specification.

    [991216 All] This is essentially Section 8 of the HP working paper. SGI has reworked it into the draft exception handling specification. This group needs to approve the reworked version, at which time it can be submitted to the base ABI group.

    The draft needs to clarify that the unwinder will detect uncaught exceptions in Phase 1, and call terminate() before Phase 2. Issues D-11 through D-14 below are also relevant to the Level I specification.

    [000120 All] Close with minor modifications.

    # Issue Class Status Source Opened Closed
    D-11 pthreads interface lib ps closed all 991216 000203
    Summary: Certain pthreads functionality is a prerequisite, e.g. to acquire thread-local storage. The ABI should specify the requirements, along with the expected stub behavior when the pthreads library is not present.
    Resolution: No specification necessary. This is Level 3 material.

    [000106 All] Christophe will extract a list of what the HP library expects and send it.

    [000120 HP -- Christophe]

    Data types:

    Functions:

    Extra expected functionality:


    [000201 SGI -- Jim] We propose that the following functionality be required of the base ABI. The definitions are based on the pthreads package, with multi-threading semantics. However, it is expected that an implementation will provide default versions in the C++ (or C) library for single-threading programs, and override them in the thread library for multi-threading cases.

    Two sets of functionality are provided: once-only initialization, and thread-private data key management. The group also wants a means of identifying whether the real pthreads implementation is present -- I have not yet proposed such a feature.

    Once-only Initialization
        typedef ... pthread_once_t;
    
        pthread_once_t once_control = PTHREAD_ONCE_INIT;
        int pthread_once ( pthread_once_t *once_control,
    		       void (*init_routine) (void) );
    

    The purpose of the pthread_once routine is to execute a particular initialization routine exactly once in a thread-safe manner. The user declares a control variable of type pthread_once_t statically initialized to PTHREAD_ONCE_INIT, and passes it to the pthread_once routine.

    The first time pthread_once is called with a given once_control argument, it calls init_routine with no argument and changes the value of the once_control variable to record that initialization has been performed. Subsequent calls to pthread_once with the same once_control argument do nothing. pthread_once always returns 0.

    The default single-threaded implementation need not lock accesses to once_control, whereas overriding versions in multi-threading libraries presumably will.
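    The default single-threaded behavior described above might look like the following sketch. The names toy_once_t, TOY_ONCE_INIT and toy_once are hypothetical stand-ins chosen so as not to collide with a real pthreads header, and count_init exists only to make the effect observable. No locking is done, as the text permits for the single-threaded case.

```cpp
#include <cassert>

// Hypothetical single-threaded stand-ins; not the real pthreads types.
typedef int toy_once_t;
const toy_once_t TOY_ONCE_INIT = 0;

// Runs init_routine the first time it is called with a given control
// variable; later calls with the same variable do nothing. Always returns 0.
int toy_once(toy_once_t* once_control, void (*init_routine)(void)) {
    assert(once_control != nullptr && init_routine != nullptr);
    if (*once_control == TOY_ONCE_INIT) {
        init_routine();
        *once_control = 1;  // record that initialization has been performed
    }
    return 0;
}

// Illustrative initialization routine that counts how often it runs.
static int g_init_calls = 0;
void count_init(void) { ++g_init_calls; }
```

    Calling toy_once twice with the same control variable and count_init leaves g_init_calls at 1. A multi-threading library would override this with a version that locks accesses to the control variable.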

    Thread-Private Data Key Management
        typedef ... pthread_key_t;
    
        int pthread_key_create ( pthread_key_t *key,
    			     void (*destr_function) (void *) );
        int pthread_setspecific ( pthread_key_t key,
    			      const void *pointer );
        void * pthread_getspecific ( pthread_key_t key );
    

    The purpose of this functionality is to allow a program to manage data segments which are specific to a particular thread, but are identified by a key common to all threads. It is required in the C++ exception handling library, for example, to maintain thread-specific active exception lists.

    The user program must first create a key variable of type pthread_key_t. It then obtains an identifying key value from the implementation by calling pthread_key_create, also specifying at that time a destructor routine that will be called if a thread terminates, with a single argument that is the value associated with the key for the terminating thread. This destructor call is only made if the associated value is not NULL, and it is set to NULL before making the call.

    If successful, pthread_key_create returns zero, places the value of the key identifier in *key, and initializes the value associated with the key to NULL for all threads. If unsuccessful, e.g. exceeding the number of allocated keys, it returns an error code.

    A user thread may then associate a value with the key, typically the address of a thread-specific data area, by calling pthread_setspecific. If successful, pthread_setspecific returns zero. If unsuccessful, e.g. because of an invalid key identifier, it returns an error code.

    Later, a thread can obtain the value it has associated with the key by calling pthread_getspecific, which returns the value associated with key on success, and NULL on error.
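    For a single-threaded program, the key management described above reduces to a small table. This is a sketch under stated assumptions: the toy_* names, the limit of 8 keys and the error value 1 are all invented for illustration, and toy_thread_exit models the termination of the only thread.

```cpp
#include <cassert>
#include <cstddef>

typedef std::size_t toy_key_t;  // hypothetical stand-in for pthread_key_t

namespace {
const std::size_t kMaxKeys = 8;           // arbitrary illustrative limit
std::size_t g_key_count = 0;
const void* g_values[kMaxKeys];           // one value per key: single thread only
void (*g_destructors[kMaxKeys])(void*);
}

// Allocates a key, stores its identifier in *key, and sets its value to NULL.
int toy_key_create(toy_key_t* key, void (*destr_function)(void*)) {
    assert(key != nullptr);
    if (g_key_count == kMaxKeys) return 1;  // out of keys (error value is illustrative)
    *key = g_key_count++;
    g_values[*key] = nullptr;
    g_destructors[*key] = destr_function;
    return 0;
}

// Associates a value with the key for the (single) thread.
int toy_setspecific(toy_key_t key, const void* pointer) {
    if (key >= g_key_count) return 1;       // invalid key identifier
    g_values[key] = pointer;
    return 0;
}

// Returns the value associated with the key, or NULL on error.
void* toy_getspecific(toy_key_t key) {
    if (key >= g_key_count) return nullptr;
    return const_cast<void*>(g_values[key]);
}

// Models thread termination: each non-NULL value is reset to NULL and then
// handed to the key's destructor, if one was registered.
void toy_thread_exit() {
    for (std::size_t k = 0; k < g_key_count; ++k) {
        void* value = const_cast<void*>(g_values[k]);
        if (value != nullptr && g_destructors[k] != nullptr) {
            g_values[k] = nullptr;
            g_destructors[k](value);
        }
    }
}
```

    An exception handling library could keep the address of its per-thread active exception list under such a key, with the destructor releasing that list when the thread ends.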


    [000203 All] It turns out that some (but not all) Unix implementations provide stubs for some of the pthreads routines in libc or equivalent that, rather than implementing a simplified form of the functionality, return an error code indicating that pthreads is not loaded. A specification such as the above would therefore cause compatibility problems.

    These functions are only used in the exception handling library at Level 3, i.e. they are part of the interface between the system-specific implementation and other system-provided libraries, and do not involve interfaces to either compiled code or other components not under control of the system vendor. Therefore, no specification is needed.

    # Issue Class Status Source Opened Closed
    D-12 Table location lib ps closed all 991216 000504
    Summary: Determine constraints on the location of the unwind table and the unwind information table.
    Resolution: The unwind tables must reside in the text segment they describe.

    [991216 SGI -- Jim] The unwind table consists of triples: a begin and end location bounding the code fragment described by the unwind descriptors, and the location of the unwind information for this fragment. The base psABI states that these are segment-relative offsets, to avoid the need to relocate them at runtime. It also specifies a section type and name for the unwind table, with attribute SHF_ALLOC (but not writable), as well as a segment type, but does not specify the unwind information table section information.
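    The table of triples lends itself to a simple lookup by program counter. The sketch below makes several assumptions not stated in the text: offsets are 64 bits wide, each fragment is the half-open range [start, end), and the table is sorted by start with no overlapping entries. The names UnwindTableEntry and find_unwind_entry are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One triple of the unwind table; all three fields are segment-relative
// offsets, so the table needs no runtime relocation.
struct UnwindTableEntry {
    std::uint64_t start;  // offset of the first byte of the code fragment
    std::uint64_t end;    // offset one past the fragment (assumed half-open)
    std::uint64_t info;   // offset of the unwind information for the fragment
};

// Binary search for the entry covering pc. Assumes the table is sorted by
// start and that entries do not overlap. Returns nullptr if none matches.
const UnwindTableEntry* find_unwind_entry(const UnwindTableEntry* table,
                                          std::size_t count,
                                          std::uint64_t segment_base,
                                          std::uint64_t pc) {
    assert(table != nullptr || count == 0);
    if (pc < segment_base) return nullptr;
    const std::uint64_t offset = pc - segment_base;
    std::size_t lo = 0, hi = count;
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (offset < table[mid].start) {
            hi = mid;
        } else if (offset >= table[mid].end) {
            lo = mid + 1;
        } else {
            return &table[mid];
        }
    }
    return nullptr;
}
```

    The lookup needs the segment base as an input, which is the question this issue turns on: the table alone does not say which segment its offsets are relative to.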

    The psABI specification leaves open the question of how to identify the relevant segments for the unwind table segment-relative entries. There are several possibilities:


    [000120 HP -- Cary] The first bullet you listed is the intended method. Both the unwind table and the unwind info blocks are intended to be in the same segment as the text with which they're associated. Thus, any segment-relative addresses in those tables are understood to refer to locations in the same segment.

    To overcome any limitations that placing info blocks in text might impose, we designed the LTV family of relocations, which allows a link-time virtual address to be placed in an info block without requiring a dynamic relocation; the consumer is expected to be able to calculate from context what segment the LTV address refers to so it can relocate the address on the fly. We also have the LTOFF_FPTR family of relocations, which is needed to identify the personality routine as a gp-relative offset to a linkage table entry that contains the function pointer.
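    The on-the-fly relocation expected of an LTV consumer can be sketched as follows. This is an illustration only: it assumes the consumer has obtained, for each loadable segment, its linker-assigned address, size and actual load address, and that the linker-assigned ranges do not overlap. The names LoadedSegment and relocate_ltv are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// What the consumer is assumed to know about each loadable segment.
struct LoadedSegment {
    std::uint64_t link_base;  // address assigned by the linker
    std::uint64_t size;       // size of the segment in bytes
    std::uint64_t load_base;  // address where the loader actually placed it
};

// Converts a link-time virtual address into a runtime address by finding the
// segment whose link-time range contains it and applying that segment's
// relocation factor. Returns false if no segment contains the address.
bool relocate_ltv(const LoadedSegment* segments, std::size_t count,
                  std::uint64_t ltv_address, std::uint64_t* runtime_address) {
    assert(runtime_address != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        const LoadedSegment& s = segments[i];
        if (ltv_address >= s.link_base && ltv_address - s.link_base < s.size) {
            *runtime_address = s.load_base + (ltv_address - s.link_base);
            return true;
        }
    }
    return false;
}
```

    If two segments had overlapping link-time ranges, the search could not tell which one an address belongs to, which is why the sketch assumes they are disjoint.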

    The advantages to this scheme are that there are no dynamic relocations for any unwind information (except function pointers in the GOT created by LTOFF_FPTR), and that the unwind information does not cause any increase in the application's per-process data space.

    In order to unwind correctly, it's important that there is a one-to-one relationship between text segments and unwind tables. The dynamic loader needs to keep track of these relationships, so that the unwinder can find the appropriate unwind table, given a pc value.

    Instead of a table of triples, there is a PT_UNWIND program header table entry that locates the unwind information for a load module; this entry is intended to refer to a subset of the text segment. It's through this entry that the dynamic loader finds the unwind table.


    [991224 SGI -- Jim] My concern with this comes from the possibility of generating multiple text segments. In such a case, if an implementation wants to put the unwind information in a separate segment from text, there's no longer a trivial way to find the associated text segments for fixup. And although I have no objection to putting these in text today for C++, I'm concerned that a future requirement for C++ or some other language might make it desirable to put them in data. If there's a simple way of making this work, I'd like to pursue it.


    [000126 HP -- Cary]

    Re. multiple text segments...

    Our position is that we would only need more than one text segment in a single load module where we need to establish different access permissions for some text pages than for others. In such a case, we consider them to be separate -- but contiguous -- text segments from the loader's point of view, and a single text segment from the unwinder's point of view. Therefore, we still need only one unwind table per load module.

    This points out the hazy definition of "segment" and "program header table entry" in the ELF specification. Some program header table entries describe segments that are disjoint from all other segments, while others (like PT_DYNAMIC and PT_UNWIND) describe "sub-segments" that are really part of another segment.

    Re. unwind tables in data...

    The performance bigots here would *never* let me put the unwind tables in the data segment. Nevertheless, if some language-specific data really needs to be in data, it can be arranged by putting "LTV" pointers in the language-specific data that point to an auxiliary block of info in the data segment. A much earlier version of our C++ exception handling tables in fact did just that.

    ("LTV" pointers are "link-time virtual" addresses. At link time, an LTV relocation works just like the corresponding DIR relocation, except that no dynamic relocation is generated, so the associated word can be placed in a read-only segment. The consumer of that pointer must, at run time, figure out what segment the link-time virtual address refers to and apply the appropriate relocation factor to the address. The required information can be obtained from the dynamic loader. Note that this scheme requires that the linker-assigned addresses for all of the loadable segments do not overlap.)

    [Jim] Does the ABI require that the segment table be allocated? Easy to find?

    No, but the dynamic loader does have access to it. When we need to find an unwind table, we ask the dynamic loader: given a pc value, its dlmodinfo() entry point locates the load module containing that text segment, and returns a struct load_module_desc, which contains, among other things, a pointer to the unwind table for that load module.


    [991226 SGI -- Jim] An observation, then: in order to make this work, we should specify how to obtain this information in the psABI, unless dlmodinfo() is already standard.


    [000203 All] To understand this issue better, we worked through the EH structures looking at references:

    1. The unwind table is in its own segment, assumed by the HP implementation to be an overlay of .text. They find it, and the associated .text segment by a query to the dynamic linker based on the IP address to be located. They also use linker-defined symbols for the base addresses of text and the unwind table, which of course depend on only having one of each.

      It contains references:

      • Start and end addresses of the text fragment which the entry describes, as an offset from the text base address.
      • Address of the unwind information entry for the fragment, as an offset from a segment assumed to be .text.

    2. The unwind info table is assumed by the HP implementation to be contained in .text, and is referenced from the unwind table via .text-segment-relative offsets.

      It contains references:

      • Unwind descriptor references are relative to the text fragment.
      • A landing pad start pointer (LPStart), at the beginning of the language-specific data area (LSDA), represented as an offset relative to its own location, and therefore assuming that the LSDA is in the same segment as the landing pad if runtime relocation is to be avoided.
      • A type table base pointer, represented as an offset relative to its own location, and therefore assuming that the LSDA is in the same segment as the type table if runtime relocation is to be avoided.
      • Other references, to call sites and landing pads, are represented as offsets relative to the address contained in LPStart.

    3. The type table is assumed by the HP implementation to be contained in .text, and is referenced from the unwind info table via self-relative offsets.

      It contains references:

      • To RTTI records, relative to the GP for the current text fragment. Note that this does not allow for address-only RTTI comparisons, since it does not support preemption. HP uses the RTTI pointer along with another identifier for comparisons.
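    Several of the references above are "self-relative" offsets. A minimal sketch of how such a field could be read and written is given below; the 32-bit field width and the function names are assumptions made for the example, not part of the HP format.

```cpp
#include <cstdint>
#include <cstring>

// Read a 32-bit self-relative offset stored at `field` and return the address
// it designates: the field's own address plus the signed offset. Because the
// result depends only on the distance between the field and its target, no
// run-time relocation is needed as long as both live in the same segment.
// The 32-bit width is an assumption made for this sketch.
inline const unsigned char* decode_self_relative(const unsigned char* field) {
    std::int32_t offset;
    std::memcpy(&offset, field, sizeof offset);  // avoids unaligned access
    return field + offset;
}

// Store a self-relative reference to `target` in the 4 bytes at `field`.
inline void encode_self_relative(unsigned char* field, const unsigned char* target) {
    const std::int32_t offset = static_cast<std::int32_t>(target - field);
    std::memcpy(field, &offset, sizeof offset);
}
```

    Moving the field and its target together (for example, loading the whole segment at a different address) leaves the stored offset valid, which is the property the LSDA relies on.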


    [000323 Inprise -- Eli] I'd like it if we could avoid imposing data structures on the language implementations where possible. I'd particularly like to avoid this in the area of exception handling, as this is a place where different languages need to cohabitate in the process space. That's partly why I was happy to see the functional interface in the C++ exception handling doc that you folks did. My problems with the existing gcc mechanism revolve around the total commitment requirement to the gcc data format, which prevents me from even throwing exceptions past gcc frames without dying unless I fully conform to their data format.

    The updated proposal seems to handle most of my concerns, but I'd still like to see the PC map hidden, so that language implementors can do as they see fit with this. To that end, I'd like to toss out the following additions. Note that these are tentative, based on my fiddling with it just a bit for the past day or so. I'm going to do a prototype to see how it holds together.

    I would like to see the unwind tables registered with the _Unwind library, and referenced only through callbacks, like this:

    typedef __personality_routine
    	(*_Unwind_IPLookupFn) (uint64 IP, void **pImplementationData);
    
    int _Unwind_RegisterIPLookup
    	(_Unwind_IPLookupFn LookupFn, uint64 StartAddr, uint64 EndAddr);
    
    void _Unwind_UnregisterIPLookup (_Unwind_IPLookupFn LookupFn);
    
    

    The first function takes the address of a lookup function which returns a personality and pointer to implementation specific data based on an IP. Start and end addresses are made available so that the _Unwind library can optimize calls to these routines. When an exception is raised, the _Unwind library looks up the current IP by calling these registered procedures. The need for something like this was implied in the Intel Software Conventions and Runtime Architecture Guide, Chap 11 (SCRAG is what I'll call it). Section 11.1.2 says that the dynamic loader needs to provide an API for finding the unwind table. I've just changed the 'ownership' of the data a bit.

    The second function lets you uninstall a lookup function. That's for when you're unloading, and you don't want to leave bad fn pointers floating. Yes, the RTL for the language does have to cooperate, or things can go south a considerable time after a module unloads.

    The personality routine as it is stated in the C++ ABI doesn't have the implementation specific data passed to it. I'd like to add that:

    typedef _Unwind_Reason_Code(*__personality_routine)
    	( int version,
    	  _Unwind_Action actions,
    	  uint64 exceptionClass,
    	  _Unwind_Exception *exceptionObject,
    	  _Unwind_Context *context,
    	  void *ImplementationData );
    
    

    The ImplementationData parameter is the item that is returned by the lookup function that resolves the personality for a given IP.

    Given these changes, the format of most of the unwind data in chapter 11 of the SCRAG becomes mostly advisory (the frame info was already made so by the current document). Chapter 11 could essentially become an appendix implementation that could be used by implementors if they chose, but not forced on them. The other thing that I like about the lookup registering is that it allows implementors to innovate with respect to fast lookup schemes within a loadable module. The current scheme allows for no innovation whatever. I'd prefer that the implementors be left with the option to build as fancy or as simple a scheme for lookups and frame decomposition as possible, depending on the needs of the language.
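    To make the registration idea concrete, here is a rough sketch of the bookkeeping an unwind library might do behind such an interface. It is not the proposed API itself: LookupRegistry, PersonalityFn and IPLookupFn are hypothetical stand-ins, and the personality signature is simplified to a single argument.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Stand-ins for the types in the proposal; the real personality routine has
// the richer signature quoted above.
using PersonalityFn = int (*)(void* implementation_data);
using IPLookupFn = PersonalityFn (*)(std::uint64_t ip, void** implementation_data);

struct LookupRange {
    IPLookupFn lookup;
    std::uint64_t start;  // inclusive
    std::uint64_t end;    // exclusive
};

class LookupRegistry {
public:
    // Returns 0 on success, -1 for a null function or an empty range.
    int register_lookup(IPLookupFn fn, std::uint64_t start, std::uint64_t end) {
        if (fn == nullptr || start >= end) return -1;
        ranges_.push_back({fn, start, end});
        return 0;
    }

    // Remove every range registered with `fn`, e.g. before unloading a module.
    void unregister_lookup(IPLookupFn fn) {
        ranges_.erase(std::remove_if(ranges_.begin(), ranges_.end(),
                                     [fn](const LookupRange& r) { return r.lookup == fn; }),
                      ranges_.end());
    }

    // Ask only the lookups whose registered range covers `ip`.
    PersonalityFn find_personality(std::uint64_t ip, void** implementation_data) const {
        for (const LookupRange& r : ranges_) {
            if (ip >= r.start && ip < r.end) {
                if (PersonalityFn p = r.lookup(ip, implementation_data)) return p;
            }
        }
        return nullptr;
    }

private:
    std::vector<LookupRange> ranges_;
};
```

    The start and end addresses serve only as a filter here, so a lookup function is never called for an IP outside the range it registered. A real implementation would also need locking, since modules can be loaded and unloaded while another thread is unwinding.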


    [000406 All] There was some discussion of Eli's suggestion, centered on the observation that registration might be useful for situations like Java run-time compilation, where neither the unwind tables nor the text they reference exist at startup time. We agreed to go off and consider how we intended to deal with that situation.

    Cary Coutant mentioned in a private conversation that he expects this to be handled by having the Java compiler (for example) register additional unwind tables with the dynamic linker. Since the HP implementation gets the table locations from the dynamic linker, this makes the additions transparent to the unwind library.


    [000406 HP -- Christophe] An interesting observation was raised at today's C++ ABI meeting. Can we dynamically generate unwind tables, for instance from a JIT? We are back to the question of whether the IP->UnwindInfo translation can be done just by looking up tables, or whether we need an API to do it.

    I had a discussion with Laurent Morichetti a few minutes ago. It is unclear at this point whether their unwinding would be based on the unwind library at all (there are alternatives, such as encoding unwind information themselves). But assuming they want to leverage all the code that deals with the RSE and all that magic, they need to have a way to be compatible with the unwind library.

    Today, the unwind library uses dlmodinfo to find the start of the code segment for the current IP (and a predefined symbol in the case of archive-bound executables). From there, it can find the start of the unwind table, and from there do a binary search on the IP to find the unwind info block.
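    The final step, the binary search on the IP, might look roughly like the sketch below. The UnwindEntry layout is simplified and the function name is hypothetical; locating the table itself (via dlmodinfo or a predefined symbol) is not shown.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One unwind table entry, simplified: offsets are relative to the start of
// the text segment, as described for the HP implementation above.
struct UnwindEntry {
    std::uint64_t start;        // first byte of the described text fragment
    std::uint64_t end;          // one past the last byte of the fragment
    std::uint64_t info_offset;  // where the unwind info block lives
};

// Find the entry covering `ip`. `table` must be sorted by `start` and its
// fragments must not overlap. Returns nullptr if no fragment contains `ip`.
const UnwindEntry* find_unwind_entry(const std::vector<UnwindEntry>& table,
                                     std::uint64_t text_base, std::uint64_t ip) {
    if (ip < text_base) return nullptr;
    const std::uint64_t offset = ip - text_base;
    // First entry whose start is greater than the offset; the candidate is
    // the entry just before it.
    auto it = std::upper_bound(table.begin(), table.end(), offset,
                               [](std::uint64_t value, const UnwindEntry& e) {
                                   return value < e.start;
                               });
    if (it == table.begin()) return nullptr;
    --it;
    return offset < it->end ? &*it : nullptr;
}
```

    The search is only valid within one table, which is why the design depends on first mapping the IP to a single load module.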

    The JVM could be compatible with this black magic by having a way to tell dld what to return for the newly created code segment. I don't think there is a public dld interface to do that, and it creates a rather obscure and difficult to document dependency between the JVM, the unwind library and dld.

    Alternatively, we could have a couple of APIs to do IP->UnwindInfo translation, and to register a new range of text and provide the corresponding unwind info pointer. In that scheme, the actual location of the unwind table would become irrelevant.

    Also note that in addition to Java support, an implementation of Dynamo for IA64 would probably have a similar problem.


    [000502 SGI -- Jim] Unfortunately, though I'm not real happy with forcing the unwind tables into the text segment being described, and believe that we could avoid that restriction without significant complications, I think the current scheme is workable for mainstream systems, and I suspect that changing it at this point will encounter more resistance than we can overcome. So without a groundswell of support for a more general scheme, we should probably close this with the current approach.


    [000504 All] Agreed as suggested. That is, the unwind table and descriptors are to be generated in the same text segment as the code to which they refer. The dynamic linker (ld.so) can find it via the PT_IA_64_UNWIND program header entry, and should provide an internal implementation-defined interface to the unwind library to map a PC to the associated unwind table, which is outside the scope of this C++ ABI.

    To deal with applications that create code and unwind information dynamically (e.g. Java JITs), the base ABI should define an interface by which the application can register a new code/unwind data pair with ld.so. This issue has been submitted to the psABI group.

    # Issue Class Status Source Opened Closed
    D-13 _Unwind_ForcedUnwind lib ps closed all 991216 000120
    Summary: Define the interface of _Unwind_ForcedUnwind.
    Resolution: See the exception handling specification.

    [000106 All] Coleen will send a description of their thread cancellation mechanism.

    [000120 All] Close with minor modifications. Christophe will send a thread cancellation example writeup.

    # Issue Class Status Source Opened Closed
    D-14 __cxa_begin/end_catch lib closed all 991216 001109
    Summary: Define the interfaces of __cxa_begin_catch and __cxa_end_catch.
    Resolution: See the exception handling specification.

    [991216 All] Define how __cxa_begin_catch and __cxa_end_catch identify the thrown exception.

    [991216 Compaq - Coleen] If you need to clean up more than one live exception from a catch handler, don't you need a 'count' parameter to __cxa_end_catch? In this case, you destroy both X and Y objects (whether or not they're both on the stack, or just X is).

    Our equivalent of end_catch has a count parameter which is set to the number of live exception objects to delete and is used for branching out of the nested catch clause (not by rethrow).

    struct X {
       X() {} ~X() {} };
    struct Y {
       Y() {} ~Y() {} };
    int main()
    {
      try {
        throw X();
      } catch (X x) {
        try {
            throw Y();
        } catch(...) {
            // both X and Y are live here:
            // generates __cxa_end_catch(/*levels=*/2)
            return 1;
        }
      }
      return 0;
    }
    


    [991217 HP -- Christophe] The reason __cxa_end_catch does not need the exception argument is that the exceptions it is interested in are in the "caught stack". When you rethrow, the exception you rethrow is also on this caught stack (it is indeed the top of the stack). So you don't need a separate copy or argument.

    All you need is a flag set by __rethrow, saying "this top exception is the one being just rethrown". In that case, when __end_catch finds that the exception exits its last catch block, it will not delete it. Instead, the exception will just be popped from the stack. As a result, the exception being rethrown remains on the caught stack until you exit the last catch that caught it, and then becomes referred to only through the exception object passed in the runtime (that is, it becomes similar to a new exception being thrown: it does not appear in the caught stack.) This is the "stack + 1" model I mentioned...

    __begin_catch clears the flag, in case you catch the rethrown exception before exiting the last catch handler.

    This mechanism is actually correctly specified in the description of __cxa_end_catch (see in particular the last bullet):

    Upon exit from the handler by any means, the epilogue calls __cxa_end_catch(), which:

    What is unclear, though, is the fact that __rethrow needs to pass a flag to __end_catch for that purpose, and also that the flag is stored in the high bit of the handlerCount (which is why it did not appear in the specification...).
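    The interaction between the caught stack, the handler count and the rethrow flag can be illustrated with a toy model. Everything below is a sketch for discussion only: the names are invented, the state is not per-thread, and "destroying" the exception is reduced to setting a flag.

```cpp
#include <cstdint>
#include <vector>

// Toy model only: the real runtime keeps this state in per-thread data and in
// the exception header. All names here are illustrative.
struct ToyException {
    std::uint32_t handler_count = 0;  // low bits: active handlers; high bit: rethrow flag
    bool destroyed = false;
};

constexpr std::uint32_t kRethrowFlag = 0x80000000u;

struct ToyRuntime {
    std::vector<ToyException*> caught;  // the "caught stack", top at back()

    void begin_catch(ToyException* e) {
        e->handler_count &= ~kRethrowFlag;             // caught again: clear the flag
        if (e->handler_count++ == 0) caught.push_back(e);
    }

    // Mark the top exception as "being rethrown".
    void rethrow() {
        if (!caught.empty()) caught.back()->handler_count |= kRethrowFlag;
    }

    void end_catch() {
        if (caught.empty()) return;
        ToyException* e = caught.back();
        const bool rethrown = (e->handler_count & kRethrowFlag) != 0;
        const std::uint32_t active = e->handler_count & ~kRethrowFlag;
        if (active == 0) return;
        const std::uint32_t remaining = active - 1;
        e->handler_count = remaining | (rethrown ? kRethrowFlag : 0u);
        if (remaining == 0) {
            caught.pop_back();                   // leaves the caught stack...
            if (!rethrown) e->destroyed = true;  // ...and is deleted unless rethrown
        }
    }
};
```

    In this model a rethrown exception leaves the caught stack when its last handler exits but is not destroyed. It is destroyed only after some later handler catches it and exits normally.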


    [000112 editor] Does this mean that the specification on pg. 16 of the HP document is the desired definition?

    [000126 editor] The answer to the above question is yes. This issue is effectively closed, but I will not close it officially until the working paper reflects the clarifications in the email discussion.

    [001109 Editor] These routines are specified adequately in the Exception ABI document.

    # Issue Class Status Source Opened Closed
    D-15 Terminate handler and threads lib ps closed all 991216 000106
    Summary: Define how the terminate and unexpected handler registration interacts with threads.
    Resolution: Handler registration applies to all threads.

    [991216 All] C++ allows the user to register terminate() and unexpected() handlers, but does not specify how the registration interacts with threading. There are (at least) three possibilities:

    Several members believe the second choice (per-thread) would be very surprising to many users and is therefore a highly undesirable default.

    [000106 All] Handler registration is global, applying to all threads. It is observed that the global handler can be programmed to do thread-specific processing, e.g. by keying off a per-thread datum, but that many users would find it very surprising if the registration only worked for the calling thread.
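    As an illustration of the observation that a global handler can still do thread-specific processing, here is a small sketch that keys off a thread-local datum. The variable and function names are made up for the example.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <string>

// Per-thread datum the single, process-wide handler keys off. The name and
// the idea of a textual "role" are assumptions made for this sketch.
thread_local const char* t_thread_role = "unnamed thread";

// Build the thread-specific part of the report; split out so that it can be
// exercised without terminating the process.
std::string describe_terminating_thread() {
    return std::string("terminate called on ") + t_thread_role;
}

// One handler for all threads; it behaves differently per thread only
// through the thread-local datum it reads.
[[noreturn]] void global_terminate_handler() {
    std::fputs(describe_terminating_thread().c_str(), stderr);
    std::fputc('\n', stderr);
    std::abort();
}

// Registration is process-wide: it affects every thread, not just the caller.
inline void install_global_handler() {
    std::set_terminate(global_terminate_handler);
}
```

    Each thread would set t_thread_role when it starts; the handler itself is installed once.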

    # Issue Class Status Source Opened Closed
    D-16 Exception specifications lib ps closed all 991216 000113
    Summary: How is the type list for an exception specification represented in the action records?
    Resolution: As specified in the HP document

    [991216 All] The working paper specifies this, but HP wishes to propose a different representation.

    [000106 All] Christophe believes the submitted version may actually be the desired one. He will attempt to determine this, and others should look at it closely to determine whether it has a large combinatorial impact on the compiler.

    [000113 All] No one has identified a problem with the proposal in the HP document. Close this issue, and it can be reopened if a problem surfaces.

    # Issue Class Status Source Opened Closed
    D-17 bad_cast, bad_typeid runtime call closed CodeSourcery 000629 000706
    Summary: Define runtime support routines for throwing bad_cast and bad_typeid exceptions.
    Resolution: Accepted as proposed originally. See draft EH Specification.

    [000629 CodeSourcery -- Mark] Both EDG and G++ call run-time library routines to throw the bad_cast and bad_typeid exceptions, rather than trying to expand the throws inline. This is much more convenient since those exceptions can be thrown without the headers declaring bad_cast being included. I think we should follow this existing practice and provide appropriate entry points. How about:

      extern "C" void __cxa_bad_cast ();
      extern "C" void __cxa_bad_typeid ();
    

    [000629 CodeSourcery -- Nathan] FYI, the G++ declarations are

            extern "C" void *__throw_bad_cast ();
            extern "C" std::type_info const &__throw_bad_typeid ();
    
    Of course these never actually return, but it causes least confusion at the calling point by keeping the type system consistent. These are called with something like the following pseudo C++ for dynamic_cast (lvalue)
            (void *tmp = __dynamic_cast (...),
                    *(T*)(tmp ? tmp : __throw_bad_cast ()))
    
    for typeid (*ptr):
            (ptr ? *(type_info const *)ptr->vtable[-1] : __throw_bad_typeid ())
    

    One side of a conditional expression can be void, but only if it is a throw expression. Wrapping up the throws in function calls hides that, and in g++'s case caused problems. The easiest solution was the above declarations.

    I suggest the following:

            extern "C" void *__cxa_bad_cast ();
            extern "C" const void *__cxa_bad_typeid ();
    
    That typeid signature will mean a little reworking of the typeid operator implementation for G++, but not too much. For implementations where Mark's suggestion is valid, these will be too, but not vice-versa.

    [000629 CodeSourcery -- Mark] That's a reasonable suggestion, too. With a `void' return, you can always do:

      (__cxa_bad_cast (), (void*) NULL)
    
    or whatever, in the compiler, to make the arms of the conditional have the right type.

    [000706 All] Accepted as originally proposed by Mark, without return types. The decision is intended to not burden the routines with dummy returns, since callers with ?: operators can use casts to achieve the desired result.
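    A small sketch of the cast trick mentioned in the decision follows. The routine demo_bad_cast() is a hypothetical stand-in for the runtime entry point (the real one is supplied by the runtime library), and checked_ref_cast() only approximates what a compiler might emit for a reference dynamic_cast.

```cpp
#include <typeinfo>

// Hypothetical stand-in for the runtime entry point discussed above; the
// real routine lives in the C++ runtime library. It returns void, as in the
// accepted proposal, and never actually returns.
[[noreturn]] void demo_bad_cast() { throw std::bad_cast(); }

struct Base { virtual ~Base() {} };
struct Derived : Base { int value = 42; };

// Roughly what a compiler could generate for dynamic_cast<Derived&>(b):
// the comma expression gives the "throwing" arm the type void*, so both
// arms of ?: agree even though demo_bad_cast() returns void.
Derived& checked_ref_cast(Base& b) {
    void* tmp = dynamic_cast<Derived*>(&b);
    return *static_cast<Derived*>(tmp ? tmp : (demo_bad_cast(), static_cast<void*>(nullptr)));
}
```

    The null pointer in the second arm is never dereferenced, because the call before the comma does not return.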

    # Issue Class Status Source Opened Closed
    D-18 __cxa_throw_type_info lib closed all 001012 001109
    Summary: Should we replace the __cxa_throw_type_info pointer in the exception object by a pair of pointers to a std::type_info and a destructor?
    Resolution: Make the replacement. See Sections 2.2.1 and 2.4.3 of the draft EH Specification.

    [001012 all] Making this type be a pair (type_info and destructor pointers) makes it necessary that a thrower or __cxa_throw construct one so that the exception object can point to it. This can't be done on the stack, since it's about to be unwound, and doing it on the heap when the exception might be out-of-memory doesn't seem ideal.

    We propose that instead, we replace the __cxa_throw_type_info pointer in the exception object header by separate std::type_info and destructor pointers, and pass them as two parameters to __cxa_throw.

    We also noticed that, if the thrown object is an array, the destructor passed will need to be a fabricated one which loops over the array elements. The alternative, to store the array bounds explicitly in the exception object, seems to be a lot of overhead for a very rare case.
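    The two ideas above, separate type_info and destructor pointers plus a fabricated destructor for arrays, can be sketched as follows. The header layout and all names are illustrative assumptions; the EH specification defines the real exception object.

```cpp
#include <cstddef>
#include <typeinfo>

// Illustrative header layout: two separate pointers instead of one pointer to
// a combined record. Field and type names are assumptions, not the layout
// from the EH specification.
struct ToyExceptionHeader {
    const std::type_info* thrown_type;
    void (*destructor)(void*);
};

// Fabricated destructor for a thrown array T[N]: destroys the elements in
// reverse order of construction, so no bound needs to be stored at run time.
template <class T, std::size_t N>
void destroy_array(void* object) {
    T* elements = static_cast<T*>(object);
    for (std::size_t i = N; i > 0; --i) {
        elements[i - 1].~T();
    }
}

// Fill in the header the way a throw site might, passing the two pointers
// separately (compare the two parameters proposed for __cxa_throw).
inline void init_header(ToyExceptionHeader& header, const std::type_info& type,
                        void (*destructor)(void*)) {
    header.thrown_type = &type;
    header.destructor = destructor;
}
```

    Because both pointers refer to objects with static storage duration, nothing has to be allocated on the stack or the heap at the throw site.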

    [001109 all] The interface change will be made.


    E. Template Instantiation Model Issues

    # Issue Class Status Source Opened Closed
    E-1 When does instantiation occur? tools closed SGI 990520 000511
    Summary: There are two principal models for instantiation. The early instantiation (or Borland) model performs all instantiation at compile time, potentially resulting in extra copies which are removed at link time. The pre-link instantiation model identifies the required instantiations prior to linking and instantiates them via a special compile step.
    Resolution: Non-export templates are instantiated where referenced in COMDAT groups. See the Draft C++ ABI for IA-64.


    [000511 All] Non-export templates are instantiated where referenced in COMDAT groups. We will not deal with export templates at this time (E-2).

    # Issue Class Status Source Opened Closed
    E-3 Template repository tools closed HP 990603 000511
    Summary: Independent of the template instantiation model, we need to make sure that whatever template persistent storage is used by one vendor does not interact negatively with other vendors' mechanisms. Issues: (1) Avoiding conflict on the name of any repository. (2) If .o files are used, describe how this information is to be preserved, ignored, etc. (3) Evaluate if tools such as make, ld, ar, or others, can break because .o files get written at unexpected times.
    Resolution: COMDAT emission and naming for non-export templates is specified in the Draft C++ ABI for IA-64.


    [000511 All] Treatment is specified now for non-export templates; we will not deal with export templates at this time, given no existing implementations to serve as models.


    F. Name Mangling Issues

    # Issue Class Status Source Opened Closed
    F-1 Mangling convention call closed SGI 990520 000330
    Summary: What rules shall be used for mangling names, i.e. for encoding the information other than the source-level object name necessary to resolve overloading?
    Resolution: See the Draft C++ ABI for IA-64.

    [991019/28 various] The following is assembled from several mail messages on the subject.

    Objectives of the mangling scheme include:

    • Compression: It is critical that name length be minimized (issue F-2).

    • Character set: Names should use a character set that does not cause problems in linkers (easy for ELF) or in assemblers (more problematic). This probably implies use of ~64 characters.

    • Legibility: It is desirable that the base name (i.e. the function or class name) be present and easy to identify (for readability). Other components of the name will probably have to be harder to read in order to attain compression (issue F-2).

    • Cfront: It is desirable that the names not be confusable with cfront manglings, to avoid apparent but incomplete compatibility with cfront-compiled objects.

    Entities with linkable names to be resolved include:

    • Global and member operator names
    • Global and member function names
    • Alternate versions of constructors/destructors.
    • Namespace scope variables
    • Static local variables
    • Static data members
    • Virtual function table names (primary and initialization)
    • RTTI structures (std::type_info derivations)
    • Template instances of the above
    • Namespace effects on the above (including anonymous namespaces)
    • Possibly string constants

    For entities with C name linkage, the entity's linkable name is identical to its base name (as usual).

    Note that linkable names include not only names with C++ global scope semantics, but also "local" names which for some reason end up requiring linker resolution (e.g. static local variables declared in inline functions). Note also that inlining requirements apply equally to functions declared inline and those chosen to be inlined by the compiler.

    Name decomposition for function-like entities:

    For function-like entities with C++ name linkage, the following components MUST be part of the name:

    • encoding of the base name (presumably, the base name itself)
    • encoding of the declarative scope (classes and namespaces), when applicable
    • encoding of each parameter type (with known positions)
    • encoding of each template argument and the parameter with which it is associated, when applicable

    [ For the last item, consider:

        template<class T1, class T2> void f(T1, T2);
        template<class T1, class T2> void f(T2, T1);
    
    The encoding of each of these templates instantiated for <int, int> should be distinct. ]

    In addition, it may be desirable to encode the following components:

    • the function's return type
    • the function's exception specifications
    (Combined with the parameter types, this encodes the type of the function. Note that even though exception specifications are not considered part of the function type in the C++ standard, they actually are.)

    Name decomposition for data entities:

    Namespace scope variables and static data members have linkable names that must include at least:

    • encoding of the base name (presumably, the base name itself)
    • encoding of the declarative scope (classes and namespaces), when applicable
    In addition, it may be desirable to encode:
    • the variable's type (possibly including exception specifications)

    Note that although there are benefits to encoding array size, and therefore being able to catch mismatches, the ability to declare a[] makes this problematic.

    Fundamental types and type operators:

    fundamental types:

    • void
    • [signed|unsigned] {char, short, int, long} (long long?, int_t)
    • bool
    • float, double, (long double?)
    • wchar_t
    • ellipsis (not strictly a type)
    • complex

    type modifiers/constructors:

    • const, volatile (restrict?)
    • array (with size?) of type
    • pointer to type
    • reference to type
    • function expecting type*, returning type
    • pointer to member function of type, expecting type*, returning type
    • pointer to member of type T, having type U (i.e. "U T::*")

    The types in parentheses are available in C99, but not in standard C++.


    [991021 all] It was observed in the meeting that it might be better to deal with non-essential type information (e.g. exception specifications, array sizes) as a separate construct to allow error detection, rather than as a required part of the mangled name. This allows it to be elided or removed if unneeded.

    [991028 all] Objectives of a specification were discussed, and have been added to the writeup above.

    [000127 IBM -- Mark] [Ed.]: Mark raises the issue of how template expression parameters are mangled. The Standard requires that equivalent expressions be identified, but not all functionally equivalent ones. The relevant paragraph is 14.5.5.1. Don't lose this issue.

    [000127 All] Notes from the meeting:

    • One prefix should be enough, say _Z (General Structure). This would facilitate future revisions, which could be indicated by changing the prefix.

    • Compression should address more than just types, e.g. other names.

    • A number of pre-compressed abbreviations should be defined, e.g. for std, string, allocator, etc.

    • String constants and static variables in inlined functions can be handled by using the function mangling plus a sequence ID.


    [000210 All -- Matt] Notes from the meeting:

    We have agreed that local statics and local classes must be mangled. We agreed that string literals should also be mangled even if linker features might make it unnecessary. The motivation is a desire to support less capable linkers on other platforms.

    For local statics and local classes, the mangled name consists of the mangled function name, a sequence number, and the name of the local class/variable. For string literals the mangled name consists only of the mangled function name and the sequence number.

    (There was concern that this might prevent merging of identical string literals. Jason believes that given a smart linker it will just result in multiple names for the same string literal.)

    Sequence numbers are assigned in lexical order within a function, starting at 1. The entities that receive sequence numbers are local static variables, local classes, and string literals. Other entities (e.g. automatic variables) do not receive or affect sequence numbers.

    Exception specification information must be part of the mangled name of a function.

    Special entities that need to receive mangled names, in addition to those mentioned in Daveed's document:

    • Vtables (which should contain mangled name of complete type)
    • Construction vtables probably do not require mangled names, and table of vtables probably doesn't either. Daveed will reserve prefixes for them just in case.
    • If we are using the comdat proxy method for class typeinfos, then both class typeinfo objects and the comdat proxies must be given mangled names. We must ensure that the comdat proxy for an incomplete class is the same as the one for a complete class with the same name, and we must ensure that the typeinfo object for an incomplete class is different from that of any complete class. (Either that, or make all typeinfo objects for incomplete classes static.)

    Exported templates may require other things to be mangled. We don't have a detailed analysis.

    We discussed the idea of having a small dictionary of well known names, so that mangled names could be shorter. Jason was concerned with readability of mangled names if we had too many things in this dictionary, and Daveed was concerned that a large dictionary wouldn't give enough of a space savings because an index would take too many bits. If we have such a dictionary it will have very few names in it. Some obvious candidates are:

    
      std
      std::char_traits<char>
      std::allocator<char>
      std::basic_string<char, std::char_traits<char>, std::allocator<char> >
    


    [000215 HU-Berlin -- Martin] (Re: sequence numbers for statics in inline functions.)

    The C99 standard defines an implicit variable inside of each function:

    static const char __func__[]="function-name";
    Even though this is not part of standard C++, it is likely that C++ compilers will support this if the 'corresponding' C compiler supports it. If so, it might be useful to support it in the ABI.

    Proposal: The sequence number of __func__ is 0.

    Of course, there is always discussion about what the value of __func__ is in a C++ context; I think this does not necessarily need to be defined by the ABI (nor does the question of whether __func__ is defined at all: if it is not used in a function, it does not matter).


    [000217 Editor] Note that the current mangling proposal is now part of the Draft C++ ABI for IA-64.


    [000308 All] Several loose ends were discussed (primarily vtable-related). Jason will do a YACC description to check for ambiguity.

    [000313 SGI -- Jim] I have reworked the description in the Draft C++ ABI for IA-64, to get a more precise grammar description, and to incorporate the loose ends decisions from the meeting and proposals for a few more.

    [000316 All] Extensive discussions in the meeting, reflected in the updated Draft C++ ABI for IA-64.

    [000323 All] Extensive discussions in the meeting, reflected in the updated Draft C++ ABI for IA-64. The principal decisions were:

    • CV-qualifiers must be ordered.
    • Substitution numbering starts from zero, and is base 36 using upper-case letters.
    • Use "S_" for repeated substitutions.
    • Substitution candidates are added to dictionary only once.
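    The base-36 numbering can be sketched as below. Only the digit alphabet (0-9, then upper-case A-Z) is taken from the decision above; the way the number is wrapped in "S" and "_" in substitution_token() is an assumption for illustration, and the Draft C++ ABI for IA-64 remains the authority on the exact token forms.

```cpp
#include <string>

// Encode a non-negative substitution index in base 36 using the digits 0-9
// followed by the upper-case letters A-Z, most significant digit first.
std::string encode_base36(unsigned value) {
    static const char kDigits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    std::string out;
    do {
        out.insert(out.begin(), kDigits[value % 36]);
        value /= 36;
    } while (value != 0);
    return out;
}

// Wrap the index in the "S ... _" delimiters. The exact delimiter form used
// here is an assumption for illustration; the Draft ABI is authoritative.
std::string substitution_token(unsigned index) {
    return "S" + encode_base36(index) + "_";
}
```

    With this alphabet a single character covers 36 candidates and two characters cover 1296, which keeps back-references short even in heavily templated names.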

    [000330 All] Change virtual thunk mangling to encode static offset to nearest virtual derived class. Encode single void parameter type for parameterless functions, to facilitate demangling distinction from data objects. Use object name for named entities, hash for strings, in mangling local names, to minimize implementation mistakes.

    # Issue Class Status Source Opened Closed
    F-2 Mangled name size call g closed SGI 990520 000511
    Summary: Name mangling schemes to date typically produce very long names. SGI routinely encounters multi-kilobyte names, and increasing usage of namespaces and templates will make them worse. This has a negative impact on object file size, and on linker speed.

    SGI has considered solutions to this problem including modified string tables and/or symbol tables to eliminate redundancy. Cygnus, HP, and Sun have also considered or implemented approaches which at least mitigate it.

    Resolution: The current mangling solution is considered an adequate solution to this problem.

    [991028 all] Cygnus and Sun use a mangling scheme which has proven extremely effective at compression, but not overly complex. Each time the mangler incorporates a type into a name, it remembers it and assigns it a number, and subsequent occurrences of the type in the name are replaced by the (escaped) number. Jason believes this might be adequate compression, without going to large character sets or more complex schemes.
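    A toy version of the remember-and-number idea might look as follows. This is only a sketch: the type encodings are passed in as opaque strings, the "S<n>_" escape with a decimal number is a placeholder, and compress_types is a hypothetical name. It is not the Cygnus or Sun implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Concatenate type encodings, replacing every repeated type by a numbered
// back-reference to its first occurrence.
std::string compress_types(const std::vector<std::string>& types) {
    std::map<std::string, std::size_t> seen;  // type encoding -> assigned number
    std::string out;
    for (const std::string& t : types) {
        auto it = seen.find(t);
        if (it != seen.end()) {
            // Already emitted once: write the escaped number instead.
            out += "S" + std::to_string(it->second) + "_";
        } else {
            std::size_t id = seen.size();
            seen.emplace(t, id);
            assert(seen.size() == id + 1);
            out += t;
        }
    }
    return out;
}
```

    With the input {"3Foo", "3Bar", "3Foo", "3Foo"} the sketch yields "3Foo3BarS0_S0_", so each repetition costs three characters regardless of how long the type's own encoding is.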

    [991115 SCO -- Jonathon] In a discussion with Matt Austern I suggested using a collision-resistant hash function on the manglings to generate the names actually used in object files. (The algorithm is: first mangle, then hash.) This could really reduce .o size a ton; think expression templates, etc. I bet this would have a much bigger impact than any obvious compression algorithm; you could just decree that all symbols be no longer than 256 bits long, say. Lots of tools (assemblers, debuggers) will use less space/time dealing with the shorter names. You would keep around a table mapping hashes back to the original mangled names for debugging.

    An interesting twist on this would be to use a secure hash with a key. For ordinary compilation, use some well-known key. But, by setting some flag/environment-variable you could tell the compiler to use a key of your choice. You can now distribute a .o that is hard to link to -- unless you know the key.

    <After a request for clarification...>

    A collision-resistant hash function is a notion from cryptography. (That's the world I spend a lot of my time in when I'm not doing compiler stuff.)

    Suppose you have an n-bit hash, so you have 2^n hash values. A collision-resistant hash is one where the probability of two randomly chosen strings hashing to the same value is (very close to) 1/(2^n). A stronger notion of this is that finding strings that collide is computationally infeasible.

    Certainly, hashing introduces a probabilistic nature to things: it becomes possible that two different functions could hash to the same hash-mangled name. However, by choosing a good hash function (and provably good ones exist) and enough bits, you can make it considerably less likely that in the next hundred years two distinct functions will hash to the same name, than that cosmic rays will cause unpredictable linker errors.
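    As a rough worked bound for the claim above, assume the hash behaves like a uniformly random n-bit function (an idealization, not a property of any particular hash). For k distinct mangled names, the union bound over all pairs gives:

```latex
\[
  P(\text{some collision})
  \;\le\; \binom{k}{2}\, 2^{-n}
  \;=\; \frac{k(k-1)}{2^{\,n+1}} .
\]
% Example: k = 10^6 < 2^{20} names and n = 256 bits give
\[
  P(\text{some collision}) \;<\; \frac{2^{40}}{2^{257}} \;=\; 2^{-217} .
\]
```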

    ... this (the assumption that mangling is reversible, as the basis for such things as the c++filt tool) is the biggest objection I can think of.

    We originally came up with this idea for our C++-to-C translator. We ship this to people with embedded systems whose linkers only support 16 characters; by using a collision-resistant hash they can use C++. Nobody has ever run into a collision. We solved the c++-filt problem by keeping a database mapping hashes back to mangled names. (The probabilistic guarantee says that this database can actually be global; in our lifetime we will never see two things with the same hash.) So, it's still possible to make a c++-filt that works, but it is admittedly more difficult.

    The biggest advantage to this scheme is that you can put an upper bound on symbol lengths, even in the presence of truly huge template usage. (I've seen programs where mangled names approached a megabyte in length.) I would only suggest hashing long names; names under 100 characters, or even a thousand characters, say, could be left unhashed.
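    The "mangle, then hash only the long names" idea, together with the table mapping hashes back to mangled names, could be sketched as below. Everything here is illustrative: SymbolShortener and the "_H" prefix are hypothetical, and 64-bit FNV-1a is used only as a compact stand-in. FNV-1a is not a collision-resistant hash in the cryptographic sense; a real implementation of this proposal would substitute one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>

// 64-bit FNV-1a. Stand-in only: NOT collision-resistant.
std::uint64_t fnv1a64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ull;                  // FNV prime
    }
    return h;
}

class SymbolShortener {
public:
    explicit SymbolShortener(std::size_t threshold) : threshold_(threshold) {}

    // Names shorter than the threshold are left alone; longer ones are
    // replaced by a fixed-length hashed symbol and remembered in the table.
    std::string shorten(const std::string& mangled) {
        if (mangled.size() < threshold_) return mangled;
        char buf[32];
        std::snprintf(buf, sizeof buf, "_H%016llx",
                      static_cast<unsigned long long>(fnv1a64(mangled)));
        auto inserted = table_.emplace(buf, mangled);
        // If the symbol was already present it must map to the same name;
        // otherwise two distinct names have collided.
        assert(inserted.first->second == mangled);
        return buf;
    }

    // Reverse lookup for a c++filt-style tool; null if the symbol is unknown.
    const std::string* original(const std::string& symbol) const {
        auto it = table_.find(symbol);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::size_t threshold_;
    std::unordered_map<std::string, std::string> table_;
};
```

    With a threshold of 100, a 200-character mangled name becomes an 18-character symbol, and original() recovers the full name from the table.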


    [000504 All] Alex Samuels has mangling almost done, and will provide data on before/after sizes of library symbols.


    [000511 HU-Berlin -- Martin] I finally managed to remangle the set of names that Matt Austern kindly provided. Please take my results with a grain of salt:

    • I just finished the script that remangles the names, it probably still has some errors.

    • I currently don't have a demangler that works with the latest mangling scheme, to verify my results

    • I've started with the pretty-printed (demangled) list of names as input, not with the original EDG-mangled names, because I did not want to invest time in understanding that scheme.

      As a result, some of these names come out wrong. In particular, if template parameters appear in the signature, I use the substituted parameters instead of the formal ones (i.e. I never use ). Also, for the same reason, I never put the return type into template functions.

      I've produced a table showing how the size of EDG-mangled names relates to the new names. For each length of an old name, it shows how often a certain new length appeared. E.g. for

      89 : 71(2x) 72(5x)

      there were a total of 7 names with 89 characters in Matt's list. Under the new mangling, 2 of them are now 71 characters, and 5 are 72 characters in size.
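      For readers who want to process the table mechanically, the row format just described can be parsed with a few lines of code. This is a sketch; Row and parse_row are hypothetical names, and the "INCREASED" marker that appears in one row is treated as an annotation and skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Row {
    int old_length = 0;
    std::vector<std::pair<int, int>> entries;  // (new length, number of names)

    // Total number of names that had this old length.
    int total() const {
        int n = 0;
        for (const auto& e : entries) n += e.second;
        return n;
    }
};

// Parse one row such as "89 : 71(2x) 72(5x)". An entry without "(Nx)"
// counts as a single name.
Row parse_row(const std::string& line) {
    Row row;
    std::istringstream in(line);
    char colon = 0;
    in >> row.old_length >> colon;
    assert(colon == ':');
    std::string tok;
    while (in >> tok) {
        if (tok == "INCREASED") continue;  // annotation, not a length
        std::size_t paren = tok.find('(');
        int new_length = std::stoi(tok.substr(0, paren));
        int count = 1;
        if (paren != std::string::npos)
            count = std::stoi(tok.substr(paren + 1));  // stoi stops at 'x'
        row.entries.emplace_back(new_length, count);
    }
    return row;
}
```

      Applied to the example row, the sketch reports an old length of 89 and a total of 7 names.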

      In general, all names under the new mangling are shorter than under EDG's mangling, with a single exception (listed on top). For short names (<80 chars), size reduction is small, unless one of the predefined dictionary entries is used. For longer names (>200 chars), compression under the new ABI is about 50% better than under the EDG scheme.

      If you find errors in my implementation that could be corrected from looking at the demangled names, please let me know; I can then produce corrected statistics.

      
      51 : 43(18x) 44(10x) 27
      52 : 45(30x) 44(7x) 43(8x) 50(6x)
      53 : 47 46(12x) 45(18x) 44(8x) 51(2x) 50(8x)
      54 : 47(32x) 46(10x) 45(2x) 53 48
      55 : 47(19x) 46(16x) 53 41 48(21x)
      56 : 47 48(12x)
      57 : 55 44 51 50(10x) 48(4x)
      58 : 38 50(7x) 56
      59 : 47(2x)
      60 : 47 38 51(8x) 59
      61 : 55
      62 : 54 53(16x) 50 65 INCREASED 56
      63 : 51(2x)
      64 : 63 52(2x)
      65 : 54(2x) 44 50(2x) 52
      66 : 55(3x) 65
      67 : 57 56
      68 : 49 11 58(2x) 57(2x) 56
      69 : 47(6x) 12(3x) 59 58(3x) 57 55(4x) 50(4x) 9
      70 : 13(2x) 60(2x) 51(3x) 56(2x) 48(3x)
      71 : 14(4x) 52(3x) 59(2x) 57 56
      72 : 15 14 53(2x) 60(2x) 57
      73 : 63 62(2x) 58(2x) 54(7x) 53(2x) 15
      74 : 59(6x) 55(6x) 54(3x) 69 66 64(2x)
      75 : 63 60(3x) 57(2x) 56(4x) 70 18(2x)
      76 : 63 62(2x) 61(2x) 58 57(5x) 55(2x) 64
      77 : 59(2x) 62(4x) 66
      78 : 63(2x) 68 57 66(2x) 60(2x) 64(2x)
      79 : 78 61(3x) 62 67(2x) 65(2x)
      80 : 63(2x) 62(11x) 69(2x) 66
      81 : 63(3x) 62(2x) 61 58(2x) 23(8x) 54 64(2x)
      82 : 23(4x) 69 68 26(2x) 64 24
      83 : 71 78 69 27 66(4x) 65(2x)
      84 : 55(3x) 73 67(4x) 66 65(2x)
      85 : 63(2x) 69(4x) 65(2x)
      86 : 68(8x)
      87 : 65
      88 : 70(2x)
      89 : 71(2x) 74 73(4x)
      90 : 68 75 74(2x) 73 72(2x)
      91 : 24 74 73(2x) 64
      92 : 77(2x) 76(6x)
      93 : 78(2x) 77(4x) 76(2x) 11 41
      94 : 79(4x) 77(2x) 80(2x)
      95 : 79(2x) 65(2x)
      96 : 75(2x) 73
      97 : 67 68(4x)
      98 : 14 69(4x) 84 83(2x) 56
      99 : 15(4x) 45(2x) 60 27(3x) 83 70(4x) 67
      100 : 68(3x)
      101 : 17 68(4x) 59 82(2x) 19 49
      102 : 63(2x) 70 60(2x) 17
      103 : 71(2x) 70(2x) 18 64(3x)
      104 : 78(2x) 86(8x) 21 89
      105 : 86(8x) 85(2x) 67 90 64
      106 : 54 24
      107 : 91 88(2x)
      108 : 87(4x) 92
      109 : 87 74 88(2x)
      110 : 94(2x) 27(2x) 26 89(4x)
      111 : 95(2x) 28 27 73 89(4x)
      112 : 29 97(2x)
      113 : 98
      114 : 31 30(2x) 93(6x)
      115 : 31(4x) 33
      116 : 95(8x) 101
      117 : 95(8x) 103
      118 : 97
      119 : 36
      120 : 31 95
      122 : 38(2x)
      124 : 74
      125 : 109(2x)
      126 : 110 42(2x) 52
      128 : 72(2x) 77 108(2x) 44(4x) 112
      129 : 33 44(5x) 113(2x) 65 73(2x)
      130 : 47 110(4x) 45 75 115(2x) 114(2x)
      131 : 51 116 115(3x)
      132 : 47 74 56 72 53 116 82(3x) 117(3x)
      133 : 83 118 117
      134 : 119(3x) 118(3x)
      135 : 119(2x) 51 120(4x)
      136 : 50 121(3x)
      137 : 122(5x) 105
      138 : 123(4x) 106(2x)
      139 : 124(4x)
      140 : 125(5x) 65(2x)
      141 : 126(2x)
      142 : 127 110 44(2x)
      143 : 128
      146 : 94
      148 : 96
      149 : 52
      150 : 55 60
      152 : 70(2x) 122
      154 : 56
      157 : 68
      160 : 70
      162 : 55 69
      169 : 126
      171 : 130
      174 : 72(2x)
      176 : 74(2x)
      178 : 75(2x)
      180 : 78(2x)
      185 : 61
      186 : 71
      187 : 83
      188 : 71 70
      191 : 74
      192 : 75(2x) 89(8x)
      193 : 89(8x)
      194 : 97(2x) 108
      196 : 109
      197 : 101(2x)
      202 : 95(2x) 150
      215 : 106 48
      218 : 121
      220 : 106
      226 : 132
      228 : 133
      232 : 108
      234 : 111 109
      235 : 139
      240 : 116
      242 : 117
      243 : 119
      250 : 145
      251 : 143 128
      264 : 111
      267 : 163
      268 : 133
      278 : 88
      280 : 98 93(2x) 113
      282 : 132
      283 : 101 116
      285 : 151
      288 : 130
      303 : 143
      305 : 144 100
      308 : 148
      330 : 159
      333 : 133
      342 : 133 148
      347 : 177
      355 : 101
      530 : 161
      

      # Issue Class Status Source Opened Closed
      F-5 ILP32 vs. LP64 call closed HP 000210 000824
      Summary: This ABI focusses on the LP64 data model. What should we do (if anything) to support (a) compatibility between different vendors' ILP32 compilers (b) compatibility between ILP32 and LP64?
      Resolution: Withdrawn -- no action.

      [000210 All -- Matt] HP will be supporting an ilp32 model as well as an lp64 model. The ABI only discusses an lp64 model. Do we want to support ilp32 in any way? What will we have to do to support (a) compatibility between different vendors' ilp32 compilers, or (b) compatibility between ilp32 and lp64? HP has suggested, for example, modifying the mangling scheme so that long long in ilp32 is mangled the same way as long in lp64. Is this enough to ensure ilp32/lp64 link compatibility, or would we need to make many other changes as well?

      [000217 All] The group observed that one can prevent all incorrect linkage by using a different version prefix for LP64 and ILP32 mangling. Christophe would prefer to just mangle those types that are different differently, so as not to prevent linkage when it would work. It is not clear whether mixed models are workable enough to make such a complication useful. Christophe will produce a concrete proposal to discuss once the base mangling is settled enough to base it on.

      # Issue Class Status Source Opened Closed
      F-6 Demangling lib closed Cygnus 000210 000504
      Summary: Users may sometimes want to get demangled names. Should we provide an entry point for calling a demangler?
      Resolution: Provide a simple demangler interface callable from C. See the Draft C++ ABI for IA-64.

      [000210 all -- Matt] Users have access to types' mangled names via the standard type_info class. Users may sometimes want to get demangled names. Should we provide an entry point for calling a demangler? This might be a standalone function, perhaps with an interface like that of EDG's demangle(), or it might be some kind of type_info extension. If we do this, should we attempt to specify exactly what demangled names look like, or should we explicitly leave it unspecified and warn users not to depend on the exact format?

      [000321 HU-Berlin -- Martin] Suggestion:

        namespace abi {
          std::string demangle_mangled_name (const char*);	// <mangled-name>
          std::string demangle_type (const char*);		// <type>
        }
      

      [000330 all] The problem with the suggested interface is that using std::string requires sucking in half the standard library. An alternative proposed is that the user pass in a buffer, with a NULL pointer causing the routine to allocate storage. Christophe also volunteered to send the HP interface, though it is a bit heavyweight.

      [000330 HP -- Christophe] Here is the interface HP offers today. As I said, it seems overly complicated, compared to what Matt proposed. On the plus side, it has handling of erroneous input, which I believe we need to define.

      class TDemangler {
        
      public:
        void * operator new(size_t size) {
          return (void*)malloc(size);
        }
      
        void operator delete(void *deadObject) {
          free(deadObject);
        }
      
        TDemangler();
        TDemangler(const char *mangledDecl);
        ~TDemangler();
      
        enum Status { OK, Empty, Error, Truncated };
          
        void reset();
        Status getStatus() const { return status; }
        Status demangleDecl(const char *mangledDecl);
        Status demangleType(const char *mangledType);
        Status copy(char *result, size_t maxToCopy /*including null*/) const;
        Status copy(char *result, size_t maxToCopy /*including null*/,
                    char *name, size_t nameLength) const;
          
      private:
        Status status;
        const char *p;
        const char *end;
        void partial(bool top, bool typeOfExternalDecl = false);
        void typeName(size_t &baseOffset, size_t &baseLength);
        void templateArgs();
        void writePrefix(const char *text, size_t length);
        void writeSuffix(const char *text, size_t length);
        void writeDuplicate(unsigned offset, unsigned length);
        void writeBaseName(const char *baseName, size_t baseNameLength,
                           size_t classNameOffset, size_t classNameLength);
        enum Spacing { Before, None, After };
        void writeQualifiers(const char *cv, Spacing spacing);
        size_t extractCount();
        void demangleDecl();
      
        char *buffer;
        size_t bufferSize;
        enum { InternalBufferSize = 200 };
        char internalBuffer[InternalBufferSize];
        size_t nameSize;
        size_t prefixSize;
        size_t suffixSize;
        bool spaceBeforeName;
        void makeAvailable(size_t length);
        void merge();
        static size_t min(size_t a, size_t b) { return a < b ? a : b; }
      };
      

      [000406 all] There was some discussion of the desirability of making the demangler a class member. Christophe believes it would thereby become easier to derive from it, e.g. to tailor output. Others believe it would add unnecessary complication; one particular concern is that it be callable from C. Christophe and Matt will send specific proposals.

      It was observed that Martin's suggestion of two functions is unnecessary. A name beginning with "_Z" is a <mangled-name>; otherwise it is a type name (if valid).


      [000406 SGI -- Matt] We need to return multiple return values: a status code, and a buffer pointer. We can use an extra level of indirection on one, both, or neither. If neither, we need to return a pair or the moral equivalent.

      ALTERNATIVE A

      namespace abi {
          extern "C" 
          char* __cxa_demangle ( const char* mangled_name,
      			   char* buf, size_t n,
      			   int* status );
      }
      
      

      mangled_name is a null-terminated string with the mangled name. buf is a pointer to a user-provided buffer of at least n characters. If buf is a null pointer then n is ignored, and demangle allocates its own buffer with malloc. The user is responsible for freeing it.

      If the return value is non-null, it points to a null-terminated string with the demangled name. If the return value is null, an error has occurred. *status == 0 means the demangling failed because the buffer wasn't long enough (or because malloc failed). *status == -1 means the demangling failed because mangled_name is invalid.

      Users may pass a null pointer as the last argument to __cxa_demangle. All that means is that, if the demangling fails, they won't be able to find out why.

      ALTERNATIVE B

      namespace abi {
          struct dm {
            char* name;
            enum { buffer_too_small, invalid_name } status;
          };
          dm demangle(const char* mangled_name, char* buf, size_t n);
      }
      
      

      mangled_name is a null-terminated string with the mangled name. buf is a pointer to a user-provided buffer of at least n characters. If buf is a null pointer then n is ignored, and demangle allocates its own buffer with malloc. The user is responsible for freeing it.

      If result.name is non-null, it points to a null-terminated string with the demangled name. If result.name is null, demangling has failed and result.status gives the type of failure.

      DISCUSSION

      I prefer alternative A, even though the error indication is clumsier, because it's callable from C. Having a C-callable demangling interface could come in handy, e.g. for linkers. If we decide that's unimportant, we should go with alternative B.


      [000406 HP -- Christophe]

      ALTERNATIVE C

      Interface:

      namespace abi
      {
      
      struct demangler
      {
              // Provide name to demangle
              void demangle(char *);
      protected:
              // Output demangled characters
              // I don't know whether it is better to output
              // one char or a string... It seems there are
              // many cases where the demangler can put
              // multiple chars at the same time, but they
              // are not zero-terminated (we know the length)
              virtual void output(char c);
      };
      
      }
      
      

      Implementation:

      #include <cxxabi.h>
      #include <iostream>
      
      using namespace std;
      
      void abi::demangler::output(char c)
      {
              cout << c;
      }
      
      


      [000413 All] Most members strongly prefer a C-callable interface. Discussion centered around how to handle memory allocation (user, library, re-allocatable, etc.) and whether options like gcc's (e.g. list parameters or not) are desirable. Matt will consider these and modify his proposal.


      [000427 SGI -- Matt] One thing I promised to do and didn't, though, was to come up with a revised demangler interface. Here it is. It's more complicated than I like, but the complexity does serve a real purpose. Motivation:

      • allow returning an error code
      • interface callable from C
      • allow reusing a buffer between multiple invocations
      • allow resizing a buffer, since there is no way, even in principle, to know how large a buffer to provide.

      namespace abi {
      
        char* __cxa_demangle(const char* mangled_name,
                             char* buf,
                             size_t* n,
                             int* status);
      
      }
      
      

      mangled_name is a pointer to a null-terminated array of characters.

      buf may be null. If it is non-null, then n must also be nonnull, and buf is a pointer to an array, of at least *n characters, that was allocated using malloc.

      status points to an int that's used as an error indicator. It is permitted to be null, in which case the user just doesn't get any detailed error information.

      Behavior: the return value is a pointer to a null-terminated array of characters, the demangled name. If there is an error in demangling, the return value is a null pointer. The user can examine *status to find out what kind of error it is. Meaning of error indications:

      • 0: success
      • -1: memory allocation failure
      • -2: invalid mangled name
      • -3: invalid arguments (e.g. buf nonnull and n null)

      Memory management:

      • If buf is a null pointer, __cxa_demangle allocates a new buffer with malloc. It stores the size of the buffer in *n, if n is nonnull.
      • If buf is not a null pointer, it must have been allocated with malloc. If the array turns out to be too small, __cxa_demangle may use realloc to increase its size. The new size will be stored in *n.
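      A caller of this interface might wrap it as follows. The sketch assumes a toolchain that ships <cxxabi.h> with abi::__cxa_demangle in the form described above (GCC and Clang do); the wrapper name demangle is hypothetical. It passes null for both buf and n, so the routine allocates the result with malloc and the caller frees it.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <cxxabi.h>

// Demangle `mangled`. On failure, return an empty string; the status code
// (0, -1, -2 or -3 as listed above) is stored in *status_out if non-null.
std::string demangle(const char* mangled, int* status_out = nullptr) {
    int status = 0;
    // buf == null and n == null: the routine allocates the buffer itself.
    char* result = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status_out != nullptr) *status_out = status;
    if (result == nullptr) return std::string();
    assert(status == 0);
    std::string text(result);
    std::free(result);  // allocated with malloc, so released with free
    return text;
}
```

      For example, demangle("_Z3foov") yields "foo()", while a truncated name such as "_Z1" yields an empty string with status -2.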


      [000504 All] Accept Matt's latest proposal.

      # Issue Class Status Source Opened Closed
      F-7 Mangling statics call closed HP 000223 000504
      Summary: What, if anything, should we do about mangling the names of objects in static functions in case a compiler chooses to inline them?
      Resolution: Local objects are mangled with the name of the containing function followed by a discriminator, consisting of the object name and possibly a sequence ID. Strings are mangled with a discriminator consisting of "s" followed by a sequence ID. See the Draft C++ ABI for IA-64.
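      As an illustration of this resolution, the sketch below assembles local names in the shape used by the later published ABI text: "_Z", then "Z", the encoding of the containing function, "E", and the entity name (or "s" for a string literal). The helper names are hypothetical, the function encoding is supplied by the caller, and sequence-ID discriminators for repeated names are deliberately left out.

```cpp
#include <cassert>
#include <string>

// <source-name>: decimal length followed by the identifier.
std::string source_name(const std::string& id) {
    assert(!id.empty());
    return std::to_string(id.size()) + id;
}

// Name of a local object inside the function with the given encoding,
// e.g. "1fv" for `void f()`.
std::string local_object_name(const std::string& function_encoding,
                              const std::string& object) {
    return "_ZZ" + function_encoding + "E" + source_name(object);
}

// Name of the first string literal inside that function.
std::string local_string_name(const std::string& function_encoding) {
    return "_ZZ" + function_encoding + "Es";
}
```

      Under these assumptions, a static local x in `void f()` comes out as _ZZ1fvE1x.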

      # Issue Class Status Source Opened Closed
      F-8 Identifiers with unicode letters call closed HU-Berlin 000323 000413
      Summary: How should we mangle names containing unicode letters?
      Resolution: Follow the underlying C ABI.

      [000323 HU-Berlin -- Martin] 2.2, [lex.charset]/2, allows usage of universal-character-names in C++ programs, especially in identifiers and strings. How do we mangle the variable pi below?

         namespace newmath {
            const long double \u03A0 = 3.14159265358979;
         }
      
      

      This is also an issue for C99, so it may be that the base ABI has a specification; we'd have to follow that at least for extern "C" names. If not, I propose that such names are encoded in UTF-8.

      [000405 Cygnus -- Jason] UTF-8 is inappropriate for mangled names, as it uses values > 127 to encode non-ASCII characters.

      GNU Java encodes names in UTF-8 internally. For the mangled name, if there are non-ASCII characters, it adds a 'U' to the beginning and encodes each such UCS-2 character as _%04x. See gcc/java/mangle.c.

      This assumes that all interesting characters fall within the Basic Multilingual Plane (the low 16 bits); that is a valid assumption for us, since all the extended characters valid for use in C++ identifiers are part of the BMP.
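      The escaping described here can be sketched in a few lines. This follows only the description above (a leading 'U' when any non-ASCII character is present, and _%04x for each such UCS-2 character); it is not taken from gcc/java/mangle.c, the function name escape_identifier is hypothetical, and any special handling of literal underscores is not modeled.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Escape a UCS-2 identifier into pure ASCII.
std::string escape_identifier(const std::u16string& id) {
    std::string body;
    bool has_non_ascii = false;
    for (char16_t c : id) {
        if (c < 0x80) {
            body += static_cast<char>(c);  // ASCII passes through unchanged
        } else {
            has_non_ascii = true;
            char buf[8];
            int n = std::snprintf(buf, sizeof buf, "_%04x",
                                  static_cast<unsigned>(c));
            assert(n == 5);                // '_' plus four hex digits
            body += buf;
        }
    }
    return has_non_ascii ? "U" + body : body;
}
```

      For the identifier \u03A0 from the example, the sketch produces U_03a0; a plain ASCII identifier is returned unchanged.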

      [000411 HU-Berlin -- Martin] Why is [UTF-8] not appropriate? AFAICT, the gABI has no restriction in that respect. ch4.strtab.html says

      String table sections hold null-terminated character sequences, commonly called strings.

      I can see there are a number of alternatives. I think it is important that there is agreement on the rules, in a way that is also interoperable with C99 implementations. What those rules are is not that important.

      GNU Java encodes names in UTF-8 internally. For the mangled name, if there are non-ASCII characters, it adds a 'U' to the beginning and encodes each such UCS-2 character as _%04x. See gcc/java/mangle.c.

      In the C++ ABI, the natural adaptation of that approach would be to mangle non-ASCII-containing identifiers as _U instead of _Z, right? Unfortunately, that does not give a solution for C names. I believe the GNU Java approach also cannot be extended to C99.

      [000413 All] We need to follow the underlying C ABI. Names containing unicode letters after mangling according to our normal mangling rules will be encoded as required for external names by the C ABI.


      [000504 All] Agreed that only function and member function template parameters are mangled with T*_. Jim will go back to single nested name grammar, and include auxiliary symbols (e.g. RTTI) for builtin types.

      # Issue Class Status Source Opened Closed
      F-9 Strings with unicode letters call closed HU-Berlin 000323 000413
      Summary: How should we handle the object file representation of narrow and wide string literals containing unicode letters?
      Resolution: Follow the underlying C ABI.

      [000323 HU-Berlin -- Martin] 2.2, [lex.charset]/2, allows usage of universal-character-names in C++ programs, especially in identifiers and strings. Consider the example:

          wchar_t MvL[]=L"Martin von L\u00F6wis";
      

      First, what is sizeof(wchar_t) in the base ABI? I'll assume 4 for the moment. Then, the question comes down to: What is the execution character set, and the wide execution character set? 2.2/3 says they are implementation-defined, so I guess we must define them. Typically, people expect this to be a run-time setting (which is a reasonable assumption), but it kind-of breaks for string literals.

      Proposal: The wide execution character set is UCS-4. The execution-character-set is "as-is", i.e. bytes from the source character set are copied unmodified to the object file. Universal-character-names appearing in narrow (i.e. char) strings are not portable in this ABI (the other alternatives would be to say they are Latin-1, or encoded as UTF-8, I guess).

      [000405 Cygnus -- Jason] I have been told that it is inappropriate to assume that wchar_t is always UCS-4; a suggestion was to convert from UCS-4 to the host locale character set using iconv(), and then if we're in a wide string, convert to wchar_t with mbtowc(). This makes sense to me, though of course it requires iconv to know about UCS-4.

      [000413 All] We need to follow the underlying C ABI. Strings containing unicode letters will be encoded as required by the C ABI.

      # Issue Class Status Source Opened Closed
      F-10 Mangling function return types call closed all 000330 000413
      Summary: Should we always mangle the return type of a function?
      Resolution: No. It is mangled only for template instantiations/specializations.


      [000504 All] See the comment for issue F-3.

      # Issue Class Status Source Opened Closed
      F-11 Hash for local strings call closed all 000330 000504
      Summary: How should we hash strings for local name mangling?
      Resolution: Strings are mangled with a discriminator consisting of "s" followed by a sequence ID. See the Draft C++ ABI for IA-64.

      [000406 All] One suggestion is to go back to the collision-resistant hash suggested by Mark in November in another context. The relevant source code is attached as fingerprint.h and fingerprint.c .

      [991119 CodeSourcery -- Mark] I was asked to provide a little more information on collision-free hashing algorithms. I've appended our source to do this in our C++-to-C translator. The hash function here was originally used in Modula-3; it is provably collision-resistant. This version uses 64 bits; the algorithm can be extended to any bit length, however.

      Even for 64 bits, the probabilistic guarantee (details at Compaq research) ensures that (for example), the chance of getting a collision with a thousand mangled names of length a thousand is less than one in a billion.

      At CenterLine, we used this algorithm to compute type fingerprints to detect ODR mismatches at link-time. The same trick could be used to see whether all definitions of an inline function are really the same. It's better to use a collision-resistant hash (like this one) than an ad-hoc hash because the math actually guarantees nice properties.

      Other examples of collision-free hashes are "secure hashes", i.e., those designed to resist an adversary's ability to create a text with a given hash, or to find collisions. Well-known examples include SHA and MD5.


      [000504 All] We will use the simpler scheme of the function name followed by a discriminator consisting of "s" followed by a sequence number.

      [000413 All] No. It requires more space, it can be done external to the mangling, and the group is uncomfortable with the potential breakage.


      G. Miscellaneous Issues

      # Issue Class Status Source Opened Closed
      G-1 Basic command line options tools closed HP 990603 000824
      Summary: Can we agree on basic command line options (compiler and linker) for fundamental functionality, possibly allowing portable makefiles?
      Resolution: Withdrawn -- no action.

      # Issue Class Status Source Opened Closed
      G-2 Detection of ODR violations call closed Sun 990603 000504
      Summary: [Sun] (See also F-3.)
      Resolution: This is a duplicate. See F-3, F-4, F-10.

      # Issue Class Status Source Opened Closed
      G-3 Inlined routine linkage call closed Sun 990603 991202
      Summary: Inline routines with external linkage require a method of handling vague linkage (see B-5 for definition) for the out-of-line instance, as well as for any static data they contain. The latter includes string constants per [7.1.2]/4.
      Resolution: Out-of-line instances are emitted where required, using COMDAT (issue B-5). Static data referenced will be placed in COMDAT sections as well. The names of each are addressed as part of mangling (issue F-1). Strings will be emitted in SHT_MERGE/SHT_STRING sections, with the static linker responsible for removing duplicates.

      [990624 Cygnus -- Jason] How should we handle local static variables in inlines? G++ currently avoids this issue by suppressing inlining of functions with local statics. If we don't want to do that, we'll need to specify a mangling for the statics, and handle multiple copies like we do above.

      [990721 Cygnus -- Jason] [We should emit inline routines] in translation units where an out-of-line copy is needed. I am opposed to emitting the inlines with the vtable, for two reasons:

      • One of our users defines a proxy class whose implementation is not exported from the shared library where it is defined; the API for the class consists of virtual functions, accessible through the vtable, and inline functions. They complained that since g++ currently emits inlines along with the vtable, their code would only link if inlining was enabled.

      • Often, we will need no copies of an inline function.

      [991118 All] We discussed linkage of static locals in inline functions. The C++ standard requires that there be only a single object in the entire program, i.e. the static locals in different translation units must be merged. Two cases: string literals and everything else. "Everything else" is believed to be a rare and unimportant case. We'll just give the static locals mangled names, and put them in comdat groups. String literals are believed to be common, and mangled names in COMDAT is too heavyweight. The base ABI provides an optional mechanism for merging all copies of a given string literal. We would like to make this mechanism mandatory, so that string literals in inline functions get merged automatically.

      [991202 All] The use of the new SHT_MERGE/SHT_STRING attributes, requiring the static linker to do the merging, was decided to be a suitable solution. It was noted that this will not provide merging across DSOs, but this is not considered a problem. An implementation may overcome this by naming the strings and invoking dynamic linker name preemption, at the cost of additional dynamic link time.

      # Issue Class Status Source Opened Closed
      G-4 Dynamic init of local static objects and multithreading call closed SCO 990607 001109
      Summary: The Standard requires that local static objects with dynamic constructors be initialized exactly once, the first time the containing scope is entered. Multi-threading renders the simple check of a flag before initialization inadequate to prevent multiple initialization. Should the ABI require locking for this purpose, and if so, what are the necessary interfaces? In addition to the locking of the initialization, special exception handling treatment is required to deal with an exception during construction.
      Resolution: The ABI will specify an 8-byte guard variable, with one byte used for the initialization flag, and the others available for use by a threading package for locking. ABI routines are specified for acquiring and releasing the lock. See ABI section 3.3.2.
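      The guarded-initialization pattern in this resolution can be modeled in portable C++ as follows. This is a sketch under stated assumptions, not the ABI routines themselves: Guard, guard_acquire, guard_release and guard_abort are hypothetical stand-ins, a single process-wide recursive mutex plays the role of the threading package's lock, and get_i() shows roughly what a compiler would emit for `static int i = init();`.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Stand-in for the 8-byte guard variable: byte 0 is the "initialized" flag,
// the remaining bytes are left for a threading package.
struct alignas(8) Guard {
    std::atomic<unsigned char> initialized{0};
    unsigned char reserved[7] = {};
};
static_assert(sizeof(Guard) == 8, "guard variable is expected to be 8 bytes");

// Assumption: one process-wide lock serializes all guarded initializations.
std::recursive_mutex init_lock;

// Returns true if the caller must run the initializer; the lock is then held.
bool guard_acquire(Guard& g) {
    if (g.initialized.load(std::memory_order_acquire)) return false;  // fast path
    init_lock.lock();
    if (g.initialized.load(std::memory_order_relaxed)) {  // another thread won
        init_lock.unlock();
        return false;
    }
    return true;
}

// Initialization completed: publish the flag, then drop the lock.
void guard_release(Guard& g) {
    assert(g.initialized.load(std::memory_order_relaxed) == 0);
    g.initialized.store(1, std::memory_order_release);
    init_lock.unlock();
}

// Initialization threw: leave the flag clear so it is retried next time.
void guard_abort(Guard&) { init_lock.unlock(); }

int init_count = 0;
int init() { return ++init_count; }

Guard i_guard;  // guard for the local static modeled below
int i_value;    // storage for the local static

// Model of: int get_i() { static int i = init(); return i; }
int get_i() {
    if (guard_acquire(i_guard)) {
        try {
            i_value = init();
        } catch (...) {
            guard_abort(i_guard);
            throw;
        }
        guard_release(i_guard);
    }
    return i_value;
}
```

      However many times get_i() is called, and from however many threads, init() runs once in this model unless it exits by throwing, in which case the next caller tries again.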


      [990607 SCO -- Jonathan] The standard is mute on multiple threads of control in general, so there is no requirement in the language to support what I'm talking about. But as a practical matter compilers have to do it (Watcom gave a paper on their approach during the standardization process, if I remember). This example using UI/SVR4 threads will usually show whether a compiler does it or not:

      
      thr5.C:
      // static local initialization and threads
      
      #include <stdlib.h>
      #define EXIT(a) exit(a)
      #define THR_EXIT() thr_exit(0)
      
      #include <thread.h>
      
      int init_count = 0;
      int start_count = 0;
      
      int init()
      {
        
              ::thr_yield();
              return ++init_count;
      }
      
      void* start(void* s)
      {
        
              start_count++;
              static int i = init();
              if (i != 1) EXIT(5);
              THR_EXIT();
              return 0;
      }
                      
      int main()
      {
        
              thread_t t1, t2;
              if (::thr_create(0, 0, start, 0, 0L, &t1) != 0) EXIT(1);
              if (::thr_create(0, 0, start, 0, 0L, &t2) != 0) EXIT(2);
              if (::thr_join(t1, 0, 0) != 0) EXIT(3);
              if (::thr_join(t2, 0, 0) != 0) EXIT(4);
              if (start_count != 2)
                      EXIT(6);
              if (init_count != 1)
                      EXIT(7);
              THR_EXIT();
      }
      

      When compiled with CC -Kthread thr5.C on UnixWare 7, for instance, it passes by returning 0. When compiled with CC -mt thr5.C on Solaris/x86 C++ 4.2 (sorry don't have the latest version!), it fails by returning 5.
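
      For readers without UI/SVR4 threads, the same check can be approximated with std::thread. This is only a sketch: it postdates this discussion, the namespace and function names are invented for illustration, and a result of 0 shows only that the implementation under test serializes the initialization (as C++11 and later require), not how it does so.

```cpp
// Portable restatement of the thr5.C idea using std::thread (C++11).
// All names in this namespace are illustrative, not part of the original test.
#include <atomic>
#include <cassert>
#include <thread>

namespace thr5_sketch {

std::atomic<int> init_count{0};
std::atomic<int> start_count{0};
std::atomic<bool> bad_value_seen{false};

// Initializer that yields first, widening the window in which a second
// thread can reach the declaration while initialization is in progress.
int init() {
    std::this_thread::yield();
    return ++init_count;
}

void start() {
    ++start_count;
    static int i = init();          // the local static under test
    if (i != 1) bad_value_seen = true;
}

// Returns 0 on success; 5, 6 and 7 mirror the EXIT codes of the original.
// Meant to be called once per process, since the static is initialized once.
int run() {
    assert(start_count == 0);
    std::thread t1(start);
    std::thread t2(start);
    t1.join();
    t2.join();
    if (bad_value_seen) return 5;
    if (start_count != 2) return 6;
    if (init_count != 1) return 7;
    return 0;
}

} // namespace thr5_sketch
```

      Unlike the original, the counters are atomic so that the test itself has no data race; only the static local's initialization is being probed.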


      [990607 Sun -- Mike Ball] As far as I can tell, the language says that the automatic blocking issue isn't a valid approach. It says what has to happen, and it isn't that.

      If you look at the entire statement you find that it reads: "Otherwise such an object is initialized the first time control passes through its declaration; such an object is considered initialized upon the completion of its initialization. If the initialization exits by throwing an exception, the initialization is not complete, so it will be tried again the next time control enters the declaration. If control re-enters the declaration (recursively) while the object is being initialized, the behavior is undefined."

      The word "recursively" is normative, so eliminates that sentence from consideration.

      One can, of course, make any extension to the language, but in this case I think the extension invalidates some otherwise valid code.

      The sentence I'm referring to is that the object is considered initialized upon the completion of its initialization. This is explicit, and the reason for it is covered in the following sentence, which discusses an initialization that terminates with an exception. A person catching such an exception has the right to try again without danger that the static variable will be initialized in the meantime.

      I don't see anything at all to justify semantics that say, "after initialization is started, any other threads of control are blocked until that thread completes the initialization, unless, of course, it exits by an exception, in which case the other thread can do the initialization before the exception handler gets a chance to try again, except...." Take an attempt to define the semantics as far as you like.

      The problem is that there is no way for the compiler writer to know what the programmer really wanted to do. I can (and will at some other date, if necessary) come up with scenarios justifying a variety of mutual exclusion policies, including none.

      The solution is to let the programmer write the mutual exclusion, the same as we do for every other potential race condition. It's a real mess, and, I claim, an unwise one to put in as an extension.


      [990608 HP -- Christophe] The semantics currently implemented in the HP aC++ compiler is as follows:

      • No two threads can enter a static initialization at the same time.
      • Threads are blocked until immediately after the static initialization either succeeds or fails with an exception.

      There are details of our implementation that I disagree with, but in general, the semantics seem clear and sane, not as convoluted as you seemed to imply. In particular, it correctly covers the case where the static initialization fails with an exception. Any thread at that point can attempt the initialization.


      [990608 SCO -- Jonathan] Here's what the SCO UnixWare 7 C++ compiler does for IA-32, from a (slightly sanitized) design document. It meets Jim's goal of having no overhead for non-threaded programs and minimal overhead for threaded programs unless actual contention occurs (infrequent), and meets Mike's goal of handling exceptions in the initialization correctly (although it doesn't guarantee that the thread getting the exception is the one that gets next crack at initializing the static). It's also worth noting that dynamic initialization of local variables (static or otherwise) is very common in C++, since that's what most object constructions involve, so I don't think this case is as rare as Jim does.

      [...] This is in local static variables with dynamic initialization, where the compiler generates a static one-time flag to guard the initialization. Two threads could read the flag as zero before either of them set it, resulting in multiple initializations.

      [...] Accordingly, when compilation is done with -Kthread on, a code sequence will be generated to lock this initialization. [...] the basic idea is to have one guard saying whether the initialization is done (so that multiple initializations do not occur) and have another guard saying whether initialization is in progress (so that a second thread doesn't access what it thinks is an initialized value before the first thread has finished the initialization). [...]

      When compiled with -Kthread, the generated code for a dynamic initialization of a local static variable will look like the following. guard is a local static boolean, initialized to zero, generated by the [middle pass of the compiler]. Two bits of it are used: the low-order 'done bit' and the next-low-order 'busy bit'.

      
      .again:
              movl    $guard,%eax
              testl   $1,(%eax)       // test the done bit
              jnz     .done           // if set, variable is initialized, done
              lock; btsl  $1,(%eax)   // test and set the busy bit
              jc      .busy
              < init code >           // not busy, do the initialization
              movl    $guard,%eax
              movl    $3,(%eax)       // set the done bit (busy bit stays set)
              jmp     .done
      .busy:
              pushl   %eax            // call RTS routine to wait, passing address
              call    __static_init_wait      // of guard to monitor
              testl   %eax,%eax       // 1 means exception occurred in init code,
              popl    %ecx
              jnz     .again          // start the whole thing over
      .done:                          // 0 means wait finished
      

      The above code will work for position-independent code as well. The complication due to exceptions is: what happens if the initialization code throws an exception? The [compiler] EH tables will have set up a special region and flag in their region table to detect this situation, along with a pointer to the guard variable. Because the initialization never completed, when the RTS sees that it is cleaning up from such a region, it will reset the guard variable back to both zeroes. This will free up a busy-waiting thread, if any, or will reset everything for the next thread that calls the function.

      The idea of the __static_init_wait() RTS routine is to monitor the value of guard bits passed in, by looping on this decision table:

      
          done    busy
          0       0       return 1 in %eax        (EH wipe-out)
          1       1       return 0 in %eax        (no longer busy)
          0       1       continue to wait        (still busy)
          1       0       internal error, shouldn't happen
      

      As for how the wait is done [... not relevant for ABI, although currently we're using thr_yield(), which may or may not be right for this context].
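
      The done/busy protocol above can be restated in portable C++ with std::atomic. This is a sketch under assumptions, not the SCO implementation: the names sco_sketch, guarded_init and static_init_wait are hypothetical, fetch_or stands in for the lock; btsl instruction, a try/catch stands in for the EH-table region, and the wait uses a yield loop just as the text says the current implementation does.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <thread>
#include <vector>

namespace sco_sketch {

constexpr unsigned kDone = 1u;  // low-order "done bit"
constexpr unsigned kBusy = 2u;  // next-low-order "busy bit"

// Analogue of __static_init_wait: loop on the decision table.
// Returns 1 if a failed initialization wiped the guard (caller retries),
// 0 if the initialization finished.
int static_init_wait(std::atomic<unsigned>& guard) {
    for (;;) {
        unsigned g = guard.load(std::memory_order_acquire);
        if (g == 0) return 1;                 // done=0 busy=0: EH wipe-out
        if (g == (kDone | kBusy)) return 0;   // done=1 busy=1: no longer busy
        assert(g == kBusy && "done without busy: internal error");
        std::this_thread::yield();            // done=0 busy=1: still busy
    }
}

// Analogue of the generated sequence around "< init code >".
template <typename Init>
void guarded_init(std::atomic<unsigned>& guard, Init init) {
    for (;;) {                                                   // .again
        if (guard.load(std::memory_order_acquire) & kDone) return;
        // Atomically set the busy bit and learn its previous value.
        unsigned old = guard.fetch_or(kBusy, std::memory_order_acq_rel);
        if (old & kBusy) {                                       // .busy
            if (static_init_wait(guard) == 0) return;
            continue;                          // wiped: start over
        }
        try {
            init();
        } catch (...) {
            guard.store(0, std::memory_order_release);  // reset both bits
            throw;
        }
        guard.store(kDone | kBusy, std::memory_order_release);   // movl $3
        return;
    }
}

// Several threads race to run one initializer; returns how many times the
// initializer actually ran.
int demo_race(int nthreads) {
    std::atomic<unsigned> guard{0};
    std::atomic<int> runs{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back([&] {
            guarded_init(guard, [&] { std::this_thread::yield(); ++runs; });
        });
    for (auto& th : pool) th.join();
    return runs.load();
}

// The first attempt throws, so the guard is wiped and a later attempt
// succeeds; returns the total number of initializer invocations.
int demo_retry() {
    std::atomic<unsigned> guard{0};
    int calls = 0;
    auto flaky = [&] {
        if (++calls == 1) throw std::runtime_error("init failed");
    };
    try { guarded_init(guard, flaky); } catch (const std::runtime_error&) {}
    guarded_init(guard, flaky);   // retried: guard was reset to 0
    guarded_init(guard, flaky);   // no-op: done bit is set
    return calls;
}

} // namespace sco_sketch
```

      As in the design described above, nothing in this sketch guarantees that the thread which saw the exception is the one that gets the next attempt.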


      [990608 SGI -- Hans] I'd like to make some claims about function scope static constructor calls in multithreaded environments. I personally can't recall ever having used such a construct, which somewhat substantiates my claims, but also implies some lack of certainty. I'd be interested in hearing any arguments to the contrary.

      I believe that these arguments imply that this problem is not important enough to warrant added ABI complexity or overhead for sequential code.

      Consider the following skeletal example:

      f(int x) { static foo a(...); ... }

      1. If the constructor argument doesn't depend on the function parameter, and the code behaves reasonably, it should be possible to rewrite this as

        static foo a(...);
        f(int x) { ... }

      2. If I read the standard correctly (and that's a big disclaimer), the compiler is entitled to perform the above transformation under conditions that are usually true, but hard for the compiler to deduce. Thus code that relies on the initialization occurring during the execution of f is usually broken.

      3. Thus the foo constructor cannot rely on its caller holding any locks. It must explicitly acquire any locks it needs.

      4. It is far preferable to write the transformed form with a file scope static variable to start with. The initial form risks deadlock, since f may be called with locks held which the constructor can't assume are held. If it needs one of those locks it will need to reacquire it. With default mutex semantics that results in deadlock with itself. (If locks may be reentered, it may fail in a more subtle manner since the foo constructor may acquire a monitor lock whose monitor invariant doesn't hold.)

      5. File scope static constructor calls aren't a problem and require no locking, since they are executed in a single thread before main is called or before dlopen returns. (Forking a thread in a static constructor should probably be disallowed. Threads may not have been fully initialized, among other issues.)

      6. Static function scope constructor calls which depend on function arguments are likely to involve a race condition anyway, if multiple instances of the function can be invoked concurrently. Any of the calls might determine the constructor parameters. Thus these aren't very interesting either. And if they are really needed, they can be replaced with a file scope static constructor call plus an assignment.


      [990607 SCO -- Jonathan] Hans' argument breaks such local statics into two groups: those that don't depend upon the function's parameters, and those that do. For the latter group, he says:

      
      > 6) Static function scope constructor calls which depend on function
      > arguments are likely to involve a race condition anyway, if multiple
      > instances of the function can be invoked concurrently.  Any of the
      > calls might determine the constructor parameters.  Thus these aren't
      > very interesting either.  And if they are really needed, they can be
      > replaced with a file scope static constructor call plus an assignment.
      

      I don't agree with these claims. There are sometimes situations where a group of objects is being processed, and you want to arbitrarily pick one of them to serve as an identifier or key for all of them. Consider perhaps a golf course scheduler, which is taking in players and assigning them to foursomes. You want to name each foursome by one of the names of the players (it doesn't matter which one), such as the "Jones group" or the "Smith group". A natural way to program this might be:

      
            void build_foursome(string golfer) {
      	  static string group_name(golfer);
      	  // process golfer into group group_name ...
            }
      

      Now if the golfers being scheduled are coming from four different databases, it might be that a thread is running to extract from each database. Thus build_foursome() might be called concurrently. That's fine, and there is no need for application-level locks in either the caller or this function; we don't care which golfer the group is named after. We just want the 'static' to work correctly; what we don't want is a double initialization, with two different group names being generated for golfers in the same group, which is possible if the guard code isn't thread-safe.

      Now one can say that this kind of design isn't wise, or that locks will probably be needed later in this function to do the rest of the processing, or that this can be coded in several other ways. And that may all be so. But I think this usage is *reasonable* in this context, and that as implementors we should get it right. [Editorial: Especially with the advent of Java, threaded application programming is becoming more the norm; and language implementations that dodge the challenge and say that thread support is solely the job of libraries, may not be looked upon kindly by users.]


      [000511 All] The ABI will not specify special multi-threading behavior. Note that the initialization guard variable (Issue C-14) is specified with size 8 bytes, with only 1 used, so an implementation is free to make arbitrary use of the other 7 for the suggested purpose, with the consequence that initializations from multiple copies (e.g. from inlining) could be inconsistent across implementations.


      [000706 All] Reopen this issue and attempt to define an API for those implementations that do want to do a thread-safe version. Jim has added a proposed API to the Draft ABI document.


      [000706 HP -- Christophe] The current HP implementation does not use a release, and has a more specialized routine. This would be something like:

          extern "C" void __cxa_allocate_static(
      	    bool *flag,
      	    void *object_address,
      	    void (*object_dtor)(void *object));
      
      

      The calling sequence for:

          static X x;
      
      
      becomes:
          static bool static_x_flag;
          static X x;
          if (!static_x_flag)
      	    __cxa_allocate_static(&static_x_flag,
      				  &x, __addressof(X::~X));
      
      

      This has the following benefits:

      1. If the static has been initialized already, the flag is set, so we short-circuit the function call
      2. The function registers the object and its destructor for invocation at exit() time.

      The function itself deals with the flag in a thread-safe way, but this requires only one mutex inside the function. This is important, since test-and-set operations are potentially costly memory-wise on IA-64 (they definitely are on PA-RISC, where any mutex / lock / whatever must be 16-byte aligned).


      [000803 All] Discussion brought out that Christophe's __cxa_allocate_static can't work precisely as described, since the constructor and its arguments are also needed. Christophe said that the actual sequence is more complex: he removed too much to simplify the presentation, and he will attempt to provide a fuller description.

      The concern was repeated that there are objections to any automatic locking approach, and we should go back and consider them again.


      [000720 All] Christophe would like to see the locking for this purpose combined with the locking required to register the initialized object with __cxa_atexit, as well as the ability to statically create the structure that will be enqueued by __cxa_atexit.

      A potential interface that allows this would be the following. Expand the guard object to the following structure:

      
      	struct __cxa_guard {
      	  long long guard;	// Guard variable
      	  void *next;		// List link for destructor chain
      	  void (*dtor) (void*);	// Pointer to destruction routine
      	  void *p;		// Pointer to dtor parameter
      	  dso_handle dhandle;	// DSO handle for owning DSO
      	};
      
      An implementation that chooses to implement its __cxa_atexit list with elements matching this structure could then simply enqueue the above structure on the list (without its initial doubleword guard). An implementation using another structure might need to rearrange the data. (This ABI would not specify either choice.) The __cxa_guard_release call above would be re-specified to also enqueue the object on the destruction list by calling __cxa_atexit or its equivalent.


      [000817 SGI -- Jim] Note the tradeoff in the above: It would increase the guard variable size from 8 to 40 bytes, but would likely eliminate a bunch of instructions to gather that data for the destructor registration call. (But it would be a pure loss for no-destructor objects. So perhaps we should modify it to eliminate the extra data for those, and pass a parameter or use a byte in the guard member to indicate that to the release routine?)
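
      The 8-to-40-byte figure follows from one 8-byte guard plus four pointer-sized fields on a 64-bit target. A small sketch of that arithmetic, assuming dso_handle is pointer-sized (the proposal does not define it) and using hypothetical names throughout:

```cpp
#include <cassert>
#include <cstddef>

namespace guard_size_sketch {

using dso_handle = void*;   // assumption: the DSO handle is pointer-sized

struct cxa_guard_proposed {
    long long guard;          // guard variable (8 bytes)
    void* next;               // list link for destructor chain
    void (*dtor)(void*);      // pointer to destruction routine
    void* p;                  // pointer to dtor parameter
    dso_handle dhandle;       // DSO handle for owning DSO
};

// One guard plus four pointer-sized fields; with 8-byte pointers this is
// 8 + 4 * 8 = 40 bytes.
constexpr std::size_t expected_size = sizeof(long long) + 4 * sizeof(void*);

// Returns the structure size after checking that no padding was added.
inline std::size_t checked_size() {
    assert(sizeof(cxa_guard_proposed) == expected_size);
    return sizeof(cxa_guard_proposed);
}

} // namespace guard_size_sketch
```

      For an object with no destructor, everything after the guard member would be unused, which is the "pure loss" case noted above.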


      [001109 All] It was observed that the current specification of __cxa_guard_release in 3.3.2 is not adequate to cope with the case where an exception is raised and the lock must be released without marking the object initialization complete. Therefore, we will define an analogous __cxa_guard_abort that does not mark the initialization complete, so that the next thread entering the scope will obtain the lock and try again.

      Since there has been no further feedback from HP on the more complicated proposal above, and the current HP attendees do not think it necessary, this issue will be closed.
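
      The acquire/release/abort protocol in the resolution can be illustrated as follows. This is a hedged sketch, not the ABI routines themselves: Guard, guard_acquire, guard_release, guard_abort and guarded_static are hypothetical stand-ins, the single global mutex and the choice of which spare byte marks "in progress" are assumptions of the sketch, and the ABI document (section 3.3.2) remains the authority on the real interfaces.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <stdexcept>

namespace guard_api_sketch {

// 8-byte guard: the first byte is the initialization flag; this sketch
// uses one of the remaining bytes as an "in progress" marker.
struct alignas(8) Guard {
    std::atomic<unsigned char> done{0};
    unsigned char in_progress = 0;
    unsigned char spare[6] = {};
};
static_assert(sizeof(Guard) == 8, "sketch assumes an 8-byte guard");

std::mutex g_lock;               // one global lock, for simplicity
std::condition_variable g_cv;

// Returns 1 if the caller must run the initializer, 0 if it is already done.
int guard_acquire(Guard* g) {
    std::unique_lock<std::mutex> lk(g_lock);
    g_cv.wait(lk, [g] { return g->in_progress == 0; });
    if (g->done.load(std::memory_order_acquire) != 0) return 0;
    g->in_progress = 1;
    return 1;
}

// Initialization completed: set the flag and wake any waiters.
void guard_release(Guard* g) {
    {
        std::lock_guard<std::mutex> lk(g_lock);
        assert(g->in_progress == 1);
        g->done.store(1, std::memory_order_release);
        g->in_progress = 0;
    }
    g_cv.notify_all();
}

// Initialization threw: leave the flag clear so the next entrant retries.
void guard_abort(Guard* g) {
    {
        std::lock_guard<std::mutex> lk(g_lock);
        g->in_progress = 0;
    }
    g_cv.notify_all();
}

// Roughly what a compiler could emit for "static int x = init();", written
// against a caller-supplied guard and slot so that it can be exercised.
template <typename Init>
int& guarded_static(Guard& guard, int& slot, Init init) {
    if (guard.done.load(std::memory_order_acquire) == 0) {  // inline check
        if (guard_acquire(&guard)) {
            try {
                slot = init();
            } catch (...) {
                guard_abort(&guard);
                throw;
            }
            guard_release(&guard);
        }
    }
    return slot;
}

// First initializer call throws, the second succeeds, the third is skipped.
int demo() {
    Guard guard;
    int slot = 0;
    int calls = 0;
    auto init = [&]() -> int {
        if (++calls == 1) throw std::runtime_error("ctor failed");
        return 42;
    };
    try { guarded_static(guard, slot, init); } catch (const std::runtime_error&) {}
    guarded_static(guard, slot, init);
    guarded_static(guard, slot, init);
    return calls * 100 + slot;   // two initializer calls, value 42
}

} // namespace guard_api_sketch
```

      The abort path is what distinguishes this from a plain acquire/release pair: without it, an initializer that throws would leave the guard locked with no one left to release it.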

      # Issue Class Status Source Opened Closed
      G-5 Varargs routine interface call closed HU-B 990810 991014
      Summary: The underlying C ABI defines conventions for calling varargs routines. Does C++ need, or would it benefit from, any modifications or special cases? How should we pass references or class objects? Is any runtime library support required?
      Resolution: No special cases required -- C++ will follow the C varargs ABI.

      [990810 HU-B Martin] I'd like to see an indirection in vararg lists, so they can be passed through thunks. This is necessary at least for the covariant returns, but might have other applications as well.

      [990810 HU-B Martin] Since there already was the decision not to return a list of pointers from a covariant method, the only alternative to real thunks is code duplication (as done in Sun Workshop 5). (Or alternate entrypoints... Jim)

      With real thunks, you have to copy the argument list. That is not possible for a varargs list, so here is my proposal for varargs in C++:

      In the place of the ellipsis, a pointer to the first argument is passed. In case of a thunk for covariant returns, this pointer can be copied to the destination function. The variable arguments are put on the stack as they normally would be.
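
      A source-level analogy may help here. A function cannot re-pass its own "..." to another varargs function, but it can hand on a handle to the arguments, which is the same idea as the pointer proposed above. The sketch below uses va_list as that handle; it illustrates the forwarding problem only, not the proposed calling convention, and all names in it are hypothetical.

```cpp
#include <cassert>
#include <cstdarg>

namespace varargs_sketch {

// Callee that receives the variable arguments through a handle (va_list)
// rather than through an ellipsis of its own.
int sum_v(int count, va_list ap) {
    assert(count >= 0);
    int total = 0;
    for (int i = 0; i < count; ++i) total += va_arg(ap, int);
    return total;
}

// Ordinary varargs entry point.
int sum(int count, ...) {
    va_list ap;
    va_start(ap, count);
    int r = sum_v(count, ap);
    va_end(ap);
    return r;
}

// Stand-in for a thunk: it cannot forward "..." to sum(), but it can pass
// the handle on to sum_v() and then adjust the result.
int sum_plus_one(int count, ...) {
    va_list ap;
    va_start(ap, count);
    int r = sum_v(count, ap) + 1;
    va_end(ap);
    return r;
}

} // namespace varargs_sketch
```

      For example, varargs_sketch::sum(3, 1, 2, 3) yields 6, and varargs_sketch::sum_plus_one(2, 10, 20) yields 31 without the wrapper ever copying the individual arguments.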

      With that, the issue is in which cases to use such a calling convention:

      1. only for vararg calls to virtual methods, or
      2. only for vararg calls to functions with C++ linkage, or
      3. for all vararg calls. That would probably require a change to the C ABI

      Option (1) could be further restricted to methods returning a pointer or reference to class type.

      [990812 All] In response to a question, it was observed that passing one variant of a class hierarchy in a varargs list and referencing another variant in the va_arg macro is undefined, and we don't need to worry about a mechanism for doing the conversion.

      [991014 All] We would want to reject option (3), even if it were still possible to change the base ABI. The present scheme is compatible with K&R C methods; the proposed change would not be.

      Decision: Close with no action. We're using multiple entry points for covariant return types, not thunks, so there's no need for doing anything different for varargs functions with covariant return types than for any other varargs functions.

      # Issue Class Status Source Opened Closed
      G-6 bool parameters call closed all 991104 991202
      Summary: How should we pass bool parameters on IA-64? Choices are to pass them like ABI ints, or in predicate registers or register pairs.
      Resolution: No special treatment -- pass bool like char.

      [991202 All] It was decided not to treat bool parameters specially, i.e. they will be passed like chars.


      H. Library Interface Issues

      # Issue Class Status Source Opened Closed
      H-1 Runtime library DSO name tools closed SGI 990616 000817
      Summary: Determine the name of the common C++ runtime library DSO, e.g. libC.so. If there are to be vendor-specific support libraries which must coexist in programs from mixed sources, identify naming convention for them.
      Resolution: The runtime library will be named libcxa.so.

      [000817 All] Agreed to name the library libcxa.so.