C++ ABI Closed Issues
Revised 17 November 2000
call | Function call interface, i.e. call linkage |
data | Data layout |
lib | Runtime library support |
lif | Library interface, i.e. API |
g | Potential gABI impact |
ps | Potential psABI impact |
source | Source code conventions (i.e. API, not ABI) |
tools | May affect how program construction tools interact |
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-1 | Vptr location | data | closed | SGI | 990520 | 990624 |
Summary: Where is the Vptr stored in an object (first or last are the usual answers). |
[990610 All] Given the absence of addressing modes with displacements on IA-64, the consensus is to answer this question with "first."
[990617 All] Given a Vptr and only non-polymorphic bases, which (Vptr or base) goes at offset 0?
Tentative decision: Vptr always goes at beginning.
[990624 All] Accepted tentative decision. Rename, close this issue, and open separate issue (B-6) for Vtable layout.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-2 | Virtual base classes | data | closed | SGI | 990520 | 990624 |
Summary: Where are the virtual base subobjects placed in the class layout? How are data member accesses to them handled? |
[990610 Matt] With regard to how data member accesses are handled, the choices are to store either a pointer or an offset in the Vtable. The consensus seems to be to prefer an offset.
[990617 All] Any number of empty virtual base subobjects (rare) will be placed at offset zero. If there are no non-virtual polymorphic bases, the first virtual base subobject with a Vpointer will be placed at offset zero. Finally, all other virtual base subobjects will be allocated at the end of the class, left-to-right, depth-first.
[990624 All] Define an empty object as one with no non-static, non-empty data members, no virtual functions, no virtual base classes, and no non-empty non-virtual base classes. Define a nearly empty object as one which contains only a Vptr. The above resolution is accepted, restated as follows:
Any number of empty virtual base subobjects (rare, because they cannot have virtual functions or bases themselves) will be placed at offset zero, subject to the conflict rules in A-3 (i.e. this cannot result in two objects of the same type at the same address). If there are no non-virtual polymorphic base subobjects, the first nearly empty virtual base subobject will be placed at offset zero. Any virtual base subobjects not thus placed at offset zero will be allocated at the end of the class, in left-to-right, depth-first declaration order.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-3 | Multiple inheritance | data | closed | SGI | 990520 | 990701 |
Summary: Define the class layout in the presence of multiple base classes. |
[990617 All] At offset zero is the Vptr whenever there is one, as well as the primary base class if any (see A-7). Also at offset zero is any number of empty base classes, as long as that does not place multiple subobjects of the same type at the same offset. If there are multiple empty base classes such that placing two of them at offset zero would violate this constraint, the first is placed there. (First means in declaration order.)
All other non-virtual base classes are laid out in declaration order at the beginning of the class. All other virtual base subobjects will be allocated at the end of the class, left-to-right, depth-first.
The above ignores issues of padding for alignment, and possible reordering of class members to fit in padding areas. See issue A-9.
[990624 All] There remains an issue concerning the selection of the primary base class (see A-7), but we are otherwise in agreement. We will attempt to close this on 1 July, modulo A-7.
[990701 All] This issue is closed. A full description of the class layout can be found in issue A-9. (At this time, A-7 remains to be closed, waiting for the Taligent rationale.)
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-4 | Empty base classes | data | closed | SGI | 990520 | 990624 |
Summary: Where are empty base classes allocated? (An empty base class is one with no non-static data members, no virtual functions, no virtual base classes, and no non-empty non-virtual base classes.) |
[990624 All] Closed as a duplicate of A-3.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-5 | Empty parameters | data | closed | SGI | 990520 | 001117 |
Summary: When passing a parameter with an empty class type by value, what is the convention? | ||||||
Resolution: Except for cases of non-trivial copy constructors (see C-7), and parameters in the variable part of varargs lists, a single parameter slot will be allocated to empty parameters, as though they were a struct containing a single character. |
[990623 SGI] We propose that no parameter slot be allocated to such parameters, i.e. that no register be used, and that no space in the parameter memory sequence be used. This implies that the callee must allocate storage at a unique address if the address is taken (which we expect to be rare).
[990624 All] In addition to the address-taken case, care is required if the object has a non-trivial copy constructor. HP observes that in (some?) such cases, they perform the construction at the call site and pass the object by reference.
[990625 SGI -- Jim] I understand that the Standard explicitly allows elimination of even non-trivial copy construction in some cases. Is this one of them? Where should I look? Also, of course, varargs processing for elided empty parameters would need to be careful.
I have opened a new issue (C-7) for passing copy-constructed parameters by reference. Since doing so would turn an empty value parameter into a non-empty reference parameter, this issue can ignore such cases.
[990701 All] An empty parameter will not occupy a slot in the parameter sequence unless:
Daveed and Matt will pursue the question of when copy constructors may be ignored for parameters with the Core committee, and if they identify cases where the constructors may clearly be omitted, those (empty) parameters will also be elided.
[001109 CodeSourcery -- Mark] Both g++ and the HP compiler have great difficulty dealing with this, and prefer to reserve the parameter slot even for empty parameters. At the meeting, we tentatively decided to reverse our decision and allocate an integer parameter slot even for empty parameters. We will place no constraints on the data in the parameter slot, except that on IA-64, it must not be NaT data.
[001117 All -- Jim] There having been no objection to the proposed resolution, it is adopted. Results will be treated the same way.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-6 | RTTI .o representation | data call ps | closed | SGI | 990520 | 991028 |
Summary:
Define the data structure to be used for RTTI, that is:
| ||||||
Resolution: Defined in the Draft C++ ABI for IA-64. |
[990701 All] Daveed will put together a proposal by the 15th (action #13); the group will discuss it on the 22nd.
[990805 All] Daveed should have his proposal together for discussion. Michael Lam will look into the Sun dynamic cast algorithm.
It was noted that appropriate name selection along with the normal DSO global name resolution should be sufficient to produce a unique address for each class' RTTI struct, which address would then be a suitable identifier for comparisons.
[990812 Sun -- Michael] Sun has provided a description, in a separate page, describing their implementation. They are filing for a patent on the algorithms described.
[990819 EDG -- Daveed] (Proposal replaced by later version on 6 October.)
[990826 All] Discussion centered on whether the representation should include all base classes or just the direct ones, and in the former case how hashing might be handled. It was agreed that the __qualifier_type_info variant is not needed, and it is now stricken in the above proposal. Also, a pointer-to-member variant is needed. Christophe will provide a description of the HP hashing approach, and Daveed will update the specification.
[991006 EDG -- Daveed]
The C++ programming language definition implies that information about types be available at run time for three distinct purposes:
The following conclusions were arrived at by the attending members of the C++ IA-64 ABI group:
The full proposal has been incorporated in the Draft C++ ABI for IA-64.
[991014 all]
ACTION ITEMS: Daveed---make these changes. Jim---incorporate these changes into the open issues list. We are almost ready to close this issue; we intend to close it at the 28 October meeting, after we've all had a chance to go over the modified writeup.
[991028 all] The current definition, in the Draft C++ ABI for IA-64, has been updated with Daveed's changes, and is accepted. Note that we are back to using a pointer to RTTI in the vtable (see B-8), since we need uniqueness, and since we need an external symbol in any case, the ABI will make no statement about where RTTI is allocated. It is likely that implementations will use COMDAT for it.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-7 | Vptr sharing with primary base class | data | closed | HP | 990603 | 990729 |
Summary: It is in general possible to share the virtual pointer with a polymorphic base class (the primary base class). Which base class do we use for this? | ||||||
Resolution: Share with the first non-virtual polymorphic base class, or if none with the first nearly empty virtual base class. |
[990617 All] It will be shared with the first polymorphic non-virtual base class, or if none, with the first nearly empty polymorphic virtual base class. (See A-2 for the definition of nearly empty.)
[990624 All] HP noted that Taligent chooses a base class with virtual bases before one without as the primary base class, probably to avoid additional "this" pointer adjustments. SGI observed that such a rule would prevent users from controlling the choice by their ordering of the base classes in the declaration. The bias of the group remains the above resolution, but HP will attempt to find the Taligent rationale before this is decided.
[990729 All] Close with the agreed resolution. If a convincing Taligent rationale is found, we can reconsider.
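The resolution can be sketched as a selection over a class's direct bases. This is a minimal illustration only: the `BaseInfo` record, its field names, and `choose_primary_base` are hypothetical, and the sketch does not include the later refinement recorded in A-19.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one direct base class, in declaration order.
struct BaseInfo {
    bool is_virtual;    // inherited virtually
    bool polymorphic;   // has a Vptr
    bool nearly_empty;  // contains only a Vptr (see A-2)
};

// Returns the index of the primary base, or -1 if there is none.
// Rule from A-7: the first non-virtual polymorphic base; failing that,
// the first nearly empty virtual base.
int choose_primary_base(const std::vector<BaseInfo>& bases) {
    for (std::size_t i = 0; i < bases.size(); ++i)
        if (!bases[i].is_virtual && bases[i].polymorphic)
            return static_cast<int>(i);
    for (std::size_t i = 0; i < bases.size(); ++i)
        if (bases[i].is_virtual && bases[i].nearly_empty)
            return static_cast<int>(i);
    return -1;
}

// Usage: a nearly empty virtual base declared first loses to a later
// non-virtual polymorphic base.
void example_choose_primary_base() {
    std::vector<BaseInfo> bases;
    bases.push_back(BaseInfo{true, true, true});    // virtual, nearly empty
    bases.push_back(BaseInfo{false, true, false});  // non-virtual, polymorphic
    assert(choose_primary_base(bases) == 1);
}
```

Because the rule follows declaration order within each category, the user keeps control of the choice by reordering the base classes, which is the property SGI wanted to preserve.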
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-8 | (Virtual) base class alignment | data | closed | HP | 990603 | 990624 |
Summary: A (virtual) base class may have a larger alignment constraint than a derived class. Do we agree to extend the alignment constraint to the derived class? (An alternative for virtual bases: allow the virtual base to move in the complete object.) |
[990623 SGI] We propose that the alignment of a class be the maximum alignment of its virtual and non-virtual base classes, non-static data members, and Vptr if any.
[990624 All] Above proposal accepted. (SGI observation: the size of the class is rounded up to a multiple of this alignment, per the underlying psABI rules.)
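The accepted rule and the SGI observation amount to a maximum followed by a round-up. A minimal sketch, with hypothetical helper names and alignments given in bytes:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Alignment of a class: the maximum alignment of its components
// (virtual and non-virtual bases, non-static data members, Vptr if any).
std::size_t class_alignment(std::initializer_list<std::size_t> component_alignments) {
    std::size_t align = 1;
    for (std::size_t a : component_alignments) align = std::max(align, a);
    return align;
}

// Round a size up to the next multiple of the alignment.
std::size_t round_up(std::size_t size, std::size_t align) {
    assert(align != 0);
    return (size + align - 1) / align * align;
}
```

For example, a class whose components have alignments 8, 4, and 16 gets alignment 16, and a raw size of 20 bytes is rounded up to 32.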
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-9 | Sorting fields as allowed by [class.mem]/12 | data | closed | HP | 990603 | 990624 |
Summary: The standard constrains ordering of class members in memory only if they are not separated by an access clause. Do we use an access clause as an opportunity to fill the gaps left by padding? | ||||||
Resolution: See separate writeup of Draft C++ ABI for IA-64. |
[990610 all] Some participants want to avoid attempts to reorder members differently than the underlying C struct ABI rules. Others think there may be benefit in reordering later access sections to fill holes in earlier ones, or even in base classes.
[990617 all] There are several potential reordering questions, more or less independent:
There is no apparent support for (1), since no simple heuristic has been identified with obvious benefits. There is interest in (2), based on a simple heuristic which might sometimes help and will never hurt. However, it is not clear that it will help much, and Sun objects on grounds that they prefer to match C struct layout. Unless someone is interested enough to implement and run experiments, this will be hard to agree upon. G++ has implemented (3) as an option, based on specific user complaints. It clearly helps HP's example of a base class containing a word and flag, with a derived class adding more flags. Idea (4) has more problems, including some non-intuitive (to users) layouts, and possibly complicating the selection of bitwise copy in the compiler.
[990624 all] We will not do (1), (2), or (4). We will do (3). Specifically, allocation will be in modified declaration order as follows:
[990722 all] The precise placement of empty bases when they don't fit at offset zero remained imprecise in the original description. Accordingly, a precise layout algorithm is described in a separate writeup of Data Layout.
[990729 all] The layout writeup was accepted, with the first choice for empty base placement. That is, if placement at offset zero doesn't work, it will be placed like a normal base/member. The consensus was that this won't happen often, and such bases will often overlap with the preceding tail padding or following components anyway. Jim will modify the writeup accordingly.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-10 | Class parameters in registers | call | closed | HP | 990603 | 990710 |
Summary: The C ABI specifies that structs are passed in registers. Does this apply to small non-POD C++ objects passed by value? What about the copy constructor and this pointer in that case? |
[990701 all] A separate issue (C-7) deals with cases where a non-trivial copy constructor is required; we ignore those cases here. Our conclusion is that, without a non-trivial copy constructor, we need not be concerned about the class object moving in the process of being passed, and there is no need to use a mechanism different from the base ABI C struct mechanism. At the same time, if we do use the underlying C struct mechanism, the user has complete control of the passing technique, by choosing whether to pass by value or reference/pointer.
Therefore, except in cases identified by issue C-7 for different treatment, class parameters will be passed using the underlying C struct protocol.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-11 | Pointers to member functions | data | closed | Cygnus | 990603 | 990812 |
Summary: How should pointers to member functions be represented? | ||||||
Resolution: As a pair of values, described below. |
[990729 All] Jason described the g++ implementation, which is a three-member struct:
A concern about covariant returns was raised. It was observed that, given our decision to use distinct Vtable entries for distinct return types, no further concern is required here. Others will describe their representations. IBM has an alternative, but it is believed to be patented by Microsoft.
[990805 All] It is agreed that a two-element struct will be used for a pointer to a member function, with elements as follows:
ptr
:
adj
:
Although we agreed to close this, SGI suggests a minor modification. Since the Vtable offset of a virtual function will always be even, we suggest that it not be doubled before adding 1. This is because shifts are more restricted on many processors than other integer ALU operations (shifters are large structures), so an XOR or NAND will often be cheaper than a right shift.
[990812 All] Close this issue with the suggested modification.
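A minimal sketch of the SGI modification, covering only the virtual-function case of the `ptr` element. The function names are hypothetical, and the sketch assumes that a non-virtual function address always has its low bit clear, so that the low bit can serve as the tag.

```cpp
#include <cassert>
#include <cstdint>

// Encode the Vtable offset of a virtual function as offset + 1, without
// doubling. Vtable offsets are even, so the result has its low bit set.
std::intptr_t encode_virtual(std::intptr_t vtable_offset) {
    assert(vtable_offset % 2 == 0);
    return vtable_offset + 1;
}

// Assumption: non-virtual function addresses are even, so a set low bit
// identifies the virtual case.
bool is_virtual(std::intptr_t ptr) { return (ptr & 1) != 0; }

// Recover the offset by clearing the tag bit; no right shift is needed.
std::intptr_t decode_virtual(std::intptr_t ptr) {
    assert(is_virtual(ptr));
    return ptr ^ 1;
}
```

Had the offset been doubled before adding 1, decoding would require a right shift, which is the cost the modification avoids.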
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-12 | Merging secondary vtables | data | closed | Sun | 990610 | 990805 |
Summary: Sun merges the secondary Vtables for a class (i.e. those for non-primary base classes) with the primary Vtable by appending them. This allows their reference via the primary Vtable entry symbol, minimizing the number of external symbols required in linking, in the GOT, etc. | ||||||
Resolution: Concatenate the Vtables associated with a class in the same order that the corresponding base subobjects are allocated in the object. |
[990701 Michael Lam] Michael will check what the Sun ABI treatment is and report back.
[990729 All] A separate issue raised in conjunction with A-7 is whether to include Vfunc pointers in the primary Vtable for functions defined only in the base classes and not overridden. If the primary and secondary Vtables are concatenated, this is no longer an issue, since all can be referenced from the primary Vptr.
[990805 All] All of the Vtables associated with a class will be concatenated, and a single external symbol used (to be identified as part of the mangling issue F-1). The order of the tables will be the same as the order of base class subobjects in an object of the class, i.e. first the primary Vtable, then the non-virtual base classes in declaration order, and finally the virtual base classes in depth-first declaration order.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-13 | Parameter struct field promotion | call | closed | SGI | 990603 | 990701 |
Summary: It is possible to pass small classes either as memory images, as is specified by the base ABI for C structs, or as a sequence of parameters, one for each member. Which should be done, and if the latter, what are the rules for identifying "small" classes? | ||||||
Resolution: No special treatment will be specified by the ABI. |
[990701 all] Define no special treatment for this case in the ABI. A translator with control over both caller and callee may choose to optimize.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-14 | Pointers to data members | data | closed | SGI | 990729 | 990805 |
Summary: How should pointers to data members be represented? | ||||||
Resolution: Represented as one plus the offset from the base address. |
[990729 SGI] We suggest an offset from the base address of the class, represented as a ptrdiff_t.
[990805 All] Such pointers are represented as one plus the offset from the base address of the class, as a ptrdiff_t. NULL pointers are zero.
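The encoding recorded in this issue can be modeled with plain integer arithmetic. This is a sketch of the representation as stated here, with hypothetical function names; it makes no claim about what any particular compiler emits.

```cpp
#include <cassert>
#include <cstddef>

// Encoded value of the NULL pointer to data member.
const std::ptrdiff_t kNullMemberPtr = 0;

// A non-null pointer to data member is one plus the member's offset,
// so a member at offset 0 is still distinguishable from NULL.
std::ptrdiff_t encode_member_ptr(std::ptrdiff_t offset) { return offset + 1; }

bool is_null_member_ptr(std::ptrdiff_t encoded) { return encoded == kNullMemberPtr; }

// Recover the offset from a non-null encoded value.
std::ptrdiff_t decode_member_ptr(std::ptrdiff_t encoded) {
    assert(!is_null_member_ptr(encoded));
    return encoded - 1;
}
```

The "one plus" bias exists precisely because offset zero is a valid member offset.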
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-15 | Empty bit-fields | data | closed | CodeSourcery | 991214 | 000106 |
Summary: How are zero-length bit-fields handled? | ||||||
Resolution: Zero-length bit-fields do not prevent a class from being considered empty or nearly empty. |
[991214 CodeSourcery -- Mark]
Question: Does the presence of a zero-width bit-field prevent a class from being empty?
Suggested Resolution: No. Amend the definition of an "empty class" to read:
Amend the definition of a "nearly empty class" to read:
[000106 All] Accept the CodeSourcery proposal.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-16 | Nearly empty virtual bases | data | closed | SGI | 991228 | 000106 |
Summary: May a class with non-empty, non-primary, virtual base classes be treated as nearly empty (and thus eligible to be a primary base) if its only non-vptr data is in its virtual base classes? | ||||||
Resolution: Virtual base classes do not prevent a class from being considered nearly empty. |
[000106 All] Accept the proposal.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-17 | Primary indirect virtual base allocation | data | closed | SGI | 991228 | 000113 |
Summary: When a nearly empty virtual base class A is allocated as the primary base class of class B, and then B is allocated as a base class of C, should A (i.e. its vptr) be separately allocated in C, or should its first occurrence in a previously allocated base B be used as its allocation in C? | ||||||
Resolution: Do not reallocate a nearly empty virtual base class that is the primary base class of any other base class, direct or indirect. Use the first primary base class instance in the inheritance hierarchy as its allocation, in the usual depth-first, left-to-right order. |
[991228 SGI -- Jim] Specific wording for a proposed change is in the Draft C++ ABI for IA-64.
[000103 CodeSourcery -- Mark]
I think the current proposal for allocating virtual bases is still a
little suboptimal. In particular, given:
struct A { void f(); };
struct B : virtual public A { };
struct C : virtual public A, virtual public B { };
we'll give `C' a larger size than for:
struct C : virtual public B, virtual public A { };
where we'll reuse the `A' part of `B' rather than reallocating it.
I know that ordering can already affect size (principally because of alignment issues) but I think that in this case we might as well not punish programmers for choosing the "wrong" ordering.
I think we should change the green A-17 proposed resolution to indicate that if one of the virtual bases is a (direct or indirect) primary base of one of the other virtual bases then we need not allocate a fresh copy.
FWIW, it turns out to actually be easier in GCC to code the more generous version.
The algorithm to do this is linear in the size of the hierarchy: just iterate through the inheritance DAG marking all primary bases. Any virtual base classes that remain unmarked need to be allocated in step III. A slight formalization of this sentence might be a good way to express which bases to choose for III.
[000113 All] Do not reallocate a nearly empty virtual base class that is the primary base class of any other base class, direct or indirect. Use the first primary base class instance in the inheritance hierarchy as its allocation, in the usual depth-first, left-to-right order.
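The marking pass Mark describes can be sketched as follows. The `ClassInfo` record, the `Hierarchy` map, and the function names are hypothetical, and the primary base of each class is taken as given input rather than computed.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one class in the inheritance DAG.
struct ClassInfo {
    std::vector<std::string> bases;          // all direct bases, declaration order
    std::vector<std::string> virtual_bases;  // the subset inherited virtually
    std::string primary;                     // primary base, empty if none
};
using Hierarchy = std::map<std::string, ClassInfo>;

// Visit each proper base of `cls` once, collecting virtual bases and
// marking every class that is the primary base of some base.
void walk(const Hierarchy& h, const std::string& cls,
          std::set<std::string>& visited,
          std::set<std::string>& virtuals,
          std::set<std::string>& marked) {
    const ClassInfo& info = h.at(cls);
    for (const std::string& v : info.virtual_bases) virtuals.insert(v);
    for (const std::string& b : info.bases) {
        const std::string& p = h.at(b).primary;
        if (!p.empty()) marked.insert(p);
        if (visited.insert(b).second) walk(h, b, visited, virtuals, marked);
    }
}

// Virtual bases of `cls` that remain unmarked and so need a fresh allocation.
std::set<std::string> virtual_bases_to_allocate(const Hierarchy& h,
                                                const std::string& cls) {
    std::set<std::string> visited, virtuals, marked, result;
    walk(h, cls, visited, virtuals, marked);
    for (const std::string& v : virtuals)
        if (marked.count(v) == 0) result.insert(v);
    return result;
}

// Example: C has virtual bases A and B; B may use A as its primary base.
Hierarchy example_hierarchy(bool b_primary_is_a) {
    Hierarchy h;
    h["A"] = ClassInfo{{}, {}, ""};
    h["B"] = ClassInfo{{"A"}, {"A"}, b_primary_is_a ? "A" : ""};
    h["C"] = ClassInfo{{"A", "B"}, {"A", "B"}, ""};
    return h;
}
```

When B already holds A as its primary base, only B needs to be allocated for C, regardless of the order in which C names its bases.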
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-18 | Virtual base alignment | data | closed | SGI | 991228 | 000113 |
Summary: Should virtual bases have a different effect on class alignment than other components? | ||||||
Resolution: Yes. When allocating the non-virtual part of a base class, use its non-virtual alignment, i.e. ignoring its virtual bases' contributions. |
[991228 SGI -- Jim] Since the allocation of virtual bases is "floating" relative to the classes in which they occur, it is possible for them to have independent alignment constraints. Specifically, when allocating a base class with a virtual base, we could treat its alignment as that obtained by ignoring the virtual base, and later allocate the virtual base with greater alignment.
Since the class with a virtual base already has a vptr, this only matters if the virtual base contains components more strictly aligned than a pointer. Thus, the benefit of doing so is probably not large. To get some idea of the effect on the layout definition, look at dsize and nvsize, and assume a similar pair of alignment values.
[000106 All] No strong opinions were expressed on this issue. We will decide it at the next meeting after people have a chance to think it over. The bias will be to keep the current simpler definition.
[000113 All] It turns out that both Compaq and someone else (Cygnus?) already do this, find it straightforward, and prefer to keep it. Therefore, accept the suggestion that when allocating the non-virtual part of a base class, we use its non-virtual alignment, i.e. ignoring its virtual bases' contributions.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-19 | Primary indirect virtual base choice | data | closed | All | 000106 | 000120 |
Summary: In allocating class C, when the first nearly empty virtual base class A is allocated as the primary base class of a later nearly empty virtual base class B, should A or B become the primary base class of C? | ||||||
Resolution: Do not use a virtual base as primary if it is already a primary base of some other direct or indirect base, unless such are the only candidates. In either case, use the first candidate in depth-first, left-to-right order in the inheritance graph. |
[000106 All] This issue was initially confused in the discussion with A-17, but is independent. Recall that non-virtual bases have priority over virtual bases for selection as the primary base. Assuming that no non-virtual base is suitable, this issue involves which virtual base should be selected. Our original decision was to use the first in left-to-right order.
The proposal here is that, if this initial candidate A is itself already a primary base class of a later virtual base B, then B will be used instead, unless it is already a primary base class of a later virtual base, and so on. See proposed wording in the ABI layout document.
No one can identify a case in which this approach is worse than the original definition.
[000113 All] The proposed resolution on the table is to use the following priority to choose the primary base class:
[000113 All] Modify the above to use any virtual base in the inheritance graph, first one that is not already primary to some base if possible, or then any candidate, chosen as the first in a depth-first, left-to-right inheritance graph walk.
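The final rule for the virtual-base case can be sketched as a two-pass choice. The function name and its inputs are hypothetical: the caller is assumed to supply the candidate virtual bases already ordered by a depth-first, left-to-right walk, along with the set of classes that are already primary to some base.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// `candidates`: nearly empty virtual bases in depth-first, left-to-right order.
// `already_primary`: classes that are already the primary base of some base.
// Returns the index of the chosen candidate, or -1 if there are none.
int choose_primary_virtual_base(const std::vector<std::string>& candidates,
                                const std::set<std::string>& already_primary) {
    // Prefer the first candidate that is not already primary elsewhere.
    for (std::size_t i = 0; i < candidates.size(); ++i)
        if (already_primary.count(candidates[i]) == 0)
            return static_cast<int>(i);
    // Otherwise fall back to the first candidate, if any.
    return candidates.empty() ? -1 : 0;
}

// Usage: A is already primary to B, so B is chosen over A.
void example_choose_primary_virtual_base() {
    std::vector<std::string> candidates;
    candidates.push_back("A");
    candidates.push_back("B");
    std::set<std::string> already_primary;
    already_primary.insert("A");
    assert(choose_primary_virtual_base(candidates, already_primary) == 1);
}
```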
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-20 | Operator new array cookies | data | closed | All | 000113 | 000120 |
Summary: When operator new is used to create a new dynamic-length array, a cookie must be stored to remember the allocated length so that it can be deallocated correctly. | ||||||
Resolution: In principle, place cookie immediately before array, aligned naturally. Use no cookie for array element types without destructors. See the Draft C++ ABI for IA-64. |
[000113 All] The proposed resolution is as follows:
[...] sizeof(size_t).
[...] align be the maximum alignment of size_t and an element of the array to be allocated.
[...] align bytes.
[...] align bytes.
[...] align bytes from the space allocated for the array.
[...] sizeof(size_t) bytes immediately preceding the array data.
[...] sizeof(size_t) is smaller than the array element alignment, and if present will precede the cookie.
[000120 All] Accept the above.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-21 | Placement new array cookies | data | closed | All | 000113 | 000217 |
Summary: Same issue as A-20, except that for placement new, the user supplies already-allocated space. Therefore, there is a conflict between wanting to make delete() work on arrays created in this way, and wanting to avoid surprising users who haven't allocated enough space for the cookie. Also, are cookies allocated if there is no destructor? | ||||||
Resolution: Use no cookie for element types with no destructors, nor for ::operator new[](size_t, void*). Otherwise, use a cookie as in issue A-20. See the Draft C++ ABI for IA-64. |
[000119 SGI -- Matt]
What the standard says (3.7.3.1, 5.3.4, and 18.4.1.3)
Array placement new has the form "new(ARGS) T[n]". The "(ARGS)" part is optional. If it's present then this is a placement new-expression, and we use a version of operator new[] with two or more arguments, otherwise it's an ordinary new-expression, and we use a version of operator new[] with one argument. For the purposes of this proposal, the distinction isn't all that important.
After finding the appropriate operator new, a new-expression obtains storage with
void* p = operator new[](n1, ARGS),
It is required (3.7.3.1/2) that the return value of any operator new[], whether it's built-in or provided by the user, must be suitably aligned for objects of any type.
If T is "char" or "unsigned char" the standard requires that delta is a nonnegative multiple of the most stringent alignment constraint for objects of size less than or equal to n (5.3.4/10). Otherwise the only restriction is that delta is nonnegative.
Some implementations store the number of elements in the array at a negative offset from p1. The standard neither requires nor forbids it.
There's a predefined placement version of array operator new,
::operator new[](size_t n1, void* p),
IA-64 Specifics
On IA-64 long double is 80 bits. long double has 128-bit alignment, as do classes and unions containing long double, so sizeof(long double) is 16. All other types have at most 64-bit alignment.
What the ABI needs to specify
Proposal A
No version of operator new[] is a special case. For any array new-expression we store the number of elements in the array, as a size_t, at an offset of -sizeof(size_t) from the pointer returned by the new-expression. For any type T other than char, unsigned char, long double, or a type containing a long double, n1 = n * sizeof(T) + sizeof(size_t). For those three types, since we need to preserve long double alignment, n1 = n * sizeof(T) + sizeof(long double).
Pseudocode for new(ARGS) T[n] under this proposal:
    if T is char or unsigned char, or if T has long double alignment,
        padding = sizeof(long double)
    else
        padding = sizeof(size_t)
    p = operator new[](n * sizeof(T) + padding, ARGS)
    p1 = (T*) ((char*) p + padding)
    *((size_t*) p1 - 1) = n
    for i = [0, n)
        create a T, using the default constructor, at p1[i]
    return p1
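The cookie handling in Proposal A can be sketched in C++ as follows. This is an illustration under stated assumptions, not the ABI's wording: the helper names are hypothetical, `std::malloc` stands in for `operator new[]`, no elements are constructed, and the cookie is copied with `std::memcpy` to avoid alignment and aliasing concerns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Padding reserved in front of the array data. The flag is a hypothetical
// stand-in for "T is char, unsigned char, or has long double alignment".
std::size_t cookie_padding(bool needs_long_double_padding) {
    return needs_long_double_padding ? sizeof(long double) : sizeof(std::size_t);
}

// Allocate n elements plus padding, store n just before the array data,
// and return the address of the array data (p1 in the pseudocode).
char* allocate_with_cookie(std::size_t n, std::size_t elem_size, std::size_t padding) {
    assert(padding >= sizeof(std::size_t));
    char* p = static_cast<char*>(std::malloc(n * elem_size + padding));
    if (p == nullptr) return nullptr;
    char* p1 = p + padding;
    std::memcpy(p1 - sizeof(std::size_t), &n, sizeof n);
    return p1;
}

// Read the element count back from offset -sizeof(size_t).
std::size_t read_cookie(const char* p1) {
    std::size_t n;
    std::memcpy(&n, p1 - sizeof(std::size_t), sizeof n);
    return n;
}

// Free the original block, which starts `padding` bytes before the array data.
void release(char* p1, std::size_t padding) { std::free(p1 - padding); }
```

The deallocation side must apply the same padding rule as the allocation side, since the cookie stores only the element count and not the padding.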
Proposal B
::operator new[](size_t, void*) is a special case. For that version of operator new[] only, n1 = n * sizeof(T). We do not store the number of elements in such an array anywhere.
Pseudocode for new(ARGS) T[n] under this proposal:
    If the expression is new(p) T[n], and if overload resolution determines we're using ::operator new[](size_t, void*), then
        p1 = (T*) p
        for i = [0, n)
            create a T, using the default constructor, at p1[i]
        return p1
    For all other cases, same as proposal A.
Proposal A is simpler, but proposal B probably conforms more closely to user expectations.
[000210 All -- Matt]
We agreed that Proposal B, where ::operator new[](size_t, void*) is a special case with no cookie, is preferable to Proposal A, where all versions of array new get cookies.
We also agreed to the variation where we don't reserve space for a cookie if the type has no destructor. We're calling it Proposal C. We need a writeup, but we should be able to close this issue next week.
[000302 CodeSourcery -- Mark] I believe the resolution to A-20/A-21, dealing with array new, is incorrect with respect to the C++ standard. (In other words, I think we'll make it impossible to implement the behavior required by the standard.)
In particular, there are situations in which we do not allocate cookies, even when allocating arrays of class type. But, the standard guarantees that:
When a delete-expression is executed, the selected deallocation function shall be called with the address of the block of storage to be reclaimed as its first argument and (if the two-parameter style is used) the size of the block as its second argument.
That paragraph doesn't require that the class type have a non-trivial destructor.
I think that means the first bullet:
(Note: if the usual array deallocation function takes two arguments, then its second argument is of type size_t. The standard guarantees that this function will be passed the number of bytes allocated with the previous array new expression. See [class.free] for details.)
[000302 All] Modification accepted.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-22 | RTTI for reference types | data | closed | CodeSourcery | 000119 | 000203 |
Summary: __reference_type_info does not appear to be necessary. | ||||||
Resolution: Remove it. |
[000119 CodeSourcery -- Nathan] When would a type_info of a reference ever be generated? (So why __ref_type_info?)
[000126 CodeSourcery -- Nathan]
[000128 Cygnus -- Jason] Based on that, I definitely think reference type_info can go away.
[000203 All] Remove __ref_type_info.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-23 | RTTI class descriptors | data | closed | CodeSourcery | 000124 | 000302 |
Summary: Resolve several questions about the RTTI representation of class types. | ||||||
Resolution: See the Draft C++ ABI for IA-64. |
[000124 CodeSourcery -- Nathan]
si_class_type_info
is for a single nonvirtual inheritance hierarchy.
Presumably this single non-virtual inheritance is between the derived
and the base (the base may or may not have multiple or virtual bases).
An additional constraint is that, if the derived class is polymorphic,
the base class is too. Rationale: if the derived class adds
polymorphism, the base will be at a non-zero offset.
[000126 CodeSourcery -- Nathan] More useful for dynamic cast (and possibly catch matching) {than the current set of flags -- editor} would be the following flags:
Note that the virtual/non-virtual and public/non-public are not mutually exclusive. Also note that I have not actually implemented anything with these flags, so I could be wrong.
[class.mi] (clause 10.1) provides good examples of "diamond shaped."
Paragraph 4 gives a non-diamond shaped graph with multiple base objects.
At least one of the multiply inherited base objects must be non-virtual.
struct L {};
struct A : L {};
struct B : L {};
struct C : A, B {};
There are two distinct L base objects in C. C would have the non-diamond shaped multiple inheritance flag set. A, B and C would have the non-virtual base flag and public base flag set.
Paragraph 5 gives a diamond shaped graph.
Such a multiply inherited base object must be virtual.
struct V {};
struct A : virtual V {};
struct B : virtual V {};
struct C : A, B {};
This time C would have the diamond shaped flag set. A, B & C would have the virtual base flag set and the public base flag set. C would also have the non-virtual base flag set.
Paragraph 6 gives a graph which contains both features.
Here there is one non-virtual base and one virtual base.
struct B {};
struct X : virtual B {};
struct Y : virtual B {};
struct Z : B {};
struct AA : X, Y, Z {};
In that example, AA would have both diamond and non-diamond flags set. All would have the public base flag set, AA & Z would have the non-virtual base flag set, AA, X & Y would have the virtual base flag set.
The above is treating the non-virtual and virtual base flags differently, they should have the following meaning:
My thinking is that for dynamic_cast, having such information will allow pruning parts of the inheritance graph walk. For instance, there can only be distinct multiple target base objects when the non-diamond shaped flag is set in the complete object. When we find them, the base sub-object started from can only be a common base for both of them, if the diamond shaped flag is set in the complete object. Alternatively, there can only be (at most) one instance of the target type when the non-diamond shaped flag is clear. When we find it via a non-public path, there could only be an alternative public path if the complete object has the diamond shaped flag set. Similar pruning should be possible for catch matching. Without such information, the graph walk has to be pessimistic, which I believe will slow down the common case.
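The difference between the first two shapes can be checked from ordinary code by counting base subobjects. This is a small sketch, with the [class.mi] hierarchies wrapped in hypothetical namespaces `repeated` and `diamond` so that both can coexist in one file; it shows the property the flags summarize, not how the flags themselves are stored.

```cpp
#include <cassert>

namespace repeated {            // non-diamond: two distinct L subobjects
struct L {};
struct A : L {};
struct B : L {};
struct C : A, B {};
}

namespace diamond {             // diamond: one shared virtual V subobject
struct V {};
struct A : virtual V {};
struct B : virtual V {};
struct C : A, B {};
}

// The L reached through A and the L reached through B are different
// subobjects of the same type, so their addresses must differ.
bool repeated_has_two_L() {
    repeated::C c;
    repeated::L* via_a = static_cast<repeated::A*>(&c);
    repeated::L* via_b = static_cast<repeated::B*>(&c);
    return via_a != via_b;
}

// With virtual inheritance both paths lead to the same V subobject.
bool diamond_has_one_V() {
    diamond::C c;
    diamond::V* via_a = static_cast<diamond::A*>(&c);
    diamond::V* via_b = static_cast<diamond::B*>(&c);
    return via_a == via_b;
}
```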
[000126 CodeSourcery -- Nathan]
__si_class_type_info
is documented for
a single non-virtual hierarchy,
and __vmi_class_type_info
for a class containing
(directly or indirectly)
a multiple or virtual inheritance component.
My mistake was to use __si_class_type_info
for a class with a single base,
regardless of the
hierarchy within the base (that is the current g++ behaviour).
__si_class_type_info
is for both public and non-public inheritance
(again, something I'd not noticed, thinking it was for public only).
For this to work,
the __class_type_info flag bit 0x8 'non-publicly inherited base'
must mean `non-publicly inherited direct base'.
Please can the wording about bases here explicitly say
`direct base,' `indirect base,' or `direct or indirect base.'
The description currently uses `contains' and `has' which
are open to interpretation.
In dynamic casting, access is important. In a cross cast from base A via complete type C to another base B, both B and A must be publicly accessible from C. It might be that dynamic_cast locates B, and, knowing that C does not have multiply inherited subobjects, determines it need look no further. However, it must determine access. If C has no non-public direct or indirect bases, access must be OK, without further inspection. However the hint flag 0x8 can't be indicating that, as it is only for direct bases. (This was the one case where I was able to take advantage of these flags, but alas it seems I can't.)
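The access requirement on cross casts can be seen with the standard `dynamic_cast` operator. The sketch below assumes two hypothetical complete types, `PublicBoth` and `PrivateB`, that differ only in whether `B` is a public base; the names are illustrative.

```cpp
#include <cassert>

struct A { virtual ~A() {} };
struct B { virtual ~B() {} };

struct PublicBoth : A, B {};          // both bases public
struct PrivateB   : A, private B {};  // B is not publicly accessible

// Cross cast A* -> B* through the complete object succeeds when both
// bases are public in the complete type.
bool cross_cast_succeeds_when_public() {
    PublicBoth c;
    A* a = &c;
    return dynamic_cast<B*>(a) != nullptr;
}

// The same cast must fail when the target base is non-public, even though
// a B subobject exists and is unique.
bool cross_cast_fails_when_private() {
    PrivateB c;
    A* a = &c;
    return dynamic_cast<B*>(a) == nullptr;
}
```

A runtime that knows the complete type has no non-public bases anywhere could skip the access check in the first case, which is the shortcut the paragraph above is asking for.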
[000127 All]
We decided on Thursday that your "mistakes" are what we want.
__si_class_type_info
will be for any class with a
single direct base at offset 0 which is public and non-virtual.
We also decided that the flags should move from
__class_type_info
into __vmi_class_type_info,
and that the polymorphic flag should be removed.
[000126 CodeSourcery -- Nathan] I think this moving of the flags is a mistake. If I understood correctly, they indicated information about direct and indirect bases (whether there was virtuality anywhere in the hierarchy for instance). Such information can speed up dynamic cast. When walking the inheritance graph, we can take some early outs, if we know there are no multiple subobject types within the complete graph. With the flags in every class's type_info, it becomes easier to get hold of that info. With it only for vmi classes, we have to remember `unknown' when presented with a complete object of si type, and fill the information in when/if we find a vmi base.
Another case is in a potential cross-cast case, which I had in the previous email. Suppose we've found the target base, which we know is unique, but not found the source base (because we early outed, maybe). To be a valid cross-cast both the source and target base objects must be public in the complete object. If we know the complete hierarchy has no non-public bases, there's no need to search for the source base in this case.
[000129 Cygnus -- Jason] So what you're saying is if we try to dynamic_cast from A* to B*, where B has a unique A subobject and the A* does not actually point to part of a B, if we know that B has no multiple subobjects we can check the passed offset, see that it doesn't match, and return failure. Without that information, we would have to recurse up the single-inheritance chain until we either reach the A or a class with multiple or virtual bases.
I think I'd rather pay that small performance hit than add a word to the type_info for each class. Matt, would this affect locales?
... cross-casts only come up in the context of classes with multiple bases, so it wouldn't make sense to look for this in single inheritance classes anyway.
[000127 All] Note from the meeting: A proposed precise definition of a diamond-shaped object is one that has two different direct bases with the same virtual base, directly, indirectly, or vacuously (the direct base is the virtual base).
[000203 All] Move the flags from __class_type_info to __vmi_class_type_info. Share them with one byte from the __base_class_info offset field. Replace Daveed's set with Nathan's, but the first one isn't needed.
[000203 SGI -- Jim] The class type restructuring is a bit different than what I expected going in (could just be my confusion).
I moved the flags from __class_type_info to __vmi_class_type_info, discovering that they don't need to share space with the offset field in the __base_class_info records, but rather with the base class count. But, the __base_class_info has its own flags (virtual and public) which can reasonably share a doubleword, as we were discussing for the other flags this morning. So I specified that. Note that I put the flags in the low byte rather than the high byte. That is because the offset is signed, and it is likely that implementations will sign-extend (signed doubleword>>8), but not (doubleword & 0x00ffffffffffffffll).
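The reason for putting the flags in the low byte can be sketched as follows. The constant names and values here are hypothetical, chosen only for illustration, and the code assumes that right-shifting a negative signed value is an arithmetic shift (true of mainstream compilers, and guaranteed from C++20).

```cpp
#include <cassert>

// Hypothetical names and values, for illustration only.
const long long kVirtualFlag = 0x1;
const long long kPublicFlag  = 0x2;
const int       kOffsetShift = 8;

// Pack a signed offset above an 8-bit flag field. The shift is done on an
// unsigned value so that a negative offset is handled without overflow.
long long pack_offset_flags(long long offset, long long flags) {
    unsigned long long shifted =
        static_cast<unsigned long long>(offset) << kOffsetShift;
    return static_cast<long long>(shifted) | (flags & 0xff);
}

// Recover the offset with a single signed shift, which sign-extends.
long long unpack_offset(long long offset_flags) {
    return offset_flags >> kOffsetShift;
}

// Recover the flags with a single mask of the low byte.
long long unpack_flags(long long offset_flags) {
    return offset_flags & 0xff;
}
```

Had the flags been placed in the high byte instead, recovering a negative offset would need an explicit sign extension after masking, rather than one shift.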
After an exchange with Nathan, I reinstated his first flag (contains non-diamond multiple inheritance).
[000210 All -- Matt] Notes from the meeting:
Minor corrections to RTTI discussion in data layout document: In section 7c, which describes the vmi_flags, flag 0x01 is documented incorrectly. It says "class has non-diamond multiple inheritance", which isn't quite right. We're really talking more about repeated inheritance: having multiple subobjects of the same type.
Also in vmi_flags, Jason questions whether flags 0x04 and 0x08 are necessary. What do we really need "has virtual base(s)" and "has non-virtual base(s)" for? Jason has sent email to Nathan about this.
Naming issue: we decided to put all of our type_info subclasses
in namespace abi, not namespace std. This means, of course,
that they can't go in any of the standard headers. Rather than
inventing multiple header names, we would like to put everything
(unwinding longjmp, type_info subclasses, etc.) into one quasi-
standard header. We propose the name
Issue A23 can almost be closed. The only thing we need to resolve is whether to keep the two flags that Jason is unsure about.
[000302 All -- Matt] We will tentatively keep the has-public-base flag. Nathan has an action item to validate its usefulness when he implements.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-24 | RTTI for incomplete types | data | closed | CodeSourcery | 000126 | 000330 |
Summary: How does RTTI represent incomplete types? | ||||||
Resolution: Use class_type_info distinct from the complete type copy, add a flag to pointer_type_info if it points to incomplete type RTTI, and do mangled name comparison if an incomplete pointer is involved. |
[000126 CodeSourcery -- Nathan] The amended (25th Jan) RTTI specification says:
Note that the full structure described by an RTTI descriptor may include incomplete types not required by the Standard to be completed, although not in contexts where it would cause ambiguity.
I don't believe this is the case;
the example I posted a couple of weeks back pointed this out.
Here it is, in a slightly more compact form:
#include <stdlib.h>   // for abort()

struct A;
struct B;
int main ()
{
  try {
    throw (B **)0;
  } catch (A const * const *) {
    abort ();
  } catch (B const * const *) {
    ; // ok
  } catch (...) {
    abort ();
  }
}
I believe this is well formed and should not abort. The RTTI document indicates that `typeid (A const * const *)' and `typeid (B const * const *)' will produce __pointer_type_info chains that end at a weak symbol reference for A and B respectively. These will both resolve to zero. How is catch matching able to determine the difference between `A const * const *' and `B const * const *' under these circumstances? If this is a shortcoming of the ABI, or considered a defect in the standard, it should be documented.
There seems to be no discussion of this case.
[000127 All] We decided on Thursday that this can be handled by not emitting info for A and B, just referring to them using weak references. The EH matcher will never look past the inner pointers.
[000128 CodeSourcery -- Nathan]
I'm sorry, I'm just not getting this.
The type_infos for `B **' and `B *' will be,
(I'm using g++'s existing name mangling, but these are new-abi structures):
__tiPP1B:
.long __vt_19__pointer_type_info
.long .LC2
.long 0
.long __tiP1B
__tiP1B:
.long __vt_19__pointer_type_info
.long .LC3
.long 0
.long __ti1B ;; not emitted, will resolve to zero
In the catch matching,
the type_infos for `A const *const *' and `A const *' will be:
__tiPCPC1A:
.long __vt_19__pointer_type_info
.long .LC1
.long 1
.long __tiPC1A
__tiPC1A:
.long __vt_19__pointer_type_info
.long .LC4
.long 1
.long __ti1A ;; not emitted, will resolve to zero
and those for `B const *const *' and `B const *':
__tiPCPC1B:
.long __vt_19__pointer_type_info
.long .LC0
.long 1
.long __tiPC1B
__tiPC1B:
.long __vt_19__pointer_type_info
.long .LC5
.long 1
.long __ti1B ;; not emitted, will resolve to zero
I fail to see how the catch matcher can get different results comparing __tiPP1B to __tiPCPC1A as opposed to comparing __tiPP1B to __tiPCPC1B. They both look like qualification conversions of pointers to pointers to incomplete type. In the first case we'll end up comparing __tiP1B to __tiPC1A, which still is a valid qualification conversion, then have two NULL pointers for the pointed to types, which somehow we have to tell apart. In the second case we'll end up comparing __tiP1B to __tiPC1B, and again have two NULL pointers for the pointed to types, but this time we have to consider them the same type. I don't see anything in [conv.qual] saying that qualification conversions don't have to deal with incomplete types. N.B.: old-abi g++ seg faults on the above code because it does wander into the NULL pointers.
[000129 Cygnus -- Jason] Good point. I was forgetting about multi-level qualification conversions.
I think that leaves us with something like what EDG does now: namely, comparisons are done by comparing the addresses of one-byte commons rather than of the type_info nodes themselves. Then we could emit incomplete info in one file and complete info in another file and they would compare the same because both refer to the same ID proxy.
We could mangle the complete and incomplete versions differently, so they would not be combined by the linker.
This would also change how we refer to type_infos; under the current scheme, references to type_infos in the EH type table need to be via relocs that will be resolved by the dynamic linker at runtime. If we don't need to compare addresses, we could use gp-relative references. Of course, we'd still have the absolute references in the type_infos to the ID proxies, so we're no better off.
[000130 CodeSourcery -- Nathan] There's a bit of strangeness with loading & unloading a DSO which contains the complete definition of `struct A', into an executable which has the incomplete info. That too is in the original email. If both DSO and executable have __tiP1A (struct A *), they'll be merged, presumably with the DSO's copy ignored. However, the __tiP1A in the executable will point at the proxy incomplete A type_info (which will have already been filled with a weak NULL for its target). Somehow we have to arrange that the proxy is altered to now point at the __ti1A (struct A) type_info that the DSO supplied. If we don't do that, throwing `struct A *' in the DSO (which is valid, `cos the DSO source had complete information), will throw the __tiP1A in the executable which points to incomplete. Hence we won't find any base conversions if we're trying to catch a base of A.
[000203 All] We can't seem to get around the need for an EDG-style implementation, i.e. a proxy for the type RTTI which is resolved by name, e.g. a one-byte common block referenced from the RTTI. We need a specific proposal for putting the reference in the RTTI, and a mangling for the name.
Since all we need from the common block is a distinct address, we may want to float a base ABI proposal for a new symbol type which is resolved by the linkers to a unique address without allocating storage.
[000210 All -- Matt] The scheme we have been converging on: we extend __class_type_info by putting in a new field, id_proxy_ptr, of type char*. It points to a one-byte comdat which serves only as a unique address. (We don't see a strong need to ask the base ABI group to mandate a magic unique-address feature in the linker. We may want to get input from our linker people, though.)
A class's __class_type_info object and its comdat proxy both receive mangled names. We must make sure that the proxy's mangled name is the same for all complete and incomplete declarations of a class, that the mangled name of the __class_type_info object is the same for all complete declarations of a class, and that the mangled name of the __class_type_info object is different for incomplete declarations than for complete declarations. One way to achieve this is to make __class_type_info objects for incomplete declarations static.
We add a new flag to __pointer_type_info; let's say bit 0x4. If this is set, it means we have a pointer to an incomplete type (or pointer to pointer to incomplete type, etc.)
We compare two __class_type_infos for equality by pointer comparison of the id_proxy_ptr fields. We compare two __pointer_type_infos for equality by looking at the addresses of the type_info objects, *unless* the incomplete bit is set in at least one of them. If the incomplete bit is set, we have to compare the pointed-to types. For everything other than classes and pointers we can just use address equality of the type_info objects themselves.
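The comparison rules in this paragraph can be sketched with a simplified model. Everything below is hypothetical: `TypeInfoModel`, its fields, and the helper functions stand in for the real descriptors, and the flag value 0x4 is the one floated in the notes above. The remaining pointer flag bits are assumed to describe qualifiers and must match.

```cpp
#include <cassert>

// Simplified, hypothetical model of the descriptors discussed above.
enum Kind { kOther, kClass, kPointer };

struct TypeInfoModel {
    Kind kind;
    const char* id_proxy_ptr;     // classes only: address of the one-byte proxy
    unsigned flags;               // pointers only
    const TypeInfoModel* target;  // pointers only: pointed-to type
};

const unsigned kIncompleteFlag = 0x4;  // value suggested in the notes

bool same_type(const TypeInfoModel* a, const TypeInfoModel* b) {
    if (a == b) return true;                  // address equality always suffices
    if (a->kind != b->kind) return false;
    if (a->kind == kClass)                    // classes: compare proxy addresses
        return a->id_proxy_ptr == b->id_proxy_ptr;
    if (a->kind == kPointer) {
        // Distinct addresses only need a deeper look if either is incomplete.
        if (((a->flags | b->flags) & kIncompleteFlag) == 0) return false;
        if ((a->flags & ~kIncompleteFlag) != (b->flags & ~kIncompleteFlag))
            return false;
        return same_type(a->target, b->target);
    }
    return false;                             // everything else: address only
}

static const char proxy_A = 0;  // one proxy byte per class
static const char proxy_B = 0;

// A complete and an incomplete descriptor for A share the same proxy.
bool pointers_to_same_class_match() {
    TypeInfoModel complete_A   = { kClass, &proxy_A, 0, nullptr };
    TypeInfoModel incomplete_A = { kClass, &proxy_A, 0, nullptr };
    TypeInfoModel p_complete   = { kPointer, nullptr, 0, &complete_A };
    TypeInfoModel p_incomplete = { kPointer, nullptr, kIncompleteFlag, &incomplete_A };
    return same_type(&p_complete, &p_incomplete);
}

bool pointers_to_different_classes_match() {
    TypeInfoModel incomplete_A = { kClass, &proxy_A, 0, nullptr };
    TypeInfoModel incomplete_B = { kClass, &proxy_B, 0, nullptr };
    TypeInfoModel pa = { kPointer, nullptr, kIncompleteFlag, &incomplete_A };
    TypeInfoModel pb = { kPointer, nullptr, kIncompleteFlag, &incomplete_B };
    return same_type(&pa, &pb);
}
```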
In response to Jason's 000129 question: we can't use gp-relative references for type_info objects because we're only using comdat proxies for __class_type_info, not for other kinds of type_info objects.
In response to Nathan's 000130 question: this is the reason to give the complete and incomplete __class_type_info objects different mangled names. That way a complete __class_type_info object in a DSO won't be overridden by an incomplete __class_type_info object in the executable.
At the very end of this meeting we got a suggestion from Christophe for a completely different mechanism. We agreed that we can't evaluate it without a writeup. The suggestion: abandon these comdat proxies altogether. Instead we have a new type_info class, __incomplete_class_type_info. Comparisons involving two __class_type_info objects use address equality, comparisons involving two __incomplete_class_type_info objects, or a __class_type_info and an __incomplete_class_type_info, do string comparison on the name. We still would have an incomplete bit in the __pointer_type_info class, which, again, we would use to determine whether two __pointer_type_info objects with different addresses might nevertheless represent the same pointer type.
[000309 All] The group decided to go ahead and close this issue with the proxy solution. If Christophe comes up with a writeup of the alternate proposal, we can reopen.
[000314 SGI -- Jim] I've incorporated the chosen scheme into the Draft C++ ABI for IA-64. In working this out, though, I've remembered why SGI had an issue with the proxy commons, which is that, in large programs with lots of class types, they produce a lot of runtime relocation scattered through data. Matt and I think we understand the representation of Christophe's proposal, and will think about how to compare the mangled names.
[000330 All] Adopt the proposed scheme. Make sure Nathan understands it.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-25 | Excess-width bitfields | data | closed | IBM | 000204 | 000217 |
Summary: C++ allows bitfields with a larger size specified than that required by the declared type, e.g. int f: 64. How should they be allocated? | ||||||
Resolution: Allocate the field with alignment determined as though it were the largest integer type that fits in the specified size, and use the first bits available in the field (lowest order for little endian IA-64) for the actual data. |
When the specified width of a bitfield exceeds the size of the declared type, the standard specifies that the accessible field is to be padded to the specified width, with the location of the padding implementation-defined. That is, the accessible field could be placed at the beginning, at the end, or in the middle of the specified bits. (Note that such declarations are explicitly disallowed by the C 2000 draft, so this is not a C ABI issue.)
[000204 SGI -- Jim]
It seems to me that the situation that makes it interesting is the
following:
struct s {
  short s1;
  int i: 64;
  short s2;
};
In this case, I don't want the accessible part of i at the beginning or
the end -- I want it in the middle. Doing otherwise yields either a
badly aligned i, or wasted space.
One could express this by the following rule:
enum ... e : 32;
" to behave as though
the compiler allocated a 32-bit int,
even if it actually uses only 8 bits for the enum value.
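The struct from Jim's note can be compiled as written to inspect what a given compiler does with an excess-width bitfield. This is a sketch only: the struct is renamed `S` here, the helper functions are hypothetical, and compilers typically emit a warning that the declared width exceeds the type. Only the int-sized portion of the field carries a value; the rest is padding.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

struct S {
    short s1;
    int   i : 64;   // excess-width bitfield: value bits of an int, rest padding
    short s2;
};

// Values stored in the bitfield and its neighbours survive a round trip,
// whatever position the implementation chooses for the value bits.
bool values_round_trip() {
    S s = S();
    s.s1 = 1;
    s.i  = INT_MAX;
    s.s2 = 2;
    return s.s1 == 1 && s.i == INT_MAX && s.s2 == 2;
}

// Reports the overall size so the chosen layout can be inspected.
std::size_t size_of_S() {
    return sizeof(S);
}
```

Under the resolution above, the field is aligned as though it were the largest integer type that fits in 64 bits, so `size_of_S()` is the quantity to check when comparing compilers.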
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-26 | NULL pointers to member functions | data | closed | CodeSourcery | 000221 | 000302 |
Summary: How are NULL pointers to member functions represented? | ||||||
Resolution: A NULL pointer is represented by a 0 value of ptr, and the value of adj is irrelevant. |
[000221 CodeSourcery -- Mark] The ABI document says that a NULL pointer-to-member function has `ptr == 0'. It does not, however, say whether or not a NULL pointer-to-member function also has `adj == 0'.
I believe that this should be specified as well so that code generated
to do comparison of pointers to members (of the same type)
looks like:
p1->ptr == p2->ptr && p1->adj == p2->adj
rather than:
p1->ptr == p2->ptr && (!p1->ptr || (p1->adj == p2->adj))
So, I would say:
It's occurred to me that this imposes some overhead on casting pointers-to-members around: now when you convert from a base pointer to member to a derived version (or vice versa), you can't just adjust the `adj' member willy-nilly; instead, you have to check first whether or not the pointer is NULL.
So, I'm not sure any more which scheme is preferable -- but we definitely need to say clearly which we want.
[000222 CodeSourcery -- Mark] So, it would be helpful if we were to add:
p1.ptr == p2.ptr && (!p1.ptr || (p1.adj == p2.adj))
[000229 SGI -- Jim] Comparisons (5.10) of pointers to virtual member functions are undefined. So, for pointer-to-function-member comparisons, we only need to worry about non-virtual members and null. Since the representation stores the actual address of the function descriptor, we should be able to just compare the pointers, and ignore the adjustment.
For conversions between base classes, it seems that we need only modify the adjustment, and then only if one is not primary for the other. For conversion to null, it seems that we need only set the pointer to 0, and can ignore the adjustment.
[000302 All] Represent NULL by a 0 pointer, with the adjustment unspecified.
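With the adjustment left unspecified for NULL, equality has to ignore adj whenever ptr is 0. The following sketch models the two-word representation with a hypothetical struct; `MemberFunctionPointerModel`, `make_pmf`, and the other helpers are illustrative names, not ABI names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the two-word pointer-to-member-function layout.
struct MemberFunctionPointerModel {
    std::ptrdiff_t ptr;  // function address (or vtable offset encoding)
    std::ptrdiff_t adj;  // this-pointer adjustment
};

MemberFunctionPointerModel make_pmf(std::ptrdiff_t ptr, std::ptrdiff_t adj) {
    MemberFunctionPointerModel p = { ptr, adj };
    return p;
}

// NULL is recognized by ptr alone; adj may hold anything.
bool is_null(const MemberFunctionPointerModel& p) {
    return p.ptr == 0;
}

// Equality compares adj only when the pointers are non-null.
bool equal(const MemberFunctionPointerModel& a,
           const MemberFunctionPointerModel& b) {
    return a.ptr == b.ptr && (a.ptr == 0 || a.adj == b.adj);
}

// A base-to-derived conversion may adjust adj without testing for NULL,
// because a stale adj in a NULL value is never examined.
MemberFunctionPointerModel convert(MemberFunctionPointerModel p,
                                   std::ptrdiff_t delta) {
    p.adj += delta;
    return p;
}
```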
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-27 | NULL pointers to data members | data | closed | CodeSourcery | 000222 | 000302 |
Summary: How are NULL pointers to member data represented? | ||||||
Resolution: A NULL pointer is represented by the value -1. |
[000222 CodeSourcery -- Mark] We haven't specified a way to represent a NULL pointer to data member. G++ presently adds one to the offset, allowing zero to serve as the NULL pointer to member.
[000223 CodeSourcery -- Mark] What is the value for the NULL pointer to data member? I guess -1 would do, unless there are cases I can't think of where the pointer to member would legitimately have a negative value. Maybe 0x8000000000000000 is better...
It's illegal to do this if the base is virtual. But, that's the only case in which the `this' pointer can increase.
[000229 SGI -- Jim] From the Standard:
So we can conclude that, since we always allocate non-virtual bases before data members, any base object in a derivation chain will have its base address smaller than any of the data members declared in members of the chain. Therefore, the offset represented by a pointer-to-data-member will always be non-negative, even after the permitted conversions above.
So, we could either use -1 for NULL, or use 0 and increment the offset. 0x800...000 is an unnecessary complication.
[000302 All] Represent NULL by the value -1.
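On a compiler that follows this resolution (the sketch assumes one where a pointer to data member has the size of `std::ptrdiff_t`, as with GCC or Clang on typical Unix-like targets), the representation can be inspected by copying the bytes out. `Pair` and `raw_value` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Pair {
    int first;
    int second;
};

// Assumption: a pointer to data member is stored as a single offset word.
static_assert(sizeof(int Pair::*) == sizeof(std::ptrdiff_t),
              "sketch assumes an offset-sized pointer to data member");

// Copies out the stored representation of a pointer to data member.
std::ptrdiff_t raw_value(int Pair::* pm) {
    std::ptrdiff_t raw;
    std::memcpy(&raw, &pm, sizeof raw);
    return raw;
}

// Portable property: the null value differs from a pointer to the member at
// offset 0, which is why 0 alone cannot serve as NULL without an adjustment.
bool null_differs_from_first_member() {
    int Pair::* null_pm = nullptr;
    return null_pm != &Pair::first;
}
```

Because a member at offset 0 is a legitimate target, a plain 0 is already taken, which is the motivation for choosing -1.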
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-28 | RTTI equality testing | data | closed | CodeSourcery | 000406 | 000504 |
Summary: Can we get back the ability to do a simple test for RTTI equality? | ||||||
Resolution: Mangle the name NTBS for std::type_info separately, emit it in its own COMDAT, and use it instead of the RTTI struct, at least if the incomplete flags are set in pointer types. |
[000406 CodeSourcery -- Nathan] The current RTTI proposal loses the property that all type_info objects can be compared for equality and orderability by address comparison. Instead, type_info::operator== must involve a virtual function call or an unconditional strcmp. (An alternative of testing the typeid of the polymorphic type_info objects results in infinite recursion!)
Here are two proposals which reinstate the address equality property. The first is rather different to the current scheme, but when I was done documenting it, I realised there was a minor modification to the current scheme, which partially reinstates the address equality. I present both for consideration. Feel free to shoot them down ...
The base class of these is:
class abi::__type_info
{
std::type_info const *type; // pointer to typeid(foo) object.
virtual ~__type_info ();
... other implementation defined member functions
};
This contains a pointer to the type_info object produced by the typeid operator, for whatever type this is describing. That will be a unique object.
There are a number of necessary derivations of this type, which can be taken largely unaltered from the current proposal.
It is necessary to distinguish function types, so that catch matching
can distinguish a data pointer object from a function pointer object.
Other types (fundamental, enum, array) need not be distinguished,
and can be represented by an abi::__type_info object.
(Or we could keep the current proposal of having separate derivations
for these.)
class abi::__function_type_info
: public abi::__type_info
{
virtual ~__function_type_info ();
... other implementation defined member functions
};
Pointers are as they currently are,
other than the base class change.
We still need the incomplete target flag.
class abi::__pointer_type_info
: public abi::__type_info
{
abi::__type_info const *target; // target type of the pointer
unsigned flags; // flags, as currently specified
virtual ~__pointer_type_info ();
... other implementation defined member functions
};
Pointers to member could be a sibling class of non member pointers.
However, they do share common functionality,
and IMO it makes sense to derive from __pointer_type_info.
class abi::__pointer_to_member_type_info
: public abi::__pointer_type_info
{
abi::__class_type_info const *klass; // class of the member
virtual ~__pointer_to_member_type_info ();
... other implementation defined member functions
};
The __class_type_info, __si_class_type_info and __vmi_class_type_info
are unchanged, other than the change to __class_type_info's base.
class abi::__class_type_info
: public abi::__type_info
{
... as currently defined
};
The vtable slot -1, (which currently holds a pointer to the std::type_info object for a class), points to the abi::__class_type_info object. To implement typeid(X), where X is polymorphic, involves an additional indirection through the abi::__type_info base to return the `type' member.
dynamic_cast uses the abi::__class_type_info object pointed to in the vtable. throwing and catch matching use the abi::__type_info object for the type being thrown or caught.
As with the current proposal, an incomplete type is represented by an abi::__class_type_info object. Note that its abi::__type_info base will point to the unique std::type_info object for that type, regardless of whether a DSO completes the type. This incomplete type is prevented from preempting the complete type information.
Also direct or indirect pointers to incomplete have their incomplete flag set, and are also prevented from preempting the equivalent pointer to complete object.
During catch matching, comparison of pointers can compare the abi::__pointer_type_info addresses, unless either has the incomplete flag set, in which case the std::type_info objects pointed to must be compared. (The std::type_info objects could be compared even when the incomplete flags are clear.)
There are two or three naming schemes with this proposal:
Advantages of this proposal are:
The cost of this proposal is
The first proposal is essentially
using the std::type_info objects as unique objects,
via which incomplete types can be compared.
We already have such a unique object candidate --
the NTBS name member of std::type_info.
Currently we've not said anything about that.
If, however, we give that NTBS comdat linkage, a unique name,
and prevent it being commonized with other strings, we have a proxy.
These features can be obtained by treating it as a
`const char []' rather than a string constant.
type_info equality and orderability can now use the address of this array,
rather than the type_info objects themselves.
We can do this in all cases,
even though it is only necessary for the pointer to incomplete case,
as that avoids a virtual function call.
Here is an implementation of type_info::operator==
bool type_info::operator== (type_info const &other) throw ()
{
return name == other.name;
}
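The idea can be tried out with a small stand-in for std::type_info. This is a sketch under stated assumptions: `NameKeyedTypeInfo` is a hypothetical model, and the two name arrays (with illustrative contents) play the role of the uniquely named NTBS objects that the linker would merge across translation units.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a type_info whose identity is its name array.
struct NameKeyedTypeInfo {
    const char* name;  // address of the type's unique name array

    bool operator==(const NameKeyedTypeInfo& other) const {
        return name == other.name;            // address, not string, comparison
    }
    bool before(const NameKeyedTypeInfo& other) const {
        // std::less gives a total order even for unrelated pointers.
        return std::less<const char*>()(name, other.name);
    }
};

// Arrays rather than string literals, so equal contents are never merged
// with other strings and each array has exactly one address.
const char name_of_A[] = "1A";
const char name_of_B[] = "1B";

// Two descriptors for A (say, a complete and an incomplete copy) compare
// equal because both refer to the same name array.
bool copies_of_A_compare_equal() {
    NameKeyedTypeInfo complete_A   = { name_of_A };
    NameKeyedTypeInfo incomplete_A = { name_of_A };
    return complete_A == incomplete_A;
}

// Distinct types are unequal and ordered one way or the other, never both.
bool A_and_B_are_strictly_ordered() {
    NameKeyedTypeInfo a = { name_of_A };
    NameKeyedTypeInfo b = { name_of_B };
    return !(a == b) && (a.before(b) != b.before(a));
}
```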
We need to specify the naming scheme for the NTBS.
The advantages of this are
The costs over proposal A are
[000411 CodeSourcery -- Nathan]
Issue 2
The algorithm for collation order of type_infos, cannot simply compare addresses for non-pointer types, and complete pointer types. Using string collation only works when one of the types is a pointer with the incomplete_mask set. There are two difficulties. Firstly, we might be comparing a non-pointer type_info with a pointer type_info. We need to determine this and DTRT WRT the incomplete flag of the pointer type_info. To do that will require dynamic_cast or typeid'ing the type_infos. Secondly, assume we are just comparing pointer type_info's. We have two pointers to complete, Aptr and Bptr, and a third pointer to incomplete, Cptr.
There is nothing maintaining the consistency of the results of these three tests -- result 1 is uncorrelated with results 2 & 3.
Therefore type_info::before must be implemented as string compare on the type's names. We lose any advantage of commonizing the type_infos.
Issue 3
17.4.4.4 prevents an implementation adding member functions to one of the std classes, except in particular circumstances. About the only leeway given is whether a particular non-virtual function is inline or not. So I presume we're not permitted to add virtual member functions to std::type_info (18.5.1). The rules given in 17.4.4.4 specifying what member functions can be added look like applications of the as-if rule, but there must be something deeper going on, as if that was all, it wouldn't be mentioned. I'm not sure how a conforming program could tell whether additional functions had been added.
The abi requires us to add virtual functions to type_info. For instance the implementation of operator== will require it to deal with pointers to incomplete. G++ needs several for catch matching.
Issue 4
5.2.8 talks about typeid returning something derived from type_info, but the footnote mentioning extended_type_info implies to me that typeid always returns objects of the same type. Again, I'm not sure how a conforming program could tell.
The two proposals above resolve these issues. Proposal A resolves issues 2, 3 & 4, whilst proposal B resolves issue 2 only, and will leave us (slightly) non-conformant.
[000413 All] The Standard committee members in the group are quite sure that Issues 3 and 4 are not problems. Section 17.4.4.4 does not impose the suggested constraint (see footnote 173), and the intent of 5.2.8 is not to restrict typeid to returning a single class.
Proposal B resolves the remaining issue, and the group is inclined to accept it, while considering whether to go further with A. Jim will integrate (and has since integrated) B into the Draft C++ ABI for IA-64.
[000504 All] It was decided to accept the current writeup. See the Draft C++ ABI for IA-64.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-29 | RTTI pointer-to-member | data | closed | CodeSourcery | 000407 | 000504 |
Summary: Derive __pointer_to_member_type_info from __pointer_type_info. | ||||||
Resolution: Derive __pointer_to_member_type_info and __pointer_type_info from a common base class __pbase_type_info. Add a new flag to __pbase_type_info indicating that the class of a pointer-to-member is incomplete (propagated up a chain of pointers). |
[000407 CodeSourcery -- Nathan] __pointer_to_member_type_info is derived from type_info. I strongly recommend it be derived from __pointer_type_info, as it requires much of the same functionality, and has the same meanings of its flags. By subclassing __pointer_type_info, much code could be reused.
Thus point 8 of the rtti classes would become
The abi::__pointer_to_member_type_info type adds one field to abi::__pointer_type_info:
- a pointer to an abi::__class_type_info (e.g., the "A" in "int A::*")
[000411 CodeSourcery -- Nathan]
It is permissible in a pointer to member of X,
for X to be an incomplete type [8.3.3]/2.
This means that we need more than a single incomplete flag.
The presence of such a ptr to member,
will mean that it and all pointers to it will have their incomplete flag set,
but its target might not be an incomplete chain.
In implementing G++'s rtti runtime I
found the following three flags useful,
(this is with __pointer_to_member_type_info derived from __pointer_type_info):
incomplete_mask = 0x8
incomplete_chain_mask = 0x10
incomplete_klass_mask = 0x20
incomplete_mask is an inclusive or of the other two flags. incomplete_klass_mask is only used by __pointer_to_member_type_info, and __pointer_type_info knows nothing about it (it simply examines the other two).
A __pointer_type_info or __pointer_to_member_type_info sets the incomplete_mask and incomplete_chain_mask, if the target is an incomplete type, or has its incomplete_mask set.
A __pointer_to_member_type_info sets the incomplete_mask and the incomplete_klass_mask, if the class of the member is incomplete.
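The flag rules above can be sketched in a few lines. This is only an illustration under stated assumptions: `TypeNode`, `pointer_flags` and `member_pointer_flags` are hypothetical names that belong neither to the ABI nor to the G++ runtime, and only the three mask values are taken from the message.

```cpp
#include <cassert>

// Mask values as listed in the message above.
enum : unsigned {
    incomplete_mask       = 0x8,   // inclusive or of the two flags below
    incomplete_chain_mask = 0x10,  // target is incomplete, or has incomplete_mask set
    incomplete_klass_mask = 0x20   // class of a pointer-to-member is incomplete
};

// Hypothetical stand-in for a type_info node; not an ABI type.
struct TypeNode {
    bool is_incomplete_class;  // true for a class that is declared but not defined
    unsigned flags;            // meaningful for pointer-like nodes only
};

// Flags for a pointer (or the pointee part of a pointer to member)
// whose target type is `target`.
unsigned pointer_flags(const TypeNode& target) {
    unsigned f = 0;
    if (target.is_incomplete_class || (target.flags & incomplete_mask))
        f |= incomplete_mask | incomplete_chain_mask;
    return f;
}

// Flags for a pointer to a member of class `klass` with member type `target`.
unsigned member_pointer_flags(const TypeNode& klass, const TypeNode& target) {
    unsigned f = pointer_flags(target);
    if (klass.is_incomplete_class)
        f |= incomplete_mask | incomplete_klass_mask;
    return f;
}
```

With these rules, a pointer to member of an incomplete class gets `incomplete_mask | incomplete_klass_mask`, and a plain pointer to that pointer-to-member gets `incomplete_mask | incomplete_chain_mask`, which matches the propagation described in the message.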
[000411 Ed.] I've tentatively incorporated both of these into the layout document, except that I just defined a second flag (in __pointer_type_info flags) for direct or indirect incomplete class type (in member pointers). Any pointer type inspections can check for both flags, even though only member pointers can cause one of them to be set up the chain.
[000413 All] Derive __pointer_to_member_type_info and __pointer_type_info from a common base class __pbase_type_info. Add a new flag to __pbase_type_info indicating that the class of a pointer-to-member is incomplete (propagated up a chain of pointers).
(Ed. note) I've added updates to the Draft C++ ABI for IA-64.
[000504 All] It was decided to accept the current writeup. See the Draft C++ ABI for IA-64.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-30 | RTTI portability | data | closed | HUB | 001012 | 001109 |
Summary: What must be specified to produce RTTI portability? Are member layouts specified? Names? Virtual functions? | ||||||
Resolution: Data members of the ABI-defined type_info derived classes must be allocated as specified, and their names are normative. Virtual functions, beyond the Standard-specified destructor, are implementation-specific, and may not be referenced outside the compiler and system vendors' runtime libraries. |
[001012 all -- Jim] The issue here, raised originally by Martin, I will open as A-30. Implementations will generally need additional virtual functions associated with the type_info hierarchy to implement such functionality as dynamic cast. Gcc for instance has functions __is_function_p, __do_catch, __pointer_catch, ...
A program that is built from pieces from different compilers, where the pieces come from different implementations of the hierarchy, will see different structures, at least in the vtables, if we allow this extra material to be arbitrary, creating a problem if such programs actually make use of parts of the hierarchy.
We worked out the following possible solution:
std::type_info
:
class __cxa_aux_typeinfo {
... (*__is_function_p) (...);
...
};
The implementation will create one instance of this class for each of the classes derived from std::type_info, and we will specify a mangled name for it.
class std::type_info {
...
protected:
__cxa_aux_typeinfo *__aux;
type_info (void) { /* set up __aux */ };
};
Now an implementation can add an arbitrary set of functions to __cxa_aux_typeinfo, specialized to the derived class like a virtual function, without changing the external interface (to the user) of the hierarchy.
[001103 SGI -- Jim]
[...leaving out much discussion...]
So, after all the above, I suggest the following actions:
[001109 all] The current writeup is adequate. See the resolution in the issue header.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-31 | Overlaying tail padding | data | closed | CodeSourcery | 001019 | 001109 |
Summary: Should we change the decision to overlay tail padding in class layout? For volatile members? In general? | ||||||
Resolution: The overlaying of tail padding is eliminated, but we will retain the treatment of empty bases. |
[001019 CodeSourcery -- Mark]
I think I recall that the committee was intentionally trying to use
the tail padding of one object to save space. For example, consider:
struct A { short s; char c; };
struct B { A a; char d; };
(These are PODs, but you can easily make an equivalent non-POD example).
Here, I think the committee wanted to give `B' size 4, by packing `d' into the tail padding of `A'.
I think this is a mistake. David Gross came up with the following example:
Code generator needs to copy dsize, not sizeof, unless it can prove that the object is in a context where tail padding isn't overlaid. Reason? Tail padding might be overlaid by a volatile field.
Hence, a non-POD that looks like
struct S { short sh; char ch; };
requires ld2/st2/ld1/st1 for a copy instead of ld4/st4 because we
might have
struct T { S s; volatile char d; };
Similarly, people using memcpy to copy around POD components of non-PODs will get burned.
This completely breaks user expectation since people routinely expect to be able to stick a function or two into a POD without changing its layout.
I think we should make the following changes:
Note that this still permits the empty base optimization; nvsize will be zero, and sizeof will be 1.
There's an important difference between using the tail padding in an empty base and the tail padding in a generic object: you know that you never have to copy an empty base.
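The empty base case can be observed directly. The following is a minimal sketch; `Empty` and `Holder` are hypothetical names, and the sizes noted in the comments are what mainstream compilers produce rather than something this note guarantees.

```cpp
#include <cassert>

struct Empty {};                    // no data members, no virtual functions
struct Holder : Empty { int x; };   // Empty is an empty base of Holder

// True when the empty base subobject takes no room inside Holder,
// even though a standalone Empty object has a non-zero sizeof.
constexpr bool empty_base_takes_no_room = sizeof(Holder) == sizeof(int);
```

A complete `Empty` object still has a size of at least one byte so that distinct objects get distinct addresses; only the base subobject is folded away.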
[001109 all] Although dealing with tail padding overlaying would be straightforward in a from-scratch compiler, getting the information to all the places in the back end of g++ or the HP compiler that would need it is a huge task (estimated at a widely scattered 1500 lines of code touched in g++). In addition, it is expected that some number of users moving back and forth between C and C++ and trying to match C structs with C++ non-POD classes will have problems, though there are questions about how many.
Therefore, we have decided to eliminate the overlaying of tail padding. Mark will provide alternate proposed wording for the ABI document.
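For the POD example from Mark's message, the effect of not overlaying tail padding can be checked with `offsetof`. This is a sketch only; the typical figures mentioned in the comments (4 bytes for `A`, 6 for `B`) assume a 2-byte `short` with 2-byte alignment and are not asserted by the code.

```cpp
#include <cassert>
#include <cstddef>

struct A { short s; char c; };  // typically 3 bytes of data plus 1 byte of tail padding
struct B { A a; char d; };      // `d` is placed after the whole of `a`

// Offset of B::d. Without overlaying it is never inside the storage of `a`,
// so copying sizeof(A) bytes into `a` cannot clobber `d`.
constexpr std::size_t offset_of_d = offsetof(B, d);
```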
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
B-1 | Adjustment of "this" pointer (e.g. thunks) | data call | closed | SGI | 990520 | 991202 |
Summary: There are several methods for adjusting the this pointer for a member function call, including thunks or offsets located in the vtable. We need to agree on the mechanism used, and on the location of offsets, if any are needed. To maximize performance on IA64, a slightly unusual approach such as using secondary entry points to perform the adjustment may actually prove interesting. | ||||||
Resolution: See the writeup in the Draft C++ ABI for IA-64. |
[990623 HP -- Christophe]
The following proposal applies only to calls to virtual functions when a this pointer adjustment is required from a base class to a derived class. Essentially, this means multiple inheritance, and the existence of two or more virtual table pointers (vptr) in the complete object. The multiple vptrs are required so that the layout of all bases is unchanged in the complete object. There will be one additional vptr for each base class which already required a vptr, but cannot be placed in the whole object so that it shares its vptr with the whole object. Note: when the vptr is shared, the base class is said to be the "primary base class", and there is only one such class.
For the primary base class, no pointer adjustment is needed. For all other bases, a pointer to the whole object is not a pointer to the base class, so whenever a pointer to the base class is needed, adjustment will occur.
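The difference between the primary base and the other bases can be made visible by comparing addresses. In this sketch `base_offset` is a hypothetical helper, and the data members are added only so that each base has some state.

```cpp
#include <cassert>
#include <cstddef>

struct A { virtual void f() {} virtual ~A() {} long a = 0; };
struct B { virtual void g() {} virtual ~B() {} long b = 0; };
struct C : A, B {};   // A can share its vptr with C; B keeps its own vptr

// Byte distance from the start of the complete object to a base subobject.
template <class Base, class Derived>
std::ptrdiff_t base_offset(Derived& d) {
    Base* base = &d;  // implicit derived-to-base conversion; may adjust the pointer
    return reinterpret_cast<char*>(base) - reinterpret_cast<char*>(&d);
}
```

For a `C` object, `base_offset<A>` is expected to be zero (no adjustment), while `base_offset<B>` is non-zero: that non-zero value is the adjustment a thunk or a vtable offset has to undo when `B`'s virtual function is overridden further down.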
In particular, when calling a virtual function, one does not know in advance in which class the function was actually defined. Depending on the actual class of the object pointed to, pointer adjustment may be needed or not, and the pointer adjustment value may vary from class to class. The existing solution is to have the vtable point not to the function itself, but to a "thunk" which does pointer adjustment when needed, and then jumps to the actual function. Another possibility is to have an offset in the vtable, which is used by the called function. However, more often than not, this implies adding zero.
Virtual bases make things slightly more complicated. In that case, the data layout is such that there is only one instance of the virtual base in the whole object. Therefore, the offset from a this pointer to the same virtual base may change along the inheritance tree. This is solved by placing an offset in the virtual table, which is used to adjust the this pointer to the virtual base.
My proposal is to replace thunks with offsets, with two additional tricks:
The thunks are believed to cost more on IA64 than they would on other platforms. The reason is that they are small islands of code spread throughout the code, where you cannot guarantee any cache locality. Since they immediately follow an indirect branch, chances are we will always encounter both a branch misprediction and an I-cache miss in a row.
On the other hand, a virtual function call starts by reading the virtual function address. Reading the offset immediately thereafter should almost never cause a D-cache miss (cache locality should be good). More often than not, no adjustment is needed, or the adjustment will be done at call site correctly. In the worst case scenario, we perform two adjustments, one static at call site, and one dynamic in the callee, but this case should be really infrequent.
The new calling convention requires that the 'this' pointer on entry points to the class for which the virtual function is just defined. That is, for A::f(), the pointer is an A* when the main entry of the function is reached. If the actual pointer is not an A*, then an adjusting entry point is used, which immediately precedes the function.
In the following, we will assume the following examples:
struct A { virtual void f(); };
struct B { virtual void g(); };
struct C: A, B { };
struct D : C { virtual void f(); virtual void g(); };
struct E: Other, C { virtual void f(); virtual void g(); };  // Other: another base class, not shown here
struct F: D, E { virtual void f(); };
void call_Cf(C *c) { c->f(); }
void call_Cg(C *c) { c->g(); }
void call_Df(D* d) { d->f(); }
void call_Dg(D* d) { d->g(); }
void call_Ef(E* e) { e->f(); }
void call_Eg(E* e) { e->g(); }
void call_Ff(F *ff) { ff->f(); }
void call_Fg(F *ff) { ff->g(); } // Invalid: ambiguous
convert_to_D and convert_to_E are likely to be at the same offset in the vtable. This is not a problem, even if D and E are used in the same class, such as F, because this is the same offset in different vtables.
The fact that an offset is reserved does not mean that it is actually used. A vtable needs to contain the offset only if it refers to a function that will use it. An offset of 0 is not needed, since the function pointer will point to the non-adjusting entry point in that case.
In other words, adjustment is made only when necessary, and at a place where it is better scheduled than with thunks. The only bad case is double adjustment for call_Cg called with an E*. This case can probably be considered rare enough, compared to calls such as call_Cg called with a C*, where we now actually do the adjustment at the call-site.
Currently, the sequence for a virtual function call in a shared library will look as follows. I'm assuming +DD64; there would be some additional addp4 in +DD32. The trail below is the dynamic execution sequence. The affected code is between #if/#endif.
// Compute the address of the vptr in the object,
// from the this pointer
// Optional, since vptroffset is often 0.
// This also adjusts to the class of the final overrider
addi Rthis=vptroffset_of_final_overrider,Rthis
;;
// Load the vptr in a register
ld8 Rvptr=[Rthis]
;;
// Add the offset to get to the function descriptor pointer
// in the vtable. Never zero, this instruction is always generated
addi Rfndescr=fndescroffset,Rvptr
;;
// (Assuming inlined stub) Load the function address and new GP
ld8 Rfnaddr=[Rfndescr],8
;;
// Load the new GP
ld8 GP=[Rfndescr]
mov BRn=Rfnaddr
;;
// Perform the actual branch to the target
// ...
// ... Branch misprediction almost always, followed by
// ... I-Cache miss almost always if jumping to a thunk
br.call B0=BRn
#if OLD_ADJUST
thunk_A::f_from_a_B:
// If 'adjustment_from_B_to_A' is the 'adjustment_to_A' above,
// then in the new case, the vtable directly points to A::f
addi Rthis,adjustment_from_B_to_A
// In most cases, we can probably generate a PC-relative branch here
// It is unclear whether we would correctly predict that branch
// (since it is assumed that we arrive here immediately following
// a misprediction at call site)
br A::f
#endif // OLD_ADJUST
// This occurs less often than OLD_ADJUST
// (it does not happen when call-site adjustment is correct)
#if NEW_ADJUST
adjusting_entry_A::f:
// Can't be executed in less than 3 cycles?
addi Rvptr=class_adjustment_offset,Rvptr
;;
// This loads data which is close to the fn descriptor,
// so it's likely to be in the D-cache
ld8 Rvptr=[Rvptr]
;;
add Rthis=Rthis,Rvptr
#endif
A::f:
alloc ...
[990812 All] Discussion of B-6 raises questions of impact on the above approach. Christophe will look at the issues.
[990826 Cygnus -- Jason] [An alternative suggestion from Jason via email.]
Rather than per-function offsets, we have per-target type offsets. These offsets (if any) are stored at a negative index from the vptr. When a derived class D overrides a virtual function F from a base class B, if no previously allocated offset slot can be reused, we add one to the beginning of the vtable(s) of the closest base(s) which are non-virtually derived from B. In the case of non-virtual inheritance, that would be D's vtable; in simple virtual inheritance, it would be B's. The vtables are written out in one large block, laid out like an object of the class, so if B is a non-virtual base of D, we can find the D vtable from the B vptr.
D::f then receives a B*, loads the offset from the vtable, and makes the adjustment to get a D*. The plan is to also have a non-adjusting vtable entry in D's vtable, so we don't have to do two adjustments to call D::f with a D*; the implementation of this is up to the compiler. I expect that for g++, we will do the adjustment in a thunk which just falls into the main function.
The performance problems with classic thunks occur when the thunk is not close enough to the function it jumps to for a pc-relative branch. This cannot be avoided in certain cases of virtual inheritance, where a derived class must whip up a thunk for a new adjustment to a method it doesn't override.
In this case, we will only ever have one thunk per function, so we don't even have to jump. Except in the case of covariant returns, that is, where we will have one per return adjustment. But we know all necessary adjustments at the point of definition of the function, so they can all be within pc-relative branch range.
[Extensive discussion followed by email -- this suggestion is not completely correct, but may be the basis of a workable solution.]
[990831 Cygnus -- Ian] A couple of observations ...
On the state of the art:
The Microsoft approach is worth mentioning. (I haven't seen it discussed -- though perhaps that is because of the patent situation.)
It allows zero-adjusting (i.e. non-thunking) calls for (almost) every virtual function call in a non-virtual, multiple inheritance hierarchy.
For those that are unfamiliar, the idea is that all calls go via the base class vft and overriding functions expect a pointer to the base class type. (That is, if D::f overrides B::f, it expects the first parameter to be of type B*, not D*.) The callee does the necessary static adjustment to get to the derived class 'this' pointer as needed.
It avoids requiring a thunk, and it's often the case that the cost is zero in the callee because the this-adjustment can be folded into other offset computations.
On the balance, it could well win over all the other approaches being discussed here. [Though, it may lose in some specific cases vs. Christophe's approach where one would create additional extra entries in the derived class vft.]
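The callee-side adjustment can be imitated with ordinary functions. This sketch is not the Microsoft implementation; `Slot`, `D_f` and `call_through_base` are hypothetical names, and the classes are kept non-polymorphic so that the hand-built slot stands in for a vft entry.

```cpp
#include <cassert>

struct A { int a = 1; };
struct B { int b = 2; };
struct D : A, B { int d = 3; };   // B is not at offset 0 inside D

// Hand-built stand-in for a vft slot: the overrider takes the
// introducing base type B*, not D*.
using Slot = int (*)(B*);

// Plays the role of D::f overriding B::f. The callee applies the static
// B* -> D* adjustment itself, so no thunk sits between caller and callee.
int D_f(B* self) {
    D* d = static_cast<D*>(self);  // constant offset, known when D_f is compiled
    return d->a + d->b + d->d;
}

// The caller only needs a B* and the slot; it performs no adjustment.
int call_through_base(B* b, Slot slot) { return slot(b); }
```

Because the adjustment is a compile-time constant inside the callee, a compiler can often fold it into the member offsets used in the function body, which is the "cost is zero" case mentioned above.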
On when to make extra virtual function table entries for functions:
One of Christophe's suggestions is sort-of separate from the rest of the discussion: making extra entries in the derived class' vft for some overridden virtual functions. It has the benefit of giving you faster calls if you happen to be in (or near) the derived class -- at the expense of space in the vft.
Of course, you can always make the call through the introducing base class, so these extra entries are a pure space/time performance trade off (w/ some unpredictable D-cache effects) and the cost/benefit analysis will depend a little on what the rest of the strategy looks like.
The same idea is potentially applicable, no matter what strategy you actually use for vft layout, and different criteria for deciding what extra entries to make are possible. For example, creating an extra entry when overriding a function introduced in a virtual base has the added benefit of avoiding a cast to a virtual base at the call site.
[990909 All] We are getting closer -- understanding of the alternatives is improving, and Christophe may agree with the Jason/Brian proposal after more thought. To make sure we really understand what we're agreeing to, Jason and Christophe will write up more precise proposal(s).
[991111 jason]
We have decided that for virtual functions not inherited from a virtual base, regular thunks will work fine, since we can emit them immediately before the function to avoid the indirect branch penalty; we will use offsets in the vtable for functions that come from a virtual base, because it is impossible to predict what the offset between the current class and its virtual base will be in classes derived from the current class.
The calling convention is as follows:
For each virtual function defined in a class, we add an entry to the primary vtable if one is not already there. In particular, a definition which overrides a function inherited from a secondary base gets a new slot in the primary vtable. We do this to avoid useless adjustments when calling a virtual function through a pointer to the most derived class.
When a class is used as a virtual base, we add a vcall offset slot to the beginning of its vtable for each of the virtual functions it provides, whether in its primary or secondary vtables. Derived classes which override these functions will use the slots to determine the adjustment necessary.
As in Christophe's proposal above, the caller adjusts the 'this' argument to point to the class which last overrode the function being called. The result provides both the 'this' argument and the vtable pointer for finding the function we want.
Each virtual function 'f' defined in a class 'A' has one entry point which takes an A*, and performs no adjustment. The primary vtable for A points to this entry point.
For each secondary vtable from a non-virtual base class 'B' which defines f, an additional entry point is generated which performs the constant adjustment from B* to A*.
For each secondary vtable from a virtual base class 'C' which defines f, an additional entry point is generated which performs the adjustment from C* to A* using the vcall offset for f stored in the secondary vtable for C.
For each secondary vtable from a base 'D' which is a non-virtual base of a virtual base 'E', an additional entry point is generated which first performs the constant adjustment from D* to E*, then the adjustment from E* to A* using the vcall offset for f stored in the secondary vtable for E.
Note that the ABI only specifies the multiple entry points; how those entry points are provided is unspecified. An existing compiler which uses thunks could be converted to use this ABI by only adding support for the vcall offsets. A more efficient implementation would be to emit all of the thunks immediately before the non-adjusting entry point to the function. Another might use predication rather than branches to reach the main function. Another might emit a new copy of the function for each entry point; this is a quality of implementation issue.
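The virtual-base entry point can be imitated by hand to show why the adjustment has to be loaded rather than hard-coded. This is a sketch under stated assumptions: `vcall_offset`, `A_f_main` and `A_f_from_V` are hypothetical names, the offset is passed as a parameter where generated code would load it from V's secondary vtable, and the raw pointer arithmetic only mimics what such code does.

```cpp
#include <cassert>
#include <cstddef>

struct V { virtual ~V() {} long v = 1; };
struct A : virtual V { long a = 2; };   // the class that defines the function
struct Z : A { long z = 3; };           // further derived; V may sit elsewhere

// Stand-in for the vcall offset slot: the byte distance from the V
// subobject to the A subobject within one particular complete object.
template <class Complete>
std::ptrdiff_t vcall_offset(Complete& obj) {
    A* a = &obj;
    V* v = &obj;
    return reinterpret_cast<char*>(a) - reinterpret_cast<char*>(v);
}

// Non-adjusting entry point: takes an A*.
long A_f_main(A* self) { return self->a; }

// Adjusting entry point reached through V's vtable: applies the offset
// for the current complete object, then continues into the main entry.
long A_f_from_V(V* self, std::ptrdiff_t offset) {
    A* a = reinterpret_cast<A*>(reinterpret_cast<char*>(self) + offset);
    return A_f_main(a);
}
```

The same `A_f_from_V` works for a complete `A` and for a complete `Z`, provided each is given the offset belonging to its own layout; that per-layout value is what the vcall offset slot supplies.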
[991202 all] Adopt Jason's writeup.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
B-2 | Covariant return types | call | closed | SGI | 990520 | 990722 |
Summary: There are several methods for adjusting the 'this' pointer of the returned value for member functions with covariant return types. We need to decide how this is done. Return thunks might be especially costly on IA64, so a solution based on returning multiple pointers may prove more interesting. | ||||||
Resolution: Provide a separate Vtable entry for each return type. |
[990610 Matt] One possibility is to have two Vtable entries, which might point to different functions, different entrypoints, or a real entrypoint and a thunk. Another is to return two result pointers (base/derived), and have the caller select the right one.
[990715 All] Daveed presented his multiple-return-value scheme, including an example that involved virtual base classes, return values that are pointers to nonpolymorphic classes, and other equally horrible things.
Consensus: we need to get the horrible cases correct, but speed only matters in the simple case. The simple case: class B has a virtual function f returning a B1* and class D has a virtual function f returning a D1*, where all four classes are polymorphic, B is a primary base of D, and B1 is a primary base of D1. (The really important case is where B1 is B and D1 is D, but that simplification doesn't make any difference.)
Jason: Would the usual multiple-entry-point scheme work just as well? That is, would it be just as fast as Daveed's scheme in the simple case, and still preserve enough information for the more complicated cases? It appears so, but we don't have a proof. Jason will try to provide one.
[990716 Cygnus -- Jason] Proof? You always know what types a given override must be able to return, and you know how to convert from the return type to those base types. You know from the entry point which type is desired. Seems pretty straightforward to me.
[990716 Cygnus -- Jason] The alternative I was talking about yesterday goes something like this:
When we have a non-trivial covariant return situation, we create a new entry in the vtable for the new return type. The caller chooses which vtable entry to use based on the type they want.
This could be implemented several ways, at the discretion of the vendor:
The advantage of this approach to the complex case is that we don't have to do a dynamic_cast when faced with multiple levels of virtual derivation. It is also strictly simpler; Daveed's model already requires something like this in cases of multiple inheritance.
Of course, we can always mix and match; we could choose to only do this in cases of virtual inheritance, or use Daveed's proposal and do this only in cases of repeated virtual inheritance. In that case, the multiple returns would just be an optimization for the single virtual inheritance case.
Since we don't seem to care about the performance of anything but single nonvirtual inheritance, it seems simpler not to bother with multiple returns.
The remaining question is how to handle the case of nontrivial nonvirtual inheritance: do we use multiple slots or have the caller do the adjustment? My inclination is to have the caller adjust.
WRT patents, the idea of having the function return the base-most class and having the caller adjust is parallel to the patented Microsoft scheme whereby they pass the base-most class as the 'this' argument to virtual functions, but the word 'return' does not appear anywhere in the patent, so it seems safe.
[990722 All] The group was generally agreed that the simplicity of multiple entries in the vtable outweighed any space/performance advantage of more complex schemes (e.g. the method Daveed described on 15 July). Discussion focussed on whether it is worthwhile to eliminate some of the entries in cases where they are unnecessary because the caller knows the required conversion, namely when the return type has a unique non-virtual subobject of the original return type.
Agreement was reached to avoid the complication of eliminating some of the Vtable entries. Thus, the Vtable will have one entry for each accessible return type of a covariant virtual function. These may be implemented in a variety of ways, e.g. duplicated functions, separate entrypoints, or stubs, and the ABI need not specify the choice. The location of the Vtable entries is part of the separate Vtable layout issue B-6.
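A small example shows what the separate entries have to deliver. The class names follow the B/B1/D/D1 naming used in the discussion above; `Pad`, `call_as_base` and `call_as_derived` are hypothetical additions, with `Pad` there only to make the D1-to-B1 conversion a real pointer adjustment.

```cpp
#include <cassert>

struct B1 { virtual ~B1() {} int tag = 1; };
struct Pad { virtual ~Pad() {} int pad = 0; };
struct D1 : Pad, B1 {};   // B1 is not the primary base, so D1* -> B1* adjusts

struct B {
    virtual ~B() {}
    virtual B1* f() { static B1 r; return &r; }
};
struct D : B {
    D1* f() override { static D1 r; return &r; }  // covariant return type
};

// A caller that only knows B needs the B1* flavour of the result.
B1* call_as_base(B& b) { return b.f(); }

// A caller that knows D gets the D1* flavour of the same override.
D1* call_as_derived(D& d) { return d.f(); }
```

Both calls run `D::f` and refer to the same object, but the two pointer values differ by the offset of `B1` within `D1`; how the adjusted result is produced (duplicate function, extra entry point, or stub) is left to the implementation, as noted above.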
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
B-3 | Allowed caching of vtable contents | call | closed | HP | 990603 | 990805 |
Summary: The contents of the vtable can sometimes be modified, but the consensus is that it is nonetheless always allowed to "cache" elements, i.e. to retain them in registers and reuse them, whenever it is really useful. However, this may sometimes break "beyond the standard" code, such as code loading a shared library that replaces a virtual function. Can we all agree when caching is allowed? | ||||||
Resolution: Caching is allowed. |
[990604 HP -- Christophe] Mike (Ball) gave me what I believe is an excellent definition of when caching is allowed. I'd like him to present it.
[990805 All] Christophe explained that the rule is simply that, within a call to a member function of the class, the class Vtable may not be modified. Between such calls, no assumption may be made. With this observation, the issue is closed.
[990812 All] The rule is even simpler. Once a program changes the type of a pointer's target, the pointer is invalidated, and its value may not be reused. Therefore, a code sequence which repeatedly refers to the same pointer value is invalid if the pointee's vtable has been changed.
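One way a program can legitimately change the type of a pointer's target is to destroy the object and construct another in the same storage. The sketch below uses hypothetical names (`Shape`, `Triangle`, `Square`, `retype_as_square`) and shows the pattern the rule permits: continue with the pointer produced by the new-expression and do not reuse the old one.

```cpp
#include <cassert>
#include <new>

struct Shape { virtual int sides() const { return 0; } virtual ~Shape() {} };
struct Triangle : Shape { int sides() const override { return 3; } };
struct Square : Shape { int sides() const override { return 4; } };

// Replaces the object living in `storage` with a Square and returns the
// pointer produced by the new-expression. The old pointer must not be
// reused afterwards: vtable contents loaded through it may have been cached.
Shape* retype_as_square(Shape* old_object, void* storage) {
    old_object->~Shape();
    return new (storage) Square;
}
```

A caller would construct a `Triangle` in suitably sized and aligned storage, call `retype_as_square`, and make all later virtual calls through the returned pointer.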
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
B-4 | Function descriptors in vtable | data | closed | HP | 990603 | 990805 |
Summary: For a runtime architecture where the caller is expected to load the GP of the callee (if it is in, or may be in, a different DSO), e.g. HP/UX, what should vtable entries contain? One possibility is to put a function address/GP pair in the vtable. Another is to include only the address of a thunk which loads the GP before doing the actual call. | ||||||
Resolution: The Vtable will contain a function address/GP pair. |
[990624 All] Note that putting GP in the Vtable prevents putting it in shared memory. See B-7.
[990805 All] It was decided that special representations to accommodate shared memory would be expensive and therefore undesirable. Therefore, the decision is to put the function address/GP pair in the vtable, avoiding the cost of an extra indirection in using it.
[991007 IBM -- Brian] A while ago Jason was worried about COM compatibility. Part of that is to ensure that vtables can be expressed in C. But the resolution of issue B-4 says that a vtable contains function descriptors rather than function descriptor pointers.
From the standpoint of call performance that is a good thing, but the result can't be built in C. I know that we at least will also have to rewrite parts of our C++ runtime that hand-build vtables. Neither of these are critical for IBM but may be for others.
[991103 Cygnus -- Richard Henderson]
> The ia64 C++ ABI committee has decided to use the descriptors.
> If this doesn't make sense (i.e. if there's no way to express
> such a thing to the assembler), now's the time to let us know...:)
You mean you want the vtable to look like
struct { void *code, *gp } vtable[];
There are no suitable IA-64 relocations to express this.
[991106 SGI -- Jim] Richard Henderson of Cygnus points out that the IA-64 relocations don't support doing this (inserting a function descriptor in data). However, the R_IA_64_IPLT*SB relocations do perform the correct action. The problem is that they are currently specified to be valid only in executables and shared objects. I believe that the problem can be solved by simply removing this restriction. The static linker support required shouldn't be major -- it would presumably just pass the relocations through to the linked object and let the dynamic linker deal with them.
The above issue has been raised with the IA-64 base ABI group.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
B-5 | Where are vtables emitted? | data | closed | HP | 990603 | 991118 |
Summary: In C++, there are various things with external linkage that can be defined in multiple translation units, while the ODR requires that the program behave as if there were only a single definition. From the user's standpoint, this applies to inlines and templates. From the implementation's perspective, it also applies to things like vtables and RTTI info. (We call this vague linkage.) | ||||||
Resolution: Vtables will be emitted with the key function (first virtual function that is not inline at the point of class definition), if any. If no key function, emit everywhere used (i.e. referred to by name). Place in a comdat group in all cases. |
[990624 Cygnus -- Jason] There are several ways of dealing with vague linkage items:
#3 and #4 are feasible for templates, but I consider them too heavyweight to be used for other things.
The typical heuristic for #2 is "with the first non-inline, non-abstract virtual function in the class". This works pretty well, but fails for classes that have no such virtual function, and for non-member inlines. Worse, the heuristic may produce different results in different translation units, as a method could be defined inline after being declared non-inline in the class body. So we have to handle multiple copies in some cases anyway.
The way to handle this in standard ELF is weak symbols. If all definitions are marked weak, the linker will choose one and the others will just sit there taking up space.
Christophe mentioned the other day that the HP compiler used the typical heuristic above, and handled the case of different results by encoding the key function in the vtable name. But this seems unnecessary when we can just choose one of multiple defns.
A better solution than weak symbols alone would be to set things up so that the linker will discard the extra copies. Various existing implementations of this are:
The GNU ELF toolchain does a variant of #1 here; any sections with names beginning with ".gnu.linkonce." are treated as COMDAT sections. It seems more sensible to me to key off of the section name than the first symbol name as in PE.
The GNU linker recently added support for garbage collection, and I've been thinking about changing our handling of vague linkage to make use of it, but haven't.
I propose that the ia64 base ABI be extended to provide for either COMDAT sections or garbage collection, and that we use that support for vague linkage.
I further propose that we not use heuristics to cut down the number of copies ahead of time; they usually work fine, but can cause problems in some situations, such as when not all of the class's members are in the same symbol space. Does the ia64 ABI provide for controlling which symbols are exported from a shared library?
A side issue: What do we want to do with dynamically-initialized variables? The same thing, or use COMMON? I propose COMMON.
See also G-3, for vague linkage of inlined routines and their static variables.
[990624 SGI summarizing others] HP uses COMDAT for many cases, keying from the symbol names. HP also uses some heuristics. HP observes that IA-64 objects will already be large. From the base ABI discussions, any use of WEAK or COMMON symbols will need to take care not to depend on vendor-specific treatment.
Defining a COMDAT mechanism doesn't preclude using heuristics to avoid some copies up front. A COMDAT mechanism should also specify how to get rid of associated sections like debugging info, unless the identical mechanism works.
[990629 HP -- Christophe] First, the "usual" heuristic (which is usual because it dates back to Cfront) is to emit vtables in the translation unit that contains the definition of the first non-inline, non-pure virtual function. That is, for:
struct X {
void a();
virtual void f() { return; }
virtual void g() = 0;
virtual void h();
virtual void i();
};
the vtable is emitted only in the TU that contains the definition of h().
This breaks and becomes non-portable if:
inline void X::h() { f(); }
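The heuristic Christophe describes can be sketched as a small model. This is only an illustration: the MemberFn record and the function names are hypothetical, and the selection rule is the Cfront-style one quoted above (first virtual function that is neither inline nor pure), not necessarily the final ABI wording.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one member function as seen at the end of the
// class definition. These names are illustrative, not part of any ABI.
struct MemberFn {
    std::string name;
    bool is_virtual;
    bool is_pure;
    bool is_inline;  // defined in the class body or declared inline there
};

// Returns the name of the first virtual function that is neither inline nor
// pure, or an empty string if the class has no such function.
std::string vtable_anchor(const std::vector<MemberFn>& fns) {
    for (const MemberFn& f : fns) {
        if (f.is_virtual && !f.is_pure && !f.is_inline) {
            return f.name;
        }
    }
    return std::string();
}

// The members of struct X from the example, in declaration order.
std::vector<MemberFn> members_of_X() {
    return {
        {"a", false, false, false},  // non-virtual
        {"f", true,  false, true},   // inline: defined in the class body
        {"g", true,  true,  false},  // pure virtual
        {"h", true,  false, false},
        {"i", true,  false, false},
    };
}
```

For struct X this selects h. The model only sees the class body, so it cannot notice a later "inline void X::h()" definition in some translation unit, which is exactly the case in which the heuristic gives inconsistent answers.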
Now, the COMDAT issue is as follows: a COMDAT section is, in some cases, slightly more difficult to handle (at least, that's the impression Jason gave me). For statics with runtime initialization, what you can do is reserve COMMON space ('easier'), then initialize that space at runtime. As I said, the problem is if two compilers disagree on whether this is a runtime or a compile time initialization, such as in:
int f() { return 1; }
int x = f(); // Static (COMDAT) or Dynamic (COMMON) initialization?
So I personally recommend that we put everything in COMDAT.
[990715 All] Consensus so far: use a heuristic for vtable and typeinfo emission, based on the definition of the key function (the first virtual function that is not declared inline in the class definition). The vtable must be emitted where the key function is defined; it may also be emitted in other translation units. If there is no key function, the vtable must be emitted in any translation unit that refers to the vtable in any way.
Implication: the linker must be prepared to discard duplicate vtables. We want to use COMDAT sections for this (and for other entities with vague linkage.)
Open issue: the ELF format allows only 16 bits for section identifiers, and typically two of those bits are already taken up for other things. So we've only got 16k sections available, which is unacceptable if we're creating lots of small sections.
Jason - COMDATs disappear into text and data at link time, so the issue is really only serious if we've got more than 16k vtables (or template instantiations, etc.) in a single translation unit.
Daveed - HP has gotten around this problem by hacking their ELF files to steal another 8 bits from somewhere else.
Jack - a new kind of section table could be a viable solution. However, it would break everything if we did it for ia32. Is a solution that only works on ia64 acceptable? Note also that the ELF section table has its own string table, which we wouldn't be able to share with the new kind of section table. Index and link fields often point into the section table; we would have to figure out how to deal with this. (Jack is not opposed to the idea of an alternate section table; he is just pointing out some of the issues we will have to resolve.)
[990805 All] We need a specific proposed representation for COMDAT. IBM's version is restricted to one symbol per section. Jim will look for Microsoft's PECOFF definition. Anyone else with a usable definition should send it.
[revised 991012 SGI]
[991007] Change default to simply group; COMDAT semantics is an option. Don't support removal based on duplication of non-COMDAT sections. Just remove symbols defined relative to removed sections.
C++ has many situations where the compiler may need to emit code or data, but may not be able to identify a unique compilation unit where it should be emitted. The approach chosen by the C++ ABI group to deal with this problem is to allow the compiler to emit the required information in multiple compilation units, in a form which allows the linker to remove all but one copy. This is essentially the idea called COMDAT in several existing implementations.
Various other implementations (notably Windows NT) and proposals obtain more generality by varying the duplicate removal semantics. The most obviously useful variant supports grouping of sections for removal purposes, but treats duplication as an error, using it to support link-time removal of unreferenced sections. The proposal below treats this simple grouping as the default semantics, and provides duplicate removal as an option.
Our objectives include:
The proposal below is based on the HP definition, with minor modifications and more precise definitions.
This attribute flag may be set in any section header, and no other modification or indication is made in the grouped sections. All additional information is contained in the associated SHT_GROUP section (see below).
Some sections occur in interrelated groups. For instance, an out-of-line definition of an inline function might require, in addition to its .text section, a read-only data section containing literals referenced, one or more debug information sections, and/or other informational sections. Furthermore, there may be internal references among these sections that would not make sense if one of them were removed or replaced by a duplicate from another object. Therefore, we assume that such groups are to be included or omitted from the linked object as a unit. (Except for the GRP_COMDAT flag described below, this definition does not specify the circumstances under which the members of a group might be discarded from the linked object.)
To facilitate this, we define a SHT_GROUP section:
The section header attributes of a Group Section are:
name | unspecified |
sh_type | SHT_GROUP |
sh_link | .symtab section index |
sh_info | symbol index |
sh_flags | none |
sh_entsize | size of section indices (4) |
requirements | may not be stripped |
The section group's sh_link field identifies a symbol table section, and its sh_info field the index of a symbol in that section. The name of that symbol is treated as the identifier of the section group.
The section data of a SHT_GROUP section is a flag word followed by a sequence of section indices. The flag word may contain the following flags:
The section indices in the SHT_GROUP section identify the sections which make up the group.
The sh_size value is sh_entsize times one plus the number of sections in the group.
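As a minimal sketch of the layout just described, the section data can be modelled as a vector of 32-bit words: one flag word followed by the member section indices. The numeric value given to GRP_COMDAT here is an assumption (the flag list is not reproduced in this text), and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed flag value, for illustration only.
constexpr std::uint32_t GRP_COMDAT = 0x1;

// sh_entsize from the attribute table: the size of one section index.
constexpr std::size_t kGroupEntSize = 4;

// Builds the section data: a flag word, then the indices of the member sections.
std::vector<std::uint32_t> make_group_data(
        std::uint32_t flags, const std::vector<std::uint32_t>& members) {
    std::vector<std::uint32_t> data;
    data.reserve(members.size() + 1);
    data.push_back(flags);
    data.insert(data.end(), members.begin(), members.end());
    return data;
}

// sh_size = sh_entsize * (1 + number of sections in the group).
std::size_t group_sh_size(const std::vector<std::uint32_t>& data) {
    return kGroupEntSize * data.size();
}
```

A group with three member sections therefore occupies four words, giving an sh_size of 16.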
The linker may choose to discard a section in a group, i.e. not include its data in the linked object, based on COMDAT duplicate semantics (above), or for other implementation-defined reasons (e.g. removing unreferenced code). If it does so, the group semantics requires that all of the group members be removed as a unit.
(Note, however, that this is not intended to imply that special-case behavior like removing debug information requires removing the sections to which it refers, even if they are in a group. We could clarify this issue by tying the removal semantics to the section which contains the identifying symbol, but this seems overly restrictive and unnecessary.)
The above rules allow a group to be removed without leaving dangling references, with only minimal processing of the symbol table.
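The duplicate-removal behavior can be sketched as follows. This is a simplified model under stated assumptions: the Group record and its integer section ids are hypothetical, the linker is assumed to keep the first copy it encounters, and the error for duplicated non-COMDAT groups is not modelled.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory view of one section group.
struct Group {
    std::string signature;      // name of the identifying symbol
    bool comdat;                // true if GRP_COMDAT is set in the flag word
    std::vector<int> sections;  // member sections, as illustrative ids
};

// Returns the ids of the sections that survive. A COMDAT group whose
// signature has already been seen is dropped as a unit: none of its
// member sections are kept.
std::vector<int> keep_sections(const std::vector<Group>& groups) {
    std::set<std::string> seen;
    std::vector<int> kept;
    for (const Group& g : groups) {
        if (g.comdat && !seen.insert(g.signature).second) {
            continue;  // duplicate COMDAT group: discard every member
        }
        kept.insert(kept.end(), g.sections.begin(), g.sections.end());
    }
    return kept;
}
```

Given two COMDAT groups with the same signature, holding sections {1, 2} and {3, 4}, plus a plain group holding {5}, the surviving sections are 1, 2 and 5.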
[revised 991012 SGI]
[991007] Change section/flag names, move ELF header extension to section header 0.
SGI has long been concerned about the 64K limitation on the number of sections in an object file. Although this need not normally be a problem, there are purposes for which we would like to place distinct functions, and sometimes data items, in distinct sections. When one takes into account associated sections, e.g. relocation, debug information, etc., this leads to a limitation on the order of 16K units, and threatens to be a problem for some large compilation units such as machine-generated simulators.
C++ ABI efforts raise the same issue from another source. Various C++ structures are emitted under circumstances where the compiler cannot reliably identify a single compilation unit in which to emit them. Examples include common cases like class virtual tables, out-of-line copies of inline functions, and template instantiations. The favored solution is COMDAT sections, i.e. putting the potentially duplicated items in their own sections, and allowing the linker to remove the duplicates. Once again, though, this threatens to be a problem for very large compilation units.
The following proposal attempts to remove this limitation. Obviously, even if the problem is real, it will actually arise in very few compilation units. Therefore, the elements of the proposed solution are defined so as to leave unchanged object files which do not encounter the problem. We consider this compatibility objective as primary -- much more important than performance or clean definitions for the problematic object files -- particularly as it should allow vendors to merge the solution into existing tool chains at convenient times without disrupting existing programs.
Proposed ABI wording is in normal font; commentary is in italics. Section numbers are from the Intel IA-64 psABI.
The range of section indices from 0xff00 (SHN_LORESERVE) to 0xffff (SHN_HIRESERVE) is reserved for special purposes, and the gABI already forbids real sections with these indices. Our approach is to deal with situations where section indices cannot be compatibly expanded to a full 32 bits by using one of these indices as an escape value indicating that the actual index will be found elsewhere.
The ELF header has two relevant 16-bit fields: e_shnum contains the section count, and e_shstrndx the index of a string section. We modify their descriptions to include an overflow indicator, and put the actual values in the reserved section header at index 0 if necessary, as follows:
ElfXX_Half e_shnum;
This member holds the number of entries in the section header table. Thus the product of e_shentsize and e_shnum gives the section header table's size in bytes. If a file has no section header table, e_shnum holds the value zero. If the number of sections is greater than SHN_LORESERVE (0xff00), this member has the value SHN_XINDEX (0xffff), and the actual number of section header table entries is in the member sh_size of the section header at index 0.
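ElfXX_Half e_shstrndx;
This member holds the section header table index of the entry associated with the section name string table. If the file has no section name string table, this member holds the value SHN_UNDEF. See ``Sections'' and ``String Table'' below for more information. If the section name string table index is greater than SHN_LORESERVE (0xff00), this member has the value SHN_XINDEX (0xffff), and the actual index of the section name string table is in the member sh_link of the section header at index 0.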
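A consumer-side reading of the two modified header fields might look like the sketch below. The two structs are hypothetical cut-down views holding only the fields discussed here, not real ELF header declarations.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint16_t SHN_XINDEX = 0xffff;

// Hypothetical views of just the fields involved.
struct HeaderFields {
    std::uint16_t e_shnum;
    std::uint16_t e_shstrndx;
};
struct Section0 {
    std::uint64_t sh_size;  // real section count when e_shnum is the escape
    std::uint32_t sh_link;  // real string table index when e_shstrndx is the escape
};

// Real number of section header table entries.
std::uint64_t section_count(const HeaderFields& h, const Section0& s0) {
    return h.e_shnum == SHN_XINDEX ? s0.sh_size : h.e_shnum;
}

// Real index of the section name string table.
std::uint32_t section_name_table(const HeaderFields& h, const Section0& s0) {
    return h.e_shstrndx == SHN_XINDEX ? s0.sh_link : h.e_shstrndx;
}
```

Files that never overflow the 16-bit fields are read exactly as before, which is the compatibility property the proposal treats as primary.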
We define a new special section index as an escape value for large section indices, as referenced above:
SHN_XINDEX (0xffff)
We note here that the section header contains two fields commonly used to hold section indices, sh_link and sh_info, but they are already defined as ElfXX_Word, and require no change.
A new section type is defined:
SHT_SYMTAB_SHNDX (17)
The sh_link field of this section contains the index of the associated SHT_SYMTAB section.
A new special section name is defined:
.symtab_shndx
.symtab section.
The section's attributes will include the SHF_ALLOC bit if the associated .symtab section does; otherwise, that bit will be off.
There is no available field to point from the .symtab section to its associated .symtab_shndx section, so we use the sh_link field in the latter to point back. It is recommended (but not required) that implementations place each .symtab_shndx section immediately after its associated .symtab section (in the section header table) to make it easy for the linker to find.
The symbol table is the most problematic. It has no convenient location for an expanded section index. Therefore, we propose that the escape value imply redirection to a separate, parallel table containing full-size section indices.
Modify the definition of st_shndx as follows:
st_shndx
As the sh_link and sh_info interpretation table and the related text describe, section indexes in the range 0xff00 to 0xffff indicate special meanings. In particular, SHN_XINDEX (0xffff) indicates that the real index is too large to fit in this field, and must be found in the associated SHT_SYMTAB_SHNDX table (above).
If any of the st_shndx fields in a symbol table section contain the value SHN_XINDEX (0xffff), there must be an associated SHT_SYMTAB_SHNDX section, with a sh_link field containing the index of this SHT_SYMTAB section. That section contains an array of 32-bit section indices, matching the symbol table entries 1-1 in the same order. Entries corresponding to SHN_XINDEX (0xffff) values of st_shndx in the symbol table must contain the actual section header index to be used. Others should contain either the correct section header index (i.e. duplicating the value in st_shndx), or zero.
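The lookup through the parallel table can be sketched as below. The function name is hypothetical, and the SHT_SYMTAB_SHNDX contents are modelled as a plain vector indexed by symbol number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint16_t SHN_XINDEX = 0xffff;

// Resolves the section index of symbol number sym_index. Only the escape
// value redirects to the parallel table; every other value, including the
// remaining reserved indices, is returned unchanged.
std::uint32_t symbol_section(std::uint16_t st_shndx,
                             const std::vector<std::uint32_t>& shndx_table,
                             std::size_t sym_index) {
    if (st_shndx != SHN_XINDEX) {
        return st_shndx;
    }
    return shndx_table.at(sym_index);  // throws if the table is too short
}
```

Because non-escaped entries of the parallel table may legitimately be zero, a consumer should consult the table only after seeing SHN_XINDEX in st_shndx, as the sketch does.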
The .dynsym section in a linked object is completely analogous to a .symtab section in a relocatable object, and could be handled in the same way with the addition of a dynamic tag to locate it. We have not specified handling here because we expect the linking process to remove most of the section duplication process which causes the problem, e.g. leaving only a small number of .text sections.
There should be no compatibility impact on existing environments, since only very large section counts require object file changes. Individual vendors can postpone implementation until convenient, with no impact on typical programs.
Note, however, that any ELF consumer applications that are currently storing section indices as 16-bit values must change.
[991014 All] Jim Dehnert will push these proposals to the base ABI committee.
[991118 All] A class vtable will be emitted with the key function (the first virtual function that is not inline at the point of class definition), if any. If there is no key function, it will be emitted in every compilation where used (i.e. referred to by name). It will be placed in a COMDAT group in all cases.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
B-6 | Virtual function table layout | data | closed | SGI | 990520 | 991028 |
Summary: What is the layout of the Vtable? |
Resolution: See the Draft C++ ABI for IA-64, abi.html. |
[990624] Issue split from A-1.
[990630 HP - Christophe]
The current full proposal has been incorporated in the Draft C++ ABI for IA-64.
[990701 All] The above arrived too late for everyone to read it carefully. It was agreed that we would consider it outside the meetings, discuss any issues noted by email, and attempt to close on 22 July. (Christophe is on vacation until that week, and Daveed leaves on vacation the next week.)
[990811 SGI -- Jim] I've put a reworked version of Christophe's writeup in the Draft C++ ABI for IA-64, along with a number of questions it raises.
[990812 All] Extensive discussion of this issue produced the observations that
[990820 IBM -- Brian]
I'm going to write the exam on this to see how well I am understanding the issue.
If I understand it correctly, the proposal under consideration is tied to the decision to replicate virtual function entries in vtables. It requires replicating in the vtable for base class B all virtual functions that are overridden in B; more replication than this implies will be wasted since a function is always called through a vtable of an introducing or overriding class.
When a non-pure virtual function X::f() is compiled it is possible to determine whether it requires a secondary entry point. It will require one if that function may be virtually called (i.e., is the final overrider) in any class in which f() appears in more than one vtable; this needs to be decidable knowing only X. A rule that works is: X::f() overrides one or more f()'s from base classes of X, and either one or more of those base classes are virtual or X fails to share its vptr with all instances of them.
[Though a virtual base may happen to share its vptr with X in an object of complete type X, that relationship may fail to hold in further derived classes, so we need to generate the secondary entry point just in case.] ["Sharing a vptr" is the condition under which no adjustment is necessary; if the bases involved are all nonvirtual then subsequent class derivation won't change this.]
Each vtable that requires a nonzero adjustment will have a "convert to X" offset mixed in with its virtual base offsets. It is necessary that a "convert to X" appears in the same position in each vtable that references X::f()'s secondary entry; it is desirable that the "convert to X" also be unique in each vtable.
Assume that X has nonvirtual nonprimary bases Nx (x=1,2,...), and virtual bases Vx, all of which have a virtual f(). Then vtables for Nx in X, or in any class derived from X that does not further override f(), will reference X::f()'s secondary entry. Vtables for Vx in X or any derived class where Vx does not share a vptr with X, will also reference X::f()'s secondary entry; note this will occur in a construction vtable even if the derived class does further override f().
The question, then, is whether a position for the "convert to X" offset can be chosen, knowing only X and its parentage, that can be used consistently in all those vtables and that won't collide with a "convert to Y" position chosen on account of some other hierarchy where Y::g() overrides an Nx::g() or Vx::g().
If Y derives from X, we will be able to select a "convert to Y" position that doesn't conflict, so we can restrict our attention to cases where X and Y are unrelated. Also, if the base involved is nonvirtual (Nx) then we are safe, because no instance of Nx will be a subobject of both X and Y, so no Nx vtable will require both "convert to X" and "convert to Y" offsets.
The remaining case is where X and Y are unrelated but both have a virtual base Vx:
struct V1 { virtual void f(); virtual void g(); };
struct Other1 { virtual void ignore1(); };
struct X : Other1, virtual V1 { virtual void f(); };
struct Y : Other1, virtual V1 { virtual void g(); };
struct ZZ: X, Y { };
The vtable for V1 in ZZ does require both offsets. The only way I see to accomplish this is to preallocate an adjustment slot for each virtual function in V1. That is, X::f() uses the first slot position, and Y::g() the second, based on the order that f() and g() are declared in V1. This only needs to be done in hierarchies where V1 is virtual, but the same offset has to be used for any Nx tables in X too.
Is this close?
I don't understand the comment that varying numbers of virtual base offsets make it impossible to concatenate vtables and refer to them via a single symbol. The only code that refers by name to X's vtable and the vtables of N1 in X etc. is X's constructor and destructor, and maybe some derived classes that find they are able to reuse some pieces. All that code is aware of X's declaration and can map out its tables. What am I missing?
[990826 All] There is still considerable confusion about what will work. Key questions are (1) whether member functions can share offsets to base classes, or each need their own; and (2) when we need a no-this-adjustment override entry.
[990901 SGI -- Jim] Being confused myself by all the discussion, I've constructed a new page containing (initially) an example of a class hierarchy supplied by Christophe, and attempted to identify possible function calls, the class data layout, and the class vtable layout based on Christophe's original proposal. Please provide corrections, and if you're proposing alternative vtable constructions, describing them for this example might help (me, at least). Also feel free to provide additional examples illustrating other points.
[990930 Cygnus -- Jason] Jason has updated the Vtable layout description in abi.html to reflect the approach from Cygnus and IBM.
[991014 all]
ACTION ITEMS: Jason---update writeup to reflect these three changes. Our decision on issue B-8 will require a one-sentence change. All of us: study the revised version. We are almost ready to close this issue, and if we agree with the revised version we can close it at the 21 October meeting.
[991028 all] It was agreed to accept the version currently in the Draft C++ ABI for IA-64, abi.html.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
B-7 | Objects and Vtables in shared memory | data | closed | HP | 990624 | 990805 |
Summary: Is it possible to allocate objects in shared memory? For polymorphic objects, this implies that the Vtable must also be in shared memory. |
Resolution: No special representation is useful in support of shared memory. |
[990624 All] Note that putting GP in the Vtable prevents putting it in shared memory. This interacts with B-4.
[990624 HP -- Cary] For a C++ object to be placed into shared memory, its vtable pointer must be valid in all processes that are sharing that object.
One way or another, we need a way of ensuring that a pointer from shared memory to private memory is valid in all processes, which means that we will need a means to ensure that certain shared library data segments can get mapped at the same address in all processes that load those certain libraries.
My wild idea a few years ago was to put the vtables in shared memory (by allocating and building them at load time, as Taligent did), and store a shared library index in place of the gp value in each function descriptor. Each process would have its own table of gp values, indexed by this shared library index, but the index space would be managed system-wide. The C++ runtime library would have been responsible for allocating a new index for each unique C++ shared library loaded on the system, then storing the process-local copy of the gp pointer in the appropriate slot of the table.
[990628 SGI -- Jim] Note a further problem with vtables in shared memory (Cary's point 2). If a virtual function comes from another DSO, it may be pre-empted differently in different programs. Hence, the function pointer itself is a problem even if the GP isn't.
[990701 All] An extensive discussion boiled down to a few points:
These ideas are very fuzzy. Participants should think about the need and possibilities and attempt to identify more concrete approaches.
[990805 All] It was determined (largely based on consideration by Jason) that the only practical approach to putting objects in shared memory is to force the objects, Vtables, functions, etc. to the same addresses in the various processes involved. If this is done, data representation issues are irrelevant. Therefore, this issue is closed as moot.
Note that the base psABI defines a flag, EF_IA_64_ABSOLUTE, which forces an executable object to the addresses specified in ELF, so at least one method of representing this is already available.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
B-8 | dynamic_cast | data | closed | SGI | 990628 | 991014 |
Summary: What information do we put in the vtable to enable (a) dynamic_cast from pointer-to-base to pointer-to-derived (including detection of ambiguous base classes) and (b) dynamic_cast to void*? |
Resolution: The vtable will contain an offset to the beginning of the complete object, and a pointer to the typeinfo object. |
[990701 All] This should be part of the proposal Daveed will put together by the 15th (action #13); the group will discuss it on the 22nd.
[990812 Sun -- Michael] Sun has provided a description, in a separate page, describing their implementation. They are filing for a patent on the algorithms described.
[991014 All] This is closely related to issues A-6 and B-6. It is agreed that what we need is an offset to the beginning of the complete object, and a pointer or offset to the typeinfo object. We choose to have an offset to the typeinfo object instead of a pointer, which effectively means that the typeinfo object is part of the vtable. We will put it at the very beginning, at a negative offset from the vptr.
[991027 SGI -- Matt] At the October 14 meeting we decided to include RTTI information as part of the vtable block, and to include an offset to RTTI information in the vtable rather than a pointer to RTTI information. (We decided on this change so that we would have fewer symbols to resolve at link time.)
Jim came up with a serious objection at the October 21 meeting: during construction we need different RTTI information at different points. A few of us talked about this at Kona, and my impression is that Jim's objection is fatal. We could imagine having base class typeinfo objects in every vtable block, but (1) this would kill any performance advantage we'd get by using an offset rather than a pointer; and (2) we'd lose the ability to use simple pointer identity as a way of telling whether two typeinfos represent the same type.
I propose that we abandon that decision, and go back to using pointers. Does everyone agree?
[991028 All] Agreed.
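The two pieces of vtable information in the resolution correspond to operations visible at the source level, which the following sketch exercises using only standard C++. The class names are hypothetical. The comments describe what a typical implementation of this resolution would consult; the standard itself only specifies the observable results.

```cpp
#include <cassert>
#include <typeinfo>

struct Left  { virtual ~Left() {}  int l = 0; };
struct Right { virtual ~Right() {} int r = 0; };
struct Whole : Left, Right { int w = 0; };

// Returns true if all three run-time type queries give the results
// required by the C++ standard.
bool finds_complete_object() {
    Whole whole;
    Right* r = &whole;  // typically a non-zero offset into `whole`

    // (b) dynamic_cast to void*: yields the start of the complete object,
    // which is what the offset stored in the vtable is for.
    void* start = dynamic_cast<void*>(r);

    // The typeinfo pointer in the vtable identifies the dynamic type.
    bool same_type = (typeid(*r) == typeid(Whole));

    // (a) a cast that must inspect the complete object: Right* to Left*.
    Left* sibling = dynamic_cast<Left*>(r);

    return start == static_cast<void*>(&whole)
        && same_type
        && sibling == static_cast<Left*>(&whole);
}
```

Pointer identity of typeinfo objects, which Matt's note lists as a property worth keeping, is an implementation technique and is not something this portable sketch can observe directly.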
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
B-9 | Primary base vtable embedding | data | closed | Cygnus | 000217 | 000302 |
Summary: Resolve the embedding of the vtable for the primary base class in the derived class vtable. |
Resolution: Any class with virtual bases shall contain vbase pointers for all of its virtual bases. |
[000217 All] Jason noticed an issue today involving the layout of primary vtables.
Our chosen layout starts with the primary base class vtable layout (if any), and adds additional vbase/vcall offsets to the beginning, and additional vfunc pointers at the end. It is then followed by the secondary vtables, in inheritance graph order.
We have assumed, for instance in our decision not to propagate vbase offsets from non-virtual bases, that the secondary vtables were directly accessible at compile-time offsets from the primary vptr. However, this is not currently the case if we are dealing with a class that is the primary base of a derived class. The derived class's additional vfunc pointers will be added between the base class vtable and its secondary vtables for the base's base classes. Therefore, non-overridden base class member functions, at least, can't make assumptions about secondary vtable offsets.
One can, of course, get to the secondary vtable via the secondary vptr in the object, but that costs an additional load.
There is a "solution" that should work, but is a touch ugly. That would be to place the additional vfunc fields for the derived class not immediately after the primary base vtable, but after all of its non-virtual secondary vtables. If we don't think this is worthwhile, we should reconsider the decision about promoting vbase offsets.
[000302 All] It was decided that the simplest solution is to include vbase pointers for all virtual bases, even those with vbase pointers in direct base vtables. They may then be referenced via either the primary or the secondary vtable.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
B-10 | Pure virtual runtime | call | closed | CodeSourcery | 000629 | 000706 |
Summary: Define a runtime proxy routine for pure virtual functions. |
Resolution: Define such a runtime routine, with implementation-defined behavior. |
[000629 CodeSourcery -- Mark]
We need to have a standard entry point to put in vtables to indicate a pure virtual function. (Some compilers use __pure_virtual, for example.) I think we want:
extern "C" void __cxa_pure_virtual ();
[000706 All] Accepted. We will not mandate behavior, since this will be called only in case of Standard-specified undefined behavior, but will comment that program termination is expected, possibly after an error message.
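To illustrate where such an entry point would sit, the sketch below models vtables as plain arrays of function pointers. Everything here is hypothetical: pure_virtual_stub merely stands in for the runtime routine and counts calls so that the behavior can be observed, whereas the real routine is expected to terminate the program.

```cpp
#include <cassert>
#include <cstddef>

// A vtable is modelled as an array of function pointers.
using Slot = void (*)();

int g_pure_calls = 0;

// Hypothetical stand-in for the runtime routine. It only counts calls here;
// the real routine would be expected to terminate the program.
void pure_virtual_stub() { ++g_pure_calls; }

void circle_draw() {}

// Abstract class: its only virtual function is pure, so the slot holds the stub.
const Slot shape_vtable[] = { &pure_virtual_stub };
// Concrete class: the same slot holds the overrider.
const Slot circle_vtable[] = { &circle_draw };

// Models a virtual call through slot `index` of `vtable`.
void call_slot(const Slot* vtable, std::size_t index) { vtable[index](); }
```

Calling through shape_vtable reaches the stub, while calling through circle_vtable reaches the overrider and leaves the counter unchanged.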
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-1 | Interaction with .init/.fini | lif ps | closed | SGI | 990520 | 991202 |
Summary: Static objects with dynamic constructors must be constructed at initialization time. This is done via the executable object initialization functions that are identified (in ELF) by the DT_INIT and DT_INIT_ARRAY dynamic tags. How should the compiler identify the constructors to be called in this way? One traditional mechanism is to put calls in a .init section. Another, used by HP, is to put function addresses in a .init_array section.
The dual question arises for static object destructors. Again, the extant mechanisms include putting calls in a .fini section, or putting function addresses in a .fini_array section. Finally, which mechanism (DT_INIT or DT_INIT_ARRAY, or the FINI versions) should be used in linked objects? The gABI, and the IA-64 psABI, will support both, with DT_INIT being executed before the DT_INIT_ARRAY elements. |
Resolution: Use .init_array and .fini_array sections. |
[991202 All] It was decided to use the array forms for all required initialization or finalization entries, i.e. to put initialization entries into .init_array sections with ELF section type SHT_INIT_ARRAY, and finalization entries into .fini_array sections with ELF section type SHT_FINI_ARRAY. The static linker will combine them, and identify them to the dynamic linker using DT_INIT_ARRAY, DT_INIT_ARRAYSZ, DT_FINI_ARRAY, and DT_FINI_ARRAYSZ dynamic tags.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-2 | Order of ctors/dtors w.r.t. link | lif ps | closed | HP | 990603 | 000817 |
Summary: Given that the compiler has identified constructor/destructor calls for static objects in each relocatable object, in what order should the static linker combine them in the linked executable object? (The initialization order determines the finalization order, as its opposite.) |
Resolution: Accepted method based on IBM's specification. See Draft C++ ABI for IA-64, Section 3.3.4. |
[990610 All] Meeting consensus is that the desirable order is right to left on the link command line, i.e. last listed relocatable object is initialized first.
[990701 SGI] We propose that global constructors be handled as follows:
This does not address the global destructor problem. That solution needs to deal not only with the global objects seen by the compiler, but also interspersed local static objects. This treatment seems to be tied up in the question of how early unloading of DSOs is handled, and the data structure used for that purpose (issue C-3).
[990715 All] Cygnus scheme: priorities are 16-bit unsigned integers, lower numbers are higher priority. In each translation unit, there's a single initialization function for each priority. Anything that's prioritized has a higher priority than anything that isn't explicitly assigned a priority.
IBM scheme: priorities are 32-bit signed integers, higher numbers are higher priority. Something that isn't explicitly assigned a priority effectively gets a priority of 0.
Consensus: nobody is sure that negative priorities are very important, but also nobody can think of a reason not to allow them. We accept the idea that priorities are 32-bit signed integers. On a source level Cygnus will keep lower numbers as higher priority, but that's a source issue, not an ABI issue.
Status: No real technical issues, we have consensus on everything that matters. We need to write up the finicky details.
[990722 all] It was decided to follow the IBM approach, including:
To be resolved are the precise source pragma definition (possibly IBM's), and the ELF file representation.
[990729 all] SGI suggested an object representation involving (in relocatables) a new section type, containing pairs <priority, entry address>. The linker would merge all such sections, include any initialization entries specified by other means, and leave one or more DT_INIT_ARRAY entries for normal runtime initialization, either building a routine to call the entries, or referencing a standard runtime routine.
IBM noted that they combine their equivalent data structures in the linker, but don't sort them, leaving that to a runtime routine. This can be done without explicit linker support, but involves runtime overhead.
Cygnus suggested that if we are going to require linker sorting, we should make the facility more general.
Jim will write up a more precise proposal.
[990804 SGI -- Jim]
My objectives are:
Define a new section type, e.g. SHT_CXX_PRIORITY_INIT. Its elements are structs:
typedef struct {
  ElfXX_Word pi_pri;
  ElfXX_Addr pi_addr;
} ElfXX_Cxx_Priority_Init;
The semantics are that pi_addr is a function pointer, with an unsigned int priority parameter, which performs some initialization at priority pi_pri. Each of these functions will be called with the GP of the executable object containing the table. The section header field sh_entsize is 8 for ELF-32, or 16 for ELF-64.
Each implementation shall provide a runtime library function with prototype:
void __cxx_priority_init ( ElfXX_Cxx_Priority_Init *pi, int cnt );
It is passed the address pi of a cnt-element (sub-)vector of the priority initialization entries, and will call each of them in order. It will be called with the GP of the initialization entries.
The linker must take the collection of SHT_CXX_PRIORITY_INIT section entries from the relocatable object files being linked, and other initialization tasks specified in other ways (and treated as source priority 0 or object priority -MIN_INT), and produce an executable object file which executes the initialization tasks in priority order using only DT_INIT, DT_INIT_ARRAY, and __cxx_priority_init. Priority order is first according to the priority of the task, and then according to the order of relocatable objects and options in the link command. The order of tasks specified by other methods, relative to SHT_CXX_PRIORITY_INIT tasks of priority zero, is implementation defined.
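As a rough illustration of the ordering rule above, the following sketch models the table in memory, sorts it, and runs it. It is not the proposed implementation: PriorityInit, run_priority_inits, sort_priority_inits and demo_priority_order are hypothetical names, a plain int32_t stands in for ElfXX_Word, and ascending numeric order is assumed to be the execution order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for ElfXX_Cxx_Priority_Init.
using InitFn = void (*)(unsigned int priority);
struct PriorityInit {
    std::int32_t pi_pri;  // priority of the task
    InitFn pi_addr;       // initialization function
};

// Demo only: one letter is appended per initialization call.
static std::string g_init_log;

// Models __cxx_priority_init: call cnt entries in the order given.
void run_priority_inits(const PriorityInit* pi, int cnt) {
    assert(cnt >= 0);
    for (int i = 0; i < cnt; ++i)
        pi[i].pi_addr(static_cast<unsigned int>(pi[i].pi_pri));
}

// Models the sorting step. A stable sort keeps link order within one priority.
// Assumption: lower numeric values run first.
void sort_priority_inits(std::vector<PriorityInit>& table) {
    std::stable_sort(table.begin(), table.end(),
                     [](const PriorityInit& a, const PriorityInit& b) {
                         return a.pi_pri < b.pi_pri;
                     });
}

std::string demo_priority_order() {
    g_init_log.clear();
    InitFn a = [](unsigned int) { g_init_log.push_back('a'); };
    InitFn b = [](unsigned int) { g_init_log.push_back('b'); };
    InitFn c = [](unsigned int) { g_init_log.push_back('c'); };
    InitFn d = [](unsigned int) { g_init_log.push_back('d'); };
    // Link order: a, b, c, d. Entries a and d share priority 5.
    std::vector<PriorityInit> table = {{5, a}, {-3, b}, {0, c}, {5, d}};
    sort_priority_inits(table);
    run_priority_inits(table.data(), static_cast<int>(table.size()));
    return g_init_log;
}
```

In this model the two priority-5 entries keep their link order, which is the tie-breaking rule the text requires.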
There are several possible implementations. Two extremes are:
Note that if one is linking ELF-32 objects into a 64-bit program, the entries must be expanded as part of this process.
Jason suggested that if we base this feature on sorting sections, we should provide a general mechanism. Following is a proposal for that purpose.
Define a new section header flag, SHF_SORT. If present, the linker is required to sort the elements of the concatenated sections of the same type, where the elements are determined by sh_entsize. The sort is controlled by fields in sh_info:
#define SH_INFO_KEYSIZE(info) ((info) & 0xff)
#define SH_INFO_KEYSTART(info) (((info)>>8) & 0xff)
#define SH_INFO_SORTKIND(info) (((info)>>16) & 0xf)
The sort must be stable. The sort key must be naturally aligned.
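The field packing above can be checked with a small sketch. The accessors mirror the three macros; make_sort_info and key_naturally_aligned are hypothetical helpers, and the alignment check assumes that KEYSIZE and KEYSTART are byte counts and that each element itself starts on an aligned boundary, neither of which the proposal states.

```cpp
#include <cassert>
#include <cstdint>

// Field accessors mirroring the proposed SH_INFO_* macros.
constexpr std::uint32_t sh_info_keysize(std::uint32_t info) { return info & 0xff; }
constexpr std::uint32_t sh_info_keystart(std::uint32_t info) { return (info >> 8) & 0xff; }
constexpr std::uint32_t sh_info_sortkind(std::uint32_t info) { return (info >> 16) & 0xf; }

// Hypothetical helper: pack the three fields into one sh_info value.
constexpr std::uint32_t make_sort_info(std::uint32_t keysize,
                                       std::uint32_t keystart,
                                       std::uint32_t sortkind) {
    return (keysize & 0xff) | ((keystart & 0xff) << 8) | ((sortkind & 0xf) << 16);
}

// Assumption: "naturally aligned" means the key offset is a multiple of the key size.
constexpr bool key_naturally_aligned(std::uint32_t info) {
    return sh_info_keysize(info) != 0 &&
           sh_info_keystart(info) % sh_info_keysize(info) == 0;
}

// A 4-byte key at offset 0 round-trips through the packing.
static_assert(sh_info_keysize(make_sort_info(4, 0, 1)) == 4, "key size round trip");
```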
Other conceivable options would be to allow sorting strings (like SHF_MERGE, this would be indicated by setting SHF_STRING and putting the character size in sh_entsize), or floating point data.
Also, note that if we don't anticipate using such a general mechanism, it becomes possible to avoid padding words in the ELF-64 format by separating the priority and address vectors.
[990810 HU-B -- Martin] Global destructor ordering must not only interleave with static locals, but also with atexit. This gives two problems: atexit is only guaranteed to support 32 functions; and dynamic unloading of DSOs breaks when functions are atexit registered.
[990810 SGI -- Matt] Yes, the interleaving is required by the C++ standard. It's a nuisance, and I don't think there's any good reason for it, but the requirement is quite explicit.
The relevant part of the C++ standard is section 3.6.3, paragraph 3:
What this implies to me is that atexit, and the part of the runtime library that handles destructors for static objects, must know about each other.
[990812 All] Some people would prefer a sorting scheme based on the section name instead of the data, and also less linker impact. Jim will look into alternatives.
[991110 SGI -- Jim] I said I would revisit my proposal, looking at two questions:
I believe the proposal as made need have almost no linker impact. Consider the second suggested implementation scheme, based on IBM's description of their approach.
A minimalist implementation (from the linker point of view) includes:
The one at the end calls
These are both in the implementation runtime. The begin routine determines the address and size of the SHT_CXX_PRIORITY_INIT section (below). It sorts the section by priority, and calls __cxx_priority_init(addr,cnt) as described in the proposal with the count of <=0 entries.
__cxx_priority_init_end calls __cxx_priority_init(addr,cnt) with the address and count of >0 entries.
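The split between the begin and end routines can be sketched as follows, assuming the section has already been sorted into ascending priority order. The function name count_begin_entries is hypothetical; it returns the count that the begin routine would pass for the <=0 entries, and the end routine would handle the remainder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Given priorities sorted ascending, return how many are <= 0.
// The begin routine would run entries [0, n); the end routine would run [n, size).
std::size_t count_begin_entries(const std::vector<std::int32_t>& sorted_priorities) {
    assert(std::is_sorted(sorted_priorities.begin(), sorted_priorities.end()));
    // upper_bound finds the first element strictly greater than zero.
    auto first_positive =
        std::upper_bound(sorted_priorities.begin(), sorted_priorities.end(), 0);
    return static_cast<std::size_t>(first_positive - sorted_priorities.begin());
}
```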
My original proposal did not describe the dynamic tags to delimit the
section, nor the __cxx_priority_init_
Now suppose you want to minimize runtime instead of linker impact -- the first suggested implementation scheme. There are at least two approaches:
One of my original objectives, and I think a key attribute of this proposal, is that this full range of possible implementations, from minimal linker impact to minimal runtime impact, makes absolutely no difference to the generated .o files -- compatibility between compilers does not depend on the chosen link-time implementation.
Sorting is a more interesting issue. I see four possibilities:
I'll say up front that I think implicit sorting is adequate for the purpose at hand, and I'd like to understand other applications before I'd choose (3) or (4).
There are two differences between (3) and (4):
Either would work for the application at hand. Approach (3) would require only one SHT_CXX_PRIORITY_INIT section per .o file, while approach (4) would require up to one such section per constructor call (though only if the user used lots of different priorities). I personally think sorting based on a data vector that's already been concatenated should be much more efficient, but it probably doesn't matter much.
On the other hand, sorting an arbitrarily-sized section, based on an external key, is more flexible except that the keys may be more constrained. So, again, I think the choice comes down to other applications of the feature. Absent significant other demands, I'd just stick to implicit sorting (and optional at that) for now.
[991202 All] An extensive discussion failed to reach consensus, but clarified the issues.
The proposed alternative of sorting based on section name is specifically the Linux implementation of treating all section names containing a dollar sign ($) as being a section name before the dollar sign and a sort key after it. As mentioned above, this has the advantage of being more general, except with respect to the sort key, which isn't an issue here, and it is implemented in Linux.
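A minimal sketch of the name-based scheme just described, for illustration only. It assumes the split happens at the first dollar sign and that keys compare as plain strings; neither detail is stated above, and split_section_name and order_sections are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Split "name$key" into (name, key). A name without '$' has an empty key.
std::pair<std::string, std::string> split_section_name(const std::string& full) {
    const std::string::size_type pos = full.find('$');
    if (pos == std::string::npos)
        return {full, std::string()};
    return {full.substr(0, pos), full.substr(pos + 1)};
}

// Order the input sections that feed one output section by their sort key.
// A stable sort keeps link order for sections with equal keys.
std::vector<std::string> order_sections(std::vector<std::string> names) {
    std::stable_sort(names.begin(), names.end(),
                     [](const std::string& a, const std::string& b) {
                         return split_section_name(a).second <
                                split_section_name(b).second;
                     });
    return names;
}
```

Note that a scheme like this needs one input section per distinct key, which is the per-priority section cost mentioned for approach (4) above.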
The primary concern with the Linux approach is that some implementations must deal with static linkers which are under control of other groups or companies, and therefore can't depend on getting linker sorting implemented. IBM has been in that position, though it isn't clear whether it will be an issue on IA-64.
A secondary concern is a general objection from SGI to features that depend on section naming rather than section types and attributes.
Jim will attempt to frame the issue and get feedback from the base ABI group.
[000106 All] We will wait for base ABI feedback before deciding.
[000502 SGI -- Jim] The base ABI group is not particularly interested in this, because they are not getting pressure from their C++ people to worry about it. So, if we want to standardize this, we need to apply pressure within our companies.
We have three choices:
I don't think we should pursue the first unless we have vendors anxious to support it.
[000504 All] The sense of the meeting was that since multiple vendors are going to implement this capability, the ABI will be much healthier if we can agree on the implementation. Otherwise, object files cannot be mixed. We will pursue this further.
[000720 All] Jim reported that the psABI group agreed to allocate a section type for this purpose, and will add a writeup to the Draft ABI (section 3.3.4).
[000803 All] We will follow more closely the IBM pragma semantics: no variable names, applying until the next pragma or end of file. Rename the pragma simply "priority."
[000808 SGI -- Dehnert] I remembered why I changed the pragma name. I'm concerned about "priority" conflicting with more traditional uses of the term, e.g. for multiprocessing priority.
[000817 All] Accepted, changing pragma name from init_priority to priority. There is no conflict with OpenMP or pthreads.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-3 | Order of ctors/dtors w.r.t. DSOs | ps | closed | HP | 990603 | 000504 |
Summary: Given the constructor/destructor calls for each executable object comprising a program, what is the order of execution between objects? For constructors, there is not much question: unless we choose some explicit means of control, file-scope objects will be initialized by the DT_INIT/DT_INIT_ARRAY functions in the order determined by the base ABI order rules, and local objects will be initialized in the order their containing scopes are entered.
For destructors, the Standard requires opposite-order destruction, which implies a runtime structure to keep track of the order. Furthermore, the potential for dynamic unloading of a DSO (e.g. by dlclose) requires a mechanism for early destruction of a subset. | ||||||
Resolution: Accept SGI proposal for a simple API which registers destructors and atexit calls. Subsequently, accept proposal to eliminate call to __cxa_finalize when program exits. |
[990804 SGI -- Jim]
My objectives are:
The runtime library shall maintain a list of termination functions with the following information about each:
The representation of this structure is implementation defined. All references are via the API described below.
When a global or local static object is constructed, which will require destruction on exit, a termination function is registered as follows:
int __cxa_atexit ( void (*f)(void *), void *p, dso_handle d );
This registration, e.g. __cxa_atexit(f,p,d), is intended to cause the call f(p) when DSO d is unloaded, before all such termination calls registered before this one. It returns zero if registration is successful, nonzero on failure. Should we use exceptions instead?
The registration function is called separately from the constructor.
When the user registers exit functions with atexit, they should be registered with NULL parameter and DSO handle, i.e.
__cxa_atexit ( f, NULL, NULL );
This should be integrated into the atexit implementation, so that C-only DSOs will nevertheless interact with C++ programs in a C++-standard-conforming manner.
No user interface to __cxa_atexit is supported, so the user is not able to register an atexit function with a parameter or a home DSO.
When linking any DSO containing a call to __cxa_atexit, the linker should define a hidden symbol __dso_handle, with a value which is an address in one of the object's segments. (It doesn't matter what address, as long as they are different in different DSOs.)
It should also include a call to the following function in the FINI list (to be executed first):
void __cxa_finalize ( dso_handle d );
The parameter passed should be __dso_handle.
Note that the above can be accomplished either by explicitly providing the symbol and call in the linker, or by implicitly including a relocatable object in the link with the necessary definitions, using a .fini_array section for the FINI call. Also, note that these can be omitted for an object with no calls to __cxa_atexit, but they can be safely included in all objects.
Finally, a main program should be linked with a FINI call to __cxa_finalize with NULL parameter.
When __cxa_finalize(d) is called, it should walk the termination function list, calling each in turn if d matches __dso_handle for the termination function entry. If d == NULL, it should call all of them. Multiple calls to __cxa_finalize should not result in calling termination function entries multiple times; the implementation may either remove entries or mark them finished.
Issue: By passing a NULL-terminated vector of DSO handles to __cxa_finalize instead of one, we could deal with unloading multiple DSOs at once. However, dlclose closes one at a time, so I'm not sure the extra complexity is worthwhile.
Since __cxa_atexit and __cxa_finalize must both manipulate the same termination function list, they must be defined in the implementation's C++ runtime library, rather than in the individual linked objects.
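The registration and finalization rules in this proposal can be modelled in a few lines. This is only a sketch under stated assumptions: model_cxa_atexit, model_cxa_finalize and TerminationEntry are hypothetical names, the list is a plain vector with no locking, and a NULL handle runs every remaining entry as the proposal text describes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one termination function list entry.
struct TerminationEntry {
    void (*fn)(void*);  // function to call
    void* arg;          // its parameter
    void* dso;          // DSO handle given at registration
    bool done;          // set once the function has been called
};

static std::vector<TerminationEntry> g_terminations;

// Models __cxa_atexit: returns zero on success, nonzero on failure.
int model_cxa_atexit(void (*f)(void*), void* p, void* d) {
    if (f == nullptr)
        return -1;  // assumption: the proposal does not define this case
    g_terminations.push_back({f, p, d, false});
    return 0;
}

// Models __cxa_finalize: most recently registered first; NULL matches all entries.
void model_cxa_finalize(void* d) {
    for (std::size_t i = g_terminations.size(); i-- > 0;) {
        if (g_terminations[i].done)
            continue;
        if (d != nullptr && g_terminations[i].dso != d)
            continue;
        g_terminations[i].done = true;  // mark finished so later calls skip it
        void (*fn)(void*) = g_terminations[i].fn;
        void* arg = g_terminations[i].arg;
        fn(arg);
    }
}

std::string demo_finalize_order() {
    struct Tag { std::string* log; char c; };
    g_terminations.clear();
    std::string log;
    int dso_a = 0, dso_b = 0;  // only the addresses matter, as with __dso_handle
    void (*record)(void*) = [](void* p) {
        Tag* t = static_cast<Tag*>(p);
        t->log->push_back(t->c);
    };
    Tag t1{&log, '1'}, t2{&log, '2'}, t3{&log, '3'};
    model_cxa_atexit(record, &t1, &dso_a);
    model_cxa_atexit(record, &t2, &dso_b);
    model_cxa_atexit(record, &t3, &dso_a);
    model_cxa_finalize(&dso_a);   // unloading DSO a runs 3, then 1
    model_cxa_finalize(&dso_a);   // repeated call runs nothing
    model_cxa_finalize(nullptr);  // program exit runs the remaining entry, 2
    g_terminations.clear();
    return log;
}
```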
[991202 All] The proposal above is accepted, with three changes (integrated above):
__cxa_atexit
is supported.
During discussion, it was noted that this proposal will not deal effectively with DSOs which (a) have cross-DSO destructor interactions and (b) are unloaded dynamically. It is generally believed that such code would not reliably work on a variety of platforms today, and is not a robust methodology worthy of ABI support. However, note that if it becomes an issue, it would be possible to define a __cxa_finalize analog which takes a list of DSOs instead of a single DSO, and if the program or dynamic linker identifies a set of DSOs to be unloaded together, run their finalization entries in a single pass instead of one DSO at a time.
[991215 CodeSourcery -- Mark] Note that the type of "__dso_handle" above is not specified. Since the simplest implementation is for the static linker to resolve it into an arbitrary address in the DSO, define it as "void *".
[991216 CodeSourcery -- Mark]
What I'm suggesting (for exit finalization) is:
[991217 CodeSourcery -- Mark]
I've attached the GNU libc source files. Basically, none of these
routines are implemented in terms of the others; instead, they just
share a common data structure. I think the source will make it clear;
none of these files is more than 50 lines or so.
================================
===== filename="cxa_atexit.c"
================================
/* Copyright (C) 1999 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public License as
published by the Free Software Foundation; either version 2 of the
License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Library General Public License for more details.
You should have received a copy of the GNU Library General Public
License along with the GNU C Library; see the file COPYING.LIB. If not,
write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA. */
#include <stdlib.h>
#include "exit.h"
/* Register a function to be called by exit or when a shared library
is unloaded. This function is only called from code generated by
the C++ compiler. */
int
__cxa_atexit (void (*func) (void *), void *arg, void *d)
{
  struct exit_function *new = __new_exitfn ();
  if (new == NULL)
    return -1;
  new->flavor = ef_cxa;
  new->func.cxa.fn = func;
  new->func.cxa.arg = arg;
  new->func.cxa.dso_handle = d;
  return 0;
}
================================
===== filename="cxa_finalize.c"
================================
/* Copyright (C) 1999 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public License as
published by the Free Software Foundation; either version 2 of the
License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Library General Public License for more details.
You should have received a copy of the GNU Library General Public
License along with the GNU C Library; see the file COPYING.LIB. If not,
write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA. */
#include <stdlib.h>
#include "exit.h"
/* If D is non-NULL, call all functions registered with `__cxa_atexit'
with the same dso handle. Otherwise, if D is NULL, do nothing. */
void
__cxa_finalize (void *d)
{
  struct exit_function_list *funcs;
  if (!d)
    return;
  for (funcs = __exit_funcs; funcs; funcs = funcs->next)
    {
      struct exit_function *f;
      for (f = &funcs->fns[funcs->idx - 1]; f >= &funcs->fns[0]; --f)
        {
          if (f->flavor == ef_cxa && d == f->func.cxa.dso_handle)
            {
              (*f->func.cxa.fn) (f->func.cxa.arg);
              /* We don't want to run this cleanup again. */
              f->flavor = ef_free;
            }
        }
    }
}
===========================
===== filename="atexit.c"
===========================
/* Copyright (C) 1991, 1996, 1999 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public License as
published by the Free Software Foundation; either version 2 of the
License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Library General Public License for more details.
You should have received a copy of the GNU Library General Public
License along with the GNU C Library; see the file COPYING.LIB. If not,
write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA. */
#include <bits/libc-lock.h>
#include <stdlib.h>
#include "exit.h"
/* Register FUNC to be executed by `exit'. */
int
atexit (void (*func) (void))
{
  struct exit_function *new = __new_exitfn ();
  if (new == NULL)
    return -1;
  new->flavor = ef_at;
  new->func.at = func;
  return 0;
}
/* We change global data, so we need locking. */
__libc_lock_define_initialized (static, lock)
static struct exit_function_list initial;
struct exit_function_list *__exit_funcs = &initial;
struct exit_function *
__new_exitfn (void)
{
  struct exit_function_list *l;
  size_t i = 0;
  __libc_lock_lock (lock);
  for (l = __exit_funcs; l != NULL; l = l->next)
    {
      for (i = 0; i < l->idx; ++i)
        if (l->fns[i].flavor == ef_free)
          break;
      if (i < l->idx)
        break;
      if (l->idx < sizeof (l->fns) / sizeof (l->fns[0]))
        {
          i = l->idx++;
          break;
        }
    }
  if (l == NULL)
    {
      l = (struct exit_function_list *)
        malloc (sizeof (struct exit_function_list));
      if (l != NULL)
        {
          l->next = __exit_funcs;
          __exit_funcs = l;
          l->idx = 1;
          i = 0;
        }
    }
  /* Mark entry as used, but we don't know the flavor now. */
  if (l != NULL)
    l->fns[i].flavor = ef_us;
  __libc_lock_unlock (lock);
  return l == NULL ? NULL : &l->fns[i];
}
===========================
===== filename="on_exit.c"
===========================
/* Copyright (C) 1991, 1996 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public License as
published by the Free Software Foundation; either version 2 of the
License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Library General Public License for more details.
You should have received a copy of the GNU Library General Public
License along with the GNU C Library; see the file COPYING.LIB. If not,
write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA. */
#include <stdlib.h>
#include "exit.h"
/* Register a function to be called by exit. */
int
__on_exit (void (*func) (int status, void *arg), void *arg)
{
  struct exit_function *new = __new_exitfn ();
  if (new == NULL)
    return -1;
  new->flavor = ef_on;
  new->func.on.fn = func;
  new->func.on.arg = arg;
  return 0;
}
weak_alias (__on_exit, on_exit)
========================
===== filename="exit.h"
========================
/* Copyright (C) 1991, 1996, 1997 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public License as
published by the Free Software Foundation; either version 2 of the
License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Library General Public License for more details.
You should have received a copy of the GNU Library General Public
License along with the GNU C Library; see the file COPYING.LIB. If not,
write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA. */
#ifndef _EXIT_H
#define _EXIT_H 1
struct exit_function
  {
    enum { ef_free, ef_us, ef_on, ef_at, ef_cxa } flavor;
    /* `ef_free' MUST be zero! */
    union
      {
        void (*at) (void);
        struct
          {
            void (*fn) (int status, void *arg);
            void *arg;
          } on;
        struct
          {
            void (*fn) (void *arg);
            void *arg;
            void *dso_handle;
          } cxa;
      } func;
  };
struct exit_function_list
  {
    struct exit_function_list *next;
    size_t idx;
    struct exit_function fns[32];
  };
extern struct exit_function_list *__exit_funcs;
extern struct exit_function *__new_exitfn (void);
#endif /* exit.h */
[991220 SGI -- Jim]
In the ELF context assumed by the base IA-64 ABI, I expect that a C++ program will typically be running with the C run-time library libc.so, the C++ runtime library libC.so, likely other system DSOs, and its own components.
In this context, achieving an integrated solution could be accomplished in a couple of ways. The obvious one is to replace the routines atexit, on_exit, and exit in the C run-time library with routines that are cognizant of the C++ __cxa_atexit and __cxa_finalize facilities. A less obvious method, but still generally usable, would be to insert C++-specific versions of them in the C++ runtime library, and depend on preemption to achieve the replacement. This works as long as libC.so precedes libc.so in the library list.
There are other possible non-integrated solutions, but given the assumption of the underlying IA-64 ABI, and the fact that the second solution above can work without changing the underlying C run-time library, it doesn't seem necessary to consider them.
What is an issue, however, is that the application could in theory be linked on a different system than the one where it ultimately runs, and therefore presumably on a different system than that which built the run-time library DSOs. It is that interface which we need to pin down, namely (a) what routines (with what interfaces and semantics) must be present in libC.so/libc.so, and (b) what sequences of calls the libraries may assume the program will make.
We appear to be agreed on the presence of __cxa_atexit and __cxa_finalize in libC.so, on the registration of C++ destructors and C atexit cleanup with __cxa_atexit, and on the use of __cxa_finalize for destructor execution upon early unloading. The open questions are (1) whether (or how) on_exit registration can be integrated, and (2) how the final cleanup is invoked.
The original proposal adopted ignored (1) out of ignorance, and answered (2) by specifying a call to __cxa_finalize(NULL). If (1) is addressed by calling __cxa_atexit for on_exit with a parameter, and passing an additional exit code parameter to __cxa_finalize (and thence to all the finalization actions it invokes), this works, i.e. on_exit works as currently defined by Sun and is properly integrated into the finalization order. But that assumes that the exit code is available for passing to __cxa_finalize, which may imply calling it from exit if it's not available to a .fini_array routine (which was what the original proposal specified).
Mark points out that it works to just assume that exit does the call to __cxa_finalize, or performs the equivalent processing, eliminating the need for the explicit __cxa_finalize call in .fini_array. This is slightly simpler in that it doesn't require generation of the .fini_array entry, and the library implementation can coordinate features like on_exit without exposing the interfaces necessary to implement them. It also probably preserves more faithfully the traditional semantics that atexit routines are executed before the main program .fini_array, although doing __cxa_finalize first in the latter should produce the same effect.
Note that we can't just not choose -- one approach requires the builder of the main executable to insert a .fini_array entry, while the other doesn't -- unless we want to require the run-time to handle either, which doesn't seem useful.
My current preference is to proceed with Mark's proposal, requiring that exit handle the __cxa_atexit-registered calls (but _not_ requiring that anyone explicitly register __cxa_finalize or anything else to accomplish that). Upon re-reading all the mail, this seems quite workable. In any case, I'll re-open the issue and we can discuss it next time.
[000504 All] Accept Mark's proposal. Jim will add to Draft C++ ABI for IA-64.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-4 | Construction vtables | call | closed | Cygnus | 990603 | 000504 |
Summary: When calling a virtual function from the constructor/destructor of a base subobject, the version specific to the base type is required, unlike the typical case when calling such a vfunc for the full object from some other context. Since the pointer for that vfunc in the subobject's sub-vtable of the full object's vtable is the full object version, some other means is required for accessing the correct vfunc. | ||||||
Resolution: Accept Compaq proposal as currently documented in the Draft C++ ABI for IA-64. |
[990630 HP -- Christophe] A rough idea from Christophe's original vtable layout proposal has been incorporated in the Draft C++ ABI for IA-64.
[000217 All] Coleen has generated a proposal.
[000308 All] Discussed and clarified the proposal. Jim will clarify the content descriptions. Coleen will describe how some of the base vtables can be eliminated from the construction vtable groups given vbase promotion. She should be out to California in two weeks, so we can finalize this issue.
[000323 All] Discussion clarified the two proposals and their relative merits:
It was decided that the space savings outweighed the lost optimizations, and proposal B was adopted. Jim will clean up the writeup for final adoption.
For the record, following are several issues that have been raised and resolved in the process of developing this proposal:
[000504 All] Modify VTT order to put everything in preorder, to match other aspects. Accept Compaq proposal as currently documented in the Draft C++ ABI for IA-64.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-5 | Calling destructors | call | closed | Sun | 990603 | 991104 |
Summary: What is the calling convention for destructors? Do virtual destructors require special treatment? Is delete() integrated with the destructor call or separate? How is delete() handled when invoked on a base subobject? | ||||||
Resolution: Destructors are called with a reference to this. Virtual destructors have two versions, and two entries in the vtable, one that deletes the object after destruction, and one that doesn't. There is a third version that does not delete the object, and is not in-charge, i.e. does not destroy any base objects; it is not called via the vtable. |
[990729 all] Some implementations combine destructors with deletion, checking a flag in the destructor to determine whether to delete. This produces somewhat less code, especially if there are many delete() calls. However, it adds overhead to any destructor which does not require deletion, e.g. base and member objects, automatic objects. There is some concern that a runtime test is sometimes required, but no one has yet identified why.
[990819 Cygnus -- Jason] The note above questions the usefulness of calling op delete from the destructor. But it's required by the language, in case the derived class defines its own op delete. This only applies to virtual dtors, of course.
One option would be to have two dtor slots, one which performs deletion and one which doesn't. The advantage of this sort of approach would be avoiding pulling in all the memory management code if you never actually touch the heap.
Microsoft has a patent on this device, but the old Sun ABI also talks about it, which seems to qualify as prior art.
[991014 all] One solution to the problem with destructors is to have two destructor entry points, and two destructor slots in the vtable. One entry point destroys the object and then calls operator delete, the other destroys the object without calling operator delete. We can use a similar solution for constructors (but without any impact on the vtable layout): one entry point for constructing a complete object, another for constructing a subobject.
Note that one of the entry points may call the other, but that's not an ABI issue and can be left to individual implementors.
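The language requirement behind the deleting entry point can be seen in a small example. The class names and counters below are made up for illustration. Deleting through a base pointer must reach the derived class's own operator delete, while destroying an automatic object must not call any deallocation function.

```cpp
#include <cassert>
#include <new>

struct Shape {
    virtual ~Shape() {}
};

struct PooledShape : Shape {
    static int class_deletes;  // counts calls to the class-specific operator delete
    static void operator delete(void* p) {
        ++class_deletes;
        ::operator delete(p);
    }
};
int PooledShape::class_deletes = 0;

// Heap object deleted through a base pointer: only the dynamic type knows
// which operator delete applies, so a deleting entry point is needed.
int demo_delete_via_base() {
    PooledShape::class_deletes = 0;
    Shape* s = new PooledShape;
    delete s;
    return PooledShape::class_deletes;
}

// Automatic object: destruction only, with no call to operator delete.
int demo_automatic_object() {
    PooledShape::class_deletes = 0;
    {
        PooledShape local;
        (void)local;
    }
    return PooledShape::class_deletes;
}
```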
There was general agreement that this is a promising idea. We don't have a detailed proposal yet. HP is working on a prototype implementation. Christophe will submit a writeup.
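[991028 all] There are two options in destructors: whether the destructor is in-charge, i.e. destroys the virtual base subobjects, and whether delete() is called. Since only the most-derived object's destructor calls delete(), and only the most-derived object does destruction for virtual bases, only three of the possible combinations arise:
No delete(), not in-charge.
No delete(), in-charge.
delete(), in-charge.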
We distinguish the delete/no-delete cases by distinct entrypoints, so only a this parameter is required, and the standard calling conventions are used. The only special treatment of virtual destructors is the pair of vtable entries described above.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-6 | Extra parameters to constructors | call | closed | Cygnus | 990603 | 991104 |
Summary: When calling constructors for classes with virtual bases, what information about the treatment of virtual base subobjects in the full class, or about object allocation, must be transmitted to the constructor in parameters? | ||||||
Resolution: None. Two versions, and two entrypoints, of the constructor will be created: one that calls the virtual base subobject constructor (in-charge), and one that does not. Object allocation will be done by the caller. |
[991028 all] We will produce two constructor entries, one in-charge (constructing virtual bases) for a most-derived object, and one not in-charge for a base subobject. The object allocation will be the responsibility of the caller, so there will be no variation or parameters for that purpose.
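The source-level behaviour that the two entry points implement can be observed directly. In this sketch (class names are illustrative only), a virtual base is constructed once when the most-derived object is built, which corresponds to the in-charge constructor of Both running the VBase constructor while Left and Right run as not-in-charge subobject constructors.

```cpp
#include <cassert>

struct VBase {
    static int constructions;  // counts how often the virtual base is constructed
    VBase() { ++constructions; }
};
int VBase::constructions = 0;

struct Left : virtual VBase {};
struct Right : virtual VBase {};
struct Both : Left, Right {};

// One complete object: the most-derived constructor alone builds VBase.
int demo_complete_object() {
    VBase::constructions = 0;
    Both b;
    (void)b;
    return VBase::constructions;
}

// Two separate complete objects: each is most-derived and builds its own VBase.
int demo_two_complete_objects() {
    VBase::constructions = 0;
    Left l;
    Right r;
    (void)l;
    (void)r;
    return VBase::constructions;
}
```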
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-7 | Passing value parameters by reference | call | closed | All | 990624 | 990805 |
Summary: It may be desirable in some cases where a type has a non-trivial copy constructor to pass value parameters of that type by performing the copy at the call site and passing a reference. | ||||||
Resolution: Whenever a class type has a non-trivial copy constructor, pass value parameters of that type by performing the copy at the call site and passing a reference. |
[990701 All] Daveed and Matt will attempt to pin down the copy requirements with the Core committee, i.e. when a non-trivial copy constructor may be elided. The relevant Standard requirement is 12.8/15, and there is an open defect report related to this question. For cases where the ctor may not be elided, we expect to perform the copy at the call site, and pass a reference.
[990729 All] Matt will produce a clear proposal for when the ABI will elide the constructor (and therefore pass the class object like a normal C struct), based on the Standard's exceptions.
[990805 All] There are no cases where a non-trivial copy constructor can be simply elided for all instances of a particular parameter. Therefore, we shall use the consistent convention that, if a value parameter's (class) type has a non-trivial copy constructor, the caller will allocate space for it, perform the copy, and pass a reference.
Note that the standard does allow the caller, if the value being passed is a temporary, to construct the temporary directly into the parameter memory and elide the copy constructor call.
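As a rough illustration of this convention, the sketch below writes out by hand what a compiler might do for a by-value parameter whose type has a non-trivial copy constructor. The names Tracked, read_lowered and call_lowered are hypothetical, and the "lowered" functions only approximate the calling convention at source level.

```cpp
static int copies = 0;

struct Tracked {
    int value;
    explicit Tracked(int v) : value(v) {}
    // User-provided, hence non-trivial, copy constructor.
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
};

// Source-level function taking its parameter by value.
int read_by_value(Tracked t) { return t.value; }

// Approximation of the lowered callee: it receives a reference to
// storage owned by the caller rather than a by-value object.
int read_lowered(Tracked& param_storage) { return param_storage.value; }

// Approximation of the lowered call site.
int call_lowered(const Tracked& arg) {
    Tracked param_copy(arg);          // caller allocates space and performs the copy
    return read_lowered(param_copy);  // then passes a reference to that copy
}
```

Passing an lvalue costs one copy constructor call in both forms; the lowered form just makes explicit that the copy happens on the caller's side.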
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-8 | Returning classes with non-trivial copy constructors | call | closed | All | 990625 | 990722 |
Summary: How do we return classes with non-trivial copy constructors? | ||||||
Resolution: The caller allocates space, and passes a pointer as an implicit first parameter (prior to the implicit this parameter). |
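A source-level approximation of this return convention is sketched below. The type Name and the functions make_name_lowered and call_make_name are hypothetical; having the lowered function return the constructed pointer is a convenience of the sketch, not something the resolution states.

```cpp
#include <new>
#include <string>

struct Name {
    std::string text;
    explicit Name(const std::string& t) : text(t) {}
    Name(const Name& other) : text(other.text) {}  // non-trivial copy constructor
};

// Source form would be:  Name make_name();
// Lowered sketch: the caller passes the address of uninitialized storage
// as an implicit first parameter, and the callee constructs the result there.
Name* make_name_lowered(void* result_buffer) {
    return new (result_buffer) Name("abi");
}

std::string call_make_name() {
    alignas(Name) unsigned char buffer[sizeof(Name)];  // caller allocates space
    Name* result = make_name_lowered(buffer);
    std::string text = result->text;
    result->~Name();                                   // caller destroys the result
    return text;
}
```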
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-9 | Passing parameters with ctors/dtors | call | closed | All | 991028 | 991104 |
Summary: Where do allocation, construction, destruction, and deallocation occur for value parameters? | ||||||
Resolution: See the description in the closed issues list. |
[991028 all] For value parameter types with a non-trivial copy constructor or destructor, a call handles the parameter as follows:
this
in memory.)
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-10 | Synthesized copy assignments | call | closed | All | 991028 | 991028 |
Summary: Should we specify special treatment for synthesized copy assignments, to avoid multiple copies of virtual bases? | ||||||
Resolution: No. |
[991028 all] For classes with virtual bases, the Standard allows a synthesized copy assignment to copy the virtual bases multiple times, but does not require it. The simplest approach, recursively copying the base objects, will cause multiple copies for virtual bases with multiple inheritance paths. This can be avoided by synthesizing a second copy assignment operator which does not copy virtual bases, to be called when assigning a subobject.
The decision was made not to do so, on grounds that the situation is rare, and virtual bases are often empty besides, so that the solution is not worth the resulting code bloat.
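The effect under discussion can be observed with a small diamond hierarchy. In this sketch (class names are illustrative assumptions) the virtual base counts its own assignments; how many times it is assigned by the synthesized operator of D is unspecified, so the code reports the count rather than assuming a value.

```cpp
static int v_assignments = 0;

struct V {
    int data = 0;
    V& operator=(const V& rhs) {
        data = rhs.data;
        ++v_assignments;   // record every assignment of the virtual base
        return *this;
    }
};

struct A : virtual V {};
struct B : virtual V {};
struct D : A, B {};        // V is reachable through both A and B

// Assigns src to dst with the synthesized operator and returns how many
// times the shared virtual base was assigned along the way.
int assign_and_count(D& dst, const D& src) {
    v_assignments = 0;
    dst = src;
    return v_assignments;
}
```

A count above one is the redundancy the group chose to tolerate; the result is still correct because each assignment writes the same value.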
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-11 | Array constructors/destructors | call | closed | Cygnus | 000130 | 000309 |
Summary: How are constructors/destructors run for arrays? Many compilers use a __vec_new function; g++ doesn't, to allow for inlining of constructors. | ||||||
Resolution: Define standard library entries for array construction/destruction. See the Draft C++ ABI for IA-64. |
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-12 | Constructor return values | call | closed | Cygnus | 000130 | 000309 |
Summary: What is the return value of a constructor? Void, this, ...? | ||||||
Resolution: Void. |
[000130 Cygnus -- Jason] I don't see any reason to return a value from constructors, since we will always pass in the address of the object. g++ currently returns that address, for historical reasons (previously, to support assignment to 'this').
[000131 IBM -- Mark] Currently, we use the returned value from the ctor for cases like S().i. It wouldn't be hard to change the compiler, but we do need a decision one way or another.
[000308 All] Decided to return void. Open another issue (C-13) to consider alternate allocating constructors (low priority).
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-13 | Allocating constructors | call | closed | HP | 000309 | 000803 |
Summary: Should we define allocating constructors? | ||||||
Resolution: Their use is optional. Their name mangling is specified. If used, they must be emitted everywhere referenced as a COMDAT group (Draft ABI section 5.2.5). |
[000308 HP -- Christophe] We should consider defining alternate constructors which allocate the object before constructing it.
[000803 All] The definition in the Draft, section 5.2.5, is accepted.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-14 | Local-scope dynamic constructors | data | closed | all | 000309 | 000511 |
Summary: The Standard requires that local static objects with dynamic constructors be initialized exactly once, the first time the containing scope is entered. This requires a data object to serve as a guard variable; define its content or interface. | ||||||
Resolution: The size of the guard variable is 64 bits. The low-order byte shall contain a boolean initialization flag. |
[000309 All] We have defined a mangling for the guard variable object (issue F-1), but we need to define at least its size and either its content or a library interface to it. This is tied up with multithreading issue G-4. If we want the initialization to be implicitly thread-safe, the object probably needs to contain both an initialized flag and a thread semaphore, and it is desirable that they be in different cache lines.
[000511 All] The size of the guard variable is 64 bits. The low-order byte shall contain the value 0 prior to initialization of the associated variable, and 1 after initialization is complete. Usage of the other bytes of the guard variable is implementation-defined.
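A minimal, single-threaded sketch of how generated code might use such a guard is shown below. The names guard_for_value, value_storage and counter_lowered are hypothetical, and the sketch deliberately ignores the thread-safety question raised above.

```cpp
#include <cstdint>

static int init_calls = 0;
static int expensive_init() { ++init_calls; return 42; }

// Hand-written approximation of:
//   int& counter() { static int value = expensive_init(); return value; }
static std::uint64_t guard_for_value = 0;  // 64-bit guard, low-order byte is the flag
static int value_storage;

int& counter_lowered() {
    if ((guard_for_value & 0xFF) == 0) {   // low-order byte 0: not yet initialized
        value_storage = expensive_init();
        guard_for_value |= 1;              // low-order byte becomes 1 once initialization completes
    }
    return value_storage;
}
```

The remaining seven bytes are left untouched here; the resolution leaves their use to the implementation.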
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-15 | Alternate array allocators | call | closed | CodeSourcery | 000628 | 000720 |
Summary: Allow alternate allocators/deallocators to __cxa_vec_new and __cxa_vec_delete. | ||||||
Resolution: Add two new allocators, and two new deallocators, with one of each pair using a simple user deallocator and one using a user deallocator requiring a size. See the Draft C++ ABI for IA-64. |
[000628 CodeSourcery -- Mark] __cxa_vec_new and __cxa_vec_delete would be a lot more useful if they accepted pointers to the allocation and deallocation functions as well. As it is, they are hard-wired to use the `::operator new[]' and `::operator delete[]'. Since the whole purpose of these functions is to provide compilers a convenient way to manage construction and destruction, I think we should either add allocation/deallocation routine pointers to these functions, or add additional entry points. This additional flexibility would also be useful for C++-compatible allocations from other languages, etc.
[000706 All] We agreed to do this. Jim will write it up.
[000720 All] Accepted as documented.
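The difference between the two deallocator flavors can be sketched as follows. These are not the ABI entry points: the function names are hypothetical, the element count is passed explicitly instead of being read from array padding, and the size handed to the sized deallocator is simply count times element size.

```cpp
#include <cstddef>
#include <cstdlib>

typedef void (*dtor_fn)(void*);

// Destroys elements last-to-first, the reverse of construction order.
static void destroy_elements(void* array, std::size_t count,
                             std::size_t size, dtor_fn dtor) {
    if (!dtor) return;
    char* base = static_cast<char*>(array);
    for (std::size_t i = count; i != 0; --i)
        dtor(base + (i - 1) * size);
}

// Flavor 1: user deallocator that takes only the pointer.
void sketch_vec_delete_simple(void* array, std::size_t count, std::size_t size,
                              dtor_fn dtor, void (*dealloc)(void*)) {
    destroy_elements(array, count, size, dtor);
    dealloc(array);
}

// Flavor 2: user deallocator that also requires the allocation size.
void sketch_vec_delete_sized(void* array, std::size_t count, std::size_t size,
                             dtor_fn dtor, void (*dealloc)(void*, std::size_t)) {
    destroy_elements(array, count, size, dtor);
    dealloc(array, count * size);
}

// Demo helpers used to observe the calls.
static int destroyed = 0;
static std::size_t last_freed_size = 0;
static void count_dtor(void*) { ++destroyed; }
static void simple_free(void* p) { std::free(p); }
static void sized_free(void* p, std::size_t n) { last_freed_size = n; std::free(p); }
```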
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-16 | Copy constructor runtime | call | closed | CodeSourcery | 000628 | 000720 |
Summary: Define a runtime support routine for copy constructors. | ||||||
Resolution: Add a new runtime for vector copy construction. See the Draft C++ ABI for IA-64. |
[000628 CodeSourcery -- Mark] I think we should also add a runtime support routine for copy constructors. Here's a sample definition:
extern "C" void
__cxa_vec_cctor (void *dest_array,
                 void *src_array,
                 size_t element_count,
                 size_t element_size,
                 void (*constructor) (void *, void *),
                 void (*destructor) (void *))
{
  size_t ix = 0;
  char *dest_ptr = static_cast<char *> (dest_array);
  char *src_ptr = static_cast<char *> (src_array);
  try
    {
      if (constructor)
        for (; ix != element_count;
             ix++, src_ptr += element_size, dest_ptr += element_size)
          constructor (dest_ptr, src_ptr);
    }
  catch (...)
    {
      __uncatch_exception ();
      __cxa_vec_dtor (dest_array, ix, element_size, destructor);
      throw;
    }
}
This routine will be useful to compilers when copying a structure containing an array. The EDG front-end uses this method.
[000706 All] We agreed to do this. Jim will write it up.
[000720 All] Accepted as documented. NULL constructor is not allowed. An allocating version is not needed.
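The cleanup behavior of such a routine can be exercised with a self-contained variant. This sketch is not the ABI function: sketch_vec_cctor and the helper names are hypothetical, the calls to __uncatch_exception and __cxa_vec_dtor in the sample above are replaced by an inline reverse-order loop, and the constructor is assumed non-NULL in line with the 000720 decision.

```cpp
#include <cstddef>
#include <stdexcept>

typedef void (*copy_ctor_fn)(void* dest, void* src);
typedef void (*dtor_fn)(void*);

void sketch_vec_cctor(void* dest_array, void* src_array,
                      std::size_t count, std::size_t size,
                      copy_ctor_fn ctor, dtor_fn dtor) {
    char* dest = static_cast<char*>(dest_array);
    char* src = static_cast<char*>(src_array);
    std::size_t done = 0;  // number of elements fully constructed so far
    try {
        for (; done != count; ++done)
            ctor(dest + done * size, src + done * size);
    } catch (...) {
        // Destroy the already-constructed elements in reverse, then propagate.
        if (dtor)
            while (done != 0) { --done; dtor(dest + done * size); }
        throw;
    }
}

// Demo helpers: an int "copy constructor" that rejects negative values.
static int live = 0;
static void copy_int_checked(void* d, void* s) {
    int v = *static_cast<int*>(s);
    if (v < 0) throw std::runtime_error("copy failed");
    *static_cast<int*>(d) = v;
    ++live;
}
static void destroy_int(void*) { --live; }

// Returns true if a failing copy leaves no live elements behind.
bool copy_fails_cleanly() {
    int src[4] = {1, 2, -1, 4};
    int dst[4] = {0, 0, 0, 0};
    try {
        sketch_vec_cctor(dst, src, 4, sizeof(int), copy_int_checked, destroy_int);
    } catch (const std::runtime_error&) {
        return live == 0;
    }
    return false;
}
```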
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-18 | Result buffers | call | closed | SGI | 000724 | 000817 |
Summary: Should buffers for results with non-trivial copy constructors be passed as a dummy first parameter, or in r8 as specified by the psABI for long structured results? | ||||||
Resolution: All results with non-trivial copy constructors or destructors will be returned in buffers allocated by the caller, with their addresses passed as an implicit first parameter. Other structure results too large for the return registers are returned in a buffer created by the caller, with the buffer address passed in r8. |
[000724 SGI -- Dehnert] I just noticed that the IA-64 psABI requires returning large aggregates (over 256 bits except for some floating point ones) via a buffer allocated by the caller and passed in r8. We have specified in the C++ ABI that class results with non-trivial copy constructors be returned in a buffer allocated by the caller and passed as an implicit first parameter (i.e. in out0, not in r8). I suggest that we make these two cases consistent, i.e. pass the buffer address in r8 instead of out0. (This would not affect non-IA-64 compilers.)
[000817 All] Accepted. In all cases where a result class object is returned in a buffer created by the caller, the buffer address will be passed in r8, and not like an implicit first parameter.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-19 | NULL ctor/dtor API parameters | call | closed | CodeSourcery | 000806 | 000831 |
Summary: Allow NULL constructor/destructor parameters wherever it makes sense in the Section 3.3 APIs. | ||||||
Resolution: Accepted as proposed. |
[000806 CodeSourcery -- Mark] The ABI doesn't say whether or not the constructor and destructor parameters may be NULL for many of the functions. In some cases, it does say that the pointers may not be NULL.
I believe that a) the spec should explicitly specify this everywhere, and b) we should allow NULL pointers whenever it makes sense. These are convenience routines; why not make them convenient?
For example, why not allow __cxa_vec_new2 to be used with both NULL constructors and destructors? The caller should then pass in zero for the padding size, of course. There's no reason to try to make these routines go fast -- they're just there for convenience, and the memory allocation/function call indirection overhead will swamp a few conditionals on NULL parameters.
[000824 CodeSourcery -- Mark] Overall motivation: there is every reason to make these functions convenient for use by compilers and for manual use in various kinds of specialized reflection-like situations, including use in debuggers. There is virtually no speed penalty for allowing NULL pointers in these functions since the tests for NULL can be performed outside of the loop, and the loop itself will normally contain function calls.
`constructor' and/or `destructor' may be NULL.
The destructor may be NULL if and only if the padding_size is zero.
`constructor' and/or `destructor' may be NULL.
The destructor may be NULL if and only if the padding_size is zero.
`alloc' and `dealloc' may not be NULL.
`constructor' and/or `destructor' may be NULL.
`destructor' may be NULL.
`destructor' may be NULL.
`dealloc' may not be NULL.
`constructor' and/or `destructor' may be NULL.
[000831 All] Accepted this proposal as per Mark's list above.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-0 | Exception handling framework | lib ps | closed | SGI | 990520 | 991216 |
Summary: Define the general framework for exception handling, including Level I (psABI unwinding API) and Level II (C++ ABI exception handling API). | ||||||
Resolution: See the HP proposal, accepted as a working paper, and discussions in the closed issues page. |
For reference, we have design information as follows:
[990902 All] We observed that there are three levels at which we can discuss EH compatibility.
The first, minimal level is effectively that of the definition in the IA-64 Software Conventions document. It describes a framework which can be used by an arbitrary implementation, with a complete definition of the stack unwind mechanism, but no significant constraints on the language-specific processing. In particular, it is not sufficient to guarantee that two object files compiled by different C++ compilers could interoperate, e.g. throwing an exception in one of them and catching it in the other.
The second level is the minimum that must be specified to allow interoperability in the sense described above. This level requires agreement on:
The third level is a specification sufficient to allow all compliant systems to share the relevant runtime implementation. It includes, in addition to the above:
The vocal attendees at the meeting wish to achieve the third level, and we will attempt to do so. Whether or not that is achieved, however, a second-level specification must be part of the ABI.
Here is a quick description of the personality routine interface and semantics. This description is a slight extension of the existing personality routine implemented by HP for IA64. The extension is to allow multiple runtimes from possibly different vendors or for possibly different languages to cooperate in processing an exception.
This document assumes that chapter 11 of the Intel/HP "IA-64 Software Conventions and Runtime Architecture" document is known to the reader.
INTERFACE:
The complete exception processing framework consists of at least the following routines: _RaiseException, _ResumeUnwind, _DeleteException, _Unwind_getGR, _Unwind_setGR, _Unwind_getIP, _Unwind_setIP, _Unwind_getLanguageSpecificData, _Unwind_getRegionStart. In addition, a language and vendor specific personality routine will be stored by the compiler in the unwind descriptor for the stack frames requiring exception processing.
UNWIND RUNTIME ROUTINES:
The unwind runtime routines have the following interface and semantics (all routines are extern "C"):
uint64 _RaiseException(uint64 exception_class, void *exception_object);
Raise an exception, passing along the given exception class and exception object. The exception object has been allocated by the language-specific runtime, and has a language-specific format. _RaiseException does not return, unless an error condition is found (such as no handler accepting to handle the exception, bad stack format, etc.).
The first 4 words (32 bytes) of the exception object are allocated for use exclusively by the unwinder, and should not be written by the personality routine or other parts of the language-specific runtime. The first word is used to store the exception class. The second word points to the personality routine of the frame that threw the exception initially. The next two words are reserved for use by the unwinder. [Note: Typical use is to keep the state of the unwinder while executing user code, such as our current frame_handle pointer.]
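The reserved header described above might be pictured as the following layout. The struct and field names are hypothetical, and all four words are modeled as 64-bit integers so that the size check holds on any host, not only where pointers are 64 bits wide.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical view of the four words reserved for the unwinder.
struct UnwindHeaderSketch {
    std::uint64_t exception_class;      // word 0: exception class
    std::uint64_t thrower_personality;  // word 1: personality routine of the throwing frame
    std::uint64_t unwinder_private[2];  // words 2-3: reserved for unwinder state
};

// A language-specific exception object places its own data after the header.
struct LanguageExceptionSketch {
    UnwindHeaderSketch header;  // written only by the unwinder
    int payload;                // stands in for language-specific content
};

static_assert(sizeof(UnwindHeaderSketch) == 32,
              "the unwinder header occupies 4 words (32 bytes)");
```

Keeping the header first means the unwinder can treat any exception object as starting with these 32 bytes, whatever language created it.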
void _ResumeUnwind (void *exception_object);
Resume propagation of an existing exception. [Note: _ResumeUnwind should not be used to implement rethrowing. To the unwinding runtime, the catch code that rethrows was a handler, and the previous unwinding session was terminated before entering it.] [Note: Compared to HP runtime, the exception class and frame handle arguments have been removed. They also need no longer be passed to the landing pads. Instead, the unwinder will store the information in one of its 2 reserved words.]
void _DeleteException(void *exception_object);
If a given runtime resumes normal execution after catching a foreign exception, it will not know how to delete that exception. This exception will be deleted by calling _DeleteException, which in turn will delegate the task to the original personality routine (see EH_DELETE_EXCEPTION_OBJECT below).
uint64 _Unwind_getGR(void *context, int index);
uint64 _Unwind_getIP(void *context);
void _Unwind_setGR(void *context, int index, uint64 new_value);
void _Unwind_setIP(void *context, uint64 new_value);
Get or set registers from the given unwinder context. The 'context' argument is the same argument passed to the personality routine (see below). [Note: Minor changes compared to the existing unwinding interface, mostly to hide the register classes]
uint64 _Unwind_getLanguageSpecificData(void *context);
Get the address of the language-specific data area for the current stack frame. The 'context' argument is the same argument passed to the personality routine. [Note: This is not strictly required: it could be accessed through getIP using the documented format of the UnwindInfoBlock, but since this work has been done for finding the personality routine in the first place, it makes sense to cache the result in the context, as we currently do]
uint64 _Unwind_getRegionStart(void *context);
Get the address of the beginning of the current procedure or region of code. [Note: This is required for us because we store data relative to the beginning of the code. So let's make it mandatory ;-]
PERSONALITY ROUTINE:
The personality routine is defined with the following interface:
int PersonalityRoutine
(int version,
int phase,
UInt64 exceptionClass,
void *exceptionObject,
void *context);
[Note: the frame_handle argument has been removed: it was used only once in the runtime, and the cost of reading it back from the exception object is really minimal, compared to the cost of having to spill it in all landing pads... The context argument type has been made opaque]
The arguments have the following role and meanings:
version: Version number that the compiler and personality routine agree on, identifying for instance language-specific table format. This version number is read from the unwind information block (unwind tables)
phase: Indicates what processing the personality routine is supposed to perform. The possible actions are described below under 'UNWINDING PHASES'
exceptionClass: An 8-byte identifier specifying the type of the thrown exception. By convention, the high 4 bytes indicate the vendor (for instance HP\0\0), and the low 4 bytes indicate the language (for instance C++\0). [Note: For C++, it is expected that agreement will be reached on a common 'exceptionObject', but different vendors may still choose to have different personality routines with different table formats.]
exceptionObject: The pointer to a memory location recording the necessary information for processing the exception according to the semantics of a given language. [Note: For C++, it is assumed that the format of this exception object can be agreed upon, even if we disagree on the LSDA and/or landing pad registers or similar details.]
context: Unwinder state information for use by the personality routine. This is used by the personality routine in particular to access the frame's registers. [Note: I don't see how anything could work without a minimal common unwinder interface - which is why it has been defined above]
return value: The return value from the personality routine indicates how further unwinding should happen, as well as possible error conditions. See "UNWINDING PHASES" below for details.
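The exceptionClass convention described above (vendor in the high 4 bytes, language in the low 4 bytes) can be sketched as a pair of helpers. The function names are hypothetical, and the sketch assumes that the first character of each tag lands in the most significant byte of its half; the text does not spell out the byte order.

```cpp
#include <cstdint>

// Packs a 4-character tag into 32 bits, first character most significant
// (an assumed ordering). The array has 5 elements because of the literal's
// implicit terminator.
std::uint32_t pack4(const char (&s)[5]) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(s[0])) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(s[1])) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(s[2])) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(s[3]));
}

// Vendor goes in the high 4 bytes, language in the low 4 bytes.
std::uint64_t make_exception_class(const char (&vendor)[5],
                                   const char (&language)[5]) {
    return (static_cast<std::uint64_t>(pack4(vendor)) << 32) | pack4(language);
}

// A runtime can recognize a C++ exception from any vendor by the low half.
bool is_cxx_exception(std::uint64_t exception_class) {
    return static_cast<std::uint32_t>(exception_class & 0xFFFFFFFFu) == pack4("C++\0");
}
```

Under that assumption, make_exception_class("HP\0\0", "C++\0") yields 0x48500000432B2B00.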
UNWINDING PHASES
Unwinding is a 2-phase process.
PASS 1 unwinds through the stack, looking for a "handler", that is, code that has the potential to stop the exception propagation. For C++, this would be a 'catch' clause. The first pass can do a "quick" unwind, meaning it does not need to maintain full register state.
PASS 2 starts once a handler has been found. For each stack frame that requires some cleanup, it performs that cleanup. For C++, this would be destructors in addition to catch clauses. If compensation code for some optimization is required, this is also the pass in which this code will be executed. During that pass, the stack is actually unwound, and full register state is restored prior to executing any cleanup, compensation or handler code.
[Note: Cleanup code is code doing some user-defined cleanup such as destructors. Compensation code is code inserted by the compiler to compensate for an optimization that moved code past the throwing call. Handler code is user-defined code that possibly can resume normal execution]
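The two passes can be modeled with a toy stack of frames. Everything in this sketch is hypothetical (FrameSketch, UnwindResult, two_phase_unwind); it models only the ordering of the search and cleanup passes, not registers, landing pads or personality routines.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct FrameSketch {
    std::string name;
    bool has_cleanup;   // frame has destructors or similar to run
    bool has_handler;   // frame can stop propagation (e.g. a catch clause)
};

struct UnwindResult {
    bool handled;                       // false if pass 1 found no handler
    std::vector<std::string> cleanups;  // frames whose cleanup ran in pass 2
    std::string handler;                // frame that handled the exception
};

// frames[0] is the innermost (throwing) frame.
UnwindResult two_phase_unwind(const std::vector<FrameSketch>& frames) {
    UnwindResult result;
    result.handled = false;

    // Pass 1: search only, nothing on the stack is modified.
    std::size_t handler_index = 0;
    while (handler_index < frames.size() && !frames[handler_index].has_handler)
        ++handler_index;
    if (handler_index == frames.size())
        return result;  // uncaught: pass 2 never starts

    // Pass 2: run cleanups for every frame below the handler, innermost first.
    for (std::size_t i = 0; i < handler_index; ++i)
        if (frames[i].has_cleanup)
            result.cleanups.push_back(frames[i].name);

    result.handled = true;
    result.handler = frames[handler_index].name;
    return result;
}
```

Because the search finishes before any cleanup runs, an uncaught exception is detected while the stack is still intact.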
The unwinding phase argument to the personality routine is a bitwise or of the following constants:
EH_FORCE_UNWIND = 32: During pass 2, indicates that no language is allowed to "catch" the exception. This flag is set while unwinding the stack for setjmp or during thread cancellation. User-defined code in a catch clause may still be executed, but the catch clause has to resume unwinding at its end.
TRANSFERRING CONTROL TO A LANDING PAD:
In the case the personality routine wants to transfer control to a landing pad, it sets up registers (including IP) to suitable values for entering the landing pad. Prior to executing code in the landing pad, registers not altered by the personality routine will be restored to the exact state they were in that frame before the call that threw the exception.
The landing pad can either resume execution to normal (as, for instance, at end of a C++ catch), or resume unwinding by calling the _ResumeUnwind function and passing it the 'exceptionObject' argument received by the personality routine. _ResumeUnwind will never return.
_ResumeUnwind should be called if and only if the personality routine did not return EH_HANDLER_FOUND during phase 1. In other words, the unwinder can allocate some resources (for instance memory) and keep track of them in the exception object reserved words. It should then free these resources before transferring control to the last (handler) landing pad. It does not need to free the resources before entering non-handler landing-pads, since _ResumeUnwind will ultimately be called.
The landing pad will receive various arguments from the runtime, typically passed in registers set using _Unwind_setGR by the personality routine. For a landing pad that can lead to _ResumeUnwind, one argument must be the exceptionObject pointer, which must be preserved to be passed to _ResumeUnwind. [Note: Thanks to the 4 reserved words in the exception object, 2 landing-pad arguments have been eliminated.] The landing pad may receive other arguments, for instance a 'switch value' indicating the type of the exception being caught.
RULES FOR CORRECT INTER-LANGUAGE OPERATION:
The following rules must be observed for correct operation between languages and/or runtimes from different vendors:
An exception which has an unknown class must not be altered by the personality routine. The semantics of foreign exception processing depend on the language of the stack frame being unwound. This covers in particular how exceptions from a foreign language are mapped to the native language in that frame.
If a runtime resumes normal execution, and the caught exception was created by another runtime, it should call _DeleteException. This is true even if it understands the exception object format (such as would be the case between different C++ runtimes). [Note: This is because the other runtime might have to update some global variables that point to the exception being deleted.]
A runtime is not allowed to catch an exception if the EH_FORCE_UNWIND flag was passed to the personality routine.
CATCHING FOREIGN EXCEPTIONS IN C++
Foreign exceptions can be caught in a catch(...). They can also be caught as if they were of a __foreign_exception class, defined in <exception>. [Note: The __foreign_exception may have subclasses, such as __java_exception and __ada_exception, if the runtime is capable of identifying some of the foreign languages.]
The behavior is undefined in the following cases:
A __foreign_exception catch argument is accessed in any way (including taking its address).
A __foreign_exception is active at the same time as another exception (either there is a nested exception while catching the foreign exception, or the foreign exception was itself nested)
uncaught_exception(), set_terminate(), set_unexpected(), terminate() or unexpected() is called at a time a foreign exception exists (for instance, calling set_terminate() during unwinding of a foreign exception)
[Note: All these cases might involve accessing the C++ specific content of the thrown exception, for instance to chain active exceptions]
Otherwise, a catch block catching a foreign exception is allowed:
To resume normal execution, thereby stopping propagation of the foreign exception and deleting it,
Or to rethrow the foreign exception. In that case, the original exception object should have been unaltered in any way by the C++ runtime.
A catch-all block may be executed during forced unwinding. For instance, a setjmp may execute code in a catch(...) during stack unwinding. However, if this happens, unwinding will proceed at the end of the catch-all block, whether or not there is an explicit rethrow.
Setting the low 4 bytes of exception class to C++\0 is reserved for use by C++ runtimes compatible with the common C++ ABI.
[990923 All] Extensive discussion at the meeting was generally positive about the HP proposal. Several changes came up, ranging from editorial to substantive. Christophe will modify the specification.
References to setjmp in the document should be to longjmp.
[991202 All] Since there was not much time for review of HP's revised proposal, discussion was limited to relatively minor comments. This remains the highest priority area, with ongoing implementations depending on resolution. We plan a thorough discussion next week, with adoption as soon as practical. Note that the consensus remains positive, with the expectation that the proposal will undergo only minor fixes before adoption, so implementations can proceed with the current document as a basis without great risk.
[991209 All] Several issues arose from the discussion of HP's exception handling specification:
[991216 All] The HP proposal is accepted as a working paper, subject to a number of minor issues which need to be resolved, and will be opened and tracked independently. SGI volunteered to do the necessary rework to put the material into a more ABI-oriented (rather than implementation-oriented) form. (This has been done for the base ABI unwind material as of 5 January.)
HP management has agreed to release the C++ exception handling runtime, but does not consider their unwind library suitable for release. SGI has agreed to release their unwind library. SGI is now (5 Jan) working on ABI conformance in preparation for doing so.
It was clarified (and should be in the document) that unwinding determines in Phase 1 that an exception will be uncaught, and calls terminate() before starting Phase 2.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-2 | Unwind personality routines | lib ps | closed | SGI | 990520 | 000106 |
Summary: The IA-64 runtime conventions provide for a personality routine pointer for language-specific actions when unwinding the stack. They do not specify its interface. There are typically two required actions for C++: locating a handler (non-destructively) and destroying automatic objects while unwinding. This issue involves specification of the API (see also D-3). | ||||||
Resolution: See the exception handling specification, level 1, and the working paper. |
[990826 Intel/HP] The Software Conventions document is claimed to specify the interface, with the parameters indicating which action is required. (I can't find it, but this would be an acceptable solution -- Jim.)
[991209 all] Observe that this issue is part of a level 1 specification, i.e. part of the base ABI. It is being described as part of the proposed common EH interface from HP.
[000106 all] Closed -- specified as part of the accepted exception handling specification.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-3 | Unwind process clarification | lib ps | closed | SGI | 990520 | 000106 |
Summary: The IA-64 runtime conventions provide for a personality routine pointer for language-specific actions when unwinding the stack. However, they are quite muddy about the precise sequence of calls. This issue involves specification of unwind process (see also D-2). | ||||||
Resolution: See the exception handling specification. |
[991209 all] Observe that this issue is part of a level 1 specification, i.e. part of the base ABI. It is being described as part of the proposed common EH interface from HP.
[000106 all] Closed -- specified as part of the accepted exception handling specification.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-4 | Unwind routines nested? | lib ps | closed | SGI | 990520 | 991209 |
Summary: The IA-64 runtime conventions call for the unwind personality routine to behave like a routine nested in the routine raising an exception. Is that the preferred definition? | ||||||
Resolution: This is not required, nor included in the proposed common implementation. However, a conforming implementation could add this option in the personality routine and tables. |
[990902 All] Discussion reveals that Intel and HP have very different models of how cleanup actions are handled.
Intel builds one or more routines which are called from the unwind runtime, based on action descriptors in the unwind tables, and acting on the stack contents or objects to be destroyed without actually modifying the stack pointer until the final transfer of control to the user handler. This approach avoids actually restoring registers until the final transfer to the handler.
HP transfers control back to a user landing pad whenever anything needs to be done -- descriptors or handlers -- and reenters the unwind runtime if further processing is required. They believe this approach to use much less space than the action descriptors would, and most importantly, that it allows arbitrary fixup for code motion around the call that throws.
[991209 All] An implementation can conform with the proposed C++ personality routine interface and either support or not support nested handlers -- the only requirement is that the generated personality tables and routine collaborate. The proposed common EH interface from HP does not use nested functions as handlers, but could easily be extended.
This issue is closed, with the immediate resolution of changing the base unwind ABI to not require nested function handlers.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-5 | Interaction with other languages (e.g. Java) | lib ps | closed | HP | 990603 | 991007 |
Summary: The IA64 exception handling framework is largely language independent. What is the behaviour of a C++ runtime receiving, for instance, an exception thrown from Java? Does it call terminate()? Does it allow the exception to pass through C++ code with destructors if there is no catch clause? Does it allow the exception to be caught in a catch(...) provided this catch(...) ends with a rethrow? Does it allow even more? | ||||||
Resolution: In general, foreign exceptions will cause normal destructor invocation and other cleanup in C++ code, and will pass through C++ frames except where explicit exception specifications do not allow them. |
[990908 SGI -- Jim] We propose that this be resolved by identifying the source language in the exception descriptor and specifying that the personality routine be able to perform cleanup actions during handling of foreign-language exceptions, but not attempt to catch them.
[991006 All] The consensus of the group, from the discussion of the low-level exception API, is:
[991007 All] In addition to the above, Christophe will define an exception __foreign_exception to be used by foreign-language code which wants to raise an exception that C++ can catch.
Close this issue.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-6 | Allow resumption in other languages? | lib ps | closed | HP | 990603 | 991007 |
Summary: The exception handling framework requires the interaction of the runtime of all the languages "on the stack" during exception processing. Some of these languages may have very different exception handling semantics. What are the constraints we impose on the C++ exception handling runtime to preserve the relative language neutrality of the EH framework? Example: do we allow a handler to clean up and resume at the point where the exception was thrown? | ||||||
Resolution: Moot -- resume-type exceptions are more appropriately handled by registering trap handlers and processing them in place. No interaction with stack traceback should be necessary. |
[990908 SGI -- Jim] The typical case of cleanup and resume is floating point trap handling, which is normally handled entirely in the original FP trap handler. Is there an example where stack walkback must occur to identify the handler, but resumption at the point-of-exception is required? I can't think of any, and I think the model of registering a trap handler is preferable for such purposes.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-7 | Interaction with signals or asynch events | lib ps | closed | HP | 990603 | 991209 |
Summary:
The Standard says that the behavior of anything other than
"pure C code" (POF) is implementation defined,
and warns (in a note) against using EH in a signal handler.
We should define what is supported,
possibly explicitly stating that signal handler code must be a POF.
We could allow any feature but exception handling to be used.
We could allow some EH routines to be called
(for instance, uncaught_exception() ).
Or we could allow even an exception to be thrown,
if it does not exit the handler.
| ||||||
Resolution: This ABI requires no support beyond the Standard requirements. |
[991006 All] This common ABI will not allow throwing exceptions from a signal handler.
[991007 All] There remains concern about how to help customers (examples were presented of big database applications) for which raising exceptions from signal handlers for I/O failures is a highly desirable design. We will revisit this issue.
[991209 All] Further discussion clarified the situation.
The fundamental problem is that exceptions thrown from a signal handler (or otherwise asynchronously) may appear at arbitrary points in the program, where the unwind information is inadequate to reliably clean up, for instance because global variable updates have been moved across the point of exception.
A second problem is that signals are often processed on their own stack, and making the transition to the main user stack might not happen automatically.
As a result, it was generally agreed that dealing with exceptions raised asynchronously would require simply passing through the immediately enclosing stack frame (to avoid the first problem), and a special raise invocation (as a basis for addressing both).
However, the only customer that has been adamant about supporting asynchronous exceptions has also been adamant that such a partial solution would not be adequate. Their intended application involves raising the exception in a simple routine that they expect to be inlined (for performance reasons) directly into a try block, which would be bypassed by the proposed solution. Since making this work would involve significant performance penalties elsewhere, the group's consensus is that there is inadequate benefit from an attempted solution.
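One pattern that stays within the Standard's restrictions is to keep the handler a POF that only records the event, and to raise the exception later from ordinary code. The following is a minimal sketch of that pattern, not something this ABI defines; the names io_failed, on_signal, check_io, and demo_deferred_throw are hypothetical. It does not give the asynchronous behavior the customer asked for, since the throw happens only where the program polls.

```cpp
#include <csignal>
#include <stdexcept>

// Flag written by the handler; volatile sig_atomic_t is the object type
// the Standard allows a signal handler to write safely.
static volatile std::sig_atomic_t io_failed = 0;

// The handler is a POF: it records the event and returns, with no EH.
extern "C" void on_signal(int) { io_failed = 1; }

// Called synchronously from normal code, where unwind information is
// reliable; converts the recorded event into an exception.
void check_io() {
    if (io_failed) {
        io_failed = 0;
        throw std::runtime_error("I/O failure reported by signal");
    }
}

// Installs the handler, simulates the signal with raise(), and reports
// whether the deferred exception was observed.
bool demo_deferred_throw() {
    std::signal(SIGINT, on_signal);
    std::raise(SIGINT);               // handler runs and sets the flag
    bool caught = false;
    try {
        check_io();
    } catch (const std::runtime_error&) {
        caught = true;
    }
    std::signal(SIGINT, SIG_DFL);     // restore the default disposition
    return caught;
}
```

A caller would invoke check_io() at points where it is safe to unwind, for example after each I/O operation.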
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-8 | Interaction with threads packages | lib ps | closed | SGI | 990603 | 000106 |
Summary:
What happens when an exception is not caught in the thread where raised?
What does uncaught_exception() return if another thread is currently processing an exception?
| ||||||
Resolution:
With one exception, exception handling is entirely per-thread --
exceptions must be caught in the thread where raised,
and queries about them (e.g. uncaught_exception() )
are answered only with respect to the thread doing the query.
The only global exception behavior is handler registration --
see issue D-15.
|
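The per-thread behavior in this resolution can be observed with a small experiment. The sketch below is an illustration under assumptions: it uses std::uncaught_exceptions(), the C++17 successor of uncaught_exception(), together with std::thread, neither of which existed when this issue was closed. The names Observed, Probe, and observe_per_thread_state are hypothetical.

```cpp
#include <exception>
#include <thread>

// Records what each thread observed while the first thread was unwinding.
struct Observed {
    int in_unwinding_thread = -1;
    int in_other_thread = -1;
};

// Guard whose destructor runs during stack unwinding in the throwing thread.
struct Probe {
    Observed& out;
    explicit Probe(Observed& o) : out(o) {}
    ~Probe() {
        out.in_unwinding_thread = std::uncaught_exceptions();
        // A second thread queries at the same moment. It has no exception
        // in flight, so the query is answered for that thread only.
        std::thread other([this] {
            out.in_other_thread = std::uncaught_exceptions();
        });
        other.join();
    }
};

Observed observe_per_thread_state() {
    Observed result;
    try {
        Probe p(result);
        throw 42;
    } catch (int) {
        // Caught in the thread where it was raised.
    }
    return result;
}
```

With a conforming implementation the unwinding thread should report one uncaught exception and the helper thread should report none.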
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-9 | longjmp interaction | lib ps | closed | IBM | 990908 | 000113 |
Summary: Does longjmp run destructors? | ||||||
Resolution: Define an alternate routine, longjmp_unwind in namespace abi, defined in new header cxxabi.h, which always does full cleanup during unwinding. |
[990908 IBM -- Mendell] Does longjmp run destructors? I believe that the C ABI makes this optional. I would like to propose that it does run destructors.
[990908 SGI -- Wilkinson] The C++ standard, 18.7 paragraph 4, says a call to longjmp has undefined behavior if any automatic objects would have been destroyed by a throw/catch with the same source and destination. I don't see that this is something we need to fix.
[990908 IBM -- Thomson] Yes it does, but ANSI is not my customer. Meeting the bare minimum of function that ANSI requires doesn't necessarily mean that users can build robust applications. How can they know to avoid longjmp in their C code, because some third party library they are using has C++ buried in it?
[990908 SGI -- Dehnert] Implementation is a significant issue. The normal longjmp implementation is very simple -- setjmp stores the register/stack state, and longjmp copies it back and branches. There is normally no traceback involved, so what you suggest is a dramatic change, and probably would make C people very unhappy. Furthermore, C++ users have the option of using C++ exceptions, which have the effect you seek.
[990908 SGI -- Boehm] The problem is that on the C side:
I don't know whether it's possible to avoid breaking these clients while providing the stack-unwinding semantics.
[990908 IBM -- Mendell/Thomson] [VisualAge C++] on OS/2 and Windows does do the unwinding. This is probably because unwinding support is in the OS. Also OS/390 and I believe AS/400 too. Our AIX implementation does not do the unwinding.
[990909 DEC -- Brender] In addition to the systems already mentioned by others, these systems also do exception-handling compatible unwinding for C's setjmp/longjmp:
If you believe in safe and compatible multi-language systems, there really is no choice but to do EH compatible unwinding for setjmp/longjmp -- at least by default.
I suppose it would be OK for an implementation to offer an alternate setjmp/longjmp that could be linked in for those who either know that it is safe in particular cases or are happy to trade safety for speed...
[990909 All] A brief discussion agreed that consensus is not absolutely necessary. An implementation could replace setjmp/longjmp with a version that either unwinds or just restores and jumps, without breaking any code except that which assumed one or the other. (Ed.: In fact, if setjmp stores enough information to either restore or to catch an exception, one could just swap longjmp, although that would not be optimal for the unwind and catch case, since setjmp doesn't need to save much information in that case as most of what is needed is in the unwind descriptors.)
[990923 All] We agreed that:
See the HP low-level exception writeup at the beginning of the exception issues section.
[991216 All] Use the name longjmp_unwind for the alternate longjmp that always does full C++ unwinding. The issue of where to put it (namespace and header) remains.
[000106 All]
We agreed to define a new header for ABI definitions,
initially containing this and the special exception objects agreed upon.
SGI will create an initial version.
We also agreed to put ABI-defined new features in an "abi" namespace.
Therefore, for this issue, we have a prototype in cxxabi.h:
namespace abi {
extern "C" void longjmp_unwind (jmp_buf env, int val);
}
[000113 All] Accept as described and close.
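As noted in the discussion, C++ callers can already get full cleanup by using throw/catch in place of setjmp/longjmp. The sketch below shows that equivalent so the intended effect of longjmp_unwind is concrete. It does not call longjmp_unwind itself, which a given implementation may not ship; Jump, Resource, inner, and run_with_cleanup are hypothetical names.

```cpp
#include <string>
#include <utility>
#include <vector>

struct Jump { int val; };  // hypothetical: plays the role of longjmp's val

static std::vector<std::string> log_lines;

// Automatic object whose destructor must run when control leaves its scope.
struct Resource {
    std::string name;
    explicit Resource(std::string n) : name(std::move(n)) {}
    ~Resource() { log_lines.push_back("destroy " + name); }
};

void inner() {
    Resource r("inner");
    throw Jump{7};          // analogous to longjmp_unwind(env, 7)
}

int run_with_cleanup() {
    log_lines.clear();
    try {                   // analogous to the setjmp site
        Resource r("outer");
        inner();
    } catch (const Jump& j) {
        return j.val;       // analogous to setjmp returning val
    }
    return 0;
}
```

Both Resource objects are destroyed, innermost first, before control reaches the handler. A plain longjmp across the same frames has undefined behavior under the C++ standard.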
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-10 | psABI proposal | lib ps | closed | all | 991216 | 000120 |
Summary: Solidify the Level I (psABI) specification and submit it to the base ABI group. | ||||||
Resolution: See the exception handling specification. |
[991216 All] This is essentially Section 8 of the HP working paper. SGI has reworked it into the draft exception handling specification. This group needs to approve the reworked version, at which time it can be submitted to the base ABI group.
The draft needs to clarify that the unwinder will detect uncaught
exceptions in Phase 1, and call terminate()
before Phase 2.
Issues D-11 through D-14 below are also relevant to the Level I
specification.
[000120 All] Close with minor modifications.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-11 | pthreads interface | lib ps | closed | all | 991216 | 000203 |
Summary: Certain pthreads functionality is a prerequisite, e.g. to acquire thread-local storage. The ABI should specify the requirements, along with the expected stub behavior when the pthreads library is not present. | ||||||
Resolution: No specification necessary. This is Level 3 material. |
[000106 All] Christophe will extract a list of what the HP library expects and send it.
[000120 HP -- Christophe]
Data types:
Functions:
Extra expected functionality:
[000201 SGI -- Jim]
We propose that the following functionality be required of the base ABI. The definitions are based on the pthreads package, with multi-threading semantics. However, it is expected that an implementation will provide default versions in the C++ (or C) library for single-threading programs, and override them in the thread library for multi-threading cases.
Two sets of functionality are provided: once-only initialization, and thread-private data key management. The group also wants a means of identifying whether the real pthreads implementation is present -- I have not yet proposed such a feature.
typedef ... pthread_once_t;
pthread_once_t once_control = PTHREAD_ONCE_INIT;
int pthread_once ( pthread_once_t *once_control,
void (*init_routine) (void) );
The purpose of the pthread_once routine is to execute a particular initialization routine exactly once in a thread-safe manner. The user declares a control variable of type pthread_once_t statically initialized to PTHREAD_ONCE_INIT, and passes it to the pthread_once routine. The first time pthread_once is called with a given once_control argument, it calls init_routine with no argument and changes the value of the once_control variable to record that initialization has been performed. Subsequent calls to pthread_once with the same once_control argument do nothing. pthread_once always returns 0.
The default single-threaded implementation need not lock accesses to once_control, whereas overriding versions in multi-threading libraries presumably will.
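The once-only behavior described above can be sketched as follows, assuming a platform where the real pthreads implementation is linked in. The counter init_calls and the function initialize_twice are hypothetical names used only for illustration.

```cpp
#include <pthread.h>

// Control variable, statically initialized as the description requires.
static pthread_once_t once_control = PTHREAD_ONCE_INIT;

// Counts how many times the initialization routine actually ran.
static int init_calls = 0;

extern "C" void init_routine(void) { ++init_calls; }

// Calls pthread_once twice with the same control variable and returns the
// number of times init_routine has run so far.
int initialize_twice() {
    pthread_once(&once_control, init_routine);
    pthread_once(&once_control, init_routine);  // does nothing
    return init_calls;
}
```

However many times initialize_twice() is called, init_routine should run only once for this control variable.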
typedef ... pthread_key_t;
int pthread_key_create ( pthread_key_t *key,
void (*destr_function) (void *) );
int pthread_setspecific ( pthread_key_t key,
const void *pointer );
void * pthread_getspecific ( pthread_key_t key );
The purpose of this functionality is to allow a program to manage data segments which are specific to a particular thread, but are identified by a key common to all threads. It is required in the C++ exception handling library, for example, to maintain thread-specific active exception lists.
The user program must first create a key variable of type pthread_key_t. It then obtains an identifying key value from the implementation by calling pthread_key_create, also specifying at that time a destructor routine that will be called if a thread terminates, with a single argument that is the value associated with the key for the terminating thread. This destructor call is only made if the associated value is not NULL, and it is set to NULL before making the call. If successful, pthread_key_create returns zero, places the value of the key identifier in *key, and initializes the value associated with the key to NULL for all threads. If unsuccessful, e.g. exceeding the number of allocated keys, it returns an error code.
A user thread may then associate a value with the key, typically the address of a thread-specific data area, by calling pthread_setspecific. If successful, pthread_setspecific returns zero. If unsuccessful, e.g. because of an invalid key identifier, it returns an error code.
Later, a thread can obtain the value it has associated with the key by calling pthread_getspecific, which returns the value associated with key on success, and NULL on error.
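The key-management sequence above can be sketched with two threads sharing one key, again assuming the real pthreads library is present. The names destroy_value, worker, run_key_demo, and destructor_calls are hypothetical. The sketch also uses pthread_create, pthread_join, and pthread_key_delete, which are standard POSIX calls but not part of the proposal above.

```cpp
#include <pthread.h>

static pthread_key_t key;          // key common to all threads
static int destructor_calls = 0;   // how often the key destructor ran

// Destructor routine: receives the terminating thread's non-NULL value.
extern "C" void destroy_value(void* p) {
    ++destructor_calls;
    delete static_cast<int*>(p);
}

// Worker thread: checks its value starts as NULL, stores its own datum,
// reads it back, and reports the result through the out-parameter.
extern "C" void* worker(void* out) {
    bool started_null = (pthread_getspecific(key) == nullptr);
    int* mine = new int(123);
    pthread_setspecific(key, mine);
    int* got = static_cast<int*>(pthread_getspecific(key));
    *static_cast<int*>(out) = (started_null && got == mine) ? *got : -1;
    return nullptr;                // destroy_value runs at thread exit
}

// Returns the value the worker saw (123 on success), or -1 on failure.
int run_key_demo() {
    if (pthread_key_create(&key, destroy_value) != 0) return -1;
    int main_value = 7;
    pthread_setspecific(key, &main_value);

    int seen = 0;
    pthread_t t;
    if (pthread_create(&t, nullptr, worker, &seen) != 0) return -1;
    pthread_join(t, nullptr);

    // The main thread's value is unaffected by the worker's value.
    bool main_intact = (pthread_getspecific(key) == &main_value);
    pthread_setspecific(key, nullptr);  // stack address: nothing to destroy
    pthread_key_delete(key);
    return main_intact ? seen : -1;
}
```

Each thread sees only the value it stored, and the destructor runs once, for the worker's value, when that thread exits.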
[000203 All] It turns out that some (but not all) Unix implementations provide stubs for some of the pthreads routines in libc or equivalent that, rather than implementing a simplified form of the functionality, return an error code indicating that pthreads is not loaded. A specification such as the above would therefore cause compatibility problems.
These functions are only used in the exception handling library at Level 3, i.e. they are part of the interface between the system-specific implementation and other system-provided libraries, and do not involve interfaces to either compiled code or other components not under control of the system vendor. Therefore, no specification is needed.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-12 | Table location | lib ps | closed | all | 991216 | 000504 |
Summary: Determine constraints on the location of the unwind table and the unwind information table. | ||||||
Resolution: The unwind tables must reside in the text segment they describe. |
[991216 SGI -- Jim]
The unwind table consists of triples:
a begin and end location bounding the code fragment described by the
unwind descriptors,
and the location of the unwind information for this fragment.
The base psABI states that these are segment-relative offsets,
to avoid the need to relocate them at runtime.
It also specifies a section type and name for the unwind table,
with attribute SHF_ALLOC (but not writable),
as well as a segment type,
but does not specify the unwind information table section information.
The psABI specification leaves open the question of how to identify the relevant segments for the unwind table segment-relative entries. There are several possibilities:
Forcing the unwind information tables into the text segment is constraining. Given that their format is undefined by the ABI (i.e. the language-specific data area), the severity of that constraint is not fully predictable. It would, for example, interfere with the bias in some systems to avoid data in text segments.
[000120 HP -- Cary] The first bullet you listed is the intended method. Both the unwind table and the unwind info blocks are intended to be in the same segment as the text with which they're associated. Thus, any segment-relative addresses in those tables are understood to refer to locations in the same segment.
To overcome any limitations that placing info blocks in text might impose, we designed the LTV family of relocations, which allows a link-time virtual address to be placed in an info block without requiring a dynamic relocation; the consumer is expected to be able to calculate from context what segment the LTV address refers to so it can relocate the address on the fly. We also have the LTOFF_FPTR family of relocations, which is needed to identify the personality routine as a gp-relative offset to a linkage table entry that contains the function pointer.
The advantages to this scheme are that there are no dynamic relocations for any unwind information (except function pointers in the GOT created by LTOFF_FPTR), and that the unwind information does not cause any increase in the application's per-process data space.
In order to unwind correctly, it's important that there is a one-to-one relationship between text segments and unwind tables. The dynamic loader needs to keep track of these relationships, so that the unwinder can find the appropriate unwind table, given a pc value.
Instead of a table of triples, there is a PT_UNWIND program header table entry that locates the unwind information for a load module; this entry is intended to refer to a subset of the text segment. It's through this entry that the dynamic loader finds the unwind table.
[991224 SGI -- Jim] My concern with this comes from the possibility of generating multiple text segments. In such a case, if an implementation wants to put the unwind information in a separate segment from text, there's no longer a trivial way to find the associated text segments for fixup. And although I have no objection to putting these in text today for C++, I'm concerned that a future requirement for C++ or some other language might make it desirable to put them in data. If there's a simple way of making this work, I'd like to pursue it.
[000126 HP -- Cary]
Re. multiple text segments...
Our position is that we would only need more than one text segment in a single load module where we need to establish different access permissions for some text pages than for others. In such a case, we consider them to be separate -- but contiguous -- text segments from the loader's point of view, and a single text segment from the unwinder's point of view. Therefore, we still need only one unwind table per load module.
This points out the hazy definition of "segment" and "program header table entry" in the ELF specification. Some program header table entries describe segments that are disjoint from all other segments, while others (like PT_DYNAMIC and PT_UNWIND) describe "sub-segments" that are really part of another segment.
Re. unwind tables in data...
The performance bigots here would *never* let me put the unwind tables in the data segment. Nevertheless, if some language-specific data really needs to be in data, it can be arranged by putting "LTV" pointers in the language-specific data that point to an auxiliary block of info in the data segment. A much earlier version of our C++ exception handling tables in fact did just that.
("LTV" pointers are "link-time virtual" addresses. At link time, an LTV relocation works just like the corresponding DIR relocation, except that no dynamic relocation is generated, so the associated word can be placed in a read-only segment. The consumer of that pointer must, at run time, figure out what segment the link-time virtual address refers to and apply the appropriate relocation factor to the address. The required information can be obtained from the dynamic loader. Note that this scheme requires that the linker-assigned addresses for all of the loadable segments do not overlap.)
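The run-time step described for LTV pointers (find the segment that a link-time address falls in, then apply that segment's relocation factor) might look like the sketch below. The Segment structure and relocate_ltv function are hypothetical; a real consumer would get the segment information from the dynamic loader. It relies on the stated requirement that linker-assigned segment addresses do not overlap.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical description of one loadable segment, as a loader might
// report it.
struct Segment {
    std::uint64_t link_base;  // linker-assigned (link-time virtual) start
    std::uint64_t size;       // size of the segment in bytes
    std::uint64_t load_base;  // address at which it was actually mapped
};

// Translates a link-time virtual address into a run-time address, or
// returns no value if the address belongs to no known segment.
std::optional<std::uint64_t> relocate_ltv(const std::vector<Segment>& segs,
                                          std::uint64_t ltv) {
    for (const Segment& s : segs) {
        if (ltv >= s.link_base && ltv - s.link_base < s.size) {
            // Relocation factor is load_base - link_base for this segment.
            return s.load_base + (ltv - s.link_base);
        }
    }
    return std::nullopt;
}
```

Because the link-time ranges are disjoint, at most one segment can match a given address, so the result is unambiguous.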
No, but the dynamic loader does have access to it.
When we need to find an unwind table, we ask the dynamic loader: given a pc value, its dlmodinfo() entry point locates the load module containing that text segment, and returns a struct load_module_desc, which contains, among other things, a pointer to the unwind table for that load module.
[991226 SGI -- Jim] An observation, then: in order to make this work, we should specify how to obtain this information in the psABI, unless dlmodinfo() is already standard.
[000203 All] To understand this issue better, we worked through the EH structures looking at references:
It contains references:
It contains references:
It contains references:
[000323 Inprise -- Eli] I'd like it if we could avoid imposing data structures on the language implementations where possible. I'd particularly like to avoid this in the area of exception handling, as this is a place where different languages need to cohabitate in the process space. That's partly why I was happy to see the functional interface in the C++ exception handling doc that you folks did. My problems with the existing gcc mechanism revolve around the total commitment requirement to the gcc data format, which prevents me from even throwing exceptions past gcc frames without dying unless I fully conform to their data format.
The updated proposal seems to handle most of my concerns, but I'd still like to see the PC map hidden, so that language implementors can do as they see fit with this. To that end, I'd like to toss out the following additions. Note that these are tentative, based on my fiddling with it just a bit for the past day or so. I'm going to do a prototype to see how it holds together.
I would like to see the unwind tables registered with the _Unwind library,
and referenced only through callbacks,
like this:
typedef __personality_routine
(*_Unwind_IPLookupFn) (uint64 IP, void **pImplementationData);
int _Unwind_RegisterIPLookup
(_Unwind_IPLookupFn LookupFn, uint64 StartAddr, uint64 EndAddr);
void _Unwind_UnregisterIPLookup (_Unwind_IPLookupFn LookupFn);
The first function takes the address of a lookup function which returns a personality and pointer to implementation specific data based on an IP. Start and end addresses are made available so that the _Unwind library can optimize calls to these routines. When an exception is raised, the _Unwind library looks up the current IP by calling these registered procedures. The need for something like this was implied in the Intel Software Conventions and Runtime Architecture Guide, Chap 11 (SCRAG is what I'll call it). Section 11.1.2 says that the dynamic loader needs to provide an API for finding the unwind table. I've just changed the 'ownership' of the data a bit.
The second function lets you uninstall a lookup function. That's for when you're unloading, and you don't want to leave bad fn pointers floating. Yes, the RTL for the language does have to cooperate, or things can go south a considerable time after a module unloads.
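To make the proposed registration scheme concrete, here is a small model of the bookkeeping such a library might do. It is a sketch under assumptions, not the _Unwind library: everything lives in a hypothetical namespace sketch, Personality stands in for __personality_routine, and the address range is treated as half-open [start, end), which the proposal does not specify.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

namespace sketch {

using Personality = int (*)();  // stand-in for __personality_routine
using IPLookupFn = Personality (*)(std::uint64_t ip, void** impl_data);

struct Entry {
    IPLookupFn fn;
    std::uint64_t start;
    std::uint64_t end;  // assumed exclusive
};

static std::vector<Entry> registry;

// Registers a lookup function for an address range; returns 0 on success.
int register_ip_lookup(IPLookupFn fn, std::uint64_t start, std::uint64_t end) {
    if (fn == nullptr || start >= end) return -1;
    registry.push_back({fn, start, end});
    return 0;
}

// Removes every registration of fn, e.g. before its module is unloaded.
void unregister_ip_lookup(IPLookupFn fn) {
    registry.erase(std::remove_if(registry.begin(), registry.end(),
                                  [fn](const Entry& e) { return e.fn == fn; }),
                   registry.end());
}

// Asks only the lookup functions whose range covers ip; the ranges let the
// library skip callbacks that cannot match.
Personality find_personality(std::uint64_t ip, void** impl_data) {
    for (const Entry& e : registry) {
        if (ip >= e.start && ip < e.end) {
            if (Personality p = e.fn(ip, impl_data)) return p;
        }
    }
    return nullptr;
}

// Demo language runtime: one personality and one blob of private data.
static int demo_data = 5;
int demo_personality() { return 1; }
Personality demo_lookup(std::uint64_t, void** impl_data) {
    *impl_data = &demo_data;
    return demo_personality;
}

}  // namespace sketch
```

After unregistering, no lookup for the module's addresses reaches the stale function pointer, which is the unloading hazard that the second function addresses.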
The personality routine as it is stated in the C++ ABI doesn't have the
implementation specific data passed to it.
I'd like to add that:
typedef _Unwind_Reason_Code(*__personality_routine)
( int version,
_Unwind_Action actions,
uint64 exceptionClass,
_Unwind_Exception *exceptionObject,
_Unwind_Context *context,
void *ImplementationData );
The ImplementationData parameter is the item that is returned by the lookup function that resolves the personality for a given IP.
Given these changes, the format of most of the unwind data in chapter 11 of the SCRAG becomes mostly advisory (the frame info was already made so by the current document). Chapter 11 could essentially become an appendix implementation that could be used by implementors if they chose, but not forced on them. The other thing that I like about the lookup registering is that it allows implementors to innovate with respect to fast lookup schemes within a loadable module. The current scheme allows for no innovation whatever. I'd prefer that the implementors be left with the option to build as fancy or as simple a scheme for lookups and frame decomposition as possible, depending on the needs of the language.
[000406 All] There was some discussion of Eli's suggestion, centered on the observation that registration might be useful for situations like Java run-time compilation, where neither the unwind tables nor the text they reference exist at startup time. We agreed to go off and consider how we intended to deal with that situation.
Cary Coutant mentioned in a private conversation that he expects this to be handled by having the Java compiler (for example) register additional unwind tables with the dynamic linker. Since the HP implementation gets the table locations from the dynamic linker, this makes the additions transparent to the unwind library.
[000406 HP -- Christophe] An interesting observation was raised at today's C++ ABI meeting. Can we dynamically generate unwind tables, for instance from a JIT? We are back to the question of whether the IP->UnwindInfo translation can be done just by looking up tables, or whether we need an API to do it.
I had a discussion with Laurent Morichetti a few minutes ago. It is unclear at this point whether their unwinding would be based on the unwind library at all (there are alternatives, such as encoding unwind information themselves). But assuming they want to leverage all the code that deals with the RSE and all that magic, they need to have a way to be compatible with the unwind library.
Today, the unwind library uses dlmodinfo to find the start of the code segment for the current IP (and a predefined symbol in the case of archive-bound executables). From there, it can find the start of the unwind table, and from there do a binary search on the IP to find the unwind info block.
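The final step, a binary search of the unwind table on the IP, can be sketched as follows. This is an illustration only: UnwindEntry and find_unwind_entry are hypothetical names, the entries follow the triple layout described earlier in this issue (segment-relative begin, end, and info offsets), and the table is assumed to be sorted by begin with non-overlapping ranges.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One unwind table entry; all three fields are segment-relative offsets.
struct UnwindEntry {
    std::uint64_t begin;  // start of the code fragment
    std::uint64_t end;    // end of the code fragment (exclusive)
    std::uint64_t info;   // location of the unwind info block
};

// Returns the entry covering ip, or nullptr if no fragment contains it.
const UnwindEntry* find_unwind_entry(const std::vector<UnwindEntry>& table,
                                     std::uint64_t segment_base,
                                     std::uint64_t ip) {
    if (ip < segment_base) return nullptr;
    const std::uint64_t off = ip - segment_base;
    // First entry whose begin is greater than off; the candidate is the
    // entry just before it.
    auto it = std::upper_bound(
        table.begin(), table.end(), off,
        [](std::uint64_t o, const UnwindEntry& e) { return o < e.begin; });
    if (it == table.begin()) return nullptr;
    --it;
    return off < it->end ? &*it : nullptr;
}
```

An IP that falls in a gap between fragments yields nullptr rather than the nearest entry.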
The JVM could be compatible with this black magic by having a way to tell dld what to return for the newly created code segment. I don't think there is a public dld interface to do that, and it creates a rather obscure and difficult to document dependency between the JVM, the unwind library and dld.
Alternatively, we could have a couple of APIs to do IP->UnwindInfo translation, and to register a new range of text and provide the corresponding unwind info pointer. In that scheme, the actual location of the unwind table would become irrelevant.
Also note that in addition to Java support, an implementation of Dynamo for IA64 would probably have a similar problem.
[000502 SGI -- Jim] Unfortunately, though I'm not real happy with forcing the unwind tables into the text segment being described, and believe that we could avoid that restriction without significant complications, I think the current scheme is workable for mainstream systems, and I suspect that changing it at this point will encounter more resistance than we can overcome. So without a groundswell of support for a more general scheme, we should probably close this with the current approach.
[000504 All] Agreed as suggested. That is, the unwind table and descriptors are to be generated in the same text segment as the code to which they refer. The dynamic linker (ld.so) can find it via the PT_IA_64_UNWIND program header entry, and should provide an internal implementation-defined interface to the unwind library to map a PC to the associated unwind table, which is outside the scope of this C++ ABI.
To deal with applications that create code and unwind information dynamically (e.g. Java JITs), the base ABI should define an interface by which the application can register a new code/unwind data pair with ld.so. This issue has been submitted to the psABI group.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-13 | _Unwind_ForcedUnwind | lib ps | closed | all | 991216 | 000120 |
Summary: Define the interface of _Unwind_ForcedUnwind. | ||||||
Resolution: See the exception handling specification. |
[000106 All] Coleen will send a description of their thread cancellation mechanism.
[000120 All] Close with minor modifications. Christophe will send a thread cancellation example writeup.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-14 | __cxa_begin/end_catch | lib | closed | all | 991216 | 001109 |
Summary:
Define the interfaces of __cxa_begin_catch and __cxa_end_catch.
| ||||||
Resolution: See the exception handling specification. |
[991216 All] Define how __cxa_begin_catch and __cxa_end_catch identify the thrown exception.
[991216 Compaq - Coleen] If you need to clean up more than one live exception from a catch handler, don't you need a 'count' parameter to __cxa_end_catch? In this case, you destroy both X and Y objects (whether or not they're both on the stack, or just X is).
Our equivalent of end_catch has a count parameter which is set to the number of live exception objects to delete and is used for branching out of the nested catch clause (not by rethrow).
struct X {
  X(); ~X(); };
struct Y {
  Y(); ~Y(); };
extern "C" int printf(const char *,...);
int main()
{
  try {
    throw X();
  } catch (X x) {
    try {
      throw Y();
    } catch(...) {
      // generates __cxa_end_catch(/*levels=*/2)
      return 1;
    }
  }
}
[991217 HP -- Christophe] The reason __cxa_end_catch does not need the exception argument is that the exceptions it is interested in are in the "caught stack". When you rethrow, the exception you rethrow is also on this caught stack (it is indeed the top of the stack). So you don't need a separate copy or argument.
All you need is a flag set by __rethrow, saying "this top exception is the one being just rethrown". In that case, when __end_catch finds that the exception exits its last catch block, it will not delete it. Instead, the exception will just be popped from the stack. As a result, the exception being rethrown remains on the caught stack until you exit the last catch that caught it, and then becomes referred to only through the exception object passed in the runtime (that is, it becomes similar to a new exception being thrown: it does not appear in the caught stack.) This is the "stack + 1" model I mentioned...
__begin_catch clears the flag, in case you catch the rethrown exception before exiting the last catch handler.
This mechanism is actually correctly specified in the description of __cxa_end_catch (see in particular the last bullet):
Upon exit from the handler by any means, the epilogue calls __cxa_end_catch(), which:
What is unclear, though, is the fact that __rethrow needs to pass a flag to __end_catch for that purpose, and also that the flag is stored in the high bit of the handlerCount (which is why it did not appear in the specification...).
[000112 editor] Does this mean that the specification on pg. 16 of the HP document is the desired definition?
[000126 editor] The answer to the above question is yes. This issue is effectively closed, but I will not close it officially until the working paper reflects the clarifications in the email discussion.
[001109 Editor] These routines are specified adequately in the Exception ABI document.
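The observable effect discussed in this issue, that leaving nested handlers destroys every exception caught in them, can be checked with a variant of Coleen's example that counts live objects. This is a sketch from the user's side only; it says nothing about how __cxa_end_catch is called. The names live, live_in_inner_handler, and leave_nested_handlers are hypothetical, and the count of 2 in the inner handler assumes C++17 copy elision on throw.

```cpp
// Number of X and Y objects (including exception objects) currently alive.
static int live = 0;

struct X {
    X() { ++live; }
    X(const X&) { ++live; }
    ~X() { --live; }
};

struct Y {
    Y() { ++live; }
    Y(const Y&) { ++live; }
    ~Y() { --live; }
};

// Live-object count sampled inside the innermost handler.
static int live_in_inner_handler = -1;

int leave_nested_handlers() {
    try {
        throw X();
    } catch (X& x) {          // caught by reference: no handler copy
        (void)x;
        try {
            throw Y();
        } catch (...) {
            live_in_inner_handler = live;  // both exceptions still alive
            return 1;                      // exits both handlers at once
        }
    }
    return 0;
}
```

After the return, both exception objects have been destroyed, so the live count is back to zero.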
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-15 | Terminate handler and threads | lib ps | closed | all | 991216 | 000106 |
Summary: Define how the terminate and unexpected handler registration interacts with threads. | ||||||
Resolution: Handler registration applies to all threads. |
[991216 All] C++ allows the user to register terminate() and unexpected() handlers, but does not specify how the registration interacts with threading. There are (at least) three possibilities:
Several members believe the second choice (per-thread) would be very surprising to many users and is therefore a highly undesirable default.
[000106 All] Handler registration is global, applying to all threads. It is observed that the global handler can be programmed to do thread-specific processing, e.g. by keying off a per-thread datum, but that many users would find it very surprising if the registration only worked for the calling thread.
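A quick way to see the global registration is to install a handler in one thread and query it from another. The sketch below assumes C++11 facilities (std::thread and std::get_terminate) that postdate this discussion; custom_terminate and registration_is_global are hypothetical names.

```cpp
#include <cstdlib>
#include <exception>
#include <thread>

// Replacement handler; it is only registered here, never invoked.
void custom_terminate() { std::abort(); }

// Registers a terminate handler in a helper thread and checks that the
// calling thread sees the same registration.
bool registration_is_global() {
    std::terminate_handler previous = std::get_terminate();
    std::thread t([] { std::set_terminate(custom_terminate); });
    t.join();
    bool visible = (std::get_terminate() == custom_terminate);
    std::set_terminate(previous);  // restore the earlier handler
    return visible;
}
```

A handler that needs thread-specific behavior can still key off a per-thread datum, as the resolution notes.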
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-16 | Exception specifications | lib ps | closed | all | 991216 | 000113 |
Summary: How is the type list for an exception specification represented in the action records? | ||||||
Resolution: As specified in the HP document |
[991216 All] The working paper specifies this, but HP wishes to propose a different representation.
[000106 All] Christophe believes the submitted version may actually be the desired one. He will attempt to determine this, and others should look at it closely to determine whether it has a large combinatorial impact on the compiler.
[000113 All] No one has identified a problem with the proposal in the HP document. Close this issue, and it can be reopened if a problem surfaces.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-17 | bad_cast, bad_typeid runtime | call | closed | CodeSourcery | 000629 | 000706 |
Summary: Define runtime support routines for throwing bad_cast and bad_typeid exceptions. | ||||||
Resolution: Accepted as proposed originally. See draft EH Specification. |
[000629 CodeSourcery -- Mark]
Both EDG and G++ call run-time library routines to throw the bad_cast
and bad_typeid exceptions, rather than trying to expand the throws
inline. This is much more convenient since those exceptions can be
thrown without the headers declaring bad_cast being included. I think
we should follow this existing practice and provide appropriate entry
points. How about:
extern "C" void __cxa_bad_cast ();
extern "C" void __cxa_bad_typeid ();
[000629 CodeSourcery -- Nathan]
FYI, the G++ declarations are
extern "C" void *__throw_bad_cast ();
extern "C" std::type_info const &__throw_bad_typeid ();
Of course these never actually return, but it causes least
confusion at the calling point by keeping the type system consistent.
These are called with something like the following pseudo C++
for dynamic_cast (void *tmp = __dynamic_cast (...),
*(T*)(tmp ? tmp : __throw_bad_cast ()))
for typeid (*ptr):
(ptr ? *(type_info const *)ptr->vtable[-1] : __throw_bad_typeid ())
One side of a conditional expression can be void, but only if it is a throw expression. Wrapping the throws in function calls hides that, and in g++'s case caused problems. The easiest solution was the above declarations.
I suggest the following:
extern "C" void *__cxa_bad_cast ();
extern "C" const void *__cxa_bad_typeid ();
That typeid signature will mean a little reworking of the typeid
operator implementation for G++,
but not too much.
For implementations where Mark's suggestion is valid,
these will be too, but not vice-versa.
[000629 CodeSourcery -- Mark]
That's a reasonable suggestion, too. With a `void' return, you can
always do:
(__cxa_bad_cast (), (void*) NULL)
or whatever, in the compiler, to make the arms of the conditional have
the right type.
[000706 All] Accepted as originally proposed by Mark, without return types. The decision is intended to not burden the routines with dummy returns, since callers with ?: operators can use casts to achieve the desired result.
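The accepted convention can be illustrated with a small sketch. Here `demo_bad_cast` is a hypothetical stand-in for `__cxa_bad_cast` (it is not the runtime entry point), and `checked_cast_result` is an invented name for the expression a compiler might generate. The comma expression with a cast gives the failing arm of `?:` the same type as the other arm, as Mark suggested.

```cpp
#include <typeinfo>

// Hypothetical stand-in for the runtime routine: void return type,
// and it never returns normally.
[[noreturn]] void demo_bad_cast() { throw std::bad_cast(); }

// Sketch of what a compiler might emit for a reference dynamic_cast whose
// pointer-level result is `tmp`. The comma expression makes the failing arm
// have type void*, so both arms of the conditional agree.
inline void* checked_cast_result(void* tmp) {
    return tmp ? tmp : (demo_bad_cast(), static_cast<void*>(nullptr));
}
```

No dummy return value is needed in the routine itself; the cast lives entirely at the call site.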
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-18 | __cxa_throw_type_info | lib | closed | all | 001012 | 001109 |
Summary: Should we replace the __cxa_throw_type_info pointer in the exception object by a pair of pointers to a std::type_info and a destructor? | ||||||
Resolution: Make the replacement. See Sections 2.2.1 and 2.4.3 of the draft EH Specification. |
[001012 all] Making this type be a pair (type_info and destructor pointers) makes it necessary that a thrower or __cxa_throw construct one so that the exception object can point to it. This can't be done on the stack, since it's about to be unwound, and doing it on the heap when the exception might be out-of-memory doesn't seem ideal. We propose that instead, we replace the __cxa_throw_type_info pointer in the exception object header by separate std::type_info and destructor pointers, and pass them as two parameters to __cxa_throw.
We also noticed that, if the thrown object is an array, the destructor passed will need to be a fabricated one which loops over the array elements. The alternative, to store the array bounds explicitly in the exception object, seems to be a lot of overhead for a very rare case.
[001109 all] The interface change will be made.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
E | Template Instantiation Model | |||||
E-1 | When does instantiation occur? | tools | closed | SGI | 990520 | 000511 |
Summary: There are two principal models for instantiation. The early instantiation (or Borland) model performs all instantiation at compile time, potentially resulting in extra copies which are removed at link time. The pre-link instantiation model identifies the required instantiations prior to linking and instantiates them via a special compile step. | ||||||
Resolution: Non-export templates are instantiated where referenced in COMDAT groups. See the Draft C++ ABI for IA-64. |
[000511 All] Non-export templates are instantiated where referenced in COMDAT groups. We will not deal with export templates at this time (E-2).
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
E-3 | Template repository | tools | closed | HP | 990603 | 000511 |
Summary: Independent of the template instantiation model, we need to make sure that whatever template persistent storage is used by one vendor does not interact negatively with other vendors' mechanisms. Issues: (1) Avoiding conflict on the name of any repository. (2) If .o files are used, describe how this information is to be preserved, ignored, etc. (3) Evaluate if tools such as make, ld, ar, or others, can break because .o files get written at unexpected times. | ||||||
Resolution: COMDAT emission and naming for non-export templates is specified in the Draft C++ ABI for IA-64. |
[000511 All] Treatment is specified now for non-export templates; We will not deal with export templates at this time, given no existing implementations to serve as models.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
F-1 | Mangling convention | call | closed | SGI | 990520 | 000330 |
Summary: What rules shall be used for mangling names, i.e. for encoding the information other than the source-level object name necessary to resolve overloading? | ||||||
Resolution: See the Draft C++ ABI for IA-64. |
[991019/28 various] The following is assembled from several mail messages on the subject.
For entities with C name linkage, the entity's linkable name is identical to its base name (as usual).
Note that linkable names include not only names with C++ global scope semantics, but also "local" names which for some reason end up requiring linker resolution (e.g. static local variables declared in inline functions). Note also that inlining requirements apply equally to functions declared inline and those chosen to be inlined by the compiler.
For function-like entities with C++ name linkage, the following components MUST be part of the name:
[ For the last item, consider:
template void f(T1, T2);
template void f(T2, T1);
The encoding of each of these templates instantiated for
In addition, it may be desirable to encode the following components:
Namespace scope variables and static data members have linkable names that must include at least:
Note that although there are benefits to encoding array size,
and therefore being able to catch mismatches,
the ability to declare a[]
makes this problematic.
fundamental types:
type modifiers/constructors:
The types in parentheses are available in C99, but not in standard C++.
[991021 all] It was observed in the meeting that it might be better to deal with non-essential type information (e.g. exception specifications, array sizes) as a separate construct to allow error detection, rather than as a required part of the mangled name. This allows it to be elided or removed if unneeded.
[991028 all] Objectives of a specification were discussed, and have been added to the writeup above.
[000127 IBM -- Mark] [Ed.]: Mark raises the issue of how template expression parameters are mangled. The Standard requires that equivalent expressions be identified, but not all functionally equivalent ones. The relevant paragraph is 14.5.5.1. Don't lose this issue.
[000127 All] Notes from the meeting:
[000210 All -- Matt] Notes from the meeting:
We have agreed that local statics and local classes must be mangled. We agreed that string literals should also be mangled even if linker features might make it unnecessary. The motivation is a desire to support less capable linkers on other platforms.
For local statics and local classes, the mangled name consists of the mangled function name, a sequence number, and the name of the local class/variable. For string literals the mangled name consists only of the mangled function name and the sequence number.
(There was concern that this might prevent merging of identical string literals. Jason believes that given a smart linker it will just result in multiple names for the same string literal.)
Sequence numbers are assigned in lexical order within a function, starting at 1. The entities that receive sequence numbers are local static variables, local classes, and string literals. Other entities (e.g. automatic variables) do not receive or affect sequence numbers.
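The numbering rule in these 000210 notes can be sketched as follows. The types `LocalKind` and `LocalEntity` and the function `assign_sequence_numbers` are invented for illustration; the sketch only shows the ordering rule, not how a name is finally spelled.

```cpp
#include <string>
#include <utility>
#include <vector>

// Kinds of entities that can appear in a function body (illustrative only).
enum class LocalKind { StaticVariable, LocalClass, StringLiteral, AutomaticVariable };

struct LocalEntity {
    LocalKind kind;
    std::string name;  // empty for string literals
};

// Returns (name, sequence number) for every numbered entity, in lexical
// order, starting at 1. Automatic variables neither receive a number nor
// advance the counter.
inline std::vector<std::pair<std::string, int>>
assign_sequence_numbers(const std::vector<LocalEntity>& in_lexical_order) {
    std::vector<std::pair<std::string, int>> out;
    int next = 1;
    for (const LocalEntity& e : in_lexical_order) {
        if (e.kind == LocalKind::AutomaticVariable) continue;
        out.emplace_back(e.name, next++);
    }
    return out;
}
```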
Exception specification information must be part of the mangled name of a function.
Special entities that need to receive mangled names, in addition to those mentioned in Daveed's document:
Exported templates may require other things to be mangled. We don't have a detailed analysis.
We discussed the idea of having a small dictionary of well known names, so that mangled names could be shorter. Jason was concerned with readability of mangled names if we had too many things in this dictionary, and Daveed was concerned that a large dictionary wouldn't give enough of a space savings because an index would take too many bits. If we have such a dictionary it will have very few names in it. Some obvious candidates are:
std
std::char_traits
std::allocator
std::basic_string<char, std::char_traits<char>, std::allocator<char> >
[000215 HU-Berlin -- Martin] (Re: sequence numbers for statics in inline functions.)
The C99 standard defines an implicit variable inside of each function:
static const char __func__[]="function-name";
Proposal: The sequence number of __func__ is 0.
Of course, there is always discussion what the value of __func__ is in C++ context; I think this does not necessarily need to be defined by the ABI (or the question whether __func__ is defined at all - if it is not used in a function, it does not matter).
[000217 Editor] Note that the current mangling proposal is now part of the Draft C++ ABI for IA-64.
[000308 All] Several loose ends were discussed (primarily vtable-related). Jason will do a YACC description to check for ambiguity.
[000313 SGI -- Jim] I have reworked the description in the Draft C++ ABI for IA-64, to get a more precise grammar description, and to incorporate the loose ends decisions from the meeting and proposals for a few more.
[000316 All] Extensive discussions in the meeting, reflected in the updated Draft C++ ABI for IA-64.
[000323 All] Extensive discussions in the meeting, reflected in the updated Draft C++ ABI for IA-64. The principal decisions were:
[000330 All] Change virtual thunk mangling to encode static offset to nearest virtual derived class. Encode single void parameter type for parameterless functions, to facilitate demangling distinction from data objects. Use object name for named entities, hash for strings, in mangling local names, to minimize implementation mistakes.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
F-2 | Mangled name size | call g | closed | SGI | 990520 | 000511 |
Summary: Name mangling schemes to date typically produce very long names. SGI routinely encounters multi-kilobyte names, and increasing usage of namespaces and templates will make them worse. This has a negative impact on object file size, and on linker speed.
SGI has considered solutions to this problem including modified string tables and/or symbol tables to eliminate redundancy. Cygnus, HP, and Sun have also considered or implemented approaches which at least mitigate it. | ||||||
Resolution: The current mangling solution is considered an adequate solution to this problem. |
[991028 all] Cygnus and Sun use a mangling scheme which has proven extremely effective at compression, but not overly complex. Each time the mangler incorporates a type into a name, it remembers it and assigns it a number, and subsequent occurrences of the type in the name are replaced by the (escaped) number. Jason believes this might be adequate compression, without going to large character sets or more complex schemes.
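A toy sketch of the remember-and-number idea described above. The function name `compress_components` and the `S<n>_` spelling of a back-reference are assumptions chosen for illustration; they are not claimed to be the encoding used by either compiler.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Concatenates already-encoded type components. The first occurrence of a
// component is emitted in full and remembered under the next free number;
// later occurrences are replaced by an escaped reference to that number.
inline std::string compress_components(const std::vector<std::string>& components) {
    std::map<std::string, std::size_t> seen;
    std::string out;
    for (const std::string& c : components) {
        auto it = seen.find(c);
        if (it != seen.end()) {
            out += "S" + std::to_string(it->second) + "_";  // back-reference
        } else {
            std::size_t id = seen.size();
            seen.emplace(c, id);
            out += c;
        }
    }
    return out;
}
```

The saving grows with the length of the repeated component, which is why the scheme pays off for deeply nested template and namespace names.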
[991115 SCO -- Jonathon] In a discussion with Matt Austern I suggested using a collision-resistant hash function on the manglings to generate the names actually used in object files. (The algorithm is: first mangle, then hash.) This could really reduce .o size a ton; think expression templates, etc. I bet this would have a much bigger impact than any obvious compression algorithm; you could just decree that all symbols be no longer than 256 bits long, say. Lots of tools (assemblers, debuggers) will use less space/time dealing with the shorter names. You would keep around a table mapping hashes back to the original mangled names for debugging.
An interesting twist on this would be to use a secure hash with a key. For ordinary compilation, use some well-known key. But, by setting some flag/environment-variable you could tell the compiler to use a key of your choice. You can now distribute a .o that is hard to link to -- unless you know the key.
<After a request for clarification...>
A collision-resistant hash function is a notion from cryptography. (That's the world I spend a lot of my time in when I'm not doing compiler stuff.)
Suppose you have an n-bit hash, so you have 2^n hash values. A collision-resistant hash is one where the probability of two randomly chosen strings hashing to the same value is (very close to) 1/(2^n). A stronger notion of this is that finding strings that collide is computationally infeasible.
Certainly, hashing introduces a probabilistic nature to things: it becomes possible that two different functions could hash to the same hash-mangled name. However, by choosing a good hash function (and provably good ones exist) and enough bits, you can make it considerably less likely that in the next hundred years two distinct functions will hash to the same name, than that cosmic rays will cause unpredictable linker errors.
... this (the assumption that mangling is reversible, as the basis for such things as the c++filt tool) is the biggest objection I can think of.
We originally came up with this idea for our C++-to-C translator. We ship this to people with embedded systems whose linkers only support 16 characters; by using a collision-resistant hash they can use C++. Nobody has ever run into a collision. We solved the c++-filt problem by keeping a database mapping hashes back to mangled names. (The probabilistic guarantee says that this database can actually be global; in our lifetime we will never see two things with the same hash.) So, it's still possible to make a c++-filt that works, but it is admittedly more difficult.
The biggest advantage to this scheme is that you can put an upper bound on symbol lengths, even in the presence of truly huge template usage. (I've seen programs where mangled names approached a megabyte in length.) I would only suggest hashing long names; names under 100 characters, or even a thousand characters, say, could be left unhashed.
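The mangle-then-hash proposal, with the length threshold and the reverse-lookup table, might look roughly like this. Everything here is a hypothetical sketch: `SymbolShortener` and the `_H` prefix are invented, and FNV-1a is used only as a compact stand-in. FNV-1a is not collision-resistant in the cryptographic sense the proposal requires.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// 64-bit FNV-1a. Stand-in only: a real implementation of the proposal would
// need a collision-resistant hash with more bits.
inline std::uint64_t fnv1a64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ull;  // FNV prime
    }
    return h;
}

class SymbolShortener {
public:
    explicit SymbolShortener(std::size_t threshold) : threshold_(threshold) {}

    // Names up to the threshold are left unhashed; longer ones are replaced
    // by a fixed-length symbol and recorded for later reverse lookup.
    std::string shorten(const std::string& mangled) {
        if (mangled.size() <= threshold_) return mangled;
        char buf[2 + 16 + 1];  // "_H" + 16 hex digits + terminator
        std::snprintf(buf, sizeof buf, "_H%016llx",
                      static_cast<unsigned long long>(fnv1a64(mangled)));
        table_[buf] = mangled;
        return buf;
    }

    // Plays the role of the database that maps hashes back to mangled names.
    const std::string* original(const std::string& symbol) const {
        auto it = table_.find(symbol);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::size_t threshold_;
    std::map<std::string, std::string> table_;
};
```

The table is what keeps a c++filt-style tool workable: the hash itself is not reversible, so demangling a shortened symbol requires the lookup.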
[000504 All] Alex Samuels has mangling almost done, and will provide data on before/after sizes of library symbols.
[000511 HU-Berlin -- Martin] I finally managed to remangle the set of names that Matt Austern kindly provided. Please take my results with a grain of salt:
As a result, some of these names come out wrong. In particular, if
template parameters appear in the signature, I use the substituted
parameters instead of the formal ones (i.e. I never use
I've produced a table showing how the size of EDG-mangled names relates to the new names. For each length of an old name, it shows how often a certain new length appeared. E.g. for
89 : 71(2x) 72(5x)
there were a total of 7 names with 89 characters in Matt's list. Under the new mangling, 2 of them are now 71 characters, and 5 are 72 characters in size.
In general, all names under the new mangling are shorter than under EDG's mangling, with a single exception (listed on top). For short names (<80 chars), size reduction is small, unless one of the predefined dictionary entries is used. For longer names (>200 chars), compression under the new ABI is about 50% better than under the EDG scheme.
If you find errors in my implementation that could be corrected from looking at the demangled names, please let me know; I can then produce corrected statistics.
51 : 43(18x) 44(10x) 27
52 : 45(30x) 44(7x) 43(8x) 50(6x)
53 : 47 46(12x) 45(18x) 44(8x) 51(2x) 50(8x)
54 : 47(32x) 46(10x) 45(2x) 53 48
55 : 47(19x) 46(16x) 53 41 48(21x)
56 : 47 48(12x)
57 : 55 44 51 50(10x) 48(4x)
58 : 38 50(7x) 56
59 : 47(2x)
60 : 47 38 51(8x) 59
61 : 55
62 : 54 53(16x) 50 65 INCREASED 56
63 : 51(2x)
64 : 63 52(2x)
65 : 54(2x) 44 50(2x) 52
66 : 55(3x) 65
67 : 57 56
68 : 49 11 58(2x) 57(2x) 56
69 : 47(6x) 12(3x) 59 58(3x) 57 55(4x) 50(4x) 9
70 : 13(2x) 60(2x) 51(3x) 56(2x) 48(3x)
71 : 14(4x) 52(3x) 59(2x) 57 56
72 : 15 14 53(2x) 60(2x) 57
73 : 63 62(2x) 58(2x) 54(7x) 53(2x) 15
74 : 59(6x) 55(6x) 54(3x) 69 66 64(2x)
75 : 63 60(3x) 57(2x) 56(4x) 70 18(2x)
76 : 63 62(2x) 61(2x) 58 57(5x) 55(2x) 64
77 : 59(2x) 62(4x) 66
78 : 63(2x) 68 57 66(2x) 60(2x) 64(2x)
79 : 78 61(3x) 62 67(2x) 65(2x)
80 : 63(2x) 62(11x) 69(2x) 66
81 : 63(3x) 62(2x) 61 58(2x) 23(8x) 54 64(2x)
82 : 23(4x) 69 68 26(2x) 64 24
83 : 71 78 69 27 66(4x) 65(2x)
84 : 55(3x) 73 67(4x) 66 65(2x)
85 : 63(2x) 69(4x) 65(2x)
86 : 68(8x)
87 : 65
88 : 70(2x)
89 : 71(2x) 74 73(4x)
90 : 68 75 74(2x) 73 72(2x)
91 : 24 74 73(2x) 64
92 : 77(2x) 76(6x)
93 : 78(2x) 77(4x) 76(2x) 11 41
94 : 79(4x) 77(2x) 80(2x)
95 : 79(2x) 65(2x)
96 : 75(2x) 73
97 : 67 68(4x)
98 : 14 69(4x) 84 83(2x) 56
99 : 15(4x) 45(2x) 60 27(3x) 83 70(4x) 67
100 : 68(3x)
101 : 17 68(4x) 59 82(2x) 19 49
102 : 63(2x) 70 60(2x) 17
103 : 71(2x) 70(2x) 18 64(3x)
104 : 78(2x) 86(8x) 21 89
105 : 86(8x) 85(2x) 67 90 64
106 : 54 24
107 : 91 88(2x)
108 : 87(4x) 92
109 : 87 74 88(2x)
110 : 94(2x) 27(2x) 26 89(4x)
111 : 95(2x) 28 27 73 89(4x)
112 : 29 97(2x)
113 : 98
114 : 31 30(2x) 93(6x)
115 : 31(4x) 33
116 : 95(8x) 101
117 : 95(8x) 103
118 : 97
119 : 36
120 : 31 95
122 : 38(2x)
124 : 74
125 : 109(2x)
126 : 110 42(2x) 52
128 : 72(2x) 77 108(2x) 44(4x) 112
129 : 33 44(5x) 113(2x) 65 73(2x)
130 : 47 110(4x) 45 75 115(2x) 114(2x)
131 : 51 116 115(3x)
132 : 47 74 56 72 53 116 82(3x) 117(3x)
133 : 83 118 117
134 : 119(3x) 118(3x)
135 : 119(2x) 51 120(4x)
136 : 50 121(3x)
137 : 122(5x) 105
138 : 123(4x) 106(2x)
139 : 124(4x)
140 : 125(5x) 65(2x)
141 : 126(2x)
142 : 127 110 44(2x)
143 : 128
146 : 94
148 : 96
149 : 52
150 : 55 60
152 : 70(2x) 122
154 : 56
157 : 68
160 : 70
162 : 55 69
169 : 126
171 : 130
174 : 72(2x)
176 : 74(2x)
178 : 75(2x)
180 : 78(2x)
185 : 61
186 : 71
187 : 83
188 : 71 70
191 : 74
192 : 75(2x) 89(8x)
193 : 89(8x)
194 : 97(2x) 108
196 : 109
197 : 101(2x)
202 : 95(2x) 150
215 : 106 48
218 : 121
220 : 106
226 : 132
228 : 133
232 : 108
234 : 111 109
235 : 139
240 : 116
242 : 117
243 : 119
250 : 145
251 : 143 128
264 : 111
267 : 163
268 : 133
278 : 88
280 : 98 93(2x) 113
282 : 132
283 : 101 116
285 : 151
288 : 130
303 : 143
305 : 144 100
308 : 148
330 : 159
333 : 133
342 : 133 148
347 : 177
355 : 101
530 : 161
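To make the row format of the table above concrete, here is a small sketch that counts how many names a row represents. The function name `names_in_row` is invented; the parsing rules (an entry without a `(Nx)` suffix counts once, non-numeric annotations are skipped) are read off the table and Martin's explanation.

```cpp
#include <sstream>
#include <string>

// Counts the names represented by one row such as "89 : 71(2x) 72(5x)".
// The first two tokens are the old length and the ":" separator.
inline int names_in_row(const std::string& row) {
    std::istringstream in(row);
    std::string token;
    in >> token;  // old name length
    in >> token;  // ":"
    int total = 0;
    while (in >> token) {
        // Skip annotations such as "INCREASED".
        if (token[0] < '0' || token[0] > '9') continue;
        std::string::size_type open = token.find('(');
        if (open == std::string::npos) {
            total += 1;  // a bare length occurred once
        } else {
            total += std::stoi(token.substr(open + 1));  // stops at the 'x'
        }
    }
    return total;
}
```

Applied to the example row `89 : 71(2x) 72(5x)` from the explanation, it yields the total of 7 names stated there.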
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
F-5 | ILP32 vs. LP64 | call | closed | HP | 000210 | 000824 |
Summary: This ABI focusses on the LP64 data model. What should we do (if anything) to support (a) compatibility between different vendors' ILP32 compilers (b) compatibility between ILP32 and LP64? | ||||||
Resolution: Withdrawn -- no action. |
[000210 All -- Matt] HP will be supporting an ilp32 model as well as an lp64 model. The ABI only discusses an lp64 model. Do we want to support ilp32 in any way? What will we have to do to support (a) compatibility between different vendors' ilp32 compilers, or (b) compatibility between ilp32 and lp64? HP has suggested, for example, modifying the mangling scheme so that long long in ilp32 is mangled the same way as long in lp64. Is this enough to ensure ilp32/lp64 link compatibility, or would we need to make many other changes as well?
[000217 All] The group observed that one can prevent all incorrect linkage by using a different version prefix for LP64 and ILP32 mangling. Christophe would prefer to just mangle those types that are different differently, so as not to prevent linkage when it would work. It is not clear whether mixed models are workable enough to make such a complication useful. Christophe will produce a concrete proposal to discuss once the base mangling is settled enough to base it on.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
F-6 | Demangling | lib | closed | Cygnus | 000210 | 000504 |
Summary: Users may sometimes want to get demangled names. Should we provide an entry point for calling a demangler? | ||||||
Resolution: Provide a simple demangler interface callable from C. See the Draft C++ ABI for IA-64. |
[000210 all -- Matt] Users have access to types' mangled names via the standard type_info class. Users may sometimes want to get demangled names. Should we provide an entry point for calling a demangler? This might be a standalone function, perhaps with an interface like that of EDG's demangle(), or it might be some kind of type_info extension. If we do this, should we attempt to specify exactly what demangled names look like, or should we explicitly leave it unspecified and warn users not to depend on the exact format?
[000321 HU-Berlin -- Martin]
Suggestion:
namespace abi {
std::string demangle_mangled_name (const char*); // <mangled-name>
std::string demangle_type (const char*); // <type>
}
[000330 all] The problem with the suggested interface is that using std::string requires sucking in half the standard library. An alternative proposed is that the user pass in a buffer, with a NULL pointer causing the routine to allocate storage. Christophe also volunteered to send the HP interface, though it is a bit heavyweight.
[000330 HP -- Christophe]
Here is the interface HP offers today.
As I said, it seems overly complicated,
compared to what Matt proposed.
On the plus side, it has handling of erroneous input,
which I believe we need to define.
class TDemangler {
public:
void * operator new(size_t size) {
return (void*)malloc(size);
}
void operator delete(void *deadObject) {
free(deadObject);
}
TDemangler();
TDemangler(const char *mangledDecl);
~TDemangler();
enum Status { OK, Empty, Error, Truncated };
void reset();
Status getStatus() const { return status; }
Status demangleDecl(const char *mangledDecl);
Status demangleType(const char *mangledType);
Status copy(char *result, size_t maxToCopy /*including null*/) const;
Status copy(char *result, size_t maxToCopy /*including null*/,
char *name, size_t nameLength) const;
private:
Status status;
const char *p;
const char *end;
void partial(bool top, bool typeOfExternalDecl = false);
void typeName(size_t &baseOffset, size_t &baseLength);
void templateArgs();
void writePrefix(const char *text, size_t length);
void writeSuffix(const char *text, size_t length);
void writeDuplicate(unsigned offset, unsigned length);
void writeBaseName(const char *baseName, size_t baseNameLength,
size_t classNameOffset, size_t classNameLength);
enum Spacing { Before, None, After };
void writeQualifiers(const char *cv, Spacing spacing);
size_t extractCount();
void demangleDecl();
char *buffer;
size_t bufferSize;
enum { InternalBufferSize = 200 };
char internalBuffer[InternalBufferSize];
size_t nameSize;
size_t prefixSize;
size_t suffixSize;
bool spaceBeforeName;
void makeAvailable(size_t length);
void merge();
static size_t min(size_t a, size_t b) { return a < b ? a : b; }
};
[000406 all] There was some discussion of the desirability of making the demangler a class member. Christophe believes it would thereby become easier to derive from it, e.g. to tailor output. Others believe it would add unnecessary complication; one particular concern is that it be callable from C. Christophe and Matt will send specific proposals.
It was observed that Martin's suggestion of two functions is unnecessary. A name beginning with "_Z" is a <mangled-name>; otherwise it is a type name (if valid).
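The observation amounts to a two-character prefix test. A minimal sketch, with the names `EncodingKind` and `classify_encoding` invented for illustration:

```cpp
#include <cstring>

enum class EncodingKind { MangledName, TypeEncoding };

// A string beginning with "_Z" is treated as a <mangled-name>; anything else
// is treated as a candidate type encoding (which may still prove invalid).
inline EncodingKind classify_encoding(const char* s) {
    return std::strncmp(s, "_Z", 2) == 0 ? EncodingKind::MangledName
                                         : EncodingKind::TypeEncoding;
}
```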
[000406 SGI -- Matt] We need to return multiple return values: a status code, and a buffer pointer. We can use an extra level of indirection on one, both, or neither. If neither, we need to return a pair or the moral equivalent.
ALTERNATIVE A
namespace abi {
extern "C"
char* __cxa_demangle ( const char* mangled_name,
char* buf, size_t n,
int* status );
}
mangled_name is a null-terminated string with the mangled name. buf is a pointer to a user-provided buffer of at least n characters. If buf is a null pointer then n is ignored, and demangle allocates its own buffer with malloc. The user is responsible for freeing it.
If the return value is non-null, it points to a null-terminated string with the demangled name. If the return value is null, an error has occurred. *status == 0 means the demangling failed because the buffer wasn't long enough (or because malloc failed). *status == -1 means the demangling failed because mangled_name is invalid.
Users may pass a null pointer as the last argument to __cxa_demangle. All that means is that, if the demangling fails, they won't be able to find out why.
ALTERNATIVE B
namespace abi {
struct dm {
char* name;
enum { buffer_too_small, invalid_name } status;
};
dm demangle(const char* mangled_name, char* buf, size_t n);
}
mangled_name is a null-terminated string with the mangled name. buf is a pointer to a user-provided buffer of at least n characters. If buf is a null pointer then n is ignored, and demangle allocates its own buffer with malloc. The user is responsible for freeing it.
If result.name is non-null, it points to a null-terminated string with the demangled name. If result.name is null, demangling has failed and result.status gives the type of failure.
DISCUSSION
I prefer alternative A, even though the error indication is clumsier, because it's callable from C. Having a C-callable demangling interface could come in handy, e.g. for linkers. If we decide that's unimportant, we should go with alternative B.
[000406 HP -- Christophe]
ALTERNATIVE C
Interface:
namespace abi
{
struct demangler
{
// Provide name to demangle
void demangle(char *);
protected:
// Output demangled characters
// I don't know whether it is better to output
// one char or a string... It seems there are
// many cases where the demangler can put
// multiple chars at the same time, but they
// are not zero-terminated (we know the length)
virtual void output(char c);
};
}
Implementation:
#include <cxxabi.h>
#include <iostream>
using namespace std;
void abi::demangler::output(char c)
{
cout << c;
}
[000413 All] Most members strongly prefer a C-callable interface. Discussion centered around how to handle memory allocation (user, library, re-allocatable, etc.) and whether options like gcc's (e.g. list parameters or not) are desirable. Matt will consider these and modify his proposal.
[000427 SGI -- Matt] One thing I promised to do and didn't, though, was to come up with a revised demangler interface. Here it is. It's more complicated than I like, but the complexity does serve a real purpose. Motivation:
namespace abi {
char* __cxa_demangle(const char* mangled_name,
char* buf,
size_t* n,
int* status);
}
mangled_name is a pointer to a null-terminated array of characters.
buf may be null. If it is non-null, then n must also be non-null, and buf is a pointer to an array, of at least *n characters, that was allocated using malloc.
status points to an int that's used as an error indicator. It is permitted to be null, in which case the user just doesn't get any detailed error information.
Behavior: the return value is a pointer to a null-terminated array of characters, the demangled name. If there is an error in demangling, the return value is a null pointer. The user can examine *status to find out what kind of error it is. Meaning of error indications:
Memory management:
[000504 All] Accept Matt's latest proposal.
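A usage sketch of the accepted interface, assuming a toolchain that ships `<cxxabi.h>` with this entry point (GCC and Clang do). The wrapper name `demangle_or_same` is invented. It relies only on behavior stated in the notes: a null return signals failure, and a buffer obtained from the routine is released with `free`. That a null `buf` makes the routine allocate with malloc is taken from Alternative A and is an assumption for this later proposal.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the demangled form of `mangled`, or the input unchanged when the
// demangler reports an error by returning a null pointer.
inline std::string demangle_or_same(const char* mangled) {
    int status = 0;
    // Null buf and n: let the routine allocate the result buffer itself.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (out == nullptr) return mangled;
    std::string result(out);
    std::free(out);  // the buffer was allocated with malloc
    return result;
}
```

The function is callable from C in the same way, minus the `std::string` wrapper, which is the property most members asked for.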
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
F-7 | Mangling statics | call | closed | HP | 000223 | 000504 |
Summary: What, if anything, should we do about mangling the names of objects in static functions in case a compiler chooses to inline them? | ||||||
Resolution: Local objects are mangled with the name of the containing function followed by a discriminator, consisting of the object name and possibly a sequence ID. Strings are mangled with a discriminator consisting of "s" followed by a sequence ID. See the Draft C++ ABI for IA-64. |
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
F-8 | Identifiers with unicode letters | call | closed | HU-Berlin | 000323 | 000413 |
Summary: How should we mangle names containing unicode letters? | ||||||
Resolution: Follow the underlying C ABI. |
[000323 HU-Berlin -- Martin]
2.2, [lex.charset]/2, allows usage of universal-character-names in
C++ programs, especially in identifiers and strings.
How do we mangle the variable pi below?
namespace newmath {
const long double \u03A0 = 3.14159265358979;
}
This is also an issue for C99, so it may be that the base ABI has a specification; we'd have to follow that at least for extern "C" names. If not, I propose that such names are encoded in UTF-8.
[000405 Cygnus -- Jason] UTF-8 is inappropriate for mangled names, as it uses values > 127 to encode non-ASCII characters.
GNU Java encodes names in UTF-8 internally. For the mangled name, if there are non-ASCII characters, it adds a 'U' to the beginning and encodes each such UCS-2 character as _%04x. See gcc/java/mangle.c.
This assumes that all interesting characters fall within the Basic Multilingual Plane (the low 16 bits); that is a valid assumption for us, since all the extended characters valid for use in C++ identifiers are part of the BMP.
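Both points in Jason's message can be sketched in a few lines. The function names are invented, and the escape routine follows only the description above (a leading 'U', then `_%04x` per non-ASCII character); it has not been checked against gcc/java/mangle.c. Input is restricted to the Basic Multilingual Plane, as the message assumes.

```cpp
#include <cstdio>
#include <string>

// UTF-8 encoding of a single BMP code point (surrogates are not handled).
// Any code point above 0x7F produces bytes with values greater than 127.
inline std::string utf8_encode_bmp(char16_t c) {
    std::string out;
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    return out;
}

// ASCII-only escape in the style described for GNU Java: ASCII characters
// pass through, each other character becomes "_" plus four hex digits, and a
// 'U' is added at the beginning if any character was escaped.
inline std::string escape_identifier(const std::u16string& id) {
    std::string body;
    bool escaped_any = false;
    for (char16_t c : id) {
        if (c < 0x80) {
            body += static_cast<char>(c);
        } else {
            char buf[6];  // '_' + 4 hex digits + terminator
            std::snprintf(buf, sizeof buf, "_%04x", static_cast<unsigned>(c));
            body += buf;
            escaped_any = true;
        }
    }
    return escaped_any ? "U" + body : body;
}
```

For the identifier `\u03A0` from Martin's example, the UTF-8 form is the two bytes 0xCE 0xA0, while the escaped form stays within 7-bit ASCII.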
[000411 HU-Berlin -- Martin] Why is [UTF-8] not appropriate? AFAICT, the gABI has no restriction in that respect. ch4.strtab.html says
String table sections hold null-terminated character sequences, commonly called strings.
I can see there are a number of alternatives. I think it is important that there is agreement on the rules, in a way that is also interoperable with C99 implementations. What those rules are is not that important.
GNU Java encodes names in UTF-8 internally. For the mangled name, if there are non-ASCII characters, it adds a 'U' to the beginning and encodes each such UCS-2 character as _%04x. See gcc/java/mangle.c.
In the C++ ABI, the natural adaptation of that approach would be to mangle non-ASCII-containing identifiers as _U instead of _Z, right? Unfortunately, that does not give a solution for C names. I believe the GNU Java approach also cannot be extended to C99.
[000413 All] We need to follow the underlying C ABI. Names containing unicode letters after mangling according to our normal mangling rules will be encoded as required for external names by the C ABI.
[000504 All] Agreed that only function and member function template parameters are mangled with T*_. Jim will go back to single nested name grammar, and include auxiliary symbols (e.g. RTTI) for builtin types.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
F-9 | Strings with unicode letters | call | closed | HU-Berlin | 000323 | 000413 |
Summary: How should we handle the object file representation of narrow and wide string literals containing unicode letters? | ||||||
Resolution: Follow the underlying C ABI. |
[000323 HU-Berlin -- Martin]
2.2, [lex.charset]/2, allows usage of universal-character-names in
C++ programs, especially in identifiers and strings.
Consider the example:
wchar_t MvL[]=L"Martin von L\u00F6wis";
First, what is sizeof(wchar_t) in the base ABI? I'll assume 4 for the moment. Then, the question comes down to: What is the execution character set, and the wide execution character set? 2.2/3 says they are implementation-defined, so I guess we must define them. Typically, people expect this to be a run-time setting (which is a reasonable assumption), but it kind-of breaks for string literals.
Proposal: The wide execution character set is UCS-4. The execution-character-set is "as-is", i.e. bytes from the source character set are copied unmodified to the object file. Universal-character-names appearing in narrow (i.e. char) strings are not portable in this ABI (the other alternatives would be to say they are Latin-1, or encoded as UTF-8, I guess).
[000405 Cygnus -- Jason] I have been told that it is inappropriate to assume that wchar_t is always UCS-4; a suggestion was to convert from UCS-4 to the host locale character set using iconv(), and then if we're in a wide string, convert to wchar_t with mbtowc(). This makes sense to me, though of course it requires iconv to know about UCS-4.
[000413 All] We need to follow the underlying C ABI. Strings containing Unicode letters will be encoded as required by the C ABI.
# | Issue | Class | Status | Source | Opened | Closed | |
---|---|---|---|---|---|---|---|
F-10 | Mangling function return types | call | closed | all | 000330 | 000413 | |
Summary: Should we always mangle the return type of a function? | |||||||
Resolution: No. It is mangled only for template instantiations/specializations. |
[000504 All] See the comment for issue F-3.
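The effect of this resolution can be observed by demangling two names, one for an ordinary function and one for a template instantiation. The sketch below assumes a toolchain that follows this ABI and provides abi::__cxa_demangle in <cxxabi.h> (GCC and Clang do); the wrapper name demangle and the two sample functions f and g are illustrative only.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Thin wrapper around abi::__cxa_demangle; returns the input unchanged if
// demangling fails.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && text) ? text : mangled;
    std::free(text);  // the buffer is malloc-allocated by the demangler
    return result;
}

// "_Z1fi"       is  int f(int):            no return type in the mangled name.
// "_Z1gIiEiT_"  is  template<class T> int g(T) instantiated with T = int:
//               the 'i' after "IiE" encodes the return type.
```

Demangling the first name gives no return type because none was recorded, while the second reports it.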
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
F-11 | Hash for local strings | call | closed | all | 000330 | 000504 |
Summary: How should we hash strings for local name mangling? | ||||||
Resolution: Strings are mangled with a discriminator consisting of "s" followed by a sequence ID. See the Draft C++ ABI for IA-64. |
[000406 All] One suggestion is to go back to the collision-resistant hash suggested by Mark in November in another context. The relevant source code is attached as fingerprint.h and fingerprint.c .
[991119 CodeSourcery -- Mark] I was asked to provide a little more information on collision-free hashing algorithms. I've appended our source to do this in our C++-to-C translator. The hash function here was originally used in Modula-3; it is provably collision-resistant. This version uses 64 bits; the algorithm can be extended to any bit length, however.
Even for 64 bits, the probabilistic guarantee (details at Compaq research) ensures that (for example), the chance of getting a collision with a thousand mangled names of length a thousand is less than one in a billion.
At CenterLine, we used this algorithm to compute type fingerprints to detect ODR mismatches at link-time. The same trick could be used to see whether all definitions of an inline function are really the same. It's better to use a collision-resistant hash (like this one) than an ad-hoc hash because the math actually guarantees nice properties.
Other examples of collision-free hashes are "secure hashes", i.e., those designed to resist an adversary's ability to create a text with a given hash, or to find collisions. Well-known examples include SHA and MD5.
[000504 All] We will use the simpler scheme of the function name followed by a discriminator consisting of "s" followed by a sequence number.
[000413 All] No. It requires more space, it can be done external to the mangling, and the group is uncomfortable with the potential breakage.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
G-1 | Basic command line options | tools | closed | HP | 990603 | 000824 |
Summary: Can we agree on basic command line options (compiler and linker) for fundamental functionality, possibly allowing portable makefiles? | ||||||
Resolution: Withdrawn -- no action. |
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
G-2 | Detection of ODR violations | call | closed | Sun | 990603 | 000504 |
Summary: [Sun] (See also F-3.) | ||||||
Resolution: This is a duplicate. See F-3, F-4, F-10. |
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
G-3 | Inlined routine linkage | call | closed | Sun | 990603 | 991202 |
Summary: Inline routines with external linkage require a method of handling vague linkage (see B-5 for definition) for the out-of-line instance, as well as for any static data they contain. The latter includes string constants per [7.1.2]/4. | ||||||
Resolution: Out-of-line instances are emitted where required, using COMDAT (issue B-5). Static data referenced will be placed in COMDAT sections as well. The names of each are addressed as part of mangling (issue F-1). Strings will be emitted in SHT_MERGE/SHT_STRING sections, with the static linker responsible for removing duplicates. |
[990624 Cygnus -- Jason] How should we handle local static variables in inlines? G++ currently avoids this issue by suppressing inlining of functions with local statics. If we don't want to do that, we'll need to specify a mangling for the statics, and handle multiple copies like we do above.
[990721 Cygnus -- Jason] [We should emit inline routines] in translation units where an out-of-line copy is needed. I am opposed to emitting the inlines with the vtable, for two reasons:
[991118 All] We discussed linkage of static locals in inline functions. The C++ standard requires that there be only a single object in the entire program, i.e. the static locals in different translation units must be merged. Two cases: string literals and everything else. "Everything else" is believed to be a rare and unimportant case. We'll just give the static locals mangled names, and put them in comdat groups. String literals are believed to be common, and mangled names in COMDAT is too heavyweight. The base ABI provides an optional mechanism for merging all copies of a given string literal. We would like to make this mechanism mandatory, so that string literals in inline functions get merged automatically.
[991202 All] The use of the new SHT_MERGE/SHT_STRING attributes, requiring the static linker to do the merging, was decided to be a suitable solution. It was noted that this will not provide merging across DSOs, but this is not considered a problem. An implementation may overcome this by naming the strings and invoking dynamic linker name preemption, at the cost of additional dynamic link time.
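As a small illustration of the two cases discussed above, the following inline functions contain a local static object and a string literal respectively. The names are hypothetical; the point is only to show which entities must be merged across translation units when such a header is included in several places.

```cpp
#include <cassert>

// Local static in an inline function with external linkage: the standard
// requires one 'count' object for the whole program, so the object needs a
// mangled name and a COMDAT group.
inline int& shared_counter() {
    static int count = 0;
    return count;
}

// String literal in an inline function: the second case discussed above,
// handled by mergeable string sections rather than by a mangled name.
inline const char* greeting() { return "hello"; }

// Increments the shared counter and returns the new value.
inline int bump() {
    int& c = shared_counter();
    assert(c >= 0);
    return ++c;
}
```

Within a single translation unit the sharing is trivially visible; the ABI work is in making the same hold when the definitions are emitted in several object files.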
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
G-4 | Dynamic init of local static objects and multithreading | call | closed | SCO | 990607 | 001109 |
Summary: The Standard requires that local static objects with dynamic constructors be initialized exactly once, the first time the containing scope is entered. Multi-threading renders the simple check of a flag before initialization inadequate to prevent multiple initialization. Should the ABI require locking for this purpose, and if so, what are the necessary interfaces? In addition to the locking of the initialization, special exception handling treatment is required to deal with an exception during construction. | ||||||
Resolution: The ABI will specify an 8-byte guard variable, with one byte used for the initialization flag, and the others available for use by a threading package for locking. ABI routines are specified for acquiring and releasing the lock. See ABI section 3.3.2. |
[990607 SCO -- Jonathan] The standard is mute on multiple threads of control in general, so there is no requirement in the language to support what I'm talking about. But as a practical matter compilers have to do it (Watcom gave a paper on their approach during the standardization process, if I remember). This example using UI/SVR4 threads will usually show whether a compiler does it or not:
thr5.C:
// static local initialization and threads
#include <stdlib.h>
#define EXIT(a) exit(a)
#define THR_EXIT() thr_exit(0)
#include <thread.h>
int init_count = 0;
int start_count = 0;
int init()
{
    ::thr_yield();
    return ++init_count;
}
void* start(void* s)
{
    start_count++;
    static int i = init();
    if (i != 1) EXIT(5);
    THR_EXIT();
    return 0;
}
int main()
{
    thread_t t1, t2;
    if (::thr_create(0, 0, start, 0, 0L, &t1) != 0) EXIT(1);
    if (::thr_create(0, 0, start, 0, 0L, &t2) != 0) EXIT(2);
    if (::thr_join(t1, 0, 0) != 0) EXIT(3);
    if (::thr_join(t2, 0, 0) != 0) EXIT(4);
    if (start_count != 2)
        EXIT(6);
    if (init_count != 1)
        EXIT(7);
    THR_EXIT();
}
When compiled with CC -Kthread thr5.C on UnixWare 7, for instance, it passes by returning 0. When compiled with CC -mt thr5.C on Solaris/x86 C++ 4.2 (sorry don't have the latest version!), it fails by returning 5.
[990607 Sun -- Mike Ball] As far as I can tell, the language says that the automatic blocking issue isn't a valid approach. It says what has to happen, and it isn't that.
If you look at the entire statement you find that it reads:
"Otherwise such an object is initialized the first time control passes
through its declaration; such an object is considered initialized upon
the completion of its initialization.
If the initialization exits by throwing an exception,
the initialization is not complete,
so it will be tried again the next time control enters the declaration.
If control re-enters the declaration (recursively)
while the object is being initialized,
the behavior is undefined."
The word "recursively" is normative, so eliminates that sentence from consideration.
One can, of course, make any extension to the language, but in this case I think the extension invalidates some otherwise valid code.
The sentence I'm referring to is that the object is considered initialized upon the completion of its initialization. This is explicit, and the reason for it is covered in the following sentence, which discusses an initialization that terminates with an exception. A person catching such an exception has the right to try again without danger that the static variable will be initialized in the meantime.
I don't see anything at all to justify semantics that say, "after initialization is started, any other threads of control are blocked until that thread completes the initialization, unless, of course, it exits by an exception, in which case the other thread can do the initialization before the exception handler gets a chance to try again, except...." Take an attempt to define the semantics as far as you like.
The problem is that there is no way for the compiler writer to know what the programmer really wanted to do. I can (and will at some other date, if necessary) come up with scenarios justifying a variety of mutual exclusion policies, including none.
The solution is to let the programmer write the mutual exclusion, the same as we do for every other potential race condition. It's a real mess, and, I claim, an unwise one to put in as an extension.
[990608 HP -- Christophe] The semantics currently implemented in the HP aC++ compiler is as follows:
There are details of our implementation that I disagree with, but in general, the semantics seem clear and sane, not as convoluted as you seemed to imply. In particular, it correctly covers the case where the static initialization fails with an exception. Any thread at that point can attempt the initialization.
[990608 SCO -- Jonathan] Here's what the SCO UnixWare 7 C++ compiler does for IA-32, from a (slightly sanitized) design document. It meets Jim's goal of having no overhead for non-threaded programs and minimal overhead for threaded programs unless actual contention occurs (infrequent), and meets Mike's goal of handling exceptions in the initialization correctly (although it doesn't guarantee that the thread getting the exception is the one that gets next crack at initializing the static). It's also worth noting that dynamic initialization of local variables (static or otherwise) is very common in C++, since that's what most object constructions involve, so I don't think this case is as rare as Jim does.
[...] This is in local static variables with dynamic initialization, where the compiler generates a static one-time flag to guard the initialization. Two threads could read the flag as zero before either of them set it, resulting in multiple initializations.
[...] Accordingly, when compilation is done with -Kthread on, a code sequence will be generated to lock this initialization. [...] the basic idea is to have one guard saying whether the initialization is done (so that multiple initializations do not occur) and have another guard saying whether initialization is in progress (so that a second thread doesn't access what it thinks is an initialized value before the first thread has finished the initialization). [...]
When compiled with -Kthread, the generated code for a dynamic initialization of a local static variable will look like the following. guard is a local static boolean, initialized to zero, generated by the [middle pass of the compiler]. Two bits of it are used: the low-order 'done bit' and the next-low-order 'busy bit'.
.again:
        movl    $guard,%eax
        testl   $1,(%eax)             // test the done bit
        jnz     .done                 // if set, variable is initialized, done
        lock; btsl $1,(%eax)          // test and set the busy bit
        jc      .busy
        < init code >                 // not busy, do the initialization
        movl    $guard,%eax
        movl    $3,(%eax)             // set the done bit
        jmp     .done
.busy:
        pushl   %eax                  // call RTS routine to wait, passing address
        call    __static_init_wait    // of guard to monitor
        testl   %eax,%eax             // 1 means exception occurred in init code,
        popl    %ecx
        jnz     .again                // start the whole thing over
.done:                                // 0 means wait finished
The above code will work for position-independent code as well. The complication due to exceptions is: what happens if the initialization code throws an exception? The [compiler] EH tables will have set up a special region and flag in their region table to detect this situation, along with a pointer to the guard variable. Because the initialization never completed, when the RTS sees that it is cleaning up from such a region, it will reset the guard variable back to both zeroes. This will free up a busy-waiting thread, if any, or will reset everything for the next thread that calls the function.
The idea of the __static_init_wait() RTS routine is to monitor the value of guard bits passed in, by looping on this decision table:
done  busy
  0     0    return 1 in %eax (EH wipe-out)
  1     1    return 0 in %eax (no longer busy)
  0     1    continue to wait (still busy)
  1     0    internal error, shouldn't happen
As for how the wait is done [... not relevant for ABI, although currently we're using thr_yield(), which may or may not be right for this context].
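The two-bit guard protocol and the decision table above can be restated in portable C++ as a rough sketch. This is not the SCO implementation: std::atomic stands in for the lock-prefixed instructions, the names (classify, guard_enter, guard_finish, guard_reset) are hypothetical, and std::this_thread::yield is used for waiting only because the text mentions thr_yield().

```cpp
#include <atomic>
#include <cassert>
#include <thread>

constexpr unsigned kDone = 1u;  // low-order 'done bit'
constexpr unsigned kBusy = 2u;  // next-low-order 'busy bit'

enum class Wait { Retry, Finished, KeepWaiting };

// One lookup in the decision table used by the wait routine.
inline Wait classify(unsigned bits) {
    const bool done = (bits & kDone) != 0;
    const bool busy = (bits & kBusy) != 0;
    if (!done && !busy) return Wait::Retry;     // EH wipe-out ("return 1")
    if (done && busy) return Wait::Finished;    // no longer busy ("return 0")
    assert(busy && "done without busy should not happen");
    return Wait::KeepWaiting;                   // still busy
}

// Returns true if the caller must run the initializer, false if the
// variable is already initialized.
inline bool guard_enter(std::atomic<unsigned>& guard) {
    for (;;) {
        if (guard.load(std::memory_order_acquire) & kDone) return false;
        const unsigned old = guard.fetch_or(kBusy, std::memory_order_acq_rel);
        if ((old & kBusy) == 0) return true;    // we set the busy bit first
        Wait w;
        while ((w = classify(guard.load(std::memory_order_acquire))) ==
               Wait::KeepWaiting)
            std::this_thread::yield();
        if (w == Wait::Finished) return false;
        // Wait::Retry: the initializer threw; start the whole thing over.
    }
}

// Called after a successful initialization (the "movl $3" step).
inline void guard_finish(std::atomic<unsigned>& guard) {
    guard.store(kDone | kBusy, std::memory_order_release);
}

// Called by exception cleanup: reset both bits so another thread can retry.
inline void guard_reset(std::atomic<unsigned>& guard) {
    guard.store(0u, std::memory_order_release);
}
```

A caller would run the initializer when guard_enter returns true, then call guard_finish on success or guard_reset if the initializer throws.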
[990608 SGI -- Hans] I'd like to make some claims about function scope static constructor calls in multithreaded environments. I personally can't recall ever having used such a construct, which somewhat substantiates my claims, but also implies some lack of certainty. I'd be interested in hearing any arguments to the contrary.
I believe that these arguments imply that this problem is not important enough to warrant added ABI complexity or overhead for sequential code.
Consider the following skeletal example:
f(int x) { static foo a(...); ... }
static foo a(...);
f(int x) { ... }
[990607 SCO -- Jonathan] Hans' argument breaks such local statics into two groups: those that don't depend upon the function's parameters, and those that do. For the latter group, he says:
> 6) Static function scope constructor calls which depend on function
> arguments are likely to involve a race condition anyway, if multiple
> instances of the function can be invoked concurrently. Any of the
> calls might determine the constructor parameters. Thus these aren't
> very interesting either. And if they are really needed, they can be
> replaced with a file scope static constructor call plus an assignment.
I don't agree with these claims. There are sometimes situations where a group of objects is being processed, and you want to arbitrarily pick one of them to serve as an identifier or key for all of them. Consider perhaps a golf course scheduler, which is taking in players and assigning them to foursomes. You want to name each foursome by one of the names of the players (it doesn't matter which one), such as the "Jones group" or the "Smith group". A natural way to program this might be:
void build_foursome(string golfer)
{
    static string group_name(golfer);
    // process golfer into group group_name ...
}
Now if the golfers being scheduled are coming from four different databases, it might be that a thread is running to extract from each database. Thus build_foursome() might be called concurrently. That's fine, and there is no need for application-level locks in either the caller or this function; we don't care which golfer the group is named after. We just want the 'static' to work correctly; what we don't want is a double initialization, with two different group names being generated for golfers in the same group, which is possible if the guard code isn't thread-safe.
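A compilable variant of the example makes the intended behavior concrete: whichever caller arrives first names the group, and later callers see that same name. The function name foursome_name is hypothetical. Note that since C++11 the language itself requires this initialization to be race-free, which was not the case when this issue was discussed.

```cpp
#include <cassert>
#include <string>

// Returns the group name. The local static is initialized exactly once,
// from the argument of whichever call reaches the declaration first.
inline const std::string& foursome_name(const std::string& golfer) {
    assert(!golfer.empty());
    static const std::string group_name(golfer);
    return group_name;
}
```

With a non-thread-safe guard, two concurrent first calls could each run the constructor, which is the double initialization the text warns about.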
Now one can say that this kind of design isn't wise, or that locks will probably be needed later in this function to do the rest of the processing, or that this can be coded in several other ways. And that may all be so. But I think this usage is *reasonable* in this context, and that as implementors we should get it right. [Editorial: Especially with the advent of Java, threaded application programming is becoming more the norm; and language implementations that dodge the challenge and say that thread support is solely the job of libraries, may not be looked upon kindly by users.]
[000511 All] The ABI will not specify special multi-threading behavior. Note that the initialization guard variable (Issue C-14) is specified with size 8 bytes, with only 1 used, so an implementation is free to make arbitrary use of the other 7 for the suggested purpose, with the consequence that initializations from multiple copies (e.g. from inlining) could be inconsistent across implementations.
[000706 All] Reopen this issue and attempt to define an API for those implementations that do want to do a thread-safe version. Jim has added a proposed API to the Draft ABI document.
[000706 HP -- Christophe]
The current HP implementation does not use a release, and has a more
specialized routine. This would be something like:
    extern "C" void __cxa_allocate_static(
        bool *flag,
        void *object_address,
        void (*object_dtor)(void *object));
The calling sequence for:
    static X x
becomes:
    static bool static_x_flag;
    static X x;
    if (!static_x_flag)
        __cxa_allocate_static(&static_x_flag,
                              &x, __addressof(X::~X));
This has the following benefits:
The function itself deals with the flag in a thread-safe way, but this requires only one mutex inside the function. This is important, since test-and-set operations are potentially costly memory-wise on IA-64 (they definitely are on PA-RISC, where any mutex / lock / whatever must be 16-byte aligned).
[000803 All]
Discussion brought out that Christophe's
__cxa_allocate_static
can't work precisely as described,
since the constructor and its arguments are also needed.
Christophe said that the actual sequence is more complex,
he removed too much to simplify the presentation,
and he will attempt to provide a fuller description.
The concern was repeated that there are objections to any automatic locking approach, and we should go back and consider them again.
[000720 All] Christophe would like to see the locking for this purpose combined with the locking required to register the initialized object with __cxa_atexit, as well as the ability to statically create the structure that will be enqueued by __cxa_atexit.
A potential interface that allows this would be the following. Expand the guard object to the following structure:
struct __cxa_guard {
long long guard; // Guard variable
void *next; // List link for destructor chain
void (*dtor) (void*); // Pointer to destruction routine
void *p; // Pointer to dtor parameter
dso_handle dhandle; // DSO handle for owning DSO
};
An implementation that chooses to implement its __cxa_atexit list with
elements matching this structure could then simply enqueue the above
structure on the list (without its initial doubleword guard).
An implementation using another structure might need to rearrange the data.
(This ABI would not specify either choice.)
The __cxa_guard_release call above would be re-specified to also
enqueue the object on the destruction list by calling __cxa_atexit or
its equivalent.
[000817 SGI -- Jim]
Note the tradeoff in the above:
It would increase the guard variable size from 8 to 40 bytes,
but would likely eliminate a bunch of instructions
to gather that data for the destructor registration call.
(But it would be a pure loss for no-destructor objects.
So perhaps we should modify it to eliminate the extra data for those,
and pass a parameter or use a byte in the guard member
to indicate that to the release routine?)
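The 8-to-40-byte figure follows from the field sizes on an LP64 target: one 8-byte guard plus four 8-byte pointers. The sketch below checks that arithmetic; the struct is renamed cxa_guard_sketch to avoid reserved identifiers, and dso_handle is assumed to be pointer-sized, which the text does not state.

```cpp
#include <cstddef>

using dso_handle_sketch = void*;  // assumption: the DSO handle is a pointer

struct cxa_guard_sketch {
    long long guard;              // guard variable (8 bytes)
    void* next;                   // list link for destructor chain
    void (*dtor)(void*);          // pointer to destruction routine
    void* p;                      // pointer to dtor parameter
    dso_handle_sketch dhandle;    // DSO handle for owning DSO
};

// 8 + 4 * 8 = 40 bytes when pointers are 8 bytes wide.
static_assert(sizeof(void*) != 8 || sizeof(cxa_guard_sketch) == 40,
              "expected 40 bytes on an LP64 target");
```

The last four fields are the part an implementation could enqueue directly on its destructor list, as described above.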
[001109 All] It was observed that the current specification of __cxa_guard_release in 3.3.2 is not adequate to cope with the case where an exception is raised and the lock must be released without marking the object initialization complete. Therefore, we will define an analogous __cxa_guard_abort that does not mark the initialization complete, so that the next thread entering the scope will obtain the lock and try again.
Since there has been no further feedback from HP on the more complicated proposal above, and the current HP attendees do not think it necessary, this issue will be closed.
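The acquire/release/abort protocol agreed above can be sketched as follows. This is a simplified stand-in, not the runtime's implementation: the names carry a sketch_ prefix to make clear they are hypothetical, the guard layout (one flag byte plus one spare byte used as a spin lock) is just one possible use of the 8 bytes, and guarded_int plays the role of the code a compiler might emit around a local static.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

struct GuardSketch {
    std::atomic<std::uint8_t> initialized{0};  // the one byte used as the flag
    std::atomic<std::uint8_t> locked{0};       // one spare byte used as a lock
    std::uint8_t unused[6] = {};
};
static_assert(sizeof(GuardSketch) == 8, "guard variable is 8 bytes");

// Returns 1 if the caller holds the lock and must initialize, 0 otherwise.
inline int sketch_guard_acquire(GuardSketch& g) {
    if (g.initialized.load(std::memory_order_acquire)) return 0;
    while (g.locked.exchange(1, std::memory_order_acquire))
        std::this_thread::yield();
    if (g.initialized.load(std::memory_order_acquire)) {
        g.locked.store(0, std::memory_order_release);
        return 0;
    }
    return 1;
}

// Marks initialization complete and releases the lock.
inline void sketch_guard_release(GuardSketch& g) {
    assert(g.locked.load() == 1);
    g.initialized.store(1, std::memory_order_release);
    g.locked.store(0, std::memory_order_release);
}

// Releases the lock without marking initialization complete.
inline void sketch_guard_abort(GuardSketch& g) {
    assert(g.locked.load() == 1);
    g.locked.store(0, std::memory_order_release);
}

// Roughly what guarded initialization of 'static int x = init();' needs.
template <class Init>
int& guarded_int(GuardSketch& g, int& storage, Init init) {
    if (sketch_guard_acquire(g)) {
        try {
            storage = init();
        } catch (...) {
            sketch_guard_abort(g);  // next entrant will try again
            throw;
        }
        sketch_guard_release(g);
    }
    return storage;
}
```

The abort path is what distinguishes this from a plain acquire/release pair: after an exception the flag byte stays zero, so a later call runs the initializer again.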
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
G-5 | Varargs routine interface | call | closed | HU-B | 990810 | 991014 |
Summary: The underlying C ABI defines conventions for calling varargs routines. Does C++ need, or would it benefit from, any modifications or special cases? How should we pass references or class objects? Is any runtime library support required? | ||||||
Resolution: No special cases required -- C++ will follow the C varargs ABI. |
[990810 HU-B Martin] I'd like to see an indirection in vararg lists, so they can be passed through thunks. This is necessary at least for the covariant returns, but might have other applications as well.
[990810 HU-B Martin] Since there already was the decision not to return a list of pointers from a covariant method, the only alternative to real thunks is code duplication (as done in Sun Workshop 5). (Or alternate entrypoints... Jim)
With real thunks, you have to copy the argument list. That is not possible for a varargs list, so here is my proposal for varargs in C++:
In the place of the ellipsis, a pointer to the first argument is passed. In case of a thunk for covariant returns, this pointer can be copied to the destination function. The variable arguments are put on the stack as they normally would.
With that, the issue is in which cases to use such a calling convention:
Option (1) could be further restricted to methods returning a pointer or reference to class type.
[990812 All] In response to a question, it was observed that passing one variant of a class hierarchy in a varargs list and referencing another variant in the va_arg macro is undefined, and we don't need to worry about a mechanism for doing the conversion.
[991014 All] We would want to reject option (3), even if it were still possible to change the base ABI. The present scheme is compatible with K&R C methods, the proposed change would not be.
Decision: Close with no action. We're using multiple entry points for covariant return types, not thunks, so there's no need for doing anything different for varargs functions with covariant return types than for any other varargs functions.
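Since C++ simply follows the C varargs conventions, an ordinary variadic function written with <cstdarg> is all that is involved. A minimal sketch, with the hypothetical name sum_ints:

```cpp
#include <cassert>
#include <cstdarg>

// Sums 'count' int arguments passed through the ellipsis, using the same
// va_list mechanism a C function would use.
inline long sum_ints(int count, ...) {
    assert(count >= 0);
    va_list args;
    va_start(args, count);
    long total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);
    va_end(args);
    return total;
}
```

As noted in the discussion, the type named in va_arg must match what the caller passed; retrieving one class of a hierarchy where another was passed is undefined.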
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
G-6 | bool parameters | call | closed | all | 991104 | 991202 |
Summary: How should we pass bool parameters on IA-64? Choices are to pass them like ABI ints, or in predicate registers or register pairs. | ||||||
Resolution: No special treatment -- pass bool like char. |
[991202 All] It was decided not to treat bool parameters specially, i.e. they will be passed like chars.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
H-1 | Runtime library DSO name | tools | closed | SGI | 990616 | 000817 |
Summary: Determine the name of the common C++ runtime library DSO, e.g. libC.so. If there are to be vendor-specific support libraries which must coexist in programs from mixed sources, identify naming convention for them. | ||||||
Resolution: The runtime library will be named libcxa.so. |
[000817 All] Agreed to name the library libcxa.so.