C++ ABI Closed Issues
Revised 17 November 2000
call | Function call interface, i.e. call linkage |
data | Data layout |
lib | Runtime library support |
lif | Library interface, i.e. API |
g | Potential gABI impact |
ps | Potential psABI impact |
source | Source code conventions (i.e. API, not ABI) |
tools | May affect how program construction tools interact |
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-1 | Vptr location | data | closed | SGI | 990520 | 990624 |
Summary: Where is the Vptr stored in an object (first or last are the usual answers). |
[990610 All] Given the absence of addressing modes with displacements on IA-64, the consensus is to answer this question with "first."
[990617 All] Given a Vptr and only non-polymorphic bases, which (Vptr or base) goes at offset 0?
Tentative decision: Vptr always goes at beginning.
[990624 All] Accepted tentative decision. Rename, close this issue, and open separate issue (B-6) for Vtable layout.
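As a rough illustration of the decision above, the following C++ sketch measures where the first data member of a polymorphic class lands. The names `Poly` and `member_offset` are hypothetical and do not come from the ABI. The expectation that the member starts at or after `sizeof(void*)` assumes a compiler that places the Vptr at the beginning of the object.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical polymorphic class: the virtual destructor forces a Vptr.
struct Poly {
    virtual ~Poly() {}
    int value = 0;
};

// Byte offset of Poly::value within a Poly object, computed from addresses.
inline std::size_t member_offset(const Poly& p) {
    const char* base = reinterpret_cast<const char*>(&p);
    const char* member = reinterpret_cast<const char*>(&p.value);
    return static_cast<std::size_t>(member - base);
}
```

With the Vptr first, `member_offset` is expected to report at least the size of a pointer, since the data member has to follow the Vptr.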
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-2 | Virtual base classes | data | closed | SGI | 990520 | 990624 |
Summary: Where are the virtual base subobjects placed in the class layout? How are data member accesses to them handled? |
[990610 Matt] With regard to how data member accesses are handled, the choices are to store either a pointer or an offset in the Vtable. The consensus seems to be to prefer an offset.
[990617 All] Any number of empty virtual base subobjects (rare) will be placed at offset zero. If there are no non-virtual polymorphic bases, the first virtual base subobject with a Vpointer will be placed at offset zero. Finally, all other virtual base subobjects will be allocated at the end of the class, left-to-right, depth-first.
[990624 All] Define an empty object as one with no non-static, non-empty data members, no virtual functions, no virtual base classes, and no non-empty non-virtual base classes. Define a nearly empty object as one which contains only a Vptr. The above resolution is accepted, restated as follows:
Any number of empty virtual base subobjects (rare, because they cannot have virtual functions or bases themselves) will be placed at offset zero, subject to the conflict rules in A-3 (i.e. this cannot result in two objects of the same type at the same address). If there are no non-virtual polymorphic base subobjects, the first nearly empty virtual base subobject will be placed at offset zero. Any virtual base subobjects not thus placed at offset zero will be allocated at the end of the class, in left-to-right, depth-first declaration order.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-3 | Multiple inheritance | data | closed | SGI | 990520 | 990701 |
Summary: Define the class layout in the presence of multiple base classes. |
[990617 All] At offset zero is the Vptr whenever there is one, as well as the primary base class if any (see A-7). Also at offset zero is any number of empty base classes, as long as that does not place multiple subobjects of the same type at the same offset. If there are multiple empty base classes such that placing two of them at offset zero would violate this constraint, the first is placed there. (First means in declaration order.)
All other non-virtual base classes are laid out in declaration order at the beginning of the class. All other virtual base subobjects will be allocated at the end of the class, left-to-right, depth-first.
The above ignores issues of padding for alignment, and possible reordering of class members to fit in padding areas. See issue A-9.
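The constraint on empty bases can be sketched in C++ as follows. All type and function names here are hypothetical. The sketch only observes addresses: an empty base may share offset zero with the rest of the object, but two subobjects of the same type never share an address. The size expectation for `Holder` assumes a compiler that places the empty base at offset zero as described.

```cpp
#include <cassert>

struct E {};                     // empty class
struct A : E {};                 // empty, contains one E base
struct B : E {};                 // empty, contains another E base
struct Holder : E { int x; };    // one empty base: E can sit at offset zero
struct Both : A, B {};           // two distinct E subobjects

// True when the two E subobjects of a Both object have different addresses.
inline bool e_subobjects_distinct(Both& obj) {
    E* via_a = static_cast<A*>(&obj);
    E* via_b = static_cast<B*>(&obj);
    return via_a != via_b;
}

// True when the empty base of Holder starts at the same address as the object.
inline bool empty_base_at_zero(Holder& h) {
    return static_cast<void*>(static_cast<E*>(&h)) == static_cast<void*>(&h);
}
```

In `Both`, only the first `E` (reached through `A`) can be at offset zero; the one reached through `B` has to move, which is the case the declaration-order rule resolves.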
[990624 All] There remains an issue concerning the selection of the primary base class (see A-7), but we are otherwise in agreement. We will attempt to close this on 1 July, modulo A-7.
[990701 All] This issue is closed. A full description of the class layout can be found in issue A-9. (At this time, A-7 remains to be closed, waiting for the Taligent rationale.)
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-4 | Empty base classes | data | closed | SGI | 990520 | 990624 |
Summary: Where are empty base classes allocated? (An empty base class is one with no non-static data members, no virtual functions, no virtual base classes, and no non-empty non-virtual base classes.) |
[990624 All] Closed as a duplicate of A-3.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-5 | Empty parameters | data | closed | SGI | 990520 | 001117 |
Summary: When passing a parameter with an empty class type by value, what is the convention? | ||||||
Resolution: Except for cases of non-trivial copy constructors (see C-7) and parameters in the variable part of varargs lists, a single parameter slot will be allocated to empty parameters, as though they were a struct containing a single character. |
[990623 SGI] We propose that no parameter slot be allocated to such parameters, i.e. that no register be used, and that no space in the parameter memory sequence be used. This implies that the callee must allocate storage at a unique address if the address is taken (which we expect to be rare).
[990624 All] In addition to the address-taken case, care is required if the object has a non-trivial copy constructor. HP observes that in (some?) such cases, they perform the construction at the call site and pass the object by reference.
[990625 SGI -- Jim] I understand that the Standard explicitly allows elimination of even non-trivial copy construction in some cases. Is this one of them? Where should I look? Also, of course, varargs processing for elided empty parameters would need to be careful.
I have opened a new issue (C-7) for passing copy-constructed parameters by reference. Since doing so would turn an empty value parameter into a non-empty reference parameter, this issue can ignore such cases.
[990701 All] An empty parameter will not occupy a slot in the parameter sequence unless:
Daveed and Matt will pursue the question of when copy constructors may be ignored for parameters with the Core committee, and if they identify cases where the constructors may clearly be omitted, those (empty) parameters will also be elided.
[001109 CodeSourcery -- Mark] Both g++ and the HP compiler have great difficulty dealing with this, and prefer to reserve the parameter slot even for empty parameters. At the meeting, we tentatively decided to reverse our decision and allocate an integer parameter slot even for empty parameters. We will place no constraints on the data in the parameter slot, except that on IA-64, it must not be NaT data.
[001117 All -- Jim] There having been no objection to the proposed resolution, it is adopted. Results will be treated the same way.
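A small C++ sketch of the situation being resolved: an empty class passed by value, compared with the single-character struct the resolution uses as its model. The names `Empty`, `OneChar`, and `has_distinct_address` are hypothetical. The size check assumes a typical implementation, where an empty class occupies one byte.

```cpp
#include <cassert>

struct Empty {};               // empty class type
struct OneChar { char c; };    // the stand-in the resolution compares it to

// On typical implementations an empty class has the size of a single char.
static_assert(sizeof(Empty) == sizeof(OneChar),
              "assumed: an empty class occupies one byte");

// Two empty by-value parameters are distinct objects, so the callee can
// still observe distinct addresses for them if it takes their addresses.
inline bool has_distinct_address(Empty a, Empty b) { return &a != &b; }
```

Reserving a slot per empty parameter keeps such a call identical in shape to one that passes `OneChar` values, which is the simplification g++ and HP asked for.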
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-6 | RTTI .o representation | data call ps | closed | SGI | 990520 | 991028 |
Summary:
Define the data structure to be used for RTTI, that is:
| ||||||
Resolution: Defined in the Draft C++ ABI for IA-64. |
[990701 All] Daveed will put together a proposal by the 15th (action #13); the group will discuss it on the 22nd.
[990805 All] Daveed should have his proposal together for discussion. Michael Lam will look into the Sun dynamic cast algorithm.
It was noted that appropriate name selection along with the normal DSO global name resolution should be sufficient to produce a unique address for each class' RTTI struct, which address would then be a suitable identifier for comparisons.
[990812 Sun -- Michael] Sun has provided a description, in a separate page, describing their implementation. They are filing for a patent on the algorithms described.
[990819 EDG -- Daveed] (Proposal replaced by later version on 6 October.)
[990826 All] Discussion centered on whether the representation should include all base classes or just the direct ones, and in the former case how hashing might be handled. It was agreed that the __qualifier_type_info variant is not needed, and it is now stricken in the above proposal. Also, a pointer-to-member variant is needed. Christophe will provide a description of the HP hashing approach, and Daveed will update the specification.
[991006 EDG -- Daveed]
The C++ programming language definition implies that information about types be available at run time for three distinct purposes:
The following conclusions were arrived at by the attending members of the C++ IA-64 ABI group:
The full proposal has been incorporated in the Draft C++ ABI for IA-64.
[991014 all]
ACTION ITEMS: Daveed---make these changes. Jim---incorporate these changes into the open issues list. We are almost ready to close this issue; we intend to close it at the 28 October meeting, after we've all had a chance to go over the modified writeup.
[991028 all]
[990617 All] It will be shared with the first polymorphic non-virtual base class, or if none, with the first nearly empty polymorphic virtual base class. (See A-2 for the definition of nearly empty.)
[990624 All] HP noted that Taligent chooses a base class with virtual bases before one without as the primary base class, probably to avoid additional "this" pointer adjustments. SGI observed that such a rule would prevent users from controlling the choice by their ordering of the base classes in the declaration. The bias of the group remains the above resolution, but HP will attempt to find the Taligent rationale before this is decided.
[990729 All] Close with the agreed resolution. If a convincing Taligent rationale is found, we can reconsider.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-8 | (Virtual) base class alignment | data | closed | HP | 990603 | 990624 |
Summary: A (virtual) base class may have a larger alignment constraint than a derived class. Do we agree to extend the alignment constraint to the derived class? (An alternative for virtual bases: allow the virtual base to move in the complete object.) |
[990623 SGI] We propose that the alignment of a class be the maximum alignment of its virtual and non-virtual base classes, non-static data members, and Vptr if any.
[990624 All] Above proposal accepted. (SGI observation: the size of the class is rounded up to a multiple of this alignment, per the underlying psABI rules.)
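The accepted rule can be checked with a short C++ sketch. The type names are hypothetical, and the value 16 is just an example alignment chosen to exceed that of a pointer and a `double`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical base with a 16-byte alignment requirement.
struct alignas(16) Wide { char bytes[16]; };

// Non-virtual base, a Vptr (from the virtual destructor), and a double member.
struct Derived : Wide {
    virtual ~Derived() {}
    double d;
};

// The same base inherited virtually.
struct VDerived : virtual Wide { int i; };

// The class alignment is the maximum over bases, members, and the Vptr.
static_assert(alignof(Derived) == 16, "alignment extends from the base");
static_assert(alignof(VDerived) == 16, "also for a virtual base");

// The size is rounded up to a multiple of that alignment.
static_assert(sizeof(Derived) % alignof(Derived) == 0, "size is padded");
static_assert(sizeof(VDerived) % alignof(VDerived) == 0, "size is padded");
```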
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-9 | Sorting fields as allowed by [class.mem]/12 | data | closed | HP | 990603 | 990624 |
Summary: The standard constrains ordering of class members in memory only if they are not separated by an access clause. Do we use an access clause as an opportunity to fill the gaps left by padding? | ||||||
Resolution: See separate writeup of Draft C++ ABI for IA-64. |
[990610 all] Some participants want to avoid attempts to reorder members differently than the underlying C struct ABI rules. Others think there may be benefit in reordering later access sections to fill holes in earlier ones, or even in base classes.
[990617 all] There are several potential reordering questions, more or less independent:
There is no apparent support for (1), since no simple heuristic has been identified with obvious benefits. There is interest in (2), based on a simple heuristic which might sometimes help and will never hurt. However, it is not clear that it will help much, and Sun objects on grounds that they prefer to match C struct layout. Unless someone is interested enough to implement and run experiments, this will be hard to agree upon. G++ has implemented (3) as an option, based on specific user complaints. It clearly helps HP's example of a base class containing a word and flag, with a derived class adding more flags. Idea (4) has more problems, including some non-intuitive (to users) layouts, and possibly complicating the selection of bitwise copy in the compiler.
[990624 all] We will not do (1), (2), or (4). We will do (3). Specifically, allocation will be in modified declaration order as follows:
[990722 all] The precise placement of empty bases when they don't fit at offset zero remained imprecise in the original description. Accordingly, a precise layout algorithm is described in a separate writeup of Data Layout.
[990729 all] The layout writeup was accepted, with the first choice for empty base placement. That is, if placement at offset zero doesn't work, it will be placed like a normal base/member. The consensus was that this won't happen often, and such bases will often overlap with the preceding tail padding or following components anyway. Jim will modify the writeup accordingly.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-10 | Class parameters in registers | call | closed | HP | 990603 | 990710 |
Summary: The C ABI specifies that structs are passed in registers. Does this apply to small non-POD C++ objects passed by value? What about the copy constructor and this pointer in that case? |
[990701 all] A separate issue (C-7) deals with cases where a non-trivial copy constructor is required; we ignore those cases here. Our conclusion is that, without a non-trivial copy constructor, we need not be concerned about the class object moving in the process of being passed, and there is no need to use a mechanism different from the base ABI C struct mechanism. At the same time, if we do use the underlying C struct mechanism, the user has complete control of the passing technique, by choosing whether to pass by value or reference/pointer.
Therefore, except in cases identified by issue C-7 for different treatment, class parameters will be passed using the underlying C struct protocol.
[990729 All] Jason described the g++ implementation, which is a three-member struct:
A concern about covariant returns was raised. It was observed that, given our decision to use distinct Vtable entries for distinct return types, no further concern is required here. Others will describe their representations. IBM has an alternative, but it is believed to be patented by Microsoft.
[990805 All] It is agreed that a two-element struct will be used for a pointer to a member function, with elements as follows:
ptr
:
adj
:
Although we agreed to close this, SGI suggests a minor modification. Since the Vtable offset of a virtual function will always be even, we suggest that it not be doubled before adding 1. This is because shifts are more restricted on many processors than other integer ALU operations (shifters are large structures), so an XOR or NAND will often be cheaper than a right shift.
[990812 All] Close this issue with the suggested modification.
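The encoding discussed above can be sketched in C++ as follows. This is only an illustration under stated assumptions: the struct `MemFnPtrRep` and its helpers are hypothetical names, the meaning given to `adj` (an adjustment to `this`) is an assumption, and the sketch assumes that non-virtual function addresses are even so that the low bit of `ptr` can distinguish the virtual case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the two-element representation.
struct MemFnPtrRep {
    std::intptr_t ptr;   // assumed: function address, or Vtable offset + 1
    std::ptrdiff_t adj;  // assumed: adjustment applied to `this`
};

// Encode a virtual function by its (even) Vtable offset, plus one.
inline MemFnPtrRep make_virtual(std::ptrdiff_t offset, std::ptrdiff_t adj) {
    MemFnPtrRep r;
    r.ptr = static_cast<std::intptr_t>(offset) + 1;
    r.adj = adj;
    return r;
}

// The low bit is set only for the virtual encoding.
inline bool is_virtual(const MemFnPtrRep& r) { return (r.ptr & 1) != 0; }

// Recover the Vtable offset with an XOR instead of a shift.
inline std::ptrdiff_t vtable_offset(const MemFnPtrRep& r) {
    return static_cast<std::ptrdiff_t>(r.ptr ^ 1);
}
```

Because the offset is not doubled, decoding needs no shift, which is the point of the SGI modification.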
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-12 | Merging secondary vtables | data | closed | Sun | 990610 | 990805 |
Summary: Sun merges the secondary Vtables for a class (i.e. those for non-primary base classes) with the primary Vtable by appending them. This allows their reference via the primary Vtable entry symbol, minimizing the number of external symbols required in linking, in the GOT, etc. | ||||||
Resolution: Concatenate the Vtables associated with a class in the same order that the corresponding base subobjects are allocated in the object. |
[990701 Michael Lam] Michael will check what the Sun ABI treatment is and report back.
[990729 All] A separate issue raised in conjunction with A-7 is whether to include Vfunc pointers in the primary Vtable for functions defined only in the base classes and not overridden. If the primary and secondary Vtables are concatenated, this is no longer an issue, since all can be referenced from the primary Vptr.
[990805 All] All of the Vtables associated with a class will be concatenated, and a single external symbol used (to be identified as part of the mangling issue F-1). The order of the tables will be the same as the order of base class subobjects in an object of the class, i.e. first the primary Vtable, then the non-virtual base classes in declaration order, and finally the virtual base classes in depth-first declaration order.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-13 | Parameter struct field promotion | call | closed | SGI | 990603 | 990701 |
Summary: It is possible to pass small classes either as memory images, as is specified by the base ABI for C structs, or as a sequence of parameters, one for each member. Which should be done, and if the latter, what are the rules for identifying "small" classes? | ||||||
Resolution: No special treatment will be specified by the ABI. |
[990701 all] Define no special treatment for this case in the ABI. A translator with control over both caller and callee may choose to optimize.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-14 | Pointers to data members | data | closed | SGI | 990729 | 990805 |
Summary: How should pointers to data members be represented? | ||||||
Resolution: Represented as one plus the offset from the base address. |
[990729 SGI] We suggest an offset from the base address of the class, represented as a ptrdiff_t.
[990805 All] Such pointers are represented as one plus the offset from the base address of the class, as a ptrdiff_t. NULL pointers are zero.
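A minimal C++ sketch of the representation recorded in this issue, using hypothetical helper names (`encode_member`, `decode_member`, `apply`). It models the encoding only and is not any compiler's actual interface.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical name for the ptrdiff_t-sized representation.
typedef std::ptrdiff_t DataMemberPtrRep;

// NULL pointers to data members are zero.
const DataMemberPtrRep kNullMemberPtr = 0;

// A non-null pointer is one plus the member's offset in the class.
inline DataMemberPtrRep encode_member(std::ptrdiff_t offset) { return offset + 1; }
inline bool is_null(DataMemberPtrRep r) { return r == kNullMemberPtr; }
inline std::ptrdiff_t decode_member(DataMemberPtrRep r) { return r - 1; }

// Apply an encoded pointer to an object address to reach the member.
inline void* apply(void* object, DataMemberPtrRep r) {
    return static_cast<char*>(object) + decode_member(r);
}
```

Adding one keeps a member at offset zero distinguishable from the NULL value.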
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-15 | Empty bit-fields | data | closed | CodeSourcery | 991214 | 000106 |
Summary: How are zero-length bit-fields handled? | ||||||
Resolution: Zero-length bit-fields do not prevent a class from being considered empty or nearly empty. |
[991214 CodeSourcery -- Mark]
Question: Does the presence of a zero-width bit-field prevent a class from being empty?
Suggested Resolution: No. Amend the definition of an "empty class" to read:
Amend the definition of a "nearly empty class" to read:
[000106 All] Accept the CodeSourcery proposal.
[000106 All] Accept the proposal.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-17 | Primary indirect virtual base allocation | data | closed | SGI | 991228 | 000113 |
Summary: When a nearly empty virtual base class A is allocated as the primary base class of class B, and then B is allocated as a base class of C, should A (i.e. its vptr) be separately allocated in C, or should its first occurrence in a previously allocated base B be used as its allocation in C? | ||||||
Resolution: Do not reallocate a nearly empty virtual base class that is the primary base class of any other base class, direct or indirect. Use the first primary base class instance in the inheritance hierarchy as its allocation, in the usual depth-first, left-to-right order. |
[991228 SGI -- Jim] Specific wording for a proposed change is in the Draft C++ ABI for IA-64.
[000103 CodeSourcery -- Mark]
I think the current proposal for allocating virtual bases is still a little suboptimal. In particular, given:
struct A { void f(); };
struct B : virtual public A { };
struct C : virtual public A, virtual public B { };
we'll give `C' a larger size than for:
struct C : virtual public B, virtual public A { };
where we'll reuse the `A' part of `B' rather than reallocating it.
I know that ordering can already affect size (principally because of alignment issues) but I think that in this case we might as well not punish programmers for choosing the "wrong" ordering.
I think we should change the proposed A-17 resolution to indicate that if one of the virtual bases is a (direct or indirect) primary base of one of the other virtual bases then we need not allocate a fresh copy.
FWIW, it turns out to actually be easier in GCC to code the more generous version.
The algorithm to do this is linear in the size of the hierarchy: just iterate through the inheritance DAG marking all primary bases. Any virtual base classes that remain unmarked need to be allocated in step III. A slight formalization of this sentence might be a good way to express which bases to choose for III.
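The marking pass described above might be sketched in C++ like this. It is a toy model under several assumptions: `ClassNode`, `BaseRef`, and `virtual_bases_to_allocate` are hypothetical names, the primary base is recorded as an index into the direct bases (so indirect primary bases are not modelled), and each class is assumed to appear either only as a virtual base or only as a non-virtual base.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct ClassNode;

struct BaseRef {
    const ClassNode* cls;  // the base class
    bool is_virtual;       // true for "virtual public X"
};

struct ClassNode {
    std::string name;
    std::vector<BaseRef> bases;  // direct bases in declaration order
    int primary = -1;            // index of the primary base in bases, or -1
};

// Pass 1: mark every virtual base that is the primary base of some class.
static void mark_primary_bases(const ClassNode& c,
                               std::set<const ClassNode*>& visited,
                               std::set<const ClassNode*>& marked) {
    if (!visited.insert(&c).second) return;  // visit each class once
    if (c.primary >= 0 && c.bases[c.primary].is_virtual)
        marked.insert(c.bases[c.primary].cls);
    for (const BaseRef& b : c.bases) mark_primary_bases(*b.cls, visited, marked);
}

// Pass 2: walk depth-first, left-to-right, keeping unmarked virtual bases.
static void collect_unmarked(const ClassNode& c,
                             const std::set<const ClassNode*>& marked,
                             std::set<const ClassNode*>& seen,
                             std::vector<std::string>& out) {
    for (const BaseRef& b : c.bases) {
        if (!seen.insert(b.cls).second) continue;  // already handled
        if (b.is_virtual && marked.count(b.cls) == 0) out.push_back(b.cls->name);
        collect_unmarked(*b.cls, marked, seen, out);
    }
}

// Names of the virtual bases that still need their own allocation.
inline std::vector<std::string> virtual_bases_to_allocate(const ClassNode& root) {
    std::set<const ClassNode*> visited, marked, seen;
    std::vector<std::string> out;
    mark_primary_bases(root, visited, marked);
    collect_unmarked(root, marked, seen, out);
    return out;
}
```

For a hierarchy shaped like the one in the message, and assuming `A` is nearly empty so that it can serve as a primary base, both declaration orders of `C` avoid a fresh allocation for `A`; only the unmarked virtual bases are returned.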
[000113 All]
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-18 | Virtual base alignment | data | closed | SGI | 991228 | 000113 |
Summary: Should virtual bases have a different effect on class alignment than other components? | ||||||
Resolution: Yes. When allocating the non-virtual part of a base class, use its non-virtual alignment, i.e. ignoring its virtual bases' contributions. |
[991228 SGI -- Jim] Since the allocation of virtual bases is "floating" relative to the classes in which they occur, it is possible for them to have independent alignment constraints. Specifically, when allocating a base class with a virtual base, we could treat its alignment as that obtained by ignoring the virtual base, and later allocate the virtual base with greater alignment.
Since the class with a virtual base already has a vptr, this only matters if the virtual base contains components more strictly aligned than a pointer. Thus, the benefit of doing so is probably not large. To get some idea of the effect on the layout definition, look at dsize and nvsize, and assume a similar pair of alignment values.
[000106 All] No strong opinions were expressed on this issue. We will decide it at the next meeting after people have a chance to think it over. The bias will be to keep the current simpler definition.
[000113 All] It turns out that both Compaq and someone else (Cygnus?) already do this, find it straightforward, and prefer to keep it. Therefore, accept the suggestion that when allocating the non-virtual part of a base class, we use its non-virtual alignment, i.e. ignoring its virtual bases' contributions.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-19 | Primary indirect virtual base choice | data | closed | All | 000106 | 000120 |
Summary: In allocating class C, when the first nearly empty virtual base class A is allocated as the primary base class of a later nearly empty virtual base class B, should A or B become the primary base class of C? | ||||||
Resolution: Do not use a virtual base as primary if it is already a primary base of some other direct or indirect base, unless such are the only candidates. In either case, use the first candidate in depth-first, left-to-right order in the inheritance graph. |
[000106 All] This issue was initially confused in the discussion with A-17, but is independent. Recall that non-virtual bases have priority over virtual bases for selection as the primary base. Assuming that no non-virtual base is suitable, this issue involves which virtual base should be selected. Our original decision was to use the first in left-to-right order.
The proposal here is that, if this initial candidate A is itself already a primary base class of a later virtual base B, then B will be used instead, unless it is already a primary base class of a later virtual base, and so on. See proposed wording in the ABI layout document.
No one can identify a case in which this approach is worse than the original definition.
[000113 All] The proposed resolution on the table is to use the following priority to choose the primary base class:
[000113 All] Modify the above to use any virtual base in the inheritance graph, first one that is not already primary to some base if possible, or then any candidate, chosen as the first in a depth-first, left-to-right inheritance graph walk.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-20 | Operator new array cookies | data | closed | All | 000113 | 000120 |
Summary: When operator new is used to create a new dynamic-length array, a cookie must be stored to remember the allocated length so that it can be deallocated correctly. | ||||||
Resolution: In principle, place cookie immediately before array, aligned naturally. Use no cookie for array element types without destructors. See the Draft C++ ABI for IA-64. |
[000113 All] The proposed resolution is as follows:
sizeof(size_t).
align be the maximum alignment of size_t and an element of the array to be allocated.
align bytes.
align bytes.
align bytes from the space allocated for the array.
sizeof(size_t) bytes immediately preceding the array data.
sizeof(size_t) is smaller than the array element alignment, and if present will precede the cookie.
[000120 All] Accept the above.
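The cookie placement summarized in the resolution can be illustrated with a small C++ calculation. This is a sketch under assumptions, not the accepted wording: `ArrayNewLayout` and `plan_array_new` are hypothetical names, and the rule that the space reserved before the array is the larger of `sizeof(size_t)` and `align` is an assumption made here to keep both the cookie and the array data naturally aligned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical description of one array allocation.
struct ArrayNewLayout {
    bool has_cookie;            // false for element types without destructors
    std::size_t padding;        // bytes between allocation start and array data
    std::size_t cookie_offset;  // cookie position, meaningful if has_cookie
    std::size_t total_bytes;    // bytes requested from operator new[]
};

inline ArrayNewLayout plan_array_new(std::size_t n, std::size_t elem_size,
                                     std::size_t elem_align, bool has_destructor) {
    ArrayNewLayout out;
    out.has_cookie = has_destructor;
    if (!has_destructor) {  // no cookie: the array starts at the allocation
        out.padding = 0;
        out.cookie_offset = 0;
        out.total_bytes = n * elem_size;
        return out;
    }
    // align: maximum alignment of size_t and of an array element.
    const std::size_t align = std::max(alignof(std::size_t), elem_align);
    // Assumed rule: reserve enough room for the cookie and for alignment.
    out.padding = std::max(sizeof(std::size_t), align);
    // The cookie sits in the sizeof(size_t) bytes just before the array data;
    // any extra padding therefore precedes the cookie.
    out.cookie_offset = out.padding - sizeof(std::size_t);
    out.total_bytes = n * elem_size + out.padding;
    return out;
}
```

For elements whose alignment is no stricter than that of `size_t`, the cookie is the only overhead; for more strictly aligned elements, padding appears in front of the cookie.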
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-21 | Placement new array cookies | data | closed | All | 000113 | 000217 |
Summary: Same issue as A-20, except that for placement new, the user supplies already-allocated space. Therefore, there is a conflict between wanting to make delete() work on arrays created in this way, and wanting to avoid surprising users who haven't allocated enough space for the cookie. Also, are cookies allocated if there is no destructor? | ||||||
Resolution: Use no cookie for element types with no destructors, nor for ::operator new(size_t, void*). Otherwise, use a cookie as in issue A-20. See the Draft C++ ABI for IA-64. |
[000119 SGI -- Matt]
What the standard says (3.7.3.1, 5.3.4, and 18.4.1.3)
Array placement new has the form "new(ARGS) T[n]". The "(ARGS)" part is optional. If it's present then this is a placement new-expression, and we use a version of operator new[] with two or more arguments, otherwise it's an ordinary new-expression, and we use a version of operator new[] with one argument. For the purposes of this proposal, the distinction isn't all that important.
After finding the appropriate operator new, a new-expression obtains storage with
void* p = operator new[](n1, ARGS),
It is required (3.7.3.1/2) that the return value of any operator new[], whether it's built-in or provided by the user, must be suitably aligned for objects of any type.
If T is "char" or "unsigned char" the standard requires that delta is a nonnegative multiple of the most stringent alignment constraint for objects of size less than or equal to n (5.3.4/10). Otherwise the only restriction is that delta is nonnegative.
Some implementations store the number of elements in the array at a negative offset from p1. The standard neither requires nor forbids it.
There's a predefined placement version of array operator new,
::operator new[](size_t n1, void* p),
IA-64 Specifics
On IA-64 long double is 80 bits. long double has 128-bit alignment, as do classes and unions containing long double, so sizeof(long double) is 16. All other types have at most 64-bit alignment.
What the ABI needs to specify
Proposal A
No version of operator new[] is a special case. For any array new-expression we store the number of elements in the array, as a size_t, at an offset of -sizeof(size_t) from the pointer returned by the new-expression. For any type T other than char, unsigned char, long double, or a type containing a long double, n1 = n * sizeof(T) + sizeof(size_t). For those three types, since we need to preserve long double alignment, n1 = n * sizeof(T) + sizeof(long double).
Pseudocode for new(ARGS) T[n] under this proposal:
if T = char or unsigned char, or if it has long double alignment,
    padding = sizeof(long double)
else
    padding = sizeof(size_t)
p = operator new[](n * sizeof(T) + padding, ARGS)
p1 = (T*) (p + padding)
*((unsigned long*) p1 - 1) = n
for i = [0, n)
    create a T, using the default constructor, at p1[i]
return p1
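For readers who want something compilable, the Proposal A pseudocode might be rendered in C++ roughly as below. The function names are hypothetical, the placement arguments (ARGS) are omitted, and exception safety is ignored, so this is a sketch of the bookkeeping only. The count is copied with `memcpy` rather than written through a cast pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>

// Padding rule of Proposal A: char types and long-double-aligned types
// reserve sizeof(long double); everything else reserves sizeof(size_t).
template <typename T>
constexpr std::size_t proposal_a_padding() {
    return (std::is_same<T, char>::value ||
            std::is_same<T, unsigned char>::value ||
            alignof(T) >= alignof(long double))
               ? sizeof(long double)
               : sizeof(std::size_t);
}

// Allocate n default-constructed elements with the count stored before them.
template <typename T>
T* array_new_a(std::size_t n) {
    const std::size_t padding = proposal_a_padding<T>();
    char* p = static_cast<char*>(::operator new[](n * sizeof(T) + padding));
    char* p1 = p + padding;
    std::memcpy(p1 - sizeof(std::size_t), &n, sizeof n);  // store the count
    T* elems = reinterpret_cast<T*>(p1);
    for (std::size_t i = 0; i < n; ++i) new (static_cast<void*>(elems + i)) T();
    return elems;
}

// Read the element count back from just before the array.
template <typename T>
std::size_t array_count_a(const T* p1) {
    std::size_t n;
    std::memcpy(&n, reinterpret_cast<const char*>(p1) - sizeof(std::size_t), sizeof n);
    return n;
}

// Destroy the elements in reverse order and release the original block.
template <typename T>
void array_delete_a(T* p1) {
    for (std::size_t i = array_count_a(p1); i > 0; --i) p1[i - 1].~T();
    ::operator delete[](reinterpret_cast<char*>(p1) - proposal_a_padding<T>());
}
```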
Proposal B
::operator new[](size_t, void*) is a special case. For that version of operator new[] only, n1 = n * sizeof(T). We do not store the number of elements in such an array anywhere.
Pseudocode for new(ARGS) T[n] under this proposal:
If the expression is new(p) T[n], and if overload resolution determines we're using ::operator new[](size_t, void*), then
    p1 = (T*) p
    for i = [0, n)
        create a T, using the default constructor, at p1[i]
    return p1
For all other cases, same as proposal A.
Proposal A is simpler, but proposal B probably conforms more closely to user expectations.
[000210 All -- Matt] We agreed that Proposal B, where ::operator new(size_t, void*) is a special case with no cookie, is preferable to Proposal A, where all versions of array new get cookies.
We also agreed to the variation where we don't reserve space for a cookie if the type has no destructor. We're calling it Proposal C. We need a writeup, but we should be able to close this issue next week.
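The user-visible effect of the special case can be observed with a short C++ sketch. The type `Tracked` and the helper are hypothetical. The expectation that the array begins exactly at the supplied buffer assumes a compiler that follows this agreement; the test buffer should be made larger than strictly needed in case it does not.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical element type with a non-trivial destructor, i.e. one for
// which an ordinary array new would use a cookie.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Construct n objects in caller-supplied storage using the predefined
// placement form, and report whether the array starts at the buffer itself.
inline bool placement_array_starts_at_buffer(void* buffer, std::size_t n) {
    Tracked* arr = new (buffer) Tracked[n];
    const bool same = static_cast<void*>(arr) == buffer;
    for (std::size_t i = n; i > 0; --i) arr[i - 1].~Tracked();  // manual cleanup
    return same;
}
```

With no cookie, a buffer of `n * sizeof(T)` bytes is enough, which is what users of placement new generally expect.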
[000302 CodeSourcery -- Mark]
In particular, there are situations in which we do not allocate cookies, even when allocating arrays of class type. But, the standard guarantees that:
When a delete-expression is executed, the selected deallocation function shall be called with the address of the block of storage to be reclaimed as its first argument and (if the two-parameter style is used) the size of the block as its second argument.
That paragraph doesn't require that the class type have a non-trivial destructor.
I think that means the first bullet:
(Note: if the usual array deallocation function takes two arguments, then its second argument is of type size_t. The standard guarantees that this function will be passed the number of bytes allocated with the previous array new expression. See [class.free] for details.)
[000302 All]
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-22 | RTTI for reference types | data | closed | CodeSourcery | 000119 | 000203 |
Summary: __reference_type_info does not appear to be necessary. | ||||||
Resolution: Remove it. |
[000119 CodeSourcery -- Nathan] When would a type_info of a reference ever be generated? (So why __ref_type_info?)
[000126 CodeSourcery -- Nathan]
[000128 Cygnus -- Jason] Based on that, I definitely think reference type_info can go away.
[000203 All] Remove __ref_type_info.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-23 | RTTI class descriptors | data | closed | CodeSourcery | 000124 | 000302 |
Summary: Resolve several questions about the RTTI representation of class types. | ||||||
Resolution: See the Draft C++ ABI for IA-64. |
[000124 CodeSourcery -- Nathan] si_class_type_info is for a single nonvirtual inheritance hierarchy. Presumably this single non-virtual inheritance is between the derived and the base (the base may or may not have multiple or virtual bases). An additional constraint is that, if the derived class is polymorphic, the base class is too. Rationale: if the derived class adds polymorphism, the base will be at a non-zero offset.
[000126 CodeSourcery -- Nathan] More useful for dynamic cast (and possibly catch matching) {than the current set of flags -- editor} would be the following flags:
Note that the virtual/non-virtual and public/non-public are not mutually exclusive. Also note that I have not actually implemented anything with these flags, so I could be wrong.
[class.mi] (clause 10.1) provides good examples of "diamond shaped."
Paragraph 4 gives a non-diamond shaped graph with multiple base objects.
At least one of the multiply inherited base objects must be non-virtual.
struct L {};
struct A : L {};
struct B : L {};
struct C : A, B {};
There are two distinct L base objects in C. C would have the non-diamond shaped multiple inheritance flag set. A, B and C would have the non-virtual base flag and public base flag set.
Paragraph 5 gives a diamond shaped graph.
Such a multiply inherited base object must be virtual.
struct V {};
struct A : virtual V {};
struct B : virtual V {};
struct C : A, B {};
This time C would have the diamond shaped flag set. A, B & C would have the virtual base flag set and the public base flag set. C would also have the non-virtual base flag set.
Paragraph 6 gives a graph which contains both features.
Here there is one non-virtual base and one virtual base.
struct B {};
struct X : virtual B {};
struct Y : virtual B {};
struct Z : B {};
struct AA : X, Y, Z {};
In that example, AA would have both diamond and non-diamond flags set. all would have the public base flag set, AA & Z would have the non-virtual base flag set, AA, X & Y would have the virtual base flag set.
The above treats the non-virtual and virtual base flags differently; they should have the following meaning:
My thinking is that for dynamic_cast, having such information will allow pruning parts of the inheritance graph walk. For instance, there can only be distinct multiple target base objects when the non-diamond shaped flag is set in the complete object. When we find them, the base sub-object started from can only be a common base for both of them, if the diamond shaped flag is set in the complete object. Alternatively, there can only be (at most) one instance of the target type when the non-diamond shaped flag is clear. When we find it via a non-public path, there could only be an alternative public path if the complete object has the diamond shaped flag set. Similar pruning should be possible for catch matching. Without such information, the graph walk has to be pessimistic, which I believe will slow down the common case.
[000126 CodeSourcery -- Nathan]
__si_class_type_info
is documented for
a single non-virtual hierarchy,
and __vmi_class_type_info
for a class containing
(directly or indirectly)
a multiple or virtual inheritance component.
My mistake was to use __si_class_type_info
for a class with a single base,
regardless of the
hierarchy within the base (that is the current g++ behaviour).
__si_class_type_info
is for both public and non-public inheritance
(again, something I'd not noticed, thinking it was for public only).
For this to work,
the __class_type_info flag bit 0x8 'non-publicly inherited base'
must mean `non-publicly inherited direct base'.
Please can the wording about bases here explicitly say
`direct base,' `indirect base,' or `direct or indirect base.'
The descriptions currently use `contains' and `has' which
are open to interpretation.
In dynamic casting, access is important. In a cross cast from base A via complete type C to another base B, both B and A must be publicly accessible from C. It might be that dynamic_cast locates B, and, knowing that C does not have multiply inherited subobjects, determines it need look no further. However, it must determine access. If C has no non-public direct or indirect bases, access must be OK, without further inspection. However the hint flag 0x8 can't be indicating that, as it is only for direct bases. (This was the one case where I was able to take advantage of these flags, but alas it seems I can't.)
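(Ed. illustration) The access rule for cross-casts can be seen with two small hierarchies. This is a sketch with hypothetical class names, not code from the discussion: the cross-cast succeeds only when the target base is public in the complete object.

```cpp
#include <cassert>

struct A { virtual ~A() {} };
struct B { virtual ~B() {} };

struct C : A, B {};          // both bases public
struct D : A, private B {};  // B is a non-public base of D

// Attempt a cross-cast from the A subobject to the B subobject
// of whatever complete object 'a' belongs to.
inline B* cross_cast(A* a) {
    return dynamic_cast<B*>(a);
}
```

Given a `C` object, `cross_cast` yields a non-null `B*`; given a `D` object it yields a null pointer, because `B` is not publicly accessible from the complete type. A runtime that knows the complete hierarchy has no non-public bases could skip that access check.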
[000127 All]
We decided on Thursday that your "mistakes" are what we want.
__si_class_type_info
will be for any class with a
single direct base at offset 0 which is public and non-virtual.
We also decided that the flags should move from
__class_type_info
into __vmi_class_type_info,
and that the polymorphic flag should be removed.
[000126 CodeSourcery -- Nathan] I think this moving of the flags is a mistake. If I understood correctly, they indicated information about direct and indirect bases (whether there was virtuality anywhere in the hierarchy for instance). Such information can speed up dynamic cast. When walking the inheritance graph, we can take some early outs, if we know there are no multiple subobject types within the complete graph. With the flags in every class's type_info, it becomes easier to get hold of that info. With it only for vmi classes, we have to remember `unknown' when presented with a complete object of si type, and fill the information in when/if we find a vmi base.
Another case is in a potential cross-cast case, which I had in the previous email. Suppose we've found the target base, which we know is unique, but not found the source base (because we early outed, maybe). To be a valid cross-cast both the source and target base objects must be public in the complete object. If we know the complete hierarchy has no non-public bases, there's no need to search for the source base in this case.
[000129 Cygnus -- Jason]
I think I'd rather pay that small performance hit than add a word to the type_info for each class. Matt, would this affect locales?
... cross-casts only come up in the context of classes with multiple bases, so it wouldn't make sense to look for this in single inheritance classes anyway.
[000127 All]
[000203 All]
[000203 SGI -- Jim]
I moved the flags from __class_type_info to __vmi_class_type_info, discovering that they don't need to share space with the offset field in the __base_class_info records, but rather with the base class count. But, the __base_class_info has its own flags (virtual and public) which can reasonably share a doubleword, as we were discussing for the other flags this morning. So I specified that. Note that I put the flags in the low byte rather than the high byte. That is because the offset is signed, and it is likely that implementations will sign-extend (signed doubleword>>8), but not (doubleword & 0x00ffffffffffffffll).
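(Ed. illustration) The reason for putting the flags in the low byte can be sketched as follows. The flag values and function names are hypothetical, and the sketch assumes that right-shifting a negative signed value is an arithmetic shift (guaranteed from C++20, and the behaviour of mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values, for illustration only.
const std::int64_t kVirtualFlag = 0x1;
const std::int64_t kPublicFlag  = 0x2;

// Store a signed offset in the upper 56 bits and the flags in the low byte.
// Multiplication is used instead of a left shift so that negative offsets
// are well defined in every language version.
inline std::int64_t pack(std::int64_t offset, std::int64_t flags) {
    return offset * 256 + (flags & 0xff);
}

// Recover the offset: an arithmetic right shift sign-extends it for free.
inline std::int64_t unpack_offset(std::int64_t word) {
    return word >> 8;
}

// Recover the flags from the low byte.
inline std::int64_t unpack_flags(std::int64_t word) {
    return word & 0xff;
}
```

Had the flags been placed in the high byte, recovering a negative offset would need an explicit sign-extension step after masking, which is the extra work Jim's note anticipates implementations would forget.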
After an exchange with Nathan, I reinstated his first flag (contains non-diamond multiple inheritance).
[000210 All -- Matt]
Minor corrections to RTTI discussion in data layout document: In section 7c, which describes the vmi_flags, flag 0x01 is documented incorrectly. It says "class has non-diamond multiple inheritance", which isn't quite right. We're really talking more about repeated inheritance: having multiple subobjects of the same type.
Also in vmi_flags, Jason questions whether flags 0x04 and 0x08 are necessary. What do we really need "has virtual base(s)" and "has non-virtual base(s)" for? Jason has sent email to Nathan about this.
Naming issue: we decided to put all of our type_info subclasses
in namespace abi, not namespace std. This means, of course,
that they can't go in any of the standard headers. Rather than
inventing multiple header names, we would like to put everything
(unwinding longjmp, type_info subclasses, etc.) into one quasi-
standard header. We propose the name
Issue A23 can almost be closed. The only thing we need to
resolve is whether to keep the two flags that Jason is unsure about.
[000302 All -- Matt]
[000126 CodeSourcery -- Nathan]
The amended (25th Jan) RTTI specification says:
I don't believe this is the case,
the example I posted a couple of weeks back pointed this out.
Here it is, in a slightly more compact form
I believe this is well formed and should not abort.
The RTTI document indicates that `typeid (A const * const *)' and
`typeid (B const * const *)' will produce __pointer_type_info chains
that end at a weak symbol reference for A and B respectively.
These will both resolve to zero.
How is catch matching able to determine the difference between
`A const * const *' and `B const * const *' under these circumstances?
If this is a shortcoming of the ABI,
or considered a defect in the standard, it should be documented.
There seems to be no discussion of this case.
[000127 All]
[000128 CodeSourcery -- Nathan]
In the catch matching,
the type_infos for `A const *const *' and `A const *' will be:
and those for `B const *const *' and `B const *':
I fail to see how the catch matcher can get different results comparing
__tiPP1B to __tiPCPC1A as opposed to comparing __tiPP1B to __tiPCPC1B.
They both look like qualification conversions of pointers to pointers
to incomplete type.
In the first case we'll end up comparing __tiP1B to __tiPC1A,
which still is a valid qualification conversion,
then have two NULL pointers for the pointed to types,
which somehow we have to tell apart.
In the second case we'll end up comparing __tiP1B to __tiPC1B,
and again have two NULL pointers for the pointed to types,
but this time we have to consider them the same type.
I don't see anything in [conv.qual] saying that qualification
conversions don't have to deal with incomplete types.
N.B.: old-abi g++ seg faults on the above code because it does wander
into the NULL pointers.
[000129 Cygnus -- Jason]
I think that leaves us with something like what EDG does now:
namely, comparisons are done by comparing the addresses of
one-byte commons rather than of the type_info nodes themselves.
Then we could emit incomplete info in one file and complete info in
another file and they would compare the
same because both refer to the same ID proxy.
We could mangle the complete and incomplete versions differently,
so they would not be combined by the linker.
This would also change how we refer to type_infos;
under the current scheme,
references to type_infos in the EH type table need to be via
relocs that will be resolved by the dynamic linker at runtime.
If we don't need to compare addresses,
we could use gp-relative references.
Of course,
we'd still have the absolute references in the type_infos to the ID proxies,
so we're no better off.
[000130 CodeSourcery -- Nathan]
[000203 All]
Since all we need from the common block is a distinct address,
we may want to float a base ABI proposal for a new symbol type which is
resolved by the linkers to a unique address without allocating storage.
[000210 All -- Matt]
A class's __class_type_info object and its comdat proxy both receive
mangled names. We must make sure that the proxy's mangled name is the
same for all complete and incomplete declarations of a class, that the
mangled name of the __class_type_info object is the same for all
complete declarations of a class, and that the mangled name of the
__class_type_info object is different for incomplete declarations than
for complete declarations. One way to achieve this is to make
__class_type_info objects for incomplete declarations static.
We add a new flag to __pointer_type_info; let's say bit 0x4. If
this is set, it means we have a pointer to an incomplete type (or
pointer to pointer to incomplete type, etc.)
We compare two __class_type_infos for equality by pointer comparison
of the id_proxy_ptr fields. We compare two __pointer_type_infos for
equality by looking at the addresses of the type_info objects,
*unless* the incomplete bit is set in at least one of them. If the
incomplete bit is set, we have to compare the pointed-to types. For
everything other than classes and pointers we can just use address
equality of the type_info objects themselves.
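(Ed. illustration) The comparison rule above can be sketched with stand-in classes. The names TypeInfo, ClassTypeInfo, PointerTypeInfo and same_type are hypothetical and the layouts are not those of the ABI; the additional check that the remaining flag bits agree is an assumption of the sketch.

```cpp
#include <cassert>

// Hypothetical stand-ins for the ABI classes; names and layout are illustrative.
struct TypeInfo {
    virtual ~TypeInfo() {}
};

struct ClassTypeInfo : TypeInfo {
    const char* id_proxy_ptr;  // address of the class's comdat proxy
    explicit ClassTypeInfo(const char* proxy) : id_proxy_ptr(proxy) {}
};

struct PointerTypeInfo : TypeInfo {
    unsigned flags;            // bit 0x4: pointee chain ends in an incomplete type
    const TypeInfo* pointee;
    PointerTypeInfo(unsigned f, const TypeInfo* p) : flags(f), pointee(p) {}
};

const unsigned kIncomplete = 0x4;

inline bool same_type(const TypeInfo* a, const TypeInfo* b) {
    if (a == b) return true;  // address equality always suffices when it holds

    // Classes: compare the proxy addresses, not the type_info addresses.
    const ClassTypeInfo* ca = dynamic_cast<const ClassTypeInfo*>(a);
    const ClassTypeInfo* cb = dynamic_cast<const ClassTypeInfo*>(b);
    if (ca && cb) return ca->id_proxy_ptr == cb->id_proxy_ptr;

    // Pointers: different addresses can still match if either is incomplete.
    const PointerTypeInfo* pa = dynamic_cast<const PointerTypeInfo*>(a);
    const PointerTypeInfo* pb = dynamic_cast<const PointerTypeInfo*>(b);
    if (pa && pb) {
        if (((pa->flags | pb->flags) & kIncomplete) == 0) return false;
        if ((pa->flags & ~kIncomplete) != (pb->flags & ~kIncomplete)) return false;
        return same_type(pa->pointee, pb->pointee);
    }

    // Everything else: address equality only, and that already failed.
    return false;
}
```

With one proxy per class, a pointer-to-complete-A and a pointer-to-incomplete-A compare equal even though their type_info objects are distinct, while pointer-to-incomplete-A and pointer-to-incomplete-B do not.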
In response to Jason's 000129 question: we can't use gp-relative
references for type_info objects because we're only using comdat
proxies for __class_type_info, not for other kinds of type_info
objects.
In response to Nathan's 000130 question: this is the reason to
give the complete and incomplete __class_type_info objects different
mangled names. That way a complete __class_type_info object in a
DSO won't be overridden by an incomplete __class_type_info object
in the executable.
At the very end of this meeting we got a suggestion from Christophe
for a completely different mechanism. We agreed that we can't evaluate
it without a writeup. The suggestion: abandon these comdat proxies
altogether. Instead we have a new type_info class,
__incomplete_class_type_info. Comparisons involving two
__class_type_info objects use address equality, comparisons involving
two __incomplete_class_type_info objects, or a __class_type_info and
an __incomplete_class_type_info, do string comparison on the name. We
still would have an incomplete bit in the __pointer_type_info class,
which, again, we would use to determine whether two
__pointer_type_info objects with different addresses might
nevertheless represent the same pointer type.
[000309 All]
[000314 SGI -- Jim]
[000330 All]
When the specified width of a bitfield exceeds the size of the declared type,
the standard specifies that the accessible field is
to be padded to the specified width,
with the location of the padding implementation-defined.
That is, the accessible field could be placed at the beginning,
at the end, or in the middle of the specified bits.
(Note that such declarations are explicitly disallowed by the C 2000
draft, so this is not a C ABI issue.)
[000204 SGI -- Jim]
It seems to me that the situation that makes it interesting is the
following:
One could express this by the following rule:
[000221 CodeSourcery -- Mark]
The ABI document says that a NULL pointer-to-member function has
`ptr == 0'. It does not, however, say whether or not a NULL
pointer-to-member function also has `adj == 0'.
I believe that this should be specified as well so that code generated
to do comparison of pointers to members (of the same type)
looks like:
So, I would say:
It's occurred to me that this imposes some overhead on casting
pointers-to-members around: now when you convert from a base pointer
to member to a derived version (or vice versa), you can't just adjust
the `adj' member willy-nilly; instead, you have to check first whether
or not the pointer is NULL.
So, I'm not sure any more which scheme is preferable -- but we
definitely need to say clearly which we want.
[000222 CodeSourcery -- Mark]
So, it would be helpful if we were to add:
[000229 SGI -- Jim]
Comparisons (5.10) of pointers to virtual member functions are undefined.
So, for pointer-to-function-member comparisons,
we only need to worry about non-virtual members and null.
Since the representation stores the actual address of the function descriptor,
we should be able to just compare the pointers, and ignore the adjustment.
For conversions between base classes,
it seems that we need only modify the adjustment,
and then only if one is not primary for the other.
For conversion to null,
it seems that we need only set the pointer to 0,
and can ignore the adjustment.
[000302 All]
Represent NULL by a 0 pointer, with the adjustment unspecified.
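(Ed. illustration) Under this resolution, a comparison cannot assume anything about the adjustment of a NULL value. A minimal model follows; the MemFnPtr struct and the function names are hypothetical, and comparing the adjustment for non-NULL values is an assumption of the sketch (Jim's 000229 note suggests the pointer alone may suffice).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the two-word pointer-to-member-function value.
struct MemFnPtr {
    std::intptr_t  ptr;  // function address or vtable information; 0 means NULL
    std::ptrdiff_t adj;  // 'this' adjustment; unspecified when ptr == 0
};

inline bool is_null(const MemFnPtr& p) {
    return p.ptr == 0;
}

// Two NULL values are equal whatever their adjustments happen to be.
inline bool equal(const MemFnPtr& a, const MemFnPtr& b) {
    if (a.ptr != b.ptr) return false;
    return a.ptr == 0 || a.adj == b.adj;
}

// Base/derived conversion: because the adjustment of NULL is unspecified,
// it can be modified without first testing for NULL.
inline MemFnPtr convert(MemFnPtr p, std::ptrdiff_t delta) {
    p.adj += delta;
    return p;
}
```

The trade-off Mark raised is visible here: `convert` needs no NULL test, at the cost of one extra test in `equal`.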
[000222 CodeSourcery -- Mark]
We haven't specified a way to represent a NULL pointer to data member.
G++ presently adds one to the offset,
allowing zero to serve as the NULL pointer to member.
[000223 CodeSourcery -- Mark]
What is the value for the NULL pointer to data member?
I guess -1 would do,
unless there are cases I can't think of where the pointer
to member would legitimately have a negative value.
Maybe 0x8000000000000000 is better...
It's illegal to do this if the base is virtual. But, that's the
only case in which the `this' pointer can increase.
[000229 SGI -- Jim]
From the Standard:
So we can conclude that,
since we always allocate non-virtual bases before data members,
any base object in a derivation chain will have its base address
smaller than any of the data members declared in members of the chain.
Therefore, the offset represented by a pointer-to-data-member
will always be non-negative,
even after the permitted conversions above.
So, we could either use -1 for NULL, or use 0 and increment the offset.
0x800...000 is an unnecessary complication.
[000302 All]
Represent NULL by the value -1.
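(Ed. illustration) With -1 as the NULL value, offset 0 remains available for the first data member, but conversions must leave NULL untouched. A sketch with hypothetical names, modelling the pointer to data member as a plain offset:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model: a pointer to data member is an offset within the class.
typedef std::ptrdiff_t DataMemberPtr;
const DataMemberPtr kNullMember = -1;

inline bool is_null(DataMemberPtr p) {
    return p == kNullMember;
}

// Convert a pointer to a member of a base into a pointer to a member of a
// derived class whose base subobject sits at base_offset. NULL stays NULL.
inline DataMemberPtr base_to_derived(DataMemberPtr p, std::ptrdiff_t base_offset) {
    return is_null(p) ? kNullMember : p + base_offset;
}
```

Since valid offsets are never negative (per Jim's argument above), no valid value can collide with -1 after conversion.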
[000406 CodeSourcery -- Nathan]
The current RTTI proposal loses the property that all type_info objects
can be compared for equality and orderability by address comparison.
Instead, type_info::operator== must involve a virtual function call
or an unconditional strcmp.
(An alternative of testing the typeid of the
polymorphic type_info objects results in infinite recursion!)
Here are two proposals which reinstate the address equality property.
The first is rather different to the current scheme, but when I was
done documenting it, I realised there was a minor modification to the
current scheme, which partially reinstates the address equality. I
present both for consideration. Feel free to shoot them down ...
The base class of these is:
This contains a pointer to the type_info object
produced by the typeid operator,
for whatever type this is describing.
That will be a unique object.
There are a number of necessary derivations of this type,
which can be taken largely unaltered from the current proposal.
It is necessary to distinguish function types, so that catch matching
can distinguish a data pointer object from a function pointer object.
Other types (fundamental, enum, array) need not be distinguished,
and can be represented by an abi::__type_info object.
(Or we could keep the current proposal of having separate derivations
for these.)
Pointers are as they currently are,
other than the base class change.
We still need the incomplete target flag.
Pointers to member could be a sibling class of non member pointers.
However, they do share common functionality,
and IMO it makes sense to derive from __pointer_type_info.
The __class_type_info, __si_class_type_info and __vmi_class_type_info
are unchanged, other than the change to __class_type_info's base.
The vtable slot -1,
(which currently holds a pointer to the std::type_info object for a class),
points to the abi::__class_type_info object.
To implement typeid(X),
where X is polymorphic,
involves an additional indirection through the
abi::__type_info base to return the `type' member.
dynamic_cast uses the abi::__class_type_info object pointed to in the vtable.
throwing and catch matching use the abi::__type_info object
for the type being thrown or caught.
As with the current proposal,
an incomplete type is represented by an abi::__class_type_info object.
Note that its abi::__type_info base
will point to the unique std::type_info object for that type,
regardless of whether a DSO completes the type.
This incomplete type is prevented
from preempting the complete type information.
Also direct or indirect pointers to incomplete have their incomplete
flag set,
and are also prevented from preempting the equivalent pointer to
complete object.
During catch matching,
comparison of pointers can compare the abi::__pointer_type_info addresses,
unless either has the incomplete flag set,
in which case the std::type_info objects pointed to must be compared.
(The std::type_info objects could be compared even when the incomplete
flags are clear.)
There are two or three naming schemes with this proposal:
Advantages of this proposal are:
The cost of this proposal is
The first proposal is essentially
using the std::type_info objects as unique objects,
via which incomplete types can be compared.
We already have such a unique object candidate --
the NTBS name member of std::type_info.
Currently we've not said anything about that.
If, however, we give that NTBS comdat linkage, a unique name,
and prevent it being commonized with other strings, we have a proxy.
These features can be obtained by treating it as a
`const char []' rather than a string constant.
type_info equality and orderability can now use the address of this array,
rather than the type_info objects themselves.
We can do this in all cases,
even though it is only necessary for the pointer to incomplete case,
as that avoids a virtual function call.
Here is an implementation of type_info::operator==
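(Ed. illustration) The listing from the original message is not reproduced here. As a sketch of the idea only, using a hypothetical TypeInfoSketch type rather than std::type_info, equality and ordering by the address of the name array might look like this:

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for std::type_info; only the name pointer matters.
// The sketch assumes the linker leaves exactly one copy of each type's name.
struct TypeInfoSketch {
    const char* name;
};

// Equality: same name array, regardless of which type_info object is used.
inline bool equal(const TypeInfoSketch& a, const TypeInfoSketch& b) {
    return a.name == b.name;
}

// Collation: order by the address of the name array. std::less gives a
// total order over unrelated pointers, which raw '<' does not promise.
inline bool before(const TypeInfoSketch& a, const TypeInfoSketch& b) {
    return std::less<const char*>()(a.name, b.name);
}
```

Neither function needs a virtual call or a string comparison, which is the property the proposal is after.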
We need to specify the naming scheme for the NTBS.
The advantages of this are
The costs over proposal A are
[000411 CodeSourcery -- Nathan]
Issue 2
The algorithm for collation order of type_infos,
cannot simply compare addresses for non-pointer types,
and complete pointer types.
Using string collation only works
when one of the types is a pointer with the incomplete_mask set.
There are two difficulties.
Firstly, we might be
comparing a non-pointer type_info with a pointer type_info. We need to
determine this and DTRT WRT the incomplete flag of the pointer
type_info. To do that will require dynamic_cast or typeid'ing the
type_infos. Secondly, assume we are just comparing pointer type_info's.
We have two pointers to complete, Aptr and Bptr, and a third pointer to
incomplete, Cptr.
There is nothing maintaining the consistency of the results of these
three tests -- result 1 is uncorrelated with results 2 & 3.
Therefore type_info::before must be implemented as string compare on
the type's names. We lose any advantage of commonizing the type_infos.
Issue 3
17.4.4.4 prevents an implementation adding member functions to one
of the std classes, except in particular circumstances. About the only
leeway given is whether a particular non-virtual function is inline or
not. So I presume we're not permitted to add virtual member functions
to std::type_info (18.5.1). The rules given in 17.4.4.4 specifying what
member functions can be added look like applications of the as-if rule,
but there must be something deeper going on, as if that was all, it
wouldn't be mentioned. I'm not sure how a conforming program could tell
whether additional functions had been added.
The ABI requires us to add virtual functions to type_info.
For instance the implementation of operator== will require it to
deal with pointers to incomplete. G++ needs several for catch matching.
Issue 4
5.2.8 talks about typeid returning something derived from type_info,
but the footnote mentioning extended_type_info implies to me that
typeid always returns objects of the same type.
Again, I'm not sure how a conforming program could tell.
The two proposals above resolve these issues.
Proposal A resolves issues 2, 3 & 4,
whilst proposal B resolves issue 2 only,
and will leave us (slightly) non-conformant.
[000413 All]
Proposal B resolves the remaining issue,
and the group is inclined to accept it,
while considering whether to go further with A.
Jim will integrate (and has integrated) B into the
Draft C++ ABI for IA-64.
[000504 All]
[000407 CodeSourcery -- Nathan]
__pointer_to_member_type_info is derived from type_info.
I strongly recommend it be derived from __pointer_type_info,
as it requires much of the same functionality,
and has the same meanings of its flags.
By subclassing __pointer_type_info, much code could be reused.
Thus point 8 of the rtti classes would become
[000411 CodeSourcery -- Nathan]
incomplete_mask is an inclusive or of the other two flags.
incomplete_klass_mask is only used by __pointer_to_member_type_info,
and __pointer_type_info knows nothing about it (it simply examines the
other two).
A __pointer_type_info or __pointer_to_member_type_info sets the
incomplete_mask and incomplete_chain_mask, if the target is an
incomplete
type, or has its incomplete_mask set.
A __pointer_to_member_type_info sets the incomplete_mask and the
incomplete_klass_mask, if the class of the member is incomplete.
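(Ed. illustration) The flag rules above can be sketched as follows. The bit values are hypothetical (the message does not give them), and the sketch reads "inclusive or of the other two flags" as meaning that incomplete_mask is the union of the two bits, which is an assumption.

```cpp
#include <cassert>

// Hypothetical bit assignments; the source does not fix the values.
const unsigned incomplete_chain_mask = 0x1;
const unsigned incomplete_klass_mask = 0x2;
const unsigned incomplete_mask = incomplete_chain_mask | incomplete_klass_mask;

inline bool is_incomplete(unsigned flags) {
    return (flags & incomplete_mask) != 0;
}

// Flags for a pointer whose target is either an incomplete type, or another
// pointer (or pointer to member) carrying target_flags.
inline unsigned pointer_flags(bool target_is_incomplete, unsigned target_flags) {
    unsigned flags = 0;
    if (target_is_incomplete || is_incomplete(target_flags))
        flags |= incomplete_chain_mask;
    return flags;
}

// A pointer to member additionally records an incomplete containing class.
inline unsigned member_pointer_flags(bool target_is_incomplete,
                                     unsigned target_flags,
                                     bool klass_is_incomplete) {
    unsigned flags = pointer_flags(target_is_incomplete, target_flags);
    if (klass_is_incomplete)
        flags |= incomplete_klass_mask;
    return flags;
}
```

Incompleteness therefore propagates outwards through any number of pointer levels, while code that only understands __pointer_type_info can test the combined mask without knowing about the class bit.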
[000411 Ed.]
[000413 All]
(Ed. note) I've added updates to the
Draft C++ ABI for IA-64.
[000504 All]
[001012 all -- Jim]
The issue here, raised originally by Martin, I will open as A-30.
Implementations will generally need additional virtual functions
associated with the type_info hierarchy to implement such functionality
as dynamic cast. Gcc for instance has functions __is_function_p,
__do_catch, __pointer_catch, ...
A program that is built from pieces from different compilers, where the
pieces come from different implementations of the hierarchy, will see
different structures, at least in the vtables, if we allow this extra
material to be arbitrary, creating a problem if such programs actually
make use of parts of the hierarchy.
We worked out the following possible solution:
The implementation will create one instance of this class for each of
the classes derived from std::type_info, and we will specify a
mangled name for it.
Now an implementation can add an arbitrary set of functions to
__cxa_aux_typeinfo, specialized to the derived class like a virtual
function, without changing the external interface (to the user) of
the hierarchy.
[001103 SGI -- Jim]
[...leaving out much discussion...]
So, after all the above, I suggest the following actions:
[001109 all]
[001019 CodeSourcery -- Mark]
I think I recall that the committee was intentionally trying to use
the tail padding of one object to save space. For example, consider:
(These are PODs, but you can easily make an equivalent non-POD
example).
Here, I think the committee wanted to give `B' size 4, by packing `d'
into the tail padding of `A'.
I think this is a mistake. David Gross came up with the following
example:
Code generator needs to copy dsize, not sizeof, unless it can prove
that the object is in a context where tail padding isn't overlaid.
Reason? Tail padding might be overlaid by a volatile field.
Hence, a non-POD that looks like
requires ld2/st2/ld1/st1 for a copy instead of ld4/st4 because we
might have
Similarly, people using memcpy to copy around POD components of
non-PODs will get burned.
This completely breaks user expectation since people routinely expect
to be able to stick a function or two into a POD without changing its
layout.
I think we should make the following changes:
Note that this still permits the empty base optimization; nvsize will
be zero, and sizeof will be 1.
There's an important difference between using the tail padding in an
empty base and the tail padding in a generic object: you know that you
never have to copy an empty base.
[001109 all]
Therefore, we have decided to eliminate the overlaying of tail padding.
Mark will provide alternate proposed wording for the ABI document.
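(Ed. illustration) The memcpy hazard can be sketched with a hypothetical pair of types; these are not the examples from Mark's message, and the sizes in the comments assume a typical target with 2-byte short. Whether copying into a base subobject this way is sanctioned by the language is a separate question; the sketch only shows what user code commonly does.

```cpp
#include <cassert>
#include <cstring>

struct A { short s; char c; };  // typically 3 bytes of data plus 1 byte of tail padding
struct B : A { char d; };       // 'd' either follows the padding or is packed into it

// Overwrite the A part of 'b' the way user code commonly does:
// by copying sizeof(A) bytes, tail padding included.
inline void assign_base(B& b, const A& src) {
    std::memcpy(static_cast<A*>(&b), &src, sizeof(A));
}
```

If `d` were packed into the tail padding of `A`, this copy would overwrite it with whatever the source's padding byte held. With tail padding not overlaid, `d` lies beyond the first sizeof(A) bytes and survives.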
[990623 HP -- Christophe]
The following proposal applies only to calls to virtual functions
when a this pointer adjustment is required from a base class to a
derived class.
Essentially, this means multiple inheritance, and the
existence of two or more virtual table pointers (vptr)
in the complete object.
The multiple vptrs are required so that the layout
of all bases is unchanged in the complete object.
There will be one additional vptr for each base class which already
required a vptr,
but cannot be placed in the whole object so that it shares its vptr
with the whole object.
Note: when the vptr is shared,
the base class is said to be the "primary base class",
and there is only one such class.
For the primary base class, no pointer adjustment is needed.
For all other bases, a pointer to the whole object is not a pointer
to the base class,
so whenever a pointer to the base class is needed,
adjustment will occur.
In particular, when calling a virtual function,
one does not know in advance in which class the function was actually defined.
Depending on the actual class of the object pointed to,
pointer adjustment may be needed or not,
and the pointer adjustment value may vary from class to class.
The existing solution is to have the vtable point not to the function itself,
but to a "thunk" which does pointer adjustment when needed,
and then jumps to the actual function.
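(Ed. illustration) The adjustment in question can be observed from ordinary C++. The classes below are hypothetical; the sketch relies on the non-primary base B being placed at a non-zero offset, which holds for common implementations but is not required by the language.

```cpp
#include <cassert>

struct A {
    virtual ~A() {}
    virtual void f() {}
    int a = 0;
};

struct B {
    virtual ~B() {}
    virtual const void* g() { return this; }
    int b = 0;
};

struct C : A, B {
    // Overrides B::g and reports the 'this' pointer it was entered with.
    const void* g() override { return this; }
    int c = 0;
};
```

Given `C obj; B* pb = &obj;`, the address held in `pb` differs from `&obj`, yet `pb->g()` returns `&obj`: somewhere between the call site and the body of `C::g` the pointer was adjusted from the B subobject back to the whole object. Whether a thunk, a vtable offset, or the callee does that work is what the proposals here differ on.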
Another possibility is to have an offset in the vtable,
which is used by the called function.
However, more often than not, this implies adding zero.
Virtual bases make things slightly more complicated.
In that case, the data layout is such that there is only
one instance of the virtual base in the whole object.
Therefore, the offset from a this
pointer to the same virtual base may change along the inheritance tree.
This is solved by placing an offset in the virtual table,
which is used to adjust the this pointer to the virtual base.
My proposal is to replace thunks with offsets,
with two additional tricks:
The thunks are believed to cost more on IA64 than they would on
other platforms.
The reason is that they are small islands of code spread throughout the code,
where you cannot guarantee any cache locality.
Since they immediately follow an indirect branch,
chances are we will always encounter both a branch misprediction and an
I-cache miss in a row.
On the other hand,
a virtual function call starts by reading the virtual function address.
Reading the offset immediately thereafter should almost never cause a
D-cache miss (cache locality should be good).
More often than not, no adjustment is needed,
or the adjustment will be done at call site correctly.
In the worst case scenario, we perform two adjustments,
one static at call site, and one dynamic in the callee,
but this case should be really infrequent.
The new calling convention requires that the 'this' pointer on entry
points to the class for which the virtual function is just defined.
That is, for A::f(),
the pointer is an A* when the main entry of the function is reached.
If the actual pointer is not an A*,
then an adjusting entry point is used,
which immediately precedes the function.
In the following, we will assume the following examples:
convert_to_D and convert_to_E are likely to be at the same offset in
the vtable. This is not a problem, even if D and E are used in the
same class, such as F, because this is the same offset in different
vtables.
The fact that an offset is reserved does not mean that it is
actually used. A vtable needs to contain the offset only if it refers
to a function that will use it. An offset of 0 is not needed, since
the function pointer will point to the non-adjusting entry point in
that case.
In other words, adjustment is made only when necessary, and at a
place where it is better scheduled than with thunks. The only bad
case is double adjustment for call_Cg called with an E*. This case
can probably be considered rare enough, compared to calls such as
call_Cg called with a C*, where we now actually do the adjustment at
the call-site.
Currently, the sequence for a virtual function call in a shared
library will look as follows. I'm assuming +DD64, there would be some
additional addp4 in +DD32. The trail below is the dynamic execution
sequence. In bold and between #if/#endif, the affected code.
[990812 All]
Discussion of B-6 raises questions of impact on the above approach.
Christophe will look at the issues.
[990826 Cygnus -- Jason]
[An alternative suggestion from Jason via email.]
Rather than per-function offsets, we have per-target type offsets.
These offsets (if any) are stored at a negative index from the vptr.
When a derived class D overrides a virtual function F from a base class B,
if no previously allocated offset slot can be reused,
we add one to the beginning of the vtable(s) of the closest base(s)
which are non-virtually derived from B.
In the case of non-virtual inheritance, that would be D's vtable;
in simple virtual inheritance, it would be B's.
The vtables are written out in one large block,
laid out like an object of the class,
so if B is a non-virtual base of D,
we can find the D vtable from the B vptr.
D::f then receives a B*, loads the offset from the vtable,
and makes the adjustment to get a D*.
The plan is to also have a non-adjusting vtable entry in D's vtable,
so we don't have to do two adjustments to call D::f with a D*;
the implementation of this is up to the compiler.
I expect that for g++,
we will do the adjustment in a thunk which just falls into the main function.
The performance problems with classic thunks occur when the thunk is
not close enough to the function it jumps to for a pc-relative branch.
This cannot be avoided in certain cases of virtual inheritance,
where a derived class must whip up a thunk for a new adjustment
to a method it doesn't override.
In this case, we will only ever have one thunk per function,
so we don't even have to jump.
Except in the case of covariant returns, that is,
where we will have one per return adjustment.
But we know all necessary adjustments at the
point of definition of the function,
so they can all be within pc-relative branch range.
[Extensive discussion followed by email --
this suggestion is not completely correct,
but may be the basis of a workable solution.]
[990831 Cygnus -- Ian]
A couple of observations ...
On the state of the art:
The Microsoft approach is worth mentioning.
(I haven't seen it discussed --
though perhaps that is because of the patent situation.)
It allows zero-adjusting (i.e. non-thunking) calls for (almost)
every virtual function call in a non-virtual,
multiple inheritance hierarchy.
For those that are unfamiliar,
the idea is that all calls go via the base class vft and overriding
functions expect a pointer to the base class type.
(That is, if D::f overrides B::f, it expects the first
parameter to be of type B*, not D*.)
The callee does the necessary static adjustment to get to the
derived class 'this' pointer as needed.
It avoids requiring a thunk,
and it's often the case that the cost is zero in the callee because
the this-adjustment can be folded into other offset computations.
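(Ed. illustration) The callee-side adjustment can be modelled in source form. This is a sketch with hypothetical names, not a description of any compiler's actual output: D_f stands for the body of an overrider D::f as it would be entered under this scheme, with a B* rather than a D*.

```cpp
#include <cassert>

struct A { int a; };
struct B { int b; };
struct D : A, B { int d; };

// Hypothetical body of an overrider of B::f in D. It is entered with a
// pointer to the B subobject and recovers the D object itself.
inline int D_f(B* self) {
    D* that = static_cast<D*>(self);  // static adjustment, known at compile time
    return that->a + that->b + that->d;
}
```

Because the adjustment is a compile-time constant, a compiler can fold it into the member offsets used in the body, which is why the cost in the callee is often zero.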
On the balance,
it could well win over all the other approaches being discussed here.
[Though, it may lose in some specific cases vs. Christophe's approach
where one would create additional extra entries in
the derived class vft.]
On when to make extra virtual function table entries for functions:
One of Christophe's suggestions is sort-of separate
from the rest of the discussion:
making extra entries in the derived class' vft for some
overridden virtual functions.
It has the benefit of giving you faster calls if you happen to be in
(or near) the derived class -- at the expense of space in the vft.
Of course, you can always make the call through the introducing base class,
so these extra entries are a pure space/time performance trade off
(w/ some unpredictable D-cache effects) and the cost/benefit analysis
will depend a little on what the rest of the strategy looks like.
The same idea is potentially applicable,
no matter what strategy you actually use for vft layout,
and different criteria for deciding what extra entries to make are possible.
For example,
creating an extra entry when overriding a function introduced in a
virtual base has the added benefit of avoiding a cast to a virtual
base at the call site.
[990909 All]
We are getting closer --
understanding of the alternatives is improving,
and Christophe may agree with the Jason/Brian proposal after more thought.
To make sure we really understand what we're agreeing to,
Jason and Christophe will write up more precise proposal(s).
[991111 jason]
We have decided that for virtual functions not inherited from a virtual base,
regular thunks will work fine,
since we can emit them immediately before the
function to avoid the indirect branch penalty;
we will use offsets in the
vtable for functions that come from a virtual base,
because it is impossible to predict what the offset between the
current class and its virtual base will
be in classes derived from the current class.
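The point about unpredictable virtual base offsets can be illustrated with a small sketch. The class names V, X, Y and Z and the helper functions are hypothetical, chosen only for this illustration. The sketch measures the byte distance from an X subobject to its virtual base V, once in a complete X and once inside a further-derived Z. On typical implementations the two distances differ, which is why a fixed adjustment cannot be compiled into X's code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical hierarchy: V is a virtual base shared by X and Y.
struct V { virtual ~V() = default; int v = 0; };
struct X : virtual V { int x = 0; };
struct Y : virtual V { int y = 0; };
struct Z : X, Y { int z = 0; };

// Byte distance from an X subobject to its virtual base V.
// The derived-to-virtual-base conversion is resolved at run time.
std::ptrdiff_t offset_from_X_to_V(X& x) {
    V* v = &x;
    return reinterpret_cast<char*>(v) - reinterpret_cast<char*>(&x);
}

// Distance when X is the complete object.
std::ptrdiff_t offset_in_complete_X() {
    X x;
    return offset_from_X_to_V(x);
}

// Distance when X is a base subobject of Z.
std::ptrdiff_t offset_inside_Z() {
    Z z;
    return offset_from_X_to_V(z);
}
```

The exact values depend on the platform's layout rules, so the sketch only compares them rather than stating specific numbers.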
The calling convention is as follows:
For each virtual function defined in a class,
we add an entry to the primary vtable if one is not already there.
In particular, a definition which overrides a function inherited from
a secondary base gets a new slot in the primary vtable.
We do this to avoid useless adjustments when calling a virtual
function through a pointer to the most derived class.
When a class is used as a virtual base,
we add a vcall offset slot to the beginning of its vtable for each of
the virtual functions it provides,
whether in its primary or secondary vtables.
Derived classes which override these functions will use the slots to
determine the adjustment necessary.
As in Christophe's proposal above,
the caller adjusts the 'this' argument to
point to the class which last overrode the function being called.
The result provides both the 'this' argument and the vtable pointer
for finding the function we want.
Each virtual function 'f' defined in a class 'A' has one entry point
which takes an A*, and performs no adjustment.
The primary vtable for A points to this entry point.
For each secondary vtable from a non-virtual base class 'B' which
defines f,
an additional entry point is generated which performs the constant
adjustment from B* to A*.
For each secondary vtable from a virtual base class 'C' which defines f,
an additional entry point is generated which performs the adjustment
from C* to A* using the vcall offset for f stored in the secondary
vtable for C.
For each secondary vtable from a base 'D' which is a non-virtual base
of a virtual base 'E',
an additional entry point is generated which
first performs the constant adjustment from D* to E*,
then the adjustment from E* to A* using the vcall offset for f stored
in the secondary vtable for E.
Note that the ABI only specifies the multiple entry points;
how those entry points are provided is unspecified.
An existing compiler which uses thunks could be converted to use this
ABI by only adding support for the vcall offsets.
A more efficient implementation would be to emit all of the thunks
immediately before the non-adjusting entry point to the function.
Another might use predication rather than branches to reach the main function.
Another might emit a new copy of the function for each entry point;
this is a quality of implementation issue.
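As a rough model of the entry points described above, the following sketch hand-writes what a compiler would generate for the constant-adjustment case. It uses plain structs and free functions rather than real virtual dispatch. The names P, B, A, A_f and A_f_from_B are hypothetical, and the "slots" are ordinary function pointers standing in for vtable entries.

```cpp
#include <cassert>

struct P { int p = 1; };            // stands in for the primary base
struct B { int b = 20; };           // stands in for a non-virtual secondary base
struct A : P, B { int a = 300; };   // the class that defines f

// Non-adjusting entry point: takes an A* and runs the function body.
int A_f(A* self) { return self->p + self->b + self->a; }

// Adjusting entry point for the B-in-A secondary vtable:
// performs the constant B* -> A* adjustment, then runs the same body.
int A_f_from_B(B* self) { return A_f(static_cast<A*>(self)); }

// Function pointer types standing in for vtable slots.
using SlotA = int (*)(A*);
using SlotB = int (*)(B*);

// A call through the primary vtable passes the A* unchanged.
int call_via_primary(A* a) {
    SlotA slot = &A_f;
    return slot(a);
}

// A call through the secondary vtable passes a B* and relies on the
// adjusting entry point to recover the A*.
int call_via_secondary(B* b) {
    SlotB slot = &A_f_from_B;
    return slot(b);
}
```

Whether the adjusting entry point is a separate thunk, code that falls through into the main body, or a duplicated function is exactly the choice the text leaves to the implementation. This sketch simply uses a second function.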
[991202 all]
[990610 Matt]
One possibility is to have two Vtable entries,
which might point to different functions, different entrypoints,
or a real entrypoint and a thunk.
Another is to return two result pointers (base/derived),
and have the caller select the right one.
[990715 All]
Daveed presented his multiple-return-value scheme,
including an example that involved virtual base classes,
return values that are pointers to nonpolymorphic classes,
and other equally horrible things.
Consensus: we need to get the horrible cases correct,
but speed only matters in the simple case.
The simple case: class B has a virtual function f returning a B1*
and class D has a virtual function f returning a D1*,
where all four classes are polymorphic,
B is a primary base of D, and B1 is a primary base of D1.
(The really important case is where B1 is B and D1 is D,
but that simplification doesn't make any difference.)
Jason: Would the usual multiple-entry-point scheme work just as well?
That is, would it be just as fast as Daveed's scheme in the simple case,
and still preserve enough information for the more complicated cases?
It appears so, but we don't have a proof.
Jason will try to provide one.
[990716 Cygnus -- Jason]
Proof?
You always know what types a given override must be able to return,
and you know how to convert from the return type to those base types.
You know from the entry point which type is desired.
Seems pretty straightforward to me.
[990716 Cygnus -- Jason]
The alternative I was talking about yesterday goes something like this:
When we have a non-trivial covariant return situation,
we create a new entry in the vtable for the new return type.
The caller chooses which vtable entry to use based on the type they want.
This could be implemented several ways,
at the discretion of the vendor:
The advantage of this approach to the complex case is that we don't have to
do a dynamic_cast when faced with multiple levels of virtual derivation.
It is also strictly simpler;
Daveed's model already requires something like
this in cases of multiple inheritance.
Of course, we can always mix and match;
we could choose to only do this in cases of virtual inheritance,
or use Daveed's proposal and do this only in
cases of repeated virtual inheritance.
In that case, the multiple returns
would just be an optimization for the single virtual inheritance case.
Since we don't seem to care about the performance of
anything but single nonvirtual inheritance,
it seems simpler not to bother with multiple returns.
The remaining question is how to handle the case of nontrivial
nonvirtual inheritance:
do we use multiple slots or have the caller do the adjustment?
My inclination is to have the caller adjust.
WRT patents,
the idea of having the function return the base-most class and having
the caller adjust is parallel to the patented Microsoft scheme whereby
they pass the base-most class as the 'this' argument to virtual functions,
but the word 'return' does not appear anywhere in the patent,
so it seems safe.
[990722 All]
The group was generally agreed that the simplicity of multiple entries
in the vtable outweighed any space/performance advantage of more
complex schemes (e.g. the method Daveed described on 15 July).
Discussion focussed on whether it is worthwhile to eliminate some of
the entries in cases where they are unnecessary because the caller
knows the required conversion,
namely when the return type has a unique non-virtual subobject of the
original return type.
Agreement was reached to avoid the complication of eliminating some of
the Vtable entries.
Thus, the Vtable will have one entry for each accessible return type of
a covariant virtual function.
These may be implemented in a variety of ways,
e.g. duplicated functions, separate entrypoints, or stubs,
and the ABI need not specify the choice.
The location of the Vtable entries is part of the separate Vtable
layout issue B-6.
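The simple covariant case discussed in this issue can be written out as follows. This is only a source-level illustration using the class names B, D, B1 and D1 from the 990715 entry. The helper functions and the `tag` member are hypothetical, and nothing here shows how a particular compiler lays out the vtable entries.

```cpp
#include <cassert>

struct B1 { virtual ~B1() = default; };
struct D1 : B1 { int tag = 7; };

struct B {
    virtual ~B() = default;
    virtual B1* f() { static B1 result; return &result; }
};

struct D : B {
    // Covariant override: returns D1* where the base version returns B1*.
    D1* f() override { static D1 result; return &result; }
};

// A caller that only knows B receives a B1*.
B1* call_through_base(B& b) { return b.f(); }

// A caller that knows D receives a D1* without any cast.
D1* call_through_derived(D& d) { return d.f(); }
```

Both callers reach the same override. The difference is only the static type of the returned pointer, which is what the one-entry-per-return-type decision has to accommodate.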
[990604 HP -- Christophe]
Mike (Ball) gave me what I believe is an excellent definition of
when caching is allowed. I'd like him to present it.
[990805 All]
Christophe explained that the rule is simply that,
within a call to a member function of the class,
the class Vtable may not be modified.
Between such calls, no assumption may be made.
With this observation, the issue is closed.
[990812 All]
The rule is even simpler.
Once a program changes the type of a pointer's target,
the pointer is invalidated, and its value may not be reused.
Therefore, a code sequence which repeatedly refers to the same pointer
value is invalid if the pointee's vtable has been changed.
[990624 All]
Note that putting GP in the Vtable prevents putting it in shared memory.
See B-7.
[990805 All]
It was decided that special representations to accommodate shared memory
would be expensive and therefore undesirable.
Therefore, the decision is to put the function address/GP pair in the
vtable, avoiding the cost of an extra indirection in using it.
[991007 IBM -- Brian]
A while ago Jason was worried about COM compatibility.
Part of that is to ensure that vtables can be expressed in C.
But the resolution of issue B-4 says that a vtable contains
function descriptors rather than function descriptor pointers.
From the standpoint of call performance that is a good thing,
but the result can't be built in C.
I know that we at least will also have to rewrite parts of our
C++ runtime that hand-build vtables.
Neither of these is critical for IBM, but they may be for others.
[991103 Cygnus -- Richard Henderson]
[991106 SGI -- Jim]
Richard Henderson of Cygnus points out that the IA-64 relocations
don't support doing this (inserting a function descriptor in data).
However, the R_IA_64_IPLT*SB relocations do perform the correct action.
The problem is that they are currently specified to be valid only
in executables and shared objects.
I believe that the problem can be solved by simply removing this restriction.
The static linker support required shouldn't be major --
it would presumably just pass the relocations through to the linked
object and let the dynamic linker deal with them.
The above issue has been raised with the IA-64 base ABI group.
[990624 Cygnus -- Jason]
There are several ways of dealing with vague linkage items:
#3 and #4 are feasible for templates,
but I consider them too heavyweight to be used for other things.
The typical heuristic for #2 is "with the first non-inline,
non-abstract virtual function in the class".
This works pretty well,
but fails for classes that have no such virtual function,
and for non-member inlines.
Worse, the heuristic may produce different results in different
translation units,
as a method could be defined inline after being declared non-inline
in the class body.
So we have to handle multiple copies in some cases anyway.
The way to handle this in standard ELF is weak symbols.
If all definitions are marked weak,
the linker will choose one
and the others will just sit there taking up space.
Christophe mentioned the other day that the HP compiler used the
typical heuristic above,
and handled the case of different results by encoding the
key function in the vtable name.
But this seems unnecessary when we can just choose one of multiple defns.
A better solution than weak symbols alone would be to set things up so
that the linker will discard the extra copies.
Various existing implementations of this are:
The GNU ELF toolchain does a variant of #1 here;
any sections with names beginning with ".gnu.linkonce."
are treated as COMDAT sections.
It seems more sensible to me to key off of the section name
than the first symbol name as in PE.
The GNU linker recently added support for garbage collection,
and I've been thinking about changing our handling of vague
linkage to make use of it, but haven't.
I propose that the ia64 base ABI be extended to
provide for either COMDAT sections or garbage collection,
and that we use that support for vague linkage.
I further propose that we not use heuristics to
cut down the number of copies ahead of time;
they usually work fine, but can cause problems in some situations,
such as when not all of the class's members are in the same symbol space.
Does the ia64 ABI provide for controlling which symbols
are exported from a shared library?
A side issue: What do we want to do with
dynamically-initialized variables?
The same thing, or use COMMON?
I propose COMMON.
See also G-3, for vague linkage of inlined routines and their static variables.
[990624 SGI summarizing others]
Defining a COMDAT mechanism doesn't preclude using heuristics to avoid
some copies up front.
A COMDAT mechanism should also specify how to get rid of associated
sections like debugging info, unless the identical mechanism works.
[990629 HP -- Christophe]
This breaks and becomes non-portable if:
Now, the COMDAT issue is as follows:
a COMDAT section is, in some cases, slightly more difficult to handle
(at least, that's the impression Jason gave me).
For statics with runtime initialization,
what you can do is reserve COMMON space ('easier'),
then initialize that space at runtime.
As I said, the problem is if two compilers disagree on whether this
is a runtime or a compile time initialization, such as in:
So I personally recommend that we put everything in COMDAT.
[990715 All]
Consensus so far: use a heuristic for vtable and typeinfo emission,
based on the definition of the key function.
(The first virtual function that is not
declared inline in the class definition.)
The vtable must be emitted where the key function is defined;
it may also be emitted in other translation units.
If there is no key function then the vtable must be emitted in any
translation unit that refers to the vtable in any way.
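The key function heuristic can be modeled as a small selection routine. The `VirtualFn` record and `key_function` helper below are hypothetical names invented for this sketch. Skipping pure virtual functions is an assumption carried over from the "non-abstract" wording of the earlier heuristic (990624), since the consensus text here mentions only the inline condition.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one virtual function, in declaration order.
struct VirtualFn {
    std::string name;
    bool is_pure;           // declared "= 0"
    bool inline_in_class;   // declared inline in the class definition
};

// Returns the first virtual function that is neither pure (assumption)
// nor declared inline in the class, or nullptr if there is none.
const VirtualFn* key_function(const std::vector<VirtualFn>& fns) {
    for (const VirtualFn& fn : fns) {
        if (!fn.is_pure && !fn.inline_in_class) {
            return &fn;
        }
    }
    return nullptr;
}
```

A null result corresponds to the "no key function" case, where the vtable has to be emitted in every translation unit that refers to it.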
Implication: the linker must be prepared to discard duplicate vtables.
We want to use COMDAT sections for this
(and for other entities with vague linkage.)
Open issue: the elf format allows only 16 bits for section identifiers,
and typically two of those bits are already taken up for other things.
So we've only got 16k sections available,
which is unacceptable if we're creating lots of small sections.
Jason - COMDATs disappear into text and data at link time,
so the issue is really only serious if we've got more than 16k vtables
(or template instantiations, etc.)
in a single translation unit.
Daveed - HP has gotten around this problem by hacking their ELF files
to steal another 8 bits from somewhere else.
Jack - a new kind of section table could be a viable solution.
However, it would break everything if we did it for ia32.
Is a solution that only works on ia64 acceptable?
Note also that the elf section table has its own string table,
which we wouldn't be able to share with the new kind of section table.
Index and link fields often point into the section table;
we would have to figure out how to deal with this.
(Jack is not opposed to the idea of an alternate section table,
he is just pointing out some of the issues we will have to resolve.)
[990805 All]
[revised 991012 SGI]
[991007]
Change default to simply group; COMDAT semantics is option.
Don't support removal based on duplication of non-COMDAT sections.
Just remove symbols defined relative to removed sections.
C++ has many situations where the compiler may need to emit code or data,
but may not be able to identify a unique compilation unit
where it should be emitted.
The approach chosen by the C++ ABI group to deal with this problem
is to allow the compiler to emit the required information in multiple
compilation units,
in a form which allows the linker to remove all but one copy.
This is essentially the idea called COMDAT in several existing
implementations.
Various other implementations (notably Windows NT) and proposals obtain
more generality by varying the duplicate removal semantics.
The most obviously useful variant supports grouping of sections for
removal purposes, but treats duplication as an error,
using it to support link-time removal of unreferenced sections.
The proposal below treats this simple grouping as the default semantics,
and provides duplicate removal as an option.
Our objectives include:
The proposal below is based on the HP definition,
with minor modifications and more precise definitions.
This attribute flag may be set in any section header,
and no other modification or indication is made in the grouped sections.
All additional information is contained in the associated
SHT_GROUP section (see below).
Some sections occur in interrelated groups.
For instance, an out-of-line definition of an inline function might require,
in addition to its .text section,
a read-only data section containing literals referenced,
one or more debug information sections,
and/or other informational sections.
Furthermore, there may be internal references among these sections that
would not make sense if one of them were removed or
replaced by a duplicate from another object.
Therefore, we assume that such groups are to be included or omitted
from the linked object as a unit.
(Except for the GRP_COMDAT flag described below,
this definition does not specify the circumstances under which the
members of a group might be discarded from the linked object.)
To facilitate this, we define a SHT_GROUP section:
The section header attributes of a Group Section are:
The section group's
The section data of a SHT_GROUP section is a flag word
followed by a sequence of section indices.
The flag word may contain the following flags:
The section indices in the SHT_GROUP section identify
the sections which make up the group.
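A minimal sketch of reading the group section data described above: a flag word followed by section indices. Two things are assumptions not stated in this text. The entries are treated as 32-bit words, and `GRP_COMDAT` is given the value 0x1, as in the later generic ELF specification. The `Group` struct and `parse_group` function are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: flag value as in the later generic ELF specification.
constexpr std::uint32_t GRP_COMDAT = 0x1;

// Decoded view of one SHT_GROUP section (hypothetical helper type).
struct Group {
    bool comdat;                         // duplicate-removal semantics requested
    std::vector<std::uint32_t> members;  // section indices in the group
};

// Interprets the raw words of a group section: words[0] is the flag
// word and the remaining words are member section indices.
Group parse_group(const std::vector<std::uint32_t>& words) {
    Group g{false, {}};
    if (words.empty()) {
        return g;  // malformed or empty input: no flags, no members
    }
    g.comdat = (words[0] & GRP_COMDAT) != 0;
    g.members.assign(words.begin() + 1, words.end());
    return g;
}
```

A linker following the group semantics would keep or discard all sections listed in `members` together.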
The
The linker may choose to discard a section in a group,
i.e. not include its data in the linked object,
based on COMDAT duplicate semantics (above),
or for other implementation-defined reasons
(e.g. removing unreferenced code).
If it does so, the group semantics requires that all of the group
members be removed as a unit.
(Note, however, that this is not intended to imply that special-case
behavior like removing debug information requires removing the sections
to which it refers, even if they are in a group.
We could clarify this issue by tying the removal semantics to the
section which contains the identifying symbol,
but this seems overly restrictive and unnecessary.)
The above rules allow a group to be removed without leaving
dangling references, with only minimal processing of the symbol table.
[revised 991012 SGI]
[991007]
Change section/flag names, move ELF header extension to section header 0.
SGI has long been concerned about the 64K limitation on the number of
sections in an object file.
Although this need not normally be a problem,
there are purposes for which we would like to place distinct functions,
and sometimes data items,
in distinct sections.
When one takes into account associated sections,
e.g. relocation, debug information, etc.,
this leads to a limitation on the order of 16K units,
and threatens to be a problem for some large compilation units such as
machine-generated simulators.
C++ ABI efforts raise the same issue from another source.
Various C++ structures are emitted under circumstances
where the compiler cannot reliably identify a single compilation unit
in which to emit them.
Examples include common cases like class virtual tables,
out-of-line copies of inline functions,
and template instantiations.
The favored solution is COMDAT sections,
i.e. putting the potentially duplicated items in their own sections,
and allowing the linker to remove the duplicates.
Once again, though, this threatens to be a problem for very large
compilation units.
The following proposal attempts to remove this limitation.
Obviously, even if the problem is real,
it will actually arise in very few compilation units.
Therefore, the elements of the proposed solution are defined so as to
leave unchanged object files which do not encounter the problem.
We consider this compatibility objective as primary --
much more important than performance or
clean definitions for the problematic object files --
particularly as it should allow vendors to merge the solution into
existing tool chains at convenient times without disrupting existing
programs.
Proposed ABI wording is in normal font; commentary is in italics.
Section numbers are from the Intel IA-64 psABI.
The range of section indices from 0xff00 (SHN_LORESERVE) to
0xffff (SHN_HIRESERVE) is reserved for special purposes,
and the gABI already forbids real sections with these indices.
Our approach is to deal with situations where section indices cannot
be compatibly expanded to a full 32 bits
by using one of these indices as an escape value indicating that the
actual index will be found elsewhere.
The ELF header has two relevant 16-bit fields:
e_shnum contains the section count,
and e_shstrndx the index of a string section.
We modify their descriptions to include an overflow indicator,
and put the actual values in the reserved section header at index 0
if necessary, as follows:
If the number of sections is greater than
If the section name string table index is greater than
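The overflow scheme for the two ELF header fields can be sketched as below. The exact encoding is an assumption: it follows the later generic ELF specification, where `e_shnum` is 0 when the real count lives in `sh_size` of section header 0, and `e_shstrndx` is `SHN_XINDEX` (0xffff) when the real index lives in `sh_link` of section header 0. The wording of this proposal may have differed. The two structs are trimmed, hypothetical stand-ins for the real headers.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint16_t SHN_LORESERVE = 0xff00;  // start of the reserved range
constexpr std::uint16_t SHN_XINDEX = 0xffff;     // assumption: escape value

// Only the fields relevant to this sketch (hypothetical trimmed types).
struct ElfHeaderFields {
    std::uint16_t e_shnum;
    std::uint16_t e_shstrndx;
};
struct SectionHeader0 {
    std::uint64_t sh_size;  // holds the real section count on overflow
    std::uint32_t sh_link;  // holds the real string table index on overflow
};

// Real number of sections: e_shnum unless it carries the overflow marker 0.
std::uint64_t section_count(const ElfHeaderFields& eh, const SectionHeader0& sh0) {
    return eh.e_shnum != 0 ? eh.e_shnum : sh0.sh_size;
}

// Real section name string table index.
std::uint32_t shstrtab_index(const ElfHeaderFields& eh, const SectionHeader0& sh0) {
    return eh.e_shstrndx != SHN_XINDEX ? eh.e_shstrndx : sh0.sh_link;
}
```

Files with fewer than `SHN_LORESERVE` sections never take the overflow path, which matches the compatibility objective stated earlier.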
We define a new special section index as an escape value for
large section indices, as referenced above:
We note here that the section header contains two fields commonly used
to hold section indices,
A new section type is defined:
The
A new special section name is defined:
There is no available field to point from the
The symbol table is the most problematic.
It has no convenient location for an expanded section index.
Therefore, we propose that the escape value imply redirection to a
separate, parallel table containing full-size section indices.
Modify the definition of
As the
If any of the
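The parallel-table idea for symbol section indices can be sketched as a lookup with an escape value. The function name and the use of two plain vectors are hypothetical simplifications, and the escape value `SHN_XINDEX` = 0xffff is assumed from the later generic ELF specification. A real symbol table stores the 16-bit index inside each symbol entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint16_t SHN_XINDEX = 0xffff;  // assumption: escape value

// Resolves the section index of symbol number `sym`.
// st_shndx holds the 16-bit field of each symbol; extended is the
// parallel table of full-size indices, with one entry per symbol.
std::uint32_t symbol_section(std::size_t sym,
                             const std::vector<std::uint16_t>& st_shndx,
                             const std::vector<std::uint32_t>& extended) {
    std::uint16_t raw = st_shndx.at(sym);
    if (raw != SHN_XINDEX) {
        return raw;  // ordinary or other reserved index: use as is
    }
    return extended.at(sym);  // escape: the real index is in the parallel table
}
```

Only symbols whose 16-bit field holds the escape value consult the parallel table, so object files with few sections need no extra table at all.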
The .dynsym section in a linked object is completely analogous to a
.symtab section in a relocatable object,
and could be handled in the same way with the addition of a dynamic tag
to locate it.
We have not specified handling here because we expect the linking process
to remove most of the section duplication which causes the problem,
e.g. leaving only a small number of .text sections.
There should be no compatibility impact on existing environments,
since only very large section counts require object file changes.
Individual vendors can postpone implementation until convenient,
with no impact on typical programs.
Note, however, that any ELF consumer applications that are currently
storing section indices as 16-bit values must change.
[991014 All]
[991118 All]
[990624]
Issue split from A-1.
[990630 HP - Christophe]
The current full proposal has been incorporated in the
Draft C++ ABI for IA-64.
[990701 All]
The above arrived too late for everyone to read it carefully.
It was agreed that we would consider it outside the meetings,
discuss any issues noted by email,
and attempt to close on 22 July.
(Christophe is on vacation until that week,
and Daveed leaves on vacation the next week.)
[990811 SGI -- Jim]
I've put a reworked version of Christophe's writeup in the
Draft C++ ABI for IA-64,
along with a number of questions it raises.
[990812 All]
Extensive discussion of this issue produced the observations that
[990820 IBM -- Brian]
I'm going to write the exam on this to see how well I
am understanding the issue.
If I understand it correctly,
the proposal under consideration is tied to the decision to replicate
virtual function entries in vtables.
It requires replicating in the vtable for base class B all virtual functions
that are overridden in B; more replication than this implies will
be wasted since a function is always called through a vtable
of an introducing or overriding class.
When a non-pure virtual function X::f() is compiled it is possible to
determine whether it requires a secondary entry point.
It will require one if that function may be virtually called
(i.e., is the final overrider)
in any class in which f() appears in more than one vtable;
this needs to be decidable knowing only X.
A rule that works is: X::f() overrides one or more f()'s
from base classes of X,
and either one or more of those base classes are
virtual or X fails to share its vptr with all instances of them.
[Though a virtual base may happen to share its vptr with X
in an object of complete type X,
that relationship may fail to hold in further derived classes,
so we need to generate the secondary entry point just in case.]
["Sharing a vptr" is the condition under which no adjustment is necessary;
if the bases involved are all nonvirtual then
subsequent class derivation won't change this.]
Each vtable that requires a nonzero adjustment will have a
"convert to X" offset mixed in with its virtual base offsets.
It is necessary that a "convert to X" appears in the same position in
each vtable that references X::f()'s secondary entry;
it is desirable that the "convert to X" also be unique in each vtable.
Assume that X has nonvirtual nonprimary bases Nx (x=1,2,...),
and virtual bases Vx, all of which have a virtual f().
Then vtables for Nx in X,
or in any class derived from X that does not further override f(),
will reference X::f()'s secondary entry.
Vtables for Vx in X or any derived class where Vx
does not share a vptr with X,
will also reference X::f()'s secondary entry;
note this will occur in a construction vtable even if the
derived class does further override f().
The question, then,
is whether a position for the "convert to X" offset can be chosen,
knowing only X and its parentage,
that can be used consistently in all those vtables and that won't
collide with a "convert to Y" position chosen on account of some other
hierarchy where Y::g() overrides an Nx::g() or Vx::g().
If Y derives from X,
we will be able to select a "convert to Y" position that doesn't conflict,
so we can restrict our attention to cases where X and Y are unrelated.
Also, if the base involved is nonvirtual (Nx) then we are safe,
because no instance of Nx will be a subobject of both X and Y,
so no Nx vtable will require both "convert to X" and "convert to Y" offsets.
The remaining case is where X and Y are unrelated but both have
a virtual base Vx:
The vtable for N1 in ZZ does require both offsets.
The only way I see to accomplish this is to preallocate
an adjustment slot for each virtual function in V1.
That is, X::f() uses the first slot position, and Y::g() the second,
based on the order that f() and g() are declared in V1.
This only needs to be done in hierarchies where V1 is virtual,
but the same offset has to be used for any Nx tables in X too.
Is this close?
I don't understand the comment that varying numbers of virtual
base offsets make it impossible to concatenate vtables and refer
to them via a single symbol.
The only code that refers by name to X's vtable and the vtables
of N1 in X etc. is X's constructor and destructor,
and maybe some derived classes that find they are able to reuse some pieces.
All that code is aware of X's declaration and can map out its tables.
What am I missing?
[990826 All]
There is still considerable confusion about what will work.
Key questions are
(1) whether member functions can share offsets to base classes,
or each need their own; and
(2) when we need a no-this-adjustment override entry.
[990901 SGI -- Jim]
Being confused myself by all the discussion,
I've constructed a new page
containing (initially)
an example of a class hierarchy supplied by Christophe,
and attempted to identify possible function calls,
the class data layout,
and the class vtable layout based on Christophe's original proposal.
Please provide corrections,
and if you're proposing alternative vtable constructions,
describing them for this example might help (me, at least).
Also feel free to provide additional examples illustrating other points.
[990930 Cygnus -- Jason]
Jason has updated the Vtable layout description in
abi.html
to reflect the approach from Cygnus and IBM.
[991014 all]
ACTION ITEMS: Jason---update writeup to reflect these three changes.
Our decision on issue B-8 will require a one-sentence change.
All of us: study the revised version.
We are almost ready to close this issue,
and if we agree with the revised version we can close it at the
21 October meeting.
[991028 all]
[990624 All]
Note that putting GP in the Vtable prevents putting it in shared memory.
This interacts with B-4.
[990624 HP -- Cary]
For a C++ object to be placed into shared memory,
its vtable pointer must be valid in all processes
that are sharing that object.
One way or another,
we need a way of ensuring that a pointer from shared
memory to private memory is valid in all processes,
which means that we will need a means to ensure that certain shared
library data segments can get mapped at the same address in all
processes that load those certain libraries.
My wild idea a few years ago was to put the vtables in shared memory
(by allocating and building them at load time, as Taligent did),
and store a shared library index in place of the gp value
in each function descriptor.
Each process would have its own table of gp values,
indexed by this shared library index,
but the index space would be managed system-wide.
The C++ runtime library would have been responsible for allocating
a new index for each unique C++ shared library loaded on the system,
then storing the process-local copy of the gp pointer in the
appropriate slot of the table.
[990628 SGI -- Jim]
Note a further problem with vtables in shared memory (Cary's point 2).
If a virtual function comes from another DSO,
it may be pre-empted differently in different programs.
Hence, the function pointer itself is a problem even if the GP isn't.
[990701 All]
An extensive discussion boiled down to a few points:
These ideas are very fuzzy.
Participants should think about the need and possibilities and attempt
to identify more concrete approaches.
[990805 All]
It was determined (largely based on consideration by Jason)
that the only practical approach to putting objects in shared memory
is to force the objects, Vtables, functions, etc. to the same addresses
in the various processes involved.
If this is done, data representation issues are irrelevant.
Therefore, this issue is closed as moot.
Note that the base psABI defines a flag, EF_IA_64_ABSOLUTE,
which forces an executable object to the addresses specified in ELF,
so at least one method of representing this is already available.
[990701 All]
This should be part of the proposal Daveed will put together
by the 15th (action #13); the group will discuss it on the 22nd.
[990812 Sun -- Michael]
Sun has provided a description,
in a separate page,
describing their implementation.
They are filing for a patent on the algorithms described.
[991014 All]
This is closely related to issues A-6 and B-6.
It is agreed that what we need is an offset to the beginning of
the complete object, and a pointer or offset to the typeinfo object.
We choose to have an offset to the typeinfo object instead of a pointer,
which effectively means that the typeinfo object is part of the vtable.
We will put it at the very beginning, at a negative offset from the vptr.
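What the two pieces of per-vtable information make possible can be shown with standard C++ alone. In this sketch, `dynamic_cast<const void*>` recovers the start of the complete object from a pointer to a secondary base, which is the job of the offset to the complete object, and `typeid` reports the dynamic type, which is the job of the typeinfo reference. The class and function names are hypothetical, and the sketch says nothing about where a compiler actually stores these fields.

```cpp
#include <cassert>
#include <typeinfo>

struct B1 { virtual ~B1() = default; int a = 0; };
struct B2 { virtual ~B2() = default; int b = 0; };
struct D : B1, B2 { int d = 0; };

// Start of the complete object that contains this B2 subobject.
const void* complete_object(const B2* p) {
    return dynamic_cast<const void*>(p);
}

// True if the object referred to through a B2 reference is really a D.
bool dynamic_type_is_D(const B2& r) {
    return typeid(r) == typeid(D);
}
```

Given a `D d` and a `const B2*` pointing into it, `complete_object` returns the address of `d` itself rather than the address of the B2 subobject.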
[991027 SGI -- Matt]
At the October 14 meeting we decided to include RTTI information as
part of the vtable block, and to include an offset to RTTI information
in the vtable rather than a pointer to RTTI information. (We decided
on this change so that we would have fewer symbols to resolve at link
time.)
Jim came up with a serious objection at the October 21 meeting:
during construction we need different RTTI information at different
points. A few of us talked about this at Kona, and my impression is
that Jim's objection is fatal. We could imagine having base class
typeinfo objects in every vtable block, but (1) this would kill any
performance advantage we'd get by using an offset rather than a
pointer; and (2) we'd lose the ability to use simple pointer identity
as a way of telling whether two typeinfos represent the same type.
I propose that we abandon that decision, and go back to using pointers.
Does everyone agree?
[991028 All]
Agreed.
[000217 All]
Jason noticed an issue today involving the layout of primary vtables.
Our chosen layout starts with the primary base class vtable layout
(if any),
and adds additional vbase/vcall offsets to the beginning,
and additional vfunc pointers at the end.
It is then followed by the secondary vtables, in inheritance graph order.
We have assumed, for instance in our decision
not to propagate vbase offsets from non-virtual bases,
that the secondary vtables were directly
accessible at compile-time offsets from the primary vptr.
However, this is not currently the case if we are dealing with
a class that is the primary base of a derived class.
The derived class's additional vfunc pointers will be added between the
base class vtable and its secondary vtables for the base's base classes.
Therefore, non-overridden base class member functions, at least,
can't make assumptions about secondary vtable offsets.
One can, of course,
get to the secondary vtable via the secondary vptr in the object,
but that costs an additional load.
There is a "solution" that should work, but is a touch ugly.
That would be to place the additional vfunc fields for the derived class
not immediately after the primary base vtable,
but after all of its non-virtual secondary vtables.
If we don't think this is worthwhile,
we should reconsider the decision about promoting vbase offsets.
[000302 All]
It was decided that the simplest solution is to include vbase pointers
for all virtual bases,
even those with vbase pointers in direct base vtables.
They may then be referenced via either the primary or the secondary vtable.
[000629 CodeSourcery -- Mark]
We need to have a standard entry point to put in vtables to indicate a
pure virtual function.
(Some compilers use __pure_virtual, for example.)
I think we want:
[000706 All]
Accepted.
We will not mandate behavior,
since this will be called only in case of Standard-specified undefined
behavior,
but will comment that program termination is expected,
possibly after an error message.
[991202 All]
It was decided to use the array forms for all required initialization
or finalization entries,
i.e. to put initialization entries into .init_array sections with ELF
section type SHT_INIT_ARRAY,
and finalization entries into .fini_array sections with ELF
section type SHT_FINI_ARRAY.
The static linker will combine them,
and identify them to the dynamic linker using DT_INIT_ARRAY,
DT_INIT_ARRAYSZ, DT_FINI_ARRAY, and DT_FINI_ARRAYSZ dynamic tags.
[990610 All]
The meeting consensus is that the desirable order is right to left on the
link command line, i.e. last listed relocatable object is initialized
first.
[990701 SGI]
This does not address the global destructor problem.
That solution needs to deal not only with the global objects seen by
the compiler, but also interspersed local static objects.
This treatment seems to be tied up in the question of how early
unloading of DSOs is handled, and the data structure used for that
purpose (issue C-3).
[990715 All]
IBM scheme:
priorities are 32-bit signed integers, higher numbers are higher priority.
Something that isn't explicitly assigned a priority effectively
gets a priority of 0.
Consensus:
nobody is sure that negative priorities are very important,
but also nobody can think of a reason not to allow them.
We accept the idea that priorities are 32-bit signed integers.
On a source level Cygnus will keep lower numbers as higher priority,
but that's a source issue, not an ABI issue.
Status: No real technical issues,
we have consensus on everything that matters.
We need to write up the finicky details.
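The priority rules agreed in this entry can be sketched as follows. This is only an illustration: `InitEntry` and `initialization_order` are hypothetical names, not ABI entities, and the use of a stable sort (so that tasks of equal priority keep their original relative order) is an assumption made for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one initialization task; not an ABI structure.
struct InitEntry {
    std::int32_t priority;  // 32-bit signed; tasks with no explicit priority use 0
    std::string name;       // stands in for the address of the initializer
};

// Order tasks so that higher priority numbers come first.
// std::stable_sort keeps equal-priority tasks in their input order (assumption).
inline std::vector<std::string> initialization_order(std::vector<InitEntry> entries) {
    std::stable_sort(entries.begin(), entries.end(),
                     [](const InitEntry& a, const InitEntry& b) {
                         return a.priority > b.priority;
                     });
    std::vector<std::string> order;
    for (const InitEntry& e : entries) order.push_back(e.name);
    return order;
}
```

Negative priorities need no special handling here; they simply sort after the default priority of 0.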
[990722 all]
To be resolved are the precise source pragma definition (possibly IBM's),
and the ELF file representation.
[990729 all]
IBM noted that they combine their equivalent data structures in the
linker, but don't sort them, leaving that to a runtime routine.
This can be done without explicit linker support,
but involves runtime overhead.
Cygnus suggested that if we are going to require linker sorting,
we should make the facility more general.
Jim will write up a more precise proposal.
My objectives are:
Define a new section type, e.g.
Each implementation shall provide a runtime library function with
prototype:
The linker must take the collection of SHT_CXX_PRIORITY_INIT section
entries from the relocatable object files being linked,
and other initialization tasks specified in other ways
(and treated as source priority 0 or object priority -MIN_INT),
and produce an executable object file which executes the initialization
tasks in priority order using only
Note that if one is linking ELF-32 objects into a 64-bit program,
the entries must be expanded as part of this process.
Jason suggested that if we base this feature on sorting sections,
we should provide a general mechanism.
Following is a proposal for that purpose.
Define a new section header flag,
The sort must be stable.
The sort key must be naturally aligned.
Other conceivable options would be to allow sorting strings
(like SHF_MERGE, this would be indicated by setting SHF_STRING
and putting the character size in
[990810 HU-B -- Martin]
[990810 SGI -- Matt]
The relevant part of the C++ standard is section 3.6.3, paragraph 3:
What this implies to me is that atexit, and the part of the runtime
library that handles destructors for static objects, must know about
each other.
[990812 All]
[991110 SGI -- Jim]
I believe the proposal as made need have almost no linker impact.
Consider the second suggested implementation scheme, based on IBM's
description of their approach.
A minimalist implementation (from the linker point of view)
includes:
The one at the end calls
These are both in the implementation runtime. The begin routine
determines the address and size of the SHT_CXX_PRIORITY_INIT section
(below). It sorts the section by priority, and calls
__cxx_priority_init(addr,cnt) as described in the proposal with the
count of <=0 entries.
__cxx_priority_init_end calls __cxx_priority_init(addr,cnt) with the
address and count of >0 entries.
My original proposal did not describe the dynamic tags to delimit the
section, nor the __cxx_priority_init_
Now suppose you want to minimize runtime instead of linker impact --
the first suggested implementation scheme. There are at least two
approaches:
One of my original objectives, and I think a key attribute of this
proposal, is that this full range of possible implementations, from
minimal linker impact to minimal runtime impact, makes absolutely no
difference to the generated .o files -- compatibility between compilers
does not depend on the chosen link-time implementation.
Sorting is a more interesting issue. I see four possibilities:
I'll say up front that I think implicit sorting is adequate for the
purpose at hand, and I'd like to understand other applications before
I'd choose (3) or (4).
There are two differences between (3) and (4):
Either would work for the application at hand. Approach (3) would
require only one SHT_CXX_PRIORITY_INIT section per .o file, while
approach (4) would require up to one such section per constructor call
(though only if the user used lots of different priorities). I
personally think sorting based on a data vector that's already been
concatenated should be much more efficient, but it probably doesn't
matter much.
On the other hand, sorting an arbitrarily-sized section, based on an
external key, is more flexible except that the keys may be more
constrained. So, again, I think the choice comes down to other
applications of the feature. Absent significant other demands, I'd
just stick to implicit sorting (and optional at that) for now.
[991202 All]
The proposed alternative of sorting based on section name is
specifically the Linux implementation of treating all section names
containing a dollar sign ($) as being a section name before the dollar
sign and a sort key after it.
As mentioned above, this has the advantage of being more general,
except with respect to the sort key, which isn't an issue here,
and it is implemented in Linux.
The primary concern with the Linux approach is that some
implementations must deal with static linkers which are under control
of other groups or companies,
and therefore can't depend on getting linker sorting implemented.
IBM has been in that position,
though it isn't clear whether it will be an issue on IA-64.
A secondary concern is a general objection from SGI to features that
depend on section naming rather than section types and attributes.
Jim will attempt to frame the issue and get feedback from the base ABI
group.
[000106 All]
We will wait for base ABI feedback before deciding.
[000502 SGI -- Jim]
We have three choices:
I don't think we should pursue the first unless we have vendors
anxious to support it.
[000504 All]
[000720 All]
Jim reported that the psABI group agreed to allocate a section type for
this purpose, and will add a writeup to the Draft ABI (section 3.3.4).
[000803 All]
We will follow more closely the IBM pragma semantics:
no variable names, applying until the next pragma or end of file.
Rename the pragma simply "priority."
[000808 SGI -- Dehnert]
I remembered why I changed the pragma name.
I'm concerned about "priority" conflicting with more traditional uses
of the term, e.g. for multiprocessing priority.
[000817 All]
Accepted, changing pragma name from init_priority to priority.
There is no conflict with OpenMP or pthreads.
For destructors, the Standard requires opposite-order destruction,
which implies a runtime structure to keep track of the order.
Furthermore, the potential for dynamic unloading of a DSO
(e.g. by dlclose)
requires a mechanism for early destruction of a subset.
[990804 SGI -- Jim]
My objectives are:
The runtime library shall maintain a list of termination functions
with the following information about each:
The representation of this structure is implementation defined.
All references are via the API described below.
When a global or local static object is constructed,
which will require destruction on exit,
a termination function is registered as follows:
The registration function is called separate from the constructor.
When the user registers exit functions with
When linking any DSO containing a call to
Note that the above can be accomplished either by explicitly providing
the symbol and call in the linker, or by implicitly including a
relocatable object in the link with the necessary definitions,
using a .fini_array section for the FINI call.
Also, note that these can be omitted for an object with no calls to
Finally, a main program should be linked with a FINI call to
When
Issue: By passing a NULL-terminated vector of DSO handles to
Since
[991202 All]
During discussion, it was noted that this proposal will not deal
effectively with DSOs which (a) have cross-DSO destructor interactions
and (b) are unloaded dynamically.
It is generally believed that such code would not reliably work on a
variety of platforms today,
and is not a robust methodology worthy of ABI support.
However, note that if it becomes an issue,
it would be possible to define a
[991215 CodeSourcery -- Mark]
[991216 CodeSourcery -- Mark]
What I'm suggesting (for exit finalization) is:
[991217 CodeSourcery -- Mark]
[991220 SGI -- Jim]
In the ELF context assumed by the base IA-64 ABI, I expect that a C++
program will typically be running with the C run-time library libc.so,
the C++ runtime library libC.so, likely other system DSOs, and its own
components.
In this context, achieving an integrated solution could be accomplished
in a couple of ways. The obvious one is to replace the routines
atexit, on_exit, and exit in the C run-time library with routines that
are cognizant of the C++ __cxa_atexit and __cxa_finalize facilities.
A less obvious method, but still generally usable, would be to insert
C++-specific versions of them in the C++ runtime library, and depend on
preemption to achieve the replacement. This works as long as libC.so
precedes libc.so in the library list.
There are other possible non-integrated solutions,
but given the assumption of the underlying IA-64 ABI,
and the fact that the second solution above can work without changing
the underlying C run-time library,
it doesn't seem necessary to consider them.
What is an issue, however, is that the application could in theory be
linked on a different system than the one where it ultimately runs,
and therefore presumably on a different system than that which built
the run-time library DSOs. It is that interface which we need to pin
down, namely (a) what routines (with what interfaces and semantics)
must be present in libC.so/libc.so, and (b) what sequences of calls
the libraries may assume the program will make.
We appear to be agreed on the presence of __cxa_atexit and
__cxa_finalize in libC.so, on the registration of C++ destructors
and C atexit cleanup with __cxa_atexit, and on the use of
__cxa_finalize for destructor execution upon early unloading.
The open questions are (1) whether (or how) on_exit registration can
be integrated, and (2) how the final cleanup is invoked.
The originally adopted proposal ignored (1) out of ignorance, and
answered (2) by specifying a call to __cxa_finalize(NULL). If (1) is
addressed by calling __cxa_atexit for on_exit with a parameter, and
passing an additional exit code parameter to __cxa_finalize (and thence
to all the finalization actions it invokes), this works, i.e. on_exit
works as currently defined by Sun and is properly integrated into the
finalization order. But that assumes that the exit code is available
for passing to __cxa_finalize, which may imply calling it from exit if
it's not available to a .fini_array routine (which was what the
original proposal specified).
Mark points out that it works to just assume that exit does the
call to __cxa_finalize, or performs the equivalent processing,
eliminating the need for the explicit __cxa_finalize call in
.fini_array. This is slightly simpler in that it doesn't require
generation of the .fini_array entry, and the library implementation
can coordinate features like on_exit without exposing the interfaces
necessary to implement them. It also probably preserves more
faithfully the traditional semantics that atexit routines are executed
before the main program .fini_array, although doing __cxa_finalize
first in the latter should produce the same effect.
Note that we can't just not choose -- one approach requires the builder
of the main executable to insert a .fini_array entry, while the other
doesn't -- unless we want to require the run-time to handle either,
which doesn't seem useful.
My current preference is to proceed with Mark's proposal, requiring
that exit handle the __cxa_atexit-registered calls (but _not_
requiring that anyone explicitly register __cxa_finalize or anything
else to accomplish that). Upon re-reading all the mail, this seems
quite workable. In any case, I'll re-open the issue and we can discuss
it next time.
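As a rough illustration of the behavior this message attributes to __cxa_atexit and __cxa_finalize, here is a toy model of a termination list. The class `TerminationList` and its member names are hypothetical and exist only for this sketch; the real representation is implementation-defined, and on_exit and exit codes are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the runtime's termination list (not the ABI interface).
class TerminationList {
public:
    using Fn = void (*)(void*);

    // Record a termination function, its argument, and the owning DSO handle.
    void register_exit(Fn fn, void* arg, const void* dso) {
        entries_.push_back({fn, arg, dso});
    }

    // Run, in reverse registration order, the entries owned by `dso`,
    // or every remaining entry when `dso` is null. Each entry runs at most once.
    void finalize(const void* dso) {
        for (std::size_t i = entries_.size(); i-- > 0;) {
            if (entries_[i].fn != nullptr &&
                (dso == nullptr || entries_[i].dso == dso)) {
                Fn fn = entries_[i].fn;
                void* arg = entries_[i].arg;
                entries_[i].fn = nullptr;  // mark as done before calling
                fn(arg);
            }
        }
    }

private:
    struct Entry { Fn fn; void* arg; const void* dso; };
    std::vector<Entry> entries_;
};
```

In this model, early unloading of one DSO corresponds to `finalize(handle)`, and the final cleanup performed by exit corresponds to `finalize(nullptr)`.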
[000504 All]
[990630 HP -- Christophe]
A rough idea from Christophe's original vtable layout proposal
has been incorporated in the
Draft C++ ABI for IA-64.
[000217 All]
Coleen has generated a proposal.
[000308 All]
Discussed and clarified the proposal.
Jim will clarify the content descriptions.
Coleen will describe how some of the base vtables can be eliminated
from the construction vtable groups given vbase promotion.
She should be out to California in two weeks,
so we can finalize this issue.
[000323 All]
Discussion clarified the two proposals and their relative merits:
It was decided that the space savings outweighed the lost optimizations,
and proposal B was adopted.
Jim will clean up the writeup for final adoption.
For the record,
following are several issues that have been raised and resolved in the
process of developing this proposal:
[000504 All]
[990729 all]
Some implementations combine destructors with deletion,
checking a flag in the destructor to determine whether to delete.
This produces somewhat less code,
especially if there are many delete() calls.
However, it adds overhead to any destructor which does not require
deletion, e.g. base and member objects, automatic objects.
There is some concern that a runtime test is sometimes required,
but no one has yet identified why.
[990819 Cygnus -- Jason]
The [above] questions the usefulness of calling op delete from the destructor.
But it's required by the language,
in case the derived class defines its own op delete.
This only applies to virtual dtors, of course.
One option would be to have two dtor slots, one which performs deletion
and one which doesn't.
The advantage of this sort of approach would be avoiding pulling in all
the memory management code if you never actually touch the heap.
Microsoft has a patent on this device,
but the old Sun ABI also talks about it,
which seems to qualify as prior art.
[991014 all]
One solution to the problem with destructors is to have
two destructor entry points, and two destructor slots in the vtable.
One entry point destroys the object and then calls operator delete,
the other destroys the object without calling operator delete.
We can use a similar solution for constructors
(but without any impact on the vtable layout):
one entry point for constructing a complete object,
another for constructing a subobject.
Note that one of the entry points may call the other, but that's not
an ABI issue and can be left to individual implementors.
There was general agreement that this is a promising idea.
We don't have a detailed proposal yet.
HP is working on a prototype implementation.
Christophe will submit a writeup.
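The two-entry-point idea for destructors in this entry can be illustrated with a hand-written sketch. The functions `destroy_in_place` and `destroy_and_delete` are hypothetical stand-ins for the two compiler-generated entry points; the names, and the choice to have one call the other, are assumptions of this example only.

```cpp
#include <cassert>
#include <new>

struct Tracked {
    static int destroyed;  // counts destructor runs
    static int freed;      // counts calls to the class's operator delete
    ~Tracked() { ++destroyed; }
    static void operator delete(void* p) { ++freed; ::operator delete(p); }
};
int Tracked::destroyed = 0;
int Tracked::freed = 0;

// Hypothetical entry point 1: destroy the object and leave its storage alone.
// This is what automatic objects, members, and base subobjects need.
inline void destroy_in_place(Tracked* obj) {
    obj->~Tracked();
}

// Hypothetical entry point 2: destroy the object, then call operator delete.
// It calls the first entry point here, which the notes leave to implementors.
inline void destroy_and_delete(Tracked* obj) {
    destroy_in_place(obj);
    Tracked::operator delete(obj);
}
```

Because the deleting entry point is reached through the object's own class, a class-specific operator delete is picked up without a runtime flag in the destructor.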
[991028 all]
We distinguish the delete/no-delete cases by distinct entrypoints,
so only a
[991028 all]
We will produce two constructor entries,
one in-charge (constructing virtual bases) for a most-derived object,
and one not in-charge for a base subobject.
The object allocation will be the responsibility of the caller,
so there will be no variation or parameters for that purpose.
[990701 All]
Daveed and Matt will attempt to pin down the copy requirements with the
Core committee, i.e. when a non-trivial copy constructor may be elided.
The relevant Standard requirement is 12.8/15,
and there is an open defect report related to this question.
For cases where the ctor may not be elided,
we expect to perform the copy at the call site,
and pass a reference.
[990729 All]
Matt will produce a clear proposal for when the ABI will elide the
constructor (and therefore pass the class object like a normal C struct),
based on the Standard's exceptions.
[990805 All]
There are no cases where a non-trivial copy constructor can be simply
elided for all instances of a particular parameter.
Therefore, we shall use the consistent convention that,
if a value parameter's (class) type has a non-trivial copy constructor,
the caller will allocate space for it, perform the copy,
and pass a reference.
Note that the standard does allow the caller,
if the value being passed is a temporary,
to construct the temporary directly into the parameter memory
and elide the copy constructor call.
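The convention in this entry can be pictured by lowering a by-value call by hand. In the sketch below, `twice_lowered`, `call_with_lvalue`, and `call_with_temporary` are hypothetical names that show what a compiler might conceptually generate; they are not part of the ABI.

```cpp
#include <cassert>

struct Counted {
    static int copies;  // counts copy constructor calls
    int value;
    explicit Counted(int v) : value(v) {}
    Counted(const Counted& other) : value(other.value) { ++copies; }  // non-trivial
};
int Counted::copies = 0;

// Source-level function being modeled:
//   int twice(Counted c) { return 2 * c.value; }
// Hypothetical lowered form: the callee receives a reference to caller-owned storage.
inline int twice_lowered(Counted& c) { return 2 * c.value; }

// What a caller conceptually does for `twice(x)` with an lvalue x.
inline int call_with_lvalue(const Counted& x) {
    Counted param(x);             // caller allocates space and performs the copy
    return twice_lowered(param);  // then passes a reference
}

// For `twice(Counted(7))` the temporary may be constructed directly in the
// parameter storage, so no copy constructor call is made.
inline int call_with_temporary() {
    Counted param(7);
    return twice_lowered(param);
}
```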
[991028 all]
For value parameter types with a non-trivial copy constructor or destructor,
a call handles the parameter as follows:
[991028 all]
For classes with virtual bases,
the Standard allows a synthesized copy assignment to copy the virtual
bases multiple times, but does not require it.
The simplest approach, recursively copying the base objects,
will cause multiple copies for virtual bases with multiple inheritance
paths.
This can be avoided by synthesizing a second copy assignment operator
which does not copy virtual bases, to be called when assigning a subobject.
The decision was made not to do so,
on grounds that the situation is rare,
and virtual bases are often empty besides,
so that the solution is not worth the resulting code bloat.
[000130 Cygnus -- Jason]
I don't see any reason to return a value from constructors,
since we will always pass in the address of the object.
g++ currently returns that address, for historical reasons
(previously, to support assignment to 'this').
[000131 IBM -- Mark]
Currently, we use the returned value from the ctor for cases like S().i.
It wouldn't be hard to change the compiler,
but we do need a decision one way or another.
[000308 All]
Decided to return void.
Open another issue (C-13) to consider alternate allocating constructors
(low priority).
[000308 HP -- Christophe]
We should consider defining alternate constructors which
allocate the object before constructing it.
[000803 All]
The definition in the Draft, section 5.2.5, is accepted.
[000309 All]
We have defined a mangling for the guard variable object (issue F-1),
but we need to define at least its size and either its content or
a library interface to it.
This is tied up with multithreading issue G-4.
If we want the initialization to be implicitly thread-safe,
the object probably needs to contain both an initialized flag
and a thread semaphore,
and it is desirable that they be in different cache lines.
[000511 All]
The size of the guard variable is 64 bits.
The low-order byte shall contain the value 0 prior to initialization of
the associated variable, and 1 after initialization is complete.
Usage of the other bytes of the guard variable is
implementation-defined.
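A minimal sketch of how generated code might test and set such a guard is shown below. It assumes that "low-order byte" means the least significant byte of the 64-bit value; the helper names and the hand-expanded `counter_instance` function are hypothetical, and the sketch is deliberately not thread-safe.

```cpp
#include <cassert>
#include <cstdint>

// 64-bit guard object; only the low-order byte has a defined meaning here.
using Guard = std::uint64_t;

// Low-order byte 0: not yet initialized. Low-order byte 1: initialization complete.
inline bool guard_is_initialized(const Guard& g) {
    return (g & 0xFF) == 1;
}

inline void guard_mark_initialized(Guard& g) {
    g = (g & ~Guard{0xFF}) | 1;  // leave the other seven bytes untouched
}

// Hypothetical expansion of a function-local static with a dynamic initializer.
// The remaining guard bytes are where an implementation could keep a lock.
inline int& counter_instance(int* init_runs) {
    static Guard guard = 0;
    static int storage;
    if (!guard_is_initialized(guard)) {
        storage = 100;  // the "dynamic initializer"
        ++*init_runs;
        guard_mark_initialized(guard);
    }
    return storage;
}
```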
[000628 CodeSourcery -- Mark]
__cxa_vec_new and __cxa_vec_delete would be a lot more useful if
they accepted pointers to the allocation and deallocation functions
as well.
As it is, they are hard-wired to use the `::operator new[]' and
`::operator delete[]'.
Since the whole purpose of these functions is to provide compilers
a convenient way to manage construction and destruction,
I think we should either add allocation/deallocation
routine pointers to these functions,
or add additional entry points.
This additional flexibility would also be useful for C++-compatible
allocations from other languages, etc.
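To make the request concrete, here is a sketch of array helpers that take the allocation and deallocation functions as parameters. `vec_new_sketch` and `vec_delete_sketch` are hypothetical helpers written for this illustration, not the ABI routines, and their parameter lists are assumptions. Exception cleanup is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

using ElemFn    = void (*)(void*);
using AllocFn   = void* (*)(std::size_t);
using DeallocFn = void (*)(void*);

// Hypothetical helper: allocate an array through `alloc`, keep the element
// count in the leading padding, and construct each element in order.
inline void* vec_new_sketch(std::size_t count, std::size_t elem_size,
                            std::size_t padding, ElemFn ctor, AllocFn alloc) {
    char* base = static_cast<char*>(alloc(count * elem_size + padding));
    if (base == nullptr) return nullptr;
    char* first = base + padding;
    if (padding >= sizeof(std::size_t))
        std::memcpy(first - sizeof(std::size_t), &count, sizeof count);
    if (ctor != nullptr)
        for (std::size_t i = 0; i < count; ++i) ctor(first + i * elem_size);
    return first;
}

// Matching teardown: destroy in reverse order, then release through `dealloc`.
inline void vec_delete_sketch(void* array, std::size_t elem_size,
                              std::size_t padding, ElemFn dtor, DeallocFn dealloc) {
    if (array == nullptr) return;
    char* first = static_cast<char*>(array);
    if (dtor != nullptr && padding >= sizeof(std::size_t)) {
        std::size_t count;
        std::memcpy(&count, first - sizeof(std::size_t), sizeof count);
        for (std::size_t i = count; i-- > 0;) dtor(first + i * elem_size);
    }
    dealloc(first - padding);
}
```

Passing `std::malloc` and `std::free` here stands in for the "C++-compatible allocations from other languages" case mentioned above.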
[000706 All]
We agreed to do this. Jim will write it up.
[000720 All]
Accepted as documented.
[000628 CodeSourcery -- Mark]
I think we should also add a runtime support routine for copy
constructors. Here's a sample definition:
This routine will be useful to compilers when copying a structure
containing an array. The EDG front-end uses this method.
[000706 All]
We agreed to do this. Jim will write it up.
[000720 All]
Accepted as documented. NULL constructor is not allowed.
An allocating version is not needed.
[000724 SGI -- Dehnert]
I just noticed that the IA-64 psABI requires returning large aggregates
(over 256 bits except for some floating point ones) via a buffer
allocated by the caller and passed in r8. We have specified in the C++
ABI that class results with non-trivial copy constructors be returned
in a buffer allocated by the caller and passed as an implicit first
parameter (i.e. in out0, not in r8). I suggest that we make these two
cases consistent, i.e. pass the buffer address in r8 instead of out0.
(This would not affect non-IA-64 compilers.)
[000817 All]
Accepted.
In all cases where a result class object is returned in a buffer
created by the caller,
the buffer address will be passed in r8,
and not like an implicit first parameter.
[000806 CodeSourcery -- Mark]
The ABI doesn't say whether or not the constructor and destructor
parameters may be NULL for many of the functions. In some cases, it
does say that the pointers may not be NULL.
I believe that a) the spec should explicitly specify this everywhere,
and b) we should allow NULL pointers whenever it makes sense. These
are convenience routines; why not make them convenient?
For example, why not allow __cxa_vec_new2 to be used with both NULL
constructors and destructors? The caller should then pass in zero for
the padding size, of course. There's no reason to try to make these
routines go fast -- they're just there for convenience, and the memory
allocation/function call indirection overhead will swamp a few
conditionals on NULL parameters.
[000824 CodeSourcery -- Mark]
`constructor' and/or `destructor' may be NULL.
The destructor may be NULL if and only if the padding_size is zero.
`constructor' and/or `destructor' may be NULL.
The destructor may be NULL if and only if the padding_size is zero.
`alloc' and `dealloc' may not be NULL.
`constructor' and/or `destructor' may be NULL.
`destructor' may be NULL.
`destructor' may be NULL.
`dealloc' may not be NULL.
`constructor' and/or `destructor' may be NULL.
[000831 All]
For reference, we have design information as follows:
[990902 All]
We observed that there are three levels at which we can discuss EH
compatibility.
The first, minimal level is effectively that of the definition in the
IA-64 Software Conventions document.
It describes a framework which can be used by an arbitrary implementation,
with a complete definition of the stack unwind mechanism,
but no significant constraints on the language-specific processing.
In particular, it is not sufficient to guarantee that two object files
compiled by different C++ compilers could interoperate,
e.g. throwing an exception in one of them and catching it in the other.
The second level is the minimum that must be specified to allow
interoperability in the sense described above.
This level requires agreement on:
The third level is a specification sufficient to allow all compliant
systems to share the relevant runtime implementation.
It includes, in addition to the above:
The vocal attendees at the meeting wish to achieve the third level,
and we will attempt to do so.
Whether or not that is achieved, however,
a second-level specification must be part of the ABI.
Here is a quick description of the personality routine interface
and semantics. This description is a slight extension of the existing
personality routine implemented by HP for IA64. The extension is to
allow multiple runtimes from possibly different vendors or for
possibly different languages to cooperate in processing an
exception.
This document assumes that chapter 11 of the Intel/HP "IA-64
Software Conventions and Runtime Architecture" document is known to
the reader.
INTERFACE:
The complete exception processing framework consists of at least the
following routines: _RaiseException, _ResumeUnwind,
_DeleteException, _Unwind_getGR, _Unwind_setGR, _Unwind_getIP,
_Unwind_setIP, _Unwind_getLanguageSpecificData,
_Unwind_getRegionStart.
In addition, a language and vendor specific personality routine will
be stored by the compiler in the unwind descriptor for the stack
frames requiring exception processing.
UNWIND RUNTIME ROUTINES:
The unwind runtime routines have the following interface and
semantics (all routines are extern "C"):
uint64 _RaiseException(uint64 exception_class, void *exception_object);
Raise an exception, passing along the given exception class and
exception object. The exception object has been allocated by the
language-specific runtime, and has a language-specific format.
_RaiseException does not return, unless an error condition is
found (such as no handler accepting the exception, bad stack
format, etc).
void _ResumeUnwind(void *exception_object);
void _DeleteException(void *exception_object);
uint64 _Unwind_getGR(void *context, int index);
uint64 _Unwind_getLanguageSpecificData(void *context);
uint64 _Unwind_getRegionStart(void *context);
PERSONALITY ROUTINE:
The personality routine is defined with the following interface:
[Note: the frame_handle argument has been removed: it was used only
once in the runtime, and the cost of reading it back from the exception
object is really minimal, compared to the cost of having to spill it in
all landing pads... The context argument type has been made opaque]
The arguments have the following role and meanings:
version: Version number that the compiler and personality
routine agree on, identifying for instance language-specific table
format. This version number is read from the unwind information block
(unwind tables).
phase: Indicates what processing the personality routine is
supposed to perform. The possible actions are described below under
'UNWINDING PHASES'.
exceptionClass: An 8-byte identifier specifying the type of
the thrown exception. By convention, the high 4 bytes indicate the
vendor (for instance HP\0\0), and the low 4 bytes indicate the language
(for instance C++\0). [Note: For C++, it is expected that agreement will
be reached on a common 'exceptionObject', but different vendors may
still choose to have different personality routines with different table
formats.]
exceptionObject: The pointer to a memory location recording
the necessary information for processing the exception according to the
semantics of a given language. [Note: For C++, it is assumed that the
format of this exception object can be agreed upon, even if we disagree
on the LSDA and/or landing pad registers or similar details.]
context: Unwinder state information for use by the personality
routine. This is used by the personality routine in particular to access
the frame's registers. [Note: I don't see how anything could work
without a minimal common unwinder interface - which is why it has been
defined above]
return value: The return value from the personality routine
indicates how further unwinding should happen, as well as possible error
conditions. See "UNWINDING PHASES" below for details.
UNWINDING PHASES
Unwinding is a two-phase process.
PASS 1 unwinds through the stack, looking for a "handler",
that is, code that has the potential to stop the exception propagation.
For C++, this would be a 'catch' clause. The first pass can do a
"quick" unwind, meaning it does not need to maintain full
register state.
PASS 2 starts once a handler has been found. For each stack frame
that requires some cleanup, it performs that cleanup. For C++, this
would be destructors in addition to catch clauses. If compensation code
for some optimization is required, this is also the pass in which that
code will be executed. During that pass, the stack is actually unwound,
and full register state is restored prior to executing any cleanup,
compensation or handler code.
[Note: Cleanup code is code doing some user-defined cleanup such as
destructors. Compensation code is code inserted by the compiler to
compensate for an optimization that moved code past the throwing call.
Handler code is user-defined code that possibly can resume normal
execution]
The unwinding phase argument to the personality routine is a bitwise
OR of the following constants:
EH_FORCE_UNWIND = 32: During pass 2, indicates that
no language is allowed to "catch" the exception. This flag is
set while unwinding the stack for setjmp or during thread cancellation.
User-defined code in a catch clause may still be executed, but the catch
clause has to resume unwinding at its end.
TRANSFERRING CONTROL TO A LANDING PAD:
In the case the personality routine wants to transfer control to a
landing pad, it sets up registers (including IP) to suitable values for
entering the landing pad. Prior to executing code in the landing pad,
registers not altered by the personality routine will be restored to the
exact state they were in that frame before the call that threw the
exception.
The landing pad can either resume normal execution (as, for
instance, at end of a C++ catch), or resume unwinding by
calling the _ResumeUnwind function and passing it the
'exceptionObject' argument received by the personality routine.
_ResumeUnwind will never return.
_ResumeUnwind should be called if and only if the
personality routine did not return EH_HANDLER_FOUND during
phase 1. In other words, the unwinder can allocate some resources (for
instance memory) and keep track of them in the exception object reserved
words. It should then free these resources before transferring control
to the last (handler) landing pad. It does not need to free the
resources before entering non-handler landing-pads, since
_ResumeUnwind will ultimately be called.
The landing pad will receive various arguments from the runtime,
typically passed in registers set using _Unwind_setGR by the
personality routine. For a landing pad that can lead to
_ResumeUnwind, one argument must be the
exceptionObject pointer, which must be preserved to be passed
to _ResumeUnwind. [Note: Thanks to the 4 reserved words in the
exception object, 2 landing-pad arguments have been eliminated.] The
landing pad may receive other arguments, for instance a 'switch value'
indicating the type of the exception being caught.
RULES FOR CORRECT INTER-LANGUAGE OPERATION:
The following rules must be observed for correct operation between
languages and/or runtimes from different vendors:
An exception which has an unknown class must not be altered by the
personality routine. The semantics of foreign exception processing
depend on the language of the stack frame being unwound. This covers in
particular how exceptions from a foreign language are mapped to the
native language in that frame.
If a runtime resumes normal execution, and the caught exception was
created by another runtime, it should call _DeleteException.
This is true even if it understands the exception object format (such as
would be the case between different C++ runtimes). [Note: This is
because the other runtime might have to update some global variables
that point to the exception being deleted.]
A runtime is not allowed to catch an exception if the
EH_FORCE_UNWIND flag was passed to the personality routine.
CATCHING FOREIGN EXCEPTIONS IN C++
Foreign exceptions can be caught in a catch(...). They can
also be caught as if they were of a __foreign_exception class,
defined in <exception>. [Note: The
__foreign_exception may have subclasses, such as
__java_exception and __ada_exception, if the runtime
is capable of identifying some of the foreign languages.]
The behavior is undefined in the following cases:
A __foreign_exception catch argument is accessed in any way
(including taking its address). A __foreign_exception is active at the same time as another =
exception (either there is a nested exception while catching the foreign =
exception, or the foreign exception was itself =
nested) uncaught_exception(), set_terminate(), =
set_unexpected(), terminate() or unexpected() =
is called at a time a foreign exception exists (for instance, calling =
set_terminate() during unwinding of a foreign =
exception) [Note: All these cases might involve accessing the C++ specific =
content of the thrown exception, for instance to chain active =
exceptions] Otherwise, a catch block catching a foreign exception is =
allowed: To resume normal execution, thereby stopping propagation of the =
foreign exception and deleting it, Or to rethrow the foreign exception. In that case, the original =
exception object should have been unaltered in any way by the =
C++ runtime. A catch-all block may be executed during forced unwinding. For =
instance, a setjmp may execute code in a catch(...) during stack =
unwinding. However, if this happens, unwinding will proceed at the end =
of the catch-all block, whether or not there is an explicit =
rethrow.
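The forced-unwind rule above can be pictured with a minimal sketch.
It assumes only the EH_FORCE_UNWIND value quoted in the text;
the FrameAction type and the helper function are hypothetical names
introduced for illustration and are not part of any specified interface.

```cpp
#include <cassert>
#include <cstdint>

// Value given in the text above. The other phase constants are not
// reproduced in this section, so this sketch does not guess at them.
constexpr std::uint32_t EH_FORCE_UNWIND = 32;

// Hypothetical outcome for a frame whose handler matches the exception.
enum class FrameAction {
    Catch,               // handler may stop propagation
    RunThenResumeUnwind  // handler code may run, then unwinding continues
};

// Hypothetical helper a personality routine could use during pass 2.
FrameAction action_for_matching_handler(std::uint32_t phase_flags) {
    // During forced unwinding no language may "catch": the catch clause
    // may still execute, but it has to resume unwinding at its end.
    if ((phase_flags & EH_FORCE_UNWIND) != 0) {
        return FrameAction::RunThenResumeUnwind;
    }
    return FrameAction::Catch;
}
```

Because the phase argument is a bitwise or, the helper tests the bit
rather than comparing the whole value.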
[990923 All]
[991202 All]
[991209 All]
[991216 All]
HP management has agreed to release the C++ exception handling runtime,
but does not consider its unwind library suitable for release.
SGI has agreed to release their unwind library.
SGI is now (5 Jan) working on ABI conformance in preparation for doing so.
It was clarified (and should be in the document)
that unwinding determines in Phase 1 that an exception will be uncaught,
and calls
[990826 Intel/HP]
The Software Conventions document is claimed to specify the interface,
with the parameters indicating which action is required.
(I can't find it, but this would be an acceptable solution -- Jim.)
[991209 all]
Observe that this issue is part of a
level 1 specification,
i.e. part of the base ABI.
It is being described as part of the
proposed common EH interface from HP.
[000106 all]
Closed -- specified as part of the accepted
exception handling specification.
[991209 all]
Observe that this issue is part of a
level 1 specification,
i.e. part of the base ABI.
It is being described as part of the
proposed common EH interface from HP.
[000106 all]
Closed -- specified as part of the accepted
exception handling specification.
[990902 All]
Discussion reveals that Intel and HP have very different models of how
cleanup actions are handled.
Intel builds one or more routines which are called from the unwind runtime,
based on action descriptors in the unwind tables,
and acting on the stack contents or objects to be destroyed
without actually modifying the stack pointer until the final transfer
of control to the user handler.
This approach avoids actually restoring registers until the final
transfer to the handler.
HP transfers control back to a user landing pad whenever anything needs
to be done -- descriptors or handlers --
and reenters the unwind runtime if further processing is required.
They believe this approach to use much less space than the action
descriptors would,
and most importantly,
that it allows arbitrary fixup for code motion around the call that throws.
[991209 All]
An implementation can conform with the proposed C++ personality routine
interface and either support or not support nested handlers --
the only requirement is that the generated personality tables and
routine collaborate.
The
proposed common EH interface from HP
does not use nested functions as handlers,
but could easily be extended.
This issue is closed, with the immediate resolution of changing the
base unwind ABI to not require nested function handlers.
[990908 SGI -- Jim]
We propose that this be resolved by identifying the source language in
the exception descriptor and specifying that the personality routine be
able to perform cleanup actions during handling of foreign-language
exceptions, but not attempt to catch them.
[991006 All]
The consensus of the group,
from the discussion of the low-level exception API, is:
[991007 All]
In addition to the above,
Christophe will define an exception __foreign_exception to be used
by foreign-language code which wants to raise an exception that C++ can
catch.
Close this issue.
[990908 SGI -- Jim]
The typical case of cleanup and resume is floating point trap handling,
which is normally handled entirely in the original FP trap handler.
Is there an example where stack walkback must occur to identify the
handler, but resumption at the point-of-exception is required?
I can't think of any, and I think the model of registering a trap
handler is preferable for such purposes.
[991006 All]
This common ABI will not allow throwing exceptions from a signal handler.
[991007 All]
There remains concern about how to help customers
(examples were presented of big database applications)
for which raising exceptions from signal handlers for I/O failures
is a highly desirable design.
We will revisit this issue.
[991209 All]
Further discussion clarified the situation.
The fundamental problem is that exceptions thrown from a signal handler
(or otherwise asynchronously)
may appear at arbitrary points in the program,
where the unwind information is inadequate to reliably clean up,
for instance because global variable updates have been moved across the
point of exception.
A second problem is that signals are often processed on their own stack,
and making the transition to the main user stack might not happen
automatically.
As a result, it was generally agreed that dealing with exceptions
raised asynchronously would require simply passing through the
immediately enclosing stack frame (to avoid the first problem),
and a special raise invocation (as a basis for addressing both).
However, the only customer that has been adamant about supporting
asynchronous exceptions has also been adamant that such a partial
solution would not be adequate.
Their intended application involves raising the exception in a simple
routine that they expect to be inlined (for performance reasons)
directly into a try block,
which would be bypassed by the proposed solution.
Since making this work would involve significant performance penalties
elsewhere, the group's consensus is that there is inadequate benefit
from an attempted solution.
[990908 IBM -- Mendell]
Does longjmp run destructors?
I believe that the C ABI makes this optional.
I would like to propose that it does run destructors.
[990908 SGI -- Wilkinson]
The C++ standard, 18.7 paragraph 4,
says a call to longjmp has undefined behavior if any automatic objects
would have been destroyed by a
throw/catch with the same source and destination.
I don't see that this is something we need to fix.
[990908 IBM -- Thomson]
Yes it does, but ANSI is not my customer.
Meeting the bare minimum of function that ANSI requires
doesn't necessarily mean that users can build robust applications.
How can they know to avoid longjmp in their C code,
because some third party library they are using has C++ buried in it?
[990908 SGI -- Dehnert]
Implementation is a significant issue.
The normal longjmp implementation is very simple --
setjmp stores the register/stack state,
and longjmp copies it back and branches.
There is normally no traceback involved,
so what you suggest is a dramatic change,
and probably would make C people very unhappy.
Furthermore, C++ users have the option of using C++ exceptions,
which have the effect you seek.
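As an illustration of the last point, the following sketch shows
a C++ exception running the destructor of an automatic object
in the frame it leaves, which a simple restore-and-branch longjmp would skip.
The Guard type and function names are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical resource type: records its lifetime events in a log.
struct Guard {
    std::string& log;
    explicit Guard(std::string& l) : log(l) { log += "acquire;"; }
    ~Guard() { log += "release;"; }
};

void inner(std::string& log) {
    Guard g(log);                      // automatic object in the unwound frame
    throw std::runtime_error("fail");  // leaves inner() via stack unwinding
}

// Returns the order of events observed when the exception is caught.
std::string run() {
    std::string log;
    try {
        inner(log);
    } catch (const std::runtime_error&) {
        log += "caught;";
    }
    return log;
}
```

The destructor runs during unwinding, before the handler body,
so the log reads acquire, release, caught.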
[990908 SGI -- Boehm]
The problem is that on the C side:
I don't know whether it's possible to avoid breaking these clients
while providing the stack-unwinding semantics.
[990908 IBM -- Mendell/Thomson]
[VisualAge C++] on OS/2 and Windows does do the unwinding.
This is probably because unwinding support is in the OS.
Also OS/390 and I believe AS/400 too.
Our AIX implementation does not do the unwinding.
[990909 DEC -- Brender]
In addition to the systems already mentioned by
others, these systems also do exception-handling compatible unwinding
for C's setjmp/longjmp:
If you believe in safe and compatible multi-language systems,
there really is no choice but to do EH compatible unwinding for
setjmp/longjmp -- at least by default.
I suppose it would be OK for an implementation to offer an alternate
setjmp/longjmp that could be linked in for those who either know that
it is safe in particular cases or are happy to trade safety for speed...
[990909 All]
A brief discussion agreed that consensus is not absolutely necessary.
An implementation could replace setjmp/longjmp with a version that
either unwinds or just restores and jumps,
without breaking any code except that which assumed one or the other.
(Ed.: In fact, if setjmp stores enough information to either restore
or to catch an exception, one could just swap longjmp,
although that would not be optimal for the unwind and catch case,
since setjmp doesn't need to save much information in that case
as most of what is needed is in the unwind descriptors.)
[990923 All]
We agreed that:
See the HP low-level exception writeup at the beginning of the
exception issues section.
[991216 All]
Use the name
[000106 All]
We agreed to define a new header for ABI definitions,
initially containing this and the special exception objects agreed upon.
SGI will create an initial version.
We also agreed to put ABI-defined new features in an "abi" namespace.
Therefore, for this issue, we have a prototype in
[000113 All]
Accept as described and close.
[991216 All]
This is essentially Section 8 of the HP working paper.
SGI has reworked it into the
draft exception handling specification.
This group needs to approve the reworked version,
at which time it can be submitted to the base ABI group.
The draft needs to clarify that the unwinder will detect uncaught
exceptions in Phase 1, and call
[000120 All]
Close with minor modifications.
[000106 All]
Christophe will extract a list of what the HP library expects and send
it.
[000120 HP -- Christophe]
Data types:
Functions:
Extra expected functionality:
[000201 SGI -- Jim]
Two sets of functionality are provided:
once-only initialization,
and thread-private data key management.
The group also wants a means of identifying whether the real pthreads
implementation is present -- I have not yet proposed such a feature.
The purpose of the
The first time
The default single-threaded implementation need not lock
accesses to
The purpose of this functionality is to allow a program to manage
data segments which are specific to a particular thread,
but are identified by a key common to all threads.
It is required in the C++ exception handling library, for example,
to maintain thread-specific active exception lists.
The user program must first create a key variable of type
If successful,
A user thread may then associate a value with the key,
typically the address of a thread-specific data area,
by calling
Later, a thread can obtain the value it has associated with the key
by calling
[000203 All]
These functions are only used in the exception handling library at Level 3,
i.e. they are part of the interface between the system-specific
implementation and other system-provided libraries,
and do not involve interfaces to either compiled code or other
components not under control of the system vendor.
Therefore, no specification is needed.
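Although no specification was adopted,
the two facilities described in the 000201 entry
(once-only initialization and key-based thread-private data)
can be pictured with a single-threaded sketch that needs no locking.
Every name below is hypothetical, since the text does not reproduce the
actual interface names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names here are hypothetical stand-ins for the elided interface.
struct OnceControl { bool done = false; };

// Runs init exactly once per control variable (no locking: one thread).
void run_once(OnceControl& control, void (*init)()) {
    if (!control.done) {
        control.done = true;
        init();
    }
}

using Key = std::size_t;

// With only one thread, "per-thread" storage is a single table of slots.
std::vector<void*>& slots() {
    static std::vector<void*> table;
    return table;
}

// Creates a new key whose associated value is initially null.
Key key_create() {
    slots().push_back(nullptr);
    return slots().size() - 1;
}

// Associates a value (typically the address of a data area) with the key.
void set_specific(Key key, void* value) {
    assert(key < slots().size());
    slots()[key] = value;
}

// Returns the value previously associated with the key, or null.
void* get_specific(Key key) {
    assert(key < slots().size());
    return slots()[key];
}

// Example initializer used to observe how often run_once fires.
int g_init_calls = 0;
void example_init() { ++g_init_calls; }
```

A threaded implementation would keep one slot table per thread and
guard the once-control; the sketch only shows the shape of the calls.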
[991216 SGI -- Jim]
The unwind table consists of triples:
a begin and end location bounding the code fragment described by the
unwind descriptors,
and the location of the unwind information for this fragment.
The base psABI states that these are segment-relative offsets,
to avoid the need to relocate them at runtime.
It also specifies a section type and name for the unwind table,
with attribute
The psABI specification leaves open the question of how to identify the
relevant segments for the unwind table segment-relative entries.
There are several possibilities:
Forcing the unwind information tables into the text
segment is constraining.
Given that their format is undefined by the ABI
(i.e. the language-specific data area),
the severity of that constraint is not fully predictable.
It would, for example, interfere with the bias in some systems to
avoid data in text segments.
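The table of triples described in this entry can be sketched as a
sorted array searched by offset.
The field names, the exclusive end bound, and the lookup function are
assumptions made for illustration; the psABI defines the real layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One table entry, with segment-relative offsets (hypothetical names).
struct UnwindEntry {
    std::uint64_t begin;  // first offset covered by the code fragment
    std::uint64_t end;    // one past the last offset covered (assumed)
    std::uint64_t info;   // location of the fragment's unwind information
};

// Finds the entry covering `offset` in a table sorted by `begin` with
// non-overlapping ranges; returns nullptr if no entry covers it.
const UnwindEntry* find_entry(const std::vector<UnwindEntry>& table,
                              std::uint64_t offset) {
    assert(std::is_sorted(table.begin(), table.end(),
                          [](const UnwindEntry& a, const UnwindEntry& b) {
                              return a.begin < b.begin;
                          }));
    // First entry that starts after `offset`.
    auto it = std::upper_bound(
        table.begin(), table.end(), offset,
        [](std::uint64_t off, const UnwindEntry& e) { return off < e.begin; });
    if (it == table.begin()) return nullptr;   // before the first fragment
    --it;                                      // last entry with begin <= offset
    return offset < it->end ? &*it : nullptr;  // inside it, or in a gap
}
```

Since the entries are segment-relative, a caller would first subtract
the segment's load address from the pc before searching.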
[000120 HP -- Cary]
To overcome any limitations that placing info blocks in text might impose,
we designed the LTV family of relocations,
which allows a link-time virtual address to be placed in an info block
without requiring a dynamic relocation;
the consumer is expected to be able to calculate from context what
segment the LTV address refers to so it can relocate the address on the fly.
We also have the LTOFF_FPTR family of relocations,
which is needed to identify the personality routine as a gp-relative
offset to a linkage table entry that contains the function pointer.
The advantages to this scheme are that there are no dynamic relocations
for any unwind information
(except function pointers in the GOT created by LTOFF_FPTR),
and that the unwind information does not cause any
increase in the application's per-process data space.
In order to unwind correctly,
it's important that there is a one-to-one
relationship between text segments and unwind tables.
The dynamic loader needs to keep track of these relationships,
so that the unwinder can find the appropriate unwind table,
given a pc value.
Instead of a table of triples,
there is a PT_UNWIND program header table entry that locates
the unwind information for a load module;
this entry is intended to refer to a subset of the text segment.
It's through this entry that the dynamic loader finds the unwind table.
[991224 SGI -- Jim]
[000126 HP -- Cary]
Re. multiple text segments...
Our position is that we would only need more than one text segment in a
single load module where we need to establish different access
permissions for some text pages than for others.
In such a case, we consider them to be separate -- but contiguous --
text segments from the loader's point of view,
and a single text segment from the unwinder's point of view.
Therefore, we still need only one unwind table per load module.
This points out the hazy definition of "segment" and "program header
table entry" in the ELF specification.
Some program header table entries describe segments that are disjoint
from all other segments,
while others (like PT_DYNAMIC and PT_UNWIND)
describe "sub-segments" that are really part of another segment.
Re. unwind tables in data...
The performance bigots here would *never* let me put the unwind tables
in the data segment.
Nevertheless, if some language-specific data really needs to be in data,
it can be arranged by putting "LTV" pointers in the language-specific
data that point to an auxiliary block of info in the data segment.
A much earlier version of our C++ exception handling tables
in fact did just that.
("LTV" pointers are "link-time virtual" addresses.
At link time,
an LTV relocation works just like the corresponding DIR relocation,
except that no dynamic relocation is generated,
so the associated word can be placed in a read-only segment.
The consumer of that pointer must, at run time,
figure out what segment the link-time virtual address refers to and
apply the appropriate relocation factor to the address.
The required information can be obtained from the dynamic loader.
Note that this scheme requires that the linker-assigned addresses
for all of the loadable segments do not overlap.)
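The run-time step described in the parenthetical above
(find the segment, then apply its relocation factor)
might look like the following sketch.
The Segment record and function name are hypothetical;
a real consumer would obtain the segment data from the dynamic loader.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one loadable segment.
struct Segment {
    std::uint64_t link_base;  // address assigned by the linker
    std::uint64_t size;       // extent of the segment in bytes
    std::uint64_t load_base;  // address where it was actually loaded
};

// Converts a link-time virtual address to a run-time address by finding
// the segment whose link-time range contains it and applying that
// segment's relocation factor (load_base - link_base). Because the
// link-time ranges must not overlap, at most one segment can match.
// Returns false if the address lies in no segment.
bool relocate_ltv(const std::vector<Segment>& segments,
                  std::uint64_t ltv, std::uint64_t& runtime_address) {
    for (const Segment& s : segments) {
        if (ltv >= s.link_base && ltv - s.link_base < s.size) {
            runtime_address = s.load_base + (ltv - s.link_base);
            return true;
        }
    }
    return false;
}
```

If link-time ranges were allowed to overlap, the first test could match
more than one segment, which is why the scheme forbids it.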
No, but the dynamic loader does have access to it.
When we need to find an unwind table,
we ask the dynamic loader:
given a pc value, its
[991226 SGI -- Jim]
[000203 All]
It contains references:
It contains references:
It contains references:
[000323 Inprise -- Eli]
The updated proposal seems to handle most of my concerns,
but I'd still like to see the PC map hidden,
so that language implementors can do as they see fit with this.
To that end, I'd like to toss out the following additions.
Note that these are tentative,
based on my fiddling with it just a bit for the past day or so.
I'm going to do a prototype to see how it holds together.
I would like to see the unwind tables registered with the _Unwind library,
and referenced only through callbacks,
like this:
The first function takes the address of a lookup function which returns
a personality and pointer to implementation specific data based on an IP.
Start and end addresses are made available so that the _Unwind library
can optimize calls to these routines.
When an exception is raised, the _Unwind
library looks up the current IP by calling these registered procedures.
The need for something like this was implied in the Intel Software
Conventions and Runtime Architecture Guide,
Chap 11 (SCRAG is what I'll call it).
Section 11.1.2 says that the dynamic loader needs to provide an API
for finding the unwind table.
I've just changed the 'ownership' of the data a bit.
The second function lets you uninstall a lookup function.
That's for when you're unloading,
and you don't want to leave bad fn pointers floating.
Yes, the RTL for the language does have to cooperate,
or things can go south a considerable time after a module unloads.
The personality routine as it is stated in the C++ ABI doesn't have the
implementation specific data passed to it.
I'd like to add that:
The ImplementationData parameter is the item that is returned by the
lookup function that resolves the personality for a given IP.
Given these changes,
the format of most of the unwind data in chapter 11 of
the SCRAG becomes mostly advisory
(the frame info was already made so by the current document).
Chapter 11 could essentially become an appendix implementation that
could be used by implementors if they chose,
but not forced on them.
The other thing that I like about the lookup registering
is that it allows implementors to innovate with respect to fast lookup
schemes within a loadable module.
The current scheme allows for no innovation whatever.
I'd prefer that the implementors be left with the option to build as
fancy or as simple a scheme for lookups and frame decomposition as possible,
depending on the needs of the language.
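The registration idea in this entry can be pictured with a small sketch.
The proposal's own declarations are not reproduced in this text,
so every name and signature below is invented for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// What a lookup yields for an IP: a personality routine plus opaque,
// implementation-specific data (both hypothetical representations).
struct LookupResult {
    const void* personality;
    const void* implementation_data;
};

// Callback supplied by a language runtime for the code it owns.
using LookupFn = bool (*)(std::uintptr_t ip, LookupResult* out);

struct Registration {
    LookupFn fn;
    std::uintptr_t start;  // first address served by fn
    std::uintptr_t end;    // one past the last address served by fn
};

class UnwindRegistry {
public:
    void install(LookupFn fn, std::uintptr_t start, std::uintptr_t end) {
        assert(fn != nullptr && start < end);
        regs_.push_back(Registration{fn, start, end});
    }
    // Called when a module unloads, so no stale function pointer remains.
    void uninstall(LookupFn fn) {
        for (auto it = regs_.begin(); it != regs_.end();) {
            if (it->fn == fn) it = regs_.erase(it);
            else ++it;
        }
    }
    // The address range lets the library skip callbacks that cannot match.
    bool lookup(std::uintptr_t ip, LookupResult* out) const {
        for (const Registration& r : regs_) {
            if (ip >= r.start && ip < r.end && r.fn(ip, out)) return true;
        }
        return false;
    }
private:
    std::vector<Registration> regs_;
};

// Example callback for a runtime that owns one range and one data blob.
const int kExampleData = 42;
bool example_lookup(std::uintptr_t, LookupResult* out) {
    out->personality = nullptr;  // no real routine in this sketch
    out->implementation_data = &kExampleData;
    return true;
}
```

How the callback finds its data is left entirely to the language
runtime, which is the freedom the entry argues for.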
[000406 All]
Cary Coutant mentioned in a private conversation that he expects this
to be handled by having the Java compiler (for example)
register additional unwind tables with the dynamic linker.
Since the HP implementation gets the table locations from the dynamic
linker, this makes the additions transparent to the unwind library.
[000406 HP -- Christophe]
I had a discussion with Laurent Morichetti a few minutes ago.
It is unclear at this point whether their unwinding would be based
on the unwind library at all
(there are alternatives, such as encoding unwind information themselves).
But assuming they want to leverage all the code that deals
with the RSE and all that magic,
they need to have a way to be compatible with the unwind library.
Today, the unwind library uses dlmodinfo to find the start of the code
segment for the current IP
(and a predefined symbol in the case of archive-bound executables).
From there, it can find the start of the unwind table,
and from there do a binary search on the IP to find the unwind info block.
The JVM could be compatible with this black magic by having a way to
tell dld what to return for the newly created code segment.
I don't think there is a public dld interface to do that,
and it creates a rather obscure and difficult
to document dependency between the JVM,
the unwind library and dld.
Alternatively,
we could have a couple of APIs to do IP->UnwindInfo translation,
and to register a new range of text and provide the corresponding
unwind info pointer.
In that scheme,
the actual location of the unwind table would become irrelevant.
Also note that in addition to Java support,
an implementation of Dynamo for IA64 would probably have a similar problem.
[000502 SGI -- Jim]
[000504 All]
To deal with applications that create code and unwind information
dynamically (e.g. Java JITs),
the base ABI should define an interface by which the application
can register a new code/unwind data pair with ld.so.
This issue has been submitted to the psABI group.
[000106 All]
Coleen will send a description of their thread cancellation mechanism.
[000120 All]
Close with minor modifications.
Christophe will send a thread cancellation example writeup.
[991216 All]
Define how
[991216 Compaq - Coleen]
If you need to clean up more than one live exception from a
catch handler, don't you need a 'count' parameter to
__cxa_end_catch? In this case, you destroy both X and
Y objects (whether or not they're both on the stack,
or just X is).
Our equivalent of end_catch has a count parameter which
is set to the number of live exception objects to
delete and is used for branching out of the nested catch
clause (not by rethrow).
[991217 HP -- Christophe]
All you need is a flag set by __rethrow,
saying "this top exception is the one being just rethrown".
In that case, when __end_catch finds that the exception
exits its last catch block, it will not delete it.
Instead, the exception will just be popped from the stack.
As a result, the exception being rethrown remains on the caught
stack until you exit the last catch that caught it,
and then becomes referred to only through the exception object
passed in the runtime
(that is, it becomes similar to a new exception being thrown:
it does not appear in the caught stack.)
This is the "stack + 1" model I mentioned...
__begin_catch clears the flag,
in case you catch the rethrown exception before
exiting the last catch handler.
This mechanism is actually correctly specified in the description of
__cxa_end_catch (see in particular the last bullet):
Upon exit from the handler by any means,
the epilogue calls __cxa_end_catch(),
which:
What is unclear, though,
is the fact that __rethrow needs to pass a flag to
__end_catch for that purpose,
and also that the flag is stored in the high bit of the handlerCount
(which is why it did not appear in the specification...).
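The flag mechanism Christophe describes can be modelled roughly as follows.
This is a sketch of the bookkeeping only: the types and member functions
are hypothetical stand-ins, not the runtime's actual entry points,
and only the idea of keeping the flag in the high bit of the handler
count comes from the discussion.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kRethrownFlag = 0x80000000u;

struct ExceptionRecord {
    std::uint32_t handler_count = 0;  // low bits: handlers; high bit: flag
    bool destroyed = false;           // set when the sketch "deletes" it
};

struct CaughtStack {
    std::vector<ExceptionRecord*> stack;  // most recently caught at the back

    void begin_catch(ExceptionRecord* e) {
        e->handler_count &= ~kRethrownFlag;  // caught again: clear the flag
        if (e->handler_count++ == 0) {
            stack.push_back(e);              // not yet on the caught stack
        }
    }

    void rethrow() {
        assert(!stack.empty());
        stack.back()->handler_count |= kRethrownFlag;  // mark top exception
    }

    void end_catch() {
        assert(!stack.empty());
        ExceptionRecord* e = stack.back();
        const bool rethrown = (e->handler_count & kRethrownFlag) != 0;
        const std::uint32_t count = (e->handler_count & ~kRethrownFlag) - 1;
        e->handler_count = count | (rethrown ? kRethrownFlag : 0u);
        if (count == 0) {
            stack.pop_back();          // it has left its last catch block
            if (!rethrown) {
                e->destroyed = true;   // delete only if it is not in flight
            }
        }
    }
};
```

A rethrown exception is popped but survives, so a later handler can
catch it again and destroy it on its own exit.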
[000112 editor]
[000126 editor]
The answer to the above question is yes.
This issue is effectively closed,
but I will not close it officially until the
working paper
reflects the clarifications in the email discussion.
[001109 Editor]
These routines are specified adequately in the Exception ABI document.
[991216 All]
C++ allows the user to register
Several members believe the second choice (per-thread) would be very
surprising to many users and is therefore a highly undesirable default.
[000106 All]
Handler registration is global, applying to all threads.
It is observed that the global handler can be programmed to do
thread-specific processing, e.g. by keying off a per-thread datum,
but that many users would find it very surprising if the registration
only worked for the calling thread.
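A minimal sketch of that observation, using the standard
std::set_terminate and a hypothetical per-thread datum:
one handler is registered for the whole program,
and it consults the calling thread's own state when it runs.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <exception>

// Hypothetical per-thread datum that the single global handler keys off.
thread_local const char* t_thread_role = "unnamed";

// Registered once for all threads; its behavior varies per thread only
// because it reads the calling thread's datum.
void global_terminate_handler() {
    std::fprintf(stderr, "terminate called in thread role: %s\n",
                 t_thread_role);
    std::abort();
}

// Installs the handler; the registration applies to every thread.
void install_handler() {
    std::set_terminate(global_terminate_handler);
}
```

Each thread would set t_thread_role for itself; no per-thread
registration call is needed.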
[991216 All]
The working paper specifies this,
but HP wishes to propose a different representation.
[000106 All]
Christophe believes the submitted version may actually be the desired one.
He will attempt to determine this,
and others should look at it closely to determine whether it has a
large combinatorial impact on the compiler.
[000113 All]
No one has identified a problem with the proposal in the HP document.
Close this issue, and it can be reopened if a problem surfaces.
[000629 CodeSourcery -- Mark]
Both EDG and G++ call run-time library routines to throw the bad_cast
and bad_typeid exceptions, rather than trying to expand the throws
inline. This is much more convenient since those exceptions can be
thrown without the headers declaring bad_cast being included. I think
we should follow this existing practice and provide appropriate entry
points. How about:
[000629 CodeSourcery -- Nathan]
FYI, the G++ declarations are
One side of a conditional expression can be void, but only if it is a throw
expression; wrapping the throws in function calls hides that and,
in g++'s case, caused problems. The easiest solution was the above
declarations.
I suggest the following:
[000629 CodeSourcery -- Mark]
That's a reasonable suggestion, too. With a `void' return, you can
always do:
[000706 All]
Accepted as originally proposed by Mark, without return types.
The decision is intended to not burden the routines with dummy returns,
since callers with ?: operators can use casts to achieve the desired
result.
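One way a caller might combine a void-returning throw routine with
the ?: operator is sketched below.
The helper name is a hypothetical stand-in for the run-time entry point,
and the comma-plus-cast form is just one possible reading of
"use casts", not necessarily what any compiler emits.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical stand-in for a run-time routine that throws bad_cast.
// It returns void, matching the accepted decision.
[[noreturn]] void throw_bad_cast_helper() { throw std::bad_cast(); }

struct Base { virtual ~Base() {} };
struct Derived : Base { int value = 7; };

// Both arms of ?: need a common type. The void call is sequenced first
// with the comma operator, and a cast expression supplies the type; the
// cast is never reached at run time because the helper always throws.
Derived* checked_cast(Base* b) {
    Derived* p = dynamic_cast<Derived*>(b);
    return p ? p
             : (throw_bad_cast_helper(), static_cast<Derived*>(nullptr));
}
```

This keeps the throw routine free of a dummy return type while still
letting it appear inside a conditional expression.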
[001012 all]
Making this type be a pair (type_info and destructor pointers)
makes it necessary that a thrower or
We propose that instead,
we replace the
We also noticed that,
if the thrown object is an array,
the destructor passed will need to be a fabricated one which
loops over the array elements.
The alternative,
to store the array bounds explicitly in the exception object,
seems to be a lot of overhead for a very rare case.
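A fabricated destructor of the kind mentioned here could be sketched as
a template instantiated per array type, so that the bound is baked into
the function rather than stored in the exception object.
The Destructor signature and all names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Destructor signature assumed for this sketch: takes the object address.
using Destructor = void (*)(void*);

// Fabricated destructor for a thrown T[N]: destroys the elements in
// reverse order of construction.
template <typename T, std::size_t N>
void destroy_array(void* object) {
    T* elements = static_cast<T*>(object);
    for (std::size_t i = N; i > 0; --i) {
        elements[i - 1].~T();
    }
}

// Example element type that counts how many times it is destroyed.
struct Counted {
    static int destroyed;
    ~Counted() { ++destroyed; }
};
int Counted::destroyed = 0;

// Builds three elements in raw storage, runs the fabricated destructor
// through the generic pointer, and reports how many it destroyed.
int demo_destroy_three() {
    alignas(Counted) unsigned char storage[3 * sizeof(Counted)];
    for (std::size_t i = 0; i < 3; ++i) {
        new (storage + i * sizeof(Counted)) Counted;
    }
    Destructor dtor = &destroy_array<Counted, 3>;
    Counted::destroyed = 0;
    dtor(storage);
    return Counted::destroyed;
}
```

The thrower passes a single function pointer either way,
so the array case costs nothing for ordinary thrown objects.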
[001109 all]
The interface change will be made.
[000511 All]
[000511 All]
[991019/28 various]
The following is assembled from several mail messages on the subject.
For entities with C name linkage,
the entity's linkable name is identical to its base name (as usual).
Note that linkable names include not only names with
C++ global scope semantics,
but also "local" names which for some reason end up
requiring linker resolution
(e.g. static local variables declared in inline functions).
Note also that inlining requirements apply equally to functions
declared inline and those chosen to be inlined by the compiler.
For function-like entities with C++ name linkage,
the following components MUST be part of the name:
[ For the last item, consider:
In addition, it may be desirable to encode the following components:
Namespace scope variables and static data members have
linkable names that must include at least:
Note that although there are benefits to encoding array size,
and therefore being able to catch mismatches,
the ability to declare
fundamental types:
type modifiers/constructors:
The types in parentheses are available in C99,
but not in standard C++.
[991021 all]
[991028 all]
Objectives of a specification were discussed,
and have been added to the writeup above.
[000127 IBM -- Mark]
[Ed.]:
Mark raises the issue of how template expression parameters are mangled.
The Standard requires that equivalent expressions be identified,
but not all functionally equivalent ones.
The relevant paragraph is 14.5.5.1.
Don't lose this issue.
[000127 All]
Notes from the meeting:
[000210 All -- Matt]
We have agreed that local statics and local classes must be mangled.
We agreed that string literals should also be mangled even if linker
features might make it unnecessary.
The motivation is a desire to support less capable linkers on other platforms.
For local statics and local classes,
the mangled name consists of the mangled function name,
a sequence number, and the name of the local class/variable.
For string literals the mangled name consists only of
the mangled function name and the sequence number.
(There was concern that this might prevent merging of identical string
literals.
Jason believes that given a smart linker it
will just result in multiple names for the same string literal.)
Sequence numbers are assigned in lexical order within a function,
starting at 1.
The entities that receive sequence numbers are local static variables,
local classes, and string literals.
Other entities (e.g. automatic variables)
do not receive or affect sequence numbers.
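The numbering rule can be sketched as follows,
assuming a single counter shared by all three kinds of entity,
which is how the notes read.
The Entity model is hypothetical and says nothing about how the number
is spelled in the mangled name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Kinds of entity that can appear in a function body (hypothetical model).
enum class Kind { LocalStatic, LocalClass, StringLiteral, Automatic };

struct Entity {
    Kind kind;
    std::string name;  // empty for string literals
    int sequence;      // -1 means "no sequence number assigned"
};

// Assigns sequence numbers in lexical order, starting at 1. Only local
// statics, local classes, and string literals are counted; automatic
// variables neither receive a number nor advance the counter.
void assign_sequence_numbers(std::vector<Entity>& body) {
    int next = 1;
    for (Entity& e : body) {
        e.sequence = (e.kind == Kind::Automatic) ? -1 : next++;
    }
}
```

Because automatic variables do not affect the counter, adding or
removing one does not change the names of the other local entities.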
Exception specification information must be part of the mangled name
of a function.
Special entities that need to receive mangled names,
in addition to those mentioned in Daveed's document:
Exported templates may require other things to be mangled. We don't
have a detailed analysis.
We discussed the idea of having a small dictionary of well known
names, so that mangled names could be shorter.
Jason was concerned with readability of mangled names
if we had too many things in this dictionary,
and Daveed was concerned that a large dictionary wouldn't give enough
of a space savings because an index would take too many bits.
If we have such a dictionary it will have very few names in it.
Some obvious candidates are:
[000215 HU-Berlin -- Martin]
The C99 standard defines an implicit variable inside of each function:
Proposal: The sequence number of __func__ is 0.
Of course, there is always discussion what the value of __func__ is in
C++ context; I think this does not necessarily need to be defined by
the ABI (or the question whether __func__ is defined at all - if it is
not used in a function, it does not matter).
[000217 Editor]
[000308 All]
[000313 SGI -- Jim]
I have reworked the description in the
Draft C++ ABI for IA-64,
to get a more precise grammar description,
and to incorporate the loose ends decisions from the meeting
and proposals for a few more.
[000316 All]
Extensive discussions in the meeting,
reflected in the updated
Draft C++ ABI for IA-64.
[000323 All]
Extensive discussions in the meeting,
reflected in the updated
Draft C++ ABI for IA-64.
The principal decisions were:
[000330 All]
Change virtual thunk mangling to encode static offset to nearest
virtual derived class.
Encode single void parameter type for parameterless functions,
to facilitate demangling distinction from data objects.
Use object name for named entities, hash for strings,
in mangling local names,
to minimize implementation mistakes.
SGI has considered solutions to this problem including modified string
tables and/or symbol tables to eliminate redundancy.
Cygnus, HP, and Sun have also considered or implemented approaches
which at least mitigate it.
[991028 all]
Cygnus and Sun use a mangling scheme which has proven extremely
effective at compression, but not overly complex.
Each time the mangler incorporates a type into a name,
it remembers it and assigns it a number,
and subsequent occurrences of the type in the name are replaced by
the (escaped) number.
Jason believes this might be adequate compression,
without going to large character sets or more complex schemes.
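The remember-and-number scheme can be illustrated with a short sketch.
The "T<n>_" escape and the placeholder type strings are made up for
this example; they are not the encoding used by Cygnus or Sun.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Concatenates already-encoded types, replacing any repeat of a type
// with a back-reference to the number it was assigned on first use.
std::string encode_types(const std::vector<std::string>& types) {
    std::map<std::string, int> seen;  // encoded type -> assigned number
    std::string out;
    for (const std::string& t : types) {
        auto it = seen.find(t);
        if (it != seen.end()) {
            // Seen before: emit the (escaped) number instead of the type.
            out += "T" + std::to_string(it->second) + "_";
        } else {
            const int number = static_cast<int>(seen.size());
            seen.emplace(t, number);
            out += t;
        }
    }
    return out;
}
```

A practical scheme would also need an escape that cannot be confused
with an ordinary type encoding, and would likely skip types whose
encoding is already shorter than a back-reference.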
[991115 SCO -- Jonathon]
In a discussion with Matt Austern I suggested using a
collision-resistant hash function on the manglings to generate the
names actually used in object files.
(The algorithm is: first mangle, then hash.)
This could really reduce .o size a ton;
think expression templates, etc.
I bet this would have a much bigger impact than any
obvious compression algorithm; you could just decree that all symbols
be no longer than 256 bits long, say.
Lots of tools (assemblers, debuggers) will use less space/time
dealing with the shorter names.
You would keep around a table mapping hashes back to the original
mangled names for debugging.
An interesting twist on this would be to use a secure hash with a key.
For ordinary compilation, use some well-known key.
But, by setting some flag/environment-variable you could tell the
compiler to use a key of your choice.
You can now distribute a .o that is hard to link to --
unless you know the key.
<After a request for clarification...>
A collision-resistant hash function is a notion from cryptography.
(That's the world I spend a lot of my time in when I'm not doing
compiler stuff.)
Suppose you have an n-bit hash, so you have 2^n hash values.
A collision-resistant hash is one where the probability of two randomly
chosen strings hashing to the same value is (very close to) 1/(2^n).
A stronger notion of this is that finding strings that collide is
computationally infeasible.
Certainly, hashing introduces a probabilistic nature to things:
it becomes possible that two different functions could hash to the same
hash-mangled name.
However, by choosing a good hash function (and provably good ones exist)
and enough bits,
you can make it considerably less likely that in the next hundred years
two distinct functions will hash to the same name,
than that cosmic rays will cause unpredictable linker errors.
... this (the assumption that mangling is reversible,
as the basis for such things as the c++filt tool)
is the biggest objection I can think of.
We originally came up with this idea for our C++-to-C translator.
We ship this to people with embedded systems whose linkers only support
16 characters;
by using a collision-resistant hash they can use C++.
Nobody has ever run into a collision.
We solved the c++filt problem
by keeping a database mapping hashes back to mangled names.
(The probabilistic guarantee says that this database can actually be
global; in our lifetime we will never see two things with the same hash.)
So, it's still possible to make a c++filt that works,
but it is admittedly more difficult.
The biggest advantage to this scheme is that you can put an upper
bound on symbol lengths,
even in the presence of truly huge template usage.
(I've seen programs where mangled names approached a megabyte in length.)
I would only suggest hashing long names;
names under 100 characters, or even a thousand characters, say,
could be left unhashed.
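A minimal sketch of "first mangle, then hash, but only for long names" might look like this. The names `fnv1a64` and `link_name`, the `_H` prefix, and the threshold parameter are all assumptions made for illustration. FNV-1a is used purely as a stand-in: it is not the collision-resistant function the proposal has in mind.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// FNV-1a, 64-bit. A stand-in only: it is NOT the collision-resistant hash
// the proposal calls for; a real implementation would use a stronger function.
std::uint64_t fnv1a64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Hypothetical helper: names shorter than `threshold` are left unhashed;
// longer ones become a fixed-length symbol, and the original is recorded in
// `reverse` so that a demangling tool can still recover it.
std::string link_name(const std::string& mangled, std::size_t threshold,
                      std::map<std::string, std::string>& reverse) {
    if (mangled.size() < threshold) return mangled;
    char buf[2 + 16 + 1];  // "_H" + 16 hex digits + NUL
    std::snprintf(buf, sizeof buf, "_H%016llx",
                  static_cast<unsigned long long>(fnv1a64(mangled)));
    reverse[buf] = mangled;
    return buf;
}
```

The `reverse` map plays the role of the database mapping hashes back to mangled names; every hashed symbol has the same length regardless of how long the original was.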
[000504 All]
[000511 HU-Berlin -- Martin]
As a result, some of these names come out wrong. In particular, if
template parameters appear in the signature, I use the substituted
parameters instead of the formal ones (i.e. I never use
I've produced a table
showing how the size of EDG-mangled names relates to the new names.
For each length of an old name,
it shows how often a certain new length appeared.
E.g. for
89 : 71(2x) 72(5x)
there were a total of 7 names with 89 characters in Matt's list. Under
the new mangling, 2 of them are now 71 characters, and 5 are 72
characters in size.
In general, all names under the new mangling are shorter than under
EDG's mangling, with a single exception (listed on top). For short
names (<80char), size reduction is small, unless one of the predefined
dictionary entries is used. For longer names (>200 chars), compression
under the new ABI is about 50% better than under the EDG scheme.
If you find errors in my implementation that could be corrected from
looking at the demangled names, please let me know; I can then produce
corrected statistics.
[000210 All -- Matt]
HP will be supporting an ilp32 model as well as an lp64 model.
The ABI only discusses an lp64 model.
Do we want to support ilp32 in any way?
What will we have to do to support
(a) compatibility between different vendors' ilp32 compilers, or
(b) compatibility between ilp32 and lp64?
HP has suggested, for example,
modifying the mangling scheme so that long long in ilp32
is mangled the same way as long in lp64.
Is this enough to ensure ilp32/lp64 link compatibility,
or would we need to make many other changes as well?
[000217 All]
The group observed that one can prevent all incorrect linkage by using
a different version prefix for LP64 and ILP32 mangling.
Christophe would prefer to just mangle those types that are different
differently, so as not to prevent linkage when it would work.
It is not clear whether mixed models are workable enough to make such a
complication useful.
Christophe will produce a concrete proposal to discuss once the base
mangling is settled enough to base it on.
[000210 all -- Matt]
Users have access to types' mangled names via the standard type_info class.
Users may sometimes want to get demangled names.
Should we provide an entry point for calling a demangler?
This might be a standalone function,
perhaps with an interface like that of EDG's demangle(),
or it might be some kind of type_info extension.
If we do this,
should we attempt to specify exactly what demangled names look like,
or should we explicitly leave it unspecified and warn users
not to depend on the exact format?
[000321 HU-Berlin -- Martin]
Suggestion:
[000330 all]
The problem with the suggested interface is that using std::string
requires sucking in half the standard library.
An alternative proposed is that the user pass in a buffer,
with a NULL pointer causing the routine to allocate storage.
Christophe also volunteered to send the HP interface,
though it is a bit heavyweight.
[000330 HP -- Christophe]
Here is the interface HP offers today.
As I said, it seems overly complicated,
compared to what Matt proposed.
On the plus side, it has handling of erroneous input,
which I believe we need to define.
[000406 all]
There was some discussion of the desirability of making the demangler a
class member.
Christophe believes it would thereby become easier to derive from it,
e.g. to tailor output.
Others believe it would add unnecessary complication;
one particular concern is that it be callable from C.
Christophe and Matt will send specific proposals.
It was observed that Martin's suggestion of two functions is unnecessary.
A name beginning with "_Z" is a <mangled-name>;
otherwise it is a type name (if valid).
[000406 SGI -- Matt]
ALTERNATIVE A
mangled_name is a null-terminated string with the mangled name.
buf is a pointer to a user-provided buffer of at least n characters.
If buf is a null pointer then n is ignored,
and demangle allocates its own buffer with malloc.
The user is responsible for freeing it.
If the return value is non-null,
it points to a null-terminated string with the demangled name.
If the return value is null, an error has occurred.
*status == 0 means the demangling failed because the buffer
wasn't long enough (or because malloc failed).
*status == -1 means the demangling failed because mangled_name is invalid.
Users may pass a null pointer as the last argument to __cxa_demangle.
All that means is that, if the demangling fails, they won't be able
to find out why.
ALTERNATIVE B
mangled_name is a null-terminated string with the mangled name.
buf is a pointer to a user-provided buffer of at least n characters.
If buf is a null pointer then n is ignored,
and demangle allocates its own buffer with malloc.
The user is responsible for freeing it.
If result.name is non-null,
it points to a null-terminated string with the demangled name.
If result.name is null,
demangling has failed and result.status gives the type of failure.
DISCUSSION
I prefer alternative A,
even though the error indication is clumsier,
because it's callable from C.
Having a C-callable demangling interface could come in handy,
e.g. for linkers.
If we decide that's unimportant, we should go with alternative B.
[000406 HP -- Christophe]
ALTERNATIVE C
Interface:
Implementation:
[000413 All]
[000427 SGI -- Matt]
Behavior:
the return value is a pointer to a null-terminated array
of characters, the demangled name.
If there is an error in demangling, the return value is a null pointer.
The user can examine *status to find out what kind of error it is.
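For illustration, here is a hedged usage sketch built on the `abi::__cxa_demangle` entry point that GCC's and Clang's runtimes declare in `<cxxabi.h>`. Its exact status codes may differ from those discussed in these notes, so the wrapper (the hypothetical `demangle`) relies only on two things: a null return signals failure, and in those implementations a zero status signals success.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Demangle `mangled`, letting the runtime allocate the buffer with malloc.
// Returns an empty string on any failure; `status_out`, if non-null,
// receives the status reported by the demangler.
std::string demangle(const char* mangled, int* status_out = nullptr) {
    int status = 0;
    char* p = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status_out) *status_out = status;
    if (!p) return std::string();
    std::string result(p);
    std::free(p);  // the caller owns the malloc'ed buffer
    return result;
}
```

Passing a null buffer keeps the caller out of the business of guessing a size, at the cost of one allocation per call.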
Meaning of error indications:
Memory management:
[000504 All]
[000323 HU-Berlin -- Martin]
2.2, [lex.charset]/2, allows usage of universal-character-names in
C++ programs, especially in identifiers and strings.
How do we mangle the variable pi below?
This is also an issue for C99, so it may be that the base ABI has a
specification; we'd have to follow that at least for extern "C" names.
If not, I propose that such names are encoded in UTF-8.
[000405 Cygnus -- Jason]
UTF-8 is inappropriate for mangled names,
as it uses values > 127 to encode non-ASCII characters.
GNU Java encodes names in UTF-8 internally.
For the mangled name, if there are non-ASCII characters,
it adds a 'U' to the beginning and encodes each
such UCS-2 character as _%04x.
See gcc/java/mangle.c.
This assumes that all interesting characters fall within the Basic
Multilingual Plane (the low 16 bits);
that is a valid assumption for us, since all the extended characters
valid for use in C++ identifiers are part of the BMP.
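The scheme just described can be sketched as below. This is a reading of the description above rather than a copy of gcc/java/mangle.c: the function name `encode_identifier` is hypothetical, every character is assumed to fit in 16 bits, and the question of how a literal underscore in the identifier is kept distinct from an escape is left open.

```cpp
#include <cstdio>
#include <string>

// If any character is non-ASCII, prefix the name with 'U' and write each
// such character as "_%04x"; ASCII characters are copied unchanged.
// Assumes every character lies in the BMP (fits in 16 bits).
std::string encode_identifier(const std::u16string& name) {
    bool has_non_ascii = false;
    for (char16_t c : name) {
        if (c > 0x7f) has_non_ascii = true;
    }

    std::string out = has_non_ascii ? "U" : "";
    for (char16_t c : name) {
        if (c <= 0x7f) {
            out += static_cast<char>(c);
        } else {
            char buf[6];  // '_' + 4 hex digits + NUL
            std::snprintf(buf, sizeof buf, "_%04x", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}
```

The output stays within 7-bit ASCII, which is the property the UTF-8 alternative lacks.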
[000411 HU-Berlin -- Martin]
Why is [UTF-8] not appropriate?
AFAICT, the gABI has no restriction in that respect.
ch4.strtab.html says
I can see there are a number of alternatives. I think it is important
that there is agreement on the rules, in a way that is also
interoperable with C99 implementations. What those rules are is not
that important.
In the C++ ABI, the natural adaptation of that approach would be to
mangle non-ASCII-containing identifiers as _U instead of _Z, right?
Unfortunately, that does not give a solution for C names. I believe
the GNU Java approach also cannot be extended to C99.
[000413 All]
We need to follow the underlying C ABI.
Names containing Unicode letters after mangling according to our normal
mangling rules will be encoded as required for external names by the C ABI.
[000504 All]
[000323 HU-Berlin -- Martin]
2.2, [lex.charset]/2, allows usage of universal-character-names in
C++ programs, especially in identifiers and strings.
Consider the example:
First, what is sizeof(wchar_t) in the base ABI? I'll assume 4 for
the moment. Then, the question comes down to: What is the execution
character set, and the wide execution character set? 2.2/3 says
they are implementation-defined, so I guess we must define
them. Typically, people expect this to be a run-time setting (which
is a reasonable assumption), but it kind-of breaks for string
literals.
Proposal: The wide execution character set is UCS-4.
The execution-character-set is "as-is",
i.e. bytes from the source character set are
copied unmodified to the object file.
Universal-character-names appearing in narrow (i.e. char)
strings are not portable in this ABI
(the other alternatives would be to say they are Latin-1,
or encoded as UTF-8, I guess).
[000405 Cygnus -- Jason]
I have been told that it is inappropriate to assume that wchar_t is
always UCS-4;
a suggestion was to convert from UCS-4 to the host locale character
set using iconv(),
and then if we're in a wide string,
convert to wchar_t with mbtowc().
This makes sense to me,
though of course it requires iconv to know about UCS-4.
[000413 All]
We need to follow the underlying C ABI.
Strings containing Unicode letters
will be encoded as required by the C ABI.
[000504 All]
[000406 All]
One suggestion is to go back to the collision-resistant hash suggested
by Mark in November in another context.
The relevant source code is attached as
fingerprint.h and
fingerprint.c .
[991119 CodeSourcery -- Mark]
I was asked to provide a little more information on collision-free
hashing algorithms.
I've appended our source to do this in our C++-to-C translator.
The hash function here was originally used in Modula-3;
it is provably collision-resistant.
This version uses 64 bits;
the algorithm can be extended to any bit length, however.
Even for 64 bits, the probabilistic guarantee (details at
Compaq research)
ensures that (for example),
the chance of getting a collision with a thousand mangled names
of length a thousand is less than one in a billion.
At CenterLine, we used this algorithm to compute type fingerprints to
detect ODR mismatches at link-time. The same trick could be used to
see whether all definitions of an inline function are really the same.
It's better to use a collision-resistant hash (like this one) than an
ad-hoc hash because the math actually guarantees nice properties.
Other examples of collision-free hashes are "secure hashes", i.e.,
those designed to resist an adversary's ability to create a text with
a given hash, or to find collisions. Well-known examples include SHA
and MD5.
[000504 All]
[000413 All]
No.
It requires more space,
it can be done external to the mangling,
and the group is uncomfortable with the potential breakage.
[990624 Cygnus -- Jason]
How should we handle local static variables in inlines?
G++ currently avoids this issue by suppressing
inlining of functions with local statics.
If we don't want to do that,
we'll need to specify a mangling for the statics,
and handle multiple copies like we do above.
[990721 Cygnus -- Jason]
[We should emit inline routines]
in translation units where an out-of-line copy is needed.
I am opposed to emitting the inlines with the vtable,
for two reasons:
[991118 All]
We discussed linkage of static locals in inline functions.
The C++ standard requires that there be only a single object
in the entire program,
i.e. the static locals in different translation units must be merged.
Two cases: string literals and everything else.
"Everything else" is believed to be a rare and unimportant case.
We'll just give the static locals mangled names,
and put them in COMDAT groups.
String literals are believed to be common,
and mangled names in COMDAT are too heavyweight.
The base ABI provides an optional mechanism for
merging all copies of a given string literal.
We would like to make this mechanism mandatory,
so that string literals in inline functions get merged automatically.
[991202 All]
The use of the new SHT_MERGE/SHT_STRING attributes,
requiring the static linker to do the merging,
was decided to be a suitable solution.
It was noted that this will not provide merging across DSOs,
but this is not considered a problem.
An implementation may overcome this by naming the strings
and invoking dynamic linker name preemption,
at the cost of additional dynamic link time.
[990607 SCO -- Jonathan]
When compiled with CC -Kthread thr5.C on UnixWare 7, for instance,
it passes by returning 0. When compiled with CC -mt thr5.C on
Solaris/x86 C++ 4.2 (sorry don't have the latest version!), it
fails by returning 5.
[990607 Sun -- Mike Ball]
If you look at the entire statement you find that it reads:
The word "recursively" is normative,
so eliminates that sentence from consideration.
One can, of course, make any extension to the language,
but in this case I think the extension invalidates some otherwise valid code.
The sentence I'm referring to is that the object is considered
initialized upon the completion of its initialization.
This is explicit, and the reason for it is covered in the following sentence,
which discusses an initialization that terminates with an exception.
A person catching such an exception has the right to try again
without danger that the static variable will be initialized in the meantime.
I don't see anything at all to justify semantics that say,
"after initialization is started, any other threads of control are
blocked until that thread completes the initialization,
unless, of course, it exits by an exception,
in which case the other thread can do the initialization before the
exception handler gets a chance to try again, except...."
Take an attempt to define the semantics as far as you like.
The problem is that there is no way for the compiler writer to know
what the programmer really wanted to do.
I can (and will at some other date, if necessary)
come up with scenarios justifying a variety of mutual exclusion policies,
including none.
The solution is to let the programmer write the mutual exclusion, the
same as we do for every other potential race condition.
It's a real mess, and, I claim, an unwise one to put in as an extension.
[990608 HP -- Christophe]
There are details of our implementation that I disagree with, but in
general, the semantics seem clear and sane, not as convoluted as you
seemed to imply. In particular, it correctly covers the case where
the static initialization fails with an exception. Any thread at that
point can attempt the initialization.
[990608 SCO -- Jonathan]
[...] This is in local static variables with dynamic initialization,
where the compiler generates a static one-time flag to guard the
initialization.
Two threads could read the flag as zero before either of them set it,
resulting in multiple initializations.
[...] Accordingly, when compilation is done with -Kthread on,
a code sequence will be generated to lock this initialization.
[...] the basic idea is to have one guard saying whether the
initialization is done (so that multiple initializations do not occur)
and have another guard saying whether initialization is in progress
(so that a second thread doesn't access what it thinks is
an initialized value before the first thread has finished the
initialization). [...]
When compiled with -Kthread, the generated code for a dynamic
initialization of a local static variable will look like the
following. guard is a local static boolean, initialized to zero,
generated by the [middle pass of the compiler].
Two bits of it are used: the low-order 'done bit'
and the next-low-order 'busy bit'.
The above code will work for position-independent code as well.
The complication due to exceptions is:
what happens if the initialization code throws an exception?
The [compiler] EH tables will have set up a special region and flag in
their region table to detect this situation,
along with a pointer to the guard variable.
Because the initialization never completed,
when the RTS sees that it is cleaning up from such a region,
it will reset the guard variable back to both zeroes.
This will free up a busy-waiting thread, if any,
or will reset everything for the next thread that calls the function.
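A minimal sketch of the two-bit guard protocol follows, assuming C++11 `std::atomic` as a stand-in for whatever sequence the compiler emits under -Kthread. The names `guarded_init`, `kDone` and `kBusy` are hypothetical, and this is not the original code sequence, which is not reproduced in these notes.

```cpp
#include <atomic>
#include <thread>

constexpr unsigned kDone = 1u;  // low-order 'done bit'
constexpr unsigned kBusy = 2u;  // next-low-order 'busy bit'

// Runs `init` at most once to completion for a given guard.
// If `init` throws, the guard is reset to zero so a later caller may retry.
template <class Init>
void guarded_init(std::atomic<unsigned>& guard, Init init) {
    for (;;) {
        unsigned state = guard.load(std::memory_order_acquire);
        if (state & kDone) return;        // already initialized
        if (state & kBusy) {              // another thread is initializing
            std::this_thread::yield();
            continue;
        }
        unsigned expected = 0;
        if (!guard.compare_exchange_strong(expected, kBusy,
                                           std::memory_order_acq_rel))
            continue;                     // lost the race; re-check the bits
        try {
            init();
        } catch (...) {
            guard.store(0, std::memory_order_release);  // reset both bits
            throw;
        }
        guard.store(kDone, std::memory_order_release);
        return;
    }
}
```

In this sketch a thread that re-enters the initializer recursively would spin forever on its own busy bit, so recursion would need separate handling.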
The idea of the __static_init_wait() RTS routine is to monitor the
value of guard bits passed in, by looping on this decision table:
As for how the wait is done [... not relevant for ABI,
although currently we're using thr_yield(),
which may or may not be right for this context].
[990608 SGI -- Hans]
I believe that these arguments imply that this problem is not important
enough to warrant added ABI complexity or overhead for sequential code.
Consider the following skeletal example:
[990607 SCO -- Jonathan]
I don't agree with these claims.
There are sometimes situations where a group of objects is being processed,
and you want to arbitrarily pick one of them
to serve as an identifier or key for all of them.
Consider perhaps a golf course scheduler,
which is taking in players and assigning them to foursomes.
You want to name each foursome by one of the names of the players
(it doesn't matter which one),
such as the "Jones group" or the "Smith group".
A natural way to program this might be:
Now if the golfers being scheduled are coming from four different databases,
it might be that a thread is running to extract from each database.
Thus build_foursome() might be called concurrently.
That's fine, and there is no need for application-level locks in
either the caller or this function; we don't care which golfer
the group is named after.
We just want the 'static' to work correctly;
what we don't want is a double initialization,
with two different group names being generated for golfers in the same group,
which is possible if the guard code isn't thread-safe.
Now one can say that this kind of design isn't wise,
or that locks will probably be needed later in this function
to do the rest of the processing,
or that this can be coded in several other ways.
And that may all be so.
But I think this usage is *reasonable* in this context,
and that as implementors we should get it right.
[Editorial: Especially with the advent of Java,
threaded application programming is becoming more the norm;
and language implementations that dodge the challenge and say that
thread support is solely the job of libraries,
may not be looked upon kindly by users.]
[000511 All]
[000706 All]
[000706 HP -- Christophe]
The calling sequence for:
This has the following benefits:
The function itself deals with the flag in a thread-safe way,
but this requires only one mutex inside the function.
This is important, since test and set operations
are potentially costly memory-wise on IA64
(they definitely are on PA-RISC,
where any mutex / lock / whatever must be 16-byte aligned).
[000803 All]
The concern was repeated that there are objections to any automatic
locking approach, and we should go back and consider them again.
[000720 All]
A potential interface that allows this would be the following.
Expand the guard object to the following structure:
[000817 SGI -- Jim]
[001109 all]
Since there has been no further feedback from HP on the more
complicated proposal above, and the current HP attendees do not think
it necessary, this issue will be closed.
[990810 HU-B Martin]
I'd like to see an indirection in vararg lists,
so they can be passed through thunks.
This is necessary at least for the covariant returns,
but might have other applications as well.
[990810 HU-B Martin]
Since there already was the decision not to return a list of
pointers from a covariant method,
the only alternative to real thunks
is code duplication (as done in Sun Workshop 5).
(Or alternate entrypoints... Jim)
With real thunks, you have to copy the argument list.
That is not possible for a varargs list,
so here is my proposal for varargs in C++:
In the place of the ellipsis, a pointer to the first argument is passed.
In case of a thunk for covariant returns,
this pointer can be copied to the destination function.
The variable arguments are put on the stack as they normally would.
With that, the issue is in which cases to use such a calling
convention:
Option (1) could be further restricted to methods returning a pointer
or reference to class type.
[990812 All]
In response to a question,
it was observed that passing one variant of a class hierarchy in a
varargs list and referencing another variant in the va_arg macro is undefined,
and we don't need to worry about a mechanism for doing the conversion.
[991014 All]
We would want to reject option (3),
even if it were still possible to change the base ABI.
The present scheme is compatible with K&R C methods;
the proposed change would not be.
Decision: Close with no action.
We're using multiple entry points for covariant return types, not thunks,
so there's no need for doing anything different for varargs functions
with covariant return types than for any other varargs functions.
[991202 All]
It was decided not to treat bool parameters specially,
i.e. they will be passed like chars.
[000817 All]
Agreed to name the library
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-24 | RTTI for incomplete types | data | closed | CodeSourcery | 000126 | 000330 |
Summary:
How does RTTI represent incomplete types?
Resolution:
Use class_type_info distinct from the complete type copy,
add a flag to pointer_type_info if it points to incomplete type RTTI,
and do mangled name comparison if an incomplete pointer is involved.
Note that the full structure described by an RTTI descriptor may
include incomplete types not required by the Standard to be completed,
although not in contexts where it would cause ambiguity.
struct A;
struct B;
int main ()
{
try {
throw (B **)0;
} catch (A const * const *) {
abort ();
} catch (B const * const *) {
;//ok
} catch (...) {
abort ();
}
}
__tiPP1B:
.long __vt_19__pointer_type_info
.long .LC2
.long 0
.long __tiP1B
__tiP1B:
.long __vt_19__pointer_type_info
.long .LC3
.long 0
.long __ti1B ;; not emitted, will resolve to zero
__tiPCPC1A:
.long __vt_19__pointer_type_info
.long .LC1
.long 1
.long __tiPC1A
__tiPC1A:
.long __vt_19__pointer_type_info
.long .LC4
.long 1
.long __ti1A ;; not emitted, will resolve to zero
__tiPCPC1B:
.long __vt_19__pointer_type_info
.long .LC0
.long 1
.long __tiPC1B
__tiPC1B:
.long __vt_19__pointer_type_info
.long .LC5
.long 1
.long __ti1B ;; not emitted, will resolve to zero
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-25 | Excess-width bitfields | data | closed | IBM | 000204 | 000217 |
Summary:
C++ allows bitfields with a larger size specified than that required by
the declared type, e.g. int f: 64.
How should they be allocated?
Resolution:
Allocate the field with alignment determined as though it were the
largest integer type that fits in the specified size,
and use the first bits available in the field
(lowest order for little endian IA-64)
for the actual data.
In this case, I don't want the accessible part of i at the beginning or
the end -- I want it in the middle. Doing otherwise yields either a
badly aligned i, or wasted space.
struct s {
short s1;
int i: 64;
short s2;
};
[000204 IBM -- Mark]
I disagree.
If the user wants the bitfield to be aligned in a certain place,
he has the tools to do so.
He can certainly pick a different size bitfield.
I think that this should be aligned as if it is the same size as the type,
and then the extra bits put somewhere.
Putting them afterwards is probably simpler than before,
or splitting it in the middle.
[000217 All]
The rationale for the solution chosen is that the most likely reason
for using this feature is to achieve a known allocation for an enum
type when the user does not know how big compilers will make it.
Thus, we want "enum ... e : 32;" to behave as though
the compiler allocated a 32-bit int,
even if it actually uses only 8 bits for the enum value.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-26 | NULL pointers to member functions | data | closed | CodeSourcery | 000221 | 000302 |
Summary:
How are NULL pointers to member functions represented?
Resolution:
A NULL pointer is represented by a 0 value of ptr,
and the value of adj is irrelevant.
and not:
to the ABI document.
is required since in the case that p1.ptr and p2.ptr are both
zero, their `adj' fields are irrelevant.)
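The consequence for comparison can be sketched with a small model of the two-word representation. The field names `ptr` and `adj` come from the text; the struct `PtrMemFunc` and the helpers `is_null` and `equal` are illustrative assumptions, not the wording of the ABI document.

```cpp
#include <cstddef>

// Model of the two-word pointer-to-member-function representation.
struct PtrMemFunc {
    std::ptrdiff_t ptr;  // 0 means NULL
    std::ptrdiff_t adj;  // meaningless when ptr == 0
};

inline bool is_null(const PtrMemFunc& p) { return p.ptr == 0; }

// Two NULL pointers compare equal whatever their adj fields hold;
// otherwise both words must match.
inline bool equal(const PtrMemFunc& a, const PtrMemFunc& b) {
    if (a.ptr != b.ptr) return false;
    return a.ptr == 0 || a.adj == b.adj;
}
```

A plain word-by-word comparison would wrongly report two NULL pointers with different `adj` values as unequal, which is why the extra test on `ptr` is needed.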
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-27 | NULL pointers to data members | data | closed | CodeSourcery | 000222 | 000302 |
Summary:
How are NULL pointers to member data represented?
Resolution:
A NULL pointer is represented by the value -1.
But, therefore, converting a non-NULL value to NULL is explicitly
permitted by the standard.
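Offset 0 is a legitimate value for a pointer to the first data member, so it cannot double as NULL. The sketch below inspects the stored representation on an implementation assumed to follow this ABI. The helper `raw_value` and the struct `S` are hypothetical, and the `static_assert` deliberately rejects compilers that use a different layout.

```cpp
#include <cstddef>
#include <cstring>

struct S { int a; int b; };

// Copy out the stored representation of a pointer to data member.
// Assumes an implementation following this ABI, where such a pointer is a
// single ptrdiff_t-sized offset; the static_assert rejects other layouts.
std::ptrdiff_t raw_value(int S::* p) {
    static_assert(sizeof(p) == sizeof(std::ptrdiff_t),
                  "expected an offset-sized pointer to data member");
    std::ptrdiff_t r;
    std::memcpy(&r, &p, sizeof r);
    return r;
}
```

On such an implementation `raw_value(&S::a)` yields the offset of `a`, which is zero, while a null pointer to member yields -1.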
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-28 | RTTI equality testing | data | closed | CodeSourcery | 000406 | 000504 |
Summary:
Can we get back the ability to do a simple test for RTTI equality?
Resolution:
Mangle the name NTBS for std::type_info separately,
emit it in its own COMDAT,
and use it instead of the RTTI struct,
at least if the incomplete flags are set in pointer types.
Proposal A
class abi::__type_info
{
std::type_info const *type; // pointer to typeid(foo) object.
virtual ~__type_info ();
... other implementation defined member functions
};
class abi::__function_type_info
: public abi::__type_info
{
virtual ~__function_type_info ();
... other implementation defined member functions
};
class abi::__pointer_type_info
: public abi::__type_info
{
abi::__type_info const *target; // target type of the pointer
unsigned flags; // flags, as currently specified
virtual ~__pointer_type_info ();
... other implementation defined member functions
};
class abi::__pointer_to_member_type_info
: public abi::__pointer_type_info
{
abi::__class_type_info const *klass; // class of the member
virtual ~__pointer_to_member_type_info ();
... other implementation defined member functions
};
class abi::__class_type_info
: public abi::__type_info
{
... as currently defined
};
Proposal B
bool type_info::operator== (type_info const &other) throw ()
{
return name == other.name;
}
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-29 | RTTI pointer-to-member | data | closed | CodeSourcery | 000407 | 000504 |
Summary:
Derive __pointer_to_member_type_info from __pointer_type_info.
Resolution:
Derive __pointer_to_member_type_info and __pointer_type_info from
a common base class __pbase_type_info.
Add a new flag to __pbase_type_info indicating that the class of a
pointer-to-member is incomplete
(propagated up a chain of pointers).
The abi::__pointer_to_member_type_info type adds one field to
abi::__pointer_type_info:
incomplete_mask = 0x8
incomplete_chain_mask = 0x10
incomplete_klass_mask = 0x20
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-30 | RTTI portability | data | closed | HUB | 001012 | 001109 |
Summary:
What must be specified to produce RTTI portability?
Are member layouts specified? Names? Virtual functions?
Resolution:
Data members of the ABI-defined type_info derived classes must be
allocated as specified, and their names are normative.
Virtual functions, beyond the Standard-specified destructor,
are implementation-specific,
and may not be referenced outside the compiler and system vendors'
runtime libraries.
std::type_info:
class __cxa_aux_typeinfo {
... (*__is_function_p) (...);
...
};
class std::type_info {
...
protected:
__cxa_aux_typeinfo *__aux;
type_info (void) { /* set up __aux */ };
};
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
A-31 | Overlaying tail padding | data | closed | CodeSourcery | 001019 | 001109 |
Summary:
Should we change the decision to overlay tail padding in class layout?
For volatile members? In general?
Resolution:
The overlaying of tail padding is eliminated,
but we will retain the treatment of empty bases.
struct A { short s; char c; };
struct B { A a; char d; };
struct S { short sh; char ch; };
struct T { S s; volatile char d; };
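One way to read the two struct pairs above is as cases where a trailing char member could have been placed in the tail padding of the preceding member. The following sketch, which repeats the declarations so that it stands alone, checks that this does not happen. The assertions themselves are an illustration, not text from the ABI.

```cpp
#include <cstddef>

struct A { short s; char c; };   // typically one byte of tail padding
struct B { A a; char d; };

struct S { short sh; char ch; };
struct T { S s; volatile char d; };

// With overlaying eliminated, `d` starts at or after the end of the
// preceding member, never inside its tail padding.
static_assert(offsetof(B, d) >= sizeof(A), "B::d must not overlay A's padding");
static_assert(offsetof(T, d) >= sizeof(S), "T::d must not overlay S's padding");
```

For the volatile case the benefit is easy to see: a store to `s` as a whole can never touch the bytes of `d`.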
B. Virtual Function Handling Issues
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
B-1 | Adjustment of "this" pointer (e.g. thunks) | data call | closed | SGI | 990520 | 991202 |
Summary:
There are several methods for adjusting the this pointer
for a member function call,
including thunks or offsets located in the vtable.
We need to agree on the mechanism used,
and on the location of offsets, if any are needed.
To maximize performance on IA64,
a slightly unusual approach such as using secondary entry points
to perform the adjustment may actually prove interesting.
Resolution:
See the writeup in the Draft C++ ABI for IA-64.
Open Issues Relevant To This Discussion
1. Scope and "State of the Art"
2. Proposal and Rationale
3. New Calling Convention
struct A { virtual void f(); };
struct B { virtual void g(); };
struct C: A, B { };
struct D : C { virtual void f(); virtual void g(); };
struct E: Other, C { virtual void f(); virtual void g(); };
struct F: D, E { virtual void f(); };
void call_Cf(C *c) { c->f(); }
void call_Cg(C *c) { c->g(); }
void call_Df(D* d) { d->f(); }
void call_Dg(D* d) { d->g(); }
void call_Ef(E* e) { e->f(); }
void call_Eg(E* e) { e->g(); }
void call_Ff(F *ff) { ff->f(); }
void call_Fg(F *ff) { ff->g(); } // Invalid: ambiguous
4. Cases where adjustment is performed
5. Comparing the code trails
// Compute the address of the vptr in the object,
// from the this pointer
// Optional, since vptroffset is often 0.
// This also adjusts to the class of the final overrider
addi Rthis=vptroffset_of_final_overrider,Rthis
;;
// Load the vptr in a register
ld8 Rvptr=[Rthis]
;;
// Add the offset to get to the function descriptor pointer
// in the vtable. Never zero, this instruction is always generated
addi Rfndescr=fndescroffset,Rvptr
;;
// (Assuming inlined stub) Load the function address and new GP
ld8 Rfnaddr=[Rfndescr],8
;;
// Load the new GP
ld8 GP=[Rfndescr]
mov BRn=Rfnaddr
;;
// Perform the actual branch to the target
// ...
// ... Branch misprediction almost always, followed by
// ... I-Cache miss almost always if jumping to a thunk
br.call B0=BRn
#if OLD_ADJUST
thunk_A::f_from_a_B:
// If 'adjustment_from_B_to_A' is the 'adjustment_to_A' above,
// then in the new case, the vtable directly points to A::f
addi Rthis,adjustment_from_B_to_A
// In most cases, we can probably generate a PC-relative branch here
// It is unclear whether we would correctly predict that branch
// (since it is assumed that we arrive here immediately following
// a misprediction at call site)
br A::f
#endif // OLD_ADJUST
// This occurs less often than OLD_ADJUST
// (it does not happen when call-site adjustment is correct)
#if NEW_ADJUST
adjusting_entry_A::f:
// Can't be executed in less than 3 cycles?
addi Rvptr=class_adjustment_offset,Rvptr
;;
// This loads data which is close to the fn descriptor,
// so it's likely to be in the D-cache
ld8 Rvptr=[Rvptr]
;;
add Rthis=Rthis,Rvptr
#endif
A::f:
alloc ...
Final virtual calling convention:
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
B-3 | Allowed caching of vtable contents | call | closed | HP | 990603 | 990805 |
Summary:
The contents of the vtable can sometimes be modified,
but the consensus is that it is nonetheless always allowed to "cache" elements,
i.e. to retain them in registers and reuse them,
whenever it is really useful.
However, this may sometimes break "beyond the standard" code,
such as code loading a shared library that replaces a virtual function.
Can we all agree when caching is allowed?
Resolution:
Caching is allowed.
> The ia64 C++ ABI committee has decided to use the descriptors.
> If this doesn't make sense (i.e. if there's no way to express
> such a thing to the assembler), now's the time to let us know...:)
You mean you want the vtable to look like
struct { void *code, *gp } vtable[];
There are no suitable IA-64 relocations to express this.
the vtable is emitted only in the TU that contains the definition of h().
struct X {
void a();
virtual void f() { return; }
virtual void g() = 0;
virtual void h();
virtual void i();
};
int f() { return 1; }
int x = f(); // Static (COMDAT) or Dynamic (COMMON) initialization?
C++ ABI: COMDAT Proposal
Revisions
Introduction
Proposal
SHF_GROUP: Group Member Sections
A section which is part of a group,
and is to be retained or discarded with the group as a whole,
is identified by a new section header attribute:
SHT_GROUP: Section Group Definition
Field | Value |
---|---|
name | unspecified |
sh_type | SHT_GROUP |
sh_link | .symtab section index |
sh_info | symbol index |
sh_flags | none |
sh_entsize | size of section indices (4) |
requirements | may not be stripped |
sh_link field identifies a symbol table section,
and its sh_info field the index of a symbol in that section.
The name of that symbol is treated as the identifier of the section group.
sh_size value is sh_entsize times
one plus the number of sections in the group.
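As a rough illustration of the size rule above, the following sketch computes the expected sh_size of a group section. The helper name group_section_size and the default 4-byte entry size are assumptions made for this example; they are not part of the proposal.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: size in bytes of an SHT_GROUP section whose group
// contains `member_count` sections. Per the text, sh_size is sh_entsize
// times one plus the number of sections in the group.
constexpr std::size_t group_section_size(
        std::size_t member_count,
        std::size_t sh_entsize = sizeof(std::uint32_t)) {
    return sh_entsize * (1 + member_count);
}
```

For example, a group of three sections with 4-byte indices would occupy 16 bytes under this rule.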
Requirements
Questions
gABI: Section Indices
Revisions and Status
Background
Proposed gABI Changes
General Approach
4.1 Elf Header
ElfXX_Half e_shnum;
e_shentsize and e_shnum gives the section header table's size in bytes.
If a file has no section header table, e_shnum holds the value zero.
SHN_LORESERVE (0xff00), this member has the value SHN_XINDEX (0xffff),
and the actual number of section header table entries is in the member
sh_size of the section header at index 0.
ElfXX_Half e_shstrndx;
SHN_UNDEF.
See ``Sections'' and ``String Table'' below for more information.
SHN_LORESERVE (0xff00), this member has the value SHN_XINDEX (0xffff),
and the actual index of the section name string table is in the member
sh_link of the section header at index 0.
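A reader of such a file might apply the escape rule described in this proposal roughly as follows. This is only a sketch: the ElfHeader and SectionHeader structs are cut-down stand-ins for the real ElfXX types, and the function names section_count and section_name_table_index are invented for the example.

```cpp
#include <cstdint>

// Cut-down stand-ins for the ELF header and a section header; only the
// fields used below are present (assumption: real code uses the platform's
// ElfXX_Ehdr and ElfXX_Shdr definitions).
struct ElfHeader {
    std::uint16_t e_shnum;
    std::uint16_t e_shstrndx;
};
struct SectionHeader {
    std::uint32_t sh_link;
    std::uint64_t sh_size;
};

constexpr std::uint16_t SHN_XINDEX = 0xffff;

// Number of section header table entries: taken from sh_size of section
// header 0 when e_shnum holds the escape value.
inline std::uint64_t section_count(const ElfHeader& eh,
                                   const SectionHeader& sh0) {
    return eh.e_shnum == SHN_XINDEX ? sh0.sh_size : eh.e_shnum;
}

// Index of the section name string table: taken from sh_link of section
// header 0 when e_shstrndx holds the escape value.
inline std::uint32_t section_name_table_index(const ElfHeader& eh,
                                              const SectionHeader& sh0) {
    return eh.e_shstrndx == SHN_XINDEX ? sh0.sh_link : eh.e_shstrndx;
}
```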
4.2 Sections
SHN_XINDEX (0xffff)
sh_link and sh_info,
but they are already defined as ElfXX_Word, and require no change.
SHT_SYMTAB_SHNDX (17)
sh_link field of this section contains the index of the
associated SHT_SYMTAB section.
.symtab_shndx
.symtab section.
The section's attributes will include the SHF_ALLOC bit
if the associated .symtab section does;
otherwise, that bit will be off.
.symtab section to its associated .symtab_shndx section,
so we use the sh_link field in the latter to point back.
It is recommended (but not required) that implementations place each
.symtab_shndx section immediately after its associated
.symtab section (in the section header table)
to make it easy for the linker to find.
4.x Symbol Table
st_shndx as follows:
st_shndx
sh_link and sh_info interpretation
table and the related text describe,
section indexes in the range 0xff00 to 0xffff indicate special meanings.
In particular, SHN_XINDEX (0xffff) indicates that the
real index is too large to fit in this field,
and must be found in the associated SHT_SYMTAB_SHNDX table (above).
st_shndx fields in a symbol table section
contain the value SHN_XINDEX (0xffff),
there must be an associated SHT_SYMTAB_SHNDX section,
with a sh_link field containing the index of this
SHT_SYMTAB section.
That section contains an array of 32-bit section indices,
matching the symbol table entries 1-1 in the same order.
Entries corresponding to SHN_XINDEX (0xffff) values of
st_shndx in the symbol table must contain the actual
section header index to be used.
Others should contain either the correct section header index
(i.e. duplicating the value in st_shndx), or zero.
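The lookup described above could be sketched as follows. The Symbol struct is a one-field stand-in for ElfXX_Sym, and the function name symbol_section is hypothetical; the sketch assumes the SHT_SYMTAB_SHNDX contents have already been read into a vector parallel to the symbol table.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint16_t SHN_XINDEX = 0xffff;

// Stand-in for ElfXX_Sym carrying only the field relevant here.
struct Symbol {
    std::uint16_t st_shndx;
};

// Returns the section header index for symbol `i`.
// `shndx` holds the 32-bit entries of the associated SHT_SYMTAB_SHNDX
// section, one per symbol and in the same order as `symtab`.
// Other reserved values (0xff00 to 0xfffe) are returned unchanged; their
// special meanings are left to the caller.
inline std::uint32_t symbol_section(const std::vector<Symbol>& symtab,
                                    const std::vector<std::uint32_t>& shndx,
                                    std::size_t i) {
    const std::uint16_t raw = symtab.at(i).st_shndx;
    return raw == SHN_XINDEX ? shndx.at(i) : raw;
}
```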
Compatibility
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
B-6 | Virtual function table layout | data | closed | SGI | 990520 | 991028 |
Summary:
What is the layout of the Vtable?
Resolution:
See the Draft C++ ABI for IA-64,
abi.html.
Christophe will look at the implications of these observations.
Others should too.
Re: vtable layout, sharing vtable offsets
struct V1 { virtual void f(); virtual void g(); };
struct Other1 { virtual void ignore1(); };
struct X : Other1, virtual V1 { virtual void f(); };
struct Y : Other1, virtual V1 { virtual void g(); };
struct ZZ: X, Y { };
Re: Concatenating vtables
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
B-7 | Objects and Vtables in shared memory | data | closed | HP | 990624 | 990805 |
Summary:
Is it possible to allocate objects in shared memory?
For polymorphic objects, this implies that the Vtable must also be
in shared memory.
Resolution:
No special representation is useful in support of shared memory.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
B-9 | Primary base vtable embedding | data | closed | Cygnus | 000217 | 000302 |
Summary:
Resolve the embedding of the vtable for the primary base class
in the derived class vtable.
Resolution:
Any class with virtual bases shall contain vbase pointers
for all of its virtual bases.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
B-10 | Pure virtual runtime | call | closed | CodeSourcery | 000629 | 000706 |
Summary:
Define a runtime proxy routine for pure virtual functions.
Resolution:
Define such a runtime routine, with implementation-defined behavior.
extern "C" void __cxa_pure_virtual ();
C. Object Construction/Destruction Issues
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-2 | Order of ctors/dtors w.r.t. link | lif ps | closed | HP | 990603 | 000817 |
Summary:
Given that the compiler has identified constructor/destructor calls for
static objects in each relocatable object, in what order should the
static linker combine them in the linked executable object?
(The initialization order determines the finalization order,
as its opposite.)
Resolution:
Accepted method based on IBM's specification.
See Draft C++ ABI for IA-64, Section 3.3.4.
Proposal
Object File Representation
SHT_CXX_PRIORITY_INIT.
Its elements are structs:
typedef struct {
ElfXX_Word pi_pri;
ElfXX_Addr pi_addr;
} ElfXX_Cxx_Priority_Init;
The semantics are that pi_addr is a function pointer,
with an unsigned int priority parameter,
which performs some initialization at priority pi_pri.
Each of these functions will be called with the GP of the
executable object containing the table.
The section header field sh_entsize is 8 for ELF-32,
or 16 for ELF-64.
Runtime Library Support
It will be called with the address of a cnt-element
(sub-)vector of the priority initialization entries,
and will call each of them in order.
It will be called with the GP of the initialization entries.
Linker Processing
DT_INIT, DT_INIT_ARRAY, and __cxx_priority_init.
Priority order is first according to the priority of the task,
and then according to the order of relocatable objects and options
in the link command.
The order of tasks specified by other methods,
relative to SHT_CXX_PRIORITY_INIT tasks of priority zero,
is implementation defined.
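The ordering rule above (priority first, then link-command order) can be pictured with a small sketch. Everything named here is hypothetical: the PriorityInit struct adds a link_order field that the object file format does not contain, and the sketch assumes that numerically lower priorities run first, which the text above does not state.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical in-memory form of one priority initialization entry, tagged
// with the position of its relocatable object in the link command.
struct PriorityInit {
    std::uint32_t pi_pri;       // priority of the task
    std::uint32_t link_order;   // position of the object on the link line
    void (*pi_addr)(unsigned);  // initialization function
};

// Orders entries by priority, then by link-command order.
// Assumption: a numerically lower pi_pri means "earlier".
inline void order_priority_inits(std::vector<PriorityInit>& entries) {
    std::stable_sort(entries.begin(), entries.end(),
                     [](const PriorityInit& a, const PriorityInit& b) {
                         if (a.pi_pri != b.pi_pri) return a.pi_pri < b.pi_pri;
                         return a.link_order < b.link_order;
                     });
}
```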
There are several possible implementations. Two extremes are:
Sorting Sections
SHF_SORT.
If present, the linker is required to sort the elements of the
concatenated sections of the same type,
where the elements are determined by sh_entsize.
The sort is controlled by fields in sh_info:
#define SH_INFO_KEYSIZE(info) ((info) & 0xff)
#define SH_INFO_KEYSTART(info) (((info)>>8) & 0xff)
#define SH_INFO_SORTKIND(info) (((info)>>16) & 0xf)
sh_entsize),
or floating point data.
Also, note that if we don't anticipate using such a general mechanism,
it becomes possible to avoid padding words in the ELF-64 format by
separating the priority and address vectors.
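For reference, the sh_info field layout implied by the SH_INFO_* macros above can be written as constexpr functions. The packing helper make_sort_info is an invented inverse, included only to show how the three fields share one word; it is not part of the proposal.

```cpp
#include <cstdint>

// Function equivalents of the SH_INFO_* macros shown above.
constexpr std::uint32_t sh_info_keysize(std::uint32_t info) {
    return info & 0xff;
}
constexpr std::uint32_t sh_info_keystart(std::uint32_t info) {
    return (info >> 8) & 0xff;
}
constexpr std::uint32_t sh_info_sortkind(std::uint32_t info) {
    return (info >> 16) & 0xf;
}

// Hypothetical inverse: pack key size, key start, and sort kind into one
// sh_info word using the same bit positions.
constexpr std::uint32_t make_sort_info(std::uint32_t keysize,
                                       std::uint32_t keystart,
                                       std::uint32_t sortkind) {
    return (keysize & 0xff) | ((keystart & 0xff) << 8) |
           ((sortkind & 0xf) << 16);
}
```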
I'll address them separately.
A) Linker impact
B) Sorting approach
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-3 | Order of ctors/dtors w.r.t. DSOs | ps | closed | HP | 990603 | 000504 |
Summary:
Given the constructor/destructor calls for each executable object
comprising a program, what is the order of execution between objects?
For constructors, there is not much question:
unless we choose some explicit means of control,
file-scope objects will be initialized by the DT_INIT/DT_INIT_ARRAY
functions in the order determined by the base ABI order rules,
and local objects will be initialized in the order their containing
scopes are entered.
Resolution:
Accept SGI proposal for a simple API which registers destructors and
atexit calls.
Subsequently, accept proposal to eliminate call to __cxa_finalize when
program exits.
Proposal
Runtime Data Structure
Runtime API
int __cxa_atexit ( void (*f)(void *), void *p, dso_handle d );
__cxa_atexit(f,p,d), is intended to cause the call f(p)
when DSO d is unloaded,
before all such termination calls registered before this one.
It returns zero if registration is successful, nonzero on failure.
Should we use exceptions instead?
atexit, they should be registered with NULL parameter and DSO handle, i.e.
__cxa_atexit ( f, NULL, NULL );
atexit implementation, so that C-only DSOs will nevertheless interact with C++
programs in a C++-standard-conforming manner.
No user interface to __cxa_atexit is supported,
so the user is not able to register an atexit function with a
parameter or a home DSO.
__cxa_atexit, the linker should define a hidden symbol __dso_handle,
with a value which is an address in one of the object's segments.
(It doesn't matter what address,
as long as they are different in different DSOs.)
It should also include a call to the following function in the FINI
list (to be executed first):
void __cxa_finalize ( dso_handle d );
__dso_handle.
__cxa_atexit, but they can be safely included in all objects.
__cxa_finalize with NULL parameter.
__cxa_finalize(d) is called,
it should walk the termination function list,
calling each in turn if d matches __dso_handle
for the termination function entry.
If d == NULL, it should call all of them.
Multiple calls to __cxa_finalize should not result in
calling termination function entries multiple times;
the implementation may either remove entries or mark them finished.
__cxa_finalize instead of one,
we could deal with unloading multiple DSOs at once.
However, dlclose closes one at a time,
so I'm not sure the extra complexity is worthwhile.
__cxa_atexit and __cxa_finalize
must both manipulate the same termination function list,
they must be defined in the implementation's C++ runtime library,
rather than in the individual linked objects.
__cxa_atexit is supported.
__cxa_finalize analog
which takes a list of DSOs instead of a single DSO,
and if the program or dynamic linker identifies a set of DSOs to be
unloaded together, run their finalization entries in a single pass
instead of one DSO at a time.
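The registration and finalization behaviour discussed above can be modelled with a short sketch. It is not the runtime's implementation: the names in namespace sketch are invented, dso_handle is assumed to be a plain pointer, the list is not thread-safe, and entries are marked finished rather than removed.

```cpp
#include <cstddef>
#include <new>
#include <vector>

namespace sketch {

using dso_handle = void*;  // assumption: a handle is just an address

struct Termination {
    void (*f)(void*);  // termination function
    void* p;           // its parameter
    dso_handle d;      // home DSO, or nullptr for plain atexit entries
    bool done;         // set once the function has been called
};

// The single list shared by registration and finalization.
inline std::vector<Termination>& termination_list() {
    static std::vector<Termination> list;
    return list;
}

// Analogue of __cxa_atexit: returns zero on success, nonzero on failure.
inline int register_termination(void (*f)(void*), void* p, dso_handle d) {
    try {
        termination_list().push_back(Termination{f, p, d, false});
    } catch (const std::bad_alloc&) {
        return -1;
    }
    return 0;
}

// Analogue of __cxa_finalize: walks the list from the most recently
// registered entry backwards, calling entries whose DSO matches `d`
// (or all entries when `d` is null). Finished entries are never re-run.
// Entries registered while the walk is in progress are not visited by it.
inline void finalize(dso_handle d) {
    std::vector<Termination>& list = termination_list();
    for (std::size_t i = list.size(); i-- > 0;) {
        // Copy the entry first: the callee may register more functions,
        // which could reallocate the vector.
        const Termination t = list[i];
        if (t.done || (d != nullptr && t.d != d)) continue;
        list[i].done = true;
        t.f(t.p);
    }
}

}  // namespace sketch
```

Marking entries as done, rather than erasing them, keeps indices stable while the walk is in progress, which is one of the two options the text allows.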
================================
===== filename="cxa_atexit.c"
================================
/* Copyright (C) 1999 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public License as
published by the Free Software Foundation; either version 2 of the
License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Library General Public License for more details.
You should have received a copy of the GNU Library General Public
License along with the GNU C Library; see the file COPYING.LIB. If not,
write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA. */
#include
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-4 | Construction vtables | call | closed | Cygnus | 990603 | 000504 |
Summary:
When calling a virtual function from the constructor/destructor of a
base subobject,
the version specific to the base type is required,
unlike the typical case when calling such a vfunc for the full object
from some other context.
Since the pointer for that vfunc in the subobject's sub-vtable
of the full object's vtable is the full object version,
some other means is required for accessing the correct vfunc.
Resolution:
Accept Compaq proposal as currently documented in the
Draft C++ ABI for IA-64.
is called.
Since only the most-derived object calls delete(),
and only the most-derived object does destruction for virtual bases,
only three of the possible combinations arise:
The first will be called for a base object when a derived object is
being destroyed,
and one of the other two for the most-derived object.
Therefore, for any particular vtable,
no more than two will be required.
A vtable for a class with virtual destructors will contain two
destructor entries, delete and no-delete,
and they will both be the in-charge versions for the most-derived class
in the structure.
The no-delete, not in-charge destructors may be called from those,
but always directly,
so a global name is required but no vtable entry.
No delete(), not in-charge.
No delete(), in-charge.
delete(), in-charge.
this parameter is required,
and the standard calling conventions are used.
The only special treatment of virtual destructors is the pair of vtable
entries described above.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-7 | Passing value parameters by reference | call | closed | All | 990624 | 990805 |
Summary:
It may be desirable in some cases where a type has a non-trivial
copy constructor to pass value parameters of that type by performing
the copy at the call site and passing a reference.
Resolution:
Whenever a class type has a non-trivial copy constructor,
pass value parameters of that type by performing
the copy at the call site and passing a reference.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-8 | Returning classes with non-trivial copy constructors | call | closed | All | 990625 | 990722 |
Summary:
How do we return classes with non-trivial copy constructors?
Resolution:
The caller allocates space,
and passes a pointer as an implicit first parameter
(prior to the implicit this parameter).
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-9 | Passing parameters with ctors/dtors | call | closed | All | 991028 | 991104 |
Summary:
Where do allocation, construction, destruction, and deallocation occur
for value parameters?
Resolution:
See the description in the
closed issues list.
this in memory.)
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-11 | Array constructors/destructors | call | closed | Cygnus | 000130 | 000309 |
Summary:
How are constructors/destructors run for arrays?
Many compilers use a __vec_new function;
g++ doesn't, to allow for inlining of constructors.
Resolution:
Define standard library entries for array construction/destruction.
See the Draft C++ ABI for IA-64.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-12 | Constructor return values | call | closed | Cygnus | 000130 | 000309 |
Summary:
What is the return value of a constructor?
Void, this, ...?
Resolution:
Void.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-13 | Allocating constructors | call | closed | HP | 000309 | 000803 |
Summary:
Should we define allocating constructors?
Resolution:
Their use is optional.
Their name mangling is specified.
If used, they must be emitted everywhere referenced as a COMDAT group
(Draft ABI section 5.2.5).
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-14 | Local-scope dynamic constructors | data | closed | all | 000309 | 000511 |
Summary:
The Standard requires that local static objects with dynamic
constructors be initialized exactly once,
the first time the containing scope is entered.
This requires a data object to serve as a guard variable;
define its content or interface.
Resolution:
The size of the guard variable is 64 bits.
The low-order byte shall contain a boolean initialization flag.
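A minimal sketch of how compiled code might use such a guard follows. It models only the layout stated in the resolution (64 bits, flag in the low-order byte); the names guard_t, guard_is_initialized, and init_once are invented here, and the sketch makes no attempt at thread safety.

```cpp
#include <cstdint>

// 64-bit guard variable; only the low-order byte is given a meaning here.
using guard_t = std::uint64_t;

inline bool guard_is_initialized(const guard_t& g) {
    return (g & 0xff) != 0;
}

inline void guard_set_initialized(guard_t& g) {
    g |= 1;
}

// Runs `init` the first time control passes through, and never again.
// Assumption: single-threaded use; a real runtime needs more than this.
template <class Init>
void init_once(guard_t& g, Init init) {
    if (!guard_is_initialized(g)) {
        init();
        guard_set_initialized(g);
    }
}
```

Setting the flag only after init returns means that an initializer which exits by throwing leaves the guard clear, so initialization is attempted again on the next entry.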
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-15 | Alternate array allocators | call | closed | CodeSourcery | 000628 | 000720 |
Summary:
Allow alternate allocators/deallocators to
__cxa_vec_new and __cxa_vec_delete.
Resolution:
Add two new allocators, and two new deallocators,
with one of each pair using a simple user deallocator
and one using a user deallocator requiring a size.
See the Draft C++ ABI for IA-64.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-16 | Copy constructor runtime | call | closed | CodeSourcery | 000628 | 000720 |
Summary:
Define a runtime support routine for copy constructors.
Resolution:
Add a new runtime for vector copy construction.
See the Draft C++ ABI for IA-64.
extern "C" void
__cxa_vec_cctor (void *dest_array,
void *src_array,
size_t element_count,
size_t element_size,
void (*constructor) (void *, void *),
void (*destructor) (void *))
{
size_t ix = 0;
char *dest_ptr = static_cast
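The listing above is cut off in this copy. As a stand-in, here is a self-contained sketch of what a vector copy-construction routine with this parameter list might do. The name vec_copy_construct is hypothetical, the cleanup policy (destroy already-built elements in reverse, then rethrow) is an assumption about the intended behaviour, and a null constructor is simply treated as "nothing to do".

```cpp
#include <cstddef>

// Hypothetical counterpart of the routine declared above; not the ABI text.
inline void vec_copy_construct(void* dest_array,
                               void* src_array,
                               std::size_t element_count,
                               std::size_t element_size,
                               void (*constructor)(void*, void*),
                               void (*destructor)(void*)) {
    if (constructor == nullptr) return;
    char* dest = static_cast<char*>(dest_array);
    char* src = static_cast<char*>(src_array);
    std::size_t ix = 0;
    try {
        // Copy-construct each destination element from its source element.
        for (; ix != element_count; ++ix)
            constructor(dest + ix * element_size, src + ix * element_size);
    } catch (...) {
        // Destroy the elements already constructed, in reverse order,
        // then let the exception continue to propagate.
        if (destructor != nullptr)
            while (ix-- > 0) destructor(dest + ix * element_size);
        throw;
    }
}
```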
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-18 | Result buffers | call | closed | SGI | 000724 | 000817 |
Summary:
Should buffers for results with non-trivial copy constructors be passed
as a dummy first parameter, or in r8 as specified by the psABI for long
structured results?
Resolution:
All results with non-trivial copy constructors or destructors will be
returned in buffers allocated by the caller,
with their addresses passed as an implicit first parameter.
Other structure results too large for the return registers are
returned in a buffer created by the caller,
with the buffer address passed in r8.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
C-19 | NULL ctor/dtor API parameters | call | closed | CodeSourcery | 000806 | 000831 |
Summary:
Allow NULL constructor/destructor parameters wherever it makes sense
in the Section 3.3 APIs.
Resolution:
Accepted as proposed.
D. Exception Handling Issues
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-0 | Exception handling framework | lib ps | closed | SGI | 990520 | 991216 |
Summary:
Define the general framework for exception handling,
including Level I (psABI unwinding API)
and Level II (C++ ABI exception handling API).
Resolution:
See the HP proposal,
accepted as a working paper,
and discussions in the closed issues page.
Resume propagation of an existing exception.
[Note: _ResumeUnwind should not be used to implement rethrowing.
To the unwinding runtime, the catch code that rethrows was a handler,
and the previous unwinding session was terminated before entering it.]
[Note: Compared to HP runtime, the exception class and frame handle
arguments have been removed.
They also need no longer be passed to the landing pads.
Instead, the unwinder will store the information in one of its 2 reserved words.]
If a given runtime resumes normal execution after catching a foreign exception,
it will not know how to delete that exception.
This exception will be deleted by calling _DeleteException,
which in turn will delegate the task to the original personality routine
(see EH_DELETE_EXCEPTION_OBJECT below).
uint64 _Unwind_getIP(void *context);
void _Unwind_setGR(void *context, int index, uint64 new_value);
void _Unwind_setIP(void *context, uint64 new_value);
Get or set registers from the given unwinder context.
The 'context' argument is the same argument passed to the personality routine
(see below).
[Note: Minor changes compared to the existing unwinding interface,
mostly to hide the register classes]
Get the address of the language-specific data area for the current stack frame.
The 'context' argument is the same argument passed to the personality routine.
[Note: This is not strictly required: it could be accessed through getIP
using the documented format of the UnwindInfoBlock,
but since this work has been done for finding the personality routine
in the first place, it makes sense to cache the result in the context,
as we currently do]
Get the address of the beginning of the current procedure or region of code.
[Note: This is required for us because we store data relative to the
beginning of the code. So let's make it mandatory ;-]
int PersonalityRoutine
(int version,
int phase,
UInt64 exceptionClass,
void *exceptionObject,
void *context);
setjmp in the document should be to longjmp.
terminate() before starting Phase 2.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-2 | Unwind personality routines | lib ps | closed | SGI | 990520 | 000106 |
Summary:
The IA-64 runtime conventions provide for a personality routine
pointer for language-specific actions when unwinding the stack.
They do not specify its interface.
There are typically two required actions for C++:
locating a handler (non-destructively)
and destroying automatic objects while unwinding.
This issue involves specification of the API (see also D-3).
Resolution:
See the exception handling specification,
level 1, and the working paper.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-3 | Unwind process clarification | lib ps | closed | SGI | 990520 | 000106 |
Summary:
The IA-64 runtime conventions provide for a personality routine
pointer for language-specific actions when unwinding the stack.
However, they are quite muddy about the precise sequence of calls.
This issue involves specification of unwind process (see also D-2).
Resolution:
See the exception handling specification.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-8 | Interaction with threads packages | lib ps | closed | SGI | 990603 | 000106 |
Summary:
What happens when an exception is not caught in the thread where raised?
What does uncaught_exception() return if another thread is currently
processing an exception?
Resolution:
With one exception, exception handling is entirely per-thread --
exceptions must be caught in the thread where raised,
and queries about them (e.g. uncaught_exception())
are answered only with respect to the thread doing the query.
The only global exception behavior is handler registration --
see issue D-15.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-9 | longjmp interaction | lib ps | closed | IBM | 990908 | 000113 |
Summary:
Does longjmp run destructors?
Resolution:
Define an alternate routine, longjmp_unwind in namespace abi,
defined in new header cxxabi.h,
which always does full cleanup during unwinding.
longjmp_unwind for the alternate longjmp
that always does full C++ unwinding.
The issue of where to put it (namespace and header) remains.
cxxabi.h:
namespace abi {
extern "C" void longjmp_unwind (jmp_buf env, int val);
}
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-10 | psABI proposal | lib ps | closed | all | 991216 | 000120 |
Summary:
Solidify the Level I (psABI) specification and submit it to the base
ABI group.
Resolution:
See the exception handling specification.
terminate() before Phase 2.
Issues D-11 through D-14 below are also relevant to the Level I
specification.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-11 | pthreads interface | lib ps | closed | all | 991216 | 000203 |
Summary:
Certain pthreads functionality is a prerequisite,
e.g. to acquire thread-local storage.
The ABI should specify the requirements,
along with the expected stub behavior when
the pthreads library is not present.
Resolution:
No specification necessary.
This is Level 3 material.
pthreads package,
with multi-threading semantics.
However, it is expected that an implementation will provide default
versions in the C++ (or C) library for single-threading programs,
and override them in the thread library for multi-threading cases.
Once-only Initialization
typedef ... pthread_once_t;
pthread_once_t once_control = PTHREAD_ONCE_INIT;
int pthread_once ( pthread_once_t *once_control,
void (*init_routine) (void) );
pthread_once routine is to execute
a particular initialization routine exactly once in a thread-safe manner.
The user declares a control variable of type pthread_once_t
statically initialized to PTHREAD_ONCE_INIT,
and passes it to the pthread_once routine.
pthread_once is called with a given once_control argument,
it calls init_routine with no argument
and changes the value of the once_control
variable to record that initialization has been performed.
Subsequent calls to pthread_once with the
same once_control argument do nothing.
pthread_once always returns 0.
once_control,
whereas overriding versions in multi-threading libraries
presumably will.
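A default single-threaded version along the lines described above might look like the following sketch. The stub_ names are invented so that nothing here clashes with the real pthread declarations, and the control variable is assumed to be a plain int; an actual library would define the pthread_once symbol itself.

```cpp
// Hypothetical single-threaded stand-ins for pthread_once_t,
// PTHREAD_ONCE_INIT, and pthread_once.
using stub_once_t = int;
constexpr stub_once_t STUB_ONCE_INIT = 0;

// Calls init_routine the first time it sees a given control variable,
// records that fact in the variable, and does nothing on later calls.
// No locking is performed, which is only acceptable without threads.
inline int stub_once(stub_once_t* once_control, void (*init_routine)()) {
    if (*once_control == STUB_ONCE_INIT) {
        init_routine();
        *once_control = 1;
    }
    return 0;  // always returns 0, as described above
}
```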
Thread-Private Data Key Management
typedef ... pthread_key_t;
int pthread_key_create ( pthread_key_t *key,
void (*destr_function) (void *) );
int pthread_setspecific ( pthread_key_t key,
const void *pointer );
void * pthread_getspecific ( pthread_key_t key );
pthread_key_t.
It then obtains an identifying key value from the implementation
by calling pthread_key_create,
also specifying at that time a destructor routine that will be called
if a thread terminates,
with a single argument that is the value associated with the key
for the terminating thread.
This destructor call is only made if the associated value is not NULL,
and it is set to NULL before making the call.
pthread_key_create returns zero,
places the value of the key identifier in *key, and
initializes the value associated with the key to NULL for all threads.
If unsuccessful, e.g. exceeding the number of allocated keys,
it returns an error code.
pthread_setspecific.
If successful, pthread_setspecific returns zero.
If unsuccessful, e.g. because of an invalid key identifier,
it returns an error code.
pthread_getspecific,
which returns the value associated with key on success,
and NULL on error.
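For a single-threaded program, the key-management calls described above reduce to a small table with one value per key. The sketch below shows that reduction under stated assumptions: every name in namespace stub is invented, the table size is arbitrary, the error codes are one plausible choice, and the destructor call at thread termination is recorded but never made.

```cpp
#include <array>
#include <cerrno>
#include <cstddef>

namespace stub {

using key_type = std::size_t;
constexpr std::size_t kMaxKeys = 8;  // arbitrary limit for the sketch

struct Slot {
    bool used;
    void* value;
    void (*destr)(void*);  // stored only; this sketch never calls it
};

inline std::array<Slot, kMaxKeys>& slots() {
    static std::array<Slot, kMaxKeys> table{};  // zero-initialized
    return table;
}

// Analogue of pthread_key_create: zero on success, error code otherwise.
inline int key_create(key_type* key, void (*destr_function)(void*)) {
    std::array<Slot, kMaxKeys>& t = slots();
    for (std::size_t i = 0; i < t.size(); ++i) {
        if (!t[i].used) {
            t[i] = Slot{true, nullptr, destr_function};
            *key = i;
            return 0;
        }
    }
    return EAGAIN;  // no free keys left
}

// Analogue of pthread_setspecific: zero on success, error code otherwise.
inline int set_specific(key_type key, const void* pointer) {
    std::array<Slot, kMaxKeys>& t = slots();
    if (key >= t.size() || !t[key].used) return EINVAL;
    t[key].value = const_cast<void*>(pointer);
    return 0;
}

// Analogue of pthread_getspecific: the stored value, or null on error.
inline void* get_specific(key_type key) {
    std::array<Slot, kMaxKeys>& t = slots();
    return (key < t.size() && t[key].used) ? t[key].value : nullptr;
}

}  // namespace stub
```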
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-12 | Table location | lib ps | closed | all | 991216 | 000504 |
Summary:
Determine constraints on the location of the unwind table
and the unwind information table.
Resolution:
The unwind tables must reside in the text segment they describe.
SHF_ALLOC (but not writable),
as well as a segment type,
but does not specify the unwind information table section information.
dlmodinfo() entry point locates
the load module containing that text segment,
and returns a struct load_module_desc,
which contains, among other things,
a pointer to the unwind table for that load module.
dlmodinfo() is already standard.
typedef __personality_routine
(*_Unwind_IPLookupFn) (uint64 IP, void **pImplementationData);
int _Unwind_RegisterIPLookup
(_Unwind_IPLookupFn LookupFn, uint64 StartAddr, uint64 EndAddr);
void _Unwind_UnregisterIPLookup (_Unwind_IPLookupFn LookupFn);
typedef _Unwind_Reason_Code(*__personality_routine)
( int version,
_Unwind_Action actions,
uint64 exceptionClass,
_Unwind_Exception *exceptionObject,
_Unwind_Context *context,
void *ImplementationData );
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-13 | _Unwind_ForcedUnwind | lib ps | closed | all | 991216 | 000120 |
Summary:
Define the interface of _Unwind_ForcedUnwind.
Resolution:
See the exception handling specification.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-14 | __cxa_begin/end_catch | lib | closed | all | 991216 | 001109 |
Summary:
Define the interfaces of __cxa_begin_catch and __cxa_end_catch.
Resolution:
See the exception handling specification.
__cxa_begin_catch and __cxa_end_catch
identify the thrown exception.
struct X {
X(); ~X(); };
struct Y {
Y(); ~Y(); };
extern "C" int printf(const char *,...);
int main()
{
try {
throw X();
} catch (X x) {
try {
throw Y();
} catch(...) {
//generates __cxa_end_catch(/*levels=*/2)
return 1;
}
}
}
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-15 | Terminate handler and threads | lib ps | closed | all | 991216 | 000106 |
Summary:
Define how the terminate and unexpected handler registration
interacts with threads.
Resolution:
Handler registration applies to all threads.
terminate() and unexpected() handlers,
but does not specify how the registration interacts with threading.
There are (at least) three possibilities:
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-16 | Exception specifications | lib ps | closed | all | 991216 | 000113 |
Summary:
How is the type list for an exception specification
represented in the action records?
Resolution:
As specified in the HP document
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-17 | bad_cast, bad_typeid runtime | call | closed | CodeSourcery | 000629 | 000706 |
Summary:
Define runtime support routines for throwing bad_cast and bad_typeid
exceptions.
Resolution:
Accepted as proposed originally.
See draft EH Specification.
extern "C" void __cxa_bad_cast ();
extern "C" void __cxa_bad_typeid ();
Of course these never actually return, but it causes least
confusion at the calling point by keeping the type system consistent.
These are called with something like the following pseudo C++
for dynamic_cast
extern "C" void *__throw_bad_cast ();
extern "C" std::type_info const &__throw_bad_typeid ();
for typeid (*ptr):
(void *tmp = __dynamic_cast (...),
*(T*)(tmp ? tmp : __throw_bad_cast ()))
(ptr ? *(type_info const *)ptr->vtable[-1] : __throw_bad_typeid ())
That typeid signature will mean a little reworking of the typeid
operator implementation for G++,
but not too much.
For implementations where Mark's suggestion is valid,
these will be too, but not vice-versa.
extern "C" void *__cxa_bad_cast ();
extern "C" const void *__cxa_bad_typeid ();
or whatever, in the compiler, to make the arms of the conditional have
the right type.
(__cxa_bad_cast (), (void*) NULL)
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
D-18 | __cxa_throw_type_info | lib | closed | all | 001012 | 001109 |
Summary:
Should we replace the __cxa_throw_type_info pointer in the exception
object by a pair of pointers to a std::type_info and a destructor?
Resolution:
Make the replacement.
See Sections 2.2.1 and 2.4.3 of the
draft EH Specification.
__cxa_throw construct one so that the exception object can point to it.
This can't be done on the stack,
since it's about to be unwound,
and doing it on the heap when the
exception might be out-of-memory doesn't seem ideal.
__cxa_throw_type_info pointer
in the exception object header by separate
std::type_info and destructor pointers,
and pass them as two parameters to __cxa_throw.
E. Template Instantiation Model Issues
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
E-1 | When does instantiation occur? | tools | closed | SGI | 990520 | 000511 |
Summary:
There are two principal models for instantiation.
The early instantiation (or Borland) model performs all
instantiation at compile time,
potentially resulting in extra copies which are removed at link time.
The pre-link instantiation model identifies the required
instantiations prior to linking and instantiates them via a special
compile step.
Resolution:
Non-export templates are instantiated where referenced in COMDAT groups.
See the Draft C++ ABI for IA-64.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
E-3 | Template repository | tools | closed | HP | 990603 | 000511 |
Summary:
Independent of the template instantiation model,
we need to make sure that whatever template persistent storage is used
by one vendor does not interact negatively with other vendors' mechanisms.
Issues:
(1) Avoiding conflict on the name of any repository.
(2) If .o files are used,
describe how this information is to be preserved, ignored, etc.
(3) Evaluate if tools such as make, ld, ar, or others, can
break because .o files get written at unexpected times.
Resolution:
COMDAT emission and naming for non-export templates is specified in
the Draft C++ ABI for IA-64.
F. Name Mangling Issues
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
F-1 | Mangling convention | call | closed | SGI | 990520 | 000330 |
Summary:
What rules shall be used for mangling names,
i.e. for encoding the information other than the source-level object
name necessary to resolve overloading?
Resolution:
See the Draft C++ ABI for IA-64.
Objectives of the mangling scheme include:
Entities with linkable names to be resolved include:
Name decomposition for function-like entities:
The encoding of each of these templates instantiated for template
(Combined with the parameter types,
this encodes the type of the function.
Note that even though exception specifications are not
considered part of the function type in the C++ standard,
they actually are.)
Name decomposition for data entities:
In addition, it may be desirable to encode:
a[]
makes this problematic.
Fundamental types and type operators:
std
std::char_traits
static const char __func__[]="function-name";
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
F-2 | Mangled name size | call g | closed | SGI | 990520 | 000511 |
Summary:
Name mangling schemes to date typically produce very long names.
SGI routinely encounters multi-kilobyte names,
and increasing usage of namespaces and templates will make them worse.
This has a negative impact on object file size and on linker speed.
Resolution:
The current mangling solution is considered an adequate solution to
this problem.
51 : 43(18x) 44(10x) 27
52 : 45(30x) 44(7x) 43(8x) 50(6x)
53 : 47 46(12x) 45(18x) 44(8x) 51(2x) 50(8x)
54 : 47(32x) 46(10x) 45(2x) 53 48
55 : 47(19x) 46(16x) 53 41 48(21x)
56 : 47 48(12x)
57 : 55 44 51 50(10x) 48(4x)
58 : 38 50(7x) 56
59 : 47(2x)
60 : 47 38 51(8x) 59
61 : 55
62 : 54 53(16x) 50 65 INCREASED 56
63 : 51(2x)
64 : 63 52(2x)
65 : 54(2x) 44 50(2x) 52
66 : 55(3x) 65
67 : 57 56
68 : 49 11 58(2x) 57(2x) 56
69 : 47(6x) 12(3x) 59 58(3x) 57 55(4x) 50(4x) 9
70 : 13(2x) 60(2x) 51(3x) 56(2x) 48(3x)
71 : 14(4x) 52(3x) 59(2x) 57 56
72 : 15 14 53(2x) 60(2x) 57
73 : 63 62(2x) 58(2x) 54(7x) 53(2x) 15
74 : 59(6x) 55(6x) 54(3x) 69 66 64(2x)
75 : 63 60(3x) 57(2x) 56(4x) 70 18(2x)
76 : 63 62(2x) 61(2x) 58 57(5x) 55(2x) 64
77 : 59(2x) 62(4x) 66
78 : 63(2x) 68 57 66(2x) 60(2x) 64(2x)
79 : 78 61(3x) 62 67(2x) 65(2x)
80 : 63(2x) 62(11x) 69(2x) 66
81 : 63(3x) 62(2x) 61 58(2x) 23(8x) 54 64(2x)
82 : 23(4x) 69 68 26(2x) 64 24
83 : 71 78 69 27 66(4x) 65(2x)
84 : 55(3x) 73 67(4x) 66 65(2x)
85 : 63(2x) 69(4x) 65(2x)
86 : 68(8x)
87 : 65
88 : 70(2x)
89 : 71(2x) 74 73(4x)
90 : 68 75 74(2x) 73 72(2x)
91 : 24 74 73(2x) 64
92 : 77(2x) 76(6x)
93 : 78(2x) 77(4x) 76(2x) 11 41
94 : 79(4x) 77(2x) 80(2x)
95 : 79(2x) 65(2x)
96 : 75(2x) 73
97 : 67 68(4x)
98 : 14 69(4x) 84 83(2x) 56
99 : 15(4x) 45(2x) 60 27(3x) 83 70(4x) 67
100 : 68(3x)
101 : 17 68(4x) 59 82(2x) 19 49
102 : 63(2x) 70 60(2x) 17
103 : 71(2x) 70(2x) 18 64(3x)
104 : 78(2x) 86(8x) 21 89
105 : 86(8x) 85(2x) 67 90 64
106 : 54 24
107 : 91 88(2x)
108 : 87(4x) 92
109 : 87 74 88(2x)
110 : 94(2x) 27(2x) 26 89(4x)
111 : 95(2x) 28 27 73 89(4x)
112 : 29 97(2x)
113 : 98
114 : 31 30(2x) 93(6x)
115 : 31(4x) 33
116 : 95(8x) 101
117 : 95(8x) 103
118 : 97
119 : 36
120 : 31 95
122 : 38(2x)
124 : 74
125 : 109(2x)
126 : 110 42(2x) 52
128 : 72(2x) 77 108(2x) 44(4x) 112
129 : 33 44(5x) 113(2x) 65 73(2x)
130 : 47 110(4x) 45 75 115(2x) 114(2x)
131 : 51 116 115(3x)
132 : 47 74 56 72 53 116 82(3x) 117(3x)
133 : 83 118 117
134 : 119(3x) 118(3x)
135 : 119(2x) 51 120(4x)
136 : 50 121(3x)
137 : 122(5x) 105
138 : 123(4x) 106(2x)
139 : 124(4x)
140 : 125(5x) 65(2x)
141 : 126(2x)
142 : 127 110 44(2x)
143 : 128
146 : 94
148 : 96
149 : 52
150 : 55 60
152 : 70(2x) 122
154 : 56
157 : 68
160 : 70
162 : 55 69
169 : 126
171 : 130
174 : 72(2x)
176 : 74(2x)
178 : 75(2x)
180 : 78(2x)
185 : 61
186 : 71
187 : 83
188 : 71 70
191 : 74
192 : 75(2x) 89(8x)
193 : 89(8x)
194 : 97(2x) 108
196 : 109
197 : 101(2x)
202 : 95(2x) 150
215 : 106 48
218 : 121
220 : 106
226 : 132
228 : 133
232 : 108
234 : 111 109
235 : 139
240 : 116
242 : 117
243 : 119
250 : 145
251 : 143 128
264 : 111
267 : 163
268 : 133
278 : 88
280 : 98 93(2x) 113
282 : 132
283 : 101 116
285 : 151
288 : 130
303 : 143
305 : 144 100
308 : 148
330 : 159
333 : 133
342 : 133 148
347 : 177
355 : 101
530 : 161
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
F-5 | ILP32 vs. LP64 | call | closed | HP | 000210 | 000824 |
Summary:
This ABI focusses on the LP64 data model.
What should we do (if anything) to support
(a) compatibility between different vendors' ILP32 compilers
(b) compatibility between ILP32 and LP64?
Resolution:
Withdrawn -- no action.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
F-6 | Demangling | lib | closed | Cygnus | 000210 | 000504 |
Summary:
Users may sometimes want to get demangled names.
Should we provide an entry point for calling a demangler?
Resolution:
Provide a simple demangler interface callable from C.
See the Draft C++ ABI for IA-64.
namespace abi {
std::string demangle_mangled_name (const char*); // <mangled-name>
std::string demangle_type (const char*); // <type>
}
class TDemangler {
public:
void * operator new(size_t size) {
return (void*)malloc(size);
}
void operator delete(void *deadObject) {
free(deadObject);
}
TDemangler();
TDemangler(const char *mangledDecl);
~TDemangler();
enum Status { OK, Empty, Error, Truncated };
void reset();
Status getStatus() const { return status; }
Status demangleDecl(const char *mangledDecl);
Status demangleType(const char *mangledType);
Status copy(char *result, size_t maxToCopy /*including null*/) const;
Status copy(char *result, size_t maxToCopy /*including null*/,
char *name, size_t nameLength) const;
private:
Status status;
const char *p;
const char *end;
void partial(bool top, bool typeOfExternalDecl = false);
void typeName(size_t &baseOffset, size_t &baseLength);
void templateArgs();
void writePrefix(const char *text, size_t length);
void writeSuffix(const char *text, size_t length);
void writeDuplicate(unsigned offset, unsigned length);
void writeBaseName(const char *baseName, size_t baseNameLength,
size_t classNameOffset, size_t classNameLength);
enum Spacing { Before, None, After };
void writeQualifiers(const char *cv, Spacing spacing);
size_t extractCount();
void demangleDecl();
char *buffer;
size_t bufferSize;
enum { InternalBufferSize = 200 };
char internalBuffer[InternalBufferSize];
size_t nameSize;
size_t prefixSize;
size_t suffixSize;
bool spaceBeforeName;
void makeAvailable(size_t length);
void merge();
static size_t min(size_t a, size_t b) { return a < b ? a : b; }
};
namespace abi {
extern "C"
char* __cxa_demangle ( const char* mangled_name,
char* buf, size_t n,
int* status );
}
namespace abi {
struct dm {
char* name;
enum { buffer_too_small, invalid_name } status;
};
dm demangle(const char* mangled_name, char* buf, size_t n);
}
namespace abi
{
struct demangler
{
// Provide name to demangle
void demangle(char *);
protected:
// Output demangled characters
// I don't know whether it is better to output
// on char or a string... It seems there are
// many cases where the demangler can put
// multiple chars at the same time, but they
// are not zero-terminated (we know the length)
virtual void output(char c);
};
}
#include <cxxabi.h>
#include <iostream>
using namespace std;
void abi::demangler::output(char c)
{
cout << c;
}
namespace abi {
char* __cxa_demangle(const char* mangled_name,
char* buf,
size_t* n,
int* status);
}
mangled_name is a pointer to a null-terminated array of characters.
buf may be null.
If it is non-null, then n must also be non-null,
and buf is a pointer to an array, of at least *n characters,
that was allocated using malloc.
status points to an int that's used as an error indicator.
It is permitted to be null,
in which case the user just doesn't get any detailed error information.
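The parameter rules above can be exercised with a short C++ sketch. It assumes a toolchain whose <cxxabi.h> declares abi::__cxa_demangle with the four-parameter signature shown above (current GCC and Clang runtimes do). The wrapper name sketch_demangle_or_empty is hypothetical and is not part of the interface. Passing a null buf and a null n asks the routine to allocate the result with malloc, so the caller releases it with free.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical convenience wrapper: returns the demangled text, or an empty
// string if the input could not be demangled.
std::string sketch_demangle_or_empty(const char* mangled_name) {
    assert(mangled_name != nullptr);
    int status = 0;
    // buf == nullptr and n == nullptr: the result is malloc-allocated.
    char* text = abi::__cxa_demangle(mangled_name, nullptr, nullptr, &status);
    if (text == nullptr || status != 0) {
        std::free(text);  // free(nullptr) is a no-op
        return std::string();
    }
    std::string result(text);
    std::free(text);
    return result;
}
```

For example, sketch_demangle_or_empty("_ZN3foo3barEv") yields "foo::bar()" on such toolchains, while a malformed name yields an empty string.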
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
F-7 | Mangling statics | call | closed | HP | 000223 | 000504 |
Summary:
What, if anything, should we do about mangling the names of objects in
static functions in case a compiler chooses to inline them?
Resolution:
Local objects are mangled with the name of the containing function
followed by a discriminator,
consisting of the object name and possibly a sequence ID.
Strings are mangled with a discriminator consisting of "s" followed
by a sequence ID.
See the Draft C++ ABI for IA-64.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
F-8 | Identifiers with Unicode letters | call | closed | HU-Berlin | 000323 | 000413 |
Summary:
How should we mangle names containing Unicode letters?
Resolution:
Follow the underlying C ABI.
namespace newmath {
const long double \u03A0 = 3.14159265358979;
}
String table sections hold null-terminated character sequences,
commonly called strings.
GNU Java encodes names in UTF-8 internally.
For the mangled name, if there are non-ASCII characters,
it adds a 'U' to the beginning and encodes each
such UCS-2 character as _%04x. See gcc/java/mangle.c.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
F-9 | Strings with Unicode letters | call | closed | HU-Berlin | 000323 | 000413 |
Summary:
How should we handle the object file representation of narrow and wide
string literals containing Unicode letters?
Resolution:
Follow the underlying C ABI.
wchar_t MvL[]=L"Martin von L\u00F6wis";
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
F-10 | Mangling function return types | call | closed | all | 000330 | 000413 |
Summary:
Should we always mangle the return type of a function?
Resolution:
No. It is mangled only for template instantiations/specializations.
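The effect of this resolution can be observed with a small C++ sketch. It assumes a toolchain that provides abi::__cxa_demangle in <cxxabi.h> and implements the mangling scheme that grew out of this draft (current GCC and Clang do). The helper sketch_demangle and the two constants are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: returns the demangled form, or "" on failure.
std::string sketch_demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && text != nullptr) ? text : "";
    std::free(text);
    return result;
}

// void f(int);
// Ordinary function: only the parameter type 'i' follows the name.
const char* const kSketchPlainFunction = "_Z1fi";

// template<class T> void f(T);  instantiated as f<int>
// Template instance: the return type 'v' precedes the parameter 'T_'.
const char* const kSketchTemplateInstance = "_Z1fIiEvT_";
```

Demangling the first name gives "f(int)" with no return type, while the second gives "void f<int>(int)", since the return type is part of the encoding only for the template instance.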
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
F-11 | Hash for local strings | call | closed | all | 000330 | 000504 |
Summary:
How should we hash strings for local name mangling?
Resolution:
Strings are mangled with a discriminator consisting of "s" followed
by a sequence ID.
See the Draft C++ ABI for IA-64.
G. Miscellaneous Issues
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
G-1 | Basic command line options | tools | closed | HP | 990603 | 000824 |
Summary:
Can we agree on basic command line options (compiler and linker)
for fundamental functionality,
possibly allowing portable makefiles?
Resolution:
Withdrawn -- no action.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
G-2 | Detection of ODR violations | call | closed | Sun | 990603 | 000504 |
Summary:
[Sun]
(See also F-3.)
Resolution:
This is a duplicate.
See F-3, F-4, F-10.
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
G-4 | Dynamic init of local static objects and multithreading | call | closed | SCO | 990607 | 001109 |
Summary:
The Standard requires that local static objects with dynamic
constructors be initialized exactly once, the first time the containing
scope is entered.
Multi-threading renders the simple check of a flag before
initialization inadequate to prevent multiple initialization.
Should the ABI require locking for this purpose,
and if so, what are the necessary interfaces?
In addition to the locking of the initialization,
special exception handling treatment is required to deal with an
exception during construction.
Resolution:
The ABI will specify an 8-byte guard variable,
with one byte used for the initialization flag,
and the others available for use by a threading package for locking.
ABI routines are specified for acquiring and releasing the lock.
See ABI section 3.3.2.
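A minimal C++ sketch of the guard protocol follows, for illustration only. It assumes an 8-byte guard whose first byte is the initialization flag, and it substitutes a single process-wide std::mutex for whatever per-guard locking a threading package would place in the remaining bytes. All names beginning with sketch_ or Sketch are hypothetical stand-ins and are not the routines specified by the ABI.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <new>
#include <thread>
#include <vector>

// Hypothetical 8-byte guard: the first byte is the initialization flag, the
// remaining bytes are left to the threading package (unused in this sketch).
struct SketchGuard {
    std::atomic<std::uint8_t> done{0};
    unsigned char reserved[7] = {};
};
static_assert(sizeof(SketchGuard) == 8, "sketch assumes an 8-byte guard");

// One process-wide lock keeps the sketch short.
static std::mutex sketch_init_mutex;

// Returns 1 if the caller must run the initializer (lock held), else 0.
int sketch_guard_acquire(SketchGuard* g) {
    if (g->done.load(std::memory_order_acquire)) return 0;  // fast path
    sketch_init_mutex.lock();
    if (g->done.load(std::memory_order_relaxed)) {  // another thread finished
        sketch_init_mutex.unlock();
        return 0;
    }
    return 1;
}

// Marks initialization complete and releases the lock.
void sketch_guard_release(SketchGuard* g) {
    assert(g->done.load(std::memory_order_relaxed) == 0);
    g->done.store(1, std::memory_order_release);
    sketch_init_mutex.unlock();
}

// Releases the lock without setting the flag, so a later caller retries.
void sketch_guard_abort(SketchGuard*) { sketch_init_mutex.unlock(); }

static std::atomic<int> sketch_ctor_calls{0};
struct SketchFoo {
    SketchFoo() { ++sketch_ctor_calls; }
};

// Roughly what a compiler might emit for: static SketchFoo a;
SketchFoo& sketch_get() {
    static SketchGuard guard;
    alignas(SketchFoo) static unsigned char storage[sizeof(SketchFoo)];
    if (sketch_guard_acquire(&guard)) {
        try {
            new (storage) SketchFoo();
        } catch (...) {
            sketch_guard_abort(&guard);  // initialization is not complete
            throw;
        }
        sketch_guard_release(&guard);
    }
    return *reinterpret_cast<SketchFoo*>(storage);
}

// Enters the declaration from n threads and reports the constructor count.
int sketch_run_threads(int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) threads.emplace_back([] { sketch_get(); });
    for (auto& t : threads) t.join();
    return sketch_ctor_calls.load();
}
```

The abort path covers the exception case raised in the summary: the flag stays clear, so the next thread to enter the declaration attempts the initialization again.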
thr5.C:
// static local initialization and threads
#include
"Otherwise such an object is initialized the first time control passes
through its declaration; such an object is considered initialized upon
the completion of its initialization.
If the initialization exits by throwing an exception,
the initialization is not complete,
so it will be tried again the next time control enters the declaration.
If control re-enters the declaration (recursively)
while the object is being initialized,
the behavior is undefined."
.again:
    movl $guard,%eax
    testl $1,(%eax)          // test the done bit
    jnz .done                // if set, variable is initialized, done
    lock; btsl $1,(%eax)     // test and set the busy bit
    jc .busy
    < init code >            // not busy, do the initialization
    movl $guard,%eax
    movl $3,(%eax)           // set the done bit
    jmp .done
.busy:
    pushl %eax               // call RTS routine to wait, passing address
    call __static_init_wait  // of guard to monitor
    testl %eax,%eax          // 1 means exception occurred in init code,
    popl %ecx
    jnz .again               // start the whole thing over
.done:                       // 0 means wait finished
done | busy | action |
---|---|---|
0 | 0 | return 1 in %eax (EH wipe-out) |
1 | 1 | return 0 in %eax (no longer busy) |
0 | 1 | continue to wait (still busy) |
1 | 0 | internal error, shouldn't happen |
f(int x) { static foo a(...); ... }
static foo a(...);
f(int x) { ... }
> 6) Static function scope constructor calls which depend on function
> arguments are likely to involve a race condition anyway, if multiple
> instances of the function can be invoked concurrently. Any of the
> calls might determine the constructor parameters. Thus these aren't
> very interesting either. And if they are really needed, they can be
> replaced with a file scope static constructor call plus an assignment.
void build_foursome(string golfer) {
static string group_name(golfer);
// process golfer into group group_name ...
}
extern "C" void __cxa_allocate_static(
bool *flag,
void *object_address,
void (*object_dtor)(void *object));
becomes:
static X x
static bool static_x_flag;
static X x;
if (!static_x_flag)
__cxa_allocate_static(&static_x_flag,
&x, __addressof(X::~X));
__cxa_allocate_static can't work precisely as described,
since the constructor and its arguments are also needed.
Christophe said that the actual sequence is more complex;
he removed too much to simplify the presentation,
and he will attempt to provide a fuller description.
An implementation that chooses to implement its __cxa_atexit list with
elements matching this structure could then simply enqueue the above
structure on the list (without its initial doubleword guard).
An implementation using another structure might need to rearrange the data.
(This ABI would not specify either choice.)
The __cxa_guard_release call above would be re-specified to also
enqueue the object on the destruction list by calling __cxa_atexit or
its equivalent.
struct __cxa_guard {
long long guard; // Guard variable
void *next; // List link for destructor chain
void (*dtor) (void*); // Pointer to destruction routine
void *p; // Pointer to dtor parameter
dso_handle dhandle; // DSO handle for owning DSO
};
guard
member
to indicate that to the release routine?)
H. Library Interface Issues
# | Issue | Class | Status | Source | Opened | Closed |
---|---|---|---|---|---|---|
H-1 | Runtime library DSO name | tools | closed | SGI | 990616 | 000817 |
Summary:
Determine the name of the common C++ runtime library DSO,
e.g. libC.so.
If there are to be vendor-specific support libraries which must coexist
in programs from mixed sources, identify naming convention for them.
Resolution:
The runtime library will be named libcxa.so.
libcxa.so
.