Regarding the ELF COMDAT proposal

Ron 603-884-2088 brender at gemevn.zko.dec.com
Thu Oct 7 14:00:58 UTC 1999


Following are comments from our local object file/linker person...

Ron

================================================================================

Ron, I've looked the proposal over.  What they're doing looks clean and
reasonable.  I have some comments regarding the three questions raised
at the end of the spec:


Do we want flags to specify checking prior to removal of duplicates,
e.g. for identical sections, same defined global symbols, etc.? If so,
should there be one flags word per section index, instead of per group? 
(We don't see a need, but this was suggested in other proposals.) 

    For the purpose of eliminating duplicate C++ definitions, the
    proposal is OK as it stands.  However, it has been the experience
    of Microsoft, in their PECOFF implementation of COMDATs, that they
    are useful for other purposes.  For example, the Microsoft linker
    performs a link-time optimization called "transitive COMDAT
    elimination" (TCE) that
    removes COMDAT groups not referenced (via relocations) from outside
    the group.  A single object file might implement a library of
    related routines, only one or two of which are actually used by the
    executable.  Space savings can be considerable if these routines are
    discarded.  In PECOFF, the compiler puts each routine in its own
    COMDAT group, and, when the linker does the TCE optimization, it
    builds a transitive reference graph (i.e., the executable references
    all non-COMDAT sections; section X references section Y (or Y's
    COMDAT group) if it has relocations for symbols in section Y).  The
    linker then excludes any COMDAT groups not in the graph.
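
    The graph walk described above can be sketched roughly as follows.
    This is only an illustration of the reachability idea, not
    Microsoft's implementation; the function name, the dictionaries and
    the section names are all made up for the example.

```python
from collections import deque


def live_sections(sections, group_of, relocs):
    """Return the set of sections that survive TCE (hypothetical helper).

    sections: all section names in the link
    group_of: maps a section to its COMDAT group id (absent = non-COMDAT)
    relocs:   maps a section to the sections its relocations point into
    """
    members = {}
    for sec, grp in group_of.items():
        members.setdefault(grp, set()).add(sec)

    # Roots: the executable references every non-COMDAT section.
    work = deque(s for s in sections if s not in group_of)
    live = set()
    while work:
        sec = work.popleft()
        if sec in live:
            continue
        live.add(sec)
        # Referencing one section of a group keeps the whole group.
        grp = group_of.get(sec)
        if grp is not None:
            work.extend(members[grp] - live)
        # Follow relocations out of this section.
        work.extend(relocs.get(sec, ()))
    return live


# Made-up input: main uses group "a", which uses group "b"; nothing
# outside group "c" refers to it.
sections = [".text.main", ".text.a", ".data.a", ".text.b", ".text.c"]
group_of = {".text.a": "a", ".data.a": "a", ".text.b": "b", ".text.c": "c"}
relocs = {
    ".text.main": {".text.a"},
    ".data.a": {".text.b"},
    ".text.c": {".text.b"},
}

kept = live_sections(sections, group_of, relocs)
discarded = set(sections) - kept  # sections of groups never reached
```

    Here group "c" is dropped even though it has relocations of its
    own, because nothing reachable from a non-COMDAT section refers
    to it.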

    The proposal as it stands is insufficient to implement TCE.  Suppose
    we have an ordinary global routine named foo().  Were we to try to
    implement TCE, the compiler would generate a COMDAT group for foo()
    containing its .text and .data sections and associated sections for
    relocations and whatnot.  Now suppose that there is a second object
    file participating in the link that also implements foo().  Because
    both foo()s are COMDATs, under the proposal as it stands, the linker
    would discard one of them without raising the expected "multiple
    definition" error.  Microsoft's PECOFF solves this problem by
    implementing a "COMDAT selection criterion" attribute associated
    with each COMDAT group.  For C++-style member function COMDAT
    matching, it uses the "select any" attribute (the linker is free to
    choose any one of the matching COMDAT groups).  For implementing TCE
    for the "hard" global definition case, PECOFF has a "no duplicates"
    attribute, which means that it is an error if a matching COMDAT
    group is found.

    I therefore propose that a flag be defined in the sh_flags field of
    a COMDAT group section header:

    SHF_COMDAT_SELECT_NODUPLICATES

    If set:  The linker issues a "multiply-defined symbol" error either
    if multiple COMDAT groups have the same identifier or if a symbol
    matching the COMDAT group's identifier is defined in a non-COMDAT
    section in some object.

    If clear:  If multiple COMDAT groups in different object files are
    identified by symbols with the same name, the linker should remove
    all but one of the groups. If the identifying symbol is defined in
    a non-COMDAT section in some object, the linker should remove all
    of the COMDAT groups identified by that symbol. 
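
    To make the two cases concrete, here is a rough sketch of how a
    linker might apply the proposed flag to the groups sharing one
    identifier.  The names below are invented for illustration.  Two
    points are assumptions that the rules above leave open: a
    conflict is reported if any of the matching groups has the flag
    set, and when duplicates are dropped the first group seen is the
    one kept.

```python
from collections import namedtuple

# Hypothetical record for one COMDAT group found in one object file.
Group = namedtuple("Group", ["object_file", "no_duplicates"])


class MultiplyDefinedSymbol(Exception):
    """Stands in for the linker's "multiply-defined symbol" error."""


def resolve_comdat(identifier, groups, defined_in_non_comdat):
    """Return the groups kept for `identifier` (zero or one of them).

    groups: every COMDAT group in the link with this identifier
    defined_in_non_comdat: True if some object also defines the
        identifying symbol in an ordinary (non-COMDAT) section
    """
    conflict = len(groups) > 1 or (bool(groups) and defined_in_non_comdat)

    # Flag set: any conflict is an error.
    # (Assumption: one flagged group is enough to trigger it.)
    if conflict and any(g.no_duplicates for g in groups):
        raise MultiplyDefinedSymbol(identifier)

    # Flag clear: a non-COMDAT definition wins, so drop every group.
    if defined_in_non_comdat:
        return []

    # Flag clear: keep one group. (Assumption: the first one seen.)
    return groups[:1]


a = Group("a.o", no_duplicates=False)
b = Group("b.o", no_duplicates=False)
kept_any = resolve_comdat("foo", [a, b], defined_in_non_comdat=False)
kept_none = resolve_comdat("foo", [a, b], defined_in_non_comdat=True)
```

    With both flags clear, one group survives in the first call and
    none in the second.  If either group were marked no_duplicates,
    both calls would raise instead.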


    Regarding the second part of the question (flags word per section
    vs. per group):  I think that per-group is sufficient.         


Do we want more control over when global symbols are removed vs. being
converted to UNDEF? Alternatively, should we simply require
that all symbols defined as addresses in the group be removed, and that
references to them from outside do so via distinct UNDEF global
symbols? 

    I think it's cleaner to have all references from outside the group
    be done via distinct UNDEF global symbols.  The only drawback to
    using distinct UNDEF globals is an increase in the size of the
    symbol table.

Do we want to replace the symbol rule by simply requiring that any
symbols defined as addresses in the group be defined in a .symtab
section that is itself in the group?

    Again, this is a cleaner way of doing things.  My big concern is
    that it would potentially mean a lot of .symtab sections, and ELF32
    currently has an architectural restriction of only 65535 sections
    per object file (due to e_shnum being an Elf32_Half).  Some ELF
    implementations are already running into this limit, even without
    the additional sections that will be created due to COMDATs.
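
    A back-of-the-envelope calculation shows how quickly the limit
    could be reached.  The per-group section counts and the number of
    ordinary sections below are guesses chosen only for illustration;
    the 65535 ceiling is the figure given above.

```python
# Largest count a 16-bit Elf32_Half such as e_shnum can hold.
E_SHNUM_MAX = 0xFFFF


def max_groups(sections_per_group, other_sections):
    """How many COMDAT groups fit in one object file (rough estimate)."""
    return (E_SHNUM_MAX - other_sections) // sections_per_group


# Assumed layout without a per-group symbol table:
#   group section + .text + relocation section = 3 sections per group.
shared_symtab = max_groups(sections_per_group=3, other_sections=10)

# Assumed layout with a per-group .symtab and its string table:
#   the three sections above + .symtab + .strtab = 5 sections per group.
per_group_symtab = max_groups(sections_per_group=5, other_sections=10)
```

    Under these assumptions the ceiling falls from about 21800 groups
    per object file to about 13100 once each group carries its own
    symbol table.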

--PSW




More information about the cxx-abi-dev mailing list