thread-safe local static variable dynamic initialization

Tue Jun 8 18:59:00 UTC 1999

> From: Christophe de Dinechin <ddd at cup.hp.com>
> 
> > However, it's not very effective to discuss the problem in the
> > abstract.  If someone has a specific proposal for solving the problem, 
> > submit it, and we can discuss concrete characteristics instead of
> > speculations.
> 
> Maybe we can present what HP aC++ does. Any other compiler has a  
> similar mechanism?

Here's what the SCO UnixWare 7 C++ compiler does for IA-32, from a (slightly
sanitized) design document.  It meets Jim's goal of having no overhead
for non-threaded programs and minimal overhead for threaded programs unless
actual contention occurs (infrequent), and meets Mike's goal of handling
exceptions in the initialization correctly (although it doesn't guarantee
that the thread getting the exception is the one that gets next crack
at initializing the static).  It's also worth noting that dynamic
initialization of local variables (static or otherwise) is very 
common in C++, since that's what most object constructions involve,
so I don't think this case is as rare as Jim does.

Jonathan Schilling		SCO, Inc.		jls at sco.com

   [...] This is in local static variables with dynamic
   initialization, where the compiler generates a static one-time
   flag to guard the initialization. Two threads could read the flag as
   zero before either of them set it, resulting in multiple
   initializations.
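
   To make the race concrete, here is a minimal C++ sketch of an
   unguarded one-time flag, written by hand. It is only an illustration
   of the pattern described above, not the compiler's actual output; the
   names naive_guard, naive_value and compute_value are made up for the
   example.

```cpp
#include <cassert>

static int init_calls = 0;               // counts how often the initializer runs

static int compute_value() {             // hypothetical dynamic initializer
    ++init_calls;
    return 42;
}

// Hand-written stand-in for:  static int value = compute_value();
static bool naive_guard = false;         // one-time flag, no locking at all
static int  naive_value;

int& naive_local_static() {
    if (!naive_guard) {                  // two threads can both read false here...
        naive_value = compute_value();   // ...and then both run the initializer
        naive_guard = true;
    }
    return naive_value;
}

// Single-threaded use is fine: the initializer runs exactly once.
void check_single_threaded() {
    naive_local_static();
    naive_local_static();
    assert(init_calls == 1);
}
```

   Nothing in this version stops a second thread from entering the
   if-block between the test of naive_guard and the store to it.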

   [...] Accordingly, when compilation is done with -Kthread on, a code
   sequence will be generated to lock this initialization.  
   [...] the basic idea is to have one guard saying
   whether the initialization is done (so that multiple initializations
   do not occur) and have another guard saying whether initialization is
   in progress (so that a second thread doesn't access what it thinks is
   an initialized value before the first thread has finished the
   initialization).  [...]

   When compiled with -Kthread, the generated code for a dynamic
   initialization of a local static variable will look like the
   following. guard is a local static boolean, initialized to zero,
   generated by the [middle pass of the compiler]. 
   Two bits of it are used: the low-order 'done bit'
   and the next-low-order 'busy bit'.

.again:
        movl    $guard,%eax
        testl   $1,(%eax)       // test the done bit
        jnz     .done           // if set, variable is initialized, done
        lock; btsl  $1,(%eax)   // test and set the busy bit
        jc      .busy
        < init code >           // not busy, do the initialization
        movl    $guard,%eax
        movl    $3,(%eax)       // set the done bit (busy bit stays set)
        jmp     .done
.busy:
        pushl   %eax            // call RTS routine to wait, passing address
        call    __static_init_wait      // of guard to monitor
        testl   %eax,%eax       // 1 means exception occurred in init code,
        popl    %ecx
        jnz     .again                  // start the whole thing over
.done:                                  // 0 means wait finished

   The above code will work for position-independent code as well.
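
   For readers who don't read IA-32 assembly, the same guard protocol
   can be paraphrased in C++ with std::atomic. This is a sketch under
   assumptions, not the compiler's implementation: fetch_or stands in
   for "lock; btsl", wait_for_init is a simplified stand-in for
   __static_init_wait, and init_runs and run_demo exist only to make
   the example checkable.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Bit layout from the text: bit 0 = done, bit 1 = busy.
constexpr unsigned kDone = 1u;
constexpr unsigned kBusy = 2u;

static std::atomic<unsigned> guard{0};   // stands in for the generated guard
static std::atomic<int> init_runs{0};    // instrumentation for this sketch only
static int value;                        // stands in for the local static

// Simplified stand-in for __static_init_wait: 1 = guard was reset, 0 = done.
static int wait_for_init(std::atomic<unsigned>& g) {
    for (;;) {
        unsigned v = g.load(std::memory_order_acquire);
        if (v == 0) return 1;            // initializer threw, caller retries
        if (v & kDone) return 0;         // initialization finished
        std::this_thread::yield();       // still busy
    }
}

int& guarded_local_static() {
    for (;;) {                                                    // .again
        if (guard.load(std::memory_order_acquire) & kDone)        // testl $1
            break;
        unsigned old = guard.fetch_or(kBusy,
                                      std::memory_order_acq_rel); // lock; btsl $1
        if (!(old & kBusy)) {            // busy bit was clear: we initialize
            value = 42;                  // < init code >
            init_runs.fetch_add(1);
            guard.store(kDone | kBusy,
                        std::memory_order_release);               // movl $3
            break;
        }
        if (wait_for_init(guard) == 0)   // .busy
            break;                       // 0: finished; otherwise start over
    }
    return value;
}

// Hammer the function from several threads; returns the initializer count.
int run_demo(int nthreads) {
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i)
        threads.emplace_back([] { guarded_local_static(); });
    for (auto& t : threads) t.join();
    assert(guard.load() == (kDone | kBusy));
    return init_runs.load();
}
```

   However many threads call guarded_local_static(), only the one that
   finds the busy bit clear runs the initializer; the others either see
   the done bit immediately or wait for it.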

   The complication due to exceptions is: what happens if the
   initialization code throws an exception? The [compiler] EH tables will have
   set up a special region and flag in their region table to detect this
   situation, along with a pointer to the guard variable. Because the
   initialization never completed, when the RTS sees that it is cleaning
   up from such a region, it will reset both bits of the guard variable
   to zero. This will release a thread waiting in __static_init_wait(),
   if any, or will reset everything for the next thread that calls the
   function.
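
   The effect of that cleanup can be sketched in C++ as follows. Note
   the substitution: the design above does the reset from the EH region
   tables in the RTS, whereas this sketch uses an ordinary try/catch to
   get the same observable behavior. flaky_init, attempts and get_value
   are hypothetical names, and the sketch is single-threaded on purpose.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

static std::atomic<unsigned> guard{0};   // bit 0 = done, bit 1 = busy
static int attempts = 0;                 // how many times the initializer ran
static int value;                        // stands in for the local static

// Hypothetical initializer that throws on its first call only.
static int flaky_init() {
    if (++attempts == 1)
        throw std::runtime_error("init failed");
    return 7;
}

int& get_value() {
    if (!(guard.load(std::memory_order_acquire) & 1u)) {
        unsigned old = guard.fetch_or(2u, std::memory_order_acq_rel);
        assert(!(old & 2u));             // single-threaded sketch: never busy
        try {
            value = flaky_init();        // < init code >
        } catch (...) {
            // Stand-in for the RTS cleanup: clear both guard bits so the
            // next caller (or a waiting thread) starts over.
            guard.store(0u, std::memory_order_release);
            throw;
        }
        guard.store(3u, std::memory_order_release);   // done + busy
    }
    return value;
}
```

   After the first call throws, the guard is back to zero, so a second
   call runs the initializer again and this time completes.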

   The idea of the __static_init_wait() RTS routine is to monitor the
   guard bits at the address passed in, by looping on this decision table:

        done    busy
        0       0       return 1 in %eax        (EH wipe-out)
        1       1       return 0 in %eax        (no longer busy)
        0       1       continue to wait        (still busy)
        1       0       internal error, shouldn't happen

   As for how the wait is done [... not relevant for ABI, although currently
   we're using thr_yield(), which may or may not be right for this context].
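
   The decision table translates almost directly into a loop. The
   following C++ sketch is an assumption-laden illustration, not the
   RTS source: static_init_wait_sketch is a made-up name, and
   std::this_thread::yield() is used in place of thr_yield().

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <thread>

// Returns 1 if the guard was wiped by exception cleanup, 0 once the
// initialization has finished. Bit 0 = done, bit 1 = busy.
int static_init_wait_sketch(const std::atomic<unsigned>* guard) {
    for (;;) {
        unsigned bits = guard->load(std::memory_order_acquire) & 3u;
        switch (bits) {
        case 0u:                         // done=0 busy=0: EH wipe-out
            return 1;
        case 3u:                         // done=1 busy=1: no longer busy
            return 0;
        case 2u:                         // done=0 busy=1: still busy
            std::this_thread::yield();
            break;                       // leaves the switch, loop continues
        default:                         // done=1 busy=0: shouldn't happen
            assert(false && "done bit set without busy bit");
            std::abort();
        }
    }
}
```

   A caller passes the address of the guard, as the generated code does
   with pushl %eax, and retries the whole sequence when the result is 1.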