ghc/docs/comm/rts-libs/multi-thread.html

   1 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
   2 <html>
   3 <head>
   4     <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
   5 <title>The GHC Commentary - Supporting multi-threaded interoperation</title>
   6 </head>
   7 <body>
   8 <h1>The GHC Commentary - Supporting multi-threaded interoperation</h1>
   9 <em>
  10 <p>
  11 Authors: sof@galois.com, simonmar@microsoft.com<br>
  12 Date:    April 2002
  13 </p>
  14 </em>
  15 <p>
  16 This document presents the implementation of an extension to
  17 Concurrent Haskell that provides two enhancements:
  18 </p>
  19 <ul>
  20 <li>A Concurrent Haskell thread may call an external (e.g., C)
  21 function in a manner that's transparent to the execution/evaluation of
  22 other Haskell threads. Section <a href="#callout">Calling out"</a> covers this.
  23 </li>
  24 <li>
  25 OS threads may safely call Haskell functions concurrently. Section
  26 <a href="#callin">"Calling in"</a> covers this.
  27 </li>
  28 </ul>
  29
  30 <!---- ***************************************  ----->
  31 <h2 id="callout">Calling out</h2>
  32 <p>
  33 When a Concurrent Haskell(CH) thread calls a 'foreign import'ed
  34 function, the runtime system(RTS) has to handle this in a manner
  35 transparent to other CH threads. That is, they shouldn't be blocked
  36 from making progress while the CH thread executes the external
  37 call. Presently, all threads will block.
  38 </p>
  39 <p>
  40 Clearly, we have to rely on OS-level threads in order to support this
  41 kind of concurrency. The implementation described here defines the
  42 (abstract) OS threads interface that the RTS assumes. The implementation
  43 currently provides two instances of this interface, one for POSIX
  44 threads (pthreads) and one for the Win32 threads.
  45 </p>
  46
  47 <!---- ***************************************  ----->
  48 <h3>Multi-threading the RTS</h3>
  49
  50 <p>
  51 From an RTS perspective, a simple and efficient way to implement this
  52 is to retain the property that only one OS thread is allowed to
  53 execute code inside of the GHC runtime system. [There are alternate
  54 designs, but I won't go into details on their pros and cons here.]
  55 </p>
  56 <p>
  57 When this OS thread comes to execute a potentially blocking 'foreign
  58 import', it leaves the RTS, but before doing so it makes certain that
  59 another OS worker thread is available to take over its RTS executing
  60 priviledges. Consequently, the external call will be handled
  61 concurrently to the execution of the other Concurrent Haskell threads.
  62 When the external call eventually completes, the Concurrent Haskell
  63 thread that made the call is passed the result and made runnable
  64 again.
  65 </p>
  66
  67 <p>
  68 The rest of this section describes the mechanics of implementing
  69 this. There's two parts to it, one that describes how a native thread
  70 leaves the RTS to service the external call, the other how the same
  71 thread handles returning the result of the external call back to the
  72 Haskell thread.
  73 </p>
  74
  75 <!---- ***************************************  ----->
  76 <h3>Making the external call</h3>
  77
  78 <p>
  79 Presently, GHC handles 'safe' C calls by effectively emitting the
  80 following code sequence:
  81 </p>
  82
  83 <pre>
  84     ...save thread state...
  85     t = suspendThread();
  86     r = foo(arg1,...,argn);
  87     resumeThread(t);
  88     ...restore thread state...
  89     return r;
  90 </pre>
  91
  92 <p>
  93 After having squirreled away the state of a Haskell thread,
  94 <tt>Schedule.c:suspendThread()</tt> is called which puts the current
  95 thread on a list [<tt>Schedule.c:suspended_ccalling_threads</tt>]
  96 containing threads that are currently blocked waiting for external calls
  97 to complete (this is done for the purposes of finding roots when
  98 garbage collecting).
  99 </p>
 100
 101 <p>
 102 In addition to putting the Haskell thread on
 103 <tt>suspended_ccalling_threads</tt>, <tt>suspendThread()</tt> now also
 104 does the following:
 105 </p>
 106 <ul>
 107 <li>Instructs the <em>Task Manager</em> to make sure that there's a
 108 another native thread waiting in the wings to take over the execution
 109 of Haskell threads. This might entail creating a new
 110 <em>worker thread</em> or re-using one that's currently waiting for
 111 more work to do. The <a href="#taskman">Task Manager</a> section
 112 presents the functionality provided by this subsystem.
 113 </li>
 114
 115 <li>Releases its capability to execute within the RTS. By doing
 116 so, another worker thread will become unblocked and start executing
 117 code within the RTS. See the <a href="#capability">Capability</a>
 118 section for details.
 119 </li>
 120
 121 <li><tt>suspendThread()</tt> returns a token which is used to
 122 identify the Haskell thread that was added to
 123 <tt>suspended_ccalling_threads</tt>. This is done so that once the
 124 external call has completed, we know what Haskell thread to pull off
 125 the <tt>suspended_ccalling_threads</tt> list.
 126 </li>
 127 </ul>
 128
 129 <p>
 130 Upon return from <tt>suspendThread()</tt>, the OS thread is free of
 131 its RTS executing responsibility, and can now invoke the external
 132 call. Meanwhile, the other worker thread that have now gained access
 133 to the RTS will continue executing Concurrent Haskell code. Concurrent
 134 'stuff' is happening!
 135 </p>
 136
 137 <!---- ***************************************  ----->
 138 <h3>Returning the external result</h3>
 139
 140 <p>
 141 When the native thread eventually returns from the external call,
 142 the result needs to be communicated back to the Haskell thread that
 143 issued the external call. The following steps takes care of this:
 144 </p>
 145
 146 <ul>
 147 <li>The returning OS thread calls <tt>Schedule.c:resumeThread()</tt>,
 148 passing along the token referring to the Haskell thread that made the
 149 call we're returning from.
 150 </li>
 151
 152 <li>
 153 The OS thread then tries to grab hold of a <em>returning worker
 154 capability</em>, via <tt>Capability.c:grabReturnCapability()</tt>.
 155 Until granted, the thread blocks waiting for RTS permissions. Clearly we
 156 don't want the thread to be blocked longer than it has to, so whenever
 157 a thread that is executing within the RTS enters the Scheduler (which
 158 is quite often, e.g., when a Haskell thread context switch is made),
 159 it checks to see whether it can give up its RTS capability to a
 160 returning worker, which is done by calling
 161 <tt>Capability.c:yieldToReturningWorker()</tt>.
 162 </li>
 163
 164 <li>
 165 If a returning worker is waiting (the code in <tt>Capability.c</tt>
 166 keeps a counter of the number of returning workers that are currently
 167 blocked waiting), it is woken up and the given the RTS execution
 168 priviledges/capabilities of the worker thread that gave up its.
 169 </li>
 170
 171 <li>
 172 The thread that gave up its capability then tries to re-acquire
 173 the capability to execute RTS code; this is done by calling
 174 <tt>Capability.c:waitForWorkCapability()</tt>.
 175 </li>
 176
 177 <li>
 178 The returning worker that was woken up will continue execution in
 179 <tt>resumeThread()</tt>, removing its associated Haskell thread
 180 from the <tt>suspended_ccalling_threads</tt> list and start evaluating
 181 that thread, passing it the result of the external call.
 182 </li>
 183 </ul>
 184
 185 <!---- ***************************************  ----->
 186 <h3 id="rts-exec">RTS execution</h3>
 187
 188 <p>
 189 If a worker thread inside the RTS runs out of runnable Haskell
 190 threads, it goes to sleep waiting for the external calls to complete.
 191 It does this by calling <tt>waitForWorkCapability</tt>
 192 </p>
 193
 194 <p>
 195 The availability of new runnable Haskell threads is signalled when:
 196 </p>
 197
 198 <ul>
 199 <li>When an external call is set up in <tt>suspendThread()</tt>.</li>
 200 <li>When a new Haskell thread is created (e.g., whenever
 201 <tt>Concurrent.forkIO</tt> is called from within Haskell); this is
 202 signalled in <tt>Schedule.c:scheduleThread_()</tt>.
 203 </li>
 204 <li>Whenever a Haskell thread is removed from a 'blocking queue'
 205 attached to an MVar (only?).
 206 </li>
 207 </ul>
 208
 209 <!---- ***************************************  ----->
 210 <h2 id="callin">Calling in</h2>
 211
 212 Providing robust support for having multiple OS threads calling into
 213 Haskell is not as involved as its dual.
 214
 215 <ul>
 216 <li>The OS thread issues the call to a Haskell function by going via
 217 the <em>Rts API</em> (as specificed in <tt>RtsAPI.h</tt>).
 218 <li>Making the function application requires the construction of a
 219 closure on the heap. This is done in a thread-safe manner by having
 220 the OS thread lock a designated block of memory (the 'Rts API' block,
 221 which is part of the GC's root set) for the short period of time it
 222 takes to construct the application.
 223 <li>The OS thread then creates a new Haskell thread to execute the
 224 function application, which (eventually) boils down to calling
 225 <tt>Schedule.c:createThread()</tt>
 226 <li>
 227 Evaluation is kicked off by calling <tt>Schedule.c:scheduleExtThread()</tt>,
 228 which asks the Task Manager to possibly create a new worker (OS)
 229 thread to execute the Haskell thread.
 230 <li>
 231 After the OS thread has done this, it blocks waiting for the
 232 Haskell thread to complete the evaluation of the Haskell function.
 233 <p>
 234 The reason why a separate worker thread is made to evaluate the Haskell
 235 function and not the OS thread that made the call-in via the
 236 Rts API, is that we want that OS thread to return as soon as possible.
 237 We wouldn't be able to guarantee that if the OS thread entered the
 238 RTS to (initially) just execute its function application, as the
 239 Scheduler may side-track it and also ask it to evaluate other Haskell threads.
 240 </li>
 241 </ul>
 242
 243 <p>
 244 <strong>Note:</strong> As of 20020413, the implementation of the RTS API
 245 only serializes access to the allocator between multiple OS threads wanting
 246 to call into Haskell (via the RTS API.) It does not coordinate this access
 247 to the allocator with that of the OS worker thread that's currently executing
 248 within the RTS. This weakness/bug is scheduled to be tackled as part of an
 249 overhaul/reworking of the RTS API itself.
 250
 251
 252 <!---- ***************************************  ----->
 253 <h2>Subsystems introduced/modified</h2>
 254
 255 <p>
 256 These threads extensions affect the Scheduler portions of the runtime
 257 system. To make it more manageable to work with, the changes
 258 introduced a couple of new RTS 'sub-systems'. This section presents
 259 the functionality and API of these sub-systems.
 260 </p>
 261
 262 <!---- ***************************************  ----->
 263 <h3 id="#capability">Capabilities</h3>
 264
 265 <p>
 266 A Capability represent the token required to execute STG code,
 267 and all the state an OS thread/task needs to run Haskell code:
 268 its STG registers, a pointer to its TSO, a nursery etc. During
 269 STG execution, a pointer to the capabilitity is kept in a
 270 register (BaseReg).
 271 </p>
 272 <p>
 273 Only in an SMP build will there be multiple capabilities, for
 274 the threaded RTS and other non-threaded builds, there is only
 275 one global capability, namely <tt>MainCapability</tt>.
 276
 277 <p>
 278 The Capability API is as follows:
 279 <pre>
 280 /* Capability.h */
 281 extern void initCapabilities(void);
 282
 283 extern void grabReturnCapability(Mutex* pMutex, Capability** pCap);
 284 extern void waitForWorkCapability(Mutex* pMutex, Capability** pCap, rtsBool runnable);
 285 extern void releaseCapability(Capability* cap);
 286
 287 extern void yieldToReturningWorker(Mutex* pMutex, Capability* cap);
 288
 289 extern void grabCapability(Capability** cap);
 290 </pre>
 291
 292 <ul>
 293 <li><tt>initCapabilities()</tt> initialises the subsystem.
 294
 295 <li><tt>grabReturnCapability()</tt> is called by worker threads
 296 returning from an external call. It blocks them waiting to gain
 297 permissions to do so.
 298
 299 <li><tt>waitForWorkCapability()</tt> is called by worker threads
 300 already inside the RTS, but without any work to do. It blocks them
 301 waiting for there to new work to become available.
 302
 303 <li><tt>releaseCapability()</tt> hands back a capability. If a
 304 'returning worker' is waiting, it is signalled that a capability
 305 has become available. If not, <tt>releaseCapability()</tt> tries
 306 to signal worker threads that are blocked waiting inside
 307 <tt>waitForWorkCapability()</tt> that new work might now be
 308 available.
 309
 310 <li><tt>yieldToReturningWorker()</tt> is called by the worker thread
 311 that's currently inside the Scheduler. It checks whether there are other
 312 worker threads waiting to return from making an external call. If so,
 313 they're given preference and a capability is transferred between worker
 314 threads. One of the waiting 'returning worker' threads is signalled and made
 315 runnable, with the other, yielding, worker blocking to re-acquire
 316 a capability.
 317 </ul>
 318
 319 <p>
 320 The condition variables used to implement the synchronisation between
 321 worker consumers and providers are local to the Capability
 322 implementation. See source for details and comments.
 323 </p>
 324
 325 <!---- ***************************************  ----->
 326 <h3 id="taskman">The Task Manager</h3>
 327
 328 <p>
 329 The Task Manager API is responsible for managing the creation of
 330 OS worker RTS threads. When a Haskell thread wants to make an
 331 external call, the Task Manager is asked to possibly create a
 332 new worker thread to take over the RTS-executing capability of
 333 the worker thread that's exiting the RTS to execute the external call.
 334
 335 <p>
 336 The Capability subsystem keeps track of idle worker threads, so
 337 making an informed decision about whether or not to create a new OS
 338 worker thread is easy work for the task manager. The Task manager
 339 provides the following API:
 340 </p>
 341
 342 <pre>
 343 /* Task.h */
 344 extern void startTaskManager ( nat maxTasks, void (*taskStart)(void) );
 345 extern void stopTaskManager ( void );
 346
 347 extern void startTask ( void (*taskStart)(void) );
 348 </pre>
 349
 350 <ul>
 351 <li><tt>startTaskManager()</tt> and <tt>stopTaskManager()</tt> starts
 352 up and shuts down the subsystem. When starting up, you have the option
 353 to limit the overall number of worker threads that can be
 354 created. An unbounded (modulo OS thread constraints) number of threads
 355 is created if you pass '0'.
 356 <li><tt>startTask()</tt> is called when a worker thread calls
 357 <tt>suspendThread()</tt> to service an external call, asking another
 358 worker thread to take over its RTS-executing capability. It is also
 359 called when an external OS thread invokes a Haskell function via the
 360 <em>Rts API</em>.
 361 </ul>
 362
 363 <!---- ***************************************  ----->
 364 <h3>Native threads API</h3>
 365
 366 To hide OS details, the following API is used by the task manager and
 367 the scheduler to interact with an OS' threads API:
 368
 369 <pre>
 370 /* OSThreads.h */
 371 typedef <em>..OS specific..</em> Mutex;
 372 extern void initMutex    ( Mutex* pMut );
 373 extern void grabMutex    ( Mutex* pMut );
 374 extern void releaseMutex ( Mutex* pMut );
 375
 376 typedef <em>..OS specific..</em> Condition;
 377 extern void    initCondition      ( Condition* pCond );
 378 extern void    closeCondition     ( Condition* pCond );
 379 extern rtsBool broadcastCondition ( Condition* pCond );
 380 extern rtsBool signalCondition    ( Condition* pCond );
 381 extern rtsBool waitCondition      ( Condition* pCond,
 382                                     Mutex* pMut );
 383
 384 extern OSThreadId osThreadId      ( void );
 385 extern void shutdownThread        ( void );
 386 extern void yieldThread           ( void );
 387 extern int  createOSThread        ( OSThreadId* tid,
 388                                     void (*startProc)(void) );
 389 </pre>
 390
 391
 392
 393 <!---- ***************************************  ----->
 394 <h2>User-level interface</h2>
 395
 396 To signal that you want an external call to be serviced by a separate
 397 OS thread, you have to add the attribute <tt>threadsafe</tt> to
 398 a foreign import declaration, i.e.,
 399
 400 <pre>
 401 foreign import "bigComp" threadsafe largeComputation :: Int -> IO ()
 402 </pre>
 403
 404 <p>
 405 The distinction between 'safe' and thread-safe C calls is made
 406 so that we may call external functions that aren't re-entrant but may
 407 cause a GC to occur.
 408 <p>
 409 The <tt>threadsafe</tt> attribute subsumes <tt>safe</tt>.
 410 </p>
 411
 412 <!---- ***************************************  ----->
 413 <h2>Building the GHC RTS</h2>
 414
 415 The multi-threaded extension isn't currently enabled by default. To
 416 have it built, you need to run the <tt>fptools</tt> configure script
 417 with the extra option <tt>--enable-threaded-rts</tt> turned on, and
 418 then proceed to build the compiler as per normal.
 419
 420 <hr>
 421 <small>
 422 <!-- hhmts start --> Last modified: Wed Apr 10 14:21:57 Pacific Daylight Time 2002 <!-- hhmts end -->
 423 </small>
 424 </body> </html>
 425