docs/comm/rts-libs/multi-thread.html

   1 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
   2 <html>
   3 <head>
   4     <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
   5 <title>The GHC Commentary - Supporting multi-threaded interoperation</title>
   6 </head>
   7 <body>
   8 <h1>The GHC Commentary - Supporting multi-threaded interoperation</h1>
   9 <em>
  10 <p>
  11 Authors: sof@galois.com, simonmar@microsoft.com<br>
  12 Date:    April 2002
  13 </p>
  14 </em>
  15 <p>
  16 This document presents the implementation of an extension to
  17 Concurrent Haskell that provides two enhancements:
  18 </p>
  19 <ul>
  20 <li>A Concurrent Haskell thread may call an external (e.g., C)
  21 function in a manner that's transparent to the execution/evaluation of
  22 other Haskell threads. Section <a href="#callout">Calling out"</a> covers this.
  23 </li>
  24 <li>
  25 OS threads may safely call Haskell functions concurrently. Section
  26 <a href="#callin">"Calling in"</a> covers this.
  27 </li>
  28 </ul>
  29
  30 <!---- ***************************************  ----->
  31 <h2 id="callout">The problem: foreign calls that block</h2>
  32 <p>
  33 When a Concurrent Haskell(CH) thread calls a 'foreign import'ed
  34 function, the runtime system(RTS) has to handle this in a manner
  35 transparent to other CH threads. That is, they shouldn't be blocked
  36 from making progress while the CH thread executes the external
  37 call. Presently, all threads will block.
  38 </p>
  39 <p>
  40 Clearly, we have to rely on OS-level threads in order to support this
  41 kind of concurrency. The implementation described here defines the
  42 (abstract) OS threads interface that the RTS assumes. The implementation
  43 currently provides two instances of this interface, one for POSIX
  44 threads (pthreads) and one for the Win32 threads.
  45 </p>
  46
  47 <!---- ***************************************  ----->
  48 <h3>Multi-threading the RTS</h3>
  49
  50 <p>
  51 A simple and efficient way to implement non-blocking foreign calls is like this:
  52 <ul>
  53 <li> Invariant: only one OS thread is allowed to
  54 execute code inside of the GHC runtime system. [There are alternate
  55 designs, but I won't go into details on their pros and cons here.]
  56 We'll call the OS thread that is currently running Haskell threads
  57 the <em>Current Haskell Worker Thread</em>.
  58 <p>
  59 The Current Haskell Worker Thread repeatedly grabs a Haskell thread, executes it until its
  60 time-slice expires or it blocks on an MVar, then grabs another, and executes
  61 that, and so on.
  62 </p>
  63 <li>
  64 <p>
  65 When the Current Haskell Worker comes to execute a potentially blocking 'foreign
  66 import', it leaves the RTS and ceases being the Current Haskell Worker, but before doing so it makes certain that
  67 another OS worker thread is available to become the Current Haskell Worker.
  68 Consequently, even if the external call blocks, the new Current Haskell Worker
  69 continues execution of the other Concurrent Haskell threads.
  70 When the external call eventually completes, the Concurrent Haskell
  71 thread that made the call is passed the result and made runnable
  72 again.
  73 </p>
  74 <p>
  75 <li>
  76 A pool of OS threads are constantly trying to become the Current Haskell Worker.
  77 Only one succeeds at any moment.   If the pool becomes empty, the RTS creates more workers.
  78 <p><li>
  79 The OS worker threads are regarded as interchangeable.  A given Haskell thread
  80 may, during its lifetime, be executed entirely by one OS worker thread, or by more than one.
  81 There's just no way to tell.
  82
  83 <p><li>If a foreign program wants to call a Haskell function, there is always a thread switch involved.
  84 The foreign program uses thread-safe mechanisms to create a Haskell thread and make it runnable; and
  85 the current Haskell Worker Thread exectutes it. See Section <a href="#callin">Calling in</a>.
  86 </ul>
  87 <p>
  88 The rest of this section describes the mechanics of implementing all
  89 this. There's two parts to it, one that describes how a native (OS) thread
  90 leaves the RTS to service the external call, the other how the same
  91 thread handles returning the result of the external call back to the
  92 Haskell thread.
  93 </p>
  94
  95 <!---- ***************************************  ----->
  96 <h3>Making the external call</h3>
  97
  98 <p>
  99 Presently, GHC handles 'safe' C calls by effectively emitting the
 100 following code sequence:
 101 </p>
 102
 103 <pre>
 104     ...save thread state...
 105     t = suspendThread();
 106     r = foo(arg1,...,argn);
 107     resumeThread(t);
 108     ...restore thread state...
 109     return r;
 110 </pre>
 111
 112 <p>
 113 After having squirreled away the state of a Haskell thread,
 114 <tt>Schedule.c:suspendThread()</tt> is called which puts the current
 115 thread on a list [<tt>Schedule.c:suspended_ccalling_threads</tt>]
 116 containing threads that are currently blocked waiting for external calls
 117 to complete (this is done for the purposes of finding roots when
 118 garbage collecting).
 119 </p>
 120
 121 <p>
 122 In addition to putting the Haskell thread on
 123 <tt>suspended_ccalling_threads</tt>, <tt>suspendThread()</tt> now also
 124 does the following:
 125 </p>
 126 <ul>
 127 <li>Instructs the <em>Task Manager</em> to make sure that there's a
 128 another native thread waiting in the wings to take over the execution
 129 of Haskell threads. This might entail creating a new
 130 <em>worker thread</em> or re-using one that's currently waiting for
 131 more work to do. The <a href="#taskman">Task Manager</a> section
 132 presents the functionality provided by this subsystem.
 133 </li>
 134
 135 <li>Releases its capability to execute within the RTS. By doing
 136 so, another worker thread will become unblocked and start executing
 137 code within the RTS. See the <a href="#capability">Capability</a>
 138 section for details.
 139 </li>
 140
 141 <li><tt>suspendThread()</tt> returns a token which is used to
 142 identify the Haskell thread that was added to
 143 <tt>suspended_ccalling_threads</tt>. This is done so that once the
 144 external call has completed, we know what Haskell thread to pull off
 145 the <tt>suspended_ccalling_threads</tt> list.
 146 </li>
 147 </ul>
 148
 149 <p>
 150 Upon return from <tt>suspendThread()</tt>, the OS thread is free of
 151 its RTS executing responsibility, and can now invoke the external
 152 call. Meanwhile, the other worker thread that have now gained access
 153 to the RTS will continue executing Concurrent Haskell code. Concurrent
 154 'stuff' is happening!
 155 </p>
 156
 157 <!---- ***************************************  ----->
 158 <h3>Returning the external result</h3>
 159
 160 <p>
 161 When the native thread eventually returns from the external call,
 162 the result needs to be communicated back to the Haskell thread that
 163 issued the external call. The following steps takes care of this:
 164 </p>
 165
 166 <ul>
 167 <li>The returning OS thread calls <tt>Schedule.c:resumeThread()</tt>,
 168 passing along the token referring to the Haskell thread that made the
 169 call we're returning from.
 170 </li>
 171
 172 <li>
 173 The OS thread then tries to grab hold of a <em>returning worker
 174 capability</em>, via <tt>Capability.c:grabReturnCapability()</tt>.
 175 Until granted, the thread blocks waiting for RTS permissions. Clearly we
 176 don't want the thread to be blocked longer than it has to, so whenever
 177 a thread that is executing within the RTS enters the Scheduler (which
 178 is quite often, e.g., when a Haskell thread context switch is made),
 179 it checks to see whether it can give up its RTS capability to a
 180 returning worker, which is done by calling
 181 <tt>Capability.c:yieldToReturningWorker()</tt>.
 182 </li>
 183
 184 <li>
 185 If a returning worker is waiting (the code in <tt>Capability.c</tt>
 186 keeps a counter of the number of returning workers that are currently
 187 blocked waiting), it is woken up and the given the RTS execution
 188 priviledges/capabilities of the worker thread that gave up its.
 189 </li>
 190
 191 <li>
 192 The thread that gave up its capability then tries to re-acquire
 193 the capability to execute RTS code; this is done by calling
 194 <tt>Capability.c:waitForWorkCapability()</tt>.
 195 </li>
 196
 197 <li>
 198 The returning worker that was woken up will continue execution in
 199 <tt>resumeThread()</tt>, removing its associated Haskell thread
 200 from the <tt>suspended_ccalling_threads</tt> list and start evaluating
 201 that thread, passing it the result of the external call.
 202 </li>
 203 </ul>
 204
 205 <!---- ***************************************  ----->
 206 <h3 id="rts-exec">RTS execution</h3>
 207
 208 <p>
 209 If a worker thread inside the RTS runs out of runnable Haskell
 210 threads, it goes to sleep waiting for the external calls to complete.
 211 It does this by calling <tt>waitForWorkCapability</tt>
 212 </p>
 213
 214 <p>
 215 The availability of new runnable Haskell threads is signalled when:
 216 </p>
 217
 218 <ul>
 219 <li>When an external call is set up in <tt>suspendThread()</tt>.</li>
 220 <li>When a new Haskell thread is created (e.g., whenever
 221 <tt>Concurrent.forkIO</tt> is called from within Haskell); this is
 222 signalled in <tt>Schedule.c:scheduleThread_()</tt>.
 223 </li>
 224 <li>Whenever a Haskell thread is removed from a 'blocking queue'
 225 attached to an MVar (only?).
 226 </li>
 227 </ul>
 228
 229 <!---- ***************************************  ----->
 230 <h2 id="callin">Calling in</h2>
 231
 232 Providing robust support for having multiple OS threads calling into
 233 Haskell is not as involved as its dual.
 234
 235 <ul>
 236 <li>The OS thread issues the call to a Haskell function by going via
 237 the <em>Rts API</em> (as specificed in <tt>RtsAPI.h</tt>).
 238 <li>Making the function application requires the construction of a
 239 closure on the heap. This is done in a thread-safe manner by having
 240 the OS thread lock a designated block of memory (the 'Rts API' block,
 241 which is part of the GC's root set) for the short period of time it
 242 takes to construct the application.
 243 <li>The OS thread then creates a new Haskell thread to execute the
 244 function application, which (eventually) boils down to calling
 245 <tt>Schedule.c:createThread()</tt>
 246 <li>
 247 Evaluation is kicked off by calling <tt>Schedule.c:scheduleExtThread()</tt>,
 248 which asks the Task Manager to possibly create a new worker (OS)
 249 thread to execute the Haskell thread.
 250 <li>
 251 After the OS thread has done this, it blocks waiting for the
 252 Haskell thread to complete the evaluation of the Haskell function.
 253 <p>
 254 The reason why a separate worker thread is made to evaluate the Haskell
 255 function and not the OS thread that made the call-in via the
 256 Rts API, is that we want that OS thread to return as soon as possible.
 257 We wouldn't be able to guarantee that if the OS thread entered the
 258 RTS to (initially) just execute its function application, as the
 259 Scheduler may side-track it and also ask it to evaluate other Haskell threads.
 260 </li>
 261 </ul>
 262
 263 <p>
 264 <strong>Note:</strong> As of 20020413, the implementation of the RTS API
 265 only serializes access to the allocator between multiple OS threads wanting
 266 to call into Haskell (via the RTS API.) It does not coordinate this access
 267 to the allocator with that of the OS worker thread that's currently executing
 268 within the RTS. This weakness/bug is scheduled to be tackled as part of an
 269 overhaul/reworking of the RTS API itself.
 270
 271
 272 <!---- ***************************************  ----->
 273 <h2>Subsystems introduced/modified</h2>
 274
 275 <p>
 276 These threads extensions affect the Scheduler portions of the runtime
 277 system. To make it more manageable to work with, the changes
 278 introduced a couple of new RTS 'sub-systems'. This section presents
 279 the functionality and API of these sub-systems.
 280 </p>
 281
 282 <!---- ***************************************  ----->
 283 <h3 id="#capability">Capabilities</h3>
 284
 285 <p>
 286 A Capability represent the token required to execute STG code,
 287 and all the state an OS thread/task needs to run Haskell code:
 288 its STG registers, a pointer to its TSO, a nursery etc. During
 289 STG execution, a pointer to the capabilitity is kept in a
 290 register (BaseReg).
 291 </p>
 292 <p>
 293 Only in an SMP build will there be multiple capabilities, for
 294 the threaded RTS and other non-threaded builds, there is only
 295 one global capability, namely <tt>MainCapability</tt>.
 296
 297 <p>
 298 The Capability API is as follows:
 299 <pre>
 300 /* Capability.h */
 301 extern void initCapabilities(void);
 302
 303 extern void grabReturnCapability(Mutex* pMutex, Capability** pCap);
 304 extern void waitForWorkCapability(Mutex* pMutex, Capability** pCap, rtsBool runnable);
 305 extern void releaseCapability(Capability* cap);
 306
 307 extern void yieldToReturningWorker(Mutex* pMutex, Capability* cap);
 308
 309 extern void grabCapability(Capability** cap);
 310 </pre>
 311
 312 <ul>
 313 <li><tt>initCapabilities()</tt> initialises the subsystem.
 314
 315 <li><tt>grabReturnCapability()</tt> is called by worker threads
 316 returning from an external call. It blocks them waiting to gain
 317 permissions to do so.
 318
 319 <li><tt>waitForWorkCapability()</tt> is called by worker threads
 320 already inside the RTS, but without any work to do. It blocks them
 321 waiting for there to new work to become available.
 322
 323 <li><tt>releaseCapability()</tt> hands back a capability. If a
 324 'returning worker' is waiting, it is signalled that a capability
 325 has become available. If not, <tt>releaseCapability()</tt> tries
 326 to signal worker threads that are blocked waiting inside
 327 <tt>waitForWorkCapability()</tt> that new work might now be
 328 available.
 329
 330 <li><tt>yieldToReturningWorker()</tt> is called by the worker thread
 331 that's currently inside the Scheduler. It checks whether there are other
 332 worker threads waiting to return from making an external call. If so,
 333 they're given preference and a capability is transferred between worker
 334 threads. One of the waiting 'returning worker' threads is signalled and made
 335 runnable, with the other, yielding, worker blocking to re-acquire
 336 a capability.
 337 </ul>
 338
 339 <p>
 340 The condition variables used to implement the synchronisation between
 341 worker consumers and providers are local to the Capability
 342 implementation. See source for details and comments.
 343 </p>
 344
 345 <!---- ***************************************  ----->
 346 <h3 id="taskman">The Task Manager</h3>
 347
 348 <p>
 349 The Task Manager API is responsible for managing the creation of
 350 OS worker RTS threads. When a Haskell thread wants to make an
 351 external call, the Task Manager is asked to possibly create a
 352 new worker thread to take over the RTS-executing capability of
 353 the worker thread that's exiting the RTS to execute the external call.
 354
 355 <p>
 356 The Capability subsystem keeps track of idle worker threads, so
 357 making an informed decision about whether or not to create a new OS
 358 worker thread is easy work for the task manager. The Task manager
 359 provides the following API:
 360 </p>
 361
 362 <pre>
 363 /* Task.h */
 364 extern void startTaskManager ( nat maxTasks, void (*taskStart)(void) );
 365 extern void stopTaskManager ( void );
 366
 367 extern void startTask ( void (*taskStart)(void) );
 368 </pre>
 369
 370 <ul>
 371 <li><tt>startTaskManager()</tt> and <tt>stopTaskManager()</tt> starts
 372 up and shuts down the subsystem. When starting up, you have the option
 373 to limit the overall number of worker threads that can be
 374 created. An unbounded (modulo OS thread constraints) number of threads
 375 is created if you pass '0'.
 376 <li><tt>startTask()</tt> is called when a worker thread calls
 377 <tt>suspendThread()</tt> to service an external call, asking another
 378 worker thread to take over its RTS-executing capability. It is also
 379 called when an external OS thread invokes a Haskell function via the
 380 <em>Rts API</em>.
 381 </ul>
 382
 383 <!---- ***************************************  ----->
 384 <h3>Native threads API</h3>
 385
 386 To hide OS details, the following API is used by the task manager and
 387 the scheduler to interact with an OS' threads API:
 388
 389 <pre>
 390 /* OSThreads.h */
 391 typedef <em>..OS specific..</em> Mutex;
 392 extern void initMutex    ( Mutex* pMut );
 393 extern void grabMutex    ( Mutex* pMut );
 394 extern void releaseMutex ( Mutex* pMut );
 395
 396 typedef <em>..OS specific..</em> Condition;
 397 extern void    initCondition      ( Condition* pCond );
 398 extern void    closeCondition     ( Condition* pCond );
 399 extern rtsBool broadcastCondition ( Condition* pCond );
 400 extern rtsBool signalCondition    ( Condition* pCond );
 401 extern rtsBool waitCondition      ( Condition* pCond,
 402                                     Mutex* pMut );
 403
 404 extern OSThreadId osThreadId      ( void );
 405 extern void shutdownThread        ( void );
 406 extern void yieldThread           ( void );
 407 extern int  createOSThread        ( OSThreadId* tid,
 408                                     void (*startProc)(void) );
 409 </pre>
 410
 411
 412
 413 <!---- ***************************************  ----->
 414 <h2>User-level interface</h2>
 415
 416 To signal that you want an external call to be serviced by a separate
 417 OS thread, you have to add the attribute <tt>threadsafe</tt> to
 418 a foreign import declaration, i.e.,
 419
 420 <pre>
 421 foreign import "bigComp" threadsafe largeComputation :: Int -> IO ()
 422 </pre>
 423
 424 <p>
 425 The distinction between 'safe' and thread-safe C calls is made
 426 so that we may call external functions that aren't re-entrant but may
 427 cause a GC to occur.
 428 <p>
 429 The <tt>threadsafe</tt> attribute subsumes <tt>safe</tt>.
 430 </p>
 431
 432 <!---- ***************************************  ----->
 433 <h2>Building the GHC RTS</h2>
 434
 435 The multi-threaded extension isn't currently enabled by default. To
 436 have it built, you need to run the <tt>fptools</tt> configure script
 437 with the extra option <tt>--enable-threaded-rts</tt> turned on, and
 438 then proceed to build the compiler as per normal.
 439
 440 <hr>
 441 <small>
 442 <!-- hhmts start --> Last modified: Wed Apr 10 14:21:57 Pacific Daylight Time 2002 <!-- hhmts end -->
 443 </small>
 444 </body> </html>
 445