ghc/docs/users_guide/parallel.lit

   1 % both concurrent and parallel
   2 %************************************************************************
   3 %*                                                                      *
   4 \section[concurrent-and-parallel]{Concurrent and Parallel Haskell}
   5 \index{Concurrent Haskell}
   6 \index{Parallel Haskell}
   7 %*                                                                      *
   8 %************************************************************************
   9
  10 Concurrent and Parallel Haskell are Glasgow extensions to Haskell
  11 which let you structure your program as a group of independent
  12 `threads'.
  13
  14 Concurrent and Parallel Haskell have very different purposes.
  15
  16 Concurrent Haskell is for applications which have an inherent
  17 structure of interacting, concurrent tasks (i.e. `threads').  Threads
  18 in such programs may be {\em required}.  For example, if a concurrent
  19 thread has been spawned to handle a mouse click, it isn't
  20 optional---the user wants something done!
  21
  22 A Concurrent Haskell program implies multiple `threads' running within
  23 a single Unix process on a single processor.
  24
You will find at least one paper about Concurrent Haskell linked from
Simon Peyton Jones's Web page:
\tr{http://www.dcs.gla.ac.uk/~simonpj/}.
  28
  29 Parallel Haskell is about {\em speed}---spawning threads onto multiple
  30 processors so that your program will run faster.  The `threads'
  31 are always {\em advisory}---if the runtime system thinks it can
  32 get the job done more quickly by sequential execution, then fine.
  33
  34 A Parallel Haskell program implies multiple processes running on
  35 multiple processors, under a PVM (Parallel Virtual Machine) framework.
  36
  37 Parallel Haskell is still relatively new; it is more about ``research
  38 fun'' than about ``speed.'' That will change.
  39
  40 Again, check Simon's Web page for publications about Parallel Haskell
  41 (including ``GUM'', the key bits of the runtime system).
  42
  43 Some details about Concurrent and Parallel Haskell follow.
  44
  45 %************************************************************************
  46 %*                                                                      *
  47 \subsection{Concurrent and Parallel Haskell---language features}
  48 \index{Concurrent Haskell---features}
  49 \index{Parallel Haskell---features}
  50 %*                                                                      *
  51 %************************************************************************
  52
  53 %************************************************************************
  54 %*                                                                      *
  55 \subsubsection{Features specific to Concurrent Haskell}
  56 %*                                                                      *
  57 %************************************************************************
  58
  59 %************************************************************************
  60 %*                                                                      *
  61 \subsubsubsection{The \tr{Concurrent} interface (recommended)}
  62 \index{Concurrent interface}
  63 %*                                                                      *
  64 %************************************************************************
  65
  66 GHC provides a \tr{Concurrent} module, a common interface to a
  67 collection of useful concurrency abstractions, including those
  68 mentioned in the ``concurrent paper''.
  69
  70 Just put \tr{import Concurrent} into your modules, and away you go.
  71 To create a ``required thread'':
  72
  73 \begin{verbatim}
  74 forkIO :: IO a -> IO a
  75 \end{verbatim}
  76
  77 The \tr{Concurrent} interface also provides access to ``I-Vars''
  78 and ``M-Vars'', which are two flavours of {\em synchronising variables}.
  79 \index{synchronising variables (Glasgow extension)}
  80 \index{concurrency -- synchronising variables}
  81
  82 \tr{IVars}\index{IVars (Glasgow extension)} are write-once
  83 variables.  They start out empty, and any threads that attempt to read
  84 them will block until they are filled.  Once they are written, any
  85 blocked threads are freed, and additional reads are permitted.
  86 Attempting to write a value to a full \tr{IVar} results in a runtime
  87 error.  Interface:
  88 \begin{verbatim}
  89 newIVar     :: IO (IVar a)
  90 readIVar    :: IVar a -> IO a
  91 writeIVar   :: IVar a -> a -> IO ()
  92 \end{verbatim}
  93
\tr{MVars}\index{MVars (Glasgow extension)} are rendezvous points,
mostly for concurrent threads.  An \tr{MVar} made with
\tr{newEmptyMVar} begins empty, and any attempt to take a value from
an empty \tr{MVar} blocks.  When an \tr{MVar} is written, a single
blocked thread may be freed.  Taking the value with \tr{takeMVar}
toggles the state from full back to empty, so a value written to an
\tr{MVar} can be taken only once.  Multiple takes and puts are
allowed, but there must be at least one \tr{takeMVar} between any two
\tr{putMVar}s.  Interface:
 102 \begin{verbatim}
 103 newEmptyMVar :: IO (MVar a)
 104 newMVar      :: a -> IO (MVar a)
 105 takeMVar     :: MVar a -> IO a
 106 putMVar      :: MVar a -> a -> IO ()
 107 readMVar     :: MVar a -> IO a
 108 swapMVar     :: MVar a -> a -> IO a
 109 \end{verbatim}
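
To make the rendezvous idea concrete, here is a minimal sketch of a
worker thread handing a result back through an \tr{MVar}.  It is written
against the module names of present-day GHC (\tr{Control.Concurrent} and
\tr{Control.Concurrent.MVar}) rather than the \tr{Concurrent} interface
described here, and with \tr{forkIO} at the type those modules give it;
both are assumptions, as is the helper name \tr{sumInWorker}.

\begin{verbatim}
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Hypothetical helper: sum a list in a forked thread and hand the
-- result back through an MVar.
sumInWorker :: [Int] -> IO Int
sumInWorker xs = do
  result <- newEmptyMVar                  -- starts out empty
  _ <- forkIO (putMVar result $! sum xs)  -- worker writes exactly once
  takeMVar result                         -- blocks until the worker has written

main :: IO ()
main = do
  total <- sumInWorker [1 .. 100]
  print total                             -- prints 5050
\end{verbatim}

The parent never polls: \tr{takeMVar} simply blocks until the worker
has filled the variable.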
 110
 111 A {\em channel variable} (@CVar@) is a one-element channel, as
 112 described in the paper:
 113
 114 \begin{verbatim}
 115 data CVar a
 116 newCVar :: IO (CVar a)
 117 putCVar :: CVar a -> a -> IO ()
 118 getCVar :: CVar a -> IO a
 119 \end{verbatim}
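
One possible way to build such a one-element channel from two
\tr{MVar}s is sketched below: one variable carries the data and the
other acts as an acknowledgement.  This is an illustration under the
assumption of present-day module names (\tr{Control.Concurrent.MVar}),
not necessarily how GHC's own \tr{CVar} is implemented.

\begin{verbatim}
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (replicateM)

-- Sketch only: a data slot plus an acknowledgement slot.
data CVar a = CVar (MVar a) (MVar ())

newCVar :: IO (CVar a)
newCVar = do
  dat <- newEmptyMVar
  ack <- newMVar ()        -- the channel starts out ready to accept a value
  return (CVar dat ack)

putCVar :: CVar a -> a -> IO ()
putCVar (CVar dat ack) x = do
  takeMVar ack             -- wait until the previous value has been consumed
  putMVar dat x

getCVar :: CVar a -> IO a
getCVar (CVar dat ack) = do
  x <- takeMVar dat        -- wait for a value
  putMVar ack ()           -- let the next writer proceed
  return x

main :: IO ()
main = do
  cv <- newCVar
  _ <- forkIO (mapM_ (putCVar cv) "abc")
  s <- replicateM 3 (getCVar cv)
  putStrLn s               -- prints abc
\end{verbatim}

The acknowledgement slot is what stops a second \tr{putCVar} from
proceeding before the first value has been read.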
 120
A @Chan@ is an unbounded channel:
 122
 123 \begin{verbatim}
 124 data Chan a
 125 newChan         :: IO (Chan a)
 126 putChan         :: Chan a -> a -> IO ()
 127 getChan         :: Chan a -> IO a
 128 dupChan         :: Chan a -> IO (Chan a)
 129 unGetChan       :: Chan a -> a -> IO ()
 130 getChanContents :: Chan a -> IO [a]
 131 \end{verbatim}
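
As a rough illustration of how an unbounded channel is used, the sketch
below has a producer thread push values that the parent then reads back
in order.  It assumes the present-day \tr{Control.Concurrent.Chan}
module, where the operations playing the roles of \tr{putChan} and
\tr{getChan} are called \tr{writeChan} and \tr{readChan}; the helper
name \tr{squaresViaChan} is made up for the example.

\begin{verbatim}
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Monad (forM_, replicateM)

-- Hypothetical helper: a producer thread writes n squares to a channel;
-- the caller reads them back in the order they were written.
squaresViaChan :: Int -> IO [Int]
squaresViaChan n = do
  ch <- newChan
  _ <- forkIO (forM_ [1 .. n] (\i -> writeChan ch (i * i)))
  replicateM n (readChan ch)   -- each read blocks until an item arrives

main :: IO ()
main = squaresViaChan 5 >>= print   -- prints [1,4,9,16,25]
\end{verbatim}

Because the channel is unbounded, the producer never blocks; only the
reader waits.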
 132
 133 General and quantity semaphores:
 134
 135 \begin{verbatim}
 136 data QSem
 137 newQSem     :: Int   -> IO QSem
 138 waitQSem    :: QSem  -> IO ()
 139 signalQSem  :: QSem  -> IO ()
 140
 141 data QSemN
 142 newQSemN    :: Int   -> IO QSemN
 143 signalQSemN :: QSemN -> Int -> IO ()
 144 waitQSemN   :: QSemN -> Int -> IO ()
 145 \end{verbatim}
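
A quantity semaphore is typically used to cap how many threads may be
inside some region at once.  The following sketch does that and
reports the highest occupancy it observed.  It assumes the present-day
modules \tr{Control.Concurrent.QSem} and \tr{Control.Exception}; the
helper name \tr{peakConcurrency} and the 1 ms delay are invented for
the example.

\begin{verbatim}
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_)
import Control.Monad (forM_, replicateM_)

-- Hypothetical helper: run `workers` threads, letting at most `limit`
-- of them into the guarded region at once; return the peak occupancy.
peakConcurrency :: Int -> Int -> IO Int
peakConcurrency limit workers = do
  sem    <- newQSem limit
  inside <- newMVar (0 :: Int)     -- threads currently in the region
  peak   <- newMVar (0 :: Int)     -- largest occupancy seen so far
  done   <- newEmptyMVar
  forM_ [1 .. workers] $ \_ -> forkIO $ do
    bracket_ (waitQSem sem) (signalQSem sem) $ do
      n <- modifyMVar inside (\k -> return (k + 1, k + 1))
      modifyMVar_ peak (return . max n)
      threadDelay 1000             -- pretend to work for 1 ms
      modifyMVar_ inside (return . subtract 1)
    putMVar done ()                -- tell the parent this worker finished
  replicateM_ workers (takeMVar done)
  readMVar peak

main :: IO ()
main = peakConcurrency 2 6 >>= print   -- never more than 2
\end{verbatim}

Wrapping the region in \tr{bracket\_} means the semaphore is signalled
even if the guarded action throws.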
 146
 147 Merging streams---binary and n-ary:
 148
 149 \begin{verbatim}
 150 mergeIO  :: [a]   -> [a] -> IO [a]
 151 nmergeIO :: [[a]] -> IO [a]
 152 \end{verbatim}
 153
 154 A {\em Sample variable} (@SampleVar@) is slightly different from a
 155 normal @MVar@:
 156 \begin{itemize}
\item Reading an empty @SampleVar@ causes the reader to block
    (same as @takeMVar@ on an empty @MVar@).
\item Reading a filled @SampleVar@ empties it and returns its value
    (same as @takeMVar@ on a full @MVar@).
\item Writing to an empty @SampleVar@ fills it with a value and
    potentially wakes up a blocked reader (same as @putMVar@ on an empty @MVar@).
\item Writing to a filled @SampleVar@ overwrites the current value
    (different from @putMVar@ on a full @MVar@).
 165 \end{itemize}
 166
 167 \begin{verbatim}
 168 type SampleVar a = MVar (Int, MVar a)
 169
 170 emptySampleVar :: SampleVar a -> IO ()
 171 newSampleVar   :: IO (SampleVar a)
 172 readSample     :: SampleVar a -> IO a
 173 writeSample    :: SampleVar a -> a -> IO ()
 174 \end{verbatim}
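
To show how the four rules above can fall out of the
\tr{MVar (Int, MVar a)} representation, here is one way the operations
could be written.  It is a sketch under assumptions: it uses the
present-day \tr{Control.Concurrent.MVar} module, it reads the counter
as 1 for full, 0 for empty, and a negative number for that many blocked
readers, and it makes no attempt to be safe against asynchronous
exceptions.

\begin{verbatim}
import Control.Concurrent.MVar

-- Counter: 1 = full, 0 = empty with no readers,
--          negative = that many readers blocked on the slot.
type SampleVar a = MVar (Int, MVar a)

newSampleVar :: IO (SampleVar a)
newSampleVar = do
  slot <- newEmptyMVar
  newMVar (0, slot)

readSample :: SampleVar a -> IO a
readSample sv = do
  (count, slot) <- takeMVar sv
  putMVar sv (count - 1, slot)   -- record one more reader (or empty it)
  takeMVar slot                  -- blocks if no value is there yet

writeSample :: SampleVar a -> a -> IO ()
writeSample sv x = do
  (count, slot) <- takeMVar sv
  if count == 1
    then do _ <- swapMVar slot x                      -- full: overwrite
            putMVar sv (1, slot)
    else do putMVar slot x                            -- empty: fill, maybe wake a reader
            putMVar sv (min 1 (count + 1), slot)

main :: IO ()
main = do
  sv <- newSampleVar
  writeSample sv 'a'
  writeSample sv 'b'             -- overwrites 'a' instead of blocking
  readSample sv >>= print        -- prints 'b'
\end{verbatim}

The second \tr{writeSample} is the case that distinguishes a
\tr{SampleVar} from a plain \tr{MVar}: the old value is simply replaced.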
 175
Finally, there are operations to delay a concurrent thread, and to
make one wait for a file descriptor to become ready:\index{delay a concurrent thread}
 178 \index{wait for a file descriptor}
 179 \begin{verbatim}
threadDelay     :: Int -> IO () -- delay rescheduling for N microseconds
threadWaitRead  :: Int -> IO () -- wait until the given file descriptor is readable
threadWaitWrite :: Int -> IO () -- wait until the given file descriptor is writable
 183 \end{verbatim}
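
A small sketch of \tr{threadDelay} in use: two threads sleep for
different lengths of time and then report through a shared \tr{MVar}.
It assumes the present-day \tr{Control.Concurrent} module; the helper
name \tr{finishOrder} and the particular delays are made up.  Since a
delay is only a lower bound on how long a thread sleeps, the order of
arrival is what you would normally expect rather than a guarantee.

\begin{verbatim}
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Hypothetical helper: start two sleepers and collect their reports
-- in the order they arrive.
finishOrder :: IO [String]
finishOrder = do
  box <- newEmptyMVar
  _ <- forkIO (threadDelay 200000 >> putMVar box "slow")  -- 200 ms
  _ <- forkIO (threadDelay  20000 >> putMVar box "fast")  -- 20 ms
  first  <- takeMVar box
  second <- takeMVar box
  return [first, second]

main :: IO ()
main = finishOrder >>= print
\end{verbatim}

On an otherwise idle machine the ``fast'' thread should normally
report first.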
 184
 185 %************************************************************************
 186 %*                                                                      *
 187 \subsubsection{Features specific to Parallel Haskell}
 188 %*                                                                      *
 189 %************************************************************************
 190
 191 %************************************************************************
 192 %*                                                                      *
 193 \subsubsubsection{The \tr{Parallel} interface (recommended)}
 194 \index{Parallel interface}
 195 %*                                                                      *
 196 %************************************************************************
 197
 198 GHC provides two functions for controlling parallel execution, through
 199 the \tr{Parallel} interface:
 200 \begin{verbatim}
 201 interface Parallel where
 202 infixr 0 `par`
 203 infixr 1 `seq`
 204
 205 par :: a -> b -> b
 206 seq :: a -> b -> b
 207 \end{verbatim}
 208
 209 The expression \tr{(x `par` y)} {\em sparks} the evaluation of \tr{x}
 210 (to weak head normal form) and returns \tr{y}.  Sparks are queued for
 211 execution in FIFO order, but are not executed immediately.  At the
 212 next heap allocation, the currently executing thread will yield
 213 control to the scheduler, and the scheduler will start a new thread
 214 (until reaching the active thread limit) for each spark which has not
 215 already been evaluated to WHNF.
 216
 217 The expression \tr{(x `seq` y)} evaluates \tr{x} to weak head normal
 218 form and then returns \tr{y}.  The \tr{seq} primitive can be used to
 219 force evaluation of an expression beyond WHNF, or to impose a desired
 220 execution sequence for the evaluation of an expression.
 221
 222 For example, consider the following parallel version of our old
 223 nemesis, \tr{nfib}:
 224
 225 \begin{verbatim}
 226 import Parallel
 227
 228 nfib :: Int -> Int
 229 nfib n | n <= 1 = 1
 230        | otherwise = par n1 (seq n2 (n1 + n2 + 1))
 231                      where n1 = nfib (n-1)
 232                            n2 = nfib (n-2)
 233 \end{verbatim}
 234
 235 For values of \tr{n} greater than 1, we use \tr{par} to spark a thread
 236 to evaluate \tr{nfib (n-1)}, and then we use \tr{seq} to force the
 237 parent thread to evaluate \tr{nfib (n-2)} before going on to add
 238 together these two subexpressions.  In this divide-and-conquer
 239 approach, we only spark a new thread for one branch of the computation
 240 (leaving the parent to evaluate the other branch).  Also, we must use
 241 \tr{seq} to ensure that the parent will evaluate \tr{n2} {\em before}
 242 \tr{n1} in the expression \tr{(n1 + n2 + 1)}.  It is not sufficient to
 243 reorder the expression as \tr{(n2 + n1 + 1)}, because the compiler may
 244 not generate code to evaluate the addends from left to right.
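
For readers who want to try the same idea on a present-day GHC, here
is a sketch of the equivalent program.  It rests on two assumptions
that go beyond this guide: \tr{par} is imported from \tr{GHC.Conc}
instead of the \tr{Parallel} interface, and the ordering role played
by \tr{seq} above is taken by \tr{pseq}, since current GHC only
promises the evaluation order for \tr{pseq}.

\begin{verbatim}
import GHC.Conc (par, pseq)

nfib :: Int -> Int
nfib n
  | n <= 1    = 1
  | otherwise = n1 `par` (n2 `pseq` (n1 + n2 + 1))
  where
    n1 = nfib (n - 1)   -- sparked: may be evaluated by another thread
    n2 = nfib (n - 2)   -- evaluated by the parent before the addition

main :: IO ()
main = print (nfib 20)  -- prints 21891
\end{verbatim}

The result is the same whether or not any spark is actually picked up;
sparking only affects where the work is done.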
 245
 246 %************************************************************************
 247 %*                                                                      *
 248 \subsubsubsection{Underlying functions and primitives}
 249 \index{parallelism primitives}
 250 \index{primitives for parallelism}
 251 %*                                                                      *
 252 %************************************************************************
 253
 254 The functions \tr{par} and \tr{seq} are wired into GHC, and unfold
 255 into uses of the \tr{par#} and \tr{seq#} primitives, respectively.  If
 256 you'd like to see this with your very own eyes, just run GHC with the
 257 \tr{-ddump-simpl} option.  (Anything for a good time...)
 258
 259 You can use \tr{par} and \tr{seq} in Concurrent Haskell, though
 260 I'm not sure why you would want to.
 261
 262 %************************************************************************
 263 %*                                                                      *
 264 \subsubsection{Features common to Concurrent and Parallel Haskell}
 265 %*                                                                      *
 266 %************************************************************************
 267
 268 Actually, you can use the \tr{`par`} and \tr{`seq`} combinators
 269 (really for Parallel Haskell) in Concurrent Haskell as well.
 270 But doing things like ``\tr{par} to \tr{forkIO} many required threads''
 271 counts as ``jumping out the 9th-floor window, just to see what happens.''
 272
 273 %************************************************************************
 274 %*                                                                      *
 275 \subsubsubsection{Scheduling policy for concurrent/parallel threads}
 276 \index{Scheduling---concurrent/parallel}
 277 \index{Concurrent/parallel scheduling}
 278 %*                                                                      *
 279 %************************************************************************
 280
Runnable threads are scheduled in round-robin fashion.  Context
switches are signalled by the generation of new sparks or by the
expiry of a virtual timer (the timer interval is configurable with the
\tr{-C[<us>]}\index{-C<us> RTS option (concurrent, parallel)} RTS option).
However, a context switch doesn't really happen until the next heap
allocation.  If you want extremely short time slices, use \tr{-C} or
\tr{-C0} to force a context switch at each and every heap
allocation.
 289
 290 When a context switch occurs, pending sparks which have not already
 291 been reduced to weak head normal form are turned into new threads.
 292 However, there is a limit to the number of active threads (runnable or
 293 blocked) which are allowed at any given time.  This limit can be
 294 adjusted with the \tr{-t<num>}\index{-t <num> RTS option (concurrent, parallel)}
 295 RTS option (the default is 32).  Once the
 296 thread limit is reached, any remaining sparks are deferred until some
 297 of the currently active threads are completed.
 298
 299 %************************************************************************
 300 %*                                                                      *
 301 \subsection{How to use Concurrent and Parallel Haskell}
 302 %*                                                                      *
 303 %************************************************************************
 304
 305 [You won't get far unless your GHC system was configured/built with
 306 concurrency and/or parallelism enabled.  (They require separate
 307 library modules.)  The relevant section of the installation guide says
 308 how to do this.]
 309
 310 %************************************************************************
 311 %*                                                                      *
 312 \subsubsection{Using Concurrent Haskell}
 313 \index{Concurrent Haskell---use}
 314 %*                                                                      *
 315 %************************************************************************
 316
 317 To compile a program as Concurrent Haskell, use the \tr{-concurrent}
 318 option,\index{-concurrent option} both when compiling {\em and
 319 linking}.  You will probably need the \tr{-fglasgow-exts} option, too.
 320
 321 Three RTS options are provided for modifying the behaviour of the
 322 threaded runtime system.  See the descriptions of \tr{-C[<us>]}, \tr{-q},
 323 and \tr{-t<num>} in \Sectionref{parallel-rts-opts}.
 324
 325 %************************************************************************
 326 %*                                                                      *
 327 \subsubsubsection[concurrent-problems]{Potential problems with Concurrent Haskell}
 328 \index{Concurrent Haskell problems}
 329 \index{problems, Concurrent Haskell}
 330 %*                                                                      *
 331 %************************************************************************
 332
 333 The main thread in a Concurrent Haskell program is given its own
 334 private stack space, but all other threads are given stack space from
 335 the heap.  Stack space for the main thread can be
 336 adjusted as usual with the \tr{-K} RTS
 337 option,\index{-K RTS option (concurrent, parallel)} but if this
 338 private stack space is exhausted, the main thread will switch to stack
 339 segments in the heap, just like any other thread.  Thus, problems
 340 which would normally result in stack overflow in ``sequential Haskell''
 341 can be expected to result in heap overflow when using threads.
 342
 343 The concurrent runtime system uses black holes as synchronisation
 344 points for subexpressions which are shared among multiple threads.  In
 345 ``sequential Haskell'', a black hole indicates a cyclic data
 346 dependency, which is a fatal error.  However, in concurrent execution, a
 347 black hole may simply indicate that the desired expression is being
 348 evaluated by another thread.  Therefore, when a thread encounters a
 349 black hole, it simply blocks and waits for the black hole to be
 350 updated.  Cyclic data dependencies will result in deadlock, and the
 351 program will fail to terminate.
 352
 353 Because the concurrent runtime system uses black holes as
 354 synchronisation points, it is not possible to disable black-holing
 355 with the \tr{-N} RTS option.\index{-N RTS option} Therefore, the use
 356 of signal handlers (including timeouts) with the concurrent runtime
 357 system can lead to problems if a thread attempts to enter a black hole
 358 that was created by an abandoned computation.  The use of signal
 359 handlers in conjunction with threads is strongly discouraged.
 360
 361
 362 %************************************************************************
 363 %*                                                                      *
 364 \subsubsection{Using Parallel Haskell}
 365 \index{Parallel Haskell---use}
 366 %*                                                                      *
 367 %************************************************************************
 368
 369 [You won't be able to execute parallel Haskell programs unless PVM3
 370 (Parallel Virtual Machine, version 3) is installed at your site.]
 371
 372 To compile a Haskell program for parallel execution under PVM, use the
 373 \tr{-parallel} option,\index{-parallel option} both when compiling
 374 {\em and linking}.  You will probably want to \tr{import Parallel}
 375 into your Haskell modules.
 376
To run your parallel program, once PVM is going, just invoke it ``as
normal''.  The main extra RTS option is \tr{-N<n>}, which says how many
PVM ``processors'' your program should run on.  (For more details of
all relevant RTS options, please see \sectionref{parallel-rts-opts}.)
 381
 382 In truth, running Parallel Haskell programs and getting information
 383 out of them (e.g., parallelism profiles) is a battle with the vagaries of
 384 PVM, detailed in the following sections.
 385
 386 %************************************************************************
 387 %*                                                                      *
 388 \subsubsubsection{Dummy's guide to using PVM}
 389 \index{PVM, how to use}
 390 \index{Parallel Haskell---PVM use}
 391 %*                                                                      *
 392 %************************************************************************
 393
 394 Before you can run a parallel program under PVM, you must set the
 395 required environment variables (PVM's idea, not ours); something like,
 396 probably in your \tr{.cshrc} or equivalent:
 397 \begin{verbatim}
 398 setenv PVM_ROOT /wherever/you/put/it
 399 setenv PVM_ARCH `$PVM_ROOT/lib/pvmgetarch`
 400 setenv PVM_DPATH $PVM_ROOT/lib/pvmd
 401 \end{verbatim}
 402
 403 Creating and/or controlling your ``parallel machine'' is a purely-PVM
 404 business; nothing specific to Parallel Haskell.
 405
 406 You use the \tr{pvm}\index{pvm command} command to start PVM on your
 407 machine.  You can then do various things to control/monitor your
 408 ``parallel machine;'' the most useful being:
 409
 410 \begin{tabular}{ll}
 411 \tr{Control-D} & exit \tr{pvm}, leaving it running \\
 412 \tr{halt} & kill off this ``parallel machine'' \& exit \\
 413 \tr{add <host>} & add \tr{<host>} as a processor \\
 414 \tr{delete <host>} & delete \tr{<host>} \\
 415 \tr{reset}      & kill what's going, but leave PVM up \\
 416 \tr{conf}       & list the current configuration \\
 417 \tr{ps}         & report processes' status \\
 418 \tr{pstat <pid>} & status of a particular process \\
 419 \end{tabular}
 420
 421 The PVM documentation can tell you much, much more about \tr{pvm}!
 422
 423 %************************************************************************
 424 %*                                                                      *
 425 \subsubsection{Parallelism profiles}
 426 \index{parallelism profiles}
 427 \index{profiles, parallelism}
 428 \index{visualisation tools}
 429 %*                                                                      *
 430 %************************************************************************
 431
With Parallel Haskell programs, we usually don't care about the
results, only about ``how parallel'' the run was!  We want pretty pictures.
 434
 435 Parallelism profiles (\`a la \tr{hbcpp}) can be generated with the
 436 \tr{-q}\index{-q RTS option (concurrent, parallel)} RTS option.  The
 437 per-processor profiling info is dumped into files named
 438 \tr{<full-path><program>.gr}.  These are then munged into a PostScript picture,
 439 which you can then display.  For example, to run your program
 440 \tr{a.out} on 8 processors, then view the parallelism profile, do:
 441
 442 \begin{verbatim}
 443 % ./a.out +RTS -N8 -q
 444 % grs2gr *.???.gr > temp.gr     # combine the 8 .gr files into one
 445 % gr2ps -O temp.gr              # cvt to .ps; output in temp.ps
 446 % ghostview -seascape temp.ps   # look at it!
 447 \end{verbatim}
 448
 449 The scripts for processing the parallelism profiles are distributed
 450 in \tr{ghc/utils/parallel/}.
 451
 452 %$$************************************************************************
 453 %$$*                                                                      *
 454 %$$\subsubsection{Activity profiles}
 455 %$$\index{activity profiles}
 456 %$$\index{profiles, activity}
 457 %$$\index{visualisation tools}
 458 %$$%$$*                                                                      *
 459 %$$%$$************************************************************************
 460 %$$
 461 %$$You can also use the standard GHC ``cost-centre'' profiling to see how
 462 %$$much time each PVM ``processor'' spends
 463 %$$
 464 %$$No special compilation flags beyond \tr{-parallel} are required to get
 465 %$$this basic four-activity profile.  Just use the \tr{-P} RTS option,
 466 %$$thusly:
 467 %$$\begin{verbatim}
 468 %$$./a.out +RTS -N7 -P  # 7 processors
 469 %$$\end{verbatim}
 470 %$$
 471 %$$The above will create files named \tr{<something>.prof} and/or
 472 %$$\tr{<something>.time} {\em in your home directory}.  You can
 473 %$$process the \tr{.time} files into PostScript using \tr{hp2ps},
 474 %$$\index{hp2ps}
 475 %$$as described elsewhere in this guide.
 476 %$$
 477 %$$Because of the weird file names, you probably need to use
 478 %$$\tr{hp2ps} as a filter.  Also, you probably want to give \tr{hp2ps}
 479 %$$a \tr{-t0} flag, so that no ``inconsequential'' data is ignored---in
 480 %$$parallel-land it's all consequential.  So:
 481 %$$\begin{verbatim}
 482 %$$%$$ hp2ps -t0 < fooo.001.time > temp.ps
 483 %$$\end{verbatim}
 484 %$$
 485 %$$ The first line of the
 486 %$$ \tr{.qp} file contains the name of the program executed, along with
 487 %$$ any program arguments and thread-specific RTS options.  The second
 488 %$$ line contains the date and time of program execution.  The third
 489 %$$ and subsequent lines contain information about thread state transitions.
 490 %$$
 491 %$$ The thread state transition lines have the following format:
 492 %$$ \begin{verbatim}
 493 %$$ time transition thread-id thread-name [thread-id thread-name]
 494 %$$ \end{verbatim}
 495 %$$
 496 %$$ The \tr{time} is the virtual time elapsed since the program started
 497 %$$ execution, in milliseconds.  The \tr{transition} is a two-letter code
 498 %$$ indicating the ``from'' queue and the ``to'' queue, where each queue
 499 %$$ is one of:
 500 %$$ \begin{itemize}
 501 %$$ \item[\tr{*}] Void: Thread creation or termination.
 502 %$$ \item[\tr{G}] Green: Runnable (or actively running, with \tr{-qv}) threads.
 503 %$$ \item[\tr{A}] Amber: Runnable threads (\tr{-qv} only).
 504 %$$ \item[\tr{R}] Red: Blocked threads.
 505 %$$ \end{itemize}
 506 %$$ The \tr{thread-id} is a unique integer assigned to each thread.  The
 507 %$$ \tr{thread-name} is currently the address of the thread's root closure
 508 %$$ (in hexadecimal).  In the future, it will be the name of the function
 509 %$$ associated with the root of the thread.
 510 %$$
 511 %$$ The first \tr{(thread-id, thread-name)} pair identifies the thread
 512 %$$ involved in the indicated transition.  For \tr{RG} and \tr{RA} transitions
 513 %$$ only, there is a second \tr{(thread-id, thread-name)} pair which identifies
 514 %$$ the thread that released the blocked thread.
 515 %$$
 516 %$$ Provided with the GHC distribution is a perl script, \tr{qp2pp}, which
 517 %$$ will convert \tr{.qp} files to \tr{hbcpp}'s \tr{.pp} format, so that
 518 %$$ you can use the \tr{hbcpp} profiling tools, such as \tr{pp2ps92}.  The
 519 %$$ \tr{.pp} format has undergone many changes, so the conversion script
 520 %$$ is not compatible with earlier releases of \tr{hbcpp}.  Note that GHC
 521 %$$ and \tr{hbcpp} use different thread scheduling policies (in
 522 %$$ particular, \tr{hbcpp} threads never move from the green queue to the
 523 %$$ amber queue).  For compatibility, the \tr{qp2pp} script eliminates the
 524 %$$ GHC amber queue, so there is no point in using the verbose (\tr{-qv})
 525 %$$ option if you are only interested in using the \tr{hbcpp} profiling
 526 %$$ tools.
 527
 528 %************************************************************************
 529 %*                                                                      *
 530 \subsubsection{Other useful info about running parallel programs}
 531 %*                                                                      *
 532 %************************************************************************
 533
The ``garbage-collection statistics'' RTS options can be useful for
seeing what parallel programs are doing.  If you use either
\tr{+RTS -Sstderr}\index{-Sstderr RTS option} or \tr{+RTS -sstderr}, then
you'll get mutator, garbage-collection, etc., times on standard
error.  The standard error of all PEs (processing elements) other than
the `main thread' appears in \tr{/tmp/pvml.nnn}, courtesy of PVM.
 540
 541 Whether doing \tr{+RTS -Sstderr} or not, a handy way to watch
 542 what's happening overall is: \tr{tail -f /tmp/pvml.nnn}.
 543
 544 %************************************************************************
 545 %*                                                                      *
 546 \subsubsection[parallel-rts-opts]{RTS options for Concurrent/Parallel Haskell}
 547 \index{RTS options, concurrent}
 548 \index{RTS options, parallel}
 549 \index{Concurrent Haskell---RTS options}
 550 \index{Parallel Haskell---RTS options}
 551 %*                                                                      *
 552 %************************************************************************
 553
 554 Besides the usual runtime system (RTS) options
 555 (\sectionref{runtime-control}), there are a few options particularly
 556 for concurrent/parallel execution.
 557
 558 \begin{description}
 559 \item[\tr{-N<N>}:]
 560 \index{-N<N> RTS option (parallel)}
 561 (PARALLEL ONLY) Use \tr{<N>} PVM processors to run this program;
 562 the default is 2.
 563
 564 \item[\tr{-C[<us>]}:]
 565 \index{-C<us> RTS option}
 566 Sets the context switch interval to \pl{<us>} microseconds.  A context
 567 switch will occur at the next heap allocation after the timer expires.
 568 With \tr{-C0} or \tr{-C}, context switches will occur as often as
 569 possible (at every heap allocation).  By default, context switches
 570 occur every 10 milliseconds.  Note that many interval timers are only
 571 capable of 10 millisecond granularity, so the default setting may be
 572 the finest granularity possible, short of a context switch at every
 573 heap allocation.
 574
 575 \item[\tr{-q[v]}:]
 576 \index{-q RTS option}
 577 Produce a quasi-parallel profile of thread activity, in the file
 578 \tr{<program>.qp}.  In the style of \tr{hbcpp}, this profile records
 579 the movement of threads between the green (runnable) and red (blocked)
 580 queues.  If you specify the verbose suboption (\tr{-qv}), the green
 581 queue is split into green (for the currently running thread only) and
 582 amber (for other runnable threads).  We do not recommend that you use
 583 the verbose suboption if you are planning to use the \tr{hbcpp}
 584 profiling tools or if you are context switching at every heap check
 585 (with \tr{-C}).
 586
 587 \item[\tr{-t<num>}:]
 588 \index{-t<num> RTS option}
 589 Limit the number of concurrent threads per processor to \pl{<num>}.
 590 The default is 32.  Each thread requires slightly over 1K {\em words}
 591 in the heap for thread state and stack objects.  (For 32-bit machines,
 592 this translates to 4K bytes, and for 64-bit machines, 8K bytes.)
 593
 594 \item[\tr{-d}:]
 595 \index{-d RTS option (parallel)}
 596 (PARALLEL ONLY) Turn on debugging.  It pops up one xterm (or GDB, or
 597 something...) per PVM processor.  We use the standard \tr{debugger}
 598 script that comes with PVM3, but we sometimes meddle with the
 599 \tr{debugger2} script.  We include ours in the GHC distribution,
 600 in \tr{ghc/utils/pvm/}.
 601
 602 \item[\tr{-e<num>}:]
 603 \index{-e<num> RTS option (parallel)}
 604 (PARALLEL ONLY) Limit the number of pending sparks per processor to
 605 \tr{<num>}. The default is 100. A larger number may be appropriate if
 606 your program generates large amounts of parallelism initially.
 607
 608 \item[\tr{-Q<num>}:]
 609 \index{-Q<num> RTS option (parallel)}
 610 (PARALLEL ONLY) Set the size of packets transmitted between processors
 611 to \tr{<num>}. The default is 1024 words. A larger number may be
 612 appropriate if your machine has a high communication cost relative to
 613 computation speed.
 614 \end{description}
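
The memory figures quoted for \tr{-t<num>} can be turned into a quick
estimate of how much heap the thread limit may commit.  The sketch
below does the arithmetic; the function name \tr{threadHeapBytes} is
made up, and because each thread needs ``slightly over'' 1K words, the
numbers are lower bounds.

\begin{verbatim}
-- Hypothetical helper: lower bound on the heap bytes taken up by
-- thread state when the -t limit is reached.
threadHeapBytes :: Int -> Int -> Int
threadHeapBytes threads bytesPerWord = threads * wordsPerThread * bytesPerWord
  where
    wordsPerThread = 1024   -- "slightly over 1K words" per thread

main :: IO ()
main = do
  print (threadHeapBytes 32 4)   -- default limit, 32-bit machine: 131072
  print (threadHeapBytes 32 8)   -- default limit, 64-bit machine: 262144
\end{verbatim}

So with the default limit of 32 threads, thread state alone can take
at least 128K bytes of heap on a 32-bit machine.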
 615
 616 %************************************************************************
 617 %*                                                                      *
 618 \subsubsubsection[parallel-problems]{Potential problems with Parallel Haskell}
 619 \index{Parallel Haskell---problems}
 620 \index{problems, Parallel Haskell}
 621 %*                                                                      *
 622 %************************************************************************
 623
 624 The ``Potential problems'' for Concurrent Haskell also apply for
 625 Parallel Haskell.  Please see \Sectionref{concurrent-problems}.
 626
 627 %$$ \subsubsubsection[par-notes]{notes for 0.26}
 628 %$$
 629 %$$ \begin{verbatim}
 630 %$$ Install PVM somewhere, as it says.  We use 3.3
 631 %$$
 632 %$$ pvm.h : can do w/ a link from ghc/includes to its true home (???)
 633 %$$
 634 %$$
 635 %$$ ghc -gum ... => a.out
 636 %$$
 637 %$$     a.out goes to $PVM_ROOT/bin/$PVM_ARCH/$PE
 638 %$$
 639 %$$     (profiling outputs go to ~/$PE.<process-num>.<suffix>)
 640 %$$
 641 %$$     trinder scripts in: ~trinder/bin/any/instPHIL
 642 %$$
 643 %$$ To run:
 644 %$$
 645 %$$     Then:
 646 %$$     SysMan [-] N (PEs) args-to-program...
 647 %$$
 648 %$$         - ==> debug mode
 649 %$$                 mattson setup: GDB window per task
 650 %$$                 /local/grasp_tmp5/mattson/pvm3/lib/debugger{,2}
 651 %$$
 652 %$$                 to set breakpoint, etc, before "run", just modify debugger2
 653 %$$
 654 %$$     stderr and stdout are directed to /tmp/pvml.NNN
 655 %$$
 656 %$$ Visualisation stuff (normal _mp build):
 657 %$$
 658 %$$ +RTS -q         gransim-like profiling
 659 %$$                 (should use exactly-gransim RTS options)
 660 %$$      -qb        binary dumps : not tried, not recommended: hosed!
 661 %$$
 662 %$$     ascii dump : same info as gransim, one extra line at top w/
 663 %$$                 start time; all times are ms since then
 664 %$$
 665 %$$     dumps appear in $HOME/<program>.nnn.gr
 666 %$$
 667 %$$ ~mattson/grs2gr.pl == combine lots into one (fixing times)
 668 %$$
 669 %$$ /local/grasp/hwloidl/GrAn/bin/ is where scripts are.
 670 %$$
 671 %$$ gr2ps == activity profile (bash script)
 672 %$$
 673 %$$ ~mattson/bin/`arch`/gr2qp must be picked up prior to hwloidl's for
 674 %$$ things to work...
 675 %$$
 676 %$$ +RTS -[Pp]      (parallel) 4-cost-centre "profiling" (gc,MAIN,msg,idle)
 677 %$$
 678 %$$         ToDos: time-profiles from hp2ps: something about zeroth sample;
 679 %$$ \end{verbatim}