% both concurrent and parallel
%************************************************************************

\section[concurrent-and-parallel]{Concurrent and Parallel Haskell}
\index{Concurrent Haskell}
\index{Parallel Haskell}

%************************************************************************

Concurrent and Parallel Haskell are Glasgow extensions to Haskell
which let you structure your program as a group of independent
`threads'.

Concurrent and Parallel Haskell have very different purposes.
Concurrent Haskell is for applications which have an inherent
structure of interacting, concurrent tasks (i.e., `threads').  Threads
in such programs may be {\em required}.  For example, if a concurrent
thread has been spawned to handle a mouse click, it isn't
optional---the user wants something done!

A Concurrent Haskell program implies multiple `threads' running within
a single Unix process on a single processor.

Simon Peyton Jones and Sigbjorn Finne have a paper available,
``Concurrent Haskell: preliminary version''
(draft available via \tr{ftp}
from \tr{ftp.dcs.gla.ac.uk/pub/glasgow-fp/drafts}).

Parallel Haskell is about {\em speed}---spawning threads onto multiple
processors so that your program will run faster.  The `threads'
are always {\em advisory}---if the runtime system thinks it can
get the job done more quickly by sequential execution, then fine.

A Parallel Haskell program implies multiple processes running on
multiple processors, under a PVM (Parallel Virtual Machine) framework.

Parallel Haskell is new with GHC 0.26; it is more about ``research
fun'' than about ``speed.''  That will change.  There is no paper about
Parallel Haskell.  That will change, too.

Some details about Concurrent and Parallel Haskell follow.
%************************************************************************

\subsection{Concurrent and Parallel Haskell---language features}
\index{Concurrent Haskell---features}
\index{Parallel Haskell---features}

%************************************************************************

%************************************************************************

\subsubsection{Features specific to Concurrent Haskell}

%************************************************************************

%************************************************************************

\subsubsubsection{The \tr{Concurrent} interface (recommended)}
\index{Concurrent interface}

%************************************************************************

GHC provides a \tr{Concurrent} module, a common interface to a
collection of useful concurrency abstractions, including those
mentioned in the ``concurrent paper''.

Just put \tr{import Concurrent} into your modules, and away you go.
NB: intended for use with the \tr{-fhaskell-1.3} flag.

To create a ``required thread'':
\begin{verbatim}
forkIO :: IO a -> IO a
\end{verbatim}
The \tr{Concurrent} interface also provides access to ``I-Vars''
and ``M-Vars'', which are two flavours of {\em synchronising variables}.
\index{synchronising variables (Glasgow extension)}
\index{concurrency -- synchronising variables}

\tr{_IVars}\index{_IVars (Glasgow extension)} are write-once
variables.  They start out empty, and any threads that attempt to read
them will block until they are filled.  Once they are written, any
blocked threads are freed, and additional reads are permitted.
Attempting to write a value to a full \tr{_IVar} results in a runtime
error.
\begin{verbatim}
type IVar a = _IVar a -- more convenient name

newIVar   :: IO (_IVar a)
readIVar  :: _IVar a -> IO a
writeIVar :: _IVar a -> a -> IO ()
\end{verbatim}
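The write-once behaviour can be sketched on top of an \tr{_MVar}.  The
following is a hypothetical emulation for illustration only---the primed
names are ours, not part of the \tr{Concurrent} interface, and the sketch
uses the later \tr{Control.Concurrent.MVar} module so that it actually runs:

```haskell
import Control.Concurrent.MVar

-- Hypothetical write-once variable, emulated with an MVar.
newtype IVar' a = IVar' (MVar a)

newIVar' :: IO (IVar' a)
newIVar' = fmap IVar' newEmptyMVar

-- Blocks until the variable is written; readMVar leaves the value in
-- place, so additional reads are permitted, as for a real _IVar.
readIVar' :: IVar' a -> IO a
readIVar' (IVar' m) = readMVar m

-- Writing to an already-full variable is a runtime error.
writeIVar' :: IVar' a -> a -> IO ()
writeIVar' (IVar' m) v = do
  ok <- tryPutMVar m v
  if ok then return () else error "writeIVar': already written"
```

The key point of the emulation is that reads never empty the variable,
which is exactly what distinguishes an I-Var from an M-Var.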
\tr{_MVars}\index{_MVars (Glasgow extension)} are rendezvous points,
mostly for concurrent threads.  They begin empty, and any attempt to
read an empty \tr{_MVar} blocks.  When an \tr{_MVar} is written, a
single blocked thread may be freed.  Reading an \tr{_MVar} toggles its
state from full back to empty.  Therefore, any value written to an
\tr{_MVar} may only be read once.  Multiple reads and writes are
allowed, but there must be at least one read between any two
writes.
\begin{verbatim}
type MVar a = _MVar a -- more convenient name

newEmptyMVar :: IO (_MVar a)
newMVar      :: a -> IO (_MVar a)
takeMVar     :: _MVar a -> IO a
putMVar      :: _MVar a -> a -> IO ()
readMVar     :: _MVar a -> IO a
swapMVar     :: _MVar a -> a -> IO a
\end{verbatim}
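For instance, a parent thread can use an \tr{_MVar} to collect a result
from a child spawned with \tr{forkIO}.  This runnable sketch uses the
later \tr{Control.Concurrent} module, where the operation names are the
same as above (though \tr{forkIO} there returns a thread identifier
rather than the spawned action's result):

```haskell
import Control.Concurrent

main :: IO ()
main = do
  result <- newEmptyMVar
  -- Spawn a child thread; it fills the MVar when its work is done.
  _ <- forkIO (putMVar result (sum [1 .. 100 :: Int]))
  -- takeMVar blocks until the child has written, then empties the MVar.
  v <- takeMVar result
  print v
```

Because \tr{takeMVar} blocks on an empty \tr{_MVar}, the parent cannot
race ahead of the child: the scheduling order of the two threads does not
affect the answer.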
A {\em channel variable} (@CVar@) is a one-element channel, as
described in the paper:
\begin{verbatim}
newCVar :: IO (CVar a)
putCVar :: CVar a -> a -> IO ()
getCVar :: CVar a -> IO a
\end{verbatim}

A @Channel@ is an unbounded channel:
\begin{verbatim}
newChan         :: IO (Chan a)
putChan         :: Chan a -> a -> IO ()
getChan         :: Chan a -> IO a
dupChan         :: Chan a -> IO (Chan a)
unGetChan       :: Chan a -> a -> IO ()
getChanContents :: Chan a -> IO [a]
\end{verbatim}
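A producer/consumer pair is the classic use of an unbounded channel.  In
later GHC libraries the write and read operations are named
\tr{writeChan} and \tr{readChan} rather than \tr{putChan} and
\tr{getChan}; the runnable sketch below uses those later names:

```haskell
import Control.Concurrent
import Control.Concurrent.Chan
import Control.Monad (forM_, replicateM)

-- One producer feeds the channel; the consumer blocks on each read
-- until an item is available.  The channel is FIFO, so with a single
-- producer the output order is deterministic.
main :: IO ()
main = do
  ch <- newChan
  _  <- forkIO (forM_ [1 .. 5 :: Int] (writeChan ch))
  xs <- replicateM 5 (readChan ch)
  print xs                       -- prints [1,2,3,4,5]
```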
General and quantity semaphores:
\begin{verbatim}
newQSem    :: Int -> IO QSem
waitQSem   :: QSem -> IO ()
signalQSem :: QSem -> IO ()

newQSemN    :: Int -> IO QSemN
signalQSemN :: QSemN -> Int -> IO ()
waitQSemN   :: QSemN -> Int -> IO ()
\end{verbatim}
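A small sketch of the blocking behaviour: a semaphore created with no
units makes \tr{waitQSem} block until another thread signals.  The
operation names are as above; the runnable version uses the later
\tr{Control.Concurrent.QSem} module:

```haskell
import Control.Concurrent
import Control.Concurrent.QSem

main :: IO ()
main = do
  sem  <- newQSem 0            -- no units available initially
  done <- newEmptyMVar
  _ <- forkIO $ do
         waitQSem sem          -- blocks until the parent signals a unit
         putMVar done "got the unit"
  signalQSem sem               -- release one unit, waking the child
  msg <- takeMVar done
  putStrLn msg
```

Quantity semaphores (\tr{QSemN}) generalise this by letting a thread
wait for, or signal, several units at once.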
Merging streams---binary and n-ary:
\begin{verbatim}
mergeIO  :: [a] -> [a] -> IO [a]
nmergeIO :: [[a]] -> IO [a]
\end{verbatim}
A {\em Sample variable} (@SampleVar@) is slightly different from a
normal @_MVar@:
\begin{itemize}
\item Reading an empty @SampleVar@ causes the reader to block
(same as @takeMVar@ on an empty @_MVar@).
\item Reading a filled @SampleVar@ empties it and returns the value
(same as @takeMVar@ on a full @_MVar@).
\item Writing to an empty @SampleVar@ fills it with a value, and
potentially wakes up a blocked reader (same as @putMVar@ on an empty @_MVar@).
\item Writing to a filled @SampleVar@ overwrites the current value
(different from @putMVar@ on a full @_MVar@).
\end{itemize}
\begin{verbatim}
type SampleVar a = _MVar (Int, _MVar a)

emptySampleVar :: SampleVar a -> IO ()
newSampleVar   :: IO (SampleVar a)
readSample     :: SampleVar a -> IO a
writeSample    :: SampleVar a -> a -> IO ()
\end{verbatim}
Finally, there are operations to delay a concurrent thread, and to
make one wait:\index{delay a concurrent thread}
\index{wait for a file descriptor}
\begin{verbatim}
threadDelay :: Int -> IO () -- delay rescheduling for N microseconds
threadWait  :: Int -> IO () -- wait for input on specified file descriptor
\end{verbatim}
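For example, a child thread can sleep for a while before handing a value
back to its parent.  A runnable sketch (again against the later
\tr{Control.Concurrent}, where \tr{threadDelay} has the same name and
meaning):

```haskell
import Control.Concurrent

main :: IO ()
main = do
  done <- newEmptyMVar
  _ <- forkIO $ do
         threadDelay 100000    -- yield for ~0.1 s before proceeding
         putMVar done ()
  takeMVar done                -- parent blocks for roughly that long
  putStrLn "child finished"
```

Note that \tr{threadDelay} only delays rescheduling of the calling
thread; other runnable threads continue as normal in the meantime.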
%************************************************************************

\subsubsection{Features specific to Parallel Haskell}

%************************************************************************

%************************************************************************

\subsubsubsection{The \tr{Parallel} interface (recommended)}
\index{Parallel interface}

%************************************************************************

GHC provides two functions for controlling parallel execution, through
the \tr{Parallel} interface:
\begin{verbatim}
interface Parallel where

par :: a -> b -> b
seq :: a -> b -> b
\end{verbatim}
The expression \tr{(x `par` y)} {\em sparks} the evaluation of \tr{x}
(to weak head normal form) and returns \tr{y}.  Sparks are queued for
execution in FIFO order, but are not executed immediately.  At the
next heap allocation, the currently executing thread will yield
control to the scheduler, and the scheduler will start a new thread
(until reaching the active thread limit) for each spark which has not
already been evaluated to WHNF.
The expression \tr{(x `seq` y)} evaluates \tr{x} to weak head normal
form and then returns \tr{y}.  Applied to well-chosen subexpressions,
\tr{seq} can be used to force evaluation of an expression beyond the
WHNF that lazy evaluation would otherwise produce, or to impose a
desired execution sequence for the evaluation of an expression.
For example, consider the following parallel version of our old
friend \tr{nfib}:
\begin{verbatim}
nfib :: Int -> Int
nfib n
  | n <= 1    = 1
  | otherwise = par n1 (seq n2 (n1 + n2 + 1))
  where n1 = nfib (n-1)
        n2 = nfib (n-2)
\end{verbatim}
For values of \tr{n} greater than 1, we use \tr{par} to spark a thread
to evaluate \tr{nfib (n-1)}, and then we use \tr{seq} to force the
parent thread to evaluate \tr{nfib (n-2)} before going on to add
together these two subexpressions.  In this divide-and-conquer
approach, we only spark a new thread for one branch of the computation
(leaving the parent to evaluate the other branch).  Also, we must use
\tr{seq} to ensure that the parent will evaluate \tr{n2} {\em before}
\tr{n1} in the expression \tr{(n1 + n2 + 1)}.  It is not sufficient to
reorder the expression as \tr{(n2 + n1 + 1)}, because the compiler may
not generate code to evaluate the addends from left to right.
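As a self-contained version of the example: with a later GHC, \tr{par}
is exported from \tr{GHC.Conc} (and \tr{seq} is in the Prelude), so the
definition above compiles with only a change of import.  Because sparks
are advisory, the answer is the same whether or not the sparks are ever
picked up:

```haskell
import GHC.Conc (par)  -- `seq` comes from the Prelude

nfib :: Int -> Int
nfib n
  | n <= 1    = 1
  | otherwise = n1 `par` (n2 `seq` (n1 + n2 + 1))
  where
    n1 = nfib (n - 1)    -- sparked for possible parallel evaluation
    n2 = nfib (n - 2)    -- forced in the parent thread first

main :: IO ()
main = print (nfib 20)   -- 21891, with or without parallel evaluation
```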
%************************************************************************

\subsubsubsection{Underlying functions and primitives}
\index{parallelism primitives}
\index{primitives for parallelism}

%************************************************************************

The functions \tr{par} and \tr{seq} are really just renamings:
\begin{verbatim}
par a b = _par_ a b
seq a b = _seq_ a b
\end{verbatim}
The functions \tr{_par_} and \tr{_seq_} are built into GHC, and unfold
into uses of the \tr{par#} and \tr{seq#} primitives, respectively.  If
you'd like to see this with your very own eyes, just run GHC with the
\tr{-ddump-simpl} option.  (Anything for a good time...)

You can use \tr{_par_} and \tr{_seq_} in Concurrent Haskell, though
I'm not sure why you would want to.

%************************************************************************

\subsubsection{Features common to Concurrent and Parallel Haskell}

%************************************************************************

Actually, you can use the \tr{`par`} and \tr{`seq`} combinators
(really for Parallel Haskell) in Concurrent Haskell as well.
But doing things like ``\tr{par} to \tr{forkIO} many required threads''
counts as ``jumping out the 9th-floor window, just to see what happens.''
%************************************************************************

\subsubsubsection{Scheduling policy for concurrent/parallel threads}
\index{Scheduling---concurrent/parallel}
\index{Concurrent/parallel scheduling}

%************************************************************************

Runnable threads are scheduled in round-robin fashion.  Context
switches are signalled by the generation of new sparks or by the
expiry of a virtual timer (the timer interval is configurable with the
\tr{-C[<num>]}\index{-C<num> RTS option (concurrent, parallel)} RTS option).
However, a context switch doesn't really happen until the next heap
allocation.  If you want extremely short time slices, the \tr{-C} RTS
option can be used to force a context switch at each and every heap
allocation.
When a context switch occurs, pending sparks which have not already
been reduced to weak head normal form are turned into new threads.
However, there is a limit to the number of active threads (runnable or
blocked) which are allowed at any given time.  This limit can be
adjusted with the \tr{-t<num>}\index{-t <num> RTS option (concurrent, parallel)}
RTS option (the default is 32).  Once the
thread limit is reached, any remaining sparks are deferred until some
of the currently active threads have completed.
%************************************************************************

\subsection{How to use Concurrent and Parallel Haskell}

%************************************************************************

[You won't get far unless your GHC system was configured/built with
concurrency and/or parallelism enabled.  (They require separate
library modules.)  The relevant section of the installation guide says
how to do this.]
%************************************************************************

\subsubsection{Using Concurrent Haskell}
\index{Concurrent Haskell---use}

%************************************************************************

To compile a program as Concurrent Haskell, use the \tr{-concurrent}
option,\index{-concurrent option} both when compiling {\em and
linking}.  You will probably need the \tr{-fglasgow-exts} option, too.

Three RTS options are provided for modifying the behaviour of the
threaded runtime system.  See the descriptions of \tr{-C[<us>]}, \tr{-q},
and \tr{-t<num>} in \Sectionref{parallel-rts-opts}.

%************************************************************************

\subsubsubsection[concurrent-problems]{Potential problems with Concurrent Haskell}
\index{Concurrent Haskell problems}
\index{problems, Concurrent Haskell}

%************************************************************************
The main thread in a Concurrent Haskell program is given its own
private stack space, but all other threads are given stack space from
the heap.  Stack space for the main thread can be
adjusted as usual with the \tr{-K} RTS
option,\index{-K RTS option (concurrent, parallel)} but if this
private stack space is exhausted, the main thread will switch to stack
segments in the heap, just like any other thread.  Thus, problems
which would normally result in stack overflow in ``sequential Haskell''
can be expected to result in heap overflow when using threads.

The concurrent runtime system uses black holes as synchronisation
points for subexpressions which are shared among multiple threads.  In
``sequential Haskell'', a black hole indicates a cyclic data
dependency, which is a fatal error.  However, in concurrent execution, a
black hole may simply indicate that the desired expression is being
evaluated by another thread.  Therefore, when a thread encounters a
black hole, it simply blocks and waits for the black hole to be
updated.  Cyclic data dependencies will result in deadlock, and the
program will fail to terminate.

Because the concurrent runtime system uses black holes as
synchronisation points, it is not possible to disable black-holing
with the \tr{-N} RTS option.\index{-N RTS option}  Therefore, the use
of signal handlers (including timeouts) with the concurrent runtime
system can lead to problems if a thread attempts to enter a black hole
that was created by an abandoned computation.  The use of signal
handlers in conjunction with threads is strongly discouraged.
%************************************************************************

\subsubsection{Using Parallel Haskell}
\index{Parallel Haskell---use}

%************************************************************************

[You won't be able to execute parallel Haskell programs unless PVM3
(Parallel Virtual Machine, version 3) is installed at your site.]
To compile a Haskell program for parallel execution under PVM, use the
\tr{-parallel} option,\index{-parallel option} both when compiling
{\em and linking}.  You will probably want to \tr{import Parallel}
into your Haskell modules.

To run your parallel program, once PVM is going, just invoke it ``as
normal''.  The main extra RTS option is \tr{-N<n>}, to say how many
PVM ``processors'' your program is to run on.  (For more details of
all relevant RTS options, please see \sectionref{parallel-rts-opts}.)
In truth, running Parallel Haskell programs and getting information
out of them (e.g., activity profiles) is a battle with the vagaries of
PVM, detailed in the following sections.

For example: the stdout and stderr from your parallel program run will
appear in a log file, called something like \tr{/tmp/pvml.NNN}.

%************************************************************************

\subsubsubsection{Dummy's guide to using PVM}
\index{PVM, how to use}
\index{Parallel Haskell---PVM use}

%************************************************************************

Before you can run a parallel program under PVM, you must set the
required environment variables (PVM's idea, not ours); something like,
probably in your \tr{.cshrc} or equivalent:
\begin{verbatim}
setenv PVM_ROOT /wherever/you/put/it
setenv PVM_ARCH `$PVM_ROOT/lib/pvmgetarch`
setenv PVM_DPATH $PVM_ROOT/lib/pvmd
\end{verbatim}
Creating and/or controlling your ``parallel machine'' is a purely-PVM
business; nothing specific to Parallel Haskell.

You use the \tr{pvm}\index{pvm command} command to start PVM on your
machine.  You can then do various things to control/monitor your
``parallel machine;'' the most useful being:
\begin{tabular}{ll}
\tr{Control-D}     & exit \tr{pvm}, leaving it running \\
\tr{halt}          & kill off this ``parallel machine'' \& exit \\
\tr{add <host>}    & add \tr{<host>} as a processor \\
\tr{delete <host>} & delete \tr{<host>} \\
\tr{reset}         & kill what's going, but leave PVM up \\
\tr{conf}          & list the current configuration \\
\tr{ps}            & report processes' status \\
\tr{pstat <pid>}   & status of a particular process \\
\end{tabular}

The PVM documentation can tell you much, much more about \tr{pvm}!
%************************************************************************

\subsubsection{Parallelism profiles}
\index{parallelism profiles}
\index{profiles, parallelism}
\index{visualisation tools}

%************************************************************************

With Parallel Haskell programs, we usually don't care about the
results---only about ``how parallel'' it was!  We want pretty pictures.
Parallelism profiles (\`a la \tr{hbcpp}) can be generated with the
\tr{-q}\index{-q RTS option (concurrent, parallel)} RTS option.  The
per-processor profiling info is dumped into files {\em in your home
directory} named \tr{<program>.gr}.  These are then munged into a
PostScript picture, which you can then display.  For example,
to run your program \tr{a.out} on 8 processors, then view the
parallelism profile, do:
\begin{verbatim}
% ./a.out +RTS -N8 -q
% cd                          # to home directory
% grs2gr *.???.gr             # combine the 8 .gr files into one
% gr2ps -O temp.gr            # cvt to .ps; output in temp.ps
% ghostview -seascape temp.ps # look at it!
\end{verbatim}
The scripts for processing the parallelism profiles are distributed
in \tr{ghc/utils/parallel/}.

%************************************************************************

\subsubsection{Activity profiles}
\index{activity profiles}
\index{profiles, activity}
\index{visualisation tools}

%************************************************************************
You can also use the standard GHC ``cost-centre'' profiling to see how
much time each PVM ``processor'' spends on its four basic activities
(garbage collection, main computation, message handling, and idling).

No special compilation flags beyond \tr{-parallel} are required to get
this basic four-activity profile.  Just use the \tr{-P} RTS option,
thus:
\begin{verbatim}
./a.out +RTS -N7 -P    # 7 processors
\end{verbatim}
The above will create files named \tr{<something>.prof} and/or
\tr{<something>.time} {\em in your home directory}.  You can
process the \tr{.time} files into PostScript using \tr{hp2ps},
as described elsewhere in this guide.

Because of the weird file names, you probably need to use
\tr{hp2ps} as a filter.  Also, you probably want to give \tr{hp2ps}
a \tr{-t0} flag, so that no ``inconsequential'' data is ignored---in
parallel-land it's all consequential.  So:
\begin{verbatim}
% hp2ps -t0 < fooo.001.time > temp.ps
\end{verbatim}
%$$ The first line of the
%$$ \tr{.qp} file contains the name of the program executed, along with
%$$ any program arguments and thread-specific RTS options.  The second
%$$ line contains the date and time of program execution.  The third
%$$ and subsequent lines contain information about thread state transitions.

%$$ The thread state transition lines have the following format:

%$$ time transition thread-id thread-name [thread-id thread-name]

%$$ The \tr{time} is the virtual time elapsed since the program started
%$$ execution, in milliseconds.  The \tr{transition} is a two-letter code
%$$ indicating the ``from'' queue and the ``to'' queue, where each queue

%$$ \item[\tr{*}] Void: Thread creation or termination.
%$$ \item[\tr{G}] Green: Runnable (or actively running, with \tr{-qv}) threads.
%$$ \item[\tr{A}] Amber: Runnable threads (\tr{-qv} only).
%$$ \item[\tr{R}] Red: Blocked threads.

%$$ The \tr{thread-id} is a unique integer assigned to each thread.  The
%$$ \tr{thread-name} is currently the address of the thread's root closure
%$$ (in hexadecimal).  In the future, it will be the name of the function
%$$ associated with the root of the thread.

%$$ The first \tr{(thread-id, thread-name)} pair identifies the thread
%$$ involved in the indicated transition.  For \tr{RG} and \tr{RA} transitions
%$$ only, there is a second \tr{(thread-id, thread-name)} pair which identifies
%$$ the thread that released the blocked thread.

%$$ Provided with the GHC distribution is a perl script, \tr{qp2pp}, which
%$$ will convert \tr{.qp} files to \tr{hbcpp}'s \tr{.pp} format, so that
%$$ you can use the \tr{hbcpp} profiling tools, such as \tr{pp2ps92}.  The
%$$ \tr{.pp} format has undergone many changes, so the conversion script
%$$ is not compatible with earlier releases of \tr{hbcpp}.  Note that GHC
%$$ and \tr{hbcpp} use different thread scheduling policies (in
%$$ particular, \tr{hbcpp} threads never move from the green queue to the
%$$ amber queue).  For compatibility, the \tr{qp2pp} script eliminates the
%$$ GHC amber queue, so there is no point in using the verbose (\tr{-qv})
%$$ option if you are only interested in using the \tr{hbcpp} profiling
%$$ tools.
%************************************************************************

\subsubsection{Other useful info about running parallel programs}

%************************************************************************

The ``garbage-collection statistics'' RTS options can be useful
for seeing what parallel programs are doing.  If you do either
\tr{+RTS -Sstderr}\index{-Sstderr RTS option} or \tr{+RTS -sstderr},
then you'll get mutator, garbage-collection, etc., times on standard
error which, for PVM programs, appear in \tr{/tmp/pvml.nnn}.

Whether doing \tr{+RTS -Sstderr} or not, a handy way to watch
what's happening overall is: \tr{tail -f /tmp/pvml.nnn}.
%************************************************************************

\subsubsection[parallel-rts-opts]{RTS options for Concurrent/Parallel Haskell}
\index{RTS options, concurrent}
\index{RTS options, parallel}
\index{Concurrent Haskell---RTS options}
\index{Parallel Haskell---RTS options}

%************************************************************************
Besides the usual runtime system (RTS) options
(\sectionref{runtime-control}), there are a few options particularly
for concurrent/parallel execution.

\begin{description}
\item[\tr{-N<N>}:]
\index{-N<N> RTS option (parallel)}
(PARALLEL ONLY) Use \tr{<N>} PVM processors to run this program.

\item[\tr{-C[<us>]}:]
\index{-C<us> RTS option}
Sets the context switch interval to \pl{<us>} microseconds.  A context
switch will occur at the next heap allocation after the timer expires.
With \tr{-C0} or \tr{-C}, context switches will occur as often as
possible (at every heap allocation).  By default, context switches
occur every 10 milliseconds.  Note that many interval timers are only
capable of 10 millisecond granularity, so the default setting may be
the finest granularity possible, short of a context switch at every
heap allocation.

\item[\tr{-q[v]}:]
\index{-q RTS option}
Produce a quasi-parallel profile of thread activity, in the file
\tr{<program>.qp}.  In the style of \tr{hbcpp}, this profile records
the movement of threads between the green (runnable) and red (blocked)
queues.  If you specify the verbose suboption (\tr{-qv}), the green
queue is split into green (for the currently running thread only) and
amber (for other runnable threads).  We do not recommend that you use
the verbose suboption if you are planning to use the \tr{hbcpp}
profiling tools or if you are context switching at every heap check
(with \tr{-C}).

\item[\tr{-t<num>}:]
\index{-t<num> RTS option}
Limit the number of concurrent threads per processor to \pl{<num>}.
The default is 32.  Each thread requires slightly over 1K {\em words}
in the heap for thread state and stack objects.  (For 32-bit machines,
this translates to 4K bytes, and for 64-bit machines, 8K bytes.)

\item[\tr{-d}:]
\index{-d RTS option (parallel)}
(PARALLEL ONLY) Turn on debugging.  It pops up one xterm (or GDB, or
something...) per PVM processor.  We use the standard \tr{debugger}
script that comes with PVM3, but we sometimes meddle with the
\tr{debugger2} script.  We include ours in the GHC distribution,
in \tr{ghc/utils/pvm/}.

\item[\tr{-e<num>}:]
\index{-e<num> RTS option (parallel)}
(PARALLEL ONLY) Limit the number of pending sparks per processor to
\tr{<num>}.  The default is 100.  A larger number may be appropriate if
your program generates large amounts of parallelism initially.
\end{description}
%************************************************************************

\subsubsubsection[parallel-problems]{Potential problems with Parallel Haskell}
\index{Parallel Haskell---problems}
\index{problems, Parallel Haskell}

%************************************************************************

The ``Potential problems'' for Concurrent Haskell also apply for
Parallel Haskell.  Please see \Sectionref{concurrent-problems}.
%$$ \subsubsubsection[par-notes]{notes for 0.26}

%$$ Install PVM somewhere, as it says.  We use 3.3

%$$ pvm.h : can do w/ a link from ghc/includes to its true home (???)

%$$ ghc -gum ... => a.out

%$$ a.out goes to $PVM_ROOT/bin/$PVM_ARCH/$PE

%$$ (profiling outputs go to ~/$PE.<process-num>.<suffix>)

%$$ trinder scripts in: ~trinder/bin/any/instPHIL

%$$ SysMan [-] N (PEs) args-to-program...

%$$ mattson setup: GDB window per task
%$$ /local/grasp_tmp5/mattson/pvm3/lib/debugger{,2}

%$$ to set breakpoint, etc, before "run", just modify debugger2

%$$ stderr and stdout are directed to /tmp/pvml.NNN

%$$ Visualisation stuff (normal _mp build):

%$$ +RTS -q    gransim-like profiling
%$$ (should use exactly-gransim RTS options)
%$$ -qb binary dumps : not tried, not recommended: hosed!

%$$ ascii dump : same info as gransim, one extra line at top w/
%$$ start time; all times are ms since then

%$$ dumps appear in $HOME/<program>.nnn.gr

%$$ ~mattson/grs2gr.pl == combine lots into one (fixing times)

%$$ /local/grasp/hwloidl/GrAn/bin/ is where scripts are.

%$$ gr2ps == activity profile (bash script)

%$$ ~mattson/bin/`arch`/gr2qp must be picked up prior to hwloidl's for
%$$ things to work...

%$$ +RTS -[Pp] (parallel) 4-cost-centre "profiling" (gc,MAIN,msg,idle)

%$$ ToDos: time-profiles from hp2ps: something about zeroth sample;