% both concurrent and parallel
%************************************************************************
%*                                                                      *
\section[concurrent-and-parallel]{Concurrent and Parallel Haskell}
\index{Concurrent Haskell}
\index{Parallel Haskell}
%*                                                                      *
%************************************************************************

Concurrent and Parallel Haskell are Glasgow extensions to Haskell
which let you structure your program as a group of independent
`threads'.

Concurrent and Parallel Haskell have very different purposes.

Concurrent Haskell is for applications which have an inherent
structure of interacting, concurrent tasks (i.e. `threads').  Threads
in such programs may be {\em required}.  For example, if a concurrent
thread has been spawned to handle a mouse click, it isn't
optional---the user wants something done!

A Concurrent Haskell program implies multiple `threads' running within
a single Unix process on a single processor.

You will find at least one paper about Concurrent Haskell hanging off
of Simon Peyton Jones's Web page;
\tr{http://www.dcs.gla.ac.uk/~simonpj/}.

Parallel Haskell is about {\em speed}---spawning threads onto multiple
processors so that your program will run faster.  The `threads'
are always {\em advisory}---if the runtime system thinks it can
get the job done more quickly by sequential execution, then fine.

A Parallel Haskell program implies multiple processes running on
multiple processors, under a PVM (Parallel Virtual Machine) framework.

Parallel Haskell is still relatively new; it is more about ``research
fun'' than about ``speed.'' That will change.

Again, check Simon's Web page for publications about Parallel Haskell
(including ``GUM'', the key bits of the runtime system).

Some details about Concurrent and Parallel Haskell follow.

%************************************************************************
%*                                                                      *
\subsection{Concurrent and Parallel Haskell---language features}
\index{Concurrent Haskell---features}
\index{Parallel Haskell---features}
%*                                                                      *
%************************************************************************

%************************************************************************
%*                                                                      *
\subsubsection{Features specific to Concurrent Haskell}
%*                                                                      *
%************************************************************************

%************************************************************************
%*                                                                      *
\subsubsubsection{The \tr{Concurrent} interface (recommended)}
\index{Concurrent interface}
%*                                                                      *
%************************************************************************

GHC provides a \tr{Concurrent} module, a common interface to a
collection of useful concurrency abstractions, including those
mentioned in the ``concurrent paper''.

Just put \tr{import Concurrent} into your modules, and away you go.
To create a ``required thread'':

\begin{verbatim}
forkIO :: IO a -> IO a
\end{verbatim}

The \tr{Concurrent} interface also provides access to ``I-Vars''
and ``M-Vars'', which are two flavours of {\em synchronising variables}.
\index{synchronising variables (Glasgow extension)}
\index{concurrency -- synchronising variables}

\tr{IVars}\index{IVars (Glasgow extension)} are write-once
variables.  They start out empty, and any threads that attempt to read
them will block until they are filled.  Once they are written, any
blocked threads are freed, and additional reads are permitted.
Attempting to write a value to a full \tr{IVar} results in a runtime
error.  Interface:
\begin{verbatim}
newIVar     :: IO (IVar a)
readIVar    :: IVar a -> IO a
writeIVar   :: IVar a -> a -> IO ()
\end{verbatim}

\tr{MVars}\index{MVars (Glasgow extension)} are rendezvous points,
mostly for concurrent threads.  They begin empty, and any attempt to
read an empty \tr{MVar} blocks.  When an \tr{MVar} is written, a
single blocked thread may be freed.  Reading an \tr{MVar} toggles its
state from full back to empty.  Therefore, any value written to an
\tr{MVar} may only be read once.  Multiple reads and writes are
allowed, but there must be at least one read between any two
writes. Interface:
\begin{verbatim}
newEmptyMVar :: IO (MVar a)
newMVar      :: a -> IO (MVar a)
takeMVar     :: MVar a -> IO a
putMVar      :: MVar a -> a -> IO ()
readMVar     :: MVar a -> IO a
swapMVar     :: MVar a -> a -> IO a
\end{verbatim}

A {\em channel variable} (@CVar@) is a one-element channel, as
described in the paper:

\begin{verbatim}
data CVar a
newCVar :: IO (CVar a)
putCVar :: CVar a -> a -> IO ()
getCVar :: CVar a -> IO a
\end{verbatim}

A @Channel@ is an unbounded channel:

\begin{verbatim}
data Chan a 
newChan         :: IO (Chan a)
putChan         :: Chan a -> a -> IO ()
getChan         :: Chan a -> IO a
dupChan         :: Chan a -> IO (Chan a)
unGetChan       :: Chan a -> a -> IO ()
getChanContents :: Chan a -> IO [a]
\end{verbatim}

General and quantity semaphores:

\begin{verbatim}
data QSem
newQSem     :: Int   -> IO QSem
waitQSem    :: QSem  -> IO ()
signalQSem  :: QSem  -> IO ()

data QSemN
newQSemN    :: Int   -> IO QSemN
signalQSemN :: QSemN -> Int -> IO ()
waitQSemN   :: QSemN -> Int -> IO ()
\end{verbatim}

Merging streams---binary and n-ary:

\begin{verbatim}
mergeIO  :: [a]   -> [a] -> IO [a]
nmergeIO :: [[a]] -> IO [a]
\end{verbatim}

A {\em Sample variable} (@SampleVar@) is slightly different from a
normal @MVar@:
\begin{itemize}
\item Reading an empty @SampleVar@ causes the reader to block
    (same as @takeMVar@ on empty @MVar@).
\item Reading a filled @SampleVar@ empties it and returns value.
    (same as @takeMVar@)
\item Writing to an empty @SampleVar@ fills it with a value, and
potentially, wakes up a blocked reader  (same as for @putMVar@ on empty @MVar@).
\item Writing to a filled @SampleVar@ overwrites the current value.
 (different from @putMVar@ on full @MVar@.)
\end{itemize}

\begin{verbatim}
type SampleVar a = MVar (Int, MVar a)

emptySampleVar :: SampleVar a -> IO ()
newSampleVar   :: IO (SampleVar a)
readSample     :: SampleVar a -> IO a
writeSample    :: SampleVar a -> a -> IO ()
\end{verbatim}

Finally, there are operations to delay a concurrent thread, and to
make one wait:\index{delay a concurrent thread}
\index{wait for a file descriptor}
\begin{verbatim}
threadDelay :: Int -> IO () -- delay rescheduling for N microseconds
threadWait  :: Int -> IO () -- wait for input on specified file descriptor
\end{verbatim}

%************************************************************************
%*                                                                      *
\subsubsection{Features specific to Parallel Haskell}
%*                                                                      *
%************************************************************************

%************************************************************************
%*                                                                      *
\subsubsubsection{The \tr{Parallel} interface (recommended)}
\index{Parallel interface}
%*                                                                      *
%************************************************************************

GHC provides two functions for controlling parallel execution, through
the \tr{Parallel} interface:
\begin{verbatim}
interface Parallel where
infixr 0 `par`
infixr 1 `seq`

par :: a -> b -> b
seq :: a -> b -> b
\end{verbatim}

The expression \tr{(x `par` y)} {\em sparks} the evaluation of \tr{x}
(to weak head normal form) and returns \tr{y}.  Sparks are queued for
execution in FIFO order, but are not executed immediately.  At the
next heap allocation, the currently executing thread will yield
control to the scheduler, and the scheduler will start a new thread
(until reaching the active thread limit) for each spark which has not
already been evaluated to WHNF.

The expression \tr{(x `seq` y)} evaluates \tr{x} to weak head normal
form and then returns \tr{y}.  The \tr{seq} primitive can be used to
force evaluation of an expression beyond WHNF, or to impose a desired
execution sequence for the evaluation of an expression.

For example, consider the following parallel version of our old
nemesis, \tr{nfib}:

\begin{verbatim}
import Parallel

nfib :: Int -> Int
nfib n | n <= 1 = 1
       | otherwise = par n1 (seq n2 (n1 + n2 + 1))
                     where n1 = nfib (n-1) 
                           n2 = nfib (n-2)
\end{verbatim}

For values of \tr{n} greater than 1, we use \tr{par} to spark a thread
to evaluate \tr{nfib (n-1)}, and then we use \tr{seq} to force the
parent thread to evaluate \tr{nfib (n-2)} before going on to add
together these two subexpressions.  In this divide-and-conquer
approach, we only spark a new thread for one branch of the computation
(leaving the parent to evaluate the other branch).  Also, we must use
\tr{seq} to ensure that the parent will evaluate \tr{n2} {\em before}
\tr{n1} in the expression \tr{(n1 + n2 + 1)}.  It is not sufficient to
reorder the expression as \tr{(n2 + n1 + 1)}, because the compiler may
not generate code to evaluate the addends from left to right.

%************************************************************************
%*                                                                      *
\subsubsubsection{Underlying functions and primitives}
\index{parallelism primitives}
\index{primitives for parallelism}
%*                                                                      *
%************************************************************************

The functions \tr{par} and \tr{seq} are wired into GHC, and unfold
into uses of the \tr{par#} and \tr{seq#} primitives, respectively.  If
you'd like to see this with your very own eyes, just run GHC with the
\tr{-ddump-simpl} option.  (Anything for a good time...)

You can use \tr{par} and \tr{seq} in Concurrent Haskell, though
I'm not sure why you would want to.

%************************************************************************
%*                                                                      *
\subsubsection{Features common to Concurrent and Parallel Haskell}
%*                                                                      *
%************************************************************************

Actually, you can use the \tr{`par`} and \tr{`seq`} combinators
(really for Parallel Haskell) in Concurrent Haskell as well.
But doing things like ``\tr{par} to \tr{forkIO} many required threads''
counts as ``jumping out the 9th-floor window, just to see what happens.''

%************************************************************************
%*                                                                      *
\subsubsubsection{Scheduling policy for concurrent/parallel threads}
\index{Scheduling---concurrent/parallel}
\index{Concurrent/parallel scheduling}
%*                                                                      *
%************************************************************************

Runnable threads are scheduled in round-robin fashion.  Context
switches are signalled by the generation of new sparks or by the
expiry of a virtual timer (the timer interval is configurable with the
\tr{-C[<num>]}\index{-C<num> RTS option (concurrent, parallel)} RTS option).
However, a context switch doesn't really happen until the next heap
allocation.  If you want extremely short time slices, the \tr{-C} RTS
option can be used to force a context switch at each and every heap
allocation.

When a context switch occurs, pending sparks which have not already
been reduced to weak head normal form are turned into new threads.
However, there is a limit to the number of active threads (runnable or
blocked) which are allowed at any given time.  This limit can be
adjusted with the \tr{-t<num>}\index{-t <num> RTS option (concurrent, parallel)}
RTS option (the default is 32).  Once the
thread limit is reached, any remaining sparks are deferred until some
of the currently active threads are completed.

%************************************************************************
%*                                                                      *
\subsection{How to use Concurrent and Parallel Haskell}
%*                                                                      *
%************************************************************************

[You won't get far unless your GHC system was configured/built with
concurrency and/or parallelism enabled.  (They require separate
library modules.)  The relevant section of the installation guide says
how to do this.]

%************************************************************************
%*                                                                      *
\subsubsection{Using Concurrent Haskell}
\index{Concurrent Haskell---use}
%*                                                                      *
%************************************************************************

To compile a program as Concurrent Haskell, use the \tr{-concurrent}
option,\index{-concurrent option} both when compiling {\em and
linking}.  You will probably need the \tr{-fglasgow-exts} option, too.

Three RTS options are provided for modifying the behaviour of the
threaded runtime system.  See the descriptions of \tr{-C[<us>]}, \tr{-q},
and \tr{-t<num>} in \Sectionref{parallel-rts-opts}.

%************************************************************************
%*                                                                      *
\subsubsubsection[concurrent-problems]{Potential problems with Concurrent Haskell}
\index{Concurrent Haskell problems}
\index{problems, Concurrent Haskell}
%*                                                                      *
%************************************************************************

The main thread in a Concurrent Haskell program is given its own
private stack space, but all other threads are given stack space from
the heap.  Stack space for the main thread can be
adjusted as usual with the \tr{-K} RTS
option,\index{-K RTS option (concurrent, parallel)} but if this
private stack space is exhausted, the main thread will switch to stack
segments in the heap, just like any other thread.  Thus, problems
which would normally result in stack overflow in ``sequential Haskell''
can be expected to result in heap overflow when using threads.

The concurrent runtime system uses black holes as synchronisation
points for subexpressions which are shared among multiple threads.  In
``sequential Haskell'', a black hole indicates a cyclic data
dependency, which is a fatal error.  However, in concurrent execution, a
black hole may simply indicate that the desired expression is being
evaluated by another thread.  Therefore, when a thread encounters a
black hole, it simply blocks and waits for the black hole to be
updated.  Cyclic data dependencies will result in deadlock, and the
program will fail to terminate.

Because the concurrent runtime system uses black holes as
synchronisation points, it is not possible to disable black-holing
with the \tr{-N} RTS option.\index{-N RTS option} Therefore, the use
of signal handlers (including timeouts) with the concurrent runtime
system can lead to problems if a thread attempts to enter a black hole
that was created by an abandoned computation.  The use of signal
handlers in conjunction with threads is strongly discouraged.


%************************************************************************
%*                                                                      *
\subsubsection{Using Parallel Haskell}
\index{Parallel Haskell---use}
%*                                                                      *
%************************************************************************

[You won't be able to execute parallel Haskell programs unless PVM3
(Parallel Virtual Machine, version 3) is installed at your site.]

To compile a Haskell program for parallel execution under PVM, use the
\tr{-parallel} option,\index{-parallel option} both when compiling
{\em and linking}.  You will probably want to \tr{import Parallel}
into your Haskell modules.

To run your parallel program, once PVM is going, just invoke it ``as
normal''.  The main extra RTS option is \tr{-N<n>}, to say how many
PVM ``processors'' your program to run on.  (For more details of
all relevant RTS options, please see \sectionref{parallel-rts-opts}.)

In truth, running Parallel Haskell programs and getting information
out of them (e.g., parallelism profiles) is a battle with the vagaries of
PVM, detailed in the following sections.

%************************************************************************
%*                                                                      *
\subsubsubsection{Dummy's guide to using PVM}
\index{PVM, how to use}
\index{Parallel Haskell---PVM use}
%*                                                                      *
%************************************************************************

Before you can run a parallel program under PVM, you must set the
required environment variables (PVM's idea, not ours); something like,
probably in your \tr{.cshrc} or equivalent:
\begin{verbatim}
setenv PVM_ROOT /wherever/you/put/it
setenv PVM_ARCH `$PVM_ROOT/lib/pvmgetarch`
setenv PVM_DPATH $PVM_ROOT/lib/pvmd
\end{verbatim}

Creating and/or controlling your ``parallel machine'' is a purely-PVM
business; nothing specific to Parallel Haskell.

You use the \tr{pvm}\index{pvm command} command to start PVM on your
machine.  You can then do various things to control/monitor your
``parallel machine;'' the most useful being:

\begin{tabular}{ll}
\tr{Control-D} & exit \tr{pvm}, leaving it running \\
\tr{halt} & kill off this ``parallel machine'' \& exit \\
\tr{add <host>} & add \tr{<host>} as a processor \\
\tr{delete <host>} & delete \tr{<host>} \\
\tr{reset}	& kill what's going, but leave PVM up \\
\tr{conf}       & list the current configuration \\
\tr{ps}         & report processes' status \\
\tr{pstat <pid>} & status of a particular process \\
\end{tabular}

The PVM documentation can tell you much, much more about \tr{pvm}!

%************************************************************************
%*                                                                      *
\subsubsection{Parallelism profiles}
\index{parallelism profiles}
\index{profiles, parallelism}
\index{visualisation tools}
%*                                                                      *
%************************************************************************

With Parallel Haskell programs, we usually don't care about the
results---only with ``how parallel'' it was!  We want pretty pictures.

Parallelism profiles (\`a la \tr{hbcpp}) can be generated with the
\tr{-q}\index{-q RTS option (concurrent, parallel)} RTS option.  The
per-processor profiling info is dumped into files named
\tr{<full-path><program>.gr}.  These are then munged into a PostScript picture,
which you can then display.  For example, to run your program
\tr{a.out} on 8 processors, then view the parallelism profile, do:

\begin{verbatim}
% ./a.out +RTS -N8 -q
% grs2gr *.???.gr > temp.gr	# combine the 8 .gr files into one
% gr2ps -O temp.gr		# cvt to .ps; output in temp.ps
% ghostview -seascape temp.ps	# look at it!
\end{verbatim}

The scripts for processing the parallelism profiles are distributed
in \tr{ghc/utils/parallel/}.

%$$************************************************************************
%$$*                                                                      *
%$$\subsubsection{Activity profiles}
%$$\index{activity profiles}
%$$\index{profiles, activity}
%$$\index{visualisation tools}
%$$%$$*                                                                      *
%$$%$$************************************************************************
%$$
%$$You can also use the standard GHC ``cost-centre'' profiling to see how
%$$much time each PVM ``processor'' spends
%$$
%$$No special compilation flags beyond \tr{-parallel} are required to get
%$$this basic four-activity profile.  Just use the \tr{-P} RTS option,
%$$thusly:
%$$\begin{verbatim}
%$$./a.out +RTS -N7 -P	# 7 processors
%$$\end{verbatim}
%$$
%$$The above will create files named \tr{<something>.prof} and/or
%$$\tr{<something>.time} {\em in your home directory}.  You can
%$$process the \tr{.time} files into PostScript using \tr{hp2ps},
%$$\index{hp2ps}
%$$as described elsewhere in this guide.
%$$
%$$Because of the weird file names, you probably need to use
%$$\tr{hp2ps} as a filter.  Also, you probably want to give \tr{hp2ps}
%$$a \tr{-t0} flag, so that no ``inconsequential'' data is ignored---in
%$$parallel-land it's all consequential.  So:
%$$\begin{verbatim}
%$$%$$ hp2ps -t0 < fooo.001.time > temp.ps
%$$\end{verbatim}
%$$
%$$ The first line of the
%$$ \tr{.qp} file contains the name of the program executed, along with
%$$ any program arguments and thread-specific RTS options.  The second
%$$ line contains the date and time of program execution.  The third
%$$ and subsequent lines contain information about thread state transitions.
%$$ 
%$$ The thread state transition lines have the following format:
%$$ \begin{verbatim}
%$$ time transition thread-id thread-name [thread-id thread-name]
%$$ \end{verbatim}
%$$ 
%$$ The \tr{time} is the virtual time elapsed since the program started
%$$ execution, in milliseconds.  The \tr{transition} is a two-letter code
%$$ indicating the ``from'' queue and the ``to'' queue, where each queue
%$$ is one of:
%$$ \begin{itemize}
%$$ \item[\tr{*}] Void: Thread creation or termination.
%$$ \item[\tr{G}] Green: Runnable (or actively running, with \tr{-qv}) threads.
%$$ \item[\tr{A}] Amber: Runnable threads (\tr{-qv} only).
%$$ \item[\tr{R}] Red: Blocked threads.
%$$ \end{itemize}
%$$ The \tr{thread-id} is a unique integer assigned to each thread.  The
%$$ \tr{thread-name} is currently the address of the thread's root closure
%$$ (in hexadecimal).  In the future, it will be the name of the function
%$$ associated with the root of the thread.
%$$ 
%$$ The first \tr{(thread-id, thread-name)} pair identifies the thread
%$$ involved in the indicated transition.  For \tr{RG} and \tr{RA} transitions 
%$$ only, there is a second \tr{(thread-id, thread-name)} pair which identifies
%$$ the thread that released the blocked thread.
%$$ 
%$$ Provided with the GHC distribution is a perl script, \tr{qp2pp}, which
%$$ will convert \tr{.qp} files to \tr{hbcpp}'s \tr{.pp} format, so that
%$$ you can use the \tr{hbcpp} profiling tools, such as \tr{pp2ps92}.  The
%$$ \tr{.pp} format has undergone many changes, so the conversion script
%$$ is not compatible with earlier releases of \tr{hbcpp}.  Note that GHC
%$$ and \tr{hbcpp} use different thread scheduling policies (in
%$$ particular, \tr{hbcpp} threads never move from the green queue to the
%$$ amber queue).  For compatibility, the \tr{qp2pp} script eliminates the
%$$ GHC amber queue, so there is no point in using the verbose (\tr{-qv})
%$$ option if you are only interested in using the \tr{hbcpp} profiling
%$$ tools.

%************************************************************************
%*                                                                      *
\subsubsection{Other useful info about running parallel programs}
%*                                                                      *
%************************************************************************

The ``garbage-collection statistics'' RTS options can be useful for
seeing what parallel programs are doing.  If you do either
\tr{+RTS -Sstderr}\index{-Sstderr RTS option} or \tr{+RTS -sstderr}, then
you'll get mutator, garbage-collection, etc., times on standard
error. The standard error of all PE's other than the `main thread'
appears in \tr{/tmp/pvml.nnn}, courtesy of PVM.

Whether doing \tr{+RTS -Sstderr} or not, a handy way to watch
what's happening overall is: \tr{tail -f /tmp/pvml.nnn}.

%************************************************************************
%*                                                                      *
\subsubsection[parallel-rts-opts]{RTS options for Concurrent/Parallel Haskell}
\index{RTS options, concurrent}
\index{RTS options, parallel}
\index{Concurrent Haskell---RTS options}
\index{Parallel Haskell---RTS options}
%*                                                                      *
%************************************************************************

Besides the usual runtime system (RTS) options
(\sectionref{runtime-control}), there are a few options particularly
for concurrent/parallel execution.

\begin{description}
\item[\tr{-N<N>}:]
\index{-N<N> RTS option (parallel)}
(PARALLEL ONLY) Use \tr{<N>} PVM processors to run this program;
the default is 2.

\item[\tr{-C[<us>]}:]
\index{-C<us> RTS option}
Sets the context switch interval to \pl{<us>} microseconds.  A context
switch will occur at the next heap allocation after the timer expires.
With \tr{-C0} or \tr{-C}, context switches will occur as often as
possible (at every heap allocation).  By default, context switches
occur every 10 milliseconds.  Note that many interval timers are only
capable of 10 millisecond granularity, so the default setting may be
the finest granularity possible, short of a context switch at every
heap allocation.

\item[\tr{-q[v]}:]
\index{-q RTS option}
Produce a quasi-parallel profile of thread activity, in the file
\tr{<program>.qp}.  In the style of \tr{hbcpp}, this profile records
the movement of threads between the green (runnable) and red (blocked)
queues.  If you specify the verbose suboption (\tr{-qv}), the green
queue is split into green (for the currently running thread only) and
amber (for other runnable threads).  We do not recommend that you use
the verbose suboption if you are planning to use the \tr{hbcpp}
profiling tools or if you are context switching at every heap check
(with \tr{-C}).

\item[\tr{-t<num>}:]
\index{-t<num> RTS option}
Limit the number of concurrent threads per processor to \pl{<num>}.
The default is 32.  Each thread requires slightly over 1K {\em words}
in the heap for thread state and stack objects.  (For 32-bit machines,
this translates to 4K bytes, and for 64-bit machines, 8K bytes.)

\item[\tr{-d}:]
\index{-d RTS option (parallel)}
(PARALLEL ONLY) Turn on debugging.  It pops up one xterm (or GDB, or
something...) per PVM processor.  We use the standard \tr{debugger}
script that comes with PVM3, but we sometimes meddle with the
\tr{debugger2} script.  We include ours in the GHC distribution,
in \tr{ghc/utils/pvm/}.

\item[\tr{-e<num>}:]
\index{-e<num> RTS option (parallel)}
(PARALLEL ONLY) Limit the number of pending sparks per processor to
\tr{<num>}. The default is 100. A larger number may be appropriate if
your program generates large amounts of parallelism initially.

\item[\tr{-Q<num>}:]
\index{-Q<num> RTS option (parallel)}
(PARALLEL ONLY) Set the size of packets transmitted between processors
to \tr{<num>}. The default is 1024 words. A larger number may be
appropriate if your machine has a high communication cost relative to
computation speed.
\end{description}

%************************************************************************
%*                                                                      *
\subsubsubsection[parallel-problems]{Potential problems with Parallel Haskell} 
\index{Parallel Haskell---problems} 
\index{problems, Parallel Haskell} 
%*                                                                      *
%************************************************************************

The ``Potential problems'' for Concurrent Haskell also apply for
Parallel Haskell.  Please see \Sectionref{concurrent-problems}.

%$$ \subsubsubsection[par-notes]{notes for 0.26}
%$$ 
%$$ \begin{verbatim}
%$$ Install PVM somewhere, as it says.  We use 3.3
%$$ 
%$$ pvm.h : can do w/ a link from ghc/includes to its true home (???)
%$$ 
%$$ 
%$$ ghc -gum ... => a.out
%$$ 
%$$     a.out goes to $PVM_ROOT/bin/$PVM_ARCH/$PE
%$$ 
%$$     (profiling outputs go to ~/$PE.<process-num>.<suffix>)
%$$ 
%$$     trinder scripts in: ~trinder/bin/any/instPHIL
%$$ 
%$$ To run:
%$$ 
%$$     Then:
%$$     SysMan [-] N (PEs) args-to-program...
%$$ 
%$$         - ==> debug mode
%$$                 mattson setup: GDB window per task
%$$                 /local/grasp_tmp5/mattson/pvm3/lib/debugger{,2}
%$$ 
%$$                 to set breakpoint, etc, before "run", just modify debugger2
%$$ 
%$$     stderr and stdout are directed to /tmp/pvml.NNN
%$$ 
%$$ Visualisation stuff (normal _mp build):
%$$ 
%$$ +RTS -q         gransim-like profiling
%$$                 (should use exactly-gransim RTS options)
%$$      -qb        binary dumps : not tried, not recommended: hosed!
%$$ 
%$$     ascii dump : same info as gransim, one extra line at top w/
%$$                 start time; all times are ms since then
%$$ 
%$$     dumps appear in $HOME/<program>.nnn.gr
%$$ 
%$$ ~mattson/grs2gr.pl == combine lots into one (fixing times)
%$$ 
%$$ /local/grasp/hwloidl/GrAn/bin/ is where scripts are.
%$$ 
%$$ gr2ps == activity profile (bash script)
%$$ 
%$$ ~mattson/bin/`arch`/gr2qp must be picked up prior to hwloidl's for
%$$ things to work...
%$$ 
%$$ +RTS -[Pp]      (parallel) 4-cost-centre "profiling" (gc,MAIN,msg,idle)
%$$ 
%$$         ToDos: time-profiles from hp2ps: something about zeroth sample;
%$$ \end{verbatim}