-</para>
-
-<para>
-The scripts for processing the parallelism profiles are distributed
-in <filename>ghc/utils/parallel/</filename>.
-</para>
-
-</sect2>
-
-<sect2>
-<title>Other useful info about running parallel programs</title>
-
-<para>
-The “garbage-collection statistics” RTS options can be useful for
-seeing what parallel programs are doing. If you do either
-<option>+RTS -Sstderr</option><indexterm><primary>-Sstderr RTS option</primary></indexterm> or <option>+RTS -sstderr</option>, then
-you'll get mutator, garbage-collection, etc., times on standard
-error. The standard error of all PE's other than the `main thread'
-appears in <filename>/tmp/pvml.nnn</filename>, courtesy of PVM.
-</para>
-
-<para>
-Whether doing <option>+RTS -Sstderr</option> or not, a handy way to watch
-what's happening overall is: <command>tail -f /tmp/pvml.nnn</command>.
-</para>
-
-</sect2>
-
-<sect2 id="parallel-rts-opts">
-<title>RTS options for Parallel Haskell
-</title>
-
-<para>
-<indexterm><primary>RTS options, parallel</primary></indexterm>
-<indexterm><primary>parallel Haskell—RTS options</primary></indexterm>
-</para>
-
-<para>
-Besides the usual runtime system (RTS) options
-(<xref linkend="runtime-control"/>), there are a few options particularly
-for parallel execution.
-</para>
-
-<para>
-<variablelist>
-
-<varlistentry>
-<term><option>-qp<N></option>:</term>
-<listitem>
-<para>
-<indexterm><primary>-qp<N> RTS option</primary></indexterm>
-(paraLLEL ONLY) Use <literal><N></literal> PVM processors to run this program;
-the default is 2.
-</para>
-</listitem>
-</varlistentry>
-<varlistentry>
-<term><option>-C[<s>]</option>:</term>
-<listitem>
-<para>
-<indexterm><primary>-C<s> RTS option</primary></indexterm> Sets
-the context switch interval to <literal><s></literal> seconds.
-A context switch will occur at the next heap block allocation after
-the timer expires (a heap block allocation occurs every 4k of
-allocation). With <option>-C0</option> or <option>-C</option>,
-context switches will occur as often as possible (at every heap block
-allocation). By default, context switches occur every 20ms. Note that GHC's internal timer ticks every 20ms, and
-the context switch timer is always a multiple of this timer, so 20ms
-is the maximum granularity available for timed context switches.
-</para>
-</listitem>
-</varlistentry>
-<varlistentry>
-<term><option>-q[v]</option>:</term>
-<listitem>
-<para>
-<indexterm><primary>-q RTS option</primary></indexterm>
-(paraLLEL ONLY) Produce a quasi-parallel profile of thread activity,
-in the file <filename><program>.qp</filename>. In the style of <command>hbcpp</command>, this profile
-records the movement of threads between the green (runnable) and red
-(blocked) queues. If you specify the verbose suboption (<option>-qv</option>), the
-green queue is split into green (for the currently running thread
-only) and amber (for other runnable threads). We do not recommend
-that you use the verbose suboption if you are planning to use the
-<command>hbcpp</command> profiling tools or if you are context switching at every heap
-check (with <option>-C</option>).
--->
-</para>
-</listitem>
-</varlistentry>
-<varlistentry>
-<term><option>-qt<num></option>:</term>
-<listitem>
-<para>
-<indexterm><primary>-qt<num> RTS option</primary></indexterm>
-(paraLLEL ONLY) Limit the thread pool size, i.e. the number of
-threads per processor to <literal><num></literal>. The default is
-32. Each thread requires slightly over 1K <emphasis>words</emphasis> in
-the heap for thread state and stack objects. (For 32-bit machines, this
-translates to 4K bytes, and for 64-bit machines, 8K bytes.)
-</para>
-</listitem>
-</varlistentry>
-<!-- no more -HWL
-<varlistentry>
-<term><option>-d</option>:</term>
-<listitem>
-<para>
-<indexterm><primary>-d RTS option (parallel)</primary></indexterm>
-(paraLLEL ONLY) Turn on debugging. It pops up one xterm (or GDB, or
-something…) per PVM processor. We use the standard <command>debugger</command>
-script that comes with PVM3, but we sometimes meddle with the
-<command>debugger2</command> script. We include ours in the GHC distribution,
-in <filename>ghc/utils/pvm/</filename>.
-</para>
-</listitem>
-</varlistentry>
--->
-<varlistentry>
-<term><option>-qe<num></option>:</term>
-<listitem>
-<para>
-<indexterm><primary>-qe<num> RTS option
-(parallel)</primary></indexterm> (paraLLEL ONLY) Limit the spark pool size
-i.e. the number of pending sparks per processor to
-<literal><num></literal>. The default is 100. A larger number may be
-appropriate if your program generates large amounts of parallelism
-initially.
-</para>
-</listitem>
-</varlistentry>
-<varlistentry>
-<term><option>-qQ<num></option>:</term>
-<listitem>
-<para>
-<indexterm><primary>-qQ<num> RTS option (parallel)</primary></indexterm>
-(paraLLEL ONLY) Set the size of packets transmitted between processors
-to <literal><num></literal>. The default is 1024 words. A larger number may be
-appropriate if your machine has a high communication cost relative to
-computation speed.
-</para>
-</listitem>
-</varlistentry>
-<varlistentry>
-<term><option>-qh<num></option>:</term>
-<listitem>
-<para>
-<indexterm><primary>-qh<num> RTS option (parallel)</primary></indexterm>
-(paraLLEL ONLY) Select a packing scheme. Set the number of non-root thunks to pack in one packet to
-<num>-1 (0 means infinity). By default GUM uses full-subgraph
-packing, i.e. the entire subgraph with the requested closure as root is
-transmitted (provided it fits into one packet). Choosing a smaller value
-reduces the amount of pre-fetching of work done in GUM. This can be
-advantageous for improving data locality but it can also worsen the balance
-of the load in the system.
-</para>
-</listitem>
-</varlistentry>
-<varlistentry>
-<term><option>-qg<num></option>:</term>
-<listitem>
-<para>
-<indexterm><primary>-qg<num> RTS option
-(parallel)</primary></indexterm> (paraLLEL ONLY) Select a globalisation
-scheme. This option affects the
-generation of global addresses when transferring data. Global addresses are
-globally unique identifiers required to maintain sharing in the distributed
-graph structure. Currently this is a binary option. With <num>=0 full globalisation is used
-(default). This means a global address is generated for every closure that
-is transmitted. With <num>=1 a thunk-only globalisation scheme is
-used, which generated global address only for thunks. The latter case may
-lose sharing of data but has a reduced overhead in packing graph structures
-and maintaining internal tables of global addresses.
-</para>
-</listitem>
-</varlistentry>
-</variablelist>
-</para>
-
-</sect2>
+ <para>GHC supports running Haskell programs in parallel on an SMP
+ (symmetric multiprocessor).</para>
+
+ <para>There's a fine distinction between
+ <emphasis>concurrency</emphasis> and <emphasis>parallelism</emphasis>:
+ parallelism is all about making your program run
+ <emphasis>faster</emphasis> by making use of multiple processors
+ simultaneously. Concurrency, on the other hand, is a means of
+ abstraction: it is a convenient way to structure a program that must
+ respond to multiple asynchronous events.</para>
+
+ <para>However, the two terms are certainly related. By making use of
+ multiple CPUs it is possible to run concurrent threads in parallel,
+ and this is exactly what GHC's SMP parallelism support does. But it
+ is also possible to obtain performance improvements with parallelism
+ on programs that do not use concurrency. This section describes how to
+ use GHC to compile and run parallel programs; in <xref
+ linkend="lang-parallel" /> we describe the language features that affect
+ parallelism.</para>
+
+ <sect2 id="parallel-options">
+ <title>Options to enable SMP parallelism</title>