doc tweak

[ghc-hetmet.git] / docs / users_guide / parallel.xml
diff --git a/docs/users_guide/parallel.xml b/docs/users_guide/parallel.xml

index 11c2547..d971980 100644 (file)
--- a/docs/users_guide/parallel.xml
+++ b/docs/users_guide/parallel.xml
@@ -1,204 +1,173 @@
  <?xml version="1.0" encoding="iso-8859-1"?>
-<sect1 id="concurrent-and-parallel">
-<title>Concurrent and Parallel Haskell</title>
-
-<para>
-<indexterm><primary>Concurrent Haskell</primary></indexterm>
-<indexterm><primary>Parallel Haskell</primary></indexterm>
-Concurrent and Parallel Haskell are Glasgow extensions to Haskell
-which let you structure your program as a group of independent
-`threads'.
+<sect1 id="lang-parallel">
+  <title>Concurrent and Parallel Haskell</title>
+  <indexterm><primary>parallelism</primary>
+  </indexterm>
+
+  <para>GHC implements some major extensions to Haskell to support 
+  concurrent and parallel programming.  Let us first etablish terminology:
+  <itemizedlist>
+       <listitem><para><emphasis>Parallelism</emphasis> means running
+         a Haskell program on multiple processors, with the goal of improving
+         performance.  Ideally, this should be done invisibly, and with no
+         semantic changes.
+           </para></listitem>
+       <listitem><para><emphasis>Concurrency</emphasis> means implementing 
+         a program by using multiple I/O-performing threads.  While a
+         concurrent Haskell program <emphasis>can</emphasis> run on a 
+         parallel machine, the primary goal of using concurrency is not to gain
+         performance, but rather because that is the simplest and most
+         direct way to write the program.  Since the threads perform I/O,
+         the semantics of the program is necessarily non-deterministic.
+           </para></listitem>
+  </itemizedlist>
+  GHC supports both concurrency and parallelism. 
+  </para>
+
+  <sect2 id="concurrent-haskell">
+    <title>Concurrent Haskell</title>
+
+  <para>Concurrent Haskell is the name given to GHC's concurrency extension.
+  It is enabled by default, so no special flags are required.
+   The <ulink
+             url="http://research.microsoft.com/copyright/accept.asp?path=/users/simonpj/papers/concurrent-haskell.ps.gz">
+             Concurrent Haskell paper</ulink> is still an excellent
+             resource, as is <ulink
+             url="http://research.microsoft.com/%7Esimonpj/papers/marktoberdorf">Tackling
+             the awkward squad</ulink>.
+  </para><para>
+  To the programmer, Concurrent Haskell introduces no new language constructs;
+  rather, it appears simply as a library, <ulink
+  url="../libraries/base/Control-Concurrent.html">
+             Control.Concurrent</ulink>.  The functions exported by this
+             library include:
+  <itemizedlist>
+<listitem><para>Forking and killing threads.</para></listitem>
+<listitem><para>Sleeping.</para></listitem>
+<listitem><para>Synchronised mutable variables, called <literal>MVars</literal></para></listitem>
+<listitem><para>Support for bound threads; see the paper <ulink
+url="http://research.microsoft.com/%7Esimonpj/Papers/conc-ffi/index.htm">Extending
+the FFI with concurrency</ulink>.</para></listitem>
+</itemizedlist>
  </para>
+</sect2>
  
-<para>
-Concurrent and Parallel Haskell have very different purposes.
-</para>
-
-<para>
-Concurrent Haskell is for applications which have an inherent
-structure of interacting, concurrent tasks (i.e. `threads').  Threads
-in such programs may be <emphasis>required</emphasis>.  For example, if a concurrent thread has been spawned to handle a mouse click, it isn't
-optional&mdash;the user wants something done!
-</para>
-
-<para>
-A Concurrent Haskell program implies multiple `threads' running within
-a single Unix process on a single processor.
-</para>
-
-<para>
-You will find at least one paper about Concurrent Haskell hanging off
-of <ulink url="http://research.microsoft.com/~simonpj/">Simon Peyton
-Jones's Web page</ulink>.
-</para>
-
-<para>
-Parallel Haskell is about <emphasis>speed</emphasis>&mdash;spawning
-threads onto multiple processors so that your program will run faster.
-The `threads' are always <emphasis>advisory</emphasis>&mdash;if the
-runtime system thinks it can get the job done more quickly by
-sequential execution, then fine.
-</para>
-
-<para>
-A Parallel Haskell program implies multiple processes running on
-multiple processors, under a PVM (Parallel Virtual Machine) framework.
-An MPI interface is under development but not fully functional, yet.
-</para>
-
-<para>
-Parallel Haskell is still relatively new; it is more about &ldquo;research
-fun&rdquo; than about &ldquo;speed.&rdquo; That will change.
-</para>
-
-<para>
-Check the <ulink url="http://www.cee.hw.ac.uk/~dsg/gph/">GPH Page</ulink>
-for more information on &ldquo;GPH&rdquo; (Haskell98 with extensions for
-parallel execution), the latest version of &ldquo;GUM&rdquo; (the runtime
-system to enable parallel executions) and papers on research issues.  A
-list of publications about GPH and about GUM is also available from Simon's
-Web Page.
-</para>
-
-<para>
-Some details about Parallel Haskell follow.  For more information
-about concurrent Haskell, see the module
-<literal>Control.Concurrent</literal> in the library documentation.
-</para>
-
-<sect2>
-<title>Features specific to Parallel Haskell
-<indexterm><primary>Parallel Haskell&mdash;features</primary></indexterm></title>
-
-<sect3>
-<title>The <literal>Parallel</literal> interface (recommended)
-<indexterm><primary>Parallel interface</primary></indexterm></title>
-
-<para>
-GHC provides two functions for controlling parallel execution, through
-the <literal>Parallel</literal> interface:
+   <sect2><title>Software Transactional Memory</title>
+
+    <para>GHC now supports a new way to coordinate the activities of Concurrent
+    Haskell threads, called Software Transactional Memory (STM).  The 
+    <ulink
+    url="http://research.microsoft.com/%7Esimonpj/papers/stm/index.htm">STM
+    papers</ulink> are an excellent introduction to what STM is, and how to use
+    it.</para>
+
+   <para>The main library you need to use STM is <ulink
+  url="../libraries/stm/Control-Concurrent-STM.html">
+             Control.Concurrent.STM</ulink>. The main features supported are these:
+<itemizedlist>
+<listitem><para>Atomic blocks.</para></listitem>
+<listitem><para>Transactional variables.</para></listitem>
+<listitem><para>Operations for composing transactions:
+<literal>retry</literal>, and <literal>orElse</literal>.</para></listitem>
+<listitem><para>Data invariants.</para></listitem>
+</itemizedlist>
+All these features are described in the papers mentioned earlier.
  </para>
+</sect2>
  
-<para>
+<sect2><title>Parallel Haskell</title>
+
+  <para>GHC includes support for running Haskell programs in parallel
+  on symmetric, shared-memory multi-processor 
+      (SMP)<indexterm><primary>SMP</primary></indexterm>.
+  By default GHC runs your program on one processor; if you
+     want it to run in parallel you must link your program
+      with the <option>-threaded</option>, and run it with the RTS
+      <option>-N</option> option; see  <xref linkend="using-smp" />).
+      The runtime will
+      schedule the running Haskell threads among the available OS
+      threads, running as many in parallel as you specified with the
+      <option>-N</option> RTS option.</para>
+
+  <para>GHC only supports parallelism on a shared-memory multiprocessor.
+    Glasgow Parallel Haskell<indexterm><primary>Glasgow Parallel Haskell</primary></indexterm>
+    (GPH) supports running Parallel Haskell
+    programs on both clusters of machines, and single multiprocessors.  GPH is
+    developed and distributed
+    separately from GHC (see <ulink url="http://www.cee.hw.ac.uk/~dsg/gph/">The
+      GPH Page</ulink>).  However, the current version of GPH is based on a much older
+    version of GHC (4.06).</para>
+
+  </sect2>
+  <sect2>
+    <title>Annotating pure code for parallelism</title>
+
+  <para>Ordinary single-threaded Haskell programs will not benefit from
+    enabling SMP parallelism alone: you must expose parallelism to the
+    compiler.
+
+    One way to do so is forking threads using Concurrent Haskell (<xref
+    linkend="concurrent-haskell"/>), but the simplest mechanism for extracting parallelism from pure code is
+      to use the <literal>par</literal> combinator, which is closely related to (and often used
+      with) <literal>seq</literal>.  Both of these are available from <ulink
+       url="../libraries/base/Control-Parallel.html"><literal>Control.Parallel</literal></ulink>:</para>
  
  <programlisting>
-interface Parallel where
  infixr 0 `par`
  infixr 1 `seq`
  
  par :: a -&#62; b -&#62; b
-seq :: a -&#62; b -&#62; b
-</programlisting>
+seq :: a -&#62; b -&#62; b</programlisting>
  
-</para>
+    <para>The expression <literal>(x `par` y)</literal>
+      <emphasis>sparks</emphasis> the evaluation of <literal>x</literal>
+      (to weak head normal form) and returns <literal>y</literal>.  Sparks are
+      queued for execution in FIFO order, but are not executed immediately.  If
+      the runtime detects that there is an idle CPU, then it may convert a
+      spark into a real thread, and run the new thread on the idle CPU.  In
+      this way the available parallelism is spread amongst the real
+      CPUs.</para>
  
-<para>
-The expression <literal>(x `par` y)</literal> <emphasis>sparks</emphasis> the evaluation of <literal>x</literal>
-(to weak head normal form) and returns <literal>y</literal>.  Sparks are queued for
-execution in FIFO order, but are not executed immediately.  At the
-next heap allocation, the currently executing thread will yield
-control to the scheduler, and the scheduler will start a new thread
-(until reaching the active thread limit) for each spark which has not
-already been evaluated to WHNF.
-</para>
-
-<para>
-The expression <literal>(x `seq` y)</literal> evaluates <literal>x</literal> to weak head normal
-form and then returns <literal>y</literal>.  The <function>seq</function> primitive can be used to
-force evaluation of an expression beyond WHNF, or to impose a desired
-execution sequence for the evaluation of an expression.
-</para>
-
-<para>
-For example, consider the following parallel version of our old
-nemesis, <function>nfib</function>:
-</para>
-
-<para>
+    <para>For example, consider the following parallel version of our old
+      nemesis, <function>nfib</function>:</para>
  
  <programlisting>
-import Parallel
+import Control.Parallel
  
  nfib :: Int -&#62; Int
  nfib n | n &#60;= 1 = 1
         | otherwise = par n1 (seq n2 (n1 + n2 + 1))
                       where n1 = nfib (n-1)
-                           n2 = nfib (n-2)
-</programlisting>
-
-</para>
-
-<para>
-For values of <varname>n</varname> greater than 1, we use <function>par</function> to spark a thread
-to evaluate <literal>nfib (n-1)</literal>, and then we use <function>seq</function> to force the
-parent thread to evaluate <literal>nfib (n-2)</literal> before going on to add
-together these two subexpressions.  In this divide-and-conquer
-approach, we only spark a new thread for one branch of the computation
-(leaving the parent to evaluate the other branch).  Also, we must use
-<function>seq</function> to ensure that the parent will evaluate <varname>n2</varname> <emphasis>before</emphasis>
-<varname>n1</varname> in the expression <literal>(n1 + n2 + 1)</literal>.  It is not sufficient to
-reorder the expression as <literal>(n2 + n1 + 1)</literal>, because the compiler may
-not generate code to evaluate the addends from left to right.
-</para>
-
-</sect3>
-
-<sect3>
-<title>Underlying functions and primitives
-<indexterm><primary>parallelism primitives</primary></indexterm>
-<indexterm><primary>primitives for parallelism</primary></indexterm></title>
-
-<para>
-The functions <function>par</function> and <function>seq</function> are wired into GHC, and unfold
-into uses of the <function>par&num;</function> and <function>seq&num;</function> primitives, respectively.  If
-you'd like to see this with your very own eyes, just run GHC with the
-<option>-ddump-simpl</option> option.  (Anything for a good time&hellip;)
-</para>
-
-</sect3>
-
-<sect3>
-<title>Scheduling policy for concurrent threads
-<indexterm><primary>Scheduling&mdash;concurrent</primary></indexterm>
-<indexterm><primary>Concurrent scheduling</primary></indexterm></title>
-
-<para>
-Runnable threads are scheduled in round-robin fashion.  Context
-switches are signalled by the generation of new sparks or by the
-expiry of a virtual timer (the timer interval is configurable with the
-<option>-C[&lt;num&gt;]</option><indexterm><primary>-C&lt;num&gt; RTS option (concurrent,
-parallel)</primary></indexterm> RTS option).  However, a context switch doesn't
-really happen until the current heap block is full.  You can't get any
-faster context switching than this.
-</para>
-
-<para>
-When a context switch occurs, pending sparks which have not already
-been reduced to weak head normal form are turned into new threads.
-However, there is a limit to the number of active threads (runnable or
-blocked) which are allowed at any given time.  This limit can be
-adjusted with the <option>-t&lt;num&gt;</option><indexterm><primary>-t &lt;num&gt; RTS option (concurrent, parallel)</primary></indexterm>
-RTS option (the default is 32).  Once the
-thread limit is reached, any remaining sparks are deferred until some
-of the currently active threads are completed.
-</para>
-
-</sect3>
-
-<sect3>
-<title>Scheduling policy for parallel threads
-<indexterm><primary>Scheduling&mdash;parallel</primary></indexterm>
-<indexterm><primary>Parallel scheduling</primary></indexterm></title>
-
-<para>
-In GUM we use an unfair scheduler, which means that a thread continues to
-perform graph reduction until it blocks on a closure under evaluation, on a
-remote closure or until the thread finishes. 
-</para>
- 
-</sect3>
-
-</sect2>
+                           n2 = nfib (n-2)</programlisting>
+
+    <para>For values of <varname>n</varname> greater than 1, we use
+      <function>par</function> to spark a thread to evaluate <literal>nfib (n-1)</literal>,
+      and then we use <function>seq</function> to force the
+      parent thread to evaluate <literal>nfib (n-2)</literal> before going on
+      to add together these two subexpressions.  In this divide-and-conquer
+      approach, we only spark a new thread for one branch of the computation
+      (leaving the parent to evaluate the other branch).  Also, we must use
+      <function>seq</function> to ensure that the parent will evaluate
+      <varname>n2</varname> <emphasis>before</emphasis> <varname>n1</varname>
+      in the expression <literal>(n1 + n2 + 1)</literal>.  It is not sufficient
+      to reorder the expression as <literal>(n2 + n1 + 1)</literal>, because
+      the compiler may not generate code to evaluate the addends from left to
+      right.</para>
+
+    <para>When using <literal>par</literal>, the general rule of thumb is that
+      the sparked computation should be required at a later time, but not too
+      soon.  Also, the sparked computation should not be too small, otherwise
+      the cost of forking it in parallel will be too large relative to the
+      amount of parallelism gained.  Getting these factors right is tricky in
+      practice.</para>
+
+    <para>More sophisticated combinators for expressing parallelism are
+      available from the <ulink
+       url="../libraries/base/Control-Parallel-Strategies.html"><literal>Control.Parallel.Strategies</literal></ulink> module.
+      This module builds functionality around <literal>par</literal>,
+      expressing more elaborate patterns of parallel computation, such as
+      parallel <literal>map</literal>.</para>
+  </sect2>
  
  </sect1>