ghc/docs/comm/exts/ndp.html

   1 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
   2 <html>
   3   <head>
   4     <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
   5     <title>The GHC Commentary - Parallel Arrays</title>
   6   </head>
   7
   8   <body BGCOLOR="FFFFFF">
   9     <h1>The GHC Commentary - Parallel Arrays</h1>
  10     <p>
  11       This section describes an experimental extension by high-performance
  12       arrays, which comprises special syntax for array types and array
  13       comprehensions, a set of optimising program transformations, and a set
  14       of special purpose libraries.  The extension is currently only partially
  15       implemented, but the development will be tracked here.
  16     <p>
  17       Parallel arrays originally got their name from the aim to provide an
  18       architecture-independent programming model for a range of parallel
  19       computers.  However, since experiments showed that the approach is also
  20       worthwhile for sequential array code, the emphasis has shifted to their
  21       parallel evaluation semantics: As soon as any element in a parallel
  22       array is demanded, all the other elements are evaluated, too.  This
  23       makes parallel arrays more strict than <a
  24       href="http://haskell.org/onlinelibrary/array.html">standard Haskell 98
  25       arrays</a>, but also opens the door for a loop-based implementation
  26       strategy that leads to significantly more efficient code.
  27     <p>
  28       The programming model as well as the use of the <em>flattening
  29       transformation</em>, which is central to the approach, has its origin in
  30       the programming language <a
  31       href="http://www.cs.cmu.edu/~scandal/nesl.html">Nesl</a>.
  32
  33     <h2>More Sugar: Special Syntax for Array Comprehensions</h2>
  34     <p>
  35       The option <code>-fparr</code>, which is a dynamic hsc option that can
  36       be reversed with <code>-fno-parr</code>, enables special syntax for
  37       parallel arrays, which is not essential to using parallel arrays, but
  38       makes for significantly more concise programs.  The switch works by
  39       making the lexical analyser (located in <a
  40       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/parser/Lex.lhs"><code>Lex.lhs</code></a>)
  41       recognise the tokens <code>[:</code> and <code>:]</code>.   Given that
  42       the additional productions in the parser (located in <a
  43       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/parser/Parser.y"><code>Parser.y</code></a>)
  44       cannot be triggered without the lexer generating the necessary tokens,
  45       there is no need to alter the behaviour of the parser.
  46     <p>
  47       The following additional syntax is accepted (the non-terminals are those
  48       from the <a href="http://haskell.org/onlinereport/">Haskell 98 language
  49       definition</a>):
  50     <p>
  51       <blockquote><pre>
  52 atype -> '[:' type ':]                               (parallel array type)
  53
  54 aexp  -> '[:' exp1 ',' ... ',' expk ':]'             (explicit array, k >= 0)
  55       |  '[:' exp1 [',' exp2] '..' exp3 ':]'         (arithmetic array sequence)
  56       |  '[:' exp '|' quals1 '|' ... '|' qualsn ':]' (array comprehension, n >= 1)
  57
  58 quals -> qual1 ',' ... ',' qualn                     (qualifier list, n >= 1)
  59
  60 apat  -> '[:' pat1 ',' ... ',' patk ':]'             (array pattern, k >= 0)
  61 </pre>
  62     </blockquote>
  63     <p>
  64       Moreover, the extended comprehension syntax that allows for <em>parallel
  65       qualifiers</em> (i.e., qualifiers separated by "<code>|</code>") is also
  66       supported in list comprehensions.  In general, the similarity to the
  67       special syntax for list is obvious.  The two main differences are that
  68       (a) arithmetic array sequences are always finite and (b)
  69       <code>[::]</code> is not treated as a constructor in expressions and
  70       patterns, but rather as a special case of the explicit array syntax.
  71       The former is a simple consequence of the parallel evaluation semantics
  72       of parallel arrays and the latter is due to arrays not being a
  73       constructor-based data type.
  74     <p>
  75       As a naming convention, types and functions that are concerned with
  76       parallel arrays usually contain the string <code>parr</code> or
  77       <code>PArr</code> (often as a prefix), and where corresponding types or
  78       functions for handling lists exist, the name is identical, except for
  79       containing the substring <code>parr</code> instead of <code>list</code>
  80       (possibly in caps).
  81     <p>
  82       The following implications are worth noting explicitly:
  83     <ul>
  84       <li>As the value and pattern <code>[::]</code> is an empty explicit
  85         parallel array (i.e., something of the form <code>ExplicitPArr ty
  86         []</code> in the AST).  This is in contrast to lists, which use the
  87         nil-constructor instead.  In the case of parallel arrays, using a
  88         constructor would be rather awkward, as it is not a constructor-based
  89         type. (This becomes rather clear in the desugarer.)
  90       <li>As a consequence, array patterns have the general form <code>[:p1,
  91           p2, ..., pn:]</code>, where <code>n</code> >= 0.  Thus, two array
  92         patterns overlap iff they have the same length -- an important property
  93         for the pattern matching compiler.
  94     </ul>
  95
  96     <h2>Prelude Support for Parallel Arrays</h2>
  97     <p>
  98       The Prelude module <a
  99       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/lib/std/PrelPArr.lhs"><code>PrelPArr</code></a>
 100       defines the standard operations and their types on parallel arrays and
 101       provides a basic implementation based on boxed arrays.  The interface of
 102       <code>PrelPArr</code> is oriented by H98's <code>PrelList</code>, but
 103       leaving out all functions that require infinite structures and adding
 104       frequently needed array operations, such as permutations.  Parallel
 105       arrays are quite unqiue in that they use an entirely different
 106       representation as soon as the flattening transformation is activated,
 107       which is described further below.  In particular, <code>PrelPArr</code>
 108       defines the type <code>[::]</code> and operations to create, process,
 109       and inspect parallel arrays.  The type as well as the names of some of
 110       the operations are also hardwired in <a
 111       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/prelude/TysWiredIn.lhs"><code>TysWiredIn</code></a>
 112       (see the definition of <code>parrTyCon</code> in this module) and <a
 113       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/prelude/PrelNames.lhs"><code>PrelNames</code></a>.
 114       This is again very much like the case of lists, where the type is
 115       defined in <a
 116       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/lib/std/PrelBase.lhs"><code>PrelBase</code></a>
 117       and similarly wired in; however, for lists the entirely
 118       constructor-based definition is exposed to user programs, which is not
 119       the case for parallel arrays.
 120
 121     <h2>Desugaring Parallel Arrays</h2>
 122     <p>
 123       The parallel array extension requires the desugarer to replace all
 124       occurrences of (1) explicit parallel arrays, (2) array patterns, and (3)
 125       array comprehensions by a suitable combination of invocations of
 126       operations defined in <code>PrelPArr</code>.
 127
 128     <h4>Explicit Parallel Arrays</h4>
 129     <p>
 130       These are by far the simplest to remove.  We simply replace every
 131       occurrence of <code>[:<i>e<sub>1</sub></i>, ...,
 132       <i>e<sub>n</sub></i>:]</code> by
 133     <blockquote>
 134       <code>
 135         toP [<i>e<sub>1</sub></i>, ..., <i>e<sub>n</sub></i>]
 136       </code>
 137     </blockquote>
 138     <p>
 139       i.e., we build a list of the array elements, which is, then, converted
 140       into a parallel array.
 141
 142     <h4>Parallel Array Patterns</h4>
 143     <p>
 144       Array patterns are much more tricky as element positions may contain
 145       further patterns and the <a
 146       href="../the-beast/desugar.html#patmat">pattern matching compiler</a>
 147       requires us to flatten all those out.  But before we turn to the gory
 148       details, here first the basic idea: A flat array pattern matches exactly
 149       iff it's length corresponds to the length of the matched array.  Hence,
 150       if we have a set of flat array patterns matching an array value
 151       <code>a</code>, it suffices to generate a Core <code>case</code>
 152       expression that scrutinises <code>lengthP a</code> and has one
 153       alternative for every length of array occuring in one of the patterns.
 154       Moreover, there needs to be a default case catching all other array
 155       lengths.  In each alternative, array indexing (i.e., the functions
 156       <code>!:</code>) is used to bind array elements to the corresponding
 157       pattern variables.  This sounds easy enough and is essentially what the
 158       parallel array equation of the function <a
 159       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/DsUtils.lhs"><code>DsUtils</code></a><code>.mkCoAlgCaseMatchResult</code>
 160       does.
 161     <p>
 162       Unfortunately, however, the pattern matching compiler expects that it
 163       can turn (almost) any pattern into variable patterns, literals, or
 164       constructor applications by way of the functions <a
 165       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/Match.lhs"><code>Match</code></a><code>.tidy1</code>.
 166       And to make matters worse, some weird machinery in the module <a
 167       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/Check.lhs"><code>Check</code></a>
 168       insists on being able to reverse the process (essentially to pretty
 169       print patterns in warnings for incomplete or overlapping patterns).
 170     <p>
 171       The solution to this is an (unlimited) set of <em>fake</em> constructors
 172       for parallel arrays, courtesy of <a
 173       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/prelude/TysWiredIn.lhs"><code>TysWiredIn</code></a><code>.parrFakeCon</code>.
 174       In other words, any pattern of the form <code>[:<i>p<sub>1</sub></i>,
 175       ..., <i>p<sub>n</sub></i>:]</code> is transformed into
 176     <blockquote>
 177       <code>
 178         MkPArray<i>n</i> <i>p<sub>1</sub></i> ... <i>p<sub>n</sub></i>
 179       </code>
 180     </blockquote>
 181     <p>
 182       by <code>Match.tidy1</code>, then, run through the rest of the pattern
 183       matching compiler, and finally, picked up by
 184       <code>DsUtils.mkCoAlgCaseMatchResult</code>, which converts it into a
 185       <code>case</code> expression as outlined above.
 186     <p>
 187       As an example consider the source expression
 188     <blockquote><pre>
 189 case v of
 190   [:x1:]         -> e1
 191   [:x2, x3, x4:] -> e2
 192   _              -> e3</pre>
 193     </blockquote>
 194     <p>
 195       <code>Match.tidy1</code> converts it into a form that is equivalent to
 196     <blockquote><pre>
 197 case v of {
 198   MkPArr1 x1       -> e1;
 199   MkPArr2 x2 x3 x4 -> e2;
 200   _                -> e3;
 201 }</pre>
 202     </blockquote>
 203     <p>
 204       which <code>DsUtils.mkCoAlgCaseMatchResult</code> turns into the
 205       following Core code:
 206     <blockquote><pre>
 207       case lengthP v of
 208         Int# i# ->
 209           case i# of l {
 210             DFT ->                                        e3
 211             1   -> let x1 = v!:0                       in e1
 212             3   -> let x2 = v!:0; x2 = v!:1; x3 = v!:2 in e2
 213           }</pre>
 214     </blockquote>
 215
 216     <h4>Parallel Array Comprehensions</h4>
 217     <p>
 218       The most challenging construct of the three are array comprehensions.
 219       In principle, it would be possible to transform them in essentially the
 220       same way as list comprehensions, but this would lead to abysmally slow
 221       code as desugaring of list comprehensions generates code that is
 222       optimised for sequential, constructor-based structures.  In contrast,
 223       array comprehensions need to be transformed into code that solely relies
 224       on collective operations and avoids the creation of many small
 225       intermediate arrays.
 226     <p>
 227       The transformation is implemented by the function <a
 228       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/deSugar/DsListComp.lhs"><code>DsListComp</code></a><code>.dsPArrComp</code>.
 229       In the following, we denote this transformation function by the form
 230       <code>&lt;&lt;<i>e</i>&gt;&gt; pa ea</code>, where <code><i>e</i></code>
 231       is the comprehension to be compiled and the arguments <code>pa</code>
 232       and <code>ea</code> denote a pattern and the currently processed array
 233       expression, respectively.  The invariant constraining these two
 234       arguments is that all elements in the array produced by <code>ea</code>
 235       will <em>successfully</em> match against <code>pa</code>.
 236     <p>
 237       Given a source-level comprehensions <code>[:e | qss:]</code>, we compile
 238       it with <code>&lt;&lt;[:e | qss:]&gt;&gt; () [:():]</code> using the
 239       rules
 240     <blockquote><pre>
 241 <<[:e' |           :]>> pa ea = mapP (\pa -> e') ea
 242 <<[:e' | b     , qs:]>> pa ea = <<[:e' | qs:]>> pa (filterP (\pa -> b) ea)
 243 <<[:e' | p <- e, qs:]>> pa ea =
 244   let ef = filterP (\x -> case x of {p -> True; _ -> False}) e
 245   in
 246   <<[:e' | qs:]>> (pa, p) (crossP ea ef)
 247 <<[:e' | let ds, qs:]>> pa ea =
 248   <<[:e' | qs:]>> (pa, (x_1, ..., x_n))
 249               (mapP (\v@pa -> (v, let ds in (x_1, ..., x_n))) ea)
 250 where
 251   {x_1, ..., x_n} = DV (ds)             -- Defined Variables
 252 <<[:e' | qs | qss:]>>   pa ea =
 253   <<[:e' | qss:]>> (pa, (x_1, ..., x_n))
 254                (zipP ea <<[:(x_1, ..., x_n) | qs:]>>)
 255 where
 256   {x_1, ..., x_n} = DV (qs)</pre>
 257     </blockquote>
 258     <p>
 259       We assume the denotation of <code>crossP</code> to be given by
 260     <blockquote><pre>
 261 crossP       :: [:a:] -> [:b:] -> [:(a, b):]
 262 crossP a1 a2  = let
 263                 len1 = lengthP a1
 264                 len2 = lengthP a2
 265                 x1   = concatP $ mapP (replicateP len2) a1
 266                 x2   = concatP $ replicateP len1 a2
 267               in
 268               zipP x1 x2</pre>
 269     </blockquote>
 270     <p>
 271       For a more efficient implementation of <code>crossP</code>, see
 272       <a
 273       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/lib/std/PrelPArr.lhs"><code>PrelPArr</code></a>.
 274     <p>
 275       Moreover, the following optimisations are important:
 276     <ul>
 277       <li>In the <code>p &lt;- e</code> rule, if <code>pa == ()</code>, drop
 278         it and simplify the <code>crossP ea e</code> to <code>e</code>.
 279       <li>We assume that fusion will optimise sequences of array processing
 280         combinators.
 281       <li>FIXME: Do we want to have the following function?
 282         <blockquote><pre>
 283 mapFilterP :: (a -> Maybe b) -> [:a:] -> [:b:]</pre>
 284         </blockquote>
 285         <p>
 286           Even with fusion <code>(mapP (\p -&gt; e) . filterP (\p -&gt;
 287           b))</code> may still result in redundant pattern matching
 288           operations.  (Let's wait with this until we have seen what the
 289           Simplifier does to the generated code.)
 290     </ul>
 291
 292     <h2>Doing Away With Nested Arrays: The Flattening Transformation</h2>
 293     <p>
 294       On the quest towards an entirely unboxed representation of parallel
 295       arrays, the flattening transformation is the essential ingredient.  GHC
 296       uses a <a
 297       href="http://www.cse.unsw.edu.au/~chak/papers/CK00.html">substantially
 298       improved version</a> of the transformation whose original form was
 299       described by Blelloch &amp; Sabot.  The flattening transformation
 300       replaces values of type <code>[:a:]</code> as well as functions
 301       operating on these values by alternative, more efficient data structures
 302       and functions.
 303     <p>
 304       The flattening machinery is activated by the option
 305       <code>-fflatten</code>, which is a static hsc option.  It consists of
 306       two steps: function vectorisation and array specialisation.  Only the
 307       first of those is implemented so far.  If selected, the transformation
 308       is applied to a module in Core form immediately after the <a
 309       href="../the-beast/desugar.html">desugarer,</a> before the <a
 310       href="../the-beast/simplifier.html">Mighty Simplifier</a> gets to do its
 311       job.  After vectorisation, the Core program can be dumped using the
 312       option <code>-ddump-vect</code>.  The is a good reason for us to perform
 313       flattening immediately after the desugarer.  In <a
 314       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/main/HscMain.lhs"><code>HscMain</code></a><code>.hscRecomp</code>
 315       the so-called <em>persistent compiler state</em> is maintained, which
 316       contains all the information about imported interface files needed to
 317       lookup the details of imported names (e.g., during renaming and type
 318       checking).  However, all this information is zapped before the
 319       simplifier is invoked (supposedly to reduce the space-consumption of
 320       GHC).  As flattening has to get at all kinds of identifiers from Prelude
 321       modules, we need to do it before the relevant information in the
 322       persistent compiler state is gone.
 323
 324     <p>
 325       As flattening generally requires all libraries to be compiled for
 326       flattening (just like profiling does), there is a <em>compiler way</em>
 327       <code>"ndp"</code>, which can be selected using the way option code
 328       <code>-ndp</code>.  This option will automagically select
 329       <code>-fparr</code> and <code>-fflatten</code>.
 330
 331     <h4><code>FlattenMonad</code></h4>
 332     <p>
 333       The module <a
 334       href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/ndpFlatten/FlattenMonad.lhs"><code>FlattenMonad</code></a>
 335       implements the monad <code>Flatten</code> that is used during
 336       vectorisation to keep track of various sets of bound variables and a
 337       variable substitution map; moreover, it provides a supply of new uniques
 338       and allows us to look up names in the persistent compiler state (i.e.,
 339       imported identifiers).
 340     <p>
 341       In order to be able to look up imported identifiers in the persistent
 342       compiler state, it is important that these identifies are included into
 343       the free variable lists computed by the renamer.  More precisely, all
 344       names needed by flattening are included in the names produced by
 345       <code>RnEnv.getImplicitModuleFVs</code>.  To avoid putting
 346       flattening-dependent lists of names into the renamer, the module
 347       <code>FlattenInfo</code> exports <code>namesNeededForFlattening</code>.
 348
 349       [FIXME: It might be worthwhile to document in the non-Flattening part of
 350       the Commentary that the persistent compiler state is zapped after
 351       desugaring and how the free variables determined by the renamer imply
 352       which names are imported.]
 353
 354     <p><small>
 355 <!-- hhmts start -->
 356 Last modified: Tue Feb 12 01:44:21 EST 2002
 357 <!-- hhmts end -->
 358     </small>
 359   </body>
 360 </html>