+
+
+ This section describes an experimental extension for high-performance
+ arrays, which comprises special syntax for array types and array
+ comprehensions, a set of optimising program transformations, and a set
+ of special-purpose libraries. The extension is currently only partially
+ implemented, but its development will be tracked here.
+
+ Parallel arrays originally got their name from the aim to provide an + architecture-independent programming model for a range of parallel + computers. However, since experiments showed that the approach is also + worthwhile for sequential array code, the emphasis has shifted to their + parallel evaluation semantics: As soon as any element in a parallel + array is demanded, all the other elements are evaluated, too. This + makes parallel arrays more strict than standard Haskell 98 + arrays, but also opens the door for a loop-based implementation + strategy that leads to significantly more efficient code. +
+ The programming model as well as the use of the flattening + transformation, which is central to the approach, has its origin in + the programming language Nesl. + +
+ The option -fparr
, which is a dynamic hsc option that can
+ be reversed with -fno-parr
, enables special syntax for
+ parallel arrays, which is not essential to using parallel arrays, but
+ makes for significantly more concise programs. The switch works by
+ making the lexical analyser (located in Lex.lhs
)
+ recognise the tokens [:
and :]
. Given that
+ the additional productions in the parser (located in Parser.y
)
+ cannot be triggered without the lexer generating the necessary tokens,
+ there is no need to alter the behaviour of the parser.
+
+ The following additional syntax is accepted (the non-terminals are those + from the Haskell 98 language + definition): +
+
+
+atype -> '[:' type ':]'                                  (parallel array type)
+
+aexp  -> '[:' exp1 ',' ... ',' expk ':]'                 (explicit array, k >= 0)
+      |  '[:' exp1 [',' exp2] '..' exp3 ':]'             (arithmetic array sequence)
+      |  '[:' exp '|' quals1 '|' ... '|' qualsn ':]'     (array comprehension, n >= 1)
+
+quals -> qual1 ',' ... ',' qualn                         (qualifier list, n >= 1)
+
+apat  -> '[:' pat1 ',' ... ',' patk ':]'                 (array pattern, k >= 0)
+
+ Moreover, the extended comprehension syntax that allows for parallel
+ qualifiers (i.e., qualifiers separated by "|
") is also
+ supported in list comprehensions. In general, the similarity to the
+ special syntax for lists is obvious. The two main differences are that
+ (a) arithmetic array sequences are always finite and (b)
+ [::]
is not treated as a constructor in expressions and
+ patterns, but rather as a special case of the explicit array syntax.
+ The former is a simple consequence of the parallel evaluation semantics
+ of parallel arrays and the latter is due to arrays not being a
+ constructor-based data type.
+
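To make the analogy concrete, here is a hedged sketch: the definitions below use ordinary list syntax and run in any Haskell system, while the bracketed forms in the comments show what the corresponding parallel-array expressions would look like under -fparr.

```haskell
-- Ordinary list analogues of the special parallel array syntax; with
-- -fparr, the bracketed forms in the comments would denote parallel
-- arrays instead of lists.
explicitL :: [Int]
explicitL = [1, 2, 3]                      -- array form: [:1, 2, 3:]

sequenceL :: [Int]
sequenceL = [1, 3 .. 9]                    -- array form: [:1, 3 .. 9:] (always finite)

comprehensionL :: [Int]
comprehensionL = [x * x | x <- sequenceL, x > 3]
                       -- array form: [:x * x | x <- [:1, 3 .. 9:], x > 3:]
```

The comprehension form also admits parallel qualifiers separated by "|", exactly as in the extended list-comprehension syntax.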
+ As a naming convention, types and functions that are concerned with
+ parallel arrays usually contain the string parr
or
+ PArr
(often as a prefix), and where corresponding types or
+ functions for handling lists exist, the name is identical, except for
+ containing the substring parr
instead of list
+ (possibly in caps).
+
+ The following implications are worth noting explicitly: +
[::]
is an empty explicit
+ parallel array (i.e., something of the form ExplicitPArr ty
+ []
in the AST). This is in contrast to lists, which use the
+ nil-constructor instead. In the case of parallel arrays, using a
+ constructor would be rather awkward, as it is not a constructor-based
+ type. (This becomes rather clear in the desugarer.)
Array patterns have the general form [:p1,
+ p2, ..., pn:], where n >= 0. Thus, two array
+ patterns overlap iff they have the same length -- an important property
+ for the pattern matching compiler.
+
+ The Prelude module PrelPArr
+ defines the standard operations and their types on parallel arrays and
+ provides a basic implementation based on boxed arrays. The interface of
+ PrelPArr is modelled on H98's PrelList, but
+ leaves out all functions that require infinite structures and adds
+ frequently needed array operations, such as permutations. Parallel
+ arrays are quite unique in that they use an entirely different
+ representation as soon as the flattening transformation is activated,
+ which is described further below. In particular, PrelPArr
+ defines the type [::] and operations to create, process,
+ and inspect parallel arrays. The type as well as the names of some of
+ the operations are also hardwired in TysWiredIn
+ (see the definition of parrTyCon in this module) and PrelNames.
+ This is again very much like the case of lists, where the type is
+ defined in PrelBase
+ and similarly wired in; however, for lists the entirely
+ constructor-based definition is exposed to user programs, which is not
+ the case for parallel arrays.
+
+
+ The parallel array extension requires the desugarer to replace all
+ occurrences of (1) explicit parallel arrays, (2) array patterns, and (3)
+ array comprehensions by a suitable combination of invocations of
+ operations defined in PrelPArr
.
+
+
+ These are by far the simplest to remove. We simply replace every
+ occurrence of [:e1, ...,
+ en:]
by
+
+
+ toP [e1, ..., en]
+
+
+
+ i.e., we build a list of the array elements, which is then converted
+ into a parallel array.
+
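As a hedged illustration, the replacement can be modelled with a toy PArr type represented as a plain list (the real PrelPArr implementation uses boxed arrays, and later an entirely different flattened representation):

```haskell
-- Toy model of parallel arrays as a list wrapper; the real PrelPArr
-- uses boxed arrays (and, with flattening, unboxed representations).
newtype PArr a = PArr [a] deriving (Eq, Show)

-- toP converts an ordinary list into a parallel array; it is the
-- target of desugaring explicit array expressions.
toP :: [a] -> PArr a
toP = PArr

fromP :: PArr a -> [a]
fromP (PArr xs) = xs

-- The desugarer turns [:e1, e2, e3:] into: toP [e1, e2, e3]
example :: PArr Int
example = toP [1, 2, 3]
```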
+ Array patterns are much trickier, as element positions may contain
+ further patterns, and the pattern matching compiler
+ requires us to flatten all of those out. But before we turn to the gory
+ details, here is the basic idea first: a flat array pattern matches exactly
+ iff its length corresponds to the length of the matched array. Hence,
+ if we have a set of flat array patterns matching an array value
+ a, it suffices to generate a Core case
+ expression that scrutinises lengthP a and has one
+ alternative for every array length occurring in one of the patterns.
+ Moreover, there needs to be a default case catching all other array
+ lengths. In each alternative, array indexing (i.e., the function
+ !:) is used to bind array elements to the corresponding
+ pattern variables. This sounds easy enough and is essentially what the
+ parallel array equation of the function DsUtils.mkCoAlgCaseMatchResult
+ does.
+
+ Unfortunately, however, the pattern matching compiler expects that it
+ can turn (almost) any pattern into variable patterns, literals, or
+ constructor applications by way of the function Match
.tidy1
.
+ And to make matters worse, some weird machinery in the module Check
+ insists on being able to reverse the process (essentially to pretty
+ print patterns in warnings for incomplete or overlapping patterns).
+
+ The solution to this is an (unlimited) set of fake constructors
+ for parallel arrays, courtesy of TysWiredIn
.parrFakeCon
.
+ In other words, any pattern of the form [:p1,
+ ..., pn:]
is transformed into
+
+
+ MkPArrayn p1 ... pn
+
+
+
+ by Match.tidy1
, then, run through the rest of the pattern
+ matching compiler, and finally, picked up by
+ DsUtils.mkCoAlgCaseMatchResult
, which converts it into a
+ case
expression as outlined above.
+
+ As an example consider the source expression +
++case v of + [:x1:] -> e1 + [:x2, x3, x4:] -> e2 + _ -> e3+
+ Match.tidy1
converts it into a form that is equivalent to
+
+
+case v of {
+  MkPArray1 x1       -> e1;
+  MkPArray3 x2 x3 x4 -> e2;
+  _                  -> e3;
+}
+ which DsUtils.mkCoAlgCaseMatchResult
turns into the
+ following Core code:
+
+
+
+  case lengthP v of
+    I# i# ->
+      case i# of l {
+        DEFAULT -> e3
+        1 -> let x1 = v!:0 in e1
+        3 -> let x2 = v!:0; x3 = v!:1; x4 = v!:2 in e2
+      }
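The generated matching scheme can be mimicked in plain Haskell. The sketch below is an assumption-laden model: arrays are represented as lists, lengthP and (!:) are list analogues of the PrelPArr operations, and the right-hand sides e1, e2, e3 are replaced by strings for illustration.

```haskell
-- List analogues of the PrelPArr operations used by the generated code.
lengthP :: [a] -> Int
lengthP = length

(!:) :: [a] -> Int -> a
(!:) = (!!)

-- Length-driven matching as generated for the example above: one
-- alternative per array length occurring in the patterns, plus a default.
match :: [Int] -> String
match v =
  case lengthP v of
    1 -> let x1 = v !: 0 in "e1: " ++ show x1
    3 -> let x2 = v !: 0; x3 = v !: 1; x4 = v !: 2
         in "e2: " ++ show (x2, x3, x4)
    _ -> "e3"
```

Note how two array patterns of the same length would fall into the same alternative, which is exactly the overlap property mentioned earlier.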
+ The most challenging constructs of the three are array comprehensions.
+ In principle, it would be possible to transform them in essentially the
+ same way as list comprehensions, but this would lead to abysmally slow
+ code, as desugaring of list comprehensions generates code that is
+ optimised for sequential, constructor-based structures. In contrast,
+ array comprehensions need to be transformed into code that relies solely
+ on collective operations and avoids the creation of many small
+ intermediate arrays.
+ The transformation is implemented by the function DsListComp
.dsPArrComp
.
+ In the following, we denote this transformation function by the form
+ <<e>> pa ea
, where e
+ is the comprehension to be compiled and the arguments pa
+ and ea
denote a pattern and the currently processed array
+ expression, respectively. The invariant constraining these two
+ arguments is that all elements in the array produced by ea
+ will successfully match against pa
.
+
+ Given a source-level comprehension [:e | qss:]
, we compile
+ it with <<[:e | qss:]>> () [:():]
using the
+ rules
+
++<<[:e' | :]>> pa ea = mapP (\pa -> e') ea +<<[:e' | b , qs:]>> pa ea = <<[:e' | qs:]>> pa (filterP (\pa -> b) ea) +<<[:e' | p <- e, qs:]>> pa ea = + let ef = filterP (\x -> case x of {p -> True; _ -> False}) e + in + <<[:e' | qs:]>> (pa, p) (crossP ea ef) +<<[:e' | let ds, qs:]>> pa ea = + <<[:e' | qs:]>> (pa, (x_1, ..., x_n)) + (mapP (\v@pa -> (v, let ds in (x_1, ..., x_n))) ea) +where + {x_1, ..., x_n} = DV (ds) -- Defined Variables +<<[:e' | qs | qss:]>> pa ea = + <<[:e' | qss:]>> (pa, (x_1, ..., x_n)) + (zipP ea <<[:(x_1, ..., x_n) | qs:]>>) +where + {x_1, ..., x_n} = DV (qs)+
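As a small, hedged illustration of the guard-qualifier rule above (with mapP and filterP modelled as their list counterparts), a comprehension such as [:x * x | x <- xs, even x:] desugars, after simplifying away the unit pattern and the degenerate crossP, to roughly:

```haskell
-- List analogues of the collective operations used by the rules.
mapP :: (a -> b) -> [a] -> [b]
mapP = map

filterP :: (a -> Bool) -> [a] -> [a]
filterP = filter

-- [:x * x | x <- xs, even x:], by the generator rule followed by the
-- guard rule, becomes a filterP feeding a mapP:
squaresOfEvens :: [Int] -> [Int]
squaresOfEvens xs = mapP (\x -> x * x) (filterP (\x -> even x) xs)
```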
+ We assume the denotation of crossP
to be given by
+
++crossP :: [:a:] -> [:b:] -> [:(a, b):] +crossP a1 a2 = let + len1 = lengthP a1 + len2 = lengthP a2 + x1 = concatP $ mapP (replicateP len2) a1 + x2 = concatP $ replicateP len1 a2 + in + zipP x1 x2+
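The denotation above can be checked directly by modelling [:a:] as [a] and each *P operation as its list analogue; this runnable sketch follows the definition term by term:

```haskell
-- Runnable list-based rendering of the crossP denotation, modelling
-- [:a:] as [a]: pair every element of a1 with every element of a2.
crossP :: [a] -> [b] -> [(a, b)]
crossP a1 a2 =
  let len1 = length a1
      len2 = length a2
      x1   = concat (map (replicate len2) a1)  -- each a1 element repeated len2 times
      x2   = concat (replicate len1 a2)        -- a2 as a whole repeated len1 times
  in  zip x1 x2
```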
+ For a more efficient implementation of crossP
, see
+ PrelPArr
.
+
+ Moreover, the following optimisations are important: +
p <- e
rule, if pa == ()
, drop
+ it and simplify the crossP ea e
to e
.
+ ++mapFilterP :: (a -> Maybe b) -> [:a:] -> [:b:]+
+ Even with fusion, (mapP (\p -> e) . filterP (\p ->
+ b))
+ may still result in redundant pattern matching
+ operations. (Let's postpone this until we have seen what the
+ Simplifier does to the generated code.)
+
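A list-based sketch of what such a fused operation could look like; only the name and signature appear in the text above, so the body here is an assumption:

```haskell
-- Fused map-and-filter in one traversal, avoiding the intermediate
-- array that (mapP f . filterP p) would create; list-based sketch of
-- the mapFilterP signature given above.
mapFilterP :: (a -> Maybe b) -> [a] -> [b]
mapFilterP _ []     = []
mapFilterP f (x:xs) =
  case f x of
    Just y  -> y : mapFilterP f xs
    Nothing ->     mapFilterP f xs
```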
+ On the quest towards an entirely unboxed representation of parallel
+ arrays, the flattening transformation is the essential ingredient. GHC
+ uses a substantially
+ improved version of the transformation whose original form was
+ described by Blelloch & Sabot. The flattening transformation
+ replaces values of type [:a:]
as well as functions
+ operating on these values by alternative, more efficient data structures
+ and functions.
+
+ The flattening machinery is activated by the option
+ -fflatten
, which is a static hsc option. It consists of
+ two steps: function vectorisation and array specialisation. Only the
+ first of those is implemented so far. If selected, the transformation
+ is applied to a module in Core form immediately after the desugarer, before the Mighty Simplifier gets to do its
+ job. After vectorisation, the Core program can be dumped using the
+ option -ddump-vect
. There is a good reason for us to perform
+ flattening immediately after the desugarer. In HscMain
.hscRecomp
+ the so-called persistent compiler state is maintained, which
+ contains all the information about imported interface files needed to
+ look up the details of imported names (e.g., during renaming and type
+ checking). However, all this information is zapped before the
+ simplifier is invoked (supposedly to reduce the space-consumption of
+ GHC). As flattening has to get at all kinds of identifiers from Prelude
+ modules, we need to do it before the relevant information in the
+ persistent compiler state is gone.
+
+
+ As flattening generally requires all libraries to be compiled for
+ flattening (just like profiling does), there is a compiler way
+ "ndp"
, which can be selected using the way option
+ -ndp
. This option will automagically select
+ -fparr
and -fflatten
.
+
+
FlattenMonad
+ The module FlattenMonad
+ implements the monad Flatten
that is used during
+ vectorisation to keep track of various sets of bound variables and a
+ variable substitution map; moreover, it provides a supply of new uniques
+ and allows us to look up names in the persistent compiler state (i.e.,
+ imported identifiers).
+
+ In order to be able to look up imported identifiers in the persistent
+ compiler state, it is important that these identifiers are included in
+ the free variable lists computed by the renamer. More precisely, all
+ names needed by flattening are included in the names produced by
+ RnEnv.getImplicitModuleFVs
. To avoid putting
+ flattening-dependent lists of names into the renamer, the module
+ FlattenInfo
exports namesNeededForFlattening
.
+
+ [FIXME: It might be worthwhile to document in the non-Flattening part of
+ the Commentary that the persistent compiler state is zapped after
+ desugaring and how the free variables determined by the renamer imply
+ which names are imported.]
+
+
+ +Last modified: Tue Feb 12 01:44:21 EST 2002 + + + +