This section describes an experimental extension for high-performance arrays, which comprises special syntax for array types and array comprehensions, a set of optimising program transformations, and a set of special-purpose libraries. The extension is currently only partially implemented, but its development will be tracked here.
Parallel arrays originally got their name from the aim to provide an architecture-independent programming model for a range of parallel computers. However, since experiments showed that the approach is also worthwhile for sequential array code, the emphasis has shifted to their parallel evaluation semantics: as soon as any element in a parallel array is demanded, all the other elements are evaluated, too. This makes parallel arrays stricter than standard Haskell 98 arrays, but also opens the door to a loop-based implementation strategy that leads to significantly more efficient code.
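For instance (our own illustration, using the indexing operation !: described further below), under this evaluation semantics

  ([:0, error "boom":]) !: 0   -- also evaluates error "boom", and so fails
  [0, error "boom"] !! 0       -- yields 0

since demanding one element of a parallel array evaluates all of them.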
The programming model as well as the use of the flattening transformation, which is central to the approach, has its origin in the programming language Nesl.
The option -fparr, which is a dynamic hsc option that can be reversed with -fno-parr, enables special syntax for parallel arrays. The syntax is not essential to using parallel arrays, but makes for significantly more concise programs. The switch works by making the lexical analyser (located in Lex.lhs) recognise the tokens [: and :]. Given that the additional productions in the parser (located in Parser.y) cannot be triggered without the lexer generating the necessary tokens, there is no need to alter the behaviour of the parser.
The following additional syntax is accepted (the non-terminals are those from the Haskell 98 language definition):
atype -> '[:' type ':]'                              (parallel array type)
aexp  -> '[:' exp1 ',' ... ',' expk ':]'             (explicit array, k >= 0)
       | '[:' exp1 [',' exp2] '..' exp3 ':]'         (arithmetic array sequence)
       | '[:' exp '|' quals1 '|' ... '|' qualsn ':]' (array comprehension, n >= 1)
quals -> qual1 ',' ... ',' qualn                     (qualifier list, n >= 1)
apat  -> '[:' pat1 ',' ... ',' patk ':]'             (array pattern, k >= 0)
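To give a feel for the syntax, here is a small example of our own (assuming a sumP analogous to PrelList's sum, in the style of the operations provided by PrelPArr):

  -- dot product of two parallel arrays, using an array comprehension
  -- with parallel qualifiers
  dotp :: [:Double:] -> [:Double:] -> Double
  dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]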
Moreover, the extended comprehension syntax that allows for parallel qualifiers (i.e., qualifiers separated by "|") is also supported in list comprehensions. In general, the similarity to the special syntax for lists is obvious. The two main differences are that (a) arithmetic array sequences are always finite and (b) [::] is not treated as a constructor in expressions and patterns, but rather as a special case of the explicit array syntax. The former is a simple consequence of the parallel evaluation semantics of parallel arrays, and the latter is due to arrays not being a constructor-based data type.
As a naming convention, types and functions that are concerned with parallel arrays usually contain the string parr or PArr (often as a prefix), and where corresponding types or functions for handling lists exist, the name is identical, except for containing the substring parr instead of list (possibly in caps).
The following implications are worth noting explicitly:
[::] is an empty explicit parallel array (i.e., something of the form ExplicitPArr ty [] in the AST). This is in contrast to lists, which use the nil constructor instead. In the case of parallel arrays, using a constructor would be rather awkward, as it is not a constructor-based type. (This becomes rather clear in the desugarer.)
Array patterns have the general form [:p1, p2, ..., pn:], where n >= 0. Thus, two array patterns overlap iff they have the same length -- an important property for the pattern matching compiler.
The Prelude module PrelPArr defines the standard operations and their types on parallel arrays and provides a basic implementation based on boxed arrays. The interface of PrelPArr is modelled on H98's PrelList, but leaves out all functions that require infinite structures and adds frequently needed array operations, such as permutations. Parallel arrays are quite unique in that they use an entirely different representation as soon as the flattening transformation is activated, which is described further below. In particular, PrelPArr defines the type [::] and operations to create, process, and inspect parallel arrays. The type as well as the names of some of the operations are also hardwired in TysWiredIn (see the definition of parrTyCon in this module) and PrelNames. This is again very much like the case of lists, where the type is defined in PrelBase and similarly wired in; however, for lists the entirely constructor-based definition is exposed to user programs, which is not the case for parallel arrays.
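To convey the flavour of this interface, a minimal list-backed sketch might look as follows (a model of our own, not the actual PrelPArr code, which is based on boxed arrays and eagerly evaluates all elements):

  newtype PArr a = PArr [a]     -- stand-in for the built-in type [::]

  toP :: [a] -> PArr a          -- convert a list into a parallel array
  toP = PArr

  lengthP :: PArr a -> Int      -- number of elements
  lengthP (PArr xs) = length xs

  (!:) :: PArr a -> Int -> a    -- array indexing
  PArr xs !: i = xs !! i

  mapP :: (a -> b) -> PArr a -> PArr b
  mapP f (PArr xs) = PArr (map f xs)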
The parallel array extension requires the desugarer to replace all occurrences of (1) explicit parallel arrays, (2) array patterns, and (3) array comprehensions by a suitable combination of invocations of operations defined in PrelPArr.
These are by far the simplest to remove. We simply replace every occurrence of [:e1, ..., en:] by toP [e1, ..., en], i.e., we build a list of the array elements, which is then converted into a parallel array.
Array patterns are much more tricky, as element positions may contain further patterns and the pattern matching compiler requires us to flatten all those out. But before we turn to the gory details, here first the basic idea: a flat array pattern matches exactly iff its length corresponds to the length of the matched array. Hence, if we have a set of flat array patterns matching an array value a, it suffices to generate a Core case expression that scrutinises lengthP a and has one alternative for every length of array occurring in one of the patterns. Moreover, there needs to be a default case catching all other array lengths. In each alternative, array indexing (i.e., the function !:) is used to bind array elements to the corresponding pattern variables. This sounds easy enough and is essentially what the parallel array equation of the function DsUtils.mkCoAlgCaseMatchResult does.
Unfortunately, however, the pattern matching compiler expects that it can turn (almost) any pattern into variable patterns, literals, or constructor applications by way of the function Match.tidy1. And to make matters worse, some weird machinery in the module Check insists on being able to reverse the process (essentially to pretty print patterns in warnings for incomplete or overlapping patterns). The solution to this is an (unlimited) set of fake constructors for parallel arrays, courtesy of TysWiredIn.parrFakeCon. In other words, any pattern of the form [:p1, ..., pn:] is transformed into MkPArrn p1 ... pn (where MkPArrn is the fake constructor of arity n) by Match.tidy1, then run through the rest of the pattern matching compiler, and finally picked up by DsUtils.mkCoAlgCaseMatchResult, which converts it into a case expression as outlined above.
As an example, consider the source expression

  case v of
    [:x1:]         -> e1
    [:x2, x3, x4:] -> e2
    _              -> e3
Match.tidy1 converts it into a form that is equivalent to

  case v of {
    MkPArr1 x1       -> e1;
    MkPArr3 x2 x3 x4 -> e2;
    _                -> e3;
  }
which DsUtils.mkCoAlgCaseMatchResult turns into the following Core code:

  case lengthP v of
    Int# i# ->
      case i# of l {
        DFT -> e3
        1   -> let x1 = v!:0 in e1
        3   -> let x2 = v!:0; x3 = v!:1; x4 = v!:2 in e2
      }
The most challenging of the three constructs are array comprehensions. In principle, it would be possible to transform them in essentially the same way as list comprehensions, but this would lead to abysmally slow code, as desugaring of list comprehensions generates code that is optimised for sequential, constructor-based structures. In contrast, array comprehensions need to be transformed into code that solely relies on collective operations and avoids the creation of many small intermediate arrays.
The transformation is implemented by the function DsListComp.dsPArrComp. In the following, we denote this transformation function by the form <<e>> pa ea, where e is the comprehension to be compiled and the arguments pa and ea denote a pattern and the currently processed array expression, respectively. The invariant constraining these two arguments is that all elements in the array produced by ea will successfully match against pa.
Given a source-level comprehension [:e | qss:], we compile it with <<[:e | qss:]>> () [:():] using the rules
<<[:e' |            :]>> pa ea = mapP (\pa -> e') ea
<<[:e' | b     , qs:]>> pa ea = <<[:e' | qs:]>> pa (filterP (\pa -> b) ea)
<<[:e' | p <- e, qs:]>> pa ea =
  let ef = filterP (\x -> case x of {p -> True; _ -> False}) e
  in
  <<[:e' | qs:]>> (pa, p) (crossP ea ef)
<<[:e' | let ds, qs:]>> pa ea =
  <<[:e' | qs:]>> (pa, (x_1, ..., x_n))
                  (mapP (\v@pa -> (v, let ds in (x_1, ..., x_n))) ea)
  where
    {x_1, ..., x_n} = DV (ds)        -- Defined Variables
<<[:e' | qs | qss:]>> pa ea =
  <<[:e' | qss:]>> (pa, (x_1, ..., x_n))
                   (zipP ea <<[:(x_1, ..., x_n) | qs:]>>)
  where
    {x_1, ..., x_n} = DV (qs)
We assume the denotation of crossP to be given by

  crossP :: [:a:] -> [:b:] -> [:(a, b):]
  crossP a1 a2 = let
                   len1 = lengthP a1
                   len2 = lengthP a2
                   x1   = concatP $ mapP (replicateP len2) a1
                   x2   = concatP $ replicateP len1 a2
                 in
                   zipP x1 x2
For a more efficient implementation of crossP, see PrelPArr.
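For instance, under the denotation just given (our own illustration),

  crossP [:1, 2:] [:'a', 'b':]  ==  [:(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'):]

since each element of the first array is repeated once per element of the second, and the second array is repeated once per element of the first.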
Moreover, the following optimisations are important; the sketch after this list illustrates the first of them.

In the p <- e rule, if pa == (), drop it and simplify the crossP ea e to e.

It may pay off to introduce a combined map-and-filter operation

  mapFilterP :: (a -> Maybe b) -> [:a:] -> [:b:]

since, even with fusion, (mapP (\p -> e) . filterP (\p -> b)) may still result in redundant pattern matching operations. (Let's wait with this until we have seen what the Simplifier does to the generated code.)
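As a worked illustration of the rules (our own example, not taken from the implementation), consider the translation of a single-generator comprehension:

    <<[:x + 1 | x <- xs:]>> () [:():]
  = let ef = filterP (\v -> case v of {x -> True; _ -> False}) xs
    in <<[:x + 1 | :]>> ((), x) (crossP [:():] ef)
  = mapP (\((), x) -> x + 1) (crossP [:():] ef)

Since pa == () in the generator rule, the first optimisation applies: the crossP disappears, and as the filter predicate always succeeds for the irrefutable pattern x, the result is essentially mapP (\x -> x + 1) xs.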
On the quest towards an entirely unboxed representation of parallel arrays, the flattening transformation is the essential ingredient. GHC uses a substantially improved version of the transformation whose original form was described by Blelloch & Sabot. The flattening transformation replaces values of type [:a:] as well as functions operating on these values by alternative, more efficient data structures and functions.
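To convey the flavour of the transformation (a standard example from the flattening literature, not a rendering of GHC's actual data structures), a nested array is represented by a flat data array together with a segment descriptor recording the subarray lengths:

  -- nested array, as written in the source language
  [:[:1, 2, 3:], [::], [:4, 5:]:]

  -- flattened representation
  data = [:1, 2, 3, 4, 5:]   -- all elements in one flat array
  segd = [:3, 0, 2:]         -- lengths of the three subarrays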
The flattening machinery is activated by the option -fflatten, which is a static hsc option. It consists of two steps: function vectorisation and array specialisation. Only the first of those is implemented so far. If selected, the transformation is applied to a module in Core form immediately after the desugarer, before the Mighty Simplifier gets to do its job. After vectorisation, the Core program can be dumped using the option -ddump-vect. There is a good reason for us to perform flattening immediately after the desugarer. In HscMain.hscRecomp the so-called persistent compiler state is maintained, which contains all the information about imported interface files needed to look up the details of imported names (e.g., during renaming and type checking). However, all this information is zapped before the simplifier is invoked (supposedly to reduce the space consumption of GHC). As flattening has to get at all kinds of identifiers from Prelude modules, we need to do it before the relevant information in the persistent compiler state is gone.
As flattening generally requires all libraries to be compiled for flattening (just like profiling does), there is a compiler way "ndp", which can be selected using the way option code -ndp. This option will automagically select -fparr and -fflatten.
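For instance, compiling a module that uses the extension might then look as follows (an illustrative invocation; the file name is made up):

  ghc -ndp -c DotP.hs

which behaves as if both -fparr and -fflatten had been passed.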
FlattenMonad
The module FlattenMonad implements the monad Flatten that is used during vectorisation to keep track of various sets of bound variables and a variable substitution map; moreover, it provides a supply of new uniques and allows us to look up names in the persistent compiler state (i.e., imported identifiers).
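The following self-contained sketch conveys the shape of such a monad (our own illustration; the actual Flatten monad threads GHC-internal environments rather than these simplified types):

  import qualified Data.Map as Map

  data FlattenState = FlattenState
    { fsBound  :: [String]               -- bound variables currently in scope
    , fsSubst  :: Map.Map String String  -- variable substitution map
    , fsUnique :: Int                    -- supply of fresh uniques
    }

  newtype Flatten a
    = Flatten { runFlatten :: FlattenState -> (a, FlattenState) }

  -- draw a fresh unique from the supply
  newUnique :: Flatten Int
  newUnique = Flatten $ \s -> (fsUnique s, s { fsUnique = fsUnique s + 1 })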
In order to be able to look up imported identifiers in the persistent compiler state, it is important that these identifiers are included in the free variable lists computed by the renamer. More precisely, all names needed by flattening are included in the names produced by RnEnv.getImplicitModuleFVs. To avoid putting flattening-dependent lists of names into the renamer, the module FlattenInfo exports namesNeededForFlattening.
[FIXME: It might be worthwhile to document in the non-Flattening part of
the Commentary that the persistent compiler state is zapped after
desugaring and how the free variables determined by the renamer imply
which names are imported.]
Last modified: Tue Feb 12 01:44:21 EST 2002