This section describes an experimental extension for high-performance arrays, which comprises special syntax for array types and array comprehensions, a set of optimising program transformations, and a set of special-purpose libraries. The extension is currently only partially implemented, but its development will be tracked here.
Parallel arrays originally got their name from the aim to provide an architecture-independent programming model for a range of parallel computers. However, since experiments showed that the approach is also worthwhile for sequential array code, the emphasis has shifted to their parallel evaluation semantics: as soon as any element in a parallel array is demanded, all the other elements are evaluated, too. This makes parallel arrays stricter than standard Haskell 98 arrays, but also opens the door to a loop-based implementation strategy that leads to significantly more efficient code.
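For instance (our own illustration, using the indexing operation !: described further below), under this evaluation semantics

  ([:0, error "boom":]) !: 0   -- also evaluates error "boom", and so fails
  [0, error "boom"] !! 0       -- yields 0

since demanding one element of a parallel array evaluates all of them.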
The programming model as well as the use of the flattening transformation, which is central to the approach, has its origin in the programming language Nesl.
The option -fparr, which is a dynamic hsc option that can be reversed with -fno-parr, enables special syntax for parallel arrays. The syntax is not essential to using parallel arrays, but makes for significantly more concise programs. The switch works by making the lexical analyser (located in Lex.lhs) recognise the tokens [: and :]. Given that the additional productions in the parser (located in Parser.y) cannot be triggered without the lexer generating the necessary tokens, there is no need to alter the behaviour of the parser.
The following additional syntax is accepted (the non-terminals are those from the Haskell 98 language definition):
atype -> '[:' type ':]'                              (parallel array type)
aexp  -> '[:' exp1 ',' ... ',' expk ':]'             (explicit array, k >= 0)
       | '[:' exp1 [',' exp2] '..' exp3 ':]'         (arithmetic array sequence)
       | '[:' exp '|' quals1 '|' ... '|' qualsn ':]' (array comprehension, n >= 1)
quals -> qual1 ',' ... ',' qualn                     (qualifier list, n >= 1)
apat  -> '[:' pat1 ',' ... ',' patk ':]'             (array pattern, k >= 0)
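To give a feel for the syntax, here is a small example of our own (assuming a sumP analogous to PrelList's sum, in the style of the operations provided by PrelPArr):

  -- dot product of two parallel arrays, using an array comprehension
  -- with parallel qualifiers
  dotp :: [:Double:] -> [:Double:] -> Double
  dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]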
Moreover, the extended comprehension syntax that allows for parallel qualifiers (i.e., qualifiers separated by "|") is also supported in list comprehensions. In general, the similarity to the special syntax for lists is obvious. The two main differences are that (a) arithmetic array sequences are always finite and (b) [::] is not treated as a constructor in expressions and patterns, but rather as a special case of the explicit array syntax. The former is a simple consequence of the parallel evaluation semantics of parallel arrays, and the latter is due to arrays not being a constructor-based data type.
As a naming convention, types and functions that are concerned with parallel arrays usually contain the string parr or PArr (often as a prefix), and where corresponding types or functions for handling lists exist, the name is identical, except for containing the substring parr instead of list (possibly in caps).
The following implications are worth noting explicitly:
[::] is an empty explicit parallel array (i.e., something of the form ExplicitPArr ty [] in the AST). This is in contrast to lists, which use the nil constructor instead. In the case of parallel arrays, using a constructor would be rather awkward, as it is not a constructor-based type. (This becomes rather clear in the desugarer.)
Array patterns have the general form [:p1, p2, ..., pn:], where n >= 0. Thus, two array patterns overlap iff they have the same length -- an important property for the pattern matching compiler.
The Prelude module PrelPArr defines the standard operations and their types on parallel arrays and provides a basic implementation based on boxed arrays. The interface of PrelPArr is modelled on H98's PrelList, but leaves out all functions that require infinite structures and adds frequently needed array operations, such as permutations. Parallel arrays are quite unique in that they use an entirely different representation as soon as the flattening transformation is activated, which is described further below. In particular, PrelPArr defines the type [::] and operations to create, process, and inspect parallel arrays. The type as well as the names of some of the operations are also hardwired in TysWiredIn (see the definition of parrTyCon in this module) and PrelNames. This is again very much like the case of lists, where the type is defined in PrelBase and similarly wired in; however, for lists the entirely constructor-based definition is exposed to user programs, which is not the case for parallel arrays.
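To convey the flavour of this interface, a minimal list-backed sketch might look as follows (a model of our own, not the actual PrelPArr code, which is based on boxed arrays and eagerly evaluates all elements):

  newtype PArr a = PArr [a]     -- stand-in for the built-in type [::]

  toP :: [a] -> PArr a          -- convert a list into a parallel array
  toP = PArr

  lengthP :: PArr a -> Int      -- number of elements
  lengthP (PArr xs) = length xs

  (!:) :: PArr a -> Int -> a    -- array indexing
  PArr xs !: i = xs !! i

  mapP :: (a -> b) -> PArr a -> PArr b
  mapP f (PArr xs) = PArr (map f xs)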
The parallel array extension requires the desugarer to replace all occurrences of (1) explicit parallel arrays, (2) array patterns, and (3) array comprehensions by a suitable combination of invocations of operations defined in PrelPArr.
These are by far the simplest to remove. We simply replace every occurrence of [:e1, ..., en:] by toP [e1, ..., en], i.e., we build a list of the array elements, which is then converted into a parallel array.
Array patterns are much more tricky, as element positions may contain further patterns and the pattern matching compiler requires us to flatten all those out. But before we turn to the gory details, here first the basic idea: a flat array pattern matches exactly iff its length corresponds to the length of the matched array. Hence, if we have a set of flat array patterns matching an array value a, it suffices to generate a Core case expression that scrutinises lengthP a and has one alternative for every length of array occurring in one of the patterns. Moreover, there needs to be a default case catching all other array lengths. In each alternative, array indexing (i.e., the function !:) is used to bind array elements to the corresponding pattern variables. This sounds easy enough and is essentially what the parallel array equation of the function DsUtils.mkCoAlgCaseMatchResult does.
Unfortunately, however, the pattern matching compiler expects that it can turn (almost) any pattern into variable patterns, literals, or constructor applications by way of the function Match.tidy1. And to make matters worse, some weird machinery in the module Check insists on being able to reverse the process (essentially to pretty print patterns in warnings for incomplete or overlapping patterns). The solution to this is an (unlimited) set of fake constructors for parallel arrays, courtesy of TysWiredIn.parrFakeCon. In other words, any pattern of the form [:p1, ..., pn:] is transformed into MkPArrn p1 ... pn (where MkPArrn is the fake constructor of arity n) by Match.tidy1, then run through the rest of the pattern matching compiler, and finally picked up by DsUtils.mkCoAlgCaseMatchResult, which converts it into a case expression as outlined above.
As an example, consider the source expression

  case v of
    [:x1:]         -> e1
    [:x2, x3, x4:] -> e2
    _              -> e3
Match.tidy1 converts it into a form that is equivalent to

  case v of {
    MkPArr1 x1       -> e1;
    MkPArr3 x2 x3 x4 -> e2;
    _                -> e3;
  }
which DsUtils.mkCoAlgCaseMatchResult turns into the following Core code:

  case lengthP v of
    Int# i# ->
      case i# of l {
        DFT -> e3
        1   -> let x1 = v!:0 in e1
        3   -> let x2 = v!:0; x3 = v!:1; x4 = v!:2 in e2
      }
The most challenging of the three constructs are array comprehensions. In principle, it would be possible to transform them in essentially the same way as list comprehensions, but this would lead to abysmally slow code, as desugaring of list comprehensions generates code that is optimised for sequential, constructor-based structures. In contrast, array comprehensions need to be transformed into code that solely relies on collective operations and avoids the creation of many small intermediate arrays.
The transformation is implemented by the function DsListComp.dsPArrComp. In the following, we denote this transformation function by the form <<e>> pa ea, where e is the comprehension to be compiled and the arguments pa and ea denote a pattern and the currently processed array expression, respectively. The invariant constraining these two arguments is that all elements in the array produced by ea will successfully match against pa.
Given a source-level comprehension [:e | qss:], we compile it with <<[:e | qss:]>> () [:():] using the rules
<<[:e' |            :]>> pa ea = mapP (\pa -> e') ea
<<[:e' | b     , qs:]>> pa ea = <<[:e' | qs:]>> pa (filterP (\pa -> b) ea)
<<[:e' | p <- e, qs:]>> pa ea =
  let ef = filterP (\x -> case x of {p -> True; _ -> False}) e
  in
  <<[:e' | qs:]>> (pa, p) (crossP ea ef)
<<[:e' | let ds, qs:]>> pa ea =
  <<[:e' | qs:]>> (pa, (x_1, ..., x_n))
                  (mapP (\v@pa -> (v, let ds in (x_1, ..., x_n))) ea)
  where
    {x_1, ..., x_n} = DV (ds)        -- Defined Variables
<<[:e' | qs | qss:]>> pa ea =
  <<[:e' | qss:]>> (pa, (x_1, ..., x_n))
                   (zipP ea <<[:(x_1, ..., x_n) | qs:]>>)
  where
    {x_1, ..., x_n} = DV (qs)
We assume the denotation of crossP to be given by

  crossP :: [:a:] -> [:b:] -> [:(a, b):]
  crossP a1 a2 = let
                   len1 = lengthP a1
                   len2 = lengthP a2
                   x1   = concatP $ mapP (replicateP len2) a1
                   x2   = concatP $ replicateP len1 a2
                 in
                   zipP x1 x2
For a more efficient implementation of crossP, see PrelPArr.
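For instance, under the denotation just given (our own illustration),

  crossP [:1, 2:] [:'a', 'b':]  ==  [:(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'):]

since each element of the first array is repeated once per element of the second, and the second array is repeated once per element of the first.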
Moreover, the following optimisations are important; the sketch after this list illustrates the first of them.

In the p <- e rule, if pa == (), drop it and simplify the crossP ea e to e.

It may pay off to introduce a combined map-and-filter operation

  mapFilterP :: (a -> Maybe b) -> [:a:] -> [:b:]

since, even with fusion, (mapP (\p -> e) . filterP (\p -> b)) may still result in redundant pattern matching operations. (Let's wait with this until we have seen what the Simplifier does to the generated code.)
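As a worked illustration of the rules (our own example, not taken from the implementation), consider the translation of a single-generator comprehension:

    <<[:x + 1 | x <- xs:]>> () [:():]
  = let ef = filterP (\v -> case v of {x -> True; _ -> False}) xs
    in <<[:x + 1 | :]>> ((), x) (crossP [:():] ef)
  = mapP (\((), x) -> x + 1) (crossP [:():] ef)

Since pa == () in the generator rule, the first optimisation applies: the crossP disappears, and as the filter predicate always succeeds for the irrefutable pattern x, the result is essentially mapP (\x -> x + 1) xs.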
On the quest towards an entirely unboxed representation of parallel arrays, the flattening transformation is the essential ingredient. GHC uses a substantially improved version of the transformation whose original form was described by Blelloch & Sabot. The flattening transformation replaces values of type [:a:] as well as functions operating on these values by alternative, more efficient data structures and functions.
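To convey the flavour of the transformation (a standard example from the flattening literature, not a rendering of GHC's actual data structures), a nested array is represented by a flat data array together with a segment descriptor recording the subarray lengths:

  -- nested array, as written in the source language
  [:[:1, 2, 3:], [::], [:4, 5:]:]

  -- flattened representation
  data = [:1, 2, 3, 4, 5:]   -- all elements in one flat array
  segd = [:3, 0, 2:]         -- lengths of the three subarrays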
The flattening machinery is activated by the option -fflatten, which is a static hsc option. It consists of two steps: function vectorisation and array specialisation. Only the first of those is implemented so far. If selected, the transformation is applied to a module in Core form immediately after the desugarer, before the Mighty Simplifier gets to do its job. After vectorisation, the Core program can be dumped using the option -ddump-vect. There is a good reason for us to perform flattening immediately after the desugarer. In HscMain.hscRecomp the so-called persistent compiler state is maintained, which contains all the information about imported interface files needed to look up the details of imported names (e.g., during renaming and type checking). However, all this information is zapped before the simplifier is invoked (supposedly to reduce the space consumption of GHC). As flattening has to get at all kinds of identifiers from Prelude modules, we need to do it before the relevant information in the persistent compiler state is gone.
As flattening generally requires all libraries to be compiled for flattening (just like profiling does), there is a compiler way "ndp", which can be selected using the way option code -ndp. This option will automagically select -fparr and -fflatten.
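For instance, compiling a module that uses the extension might then look as follows (an illustrative invocation; the file name is made up):

  ghc -ndp -c DotP.hs

which behaves as if both -fparr and -fflatten had been passed.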
FlattenMonad
The module FlattenMonad implements the monad Flatten that is used during vectorisation to keep track of various sets of bound variables and a variable substitution map; moreover, it provides a supply of new uniques and allows us to look up names in the persistent compiler state (i.e., imported identifiers).
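The following self-contained sketch conveys the shape of such a monad (our own illustration; the actual Flatten monad threads GHC-internal environments rather than these simplified types):

  import qualified Data.Map as Map

  data FlattenState = FlattenState
    { fsBound  :: [String]               -- bound variables currently in scope
    , fsSubst  :: Map.Map String String  -- variable substitution map
    , fsUnique :: Int                    -- supply of fresh uniques
    }

  newtype Flatten a
    = Flatten { runFlatten :: FlattenState -> (a, FlattenState) }

  -- draw a fresh unique from the supply
  newUnique :: Flatten Int
  newUnique = Flatten $ \s -> (fsUnique s, s { fsUnique = fsUnique s + 1 })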
In order to be able to look up imported identifiers in the persistent compiler state, it is important that these identifiers are included in the free variable lists computed by the renamer. More precisely, all names needed by flattening are included in the names produced by RnEnv.getImplicitModuleFVs. To avoid putting flattening-dependent lists of names into the renamer, the module FlattenInfo exports namesNeededForFlattening.
[FIXME: It might be worthwhile to document in the non-Flattening part of
the Commentary that the persistent compiler state is zapped after
desugaring and how the free variables determined by the renamer imply
which names are imported.]
Last modified: Tue Feb 12 01:44:21 EST 2002