+
+
+ This section describes an experimental extension for high-performance
+ arrays, which comprises special syntax for array types and array
+ comprehensions, a set of optimising program transformations, and a set
+ of special-purpose libraries. The extension is currently only partially
+ implemented, but its development will be tracked here.
+
+ Parallel arrays originally got their name from the aim to provide an + architecture-independent programming model for a range of parallel + computers. However, since experiments showed that the approach is also + worthwhile for sequential array code, the emphasis has shifted to their + parallel evaluation semantics: As soon as any element in a parallel + array is demanded, all the other elements are evaluated, too. This + makes parallel arrays more strict than standard Haskell 98 + arrays, but also opens the door for a loop-based implementation + strategy that leads to significantly more efficient code. +
+ The programming model as well as the use of the flattening + transformation, which is central to the approach, has its origin in + the programming language Nesl. + +
+ The option -fparr
, which is a dynamic hsc option that can
+ be reversed with -fno-parr
, enables special syntax for
+ parallel arrays, which is not essential to using parallel arrays, but
+ makes for significantly more concise programs. The switch works by
+ making the lexical analyser (located in Lex.lhs
)
+ recognise the tokens [:
and :]
. Given that
+ the additional productions in the parser (located in Parser.y
)
+ cannot be triggered without the lexer generating the necessary tokens,
+ there is no need to alter the behaviour of the parser.
+
+ The following additional syntax is accepted (the non-terminals are those + from the Haskell 98 language + definition): +
+
+
+atype -> '[:' type ':]'                                  (parallel array type)
+
+aexp  -> '[:' exp1 ',' ... ',' expk ':]'                 (explicit array, k >= 0)
+      |  '[:' exp1 [',' exp2] '..' exp3 ':]'             (arithmetic array sequence)
+      |  '[:' exp '|' quals1 '|' ... '|' qualsn ':]'     (array comprehension, n >= 1)
+
+quals -> qual1 ',' ... ',' qualn                         (qualifier list, n >= 1)
+
+apat  -> '[:' pat1 ',' ... ',' patk ':]'                 (array pattern, k >= 0)
+
+ Moreover, the extended comprehension syntax that allows for parallel
+ qualifiers (i.e., qualifiers separated by "|
") is also
+ supported in list comprehensions. In general, the similarity to the
+ special syntax for lists is obvious. The two main differences are that
+ (a) arithmetic array sequences are always finite and (b)
+ [::]
is not treated as a constructor in expressions and
+ patterns, but rather as a special case of the explicit array syntax.
+ The former is a simple consequence of the parallel evaluation semantics
+ of parallel arrays and the latter is due to arrays not being a
+ constructor-based data type.
+
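To make the analogy concrete, here is a hedged sketch: the definitions below use ordinary list syntax and run in any Haskell system, while the bracketed forms in the comments show what the corresponding parallel-array expressions would look like under -fparr.

```haskell
-- Ordinary list analogues of the special parallel array syntax; with
-- -fparr, the bracketed forms in the comments would denote parallel
-- arrays instead of lists.
explicitL :: [Int]
explicitL = [1, 2, 3]                      -- array form: [:1, 2, 3:]

sequenceL :: [Int]
sequenceL = [1, 3 .. 9]                    -- array form: [:1, 3 .. 9:] (always finite)

comprehensionL :: [Int]
comprehensionL = [x * x | x <- sequenceL, x > 3]
                       -- array form: [:x * x | x <- [:1, 3 .. 9:], x > 3:]
```

The comprehension form also admits parallel qualifiers separated by "|", exactly as in the extended list-comprehension syntax.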
+ As a naming convention, types and functions that are concerned with
+ parallel arrays usually contain the string parr
or
+ PArr
(often as a prefix), and where corresponding types or
+ functions for handling lists exist, the name is identical, except for
+ containing the substring parr
instead of list
+ (possibly in caps).
+
+ The following implications are worth noting explicitly: +
[::]
is an empty explicit
+ parallel array (i.e., something of the form ExplicitPArr ty
+ []
in the AST). This is in contrast to lists, which use the
+ nil-constructor instead. In the case of parallel arrays, using a
+ constructor would be rather awkward, as it is not a constructor-based
+ type. (This becomes rather clear in the desugarer.)
Array patterns have the general form [:p1,
+ p2, ..., pn:], where n >= 0. Thus, two array
+ patterns overlap iff they have the same length -- an important property
+ for the pattern matching compiler.
+
+ The Prelude module PrelPArr
+ defines the standard operations and their types on parallel arrays and
+ provides a basic implementation based on boxed arrays. The interface of
+ PrelPArr is modelled on H98's PrelList, but
+ leaves out all functions that require infinite structures and adds
+ frequently needed array operations, such as permutations. Parallel
+ arrays are quite unique in that they use an entirely different
+ representation as soon as the flattening transformation is activated,
+ which is described further below. In particular, PrelPArr
+ defines the type [::] and operations to create, process,
+ and inspect parallel arrays. The type as well as the names of some of
+ the operations are also hardwired in TysWiredIn
+ (see the definition of parrTyCon in this module) and PrelNames.
+ This is again very much like the case of lists, where the type is
+ defined in PrelBase
+ and similarly wired in; however, for lists the entirely
+ constructor-based definition is exposed to user programs, which is not
+ the case for parallel arrays.
+
+
+ The parallel array extension requires the desugarer to replace all
+ occurrences of (1) explicit parallel arrays, (2) array patterns, and (3)
+ array comprehensions by a suitable combination of invocations of
+ operations defined in PrelPArr
.
+
+
+ These are by far the simplest to remove. We simply replace every
+ occurrence of [:e1, ...,
+ en:]
by
+
+
+ toP [e1, ..., en]
+
+
+
+ i.e., we build a list of the array elements, which is then converted
+ into a parallel array.
+
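As a hedged illustration, the replacement can be modelled with a toy PArr type represented as a plain list (the real PrelPArr implementation uses boxed arrays, and later an entirely different flattened representation):

```haskell
-- Toy model of parallel arrays as a list wrapper; the real PrelPArr
-- uses boxed arrays (and, with flattening, unboxed representations).
newtype PArr a = PArr [a] deriving (Eq, Show)

-- toP converts an ordinary list into a parallel array; it is the
-- target of desugaring explicit array expressions.
toP :: [a] -> PArr a
toP = PArr

fromP :: PArr a -> [a]
fromP (PArr xs) = xs

-- The desugarer turns [:e1, e2, e3:] into: toP [e1, e2, e3]
example :: PArr Int
example = toP [1, 2, 3]
```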
+ Array patterns are much trickier, as element positions may contain
+ further patterns, and the pattern matching compiler
+ requires us to flatten all of those out. But before we turn to the gory
+ details, here is the basic idea first: a flat array pattern matches exactly
+ iff its length corresponds to the length of the matched array. Hence,
+ if we have a set of flat array patterns matching an array value
+ a, it suffices to generate a Core case
+ expression that scrutinises lengthP a and has one
+ alternative for every array length occurring in one of the patterns.
+ Moreover, there needs to be a default case catching all other array
+ lengths. In each alternative, array indexing (i.e., the function
+ !:) is used to bind array elements to the corresponding
+ pattern variables. This sounds easy enough and is essentially what the
+ parallel array equation of the function DsUtils.mkCoAlgCaseMatchResult
+ does.
+
+ Unfortunately, however, the pattern matching compiler expects that it
+ can turn (almost) any pattern into variable patterns, literals, or
+ constructor applications by way of the function Match
.tidy1
.
+ And to make matters worse, some weird machinery in the module Check
+ insists on being able to reverse the process (essentially to pretty
+ print patterns in warnings for incomplete or overlapping patterns).
+
+ The solution to this is an (unlimited) set of fake constructors
+ for parallel arrays, courtesy of TysWiredIn
.parrFakeCon
.
+ In other words, any pattern of the form [:p1,
+ ..., pn:]
is transformed into
+
+
+ MkPArrayn p1 ... pn
+
+
+
+ by Match.tidy1
, then, run through the rest of the pattern
+ matching compiler, and finally, picked up by
+ DsUtils.mkCoAlgCaseMatchResult
, which converts it into a
+ case
expression as outlined above.
+
+ As an example consider the source expression +
++case v of + [:x1:] -> e1 + [:x2, x3, x4:] -> e2 + _ -> e3+
+ Match.tidy1
converts it into a form that is equivalent to
+
+
+case v of {
+  MkPArray1 x1       -> e1;
+  MkPArray3 x2 x3 x4 -> e2;
+  _                  -> e3;
+}
+ which DsUtils.mkCoAlgCaseMatchResult
turns into the
+ following Core code:
+
+
+
+  case lengthP v of
+    I# i# ->
+      case i# of l {
+        DEFAULT -> e3
+        1 -> let x1 = v!:0 in e1
+        3 -> let x2 = v!:0; x3 = v!:1; x4 = v!:2 in e2
+      }
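The generated matching scheme can be mimicked in plain Haskell. The sketch below is an assumption-laden model: arrays are represented as lists, lengthP and (!:) are list analogues of the PrelPArr operations, and the right-hand sides e1, e2, e3 are replaced by strings for illustration.

```haskell
-- List analogues of the PrelPArr operations used by the generated code.
lengthP :: [a] -> Int
lengthP = length

(!:) :: [a] -> Int -> a
(!:) = (!!)

-- Length-driven matching as generated for the example above: one
-- alternative per array length occurring in the patterns, plus a default.
match :: [Int] -> String
match v =
  case lengthP v of
    1 -> let x1 = v !: 0 in "e1: " ++ show x1
    3 -> let x2 = v !: 0; x3 = v !: 1; x4 = v !: 2
         in "e2: " ++ show (x2, x3, x4)
    _ -> "e3"
```

Note how two array patterns of the same length would fall into the same alternative, which is exactly the overlap property mentioned earlier.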
+ The most challenging constructs of the three are array comprehensions.
+ In principle, it would be possible to transform them in essentially the
+ same way as list comprehensions, but this would lead to abysmally slow
+ code, as desugaring of list comprehensions generates code that is
+ optimised for sequential, constructor-based structures. In contrast,
+ array comprehensions need to be transformed into code that relies solely
+ on collective operations and avoids the creation of many small
+ intermediate arrays.
+ The transformation is implemented by the function DsListComp
.dsPArrComp
.
+ In the following, we denote this transformation function by the form
+ <<e>> pa ea
, where e
+ is the comprehension to be compiled and the arguments pa
+ and ea
denote a pattern and the currently processed array
+ expression, respectively. The invariant constraining these two
+ arguments is that all elements in the array produced by ea
+ will successfully match against pa
.
+
+ Given a source-level comprehension [:e | qss:]
, we compile
+ it with <<[:e | qss:]>> () [:():]
using the
+ rules
+
++<<[:e' | :]>> pa ea = mapP (\pa -> e') ea +<<[:e' | b , qs:]>> pa ea = <<[:e' | qs:]>> pa (filterP (\pa -> b) ea) +<<[:e' | p <- e, qs:]>> pa ea = + let ef = filterP (\x -> case x of {p -> True; _ -> False}) e + in + <<[:e' | qs:]>> (pa, p) (crossP ea ef) +<<[:e' | let ds, qs:]>> pa ea = + <<[:e' | qs:]>> (pa, (x_1, ..., x_n)) + (mapP (\v@pa -> (v, let ds in (x_1, ..., x_n))) ea) +where + {x_1, ..., x_n} = DV (ds) -- Defined Variables +<<[:e' | qs | qss:]>> pa ea = + <<[:e' | qss:]>> (pa, (x_1, ..., x_n)) + (zipP ea <<[:(x_1, ..., x_n) | qs:]>>) +where + {x_1, ..., x_n} = DV (qs)+
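As a small, hedged illustration of the guard-qualifier rule above (with mapP and filterP modelled as their list counterparts), a comprehension such as [:x * x | x <- xs, even x:] desugars, after simplifying away the unit pattern and the degenerate crossP, to roughly:

```haskell
-- List analogues of the collective operations used by the rules.
mapP :: (a -> b) -> [a] -> [b]
mapP = map

filterP :: (a -> Bool) -> [a] -> [a]
filterP = filter

-- [:x * x | x <- xs, even x:], by the generator rule followed by the
-- guard rule, becomes a filterP feeding a mapP:
squaresOfEvens :: [Int] -> [Int]
squaresOfEvens xs = mapP (\x -> x * x) (filterP (\x -> even x) xs)
```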
+ We assume the denotation of crossP
to be given by
+
++crossP :: [:a:] -> [:b:] -> [:(a, b):] +crossP a1 a2 = let + len1 = lengthP a1 + len2 = lengthP a2 + x1 = concatP $ mapP (replicateP len2) a1 + x2 = concatP $ replicateP len1 a2 + in + zipP x1 x2+
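The denotation above can be checked directly by modelling [:a:] as [a] and each *P operation as its list analogue; this runnable sketch follows the definition term by term:

```haskell
-- Runnable list-based rendering of the crossP denotation, modelling
-- [:a:] as [a]: pair every element of a1 with every element of a2.
crossP :: [a] -> [b] -> [(a, b)]
crossP a1 a2 =
  let len1 = length a1
      len2 = length a2
      x1   = concat (map (replicate len2) a1)  -- each a1 element repeated len2 times
      x2   = concat (replicate len1 a2)        -- a2 as a whole repeated len1 times
  in  zip x1 x2
```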
+ For a more efficient implementation of crossP
, see
+ PrelPArr
.
+
+ Moreover, the following optimisations are important: +
p <- e
rule, if pa == ()
, drop
+ it and simplify the crossP ea e
to e
.
+ ++mapFilterP :: (a -> Maybe b) -> [:a:] -> [:b:]+
+ Even with fusion, (mapP (\p -> e) . filterP (\p ->
+ b))
+ may still result in redundant pattern matching
+ operations. (Let's postpone this until we have seen what the
+ Simplifier does to the generated code.)
+
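A list-based sketch of what such a fused operation could look like; only the name and signature appear in the text above, so the body here is an assumption:

```haskell
-- Fused map-and-filter in one traversal, avoiding the intermediate
-- array that (mapP f . filterP p) would create; list-based sketch of
-- the mapFilterP signature given above.
mapFilterP :: (a -> Maybe b) -> [a] -> [b]
mapFilterP _ []     = []
mapFilterP f (x:xs) =
  case f x of
    Just y  -> y : mapFilterP f xs
    Nothing ->     mapFilterP f xs
```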
+ On the quest towards an entirely unboxed representation of parallel
+ arrays, the flattening transformation is the essential ingredient. GHC
+ uses a substantially
+ improved version of the transformation whose original form was
+ described by Blelloch & Sabot. The flattening transformation
+ replaces values of type [:a:]
as well as functions
+ operating on these values by alternative, more efficient data structures
+ and functions.
+
+ The flattening machinery is activated by the option
+ -fflatten
, which is a static hsc option. It consists of
+ two steps: function vectorisation and array specialisation. Only the
+ first of those is implemented so far. If selected, the transformation
+ is applied to a module in Core form immediately after the desugarer, before the Mighty Simplifier gets to do its
+ job. After vectorisation, the Core program can be dumped using the
+ option -ddump-vect
. There is a good reason for us to perform
+ flattening immediately after the desugarer. In HscMain
.hscRecomp
+ the so-called persistent compiler state is maintained, which
+ contains all the information about imported interface files needed to
+ look up the details of imported names (e.g., during renaming and type
+ checking). However, all this information is zapped before the
+ simplifier is invoked (supposedly to reduce the space-consumption of
+ GHC). As flattening has to get at all kinds of identifiers from Prelude
+ modules, we need to do it before the relevant information in the
+ persistent compiler state is gone.
+
+
+ As flattening generally requires all libraries to be compiled for
+ flattening (just like profiling does), there is a compiler way
+ "ndp"
, which can be selected using the way option
+ -ndp
. This option will automagically select
+ -fparr
and -fflatten
.
+
+
FlattenMonad
+ The module FlattenMonad
+ implements the monad Flatten
that is used during
+ vectorisation to keep track of various sets of bound variables and a
+ variable substitution map; moreover, it provides a supply of new uniques
+ and allows us to look up names in the persistent compiler state (i.e.,
+ imported identifiers).
+
+ In order to be able to look up imported identifiers in the persistent
+ compiler state, it is important that these identifiers are included in
+ the free variable lists computed by the renamer. More precisely, all
+ names needed by flattening are included in the names produced by
+ RnEnv.getImplicitModuleFVs
. To avoid putting
+ flattening-dependent lists of names into the renamer, the module
+ FlattenInfo
exports namesNeededForFlattening
.
+
+ [FIXME: It might be worthwhile to document in the non-Flattening part of
+ the Commentary that the persistent compiler state is zapped after
+ desugaring and how the free variables determined by the renamer imply
+ which names are imported.]
+
+
+ +Last modified: Tue Feb 12 01:44:21 EST 2002 + + + +