\documentstyle[11pt,slpj]{article}
\newcommand{\reference}[4]{ % authors, title, details, abstract
\newcommand{\Haskell}[1]{{\sc Haskell}}
\title{Abstracts of GRIP/GRASP-related papers and reports till 1989\\
Dept of Computing Science \\
University of Glasgow G12 8QQ}
\author{Cordelia Hall (cvh@cs.glasgow.ac.uk) \and
Kevin Hammond (kh@cs.glasgow.ac.uk) \and
Will Partain (partain@cs.glasgow.ac.uk) \and
Simon L Peyton Jones (simonpj@cs.glasgow.ac.uk) \and
Phil Wadler (wadler@cs.glasgow.ac.uk)}
We present a list of papers and reports related to the GRIP and GRASP
projects, covering {\em the design, compilation technology,
and parallel implementations of functional programming languages, especially
\Haskell{}}.
Most of them can be obtained by writing to
Teresa Currie, Dept of Computing Science,
University of Glasgow G12 8QQ, UK. Her electronic mail address is
teresa@uk.ac.glasgow.cs.
Those marked ($\spadesuit$) can be obtained from the School of Information
Systems, University of East Anglia, Norwich, UK.
Those marked ($\clubsuit$) can be obtained from Chris Clack at the
Department of Computer Science, University College London, Gower St,
London, UK.
\section{Published papers}
\reference{Simon L Peyton Jones and Jon Salkild}
{The Spineless Tagless G-machine}
{Proc IFIP Symposium on Functional Programming Languages and Computer
Architecture, London, Sept 1989}
{The Spineless Tagless G-machine is an abstract machine based on graph
reduction, designed as a target for compilers for non-strict functional
languages.
As its name implies, it is a development of earlier work, especially
the G-machine and Tim.
It has a number of unusual features: the abstract machine code is
rather higher-level than is common, allowing better code generation;
the representation of the graph eliminates most interpretive overheads;
vectored returns from data structures give fast case-analysis;
and the machine is readily extended for a parallel implementation.
The resulting implementation runs at least 75\% faster
than the Chalmers G-machine.}
\reference{Simon L Peyton Jones}
{Parallel implementations of functional programming languages}
{Computer Journal 32(2), pp175-186, April 1989}
{It is now very nearly as easy to build a parallel computer
as to build a sequential one, and there are strong incentives to do so:
parallelism seems to offer the opportunity to improve both the
absolute performance level and the cost/performance ratio of our machines.
One of the most attractive features of functional programming languages
is their suitability for programming such parallel computers.
This paper is devoted to a discussion of this claim.
First of all, we discuss parallel functional programming
from the programmer's point of view.
Since most parallel functional language implementations are based on graph
reduction, we then discuss some implementation issues raised by parallel
graph reduction.
The paper concludes with a case study of a particular parallel graph reduction
machine, GRIP, and a brief survey of other similar machines.}
\reference{Kevin Hammond}
{Implementing Functional Languages for Parallel Machines}
{PhD thesis, University of East Anglia, 1989 ($\spadesuit$)}
{Commencing with the Standard ML language, dialects XSML and PSML are
defined, which permit parallel evaluation of functional programs. XSML
is Standard ML with a novel mechanism for handling exceptions; PSML is a
side-effect free version of XSML. A formal semantics for PSML and a
translation algorithm from this language into Dactl, a compiler target
language based on the theory of graph-rewriting, are presented. The
thesis proves that a simplified version of this translation preserves
meaning for flat domains, and that the strategy for reduction to normal
form is correct.
The implementation of standard compilation techniques such as strictness
analysis, maximal free sub-expression elision and common sub-expression
elimination is considered with respect to Dactl, and problems
highlighted. Techniques are also presented for compiling
exception-handling correctly in a parallel environment, and for
compiling side-effects for a parallel machine. Metrics for performance
evaluation are presented, and results obtained using the Dactl reference
interpreter are reported.}
\reference{Simon L Peyton Jones, Chris Clack and Jon Salkild}
{High-performance parallel graph reduction}
{Proc Parallel Architectures and Languages Europe (PARLE), LNCS 365, pp193-207,
1989}
{Parallel graph reduction is an attractive implementation for functional
programming languages because of its simplicity and inherently distributed
nature.
This paper outlines some of the issues raised by parallel compiled
graph reduction, and presents the solutions we have adopted to produce an
efficient implementation.
We concentrate on two particular issues:
the efficient control of parallelism, resulting in an ability to alter
the granularity of parallelism;
and the efficient use of the memory hierarchy to improve locality.}
\reference{Phil Trinder and Philip Wadler}
{Improving list comprehension database queries}
{{\em TENCON '89\/} (IEEE Region 10 Conference),
Bombay, India, November 1989.}
{The task of increasing the efficiency of database queries has received
considerable attention. In this paper we describe the improvement of
queries expressed as list comprehensions in a lazy functional
language. The database literature identifies four algebraic and two
implementation-based improvement strategies. For each strategy we show
an equivalent improvement for queries expressed as list
comprehensions. This means that well-developed database algorithms
that improve queries using several of these strategies can be emulated
to improve comprehension queries. We are also able to improve queries
which require greater power than that provided by the relational
algebra. Most of the improvements entail transforming a simple,
inefficient query into a more complex, but more efficient form. We
illustrate each improvement using examples drawn from the database
literature.}
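To give the flavour of such improvements, here is a small sketch (ours, not
the paper's; the relations and predicate are hypothetical) of one algebraic
strategy, promoting a selection ahead of a join, written as list
comprehensions in \Haskell{}:

```haskell
-- Naive query: form all pairs from the join, then filter.
naive :: [(Int, String)] -> [(Int, Int)] -> [String]
naive emps depts =
  [ name | (d, name) <- emps, (d', _) <- depts, d == d', d > 1 ]

-- Improved query: apply the selection d > 1 before the join,
-- so fewer candidate pairs are ever examined.
improved :: [(Int, String)] -> [(Int, Int)] -> [String]
improved emps depts =
  [ name | (d, name) <- emps, d > 1, (d', _) <- depts, d == d' ]
```

Both queries denote the same result; the second merely does less work, which
is the general shape of the transformations the paper describes.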
\reference{Kevin Hammond}
{Exception Handling in a Parallel Functional Language: PSML}
{Proc TENCON '89, Bombay, India, Nov 1989}
{Handling exception occurrences during computation is a problem in most
functional programming languages, even when the computation is eager and
sequential. This paper presents a version of the error value method
which allows lazy computation with deterministic semantics for parallel
evaluation even in the presence of errors. The realisation of this
technique is illustrated by reference to PSML, a referentially
transparent variant of Standard ML designed for parallel evaluation.}
\reference{Philip Wadler}
{Theorems for free!}
{{\em 4'th International Conference on Functional Programming
Languages and Computer Architecture}, London, September 1989.}
{From the type of a polymorphic function we can derive a theorem
that it satisfies. Every function of the same type satisfies the same
theorem. This provides a free source of useful theorems,
courtesy of Reynolds' abstraction theorem for the polymorphic lambda
calculus.}
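As an illustration (ours, not the paper's): the type $r :: [a] \rightarrow [a]$
alone guarantees that $r$ commutes with {\tt map g}, for every $g$, because
$r$ can rearrange list structure but never touch the elements.

```haskell
-- Hypothetical witness for the free theorem at type [a] -> [a]:
-- map g . r == r . map g for any g.  Here r = reverse, but any
-- function of this type would satisfy the same equation.
r :: [a] -> [a]
r = reverse

lhs, rhs :: [Int]
lhs = map (* 2) (r [1, 2, 3])
rhs = r (map (* 2) [1, 2, 3])
-- lhs and rhs are equal, exactly as the theorem predicts
```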
\reference{Philip Wadler and Stephen Blott}
{How to make {\em ad-hoc\/} polymorphism less {\em ad hoc}}
{{\em 16'th ACM Symposium on Principles of Programming Languages},
Austin, Texas, January 1989.}
{This paper presents {\em type classes}, a new approach to {\em
ad-hoc\/} polymorphism. Type classes permit overloading of arithmetic
operators such as multiplication, and generalise the ``eqtype variables''
of Standard ML.
Type classes extend the Hindley--Milner polymorphic type system, and
provide a new approach to issues that arise in object-oriented
programming, bounded type quantification, and abstract data types.
This paper provides an informal introduction to type classes, and
defines them formally by means of type inference rules.}
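A minimal sketch of the idea (ours; the class {\tt Eq'} and function
{\tt member} are hypothetical names chosen to avoid the Prelude): an
overloaded operator is resolved by the compiler selecting an instance, and a
class constraint in a type is witnessed at run time by a dictionary of methods.

```haskell
-- A class of types supporting equality, with instances chosen by type.
class Eq' a where
  eq :: a -> a -> Bool

instance Eq' Int where
  eq = (==)

-- Instances compose: equality on lists is built from equality on elements.
instance Eq' a => Eq' [a] where
  eq []       []       = True
  eq (x : xs) (y : ys) = eq x y && eq xs ys
  eq _        _        = False

-- An overloaded function: the Eq' a constraint becomes a dictionary argument.
member :: Eq' a => a -> [a] -> Bool
member x = any (eq x)
```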
\reference{Kevin Hammond}
{Implementing Type Classes for Haskell}
{Proc Glasgow Workshop on Functional Programming, Fraserburgh, Aug 1989}
{This paper describes the implementation of the type class mechanism for
the functional language Haskell, which has been undertaken at Glasgow
University. A simple introduction to type classes discusses the methods
used to select operators and dictionaries in the Glasgow Haskell
compiler. A solution to the problem of selecting super-class
dictionaries, not considered by the original paper on type classes, is
also presented. The modifications which must be made to the standard
Hindley--Milner type-checking algorithm to permit the translation of
operators are described, and a revised definition of algorithm W is
provided. Finally, a set of performance figures compares the run-time
efficiency of Haskell and LML programs, indicating the overhead inherent
in the original, naive method of operator selection, and the improvement
which may be obtained through simple optimisations.}
\reference{Simon L Peyton Jones}
{FLIC - a functional language intermediate code}
{SIGPLAN Notices 23(8) 1988, revised 1989}
{FLIC is a Functional Language Intermediate Code, intended to
provide a common intermediate language between diverse
implementations of functional languages, including parallel
implementations.
This paper gives a formal definition of FLIC's syntax and
semantics, in the hope that its existence may encourage greater
exchange of programs and benchmarks between research groups.}
\reference{Simon L Peyton Jones, Chris Clack, Jon Salkild, Mark Hardie}
{Functional programming on the GRIP multiprocessor}
{Proc IEE Seminar on Digital Parallel Processors, Lisbon, Portugal, 1988}
{Most MIMD computer architectures can be classified as
tightly-coupled or loosely-coupled,
depending on the relative latencies seen by a processor accessing different
parts of its address space.
By adding microprogrammable functionality to the memory units, we have
developed a MIMD computer architecture which explores the middle region
of this spectrum.
This has resulted in an unusual and flexible bus-based multiprocessor,
which we are using as a base for our research in parallel functional
programming.
In this paper we introduce parallel functional programming, and describe
the architecture of the GRIP multiprocessor.}
\reference{Geoffrey Burn, Simon L Peyton Jones, and John Robson}
{The spineless G-machine}
{Proc ACM Conference on Lisp and Functional Programming, Snowbird, pp244-258,
1988}
{Recent developments in functional language implementations have
resulted in the G-machine, a programmed graph-reduction machine.
Taking this as a basis, we introduce an optimised method of
performing graph reduction, which does not need to build the
spine of the expression being reduced.
This Spineless G-machine only updates shared expressions, and
then only when they have been reduced to weak head normal form.
It is thus more efficient than the standard method of performing
graph reduction.
We begin by outlining the philosophy and key features of the
Spineless G-machine, and comparing it with the standard
G-machine.
Simulation results for the two machines are then presented and
discussed.
The Spineless G-machine is also compared with Tim, giving a
series of transformations by which they can be interconverted.
These open up a wide design space for abstract graph reduction
machines, which was previously unknown.
A full specification of the machine is given in the appendix,
together with compilation rules for a simple functional language.}
\reference{Chris Hankin, Geoffrey Burn, and Simon L Peyton Jones}
{A safe approach to parallel combinator reduction}
{Theoretical Computer Science 56, pp17-36, North Holland, 1988}
{In this paper we present the results of two pieces of work which, when
combined, allow us to take a program text of a functional language and
produce a parallel implementation of that program.
We present the techniques for discovering sources of parallelism in
a program at compile time, and then show how this parallelism is
naturally mapped onto a parallel combinator set that we will define.
To discover sources of parallelism in a program, we use
{\em abstract interpretation}, a compile-time technique which is used
to gain information about a program which may then be used to optimise
the program's execution.
A particular use of abstract interpretation is in
{\em strictness analysis}
of functional programs.
In a language that has lazy semantics, the main potential for parallelism
arises in the evaluation of arguments of strict operators.
Having identified the sources of parallelism at compile time, it is
necessary to communicate these to the run-time system.
In the second part of the paper we introduce an extended set of combinators,
including some parallel combinators, to achieve this purpose.}
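For illustration (ours, not the paper's; the function name is hypothetical):
a function is strict when it is sure to need an argument, and under lazy
semantics only such arguments are safe to evaluate in parallel.

```haskell
-- (+) is strict in both arguments, so in e1 + e2 both expressions
-- may safely be evaluated in parallel before the addition.
-- constFst, by contrast, is non-strict in its second argument:
-- evaluating that argument speculatively could turn a terminating
-- program into a diverging one.
constFst :: Int -> Int -> Int
constFst x _ = x

lazyOk :: Int
lazyOk = constFst 1 undefined   -- fine under lazy semantics
```

This is exactly the information a strictness analyser tries to recover at
compile time, so that the run-time system sparks only safe work.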
\reference{Simon L Peyton Jones}
{GRIP - a parallel processor for functional languages}
{Electronics and Power, pp633-636, Oct 1987;
also in ICL Technical Journal 5(3), May 1987}
{A brief 4-page article about the GRIP architecture.}
\reference{Simon L Peyton Jones, Chris Clack, Jon Salkild, and Mark Hardie}
{GRIP - a high-performance architecture for parallel graph reduction}
{Proc IFIP conference on Functional Programming Languages and
Computer Architecture, Portland,
ed Kahn, Springer Verlag LNCS 274, pp98-112, Sept 1987}
{GRIP is a high-performance parallel machine designed to execute
functional programs using supercombinator graph reduction.
It uses a high-bandwidth bus to provide access to a
large, distributed shared memory, using intelligent memory units and
packet-switching protocols to increase the number of processors
which the bus can support.
GRIP is also being programmed to support parallel Prolog and
Dactl.
We outline GRIP's architecture and firmware, discuss the major design
issues, and describe the current state of the project and
our plans for the future.}
\reference{Simon L Peyton Jones and Chris Clack}
{Finding fixpoints in abstract interpretation}
{in Abstract Interpretation of Declarative Languages,
ed Hankin \& Abramsky, Ellis Horwood, pp246-265, 1987.}
{Abstract interpretation is normally used as the basis for
a static, compile-time analysis of a program.
For example, strictness analysis attempts to establish which
functions in the program are strict (we will use strictness
analysis as a running example).
Using abstract interpretation in this way requires the
compile-time evaluation of expressions in the abstract domain.
It is obviously desirable that this evaluation should
always terminate, since otherwise the compiler would risk
non-termination.
In the case of non-recursive functions there is no problem, and
termination is guaranteed.
Recursive functions, however, present more of a problem, and it
is the purpose of this paper to explain the problem and to
offer some practical solutions.}
\reference{Chris Clack and Simon L Peyton Jones}
{The four-stroke reduction engine}
{Proc ACM Conference on Lisp and Functional Programming,
Boston, pp220-232, Aug 1986}
{Functional languages are widely claimed to be amenable to concurrent
execution by multiple processors. This paper presents an algorithm for
the parallel graph reduction of a functional program.
The algorithm supports transparent management of parallel
tasks with no explicit
communication between processors.}
\reference{Simon L Peyton Jones}
{Functional programming languages as a software engineering tool}
{in Software Engineering - the critical decade, ed D Ince,
Peter Peregrinus, pp124-151, 1986}
{It is the purpose of this paper to suggest that functional
languages are an appropriate tool for supporting the activity
of programming in the large, and to present a justification of
this claim.}
\reference{Simon L Peyton Jones}
{Using Futurebus in a fifth generation computer architecture}
{Microprocessors and Microsystems 10(2), March 1986}
{Despite the bandwidth limitations of a bus, we present a design
for a parallel computer (GRIP) based on Futurebus, which limits bus
bandwidth requirements by using intelligent memories.
Such a machine offers higher performance than a uniprocessor
and lower cost than a more extensible multiprocessor, as well
as serving as a vehicle for research in parallel architectures.}
\section{Internal reports}
\reference{Kevin Hammond and John Glauert}
{Implementing Pattern-Matching Functional Languages using Dactl}
{University of Glasgow, 1989}
{This paper describes the implementation of a family of pattern-matching
functional languages in the parallel graph-rewriting language Dactl.
Attention is focussed on the direct implementation of the
pattern-matching constructs in the context of various reduction
strategies: eager, lazy, and lazy with strictness analysis. Two new
reduction strategies combining lazy evaluation with a technique for
compiling non-overlapping patterns are also illustrated. The latter
strategies provide improved termination properties compared with
conventional functional language implementations for non-overlapping
patterns. The implementations described here cover all pattern-matching
constructs found in Standard ML, including named patterns and deep
patterns. The use of Dactl renders explicit the complexities of
pattern-matching which are obscured by implementation in a conventional
intermediate language or abstract machine.}
\reference{Simon L Peyton Jones}
{A practical technique for designing asynchronous finite-state machines}
{Proc Glasgow Workshop on Functional Programming, Fraserburgh, Aug 1989}
{The literature on asynchronous logic design is mostly of a fairly theoretical
nature. We present a practical technique for generating asynchronous
finite-state machines from a description of their states and transitions.
The technique has been used successfully to design a number of state machines
in the GRIP multiprocessor.}
\reference{Kevin Hammond}
{A Proposal for an Implementation of Full Dactl on a Meiko Transputer Rack}
{SYS-C89-02, University of East Anglia, 1989}
{The design of an abstract machine instruction set for Dactl is
described. The instruction set is sufficient to encapsulate all Dactl
constructs; it will also permit parallel execution where applicable.
The paper considers the difficulties involved in the implementation of
this abstract instruction set on the UEA Meiko M40 transputer rack,
using a ZAPP-style kernel. Part of the code for a simulation of this
instruction set is included as an appendix to the report.}
\reference{Chris Clack}
{Tuning the four-stroke reduction engine}
{University College London, January 1989 ($\clubsuit$)}
{This paper analyses the current implementation of the four-stroke reduction
engine (a virtual machine for parallel graph reduction).
The current implementation is shown to be inefficient, and a number of
enhancements are suggested.
The paper proposes that major performance benefits will accrue from
increasing the intelligence of the memory units and giving them a more
important role in the four-stroke cycle.}
\reference{Chris Clack}
{Performance cost accounting for GRIP}
{University College London, January 1989 ($\clubsuit$)}
{This paper presents a general model for efficiency analysis of shared-memory
parallel graph reduction architectures.
The efficiency of the GRIP implementation of the four-stroke reduction engine
is subsequently analysed by apportioning costs to the various components
of the general model.
In particular, attention is focussed on two aspects: execution
profiling, and analysis of resource utilisation.}
\reference{Chris Clack}
{Diagnosis and cure for dislocated spines}
{University College London, June 1988 ($\clubsuit$)}
{Locality of reference is a key issue for parallel machines, and especially
for parallel implementations of graph reduction.
If locality can be achieved then communications costs fall,
and we are better able to exploit distributed architectures.
This paper analyses a particular implementation of graph reduction --
the four-stroke reduction engine -- and introduces the concept of
spine-locality as a basis for graph building and task-scheduling strategies
that enhance locality.}