ghc/docs/abstracts/abstracts89.tex

   1 \documentstyle[11pt,slpj]{article}
   2
   3 \newcommand{\reference}[4]{     % authors, title, details, abstract
   4 \large
   5 #1, {\em #2}, #3.
   6 \normalsize
   7 \begin{quotation}
   8 #4
   9 \end{quotation}
  10 \vspace{0.2in}
  11 }
  12
  13 \newcommand{\Haskell}[1]{{\sc Haskell}}
  14
  15 \begin{document}
  16
  17 \title{Abstracts of GRIP/GRASP-related papers and reports till 1989\\
  18 Dept of Computing Science \\
  19 University of Glasgow G12 8QQ}
  20
  21 \author{
  22 Cordelia Hall (cvh@cs.glasgow.ac.uk) \and
  23 Kevin Hammond (kh@cs.glasgow.ac.uk) \and
  24 Will Partain (partain@cs.glasgow.ac.uk) \and
  25 Simon L Peyton Jones (simonpj@cs.glasgow.ac.uk) \and
  26 Phil Wadler (wadler@cs.glasgow.ac.uk)
  27 }
  28
  29 \maketitle
  30
  31 \begin{abstract}
  32 We present a list of papers and reports related to the GRIP
  33 and GRASP projects,
  34 covering {\em the design, compilation technology,
  35 and parallel implementations of functional programming languages, especially
  36 \Haskell{}}.
  37
  38 Most of them can be obtained by writing to
  39 Teresa Currie, Dept of Computing Science,
  40 University of Glasgow G12 8QQ, UK.   Her electronic mail address is
  41 teresa@uk.ac.glasgow.cs.
  42
  43 Those marked ($\spadesuit$) can be obtained from the School of Information
  44 Systems, University of East Anglia, Norwich, UK.
  45 Those marked ($\clubsuit$) can be obtained from Chris Clack at the
  46 Department of Computer Science, University College London, Gower St,
  47 London WC1E 6BT, UK.
  48 \end{abstract}
  49
  50 \section{Published papers}
  51
  52 \reference{Simon L Peyton Jones and Jon Salkild}
  53 {The Spineless Tagless G-machine}
  54 {Proc IFIP Symposium on Functional Programming Languages and Computer
  55 Architecture, London, Sept 1989}
  56 {
  57 The Spineless Tagless G-machine is an abstract machine based on graph
  58 reduction, designed as a target for compilers for non-strict functional
  59 languages.
  60 As its name implies, it is a development of earlier work, especially
  61 the G-machine and Tim.
  62
  63 It has a number of unusual features: the abstract machine code is
  64 rather higher-level than is common, allowing better code generation;
  65 the representation of the graph eliminates most interpretive overheads;
  66 vectored returns from data structures give fast case-analysis;
  67 and the machine is readily extended for a parallel implementation.
  68
  69 The resulting implementation runs at least 75\% faster
  70 than the Chalmers G-machine.
  71 }
  72
  73 \reference{Simon L Peyton Jones}
  74 {Parallel implementations of functional programming languages}
  75 {Computer Journal 32(2), pp175-186, April 1989}
  76 {
  77 It is now very nearly as easy to build a parallel computer
  78 as to build a sequential one, and there are strong incentives to do so:
  79 parallelism seems to offer the opportunity to improve both the
  80 absolute performance level and the cost/performance ratio of our machines.
  81
  82 One of the most attractive features of functional programming languages
  83 is their suitability for programming such parallel computers.
  84 This paper is devoted to a discussion of this claim.
  85
  86 First of all, we discuss parallel functional programming
  87 from the programmer's point of view.
  88 Most parallel functional language implementations are based on graph reduction,
  89 we proceed to a discussion of some implementation issues raised by parallel
  90 graph reduction.
  91 The paper concludes with a case study of a particular parallel graph reduction
  92 machine, GRIP, and a brief survey of other similar machines.
  93 }
  94
  95 \reference{Kevin Hammond}
  96 {Implementing Functional Languages for Parallel Machines}
  97 {PhD thesis, University of East Anglia, 1989 ($\spadesuit$)}
  98 {Commencing with the Standard ML language, dialects  XSML  and  PSML  are
  99 defined,  which permit parallel evaluation of functional programs.  XSML
 100 is Standard ML with a novel mechanism for handling exceptions; PSML is a
 101 side-effect  free  version  of  XSML.  A formal semantics for PSML and a
 102 translation algorithm from this language into Dactl, a  compiler  target
 103 language  based  on  the  theory of graph-rewriting, are presented.  The
 104 thesis proves that a simplified version of  this  translation  preserves
 105 meaning  for flat domains, and that the strategy for reduction to normal
 106 form is correct.
 107
 108 The implementation of standard compilation techniques such as strictness
 109 analysis, maximal free sub-expression elision and common sub-expresssion
 110 elimination  is  considered  with  respect  to   Dactl,   and   problems
 111 highlighted.     Techniques    are    also   presented   for   compiling
 112 exception-handling  correctly  in  a  parallel  environment,   and   for
 113 compiling  side-effect  for a parallel machine.  Metrics for performance
 114 evaluation are presented and results obtained using the Dactl  reference
 115 interpreter are presented.}
 116
 117
 118 \reference{Simon L Peyton Jones, Chris Clack and Jon Salkild}
 119 {High-performance parallel graph reduction}
 120 {Proc Parallel Architectures and Languages Europe (PARLE), LNCS 365, pp193-207,
 121 July 1989}
 122 {
 123 Parallel graph reduction is an attractive implementation for functional
 124 programming languages because of its simplicity and inherently distributed
 125 nature.
 126 This paper outlines some of the issues raised by parallel compiled
 127 graph reduction, and presents the solutions we have adopted to produce an
 128 efficient implementation.
 129
 130 We concentrate on two particular issues:
 131 the efficient control of parallelism, resulting in an ability to alter
 132 the granularity of parallelism
 133 {\em dynamically};
 134 and the efficient use of the memory hierachy to improve locality.
 135 }
 136
 137 \reference
 138 {Phil Trinder and Philip Wadler}
 139 {Improving list comprehension database queries}
 140 {{\em TENCON '89\/} (IEEE Region 10 Conference),
 141 Bombay, India, November 1989.}
 142 {
 143 The task of increasing the efficiency of database queries has recieved
 144 considerable attention. In this paper we describe the improvement of
 145 queries expressed as list comprehensions in a lazy functional
 146 language. The database literature identifies four algebraic and two
 147 implementation-based improvement strategies. For each strategy we show
 148 an equivalent improvement for queries expressed as list
 149 comprehensions.  This means that well-developed database algorithms
 150 that improve queries using several of these strategies can be emulated
 151 to improve comprehension queries.  We are also able to improve queries
 152 which require greater power than that provided by the relational
 153 algebra.  Most of the improvements entail transforming a simple,
 154 inefficient query into a more complex, but more efficient form. We
 155 illustrate each improvement using examples drawn from the database
 156 literature.
 157 }
 158
 159 \reference{Kevin Hammond}
 160 {Exception Handling in a Parallel Functional Language: PSML}
 161 {Proc TENCON '89, Bombay, India, Nov 1989}
 162 {
 163 Handling exception occurrences during computation is a problem  in  most
 164 functional programming languages, even when the computation is eager and
 165 sequential.  This paper presents a version of  the  error  value  method
 166 which  allows lazy computation with deterministic semantics for parallel
 167 evaluation even in the presence of  errors.   The  realisation  of  this
 168 technique   is   illustrated  by  reference  to  PSML,  a  referentially
 169 transparent variant of Standard ML designed for parallel evaluation.
 170 }
 171
 172 \reference
 173 {Philip Wadler}
 174 {Theorems for free!}
 175 {{\em 4'th International Conference on Functional Programming
 176 Languages and Computer Architecture}, London, September 1989.}
 177 {
 178 From the type of a polymorphic function we can derive a theorem
 179 that it satisfies.  Every function of the same type satisfies the same
 180 theorem.  This provides a free source of useful theorems,
 181 courtesy of Reynolds' abstraction theorem for the polymorphic lambda
 182 calculus.
 183 }
 184
 185 \reference
 186 {Philip Wadler and Stephen Blott}
 187 {How to make {\em ad-hoc\/} polymorphism less {\em ad hoc}}
 188 {{\em 16'th ACM Symposium on Principles of Programming Languages},
 189 Austin, Texas, January 1989.}
 190 {
 191 This paper presents {\em type classes}, a new approach to {\em
 192 ad-hoc\/} polymorphism.  Type classes permit overloading of arithmetic
 193 operators such as multiplication, and generalise the ``eqtype variables''
 194 of Standard ML.
 195 Type classes extend the Hindley\X Milner polymorphic type system, and
 196 provide a new approach to issues that arise in object-oriented
 197 programming, bounded type quantification, and abstract data types.
 198 This paper provides an informal introduction to type classes, and
 199 defines them formally by means of type inference rules.
 200 }
 201
 202 \reference{Kevin Hammond}
 203 {Implementing Type Classes for Haskell}
 204 {Proc Glasgow  Workshop  on  Functional  Programming,  Fraserburgh, Aug 1989}
 205 {
 206 This  paper describes the implementation of the type class mechanism for
 207 the functional language Haskell, which has been  undertaken  at  Glasgow
 208 University.  A simple introduction to type classes discusses the methods
 209 used to  select  operators  and  dictionaries  in  the  Glasgow  Haskell
 210 compiler.    A   solution   to  the  problem  of  selecting  super-class
 211 dictionaries, not considered by the original paper  on  type  class,  is
 212 also  presented.   The  modifications which must be made to the standard
 213 Hindley/Milner type-checking algorithm  to  permit  the  translation  of
 214 operators  are  described,  and  a  revised definition of algorithm W is
 215 provided. Finally, a set of performance figures  compares  the  run-time
 216 efficiency of Haskell and LML programs, indicating the overhead inherent
 217 in the original, naive method of operator selection, and the improvement
 218 which may be obtained through simple optimisations.
 219 }
 220
 221 \reference{Simon L Peyton Jones}
 222 {FLIC - a functional language intermediate code}
 223 {SIGPLAN Notices 23(8) 1988, revised 1989}
 224 {
 225 FLIC is a Functional Language Intermediate Code, intended to
 226 provide a common intermediate language between diverse
 227 implementations of functional languages, including parallel
 228 ones.
 229 This paper gives a formal definition of FLIC's syntax and
 230 semantics, in the hope that its existence may encourage greater
 231 exchange of programs and benchmarks between research groups.
 232 }
 233
 234 \reference{Simon L Peyton Jones, Chris Clack, Jon Salkild, Mark Hardie}
 235 {Functional programming on the GRIP multiprocessor}
 236 {Proc IEE Seminar on Digital Parallel Processors, Lisbon, Portugal, 1988}
 237 {
 238 Most MIMD computer architectures can be classified as
 239 tightly-coupled or loosely-coupled,
 240 depending on the relative latencies seen by a processor accessing different
 241 parts of its address space.
 242
 243 By adding microprogrammable functionality to the memory units, we have
 244 developed a MIMD computer architecture which explores the middle region
 245 of this spectrum.
 246 This has resulted in an unusual and flexible bus-based multiprocessor,
 247 which we are using as a base for our research in parallel functional programming
 248 languages.
 249
 250 In this paper we introduce parallel functional programming, and describe
 251 the architecture of the GRIP multiprocessor.
 252 }
 253
 254 \reference{Geoffrey Burn, Simon L Peyton Jones, and John Robson}
 255 {The spineless G-machine}
 256 {Proc ACM Conference on Lisp and Functional Programming, Snowbird, pp244-258,
 257 August 1988}
 258 {
 259 Recent developments in functional language implementations have
 260 resulted in the G-machine, a programmed graph-reduction machine.
 261 Taking this as a basis, we introduce an optimised method of
 262 performing graph reduction, which does not need to build the
 263 spine of the expression being reduced.
 264 This Spineless G-machine only updates shared expressions, and
 265 then only when they have been reduced to weak head normal form.
 266 It is thus more efficient than the standard method of performing
 267 graph reduction.
 268
 269 We begin by outlining the philosophy and key features of the
 270 Spineless G-machine, and comparing it with the standard
 271 G-machine.
 272 Simulation results for the two machines are then presented and
 273 discussed.
 274
 275 The Spineless G-machine is also compared with Tim, giving a
 276 series of transformations by which they can be interconverted.
 277 These open up a wide design space for abstract graph reduction
 278 machines, which was previously unknown.
 279
 280 A full specification of the machine is given in the appendix,
 281 together with compilation rules for a simple functional language.
 282 }
 283
 284 \reference{Chris Hankin, Geoffrey Burn, and Simon L Peyton Jones}
 285 {A safe approach to parallel combinator reduction}
 286 {Theoretical Computer Science 56, pp17-36, North Holland, 1988}
 287 {
 288 In this paper we present the results of two pieces of work which, when
 289 combined, allow us to take a program text of a functional langauge and
 290 produce a parallel implementation of that program.
 291 We present the techniques for discovering sources of parallelism in
 292 a program at compile time, and then show how this parallelism is
 293 naturally mapped onto a parallel combinator set that we will define.
 294
 295 To discover sources of parallelism in a program, we use
 296 {\em abstract interpretation} a compile-time technique which is used
 297 to gain information about a program which may then be used to optimise
 298 the program's execution.
 299 A particular use of abstract interpretation is in
 300 {\em strictness analysis}
 301 of functional program.
 302 In a language that has lazy semantics, the main potential for parallelism
 303 arises in the evaluation of arguments of strict operators.
 304
 305 Having identified the sources of parallelism at compile time, it is
 306 necessary to communicate these to the run-time system.
 307 In the second part of the paper we introduce an extended set of combinators,
 308 including some parallel combinators, to achieve this purpose.
 309 }
 310
 311 \reference{Simon L Peyton Jones}
 312 {GRIP - a parallel processor for functional languages}
 313 {Electronics and Power, pp633-636, Oct 1987;
 314 also in ICL Technical Journal 5(3), May 1987}
 315 {
 316 A brief 4-page article about the GRIP architecture.
 317 }
 318
 319 \reference{Simon L Peyton Jones, Chris Clack, Jon Salkild, and Mark Hardie}
 320 {GRIP - a high-performance architecture for parallel graph reduction}
 321 {Proc IFIP conference on Functional Programming Languages and
 322 Computer Architecture, Portland,
 323 ed Kahn, Springer Verlag LNCS 274, pp98-112, Sept 1987}
 324 {
 325 GRIP is a high-performance parallel machine designed to execute
 326 functional programs using supercombinator graph reduction.
 327 It uses a high-bandwidth bus to provide access to a
 328 large, distributed shared memory, using intelligent memory units and
 329 packet-switching protocols to increase the number of processors
 330 which the bus can support.
 331 GRIP is also being programmed to support parallel Prolog and
 332 DACTL.
 333
 334 We outline GRIP's architecture and firmware, discuss the major design
 335 issues, and describe the current state of the project and
 336 our plans for the future.
 337 }
 338
 339 \reference{Simon L Peyton Jones and Chris Clack}
 340 {Finding fixpoints in abstract interpretation}
 341 {in Abstract Interpretation of Declarative Languages,
 342 ed Hankin \& Abramsky, Ellis Horwood, pp246-265, 1987.}
 343 {
 344 Abstract interpretation is normally used as the basis for
 345 a static, compile-time analysis of a program.
 346 For example, strictness analysis attempts to establish which
 347 functions in the program are strict (we will use strictness
 348 analysis as a running example).
 349
 350 Using abstract interpretation in this way requires the
 351 compile-time evaluation of expressions in the abstract domain.
 352 It is obviously desirable that this evaluation should
 353 always terminate, since otherwise the compiler would risk
 354 non-termination.
 355 In the case of non-recursive functions there is no problem, and
 356 termination is guaranteed.
 357 Recursive functions, however, present more of a problem, and it
 358 is the purpose of this paper to explain the problem and to
 359 offer some practical solutions.
 360 }
 361
 362 \reference{Chris Clack and Simon L Peyton Jones}
 363 {The four-stroke reduction engine}
 364 {Proc ACM Conference on Lisp and Functional Programming,
 365 Boston, pp220-232, Aug 1986}
 366 {
 367 Functional languages are widely claimed to be amenable to concurrent
 368 execution by multiple processors.  This paper presents an algorithm for
 369 the parallel graph reduction of a functional program.
 370 The algorithm supports transparent management of parallel
 371 tasks with no explicit
 372 communication between processors.
 373 }
 374
 375 \reference{Simon L Peyton Jones}
 376 {Functional programming languages as a software engineering tool}
 377 {in Software Engineering - the critical decade D Ince,
 378 Peter Peregrinus, pp124-151, 1986}
 379 {
 380 It is the purpose of this paper to suggest that functional
 381 languages are an appropriate tool for supporting the activity
 382 of programming in the large, and to present a justification of
 383 this claim.
 384 }
 385
 386 \reference{Simon L Peyton Jones}
 387 {Using Futurebus in a fifth generation computer architecture}
 388 {Microprocessors and Microsystems 10(2), March 1986}
 389 {
 390 Despite the bandwidth limitations of a bus, we present a design
 391 for a parallel computer (GRIP) based on Futurebus, which limits bus
 392 bandwidth requirements by using intelligent memories.
 393
 394 Such a machine offers higher performance than a uniprocessor
 395 and lower cost than a more extensible multiprocessor, as well
 396 as serving as a vehicle for research in parallel architectures.
 397 }
 398
 399
 400 \section{Internal reports}
 401
 402 \reference{Kevin Hammond and John Glauert}
 403 {Implementing Pattern-Matching Functional Languages using Dactl}
 404 {University of Glasgow, 1989}
 405 {
 406 This paper describes the implementation of a family of  pattern-matching
 407 functional  languages  in  the  parallel graph-rewriting language Dactl.
 408 Attention  is   focussed   on   the   direct   implementation   of   the
 409 pattern-matching   constructs   in  the  context  of  various  reduction
 410 strategies: eager, lazy, and lazy with  strictness  analysis.   Two  new
 411 reduction  strategies  combining  lazy  evaluation  with a technique for
 412 compiling non-overlapping patterns are  also  illustrated.   The  latter
 413 strategies   provide   improved  termination  properties  compared  with
 414 conventional functional  language  implementations  for  non-overlapping
 415 patterns.  The implementations described here cover all pattern-matching
 416 constructs found in Standard  ML,  including  named  patterns  and  deep
 417 patterns.   The  use  of  Dactl  renders  explicit  the  complexities of
 418 pattern-matching which are obscured by implementation in a  conventional
 419 intermediate language or abstract machine.
 420 }
 421
 422 \reference{Simon L Peyton Jones}
 423 {A practical technique for designing asynchronous finite-state machines}
 424 {Proc Glasgow Workshop on Functional Programming, Fraserburgh,Aug 1989}
 425 {
 426 The literature on asynchronous logic design is mostly of a fairly theoretical
 427 nature.   We present a practical technique for generating asynchronous finite-state
 428 machines from a description of their states and transitions.  The technique
 429 has been used successfully to design a number of state machines in
 430 the GRIP mulitprocessor.
 431 }
 432
 433 \reference{Kevin Hammond}
 434 {A Proposal for an Implementation of Full Dactl on a Meiko Transputer Rack}
 435 {SYS-C89-02, University of East Anglia, 1989}
 436 {
 437 The  design  of  an  abstract  machine  instruction  set  for  Dactl  is
 438 described.   The  instruction set is sufficient to encapsulate all Dactl
 439 constructs; it will also permit  parallel  execution  where  applicable.
 440 The  paper  considers the difficulties involved in the implementation of
 441 this abstract instruction set on the  UEA  Meiko  M40  transputer  rack,
 442 using  a  ZAPP-style  kernel.  Part of the code for a simulation of this
 443 instruction set is included as an appendix to the report.
 444 }
 445
 446 \reference{Chris Clack}
 447 {Tuning the four-stroke reduction engine}
 448 {University College London, January 1989 ($\clubsuit$)}
 449 {
 450 This paper analyses the current implementation of the four-stroke reduction
 451 engine (a virtual machine for parallel graph reduction).
 452 The current implementation is shown to be inefficient, and a number of
 453 enhancements are suggested.
 454 This paper proposes that major performance benefits will accrue from
 455 increasing the intelligence of the memory units and giving them a more
 456 important role in the four-stroke cycle.
 457 }
 458
 459 \reference{Chris Clack}
 460 {Performance cost accounting for GRIP}
 461 {University College London, January 1989 ($\clubsuit$)}
 462 {
 463 This paper presents a general model for efficiency anakysis of shared-memory
 464 parallel graph reduction architectures.
 465 The efficiency of the GRIP implementation of the four-stroke reduction engine
 466 is subsequently analysed by approtioning costs to the various components
 467 of the general model.
 468
 469 In particular, attention is focussed on the two aspects of execution
 470 profiling, and analysis of resource utilsation.
 471 }
 472
 473 \reference{Chris Clack}
 474 {Diagnosis and cure for dislocated spines}
 475 {University College London, June 1988 ($\clubsuit$)}
 476 {
 477 Locality of reference is a key issue for parallel machines, and especially
 478 for parallel implementations of graph reduction.
 479 If locality can be achieved then communications costs fall,
 480 and we are better able to exploit distributed architectures.
 481 This paper analyses a particular implementation of graph reduction --
 482 the four-stroke reduction engine -- and introduces the concept of
 483 spine-locality as a basis for graph building and task-scheduling strategies
 484 that enhance locality.
 485 }
 486
 487 \end{document}