1 \documentclass[10pt]{article}
7 \setlength{\parskip}{0.5\baselineskip plus 0.2\baselineskip minus 0.1\baselineskip}
8 \setlength{\parsep}{\parskip}
9 \setlength{\topsep}{0cm}
10 \setlength{\parindent}{0cm}
11 %\oddsidemargin -0.5 in
12 %\evensidemargin -0.5 in
15 \newcommand{\derives}{\mbox{$\rightarrow$}}
16 \newcommand{\orderives}{\mbox{$\mid$}}
17 \newcommand{\many}[1]{\{ {#1} \}}
18 \newcommand{\oneormore}[1]{\{ {#1} \}$^{+}$}
19 \newcommand{\optional}[1]{[ {#1} ]}
21 \newcommand{\at}{\texttt{@}}
23 \newcommand{\lam}{\texttt{\char`\\}}
25 \newcommand{\workingnote}[1]%
27 \framebox{\parbox{.8 \linewidth}
28 {\textbf{\textsl{Working note:}} \textsl{#1}}}
33 \title{An External Representation for the GHC Core Language (DRAFT for GHC5.02)}
34 \author{Andrew Tolmach ({\tt apt@cs.pdx.edu})\\and The GHC Team}
40 This document provides a precise definition for the GHC Core language,
41 so that it can be used to communicate between GHC and new stand-alone
42 compilation tools such as back-ends or optimizers.
43 The definition includes a formal grammar and an informal semantics.
44 An executable typechecker and interpreter (in Haskell),
45 which formally embody the static and dynamic semantics,
46 are available separately.
48 Note: This is a draft document, which attempts to describe GHC's current
49 behavior as precisely as possible. Working notes scattered throughout indicate
50 areas where further work is needed. Constructive comments are very welcome,
51 both on the presentation, and on ways in which GHC could be improved in order
52 to simplify the Core story.
55 \section{Introduction}
57 The Glasgow Haskell Compiler (GHC) uses an intermediate language, called
58 ``Core,'' as its internal program representation during
59 several key stages of compiling.
60 Core resembles a subset of Haskell, but with explicit type annotations
61 in the style of the polymorphic lambda calculus (F$_\omega$).
62 GHC's front end translates full Haskell 98 (plus some extensions) into
63 well-typed Core, which is then repeatedly rewritten by the GHC optimizer.
64 Ultimately, GHC translates Core into STG-machine code and then into
65 C or native code. The rationale for the design of Core and its use are discussed
66 in existing papers~\cite{ghc-inliner,comp-by-trans-scp}, although the (two different)
67 idealized versions of Core described therein differ in significant ways
68 from the actual Core language in current GHC.
70 Researchers interested in writing just {\it part} of a Haskell compiler,
71 such as a new back-end or a new optimizer pass, might like to make
72 use of GHC to provide the other parts of the compiler. For example, they
73 might like to use GHC's front end to parse, desugar, and type-check source Haskell,
74 then feeding the resulting code to their own back-end tool.
75 Currently, they can only do this by linking their code into the
76 GHC executable, which is an arduous process (and essentially requires
77 the new code to be written in Haskell). It would be much easier for
78 external developers if GHC could be made to produce Core files in
79 an agreed-upon external format. To allow the widest range of interoperability,
80 the external format should be text-based; pragmatically, it should
81 also be human-readable. (It may ultimately be desirable to use a
82 standard interchange base format such as ASDL or XML.)
84 In the past, Core has had no rigorously defined external representation, although
85 by setting certain compiler flags, one can get a (rather ad-hoc) textual
86 representation to be printed at various points in the compilation process;
87 this is usually done to help debug the compiler. To make Core fully useable
88 a bi-directional communication format, it will be necssary to
91 \item define precisely the external format of Core;
93 \item modify GHC to produce external Core files, if so requested, at one or more
94 useful points in the compilation sequence -- e.g., just before optimization,
97 \item modify GHC to accept external Core files in place of Haskell
98 source files, again at one or more useful points.
102 The first two facilities will let one couple GHC's front-end (parser,
103 type-checker, etc.), and optionally its optimizer, with new back-end tools.
104 Adding the last facility will let one implement new Core-to-Core
105 transformations in an external tool and integrate them into GHC. It will also
106 allow new front-ends to generate Core that can be fed into GHC's optimizer or
107 back end; however, because there are many (undocumented)
108 idiosynracies in the way GHC produces Core from source Haskell, it will be hard
109 for an external tool to produce Core that can be integrated with GHC-produced core
110 (e.g., for the Prelude), and we don't aim to support this.
112 This document addresses the first requirement, a formal Core definition,
113 by proposing a formal grammar for an external representation of Core
114 (Section~\ref{sec:external}, and
115 an informal semantics (Section~\ref{sec:informal}.
117 Beginning in GHC5.02, external Core (post-optimization) adhering to this definition
118 can be generated using the compiler flag @-fext-core@.
120 Formal static and dynamic semantics in the form of an executable typechecker and interpreter
121 are available separately in the GHC source tree under @fptools/ghc/utils/ext-core@.
123 \section{External Grammar of Core}
126 In designing the external grammar, we have tried to strike a balance among
127 a number of competing goals, including easy parseability by machines,
128 easy readability by humans, and adequate structural simplicity to
129 allow straightforward presentations of the semantics. This has inevitably
130 led to certain compromise. In particular:
133 \item In order to avoid explosion of parentheses, various standard precedences
134 and short-cuts are supported for expressions, types, and kinds; this led to the introduction
135 of multiple non-terminals for each of these syntactic categories, which
136 makes the concrete grammar longer and more complex than the underlying abstract syntax.
138 \item On the other hand, the grammar has been kept simpler by avoiding special syntax for
139 tuple types and terms; tuples (both boxed and unboxed) are treated
140 as ordinary constructors.
142 \item All type abstractions and applications are given in full, even though
143 some of them (e.g., for tuples) could be reconstructed; this permits Core to
144 be parsed without the necessity of performing any type reconstruction.
146 \item The syntax of identifiers is heavily restricted (essentially to just
147 alphanumerics); this again makes Core easier to parse but harder to read.
150 \workingnote{These choices are certainly debatable. In particular, keeping
151 type applications on tuples and case arms considerably increases the size of core files and
152 makes them less human-readable, though it allows a Core parser to be simpler.}
154 We use the following notational conventions for syntax:
157 {\it [ pat ]} & optional \\
158 {\it \{ pat \}} & zero or more repetitions \\
159 {\it \{ pat \}$^{+}$} & one or more repetitions \\
160 {\it pat$_1$ \orderives\ pat$_2$} & choice \\
161 @fibonacci@ & terminal syntax in typewriter font \\
165 \begin{tabular}{lrclr}
166 {\rm Module} & module & \derives &
167 \multicolumn{2}{l}{@\%module@ mident \many{tdef @;@} \many{\optional{@\%local@} vdefg @;@}} \\
169 {\rm Type defn.} & tdef & \derives & @%data@ qtycon \many{tbind} @=@ @{@ cdef \many{@;@ cdef} @}@ & {\rm algebraic type}\\
170 & & \orderives & @%newtype@ qtycon \many{tbind} \optional{@=@ ty} & {\rm newtype} \\
172 {\rm Constr. defn.} & cdef & \derives & qdcon \many{@\at@ tbind} \many{aty} \\
174 {\rm Value defn.} & vdefg & \derives & @%rec@ @{@ vdef \many{@;@ vdef} @}@ & {\rm recursive} \\
175 & & \orderives & vdef & {\rm non-recursive} \\
176 & vdef & \derives & qvar @::@ ty @=@ exp & \\
178 {\rm Atomic expr.} & aexp & \derives & qvar & {\rm variable} \\
179 & & \orderives & qdcon & {\rm data constructor}\\
180 & & \orderives & lit & {\rm literal} \\
181 & & \orderives & @(@ exp @)@ & {\rm nested expr.}\\
183 {\rm Expression} & exp & \derives & aexp & {\rm atomic expresion}\\
184 & & \orderives & aexp \oneormore{arg} & {\rm application}\\
185 & & \orderives & @\@ \oneormore{binder} @->@ exp & {\rm abstraction}\\
186 & & \orderives & @%let@ vdefg @%in@ exp & {\rm local definition}\\
187 & & \orderives & @%case@ exp @%of@ vbind @{@ alt \many{@;@ alt} @}@ & {\rm case expression}\\
188 & & \orderives & @%coerce@ aty exp & {\rm type coercion}\\
189 & & \orderives & @%note@ @"@ \many{char} @"@ exp & {\rm expression note}\\
190 & & \orderives & @%external@ @"@ \many{char} @"@ aty & {\rm external reference}\\
192 {\rm Argument} & arg & \derives & \at\ aty & {\rm type argument}\\
193 & & \orderives & aexp & {\rm value argument} \\
195 {\rm Case alt.} & alt & \derives & qdcon \many {@\at@ tbind} \many{vbind} @->@ exp &{\rm constructor alternative}\\
196 & & \orderives & lit @->@ exp & {\rm literal alternative} \\
197 & & \orderives & @%_@ @->@ exp & {\rm default alternative} \\
199 {\rm Binder} & binder & \derives & \at\ tbind & {\rm type binder}\\
200 & & \orderives & vbind & {\rm value binder}\\
202 {\rm Type binder} & tbind & \derives & tyvar & {\rm implicitly of kind @*@} \\
203 & & \orderives & @(@ tyvar @::@ kind @)@ & {\rm explicitly kinded} \\
205 {\rm Value binder} & vbind & \derives & @(@ var @::@ ty @)@ \\
207 {\rm Literal} & lit & \derives & @(@ [@-@] \oneormore{digit} @::@ ty @)@ & {\rm integer} \\
208 & & \orderives & @(@ [@-@] \oneormore{digit} @.@ \oneormore{digit} @::@ ty @)@ & {\rm rational} \\
209 & & \orderives & @(@ @'@ char @'@ @::@ ty @)@ & {\rm character} \\
210 & & \orderives & @(@ @"@ \many{char} @"@ @::@ ty @)@ & {\rm string} \\
212 {\rm Character} & char & \derives & \multicolumn{2}{l}{any ASCII character in range 0x20-0x7E except 0x22,0x27,0x5c}\\
213 & & \orderives & @\x@ hex hex & {\rm ASCII code escape sequence} \\
214 & hex & \derives & @0@ \orderives \ldots \orderives @9@ \orderives @a@ \orderives \ldots \orderives @f@ \\
217 \begin{tabular}{lrclr}
218 {\rm Atomic type} & aty & \derives & tyvar & {\rm type variable} \\
219 & & \orderives & qtycon & {\rm type constructor}\\
220 & & \orderives & @(@ ty @)@ & {\rm nested type}\\
222 {\rm Basic type} & bty & \derives & aty & {\rm atomic type}\\
223 & & \orderives & bty aty & {\rm type application}\\
225 {\rm Type} & ty & \derives & bty & {\rm basic type}\\
226 & & \orderives & @%forall@ \oneormore{tbind} @.@ ty & {\rm type abstraction}\\
227 & & \orderives & bty @->@ ty & {\rm arrow type construction} \\
229 {\rm Atomic kind} & akind & \derives & @*@ & {\rm lifted kind}\\
230 & & \orderives & @#@ & {\rm unlifted kind}\\
231 & & \orderives & @?@ & {\rm open kind}\\
232 & & \orderives & @(@ kind @)@& {\rm nested kind}\\
234 {\rm Kind} & kind & \derives & akind & {\rm atomic kind}\\
235 & & \orderives & akind @->@ kind & {\rm arrow kind} \\
237 {\rm Identifier} & mident & \derives &uname & {\rm module} \\
238 & tycon & \derives & uname & {\rm type constr.} \\
239 & qtycon & \derives & mident @.@ tycon & {\rm qualified type constr.} \\
240 & tyvar & \derives & lname & {\rm type variable} \\
241 & dcon & \derives & uname & {\rm data constr.} \\
242 & qdcon & \derives & mident @.@ dcon & {\rm qualified data constr.} \\
243 & var & \derives & lname & {\rm variable} \\
244 & qvar & \derives & [ mident @.@ ] var & {\rm optionally qualified variable} \\
246 {\rm Name} & lname & \derives & lower \many{namechar} \\
247 & uname & \derives & upper \many{namechar} & \\
248 & namechar & \derives & lower \orderives\ upper \orderives\ digit \orderives\ @'@ \\
249 & lower & \derives & @a@ \orderives\ @b@ \orderives\ \ldots \orderives\ @z@ \orderives\ @_@ \\
250 & upper & \derives & @A@ \orderives\ @B@ \orderives\ \ldots \orderives\ @Z@ \\
251 & digit & \derives & @0@ \orderives\ @1@ \orderives\ \ldots \orderives\ @9@ \\
256 \workingnote{Should add some provision for comments.}
258 \section{Informal Semantics}
261 Core resembles a explicitly-typed polymorphic lambda calculus (F$_\omega$), with the addition
262 of local @let@ bindings, algebraic type definitions, constructors, and @case@ expressions,
263 and primitive types, literals and operators.
264 It is hoped that this makes it easy to obtain an informal understanding of Core programs
265 without elaborate description. This section therefore concentrates on the less obvious points.
267 \subsection{Program Organization and Modules}
269 Core programs are organized into {\em modules}, corresponding directly to source-level Haskell modules.
270 Each module has a identifying name {\it mident}.
272 Each module may contain the following kinds of top-level declarations:
274 \item Algebraic data type declarations, each defining a type constructor and one or more data constructors;
275 \item Newtype declarations, corresponding to Haskell @newtype@ declarations, each defining a type constructor; and
276 \item Value declarations, defining the types and values of top-level variables.
278 No type constructor, data constructor, or top-level value may be declared more than once within a given module.
279 All the type declarations are (potentially) mutually recursive. Value declarations must be
280 in dependency order, with explicit grouping of mutually recursive declarations.
282 Identifiers defined in top-level declarations may be {\it external} or {\it internal}.
283 External identifiers can be referenced from any other module in
284 the program, using conventional dot notation (e.g., @PrelBase.Bool@, @PrelBase.True@).
285 Internal identifiers are visible only within the defining module.
286 All type and data constructors are external, and are always defined and referenced using
287 fully qualified names (with dots). A top-level value is external if it is defined and referenced
288 using a fully qualified name with a dot (e.g., @MyModule.foo = ...@); otherwise, it is internal
290 Note that the notion of external identifier does not necessarily coincide with that of ``exported''
291 identifier in a Haskell source module: all constructors are external, even if not exported, and
292 non-exported values may be external if they are referenced from potentially in-lineable exported values.
293 Core modules have no explicit import or export lists.
294 Modules may be mutually recursive.
296 \workingnote{But in the presence of inter-module recursion, is there much point in
297 keeping track of recursive groups within modules? Options: (1) don't worry about it;
298 (2) put all declarations in module (indeed whole program) into one huge recursive pot;
299 (3) abandon general module recursion, and introduce some kind of import declaration to define the
300 types (only) of things from external modules that currently introduce module recursion.}
302 There is also an implicitly-defined module @PrelGHC@, which exports the ``built-in'' types and values
303 that must be provided by any implementation of Core (including GHC). Details of this
304 module are in Section~\ref{sec:prims}.
306 A Core {\em program} is a collection of distinctly-named modules that includes a module
307 called @Main@ having an exported value called @main@ of type @PrelIOBase.IO a@ (for some type @a@).
309 Many modules of interest derive from library modules, such as @PrelBase@, which implement parts of
310 the Haskell basis library. In principle, these modules have no special status. In practice, the
311 requirement on the type of @Main.main@ implies that every program will contain a large subset of
312 the Prelude library modules.
314 \subsection{Namespaces}
316 There are five distinct name spaces:
318 \item module identifiers (@mident@),
319 \item type constructors (@tycon@),
320 \item type variables (@tyvar@),
321 \item data constructors (@dcon@),
322 \item term variables (@var@).
324 Spaces (1), (2+3), and (4+5) can be distinguished from each other by context.
325 To distinguish (2) from (3) and (4) from (5), we require that
326 (both sorts of) constructors begin with an upper-case character
327 and that (both sorts of) variables begin with a lower-case character (or @_@).
328 Primitive types and operators are not syntactically distinguished.
330 A given variable (type or term) may have multiple (local) definitions within a module.
331 However, definitions never ``shadow'' one another; that is, the scope of the definition
332 of a given variable never contains a redefinition of the same variable. The only exception
333 to this is that (necessarily closed) types labelling @%external@ expressions may contain
334 @tyvar@ bindings that shadow outer bindings.
336 Core generated by GHC makes heavy use of encoded names, in which the characters @Z@ and @z@ are
337 used to introduce escape sequences for non-alphabetic characters such as dollar sign @$@ (@zd@),
338 hash @#@ (@zh@), plus @+@ (@zp@), etc. This is the same encoding used in @.hi@ files and in the
339 back-end of GHC itself, except that we sometimes change an initial @z@ to @Z@, or vice-versa,
340 in order to maintain case distinctions.
342 \subsection{Types and Kinds}
344 In Core, all type abstractions and applications are explicit. This make it easy to
345 typecheck any (closed) fragment. An full executable typechecker is available separately.
347 Types are described by type expressions, which
348 are built from named type constructors and type variables
349 using type application and universal quantification.
350 Each type constructor has a fixed arity $\geq 0$.
351 Because it is so widely used, there is
352 special infix syntax for the fully-applied function type constructor (@->@).
353 (The prefix identifier for this constructor is @PrelGHC.ZLzmzgZR@; this should
354 only appear in unapplied or partially applied form.)
355 There are also a number of other primitive type constructors (e.g., @Intzh@) that
356 are predefined in the @PrelGHC@ module, but have no special syntax.
357 Additional type constructors are
358 introduced by @%data@ and @%newtype@ declarations, as described below.
359 Type constructors are distinguished solely by name.
361 As described in the Haskell definition, it is necessary to distinguish
362 well-formed type-expressions by classifying them into different {\it kinds}.
363 In particular, Core explicitly records the kind of every bound type variable.
364 Base kinds (@*@,@#@, and @?@) represent actual types, i.e., those that can be assigned
365 to term variables; all the nullary type constructors have one of these kinds.
366 Non-nullary type constructors have higher kinds of the form $k_1 @->@ k_2$,
367 where $k_1$ and $k_2$ are kinds. For example, the function type constructor
368 @->@ has kind @* -> (* -> *)@. Since Haskell allows abstracting over type
369 constructors, it is possible for type variables to have higher kinds; however,
370 it is much more common for them to have kind @*@, so this is the default if
371 the kind is omitted in a type binder.
373 The three base kinds distinguish the {\it liftedness} of the types they classify:
374 @*@ represents lifted types; @#@ represents unlifted types; and @?@ represents
375 ``open'' types, which may be either lifted or unlifted. Of these, only @*@ ever
376 appears in Core code generated from user code; the other two are needed to describe
377 certain types in primitive (or otherwise specially-generated) code.
378 Semantically, a type is lifted if and only if it has bottom as an element.
379 Operationally, lifted types may be represented by closures; hence, any unboxed
380 value is necessarily unlifted.
381 In particular, no top-level identifier (except in @PrelGHC@) has a type of kind @#@ or @?@.
382 Currently, all the primitive types are unlifted
383 (including a few boxed primitive types such as @ByteArrayzh@).
384 The ideas behind the use of unboxed and unlifted types are described in ~\cite{pj:unboxed}.
386 There is no mechanism for defining type synonyms (corresponding to
387 Haskell @type@ declarations).
388 Type equivalence is just syntactic equivalence on type expressions
389 (of base kinds) modulo:
392 \item alpha-renaming of variables bound in @%forall@ types;
393 \item the identity $a$ @->@ $b$ $\equiv$ @PrelGHC.ZLzmzgZR@ $a$ $b$
394 \item the substitution of representation types for {\it fully applied} instances of newtypes
395 (see Section~\ref{sec:newtypes}).
398 \subsection{Algebraic data types}
400 Each @data@ declaration introduces a new type constructor and a set of one or
401 more data constructors, normally corresponding directly to a source Haskell @data@ declaration.
402 For example, the source declaration
405 Fork (Bintree a) (Bintree a)
408 might induce the following Core declaration
411 Fork (Bintree a) (Bintree a);
414 which introduces the unary type constructor @Bintree@ of kind @*->*@ and two data constructors with types
416 Fork :: %forall a . Bintree a -> Bintree a -> Bintree a
417 Leaf :: %forall a . a -> Bintree a
419 We define the {\it arity} of each data constructor to be the number of value arguments it takes;
420 e.g. @Fork@ has arity 2 and @Leaf@ has arity 1.
422 For a less conventional example illustrating the possibility of higher-order kinds, the Haskell source declaration
424 data A f a = MkA (f a)
426 might induce the core declaration
428 %data A (f::*->*) (a::*) = { MkA (f a) }
430 which introduces the constructor
432 MkA :: %forall (f::*->*) (a::*) . (f a) -> (A f) a
436 GHC (like some other Haskell implementations) supports an extension to Haskell98
437 for existential types such as
439 data T = forall a . MkT a (a -> Bool)
441 This is represented by the Core declaration
443 %data T = {MkT @a a (a -> Bool)}
445 which introduces the nullary type constructor @T@ and the data constructor
447 MkT :: %forall a . a -> (a -> Bool) -> T
449 In general, existentially quantified variables appear as extra univerally
450 quantified variables in the data contructor types.
451 An example of how to construct and deconstruct values of type @T@ is shown in
452 Section~\ref{sec:exprs}.
454 \subsection{Newtypes}
458 Each Core @%newtype@ declaration introduces a new type constructor and (usually) an associated
459 representation type, corresponding to a source Haskell @newtype@
460 declaration. However, unlike in source Haskell, no data constructors are introduced.
461 In fact, newtypes seldom appear in value types
462 in Core programs, because GHC usually replaces them with their representation type.
463 For example, the Haskell fragment
470 might induce the Core fragment
478 The main purpose of including @%newtype@ declarations in Core is to permit checking of
479 type expressions in which partially-applied newtype constructors are used to instantiate higher-kinded
480 type variables. For example:
482 newtype W a = MkW (Bool -> a)
483 data S k = MkS (k Bool)
484 a :: S W = MkS (MkW(\x -> not x))
486 might generate this Core:
488 %newtype W a = Bool -> a;
489 %data S (k::(*->*)) = MkS (k Bool);
490 a :: S W = MkS @ W (\(x::Bool) -> not x)
492 The type application @(S W)@ cannot be checked without a definition for @W@.
494 Very rarely, source @newtype@ declarations may be (directly or indirectly) recursive. In such
495 cases, it is not possible to subsitute the representation type for the new type;
496 in fact, the representation type is omitted from the corresponding Core @%newtype@ declaration.
498 type can only be created or examined by first explicitly coercing them from/to
499 the representation type, using a @%coerce@ expression. For example, the silly
502 newtype U = MkU (U -> Bool)
507 might induce the Core fragment
510 u :: U = %coerce U (\ (x::U) -> True);
512 %let f :: U -> Bool = %coerce (U -> Bool) u
516 \workingnote{The treatment of newtypes is still very unattractive: acres of explanation for
517 very rare phenomena.}
519 \subsection{Expression Forms}
522 Variables and data constructors are straightforward.
524 Literal ({\it lit}) expressions consist of a literal value, in one of four different formats,
525 and a (primitive) type annotation. Only certain combinations of format and type
526 are permitted; see Section~\ref{sec:prims}. The character and string formats can describe only
527 8-bit ASCII characters. Moreover, because strings are interpreted as C-style null-terminated
528 strings, they should not contain embedded nulls.
530 Both value applications and type applications are made explicit, and similarly
531 for value and type abstractions. To tell them apart, type arguments in applications
532 and formal type arguments in abstractions are preceded by an \at\ symbol. (In abstractions,
533 the \at\ plays essentially the same role as the more usual $\Lambda$ symbol.)
534 For example, the Haskell source declaration
538 might induce the Core declaration
540 f :: %forall a . a -> BinTree (BinTree a) =
541 \ @a (x::a) -> Leaf @(Bintree a) (Leaf @a x)
544 Value applications may be of user-defined functions, data constructors, or primitives.
545 None of these sorts of applications are necessarily saturated (although previously published variants
546 of Core did require the latter two sorts to be).
548 Note that the arguments of type applications are not always of kind @*@. For example,
549 given our previous definition of type @A@:
551 data A f a = MkA (f a)
559 (MkA @Bintree @Bool) (Leaf @Bool True)
562 Local bindings, of a single variable or of a set of mutually recursive variables,
563 are represented by @%let@ expressions in the usual way.
565 By far the most complicated expression form is @%case@.
566 @%case@ expressions are permitted over values of any type, although they will normally
567 be algebraic or primitive types (with literal values).
568 Evaluating a @%case@ forces the evaluation of the expression being
569 tested (the ``scrutinee''). The value of the scrutinee is bound to the variable
570 following the @%of@ keyword, which is in scope in all alternatives;
571 this is useful when the scrutinee is a non-atomic
572 expression (see next example).
574 In an algebraic @%case@, all the case alternatives must be
575 labeled with distinct data constructors from the algebraic type, followed by
576 any existential type variable bindings (see below), and
577 typed term variable bindings corresponding to the data constructor's
578 arguments. The number of variables must match the data constructor's arity.
580 For example, the following Haskell source expression
584 t@(Leaf v) -> Fork t t
586 might induce the Core expression
588 %case g x %of (t::Bintree a)
589 Fork (l::Bintree a) (r::Bintree a) ->
595 When performing a @%case@ over a value of an existentially-quantified algebraic
596 type, the alternative must include extra local type bindings
597 for the existentially-quantified variables. For example, given
599 data T = forall a . MkT a (a -> Bool)
609 MkT @b (w::b) (g::b->Bool) -> g w
612 In a @%case@ over literal alternatives,
613 all the case alternatives must be distinct literals of the same primitive type.
615 The list of alternatives may begin with a
616 default alternative labeled with an underscore (@%_@), which will be chosen if
617 none of the other alternative match. The default is optional except for a case
618 over a primitive type, or when there are no other alternatives.
619 If the case is over neither an
620 algebraic type nor a primitive type, the default alternative is the {\it only}
622 For algebraic cases, the set of alternatives
623 need not be exhaustive, even if no default is given; if alternatives are missing,
624 this implies that GHC has deduced that they cannot occur.
626 The @%coerce@ expression is primarily used in conjunction with manipulation of
627 newtypes, as described in Section~\ref{sec:newtypes}.
628 However, @%coerce@ is sometimes used for
629 other purposes, e.g. to coerce the return type of a function (such as @error@)
630 that is guaranteed never to return. By their natures, uses of @%coerce@ cannot
631 be independently justified, and must be taken on faith by a type-checker for Core.
633 A @%note@ expression is used to carry arbitrary internal information of interest to
634 GHC. The information must be encoded as a string. Expression notes currently generated by GHC
635 include the inlining pragma (@InlineMe@) and cost-center labels for profiling.
637 A @%external@ expression denotes an external identifier, which has
638 the indicated type (always expressed in terms of Haskell primitive types).
639 \workingnote{The present syntax is sufficient for describing C functions and labels.
640 Interfacing to other languages may require additional information or a different interpretation
644 \subsection{Expression Evaluation}
646 The dynamic semantics of Core are defined on the type-erasure of the program;
647 ie. we ignore all type abstractions and applications. The denotational semantics
648 the resulting type-free program are just the conventional ones for a call-by-name
649 language, in which expressions are only evaluated on demand.
650 But Core is intended to be a call-by-{\it{need}} language, in which
651 expressions are only evaluated {\it once}. To express the sharing behavior
652 of call-by-need, we give an operational model in the style of Launchbury.
653 This section describes the model informally; a more formal semantics is
654 separately available in the form of an executable interpreter.
656 To simplify the semantics, we consider only ``well-behaved'' Core programs in which
657 constructor and primitive applications are fully saturated, and in which
658 non-trivial expresssions of unlifted kind (@#@) appear only as scrutinees
659 in @%case@ expressions. Any program can easily be put into this form;
660 a separately available executable preprocessor illustrates how.
661 In the remainder of this section, we use ``Core'' to mean ``well-behaved'' Core.
663 Evaluating a Core expression means reducing it to {\it weak-head normal form (WHNF)},
664 i.e., a primitive value, lambda abstraction, or fully-applied data constructor.
665 Evaluation of a program is evaluation of the expression @Main.main@.
667 To make sure that expression evaluation is shared, we
668 make use of a {\it heap}, which can contain
670 \item {\em Thunks} representing suspended (i.e., as yet unevaluated) expressions.
672 \item {\em WHNF}s representing the result of evaluating such thunks. Computations over
673 primitive types are never suspended, so these results are always closures (representing
674 lambda abstractions) or data constructions.
676 Thunks are allocated when it
677 is necessary to suspend a computation whose result may be shared.
678 This occurs when evaluating three different kinds of expressions:
680 \item Value definitions at top-level or within a local @let@ expression.
681 Here, the defining expressions are suspended and the defined names
682 are bound to heap pointers to the suspensions.
684 \item User function applications. Here, the actual argument expression is
685 suspended and the formal argument is bound to a heap pointer to the suspension.
687 \item Constructor applications. Here, the actual argument expression is
688 suspended and a heap pointer to the suspension is embedded in the constructed value.
691 As computation proceeds, copies of the heap pointer propagate.
692 When the computation is eventually forced, the heap entry is overwritten with the resulting
693 WHNF, so all copies of the pointer now point to this WHNF. Forcing occurs
694 only in the context of
696 \item evaluating the operator expression of an application;
698 \item evaluating the ``scrutinee'' of a @case@ expression; or
700 \item evaluating an argument to a primitive or external function application
703 Ultimately, if there are no remaining pointers to the heap entry (whether suspended or evaluated),
704 the entry can be garbage-collected; this is assumed to happen implicitly.
706 With the exception of functions, arrays, and mutable variables, the intention is that values of all primitive types
707 should be held {\it unboxed}, i.e., not heap-allocated. This causes no problems for laziness because all
708 primitive types are {\it unlifted}. Unboxed tuple types are not heap-allocated either.
710 Certain primitives and @%external@ functions cause side-effects to state threads or to the real world.
711 Where the ordering of these side-effects matters, Core already forces this order
712 by means of data dependencies on the psuedo-values representing the threads.
714 The @raisezh@ and @handlezh@ primitives requires special support in an implementation, such as a handler stack;
715 again, real-world threading guarantees that they will execute in the correct order.
717 \section{Primitive Module}
720 This section describes the contents and informal semantics of the primitive module @PrimGHC@.
721 Nearly all the primitives are required in order to cover GHC's implementation of the Haskell98
722 standard prelude; the only operators that can be completely omitted are those supporting the byte-code interpreter,
723 parallelism, and foreign objects. Some of the concurrency primitives are needed, but can be
724 given degenerate implementations if it desired to target a purely sequential backend; see Section~\ref{sec:sequential}.
726 In addition to these primitives, a large number of C library functions are required to implement
727 the full standard Prelude, particularly to handle I/O and arithmetic on less usual types.
728 % We list these separately in section~\ref{sec:ccalls}.
732 \begin{tabular}{|l|l|l|}
734 Type & Kind & Description \\
736 @ZLzmzgZR@ & @* -> * -> *@ & functions (@->@) \\
737 @Z1H@ & @? -> #@ & unboxed 1-tuple \\
738 @Z2H@ & @? -> ? -> #@ & unboxed 2-tuple \\
739 \ldots & \ldots & \ldots \\
740 @Z100H@ & @? -> ? -> ? -> ... -> ? -> #@ & unboxed 100-tuple \\
741 @Addrzh@ & @#@ & machine address (pointer) \\
742 @Charzh@ & @#@ & unicode character (31 bits) \\
743 @Doublezh@ & @#@ & double-precision float \\
744 @Floatzh@ & @#@ & float \\
745 @Intzh@ & @#@ & int (30+ bits) \\
746 @Int32zh@ & @#@ & int (32 bits) \\
747 @Int64zh@ & @#@ & int (64 bits) \\
748 @Wordzh@ & @#@ & unsigned word (30+ bits) \\
749 @Word32zh@ & @#@ & unsigned word (32 bits) \\
750 @Word64zh@ & @#@ & unsigned word (64 bits) \\
751 @RealWorld@ & @*@ & pseudo-type for real world state \\
752 @Statezh@ & @* -> #@ & mutable state \\
753 @Arrayzh@ & @* -> #@ & immutable arrays \\
754 @ByteArrayzh@ & @#@ & immutable byte arrays \\
755 @MutableArrayzh@ & @* -> * -> #@ & mutable arrays \\
756 @MutableByteArrayzh@ & @* -> #@ & mutable byte arrays \\
757 @MutVarzh@ & @* -> * -> #@ & mutable variables \\
758 @MVarzh@ & @* -> * -> #@ & synchronized mutable variables \\
759 @Weakzh@ & @* -> #@ & weak pointers \\
760 @StablePtrzh@ & @* -> #@ & stable pointers \\
761 @ForeignObjzh@ & @#@ & foreign object \\
762 @ThreadIdzh@ & @#@ & thread id \\
763 @ZCTCCallable@ & @? -> *@ & dictionaries for CCallable pseudo-class \\
764 @ZCTCReturnable@ & @? -> *@ & dictionaries for CReturnable pseudo-class \\
768 In addition, the types @PrelBase.Bool@ and @PrelBase.Unit@, which are non-primitive
769 and are defined as ordinary algebraic types in module @PrelBase@, are used in
770 the types of some operators in @PrelGHC@.
772 The unboxed tuple types are quite special: they hold sets of values in an unlifted
773 context, i.e., to be manipulated directly rather than being stored in the heap. They can only
774 appear in limited contexts in programs; in particular, they cannot be bound by a
775 lambda abstraction or case alternative pattern. Note that they can hold either lifted
776 or unlifted values. The limitation to 100-tuples is an arbitrary one set by GHC.
778 The type of arbitrary precision integers (@Integer@) is not primitive; it is made
779 up of an ordinary primitive integer (@Intzh@) and a byte array (@ByteArrzh@).
780 The components of an @Integer@ are passed to primitive operators as two separate
781 arguments and returned as an unboxed pair.
783 The @Statezh@ type constructor takes a dummy type argument that is used only
784 to distinguish different state {\it threads}~\cite{Launchbury94}.
785 The @RealWorld@ type is used only as an argument to @Statezh@, and represents
786 the thread of real-world state; it contains just the single value @realWorldzh@.
787 The mutable data types @MutableArrayzh@,@MutableByteArrayzh@,@MutVarzh@
788 take an initial type argument of the form @(Statezh@ $t$@)@ for some thread $t$.
789 The synchronized mutable variable type constructor @MVarzh@ always takes an argument of type
792 @Weakzh@ is the type of weak pointers.
794 @StablePtrzh@ is the type of stable pointers, which are guaranteed not to move
795 during garbage collections; these are useful in connection with foreign functions.
797 @ForeignPtrzh@ is the type of foreign pointers.
799 The dictionary types @ZCTCCallable@ and @ZCTCReturnable@ are just placeholders
800 which can be represented by a void type;
801 any code they appear in should be unreachable.
803 \subsubsection{Non-concurrent Back End}
804 \label{sec:sequential}
806 The Haskell98 standard prelude doesn't include any concurrency support, but GHC's
807 implementation of it relies on the existence of some concurrency primitives. However,
808 it never actually forks multiple threads. Hence, the concurrency primitives can
809 be given degenerate implementations that will work in a non-concurrent setting,
812 \item @ThreadIdzh@ can be represented
813 by a singleton type, whose (unique) value is returned by @myThreadIdzh@.
815 \item @forkzh@ can just die with an ``unimplemented'' message.
817 \item @killThreadzh@ and @yieldzh@ can also just die ``unimplemented'' since
818 in a one-thread world, the only thread a thread can kill is itself, and
819 if a thread yields the program hangs.
821 \item @MVarzh a@ can be represented by @MutVarzh (Maybe a)@;
822 where a concurrent implementation would block, the sequential implementation can
823 just die with a suitable message (since no other thread exists to unblock it).
825 \item @waitReadzh@ and @waitWritezh@ can be implemented using a @select@ with no timeout.
828 \subsection{Literals}
830 Only the following combination of literal forms and types are permitted:
832 \begin{tabular}{|l|l|l|}
834 Literal form & Type & Description \\
836 integer & @Intzh@ & Int \\
837 % & @Int32zh@ & Int32 \\
838 % & @Int64zh@ & Int64 \\
840 % & @Word32zh@ & Word32 \\
841 % & @Word64zh@ & Word64 \\
842 & @Addrzh@ & Address \\
843 & @Charzh@ & Unicode character code \\
844 rational & @Floatzh@ & Float \\
845 & @Doublezh@ & Double \\
846 character & @Charzh@ & Unicode character specified by ASCII character\\
847 string & @Addrzh@ & Address of specified C-format string \\
851 \subsection{Data Constructors}
853 The only primitive data constructors are for unboxed tuples:
855 \begin{tabular}{|l|l|l|}
857 Constructor & Type & Description \\
859 @ZdwZ1H@ & @%forall (a::?).a -> Z1H a@ & unboxed 1-tuple \\
860 @ZdwZ2H@ & @%forall (a1::?) (a2::?).a1 -> a2 -> Z2H a1 a2@ & unboxed 2-tuple \\
861 \ldots & \ldots & \ldots \\
862 @ZdwZ100H@ & @%forall (a1::?) (a2::?)... (a100::?) .@ & \\
863 & \ \ \ @a1 -> a2 -> ... -> a100 -> Z100H a1 a2 ... a100@ & unboxed 100-tuple \\
869 Operators are (roughly) divided into collections according to the primary
870 type on which they operate.
872 \workingnote{How do primitives fail, e.g., on division by zero or
873 attempting an invalid narrowing coercion?}
875 \workingnote{The following primop descriptions are automatically generated.
876 The exact set of primops and their types presented here
877 depends on the underlying word size at the time of generation; these
878 were done for 32 bit words. This is a bit stupid.
879 More importantly, the word size has a big impact on just what gets produced
880 in a Core file, but this isn't documented anywhere in the file itself.
881 Perhaps there should be a global flag in the file?}
883 \newcommand{\primoptions}[7]{{#1} {#2} {#3} {#4} {#5}}
885 \newcommand{\primopsection}[2]{\subsubsection{#1}{#2}\vspace*{0.1in}}
886 \newcommand{\primopdefaults}[1]{Unless otherwise noted, each primop has the following default characteristics: {#1}}
888 \newcommand{\primopdesc}[8]{
889 \par\noindent{\texttt{{{#3} :: {#6}}}}
894 \subsubsection{RealWorld}
896 There is just one value of type @RealWorld@, namely @realWorldzh@. It is used
897 only for dependency threading of side-effecting operations.
899 \begin{thebibliography}{}
901 \bibitem[Launchbury and {Peyton~Jones}, 1994]{Launchbury94}
902 Launchbury, J. and {Peyton~Jones}, S. (1994).
903 \newblock Lazy functional state threads.
904 \newblock Technical report FP-94-05, Department of Computing Science,
905 University of Glasgow.
907 \bibitem[{Peyton~Jones} and Launchbury, 1991]{pj:unboxed}
908 {Peyton~Jones}, S. and Launchbury, J. (1991).
909 \newblock Unboxed values as first class citizens.
910 \newblock In {\em ACM Conference on Functional Programming and Computer
911 Architecture (FPCA'91)}, pages 636--666, Boston. ACM.
913 \bibitem[{Peyton~Jones} and Marlow, 1999]{ghc-inliner}
914 {Peyton~Jones}, S. and Marlow, S. (1999).
915 \newblock Secrets of the {Glasgow Haskell Compiler} inliner.
916 \newblock In {\em Workshop on Implementing Declarative Languages}, Paris,
919 \bibitem[Peyton~Jones and Santos, 1998]{comp-by-trans-scp}
920 Peyton~Jones, S. and Santos, A. (1998).
921 \newblock A transformation-based optimiser for {Haskell}.
922 \newblock {\em Science of Computer Programming}, 32(1-3):3--47.
924 \end{thebibliography}