1 %************************************************************************
3 \subsection{The front end of the compiler}
6 %************************************************************************
8 The previous section covered the main points about the front end of
9 the compiler: it is dominated by a ``renamer'' and a typechecker
10 working directly at the Haskell source level. In this section, we
11 will look at some basic datatypes used or introduced in the front
12 end---ones that you may see as input to your optimisation pass.
14 \subsubsection{``Abstract syntax'', a source-level datatype}
17 As Figure~\ref{fig:overview} shows, the typechecker both reads and
18 writes a collection of datatypes we call ``Abstract syntax.''
19 This is misleading in that what
20 goes into the typechecker is quite different from what comes out.
22 Let's first consider this fragment of the abstract-syntax
23 definition,\srcloc{abstractSyn/HsExpr.lhs} for Haskell explicit-list
24 expressions (Haskell report, section~3.5
25 \cite{hudak91a}):\nopagebreak[4]
29 | ExplicitList [Expr var pat]
30 | ExplicitListOut UniType [Expr var pat]
33 type ProtoNameExpr = Expr ProtoName ProtoNamePat
34 type RenamedExpr = Expr Name RenamedPat
35 type TypecheckedExpr = Expr Id TypecheckedPat
37 an @ExplicitList@ appears only in typechecker input; an @ExplicitListOut@
38 is the corresponding construct that appears
39 only in the output, with the inferred type information attached.
41 The fragment above also shows how abstract syntax is {\em parameterised}
42 with respect to variables and patterns. The idea is the same for
43 both; let's just consider variables.
45 The renamer converts @ProtoNameExprs@ (expressions with
46 @ProtoNames@\srcloc{basicTypes/ProtoName.lhs} as variables---little
47 more than just strings) into @RenamedExprs@, which have all naming sorted out
48 (using @Names@\srcloc{abstractSyn/Name.lhs}). A @Name@ is known to be
49 properly bound, isn't duplicated, etc.; it's known if it's bound to a
50 built-in standard-prelude entity.
52 (The renamer also does dependency analysis, which is required to
53 maximise polymorphism in a Hindley-Milner type system.)
55 The typechecker reads the @RenamedExprs@, sorts out the types, and
56 spits out @TypecheckedExprs@, with variables represented by
57 @Ids@\srcloc{basicTypes/Id.lhs}. You can find out just about anything
58 you want to know about a variable from its @Id@.
60 To see what GHC makes of some Haskell, in a file @Foo.hs@, say:
61 try @ghc -noC -ddump-rn4 Foo.hs@, to see what comes out of the renamer [pass~4];
62 try @ghc -noC -ddump-tc Foo.hs@, to see what comes out of the typechecker.
64 \subsubsection{Basic datatypes in the compiler}
66 None of the internal datatypes in the example just given are
67 particularly interesting except @Ids@.\srcloc{basicTypes/Id.lhs} A
68 program variable, which enters the typechecker as a string, emerges as
71 The important thing about @Id@---and the datatypes representing other
72 basic program entities (type variables, type constructors, classes,
73 etc.)---is that they often include {\em memoised} information that can
74 be used throughout the rest of the compiler.
76 Let us take a cursory look at @Ids@, as a representative example of
77 these basic data types. (Don't be too scared---@Ids@ are the hairiest
78 entities in the whole compiler!)
80 \begin{tightcode}\codeallowbreaks{}
82 = Id Unique {\dcd\rm-- key for fast comparison}
83 UniType {\dcd\rm-- Id's type; used all the time;}
84 IdInfo {\dcd\rm-- non-essential info about this Id;}
85 PragmaInfo {\dcd\rm-- user-specified pragmas about this Id;}
86 IdDetails {\dcd\rm-- stuff about individual kinds of Ids.}
89 So, every @Id@ comes with:
92 A @Unique@,\srcloc{basicTypes/Unique.lhs} essentially a unique
93 @Int@, for fast comparison;
95 A @UniType@ (more on them below... section~\ref{sec:UniType}) giving the variable's
96 type---this is the most useful memoised information.
98 A @PragmaInfo@, which is pragmatic stuff the user specified for
99 this @Id@; e.g., @INLINE@ it; GHC does not promise to honour these
100 pragma requests, but this is where it keeps track of them.
102 An @IdInfo@ (more on {\em them} below... section~\ref{sec:IdInfo}),
103 which tells you all the extra things
104 that the compiler doesn't {\em have} to know about an @Id@, but it's jolly nice...
105 This corresponds pretty closely to the @GHC_PRAGMA@ cross-module info that you will
106 see in any interface produced using @ghc -O@.
107 An example of some @IdInfo@
108 would be: that @Id@'s unfolding; or its arity.
111 Then the fun begins with @IdDetails@...
112 \begin{tightcode}\codeallowbreaks{}
115 {\dcd\rm---------------- Local values}
117 = LocalId ShortName {\dcd\rm-- mentioned by the user}
119 | SysLocalId ShortName {\dcd\rm-- made up by the compiler}
121 {\dcd\rm---------------- Global values}
123 | ImportedId FullName {\dcd\rm-- Id imported from an interface}
125 | PreludeId FullName {\dcd\rm-- Prelude things the compiler ``knows'' about}
127 | TopLevId FullName {\dcd\rm-- Top-level in the orig source pgm}
128 {\dcd\rm-- (not moved there by transformations).}
130 {\dcd\rm---------------- Data constructors}
134 [TyVarTemplate] ThetaType [UniType] TyCon
135 {\dcd\rm-- split-up type: the constructor's type is:}
136 {\dcd\rm-- $\forall~tyvars . theta\_ty \Rightarrow$}
137 {\dcd\rm-- $unitype_1 \rightarrow~ ... \rightarrow~ unitype_n \rightarrow~ tycon tyvars$}
139 | TupleCon Int {\dcd\rm-- its arity}
141 {\dcd\rm-- There are quite a few more flavours of {\tt IdDetails}...}
144 % A @ShortName@,\srcloc{basicTypes/NameTypes.lhs} which includes a name string
145 % and a source-line location for the name's binding occurrence;
147 In short: everything that later parts of the compiler might want to
148 know about a local @Id@ is readily at hand. The same principle holds
149 true for imported-variable and data-constructor @Ids@ (tuples are an
150 important enough case to merit special pleading), as well as for other
151 basic program entities. Here are a few further notes about the @Id@
155 A @FullName@\srcloc{basicTypes/NameTypes.lhs} is one that may be
156 globally visible, with a module-name as well; it may have been
159 @DataConKey@\srcloc{prelude/PrelUniqs.lhs} is a specialised
160 fast-comparison key for data constructors; there are several of these
163 The @UniType@ for @DataConIds@ is given in {\em two} ways: once, just as
164 a plain type; secondly, split up into its component parts. This is to
165 save frequently re-splitting such types.
167 Similarly, a @TupleCon@ has a type attached, even though we could
168 construct it just from the arity.
171 \subsubsection{@UniTypes@, representing types in the compiler}
174 Let us look further at @UniTypes@.\srcloc{uniType/} Their definition
176 \begin{tightcode}\codeallowbreaks{}
180 | UniFun UniType {\dcd\rm-- function type}
183 | UniData TyCon {\dcd\rm-- non-synonym datatype}
186 | UniSyn TyCon {\dcd\rm-- type synonym}
187 [UniType] {\dcd\rm--\ \ unexpanded form}
188 UniType {\dcd\rm--\ \ expanded form}
190 | UniDict Class {\dcd\rm-- for types with overloading}
193 {\dcd\rm-- The next two are to do with universal quantification.}
194 | UniTyVarTemplate TyVarTemplate
196 | UniForall TyVarTemplate
199 When the typechecker processes a source module, it adds @UniType@
200 information to all the basic entities (e.g., @Ids@), among other
201 places (see Section~\ref{sec:second-order} for more details). These
202 types are used throughout the compiler.
204 The following example shows several things about @UniTypes@.
205 If a programmer wrote @(Eq a) => a -> [a]@, it would be represented
206 as:\footnote{The internal structures of @Ids@,
207 @Classes@, @TyVars@, and @TyCons@ are glossed over here...}
208 \begin{tightcode}\codeallowbreaks{}
209 UniForall {\dcd$\alpha$}
210 (UniFun (UniDict {\dcd\em Eq} (UniTyVar {\dcd$\alpha$}))
211 (UniFun (UniTyVarTemplate {\dcd$\alpha$})
212 (UniData {\dcd\em listTyCon}
213 [UniTyVarTemplate {\dcd$\alpha$}])))
215 From this example we see:
218 The universal quantification of the type variable $\alpha$ is made explicit
219 (with a @UniForall@).
221 The class assertion @(Eq a)@ is represented with a @UniDict@ whose
222 second component is a @UniType@---this
223 reflects the fact that we expect @UniType@ to be used in a stylized
224 way, avoiding nonsensical constructs (e.g.,
225 @(UniDict f (UniDict g (UniDict h ...)))@).
227 The ``double arrow'' (@=>@) of the Haskell source, indicating an
228 overloaded type, is represented by the usual
229 @UniFun@ ``single arrow'' (@->@), again in a stylized way.
230 This change reflects the fact that each class assertion in a
231 function's type is implemented by adding an extra dictionary argument.
233 In keeping with the memoising tradition we saw with @Ids@, type
234 synonyms (@UniSyns@) keep both the unexpanded and expanded forms handy.
237 \subsubsection{@IdInfo@: extra pragmatic info about an @Id@}
240 [New in 0.16.] All the nice-to-have-but-not-essential information
241 about @Ids@ is now hidden in the
242 @IdInfo@\srcloc{basicTypes/IdInfo.lhs} datatype. It looks something
244 \begin{tightcode}\codeallowbreaks{}
246 = NoIdInfo {\dcd\rm-- OK, we know nothing...}
249 ArityInfo {\dcd\rm-- its arity}
250 DemandInfo {\dcd\rm-- whether or not it is definitely demanded}
251 InliningInfo {\dcd\rm-- whether or not we should inline it}
252 SpecialisationInfo {\dcd\rm-- specialisations of this overloaded}
253 {\dcd\rm-- function which exist}
254 StrictnessInfo {\dcd\rm-- strictness properties, notably}
255 {\dcd\rm-- how to conjure up ``worker'' functions}
256 WrapperInfo {\dcd\rm-- ({\em informational only}) if an Id is}
257 {\dcd\rm-- a ``worker,'' this says what Id it's}
258 {\dcd\rm-- a worker for, i.e., ``who is my wrapper''}
259 {\dcd\rm-- (used to match up workers/wrappers)}
260 UnfoldingInfo {\dcd\rm-- its unfolding}
261 UpdateInfo {\dcd\rm-- which args should be updated}
262 SrcLocation {\dcd\rm-- source location of definition}
264 As you can see, we may accumulate a lot of information about an Id!
265 (The types for all the sub-bits are given in the same place...)
267 \subsubsection{Introducing dictionaries for overloading}
269 The major novel feature of the Haskell language is its systematic
270 overloading using {\em type classes}; Wadler and Blott's paper is the
271 standard reference \cite{wadler89a}.
273 To implement type classes, the typechecker not only checks the Haskell
274 source program, it also {\em translates} it---by inserting code to
275 pass around in {\em dictionaries}. These dictionaries
276 are essentially tuples of functions, from which the correct code may
277 be plucked at run-time to give the desired effect. Kevin Hammond
278 wrote and described the first working implementation of type
279 classes \cite{hammond89a}, and the ever-growing static-semantics paper
280 by Peyton Jones and Wadler is replete with the glories of dictionary
281 handling \cite{peyton-jones90a}. (By the way, the typechecker's
282 structure closely follows the static semantics paper; inquirers into
283 the former will become devoted students of the latter.)
285 Much of the abstract-syntax datatypes are given
286 over to output-only translation machinery. Here are a few more
287 fragments of the @Expr@ type, all of which appear only in typechecker
292 | DictLam [DictVar] (Expr var pat)
293 | DictApp (Expr var pat) [DictVar]
294 | Dictionary [DictVar] [Id]
298 You needn't worry about this stuff:
299 After the desugarer gets through with such constructs, there's nothing
300 left but @Ids@, tuples, tupling functions, etc.,---that is, ``plain
301 simple stuff'' that should make the potential optimiser's heart throb.
302 Optimisation passes don't deal with dictionaries explicitly but, in
303 some cases, quite a bit of the code passed through to them will be for