1 %************************************************************************
3 \subsection{The front end of the compiler}
6 %************************************************************************
8 The previous section covered the main points about the front end of
9 the compiler: it is dominated by a ``renamer'' and a typechecker
10 working directly at the Haskell source level. In this section, we
11 will look at some basic datatypes used or introduced in the front
12 end---ones that you may see as input to your optimisation pass.
14 \subsubsection{``Abstract syntax'', a source-level datatype}
17 As Figure~\ref{fig:overview} shows, the typechecker both reads and
18 writes a collection of datatypes we call ``Abstract syntax.''
19 This is misleading in that what
20 goes into the typechecker is quite different from what comes out.
22 Let's first consider this fragment of the abstract-syntax
23 definition\srcloc{abstractSyn/HsExpr.lhs}, for Haskell explicit-list
24 expressions (Haskell report, section~3.5
25 \cite{hudak91a}):\nopagebreak[4]
29 | ExplicitList [Expr var pat]
30 | ExplicitListOut UniType [Expr var pat]
33 type ProtoNameExpr = Expr ProtoName ProtoNamePat
34 type RenamedExpr = Expr Name RenamedPat
35 type TypecheckedExpr = Expr Id TypecheckedPat\end{mytightcode}
36 an @ExplicitList@ appears only in typechecker input; an @ExplicitListOut@
37 is the corresponding construct that appears
38 only in the output, with the inferred type information attached.
40 The fragment above also shows how abstract syntax is {\em parameterised}
41 with respect to variables and patterns. The idea is the same for
42 both; let's just consider variables.
44 The renamer converts @ProtoNameExprs@ (expressions with
45 @ProtoNames@\srcloc{basicTypes/ProtoName.lhs} as variables---little
46 more than just strings) into @RenamedExprs@, which have all naming sorted out
47 (using @Names@\srcloc{abstractSyn/Name.lhs}). A @Name@ is known to be
48 properly bound, isn't duplicated, etc.; it's known if it's bound to a
49 built-in standard-prelude entity.
51 (The renamer also does dependency analysis, which is required to
52 maximise polymorphism in a Hindley-Milner type system.)
54 The typechecker reads the @RenamedExprs@, sorts out the types, and
55 spits out @TypecheckedExprs@, with variables represented by
56 @Ids@\srcloc{basicTypes/Id.lhs}. You can find out just about anything
57 you want to know about a variable from its @Id@.
59 To see what GHC makes of some Haskell, in a file @Foo.hs@, say:
60 try @ghc -noC -ddump-rn4 Foo.hs@, to see what comes out of the renamer [pass~4];
61 try @ghc -noC -ddump-tc Foo.hs@, to see what comes out of the typechecker.
63 \subsubsection{Basic datatypes in the compiler}
65 None of the internal datatypes in the example just given are
66 particularly interesting except @Ids@\srcloc{basicTypes/Id.lhs}. A
67 program variable, which enters the typechecker as a string, emerges as
70 The important thing about @Id@---and the datatypes representing other
71 basic program entities (type variables, type constructors, classes,
72 etc.)---is that they often include {\em memoised} information that can
73 be used throughout the rest of the compiler.
75 Let us take a cursory look at @Ids@, as a representative example of
76 these basic data types. (Don't be too scared---@Ids@ are the hairiest
77 entities in the whole compiler!)
80 \codeallowbreaks{}data Id
81 = Id Unique {\dcd\rm-- key for fast comparison}
82 UniType {\dcd\rm-- Id's type; used all the time;}
83 IdInfo {\dcd\rm-- non-essential info about this Id;}
84 PragmaInfo {\dcd\rm-- user-specified pragmas about this Id;}
85 IdDetails {\dcd\rm-- stuff about individual kinds of Ids.}\end{mytightcode}
87 So, every @Id@ comes with:
90 A @Unique@\srcloc{basicTypes/Unique.lhs}, essentially a unique
91 @Int@, for fast comparison;
93 A @UniType@ (more on them below... section~\ref{sec:UniType}) giving the variable's
94 type---this is the most useful memoised information.
96 A @PragmaInfo@, which is pragmatic stuff the user specified for
97 this @Id@; e.g., @INLINE@ it; GHC does not promise to honour these
98 pragma requests, but this is where it keeps track of them.
100 An @IdInfo@ (more on {\em them} below... section~\ref{sec:IdInfo}),
101 which tells you all the extra things
102 that the compiler doesn't {\em have} to know about an @Id@, but it's jolly nice...
103 This corresponds pretty closely to the @GHC_PRAGMA@ cross-module info that you will
104 see in any interface produced using @ghc -O@.
105 An example of some @IdInfo@
106 would be: that @Id@'s unfolding; or its arity.
109 Then the fun begins with @IdDetails@...
111 \codeallowbreaks{}data IdDetails
113 {\dcd\rm---------------- Local values}
115 = LocalId ShortName {\dcd\rm-- mentioned by the user}
117 | SysLocalId ShortName {\dcd\rm-- made up by the compiler}
119 {\dcd\rm---------------- Global values}
121 | ImportedId FullName {\dcd\rm-- Id imported from an interface}
123 | PreludeId FullName {\dcd\rm-- Prelude things the compiler ``knows'' about}
125 | TopLevId FullName {\dcd\rm-- Top-level in the orig source pgm}
126 {\dcd\rm-- (not moved there by transformations).}
128 {\dcd\rm---------------- Data constructors}
132 [TyVarTemplate] ThetaType [UniType] TyCon
133 {\dcd\rm-- split-up type: the constructor's type is:}
134 {\dcd\rm-- $\forall~tyvars . theta\_ty \Rightarrow$}
135 {\dcd\rm-- $unitype_1 \rightarrow~ ... \rightarrow~ unitype_n \rightarrow~ tycon tyvars$}
137 | TupleCon Int {\dcd\rm-- its arity}
139 {\dcd\rm-- There are quite a few more flavours of {\tt IdDetails}...}\end{mytightcode}
141 % A @ShortName@,\srcloc{basicTypes/NameTypes.lhs} which includes a name string
142 % and a source-line location for the name's binding occurrence;
144 In short: everything that later parts of the compiler might want to
145 know about a local @Id@ is readily at hand. The same principle holds
146 true for imported-variable and data-constructor @Ids@ (tuples are an
147 important enough case to merit special pleading), as well as for other
148 basic program entities. Here are a few further notes about the @Id@
152 A @FullName@\srcloc{basicTypes/NameTypes.lhs} is one that may be
153 globally visible, with a module-name as well; it may have been
156 @DataConKey@\srcloc{prelude/PrelUniqs.lhs} is a specialised
157 fast-comparison key for data constructors; there are several of these
160 The @UniType@ for @DataConIds@ is given in {\em two} ways: once, just as
161 a plain type; secondly, split up into its component parts. This is to
162 save frequently re-splitting such types.
164 Similarly, a @TupleCon@ has a type attached, even though we could
165 construct it just from the arity.
168 \subsubsection{@UniTypes@, representing types in the compiler}
171 Let us look further at @UniTypes@\srcloc{uniType/}. Their definition
174 \codeallowbreaks{}data UniType
177 | UniFun UniType {\dcd\rm-- function type}
180 | UniData TyCon {\dcd\rm-- non-synonym datatype}
183 | UniSyn TyCon {\dcd\rm-- type synonym}
184 [UniType] {\dcd\rm--\ \ unexpanded form}
185 UniType {\dcd\rm--\ \ expanded form}
187 | UniDict Class {\dcd\rm-- for types with overloading}
190 {\dcd\rm-- The next two are to do with universal quantification.}
191 | UniTyVarTemplate TyVarTemplate
193 | UniForall TyVarTemplate
194 UniType\end{mytightcode}
195 When the typechecker processes a source module, it adds @UniType@
196 information to all the basic entities (e.g., @Ids@), among other
197 places (see Section~\ref{sec:second-order} for more details). These
198 types are used throughout the compiler.
200 The following example shows several things about @UniTypes@.
201 If a programmer wrote @(Eq a) => a -> [a]@, it would be represented
202 as:\footnote{The internal structures of @Ids@,
203 @Classes@, @TyVars@, and @TyCons@ are glossed over here...}
205 \codeallowbreaks{}UniForall {\dcd$\alpha$}
206 (UniFun (UniDict {\dcd\em Eq} (UniTyVar {\dcd$\alpha$}))
207 (UniFun (UniTyVarTemplate {\dcd$\alpha$})
208 (UniData {\dcd\em listTyCon}
209 [UniTyVarTemplate {\dcd$\alpha$}])))\end{mytightcode}
210 From this example we see:
213 The universal quantification of the type variable $\alpha$ is made explicit
214 (with a @UniForall@).
216 The class assertion @(Eq a)@ is represented with a @UniDict@ whose
217 second component is a @UniType@---this
218 reflects the fact that we expect @UniType@ to be used in a stylized
219 way, avoiding nonsensical constructs (e.g.,
220 @(UniDict f (UniDict g (UniDict h ...)))@).
222 The ``double arrow'' (@=>@) of the Haskell source, indicating an
223 overloaded type, is represented by the usual
224 @UniFun@ ``single arrow'' (@->@), again in a stylized way.
225 This change reflects the fact that each class assertion in a
226 function's type is implemented by adding an extra dictionary argument.
228 In keeping with the memoising tradition we saw with @Ids@, type
229 synonyms (@UniSyns@) keep both the unexpanded and expanded forms handy.
232 \subsubsection{@IdInfo@: extra pragmatic info about an @Id@}
235 [New in 0.16.] All the nice-to-have-but-not-essential information
236 about @Ids@ is now hidden in the
237 @IdInfo@\srcloc{basicTypes/IdInfo.lhs} datatype. It looks something
240 \codeallowbreaks{}data IdInfo
241 = NoIdInfo {\dcd\rm-- OK, we know nothing...}
244 ArityInfo {\dcd\rm-- its arity}
245 DemandInfo {\dcd\rm-- whether or not it is definitely demanded}
246 InliningInfo {\dcd\rm-- whether or not we should inline it}
247 SpecialisationInfo {\dcd\rm-- specialisations of this overloaded}
248 {\dcd\rm-- function which exist}
249 StrictnessInfo {\dcd\rm-- strictness properties, notably}
250 {\dcd\rm-- how to conjure up ``worker'' functions}
251 WrapperInfo {\dcd\rm-- ({\em informational only}) if an Id is}
252 {\dcd\rm-- a ``worker,'' this says what Id it's}
253 {\dcd\rm-- a worker for, i.e., ``who is my wrapper''}
254 {\dcd\rm-- (used to match up workers/wrappers)}
255 UnfoldingInfo {\dcd\rm-- its unfolding}
256 UpdateInfo {\dcd\rm-- which args should be updated}
257 SrcLocation {\dcd\rm-- source location of definition}\end{mytightcode}
258 As you can see, we may accumulate a lot of information about an Id!
259 (The types for all the sub-bits are given in the same place...)
261 \subsubsection{Introducing dictionaries for overloading}
263 The major novel feature of the Haskell language is its systematic
264 overloading using {\em type classes}; Wadler and Blott's paper is the
265 standard reference \cite{wadler89a}.
267 To implement type classes, the typechecker not only checks the Haskell
268 source program, it also {\em translates} it---by inserting code to
269 pass around in {\em dictionaries}. These dictionaries
270 are essentially tuples of functions, from which the correct code may
271 be plucked at run-time to give the desired effect. Kevin Hammond
272 wrote and described the first working implementation of type
273 classes \cite{hammond89a}, and the ever-growing static-semantics paper
274 by Peyton Jones and Wadler is replete with the glories of dictionary
275 handling \cite{peyton-jones90a}. (By the way, the typechecker's
276 structure closely follows the static semantics paper; inquirers into
277 the former will become devoted students of the latter.)
279 Much of the abstract-syntax datatypes are given
280 over to output-only translation machinery. Here are a few more
281 fragments of the @Expr@ type, all of which appear only in typechecker
286 | DictLam [DictVar] (Expr var pat)
287 | DictApp (Expr var pat) [DictVar]
288 | Dictionary [DictVar] [Id]
291 You needn't worry about this stuff:
292 After the desugarer gets through with such constructs, there's nothing
293 left but @Ids@, tuples, tupling functions, etc.,---that is, ``plain
294 simple stuff'' that should make the potential optimiser's heart throb.
295 Optimisation passes don't deal with dictionaries explicitly but, in
296 some cases, quite a bit of the code passed through to them will be for