ghc/docs/add_to_compiler/overview.verb

   1 %************************************************************************
   2 %*                                                                      *
   3 \section{Overview of the Glasgow Haskell compiler}
   4 %*                                                                      *
   5 %************************************************************************
   6
   7 Figure~\ref{fig:overview} shows a schematic overview of the Glasgow
   8 Haskell compiler (GHC), including all the major datatypes and most
   9 existing passes.
  10 \begin{figure}
  11 \centering
  12 \input{overview-fig}
  13 %\psfig{figure=closure.ps}
  14 \caption{Compiler overview}
  15 \label{fig:overview}
  16 \end{figure}
  17 The compiler is itself written in Haskell.  As of now, the compiler is
  18 made up of about 200?~modules, with roughly 40,000?~lines of
  19 Haskell code, excluding comments and blank lines.
  20
  21 The compiler divides unsurprisingly into a {\em front end} and a {\em
  22 back end}, corresponding to the left and right columns of
  23 Figure~\ref{fig:overview}, respectively.
  24
  25 The front end, discussed further in Section~\ref{sec:front-end}, is
  26 the part that may report errors back to the user.  The two main pieces
  27 are a {\em renamer},\srcloc{renamer/} which handles naming issues,
  28 including support of the Haskell module system, and the {\em
  29 typechecker}.\srcloc{typecheck/}
  30
  31 The front end operates on a collection of data types that we call
  32 ``abstract syntax.''\srcloc{abstractSyn/}  These types
  33 match the Haskell language, construct for construct.  For example,
  34 if you write @... [ x | x <- [1..n] ] ...@, the typechecker
  35 will actually see something like:
  36 \begin{verbatim}
  37 ListComp
  38   (Var x)
  39   (GeneratorQual (VarPatIn x)
  40                  (ArithSeq (FromTo (Lit (IntLit 1)) (Var n))))
  41 \end{verbatim}
  42 So, the renamer and typechecker work on unrestructured Haskell source
  43 rather than its desugared equivalent.  The compiler should be {\em
  44 quicker} to find errors (because the source is much smaller and time
  45 hasn't been taken desugaring), and it should report errors more
  46 lucidly, in terms of the original program text.
  47
  48 A conventional desugaring pass\srcloc{deSugar/} (basically Wadler's
  49 Chapter~5 of Peyton Jones's 1987 implementation book
  50 \cite{peyton-jones87b}) converts the typechecker's abstract-syntax output
  51 (with types attached) into the ``CoreSyntax''\srcloc{coreSyn/} data
  52 type.  This data type is little more than the second-order polymorphic
  53 lambda calculus and is intended to be the {\em lingua franca} of the
  54 compiler's back end, including almost all of the optimisation passes.
  55 Core syntax is explained at length in Section~\ref{sec:core-syntax}.
  56
  57 The back end of the compiler, discussed further in
  58 Section~\ref{sec:back-end}, takes a successfully-typechecked module
  59 and produces executable code for it.  The back end consists of zero or
  60 more Core-to-Core transformation passes, followed by conversion to STG
  61 syntax\srcloc{stgSyn/} (a very low-level functional language, named
  62 after the intended Spineless Tagless G-machine\footnote{Oops!  Make
  63 that ``shared term graph'' language!  (Who's fooling who here,
  64 Simon?)} target architecture), then some STG-to-STG transformations,
  65 and finally out of the functional world\srcloc{codeGen/} into
  66 ``Abstract~C,''\srcloc{absCSyn/} a datatype intended as an adequate
  67 launching pad into both portable C and into get-your-hands-{\em
  68 really}-dirty native-code generation for a particular instruction-set
  69 architecture.  We can generate C, or native-code for SPARCs and DEC
  70 Alphas.