%************************************************************************
%*									*
\section{Overview of the Glasgow Haskell compiler}
%*									*
%************************************************************************

Figure~\ref{fig:overview} shows a schematic overview of the Glasgow
Haskell compiler (GHC), including all the major datatypes and most
existing passes.
\begin{figure}
\centering
\input{overview-fig}
%\psfig{figure=closure.ps}
\caption{Compiler overview}
\label{fig:overview}
\end{figure}
The compiler is itself written in Haskell.  As of now, the compiler is
made up of about 200?~modules, with roughly 40,000?~lines of
Haskell code, excluding comments and blank lines.

The compiler divides unsurprisingly into a {\em front end} and a {\em
back end}, corresponding to the left and right columns of
Figure~\ref{fig:overview}, respectively.

The front end, discussed further in Section~\ref{sec:front-end}, is
the part that may report errors back to the user.  The two main pieces
are a {\em renamer}\srcloc{renamer/}, which handles naming issues,
including support of the Haskell module system, and the {\em
typechecker}\srcloc{typecheck/}.

The front end operates on a collection of data types that we call
``abstract syntax\srcloc{abstractSyn/}.''  These types
match the Haskell language, construct for construct.  For example,
if you write @... [ x | x <- [1..n] ] ...@, the typechecker
will actually see something like:
\begin{verbatim}
ListComp
  (Var x)
  (GeneratorQual (VarPatIn x)
		 (ArithSeq (FromTo (Lit (IntLit 1)) (Var n))))
\end{verbatim}
So, the renamer and typechecker work on unrestructured Haskell source
rather than its desugared equivalent.  The compiler should be {\em
quicker} to find errors (because the source is much smaller and time
hasn't been taken desugaring), and it should report errors more
lucidly, in terms of the original program text.

A conventional desugaring pass\srcloc{deSugar/} (basically Wadler's
Chapter~5 of Peyton Jones's 1987 implementation book
\cite{peyton-jones87b}) converts the typechecker's abstract-syntax output
(with types attached) into the ``CoreSyntax\srcloc{coreSyn/}'' data
type.  This data type is little more than the second-order polymorphic
lambda calculus and is intended to be the {\em lingua franca} of the
compiler's back end, including almost all of the optimisation passes.
Core syntax is explained at length in Section~\ref{sec:core-syntax}.

The back end of the compiler, discussed further in
Section~\ref{sec:back-end}, takes a successfully-typechecked module
and produces executable code for it.  The back end consists of zero or
more Core-to-Core transformation passes, followed by conversion to STG
syntax\srcloc{stgSyn/} (a very low-level functional language, named
after the intended Spineless Tagless G-machine\footnote{Oops!  Make
that ``shared term graph'' language!  (Who's fooling who here,
Simon?)} target architecture), then some STG-to-STG transformations,
and finally out of the functional world\srcloc{codeGen/} into
``Abstract~C\srcloc{absCSyn/},'' a datatype intended as an adequate
launching pad into both portable C and into get-your-hands-{\em
really}-dirty native-code generation for a particular instruction-set
architecture.  We can generate C, or native-code for SPARCs and DEC
Alphas.