ghc/docs/add_to_compiler/core-syntax.verb

   1 %************************************************************************
   2 %*                                                                      *
   3 \section{Core syntax, and transformations on it}
   4 \label{sec:core-syntax}
   5 %*                                                                      *
   6 %************************************************************************
   7
   8 The @CoreSyntax@ datatype is intended to be the {\em lingua franca} of
   9 the back end of the compiler; a summary is shown in
  10 Figure~\ref{fig:core-syntax}.
  11 \input{core-summary-fig}
  12 As you can see, the Core syntax is a simple
  13 functional language.
  14
  15 \subsection{Second-order polymorphic lambda calculus}
  16 \label{sec:second-order}
  17
  18 Core syntax is essentially the second-order polymorphic lambda
  19 calculus.  This is reflected in the fact that Core expressions can be
  20 {\em type applications} or {\em type abstractions} (the types in
  21 question are represented as @UniTypes@, of course).\footnote{An
  22 interesting point: there are Core-syntax programs that cannot be
  23 written in Haskell!  Core syntax
  24 is the {\em more expressive} language.  One could imagine writing a
  25 front end (parser, etc.) for a richer programming language and still
  26 being able to use this compiler's back-end machinery without change.}
  27
  28 Having explicit type application and abstraction (NB: added by
  29 the typechecker during translation) gives us a uniform,
  30 well-understood, non-{\em ad hoc} way to express the types of
  31 Core expressions.  Given that variables (i.e., @Ids@) and other
  32 basic entities have their types memoised in them, it is then easy to
  33 work out the type of {\em any Core expression}.  For example, in the
  34 expression\ToDo{example here}
  35 \begin{verbatim}
  36 ... <example to be supplied> ...
  37 \end{verbatim}
  38 @a@ is a type variable, @(<???>)@ is a type application, and, assuming
  39 the type of @??@ is $some\ math\ mode\ here...$, then the type of the
  40 expression is @...@.
  41
  42 The truly great thing about using the second-order polymorphic lambda
  43 calculus is that it is {\em easy to preserve types
  44 in the face of transformation passes}, however drastic their mangling
  45 of the original program.
  46
  47 \ToDo{example here}
  48
  49 \subsection{Parameterised and annotated Core syntax}
  50 \label{sec:parameterised-core}
  51
  52 As we saw with the ``abstract syntax'' (in
  53 Section~\ref{sec:AbsSyntax}), the Core syntax is also {\em
  54 parameterised}, this time with respect to binders and bound-variables
  55 (or ``bindees'').  The definition of a Core expression
  56 begins:\srcloc{coreSyn/CoreSyn.lhs}
  57 \begin{tightcode}
  58 data CoreExpr binder bindee
  59      = CoVar        bindee
  60      | CoLit        CoreLiteral
  61      ...
  62 type PlainCoreBinder = Id
  63 type PlainCoreBindee = Id
  64 type PlainCoreExpr = CoreExpr PlainCoreBinder PlainCoreBindee
  65 \end{tightcode}
  66 Most back-end passes use the parameterisation shown above, namely
  67 @PlainCoreExprs@,\srcloc{coreSyn/PlainCore.lhs} parameterised on @Id@
  68 for both binders and bindees.
  69
  70 An example of a pass that uses a different parameterisation is
  71 occurrence analysis,\srcloc{simplCore/OccurAnal.lhs} which gathers
  72 up info about the {\em occurrences} of bound variables.  It uses:
  73 \begin{tightcode}
  74 data BinderInfo     {\dcd\rm-- various things to record about binders...}
  75 type TaggedBinder   tag = (Id, tag)
  76 type TaggedCoreExpr tag = CoreExpr (TaggedBinder tag) Id
  77
  78 substAnalyseExpr :: PlainCoreExpr -> TaggedCoreExpr BinderInfo
  79 \end{tightcode}
  80 The pass's expression-mangling function then has the unsurprising type
  81 shown above.
  82
  83 Core syntax has a ``twin'' datatype that is also sometimes useful:
  84 {\em annotated} Core syntax.\srcloc{coreSyn/AnnCoreSyn.lhs} This is a
  85 datatype identical in form to Core syntax, but such that every
  86 ``node'' of a Core expression can be annotated with some information
  87 of your choice.  As an example, the type of a pass that attaches a
  88 @Set@ of free variables to every subexpression in a Core expression
  89 might be:\srcloc{coreSyn/FreeVars.lhs}
  90 \begin{tightcode}
  91 freeVars :: PlainCoreExpr -> AnnCoreExpr Id Id (Set Id)
  92         {\dcd\rm-- parameterised on binders, bindees, and annotation}
  93 \end{tightcode}
  94
  95 \subsection{Unboxing and other Core syntax details}
  96 \label{sec:unboxing}
  97
  98 One facet of the Core syntax summary in Figure~\ref{fig:core-syntax}
  99 that may have caught your eye is the separation of case-alternatives
 100 into those for boxed entities (ordinary data constructors) and unboxed
 101 entities (real machine-level things).  The ``literals'' and
 102 ``primitives'' mentioned there are also machine-level constructs.  It
 103 is for this reason that all applications of constructors and
 104 primitives are {\em saturated}; what use, for example, is
 105 a machine-level addition if you do not
 106 have two arguments to feed to it?  (Most machines do not offer curried
 107 arithmetic in their hardware.)
 108
 109 The handling of unboxed values in the back end of the compiler follows
 110 the design described in the Peyton Jones/Launchbury paper on the
 111 subject \cite{peyton-jones91b}.  You, the enthusiastic optimiser, only
 112 need to be aware that this is the ``level of discourse.''  You will
 113 also probably want to be sure that your optimisations can take full
 114 advantage of the explicitness of the unboxery.
 115
 116 \subsection{``Core Lint''---your dear friend}
 117 \label{sec:core-lint}
 118
 119 ToDo ToDo
 120
 121 % \subsection{STG syntax}
 122 % \label{sec:stg-syntax}
 123 %
 124 % As mentioned earlier, the compiler converts Core syntax into ``STG
 125 % syntax'' (named for the Spineless Tagless G-machine) before finally
 126 % making its move into the dark world we call ``Abstract~C''.
 127 %
 128 % Figure~\ref{fig:stg-syntax} shows the STG syntax,
 129 % \input{stg-summary-fig}
 130 % mainly so that you can compare it with Core syntax.  (It is at least
 131 % conceivable that you might to perform your optimisation pass at this
 132 % level.)
 133 %
 134 % STG syntax is a truly austere functional language.  In places where
 135 % Core syntax allows "exprs", STG syntax insists on "vars"---everything
 136 % has been flattened out.  Type information (abstractions and
 137 % applications) have been thrown overboard.  Other than that, STG syntax
 138 % is the ``same'' as Core syntax, with some extra non-essential
 139 % annotations on bindings: update flags and free-variable information.
 140 %
 141 % You will want to consult the revised Spineless Tagless G-machine paper
 142 % \cite{peyton-jones92a} if you wish to spend any time in the STG world.