%************************************************************************ %* * \section{Core syntax, and transformations on it} \label{sec:core-syntax} %* * %************************************************************************ The @CoreSyntax@ datatype is intended to be the {\em lingua franca} of the back end of the compiler; a summary is shown in Figure~\ref{fig:core-syntax}. \input{core-summary-fig} As you can see, the Core syntax is a simple functional language. \subsection{Second-order polymorphic lambda calculus} \label{sec:second-order} Core syntax is essentially the second-order polymorphic lambda calculus. This is reflected in the fact that Core expressions can be {\em type applications} or {\em type abstractions} (the types in question are represented as @UniTypes@, of course).\footnote{An interesting point: there are Core-syntax programs that cannot be written in Haskell! Core syntax is the {\em more expressive} language. One could imagine writing a front end (parser, etc.) for a richer programming language and still being able to use this compiler's back-end machinery without change.} Having explicit type application and abstraction (NB: added by the typechecker during translation) gives us a uniform, well-understood, non-{\em ad hoc} way to express the types of Core expressions. Given that variables (i.e., @Ids@) and other basic entities have their types memoised in them, it is then easy to work out the type of {\em any Core expression}. For example, in the expression\ToDo{example here} \begin{verbatim} ... ... \end{verbatim} @a@ is a type variable, @()@ is a type application, and, assuming the type of @??@ is $some\ math\ mode\ here...$, then the type of the expression is @...@. The truly great thing about using the second-order polymorphic lambda calculus is that it is {\em easy to preserve types in the face of transformation passes}, however drastic their mangling of the original program. \ToDo{example here} \subsection{Parameterised and annotated Core syntax} \label{sec:parameterised-core} As we saw with the ``abstract syntax'' (in Section~\ref{sec:AbsSyntax}), the Core syntax is also {\em parameterised}, this time with respect to binders and bound-variables (or ``bindees''). The definition of a Core expression begins\srcloc{coreSyn/CoreSyn.lhs}: \begin{mytightcode} data CoreExpr binder bindee = CoVar bindee | CoLit CoreLiteral ... type PlainCoreBinder = Id type PlainCoreBindee = Id type PlainCoreExpr = CoreExpr PlainCoreBinder PlainCoreBindee\end{mytightcode} Most back-end passes use the parameterisation shown above, namely @PlainCoreExprs@\srcloc{coreSyn/PlainCore.lhs}, parameterised on @Id@ for both binders and bindees. An example of a pass that uses a different parameterisation is occurrence analysis\srcloc{simplCore/OccurAnal.lhs}, which gathers up info about the {\em occurrences} of bound variables. It uses: \begin{mytightcode} data BinderInfo {\dcd\rm-- various things to record about binders...} type TaggedBinder tag = (Id, tag) type TaggedCoreExpr tag = CoreExpr (TaggedBinder tag) Id substAnalyseExpr :: PlainCoreExpr -> TaggedCoreExpr BinderInfo\end{mytightcode} The pass's expression-mangling function then has the unsurprising type shown above. Core syntax has a ``twin'' datatype that is also sometimes useful: {\em annotated} Core syntax\srcloc{coreSyn/AnnCoreSyn.lhs}. This is a datatype identical in form to Core syntax, but such that every ``node'' of a Core expression can be annotated with some information of your choice. As an example, the type of a pass that attaches a @Set@ of free variables to every subexpression in a Core expression might be\srcloc{coreSyn/FreeVars.lhs}: \begin{mytightcode} freeVars :: PlainCoreExpr -> AnnCoreExpr Id Id (Set Id) {\dcd\rm-- parameterised on binders, bindees, and annotation}\end{mytightcode} \subsection{Unboxing and other Core syntax details} \label{sec:unboxing} One facet of the Core syntax summary in Figure~\ref{fig:core-syntax} that may have caught your eye is the separation of case-alternatives into those for boxed entities (ordinary data constructors) and unboxed entities (real machine-level things). The ``literals'' and ``primitives'' mentioned there are also machine-level constructs. It is for this reason that all applications of constructors and primitives are {\em saturated}; what use, for example, is a machine-level addition if you do not have two arguments to feed to it? (Most machines do not offer curried arithmetic in their hardware.) The handling of unboxed values in the back end of the compiler follows the design described in the Peyton Jones/Launchbury paper on the subject \cite{peyton-jones91b}. You, the enthusiastic optimiser, only need to be aware that this is the ``level of discourse.'' You will also probably want to be sure that your optimisations can take full advantage of the explicitness of the unboxery. \subsection{``Core Lint''---your dear friend} \label{sec:core-lint} ToDo ToDo % \subsection{STG syntax} % \label{sec:stg-syntax} % % As mentioned earlier, the compiler converts Core syntax into ``STG % syntax'' (named for the Spineless Tagless G-machine) before finally % making its move into the dark world we call ``Abstract~C''. % % Figure~\ref{fig:stg-syntax} shows the STG syntax, % \input{stg-summary-fig} % mainly so that you can compare it with Core syntax. (It is at least % conceivable that you might to perform your optimisation pass at this % level.) % % STG syntax is a truly austere functional language. In places where % Core syntax allows "exprs", STG syntax insists on "vars"---everything % has been flattened out. Type information (abstractions and % applications) have been thrown overboard. Other than that, STG syntax % is the ``same'' as Core syntax, with some extra non-essential % annotations on bindings: update flags and free-variable information. % % You will want to consult the revised Spineless Tagless G-machine paper % \cite{peyton-jones92a} if you wish to spend any time in the STG world.