The GHC Commentary - Sugar Free: From Haskell To Core

Up until after type checking, GHC keeps the source program in an abstract representation of Haskell source without removing any of the syntactic sugar (such as, list comprehensions) that could easily be represented by more primitive Haskell. This complicates part of the front-end considerably as the abstract syntax of Haskell (as exported by the module HsSyn) is much more complex than a simplified representation close to, say, the Haskell Kernel would be. However, having a representation that is as close as possible to the surface syntax simplifies the generation of clear error messages. As GHC (quite in contrast to "conventional" compilers) prints code fragments as part of error messages, the choice of representation is especially important.

Nonetheless, as soon as the input has passed all static checks, it is transformed into GHC's principal intermediate language that goes by the name of Core and whose representation is exported by the module CoreSyn. All following compiler phases, except code generation operate on Core. Due to Andrew Tolmach's effort, there is also an external representation for Core.

The conversion of the compiled module from HsSyn into that of CoreSyn is performed by a phase called the desugarer, which is located in fptools/ghc/compiler/deSugar/. It's operation is detailed in the following.

Auxilliary Functions

The modules DsMonad defines the desugarer monad (of type DsM) which maintains the environment needed for desugaring. In particular, it encapsulates a unique supply for generating new variables, a map to lookup standard names (such as functions from the prelude), a source location for error messages, and a pool to collect warning messages generated during desugaring. Initialisation of the environment happens in the function Desugar.desugar, which is also the main entry point into the desugarer.

The generation of Core code often involves the use of standard functions for which proper identifiers (i.e., values of type Id that actually refer to the definition in the right Prelude) need to be obtained. This is supported by the function DsMonad.dsLookupGlobalValue :: Name -> DsM Id.

Pattern Matching

Nested pattern matching with guards and everything is translated into the simple, flat case expressions of Core by the following modules:

Match:: This modules contains the main pattern-matching compiler in the form of a function called match. There is some documentation as to how match works contained in the module itself. Generally, the implemented algorithm is similar to the one described in Phil Wadler's Chapter ? of Simon Peyton Jones' The Implementation of Functional Programming Languages. Match exports a couple of functions with not really intuitive names. In particular, it exports match, matchWrapper, matchExport, and matchSimply. The function match, which is the main work horse, is only used by the other matching modules. The function matchExport - despite it's name - is merely used internally in Match and handles warning messages (see below for more details). The actual interface to the outside is matchWrapper, which converts the output of the type checker into the form needed by the pattern matching compiler (i.e., a list of EquationInfo). Similar in function to matchWrapper is matchSimply, which provides an interface for the case where a single expression is to be matched against a single pattern (as, for example, is the case in bindings in a do expression).
MatchCon:: This module generates code for a set of alternative constructor patterns that belong to a single type by means of the routine matchConFamily. More precisely, the routine gets a set of equations where the left-most pattern of each equation is a constructor pattern with a head symbol from the same type as that of all the other equations. A Core case expression is generated that distinguihes between all these constructors. The routine is clever enough to generate a sparse case expression and to add a catch-all default case only when needed (i.e., if the case expression isn't exhaustive already). There is also an explanation at the start of the modules.
MatchLit:: Generates code for a set of alternative literal patterns by means of the routine matchLiterals. The principle is similar to that of matchConFamily, but all left-most patterns are literals of the same type.
DsUtils:: This module provides a set of auxilliary definitions as well as the data types EquationInfo and MatchResult that form the input and output, respectively, of the pattern matching compiler.
Check:: This module does not really contribute the compiling pattern matching, but it inspects sets of equations to find whether there are any overlapping patterns or non-exhaustive pattern sets. This task is implemented by the function check, which returns a list of patterns that are part of a non-exhaustive case distinction as well as a set of equation labels that can be reached during execution of the code; thus, the remaining equations are shadowed due to overlapping patterns. The function check is invoked and its result converted into suitable warning messages by the function Match.matchExport (which is a wrapper for Match.match).

The central function match, given a set of equations, proceeds in a number of steps:

It starts by desugaring the left-most pattern of each equation using the function tidy1 (indirectly via tidyEqnInfo). During this process, non-elementary pattern (e.g., those using explicit list syntax [x, y, ..., z]) are converted to a standard constructor pattern and also irrefutable pattern are removed.
Then, a process called unmixing clusters the equations into blocks (without re-ordering them), such that the left-most pattern of all equations in a block are either all variables, all literals, or all constructors.
Each block is, then, compiled by matchUnmixedEqns, which forwards the handling of literal pattern blocks to MatchLit.matchLiterals, of constructor pattern blocks to MatchCon.matchConFamily, and hands variable pattern blocks back to match.

Last modified: Mon Feb 11 22:35:25 EST 2002