The GHC Commentary - Checking Types

Probably the most important phase in the frontend is the type checker, which is located at fptools/ghc/compiler/typecheck/. GHC type checks programs in their original Haskell form before the desugarer converts them into Core code. This complicates the type checker as it has to handle the much more verbose Haskell AST, but it improves error messages, as the those message are based on the same structure that the user sees.

GHC defines the abstract syntax of Haskell programs in HsSyn using a structure that abstracts over the concrete representation of bound occurences of identifiers and patterns. The module TcHsSyn instantiates this structure for the type checker using TcEnv.TcId to represent identifiers - in fact, a TcId is currently nothing but just a synonym for a plain Id.

Types Variables and Zonking

During type checking type variables are represented by mutable variables - cf. the variable story. Consequently, unification can instantiate type variables by updating those mutable variables. This process of instantiation is (for reasons that elude me) called zonking in GHC's sources. The zonking routines for the various forms of Haskell constructs are responsible for most of the code in the module TcHsSyn, whereas the routines that actually operate on mutable types are defined in TcMTypes; this includes routines to create mutable structures and update them as well as routines that check constraints, such as that type variables in function signatures have not been instantiated during type checking. The actual type unification routine is uTys in the same module.

All type variables that may be instantiated (those in signatures may not), but haven't been instantiated during type checking, are zonked to (), so that after type checking all mutable variables have been eliminated.

Type Checking Environment

During type checking, GHC maintains a type environment whose details are fixed in TcEnv.lhs. Among other things, the environment contains all imported and local instances as well as a list of global entities (imported and local types and classes together with imported identifiers) and local entities (locally defined identifiers). This environment is threaded through the type checking monad.

Expressions

Expressions are type checked by TcExpr.lhs.

Usage occurences of identifiers are processed by the function tcId whose main purpose is to instantiate overloaded identifiers. It essentially calls TcInst.instOverloadedFun once for each universally quantified set of type constraints. It should be noted that overloaded identifiers are replaced by new names that are first defined in the LIE (Local Instance Environment?) and later promoted into top-level bindings.

Handling of Dictionaries and Method Instances

GHC implements overloading using so-called dictionaries. A dictionary is a tuple of functions -- one function for each method in the class of which the dictionary implements an instance. During type checking, GHC replaces each type constraint of a function with one additional argument. At runtime, the extended function gets passed a matching class dictionary by way of these additional arguments. Whenever the function needs to call a method of such a class, it simply extracts it from the dictionary.

This sounds simple enough; however, the actual implementation is a bit more tricky as it wants to keep track of all the instances at which overloaded functions are used in a module. This information is useful to optimise the code. The implementation is the module Inst.lhs.

The function instOverloadedFun is invoked for each overloaded usage occurence of an identifier, where overloaded means that the type of the idendifier contains a non-trivial type constraint. It proceeds in two steps: (1) Allocation of a method instance (newMethodWithGivenTy) and (2) instantiation of functional dependencies. The former implies allocating a new unique identifier, which replaces the original (overloaded) identifier at the currently type-checked usage occurrence.

The new identifier (after being threaded through the LIE) eventually will be bound by a top-level binding whose rhs contains a partial application of the original overloaded identifier. This papp applies the overloaded function to the dictionaries needed for the current instance. In GHC lingo, this is called a method. Before becoming a top-level binding, the method is first represented as a value of type Inst.Inst, which makes it easy to fold multiple instances of the same identifier at the same types into one global definition. (And probably other things, too, which I haven't investigated yet.)

Note: As of 13 January 2001 (wrt. to the code in the CVS HEAD), the above mechanism interferes badly with RULES pragmas defined over overloaded functions. During instantiation, a new name is created for an overloaded function partially applied to the dictionaries needed in a usage position of that function. As the rewrite rule, however, mentions the original overloaded name, it won't fire anymore -- unless later phases remove the intermediate definition again. The latest CVS version of GHC has an option -fno-method-sharing, which avoids sharing instantiation stubs. This is usually/often/sometimes sufficient to make the rules fire again.

Last modified: Tue Nov 13 14:28:44 EST 2001