The GHC Commentary - Data types and data constructors

Data types

Consider the following data type declaration:

  data T a = MkT !(a,a) !(T a) | Nil

The user's source program mentions only the constructors MkT and Nil. However, these constructors actually do something in addition to building a data value. First, because its fields are declared strict, MkT evaluates its arguments. Second, with the flag -funbox-strict-fields, GHC will flatten (or unbox) the strict fields. So GHC generates a top-level function for each data constructor, as follows:

  MkT :: (a,a) -> T a -> T a
  MkT p t = case p of 
              (a,b) -> seq t ($wMkT a b t)

  Nil :: T a
  Nil = $wNil

Here, the wrapper MkT evaluates and takes apart the argument p, evaluates the argument t, and builds a three-field data value with the worker constructor $wMkT. (There are more notes below about the unboxing of strict fields.)

So the original constructors, MkT and Nil, are really just wrappers which perhaps do some work before calling the workers $wMkT and $wNil. The workers are the "representation constructors" of the "representation data type", which we can think of as being defined thus:

  data T a = $wMkT a a (T a) | $wNil

This representation data type gives the number and types of the fields of the constructors used to represent values of type T. This representation type is also what is emitted when you print External Core from GHC.
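
The wrapper/worker split can be imitated in ordinary Haskell. The following is only a sketch: the names TRep, WMkT, WNil, mkT, nil and size are hypothetical stand-ins (identifiers such as $wMkT cannot be written in source code), and the third field is given the type of the argument t that the wrapper passes on.

```haskell
-- Hypothetical model of the representation data type. WMkT and WNil stand
-- in for GHC's internal workers $wMkT and $wNil.
data TRep a = WMkT a a (TRep a) | WNil

-- Model of the wrapper for MkT: take the pair apart, force t to weak head
-- normal form, then build the three-field value with the worker.
mkT :: (a, a) -> TRep a -> TRep a
mkT p t = case p of
  (a, b) -> t `seq` WMkT a b t

-- Model of the wrapper for Nil: there is nothing to evaluate or unbox.
nil :: TRep a
nil = WNil

-- Count the WMkT cells in a value of the model type.
size :: TRep a -> Int
size WNil         = 0
size (WMkT _ _ t) = 1 + size t
```

In this model, mkT (1, 2) (mkT (3, 4) nil) builds two WMkT cells, each holding the two components of its pair directly rather than a pointer to a pair.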

The constructor wrapper functions

The wrapper functions are automatically generated by GHC, and are really emitted into the generated code (albeit only after CorePrep; see CorePrep.mkImplicitBinds). The wrapper functions are inlined very vigorously, so you will not see many occurrences of the wrapper functions in an optimised program, but you may see some. For example, if your Haskell source has

    map MkT xs

then MkT will not be inlined (because it is not applied to anything). That is why we generate real top-level bindings for the wrapper functions, and generate code for them.
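
As an illustration of such an unapplied occurrence, here is a small self-contained sketch. The helper names partialCells, cells and depth are made up for this example; only the type T comes from the text above.

```haskell
data T a = MkT !(a, a) !(T a) | Nil

-- MkT occurs here without arguments, so it is passed to map as an
-- ordinary function value of type (a, a) -> T a -> T a.
partialCells :: [(a, a)] -> [T a -> T a]
partialCells xs = map MkT xs

-- Supply the missing second argument to each partial application.
cells :: [(a, a)] -> [T a]
cells xs = map ($ Nil) (partialCells xs)

-- Number of MkT cells in a value.
depth :: T a -> Int
depth Nil       = 0
depth (MkT _ t) = 1 + depth t
```

In partialCells the constructor is a first-class value, so there must be some real function for map to call.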

Unboxing strict fields

If GHC unboxes strict fields (as in the first argument of MkT above), it also transforms source-language case expressions. Suppose you write this in your Haskell source:

   case e of 
     MkT p t -> ..p..t..

GHC will desugar this to the following Core code:

   case e of
     $wMkT a b t -> let p = (a,b) in ..p..t..

(Important note: perhaps misleadingly, when printing Core we actually print the constructor in the case expression as "MkT" not as "$wMkT", but it really means the latter.)

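The local let-binding reboxes the pair because p may be mentioned in the case alternative. If p is used, this allocates a fresh pair that the unboxed representation had avoided storing, so unboxing may well be a bad idea, which is why -funbox-strict-fields is an experimental feature.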
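
The reboxing step can be modelled at source level. This is a sketch under the same assumptions as before: TRep, WMkT and WNil are hypothetical stand-ins for the representation type and its workers, and caseT and firsts are invented helper names.

```haskell
-- Hypothetical stand-in for the representation type; WMkT plays the role
-- of $wMkT.
data TRep a = WMkT a a (TRep a) | WNil

-- Model of the desugared case expression: match the three unboxed fields,
-- then rebuild the pair so that the alternative can mention p.
caseT :: r -> ((a, a) -> TRep a -> r) -> TRep a -> r
caseT nilCase _       WNil         = nilCase
caseT _       mktCase (WMkT a b t) =
  let p = (a, b)   -- reboxing: a new pair is allocated if p is demanded
  in mktCase p t

-- An alternative that uses p, so each cell visited needs its pair rebuilt.
firsts :: TRep a -> [a]
firsts = caseT [] (\p t -> fst p : firsts t)
```

Here firsts only needs the first component of each pair, yet in this model every step goes through the rebuilt p; whether the real compiler removes such a pair depends on later optimisation.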

It's essential that when importing a type T defined in some external module M, GHC knows what representation was used for that type, and that in turn depends on whether module M was compiled with -funbox-strict-fields. When writing an interface file, GHC therefore records with each data type whether its strict fields (if any) should be unboxed.
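
For concreteness, one way a defining module might request the flag is with an OPTIONS_GHC pragma rather than on the command line. The module below is only an illustrative sketch of such a module M; the export list is an assumption.

```haskell
{-# OPTIONS_GHC -funbox-strict-fields #-}
-- The pragma applies to this module only. Importing modules do not need
-- the flag: they learn the chosen representation from M's interface file.
module M (T(..)) where

data T a = MkT !(a, a) !(T a) | Nil
```

The interface file produced for such a module can be inspected with ghc --show-iface.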