From: simonpj
Date: Tue, 11 Feb 2003 17:19:36 +0000 (+0000)
Subject: [project @ 2003-02-11 17:19:35 by simonpj]
X-Git-Tag: Approx_11550_changesets_converted~1180
X-Git-Url: http://git.megacz.com/?a=commitdiff_plain;h=77b2c81daf347e4574353dec381804739ba19fb8;p=ghc-hetmet.git

[project @ 2003-02-11 17:19:35 by simonpj]
Lots of new stuff about data types
---

diff --git a/ghc/docs/comm/genesis/modules.html b/ghc/docs/comm/genesis/modules.html
index 2706038..8d63c53 100644
--- a/ghc/docs/comm/genesis/modules.html
+++ b/ghc/docs/comm/genesis/modules.html
@@ -65,17 +65,13 @@ identifiers, expressions, rules, and their operations.

  • Type (loop DataCon.DataCon, loop Subst.substTy)

  • - FieldLabel( Type)
    + FieldLabel(Type)
    TysPrim(Type)
    - PprEnv (loop DataCon.DataCon, Type) -

  • - Unify
    - PprType (PprEnv)

  • Literal (TysPrim, PprType)
    - DataCon (loop PprType) + DataCon (loop PprType, loop Subst.substTyWith, FieldLabel.FieldLabel)

  • - TysWiredIn (DataCon.mkDataCon, loop MkId.mkDataConId, loop Generics.mkGenInfo) + TysWiredIn (loop MkId.mkDataConWorkId, loop Generics.mkGenInfo, DataCon.mkDataCon)

  • TcType( lots of TysWiredIn stuff)

  • diff --git a/ghc/docs/comm/index.html b/ghc/docs/comm/index.html index 1b5b216..32b07f2 100644 --- a/ghc/docs/comm/index.html +++ b/ghc/docs/comm/index.html @@ -60,7 +60,7 @@
  • The Basics
  • Modules, ModuleNames and Packages -
  • The truth about names: Names and OccNamesd +
  • The truth about names: Names and OccNames
  • The Real Story about Variables, Ids, TyVars, and the like
  • Data types and constructors diff --git a/ghc/docs/comm/the-beast/data-types.html b/ghc/docs/comm/the-beast/data-types.html index 1d73f6e..384655c 100644 --- a/ghc/docs/comm/the-beast/data-types.html +++ b/ghc/docs/comm/the-beast/data-types.html @@ -9,6 +9,7 @@

    The GHC Commentary - Data types and data constructors

    +This chapter was thoroughly changed Feb 2003.

    Data types

    @@ -16,44 +17,111 @@ Consider the following data type declaration:
       data T a = MkT !(a,a) !(T a) | Nil
    +
    +  f x = case x of
    +          MkT p q -> MkT p (q+1)
    +          Nil     -> Nil
     
    The user's source program mentions only the constructors MkT and Nil. However, these constructors actually do something in addition to building a data value. For a start, MkT evaluates its arguments. Secondly, with the flag -funbox-strict-fields GHC -will flatten (or unbox) the strict fields. So GHC generates a top-level function -for each data constructor, as follows: +will flatten (or unbox) the strict fields. So we may imagine that there's the +source constructor MkT and the representation constructor +MkT, and things start to get pretty confusing. +

    +GHC now generates three unique Names for each data constructor: +

    +                                   ---- OccName ------
    +                                   String  Name space  Used for
    +  ---------------------------------------------------------------------------
    +  The "source data con"            MkT     DataName    The DataCon itself
    +  The "worker data con"            MkT     VarName     Its worker Id
    +    aka "representation data con"
    +  The "wrapper data con"           $WMkT   VarName     Its wrapper Id (optional)
    +
    +Recall that each occurrence name (OccName) is a pair of a string and a +name space (see The truth about names), and +two OccNames are considered the same only if both components match. +That is what distinguishes the name of the DataCon from +the name of its worker Id. To keep things unambiguous, in what +follows we'll write "MkT{d}" for the source data con, and "MkT{v}" for +the worker Id. (Indeed, when you dump stuff with "-ddumpXXX", if you +also add "-dppr-debug" you'll get stuff like "Foo {- d rMv -}". The +"d" part is the name space; the "rMv" is the unique key.) +

    +Each of these three names gets a distinct unique key in GHC's name cache. +
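
The name-space rule described above can be sketched in a few lines of ordinary Haskell. This is only an illustrative model: the types NameSpace and OccName below are simplified stand-ins written for this example, not GHC's actual definitions, and unique keys are left out entirely.

```haskell
-- Illustrative model only; these are NOT GHC's real definitions.
data NameSpace = DataName | VarName
  deriving (Eq, Show)

-- An occurrence name is a (name space, string) pair; the derived Eq
-- instance compares both components, as the text requires.
data OccName = OccName NameSpace String
  deriving (Eq, Show)

-- The three occurrence names generated for the constructor MkT.
sourceDataCon, workerId, wrapperId :: OccName
sourceDataCon = OccName DataName "MkT"    -- written MkT{d} in the text
workerId      = OccName VarName  "MkT"    -- written MkT{v} in the text
wrapperId     = OccName VarName  "$WMkT"

main :: IO ()
main = do
  -- Same string, different name space: the names are distinct.
  print (sourceDataCon == workerId)
  -- Same name space, different string: also distinct.
  print (workerId == wrapperId)
```

Because equality looks at the name space as well as the string, MkT{d} and MkT{v} never collide even though both print as "MkT".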

    The life cycle of a data type

    + +Suppose the Haskell source looks like this:
    -  MkT :: (a,a) -> T a -> T a
    -  MkT p t = case p of 
    -              (a,b) -> seq t ($wMkT a b t)
    +  data T a = MkT !(a,a) !Int | Nil
     
    -  Nil :: T a
    -  Nil = $wNil
    +  f x = case x of
    +          Nil     -> Nil
    +          MkT p q -> MkT p (q+1)
     
    +When the parser reads it in, it decides which name space each lexeme comes +from, thus: +
    +  data T a = MkT{d} !(a,a) !Int | Nil{d}
     
    -Here, the wrapper MkT evaluates and takes the argument p,
    +  f x = case x of
    +          Nil{d}     -> Nil{d}
    +          MkT{d} p q -> MkT{d} p (q+1)
    +
    +Notice that in the Haskell source all data constructors are named via the "source data con" MkT{d}, +whether in pattern matching or in expressions. +

    +In the translated source produced by the type checker (-ddump-tc), the program looks like this: +

    +  f x = case x of
    +          Nil{d}     -> Nil{v}
    +          MkT{d} p q -> $WMkT p (q+1)
    +	  
    +
    +Notice that the type checker replaces the occurrence of MkT by the wrapper, but +the occurrence of Nil by the worker. Reason: Nil doesn't have a wrapper because there is +nothing to do in the wrapper (this is by far the most common case). +

    +Though they are not printed out by "-ddump-tc", behind the scenes, there are +also the following: the data type declaration and the wrapper function for MkT. +

    +  data T a = MkT{d} a a Int# | Nil{d}
    + 
    +  $WMkT :: (a,a) -> T a -> T a
    +  $WMkT p t = case p of 
    +                (a,b) -> seq t (MkT{v} a b t)
    +
    +Here, the wrapper $WMkT evaluates and takes apart the argument p, evaluates the argument t, and builds a three-field data value -with the worker constructor $wMKT. (There are more notes below -about the unboxing of strict fields.) +with the worker constructor MkT{v}. (There are more notes below +about the unboxing of strict fields.) The wrapper $WMkT is called an implicit binding, +because it's introduced implicitly by the data type declaration (record selectors +are also implicit bindings, for example). Implicit bindings are injected into the code +just before emitting code or External Core.

    -So the original constructors, MkT and Nil are really just -wrappers which perhaps do some work before calling the workers -$wMkT and $wNil. The workers are -the "representation constructors" of -the "representation data type", which we can think of as being defined thus: - +After desugaring into Core (-ddump-ds), the definition of f looks like this: +

    +  f x = case x of
    +          Nil{d}       -> Nil{v}
    +          MkT{d} a b r -> let { p = (a,b); q = I#r } in 
    +                          $WMkT p (q+1)
    +
    +Notice the way that pattern matching has been desugared to take account of the fact +that the "real" data constructor MkT has three fields. +

    +By the time the simplifier has had a go at it, f will be transformed to:

    -  data T a = $wMkT a a Int | $wNil
    +  f x = case x of
    +          Nil{d}       -> Nil{v}
    +          MkT{d} a b r -> MkT{v} a b (r +# 1#)
     
    +Which is highly cool. -This representation data type, gives the number and types of -fields of the constructors used to represent values of type T. -This representation type is also what is emitted when you print External Core -from GHC. -
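
The wrapper/worker split can be imitated by hand in source Haskell. The following is a minimal sketch, assuming the declaration data T a = MkT !(a,a) !Int | Nil from the life-cycle section; the names TRep, MkTRep, NilRep, wrapMkT and third are made up for this example, and the code is written by hand rather than being anything GHC emits.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (+#))

-- Hand-written stand-in for the representation data type:
-- the pair is flattened into two fields and the Int is unboxed.
data TRep a = MkTRep a a Int# | NilRep

-- Stand-in for the wrapper: evaluate and take apart the strict
-- arguments, then call the three-field worker constructor.
wrapMkT :: (a, a) -> Int -> TRep a
wrapMkT p q =
  case p of
    (a, b) ->
      case q of
        I# r -> MkTRep a b r

-- The function f as it looks after simplification: no reboxing,
-- and the addition happens directly on the unboxed field.
f :: TRep a -> TRep a
f x = case x of
  NilRep       -> NilRep
  MkTRep a b r -> MkTRep a b (r +# 1#)

-- Helper for observing the unboxed field from ordinary code.
third :: TRep a -> Maybe Int
third (MkTRep _ _ r) = Just (I# r)
third NilRep         = Nothing

main :: IO ()
main = print (third (f (wrapMkT ('x', 'y') 41)))  -- prints Just 42
```

The point of the sketch is only that, once the wrapper has been inlined, nothing in f needs to rebuild the pair or the boxed Int.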

    The constructor wrapper functions

    +

    The constructor wrapper functions

    The wrapper functions are automatically generated by GHC, and are really emitted into the result code (albeit only after CorePrep; see @@ -65,12 +133,70 @@ if your Haskell source has
         map MkT xs
     
    -then MkT will not be inlined (because it is not applied to anything). +then $WMkT will not be inlined (because it is not applied to anything). That is why we generate real top-level bindings for the wrapper functions, and generate code for them. -

    Unboxing strict fields

    +

    The constructor worker functions

    + +Saturated applications of the constructor worker function MkT{v} are +treated specially by the code generator; they really do allocation. +However, we do want a single, shared, top-level definition for +top-level nullary constructors (like True and False). Furthermore, +what if the code generator encounters a non-saturated application of a +worker? E.g. (map Just xs). We could declare that to be an +error (CorePrep should saturate them). But instead we currently +generate a top-level definition for each constructor worker, whether +nullary or not. It takes the form: +
    +  MkT{v} = \ p q r -> MkT{v} p q r
    +
    +This is a real hack. The occurrence on the RHS is saturated, so the code generator (both the +one that generates abstract C and the byte-code generator) treats it as a special case and +allocates a MkT; it does not make a recursive call! So now there's a top-level curried +version of the worker which is available to anyone who wants it. +

    +This strange definition is not emitted into External Core. Indeed, you might argue that +we should instead pass the list of TyCons to the code generator and have it +generate magic bindings directly. As it stands, it's a real hack: see the code in +CorePrep.mkImplicitBinds. + + +
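
The difference between saturated and unsaturated constructor applications can be seen at source level. The sketch below is ordinary Haskell, not what the code generator produces; the type T3 and the function mkTCurried are hypothetical names chosen for this example, with mkTCurried playing the role of the curried top-level worker binding.

```haskell
-- A three-field constructor, standing in for the worker MkT{v}.
data T3 = MkT3 Int Int Int
  deriving (Eq, Show)

-- Unsaturated use: the constructor Just is passed as a function value.
wrapAll :: [a] -> [Maybe a]
wrapAll xs = map Just xs

-- Eta-expanded version: here every application of Just is saturated.
wrapAllSaturated :: [a] -> [Maybe a]
wrapAllSaturated xs = map (\x -> Just x) xs

-- Source-level analogue of the binding  MkT{v} = \ p q r -> MkT{v} p q r:
-- a curried function whose body is a saturated constructor application.
mkTCurried :: Int -> Int -> Int -> T3
mkTCurried = \p q r -> MkT3 p q r

main :: IO ()
main = do
  print (wrapAll [1, 2, 3 :: Int] == wrapAllSaturated [1, 2, 3])
  -- Partial application works because mkTCurried is an ordinary function.
  print (map (mkTCurried 1 2) [3, 4])
```

Both versions of wrapAll compute the same list; the curried binding is what makes the unsaturated form usable without requiring every call site to be eta-expanded first.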

    External Core

    + +When emitting External Core, we should see this for our running example: + +
    +  data T a = MkT a a Int# | Nil{d}
    + 
    +  $WMkT :: (a,a) -> T a -> T a
    +  $WMkT p t = case p of 
    +                (a,b) -> seq t (MkT a b t)
    +
    +  f x = case x of
    +          Nil       -> Nil
    +          MkT a b r -> MkT a b (r +# 1#)
    +
    +Notice that it makes perfect sense as a program all by itself. Constructors +look like constructors (albeit not identical to the original Haskell ones). +

    +When reading in External Core, the parser is careful to read it back in just +as it was before it was spat out, namely: +

    +  data T a = MkT{d} a a Int# | Nil{d}
    + 
    +  $WMkT :: (a,a) -> T a -> T a
    +  $WMkT p t = case p of 
    +                (a,b) -> seq t (MkT{v} a b t)
    +
    +  f x = case x of
    +          Nil{d}       -> Nil{v}
    +          MkT{d} a b r -> MkT{v} a b (r +# 1#)
    +
    + + +

    Unboxing strict fields

    If GHC unboxes strict fields (as in the first argument of MkT above), it also transforms @@ -82,13 +208,8 @@ source-language case expressions. Suppose you write this in your Haskell source GHC will desugar this to the following Core code:
        case e of
    -     $wMkT a b t -> let p = (a,b) in ..p..t..
    +     MkT a b t -> let p = (a,b) in ..p..t..
     
    -(Important note: perhaps misleadingly, when printing Core we -actually print the constructor in the case expression as -"MkT" not as "$wMkT", but it really means the -latter.) -

    The local let-binding reboxes the pair because it may be mentioned in the case alternative. This may well be a bad idea, which is why -funbox-strict-fields is an experimental feature. diff --git a/ghc/docs/comm/the-beast/prelude.html b/ghc/docs/comm/the-beast/prelude.html index 87c16fe..f3aa206 100644 --- a/ghc/docs/comm/the-beast/prelude.html +++ b/ghc/docs/comm/the-beast/prelude.html @@ -8,13 +8,76 @@

    The GHC Commentary - Primitives and the Prelude

    + One of the trickiest aspects of GHC is the delicate interplay + between what knowledge is baked into the compiler, and what + knowledge it gets by reading the interface files of library + modules. In general, the less that is baked in, the better. +

    Most of what the compiler has to have wired in about primitives and prelude definitions is in fptools/ghc/compiler/prelude/.

    -

    Primitives

    +GHC recognises these main classes of baked-in-ness: +
    +
    Primitive types. +
    Primitive types cannot be defined in Haskell, and are utterly baked into the compiler. +They are notionally defined in the fictional module GHC.Prim. The TyCons for these types are all defined +in module TysPrim; for example, +
    +  intPrimTyCon :: TyCon 
    +  intPrimTyCon = ....
    +
    +Examples: +Int#, Float#, Addr#, State#. +

    +

    Wired-in types. +
    Wired-in types can be defined in Haskell, and indeed are (many are defined in GHC.Base). +However, it's very convenient for GHC to be able to use the type constructor for (say) Int +without looking it up in any environment. So module TysWiredIn contains many definitions +like this one: +
    +  intTyCon :: TyCon
    +  intTyCon = ....
    +
    +  intDataCon :: DataCon 
    +  intDataCon = ....
    +
    +However, since a TyCon value contains the entire type definition inside it, it follows +that the complete definition of Int is thereby baked into the compiler. +

    +Nevertheless, the library module GHC.Base still contains a definition for Int +just so that its info table etc get generated somewhere. Chaos will result if the wired-in definition +in TysWiredIn differs from that in GHC.Base. +

    +The rule is that only very simple types should be wired in (for example, Ratio is not, +and IO is certainly not). No class is wired in: classes are just too complicated. +

    +Examples: Int, Float, List, tuples. + +

    +

    Known-key things. +
    GHC knows of the existence of many, many other types, classes and values. But all it knows is +their Name. Remember, a Name includes a unique key that identifies the +thing, plus its defining module and occurrence name +(see The truth about Names). Knowing a Name, therefore, GHC can +run off to the interface file for the module and find out everything else it might need. +

    +Most of these known-key names are defined in module PrelNames; a further swathe concerning +Template Haskell is defined in DsMeta. The allocation of unique keys is done manually; +chaotic things happen if you make a mistake here, which is why they are all together. +

    + +All the Names from all the above categories are used to initialise the global name cache, +which maps (module,occurrence-name) pairs to the globally-unique Name for that +thing. (See HscMain.initOrigNames.) + +
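
The role of the global name cache can be pictured with a small model. This is a sketch under loud assumptions: Name, NameCache, initCache and lookupName are simplified stand-ins invented here, the occurrence name is reduced to a bare string, and the unique keys are arbitrary numbers, not the ones GHC allocates.

```haskell
-- Simplified stand-ins; NOT GHC's real Name or name-cache types.
type ModuleName = String

data Name = Name
  { nameUnique :: Int         -- arbitrary keys, made up for this example
  , nameModule :: ModuleName
  , nameOcc    :: String      -- real OccNames also carry a name space
  } deriving (Eq, Show)

-- The cache maps (module, occurrence-name) pairs to the unique Name.
type NameCache = [((ModuleName, String), Name)]

-- Seed the cache from a list of names known to the compiler.
initCache :: [Name] -> NameCache
initCache names = [ ((nameModule n, nameOcc n), n) | n <- names ]

lookupName :: NameCache -> ModuleName -> String -> Maybe Name
lookupName cache m occ = lookup (m, occ) cache

-- One example name from each of two categories in the text.
knownNames :: [Name]
knownNames =
  [ Name 1 "GHC.Prim" "Int#"   -- primitive type
  , Name 2 "GHC.Base" "Int"    -- wired-in type
  ]

main :: IO ()
main = do
  let cache = initCache knownNames
  print (fmap nameUnique (lookupName cache "GHC.Base" "Int"))
  print (fmap nameUnique (lookupName cache "GHC.Base" "NoSuchThing"))
```

Seeding the cache up front means that every later mention of (say) GHC.Base's Int resolves to the same Name, which is why a clash in the manually allocated keys is so damaging.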

    +The next sections elaborate these three classes a bit. + + +

    Primitives (module TysPrim)

    Some types and functions have to be hardwired into the compiler as they are atomic; all other code is essentially built around this primitive @@ -51,7 +114,7 @@ TyCon converts PrimRep values into the corresponding type constructor. -

    The Prelude

    +

    Wired in types (module TysWiredIn)

    In addition to entities that are primitive, as the compiler has to treat them specially in the backend, there is a set of types, functions, @@ -84,6 +147,9 @@ as mkListTy and mkTupleTy, which construct compound types.

    + +

    Known-key names (module PrelNames)

    + All names of types, functions, etc. known to the compiler are defined in PrelNames. @@ -123,6 +189,8 @@ floatPrimTyConKey = mkPreludeTyConUnique 11 the parser, such as [], and code generated from deriving clauses), which will take care of adding uniqueness information.

    + +

    Gathering it all together (module PrelInfo)

    The module PrelInfo in some sense ties all the above together and provides a reasonably