X-Git-Url: http://git.megacz.com/?p=ghc-hetmet.git;a=blobdiff_plain;f=docs%2Fcomm%2Fthe-beast%2Fdata-types.html;fp=docs%2Fcomm%2Fthe-beast%2Fdata-types.html;h=fef4852d4d10085533fd77c5f44a4db6c279ff91;hp=0000000000000000000000000000000000000000;hb=0065d5ab628975892cea1ec7303f968c3338cbe1;hpb=28a464a75e14cece5db40f2765a29348273ff2d2 diff --git a/docs/comm/the-beast/data-types.html b/docs/comm/the-beast/data-types.html new file mode 100644 index 0000000..fef4852 --- /dev/null +++ b/docs/comm/the-beast/data-types.html @@ -0,0 +1,242 @@ + + + + + The GHC Commentary - Data types and data constructors + + + +

The GHC Commentary - Data types and data constructors

+ +This chapter was thoroughly changed Feb 2003. + +

Data types

+ +Consider the following data type declaration: + +

+  data T a = MkT !(a,a) !(T a) | Nil
+
+  f x = case x of
+          MkT p q -> MkT p (q+1)
+	  Nil     -> Nil
+

+The user's source program mentions only the constructors MkT +and Nil. However, these constructors actually do something +in addition to building a data value. For a start, MkT evaluates +its arguments. Secondly, with the flag -funbox-strict-fields GHC +will flatten (or unbox) the strict fields. So we may imagine that there's the +source constructor MkT and the representation constructor +MkT, and things start to get pretty confusing. +

+GHC now generates three unique Names for each data constructor: +

+                                 ---- OccName ------
+			         String  Name space	Used for
+  ---------------------------------------------------------------------------
+  The "source data con" 	   MkT	  DataName	The DataCon itself
+  The "worker data con"		   MkT	  VarName	Its worker Id
+    aka "representation data con"
+  The "wrapper data con"	   $WMkT  VarName	Its wrapper Id (optional)
+

+Recall that each occurrence name (OccName) is a pair of a string and a +name space (see The truth about names), and +two OccNames are considered the same only if both components match. +That is what distinguishes the name of the name of the DataCon from +the name of its worker Id. To keep things unambiguous, in what +follows we'll write "MkT{d}" for the source data con, and "MkT{v}" for +the worker Id. (Indeed, when you dump stuff with "-ddumpXXX", if you +also add "-dppr-debug" you'll get stuff like "Foo {- d rMv -}". The +"d" part is the name space; the "rMv" is the unique key.) +

+Each of these three names gets a distinct unique key in GHC's name cache. + +

The life cycle of a data type

+ +Suppose the Haskell source looks like this: +

+  data T a = MkT !(a,a) !Int | Nil
+
+  f x = case x of
+          Nil     -> Nil
+          MkT p q -> MkT p (q+1)
+

+When the parser reads it in, it decides which name space each lexeme comes +from, thus: +

+  data T a = MkT{d} !(a,a) !Int | Nil{d}
+
+  f x = case x of
+          Nil{d}     -> Nil{d}
+          MkT{d} p q -> MkT{d} p (q+1)
+

+Notice that in the Haskell source all data contructors are named via the "source data con" MkT{d}, +whether in pattern matching or in expressions. +

+In the translated source produced by the type checker (-ddump-tc), the program looks like this: +

+  f x = case x of
+          Nil{d}     -> Nil{v}
+          MkT{d} p q -> $WMkT p (q+1)
+	  
+

+Notice that the type checker replaces the occurrence of MkT by the wrapper, but +the occurrence of Nil by the worker. Reason: Nil doesn't have a wrapper because there is +nothing to do in the wrapper (this is the vastly common case). +

+Though they are not printed out by "-ddump-tc", behind the scenes, there are +also the following: the data type declaration and the wrapper function for MkT. +

+  data T a = MkT{d} a a Int# | Nil{d}
+ 
+  $WMkT :: (a,a) -> T a -> T a
+  $WMkT p t = case p of 
+                (a,b) -> seq t (MkT{v} a b t)
+

+Here, the wrapper $WMkT evaluates and takes apart the argument p, +evaluates the argument t, and builds a three-field data value +with the worker constructor MkT{v}. (There are more notes below +about the unboxing of strict fields.) The worker $WMkT is called an implicit binding, +because it's introduced implicitly by the data type declaration (record selectors +are also implicit bindings, for example). Implicit bindings are injected into the code +just before emitting code or External Core. +

+After desugaring into Core (-ddump-ds), the definition of f looks like this: +

+  f x = case x of
+          Nil{d}       -> Nil{v}
+          MkT{d} a b r -> let { p = (a,b); q = I# r } in 
+	                  $WMkT p (q+1)
+

+Notice the way that pattern matching has been desugared to take account of the fact +that the "real" data constructor MkT has three fields. +

+By the time the simplifier has had a go at it, f will be transformed to: +

+  f x = case x of
+          Nil{d}       -> Nil{v}
+          MkT{d} a b r -> MkT{v} a b (r +# 1#)
+

+Which is highly cool. + + +

The constructor wrapper functions

+ +The wrapper functions are automatically generated by GHC, and are +really emitted into the result code (albeit only after CorePre; see +CorePrep.mkImplicitBinds). +The wrapper functions are inlined very +vigorously, so you will not see many occurrences of the wrapper +functions in an optimised program, but you may see some. For example, +if your Haskell source has +

+    map MkT xs
+

+then $WMkT will not be inlined (because it is not applied to anything). +That is why we generate real top-level bindings for the wrapper functions, +and generate code for them. + + +

The constructor worker functions

+ +Saturated applications of the constructor worker function MkT{v} are +treated specially by the code generator; they really do allocation. +However, we do want a single, shared, top-level definition for +top-level nullary constructors (like True and False). Furthermore, +what if the code generator encounters a non-saturated application of a +worker? E.g. (map Just xs). We could declare that to be an +error (CorePrep should saturate them). But instead we currently +generate a top-level defintion for each constructor worker, whether +nullary or not. It takes the form: +

+  MkT{v} = \ p q r -> MkT{v} p q r
+

+This is a real hack. The occurrence on the RHS is saturated, so the code generator (both the +one that generates abstract C and the byte-code generator) treats it as a special case and +allocates a MkT; it does not make a recursive call! So now there's a top-level curried +version of the worker which is available to anyone who wants it. +

+This strange defintion is not emitted into External Core. Indeed, you might argue that +we should instead pass the list of TyCons to the code generator and have it +generate magic bindings directly. As it stands, it's a real hack: see the code in +CorePrep.mkImplicitBinds. + + +

External Core

+ +When emitting External Core, we should see this for our running example: + +

+  data T a = MkT a a Int# | Nil{d}
+ 
+  $WMkT :: (a,a) -> T a -> T a
+  $WMkT p t = case p of 
+                (a,b) -> seq t (MkT a b t)
+
+  f x = case x of
+          Nil       -> Nil
+          MkT a b r -> MkT a b (r +# 1#)
+

+Notice that it makes perfect sense as a program all by itself. Constructors +look like constructors (albeit not identical to the original Haskell ones). +

+When reading in External Core, the parser is careful to read it back in just +as it was before it was spat out, namely: +

+  data T a = MkT{d} a a Int# | Nil{d}
+ 
+  $WMkT :: (a,a) -> T a -> T a
+  $WMkT p t = case p of 
+                (a,b) -> seq t (MkT{v} a b t)
+
+  f x = case x of
+          Nil{d}       -> Nil{v}
+          MkT{d} a b r -> MkT{v} a b (r +# 1#)
+

+ + +

Unboxing strict fields

+ +If GHC unboxes strict fields (as in the first argument of MkT above), +it also transforms +source-language case expressions. Suppose you write this in your Haskell source: +

+   case e of 
+     MkT p t -> ..p..t..
+

+GHC will desugar this to the following Core code: +

+   case e of
+     MkT a b t -> let p = (a,b) in ..p..t..
+

+The local let-binding reboxes the pair because it may be mentioned in +the case alternative. This may well be a bad idea, which is why +-funbox-strict-fields is an experimental feature. +

+It's essential that when importing a type T defined in some +external module M, GHC knows what representation was used for +that type, and that in turn depends on whether module M was +compiled with -funbox-strict-fields. So when writing an +interface file, GHC therefore records with each data type whether its +strict fields (if any) should be unboxed. + +

Labels and info tables

+ +Quick rough notes: SLPJ March 2003. +

+Every data constructor Chas two info tables: +

The static info table (label C_static_info), used for statically-allocated constructors. + +
The dynamic info table (label C_con_info), used for dynamically-allocated constructors. +

+Statically-allocated constructors are not moved by the garbage collector, and therefore have a different closure +type from dynamically-allocated constructors; hence they need +a distinct info table. +Both info tables share the same entry code, but since the entry code is phyiscally juxtaposed with the +info table, it must be duplicated (C_static_entry and C_con_entry respectively). + + + +