docs/comm/the-beast/vars.html

   1 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
   2 <html>
   3   <head>
   4     <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
   5     <title>The GHC Commentary - The Real Story about Variables, Ids, TyVars, and the like</title>
   6   </head>
   7
   8   <body BGCOLOR="FFFFFF">
   9     <h1>The GHC Commentary - The Real Story about Variables, Ids, TyVars, and the like</h1>
  10     <p>
  11
  12
  13 <h2>Variables</h2>
  14
  15 The <code>Var</code> type, defined in <code>basicTypes/Var.lhs</code>,
  16 represents variables, both term variables and type variables:
  17 <pre>
  18     data Var
  19       = Var {
  20             varName    :: Name,
  21             realUnique :: FastInt,
  22             varType    :: Type,
  23             varDetails :: VarDetails,
  24             varInfo    :: IdInfo
  25         }
  26 </pre>
  27 <ul>
  28 <li> The <code>varName</code> field contains the identity of the variable:
  29 its unique number, and its print-name.  See "<a href="names.html">The truth about names</a>".
  30
  31 <p><li> The <code>realUnique</code> field caches the unique number in the
  32 <code>varName</code> field, just to make comparison of <code>Var</code>s a little faster.
  33
  34 <p><li> The <code>varType</code> field gives the type of a term variable, or the kind of a
  35 type variable.  (Types and kinds are both represented by a <code>Type</code>.)
  36
  37 <p><li> The <code>varDetails</code> field distinguishes term variables from type variables,
  38 and makes some further distinctions (see below).
  39
  40 <p><li> For term variables (only) the <code>varInfo</code> field contains lots of useful
  41 information: strictness, unfolding, etc.  However, this information is all optional;
  42 you can always throw away the <code>IdInfo</code>.  In contrast, you can't safely throw away
  43 the <code>VarDetails</code> of a <code>Var</code>.
  44 </ul>
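<p>
To illustrate why caching the unique helps, here is a minimal sketch that compares variables by <code>realUnique</code> alone. The types are simplified stand-ins (<code>Int</code> in place of <code>FastInt</code>, a two-field <code>Name</code>), and <code>mkVar</code> is a hypothetical helper; none of this is GHC's actual code.

```haskell
-- Simplified stand-ins for illustration; the real Var has more fields
-- and stores the unique as a FastInt.
data Name = Name { nameUnique :: Int, nameString :: String }

data Var = Var
  { varName    :: Name
  , realUnique :: Int   -- cached copy of (nameUnique . varName)
  }

-- Hypothetical smart constructor: keeps the cache in sync with the Name.
mkVar :: Name -> Var
mkVar n = Var { varName = n, realUnique = nameUnique n }

-- Equality and ordering consult only the cached unique, so a comparison
-- never has to look inside the Name.
instance Eq Var where
  a == b = realUnique a == realUnique b

instance Ord Var where
  compare a b = compare (realUnique a) (realUnique b)

main :: IO ()
main = do
  let x  = mkVar (Name 1 "x")
      x' = mkVar (Name 1 "x")
      y  = mkVar (Name 2 "y")
  print (x == x', x == y, compare x y)   -- prints (True,False,LT)
```

Going through a single constructor such as <code>mkVar</code> is one way to ensure the cached field cannot drift from the unique stored in the <code>Name</code>.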
  45 <p>
  46 It's often fantastically convenient to have term variables and type variables
  47 share a single data type.  For example,
  48 <pre>
  49   exprFreeVars :: CoreExpr -> VarSet
  50 </pre>
  51 If there were two types, we'd need to return two sets.  Similarly, big lambdas and
  52 little lambdas use the same constructor in Core, which is extremely convenient.
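<p>
The following sketch shows the convenience on a toy expression type. It assumes a much smaller language than Core: <code>Expr</code>, <code>Type</code>, <code>VarKind</code> and <code>tyFreeVars</code> are invented for the example. One <code>Lam</code> constructor binds either sort of variable, and one set holds the free variables of both sorts.

```haskell
import qualified Data.Set as Set

-- Toy stand-ins for illustration; not GHC's real definitions.
data VarKind = TermVar | TypeVar deriving (Eq, Ord, Show)

data Var = Var { varKind :: VarKind, varName :: String }
  deriving (Eq, Ord, Show)

type VarSet = Set.Set Var

data Type = TyVarTy Var | FunTy Type Type

data Expr
  = V Var          -- occurrence of a term variable
  | App Expr Expr  -- application (the argument may be a type argument)
  | TyArg Type     -- a type used as an argument
  | Lam Var Expr   -- binds a term variable OR a type variable

tyFreeVars :: Type -> VarSet
tyFreeVars (TyVarTy v) = Set.singleton v
tyFreeVars (FunTy a b) = tyFreeVars a `Set.union` tyFreeVars b

-- One traversal, one result set, for both sorts of variable.
exprFreeVars :: Expr -> VarSet
exprFreeVars (V v)     = Set.singleton v
exprFreeVars (App f a) = exprFreeVars f `Set.union` exprFreeVars a
exprFreeVars (TyArg t) = tyFreeVars t
exprFreeVars (Lam b e) = Set.delete b (exprFreeVars e)

-- /\a. \x. f a x   (f is free; a and x are bound)
sample :: Expr
sample = Lam a (Lam x (App (App (V f) (TyArg (TyVarTy a))) (V x)))
  where
    a = Var TypeVar "a"
    x = Var TermVar "x"
    f = Var TermVar "f"

main :: IO ()
main = print (map varName (Set.toList (exprFreeVars sample)))  -- prints ["f"]
```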
  53 <p>
  54 We define a couple of type synonyms:
  55 <pre>
  56   type Id    = Var  -- Term variables
  57   type TyVar = Var  -- Type variables
  58 </pre>
  59 just to help us document the occasions when we are expecting only term variables,
  60 or only type variables.
  61
  62
  63 <h2> The <code>VarDetails</code> field </h2>
  64
  65 The <code>VarDetails</code> field tells what kind of variable this is:
  66 <pre>
  67 data VarDetails
  68   = LocalId             -- Used for locally-defined Ids (see NOTE below)
  69         LocalIdDetails
  70
  71   | GlobalId            -- Used for imported Ids, dict selectors etc
  72         GlobalIdDetails
  73
  74   | TyVar
  75   | MutTyVar (IORef (Maybe Type))       -- Used during unification;
  76              TyVarDetails
  77 </pre>
  78
  79 <a name="TyVar">
  80 <h2>Type variables (<code>TyVar</code>)</h2>
  81 </a>
  82 <p>
  83 The <code>TyVar</code> case is self-explanatory.  The <code>MutTyVar</code>
  84 case is used only during type checking.  Then a type variable can be unified,
  85 using an imperative update, with a type, and that is what the
  86 <code>IORef</code> is for.  The <code>TcType.TyVarDetails</code> field records
  87 the sort of type variable we are dealing with.  It is defined as
  88 <pre>
  89 data TyVarDetails = SigTv | ClsTv | InstTv | VanillaTv
  90 </pre>
  91 <code>SigTv</code> marks type variables that were introduced when
  92 instantiating a type signature prior to matching it against the inferred type
  93 of a definition.  The variants <code>ClsTv</code> and <code>InstTv</code> mark
  94 scoped type variables introduced by class and instance heads, respectively.
  95 These first three sorts of type variables are skolem variables (tested by the
  96 predicate <code>isSkolemTyVar</code>); i.e., they must <em>not</em> be
  97 instantiated. All other type variables are marked as <code>VanillaTv</code>.
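<p>
As a rough sketch of how these pieces fit together, the code below models a mutable type variable with an <code>IORef</code> and refuses to instantiate skolems. The <code>Type</code> and <code>TyVar</code> types are toy versions, and <code>isSkolemDetails</code>, <code>newTyVar</code> and <code>bindTyVar</code> are hypothetical helpers written for this example; the real type checker is considerably more involved.

```haskell
import Data.IORef

-- Toy types for illustration only.
data Type = TyConTy String | TyVarTy TyVar

data TyVarDetails = SigTv | ClsTv | InstTv | VanillaTv
  deriving (Eq, Show)

-- A mutable type variable: a name, a cell for its binding, and its sort.
data TyVar = MutTyVar String (IORef (Maybe Type)) TyVarDetails

-- The first three sorts are skolems; only VanillaTv may be instantiated.
isSkolemDetails :: TyVarDetails -> Bool
isSkolemDetails VanillaTv = False
isSkolemDetails _         = True

isSkolemTyVar :: TyVar -> Bool
isSkolemTyVar (MutTyVar _ _ details) = isSkolemDetails details

newTyVar :: String -> TyVarDetails -> IO TyVar
newTyVar name details = do
  ref <- newIORef Nothing
  return (MutTyVar name ref details)

-- Bind a type variable to a type by imperative update.  Returns False,
-- and writes nothing, if the variable is a skolem or is already bound.
bindTyVar :: TyVar -> Type -> IO Bool
bindTyVar tv@(MutTyVar _ ref _) ty
  | isSkolemTyVar tv = return False
  | otherwise = do
      current <- readIORef ref
      case current of
        Just _  -> return False
        Nothing -> writeIORef ref (Just ty) >> return True

main :: IO ()
main = do
  a   <- newTyVar "a" VanillaTv
  s   <- newTyVar "s" SigTv
  ok1 <- bindTyVar a (TyConTy "Int")
  ok2 <- bindTyVar s (TyConTy "Int")
  print (ok1, ok2)   -- prints (True,False)
```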
  98 <p>
  99 For a long time I tried to keep mutable Vars statically type-distinct
 100 from immutable Vars, but I've finally given up.   It's just too painful.
 101 After type checking there are no MutTyVars left, but there's no static check
 102 of that fact.
 103
 104 <h2>Term variables (<code>Id</code>)</h2>
 105
 106 A term variable (of type <code>Id</code>) is represented either by a
 107 <code>LocalId</code> or a <code>GlobalId</code>:
 108 <p>
 109 A <code>GlobalId</code> is:
 110 <ul>
 111 <li> Always bound at top-level.
 112 <li> Always has an <code>ExternalName</code>, and hence has
 113      a <code>Unique</code> that is globally unique across the whole
 114      GHC invocation (a single invocation may compile multiple modules).
 115 <li> Has <code>IdInfo</code> that is absolutely fixed, forever.
 116 </ul>
 117
 118 <p>
 119 A <code>LocalId</code> is:
 120 <ul>
 121 <li> Always bound in the module being compiled:
 122 <ul>
 123 <li> <em>either</em> bound within an expression (lambda, case, local let(rec))
 124 <li> <em>or</em> defined at top level in the module being compiled.
 125 </ul>
 126 <li> Has <code>IdInfo</code> that changes as the simplifier bashes repeatedly on it.
 127 </ul>
 128 <p>
 129 The key thing about <code>LocalId</code>s is that the free-variable finder
 130 typically treats them as candidate free variables. That is, it ignores
 131 <code>GlobalId</code>s such as imported constants, data constructors, etc.
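<p>
A minimal sketch of such a free-variable finder, assuming a toy expression type and a two-way <code>Details</code> tag in place of the real <code>VarDetails</code> (the name <code>freeLocalIds</code> is made up for this example):

```haskell
import qualified Data.Set as Set

-- Toy stand-ins for illustration; not GHC's real definitions.
data Details = LocalId | GlobalId deriving (Eq, Ord, Show)

data Id = Id { idName :: String, idDetails :: Details }
  deriving (Eq, Ord, Show)

data Expr
  = Var Id
  | App Expr Expr
  | Lam Id Expr
  | Let Id Expr Expr   -- non-recursive let: binder, right-hand side, body

-- Only LocalIds are candidate free variables; GlobalIds are skipped.
freeLocalIds :: Expr -> Set.Set Id
freeLocalIds (Var v)
  | idDetails v == LocalId = Set.singleton v
  | otherwise              = Set.empty
freeLocalIds (App f a)     = freeLocalIds f `Set.union` freeLocalIds a
freeLocalIds (Lam b e)     = Set.delete b (freeLocalIds e)
freeLocalIds (Let b r e)   = freeLocalIds r `Set.union` Set.delete b (freeLocalIds e)

-- \x -> plus x y   where plus is imported (global) and y is a free local
sample :: Expr
sample = Lam x (App (App (Var plus) (Var x)) (Var y))
  where
    x    = Id "x" LocalId
    y    = Id "y" LocalId
    plus = Id "plus" GlobalId

main :: IO ()
main = print (map idName (Set.toList (freeLocalIds sample)))  -- prints ["y"]
```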
 132 <p>
 133 An important invariant is this: <em>All the bindings in the module
 134 being compiled (whether top level or not) are <code>LocalId</code>s
 135 until the CoreTidy phase.</em> In the CoreTidy phase, all
 136 externally-visible top-level bindings are made into GlobalIds.  This
 137 is the point when a <code>LocalId</code> becomes "frozen" and becomes
 138 a fixed, immutable <code>GlobalId</code>.
 139 <p>
 140 (A binding is <em>"externally-visible"</em> if it is exported, or
 141 mentioned in the unfolding of an externally-visible Id.  An
 142 externally-visible Id may not have an unfolding, either because it is
 143 too big, or because it is the loop-breaker of a recursive group.)
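<p>
The definition of "externally-visible" is a reachability computation, which can be sketched as follows. The representation is an assumption made for the example: each top-level binding records whether it is exported and which names its unfolding mentions (an empty list when there is no unfolding). The names <code>Binding</code>, <code>TopBinds</code> and <code>externallyVisible</code> are hypothetical.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Toy description of a top-level binding, for illustration only.
data Binding = Binding
  { isExported        :: Bool
  , unfoldingMentions :: [String]  -- names mentioned in the unfolding, if any
  }

type TopBinds = Map.Map String Binding

-- Start from the exported bindings and follow unfoldings transitively.
-- Names not bound in this module (imports) are ignored.
externallyVisible :: TopBinds -> Set.Set String
externallyVisible binds = go Set.empty roots
  where
    roots = [n | (n, b) <- Map.toList binds, isExported b]

    go seen []       = seen
    go seen (n : ns)
      | n `Set.member` seen = go seen ns
      | otherwise =
          case Map.lookup n binds of
            Nothing -> go seen ns
            Just b  -> go (Set.insert n seen) (unfoldingMentions b ++ ns)

sampleBinds :: TopBinds
sampleBinds = Map.fromList
  [ ("f",   Binding True  ["g"])  -- exported; unfolding mentions g
  , ("g",   Binding False ["h"])  -- visible via f's unfolding
  , ("h",   Binding False [])     -- visible via g's unfolding
  , ("big", Binding True  [])     -- exported, but no unfolding (too big)
  , ("k",   Binding False [])     -- used only by big's right-hand side
  ]

main :: IO ()
main = print (Set.toList (externallyVisible sampleBinds))
-- prints ["big","f","g","h"]
```

In this sketch <code>k</code> stays local because <code>big</code> exposes no unfolding through which another module could refer to it.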
 144
 145 <h3>Global Ids and implicit Ids</h3>
 146
 147 <code>GlobalId</code>s are further categorised by their <code>GlobalIdDetails</code>.
 148 This type is defined in <code>basicTypes/IdInfo</code>, because it mentions other
 149 structured types like <code>DataCon</code>.  Unfortunately it is <em>used</em> in <code>Var.lhs</code>
 150 so there's a <code>hi-boot</code> knot to get it there.  Anyway, here's the declaration:
 151 <pre>
 152 data GlobalIdDetails
 153   = NotGlobalId                 -- Used as a convenient extra return value
 154                                 -- from globalIdDetails
 155
 156   | VanillaGlobal               -- Imported from elsewhere
 157
 158   | PrimOpId PrimOp             -- The Id for a primitive operator
 159   | FCallId ForeignCall         -- The Id for a foreign call
 160
 161   -- These next ones are all "implicit Ids"
 162   | RecordSelId FieldLabel      -- The Id for a record selector
 163   | DataConId DataCon           -- The Id for a data constructor *worker*
 164   | DataConWrapId DataCon       -- The Id for a data constructor *wrapper*
 165                                 -- [the only reasons we need to know are so that
 166                                 --  a) we can suppress printing a definition in the interface file
 167                                 --  b) when typechecking a pattern we can get from the
 168                                 --     Id back to the data con]
 169 </pre>
 170 The <code>GlobalIdDetails</code> allows us to go from the <code>Id</code> for
 171 a record selector, say, to its field name; or the <code>Id</code> for a primitive
 172 operator to the <code>PrimOp</code> itself.
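<p>
For instance, lookups of this kind could be written as below. This is a sketch over a cut-down copy of the declaration: <code>FCallId</code> is omitted, <code>PrimOp</code>, <code>FieldLabel</code> and <code>DataCon</code> are replaced by string wrappers, and the function names (<code>primOpOf</code>, <code>fieldLabelOf</code>, <code>dataConOf</code>, <code>isImplicit</code>) are hypothetical.

```haskell
-- String wrappers standing in for the real structured types.
newtype PrimOp     = PrimOp String     deriving (Eq, Show)
newtype FieldLabel = FieldLabel String deriving (Eq, Show)
newtype DataCon    = DataCon String    deriving (Eq, Show)

-- Cut-down copy of GlobalIdDetails (FCallId omitted for brevity).
data GlobalIdDetails
  = NotGlobalId
  | VanillaGlobal
  | PrimOpId PrimOp
  | RecordSelId FieldLabel
  | DataConId DataCon
  | DataConWrapId DataCon
  deriving (Eq, Show)

-- From the details of a primitive operator's Id back to the PrimOp.
primOpOf :: GlobalIdDetails -> Maybe PrimOp
primOpOf (PrimOpId op) = Just op
primOpOf _             = Nothing

-- From the details of a record selector's Id back to its field label.
fieldLabelOf :: GlobalIdDetails -> Maybe FieldLabel
fieldLabelOf (RecordSelId lbl) = Just lbl
fieldLabelOf _                 = Nothing

-- From a constructor worker or wrapper back to the data constructor.
dataConOf :: GlobalIdDetails -> Maybe DataCon
dataConOf (DataConId dc)     = Just dc
dataConOf (DataConWrapId dc) = Just dc
dataConOf _                  = Nothing

-- The "implicit Ids" are the record selectors and constructor Ids.
isImplicit :: GlobalIdDetails -> Bool
isImplicit (RecordSelId _)   = True
isImplicit (DataConId _)     = True
isImplicit (DataConWrapId _) = True
isImplicit _                 = False

main :: IO ()
main = do
  print (primOpOf (PrimOpId (PrimOp "IntAddOp")))
  print (fieldLabelOf VanillaGlobal)
  print (map isImplicit [VanillaGlobal, RecordSelId (FieldLabel "x")])
```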
 173 <p>
 174 Certain <code>GlobalId</code>s are called <em>"implicit"</em> Ids.  An implicit
 175 Id is derived by implication from some other declaration.  So a record selector is
 176 derived from its data type declaration, for example.  An implicit Id is always
 177 a <code>GlobalId</code>.  For most of the compilation, the implicit Ids are just
 178 that: implicit.  If you use <code>-ddump-simpl</code> you won't see their definitions.  (That's
 179 why it's true to say that until CoreTidy all Ids in this compilation unit are
 180 LocalIds.)  But at CorePrep, a binding is added for each implicit Id defined in
 181 this module, so that the code generator will generate code for the (curried) function.
 182 <p>
 183 Implicit Ids carry their unfolding inside them, of course, so they may well have
 184 been inlined much earlier; but we generate the curried top-level definition just in
 185 case it's ever needed.
 186
 187 <h3>LocalIds</h3>
 188
 189 The <code>LocalIdDetails</code> gives more info about a <code>LocalId</code>:
 190 <pre>
 191 data LocalIdDetails
 192   = NotExported -- Not exported
 193   | Exported    -- Exported
 194   | SpecPragma  -- Not exported, but not to be discarded either
 195                 -- It's unclean that this is so deeply built in
 196 </pre>
 197 From this we can tell whether the <code>LocalId</code> is exported, and that
 198 tells us whether we can drop an unused binding as dead code.
 199 <p>
 200 The <code>SpecPragma</code> thing is a HACK.  Suppose you write a SPECIALIZE pragma:
 201 <pre>
 202    foo :: Num a => a -> a
 203    {-# SPECIALIZE foo :: Int -> Int #-}
 204    foo = ...
 205 </pre>
 206 The type checker generates a dummy call to <code>foo</code> at the right types:
 207 <pre>
 208    $dummy = foo Int dNumInt
 209 </pre>
 210 The Id <code>$dummy</code> is marked <code>SpecPragma</code>.  Its role is to hang
 211 onto that call to <code>foo</code> so that the specialiser can see it, but there
 212 are no calls to <code>$dummy</code>.
 213 The simplifier is careful not to discard <code>SpecPragma</code> Ids, so that they
 214 reach the specialiser.  The specialiser processes the right hand side of a <code>SpecPragma</code> Id
 215 to find calls to overloaded functions, <em>and then discards the <code>SpecPragma</code> Id</em>.
 216 So <code>SpecPragma</code> behaves like <code>Exported</code>, at least until the specialiser.
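<p>
The dead-code rule described in this section might be sketched like this. The <code>Binding</code> record, its occurrence count, and the functions <code>mustKeep</code> and <code>dropDeadBindings</code> are assumptions made for the example; the simplifier's real logic is not shown here.

```haskell
data LocalIdDetails
  = NotExported
  | Exported
  | SpecPragma
  deriving (Eq, Show)

-- Must a binding be retained even when nothing refers to it?
-- SpecPragma is treated like Exported, as it is before the specialiser runs.
mustKeep :: LocalIdDetails -> Bool
mustKeep NotExported = False
mustKeep Exported    = True
mustKeep SpecPragma  = True

-- Toy binding: a name, its details, and how many times it is referenced.
data Binding = Binding
  { bndrName    :: String
  , bndrDetails :: LocalIdDetails
  , occurrences :: Int
  }

-- Drop bindings that are unused and not protected by their details.
dropDeadBindings :: [Binding] -> [Binding]
dropDeadBindings = filter keep
  where
    keep b = mustKeep (bndrDetails b) || occurrences b > 0

sampleBindings :: [Binding]
sampleBindings =
  [ Binding "foo"    Exported    0  -- unused here, but exported
  , Binding "helper" NotExported 2  -- local and used
  , Binding "unused" NotExported 0  -- local and dead
  , Binding "$dummy" SpecPragma  0  -- never called, but must survive
  ]

main :: IO ()
main = print (map bndrName (dropDeadBindings sampleBindings))
-- prints ["foo","helper","$dummy"]
```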
 217
 218
 219 <h3> ExternalNames and InternalNames </h3>
 220
 221 Notice that whether an Id is a <code>LocalId</code> or <code>GlobalId</code> is
 222 not the same as whether the Id has an <code>ExternalName</code> or an <code>InternalName</code>
 223 (see "<a href="names.html#sort">The truth about Names</a>"):
 224 <ul>
 225 <li> Every <code>GlobalId</code> has an <code>ExternalName</code>.
 226 <li> A <code>LocalId</code> might have either kind of <code>Name</code>.
 227 </ul>
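<p>
The two rules above amount to a small consistency check. The sketch below uses two-constructor tags in place of the real <code>Name</code> and <code>VarDetails</code> types, and <code>nameInvariantHolds</code> is a hypothetical function.

```haskell
-- Toy tags for illustration only.
data NameSort = ExternalName | InternalName deriving (Eq, Show)
data IdSort   = LocalId | GlobalId          deriving (Eq, Show)

-- A GlobalId must have an ExternalName; a LocalId may have either sort.
nameInvariantHolds :: IdSort -> NameSort -> Bool
nameInvariantHolds GlobalId InternalName = False
nameInvariantHolds _        _            = True

main :: IO ()
main = print
  [ (i, n, nameInvariantHolds i n)
  | i <- [LocalId, GlobalId]
  , n <- [ExternalName, InternalName]
  ]
```

Only the combination of <code>GlobalId</code> with <code>InternalName</code> is rejected.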
 228     <small>
 229 <!-- hhmts start -->
 230 Last modified: Fri Sep 12 15:17:18 BST 2003
 231 <!-- hhmts end -->
 232     </small>
 233   </body>
 234 </html>
 235