ghc/docs/comm/the-beast/vars.html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
    <title>The GHC Commentary - The Real Story about Variables, Ids, TyVars, and the like</title>
  </head>

  <body BGCOLOR="FFFFFF">
    <h1>The GHC Commentary - The Real Story about Variables, Ids, TyVars, and the like</h1>
    <p>


<h2>Variables</h2>

The <code>Var</code> type, defined in <code>basicTypes/Var.lhs</code>,
represents variables, both term variables and type variables:
<pre>
    data Var
      = Var {
            varName    :: Name,
            realUnique :: FastInt,
            varType    :: Type,
            varDetails :: VarDetails,
            varInfo    :: IdInfo
        }
</pre>
<ul>
<li> The <code>varName</code> field contains the identity of the variable:
its unique number, and its print-name.  The unique number is cached in the
<code>realUnique</code> field, just to make comparison of <code>Var</code>s a little faster.

<p><li> The <code>varType</code> field gives the type of a term variable, or the kind of a
type variable.  (Types and kinds are both represented by a <code>Type</code>.)

<p><li> The <code>varDetails</code> field distinguishes term variables from type variables,
and makes some further distinctions (see below).

<p><li> For term variables (only) the <code>varInfo</code> field contains lots of useful
information: strictness, unfolding, etc.  However, this information is all optional;
you can always throw away the <code>IdInfo</code>.  In contrast, you can't safely throw away
the <code>VarDetails</code> of a <code>Var</code>.
</ul>
<p>
It's often fantastically convenient to have term variables and type variables
share a single data type.  For example,
<pre>
  exprFreeVars :: CoreExpr -> VarSet
</pre>
If there were two types, we'd need to return two sets.  Similarly, big lambdas and
little lambdas use the same constructor in Core, which is extremely convenient.
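<p>
The convenience can be seen in a small, self-contained sketch.  Everything in it
(the two-field <code>Var</code>, the four-constructor <code>Expr</code>) is a toy
stand-in invented for illustration, not GHC's real <code>Var</code> or <code>CoreExpr</code>;
only the name <code>exprFreeVars</code> is borrowed from above.  Because binders of both
sorts share one type, one <code>Lam</code> constructor and one result set suffice:
<pre>
import qualified Data.Set as Set

-- Toy stand-ins for illustration only; NOT GHC's real definitions.
data Var = Var { varName :: String, isTypeVar :: Bool }
  deriving (Eq, Ord, Show)

type VarSet = Set.Set Var

data Expr
  = V Var            -- occurrence of a term variable
  | Lam Var Expr     -- big lambda or little lambda: one constructor for both
  | App Expr Expr    -- application to a term
  | TyApp Expr Var   -- application to a type (toy types are just type variables)

-- One traversal and one result set cover both sorts of variable.
exprFreeVars :: Expr -> VarSet
exprFreeVars (V v)       = Set.singleton v
exprFreeVars (Lam b e)   = Set.delete b (exprFreeVars e)
exprFreeVars (App f a)   = Set.union (exprFreeVars f) (exprFreeVars a)
exprFreeVars (TyApp e t) = Set.insert t (exprFreeVars e)

main :: IO ()
main = print (map varName (Set.toList (exprFreeVars e)))
  where
    a = Var "a" True
    f = Var "f" False
    x = Var "x" False
    -- /\a. \x. f a x : both a and x are bound, so only f is free
    e = Lam a (Lam x (App (TyApp (V f) a) (V x)))
</pre>
The type variable <code>a</code> and the term variable <code>x</code> are both removed
from the result by the same <code>Lam</code> equation.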
<p>
We define a couple of type synonyms:
<pre>
  type Id    = Var  -- Term variables
  type TyVar = Var  -- Type variables
</pre>
just to help us document the occasions when we are expecting only term variables,
or only type variables.

<h2> The <code>VarDetails</code> field </h2>

The <code>VarDetails</code> field tells what kind of variable this is:
<pre>
data VarDetails
  = LocalId             -- Used for locally-defined Ids (see NOTE below)
        LocalIdDetails

  | GlobalId            -- Used for imported Ids, dict selectors etc
        GlobalIdDetails

  | TyVar
  | MutTyVar (IORef (Maybe Type))       -- Used during unification;
             Bool                       -- True &lt;=&gt; this is a type signature variable, which
                                        --          should not be unified with a non-tyvar type
</pre>

<a name="TyVar">
<h2>Type variables (<code>TyVar</code>)</h2>
</a>
<p>
The <code>TyVar</code> case is self-explanatory.  The
<code>MutTyVar</code> case is used only during type checking.  Then a
type variable can be unified, using an imperative update, with a type,
and that is what the <code>IORef</code> is for.  The <code>Bool</code>
field records whether the type variable arose from a type signature,
in which case it should not be unified with a non-variable type (only
with another type variable).
<p>
For a long time I tried to keep mutable Vars statically type-distinct
from immutable Vars, but I've finally given up.   It's just too painful.
After type checking there are no MutTyVars left, but there's no static check
of that fact.
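<p>
As a rough illustration of the imperative update, here is a minimal sketch.  The
two-constructor <code>Type</code>, the stand-alone <code>TyVar</code> type and the
function <code>bindTyVar</code> are all hypothetical simplifications, not GHC's
unifier; in particular, an already-bound variable is simply refused here, whereas a
real unifier would go on to unify its contents.
<pre>
import Data.IORef

-- Toy stand-ins for illustration only; NOT GHC's real definitions.
data Type
  = TyVarTy TyVar     -- a type variable used as a type
  | TyConTy String    -- a nullary type constructor such as Int

data TyVar
  = TyVar String                               -- immutable
  | MutTyVar String (IORef (Maybe Type)) Bool  -- True: type signature variable

-- Try to bind a mutable type variable by overwriting its IORef.
-- Returns False when the binding is refused.
bindTyVar :: TyVar -> Type -> IO Bool
bindTyVar (TyVar _) _ = return False           -- immutable: nothing to update
bindTyVar (MutTyVar _ ref isSig) ty =
  readIORef ref >>= \contents ->
    case (contents, ty) of
      (Just _, _)                  -> return False  -- already bound
      (Nothing, TyConTy _) | isSig -> return False  -- signature variable: tyvars only
      _                            -> writeIORef ref (Just ty) >> return True

main :: IO ()
main =
  newIORef Nothing >>= \r1 ->
  newIORef Nothing >>= \r2 ->
  let ordinary = MutTyVar "t" r1 False
      sigVar   = MutTyVar "a" r2 True
  in  bindTyVar ordinary (TyConTy "Int") >>= print >>
      bindTyVar sigVar (TyConTy "Int") >>= print >>
      bindTyVar sigVar (TyVarTy ordinary) >>= print
</pre>
The ordinary variable accepts <code>Int</code>; the signature variable refuses
<code>Int</code> but accepts another type variable.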

<h2>Term variables (<code>Id</code>)</h2>

A term variable (of type <code>Id</code>) is represented either by a
<code>LocalId</code> or a <code>GlobalId</code>:
<p>
A <code>GlobalId</code> is:
<ul>
<li> Always bound at top-level.
<li> Always has a <code>Global</code> <code>Name</code>, and hence has
     a <code>Unique</code> that is globally unique across the whole
     GHC invocation (a single invocation may compile multiple modules).
<li> Has <code>IdInfo</code> that is absolutely fixed, forever.
</ul>

<p>
A <code>LocalId</code> is:
<ul>
<li> Always bound in the module being compiled:
<ul>
<li> <em>either</em> bound within an expression (lambda, case, local let(rec))
<li> <em>or</em> defined at top level in the module being compiled.
</ul>
<li> Has <code>IdInfo</code> that changes as the simplifier bashes repeatedly on it.
</ul>
<p>
The key thing about <code>LocalId</code>s is that the free-variable finder
typically treats them as candidate free variables. That is, it ignores
<code>GlobalId</code>s such as imported constants, data constructors, etc.
<p>
An important invariant is this: <em>All the bindings in the module
being compiled (whether top level or not) are <code>LocalId</code>s
until the CoreTidy phase.</em> In the CoreTidy phase, all
externally-visible top-level bindings are made into <code>GlobalId</code>s.  This
is the point when a <code>LocalId</code> becomes "frozen" and becomes
a fixed, immutable <code>GlobalId</code>.
<p>
(A binding is <em>"externally-visible"</em> if it is exported, or
mentioned in the unfolding of an externally-visible Id.  An
externally-visible Id might have no unfolding, either because it is
too big, or because it is the loop-breaker of a recursive group.)
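<p>
The definition of "externally-visible" is recursive, so it can be read as a least
fixed point.  The sketch below is one way to compute it under simplifying
assumptions: the <code>Binding</code> record, its field names and the function
<code>externallyVisible</code> are hypothetical, and an unfolding is modelled only
by the list of names it mentions.  It is not how CoreTidy is actually written.
<pre>
import Data.Maybe (fromMaybe)
import qualified Data.Set as Set

-- Toy stand-in for a top-level binding; NOT GHC's real representation.
data Binding = Binding
  { bndrName     :: String
  , isExported   :: Bool
  , unfoldingFVs :: Maybe [String]  -- Nothing: no unfolding (too big, or loop-breaker)
  }

-- Least fixed point: start from the exported binders, then keep adding
-- top-level binders mentioned in the unfolding of a visible binder.
externallyVisible :: [Binding] -> Set.Set String
externallyVisible binds = go (Set.fromList (map bndrName (filter isExported binds)))
  where
    topLevel = Set.fromList (map bndrName binds)

    mentionedBy visible =
      Set.fromList (concatMap (fromMaybe [] . unfoldingFVs)
                              (filter (\b -> Set.member (bndrName b) visible) binds))

    go visible
      | next == visible = visible
      | otherwise       = go next
      where
        next = Set.union visible (Set.intersection topLevel (mentionedBy visible))

main :: IO ()
main = print (Set.toList (externallyVisible
  [ Binding "f"      True  (Just ["helper"])  -- exported; unfolding mentions helper
  , Binding "helper" False (Just ["deep"])    -- visible through f's unfolding
  , Binding "deep"   False Nothing            -- visible through helper's unfolding
  , Binding "big"    True  Nothing            -- exported, but has no unfolding
  , Binding "secret" False (Just ["deep"])    -- mentioned by nothing visible
  ]))
</pre>
In this example <code>secret</code> is the only binder that stays out of the result,
so it is the only one that would remain a <code>LocalId</code>.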

<h3>Global Ids and implicit Ids</h3>

<code>GlobalId</code>s are further categorised by their <code>GlobalIdDetails</code>.
This type is defined in <code>basicTypes/IdInfo</code>, because it mentions other
structured types like <code>DataCon</code>.  Unfortunately it is <em>used</em> in <code>Var.lhs</code>
so there's a <code>hi-boot</code> knot to get it there.  Anyway, here's the declaration:
<pre>
data GlobalIdDetails
  = NotGlobalId                 -- Used as a convenient extra return value
                                -- from globalIdDetails

  | VanillaGlobal               -- Imported from elsewhere

  | PrimOpId PrimOp             -- The Id for a primitive operator
  | FCallId ForeignCall         -- The Id for a foreign call

  -- These next ones are all "implicit Ids"
  | RecordSelId FieldLabel      -- The Id for a record selector
  | DataConId DataCon           -- The Id for a data constructor *worker*
  | DataConWrapId DataCon       -- The Id for a data constructor *wrapper*
                                -- [the only reasons we need to know are so that
                                --  a) we can suppress printing a definition in the interface file
                                --  b) when typechecking a pattern we can get from the
                                --     Id back to the data con]
</pre>
The <code>GlobalIdDetails</code> allows us to go from the <code>Id</code> for
a record selector, say, to its field name; or the <code>Id</code> for a primitive
operator to the <code>PrimOp</code> itself.
<p>
Certain <code>GlobalId</code>s are called <em>"implicit"</em> Ids.  An implicit
Id is derived by implication from some other declaration.  So a record selector is
derived from its data type declaration, for example.  An implicit Id is always
a <code>GlobalId</code>.  For most of the compilation, the implicit Ids are just
that: implicit.  If you do <code>-ddump-simpl</code> you won't see their definition.  (That's
why it's true to say that until CoreTidy all Ids in this compilation unit are
<code>LocalId</code>s.)  But at CorePrep, a binding is added for each implicit Id defined in
this module, so that the code generator will generate code for the (curried) function.
<p>
Implicit Ids carry their unfolding inside them, of course, so they may well have
been inlined much earlier; but we generate the curried top-level defn just in
case it's ever needed.
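<p>
Both uses of <code>GlobalIdDetails</code> described above come down to pattern
matching.  The following sketch assumes string-wrapper versions of
<code>PrimOp</code>, <code>FieldLabel</code> and <code>DataCon</code>, leaves out
<code>FCallId</code>, and uses the hypothetical function names
<code>recordSelLabel</code>, <code>primOpOf</code> and <code>isImplicit</code>;
none of it is GHC's own code.
<pre>
-- Toy stand-ins for illustration only; NOT GHC's real definitions.
newtype PrimOp     = PrimOp String     deriving (Eq, Show)
newtype FieldLabel = FieldLabel String deriving (Eq, Show)
newtype DataCon    = DataCon String    deriving (Eq, Show)

data GlobalIdDetails
  = NotGlobalId
  | VanillaGlobal
  | PrimOpId PrimOp
  | RecordSelId FieldLabel
  | DataConId DataCon
  | DataConWrapId DataCon

-- From the details of a record selector back to its field label.
recordSelLabel :: GlobalIdDetails -> Maybe FieldLabel
recordSelLabel (RecordSelId lbl) = Just lbl
recordSelLabel _                 = Nothing

-- From the details of a primitive operator back to the PrimOp itself.
primOpOf :: GlobalIdDetails -> Maybe PrimOp
primOpOf (PrimOpId op) = Just op
primOpOf _             = Nothing

-- Implicit Ids are the ones derived from some other declaration.
isImplicit :: GlobalIdDetails -> Bool
isImplicit (RecordSelId _)   = True
isImplicit (DataConId _)     = True
isImplicit (DataConWrapId _) = True
isImplicit _                 = False

main :: IO ()
main = do
  print (recordSelLabel (RecordSelId (FieldLabel "name")))
  print (primOpOf VanillaGlobal)
  print (map isImplicit [VanillaGlobal, DataConId (DataCon "Just")])
</pre>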

<h3>LocalIds</h3>

The <code>LocalIdDetails</code> gives more info about a <code>LocalId</code>:
<pre>
data LocalIdDetails
  = NotExported -- Not exported
  | Exported    -- Exported
  | SpecPragma  -- Not exported, but not to be discarded either
                -- It's unclean that this is so deeply built in
</pre>
From this we can tell whether the <code>LocalId</code> is exported, and that
tells us whether we can drop an unused binding as dead code.
<p>
The <code>SpecPragma</code> thing is a HACK.  Suppose you write a SPECIALIZE pragma:
<pre>
   foo :: Num a => a -> a
   {-# SPECIALIZE foo :: Int -> Int #-}
   foo = ...
</pre>
The type checker generates a dummy call to <code>foo</code> at the right types:
<pre>
   $dummy = foo Int dNumInt
</pre>
The Id <code>$dummy</code> is marked <code>SpecPragma</code>.  Its role is to hang
onto that call to <code>foo</code> so that the specialiser can see it, but there
are no calls to <code>$dummy</code>.
The simplifier is careful not to discard <code>SpecPragma</code> Ids, so that they
reach the specialiser.  The specialiser processes the right hand side of a <code>SpecPragma</code> Id
to find calls to overloaded functions, <em>and then discards the <code>SpecPragma</code> Id</em>.
So <code>SpecPragma</code> behaves like <code>Exported</code>, at least until the specialiser.
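<p>
The dead-code decision described above might be sketched as follows.  The
<code>Binding</code> record, its occurrence count and the functions
<code>keepEvenIfUnused</code> and <code>dropDeadBindings</code> are hypothetical;
the real simplifier works on Core bindings and occurrence information, not on a
list like this.
<pre>
-- Toy stand-ins for illustration only; NOT GHC's real definitions.
data LocalIdDetails = NotExported | Exported | SpecPragma

data Binding = Binding
  { bndr        :: String
  , details     :: LocalIdDetails
  , occurrences :: Int             -- number of uses elsewhere in the module
  }

-- Must the binding be kept even if nothing in the module refers to it?
keepEvenIfUnused :: LocalIdDetails -> Bool
keepEvenIfUnused NotExported = False
keepEvenIfUnused Exported    = True
keepEvenIfUnused SpecPragma  = True   -- behaves like Exported until the specialiser

dropDeadBindings :: [Binding] -> [Binding]
dropDeadBindings = filter keep
  where keep b = keepEvenIfUnused (details b) || occurrences b > 0

main :: IO ()
main = mapM_ (putStrLn . bndr) (dropDeadBindings
  [ Binding "foo"    Exported    0
  , Binding "$dummy" SpecPragma  0
  , Binding "unused" NotExported 0
  , Binding "helper" NotExported 2
  ])
</pre>
Only <code>unused</code> is dropped: <code>$dummy</code> has no callers either, but
its <code>SpecPragma</code> marking keeps it alive.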


<h3>Global and Local <code>Name</code>s</h3>

Notice that whether an Id is a <code>LocalId</code> or <code>GlobalId</code> is
not the same as whether the Id has a <code>Local</code> or <code>Global</code> <code>Name</code>:
<ul>
<li> Every <code>GlobalId</code> has a <code>Global</code> <code>Name</code>.
<li> A <code>LocalId</code> might have either kind of <code>Name</code>.
</ul>
The significance of Global vs Local names is this:
<ul>
<li> A <code>Global</code> Name has a module and occurrence name; a <code>Local</code>
has only an occurrence name.
<p> <li> A <code>Global</code> Name has a unique that never changes.  It is never
cloned.  This is important, because the simplifier invents new names pretty freely,
but we don't want to lose the connection with the type environment (constructed earlier).
A <code>Local</code> name can be cloned freely.
</ul>
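<p>
The cloning rule can be pictured with a toy <code>Name</code> type.  The record
fields and the function <code>cloneName</code> below are assumptions made for this
sketch (uniques are plain <code>Int</code>s, modules plain strings); they are not
GHC's real <code>Name</code> API.
<pre>
-- Toy stand-in for illustration only; NOT GHC's real Name type.
data Name
  = Global { nameUnique :: Int, nameModule :: String, nameOcc :: String }
  | Local  { nameUnique :: Int, nameOcc :: String }
  deriving (Eq, Show)

-- Cloning gives a Local name a fresh unique; a Global name is returned
-- unchanged, so anything keyed on its unique can still find it.
cloneName :: Int -> Name -> Name
cloneName fresh (Local _ occ) = Local fresh occ
cloneName _     global        = global

main :: IO ()
main = do
  print (cloneName 99 (Local 7 "x"))
  print (cloneName 99 (Global 3 "Data.Maybe" "fromMaybe"))
</pre>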

<p><small>
<!-- hhmts start -->
Last modified: Tue Nov 13 14:11:35 EST 2001
<!-- hhmts end -->
    </small>
  </body>
</html>