<body BGCOLOR="FFFFFF">
<h1>The GHC Commentary - The truth about names: OccNames, and Names</h1>
<p>
-
-
-Every entity (type constructor, class, identifier, type variable) has
-a <code>Name</code>. The <code>Name</code> type is pervasive in GHC,
-and is defined in <code>basicTypes/Name.lhs</code>. Here is what a Name looks like,
-though it is private to the Name module.
-<pre>
- data Name = Name {
- n_sort :: NameSort, -- What sort of name it is
- n_occ :: !OccName, -- Its occurrence name
- n_uniq :: Unique, -- Its identity
- n_loc :: !SrcLoc -- Definition site
- }
-</pre>
-
-<ul>
-<li> The <code>n_sort</code> field says what sort of name this is: see
-<a href="#sort">NameSort below</a>.
-<li> The <code>n_occ</code> field gives the "occurrence name" of the Name; see
-<a href="#occname">OccName below</a>.
-<li> The <code>n_uniq</code> field allows fast tests for equality of Names.
-<li> The <code>n_loc</code> field gives some indication of where the name was bound.
-</ul>
-
-<h2><a name="sort">The <code>NameSort</code> of a <code>Name</code></a></h2>
-
-There are three flavours of <code>Name</code>:
-<pre>
- data NameSort
- = External Module
- | Internal
- | System
-</pre>
-
-<ul>
-<li> Here are the sorts of Name an entity can have:
-<ul>
-<li> Class, TyCon: External.
-<li> Id: External, Internal, or System.
-<li> TyVar: Internal, or System.
-</ul>
-
-<p><li> An <code>ExternalName</code> has a globally-unique
-(module name,occurrence name) pair, namely the
-<em>original name</em> of the entity,
-describing where the thing was originally defined. So for example,
-if we have
-<pre>
- module M where
- f = e1
- g = e2
-
- module A where
- import qualified M as Q
- import M
- a = Q.f + g
-</pre>
-then the RdrNames for "a", "Q.f" and "g" get replaced (by the Renamer)
-by the Names "A.a", "M.f", and "M.g" respectively.
-
-<p><li> An <code>InternalName</code>
-has only an occurrence name. Distinct InternalNames may have the same occurrence
-name; use the Unique to distinguish them.
-
-<p> <li> An <code>ExternalName</code> has a unique that never changes. It is never
-cloned. This is important, because the simplifier invents new names pretty freely,
-but we don't want to lose the connnection with the type environment (constructed earlier).
-An <code>InternalName</code> name can be cloned freely.
-
-<p><li> <strong>Before CoreTidy</strong>: the Ids that were defined at top level
-in the original source program get <code>ExternalNames</code>, whereas extra
-top-level bindings generated (say) by the type checker get <code>InternalNames</code>.
-This distinction is occasionally useful for filtering diagnostic output; e.g.
-for -ddump-types.
-
-<p><li> <strong>After CoreTidy</strong>: An Id with an <code>ExternalName</code> will generate symbols that
-appear as external symbols in the object file. An Id with an <code>InternalName</code>
-cannot be referenced from outside the module, and so generates a local symbol in
-the object file. The CoreTidy pass makes the decision about which names should
-be External and which Internal.
-
-<p><li> A <code>System</code> name is for the most part the same as an
-<code>Internal</code>. Indeed, the differences are purely cosmetic:
-<ul>
-<li>Internal names usually come from some name the
-user wrote, whereas a System name has an OccName like "a", or "t". Usually
-there are masses of System names with the same OccName but different uniques,
-whereas typically there are only a handful of distince Internal names with the same
-OccName.
-<li>
-Another difference is that when unifying the type checker tries to
-unify away type variables with System names, leaving ones with Internal names
-(to improve error messages).
-</ul>
-</ul>
-
-
-<h2> <a name="occname">Occurrence names: <code>OccName</code></a> </h2>
-
-An <code>OccName</code> is more-or-less just a string, like "foo" or "Tree",
-giving the (unqualified) name of an entity.
-
-Well, not quite just a string, because in Haskell a name like "C" could mean a type
-constructor or data constructor, depending on context. So GHC defines a type
-<tt>OccName</tt> (defined in <tt>basicTypes/OccName.lhs</tt>) that is a pair of
-a <tt>FastString</tt> and a <tt>NameSpace</tt> indicating which name space the
-name is drawn from:
-<pre>
- data OccName = OccName NameSpace EncodedFS
-</pre>
-The <tt>EncodedFS</tt> is a synonym for <tt>FastString</tt> indicating that the
-string is Z-encoded. (Details in <tt>OccName.lhs</tt>.) Z-encoding encodes
-funny characters like '%' and '$' into alphabetic characters, like "zp" and "zd",
-so that they can be used in object-file symbol tables without confusing linkers
-and suchlike.
-
-<p>
-The name spaces are:
-<ul>
-<li> <tt>VarName</tt>: ordinary variables
-<li> <tt>TvName</tt>: type variables
-<li> <tt>DataName</tt>: data constructors
-<li> <tt>TcClsName</tt>: type constructors and classes (in Haskell they share a name space)
-</ul>
-
+ Every entity (type constructor, class, identifier, type variable) has a
+ <code>Name</code>. The <code>Name</code> type is pervasive in GHC, and
+ is defined in <code>basicTypes/Name.lhs</code>. Here is what a Name
+ looks like, though it is private to the Name module.
+ </p>
+ <blockquote>
+ <pre>
+data Name = Name {
+ n_sort :: NameSort, -- What sort of name it is
+ n_occ :: !OccName, -- Its occurrence name
+ n_uniq :: Unique, -- Its identity
+ n_loc :: !SrcLoc -- Definition site
+ }</pre>
+ </blockquote>
+ <ul>
+ <li> The <code>n_sort</code> field says what sort of name this is: see
+ <a href="#sort">NameSort below</a>.
+ <li> The <code>n_occ</code> field gives the "occurrence name" of the
+ Name; see
+ <a href="#occname">OccName below</a>.
+ <li> The <code>n_uniq</code> field allows fast tests for equality of
+ Names.
+ <li> The <code>n_loc</code> field gives some indication of where the
+ name was bound.
+ </ul>
+
+ <h2><a name="sort">The <code>NameSort</code> of a <code>Name</code></a></h2>
+ <p>
+ There are four flavours of <code>Name</code>:
+ </p>
+ <blockquote>
+ <pre>
+data NameSort
+ = External Module (Maybe Name)
+ -- (Just parent) => this Name is a subordinate name of 'parent'
+ -- e.g. data constructor of a data type, method of a class
+ -- Nothing => not a subordinate
+
+ | WiredIn Module (Maybe Name) TyThing BuiltInSyntax
+ -- A variant of External, for wired-in things
+
+ | Internal -- A user-defined Id or TyVar
+ -- defined in the module being compiled
+
+ | System -- A system-defined Id or TyVar. Typically the
+ -- OccName is very uninformative (like 's')</pre>
+ </blockquote>
+ <ul>
+ <li>Here are the sorts of Name an entity can have:
+ <ul>
+ <li> Class, TyCon: External.
+ <li> Id: External, Internal, or System.
+ <li> TyVar: Internal, or System.
+ </ul>
+ </li>
+ <li>An <code>External</code> name has a globally-unique
+ (module name, occurrence name) pair, namely the
+ <em>original name</em> of the entity,
+ describing where the thing was originally defined. So for example,
+ if we have
+ <blockquote>
+ <pre>
+module M where
+ f = e1
+ g = e2
+
+module A where
+ import qualified M as Q
+ import M
+ a = Q.f + g</pre>
+ </blockquote>
+ <p>
+ then the RdrNames for "a", "Q.f" and "g" get replaced (by the
+ Renamer) by the Names "A.a", "M.f", and "M.g" respectively.
+ </p>
+ </li>
+ <li>An <code>InternalName</code>
+ has only an occurrence name. Distinct InternalNames may have the same
+ occurrence name; use the Unique to distinguish them.
+ </li>
+ <li>An <code>ExternalName</code> has a unique that never changes. It
+ is never cloned. This is important, because the simplifier invents
+ new names pretty freely, but we don't want to lose the connection
+ with the type environment (constructed earlier). An
+ <code>InternalName</code> name can be cloned freely.
+ </li>
+ <li><strong>Before CoreTidy</strong>: the Ids that were defined at top
+ level in the original source program get <code>ExternalNames</code>,
+ whereas extra top-level bindings generated (say) by the type checker
+ get <code>InternalNames</code>. This distinction is occasionally
+ useful for filtering diagnostic output; e.g. for -ddump-types.
+ </li>
+ <li><strong>After CoreTidy</strong>: An Id with an
+ <code>ExternalName</code> will generate symbols that
+ appear as external symbols in the object file. An Id with an
+ <code>InternalName</code> cannot be referenced from outside the
+ module, and so generates a local symbol in the object file. The
+ CoreTidy pass makes the decision about which names should be External
+ and which Internal.
+ </li>
+ <li>A <code>System</code> name is for the most part the same as an
+ <code>Internal</code>. Indeed, the differences are purely cosmetic:
+ <ul>
+ <li>Internal names usually come from some name the
+ user wrote, whereas a System name has an OccName like "a", or "t".
+ Usually there are masses of System names with the same OccName but
+ different uniques, whereas typically there are only a handful of
+ distinct Internal names with the same OccName.
+ </li>
+ <li>Another difference is that when unifying the type checker tries
+ to unify away type variables with System names, leaving ones with
+ Internal names (to improve error messages).
+ </li>
+ </ul>
+ </li>
+ </ul>
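<p>
The role of the unique can be illustrated with a small, self-contained toy
model. This is only a sketch, not GHC's actual definition: the field names
<code>nSort</code>, <code>nOcc</code> and <code>nUniq</code>, the use of
<code>String</code> and <code>Int</code> in place of <code>Module</code>,
<code>OccName</code> and <code>Unique</code>, and the example values are all
assumptions made for illustration.
</p>

```haskell
-- Toy model of Name; the types are simplified stand-ins, not GHC's own.
data NameSort
  = External String   -- defining module, here just a String
  | Internal
  | System
  deriving Show

data Name = Name
  { nSort :: NameSort
  , nOcc  :: String   -- stands in for OccName
  , nUniq :: Int      -- stands in for Unique
  } deriving Show

-- Equality looks only at the unique, which is what makes the test cheap.
instance Eq Name where
  a == b = nUniq a == nUniq b

-- Two distinct Internal names that share the occurrence name "x".
x1, x2 :: Name
x1 = Name Internal "x" 101
x2 = Name Internal "x" 102

-- An External name; its unique is its identity.
mf :: Name
mf = Name (External "M") "f" 7
```

<p>
In this model <code>x1 == x2</code> is <code>False</code> even though both
print as "x", which is the situation the Unique is there to resolve.
</p>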
+
+ <h2><a name="occname">Occurrence names: <code>OccName</code></a></h2>
+ <p>
+ An <code>OccName</code> is more-or-less just a string, like "foo" or
+ "Tree", giving the (unqualified) name of an entity.
+ </p>
+ <p>
+ Well, not quite just a string, because in Haskell a name like "C" could
+ mean a type constructor or data constructor, depending on context. So
+ GHC defines a type <tt>OccName</tt> (defined in
+ <tt>basicTypes/OccName.lhs</tt>) that is a pair of a <tt>FastString</tt>
+ and a <tt>NameSpace</tt> indicating which name space the name is drawn
+ from:
+ </p>
+ <blockquote>
+ <pre>
+data OccName = OccName NameSpace EncodedFS</pre>
+ </blockquote>
+ <p>
+ The <tt>EncodedFS</tt> is a synonym for <tt>FastString</tt> indicating
+ that the string is Z-encoded. (Details in <tt>OccName.lhs</tt>.)
+ Z-encoding encodes funny characters like '+' and '$' into alphabetic
+ characters, like "zp" and "zd", so that they can be used in object-file
+ symbol tables without confusing linkers and suchlike.
+ </p>
+ <p>
+ The name spaces are:
+ </p>
+ <ul>
+ <li> <tt>VarName</tt>: ordinary variables</li>
+ <li> <tt>TvName</tt>: type variables</li>
+ <li> <tt>DataName</tt>: data constructors</li>
+ <li> <tt>TcClsName</tt>: type constructors and classes (in Haskell they
+ share a name space) </li>
+ </ul>
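<p>
As a rough illustration of why the name space is part of an
<code>OccName</code>, the following sketch models it with a plain
<code>String</code> in place of the Z-encoded <code>FastString</code>. The
helper <code>sameString</code> and the example values are hypothetical and do
not exist in GHC.
</p>

```haskell
data NameSpace = VarName | TvName | DataName | TcClsName
  deriving (Eq, Ord, Show)

-- A plain String stands in for the Z-encoded FastString.
data OccName = OccName NameSpace String
  deriving (Eq, Ord, Show)

-- "C" as a type constructor and "C" as a data constructor.
tyConC, dataConC :: OccName
tyConC   = OccName TcClsName "C"
dataConC = OccName DataName  "C"

-- Hypothetical helper: compare only the string part.
sameString :: OccName -> OccName -> Bool
sameString (OccName _ s) (OccName _ t) = s == t
```

<p>
Here <code>sameString tyConC dataConC</code> holds, yet the two values are
different <code>OccName</code>s because their name spaces differ.
</p>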
+
+ <small>
<!-- hhmts start -->
-Last modified: Tue Nov 13 14:11:35 EST 2001
+Last modified: Wed May 4 14:57:55 EST 2005
<!-- hhmts end -->
</small>
</body>
<html>
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
- <title>The GHC Commentary - The Real Story about Variables, Ids, TyVars, and the like</title>
+ <title>The GHC Commentary - The Glorious Renamer</title>
</head>
<body BGCOLOR="FFFFFF">
<h1>The GHC Commentary - The Glorious Renamer</h1>
<p>
+ The <em>renamer</em> sits between the parser and the typechecker.
+ However, its operation is quite tightly interwoven with the
+ typechecker. This is partially due to support for Template Haskell,
+ where spliced code has to be renamed and type checked. In particular,
+ top-level splices lead to multiple rounds of renaming and type
+ checking.
+ </p>
+ <p>
+ The main externally used functions of the renamer are provided by the
+ module <code>rename/RnSource.lhs</code>. In particular, we have
+ </p>
+ <blockquote>
+ <pre>
+rnSrcDecls :: HsGroup RdrName -> RnM (TcGblEnv, HsGroup Name)
+rnTyClDecls :: [LTyClDecl RdrName] -> RnM [LTyClDecl Name]
+rnSplice :: HsSplice RdrName -> RnM (HsSplice Name, FreeVars)</pre>
+ </blockquote>
+ <p>
+ All of these execute in the renamer monad <code>RnM</code>. The first
+ function, <code>rnSrcDecls</code>, renames a binding group; the second,
+ <code>rnTyClDecls</code>, renames a list of (toplevel) type and class
+ declarations; and the third, <code>rnSplice</code>, renames a Template
+ Haskell splice. As the types indicate, the main task of the renamer is
+ to convert all the <tt>RdrNames</tt> to <a
+ href="names.html"><tt>Names</tt></a>, which includes a number of
+ well-formedness checks (no duplicate declarations, all names are in
+ scope, and so on). In addition, the renamer performs other, not
+ strictly name-related, well-formedness checks, which include checking
+ that the appropriate flags have been supplied whenever language
+ extensions are used in the source.
+ </p>
+
+ <h2>RdrNames</h2>
+ <p>
+ A <tt>RdrName.RdrName</tt> is pretty much just a string (for an
+ unqualified name like "<tt>f</tt>") or a pair of strings (for a
+ qualified name like "<tt>M.f</tt>"):
+ </p>
+ <blockquote>
+ <pre>
+data RdrName
+ = Unqual OccName
+ -- Used for ordinary, unqualified occurrences
-(This section is, like most of the Commentary, rather incomplete.)
-<p>
-The <em>renamer</em> sits between the parser and the typechecker.
-Roughly speaking, It has the type:
-<pre>
- HsModule RdrName -> HsModule Name
-</pre>
-That is, it converts all the <tt>RdrNames</tt> to <a href="names.html"><tt>Names</tt></a>.
+ | Qual Module OccName
+ -- A qualified name written by the user in
+ -- *source* code. The module isn't necessarily
+ -- the module where the thing is defined;
+ -- just the one from which it is imported
-<h2> RdrNames </h2>
+ | Orig Module OccName
+ -- An original name; the module is the *defining* module.
+ -- This is used when GHC generates code that will be fed
+ -- into the renamer (e.g. from deriving clauses), but where
+ -- we want to say "Use Prelude.map dammit".
+
+ | Exact Name
+ -- We know exactly the Name. This is used
+ -- (a) when the parser parses built-in syntax like "[]"
+ -- and "(,)", but wants a RdrName from it
+ -- (b) when converting names to the RdrNames in IfaceTypes
+ -- Here an Exact RdrName always contains an External Name
+ -- (Internal Names are converted to simple Unquals)
+ -- (c) by Template Haskell, when TH has generated a unique name</pre>
+ </blockquote>
+ <p>
+ The OccName type is described in <a href="names.html#occname">The
+ truth about names</a>.
+ </p>
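<p>
To make the difference between the variants concrete, here is a toy resolver
for the <code>module M</code> / <code>module A</code> example from the page on
names. It is a sketch under simplifying assumptions: module and occurrence
names are plain <code>String</code>s, the <code>Exact</code> variant is
omitted, and <code>scopeOfA</code> and <code>resolve</code> are hypothetical
names, not functions of the renamer.
</p>

```haskell
type ModName = String
type Occ     = String

-- Simplified RdrName: no Exact variant, Strings instead of Module/OccName.
data RdrName
  = Unqual Occ
  | Qual ModName Occ
  | Orig ModName Occ
  deriving (Eq, Show)

-- Some of what is in scope inside module A, mapped to the original name
-- (defining module, occurrence name). Not exhaustive.
scopeOfA :: [(RdrName, (ModName, Occ))]
scopeOfA =
  [ (Unqual "a",   ("A", "a"))   -- defined in A itself
  , (Unqual "f",   ("M", "f"))   -- from "import M"
  , (Unqual "g",   ("M", "g"))
  , (Qual "Q" "f", ("M", "f"))   -- from "import qualified M as Q"
  , (Qual "Q" "g", ("M", "g"))
  ]

-- An Orig name already says where the thing is defined; everything else
-- has to be looked up in the current scope.
resolve :: RdrName -> Maybe (ModName, Occ)
resolve (Orig m o) = Just (m, o)
resolve rdr        = lookup rdr scopeOfA
```

<p>
With this table, <code>Qual "Q" "f"</code> resolves to the original name
<code>("M", "f")</code>: the qualifier names the import, not the defining
module.
</p>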
-A <tt>RdrNames</tt> is pretty much just a string (for an unqualified name
-like "<tt>f</tt>") or a pair of strings (for a qualified name like "<tt>M.f</tt>"):
-<pre>
- data RdrName = RdrName Qual OccName
-
- data Qual = Unqual
-
- | Qual ModuleName -- A qualified name written by the user in source code
- -- The module isn't necessarily the module where
- -- the thing is defined; just the one from which it
- -- is imported
-
- | Orig ModuleName -- This is an *original* name; the module is the place
- -- where the thing was defined
-</pre>
-The OccName type is described in <a href="names.html#occname">"The truth about names"</a>.
-<p>
-The <tt>OrigName</tt> variant is used internally; it allows GHC to speak of <tt>RdrNames</tt>
-that refer to the original name of the thing.
+ <h2>The Renamer Monad</h2>
+ <p>
+ Due to the tight integration of the renamer with the typechecker, both
+ use the same monad in recent versions of GHC. So, we have
+ </p>
+ <blockquote>
+ <pre>
+type RnM a = TcRn a -- Historical
+type TcM a = TcRn a -- Historical</pre>
+ </blockquote>
+ <p>
+ with the combined monad defined as
+ </p>
+ <blockquote>
+ <pre>
+type TcRn a = TcRnIf TcGblEnv TcLclEnv a
+type TcRnIf a b c = IOEnv (Env a b) c
+data Env gbl lcl -- Changes as we move into an expression
+ = Env {
+ env_top :: HscEnv, -- Top-level stuff that never changes
+ -- Includes all info about imported things
-<h2> Rebindable syntax </h2>
+ env_us :: TcRef UniqSupply, -- Unique supply for local variables
-In Haskell when one writes "3" one gets "fromInteger 3", where
-"fromInteger" comes from the Prelude (regardless of whether the
-Prelude is in scope). If you want to completely redefine numbers,
-that becomes inconvenient. So GHC lets you say
-"-fno-implicit-prelude"; in that case, the "fromInteger" comes from
-whatever is in scope. (This is documented in the User Guide.)
-<p>
-This feature is implemented as follows (I always forget).
-<ul>
-<li> Four HsSyn constructs (NegApp, NPlusKPat, HsIntegral, HsFractional)
-contain a <tt>Name</tt> (i.e. it is not parameterised).
-<li> When the parser builds these constructs, it puts in the built-in Prelude
-Name (e.g. PrelNum.fromInteger).
-<li> When the renamer encounters these constructs, it calls <tt>RnEnv.lookupSyntaxName</tt>.
-This checks for <tt>-fno-implicit-prelude</tt>; if not, it just returns the same Name;
-otherwise it takes the occurrence name of the Name, turns it into an unqualified RdrName, and looks
-it up in the environment. The returned name is plugged back into the construct.
-<li> The typechecker uses the Name to generate the appropriate typing constraints.
-</ul>
+ env_gbl :: gbl, -- Info about things defined at the top level
+ -- of the module being compiled
+
+ env_lcl :: lcl -- Nested stuff; changes as we go into
+ -- an expression
+ }</pre>
+ </blockquote>
+ <p>
+ The details of the global environment type <code>TcGblEnv</code> and
+ local environment type <code>TcLclEnv</code> are also defined in the
+ module <code>typecheck/TcRnTypes.lhs</code>. The monad
+ <code>IOEnv</code> is defined in <code>utils/IOEnv.hs</code> and extends
+ the vanilla <code>IO</code> monad with an additional state parameter
+ <code>env</code> that is treated as in a reader monad. (Side effecting
+ operations, such as updating the unique supply, are done with
+ <code>TcRef</code>s, which are simply a synonym for <code>IORef</code>s.)
+ </p>
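<p>
The idea of "<code>IO</code> plus an environment read as in a reader monad,
with mutable parts held in references" can be sketched in a few lines. This is
a simplified stand-in and not the code in <code>utils/IOEnv.hs</code>: the
names <code>ToyEnv</code>, <code>envUs</code>, <code>liftIOEnv</code>,
<code>newUnique</code> and <code>runWithFreshSupply</code> are assumptions,
and an <code>Int</code> counter replaces the real <code>UniqSupply</code>.
</p>

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')

-- A reader-like wrapper around IO: every action receives the environment.
newtype IOEnv env a = IOEnv { runIOEnv :: env -> IO a }

instance Functor (IOEnv env) where
  fmap f (IOEnv m) = IOEnv (fmap f . m)

instance Applicative (IOEnv env) where
  pure x = IOEnv (\_ -> pure x)
  IOEnv f <*> IOEnv x = IOEnv (\e -> f e <*> x e)

instance Monad (IOEnv env) where
  IOEnv m >>= k = IOEnv (\e -> m e >>= \a -> runIOEnv (k a) e)

-- Read the environment; it is never replaced, only inspected.
getEnv :: IOEnv env env
getEnv = IOEnv pure

-- Run a plain IO action inside the monad.
liftIOEnv :: IO a -> IOEnv env a
liftIOEnv io = IOEnv (\_ -> io)

-- Toy environment: the "unique supply" is a counter behind an IORef.
newtype ToyEnv = ToyEnv { envUs :: IORef Int }

-- Side effect through the reference; the environment itself is unchanged.
newUnique :: IOEnv ToyEnv Int
newUnique = do
  env <- getEnv
  liftIOEnv (atomicModifyIORef' (envUs env) (\n -> (n + 1, n)))

-- Hypothetical runner that starts the counter at 0.
runWithFreshSupply :: IOEnv ToyEnv a -> IO a
runWithFreshSupply act = do
  ref <- newIORef 0
  runIOEnv act (ToyEnv ref)
```

<p>
Two successive calls to <code>newUnique</code> in the same run yield different
numbers, although both see the very same <code>ToyEnv</code> value.
</p>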
+
+ <h2>Name Space Management</h2>
+ <p>
+ As anticipated by the variants <code>Orig</code> and <code>Exact</code>
+ of <code>RdrName</code>, some names should not change during renaming,
+ whereas others need to be turned into unique names. In this context,
+ the two functions <code>RnEnv.newTopSrcBinder</code> and
+ <code>RnEnv.newLocalsRn</code> are important:
+ </p>
+ <blockquote>
+ <pre>
+newTopSrcBinder :: Module -> Maybe Name -> Located RdrName -> RnM Name
+newLocalsRn :: [Located RdrName] -> RnM [Name]</pre>
+ </blockquote>
+ <p>
+ The two functions introduce new toplevel and new local names,
+ respectively, where the first two arguments to
+ <code>newTopSrcBinder</code> determine the currently compiled module and
+ the parent construct of the newly defined name. Both functions create
+ new names only for <code>RdrName</code>s that are neither exact nor
+ original.
+ </p>
+ <h3>Introduction of Toplevel Names: Global RdrName Environment</h3>
+ <p>
+ A global <code>RdrName</code> environment
+ <code>RdrName.GlobalRdrEnv</code> is a map from <code>OccName</code>s to
+ lists of qualified names. More precisely, the latter are
+ <code>Name</code>s with an associated <code>Provenance</code>:
+ </p>
+ <blockquote>
+ <pre>
+data Provenance
+ = LocalDef -- Defined locally
+ Module
+
+ | Imported -- Imported
+ [ImportSpec] -- INVARIANT: non-empty
+ Bool -- True iff the thing was named *explicitly*
+ -- in *any* of the import specs rather than being
+ -- imported as part of a group;
+ -- e.g.
+ -- import B
+ -- import C( T(..) )
+ -- Here, everything imported by B, and the constructors of T
+ -- are not named explicitly; only T is named explicitly.
+ -- This info is used when warning of unused names.</pre>
+ </blockquote>
+ <p>
+ The part of the global <code>RdrName</code> environment for a module
+ that contains the local definitions is created by the function
+ <code>RnNames.importsFromLocalDecls</code>, which also computes a data
+ structure recording all imported declarations in the form of a value of
+ type <code>TcRnTypes.ImportAvails</code>.
+ </p>
+ <p>
+ The function <code>importsFromLocalDecls</code>, in turn, makes use of
+ <code>RnNames.getLocalDeclBinders :: Module -> HsGroup RdrName -> RnM
+ [AvailInfo]</code> to extract all declared names from a binding group,
+ where <code>HscTypes.AvailInfo</code> is essentially a collection of
+ <code>Name</code>s; i.e., <code>getLocalDeclBinders</code>, on the fly,
+ generates <code>Name</code>s from the <code>RdrName</code>s of all
+ top-level binders of the module represented by the <code>HsGroup
+ RdrName</code> argument.
+ </p>
+ <p>
+ It is important to note that all this happens before the renamer
+ actually descends into the toplevel bindings of a module. In other
+ words, before <code>TcRnDriver.rnTopSrcDecls</code> performs the
+ renaming of a module by way of <code>RnSource.rnSrcDecls</code>, it uses
+ <code>importsFromLocalDecls</code> to set up the global
+ <code>RdrName</code> environment, which contains <code>Name</code>s for
+ all imported <em>and</em> all locally defined toplevel binders. Hence,
+ when the helpers of <code>rnSrcDecls</code> come across the
+ <em>defining</em> occurrences of a toplevel <code>RdrName</code>, they
+ don't rename it by generating a new name, but they simply look up its
+ name in the global <code>RdrName</code> environment.
+ </p>
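<p>
The "build the environment first, then only look names up" discipline can be
sketched as follows. Everything here is a toy: <code>Name</code> is a pair of
strings, <code>Provenance</code> drops the <code>ImportSpec</code> details,
and <code>mkGlobalEnv</code> and <code>lookupGRE</code> are hypothetical
helpers rather than functions from <code>RnNames</code> or
<code>RdrName</code>.
</p>

```haskell
type ModName = String
type Occ     = String
type Name    = (ModName, Occ)   -- stands in for a real Name

-- Simplified provenance: imports are recorded by module name only.
data Provenance
  = LocalDef ModName
  | Imported [ModName]
  deriving (Eq, Show)

-- Occurrence name to candidate (Name, Provenance) entries.
type GlobalRdrEnv = [(Occ, [(Name, Provenance)])]

-- Built once, before any declaration body is renamed: one entry per
-- local top-level binder and one per imported name.
mkGlobalEnv :: ModName -> [Occ] -> [Name] -> GlobalRdrEnv
mkGlobalEnv this localBinders imported =
     [ (occ, [((this, occ), LocalDef this)]) | occ <- localBinders ]
  ++ [ (occ, [((m, occ), Imported [m])])     | (m, occ) <- imported ]

-- A defining occurrence is looked up, not freshly generated.
lookupGRE :: GlobalRdrEnv -> Occ -> [(Name, Provenance)]
lookupGRE env occ = concat [ entries | (o, entries) <- env, o == occ ]
```

<p>
More than one entry for the same occurrence name would correspond to an
ambiguous reference; the sketch simply returns all candidates.
</p>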
+
+ <h2>Rebindable syntax</h2>
+ <p>
+ In Haskell when one writes "3" one gets "fromInteger 3", where
+ "fromInteger" comes from the Prelude (regardless of whether the
+ Prelude is in scope). If you want to completely redefine numbers,
+ that becomes inconvenient. So GHC lets you say
+ "-fno-implicit-prelude"; in that case, the "fromInteger" comes from
+ whatever is in scope. (This is documented in the User Guide.)
+ </p>
+ <p>
+ This feature is implemented as follows (I always forget).
+ </p>
+ <ul>
+ <li>Names that are implicitly bound by the Prelude are marked by the
+ type <code>HsExpr.SyntaxExpr</code>. Moreover, the association list
+ <code>HsExpr.SyntaxTable</code> is set up by the renamer to map
+ rebindable names to the value they are bound to.
+ </li>
+ <li>Currently, five constructs related to numerals
+ (<code>HsExpr.NegApp</code>, <code>HsPat.NPat</code>,
+ <code>HsPat.NPlusKPat</code>, <code>HsLit.HsIntegral</code>, and
+ <code>HsLit.HsFractional</code>) and
+ two constructs related to <code>do</code> expressions
+ (<code>HsExpr.BindStmt</code> and
+ <code>HsExpr.ExprStmt</code>) have rebindable syntax.
+ </li>
+ <li> When the parser builds these constructs, it puts in the
+ built-in Prelude Name (e.g. PrelNum.fromInteger).
+ </li>
+ <li> When the renamer encounters these constructs, it calls
+ <tt>RnEnv.lookupSyntaxName</tt>.
+ This checks for <tt>-fno-implicit-prelude</tt>; if not, it just
+ returns the same Name; otherwise it takes the occurrence name of the
+ Name, turns it into an unqualified RdrName, and looks it up in the
+ environment. The returned name is plugged back into the construct.
+ </li>
+ <li> The typechecker uses the Name to generate the appropriate typing
+ constraints.
+ </li>
+ </ul>
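<p>
The decision made in the renamer step above can be sketched as a pure
function. This is not the real <code>RnEnv.lookupSyntaxName</code>, which
runs in <code>RnM</code> and reports errors; the type synonyms, the
<code>Bool</code> flag argument, the <code>Maybe</code> result and the module
name <code>MyNum</code> in the usage note are all assumptions for
illustration.
</p>

```haskell
type Name  = (String, String)   -- (defining module, occurrence name)
type Scope = [(String, Name)]   -- unqualified names currently in scope

-- First argument: True when -fno-implicit-prelude is in effect.
lookupSyntaxName :: Bool -> Scope -> Name -> Maybe Name
lookupSyntaxName noImplicitPrelude scope std@(_, occ)
  | not noImplicitPrelude = Just std          -- keep the built-in Prelude Name
  | otherwise             = lookup occ scope  -- use whatever is in scope
```

<p>
For example, with the flag on and a scope that binds
<code>"fromInteger"</code> to <code>("MyNum", "fromInteger")</code>, the
built-in <code>("PrelNum", "fromInteger")</code> is replaced by the user's
definition; with the flag off it is returned unchanged.
</p>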
+
+ <p><small>
<!-- hhmts start -->
-Last modified: Tue Nov 13 14:11:35 EST 2001
+Last modified: Wed May 4 17:16:15 EST 2005
<!-- hhmts end -->
</small>
</body>
</html>
+