[project @ 2002-11-04 15:33:29 by simonpj]

[ghc-hetmet.git] / ghc / docs / ghci / ghci.tex
diff --git a/ghc/docs/ghci/ghci.tex b/ghc/docs/ghci/ghci.tex

index b1cf14f..c4638a6 100644
--- a/ghc/docs/ghci/ghci.tex
+++ b/ghc/docs/ghci/ghci.tex
@@ -57,27 +57,6 @@
  %%%\newpage
  
  %%-----------------------------------------------------------------%%
-\section*{Misc text looking for a home}
-
-\subsubsection*{Starting up}
-Some of the session-lifetime data structures are opaque to CM, so
-it doesn't know how to create an initial one.  Hence it relies on its
-client to supply the following:
-\begin{verbatim}
-   emptyPCS :: PCS
-   emptyOST :: OST
-\end{verbatim}
-The PCS is maintained solely by @compile@, and OST solely by
-@link@/@unlink@.  CM cannot know the representation of the latter
-since it depends on whether we're operating in interactive or batch
-mode.
-
-
-@compile@ is supplied with, and checks PIT (inside PCS) before
-reading package interfaces, so it doesn't read and add duplicate
-@ModIFace@s to PIT.
-
-
  \section{Details}
  
  \subsection{Outline of the design}
@@ -138,7 +117,7 @@ visibility.  Subsequent sections elaborate who can see what.
        unlinked translations of home modules only.
  \item {\bf Module Graph (MG)} (owner: CM) is the current module graph.
  \item {\bf Static Info (SI)} (owner: CM) is the package configuration
-      information and compiler flags.
+      information (PCI) and compiler flags (FLAGS).
  \item {\bf Persistent Compiler State (PCS)} (owner: @compile@)
        is @compile@'s private cache of information about package
        modules.
@@ -168,7 +147,7 @@ maps, so they are given a @Unique@.
  \end{verbatim}
  
  A @ModLocation@ says where a module is, what it's called and in what
-form it it.
+form it is.
  \begin{verbatim}
     data ModLocation = SourceOnly Module Path         -- .hs
                      | ObjectCode Module Path Path    -- .o, .hi
@@ -184,7 +163,7 @@ updated, presumably by a compile run outside of the GHCI session.
  Hence the two-stage type:
  \begin{verbatim}
     type Finder = ModName -> IO ModLocation
-   newFinder :: [PCI] -> IO Finder
+   newFinder :: PCI -> IO Finder
  \end{verbatim}
  @newFinder@ examines the package information right at the start, but 
  returns an @IO@-typed function which can inspect home module changes
@@ -201,9 +180,9 @@ can be created quickly.
  \begin{verbatim}
     data ModSummary = ModSummary 
                          ModLocation   -- location and kind
-                        Maybe (String, Fingerprint)
+                        (Maybe (String, Fingerprint))
                                        -- source and fingerprint if .hs
-                        [ModName]     -- imports
+                        (Maybe [ModName])     -- imports if .hs or .hi
  
     type Fingerprint = ...  -- file timestamp, or source checksum?
  
@@ -268,15 +247,14 @@ inspecting them.
  \item
     {\bf Home Symbol Table (HST)} @:: FiniteMap Module ModDetails@
  
-   The @ModDetails@ contain tycons, classes, instances,
-   etc, collectively known as ``entities''.  Referrals from other
-   modules to these entities is direct, with no intervening
+   The @ModDetails@ (a couple of layers down) contain tycons, classes,
+   instances, etc., collectively known as ``entities''.  References from
+   other modules to these entities are direct, with no intervening
     indirections of any kind; conversely, these entities refer directly
-   to other entities, regardless of module boundaries.  HST only
-   holds information for home modules; the corresponding wired-up
-   details for package (non-home) modules are created lazily in
-   the package symbol table (PST) inside the persistent compiler's state
-   (PST).
+   to other entities, regardless of module boundaries.  HST only holds
+   information for home modules; the corresponding wired-up details
+   for package (non-home) modules are created on demand in the package
+   symbol table (PST) inside the persistent compiler's state (PCS).
  
     CM maintains the HST, which is passed to, but not modified by,
     @compile@.  If compilation of a module is successful, @compile@
@@ -311,11 +289,13 @@ inspecting them.
     object, archive or DLL file.  In interactive mode, it may also be
     the STG trees derived from translating a module.  So @compile@
     returns a @Linkable@ from each successful run, namely that of
-   translating the module at hand.  At link-time, CM supplies these
-   @Linkable@s to @link@.  It also examines the @ModSummary@s for all
-   home modules, and by examining their imports and the PCI (package
-   configuration info) it can determine the @Linkable@s from all
-   required imported packages too.
+   translating the module at hand.  
+
+   At link time, CM passes to @link@ the @Linkable@s for the upward
+   closure of all home modules which have changed.  It also examines
+   the @ModSummary@s for all home modules; from their imports and
+   SI.PCI (the package configuration info) it can determine the
+   @Linkable@s needed from imported packages too.
  
     @Linkable@s and @ModIFace@s have a close relationship.  Each
     translated module has a corresponding @Linkable@ somewhere.
@@ -324,57 +304,127 @@ inspecting them.
     single @Linkable@ -- as is the case for any module from a
     multi-module package.  For these reasons it seems appropriate to
     keep the two concepts distinct.  @Linkable@s also provide
-   information about how to link package components together, and that
-   insn't the business of any specific module to know.
+   information about the sequence in which individual package
+   components should be linked, and that isn't the business of any
+   specific module to know.
  
     CM passes @compile@ a module's old @ModIFace@, if it has one, in
     the hope that the module won't need recompiling.  If so, @compile@
-   can just return the @ModIFace@ along with a new @ModDetails@
-   created from it.  Similarly, CM passes in a module's old
-   @Linkable@, if it has one, and that's returned unchanged if the
-   module isn't recompiled.
+   can just return the new @ModDetails@ created from it, and CM will
+   re-use the old @ModIFace@.  If the module {\em is} recompiled (or 
+   scheduled to be loaded from disk), @compile@ returns both the 
+   new @ModIFace@ and new @Linkable@.
  
  \item 
     {\bf Module Graph (MG)} @:: known-only-to-CM@
  
     Records, for CM's purposes, the current module graph,
     up-to-dateness and summaries.  More details when I get to them.
+   It contains only home modules.
  \end{itemize}
+All of this state can probably be rolled together into a single
+Persistent CM State (PCMS):
+\begin{verbatim}
+  data PCMS = PCMS HST HIT UI MG
+  emptyPCMS :: IO PCMS
+\end{verbatim}
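A minimal Haskell sketch of one possible PCMS, assuming (hypothetically) that every table is a @Data.Map@ keyed by module name and that the module graph is reduced to its set of nodes. @ModDetails@, @ModIFace@ and @Linkable@ are empty placeholders here, since their contents belong to @compile@ and the linker; @retainOnly@ is a hypothetical helper showing how CM could drop modules from all four components at once.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type ModName = String

-- Hypothetical placeholders: CM only stores and removes these values,
-- their contents belong to compile and the linker.
data ModDetails = ModDetails deriving (Eq, Show)
data ModIFace = ModIFace deriving (Eq, Show)
data Linkable = Linkable deriving (Eq, Show)

-- One possible shape for PCMS: every table is keyed by home module name,
-- and the module graph is reduced to its node set for this sketch.
data PCMS = PCMS
  { hst :: Map.Map ModName ModDetails -- Home Symbol Table
  , hit :: Map.Map ModName ModIFace   -- Home Interface Table
  , ui  :: Map.Map ModName Linkable   -- Unlinked Images
  , mg  :: Set.Set ModName            -- Module Graph (nodes only)
  }

emptyPCMS :: IO PCMS
emptyPCMS = pure (PCMS Map.empty Map.empty Map.empty Set.empty)

-- Restrict every component to the given home modules.
retainOnly :: Set.Set ModName -> PCMS -> PCMS
retainOnly keep (PCMS h i u g) =
  PCMS (Map.restrictKeys h keep)
       (Map.restrictKeys i keep)
       (Map.restrictKeys u keep)
       (Set.intersection g keep)
```

After a failed upsweep, @retainOnly@ applied to the successful set does the whole clean-up; after the downsweep, CM would restrict the tables to the new graph's modules and then replace @mg@ with the new graph.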
+
+\subsubsection{What CM implements}
+CM essentially implements the HEP interface.  First, though, we define
+a structure holding the state of the entire CM system and of its
+subsystems @compile@ and @link@:
+\begin{verbatim}
+   data CmState 
+      = CmState PCMS      -- CM's stuff
+                PCS       -- compile's stuff
+                PLS       -- link's stuff
+                SI        -- the static info, never changes
+                Finder    -- the finder
+\end{verbatim}
  
+The @CmState@ is threaded through the HEP interface.  In reality
+this might be done using @IORef@s, but for clarity:
+\begin{verbatim}
+  type ModHandle = ... (opaque to CM/HEP clients) ...
+  type HValue    = ... (opaque to CM/HEP clients) ...
  
-\subsubsection{What CM does}
-Pretty much as before.  \ToDo{... and what was Before?}
+  cmInit       :: FLAGS 
+               -> [PkgInfo]
+               -> IO CmState
  
-Plus: detect module cycles during the downsweep.  During the upsweep,
-ensure that compilation failures for modules in cycles do not leave
-any of the global structures in an inconsistent state.  
-\begin{itemize}
-\item 
-   For PCS, that's never a problem because PCS doesn't hold any
-   information pertaining to home modules.
-\item 
-   HST and HIT: CM knows that these are mappings from @Module@ to
-   whatever, and can throw away entries from failed cycles, or,
-   equivalently, not commit updates to them until cycles succeed,
-   remembering of course to synthesise appropriate HSTs during
-   compilation of a cycle.
-\item 
-   UI -- a collection of @Linkable@s, between which there are no
-   direct refererences, so CM can remove additions from failed cycles
-   with no difficulty.
-\item 
-   OST -- linking is not carried out until the upsweep has
-   succeeded, so there's no problem here.
-\end{itemize}
+  cmLoadModule :: CmState 
+               -> ModName 
+               -> IO (CmState, Either [SDoc] ModHandle)
+
+  cmGetExpr    :: ModHandle 
+               -> CmState 
+               -> String
+               -> IO (CmState, Either [SDoc] HValue)
+
+  cmRunExpr    :: HValue -> IO ()   -- don't need CmState here
+\end{verbatim}
+Almost all the huff and puff in this document pertains to @cmLoadModule@.
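Since the threading might really be done with @IORef@s, here is a small sketch of that, assuming a hypothetical helper @runWithRef@ that runs any operation shaped like @cmLoadModule@ (state in, new state and result out) against an @IORef@ from @Data.IORef@:

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical helper: run a state-threading operation (the shape of
-- cmLoadModule, or of cmGetExpr once its other arguments are supplied)
-- against a mutable cell instead of passing the state explicitly.
runWithRef :: IORef s -> (s -> IO (s, a)) -> IO a
runWithRef ref op = do
  s <- readIORef ref
  (s', result) <- op s
  writeIORef ref s' -- only reached if op returned normally
  pure result
```

A client would write something like @runWithRef ref (\st -> cmLoadModule st name)@.  If the operation throws, the @writeIORef@ is never reached and the cell keeps the old state; the sketch assumes a single-threaded client.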
  
-Plus: clear out the global data structures after the downsweep but
-before the upsweep.
  
-\ToDo{CM needs to supply a way for @compile@ to know which modules in
-      HST are in its downwards closure, and which not, so it can
-      correctly construct its instance environment.}
+\subsubsection{Implementing \mbox{\tt cmInit}}
+@cmInit@ creates an empty @CmState@ using @emptyPCMS@, @emptyPCS@ and
+@emptyPLS@, builds SI from the supplied flags and package info, and
+creates the finder by supplying the package info to @newFinder@.
  
  
+\subsubsection{Implementing \mbox{\tt cmLoadModule}}
+
+\begin{enumerate}
+\item {\bf Downsweep:} using @finder@ and @summarise@, chase from 
+      the given module to
+      establish the new home module graph (MG).  Do not chase into
+      package modules.
+\item Remove from HIT, HST, UI any modules in the old MG which are
+      not in the new one.  The old MG is then replaced by the new one.
+\item Topologically sort MG to generate a bottom-to-top traversal
+      order, giving a worklist.
+\item {\bf Upsweep:} call @compile@ on each module in the worklist in 
+      turn, passing it
+      the ``correct'' HST, PCS, the old @ModIFace@ if
+      available, and the summary.  ``Correct'' HST in the sense that
+      HST contains only the modules in this module's downward
+      closure, so that @compile@ can construct the correct instance
+      and rule environments simply as the union of those in 
+      the module's downward closure.
+
+      If @compile@ doesn't return a new interface/linkable pair,
+      compilation wasn't necessary.  Either way, update HST with
+      the new @ModDetails@; if a compilation {\em did} occur, also
+      add the new @ModIFace@ to HIT and the new @Linkable@ to UI.
+
+      Keep going until the root module is successfully done, or
+      compilation fails.
+      
+\item If the previous step terminated because compilation failed,
+      define the successful set as the modules in successfully
+      completed SCCs, i.e. every module @compile@ succeeded on,
+      excluding any module in a cycle which includes the module
+      that failed.
+      Remove from HST, HIT, UI and MG all modules in MG which
+      are not in the successful set.  Call @link@ with the @Linkable@s
+      of the successful set, which should succeed.  The net effect is to
+      back off to a point at which the modules still aboard are
+      correctly compiled and linked.
+
+      If the previous step terminated successfully, 
+      call @link@ passing it the @Linkable@s in the upward closure of
+      all those modules for which @compile@ produced a new @Linkable@.
+\end{enumerate}
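Steps 3 to 5 can be sketched in Haskell with @stronglyConnComp@ from @Data.Graph@ (in the containers package), which returns strongly connected components in reverse topological order: imported modules come before their importers, so the groups are already bottom-to-top, and each import cycle stays together in one group.  @Summary@ and @compileOK@ are hypothetical stand-ins for @ModSummary@ and for a call to @compile@; a real upsweep would run in @IO@ and thread HST and PCS through.

```haskell
import Data.Graph (flattenSCC, stronglyConnComp)

type ModName = String

-- Hypothetical stand-in for a ModSummary: a module and its home imports.
data Summary = Summary { modName :: ModName, homeImports :: [ModName] }

-- Step 3: bottom-to-top worklist, one group per SCC (cycles kept together).
worklist :: [Summary] -> [[ModName]]
worklist sums =
  map flattenSCC
      (stronglyConnComp [ (modName s, modName s, homeImports s) | s <- sums ])

-- Steps 4 and 5: compile group by group, stopping at the first failure.
-- compileOK is a hypothetical pure stand-in for calling compile.  Returns
-- the successful set (modules of fully completed SCCs) and the failed module.
upsweep :: (ModName -> Bool) -> [[ModName]] -> ([ModName], Maybe ModName)
upsweep compileOK = go []
  where
    go done [] = (done, Nothing)
    go done (grp : rest) =
      case filter (not . compileOK) grp of
        []        -> go (done ++ grp) rest
        (bad : _) -> (done, Just bad)
```

Because a failing module discards its whole group, a module in the same cycle that compiled fine before the failure is still left out of the successful set, as step 5 requires.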
+As a small optimisation, do this:
+\begin{enumerate}
+\item[3a.] Remove from the worklist any module M where M's source
+     hasn't changed and neither has the source of any module in M's
+     downward closure.  This has the effect of not starting the upsweep
+     right at the bottom of the graph when that's not needed.
+     Source-change checking can be done quickly by CM by comparing
+     summaries of modules in MG against corresponding 
+     summaries from the old MG.
+\end{enumerate}
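Step 3a might look like this, assuming (hypothetically) that a summary's @Fingerprint@ is an @Int@ and that the home import graph is a map from each module to the home modules it imports.  A module counts as changed if it is new or its fingerprint differs from the old MG's; it stays on the worklist only if its downward closure, which includes the module itself, contains a changed module.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type ModName = String
type Fingerprint = Int -- hypothetical; the design leaves this open

-- Home import graph: each home module mapped to the home modules it imports.
type Graph = Map.Map ModName [ModName]

-- All modules reachable from root by following imports, root included.
downwardClosure :: Graph -> ModName -> Set.Set ModName
downwardClosure g root = go Set.empty [root]
  where
    go seen [] = seen
    go seen (m : ms)
      | m `Set.member` seen = go seen ms
      | otherwise = go (Set.insert m seen) (Map.findWithDefault [] m g ++ ms)

-- Modules that are new, or whose fingerprint differs from the old MG's.
changed :: Map.Map ModName Fingerprint -> Map.Map ModName Fingerprint -> Set.Set ModName
changed old new = Map.keysSet (Map.filterWithKey differs new)
  where
    differs m fp = Map.lookup m old /= Just fp

-- Step 3a: keep a module only if something in its downward closure changed.
pruneWorklist :: Graph -> Set.Set ModName -> [ModName] -> [ModName]
pruneWorklist g dirty = filter needsWork
  where
    needsWork m = not (Set.null (downwardClosure g m `Set.intersection` dirty))
```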
+
  
  %%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
  \subsection{The compiler (\mbox{\tt compile})}
@@ -382,11 +432,11 @@ before the upsweep.
  
  \subsubsection{Data structures owned by \mbox{\tt compile}}
  
-   {\bf Persistent Compiler State (PCS)} @:: known-only-to-compile@
+{\bf Persistent Compiler State (PCS)} @:: known-only-to-compile@
  
-   This contains info about foreign packages only, acting as a cache,
-   which is private to @compile@.  The cache never becomes out of
-   date.  There are at least two parts to it:
+This contains info about foreign packages only, acting as a cache,
+which is private to @compile@.  The cache never becomes out of
+date.  There are three parts to it:
  
     \begin{itemize}
     \item
@@ -396,19 +446,14 @@ before the upsweep.
     caches them in the PIT.  Subsequent imports of the same module get
     them directly out of the PIT, avoiding slow lexing/parsing phases.
     Because foreign packages are assumed never to become out of date,
-   all contents of PIT remain valid forever.
-
-   Successful runs of @compile@ can add arbitrary numbers of new
-   interfaces to the PIT.  Failed runs could also contribute any new
-   interfaces read, but this could create inconsistencies between the
-   PIT and the unlinked images (UI).  Specifically, we don't want the
-   PIT to acquire interfaces for which UI hasn't got a corresponding
-   @Linkable@, and we don't want @Linkable@s from failed compilation
-   runs to enter UI, because we can't be sure that they are actually
-   necessary for a successful link.  So it seems simplest, albeit at a
-   small compilation speed loss, for @compile@ not to update PCS at
-   all following a failed compile.  We may revisit this
-   decision later.
+   all contents of PIT remain valid forever.  @compile@ of course
+   tries to find package interfaces in PIT in preference to reading
+   them from files.  
+
+   Both successful and failed runs of @compile@ can add arbitrary
+   numbers of new interfaces to the PIT.  Interfaces read by failed
+   runs do no harm: packages are assumed to be static, so anything
+   cached, even by a failed run, stays valid forever (i.e. for the
+   rest of the session).
  
     \item
        {\bf Package Symbol Table (PST)} @:: FiniteMap Module ModDetails@
@@ -422,20 +467,26 @@ before the upsweep.
     interfaces, and we don't want to do that unnecessarily.
  
     The PST avoids these problems by allowing incremental wiring-in to
-   happen.  Pieces of foreign interfaces are renamed and placed in the
-   PST, but only as @compile@ discovers it needs them.  In the process
-   of incremental renaming, @compile@ may need to read more package
-   interfaces, which are returned to CM to add to the PIT.
+   happen.  Pieces of foreign interfaces are copied out of the holding
+   pen (HP), renamed, typechecked, and placed in the PST, but only as
+   @compile@ discovers it needs them.  In the process of incremental
+   renaming/typechecking, @compile@ may need to read more package
+   interfaces, which are added to the PIT and hence to 
+   HP.~\ToDo{How? When?}
  
     CM passes the PST to @compile@ and is returned an updated version
-   on success.  On failure, @compile@ doesn't return an updated
-   version even though it might have created some updates on the way
-   to failure.  This seems necessary to retain the (thus far unstated)
-   invariant that PST only contains renamed fragments of interfaces in
-   PIT.
+   on both success and failure.
  
-   \item
-      {\bf Holding Pen (HP)} @:: Ifaces@
+   \item 
+      {\bf Holding Pen (HP)} @:: HoldingPen@ 
+
+   HP holds parsed, but not yet renamed or typechecked, fragments of
+   package interfaces.  As typechecking of other modules progresses,
+   fragments are removed (``slurped'') from HP, renamed and
+   typechecked, and placed in PCS.PST (see above).  Slurping a
+   fragment may require new interfaces to be read into HP.  The hope
+   is, though, that many fragments will never get slurped, reducing
+   the total number of interfaces read (as compared to eager slurping).
  
     \end{itemize}
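The PIT lookup described above amounts to a read-through cache.  A minimal sketch, assuming a hypothetical interface reader passed in as a function and a placeholder @ModIFace@ type; since package interfaces never go stale, nothing is ever evicted, and the updated PIT can be returned whether or not the surrounding compilation later fails.

```haskell
import qualified Data.Map as Map

type ModName = String

-- Hypothetical placeholder for a parsed package interface.
newtype ModIFace = ModIFace String deriving (Eq, Show)

type PIT = Map.Map ModName ModIFace

-- Look in the PIT first; only on a miss call the (hypothetical) reader,
-- and cache what it returns.  Entries are never evicted.
getPackageIface :: (ModName -> IO ModIFace) -> PIT -> ModName -> IO (PIT, ModIFace)
getPackageIface readIface pit m =
  case Map.lookup m pit of
    Just iface -> pure (pit, iface)
    Nothing -> do
      iface <- readIface m
      pure (Map.insert m iface pit, iface)
```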
  
@@ -453,28 +504,27 @@ before the upsweep.
  
  
  
-\subsubsection*{What {\tt compile} does}
+\subsubsection{What {\tt compile} does}
  @compile@ is necessarily somewhat complex.  We've decided to do away
-with private global variables -- they make the design harder to
-understand and may interfere with CM's need to roll the system back
-to a consistent state following compilation failure for modules in 
-a cycle.  Without further ado:
+with private global variables -- they make the design specification
+less clear, although the implementation might use them.  Without
+further ado:
  \begin{verbatim}
-   compile :: FLAGS       -- obvious
+   compile :: SI          -- obvious
             -> Finder      -- to find modules
             -> ModSummary  -- summary, including source
-           -> Maybe (ModIFace, Linkable)
-                          -- former summary and code, if avail
+           -> Maybe ModIFace
+                          -- former summary, if avail
             -> HST         -- for home module ModDetails
             -> PCS         -- IN: the persistent compiler state
  
-           -> CompResult
+           -> IO CompResult
  
     data CompResult
        = CompOK  ModDetails   -- new details (== HST additions)
-                (ModIFace, Linkable)
-                             -- summary and code; same as went in if
-                             -- compilation was not needed
+                (Maybe (ModIFace, Linkable))
+                             -- summary and code; Nothing => compilation
+                             -- not needed (old summary and code are still valid)
                  PCS          -- updated PCS
                  [SDoc]       -- warnings
  
@@ -483,7 +533,10 @@ a cycle.  Without further ado:
  
     data PCS
        = MkPCS PIT         -- package interfaces
-              PST         -- rename cache/global symtab contents
+              PST         -- post slurping global symtab contribs
+              HoldingPen  -- pre slurping interface bits and pieces
+
+   emptyPCS :: IO PCS     -- since CM has no other way to make one
  \end{verbatim}
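To make the @Maybe@ in @CompOK@ concrete, here is a sketch of what CM might do with it during the upsweep, using hypothetical placeholder types and a hypothetical @Tables@ record for HST, HIT and UI: HST always receives the new @ModDetails@, while HIT and UI change only when compilation actually happened.

```haskell
import qualified Data.Map as Map

type ModName = String

-- Hypothetical placeholders for compile's opaque results.
newtype ModDetails = ModDetails String deriving (Eq, Show)
newtype ModIFace = ModIFace String deriving (Eq, Show)
newtype Linkable = Linkable String deriving (Eq, Show)

-- Hypothetical record holding the CM-owned tables touched by a CompOK.
data Tables = Tables
  { hst :: Map.Map ModName ModDetails
  , hit :: Map.Map ModName ModIFace
  , ui  :: Map.Map ModName Linkable
  } deriving (Eq, Show)

-- HST always gets the new details; HIT and UI are updated only for Just,
-- i.e. when compile actually (re)compiled the module.
applyCompOK :: ModName -> ModDetails -> Maybe (ModIFace, Linkable) -> Tables -> Tables
applyCompOK m details mbNew t =
  case mbNew of
    Nothing -> t'
    Just (iface, linkable) ->
      t' { hit = Map.insert m iface (hit t), ui = Map.insert m linkable (ui t) }
  where
    t' = t { hst = Map.insert m details (hst t) }
```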
  Although @compile@ is passed three of the global structures (SI,
  HST and PCS), it only modifies PCS.  The rest are modified by CM as it
@@ -511,18 +564,18 @@ What @compile@ does: \ToDo{A bit vague ... needs refining.  How does
  
  \item
     If recompilation is not needed, create a new @ModDetails@ from the
-   old @ModIFace@, looking up information in HST and PCS.PST as necessary.
-   Return the new details, the old @ModIFace@ and @Linkable@, the PCS
-   \ToDo{I don't think the PCS should be updated, but who knows?}, and
-   an empty warning list.
+   old @ModIFace@, looking up information in HST and PCS.PST as
+   necessary.  Return the new details, a @Nothing@ indicating that
+   compilation was not needed, the PCS \ToDo{I don't think the PCS
+   should be updated, but who knows?}, and an empty warning list.
  
  \item
     Otherwise, compilation is needed.  
  
     If the module is only available in object+interface form, read the
     interface, make up details, create a linkable pointing at the
-   object code.  Does this involve reading any more interfaces?  Does
-   it involve updating PST?
+   object code.  \ToDo{Does this involve reading any more interfaces?  Does
+   it involve updating PST?}
     
     Otherwise, translate from source, then create and return: new
     details, interface, linkable, updated PST, and warnings.
@@ -535,6 +588,62 @@ What @compile@ does: \ToDo{A bit vague ... needs refining.  How does
     boot interface against the inferred interface.}
  \end{itemize}
  
+
+\subsubsection{Contents of \mbox{\tt ModDetails}, 
+               \mbox{\tt ModIFace} and \mbox{\tt HoldingPen}}
+Only @compile@ can see inside these three types -- they are opaque to
+everyone else.  @ModDetails@ holds the post-renaming,
+post-typechecking environment created by compiling a module.
+
+\begin{verbatim}
+   data ModDetails
+      = ModDetails {
+           moduleExports :: Avails,
+           moduleEnv     :: GlobalRdrEnv,    -- == FM RdrName [Name]
+           typeEnv       :: FM Name TyThing, -- TyThing is in TcEnv.lhs
+           instEnv       :: InstEnv,
+           fixityEnv     :: FM Name Fixity,
+           ruleEnv       :: FM Id [Rule]
+        }
+\end{verbatim}
+
+@ModIFace@ is nearly the same as @ParsedIface@ from @RnMonad.lhs@:
+\begin{verbatim}
+   type ModIFace = ParsedIface    -- not really, but ...
+   data ParsedIface
+      = ParsedIface {
+           pi_mod       :: Module,                   -- Complete with package info
+           pi_vers      :: Version,                  -- Module version number
+           pi_orphan    :: WhetherHasOrphans,        -- Whether this module has orphans
+           pi_usages    :: [ImportVersion OccName],  -- Usages
+           pi_exports   :: [ExportItem],             -- Exports
+           pi_insts     :: [RdrNameInstDecl],        -- Local instance declarations
+           pi_decls     :: [(Version, RdrNameHsDecl)],    -- Local definitions
+           pi_fixity    :: (Version, [RdrNameFixitySig]), -- Local fixity declarations, 
+                                                          -- with their version
+           pi_rules     :: (Version, [RdrNameRuleDecl]),  -- Rules, with their version
+           pi_deprecs   :: [RdrNameDeprecation]           -- Deprecations
+       }
+\end{verbatim}
+
+@HoldingPen@ is a cleaned-up version of the holding pen found in
+@RnMonad.lhs@, retaining just the three fields that actually make it up:
+\begin{verbatim}
+   data HoldingPen 
+      = HoldingPen {
+           iDecls :: DeclsMap,     -- A single, global map of Names to decls
+
+           iInsts :: IfaceInsts,
+           -- The as-yet un-slurped instance decls; this bag is depleted when we
+           -- slurp an instance decl so that we don't slurp the same one twice.
+           -- Each is 'gated' by the names that must be available before
+           -- this instance decl is needed.
+
+           iRules :: IfaceRules
+           -- Similar to instance decls, only for rules
+        }
+\end{verbatim}
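The gating of instance declarations can be sketched as follows, assuming (hypothetically) that an un-slurped instance is represented by its gate names and its declaration text, and that the bag is a plain list.  An instance is slurped once every one of its gates is available; slurped instances leave the pen, which is what stops them being slurped twice.

```haskell
import Data.List (partition)
import qualified Data.Set as Set

type Name = String

-- Hypothetical stand-in for an un-slurped instance declaration: the names
-- gating it, and the declaration itself (here just its text).
data GatedInst = GatedInst { gates :: [Name], instDecl :: String }

-- Slurp every instance whose gates are all available; the rest form the
-- new, depleted bag, so no instance can be slurped twice.
slurpInsts :: Set.Set Name -> [GatedInst] -> ([String], [GatedInst])
slurpInsts avail pen = (map instDecl ready, waiting)
  where
    (ready, waiting) = partition isOpen pen
    isOpen inst = all (`Set.member` avail) (gates inst)
```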
+
  %%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
  \subsection{The linker (\mbox{\tt link})}
  \label{sec:linker}
@@ -542,10 +651,10 @@ What @compile@ does: \ToDo{A bit vague ... needs refining.  How does
  \subsubsection{Data structures owned by the linker}
  
  In the same way that @compile@ has a persistent compiler state (PCS),
-the linker has a persistent (session-lifetime) state, LPS, the
-Linker's Persistent State.  In batch mode LPS is entirely irrelevant,
+the linker has a persistent (session-lifetime) state, PLS, the
+Persistent Linker State.  In batch mode PLS is entirely irrelevant,
  because there is only a single link step, and can be a unit value
-ignored by everybody.  In interactive mode LPS is composed of the
+ignored by everybody.  In interactive mode PLS is composed of the
  following three parts:
  
  \begin{itemize}
@@ -628,7 +737,7 @@ following three parts:
           indistinguishably from compiled versions of the same code.
     \end{itemize}
     Because object code is outside the heap and never deallocated,
-   whilst interpreted code is held alive by the OST, there's no need
+   whilst interpreted code is held alive via the HST, there's no need
     to have a data structure which ``is'' the linked image.
  
     For batch compilation, LI doesn't exist because OST doesn't exist,
@@ -637,7 +746,8 @@ following three parts:
  
     \ToDo{Do we need to say anything about CAFs and SRTs?  Probably ...}
  \end{itemize}
-
+As with PCS, CM has no way to create an initial PLS, so we supply
+@emptyPLS@ for that purpose.
  
  \subsubsection{The linker's interface}
  
@@ -646,12 +756,14 @@ than passed around explicitly.  (The same might be true for PCS).
  Anyway:
  
  \begin{verbatim}
-   data PCS -- as described above; opaque to everybody except the linker
+   data PLS -- as described above; opaque to everybody except the linker
+
+   link :: PCI -> ??? -> [[Linkable]] -> PLS -> IO LinkResult
  
-   link :: PCI -> ??? -> [[Linkable]] -> LinkState -> IO LinkResult
+   data LinkResult = LinkOK   PLS
+                   | LinkErrs PLS [SDoc]
  
-   data LinkResult = LinkOK   LinkState
-                   | LinkErrs LinkState [SDoc]
+   emptyPLS :: IO PLS     -- since CM has no other way to make one
  \end{verbatim}
  
  CM uses @link@ as follows:
@@ -691,7 +803,7 @@ it may not be possible to link recursive groups containing libraries.
  \end{itemize}
  
  If linking in of a group should fail for some reason, @link@ should
-not modify its @LinkState@ at all.  In other words, linking each group
+not modify its PLS at all.  In other words, linking each group
  is atomic; it either succeeds or fails.
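A sketch of atomic group linking, assuming a toy model in which a @Linkable@ just lists the symbols it defines and needs, and the PLS is the set of symbols linked so far.  @linkGroups@ is a hypothetical, pure simplification of @link@ (it drops the PCI and @???@ arguments): a group may be recursive, so its own definitions count as resolved, and a group with unresolved references leaves the PLS exactly as it was before that group.

```haskell
import qualified Data.Set as Set

type Symbol = String

-- Hypothetical stand-in for a Linkable: what it defines and what it needs.
data Linkable = Linkable { defines :: [Symbol], needs :: [Symbol] }

-- Hypothetical PLS: just the set of symbols linked so far.
newtype PLS = PLS (Set.Set Symbol) deriving (Eq, Show)

data LinkResult = LinkOK PLS | LinkErrs PLS [String] deriving (Eq, Show)

-- Link groups in order.  Members of one group may refer to each other,
-- so the group's own definitions are visible while checking it.  A group
-- with unresolved references is rejected whole: the PLS returned is the
-- one from before that group, so earlier groups stay linked.
linkGroups :: [[Linkable]] -> PLS -> LinkResult
linkGroups [] pls = LinkOK pls
linkGroups (grp : grps) pls@(PLS syms) =
  case [ s | l <- grp, s <- needs l, not (s `Set.member` syms') ] of
    []         -> linkGroups grps (PLS syms')
    unresolved -> LinkErrs pls [ "unresolved symbol: " ++ s | s <- unresolved ]
  where
    syms' = syms `Set.union` Set.fromList (concatMap defines grp)
```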
  
  \subsubsection*{\mbox{\tt Unlinked} and \mbox{\tt Linkable}}