Reorganisation of the source tree

diff --git a/docs/ghci/ghci.tex b/docs/ghci/ghci.tex

new file mode 100644 (file)

index 0000000..c4638a6
--- /dev/null
+++ b/docs/ghci/ghci.tex
@@ -0,0 +1,1598 @@
+%
+% (c) The OBFUSCATION-THROUGH-GRATUITOUS-PREPROCESSOR-ABUSE Project,
+%     Glasgow University, 1990-2000
+%
+
+% \documentstyle[preprint]{acmconf}
+\documentclass[11pt]{article}
+\oddsidemargin 0.1 in       %   Note that \oddsidemargin = \evensidemargin
+\evensidemargin 0.1 in
+\marginparwidth 0.85in    %   Narrow margins require narrower marginal notes
+\marginparsep 0 in 
+\sloppy
+
+%\usepackage{epsfig}
+\usepackage{shortvrb}
+\MakeShortVerb{\@}
+
+%\newcommand{\note}[1]{{\em Note: #1}}
+\newcommand{\note}[1]{{{\bf Note:}\sl #1}}
+\newcommand{\ToDo}[1]{{{\bf ToDo:}\sl #1}}
+\newcommand{\Arg}[1]{\mbox{${\tt arg}_{#1}$}}
+\newcommand{\bottom}{\perp}
+
+\newcommand{\secref}[1]{Section~\ref{sec:#1}}
+\newcommand{\figref}[1]{Figure~\ref{fig:#1}}
+\newcommand{\Section}[2]{\section{#1}\label{sec:#2}}
+\newcommand{\Subsection}[2]{\subsection{#1}\label{sec:#2}}
+\newcommand{\Subsubsection}[2]{\subsubsection{#1}\label{sec:#2}}
+
+% DIMENSION OF TEXT:
+\textheight 8.5 in
+\textwidth 6.25 in
+
+\topmargin 0 in
+\headheight 0 in
+\headsep .25 in
+
+
+\setlength{\parskip}{0.15cm}
+\setlength{\parsep}{0.15cm}
+\setlength{\topsep}{0cm}       % Reduces space before and after verbatim,
+                               % which is implemented using trivlist 
+\setlength{\parindent}{0cm}
+
+\renewcommand{\textfraction}{0.2}
+\renewcommand{\floatpagefraction}{0.7}
+
+\begin{document}
+
+\title{The GHCi Draft Design, round 2}
+\author{MSR Cambridge Haskell Crew \\
+        Microsoft Research Ltd., Cambridge}
+
+\maketitle
+
+%%%\tableofcontents
+%%%\newpage
+
+%%-----------------------------------------------------------------%%
+\section{Details}
+
+\subsection{Outline of the design}
+\label{sec:details-intro}
+
+The design falls into three major parts:
+\begin{itemize}
+\item The compilation manager (CM), which coordinates the 
+      system and supplies a HEP-like interface to clients.
+\item The module compiler (@compile@), which translates individual
+      modules to interpretable or machine code.
+\item The linker (@link@),
+      which maintains the executable image in interpreted mode.
+\end{itemize}
+
+There are also three auxiliary parts: the finder, which locates
+source, object and interface files, the summariser, which quickly
+finds dependency information for modules, and the static info
+(compiler flags and package details), which is unchanged over the
+course of a session.
+
+This section continues with an overview of the session-lifetime data
+structures.  Then follows the finder (section~\ref{sec:finder}),
+summariser (section~\ref{sec:summariser}), 
+static info (section~\ref{sec:staticinfo}),
+and finally the three big sections
+(\ref{sec:manager},~\ref{sec:compiler},~\ref{sec:linker})
+on the compilation manager, compiler and linker respectively.
+
+\subsubsection*{Some terminology}
+
+Lifetimes: the phrase {\bf session lifetime} covers a complete run of
+GHCI, encompassing multiple recompilation runs.  {\bf Module lifetime}
+is much shorter: it is that of data which is needed to translate a
+single module and then discarded, for example Core, AbstractC and
+Stix trees.
+
+Data structures with module lifetime are well documented and understood.
+This document is mostly concerned with session-lifetime data.
+Most of these structures are ``owned'' by CM, since that's
+the only major component of GHCI which deals with session-lifetime
+issues. 
+
+Modules and packages: {\bf home} refers to modules in this package,
+precisely the ones tracked and updated by the compilation manager.
+{\bf Package} refers to all other packages, which are assumed static.
+
+\subsubsection*{A summary of all session-lifetime data structures}
+
+These structures have session lifetime but not necessarily global
+visibility.  Subsequent sections elaborate who can see what.
+\begin{itemize}
+\item {\bf Home Symbol Table (HST)} (owner: CM) holds the post-renaming
+      environments created by compiling each home module.
+\item {\bf Home Interface Table (HIT)} (owner: CM) holds in-memory
+      representations of the interface file created by compiling 
+      each home module.
+\item {\bf Unlinked Images (UI)} (owner: CM) are executable but as-yet
+      unlinked translations of home modules only.
+\item {\bf Module Graph (MG)} (owner: CM) is the current module graph.
+\item {\bf Static Info (SI)} (owner: CM) is the package configuration
+      information (PCI) and compiler flags (FLAGS).
+\item {\bf Persistent Compiler State (PCS)} (owner: @compile@)
+      is @compile@'s private cache of information about package
+      modules.
+\item {\bf Persistent Linker State (PLS)} (owner: @link@) is
+      @link@'s private information concerning the current 
+      state of the (in-memory) executable image.
+\end{itemize}
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The finder (\mbox{\tt type Finder})}
+\label{sec:finder}
+
+@Path@ could be an indication of a location in a filesystem, or it
+could be some more generic kind of resource identifier, a URL for
+example.
+\begin{verbatim}
+   data Path = ...
+\end{verbatim}
+
+And some names.  @Module@s are now used as primary keys for various
+maps, so they are given a @Unique@.
+\begin{verbatim}
+   type ModName = String      -- a module name
+   type PkgName = String      -- a package name
+   type Module  = -- contains ModName and a Unique, at least
+\end{verbatim}
+
+A @ModLocation@ says where a module is, what it's called and in what
+form it is.
+\begin{verbatim}
+   data ModLocation = SourceOnly Module Path         -- .hs
+                    | ObjectCode Module Path Path    -- .o, .hi
+                    | InPackage  Module PkgName
+                          -- examine PCI to determine package Path
+\end{verbatim}
+
+The module finder generates @ModLocation@s from @ModName@s.  We expect
+it will assume packages to be static, but we want to be able to track
+changes in home modules during the session.  Specifically, we want to
+be able to notice that a module's object and interface have been
+updated, presumably by a compile run outside of the GHCI session.
+Hence the two-stage type:
+\begin{verbatim}
+   type Finder = ModName -> IO ModLocation
+   newFinder :: PCI -> IO Finder
+\end{verbatim}
+@newFinder@ examines the package information right at the start, but 
+returns an @IO@-typed function which can inspect home module changes
+later in the session.
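+
+A minimal sketch of this two-stage shape follows.  It is an
+illustration only, under several assumptions not fixed by this
+design: @Path@ is a plain @FilePath@, @ModLocation@ carries a
+@ModName@ rather than a @Module@, PCI is cut down to a list of
+packages with the modules they supply, home modules sit in the
+current directory, and object code is preferred to source when both
+exist.
+\begin{verbatim}
+   import qualified Data.Map as Map
+   import System.Directory (doesFileExist)
+
+   type ModName = String
+   type PkgName = String
+   type Path    = FilePath              -- assumption: plain file paths
+
+   -- Assumption: keyed by ModName; the design uses Module here.
+   data ModLocation = SourceOnly ModName Path         -- .hs
+                    | ObjectCode ModName Path Path    -- .o, .hi
+                    | InPackage  ModName PkgName
+                    deriving (Eq, Show)
+
+   -- Assumption: PCI reduced to (package, modules it supplies).
+   type PCI    = [(PkgName, [ModName])]
+   type Finder = ModName -> IO ModLocation
+
+   newFinder :: PCI -> IO Finder
+   newFinder pci = do
+     -- Stage one: index the (static) package modules exactly once.
+     let pkgIndex = Map.fromList [ (m, p) | (p, ms) <- pci, m <- ms ]
+     -- Stage two: the returned function looks at the file system
+     -- afresh on every call, so it can notice home module changes.
+     return $ \m -> case Map.lookup m pkgIndex of
+       Just p  -> return (InPackage m p)
+       Nothing -> do
+         let obj = m ++ ".o"
+             hi  = m ++ ".hi"
+             src = m ++ ".hs"
+         haveObj <- (&&) <$> doesFileExist obj <*> doesFileExist hi
+         haveSrc <- doesFileExist src
+         if haveObj
+           then return (ObjectCode m obj hi)
+           else if haveSrc
+             then return (SourceOnly m src)
+             else ioError (userError ("module not found: " ++ m))
+
+   main :: IO ()
+   main = do
+     finder <- newFinder [("std", ["Prelude"])]
+     loc    <- finder "Prelude"
+     print loc
+\end{verbatim}
+The package index is built once, when @newFinder@ runs, whereas the
+file-existence checks are repeated at every call, so a later call can
+see home files that have appeared since.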
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The summariser (\mbox{\tt summarise})}
+\label{sec:summariser}
+
+A @ModSummary@ records the minimum information needed to establish the
+module graph and determine whose source has changed.  @ModSummary@s
+can be created quickly.
+\begin{verbatim}
+   data ModSummary = ModSummary 
+                        ModLocation   -- location and kind
+                        (Maybe (String, Fingerprint))
+                                      -- source and fingerprint if .hs
+                        (Maybe [ModName])     -- imports if .hs or .hi
+
+   type Fingerprint = ...  -- file timestamp, or source checksum?
+
+   summarise :: ModLocation -> IO ModSummary
+\end{verbatim}
+
+The summary contains the location and source text, and the location
+contains the name.  We would like to remove the assumption that
+sources live on disk, but I'm not sure this is good enough yet.
+
+\ToDo{Should @ModSummary@ contain source text for interface files too?}
+\ToDo{Also say that @ModIFace@ contains its module's @ModSummary@  (why?).}
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{Static information (SI)}
+\label{sec:staticinfo}
+
+PCI, the package configuration information, is a list of @PkgInfo@,
+each containing at least the following:
+\begin{verbatim}
+   data PkgInfo
+      = PkgInfo PkgName    -- my name
+                Path       -- path to my base location
+                [PkgName]  -- who I depend on
+                [ModName]  -- modules I supply
+                [Unlinked] -- paths to my object files
+
+   type PCI = [PkgInfo]
+\end{verbatim}
+The @Path@s in it, including those in the @Unlinked@s, are set up
+when GHCI starts.  
+
+FLAGS is a bunch of compiler options.  We haven't figured out yet how
+to partition them into those for the whole session vs those for
+specific source files, so currently the best we can do is:
+\begin{verbatim}
+   data FLAGS = ...
+\end{verbatim}
+
+The static information (SI) is the pair of these:
+\begin{verbatim}
+   data SI = SI PCI
+                FLAGS
+\end{verbatim}
+
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The Compilation Manager (CM)}
+\label{sec:manager}
+
+\subsubsection{Data structures owned by CM}
+
+CM maintains two maps (HST, HIT) and a set (UI).  It's important to
+realise that CM only knows about the map/set-ness, and has no idea
+what a @ModDetails@, @ModIFace@ or @Linkable@ is.  Only @compile@ and
+@link@ know that, and CM passes these types around without
+inspecting them.
+
+\begin{itemize}
+\item
+   {\bf Home Symbol Table (HST)} @:: FiniteMap Module ModDetails@
+
+   The @ModDetails@ (a couple of layers down) contain tycons, classes,
+   instances, etc, collectively known as ``entities''.  References from
+   other modules to these entities are direct, with no intervening
+   indirections of any kind; conversely, these entities refer directly
+   to other entities, regardless of module boundaries.  HST only holds
+   information for home modules; the corresponding wired-up details
+   for package (non-home) modules are created on demand in the package
+   symbol table (PST) inside the persistent compiler's state (PCS).
+
+   CM maintains the HST, which is passed to, but not modified by,
+   @compile@.  If compilation of a module is successful, @compile@
+   returns the resulting @ModDetails@ (inside the @CompResult@) which
+   CM then adds to HST.
+
+   CM throws away arbitrarily large parts of HST at the start of a
+   rebuild, and uses @compile@ to incrementally reconstruct it.
+
+\item
+   {\bf Home Interface Table (HIT)} @:: FiniteMap Module ModIFace@
+
+   (Completely private to CM; nobody else sees this).
+
+   Compilation of a module always creates a @ModIFace@, which contains
+   the unlinked symbol table entries.  CM maintains this @FiniteMap@
+   @Module@ @ModIFace@, with session lifetime.  CM never throws away
+   @ModIFace@s, but it does update them, by passing old ones to
+   @compile@ if they exist, and getting new ones back.
+
+   CM acquires @ModIFace@s from @compile@, which it only applies
+   to modules in the home package.  As a result, HIT only contains
+   @ModIFace@s for modules in the home package.  Those from other
+   packages reside in the package interface table (PIT) which is a
+   component of PCS.
+
+\item
+   {\bf Unlinked Images (UI)} @:: Set Linkable@
+
+   The @Linkable@s in UI represent executable but as-yet unlinked
+   module translations.  A @Linkable@ can contain the name of an
+   object, archive or DLL file.  In interactive mode, it may also be
+   the STG trees derived from translating a module.  So @compile@
+   returns a @Linkable@ from each successful run, namely that of
+   translating the module at hand.  
+
+   At link-time, CM supplies @Linkable@s for the upwards closure of
+   all modules which have changed, to @link@.  It also examines the
+   @ModSummary@s for all home modules, and by examining their imports
+   and the SI.PCI (package configuration info) it can determine the
+   @Linkable@s from all required imported packages too.
+
+   @Linkable@s and @ModIFace@s have a close relationship.  Each
+   translated module has a corresponding @Linkable@ somewhere.
+   However, there may be @Linkable@s with no corresponding modules
+   (the RTS, for example).  Conversely, multiple modules may share a
+   single @Linkable@ -- as is the case for any module from a
+   multi-module package.  For these reasons it seems appropriate to
+   keep the two concepts distinct.  @Linkable@s also provide
+   information about the sequence in which individual package
+   components should be linked, and that isn't the business of any
+   specific module to know.
+
+   CM passes @compile@ a module's old @ModIFace@, if it has one, in
+   the hope that the module won't need recompiling.  If it doesn't,
+   @compile@ can just return the new @ModDetails@ created from the old
+   @ModIFace@, and CM will re-use that @ModIFace@.  If the module
+   {\em is} recompiled (or scheduled to be loaded from disk),
+   @compile@ returns both the new @ModIFace@ and new @Linkable@.
+
+\item 
+   {\bf Module Graph (MG)} @:: known-only-to-CM@
+
+   Records, for CM's purposes, the current module graph,
+   up-to-dateness and summaries.  More details when I get to them.
+   Only contains home modules.
+\end{itemize}
+Probably all this stuff is rolled together into the Persistent CM
+State (PCMS):
+\begin{verbatim}
+  data PCMS = PCMS HST HIT UI MG
+  emptyPCMS :: IO PCMS
+\end{verbatim}
+
+\subsubsection{What CM implements}
+It pretty much implements the HEP interface.  First, though, define a 
+containing structure for the state of the entire CM system and its
+subsystems @compile@ and @link@:
+\begin{verbatim}
+   data CmState 
+      = CmState PCMS      -- CM's stuff
+                PCS       -- compile's stuff
+                PLS       -- link's stuff
+                SI        -- the static info, never changes
+                Finder    -- the finder
+\end{verbatim}
+
+The @CmState@ is threaded through the HEP interface.  In reality
+this might be done using @IORef@s, but for clarity:
+\begin{verbatim}
+  type ModHandle = ... (opaque to CM/HEP clients) ...
+  type HValue    = ... (opaque to CM/HEP clients) ...
+
+  cmInit       :: FLAGS 
+               -> [PkgInfo]
+               -> IO CmState
+
+  cmLoadModule :: CmState 
+               -> ModName 
+               -> IO (CmState, Either [SDoc] ModHandle)
+
+  cmGetExpr    :: ModHandle 
+               -> CmState 
+               -> String -> IO (CmState, Either [SDoc] HValue)
+
+  cmRunExpr    :: HValue -> IO ()   -- don't need CmState here
+\end{verbatim}
+Almost all the huff and puff in this document pertains to @cmLoadModule@.
+
+
+\subsubsection{Implementing \mbox{\tt cmInit}}
+@cmInit@ creates an initial @CmState@ from @emptyPCMS@, @emptyPCS@
+and @emptyPLS@, an SI made from the supplied flags and package info,
+and a @Finder@ obtained by passing the package info to @newFinder@.
+
+
+\subsubsection{Implementing \mbox{\tt cmLoadModule}}
+
+\begin{enumerate}
+\item {\bf Downsweep:} using @finder@ and @summarise@, chase from 
+      the given module to
+      establish the new home module graph (MG).  Do not chase into
+      package modules.
+\item Remove from HIT, HST, UI any modules in the old MG which are
+      not in the new one.  The old MG is then replaced by the new one.
+\item Topologically sort MG to generate a bottom-to-top traversal
+      order, giving a worklist.
+\item {\bf Upsweep:} call @compile@ on each module in the worklist in 
+      turn, passing it
+      the ``correct'' HST, PCS, the old @ModIFace@ if
+      available, and the summary.  ``Correct'' HST in the sense that
+      HST contains only the modules in this module's downward
+      closure, so that @compile@ can construct the correct instance
+      and rule environments simply as the union of those in 
+      the module's downward closure.
+
+      If @compile@ doesn't return a new interface/linkable pair,
+      compilation wasn't necessary.  Either way, update HST with
+      the new @ModDetails@; if a compilation {\em did} occur, also
+      update HIT with the new interface and UI with the new linkable.
+
+      Keep going until the root module is successfully done, or
+      compilation fails.
+      
+\item If the previous step terminated because compilation failed,
+      define the successful set as those modules in successfully
+      completed SCCs, i.e. all @Linkable@s returned by @compile@ excluding
+      those from modules in any cycle which includes the module which failed.
+      Remove from HST, HIT, UI and MG all modules mentioned in MG which 
+      are not in the successful set.  Call @link@ with the successful
+      set,
+      which should succeed.  The net effect is to back off to a point
+      in which those modules which are still aboard are correctly
+      compiled and linked.
+
+      If the previous step terminated successfully, 
+      call @link@ passing it the @Linkable@s in the upward closure of
+      all those modules for which @compile@ produced a new @Linkable@.
+\end{enumerate}
+As a small optimisation, do this:
+\begin{enumerate}
+\item[3a.] Remove from the worklist any module M where M's source
+     hasn't changed and neither has the source of any module in M's
+     downward closure.  This has the effect of not starting the upsweep
+     right at the bottom of the graph when that's not needed.
+     Source-change checking can be done quickly by CM by comparing
+     summaries of modules in MG against corresponding 
+     summaries from the old MG.
+\end{enumerate}
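+
+Steps 3 and 3a can be sketched as follows.  This is an illustration
+under stated assumptions, not the planned implementation: a summary
+is cut down to a fingerprint and a list of imports, @Fingerprint@ is
+an @Int@, MG is a plain map from module names to summaries, and the
+sorting is borrowed from @Data.Graph@, which also groups cyclic
+imports into recursive groups.  The names @Summary@, @worklist@ and
+@prune@ are hypothetical.
+\begin{verbatim}
+   import Data.Graph (flattenSCC, stronglyConnComp)
+   import qualified Data.Map as Map
+   import qualified Data.Set as Set
+
+   type ModName     = String
+   type Fingerprint = Int               -- assumption: stand-in type
+
+   -- Assumption: only the parts of a summary needed here.
+   data Summary = Summary { fingerprint :: Fingerprint
+                          , imports     :: [ModName] }
+
+   type MG = Map.Map ModName Summary
+
+   -- Step 3: bottom-to-top order, one entry per recursive group.
+   -- Imports that are not keys of MG (package modules) are ignored.
+   worklist :: MG -> [[ModName]]
+   worklist mg =
+     map flattenSCC
+         (stronglyConnComp
+            [ (m, m, imports s) | (m, s) <- Map.toList mg ])
+
+   -- Step 3a: keep a group only if one of its members has changed
+   -- source, or imports a module that was itself kept.
+   prune :: MG -> MG -> [[ModName]] -> [[ModName]]
+   prune old new = go Set.empty
+     where
+       changed m   = fmap fingerprint (Map.lookup m old)
+                       /= fmap fingerprint (Map.lookup m new)
+       importsOf m = maybe [] imports (Map.lookup m new)
+       go _ [] = []
+       go stale (grp : rest)
+         | any dirty grp = grp : go (foldr Set.insert stale grp) rest
+         | otherwise     = go stale rest
+         where
+           dirty m = changed m
+                  || any (`Set.member` stale) (importsOf m)
+
+   main :: IO ()
+   main = do
+     let old = Map.fromList
+                 [ ("A", Summary 1 []),    ("B", Summary 1 ["A"])
+                 , ("C", Summary 1 ["B"]), ("D", Summary 1 ["A"]) ]
+         new = Map.insert "B" (Summary 2 ["A"]) old  -- B was edited
+     print (prune old new (worklist new))
+\end{verbatim}
+Because the worklist is already in bottom-to-top order, a single
+left-to-right pass suffices: by the time a group is examined, every
+module in its downward closure has already been classified.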
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The compiler (\mbox{\tt compile})}
+\label{sec:compiler}
+
+\subsubsection{Data structures owned by \mbox{\tt compile}}
+
+{\bf Persistent Compiler State (PCS)} @:: known-only-to-compile@
+
+This contains info about foreign packages only, acting as a cache,
+which is private to @compile@.  The cache never becomes out of
+date.  There are three parts to it:
+
+   \begin{itemize}
+   \item
+      {\bf Package Interface Table (PIT)} @:: FiniteMap Module ModIFace@
+
+   @compile@ reads interfaces from modules in foreign packages, and
+   caches them in the PIT.  Subsequent imports of the same module get
+   them directly out of the PIT, avoiding slow lexing/parsing phases.
+   Because foreign packages are assumed never to become out of date,
+   all contents of PIT remain valid forever.  @compile@ of course
+   tries to find package interfaces in PIT in preference to reading
+   them from files.  
+
+   Both successful and failed runs of @compile@ can add arbitrary
+   numbers of new interfaces to the PIT.  The failed runs don't matter
+   because we assume that packages are static, so the data cached even
+   by a failed run is valid forever (ie for the rest of the session).
+
+   \item
+      {\bf Package Symbol Table (PST)} @:: FiniteMap Module ModDetails@
+
+   Adding a package interface to PIT doesn't make it directly usable
+   to @compile@, because it first needs to be wired (renamed +
+   typechecked) into the spaghetti of the HST.  On the other hand,
+   most modules only use a few entities from any imported interface,
+   so wiring-in the interface at PIT-entry time might be a big time
+   waster.  Also, wiring in an interface could mean reading other
+   interfaces, and we don't want to do that unnecessarily.
+
+   The PST avoids these problems by allowing incremental wiring-in to
+   happen.  Pieces of foreign interfaces are copied out of the holding
+   pen (HP), renamed, typechecked, and placed in the PST, but only as
+   @compile@ discovers it needs them.  In the process of incremental
+   renaming/typechecking, @compile@ may need to read more package
+   interfaces, which are added to the PIT and hence to 
+   HP.~\ToDo{How? When?}
+
+   CM passes the PST to @compile@ and is returned an updated version
+   on both success and failure.
+
+   \item 
+      {\bf Holding Pen (HP)} @:: HoldingPen@ 
+
+   HP holds parsed but not-yet renamed-or-typechecked fragments of
+   package interfaces.  As typechecking of other modules progresses,
+   fragments are removed (``slurped'') from HP, renamed and
+   typechecked, and placed in PCS.PST (see above).  Slurping a
+   fragment may require new interfaces to be read into HP.  The hope
+   is, though, that many fragments will never get slurped, reducing
+   the total number of interfaces read (as compared to eager slurping).
+
+   \end{itemize}
+
+   PCS is opaque to CM; only @compile@ knows what's in it, and how to
+   update it.  Because packages are assumed static, PCS never becomes
+   out of date.  So CM only needs to be able to create an empty PCS,
+   with @emptyPCS@, and thence just passes it through @compile@ with
+   no further ado.
+
+   In return, @compile@ must promise not to store in PCS any
+   information pertaining to the home modules.  If it did so, CM would
+   need to have a way to remove this information prior to commencing a
+   rebuild, which conflicts with PCS's opaqueness to CM.
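+
+   The cache discipline described for the PIT can be sketched as
+   below.  This is only an illustration: @IFace@ is a hypothetical
+   stand-in for @ModIFace@, the PIT is keyed by @ModName@ rather than
+   @Module@, and the function which reads an interface from disk is
+   passed in as a parameter instead of being specified.
+\begin{verbatim}
+   import qualified Data.Map as Map
+
+   type ModName  = String
+   newtype IFace = IFace String deriving Show  -- assumption: stand-in
+
+   type PIT = Map.Map ModName IFace
+
+   -- Look in the PIT first; read from disk only on a miss, and
+   -- remember the result.  Entries are never invalidated, since
+   -- packages are assumed static.
+   getPkgIFace :: (ModName -> IO IFace)    -- hypothetical disk reader
+               -> ModName -> PIT -> IO (IFace, PIT)
+   getPkgIFace readFromDisk m pit =
+     case Map.lookup m pit of
+       Just iface -> return (iface, pit)
+       Nothing    -> do
+         iface <- readFromDisk m
+         return (iface, Map.insert m iface pit)
+
+   main :: IO ()
+   main = do
+     let reader m = do putStrLn ("reading " ++ m)
+                       return (IFace m)
+     (_, pit1) <- getPkgIFace reader "Prelude" Map.empty
+     (i, _)    <- getPkgIFace reader "Prelude" pit1  -- served from PIT
+     print i
+\end{verbatim}
+   The updated PIT is returned even when the caller later fails,
+   which matches the rule that failed runs may still extend the cache.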
+
+
+
+
+\subsubsection{What {\tt compile} does}
+@compile@ is necessarily somewhat complex.  We've decided to do away
+with private global variables -- they make the design specification
+less clear, although the implementation might use them.  Without
+further ado:
+\begin{verbatim}
+   compile :: SI          -- obvious
+           -> Finder      -- to find modules
+           -> ModSummary  -- summary, including source
+           -> Maybe ModIFace
+                          -- former summary, if avail
+           -> HST         -- for home module ModDetails
+           -> PCS         -- IN: the persistent compiler state
+
+           -> IO CompResult
+
+   data CompResult
+      = CompOK  ModDetails   -- new details (== HST additions)
+                (Maybe (ModIFace, Linkable))
+                             -- summary and code; Nothing => compilation
+                             -- not needed (old summary and code are still valid)
+                PCS          -- updated PCS
+                [SDoc]       -- warnings
+
+      | CompErrs PCS         -- updated PCS
+                 [SDoc]      -- warnings and errors
+
+   data PCS
+      = MkPCS PIT         -- package interfaces
+              PST         -- post slurping global symtab contribs
+              HoldingPen  -- pre slurping interface bits and pieces
+
+   emptyPCS :: IO PCS     -- since CM has no other way to make one
+\end{verbatim}
+Although @compile@ is passed three of the global structures (FLAGS,
+HST and PCS), it only modifies PCS.  The rest are modified by CM as it
+sees fit, from the stuff returned in the @CompResult@.
+
+@compile@ is allowed to return an updated PCS even if compilation
+errors occur, since the information in it pertains only to foreign
+packages and is assumed to be always-correct.
+
+What @compile@ does: \ToDo{A bit vague ... needs refining.  How does
+                           @finder@ come into the game?}
+\begin{itemize}
+\item Figure out if this module needs recompilation.
+   \begin{itemize}
+   \item If there's no old @ModIFace@, it does.  Else:
+   \item Compare the @ModSummary@ supplied with that in the
+         old @ModIFace@.  If the source has changed, recompilation
+         is needed.  Else:
+   \item Compare the usage version numbers in the old @ModIFace@ with
+         those in the imported @ModIFace@s.  All needed interfaces
+         for this should be in either HIT or PIT.  If any version
+         numbers differ, recompilation is needed.
+   \item Otherwise it isn't needed.   
+   \end{itemize}
+
+\item
+   If recompilation is not needed, create a new @ModDetails@ from the
+   old @ModIFace@, looking up information in HST and PCS.PST as
+   necessary.  Return the new details, a @Nothing@ denoting
+   compilation was not needed, the PCS \ToDo{I don't think the PCS
+   should be updated, but who knows?}, and an empty warning list.
+
+\item
+   Otherwise, compilation is needed.  
+
+   If the module is only available in object+interface form, read the
+   interface, make up details, create a linkable pointing at the
+   object code.  \ToDo{Does this involve reading any more interfaces?  Does
+   it involve updating PST?}
+   
+   Otherwise, translate from source, then create and return: new
+   details, interface, linkable, updated PST, and warnings.
+
+   When looking for a new interface, search HST, then PCS.PIT, and only
+   then read from disk.  In which case add the new interface(s) to
+   PCS.PIT.  
+   
+   \ToDo{If compiling a module with a boot-interface file, check the 
+   boot interface against the inferred interface.}
+\end{itemize}
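+
+The recompilation test at the head of this list can be sketched as a
+pure function.  The sketch rests on assumptions which the design
+leaves open: @Fingerprint@ and @Version@ are taken to be @Int@s, an
+interface is reduced to the three fields the test needs, a single map
+stands for the union of HIT and PIT, and an import whose interface
+cannot be found is treated as changed.  The names @IFace@, @Verdict@
+and @check@ are hypothetical.
+\begin{verbatim}
+   import qualified Data.Map as Map
+
+   type ModName     = String
+   type Version     = Int
+   type Fingerprint = Int               -- assumption: stand-in type
+
+   -- Assumption: only the parts of an interface the test needs.
+   data IFace = IFace
+     { ifVersion :: Version
+     , ifSource  :: Fingerprint          -- source it was built from
+     , ifUsages  :: [(ModName, Version)] -- import versions seen then
+     }
+
+   data Verdict = NoOldInterface
+                | SourceChanged
+                | ImportChanged ModName
+                | UpToDate
+                deriving (Eq, Show)
+
+   check :: Fingerprint            -- fingerprint of current source
+         -> Maybe IFace            -- old interface, if any
+         -> Map.Map ModName IFace  -- stands for HIT and PIT together
+         -> Verdict
+   check _  Nothing    _ = NoOldInterface
+   check fp (Just old) known
+     | fp /= ifSource old = SourceChanged
+     | otherwise =
+         case [ m | (m, v) <- ifUsages old
+                  , fmap ifVersion (Map.lookup m known) /= Just v ] of
+           (m : _) -> ImportChanged m
+           []      -> UpToDate
+
+   main :: IO ()
+   main = do
+     let known = Map.fromList [("A", IFace 3 0 [])]
+         old   = IFace 1 10 [("A", 2)]
+     print (check 10 (Just old) known)   -- ImportChanged "A"
+     print (check 11 (Just old) known)   -- SourceChanged
+\end{verbatim}
+Only the @UpToDate@ verdict lets @compile@ rebuild the @ModDetails@
+from the old interface; every other verdict leads to compilation.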
+
+
+\subsubsection{Contents of \mbox{\tt ModDetails}, 
+               \mbox{\tt ModIFace} and \mbox{\tt HoldingPen}}
+Only @compile@ can see inside these three types -- they are opaque to
+everyone else.  @ModDetails@ holds the post-renaming,
+post-typechecking environment created by compiling a module.
+
+\begin{verbatim}
+   data ModDetails
+      = ModDetails {
+           moduleExports :: Avails,
+           moduleEnv     :: GlobalRdrEnv,   -- == FM RdrName [Name]
+           typeEnv       :: FM Name TyThing, -- TyThing is in TcEnv.lhs
+           instEnv       :: InstEnv,
+           fixityEnv     :: FM Name Fixity,
+           ruleEnv       :: FM Id [Rule]
+        }
+\end{verbatim}
+
+@ModIFace@ is nearly the same as @ParsedIface@ from @RnMonad.lhs@:
+\begin{verbatim}
+   type ModIFace = ParsedIface    -- not really, but ...
+   data ParsedIface
+      = ParsedIface {
+           pi_mod       :: Module,                   -- Complete with package info
+           pi_vers      :: Version,                  -- Module version number
+           pi_orphan    :: WhetherHasOrphans,        -- Whether this module has orphans
+           pi_usages    :: [ImportVersion OccName],  -- Usages
+           pi_exports   :: [ExportItem],             -- Exports
+           pi_insts     :: [RdrNameInstDecl],        -- Local instance declarations
+           pi_decls     :: [(Version, RdrNameHsDecl)],    -- Local definitions
+           pi_fixity    :: (Version, [RdrNameFixitySig]), -- Local fixity declarations, 
+                                                          -- with their version
+           pi_rules     :: (Version, [RdrNameRuleDecl]),  -- Rules, with their version
+           pi_deprecs   :: [RdrNameDeprecation]           -- Deprecations
+       }
+\end{verbatim}
+
+@HoldingPen@ is a cleaned-up version of that found in @RnMonad.lhs@, 
+retaining just the 3 pieces actually comprising the holding pen:
+\begin{verbatim}
+   data HoldingPen 
+      = HoldingPen {
+           iDecls :: DeclsMap,     -- A single, global map of Names to decls
+
+           iInsts :: IfaceInsts,
+           -- The as-yet un-slurped instance decls; this bag is depleted when we
+           -- slurp an instance decl so that we don't slurp the same one twice.
+           -- Each is 'gated' by the names that must be available before
+           -- this instance decl is needed.
+
+           iRules :: IfaceRules
+           -- Similar to instance decls, only for rules
+        }
+\end{verbatim}
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The linker (\mbox{\tt link})}
+\label{sec:linker}
+
+\subsubsection{Data structures owned by the linker}
+
+In the same way that @compile@ has a persistent compiler state (PCS),
+the linker has a persistent (session-lifetime) state, PLS, the
+Linker's Persistent State.  In batch mode PLS is entirely irrelevant,
+because there is only a single link step, and can be a unit value
+ignored by everybody.  In interactive mode PLS is composed of the
+following three parts:
+
+\begin{itemize}
+\item 
+\textbf{The Source Symbol Table (SST)}@ :: FiniteMap RdrName HValue@   
+  The source symbol table is used when linking interpreted code.
+  Unlinked interpreted code consists of an STG  tree where
+  the leaves are @RdrName@s.  The linker's job is to resolve these to
+  actual addresses (the alternative is to resolve these lazily when
+  the code is run, but this requires passing the full symbol table
+  through the interpreter and the repeated lookups will probably be
+  expensive).
+
+  The source symbol table therefore maps @RdrName@s to @HValue@s, for
+  every @RdrName@ that currently \emph{has} an @HValue@, including all
+  exported functions from object code modules that are currently
+  linked in.  Linking therefore turns a @StgTree RdrName@ into an
+  @StgTree HValue@.
+
+  It is important that we can prune this symbol table by throwing away
+  the mappings for an entire module, whenever we recompile/relink a
+  given module.  The representation is therefore probably a two-level
+  mapping, from module names, to function/constructor names, to
+  @HValue@s.
+
+\item \textbf{The Object Symbol Table (OST)}@ :: FiniteMap String Addr@
+  This is a lower level symbol table, mapping symbol names in object
+  modules to their addresses in memory.  It is used only when
+  resolving the external references in an object module, and contains
+  only entries that are defined in object modules.
+
+  Why have two symbol tables?  Well, there is a clear distinction
+  between the two: the source symbol table maps Haskell symbols to
+  Haskell values, and the object symbol table maps object symbols to
+  addresses.  There is some overlap, in that Haskell symbols certainly
+  have addresses, and we could look up a Haskell symbol's address by
+  manufacturing the right object symbol and looking that up in the
+  object symbol table, but this is likely to be slow and would force
+  us to extend the object symbol table with all the symbols
+  ``exported'' by interpreted code.  Doing it this way enables us to
+  decouple the object management subsystem from the rest of the linker
+  with a minimal interface; something like
+
+  \begin{verbatim}
+  loadObject   :: Unlinked -> IO Object
+  unloadModule :: Unlinked -> IO ()
+  lookupSymbol :: String   -> IO Addr
+  \end{verbatim}
+
+  Rather unfortunately we need @lookupSymbol@ in order to populate the
+  source symbol table when linking in a new compiled module.  Our
+  object management subsystem is currently written in C, so decoupling
+  this interface as much as possible is highly desirable.
+
+\item
+   {\bf Linked Image (LI)} @:: no-explicit-representation@
+
+   LI isn't explicitly represented in the system, but we record it
+   here for completeness anyway.  LI is the current set of
+   linked-together module, package and other library fragments
+   constituting the current executable mass.  LI comprises:
+   \begin{itemize}
+   \item Machine code (@.o@, @.a@, @.DLL@ file images) in memory.
+         These are loaded from disk when needed, and stored in
+         @malloc@ville.  To simplify storage management, they are
+         never freed or reused, since doing so would create serious
+         complications.  When no longer needed,
+         they are simply abandoned.  New linkings of the same object
+         code produce new copies in memory.  We hope this will not be
+         too much of a space leak.
+   \item STG trees, which live in the GHCI heap and are managed by the
+         storage manager in the usual way.  They are held alive (are 
+         reachable) via the @HValue@s in the SST.  Such @HValue@s are
+         applications of the interpreter function to the trees
+         themselves.  Linking a tree comprises travelling over the 
+         tree, replacing all the @Id@s with pointers directly to the
+         relevant @_closure@ labels, as determined by searching the
+         OST.  Once the leaves are linked, trees are wrapped with the
+         interpreter function.  The resulting @HValue@s then behave
+         indistinguishably from compiled versions of the same code.
+   \end{itemize}
+   Because object code is outside the heap and never deallocated,
+   whilst interpreted code is held alive via the HST, there's no need
+   to have a data structure which ``is'' the linked image.
+
+   For batch compilation, LI doesn't exist because OST doesn't exist,
+   and because @link@ doesn't load code into memory but instead just
+   invokes the system linker.
+
+   \ToDo{Do we need to say anything about CAFs and SRTs?  Probably ...}
+\end{itemize}
+As with PCS, CM has no way to create an initial PLS, so we supply
+@emptyPLS@ for that purpose.
+
+\subsubsection{The linker's interface}
+
+In practice, the PLS might be hidden in the I/O monad rather
+than passed around explicitly.  (The same might be true for PCS).
+Anyway:
+
+\begin{verbatim}
+   data PLS -- as described above; opaque to everybody except the linker
+
+   link :: PCI -> ??? -> [[Linkable]] -> PLS -> IO LinkResult
+
+   data LinkResult = LinkOK   PLS
+                   | LinkErrs PLS [SDoc]
+
+   emptyPLS :: IO PLS     -- since CM has no other way to make one
+\end{verbatim}
+
+CM uses @link@ as follows:
+
+After repeatedly using @compile@ to compile all modules which are
+out-of-date, @link@ is invoked.  The @[[Linkable]]@ argument to
+@link@ represents the list of (recursive groups of) home modules which
+have been newly compiled, along with @Linkable@s for each of
+the packages in use (the compilation manager knows which external
+packages are referenced by the home package).  The order of the list
+is important: it is sorted in such a way that linking any prefix of
+the list will result in an image with no unresolved references.  Note
+that for batch linking there may be further restrictions; for example
+it may not be possible to link recursive groups containing libraries.
+
+@link@ does the following:
+
+\begin{itemize}
+  \item 
+  In batch mode, do nothing.  In interactive mode,
+  examine the supplied @[[Linkable]]@ to determine which home 
+  module @Unlinked@s are new.  Remove precisely these @Linkable@s 
+  from PLS.  (In fact we really need to remove their upwards
+  transitive closure, but I think it is an invariant that CM will
+  supply an upwards transitive closure of new modules).
+  See below for descriptions of @Linkable@ and @Unlinked@.
+
+  \item 
+  Batch system: invoke the external linker to link everything in one go.
+  Interactive: bind the @Unlinked@s for the newly compiled modules,
+  plus those for any newly required packages, into PLS.
+
+  Note that it is the linker's responsibility to remember which
+  objects and packages have already been linked.  By comparing this
+  with the @Linkable@s supplied to @link@, it can determine which
+  of the linkables in LI are out of date.
+\end{itemize}
+
+If linking of a group should fail for some reason, @link@ should
+not modify its PLS at all.  In other words, linking each group
+is atomic: it either succeeds completely or leaves the PLS unchanged.
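+
+A minimal sketch of this atomicity rule, for illustration only.  It
+assumes a toy model in which a PLS is just the set of names linked so
+far, and a group is a list of names together with the names each one
+refers to.  @Group@, @linkGroup@ and @linkGroups@ are hypothetical
+names, and @String@ stands in for @SDoc@:
+
+\begin{verbatim}
+module Main where
+
+import qualified Data.Set as Set
+
+-- Hypothetical stand-ins: a PLS is modelled as the set of linked names,
+-- and a group is a list of (name, names it references).
+type PLS   = Set.Set String
+type Group = [(String, [String])]
+
+data LinkResult = LinkOK PLS | LinkErrs PLS [String]
+
+-- Link one group: every reference must resolve either within the
+-- group or in the PLS built so far.  On failure no new PLS is produced,
+-- so the caller still holds the old one, unmodified.
+linkGroup :: PLS -> Group -> Either [String] PLS
+linkGroup pls grp
+  | null unresolved = Right candidate
+  | otherwise       = Left [ "unresolved: " ++ r | r <- unresolved ]
+  where
+    candidate  = pls `Set.union` Set.fromList (map fst grp)
+    unresolved = [ r | (_, refs) <- grp, r <- refs
+                     , not (r `Set.member` candidate) ]
+
+-- Link groups left to right, stopping at the first failing group and
+-- returning the PLS as it stood before that group.
+linkGroups :: PLS -> [Group] -> LinkResult
+linkGroups pls []       = LinkOK pls
+linkGroups pls (g : gs) =
+  case linkGroup pls g of
+    Left errs  -> LinkErrs pls errs
+    Right pls' -> linkGroups pls' gs
+
+main :: IO ()
+main =
+  case linkGroups Set.empty [[("A", [])], [("B", ["A", "Missing"])]] of
+    LinkOK pls        -> putStrLn ("linked: " ++ show (Set.toList pls))
+    LinkErrs pls errs -> do
+      putStrLn ("kept:   " ++ show (Set.toList pls))
+      mapM_ putStrLn errs
+\end{verbatim}
+In this model the second group fails, so the result carries the PLS
+built by the first group alone.  A real linker would also have to undo
+any side effects in @malloc@ville, which the sketch does not model.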
+
+\subsubsection*{\mbox{\tt Unlinked} and \mbox{\tt Linkable}}
+
+Two important types: @Unlinked@ and @Linkable@.  The latter is a 
+higher-level representation involving several of the former.
+An @Unlinked@ is a reference to unlinked executable code, something
+a linker could take as input:
+
+\begin{verbatim}
+   data Unlinked = DotO   Path
+                 | DotA   Path            
+                 | DotDLL Path
+                 | Trees  [StgTree RdrName]
+\end{verbatim}
+
+The first three describe the location of a file (presumably)
+containing the code to link.  @Trees@, which only exists in
+interactive mode, gives a list of @StgTree@s, in which the unresolved
+references are @RdrName@s -- hence its non-linkedness.  Once linked,
+those @RdrName@s are replaced with pointers to the machine code
+implementing them.
+
+A @Linkable@ gathers together several @Unlinked@s and associates them
+with either a module or package:
+
+\begin{verbatim}
+   data Linkable = LM Module  [Unlinked]   -- a module
+                 | LP PkgName              -- a package
+\end{verbatim}
+
+The order of the @Unlinked@s in the list is important, as
+they are linked in left-to-right order.  The @Unlinked@ objects for a
+particular package can be obtained from the package configuration (see
+Section \ref{sec:staticinfo}).
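+
+To illustrate the ordering rule, here is a small sketch that flattens
+a @[[Linkable]]@ into the left-to-right sequence of @Unlinked@s a
+linker would process.  It makes several assumptions: @Module@ and
+@PkgName@ are plain strings, the @Trees@ constructor is left out
+(it needs @StgTree@), and the package configuration is modelled as a
+hypothetical function @PkgConfig@:
+
+\begin{verbatim}
+module Main where
+
+type Path    = String
+type Module  = String   -- assumption: a bare module name
+type PkgName = String   -- assumption: a bare package name
+
+data Unlinked = DotO Path | DotA Path | DotDLL Path
+  deriving Show          -- Trees omitted in this sketch
+
+data Linkable = LM Module [Unlinked]   -- a module
+              | LP PkgName             -- a package
+
+-- Hypothetical view of the package configuration: the Unlinked
+-- objects that make up each package.
+type PkgConfig = PkgName -> [Unlinked]
+
+unlinkedsOf :: PkgConfig -> Linkable -> [Unlinked]
+unlinkedsOf _   (LM _ us) = us
+unlinkedsOf cfg (LP pkg)  = cfg pkg
+
+-- Groups are taken in list order, and within each Linkable the
+-- Unlinkeds keep their left-to-right order.
+linkOrder :: PkgConfig -> [[Linkable]] -> [Unlinked]
+linkOrder cfg = concatMap (concatMap (unlinkedsOf cfg))
+
+main :: IO ()
+main = mapM_ print (linkOrder cfg groups)
+  where
+    cfg pkg = [DotA ("lib" ++ pkg ++ ".a")]
+    groups  = [[LP "std"], [LM "A" [DotO "A.o"]], [LM "B" [DotO "B.o"]]]
+\end{verbatim}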
+
+\ToDo{When adding @Addr@s from an object module to SST, we need to
+      somehow find out the @RdrName@s of the symbols exported by that
+      module. 
+      So we'd need to pass in the @ModDetails@ or @ModIFace@ or some such?}
+
+
+
+%%-----------------------------------------------------------------%%
+\section{Background ideas}
+\subsubsection*{Out of date, but correct in spirit}
+
+\subsection{Restructuring the system}
+
+At the moment @hsc@ compiles one source module into C or assembly.
+This functionality is pushed inside a function called @compile@,
+introduced shortly.  The main new chunk of code is CM, the compilation manager,
+which supervises multiple runs of @compile@ so as to create up-to-date
+translations of a whole bunch of modules, as quickly as possible.
+CM also employs some minor helper functions, @finder@, @summarise@ and
+@link@, to do its work.
+
+Our intent is to allow CM to be used as the basis either of a 
+multi-module, batch mode compilation system, or to supply an
+interactive environment similar to that of Hugs.
+Only minor modifications to the behaviour of @compile@ and @link@ 
+are needed to give these different behaviours.
+
+CM and @compile@, and, for interactive use, an interpreter, are the
+main code components.  The most important data structure is the global
+symbol table; much design effort has been expended thereupon.
+
+
+\subsection{How the global symbol table is implemented}
+
+The top level symbol table is a @FiniteMap@ @ModuleName@
+@ModuleDetails@.  @ModuleDetails@ contains essentially the environment
+created by compiling a module.  CM manages this finite map, adding and
+deleting module entries as required.
+
+The @ModuleDetails@ for a module @M@ contains descriptions of all
+tycons, classes, instances, values, unfoldings, etc (henceforth
+referred to as ``entities''), available from @M@.  These are just
+trees in the GHCI heap.  References from other modules to these
+entities are direct -- when you have a @TyCon@ in your hand, you really
+have a pointer directly to the @TyCon@ structure in the defining module,
+rather than some kind of index into a global symbol table.  So there
+is a global symbol table, but it has a distributed (spaghetti-like?)
+nature.
+
+This gives fast and convenient access to tycon, class, instance,
+etc, information.  But because there are no levels of indirection,
+there's a problem when we replace @M@ with an updated version of @M@.
+We then need to find all references to entities in the old @M@'s
+spaghetti, and replace them with pointers to the new @M@'s spaghetti.
+This problem motivates a large part of the design.
+
+
+
+\subsection{Implementing incremental recompilation -- simple version}
+Given the following module graph
+\begin{verbatim}
+         D
+       /   \
+      /     \
+     B       C
+      \     /
+       \   /
+         A
+\end{verbatim}
+(@D@ imports @B@ and @C@, @B@ imports @A@, @C@ imports @A@) the aim is to do the
+least possible amount of compilation to bring @D@ back up to date.  The
+simplest scheme we can think of is:
+\begin{itemize}
+\item {\bf Downsweep}: 
+  starting with @D@, re-establish what the current module graph is
+  (it might have changed since last time).  This means getting a
+  @ModuleSummary@ of @D@.  The summary can be quickly generated,
+  contains @D@'s import lists, and gives some way of knowing whether
+  @D@'s source has changed since the last time it was summarised.
+
+  Transitively follow summaries from @D@, thereby establishing the
+  module graph.
+\item
+  Remove from the global symbol table (the @FiniteMap@ @ModuleName@
+  @ModuleDetails@) the upwards closure of all modules in this package
+  which are out-of-date with respect to their previous versions.  Also
+  remove all modules no longer reachable from @D@.
+\item {\bf Upsweep}:
+  Starting at the lowest point in the still-in-date module graph,
+  start compiling upwards, towards @D@.  At each module, call
+  @compile@, passing it a @FiniteMap@ @ModuleName@ @ModuleDetails@,
+  and getting a new @ModuleDetails@ for the module, which is added to
+  the map.
+
+  When compiling a module, the compiler must be able to know which
+  entries in the map are for modules in its strict downwards closure,
+  and which aren't, so that it can manufacture the instance
+  environment correctly (as union of instances in its downwards
+  closure).
+\item
+  Once @D@ has been compiled, invoke some kind of linking phase
+  if batch compilation.  For interactive use, can either do it all
+  at the end, or as you go along.
+\end{itemize}
+In this simple world, recompilation visits the upwards closure of
+all changed modules.  That means when a module @M@ is recompiled,
+we can be sure no-one has any references to entities in the old @M@,
+because modules importing @M@ will have already been removed from the 
+top-level finite map in the second step above.
+
+The upshot is that we don't need to worry about updating links to @M@ in
+the global symbol table -- there shouldn't be any to update.
+\ToDo{What about mutually recursive modules?}
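+
+The ``upwards closure'' used above can be sketched as follows.  This
+is only an illustration: the module graph is assumed to be a map from
+each module to the modules it imports, and @Graph@, @importersOf@ and
+@upwardsClosure@ are hypothetical names.
+
+\begin{verbatim}
+module Main where
+
+import qualified Data.Map as Map
+import qualified Data.Set as Set
+
+type ModuleName = String
+
+-- Module graph: each module mapped to the modules it imports.
+type Graph = Map.Map ModuleName [ModuleName]
+
+-- Modules that directly import m.
+importersOf :: Graph -> ModuleName -> [ModuleName]
+importersOf g m = [ n | (n, imps) <- Map.toList g, m `elem` imps ]
+
+-- Upwards closure: the changed modules plus everything that
+-- transitively imports them.  The seen set makes this terminate even
+-- if the graph contains import cycles.
+upwardsClosure :: Graph -> [ModuleName] -> Set.Set ModuleName
+upwardsClosure g = go Set.empty
+  where
+    go seen []       = seen
+    go seen (m : ms)
+      | m `Set.member` seen = go seen ms
+      | otherwise           = go (Set.insert m seen) (importersOf g m ++ ms)
+
+main :: IO ()
+main = do
+  let g = Map.fromList
+            [("D", ["B", "C"]), ("B", ["A"]), ("C", ["A"]), ("A", [])]
+  print (Set.toList (upwardsClosure g ["B"]))   -- ["B","D"]
+  print (Set.toList (upwardsClosure g ["A"]))   -- ["A","B","C","D"]
+\end{verbatim}
+For the diamond graph above, a change to @B@ invalidates only @B@ and
+@D@, whereas a change to @A@ invalidates everything.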
+
+CM will happily chase through module interfaces in other packages in
+the downsweep.  But it will only process modules in this package
+during the upsweep.  So it assumes that modules in other packages
+never become out of date.  This is a design decision -- we could have
+decided otherwise.
+
+In fact we go further, and require other packages to be compiled,
+i.e. to consist of a collection of interface files, and one or more
+object files.  CM will never apply @compile@ to a foreign package
+module, so there's no way a package can be built on the fly from source.
+
+We require @compile@ to cache foreign package interfaces it reads, so
+that subsequent uses don't have to re-read them.  The cache never
+becomes out of date, since we've assumed that the source of foreign
+packages doesn't change during the course of a session (run of GHCI).
+As well as caching interfaces, @compile@ must cache, in some sense,
+the linkable code for modules.  In batch compilation this might simply
+mean remembering the names of object files to link, whereas in
+interactive mode @compile@ probably needs to load object code into
+memory in preparation for in-memory linking.
+
+Important signatures for this simple scheme are:
+\begin{verbatim}
+   finder :: ModuleName -> ModLocation
+
+   summarise :: ModLocation -> IO ModSummary
+
+   compile :: ModSummary 
+              -> FM ModName ModDetails
+              -> IO CompileResult
+
+   data CompileResult = CompOK  ModDetails
+                      | CompErr [ErrMsg]
+
+   link :: [ModLocation] -> [PackageLocation] -> IO Bool  -- linked ok?
+\end{verbatim}
+
+
+\subsection{Implementing incremental recompilation -- clever version}
+
+So far, our upsweep, which is the computationally expensive bit,
+recompiles a module if either its source is out of date, or it 
+imports a module which has been recompiled.  Sometimes we know
+we can do better than this:
+\begin{verbatim}
+   module B where                module A where
+   import A ( f )                {-# NOINLINE f #-}
+   ... f ...                     f x = x + 42
+\end{verbatim}
+If the definition of @f@ is changed to @f x = x + 43@, the simple
+upsweep would recompile @B@ unnecessarily.  We would like to detect
+this situation and avoid propagating recompilation all the way to the
+top.  There are two parts to this: detecting when a module doesn't
+need recompilation, and managing inter-module references in the
+global symbol table.
+
+\subsubsection*{Detecting when a module doesn't need recompilation}
+
+To do this, we introduce a new concept: the @ModuleIFace@.  This is
+effectively an in-memory interface file.  References to entities in
+other modules are done via strings, rather than being pointers
+directly to those entities.  Recall that, by comparison,
+@ModuleDetails@ do contain pointers directly to the entities they
+refer to.  So a @ModuleIFace@ is not part of the global symbol table.
+
+As before, compiling a module produces a @ModuleDetails@ (inside the
+@CompileResult@), but it also produces a @ModuleIFace@.  The latter
+records, amongst other things, the version numbers of all imported entities
+needed for the compilation of that module.  @compile@ optionally also
+takes the old @ModuleIFace@ as input during compilation:
+\begin{verbatim}
+   data CompileResult = CompOK  ModDetails ModIFace
+                      | CompErr [ErrMsg]
+
+   compile :: ModSummary 
+              -> FM ModName ModDetails
+              -> Maybe ModIFace
+              -> IO CompileResult
+\end{verbatim}
+Now, if the @ModuleSummary@ indicates this module's source hasn't
+changed, we only need to recompile it if something it depends on has
+changed.  @compile@ can detect this by inspecting the imported entity
+version numbers in the module's old @ModuleIFace@, and comparing them
+with the version numbers from the entities in the modules being
+imported.  If they are all the same, nothing it depends on has
+changed, so there's no point in recompiling.
+
+\subsubsection*{Managing inter-module references in the global symbol table}
+
+In the above example with @A@, @B@ and @f@, the specified change to @f@ would
+require @A@ but not @B@ to be recompiled.  That generates a new
+@ModuleDetails@ for @A@.  Problem is, if we leave @B@'s @ModuleDetails@ 
+unchanged, they continue to refer (directly) to the @f@ in @A@'s old
+@ModuleDetails@.  This is not good, especially if equality between
+entities is implemented using pointer equality.
+
+One solution is to throw away @B@'s @ModuleDetails@ and recompile @B@.
+But this is precisely what we're trying to avoid, as it's expensive.
+Instead, a cheaper mechanism achieves the same thing: recreate @B@'s
+details directly from the old @ModuleIFace@.  The @ModuleIFace@ will
+(textually) mention @f@; @compile@ can then find a pointer to the 
+up-to-date global symbol table entry for @f@, and place that pointer
+in @B@'s @ModuleDetails@.  The @ModuleDetails@ are, therefore,
+regenerated just by a quick lookup pass over the module's former
+@ModuleIFace@.  All this applies, of course, only when @compile@ has
+concluded it doesn't need to recompile @B@.
+
+Now @compile@'s signature becomes a little clearer.  @compile@ has to
+recompile the module, generating a fresh @ModuleDetails@ and
+@ModuleIFace@, if any of the following hold:
+\begin{itemize}
+\item
+  The old @ModuleIFace@ wasn't supplied, for some reason (perhaps
+  we've never compiled this module before?)
+\item
+  The module's source has changed.
+\item
+  The module's source hasn't changed, but inspection of @ModuleIFace@s 
+  for this and its imports indicates that an imported entity has
+  changed.
+\end{itemize}
+If none of those are true, we're in luck: quickly knock up a new
+@ModuleDetails@ from the old @ModuleIFace@, and return them both.
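+
+The decision just described might be sketched like this.  The sketch
+assumes a drastically cut-down interface that records only the
+versions of the imported entities a module was compiled against;
+@usedVersions@, @Action@ and @decide@ are hypothetical names, not part
+of the design.
+
+\begin{verbatim}
+module Main where
+
+import qualified Data.Map as Map
+
+type EntityName = String
+type Version    = Int
+
+-- Hypothetical cut-down interface: the versions of imported entities
+-- that the module was last compiled against.
+newtype ModIFace = ModIFace
+  { usedVersions :: Map.Map EntityName Version }
+
+data Action = Recompile | RegenerateDetails
+  deriving (Eq, Show)
+
+decide :: Bool                        -- has the source changed?
+       -> Maybe ModIFace              -- old interface, if any
+       -> Map.Map EntityName Version  -- current versions of imports
+       -> Action
+decide _ Nothing _ = Recompile
+decide srcChanged (Just old) current
+  | srcChanged                                = Recompile
+  | usedVersions old `Map.isSubmapOf` current = RegenerateDetails
+  | otherwise                                 = Recompile
+
+main :: IO ()
+main = do
+  let old = ModIFace (Map.fromList [("A.f", 1)])
+  print (decide False (Just old) (Map.fromList [("A.f", 1)]))
+  print (decide False (Just old) (Map.fromList [("A.f", 2)]))
+  print (decide True  (Just old) (Map.fromList [("A.f", 1)]))
+  print (decide False Nothing    Map.empty)
+\end{verbatim}
+Only the first call yields @RegenerateDetails@; the other three hit
+one of the conditions listed above and yield @Recompile@.  An entity
+that has vanished from the imports also forces recompilation, since
+the old versions are then no longer a sub-map of the current ones.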
+
+As a result, the upsweep still visits all modules in the upwards
+closure of those whose sources have changed.  However, at some point
+we hopefully make a transition from generating new @ModuleDetails@ the
+expensive way (recompilation) to a cheap way (recycling old
+@ModuleIFace@s).  Either way, all modules still get new
+@ModuleDetails@, so the global symbol table is correctly
+reconstructed.
+
+
+\subsection{How linking works, roughly}
+
+When @compile@ translates a module, it produces a @ModuleDetails@,
+@ModuleIFace@ and a @Linkable@.  The @Linkable@ contains the
+translated but un-linked code for the module.  And when @compile@
+ventures into an interface in a package it hasn't seen so far, it
+copies the package's object code into memory, producing one or more
+@Linkable@s.  CM keeps track of these linkables.  
+
+Once all modules have been @compile@d, CM invokes @link@, supplying
+all the @Linkable@s it knows about.  If @compile@ had also been
+linking incrementally as it went along, @link@ doesn't have to do
+anything.  On the other hand, @compile@ could choose not to be
+incremental, and leave @link@ to do all the work.
+
+@Linkable@s are opaque to CM.  For batch compilation, a @Linkable@
+can record just the name of an object file, DLL, archive, or whatever,
+in which case the CM's call to @link@ supplies exactly the set of
+file names to be linked.  @link@ can pass these verbatim to the
+standard system linker.
+
+
+
+
+%%-----------------------------------------------------------------%%
+\section{Ancient stuff}
+\subsubsection*{Should be selectively merged into ``Background ideas''}
+
+\subsection{Overall}
+Top level structure is:
+\begin{itemize}
+\item The Compilation Manager (CM) calculates and maintains module
+      dependencies, and knows how to create up-to-date object or bytecode
+      for a given module.  In doing so it may need to recompile 
+      arbitrary other modules, based on its knowledge of the module
+      dependencies.  
+\item On top of the CM are the ``user-level'' services.  We envisage
+      both a HEP-like interface, for interactive use, and an
+      @hmake@ style batch compiler facility.
+\item The CM only deals with inter-module issues.  It knows nothing
+      about how to recompile an individual module, nor where the compiled
+      result for a module lives, nor how to tell if 
+      a module is up to date, nor how to find the dependencies of a module.
+      Instead, these services are supplied abstractly to CM via a
+      @Compiler@ record.  To a first approximation, a @Compiler@
+      contains
+      the same functionality as @hsc@ has had until now -- the ability to
+      translate a single Haskell module to C/assembly/object/bytecode.
+
+      Different clients of CM (HEP vs @hmake@) may supply different
+      @Compiler@s, since they need slightly different behaviours.
+      Specifically, HEP needs a @Compiler@ which creates bytecode
+      in memory, and knows how to link it, whereas @hmake@ wants
+      the traditional behaviour of emitting assembly code to disk,
+      and making no attempt at linkage.
+\end{itemize}
+
+\subsection{Open questions}
+\begin{itemize}
+\item
+  Error reporting from @open@ and @compile@.
+\item
+  Instance environment management
+\item
+  We probably need to make interface files say what
+  packages they depend on (so that we can figure out
+  which packages to load/link).
+\item 
+  CM is parameterised both by the client uses and the @Compiler@
+  supplied.  But it doesn't make sense to have a HEP-style client
+  attached to a @hmake@-style @Compiler@.  So, really, the 
+  parameterising entity should contain both aspects, not just the
+  current @Compiler@ contents.
+\end{itemize}
+
+\subsection{Assumptions}
+
+\begin{itemize}
+\item Packages other than the ``current'' one are assumed to be 
+  already compiled.  
+\item
+  The ``current'' package is usually ``MAIN'',
+  but we can set it with a command-line flag.
+  One invocation of ghci has only one ``current'' package.
+\item
+  Packages are not mutually recursive.
+\item
+  All the object code for a package P is in libP.a or libP.dll.
+\end{itemize}
+
+\subsection{Stuff we need to be able to do}
+\begin{itemize}
+\item Create the environment in which a module has been translated,
+      so that interactive queries can be satisfied as if ``in'' that
+      module.
+\end{itemize}
+
+%%-----------------------------------------------------------------%%
+\section{The Compilation Manager}
+
+CM (@compilationManager@) is a functor, thus:
+\begin{verbatim}
+compilationManager :: Compiler -> IO HEP  -- IO so that it can create 
+                                          -- global vars (IORefs)
+
+data HEP = HEP {
+        load          :: ModuleName -> IO (),
+        compileString :: ModuleName -> String -> IO HValue,
+        ....
+   }
+
+newCompiler :: IO Compiler   -- ??? this is a peer of compilationManager?
+
+run :: HValue -> IO ()       -- Run an HValue of type IO ()
+                             -- In HEP?
+\end{verbatim}
+
+@load@ is the central action of CM: its job is to bring a module and
+all its descendants into an executable state, by doing the following:
+\begin{enumerate}
+\item 
+   Use @summarise@ to descend the module hierarchy, starting from the
+   nominated root, creating @ModuleSummary@s, and
+   building a map @ModuleName@ @->@ @ModuleSummary@.  @summarise@ 
+   expects to be passed absolute paths to files.  Use @finder@ to 
+   convert module names to file paths.
+\item
+   Topologically sort the map, 
+   using dependency info in the @ModuleSummary@s.
+\item
+   Clean up the symbol table by deleting the upward closure of 
+   changed modules.
+\item 
+   Working bottom to top, call @compile@ on the upward closure of 
+   all modules whose source has changed.  A module's source has
+   changed when @sourceHasChanged@ indicates there is a difference
+   between old and new summaries for the module.  Update the running
+   @FiniteMap@ @ModuleName@ @ModuleDetails@ with the new details
+   for this module.  Ditto for the running
+   @FiniteMap@ @ModuleName@ @ModuleIFace@.
+\item
+   Call @compileDone@ to signify that we've reached the top, so
+   that the batch system can now link.
+\end{enumerate}
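+
+The topological sort in the second step could be done with
+@Data.Graph@ from the @containers@ library, as in the sketch below.
+The @ModuleSummary@ shown here is a hypothetical stand-in holding just
+a name and an import list, and @upsweepOrder@ is an illustrative name.
+
+\begin{verbatim}
+module Main where
+
+import Data.Graph (flattenSCC, stronglyConnComp)
+
+type ModuleName = String
+
+-- Hypothetical summary: just a name and an import list.
+data ModuleSummary = ModuleSummary
+  { msName    :: ModuleName
+  , msImports :: [ModuleName]
+  }
+
+-- Order the modules so that each one comes after the modules it
+-- imports.  Mutually recursive modules come out together as a single
+-- group.  Imports with no summary in the list are ignored.
+upsweepOrder :: [ModuleSummary] -> [[ModuleName]]
+upsweepOrder summaries =
+  map flattenSCC
+      (stronglyConnComp
+         [ (msName s, msName s, msImports s) | s <- summaries ])
+
+main :: IO ()
+main = mapM_ print (upsweepOrder summaries)
+  where
+    summaries =
+      [ ModuleSummary "D" ["B", "C"], ModuleSummary "B" ["A"]
+      , ModuleSummary "C" ["A"],      ModuleSummary "A" [] ]
+\end{verbatim}
+@stronglyConnComp@ returns components in reverse topological order,
+i.e. dependencies first, which is the bottom-to-top order the upsweep
+needs.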
+
+
+%%-----------------------------------------------------------------%%
+\section{A compiler}
+
+Most of the system's complexity is hidden inside the functions
+supplied in the @Compiler@ record:
+\begin{verbatim}        
+data Compiler = Compiler {        
+
+        finder :: PackageConf -> [Path] -> IO (ModuleName -> ModuleLocation),
+
+        summarise :: ModuleLocation -> IO ModuleSummary,
+
+        compile :: ModuleSummary
+                -> Maybe ModuleIFace 
+                -> FiniteMap ModuleName ModuleDetails
+                -> IO CompileResult,
+
+        compileDone     :: IO (),
+        compileStarting :: IO ()   -- still needed?  I don't think so.
+    }
+
+type ModuleName = String  -- or some such
+type Path = String  -- an absolute file name
+\end{verbatim}
+
+\subsection{The module \mbox{\tt finder}}
+The @finder@, given a package configuration file and a list of
+directories to look in, will map module names to @ModuleLocation@s,
+in which the @Path@s are filenames, probably with an absolute path
+to them.
+\begin{verbatim}
+data ModuleLocation = SourceOnly Path        -- .hs
+                    | ObjectCode Path Path   -- .o & .hi
+                    | InPackage  Path        -- .hi
+\end{verbatim}
+@SourceOnly@ and @ObjectCode@ are unremarkable.  For sanity,
+we require that a module's object and interface be in the same
+directory.  @InPackage@ indicates that the module is in a 
+different package.
+
+@Module@ values -- perhaps all @Name@ish things -- contain the name of
+their package.  That's so that 
+\begin{itemize}
+\item Correct code can be generated for in-DLL vs out-of-DLL refs.
+\item We don't have version number dependencies for symbols
+      imported from different packages.
+\end{itemize}
+
+Somehow or other, it will be possible to know all the packages
+required, so that the linker can load them.
+We could detect package dependencies by recording them in the
+@compile@r's @ModuleIFace@ cache, and with that and the 
+package config info, figure out the complete set of packages
+to link.  Or look at the command line args on startup.
+
+\ToDo{Need some way to tell incremental linkers about packages,
+      since in general we'll need to load and link them before
+      linking any modules in the current package.}
+
+
+\subsection{The module \mbox{\tt summarise}r}
+Given a filename of a module (\ToDo{presumably source or iface}),
+create a summary of it.  A @ModuleSummary@ should contain only enough
+information for CM to construct an up-to-date picture of the
+dependency graph.  Rather than expose CM to details of timestamps,
+etc, @summarise@ merely provides an up-to-date summary of any module.
+CM can extract the list of dependencies from a @ModuleSummary@, but
+other than that has no idea what's inside it.
+\begin{verbatim}
+data ModuleSummary = ... (abstract) ...
+
+depsFromSummary :: ModuleSummary -> [ModuleName]   -- module names imported
+sourceHasChanged :: ModuleSummary -> ModuleSummary -> Bool
+\end{verbatim}
+@summarise@ is intended to be fast -- a @stat@ of the source or
+interface to see if it has changed, and, if so, a quick semi-parse to
+determine the new imports.
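+
+As a rough illustration of how little a summary needs to hold, the
+sketch below gives @ModuleSummary@ a concrete (and entirely
+hypothetical) representation: a timestamp plus an import list obtained
+by a crude line-based scan.  @sumStamp@, @sumImports@, @importsOf@ and
+@summariseText@ are assumptions made for the example; the real
+@summarise@ works on a @ModuleLocation@ in @IO@.
+
+\begin{verbatim}
+module Main where
+
+type ModuleName = String
+
+-- Hypothetical contents; the design keeps ModuleSummary abstract.
+data ModuleSummary = ModuleSummary
+  { sumStamp   :: Integer        -- e.g. a modification time from stat
+  , sumImports :: [ModuleName]
+  }
+
+depsFromSummary :: ModuleSummary -> [ModuleName]
+depsFromSummary = sumImports
+
+sourceHasChanged :: ModuleSummary -> ModuleSummary -> Bool
+sourceHasChanged old new = sumStamp old /= sumStamp new
+
+-- Crude semi-parse: take the module name from each line whose first
+-- word is "import", skipping an optional "qualified" keyword.
+importsOf :: String -> [ModuleName]
+importsOf src =
+  [ takeWhile (/= '(') m
+  | l <- lines src
+  , ("import" : rest) <- [words l]
+  , m <- take 1 (dropWhile (== "qualified") rest)
+  ]
+
+-- Pure stand-in for summarise: timestamp and source text in,
+-- summary out.
+summariseText :: Integer -> String -> ModuleSummary
+summariseText stamp src = ModuleSummary stamp (importsOf src)
+
+main :: IO ()
+main = do
+  let old = summariseText 1 "module B where\nimport A ( f )\ng = f 1\n"
+      new = summariseText 2
+              "module B where\nimport A ( f )\nimport C\ng = f 2\n"
+  print (depsFromSummary new)
+  print (sourceHasChanged old new)
+\end{verbatim}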
+
+\subsection{The module \mbox{\tt compile}r}
+@compile@ traffics in @ModuleIFace@s and @ModuleDetails@.  
+
+A @ModuleIFace@ is an in-memory representation of the contents of an
+interface file, including version numbers, unfoldings and pragmas, and
+the linkable code for the module.  @ModuleIFace@s are un-renamed,
+using @HsSym@/@RdrNames@ rather than (globally distinct) @Names@.
+
+@ModuleDetails@, by contrast, is an in-memory representation of the
+static environment created by compiling a module.  It is phrased in
+terms of post-renaming @Names@, @TyCon@s, etc, so it's basically a
+renamed-to-global-uniqueness rendition of a @ModuleIFace@.
+
+In an interactive session, we'll want to be able to evaluate
+expressions as if they had been compiled in the scope of some
+specified module.  This means that the @ModuleDetails@ must contain
+the type of everything defined in the module, rather than just the
+types of exported stuff.  As a consequence, @ModuleIFace@ must also
+contain the type of everything, because it should always be possible
+to generate a module's @ModuleDetails@ from its @ModuleIFace@.
+
+CM maintains two mappings, one from @ModuleName@s to @ModuleIFace@s,
+the other from @ModuleName@s to @ModuleDetails@.  It passes the latter
+to each call of @compile@.  This is used to supply information about
+modules compiled prior to this one (lower down in the graph).  The
+returned @CompileResult@ supplies a new @ModuleDetails@ for the module
+if compilation succeeded, and CM adds this to the mapping.  The
+@CompileResult@ also supplies a new @ModuleIFace@, which is either the
+same as that supplied to @compile@, if @compile@ decided not to
+retranslate the module, or is the result of a fresh translation (from
+source).  So these mappings are an explicitly-passed-around part of
+the global system state.
+
+@compile@ may {\em optionally} also accumulate @ModuleIFace@s for
+modules in different packages -- that is, interfaces which we read,
+but never attempt to recompile source for.  Such interfaces, being
+from foreign packages, never change, so @compile@ can accumulate them
+in perpetuity in a private global variable.  Indeed, a major motivator
+of this design is to facilitate this caching of interface files,
+reading of which is a serious bottleneck for the current compiler.
+
+When CM restarts compilation down at the bottom of the module graph,
+it first needs to throw away all \ToDo{all?} @ModuleDetails@ in the
+upward closure of the out-of-date modules.  So @ModuleDetails@ don't
+persist across recompilations.  But @ModuleIFace@s do, since they
+are conceptually equivalent to interface files.
+
+
+\subsubsection*{What @compile@ returns}
+@compile@ returns a @CompileResult@ to CM.
+Note that @compile@'s foreign-package interface cache can
+become augmented even as a result of reading interfaces for a
+compilation attempt which ultimately fails, although it will not be
+augmented with a new @ModuleIFace@ for the failed module.
+\begin{verbatim}
+-- CompileResult is not abstract to the Compilation Manager
+data CompileResult
+   = CompOK   ModuleIFace 
+              ModuleDetails    -- compiled ok, here are new details
+                               -- and new iface
+
+   | CompErr  [SDoc]           -- compilation gave errors
+
+   | NoChange                  -- no change required, meaning:
+                               -- exports, unfoldings, strictness, etc,
+                               -- unchanged, and executable code unchanged
+\end{verbatim}
+
+
+
+\subsubsection*{Re-establishing local-to-global name mappings}
+Consider
+\begin{verbatim}
+module Upper where                         module Lower ( f ) where
+import Lower ( f )                         f = ...
+g = ... f ...
+\end{verbatim}
+When @Lower@ is first compiled, @f@ is allocated a @Unique@
+(presumably inside an @Id@ or @Name@?).  When @Upper@ is then
+compiled, its reference to @f@ is attached directly to the
+@Id@ created when compiling @Lower@.
+
+If the definition of @f@ is now changed, but not the type,
+unfolding, strictness, or any other thing which affects the way
+it should be called, we will have to recompile @Lower@, but not
+@Upper@.  This creates a problem -- @g@ will then refer to the
+old @Id@ for @f@, not the new one.  This may or may not
+matter, but it seems safer to ensure that all @Unique@-based
+references into child modules are always up to date.
+
+So @compile@ recreates the @ModuleDetails@ for @Upper@ from 
+the @ModuleIFace@ of @Upper@ and the @ModuleDetails@ of @Lower@.
+
+The rule is: if a module is up to date with respect to its
+source, but a child @C@ has changed, then either:
+\begin{itemize}
+\item On examination of the version numbers in @C@'s
+      interface/@ModuleIFace@ that we used last time, we discover that
+      an @Id@/@TyCon@/class/instance we depend on has changed.  So 
+      we need to retranslate the module from its source, generating
+      a new @ModuleIFace@ and @ModuleDetails@.
+\item Or: there's nothing in @C@'s interface that we depend on.
+      So we quickly recreate a new @ModuleDetails@ from the existing
+      @ModuleIFace@, creating fresh links to the new @Unique@-world
+      entities in @C@'s new @ModuleDetails@.
+\end{itemize}
+
+Upshot: we need to redo @compile@ on all modules all the way up,
+rather than just the ones that need retranslation.  However, we hope
+that most modules won't need retranslation -- just regeneration of the
+@ModuleDetails@ from the @ModuleIFace@.  In effect, the @ModuleIFace@
+is a quickly-compilable representation of the module's contents, just
+enough to create the @ModuleDetails@.
+
+\ToDo{Is there anything in @ModuleDetails@ which can't be
+      recreated from @ModuleIFace@ ?}
+
+So the @ModuleIFace@s persist across calls to @HEP.load@, whereas
+@ModuleDetails@ are reconstructed on every compilation pass.  This
+means that @ModuleIFace@s have the same lifetime as the byte/object
+code, and so should somehow contain their code.
+
+The behind-the-scenes @ModuleIFace@ cache has some kind of holding-pen
+arrangement, to lazify the copying-out of stuff from it, and thus to
+minimise redundant interface reading.  \ToDo{Burble burble.  More
+details.}
+
+When CM starts working back up the module graph with @compile@, it
+needs to remove from the travelling @FiniteMap@ @ModuleName@
+@ModuleDetails@ the details for all modules in the upward closure of
+the compilation start points.  However, since we're going to visit
+precisely those modules and no others on the way back up, we might as
+well just zap the old @ModuleDetails@ incrementally.  This does
+mean that the @FiniteMap@ @ModuleName@ @ModuleDetails@ will be
+inconsistent until we reach the top.
+
+In interactive mode, each @compile@ call on a module for which no
+object code is available, or for which it is out of date wrt source,
+emits bytecode into memory, updates the resulting @ModuleIFace@ with the
+address of the bytecode image, and links the image.
+
+In batch mode, @compile@ emits assembly or object code onto disk, and
+records somewhere \ToDo{where?} that this object file needs to go into the
+final link.
+
+When we reach the top, @compileDone@ is called, to signify that batch
+linking can now proceed, if need be.
+
+Modules in other packages never get a @ModuleIFace@ or @ModuleDetails@
+entry in CM's maps -- those maps are only for modules in this package.
+As previously mentioned, @compile@ may optionally cache @ModuleIFace@s
+for foreign package modules.  When reading such an interface, we don't
+need to read the version info for individual symbols, since foreign
+packages are assumed static.
+
+\subsubsection*{What's in a \mbox{\tt ModuleIFace}?}
+
+Current interface file contents?
+
+
+\subsubsection*{What's in a \mbox{\tt ModuleDetails}?}
+
+There is no global symbol table @:: Name -> ???@.  To look up a
+@Name@, first extract the @ModuleName@ from it, look that up in
+the passed-in @FiniteMap@ @ModuleName@ @ModuleDetails@, 
+and finally look in the relevant @Env@.
+
+\ToDo{Do we still have the @HoldingPen@, or is it now composed from
+per-module bits too?}
+\begin{verbatim}
+data ModuleDetails = ModuleDetails {
+
+        moduleExports :: what it exports (Names)
+                         -- roughly a subset of the .hi file contents
+
+        moduleEnv     :: RdrName -> Name
+                         -- maps top-level entities in this module to
+                         -- globally distinct (Uniq-ified) Names
+  
+        moduleDefs    :: Bag Name -- All the things in the global symbol table
+                                  -- defined by this module
+
+        package       :: Package -- what package am I in?
+
+        lastCompile   :: Date -- of last compilation
+
+        instEnv       :: InstEnv                 -- local inst env
+        typeEnv       :: Name -> TyThing         -- local tycon env?
+   }
+
+-- A (globally unique) symbol table entry. Note that Ids contain
+-- unfoldings. 
+data TyThing = AClass Class
+             | ATyCon TyCon
+             | AnId Id 
+\end{verbatim}
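+
+The two-stage lookup described above might be sketched as follows.
+This is only an illustration: the @Name@ representation, the
+placeholder @TyThing@, and the use of @Data.Map@ for both the
+travelling map and @typeEnv@ are assumptions, and @lookupName@ is a
+hypothetical helper.
+\begin{verbatim}
+import qualified Data.Map as Map
+
+type ModuleName = String
+
+-- Hypothetical Name: the defining module plus a local key.
+data Name = Name { nameModule :: ModuleName, nameKey :: String }
+  deriving (Eq, Ord, Show)
+
+-- Placeholder for the real TyThing.
+data TyThing = AnId String deriving (Eq, Show)
+
+-- Only the field needed for lookup; typeEnv is a Map here rather
+-- than a function, purely for the sake of the sketch.
+newtype ModuleDetails =
+  ModuleDetails { typeEnv :: Map.Map Name TyThing }
+
+-- First find the module's details, then look in its local env.
+lookupName :: Map.Map ModuleName ModuleDetails
+           -> Name -> Maybe TyThing
+lookupName details n =
+  Map.lookup (nameModule n) details >>= Map.lookup n . typeEnv
+\end{verbatim}
+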
+What's the stuff in @ModuleDetails@ used for?
+\begin{itemize}
+\item @moduleExports@ so that the stuff which is visible from outside
+      the module can be calculated.
+\item @moduleEnv@: \ToDo{umm err}
+\item @moduleDefs@: one reason we want this is so that we can nuke the
+      global symbol table contribs from this module when it leaves the
+      system.  \ToDo{except ... we don't have a global symbol table any
+      more.}
+\item @package@: we will need to chase arbitrarily deep into the
+      interfaces of other packages.  Of course we don't want to 
+      recompile those, but as we've read their interfaces, we may
+      as well cache that info.  So @package@ indicates whether this
+      module is in the default package, or, if not, which it is in.
+
+      Also, when we come to linking, we'll need to know which
+      packages are demanded, so we know to load their objects.
+
+\item @lastCompile@: When the module was last compiled.  If the 
+      source is older than that, then a recompilation can only be
+      required if children have changed.
+\item @typeEnv@: obvious??
+\item @instEnv@: the instances contributed by this module only.  The
+      Report allegedly says that when a module is translated, the
+      available instance env is all the instances in the downward
+      closure of that module in the module graph.
+      
+      We choose to use this simple representation -- each module 
+      holds just its own instances -- and do the naive thing when
+      creating an inst env for compilation with.  If this turns out
+      to be a performance problem we'll revisit the design.
+\end{itemize}
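+
+The ``naive thing'' for @instEnv@ could look roughly like this.  It is
+a sketch under assumptions: instances are plain strings, the module
+graph is a @Data.Map@ from module names to a hypothetical @ModInfo@
+record, and @downwardClosure@ and @instEnvFor@ are names invented for
+the example.
+\begin{verbatim}
+import qualified Data.Map as Map
+import qualified Data.Set as Set
+
+type ModuleName = String
+type Instance   = String      -- placeholder for a real instance
+type Graph      = Map.Map ModuleName ModInfo
+
+-- Per-module data: direct imports and the module's own instances.
+data ModInfo = ModInfo
+  { imports  :: [ModuleName]
+  , ownInsts :: [Instance]
+  }
+
+-- All modules reachable from m via imports, including m itself.
+downwardClosure :: Graph -> ModuleName -> Set.Set ModuleName
+downwardClosure g m = go Set.empty [m]
+  where
+    go seen []       = seen
+    go seen (x : xs)
+      | x `Set.member` seen = go seen xs
+      | otherwise           = go (Set.insert x seen) (below x ++ xs)
+    below x = maybe [] imports (Map.lookup x g)
+
+-- The naive inst env: concatenate the instances of every module
+-- in the downward closure.
+instEnvFor :: Graph -> ModuleName -> [Instance]
+instEnvFor g m =
+  concat [ ownInsts i
+         | x <- Set.toList (downwardClosure g m)
+         , Just i <- [Map.lookup x g] ]
+\end{verbatim}
+This recomputes the closure on every call, which is exactly the cost
+that would need revisiting if it turned out to matter.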
+
+
+
+%%-----------------------------------------------------------------%%
+\section{Misc text looking for a home}
+
+\subsection*{Linking}
+
+\ToDo{All this linking stuff is now bogus.}
+
+There's an abstract @LinkState@, which is threaded through the linkery
+bits.  CM can call @addpkgs@ to notify the linker of packages
+required, and it can call @addmods@ to announce modules which need to
+be linked.  Finally, CM calls @endlink@, after which an executable
+image should be ready.  The linker may link incrementally, during each
+call of @addpkgs@ and @addmods@, or it can just store up names and do
+all the linking when @endlink@ is called.
+
+In order that incremental linking is possible, CM should specify
+packages and module groups in dependency order, i.e.\ from the bottom up.
+
+\subsection*{In-memory linking of bytecode}
+When being HEP-like, @compile@ will translate sources to bytecodes
+in memory, with all the bytecode for a module as a contiguous lump
+outside the heap.  It needs to communicate the addresses of these
+lumps to the linker.  The linker also needs to know whether a 
+given module is available as in-memory bytecode, or whether it
+needs to load machine code from a file.
+
+I guess @LinkState@ needs to map module names to base addresses
+of their loaded images, + the nature of the image, + whether or not
+the image has been linked.
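+
+One possible shape for such a @LinkState@, purely as a guess: the
+@Image@ record, the @ImageKind@ type, the use of @Word@ for addresses
+and the helpers @addImage@ and @markLinked@ are all assumptions made
+for the sketch.
+\begin{verbatim}
+import qualified Data.Map as Map
+
+type ModuleName = String
+type Addr       = Word   -- placeholder for a real machine address
+
+data ImageKind = Bytecode | MachineCode deriving (Eq, Show)
+
+data Image = Image
+  { imageBase   :: Addr       -- base address of the loaded image
+  , imageKind   :: ImageKind  -- the nature of the image
+  , imageLinked :: Bool       -- has it been linked yet?
+  } deriving (Eq, Show)
+
+type LinkState = Map.Map ModuleName Image
+
+-- Record a freshly loaded, not yet linked image.
+addImage :: ModuleName -> Addr -> ImageKind
+         -> LinkState -> LinkState
+addImage m base kind = Map.insert m (Image base kind False)
+
+-- Mark every image as linked (the actual linking is elided).
+markLinked :: LinkState -> LinkState
+markLinked = Map.map (\i -> i { imageLinked = True })
+\end{verbatim}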
+
+\subsection*{On-disk linking of object code, to give an executable}
+The @LinkState@ in this case is just a list of module and package
+names, which @addpkgs@ and @addmods@ add to.  The final @endlink@
+call can invoke the system linker.
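+
+A sketch of this store-up-the-names variant.  The types of @addpkgs@,
+@addmods@ and @endlink@ are assumptions, as are @startLink@, the @.o@
+suffix and the @-l@ flags; here @endlink@ merely returns the arguments
+one might hand to a system linker rather than running it.
+\begin{verbatim}
+data LinkState = LinkState
+  { lsPackages :: [String]  -- package names, in the order announced
+  , lsModules  :: [String]  -- module names, in the order announced
+  } deriving (Eq, Show)
+
+startLink :: LinkState
+startLink = LinkState [] []
+
+addpkgs :: [String] -> LinkState -> LinkState
+addpkgs ps ls = ls { lsPackages = lsPackages ls ++ ps }
+
+addmods :: [String] -> LinkState -> LinkState
+addmods ms ls = ls { lsModules = lsModules ls ++ ms }
+
+-- Hypothetical: build a linker argument list from the stored names.
+endlink :: LinkState -> [String]
+endlink ls =
+  map (++ ".o") (lsModules ls) ++ map ("-l" ++) (lsPackages ls)
+\end{verbatim}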
+
+\subsection{Finding out about packages, dependencies, and auxiliary 
+            objects}
+
+Ask the @packages.conf@ file that lives with the driver at the moment.
+
+\ToDo{policy about upward closure?}
+
+
+
+\ToDo{record story about how in memory linking is done.}
+
+\ToDo{linker start/stop/initialisation/persistence.  Need to
+      say more about @LinkState@.}
+
+
+\end{document}
+
+