+++ /dev/null
-%
-% (c) The OBFUSCATION-THROUGH-GRATUITOUS-PREPROCESSOR-ABUSE Project,
-% Glasgow University, 1990-2000
-%
-
-% \documentstyle[preprint]{acmconf}
-\documentclass[11pt]{article}
-\oddsidemargin 0.1 in % Note that \oddsidemargin = \evensidemargin
-\evensidemargin 0.1 in
-\marginparwidth 0.85in % Narrow margins require narrower marginal notes
-\marginparsep 0 in
-\sloppy
-
-%\usepackage{epsfig}
-\usepackage{shortvrb}
-\MakeShortVerb{\@}
-
-%\newcommand{\note}[1]{{\em Note: #1}}
-\newcommand{\note}[1]{{{\bf Note:}\sl #1}}
-\newcommand{\ToDo}[1]{{{\bf ToDo:}\sl #1}}
-\newcommand{\Arg}[1]{\mbox{${\tt arg}_{#1}$}}
-\newcommand{\bottom}{\perp}
-
-\newcommand{\secref}[1]{Section~\ref{sec:#1}}
-\newcommand{\figref}[1]{Figure~\ref{fig:#1}}
-\newcommand{\Section}[2]{\section{#1}\label{sec:#2}}
-\newcommand{\Subsection}[2]{\subsection{#1}\label{sec:#2}}
-\newcommand{\Subsubsection}[2]{\subsubsection{#1}\label{sec:#2}}
-
-% DIMENSION OF TEXT:
-\textheight 8.5 in
-\textwidth 6.25 in
-
-\topmargin 0 in
-\headheight 0 in
-\headsep .25 in
-
-
-\setlength{\parskip}{0.15cm}
-\setlength{\parsep}{0.15cm}
-\setlength{\topsep}{0cm} % Reduces space before and after verbatim,
- % which is implemented using trivlist
-\setlength{\parindent}{0cm}
-
-\renewcommand{\textfraction}{0.2}
-\renewcommand{\floatpagefraction}{0.7}
-
-\begin{document}
-
-\title{The GHCi Draft Design, round 2}
-\author{MSR Cambridge Haskell Crew \\
- Microsoft Research Ltd., Cambridge}
-
-\maketitle
-
-%%%\tableofcontents
-%%%\newpage
-
-%%-----------------------------------------------------------------%%
-\section{Details}
-
-\subsection{Outline of the design}
-\label{sec:details-intro}
-
-The design falls into three major parts:
-\begin{itemize}
-\item The compilation manager (CM), which coordinates the
- system and supplies a HEP-like interface to clients.
-\item The module compiler (@compile@), which translates individual
- modules to interpretable or machine code.
-\item The linker (@link@),
- which maintains the executable image in interpreted mode.
-\end{itemize}
-
-There are also three auxiliary parts: the finder, which locates
-source, object and interface files, the summariser, which quickly
-finds dependency information for modules, and the static info
-(compiler flags and package details), which is unchanged over the
-course of a session.
-
-This section continues with an overview of the session-lifetime data
-structures. Then follows the finder (section~\ref{sec:finder}),
-summariser (section~\ref{sec:summariser}),
-static info (section~\ref{sec:staticinfo}),
-and finally the three big sections
-(\ref{sec:manager},~\ref{sec:compiler},~\ref{sec:linker})
-on the compilation manager, compiler and linker respectively.
-
-\subsubsection*{Some terminology}
-
-Lifetimes: the phrase {\bf session lifetime} covers a complete run of
-GHCI, encompassing multiple recompilation runs. {\bf Module lifetime}
-is a lot shorter, being that of data needed to translate a single
-module, but then discarded, for example Core, AbstractC, Stix trees.
-
-Data structures with module lifetime are well documented and understood.
-This document is mostly concerned with session-lifetime data.
-Most of these structures are ``owned'' by CM, since that's
-the only major component of GHCI which deals with session-lifetime
-issues.
-
-Modules and packages: {\bf home} refers to modules in this package,
-precisely the ones tracked and updated by the compilation manager.
-{\bf Package} refers to all other packages, which are assumed static.
-
-\subsubsection*{A summary of all session-lifetime data structures}
-
-These structures have session lifetime but not necessarily global
-visibility. Subsequent sections elaborate who can see what.
-\begin{itemize}
-\item {\bf Home Symbol Table (HST)} (owner: CM) holds the post-renaming
- environments created by compiling each home module.
-\item {\bf Home Interface Table (HIT)} (owner: CM) holds in-memory
- representations of the interface file created by compiling
- each home module.
-\item {\bf Unlinked Images (UI)} (owner: CM) are executable but as-yet
- unlinked translations of home modules only.
-\item {\bf Module Graph (MG)} (owner: CM) is the current module graph.
-\item {\bf Static Info (SI)} (owner: CM) is the package configuration
- information (PCI) and compiler flags (FLAGS).
-\item {\bf Persistent Compiler State (PCS)} (owner: @compile@)
- is @compile@'s private cache of information about package
- modules.
-\item {\bf Persistent Linker State (PLS)} (owner: @link@) is
- @link@'s private information concerning the the current
- state of the (in-memory) executable image.
-\end{itemize}
-
-
-%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
-\subsection{The finder (\mbox{\tt type Finder})}
-\label{sec:finder}
-
-@Path@ could be an indication of a location in a filesystem, or it
-could be some more generic kind of resource identifier, a URL for
-example.
-\begin{verbatim}
- data Path = ...
-\end{verbatim}
-
-And some names. @Module@s are now used as primary keys for various
-maps, so they are given a @Unique@.
-\begin{verbatim}
- type ModName = String -- a module name
- type PkgName = String -- a package name
- type Module = -- contains ModName and a Unique, at least
-\end{verbatim}
-
-A @ModLocation@ says where a module is, what it's called and in what
-form it is.
-\begin{verbatim}
- data ModLocation = SourceOnly Module Path -- .hs
- | ObjectCode Module Path Path -- .o, .hi
- | InPackage Module PkgName
- -- examine PCI to determine package Path
-\end{verbatim}
-
-The module finder generates @ModLocation@s from @ModName@s. We expect
-it will assume packages to be static, but we want to be able to track
-changes in home modules during the session. Specifically, we want to
-be able to notice that a module's object and interface have been
-updated, presumably by a compile run outside of the GHCI session.
-Hence the two-stage type:
-\begin{verbatim}
- type Finder = ModName -> IO ModLocation
- newFinder :: PCI -> IO Finder
-\end{verbatim}
-@newFinder@ examines the package information right at the start, but
-returns an @IO@-typed function which can inspect home module changes
-later in the session.
-
-
-%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
-\subsection{The summariser (\mbox{\tt summarise})}
-\label{sec:summariser}
-
-A @ModSummary@ records the minimum information needed to establish the
-module graph and determine whose source has changed. @ModSummary@s
-can be created quickly.
-\begin{verbatim}
- data ModSummary = ModSummary
- ModLocation -- location and kind
- (Maybe (String, Fingerprint))
- -- source and fingerprint if .hs
- (Maybe [ModName]) -- imports if .hs or .hi
-
- type Fingerprint = ... -- file timestamp, or source checksum?
-
- summarise :: ModLocation -> IO ModSummary
-\end{verbatim}
-
-The summary contains the location and source text, and the location
-contains the name. We would like to remove the assumption that
-sources live on disk, but I'm not sure this is good enough yet.
-
-\ToDo{Should @ModSummary@ contain source text for interface files too?}
-\ToDo{Also say that @ModIFace@ contains its module's @ModSummary@ (why?).}
-
-
-%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
-\subsection{Static information (SI)}
-\label{sec:staticinfo}
-
-PCI, the package configuration information, is a list of @PkgInfo@,
-each containing at least the following:
-\begin{verbatim}
- data PkgInfo
- = PkgInfo PkgName -- my name
- Path -- path to my base location
- [PkgName] -- who I depend on
- [ModName] -- modules I supply
- [Unlinked] -- paths to my object files
-
- type PCI = [PkgInfo]
-\end{verbatim}
-The @Path@s in it, including those in the @Unlinked@s, are set up
-when GHCI starts.
-
-FLAGS is a bunch of compiler options. We haven't figured out yet how
-to partition them into those for the whole session vs those for
-specific source files, so currently the best we can do is:
-\begin{verbatim}
- data FLAGS = ...
-\end{verbatim}
-
-The static information (SI) is the both of these:
-\begin{verbatim}
- data SI = SI PCI
- FLAGS
-\end{verbatim}
-
-
-
-%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
-\subsection{The Compilation Manager (CM)}
-\label{sec:manager}
-
-\subsubsection{Data structures owned by CM}
-
-CM maintains two maps (HST, HIT) and a set (UI). It's important to
-realise that CM only knows about the map/set-ness, and has no idea
-what a @ModDetails@, @ModIFace@ or @Linkable@ is. Only @compile@ and
-@link@ know that, and CM passes these types around without
-inspecting them.
-
-\begin{itemize}
-\item
- {\bf Home Symbol Table (HST)} @:: FiniteMap Module ModDetails@
-
- The @ModDetails@ (a couple of layers down) contain tycons, classes,
- instances, etc, collectively known as ``entities''. Referrals from
- other modules to these entities is direct, with no intervening
- indirections of any kind; conversely, these entities refer directly
- to other entities, regardless of module boundaries. HST only holds
- information for home modules; the corresponding wired-up details
- for package (non-home) modules are created on demand in the package
- symbol table (PST) inside the persistent compiler's state (PCS).
-
- CM maintains the HST, which is passed to, but not modified by,
- @compile@. If compilation of a module is successful, @compile@
- returns the resulting @ModDetails@ (inside the @CompResult@) which
- CM then adds to HST.
-
- CM throws away arbitrarily large parts of HST at the start of a
- rebuild, and uses @compile@ to incrementally reconstruct it.
-
-\item
- {\bf Home Interface Table (HIT)} @:: FiniteMap Module ModIFace@
-
- (Completely private to CM; nobody else sees this).
-
- Compilation of a module always creates a @ModIFace@, which contains
- the unlinked symbol table entries. CM maintains this @FiniteMap@
- @ModName@ @ModIFace@, with session lifetime. CM never throws away
- @ModIFace@s, but it does update them, by passing old ones to
- @compile@ if they exist, and getting new ones back.
-
- CM acquires @ModuleIFace@s from @compile@, which it only applies
- to modules in the home package. As a result, HIT only contains
- @ModuleIFace@s for modules in the home package. Those from other
- packages reside in the package interface table (PIT) which is a
- component of PCS.
-
-\item
- {\bf Unlinked Images (UI)} @:: Set Linkable@
-
- The @Linkable@s in UI represent executable but as-yet unlinked
- module translations. A @Linkable@ can contain the name of an
- object, archive or DLL file. In interactive mode, it may also be
- the STG trees derived from translating a module. So @compile@
- returns a @Linkable@ from each successful run, namely that of
- translating the module at hand.
-
- At link-time, CM supplies @Linkable@s for the upwards closure of
- all packages which have changed, to @link@. It also examines the
- @ModSummary@s for all home modules, and by examining their imports
- and the SI.PCI (package configuration info) it can determine the
- @Linkable@s from all required imported packages too.
-
- @Linkable@s and @ModIFace@s have a close relationship. Each
- translated module has a corresponding @Linkable@ somewhere.
- However, there may be @Linkable@s with no corresponding modules
- (the RTS, for example). Conversely, multiple modules may share a
- single @Linkable@ -- as is the case for any module from a
- multi-module package. For these reasons it seems appropriate to
- keep the two concepts distinct. @Linkable@s also provide
- information about the sequence in which individual package
- components should be linked, and that isn't the business of any
- specific module to know.
-
- CM passes @compile@ a module's old @ModIFace@, if it has one, in
- the hope that the module won't need recompiling. If so, @compile@
- can just return the new @ModDetails@ created from it, and CM will
- re-use the old @ModIFace@. If the module {\em is} recompiled (or
- scheduled to be loaded from disk), @compile@ returns both the
- new @ModIFace@ and new @Linkable@.
-
-\item
- {\bf Module Graph (MG)} @:: known-only-to-CM@
-
- Records, for CM's purposes, the current module graph,
- up-to-dateness and summaries. More details when I get to them.
- Only contains home modules.
-\end{itemize}
-Probably all this stuff is rolled together into the Persistent CM
-State (PCMS):
-\begin{verbatim}
- data PCMS = PCMS HST HIT UI MG
- emptyPCMS :: IO PCMS
-\end{verbatim}
-
-\subsubsection{What CM implements}
-It pretty much implements the HEP interface. First, though, define a
-containing structure for the state of the entire CM system and its
-subsystems @compile@ and @link@:
-\begin{verbatim}
- data CmState
- = CmState PCMS -- CM's stuff
- PCS -- compile's stuff
- PLS -- link's stuff
- SI -- the static info, never changes
- Finder -- the finder
-\end{verbatim}
-
-The @CmState@ is threaded through the HEP interface. In reality
-this might be done using @IORef@s, but for clarity:
-\begin{verbatim}
- type ModHandle = ... (opaque to CM/HEP clients) ...
- type HValue = ... (opaque to CM/HEP clients) ...
-
- cmInit :: FLAGS
- -> [PkgInfo]
- -> IO CmState
-
- cmLoadModule :: CmState
- -> ModName
- -> IO (CmState, Either [SDoc] ModHandle)
-
- cmGetExpr :: ModHandle
- -> CmState
- -> String -> IO (CmState, Either [SDoc] HValue)
-
- cmRunExpr :: HValue -> IO () -- don't need CmState here
-\end{verbatim}
-Almost all the huff and puff in this document pertains to @cmLoadModule@.
-
-
-\subsubsection{Implementing \mbox{\tt cmInit}}
-@cmInit@ creates an empty @CmState@ using @emptyPCMS@, @emptyPCS@,
-@emptyPLS@, making SI from the supplied flags and package info, and
-by supplying the package info the @newFinder@.
-
-
-\subsubsection{Implementing \mbox{\tt cmLoadModule}}
-
-\begin{enumerate}
-\item {\bf Downsweep:} using @finder@ and @summarise@, chase from
- the given module to
- establish the new home module graph (MG). Do not chase into
- package modules.
-\item Remove from HIT, HST, UI any modules in the old MG which are
- not in the new one. The old MG is then replaced by the new one.
-\item Topologically sort MG to generate a bottom-to-top traversal
- order, giving a worklist.
-\item {\bf Upsweep:} call @compile@ on each module in the worklist in
- turn, passing it
- the ``correct'' HST, PCS, the old @ModIFace@ if
- available, and the summary. ``Correct'' HST in the sense that
- HST contains only the modules in the this module's downward
- closure, so that @compile@ can construct the correct instance
- and rule environments simply as the union of those in
- the module's downward closure.
-
- If @compile@ doesn't return a new interface/linkable pair,
- compilation wasn't necessary. Either way, update HST with
- the new @ModDetails@, and UI and HIT respectively if a
- compilation {\em did} occur.
-
- Keep going until the root module is successfully done, or
- compilation fails.
-
-\item If the previous step terminated because compilation failed,
- define the successful set as those modules in successfully
- completed SCCs, i.e. all @Linkable@s returned by @compile@ excluding
- those from modules in any cycle which includes the module which failed.
- Remove from HST, HIT, UI and MG all modules mentioned in MG which
- are not in the successful set. Call @link@ with the successful
- set,
- which should succeed. The net effect is to back off to a point
- in which those modules which are still aboard are correctly
- compiled and linked.
-
- If the previous step terminated successfully,
- call @link@ passing it the @Linkable@s in the upward closure of
- all those modules for which @compile@ produced a new @Linkable@.
-\end{enumerate}
-As a small optimisation, do this:
-\begin{enumerate}
-\item[3a.] Remove from the worklist any module M where M's source
- hasn't changed and neither has the source of any module in M's
- downward closure. This has the effect of not starting the upsweep
- right at the bottom of the graph when that's not needed.
- Source-change checking can be done quickly by CM by comparing
- summaries of modules in MG against corresponding
- summaries from the old MG.
-\end{enumerate}
-
-
-%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
-\subsection{The compiler (\mbox{\tt compile})}
-\label{sec:compiler}
-
-\subsubsection{Data structures owned by \mbox{\tt compile}}
-
-{\bf Persistent Compiler State (PCS)} @:: known-only-to-compile@
-
-This contains info about foreign packages only, acting as a cache,
-which is private to @compile@. The cache never becomes out of
-date. There are three parts to it:
-
- \begin{itemize}
- \item
- {\bf Package Interface Table (PIT)} @:: FiniteMap Module ModIFace@
-
- @compile@ reads interfaces from modules in foreign packages, and
- caches them in the PIT. Subsequent imports of the same module get
- them directly out of the PIT, avoiding slow lexing/parsing phases.
- Because foreign packages are assumed never to become out of date,
- all contents of PIT remain valid forever. @compile@ of course
- tries to find package interfaces in PIT in preference to reading
- them from files.
-
- Both successful and failed runs of @compile@ can add arbitrary
- numbers of new interfaces to the PIT. The failed runs don't matter
- because we assume that packages are static, so the data cached even
- by a failed run is valid forever (ie for the rest of the session).
-
- \item
- {\bf Package Symbol Table (PST)} @:: FiniteMap Module ModDetails@
-
- Adding an package interface to PIT doesn't make it directly usable
- to @compile@, because it first needs to be wired (renamed +
- typechecked) into the sphagetti of the HST. On the other hand,
- most modules only use a few entities from any imported interface,
- so wiring-in the interface at PIT-entry time might be a big time
- waster. Also, wiring in an interface could mean reading other
- interfaces, and we don't want to do that unnecessarily.
-
- The PST avoids these problems by allowing incremental wiring-in to
- happen. Pieces of foreign interfaces are copied out of the holding
- pen (HP), renamed, typechecked, and placed in the PST, but only as
- @compile@ discovers it needs them. In the process of incremental
- renaming/typechecking, @compile@ may need to read more package
- interfaces, which are added to the PIT and hence to
- HP.~\ToDo{How? When?}
-
- CM passes the PST to @compile@ and is returned an updated version
- on both success and failure.
-
- \item
- {\bf Holding Pen (HP)} @:: HoldingPen@
-
- HP holds parsed but not-yet renamed-or-typechecked fragments of
- package interfaces. As typechecking of other modules progresses,
- fragments are removed (``slurped'') from HP, renamed and
- typechecked, and placed in PCS.PST (see above). Slurping a
- fragment may require new interfaces to be read into HP. The hope
- is, though, that many fragments will never get slurped, reducing
- the total number of interfaces read (as compared to eager slurping).
-
- \end{itemize}
-
- PCS is opaque to CM; only @compile@ knows what's in it, and how to
- update it. Because packages are assumed static, PCS never becomes
- out of date. So CM only needs to be able to create an empty PCS,
- with @emptyPCS@, and thence just passes it through @compile@ with
- no further ado.
-
- In return, @compile@ must promise not to store in PCS any
- information pertaining to the home modules. If it did so, CM would
- need to have a way to remove this information prior to commencing a
- rebuild, which conflicts with PCS's opaqueness to CM.
-
-
-
-
-\subsubsection{What {\tt compile} does}
-@compile@ is necessarily somewhat complex. We've decided to do away
-with private global variables -- they make the design specification
-less clear, although the implementation might use them. Without
-further ado:
-\begin{verbatim}
- compile :: SI -- obvious
- -> Finder -- to find modules
- -> ModSummary -- summary, including source
- -> Maybe ModIFace
- -- former summary, if avail
- -> HST -- for home module ModDetails
- -> PCS -- IN: the persistent compiler state
-
- -> IO CompResult
-
- data CompResult
- = CompOK ModDetails -- new details (== HST additions)
- (Maybe (ModIFace, Linkable))
- -- summary and code; Nothing => compilation
- -- not needed (old summary and code are still valid)
- PCS -- updated PCS
- [SDoc] -- warnings
-
- | CompErrs PCS -- updated PCS
- [SDoc] -- warnings and errors
-
- data PCS
- = MkPCS PIT -- package interfaces
- PST -- post slurping global symtab contribs
- HoldingPen -- pre slurping interface bits and pieces
-
- emptyPCS :: IO PCS -- since CM has no other way to make one
-\end{verbatim}
-Although @compile@ is passed three of the global structures (FLAGS,
-HST and PCS), it only modifies PCS. The rest are modified by CM as it
-sees fit, from the stuff returned in the @CompResult@.
-
-@compile@ is allowed to return an updated PCS even if compilation
-errors occur, since the information in it pertains only to foreign
-packages and is assumed to be always-correct.
-
-What @compile@ does: \ToDo{A bit vague ... needs refining. How does
- @finder@ come into the game?}
-\begin{itemize}
-\item Figure out if this module needs recompilation.
- \begin{itemize}
- \item If there's no old @ModIFace@, it does. Else:
- \item Compare the @ModSummary@ supplied with that in the
- old @ModIFace@. If the source has changed, recompilation
- is needed. Else:
- \item Compare the usage version numbers in the old @ModIFace@ with
- those in the imported @ModIFace@s. All needed interfaces
- for this should be in either HIT or PIT. If any version
- numbers differ, recompilation is needed.
- \item Otherwise it isn't needed.
- \end{itemize}
-
-\item
- If recompilation is not needed, create a new @ModDetails@ from the
- old @ModIFace@, looking up information in HST and PCS.PST as
- necessary. Return the new details, a @Nothing@ denoting
- compilation was not needed, the PCS \ToDo{I don't think the PCS
- should be updated, but who knows?}, and an empty warning list.
-
-\item
- Otherwise, compilation is needed.
-
- If the module is only available in object+interface form, read the
- interface, make up details, create a linkable pointing at the
- object code. \ToDo{Does this involve reading any more interfaces? Does
- it involve updating PST?}
-
- Otherwise, translate from source, then create and return: an
- details, interface, linkable, updated PST, and warnings.
-
- When looking for a new interface, search HST, then PCS.PIT, and only
- then read from disk. In which case add the new interface(s) to
- PCS.PIT.
-
- \ToDo{If compiling a module with a boot-interface file, check the
- boot interface against the inferred interface.}
-\end{itemize}
-
-
-\subsubsection{Contents of \mbox{\tt ModDetails},
- \mbox{\tt ModIFace} and \mbox{\tt HoldingPen}}
-Only @compile@ can see inside these three types -- they are opaque to
-everyone else. @ModDetails@ holds the post-renaming,
-post-typechecking environment created by compiling a module.
-
-\begin{verbatim}
- data ModDetails
- = ModDetails {
- moduleExports :: Avails
- moduleEnv :: GlobalRdrEnv -- == FM RdrName [Name]
- typeEnv :: FM Name TyThing -- TyThing is in TcEnv.lhs
- instEnv :: InstEnv
- fixityEnv :: FM Name Fixity
- ruleEnv :: FM Id [Rule]
- }
-\end{verbatim}
-
-@ModIFace@ is nearly the same as @ParsedIFace@ from @RnMonad.lhs@:
-\begin{verbatim}
- type ModIFace = ParsedIFace -- not really, but ...
- data ParsedIface
- = ParsedIface {
- pi_mod :: Module, -- Complete with package info
- pi_vers :: Version, -- Module version number
- pi_orphan :: WhetherHasOrphans, -- Whether this module has orphans
- pi_usages :: [ImportVersion OccName], -- Usages
- pi_exports :: [ExportItem], -- Exports
- pi_insts :: [RdrNameInstDecl], -- Local instance declarations
- pi_decls :: [(Version, RdrNameHsDecl)], -- Local definitions
- pi_fixity :: (Version, [RdrNameFixitySig]), -- Local fixity declarations,
- -- with their version
- pi_rules :: (Version, [RdrNameRuleDecl]), -- Rules, with their version
- pi_deprecs :: [RdrNameDeprecation] -- Deprecations
- }
-\end{verbatim}
-
-@HoldingPen@ is a cleaned-up version of that found in @RnMonad.lhs@,
-retaining just the 3 pieces actually comprising the holding pen:
-\begin{verbatim}
- data HoldingPen
- = HoldingPen {
- iDecls :: DeclsMap, -- A single, global map of Names to decls
-
- iInsts :: IfaceInsts,
- -- The as-yet un-slurped instance decls; this bag is depleted when we
- -- slurp an instance decl so that we don't slurp the same one twice.
- -- Each is 'gated' by the names that must be available before
- -- this instance decl is needed.
-
- iRules :: IfaceRules
- -- Similar to instance decls, only for rules
- }
-\end{verbatim}
-
-%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
-\subsection{The linker (\mbox{\tt link})}
-\label{sec:linker}
-
-\subsubsection{Data structures owned by the linker}
-
-In the same way that @compile@ has a persistent compiler state (PCS),
-the linker has a persistent (session-lifetime) state, PLS, the
-Linker's Persistent State. In batch mode PLS is entirely irrelevant,
-because there is only a single link step, and can be a unit value
-ignored by everybody. In interactive mode PLS is composed of the
-following three parts:
-
-\begin{itemize}
-\item
-\textbf{The Source Symbol Table (SST)}@ :: FiniteMap RdrName HValue@
- The source symbol table is used when linking interpreted code.
- Unlinked interpreted code consists of an STG tree where
- the leaves are @RdrNames@. The linker's job is to resolve these to
- actual addresses (the alternative is to resolve these lazily when
- the code is run, but this requires passing the full symbol table
- through the interpreter and the repeated lookups will probably be
- expensive).
-
- The source symbol table therefore maps @RdrName@s to @HValue@s, for
- every @RdrName@ that currently \emph{has} an @HValue@, including all
- exported functions from object code modules that are currently
- linked in. Linking therefore turns a @StgTree RdrName@ into an
- @StgTree HValue@.
-
- It is important that we can prune this symbol table by throwing away
- the mappings for an entire module, whenever we recompile/relink a
- given module. The representation is therefore probably a two-level
- mapping, from module names, to function/constructor names, to
- @HValue@s.
-
-\item \textbf{The Object Symbol Table (OST)}@ :: FiniteMap String Addr@
- This is a lower level symbol table, mapping symbol names in object
- modules to their addresses in memory. It is used only when
- resolving the external references in an object module, and contains
- only entries that are defined in object modules.
-
- Why have two symbol tables? Well, there is a clear distinction
- between the two: the source symbol table maps Haskell symbols to
- Haskell values, and the object symbol table maps object symbols to
- addresses. There is some overlap, in that Haskell symbols certainly
- have addresses, and we could look up a Haskell symbol's address by
- manufacturing the right object symbol and looking that up in the
- object symbol table, but this is likely to be slow and would force
- us to extend the object symbol table with all the symbols
- ``exported'' by interpreted code. Doing it this way enables us to
- decouple the object management subsystem from the rest of the linker
- with a minimal interface; something like
-
- \begin{verbatim}
- loadObject :: Unlinked -> IO Object
- unloadModule :: Unlinked -> IO ()
- lookupSymbol :: String -> IO Addr
- \end{verbatim}
-
- Rather unfortunately we need @lookupSymbol@ in order to populate the
- source symbol table when linking in a new compiled module. Our
- object management subsystem is currently written in C, so decoupling
- this interface as much as possible is highly desirable.
-
-\item
- {\bf Linked Image (LI)} @:: no-explicit-representation@
-
- LI isn't explicitly represented in the system, but we record it
- here for completeness anyway. LI is the current set of
- linked-together module, package and other library fragments
- constituting the current executable mass. LI comprises:
- \begin{itemize}
- \item Machine code (@.o@, @.a@, @.DLL@ file images) in memory.
- These are loaded from disk when needed, and stored in
- @malloc@ville. To simplify storage management, they are
- never freed or reused, since this creates serious
- complications for storage management. When no longer needed,
- they are simply abandoned. New linkings of the same object
- code produces new copies in memory. We hope this not to be
- too much of a space leak.
- \item STG trees, which live in the GHCI heap and are managed by the
- storage manager in the usual way. They are held alive (are
- reachable) via the @HValue@s in the OST. Such @HValue@s are
- applications of the interpreter function to the trees
- themselves. Linking a tree comprises travelling over the
- tree, replacing all the @Id@s with pointers directly to the
- relevant @_closure@ labels, as determined by searching the
- OST. Once the leaves are linked, trees are wrapped with the
- interpreter function. The resulting @HValue@s then behave
- indistinguishably from compiled versions of the same code.
- \end{itemize}
- Because object code is outside the heap and never deallocated,
- whilst interpreted code is held alive via the HST, there's no need
- to have a data structure which ``is'' the linked image.
-
- For batch compilation, LI doesn't exist because OST doesn't exist,
- and because @link@ doesn't load code into memory, instead just
- invokes the system linker.
-
- \ToDo{Do we need to say anything about CAFs and SRTs? Probably ...}
-\end{itemize}
-As with PCS, CM has no way to create an initial PLS, so we supply
-@emptyPLS@ for that purpose.
-
-\subsubsection{The linker's interface}
-
-In practice, the PLS might be hidden in the I/O monad rather
-than passed around explicitly. (The same might be true for PCS).
-Anyway:
-
-\begin{verbatim}
- data PLS -- as described above; opaque to everybody except the linker
-
- link :: PCI -> ??? -> [[Linkable]] -> PLS -> IO LinkResult
-
- data LinkResult = LinkOK PLS
- | LinkErrs PLS [SDoc]
-
- emptyPLS :: IO PLS -- since CM has no other way to make one
-\end{verbatim}
-
-CM uses @link@ as follows:
-
-After repeatedly using @compile@ to compile all modules which are
-out-of-date, the @link@ is invoked. The @[[Linkable]]@ argument to
-@link@ represents the list of (recursive groups of) home modules which
-have been newly compiled, along with @Linkable@s for each of
-the packages in use (the compilation manager knows which external
-packages are referenced by the home package). The order of the list
-is important: it is sorted in such a way that linking any prefix of
-the list will result in an image with no unresolved references. Note
-that for batch linking there may be further restrictions; for example
-it may not be possible to link recursive groups containing libraries.
-
-@link@ does the following:
-
-\begin{itemize}
- \item
- In batch mode, do nothing. In interactive mode,
- examine the supplied @[[Linkable]]@ to determine which home
- module @Unlinked@s are new. Remove precisely these @Linkable@s
- from PLS. (In fact we really need to remove their upwards
- transitive closure, but I think it is an invariant that CM will
- supply an upwards transitive closure of new modules).
- See below for descriptions of @Linkable@ and @Unlinked@.
-
- \item
- Batch system: invoke the external linker to link everything in one go.
- Interactive: bind the @Unlinked@s for the newly compiled modules,
- plus those for any newly required packages, into PLS.
-
- Note that it is the linker's responsibility to remember which
- objects and packages have already been linked. By comparing this
- with the @Linkable@s supplied to @link@, it can determine which
- of the linkables in LI are out of date
-\end{itemize}
-
-If linking in of a group should fail for some reason, @link@ should
-not modify its PLS at all. In other words, linking each group
-is atomic; it either succeeds or fails.
-
-\subsubsection*{\mbox{\tt Unlinked} and \mbox{\tt Linkable}}
-
-Two important types: @Unlinked@ and @Linkable@. The latter is a
-higher-level representation involving multiple of the former.
-An @Unlinked@ is a reference to unlinked executable code, something
-a linker could take as input:
-
-\begin{verbatim}
- data Unlinked = DotO Path
- | DotA Path
- | DotDLL Path
- | Trees [StgTree RdrName]
-\end{verbatim}
-
-The first three describe the location of a file (presumably)
-containing the code to link. @Trees@, which only exists in
-interactive mode, gives a list of @StgTrees@, in which the unresolved
-references are @RdrNames@ -- hence it's non-linkedness. Once linked,
-those @RdrNames@ are replaced with pointers to the machine code
-implementing them.
-
-A @Linkable@ gathers together several @Unlinked@s and associates them
-with either a module or package:
-
-\begin{verbatim}
- data Linkable = LM Module [Unlinked] -- a module
- | LP PkgName -- a package
-\end{verbatim}
-
-The order of the @Unlinked@s in the list is important, as
-they are linked in left-to-right order. The @Unlinked@ objects for a
-particular package can be obtained from the package configuration (see
-Section \ref{sec:staticinfo}).
-
-\ToDo{When adding @Addr@s from an object module to SST, we need to
- somehow find out the @RdrName@s of the symbols exported by that
- module.
- So we'd need to pass in the @ModDetails@ or @ModIFace@ or some such?}
-
-
-
-%%-----------------------------------------------------------------%%
-\section{Background ideas}
-\subsubsection*{Out of date, but correct in spirit}
-
-\subsection{Restructuring the system}
-
-At the moment @hsc@ compiles one source module into C or assembly.
-This functionality is pushed inside a function called @compile@,
-introduced shortly. The main new chunk of code is CM, the compilation manager,
-which supervises multiple runs of @compile@ so as to create up-to-date
-translations of a whole bunch of modules, as quickly as possible.
-CM also employs some minor helper functions, @finder@, @summarise@ and
-@link@, to do its work.
-
-Our intent is to allow CM to be used as the basis either of a
-multi-module, batch mode compilation system, or to supply an
-interactive environment similar to that of Hugs.
-Only minor modifications to the behaviour of @compile@ and @link@
-are needed to give these different behaviours.
-
-CM and @compile@, and, for interactive use, an interpreter, are the
-main code components. The most important data structure is the global
-symbol table; much design effort has been expended thereupon.
-
-
-\subsection{How the global symbol table is implemented}
-
-The top level symbol table is a @FiniteMap@ @ModuleName@
-@ModuleDetails@. @ModuleDetails@ contains essentially the environment
-created by compiling a module. CM manages this finite map, adding and
-deleting module entries as required.
-
-The @ModuleDetails@ for a module @M@ contains descriptions of all
-tycons, classes, instances, values, unfoldings, etc (henceforth
-referred to as ``entities''), available from @M@. These are just
-trees in the GHCI heap. References from other modules to these
-entities is direct -- when you have a @TyCon@ in your hand, you really
-have a pointer directly to the @TyCon@ structure in the defining module,
-rather than some kind of index into a global symbol table. So there
-is a global symbol table, but it has a distributed (sphagetti-like?)
-nature.
-
-This gives fast and convenient access to tycon, class, instance,
-etc, information. But because there are no levels of indirection,
-there's a problem when we replace @M@ with an updated version of @M@.
-We then need to find all references to entities in the old @M@'s
-sphagetti, and replace them with pointers to the new @M@'s sphagetti.
-This problem motivates a large part of the design.
-
-
-
-\subsection{Implementing incremental recompilation -- simple version}
-Given the following module graph
-\begin{verbatim}
- D
- / \
- / \
- B C
- \ /
- \ /
- A
-\end{verbatim}
-(@D@ imports @B@ and @C@, @B@ imports @A@, @C@ imports @A@) the aim is to do the
-least possible amount of compilation to bring @D@ back up to date. The
-simplest scheme we can think of is:
-\begin{itemize}
-\item {\bf Downsweep}:
- starting with @D@, re-establish what the current module graph is
- (it might have changed since last time). This means getting a
- @ModuleSummary@ of @D@. The summary can be quickly generated,
- contains @D@'s import lists, and gives some way of knowing whether
- @D@'s source has changed since the last time it was summarised.
-
- Transitively follow summaries from @D@, thereby establishing the
- module graph.
-\item
- Remove from the global symbol table (the @FiniteMap@ @ModuleName@
- @ModuleDetails@) the upwards closure of all modules in this package
- which are out-of-date with respect to their previous versions. Also
- remove all modules no longer reachable from @D@.
-\item {\bf Upsweep}:
- Starting at the lowest point in the still-in-date module graph,
- start compiling upwards, towards @D@. At each module, call
- @compile@, passing it a @FiniteMap@ @ModuleName@ @ModuleDetails@,
- and getting a new @ModuleDetails@ for the module, which is added to
- the map.
-
- When compiling a module, the compiler must be able to know which
- entries in the map are for modules in its strict downwards closure,
- and which aren't, so that it can manufacture the instance
- environment correctly (as union of instances in its downwards
- closure).
-\item
- Once @D@ has been compiled, invoke some kind of linking phase
- if batch compilation. For interactive use, can either do it all
- at the end, or as you go along.
-\end{itemize}
-In this simple world, recompilation visits the upwards closure of
-all changed modules. That means when a module @M@ is recompiled,
-we can be sure no-one has any references to entities in the old @M@,
-because modules importing @M@ will have already been removed from the
-top-level finite map in the second step above.
-
-The upshot is that we don't need to worry about updating links to @M@ in
-the global symbol table -- there shouldn't be any to update.
-\ToDo{What about mutually recursive modules?}
-
-CM will happily chase through module interfaces in other packages in
-the downsweep. But it will only process modules in this package
-during the upsweep. So it assumes that modules in other packages
-never become out of date. This is a design decision -- we could have
-decided otherwise.
-
-In fact we go further, and require other packages to be compiled,
-i.e. to consist of a collection of interface files, and one or more
-source files. CM will never apply @compile@ to a foreign package
-module, so there's no way a package can be built on the fly from source.
-
-We require @compile@ to cache foreign package interfaces it reads, so
-that subsequent uses don't have to re-read them. The cache never
-becomes out of date, since we've assumed that the source of foreign
-packages doesn't change during the course of a session (run of GHCI).
-As well as caching interfaces, @compile@ must cache, in some sense,
-the linkable code for modules. In batch compilation this might simply
-mean remembering the names of object files to link, whereas in
-interactive mode @compile@ probably needs to load object code into
-memory in preparation for in-memory linking.
-
-Important signatures for this simple scheme are:
-\begin{verbatim}
- finder :: ModuleName -> ModLocation
-
- summarise :: ModLocation -> IO ModSummary
-
- compile :: ModSummary
- -> FM ModName ModDetails
- -> IO CompileResult
-
- data CompileResult = CompOK ModDetails
- | CompErr [ErrMsg]
-
- link :: [ModLocation] -> [PackageLocation] -> IO Bool -- linked ok?
-\end{verbatim}
-
-
-\subsection{Implementing incremental recompilation -- clever version}
-
-So far, our upsweep, which is the computationally expensive bit,
-recompiles a module if either its source is out of date, or it
-imports a module which has been recompiled. Sometimes we know
-we can do better than this:
-\begin{verbatim}
- module B where module A
- import A ( f ) {-# NOINLINE f #-}
- ... f ... f x = x + 42
-\end{verbatim}
-If the definition of @f@ is changed to @f x = x + 43@, the simple
-upsweep would recompile @B@ unnecessarily. We would like to detect
-this situation and avoid propagating recompilation all the way to the
-top. There are two parts to this: detecting when a module doesn't
-need recompilation, and managing inter-module references in the
-global symbol table.
-
-\subsubsection*{Detecting when a module doesn't need recompilation}
-
-To do this, we introduce a new concept: the @ModuleIFace@. This is
-effectively an in-memory interface file. References to entities in
-other modules are done via strings, rather than being pointers
-directly to those entities. Recall that, by comparison,
-@ModuleDetails@ do contain pointers directly to the entities they
-refer to. So a @ModuleIFace@ is not part of the global symbol table.
-
-As before, compiling a module produces a @ModuleDetails@ (inside the
-@CompileResult@), but it also produces a @ModuleIFace@. The latter
-records, amongst things, the version numbers of all imported entities
-needed for the compilation of that module. @compile@ optionally also
-takes the old @ModuleIFace@ as input during compilation:
-\begin{verbatim}
- data CompileResult = CompOK ModDetails ModIFace
- | CompErr [ErrMsg]
-
- compile :: ModSummary
- -> FM ModName ModDetails
- -> Maybe ModuleIFace
- -> IO CompileResult
-\end{verbatim}
-Now, if the @ModuleSummary@ indicates this module's source hasn't
-changed, we only need to recompile it if something it depends on has
-changed. @compile@ can detect this by inspecting the imported entity
-version numbers in the module's old @ModuleIFace@, and comparing them
-with the version numbers from the entities in the modules being
-imported. If they are all the same, nothing it depends on has
-changed, so there's no point in recompiling.
-
-\subsubsection*{Managing inter-module references in the global symbol table}
-
-In the above example with @A@, @B@ and @f@, the specified change to @f@ would
-require @A@ but not @B@ to be recompiled. That generates a new
-@ModuleDetails@ for @A@. Problem is, if we leave @B@'s @ModuleDetails@
-unchanged, they continue to refer (directly) to the @f@ in @A@'s old
-@ModuleDetails@. This is not good, especially if equality between
-entities is implemented using pointer equality.
-
-One solution is to throw away @B@'s @ModuleDetails@ and recompile @B@.
-But this is precisely what we're trying to avoid, as it's expensive.
-Instead, a cheaper mechanism achieves the same thing: recreate @B@'s
-details directly from the old @ModuleIFace@. The @ModuleIFace@ will
-(textually) mention @f@; @compile@ can then find a pointer to the
-up-to-date global symbol table entry for @f@, and place that pointer
-in @B@'s @ModuleDetails@. The @ModuleDetails@ are, therefore,
-regenerated just by a quick lookup pass over the module's former
-@ModuleIFace@. All this applies, of course, only when @compile@ has
-concluded it doesn't need to recompile @B@.
-
-Now @compile@'s signature becomes a little clearer. @compile@ has to
-recompile the module, generating a fresh @ModuleDetails@ and
-@ModuleIFace@, if any of the following hold:
-\begin{itemize}
-\item
- The old @ModuleIFace@ wasn't supplied, for some reason (perhaps
- we've never compiled this module before?)
-\item
- The module's source has changed.
-\item
- The module's source hasn't changed, but inspection of @ModuleIFaces@
- for this and its imports indicates that an imported entity has
- changed.
-\end{itemize}
-If none of those are true, we're in luck: quickly knock up a new
-@ModuleDetails@ from the old @ModuleIFace@, and return them both.
-
-As a result, the upsweep still visits all modules in the upwards
-closure of those whose sources have changed. However, at some point
-we hopefully make a transition from generating new @ModuleDetails@ the
-expensive way (recompilation) to a cheap way (recycling old
-@ModuleIFaces@). Either way, all modules still get new
-@ModuleDetails@, so the global symbol table is correctly
-reconstructed.
-
-
-\subsection{How linking works, roughly}
-
-When @compile@ translates a module, it produces a @ModuleDetails@,
-@ModuleIFace@ and a @Linkable@. The @Linkable@ contains the
-translated but un-linked code for the module. And when @compile@
-ventures into an interface in package it hasn't seen so far, it
-copies the package's object code into memory, producing one or more
-@Linkable@s. CM keeps track of these linkables.
-
-Once all modules have been @compile@d, CM invokes @link@, supplying
-the all the @Linkable@s it knows about. If @compile@ had also been
-linking incrementally as it went along, @link@ doesn't have to do
-anything. On the other hand, @compile@ could choose not to be
-incremental, and leave @link@ to do all the work.
-
-@Linkable@s are opaque to CM. For batch compilation, a @Linkable@
-can record just the name of an object file, DLL, archive, or whatever,
-in which case the CM's call to @link@ supplies exactly the set of
-file names to be linked. @link@ can pass these verbatim to the
-standard system linker.
-
-
-
-
-%%-----------------------------------------------------------------%%
-\section{Ancient stuff}
-\subsubsection*{Should be selectively merged into ``Background ideas''}
-
-\subsection{Overall}
-Top level structure is:
-\begin{itemize}
-\item The Compilation Manager (CM) calculates and maintains module
- dependencies, and knows how create up-to-date object or bytecode
- for a given module. In doing so it may need to recompile
- arbitrary other modules, based on its knowledge of the module
- dependencies.
-\item On top of the CM are the ``user-level'' services. We envisage
- both a HEP-like interface, for interactive use, and an
- @hmake@ style batch compiler facility.
-\item The CM only deals with inter-module issues. It knows nothing
- about how to recompile an individual module, nor where the compiled
- result for a module lives, nor how to tell if
- a module is up to date, nor how to find the dependencies of a module.
- Instead, these services are supplied abstractly to CM via a
- @Compiler@ record. To a first approximation, a @Compiler@
- contains
- the same functionality as @hsc@ has had until now -- the ability to
- translate a single Haskell module to C/assembly/object/bytecode.
-
- Different clients of CM (HEP vs @hmake@) may supply different
- @Compiler@s, since they need slightly different behaviours.
- Specifically, HEP needs a @Compiler@ which creates bytecode
- in memory, and knows how to link it, whereas @hmake@ wants
- the traditional behaviour of emitting assembly code to disk,
- and making no attempt at linkage.
-\end{itemize}
-
-\subsection{Open questions}
-\begin{itemize}
-\item
- Error reporting from @open@ and @compile@.
-\item
- Instance environment management
-\item
- We probably need to make interface files say what
- packages they depend on (so that we can figure out
- which packages to load/link).
-\item
- CM is parameterised both by the client uses and the @Compiler@
- supplied. But it doesn't make sense to have a HEP-style client
- attached to a @hmake@-style @Compiler@. So, really, the
- parameterising entity should contain both aspects, not just the
- current @Compiler@ contents.
-\end{itemize}
-
-\subsection{Assumptions}
-
-\begin{itemize}
-\item Packages other than the "current" one are assumed to be
- already compiled.
-\item
- The "current" package is usually "MAIN",
- but we can set it with a command-line flag.
- One invocation of ghci has only one "current" package.
-\item
- Packages are not mutually recursive
-\item
- All the object code for a package P is in libP.a or libP.dll
-\end{itemize}
-
-\subsection{Stuff we need to be able to do}
-\begin{itemize}
-\item Create the environment in which a module has been translated,
- so that interactive queries can be satisfied as if ``in'' that
- module.
-\end{itemize}
-
-%%-----------------------------------------------------------------%%
-\section{The Compilation Manager}
-
-CM (@compilationManager@) is a functor, thus:
-\begin{verbatim}
-compilationManager :: Compiler -> IO HEP -- IO so that it can create
- -- global vars (IORefs)
-
-data HEP = HEP {
- load :: ModuleName -> IO (),
- compileString :: ModuleName -> String -> IO HValue,
- ....
- }
-
-newCompiler :: IO Compiler -- ??? this is a peer of compilationManager?
-
-run :: HValue -> IO () -- Run an HValue of type IO ()
- -- In HEP?
-\end{verbatim}
-
-@load@ is the central action of CM: its job is to bring a module and
-all its descendents into an executable state, by doing the following:
-\begin{enumerate}
-\item
- Use @summarise@ to descend the module hierarchy, starting from the
- nominated root, creating @ModuleSummary@s, and
- building a map @ModuleName@ @->@ @ModuleSummary@. @summarise@
- expects to be passed absolute paths to files. Use @finder@ to
- convert module names to file paths.
-\item
- Topologically sort the map,
- using dependency info in the @ModuleSummary@s.
-\item
- Clean up the symbol table by deleting the upward closure of
- changed modules.
-\item
- Working bottom to top, call @compile@ on the upward closure of
- all modules whose source has changed. A module's source has
- changed when @sourceHasChanged@ indicates there is a difference
- between old and new summaries for the module. Update the running
- @FiniteMap@ @ModuleName@ @ModuleDetails@ with the new details
- for this module. Ditto for the running
- @FiniteMap@ @ModuleName@ @ModuleIFace@.
-\item
- Call @compileDone@ to signify that we've reached the top, so
- that the batch system can now link.
-\end{enumerate}
-
-
-%%-----------------------------------------------------------------%%
-\section{A compiler}
-
-Most of the system's complexity is hidden inside the functions
-supplied in the @Compiler@ record:
-\begin{verbatim}
-data Compiler = Compiler {
-
- finder :: PackageConf -> [Path] -> IO (ModuleName -> ModuleLocation)
-
- summarise :: ModuleLocation -> IO ModuleSummary
-
- compile :: ModuleSummary
- -> Maybe ModuleIFace
- -> FiniteMap ModuleName ModuleDetails
- -> IO CompileResult
-
- compileDone :: IO ()
- compileStarting :: IO () -- still needed? I don't think so.
- }
-
-type ModuleName = String (or some such)
-type Path = String -- an absolute file name
-\end{verbatim}
-
-\subsection{The module \mbox{\tt finder}}
-The @finder@, given a package configuration file and a list of
-directories to look in, will map module names to @ModuleLocation@s,
-in which the @Path@s are filenames, probably with an absolute path
-to them.
-\begin{verbatim}
-data ModuleLocation = SourceOnly Path -- .hs
- | ObjectCode Path Path -- .o & .hi
- | InPackage Path -- .hi
-\end{verbatim}
-@SourceOnly@ and @ObjectCode@ are unremarkable. For sanity,
-we require that a module's object and interface be in the same
-directory. @InPackage@ indicates that the module is in a
-different package.
-
-@Module@ values -- perhaps all @Name@ish things -- contain the name of
-their package. That's so that
-\begin{itemize}
-\item Correct code can be generated for in-DLL vs out-of-DLL refs.
-\item We don't have version number dependencies for symbols
- imported from different packages.
-\end{itemize}
-
-Somehow or other, it will be possible to know all the packages
-required, so that the for the linker can load them.
-We could detect package dependencies by recording them in the
-@compile@r's @ModuleIFace@ cache, and with that and the
-package config info, figure out the complete set of packages
-to link. Or look at the command line args on startup.
-
-\ToDo{Need some way to tell incremental linkers about packages,
- since in general we'll need to load and link them before
- linking any modules in the current package.}
-
-
-\subsection{The module \mbox{\tt summarise}r}
-Given a filename of a module (\ToDo{presumably source or iface}),
-create a summary of it. A @ModuleSummary@ should contain only enough
-information for CM to construct an up-to-date picture of the
-dependency graph. Rather than expose CM to details of timestamps,
-etc, @summarise@ merely provides an up-to-date summary of any module.
-CM can extract the list of dependencies from a @ModuleSummary@, but
-other than that has no idea what's inside it.
-\begin{verbatim}
-data ModuleSummary = ... (abstract) ...
-
-depsFromSummary :: ModuleSummary -> [ModuleName] -- module names imported
-sourceHasChanged :: ModuleSummary -> ModuleSummary -> Bool
-\end{verbatim}
-@summarise@ is intended to be fast -- a @stat@ of the source or
-interface to see if it has changed, and, if so, a quick semi-parse to
-determine the new imports.
-
-\subsection{The module \mbox{\tt compile}r}
-@compile@ traffics in @ModuleIFace@s and @ModuleDetails@.
-
-A @ModuleIFace@ is an in-memory representation of the contents of an
-interface file, including version numbers, unfoldings and pragmas, and
-the linkable code for the module. @ModuleIFace@s are un-renamed,
-using @HsSym@/@RdrNames@ rather than (globally distinct) @Names@.
-
-@ModuleDetails@, by contrast, is an in-memory representation of the
-static environment created by compiling a module. It is phrased in
-terms of post-renaming @Names@, @TyCon@s, etc, so it's basically a
-renamed-to-global-uniqueness rendition of a @ModuleIFace@.
-
-In an interactive session, we'll want to be able to evaluate
-expressions as if they had been compiled in the scope of some
-specified module. This means that the @ModuleDetails@ must contain
-the type of everything defined in the module, rather than just the
-types of exported stuff. As a consequence, @ModuleIFace@ must also
-contain the type of everything, because it should always be possible
-to generate a module's @ModuleDetails@ from its @ModuleIFace@.
-
-CM maintains two mappings, one from @ModuleName@s to @ModuleIFace@s,
-the other from @ModuleName@s to @ModuleDetail@s. It passes the former
-to each call of @compile@. This is used to supply information about
-modules compiled prior to this one (lower down in the graph). The
-returned @CompileResult@ supplies a new @ModuleDetails@ for the module
-if compilation succeeded, and CM adds this to the mapping. The
-@CompileResult@ also supplies a new @ModuleIFace@, which is either the
-same as that supplied to @compile@, if @compile@ decided not to
-retranslate the module, or is the result of a fresh translation (from
-source). So these mappings are an explicitly-passed-around part of
-the global system state.
-
-@compile@ may also {\em optionally} also accumulate @ModuleIFace@s for
-modules in different packages -- that is, interfaces which we read,
-but never attempt to recompile source for. Such interfaces, being
-from foreign packages, never change, so @compile@ can accumulate them
-in perpetuity in a private global variable. Indeed, a major motivator
-of this design is to facilitate this caching of interface files,
-reading of which is a serious bottleneck for the current compiler.
-
-When CM restarts compilation down at the bottom of the module graph,
-it first needs to throw away all \ToDo{all?} @ModuleDetails@ in the
-upward closure of the out-of-date modules. So @ModuleDetails@ don't
-persist across recompilations. But @ModuleIFace@s do, since they
-are conceptually equivalent to interface files.
-
-
-\subsubsection*{What @compile@ returns}
-@compile@ returns a @CompileResult@ to CM.
-Note that the @compile@'s foreign-package interface cache can
-become augmented even as a result of reading interfaces for a
-compilation attempt which ultimately fails, although it will not be
-augmented with a new @ModuleIFace@ for the failed module.
-\begin{verbatim}
--- CompileResult is not abstract to the Compilation Manager
-data CompileResult
- = CompOK ModuleIFace
- ModuleDetails -- compiled ok, here are new details
- -- and new iface
-
- | CompErr [SDoc] -- compilation gave errors
-
- | NoChange -- no change required, meaning:
- -- exports, unfoldings, strictness, etc,
- -- unchanged, and executable code unchanged
-\end{verbatim}
-
-
-
-\subsubsection*{Re-establishing local-to-global name mappings}
-Consider
-\begin{verbatim}
-module Upper where module Lower ( f ) where
-import Lower ( f ) f = ...
-g = ... f ...
-\end{verbatim}
-When @Lower@ is first compiled, @f@ is allocated a @Unique@
-(presumably inside an @Id@ or @Name@?). When @Upper@ is then
-compiled, its reference to @f@ is attached directly to the
-@Id@ created when compiling @Lower@.
-
-If the definition of @f@ is now changed, but not the type,
-unfolding, strictness, or any other thing which affects the way
-it should be called, we will have to recompile @Lower@, but not
-@Upper@. This creates a problem -- @g@ will then refer to the
-the old @Id@ for @f@, not the new one. This may or may not
-matter, but it seems safer to ensure that all @Unique@-based
-references into child modules are always up to date.
-
-So @compile@ recreates the @ModuleDetails@ for @Upper@ from
-the @ModuleIFace@ of @Upper@ and the @ModuleDetails@ of @Lower@.
-
-The rule is: if a module is up to date with respect to its
-source, but a child @C@ has changed, then either:
-\begin{itemize}
-\item On examination of the version numbers in @C@'s
- interface/@ModuleIFace@ that we used last time, we discover that
- an @Id@/@TyCon@/class/instance we depend on has changed. So
- we need to retranslate the module from its source, generating
- a new @ModuleIFace@ and @ModuleDetails@.
-\item Or: there's nothing in @C@'s interface that we depend on.
- So we quickly recreate a new @ModuleDetails@ from the existing
- @ModuleIFace@, creating fresh links to the new @Unique@-world
- entities in @C@'s new @ModuleDetails@.
-\end{itemize}
-
-Upshot: we need to redo @compile@ on all modules all the way up,
-rather than just the ones that need retranslation. However, we hope
-that most modules won't need retranslation -- just regeneration of the
-@ModuleDetails@ from the @ModuleIFace@. In effect, the @ModuleIFace@
-is a quickly-compilable representation of the module's contents, just
-enough to create the @ModuleDetails@.
-
-\ToDo{Is there anything in @ModuleDetails@ which can't be
- recreated from @ModuleIFace@ ?}
-
-So the @ModuleIFace@s persist across calls to @HEP.load@, whereas
-@ModuleDetails@ are reconstructed on every compilation pass. This
-means that @ModuleIFace@s have the same lifetime as the byte/object
-code, and so should somehow contain their code.
-
-The behind-the-scenes @ModuleIFace@ cache has some kind of holding-pen
-arrangement, to lazify the copying-out of stuff from it, and thus to
-minimise redundant interface reading. \ToDo{Burble burble. More
-details.}.
-
-When CM starts working back up the module graph with @compile@, it
-needs to remove from the travelling @FiniteMap@ @ModuleName@
-@ModuleDetails@ the details for all modules in the upward closure of
-the compilation start points. However, since we're going to visit
-precisely those modules and no others on the way back up, we might as
-well just zap them the old @ModuleDetails@ incrementally. This does
-mean that the @FiniteMap@ @ModuleName@ @ModuleDetails@ will be
-inconsistent until we reach the top.
-
-In interactive mode, each @compile@ call on a module for which no
-object code is available, or for which it is out of date wrt source,
-emit bytecode into memory, update the resulting @ModuleIFace@ with the
-address of the bytecode image, and link the image.
-
-In batch mode, emit assembly or object code onto disk. Record
-somewhere \ToDo{where?} that this object file needs to go into the
-final link.
-
-When we reach the top, @compileDone@ is called, to signify that batch
-linking can now proceed, if need be.
-
-Modules in other packages never get a @ModuleIFace@ or @ModuleDetails@
-entry in CM's maps -- those maps are only for modules in this package.
-As previously mentioned, @compile@ may optionally cache @ModuleIFace@s
-for foreign package modules. When reading such an interface, we don't
-need to read the version info for individual symbols, since foreign
-packages are assumed static.
-
-\subsubsection*{What's in a \mbox{\tt ModuleIFace}?}
-
-Current interface file contents?
-
-
-\subsubsection*{What's in a \mbox{\tt ModuleDetails}?}
-
-There is no global symbol table @:: Name -> ???@. To look up a
-@Name@, first extract the @ModuleName@ from it, look that up in
-the passed-in @FiniteMap@ @ModuleName@ @ModuleDetails@,
-and finally look in the relevant @Env@.
-
-\ToDo{Do we still have the @HoldingPen@, or is it now composed from
-per-module bits too?}
-\begin{verbatim}
-data ModuleDetails = ModuleDetails {
-
- moduleExports :: what it exports (Names)
- -- roughly a subset of the .hi file contents
-
- moduleEnv :: RdrName -> Name
- -- maps top-level entities in this module to
- -- globally distinct (Uniq-ified) Names
-
- moduleDefs :: Bag Name -- All the things in the global symbol table
- -- defined by this module
-
- package :: Package -- what package am I in?
-
- lastCompile :: Date -- of last compilation
-
- instEnv :: InstEnv -- local inst env
- typeEnv :: Name -> TyThing -- local tycon env?
- }
-
--- A (globally unique) symbol table entry. Note that Ids contain
--- unfoldings.
-data TyThing = AClass Class
- | ATyCon TyCon
- | AnId Id
-\end{verbatim}
-What's the stuff in @ModuleDetails@ used for?
-\begin{itemize}
-\item @moduleExports@ so that the stuff which is visible from outside
- the module can be calculated.
-\item @moduleEnv@: \ToDo{umm err}
-\item @moduleDefs@: one reason we want this is so that we can nuke the
- global symbol table contribs from this module when it leaves the
- system. \ToDo{except ... we don't have a global symbol table any
- more.}
-\item @package@: we will need to chase arbitrarily deep into the
- interfaces of other packages. Of course we don't want to
- recompile those, but as we've read their interfaces, we may
- as well cache that info. So @package@ indicates whether this
- module is in the default package, or, if not, which it is in.
-
- Also, when we come to linking, we'll need to know which
- packages are demanded, so we know to load their objects.
-
-\item @lastCompile@: When the module was last compiled. If the
- source is older than that, then a recompilation can only be
- required if children have changed.
-\item @typeEnv@: obvious??
-\item @instEnv@: the instances contributed by this module only. The
- Report allegedly says that when a module is translated, the
- available
- instance env is all the instances in the downward closure of
- itself in the module graph.
-
- We choose to use this simple representation -- each module
- holds just its own instances -- and do the naive thing when
- creating an inst env for compilation with. If this turns out
- to be a performance problem we'll revisit the design.
-\end{itemize}
-
-
-
-%%-----------------------------------------------------------------%%
-\section{Misc text looking for a home}
-
-\subsection*{Linking}
-
-\ToDo{All this linking stuff is now bogus.}
-
-There's an abstract @LinkState@, which is threaded through the linkery
-bits. CM can call @addpkgs@ to notify the linker of packages
-required, and it can call @addmods@ to announce modules which need to
-be linked. Finally, CM calls @endlink@, after which an executable
-image should be ready. The linker may link incrementally, during each
-call of @addpkgs@ and @addmods@, or it can just store up names and do
-all the linking when @endlink@ is called.
-
-In order that incremental linking is possible, CM should specify
-packages and module groups in dependency order, ie, from the bottom up.
-
-\subsection*{In-memory linking of bytecode}
-When being HEP-like, @compile@ will translate sources to bytecodes
-in memory, with all the bytecode for a module as a contiguous lump
-outside the heap. It needs to communicate the addresses of these
-lumps to the linker. The linker also needs to know whether a
-given module is available as in-memory bytecode, or whether it
-needs to load machine code from a file.
-
-I guess @LinkState@ needs to map module names to base addresses
-of their loaded images, + the nature of the image, + whether or not
-the image has been linked.
-
-\subsection*{On disk linking of object code, to give an executable}
-The @LinkState@ in this case is just a list of module and package
-names, which @addpkgs@ and @addmods@ add to. The final @endlink@
-call can invoke the system linker.
-
-\subsection{Finding out about packages, dependencies, and auxiliary
- objects}
-
-Ask the @packages.conf@ file that lives with the driver at the mo.
-
-\ToDo{policy about upward closure?}
-
-
-
-\ToDo{record story about how in memory linking is done.}
-
-\ToDo{linker start/stop/initialisation/persistence. Need to
- say more about @LinkState@.}
-
-
-\end{document}
-
-