%%%\tableofcontents
%%%\newpage
-
%%-----------------------------------------------------------------%%
+\section*{Misc text looking for a home}
+
+\subsubsection*{Starting up}
+Some of the session-lifetime data structures are opaque to CM, so
+it doesn't know how to create an initial one. Hence it relies on its
+client to supply the following:
+\begin{verbatim}
+ emptyPCS :: PCS
+ emptyOST :: OST
+\end{verbatim}
+The PCS is maintained solely by @compile@, and OST solely by
+@link@/@unlink@. CM cannot know the representation of the latter
+since it depends on whether we're operating in interactive or batch
+mode.
+
+
+@compile@ is supplied with, and checks PIT (inside PCS) before
+reading package interfaces, so it doesn't read and add duplicate
+@ModIFace@s to PIT.
+
+
\section{Details}
-\subsection{Data structures and lifetimes}
+\subsection{Outline of the design}
+\label{sec:details-intro}
-About lifetimes:
+The design falls into three major parts:
\begin{itemize}
-\item {\bf Session} lifetime covers a complete run of GHCI,
- encompassing multiple recompilation runs.
-\item {\bf Rebuild} lifetime covers the actions needed to bring
- the target module up to date -- a downsweep from the
- target to reestablish the module graph, and an upsweep to
- bring the translations (compiled code) and global symbol
- table back up to date.
-\item {\bf Module} lifetime: that of data needed to translate
- a single module, but then discarded, for example Core,
- AbstractC, Stix trees.
+\item The compilation manager (CM), which coordinates the
+ system and supplies a HEP-like interface to clients.
+\item The module compiler (@compile@), which translates individual
+ modules to interpretable or machine code.
+\item The linker (@link@),
+ which maintains the executable image in interpreted mode.
\end{itemize}
-Structures with module lifetime are well documented and understood.
-Here we're really interested in structures with session and rebuild
-lifetimes. Most of these structures are ``owned'' by CM, since that's
+There are also three auxiliary parts: the finder, which locates
+source, object and interface files, the summariser, which quickly
+finds dependency information for modules, and the static info
+(compiler flags and package details), which is unchanged over the
+course of a session.
+
+This section continues with an overview of the session-lifetime data
+structures. Then follows the finder (section~\ref{sec:finder}),
+summariser (section~\ref{sec:summariser}),
+static info (section~\ref{sec:staticinfo}),
+and finally the three big sections
+(\ref{sec:manager},~\ref{sec:compiler},~\ref{sec:linker})
+on the compilation manager, compiler and linker respectively.
+
+\subsubsection*{Some terminology}
+
+Lifetimes: the phrase {\bf session lifetime} covers a complete run of
+GHCI, encompassing multiple recompilation runs. {\bf Module lifetime}
+is a lot shorter, being that of data needed to translate a single
+module, but then discarded, for example Core, AbstractC, Stix trees.
+
+Data structures with module lifetime are well documented and understood.
+This document is mostly concerned with session-lifetime data.
+Most of these structures are ``owned'' by CM, since that's
the only major component of GHCI which deals with session-lifetime
issues.
-Terminology: ``home'' refers to modules in this package, precisely
-the ones tracked and updated by CM. ``Package'' refers to all other
-packages, which are assumed static.
+Modules and packages: {\bf home} refers to modules in this package,
+precisely the ones tracked and updated by the compilation manager.
+{\bf Package} refers to all other packages, which are assumed static.
-New data structures are:
+\subsubsection*{A summary of all session-lifetime data structures}
+
+These structures have session lifetime but not necessarily global
+visibility. Subsequent sections elaborate who can see what.
\begin{itemize}
+\item {\bf Home Symbol Table (HST)} (owner: CM) holds the post-renaming
+ environments created by compiling each home module.
+\item {\bf Home Interface Table (HIT)} (owner: CM) holds in-memory
+ representations of the interface file created by compiling
+ each home module.
+\item {\bf Unlinked Images (UI)} (owner: CM) are executable but as-yet
+ unlinked translations of home modules only.
+\item {\bf Module Graph (MG)} (owner: CM) is the current module graph.
+\item {\bf Static Info (SI)} (owner: CM) is the package configuration
+ information and compiler flags.
+\item {\bf Persistent Compiler State (PCS)} (owner: @compile@)
+ is @compile@'s private cache of information about package
+ modules.
+\item {\bf Persistent Linker State (PLS)} (owner: @link@) is
+ @link@'s private information concerning the the current
+ state of the (in-memory) executable image.
+\end{itemize}
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The finder (\mbox{\tt type Finder})}
+\label{sec:finder}
+
+@Path@ could be an indication of a location in a filesystem, or it
+could be some more generic kind of resource identifier, a URL for
+example.
+\begin{verbatim}
+ data Path = ...
+\end{verbatim}
+And some names. @Module@s are now used as primary keys for various
+maps, so they are given a @Unique@.
+\begin{verbatim}
+ type ModName = String -- a module name
+ type PkgName = String -- a package name
+ type Module = -- contains ModName and a Unique, at least
+\end{verbatim}
+
+A @ModLocation@ says where a module is, what it's called and in what
+form it it.
+\begin{verbatim}
+ data ModLocation = SourceOnly Module Path -- .hs
+ | ObjectCode Module Path Path -- .o, .hi
+ | InPackage Module PkgName
+ -- examine PCI to determine package Path
+\end{verbatim}
+
+The module finder generates @ModLocation@s from @ModName@s. We expect
+it will assume packages to be static, but we want to be able to track
+changes in home modules during the session. Specifically, we want to
+be able to notice that a module's object and interface have been
+updated, presumably by a compile run outside of the GHCI session.
+Hence the two-stage type:
+\begin{verbatim}
+ type Finder = ModName -> IO ModLocation
+ newFinder :: [PCI] -> IO Finder
+\end{verbatim}
+@newFinder@ examines the package information right at the start, but
+returns an @IO@-typed function which can inspect home module changes
+later in the session.
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The summariser (\mbox{\tt summarise})}
+\label{sec:summariser}
+
+A @ModSummary@ records the minimum information needed to establish the
+module graph and determine whose source has changed. @ModSummary@s
+can be created quickly.
+\begin{verbatim}
+ data ModSummary = ModSummary
+ ModLocation -- location and kind
+ Maybe (String, Fingerprint)
+ -- source and fingerprint if .hs
+ [ModName] -- imports
+
+ type Fingerprint = ... -- file timestamp, or source checksum?
+
+ summarise :: ModLocation -> IO ModSummary
+\end{verbatim}
+
+The summary contains the location and source text, and the location
+contains the name. We would like to remove the assumption that
+sources live on disk, but I'm not sure this is good enough yet.
+
+\ToDo{Should @ModSummary@ contain source text for interface files too?}
+\ToDo{Also say that @ModIFace@ contains its module's @ModSummary@ (why?).}
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{Static information (SI)}
+\label{sec:staticinfo}
+
+PCI, the package configuration information, is a list of @PkgInfo@,
+each containing at least the following:
+\begin{verbatim}
+ data PkgInfo
+ = PkgInfo PkgName -- my name
+ Path -- path to my base location
+ [PkgName] -- who I depend on
+ [ModName] -- modules I supply
+ [Unlinked] -- paths to my object files
+
+ type PCI = [PkgInfo]
+\end{verbatim}
+The @Path@s in it, including those in the @Unlinked@s, are set up
+when GHCI starts.
+
+FLAGS is a bunch of compiler options. We haven't figured out yet how
+to partition them into those for the whole session vs those for
+specific source files, so currently the best we can do is:
+\begin{verbatim}
+ data FLAGS = ...
+\end{verbatim}
+
+The static information (SI) is the both of these:
+\begin{verbatim}
+ data SI = SI PCI
+ FLAGS
+\end{verbatim}
+
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The Compilation Manager (CM)}
+\label{sec:manager}
+
+\subsubsection{Data structures owned by CM}
+
+CM maintains two maps (HST, HIT) and a set (UI). It's important to
+realise that CM only knows about the map/set-ness, and has no idea
+what a @ModDetails@, @ModIFace@ or @Linkable@ is. Only @compile@ and
+@link@ know that, and CM passes these types around without
+inspecting them.
+
+\begin{itemize}
\item
{\bf Home Symbol Table (HST)} @:: FiniteMap Module ModDetails@
to other entities, regardless of module boundaries. HST only
holds information for home modules; the corresponding wired-up
details for package (non-home) modules are created lazily in
- the package rename cache (PRC).
+ the package symbol table (PST) inside the persistent compiler's state
+ (PST).
CM maintains the HST, which is passed to, but not modified by,
@compile@. If compilation of a module is successful, @compile@
(Completely private to CM; nobody else sees this).
Compilation of a module always creates a @ModIFace@, which contains
- the unlinked symbol table entries. CM maintains a @FiniteMap@
+ the unlinked symbol table entries. CM maintains this @FiniteMap@
@ModName@ @ModIFace@, with session lifetime. CM never throws away
@ModIFace@s, but it does update them, by passing old ones to
@compile@ if they exist, and getting new ones back.
CM acquires @ModuleIFace@s from @compile@, which it only applies
to modules in the home package. As a result, HIT only contains
@ModuleIFace@s for modules in the home package. Those from other
- packages reside in ...
+ packages reside in the package interface table (PIT) which is a
+ component of PCS.
\item
+ {\bf Unlinked Images (UI)} @:: Set Linkable@
+
+ The @Linkable@s in UI represent executable but as-yet unlinked
+ module translations. A @Linkable@ can contain the name of an
+ object, archive or DLL file. In interactive mode, it may also be
+ the STG trees derived from translating a module. So @compile@
+ returns a @Linkable@ from each successful run, namely that of
+ translating the module at hand. At link-time, CM supplies these
+ @Linkable@s to @link@. It also examines the @ModSummary@s for all
+ home modules, and by examining their imports and the PCI (package
+ configuration info) it can determine the @Linkable@s from all
+ required imported packages too.
+
+ @Linkable@s and @ModIFace@s have a close relationship. Each
+ translated module has a corresponding @Linkable@ somewhere.
+ However, there may be @Linkable@s with no corresponding modules
+ (the RTS, for example). Conversely, multiple modules may share a
+ single @Linkable@ -- as is the case for any module from a
+ multi-module package. For these reasons it seems appropriate to
+ keep the two concepts distinct. @Linkable@s also provide
+ information about how to link package components together, and that
+ insn't the business of any specific module to know.
+
+ CM passes @compile@ a module's old @ModIFace@, if it has one, in
+ the hope that the module won't need recompiling. If so, @compile@
+ can just return the @ModIFace@ along with a new @ModDetails@
+ created from it. Similarly, CM passes in a module's old
+ @Linkable@, if it has one, and that's returned unchanged if the
+ module isn't recompiled.
+
+\item
+ {\bf Module Graph (MG)} @:: known-only-to-CM@
+
+ Records, for CM's purposes, the current module graph,
+ up-to-dateness and summaries. More details when I get to them.
+\end{itemize}
+
+
+\subsubsection{What CM does}
+Pretty much as before. \ToDo{... and what was Before?}
+
+Plus: detect module cycles during the downsweep. During the upsweep,
+ensure that compilation failures for modules in cycles do not leave
+any of the global structures in an inconsistent state.
+\begin{itemize}
+\item
+ For PCS, that's never a problem because PCS doesn't hold any
+ information pertaining to home modules.
+\item
+ HST and HIT: CM knows that these are mappings from @Module@ to
+ whatever, and can throw away entries from failed cycles, or,
+ equivalently, not commit updates to them until cycles succeed,
+ remembering of course to synthesise appropriate HSTs during
+ compilation of a cycle.
+\item
+ UI -- a collection of @Linkable@s, between which there are no
+ direct refererences, so CM can remove additions from failed cycles
+ with no difficulty.
+\item
+ OST -- linking is not carried out until the upsweep has
+ succeeded, so there's no problem here.
+\end{itemize}
+
+Plus: clear out the global data structures after the downsweep but
+before the upsweep.
+
+\ToDo{CM needs to supply a way for @compile@ to know which modules in
+ HST are in its downwards closure, and which not, so it can
+ correctly construct its instance environment.}
+
+
+
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The compiler (\mbox{\tt compile})}
+\label{sec:compiler}
+
+\subsubsection{Data structures owned by \mbox{\tt compile}}
+
{\bf Persistent Compiler State (PCS)} @:: known-only-to-compile@
This contains info about foreign packages only, acting as a cache,
decision later.
\item
- {\bf Package Rename Cache (PRC)} @:: FiniteMap Module ModDetails@
+ {\bf Package Symbol Table (PST)} @:: FiniteMap Module ModDetails@
Adding an package interface to PIT doesn't make it directly usable
to @compile@, because it first needs to be wired (renamed +
waster. Also, wiring in an interface could mean reading other
interfaces, and we don't want to do that unnecessarily.
- The PRC avoids these problems by allowing incremental wiring-in to
+ The PST avoids these problems by allowing incremental wiring-in to
happen. Pieces of foreign interfaces are renamed and placed in the
- PRC, but only as @compile@ discovers it needs them. In the process
+ PST, but only as @compile@ discovers it needs them. In the process
of incremental renaming, @compile@ may need to read more package
interfaces, which are returned to CM to add to the PIT.
- CM passes the PRC to @compile@ and is returned an updated version
+ CM passes the PST to @compile@ and is returned an updated version
on success. On failure, @compile@ doesn't return an updated
version even though it might have created some updates on the way
to failure. This seems necessary to retain the (thus far unstated)
- invariant that PRC only contains renamed fragments of interfaces in
+ invariant that PST only contains renamed fragments of interfaces in
PIT.
+
+ \item
+ {\bf Holding Pen (HP)} @:: Ifaces@
+
\end{itemize}
PCS is opaque to CM; only @compile@ knows what's in it, and how to
need to have a way to remove this information prior to commencing a
rebuild, which conflicts with PCS's opaqueness to CM.
-\item
- {\bf Unlinked Images (UI)} @:: Set Linkable@
-
- The @Linkable@s in UI represent executable but as-yet unlinked
- module translations. A @Linkable@ can contain the name of an
- object, archive or DLL file. In interactive mode, it may also be
- the STG trees derived from translating a module. So @compile@
- returns a @Linkable@ from each successful run, namely that of
- translating the module at hand. At link-time, CM supplies these
- @Linkable@s to @link@. It also examines the @ModSummary@s for all
- home modules, and by examining their imports and the PCI (package
- configuration info) it can determine the @Linkable@s from all
- required imported packages too.
-
- @Linkable@s and @ModIFace@s have a close relationship. Each
- translated module has a corresponding @Linkable@ somewhere.
- However, there may be @Linkable@s with no corresponding modules
- (the RTS, for example). Conversely, multiple modules may share a
- single @Linkable@ -- as is the case for any module from a
- multi-module package. For these reasons it seems appropriate to
- keep the two concepts distinct. @Linkable@s also provide
- information about how to link package components together, and that
- insn't the business of any specific module to know.
-
- CM passes @compile@ a module's old @ModIFace@, if it has one, in
- the hope that the module won't need recompiling. If so, @compile@
- can just return the @ModIFace@ along with a new @ModDetails@
- created from it. Similarly, CM passes in a module's old
- @Linkable@, if it has one, and that's returned unchanged if the
- module isn't recompiled.
-
-\end{itemize}
-
-There are also a few auxiliary structures, of somehow lesser importance:
-
-\begin{itemize}
-\item
- {\bf Module Graph (MG)} @:: known-only-to-CM@
-
- Records, for CM's purposes, the current module graph,
- up-to-dateness and summaries. More details when I get to them.
-
-\item
- {\bf Package Config Info (PCI)} @:: [PkgInfo]@
-
- A value static over the entire session, giving, for each package,
- its name, dependencies, linkable components and constitutent module
- names.
-
-\item
- {\bf Flags/options (FLAGS)} @:: dunno@
-
- Another session-static value, containing flags/options. Burble.
-\end{itemize}
-
-
-
-\subsection{Important datatypes}
-
-\subsubsection*{Names, location and summarisation}
-The summary should contain the location, and the location contain the
-name. Also it is hoped to remove the assumption that sources live on
-disk, but I'm not sure this is good enough yet. @Module@s are now
-used as primary keys in various maps, so they are given a @Unique@.
-\begin{verbatim}
- type ModName = String -- a module name
- type PkgName = String -- a package name
-
- type Module = -- contains ModName and a Unique, at least
-\end{verbatim}
-@Path@ could be an indication of a location in a filesystem, or it
-could be some more generic kind of resource identifier, a URL for
-example.
-\begin{verbatim}
- data Path = ...
-\end{verbatim}
-A @ModLocation@ says where a module is, what it's called and in what
-form it it.
-\begin{verbatim}
- data ModLocation = SourceOnly Module Path -- .hs
- | ObjectCode Module Path Path -- .o, .hi
- | InPackage Module PkgName
- -- examine PCI to determine package Path
-\end{verbatim}
-A @ModSummary@ records the minimum information needed to establish the
-module graph and determine whose source has changed. @ModSummary@s
-can be created quickly.
-\begin{verbatim}
- data ModSummary = ModSummary
- ModLocation -- location and kind
- Maybe (String, Fingerprint)
- -- source and fingerprint if .hs
- [ModName] -- imports
-
- type Fingerprint = ... -- file timestamp, or source checksum?
-\end{verbatim}
-\ToDo{Should @ModSummary@ contain source text for interface files
- too?}
-\ToDo{Also say that @ModIFace@ contains its module's @ModSummary@.}
-@compile@ is supplied with, and checks PIT (inside PCS) before
-reading package interfaces, so it doesn't read and add duplicate
-@ModIFace@s to PIT.
-
-\subsection{Package Configuration}
-\label{sec:package-config}
-PCI, the package configuration information, is a list of @PkgInfo@,
-each containing at least the following:
-\begin{verbatim}
- data PkgInfo
- = PkgInfo PkgName -- my name
- Path -- path to my base location
- [PkgName] -- who I depend on
- [ModName] -- modules I supply
- [Unlinked] -- paths to my object files
-\end{verbatim}
-The @Path@s in it, including those in the @Unlinked@s, are set up
-when GHCI starts.
-
-\subsection{Signatures}
-
-\subsubsection*{The finder}
-The module finder generates @ModLocation@s from @ModName@s. We
-expect that it will assume packages to be static, but we want to
-be able to track changes in home modules during the session.
-Specifically, we want to be able to notice that a module's object and
-interface have been updated, presumably by a compile run outside of
-the GHCI session. Hence the two-stage type:
-\begin{verbatim}
- type Finder = ModName -> IO ModLocation
- newFinder :: [PkgConfig] -> IO Finder
-\end{verbatim}
-@newFinder@ examines the package information right at the start, but
-returns an @IO@-typed function which can inspect home module changes
-later in the session.
-
-
-\subsubsection*{Starting up}
-Some of the session-lifetime data structures are opaque to CM, so
-it doesn't know how to create an initial one. Hence it relies on its
-client to supply the following:
-\begin{verbatim}
- emptyPCS :: PCS
- emptyOST :: OST
-\end{verbatim}
-The PCS is maintained solely by @compile@, and OST solely by
-@link@/@unlink@. CM cannot know the representation of the latter
-since it depends on whether we're operating in interactive or batch
-mode.
\subsubsection*{What {\tt compile} does}
@compile@ is necessarily somewhat complex. We've decided to do away
data PCS
= MkPCS PIT -- package interfaces
- PRC -- rename cache/global symtab contents
+ PST -- rename cache/global symtab contents
\end{verbatim}
Although @compile@ is passed three of the global structures (FLAGS,
HST and PCS), it only modifies PCS. The rest are modified by CM as it
\item
If recompilation is not needed, create a new @ModDetails@ from the
- old @ModIFace@, looking up information in HST and PCS.PRC as necessary.
+ old @ModIFace@, looking up information in HST and PCS.PST as necessary.
Return the new details, the old @ModIFace@ and @Linkable@, the PCS
\ToDo{I don't think the PCS should be updated, but who knows?}, and
an empty warning list.
If the module is only available in object+interface form, read the
interface, make up details, create a linkable pointing at the
object code. Does this involve reading any more interfaces? Does
- it involve updating PRC?
+ it involve updating PST?
Otherwise, translate from source, then create and return: an
- details, interface, linkable, updated PRC, and warnings.
+ details, interface, linkable, updated PST, and warnings.
When looking for a new interface, search HST, then PCS.PIT, and only
then read from disk. In which case add the new interface(s) to
boot interface against the inferred interface.}
\end{itemize}
-
-
-\subsection{Linking}
+%%-- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --%%
+\subsection{The linker (\mbox{\tt link})}
\label{sec:linker}
-\subsubsection{The linker's private data structures}
+\subsubsection{Data structures owned by the linker}
In the same way that @compile@ has a persistent compiler state (PCS),
the linker has a persistent (session-lifetime) state, LPS, the
The order of the @Unlinked@s in the list is important, as
they are linked in left-to-right order. The @Unlinked@ objects for a
particular package can be obtained from the package configuration (see
-Section \ref{sec:package-config}).
+Section \ref{sec:staticinfo}).
\ToDo{When adding @Addr@s from an object module to SST, we need to
somehow find out the @RdrName@s of the symbols exported by that
-\subsection{What CM does}
-\label{sec:compilation-manager}
-Pretty much as before.
-
-Plus: detect module cycles during the downsweep. During the upsweep,
-ensure that compilation failures for modules in cycles do not leave
-any of the global structures in an inconsistent state.
-\begin{itemize}
-\item
- For PCS, that's never a problem because PCS doesn't hold any
- information pertaining to home modules.
-\item
- HST and HIT: CM knows that these are mappings from @Module@ to
- whatever, and can throw away entries from failed cycles, or,
- equivalently, not commit updates to them until cycles succeed,
- remembering of course to synthesise appropriate HSTs during
- compilation of a cycle.
-\item
- UI -- a collection of @Linkable@s, between which there are no
- direct refererences, so CM can remove additions from failed cycles
- with no difficulty.
-\item
- OST -- linking is not carried out until the upsweep has
- succeeded, so there's no problem here.
-\end{itemize}
-
-Plus: clear out the global data structures after the downsweep but
-before the upsweep.
-
-\ToDo{CM needs to supply a way for @compile@ to know which modules in
- HST are in its downwards closure, and which not, so it can
- correctly construct its instance environment.}
-
-
-
%%-----------------------------------------------------------------%%
\section{Background ideas}
\subsubsection*{Out of date, but correct in spirit}