\begin{document}
\title{The STG runtime system (revised)}
-\author{Simon Peyton Jones \\ Glasgow University and Oregon Graduate Institute \and
-Simon Marlow \\ Glasgow University \and
+\author{Simon Peyton Jones \\ Microsoft Research Ltd., Cambridge \and
+Simon Marlow \\ Microsoft Research Ltd., Cambridge \and
Alastair Reid \\ Yale University}
\maketitle
This document describes the GHC/Hugs run-time system. It serves as
a Glasgow/Yale/Nottingham ``contract'' about what the RTS does.
-\Subsection{New features compared to GHC 2.04}{new-features}
+\Subsection{New features compared to GHC 3.xx}{new-features}
\begin{itemize}
\item The RTS supports mixed compiled/interpreted execution, so
space leaks.
\item A running thread has only one stack, which contains a mixture of
-pointers and non-pointers. \secref{stacks} describes how we find out
+pointers and non-pointers. \secref{TSO} describes how we find out
which is which. (GHC has used two stacks for some while. Using one
stack instead of two reduces register pressure, reduces the size of
update frames, and eliminates ``stack-stubbing'' instructions.)
explicitly state where results should be returned in registers (or on
the stack) instead of on the heap.
-\item
+\item Exceptions are supported by the RTS.
+
+\item Weak Pointers generalise the previously available Foreign Object
+interface.
-Lazy black-holing has been replaced by eager black-holing. The
-problem with lazy black-holing is that it leaves slop in the heap
-which conflicts with the use of a mostly-copying collector.
-\ToDo{Why? I think we can still do lazy black holing without leaving
-slop (SDM)}
+\item The garbage collector supports a number of new features,
+including a dynamically resizable heap and multiple generations with
+aging within a generation.
\end{itemize}
generations, etc taking into account residency, GC rate and page fault
rate.
-\item
-There should be no need to specify the amnount of stack/heap space to
-allocate when you started a program - let it just take as much or as
-little as it wants. (It might be useful to be able to specify maximum
-sizes and to be able to suggest an initial size.)
-
\item
We could trigger a GC when all threads are blocked waiting for IO if
the allocation arena (or some of the generations) are nearly full.
The headers vary between different kinds of object but they all start
with a pointer to a pair consisting of an \emph{info table} and some
\emph{entry code}. The info table is used both by the evaluators and
-by the storage manager and contains an @INFO_TYPE@ field which
-identifies which kind of heap object uses it and determines the
-interpretation of the payload and of the other fields of the info
-table. The entry code is some machine code used by the machine code
-evaluator to evaluate closures and raises an error for other kinds of
-objects.
+by the storage manager and contains a @type@ field which identifies
+which kind of heap object uses it and determines the interpretation of
+the payload and of the other fields of the info table. The entry code
+is some machine code used by the machine code evaluator to evaluate
+closures and raises an error for other kinds of objects.
The major kinds of heap object used are as follows. (For simplicity,
this description omits certain optimisations and extra fields required
\end{tabular}
\end{center}
-\item[Primitive objects] are used to represent objects with unpointed
+\item[Primitive objects] are used to represent objects with unlifted
types which are too large to fit in a register (or stack slot) or for
which sharing must be preserved. Primitive objects include large
objects such as multiple precision integers and immutable arrays and
mutable objects such as mutable arrays, mutable variables, MVar's,
-IVar's and foreign object pointers. Since unpointed objects are not
-pointed, they cannot be entered. Their payload varies according to
-the kind of object.
+IVar's and foreign object pointers. Since primitive objects are not
+lifted, they cannot be entered. Their payload varies according to the
+kind of object.
\item[Function closures] are used to represent functions. Their
payload (if any) consists of the free variables of the function.
\item[Thunks] are used to represent unevaluated expressions which will
be updated with their result. Their payload (if any) consists of the
free variables of the function. The entry code for a thunk starts by
-pushing an \emph{update frame} onto the stack and overwriting the
-thunk with a \emph{black hole} (see Black Holes, below). When
-evaluation of the thunk completes, the update frame will cause the
-thunk to be overwritten again with an \emph{indirection} to the result
-of the thunk, which is always a constructor or a partial application.
+pushing an \emph{update frame} onto the stack. When evaluation of the
+thunk completes, the update frame will cause the thunk to be
+overwritten again with an \emph{indirection} to the result of the
+thunk, which is always a constructor or a partial application.
\begin{center}
\begin{tabular}{|l|l|l|l|}\hline
is overwritten with an indirection to the result of the closure and
any blocked threads are restored to the runnable queue.
+Closures are overwritten by black-holes during a ``lazy black-holing''
+phase which runs on each thread when it returns to the scheduler.
+\ToDo{section describing lazy black-holing}.
+
\begin{center}
\begin{tabular}{|l|l|l|l|}\hline
-@BH@ & \emph{Blocked threads} \\ \hline
+@BLACKHOLE@ & \emph{Blocked threads} \\ \hline
\end{tabular}
\end{center}
\ToDo{In a single threaded system, it's trivial to detect infinite
-loops: reentering a BH is always an error. How easy is it in a
+loops: reentering a BLACKHOLE is always an error. How easy is it in a
multi-threaded system?}
\item[Indirections] are used to update an unevaluated closure with its
Indirections needn't always point to a closure in WHNF. They can
point to a chain of indirections which point to an evaluated closure.
-When revertible black holes are added, they may also point to reverted
-black holes.
\item[Thread State Objects (@TSO@s)] represent Haskell threads. Their
-payload consists of a unique thread id, the status of the thread
-(runnable, blocked, etc) and the stack. @TSO@s may be resized by the
-scheduler if its stack is too small or too large.
+payload consists of some per-thread information such as the Thread ID
+and the status of the thread (runnable, blocked etc.), and the
+thread's stack. See @TSO.h@ for the full story. @TSO@s may be
+resized by the scheduler if its stack is too small or too large.
+
+The thread stack grows downwards from higher to lower addresses.
\begin{center}
\begin{tabular}{|l|l|l|l|}\hline
-@TSO@ & \emph{Thread Id} & \emph{Status} & \emph{Stack} \\ \hline
+@TSO@ & \emph{Thread info} & \emph{Stack} \\ \hline
\end{tabular}
\end{center}
\begin{center}
\begin{tabular}{|l|l|l|l|}\hline
-\emph{@RET_ADDR@} & \emph{Free Variables of the case alternatives} \\ \hline
+\emph{@RET_XXX@} & \emph{Free Variables of the case alternatives} \\ \hline
\end{tabular}
\end{center}
The free variables are a mixture of pointers and non-pointers whose
-layout is described by the info table.
+layout is described by a bitmask in the info table.
+
+There are several kinds of @RET_XXX@ return address - see
+\secref{activation-records} for the details.
Return addresses generated by the bytecode compiler look like this:
\begin{center}
\item[Update frames] are used to trigger updates. When an update
frame is entered, it overwrites the updatee with an indirection to the
-result, restarts any threads blocked on the @BH@ and returns to the
-stack object underneath the update frame.
+result, restarts any threads blocked on the @BLACKHOLE@ and returns to
+the stack object underneath the update frame.
\begin{center}
\begin{tabular}{|l|l|l|l|}\hline
-\emph{@UPDATE@} & \emph{Next Update Frame} & \emph{Updatee} \\ \hline
+\emph{@UPDATE_FRAME@} & \emph{Next Update Frame} & \emph{Updatee} \\ \hline
\end{tabular}
\end{center}
-\item[Seq frames] are used to implement the polymorphic @seq@ primitive.
-They are a special kind of update frame.
-
-\ToDo{Describe them properly}
+\item[Seq frames] are used to implement the polymorphic @seq@
+primitive. They are a special kind of update frame, and are linked on
+the update frame list.
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\emph{@SEQ_FRAME@} & \emph{Next Update Frame} \\ \hline
+\end{tabular}
+\end{center}
-\end{description}
+\item[Stop frames] are put on the bottom of each thread's stack, and
+act as sentinels for the update frame list (i.e. the last update frame
+points to the stop frame). Returning to a stop frame terminates the
+thread. Stop frames have no payload:
-\ToDo{We also need a stop frame which goes on the bottom of the stack
-when the thread terminates.}
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\emph{@SEQ_FRAME@} \\ \hline
+\end{tabular}
+\end{center}
+\end{description}
\Subsubsection{Case expressions}{case-expr-overview}
\item
-Foreign Objects are a form of weak pointer which lets Haskell access
-foreign objects.
+Weak pointers and foreign objects provide finalisation support for
+Haskell references to external objects.
\end{itemize}
never to leave less than @MIN_SIZE_SHRUNKEN_STACK@ empty words on the
stack when it does so.
-\ToDo{Would it be useful for the storage manager to enlarge the stack?}
-
\item
For efficiency reasons, very large objects (eg large arrays and TSOs)
\Subsection{Heap Objects}{heap-objects}
+\label{sec:fixed-header}
\begin{figure}
\begin{center}
\end{itemize}
Most of the RTS is completely insensitive to the number of admin
-words. The total size of the fixed header is @FIXED_HS@.
+words. The total size of the fixed header is given by
+@sizeof(StgHeader)@.
\Subsection{Info Tables}{info-tables}
\hline Parallelism Info & variable
\\ \hline Profile Info & variable
\\ \hline Debug Info & variable
-\\ \hline Static reference table & 32 bits (optional)
-\\ \hline Storage manager layout info & 32 bits
-\\ \hline Closure type & 16 bits
-\\ \hline Constructor Tag & 16 bits
+\\ \hline Static reference table & pointer word (optional)
+\\ \hline Storage manager layout info & pointer word
+\\ \hline Closure flags & 8 bits
+\\ \hline Closure type & 8 bits
+\\ \hline Constructor Tag / SRT length & 16 bits
\\ \hline entry code
\\ \vdots
\end{tabular}
\end{center}
+On a 64-bit machine the tag, type and flags fields will all be doubled
+in size, so the info table is a multiple of 64 bits.
+
An info table has the following contents (working backwards in memory
addresses):
preceded by the rest of the info table. An \emph{info pointer} always
points to the first byte of the entry code.
-\item A 16-bit constructor tag. This field is used for constructor
-info-tables only (\secref{CONSTR}), and contains an integer
-representing the tag value of the constructor, in the range $0..n-1$
-where $n$ is the number of constructors in the datatype.
+\item A 16-bit constructor tag / SRT length. For a constructor info
+table this field contains the tag of the constructor, in the range
+$0..n-1$ where $n$ is the number of constructors in the datatype.
+Otherwise, it contains the number of entries in this closure's Static
+Reference Table (\secref{srt}).
-\item An 16-bit {\em closure type field}, which identifies what kind of
+\item An 8-bit {\em closure type field}, which identifies what kind of
closure the object is. The various types of closure are described in
\secref{closures}.
+\item an 8-bit flags field, which holds various flags pertaining to
+the closure type.
+
\item A single pointer or word --- the {\em storage manager info
field}, contains auxiliary information describing the closure's
precise layout, for the benefit of the garbage collector and the code
field followed by two or more bitmap words. This layout information
is used for @RET_BIG@ and @RET_VEC_BIG@ closures.
-\item Selector Thunks (\secref{THUNK_SEL}) use the closure
+\item Selector Thunks (\secref{THUNK_SELECTOR}) use the closure
layout field to hold the selector index, since the layout is always
known (the closure contains a single pointer field).
\end{itemize}
\item @THUNK_*@
\item @RET_*@
\end{itemize}
-
+
\item \emph{Profiling info\/}
\ToDo{The profiling info is completely bogus. I've not deleted it
\emph{Stack closures} are closures allocated within a thread's stack
(which is itself a heap object). Unlike other closures, there are
never any pointers to stack closures. Stack closures are discussed in
-\secref{stacks}.
+\secref{TSO}.
\end{itemize}
A second useful classification is this:
state objects, do not represent values in the original program.
\end{itemize}
-Only pointed objects can be entered. All pointed objects share a
-common header format: the ``pointed header''; while all unpointed
-objects share a common header format: the ``unpointed header''.
-\ToDo{Describe the difference and update the diagrams to mention an
-appropriate header type.}
+Only pointed objects can be entered. If an unpointed object is
+entered the program will usually terminate with a fatal error.
This section enumerates all the kinds of heap objects in the system.
Each is identified by a distinct closure type field in its info table.
\hline
@CONSTR@ & \ref{sec:CONSTR} \\
+@CONSTR_p_n@ & \ref{sec:CONSTR} \\
@CONSTR_STATIC@ & \ref{sec:CONSTR} \\
-@CONSTR_STATIC_NOCAF@ & \ref{sec:CONSTR} \\
+@CONSTR_NOCAF_STATIC@ & \ref{sec:CONSTR} \\
@FUN@ & \ref{sec:FUN} \\
+@FUN_p_n@ & \ref{sec:FUN} \\
@FUN_STATIC@ & \ref{sec:FUN} \\
@THUNK@ & \ref{sec:THUNK} \\
+@THUNK_p_n@ & \ref{sec:THUNK} \\
@THUNK_STATIC@ & \ref{sec:THUNK} \\
-@THUNK_SELECTOR@ & \ref{sec:THUNK_SEL} \\
+@THUNK_SELECTOR@ & \ref{sec:THUNK_SELECTOR} \\
@BCO@ & \ref{sec:BCO} \\
-@BCO_CAF@ & \ref{sec:BCO} \\
-@AP@ & \ref{sec:AP} \\
+@AP_UPD@ & \ref{sec:AP_UPD} \\
@PAP@ & \ref{sec:PAP} \\
@IND@ & \ref{sec:IND} \\
@IND_OLDGEN_PERM@ & \ref{sec:IND} \\
@IND_STATIC@ & \ref{sec:IND} \\
+@CAF_UNENTERED@ & \ref{sec:CAF} \\
+@CAF_ENTERED@ & \ref{sec:CAF} \\
+@CAF_BLACKHOLE@ & \ref{sec:CAF} \\
+
\hline
\emph{Unpointed} \\
\hline
-@ARR_WORDS@ & \ref{sec:ARR_WORDS1},\ref{sec:ARR_WORDS2} \\
-@ARR_PTRS@ & \ref{sec:ARR_PTRS} \\
-@MUTVAR@ & \ref{sec:MUTVAR} \\
-@MUTARR_PTRS@ & \ref{sec:MUTARR_PTRS} \\
-@MUTARR_PTRS_FROZEN@ & \ref{sec:MUTARR_PTRS_FROZEN} \\
+@BLACKHOLE@ & \ref{sec:BLACKHOLE} \\
+@BLACKHOLE_BQ@ & \ref{sec:BLACKHOLE_BQ} \\
-@FOREIGN@ & \ref{sec:FOREIGN} \\
-
-@BH@ & \ref{sec:BH} \\
@MVAR@ & \ref{sec:MVAR} \\
-@IVAR@ & \ref{sec:IVAR} \\
-@FETCHME@ & \ref{sec:FETCHME} \\
+
+@ARR_WORDS@ & \ref{sec:ARR_WORDS} \\
+
+@MUTARR_PTRS@ & \ref{sec:MUT_ARR_PTRS} \\
+@MUTARR_PTRS_FROZEN@ & \ref{sec:MUT_ARR_PTRS_FROZEN} \\
+
+@MUT_VAR@ & \ref{sec:MUT_VAR} \\
+
+@WEAK@ & \ref{sec:WEAK} \\
+@FOREIGN@ & \ref{sec:FOREIGN} \\
+@STABLE_NAME@ & \ref{sec:STABLE_NAME} \\
\hline
\end{tabular}
@RET_BIG@ & \ref{sec:activation-records} \\
@RET_VEC_BIG@ & \ref{sec:activation-records} \\
@UPDATE_FRAME@ & \ref{sec:activation-records} \\
+@CATCH_FRAME@ & \ref{sec:activation-records} \\
+@SEQ_FRAME@ & \ref{sec:activation-records} \\
+@STOP_FRAME@ & \ref{sec:activation-records} \\
\hline
\end{tabular}
-There are also a number of administrative objects.
+There are also a number of administrative objects. It is an error to
+enter one of these objects.
\begin{tabular}{|l|l|}\hline
closure type & Section \\ \hline
@TSO@ & \ref{sec:TSO} \\
-@STABLEPTR_TABLE@ & \ref{sec:STABLEPTR_TABLE} \\
@SPARK_OBJECT@ & \ref{sec:SPARK} \\
@BLOCKED_FETCH@ & \ref{sec:BLOCKED_FETCH} \\
+@FETCHME@ & \ref{sec:FETCHME} \\
\hline
\end{tabular}
-\ToDo{I guess the parallel system has something like a stable ptr
-table. Is there any opportunity for sharing code/data structures
-here?}
-
-
\Subsection{Predicates}{closure-predicates}
-\ToDo{The following is a first attempt at defining a useful set of
-predicates. Some (such as @isWHNF@ and @isSparkable@) may need their
-definitions tweaked a little.}
-
The runtime system sometimes needs to be able to distinguish objects
according to their properties: is the object updateable? is it in weak
head normal form? etc. These questions can be answered by examining
field to ``cache'' the answers to some of these predicates.
An indirection either points to HNF (post update); or is result of
-overwriting a FetchMe, in which case the thing fetched is either
-under evaluation (BH), or by now an HNF. Thus, indirections get NoSpark flag.
-
-
-\iffalse
-@
-#define _NF 0x0001 /* Normal form */
-#define _NS 0x0002 /* Don't spark */
-#define _ST 0x0004 /* Is static */
-#define _MU 0x0008 /* Is mutable */
-#define _UP 0x0010 /* Is updatable (but not mutable) */
-#define _BM 0x0020 /* Is a "rimitive" array */
-#define _BH 0x0040 /* Is a black hole */
-#define _IN 0x0080 /* Is an indirection */
-#define _TH 0x0100 /* Is a thunk */
-
-
-
-SPEC
-SPEC_N SPEC | _NF | _NS
-SPEC_S SPEC | _TH
-SPEC_U SPEC | _UP | _TH
-
-GEN
-GEN_N GEN | _NF | _NS
-GEN_S GEN | _TH
-GEN_U GEN | _UP | _TH
-
-DYN _NF | _NS
-TUPLE _NF | _NS | _BM
-DATA _NF | _NS | _BM
-MUTUPLE _NF | _NS | _MU | _BM
-IMMUTUPLE _NF | _NS | _BM
-STATIC _NS | _ST
-CONST _NF | _NS
-CHARLIKE _NF | _NS
-INTLIKE _NF | _NS
-
-BH _NS | _BH
-BH_N BH
-BH_U BH | _UP
-
-BQ _NS | _MU | _BH
-IND _NS | _IN
-CAF _NF | _NS | _ST | _IN
-
-FM
-FETCHME FM | _MU
-FMBQ FM | _MU | _BH
-
-TSO _MU
-
-STKO
-STKO_DYNAMIC STKO | _MU
-STKO_STATIC STKO | _ST
-
-SPEC_RBH _NS | _MU | _BH
-GEN_RBH _NS | _MU | _BH
-BF _NS | _MU | _BH
-INTERNAL
-
-@
-\fi
-
+overwriting a FetchMe, in which case the thing fetched is either under
+evaluation (BLACKHOLE), or by now an HNF. Thus, indirections get
+NoSpark flag.
\subsection{Closures (aka Pointed Objects)}
\emph{Fixed header} & \emph{Pointers} & \emph{Non-pointers} \\ \hline
\end{tabular}
\end{center}
+
The data words (pointers and non-pointers) are the free variables of
-the function closure.
-The number of pointers
-and number of non-pointers are stored in the @INFO_SM@ word, in the least significant
-and most significant half-word respectively.
+the function closure. The number of pointers and number of
+non-pointers are stored in @info->layout.ptrs@ and
+@info->layout.nptrs@ respecively.
There are several different sorts of function closure, distinguished
by their closure type field:
been scavenged.
\note{Static function closures that have no static references, and
-hence a null SRT pointer, don't need the static object link field. Is
-it worth taking advantage of this? See @CONSTR_STATIC_NOCAF@.}
+hence a null SRT pointer, don't need the static object link field. We
+don't take advantage of this at the moment, but we could. See
+@CONSTR_NOCAF_STATIC@.}
\end{itemize}
Each lambda abstraction, $f$, in the STG program has its own private
constructors can store their data fields in exactly the same place as
dynamic constructors.
-\item @CONSTR_STATIC_NOCAF@. A statically allocated data constructor
+\item @CONSTR_NOCAF_STATIC@. A statically allocated data constructor
that guarantees not to point (directly or indirectly) to any CAF
(\secref{CAF}). This means it does not need a static object
link field. Since we expect that there might be quite a lot of static
Each constructor also has a \emph{constructor function}, which is a
curried function which builds an instance of the constructor. The
-constructor function has an info table labelled as @$Con$_info@.
+constructor function has an info table labelled as @$Con$_info@, and
+entry code pointed to by @$Con$_entry@.
Nullary constructors are represented by a single static info table,
which everyone points to. Thus for a nullary constructor we can omit
\subsubsection{Thunks}
\label{sec:THUNK}
-\label{sec:THUNK_SEL}
+\label{sec:THUNK_SELECTOR}
A thunk represents an expression that is not obviously in head normal
form. For example, consider the following top-level definitions:
is static while the latter is dynamic.
The layout of a thunk is the same as that for a function closure.
-However, thunks must have a payload of at least @MIN_UPD_PAYLOAD@
+However, thunks must have a payload of at least @MIN_UPD_SIZE@
words to allow it to be overwritten with a black hole and an
indirection. The compiler may have to add extra non-pointer fields to
satisfy this constraint.
\begin{itemize}
\item @THUNK@ and $@THUNK_@p@_@np$: vanilla, dynamically allocated
-thunks. Dynamic thunks are overwritten with normal indirections.
+thunks. Dynamic thunks are overwritten with normal indirections
+(@IND@), or old generation indirections (@IND_OLDGEN@): see
+\secref{IND}.
\item @THUNK_STATIC@. A static thunk is also known as a
\emph{constant applicative form}, or \emph{CAF}. Static thunks are
overwritten with static indirections.
\begin{center}
-\begin{tabular}{|l|l|l|l|l|}\hline
-\emph{Fixed header} & \emph{Pointers} & \emph{Non-pointers} \emph{Static object link}\\ \hline
+\begin{tabular}{|l|l|}\hline
+\emph{Fixed header} & \emph{Static object link}\\ \hline
\end{tabular}
\end{center}
\end{itemize}
The only label associated with a thunk is its info table:
+
\begin{description}
\item[$f$@_info@] is $f$'s info table.
\end{description}
\Subsubsection{Partial applications}{PAP}
-\ToDo{PAPs don't contains update frames or activation frames. When we
-add revertible black holes, we'll introduce a new kind of object which
-can contain activation frames.}
+A partial application (PAP) represents a function applied to too few
+arguments. It is only built as a result of updating after an
+argument-satisfaction check failure. A PAP has the following shape:
-A partial application (PAP) represents a function applied to too few arguments.
-It is only built as a result of updating after an argument-satisfaction
-check failure. A PAP has the following shape:
\begin{center}
\begin{tabular}{|l|l|l|l|}\hline
-\emph{Fixed header} & \emph{No of arg words} & \emph{Function closure} & \emph{Arg stack} \\ \hline
+\emph{Fixed header} & \emph{No of words of stack} & \emph{Function closure} & \emph{Stack chunk ...} \\ \hline
\end{tabular}
\end{center}
-The ``arg stack'' is a copy of the chunk of stack above the update
-frame; ``no of arg words'' tells how many words it consists of. The
-function closure is (a pointer to) the closure for the function whose
-argument-satisfaction check failed.
+
+The ``Stack chunk'' is a copy of the chunk of stack above the update
+frame; ``No of words of stack'' tells how many words it consists of.
+The function closure is (a pointer to) the closure for the function
+whose argument-satisfaction check failed.
+
+In the normal case where a PAP is built as a result of an argument
+satisfaction check failure, the stack chunk will just contain
+``pending arguments'', ie. pointers and tagged non-pointers. It may
+in fact also contain activation records, but not update frames, seq
+frames, or catch frames. The reason is the garbage collector uses the
+same code to scavenge a stack as it does to scavenge the payload of a
+PAP, but an update frame contains a link to the next update frame in
+the chain and this link would need to be relocated during garbage
+collection. Revertible black holes and asynchronous exceptions use
+the more general form of PAPs (see Section \ref{revertible-bh}).
There is just one standard form of PAP. There is just one info table
too, called @PAP_info@. Its entry code simply copies the arg stack
chunk back on top of the stack and enters the function closure. (It
has to do a stack overflow test first.)
+There is just one way to build a PAP: by calling @stg_update_PAP@ with
+the function closure in register @R1@ and the pending arguments on the
+stack. The @stg_update_PAP@ function will build the PAP, perform the
+update, and return to the next activation record on the stack. If
+there are \emph{no} pending arguments on the stack, then no PAP need
+be built: in this case @stg_update_PAP@ just overwrites the updatee
+with an indirection to the function closure.
+
PAPs are also used to implement Hugs functions (where the arguments
are free variables). PAPs generated by Hugs can be static so we need
both @PAP@ and @PAP_STATIC@.
-\Subsubsection{@AP@ objects}{AP}
+\Subsubsection{@AP_UPD@ objects}{AP_UPD}
-@AP@ objects are used to represent thunks built by Hugs. The only
-distintion between an @AP@ and a @PAP@ is that an @AP@ is updateable.
+@AP_UPD@ objects are used to represent thunks built by Hugs. The only
+distintion between an @AP_UPD@ and a @PAP@ is that an @AP_UPD@ is
+updateable.
\begin{center}
\begin{tabular}{|l|l|l|l|}
\hline
-\emph{Fixed Header} & \emph{No of arg words} & \emph{Function closure} & \emph{Arg stack} \\
+\emph{Fixed Header} & \emph{No of stack words} & \emph{Function closure} & \emph{Stack chunk} \\
\hline
\end{tabular}
\end{center}
top of the stack, and enters the function closure. (It has to do a
stack overflow test first.)
-The ``arg stack'' is a block of (tagged) arguments representing the
-free variables of the thunk; ``no of arg words'' tells how many words
-it consists of. The function closure is (a pointer to) the closure
-for the thunk. The argument stack may be empty if the thunk has no
-free variables.
+The ``stack chunk'' is a block of stack not containing update frames,
+seq frames or catch frames (just like a PAP). In the case of Hugs,
+the stack chunk will contain the free variables of the thunk, and the
+function closure is (a pointer to) the closure for the thunk. The
+argument stack may be empty if the thunk has no free variables.
-\note{Since @AP@s are updateable, the @MIN_UPD_PAYLOAD@ constraint
+\note{Since @AP_UPD@s are updateable, the @MIN_UPD_SIZE@ constraint
applies here too.}
\Subsubsection{Indirections}{IND}
indirections simply enters the closure it points to.
There are several forms of indirection:
+
\begin{description}
\item[@IND@] is the vanilla, dynamically-allocated indirection.
It is removed by the garbage collector. It has the following
shape:
\begin{center}
\begin{tabular}{|l|l|l|}\hline
-\emph{Fixed header} & \emph{Mutable link field} & \emph{Target closure} \\ \hline
+\emph{Fixed header} & \emph{Target closure} \\ \hline
\end{tabular}
\end{center}
-It contains a \emph{mutable link field} that is used to string together
-indirections in each generation.
+An @IND@ only exists in the youngest generation. In older
+generations, we have @IND_OLDGEN@s. The update code
+(@Upd_frame_$n$_entry@) checks whether the updatee is in the youngest
+generation before deciding which kind of indirection to use.
-\item[@IND_PERMANENT@]
+\item[@IND_OLDGEN@] is the vanilla, dynamically-allocated indirection.
+It is removed by the garbage collector. It has the following
+shape:
+\begin{center}
+\begin{tabular}{|l|l|l|}\hline
+\emph{Fixed header} & \emph{Target closure} & \emph{Mutable link field} \\ \hline
+\end{tabular}
+\end{center}
+It contains a \emph{mutable link field} that is used to string together
+mutable objects in each old generation.
+
+\item[@IND_PERM@]
for lexical profiling, it is necessary to maintain cost centre
information in an indirection, so ``permanent indirections'' are
retained forever. Otherwise they are just like vanilla indirections.
\note{Do we still need @IND@ in the profiling build, or do we just
need @IND@ but its behaviour changes when profiling is on?}
+\item[@IND_OLDGEN_PERM@]
+Just like an @IND_OLDGEN@, but sticks around like an @IND_PERM@.
+
\item[@IND_STATIC@] is used for overwriting CAFs when they have been
evaluated. Static indirections are not removed by the garbage
collector; and are statically allocated outside the heap (and should
\begin{center}
\begin{tabular}{|l|l|l|}
\hline
-\emph{Fixed header} & \emph{Target closure} & \emph{Static object link} \\
+\emph{Fixed header} & \emph{Target closure} & \emph{Static link field} \\
\hline
\end{tabular}
\end{center}
\end{description}
-\Subsubsection{Black holes and blocking queues}{BH}
+\subsubsection{Black holes and blocking queues}
+\label{sec:BLACKHOLE}
+\label{sec:BLACKHOLE_BQ}
Black hole closures are used to overwrite closures currently being
evaluated. They inform the garbage collector that there are no live
roots in the closure, thus removing a potential space leak.
-Black holes also become synchronization points in the threaded world.
-They contain a pointer to a list of blocked threads to be awakened
-when the black hole is updated (or @NULL@ if the list is empty).
+Black holes also become synchronization points in the concurrent
+world. When a thread attempts to enter a blackhole, it must wait for
+the result of the computation, which is presumably in progress in
+another thread.
+
+\note{In a single-threaded system, entering a black hole indicates an
+infinite loop. In a concurrent system, entering a black hole
+indicates an infinite loop only if the hole is being entered by the
+same thread that originally entered the closure. It could also bring
+about a deadlock situation where several threads are waiting
+circularly on computations in progress.}
+
+There are two types of black hole:
+
+\begin{description}
+
+\item[@BLACKHOLE@]
+A straightforward blackhole just consists of an info pointer and some
+padding to allow updating with an @IND_OLDGEN@ if necessary. This
+type of blackhole has no waiting threads.
+
+\begin{center}
+\begin{tabular}{|l|l|l|}
+\hline
+\emph{Fixed header} & \emph{Padding} & \emph{Padding} \\
+\hline
+\end{tabular}
+\end{center}
+
+If we're doing \emph{eager blackholing} then a thunk's info pointer is
+overwritten with @BLACKHOLE_info@ at the time of entry; hence the need
+for blackholes to be small, otherwise we'd be overwriting part of the
+thunk itself.
+
+\item[@BLACKHOLE_BQ@]
+When a thread enters a @BLACKHOLE@, it is turned into a @BLACKHOLE_BQ@
+(blocking queue), which contains a linked list of blocked threads in
+addition to the info pointer.
\begin{center}
\begin{tabular}{|l|l|l|}
\hline
-\emph{Fixed header} & \emph{Mutable link} & \emph{Blocked thread link} \\
+\emph{Fixed header} & \emph{Blocked thread link} & \emph{Mutable link field} \\
\hline
\end{tabular}
\end{center}
The \emph{Blocked thread link} points to the TSO of the first thread
waiting for the value of this thunk. All subsequent TSOs in the list
-are linked together using their @TSO_LINK@ field.
+are linked together using their @tso->link@ field, ending in
+@END_TSO_QUEUE_closure@.
-When the blocking queue is non-@NULL@ and the @BH@ is in the old
-generation, the black hole must be added to the mutables list since
-the TSOs on the list may contain pointers into the new generation.
-There is no need to clutter up the mutables list with black holes with
-empty blocking queues.
-
-\note{In a single-threaded system, entering a black hole indicates an
-infinite loop. In a concurrent system, entering a black hole
-indicates an infinite loop only if the hole is being entered by the
-same thread that originally entered the closure.}
+Because new threads can be added to the \emph{Blocked thread link}, a
+blocking queue is \emph{mutable}, so we need a mutable link field in
+order to chain it on to a mutable list for the generational garbage
+collector.
+\end{description}
\Subsubsection{FetchMes}{FETCHME}
entered.
\subsubsection{Immutable objects}
-\label{sec:ARR_WORDS1}
-\label{sec:ARR_PTRS}
+\label{sec:ARR_WORDS}
\begin{description}
\item[@ARR_WORDS@] is a variable-sized object consisting solely of
non-pointers. It is used for arrays of all sorts of things (bytes,
words, floats, doubles... it doesn't matter).
-\begin{center}
-\begin{tabular}{|c|c|c|c|}
-\hline
-\emph{Fixed Hdr} & \emph{No of non-pointers} & \emph{Non-pointers\ldots} \\ \hline
-\end{tabular}
-\end{center}
-\item[@ARR_PTRS@] is an immutable, variable sized array of pointers.
+Strictly speaking, an @ARR_WORDS@ could be mutable, but because it
+only contains non-pointers we don't need to track this fact.
+
\begin{center}
\begin{tabular}{|c|c|c|c|}
\hline
-\emph{Fixed Hdr} & \emph{Mutable link} & \emph{No of pointers} & \emph{Pointers\ldots} \\ \hline
+\emph{Fixed Hdr} & \emph{No of non-pointers} & \emph{Non-pointers\ldots} \\ \hline
\end{tabular}
\end{center}
-The mutable link is present so that we can easily freeze and thaw an
-array (by changing the header and adding/removing the array to the
-mutables list).
-
\end{description}
\subsubsection{Mutable objects}
\label{sec:mutables}
-\label{sec:ARR_WORDS2}
-\label{sec:MUTVAR}
-\label{sec:MUTARR_PTRS}
-\label{sec:MUTARR_PTRS_FROZEN}
+\label{sec:MUT_VAR}
+\label{sec:MUT_ARR_PTRS}
+\label{sec:MUT_ARR_PTRS_FROZEN}
+\label{sec:MVAR}
Some of these objects are \emph{mutable}; they represent objects which
-are explicitly mutated by Haskell code through the @ST@ monad.
-They're not used for thunks which are updated precisely once.
+are explicitly mutated by Haskell code through the @ST@ or @IO@
+monads. They're not used for thunks which are updated precisely once.
Depending on the garbage collector, mutable closures may contain extra
header information which allows a generational collector to implement
the ``write barrier.''
-\begin{description}
+Notice that mutable objects all have the same general layout: there is
+a mutable link field as the second word after the header. This is so
+that code to process old-generation mutable lists doesn't need to look
+at the type of the object to determine where its link field is.
-\item[@ARR_WORDS@] is also used to represent \emph{mutable} arrays of
-bytes, words, floats, doubles, etc. It's possible to use the same
-object type because even generational collectors don't need to
-distinguish them.
+\begin{description}
-\item[@MUTVAR@] is a mutable variable.
+\item[@MUT_VAR@] is a mutable variable.
\begin{center}
\begin{tabular}{|c|c|c|}
\hline
-\emph{Fixed Hdr} & \emph{Mutable link} & \emph{Pointer} \\ \hline
+\emph{Fixed Hdr} \emph{Pointer} & \emph{Mutable link} & \\ \hline
\end{tabular}
\end{center}
-\item[@MUTARR_PTRS@] is a mutable array of pointers.
-Such an array may be \emph{frozen}, becoming an @ARR_PTRS@, with a
+\item[@MUT_ARR_PTRS@] is a mutable array of pointers. Such an array
+may be \emph{frozen}, becoming an @MUT_ARR_PTRS_FROZEN@, with a
different info-table.
+
\begin{center}
\begin{tabular}{|c|c|c|c|}
\hline
-\emph{Fixed Hdr} & \emph{Mutable link} & \emph{No of ptrs} & \emph{Pointers\ldots} \\ \hline
+\emph{Fixed Hdr} & \emph{No of ptrs} & \emph{Mutable link} & \emph{Pointers\ldots} \\ \hline
\end{tabular}
\end{center}
+\item[@MUT_ARR_PTRS_FROZEN@] This is the immutable version of
+@MUT_ARR_PTRS@. It still has a mutable link field for two reasons: we
+need to keep it on the mutable list for an old generation at least
+until the next garbage collection, and it may become mutable again via
+@thawArray@.
+
+\begin{center}
+\begin{tabular}{|c|c|c|c|}
+\hline
+\emph{Fixed Hdr} & \emph{No of ptrs} & \emph{Mutable link} & \emph{Pointers\ldots} \\ \hline
+\end{tabular}
+\end{center}
+
+\item[@MVAR@]
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|}
+\hline
+\emph{Fixed header} & \emph{Head} & \emph{Mutable link} & \emph{Tail}
+& \emph{Value}\\
+\hline
+\end{tabular}
+\end{center}
+
+\ToDo{MVars}
+
\end{description}
\begin{center}
\begin{tabular}{|l|l|l|l|}
\hline
-\emph{Fixed header} & \emph{Data} & \emph{Free Routine} & \emph{Foreign object link} \\
+\emph{Fixed header} & \emph{Data} \\
\hline
\end{tabular}
\end{center}
-The @FreeRoutine@ is a reference to the finalisation routine to call
-when the @ForeignObj@ becomes garbage. If @freeForeignObject@ is
-called on a Foreign Object, the @FreeRoutine@ is set to zero and the
-garbage collector will not attempt to call @FreeRoutine@ when the
-object becomes garbage.
+A foreign object is simple a boxed pointer to an address outside the
+Haskell heap, possible to @malloc@ed data. The only reason foreign
+objects exist is so that we can track the lifetime of one using weak
+pointers (see \secref{WEAK}) and run a finaliser when the foreign
+object is unreachable.
-The Foreign object link is a link to the next foreign object in the
-list. This list is traversed at the end of garbage collection: if an
-object is about to be deallocated (e.g.~it was not marked or
-evacuated), the free routine is called and the object is deleted from
-the list.
+\subsubsection{Weak pointers}
+\label{sec:WEAK}
-\subsubsection{MVars and IVars}
-\label{sec:MVAR}
-\label{sec:IVAR}
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|}
+\hline
+\emph{Fixed header} & \emph{Key} & \emph{Value} & \emph{Finaliser}
+& \emph{Link}\\
+\hline
+\end{tabular}
+\end{center}
-\ToDo{MVars and IVars}
+\ToDo{Weak poitners}
+\subsubsection{Stable names}
+\label{sec:STABLE_NAME}
+\begin{center}
+\begin{tabular}{|l|l|l|l|}
+\hline
+\emph{Fixed header} & \emph{Index} \\
+\hline
+\end{tabular}
+\end{center}
+
+\ToDo{Stable names}
The remaining objects types are all administrative --- none of them
may be entered.
always unreachable --- they can only be accessed by linearly scanning
the heap.
+\note{Currently we don't use slop objects because the storage manager
+isn't reliant on objects being adjacent, but if we move to a ``mostly
+copying'' style collector, this will become an issue.}
+
\end{description}
\Subsection{Thread State Objects (TSOs)}{TSO}
-\ToDo{This is very out of date. We now embed a single stack object
-within the TSO. TSOs include an ID number which can be used to
-generate a hash value. The gransim, profiling and ticky info is
-surely bogus.}
-
In the multi-threaded system, the state of a suspended thread is
packed up into a Thread State Object (TSO) which contains all the
information needed to restart the thread and for the garbage collector
\begin{center}
\begin{tabular}{|l|l|}
\hline \emph{Fixed header}
-\\ \hline @TSO_LINK@
-\\ \hline @TSO_STATE@
+\\ \hline \emph{Link field}
+\\ \hline \emph{Mutable link field}
+\\ \hline \emph{What next}
+\\ \hline \emph{State}
+\\ \hline \emph{Thread Id}
\\ \hline \emph{Exception Handlers}
\\ \hline \emph{Ticky Info}
\\ \hline \emph{Profiling Info}
\\ \hline \emph{Parallel Info}
\\ \hline \emph{GranSim Info}
+\\ \hline \emph{Stack size}
+\\ \hline \emph{Max Stack size}
+\\ \hline \emph{Sp}
+\\ \hline \emph{Su}
+\\ \hline \emph{SpLim}
\\ \hline
\\
\emph{Stack}
\end{tabular}
\end{center}
The contents of a TSO are:
-\begin{itemize}
+\begin{description}
-\item A pointer (@TSO_LINK@) used to maintain a list of threads with a similar
- state (e.g.~all runnable, all sleeping, all blocked on the same black
- hole, all blocked on the same MVar, etc.)
+\item[\emph{Link field}] This is a pointer used to maintain a list of
+threads with a similar state (e.g.~all runnable, all sleeping, all
+blocked on the same black hole, all blocked on the same MVar,
+etc.)
-\item A word (@TSO_STATE@) which records the current state of a thread: running, runnable, blocked, etc.
+\item[\emph{Mutable link field}] Because the stack is mutable by
+definition, the generational collector needs to track TSOs in older
+generations that may point into younger ones (which is just about any
+TSO for a thread that has run recently). Hence the need for a mutable
+link field (see \secref{mutables}).
-\item Optional information for ``Ticky Ticky'' statistics: @TSO_STK_HWM@ is
- the maximum number of words allocated to this thread.
+\item[\emph{What next}]
+This field has five values:
+\begin{description}
+\item[@ThreadEnterGHC@] The thread can be started by entering the
+closure pointed to by the word on the top of the stack.
+\item[@ThreadRunGHC@] The thread can be started by jumping to the
+address on the top of the stack.
+\item[@ThreadEnterHugs@] The stack has a pointer to a Hugs-built
+closure on top of the stack: enter the closure to run the thread.
+\item[@ThreadKilled@] The thread has been killed (by @killThread#@).
+It is probably still around because it is on some queue somewhere and
+hasn't been garbage collected yet.
+\item[@ThreadComplete@] The thread has finished. Its @TSO@ hasn't
+been garbage collected yet.
+\end{description}
-\item Optional information for profiling:
- @TSO_CCC@ is the current cost centre.
+\item[\emph{Thread Id}]
+This field contains a (not necessarily unique) integer that identifies
+the thread. It can be used eg. for hashing.
-\item Optional information for parallel execution:
+\item[\emph{Ticky Info}] Optional information for ``Ticky Ticky''
+statistics: @TSO_STK_HWM@ is the maximum number of words allocated to
+this thread.
+
+\item[\emph{Profiling Info}] Optional information for profiling:
+@TSO_CCC@ is the current cost centre.
+
+\item[\emph{Parallel Info}]
+Optional information for parallel execution.
% \begin{itemize}
%
% \item I've no idea what else
%
% \end{itemize}
-%
+
+\item[\emph{GranSim Info}]
+Optional information for gransim execution.
+
% \item Optional information for GranSim execution:
% \begin{itemize}
% \item locked
% Q_MIGRATING
%
-\end{itemize}
-
-\subsection{Stack Objects}
-\label{sec:STACK_OBJECT}
-\label{sec:stacks}
+\item[\emph{Stack Info}] Various fields contain information on the
+stack: its current size, its maximum size (to avoid infinite loops
+overflowing the memory), the current stack pointer (\emph{Sp}), the
+current stack update frame pointer (\emph{Su}), and the stack limit
+(\emph{SpLim}). The latter three fields are loaded into the relevant
+registers when the thread is run.
-\ToDo{Merge this in with the section on TSOs}
+\item[\emph{Stack}] This is the actual stack for the thread,
+\emph{Stack size} words long. It grows downwards from higher
+addresses to lower addresses. When the stack overflows, it will
+generally be relocated into larger premises unless \emph{Max stack
+size} is reached.
-These are ``stack objects,'' which are used in the threaded world as
-the stack for each thread is allocated from the heap in smallish
-chunks. (The stack in the sequential world is allocated outside of
-the heap.)
-
-Each reduction thread has to have its own stack space. As there may
-be many such threads, and as any given one may need quite a big stack,
-a naive give-'em-a-big-stack-and-let-'em-run approach will cost a {\em
-lot} of memory.
-
-Our approach is to give a thread a small stack space, and then link
-on/off extra ``chunks'' as the need arises. Again, this is a
-storage-management problem, and, yet again, we choose to graft the
-whole business onto the existing heap-management machinery. So stack
-objects will live in the heap, be garbage collected, etc., etc..
-
-A stack object is laid out like this:
-
-\begin{center}
-\begin{tabular}{|l|}
-\hline
-\emph{Fixed header}
-\\ \hline
-\emph{Link to next stack object (0 for last)}
-\\ \hline
-\emph{N, the payload size in words}
-\\ \hline
-\emph{@Sp@ (byte offset from head of object)}
-\\ \hline
-\emph{@Su@ (byte offset from head of object)}
-\\ \hline
-\emph{Payload (N words)}
-\\ \hline
-\end{tabular}
-\end{center}
-
-The stack grows downwards, towards decreasing
-addresses. This makes it easier to print out the stack
-when debugging, and it means that a return address is
-at the lowest address of the chunk of stack it ``knows about''
-just like an info pointer on a closure.
+\end{description}
The garbage collector needs to be able to find all the
pointers in a stack. How does it do this?
\end{itemize}
-The game plan is this. The garbage collector
-walks the stack from the top, traversing pending-argument sections and
-activation records alternately. Next we discuss how it finds
-the pointers in each of these two stack regions.
-
+The game plan is this. The garbage collector walks the stack from the
+top, traversing pending-argument sections and activation records
+alternately. Next we discuss how it finds the pointers in each of
+these two stack regions.
\Subsubsection{Activation records}{activation-records}
-
An \emph{activation record} is a contiguous chunk of stack,
with a return address as its first word, followed by as many
data words as the return address ``knows about''. The return
There's no need to link the stable pointer table onto the mutable
list because we always treat it as a root.
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\Subsection{Garbage Collecting CAFs}{CAF}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+% begin{direct quote from current paper}
+A CAF (constant applicative form) is a top-level expression with no
+arguments. The expression may need a large, even unbounded, amount of
+storage when it is fully evaluated.
+
+CAFs are represented by closures in static memory that are updated
+with indirections to objects in the heap space once the expression is
+evaluated. Previous version of GHC maintained a list of all evaluated
+CAFs and traversed them during GC, the result being that the storage
+allocated by a CAF would reside in the heap until the program ended.
+% end{direct quote from current paper}
+
+% begin{elaboration on why CAFs are very very bad}
+Treating CAFs this way has two problems:
+\begin{itemize}
+\item
+It can cause a very large space leak. For example, this program
+should run in constant space but, instead, will run out of memory.
+\begin{verbatim}
+> main :: IO ()
+> main = print nats
+>
+> nats :: [Int]
+> nats = [0..maxInt]
+\end{verbatim}
+
+\item
+Expressions with no arguments have very different space behaviour
+depending on whether or not they occur at the top level. For example,
+if we make \verb+nats+ a local definition, the space leak goes away
+and the resulting program runs in constant space, as expected.
+\begin{verbatim}
+> main :: IO ()
+> main = print nats
+> where
+> nats :: [Int]
+> nats = [0..maxInt]
+\end{verbatim}
+
+This huge change in the operational behaviour of the program
+is a problem for optimising compilers and for programmers.
+For example, GHC will normally flatten a set of let bindings using
+this transformation:
+\begin{verbatim}
+let x1 = let x2 = e2 in e1 ==> let x2 = e2 in let x1 = e1
+\end{verbatim}
+but it does not do so if this would raise \verb+x2+ to the top level
+since that may create a CAF. Many Haskell programmers avoid creating
+large CAFs by adding a dummy argument to a CAF or by moving a CAF away
+from the top level.
+
+\end{itemize}
+% end{elaboration on why CAFs are very very bad}
+
+Solving the CAF problem requires different treatment in interactive
+systems such as Hugs than in batch-mode systems such as GHC
+\begin{itemize}
+\item
+In a batch-mode the program the runtime system is terminated
+after every execution of the runtime system. In such systems,
+the garbage collector can completely ``destroy'' a CAF when it
+is no longer live --- in much the same way as it ``destroys''
+normal closures when they are no longer live.
+
+\item
+In an interactive system, many expressions are evaluated without
+restarting the runtime system between each evaluation. In such
+systems, the garbage collector cannot completely ``destroy'' a CAF
+when it is no longer live because, whilst it might not be required in
+the evaluation of the current expression, it might be required in the
+next evaluation.
+
+There are two possible behaviours we migth want:
+\begin{enumerate}
+\item
+When a CAF is no longer required for the current evaluation, the CAF
+should be reverted to its original form. This behaviour ensures that
+the operational behaviour of the interactive system is a reasonable
+predictor of the operational behaviour of the batch-mode system. This
+allows us to use Hugs for performance debugging (in particular, trying
+to understand and reduce the heap usage of a program) --- an area of
+increasing importance as Haskell is used more and more to solve ``real
+problems'' in ``real problem domains''.
+
+\item
+Even if a CAF is no longer required for the current evaluation, we might
+choose to hang onto it by collecting it in the normal way. This keeps
+the space leak but might be useful in a teaching environment when
+trying to teach the difference between call by name evaluation (which
+doesn't share work) and lazy evaluation (which does share work).
+
+\end{enumerate}
+
+It turns out that it is easy to support both styles of use, so the
+runtime system provides a switch which lets us turn this on and off
+during execution. \ToDo{What is this switch called?} It would also
+be easy to provide a function \verb+RevertCAF+ to let the interpreter
+revert any CAF it wanted between (but not during) executions, if we so
+desired. Running \verb+RevertCAF+ during execution would lose some sharing
+but is otherwise harmless.
+
+\end{itemize}
+
+% % begin{even more pointless observation?}
+% The simplest fix would be to remove the special treatment of
+% top level variables. This works but is very inefficient.
+% ToDo: say why.
+% (Note: delete this paragraph from final version.)
+% % end{even more pointless observation?}
+
+% begin{pointless observation?}
+An easy but inefficient fix to the CAF problem would be to make a
+complete copy of the heap before every evaluation and discard the copy
+after evaluation. This works but is inefficient.
+% end{pointless observation?}
+
+An efficient way to achieve a similar effect is to revert all
+updatable thunks to their original form as they become unnecessary for
+the current evaluation. To do this, we modify the compiler to ensure
+that the only updatable thunks generated by the compiler are CAFs and
+we modify the garbage collector to revert entered CAFs to unentered
+CAFs as their value becomes unnecessary.
+
+
+\subsubsection{New Heap Objects}
+
+We add three new kinds of heap object: unentered CAF closures, entered
+CAF objects and CAF blackholes. We first describe how they are
+evaluated and then how they are garbage collected.
+\begin{itemize}
+\item
+Unentered CAF closures contain a pointer to closure representing the
+body of the CAF. The ``body closure'' is not updatable.
+
+Unentered CAF closures contain two unused fields to make them the same
+size as entered CAF closures --- which allows us to perform an inplace
+update. \ToDo{Do we have to add another kind of inplace update operation
+to the storage manager interface or do we consider this to be internal
+to the SM?}
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\verb+CAF_unentered+ & \emph{body closure} & \emph{unused} & \emph{unused} \\ \hline
+\end{tabular}
+\end{center}
+When an unentered CAF is entered, we do the following:
+\begin{itemize}
+\item
+allocate a CAF black hole;
+
+\item
+push an update frame (to update the CAF black hole) onto the stack;
+
+\item
+overwrite the CAF with an entered CAF object (see below) with the same
+body and whose value field points to the black hole;
+
+\item
+add the CAF to a list of all entered CAFs (called ``the CAF list'');
+and
+
+\item
+the closure representing the value of the CAF is entered.
+
+\end{itemize}
+
+When evaluation of the CAF body returns a value, the update frame
+causes the CAF black hole to be updated with the value in the normal
+way.
+
+\ToDo{Add a picture}
+
+\item
+Entered CAF closures contain two pointers: a pointer to the CAF body
+(the same as for unentered CAF closures); a pointer to the CAF value
+(this is initialised with a CAF blackhole, as previously described);
+and a link to the next CAF in the CAF list
+
+\ToDo{How is the end of the list marked? Null pointer or sentinel value?}.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\verb+CAF_entered+ & \emph{body closure} & \emph{value} & \emph{link} \\ \hline
+\end{tabular}
+\end{center}
+When an entered CAF is entered, it enters its value closure.
+
+\item
+CAF blackholes are identical to normal blackholes except that they
+have a different infotable. The only reason for having CAF blackholes
+is to allow an optimisation of lazy blackholing where we stop scanning
+the stack when we see the first {\em normal blackhole} but not
+when we see a {\em CAF blackhole.}
+\ToDo{The optimisation we want to allow should be described elsewhere
+so that all we have to do here is describe the difference.}
+
+Instead of allocating a blackhole to update with the value of the CAF,
+it might seem simpler to update the CAF directly. This would require
+a new kind of update frame which would update the value field of the
+CAF with a pointer to the value and wouldn't catch blackholes caused
+by CAFs that depend on themselves so we chose not to do so.
+
+\end{itemize}
+
+\subsubsection{Garbage Collection}
+
+To avoid the space leak, each run of the garbage collector must revert
+the entered CAFs which are not required to complete the current
+evaluation (that is all the closures reachable from the set of
+runnable threads and the stable pointer table).
+
+It does this by performing garbage collection in three phases:
+\begin{enumerate}
+\item
+During the first phase, we ``mark'' all closures reachable from the
+scheduler state.
+
+How we ``mark'' closures depends on the garbage collector. For
+example, in a 2-space collector, closures are ``marked'' by copying
+them into ``to-space'', overwriting them with a forwarding node and
+``marking'' all the closures reachable from the copy. The only
+requirements are that we can test whether a closure is marked and if a
+closure is marked then so are all closures reachable from it.
+
+\ToDo{At present we say that the scheduler state includes any state
+that Hugs may have. This is not true anymore.}
+
+Performing this phase first provides us with a cheap test for
+execution closures: at this stage in execution, the execution closures
+are precisely the marked closures.
+
+\item
+During the second phase, we revert all unmarked CAFs on the CAF list
+and remove them from the CAF list.
+
+Since the CAF list is exactly the set of all entered CAFs, this reverts
+all entered CAFs which are not execution closures.
+
+\item
+During the third phase, we mark all top level objects (including CAFs)
+by calling \verb+MarkHugsRoots+ which will call \verb+MarkRoot+ for
+each top level object known to Hugs.
+
+\end{enumerate}
+
+To implement the second style of interactive behaviour (where we
+deliberately keep the CAF-related space leak), we simply omit the
+second phase. Omitting the second phase causes the third phase to
+mark any unmarked CAF value closures.
+
+So far, we have been describing a pure Hugs system which contains no
+machine generated code. The main difference in a hybrid system is
+that GHC-generated code is statically allocated in memory instead of
+being dynamically allocated on the heap. We split both
+\verb+CAF_unentered+ and \verb+CAF_entered+ into two versions: a
+static and a dynamic version. The static and dynamic versions of each
+CAF differ only in whether they are moved during garbage collection.
+When reverting CAFs, we revert dynamic entered CAFs to dynamic
+unentered CAFs and static entered CAFs to static unentered CAFs.
+
+
\Section{The Bytecode Evaluator}{bytecode-evaluator}
become arguments and are expected to be on the stack by the called
function.
-Hugs represents updateable thunks with @AP@ objects applying a closure
+Hugs represents updateable thunks with @AP_UPD@ objects applying a closure
to a list of arguments. (As for @PAP@s, unboxed arguments should be
preceded by a tag.) When it is entered, it pushes an update frame
followed by its payload on the stack, and enters the first word (which
-will be a pointer to a BCO). The layout of @AP@ objects is described
-in more detail in \secref{AP}.
+will be a pointer to a BCO). The layout of @AP_UPD@ objects is described
+in more detail in \secref{AP_UPD}.
Partial applications are represented by @PAP@ objects, which are
non-updatable.
\item[Selector]
-To enter a selector (\secref{THUNK_SEL}), we test whether the
+To enter a selector (\secref{THUNK_SELECTOR}), we test whether the
selectee is a value. If so, we simply select the appropriate
component; if not, it's simplest to treat it as a GHC-built closure
--- though we could interpret it if we wanted.
evaluation.}
\ToDo{I think update frames contain cost centres sometimes}
-\item
-If we are doing ``eager blackholing,'' we then overwrite the thunk
-with a black hole. Otherwise, we leave it to the garbage collector to
-black hole the thunk.
+\item If we are doing ``eager blackholing,'' we then overwrite the
+thunk with a black hole (\secref{BLACKHOLE}). Otherwise, we leave it
+to the garbage collector to black hole the thunk.
\item
Start evaluating the body of the expression.