[project @ 1998-01-18 01:08:41 by reid]

[ghc-hetmet.git] / docs / rts / rts.verb
diff --git a/docs/rts/rts.verb b/docs/rts/rts.verb

index 97a4fe0..c6d0de0 100644 (file)
--- a/docs/rts/rts.verb
+++ b/docs/rts/rts.verb
@@ -5,13 +5,12 @@
  
  % TODO:
  %
-% o I think it would be worth making the connection with CPS explicit.
+% o I (ADR) think it would be worth making the connection with CPS explicit.
  %   Now that we have explicit activation records (on the stack), we can
  %   explain the whole system in terms of CPS and tail calls --- with the
  %   one requirement that we carefuly distinguish stack-allocated objects
  %   from heap-allocated objects.
  
-
  % \documentstyle[preprint]{acmconf}
  \documentclass[11pt]{article}
  \oddsidemargin 0.1 in       %   Note that \oddsidemargin = \evensidemargin
@@ -20,7 +19,14 @@
  \marginparsep 0 in 
  \sloppy
  
-\newcommand{\note}[1]{{\em Note: #1}}
+%\usepackage{epsfig}
+
+%\newcommand{\note}[1]{{\em Note: #1}}
+\newcommand{\note}[1]{{{\bf Note:}\sl #1}}
+\newcommand{\ToDo}[1]{{{\bf ToDo:}\sl #1}}
+\newcommand{\Arg}[1]{\mbox{${\tt arg}_{#1}$}}
+\newcommand{\bottom}{bottom} % foo, can't remember the symbol name
+
  % DIMENSION OF TEXT:
  \textheight 8.5 in
  \textwidth 6.25 in
@@ -41,12 +47,9 @@
  
  \begin{document}
  
-\newcommand{\ToDo}[1]{{{\bf ToDo:}\sl #1}}
-\newcommand{\Arg}[1]{\mbox{${\tt arg}_{#1}$}}
-\newcommand{\bottom}{bottom} % foo, can't remember the symbol name
-
  \title{The STG runtime system (revised)}
  \author{Simon Peyton Jones \\ Glasgow University and Oregon Graduate Institute \and
+Simon Marlow \\ Glasgow University \and
  Alastair Reid \\ Yale University} 
  
  \maketitle
@@ -54,7 +57,8 @@ Alastair Reid \\ Yale University}
  \tableofcontents
  \newpage
  
-\section{Introduction}
+\part{Introduction}
+\section{Overview}
  
  This document describes the GHC/Hugs run-time system.  It serves as 
  a Glasgow/Yale/Nottingham ``contract'' about what the RTS does.
@@ -66,6 +70,10 @@ a Glasgow/Yale/Nottingham ``contract'' about what the RTS does.
  that a program can consist of a mixture of GHC-compiled and Hugs-interpreted
  code.
  
+\item The RTS supports concurrency by default.
+This has some costs (eg we can't do hardware stack checks) but
+reduces the number of different configurations we need to support.
+
  \item CAFs are only retained if they are
  reachable.  Since they are referred to by implicit references buried
  in code, this means that the garbage collector must traverse the whole
@@ -79,6 +87,19 @@ Using one stack instead of two reduces register pressure, reduces the
  size of update frames, and eliminates
  ``stack-stubbing'' instructions.)
  
+\item The ``return in registers'' return convention has been dropped
+because it was complicated and doesn't work well on register-poor
+architectures.  It has been partly replaced by unboxed tuples
+(section~\ref{sect:unboxed-tuples}) which allow the programmer to
+explicitly state where results should be returned in registers (or on
+the stack) instead of on the heap.
+
+\item
+
+Lazy black-holing has been replaced by eager black-holing.  The
+problem with lazy black-holing is that it leaves slop in the heap
+which conflicts with the use of a mostly-copying collector.
+
  \end{itemize} 
  
  \subsection{Wish list}
@@ -113,15 +134,12 @@ different subsets of the above features.
  You can make the following choices:
  \begin{itemize}
  \item
-Support for concurrency or parallelism.  There are four 
-mutually-exclusive choices.
+Support for parallelism.  There are three mutually-exclusive choices.
  
  \begin{description}
-\item[@SEQUENTIAL@] No concurrency or parallelism support.
-  This configuration might not support interrupt recovery.
-\item[@CONCURRENT@]  Support for concurrency but not for parallelism.
-\item[@CONCURRENT@+@GRANSIM@] Concurrency support and simulated parallelism.
-\item[@CONCURRENT@+@PARALLEL@]     Concurrency support and real parallelism.
+\item[@SEQUENTIAL@] Support for concurrency but not for parallelism.
+\item[@GRANSIM@]    Concurrency support and simulated parallelism.
+\item[@PARALLEL@]   Concurrency support and real parallelism.
  \end{description}
  
  \item @PROFILING@ adds cost-centre profiling.
@@ -137,2790 +155,4069 @@ Which garbage collector to use.  At the moment we
  only anticipate one, however.
  \end{itemize}
  
-\subsection{Terminology}
+\subsection{Glossary}
  
-\begin{itemize}
+\ToDo{This terminology is not used consistently within the document.
+If you find something which disagrees with this terminology, fix the
+usage.}
  
-\item A {\em word} is (at least) 32 bits and can hold either a signed
-or an unsigned int.
+In the type system, we have boxed and unboxed types.
  
-\item A {\em pointer} is (at least) 32 bits and big enough to hold a
-function pointer or a data pointer.  
+\begin{itemize}
  
-\item A {\em closure} is a (representation of) a value of a {\em pointed}
- type.  It may be in HNF or it may be unevaluated --- in either case, you can
- try to evaluate it again.
+\item A \emph{pointed} type is one that contains $\bot$.  Variables with
+pointed types are the only things which can be lazily evaluated.  In
+the STG machine, this means that they are the only things that can be 
+\emph{entered} or \emph{updated} and it requires that they be boxed.
  
-\item A {\em thunk} is a (representation of) a value of a {\em pointed}
- type which is {\em not} in HNF.
+\item An \emph{unpointed} type is one that does not contain $\bot$.
+Variables with unpointed types are never delayed --- they are always
+evaluated when they are constructed.  In the STG machine, this means
+that they cannot be \emph{entered} or \emph{updated}.  Unpointed objects
+may be boxed (like @Array#@) or unboxed (like @Int#@).
  
  \end{itemize}
  
-Occasionally, a field of a data structure must hold either a word or a
-pointer.  In such circumstances, it is {\em not safe} to assume that
-words and pointers are the same size.
-
-% Todo:
-% More terminology to mention.
-% unboxed, unpointed
-
-There are a few other system invariants which need to be mentioned ---
-though not necessarily here:
+In the implementation, we have different kinds of objects:
  
  \begin{itemize}
  
-\item The garbage collector never expands an object when it promotes
-it to the old generation.  This is important because the GC avoids
-performing heap overflow checks by assuming that the amount added to
-the old generation is no bigger than the current new generation.
-
-\end{itemize}
-
-
-\section{The Scheduler}
+\item \emph{boxed} objects are heap objects used by the evaluators
  
-The Scheduler is the heart of the run-time system.  A running program
-consists of a single running thread, and a list of runnable and
-blocked threads.  The running thread returns to the scheduler when any
-of the following conditions arises:
+\item \emph{unboxed} objects are not heap allocated
  
-\begin{itemize}
-\item A heap check fails, and a garbage collection is required
-\item Compiled code needs to switch to interpreted code, and vice
-versa.
-\item The thread becomes blocked.
-\item The thread is preempted.
-\end{itemize}
+\item \emph{stack} objects are allocated on the stack
  
-A running system has a global state, consisting of
+\item \emph{closures} are objects which can be \emph{entered}. 
+They are always boxed and always have boxed types.
+They may be in WHNF or they may be unevaluated.  
  
-\begin{itemize}
-\item @Hp@, the current heap pointer, which points to the next
-available address in the Heap.
-\item @HpLim@, the heap limit pointer, which points to the end of the
-heap.
-\item The Thread Preemption Flag, which is set whenever the currently
-running thread should be preempted at the next opportunity.
-\end{itemize}
+\item A \emph{thunk} is a (representation of) a value of a \emph{pointed}
+type which is \emph{not} in WHNF.
  
-Each thread has a thread-local state, which consists of
+\item A \emph{value} is an object in WHNF.  It can be pointed or unpointed.
  
-\begin{itemize}
-\item @TSO@, the Thread State Object for this thread.  This is a heap
-object that is used to store the current thread state when the thread
-is blocked or sleeping.
-\item @Sp@, the current stack pointer.
-\item @Su@, the current stack update frame pointer.  This register
-points to the most recent update frame on the stack, and is used to
-calculate the number of arguments available when entering a function.
-\item @SpLim@, the stack limit pointer.  This points to the end of the
-current stack chunk.
-\item Several general purpose registers, used for passing arguments to
-functions.
  \end{itemize}
  
-\noindent and various other bits of information used in specialised
-circumstances, such as profiling and parallel execution.  These are
-described in the appropriate sections.
  
-The following is pseudo-code for the inner loop of the scheduler
-itself.
-
-@
-while (threads_exist) {
-  // handle global problems: GC, parallelism, etc
-  if (need_gc) gc();  
-  if (external_message) service_message();
-  // deal with other urgent stuff
  
-  pick a runnable thread;
-  do {
-    switch (thread->whatNext) {
-      case (RunGHC  pc): status=runGHC(pc);  break;
-      case (RunHugs bc): status=runHugs(bc); break;
-    }
-    switch (status) {  // handle local problems
-      case (StackOverflow): enlargeStack; break;
-      case (Error e)      : error(thread,e); break;
-      case (ExitWith e)   : exit(e); break;
-      case (Yield)        : break;
-    }
-  } while (thread_runnable);
-}
-@
+At the hardware level, we have \emph{word}s and \emph{pointer}s.
  
-Optimisations to avoid excess trampolining from Hugs into itself.
-How do we invoke GC, ccalls, etc.
-General ccall (@ccall-GC@) and optimised ccall.
+\begin{itemize}
  
-\section{Evaluation}
+\item A \emph{word} is (at least) 32 bits and can hold either a signed
+or an unsigned int.
  
-This section describes the framework in which compiled code evaluates
-expressions.  Only at certain points will compiled code need to be
-able to talk to the interpreted world; these are discussed in Section
-\ref{sec:hugs-ghc-interaction}.
+\item A \emph{pointer} is (at least) 32 bits and big enough to hold a
+function pointer or a data pointer.  
  
-\subsection{Calling conventions}
+\end{itemize}
  
-\subsubsection{The call/return registers}
+Occasionally, a field of a data structure must hold either a word or a
+pointer.  In such circumstances, it is \emph{not safe} to assume that
+words and pointers are the same size.
  
-One of the problems in designing a virtual machine is that we want it
-abstract away from tedious machine details but still reveal enough of
-the underlying hardware that we can make sensible decisions about code
-generation.  A major problem area is the use of registers in
-call/return conventions.  On a machine with lots of registers, it's
-cheaper to pass arguments and results in registers than to pass them
-on the stack.  On a machine with very few registers, it's cheaper to
-pass arguments and results on the stack than to use ``virtual
-registers'' in memory.  We therefore use a hybrid system: the first
-$n$ arguments or results are passed in registers; and the remaining
-arguments or results are passed on the stack.  For register-poor
-architectures, it is important that we allow $n=0$.
  
-We'll label the arguments and results \Arg{1} \ldots \Arg{m} --- with
-the understanding that \Arg{1} \ldots \Arg{n} are in registers and
-\Arg{n+1} \ldots \Arg{m} are on top of the stack.
  
-Note that the mapping of arguments \Arg{1} \ldots \Arg{n} to machine
-registers depends on the {\em kinds} of the arguments.  For example,
-if the first argument is a Float, we might pass it in a different
-register from if it is an Int.  In fact, we might find that a given
-architecture lets us pass varying numbers of arguments according to
-their types.  For example, if a CPU has 2 Int registers and 2 Float
-registers then we could pass between 2 and 4 arguments in machine
-registers --- depending on whether they all have the same kind or they
-have different kinds.
+% Todo:
+% More terminology to mention.
+% unboxed, unpointed
  
-\subsubsection{Entering closures}
+\subsection{Subtle Dependencies}
  
-To evaluate a closure we jump to the entry code for the closure
-passing a pointer to the closure in \Arg{1} so that the entry code can
-access its environment.
+Some decisions have very subtle consequences which should be written
+down in case we want to change our minds.  
  
-\subsubsection{Function call}
+\begin{itemize}
  
-The function-call mechanism is obviously crucial.  There are five different
-cases to consider:
-\begin{enumerate}
+\item
  
-\item {\em Known combinator (function with no free variables) and enough arguments.}  
+If the garbage collector is allowed to shrink the stack of a thread,
+we cannot omit the stack check in return continuations
+(section~\ref{sect:heap-and-stack-checks}).
  
-A fast call can be made: push excess arguments onto stack and jump to
-function's {\em fast entry point} passing arguments in \Arg{1} \ldots
-\Arg{m}.  
+\item
  
-The {\em fast entry point} is only called with exactly the right
-number of arguments (in \Arg{1} \ldots \Arg{m}) so it can instantly
-start doing useful work without first testing whether it has enough
-registers or having to pop them off the stack first.
+When we return to the scheduler, the top object on the stack is a closure.
+The scheduler restarts the thread by entering the closure.
  
-\item {\em Known combinator and insufficient arguments.}
+Section~\ref{sect:hugs-return-convention} discusses how Hugs returns an
+unboxed value to GHC and how GHC returns an unboxed value to Hugs.
  
-A slow call can be made: push all arguments onto stack and jump to
-function's {\em slow entry point}.
+\item 
  
-Any unpointed arguments which are pushed on the stack must be tagged.
-This means pushing an extra word on the stack below the unpointed
-words, containing the number of unpointed words above it.
+When we return to the scheduler, we need a few empty words on the stack
+to store a closure to reenter.  Section~\ref{sect:heap-and-stack-checks}
+discusses who does the stack check and how much space they need.
  
-%Todo: forward ref about tagging?
-%Todo: picture?
+\item
  
-The {\em slow entry point} might be called with insufficient arguments
-and so it must test whether there are enough arguments on the stack.
-This {\em argument satisfaction check} consists of checking that
-@Su-Sp@ is big enough to hold all the arguments (including any tags).
+Heap objects never contain slop --- this is required if we want to
+support mostly-copying garbage collection.
  
-\begin{itemize} 
+This is a big problem when updating since the updatee is usually
+bigger than an indirection object.  The fix is to overwrite the end of
+the updatee with ``slop objects'' (described in
+section~\ref{sect:slop-objects}).
+This is hard to arrange if we do \emph{lazy} blackholing
+(section~\ref{sect:lazy-black-holing}) so we currently plan to
+blackhole an object when we push the update frame.
  
-\item If the argument satisfaction check fails, it is because there is
-one or more update frames on the stack before the rest of the
-arguments that the function needs.  In this case, we construct a PAP
-(partial application, section~\ref{sect:PAP}) containing the arguments
-which are on the stack.  The PAP construction code will return to the
-update frame with the address of the PAP in \Arg{1}.
  
-\item If the argument satisfaction check succeeds, we jump to the fast
-entry point with the arguments in \Arg{1} \ldots \Arg{arity}.
  
-If the fast entry point expects to receive some of \Arg{i} on the
-stack, we can reduce the amount of movement required by making the
-stack layout for the fast entry point look like the stack layout for
-the slow entry point.  Since the slow entry point is entered with the
-first argument on the top of the stack and with tags in front of any
-unpointed arguments, this means that if \Arg{i} is unpointed, there
-should be space below it for a tag and that the highest numbered
-argument should be passed on the top of the stack.
+\item
  
-We usually arrange that the fast entry point is placed immediately
-after the slow entry point --- so we can just ``fall through'' to the
-fast entry point without performing a jump.
+Info tables for constructors contain enough information to decide which
+return convention they use.  This allows Hugs to use a single piece of
+entry code for all constructors and insulates Hugs from changes in the
+choice of return convention.
  
  \end{itemize}
  
+\section{Source Language}
  
-\item {\em Known function closure (function with free variables) and enough arguments.}
+\subsection{Explicit Allocation}\label{sect:explicit-allocation}
  
-A fast call can be made: push excess arguments onto stack and jump to
-function's {\em fast entry point} passing a pointer to closure in
-\Arg{1} and arguments in \Arg{2} \ldots \Arg{m+1}.
+As in the original STG machine, (almost) all heap allocation is caused
+by executing a let(rec).  Since we no longer support the return in
+registers convention for data constructors, constructors now cause heap
+allocation and so they should be let-bound.
  
-Like the fast entry point for a combinator, the fast entry point for a
-closure is only called with appropriate values in \Arg{1} \ldots
-\Arg{m+1} so we can start work straight away.  The pointer to the
-closure is used to access the free variables of the closure.
+For example, we now write
+@
+> cons = \ x xs -> let r = (:) x xs in r
+@
+instead of
+@
+> cons = \ x xs -> (:) x xs
+@
  
+\note{For historical reasons, GHC doesn't use this syntax --- but it should.}
  
-\item {\em Known function closure and insufficient arguments.}
+\subsection{Unboxed tuples}\label{sect:unboxed-tuples}
  
-A slow call can be made: push all arguments onto stack and jump to the
-closure's slow entry point passing a pointer to the closure in \Arg{1}.
+Functions can take multiple arguments as easily as they can take one
+argument: there's no cost for adding another argument.  But functions
+can only return one result: the cost of adding a second ``result'' is
+that the function must construct a tuple of ``results'' on the heap.
+The assymetry is rather galling and can make certain programming
+styles quite expensive.  For example, consider a simple state transformer
+monad:
+@
+> type S a     = State -> (a,State)
+> bindS m k s0 = case m s0 of { (a,s1) -> k a s1 }
+> returnS a s  = (a,s)
+> getS s       = (s,s)
+> setS s _     = ((),s)
+@
+Here, every use of @returnS@, @getS@ or @setS@ constructs a new tuple
+in the heap which is instantly taken apart (and becomes garbage) by
+the case analysis in @bind@.  Even a short state-transformer program
+will construct a lot of these temporary tuples.
+
+Unboxed tuples provide a way for the programmer to indicate that they
+do not expect a tuple to be shared and that they do not expect it to
+be allocated in the heap.  Syntactically, unboxed tuples are just like
+single constructor datatypes except for the annotation @unboxed@.
+@
+> data unboxed AAndState# a = AnS a State
+> type S a = State -> AAndState# a
+> bindS m k s0 = case m s0 of { AnS a s1 -> k a s1 }
+> returnS a s  = AnS a s
+> getS s       = AnS s s
+> setS s _     = AnS () s
+@
+Semantically, unboxed tuples are just unlifted tuples and are subject
+to the same restrictions as other unpointed types.
  
-Again, the slow entry point performs an argument satisfaction check
-and either builds a PAP or pops the arguments off the stack into
-\Arg{2} \ldots \Arg{m+1} and jumps to the fast entry point.
+Operationally, unboxed tuples are never built on the heap.  When
+an unboxed tuple is returned, it is returned in multiple registers
+or multiple stack slots.  At first sight, this seems a little strange
+but it's no different from passing double precision floats in two
+registers.
  
+Notes:
+\begin{itemize}
+\item
+Unboxed tuples can only have one constructor and that
+thunks never have unboxed types --- so we'll never try to update an
+unboxed constructor.  The restriction to a single constructor is
+largely to avoid garbage collection complications.
  
-\item {\em Unknown function closure or thunk.}
+\item
+The core syntax does not allow variables to be bound to
+unboxed tuples (ie in default case alternatives or as function arguments)
+and does not allow unboxed tuples to be fields of other constructors.
+However, there's no harm in allowing it in the source syntax as a
+convenient, but easily removed, syntactic sugar.
  
-Sometimes, the function being called is not statically identifiable.
-Consider, for example, the @compose@ function:
+\item
+The compiler generates a closure of the form
  @
-  compose f g x = f (g x)
+> c = \ x y z -> C x y z
  @
-Since @f@ and @g@ are passed as arguments to @compose@, the latter has
-to make a heap call.  In a heap call the arguments are pushed onto the
-stack, and the closure bound to the function is entered.  In the
-example, a thunk for @(g x)@ will be allocated, (a pointer to it)
-pushed on the stack, and the closure bound to @f@ will be
-entered. That is, we will jump to @f@s entry point passing @f@ in
-\Arg{1}.  If \Arg{1} is passed on the stack, it is pushed on top of
-the thunk for @(g x)@.
-
-The {\em entry code} for an updateable thunk (which must have arity 0)
-pushes an update frame on the stack and starts executing the body of
-the closure --- using \Arg{1} to access any free variables.  This is
-described in more detail in section~\ref{sect:data-updates}.
+for every constructor (whether boxed or unboxed).  
  
-The {\em entry code} for a non-updateable closure is just the
-closure's slow entry point.
+This closure is normally used during desugaring to ensure that
+constructors are saturated and to apply any strictness annotations.
+They are also used when returning unboxed constructors to the machine
+code evaluator from the bytecode evaluator and when a heap check fails
+in a return continuation for an unboxed-tuple scrutinee.
  
-\end{enumerate}
+\end{itemize}
  
-In addition to the above considerations, if there are \emph{too many}
-arguments then the extra arguments are simply pushed on the stack with
-appropriate tags.
+\subsection{STG Syntax}
  
-To summarise, a closure's standard (slow) entry point performs the following:
-\begin{description}
-\item[Argument satisfaction check.] (function closure only)
-\item[Stack overflow check.]
-\item[Heap overflow check.]
-\item[Copy free variables out of closure.] %Todo: why?
-\item[Eager black holing.] (updateable thunk only) %Todo: forward ref.
-\item[Push update frame.]
-\item[Evaluate body of closure.]
-\end{description}
  
+\ToDo{Insert STG syntax with appropriate changes.}
  
-\subsection{Case expressions and return conventions}
-\label{sect:return-conventions}
  
-The {\em evaluation} of a thunk is always initiated by
-a @case@ expression.  For example:
-@
-  case x of (a,b) -> E
-@
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\part{System Overview}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  
-The code for a @case@ expression looks like this:
+This part is concerned with defining the external interfaces of the
+major components of the system; the next part is concerned with their
+inner workings.
  
+The major components of the system are:
  \begin{itemize}
-\item Push the free variables of the branches on the stack (fv(@E@) in
-this case).
-\item  Push a \emph{return address} on the stack.
-\item  Evaluate the scrutinee (@x@ in this case).
-\end{itemize}
  
-Once evaluation of the scrutinee is complete, execution resumes at the
-return address, which points to the code for the expression @E@.
+\item 
  
-When execution resumes at the return point, there must be some {\em
-return convention} that defines where the components of the pair, @a@
-and @b@, can be found.  The return convention varies according to the
-type of the scrutinee @x@:
+The evaluators (section~\ref{sect:sm-overview}) are responsible for
+evaluating heap objects.  The system supports two evaluators: the
+machine code evaluator; and the bytecode evaluator.
  
-\begin{itemize}
+\item 
+
+The scheduler (section~\ref{sect:scheduler-overview}) acts as the
+coordinator for the whole system.  It is responsible for switching
+between evaluators, switching between threads, garbage collection,
+communication between multiple processors, etc.
  
  \item 
  
-(A space for) the return address is left on the top of the stack.
-Leaving the return address on the stack ensures that the top of the
-stack contains a valid activation record
-(section~\ref{sect:activation-records}) --- should a garbage collection
-be required.
+The storage manager (section~\ref{sect:evaluators-overview}) is
+responsible for allocating blocks of contiguous memory and for garbage
+collection.
  
-\item If @x@ has a pointed type (e.g.~a data constructor or a function),
-a pointer to @x@ is returned in \Arg{1}.
+\item 
  
-\ToDo{Warn that components of E should be extracted as soon as
-possible to avoid a space leak.}
+The loader (section~\ref{sect:loader-overview}) is responsible for
+loading machine code and bytecode files from the file system and for
+resolving references between separately compiled modules.
  
-\item If @x@ is an unpointed type (e.g.~@Int#@ or @Float#@), @x@ is
-returned in \Arg{1}
+\item 
  
-\item If @x@ is an unpointed tuple constructor, the components of @x@
-are returned in \Arg{1} \ldots \Arg{n} but no object is constructed in
-the heap.  Unboxed tuple constructors are useful for functions which
-want to return multiple values such as those used in an (explicitly
-encoded) state monad:
+The compilers (section~\ref{sect:compilers-overview}) generate machine
+code and bytecode files which can be loaded by the loader.
  
-\ToDo{Move stuff about unboxed tuples to a separate section}
+\end{itemize}
  
-@
-data unpointed AAndState a = AnS a State
-type S a = State -> AAndState a
+\ToDo{Insert diagram showing all components underneath the scheduler
+and communicating only with the scheduler}
  
-bindS m k s0 = case m s0 of { AnS s1 a -> k s1 a }
-returnS a s  = AnS a s
-@
  
-Note that unboxed datatypes can only have one constructor and that
-thunks never have unboxed types --- so we'll never try to update an
-unboxed constructor.  Unboxed tuples are \emph{never} built on the
-heap.
+\section{The Evaluators}\label{sect:evaluators-overview}
  
-When passing an unboxed tuple to a function, the components are
-flattened out and passed in \Arg{1} \ldots \Arg{n} as usual.
+There are two evaluators: a machine code evaluator and a bytecode
+evaluator.  The evaluators task is to evaluate code within a thread
+until one of the following happens:
  
+\begin{itemize}
+\item heap overflow
+\item stack overflow
+\item it is preempted
+\item it blocks in one of the concurrency primitives
+\item it performs a safe ccall
+\item it needs to switch to the other evaluator.
  \end{itemize}
  
-\subsection{Vectored Returns}
+The evaluators expect to find a closure on top of the thread's stack
+and terminate with a closure on top of the thread's stack.
  
-Many algebraic data types have more than one constructor.  For
-example, the @Maybe@ type is defined like this:
-@
-  data Maybe a = Nothing | Just a
-@
-How does the return convention encode which of the two constructors is
-being returned?  A @case@ expression scrutinising a value of @Maybe@
-type would look like this: 
-@
-  case E of 
-    Nothing -> ...
-    Just a  -> ...
-@
-Rather than pushing a return address before evaluating the scrutinee,
-@E@, the @case@ expression pushes (a pointer to) a {\em return
-vector}, a static table consisting of two code pointers: one for the
-@Just@ alternative, and one for the @Nothing@ alternative.  
+\subsection{Evaluation Model}
+\label{sect:evaluation-model}
  
-\begin{itemize}
+Whilst the evaluators differ internally, they share a common
+evaluation model and many object representations.
  
-\item
+\subsubsection{Heap Objects}
  
-The constructor @Nothing@ returns by jumping to the first item in the
-return vector with a pointer to a (statically built) Nothing closure
-in \Arg{1}.  
+The choice of heap and stack objects used by the evaluators is tightly
+bound to the evaluation model.  This section provides an overview of
+the most important heap and stack objects; further details are given
+later.
  
-It might seem that we could avoid loading \Arg{1} in this case since the
-first item in the return vector will know that @Nothing@ was returned
-(and can easily access the Nothing closure in the (unlikely) event
-that it needs it.  The only reason we load \Arg{1} is in case we have to
-perform an update (section~\ref{sect:data-updates}).
+All heap objects look like this:
  
-\item 
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\emph{Header} & \emph{Payload} \\ \hline
+\end{tabular}
+\end{center}
  
-The constructor @Just@ returns by jumping to the second element of the
-return vector with a pointer to the closure in \Arg{1}.  
+The header's vary between different kinds of object but they all start
+with a pointer to a pair consisting of an \emph{info table} and some
+\emph{entry code}.  The info table is used both by the evaluators and
+by the storage manager and contains an @INFO_TYPE@ field which
+identifies which kind of heap object uses it and determines the
+interpretation of the payload and of the other fields of the info
+table.  The entry code is some machine code used by the machine code
+evaluator to evaluate closures and raises an error for other kinds of
+objects.
  
-\end{itemize}
+The major kinds of heap object used are as follows.  (For simplicity,
+this description omits certain optimisations and extra fields required
+by the garbage collector.)
  
-In this way no test need be made to see which constructor returns;
-instead, execution resumes immediately in the appropriate branch of
-the @case@.
+\begin{description}
  
-\subsection{Direct Returns}
+\item[Constructors] are used to represent data constructors.  Their
+payload consists of the fields of the constructor; the tag of the
+constructor is stored in the info table.
  
-When a datatype has a large number of constructors, it may be
-inappropriate to use vectored returns.  The vector tables may be
-large and sparse, and it may be better to identify the constructor
-using a test-and-branch sequence on the tag.  For this reason, we
-provide an alternative return convention, called a \emph{direct
-return}.
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@CONSTR@ & \emph{Fields} \\ \hline
+\end{tabular}
+\end{center}
  
-In a direct return, the return address pushed on the stack really is a
-code pointer.  The returning code loads a pointer to the closure being
-returned in \Arg{1} as usual, and also loads the tag into \Arg{2}.
-The code at the return address will test the tag and jump to the
-appropriate code for the case branch.
+\item[Primitive objects] are used to represent objects with unpointed
+types which are too large to fit in a register (or stack slot) or for
+which sharing must be preserved.  Primitive objects include large
+objects such as multiple precision integers and immutable arrays and
+mutable objects such as mutable arrays, mutable variables, MVar's,
+IVar's and foreign object pointers.  Since unpointed objects are not
+pointed, they cannot be entered.  Their payload varies according to
+the kind of object.
  
-The choice of whether to use a vectored return or a direct return is
-made on a type-by-type basis --- up to a certain maximum number of
-constructors imposed by the update mechanism
-(section~\ref{sect:data-updates}).
+\item[Function closures] are used to represent functions.  Their
+payload (if any) consists of the free variables of the function.
  
-Single-constructor data types also use direct returns, although in
-that case there is no need to return a tag in \Arg{2}.
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@FUN@ & \emph{Free Variables} \\ \hline
+\end{tabular}
+\end{center}
  
-\ToDo{Say whether we pop the return address before returning}
+Function closures are only generated by the machine code compiler.
  
-\ToDo{Stack stubbing?}
+\item[Thunks] are used to represent unevaluated expressions which will
+be updated with their result.  Their payload (if any) consists of the
+free variables of the function.  The entry code for a thunk starts by
+pushing an \emph{update frame} onto the stack and overwriting the
+thunk with a \emph{black hole}.  When evaluation of the thunk
+completes, the update frame will cause the thunk to be overwritten
+again with an \emph{indirection} to the result of the thunk.
  
-\subsection{Updates}
-\label{sect:data-updates}
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@THUNK@ & \emph{Free Variables} \\ \hline
+\end{tabular}
+\end{center}
  
-The entry code for an updatable thunk (which must also be of arity 0):
+Thunks are only generated by the machine code compiler.
  
-\begin{itemize}
-\item copies the free variables out of the thunk into registers or
-  onto the stack.
-\item pushes an {\em update frame} onto the stack.
+\item[Byte-code Objects (@BCO@s)] are generated by the bytecode
+compiler.  In conjunction with \emph{updateable applications} and
+\emph{non-updateeable applications} they are used to represent
+functions, unevaluated expressions and return addresses.
  
-An update frame is a small activation record consisting of
  \begin{center}
-\begin{tabular}{|l|l|l|}
-\hline
-{\em Fixed header} & {\em Update Frame link} & {\em Updatee} \\
-\hline
+\begin{tabular}{|l|l|l|l|}\hline
+@BCO@ & \emph{Constant Pool} & \emph{Bytecodes} \\ \hline
  \end{tabular}
  \end{center}
  
-\note{In the semantics part of the STG paper (section 5.6), an update
-frame consists of everything down to the last update frame on the
-stack.  This would make sense too --- and would fit in nicely with
-what we're going to do when we add support for speculative
-evaluation.}
-\ToDo{I think update frames contain cost centres sometimes}
-
-\item 
-If we are doing ``eager blackholing,'' we then overwrite the thunk
-with a black hole.  Otherwise, we leave it to the garbage collector to
-black hole the thunk.
+\item[Non-updatable Applications] are used to represent the
+application of a function to an insufficient number of arguments.
+Their payload consists of the function and the arguments received so far.
  
-\item 
-Start evaluating the body of the expression.
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@PAP@ & \emph{Function Closure} & \emph{Arguments} \\ \hline
+\end{tabular}
+\end{center}
  
-\end{itemize}
+@PAP@s are used when a function is applied to too few arguments and by
+code generated by the lambda-lifting phase of the bytecode compiler.
  
-When the expression finishes evaluation, it will enter the update
-frame on the top of the stack.  Since the returner doesn't know
-whether it is entering a normal return address/vector or an update
-frame, we follow exactly the same conventions as return addresses and
-return vectors.  That is, on entering the update frame:
+\item[Updatable Applications] are used to represent the application of
+a function to a sufficient number of arguments.  Their payload
+consists of the function and its arguments.  
  
-\begin{itemize} 
-\item The value of the thunk is in \Arg{1}.  (Recall that only thunks
-are updateable and that thunks return just one value.)
+Updateable applications are like thunks: on entering an updateable
+application, the evaluators push an \emph{update frame} onto the stack
+and overwrite the application with a \emph{black hole}; when
+evaluation completes, the evaluators overwrite the application with an
+\emph{indirection} to the result of the application.
  
-\item If the data type is a direct-return type rather than a
-vectored-return type, then the tag is in \Arg{2}.
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@AP@ & \emph{Function Closure} & \emph{Arguments} \\ \hline
+\end{tabular}
+\end{center}
  
-\item The update frame is still on the stack.
-\end{itemize}
+@AP@s are only generated by the bytecode compiler.
  
-We can safely share a single statically-compiled update function
-between all types.  However, the code must be able to handle both
-vectored and direct-return datatypes.  This is done by arranging that
-the update code looks like this:
+\item[Black holes] are used to mark updateable closures which are
+currently being evaluated.  ``Black holing'' an object cures a
+potential space leak and detects certain classes of infinite loops.
+More imporantly, black holes act as synchronisation objects between
+separate threads: if a second thread tries to enter an updateable
+closure which is already being evaluated, the second thread is added
+to a list of blocked threads and the thread is suspended.
  
-@
-               |       ^       |
-               | return vector |
-               |---------------|
-               |  fixed-size   |
-               |  info table   |
-               |---------------|  <- update code pointer
-               |  update code  |
-               |       v       |
-@
+When evaluation of the black-holed closure completes, the black hole
+is overwritten with an indirection to the result of the closure and
+any blocked threads are restored to the runnable queue.
  
-Each entry in the return vector (which is large enough to cover the
-largest vectored-return type) points to the update code.
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@BH@ & \emph{Blocked threads} \\ \hline
+\end{tabular}
+\end{center}
  
-The update code:
-\begin{itemize}
-\item overwrites the {\em updatee} with an indirection to \Arg{1};
-\item loads @Su@ from the Update Frame link;
-\item removes the update frame from the stack; and 
-\item enters \Arg{1}.
-\end{itemize}
+\ToDo{In a single threaded system, it's trivial to detect infinite
+loops: reentering a BH is always an error.  How easy is it in a
+multi-threaded system?}
  
-We enter \Arg{1} again, having probably just come from there, because
-it knows whether to perform a direct or vectored return.  This could
-be optimised by compiling special update code for each slot in the
-return vector, which performs the correct return.
+\item[Indirections] are used to update an unevaluated closure with its
+(usually fully evaluated) result in situations where it isn't possible
+to perform an update in place.  (In the current system, we always
+update with an indirection to avoid duplicating the result when doing
+an update in place.)
  
-\subsection{Semi-tagging}
-\label{sect:semi-tagging}
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@IND@ & \emph{Closure} \\ \hline
+\end{tabular}
+\end{center}
  
-When a @case@ expression evaluates a variable that might be bound
-to a thunk it is often the case that the scrutinee is already evaluated.
-In this case we have paid the penalty of (a) pushing the return address (or
-return vector address) on the stack, (b) jumping through the info pointer
-of the scrutinee, and (c) returning by an indirect jump through the
-return address on the stack.
+Indirections needn't always point to an evaluated closure.  They can
+point to a chain of indirections which point to an evaluated closure.
+When revertible black holes are added, they may also point to reverted
+black holes.
  
-If we knew that the scrutinee was already evaluated we could generate
-(better) code which simply jumps to the appropriate branch of the @case@
-with a pointer to the scrutinee in \Arg{1}.
+\item[Thread State Objects (@TSO@s)] represent Haskell threads.  Their
+payload consists of a unique thread id, the status of the thread
+(runnable, blocked, etc) and the stack.  @TSO@s may be resized by the
+scheduler if its stack is too small or too large.
  
-An obvious idea, therefore, is to test dynamically whether the heap
-closure is a value (using the tag in the info table).  If not, we
-enter the closure as usual; if so, we jump straight to the appropriate
-alternative.  Here, for example, is pseudo-code for the expression
-@(case x of { (a,_,c) -> E }@:
-@
-      \Arg{1} = <pointer to x>;
-      tag = \Arg{1}->entry->tag;
-      if (isWHNF(tag)) {
-          Sp--;  \\ insert space for return address
-         goto ret;
-      }
-      push(ret);           
-      goto \Arg{1}->entry;
-      
-      <info table for return address goes here>
-ret:  a = \Arg{1}->data1; \\ suck out a and c to avoid space leak
-      c = \Arg{1}->data3;
-      <code for E2>
-@
-and here is the code for the expression @(case x of { [] -> E1; x:xs -> E2 }@:
-@
-      \Arg{1} = <pointer to x>;
-      tag = \Arg{1}->entry->tag;
-      if (isWHNF(tag)) {
-         Sp--;  \\ insert space for return address
-         goto retvec[tag];
-      }
-      push(retinfo);          
-      goto \Arg{1}->entry;
-      
-      .addr ret2
-      .addr ret1
-retvec:           \\ reversed return vector
-      <return info table for case goes here>
-retinfo:
-      panic("Direct return into vectored case");
-      
-ret1: <code for E1>
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+@TSO@ & \emph{Thread Id} & \emph{Status} & \emph{Stack} \\ \hline
+\end{tabular}
+\end{center}
  
-ret2: x  = \Arg{1}->head;
-      xs = \Arg{1}->tail;
-      <code for E2>
-@
-There is an obvious cost in compiled code size (but none in the size
-of the bytecodes).  There is also a cost in execution time if we enter
-more thunks than data constructors.
+\end{description}
  
-Both the direct and vectored returns are easily modified to chase chains
-of indirections too.  In the vectored case, this is most easily done by
-making sure that @IND = TAG_1 - 1@, and adding an extra field to every
-return vector.  In the above example, the indirection code would be
-@
-ind:  \Arg{1} = \Arg{1}->next;
-      goto ind_loop;
-@
-where @ind_loop@ is the second line of code.
+\subsubsection{Stack Objects}
  
-Note that we have to leave space for a return address since the return
-address expects to find one.  If the body of the expression requires a
-heap check, we will actually have to write the return address before
-entering the garbage collector.
+The stack contains a mixture of \emph{pending arguments} and 
+\emph{stack objects}.
  
+Pending arguments are arguments to curried functions which have not
+yet been incorporated into an activation frame.  For example, when
+evaluating @let { g x y = x + y; f x = g{x} } in f{3,4}@, the
+evaluator pushes both arguments onto the stack and enters @f@.  @f@
+only requires one argument so it leaves the second argument as a
+\emph{pending argument}.  The pending argument remains on the stack
+until @f@ calls @g@ which requires two arguments: the argument passed
+to it by @f@ and the pending argument which was passed to @f@.
  
-\subsection{Heap and Stack Checks}
+Unboxed pending arguments are always preceeded by a ``tag'' which says
+how large the argument is.  This allows the garbage collector to
+locate pointers within the stack.
  
-\note{I reckon these deserve a subsection of their own}
+There are three kinds of stack object: return addresses, update frames
+and seq frames.  All stack objects look like this
  
-Don't move heap pointer before success occurs.
-Talk about how stack check looks ahead into the branches of case expressions.
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\emph{Header} & \emph{Payload} \\ \hline
+\end{tabular}
+\end{center}
  
-\subsection{Handling interrupts/signals}
+As with heap objects, the header starts with a pointer to a pair
+consisting of an \emph{info table} and some \emph{entry code}.
  
-@
-May have to keep C stack pointer in register to placate OS?
-May have to revert black holes - ouch!
-@
+\begin{description}
  
-\section{Switching Worlds}
+\item[Return addresses] are used to cause selection and execution of
+case alternatives when a constructor is returned.  Return addresses
+generated by the machine code compiler look like this:
  
-Because this is a combined compiled/interpreted system, the
-interpreter will sometimes encounter compiled code, and vice-versa.
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\emph{@RET_ADDR@} & \emph{Free Variables of the case alternatives} \\ \hline
+\end{tabular}
+\end{center}
  
-There are six cases we need to consider:
+The free variables are a mixture of pointers and non-pointers whose
+layout is described by the info table.
  
-\begin{enumerate}
-\item A GHC thread enters a Hugs-built thunk.
-\item A GHC thread calls a Hugs-compiled function.
-\item A GHC thread returns to a Hugs-compiled return address.
-\item A Hugs thread enters a GHC-built thunk.
-\item A Hugs thread calls a GHC-compiled function.
-\item A Hugs thread returns to a Hugs-compiled return address.
-\end{enumerate}
+Return addresses generated by the bytecode compiler look like this:
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\emph{@BCO_RET@} & \emph{BCO} & \emph{Free Variables of the case alternatives} \\ \hline
+\end{tabular}
+\end{center}
  
-\subsection{A GHC thread enters a Hugs-built thunk}
+There is just one @BCO_RET@ info pointer.  We avoid needing different
+@BCO_RET@s for each stack layout by tagging unboxed free variables as
+though they were pending arguments.
  
-A Hugs-built thunk looks like this:
+\item[Update frames] are used to trigger updates.  When an update
+frame is entered, it overwrites the updatee with an indirection to the
+result, restarts any threads blocked on the @BH@ and returns to the
+stack object underneath the update frame.
  
  \begin{center}
-\begin{tabular}{|l|l|}
-\hline
-\emph{Hugs} & \emph{Hugs-specific information} \\
-\hline
+\begin{tabular}{|l|l|l|l|}\hline
+\emph{@UPDATE@} & \emph{Next Update Frame} & \emph{Updatee} \\ \hline
  \end{tabular}
  \end{center}
  
-\noindent where \emph{Hugs} is a pointer to a small
-statically-compiled piece of code that does the following:
+\item[Seq frames] are used to implement the polymorphic @seq@ primitive.
+They are a special kind of update frame.
  
-\begin{itemize}
-\item Push the address of the thunk on the stack.
-\item Push @entertop@ on the stack.
-\item Save the current state of the thread in the TSO.
-\item Return to the scheduler, with the @whatNext@ field set to
-@RunHugs@.
-\end{itemize}
+\ToDo{Describe them properly}
  
-\noindent where @entertop@ is a small statically-compiled piece of
-code that does the following:
  
-\begin{itemize}
-\item pop the return address from the stack.
-\item pop the next word off the stack into \Arg{1}.
-\item enter \Arg{1}.
-\end{itemize}
+\end{description}
  
-The infotable for @entertop@ has some byte-codes attached that do
-essentially the same thing if the code is entered from Hugs.
+\ToDo{We also need a stop frame which goes on the bottom of the stack 
+when the thread terminates.}
  
-\subsection{A GHC thread calls a Hugs-compiled function}
  
-How do we do this?
+\subsubsection{Case expressions}
  
-\subsection{A GHC thread returns to a Hugs-compiled return address}
+In the STG language, all evaluation is triggered by evaluating a case
+expression.  When evaluating a case expression @case e of alts@, the
+evaluator push a return address onto the stack and evaluate the
+expression @e@.  When @e@ eventually reduces to a constructor, the
+return address on the stack is entered.  The details of how the
+constructor is passed to the return address and how the appropriate
+case alternative is selected vary between evaluators.
  
-When Hugs pushes return addresses on the stack, they look like this:
+Case expressions for unboxed data types are essentially the same: the
+case expression pushes a return address onto the stack before
+evaluating the scrutinee; when a function returns an unboxed value, it
+enters the return address on top of the stack.
  
-@
-       |               |
-       |_______________|
-       |               |  -----> bytecode object
-       |_______________|
-       |               |  _____
-       |_______________|       |___ GHC-friendly return code
-                                       _____
-                                       |    |
-                                       |    | Info Table
-                                       |____|
-                                       .    .
-                                       .    . Code
-                                       .    .
-@
  
-If GHC is returning, it will return to the address at the top of the
-stack.  The code at this address
+\subsubsection{Function Applications}
  
-\begin{itemize}
-\item saves the thread state in the TSO
-\item returns to the scheduler with a @whatNext@ field of @RunHugs@.
-\end{itemize}
+In the STG language, all function calls are tail calls.  The arguments
+are pushed onto the stack and the function closure is entered.  If any
+arguments are unboxed, they must be tagged as unboxed pending
+arguments.  Entering a closure is just a special case of calling a
+function with no arguments.
  
-If Hugs is returning to one of these addresses, it can spot the
-special return address at the top and instead jump to the bytecodes
-pointed to by the second word on the stack.
  
-\subsection{A Hugs thread enters a GHC-compiled thunk}
+\subsubsection{Let expressions}
  
-When Hugs is called on to enter a non-Hugs closure (these are
-recognisable by the lack of a \emph{Hugs} pointer at the front), the
-following sequence of instructions is executed:
+In the STG language, almost all heap allocation is caused by let
+expressions.  Filling in the contents of a set of mutually recursive
+heap objects is simple enough; the only difficulty is that once the heap space has been allocated, the thread must not return to the scheduler until
+after the objects are filled in.
  
-\begin{itemize}
-\item Push the address of the thunk on the stack.
-\item Push @entertop@ on the stack.
-\item Save the current state of the thread in the TSO.
-\item Return to the scheduler, with the @whatNext@ field set to
-@RunGHC@.
-\end{itemize}
  
-\subsection{A Hugs thread calls a GHC-compiled function}
+\subsubsection{Primitive Operations}
  
-Hugs never calls GHC-functions directly, it only enters closures
-(which point to the slow entry point for the function).  Hence in this
-case, we just push the arguments on the stack and proceed as for a
-thunk.
+\ToDo{}
  
-\subsection{A Hugs thread returns to a GHC-compiled return address}
+Most primops are simple, some aren't.
  
-\section{Heap objects}
-\label{sect:fixed-header}
  
-\ToDo{Fix this picture}
  
-\begin{figure}
-\begin{center}
-\input{closure}
-\end{center}
-\caption{A closure}
-\label{fig:closure}
-\end{figure}
  
-Every {\em heap object}, or {\em closure} is a contiguous block
-of memory, consisting of a fixed-format {\em header} followed
-by zero or more {\em data words}.
-The header consists of
-the following fields:
-\begin{itemize}
-\item A one-word {\em info pointer}, which points to
-the object's static {\em info table}.
-\item Zero or more {\em admin words} that support
-\begin{itemize}
-\item Profiling (notably a {\em cost centre} word).
-  \note{We could possibly omit the cost centre word from some 
-  administrative objects.}
-\item Parallelism (e.g. GranSim keeps the object's global address here,
-though GUM keeps a separate hash table).
-\item Statistics (e.g. a word to track how many times a thunk is entered.).
  
-We add a Ticky word to the fixed-header part of closures.  This is
-used to record indicate if a closure has been updated but not yet
-entered. It is set when the closure is updated and cleared when
-subsequently entered.
  
-NB: It is {\em not} an ``entry count'', it is an
-``entries-after-update count.''  The commoning up of @CONST@,
-@CHARLIKE@ and @INTLIKE@ closures is turned off(?) if this is
-required. This has only been done for 2s collection.
+\section{Scheduler}\label{sect:scheduler-overview}
  
+The Scheduler is the heart of the run-time system.  A running program
+consists of a single running thread, and a list of runnable and
+blocked threads.  A thread is represented by a \emph{Thread Status Object} (TSO), which contains a few words consist of a stack and a few words of
+status information.  Except for the running thread, all threads have a
+closure on top of their stack; the scheduler restarts a thread by
+entering an evaluator which performs some reduction and returns to the
+scheduler.
  
+\subsection{The scheduler's main loop}
  
-\end{itemize}
-\end{itemize}
-Most of the RTS is completely insensitive to the number of admin words.
-The total size of the fixed header is @FIXED_HS@.
+The scheduler consists of a loop which chooses a runnable thread and
+invokes one of the evaluators which performs some reduction and
+returns.
  
-Many heap objects contain fields allowing them to be inserted onto lists
-during evaluation or during garbage collection. The lists required by
-the evaluator and storage manager are as follows.
+The scheduler also takes care of system-wide issues such as heap
+overflow or communication with other processors (in the parallel
+system) and thread-specific problems such as stack overflow.
+
+\subsection{Creating a thread}
+
+Threads are created:
  
  \begin{itemize}
-\item 2 lists of threads: runnable threads and sleeping threads.
  
-\item The {\em static object list} is a list of all statically
-allocated objects which might contain pointers into the heap.
-(Section~\ref{sect:static-objects}.)
+\item
  
-\item The {\em updated thunk list} is a list of all thunks in the old
-generation which have been updated with an indirection.  
-(Section~\ref{sect:IND_OLDGEN}.)
+When the scheduler is first invoked.
  
-\item The {\em mutables list} is a list of all other objects in the
-old generation which might contain pointers into the new generation.
-Most of the object on this list are ``mutable.''
-(Section~\ref{sect:mutables}.)
+\item
  
-\item The {\em Foreign Object list} is a list of all foreign objects
- which have not yet been deallocated. (Section~\ref{sect:FOREIGN}.)
+When a message is received from another processor (I think). (Parallel
+system only.)
  
-\item The {\em Spark pool} is a doubly(?) linked list of Spark objects
-maintained by the parallel system.  (Section~\ref{sect:SPARK}.)
+\item
  
-\item The {\em Blocked Fetch list} (or
-lists?). (Section~\ref{sect:BLOCKED_FETCH}.)
+When a C program calls some Haskell code.
  
-\item For each thread, there is a list of all update frames on the
-stack.  (Section~\ref{sect:data-updates}.)
+\item
  
+By @forkIO@, @takeMVar@ and (maybe) other Concurrent Haskell primitives.
  
  \end{itemize}
  
-\ToDo{The links for these fields are usually inserted immediately
-after the fixed header except ...}
-
-\subsection{Info Tables}
  
-An {\em info table} is a contiguous block of memory, {\em laid out
-backwards}.  That is, the first field in the list that follows
-occupies the highest memory address, and the successive fields occupy
-successive decreasing memory addresses.
+\subsection{Restarting a thread}
  
-\begin{center}
-\begin{tabular}{|c|}
-   \hline Parallelism Info 
-\\ \hline Profile Info 
-\\ \hline Debug Info 
-\\ \hline Tag/bytecode pointer
-\\ \hline Static reference table 
-\\ \hline Storage manager layout info
-\\ \hline Closure type 
-\\ \hline entry code \ldots
-\\ \hline
-\end{tabular}
-\end{center}
-An info table has the following contents (working backwards in memory
-addresses):
+When the scheduler decides to run a thread, it has to decide which
+evaluator to use.  It does this by looking at the type of the closure
+on top of the stack.
  \begin{itemize}
-\item The {\em entry code} for the closure.
-This code appears literally as the (large) last entry in the
-info table, immediately preceded by the rest of the info table.
-An {\em info pointer} always points to the first byte of the entry code.
+\item @BCO@ $\Rightarrow$ bytecode evaluator
+\item @FUN@ or @THUNK@ $\Rightarrow$ machine code evaluator
+\item @CONSTR@ $\Rightarrow$ machine code evaluator
+\item other $\Rightarrow$ either evaluator.
+\end{itemize}
  
-\item A one-word {\em closure type field}, @INFO_TYPE@, identifies what kind
-of closure the object is.  The various types of closure are described
-in Section~\ref{sect:closures}.
-In some configurations, some useful properties of 
-closures (is it a HNF?  can it be sparked?)
-are represented as high-order bits so they can be tested quickly.
+The only surprise in the above is that the scheduler must enter the
+machine code evaluator if there's a constructor on top of the stack.
+This allows the bytecode evaluator to return a constructor to a
+machine code return address by pushing the constructor on top of the
+stack and returning to the scheduler.  If the return address under the
+constructor is @HUGS_RET@, the entry code for @HUGS_RET@ will
+rearrange the stack so that the return @BCO@ is on top of the stack
+and return to the scheduler which will then call the bytecode
+evaluator.  There is little point in trying to shorten this slightly
+indirect route since it is will happen very rarely if at all.
  
-\item A single pointer or word --- the {\em storage manager info field},
-@INFO_SM@, contains auxiliary information describing the closure's
-precise layout, for the benefit of the garbage collector and the code
-that stuffs graph into packets for transmission over the network.
+\note{As an optimisation, we could store the choice of evaluator in
+the TSO status whenever we leave the evaluator.  This is required for
+any thread, no matter what state it is in (blocked, stack overflow,
+etc).  It isn't clear whether this would accomplish anything.}
  
-\item A one-pointer {\em Static Reference Table (SRT) pointer}, @INFO_SRT@, points to
-a table which enables the garbage collector to identify all accessible
-code and CAFs.  They are fully described in Section~\ref{sect:srt}.
+\subsection{Returning from a thread}
  
-\item A one-pointer {\em tag/bytecode-pointer} field, @INFO_TAG@ or @INFO_BC@.  
-For data constructors this field contains the constructor tag, in the
-range $0..n-1$ where $n$ is the number of constructors.
+The evaluators return to the scheduler when any of the following
+conditions arise:
  
-For other objects that can be entered this field points to the byte
-codes for the object.  For the constructor case you can think of the
-tag as the name of a a suitable bytecode sequence but it can also be used to
-implement semi-tagging (section~\ref{sect:semi-tagging}).
+\begin{itemize}
+\item A heap check fails, and a garbage collection is required
+\item Compiled code needs to switch to interpreted code, and vice versa.
+\item The evaluator needs to perform a ``safe'' C call.
+\item The thread becomes blocked.
+\item The thread is preempted.
+\item The thread terminates.
+\end{itemize}
  
-One awkward question (which may not belong here) is ``how does the
-bytecode interpreter know whether to do a vectored return?''  The
-answer is it examines the @INFO_TYPE@ field of the return address:
-@RET_VEC_@$sz$ requires a vectored return and @RET_@$sz$ requires a
-direct return.
+Except when the thread terminates, the thread always terminates with a
+closure on the top of the stack.  
  
-\item {\em Profiling info\/}
+\subsection{Returning to the Scheduler}
+\label{sect:switching-worlds}
  
-Closure category records are attached to the info table of the
-closure. They are declared with the info table. We put pointers to
-these ClCat things in info tables.  We need these ClCat things because
-they are mutable, whereas info tables are immutable.  Hashing will map
-similar categories to the same hash value allowing statistics to be
-grouped by closure category.
+\ToDo{This ignores the other three ways of returning}
  
-Cost Centres and Closure Categories are hashed to provide indexes
-against which arbitrary information can be stored. These indexes are
-memoised in the appropriate cost centre or category record and
-subsequent hashes avoided by the index routine (it simply returns the
-memoised index).
+The evaluators return to the scheduler under three circumstances:
+\begin{itemize}
  
-There are different features which can be hashed allowing information
-to be stored for different groupings. Cost centres have the cost
-centre recorded (using the pointer), module and group. Closure
-categories have the closure description and the type
-description. Records with the same feature will be hashed to the same
-index value.
+\item
  
-The initialisation routines, @init_index_<feature>@, allocate a hash
-table in which the cost centre / category records are stored. The
-lower bound for the table size is taken from @max_<feature>_no@. They
-return the actual table size used (the next power of 2). Unused
-locations in the hash table are indicated by a 0 entry. Successive
-@init_index_<feature>@ calls just return the actual table size.
+When they enter a closure built by the other evaluator.  That is, when
+the bytecode interpreter enters a closure compiled by GHC or when the
+machine code evaluator enters a BCO.
  
-Calls to @index_<feature>@ will insert the cost centre / category
-record in the @<feature>@ hash table, if not already inserted. The hash
-index is memoised in the record and returned. 
+\item
  
-CURRENTLY ONLY ONE MEMOISATION SLOT IS AVILABLE IN EACH RECORD SO
-HASHING CAN ONLY BE DONE ON ONE FEATURE FOR EACH RECORD. This can be
-easily relaxed at the expense of extra memoisation space or continued
-rehashing.
+When they return to a return continuation built by the other
+evaluator.  That is, when the machine code evaluator returns to a
+continuation built by Hugs or when the bytecode evaluator returns to a
+continuation built by GHC.
  
-The initialisation routines must be called before initialisation of
-the stacks and heap as they require to allocate storage. It is also
-expected that the caller may want to allocate additional storage in
-which to store profiling information based on the return table size
-value(s).
+\item
  
-\begin{center}
-\begin{tabular}{|l|}
-   \hline Hash Index
-\\ \hline Selected
-\\ \hline Kind
-\\ \hline Description String
-\\ \hline Type String
-\\ \hline
-\end{tabular}
-\end{center}
+When a heap or stack check fails or when the preemption flag is set.
  
-\begin{description}
-\item[Hash Index] Memoised copy
-\item[Selected] 
-  Is this category selected (-1 == not memoised, selected? 0 or 1)
-\item[Kind]
-One of the following values (defined in CostCentre.lh):
+\end{itemize}
  
-\begin{description}
-\item[@CON_K@]
-A constructor.
-\item[@FN_K@]
-A literal function.
-\item[@PAP_K@]
-A partial application.
-\item[@THK_K@]
-A thunk, or suspension.
-\item[@BH_K@]
-A black hole.
-\item[@ARR_K@]
-An array.
-\item[@ForeignObj_K@]
-A Foreign object (non-Haskell heap resident).
-\item[@SPT_K@]
-The Stable Pointer table.  (There should only be one of these but it
-represents a form of weak space leak since it can't shrink to meet
-non-demand so it may be worth watching separately? ADR)
-\item[@INTERNAL_KIND@]
-Something internal to the runtime system.
-\end{description}
+In all cases, they return to the scheduler with a closure on top of
+the stack.  The mechanism used to trigger the world switch and the
+choice of closure left on top of the stack varies according to which
+world is being left and what is being returned.
  
+\subsubsection{Leaving the bytecode evaluator}
+\label{sect:hugs-to-ghc-switch}
  
-\item[Description] Source derived string detailing closure description.
-\item[Type] Source derived string detailing closure type.
-\end{description}
+\paragraph{Entering a machine code closure}
  
-\item {\em Parallelism info\/}
-\ToDo{}
+When it enters a closure, the bytecode evaluator performs a switch
+based on the type of closure (@AP@, @PAP@, @Ind@, etc).  On entering a
+machine code closure, it returns to the scheduler with the closure on
+top of the stack.
  
-\item {\em Debugging info\/}
-\ToDo{}
+\paragraph{Returning a constructor}
  
-\end{itemize}
+When it enters a constructor, the bytecode evaluator tests the return
+continuation on top of the stack.  If it is a machine code
+continuation, it returns to the scheduler with the constructor on top
+of the stack.
  
+\note{This is why the scheduler must enter the machine code evaluator
+if it finds a constructor on top of the stack.}
  
+\paragraph{Returning an unboxed value}
  
-\section{Kinds of Heap Object}
-\label{sect:closures}
+\note{Hugs doesn't support unboxed values in source programs but they
+are used for a few complex primops.}
  
-Heap objects can be classified in several ways, but one useful one is
-this:
-\begin{itemize}
-\item 
-{\em Static closures} occupy fixed, statically-allocated memory
-locations, with globally known addresses.
+When it returns an unboxed value, the bytecode evaluator tests the
+return continuation on top of the stack.  If it is a machine code
+continuation, it returns to the scheduler with the tagged unboxed
+value and a special closure on top of the stack.  When the closure is
+entered (by the machine code evaluator), it returns the unboxed value
+on top of the stack to the return continuation under it.
  
-\item 
-{\em Dynamic closures} are individually allocated in the heap.
+The runtime library for GHC provides one of these closures for each unboxed
+type.  Hugs cannot generate them itself since the entry code is really
+very tricky.
  
-\item 
-{\em Stack closures} are closures allocated within a thread's stack
-(which is itself a heap object).  Unlike other closures, there are
-never any pointers to stack closures.  Stack closures are discussed in
-Section~\ref{sect:stacks}.
+\paragraph{Heap/Stack overflow and preemption}
  
-\end{itemize}
-A second useful classification is this:
-\begin{itemize}
-\item 
-{\em Executive closures}, such as thunks and data constructors,
-participate directly in a program's execution.  They can be subdivided into
-two kinds of objects according to their type:
-\begin{itemize}
-\item 
-{\em Pointed objects}, represent values of a {\em pointed} type
-(<.pointed types launchbury.>) --i.e.~a type that includes $\bottom$ such as @Int@ or @Int# -> Int#@.
+The bytecode evaluator tests for heap/stack overflow and preemption
+when entering a BCO and simply returns with the BCO on top of the
+stack.
  
-\item {\em Unpointed objects}, represent values of a {\em unpointed} type --i.e.~a type that does not include $\bottom$ such as @Int#@ or @Array#@.
+\subsubsection{Leaving the machine code evaluator}
+\label{sect:ghc-to-hugs-switch}
  
-\item {\em Activation frames}, represent ``continuations''.  They are
-always stored on the stack and are never pointed to by heap objects or
-passed as arguments.  \note{It's not clear if this will still be true
-once we support speculative evaluation.}
+\paragraph{Entering a BCO}
  
-\end{itemize}
+The entry code for a BCO pushes the BCO onto the stack and returns to
+the scheduler.
  
-\item {\em Administrative closures}, such as stack objects and thread
-state objects, do not represent values in the original program.
-\end{itemize}
+\paragraph{Returning a constructor}
  
-Only pointed objects can be entered.  All pointed objects share a
-common header format: the ``pointed header''; while all unpointed
-objects share a common header format: the ``unpointed header''.
-\ToDo{Describe the difference and update the diagrams to mention
-an appropriate header type.}
+We avoid the need to test return addresses in the machine code
+evaluator by pushing a special return address on top of a pointer to
+the bytecode return continuation.  Figure~\ref{fig:hugs-return-stack}
+shows the state of the stack just before evaluating the scrutinee.
  
-This section enumerates all the kinds of heap objects in the system.
-Each is identified by a distinct @INFO_TYPE@ tag in its info table.
+\begin{figure}[ht]
+\begin{center}
+@
+| stack    |
++----------+
+| bco      |--> BCO
++----------+
+| HUGS_RET |
++----------+
+@
+%\input{hugs_return1.pstex_t}
+\end{center}
+\caption{Stack layout for evaluating a scrutinee}
+\label{fig:hugs-return-stack}
+\end{figure}
  
-\ToDo{Check this table very carefully}
+This return address rearranges the stack so that the bco pointer is
+above the constructor on the stack (as shown in
+figure~\ref{fig:hugs-boxed-return}) and returns to the scheduler.
  
-\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|}
-\hline
+\begin{figure}[ht]
+\begin{center}
+@
+| stack    |
++----------+
+| con      |--> Constructor
++----------+
+| bco      |--> BCO
++----------+
+@
+%\input{hugs_return2.pstex_t}
+\end{center}
+\caption{Stack layout for entering a Hugs return address}
+\label{fig:hugs-boxed-return}
+\end{figure}
  
-closure kind          & HNF & UPD & NS & STA & THU & MUT & UPT & BH & IND & Section \\
+\paragraph{Returning an unboxed value}
  
-\hline                                                              
-{\em Pointed} \\ 
-\hline 
+We avoid the need to test return addresses in the machine code
+evaluator by pushing a special return address on top of a pointer to
+the bytecode return continuation.  This return address rearranges the
+stack so that the bco pointer is above the tagged unboxed value (as shown in
+figure~\ref{fig:hugs-entering-unboxed-return}) and returns to the scheduler.
  
-@CONSTR@              & 1   &     & 1  &     &     &     &     &    &     & \ref{sect:CONSTR}    \\
-@CONSTR_STATIC@       & 1   &     & 1  & 1   &     &     &     &    &     & \ref{sect:CONSTR}    \\
-@CONSTR_STATIC_NOCAF@ & 1   &     & 1  & 1   &     &     &     &    &     & \ref{sect:CONSTR}    \\
-                                                                                                
-@FUN@                 & 1   &     & ?  &     &     &     &     &    &     & \ref{sect:FUN}       \\
-@FUN_STATIC@          & 1   &     & ?  & 1   &     &     &     &    &     & \ref{sect:FUN}       \\
-                                                                                                
-@THUNK@               & 1   & 1   &    &     & 1   &     &     &    &     & \ref{sect:THUNK}     \\
-@THUNK_STATIC@        & 1   & 1   &    & 1   & 1   &     &     &    &     & \ref{sect:THUNK}     \\
-@THUNK_SELECTOR@      & 1   & 1   & 1  &     & 1   &     &     &    &     & \ref{sect:THUNK_SEL}     \\
-                                                                                                
-@PAP@                 & 1   &     & ?  &     &     &     &     &    &     & \ref{sect:PAP}       \\
-                                                                                                
-@IND@                 &     &     & 1  &     & ?   &     &     &    & 1   & \ref{sect:IND}       \\
-@IND_OLDGEN@          & 1   &     & 1  &     & ?   &     &     &    & 1   & \ref{sect:IND}       \\
-@IND_PERM@            &     &     & 1  &     & ?   &     &     &    & 1   & \ref{sect:IND}       \\
-@IND_OLDGEN_PERM@     & 1   &     & 1  &     & ?   &     &     &    & 1   & \ref{sect:IND}       \\
-@IND_STATIC@          & ?   &     & 1  & 1   & ?   &     &     &    & 1   & \ref{sect:IND}       \\
+\begin{figure}[ht]
+\begin{center}
+@
+| stack    |
++----------+
+| 1#       |
++----------+
+| I#       |
++----------+
+| bco      |--> BCO
++----------+
+@
+%\input{hugs_return2.pstex_t}
+\end{center}
+\caption{Stack layout for returning an unboxed value}
+\label{fig:hugs-entering-unboxed-return}
+\end{figure}
  
-\hline
-{\em Unpointed} \\ 
-\hline
+\paragraph{Heap/Stack overflow and preemption}
  
+\ToDo{}
  
-@ARR_WORDS@           & 1   &     & 1  &     &     & 1   & 1   &    &     & \ref{sect:ARR_WORDS1},\ref{sect:ARR_WORDS2} \\
-@ARR_PTRS@            & 1   &     & 1  &     &     & 1   & 1   &    &     & \ref{sect:ARR_PTRS}  \\
-@MUTVAR@              & 1   &     & 1  &     &     & 1   & 1   &    &     & \ref{sect:MUTVAR}    \\
-@MUTARR_PTRS@         & 1   &     & 1  &     &     & 1   & 1   &    &     & \ref{sect:MUTARR_PTRS} \\
-@MUTARR_PTRS_FROZEN@  & 1   &     & 1  &     &     & 1   & 1   &    &     & \ref{sect:MUTARR_PTRS_FROZEN} \\
  
-@FOREIGN@             & 1   &     & 1  &     &     &     & 1   &    &     & \ref{sect:FOREIGN}   \\
+\subsection{Preempting a thread}
  
-@BH@                  & ?   & 0/1 & 1  &     & ?   & ?   &     & 1  & ?   & \ref{sect:BH}        \\
-@MVAR@                       &     &     &    &     &     &     &     &    &     & \ref{sect:MVAR}      \\
-@IVAR@                       &     &     &    &     &     &     &     &    &     & \ref{sect:IVAR}      \\
-@FETCHME@             &     &     &    &     &     &     &     &    &     & \ref{sect:FETCHME}   \\
-\hline
-\end{tabular}
+Strictly speaking, threads cannot be preempted --- the scheduler
+merely sets a preemption request flag which the thread must arrange to
+test on a regular basis.  When an evaluator finds that the preemption
+request flag is set, it pushes an appropriate closure onto the stack
+and returns to the scheduler.
  
-Activation frames do not live (directly) on the heap --- but they have
-a similar organisation.  The classification bits are all zero in
-activation frames.
+In the bytecode interpreter, the flag is tested whenever we enter a
+closure.  If the preemption flag is set, it leaves the closure on top
+of the stack and returns to the scheduler.
  
-\begin{tabular}{|l|l|}\hline
-closure kind           & Section                       \\ \hline
-@RET_SMALL@            & \ref{sect:activation-records} \\
-@RET_VEC_SMALL@        & \ref{sect:activation-records} \\
-@RET_BIG@              & \ref{sect:activation-records} \\
-@RET_VEC_BIG@          & \ref{sect:activation-records} \\
-@UPDATE_FRAME@                 & \ref{sect:activation-records} \\
-\hline
-\end{tabular}
+In the machine code evaluator, the flag is only tested when a heap or
+stack check fails.  This is less expensive than testing the flag on
+entering every closure but runs the risk that a thread will enter an
+infinite loop which does not allocate any space.  If the flag is set,
+the evaluator returns to the scheduler exactly as if a heap check had
+failed.
  
-There are also a number of administrative objects.  The classification bits are
-all zero in administrative objects.
+\subsection{``Safe'' and ``unsafe'' C calls}
  
-\begin{tabular}{|l|l|}\hline
-closure kind           & Section                       \\ \hline
-@TSO@                   & \ref{sect:TSO}               \\
-@STACK_OBJECT@          & \ref{sect:STACK_OBJECT}      \\
-@STABLEPTR_TABLE@       & \ref{sect:STABLEPTR_TABLE}   \\
-@SPARK_OBJECT@          & \ref{sect:SPARK}             \\
-@BLOCKED_FETCH@        & \ref{sect:BLOCKED_FETCH}      \\
-\hline
-\end{tabular}
+There are two ways of calling C: 
  
-\ToDo{I guess the parallel system has something like a stable ptr
-table.  Is there any opportunity for sharing code/data structures
-here?}
+\begin{description}
  
+\item[``Unsafe'' C calls] are used if the programer is certain that
+the C function will not do anything dangerous.  Unsafe C calls are
+faster but must be hand-checked by the programmer.
  
-\subsection{Classification bits}
+Dangerous things include:
  
-The top bits of the @INFO_TYPE@ tag tells what sort of animal the
-closure is.
+\begin{itemize}
  
-\begin{tabular}{|l|l|l|}                                                       \hline
-Abbrev & Bit & Interpretation                                                  \\ \hline
-HNF    & 0   & 1 $\Rightarrow$ Head normal form                                        \\
-UPD    & 4   & 1 $\Rightarrow$ May be updated (inconsistent with being a HNF)   \\ 
-NS     & 1   & 1 $\Rightarrow$ Don't spark me  (Any HNF will have this set to 1)\\
-STA    & 2   & 1 $\Rightarrow$ This is a static closure                                \\
-THU    & 8   & 1 $\Rightarrow$ Is a thunk                                      \\
-MUT    & 3   & 1 $\Rightarrow$ Has mutable pointer fields                       \\ 
-UPT    & 5   & 1 $\Rightarrow$ Has an unpointed type (eg a primitive array)     \\
-BH     & 6   & 1 $\Rightarrow$ Is a black hole                                 \\
-IND    & 7   & 1 $\Rightarrow$ Is an indirection                               \\
-\hline
-\end{tabular}
+\item 
  
-Updatable structures (@_UP@) are thunks that may be shared.  Primitive
-arrays (@_BM@ -- Big Mothers) are structures that are always held
-in-memory (basically extensions of a closure).  Because there may be
-offsets into these arrays, a primitive array cannot be handled as a
-FetchMe in the parallel system, but must be shipped in its entirety if
-its parent closure is shipped.
+Call a system function such as @getchar@ which might block
+indefinitely.  This is dangerous because we don't want the entire
+runtime system to block just because one thread blocks.
  
-The other bits in the info-type field simply give a unique bit-pattern
-to identify the closure type.
+\item
  
-\iffalse
-@
-#define _NF                    0x0001  /* Normal form  */
-#define _NS                    0x0002  /* Don't spark  */
-#define _ST                    0x0004  /* Is static    */
-#define _MU                    0x0008  /* Is mutable   */
-#define _UP                    0x0010  /* Is updatable (but not mutable) */
-#define _BM                    0x0020  /* Is a "primitive" array */
-#define _BH                    0x0040  /* Is a black hole */
-#define _IN                    0x0080  /* Is an indirection */
-#define _TH                    0x0100  /* Is a thunk */
+Call an RTS function which will block on the RTS access semaphore.
+This would lead to deadlock.
  
+\item
  
+Call a Haskell function.  This is just a special case of calling an
+RTS function.
  
-SPEC   
-SPEC_N         SPEC | _NF | _NS
-SPEC_S         SPEC | _TH
-SPEC_U         SPEC | _UP | _TH
-               
-GEN    
-GEN_N          GEN | _NF | _NS
-GEN_S          GEN | _TH
-GEN_U          GEN | _UP | _TH
-               
-DYN            _NF | _NS
-TUPLE          _NF | _NS | _BM
-DATA           _NF | _NS | _BM
-MUTUPLE                _NF | _NS | _MU | _BM
-IMMUTUPLE      _NF | _NS | _BM
-STATIC         _NS | _ST
-CONST          _NF | _NS
-CHARLIKE       _NF | _NS
-INTLIKE                _NF | _NS
+\end{itemize}
  
-BH             _NS | _BH
-BH_N           BH
-BH_U           BH | _UP
-               
-BQ             _NS | _MU | _BH
-IND            _NS | _IN
-CAF            _NF | _NS | _ST | _IN
+Unsafe C calls are performed by pushing the arguments onto the C stack
+and jumping to the C function's entry point.  On exit, the result of
+the function is in a register which is returned to the Haskell code as
+an unboxed value.
  
-FM             
-FETCHME                FM | _MU
-FMBQ           FM | _MU | _BH
+\item[``Safe'' C calls] are used if the programmer suspects that the
+thread may do something dangerous.  Safe C calls are relatively slow
+but are less problematic.
  
-TSO            _MU
+Safe C calls are performed by pushing the arguments onto the Haskell
+stack, pushing a return continuation and returning a \emph{C function
+descriptor} to the scheduler.  The scheduler suspends the Haskell thread,
+spawns a new operating system thread which pops the arguments off the
+Haskell stack onto the C stack, calls the C function, pushes the
+function result onto the Haskell stack and informs the scheduler that
+the C function has completed and the Haskell thread is now runnable.
  
-STKO   
-STKO_DYNAMIC   STKO | _MU
-STKO_STATIC    STKO | _ST
-               
-SPEC_RBH       _NS | _MU | _BH
-GEN_RBH                _NS | _MU | _BH
-BF             _NS | _MU | _BH
-INTERNAL       
+\end{description}
  
-@
-\fi
+The bytecode evaluator will probably treat all C calls as being safe.
  
-Notes:
+\ToDo{It might be good for the programmer to indicate how the program
+is unsafe.  For example, if we distinguish between C functions which
+might call Haskell functions and those which might block, we could
+perform an unsafe call for blocking functions in a single-threaded
+system or, perhaps, in a multi-threaded system which only happens to
+have a single thread at the moment.}
  
-An indirection either points to HNF (post update); or is result of
-overwriting a FetchMe, in which case the thing fetched is either
-under evaluation (BH), or by now an HNF.  Thus, indirections get NoSpark flag.
  
  
+\section{The Storage Manager}\label{sect:sm-overview}
  
-\subsection{Pointed Objects}
+The storage manager is responsible for managing the heap and all
+objects stored in it.  It provides special support for lazy evaluation
+and for foreign function calls.
  
-All pointed objects can be entered.
+\subsection{SM support for lazy evaluation}
  
-\subsubsection{Function closures}\label{sect:FUN}
+\begin{itemize}
+\item
  
-Function closures represent lambda abstractions.  For example,
-consider the top-level declaration:
+Indirections are shorted out.
+
+\item
+
+Update frames pointing to unreachable objects are squeezed out.
+
+\item
+
+Adjacent update frames (for different closures) are compressed to a
+single update frame pointing to a single black hole.
+
+\end{itemize}
+
+
+\subsection{SM support for foreign function calls}
+
+\begin{itemize}
+
+\item
+
+Stable pointers allow other languages to access Haskell objects.
+
+\item
+
+Foreign Objects are a form of weak pointer which let's Haskell access
+foreign objects.
+
+\end{itemize}
+
+\subsection{Misc}
+
+\begin{itemize}
+
+\item
+
+If the stack contains a large amount of free space, the storage
+manager may shrink the stack.  If it shrinks the stack, it guarantees
+never to leave less than @MIN_SIZE_SHRUNKEN_STACK@ empty words on the
+stack when it does so.
+
+\ToDo{Would it be useful for the storage manager to enlarge the stack?}
+
+\item
+
+For efficiency reasons, very large objects (eg large arrays and TSOs)
+are not moved if possible.
+
+\end{itemize}
+
+
+\section{The Compilers}\label{sect:compilers-overview}
+
+Need to describe interface files, format of bytecode files, symbols
+defined by machine code files.
+
+\subsection{Interface Files}
+
+Here's an example - but I don't know the grammar - ADR.
  @
-  f = \x -> let g = \y -> x+y
-           in g x
+_interface_ Main 1
+_exports_
+Main main ;
+_declarations_
+1 main _:_ IOBase.IO PrelBase.();;
  @
-Both @f@ and @g@ are represented by function closures.  The closure
-for @f@ is {\em static} while that for @g@ is {\em dynamic}.
  
-The layout of a function closure is as follows:
+\subsection{Bytecode files}
+
+(All that matters here is what the loader sees.)
+
+\subsection{Machine code files}
+
+(Again, all that matters is what the loader sees.)
+
+
+\subsection{Bytecode files}
+
+(All that matters here is what the loader sees.)
+
+\subsection{Machine code files}
+
+(Again, all that matters is what the loader sees.)
+
+
+\section{The Loader}\label{sect:loader-overview}
+
+In a batch mode system, we can statically link all the modules
+together.  In an interactive system we need a loader which will
+explicitly load and unload individual modules (or, perhaps, blocks of
+mutually dependent modules) and resolve references between modules.
+
+While many operating systems provide support for dynamic loading and
+will automatically resolve cross-module references for us, we generally
+cannot rely on being able to load mutually dependent modules.
+
+A portable solution is to perform some of the linking ourselves.  Each module
+should provide three global symbols: 
+\begin{itemize}
+\item
+An initialisation routine.  (Might also be used for finalisation.)
+\item
+A table of symbols it exports.
+Entries in this table consist of the symbol name and the address of the
+names value.
+\item
+A table of symbols it imports.
+Entries in this table consist of the symbol name and a list of references
+to that symbol.
+\end{itemize}
+
+On loading a group of modules, the loader adds the contents of the
+export lists to a symbol table and then fills in all the references in the
+import lists.
+
+References in import lists are of two types:
+\begin{description}
+\item[ References in machine code ]
+
+The most efficient approach is to patch the machine code directly, but
+this will be a lot of work, very painful to port and rather fragile.
+
+Alternatively, the loader could store the value of each symbol in the
+import table for each module and the compiled code can access all
+external objects through the import table.  This requires that the
+import table be writable but does not require that the machine code or
+info tables be writable.
+
+\item[ References in data structures (SRTs and static data constructors) ]
+
+Either we patch the SRTs and constructors directly or we somehow use
+indirections through the symbol table.  Patching the SRTs requires
+that we make them writable and prevents us from making effective use
+of virtual memories that use copy-on-write policies.  Using an
+indirection is possible but tricky.
+
+Note: We could avoid patching machine code if all references to
+external references went through the SRT --- then we just have one
+thing to patch.  But the SRT always contains a pointer to the closure
+rather than the fast entry point (say), so we'd take a big performance
+hit for doing this.
+
+\end{description}
+
+Using the above scheme, all accesses to ``external'' objects involve a
+layer of indirection.  To avoid this overhead, the machine code
+compiler might provide a way for the programmer to specify which
+modules will be statically linked and which will be dynamically linked
+--- the idea being that statically linked code and data will be
+accessed directly.
+
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\part{Internal details}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+This part is concerned with the internal details of the components
+described in the previous part.
+
+The major components of the system are:
+\begin{itemize}
+\item The scheduler (section~\ref{sect:storage-manager-internals})
+\item The storage manager (section~\ref{sect:storage-manager-internals})
+\item The evaluators
+\item The loader
+\item The compilers
+\end{itemize}
+
+\section{The Scheduler}
+\label{sect:scheduler-internals}
+
+\ToDo{Detailed description of scheduler}
+
+Many heap objects contain fields allowing them to be inserted onto lists
+during evaluation or during garbage collection. The lists required by
+the evaluator and storage manager are as follows.
+
+\begin{itemize}
+
+\item 4 lists of threads: runnable threads, sleeping threads, threads
+waiting for timeout and threads waiting for I/O.
+
+\item The \emph{mutables list} is a list of all objects in the old
+generation which might contain pointers into the new generation.  Most
+of the objects on this list are indirections (section~\ref{sect:IND})
+or ``mutable.''  (Section~\ref{sect:mutables}.)
+
+\item The \emph{Foreign Object list} is a list of all foreign objects
+ which have not yet been deallocated. (Section~\ref{sect:FOREIGN}.)
+
+\item The \emph{Spark pool} is a doubly(?) linked list of Spark objects
+maintained by the parallel system.  (Section~\ref{sect:SPARK}.)
+
+\item The \emph{Blocked Fetch list} (or
+lists?). (Section~\ref{sect:BLOCKED_FETCH}.)
+
+\item For each thread, there is a list of all update frames on the
+stack.  (Section~\ref{sect:data-updates}.)
+
+\item The Stable Pointer Table is a table of pointers to objects which
+are known to the outside world and must be retained by the garbage
+collector even if they are not accessible from within the heap.
+
+\end{itemize}
+
+\ToDo{The links for these fields are usually inserted immediately
+after the fixed header except ...}
+
+
+
+\section{The Storage Manager}
+\label{sect:storage-manager-internals}
+
+\subsection{Misc Text looking for a home}
+
+A \emph{value} may be:
+\begin{itemize}
+\item \emph{Boxed}, i.e.~represented indirectly by a pointer to a heap object (e.g.~foreign objects, arrays); or
+\item \emph{Unboxed}, i.e.~represented directly by a bit-pattern in one or more registers (e.g.~@Int#@ and @Float#@).
+\end{itemize}
+All \emph{pointed} values are \emph{boxed}.  
+
+
+\subsection{Heap Objects}
+
+\begin{figure}
  \begin{center}
-\begin{tabular}{|l|l|l|l|}\hline
-{\em Fixed header}  & {\em Pointers} & {\em Non-pointers} \\ \hline
-\end{tabular}
+\input{closure}
  \end{center}
-The data words (pointers and non-pointers) are the free variables of
-the function closure.  
-The number of pointers
-and number of non-pointers are stored in the @INFO_SM@ word, in the least significant
-and most significant half-word respectively.
+\ToDo{Fix this picture}
+\caption{A closure}
+\label{fig:closure}
+\end{figure}
  
-There are several different sorts of function closure, distinguished
-by their @INFO_TYPE@ field:
+Every \emph{heap object} is a contiguous block
+of memory, consisting of a fixed-format \emph{header} followed
+by zero or more \emph{data words}.
+
+The header consists of the following fields:
  \begin{itemize}
-\item @FUN@: a vanilla, dynamically allocated on the heap. 
+\item A one-word \emph{info pointer}, which points to
+the object's static \emph{info table}.
+\item Zero or more \emph{admin words} that support
+\begin{itemize}
+\item Profiling (notably a \emph{cost centre} word).
+  \note{We could possibly omit the cost centre word from some 
+  administrative objects.}
+\item Parallelism (e.g. GranSim keeps the object's global address here,
+though GUM keeps a separate hash table).
+\item Statistics (e.g. a word to track how many times a thunk is entered.).
  
-\item $@FUN_@p@_@np$: to speed up garbage collection a number of
-specialised forms of @FUN@ are provided, for particular $(p,np)$ pairs,
-where $p$ is the number of pointers and $np$ the number of non-pointers.
+We add a Ticky word to the fixed-header part of closures.  This is
+used to indicate if a closure has been updated but not yet entered. It
+is set when the closure is updated and cleared when subsequently
+entered.
+\footnote{%
+NB: It is \emph{not} an ``entry count'', it is an
+``entries-after-update count.''  The commoning up of @CONST@,
+@CHARLIKE@ and @INTLIKE@ closures is turned off(?) if this is
+required. This has only been done for 2s collection.
+}
+
+\end{itemize}
+\end{itemize}
+
+Most of the RTS is completely insensitive to the number of admin words.
+The total size of the fixed header is @FIXED_HS@.
+
+\subsection{Info Tables}
+
+An \emph{info table} is a contiguous block of memory, \emph{laid out
+backwards}.  That is, the first field in the list that follows
+occupies the highest memory address, and the successive fields occupy
+successive decreasing memory addresses.
  
-\item @FUN_STATIC@.  Top-level, static, function closures (such as
-@f@ above) have a different
-layout than dynamic ones:
  \begin{center}
-\begin{tabular}{|l|l|l|}\hline
-{\em Fixed header}  & {\em Static object link} \\ \hline
+\begin{tabular}{|c|}
+   \hline Parallelism Info 
+\\ \hline Profile Info 
+\\ \hline Debug Info 
+\\ \hline Tag / Static reference table
+\\ \hline Storage manager layout info
+\\ \hline Closure type 
+\\ \hline entry code
+\\       \vdots
  \end{tabular}
  \end{center}
-Static function closurs have no free variables.  (However they may refer to other 
-static closures; these references are recorded in the function closure's SRT.)
-They have one field that is not present in dynamic closures, the {\em static object
-link} field.  This is used by the garbage collector in the same way that to-space
-is, to gather closures that have been determined to be live but that have not yet
-been scavenged.
-\note{Static function closures that have no static references, and hence
-a null SRT pointer, don't need the static object link field.  Is it worth
-taking advantage of this?  See @CONSTR_STATIC_NOCAF@.}
-\end{itemize}
+An info table has the following contents (working backwards in memory
+addresses):
+\begin{itemize}
+\item The \emph{entry code} for the closure.
+This code appears literally as the (large) last entry in the
+info table, immediately preceded by the rest of the info table.
+An \emph{info pointer} always points to the first byte of the entry code.
+
+\item A one-word \emph{closure type field}, @INFO_TYPE@, identifies what kind
+of closure the object is.  The various types of closure are described
+in Section~\ref{sect:closures}.
+In some configurations, some useful properties of 
+closures (is it a HNF?  can it be sparked?)
+are represented as high-order bits so they can be tested quickly.
+
+\item A single pointer or word --- the \emph{storage manager info field},
+@INFO_SM@, contains auxiliary information describing the closure's
+precise layout, for the benefit of the garbage collector and the code
+that stuffs graph into packets for transmission over the network.
+
+\item A one-word \emph{Tag/Static Reference Table} field, @INFO_SRT@.
+For data constructors, this field contains the constructor tag, in the
+range $0..n-1$ where $n$ is the number of constructors.  For all other
+objects it contains a pointer to a table which enables the garbage
+collector to identify all accessible code and CAFs.  They are fully
+described in Section~\ref{sect:srt}.
+
+\item \emph{Profiling info\/}
+
+\ToDo{The profiling info is completely bogus.  I've not deleted it
+from the document but I've commented it all out.}
+
+% change to \iftrue to uncomment this section
+\iffalse
+
+Closure category records are attached to the info table of the
+closure. They are declared with the info table. We put pointers to
+these ClCat things in info tables.  We need these ClCat things because
+they are mutable, whereas info tables are immutable.  Hashing will map
+similar categories to the same hash value allowing statistics to be
+grouped by closure category.
+
+Cost Centres and Closure Categories are hashed to provide indexes
+against which arbitrary information can be stored. These indexes are
+memoised in the appropriate cost centre or category record and
+subsequent hashes avoided by the index routine (it simply returns the
+memoised index).
+
+There are different features which can be hashed allowing information
+to be stored for different groupings. Cost centres have the cost
+centre recorded (using the pointer), module and group. Closure
+categories have the closure description and the type
+description. Records with the same feature will be hashed to the same
+index value.
+
+The initialisation routines, @init_index_<feature>@, allocate a hash
+table in which the cost centre / category records are stored. The
+lower bound for the table size is taken from @max_<feature>_no@. They
+return the actual table size used (the next power of 2). Unused
+locations in the hash table are indicated by a 0 entry. Successive
+@init_index_<feature>@ calls just return the actual table size.
+
+Calls to @index_<feature>@ will insert the cost centre / category
+record in the @<feature>@ hash table, if not already inserted. The hash
+index is memoised in the record and returned. 
+
+CURRENTLY ONLY ONE MEMOISATION SLOT IS AVILABLE IN EACH RECORD SO
+HASHING CAN ONLY BE DONE ON ONE FEATURE FOR EACH RECORD. This can be
+easily relaxed at the expense of extra memoisation space or continued
+rehashing.
+
+The initialisation routines must be called before initialisation of
+the stacks and heap as they require to allocate storage. It is also
+expected that the caller may want to allocate additional storage in
+which to store profiling information based on the return table size
+value(s).
+
+\begin{center}
+\begin{tabular}{|l|}
+   \hline Hash Index
+\\ \hline Selected
+\\ \hline Kind
+\\ \hline Description String
+\\ \hline Type String
+\\ \hline
+\end{tabular}
+\end{center}
+
+\begin{description}
+\item[Hash Index] Memoised copy
+\item[Selected] 
+  Is this category selected (-1 == not memoised, selected? 0 or 1)
+\item[Kind]
+One of the following values (defined in CostCentre.lh):
+
+\begin{description}
+\item[@CON_K@]
+A constructor.
+\item[@FN_K@]
+A literal function.
+\item[@PAP_K@]
+A partial application.
+\item[@THK_K@]
+A thunk, or suspension.
+\item[@BH_K@]
+A black hole.
+\item[@ARR_K@]
+An array.
+\item[@ForeignObj_K@]
+A Foreign object (non-Haskell heap resident).
+\item[@SPT_K@]
+The Stable Pointer table.  (There should only be one of these but it
+represents a form of weak space leak since it can't shrink to meet
+non-demand so it may be worth watching separately? ADR)
+\item[@INTERNAL_KIND@]
+Something internal to the runtime system.
+\end{description}
+
+
+\item[Description] Source derived string detailing closure description.
+\item[Type] Source derived string detailing closure type.
+\end{description}
+
+\fi % end of commented out stuff
+
+\item \emph{Parallelism info\/}
+\ToDo{}
+
+\item \emph{Debugging info\/}
+\ToDo{}
+
+\end{itemize}
+
+
+%-----------------------------------------------------------------------------
+\subsection{Kinds of Heap Object}
+\label{sect:closures}
+
+Heap objects can be classified in several ways, but one useful one is
+this:
+\begin{itemize}
+\item 
+\emph{Static closures} occupy fixed, statically-allocated memory
+locations, with globally known addresses.
+
+\item 
+\emph{Dynamic closures} are individually allocated in the heap.
+
+\item 
+\emph{Stack closures} are closures allocated within a thread's stack
+(which is itself a heap object).  Unlike other closures, there are
+never any pointers to stack closures.  Stack closures are discussed in
+Section~\ref{sect:stacks}.
+
+\end{itemize}
+A second useful classification is this:
+\begin{itemize}
+\item 
+\emph{Executive objects}, such as thunks and data constructors,
+participate directly in a program's execution.  They can be subdivided into
+three kinds of objects according to their type:
+\begin{itemize}
+\item 
+\emph{Pointed objects}, represent values of a \emph{pointed} type
+(<.pointed types launchbury.>) --i.e.~a type that includes $\bottom$ such as @Int@ or @Int# -> Int#@.
+
+\item \emph{Unpointed objects}, represent values of a \emph{unpointed} type --i.e.~a type that does not include $\bottom$ such as @Int#@ or @Array#@.
+
+\item \emph{Activation frames}, represent ``continuations''.  They are
+always stored on the stack and are never pointed to by heap objects or
+passed as arguments.  \note{It's not clear if this will still be true
+once we support speculative evaluation.}
+
+\end{itemize}
+
+\item \emph{Administrative objects}, such as stack objects and thread
+state objects, do not represent values in the original program.
+\end{itemize}
+
+Only pointed objects can be entered.  All pointed objects share a
+common header format: the ``pointed header''; while all unpointed
+objects share a common header format: the ``unpointed header''.
+\ToDo{Describe the difference and update the diagrams to mention
+an appropriate header type.}
+
+This section enumerates all the kinds of heap objects in the system.
+Each is identified by a distinct @INFO_TYPE@ tag in its info table.
+
+\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|}
+\hline
+
+closure kind          & Section \\
+                     
+\hline                          
+\emph{Pointed} \\      
+\hline                       
+                     
+@CONSTR@              & \ref{sect:CONSTR}    \\
+@CONSTR_STATIC@       & \ref{sect:CONSTR}    \\
+@CONSTR_STATIC_NOCAF@ & \ref{sect:CONSTR}    \\
+                     
+@FUN@                 & \ref{sect:FUN}       \\
+@FUN_STATIC@          & \ref{sect:FUN}       \\
+                     
+@THUNK@               & \ref{sect:THUNK}     \\
+@THUNK_STATIC@        & \ref{sect:THUNK}     \\
+@THUNK_SELECTOR@      & \ref{sect:THUNK_SEL} \\
+                     
+@BCO@                & \ref{sect:BCO}       \\
+@BCO_CAF@            & \ref{sect:BCO}       \\
+                     
+@AP@                 & \ref{sect:AP}            \\
+@PAP@                 & \ref{sect:PAP}       \\
+                     
+@IND@                 & \ref{sect:IND}       \\
+@IND_OLDGEN@          & \ref{sect:IND}       \\
+@IND_PERM@            & \ref{sect:IND}       \\
+@IND_OLDGEN_PERM@     & \ref{sect:IND}       \\
+@IND_STATIC@          & \ref{sect:IND}       \\
+                     
+\hline               
+\emph{Unpointed} \\    
+\hline               
+                                     
+@ARR_WORDS@           & \ref{sect:ARR_WORDS1},\ref{sect:ARR_WORDS2} \\
+@ARR_PTRS@            & \ref{sect:ARR_PTRS}  \\
+@MUTVAR@              & \ref{sect:MUTVAR}    \\
+@MUTARR_PTRS@         & \ref{sect:MUTARR_PTRS} \\
+@MUTARR_PTRS_FROZEN@  & \ref{sect:MUTARR_PTRS_FROZEN} \\
+                     
+@FOREIGN@             & \ref{sect:FOREIGN}   \\
+                     
+@BH@                  & \ref{sect:BH}        \\
+@MVAR@                       & \ref{sect:MVAR}      \\
+@IVAR@                       & \ref{sect:IVAR}      \\
+@FETCHME@             & \ref{sect:FETCHME}   \\
+\hline
+\end{tabular}
+
+Activation frames do not live (directly) on the heap --- but they have
+a similar organisation.
+
+\begin{tabular}{|l|l|}\hline
+closure kind           & Section                       \\ \hline
+@RET_SMALL@            & \ref{sect:activation-records} \\
+@RET_VEC_SMALL@        & \ref{sect:activation-records} \\
+@RET_BIG@              & \ref{sect:activation-records} \\
+@RET_VEC_BIG@          & \ref{sect:activation-records} \\
+@UPDATE_FRAME@                 & \ref{sect:activation-records} \\
+\hline
+\end{tabular}
+
+There are also a number of administrative objects.
+
+\begin{tabular}{|l|l|}\hline
+closure kind           & Section                       \\ \hline
+@TSO@                   & \ref{sect:TSO}               \\
+@STACK_OBJECT@          & \ref{sect:STACK_OBJECT}      \\
+@STABLEPTR_TABLE@       & \ref{sect:STABLEPTR_TABLE}   \\
+@SPARK_OBJECT@          & \ref{sect:SPARK}             \\
+@BLOCKED_FETCH@        & \ref{sect:BLOCKED_FETCH}      \\
+\hline
+\end{tabular}
+
+\ToDo{I guess the parallel system has something like a stable ptr
+table.  Is there any opportunity for sharing code/data structures
+here?}
+
+
+\subsection{Predicates}
+
+\ToDo{The following is a first attempt at defining a useful set of
+predicates.  Some (such as @isWHNF@ and @isSparkable@) may need their
+definitions tweaked a little.}
+
+The runtime system sometimes needs to be able to distinguish objects
+according to their properties: is the object updateable? is it in weak
+head normal form? etc.  These questions can be answered by examining
+the @INFO_TYPE@ field of the object's info table.  
+
+We define the following predicates to detect families of related
+info types.  They are mutually exclusive and exhaustive.
+
+\begin{itemize}
+\item @isCONSTR@ is true for @CONSTR@s.
+\item @isFUN@ is true for @FUN@s.
+\item @isTHUNK@ is true for @THUNK@s.
+\item @isBCO@ is true for @BCO@s.
+\item @isAP@ is true for @AP@s.
+\item @isPAP@ is true for @PAP@s.
+\item @isINDIRECTION@ is true for indirection objects. 
+\item @isBH@ is true for black holes.
+\item @isFOREIGN_OBJECT@ is true for foreign objects.
+\item @isARRAY@ is true for array objects.
+\item @isMVAR@ is true for @MVAR@s.
+\item @isIVAR@ is true for @IVAR@s.
+\item @isFETCHME@ is true for @FETCHME@s.
+\item @isSLOP@ is true for slop objects.
+\item @isRET_ADDR@ is true for return addresses.
+\item @isUPD_ADDR@ is true for update frames.
+\item @isTSO@ is true for @TSO@s.
+\item @isSTABLE_PTR_TABLE@ is true for the stable pointer table.
+\item @isSPARK_OBJECT@ is true for spark objects.
+\item @isBLOCKED_FETCH@ is true for blocked fetch objects.
+\item @isINVALID_INFOTYPE@ is true for all other info types.
+
+\end{itemize}
+
+The following predicates detect other interesting properties:
+
+\begin{itemize}
+
+\item @isPOINTED@ is true if an object has a pointed type.
+
+If an object is pointed, the following predicates may be true
+(otherwise they are false).  @isWHNF@ and @isUPDATEABLE@ are
+mutually exclusive.
+
+\begin{itemize} 
+\item @isWHNF@ is true if the object is in Weak Head Normal Form.  
+Note that unpointed objects are (arbitrarily) not considered to be in WHNF.
+
+@isWHNF@ is true for @PAP@s, @CONSTR@s, @FUN@s and all @BCO@s.
+
+\ToDo{Need to distinguish between whnf BCOs and non-whnf BCOs in their
+@INFO_TYPE@}
+
+\item @isUPDATEABLE@ is true if the object may be overwritten with an
+ indirection object.
+
+@isUPDATEABLE@ is true for @THUNK@s, @AP@s and @BH@s.
+
+\end{itemize}
+
+It is possible for a pointed object to be neither updatable nor in
+WHNF.  For example, indirections.
+
+\item @isUNPOINTED@ is true if an object has an unpointed type.
+All such objects are boxed since only boxed objects have info pointers.
+
+It is true for @ARR_WORDS@, @ARR_PTRS@, @MUTVAR@, @MUTARR_PTRS@,
+@MUTARR_PTRS_FROZEN@, @FOREIGN@ objects, @MVAR@s and @IVAR@s.
+
+\item @isACTIVATION_FRAME@ is true for activation frames of all sorts.
+
+It is true for return addresses and update frames.
+\begin{itemize}
+\item @isVECTORED_RETADDR@ is true for vectored return addresses.
+\item @isDIRECT_RETADDR@ is true for direct return addresses.
+\end{itemize}
+
+\item @isADMINISTRATIVE@ is true for administrative objects:
+@TSO@s, the stable pointer table, spark objects and blocked fetches.
+
+\end{itemize}
+
+\begin{itemize}
+
+\item @isSTATIC@ is true for any statically allocated closure.
+
+\item @isMUTABLE@ is true for objects with mutable pointer fields:
+  @MUT_ARR@s, @MUTVAR@s, @MVAR@s and @IVAR@s.
+
+\item @isSparkable@ is true if the object can (and should) be sparked.
+It is true of updateable objects which are not in WHNF with the
+exception of @THUNK_SELECTOR@s and black holes.
+
+\end{itemize}
+
+As a minor optimisation, we might use the top bits of the @INFO_TYPE@
+field to ``cache'' the answers to some of these predicates.
+
+An indirection either points to HNF (post update); or is result of
+overwriting a FetchMe, in which case the thing fetched is either
+under evaluation (BH), or by now an HNF.  Thus, indirections get NoSpark flag.
+
+
+\iffalse
+@
+#define _NF                    0x0001  /* Normal form  */
+#define _NS                    0x0002  /* Don't spark  */
+#define _ST                    0x0004  /* Is static    */
+#define _MU                    0x0008  /* Is mutable   */
+#define _UP                    0x0010  /* Is updatable (but not mutable) */
+#define _BM                    0x0020  /* Is a "rimitive" array */
+#define _BH                    0x0040  /* Is a black hole */
+#define _IN                    0x0080  /* Is an indirection */
+#define _TH                    0x0100  /* Is a thunk */
+
+
+
+SPEC   
+SPEC_N         SPEC | _NF | _NS
+SPEC_S         SPEC | _TH
+SPEC_U         SPEC | _UP | _TH
+               
+GEN    
+GEN_N          GEN | _NF | _NS
+GEN_S          GEN | _TH
+GEN_U          GEN | _UP | _TH
+               
+DYN            _NF | _NS
+TUPLE          _NF | _NS | _BM
+DATA           _NF | _NS | _BM
+MUTUPLE                _NF | _NS | _MU | _BM
+IMMUTUPLE      _NF | _NS | _BM
+STATIC         _NS | _ST
+CONST          _NF | _NS
+CHARLIKE       _NF | _NS
+INTLIKE                _NF | _NS
+
+BH             _NS | _BH
+BH_N           BH
+BH_U           BH | _UP
+               
+BQ             _NS | _MU | _BH
+IND            _NS | _IN
+CAF            _NF | _NS | _ST | _IN
+
+FM             
+FETCHME                FM | _MU
+FMBQ           FM | _MU | _BH
+
+TSO            _MU
+
+STKO   
+STKO_DYNAMIC   STKO | _MU
+STKO_STATIC    STKO | _ST
+               
+SPEC_RBH       _NS | _MU | _BH
+GEN_RBH                _NS | _MU | _BH
+BF             _NS | _MU | _BH
+INTERNAL       
+
+@
+\fi
+
+
+\subsection{Closures (aka Pointed Objects)}
+
+An object can be entered iff it is a closure.
+
+\subsubsection{Function closures}\label{sect:FUN}
+
+Function closures represent lambda abstractions.  For example,
+consider the top-level declaration:
+@
+  f = \x -> let g = \y -> x+y
+           in g x
+@
+Both @f@ and @g@ are represented by function closures.  The closure
+for @f@ is \emph{static} while that for @g@ is \emph{dynamic}.
+
+The layout of a function closure is as follows:
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\emph{Fixed header}  & \emph{Pointers} & \emph{Non-pointers} \\ \hline
+\end{tabular}
+\end{center}
+The data words (pointers and non-pointers) are the free variables of
+the function closure.  
+The number of pointers
+and number of non-pointers are stored in the @INFO_SM@ word, in the least significant
+and most significant half-word respectively.
+
+There are several different sorts of function closure, distinguished
+by their @INFO_TYPE@ field:
+\begin{itemize}
+\item @FUN@: a vanilla, dynamically allocated on the heap. 
+
+\item $@FUN_@p@_@np$: to speed up garbage collection a number of
+specialised forms of @FUN@ are provided, for particular $(p,np)$ pairs,
+where $p$ is the number of pointers and $np$ the number of non-pointers.
+
+\item @FUN_STATIC@.  Top-level, static, function closures (such as
+@f@ above) have a different
+layout than dynamic ones:
+\begin{center}
+\begin{tabular}{|l|l|l|}\hline
+\emph{Fixed header}  & \emph{Static object link} \\ \hline
+\end{tabular}
+\end{center}
+Static function closures have no free variables.  (However they may refer to other 
+static closures; these references are recorded in the function closure's SRT.)
+They have one field that is not present in dynamic closures, the \emph{static object
+link} field.  This is used by the garbage collector in the same way that to-space
+is, to gather closures that have been determined to be live but that have not yet
+been scavenged.
+\note{Static function closures that have no static references, and hence
+a null SRT pointer, don't need the static object link field.  Is it worth
+taking advantage of this?  See @CONSTR_STATIC_NOCAF@.}
+\end{itemize}
+
+Each lambda abstraction, $f$, in the STG program has its own private info table.
+The following labels are relevant:
+\begin{itemize}
+\item $f$@_info@  is $f$'s info table.
+\item $f$@_entry@ is $f$'s slow entry point (i.e. the entry code of its
+info table; so it will label the same byte as $f$@_info@).
+\item $f@_fast_@k$ is $f$'s fast entry point.  $k$ is the number of arguments
+$f$ takes; encoding this number in the fast-entry label occasionally catches some nasty
+code-generation errors.
+\end{itemize}
+
+\subsubsection{Data Constructors}\label{sect:CONSTR}
+
+Data-constructor closures represent values constructed with
+algebraic data type constructors.
+The general layout of data constructors is the same as that for function
+closures.  That is
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\emph{Fixed header}  & \emph{Pointers} & \emph{Non-pointers} \\ \hline
+\end{tabular}
+\end{center}
+
+The SRT pointer in a data constructor's info table is used for the
+constructor tag, since a constructor never has any static references.
+
+There are several different sorts of constructor:
+\begin{itemize}
+\item @CONSTR@: a vanilla, dynamically allocated constructor.
+\item @CONSTR_@$p$@_@$np$: just like $@FUN_@p@_@np$.
+\item @CONSTR_INTLIKE@.
+A dynamically-allocated heap object that looks just like an @Int@.  The 
+garbage collector checks to see if it can common it up with one of a fixed
+set of static int-like closures, thus getting it out of the dynamic heap
+altogether.
+
+\item @CONSTR_CHARLIKE@:  same deal, but for @Char@.
+
+\item @CONSTR_STATIC@ is similar to @FUN_STATIC@, with the complication that
+the layout of the constructor must mimic that of a dynamic constructor,
+because a static constructor might be returned to some code that unpacks it.
+So its layout is like this:
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|}\hline
+\emph{Fixed header}  & \emph{Pointers} & \emph{Non-pointers} & \emph{Static object link}\\ \hline
+\end{tabular}
+\end{center}
+The static object link, at the end of the closure, serves the same purpose
+as that for @FUN_STATIC@.  The pointers in the static constructor can point
+only to other static closures.
+
+The static object link occurs last in the closure so that static
+constructors can store their data fields in exactly the same place as
+dynamic constructors.
+
+\item @CONSTR_STATIC_NOCAF@.  A statically allocated data constructor
+that guarantees not to point (directly or indirectly) to any CAF
+(section~\ref{sect:CAF}).  This means it does not need a static object
+link field.  Since we expect that there might be quite a lot of static
+constructors this optimisation makes sense.  Furthermore, the @NOCAF@
+tag allows the compiler to indicate that no CAFs can be reached
+anywhere \emph{even indirectly}.
+
+
+\end{itemize}
+
+For each data constructor $Con$, two
+info tables are generated:
+\begin{itemize}
+\item $Con$@_info@ labels $Con$'s dynamic info table, 
+shared by all dynamic instances of the constructor.
+\item $Con$@_static@ labels $Con$'s static info table, 
+shared by all static instances of the constructor.
+\end{itemize}
+
+\subsubsection{Thunks}
+\label{sect:THUNK}
+\label{sect:THUNK_SEL}
+
+A thunk represents an expression that is not obviously in head normal 
+form.  For example, consider the following top-level definitions:
+@
+  range = between 1 10
+  f = \x -> let ys = take x range
+           in sum ys
+@
+Here the right-hand sides of @range@ and @ys@ are both thunks; the former
+is static while the latter is dynamic.
+
+The layout of a thunk is the same as that for a function closure.
+However, thunks must have a payload of at least @MIN_UPD_PAYLOAD@ words
+to allow it to be overwritten with a black hole and an indirection.
+The compiler may have to add extra non-pointer fields to satisfy this constraint.
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|}\hline
+\emph{Fixed header}  & \emph{Pointers} & \emph{Non-pointers} \\ \hline
+\end{tabular}
+\end{center}
+The @INFO_SM@ word contains the same information as for function
+closures; that is, number of pointers and number of non-pointers.
+
+A thunk differs from a function closure in that it can be updated.
+
+There are several forms of thunk:
+\begin{itemize}
+\item @THUNK@ and $@THUNK_@p@_@np$: vanilla, dynamically allocated thunks.
+Dynamic thunks are overwritten with normal indirections.
+
+\item @THUNK_STATIC@.  A static thunk is also known as 
+a \emph{constant applicative form}, or \emph{CAF}.
+Static thunks are overwritten with static indirections.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|}\hline
+\emph{Fixed header}  & \emph{Pointers} & \emph{Non-pointers} \emph{Static object link}\\ \hline
+\end{tabular}
+\end{center}
+
+\item @THUNK_SELECTOR@ is a (dynamically allocated) thunk
+whose entry code performs a simple selection operation from
+a data constructor drawn from a single-constructor type.  For example,
+the thunk
+@
+       x = case y of (a,b) -> a
+@
+is a selector thunk.  A selector thunk is laid out like this:
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\emph{Fixed header}  & \emph{Selectee pointer} \\ \hline
+\end{tabular}
+\end{center}
+The @INFO_SM@ word contains the byte offset of the desired word in
+the selectee.  Note that this is different from all other thunks.
+
+The garbage collector ``peeks'' at the selectee's
+tag (in its info table).  If it is evaluated, then it goes ahead and do
+the selection, and then behaves just as if the selector thunk was an
+indirection to the selected field.
+If it is not
+evaluated, it treats the selector thunk like any other thunk of that
+shape.  [Implementation notes.  
+Copying: only the evacuate routine needs to be special.
+Compacting: only the PRStart (marking) routine needs to be special.]
+\end{itemize}
+
+
+The only label associated with a thunk is its info table:
+\begin{description}
+\item[$f$@_info@] is $f$'s info table.
+\end{description}
+
+
+\subsubsection{Byte-Code Objects}
+\label{sect:BCO}
+
+A Byte-Code Object (BCO) is a container for a a chunk of byte-code,
+which can be executed by Hugs.  The byte-code represents a
+supercombinator in the program: when Hugs compiles a module, it
+performs lambda lifting and each resulting supercombinator becomes a
+byte-code object in the heap.
+
+BCOs are not updateable; the bytecode compiler represents updatable thunks
+using a combination of @AP@s and @BCO@s.
+
+The semantics of BCOs are described in Section
+\ref{sect:hugs-heap-objects}.  A BCO has the following structure:
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|l|}
+\hline 
+\emph{Fixed Header} & \emph{Layout} & \emph{Offset} & \emph{Size} &
+\emph{Literals} & \emph{Byte code} \\
+\hline
+\end{tabular}
+\end{center}
+
+\noindent where:
+\begin{itemize}
+\item The entry code is a static code fragment/info table that
+returns to the scheduler to invoke Hugs (Section
+\ref{sect:ghc-to-hugs-switch}).
+\item \emph{Layout} contains the number of pointer literals in the
+\emph{Literals} field.
+\item \emph{Offset} is the offset to the byte code from the start of
+the object.
+\item \emph{Size} is the number of words of byte code in the object.
+\item \emph{Literals} contains any pointer and non-pointer literals used in
+the byte-codes (including jump addresses), pointers first.
+\item \emph{Byte code} contains \emph{Size} words of non-pointer byte
+code.
+\end{itemize}
+
+
+\subsubsection{Partial applications (PAPs)}\label{sect:PAP}
+
+\ToDo{PAPs don't contains update frames or activation frames.  When we
+add revertible black holes, we'll introduce a new kind of object which
+can contain activation frames.}
+
+A partial application (PAP) represents a function applied to too few arguments.
+It is only built as a result of updating after an argument-satisfaction
+check failure.  A PAP has the following shape:
+\begin{center}
+\begin{tabular}{|l|l|l|l|}\hline
+\emph{Fixed header}  & \emph{No of arg words} & \emph{Function closure} & \emph{Arg stack} \\ \hline
+\end{tabular}
+\end{center}
+The ``arg stack'' is a copy of the chunk of stack above the update
+frame; ``no of arg words'' tells how many words it consists of.  The
+function closure is (a pointer to) the closure for the function whose
+argument-satisfaction check failed.
+
+There is just one standard form of PAP with @INFO_TYPE@ = @PAP@.
+There is just one info table too, called @PAP_info@.
+Its entry code simply copies the arg stack chunk back on top of the
+stack and enters the function closure.  (It has to do a stack overflow test first.)
+
+PAPs are also used to implement Hugs functions (where the arguments
+are free variables).  PAPs generated by Hugs can be static so we need
+both @PAP@ and @PAP_STATIC@.
+
+\subsubsection{@AP@ objects}
+\label{sect:AP}
+
+@AP@ objects are used to represent thunks built by Hugs.  The only distintion between
+an @AP@ and a @PAP@ is that an @AP@ is updateable.
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}
+\hline
+\emph{Fixed Header} & \emph{No of arg words} & \emph{Function closure} & \emph{Arg stack} \\
+\hline
+\end{tabular}
+\end{center}
+
+The entry code pushes an update frame, copies the arg stack chunk on
+top of the stack, and enters the function closure.  (It has to do a
+stack overflow test first.)
+
+The ``arg stack'' is a block of (tagged) arguments representing the
+free variables of the thunk; ``no of arg words'' tells how many words
+it consists of.  The function closure is (a pointer to) the closure
+for the thunk.  The argument stack may be empty if the thunk has no
+free variables.
+
+\note{Since @AP@s are updateable, the @MIN_UPD_PAYLOAD@ constraint
+applies here too.}
+
+\subsubsection{Indirections}
+\label{sect:IND}
+
+Indirection closures just point to other closures. They are introduced
+when a thunk is updated to point to its value. 
+The entry code for all indirections simply enters the closure it points to.
+
+There are several forms of indirection:
+\begin{description}
+\item[@IND@] is the vanilla, dynamically-allocated indirection.
+It is removed by the garbage collector. It has the following
+shape:
+\begin{center}
+\begin{tabular}{|l|l|l|}\hline
+\emph{Fixed header} & \emph{Mutable link field} & \emph{Target closure} \\ \hline
+\end{tabular}
+\end{center}
+It contains a \emph{mutable link field} that is used to string together
+indirections in each generation.
+
+
+\item[@IND_PERMANENT@]
+for lexical profiling, it is necessary to maintain cost centre
+information in an indirection, so ``permanent indirections'' are
+retained forever.  Otherwise they are just like vanilla indirections.
+\note{If a permanent indirection points to another permanent
+indirection or a @CONST@ closure, it is possible to elide the indirection
+since it will have no effect on the profiler.}
+
+\note{Do we still need @IND@ in the profiling build, or do we just
+need @IND@ but its behaviour changes when profiling is on?}
+
+\item[@IND_STATIC@] is used for overwriting CAFs when they have been
+evaluated.  Static indirections are not removed by the garbage
+collector; and are statically allocated outside the heap (and should
+stay there).  Their static object link field is used just as for
+@FUN_STATIC@ closures.
+
+\begin{center}
+\begin{tabular}{|l|l|l|}
+\hline
+\emph{Fixed header} & \emph{Target closure} & \emph{Static object link} \\
+\hline
+\end{tabular}
+\end{center}
+
+\end{description}
+
+\subsubsection{Black holes and Blocking Queues}
+\label{sect:BH}
+
+Black hole closures are used to overwrite closures currently being
+evaluated. They inform the garbage collector that there are no live
+roots in the closure, thus removing a potential space leak.  
+
+Black holes also become synchronization points in the threaded world.
+They contain a pointer to a list of blocked threads to be awakened
+when the black hole is updated (or @NULL@ if the list is empty).
+\begin{center}
+\begin{tabular}{|l|l|l|}
+\hline 
+\emph{Fixed header} & \emph{Mutable link} & \emph{Blocked thread link} \\
+\hline
+\end{tabular}
+\end{center}
+The \emph{Blocked thread link} points to the TSO of the first thread
+waiting for the value of this thunk.  All subsequent TSOs in the list
+are linked together using their @TSO_LINK@ field.
+
+When the blocking queue is non-@NULL@ and the @BH@ is in the old
+generation, the black hole must be added to the mutables list since
+the TSOs on the list may contain pointers into the new generation.
+There is no need to clutter up the mutables list with black holes with
+empty blocking queues.
+
+\note{In a single-threaded system, entering a black hole indicates an
+infinite loop.  In a concurrent system, entering a black hole
+indicates an infinite loop only if the hole is being entered by the
+same thread that originally entered the closure.}
+
+
+\subsubsection{FetchMes}\label{sect:FETCHME}
+
+In the parallel systems, FetchMes are used to represent pointers into
+the global heap.  When evaluated, the value they point to is read from
+the global heap.
+
+\ToDo{Describe layout}
+
+Because there may be offsets into these arrays, a primitive array
+cannot be handled as a FetchMe in the parallel system, but must be
+shipped in its entirety if its parent closure is shipped.
+
+
+
+\subsection{Unpointed Objects}
+
+A variable of unpointed type is always bound to a \emph{value}, never
+to a \emph{thunk}.  For this reason, unpointed objects cannot be
+entered.
+
+\subsubsection{Immutable Objects}
+\label{sect:ARR_WORDS1}
+\label{sect:ARR_PTRS}
+
+\begin{description}
+\item[@ARR_WORDS@] is a variable-sized object consisting solely of
+non-pointers.  It is used for arrays of all
+sorts of things (bytes, words, floats, doubles... it doesn't matter).
+\begin{center}
+\begin{tabular}{|c|c|c|c|}
+\hline
+\emph{Fixed Hdr} & \emph{No of non-pointers} & \emph{Non-pointers\ldots}       \\ \hline
+\end{tabular}
+\end{center}
+
+\item[@ARR_PTRS@] is an immutable, variable sized array of pointers.
+\begin{center}
+\begin{tabular}{|c|c|c|c|}
+\hline
+\emph{Fixed Hdr} & \emph{Mutable link} & \emph{No of pointers} & \emph{Pointers\ldots} \\ \hline
+\end{tabular}
+\end{center}
+The mutable link is present so that we can easily freeze and thaw an
+array (by changing the header and adding/removing the array to the
+mutables list).
+
+\end{description}
+
+\subsubsection{Mutable Objects}
+\label{sect:mutables}
+\label{sect:ARR_WORDS2}
+\label{sect:MUTVAR}
+\label{sect:MUTARR_PTRS}
+\label{sect:MUTARR_PTRS_FROZEN}
+
+Some of these objects are \emph{mutable}; they represent objects which
+are explicitly mutated by Haskell code through the @ST@ monad.
+They're not used for thunks which are updated precisely once.
+Depending on the garbage collector, mutable closures may contain extra
+header information which allows a generational collector to implement
+the ``write barrier.''
+
+\begin{description}
+
+\item[@ARR_WORDS@] is also used to represent \emph{mutable} arrays of
+bytes, words, floats, doubles, etc.  It's possible to use the same
+object type because even generational collectors don't need to
+distinguish them.
+
+\item[@MUTVAR@] is a mutable variable.
+\begin{center}
+\begin{tabular}{|c|c|c|}
+\hline
+\emph{Fixed Hdr} & \emph{Mutable link} & \emph{Pointer} \\ \hline
+\end{tabular}
+\end{center}
+
+\item[@MUTARR_PTRS@] is a mutable array of pointers.
+Such an array may be \emph{frozen}, becoming an @ARR_PTRS@, with a
+different info-table.
+\begin{center}
+\begin{tabular}{|c|c|c|c|}
+\hline
+\emph{Fixed Hdr} & \emph{Mutable link} & \emph{No of ptrs} & \emph{Pointers\ldots} \\ \hline
+\end{tabular}
+\end{center}
+
+\end{description}
+
+
+\subsubsection{Foreign Objects}\label{sect:FOREIGN}
+
+Here's what a ForeignObj looks like:
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|}
+\hline 
+\emph{Fixed header} & \emph{Data} & \emph{Free Routine} & \emph{Foreign object link} \\
+\hline
+\end{tabular}
+\end{center}
+
+The @FreeRoutine@ is a reference to the finalisation routine to call
+when the @ForeignObj@ becomes garbage.  If @freeForeignObject@ is
+called on a Foreign Object, the @FreeRoutine@ is set to zero and the
+garbage collector will not attempt to call @FreeRoutine@ when the 
+object becomes garbage.
+
+The Foreign object link is a link to the next foreign object in the
+list.  This list is traversed at the end of garbage collection: if an
+object is about to be deallocated (e.g.~it was not marked or
+evacuated), the free routine is called and the object is deleted from
+the list.  
+
+\subsubsection{MVars and IVars}
+\label{sect:MVAR}
+\label{sect:IVAR}
+
+\ToDo{MVars and IVars}
+
+
+
+The remaining objects types are all administrative --- none of them may be entered.
+
+\subsection{Other weird objects}
+\label{sect:SPARK}
+\label{sect:BLOCKED_FETCH}
+
+\begin{description}
+\item[@BlockedFetch@ heap objects (`closures')] (parallel only)
+
+@BlockedFetch@s are inbound fetch messages blocked on local closures.
+They arise as entries in a local blocking queue when a fetch has been
+received for a local black hole.  When awakened, we look at their
+contents to figure out where to send a resume.
+
+A @BlockedFetch@ closure has the form:
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|l|}\hline
+\emph{Fixed header} & link & node & gtid & slot & weight \\ \hline
+\end{tabular}
+\end{center}
+
+\item[Spark Closures] (parallel only)
+
+Spark closures are used to link together all closures in the spark pool.  When
+the current processor is idle, it may choose to speculatively evaluate some of
+the closures in the pool.  It may also choose to delete sparks from the pool.
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|l|}\hline
+\emph{Fixed header} & \emph{Spark pool link} & \emph{Sparked closure} \\ \hline
+\end{tabular}
+\end{center}
+
+\item[Slop Objects]\label{sect:slop-objects}
+
+Slop objects are used to overwrite the end of an updatee if it is
+larger than an indirection.  Normal slop objects consist of an info
+pointer a size word and a number of slop words.  
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|l|}\hline
+\emph{Info Pointer} & \emph{Size} & \emph{Slop Words} \\ \hline
+\end{tabular}
+\end{center}
+
+This is too large for single word slop objects which consist of a
+single info table.
+
+Note that slop objects only contain an info pointer, not a standard
+fixed header.  This doesn't cause problems because slop objects are
+always unreachable --- they can only be accessed by linearly scanning
+the heap.
+
+\end{description}
+
+\subsection{Thread State Objects (TSOs)}\label{sect:TSO}
+
+\ToDo{This is very out of date.  We now embed a single stack object
+within the TSO.  TSOs include an ID number which can be used to
+generate a hash value.  The gransim, profiling and ticky info is
+surely bogus.}
+
+In the multi-threaded system, the state of a suspended thread is
+packed up into a Thread State Object (TSO) which contains all the
+information needed to restart the thread and for the garbage collector
+to find all reachable objects.  When a thread is running, it may be
+``unpacked'' into machine registers and various other memory locations
+to provide faster access.
+
+Single-threaded systems don't really \emph{need\/} TSOs --- but they do
+need some way to tell the storage manager about live roots so it is
+convenient to use a single TSO to store the mutator state even in
+single-threaded systems.
+
+Rather than manage TSOs' alloc/dealloc, etc., in some \emph{ad hoc}
+way, we instead alloc/dealloc/etc them in the heap; then we can use
+all the standard garbage-collection/fetching/flushing/etc machinery on
+them.  So that's why TSOs are ``heap objects,'' albeit very special
+ones.
+\begin{center}
+\begin{tabular}{|l|l|}
+   \hline \emph{Fixed header}
+\\ \hline @TSO_LINK@
+\\ \hline @TSO_STATE@
+\\ \hline \emph{Exception Handlers}
+\\ \hline \emph{Ticky Info}
+\\ \hline \emph{Profiling Info}
+\\ \hline \emph{Parallel Info}
+\\ \hline \emph{GranSim Info}
+\\ \hline 
+\\
+          \emph{Stack}
+\\
+\\ \hline 
+\end{tabular}
+\end{center}
+The contents of a TSO are:
+\begin{itemize}
+
+\item A pointer (@TSO_LINK@) used to maintain a list of threads with a similar
+  state (e.g.~all runnable, all sleeping, all blocked on the same black
+  hole, all blocked on the same MVar, etc.)
+
+\item A word (@TSO_STATE@) which records the current state of a thread: running, runnable, blocked, etc.
+
+\item Optional information for ``Ticky Ticky'' statistics: @TSO_STK_HWM@ is
+  the maximum number of words allocated to this thread.
+
+\item Optional information for profiling: 
+  @TSO_CCC@ is the current cost centre.
+
+\item Optional information for parallel execution:
+
+% \begin{itemize}
+% 
+% \item The types of threads (@TSO_TYPE@):
+% \begin{description}
+% \item[@T_MAIN@]     Must be executed locally.
+% \item[@T_REQUIRED@] A required thread  -- may be exported.
+% \item[@T_ADVISORY@] An advisory thread -- may be exported.
+% \item[@T_FAIL@]     A failure thread   -- may be exported.
+% \end{description}
+% 
+% \item I've no idea what else
+% 
+% \end{itemize}
+% 
+% \item Optional information for GranSim execution:
+% \begin{itemize}
+% \item locked         
+% \item sparkname       
+% \item started at      
+% \item exported        
+% \item basic blocks    
+% \item allocs  
+% \item exectime        
+% \item fetchtime       
+% \item fetchcount      
+% \item blocktime       
+% \item blockcount      
+% \item global sparks   
+% \item local sparks    
+% \item queue           
+% \item priority        
+% \item clock          (gransim light only)
+% \end{itemize}
+% 
+% 
+% Here are the various queues for GrAnSim-type events.
+% 
+% Q_RUNNING   
+% Q_RUNNABLE  
+% Q_BLOCKED   
+% Q_FETCHING  
+% Q_MIGRATING 
+% 
+
+\end{itemize}
+
+\subsection{Stack Objects}
+\label{sect:STACK_OBJECT}
+\label{sect:stacks}
+
+\ToDo{Merge this in with the section on TSOs}
+
+These are ``stack objects,'' which are used in the threaded world as
+the stack for each thread is allocated from the heap in smallish
+chunks.  (The stack in the sequential world is allocated outside of
+the heap.)
+
+Each reduction thread has to have its own stack space.  As there may
+be many such threads, and as any given one may need quite a big stack,
+a naive give-'em-a-big-stack-and-let-'em-run approach will cost a {\em
+lot} of memory.
+
+Our approach is to give a thread a small stack space, and then link
+on/off extra ``chunks'' as the need arises.  Again, this is a
+storage-management problem, and, yet again, we choose to graft the
+whole business onto the existing heap-management machinery.  So stack
+objects will live in the heap, be garbage collected, etc., etc..
+
+A stack object is laid out like this:
+
+\begin{center}
+\begin{tabular}{|l|}
+\hline
+\emph{Fixed header} 
+\\ \hline
+\emph{Link to next stack object (0 for last)}
+\\ \hline
+\emph{N, the payload size in words}
+\\ \hline
+\emph{@Sp@ (byte offset from head of object)}
+\\ \hline
+\emph{@Su@ (byte offset from head of object)}
+\\ \hline
+\emph{Payload (N words)}
+\\ \hline
+\end{tabular}
+\end{center}
+
+The stack grows downwards, towards decreasing
+addresses.  This makes it easier to print out the stack
+when debugging, and it means that a return address is
+at the lowest address of the chunk of stack it ``knows about''
+just like an info pointer on a closure.
+
+The garbage collector needs to be able to find all the
+pointers in a stack.  How does it do this?
+
+\begin{itemize}
+
+\item Within the stack there are return addresses, pushed
+by @case@ expressions.  Below a return address (i.e. at higher
+memory addresses, since the stack grows downwards) is a chunk
+of stack that the return address ``knows about'', namely the
+activation record of the currently running function.
+
+\item Below each such activation record is a \emph{pending-argument
+section}, a chunk of
+zero or more words that are the arguments to which the result
+of the function should be applied.  The return address does not
+statically
+``know'' how many pending arguments there are, or their types.
+(For example, the function might return a result of type $\alpha$.)
+
+\item Below each pending-argument section is another return address,
+and so on.  Actually, there might be an update frame instead, but we
+can consider update frames as a special case of a return address with
+a well-defined activation record.
+
+\end{itemize}
+
+The game plan is this.  The garbage collector
+walks the stack from the top, traversing pending-argument sections and
+activation records alternately.  Next we discuss how it finds
+the pointers in each of these two stack regions.
+
+
+\subsubsection{Activation records}\label{sect:activation-records}
+
+
+An \emph{activation record} is a contiguous chunk of stack,
+with a return address as its first word, followed by as many
+data words as the return address ``knows about''.  The return
+address is actually a fully-fledged info pointer.  It points
+to an info table, replete with:
+
+\begin{itemize}
+\item entry code (i.e. the code to return to).
+\item @INFO_TYPE@ is either @RET_SMALL/RET_VEC_SMALL@ or @RET_BIG/RET_VEC_BIG@, depending
+on whether the activation record has more than 32 data words (\note{64 for 8-byte-word architectures}) and on whether 
+to use a direct or a vectored return.
+\item @INFO_SM@ for @RET_SMALL@ is a bitmap telling the layout
+of the activation record, one bit per word.  The least-significant bit
+describes the first data word of the record (adjacent to the fixed
+header) and so on.  A ``@1@'' indicates a non-pointer, a ``@0@''
+indicates
+a pointer.  We don't need to indicate exactly how many words there
+are,
+because when we get to all zeros we can treat the rest of the 
+activation record as part of the next pending-argument region.
+
+For @RET_BIG@ the @INFO_SM@ field points to a block of bitmap
+words, starting with a word that tells how many words are in
+the block.
+
+\item @INFO_SRT@ is the Static Reference Table for the return
+address (Section~\ref{sect:srt}).
+\end{itemize}
+
+The activation record is a fully fledged closure too.
+As well as an info pointer, it has all the other attributes of
+a fixed header (Section~\ref{sect:fixed-header}) including a saved cost
+centre which is reloaded when the return address is entered.
+
+In other words, all the attributes of closures are needed for
+activation records, so it's very convenient to make them look alike.
+
+
+\subsubsection{Pending arguments}
+
+So that the garbage collector can correctly identify pointers
+in pending-argument sections we explicitly tag all non-pointers.
+Every non-pointer in a pending-argument section is preceded
+(at the next lower memory word) by a one-word byte count that
+says how many bytes to skip over (excluding the tag word).
+
+The garbage collector traverses a pending argument section from 
+the top (i.e. lowest memory address).  It looks at each word in turn:
+
+\begin{itemize}
+\item If it is less than or equal to a small constant @MAX_STACK_TAG@
+then
+it treats it as a tag heralding zero or more words of non-pointers,
+so it just skips over them.
+
+\item If it points to the code segment, it must be a return
+address, so we have come to the end of the pending-argument section.
+
+\item Otherwise it must be a bona fide heap pointer.
+\end{itemize}
+
+
+\subsection{The Stable Pointer Table}\label{sect:STABLEPTR_TABLE}
+
+A stable pointer is a name for a Haskell object which can be passed to
+the external world.  It is ``stable'' in the sense that the name does
+not change when the Haskell garbage collector runs---in contrast to
+the address of the object which may well change.
+
+A stable pointer is represented by an index into the
+@StablePointerTable@.  The Haskell garbage collector treats the
+@StablePointerTable@ as a source of roots for GC.
+
+In order to provide efficient access to stable pointers and to be able
+to cope with any number of stable pointers (eg $0 \ldots 100000$), the
+table of stable pointers is an array stored on the heap and can grow
+when it overflows.  (Since we cannot compact the table by moving
+stable pointers about, it seems unlikely that a half-empty table can
+be reduced in size---this could be fixed if necessary by using a
+hash table of some sort.)
+
+In general a stable pointer table closure looks like this:
+
+\begin{center}
+\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|}
+\hline
+\emph{Fixed header} & \emph{No of pointers} & \emph{Free} & $SP_0$ & \ldots & $SP_{n-1}$ 
+\\\hline
+\end{tabular}
+\end{center}
+
+The fields are:
+\begin{description}
+
+\item[@NPtrs@:] number of (stable) pointers.
+
+\item[@Free@:] the byte offset (from the first byte of the object) of the first free stable pointer.
+
+\item[$SP_i$:] A stable pointer slot.  If this entry is in use, it is
+an ``unstable'' pointer to a closure.  If this entry is not in use, it
+is a byte offset of the next free stable pointer slot.
+
+\end{description}
+
+When a stable pointer table is evacuated
+\begin{enumerate}
+\item the free list entries are all set to @NULL@ so that the evacuation
+  code knows they're not pointers;
+
+\item The stable pointer slots are scanned linearly: non-@NULL@ slots
+are evacuated and @NULL@-values are chained together to form a new free list.
+\end{enumerate}
+
+There's no need to link the stable pointer table onto the mutable
+list because we always treat it as a root.
+
+
+
+\section{The Bytecode Evaluator}
+
+This section describes how the Hugs interpreter interprets code in the
+same environment as compiled code executes.  Both evaluation models
+use a common garbage collector, so they must agree on the form of
+objects in the heap.
+
+Hugs interprets code by converting it to byte-code and applying a
+byte-code interpreter to it.  Wherever possible, we try to ensure that
+the byte-code is all that is required to interpret a section of code.
+This means not dynamically generating info tables, and hence we can
+only have a small number of possible heap objects each with a statically
+compiled info table.  Similarly for stack objects: in fact we only
+have one Hugs stack object, in which all information is tagged for the
+garbage collector.
+
+There is, however, one exception to this rule.  Hugs must generate
+info tables for any constructors it is asked to compile, since the
+alternative is to force a context-switch each time compiled code
+enters a Hugs-built constructor, which would be prohibitively
+expensive.
+
+We achieve this simplicity by forgoing some of the optimisations used
+by compiled code:
+\begin{itemize}
+\item
+
+Whereas compiled code has five different ways of entering a closure
+(section~\ref{sect:entering-closures}), interpreted code has only one.
+The entry point for interpreted code behaves like slow entry points for
+compiled code.
+
+\item
+
+We use just one info table for \emph{all\/} direct returns.  
+This introduces two problems:
+\begin{enumerate}
+\item How does the interpreter know what code to execute?
+
+Instead of pushing just a return address, we push a return BCO and a 
+trivial return address which just enters the return BCO.
+
+(In a purely interpreted system, we could avoid pushing the trivial
+return address.)
+
+\item How can the garbage collector follow pointers within the
+activation record?
+
+We could push a third word ---a bitmask describing the location of any
+pointers within the record--- but, since we're already tagging unboxed
+function arguments on the stack, we use the same mechanism for unboxed
+values within the activation record.
+
+\ToDo{Do we have to stub out dead variables in the activation frame?}
+
+\end{enumerate}
+
+\item
+
+We trivially support vectored returns by pushing a return vector whose
+entries are all the same.
+
+\item
+
+We avoid the need to build SRTs by putting bytecode objects on the
+heap and restricting BCOs to a single basic block.
+
+\end{itemize}
+
+\subsection{Hugs Info Tables}
+
+Hugs requires the following info tables and closures:
+\begin{description}
+\item [@HUGS_RET@].
+
+Contains both a vectored return table and a direct entry point.  All
+entry points are the same: they rearrange the stack to match the Hugs
+return convention (section~\label{sect:hugs-return-convention}) and return
+to the scheduler.  When the scheduler restarts the thread, it will
+find a BCO on top of the stack and will enter the Hugs interpreter.
+
+\item [@UPD_RET@].
+
+This is just the standard info table for an update frame.
+
+\item [Constructors].
+
+The entry code for a constructor jumps to a generic entry point in the
+runtime system which decides whether to do a vectored or unvectored
+return depending on the shape of the constructor/type.  This implies that
+info tables must have enough info to make that decision.
+
+\item [@AP@ and @PAP@].
+
+\item [Indirections].
+
+\item [Selectors].
+
+Hugs doesn't generate them itself but it ought to recognise them
+
+\item [Complex primops].
+
+Some of the primops are too complex for GHC to generate inline.
+Instead, these primops are hand-written and called as normal functions.
+Hugs only needs to know their names and types but doesn't care whether
+they are generated by GHC or by hand.  Two things to watch:
+
+\begin{enumerate}
+\item
+Hugs must be able to enter these primops even if it is working on a
+standalone system that does not support genuine GHC generated code.
+
+\item The complex primops often involve unboxed tuple types (which
+Hugs does not support at the source level) so we cannot specify their
+types in a Haskell source file.
+
+\end{enumerate}
+
+\end{description}
+
+\subsection{Hugs Heap Objects}
+\label{sect:hugs-heap-objects}
+
+\subsubsection{Byte-Code Objects}
+
+Compiled byte code lives on the global heap, in objects called
+Byte-Code Objects (or BCOs).  The layout of BCOs is described in
+detail in Section \ref{sect:BCO}, in this section we will describe
+their semantics.
+
+Since byte-code lives on the heap, it can be garbage collected just
+like any other heap-resident data.  Hugs arranges that any BCO's
+referred to by the Hugs symbol tables are treated as live objects by
+the garbage collector.  When a module is unloaded, the pointers to its
+BCOs are removed from the symbol table, and the code will be garbage
+collected some time later.
+
+A BCO represents a basic block of code --- the (only) entry points is
+at the beginning of a BCO, and it is impossible to jump into the
+middle of one.  A BCO represents not only the code for a function, but
+also its closure; a BCO can be entered just like any other closure.
+Hugs performs lambda-lifting during compilation to byte-code, and each
+top-level combinator becomes a BCO in the heap.
+
+
+\subsubsection{Thunks and partial applications}
+
+A thunk consists of a code pointer, and values for the free variables
+of that code.  Since Hugs byte-code is lambda-lifted, free variables
+become arguments and are expected to be on the stack by the called
+function.
+
+Hugs represents updateable thunks with @AP@ objects applying a closure
+to a list of arguments.  (As for @PAP@s, unboxed arguments should be
+preceded by a tag.)  When it is entered, it pushes an update frame
+followed by its payload on the stack, and enters the first word (which
+will be a pointer to a BCO).  The layout of @AP@ objects is described
+in more detail in Section \ref{sect:AP}.
+
+Partial applications are represented by @PAP@ objects, which are
+non-updatable.
  
-Each lambda abstraction, $f$, in the STG program has its own private info table.
-The following labels are relevant:
+\ToDo{Hugs Constructors}.
+
+\subsection{Calling conventions}
+\label{sect:hugs-calling-conventions}
+\label{sect:standard-closures}
+
+The calling convention for any byte-code function is straightforward:
  \begin{itemize}
-\item $f$@_info@  is $f$'s info table.
-\item $f$@_entry@ is $f$'s slow entry point (i.e. the entry code of its
-info table; so it will label the same byte as $f$@_info@).
-\item $f@_fast_@k$ is $f$'s fast entry point.  $k$ is the number of arguments
-$f$ takes; encoding this number in the fast-entry label occasionally catches some nasty
-code-generation errors.
+\item Push any arguments on the stack.
+\item Push a pointer to the BCO.
+\item Begin interpreting the byte code.
  \end{itemize}
  
-\subsubsection{Data Constructors}\label{sect:CONSTR}
+In a system containing both GHC and Hugs, the bytecode interpreter
+only has to be able to enter BCOs: everything else can be handled by
+returning to the compiled world (as described in
+Section~\ref{sect:hugs-to-ghc-switch}) and entering the closure
+there.
+
+This would work but it would obviously be very inefficient if
+we entered a @AP@ by switching worlds, entering the @AP@,
+pushing the arguments and function onto the stack, and entering the
+function which, likely as not, will be a byte-code object which we
+will enter by \emph{returning} to the byte-code interpreter.  To avoid
+such gratuitious world switching, we choose to recognise certain
+closure types as being ``standard'' --- and duplicate the entry code
+for the ``standard closures'' in the bytecode interpreter.
+
+A closure is said to be ``standard'' if its entry code is entirely
+determined by its info table.  \emph{Standard Closures} have the
+desirable property that the byte-code interpreter can enter
+the closure by simply ``interpreting'' the info table instead of
+switching to the compiled world.  The standard closures include:
  
-Data-constructor closures represent values constructed with
-algebraic data type constructors.
-The general layout of data constructors is the same as that for function
-closures.  That is
-\begin{center}
-\begin{tabular}{|l|l|l|l|}\hline
-{\em Fixed header}  & {\em Pointers} & {\em Non-pointers} \\ \hline
-\end{tabular}
-\end{center}
+\begin{description}
+\item[Constructor]
+To enter a constructor, we simply return (see Section
+\ref{sect:hugs-return-convention}).
  
-The SRT pointer in a data constructor's info table is never used --- the
-code for a constructor does not make any static references.
-\note{Use it for something else??  E.g. tag?}
+\item[Indirection]
+To enter an indirection, we simply enter the object it points to
+after possibly adjusting the current cost centre.
  
-There are several different sorts of constructor:
-\begin{itemize}
-\item @CONSTR@: a vanilla, dynamically allocated constructor.
-\item @CONSTR_@$p$@_@$np$: just like $@FUN_@p@_@np$.
-\item @CONSTR_INTLIKE@.
-A dynamically-allocated heap object that looks just like an @Int@.  The 
-garbage collector checks to see if it can common it up with one of a fixed
-set of static int-like closures, thus getting it out of the dynamic heap
-altogether.
+\item[@AP@] 
  
-\item @CONSTR_CHARLIKE@:  same deal, but for @Char@.
+To enter an @AP@, we push an update frame, push the
+arguments, push the function and enter the function.
+(Not forgetting a stack check at the start.)
  
-\item @CONSTR_STATIC@ is similar to @FUN_STATIC@, with the complication that
-the layout of the constructor must mimic that of a dynamic constructor,
-because a static constructor might be returned to some code that unpacks it.
-So its layout is like this:
-\begin{center}
-\begin{tabular}{|l|l|l|l|l|}\hline
-{\em Fixed header}  & {\em Pointers} & {\em Non-pointers} & {\em Static object link}\\ \hline
-\end{tabular}
-\end{center}
-The static object link, at the end of the closure, serves the same purpose
-as that for @FUN_STATIC@.  The pointers in the static constructor can point
-only to other static closures.
+\item[@PAP@]
  
-The static object link occurs last in the closure so that static
-constructors can store their data fields in exactly the same place as
-dynamic constructors.
+To enter a @PAP@, we push the arguments, push the function and enter
+the function.  (Not forgetting a stack check at the start.)
  
-\item @CONSTR_STATIC_NOCAF@.  A statically allocated data constructor
-that guarantees not to point (directly or indirectly) to any CAF
-(section~\ref{sect:CAF}).  This means it does not need a static object
-link field.  Since we expect that there might be quite a lot of static
-constructors this optimisation makes sense.  Furthermore, the @NOCAF@
-tag allows the compiler to indicate that no CAFs can be reached
-anywhere {\em even indirectly}.
+\item[Selector] 
+To enter a selector, we test whether the selectee is a value.  If so,
+we simply select the appropriate component; if not, it's simplest to
+treat it as a GHC-built closure --- though we could interpret it if we
+wanted.
  
+\end{description}
  
-\end{itemize}
+The most obvious omissions from the above list are @BCO@s (which we
+dealt with above) and GHC-built closures (which are covered in Section
+\ref{sect:hugs-to-ghc-switch}).
  
-For each data constructor $Con$, two
-info tables are generated:
-\begin{itemize}
-\item $Con$@_info@ labels $Con$'s dynamic info table, 
-shared by all dynamic instances of the constructor.
-\item $Con$@_static@ labels $Con$'s static info table, 
-shared by all static instances of the constructor.
-\end{itemize}
  
-\subsubsection{Thunks}
-\label{sect:THUNK}
-\label{sect:THUNK_SEL}
+\subsection{Return convention}
+\label{sect:hugs-return-convention}
  
-A thunk represents an expression that is not obviously in head normal 
-form.  For example, consider the following top-level definitions:
+When Hugs pushes a return address, it pushes both a pointer to the BCO
+to return to, and a pointer to a static code fragment @HUGS_RET@ (this
+is described in Section \ref{sect:ghc-to-hugs-switch}).  The
+stack layout is shown in Figure \ref{fig:hugs-return-stack}.
+
+\begin{figure}[ht]
+\begin{center}
  @
-  range = between 1 10
-  f = \x -> let ys = take x range
-           in sum ys
+| stack    |
++----------+
+| bco      |--> BCO
++----------+
+| HUGS_RET |
++----------+
  @
-Here the right-hand sides of @range@ and @ys@ are both thunks; the former
-is static while the latter is dynamic.
+%\input{hugs_ret.pstex_t}
+\end{center}
+\caption{Stack layout for a Hugs return address}
+\label{fig:hugs-return-stack}
+\end{figure}
  
-The layout of a thunk is the same as that for a function closure,
-except that it may have some words of ``slop'' at the end to make sure
-that it has 
-at least @MIN_UPD_PAYLOAD@ words in addition to its
-fixed header.
+\begin{figure}[ht]
  \begin{center}
-\begin{tabular}{|l|l|l|l|l|}\hline
-{\em Fixed header}  & {\em Pointers} & {\em Non-pointers} & {\em Slop} \\ \hline
-\end{tabular}
+@
+| stack    |
++----------+
+| con      |--> CON
++----------+
+@
+%\input{hugs_ret2.pstex_t}
  \end{center}
-The @INFO_SM@ word contains the same information as for function
-closures; that is, number of pointers and number of non-pointers (excluding slop).
-
-A thunk differs from a function closure in that it can be updated.
-
-There are several forms of thunk:
-\begin{itemize}
-\item @THUNK@: a vanilla, dynamically allocated thunk.
-The garbage collection code for thunks whose
-pointer + non-pointer words is less than @MIN_UPD_PAYLOAD@ differs from
-that for function closures and data constructors, because the GC code
-has to account for the slop.
-\item $@THUNK_@p@_@np$.  Similar comments apply.
-\item @THUNK_STATIC@.  A static thunk is also known as 
-a {\em constant applicative form}, or {\em CAF}.
+\caption{Stack layout on enterings a Hugs return address}
+\label{fig:hugs-return2}
+\end{figure}
  
+\begin{figure}[ht]
  \begin{center}
-\begin{tabular}{|l|l|l|l|l|}\hline
-{\em Fixed header}  & {\em Pointers} & {\em Non-pointers} & {\em Slop} & {\em Static object link}\\ \hline
-\end{tabular}
+@
+| stack    |
++----------+
+| 3#       |
++----------+
+| I#       |
++----------+
+@
+%\input{hugs_ret2.pstex_t}
  \end{center}
+\caption{Stack layout on entering a Hugs return address with an unboxed value}
+\label{fig:hugs-return-int}
+\end{figure}
  
-\item @THUNK_SELECTOR@ is a (dynamically allocated) thunk
-whose entry code performs a simple selection operation from
-a data constructor drawn from a single-constructor type.  For example,
-the thunk
+\begin{figure}[ht]
+\begin{center}
  @
-       x = case y of (a,b) -> a
+| stack    |
++----------+
+| ghc_ret  |
++----------+
+| con      |--> CON
++----------+
  @
-is a selector thunk.  A selector thunk is laid out like this:
+%\input{hugs_ret3.pstex_t}
+\end{center}
+\caption{Stack layout on enterings a GHC return address}
+\label{fig:hugs-return3}
+\end{figure}
+
+\begin{figure}[ht]
  \begin{center}
-\begin{tabular}{|l|l|l|l|}\hline
-{\em Fixed header}  & {\em Selectee pointer} \\ \hline
-\end{tabular}
+@
+| stack    |
++----------+
+| ghc_ret  |
++----------+
+| 3#       |
++----------+
+| I#       |
++----------+
+| restart  |--> id_Int#_closure
++----------+
+@
+%\input{hugs_ret2.pstex_t}
  \end{center}
-The @INFO_SM@ word contains the byte offset of the desired word in
-the selectee.  Note that this is different from all other thunks.
+\caption{Stack layout on enterings a GHC return address with an unboxed value}
+\label{fig:hugs-return-int}
+\end{figure}
  
-The garbage collector ``peeks'' at the selectee's
-tag (in its info table).  If it is evaluated, then it goes ahead and do
-the selection, and then behaves just as if the selector thunk was an
-indirection to the selected field.
-If it is not
-evaluated, it treats the selector thunk like any other thunk of that
-shape.  [Implementation notes.  
-Copying: only the evacuate routine needs to be special.
-Compacting: only the PRStart (marking) routine needs to be special.]
-\end{itemize}
+When a Hugs byte-code sequence enters a closure, it examines the 
+return address on top of the stack.
  
+\begin{itemize}
  
-The only label associated with a thunk is its info table:
-\begin{description}
-\item[$f$@_info@] is $f$'s info table.
-\end{description}
+\item If the return address is @HUGS_RET@, pop the @HUGS_RET@ and the
+bco for the continuation off the stack, push a pointer to the constructor onto
+the stack and enter the BCO with the current object pointer set to the BCO
+(Figure \ref{fig:hugs-return2}).
  
+\item If the top of the stack is not @HUGS_RET@, we need to do a world
+switch as described in Section \ref{sect:hugs-to-ghc-switch}.
  
-\subsubsection{Partial applications (PAPs)}\label{sect:PAP}
+\end{itemize}
  
-A partial application (PAP) represents a function applied to too few arguments.
-It is only built as a result of updating after an argument-satisfaction
-check failure.  A PAP has the following shape:
-\begin{center}
-\begin{tabular}{|l|l|l|l|}\hline
-{\em Fixed header}  & {\em No of arg words} & {\em Function closure} & {\em Arg stack} \\ \hline
-\end{tabular}
-\end{center}
-The ``arg stack'' is a copy of of the chunk of stack above the update
-frame; ``no of arg words'' tells how many words it consists of.  The
-function closure is (a pointer to) the closure for the function whose
-argument-satisfaction check failed.
+\ToDo{This duplicates what we say about switching worlds
+(section~\ref{sect:switching-worlds}) - kill one or t'other.}
  
-There is just one standard form of PAP with @INFO_TYPE@ = @PAP@.
-There is just one info table too, called @PAP_info@.
-Its entry code simply copies the arg stack chunk back on top of the
-stack and enters the function closure.  (It has to do a stack overflow test first.)
  
-There are no static PAPs.
+\ToDo{This was in the evaluation model part but it really belongs in
+this part which is about the internal details of each of the major
+sections.}
  
-\subsubsection{Indirections}
-\label{sect:IND}
-\label{sect:IND_OLDGEN}
+\subsection{Addressing Modes}
  
-Indirection closures just point to other closures. They are introduced
-when a thunk is updated to point to its value. 
-The entry code for all indirections simply enters the closure it points to.
+To avoid potential alignment problems and simplify garbage collection,
+all literal constants are stored in two tables (one boxed, the other
+unboxed) within each BCO and are referred to by offsets into the tables.
+Slots in the constant tables are word aligned.
  
-There are several forms of indirection:
-\begin{description}
-\item[@IND@] is the vanilla, dynamically-allocated indirection.
-It is removed by the garbage collector. It has the following
-shape:
-\begin{center}
-\begin{tabular}{|l|l|l|}\hline
-{\em Fixed header} & {\em Target closure} \\ \hline
-\end{tabular}
-\end{center}
+\ToDo{How big can the offsets be?  Is the offset specified in the
+address field or in the instruction?}
  
-\item[@IND_OLDGEN@] is the indirection used to update an old-generation
-thunk. Its shape is like this:
-\begin{center}
-\begin{tabular}{|l|l|l|}\hline
-{\em Fixed header} & {\em Mutable link field} & {\em Target closure} \\ \hline
-\end{tabular}
-\end{center}
-It contains a {\em mutable link field} that is used to string together
-all old-generation indirections that might have a pointer into
-the new generation.
+Literals can have the following types: char, int, nat, float, double,
+and pointer to boxed object.  There is no real difference between
+char, int, nat and float since they all occupy 32 bits --- but it
+costs almost nothing to distinguish them and may improve portability
+and simplify debugging.
  
+\subsection{Compilation}
  
-\item[@IND_PERMANENT@ and @IND_OLDGEN_PERMANENT@.]
-for lexical profiling, it is necessary to maintain cost centre
-information in an indirection, so ``permanent indirections'' are
-retained forever.  Otherwise they are just like vanilla indirections.
-\note{If a permanent indirection points to another permanent
-indirection or a @CONST@ closure, it is possible to elide the indirection
-since it will have no effect on the profiler.}
-\note{Do we still need @IND@ and @IND_OLDGEN@
-in the profiling build, or can we just make
-do with one pair whose behaviour changes when profiling is built?}
  
-\item[@IND_STATIC@] is used for overwriting CAFs when they have been
-evaluated.  Static indirections are not removed by the garbage
-collector; and are statically allocated outside the heap (and should
-stay there).  Their static object link field is used just as for
-@FUN_STATIC@ closures.
+\def\is{\mbox{\it is}}
+\def\ts{\mbox{\it ts}}
+\def\as{\mbox{\it as}}
+\def\bs{\mbox{\it bs}}
+\def\cs{\mbox{\it cs}}
+\def\rs{\mbox{\it rs}}
+\def\us{\mbox{\it us}}
+\def\vs{\mbox{\it vs}}
+\def\ws{\mbox{\it ws}}
+\def\xs{\mbox{\it xs}}
  
-\begin{center}
-\begin{tabular}{|l|l|l|}
-\hline
-{\em Fixed header} & {\em Target closure} & {\em Static object link} \\
-\hline
-\end{tabular}
-\end{center}
+\def\e{\mbox{\it e}}
+\def\alts{\mbox{\it alts}}
+\def\fail{\mbox{\it fail}}
+\def\panic{\mbox{\it panic}}
+\def\ua{\mbox{\it ua}}
+\def\obj{\mbox{\it obj}}
+\def\bco{\mbox{\it bco}}
+\def\tag{\mbox{\it tag}}
+\def\entry{\mbox{\it entry}}
+\def\su{\mbox{\it su}}
  
-\end{description}
+\def\Ind#1{{\mbox{\it Ind}\ {#1}}}
+\def\update#1{{\mbox{\it update}\ {#1}}}
  
-\subsubsection{Activation Records}
+\def\next{$\Longrightarrow$}
+\def\append{\mathrel{+\mkern-6mu+}}
+\def\reverse{\mbox{\it reverse}}
+\def\size#1{{\vert {#1} \vert}}
+\def\arity#1{{\mbox{\it arity}{#1}}}
  
-Activation records are parts of the stack described by return address
-info tables (closures with @INFO_TYPE@ values of @RET_SMALL@,
-@RET_VEC_SMALL@, @RET_BIG@ and @RET_VEC_BIG@). They are described in
-section~\ref{sect:activation-records}.
+\def\AP{\mbox{\it AP}}
+\def\PAP{\mbox{\it PAP}}
+\def\GHCRET{\mbox{\it GHCRET}}
+\def\GHCOBJ{\mbox{\it GHCOBJ}}
  
+To make sense of the instructions, we need a sense of how they will be
+used.  Here is a small compiler for the STG language.
  
-\subsubsection{Black holes, MVars and IVars}
-\label{sect:BH}
-\label{sect:MVAR}
-\label{sect:IVAR}
+@
+> cg (f{a1, ... am}) = do
+>   pushAtom am; ... pushAtom a1
+>   pushVar f
+>   SLIDE (m+1) |env|
+>   ENTER
+> cg (let {x1=rhs1; ... xm=rhsm} in e) = do
+>   ALLOC x1 |rhs1|, ... ALLOC xm |rhsm|
+>   build x1 rhs1,   ... build xm rhsm
+>   cg e
+> cg (case e of alts) = do
+>   PUSHALTS (cgAlts alts)
+>   cg e
+
+> cgAlts { alt1; ... altm }  = cgAlt alt1 $ ... $ cgAlt altm pmFail
+>
+> cgAlt (x@C{xs} -> e) fail = do
+>   TEST C fail
+>   HEAPCHECK (heapUse e)
+>   UNPACK xs
+>   cg e
+
+> build x (C{a1, ... am}) = do 
+>   pushUntaggedAtom am; ... pushUntaggedAtom a1
+>   PACK x C
+> -- A useful optimisation
+> build x ({v1, ... vm} \ {}. f{a1, ... am}) = do 
+>   pushVar am; ... pushVar a1
+>   pushVar f
+>   MKAP x m
+> build x ({v1, ... vm} \ {}. e) = do 
+>   pushVar vm; ... pushVar v1
+>   PUSHBCO (cgRhs ({v1, ... vm} \ {}. e))
+>   MKAP x m
+> build x ({v1, ... vm} \ {x1, ... xm}. e) = do 
+>   pushVar vm; ... pushVar v1
+>   PUSHBCO (cgRhs ({v1, ... vm} \ {x1, ... xm}. e))
+>   MKPAP x m
+
+> cgRhs (vs \ xs. e) = do
+>   ARGCHECK   (xs ++ vs)  -- can be omitted if xs == {}
+>   STACKCHECK min(stackUse e,heapOverflowSlop)
+>   HEAPCHECK  (heapUse e)
+>   cg e
+
+> pushAtom x  = pushVar x
+> pushAtom i# = PUSHINT i#
+
+> pushVar x = if isGlobalVar x then PUSHGLOBAL x else PUSHLOCAL x 
+
+> pushUntaggedAtom x  = pushVar x
+> pushUntaggedAtom i# = PUSHUNTAGGEDINT i#
+
+> pushVar x = if isGlobalVar x then PUSHGLOBAL x else PUSHLOCAL x 
+@
  
-Black hole closures are used to overwrite closures currently being
-evaluated. They inform the garbage collector that there are no live
-roots in the closure, thus removing a potential space leak.  
+\ToDo{Is there an easy way to add semi-tagging?  Would it be that different?}
  
-Black holes also become synchronization points in the threaded world.
-They contain a pointer to a list of blocked threads to be awakened
-when the black hole is updated (or @NULL@ if the list is empty).
-\begin{center}
-\begin{tabular}{|l|l|l|}
-\hline 
-{\em Fixed header} & {\em Mutable link} & {\em Blocked thread link} \\
+\ToDo{Optimise thunks of the form @f{x1,...xm}@ so that we build an AP directly}
+
+\subsection{Instructions}
+
+We specify the semantics of instructions using transition rules of
+the form:
+
+\begin{tabular}{|llrrrrr|}
+\hline
+       & $\is$         & $s$   & $\su$         & $h$  & $hp$  & $\sigma$ \\
+\next  & $\is'$        & $s'$  & $\su'$        & $h'$ & $hp'$ & $\sigma$ \\
  \hline
  \end{tabular}
-\end{center}
-The {\em Blocked thread link} points to the TSO of the first thread
-waiting for the value of this thunk.  All subsequent TSOs in the list
-are linked together using their @TSO_LINK@ field.
  
-When the blocking queue is non-@NULL@, the black hole must be added to
-the mutables list since the TSOs on the list may contain pointers into
-the new generation.  There is no need to clutter up the mutables list
-with black holes with empty blocking queues.
+where $\is$ is an instruction stream, $s$ is the stack, $\su$ is the 
+update frame pointer and $h$ is the heap.
  
-\ToDo{MVars}
  
+\subsection{Stack manipulation}
  
-\subsubsection{FetchMes}\label{sect:FETCHME}
+\begin{description}
  
-In the parallel systems, FetchMes are used to represent pointers into
-the global heap.  When evaluated, the value they point to is read from
-the global heap.
+\item[ Push a global variable ].
  
-\ToDo{Describe layout}
+\begin{tabular}{|llrrrrr|}
+\hline
+       & PUSHGLOBAL $o$ : $\is$ & $s$          & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next  & $\is$                  & $\sigma!o:s$ & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
  
+\item[ Push a local variable ].
  
-\subsection{Unpointed Objects}
+\begin{tabular}{|llrrrrr|}
+\hline
+       & PUSHLOCAL $o$ : $\is$ & $s$           & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next  & $\is$                 & $s!o : s$     & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
  
-A variable of unpointed type is always bound to a {\em value}, never to a {\em thunk}.
-For this reason, unpointed objects cannot be entered.
+\item[ Push an unboxed int ].
  
-A {\em value} may be:
-\begin{itemize}
-\item {\em Boxed}, i.e.~represented indirectly by a pointer to a heap object (e.g.~foreign objects, arrays); or
-\item {\em Unboxed}, i.e.~represented directly by a bit-pattern in one or more registers (e.g.~@Int#@ and @Float#@).
-\end{itemize}
-All {\em pointed} values are {\em boxed}.  
+\begin{tabular}{|llrrrrr|}
+\hline
+       & PUSHINT $o$ : $\is$   & $s$                   & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next  & $\is$                 & $I\# : \sigma!o : s$  & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
  
-\subsubsection{Immutable Objects}
-\label{sect:ARR_WORDS1}
-\label{sect:ARR_PTRS}
+The $I\#$ is a tag included for the benefit of the garbage collector.
+Similar rules exist for floats, doubles, chars, etc.
  
-\begin{description}
-\item[@ARR_WORDS@] is a variable-sized object consisting solely of
-non-pointers.  It is used for arrays of all
-sorts of things (bytes, words, floats, doubles... it doesn't matter).
-\begin{center}
-\begin{tabular}{|c|c|c|c|}
+\item[ Push an unboxed int ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+       & PUSHUNTAGGEDINT $o$ : $\is$   & $s$                   & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next  & $\is$                 & $\sigma!o : s$        & $su$ & $h$ & $hp$ & $\sigma$ \\
  \hline
-{\em Fixed Hdr} & {\em No of non-pointers} & {\em Non-pointers\ldots}  \\ \hline
  \end{tabular}
-\end{center}
  
-\item[@ARR_PTRS@] is an immutable, variable sized array of pointers.
-\begin{center}
-\begin{tabular}{|c|c|c|c|}
+Similar rules exist for floats, doubles, chars, etc.
+
+\item[ Delete environment from stack --- ready for tail call ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+       & SLIDE $m$ $n$ : $\is$ & $\as \append \bs \append \cs$         & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next  & $\is$                 & $\as \append \cs$                     & $su$ & $h$ & $hp$ & $\sigma$ \\
  \hline
-{\em Fixed Hdr} & {\em Mutable link} & {\em No of pointers} & {\em Pointers\ldots}     \\ \hline
  \end{tabular}
-\end{center}
-The mutable link is present so that we can easily freeze and thaw an
-array (by changing the header and adding/removing the array to the
-mutables list).
+\\
+where $\size{\as} = m$ and $\size{\bs} = n$.
  
-\end{description}
  
-\subsubsection{Mutable Objects}
-\label{sect:mutables}
-\label{sect:ARR_WORDS2}
-\label{sect:MUTVAR}
-\label{sect:MUTARR_PTRS}
-\label{sect:MUTARR_PTRS_FROZEN}
+\item[ Push a return address ].
  
-Some of these objects are {\em mutable}; they represent objects which
-are explicitly mutated by Haskell code through the @ST@ monad.
-They're not used for thunks which are updated precisely once.
-Depending on the garbage collector, mutable closures may contain extra
-header information which allows a generational collector to implement
-the ``write barrier.''
+\begin{tabular}{|llrrrrr|}
+\hline
+       & PUSHALTS $o$:$\is$    & $s$                   & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next  & $\is$                 & $@HUGS_RET@:\sigma!o:s$       & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+
+\item[ Push a BCO ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+       & PUSHBCO $o$ : $\is$   & $s$                   & $su$ & $h$ & $hp$ & $\sigma$ \\
+\next  & $\is$                 & $\sigma!o : s$        & $su$ & $h$ & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+
+\end{description}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\subsection{Heap manipulation}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  
  \begin{description}
  
-\item[@ARR_WORDS@] is also used to represent {\em mutable} arrays of
-bytes, words, floats, doubles, etc.  It's possible to use the same
-object type because even generational collectors don't need to
-distinguish them.
+\item[ Allocate a heap object ].
  
-\item[@MUTVAR@] is a mutable variable.
-\begin{center}
-\begin{tabular}{|c|c|c|}
+\begin{tabular}{|llrrrrr|}
+\hline
+       & ALLOC $m$ : $\is$     & $s$    & $su$ & $h$ & $hp$   & $\sigma$ \\
+\next  & $\is$                 & $hp:s$ & $su$ & $h$ & $hp+m$ & $\sigma$ \\
  \hline
-{\em Fixed Hdr} & {\em Mutable link} & {\em Pointer} \\ \hline
  \end{tabular}
-\end{center}
  
-\item[@MUTARR_PTRS@] is a mutable array of pointers.
-Such an array may be {\em frozen}, becoming an @SM_MUTARR_PTRS_FROZEN@, with a
-different info-table.
-\begin{center}
-\begin{tabular}{|c|c|c|c|}
+\item[ Build a constructor ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+       & PACK $o$ $o'$ : $\is$ & $\ws \append s$       & $su$ & $h$                            & $hp$ & $\sigma$ \\
+\next  & $\is$                 & $s$                   & $su$ & $h[s!o \mapsto Pack C\{\ws\}]$ & $hp$ & $\sigma$ \\
  \hline
-{\em Fixed Hdr} & {\em Mutable link} & {\em No of ptrs} & {\em Pointers\ldots} \\ \hline
  \end{tabular}
-\end{center}
-
-\item[@MUTARR_PTRS_FROZEN@] is a frozen @MUTARR_PTRS@ closure.
-The garbage collector converts @MUTARR_PTRS_FROZEN@ to @ARR_PTRS@ as it removes them from
-the mutables list.
+\\
+where $C = \sigma!o'$ and $\size{\ws} = \arity{C}$.
  
-\end{description}
+\item[ Build an AP or  PAP ].
  
+\begin{tabular}{|llrrrrr|}
+\hline
+       & MKAP $o$ $m$:$\is$    & $f : \ws \append s$   & $su$ & $h$                            & $hp$ & $\sigma$ \\
+\next  & $\is$                 & $s$                   & $su$ & $h[s!o \mapsto \AP(f,\ws)]$    & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+\\
+where $\size{\ws} = m$.
  
-\subsubsection{Foreign Objects}\label{sect:FOREIGN}
+\begin{tabular}{|llrrrrr|}
+\hline
+       & MKPAP $o$ $m$:$\is$   & $f : \ws \append s$   & $su$ & $h$                            & $hp$ & $\sigma$ \\
+\next  & $\is$                 & $s$                   & $su$ & $h[s!o \mapsto \PAP(f,\ws)]$   & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+\\
+where $\size{\ws} = m$.
  
-Here's what a ForeignObj looks like:
+\item[ Unpacking a constructor ].
  
-\begin{center}
-\begin{tabular}{|l|l|l|l|}
-\hline 
-{\em Fixed header} & {\em Data} & {\em Free Routine} & {\em Foreign object link} \\
+\begin{tabular}{|llrrrrr|}
+\hline
+       & UNPACK : $is$         & $a : s$                               & $su$ & $h[a \mapsto C\ \ws]$          & $hp$ & $\sigma$ \\
+\next  & $is'$                 & $(\reverse\ \ws) \append a : s$       & $su$ & $h$                            & $hp$ & $\sigma$ \\
  \hline
  \end{tabular}
-\end{center}
  
-The @FreeRoutine@ is a reference to the finalisation routine to call
-when the @ForeignObj@ becomes garbage.  If @freeForeignObject@ is
-called on a Foreign Object, the @FreeRoutine@ is set to zero and the
-garbage collector will not attempt to call @FreeRoutine@ when the 
-object becomes garbage.
+The $\reverse\ \ws$ looks expensive but, since the stack grows down
+and the heap grows up, that's actually the cheap way of copying from
+heap to stack.  Looking at the compilation rules, you'll see that we
+always push the args in reverse order.
  
-The Foreign object link is a link to the next foreign object in the
-list.  This list is traversed at the end of garbage collection: if an
-object is about to be deallocated (e.g.~it was not marked or
-evacuated), the free routine is called and the object is deleted from
-the list.  
+\end{description}
  
  
-The remaining objects types are all administrative --- none of them may be entered.
+\subsection{Entering a closure}
  
-\subsection{Thread State Objects (TSOs)}\label{sect:TSO}
+\begin{description}
  
-In the multi-threaded system, the state of a suspended thread is
-packed up into a Thread State Object (TSO) which contains all the
-information needed to restart the thread and for the garbage collector
-to find all reachable objects.  When a thread is running, it may be
-``unpacked'' into machine registers and various other memory locations
-to provide faster access.
+\item[ Enter a BCO ].
  
-Single-threaded systems don't really {\em need\/} TSOs --- but they do
-need some way to tell the storage manager about live roots so it is
-convenient to use a single TSO to store the mutator state even in
-single-threaded systems.
+\begin{tabular}{|llrrrrr|}
+\hline
+       & [ENTER]       & $a : s$       & $su$ & $h[a \mapsto BCO\{\is\} ]$     & $hp$ & $\sigma$ \\
+\next  & $\is$         & $a : s$       & $su$ & $h$                            & $hp$ & $a$ \\
+\hline
+\end{tabular}
  
-Rather than manage TSOs' alloc/dealloc, etc., in some {\em ad hoc}
-way, we instead alloc/dealloc/etc them in the heap; then we can use
-all the standard garbage-collection/fetching/flushing/etc machinery on
-them.  So that's why TSOs are ``heap objects,'' albeit very special
-ones.
-\begin{center}
-\begin{tabular}{|l|l|}
-   \hline {\em Fixed header}
-\\ \hline @TSO_LINK@
-\\ \hline @TSO_WHATNEXT@
-\\ \hline @TSO_WHATNEXT_INFO@ 
-\\ \hline @TSO_STACK@ 
-\\ \hline {\em Exception Handlers}
-\\ \hline {\em Ticky Info}
-\\ \hline {\em Profiling Info}
-\\ \hline {\em Parallel Info}
-\\ \hline {\em GranSim Info}
-\\ \hline
+\item[ Enter a PAP closure ].
+
+\begin{tabular}{|llrrrrr|}
+\hline
+       & [ENTER]       & $a : s$               & $su$ & $h[a \mapsto \PAP(f,\ws)]$     & $hp$ & $\sigma$ \\
+\next  & [ENTER]       & $f : \ws \append s$   & $su$ & $h$                            & $hp$ & $???$ \\
+\hline
  \end{tabular}
-\end{center}
-The contents of a TSO are:
-\begin{itemize}
  
-\item A pointer (@TSO_LINK@) used to maintain a list of threads with a similar
-  state (e.g.~all runable, all sleeping, all blocked on the same black
-  hole, all blocked on the same MVar, etc.)
+\item[ Entering an AP closure ].
  
-\item A word (@TSO_WHATNEXT@) which is in suspended threads to record
- how to awaken it.  This typically requires a program counter which is stored
- in the pointer @TSO_WHATNEXT_INFO@
+\begin{tabular}{|llrrrrr|}
+\hline
+       & [ENTER]       & $a : s$                               & $su$  & $h[a \mapsto \AP(f,ws)]$      & $hp$ & $\sigma$ \\
+\next  & [ENTER]       & $f : \ws \append @UPD_RET@:\su:a:s$   & $su'$ & $h$                           & $hp$ & $???$ \\
+\hline
+\end{tabular}
  
-\item A pointer (@TSO_STACK@) to the top stack block.
+Optimisations:
+\begin{itemize}
+\item Instead of blindly pushing an update frame for $a$, we can first test whether there's already
+ an update frame there.  If so, overwrite the existing updatee with an indirection to $a$ and
+ overwrite the updatee field with $a$.  (Overwriting $a$ with an indirection to the updatee also
+ works.)  This results in update chains of maximum length 2. 
+\end{itemize}
  
-\item Optional information for ``Ticky Ticky'' statistics: @TSO_STK_HWM@ is
-  the maximum number of words allocated to this thread.
  
-\item Optional information for profiling: 
-  @TSO_CCC@ is the current cost centre.
+\item[ Returning a constructor ].
  
-\item Optional information for parallel execution:
-\begin{itemize}
+\begin{tabular}{|llrrrrr|}
+\hline
+       & [ENTER]               & $a : @HUGS_RET@ : \alts : s$  & $su$ & $h[a \mapsto C\{\ws\}]$        & $hp$ & $\sigma$ \\
+\next  & $\alts.\entry$        & $a:s$                         & $su$ & $h$                            & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
  
-\item The types of threads (@TSO_TYPE@):
-\begin{description}
-\item[@T_MAIN@]     Must be executed locally.
-\item[@T_REQUIRED@] A required thread  -- may be exported.
-\item[@T_ADVISORY@] An advisory thread -- may be exported.
-\item[@T_FAIL@]     A failure thread   -- may be exported.
-\end{description}
  
-\item I've no idea what else
+\item[ Entering an indirection node ].
  
-\end{itemize}
+\begin{tabular}{|llrrrrr|}
+\hline
+       & [ENTER]       & $a  : s$      & $su$ & $h[a \mapsto \Ind{a'}]$        & $hp$ & $\sigma$ \\
+\next  & [ENTER]       & $a' : s$      & $su$ & $h$                            & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
  
-\item Optional information for GranSim execution:
-\begin{itemize}
-\item locked         
-\item sparkname         
-\item started at        
-\item exported  
-\item basic blocks      
-\item allocs    
-\item exectime  
-\item fetchtime         
-\item fetchcount        
-\item blocktime         
-\item blockcount        
-\item global sparks     
-\item local sparks      
-\item queue             
-\item priority  
-\item clock          (gransim light only)
-\end{itemize}
+\item[Entering GHC closure].
  
+\begin{tabular}{|llrrrrr|}
+\hline
+       & [ENTER]       & $a : s$       & $su$ & $h[a \mapsto \GHCOBJ]$         & $hp$ & $\sigma$ \\
+\next  & [ENTERGHC]    & $a : s$       & $su$ & $h$                            & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
  
-Here are the various queues for GrAnSim-type events.
-@
-Q_RUNNING   
-Q_RUNNABLE  
-Q_BLOCKED   
-Q_FETCHING  
-Q_MIGRATING 
-@
+\item[Returning a constructor to GHC].
  
-\end{itemize}
+\begin{tabular}{|llrrrrr|}
+\hline
+       & [ENTER]       & $a : \GHCRET : s$     & $su$ & $h[a \mapsto C \ws]$   & $hp$ & $\sigma$ \\
+\next  & [ENTERGHC]    & $a : \GHCRET : s$     & $su$ & $h$                    & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+
+\end{description}
  
-\subsection{Other weird objects}
-\label{sect:SPARK}
-\label{sect:BLOCKED_FETCH}
+
+\subsection{Updates}
  
  \begin{description}
-\item[@BlockedFetch@ heap objects (`closures')] (parallel only)
  
-@BlockedFetch@s are inbound fetch messages blocked on local closures.
-They arise as entries in a local blocking queue when a fetch has been
-received for a local black hole.  When awakened, we look at their
-contents to figure out where to send a resume.
+\item[ Updating with a constructor].
  
-A @BlockedFetch@ closure has the form:
-\begin{center}
-\begin{tabular}{|l|l|l|l|l|l|}\hline
-{\em Fixed header} & link & node & gtid & slot & weight \\ \hline
+\begin{tabular}{|llrrrrr|}
+\hline
+       & [ENTER]       & $a : @UPD_RET@ : ua : s$      & $su$ & $h[a \mapsto C\{\ws\}]$  & $hp$ & $\sigma$ \\
+\next  & [ENTER]       & $a \append s$                 & $su$ & $h[au \mapsto \Ind{a}$   & $hp$ & $\sigma$ \\
+\hline
  \end{tabular}
-\end{center}
  
-\item[Spark Closures] (parallel only)
+\item[ Argument checks].
  
-Spark closures are used to link together all closures in the spark pool.  When
-the current processor is idle, it may choose to speculatively evaluate some of
-the closures in the pool.  It may also choose to delete sparks from the pool.
-\begin{center}
-\begin{tabular}{|l|l|l|l|l|l|}\hline
-{\em Fixed header} & {\em Spark pool link} & {\em Sparked closure} \\ \hline
+\begin{tabular}{|llrrrrr|}
+\hline
+       & ARGCHECK $m$:$\is$    & $a : \as \append s$   & $su$ & $h$    & $hp$ & $\sigma$ \\
+\next  & $\is$                 & $a : \as \append s$   & $su$ & $h'$   & $hp$ & $\sigma$ \\
+\hline
  \end{tabular}
-\end{center}
+\\
+where $m \ge (su - sp)$
  
+\begin{tabular}{|llrrrrr|}
+\hline
+       & ARGCHECK $m$:$\is$    & $a : \as \append @UPD_RET@:su:ua:s$   & $su$ & $h$    & $hp$ & $\sigma$ \\
+\next  & $\is$                 & $a : \as \append s$                   & $su$ & $h'$   & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+\\
+where $m < (su - sp)$ and
+      $h' = h[ua \mapsto \Ind{a'}, a' \mapsto \PAP(a,\reverse\ \as) ]$
  
-\end{description}
-
+Again, we reverse the list of values as we transfer them from the
+stack to the heap --- reflecting the fact that the stack and heap grow
+in different directions.
  
-\subsection{Stack Objects}
-\label{sect:STACK_OBJECT}
-\label{sect:stacks}
+\end{description}
  
-These are ``stack objects,'' which are used in the threaded world as
-the stack for each thread is allocated from the heap in smallish
-chunks.  (The stack in the sequential world is allocated outside of
-the heap.)
+\subsection{Branches}
  
-Each reduction thread has to have its own stack space.  As there may
-be many such threads, and as any given one may need quite a big stack,
-a naive give-'em-a-big-stack-and-let-'em-run approach will cost a {\em
-lot} of memory.
+\begin{description}
  
-Our approach is to give a thread a small stack space, and then link
-on/off extra ``chunks'' as the need arises.  Again, this is a
-storage-management problem, and, yet again, we choose to graft the
-whole business onto the existing heap-management machinery.  So stack
-objects will live in the heap, be garbage collected, etc., etc..
+\item[ Testing a constructor ].
  
-A stack object is laid out like this:
+\begin{tabular}{|llrrrrr|}
+\hline
+       & TEST $tag$ $is'$ : $is$       & $a : s$       & $su$ & $h[a \mapsto C\ \ws]$  & $hp$ & $\sigma$ \\
+\next  & $is$                          & $a : s$       & $su$ & $h$                    & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+\\
+where $C.\tag = tag$
  
-\begin{center}
-\begin{tabular}{|l|}
+\begin{tabular}{|llrrrrr|}
+\hline
+       & TEST $tag$ $is'$ : $is$       & $a : s$       & $su$ & $h[a \mapsto C\ \ws]$  & $hp$ & $\sigma$ \\
+\next  & $is'$                         & $a : s$       & $su$ & $h$                    & $hp$ & $\sigma$ \\
  \hline
-{\em Fixed header} 
-\\ \hline
-{\em Link to next stack object (0 for last)}
-\\ \hline
-{\em N, the payload size in words}
-\\ \hline
-{\em @Sp@ (byte offset from head of object)}
-\\ \hline
-{\em @Su@ (byte offset from head of object)}
-\\ \hline
-{\em Payload (N words)}
-\\ \hline
  \end{tabular}
-\end{center}
+\\
+where $C.\tag \neq tag$
  
-\ToDo{Are stack objects on the mutable list?}
+\end{description}
  
-The stack grows downwards, towards decreasing
-addresses.  This makes it easier to print out the stack
-when debugging, and it means that a return address is
-at the lowest address of the chunk of stack it ``knows about''
-just like an info pointer on a closure.
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\subsection{Heap and stack checks}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  
-The garbage collector needs to be able to find all the
-pointers in a stack.  How does it do this?
+\begin{tabular}{|llrrrrr|}
+\hline
+       & STACKCHECK $stk$:$\is$        & $s$   & $su$ & $h$    & $hp$ & $\sigma$ \\
+\next  & $\is$                         & $s$   & $su$ & $h$    & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+\\
+if $s$ has $stk$ free slots.
  
-\begin{itemize}
+\begin{tabular}{|llrrrrr|}
+\hline
+       & HEAPCHECK $hp$:$\is$          & $s$   & $su$ & $h$    & $hp$ & $\sigma$ \\
+\next  & $\is$                         & $s$   & $su$ & $h$    & $hp$ & $\sigma$ \\
+\hline
+\end{tabular}
+\\
+if $h$ has $hp$ free slots.
  
-\item Within the stack there are return addresses, pushed
-by @case@ expressions.  Below a return address (i.e. at higher
-memory addresses, since the stack grows downwards) is a chunk
-of stack that the return address ``knows about'', namely the
-activation record of the currently running function.
+If either check fails, we push the current bco ($\sigma$) onto the
+stack and return to the scheduler.  When the scheduler has fixed the
+problem, it pops the top object off the stack and reenters it.
  
-\item Below each such activation record is a {\em pending-argument
-section}, a chunk of
-zero or more words that are the arguments to which the result
-of the function should be applied.  The return address does not
-statically
-``know'' how many pending arguments there are, or their types.
-(For example, the function might return a result of type $\alpha$.)
  
-\item Below each pending-argument section is another return address,
-and so on.  Actually, there might be an update frame instead, but we
-can consider update frames as a special case of a return address with
-a well-defined activation record.
+Optimisations:
+\begin{itemize}
+\item The bytecode CHECK1000 conservatively checks for 1000 words of heap space and 1000 words of stack space.
+      We use it to reduce code space and instruction decoding time.
+\item The bytecode HEAPCHECK1000 conservatively checks for 1000 words of heap space.
+      It is used in case alternatives.
+\end{itemize}
  
-\ToDo{Doesn't it {\em have} to be an update frame?  After all, the arg
-satisfaction check is @Su - Sp >= ...@.}
  
-\end{itemize}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\subsection{Primops}
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  
-The game plan is this.  The garbage collector
-walks the stack from the top, traversing pending-argument sections and
-activation records alternately.  Next we discuss how it finds
-the pointers in each of these two stack regions.
+\ToDo{primops take m words and return n words. The expect boxed arguments on the stack.}
  
  
-\subsubsection{Activation records}\label{sect:activation-records}
+\section{The Machine Code Evaluator}
  
-An {\em activation record} is a contiguous chunk of stack,
-with a return address as its first word, followed by as many
-data words as the return address ``knows about''.  The return
-address is actually a fully-fledged info pointer.  It points
-to an info table, replete with:
+This section describes the framework in which compiled code evaluates
+expressions.  Only at certain points will compiled code need to be
+able to talk to the interpreted world; these are discussed in Section
+\ref{sect:switching-worlds}.
  
-\begin{itemize}
-\item entry code (i.e. the code to return to).
-\item @INFO_TYPE@ is either @RET_SMALL/RET_VEC_SMALL@ or @RET_BIG/RET_VEC_BIG@, depending
-on whether the activation record has more than 32 data words (\note{64 for 8-byte-word architectures}) and on whether 
-to use a direct or a vectored return.
-\item @INFO_SM@ for @RET_SMALL@ is a bitmap telling the layout
-of the activation record, one bit per word.  The least-significant bit
-describes the first data word of the record (adjacent to the fixed
-header) and so on.  A ``@1@'' indicates a non-pointer, a ``@0@''
-indicates
-a pointer.  We don't need to indicate exactly how many words there
-are,
-because when we get to all zeros we can treat the rest of the 
-activation record as part of the next pending-argument region.
+\subsection{Calling conventions}
  
-For @RET_BIG@ the @INFO_SM@ field points to a block of bitmap
-words, starting with a word that tells how many words are in
-the block.
+\subsubsection{The call/return registers}
  
-\item @INFO_SRT@ is the Static Reference Table for the return
-address (Section~\ref{sect:srt}).
-\end{itemize}
+One of the problems in designing a virtual machine is that we want it
+abstract away from tedious machine details but still reveal enough of
+the underlying hardware that we can make sensible decisions about code
+generation.  A major problem area is the use of registers in
+call/return conventions.  On a machine with lots of registers, it's
+cheaper to pass arguments and results in registers than to pass them
+on the stack.  On a machine with very few registers, it's cheaper to
+pass arguments and results on the stack than to use ``virtual
+registers'' in memory.  We therefore use a hybrid system: the first
+$n$ arguments or results are passed in registers; and the remaining
+arguments or results are passed on the stack.  For register-poor
+architectures, it is important that we allow $n=0$.
  
-The activation record is a fully fledged closure too.
-As well as an info pointer, it has all the other attributes of
-a fixed header (Section~\ref{sect:fixed-header}) including a saved cost
-centre which is reloaded when the return address is entered.
+We'll label the arguments and results \Arg{1} \ldots \Arg{m} --- with
+the understanding that \Arg{1} \ldots \Arg{n} are in registers and
+\Arg{n+1} \ldots \Arg{m} are on top of the stack.
  
-In other words, all the attributes of closures are needed for
-activation records, so it's very convenient to make them look alike.
+Note that the mapping of arguments \Arg{1} \ldots \Arg{n} to machine
+registers depends on the \emph{kinds} of the arguments.  For example,
+if the first argument is a Float, we might pass it in a different
+register from if it is an Int.  In fact, we might find that a given
+architecture lets us pass varying numbers of arguments according to
+their types.  For example, if a CPU has 2 Int registers and 2 Float
+registers then we could pass between 2 and 4 arguments in machine
+registers --- depending on whether they all have the same kind or they
+have different kinds.
  
+\subsubsection{Entering closures}
+\label{sect:entering-closures}
  
-\subsubsection{Pending arguments}
+To evaluate a closure we jump to the entry code for the closure
+passing a pointer to the closure in \Arg{1} so that the entry code can
+access its environment.
  
-So that the garbage collector can correctly identify pointers
-in pending-argument sections we explicitly tag all non-pointers.
-Every non-pointer in a pending-argument section is preceded
-(at the next lower memory word) by a one-word byte count that
-says how many bytes to skip over (excluding the tag word).
+\subsubsection{Function call}
  
-The garbage collector traverses a pending argument section from 
-the top (i.e. lowest memory address).  It looks at each word in turn:
+The function-call mechanism is obviously crucial.  There are five different
+cases to consider:
+\begin{enumerate}
  
-\begin{itemize}
-\item If it is less than or equal to a small constant @MAX_STACK_TAG@
-then
-it treats it as a tag heralding zero or more words of non-pointers,
-so it just skips over them.
+\item \emph{Known combinator (function with no free variables) and enough arguments.}  
  
-\item If it points to the code segment, it must be a return
-address, so we have come to the end of the pending-argument section.
+A fast call can be made: push excess arguments onto stack and jump to
+function's \emph{fast entry point} passing arguments in \Arg{1} \ldots
+\Arg{m}.  
  
-\item Otherwise it must be a bona fide heap pointer.
-\end{itemize}
+The \emph{fast entry point} is only called with exactly the right
+number of arguments (in \Arg{1} \ldots \Arg{m}) so it can instantly
+start doing useful work without first testing whether it has enough
+registers or having to pop them off the stack first.
  
+\item \emph{Known combinator and insufficient arguments.}
  
-\subsection{The Stable Pointer Table}\label{sect:STABLEPTR_TABLE}
+A slow call can be made: push all arguments onto stack and jump to
+function's \emph{slow entry point}.
  
-A stable pointer is a name for a Haskell object which can be passed to
-the external world.  It is ``stable'' in the sense that the name does
-not change when the Haskell garbage collector runs---in contrast to
-the address of the object which may well change.
+Any unpointed arguments which are pushed on the stack must be tagged.
+This means pushing an extra word on the stack below the unpointed
+words, containing the number of unpointed words above it.
  
-A stable pointer is represented by an index into the
-@StablePointerTable@.  The Haskell garbage collector treats the
-@StablePointerTable@ as a source of roots for GC.
+%Todo: forward ref about tagging?
+%Todo: picture?
  
-In order to provide efficient access to stable pointers and to be able
-to cope with any number of stable pointers (eg $0 \ldots 100000$), the
-table of stable pointers is an array stored on the heap and can grow
-when it overflows.  (Since we cannot compact the table by moving
-stable pointers about, it seems unlikely that a half-empty table can
-be reduced in size---this could be fixed if necessary by using a
-hash table of some sort.)
+The \emph{slow entry point} might be called with insufficient arguments
+and so it must test whether there are enough arguments on the stack.
+This \emph{argument satisfaction check} consists of checking that
+@Su-Sp@ is big enough to hold all the arguments (including any tags).
  
-In general a stable pointer table closure looks like this:
+\begin{itemize} 
  
-\begin{center}
-\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|}
-\hline
-{\em Fixed header} & {\em No of pointers} & {\em Free} & $SP_0$ & \ldots & $SP_{n-1}$ 
-\\\hline
-\end{tabular}
-\end{center}
+\item If the argument satisfaction check fails, it is because there is
+one or more update frames on the stack before the rest of the
+arguments that the function needs.  In this case, we construct a PAP
+(partial application, section~\ref{sect:PAP}) containing the arguments
+which are on the stack.  The PAP construction code will return to the
+update frame with the address of the PAP in \Arg{1}.
  
-The fields are:
-\begin{description}
+\item If the argument satisfaction check succeeds, we jump to the fast
+entry point with the arguments in \Arg{1} \ldots \Arg{arity}.
  
-\item[@NPtrs@:] number of (stable) pointers.
+If the fast entry point expects to receive some of \Arg{i} on the
+stack, we can reduce the amount of movement required by making the
+stack layout for the fast entry point look like the stack layout for
+the slow entry point.  Since the slow entry point is entered with the
+first argument on the top of the stack and with tags in front of any
+unpointed arguments, this means that if \Arg{i} is unpointed, there
+should be space below it for a tag and that the highest numbered
+argument should be passed on the top of the stack.
  
-\item[@Free@:] the byte offset (from the first byte of the object) of the first free stable pointer.
+We usually arrange that the fast entry point is placed immediately
+after the slow entry point --- so we can just ``fall through'' to the
+fast entry point without performing a jump.
  
-\item[$SP_i$:] A stable pointer slot.  If this entry is in use, it is
-an ``unstable'' pointer to a closure.  If this entry is not in use, it
-is a byte offset of the next free stable pointer slot.
+\end{itemize}
  
-\end{description}
  
-When a stable pointer table is evacuated
-\begin{enumerate}
-\item the free list entries are all set to @NULL@ so that the evacuation
-  code knows they're not pointers;
+\item \emph{Known function closure (function with free variables) and enough arguments.}
  
-\item The stable pointer slots are scanned linearly: non-@NULL@ slots
-are evacuated and @NULL@-values are chained together to form a new free list.
-\end{enumerate}
+A fast call can be made: push excess arguments onto stack and jump to
+function's \emph{fast entry point} passing a pointer to closure in
+\Arg{1} and arguments in \Arg{2} \ldots \Arg{m+1}.
  
-There's no need to link the stable pointer table onto the mutable
-list because we always treat it as a root.
+Like the fast entry point for a combinator, the fast entry point for a
+closure is only called with appropriate values in \Arg{1} \ldots
+\Arg{m+1} so we can start work straight away.  The pointer to the
+closure is used to access the free variables of the closure.
  
  
+\item \emph{Known function closure and insufficient arguments.}
  
-\section{The Storage Manager}
+A slow call can be made: push all arguments onto stack and jump to the
+closure's slow entry point passing a pointer to the closure in \Arg{1}.
  
-The generational collector remembers the depth of the last generation
-collected and the value of the heap pointer at the end of the last GC.
-If the mutator has not moved the heap pointer, that means that the
-amount of space recovered is insufficient to satisfy even one request
-and it is time to collect an older generation or report a heap overflow.
+Again, the slow entry point performs an argument satisfaction check
+and either builds a PAP or pops the arguments off the stack into
+\Arg{2} \ldots \Arg{m+1} and jumps to the fast entry point.
  
-A deeper collection is also triggered when a minor collection fails to
-recover at least @...@ bytes of space.
  
-When can a GC happen?
+\item \emph{Unknown function closure, thunk or constructor.}
  
+Sometimes, the function being called is not statically identifiable.
+Consider, for example, the @compose@ function:
  @
-- During updates (ie during returns)
-- When a heap check fails
-- When a stack check fails (concurrent system only)
-- When a context switch happens (concurrent system only)
-
-When do heap checks occur?
-- Immediately after entering a thunk
-- Immediately after entering a case alternative
-
-When do stack checks occur?
-- We calculate the worst-case stack usage of an entire
-  thunk so there's no need to put a check inside each alternative.
-- Immediately after entering a thunk
-  We can't make a similar worst-case calculation for heap usage
-  because the heap isn't used in a stacklike manner so any
-  evaluation inside a case might steal some of the heap we've
-  checked for.
-
-Concurrency
-- Threads can be blocked
-- Threads can be put to sleep
-  - Heap may move while we sleep
-  - Black holing may happen while we sleep (ie during GC)
+  compose f g x = f (g x)
  @
+Since @f@ and @g@ are passed as arguments to @compose@, the latter has
+to make a heap call.  In a heap call the arguments are pushed onto the
+stack, and the closure bound to the function is entered.  In the
+example, a thunk for @(g x)@ will be allocated, (a pointer to it)
+pushed on the stack, and the closure bound to @f@ will be
+entered. That is, we will jump to @f@s entry point passing @f@ in
+\Arg{1}.  If \Arg{1} is passed on the stack, it is pushed on top of
+the thunk for @(g x)@.
  
-\subsection{The SM state}
-
-Contains @Hp@, @HpLim@, @StablePtrTable@ plus version-specific info.
+The \emph{entry code} for an updateable thunk (which must have arity 0)
+pushes an update frame on the stack and starts executing the body of
+the closure --- using \Arg{1} to access any free variables.  This is
+described in more detail in section~\ref{sect:data-updates}.
  
-\begin{itemize}
+The \emph{entry code} for a non-updateable closure is just the
+closure's slow entry point.
  
-\item Static Object list 
-\item Foreign Object list
-\item Stable Pointer Table
+\end{enumerate}
  
-\end{itemize}
+In addition to the above considerations, if there are \emph{too many}
+arguments then the extra arguments are simply pushed on the stack with
+appropriate tags.
  
-In addition, the generational collector requires:
+To summarise, a closure's standard (slow) entry point performs the following:
+\begin{description}
+\item[Argument satisfaction check.] (function closure only)
+\item[Stack overflow check.]
+\item[Heap overflow check.]
+\item[Copy free variables out of closure.] %Todo: why?
+\item[Eager black holing.] (updateable thunk only) %Todo: forward ref.
+\item[Push update frame.]
+\item[Evaluate body of closure.]
+\end{description}
  
-\begin{itemize}
  
-\item Old Generation Indirection list
-\item Mutables list --- list of mutable objects in the old generation.
-\item @OldLim@ --- the boundary between the generations
-\item Old Foreign Object list --- foreign objects in the old generation
+\subsection{Case expressions and return conventions}
+\label{sect:return-conventions}
  
-\end{itemize}
+The \emph{evaluation} of a thunk is always initiated by
+a @case@ expression.  For example:
+@
+  case x of (a,b) -> E
+@
  
-It is passed a table of {\em roots\/} containing
+The code for a @case@ expression looks like this:
  
  \begin{itemize}
-
-\item All runnable TSOs
-
+\item Push the free variables of the branches on the stack (fv(@E@) in
+this case).
+\item  Push a \emph{return address} on the stack.
+\item  Evaluate the scrutinee (@x@ in this case).
  \end{itemize}
  
+Once evaluation of the scrutinee is complete, execution resumes at the
+return address, which points to the code for the expression @E@.
  
-In the parallel system, there must be some extra magic associated with
-global GC.
+When execution resumes at the return point, there must be some {\em
+return convention} that defines where the components of the pair, @a@
+and @b@, can be found.  The return convention varies according to the
+type of the scrutinee @x@:
  
-\subsection{The SM interface}
+\begin{itemize}
  
-@initSM@ finalizes any runtime parameters of the storage manager.
+\item 
  
-@exitSM@ does any cleaning up required by the storage manager before
-the program is executed. Its main purpose is to print any summary
-statistics.
+(A space for) the return address is left on the top of the stack.
+Leaving the return address on the stack ensures that the top of the
+stack contains a valid activation record
+(section~\ref{sect:activation-records}) --- should a garbage collection
+be required.
  
-@initHeap@ allocates the heap. It initialises the @hp@ and @hplim@
-fields of @sm@ to represent an empty heap for the compiled-in garbage
-collector.  It also initialises @CAFlist@ to be the empty list. If we
-are using Appel's collector it also initialises the @OldLim@ field.
-It also initialises the stable pointer table and the @ForeignObjList@
-(and @OldForeignObjList@) fields.
+\item If @x@ has a boxed type (e.g.~a data constructor or a function),
+a pointer to @x@ is returned in \Arg{1}.
  
-@collectHeap@ invokes the garbage collector.  @collectHeap@ requires
-all the fields of @sm@ to be initialised appropriately (from the
-STG-machine registers).  The following are identified as heap roots:
-\begin{itemize}
-\item The updated CAFs recorded in @CAFlist@.
-\item Any pointers found on the stack.
-\item All runnable and sleeping TSOs.
-\item The stable pointer table.
-\end{itemize}
+\ToDo{Warn that components of E should be extracted as soon as
+possible to avoid a space leak.}
  
-There are two possible results from a garbage collection:
-\begin{description} 
-\item[@GC_FAIL@] 
-The garbage collector is unable to free up any more space.
+\item If @x@ is an unboxed type (e.g.~@Int#@ or @Float#@), @x@ is
+returned in \Arg{1}
  
-\item[@GC_SUCCESS@]
-The garbage collector managed to free up more space.
+\item If @x@ is an unboxed tuple constructor, the components of @x@
+are returned in \Arg{1} \ldots \Arg{n} but no object is constructed in
+the heap.  
  
-\begin{itemize} 
-\item @hp@ and @hplim@ will indicate the new space available for
-allocation.
+When passing an unboxed tuple to a function, the components are
+flattened out and passed in \Arg{1} \ldots \Arg{n} as usual.
  
-\item The elements of @CAFlist@ and the stable pointers will be
-updated to point to the new locations of the closures they reference.
+\end{itemize}
  
-\item Any members of @ForeignObjList@ which became garbage should have
-been reported (by calling their finalising routines; and the
-@(Old)ForeignObjList@ updated to contain only those Foreign objects
-which are still live.  
+\subsection{Vectored Returns}
  
-\end{itemize}
+Many algebraic data types have more than one constructor.  For
+example, the @Maybe@ type is defined like this:
+@
+  data Maybe a = Nothing | Just a
+@
+How does the return convention encode which of the two constructors is
+being returned?  A @case@ expression scrutinising a value of @Maybe@
+type would look like this: 
+@
+  case E of 
+    Nothing -> ...
+    Just a  -> ...
+@
+Rather than pushing a return address before evaluating the scrutinee,
+@E@, the @case@ expression pushes (a pointer to) a \emph{return
+vector}, a static table consisting of two code pointers: one for the
+@Just@ alternative, and one for the @Nothing@ alternative.  
  
-\end{description}
+\begin{itemize}
  
+\item
  
-%************************************************************************
-%*                                                                     *
-\subsection{``What really happens in a garbage collection?''}
-%*                                                                     *
-%************************************************************************
+The constructor @Nothing@ returns by jumping to the first item in the
+return vector with a pointer to a (statically built) Nothing closure
+in \Arg{1}.  
  
-This is a brief tutorial on ``what really happens'' going to/from the
-storage manager in a garbage collection.
+It might seem that we could avoid loading \Arg{1} in this case since the
+first item in the return vector will know that @Nothing@ was returned
+(and can easily access the Nothing closure in the (unlikely) event
+that it needs it.  The only reason we load \Arg{1} is in case we have to
+perform an update (section~\ref{sect:data-updates}).
  
-\begin{description}
-%------------------------------------------------------------------------
-\item[The heap check:]
+\item 
  
-[OLD-ISH: WDP]
+The constructor @Just@ returns by jumping to the second element of the
+return vector with a pointer to the closure in \Arg{1}.  
  
-If you gaze into the C output of GHC, you see many macros calls like:
-\begin{verbatim}
-HEAP_CHK_2PtrsLive((_FHS+2));
-\end{verbatim}
+\end{itemize}
  
-This expands into the C (roughly speaking...):
-@
-Hp = Hp + (_FHS+2);    /* optimistically move heap pointer forward */
+In this way no test need be made to see which constructor returns;
+instead, execution resumes immediately in the appropriate branch of
+the @case@.
  
-GC_WHILE_OR_IF (HEAP_OVERFLOW_OP(Hp, HpLim) OR_INTERVAL_EXPIRED) {
-       STGCALL2_GC(PerformGC, <liveness-bits>, (_FHS+2));
-}
-@
+\subsection{Direct Returns}
  
-In the parallel world, where we will need to re-try the heap check,
-@GC_WHILE_OR_IF@ will be a ``while''; in the sequential world, it will
-be an ``if''.
+When a datatype has a large number of constructors, it may be
+inappropriate to use vectored returns.  The vector tables may be
+large and sparse, and it may be better to identify the constructor
+using a test-and-branch sequence on the tag.  For this reason, we
+provide an alternative return convention, called a \emph{direct
+return}.
  
-The ``heap lookahead'' checks, which are similar and used for
-multi-precision @Integer@ ops, have some further complications.  See
-the commentary there (@StgMacros.lh@).
+In a direct return, the return address pushed on the stack really is a
+code pointer.  The returning code loads a pointer to the closure being
+returned in \Arg{1} as usual, and also loads the tag into \Arg{2}.
+The code at the return address will test the tag and jump to the
+appropriate code for the case branch.
  
-%------------------------------------------------------------------------
-\item[Into @callWrapper_GC@...:]
+The choice of whether to use a vectored return or a direct return is
+made on a type-by-type basis --- up to a certain maximum number of
+constructors imposed by the update mechanism
+(section~\ref{sect:data-updates}).
  
-When we failed the heap check (above), we were inside the
-GCC-registerised ``threaded world.''  @callWrapper_GC@ is all about
-getting in and out of the threaded world.  On SPARCs, with register
-windows, the name of the game is not shifting windows until we have
-what we want out of the old one.  In tricky cases like this, it's best
-written in assembly language.
+Single-constructor data types also use direct returns, although in
+that case there is no need to return a tag in \Arg{2}.
  
-Performing a GC (potentially) means giving up the thread of control.
-So we must fill in the thread-state-object (TSO) [and its associated
-stk object] with enough information for later resumption:
-\begin{enumerate}
-\item
-Save the return address in the TSO's PC field.
-\item
-Save the machine registers used in the STG threaded world in their
-corresponding TSO fields.  We also save the pointer-liveness
-information in the TSO.
-\item
-The registers that are not thread-specific, notably @Hp@ and
-@HpLim@, are saved in the @StorageMgrInfo@ structure.
-\item
-Call the routine it was asked to call; in this example, call
-@PerformGC@ with arguments @<liveness>@ and @_FHS+2@ (some constant)...
+\ToDo{Say whether we pop the return address before returning}
  
-\end{enumerate}
+\ToDo{Stack stubbing?}
  
-%------------------------------------------------------------------------
-\item[Into the heap overflow wrapper, @PerformGC@ [parallel]:]
+\subsection{Updates}
+\label{sect:data-updates}
  
-Most information has already been saved in the TSO.
+The entry code for an updatable thunk (which must be of arity 0):
  
-\begin{enumerate}
-\item
-The first argument (@<liveness>@, in our example) say what registers
-are live, i.e., are ``roots'' the storage manager needs to know.
-\begin{verbatim}
-StorageMgrInfo.rootno  = 2;
-StorageMgrInfo.roots[0]        = (P_) Ret1_SAVE;
-StorageMgrInfo.roots[1]        = (P_) Ret2_SAVE;
-\end{verbatim}
+\begin{itemize}
+\item copies the free variables out of the thunk into registers or
+  onto the stack.
+\item pushes an \emph{update frame} onto the stack.
  
-\item
-We move the heap-pointer back [we had optimistically
-advanced it, in the initial heap check]
+An update frame is a small activation record consisting of
+\begin{center}
+\begin{tabular}{|l|l|l|}
+\hline
+\emph{Fixed header} & \emph{Update Frame link} & \emph{Updatee} \\
+\hline
+\end{tabular}
+\end{center}
+
+\note{In the semantics part of the STG paper (section 5.6), an update
+frame consists of everything down to the last update frame on the
+stack.  This would make sense too --- and would fit in nicely with
+what we're going to do when we add support for speculative
+evaluation.}
+\ToDo{I think update frames contain cost centres sometimes}
  
  \item 
-We load up the @smInfo@ data from the STG registers' @*_SAVE@ locations.
+If we are doing ``eager blackholing,'' we then overwrite the thunk
+with a black hole.  Otherwise, we leave it to the garbage collector to
+black hole the thunk.
  
-\item
-We mark on the scheduler's big ``blackboard'' that a GC is
-required.
+\item 
+Start evaluating the body of the expression.
  
-\item
-We reschedule, i.e., this thread gives up control.  (The scheduler
-will presumably initiate a garbage-collection, but it may have to do
-any number of other things---flushing, for example---before ``normal
-execution'' resumes; and it most certainly may not be this thread that
-resumes at that point!)
-\end{enumerate}
+\end{itemize}
  
-IT IS AT THIS POINT THAT THE WORLD IS COMPLETELY TIDY.
+When the expression finishes evaluation, it will enter the update
+frame on the top of the stack.  Since the returner doesn't know
+whether it is entering a normal return address/vector or an update
+frame, we follow exactly the same conventions as return addresses and
+return vectors.  That is, on entering the update frame:
  
-%------------------------------------------------------------------------
-\item[Out of @callWrapper_GC@ [parallel]:]
+\begin{itemize} 
+\item The value of the thunk is in \Arg{1}.  (Recall that only thunks
+are updateable and that thunks return just one value.)
  
-When this thread is finally resumed after GC (and who knows what
-else), it will restart by the normal enter-TSO/enter-stack-object
-sequence, which has the effect of re-loading the registers, etc.,
-(i.e., restoring the state).
+\item If the data type is a direct-return type rather than a
+vectored-return type, then the tag is in \Arg{2}.
  
-Because the address we saved in the TSO's PC field was that at the end
-of the heap check, and because the check is a while-loop in the
-parallel system, we will now loop back around, and make sure there is
-enough space before continuing.
-\end{description}
+\item The update frame is still on the stack.
+\end{itemize}
  
+We can safely share a single statically-compiled update function
+between all types.  However, the code must be able to handle both
+vectored and direct-return datatypes.  This is done by arranging that
+the update code looks like this:
  
+@
+                |       ^       |
+                | return vector |
+                |---------------|
+                |  fixed-size   |
+                |  info table   |
+                |---------------|  <- update code pointer
+                |  update code  |
+                |       v       |
+@
  
-\subsection{Static Reference Tables (SRTs)}
-\label{sect:srt}
-\label{sect:CAF}
-\label{sect:static-objects}
+Each entry in the return vector (which is large enough to cover the
+largest vectored-return type) points to the update code.
  
-In the above, we assumed that objects always contained pointers to all
-their free variables.  In fact, this isn't quite true: GHC omits
-pointers to top-level objects and allocates their closures in static
-memory.  This optimisation reduces the number of free variables in
-heap objects - reducing memory usage and the effort needed to put them
-into heap objects.  However, this optimisation comes at a cost: we
-need to complicate the garbage collector with machinery for tracing
-these static references.
+The update code:
+\begin{itemize}
+\item overwrites the \emph{updatee} with an indirection to \Arg{1};
+\item loads @Su@ from the Update Frame link;
+\item removes the update frame from the stack; and 
+\item enters \Arg{1}.
+\end{itemize}
  
-Early versions of GHC used a very simple algorithm: it treated all
-static objects as roots.  This is safe in the sense that no object is
-ever deallocated if there's a chance that it might be required later
-but can lead to some terrible space leaks.  For example, this program
-ought to be able to run in constant space but, because @xs@ is never
-deallocated, it runs in linear space.
+We enter \Arg{1} again, having probably just come from there, because
+it knows whether to perform a direct or vectored return.  This could
+be optimised by compiling special update code for each slot in the
+return vector, which performs the correct return.
  
-@
-main = print xs
-xs = [1..]
-@
+\subsection{Semi-tagging}
+\label{sect:semi-tagging}
  
-The correct behaviour is for the garbage collector to keep a static
-object alive iff it might be required later in execution.  That is, if
-it is reachable from any live heap objects {\em or\/} from any return
-addresses found on the stack or from the current program counter.
-Since it is obviously infeasible for the garbage collector to scan
-machine code looking for static references, the code generator must
-generate a table of all static references in any piece of code (and we
-must place a pointer to this table next to any piece of code we
-generate).
+When a @case@ expression evaluates a variable that might be bound
+to a thunk it is often the case that the scrutinee is already evaluated.
+In this case we have paid the penalty of (a) pushing the return address (or
+return vector address) on the stack, (b) jumping through the info pointer
+of the scrutinee, and (c) returning by an indirect jump through the
+return address on the stack.
  
-Here's what the SRT has to contain:
+If we knew that the scrutinee was already evaluated we could generate
+(better) code which simply jumps to the appropriate branch of the
+@case@ with a pointer to the scrutinee in \Arg{1}.  (For direct
+returns to multiconstructor datatypes, we might also load the tag into
+\Arg{2}).
  
+An obvious idea, therefore, is to test dynamically whether the heap
+closure is a value (using the tag in the info table).  If not, we
+enter the closure as usual; if so, we jump straight to the appropriate
+alternative.  Here, for example, is pseudo-code for the expression
+@(case x of { (a,_,c) -> E }@:
  @
-...
+      \Arg{1} = <pointer to x>;
+      tag = \Arg{1}->entry->tag;
+      if (isWHNF(tag)) {
+          Sp--;  \\ insert space for return address
+          goto ret;
+      }
+      push(ret);           
+      goto \Arg{1}->entry;
+      
+      <info table for return address goes here>
+ret:  a = \Arg{1}->data1; \\ suck out a and c to avoid space leak
+      c = \Arg{1}->data3;
+      <code for E2>
  @
-
-Here's how we represent it:
-
+and here is the code for the expression @(case x of { [] -> E1; x:xs -> E2 }@:
  @
-...
-must be able to handle 0 references well
+      \Arg{1} = <pointer to x>;
+      tag = \Arg{1}->entry->tag;
+      if (isWHNF(tag)) {
+          Sp--;  \\ insert space for return address
+          goto retvec[tag];
+      }
+      push(retinfo);          
+      goto \Arg{1}->entry;
+      
+      .addr ret2
+      .addr ret1
+retvec:           \\ reversed return vector
+      <return info table for case goes here>
+retinfo:
+      panic("Direct return into vectored case");
+      
+ret1: <code for E1>
+
+ret2: x  = \Arg{1}->head;
+      xs = \Arg{1}->tail;
+      <code for E2>
  @
+There is an obvious cost in compiled code size (but none in the size
+of the bytecodes).  There is also a cost in execution time if we enter
+more thunks than data constructors.
  
+Both the direct and vectored returns are easily modified to chase chains
+of indirections too.  In the vectored case, this is most easily done by
+making sure that @IND = TAG_1 - 1@, and adding an extra field to every
+return vector.  In the above example, the indirection code would be
  @
-Other trickery:
-o The CAF list
-o The scavenge list
-o Generational GC trickery
+ind:  \Arg{1} = \Arg{1}->next;
+      goto ind_loop;
  @
+where @ind_loop@ is the second line of code.
  
-\subsection{Space leaks and black holes}
-\label{sect:black-hole}
-
-\iffalse
+Note that we have to leave space for a return address since the return
+address expects to find one.  If the body of the expression requires a
+heap check, we will actually have to write the return address before
+entering the garbage collector.
  
-\ToDo{Insert text stolen from update paper}
  
-\else
+\subsection{Heap and Stack Checks}
+\label{sect:heap-and-stack-checks}
+
+The storage manager detects that it needs to garbage collect the old
+generation when the evaluator requests a garbage collection without
+having moved the heap pointer since the last garbage collection.  It
+is therefore important that the GC routines \emph{not} move the heap
+pointer unless the heap check fails.  This is different from what
+happens in the current STG implementation.
+
+Assuming that the stack can never shrink, we perform a stack check
+when we enter a closure but not when we return to a return
+continuation.  This doesn't work for heap checks because we cannot
+predict what will happen to the heap if we call a function.
+
+If we wish to allow the stack to shrink, we need to perform a stack
+check whenever we enter a return continuation.  Most of these checks
+could be eliminated if the storage manager guaranteed that a stack
+would always have 1000 words (say) of space after it was shrunk.  Then
+we can omit stack checks for less than 1000 words in return
+continuations.
+
+When an argument satisfaction check fails, we need to push the closure
+(in R1) onto the stack - so we need to perform a stack check.  The
+problem is that the argument satisfaction check occurs \emph{before}
+the stack check.  The solution is that the caller of a slow entry
+point or closure will guarantee that there is at least one word free
+on the stack for the callee to use.  
+
+Similarily, if a heap or stack check fails, we need to push the arguments
+and closure onto the stack.  If we just came from the slow entry point, 
+there's certainly enough space and it is the responsibility of anyone
+using the fast entry point to guarantee that there is enough space.
+
+\ToDo{Be more precise about how much space is required - document it
+in the calling convention section.}
  
-A program exhibits a {\em space leak} if it retains storage that is
-sure not to be used again.  Space leaks are becoming increasingly
-common in imperative programs that @malloc@ storage and fail
-subsequently to @free@ it.  They are, however, also common in
-garbage-collected systems, especially where lazy evaluation is
-used.[.wadler leak, runciman heap profiling jfp.]
+\subsection{Handling interrupts/signals}
  
-Quite a bit of experience has now accumulated suggesting that
-implementors must be very conscientious about avoiding gratuitous
-space leaks --- that is, ones which are an accidental artefact of some
-implementation technique.[.appel book.]  The update mechanism is
-a case in point, as <.jones jfp leak.> points out.  Consider a thunk for
-the expression
-@
-  let xs = [1..1000] in last xs
-@ 
-where @last@ is a function that returns the last element of its
-argument list.  When the thunk is entered it will call @last@, which
-will consume @xs@ until it finds the last element.  Since the list
-@[1..1000]@ is produced lazily one might reasonably expect the
-expression to evaluate in constant space.  But {\em until the moment
-of update, the thunk itself still retains a pointer to the beginning
-of the list @xs@}.  So, until the update takes place the whole list
-will be retained!
-
-Of course, this is completely gratuitous.  The pointer to @xs@ in the
-thunk will never be used again.  In <.peyton stock hardware.> the solution to
-this problem that we advocated was to overwrite a thunk's info with a
-fixed ``black hole'' info pointer, {\em at the moment of entry}.  The
-storage management information attached to a black-hole info pointer
-tells the garbage collector that the closure contains no pointers,
-thereby plugging the space leak.
-
-\subsubsection{Lazy black-holing}
-
-Black-holing is a well-known idea.  The trouble is that it is
-gallingly expensive.  To avoid the occasional space leak, for every
-single thunk entry we have to load a full-word literal constant into a
-register (often two instructions) and then store that register into a
-memory location.  
-
-Fortunately, this cost can easily be avoided.  The
-idea is simple: {\em instead of black-holing every thunk on entry,
-wait until the garbage collector is called, and then black-hole all
-(and only) the thunks whose evaluation is in progress at that moment}.
-There is no benefit in black-holing a thunk that is updated before
-garbage collection strikes!  In effect, the idea is to perform the
-black-holing operation lazily, only when it is needed.  This
-dramatically cuts down the number of black-holing operations, as our
-results show {\em forward ref}.
-
-How can we find all the thunks whose evaluation is in progress?  They
-are precisely the ones for which update frames are on the stack.  So
-all we need do is find all the update frames (via the @Su@ chain) and
-black-hole their thunks right at the start of garbage collection.
-Notice that it is not enough to refrain from treating update frames as
-roots: firstly because the thunks to which they point may need to be
-moved in a copying collector, but more importantly because the thunk
-might be accessible via some other route.
-
-\subsubsection{Detecting loops}
-
-Black-holing has a second minor advantage: evaluation of a thunk whose
-value depends on itself will cause a black hole closure to be entered,
-which can cause a suitable error message to be displayed. For example,
-consider the definition
  @
-  x = 1+x
+May have to keep C stack pointer in register to placate OS?
+May have to revert black holes - ouch!
  @
-The code to evaluate @x@'s right hand side will evaluate @x@.  In the
-absence of black-holing, the result will be a stack overflow, as the
-evaluator repeatedly pushes a return address and enters @x@.  If
-thunks are black-holed on entry, then this infinite loop can be caught
-almost instantly.
-
-With our new method of lazy black-holing, a self-referential program
-might cause either stack overflow or a black-hole error message,
-depending on exactly when garbage collection strikes.  It is quite
-easy to conceal these differences, however.  If stack overflow occurs,
-all we need do is examine the update frames on the stack to see if
-more than one refers to the same thunk.  If so, there is a loop that
-would have been detected by eager black-holing.
-
-\subsubsection{Lazy locking}
-\label{sect:lock}
-
-In a parallel implementation, it is necessary somehow to ``lock'' a
-thunk that is under evaluation, so that other parallel evaluators
-cannot simultaneously evaluate it and thereby duplicate work.
-Instead, an evaluator that enters a locked thunk should be blocked,
-and made runnable again when the thunk is updated.
-
-This locking is readily arranged in the same way as black-holing, by
-overwriting the thunk's info pointer with a special ``locked'' info
-pointer, at the moment of entry.  If another evaluator enters the
-thunk before it has been updated, it will land in the entry code for
-the ``locked'' info pointer, which blocks the evaluator and queues it
-on the locked thunk.
-
-The details are given by <.portable parallel trinder.>.  However, the close similarity
-between locking and black holing suggests the following question: can
-locking be done lazily too?  The answer is that it can, except that
-locking can be postponed only until the next {\em context switch},
-rather than the next {\em garbage collection}.  We are assuming here
-that the parallel implementation does not use shared memory to allow
-two processors to access the same closure.  If such access is
-permitted then every thunk entry requires a hardware lock, and becomes
-much too expensive.
-
-Is lazy locking worth while, given that it requires extra work every
-context switch?  We believe it is, because contexts switches are
-relatively infrequent, and thousands of thunk-entries typically take
-place between each.
-
-{\em Measurements elsewhere.  Omit this section? If so, fix cross refs to here.}
  
-\fi
  
  
-\subsection{Squeezing identical updates}
+\section{The Loader}
+\section{The Compilers}
  
  \iffalse
+\part{Old stuff - needs to be mined for useful info}
  
-\ToDo{Insert text stolen from update paper}
-
-\else
+\section{The Scheduler}
  
-Consider the following Haskell definition of the standard
-function @partition@ that divides a list into two, those elements
-that satisfy a predicate @p@ and those that do not:
-@
-  partition :: (a->Bool) -> [a] -> ([a],[a])
-  partition p [] = ([],[])
-  partition p (x:xs) = if p x then (x:ys, zs)
-                              else (ys, x:zs)
-                     where
-                       (ys,zs) = partition p xs
-@
-By the time this definition has been desugared, it looks like this:
-@
-  partition p xs
-    = case xs of
-        [] -> ([],[])
-        (x:xs) -> let
-                    t = partition p xs
-                    ys = fst t
-                    zs = snd t
-                  in
-                  if p x then (x:ys,zs)
-                         else (ys,x:zs)
-@
-Lazy evaluation demands that the recursive call is bound to an
-intermediate variable, @t@, from which @ys@ and @zs@ are lazily
-selected. (The functions @fst@ and @snd@ select the first and second
-elements of a pair, respectively.)
-
-Now, suppose that @partition@ is applied to a list @[x1,x2]@,
-all of whose
-elements satisfy @p@.  We can get a good idea of what will happen
-at runtime by unrolling the recursion a few times in our heads.
-Unrolling once, and remembering that @(p x1)@ is @True@, we get this:
-@
-  partition p [x1,x2]
-=
-  let t1 = partition [x2]
-      ys1 = fst t1
-      zs1 = snd t1
-  in (x1:ys1, zs1)
-@
-Unrolling the rest of the way gives this:
-@
-  partition p [x1,x2]
-=
-  let t2  = ([],[])
-      ys2 = fst t2
-      zs2 = snd t2
-      t1  = (x2:ys2,zs2)
-      ys1 = fst t1
-      zs1 = snd t1
-   in (x1:ys1,zs1)
-@
-Now consider what happens if @zs1@ is evaluated.  It is bound to a
-thunk, which will push an update frame before evaluating the
-expression @snd t1@.  This expression in turn forces evaluation of
-@zs2@, which pushes an update frame before evaluating @snd t2@.
-Indeed the stack of update frames will grow as deep as the list is
-long when @zs1@ is evaluated.  This is rather galling, since all the
-thunks @zs1@, @zs2@, and so on, have the same value.
-
-\ToDo{Describe the state-transformer case in which we get a space leak from
-pending update frames.}
-
-The solution is simple.  The garbage collector, which is going to traverse the
-update stack in any case, can easily identify two update frames that are directly
-on top of each other.  The second of these will update its target with the same
-value as the first.  Therefore, the garbage collector can perform the update 
-right away, by overwriting one update target with an indirection to the second,
-and eliminate the corresponding update frame.  In this way ever-growing stacks of
-update frames are reduced to a single representative at garbage collection time.
-If this is done at the start of garbage collection then, if it turns out that
-some of these update targets are garbage they will be collected right away.
+The Scheduler is the heart of the run-time system.  A running program
+consists of a single running thread, and a list of runnable and
+blocked threads.  The running thread returns to the scheduler when any
+of the following conditions arises:
  
-\fi
+\begin{itemize}
+\item A heap check fails, and a garbage collection is required
+\item Compiled code needs to switch to interpreted code, and vice
+versa.
+\item The thread becomes blocked.
+\item The thread is preempted.
+\end{itemize}
  
-\subsection{Space leaks and selectors}
+A running system has a global state, consisting of
  
-\iffalse
+\begin{itemize}
+\item @Hp@, the current heap pointer, which points to the next
+available address in the Heap.
+\item @HpLim@, the heap limit pointer, which points to the end of the
+heap.
+\item The Thread Preemption Flag, which is set whenever the currently
+running thread should be preempted at the next opportunity.
+\item A list of runnable threads. 
+\item A list of blocked threads.
+\end{itemize}
  
-\ToDo{Insert text stolen from update paper}
+Each thread is represented by a Thread State Object (TSO), which is
+described in detail in Section \ref{sect:TSO}.
  
-\else
+The following is pseudo-code for the inner loop of the scheduler
+itself.
  
-In 1987, Wadler identified an important source of space leaks in
-lazy functional programs.  Consider the Haskell function definition:
-@
-  f p = (g1 a, g2 b) where (a,b) = p
-@
-The pattern-matching in the @where@ clause is known as
-{\em lazy pattern-matching}, because it is performed only if @a@
-or @b@ is actually evaluated.  The desugarer translates lazy pattern matching
-to the use of selectors, @fst@ and @snd@ in this case:
-@
-  f p = let a = fst p
-           b = snd p
-       in
-       (b, a)
-@
-Now suppose that the second component of the pair @(f p)@, namely @a@,
-is evaluated and discarded, but the first is not although it remains
-reachable.  The garbage collector will find that the thunk for @b@ refers
-to @p@ and hence to @a@.  Thus, although @a@ cannot ever be used again, its
-space is retained.  It turns out that this space leak can have a very bad effect
-indeed on a program's space behaviour (Section~\ref{sect:selector-results}).
-
-Wadler's paper also proposed a solution: if the garbage collector
-encounters a thunk of the form @snd p@, where @p@ is evaluated, then
-the garbage collector should perform the selection and overwrite the
-thunk with a pointer to the second component of the pair.  In effect, the
-garbage collector thereby performs a bounded amount of as-yet-undemanded evaluation
-in the hope of improving space behaviour.
-We implement this idea directly, by making the garbage collector
-eagerly execute all selector thunks\footnote{A word of caution: it is rather easy 
-to make a mistake in the implementation, especially if the garbage collector
-uses pointer reversal to traverse the reachable graph.},
-with results 
-reported in Section~\ref{sect:THUNK_SEL}.
-
-One could easily imagine generalisations of this idea, with the garbage 
-collector performing bounded amounts of space-saving work.  One example is
-this:
  @
-  f x []     = (x,x)
-  f x (y:ys) = f (x+1) ys
+while (threads_exist) {
+  // handle global problems: GC, parallelism, etc
+  if (need_gc) gc();  
+  if (external_message) service_message();
+  // deal with other urgent stuff
+
+  pick a runnable thread;
+  do {
+    // enter object on top of stack
+    // if the top object is a BCO, we must enter it
+    // otherwise appply any heuristic we wish.
+    if (thread->stack[thread->sp]->info.type == BCO) {
+       status = runHugs(thread,&smInfo);
+    } else {
+       status = runGHC(thread,&smInfo);
+    }
+    switch (status) {  // handle local problems
+      case (StackOverflow): enlargeStack; break;
+      case (Error e)      : error(thread,e); break;
+      case (ExitWith e)   : exit(e); break;
+      case (Yield)        : break;
+    }
+  } while (thread_runnable);
+}
  @
-Most lazy evaluators will build up a chain of thunks for the accumulating
-parameter, @x@, each of which increments @x@.  It is not safe to evaluate
-any of these thunks eagerly, since @f@ is not strict in @x@, and we know nothing
-about the value of @x@ passed in the initial call to @f@.
-On the other hand, if the garbage collector found a thunk @(x+1)@ where
-@x@ happened to be evaluated, then it could ``execute'' it eagerly.
-If done carefully, the entire chain could be eliminated in a single
-garbage collection.   We have not (yet) implemented this idea.
-A very similar idea, dubbed ``stingy evaluation'', is described 
-by <.stingy.>.
-
-<.sparud lazy pattern matching.> describes another solution to the
-lazy-pattern-matching
-problem.  His solution involves adding code to the two thunks for
-@a@ and @b@ so that if either is evaluated it arranges to update the
-other as well as itself.  The garbage-collector solution is a little
-more general, since it applies whether or not the selectors were
-generated by lazy pattern matching, and in our setting it was easier
-to implement than Sparud's.
  
-\fi
+\subsection{Invoking the garbage collector}
+\subsection{Putting the thread to sleep}
  
+\subsection{Calling C from Haskell}
  
-\subsection{Internal workings of the Compacting Collector}
+We distinguish between "safe calls" where the programmer guarantees
+that the C function will not call a Haskell function or, in a
+multithreaded system, block for a long period of time and "unsafe
+calls" where the programmer cannot make that guarantee.  
  
-\subsection{Internal workings of the Copying Collector}
+Safe calls are performed without returning to the scheduler and are
+discussed elsewhere (\ToDo{discuss elsewhere}).
  
-\subsection{Internal workings of the Generational Collector}
+Unsafe calls are performed by returning an array (outside the Haskell
+heap) of arguments and a C function pointer to the scheduler.  The
+scheduler allocates a new thread from the operating system
+(multithreaded system only), spawns a call to the function and
+continues executing another thread.  When the ccall completes, the
+thread informs the scheduler and the scheduler adds the thread to the
+runnable threads list.  
  
+\ToDo{Describe this in more detail.}
  
  
-\section{Profiling}
+\subsection{Calling Haskell from C}
  
-Registering costs centres looks awkward - can we tidy it up?
+When C calls a Haskell closure, it sends a message to the scheduler
+thread.  On receiving the message, the scheduler creates a new Haskell
+thread, pushes the arguments to the C function onto the thread's stack
+(with tags for unboxed arguments) pushes the Haskell closure and adds
+the thread to the runnable list so that it can be entered in the
+normal way.
  
-\section{Parallelism}
+When the closure returns, the scheduler sends back a message which
+awakens the (C) thread.  
  
-Something about global GC, inter-process messages and fetchmes.
+\ToDo{Do we need to worry about the garbage collector deallocating the
+thread if it gets blocked?}
  
-\section{Debugging}
+\subsection{Switching Worlds}
+\label{sect:switching-worlds}
  
-\section{Ticky Ticky profiling}
+\ToDo{This has all changed: we always leave a closure on top of the
+stack if we mean to continue executing it.  The scheduler examines the
+top of the stack and tries to guess which world we want to be in.  If
+it finds a @BCO@, it certainly enters Hugs, if it finds a @GHC@
+closure, it certainly enters GHC and if it finds a standard closure,
+it is free to choose either one but it's probably best to enter GHC
+for everything except @BCO@s and perhaps @AP@s.}
  
-Measure what proportion of ...:
-\begin{itemize}
-\item
-... Enters are to data values, function values, thunks.
-\item
-... allocations are for data values, functions values, thunks.
-\item
-... updates are for data values, function values.
-\item
-... updates ``fit''
-\item
-... return-in-heap (dynamic)
-\item
-... vectored return (dynamic)
-\item
-... updates are wasted (never re-entered).
-\item
-... constructor returns get away without hitting an update.
-\end{itemize}
+Because this is a combined compiled/interpreted system, the
+interpreter will sometimes encounter compiled code, and vice-versa.
  
-%************************************************************************
-%*                                                                     *
-\subsubsection[ticky-stk-heap-use]{Stack and heap usage}
-%*                                                                     *
-%************************************************************************
+All world-switches go via the scheduler, ensuring that the world is in
+a known state ready to enter either compiled code or the interpreter.
+When a thread is run from the scheduler, the @whatNext@ field in the
+TSO (Section \ref{sect:TSO}) is checked to find out how to execute the
+thread.
  
-Things we are interested in here:
  \begin{itemize}
-\item
-How many times we do a heap check and move @Hp@; comparing this with
-the allocations gives an indication of how many things we get per trip
-to the well:
+\item If @whatNext@ is set to @ReturnGHC@, we load up the required
+registers from the TSO and jump to the address at the top of the user
+stack.
+\item If @whatNext@ is set to @EnterGHC@, we load up the required
+registers from the TSO and enter the closure pointed to by the top
+word of the stack.
+\item If @whatNext@ is set to @EnterHugs@, we enter the top thing on
+the stack, using the interpreter.
+\end{itemize}
  
-If we do a ``heap lookahead,'' we haven't really allocated any
-heap, so we need to undo the effects of an @ALLOC_HEAP@:
+There are four cases we need to consider:
  
-\item
-The stack high-water mark.
+\begin{enumerate}
+\item A GHC thread enters a Hugs-built closure.
+\item A GHC thread returns to a Hugs-compiled return address.
+\item A Hugs thread enters a GHC-built closure.
+\item A Hugs thread returns to a Hugs-compiled return address.
+\end{enumerate}
  
-\item
-Re-use of stack slots, and stubbing of stack slots:
+GHC-compiled modules cannot call functions in a Hugs-compiled module
+directly, because the compiler has no information about arities in the
+external module.  Therefore it must assume any top-level objects are
+CAFs, and enter their closures.
  
-\end{itemize}
+\ToDo{Hugs-built constructors?}
  
-%************************************************************************
-%*                                                                     *
-\subsubsection[ticky-allocs]{Allocations}
-%*                                                                     *
-%************************************************************************
-
-We count things every time we allocate something in the dynamic heap.
-For each, we count the number of words of (1)~``admin'' (header),
-(2)~good stuff (useful pointers and data), and (3)~``slop'' (extra
-space, in hopes it will allow an in-place update).
-
-The first five macros are inserted when the compiler generates code
-to allocate something; the categories correspond to the @ClosureClass@
-datatype (manifest functions, thunks, constructors, big tuples, and
-partial applications).
-
-We may also allocate space when we do an update, and there isn't
-enough space.  These macros suffice (for: updating with a partial
-application and a constructor):
-
-In the threaded world, we allocate space for the spark pool, stack objects,
-and thread state objects.
-
-The histogrammy bit is fairly straightforward; the @-2@ is: one for
-0-origin C arrays; the other one because we do {\em no} one-word
-allocations, so we would never inc that histogram slot; so we shift
-everything over by one.
-
-Some hard-to-account-for words are allocated by/for primitives,
-includes Integer support.  @ALLOC_PRIM2@ tells us about these.  We
-count everything as ``goods'', which is not strictly correct.
-(@ALLOC_PRIM@ is the same sort of stuff, but we know the
-admin/goods/slop breakdown.)
-
-%************************************************************************
-%*                                                                     *
-\subsubsection[ticky-enters]{Enters}
-%*                                                                     *
-%************************************************************************
-
-We do more magical things with @ENT_FUN_DIRECT@.  Besides simply knowing
-how many ``fast-entry-point'' enters there were, we'd like {\em simple}
-information about where those enters were, and the properties thereof.
-@
-struct ent_counter {
-    unsigned   registeredp:16, /* 0 == no, 1 == yes */
-               arity:16,       /* arity (static info) */
-               Astk_args:16,   /* # of args off A stack */
-               Bstk_args:16;   /* # of args off B stack */
-                               /* (rest of args are in registers) */
-    StgChar    *f_str;         /* name of the thing */
-    StgChar    *f_arg_kinds;   /* info about the args types */
-    StgChar    *wrap_str;      /* name of its wrapper (if any) */
-    StgChar    *wrap_arg_kinds;/* info about the orig wrapper's arg types */
-    I_         ctr;            /* the actual counter */
-    struct ent_counter *link;  /* link to chain them all together */
-};
-@
+We now examine the various cases one by one and describe how the
+switch happens in each situation.
  
-%************************************************************************
-%*                                                                     *
-\subsubsection[ticky-returns]{Returns}
-%*                                                                     *
-%************************************************************************
+\subsection{A GHC thread enters a Hugs-built closure}
+\label{sect:ghc-to-hugs-switch}
  
-Whenever a ``return'' occurs, it is returning the constituent parts of
-a data constructor.  The parts can be returned either in registers, or
-by allocating some heap to put it in (the @ALLOC_*@ macros account for
-the allocation).  The constructor can either be an existing one
-(@*OLD*@) or we could have {\em just} figured out this stuff
-(@*NEW*@).
+There is three possibilities: GHC has entered a @PAP@, or it has
+entered a @AP@, or it has entered the BCO directly (for a top-level
+function closure).  @AP@s and @PAP@s are ``standard closures'' and
+so do not require us to enter the bytecode interpreter.
  
-Here's some special magic that Simon wants [edited to match names
-actually used]:
+The entry code for a BCO does the following:
  
-@
-From: Simon L Peyton Jones <simonpj>
-To: partain, simonpj
-Subject: counting updates
-Date: Wed, 25 Mar 92 08:39:48 +0000
+\begin{itemize}
+\item Push the address of the object entered on the stack.
+\item Save the current state of the thread in its TSO.
+\item Return to the scheduler, setting @whatNext@ to @EnterHugs@.
+\end{itemize}
  
-I'd like to count how many times we update in place when actually Node
-points to the thing.  Here's how:
+BCO's for thunks and functions have the same entry conventions as
+slow entry points: they expect to find their arguments on the stac
+with unboxed arguments preceded by appropriate tags.
  
-@RET_OLD_IN_REGS@ sets the variable @ReturnInRegsNodeValid@ to @True@;
-@RET_NEW_IN_REGS@ sets it to @False@.
+\subsection{A GHC thread returns to a Hugs-compiled return address}
+\label{sect:ghc-to-hugs-switch}
  
-@RET_SEMI_???@ sets it to??? ToDo [WDP]
+Hugs return addresses are laid out as in Figure
+\ref{fig:hugs-return-stack}.  If GHC is returning, it will return to
+the address at the top of the stack, namely @HUGS_RET@.  The code at
+@HUGS_RET@ performs the following:
  
-@UPD_CON_IN_PLACE@ tests the variable, and increments @UPD_IN_PLACE_COPY_ctr@
-if it is true.
+\begin{itemize}
+\item pushes \Arg{1} (the return value) on the stack.
+\item saves the thread state in the TSO
+\item returns to the scheduler with @whatNext@ set to @EnterHugs@.
+\end{itemize}
  
-Then we need to report it along with the update-in-place info.
-@
+\noindent When Hugs runs, it will enter the return value, which will
+return using the correct Hugs convention (Section
+\ref{sect:hugs-return-convention}) to the return address underneath it
+on the stack.
  
+\subsection{A Hugs thread enters a GHC-compiled closure}
+\label{sect:hugs-to-ghc-switch}
  
-Of all the returns (sum of four categories above), how many were
-vectored?  (The rest were obviously unvectored).
+Hugs can recognise a GHC-built closure as not being one of the
+following types of object:
  
-%************************************************************************
-%*                                                                     *
-\subsubsection[ticky-update-frames]{Update frames}
-%*                                                                     *
-%************************************************************************
+\begin{itemize}
+\item A @BCO@,
+\item A @AP@,
+\item A @PAP@,
+\item An indirection, or
+\item A constructor.
+\end{itemize}
  
-These macros count up the following update information.
+When Hugs is called on to enter a GHC closure, it executes the
+following sequence of instructions:
  
-\begin{tabular}{|l|l|} \hline
-Macro                  &       Counts                                  \\ \hline
-                       &                                               \\
-@UPDF_STD_PUSHED@      &       Update frame pushed                     \\
-@UPDF_CON_PUSHED@      &       Constructor update frame pushed         \\
-@UPDF_HOLE_PUSHED@     &       An update frame to update a black hole  \\
-@UPDF_OMITTED@         &       A thunk decided not to push an update frame \\
-                       &       (all subsets of @ENT_THK@)              \\
-@UPDF_RCC_PUSHED@      &       Cost Centre restore frame pushed        \\
-@UPDF_RCC_OMITTED@     &       Cost Centres not required -- not pushed \\\hline
-\end{tabular}
+\begin{itemize}
+\item Push the address of the closure on the stack.
+\item Save the current state of the thread in the TSO.
+\item Return to the scheduler, with the @whatNext@ field set to
+@EnterGHC@.
+\end{itemize}
  
-%************************************************************************
-%*                                                                     *
-\subsubsection[ticky-updates]{Updates}
-%*                                                                     *
-%************************************************************************
+\subsection{A Hugs thread returns to a GHC-compiled return address}
+\label{sect:hugs-to-ghc-switch}
  
-These macros record information when we do an update.  We always
-update either with a data constructor (CON) or a partial application
-(PAP).
+When Hugs encounters a return address on the stack that is not
+@HUGS_RET@, it knows that a world-switch is required.  At this point
+the stack contains a pointer to the return value, followed by the GHC
+return address.  The following sequence is then performed:
  
-\begin{tabular}{|l|l|}\hline
-Macro                  &       Where                                           \\ \hline
-                       &                                                       \\
-@UPD_EXISTING@         &       Updating with an indirection to something       \\
-                       &       already in the heap                             \\
-@UPD_SQUEEZED@         &       Same as @UPD_EXISTING@ but because              \\
-                       &       of stack-squeezing                              \\
-@UPD_CON_W_NODE@       &       Updating with a CON: by indirecting to Node     \\
-@UPD_CON_IN_PLACE@     &       Ditto, but in place                             \\
-@UPD_CON_IN_NEW@       &       Ditto, but allocating the object                \\
-@UPD_PAP_IN_PLACE@     &       Same, but updating w/ a PAP                     \\
-@UPD_PAP_IN_NEW@       &                                                       \\\hline
-\end{tabular}
+\begin{itemize}
+\item save the state of the thread in the TSO.
+\item return to the scheduler, setting @whatNext@ to @EnterGHC@.
+\end{itemize}
  
-%************************************************************************
-%*                                                                     *
-\subsubsection[ticky-selectors]{Doing selectors at GC time}
-%*                                                                     *
-%************************************************************************
+The first thing that GHC will do is enter the object on the top of the
+stack, which is a pointer to the return value.  This value will then
+return itself to the return address using the GHC return convention.
  
-@GC_SEL_ABANDONED@: we could've done the selection, but we gave up
-(e.g., to avoid overflowing the C stack); @GC_SEL_MINOR@: did a
-selection in a minor GC; @GC_SEL_MAJOR@: ditto, but major GC.
  
+\fi
  
  
-\section{History}
+\part{History}
  
  We're nuking the following:
  
@@ -2970,7 +4267,7 @@ We're nuking the following:
    STATIC SMReps are now called CONST
  
  \item
-  @SM_MUTVAR@ is new
+  @MUTVAR@ is new
  
  \item The profiling ``kind'' field is now encoded in the @INFO_TYPE@ field.
  This identifies the general sort of the closure for profiling purposes.
@@ -2980,206 +4277,6 @@ This identifies the general sort of the closure for profiling purposes.
  
  \end{itemize}
  
-\section{Old tricks}
-
-@CAF@ indirections:
-
-These are statically defined closures which have been updated with a
-heap-allocated result.  Initially these are exactly the same as a
-@STATIC@ closure but with special entry code. On entering the closure
-the entry code must:
-
-\begin{itemize}
-\item Allocate a black hole in the heap which will be updated with
-      the result.
-\item Overwrite the static closure with a special @CAF@ indirection.
-
-\item Link the static indirection onto the list of updated @CAF@s.
-\end{itemize}
-
-The indirection and the link field require the initial @STATIC@
-closure to be of at least size @MIN_UPD_SIZE@ (excluding the fixed
-header).
-
-@CAF@s are treated as special garbage collection roots.  These roots
-are explicitly collected by the garbage collector, since they may
-appear in code even if they are not linked with the main heap.  They
-consequently represent potentially enormous space-leaks.  A @CAF@
-closure retains a fixed location in statically allocated data space.
-When updated, the contents of the @CAF@ indirection are changed to
-reflect the new closure. @CAF@ indirections require special garbage
-collection code.
-
-\section{Old stuff about SRTs}
-
-Garbage collection of @CAF@s is tricky.  We have to cope with explicit
-collection from the @CAFlist@ as well as potential references from the
-stack and heap which will cause the @CAF@ evacuation code to be
-called.  They are treated like indirections which are shorted out.
-However they must also be updated to point to the new location of the
-new closure as the @CAF@ may still be used by references which
-reside in the code.
-
-{\bf Copying Collection}
-
-A first scheme might use evacuation code which evacuates the reference
-and updates the indirection. This is no good as subsequent evacuations
-will result in an already evacuated closure being evacuated. This will
-leave a forward reference in to-space!
-
-An alternative scheme evacuates the @CAFlist@ first. The closures
-referenced are evacuated and the @CAF@ indirection updated to point to
-the evacuated closure. The @CAF@ evacuation code simply returns the
-updated indirection pointer --- the pointer to the evacuated closure.
-Unfortunately the closure the @CAF@ references may be a static
-closure, in fact, it may be another @CAF@. This will cause the second
-@CAF@'s evacuation code to be called before the @CAF@ has been
-evacuated, returning an unevacuated pointer.
-
-Another scheme leaves updating the @CAF@ indirections to the end of
-the garbage collection.  All the references are evacuated and
-scavenged as usual (including the @CAFlist@). Once collection is
-complete the @CAFlist@ is traversed updating the @CAF@ references with
-the result of evacuating the referenced closure again. This will
-immediately return as it must be a forward reference, a static
-closure, or a @CAF@ which will indirect by evacuating its reference.
-
-The crux of the problem is that the @CAF@ evacuation code needs to
-know if its reference has already been evacuated and updated. If not,
-then the reference can be evacuated, updated and returned safely
-(possibly evacuating another @CAF@). If it has, then the updated
-reference can be returned. This can be done using two @CAF@
-info-tables. At the start of a collection the @CAFlist@ is traversed
-and set to an internal {\em evacuate and update} info-table. During
-collection, evacution of such a @CAF@ also results in the info-table
-being reset back to the standard @CAF@ info-table. Thus subsequent
-evacuations will simply return the updated reference. On completion of
-the collection all @CAF@s will have {\em return reference} info-tables
-again.
-
-This is the scheme we adopt. A @CAF@ indirection has evacuation code
-which returns the evacuated and updated reference. During garbage
-collection, all the @CAF@s are overwritten with an internal @CAF@ info
-table which has evacuation code which performs this evacuate and
-update and restores the original @CAF@ code. At some point during the
-collection we must ensure that all the @CAF@s are indeed evacuated.
-
-The only potential problem with this scheme is a cyclic list of @CAF@s
-all directly referencing (possibly via indirections) another @CAF@!
-Evacuation of the first @CAF@ will fail in an infinite loop of @CAF@
-evacuations. This is solved by ensuring that the @CAF@ info-table is
-updated to a {\em return reference} info-table before performing the
-evacuate and update. If this {\em return reference} evacuation code is
-called before the actual evacuation is complete it must be because
-such a cycle of references exists. Returning the still unevacuated
-reference is OK --- all the @CAF@s will now reference the same
-@CAF@ which will reference itself! Construction of such a structure
-indicates the program must be in an infinite loop.
-
-{\bf Compacting Collector}
-
-When shorting out a @CAF@, its reference must be marked. A first
-attempt might explicitly mark the @CAF@s, updating the reference with
-the marked reference (possibly short circuting indirections). The
-actual @CAF@ marking code can indicate that they have already been
-marked (though this might not have actually been done yet) and return
-the indirection pointer so it is shorted out. Unfortunately the @CAF@
-reference might point to an indirection which will be subsequently
-shorted out. Rather than returning the @CAF@ reference we treat the
-@CAF@ as an indirection, calling the mark code of the reference, which
-will return the appropriately shorted reference.
-
-Problem: Cyclic list of @CAF@s all directly referencing (possibly via
-indirections) another @CAF@!
-
-Before compacting, the locations of the @CAF@ references are
-explicitly linked to the closures they reference (if they reference
-heap allocated closures) so that the compacting process will update
-them to the closure's new location. Unfortunately these locations'
-@CAF@ indirections are static.  This causes premature termination
-since the test to find the info pointer at the end of the location
-list will match more than one value.  This can be solved by using an
-auxiliary dynamic array (on the top of the A stack).  One location for
-each @CAF@ indirection is linked to the closure that the @CAF@
-references. Once collection is complete this array is traversed and
-the corresponding @CAF@ is then updated with the updated pointer from
-the auxiliary array.
-
-
-It is possible to use an alternative marking scheme, using a similar
-idea to the copying solution. This scheme avoids the need to update
-the @CAF@ references explicitly. We introduce an auxillary {\em mark
-and update} @CAF@ info-table which is used to update all @CAF@s at the
-start of a collection. The new code marks the @CAF@ reference,
-updating it with the returned reference.  The returned reference is
-itself returned so the @CAF@ is shorted out.  The code also modifies the
-@CAF@ info-table to be a {\em return reference}.  Subsequent attempts to
-mark the @CAF@ simply return the updated reference.
-
-A cyclic @CAF@ reference will result in an attempt to mark the @CAF@
-before the marking has been completed and the reference updated. We
-cannot start marking the @CAF@ as it is already being marked. Nor can
-we return the reference as it has not yet been updated. Neither can we
-treat the CAF as an indirection since the @CAF@ reference has been
-obscured by the pointer reversal stack. All we can do is return the
-@CAF@ itself. This will result in some @CAF@ references not being
-shorted out.
-
-This scheme has not been adopted but has been implemented. The code is
-commented out with @#if 0@.
-
-\subsection{The virtual register set}
-
-We refer to any (atomic) part of the virtual machine state as a ``register.''
-These ``registers'' may be shared between all threads in the system or may be
-specific to each thread.
-
-Global: 
-@
-  Hp
-  HpLim
-  Thread preemption flag
-@
-
-Thread specific:
-@
-  TSO - pointer to the TSO for when we have to pack thread away
-  Sp
-  SpLim
-  Su - used to calculate number of arguments on stack
-     - this is a more convenient representation
-  Call/return registers (aka General purpose registers)
-  Cost centre (and other debug/profile info)
-  Statistic gathering (not in production system)
-  Exception handlers 
-    Heap overflow  - possible global?
-    Stack overflow - possibly global?
-    Pattern match failure
-    maybe a failWith handler?
-    maybe an exitWith handler?
-    ...
-@
-
-Some of these virtual ``registers'' are used very frequently and should
-be mapped onto machine registers if at all possible.  Others are used
-very infrequently and can be kept in memory to free up registers for
-other uses.
-
-On register-poor architectures, we can play a few tricks to reduce the
-number of virtual registers which need to be accessed on a regular
-basis:
-
-@
-- HpLim trick
-- Grow stack and heap towards each other (single-threaded system only)
-- We might need to keep the C stack pointer in a register if that
-  is what the OS expects when a signal occurs.
-- Preemption flag trick
-- If any of the frequently accessed registers cannot be mapped onto
-  machine registers we should keep the TSO in a machine register to
-  allow faster access to all the other non-machine registers.
-@
-
  
  \end{document}