From: areid Date: Thu, 8 Jan 1998 14:40:22 +0000 (+0000) Subject: [project @ 1998-01-08 14:40:22 by areid] X-Git-Tag: Approx_2487_patches~1114 X-Git-Url: http://git.megacz.com/?p=ghc-hetmet.git;a=commitdiff_plain;h=ff14742cc328f19b9bf7c04d9a69408e641cf64a [project @ 1998-01-08 14:40:22 by areid] Added an overview section, commented out some of the more bogus parts of the document (but some bits still remain) --- diff --git a/docs/rts/rts.verb b/docs/rts/rts.verb index 7e0af3d..9dced41 100644 --- a/docs/rts/rts.verb +++ b/docs/rts/rts.verb @@ -5,13 +5,12 @@ % TODO: % -% o I think it would be worth making the connection with CPS explicit. +% o I (ADR) think it would be worth making the connection with CPS explicit. % Now that we have explicit activation records (on the stack), we can % explain the whole system in terms of CPS and tail calls --- with the % one requirement that we carefuly distinguish stack-allocated objects % from heap-allocated objects. - % \documentstyle[preprint]{acmconf} \documentclass[11pt]{article} \oddsidemargin 0.1 in % Note that \oddsidemargin = \evensidemargin @@ -20,7 +19,7 @@ \marginparsep 0 in \sloppy -\usepackage{epsfig} +%\usepackage{epsfig} \newcommand{\note}[1]{{\em Note: #1}} % DIMENSION OF TEXT: @@ -91,6 +90,12 @@ architectures. It has been partly replaced by unboxed tuples explicitly state where results should be returned in registers (or on the stack) instead of on the heap. +\item + +Lazy black-holing has been replaced by eager black-holing. The +problem with lazy black-holing is that it leaves slop in the heap +which conflicts with the use of a mostly-copying collector. + \end{itemize} \subsection{Wish list} @@ -131,6 +136,11 @@ mutually-exclusive choices. \begin{description} \item[@SEQUENTIAL@] No concurrency or parallelism support. This configuration might not support interrupt recovery. + + \note{There's probably not much point in supporting this option. If + we've gone to the effort of supporting concurency, we don't gain + much by being able to turn it off.} + \item[@CONCURRENT@] Support for concurrency but not for parallelism. \item[@CONCURRENT@+@GRANSIM@] Concurrency support and simulated parallelism. \item[@CONCURRENT@+@PARALLEL@] Concurrency support and real parallelism. @@ -172,7 +182,7 @@ pointed types are the only things which can be lazily evaluated. In the STG machine, this means that they are the only things that can be {\em entered} or {\em updated} and it requires that they be boxed. -\item An {\em unpointed} type is one that does not contains $\bot$. +\item An {\em unpointed} type is one that does not contain $\bot$. Variables with unpointed types are never delayed --- they are always evaluated when they are constructed. In the STG machine, this means that they cannot be {\em entered} or {\em updated}. Unpointed objects @@ -200,7 +210,7 @@ words and pointers are the same size. \subsection{Subtle Dependencies} Some decisions have very subtle consequences which should be written -down in case we want to change our minds. +down in case we want to change our minds. \begin{itemize} @@ -234,10 +244,16 @@ discusses who does the stack check and how much space they need. Heap objects never contain slop --- this is required if we want to support mostly-copying garbage collection. +This is a big problem when updating since the updatee is usually +bigger than an indirection object. The fix is to overwrite the end of +the updatee with ``slop objects'' (described in +section~\ref{sect:slop-objects}). This is hard to arrange if we do \emph{lazy} blackholing (section~\ref{sect:lazy-black-holing}) so we currently plan to blackhole an object when we push the update frame. + + \item Info tables for constructors contain enough information to decide which @@ -268,12 +284,6 @@ instead of \subsection{Unboxed tuples}\label{sect:unboxed-tuples} -\Note{We're not planning to implement this right away. There doesn't -seem to be any real difficulty adding it to the runtime system but -it'll take a lot of work adding it to the compiler. Since unboxed -tuples do not trigger allocation, the syntax could be modified to allow -unboxed tuples in expressions.} - Functions can take multiple arguments as easily as they can take one argument: there's no cost for adding another argument. But functions can only return one result: the cost of adding a second ``result'' is @@ -314,20 +324,44 @@ or multiple stack slots. At first sight, this seems a little strange but it's no different from passing double precision floats in two registers. -Note that unboxed tuples can only have one constructor and that +Notes: +\begin{itemize} +\item +Unboxed tuples can only have one constructor and that thunks never have unboxed types --- so we'll never try to update an unboxed constructor. The restriction to a single constructor is largely to avoid garbage collection complications. +\item +The core syntax does not allow variables to be bound to +unboxed tuples (ie in default case alternatives or as function arguments) +and does not allow unboxed tuples to be fields of other constructors. +However, there's no harm in allowing it in the source syntax as a +convenient, but easily removed, syntactic sugar. + +\item +The compiler generates a closure of the form +@ +> c = \ x y z -> C x y z +@ +for every constructor (whether boxed or unboxed). + +This closure is normally used during desugaring to ensure that +constructors are saturated and to apply any strictness annotations. +They are also used when returning unboxed constructors to the machine +code evaluator from the bytecode evaluator and when a heap check fails +in a return continuation for an unboxed-tuple scrutinee. + +\end{itemize} + \subsection{STG Syntax} \ToDo{Insert STG syntax with appropriate changes.} -%----------------------------------------------------------------------------- -\part{Evaluation Model} - -\section{Overview} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\part{System Overview} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% This part is concerned with defining the external interfaces of the major components of the system; the next part is concerned with their @@ -336,1629 +370,3481 @@ inner workings. The major components of the system are: \begin{itemize} \item The scheduler -\item The loader \item The storage manager -\item The machine code evaluator (compiled code) -\item The bytecode evaluator (interpreted code) +\item The evaluators +\item The loader \item The compilers \end{itemize} -\section{The Compilers} - -Need to describe interface files. - -Here's an example - but I don't know the grammar - ADR. -@ -_interface_ Main 1 -_exports_ -Main main ; -_declarations_ -1 main _:_ IOBase.IO PrelBase.();; -@ +\ToDo{Insert diagram showing all components underneath the scheduler +and communicating only with the scheduler} -\section{The Scheduler} +\section{Scheduler} The Scheduler is the heart of the run-time system. A running program consists of a single running thread, and a list of runnable and -blocked threads. The running thread returns to the scheduler when any -of the following conditions arises: +blocked threads. All threads consist of a stack and a few words of +status information. Except for the running thread, all threads have a +closure on top of their stack; the scheduler restarts a thread by +entering an evaluator which performs some reduction and returns. -\begin{itemize} -\item A heap check fails, and a garbage collection is required -\item Compiled code needs to switch to interpreted code, and vice -versa. -\item The thread becomes blocked. -\item The thread is preempted. -\end{itemize} +\subsection{The scheduler's main loop} -A running system has a global state, consisting of +The scheduler consists of a loop which chooses a runnable thread and +invokes one of the evaluators which performs some reduction and +returns. -\begin{itemize} -\item @Hp@, the current heap pointer, which points to the next -available address in the Heap. -\item @HpLim@, the heap limit pointer, which points to the end of the -heap. -\item The Thread Preemption Flag, which is set whenever the currently -running thread should be preempted at the next opportunity. -\item A list of runnable threads. -\item A list of blocked threads. -\end{itemize} +The scheduler also takes care of system-wide issues such as heap +overflow or communication with other processors (in the parallel +system) and thread-specific problems such as stack overflow. -Each thread is represented by a Thread State Object (TSO), which is -described in detail in Section \ref{sect:TSO}. +\subsection{Creating a thread} -The following is pseudo-code for the inner loop of the scheduler -itself. +Threads are created: -@ -while (threads_exist) { - // handle global problems: GC, parallelism, etc - if (need_gc) gc(); - if (external_message) service_message(); - // deal with other urgent stuff +\begin{itemize} - pick a runnable thread; - do { - // enter object on top of stack - // if the top object is a BCO, we must enter it - // otherwise appply any heuristic we wish. - if (thread->stack[thread->sp]->info.type == BCO) { - status = runHugs(thread,&smInfo); - } else { - status = runGHC(thread,&smInfo); - } - switch (status) { // handle local problems - case (StackOverflow): enlargeStack; break; - case (Error e) : error(thread,e); break; - case (ExitWith e) : exit(e); break; - case (Yield) : break; - } - } while (thread_runnable); -} -@ +\item -\subsection{Invoking the garbage collector} -\subsection{Putting the thread to sleep} +When the scheduler is first invoked. -\subsection{Calling C from Haskell} +\item -We distinguish between "safe calls" where the programmer guarantees -that the C function will not call a Haskell function or, in a -multithreaded system, block for a long period of time and "unsafe -calls" where the programmer cannot make that guarantee. +When a message is received from another processor (I think). (Parallel +system only.) -Safe calls are performed without returning to the scheduler and are -discussed elsewhere (\ToDo{discuss elsewhere}). +\item -Unsafe calls are performed by returning an array (outside the Haskell -heap) of arguments and a C function pointer to the scheduler. The -scheduler allocates a new thread from the operating system -(multithreaded system only), spawns a call to the function and -continues executing another thread. When the ccall completes, the -thread informs the scheduler and the scheduler adds the thread to the -runnable threads list. +When a C program calls some Haskell code. -\ToDo{Describe this in more detail.} +\end{itemize} -\subsection{Calling Haskell from C} +\subsection{Restarting a thread} -When C calls a Haskell closure, it sends a message to the scheduler -thread. On receiving the message, the scheduler creates a new Haskell -thread, pushes the arguments to the C function onto the thread's stack -(with tags for unboxed arguments) pushes the Haskell closure and adds -the thread to the runnable list so that it can be entered in the -normal way. +The evaluators can reduce almost all types of closure except that only +the machine code evaluator can reduce GHC-compiled closures and only +the bytecode evaluator can reduce Hugs-compiled closures. +Consequently, the scheduler may use either evaluator to restart a +thread unless the top closure is a @BCO@ or contains machine code. -When the closure returns, the scheduler sends back a message which -awakens the (C) thread. +However, if the top of the stack contains a constructor, the scheduler +should use the machine code evaluator to restart the thread. This +allows the bytecode evaluator to return a constructor to a machine +code return address by pushing the constructor on top of the stack and +returning to the scheduler. If the return address under the +constructor is @HUGS_RET@, the entry code for @HUGS_RET@ will +rearrange the stack so that the return @BCO@ is on top of the stack +and return to the scheduler which will then call the bytecode +evaluator. There is little point in trying to shorten this slightly +indirect route since it will happen very rarely if at all. -\ToDo{Do we need to worry about the garbage collector deallocating the -thread if it gets blocked?} +\subsection{Returning from a thread} -\subsection{Switching Worlds} -\label{sect:switching-worlds} +The evaluators return to the scheduler when any of the following +conditions arise: -\ToDo{This has all changed: we always leave a closure on top of the -stack if we mean to continue executing it. The scheduler examines the -top of the stack and tries to guess which world we want to be in. If -it finds a @BCO@, it certainly enters Hugs, if it finds a @GHC@ -closure, it certainly enters GHC and if it finds a standard closure, -it is free to choose either one but it's probably best to enter GHC -for everything except @BCO@s and perhaps @AP@s.} +\begin{itemize} +\item A heap check fails, and a garbage collection is required +\item Compiled code needs to switch to interpreted code, and vice versa. +\item The evaluator needs to perform an ``unsafe'' C call. +\item The thread becomes blocked. +\item The thread is preempted. +\item The thread terminates. +\end{itemize} -Because this is a combined compiled/interpreted system, the -interpreter will sometimes encounter compiled code, and vice-versa. +Except when the thread terminates, the thread always terminates with a +closure on the top of the stack. -All world-switches go via the scheduler, ensuring that the world is in -a known state ready to enter either compiled code or the interpreter. -When a thread is run from the scheduler, the @whatNext@ field in the -TSO (Section \ref{sect:TSO}) is checked to find out how to execute the -thread. +\subsection{Preempting a thread} -\begin{itemize} -\item If @whatNext@ is set to @ReturnGHC@, we load up the required -registers from the TSO and jump to the address at the top of the user -stack. -\item If @whatNext@ is set to @EnterGHC@, we load up the required -registers from the TSO and enter the closure pointed to by the top -word of the stack. -\item If @whatNext@ is set to @EnterHugs@, we enter the top thing on -the stack, using the interpreter. -\end{itemize} +Strictly speaking, threads cannot be preempted --- the scheduler +merely sets a preemption request flag which the thread must arrange to +test on a regular basis. When an evaluator finds that the preemption +request flag is set, it pushes an appropriate closure onto the stack +and returns to the scheduler. -There are four cases we need to consider: +In the bytecode interpreter, the flag is tested whenever we enter a +closure. If the preemption flag is set, it leaves the closure on top +of the stack and returns to the scheduler. -\begin{enumerate} -\item A GHC thread enters a Hugs-built closure. -\item A GHC thread returns to a Hugs-compiled return address. -\item A Hugs thread enters a GHC-built closure. -\item A Hugs thread returns to a Hugs-compiled return address. -\end{enumerate} +In the machine code evaluator, the flag is only tested when a heap or +stack check fails. This is less expensive than testing the flag on +entering every closure but runs the risk that a thread will enter an +infinite loop which does not allocate any space. If the flag is set, +the evaluator returns to the scheduler exactly as if a heap check had +failed. -GHC-compiled modules cannot call functions in a Hugs-compiled module -directly, because the compiler has no information about arities in the -external module. Therefore it must assume any top-level objects are -CAFs, and enter their closures. +\subsection{``Safe'' and ``unsafe'' C calls} -\ToDo{Hugs-built constructors?} +There are two ways of calling C: -We now examine the various cases one by one and describe how the -switch happens in each situation. +\begin{description} -\subsection{A GHC thread enters a Hugs-built closure} -\label{sect:ghc-to-hugs-closure} +\item[``Safe'' C calls] +are used if the programer is certain that the C function will not +do anything dangerous such as calling a Haskell function or an +operating system call which blocks the thread for a long period of time. +\footnote{Warning: this use of ``safe'' and ``unsafe'' is the exact +opposite of the usage for functions like @unsafePerformIO@.} +Safe C calls are faster but must be hand-checked by the programmer. + +Safe C calls are performed by pushing the arguments onto the C stack +and jumping to the C function's entry point. On exit, the result of +the function is in a register which is returned to the Haskell code as +an unboxed value. + +\item[``Unsafe'' C calls] are used if the programmer suspects that the +thread may do something dangerous like blocking or calling a Haskell +function. Unsafe C calls are relatively slow but are less problematic. + +Unsafe C calls are performed by pushing the arguments onto the Haskell +stack, pushing a return continuation and returning a \emph{C function +descriptor} to the scheduler. The scheduler suspends the Haskell thread, +spawns a new operating system thread which pops the arguments off the +Haskell stack onto the C stack, calls the C function, pushes the +function result onto the Haskell stack and informs the scheduler that +the C function has completed and the Haskell thread is now runnable. -There is three possibilities: GHC has entered a @PAP@, or it has -entered a @AP@, or it has entered the BCO directly (for a top-level -function closure). @AP@s and @PAP@s are ``standard closures'' and -so do not require us to enter the bytecode interpreter. +\end{description} -The entry code for a BCO does the following: +The bytecode evaluator will probably treat all C calls as being unsafe. -\begin{itemize} -\item Push the address of the object entered on the stack. -\item Save the current state of the thread in its TSO. -\item Return to the scheduler, setting @whatNext@ to @EnterHugs@. -\end{itemize} +\ToDo{It might be good for the programmer to indicate how the program is +unsafe. For example, if we distinguish between C functions which might +call Haskell functions and those which might block, we could perform a +safe call for blocking functions in a single-threaded system or, perhaps, in a multi-threaded system which only happens to have a single thread at the moment.} -BCO's for thunks and functions have the same entry conventions as -slow entry points: they expect to find their arguments on the stac -with unboxed arguments preceded by appropriate tags. -\subsection{A GHC thread returns to a Hugs-compiled return address} -\label{sect:ghc-to-hugs-return} +\section{The Evaluators} -Hugs return addresses are laid out as in Figure -\ref{fig:hugs-return-stack}. If GHC is returning, it will return to -the address at the top of the stack, namely @HUGS_RET@. The code at -@HUGS_RET@ performs the following: +All the scheduler needs to know about evaluation is how to manipulate +threads and how to find the closure on top of the stack. The +evaluators need to agree on the representations of certain objects and +on how to return from the scheduler. + +\subsection{Returning to the Scheduler} +\label{sect:switching-worlds} +The evaluators return to the scheduler under three circumstances: \begin{itemize} -\item pushes \Arg{1} (the return value) on the stack. -\item saves the thread state in the TSO -\item returns to the scheduler with @whatNext@ set to @EnterHugs@. -\end{itemize} -\noindent When Hugs runs, it will enter the return value, which will -return using the correct Hugs convention (Section -\ref{sect:hugs-return-convention}) to the return address underneath it -on the stack. +\item -\subsection{A Hugs thread enters a GHC-compiled closure} -\label{sect:hugs-to-ghc-closure} +When they enter a closure built by the other evaluator. That is, when +the bytecode interpreter enters a closure compiled by GHC or when the +machine code evaluator enters a BCO. -Hugs can recognise a GHC-built closure as not being one of the -following types of object: +\item -\begin{itemize} -\item A @BCO@, -\item A @AP@, -\item A @PAP@, -\item An indirection, or -\item A constructor. -\end{itemize} +When they return to a return continuation built by the other +evaluator. That is, when the machine code evaluator returns to a +continuation built by Hugs or when the bytecode evaluator returns to a +continuation built by GHC. -When Hugs is called on to enter a GHC closure, it executes the -following sequence of instructions: +\item + +When a heap or stack check fails or when the preemption flag is set. -\begin{itemize} -\item Push the address of the closure on the stack. -\item Save the current state of the thread in the TSO. -\item Return to the scheduler, with the @whatNext@ field set to -@EnterGHC@. \end{itemize} -\subsection{A Hugs thread returns to a GHC-compiled return address} -\label{sect:hugs-to-ghc-return} +In all cases, they return to the scheduler with a closure on top of +the stack. The mechanism used to trigger the world switch and the +choice of closure left on top of the stack varies according to which +world is being left and what is being returned. -When Hugs encounters a return address on the stack that is not -@HUGS_RET@, it knows that a world-switch is required. At this point -the stack contains a pointer to the return value, followed by the GHC -return address. The following sequence is then performed: +\subsubsection{Leaving the bytecode evaluator} +\label{sect:hugs-to-ghc-switch} -\begin{itemize} -\item save the state of the thread in the TSO. -\item return to the scheduler, setting @whatNext@ to @EnterGHC@. -\end{itemize} +\paragraph{Entering a machine code closure} -The first thing that GHC will do is enter the object on the top of the -stack, which is a pointer to the return value. This value will then -return itself to the return address using the GHC return convention. +When it enters a closure, the bytecode evaluator performs a switch +based on the type of closure (@AP@, @PAP@, @Ind@, etc). On entering a +machine code closure, it returns to the scheduler with the closure on +top of the stack. -\section{The Loader} +\paragraph{Returning a constructor} -\ToDo{Is it ok to load code when threads are running?} +When it enters a constructor, the bytecode evaluator tests the return +continuation on top of the stack. If it is a machine code +continuation, it returns to the scheduler with the constructor on top +of the stack. -In a batch mode system, we can statically link all the modules -together. In an interactive system we need a loader which will -explicitly load and unload individual modules (or, perhaps, blocks of -mutually dependent modules) and resolve references between modules. +\note{This is why the scheduler must enter the machine code evaluator +if it finds a constructor on top of the stack.} -While many operating systems provide support for dynamic loading and -will automatically resolve cross-module references for us, we generally -cannot rely on being able to load mutually dependent modules. +\paragraph{Returning an unboxed value} -A portable solution is to perform some of the linking ourselves. Each module -should provide three global symbols: -\begin{itemize} -\item -An initialisation routine. (Might also be used for finalisation.) -\item -A table of symbols it exports. -Entries in this table consist of the symbol name and the address of the -names value. -\item -A table of symbols it imports. -Entries in this table consist of the symbol name and a list of references -to that symbol. -\end{itemize} +\note{Hugs doesn't support unboxed values in source programs but they +are used for a few complex primops.} -On loading a group of modules, the loader adds the contents of the -export lists to a symbol table and then fills in all the references in the -import lists. +When it enters a constructor, the bytecode evaluator tests the return +continuation on top of the stack. If it is a machine code +continuation, it returns to the scheduler with the unboxed value and a +special closure on top of the stack. When the closure is entered (by +the machine code evaluator), it returns the unboxed value on top of +the stack to the return continuation under it. -References in import lists are of two types: -\begin{description} -\item[ References in machine code ] +The runtime system (or GHC?) provides one of these closures for each +unboxed type. Hugs cannot generate them itself since the entry code is +really very tricky. -The most efficient approach is to patch the machine code directly, but -this will be a lot of work, very painful to port and rather fragile. +\paragraph{Heap/Stack overflow and preemption} -Alternatively, the loader could store the value of each symbol in the -import table for each module and the compiled code can access all -external objects through the import table. This requires that the -import table be writable but does not require that the machine code or -info tables be writable. +The bytecode evaluator tests for heap/stack overflow and preemption +when entering a BCO and simply returns with the BCO on top of the +stack. -\item[ References in data structures (SRTs and static data constructors) ] +\subsubsection{Leaving the machine code evaluator} +\label{sect:ghc-to-hugs-switch} -Either we patch the SRTs and constructors directly or we somehow use -indirections through the symbol table. Patching the SRTs requires -that we make them writable and prevents us from making effective use -of virtual memories that use copy-on-write policies. Using an -indirection is possible but tricky. +\paragraph{Entering a BCO} -Note: We could avoid patching machine code if all references to -eternal references went through the SRT --- then we just have one -thing to patch. But the SRT always contains a pointer to the closure -rather than the fast entry point (say), so we'd take a big performance -hit for doing this. +The entry code for a BCO pushes the BCO onto the stack and returns to +the scheduler. -\end{description} +\paragraph{Returning a constructor} -\section{Compiled Execution} +We avoid the need to test return addresses in the machine code +evaluator by pushing a special return address on top of a pointer to +the bytecode return continuation. Figure~\ref{fig:hugs-return-stack} +shows the state of the stack just before evaluating the scrutinee. -This section describes the framework in which compiled code evaluates -expressions. Only at certain points will compiled code need to be -able to talk to the interpreted world; these are discussed in Section -\ref{sect:switching-worlds}. +\begin{figure}[ht] +\begin{center} +@ +| stack | ++----------+ +| bco |--> BCO ++----------+ +| HUGS_RET | ++----------+ +@ +%\input{hugs_return1.pstex_t} +\end{center} +\caption{Stack layout for evaluating a scrutinee} +\label{fig:hugs-return-stack} +\end{figure} -\subsection{Calling conventions} +This return address rearranges the stack so that the bco pointer is +above the constructor on the stack (as shown in +figure~\ref{fig:hugs-boxed-return}) and returns to the scheduler. -\subsubsection{The call/return registers} +\begin{figure}[ht] +\begin{center} +@ +| stack | ++----------+ +| con |--> Constructor ++----------+ +| bco |--> BCO ++----------+ +@ +%\input{hugs_return2.pstex_t} +\end{center} +\caption{Stack layout for entering a Hugs return address} +\label{fig:hugs-boxed-return} +\end{figure} -One of the problems in designing a virtual machine is that we want it -abstract away from tedious machine details but still reveal enough of -the underlying hardware that we can make sensible decisions about code -generation. A major problem area is the use of registers in -call/return conventions. On a machine with lots of registers, it's -cheaper to pass arguments and results in registers than to pass them -on the stack. On a machine with very few registers, it's cheaper to -pass arguments and results on the stack than to use ``virtual -registers'' in memory. We therefore use a hybrid system: the first -$n$ arguments or results are passed in registers; and the remaining -arguments or results are passed on the stack. For register-poor -architectures, it is important that we allow $n=0$. +\paragraph{Returning an unboxed value} -We'll label the arguments and results \Arg{1} \ldots \Arg{m} --- with -the understanding that \Arg{1} \ldots \Arg{n} are in registers and -\Arg{n+1} \ldots \Arg{m} are on top of the stack. +We avoid the need to test return addresses in the machine code +evaluator by pushing a special return address on top of a pointer to +the bytecode return continuation. This return address rearranges the +stack so that the bco pointer is above the unboxed value (as shown in +figure~\ref{fig:hugs-entering-unboxed-return}) and returns to the scheduler. -Note that the mapping of arguments \Arg{1} \ldots \Arg{n} to machine -registers depends on the {\em kinds} of the arguments. For example, -if the first argument is a Float, we might pass it in a different -register from if it is an Int. In fact, we might find that a given -architecture lets us pass varying numbers of arguments according to -their types. For example, if a CPU has 2 Int registers and 2 Float -registers then we could pass between 2 and 4 arguments in machine -registers --- depending on whether they all have the same kind or they -have different kinds. +\begin{figure}[ht] +\begin{center} +@ +| stack | ++----------+ +| 1# | ++----------+ +| I# | ++----------+ +| bco |--> BCO ++----------+ +@ +%\input{hugs_return2.pstex_t} +\end{center} +\caption{Stack layout for returning an unboxed value} +\label{fig:hugs-entering-unboxed-return} +\end{figure} -\subsubsection{Entering closures} -\label{sect:entering-closures} +\paragraph{Heap/Stack overflow and preemption} -To evaluate a closure we jump to the entry code for the closure -passing a pointer to the closure in \Arg{1} so that the entry code can -access its environment. +\ToDo{} -\subsubsection{Function call} +\subsection{Shared Representations} -The function-call mechanism is obviously crucial. There are five different -cases to consider: -\begin{enumerate} +We share @AP@s, @PAP@s, constructors, indirections, selectors(?) and +update frames. These are described in section~\ref{sect:heap-objects}. -\item {\em Known combinator (function with no free variables) and enough arguments.} -A fast call can be made: push excess arguments onto stack and jump to -function's {\em fast entry point} passing arguments in \Arg{1} \ldots -\Arg{m}. +\section{The Storage Manager} -The {\em fast entry point} is only called with exactly the right -number of arguments (in \Arg{1} \ldots \Arg{m}) so it can instantly -start doing useful work without first testing whether it has enough -registers or having to pop them off the stack first. +The storage manager is responsible for managing the heap and all +objects stored in it. Most objects are just copied in the normal way +but a number receive special treatment by the storage manager: -\item {\em Known combinator and insufficient arguments.} +\begin{itemize} +\item -A slow call can be made: push all arguments onto stack and jump to -function's {\em slow entry point}. +Indirections are shorted out. -Any unpointed arguments which are pushed on the stack must be tagged. -This means pushing an extra word on the stack below the unpointed -words, containing the number of unpointed words above it. +\item -%Todo: forward ref about tagging? -%Todo: picture? +Weak pointers and stable pointers are treated specially. -The {\em slow entry point} might be called with insufficient arguments -and so it must test whether there are enough arguments on the stack. -This {\em argument satisfaction check} consists of checking that -@Su-Sp@ is big enough to hold all the arguments (including any tags). +\item -\begin{itemize} +Thread State Objects (TSOs) and the stacks within them are treated specially. +In particular: -\item If the argument satisfaction check fails, it is because there is -one or more update frames on the stack before the rest of the -arguments that the function needs. In this case, we construct a PAP -(partial application, section~\ref{sect:PAP}) containing the arguments -which are on the stack. The PAP construction code will return to the -update frame with the address of the PAP in \Arg{1}. +\begin{itemize} +\item -\item If the argument satisfaction check succeeds, we jump to the fast -entry point with the arguments in \Arg{1} \ldots \Arg{arity}. +Update frames pointing to unreachable objects are squeezed out. -If the fast entry point expects to receive some of \Arg{i} on the -stack, we can reduce the amount of movement required by making the -stack layout for the fast entry point look like the stack layout for -the slow entry point. Since the slow entry point is entered with the -first argument on the top of the stack and with tags in front of any -unpointed arguments, this means that if \Arg{i} is unpointed, there -should be space below it for a tag and that the highest numbered -argument should be passed on the top of the stack. +\item -We usually arrange that the fast entry point is placed immediately -after the slow entry point --- so we can just ``fall through'' to the -fast entry point without performing a jump. +Adjacent update frames (for different closures) are compressed to a +single update frame pointing to a single black hole. -\end{itemize} +\item +If the stack contains a large amount of free space, the storage +manager may shrink the stack. If it shrinks the stack, it guarantees +never to leave less than @MIN_SIZE_SHRUNKEN_STACK@ empty words on the +stack when it does so. -\item {\em Known function closure (function with free variables) and enough arguments.} +\ToDo{Would it be useful for the storage manager to enlarge the stack?} -A fast call can be made: push excess arguments onto stack and jump to -function's {\em fast entry point} passing a pointer to closure in -\Arg{1} and arguments in \Arg{2} \ldots \Arg{m+1}. +\end{itemize} -Like the fast entry point for a combinator, the fast entry point for a -closure is only called with appropriate values in \Arg{1} \ldots -\Arg{m+1} so we can start work straight away. The pointer to the -closure is used to access the free variables of the closure. +\item +Very large objects (eg large arrays and TSOs) are not moved if +possible. -\item {\em Known function closure and insufficient arguments.} +\end{itemize} -A slow call can be made: push all arguments onto stack and jump to the -closure's slow entry point passing a pointer to the closure in \Arg{1}. -Again, the slow entry point performs an argument satisfaction check -and either builds a PAP or pops the arguments off the stack into -\Arg{2} \ldots \Arg{m+1} and jumps to the fast entry point. +\section{The Compilers} +Need to describe interface files, format of bytecode files, symbols +defined by machine code files. -\item {\em Unknown function closure, thunk or constructor.} +\subsection{Interface Files} -Sometimes, the function being called is not statically identifiable. -Consider, for example, the @compose@ function: +Here's an example - but I don't know the grammar - ADR. @ - compose f g x = f (g x) +_interface_ Main 1 +_exports_ +Main main ; +_declarations_ +1 main _:_ IOBase.IO PrelBase.();; @ -Since @f@ and @g@ are passed as arguments to @compose@, the latter has -to make a heap call. In a heap call the arguments are pushed onto the -stack, and the closure bound to the function is entered. In the -example, a thunk for @(g x)@ will be allocated, (a pointer to it) -pushed on the stack, and the closure bound to @f@ will be -entered. That is, we will jump to @f@s entry point passing @f@ in -\Arg{1}. If \Arg{1} is passed on the stack, it is pushed on top of -the thunk for @(g x)@. -The {\em entry code} for an updateable thunk (which must have arity 0) -pushes an update frame on the stack and starts executing the body of -the closure --- using \Arg{1} to access any free variables. This is -described in more detail in section~\ref{sect:data-updates}. +\subsection{Bytecode files} -The {\em entry code} for a non-updateable closure is just the -closure's slow entry point. +(All that matters here is what the loader sees.) -\end{enumerate} +\subsection{Machine code files} -In addition to the above considerations, if there are \emph{too many} -arguments then the extra arguments are simply pushed on the stack with -appropriate tags. +(Again, all that matters is what the loader sees.) -To summarise, a closure's standard (slow) entry point performs the following: -\begin{description} -\item[Argument satisfaction check.] (function closure only) -\item[Stack overflow check.] -\item[Heap overflow check.] -\item[Copy free variables out of closure.] %Todo: why? -\item[Eager black holing.] (updateable thunk only) %Todo: forward ref. -\item[Push update frame.] -\item[Evaluate body of closure.] -\end{description} +\subsection{Bytecode files} -\subsection{Case expressions and return conventions} -\label{sect:return-conventions} +(All that matters here is what the loader sees.) -The {\em evaluation} of a thunk is always initiated by -a @case@ expression. For example: -@ - case x of (a,b) -> E -@ +\subsection{Machine code files} -The code for a @case@ expression looks like this: +(Again, all that matters is what the loader sees.) -\begin{itemize} -\item Push the free variables of the branches on the stack (fv(@E@) in -this case). -\item Push a \emph{return address} on the stack. -\item Evaluate the scrutinee (@x@ in this case). -\end{itemize} -Once evaluation of the scrutinee is complete, execution resumes at the -return address, which points to the code for the expression @E@. +\section{The Loader} -When execution resumes at the return point, there must be some {\em -return convention} that defines where the components of the pair, @a@ -and @b@, can be found. The return convention varies according to the -type of the scrutinee @x@: +\ToDo{Is it ok to load code when threads are running?} -\begin{itemize} +In a batch mode system, we can statically link all the modules +together. In an interactive system we need a loader which will +explicitly load and unload individual modules (or, perhaps, blocks of +mutually dependent modules) and resolve references between modules. -\item +While many operating systems provide support for dynamic loading and +will automatically resolve cross-module references for us, we generally +cannot rely on being able to load mutually dependent modules. -(A space for) the return address is left on the top of the stack. -Leaving the return address on the stack ensures that the top of the -stack contains a valid activation record -(section~\ref{sect:activation-records}) --- should a garbage collection -be required. +A portable solution is to perform some of the linking ourselves. Each module +should provide three global symbols: +\begin{itemize} +\item +An initialisation routine. (Might also be used for finalisation.) +\item +A table of symbols it exports. +Entries in this table consist of the symbol name and the address of the +names value. +\item +A table of symbols it imports. +Entries in this table consist of the symbol name and a list of references +to that symbol. +\end{itemize} -\item If @x@ has a boxed type (e.g.~a data constructor or a function), -a pointer to @x@ is returned in \Arg{1}. - -\ToDo{Warn that components of E should be extracted as soon as -possible to avoid a space leak.} - -\item If @x@ is an unboxed type (e.g.~@Int#@ or @Float#@), @x@ is -returned in \Arg{1} +On loading a group of modules, the loader adds the contents of the +export lists to a symbol table and then fills in all the references in the +import lists. -\item If @x@ is an unboxed tuple constructor, the components of @x@ -are returned in \Arg{1} \ldots \Arg{n} but no object is constructed in -the heap. +References in import lists are of two types: +\begin{description} +\item[ References in machine code ] -When passing an unboxed tuple to a function, the components are -flattened out and passed in \Arg{1} \ldots \Arg{n} as usual. +The most efficient approach is to patch the machine code directly, but +this will be a lot of work, very painful to port and rather fragile. -\end{itemize} +Alternatively, the loader could store the value of each symbol in the +import table for each module and the compiled code can access all +external objects through the import table. This requires that the +import table be writable but does not require that the machine code or +info tables be writable. -\subsection{Vectored Returns} +\item[ References in data structures (SRTs and static data constructors) ] -Many algebraic data types have more than one constructor. For -example, the @Maybe@ type is defined like this: -@ - data Maybe a = Nothing | Just a -@ -How does the return convention encode which of the two constructors is -being returned? A @case@ expression scrutinising a value of @Maybe@ -type would look like this: -@ - case E of - Nothing -> ... - Just a -> ... -@ -Rather than pushing a return address before evaluating the scrutinee, -@E@, the @case@ expression pushes (a pointer to) a {\em return -vector}, a static table consisting of two code pointers: one for the -@Just@ alternative, and one for the @Nothing@ alternative. +Either we patch the SRTs and constructors directly or we somehow use +indirections through the symbol table. Patching the SRTs requires +that we make them writable and prevents us from making effective use +of virtual memories that use copy-on-write policies. Using an +indirection is possible but tricky. -\begin{itemize} +Note: We could avoid patching machine code if all references to +eternal references went through the SRT --- then we just have one +thing to patch. But the SRT always contains a pointer to the closure +rather than the fast entry point (say), so we'd take a big performance +hit for doing this. -\item +\end{description} -The constructor @Nothing@ returns by jumping to the first item in the -return vector with a pointer to a (statically built) Nothing closure -in \Arg{1}. +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\part{Internal details} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -It might seem that we could avoid loading \Arg{1} in this case since the -first item in the return vector will know that @Nothing@ was returned -(and can easily access the Nothing closure in the (unlikely) event -that it needs it. The only reason we load \Arg{1} is in case we have to -perform an update (section~\ref{sect:data-updates}). +This part is concerned with the internal details of the components +described in the previous part. -\item +The major components of the system are: +\begin{itemize} +\item The scheduler +\item The storage manager +\item The evaluators +\item The loader +\item The compilers +\end{itemize} -The constructor @Just@ returns by jumping to the second element of the -return vector with a pointer to the closure in \Arg{1}. +\section{The Scheduler} -\end{itemize} +\section{The Storage Manager} +\label{sect:storage-manager-internals} -In this way no test need be made to see which constructor returns; -instead, execution resumes immediately in the appropriate branch of -the @case@. +\ToDo{Fix this picture} -\subsection{Direct Returns} +\begin{figure} +\begin{center} +\input{closure} +\end{center} +\caption{A closure} +\label{fig:closure} +\end{figure} -When a datatype has a large number of constructors, it may be -inappropriate to use vectored returns. The vector tables may be -large and sparse, and it may be better to identify the constructor -using a test-and-branch sequence on the tag. For this reason, we -provide an alternative return convention, called a \emph{direct -return}. +Every {\em heap object} is a contiguous block +of memory, consisting of a fixed-format {\em header} followed +by zero or more {\em data words}. -In a direct return, the return address pushed on the stack really is a -code pointer. The returning code loads a pointer to the closure being -returned in \Arg{1} as usual, and also loads the tag into \Arg{2}. -The code at the return address will test the tag and jump to the -appropriate code for the case branch. +\ToDo{I absolutely do not believe that every heap object has a header +like this - ADR. I believe that they all have an info pointer but I +see no readon why stack objects and unpointed heap objects would have +an entry count since this will always be zero.} -The choice of whether to use a vectored return or a direct return is -made on a type-by-type basis --- up to a certain maximum number of -constructors imposed by the update mechanism -(section~\ref{sect:data-updates}). +The header consists of the following fields: +\begin{itemize} +\item A one-word {\em info pointer}, which points to +the object's static {\em info table}. +\item Zero or more {\em admin words} that support +\begin{itemize} +\item Profiling (notably a {\em cost centre} word). + \note{We could possibly omit the cost centre word from some + administrative objects.} +\item Parallelism (e.g. GranSim keeps the object's global address here, +though GUM keeps a separate hash table). +\item Statistics (e.g. a word to track how many times a thunk is entered.). -Single-constructor data types also use direct returns, although in -that case there is no need to return a tag in \Arg{2}. +We add a Ticky word to the fixed-header part of closures. This is +used to indicate if a closure has been updated but not yet entered. It +is set when the closure is updated and cleared when subsequently +entered. -\ToDo{Say whether we pop the return address before returning} +NB: It is {\em not} an ``entry count'', it is an +``entries-after-update count.'' The commoning up of @CONST@, +@CHARLIKE@ and @INTLIKE@ closures is turned off(?) if this is +required. This has only been done for 2s collection. -\ToDo{Stack stubbing?} +\end{itemize} +\end{itemize} -\subsection{Updates} -\label{sect:data-updates} +Most of the RTS is completely insensitive to the number of admin words. +The total size of the fixed header is @FIXED_HS@. -The entry code for an updatable thunk (which must be of arity 0): +Many heap objects contain fields allowing them to be inserted onto lists +during evaluation or during garbage collection. The lists required by +the evaluator and storage manager are as follows. \begin{itemize} -\item copies the free variables out of the thunk into registers or - onto the stack. -\item pushes an {\em update frame} onto the stack. +\item 2 lists of threads: runnable threads and sleeping threads. -An update frame is a small activation record consisting of -\begin{center} -\begin{tabular}{|l|l|l|} -\hline -{\em Fixed header} & {\em Update Frame link} & {\em Updatee} \\ -\hline -\end{tabular} -\end{center} +\item The {\em static object list} is a list of all statically +allocated objects which might contain pointers into the heap. +(Section~\ref{sect:static-objects}.) -\note{In the semantics part of the STG paper (section 5.6), an update -frame consists of everything down to the last update frame on the -stack. This would make sense too --- and would fit in nicely with -what we're going to do when we add support for speculative -evaluation.} -\ToDo{I think update frames contain cost centres sometimes} +\item The {\em updated thunk list} is a list of all thunks in the old +generation which have been updated with an indirection. +(Section~\ref{sect:IND_OLDGEN}.) -\item -If we are doing ``eager blackholing,'' we then overwrite the thunk -with a black hole. Otherwise, we leave it to the garbage collector to -black hole the thunk. +\item The {\em mutables list} is a list of all other objects in the +old generation which might contain pointers into the new generation. +Most of the object on this list are ``mutable.'' +(Section~\ref{sect:mutables}.) -\item -Start evaluating the body of the expression. +\item The {\em Foreign Object list} is a list of all foreign objects + which have not yet been deallocated. (Section~\ref{sect:FOREIGN}.) -\end{itemize} +\item The {\em Spark pool} is a doubly(?) linked list of Spark objects +maintained by the parallel system. (Section~\ref{sect:SPARK}.) -When the expression finishes evaluation, it will enter the update -frame on the top of the stack. Since the returner doesn't know -whether it is entering a normal return address/vector or an update -frame, we follow exactly the same conventions as return addresses and -return vectors. That is, on entering the update frame: +\item The {\em Blocked Fetch list} (or +lists?). (Section~\ref{sect:BLOCKED_FETCH}.) -\begin{itemize} -\item The value of the thunk is in \Arg{1}. (Recall that only thunks -are updateable and that thunks return just one value.) +\item For each thread, there is a list of all update frames on the +stack. (Section~\ref{sect:data-updates}.) -\item If the data type is a direct-return type rather than a -vectored-return type, then the tag is in \Arg{2}. -\item The update frame is still on the stack. \end{itemize} -We can safely share a single statically-compiled update function -between all types. However, the code must be able to handle both -vectored and direct-return datatypes. This is done by arranging that -the update code looks like this: +\ToDo{The links for these fields are usually inserted immediately +after the fixed header except ...} -@ - | ^ | - | return vector | - |---------------| - | fixed-size | - | info table | - |---------------| <- update code pointer - | update code | - | v | -@ +\subsection{Info Tables} -Each entry in the return vector (which is large enough to cover the -largest vectored-return type) points to the update code. +An {\em info table} is a contiguous block of memory, {\em laid out +backwards}. That is, the first field in the list that follows +occupies the highest memory address, and the successive fields occupy +successive decreasing memory addresses. -The update code: +\begin{center} +\begin{tabular}{|c|} + \hline Parallelism Info +\\ \hline Profile Info +\\ \hline Debug Info +\\ \hline Tag / Static reference table +\\ \hline Storage manager layout info +\\ \hline Closure type +\\ \hline entry code +\\ \vdots +\end{tabular} +\end{center} +An info table has the following contents (working backwards in memory +addresses): \begin{itemize} -\item overwrites the {\em updatee} with an indirection to \Arg{1}; -\item loads @Su@ from the Update Frame link; -\item removes the update frame from the stack; and -\item enters \Arg{1}. -\end{itemize} - -We enter \Arg{1} again, having probably just come from there, because -it knows whether to perform a direct or vectored return. This could -be optimised by compiling special update code for each slot in the -return vector, which performs the correct return. +\item The {\em entry code} for the closure. +This code appears literally as the (large) last entry in the +info table, immediately preceded by the rest of the info table. +An {\em info pointer} always points to the first byte of the entry code. -\subsection{Semi-tagging} -\label{sect:semi-tagging} +\item A one-word {\em closure type field}, @INFO_TYPE@, identifies what kind +of closure the object is. The various types of closure are described +in Section~\ref{sect:closures}. +In some configurations, some useful properties of +closures (is it a HNF? can it be sparked?) +are represented as high-order bits so they can be tested quickly. + +\item A single pointer or word --- the {\em storage manager info field}, +@INFO_SM@, contains auxiliary information describing the closure's +precise layout, for the benefit of the garbage collector and the code +that stuffs graph into packets for transmission over the network. + +\item A one-word {\em Tag/Static Reference Table} field, @INFO_SRT@. +For data constructors, this field contains the constructor tag, in the +range $0..n-1$ where $n$ is the number of constructors. For all other +objects it contains a pointer to a table which enables the garbage +collector to identify all accessible code and CAFs. They are fully +described in Section~\ref{sect:srt}. + +\item {\em Profiling info\/} + +\ToDo{The profiling info is completely bogus. I've not deleted it +from the document but I've commented it all out.} + +% change to \iftrue to uncomment this section +\iffalse + +Closure category records are attached to the info table of the +closure. They are declared with the info table. We put pointers to +these ClCat things in info tables. We need these ClCat things because +they are mutable, whereas info tables are immutable. Hashing will map +similar categories to the same hash value allowing statistics to be +grouped by closure category. + +Cost Centres and Closure Categories are hashed to provide indexes +against which arbitrary information can be stored. These indexes are +memoised in the appropriate cost centre or category record and +subsequent hashes avoided by the index routine (it simply returns the +memoised index). + +There are different features which can be hashed allowing information +to be stored for different groupings. Cost centres have the cost +centre recorded (using the pointer), module and group. Closure +categories have the closure description and the type +description. Records with the same feature will be hashed to the same +index value. + +The initialisation routines, @init_index_@, allocate a hash +table in which the cost centre / category records are stored. The +lower bound for the table size is taken from @max__no@. They +return the actual table size used (the next power of 2). Unused +locations in the hash table are indicated by a 0 entry. Successive +@init_index_@ calls just return the actual table size. + +Calls to @index_@ will insert the cost centre / category +record in the @@ hash table, if not already inserted. The hash +index is memoised in the record and returned. + +CURRENTLY ONLY ONE MEMOISATION SLOT IS AVILABLE IN EACH RECORD SO +HASHING CAN ONLY BE DONE ON ONE FEATURE FOR EACH RECORD. This can be +easily relaxed at the expense of extra memoisation space or continued +rehashing. + +The initialisation routines must be called before initialisation of +the stacks and heap as they require to allocate storage. It is also +expected that the caller may want to allocate additional storage in +which to store profiling information based on the return table size +value(s). + +\begin{center} +\begin{tabular}{|l|} + \hline Hash Index +\\ \hline Selected +\\ \hline Kind +\\ \hline Description String +\\ \hline Type String +\\ \hline +\end{tabular} +\end{center} + +\begin{description} +\item[Hash Index] Memoised copy +\item[Selected] + Is this category selected (-1 == not memoised, selected? 0 or 1) +\item[Kind] +One of the following values (defined in CostCentre.lh): + +\begin{description} +\item[@CON_K@] +A constructor. +\item[@FN_K@] +A literal function. +\item[@PAP_K@] +A partial application. +\item[@THK_K@] +A thunk, or suspension. +\item[@BH_K@] +A black hole. +\item[@ARR_K@] +An array. +\item[@ForeignObj_K@] +A Foreign object (non-Haskell heap resident). +\item[@SPT_K@] +The Stable Pointer table. (There should only be one of these but it +represents a form of weak space leak since it can't shrink to meet +non-demand so it may be worth watching separately? ADR) +\item[@INTERNAL_KIND@] +Something internal to the runtime system. +\end{description} + + +\item[Description] Source derived string detailing closure description. +\item[Type] Source derived string detailing closure type. +\end{description} + +\fi % end of commented out stuff + +\item {\em Parallelism info\/} +\ToDo{} + +\item {\em Debugging info\/} +\ToDo{} + +\end{itemize} + + +%----------------------------------------------------------------------------- +\subsection{Kinds of Heap Object} +\label{sect:closures} + +Heap objects can be classified in several ways, but one useful one is +this: +\begin{itemize} +\item +{\em Static closures} occupy fixed, statically-allocated memory +locations, with globally known addresses. + +\item +{\em Dynamic closures} are individually allocated in the heap. + +\item +{\em Stack closures} are closures allocated within a thread's stack +(which is itself a heap object). Unlike other closures, there are +never any pointers to stack closures. Stack closures are discussed in +Section~\ref{sect:stacks}. + +\end{itemize} +A second useful classification is this: +\begin{itemize} +\item +{\em Executive objects}, such as thunks and data constructors, +participate directly in a program's execution. They can be subdivided into +three kinds of objects according to their type: +\begin{itemize} +\item +{\em Pointed objects}, represent values of a {\em pointed} type +(<.pointed types launchbury.>) --i.e.~a type that includes $\bottom$ such as @Int@ or @Int# -> Int#@. + +\item {\em Unpointed objects}, represent values of a {\em unpointed} type --i.e.~a type that does not include $\bottom$ such as @Int#@ or @Array#@. + +\item {\em Activation frames}, represent ``continuations''. They are +always stored on the stack and are never pointed to by heap objects or +passed as arguments. \note{It's not clear if this will still be true +once we support speculative evaluation.} + +\end{itemize} + +\item {\em Administrative objects}, such as stack objects and thread +state objects, do not represent values in the original program. +\end{itemize} + +Only pointed objects can be entered. All pointed objects share a +common header format: the ``pointed header''; while all unpointed +objects share a common header format: the ``unpointed header''. +\ToDo{Describe the difference and update the diagrams to mention +an appropriate header type.} + +This section enumerates all the kinds of heap objects in the system. +Each is identified by a distinct @INFO_TYPE@ tag in its info table. + +\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|} +\hline + +closure kind & Section \\ + +\hline +{\em Pointed} \\ +\hline + +@CONSTR@ & \ref{sect:CONSTR} \\ +@CONSTR_STATIC@ & \ref{sect:CONSTR} \\ +@CONSTR_STATIC_NOCAF@ & \ref{sect:CONSTR} \\ + +@FUN@ & \ref{sect:FUN} \\ +@FUN_STATIC@ & \ref{sect:FUN} \\ + +@THUNK@ & \ref{sect:THUNK} \\ +@THUNK_STATIC@ & \ref{sect:THUNK} \\ +@THUNK_SELECTOR@ & \ref{sect:THUNK_SEL} \\ + +@BCO@ & \ref{sect:BCO} \\ +@BCO_CAF@ & \ref{sect:BCO} \\ + +@AP@ & \ref{sect:AP} \\ +@PAP@ & \ref{sect:PAP} \\ + +@IND@ & \ref{sect:IND} \\ +@IND_OLDGEN@ & \ref{sect:IND} \\ +@IND_PERM@ & \ref{sect:IND} \\ +@IND_OLDGEN_PERM@ & \ref{sect:IND} \\ +@IND_STATIC@ & \ref{sect:IND} \\ + +\hline +{\em Unpointed} \\ +\hline + +@ARR_WORDS@ & \ref{sect:ARR_WORDS1},\ref{sect:ARR_WORDS2} \\ +@ARR_PTRS@ & \ref{sect:ARR_PTRS} \\ +@MUTVAR@ & \ref{sect:MUTVAR} \\ +@MUTARR_PTRS@ & \ref{sect:MUTARR_PTRS} \\ +@MUTARR_PTRS_FROZEN@ & \ref{sect:MUTARR_PTRS_FROZEN} \\ + +@FOREIGN@ & \ref{sect:FOREIGN} \\ + +@BH@ & \ref{sect:BH} \\ +@MVAR@ & \ref{sect:MVAR} \\ +@IVAR@ & \ref{sect:IVAR} \\ +@FETCHME@ & \ref{sect:FETCHME} \\ +\hline +\end{tabular} + +Activation frames do not live (directly) on the heap --- but they have +a similar organisation. + +\begin{tabular}{|l|l|}\hline +closure kind & Section \\ \hline +@RET_SMALL@ & \ref{sect:activation-records} \\ +@RET_VEC_SMALL@ & \ref{sect:activation-records} \\ +@RET_BIG@ & \ref{sect:activation-records} \\ +@RET_VEC_BIG@ & \ref{sect:activation-records} \\ +@UPDATE_FRAME@ & \ref{sect:activation-records} \\ +\hline +\end{tabular} + +There are also a number of administrative objects. + +\begin{tabular}{|l|l|}\hline +closure kind & Section \\ \hline +@TSO@ & \ref{sect:TSO} \\ +@STACK_OBJECT@ & \ref{sect:STACK_OBJECT} \\ +@STABLEPTR_TABLE@ & \ref{sect:STABLEPTR_TABLE} \\ +@SPARK_OBJECT@ & \ref{sect:SPARK} \\ +@BLOCKED_FETCH@ & \ref{sect:BLOCKED_FETCH} \\ +\hline +\end{tabular} + +\ToDo{I guess the parallel system has something like a stable ptr +table. Is there any opportunity for sharing code/data structures +here?} + + +\subsection{Predicates} + +\ToDo{The following is a first attempt at defining a useful set of +predicates. Some (such as @isWHNF@ and @isSparkable@) may need their +definitions tweaked a little.} + +The runtime system sometimes needs to be able to distinguish objects +according to their properties: is the object updateable? is it in weak +head normal form? etc. These questions can be answered by examining +the @INFO_TYPE@ field of the object's info table. + +We define the following predicates to detect families of related +info types. They are mutually exclusive and exhaustive. + +\begin{itemize} +\item @isCONSTR@ is true for @CONSTR@s. +\item @isFUN@ is true for @FUN@s. +\item @isTHUNK@ is true for @THUNK@s. +\item @isBCO@ is true for @BCO@s. +\item @isAP@ is true for @AP@s. +\item @isPAP@ is true for @PAP@s. +\item @isINDIRECTION@ is true for indirection objects. +\item @isBH@ is true for black holes. +\item @isFOREIGN_OBJECT@ is true for foreign objects. +\item @isARRAY@ is true for array objects. +\item @isMVAR@ is true for @MVAR@s. +\item @isIVAR@ is true for @IVAR@s. +\item @isFETCHME@ is true for @FETCHME@s. +\item @isSLOP@ is true for slop objects. +\item @isRET_ADDR@ is true for return addresses. +\item @isUPD_ADDR@ is true for update frames. +\item @isTSO@ is true for @TSO@s. +\item @isSTABLE_PTR_TABLE@ is true for the stable pointer table. +\item @isSPARK_OBJECT@ is true for spark objects. +\item @isBLOCKED_FETCH@ is true for blocked fetch objects. +\item @isINVALID_INFOTYPE@ is true for all other info types. + +\end{itemize} + +The following predicates detect other interesting properties: + +\begin{itemize} + +\item @isPOINTED@ is true if an object has a pointed type. + +If an object is pointed, the following predicates may be true +(otherwise they are false). @isWHNF@ and @isUPDATEABLE@ are +mutually exclusive. + +\begin{itemize} +\item @isWHNF@ is true if the object is in Weak Head Normal Form. +Note that unpointed objects are (arbitrarily) not considered to be in WHNF. + +@isWHNF@ is true for @PAP@s, @CONSTR@s, @FUN@s and some @BCO@s. + +\ToDo{Need to distinguish between whnf BCOs and non-whnf BCOs in their +@INFO_TYPE@} + +\item @isBOTTOM@ is true if the object is known to be $\bot$. It is +true of @BH@s. \note{I suspect we'll want to add other kinds of +infotype which are known to be bottom later.} + +\item @isUPDATEABLE@ is true if the object may be overwritten with an + indirection object. + +@isUPDATEABLE@ is true for @THUNK@s, @AP@s and @BH@s. + +\end{itemize} + +It is possible for a pointed object to be neither updatable nor in +WHNF. For example, indirections. + +\item @isUNPOINTED@ is true if an object has an unpointed type. +All such objects are boxed since only boxed objects have info pointers. + +It is true for @ARR_WORDS@, @ARR_PTRS@, @MUTVAR@, @MUTARR_PTRS@, +@MUTARR_PTRS_FROZEN@, @FOREIGN@ objects, @MVAR@s and @IVAR@s. + +\item @isACTIVATION_FRAME@ is true for activation frames of all sorts. + +It is true for return addresses and update frames. +\begin{itemize} +\item @isVECTORED_RETADDR@ is true for vectored return addresses. +\item @isDIRECT_RETADDR@ is true for direct return addresses. +\end{itemize} + +\item @isADMINISTRATIVE@ is true for administrative objects: +@TSO@s, the stable pointer table, spark objects and blocked fetches. + +\end{itemize} + +\begin{itemize} + +\item @isSTATIC@ is true for any statically allocated closure. + +\item @isMUTABLE@ is true for objects with mutable pointer fields: + @MUT_ARR@s, @MUTVAR@s, @MVAR@s and @IVAR@s. + +\item @isSparkable@ is true if the object can (and should) be sparked. +It is true of updateable objects which are not in WHNF with the +exception of @THUNK_SELECTOR@s and black holes. + +\end{itemize} + +As a minor optimisation, we might use the top bits of the @INFO_TYPE@ +field to ``cache'' the answers to some of these predicates. + +An indirection either points to HNF (post update); or is result of +overwriting a FetchMe, in which case the thing fetched is either +under evaluation (BH), or by now an HNF. Thus, indirections get NoSpark flag. + + +\iffalse +@ +#define _NF 0x0001 /* Normal form */ +#define _NS 0x0002 /* Don't spark */ +#define _ST 0x0004 /* Is static */ +#define _MU 0x0008 /* Is mutable */ +#define _UP 0x0010 /* Is updatable (but not mutable) */ +#define _BM 0x0020 /* Is a "primitive" array */ +#define _BH 0x0040 /* Is a black hole */ +#define _IN 0x0080 /* Is an indirection */ +#define _TH 0x0100 /* Is a thunk */ + + + +SPEC +SPEC_N SPEC | _NF | _NS +SPEC_S SPEC | _TH +SPEC_U SPEC | _UP | _TH + +GEN +GEN_N GEN | _NF | _NS +GEN_S GEN | _TH +GEN_U GEN | _UP | _TH + +DYN _NF | _NS +TUPLE _NF | _NS | _BM +DATA _NF | _NS | _BM +MUTUPLE _NF | _NS | _MU | _BM +IMMUTUPLE _NF | _NS | _BM +STATIC _NS | _ST +CONST _NF | _NS +CHARLIKE _NF | _NS +INTLIKE _NF | _NS + +BH _NS | _BH +BH_N BH +BH_U BH | _UP + +BQ _NS | _MU | _BH +IND _NS | _IN +CAF _NF | _NS | _ST | _IN + +FM +FETCHME FM | _MU +FMBQ FM | _MU | _BH + +TSO _MU + +STKO +STKO_DYNAMIC STKO | _MU +STKO_STATIC STKO | _ST + +SPEC_RBH _NS | _MU | _BH +GEN_RBH _NS | _MU | _BH +BF _NS | _MU | _BH +INTERNAL + +@ +\fi + + +\subsection{Pointed Objects} + +All pointed objects can be entered. + +\subsubsection{Function closures}\label{sect:FUN} + +Function closures represent lambda abstractions. For example, +consider the top-level declaration: +@ + f = \x -> let g = \y -> x+y + in g x +@ +Both @f@ and @g@ are represented by function closures. The closure +for @f@ is {\em static} while that for @g@ is {\em dynamic}. + +The layout of a function closure is as follows: +\begin{center} +\begin{tabular}{|l|l|l|l|}\hline +{\em Fixed header} & {\em Pointers} & {\em Non-pointers} \\ \hline +\end{tabular} +\end{center} +The data words (pointers and non-pointers) are the free variables of +the function closure. +The number of pointers +and number of non-pointers are stored in the @INFO_SM@ word, in the least significant +and most significant half-word respectively. + +There are several different sorts of function closure, distinguished +by their @INFO_TYPE@ field: +\begin{itemize} +\item @FUN@: a vanilla, dynamically allocated on the heap. + +\item $@FUN_@p@_@np$: to speed up garbage collection a number of +specialised forms of @FUN@ are provided, for particular $(p,np)$ pairs, +where $p$ is the number of pointers and $np$ the number of non-pointers. + +\item @FUN_STATIC@. Top-level, static, function closures (such as +@f@ above) have a different +layout than dynamic ones: +\begin{center} +\begin{tabular}{|l|l|l|}\hline +{\em Fixed header} & {\em Static object link} \\ \hline +\end{tabular} +\end{center} +Static function closures have no free variables. (However they may refer to other +static closures; these references are recorded in the function closure's SRT.) +They have one field that is not present in dynamic closures, the {\em static object +link} field. This is used by the garbage collector in the same way that to-space +is, to gather closures that have been determined to be live but that have not yet +been scavenged. +\note{Static function closures that have no static references, and hence +a null SRT pointer, don't need the static object link field. Is it worth +taking advantage of this? See @CONSTR_STATIC_NOCAF@.} +\end{itemize} + +Each lambda abstraction, $f$, in the STG program has its own private info table. +The following labels are relevant: +\begin{itemize} +\item $f$@_info@ is $f$'s info table. +\item $f$@_entry@ is $f$'s slow entry point (i.e. the entry code of its +info table; so it will label the same byte as $f$@_info@). +\item $f@_fast_@k$ is $f$'s fast entry point. $k$ is the number of arguments +$f$ takes; encoding this number in the fast-entry label occasionally catches some nasty +code-generation errors. +\end{itemize} + +\subsubsection{Data Constructors}\label{sect:CONSTR} + +Data-constructor closures represent values constructed with +algebraic data type constructors. +The general layout of data constructors is the same as that for function +closures. That is +\begin{center} +\begin{tabular}{|l|l|l|l|}\hline +{\em Fixed header} & {\em Pointers} & {\em Non-pointers} \\ \hline +\end{tabular} +\end{center} + +The SRT pointer in a data constructor's info table is used for the +constructor tag, since a constructor never has any static references. + +There are several different sorts of constructor: +\begin{itemize} +\item @CONSTR@: a vanilla, dynamically allocated constructor. +\item @CONSTR_@$p$@_@$np$: just like $@FUN_@p@_@np$. +\item @CONSTR_INTLIKE@. +A dynamically-allocated heap object that looks just like an @Int@. The +garbage collector checks to see if it can common it up with one of a fixed +set of static int-like closures, thus getting it out of the dynamic heap +altogether. + +\item @CONSTR_CHARLIKE@: same deal, but for @Char@. + +\item @CONSTR_STATIC@ is similar to @FUN_STATIC@, with the complication that +the layout of the constructor must mimic that of a dynamic constructor, +because a static constructor might be returned to some code that unpacks it. +So its layout is like this: +\begin{center} +\begin{tabular}{|l|l|l|l|l|}\hline +{\em Fixed header} & {\em Pointers} & {\em Non-pointers} & {\em Static object link}\\ \hline +\end{tabular} +\end{center} +The static object link, at the end of the closure, serves the same purpose +as that for @FUN_STATIC@. The pointers in the static constructor can point +only to other static closures. + +The static object link occurs last in the closure so that static +constructors can store their data fields in exactly the same place as +dynamic constructors. + +\item @CONSTR_STATIC_NOCAF@. A statically allocated data constructor +that guarantees not to point (directly or indirectly) to any CAF +(section~\ref{sect:CAF}). This means it does not need a static object +link field. Since we expect that there might be quite a lot of static +constructors this optimisation makes sense. Furthermore, the @NOCAF@ +tag allows the compiler to indicate that no CAFs can be reached +anywhere {\em even indirectly}. + + +\end{itemize} + +For each data constructor $Con$, two +info tables are generated: +\begin{itemize} +\item $Con$@_info@ labels $Con$'s dynamic info table, +shared by all dynamic instances of the constructor. +\item $Con$@_static@ labels $Con$'s static info table, +shared by all static instances of the constructor. +\end{itemize} + +\subsubsection{Thunks} +\label{sect:THUNK} +\label{sect:THUNK_SEL} + +A thunk represents an expression that is not obviously in head normal +form. For example, consider the following top-level definitions: +@ + range = between 1 10 + f = \x -> let ys = take x range + in sum ys +@ +Here the right-hand sides of @range@ and @ys@ are both thunks; the former +is static while the latter is dynamic. + +The layout of a thunk is the same as that for a function closure, +except that it may have some words of ``slop'' at the end to make sure +that it has +at least @MIN_UPD_PAYLOAD@ words in addition to its +fixed header. +\begin{center} +\begin{tabular}{|l|l|l|l|l|}\hline +{\em Fixed header} & {\em Pointers} & {\em Non-pointers} & {\em Slop} \\ \hline +\end{tabular} +\end{center} +The @INFO_SM@ word contains the same information as for function +closures; that is, number of pointers and number of non-pointers (excluding slop). + +A thunk differs from a function closure in that it can be updated. + +There are several forms of thunk: +\begin{itemize} +\item @THUNK@: a vanilla, dynamically allocated thunk. +The garbage collection code for thunks whose +pointer + non-pointer words is less than @MIN_UPD_PAYLOAD@ differs from +that for function closures and data constructors, because the GC code +has to account for the slop. +\item $@THUNK_@p@_@np$. Similar comments apply. +\item @THUNK_STATIC@. A static thunk is also known as +a {\em constant applicative form}, or {\em CAF}. + +\begin{center} +\begin{tabular}{|l|l|l|l|l|}\hline +{\em Fixed header} & {\em Pointers} & {\em Non-pointers} & {\em Slop} & {\em Static object link}\\ \hline +\end{tabular} +\end{center} + +\item @THUNK_SELECTOR@ is a (dynamically allocated) thunk +whose entry code performs a simple selection operation from +a data constructor drawn from a single-constructor type. For example, +the thunk +@ + x = case y of (a,b) -> a +@ +is a selector thunk. A selector thunk is laid out like this: +\begin{center} +\begin{tabular}{|l|l|l|l|}\hline +{\em Fixed header} & {\em Selectee pointer} \\ \hline +\end{tabular} +\end{center} +The @INFO_SM@ word contains the byte offset of the desired word in +the selectee. Note that this is different from all other thunks. + +The garbage collector ``peeks'' at the selectee's +tag (in its info table). If it is evaluated, then it goes ahead and do +the selection, and then behaves just as if the selector thunk was an +indirection to the selected field. +If it is not +evaluated, it treats the selector thunk like any other thunk of that +shape. [Implementation notes. +Copying: only the evacuate routine needs to be special. +Compacting: only the PRStart (marking) routine needs to be special.] +\end{itemize} + + +The only label associated with a thunk is its info table: +\begin{description} +\item[$f$@_info@] is $f$'s info table. +\end{description} + + +\subsubsection{Byte-Code Objects} +\label{sect:BCO} + +A Byte-Code Object (BCO) is a container for a a chunk of byte-code, +which can be executed by Hugs. The byte-code represents a +supercombinator in the program: when hugs compiles a module, it +performs lambda lifting and each resulting supercombinator becomes a +byte-code object in the heap. + +There are two kinds of BCO: a standard @BCO@ which has an arity of one +or more, and a @BCO_CAF@ which takes no arguments and can be updated. +When a @BCO_CAF@ is updated, the code is thrown away! + +The semantics of BCOs are described in Section +\ref{sect:hugs-heap-objects}. A BCO has the following structure: + +\begin{center} +\begin{tabular}{|l|l|l|l|l|l|} +\hline +\emph{Fixed Header} & \emph{Layout} & \emph{Offset} & \emph{Size} & +\emph{Literals} & \emph{Byte code} \\ +\hline +\end{tabular} +\end{center} + +\noindent where: +\begin{itemize} +\item The entry code is a static code fragment/info table that +returns to the scheduler to invoke Hugs (Section +\ref{sect:ghc-to-hugs-switch}). +\item \emph{Layout} contains the number of pointer literals in the +\emph{Literals} field. +\item \emph{Offset} is the offset to the byte code from the start of +the object. +\item \emph{Size} is the number of words of byte code in the object. +\item \emph{Literals} contains any pointer and non-pointer literals used in +the byte-codes (including jump addresses), pointers first. +\item \emph{Byte code} contains \emph{Size} words of non-pointer byte +code. +\end{itemize} + +\subsubsection{Partial applications (PAPs)}\label{sect:PAP} + +A partial application (PAP) represents a function applied to too few arguments. +It is only built as a result of updating after an argument-satisfaction +check failure. A PAP has the following shape: +\begin{center} +\begin{tabular}{|l|l|l|l|}\hline +{\em Fixed header} & {\em No of arg words} & {\em Function closure} & {\em Arg stack} \\ \hline +\end{tabular} +\end{center} +The ``arg stack'' is a copy of the chunk of stack above the update +frame; ``no of arg words'' tells how many words it consists of. The +function closure is (a pointer to) the closure for the function whose +argument-satisfaction check failed. + +There is just one standard form of PAP with @INFO_TYPE@ = @PAP@. +There is just one info table too, called @PAP_info@. +Its entry code simply copies the arg stack chunk back on top of the +stack and enters the function closure. (It has to do a stack overflow test first.) + +PAPs are also used to implement Hugs functions (where the arguments are free variables). +PAPs generated by Hugs can be static. + +\subsubsection{@AP@ objects} +\label{sect:AP} + +@AP@ objects are used to represent thunks built by Hugs. The only distintion between +an @AP@ and a @PAP@ is that an @AP@ is updateable. + +\begin{center} +\begin{tabular}{|l|l|l|l|} +\hline +\emph{Fixed Header} & {\em No of arg words} & {\em Function closure} & {\em Arg stack} \\ +\hline +\end{tabular} +\end{center} + +The entry code pushes an update frame, copies the arg stack chunk on +top of the stack, and enters the function closure. (It has to do a +stack overflow test first.) + +The ``arg stack'' is a block of (tagged) arguments representing the +free variables of the thunk; ``no of arg words'' tells how many words +it consists of. The function closure is (a pointer to) the closure +for the thunk. The argument stack may be empty if the thunk has no +free variables. + + +\subsubsection{Indirections} +\label{sect:IND} +\label{sect:IND_OLDGEN} + +Indirection closures just point to other closures. They are introduced +when a thunk is updated to point to its value. +The entry code for all indirections simply enters the closure it points to. + +There are several forms of indirection: +\begin{description} +\item[@IND@] is the vanilla, dynamically-allocated indirection. +It is removed by the garbage collector. It has the following +shape: +\begin{center} +\begin{tabular}{|l|l|l|}\hline +{\em Fixed header} & {\em Target closure} \\ \hline +\end{tabular} +\end{center} + +\item[@IND_OLDGEN@] is the indirection used to update an old-generation +thunk. Its shape is like this: +\begin{center} +\begin{tabular}{|l|l|l|}\hline +{\em Fixed header} & {\em Mutable link field} & {\em Target closure} \\ \hline +\end{tabular} +\end{center} +It contains a {\em mutable link field} that is used to string together +all old-generation indirections that might have a pointer into +the new generation. + + +\item[@IND_PERMANENT@ and @IND_OLDGEN_PERMANENT@.] +for lexical profiling, it is necessary to maintain cost centre +information in an indirection, so ``permanent indirections'' are +retained forever. Otherwise they are just like vanilla indirections. +\note{If a permanent indirection points to another permanent +indirection or a @CONST@ closure, it is possible to elide the indirection +since it will have no effect on the profiler.} +\note{Do we still need @IND@ and @IND_OLDGEN@ +in the profiling build, or can we just make +do with one pair whose behaviour changes when profiling is built?} + +\item[@IND_STATIC@] is used for overwriting CAFs when they have been +evaluated. Static indirections are not removed by the garbage +collector; and are statically allocated outside the heap (and should +stay there). Their static object link field is used just as for +@FUN_STATIC@ closures. + +\begin{center} +\begin{tabular}{|l|l|l|} +\hline +{\em Fixed header} & {\em Target closure} & {\em Static object link} \\ +\hline +\end{tabular} +\end{center} + +\end{description} + +\subsubsection{Activation Records} + +Activation records are parts of the stack described by return address +info tables (closures with @INFO_TYPE@ values of @RET_SMALL@, +@RET_VEC_SMALL@, @RET_BIG@ and @RET_VEC_BIG@). They are described in +section~\ref{sect:activation-records}. + + +\subsubsection{Black holes, MVars and IVars} +\label{sect:BH} +\label{sect:MVAR} +\label{sect:IVAR} + +Black hole closures are used to overwrite closures currently being +evaluated. They inform the garbage collector that there are no live +roots in the closure, thus removing a potential space leak. + +Black holes also become synchronization points in the threaded world. +They contain a pointer to a list of blocked threads to be awakened +when the black hole is updated (or @NULL@ if the list is empty). +\begin{center} +\begin{tabular}{|l|l|l|} +\hline +{\em Fixed header} & {\em Mutable link} & {\em Blocked thread link} \\ +\hline +\end{tabular} +\end{center} +The {\em Blocked thread link} points to the TSO of the first thread +waiting for the value of this thunk. All subsequent TSOs in the list +are linked together using their @TSO_LINK@ field. + +When the blocking queue is non-@NULL@, the black hole must be added to +the mutables list since the TSOs on the list may contain pointers into +the new generation. There is no need to clutter up the mutables list +with black holes with empty blocking queues. + +\ToDo{MVars} + + +\subsubsection{FetchMes}\label{sect:FETCHME} + +In the parallel systems, FetchMes are used to represent pointers into +the global heap. When evaluated, the value they point to is read from +the global heap. + +\ToDo{Describe layout} + +Because there may be offsets into these arrays, a primitive array +cannot be handled as a FetchMe in the parallel system, but must be +shipped in its entirety if its parent closure is shipped. + + + +\subsection{Unpointed Objects} + +A variable of unpointed type is always bound to a {\em value}, never to a {\em thunk}. +For this reason, unpointed objects cannot be entered. + +A {\em value} may be: +\begin{itemize} +\item {\em Boxed}, i.e.~represented indirectly by a pointer to a heap object (e.g.~foreign objects, arrays); or +\item {\em Unboxed}, i.e.~represented directly by a bit-pattern in one or more registers (e.g.~@Int#@ and @Float#@). +\end{itemize} +All {\em pointed} values are {\em boxed}. + +\subsubsection{Immutable Objects} +\label{sect:ARR_WORDS1} +\label{sect:ARR_PTRS} + +\begin{description} +\item[@ARR_WORDS@] is a variable-sized object consisting solely of +non-pointers. It is used for arrays of all +sorts of things (bytes, words, floats, doubles... it doesn't matter). +\begin{center} +\begin{tabular}{|c|c|c|c|} +\hline +{\em Fixed Hdr} & {\em No of non-pointers} & {\em Non-pointers\ldots} \\ \hline +\end{tabular} +\end{center} + +\item[@ARR_PTRS@] is an immutable, variable sized array of pointers. +\begin{center} +\begin{tabular}{|c|c|c|c|} +\hline +{\em Fixed Hdr} & {\em Mutable link} & {\em No of pointers} & {\em Pointers\ldots} \\ \hline +\end{tabular} +\end{center} +The mutable link is present so that we can easily freeze and thaw an +array (by changing the header and adding/removing the array to the +mutables list). + +\end{description} + +\subsubsection{Mutable Objects} +\label{sect:mutables} +\label{sect:ARR_WORDS2} +\label{sect:MUTVAR} +\label{sect:MUTARR_PTRS} +\label{sect:MUTARR_PTRS_FROZEN} + +Some of these objects are {\em mutable}; they represent objects which +are explicitly mutated by Haskell code through the @ST@ monad. +They're not used for thunks which are updated precisely once. +Depending on the garbage collector, mutable closures may contain extra +header information which allows a generational collector to implement +the ``write barrier.'' + +\begin{description} + +\item[@ARR_WORDS@] is also used to represent {\em mutable} arrays of +bytes, words, floats, doubles, etc. It's possible to use the same +object type because even generational collectors don't need to +distinguish them. + +\item[@MUTVAR@] is a mutable variable. +\begin{center} +\begin{tabular}{|c|c|c|} +\hline +{\em Fixed Hdr} & {\em Mutable link} & {\em Pointer} \\ \hline +\end{tabular} +\end{center} + +\item[@MUTARR_PTRS@] is a mutable array of pointers. +Such an array may be {\em frozen}, becoming an @SM_MUTARR_PTRS_FROZEN@, with a +different info-table. +\begin{center} +\begin{tabular}{|c|c|c|c|} +\hline +{\em Fixed Hdr} & {\em Mutable link} & {\em No of ptrs} & {\em Pointers\ldots} \\ \hline +\end{tabular} +\end{center} + +\item[@MUTARR_PTRS_FROZEN@] is a frozen @MUTARR_PTRS@ closure. +The garbage collector converts @MUTARR_PTRS_FROZEN@ to @ARR_PTRS@ as it removes them from +the mutables list. + +\end{description} + + +\subsubsection{Foreign Objects}\label{sect:FOREIGN} + +Here's what a ForeignObj looks like: + +\begin{center} +\begin{tabular}{|l|l|l|l|} +\hline +{\em Fixed header} & {\em Data} & {\em Free Routine} & {\em Foreign object link} \\ +\hline +\end{tabular} +\end{center} + +The @FreeRoutine@ is a reference to the finalisation routine to call +when the @ForeignObj@ becomes garbage. If @freeForeignObject@ is +called on a Foreign Object, the @FreeRoutine@ is set to zero and the +garbage collector will not attempt to call @FreeRoutine@ when the +object becomes garbage. + +The Foreign object link is a link to the next foreign object in the +list. This list is traversed at the end of garbage collection: if an +object is about to be deallocated (e.g.~it was not marked or +evacuated), the free routine is called and the object is deleted from +the list. + + +The remaining objects types are all administrative --- none of them may be entered. + +\subsection{Other weird objects} +\label{sect:SPARK} +\label{sect:BLOCKED_FETCH} + +\begin{description} +\item[@BlockedFetch@ heap objects (`closures')] (parallel only) + +@BlockedFetch@s are inbound fetch messages blocked on local closures. +They arise as entries in a local blocking queue when a fetch has been +received for a local black hole. When awakened, we look at their +contents to figure out where to send a resume. + +A @BlockedFetch@ closure has the form: +\begin{center} +\begin{tabular}{|l|l|l|l|l|l|}\hline +{\em Fixed header} & link & node & gtid & slot & weight \\ \hline +\end{tabular} +\end{center} + +\item[Spark Closures] (parallel only) + +Spark closures are used to link together all closures in the spark pool. When +the current processor is idle, it may choose to speculatively evaluate some of +the closures in the pool. It may also choose to delete sparks from the pool. +\begin{center} +\begin{tabular}{|l|l|l|l|l|l|}\hline +{\em Fixed header} & {\em Spark pool link} & {\em Sparked closure} \\ \hline +\end{tabular} +\end{center} + +\item[Slop Objects]\label{sect:slop-objects} + +Slop objects are used to overwrite the end of an updatee if it is +larger than an indirection. Normal slop objects consist of an info +pointer a size word and a number of slop words. + +\begin{center} +\begin{tabular}{|l|l|l|l|l|l|}\hline +{\em Info Pointer} & {\em Size} & {\em Slop Words} \\ \hline +\end{tabular} +\end{center} + +This is too large for single word slop objects which consist of a +single info table. + +Note that slop objects only contain an info pointer, not a standard +fixed header. This doesn't cause problems because slop objects are +always unreachable --- they can only be accessed by linearly scanning +the heap. + +\end{description} + +\subsection{Thread State Objects (TSOs)}\label{sect:TSO} + +\ToDo{This is very out of date. We now embed a single stack object +within the TSO. TSOs include an ID number which can be used to +generate a hash value. The gransim, profiling and ticky info is +surely bogus.} + +In the multi-threaded system, the state of a suspended thread is +packed up into a Thread State Object (TSO) which contains all the +information needed to restart the thread and for the garbage collector +to find all reachable objects. When a thread is running, it may be +``unpacked'' into machine registers and various other memory locations +to provide faster access. + +Single-threaded systems don't really {\em need\/} TSOs --- but they do +need some way to tell the storage manager about live roots so it is +convenient to use a single TSO to store the mutator state even in +single-threaded systems. + +Rather than manage TSOs' alloc/dealloc, etc., in some {\em ad hoc} +way, we instead alloc/dealloc/etc them in the heap; then we can use +all the standard garbage-collection/fetching/flushing/etc machinery on +them. So that's why TSOs are ``heap objects,'' albeit very special +ones. +\begin{center} +\begin{tabular}{|l|l|} + \hline {\em Fixed header} +\\ \hline @TSO_LINK@ +\\ \hline @TSO_WHATNEXT@ +\\ \hline @TSO_WHATNEXT_INFO@ +\\ \hline @TSO_STACK@ +\\ \hline {\em Exception Handlers} +\\ \hline {\em Ticky Info} +\\ \hline {\em Profiling Info} +\\ \hline {\em Parallel Info} +\\ \hline {\em GranSim Info} +\\ \hline +\end{tabular} +\end{center} +The contents of a TSO are: +\begin{itemize} + +\item A pointer (@TSO_LINK@) used to maintain a list of threads with a similar + state (e.g.~all runnable, all sleeping, all blocked on the same black + hole, all blocked on the same MVar, etc.) + +\item A word (@TSO_WHATNEXT@) which is in suspended threads to record + how to awaken it. This typically requires a program counter which is stored + in the pointer @TSO_WHATNEXT_INFO@ + +\item A pointer (@TSO_STACK@) to the top stack block. + +\item Optional information for ``Ticky Ticky'' statistics: @TSO_STK_HWM@ is + the maximum number of words allocated to this thread. + +\item Optional information for profiling: + @TSO_CCC@ is the current cost centre. + +\item Optional information for parallel execution: +\begin{itemize} + +\item The types of threads (@TSO_TYPE@): +\begin{description} +\item[@T_MAIN@] Must be executed locally. +\item[@T_REQUIRED@] A required thread -- may be exported. +\item[@T_ADVISORY@] An advisory thread -- may be exported. +\item[@T_FAIL@] A failure thread -- may be exported. +\end{description} + +\item I've no idea what else + +\end{itemize} + +\item Optional information for GranSim execution: +\begin{itemize} +\item locked +\item sparkname +\item started at +\item exported +\item basic blocks +\item allocs +\item exectime +\item fetchtime +\item fetchcount +\item blocktime +\item blockcount +\item global sparks +\item local sparks +\item queue +\item priority +\item clock (gransim light only) +\end{itemize} + + +Here are the various queues for GrAnSim-type events. +@ +Q_RUNNING +Q_RUNNABLE +Q_BLOCKED +Q_FETCHING +Q_MIGRATING +@ + +\end{itemize} + +\subsection{Stack Objects} +\label{sect:STACK_OBJECT} +\label{sect:stacks} + +These are ``stack objects,'' which are used in the threaded world as +the stack for each thread is allocated from the heap in smallish +chunks. (The stack in the sequential world is allocated outside of +the heap.) + +Each reduction thread has to have its own stack space. As there may +be many such threads, and as any given one may need quite a big stack, +a naive give-'em-a-big-stack-and-let-'em-run approach will cost a {\em +lot} of memory. + +Our approach is to give a thread a small stack space, and then link +on/off extra ``chunks'' as the need arises. Again, this is a +storage-management problem, and, yet again, we choose to graft the +whole business onto the existing heap-management machinery. So stack +objects will live in the heap, be garbage collected, etc., etc.. + +A stack object is laid out like this: + +\begin{center} +\begin{tabular}{|l|} +\hline +{\em Fixed header} +\\ \hline +{\em Link to next stack object (0 for last)} +\\ \hline +{\em N, the payload size in words} +\\ \hline +{\em @Sp@ (byte offset from head of object)} +\\ \hline +{\em @Su@ (byte offset from head of object)} +\\ \hline +{\em Payload (N words)} +\\ \hline +\end{tabular} +\end{center} + +\ToDo{Are stack objects on the mutable list?} + +The stack grows downwards, towards decreasing +addresses. This makes it easier to print out the stack +when debugging, and it means that a return address is +at the lowest address of the chunk of stack it ``knows about'' +just like an info pointer on a closure. + +The garbage collector needs to be able to find all the +pointers in a stack. How does it do this? + +\begin{itemize} + +\item Within the stack there are return addresses, pushed +by @case@ expressions. Below a return address (i.e. at higher +memory addresses, since the stack grows downwards) is a chunk +of stack that the return address ``knows about'', namely the +activation record of the currently running function. + +\item Below each such activation record is a {\em pending-argument +section}, a chunk of +zero or more words that are the arguments to which the result +of the function should be applied. The return address does not +statically +``know'' how many pending arguments there are, or their types. +(For example, the function might return a result of type $\alpha$.) + +\item Below each pending-argument section is another return address, +and so on. Actually, there might be an update frame instead, but we +can consider update frames as a special case of a return address with +a well-defined activation record. + +\ToDo{Doesn't it {\em have} to be an update frame? After all, the arg +satisfaction check is @Su - Sp >= ...@.} + +\end{itemize} + +The game plan is this. The garbage collector +walks the stack from the top, traversing pending-argument sections and +activation records alternately. Next we discuss how it finds +the pointers in each of these two stack regions. + + +\subsubsection{Activation records}\label{sect:activation-records} + +An {\em activation record} is a contiguous chunk of stack, +with a return address as its first word, followed by as many +data words as the return address ``knows about''. The return +address is actually a fully-fledged info pointer. It points +to an info table, replete with: + +\begin{itemize} +\item entry code (i.e. the code to return to). +\item @INFO_TYPE@ is either @RET_SMALL/RET_VEC_SMALL@ or @RET_BIG/RET_VEC_BIG@, depending +on whether the activation record has more than 32 data words (\note{64 for 8-byte-word architectures}) and on whether +to use a direct or a vectored return. +\item @INFO_SM@ for @RET_SMALL@ is a bitmap telling the layout +of the activation record, one bit per word. The least-significant bit +describes the first data word of the record (adjacent to the fixed +header) and so on. A ``@1@'' indicates a non-pointer, a ``@0@'' +indicates +a pointer. We don't need to indicate exactly how many words there +are, +because when we get to all zeros we can treat the rest of the +activation record as part of the next pending-argument region. + +For @RET_BIG@ the @INFO_SM@ field points to a block of bitmap +words, starting with a word that tells how many words are in +the block. + +\item @INFO_SRT@ is the Static Reference Table for the return +address (Section~\ref{sect:srt}). +\end{itemize} + +The activation record is a fully fledged closure too. +As well as an info pointer, it has all the other attributes of +a fixed header (Section~\ref{sect:fixed-header}) including a saved cost +centre which is reloaded when the return address is entered. + +In other words, all the attributes of closures are needed for +activation records, so it's very convenient to make them look alike. + + +\subsubsection{Pending arguments} + +So that the garbage collector can correctly identify pointers +in pending-argument sections we explicitly tag all non-pointers. +Every non-pointer in a pending-argument section is preceded +(at the next lower memory word) by a one-word byte count that +says how many bytes to skip over (excluding the tag word). + +The garbage collector traverses a pending argument section from +the top (i.e. lowest memory address). It looks at each word in turn: + +\begin{itemize} +\item If it is less than or equal to a small constant @MAX_STACK_TAG@ +then +it treats it as a tag heralding zero or more words of non-pointers, +so it just skips over them. + +\item If it points to the code segment, it must be a return +address, so we have come to the end of the pending-argument section. + +\item Otherwise it must be a bona fide heap pointer. +\end{itemize} + + +\subsection{The Stable Pointer Table}\label{sect:STABLEPTR_TABLE} + +A stable pointer is a name for a Haskell object which can be passed to +the external world. It is ``stable'' in the sense that the name does +not change when the Haskell garbage collector runs---in contrast to +the address of the object which may well change. + +A stable pointer is represented by an index into the +@StablePointerTable@. The Haskell garbage collector treats the +@StablePointerTable@ as a source of roots for GC. + +In order to provide efficient access to stable pointers and to be able +to cope with any number of stable pointers (eg $0 \ldots 100000$), the +table of stable pointers is an array stored on the heap and can grow +when it overflows. (Since we cannot compact the table by moving +stable pointers about, it seems unlikely that a half-empty table can +be reduced in size---this could be fixed if necessary by using a +hash table of some sort.) + +In general a stable pointer table closure looks like this: + +\begin{center} +\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|} +\hline +{\em Fixed header} & {\em No of pointers} & {\em Free} & $SP_0$ & \ldots & $SP_{n-1}$ +\\\hline +\end{tabular} +\end{center} + +The fields are: +\begin{description} + +\item[@NPtrs@:] number of (stable) pointers. + +\item[@Free@:] the byte offset (from the first byte of the object) of the first free stable pointer. + +\item[$SP_i$:] A stable pointer slot. If this entry is in use, it is +an ``unstable'' pointer to a closure. If this entry is not in use, it +is a byte offset of the next free stable pointer slot. + +\end{description} + +When a stable pointer table is evacuated +\begin{enumerate} +\item the free list entries are all set to @NULL@ so that the evacuation + code knows they're not pointers; + +\item The stable pointer slots are scanned linearly: non-@NULL@ slots +are evacuated and @NULL@-values are chained together to form a new free list. +\end{enumerate} + +There's no need to link the stable pointer table onto the mutable +list because we always treat it as a root. + + + +\section{The Bytecode Evaluator} + +This section describes how the Hugs interpreter interprets code in the +same environment as compiled code executes. Both evaluation models +use a common garbage collector, so they must agree on the form of +objects in the heap. + +Hugs interprets code by converting it to byte-code and applying a +byte-code interpreter to it. Wherever possible, we try to ensure that +the byte-code is all that is required to interpret a section of code. +This means not dynamically generating info tables, and hence we can +only have a small number of possible heap objects each with a statically +compiled info table. Similarly for stack objects: in fact we only +have one Hugs stack object, in which all information is tagged for the +garbage collector. + +There is, however, one exception to this rule. Hugs must generate +info tables for any constructors it is asked to compile, since the +alternative is to force a context-switch each time compiled code +enters a Hugs-built constructor, which would be prohibitively +expensive. + +We achieve this simplicity by forgoing some of the optimisations used +by compiled code: +\begin{itemize} +\item + +Whereas compiled code has five different ways of entering a closure +(section~\ref{sect:entering-closures}), interpreted code has only one. +The entry point for interpreted code behaves like slow entry points for +compiled code. + +\item + +We use just one info table for {\em all\/} direct returns. +This introduces two problems: +\begin{enumerate} +\item How does the interpreter know what code to execute? + +Instead of pushing just a return address, we push a return BCO and a +trivial return address which just enters the return BCO. + +(In a purely interpreted system, we could avoid pushing the trivial +return address.) + +\item How can the garbage collector follow pointers within the +activation record? + +We could push a third word ---a bitmask describing the location of any +pointers within the record--- but, since we're already tagging unboxed +function arguments on the stack, we use the same mechanism for unboxed +values within the activation record. + +\ToDo{Do we have to stub out dead variables in the activation frame?} + +\end{enumerate} + +\item + +We trivially support vectored returns by pushing a return vector whose +entries are all the same. + +\item + +We avoid the need to build SRTs by putting bytecode objects on the +heap and restricting BCOs to a single basic block. + +\end{itemize} + +\subsection{Hugs Info Tables} + +Hugs requires the following info tables and closures: +\begin{description} +\item [@HUGS_RET@]. + +Contains both a vectored return table and a direct entry point. All +entry points are the same: they rearrange the stack to match the Hugs +return convention (section~\label{sect:hugs-return-convention}) and return +to the scheduler. When the scheduler restarts the thread, it will +find a BCO on top of the stack and will enter the Hugs interpreter. + +\item [@UPD_RET@]. + +This is just the standard info table for an update frame. + +\item [Constructors]. + +The entry code for a constructor jumps to a generic entry point in the +runtime system which decides whether to do a vectored or unvectored +return depending on the shape of the constructor/type. This implies that +info tables must have enough info to make that decision. + +\item [@AP@ and @PAP@]. + +\item [Indirections]. + +\item [Selectors]. + +Hugs doesn't generate them itself but it ought to recognise them + +\item [Complex primops]. + +Some of the primops are too complex for GHC to generate inline. +Instead, these primops are hand-written and called as normal functions. +Hugs only needs to know their names and types but doesn't care whether +they are generated by GHC or by hand. Two things to watch: + +\begin{enumerate} +\item +Hugs must be able to enter these primops even if it is working on a +standalone system that does not support genuine GHC generated code. + +\item The complex primops often involve unboxed tuple types (which +Hugs does not support at the source level) so we cannot specify their +types in a Haskell source file. + +\end{enumerate} + +\end{description} + +\subsection{Hugs Heap Objects} +\label{sect:hugs-heap-objects} + +\subsubsection{Byte-Code Objects} + +Compiled byte code lives on the global heap, in objects called +Byte-Code Objects (or BCOs). The layout of BCOs is described in +detail in Section \ref{sect:BCO}, in this section we will describe +their semantics. + +Since byte-code lives on the heap, it can be garbage collected just +like any other heap-resident data. Hugs arranges that any BCO's +referred to by the Hugs symbol tables are treated as live objects by +the garbage collector. When a module is unloaded, the pointers to its +BCOs are removed from the symbol table, and the code will be garbage +collected some time later. + +A BCO represents a basic block of code - all entry points are at the +beginning of a BCO, and it is impossible to jump into the middle of +one. A BCO represents not only the code for a function, but also its +closure; a BCO can be entered just like any other closure. Hugs +performs lambda-lifting during compilation to byte-code, and each +top-level combinator becomes a BCO in the heap. + +\ToDo{The phrase "all entry points..." suggests that BCOs have multiple +entry points. If so, we need to say a lot more about it...} + +\subsubsection{Thunks and partial applications} + +A thunk consists of a code pointer, and values for the free variables +of that code. Since Hugs byte-code is lambda-lifted, free variables +become arguments and are expected to be on the stack by the called +function. + +Hugs represents updateable thunks with @AP@ objects applying a closure +to a list of arguments. (As for @PAP@s, unboxed arguments should be +preceded by a tag.) When it is entered, it pushes an update frame +followed by its payload on the stack, and enters the first word (which +will be a pointer to a BCO). The layout of @AP@ objects is described +in more detail in Section \ref{sect:AP}. + +Partial applications are represented by @PAP@ objects, which are +non-updatable. + +\ToDo{Hugs Constructors}. + +\subsection{Calling conventions} +\label{sect:hugs-calling-conventions} +\label{sect:standard-closures} + +The calling convention for any byte-code function is straightforward: +\begin{itemize} +\item Push any arguments on the stack. +\item Push a pointer to the BCO. +\item Begin interpreting the byte code. +\end{itemize} + +In a system containing both GHC and Hugs, the bytecode interpreter +only has to be able to enter BCOs: everything else can be handled by +returning to the compiled world (as described in +Section~\ref{sect:hugs-to-ghc-switch}) and entering the closure +there. + +This would work but it would obviously be very inefficient if +we entered a @AP@ by switching worlds, entering the @AP@, +pushing the arguments and function onto the stack, and entering the +function which, likely as not, will be a byte-code object which we +will enter by \emph{returning} to the byte-code interpreter. To avoid +such gratuitious world switching, we choose to recognise certain +closure types as being ``standard'' --- and duplicate the entry code +for the ``standard closures'' in the bytecode interpreter. + +A closure is said to be ``standard'' if its entry code is entirely +determined by its info table. \emph{Standard Closures} have the +desirable property that the byte-code interpreter can enter +the closure by simply ``interpreting'' the info table instead of +switching to the compiled world. The standard closures include: + +\begin{description} +\item[Constructor] +To enter a constructor, we simply return (see Section +\ref{sect:hugs-return-convention}). + +\item[Indirection] +To enter an indirection, we simply enter the object it points to +after possibly adjusting the current cost centre. + +\item[@AP@] + +To enter an @AP@, we push an update frame, push the +arguments, push the function and enter the function. +(Not forgetting a stack check at the start.) + +\item[@PAP@] + +To enter a @PAP@, we push the arguments, push the function and enter +the function. (Not forgetting a stack check at the start.) + +\item[Selector] +To enter a selector, we test whether the selectee is a value. If so, +we simply select the appropriate component; if not, it's simplest to +treat it as a GHC-built closure --- though we could interpret it if we +wanted. + +\end{description} + +The most obvious omissions from the above list are @BCO@s (which we +dealt with above) and GHC-built closures (which are covered in Section +\ref{sect:hugs-to-ghc-switch}). + + +\subsection{Return convention} +\label{sect:hugs-return-convention} + +When Hugs pushes a return address, it pushes both a pointer to the BCO +to return to, and a pointer to a static code fragment @HUGS_RET@ (this +is described in Section \ref{sect:ghc-to-hugs-switch}). The +stack layout is shown in Figure \ref{fig:hugs-return-stack}. + +\begin{figure}[ht] +\begin{center} +@ +| stack | ++----------+ +| bco |--> BCO ++----------+ +| HUGS_RET | ++----------+ +@ +%\input{hugs_ret.pstex_t} +\end{center} +\caption{Stack layout for a Hugs return address} +\label{fig:hugs-return-stack} +\end{figure} + +\begin{figure}[ht] +\begin{center} +@ +| stack | ++----------+ +| con |--> CON ++----------+ +@ +%\input{hugs_ret2.pstex_t} +\end{center} +\caption{Stack layout on enterings a Hugs return address} +\label{fig:hugs-return2} +\end{figure} + +\begin{figure}[ht] +\begin{center} +@ +| stack | ++----------+ +| 3# | ++----------+ +| I# | ++----------+ +@ +%\input{hugs_ret2.pstex_t} +\end{center} +\caption{Stack layout on entering a Hugs return address with an unboxed value} +\label{fig:hugs-return-int} +\end{figure} + +\begin{figure}[ht] +\begin{center} +@ +| stack | ++----------+ +| ghc_ret | ++----------+ +| con |--> CON ++----------+ +@ +%\input{hugs_ret3.pstex_t} +\end{center} +\caption{Stack layout on enterings a GHC return address} +\label{fig:hugs-return3} +\end{figure} + +\begin{figure}[ht] +\begin{center} +@ +| stack | ++----------+ +| ghc_ret | ++----------+ +| 3# | ++----------+ +| I# | ++----------+ +| restart |--> id_Int#_closure ++----------+ +@ +%\input{hugs_ret2.pstex_t} +\end{center} +\caption{Stack layout on enterings a GHC return address with an unboxed value} +\label{fig:hugs-return-int} +\end{figure} + +When a Hugs byte-code sequence enters a closure, it examines the +return address on top of the stack. + +\begin{itemize} + +\item If the return address is @HUGS_RET@, pop the @HUGS_RET@ and the +bco for the continuation off the stack, push a pointer to the constructor onto +the stack and enter the BCO with the current object pointer set to the BCO +(Figure \ref{fig:hugs-return2}). + +\item If the top of the stack is not @HUGS_RET@, we need to do a world +switch as described in Section \ref{sect:hugs-to-ghc-switch}. + +\end{itemize} + +\ToDo{This duplicates what we say about switching worlds +(section~\ref{sect:switching-worlds}) - kill one or t'other.} + + +\ToDo{This was in the evaluation model part but it really belongs in +this part which is about the internal details of each of the major +sections.} + +\subsection{Addressing Modes} + +To avoid potential alignment problems and simplify garbage collection, +all literal constants are stored in two tables (one boxed, the other +unboxed) within each BCO and are referred to by offsets into the tables. +Slots in the constant tables are word aligned. + +\ToDo{How big can the offsets be? Is the offset specified in the +address field or in the instruction?} + +Literals can have the following types: char, int, nat, float, double, +and pointer to boxed object. There is no real difference between +char, int, nat and float since they all occupy 32 bits --- but it +costs almost nothing to distinguish them and may improve portability +and simplify debugging. + +\subsection{Compilation} + + +\def\is{\mbox{\it is}} +\def\ts{\mbox{\it ts}} +\def\as{\mbox{\it as}} +\def\bs{\mbox{\it bs}} +\def\cs{\mbox{\it cs}} +\def\rs{\mbox{\it rs}} +\def\us{\mbox{\it us}} +\def\vs{\mbox{\it vs}} +\def\ws{\mbox{\it ws}} +\def\xs{\mbox{\it xs}} + +\def\e{\mbox{\it e}} +\def\alts{\mbox{\it alts}} +\def\fail{\mbox{\it fail}} +\def\panic{\mbox{\it panic}} +\def\ua{\mbox{\it ua}} +\def\obj{\mbox{\it obj}} +\def\bco{\mbox{\it bco}} +\def\tag{\mbox{\it tag}} +\def\entry{\mbox{\it entry}} +\def\su{\mbox{\it su}} + +\def\Ind#1{{\mbox{\it Ind}\ {#1}}} +\def\update#1{{\mbox{\it update}\ {#1}}} + +\def\next{$\Longrightarrow$} +\def\append{\mathrel{+\mkern-6mu+}} +\def\reverse{\mbox{\it reverse}} +\def\size#1{{\vert {#1} \vert}} +\def\arity#1{{\mbox{\it arity}{#1}}} + +\def\AP{\mbox{\it AP}} +\def\PAP{\mbox{\it PAP}} +\def\GHCRET{\mbox{\it GHCRET}} +\def\GHCOBJ{\mbox{\it GHCOBJ}} + +To make sense of the instructions, we need a sense of how they will be +used. Here is a small compiler for the STG language. + +@ +> cg (f{a1, ... am}) = do +> pushAtom am; ... pushAtom a1 +> pushVar f +> SLIDE (m+1) |env| +> ENTER +> cg (let{x1=rhs1; ... xm=rhsm in e) = do +> ALLOC x1 |rhs1|, ... ALLOC xm |rhsm| +> build x1 rhs1, ... build xm rhsm +> cg e +> cg (case e of alts) = do +> PUSHALTS (cgAlts alts) +> cg e + +> cgAlts { alt1; ... altm } = cgAlt alt1 $ ... $ cgAlt altm pmFail +> +> cgAlt (x@C{xs} -> e) fail = do +> TEST C fail +> HEAPCHECK (heapUse e) +> UNPACK xs +> cg e + +> build x (C{a1, ... am}) = do +> pushUntaggedAtom am; ... pushUntaggedAtom a1 +> PACK x C +> build x ({v1, ... vm} \ {}. e) = do +> pushVar vm; ... pushVar v1 +> PUSHBCO (cgRhs ({v1, ... vm} \ {}. e)) +> MKAP x m +> build x ({v1, ... vm} \ {x1, ... xm}. e) = do +> pushVar vm; ... pushVar v1 +> PUSHBCO (cgRhs ({v1, ... vm} \ {x1, ... xm}. e)) +> MKPAP x m + +> cgRhs (vs \ xs. e) = do +> ARGCHECK (xs ++ vs) -- can be omitted if xs == {} +> STACKCHECK min(stackUse e,heapOverflowSlop) +> HEAPCHECK (heapUse e) +> cg e + +> pushAtom x = pushVar x +> pushAtom i# = PUSHINT i# + +> pushVar x = if isGlobalVar x then PUSHGLOBAL x else PUSHLOCAL x + +> pushUntaggedAtom x = pushVar x +> pushUntaggedAtom i# = PUSHUNTAGGEDINT i# + +> pushVar x = if isGlobalVar x then PUSHGLOBAL x else PUSHLOCAL x +@ + +\ToDo{Is there an easy way to add semi-tagging? Would it be that different?} + +\ToDo{Optimise thunks of the form @f{x1,...xm}@ so that we build an AP directly} + +\subsection{Instructions} + +We specify the semantics of instructions using transition rules of +the form: + +\begin{tabular}{|llrrrrr|} +\hline + & $\is$ & $s$ & $\su$ & $h$ & $hp$ & $\sigma$ \\ +\next & $\is'$ & $s'$ & $\su'$ & $h'$ & $hp'$ & $\sigma$ \\ +\hline +\end{tabular} + +where $\is$ is an instruction stream, $s$ is the stack, $\su$ is the +update frame pointer and $h$ is the heap. + + +\subsection{Stack manipulation} + +\begin{description} + +\item[ Push a global variable ]. + +\begin{tabular}{|llrrrrr|} +\hline + & PUSHGLOBAL $o$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\next & $\is$ & $\sigma!o:s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} + +\item[ Push a local variable ]. + +\begin{tabular}{|llrrrrr|} +\hline + & PUSHLOCAL $o$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\next & $\is$ & $s!o : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} + +\item[ Push an unboxed int ]. + +\begin{tabular}{|llrrrrr|} +\hline + & PUSHINT $o$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\next & $\is$ & $I\# : \sigma!o : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} + +The $I\#$ is a tag included for the benefit of the garbage collector. +Similar rules exist for floats, doubles, chars, etc. + +\item[ Push an unboxed int ]. + +\begin{tabular}{|llrrrrr|} +\hline + & PUSHUNTAGGEDINT $o$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\next & $\is$ & $\sigma!o : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} -When a @case@ expression evaluates a variable that might be bound -to a thunk it is often the case that the scrutinee is already evaluated. -In this case we have paid the penalty of (a) pushing the return address (or -return vector address) on the stack, (b) jumping through the info pointer -of the scrutinee, and (c) returning by an indirect jump through the -return address on the stack. +Similar rules exist for floats, doubles, chars, etc. -If we knew that the scrutinee was already evaluated we could generate -(better) code which simply jumps to the appropriate branch of the -@case@ with a pointer to the scrutinee in \Arg{1}. (For direct -returns to multiconstructor datatypes, we might also load the tag into -\Arg{2}). +\item[ Delete environment from stack --- ready for tail call ]. -An obvious idea, therefore, is to test dynamically whether the heap -closure is a value (using the tag in the info table). If not, we -enter the closure as usual; if so, we jump straight to the appropriate -alternative. Here, for example, is pseudo-code for the expression -@(case x of { (a,_,c) -> E }@: -@ - \Arg{1} = ; - tag = \Arg{1}->entry->tag; - if (isWHNF(tag)) { - Sp--; \\ insert space for return address - goto ret; - } - push(ret); - goto \Arg{1}->entry; - - -ret: a = \Arg{1}->data1; \\ suck out a and c to avoid space leak - c = \Arg{1}->data3; - -@ -and here is the code for the expression @(case x of { [] -> E1; x:xs -> E2 }@: -@ - \Arg{1} = ; - tag = \Arg{1}->entry->tag; - if (isWHNF(tag)) { - Sp--; \\ insert space for return address - goto retvec[tag]; - } - push(retinfo); - goto \Arg{1}->entry; - - .addr ret2 - .addr ret1 -retvec: \\ reversed return vector - -retinfo: - panic("Direct return into vectored case"); - -ret1: +\begin{tabular}{|llrrrrr|} +\hline + & SLIDE $m$ $n$ : $\is$ & $\as \append \bs \append \cs$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\next & $\is$ & $\as \append \cs$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} +\\ +where $\size{\as} = m$ and $\size{\bs} = n$. -ret2: x = \Arg{1}->head; - xs = \Arg{1}->tail; - -@ -There is an obvious cost in compiled code size (but none in the size -of the bytecodes). There is also a cost in execution time if we enter -more thunks than data constructors. -Both the direct and vectored returns are easily modified to chase chains -of indirections too. In the vectored case, this is most easily done by -making sure that @IND = TAG_1 - 1@, and adding an extra field to every -return vector. In the above example, the indirection code would be -@ -ind: \Arg{1} = \Arg{1}->next; - goto ind_loop; -@ -where @ind_loop@ is the second line of code. +\item[ Push a return address ]. -Note that we have to leave space for a return address since the return -address expects to find one. If the body of the expression requires a -heap check, we will actually have to write the return address before -entering the garbage collector. +\begin{tabular}{|llrrrrr|} +\hline + & PUSHALTS $o$:$\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\next & $\is$ & $@HUGS_RET@:\sigma!o:s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} +\item[ Push a BCO ]. -\subsection{Heap and Stack Checks} -\label{sect:heap-and-stack-checks} +\begin{tabular}{|llrrrrr|} +\hline + & PUSHBCO $o$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\next & $\is$ & $\sigma!o : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} -The storage manager detects that it needs to garbage collect the old -generation when the evaluator requests a garbage collection without -having moved the heap pointer since the last garbage collection. It -is therefore important that the GC routines {\em not} move the heap -pointer unless the heap check fails. This is different from what -happens in the current STG implementation. +\end{description} -Assuming that the stack can never shrink, we perform a stack check -when we enter a closure but not when we return to a return -continuation. This doesn't work for heap checks because we cannot -predict what will happen to the heap if we call a function. +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\subsection{Heap manipulation} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -If we wish to allow the stack to shrink, we need to perform a stack -check whenever we enter a return continuation. Most of these checks -could be eliminated if the storage manager guaranteed that a stack -would always have 1000 words (say) of space after it was shrunk. Then -we can omit stack checks for less than 1000 words in return -continuations. +\begin{description} -When an argument satisfaction check fails, we need to push the closure -(in R1) onto the stack - so we need to perform a stack check. The -problem is that the argument satisfaction check occurs \emph{before} -the stack check. The solution is that the caller of a slow entry -point or closure will guarantee that there is at least one word free -on the stack for the callee to use. +\item[ Allocate a heap object ]. -Similarily, if a heap or stack check fails, we need to push the arguments -and closure onto the stack. If we just came from the slow entry point, -there's certainly enough space and it is the responsibility of anyone -using the fast entry point to guarantee that there is enough space. +\begin{tabular}{|llrrrrr|} +\hline + & ALLOC $m$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\next & $\is$ & $hp:s$ & $su$ & $h$ & $hp+m$ & $\sigma$ \\ +\hline +\end{tabular} -\ToDo{Be more precise about how much space is required - document it -in the calling convention section.} +\item[ Build a constructor ]. -\subsection{Handling interrupts/signals} +\begin{tabular}{|llrrrrr|} +\hline + & PACK $o$ $o'$ : $\is$ & $\ws \append s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\next & $\is$ & $s$ & $su$ & $h[s!o \mapsto Pack C\{\ws\}]$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} +\\ +where $C = \sigma!o'$ and $\size{\ws} = \arity{C}$. -@ -May have to keep C stack pointer in register to placate OS? -May have to revert black holes - ouch! -@ +\item[ Build an AP or PAP ]. + +\begin{tabular}{|llrrrrr|} +\hline + & MKAP $o$ $m$:$\is$ & $f : \ws \append s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\next & $\is$ & $s$ & $su$ & $h[s!o \mapsto \AP(f,\ws)]$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} +\\ +where $\size{\ws} = m$. + +\begin{tabular}{|llrrrrr|} +\hline + & MKPAP $o$ $m$:$\is$ & $f : \ws \append s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\next & $\is$ & $s$ & $su$ & $h[s!o \mapsto \PAP(f,\ws)]$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} +\\ +where $\size{\ws} = m$. -\section{Interpreted Execution} +\item[ Unpacking a constructor ]. -This section describes how the Hugs interpreter interprets code in the -same environment as compiled code executes. Both evaluation models -use a common garbage collector, so they must agree on the form of -objects in the heap. +\begin{tabular}{|llrrrrr|} +\hline + & UNPACK : $is$ & $a : s$ & $su$ & $h[a \mapsto C\ \ws]$ & $hp$ & $\sigma$ \\ +\next & $is'$ & $(\reverse\ \ws) \append a : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} + +The $\reverse\ \ws$ looks expensive but, since the stack grows down +and the heap grows up, that's actually the cheap way of copying from +heap to stack. Looking at the compilation rules, you'll see that we +always push the args in reverse order. + +\end{description} + + +\subsection{Entering a closure} + +\begin{description} + +\item[ Enter a BCO ]. + +\begin{tabular}{|llrrrrr|} +\hline + & [ENTER] & $a : s$ & $su$ & $h[a \mapsto BCO\{\is\} ]$ & $hp$ & $\sigma$ \\ +\next & $\is$ & $a : s$ & $su$ & $h$ & $hp$ & $a$ \\ +\hline +\end{tabular} + +\item[ Enter a PAP closure ]. + +\begin{tabular}{|llrrrrr|} +\hline + & [ENTER] & $a : s$ & $su$ & $h[a \mapsto \PAP(f,\ws)]$ & $hp$ & $\sigma$ \\ +\next & [ENTER] & $f : \ws \append s$ & $su$ & $h$ & $hp$ & $???$ \\ +\hline +\end{tabular} + +\item[ Entering an AP closure ]. + +\begin{tabular}{|llrrrrr|} +\hline + & [ENTER] & $a : s$ & $su$ & $h[a \mapsto \AP(f,ws)]$ & $hp$ & $\sigma$ \\ +\next & [ENTER] & $f : \ws \append @UPD_RET@:\su:a:s$ & $su'$ & $h$ & $hp$ & $???$ \\ +\hline +\end{tabular} + +Optimisations: +\begin{itemize} +\item Instead of blindly pushing an update frame for $a$, we can first test whether there's already + an update frame there. If so, overwrite the existing updatee with an indirection to $a$ and + overwrite the updatee field with $a$. (Overwriting $a$ with an indirection to the updatee also + works.) This results in update chains of maximum length 2. +\end{itemize} + + +\item[ Returning a constructor ]. + +\begin{tabular}{|llrrrrr|} +\hline + & [ENTER] & $a : @HUGS_RET@ : \alts : s$ & $su$ & $h[a \mapsto C\{\ws\}]$ & $hp$ & $\sigma$ \\ +\next & $\alts.\entry$ & $a:s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} + + +\item[ Entering an indirection node ]. + +\begin{tabular}{|llrrrrr|} +\hline + & [ENTER] & $a : s$ & $su$ & $h[a \mapsto \Ind{a'}]$ & $hp$ & $\sigma$ \\ +\next & [ENTER] & $a' : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} + +\item[Entering GHC closure]. + +\begin{tabular}{|llrrrrr|} +\hline + & [ENTER] & $a : s$ & $su$ & $h[a \mapsto \GHCOBJ]$ & $hp$ & $\sigma$ \\ +\next & [ENTERGHC] & $a : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} -Hugs interprets code by converting it to byte-code and applying a -byte-code interpreter to it. Wherever possible, we try to ensure that -the byte-code is all that is required to interpret a section of code. -This means not dynamically generating info tables, and hence we can -only have a small number of possible heap objects each with a statically -compiled info table. Similarly for stack objects: in fact we only -have one Hugs stack object, in which all information is tagged for the -garbage collector. +\item[Returning a constructor to GHC]. -There is, however, one exception to this rule. Hugs must generate -info tables for any constructors it is asked to compile, since the -alternative is to force a context-switch each time compiled code -enters a Hugs-built constructor, which would be prohibitively -expensive. +\begin{tabular}{|llrrrrr|} +\hline + & [ENTER] & $a : \GHCRET : s$ & $su$ & $h[a \mapsto C \ws]$ & $hp$ & $\sigma$ \\ +\next & [ENTERGHC] & $a : \GHCRET : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} -We achieve this simplicity by forgoing some of the optimisations used -by compiled code: -\begin{itemize} -\item +\end{description} -Whereas compiled code has five different ways of entering a closure -(section~\ref{sect:entering-closures}), interpreted code has only one. -The entry point for interpreted code behaves like slow entry points for -compiled code. -\item +\subsection{Updates} -We use just one info table for {\em all\/} direct returns. -This introduces two problems: -\begin{enumerate} -\item How does the interpreter know what code to execute? +\begin{description} -Instead of pushing just a return address, we push a return BCO and a -trivial return address which just enters the return BCO. +\item[ Updating with a constructor]. -(In a purely interpreted system, we could avoid pushing the trivial -return address.) +\begin{tabular}{|llrrrrr|} +\hline + & [ENTER] & $a : @UPD_RET@ : ua : s$ & $su$ & $h[a \mapsto C\{\ws\}]$ & $hp$ & $\sigma$ \\ +\next & [ENTER] & $a \append s$ & $su$ & $h[au \mapsto \Ind{a}$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} -\item How can the garbage collector follow pointers within the -activation record? +\item[ Argument checks]. -We could push a third word ---a bitmask describing the location of any -pointers within the record--- but, since we're already tagging unboxed -function arguments on the stack, we use the same mechanism for unboxed -values within the activation record. +\begin{tabular}{|llrrrrr|} +\hline + & ARGCHECK $m$:$\is$ & $a : \as \append s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\next & $\is$ & $a : \as \append s$ & $su$ & $h'$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} +\\ +where $m \ge (su - sp)$ -\ToDo{Do we have to stub out dead variables in the activation frame?} +\begin{tabular}{|llrrrrr|} +\hline + & ARGCHECK $m$:$\is$ & $a : \as \append @UPD_RET@:su:ua:s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\next & $\is$ & $a : \as \append s$ & $su$ & $h'$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} +\\ +where $m < (su - sp)$ and + $h' = h[ua \mapsto \Ind{a'}, a' \mapsto \PAP(a,\reverse\ \as) ]$ -\end{enumerate} +Again, we reverse the list of values as we transfer them from the +stack to the heap --- reflecting the fact that the stack and heap grow +in different directions. -\item +\end{description} -We trivially support vectored returns by pushing a return vector whose -entries are all the same. +\subsection{Branches} -\item +\begin{description} -We avoid the need to build SRTs by putting bytecode objects on the -heap and restricting BCOs to a single basic block. +\item[ Testing a constructor ]. -\end{itemize} +\begin{tabular}{|llrrrrr|} +\hline + & TEST $tag$ $is'$ : $is$ & $a : s$ & $su$ & $h[a \mapsto C\ \ws]$ & $hp$ & $\sigma$ \\ +\next & $is$ & $a : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} +\\ +where $C.\tag = tag$ -\subsubsection{Hugs Info Tables} +\begin{tabular}{|llrrrrr|} +\hline + & TEST $tag$ $is'$ : $is$ & $a : s$ & $su$ & $h[a \mapsto C\ \ws]$ & $hp$ & $\sigma$ \\ +\next & $is'$ & $a : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} +\\ +where $C.\tag \neq tag$ -Hugs requires the following info tables and closures: -\begin{description} -\item [@HUGS_RET@]. +\end{description} -Contains both a vectored return table and a direct entry point. All -entry points are the same: they rearrange the stack to match the Hugs -return convention (section~{sect:hugs-return-convention}) and return -to the scheduler. When the scheduler restarts the thread, it will -find a BCO on top of the stack and will enter the Hugs interpreter. +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\subsection{Heap and stack checks} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\item [@UPD_RET@]. +\begin{tabular}{|llrrrrr|} +\hline + & STACKCHECK $stk$:$\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\next & $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} +\\ +if $s$ has $stk$ free slots. -\item [Constructors]. +\begin{tabular}{|llrrrrr|} +\hline + & HEAPCHECK $hp$:$\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\next & $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ +\hline +\end{tabular} +\\ +if $h$ has $hp$ free slots. -The entry code for a constructor jumps to a generic entry point in the -runtime system which decides whether to do a vectored or unvectored -return depending on the shape of the constructor/type. This implies that -info tables must have enough info to make that decision. +If either check fails, we push the current bco ($\sigma$) onto the +stack and return to the scheduler. When the scheduler has fixed the +problem, it pops the top object off the stack and reenters it. -\item [@AP@ and @PAP@]. -\item [Indirections]. +Optimisations: +\begin{itemize} +\item The bytecode CHECK1000 conservatively checks for 1000 words of heap space and 1000 words of stack space. + We use it to reduce code space and instruction decoding time. +\item The bytecode HEAPCHECK1000 conservatively checks for 1000 words of heap space. + It is used in case alternatives. +\end{itemize} -\item [Selectors]. --- doesn't generate them itself but it ought to recognise them +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +\subsection{Primops} +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\item [Complex primops]. +\ToDo{primops take m words and return n words. The expect boxed arguments on the stack.} -\end{description} +\section{The Machine Code Evaluator} -\subsection{Hugs Heap Objects} -\label{sect:hugs-heap-objects} +This section describes the framework in which compiled code evaluates +expressions. Only at certain points will compiled code need to be +able to talk to the interpreted world; these are discussed in Section +\ref{sect:switching-worlds}. -\subsubsection{Byte-Code Objects} +\subsection{Calling conventions} -Compiled byte code lives on the global heap, in objects called -Byte-Code Objects (or BCOs). The layout of BCOs is described in -detail in Section \ref{sect:BCO}, in this section we will describe -their semantics. +\subsubsection{The call/return registers} -Since byte-code lives on the heap, it can be garbage collected just -like any other heap-resident data. Hugs arranges that any BCO's -referred to by the Hugs symbol tables are treated as live objects by -the garbage collectr. When a module is unloaded, the pointers to its -BCOs are removed from the symbol table, and the code will be garbage -collected some time later. +One of the problems in designing a virtual machine is that we want it +abstract away from tedious machine details but still reveal enough of +the underlying hardware that we can make sensible decisions about code +generation. A major problem area is the use of registers in +call/return conventions. On a machine with lots of registers, it's +cheaper to pass arguments and results in registers than to pass them +on the stack. On a machine with very few registers, it's cheaper to +pass arguments and results on the stack than to use ``virtual +registers'' in memory. We therefore use a hybrid system: the first +$n$ arguments or results are passed in registers; and the remaining +arguments or results are passed on the stack. For register-poor +architectures, it is important that we allow $n=0$. -A BCO represents a basic block of code - all entry points are at the -beginning of a BCO, and it is impossible to jump into the middle of -one. A BCO represents not only the code for a function, but also its -closure; a BCO can be entered just like any other closure. Hugs -performs lambda-lifting during compilation to byte-code, and each -top-level combinator becomes a BCO in the heap. +We'll label the arguments and results \Arg{1} \ldots \Arg{m} --- with +the understanding that \Arg{1} \ldots \Arg{n} are in registers and +\Arg{n+1} \ldots \Arg{m} are on top of the stack. -\ToDo{The phrase "all entry points..." suggests that BCOs have multiple -entry points. If so, we need to say a lot more about it...} +Note that the mapping of arguments \Arg{1} \ldots \Arg{n} to machine +registers depends on the {\em kinds} of the arguments. For example, +if the first argument is a Float, we might pass it in a different +register from if it is an Int. In fact, we might find that a given +architecture lets us pass varying numbers of arguments according to +their types. For example, if a CPU has 2 Int registers and 2 Float +registers then we could pass between 2 and 4 arguments in machine +registers --- depending on whether they all have the same kind or they +have different kinds. -\subsubsection{Thunks and partial applications} +\subsubsection{Entering closures} +\label{sect:entering-closures} -A thunk consists of a code pointer, and values for the free variables -of that code. Since Hugs byte-code is lambda-lifted, free variables -become arguments and are expected to be on the stack by the called -function. +To evaluate a closure we jump to the entry code for the closure +passing a pointer to the closure in \Arg{1} so that the entry code can +access its environment. -Hugs represents updateable thunks with @AP@ objects applying a closure -to a list of arguments. (As for @PAP@s, unboxed arguments should be -preceded by a tag.) When it is entered, it pushes an update frame -followed by its payload on the stack, and enters the first word (which -will be a pointer to a BCO). The layout of @AP@ objects is described -in more detail in Section \ref{sect:AP}. +\subsubsection{Function call} -Partial applications are represented by @PAP@ objects, which are -non-updatable. +The function-call mechanism is obviously crucial. There are five different +cases to consider: +\begin{enumerate} -\ToDo{Hugs Constructors}. +\item {\em Known combinator (function with no free variables) and enough arguments.} -\subsection{Calling conventions} -\label{sect:hugs-calling-conventions} -\label{sect:standard-closures} +A fast call can be made: push excess arguments onto stack and jump to +function's {\em fast entry point} passing arguments in \Arg{1} \ldots +\Arg{m}. -The calling convention for any byte-code function is straightforward: -\begin{itemize} -\item Push any arguments on the stack. -\item Push a pointer to the BCO. -\item Begin interpreting the byte code. -\end{itemize} +The {\em fast entry point} is only called with exactly the right +number of arguments (in \Arg{1} \ldots \Arg{m}) so it can instantly +start doing useful work without first testing whether it has enough +registers or having to pop them off the stack first. -In a system containing both GHC and Hugs, the bytecode interpreter -only has to be able to enter BCOs: everything else can be handled by -returning to the compiled world (as described in -Section~\ref{sect:hugs-to-ghc-closure}) and entering the closure -there. +\item {\em Known combinator and insufficient arguments.} -This would work but it would obviously be very inefficient if -we entered a @AP@ by switching worlds, entering the @AP@, -pushing the arguments and function onto the stack, and entering the -function which, likely as not, will be a byte-code object which we -will enter by \emph{returning} to the byte-code interpreter. To avoid -such gratuitious world switching, we choose to recognise certain -closure types as being ``standard'' --- and duplicate the entry code -for the ``standard closures'' in the bytecode interpreter. +A slow call can be made: push all arguments onto stack and jump to +function's {\em slow entry point}. -A closure is said to be ``standard'' if its entry code is entirely -determined by its info table. \emph{Standard Closures} have the -desirable property that the byte-code interpreter can enter -the closure by simply ``interpreting'' the info table instead of -switching to the compiled world. The standard closures include: +Any unpointed arguments which are pushed on the stack must be tagged. +This means pushing an extra word on the stack below the unpointed +words, containing the number of unpointed words above it. -\begin{description} -\item[Constructor] -To enter a constructor, we simply return (see Section -\ref{sect:hugs-return-convention}). +%Todo: forward ref about tagging? +%Todo: picture? -\item[Indirection] -To enter an indirection, we simply enter the object it points to -after possibly adjusting the current cost centre. +The {\em slow entry point} might be called with insufficient arguments +and so it must test whether there are enough arguments on the stack. +This {\em argument satisfaction check} consists of checking that +@Su-Sp@ is big enough to hold all the arguments (including any tags). -\item[@AP@] +\begin{itemize} -To enter an @AP@, we push an update frame, push the -arguments, push the function and enter the function. -(Not forgetting a stack check at the start.) +\item If the argument satisfaction check fails, it is because there is +one or more update frames on the stack before the rest of the +arguments that the function needs. In this case, we construct a PAP +(partial application, section~\ref{sect:PAP}) containing the arguments +which are on the stack. The PAP construction code will return to the +update frame with the address of the PAP in \Arg{1}. -\item[@PAP@] +\item If the argument satisfaction check succeeds, we jump to the fast +entry point with the arguments in \Arg{1} \ldots \Arg{arity}. -To enter a @PAP@, we push the arguments, push the function and enter -the function. (Not forgetting a stack check at the start.) +If the fast entry point expects to receive some of \Arg{i} on the +stack, we can reduce the amount of movement required by making the +stack layout for the fast entry point look like the stack layout for +the slow entry point. Since the slow entry point is entered with the +first argument on the top of the stack and with tags in front of any +unpointed arguments, this means that if \Arg{i} is unpointed, there +should be space below it for a tag and that the highest numbered +argument should be passed on the top of the stack. -\item[Selector] -To enter a selector, we test whether the selectee is a value. If so, -we simply select the appropriate component; if not, it's simplest to -treat it as a GHC-built closure --- though we could interpret it if we -wanted. +We usually arrange that the fast entry point is placed immediately +after the slow entry point --- so we can just ``fall through'' to the +fast entry point without performing a jump. -\end{description} +\end{itemize} -The most obvious omissions from the above list are @BCO@s (which we -dealt with above) and GHC-built closures (which are covered in Section -\ref{sect:hugs-to-ghc-closure}). +\item {\em Known function closure (function with free variables) and enough arguments.} -\subsection{Return convention} -\label{sect:hugs-return-convention} +A fast call can be made: push excess arguments onto stack and jump to +function's {\em fast entry point} passing a pointer to closure in +\Arg{1} and arguments in \Arg{2} \ldots \Arg{m+1}. -When Hugs pushes a return address, it pushes both a pointer to the BCO -to return to, and a pointer to a static code fragment @HUGS_RET@ (this -will be described in Section \ref{sect:ghc-to-hugs-return}). The -stack layout is shown in Figure \ref{fig:hugs-return-stack}. +Like the fast entry point for a combinator, the fast entry point for a +closure is only called with appropriate values in \Arg{1} \ldots +\Arg{m+1} so we can start work straight away. The pointer to the +closure is used to access the free variables of the closure. -\begin{figure} -\begin{center} -@ -| stack | -+----------+ -| bco |--> BCO -+----------+ -| HUGS_RET | -+----------+ -@ -%\input{hugs_ret.pstex_t} -\end{center} -\caption{Stack layout for a Hugs return address} -\label{fig:hugs-return-stack} -\end{figure} -\begin{figure} -\begin{center} -@ -| stack | -+----------+ -| con |--> CON -+----------+ -@ -%\input{hugs_ret2.pstex_t} -\end{center} -\caption{Stack layout on enterings a Hugs return address} -\label{fig:hugs-return2} -\end{figure} +\item {\em Known function closure and insufficient arguments.} -\begin{figure} -\begin{center} -@ -| stack | -+----------+ -| 3# | -+----------+ -| I# | -+----------+ -@ -%\input{hugs_ret2.pstex_t} -\end{center} -\caption{Stack layout on enterings a Hugs return address with an unboxed value} -\label{fig:hugs-return-int} -\end{figure} +A slow call can be made: push all arguments onto stack and jump to the +closure's slow entry point passing a pointer to the closure in \Arg{1}. -\begin{figure} -\begin{center} -@ -| stack | -+----------+ -| ghc_ret | -+----------+ -| con |--> CON -+----------+ -@ -%\input{hugs_ret3.pstex_t} -\end{center} -\caption{Stack layout on enterings a GHC return address} -\label{fig:hugs-return3} -\end{figure} +Again, the slow entry point performs an argument satisfaction check +and either builds a PAP or pops the arguments off the stack into +\Arg{2} \ldots \Arg{m+1} and jumps to the fast entry point. -\begin{figure} -\begin{center} + +\item {\em Unknown function closure, thunk or constructor.} + +Sometimes, the function being called is not statically identifiable. +Consider, for example, the @compose@ function: @ -| stack | -+----------+ -| ghc_ret | -+----------+ -| 3# | -+----------+ -| I# | -+----------+ -| restart |--> id_Int#_closure -+----------+ + compose f g x = f (g x) @ -%\input{hugs_ret2.pstex_t} -\end{center} -\caption{Stack layout on enterings a GHC return address with an unboxed value} -\label{fig:hugs-return-int} -\end{figure} +Since @f@ and @g@ are passed as arguments to @compose@, the latter has +to make a heap call. In a heap call the arguments are pushed onto the +stack, and the closure bound to the function is entered. In the +example, a thunk for @(g x)@ will be allocated, (a pointer to it) +pushed on the stack, and the closure bound to @f@ will be +entered. That is, we will jump to @f@s entry point passing @f@ in +\Arg{1}. If \Arg{1} is passed on the stack, it is pushed on top of +the thunk for @(g x)@. -When a Hugs byte-code sequence enters a closure, it examines the -return address on top of the stack. +The {\em entry code} for an updateable thunk (which must have arity 0) +pushes an update frame on the stack and starts executing the body of +the closure --- using \Arg{1} to access any free variables. This is +described in more detail in section~\ref{sect:data-updates}. -\begin{itemize} +The {\em entry code} for a non-updateable closure is just the +closure's slow entry point. -\item If the return address is @HUGS_RET@, pop the @HUGS_RET@ and the -bco for the continuation off the stack, push a pointer to the constructor onto -the stack and enter the BCO with the current object pointer set to the BCO -(Figure \ref{fig:hugs-return2}). +\end{enumerate} -\item If the top of the stack is not @HUGS_RET@, we need to do a world -switch as described in Section \ref{sect:hugs-to-ghc-return}. +In addition to the above considerations, if there are \emph{too many} +arguments then the extra arguments are simply pushed on the stack with +appropriate tags. -\end{itemize} +To summarise, a closure's standard (slow) entry point performs the following: +\begin{description} +\item[Argument satisfaction check.] (function closure only) +\item[Stack overflow check.] +\item[Heap overflow check.] +\item[Copy free variables out of closure.] %Todo: why? +\item[Eager black holing.] (updateable thunk only) %Todo: forward ref. +\item[Push update frame.] +\item[Evaluate body of closure.] +\end{description} -\part{Implementation} +\subsection{Case expressions and return conventions} +\label{sect:return-conventions} -\section{Overview} +The {\em evaluation} of a thunk is always initiated by +a @case@ expression. For example: +@ + case x of (a,b) -> E +@ -This part describes the inner workings of the major components of the system. -Their external interfaces are described in the previous part. +The code for a @case@ expression looks like this: -The major components of the system are: \begin{itemize} -\item The scheduler -\item The loader -\item The storage manager -\item The machine code evaluator (compiled code) -\item The bytecode evaluator (interpreted code) +\item Push the free variables of the branches on the stack (fv(@E@) in +this case). +\item Push a \emph{return address} on the stack. +\item Evaluate the scrutinee (@x@ in this case). \end{itemize} +Once evaluation of the scrutinee is complete, execution resumes at the +return address, which points to the code for the expression @E@. +When execution resumes at the return point, there must be some {\em +return convention} that defines where the components of the pair, @a@ +and @b@, can be found. The return convention varies according to the +type of the scrutinee @x@: -\section{Hugs Bytecodes} -\label{sect:hugs-bytecodes} +\begin{itemize} -\ToDo{This was in the evaluation model part but it really belongs in -this part which is about the internal details of each of the major -sections.} +\item -\subsubsection{Addressing Modes} +(A space for) the return address is left on the top of the stack. +Leaving the return address on the stack ensures that the top of the +stack contains a valid activation record +(section~\ref{sect:activation-records}) --- should a garbage collection +be required. -To avoid potential alignment problems and simplify garbage collection, -all literal constants are stored in two tables (one boxed, the other -unboxed) within each BCO and are referred to by offsets into the tables. -Slots in the constant tables are word aligned. +\item If @x@ has a boxed type (e.g.~a data constructor or a function), +a pointer to @x@ is returned in \Arg{1}. -\ToDo{How big can the offsets be? Is the offset specified in the -address field or in the instruction?} +\ToDo{Warn that components of E should be extracted as soon as +possible to avoid a space leak.} -Literals can have the following types: char, int, nat, float, double, -and pointer to boxed object. There is no real difference between -char, int, nat and float since they all occupy 32 bits --- but it -costs almost nothing to distinguish them and may improve portability -and simplify debugging. +\item If @x@ is an unboxed type (e.g.~@Int#@ or @Float#@), @x@ is +returned in \Arg{1} + +\item If @x@ is an unboxed tuple constructor, the components of @x@ +are returned in \Arg{1} \ldots \Arg{n} but no object is constructed in +the heap. + +When passing an unboxed tuple to a function, the components are +flattened out and passed in \Arg{1} \ldots \Arg{n} as usual. -\subsubsection{Compilation} +\end{itemize} +\subsection{Vectored Returns} -\def\is{\mbox{\it is}} -\def\ts{\mbox{\it ts}} -\def\as{\mbox{\it as}} -\def\bs{\mbox{\it bs}} -\def\cs{\mbox{\it cs}} -\def\rs{\mbox{\it rs}} -\def\us{\mbox{\it us}} -\def\vs{\mbox{\it vs}} -\def\ws{\mbox{\it ws}} -\def\xs{\mbox{\it xs}} +Many algebraic data types have more than one constructor. For +example, the @Maybe@ type is defined like this: +@ + data Maybe a = Nothing | Just a +@ +How does the return convention encode which of the two constructors is +being returned? A @case@ expression scrutinising a value of @Maybe@ +type would look like this: +@ + case E of + Nothing -> ... + Just a -> ... +@ +Rather than pushing a return address before evaluating the scrutinee, +@E@, the @case@ expression pushes (a pointer to) a {\em return +vector}, a static table consisting of two code pointers: one for the +@Just@ alternative, and one for the @Nothing@ alternative. -\def\e{\mbox{\it e}} -\def\alts{\mbox{\it alts}} -\def\fail{\mbox{\it fail}} -\def\panic{\mbox{\it panic}} -\def\ua{\mbox{\it ua}} -\def\obj{\mbox{\it obj}} -\def\bco{\mbox{\it bco}} -\def\tag{\mbox{\it tag}} -\def\entry{\mbox{\it entry}} -\def\su{\mbox{\it su}} +\begin{itemize} -\def\Ind#1{{\mbox{\it Ind}\ {#1}}} -\def\update#1{{\mbox{\it update}\ {#1}}} +\item -\def\next{$\Longrightarrow$} -\def\append{\mathrel{+\mkern-6mu+}} -\def\reverse{\mbox{\it reverse}} -\def\size#1{{\vert {#1} \vert}} -\def\arity#1{{\mbox{\it arity}{#1}}} +The constructor @Nothing@ returns by jumping to the first item in the +return vector with a pointer to a (statically built) Nothing closure +in \Arg{1}. -\def\AP{\mbox{\it AP}} -\def\PAP{\mbox{\it PAP}} -\def\GHCRET{\mbox{\it GHCRET}} -\def\GHCOBJ{\mbox{\it GHCOBJ}} +It might seem that we could avoid loading \Arg{1} in this case since the +first item in the return vector will know that @Nothing@ was returned +(and can easily access the Nothing closure in the (unlikely) event +that it needs it. The only reason we load \Arg{1} is in case we have to +perform an update (section~\ref{sect:data-updates}). -To make sense of the instructions, we need a sense of how they will be -used. Here is a small compiler for the STG language. +\item -@ -> cg (f{a1, ... am}) = do -> pushAtom am; ... pushAtom a1 -> pushVar f -> SLIDE (m+1) |env| -> ENTER -> cg (let{x1=rhs1; ... xm=rhsm in e) = do -> ALLOC x1 |rhs1|, ... ALLOC xm |rhsm| -> build x1 rhs1, ... build xm rhsm -> cg e -> cg (case e of alts) = do -> PUSHALTS (cgAlts alts) -> cg e +The constructor @Just@ returns by jumping to the second element of the +return vector with a pointer to the closure in \Arg{1}. -> cgAlts { alt1; ... altm } = cgAlt alt1 $ ... $ cgAlt altm pmFail -> -> cgAlt (x@C{xs} -> e) fail = do -> TEST C fail -> HEAPCHECK (heapUse e) -> UNPACK xs -> cg e +\end{itemize} -> build x (C{a1, ... am}) = do -> pushUntaggedAtom am; ... pushUntaggedAtom a1 -> PACK x C -> build x ({v1, ... vm} \ {}. e) = do -> pushVar vm; ... pushVar v1 -> PUSHBCO (cgRhs ({v1, ... vm} \ {}. e)) -> MKAP x m -> build x ({v1, ... vm} \ {x1, ... xm}. e) = do -> pushVar vm; ... pushVar v1 -> PUSHBCO (cgRhs ({v1, ... vm} \ {x1, ... xm}. e)) -> MKPAP x m +In this way no test need be made to see which constructor returns; +instead, execution resumes immediately in the appropriate branch of +the @case@. -> cgRhs (vs \ xs. e) = do -> ARGCHECK (xs ++ vs) -- can be omitted if xs == {} -> STACKCHECK min(stackUse e,heapOverflowSlop) -> HEAPCHECK (heapUse e) -> cg e +\subsection{Direct Returns} -> pushAtom x = pushVar x -> pushAtom i# = PUSHINT i# +When a datatype has a large number of constructors, it may be +inappropriate to use vectored returns. The vector tables may be +large and sparse, and it may be better to identify the constructor +using a test-and-branch sequence on the tag. For this reason, we +provide an alternative return convention, called a \emph{direct +return}. -> pushVar x = if isGlobalVar x then PUSHGLOBAL x else PUSHLOCAL x +In a direct return, the return address pushed on the stack really is a +code pointer. The returning code loads a pointer to the closure being +returned in \Arg{1} as usual, and also loads the tag into \Arg{2}. +The code at the return address will test the tag and jump to the +appropriate code for the case branch. -> pushUntaggedAtom x = pushVar x -> pushUntaggedAtom i# = PUSHUNTAGGEDINT i# +The choice of whether to use a vectored return or a direct return is +made on a type-by-type basis --- up to a certain maximum number of +constructors imposed by the update mechanism +(section~\ref{sect:data-updates}). -> pushVar x = if isGlobalVar x then PUSHGLOBAL x else PUSHLOCAL x -@ +Single-constructor data types also use direct returns, although in +that case there is no need to return a tag in \Arg{2}. -\ToDo{Is there an easy way to add semi-tagging? Would it be that different?} +\ToDo{Say whether we pop the return address before returning} -\ToDo{Optimise thunks of the form @f{x1,...xm}@ so that we build an AP directly} +\ToDo{Stack stubbing?} -\subsubsection{Instructions} +\subsection{Updates} +\label{sect:data-updates} -We specify the semantics of instructions using transition rules of -the form: +The entry code for an updatable thunk (which must be of arity 0): -\begin{tabular}{|llrrrrr|} +\begin{itemize} +\item copies the free variables out of the thunk into registers or + onto the stack. +\item pushes an {\em update frame} onto the stack. + +An update frame is a small activation record consisting of +\begin{center} +\begin{tabular}{|l|l|l|} \hline - & $\is$ & $s$ & $\su$ & $h$ & $hp$ & $\sigma$ \\ -\next & $\is'$ & $s'$ & $\su'$ & $h'$ & $hp'$ & $\sigma$ \\ +{\em Fixed header} & {\em Update Frame link} & {\em Updatee} \\ \hline \end{tabular} +\end{center} -where $\is$ is an instruction stream, $s$ is the stack, $\su$ is the -update frame pointer and $h$ is the heap. +\note{In the semantics part of the STG paper (section 5.6), an update +frame consists of everything down to the last update frame on the +stack. This would make sense too --- and would fit in nicely with +what we're going to do when we add support for speculative +evaluation.} +\ToDo{I think update frames contain cost centres sometimes} +\item +If we are doing ``eager blackholing,'' we then overwrite the thunk +with a black hole. Otherwise, we leave it to the garbage collector to +black hole the thunk. -\subsubsection{Stack manipulation} +\item +Start evaluating the body of the expression. -\begin{description} +\end{itemize} -\item[ Push a global variable ]. +When the expression finishes evaluation, it will enter the update +frame on the top of the stack. Since the returner doesn't know +whether it is entering a normal return address/vector or an update +frame, we follow exactly the same conventions as return addresses and +return vectors. That is, on entering the update frame: -\begin{tabular}{|llrrrrr|} -\hline - & PUSHGLOBAL $o$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\next & $\is$ & $\sigma!o:s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} +\begin{itemize} +\item The value of the thunk is in \Arg{1}. (Recall that only thunks +are updateable and that thunks return just one value.) -\item[ Push a local variable ]. +\item If the data type is a direct-return type rather than a +vectored-return type, then the tag is in \Arg{2}. -\begin{tabular}{|llrrrrr|} -\hline - & PUSHLOCAL $o$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\next & $\is$ & $s!o : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} +\item The update frame is still on the stack. +\end{itemize} -\item[ Push an unboxed int ]. +We can safely share a single statically-compiled update function +between all types. However, the code must be able to handle both +vectored and direct-return datatypes. This is done by arranging that +the update code looks like this: -\begin{tabular}{|llrrrrr|} -\hline - & PUSHINT $o$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\next & $\is$ & $I\# : \sigma!o : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} +@ + | ^ | + | return vector | + |---------------| + | fixed-size | + | info table | + |---------------| <- update code pointer + | update code | + | v | +@ -The $I\#$ is a tag included for the benefit of the garbage collector. -Similar rules exist for floats, doubles, chars, etc. +Each entry in the return vector (which is large enough to cover the +largest vectored-return type) points to the update code. -\item[ Push an unboxed int ]. +The update code: +\begin{itemize} +\item overwrites the {\em updatee} with an indirection to \Arg{1}; +\item loads @Su@ from the Update Frame link; +\item removes the update frame from the stack; and +\item enters \Arg{1}. +\end{itemize} -\begin{tabular}{|llrrrrr|} -\hline - & PUSHUNTAGGEDINT $o$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\next & $\is$ & $\sigma!o : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} +We enter \Arg{1} again, having probably just come from there, because +it knows whether to perform a direct or vectored return. This could +be optimised by compiling special update code for each slot in the +return vector, which performs the correct return. -Similar rules exist for floats, doubles, chars, etc. +\subsection{Semi-tagging} +\label{sect:semi-tagging} -\item[ Delete environment from stack --- ready for tail call ]. +When a @case@ expression evaluates a variable that might be bound +to a thunk it is often the case that the scrutinee is already evaluated. +In this case we have paid the penalty of (a) pushing the return address (or +return vector address) on the stack, (b) jumping through the info pointer +of the scrutinee, and (c) returning by an indirect jump through the +return address on the stack. + +If we knew that the scrutinee was already evaluated we could generate +(better) code which simply jumps to the appropriate branch of the +@case@ with a pointer to the scrutinee in \Arg{1}. (For direct +returns to multiconstructor datatypes, we might also load the tag into +\Arg{2}). + +An obvious idea, therefore, is to test dynamically whether the heap +closure is a value (using the tag in the info table). If not, we +enter the closure as usual; if so, we jump straight to the appropriate +alternative. Here, for example, is pseudo-code for the expression +@(case x of { (a,_,c) -> E }@: +@ + \Arg{1} = ; + tag = \Arg{1}->entry->tag; + if (isWHNF(tag)) { + Sp--; \\ insert space for return address + goto ret; + } + push(ret); + goto \Arg{1}->entry; + + +ret: a = \Arg{1}->data1; \\ suck out a and c to avoid space leak + c = \Arg{1}->data3; + +@ +and here is the code for the expression @(case x of { [] -> E1; x:xs -> E2 }@: +@ + \Arg{1} = ; + tag = \Arg{1}->entry->tag; + if (isWHNF(tag)) { + Sp--; \\ insert space for return address + goto retvec[tag]; + } + push(retinfo); + goto \Arg{1}->entry; + + .addr ret2 + .addr ret1 +retvec: \\ reversed return vector + +retinfo: + panic("Direct return into vectored case"); + +ret1: -\begin{tabular}{|llrrrrr|} -\hline - & SLIDE $m$ $n$ : $\is$ & $\as \append \bs \append \cs$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\next & $\is$ & $\as \append \cs$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} -\\ -where $\size{\as} = m$ and $\size{\bs} = n$. +ret2: x = \Arg{1}->head; + xs = \Arg{1}->tail; + +@ +There is an obvious cost in compiled code size (but none in the size +of the bytecodes). There is also a cost in execution time if we enter +more thunks than data constructors. +Both the direct and vectored returns are easily modified to chase chains +of indirections too. In the vectored case, this is most easily done by +making sure that @IND = TAG_1 - 1@, and adding an extra field to every +return vector. In the above example, the indirection code would be +@ +ind: \Arg{1} = \Arg{1}->next; + goto ind_loop; +@ +where @ind_loop@ is the second line of code. -\item[ Push a return address ]. +Note that we have to leave space for a return address since the return +address expects to find one. If the body of the expression requires a +heap check, we will actually have to write the return address before +entering the garbage collector. -\begin{tabular}{|llrrrrr|} -\hline - & PUSHALTS $o$:$\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\next & $\is$ & $@HUGS_RET@:\sigma!o:s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} -\item[ Push a BCO ]. +\subsection{Heap and Stack Checks} +\label{sect:heap-and-stack-checks} -\begin{tabular}{|llrrrrr|} -\hline - & PUSHBCO $o$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\next & $\is$ & $\sigma!o : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} +The storage manager detects that it needs to garbage collect the old +generation when the evaluator requests a garbage collection without +having moved the heap pointer since the last garbage collection. It +is therefore important that the GC routines {\em not} move the heap +pointer unless the heap check fails. This is different from what +happens in the current STG implementation. -\end{description} +Assuming that the stack can never shrink, we perform a stack check +when we enter a closure but not when we return to a return +continuation. This doesn't work for heap checks because we cannot +predict what will happen to the heap if we call a function. -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\subsubsection{Heap manipulation} -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +If we wish to allow the stack to shrink, we need to perform a stack +check whenever we enter a return continuation. Most of these checks +could be eliminated if the storage manager guaranteed that a stack +would always have 1000 words (say) of space after it was shrunk. Then +we can omit stack checks for less than 1000 words in return +continuations. -\begin{description} +When an argument satisfaction check fails, we need to push the closure +(in R1) onto the stack - so we need to perform a stack check. The +problem is that the argument satisfaction check occurs \emph{before} +the stack check. The solution is that the caller of a slow entry +point or closure will guarantee that there is at least one word free +on the stack for the callee to use. -\item[ Allocate a heap object ]. +Similarily, if a heap or stack check fails, we need to push the arguments +and closure onto the stack. If we just came from the slow entry point, +there's certainly enough space and it is the responsibility of anyone +using the fast entry point to guarantee that there is enough space. -\begin{tabular}{|llrrrrr|} -\hline - & ALLOC $m$ : $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\next & $\is$ & $hp:s$ & $su$ & $h$ & $hp+m$ & $\sigma$ \\ -\hline -\end{tabular} +\ToDo{Be more precise about how much space is required - document it +in the calling convention section.} -\item[ Build a constructor ]. +\subsection{Handling interrupts/signals} -\begin{tabular}{|llrrrrr|} -\hline - & PACK $o$ $o'$ : $\is$ & $\ws \append s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\next & $\is$ & $s$ & $su$ & $h[s!o \mapsto Pack C\{\ws\}]$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} -\\ -where $C = \sigma!o'$ and $\size{\ws} = \arity{C}$. +@ +May have to keep C stack pointer in register to placate OS? +May have to revert black holes - ouch! +@ -\item[ Build an AP or PAP ]. -\begin{tabular}{|llrrrrr|} -\hline - & MKAP $o$ $m$:$\is$ & $f : \ws \append s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\next & $\is$ & $s$ & $su$ & $h[s!o \mapsto \AP(f,\ws)]$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} -\\ -where $\size{\ws} = m$. -\begin{tabular}{|llrrrrr|} -\hline - & MKPAP $o$ $m$:$\is$ & $f : \ws \append s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\next & $\is$ & $s$ & $su$ & $h[s!o \mapsto \PAP(f,\ws)]$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} -\\ -where $\size{\ws} = m$. +\section{The Loader} +\section{The Compilers} -\item[ Unpacking a constructor ]. +\iffalse +\part{Old stuff - needs to be mined for useful info} -\begin{tabular}{|llrrrrr|} -\hline - & UNPACK : $is$ & $a : s$ & $su$ & $h[a \mapsto C\ \ws]$ & $hp$ & $\sigma$ \\ -\next & $is'$ & $(\reverse\ \ws) \append a : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} +\section{The Scheduler} -The $\reverse\ \ws$ looks expensive but, since the stack grows down -and the heap grows up, that's actually the cheap way of copying from -heap to stack. Looking at the compilation rules, you'll see that we -always push the args in reverse order. +The Scheduler is the heart of the run-time system. A running program +consists of a single running thread, and a list of runnable and +blocked threads. The running thread returns to the scheduler when any +of the following conditions arises: -\end{description} +\begin{itemize} +\item A heap check fails, and a garbage collection is required +\item Compiled code needs to switch to interpreted code, and vice +versa. +\item The thread becomes blocked. +\item The thread is preempted. +\end{itemize} +A running system has a global state, consisting of -\subsubsection{Entering a closure} +\begin{itemize} +\item @Hp@, the current heap pointer, which points to the next +available address in the Heap. +\item @HpLim@, the heap limit pointer, which points to the end of the +heap. +\item The Thread Preemption Flag, which is set whenever the currently +running thread should be preempted at the next opportunity. +\item A list of runnable threads. +\item A list of blocked threads. +\end{itemize} -\begin{description} +Each thread is represented by a Thread State Object (TSO), which is +described in detail in Section \ref{sect:TSO}. -\item[ Enter a BCO ]. +The following is pseudo-code for the inner loop of the scheduler +itself. -\begin{tabular}{|llrrrrr|} -\hline - & [ENTER] & $a : s$ & $su$ & $h[a \mapsto BCO\{\is\} ]$ & $hp$ & $\sigma$ \\ -\next & $\is$ & $a : s$ & $su$ & $h$ & $hp$ & $a$ \\ -\hline -\end{tabular} +@ +while (threads_exist) { + // handle global problems: GC, parallelism, etc + if (need_gc) gc(); + if (external_message) service_message(); + // deal with other urgent stuff -\item[ Enter a PAP closure ]. + pick a runnable thread; + do { + // enter object on top of stack + // if the top object is a BCO, we must enter it + // otherwise appply any heuristic we wish. + if (thread->stack[thread->sp]->info.type == BCO) { + status = runHugs(thread,&smInfo); + } else { + status = runGHC(thread,&smInfo); + } + switch (status) { // handle local problems + case (StackOverflow): enlargeStack; break; + case (Error e) : error(thread,e); break; + case (ExitWith e) : exit(e); break; + case (Yield) : break; + } + } while (thread_runnable); +} +@ -\begin{tabular}{|llrrrrr|} -\hline - & [ENTER] & $a : s$ & $su$ & $h[a \mapsto \PAP(f,\ws)]$ & $hp$ & $\sigma$ \\ -\next & [ENTER] & $f : \ws \append s$ & $su$ & $h$ & $hp$ & $???$ \\ -\hline -\end{tabular} +\subsection{Invoking the garbage collector} +\subsection{Putting the thread to sleep} -\item[ Entering an AP closure ]. +\subsection{Calling C from Haskell} -\begin{tabular}{|llrrrrr|} -\hline - & [ENTER] & $a : s$ & $su$ & $h[a \mapsto \AP(f,ws)]$ & $hp$ & $\sigma$ \\ -\next & [ENTER] & $f : \ws \append @UPD_RET@:\su:a:s$ & $su'$ & $h$ & $hp$ & $???$ \\ -\hline -\end{tabular} +We distinguish between "safe calls" where the programmer guarantees +that the C function will not call a Haskell function or, in a +multithreaded system, block for a long period of time and "unsafe +calls" where the programmer cannot make that guarantee. -Optimisations: -\begin{itemize} -\item Instead of blindly pushing an update frame for $a$, we can first test whether there's already - an update frame there. If so, overwrite the existing updatee with an indirection to $a$ and - overwrite the updatee field with $a$. (Overwriting $a$ with an indirection to the updatee also - works.) This results in update chains of maximum length 2. -\end{itemize} +Safe calls are performed without returning to the scheduler and are +discussed elsewhere (\ToDo{discuss elsewhere}). +Unsafe calls are performed by returning an array (outside the Haskell +heap) of arguments and a C function pointer to the scheduler. The +scheduler allocates a new thread from the operating system +(multithreaded system only), spawns a call to the function and +continues executing another thread. When the ccall completes, the +thread informs the scheduler and the scheduler adds the thread to the +runnable threads list. -\item[ Returning a constructor ]. +\ToDo{Describe this in more detail.} -\begin{tabular}{|llrrrrr|} -\hline - & [ENTER] & $a : @HUGS_RET@ : \alts : s$ & $su$ & $h[a \mapsto C\{\ws\}]$ & $hp$ & $\sigma$ \\ -\next & $\alts.\entry$ & $a:s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} +\subsection{Calling Haskell from C} -\item[ Entering an indirection node ]. +When C calls a Haskell closure, it sends a message to the scheduler +thread. On receiving the message, the scheduler creates a new Haskell +thread, pushes the arguments to the C function onto the thread's stack +(with tags for unboxed arguments) pushes the Haskell closure and adds +the thread to the runnable list so that it can be entered in the +normal way. -\begin{tabular}{|llrrrrr|} -\hline - & [ENTER] & $a : s$ & $su$ & $h[a \mapsto \Ind{a'}]$ & $hp$ & $\sigma$ \\ -\next & [ENTER] & $a' : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} +When the closure returns, the scheduler sends back a message which +awakens the (C) thread. -\item[Entering GHC closure]. +\ToDo{Do we need to worry about the garbage collector deallocating the +thread if it gets blocked?} -\begin{tabular}{|llrrrrr|} -\hline - & [ENTER] & $a : s$ & $su$ & $h[a \mapsto \GHCOBJ]$ & $hp$ & $\sigma$ \\ -\next & [ENTERGHC] & $a : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} +\subsection{Switching Worlds} +\label{sect:switching-worlds} -\item[Returning a constructor to GHC]. +\ToDo{This has all changed: we always leave a closure on top of the +stack if we mean to continue executing it. The scheduler examines the +top of the stack and tries to guess which world we want to be in. If +it finds a @BCO@, it certainly enters Hugs, if it finds a @GHC@ +closure, it certainly enters GHC and if it finds a standard closure, +it is free to choose either one but it's probably best to enter GHC +for everything except @BCO@s and perhaps @AP@s.} -\begin{tabular}{|llrrrrr|} -\hline - & [ENTER] & $a : \GHCRET : s$ & $su$ & $h[a \mapsto C \ws]$ & $hp$ & $\sigma$ \\ -\next & [ENTERGHC] & $a : \GHCRET : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} +Because this is a combined compiled/interpreted system, the +interpreter will sometimes encounter compiled code, and vice-versa. -\end{description} +All world-switches go via the scheduler, ensuring that the world is in +a known state ready to enter either compiled code or the interpreter. +When a thread is run from the scheduler, the @whatNext@ field in the +TSO (Section \ref{sect:TSO}) is checked to find out how to execute the +thread. +\begin{itemize} +\item If @whatNext@ is set to @ReturnGHC@, we load up the required +registers from the TSO and jump to the address at the top of the user +stack. +\item If @whatNext@ is set to @EnterGHC@, we load up the required +registers from the TSO and enter the closure pointed to by the top +word of the stack. +\item If @whatNext@ is set to @EnterHugs@, we enter the top thing on +the stack, using the interpreter. +\end{itemize} -\subsubsection{Updates} +There are four cases we need to consider: -\begin{description} +\begin{enumerate} +\item A GHC thread enters a Hugs-built closure. +\item A GHC thread returns to a Hugs-compiled return address. +\item A Hugs thread enters a GHC-built closure. +\item A Hugs thread returns to a Hugs-compiled return address. +\end{enumerate} -\item[ Updating with a constructor]. +GHC-compiled modules cannot call functions in a Hugs-compiled module +directly, because the compiler has no information about arities in the +external module. Therefore it must assume any top-level objects are +CAFs, and enter their closures. -\begin{tabular}{|llrrrrr|} -\hline - & [ENTER] & $a : @UPD_RET@ : ua : s$ & $su$ & $h[a \mapsto C\{\ws\}]$ & $hp$ & $\sigma$ \\ -\next & [ENTER] & $a \append s$ & $su$ & $h[au \mapsto \Ind{a}$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} +\ToDo{Hugs-built constructors?} -\item[ Argument checks]. +We now examine the various cases one by one and describe how the +switch happens in each situation. -\begin{tabular}{|llrrrrr|} -\hline - & ARGCHECK $m$:$\is$ & $a : \as \append s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\next & $\is$ & $a : \as \append s$ & $su$ & $h'$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} -\\ -where $m \ge (su - sp)$ +\subsection{A GHC thread enters a Hugs-built closure} +\label{sect:ghc-to-hugs-switch} -\begin{tabular}{|llrrrrr|} -\hline - & ARGCHECK $m$:$\is$ & $a : \as \append @UPD_RET@:su:ua:s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\next & $\is$ & $a : \as \append s$ & $su$ & $h'$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} -\\ -where $m < (su - sp)$ and - $h' = h[ua \mapsto \Ind{a'}, a' \mapsto \PAP(a,\reverse\ \as) ]$ +There is three possibilities: GHC has entered a @PAP@, or it has +entered a @AP@, or it has entered the BCO directly (for a top-level +function closure). @AP@s and @PAP@s are ``standard closures'' and +so do not require us to enter the bytecode interpreter. -Again, we reverse the list of values as we transfer them from the -stack to the heap --- reflecting the fact that the stack and heap grow -in different directions. +The entry code for a BCO does the following: -\end{description} +\begin{itemize} +\item Push the address of the object entered on the stack. +\item Save the current state of the thread in its TSO. +\item Return to the scheduler, setting @whatNext@ to @EnterHugs@. +\end{itemize} -\subsubsection{Branches} +BCO's for thunks and functions have the same entry conventions as +slow entry points: they expect to find their arguments on the stac +with unboxed arguments preceded by appropriate tags. -\begin{description} +\subsection{A GHC thread returns to a Hugs-compiled return address} +\label{sect:ghc-to-hugs-switch} -\item[ Testing a constructor ]. +Hugs return addresses are laid out as in Figure +\ref{fig:hugs-return-stack}. If GHC is returning, it will return to +the address at the top of the stack, namely @HUGS_RET@. The code at +@HUGS_RET@ performs the following: -\begin{tabular}{|llrrrrr|} -\hline - & TEST $tag$ $is'$ : $is$ & $a : s$ & $su$ & $h[a \mapsto C\ \ws]$ & $hp$ & $\sigma$ \\ -\next & $is$ & $a : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} -\\ -where $C.\tag = tag$ +\begin{itemize} +\item pushes \Arg{1} (the return value) on the stack. +\item saves the thread state in the TSO +\item returns to the scheduler with @whatNext@ set to @EnterHugs@. +\end{itemize} -\begin{tabular}{|llrrrrr|} -\hline - & TEST $tag$ $is'$ : $is$ & $a : s$ & $su$ & $h[a \mapsto C\ \ws]$ & $hp$ & $\sigma$ \\ -\next & $is'$ & $a : s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} -\\ -where $C.\tag \neq tag$ +\noindent When Hugs runs, it will enter the return value, which will +return using the correct Hugs convention (Section +\ref{sect:hugs-return-convention}) to the return address underneath it +on the stack. -\end{description} +\subsection{A Hugs thread enters a GHC-compiled closure} +\label{sect:hugs-to-ghc-switch} -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\subsubsection{Heap and stack checks} -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +Hugs can recognise a GHC-built closure as not being one of the +following types of object: -\begin{tabular}{|llrrrrr|} -\hline - & STACKCHECK $stk$:$\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\next & $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} -\\ -if $s$ has $stk$ free slots. +\begin{itemize} +\item A @BCO@, +\item A @AP@, +\item A @PAP@, +\item An indirection, or +\item A constructor. +\end{itemize} -\begin{tabular}{|llrrrrr|} -\hline - & HEAPCHECK $hp$:$\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\next & $\is$ & $s$ & $su$ & $h$ & $hp$ & $\sigma$ \\ -\hline -\end{tabular} -\\ -if $h$ has $hp$ free slots. +When Hugs is called on to enter a GHC closure, it executes the +following sequence of instructions: -If either check fails, we push the current bco ($\sigma$) onto the -stack and return to the scheduler. When the scheduler has fixed the -problem, it pops the top object off the stack and reenters it. +\begin{itemize} +\item Push the address of the closure on the stack. +\item Save the current state of the thread in the TSO. +\item Return to the scheduler, with the @whatNext@ field set to +@EnterGHC@. +\end{itemize} +\subsection{A Hugs thread returns to a GHC-compiled return address} +\label{sect:hugs-to-ghc-switch} + +When Hugs encounters a return address on the stack that is not +@HUGS_RET@, it knows that a world-switch is required. At this point +the stack contains a pointer to the return value, followed by the GHC +return address. The following sequence is then performed: -Optimisations: \begin{itemize} -\item The bytecode CHECK1000 conservatively checks for 1000 words of heap space and 1000 words of stack space. - We use it to reduce code space and instruction decoding time. -\item The bytecode HEAPCHECK1000 conservatively checks for 1000 words of heap space. - It is used in case alternatives. +\item save the state of the thread in the TSO. +\item return to the scheduler, setting @whatNext@ to @EnterGHC@. \end{itemize} +The first thing that GHC will do is enter the object on the top of the +stack, which is a pointer to the return value. This value will then +return itself to the return address using the GHC return convention. -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\subsubsection{Primops} -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -\ToDo{primops take m words and return n words. The expect boxed arguments on the stack.} +\fi +\part{Implementation} + +\section{Overview} +This part describes the inner workings of the major components of the system. +Their external interfaces are described in the previous part. +The major components of the system are: +\begin{itemize} +\item The scheduler +\item The loader +\item The storage manager +\item The machine code evaluator (compiled code) +\item The bytecode evaluator (interpreted code) +\end{itemize} +\iffalse \section{Heap objects} +\label{sect:heap-objects} \label{sect:fixed-header} \ToDo{Fix this picture} @@ -2460,7 +4346,7 @@ The semantics of BCOs are described in Section \begin{itemize} \item The entry code is a static code fragment/info table that returns to the scheduler to invoke Hugs (Section -\ref{sect:ghc-to-hugs-closure}). +\ref{sect:ghc-to-hugs-switch}). \item \emph{Layout} contains the number of pointer literals in the \emph{Literals} field. \item \emph{Offset} is the offset to the byte code from the start of @@ -3284,7 +5170,7 @@ are evacuated and @NULL@-values are chained together to form a new free list. There's no need to link the stable pointer table onto the mutable list because we always treat it as a root. - +\fi \section{The Storage Manager} @@ -3409,13 +5295,16 @@ which are still live. \end{description} - %************************************************************************ %* * \subsection{``What really happens in a garbage collection?''} %* * %************************************************************************ +\ToDo{I commented out this long, out of date section - ADR} + +\iffalse + This is a brief tutorial on ``what really happens'' going to/from the storage manager in a garbage collection. @@ -3526,7 +5415,7 @@ parallel system, we will now loop back around, and make sure there is enough space before continuing. \end{description} - +\fi % end of commented out part \subsection{Static Reference Tables (SRTs)} \label{sect:srt} @@ -3718,6 +5607,11 @@ place between each. \subsection{Squeezing identical updates} +\Note{This can also be done by testing whether @Sp == Su@ when we push +an update frame. If so, we can overwrite the updatee with an +indirection to the existing updatee (and some slop objects) and avoid +pushing an update frame.} + \iffalse \ToDo{Insert text stolen from update paper} @@ -3883,8 +5777,6 @@ to implement than Sparud's. \subsection{Internal workings of the Generational Collector} -\section{Dynamic Linking} - \section{Profiling} Registering costs centres looks awkward - can we tidy it up? @@ -3919,7 +5811,7 @@ Measure what proportion of ...: %************************************************************************ %* * -\subsubsection[ticky-stk-heap-use]{Stack and heap usage} +\subsection[ticky-stk-heap-use]{Stack and heap usage} %* * %************************************************************************ @@ -3943,7 +5835,7 @@ Re-use of stack slots, and stubbing of stack slots: %************************************************************************ %* * -\subsubsection[ticky-allocs]{Allocations} +\subsection[ticky-allocs]{Allocations} %* * %************************************************************************ @@ -3977,7 +5869,7 @@ admin/goods/slop breakdown.) %************************************************************************ %* * -\subsubsection[ticky-enters]{Enters} +\subsection[ticky-enters]{Enters} %* * %************************************************************************ @@ -4002,7 +5894,7 @@ struct ent_counter { %************************************************************************ %* * -\subsubsection[ticky-returns]{Returns} +\subsection[ticky-returns]{Returns} %* * %************************************************************************ @@ -4042,7 +5934,7 @@ vectored? (The rest were obviously unvectored). %************************************************************************ %* * -\subsubsection[ticky-update-frames]{Update frames} +\subsection[ticky-update-frames]{Update frames} %* * %************************************************************************ @@ -4062,7 +5954,7 @@ Macro & Counts \\ \hline %************************************************************************ %* * -\subsubsection[ticky-updates]{Updates} +\subsection[ticky-updates]{Updates} %* * %************************************************************************ @@ -4086,7 +5978,7 @@ Macro & Where \\ \hline %************************************************************************ %* * -\subsubsection[ticky-selectors]{Doing selectors at GC time} +\subsection[ticky-selectors]{Doing selectors at GC time} %* * %************************************************************************ @@ -4188,6 +6080,10 @@ collection code. \section{Old stuff about SRTs} +\ToDo{Commented out} + +\iffalse + Garbage collection of @CAF@s is tricky. We have to cope with explicit collection from the @CAFlist@ as well as potential references from the stack and heap which will cause the @CAF@ evacuation code to be @@ -4304,8 +6200,14 @@ shorted out. This scheme has not been adopted but has been implemented. The code is commented out with @#if 0@. +\fi + \subsection{The virtual register set} +\ToDo{Commented out} + +\iffalse + We refer to any (atomic) part of the virtual machine state as a ``register.'' These ``registers'' may be shared between all threads in the system or may be specific to each thread. @@ -4356,6 +6258,7 @@ basis: allow faster access to all the other non-machine registers. @ +\fi \end{document}