From 128e3070182bdc8e713a90618136aa5da38c3d02 Mon Sep 17 00:00:00 2001 From: reid Date: Tue, 21 Oct 1997 17:22:24 +0000 Subject: [PATCH] [project @ 1997-10-21 17:22:24 by reid] Improved glossary/terminology at start - added unpointed and unboxed. Created a section at start to describe the source language. At the moment, all it contains is a description of unboxed tuple constructors. Replaced erroneous uses of "closure" with "heap object". According to the glossary, closures are enterable - things like stack objects are not enterable so they can't be closures. Clarified section 2.7 (heap and stack checks): why should we not move Hp during heap check? Added comment that I don't believe in the notion of fixed headers. --- docs/rts/rts.verb | 163 +++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 120 insertions(+), 43 deletions(-) diff --git a/docs/rts/rts.verb b/docs/rts/rts.verb index 3548688..6f9b2be 100644 --- a/docs/rts/rts.verb +++ b/docs/rts/rts.verb @@ -83,6 +83,13 @@ Using one stack instead of two reduces register pressure, reduces the size of update frames, and eliminates ``stack-stubbing'' instructions.) +\item The ``return in registers'' return convention has been dropped +because it was complicated and doesn't work well on register-poor +architectures. It has been partly replaced by unboxed +tuples~\ref{sect:unboxed-tuples} which allow the programmer to +explicitly state where results should be returned in registers (or on +the stack) instead of on the heap. + \end{itemize} \subsection{Wish list} @@ -141,7 +148,11 @@ Which garbage collector to use. At the moment we only anticipate one, however. \end{itemize} -\subsection{Terminology} +\subsection{Glossary} + +\ToDo{This terminology is not used consistently within the document. +If you find soemthing which disagrees with this terminology, fix the +usage.} \begin{itemize} @@ -151,12 +162,29 @@ or an unsigned int. \item A {\em pointer} is (at least) 32 bits and big enough to hold a function pointer or a data pointer. +\item A {\em boxed} type is one whose elements are heap allocated. + +\item An {\em unboxed} type is one whose elements are {\em not} heap allocated. + +\item A {\em pointed} type is one that contains $\bot$. Variables with +pointed types are the only things which can be lazily evaluated. In +the STG machine, this means that they are the only things that can be +{\em entered} or {\em updated} and it requires that they be boxed. + +\item An {\em unpointed} type is one that does not contains $\bot$. +Variables with unpointed types are never delayed --- they are always +evaluated when they are constructed. In the STG machine, this means +that they cannot be {\em entered} or {\em updated}. Unpointed objects +may be boxed (like @Array#@) or unboxed (like @Int#@). + \item A {\em closure} is a (representation of) a value of a {\em pointed} - type. It may be in HNF or it may be unevaluated --- in either case, you can - try to evaluate it again. +type. It may be in HNF or it may be unevaluated --- in either case, you can +try to evaluate it again. \item A {\em thunk} is a (representation of) a value of a {\em pointed} - type which is {\em not} in HNF. +type which is {\em not} in HNF. + +\item A {\em value} is an object in HNF. It can be pointed or unpointed. \end{itemize} @@ -168,8 +196,10 @@ words and pointers are the same size. % More terminology to mention. % unboxed, unpointed -There are a few other system invariants which need to be mentioned --- -though not necessarily here: +\subsection{Invariants} + +There are a few system invariants which need to be mentioned --- +though this is probably the wrong place for them. \begin{itemize} @@ -180,6 +210,56 @@ the old generation is no bigger than the current new generation. \end{itemize} +\section{Source Language} + +\subsection{Unboxed tuples}\label{sect:unboxed-tuples} + +Functions can take multiple arguments as easily as they can take one +argument: there's no cost for adding another argument. But functions +can only return one result: the cost of adding a second ``result'' is +that the function must construct a tuple of ``results'' on the heap. +The assymetry is rather galling and can make certain programming +styles quite expensive. For example, consider a simple state transformer +monad: +@ +> type S a = State -> (a,State) +> bindS m k s0 = case m s0 of { (a,s1) -> k a s1 } +> returnS a s = (a,s) +> getS s = (s,s) +> setS s _ = ((),s) +@ +Here, every use of @returnS@, @getS@ or @setS@ constructs a new tuple +in the heap which is instantly taken apart (and becomes garbage) by +the case analysis in @bind@. Even a short state-transformer program +will construct a lot of these temporary tuples. + +Unboxed tuples provide a way for the programmer to indicate that they +do not expect a tuple to be shared and that they do not expect it to +be allocated in the heap. Syntactically, unboxed tuples are just like +single constructor datatypes except for the annotation @unboxed@. +@ +> data unboxed AAndState# a = AnS a State +> type S a = State -> AAndState# a +> bindS m k s0 = case m s0 of { AnS a s1 -> k a s1 } +> returnS a s = AnS a s +> getS s = AnS s s +> setS s _ = AnS () s +@ +Semantically, unboxed tuples are just unlifted tuples and are subject +to the same restrictions as other unpointed types. + +Operationally, unboxed tuples are never built on the heap. When +unboxed tuples are returned, they are returned in multiple registers +or multiple stack slots. At first sight, this seems a little strange +but it's no different from passing double precision floats in two +registers. + +Note that unboxed tuples can only have one constructor and that +thunks never have unboxed types --- so we'll never try to update an +unboxed constructor. The restriction to a single constructor is +largely to avoid garbage collection complications. + + %----------------------------------------------------------------------------- \part{Evaluation Model} \section{Compiled Execution} @@ -310,7 +390,7 @@ and either builds a PAP or pops the arguments off the stack into \Arg{2} \ldots \Arg{m+1} and jumps to the fast entry point. -\item {\em Unknown function closure or thunk.} +\item {\em Unknown function closure, thunk or constructor.} Sometimes, the function being called is not statically identifiable. Consider, for example, the @compose@ function: @@ -388,35 +468,18 @@ stack contains a valid activation record (section~\ref{sect:activation-records}) --- should a garbage collection be required. -\item If @x@ has a pointed type (e.g.~a data constructor or a function), +\item If @x@ has a boxed type (e.g.~a data constructor or a function), a pointer to @x@ is returned in \Arg{1}. \ToDo{Warn that components of E should be extracted as soon as possible to avoid a space leak.} -\item If @x@ is an unpointed type (e.g.~@Int#@ or @Float#@), @x@ is +\item If @x@ is an unboxed type (e.g.~@Int#@ or @Float#@), @x@ is returned in \Arg{1} -\item If @x@ is an unpointed tuple constructor, the components of @x@ +\item If @x@ is an unboxed tuple constructor, the components of @x@ are returned in \Arg{1} \ldots \Arg{n} but no object is constructed in -the heap. Unboxed tuple constructors are useful for functions which -want to return multiple values such as those used in an (explicitly -encoded) state monad: - -\ToDo{Move stuff about unboxed tuples to a separate section} - -@ -data unpointed AAndState a = AnS a State -type S a = State -> AAndState a - -bindS m k s0 = case m s0 of { AnS s1 a -> k s1 a } -returnS a s = AnS a s -@ - -Note that unboxed datatypes can only have one constructor and that -thunks never have unboxed types --- so we'll never try to update an -unboxed constructor. Unboxed tuples are \emph{never} built on the -heap. +the heap. When passing an unboxed tuple to a function, the components are flattened out and passed in \Arg{1} \ldots \Arg{n} as usual. @@ -498,7 +561,7 @@ that case there is no need to return a tag in \Arg{2}. \subsection{Updates} \label{sect:data-updates} -The entry code for an updatable thunk (which must also be of arity 0): +The entry code for an updatable thunk (which must be of arity 0): \begin{itemize} \item copies the free variables out of the thunk into registers or @@ -590,8 +653,10 @@ of the scrutinee, and (c) returning by an indirect jump through the return address on the stack. If we knew that the scrutinee was already evaluated we could generate -(better) code which simply jumps to the appropriate branch of the @case@ -with a pointer to the scrutinee in \Arg{1}. +(better) code which simply jumps to the appropriate branch of the +@case@ with a pointer to the scrutinee in \Arg{1}. (For direct +returns to multiconstructor datatypes, we might also load the tag into +\Arg{2}). An obvious idea, therefore, is to test dynamically whether the heap closure is a value (using the tag in the info table). If not, we @@ -661,9 +726,16 @@ entering the garbage collector. \note{I reckon these deserve a subsection of their own} -Don't move heap pointer before success occurs. +The storage manager detects that it needs to garbage collect the old +generation when the evaluator requests a garbage collection without +having moved the heap pointer since the last garbage collection. It +is therefore important that the GC routines {\em not} move the heap +pointer unless the heap check fails. This is different from what +happens in the current STG implementation. + Talk about how stack check looks ahead into the branches of case expressions. + \subsection{Handling interrupts/signals} @ @@ -781,7 +853,7 @@ stack layout is shown in Figure \ref{fig:hugs-return-stack}. \begin{center} \input{hugs_ret.pstex_t} \end{center} -\caption{Stack layout for a hugs return address} +\caption{Stack layout for a Hugs return address} \label{fig:hugs-return-stack} \end{figure} @@ -789,7 +861,7 @@ stack layout is shown in Figure \ref{fig:hugs-return-stack}. \begin{center} \input{hugs_ret2.pstex_t} \end{center} -\caption{Stack layout on enterings a hugs return address} +\caption{Stack layout on enterings a Hugs return address} \label{fig:hugs-return2} \end{figure} @@ -853,8 +925,8 @@ while (threads_exist) { pick a runnable thread; do { switch (thread->whatNext) { - case (RunGHC pc): status=runGHC(pc); break; - case (RunHugs bc): status=runHugs(bc); break; + case (EnterGHC pc): status=runGHC(pc); break; + case (EnterHugs bc): status=runHugs(bc); break; } switch (status) { // handle local problems case (StackOverflow): enlargeStack; break; @@ -1003,11 +1075,16 @@ return itself to the return address using the GHC return convention. \label{fig:closure} \end{figure} -Every {\em heap object}, or {\em closure} is a contiguous block +Every {\em heap object} is a contiguous block of memory, consisting of a fixed-format {\em header} followed by zero or more {\em data words}. -The header consists of -the following fields: + +\ToDo{I absolutely do not believe that every heap object has a header +like this - ADR. I believe that they all have an info pointer but I +see no readon why stack objects and unpointed heap objects would have +an entry count since this will always be zero.} + +The header consists of the following fields: \begin{itemize} \item A one-word {\em info pointer}, which points to the object's static {\em info table}. @@ -1243,9 +1320,9 @@ Section~\ref{sect:stacks}. A second useful classification is this: \begin{itemize} \item -{\em Executive closures}, such as thunks and data constructors, +{\em Executive objects}, such as thunks and data constructors, participate directly in a program's execution. They can be subdivided into -two kinds of objects according to their type: +three kinds of objects according to their type: \begin{itemize} \item {\em Pointed objects}, represent values of a {\em pointed} type @@ -1260,7 +1337,7 @@ once we support speculative evaluation.} \end{itemize} -\item {\em Administrative closures}, such as stack objects and thread +\item {\em Administrative objects}, such as stack objects and thread state objects, do not represent values in the original program. \end{itemize} @@ -2645,7 +2722,7 @@ a case in point, as <.jones jfp leak.> points out. Consider a thunk for the expression @ let xs = [1..1000] in last xs -@ +@ where @last@ is a function that returns the last element of its argument list. When the thunk is entered it will call @last@, which will consume @xs@ until it finds the last element. Since the list -- 1.7.10.4