[project @ 1997-10-21 17:22:24 by reid]

author reid <unknown>

Tue, 21 Oct 1997 17:22:24 +0000 (17:22 +0000)

committer reid <unknown>

Tue, 21 Oct 1997 17:22:24 +0000 (17:22 +0000)
author reid <unknown>
Tue, 21 Oct 1997 17:22:24 +0000 (17:22 +0000)
committer reid <unknown>
Tue, 21 Oct 1997 17:22:24 +0000 (17:22 +0000)
diff --git a/docs/rts/rts.verb b/docs/rts/rts.verb

index 3548688..6f9b2be 100644 (file)
--- a/docs/rts/rts.verb
+++ b/docs/rts/rts.verb
@@ -83,6 +83,13 @@ Using one stack instead of two reduces register pressure, reduces the
  size of update frames, and eliminates
  ``stack-stubbing'' instructions.)
  
  size of update frames, and eliminates
  ``stack-stubbing'' instructions.)
  
+\item The ``return in registers'' return convention has been dropped
+because it was complicated and doesn't work well on register-poor
+architectures.  It has been partly replaced by unboxed
+tuples~\ref{sect:unboxed-tuples} which allow the programmer to
+explicitly state where results should be returned in registers (or on
+the stack) instead of on the heap.
+
  \end{itemize} 
  
  \subsection{Wish list}
  \end{itemize} 
  
  \subsection{Wish list}
@@ -141,7 +148,11 @@ Which garbage collector to use.  At the moment we
  only anticipate one, however.
  \end{itemize}
  
  only anticipate one, however.
  \end{itemize}
  
-\subsection{Terminology}
+\subsection{Glossary}
+
+\ToDo{This terminology is not used consistently within the document.
+If you find soemthing which disagrees with this terminology, fix the
+usage.}
  
  \begin{itemize}
  
  
  \begin{itemize}
  
@@ -151,12 +162,29 @@ or an unsigned int.
  \item A {\em pointer} is (at least) 32 bits and big enough to hold a
  function pointer or a data pointer.  
  
  \item A {\em pointer} is (at least) 32 bits and big enough to hold a
  function pointer or a data pointer.  
  
+\item A {\em boxed} type is one whose elements are heap allocated.
+
+\item An {\em unboxed} type is one whose elements are {\em not} heap allocated.
+
+\item A {\em pointed} type is one that contains $\bot$.  Variables with
+pointed types are the only things which can be lazily evaluated.  In
+the STG machine, this means that they are the only things that can be 
+{\em entered} or {\em updated} and it requires that they be boxed.
+
+\item An {\em unpointed} type is one that does not contains $\bot$.
+Variables with unpointed types are never delayed --- they are always
+evaluated when they are constructed.  In the STG machine, this means
+that they cannot be {\em entered} or {\em updated}.  Unpointed objects
+may be boxed (like @Array#@) or unboxed (like @Int#@).
+
  \item A {\em closure} is a (representation of) a value of a {\em pointed}
  \item A {\em closure} is a (representation of) a value of a {\em pointed}
- type.  It may be in HNF or it may be unevaluated --- in either case, you can
- try to evaluate it again.
+type.  It may be in HNF or it may be unevaluated --- in either case, you can
+try to evaluate it again.
  
  \item A {\em thunk} is a (representation of) a value of a {\em pointed}
  
  \item A {\em thunk} is a (representation of) a value of a {\em pointed}
- type which is {\em not} in HNF.
+type which is {\em not} in HNF.
+
+\item A {\em value} is an object in HNF.  It can be pointed or unpointed.
  
  \end{itemize}
  
  
  \end{itemize}
  
@@ -168,8 +196,10 @@ words and pointers are the same size.
  % More terminology to mention.
  % unboxed, unpointed
  
  % More terminology to mention.
  % unboxed, unpointed
  
-There are a few other system invariants which need to be mentioned ---
-though not necessarily here:
+\subsection{Invariants}
+
+There are a few system invariants which need to be mentioned ---
+though this is probably the wrong place for them.
  
  \begin{itemize}
  
  
  \begin{itemize}
  
@@ -180,6 +210,56 @@ the old generation is no bigger than the current new generation.
  
  \end{itemize}
  
  
  \end{itemize}
  
+\section{Source Language}
+
+\subsection{Unboxed tuples}\label{sect:unboxed-tuples}
+
+Functions can take multiple arguments as easily as they can take one
+argument: there's no cost for adding another argument.  But functions
+can only return one result: the cost of adding a second ``result'' is
+that the function must construct a tuple of ``results'' on the heap.
+The assymetry is rather galling and can make certain programming
+styles quite expensive.  For example, consider a simple state transformer
+monad:
+@
+> type S a = State -> (a,State)
+> bindS m k s0 = case m s0 of { (a,s1) -> k a s1 }
+> returnS a s  = (a,s)
+> getS s       = (s,s)
+> setS s _     = ((),s)
+@
+Here, every use of @returnS@, @getS@ or @setS@ constructs a new tuple
+in the heap which is instantly taken apart (and becomes garbage) by
+the case analysis in @bind@.  Even a short state-transformer program
+will construct a lot of these temporary tuples.
+
+Unboxed tuples provide a way for the programmer to indicate that they
+do not expect a tuple to be shared and that they do not expect it to
+be allocated in the heap.  Syntactically, unboxed tuples are just like
+single constructor datatypes except for the annotation @unboxed@.
+@
+> data unboxed AAndState# a = AnS a State
+> type S a = State -> AAndState# a
+> bindS m k s0 = case m s0 of { AnS a s1 -> k a s1 }
+> returnS a s  = AnS a s
+> getS s       = AnS s s
+> setS s _     = AnS () s
+@
+Semantically, unboxed tuples are just unlifted tuples and are subject
+to the same restrictions as other unpointed types.
+
+Operationally, unboxed tuples are never built on the heap.  When
+unboxed tuples are returned, they are returned in multiple registers
+or multiple stack slots.  At first sight, this seems a little strange
+but it's no different from passing double precision floats in two
+registers.
+
+Note that unboxed tuples can only have one constructor and that
+thunks never have unboxed types --- so we'll never try to update an
+unboxed constructor.  The restriction to a single constructor is
+largely to avoid garbage collection complications.
+
+
  %-----------------------------------------------------------------------------
  \part{Evaluation Model}
  \section{Compiled Execution}
  %-----------------------------------------------------------------------------
  \part{Evaluation Model}
  \section{Compiled Execution}
@@ -310,7 +390,7 @@ and either builds a PAP or pops the arguments off the stack into
  \Arg{2} \ldots \Arg{m+1} and jumps to the fast entry point.
  
  
  \Arg{2} \ldots \Arg{m+1} and jumps to the fast entry point.
  
  
-\item {\em Unknown function closure or thunk.}
+\item {\em Unknown function closure, thunk or constructor.}
  
  Sometimes, the function being called is not statically identifiable.
  Consider, for example, the @compose@ function:
  
  Sometimes, the function being called is not statically identifiable.
  Consider, for example, the @compose@ function:
@@ -388,35 +468,18 @@ stack contains a valid activation record
  (section~\ref{sect:activation-records}) --- should a garbage collection
  be required.
  
  (section~\ref{sect:activation-records}) --- should a garbage collection
  be required.
  
-\item If @x@ has a pointed type (e.g.~a data constructor or a function),
+\item If @x@ has a boxed type (e.g.~a data constructor or a function),
  a pointer to @x@ is returned in \Arg{1}.
  
  \ToDo{Warn that components of E should be extracted as soon as
  possible to avoid a space leak.}
  
  a pointer to @x@ is returned in \Arg{1}.
  
  \ToDo{Warn that components of E should be extracted as soon as
  possible to avoid a space leak.}
  
-\item If @x@ is an unpointed type (e.g.~@Int#@ or @Float#@), @x@ is
+\item If @x@ is an unboxed type (e.g.~@Int#@ or @Float#@), @x@ is
  returned in \Arg{1}
  
  returned in \Arg{1}
  
-\item If @x@ is an unpointed tuple constructor, the components of @x@
+\item If @x@ is an unboxed tuple constructor, the components of @x@
  are returned in \Arg{1} \ldots \Arg{n} but no object is constructed in
  are returned in \Arg{1} \ldots \Arg{n} but no object is constructed in
-the heap.  Unboxed tuple constructors are useful for functions which
-want to return multiple values such as those used in an (explicitly
-encoded) state monad:
-
-\ToDo{Move stuff about unboxed tuples to a separate section}
-
-@
-data unpointed AAndState a = AnS a State
-type S a = State -> AAndState a
-
-bindS m k s0 = case m s0 of { AnS s1 a -> k s1 a }
-returnS a s  = AnS a s
-@
-
-Note that unboxed datatypes can only have one constructor and that
-thunks never have unboxed types --- so we'll never try to update an
-unboxed constructor.  Unboxed tuples are \emph{never} built on the
-heap.
+the heap.  
  
  When passing an unboxed tuple to a function, the components are
  flattened out and passed in \Arg{1} \ldots \Arg{n} as usual.
  
  When passing an unboxed tuple to a function, the components are
  flattened out and passed in \Arg{1} \ldots \Arg{n} as usual.
@@ -498,7 +561,7 @@ that case there is no need to return a tag in \Arg{2}.
  \subsection{Updates}
  \label{sect:data-updates}
  
  \subsection{Updates}
  \label{sect:data-updates}
  
-The entry code for an updatable thunk (which must also be of arity 0):
+The entry code for an updatable thunk (which must be of arity 0):
  
  \begin{itemize}
  \item copies the free variables out of the thunk into registers or
  
  \begin{itemize}
  \item copies the free variables out of the thunk into registers or
@@ -590,8 +653,10 @@ of the scrutinee, and (c) returning by an indirect jump through the
  return address on the stack.
  
  If we knew that the scrutinee was already evaluated we could generate
  return address on the stack.
  
  If we knew that the scrutinee was already evaluated we could generate
-(better) code which simply jumps to the appropriate branch of the @case@
-with a pointer to the scrutinee in \Arg{1}.
+(better) code which simply jumps to the appropriate branch of the
+@case@ with a pointer to the scrutinee in \Arg{1}.  (For direct
+returns to multiconstructor datatypes, we might also load the tag into
+\Arg{2}).
  
  An obvious idea, therefore, is to test dynamically whether the heap
  closure is a value (using the tag in the info table).  If not, we
  
  An obvious idea, therefore, is to test dynamically whether the heap
  closure is a value (using the tag in the info table).  If not, we
@@ -661,9 +726,16 @@ entering the garbage collector.
  
  \note{I reckon these deserve a subsection of their own}
  
  
  \note{I reckon these deserve a subsection of their own}
  
-Don't move heap pointer before success occurs.
+The storage manager detects that it needs to garbage collect the old
+generation when the evaluator requests a garbage collection without
+having moved the heap pointer since the last garbage collection.  It
+is therefore important that the GC routines {\em not} move the heap
+pointer unless the heap check fails.  This is different from what
+happens in the current STG implementation.
+
  Talk about how stack check looks ahead into the branches of case expressions.
  
  Talk about how stack check looks ahead into the branches of case expressions.
  
+
  \subsection{Handling interrupts/signals}
  
  @
  \subsection{Handling interrupts/signals}
  
  @
@@ -781,7 +853,7 @@ stack layout is shown in Figure \ref{fig:hugs-return-stack}.
  \begin{center}
  \input{hugs_ret.pstex_t}
  \end{center}
  \begin{center}
  \input{hugs_ret.pstex_t}
  \end{center}
-\caption{Stack layout for a hugs return address}
+\caption{Stack layout for a Hugs return address}
  \label{fig:hugs-return-stack}
  \end{figure}
  
  \label{fig:hugs-return-stack}
  \end{figure}
  
@@ -789,7 +861,7 @@ stack layout is shown in Figure \ref{fig:hugs-return-stack}.
  \begin{center}
  \input{hugs_ret2.pstex_t}
  \end{center}
  \begin{center}
  \input{hugs_ret2.pstex_t}
  \end{center}
-\caption{Stack layout on enterings a hugs return address}
+\caption{Stack layout on enterings a Hugs return address}
  \label{fig:hugs-return2}
  \end{figure}
  
  \label{fig:hugs-return2}
  \end{figure}
  
@@ -853,8 +925,8 @@ while (threads_exist) {
    pick a runnable thread;
    do {
      switch (thread->whatNext) {
    pick a runnable thread;
    do {
      switch (thread->whatNext) {
-      case (RunGHC  pc): status=runGHC(pc);  break;
-      case (RunHugs bc): status=runHugs(bc); break;
+      case (EnterGHC  pc): status=runGHC(pc);  break;
+      case (EnterHugs bc): status=runHugs(bc); break;
      }
      switch (status) {  // handle local problems
        case (StackOverflow): enlargeStack; break;
      }
      switch (status) {  // handle local problems
        case (StackOverflow): enlargeStack; break;
@@ -1003,11 +1075,16 @@ return itself to the return address using the GHC return convention.
  \label{fig:closure}
  \end{figure}
  
  \label{fig:closure}
  \end{figure}
  
-Every {\em heap object}, or {\em closure} is a contiguous block
+Every {\em heap object} is a contiguous block
  of memory, consisting of a fixed-format {\em header} followed
  by zero or more {\em data words}.
  of memory, consisting of a fixed-format {\em header} followed
  by zero or more {\em data words}.
-The header consists of
-the following fields:
+
+\ToDo{I absolutely do not believe that every heap object has a header
+like this - ADR.  I believe that they all have an info pointer but I
+see no readon why stack objects and unpointed heap objects would have
+an entry count since this will always be zero.}
+
+The header consists of the following fields:
  \begin{itemize}
  \item A one-word {\em info pointer}, which points to
  the object's static {\em info table}.
  \begin{itemize}
  \item A one-word {\em info pointer}, which points to
  the object's static {\em info table}.
@@ -1243,9 +1320,9 @@ Section~\ref{sect:stacks}.
  A second useful classification is this:
  \begin{itemize}
  \item 
  A second useful classification is this:
  \begin{itemize}
  \item 
-{\em Executive closures}, such as thunks and data constructors,
+{\em Executive objects}, such as thunks and data constructors,
  participate directly in a program's execution.  They can be subdivided into
  participate directly in a program's execution.  They can be subdivided into
-two kinds of objects according to their type:
+three kinds of objects according to their type:
  \begin{itemize}
  \item 
  {\em Pointed objects}, represent values of a {\em pointed} type
  \begin{itemize}
  \item 
  {\em Pointed objects}, represent values of a {\em pointed} type
@@ -1260,7 +1337,7 @@ once we support speculative evaluation.}
  
  \end{itemize}
  
  
  \end{itemize}
  
-\item {\em Administrative closures}, such as stack objects and thread
+\item {\em Administrative objects}, such as stack objects and thread
  state objects, do not represent values in the original program.
  \end{itemize}
  
  state objects, do not represent values in the original program.
  \end{itemize}
  
@@ -2645,7 +2722,7 @@ a case in point, as <.jones jfp leak.> points out.  Consider a thunk for
  the expression
  @
    let xs = [1..1000] in last xs
  the expression
  @
    let xs = [1..1000] in last xs
-@ 
+@
  where @last@ is a function that returns the last element of its
  argument list.  When the thunk is entered it will call @last@, which
  will consume @xs@ until it finds the last element.  Since the list
  where @last@ is a function that returns the last element of its
  argument list.  When the thunk is entered it will call @last@, which
  will consume @xs@ until it finds the last element.  Since the list
author	reid <unknown>
	Tue, 21 Oct 1997 17:22:24 +0000 (17:22 +0000)
committer	reid <unknown>
	Tue, 21 Oct 1997 17:22:24 +0000 (17:22 +0000)