Reorganisation of the source tree

[ghc-hetmet.git] / docs / storage-mgt / rp.tex
diff --git a/docs/storage-mgt/rp.tex b/docs/storage-mgt/rp.tex

new file mode 100644 (file)

index 0000000..2055894
--- /dev/null
+++ b/docs/storage-mgt/rp.tex
@@ -0,0 +1,1102 @@
+\documentclass{article}
+\usepackage{code,a4wide}
+
+\usepackage{graphics,epsfig,epic,eepic,epsfig}
+
+\setlength{\parskip}{0.25cm}
+\setlength{\parsep}{0.25cm}
+\setlength{\topsep}{0cm}
+\setlength{\parindent}{0cm}
+\renewcommand{\textfraction}{0.2}
+\renewcommand{\floatpagefraction}{0.7}
+
+
+% Terminology
+\newcommand{\block}{block}
+\newcommand{\Block}{Block}
+\newcommand{\segment}{segment}
+\newcommand{\Segment}{Segment}
+\newcommand{\step}{step}
+\newcommand{\Step}{Step}
+
+\newcommand{\note}[1]{{\em $\spadesuit$ #1}}
+
+\begin{document}
+\title{Implementation of Retainer Profiling}
+\author{Sungwoo Park and Simon Peyton-Jones}
+
+\makeatactive
+\maketitle
+
+\section{Retainer Profiling}
+
+Retainer profiling~\cite{CN} is a profiling technique which is based upon a 
+special view of production and consumption of heap objects at runtime:
+while producers build heap objects to form new pieces of graph,
+consumers hold pointers to these heap objects, or \emph{retain} them, so 
+that they are not freed during garbage collections. 
+On this basis, we refereed to such consumers as \emph{retainers}.
+Notice that an object can have more than one retainer because it can
+be pointed to by multiple objects.
+
+For each live object in the heap, retainer profiling computes 
+all its retainers, or its \emph{retainer set}.
+A naive implementation of retainer profiling could consider every
+immediate ancestor of an object as its retainer.
+Although this approach appears to provide an accurate report on the 
+relationship between retainers and retainees, the result can hardly be useful.
+For instance, it is pointless to scrutinize a list and treat each cons
+cell as a retainer of the following cons cell.
+This observation suggests that we need to have a way of designating only 
+certain types of objects as candidates for retainers.
+In other words, we need to employ an oracle which tells whether a given
+object can be a retainer or not.
+
+Since no retainer of a particular object needs to be using the
+object actively, we can find all its retainers simply by traversing
+the graph. In other words, we do not have to distinguish those retainers
+actively exploiting it from other retainers just holding pointers
+to it (either directly or indirectly).
+Thus, retainer profiling can be accomplished simply by traversing the
+graph.
+
+Figure~\ref{fig-retaineralgorithm} shows the algorithm for retainer
+profiling. The traversal begins at every root, and proceeds 
+in a depth first manner (or a breadth first manner).
+The function @R()@ returns the \emph{identity} of a given retainer such
+as its information table address or
+the name of the module which creates it.
+Notice that the retainer identity function does not need to be a 
+one-to-one mapping:
+multiple objects can share the same identity.
+Such a retainer identity function reduces the cost of traversal.
+For instance, when an object
+is reached from several retainers which share the same identity, we need to
+consider only the first visit to the object.
+In other words, whichever retainer (among those sharing the same identity) 
+leads to the object for the first time affects the retainer set of the object
+in consideration
+and all the other retainers can be ignored.
+Thus, the more coarse the function @R()@ is, the less
+it costs to traverse the graph for retainer profiling.
+The function @isRetainer()@ tells whether a given object is a retainer or not.
+Hence, the behavior of the retainer profiling algorithm is parameterized
+over: 1) the set of roots; 2) the function @R()@; 3) the function
+@isRetainer()@.
+
+One important invariant on the function @R()@ is that its return value
+must be consistent for a given retainer. In other words, @R()@ must return
+the same value for a given retainer no matter it is invoked.
+For this reason, the memory address of a retainer, for instance, cannot be used as
+its retainer identity because its location may change during garbage collections.
+
+\begin{figure}[ht]
+\begin{center}
+\begin{code}
+for every root r
+  retain(r, r)
+
+R(r) =
+  the identity of r
+
+isRetainer(c) =
+  if c is a retainer then   
+    true 
+  else 
+    false
+
+retain(c, r) =
+  if R(r) is a member of c.retainerSet then 
+    return
+  add R(r) to c.retainerSet
+  if isRetainer(c) then 
+    r' := c
+  else 
+    r' := r
+  for every successor c' of c
+    retain(c', r')
+\end{code}
+\caption{Retainer profiling algorithm}
+\label{fig-retaineralgorithm}
+\end{center}
+\end{figure}
+
+Another way of formulating retainer profiling is in terms of fixed point
+equations on retainer sets.
+To be specific, given the two functions @isRetainer()@ and @R()@,
+the retainer set of every object is computed as the least fixed point
+solution of the following equations:
+\begin{itemize}
+\item For every root @r@, 
+\begin{center}
+  @R(r)@ $\in$ @r.retainerSet@.
+\end{center}
+\item For every reachable object @c@, 
+\begin{center}
+$\bigcup_{\mathit{each\ ancestor\ @a@\ of\ @c@}}$ @from(a)@ $\subseteq$ 
+@c.retainerSet@ 
+\end{center}
+where @from(a)@ returns retainer(s) obtainable from @a@:
+\begin{center}
+@from(a)@ = if @isRetainer(a)@ then $\{@a@\}$ else @a.retainerSet@.
+\end{center}
+\end{itemize}
+
+This document describes the implementation of retainer profiling on
+the Glasgow Haskell Compiler runtime system. 
+It explains every detail of the development so that it can be (hopefully)
+a complete maintenance guide.
+A secondary goal is to help (hopefully) those who wish to extend the system 
+to implement another profiling scheme.\footnote{Unless otherwise mentioned,
+all identifiers are defined in @RetainerProfile.c@ or @RetainerSet.c@.}
+
+\section{Installing the GHC}
+
+Installing the GHC is done as follows:
+
+\begin{enumerate}
+\item Get the source code from the CVS repository.
+\begin{code}
+./              cvs checkout fpconfig
+./fptools/      cvs update -d CONTRIB common-rts distrib docs ghc glafp-utils 
+                    hslibs literate mhms mk nofib testsuite
+\end{code}
+
+\item Set up the basic configuration.
+\begin{code}
+./fptools/      autoconf                    
+./fptools/ghc/  autoconf
+./fptools/      configure
+\end{code}
+
+\item Set up the configuration for development and debugging.
+\begin{code}
+./fptools/mk    vi build.mk
+    GhcHcOpts = -O -fasm -Rghc-timing
+    SplitObjs = NO
+    GhcRtsHcOpts = 
+    GhcRtsCcOpts = -g
+    STRIP =:
+\end{code}
+@GhcLibWays@ tells the compiler to build the code for profiling as well.
+@GhcRtsHcOpts@ has additional flags for @gcc@ when compiling @.hc@ files.
+@GhcRtsCcOpts@ has additional flags for @gcc@ when compiling @.c@ files.
+Since we will implement retainer profiling in @.c@ files, we turn on the 
+debugging flag @-g@. 
+The empty setting for @STRIP@ tells the compiler not to remove source code
+information (generated due to the @-g@ option) from executable files so that
+they can be examined with @gdb@.
+
+\item Remove unnecessary files if needed and build everything.
+\begin{code}
+./fptools/      make
+\end{code}
+\end{enumerate}
+
+\section{Adding Retainer Set Fields}
+
+Since every Haskell closure now needs to store its retainer set at runtime, 
+it must be augmented with a new field,
+namely, a \emph{retainer set field}.
+This section explains how to add such a field to Haskell closures.
+It should be clear how to generalize the idea for adding 
+any number of new fields.\footnote{The GHC provides two 
+ways of building executable programs from 
+source files: normal way and profiling way. 
+We are concerned only about profiling way, and all the pieces of code 
+implementing profiling way are wrapped by the @PROFILING@ 
+pre-processing directive (as in @\#ifdef PROFILING@).
+Therefore, all the additions and changes that we make to the source code 
+are assumed to be wrapped by the @PROFILING@ pre-processing 
+directive as well unless otherwise mentioned.}
+
+\subsection{Adding a new field to Haskell closures} 
+
+We want to add a retainer set field of type @retainerSet@ to every 
+closure, so we create a new file @includes/StgRetainerProf.h@ where
+we define the type @retainerSet@.
+The actual definition of @retainerSet@ will be given later.
+
+\begin{code}
+/* includes/StgRetainerProf.h */
+typedef ... retainerSet;
+\end{code}
+
+We make type @retainerSet@ to be publicly available by including
+@includes/StgRetainerProf.h@ itself to @includes/Stg.h@ (not wrapped
+by @PROFILING@).
+
+\begin{code}
+/* includes/Stg.h */
+#include "StgRetainerProf.h"
+\end{code}
+
+Then we add a retainer set field @rs@ to the @StgProfHeader@ structure.
+
+\begin{code}
+/* include/Closures.h */
+typedef struct {
+   CostCentreStack *ccs;
+   retainerSet *rs;
+} StgProfHeader;
+\end{code}
+
+Now every closure @c@ (including static closures) has a retainer set field,
+which can be accessed with @c->header.prof.rs@ (where @c@ denotes the 
+address of the closure).
+
+\subsection{Changing constants}
+
+We are ready to reflect the new size of Haskell closures to other part
+of the source code.
+This is accomplished by changing a few constants which specify the size
+of certain types of closures and their layout.
+
+When building the runtime system, the @gcc@ compiler correctly figures out
+the size of every structure on its own.
+However, 
+GHC simply reads @includes/Constants.h@ to to determine the size of 
+closures assumed by the runtime system.
+Thus, we must change the constants used by the GHC itself (as opposed to
+the runtime system). They are all found in @includes/Constants.h@.
+We increase each of them by 1 to reflect the retainer set field which is one 
+word long:
+\begin{code}
+/* includes/Constants.h */
+#define PROF_HDR_SIZE       2
+#define SCC_UF_SIZE         5
+#define SCC_SEQ_FRAME_SIZE  4
+\end{code}
+@PROF_HDR_SIZE@ denotes the size of the structure @StgProfHeader@, which
+is now two words long. 
+@SCC_UF_SIZE@ and @SCC_SEQ_FRAME_SIZE@ denote the size of the structures
+@StgUpdateFrame@ and @StgSeqFrame@ (in @includes/Closures.h@) in
+words.
+
+Now we must rebuild the GHC so that, when executed, the code generated by 
+the GHC must now allocate one more word for the retainer set field than before.
+
+\begin{code}
+./fptools/ghc/  make boot
+./fptools/ghc/  make
+\end{code}
+
+The second command @make boot@ instructs the build system to analyze
+the source code dependency so that the next execution of @make@ correctly
+finds all required files.
+
+Next we change four bitmap constants which specify the layout of
+certain types of closures.
+As an example, let us consider @RET_BITMAP@, which specifies the layout
+of thunk selectors (corresponding to closure type @THUNK_SELECTOR@).
+Without a retainer set field, there is only one non-pointer (represented
+by $1$) followed by one or more pointers (represented by $0$) in the closure 
+body and the bitmap representation is $0b1$, or $1$. 
+With a retainer set field, which is not a pointer to another closure and thus
+represented by $1$, there are two non-pointers, and the bitmap representation
+is $0b11$, or $3$. Notice that the bitmap is interpreted in reverse order:
+the least significant bit corresponds to the first word in the closure body,
+and the second least significant bit to the second word, and so forth.
+The same rule applies to the other three bitmap constants:
+@CATCH_FRAME_BITMAP@ (for closure type @CATCH_FRAME@ and structure 
+@StgCatchFrame@),
+@STOP_THREAD_BITMAP@ (for closure type @STOP_FRAME@ and structure 
+@StgStopFrame@), and
+@UPD_FRAME_BITMAP@ (for closure type @UPDATE_FRAME@ and structure
+@StgUpdateFrame@).
+
+\begin{code}
+/* rts/StgStdThunks.hc */
+#define RET_BITMAP          3
+/* rts/Exception.hc */
+#define CATCH_FRAME_BITMAP  15
+/* rts/StgStartup.hc */
+#define STOP_THREAD_BITMAP  3
+/* rts/updates.hc */
+#define UPD_FRAME_BITMAP    7
+\end{code}
+
+For most closure types, the new definition of @StgProfHeader@ is
+automatically propagated to their corresponding structures.
+However, there are six closures types which are not affected by 
+@StgProfHeader@. They are all stack closures:
+@RET_DYN@, @RET_BCO@, @RET_SMALL@, @RET_VEC_SMALL@, @RET_BIG@, and
+@RET_VEC_BIG@. 
+If you want a new field to be added to these closures, you may
+have to modify their corresponding structures.
+
+\textbf{To do:} Presently the above changes introduce two bug in the 
+runtime system. 
+First, @nofib/real/symalg@ ends up with a division-by-zero
+exception if we add a new field. 
+Second, the runtime system option @-auto-all@ clashes in some test files
+in the @nofib@ testing suite (e.g., @spectral/expert@).
+
+\subsection{Initialization code}
+
+When a new closure is allocated, its retainer set field may have to be 
+initialized according to the way that retainer profiling is implemented.
+For instance, we could use as an initial value a pointer to an empty retainer 
+set. 
+Alternatively we could assign a null pointer to indicate that its retainer
+set has not been computed yet, which we adopt in our implementation.
+In either case, we have to visit the new closure and execute some initialization
+code on it so that its retainer set field is set to an appropriate value.
+
+There are three parts in the source code which need to be modified.
+dynamic closure initialization, static closure initialization,  
+and update frame initialization.
+The first is accomplished by modifying the macro @SET_PROF_HDR()@ (in 
+@include/ClosureMacros.h@). When a closure  @c@ is created at runtime, 
+@SET_PROF_HDR()@ is invoked immediately with @c@ as its first argument.
+Thus, the following code initializes the retainer set field of every
+dynamic closure to a null pointer.
+
+\begin{code}
+/* include/ClosureMacros.h */
+#define SET_PROF_HDR(c,ccs_)            \
+        ((c)->header.prof.ccs = ccs_, (c)->header.prof.rs = NULL)
+\end{code}
+
+Similarly, the macro @SET_STATIC_PROF_HDR()@ (in the
+same file) specifies how the retainer set field of every static closure
+is initialized, which is rewritten as follows:
+
+\begin{code}
+/* include/ClosureMacros.h */
+#define SET_STATIC_PROF_HDR(ccs_)       \
+        prof : { ccs : ccs_, rs : NULL },
+\end{code}
+
+\textbf{Obsolete:} Dynamic closures created through explicit C function invocations
+(in @RtsAPI.c@) are now initialized by @SET_HDR()@.
+
+%There is another way of creating dynamic closures through explicit C 
+%function invocations at runtime. 
+%Such functions are all defined in @RtsAPI.c@: @rts_mkChar()@, @rts_mkInt()@,
+%@rts_mkWord()@, and so forth.
+%Each function allocates memory for a new closure, 
+%initializes it, and returns its address. 
+%Therefore, we can simply insert in each function another initialization code 
+%for retainer set fields.
+%To this end, we define a macro @setRetainerField()@ and insert it 
+%in each function:
+%
+%\begin{code}
+%#define setRetainerField(p)             \
+%  (p)->header.prof.rs = NULL
+%\end{code}
+%
+%For instance, @rts_mkChar()@ is now defined as follows:
+%
+%\begin{code}
+%/* RtsAPI.c */
+%HaskellObj
+%rts_mkChar (HsChar c)
+%{
+%  StgClosure *p = (StgClosure *)allocate(CONSTR_sizeW(0,1));
+%  ...
+%  setRetainerField(p);
+%  return p;
+%}
+%\end{code}
+
+Finally we may need to initialize the retainer set field of an update frame 
+(stack closure) when it is pushed onto the stack for the first time. 
+For instance, if we want to initialize the retainer set field of update
+frames to a null pointer, we can rewrite the macro @PUSH_STD_CCCS()@ 
+(in @includes/Updates.h@) as follows:
+
+\begin{code}
+/* includes/Updates.h */
+#define PUSH_STD_CCCS(frame)            \
+  (frame->header.prof.ccs = CCCS, frame->header.prof.rs = NULL)
+\end{code}
+
+In our implementation of retainer profiling, however, the retainer set field is not 
+used for any stack closure.
+Hence, the above modification is entirely unnecessary.
+Also, update frames are the only exception to the standard way of creating
+stack closures: all the other types of stack closures with a retainer set 
+field are eventually initialized by
+the macro @SET\_HDR()@ (in @includes/ClosureMacros.h@), which in turn
+invokes @SET\_PROF\_HDR()@. This is not the case for update frames.
+Compare @PUSH\_UPD\_FRAME()@ (in @includes/Updates.h@) and  
+@PUSH\_SEQ\_FRAME()@ (in @includes/StgMacros.h@) for clarification.
+
+\section{Retainer Sets}
+
+At the end of retainer profiling, every live closure (except stack
+closures, for which we do not compute retainer sets) is associated with
+a retainer set; there can be no closure without an associated retainer set
+because every live closure is visited during traversal.
+Since many closures may well be associated with a common retainer set,
+we want to avoid creating any particular retainer set more than once.
+This section presents the details of manipulating retainer sets in our
+implementation.
+
+\subsection{Identity of a retainer}
+
+The function @R()@ in Figure~\ref{fig-retaineralgorithm} returns
+the identity of a retainer. In order to implement it, we need 
+a type for retainer identity.
+The type @retainer@ (in @includes/StgRetainerProf.h@) is introduced for
+this purpose. 
+
+There are various ways of defining the type @retainer@. 
+For instance, we can designate the information table address of a retainer as
+its identity as follows:
+
+\begin{code}
+struct _StgInfoTable;
+typedef struct _StgInfoTable *retainer;
+\end{code}
+
+We can also use the cost centre stack associated with the retainer:
+
+\begin{code}
+typedef CostCentreStack *retainer;
+\end{code}
+
+The function @R()@ is embodied as the function @getRetainerFrom()@ in the 
+implementation, whose type is @(StgClosure *)@ $\rightarrow$ @retainer@.
+It is straightforward to define @getRetainerFrom()@ according to the definition
+of @retainer@, as illustrated below:
+
+\begin{code}
+retainer getRetainerFrom(StgClosure *c) { return get_itbl(c); }
+retainer getRetainerFrom(StgClosure *c) { return c->header.prof.ccs; }
+\end{code}
+
+\subsection{Retainer sets and the cost function}
+
+A retainer set is stored in the structure @retainerSet@ 
+(in @includes/StgRetainerProf.h@):
+
+\begin{code}
+typedef struct _retainerSet {
+  nat num;
+  nat cost;
+  ...
+  int id;
+  retainer element[0];
+} retainerSet;
+\end{code}
+
+The field @num@ gives the number of retainers in the retainer set, which
+are all stored in the array @element[]@. Thus, the size of @element[]@
+is assumed to be @num@.
+The field @cost@ gives the sum of the \emph{costs} of those closures 
+associated with the retainer set: if a closure @c@ is
+associated with the retainer set, that is, if @c@ is retained by each
+retainer in the retainer set and none else,
+the cost of @c@ is added to the field @cost@. 
+The field @id@ gives a unique identification number for the retainer set.
+
+The interface to @retainerSet@ is as follows 
+(see @RetainerSet.h@):
+
+\begin{description}
+\item[@void initializeAllRetainerSet(void)@] initializes the store for retainer sets.
+\item[@void refreshAllRetainerSet(void)@] refreshes each retainer set by setting
+its @cost@ field to zero. This function does destroy any retainer set.
+\item[@void closeAllRetainerSet(void)@] destroys all retainer sets and closes
+the store for retainer sets.
+\item[@retainerSet *singleton(retainer r)@] returns a singleton retainer set
+consisting of @r@ alone. If such a retainer set already exists, no new retainer
+set is created. Otherwise, a new retainer set is created.
+\item[@retainerSet *addElement(retainer r, retainerSet *rs)@] returns a retainer set
+@rs@ augmented with @r@. If such a retainer set already exists, no new retainer set
+is created. Otherwise, a new retainer set is created.
+\item[@rtsBool isMember(retainer r, retainerSet *rs)@] returns a boolean value
+indicating whether @r@ is a member of @rs@.
+\item[@void traverseAllRetainerSet(void (*f)(retainerSet *))@] invokes the function
+@f@ on every retainer set created.
+\item[@void printRetainerSetShort(FILE *, retainerSet *)@] prints a single retainer
+set.
+\item[@void outputRetainerSet(FILE *, nat *allCost, nat *numSet)@] prints all 
+retainer sets. Stores the sum of all their costs in @*allCost@ and the number
+of them in @*numSet@.
+\item[@void outputAllRetainerSet(FILE *)@] prints all retainer sets.
+\end{description}
+
+We also define a \emph{cost function}, which returns the cost of a given closure, 
+in order to compute the field @cost@. 
+The cost function can be defined in several ways.
+A typical definition is on the size of a closure, which results in
+the field @cost@ accumulating the size of all the closures retained by a
+retainer set.
+If we just want to count the number of closures retained by the
+retainer set, we can simply set the cost of every closure to one regardless
+of its closure type.
+Furthermore, we can define the cost function flexibly according to
+the closure type. 
+For instance, we can set the size of any static closure to zero so that
+it is not taken into account at all in computing the field @cost@. 
+Notice that static closures are also visited during traversal because they 
+may lead to other dynamic closures (e.g., static indirection closures of 
+closure type @IND_STATIC@).
+This is especially desirable because we usually focus on the heap use.
+We can also selectively choose certain dynamic closure types not to contribute 
+to the field @cost@.
+
+In our implementation, there are two functions related with the cost function:
+@cost()@ and @costPure()@. 
+@cost()@ returns the size of the entire memory allocated for a given closure
+(even including the two fields in the structure @StgProfHeader@). 
+It returns zero for static closures. 
+@costPure()@ returns the size of the memory which would be allocated for
+a given closure with no profiling.
+It is defined in terms of @cost()@, and it suffices to change only @cost()@
+when a new scheme for the cost function is desired.
+@costPure()@ is put to actual use in computing the field @cost@ because it 
+effectively hides the memory overhead incurred by profiling.
+
+\subsection{Implementation}
+
+The algorithm for retainer profiling in Figure~\ref{fig-retaineralgorithm} 
+adds at most one retainer to an existing retainer set (or an empty retainer set)
+at any moment; it does not require a retainer set union operation.
+This observation simplifies the implementation, and
+we employ the following two functions for creating new retainer sets: 
+@singleton()@, which creates a singleton retainer set, and 
+@addElement()@, which adds an element to an existing retainer set. 
+
+It is a frequent operation during retainer profiling to search for a retainer
+set, which may or may not exist, built from a given retainer set and a
+particular retainer.
+To efficiently implement this operation,
+we choose to store all retainer sets in a hash table and 
+the structure @retainerSet@ is now extended with two new fields
+@hashKey@ and @link@.
+The field @hashKey@ stores the hash key which is obtained
+from the retainers in a retainer set.
+The field @link@ points to the next retainer set in the same bucket:
+
+\begin{code}
+typedef struct _retainerSet {
+  ...
+  StgWord hashKey;
+  struct _retainerSet *link;
+  ...
+} retainerSet;
+\end{code}
+
+The hashing function must be defined in such a way that a retainer set
+can have only one unique hash key regardless of the order its elements
+are stored, i.e., the hashing function must be additive.
+
+It is often observed that two successive executions of retainer profiling share
+a number of retainer sets in common, especially if the two executions are
+close in time.
+This also implies that the number of all retainer sets which can be created
+at any moment does not grow indefinitely regardless of the interval at which
+retainer profiling is performed; it does not grow commensurately with the
+number of times retainer profiling is executed.
+This observation eliminates the need to free the memory allocated for
+retainer sets; we can simply set the @cost@ field of every retainer set
+to zero after each retainer profiling and reuse it during the next time.
+
+\section{Graph Traversal}
+
+At the heart of retainer profiling lies \emph{graph traversal};
+the algorithm in Figure~\ref{fig-retaineralgorithm} is supposed to visit
+every closure in the graph at least once and yield statistics on the heap use.
+Since only live closures are reachable from the root, the algorithm
+does not deal with dead closures.
+
+This section presents details on how to achieve an efficient implementation of 
+graph traversal without incurring extra memory overhead and compromising speed.
+
+\subsection{Goal}
+
+Traversing a graph itself can be done in a straightforward way;
+we choose either depth first search or breadth first search, and traverse
+the graph starting from a given set of roots.
+After a complete traversal, each live closure @c@ (including static closures)
+has an associated retainer set, whose address is stored in the field
+@c->header.prof.rs@. 
+
+A real complication arises when retainer profiling is performed once again:
+all live closures which have survived all garbage collections since 
+the previous retainer profiling 
+still have an associated retainer set (indicated by
+a non-null pointer in their retainer set field), which is no longer
+valid. Any new closure created since then has
+a null pointer in its retainer set field at the beginning of retainer 
+profiling and will become associated with a retainer set.
+Thus, we can no longer distinguish valid retainer set fields
+from invalid ones. 
+
+A simple remedy is to linearly scan the heap at the beginning of each 
+retainer profiling and set all retainer set fields to a null pointer.
+It resets the retainer set field of each dynamic closure, whether it is
+live or not with respect to the given set of root.
+This is feasible because any closure in the heap directly adjoins the
+next closure, if any.
+The problem is that we have no way of visiting all live static closures,
+for which we compute retainer sets.
+
+A moment of thought, however, reveals that we can completely avoid computing 
+retainer sets for static closures. This is because retainer profiling is 
+concerned only about the heap, which consists of dynamic closures and no
+static closures. In other words, we can treat every static closure as
+a bridge connecting two dynamic closures. 
+For instance, if a dynamic closure @c@$_1$ has a pointer to a static
+closure @s@ and @c@ has a pointer to another dynamic closure @c@$_2$,
+we can think of the pointer in @c@$_1$ as a direct pointer to @c@$_2$.
+The big problem is that if the graph has a cycle containing static closures,
+an infinite loop occurs. In other words, we have no way of telling whether
+a static closure has been visited or not and are forced to compute
+retainer sets for static closures as well.\footnote{For instance, 
+a static closure is allowed to have a self-reference in its SRT, which
+is also followed during retainer profiling.}
+
+Another remedy is to stores in every closure a time stamp for the
+retainer set field. The time stamp indicates whether the retainer
+set field is valid or no longer valid (i.e., it is for the previous
+retainer profiling). 
+At the cost of one extra field in each closure, we can achieve an
+elegant implementation with little complication.
+However, it turns out that the memory overhead is too big.\footnote{A typical
+dynamic closure is only two or three words long.}
+Thus, our goal is to stick to the definition of the structure @StgProfHeader@
+given earlier and yet to achieve an elegant solution.
+
+\subsection{Basic plan}
+
+Since we visit every live object and update its retainer set field,
+any retainer set field can either be valid (the corresponding retainer
+set is valid) or point to a retainer set created during the previous
+retainer profiling. 
+In order to distinguish valid retainer set fields
+from invalid ones, we exploit the least significant bit of the retainer
+set field: we maintain a one bit mark which flips over every time
+retainer profiling is performed, and judge that a retainer set field is
+valid only if its least significant bit matches the mark.
+The variable @flip@ serves for this purpose. 
+The macros @isRetainerSetFieldValid()@ tests if the retainer set field
+of a give closure @c@ is valid:
+
+\begin{code}
+#define isRetainerSetFieldValid(c) \
+  ((((StgWord)(c)->header.prof.rs & 1) ^ flip) == 0)
+\end{code}
+
+As an example, a retainer set field can be set to a null value conforming
+the current value of @flip@ by the macro @setRetainerSetToNull()@:
+
+\begin{code}
+#define setRetainerSetToNull(c)   \
+  (c)->header.prof.rs = (retainerSet *)((StgWord)NULL | flip)
+\end{code}
+
+Now, when a dynamic closure @c@ is created, its retainer set field is
+initialized to a null value conforming to the current value of 
+@flip@:\footnote{Actually this is not mandatory: even when the null
+value does not conform to the current value of @flip@, it will be replaced
+by a correct null value when @c@ is visited later.}
+
+\begin{code}
+extern StgWord flip;
+#define SET_PROF_HDR(c,ccs_)            \
+        ((c)->header.prof.ccs = ccs_, (c)->header.prof.rs = (retainerSet *)((StgWord)NULL | flip))
+\end{code}
+
+We do not need to revise @SET_STATIC_PROF_HDR()@ if the initial value of
+@flip@ is set to $0$.\footnote{For the same reason, an initial value $1$
+does not compromise the correctness of the implementation.}
+
+\subsection{Set of roots}
+
+The set of roots consists of all thread closures (running, sleeping, or 
+blocked) existing at the beginning of a retainer profiling. 
+It is handily obtained in an indirect way by invoking the function
+@GetRoots()@ (in @Schedule.c@) with an appropriate argument, which must be
+a function:
+@GetRoots()@ invokes on each root known to the runtime system its argument.
+Thus, we implement a function @retainClosure()@, which initiates traversal
+from a given root and updates the retainer set of every closure reachable
+from the root,
+and invokes @GetRoots()@ with @retainClosure@ as an argument.
+
+In addition to the thread closures, weak pointers are also considered
+as roots; they may not be reachable from any thread closure yet are still
+being in used.
+A weak pointer has three pointer fields: @key@, @value@, and 
+@finalizer@ (see the structure @StgWeak@ in @includes/Closures.h@).
+It turns out that these pointers may not be valid all the time:
+at a certain point during execution, for instance, the pointer @key@ may point 
+to a dead closure. 
+However, right after a major garbage collection, all the three pointers are
+guaranteed to be valid, i.e., they all point to live closures. 
+This facilitates the handling of weak pointers if we choose to
+perform retainer profiling immediately after a major garbage collection.
+All weak pointers are found in the linked list @weak_ptr_list@ 
+(in @Weak.c@).
+
+See the function @computeRetainerSet()@ for details.
+
+\subsection{Static closures}
+
+When a live dynamic closure @c@ is visited for the first time during traversal,
+its retainer set field is checked against the current value of @flip@.
+If it was created at some point since the previous retainer profiling, 
+its retainer set field is already set to a correct null value. 
+Otherwise, it must have been visited 
+during the previous retainer profiling and thus its retainer set field is
+invalid and will be set to a correct null value. 
+Therefore it is unnecessary to visit all dynamic closures and set their
+retainer set field to a correct null value at the beginning of each retainer 
+profiling.
+
+However, this operation is required for static closures. 
+The reason is that a static closure, which is never garbage collected,
+may appear alternately in the set of live closures.
+In other words, a currently live static closure may become dead and 
+be resuscitated again.
+Therefore, for a static closure, it does not help to check if its
+retainer set field conforms to the current value of @flip@. 
+For instance, 
+if a static closure happens to belong to the set of live closures every other 
+retainer profiling, its retainer set field will never set to a null value,
+which is disastrous.
+Therefore, we choose to visit all live static closures at the beginning
+of each retainer profiling and set their retainer set field to a 
+correct null value. 
+
+In order to find all live static closures, we have each retainer 
+profiling preceded by a major garbage collection, which knows all live
+static closures.\footnote{This is a heavy 
+restriction on retainer profiling, which makes retainer profiling partially
+dependent on garbage collection. 
+However, it does not affect any retainer profiling result because 
+retainer profiling considers only live closures, which survive any
+garbage collection.} 
+To be specific, the garbage collector builds a linked list
+@scavenged_static_objects@ (in @GC.c@) during a major garbage collection,
+which stores all live static closures of our interest.
+\footnote{
+A static closure of closure type @IND\_STATIC@ may be put in the
+list @mut\_once\_list@ of the oldest generation, instead of the list
+@scavenged\_static\_objects@. 
+In our implementation, such a closure is just skipped over because it
+contains only a pointer to a dynamic closure, and we do not compute
+its retainer set.
+Thus, there is no need to traverse the list @mut\_once\_list@ of the oldest 
+generation.}
+Since it destroys the linked list after finishing the major garbage collection
+(by invoking the function @zero_static_object_list()@ with 
+@scavenged_static_objects@ as its argument),
+we traverse the linked list to set the retainer set field of each
+live static closure to a correct null value before its destruction.
+This is done by invoking the function 
+@resetStaticObjectForRetainerProfiling()@.
+
+\textbf{To do:} In the current implemenation, if a static closure has no child
+(e.g., @CONSTR_NOCAF_STATIC@, @THUNK_STATIC@ with an empty SRT, and
+@FUN_STATIC@ with an empty SRT), we do not compute its retainer set (because
+there is no need to do). This slight optimization allows us to render 
+retainer profiling no longer dependent on garbage collection due to the
+following propoerty: 
+
+\begin{center}
+A static closure can alternately appear and disappear in the set of live 
+closures across multiple executions of retainer profiling if and only if
+it has an empty SRT and no child.
+\end{center}
+
+Then we can completely eliminate the function 
+@resetStaticObjectForRetainerProfiling()@. 
+
+\subsection{Traversal}
+
+The traversal proceeds in a depth first manner and is implemented
+with two mutually recursive functions: @retainStack()@ and @retainerClosure()@.
+@retainerStack()@ can be invoked on dynamic closures holding a stack chunk:
+closure types @TSO@, @PAP@, and @AP_UPD@. 
+It in turn invokes @retainerClosure()@ on each closure reachable from
+stack closures in the stack chunk. Notice that it does not invoke
+@retainerClosure()@ on those stack closures because we do not compute 
+retainer sets for stack closures.
+@retainerClosure()@ iteratively traverses all live closures reachable
+from a given closure.
+It maintains its own stack to record the next scan position in every closure
+currently under consideration.\footnote{A recursive version of 
+@retainerClosure()@ could be implemented easily. 
+@retainerClosure()@ in our implementation is an iterative version.}
+When it encounters a closure holding a stack chunk, it invokes @retainerStack()@
+on that closure. 
+Hence,
+the traversal is triggered simply by invoking @retainerClosure()@ on every root.
+
+\textbf{To do:} 
+The correctness of retainer profiling is subject to the correctness
+of the two macros @IS_ARG_TAG()@ and @LOOKS_LIKE_GHC_INFO()@
+(see @retainStack()@). Since
+@LOOKS_LIKE_GHC_INFO()@ is a bit precarious macro, so I believe that
+the current implementation may not be quite safe. Also, @scavenge_stack()@
+in @GC.c@ also exploits this macro in order to identify shallow pointers.
+This can be a serious problem if a stack chunk contains some
+word which looks like a pointer but is actually not a pointer.
+
+\subsection{Sanity check}
+
+Since we assume that a retainer profiling is preceded by a major garbage
+collection,
+we expect that the size of all the dynamic closures visited during 
+any retainer profiling adds up exactly to the total size of the heap.
+In fact, this is not the case; there can be closures not reachable from
+the set of roots yet residing in the heap even after a major garbage
+collection.
+
+First, a dead weak pointer remains in the heap until its finalizer
+finishes. Although its finalizer thread closure is part of the set of roots,
+the dead weak pointer itself is not reachable from any root.
+Since it cannot be visited during retainer profiling anyhow, we do not
+need to located it and set its retainer set field 
+appropriately.\footnote{Dead weak pointers are identified with their 
+information table @stg\_DEAD\_WEAK\_info@ (in @StgMiscClosures.hc@). 
+Notice that their closure type is @CONSTR@, \emph{not} @WEAK@;
+their information table is replaced by @stg\_DEAD\_WEAK\_info@ in the 
+function @scheduleFinalizers()@ (in @GC.c@).} 
+
+Second, 
+mutable variables (of closure type @MUT_VAR@) may remain in the heap
+even when they are not reachable from the set of roots while
+dynamic closures pointed to by them must be live.\footnote{I do not 
+understand clearly why this happens :(} 
+Since such mutable variables may become live again (in the sense that
+they become reachable from the set of roots), we must locate them
+and set their retainer set field appropriately after each retainer
+profiling. This is handily accomplished by traversing the list
+@mut_once_list@ in every generation.
+
+\section{Retainer Profiling Schemes}
+
+A retainer profiling scheme specifies \emph{what} retainer profiling
+yields (as opposed to \emph{how} retainer profiling computes the retainer
+set for every live object).
+It is determined primarily by the meaning of retainer identity,
+that is, the type @retainer@ (in @includes/StgRetainerProf.h@). 
+The function @getRetainerFrom()@ must be defined according to the
+definition of the type @retainer@.
+
+In order for a new retain profiling scheme to fully work, we need to follow
+four steps:
+
+\begin{enumerate}
+\item Define the type @retainer@ as desired.
+\item Write @getRetainerFrom()@.
+\item Write two hashing functions @hashkeySingletone()@ and 
+      @hashKeyAddElement()@, which return the hash key from a single
+      retainer and a retainer set with another retainer, respectively.
+\item Write two printing functions @printRetainer()@ and 
+      @printRetainerSetShort()@.
+      These functions are employed when a retainer or a retainer set is
+      printed in the output file. 
+\end{enumerate}
+
+In our implementation, we use cost centre stacks for retainer identity:
+
+\begin{code}
+typedef CostCentreStack *retainer;
+\end{code}
+\begin{code}
+retainer getRetainerFrom(StgClosure *c) { return c->header.prof.ccs; }
+\end{code}
+\begin{code}
+void printRetainer(FILE *f, retainer cc)
+{
+  fprintf(f,"%s[%s]", cc->label, cc->module);
+}
+\end{code}
+
+\textbf{To do:} All the closures created by @rts_mk...()@ in @RtsAPI.c@ are given 
+@CCS_SYSTEM@ as their cost centre stacks. This may not be accurate indeed,
+and, for instance, @CCCS@ may be a better choice than @CCS_SYSTEM@. 
+
+\section{Usage}
+
+Since cost centre stacks are used as retainer identity, a source program
+must be given proper cost centre annotations by programmers. 
+Alternatively,
+we can ask the compiler to automatically insert cost centre annotations.
+For instance, the compiler option @-auto-all@ inserts a cost centre
+annotation around every top-level function as shown below 
+(the @-p@ option is a must
+because we must build the executable file in a profiling way):
+
+\begin{code}
+$ ghc-inplace -o Foo.out -p -auto-all Foo.hs
+\end{code}
+
+The runtime system option @-hR@ tells the executable program to
+gather profiling statistics and report them in a @.prof@ file:
+
+\begin{code}
+$ Foo.out +RTS -hR -RTS
+\end{code}
+
+The option @-i@ can be used to 
+specify a desired interval at which retainer profiling is performed.
+The default and minimum value is half a second:
+
+\begin{code}
+$ Foo.out +RTS -hR -i2.5 -RTS
+\end{code}
+
+Then, two text files are generated: a @.prof@ file and a @.hp@ file.
+The @.prof@ file records the progress of retainer profiling:
+for each retainer profiling performed during program execution, 
+it shows
+the Haskell mutator time (as opposed to the user time) at which 
+the retainer profiling starts,
+the average number of times a closure is visited, 
+the sum of costs assigned to all retainer sets (obtained from the field
+@cost@ in each retainer set),
+and the number of all retainer sets created \emph{since} the beginning
+of program execution.
+A typical entry in a @.prof@ file looks like:
+
+\begin{code}
+Retainer Profiling: 3, at 3.530000 seconds
+  Average number of visits per object = 1.687765
+  Current total costs = 801844
+  Number of retainer sets = 118
+\end{code}
+
+The sum of costs assigned to all retainer sets may \emph{not} be equal to the
+size of the heap. 
+The discrepancy is attributed to those live object which are not reachable
+from the set of roots. 
+Still it is a good estimate of the size of the heap at the moment when
+the retainer profiling was performed.
+
+The @.prof@ file also shows the contents of every retainer set which 
+has been assigned a positive cost (i.e., the field @cost@) at least once;
+not every retainer set created is assigned a positive cost because quite
+a few retainer sets are created as intermediate retainer sets before
+creating a real retainer set. This results from the restriction on the way
+retainer sets are created (only one retainer can be added to an existing
+retainer set at a time).
+
+An example of the contents of a retainer set is:
+
+\begin{code}
+SET 71 = {<doFile[Main],main[Main],MAIN[MAIN]>, <synth_2[Main],doFile[Main],main[Main],MAIN[MAIN]>}
+\end{code}
+
+The retainer set has an identification number $71$.
+It is associated with two retainers, whose retainer identities are shown 
+inside angle brackets @<...>@.
+For instance, the first retainer is created when the cost centre stack
+is @doFile[Main],main[Main],MAIN[MAIN]@, shown from the top to the bottom.
+Each entry in angle brackets consists of a cost centre name (e.g., @doFile@) 
+and its module name (e.g., @Main@).
+
+The @.hp@ file can be supplied to the @hp2ps@ program to create a postscript 
+file showing the progress of retainer profiling in a graph:
+
+\begin{code}
+$ hp2ps Foo.hs
+$ gv Foo.ps
+\end{code}
+
+An example of such a graph is shown in Figure~\ref{fig-cacheprof}.
+It shows the cost assigned to each retainer set at the point 
+when a retainer profiling is performed (marked by a corresponding inverted 
+triangles on the horizontal axis). 
+The abbreviated contents of each retainer set is displayed in the right column.
+Due to the space limitation,
+it shows only topmost cost centres (without module names)
+instead of printing the full contents.
+For instance, @(71)doFile,synth_2@ corresponds to a retainer set shown above 
+(@71@ is its identification number).
+The contents may be truncated if it is too long. 
+
+Notice that the time is in the Haskell mutator time, which excludes 
+the runtime system time such as garbage collection time and retainer profiling
+time. Thus, the actual execution takes longer than indicated in the
+graph. Also, the timer employed to periodically perform retainer profiling
+is not perfectly accurate. Therefore, the result may slightly vary for each
+execution of retainer profiling.
+
+\begin{figure}[ht]
+\centering
+\epsfig{file=cacheprof_p.eps,width=5in}
+\caption{A graph showing the progress of retainer profiling}
+\label{fig-cacheprof}
+\end{figure}
+
+\section{Comparision with nhc}
+
+\section{Files}
+
+This section gives a summary of changes made to the GHC in 
+implementing retainer profiling.
+Only three files (@includes/StgRetainerProf.h@, @RetainerProfile.c@, and 
+@RetainerProfile.h@) are new, and all others exist in the GHC.
+
+@\includes@ directory:
+
+\begin{description}
+\item[StgRetainerProf.h] defines types @retainer@ and @retainerSet@. 
+\item[Stg.h] includes the header file @StgRetainerProf.h@.
+\item[Closures.h] changes structure @StgProfHeader@.
+\item[Constants.h] changes constants @PROF_HDR_SIZE@, @SCC_UF_SIZE@, and
+  @SCC_SEQ_FRAME_SIZE@.
+\item[ClosureMacros.h] changes macros @SET_PROF_HDR()@ and 
+  @SET_STATIC_PROF_HDR()@.
+\item[Updates.h] changes macro @PUSH_STD_CCCS()@.
+\end{description}
+
+@\rts@ directory:
+
+\begin{description}
+\item[Exception.hc] changes constant @CATCH_FRAME_BITMAP@, 
+\item[StgStartup.hc] changes constant @STOP_THREAD_BITMAP@.
+\item[StgStdThunks.hc] changes constant @RET_BITMAP@.
+\item[Updates.hc] changes constant @UPD_FRAME_BITMAP@.
+\item[RetainerProfile.c] implements the retainer profiling engine.
+\item[RetainerProfile.h] is the header for @RetainerProfile.c@.
+\item[RetainerSet.c] implements the abstract datatype @retainerSet@.
+\item[RetainerSet.h] defines the interface for @retainerSet@.
+\item[GC.c] invokes @resetStaticObjectForRetainerProfiling()@ in 
+  @GarbageCollect()@.
+\item[Itimer.c] changes @handle_tick()@.
+\item[ProfHeap.c] changes @initHeapProfiling()@ and @endHeapProfiling()@.
+\item[Profiling.c] changes @initProfilingLogFile()@ and 
+  @report_ccs_profiling()@.
+\item[Proftimer.c] declares @ticks_to_retainer_profiling@, 
+  @performRetainerProfiling@, and @doContextSwitch@.
+\item[Proftimer.h] is the header for @Proftimer.c@. Defines @PROFILING_MIN_PERIOD@,
+  which specifies the minimum profiling period and the default profiling period.
+%\item[RtsAPI.c] implements @setRetainerField()@.
+\item[RtsFlags.c] 
+  sets @RtsFlags.ProfFlags.doHeapProfile@ and
+  adds a string to  @usage_text[]@ in @setupRtsFlags()@.      
+\item[RtsFlags.h] defines constants @HEAP_BY_RETAINER@ and @RETAINERchar@.
+\item[RtsStartup.c] includes the header file @RetainerProfile.h@.
+  Changes @shutdownHaskell()@.
+\item[Schedule.c] changes @schedule()@. 
+\item[Stats.c] 
+  declares @RP_start_time@, @RP_tot_time@, @RPe_start_time@, 
+  @RPe_tot_time@.
+  Changes @mut_user_time_during_GC()@, @mut_user_time()@, 
+  @stat_startExit()@, 
+  @stat_endExit()@, and
+  @stat_exit()@.
+  Defines
+  @mut_user_time_during_RP()@, 
+  @stat_startRP()@, and
+  @stat_endRP()@.
+\item[Stats.h] is the header for @Stats.c@.
+\item[StgMiscClosures.hc] redefines @stg_DEAD_WEAK_info@.
+\item[Storage.c] changes @initStorage()@, @memInventory()@.
+\end{description}
+
+\bibliographystyle{plain}
+\bibliography{reference}
+
+\end{document}