X-Git-Url: http://git.megacz.com/?a=blobdiff_plain;f=ghc%2Fdocs%2Fstorage-mgt%2Frp.tex;fp=ghc%2Fdocs%2Fstorage-mgt%2Frp.tex;h=0000000000000000000000000000000000000000;hb=0065d5ab628975892cea1ec7303f968c3338cbe1;hp=2055894282fab4f759885725abfa47ebe4f05887;hpb=28a464a75e14cece5db40f2765a29348273ff2d2;p=ghc-hetmet.git diff --git a/ghc/docs/storage-mgt/rp.tex b/ghc/docs/storage-mgt/rp.tex deleted file mode 100644 index 2055894..0000000 --- a/ghc/docs/storage-mgt/rp.tex +++ /dev/null @@ -1,1102 +0,0 @@ -\documentclass{article} -\usepackage{code,a4wide} - -\usepackage{graphics,epsfig,epic,eepic,epsfig} - -\setlength{\parskip}{0.25cm} -\setlength{\parsep}{0.25cm} -\setlength{\topsep}{0cm} -\setlength{\parindent}{0cm} -\renewcommand{\textfraction}{0.2} -\renewcommand{\floatpagefraction}{0.7} - - -% Terminology -\newcommand{\block}{block} -\newcommand{\Block}{Block} -\newcommand{\segment}{segment} -\newcommand{\Segment}{Segment} -\newcommand{\step}{step} -\newcommand{\Step}{Step} - -\newcommand{\note}[1]{{\em $\spadesuit$ #1}} - -\begin{document} -\title{Implementation of Retainer Profiling} -\author{Sungwoo Park and Simon Peyton-Jones} - -\makeatactive -\maketitle - -\section{Retainer Profiling} - -Retainer profiling~\cite{CN} is a profiling technique which is based upon a -special view of production and consumption of heap objects at runtime: -while producers build heap objects to form new pieces of graph, -consumers hold pointers to these heap objects, or \emph{retain} them, so -that they are not freed during garbage collections. -On this basis, we refereed to such consumers as \emph{retainers}. -Notice that an object can have more than one retainer because it can -be pointed to by multiple objects. - -For each live object in the heap, retainer profiling computes -all its retainers, or its \emph{retainer set}. -A naive implementation of retainer profiling could consider every -immediate ancestor of an object as its retainer. -Although this approach appears to provide an accurate report on the -relationship between retainers and retainees, the result can hardly be useful. -For instance, it is pointless to scrutinize a list and treat each cons -cell as a retainer of the following cons cell. -This observation suggests that we need to have a way of designating only -certain types of objects as candidates for retainers. -In other words, we need to employ an oracle which tells whether a given -object can be a retainer or not. - -Since no retainer of a particular object needs to be using the -object actively, we can find all its retainers simply by traversing -the graph. In other words, we do not have to distinguish those retainers -actively exploiting it from other retainers just holding pointers -to it (either directly or indirectly). -Thus, retainer profiling can be accomplished simply by traversing the -graph. - -Figure~\ref{fig-retaineralgorithm} shows the algorithm for retainer -profiling. The traversal begins at every root, and proceeds -in a depth first manner (or a breadth first manner). -The function @R()@ returns the \emph{identity} of a given retainer such -as its information table address or -the name of the module which creates it. -Notice that the retainer identity function does not need to be a -one-to-one mapping: -multiple objects can share the same identity. -Such a retainer identity function reduces the cost of traversal. -For instance, when an object -is reached from several retainers which share the same identity, we need to -consider only the first visit to the object. -In other words, whichever retainer (among those sharing the same identity) -leads to the object for the first time affects the retainer set of the object -in consideration -and all the other retainers can be ignored. -Thus, the more coarse the function @R()@ is, the less -it costs to traverse the graph for retainer profiling. -The function @isRetainer()@ tells whether a given object is a retainer or not. -Hence, the behavior of the retainer profiling algorithm is parameterized -over: 1) the set of roots; 2) the function @R()@; 3) the function -@isRetainer()@. - -One important invariant on the function @R()@ is that its return value -must be consistent for a given retainer. In other words, @R()@ must return -the same value for a given retainer no matter it is invoked. -For this reason, the memory address of a retainer, for instance, cannot be used as -its retainer identity because its location may change during garbage collections. - -\begin{figure}[ht] -\begin{center} -\begin{code} -for every root r - retain(r, r) - -R(r) = - the identity of r - -isRetainer(c) = - if c is a retainer then - true - else - false - -retain(c, r) = - if R(r) is a member of c.retainerSet then - return - add R(r) to c.retainerSet - if isRetainer(c) then - r' := c - else - r' := r - for every successor c' of c - retain(c', r') -\end{code} -\caption{Retainer profiling algorithm} -\label{fig-retaineralgorithm} -\end{center} -\end{figure} - -Another way of formulating retainer profiling is in terms of fixed point -equations on retainer sets. -To be specific, given the two functions @isRetainer()@ and @R()@, -the retainer set of every object is computed as the least fixed point -solution of the following equations: -\begin{itemize} -\item For every root @r@, -\begin{center} - @R(r)@ $\in$ @r.retainerSet@. -\end{center} -\item For every reachable object @c@, -\begin{center} -$\bigcup_{\mathit{each\ ancestor\ @a@\ of\ @c@}}$ @from(a)@ $\subseteq$ -@c.retainerSet@ -\end{center} -where @from(a)@ returns retainer(s) obtainable from @a@: -\begin{center} -@from(a)@ = if @isRetainer(a)@ then $\{@a@\}$ else @a.retainerSet@. -\end{center} -\end{itemize} - -This document describes the implementation of retainer profiling on -the Glasgow Haskell Compiler runtime system. -It explains every detail of the development so that it can be (hopefully) -a complete maintenance guide. -A secondary goal is to help (hopefully) those who wish to extend the system -to implement another profiling scheme.\footnote{Unless otherwise mentioned, -all identifiers are defined in @RetainerProfile.c@ or @RetainerSet.c@.} - -\section{Installing the GHC} - -Installing the GHC is done as follows: - -\begin{enumerate} -\item Get the source code from the CVS repository. -\begin{code} -./ cvs checkout fpconfig -./fptools/ cvs update -d CONTRIB common-rts distrib docs ghc glafp-utils - hslibs literate mhms mk nofib testsuite -\end{code} - -\item Set up the basic configuration. -\begin{code} -./fptools/ autoconf -./fptools/ghc/ autoconf -./fptools/ configure -\end{code} - -\item Set up the configuration for development and debugging. -\begin{code} -./fptools/mk vi build.mk - GhcHcOpts = -O -fasm -Rghc-timing - SplitObjs = NO - GhcRtsHcOpts = - GhcRtsCcOpts = -g - STRIP =: -\end{code} -@GhcLibWays@ tells the compiler to build the code for profiling as well. -@GhcRtsHcOpts@ has additional flags for @gcc@ when compiling @.hc@ files. -@GhcRtsCcOpts@ has additional flags for @gcc@ when compiling @.c@ files. -Since we will implement retainer profiling in @.c@ files, we turn on the -debugging flag @-g@. -The empty setting for @STRIP@ tells the compiler not to remove source code -information (generated due to the @-g@ option) from executable files so that -they can be examined with @gdb@. - -\item Remove unnecessary files if needed and build everything. -\begin{code} -./fptools/ make -\end{code} -\end{enumerate} - -\section{Adding Retainer Set Fields} - -Since every Haskell closure now needs to store its retainer set at runtime, -it must be augmented with a new field, -namely, a \emph{retainer set field}. -This section explains how to add such a field to Haskell closures. -It should be clear how to generalize the idea for adding -any number of new fields.\footnote{The GHC provides two -ways of building executable programs from -source files: normal way and profiling way. -We are concerned only about profiling way, and all the pieces of code -implementing profiling way are wrapped by the @PROFILING@ -pre-processing directive (as in @\#ifdef PROFILING@). -Therefore, all the additions and changes that we make to the source code -are assumed to be wrapped by the @PROFILING@ pre-processing -directive as well unless otherwise mentioned.} - -\subsection{Adding a new field to Haskell closures} - -We want to add a retainer set field of type @retainerSet@ to every -closure, so we create a new file @includes/StgRetainerProf.h@ where -we define the type @retainerSet@. -The actual definition of @retainerSet@ will be given later. - -\begin{code} -/* includes/StgRetainerProf.h */ -typedef ... retainerSet; -\end{code} - -We make type @retainerSet@ to be publicly available by including -@includes/StgRetainerProf.h@ itself to @includes/Stg.h@ (not wrapped -by @PROFILING@). - -\begin{code} -/* includes/Stg.h */ -#include "StgRetainerProf.h" -\end{code} - -Then we add a retainer set field @rs@ to the @StgProfHeader@ structure. - -\begin{code} -/* include/Closures.h */ -typedef struct { - CostCentreStack *ccs; - retainerSet *rs; -} StgProfHeader; -\end{code} - -Now every closure @c@ (including static closures) has a retainer set field, -which can be accessed with @c->header.prof.rs@ (where @c@ denotes the -address of the closure). - -\subsection{Changing constants} - -We are ready to reflect the new size of Haskell closures to other part -of the source code. -This is accomplished by changing a few constants which specify the size -of certain types of closures and their layout. - -When building the runtime system, the @gcc@ compiler correctly figures out -the size of every structure on its own. -However, -GHC simply reads @includes/Constants.h@ to to determine the size of -closures assumed by the runtime system. -Thus, we must change the constants used by the GHC itself (as opposed to -the runtime system). They are all found in @includes/Constants.h@. -We increase each of them by 1 to reflect the retainer set field which is one -word long: -\begin{code} -/* includes/Constants.h */ -#define PROF_HDR_SIZE 2 -#define SCC_UF_SIZE 5 -#define SCC_SEQ_FRAME_SIZE 4 -\end{code} -@PROF_HDR_SIZE@ denotes the size of the structure @StgProfHeader@, which -is now two words long. -@SCC_UF_SIZE@ and @SCC_SEQ_FRAME_SIZE@ denote the size of the structures -@StgUpdateFrame@ and @StgSeqFrame@ (in @includes/Closures.h@) in -words. - -Now we must rebuild the GHC so that, when executed, the code generated by -the GHC must now allocate one more word for the retainer set field than before. - -\begin{code} -./fptools/ghc/ make boot -./fptools/ghc/ make -\end{code} - -The second command @make boot@ instructs the build system to analyze -the source code dependency so that the next execution of @make@ correctly -finds all required files. - -Next we change four bitmap constants which specify the layout of -certain types of closures. -As an example, let us consider @RET_BITMAP@, which specifies the layout -of thunk selectors (corresponding to closure type @THUNK_SELECTOR@). -Without a retainer set field, there is only one non-pointer (represented -by $1$) followed by one or more pointers (represented by $0$) in the closure -body and the bitmap representation is $0b1$, or $1$. -With a retainer set field, which is not a pointer to another closure and thus -represented by $1$, there are two non-pointers, and the bitmap representation -is $0b11$, or $3$. Notice that the bitmap is interpreted in reverse order: -the least significant bit corresponds to the first word in the closure body, -and the second least significant bit to the second word, and so forth. -The same rule applies to the other three bitmap constants: -@CATCH_FRAME_BITMAP@ (for closure type @CATCH_FRAME@ and structure -@StgCatchFrame@), -@STOP_THREAD_BITMAP@ (for closure type @STOP_FRAME@ and structure -@StgStopFrame@), and -@UPD_FRAME_BITMAP@ (for closure type @UPDATE_FRAME@ and structure -@StgUpdateFrame@). - -\begin{code} -/* rts/StgStdThunks.hc */ -#define RET_BITMAP 3 -/* rts/Exception.hc */ -#define CATCH_FRAME_BITMAP 15 -/* rts/StgStartup.hc */ -#define STOP_THREAD_BITMAP 3 -/* rts/updates.hc */ -#define UPD_FRAME_BITMAP 7 -\end{code} - -For most closure types, the new definition of @StgProfHeader@ is -automatically propagated to their corresponding structures. -However, there are six closures types which are not affected by -@StgProfHeader@. They are all stack closures: -@RET_DYN@, @RET_BCO@, @RET_SMALL@, @RET_VEC_SMALL@, @RET_BIG@, and -@RET_VEC_BIG@. -If you want a new field to be added to these closures, you may -have to modify their corresponding structures. - -\textbf{To do:} Presently the above changes introduce two bug in the -runtime system. -First, @nofib/real/symalg@ ends up with a division-by-zero -exception if we add a new field. -Second, the runtime system option @-auto-all@ clashes in some test files -in the @nofib@ testing suite (e.g., @spectral/expert@). - -\subsection{Initialization code} - -When a new closure is allocated, its retainer set field may have to be -initialized according to the way that retainer profiling is implemented. -For instance, we could use as an initial value a pointer to an empty retainer -set. -Alternatively we could assign a null pointer to indicate that its retainer -set has not been computed yet, which we adopt in our implementation. -In either case, we have to visit the new closure and execute some initialization -code on it so that its retainer set field is set to an appropriate value. - -There are three parts in the source code which need to be modified. -dynamic closure initialization, static closure initialization, -and update frame initialization. -The first is accomplished by modifying the macro @SET_PROF_HDR()@ (in -@include/ClosureMacros.h@). When a closure @c@ is created at runtime, -@SET_PROF_HDR()@ is invoked immediately with @c@ as its first argument. -Thus, the following code initializes the retainer set field of every -dynamic closure to a null pointer. - -\begin{code} -/* include/ClosureMacros.h */ -#define SET_PROF_HDR(c,ccs_) \ - ((c)->header.prof.ccs = ccs_, (c)->header.prof.rs = NULL) -\end{code} - -Similarly, the macro @SET_STATIC_PROF_HDR()@ (in the -same file) specifies how the retainer set field of every static closure -is initialized, which is rewritten as follows: - -\begin{code} -/* include/ClosureMacros.h */ -#define SET_STATIC_PROF_HDR(ccs_) \ - prof : { ccs : ccs_, rs : NULL }, -\end{code} - -\textbf{Obsolete:} Dynamic closures created through explicit C function invocations -(in @RtsAPI.c@) are now initialized by @SET_HDR()@. - -%There is another way of creating dynamic closures through explicit C -%function invocations at runtime. -%Such functions are all defined in @RtsAPI.c@: @rts_mkChar()@, @rts_mkInt()@, -%@rts_mkWord()@, and so forth. -%Each function allocates memory for a new closure, -%initializes it, and returns its address. -%Therefore, we can simply insert in each function another initialization code -%for retainer set fields. -%To this end, we define a macro @setRetainerField()@ and insert it -%in each function: -% -%\begin{code} -%#define setRetainerField(p) \ -% (p)->header.prof.rs = NULL -%\end{code} -% -%For instance, @rts_mkChar()@ is now defined as follows: -% -%\begin{code} -%/* RtsAPI.c */ -%HaskellObj -%rts_mkChar (HsChar c) -%{ -% StgClosure *p = (StgClosure *)allocate(CONSTR_sizeW(0,1)); -% ... -% setRetainerField(p); -% return p; -%} -%\end{code} - -Finally we may need to initialize the retainer set field of an update frame -(stack closure) when it is pushed onto the stack for the first time. -For instance, if we want to initialize the retainer set field of update -frames to a null pointer, we can rewrite the macro @PUSH_STD_CCCS()@ -(in @includes/Updates.h@) as follows: - -\begin{code} -/* includes/Updates.h */ -#define PUSH_STD_CCCS(frame) \ - (frame->header.prof.ccs = CCCS, frame->header.prof.rs = NULL) -\end{code} - -In our implementation of retainer profiling, however, the retainer set field is not -used for any stack closure. -Hence, the above modification is entirely unnecessary. -Also, update frames are the only exception to the standard way of creating -stack closures: all the other types of stack closures with a retainer set -field are eventually initialized by -the macro @SET\_HDR()@ (in @includes/ClosureMacros.h@), which in turn -invokes @SET\_PROF\_HDR()@. This is not the case for update frames. -Compare @PUSH\_UPD\_FRAME()@ (in @includes/Updates.h@) and -@PUSH\_SEQ\_FRAME()@ (in @includes/StgMacros.h@) for clarification. - -\section{Retainer Sets} - -At the end of retainer profiling, every live closure (except stack -closures, for which we do not compute retainer sets) is associated with -a retainer set; there can be no closure without an associated retainer set -because every live closure is visited during traversal. -Since many closures may well be associated with a common retainer set, -we want to avoid creating any particular retainer set more than once. -This section presents the details of manipulating retainer sets in our -implementation. - -\subsection{Identity of a retainer} - -The function @R()@ in Figure~\ref{fig-retaineralgorithm} returns -the identity of a retainer. In order to implement it, we need -a type for retainer identity. -The type @retainer@ (in @includes/StgRetainerProf.h@) is introduced for -this purpose. - -There are various ways of defining the type @retainer@. -For instance, we can designate the information table address of a retainer as -its identity as follows: - -\begin{code} -struct _StgInfoTable; -typedef struct _StgInfoTable *retainer; -\end{code} - -We can also use the cost centre stack associated with the retainer: - -\begin{code} -typedef CostCentreStack *retainer; -\end{code} - -The function @R()@ is embodied as the function @getRetainerFrom()@ in the -implementation, whose type is @(StgClosure *)@ $\rightarrow$ @retainer@. -It is straightforward to define @getRetainerFrom()@ according to the definition -of @retainer@, as illustrated below: - -\begin{code} -retainer getRetainerFrom(StgClosure *c) { return get_itbl(c); } -retainer getRetainerFrom(StgClosure *c) { return c->header.prof.ccs; } -\end{code} - -\subsection{Retainer sets and the cost function} - -A retainer set is stored in the structure @retainerSet@ -(in @includes/StgRetainerProf.h@): - -\begin{code} -typedef struct _retainerSet { - nat num; - nat cost; - ... - int id; - retainer element[0]; -} retainerSet; -\end{code} - -The field @num@ gives the number of retainers in the retainer set, which -are all stored in the array @element[]@. Thus, the size of @element[]@ -is assumed to be @num@. -The field @cost@ gives the sum of the \emph{costs} of those closures -associated with the retainer set: if a closure @c@ is -associated with the retainer set, that is, if @c@ is retained by each -retainer in the retainer set and none else, -the cost of @c@ is added to the field @cost@. -The field @id@ gives a unique identification number for the retainer set. - -The interface to @retainerSet@ is as follows -(see @RetainerSet.h@): - -\begin{description} -\item[@void initializeAllRetainerSet(void)@] initializes the store for retainer sets. -\item[@void refreshAllRetainerSet(void)@] refreshes each retainer set by setting -its @cost@ field to zero. This function does destroy any retainer set. -\item[@void closeAllRetainerSet(void)@] destroys all retainer sets and closes -the store for retainer sets. -\item[@retainerSet *singleton(retainer r)@] returns a singleton retainer set -consisting of @r@ alone. If such a retainer set already exists, no new retainer -set is created. Otherwise, a new retainer set is created. -\item[@retainerSet *addElement(retainer r, retainerSet *rs)@] returns a retainer set -@rs@ augmented with @r@. If such a retainer set already exists, no new retainer set -is created. Otherwise, a new retainer set is created. -\item[@rtsBool isMember(retainer r, retainerSet *rs)@] returns a boolean value -indicating whether @r@ is a member of @rs@. -\item[@void traverseAllRetainerSet(void (*f)(retainerSet *))@] invokes the function -@f@ on every retainer set created. -\item[@void printRetainerSetShort(FILE *, retainerSet *)@] prints a single retainer -set. -\item[@void outputRetainerSet(FILE *, nat *allCost, nat *numSet)@] prints all -retainer sets. Stores the sum of all their costs in @*allCost@ and the number -of them in @*numSet@. -\item[@void outputAllRetainerSet(FILE *)@] prints all retainer sets. -\end{description} - -We also define a \emph{cost function}, which returns the cost of a given closure, -in order to compute the field @cost@. -The cost function can be defined in several ways. -A typical definition is on the size of a closure, which results in -the field @cost@ accumulating the size of all the closures retained by a -retainer set. -If we just want to count the number of closures retained by the -retainer set, we can simply set the cost of every closure to one regardless -of its closure type. -Furthermore, we can define the cost function flexibly according to -the closure type. -For instance, we can set the size of any static closure to zero so that -it is not taken into account at all in computing the field @cost@. -Notice that static closures are also visited during traversal because they -may lead to other dynamic closures (e.g., static indirection closures of -closure type @IND_STATIC@). -This is especially desirable because we usually focus on the heap use. -We can also selectively choose certain dynamic closure types not to contribute -to the field @cost@. - -In our implementation, there are two functions related with the cost function: -@cost()@ and @costPure()@. -@cost()@ returns the size of the entire memory allocated for a given closure -(even including the two fields in the structure @StgProfHeader@). -It returns zero for static closures. -@costPure()@ returns the size of the memory which would be allocated for -a given closure with no profiling. -It is defined in terms of @cost()@, and it suffices to change only @cost()@ -when a new scheme for the cost function is desired. -@costPure()@ is put to actual use in computing the field @cost@ because it -effectively hides the memory overhead incurred by profiling. - -\subsection{Implementation} - -The algorithm for retainer profiling in Figure~\ref{fig-retaineralgorithm} -adds at most one retainer to an existing retainer set (or an empty retainer set) -at any moment; it does not require a retainer set union operation. -This observation simplifies the implementation, and -we employ the following two functions for creating new retainer sets: -@singleton()@, which creates a singleton retainer set, and -@addElement()@, which adds an element to an existing retainer set. - -It is a frequent operation during retainer profiling to search for a retainer -set, which may or may not exist, built from a given retainer set and a -particular retainer. -To efficiently implement this operation, -we choose to store all retainer sets in a hash table and -the structure @retainerSet@ is now extended with two new fields -@hashKey@ and @link@. -The field @hashKey@ stores the hash key which is obtained -from the retainers in a retainer set. -The field @link@ points to the next retainer set in the same bucket: - -\begin{code} -typedef struct _retainerSet { - ... - StgWord hashKey; - struct _retainerSet *link; - ... -} retainerSet; -\end{code} - -The hashing function must be defined in such a way that a retainer set -can have only one unique hash key regardless of the order its elements -are stored, i.e., the hashing function must be additive. - -It is often observed that two successive executions of retainer profiling share -a number of retainer sets in common, especially if the two executions are -close in time. -This also implies that the number of all retainer sets which can be created -at any moment does not grow indefinitely regardless of the interval at which -retainer profiling is performed; it does not grow commensurately with the -number of times retainer profiling is executed. -This observation eliminates the need to free the memory allocated for -retainer sets; we can simply set the @cost@ field of every retainer set -to zero after each retainer profiling and reuse it during the next time. - -\section{Graph Traversal} - -At the heart of retainer profiling lies \emph{graph traversal}; -the algorithm in Figure~\ref{fig-retaineralgorithm} is supposed to visit -every closure in the graph at least once and yield statistics on the heap use. -Since only live closures are reachable from the root, the algorithm -does not deal with dead closures. - -This section presents details on how to achieve an efficient implementation of -graph traversal without incurring extra memory overhead and compromising speed. - -\subsection{Goal} - -Traversing a graph itself can be done in a straightforward way; -we choose either depth first search or breadth first search, and traverse -the graph starting from a given set of roots. -After a complete traversal, each live closure @c@ (including static closures) -has an associated retainer set, whose address is stored in the field -@c->header.prof.rs@. - -A real complication arises when retainer profiling is performed once again: -all live closures which have survived all garbage collections since -the previous retainer profiling -still have an associated retainer set (indicated by -a non-null pointer in their retainer set field), which is no longer -valid. Any new closure created since then has -a null pointer in its retainer set field at the beginning of retainer -profiling and will become associated with a retainer set. -Thus, we can no longer distinguish valid retainer set fields -from invalid ones. - -A simple remedy is to linearly scan the heap at the beginning of each -retainer profiling and set all retainer set fields to a null pointer. -It resets the retainer set field of each dynamic closure, whether it is -live or not with respect to the given set of root. -This is feasible because any closure in the heap directly adjoins the -next closure, if any. -The problem is that we have no way of visiting all live static closures, -for which we compute retainer sets. - -A moment of thought, however, reveals that we can completely avoid computing -retainer sets for static closures. This is because retainer profiling is -concerned only about the heap, which consists of dynamic closures and no -static closures. In other words, we can treat every static closure as -a bridge connecting two dynamic closures. -For instance, if a dynamic closure @c@$_1$ has a pointer to a static -closure @s@ and @c@ has a pointer to another dynamic closure @c@$_2$, -we can think of the pointer in @c@$_1$ as a direct pointer to @c@$_2$. -The big problem is that if the graph has a cycle containing static closures, -an infinite loop occurs. In other words, we have no way of telling whether -a static closure has been visited or not and are forced to compute -retainer sets for static closures as well.\footnote{For instance, -a static closure is allowed to have a self-reference in its SRT, which -is also followed during retainer profiling.} - -Another remedy is to stores in every closure a time stamp for the -retainer set field. The time stamp indicates whether the retainer -set field is valid or no longer valid (i.e., it is for the previous -retainer profiling). -At the cost of one extra field in each closure, we can achieve an -elegant implementation with little complication. -However, it turns out that the memory overhead is too big.\footnote{A typical -dynamic closure is only two or three words long.} -Thus, our goal is to stick to the definition of the structure @StgProfHeader@ -given earlier and yet to achieve an elegant solution. - -\subsection{Basic plan} - -Since we visit every live object and update its retainer set field, -any retainer set field can either be valid (the corresponding retainer -set is valid) or point to a retainer set created during the previous -retainer profiling. -In order to distinguish valid retainer set fields -from invalid ones, we exploit the least significant bit of the retainer -set field: we maintain a one bit mark which flips over every time -retainer profiling is performed, and judge that a retainer set field is -valid only if its least significant bit matches the mark. -The variable @flip@ serves for this purpose. -The macros @isRetainerSetFieldValid()@ tests if the retainer set field -of a give closure @c@ is valid: - -\begin{code} -#define isRetainerSetFieldValid(c) \ - ((((StgWord)(c)->header.prof.rs & 1) ^ flip) == 0) -\end{code} - -As an example, a retainer set field can be set to a null value conforming -the current value of @flip@ by the macro @setRetainerSetToNull()@: - -\begin{code} -#define setRetainerSetToNull(c) \ - (c)->header.prof.rs = (retainerSet *)((StgWord)NULL | flip) -\end{code} - -Now, when a dynamic closure @c@ is created, its retainer set field is -initialized to a null value conforming to the current value of -@flip@:\footnote{Actually this is not mandatory: even when the null -value does not conform to the current value of @flip@, it will be replaced -by a correct null value when @c@ is visited later.} - -\begin{code} -extern StgWord flip; -#define SET_PROF_HDR(c,ccs_) \ - ((c)->header.prof.ccs = ccs_, (c)->header.prof.rs = (retainerSet *)((StgWord)NULL | flip)) -\end{code} - -We do not need to revise @SET_STATIC_PROF_HDR()@ if the initial value of -@flip@ is set to $0$.\footnote{For the same reason, an initial value $1$ -does not compromise the correctness of the implementation.} - -\subsection{Set of roots} - -The set of roots consists of all thread closures (running, sleeping, or -blocked) existing at the beginning of a retainer profiling. -It is handily obtained in an indirect way by invoking the function -@GetRoots()@ (in @Schedule.c@) with an appropriate argument, which must be -a function: -@GetRoots()@ invokes on each root known to the runtime system its argument. -Thus, we implement a function @retainClosure()@, which initiates traversal -from a given root and updates the retainer set of every closure reachable -from the root, -and invokes @GetRoots()@ with @retainClosure@ as an argument. - -In addition to the thread closures, weak pointers are also considered -as roots; they may not be reachable from any thread closure yet are still -being in used. -A weak pointer has three pointer fields: @key@, @value@, and -@finalizer@ (see the structure @StgWeak@ in @includes/Closures.h@). -It turns out that these pointers may not be valid all the time: -at a certain point during execution, for instance, the pointer @key@ may point -to a dead closure. -However, right after a major garbage collection, all the three pointers are -guaranteed to be valid, i.e., they all point to live closures. -This facilitates the handling of weak pointers if we choose to -perform retainer profiling immediately after a major garbage collection. -All weak pointers are found in the linked list @weak_ptr_list@ -(in @Weak.c@). - -See the function @computeRetainerSet()@ for details. - -\subsection{Static closures} - -When a live dynamic closure @c@ is visited for the first time during traversal, -its retainer set field is checked against the current value of @flip@. -If it was created at some point since the previous retainer profiling, -its retainer set field is already set to a correct null value. -Otherwise, it must have been visited -during the previous retainer profiling and thus its retainer set field is -invalid and will be set to a correct null value. -Therefore it is unnecessary to visit all dynamic closures and set their -retainer set field to a correct null value at the beginning of each retainer -profiling. - -However, this operation is required for static closures. -The reason is that a static closure, which is never garbage collected, -may appear alternately in the set of live closures. -In other words, a currently live static closure may become dead and -be resuscitated again. -Therefore, for a static closure, it does not help to check if its -retainer set field conforms to the current value of @flip@. -For instance, -if a static closure happens to belong to the set of live closures every other -retainer profiling, its retainer set field will never set to a null value, -which is disastrous. -Therefore, we choose to visit all live static closures at the beginning -of each retainer profiling and set their retainer set field to a -correct null value. - -In order to find all live static closures, we have each retainer -profiling preceded by a major garbage collection, which knows all live -static closures.\footnote{This is a heavy -restriction on retainer profiling, which makes retainer profiling partially -dependent on garbage collection. -However, it does not affect any retainer profiling result because -retainer profiling considers only live closures, which survive any -garbage collection.} -To be specific, the garbage collector builds a linked list -@scavenged_static_objects@ (in @GC.c@) during a major garbage collection, -which stores all live static closures of our interest. -\footnote{ -A static closure of closure type @IND\_STATIC@ may be put in the -list @mut\_once\_list@ of the oldest generation, instead of the list -@scavenged\_static\_objects@. -In our implementation, such a closure is just skipped over because it -contains only a pointer to a dynamic closure, and we do not compute -its retainer set. -Thus, there is no need to traverse the list @mut\_once\_list@ of the oldest -generation.} -Since it destroys the linked list after finishing the major garbage collection -(by invoking the function @zero_static_object_list()@ with -@scavenged_static_objects@ as its argument), -we traverse the linked list to set the retainer set field of each -live static closure to a correct null value before its destruction. -This is done by invoking the function -@resetStaticObjectForRetainerProfiling()@. - -\textbf{To do:} In the current implemenation, if a static closure has no child -(e.g., @CONSTR_NOCAF_STATIC@, @THUNK_STATIC@ with an empty SRT, and -@FUN_STATIC@ with an empty SRT), we do not compute its retainer set (because -there is no need to do). This slight optimization allows us to render -retainer profiling no longer dependent on garbage collection due to the -following propoerty: - -\begin{center} -A static closure can alternately appear and disappear in the set of live -closures across multiple executions of retainer profiling if and only if -it has an empty SRT and no child. -\end{center} - -Then we can completely eliminate the function -@resetStaticObjectForRetainerProfiling()@. - -\subsection{Traversal} - -The traversal proceeds in a depth first manner and is implemented -with two mutually recursive functions: @retainStack()@ and @retainerClosure()@. -@retainerStack()@ can be invoked on dynamic closures holding a stack chunk: -closure types @TSO@, @PAP@, and @AP_UPD@. -It in turn invokes @retainerClosure()@ on each closure reachable from -stack closures in the stack chunk. Notice that it does not invoke -@retainerClosure()@ on those stack closures because we do not compute -retainer sets for stack closures. -@retainerClosure()@ iteratively traverses all live closures reachable -from a given closure. -It maintains its own stack to record the next scan position in every closure -currently under consideration.\footnote{A recursive version of -@retainerClosure()@ could be implemented easily. -@retainerClosure()@ in our implementation is an iterative version.} -When it encounters a closure holding a stack chunk, it invokes @retainerStack()@ -on that closure. -Hence, -the traversal is triggered simply by invoking @retainerClosure()@ on every root. - -\textbf{To do:} -The correctness of retainer profiling is subject to the correctness -of the two macros @IS_ARG_TAG()@ and @LOOKS_LIKE_GHC_INFO()@ -(see @retainStack()@). Since -@LOOKS_LIKE_GHC_INFO()@ is a bit precarious macro, so I believe that -the current implementation may not be quite safe. Also, @scavenge_stack()@ -in @GC.c@ also exploits this macro in order to identify shallow pointers. -This can be a serious problem if a stack chunk contains some -word which looks like a pointer but is actually not a pointer. - -\subsection{Sanity check} - -Since we assume that a retainer profiling is preceded by a major garbage -collection, -we expect that the size of all the dynamic closures visited during -any retainer profiling adds up exactly to the total size of the heap. -In fact, this is not the case; there can be closures not reachable from -the set of roots yet residing in the heap even after a major garbage -collection. - -First, a dead weak pointer remains in the heap until its finalizer -finishes. Although its finalizer thread closure is part of the set of roots, -the dead weak pointer itself is not reachable from any root. -Since it cannot be visited during retainer profiling anyhow, we do not -need to located it and set its retainer set field -appropriately.\footnote{Dead weak pointers are identified with their -information table @stg\_DEAD\_WEAK\_info@ (in @StgMiscClosures.hc@). -Notice that their closure type is @CONSTR@, \emph{not} @WEAK@; -their information table is replaced by @stg\_DEAD\_WEAK\_info@ in the -function @scheduleFinalizers()@ (in @GC.c@).} - -Second, -mutable variables (of closure type @MUT_VAR@) may remain in the heap -even when they are not reachable from the set of roots while -dynamic closures pointed to by them must be live.\footnote{I do not -understand clearly why this happens :(} -Since such mutable variables may become live again (in the sense that -they become reachable from the set of roots), we must locate them -and set their retainer set field appropriately after each retainer -profiling. This is handily accomplished by traversing the list -@mut_once_list@ in every generation. - -\section{Retainer Profiling Schemes} - -A retainer profiling scheme specifies \emph{what} retainer profiling -yields (as opposed to \emph{how} retainer profiling computes the retainer -set for every live object). -It is determined primarily by the meaning of retainer identity, -that is, the type @retainer@ (in @includes/StgRetainerProf.h@). -The function @getRetainerFrom()@ must be defined according to the -definition of the type @retainer@. - -In order for a new retain profiling scheme to fully work, we need to follow -four steps: - -\begin{enumerate} -\item Define the type @retainer@ as desired. -\item Write @getRetainerFrom()@. -\item Write two hashing functions @hashkeySingletone()@ and - @hashKeyAddElement()@, which return the hash key from a single - retainer and a retainer set with another retainer, respectively. -\item Write two printing functions @printRetainer()@ and - @printRetainerSetShort()@. - These functions are employed when a retainer or a retainer set is - printed in the output file. -\end{enumerate} - -In our implementation, we use cost centre stacks for retainer identity: - -\begin{code} -typedef CostCentreStack *retainer; -\end{code} -\begin{code} -retainer getRetainerFrom(StgClosure *c) { return c->header.prof.ccs; } -\end{code} -\begin{code} -void printRetainer(FILE *f, retainer cc) -{ - fprintf(f,"%s[%s]", cc->label, cc->module); -} -\end{code} - -\textbf{To do:} All the closures created by @rts_mk...()@ in @RtsAPI.c@ are given -@CCS_SYSTEM@ as their cost centre stacks. This may not be accurate indeed, -and, for instance, @CCCS@ may be a better choice than @CCS_SYSTEM@. - -\section{Usage} - -Since cost centre stacks are used as retainer identity, a source program -must be given proper cost centre annotations by programmers. -Alternatively, -we can ask the compiler to automatically insert cost centre annotations. -For instance, the compiler option @-auto-all@ inserts a cost centre -annotation around every top-level function as shown below -(the @-p@ option is a must -because we must build the executable file in a profiling way): - -\begin{code} -$ ghc-inplace -o Foo.out -p -auto-all Foo.hs -\end{code} - -The runtime system option @-hR@ tells the executable program to -gather profiling statistics and report them in a @.prof@ file: - -\begin{code} -$ Foo.out +RTS -hR -RTS -\end{code} - -The option @-i@ can be used to -specify a desired interval at which retainer profiling is performed. -The default and minimum value is half a second: - -\begin{code} -$ Foo.out +RTS -hR -i2.5 -RTS -\end{code} - -Then, two text files are generated: a @.prof@ file and a @.hp@ file. -The @.prof@ file records the progress of retainer profiling: -for each retainer profiling performed during program execution, -it shows -the Haskell mutator time (as opposed to the user time) at which -the retainer profiling starts, -the average number of times a closure is visited, -the sum of costs assigned to all retainer sets (obtained from the field -@cost@ in each retainer set), -and the number of all retainer sets created \emph{since} the beginning -of program execution. -A typical entry in a @.prof@ file looks like: - -\begin{code} -Retainer Profiling: 3, at 3.530000 seconds - Average number of visits per object = 1.687765 - Current total costs = 801844 - Number of retainer sets = 118 -\end{code} - -The sum of costs assigned to all retainer sets may \emph{not} be equal to the -size of the heap. -The discrepancy is attributed to those live object which are not reachable -from the set of roots. -Still it is a good estimate of the size of the heap at the moment when -the retainer profiling was performed. - -The @.prof@ file also shows the contents of every retainer set which -has been assigned a positive cost (i.e., the field @cost@) at least once; -not every retainer set created is assigned a positive cost because quite -a few retainer sets are created as intermediate retainer sets before -creating a real retainer set. This results from the restriction on the way -retainer sets are created (only one retainer can be added to an existing -retainer set at a time). - -An example of the contents of a retainer set is: - -\begin{code} -SET 71 = {, } -\end{code} - -The retainer set has an identification number $71$. -It is associated with two retainers, whose retainer identities are shown -inside angle brackets @<...>@. -For instance, the first retainer is created when the cost centre stack -is @doFile[Main],main[Main],MAIN[MAIN]@, shown from the top to the bottom. -Each entry in angle brackets consists of a cost centre name (e.g., @doFile@) -and its module name (e.g., @Main@). - -The @.hp@ file can be supplied to the @hp2ps@ program to create a postscript -file showing the progress of retainer profiling in a graph: - -\begin{code} -$ hp2ps Foo.hs -$ gv Foo.ps -\end{code} - -An example of such a graph is shown in Figure~\ref{fig-cacheprof}. -It shows the cost assigned to each retainer set at the point -when a retainer profiling is performed (marked by a corresponding inverted -triangles on the horizontal axis). -The abbreviated contents of each retainer set is displayed in the right column. -Due to the space limitation, -it shows only topmost cost centres (without module names) -instead of printing the full contents. -For instance, @(71)doFile,synth_2@ corresponds to a retainer set shown above -(@71@ is its identification number). -The contents may be truncated if it is too long. - -Notice that the time is in the Haskell mutator time, which excludes -the runtime system time such as garbage collection time and retainer profiling -time. Thus, the actual execution takes longer than indicated in the -graph. Also, the timer employed to periodically perform retainer profiling -is not perfectly accurate. Therefore, the result may slightly vary for each -execution of retainer profiling. - -\begin{figure}[ht] -\centering -\epsfig{file=cacheprof_p.eps,width=5in} -\caption{A graph showing the progress of retainer profiling} -\label{fig-cacheprof} -\end{figure} - -\section{Comparision with nhc} - -\section{Files} - -This section gives a summary of changes made to the GHC in -implementing retainer profiling. -Only three files (@includes/StgRetainerProf.h@, @RetainerProfile.c@, and -@RetainerProfile.h@) are new, and all others exist in the GHC. - -@\includes@ directory: - -\begin{description} -\item[StgRetainerProf.h] defines types @retainer@ and @retainerSet@. -\item[Stg.h] includes the header file @StgRetainerProf.h@. -\item[Closures.h] changes structure @StgProfHeader@. -\item[Constants.h] changes constants @PROF_HDR_SIZE@, @SCC_UF_SIZE@, and - @SCC_SEQ_FRAME_SIZE@. -\item[ClosureMacros.h] changes macros @SET_PROF_HDR()@ and - @SET_STATIC_PROF_HDR()@. -\item[Updates.h] changes macro @PUSH_STD_CCCS()@. -\end{description} - -@\rts@ directory: - -\begin{description} -\item[Exception.hc] changes constant @CATCH_FRAME_BITMAP@, -\item[StgStartup.hc] changes constant @STOP_THREAD_BITMAP@. -\item[StgStdThunks.hc] changes constant @RET_BITMAP@. -\item[Updates.hc] changes constant @UPD_FRAME_BITMAP@. -\item[RetainerProfile.c] implements the retainer profiling engine. -\item[RetainerProfile.h] is the header for @RetainerProfile.c@. -\item[RetainerSet.c] implements the abstract datatype @retainerSet@. -\item[RetainerSet.h] defines the interface for @retainerSet@. -\item[GC.c] invokes @resetStaticObjectForRetainerProfiling()@ in - @GarbageCollect()@. -\item[Itimer.c] changes @handle_tick()@. -\item[ProfHeap.c] changes @initHeapProfiling()@ and @endHeapProfiling()@. -\item[Profiling.c] changes @initProfilingLogFile()@ and - @report_ccs_profiling()@. -\item[Proftimer.c] declares @ticks_to_retainer_profiling@, - @performRetainerProfiling@, and @doContextSwitch@. -\item[Proftimer.h] is the header for @Proftimer.c@. Defines @PROFILING_MIN_PERIOD@, - which specifies the minimum profiling period and the default profiling period. -%\item[RtsAPI.c] implements @setRetainerField()@. -\item[RtsFlags.c] - sets @RtsFlags.ProfFlags.doHeapProfile@ and - adds a string to @usage_text[]@ in @setupRtsFlags()@. -\item[RtsFlags.h] defines constants @HEAP_BY_RETAINER@ and @RETAINERchar@. -\item[RtsStartup.c] includes the header file @RetainerProfile.h@. - Changes @shutdownHaskell()@. -\item[Schedule.c] changes @schedule()@. -\item[Stats.c] - declares @RP_start_time@, @RP_tot_time@, @RPe_start_time@, - @RPe_tot_time@. - Changes @mut_user_time_during_GC()@, @mut_user_time()@, - @stat_startExit()@, - @stat_endExit()@, and - @stat_exit()@. - Defines - @mut_user_time_during_RP()@, - @stat_startRP()@, and - @stat_endRP()@. -\item[Stats.h] is the header for @Stats.c@. -\item[StgMiscClosures.hc] redefines @stg_DEAD_WEAK_info@. -\item[Storage.c] changes @initStorage()@, @memInventory()@. -\end{description} - -\bibliographystyle{plain} -\bibliography{reference} - -\end{document}