X-Git-Url: http://git.megacz.com/?p=ghc-hetmet.git;a=blobdiff_plain;f=docs%2Fstorage-mgt%2Fldv.tex;fp=docs%2Fstorage-mgt%2Fldv.tex;h=936407c7012d4c6c049f09547497be90f3532c7d;hp=0000000000000000000000000000000000000000;hb=0065d5ab628975892cea1ec7303f968c3338cbe1;hpb=28a464a75e14cece5db40f2765a29348273ff2d2 diff --git a/docs/storage-mgt/ldv.tex b/docs/storage-mgt/ldv.tex new file mode 100644 index 0000000..936407c --- /dev/null +++ b/docs/storage-mgt/ldv.tex @@ -0,0 +1,695 @@ +\documentclass{article} +\usepackage{code,a4wide} + +\usepackage{graphics,epsfig,epic,eepic,epsfig} + +\setlength{\parskip}{0.25cm} +\setlength{\parsep}{0.25cm} +\setlength{\topsep}{0cm} +\setlength{\parindent}{0cm} +\renewcommand{\textfraction}{0.2} +\renewcommand{\floatpagefraction}{0.7} + + +% Terminology +\newcommand{\block}{block} +\newcommand{\Block}{Block} +\newcommand{\segment}{segment} +\newcommand{\Segment}{Segment} +\newcommand{\step}{step} +\newcommand{\Step}{Step} + +\newcommand{\note}[1]{{\em $\spadesuit$ #1}} + +\begin{document} +\title{Implementation of Lag/Drag/Void/Use Profiling} +\author{Sungwoo Park \\ Simon Marlow} + +\makeatactive +\maketitle + +\section{Lag/Drag/Void/Use Profiling} + +\emph{Lag/Drag/Void/Use} (LDVU) profiling~\cite{RR} is a profiling technique +which yields a summary of the biography of all the dynamic closures created +during program execution. +In this profiling scheme, +the biography of a closure is determined by four important events associated +with the closure: \emph{creation}, \emph{first use}, +\emph{last use}, and \emph{destruction} (see Figure~\ref{fig-ldv}). +The intervals between these successive events correspond to three phases +for the closure: \emph{lag} (between creation and first use), +\emph{use} (between first use and last use), and +\emph{drag} (between last use and destruction). +If the closure is never used, it is considered to remain in the \emph{void} +phase all its lifetime. + +\begin{figure}[ht] +\begin{center} +\input{ldv.eepic} +\caption{The biography of a closure} +\label{fig-ldv} +\end{center} +\end{figure} + +The LDVU profiler regularly performs heap censuses during program execution. +Each time a heap census is performed, the LDVU profiler increments a global +time, which is used for timing all the events (such as creation and destruction +of a closure) occurring during program execution. +Hence, for instance, all closures creating between two successive heap censuses +have the same creation time and belong to the same \emph{generation}.\footnote{In +this document, a generation is related with heap censuses, not garbage collections +as in other profiling schemes.} +After the program terminates, it yields a post-mortem report on how much +of the \emph{live} graph is in one of the four phases at the moment of each +heap census. + +It must be emphasized that the LDVU profiler considers only live closures; +it should not take into consideration dead closures which do not constitute +the graph. Therefore, the result of LDVU profiling does not depend on the +frequency of garbage collections. + +This document describes the implementation of LDVU profiling on the Glasgow +Haskell Compiler runtime system.\footnote{Unless otherwise noted, all identifiers +are defined in @LdvProfile.c@}. + +\section{An Overview of the Implementation} + +Every closure is augmented with an additional word in its profiling header +to accommodate three additional pieces of information: +1) state flag indicating whether the closure has been used at least once or not. +2) creation time; 3) time of most recent use if any so far. +We refer to such a word as an LDV word. + +The LDVU profiler maintains a global time, stored in @ldvTime@. +It is incremented each time a heap census is performed. +During a heap census, the profiler scans all live closures and computes the +following: +1) the total size of all closures which have never been used; +2) the total size of all closures which have been used at least once +in the past.\footnote{There is another category of closures, namely, +\emph{inherently used} closures. We will explain +in Section~\ref{sec-heap-censuses}.} +It is not until the whole program execution finishes that the profiler +can actually decide the total size corresponding to each of the four phases for +a particular heap census. It is only when a closure is destroyed that the profiler +can determine how long the closure has been in a specific phase. +Therefore, it is not sufficient to perform heap censuses periodically in order to +compute the profiling statistics: the runtime system needs to intercept +all events associated with any closures and update necessary information. + +All events associated with closures are handled by one of the three +macros defined +in @includes/StgLdv.h@: @LDV_recordCreate()@, @LDV_recordUse()@, and +@LDV_recordDead()@. + +\begin{itemize} +\item{@LDV_recordCreate()@} is called when a closure is created and updates its +creation time field. + +\item{@LDV_recordUse()@} is called when a closure is used and updates its most recent +use time field. + +\item{@LDV_recordDead()@} is called when a closure @c@ is removed from the graph. +It does not update its LDV word (because @c@ is about to be destroyed). +Instead, it updates the statistics on LDVU profiling according to the following +observation: +if @c@ has never been used (which is indicated by the state flag in its LDV +word), +@c@ contributes to the void phase from its creation time to the last census +time; if @c@ was used at least once (which is also indicated by the state flag), +@c@ contributes to the @drag@ phase after its last use time. +\end{itemize} + +At the end of the program execution, the profiler performs a last census during +which all closures in the heap are declared to be dead and @LDV_recordDead()@ +is invoked on each of them. +Then, the profiler computes the final statistics. + +\section{LDV Words} + +We choose to share the LDV word for both retainer profiling and LDVU +profiling, which cannot take place simultaneously. +This is the reason why there is a +union structure inside the @StgProHeader@ structure. +The field @hp.ldvw@ in the @StgProfHeader@ structure corresponds to the LDV +word: +\begin{code} +typedef struct { + ... + union { + retainerSet *rs; // Retainer Set + StgWord ldvw; // Lag/Drag/Void Word + } hp; +} StgProfHeader; +\end{code} +For instance, the LDV word of a closure @c@ can now be accessed with +@c->header.prof.hp.ldvw@ (or by @LDVW(c)@ where @LDVW()@ is a macro in +@includes/StgLdvProf.h@). + +An LDV word is divided into three fields, whose position is specified +by three constants in @includes/StgLdvProf.h@: +\begin{itemize} +\item{@LDV_STATE_MASK@} corresponds to the state flag. +\item{@LDV_CREATE_MASK@} corresponds to the creation time. +\item{@LDV_LAST_MASK@} corresponds to the most recent use time. +\end{itemize} +The constant @LDV_SHIFT@ specifies how many bits are allocated for +creation time or most recent use time. +For instance, the creation time of a closure @c@ can be obtained by +@(LDVW(c) & LDV_CREATE_MASK) >> LDV_SHIFT@. + +The creation time field and the most recent use time field can be set only by the +macros @LDV_recordCreate()@ and @LDV_recordUse()@. +@LDV_recordCreate()@ must be called whenever a new dynamic closure is created, +and this is handily accomplished by rewriting the macro @SET_PROF_HDR()@ +(in @includes/ClosureMacros.h@) (we do not need to change @SET_STATIC_PROF_HDR()@ +because static closures are not involved in LDVU profiling at all): + +\begin{code} +#define SET_PROF_HDR(c,ccs_) \ + ((c)->header.prof.ccs = ccs_, \ + LDV_recordCreate((c))) +\end{code} + +There are a few cases in which the info table of a closure changes through +an explicit invocation of @SET_INFO()@ or a direct assignment to its @header.info@ +field: 1) an indirection closure is replaced by an old-generation +indirection closure; 2) a thunk is replaced by a blackhole; 3) a thunk is replaced +by an indirection closure when its evaluation result becomes available. + +\emph{We regard such a situation as +the destruction of an old closure followed by the creation of a new closure +at the same memory address.}\footnote{This would be unnecessary if the two closures +are of the same size, but it is not always the case. We choose to distinguish +the two closures for the sake of consistency.} +For instance, when an @IND_PERM@ closure is replaced by an @IND_OLDGEN_PERM@ +closures (during scavenging in @GC.c@), we wrap the invocation of @SET_INFO()@ with +the invocations of @LDV_recordDead()@ and @LDV_recordCreate()@ as follows +(@LDV_recordDead()@ requires the actual size of the closures being destroyed): + +\begin{code} + LDV_recordDead((StgClosure *)p, sizeofW(StgInd) - sizeofW(StgProfHeader)); + SET_INFO(((StgClosure *)p), &stg_IND_OLDGEN_PERM_info); + LDV_recordCreate((StgClosure *)p); +\end{code} + +\textbf{To do:} +A direct assignment to the @header.info@ field implies that its cost centre +field is not initialized. This is no problem in the case of @EVACUATED@ closures +because they will +not be used again after a garbage collection. However, I am not sure if this is safe +for @BLACKHOLE_BQ@ closures (in @StgMiscClosures.hc@) when retainer profiling, +which employs cost centre stacks, is going on. +If it is safe, please leave a comment there. + +@LDV_recordUse()@ is called on a closure whenever it is used, or \emph{entered}. +Its state flag changes if necessary to indicate that it has been used, and +the current global time is stored in its last use time field. + +\section{Global Time \texttt{ldvTime} and Retainer Profiling} + +The global time, stored in @ldvTime@, records the current time period. +It is initialized to $1$ and incremented after each time a heap census +is completed through an invocation of @LdvCensus()@. Note that each +value of @ldvTime@ represents a time \emph{period}, not a point in +time. + +All closures created between two successive invocations of +@LdvCensus()@ have the same creation time. If a closure is used at +least once between two successive heap censuses, we consider the +closure to be in the use phase during the corresponding time period +(because we just set its last use time field to the current value of +@ldvTime@ whenever it is used). Notice that a closure with a creation +time $t_c$ may be destroyed before the actual heap census for time +$t_c$ and thus may \emph{not} be observed during the heap census for +time $t_c$. Such a closure does not show up in the profile at all. + +In addition, the value of @ldvTime@ indicates which of LDVU profiling +and retainer profiling is currently active: during LDVU profiling, it +is initialized to $1$ in @initLdvProfiling()@ and then increments as +LDVU profiling proceeds; during retainer profiling, however, it is +always fixed to $0$. Thus, wherever a piece of code shared by both +retainer profiling and LDVU profiling comes to play, we usually need +to first examine the value of @ldvTime@ if necessary. For instance, +consider the macro @LDV_recordUse()@: + +\begin{code} +#define LDV_recordUse(c) \ + if (ldvTime > 0) \ + LDVW((c)) = (LDVW((c)) & LDV_CREATE_MASK) | ldvTime | LDV_STATE_USE; +\end{code} + +If retainer profiling is being performed, @ldvTime@ is equal to $0$, +and @LDV_recordUse()@ causes no side effect.\footnote{Due to this +interference with LDVU profiling, retainer profiling slows down a bit; +for instance, checking @ldvTime@ against $0$ in the above example +would always evaluate to @rtsFalse@ during retainer profiling. +However, this is the price to be paid for our decision not to employ a +separate field for LDVU profiling.} + +As another example, consider @LDV_recordCreate()@: + +\begin{code} +#define LDV_recordCreate(c) \ + LDVW((c)) = (ldvTime << LDV_SHIFT) | LDV_STATE_CREATE +\end{code} + +The above definition of @LDV_recordCreate()@ works without any problem +even for retainer profiling: during retainer profiling, +a retainer set field (@hp.ldvw@) must be initialized to a null pointer. +Since @ldvTime@ is fixed to $0$, @LDV_recordCreate()@ initializes +retainer set fields correctly. + +\section{Heap Censuses} +\label{sec-heap-censuses} + +The LDVU profiler performs heap censuses periodically by invoking the +function @LdvCensus()@. Because we need to know exactly which +closures in the heap are live at census time, we always precede the +census with a major garbage collection. + +During a census, we examine each closure one by one and compute the +following three quantities: + +\begin{enumerate} +\item the total size of all \emph{inherently used} closures. +\item the total size of all closures which have not been used (yet). +\item the total size of all closures which have been used at least once. +\end{enumerate} + +For most closures, a \emph{use} consists of entering the closure. For +unlifted objects which are never entered (eg. @ARR_WORDS@), it would +be difficult to determine their points of use because such points are +scattered around the implementation in various primitive operations. +For this reason we consider all unlifted objects as ``inherently +used''. The following types of closures are considered to be +inherently used: @TSO@, @MVAR@, @MUT_ARR_PTRS@, @MUT_ARR_PTRS_FROZEN@, +@ARR_WORDS@, @WEAK@, @MUT_VAR@, @MUT_CONS@, @FOREIGN@, @BCO@, and +@STABLE_NAME@. + +The three quantities are stored in an @LdvGenInfo@ array @gi[]@. +@gi[]@ is indexed by time period. For instance, @gi[ldvTime]@ stores +the three quantaties for the current global time period. The +structure @LdvGenInfo@ is defined as follows: + +\begin{code} +typedef struct { + ... + int inherentlyUsed; // total size of 'inherently used' closures + int notUsed; // total size of 'not used yet' closures + int used; // total size of 'used at least once' closures + ... +} LdvGenInfo; +\end{code} + +The above three quantities account for mutually exclusive sets of closures. +In other words, if a closure is not inherently used, it belongs to +either the second or the third. + +\subsection{Taking a Census of the Live Heap} + +During a heap census, we need to visit every live closure once, so we +perform a linear scan of the live heap after a major GC. We can take +advantage of the following facts to implement a linear scan for heap +censuses: + +\begin{itemize} +\item The nursery is empty. The small object pool and the large object pool, + however, may \emph{not} be empty. This is because the garbage collector + invokes @scheduleFinalizer()@ after removing dead closures, and + @scheduleFinalizer()@ may create new closures through @allocate()@. +\item @IND@, @IND_OLDGEN@, and @EVACUATED@ closures do not appear in +the live heap. +\end{itemize} + +There is one small complication when traversing the live heap: the +garbage collector may have replaced @WEAK@ objects with @DEAD_WEAK@ +objects, which have a smaller size and hence leave some space before +the next object. To avoid this problem we change the size of +@DEAD_WEAK@ objects to match that of @WEAK@ objects when profiling is +enabled (see @StgMiscClosures.hc@). + +\section{Destruction of Closures} + +In order to compute the total size of closures for each of the four +phases, we must report the destruction of every closure (except +inherently used closures) to the LDVU profiler by invoking +@LDV_recordDead()@. @LDV_recordDead()@ must not be called on any +inherently used closure because any invocation of @LDV_recordDead()@ +affects the statistics regarding void and drag phases, which no +inherently used closure can be in. + +@LDV_recordDead()@ updates two fields @voidNew@ and @dragNew@ in the +@LdvGenInfo@ array @gi[]@: + +\begin{code} +typedef struct { + ... + int voidNew; + int dragnew; + ... +} LdvGenInfo; +\end{code} + +@gi[ldvTime].voidNew@ accumulates the size of all closures satisfying +the following two conditions: 1) observed during the heap census at +time @ldvTime@; 2) now known to have been in the void phase at time +@ldvTime@. It is updated when a closure which has never been used is +destroyed. Suppose that a closure @c@ which has never been used is +about to be destroyed. If its creation time is $t_c$, we judge that +@c@ has been in the void phase all its lifetime, namely, from time +$t_c$ to @ldvTime@. Since @c@ will not be observed during the next +heap census, which corresponds to time @ldvTime@, @c@ contributes to +the void phase of times $t_c$ through @ldvTime@ - 1. Therefore, we +increase the @voidNew@ field of @gi[@$t_c$@]@ through @gi[ldvTime - 1]@ + by the size of @c@.\footnote{In the actual implementation, we +update @gi[$t_c$]@ and @gi[ldvTime]@ (not @gi[ldvTime@$ - $1@]@) only: +@gi[$t_c$]@ and @gi[ldvTime]@ are increased and decreased by the size +of @c@, respectively. After finishing the program execution, we can +correctly adjust all the fields as follows: @gi[$t_c$]@ is computed as +$\sum_{i=0}^{t_c}$@gi[$i$]@. } + +@gi[ldvTime].dragNew@ accumulates the size of all closures satisfying the following +two conditions: 1) observed during the heap census at time @ldvTime@; +2) now known to have been in the drag phase at time @ldvTime@. +It is updated when a closure which has been used at least once is destroyed. +Suppose that a closure @c@ which has been used last at time $t_l$ is about to +be destroyed. +We judge that @c@ has been in the drag phase from time $t_l + 1$ to +time @ldvTime@$ - 1$ (if $t_l + 1 > $@ldvTime@$ - 1$, nothing happens). +Therefore, we increase the @dragNew@ field of @gi[@$t_l + 1$@]@ through +@gi[ldvTime@$ - 1$@]@ +by the size of @c@.\footnote{As in the case of @voidNew@, we update +@gi[@$t_l + 1$@]@ and @gi[ldvTime]@ only.} + +Now we need to find out all the cases of closure destruction. +There are four cases in which a closure is destroyed: + +\begin{enumerate} +\item A closure is overwritten with a blackhole: + @UPD_BH_UPDATABLE()@ and @UPD_BH_SINGLE_ENTRY()@ in @includes/StgMacros.h@, + @threadLazyBlackHole()@ and @threadSqueezeStack()@ in @GC.c@, + the entry code for @BLACKHOLE@ closures in @StgMiscClosures.hc@ (a + @BLACKHOLE@ closure is changed into a @BLACKHOLE_BQ@ closure). + We call either @LDV_recordDead()@ or @LDV_recordDead_FILL_SLOP_DYNAMIC()@. + +\item A weak pointer is overwritten with a dead weak pointer: + @finalizzeWeakzh_fast()@ in @PrimOps.hc@, + @finalizeWeakPointersNow()@ and @scheduleFinalizers()@ in @Weak.c@. + Since a weak pointer is inherently used, we do not call @LDV_recordDead()@. + +\item A closure is overwritten with an indirection closure: + @updateWithIndirection()@ and @updateWithPermIndirection()@ in @Storage.h@, + @scavenge()@ in @GC.c@, in which an @IND_PERM@ closure is explicitly replaced + with an @IND_OLDGEN_PERM@ closure during scavenging. + We call either @LDV_recordDead()@ or @LDV_recordDead_FILL_SLOP_DYNAMIC()@. + +\item Closures are removed permanently from the graph during garbage +collections. We locate and dispose of all dead closures by linearly +scanning the from-space right before tidying up. This is feasible +because any closures which is about to be removed from the graph still +remains in the from-space until tidying up is completed. The next +subsection explains how to implement this idea. +\end{enumerate} + +\subsection{Linear scan of the from-space during garbage collections} + +In order to implement linear scan of the from-space during a garbage collection +(before tidying up), +we need to take into consideration the following facts: + +\begin{itemize} +\item The pointer @free@ of a block in the nursery may incorrectly point to +a byte past its actual boundary. +This happens because +the Haskell mutator first increases @hpLim@ without comparing it with the +actual boundary when allocating fresh memory for a new closure. +@hpLim@ is later assigned to the pointer @free@ of the corresponding memory +block, which means that during a heap census, the pointer @hpLim@ may not +be trusted. +Notice that @hpLim@ is not available during LDVU profiling; it is valid +only during the Haskell mutator time. + +\item The from-space may well contain a good number of @EVACUATED@ closures, +and they must be skipped over. + +\item The from-space includes the nursery. +Furthermore, a closure in the nursery may not necessarily be adjacent to the next +closure because slop words may lie between the two closures; +the Haskell mutator may allocate more space than actually needed in the +nursery when creating a closure, potentially leaving slop words. +\end{itemize} + +The first problem is easily solved by limiting the scan up to the +actual block boundary for each nursery block (see +@processNurseryForDead()@). In other words, for a nursery block +descriptor @bd@, whichever of @bd->start@$ + $@BLOCK_SIZE_W@ and +@bd->free@ is smaller is used as the actual boundary. + +We solve the second problem by exploiting LDV words of @EVACUATED@ +closures: we store the size of an evacuated closure, which now resides +in the to-space, in the LDV word of the new @EVACUATED@ closure +occupying its memory. This is easily implemented by inserting a call +to the macro @SET_EVACUAEE_FOR_LDV()@ in @copy()@ and @copyPart()@ (in +@GC.c@). Thus, when we encounter an @EVACUATED@ closure while +linearly scanning the nursery, we can skip a correct number of words +by referring to its LDV word. + +The third problem could be partially solved by always monitoring @Hp@ +during the Haskell mutator time: whenever @Hp@ is increased, we fill +with zeroes as many words as the change of @HP@. Then, we could skip +any trailing zero words when linearly scanning the nursery. +Alternatively we could initialize the entire nursery with zeroes after +each garbage collection and not worry about any change made to @Hp@ +during the Haskell mutator time. The number of zero words to be +written to the nursery could be reduced in the first approach, for we +do not have to fill the header for a new closure. Nevertheless we +choose to employ the second approach because it simplifies the +implementation code significantly (see @resetNurseries()@ in +@Storage.c@). Moreover, the second approach compensates for its +redundant initialization cost by providing faster execution (due to a +single memory write loop in contrast to frequent memory write loops in +the first approach). Also, we attribute the initialization cost to +the runtime system and thus the Haskell mutator behavior is little +affected. + +There is further complication though: occasionally a closure is +overwritten with a closure of a smaller size, leaving some slop +between itself and the next closure in the heap. There are two cases: + +\begin{enumerate} +\item A closure is overwritten with a blackhole. +\item A closure is overwritten with an indirection closure. +\end{enumerate} + +In either case, an existing closure is destroyed after being replaced +with a new closure. If the two closures are of the same size, no slop +words are introduced and we only need to invoke @LDV_recordDead()@ on +the existing closure, which cannot be an inherently used closure. If +not, that is, the new closure is smaller than the existing closure +(the opposite cannot happen), we need to fill one or more slop words +with zeroes as well as invoke @LDV_recordDead()@ on the existing +closure. The macro @LDV_recordDead_FILL_SLOP_DYNAMIC()@ accomplishes +these two tasks: it determines the size of the existing closure, +invokes @LDV_recordDead()@, and fills the slop words with zeroes. +After excluding all cases in which the two closures are of the same +size, we invoke @LDV_recordDead_FILL_SLOP_DYNAMIC()@ only from: + +\begin{enumerate} +\item @threadLazyBlackHole()@ and @threadSqueezeStack()@ in @GC.c@ +(for lazy blackholing), +\item @UPD_BH_UPDATABLE()@ and @UPD_BH_SINGLE_ENTRY()@ in +@includes/StgMacros.h@ (for eager blackholing, which isn't the +default), +\item @updateWithIndirection()@ and @updateWithPermIndirection()@ +in @Storage.h@.\footnote{Actually slop words created in +@updateWithIndirection()@ cannot survive major garbage collections. +Still we invoke @LDV\_recordDead\_FILL\_SLOP\_DYNAMIC()@ to support linear +scan of the heap during a garbage collection, which is discussed in the next +section.} +\end{enumerate} + +The linear scan of the from-space is initiated by the garbage +collector. From the function @LdvCensusForDead()@, every dead closure +in the from-space is visited through an invocation of +@processHeapClosureForDead()@. + +\subsection{Final scan of the heap} + +Since a closure surviving the final garbage collection is implicitly destroyed +when the runtime system shuts down, we must invoke @processHeapClosureForDead@ +on \emph{every} closure in the heap once more after the final garbage collection. +The function @LdvCensusKillAll()@, which is invoked from @shutdownHaskell()@ +in @RtsStartup.c@, traverses the entire heap and visits each closure. +It also stops LDVU profiling by resetting @ldvTime@ to $0$. + +It may be that after LDVU profiling stops, new closures may be created +and even garbage collections may be performed. +We choose to ignore these closures because they are all concerned about +finalizing weak pointers (in @finalizeWeakPointersNow()@). +It can be catastrophic to invoke @LdvCensusKillAll()@ after finishing +@finalizeWeakPointersNow()@: @finalizeWeakPointersNow()@ calls +@rts_evalIO()@, which is essentially initiating a new program execution, +and no assumptions made upon LDVU profiling hold any longer. + +\section{Time of Use} + +In order to yield correct LDVU profiling results, we must make sure +that @LDV_recordUse()@ be called on a closure whenever it is used; +otherwise, most of closures would be reported to be in the void phase. +@includes/StgLdvProf.h@ provides an entry macro @LDV_ENTER@ which +expands to @LDV_recordUse()@. The compiler arranges to invoke +@LDV_ENTER@ in the entry code for every dynamic closure it generates +code for (constructors, thunks and functions). We also have to add +@LDV_ENTER@ calls to the closures statically compiled into the RTS: +@PAP@s, @AP_UPD@s, standard thunk forms (in @StgStdThunks.hc@, and +several others in @StgMiscClosures.hc@. + +\section{Computing Final Statistics} + +After the final scan of the heap, we can accurately determine the total +size of closures in one of the four phases at the moment of each heap census. +The structure @LdvGenInfo@ is augmented with two additional fields +@voidTotal@ and @dragTotal@: + +\begin{code} +typedef struct { + ... + int voidTotal; + int dragTotal; + ... +} LdvGenInfo; +\end{code} + +@gi[@$i$@].voidTotal@ and @gi[@$i$@].dragTotal@ are computed +from @gi[@$i$@].voidNew@ and @gi[@$i$@].dragNew@, respectively.\footnote{Due +to a slight optimization described before, @gi[@$i$@].voidTotal@ is actually +computed as $\sum_{1 \leq j \leq i}$@gi[@$j$@].voidNew@. +@gi[@$i$@].dragTotal@ is computed in a similar way.} +Then, the total size of closures in the lag phase @gi[@$i$@].lagTotal@ is computed +as @gi[@$i$@].notUsed@$-$@gi[@$i$@].voidTotal@ (because any unused closure +is either in the void phase or in the lag phase). +Similarly, +the total size of closures in the use phase @gi[@$i$@].useTotal@ is computed +as @gi[@$i$@].used@$-$@gi[@$i$@].dragTotal@ (because any used closure +is either in the use phase or in the drag phase). +@endLdvProfiling()@, called from @endHeapProfiling@ in @ProfHeap.c@, computes these +final statistics. + +\section{Usage} + +The runtime system option @-hL@ tells the executable program to +perform LDVU profiling and produce a @.hp@ file: + +\begin{code} +$ Foo.out +RTS -hL +\end{code} + +The option @-i@ can be used to +specify a desired interval at which LDVU profiling is performed. +The default and minimum value is half a second: + +\begin{code} +$ Foo.out +RTS -hL -i2.5 -RTS +\end{code} + +The @.hp@ file can be supplied to the @hp2ps@ program to create a postscript +file showing the progress of LDVU profiling in a graph: + +\begin{code} +$ hp2ps Foo.hs +$ gv Foo.ps +\end{code} + +The horizontal axis of the graph is in the Haskell mutator time, which excludes +the runtime system time such as garbage collection time and LDVU profiling +time. +The Haskell mutator runs a bit slower than it would without performing +LDVU profiling, but the difference is minute. +Also, the timer employed to periodically perform retainer profiling +is not perfectly accurate. Therefore, the result may slightly vary for each +execution of retainer profiling. + +\textbf{To do:} Currently the LDVU profiling is not supported with @-G1@ option. + +\textbf{To do:} When we perform LDVU profiling, the Haskell mutator time seems to +be affected by @-S@ or @-s@ runtime option. For instance, the following +two options should result in nearly same profiling outputs, but +the second run (without @-Sstderr@ option) spends almost twice as +long in the Haskell mutator as the first run: +1) @+RTS -Sstderr -hL -RTS@; 2) @+RTS -hL -RTS@. +This is quite a subtle bug because this wierd phenomenon is not +observed in retainer profiling, yet the implementation of +@mut_user_time_during_LDV()@ is completely analogous to that of +@mut_user_time_during_RP()@. The overall shapes of the resultant graphs +are almost the same, though. + +\section{Files} + +This section gives a summary of changes made to the GHC in +implementing LDVU profiling. +Only three files (@includes/StgLdvProf.h@, @LdvProfile.c@, and +@LdvProfile.h@) are new, and all others exist in the GHC. + +@\includes@ directory: + +\begin{description} +\item[StgLdvProf.h] defines type @LdvGenInfo@, constants, and macros related +with LDVU profiling. +\item[ClosureMacros.h] changes macro @SET_PROF_HDR()@. +\item[Stg.h] includes th header file @StgLdvProf.h@. +\item[StgMacros.h] changes macros @UPD_BH_UPDATABLE()@ and @UPD_BH_SINGLE_ENTRY()@. +\end{description} + +@\rts@ directory: + +\begin{description} +\item[GC.c] invokes @LdvCensusForDead()@ before tidying up, sets @hasBeenAnyGC@ to + @rtsTrue@, and changes @copy()@ and @copyPart()@. + Invokes @LDV_recordDead()@ and @LDV_recordDead_FILL_SLOP_DYNAMIC()@. +\item[Itimer.c] changes @handle_tick()@. +\item[LdvProfile.c] implements the LDVU profiling engine. +\item[LdvProfile.h] is the header for @LdvProfile.c@. +\item[PrimOps.hc] changes @finalizzeWeakzh_fast()@. +\item[ProfHeap.c] changes @initHeapProfiling()@ and @endHeapProfiling()@. +\item[Profiling.c] changes @initProfilingLogFile@ and @report_ccs_profiling()@. +\item[Proftimer.c] declares @ticks_to_retainer_ldv_profiling@, + @performRetainerLdvProfiling@, and @doContextSwitch@. +\item[Proftimer.h] is the header for @Proftimer.c@. Defines @PROFILING_MIN_PERIOD@, + which specifies the minimum profiling period and the default profiling period. +%\item[RtsAPI.c] implements @setProfileHeader()@. +\item[RtsFlags.c] + sets @RtsFlags.ProfFlags.doHeapProfile@, + adds a string to @usage_text[]@ in @setupRtsFlags()@. +\item[RtsFlags.h] defines constants @HEAP_BY_LDV@ and @LDVchar@. +\item[RtsStartup.c] changes @shutDownHaskell()@. +\item[Schedule.c] changes @schedule()@. +\item[Stats.c] + declares @LDV_start_time@, @LDV_tot_time@, @LDVe_start_time@, + @LDVe_tot_time@. + Changes @mut_user_time_during_GC()@, @mut_user_time()@, + @stat_startExit()@, + @stat_endExit()@, and + @stat_exit()@. + Defines + @mut_user_time_during_LDV()@, + @stat_startLDV()@, and + @stat_endLDV()@. +\item[Stats.h] is hte header for @Stats.c@. +\item[StgMiscClosures.hc] inserts entry macros in + @stg_IND_entry()@, @stg_IND_PERM_entry()@, @stg_IND_OLDGEN_entry()@, + @stg_IND_OLDGEN_PERM_entry()@, @stg_BLACKHOLE_entry()@, @stg_BLACKHOLE_BQ_entry()@, + and @stg_CAF_BLACKHOLE_entry()@. + Invokes @LDV_recordDead()@ in @stg_BLACKHOLE_entry@. + Redefines @stg_DEAD_WEAK_info@. +\item[Storage.c] changes @initStorage()@, @resetNurseries()@, and @allocNursery()@. +\item[Storage.h] changes @updateWithIndirection()@ and @updateWithPermIndirection()@. +\item[Updates.hc] inserts entry macros in @stg_PAP_entry()@ and @stg_AP_UPD_entry()@. +\item[Weak.c] changes @scheduleFinalizers()@. +\end{description} + +\bibliographystyle{plain} +\bibliography{reference} + +\end{document}