From: simonmar <unknown>
Date: Thu, 22 Nov 2001 12:06:55 +0000 (+0000)
Subject: [project @ 2001-11-22 12:06:55 by simonmar]
X-Git-Tag: Approximately_9120_patches~548
X-Git-Url: http://git.megacz.com/?a=commitdiff_plain;h=8f877d355000d6a3304722cb49ec2c21e73e3a0e;p=ghc-hetmet.git

[project @ 2001-11-22 12:06:55 by simonmar]
Update this document.  Some of the implementation problems can be
solved in a cleaner way, and the document was confused about the
problems caused by slop in the heap (it's only a problem when
traversing the dead heap, not the live heap).
---

diff --git a/ghc/docs/storage-mgt/ldv.tex b/ghc/docs/storage-mgt/ldv.tex
index de1ba93..936407c 100644
--- a/ghc/docs/storage-mgt/ldv.tex
+++ b/ghc/docs/storage-mgt/ldv.tex
@@ -23,7 +23,7 @@
 
 \begin{document}
 \title{Implementation of Lag/Drag/Void/Use Profiling}
-\author{Sungwoo Park}
+\author{Sungwoo Park \\ Simon Marlow}
 
 \makeatactive
 \maketitle
@@ -209,33 +209,30 @@ the current global time is stored in its last use time field.
 
 \section{Global Time \texttt{ldvTime} and Retainer Profiling}
 
-The global time, stored in @ldvTime@, is used primarily to record the number
-of times heap censuses have been performed for LDV profiling. 
-It is initialized to $1$ and incremented each time a heap census is completed 
-through an invocation of @LdvCensus()@ (i.e., at the end of @LdvCensus()@).
-Its implication is that 
-all closures created between two successive invocations of @LdvCensus()@
-have the same creation time.
-Another implication is that 
-if a closure is used at least once between two successive heap 
-censuses, we consider the closure to be in the use phase 
-during the corresponding time period 
-(because we just set its last use time field to the current value
-of @ldvTime@ whenever it is used).
-Notice that a closure with a creation time $t_c$ may be destroyed before
-the major garbage collection before the actual heap census for time $t_c$ and thus 
-may \emph{not} be observed during the heap census for time $t_c$.
-Such a closure does not affect the profiling statistics at all.
-
-In addition, the global time @ldvTime@ indicates
-which of LDVU profiling and retainer profiling is currently active:
-during LDVU profiling, it is initialized to $1$ in @initLdvProfiling()@
-and then increments as LDVU profiling proceeds;
-during retainer profiling, however, it is always fixed to $0$.
-
-Thus, wherever a piece of code shared by both retainer profiling and
-LDVU profiling comes to play, we usually need to first examine the value of @ldvTime@
-if necessary. For instance, consider the macro @LDV_recordUse()@:
+The global time, stored in @ldvTime@, records the current time period.
+It is initialized to $1$ and incremented after each time a heap census
+is completed through an invocation of @LdvCensus()@.  Note that each
+value of @ldvTime@ represents a time \emph{period}, not a point in
+time.
+
+All closures created between two successive invocations of
+@LdvCensus()@ have the same creation time.  If a closure is used at
+least once between two successive heap censuses, we consider the
+closure to be in the use phase during the corresponding time period
+(because we just set its last use time field to the current value of
+@ldvTime@ whenever it is used).  Notice that a closure with a creation
+time $t_c$ may be destroyed before the actual heap census for time
+$t_c$ and thus may \emph{not} be observed during the heap census for
+time $t_c$.  Such a closure does not show up in the profile at all.
+
+In addition, the value of @ldvTime@ indicates which of LDVU profiling
+and retainer profiling is currently active: during LDVU profiling, it
+is initialized to $1$ in @initLdvProfiling()@ and then increments as
+LDVU profiling proceeds; during retainer profiling, however, it is
+always fixed to $0$.  Thus, wherever a piece of code shared by both
+retainer profiling and LDVU profiling comes to play, we usually need
+to first examine the value of @ldvTime@ if necessary. For instance,
+consider the macro @LDV_recordUse()@:
 
 \begin{code}
 #define LDV_recordUse(c)                              \
@@ -243,12 +240,12 @@ if necessary. For instance, consider the macro @LDV_recordUse()@:
     LDVW((c)) = (LDVW((c)) & LDV_CREATE_MASK) | ldvTime | LDV_STATE_USE; 
 \end{code}
 
-If retainer profiling is being performed, @ldvTime@ is equal to $0$, and
-@LDV_recordUse()@ causes no side effect.\footnote{Due to this interference
-with LDVU profiling, retainer profiling slows down a bit; for instance, 
-checking @ldvTime@ against $0$ in the above example would always evaluate to
-@rtsFalse@ during retainer profiling.
-However, this is the price to be paid for our decision not to employ a 
+If retainer profiling is being performed, @ldvTime@ is equal to $0$,
+and @LDV_recordUse()@ causes no side effect.\footnote{Due to this
+interference with LDVU profiling, retainer profiling slows down a bit;
+for instance, checking @ldvTime@ against $0$ in the above example
+would always evaluate to @rtsFalse@ during retainer profiling.
+However, this is the price to be paid for our decision not to employ a
 separate field for LDVU profiling.}
 
 As another example, consider @LDV_recordCreate()@:
@@ -268,51 +265,39 @@ retainer set fields correctly.
 \label{sec-heap-censuses}
 
 The LDVU profiler performs heap censuses periodically by invoking the
-function @LdvCensus()@
-(a heap census can take place at any random moment during program execution).
-Since LDVU profiling deals only with live closures, however, we choose to have
-a heap census preceded by a major garbage collection so that only live
-closures reside in the heap 
-(see @schedule()@ in @Schedule.c@).\footnote{It turns out that this
-choice does not considerably simplify the implementation; we need to
-finish a complete linear scan of the entire live heap at any rate. 
-As a side note, in~\cite{RR}, a heap census is \emph{followed} by a garbage
-collection, which is the opposite of our strategy.}
-This design choice is necessary for the LDVU profiling result not to be 
-affected by the amount of heap memory available to programs.
-Without a a major garbage collection performed before a heap census,
-the size of closures in the drag or void phases would heavily depend on
-the amount of available heap memory. 
-
-During a census, we examine each closure one by one and compute the following
-three quantities:
+function @LdvCensus()@.  Because we need to know exactly which
+closures in the heap are live at census time, we always precede the
+census with a major garbage collection.
+
+During a census, we examine each closure one by one and compute the
+following three quantities:
 
 \begin{enumerate}
 \item the total size of all \emph{inherently used} closures.
-\item the total size of all closures which have never been used.
+\item the total size of all closures which have not been used (yet).
 \item the total size of all closures which have been used at least once. 
 \end{enumerate}
 
-An inherently used closure is, by definition, one which is deemed to
-be in use even at the point of its birth. A closure which cannot be
-entered (e.g., @ARR_WORDS@) belongs to this category (otherwise
-such a closure would stay in the void phase all the time).
-In the current implementation, the following types of closures are 
-considered to be inherently used:
-@TSO@, @MVAR@, @MUT_ARR_PTRS@, @MUT_ARR_PTRS_FROZEN@, @ARR_WORDS@,
-@WEAK@, @MUT_VAR@, @MUT_CONS@, @FOREIGN@, @BCO@, and @STABLE_NAME@.
+For most closures, a \emph{use} consists of entering the closure.  For
+unlifted objects which are never entered (eg. @ARR_WORDS@), it would
+be difficult to determine their points of use because such points are
+scattered around the implementation in various primitive operations.
+For this reason we consider all unlifted objects as ``inherently
+used''.  The following types of closures are considered to be
+inherently used: @TSO@, @MVAR@, @MUT_ARR_PTRS@, @MUT_ARR_PTRS_FROZEN@,
+@ARR_WORDS@, @WEAK@, @MUT_VAR@, @MUT_CONS@, @FOREIGN@, @BCO@, and
+@STABLE_NAME@.
 
 The three quantities are stored in an @LdvGenInfo@ array @gi[]@.
-@gi[]@ is indexed by time. 
-For instance, @gi[ldvTime]@ stores the three quantaties for the current
-global time.
-The structure @LdvGenInfo@ is defined as follows:
+@gi[]@ is indexed by time period.  For instance, @gi[ldvTime]@ stores
+the three quantaties for the current global time period.  The
+structure @LdvGenInfo@ is defined as follows:
 
 \begin{code}
 typedef struct {
   ...
   int inherentlyUsed;   // total size of 'inherently used' closures
-  int notUsed;          // total size of 'never used' closures
+  int notUsed;          // total size of 'not used yet' closures
   int used;             // total size of 'used at least once' closures
   ...
 } LdvGenInfo;
@@ -322,104 +307,41 @@ The above three quantities account for mutually exclusive sets of closures.
 In other words, if a closure is not inherently used, it belongs to 
 either the second or the third.
 
-\subsection{Linear Scan of the Heap during Heap Censuses}
+\subsection{Taking a Census of the Live Heap}
 
-During a heap census, we need to visit every live closure once, and a linear
-scan of the heap is sufficient to our goal. 
-Since we assume that a major garbage collection cleans up the heap before
-any heap census, we can take advantage of the following facts to implement
-a linear scan for heap censuses:
+During a heap census, we need to visit every live closure once, so we
+perform a linear scan of the live heap after a major GC.  We can take
+advantage of the following facts to implement a linear scan for heap
+censuses:
 
 \begin{itemize}
 \item The nursery is empty. The small object pool and the large object pool, 
       however, may \emph{not} be empty. This is because the garbage collector
       invokes @scheduleFinalizer()@ after removing dead closures, and
       @scheduleFinalizer()@ may create new closures through @allocate()@.
-\item @IND@, @IND_OLDGEN@, and @EVACUATED@ closures do not appear in the heap.
+\item @IND@, @IND_OLDGEN@, and @EVACUATED@ closures do not appear in
+the live heap.
 \end{itemize}
 
-The implementation of the linear scan strategy is complicated by the
-following two facts. 
-First, either the small object pool (accessible from @small_alloc_list@) 
-or the large object pool (accessible from @g0s0->large_objects@) 
-may contain an initial evaluation stack, which is allocated via @allocate()@ in 
-@initModules()@ (in @RtsStartup.c@).
-The initial evaluation stack is heterogenous from any closure; it may not
-consist of consecutively allocated closures, which makes it
-hard to discern the initial evaluation stack among ordinary closures (unless
-we keep track of its position and size). 
-Second, a closure may not necessarily be adjacent to the next closure in the heap
-because there may appear \emph{slop words} between two closures.
-This happens when a closure is replaced by another closure with different 
-(smaller) size.
-
-We solve the first problem simply by postponing heap censuses until the first
-garbage collection, whether it is a major garbage collection or not.
-This simple idea works because the initial evaluation stack is removed from
-the heap during a garbage collection. The boolean variable @hasBeenAnyGC@
-is set to @rtsTrue@ the first time a garbage collection is performed, and
-hence @LdvCensus()@ returns immediately unless @hasBeenAnyGC@ has a @rtsTrue@
-value. 
-In practice, however, this premature return seldom happens: minor garbage 
-collections occur frequently enough that the first invocation of @LdvCensus()@ is
-preceded by a minor garbage collection in most cases.
-
-We solve the second problem by filling all possible slop words with zeroes.
-Then we can find the next closure after any give closure by skipping zeroes
-if any. First of all, we notice that slop words surviving a major garbage
-collection can be generated only in the following two cases:
-
-\begin{enumerate}
-\item A closure is overwritten with a blackhole. 
-\item A closure is overwritten with an indirection closure.
-%\item A weak pointer is overwritten with a dead weak pointer.
-\end{enumerate}
-
-In either case, an existing closure is destroyed after being replaced with a 
-new closure. 
-If the two closures are of the same size, no slop words are introduce and 
-we only need to invoke 
-@LDV_recordDead()@ on the existing closure, which cannot be an inherently used
-closure.
-If not, that is, the new closure is smaller than the existing closure 
-(the opposite cannot happen), we need to fill one or more slop words with zeroes
-as well as invoke @LDV_recordDead()@ on the existing closure.
-The macro @LDV_recordDead_FILL_SLOP_DYNAMIC()@ accomplishes these two tasks:
-it determines the size of the existing closure, invokes @LDV_recordDead()@, and
-fills the slop words with zeroes.
-After excluding all cases in which the two closures are of the same size,
-we invoke @LDV_recordDead_FILL_SLOP_DYNAMIC()@ only from:
-
-\begin{enumerate}
-\item @UPD_BH_UPDATABLE()@ and @UPD_BH_SINGLE_ENTRY()@ in @includes/StgMacros.h@, 
-@threadLazyBlackHole()@ and @threadSqueezeStack()@ in @GC.c@.
-\item @updateWithIndirection()@ and @updateWithPermIndirection()@ 
-in @Storage.h@.\footnote{Actually slop words created in 
-@updateWithIndirection()@ cannot survive major garbage collections.
-Still we invoke @LDV\_recordDead\_FILL\_SLOP\_DYNAMIC()@ to support linear
-scan of the heap during a garbage collection, which is discussed in the next
-section.}
-\end{enumerate}
-
-Notice that weak pointers being overwritten with dead weak pointers can be handled 
-properly without recourse to @LDV_recordDead_FILL_SLOP_DYNAMIC()@. The reason is 
-that dead weak pointers are of the same size as normal weak 
-pointers.\footnote{This is not the case without profiling. It was not the case 
-in the previous version of
-GHC, either. See the comments on @stg\_DEAD\_WEAK\_info@ in @StgMiscClosures.hc@.}
+There is one small complication when traversing the live heap: the
+garbage collector may have replaced @WEAK@ objects with @DEAD_WEAK@
+objects, which have a smaller size and hence leave some space before
+the next object.  To avoid this problem we change the size of
+@DEAD_WEAK@ objects to match that of @WEAK@ objects when profiling is
+enabled (see @StgMiscClosures.hc@).
 
 \section{Destruction of Closures}
 
-In order to compute the total size of closures for each of the four phases,
-we must report the destruction of every closure (except inherently used closures)
-to the LDVU profiler by invoking @LDV_recordDead()@.
-@LDV_recordDead()@ must not be called on any inherently used closure
-because any invocation of @LDV_recordDead()@ affects the
-statistics regarding void and drag phases, which no inherently used
-closure can be in.
+In order to compute the total size of closures for each of the four
+phases, we must report the destruction of every closure (except
+inherently used closures) to the LDVU profiler by invoking
+@LDV_recordDead()@.  @LDV_recordDead()@ must not be called on any
+inherently used closure because any invocation of @LDV_recordDead()@
+affects the statistics regarding void and drag phases, which no
+inherently used closure can be in.
 
-@LDV_recordDead()@ updates two fields @voidNew@ and @dragNew@ in the @LdvGenInfo@ 
-array @gi[]@:
+@LDV_recordDead()@ updates two fields @voidNew@ and @dragNew@ in the
+@LdvGenInfo@ array @gi[]@:
 
 \begin{code}
 typedef struct {
@@ -430,24 +352,23 @@ typedef struct {
 } LdvGenInfo;
 \end{code} 
 
-@gi[ldvTime].voidNew@ accumulates the size of all closures satisfying the following
-two conditions: 1) observed during the heap census at time @ldvTime@;
-2) now known to have been in the void phase at time @ldvTime@.
-It is updated when a closure which has never been used is destroyed.
-Suppose that a closure @c@ which has never been used is about to be destroyed.
-If its creation time is $t_c$, we judge that @c@ has been in the
-void phase all its lifetime, namely, from time $t_c$ to @ldvTime@.
-Since @c@ will not be observed during the next heap census, which corresponds
-to time @ldvTime@, @c@ contributes to the void phase of times $t_c$ through
-@ldvTime@ - 1.
-Therefore, we increase the @voidNew@ field of @gi[@$t_c$@]@ through @gi[ldvTime - 1]@
-by the size of @c@.\footnote{In the actual implementation, we update @gi[$t_c$]@ 
-and @gi[ldvTime]@ (not @gi[ldvTime@$ - $1@]@) only: @gi[$t_c$]@ and @gi[ldvTime]@ 
-are increased and decreased by the size of @c@, respectively. 
-After finishing the program execution, we can correctly adjust all the fields
-as follows:
-@gi[$t_c$]@ is computed as $\sum_{i=0}^{t_c}$@gi[$i$]@.
-}
+@gi[ldvTime].voidNew@ accumulates the size of all closures satisfying
+the following two conditions: 1) observed during the heap census at
+time @ldvTime@; 2) now known to have been in the void phase at time
+@ldvTime@.  It is updated when a closure which has never been used is
+destroyed.  Suppose that a closure @c@ which has never been used is
+about to be destroyed.  If its creation time is $t_c$, we judge that
+@c@ has been in the void phase all its lifetime, namely, from time
+$t_c$ to @ldvTime@.  Since @c@ will not be observed during the next
+heap census, which corresponds to time @ldvTime@, @c@ contributes to
+the void phase of times $t_c$ through @ldvTime@ - 1.  Therefore, we
+increase the @voidNew@ field of @gi[@$t_c$@]@ through @gi[ldvTime - 1]@
+ by the size of @c@.\footnote{In the actual implementation, we
+update @gi[$t_c$]@ and @gi[ldvTime]@ (not @gi[ldvTime@$ - $1@]@) only:
+@gi[$t_c$]@ and @gi[ldvTime]@ are increased and decreased by the size
+of @c@, respectively.  After finishing the program execution, we can
+correctly adjust all the fields as follows: @gi[$t_c$]@ is computed as
+$\sum_{i=0}^{t_c}$@gi[$i$]@.  }
 
 @gi[ldvTime].dragNew@ accumulates the size of all closures satisfying the following
 two conditions: 1) observed during the heap census at time @ldvTime@;
@@ -484,17 +405,14 @@ There are four cases in which a closure is destroyed:
   with an @IND_OLDGEN_PERM@ closure during scavenging.
   We call either @LDV_recordDead()@ or @LDV_recordDead_FILL_SLOP_DYNAMIC()@.
   
-\item Closures are removed permanently from the graph during garbage collections.
-We locate and dispose of all dead closures by 
-linearly scanning the from-space right before tidying up.
-This is feasible because any closures which is about to be removed from the graph
-still remains in the from-space until tidying up is completed.
-The next subsection explains how to implement this idea.
+\item Closures are removed permanently from the graph during garbage
+collections.  We locate and dispose of all dead closures by linearly
+scanning the from-space right before tidying up.  This is feasible
+because any closures which is about to be removed from the graph still
+remains in the from-space until tidying up is completed.  The next
+subsection explains how to implement this idea.
 \end{enumerate}
 
-\textbf{To do:} if okay, replace the invocation of @SET_HDR()@ with that of 
-@SET_INFO()@ and remove @LDV_recordCreate()@ in some of the above cases. 
-
 \subsection{Linear scan of the from-space during garbage collections}
 
 In order to implement linear scan of the from-space during a garbage collection 
@@ -502,12 +420,6 @@ In order to implement linear scan of the from-space during a garbage collection
 we need to take into consideration the following facts:
 
 \begin{itemize}
-\item The from-space includes the nursery. 
-Furthermore, a closure in the nursery may not necessarily be adjacent to the next 
-closure because slop words may lie between the two closures;
-the Haskell mutator may allocate more space than actually needed in the
-nursery when creating a closure, potentially leaving slop words. 
-
 \item The pointer @free@ of a block in the nursery may incorrectly point to
 a byte past its actual boundary.
 This happens because 
@@ -521,43 +433,87 @@ only during the Haskell mutator time.
 
 \item The from-space may well contain a good number of @EVACUATED@ closures,
 and they must be skipped over.
+
+\item The from-space includes the nursery. 
+Furthermore, a closure in the nursery may not necessarily be adjacent to the next 
+closure because slop words may lie between the two closures;
+the Haskell mutator may allocate more space than actually needed in the
+nursery when creating a closure, potentially leaving slop words. 
 \end{itemize}
 
-The first problem could be solved by always monitoring @Hp@ during the Haskell
-mutator time: whenever @Hp@ is increased, we fill with zeroes 
-as many words as the change of @HP@. Then, we could skip any trailing 
-zero words when linearly scanning the nursery.
+The first problem is easily solved by limiting the scan up to the
+actual block boundary for each nursery block (see
+@processNurseryForDead()@).  In other words, for a nursery block
+descriptor @bd@, whichever of @bd->start@$ + $@BLOCK_SIZE_W@ and
+@bd->free@ is smaller is used as the actual boundary.
+
+We solve the second problem by exploiting LDV words of @EVACUATED@
+closures: we store the size of an evacuated closure, which now resides
+in the to-space, in the LDV word of the new @EVACUATED@ closure
+occupying its memory.  This is easily implemented by inserting a call
+to the macro @SET_EVACUAEE_FOR_LDV()@ in @copy()@ and @copyPart()@ (in
+@GC.c@).  Thus, when we encounter an @EVACUATED@ closure while
+linearly scanning the nursery, we can skip a correct number of words
+by referring to its LDV word.
+
+The third problem could be partially solved by always monitoring @Hp@
+during the Haskell mutator time: whenever @Hp@ is increased, we fill
+with zeroes as many words as the change of @HP@. Then, we could skip
+any trailing zero words when linearly scanning the nursery.
 Alternatively we could initialize the entire nursery with zeroes after
-each garbage collection and not worry about any change made to @Hp@ during the
-Haskell mutator time. 
-The number of zero words to be written to the nursery could be reduced
-in the first approach, for we do not have to fill the header for a new closure.
-Nevertheless we choose to employ the second approach because
-it simplifies the implementation code significantly 
-(see @resetNurseries()@ in @Storage.c@). 
-Moreover, the second approach compensates for its redundant initialization cost
-by providing faster execution (due to a single memory write loop in contrast
-to frequent memory write loops in the first approach).
-Also, we attribute the initialization cost to the runtime system and thus 
-the Haskell mutator behavior is little affected. 
-
-The second problem is easily solved by limiting the scan up to the actual
-block boundary for each nursery block (see @processNurseryForDead()@).
-In other words, for a nursery block descriptor @bd@, 
-whichever of @bd->start@$ + $@BLOCK_SIZE_W@ and @bd->free@ is smaller
-is used as the actual boundary. 
-
-We solve the third problem by exploiting LDV words of @EVACUATED@ closures:
-we store the size of an evacuated closure, which now resides in the to-space,
-in the LDV word of the new @EVACUATED@ closure occupying its memory.
-This is easily implemented by inserting a call to the macro
-@SET_EVACUAEE_FOR_LDV()@ in @copy()@ and @copyPart()@ (in @GC.c@).
-Thus, when we encounter an @EVACUATED@ closure while linearly scanning the
-nursery, we can skip a correct number of words by referring to its LDV word.
-
-The linear scan of the from-space is initiated by the garbage collector.
-From the function @LdvCensusForDead()@, every dead closure in the from-space is
-visited through an invocation of @processHeapClosureForDead()@.
+each garbage collection and not worry about any change made to @Hp@
+during the Haskell mutator time.  The number of zero words to be
+written to the nursery could be reduced in the first approach, for we
+do not have to fill the header for a new closure.  Nevertheless we
+choose to employ the second approach because it simplifies the
+implementation code significantly (see @resetNurseries()@ in
+@Storage.c@).  Moreover, the second approach compensates for its
+redundant initialization cost by providing faster execution (due to a
+single memory write loop in contrast to frequent memory write loops in
+the first approach).  Also, we attribute the initialization cost to
+the runtime system and thus the Haskell mutator behavior is little
+affected.
+
+There is further complication though: occasionally a closure is
+overwritten with a closure of a smaller size, leaving some slop
+between itself and the next closure in the heap.  There are two cases:
+
+\begin{enumerate}
+\item A closure is overwritten with a blackhole. 
+\item A closure is overwritten with an indirection closure.
+\end{enumerate}
+
+In either case, an existing closure is destroyed after being replaced
+with a new closure.  If the two closures are of the same size, no slop
+words are introduced and we only need to invoke @LDV_recordDead()@ on
+the existing closure, which cannot be an inherently used closure.  If
+not, that is, the new closure is smaller than the existing closure
+(the opposite cannot happen), we need to fill one or more slop words
+with zeroes as well as invoke @LDV_recordDead()@ on the existing
+closure.  The macro @LDV_recordDead_FILL_SLOP_DYNAMIC()@ accomplishes
+these two tasks: it determines the size of the existing closure,
+invokes @LDV_recordDead()@, and fills the slop words with zeroes.
+After excluding all cases in which the two closures are of the same
+size, we invoke @LDV_recordDead_FILL_SLOP_DYNAMIC()@ only from:
+
+\begin{enumerate}
+\item @threadLazyBlackHole()@ and @threadSqueezeStack()@ in @GC.c@
+(for lazy blackholing),
+\item @UPD_BH_UPDATABLE()@ and @UPD_BH_SINGLE_ENTRY()@ in
+@includes/StgMacros.h@ (for eager blackholing, which isn't the
+default),
+\item @updateWithIndirection()@ and @updateWithPermIndirection()@ 
+in @Storage.h@.\footnote{Actually slop words created in 
+@updateWithIndirection()@ cannot survive major garbage collections.
+Still we invoke @LDV\_recordDead\_FILL\_SLOP\_DYNAMIC()@ to support linear
+scan of the heap during a garbage collection, which is discussed in the next
+section.}
+\end{enumerate}
+
+The linear scan of the from-space is initiated by the garbage
+collector.  From the function @LdvCensusForDead()@, every dead closure
+in the from-space is visited through an invocation of
+@processHeapClosureForDead()@.
 
 \subsection{Final scan of the heap}
 
@@ -579,17 +535,16 @@ and no assumptions made upon LDVU profiling hold any longer.
 
 \section{Time of Use}
 
-In order to yield correct LDVU profiling results, we must make sure that
-@LDV_recordUse()@ be called on a closure whenever it is used; otherwise,
-most of closures would be reported to be in the void phase.
-@includes/StgLdvProf.h@ provides nine entry macros (e.g., @LDV_ENT_THK()@).
-Each of the entry macros corresponds to a type of closures which can be entered.
-For instance, @LDV_ENT_THK()@ is supposed to be invoked whenever a thunk
-is used. In the current implementation, all these macros expand to 
-@LDV_recordUse()@ with no additional work.
-
-\textbf{To do:} modify the compiler so that it inserts the above macros
-at appropriate places in the generated C code.
+In order to yield correct LDVU profiling results, we must make sure
+that @LDV_recordUse()@ be called on a closure whenever it is used;
+otherwise, most of closures would be reported to be in the void phase.
+@includes/StgLdvProf.h@ provides an entry macro @LDV_ENTER@ which
+expands to @LDV_recordUse()@.  The compiler arranges to invoke
+@LDV_ENTER@ in the entry code for every dynamic closure it generates
+code for (constructors, thunks and functions).  We also have to add
+@LDV_ENTER@ calls to the closures statically compiled into the RTS:
+@PAP@s, @AP_UPD@s, standard thunk forms (in @StgStdThunks.hc@, and
+several others in @StgMiscClosures.hc@.
 
 \section{Computing Final Statistics}
 
@@ -658,6 +613,18 @@ execution of retainer profiling.
 
 \textbf{To do:} Currently the LDVU profiling is not supported with @-G1@ option.
 
+\textbf{To do:} When we perform LDVU profiling, the Haskell mutator time seems to
+be affected by @-S@ or @-s@ runtime option. For instance, the following 
+two options should result in nearly same profiling outputs, but
+the second run (without @-Sstderr@ option) spends almost twice as
+long in the Haskell mutator as the first run:
+1) @+RTS -Sstderr -hL -RTS@; 2) @+RTS -hL -RTS@.
+This is quite a subtle bug because this wierd phenomenon is not 
+observed in retainer profiling, yet the implementation of 
+@mut_user_time_during_LDV()@ is completely analogous to that of 
+@mut_user_time_during_RP()@. The overall shapes of the resultant graphs 
+are almost the same, though.
+
 \section{Files}
 
 This section gives a summary of changes made to the GHC in 
@@ -689,8 +656,9 @@ with LDVU profiling.
 \item[Profiling.c] changes @initProfilingLogFile@ and @report_ccs_profiling()@.
 \item[Proftimer.c] declares @ticks_to_retainer_ldv_profiling@,
   @performRetainerLdvProfiling@, and @doContextSwitch@.
-\item[Proftimer.h] is the header for @Proftimer.c@.
-\item[RtsAPI.c] implements @setProfileHeader()@.
+\item[Proftimer.h] is the header for @Proftimer.c@. Defines @PROFILING_MIN_PERIOD@,
+  which specifies the minimum profiling period and the default profiling period.
+%\item[RtsAPI.c] implements @setProfileHeader()@.
 \item[RtsFlags.c]
   sets @RtsFlags.ProfFlags.doHeapProfile@,
   adds a string to @usage_text[]@ in @setupRtsFlags()@.