docs/storage-mgt/ldv.tex

   1 \documentclass{article}
   2 \usepackage{code,a4wide}
   3
   4 \usepackage{graphics,epsfig,epic,eepic,epsfig}
   5
   6 \setlength{\parskip}{0.25cm}
   7 \setlength{\parsep}{0.25cm}
   8 \setlength{\topsep}{0cm}
   9 \setlength{\parindent}{0cm}
  10 \renewcommand{\textfraction}{0.2}
  11 \renewcommand{\floatpagefraction}{0.7}
  12
  13
  14 % Terminology
  15 \newcommand{\block}{block}
  16 \newcommand{\Block}{Block}
  17 \newcommand{\segment}{segment}
  18 \newcommand{\Segment}{Segment}
  19 \newcommand{\step}{step}
  20 \newcommand{\Step}{Step}
  21
  22 \newcommand{\note}[1]{{\em $\spadesuit$ #1}}
  23
  24 \begin{document}
  25 \title{Implementation of Lag/Drag/Void/Use Profiling}
  26 \author{Sungwoo Park \\ Simon Marlow}
  27
  28 \makeatactive
  29 \maketitle
  30
  31 \section{Lag/Drag/Void/Use Profiling}
  32
  33 \emph{Lag/Drag/Void/Use} (LDVU) profiling~\cite{RR} is a profiling technique
  34 which yields a summary of the biography of all the dynamic closures created
  35 during program execution.
  36 In this profiling scheme,
  37 the biography of a closure is determined by four important events associated
  38 with the closure: \emph{creation}, \emph{first use},
  39 \emph{last use}, and \emph{destruction} (see Figure~\ref{fig-ldv}).
  40 The intervals between these successive events correspond to three phases
  41 for the closure: \emph{lag} (between creation and first use),
  42 \emph{use} (between first use and last use), and
  43 \emph{drag} (between last use and destruction).
  44 If the closure is never used, it is considered to remain in the \emph{void}
  45 phase all its lifetime.
  46
  47 \begin{figure}[ht]
  48 \begin{center}
  49 \input{ldv.eepic}
  50 \caption{The biography of a closure}
  51 \label{fig-ldv}
  52 \end{center}
  53 \end{figure}
  54
  55 The LDVU profiler regularly performs heap censuses during program execution.
  56 Each time a heap census is performed, the LDVU profiler increments a global
  57 time, which is used for timing all the events (such as creation and destruction
  58 of a closure) occurring during program execution.
  59 Hence, for instance, all closures creating between two successive heap censuses
  60 have the same creation time and belong to the same \emph{generation}.\footnote{In
  61 this document, a generation is related with heap censuses, not garbage collections
  62 as in other profiling schemes.}
  63 After the program terminates, it yields a post-mortem report on how much
  64 of the \emph{live} graph is in one of the four phases at the moment of each
  65 heap census.
  66
  67 It must be emphasized that the LDVU profiler considers only live closures;
  68 it should not take into consideration dead closures which do not constitute
  69 the graph. Therefore, the result of LDVU profiling does not depend on the
  70 frequency of garbage collections.
  71
  72 This document describes the implementation of LDVU profiling on the Glasgow
  73 Haskell Compiler runtime system.\footnote{Unless otherwise noted, all identifiers
  74 are defined in @LdvProfile.c@}.
  75
  76 \section{An Overview of the Implementation}
  77
  78 Every closure is augmented with an additional word in its profiling header
  79 to accommodate three additional pieces of information:
  80 1) state flag indicating whether the closure has been used at least once or not.
  81 2) creation time; 3) time of most recent use if any so far.
  82 We refer to such a word as an LDV word.
  83
  84 The LDVU profiler maintains a global time, stored in @ldvTime@.
  85 It is incremented each time a heap census is performed.
  86 During a heap census, the profiler scans all live closures and computes the
  87 following:
  88 1) the total size of all closures which have never been used;
  89 2) the total size of all closures which have been used at least once
  90 in the past.\footnote{There is another category of closures, namely,
  91 \emph{inherently used} closures. We will explain
  92 in Section~\ref{sec-heap-censuses}.}
  93 It is not until the whole program execution finishes that the profiler
  94 can actually decide the total size corresponding to each of the four phases for
  95 a particular heap census. It is only when a closure is destroyed that the profiler
  96 can determine how long the closure has been in a specific phase.
  97 Therefore, it is not sufficient to perform heap censuses periodically in order to
  98 compute the profiling statistics: the runtime system needs to intercept
  99 all events associated with any closures and update necessary information.
 100
 101 All events associated with closures are handled by one of the three
 102 macros defined
 103 in @includes/StgLdv.h@: @LDV_recordCreate()@, @LDV_recordUse()@, and
 104 @LDV_recordDead()@.
 105
 106 \begin{itemize}
 107 \item{@LDV_recordCreate()@} is called when a closure is created and updates its
 108 creation time field.
 109
 110 \item{@LDV_recordUse()@} is called when a closure is used and updates its most recent
 111 use time field.
 112
 113 \item{@LDV_recordDead()@} is called when a closure @c@ is removed from the graph.
 114 It does not update its LDV word (because @c@ is about to be destroyed).
 115 Instead, it updates the statistics on LDVU profiling according to the following
 116 observation:
 117 if @c@ has never been used (which is indicated by the state flag in its LDV
 118 word),
 119 @c@ contributes to the void phase from its creation time to the last census
 120 time; if @c@ was used at least once (which is also indicated by the state flag),
 121 @c@ contributes to the @drag@ phase after its last use time.
 122 \end{itemize}
 123
 124 At the end of the program execution, the profiler performs a last census during
 125 which all closures in the heap are declared to be dead and @LDV_recordDead()@
 126 is invoked on each of them.
 127 Then, the profiler computes the final statistics.
 128
 129 \section{LDV Words}
 130
 131 We choose to share the LDV word for both retainer profiling and LDVU
 132 profiling, which cannot take place simultaneously.
 133 This is the reason why there is a
 134 union structure inside the @StgProHeader@ structure.
 135 The field @hp.ldvw@ in the @StgProfHeader@ structure corresponds to the LDV
 136 word:
 137 \begin{code}
 138 typedef struct {
 139   ...
 140   union {
 141     retainerSet *rs;          // Retainer Set
 142     StgWord ldvw;             // Lag/Drag/Void Word
 143   } hp;
 144 } StgProfHeader;
 145 \end{code}
 146 For instance, the LDV word of a closure @c@ can now be accessed with
 147 @c->header.prof.hp.ldvw@ (or by @LDVW(c)@ where @LDVW()@ is a macro in
 148 @includes/StgLdvProf.h@).
 149
 150 An LDV word is divided into three fields, whose position is specified
 151 by three constants in @includes/StgLdvProf.h@:
 152 \begin{itemize}
 153 \item{@LDV_STATE_MASK@} corresponds to the state flag.
 154 \item{@LDV_CREATE_MASK@} corresponds to the creation time.
 155 \item{@LDV_LAST_MASK@} corresponds to the most recent use time.
 156 \end{itemize}
 157 The constant @LDV_SHIFT@ specifies how many bits are allocated for
 158 creation time or most recent use time.
 159 For instance, the creation time of a closure @c@ can be obtained by
 160 @(LDVW(c) & LDV_CREATE_MASK) >> LDV_SHIFT@.
 161
 162 The creation time field and the most recent use time field can be set only by the
 163 macros @LDV_recordCreate()@ and @LDV_recordUse()@.
 164 @LDV_recordCreate()@ must be called whenever a new dynamic closure is created,
 165 and this is handily accomplished by rewriting the macro @SET_PROF_HDR()@
 166 (in @includes/ClosureMacros.h@) (we do not need to change @SET_STATIC_PROF_HDR()@
 167 because static closures are not involved in LDVU profiling at all):
 168
 169 \begin{code}
 170 #define SET_PROF_HDR(c,ccs_)            \
 171         ((c)->header.prof.ccs = ccs_,   \
 172         LDV_recordCreate((c)))
 173 \end{code}
 174
 175 There are a few cases in which the info table of a closure changes through
 176 an explicit invocation of @SET_INFO()@ or a direct assignment to its @header.info@
 177 field: 1) an indirection closure is replaced by an old-generation
 178 indirection closure; 2) a thunk is replaced by a blackhole; 3) a thunk is replaced
 179 by an indirection closure when its evaluation result becomes available.
 180
 181 \emph{We regard such a situation as
 182 the destruction of an old closure followed by the creation of a new closure
 183 at the same memory address.}\footnote{This would be unnecessary if the two closures
 184 are of the same size, but it is not always the case. We choose to distinguish
 185 the two closures for the sake of consistency.}
 186 For instance, when an @IND_PERM@ closure is replaced by an @IND_OLDGEN_PERM@
 187 closures (during scavenging in @GC.c@), we wrap the invocation of @SET_INFO()@ with
 188 the invocations of @LDV_recordDead()@ and @LDV_recordCreate()@ as follows
 189 (@LDV_recordDead()@ requires the actual size of the closures being destroyed):
 190
 191 \begin{code}
 192   LDV_recordDead((StgClosure *)p, sizeofW(StgInd) - sizeofW(StgProfHeader));
 193   SET_INFO(((StgClosure *)p), &stg_IND_OLDGEN_PERM_info);
 194   LDV_recordCreate((StgClosure *)p);
 195 \end{code}
 196
 197 \textbf{To do:}
 198 A direct assignment to the @header.info@ field implies that its cost centre
 199 field is not initialized. This is no problem in the case of @EVACUATED@ closures
 200 because they will
 201 not be used again after a garbage collection. However, I am not sure if this is safe
 202 for @BLACKHOLE_BQ@ closures (in @StgMiscClosures.hc@) when retainer profiling,
 203 which employs cost centre stacks, is going on.
 204 If it is safe, please leave a comment there.
 205
 206 @LDV_recordUse()@ is called on a closure whenever it is used, or \emph{entered}.
 207 Its state flag changes if necessary to indicate that it has been used, and
 208 the current global time is stored in its last use time field.
 209
 210 \section{Global Time \texttt{ldvTime} and Retainer Profiling}
 211
 212 The global time, stored in @ldvTime@, records the current time period.
 213 It is initialized to $1$ and incremented after each time a heap census
 214 is completed through an invocation of @LdvCensus()@.  Note that each
 215 value of @ldvTime@ represents a time \emph{period}, not a point in
 216 time.
 217
 218 All closures created between two successive invocations of
 219 @LdvCensus()@ have the same creation time.  If a closure is used at
 220 least once between two successive heap censuses, we consider the
 221 closure to be in the use phase during the corresponding time period
 222 (because we just set its last use time field to the current value of
 223 @ldvTime@ whenever it is used).  Notice that a closure with a creation
 224 time $t_c$ may be destroyed before the actual heap census for time
 225 $t_c$ and thus may \emph{not} be observed during the heap census for
 226 time $t_c$.  Such a closure does not show up in the profile at all.
 227
 228 In addition, the value of @ldvTime@ indicates which of LDVU profiling
 229 and retainer profiling is currently active: during LDVU profiling, it
 230 is initialized to $1$ in @initLdvProfiling()@ and then increments as
 231 LDVU profiling proceeds; during retainer profiling, however, it is
 232 always fixed to $0$.  Thus, wherever a piece of code shared by both
 233 retainer profiling and LDVU profiling comes to play, we usually need
 234 to first examine the value of @ldvTime@ if necessary. For instance,
 235 consider the macro @LDV_recordUse()@:
 236
 237 \begin{code}
 238 #define LDV_recordUse(c)                              \
 239   if (ldvTime > 0)                                    \
 240     LDVW((c)) = (LDVW((c)) & LDV_CREATE_MASK) | ldvTime | LDV_STATE_USE;
 241 \end{code}
 242
 243 If retainer profiling is being performed, @ldvTime@ is equal to $0$,
 244 and @LDV_recordUse()@ causes no side effect.\footnote{Due to this
 245 interference with LDVU profiling, retainer profiling slows down a bit;
 246 for instance, checking @ldvTime@ against $0$ in the above example
 247 would always evaluate to @rtsFalse@ during retainer profiling.
 248 However, this is the price to be paid for our decision not to employ a
 249 separate field for LDVU profiling.}
 250
 251 As another example, consider @LDV_recordCreate()@:
 252
 253 \begin{code}
 254 #define LDV_recordCreate(c)   \
 255   LDVW((c)) = (ldvTime << LDV_SHIFT) | LDV_STATE_CREATE
 256 \end{code}
 257
 258 The above definition of @LDV_recordCreate()@ works without any problem
 259 even for retainer profiling: during retainer profiling,
 260 a retainer set field (@hp.ldvw@) must be initialized to a null pointer.
 261 Since @ldvTime@ is fixed to $0$, @LDV_recordCreate()@ initializes
 262 retainer set fields correctly.
 263
 264 \section{Heap Censuses}
 265 \label{sec-heap-censuses}
 266
 267 The LDVU profiler performs heap censuses periodically by invoking the
 268 function @LdvCensus()@.  Because we need to know exactly which
 269 closures in the heap are live at census time, we always precede the
 270 census with a major garbage collection.
 271
 272 During a census, we examine each closure one by one and compute the
 273 following three quantities:
 274
 275 \begin{enumerate}
 276 \item the total size of all \emph{inherently used} closures.
 277 \item the total size of all closures which have not been used (yet).
 278 \item the total size of all closures which have been used at least once.
 279 \end{enumerate}
 280
 281 For most closures, a \emph{use} consists of entering the closure.  For
 282 unlifted objects which are never entered (eg. @ARR_WORDS@), it would
 283 be difficult to determine their points of use because such points are
 284 scattered around the implementation in various primitive operations.
 285 For this reason we consider all unlifted objects as ``inherently
 286 used''.  The following types of closures are considered to be
 287 inherently used: @TSO@, @MVAR@, @MUT_ARR_PTRS@, @MUT_ARR_PTRS_FROZEN@,
 288 @ARR_WORDS@, @WEAK@, @MUT_VAR@, @MUT_CONS@, @FOREIGN@, @BCO@, and
 289 @STABLE_NAME@.
 290
 291 The three quantities are stored in an @LdvGenInfo@ array @gi[]@.
 292 @gi[]@ is indexed by time period.  For instance, @gi[ldvTime]@ stores
 293 the three quantaties for the current global time period.  The
 294 structure @LdvGenInfo@ is defined as follows:
 295
 296 \begin{code}
 297 typedef struct {
 298   ...
 299   int inherentlyUsed;   // total size of 'inherently used' closures
 300   int notUsed;          // total size of 'not used yet' closures
 301   int used;             // total size of 'used at least once' closures
 302   ...
 303 } LdvGenInfo;
 304 \end{code}
 305
 306 The above three quantities account for mutually exclusive sets of closures.
 307 In other words, if a closure is not inherently used, it belongs to
 308 either the second or the third.
 309
 310 \subsection{Taking a Census of the Live Heap}
 311
 312 During a heap census, we need to visit every live closure once, so we
 313 perform a linear scan of the live heap after a major GC.  We can take
 314 advantage of the following facts to implement a linear scan for heap
 315 censuses:
 316
 317 \begin{itemize}
 318 \item The nursery is empty. The small object pool and the large object pool,
 319       however, may \emph{not} be empty. This is because the garbage collector
 320       invokes @scheduleFinalizer()@ after removing dead closures, and
 321       @scheduleFinalizer()@ may create new closures through @allocate()@.
 322 \item @IND@, @IND_OLDGEN@, and @EVACUATED@ closures do not appear in
 323 the live heap.
 324 \end{itemize}
 325
 326 There is one small complication when traversing the live heap: the
 327 garbage collector may have replaced @WEAK@ objects with @DEAD_WEAK@
 328 objects, which have a smaller size and hence leave some space before
 329 the next object.  To avoid this problem we change the size of
 330 @DEAD_WEAK@ objects to match that of @WEAK@ objects when profiling is
 331 enabled (see @StgMiscClosures.hc@).
 332
 333 \section{Destruction of Closures}
 334
 335 In order to compute the total size of closures for each of the four
 336 phases, we must report the destruction of every closure (except
 337 inherently used closures) to the LDVU profiler by invoking
 338 @LDV_recordDead()@.  @LDV_recordDead()@ must not be called on any
 339 inherently used closure because any invocation of @LDV_recordDead()@
 340 affects the statistics regarding void and drag phases, which no
 341 inherently used closure can be in.
 342
 343 @LDV_recordDead()@ updates two fields @voidNew@ and @dragNew@ in the
 344 @LdvGenInfo@ array @gi[]@:
 345
 346 \begin{code}
 347 typedef struct {
 348   ...
 349   int voidNew;
 350   int dragnew;
 351   ...
 352 } LdvGenInfo;
 353 \end{code}
 354
 355 @gi[ldvTime].voidNew@ accumulates the size of all closures satisfying
 356 the following two conditions: 1) observed during the heap census at
 357 time @ldvTime@; 2) now known to have been in the void phase at time
 358 @ldvTime@.  It is updated when a closure which has never been used is
 359 destroyed.  Suppose that a closure @c@ which has never been used is
 360 about to be destroyed.  If its creation time is $t_c$, we judge that
 361 @c@ has been in the void phase all its lifetime, namely, from time
 362 $t_c$ to @ldvTime@.  Since @c@ will not be observed during the next
 363 heap census, which corresponds to time @ldvTime@, @c@ contributes to
 364 the void phase of times $t_c$ through @ldvTime@ - 1.  Therefore, we
 365 increase the @voidNew@ field of @gi[@$t_c$@]@ through @gi[ldvTime - 1]@
 366  by the size of @c@.\footnote{In the actual implementation, we
 367 update @gi[$t_c$]@ and @gi[ldvTime]@ (not @gi[ldvTime@$ - $1@]@) only:
 368 @gi[$t_c$]@ and @gi[ldvTime]@ are increased and decreased by the size
 369 of @c@, respectively.  After finishing the program execution, we can
 370 correctly adjust all the fields as follows: @gi[$t_c$]@ is computed as
 371 $\sum_{i=0}^{t_c}$@gi[$i$]@.  }
 372
 373 @gi[ldvTime].dragNew@ accumulates the size of all closures satisfying the following
 374 two conditions: 1) observed during the heap census at time @ldvTime@;
 375 2) now known to have been in the drag phase at time @ldvTime@.
 376 It is updated when a closure which has been used at least once is destroyed.
 377 Suppose that a closure @c@ which has been used last at time $t_l$ is about to
 378 be destroyed.
 379 We judge that @c@ has been in the drag phase from time $t_l + 1$ to
 380 time @ldvTime@$ - 1$ (if $t_l + 1 > $@ldvTime@$ - 1$, nothing happens).
 381 Therefore, we increase the @dragNew@ field of @gi[@$t_l + 1$@]@ through
 382 @gi[ldvTime@$ - 1$@]@
 383 by the size of @c@.\footnote{As in the case of @voidNew@, we update
 384 @gi[@$t_l + 1$@]@ and @gi[ldvTime]@ only.}
 385
 386 Now we need to find out all the cases of closure destruction.
 387 There are four cases in which a closure is destroyed:
 388
 389 \begin{enumerate}
 390 \item A closure is overwritten with a blackhole:
 391   @UPD_BH_UPDATABLE()@ and @UPD_BH_SINGLE_ENTRY()@ in @includes/StgMacros.h@,
 392   @threadLazyBlackHole()@ and @threadSqueezeStack()@ in @GC.c@,
 393   the entry code for @BLACKHOLE@ closures in @StgMiscClosures.hc@ (a
 394   @BLACKHOLE@ closure is changed into a @BLACKHOLE_BQ@ closure).
 395   We call either @LDV_recordDead()@ or @LDV_recordDead_FILL_SLOP_DYNAMIC()@.
 396
 397 \item A weak pointer is overwritten with a dead weak pointer:
 398   @finalizzeWeakzh_fast()@ in @PrimOps.hc@,
 399   @finalizeWeakPointersNow()@ and @scheduleFinalizers()@ in @Weak.c@.
 400   Since a weak pointer is inherently used, we do not call @LDV_recordDead()@.
 401
 402 \item A closure is overwritten with an indirection closure:
 403   @updateWithIndirection()@ and @updateWithPermIndirection()@ in @Storage.h@,
 404   @scavenge()@ in @GC.c@, in which an @IND_PERM@ closure is explicitly replaced
 405   with an @IND_OLDGEN_PERM@ closure during scavenging.
 406   We call either @LDV_recordDead()@ or @LDV_recordDead_FILL_SLOP_DYNAMIC()@.
 407
 408 \item Closures are removed permanently from the graph during garbage
 409 collections.  We locate and dispose of all dead closures by linearly
 410 scanning the from-space right before tidying up.  This is feasible
 411 because any closures which is about to be removed from the graph still
 412 remains in the from-space until tidying up is completed.  The next
 413 subsection explains how to implement this idea.
 414 \end{enumerate}
 415
 416 \subsection{Linear scan of the from-space during garbage collections}
 417
 418 In order to implement linear scan of the from-space during a garbage collection
 419 (before tidying up),
 420 we need to take into consideration the following facts:
 421
 422 \begin{itemize}
 423 \item The pointer @free@ of a block in the nursery may incorrectly point to
 424 a byte past its actual boundary.
 425 This happens because
 426 the Haskell mutator first increases @hpLim@ without comparing it with the
 427 actual boundary when allocating fresh memory for a new closure.
 428 @hpLim@ is later assigned to the pointer @free@ of the corresponding memory
 429 block, which means that during a heap census, the pointer @hpLim@ may not
 430 be trusted.
 431 Notice that @hpLim@ is not available during LDVU profiling; it is valid
 432 only during the Haskell mutator time.
 433
 434 \item The from-space may well contain a good number of @EVACUATED@ closures,
 435 and they must be skipped over.
 436
 437 \item The from-space includes the nursery.
 438 Furthermore, a closure in the nursery may not necessarily be adjacent to the next
 439 closure because slop words may lie between the two closures;
 440 the Haskell mutator may allocate more space than actually needed in the
 441 nursery when creating a closure, potentially leaving slop words.
 442 \end{itemize}
 443
 444 The first problem is easily solved by limiting the scan up to the
 445 actual block boundary for each nursery block (see
 446 @processNurseryForDead()@).  In other words, for a nursery block
 447 descriptor @bd@, whichever of @bd->start@$ + $@BLOCK_SIZE_W@ and
 448 @bd->free@ is smaller is used as the actual boundary.
 449
 450 We solve the second problem by exploiting LDV words of @EVACUATED@
 451 closures: we store the size of an evacuated closure, which now resides
 452 in the to-space, in the LDV word of the new @EVACUATED@ closure
 453 occupying its memory.  This is easily implemented by inserting a call
 454 to the macro @SET_EVACUAEE_FOR_LDV()@ in @copy()@ and @copyPart()@ (in
 455 @GC.c@).  Thus, when we encounter an @EVACUATED@ closure while
 456 linearly scanning the nursery, we can skip a correct number of words
 457 by referring to its LDV word.
 458
 459 The third problem could be partially solved by always monitoring @Hp@
 460 during the Haskell mutator time: whenever @Hp@ is increased, we fill
 461 with zeroes as many words as the change of @HP@. Then, we could skip
 462 any trailing zero words when linearly scanning the nursery.
 463 Alternatively we could initialize the entire nursery with zeroes after
 464 each garbage collection and not worry about any change made to @Hp@
 465 during the Haskell mutator time.  The number of zero words to be
 466 written to the nursery could be reduced in the first approach, for we
 467 do not have to fill the header for a new closure.  Nevertheless we
 468 choose to employ the second approach because it simplifies the
 469 implementation code significantly (see @resetNurseries()@ in
 470 @Storage.c@).  Moreover, the second approach compensates for its
 471 redundant initialization cost by providing faster execution (due to a
 472 single memory write loop in contrast to frequent memory write loops in
 473 the first approach).  Also, we attribute the initialization cost to
 474 the runtime system and thus the Haskell mutator behavior is little
 475 affected.
 476
 477 There is further complication though: occasionally a closure is
 478 overwritten with a closure of a smaller size, leaving some slop
 479 between itself and the next closure in the heap.  There are two cases:
 480
 481 \begin{enumerate}
 482 \item A closure is overwritten with a blackhole.
 483 \item A closure is overwritten with an indirection closure.
 484 \end{enumerate}
 485
 486 In either case, an existing closure is destroyed after being replaced
 487 with a new closure.  If the two closures are of the same size, no slop
 488 words are introduced and we only need to invoke @LDV_recordDead()@ on
 489 the existing closure, which cannot be an inherently used closure.  If
 490 not, that is, the new closure is smaller than the existing closure
 491 (the opposite cannot happen), we need to fill one or more slop words
 492 with zeroes as well as invoke @LDV_recordDead()@ on the existing
 493 closure.  The macro @LDV_recordDead_FILL_SLOP_DYNAMIC()@ accomplishes
 494 these two tasks: it determines the size of the existing closure,
 495 invokes @LDV_recordDead()@, and fills the slop words with zeroes.
 496 After excluding all cases in which the two closures are of the same
 497 size, we invoke @LDV_recordDead_FILL_SLOP_DYNAMIC()@ only from:
 498
 499 \begin{enumerate}
 500 \item @threadLazyBlackHole()@ and @threadSqueezeStack()@ in @GC.c@
 501 (for lazy blackholing),
 502 \item @UPD_BH_UPDATABLE()@ and @UPD_BH_SINGLE_ENTRY()@ in
 503 @includes/StgMacros.h@ (for eager blackholing, which isn't the
 504 default),
 505 \item @updateWithIndirection()@ and @updateWithPermIndirection()@
 506 in @Storage.h@.\footnote{Actually slop words created in
 507 @updateWithIndirection()@ cannot survive major garbage collections.
 508 Still we invoke @LDV\_recordDead\_FILL\_SLOP\_DYNAMIC()@ to support linear
 509 scan of the heap during a garbage collection, which is discussed in the next
 510 section.}
 511 \end{enumerate}
 512
 513 The linear scan of the from-space is initiated by the garbage
 514 collector.  From the function @LdvCensusForDead()@, every dead closure
 515 in the from-space is visited through an invocation of
 516 @processHeapClosureForDead()@.
 517
 518 \subsection{Final scan of the heap}
 519
 520 Since a closure surviving the final garbage collection is implicitly destroyed
 521 when the runtime system shuts down, we must invoke @processHeapClosureForDead@
 522 on \emph{every} closure in the heap once more after the final garbage collection.
 523 The function @LdvCensusKillAll()@, which is invoked from @shutdownHaskell()@
 524 in @RtsStartup.c@, traverses the entire heap and visits each closure.
 525 It also stops LDVU profiling by resetting @ldvTime@ to $0$.
 526
 527 It may be that after LDVU profiling stops, new closures may be created
 528 and even garbage collections may be performed.
 529 We choose to ignore these closures because they are all concerned about
 530 finalizing weak pointers (in @finalizeWeakPointersNow()@).
 531 It can be catastrophic to invoke @LdvCensusKillAll()@ after finishing
 532 @finalizeWeakPointersNow()@: @finalizeWeakPointersNow()@ calls
 533 @rts_evalIO()@, which is essentially initiating a new program execution,
 534 and no assumptions made upon LDVU profiling hold any longer.
 535
 536 \section{Time of Use}
 537
 538 In order to yield correct LDVU profiling results, we must make sure
 539 that @LDV_recordUse()@ be called on a closure whenever it is used;
 540 otherwise, most of closures would be reported to be in the void phase.
 541 @includes/StgLdvProf.h@ provides an entry macro @LDV_ENTER@ which
 542 expands to @LDV_recordUse()@.  The compiler arranges to invoke
 543 @LDV_ENTER@ in the entry code for every dynamic closure it generates
 544 code for (constructors, thunks and functions).  We also have to add
 545 @LDV_ENTER@ calls to the closures statically compiled into the RTS:
 546 @PAP@s, @AP_UPD@s, standard thunk forms (in @StgStdThunks.hc@, and
 547 several others in @StgMiscClosures.hc@.
 548
 549 \section{Computing Final Statistics}
 550
 551 After the final scan of the heap, we can accurately determine the total
 552 size of closures in one of the four phases at the moment of each heap census.
 553 The structure @LdvGenInfo@ is augmented with two additional fields
 554 @voidTotal@ and @dragTotal@:
 555
 556 \begin{code}
 557 typedef struct {
 558   ...
 559   int voidTotal;
 560   int dragTotal;
 561   ...
 562 } LdvGenInfo;
 563 \end{code}
 564
 565 @gi[@$i$@].voidTotal@ and @gi[@$i$@].dragTotal@ are computed
 566 from @gi[@$i$@].voidNew@ and @gi[@$i$@].dragNew@, respectively.\footnote{Due
 567 to a slight optimization described before, @gi[@$i$@].voidTotal@ is actually
 568 computed as $\sum_{1 \leq j \leq i}$@gi[@$j$@].voidNew@.
 569 @gi[@$i$@].dragTotal@ is computed in a similar way.}
 570 Then, the total size of closures in the lag phase @gi[@$i$@].lagTotal@ is computed
 571 as @gi[@$i$@].notUsed@$-$@gi[@$i$@].voidTotal@ (because any unused closure
 572 is either in the void phase or in the lag phase).
 573 Similarly,
 574 the total size of closures in the use phase @gi[@$i$@].useTotal@ is computed
 575 as @gi[@$i$@].used@$-$@gi[@$i$@].dragTotal@ (because any used closure
 576 is either in the use phase or in the drag phase).
 577 @endLdvProfiling()@, called from @endHeapProfiling@ in @ProfHeap.c@, computes these
 578 final statistics.
 579
 580 \section{Usage}
 581
 582 The runtime system option @-hL@ tells the executable program to
 583 perform LDVU profiling and produce a @.hp@ file:
 584
 585 \begin{code}
 586 $ Foo.out +RTS -hL
 587 \end{code}
 588
 589 The option @-i@ can be used to
 590 specify a desired interval at which LDVU profiling is performed.
 591 The default and minimum value is half a second:
 592
 593 \begin{code}
 594 $ Foo.out +RTS -hL -i2.5 -RTS
 595 \end{code}
 596
 597 The @.hp@ file can be supplied to the @hp2ps@ program to create a postscript
 598 file showing the progress of LDVU profiling in a graph:
 599
 600 \begin{code}
 601 $ hp2ps Foo.hs
 602 $ gv Foo.ps
 603 \end{code}
 604
 605 The horizontal axis of the graph is in the Haskell mutator time, which excludes
 606 the runtime system time such as garbage collection time and LDVU profiling
 607 time.
 608 The Haskell mutator runs a bit slower than it would without performing
 609 LDVU profiling, but the difference is minute.
 610 Also, the timer employed to periodically perform retainer profiling
 611 is not perfectly accurate. Therefore, the result may slightly vary for each
 612 execution of retainer profiling.
 613
 614 \textbf{To do:} Currently the LDVU profiling is not supported with @-G1@ option.
 615
 616 \textbf{To do:} When we perform LDVU profiling, the Haskell mutator time seems to
 617 be affected by @-S@ or @-s@ runtime option. For instance, the following
 618 two options should result in nearly same profiling outputs, but
 619 the second run (without @-Sstderr@ option) spends almost twice as
 620 long in the Haskell mutator as the first run:
 621 1) @+RTS -Sstderr -hL -RTS@; 2) @+RTS -hL -RTS@.
 622 This is quite a subtle bug because this wierd phenomenon is not
 623 observed in retainer profiling, yet the implementation of
 624 @mut_user_time_during_LDV()@ is completely analogous to that of
 625 @mut_user_time_during_RP()@. The overall shapes of the resultant graphs
 626 are almost the same, though.
 627
 628 \section{Files}
 629
 630 This section gives a summary of changes made to the GHC in
 631 implementing LDVU profiling.
 632 Only three files (@includes/StgLdvProf.h@, @LdvProfile.c@, and
 633 @LdvProfile.h@) are new, and all others exist in the GHC.
 634
 635 @\includes@ directory:
 636
 637 \begin{description}
 638 \item[StgLdvProf.h] defines type @LdvGenInfo@, constants, and macros related
 639 with LDVU profiling.
 640 \item[ClosureMacros.h] changes macro @SET_PROF_HDR()@.
 641 \item[Stg.h] includes th header file @StgLdvProf.h@.
 642 \item[StgMacros.h] changes macros @UPD_BH_UPDATABLE()@ and @UPD_BH_SINGLE_ENTRY()@.
 643 \end{description}
 644
 645 @\rts@ directory:
 646
 647 \begin{description}
 648 \item[GC.c] invokes @LdvCensusForDead()@ before tidying up, sets @hasBeenAnyGC@ to
 649   @rtsTrue@, and changes @copy()@ and @copyPart()@.
 650   Invokes @LDV_recordDead()@ and @LDV_recordDead_FILL_SLOP_DYNAMIC()@.
 651 \item[Itimer.c] changes @handle_tick()@.
 652 \item[LdvProfile.c] implements the LDVU profiling engine.
 653 \item[LdvProfile.h] is the header for @LdvProfile.c@.
 654 \item[PrimOps.hc] changes @finalizzeWeakzh_fast()@.
 655 \item[ProfHeap.c] changes @initHeapProfiling()@ and @endHeapProfiling()@.
 656 \item[Profiling.c] changes @initProfilingLogFile@ and @report_ccs_profiling()@.
 657 \item[Proftimer.c] declares @ticks_to_retainer_ldv_profiling@,
 658   @performRetainerLdvProfiling@, and @doContextSwitch@.
 659 \item[Proftimer.h] is the header for @Proftimer.c@. Defines @PROFILING_MIN_PERIOD@,
 660   which specifies the minimum profiling period and the default profiling period.
 661 %\item[RtsAPI.c] implements @setProfileHeader()@.
 662 \item[RtsFlags.c]
 663   sets @RtsFlags.ProfFlags.doHeapProfile@,
 664   adds a string to @usage_text[]@ in @setupRtsFlags()@.
 665 \item[RtsFlags.h] defines constants @HEAP_BY_LDV@ and @LDVchar@.
 666 \item[RtsStartup.c] changes @shutDownHaskell()@.
 667 \item[Schedule.c] changes @schedule()@.
 668 \item[Stats.c]
 669   declares @LDV_start_time@, @LDV_tot_time@, @LDVe_start_time@,
 670   @LDVe_tot_time@.
 671   Changes @mut_user_time_during_GC()@, @mut_user_time()@,
 672   @stat_startExit()@,
 673   @stat_endExit()@, and
 674   @stat_exit()@.
 675   Defines
 676   @mut_user_time_during_LDV()@,
 677   @stat_startLDV()@, and
 678   @stat_endLDV()@.
 679 \item[Stats.h] is hte header for @Stats.c@.
 680 \item[StgMiscClosures.hc] inserts entry macros in
 681   @stg_IND_entry()@, @stg_IND_PERM_entry()@, @stg_IND_OLDGEN_entry()@,
 682   @stg_IND_OLDGEN_PERM_entry()@, @stg_BLACKHOLE_entry()@, @stg_BLACKHOLE_BQ_entry()@,
 683   and @stg_CAF_BLACKHOLE_entry()@.
 684   Invokes @LDV_recordDead()@ in @stg_BLACKHOLE_entry@.
 685   Redefines @stg_DEAD_WEAK_info@.
 686 \item[Storage.c] changes @initStorage()@, @resetNurseries()@, and @allocNursery()@.
 687 \item[Storage.h] changes @updateWithIndirection()@ and @updateWithPermIndirection()@.
 688 \item[Updates.hc] inserts entry macros in @stg_PAP_entry()@ and @stg_AP_UPD_entry()@.
 689 \item[Weak.c] changes @scheduleFinalizers()@.
 690 \end{description}
 691
 692 \bibliographystyle{plain}
 693 \bibliography{reference}
 694
 695 \end{document}