1 \documentclass{article}
2 \usepackage{code,a4wide}
4 \usepackage{graphics,epsfig,epic,eepic,epsfig}
6 \setlength{\parskip}{0.25cm}
7 \setlength{\parsep}{0.25cm}
8 \setlength{\topsep}{0cm}
9 \setlength{\parindent}{0cm}
10 \renewcommand{\textfraction}{0.2}
11 \renewcommand{\floatpagefraction}{0.7}
15 \newcommand{\block}{block}
16 \newcommand{\Block}{Block}
17 \newcommand{\segment}{segment}
18 \newcommand{\Segment}{Segment}
19 \newcommand{\step}{step}
20 \newcommand{\Step}{Step}
22 \newcommand{\note}[1]{{\em $\spadesuit$ #1}}
25 \title{Implementation of Lag/Drag/Void/Use Profiling}
26 \author{Sungwoo Park \\ Simon Marlow}
31 \section{Lag/Drag/Void/Use Profiling}
33 \emph{Lag/Drag/Void/Use} (LDVU) profiling~\cite{RR} is a profiling technique
34 which yields a summary of the biography of all the dynamic closures created
35 during program execution.
36 In this profiling scheme,
37 the biography of a closure is determined by four important events associated
38 with the closure: \emph{creation}, \emph{first use},
39 \emph{last use}, and \emph{destruction} (see Figure~\ref{fig-ldv}).
40 The intervals between these successive events correspond to three phases
41 for the closure: \emph{lag} (between creation and first use),
42 \emph{use} (between first use and last use), and
43 \emph{drag} (between last use and destruction).
44 If the closure is never used, it is considered to remain in the \emph{void}
45 phase all its lifetime.
50 \caption{The biography of a closure}
55 The LDVU profiler regularly performs heap censuses during program execution.
56 Each time a heap census is performed, the LDVU profiler increments a global
57 time, which is used for timing all the events (such as creation and destruction
58 of a closure) occurring during program execution.
59 Hence, for instance, all closures creating between two successive heap censuses
60 have the same creation time and belong to the same \emph{generation}.\footnote{In
61 this document, a generation is related with heap censuses, not garbage collections
62 as in other profiling schemes.}
63 After the program terminates, it yields a post-mortem report on how much
64 of the \emph{live} graph is in one of the four phases at the moment of each
67 It must be emphasized that the LDVU profiler considers only live closures;
68 it should not take into consideration dead closures which do not constitute
69 the graph. Therefore, the result of LDVU profiling does not depend on the
70 frequency of garbage collections.
72 This document describes the implementation of LDVU profiling on the Glasgow
73 Haskell Compiler runtime system.\footnote{Unless otherwise noted, all identifiers
74 are defined in @LdvProfile.c@}.
76 \section{An Overview of the Implementation}
78 Every closure is augmented with an additional word in its profiling header
79 to accommodate three additional pieces of information:
80 1) state flag indicating whether the closure has been used at least once or not.
81 2) creation time; 3) time of most recent use if any so far.
82 We refer to such a word as an LDV word.
84 The LDVU profiler maintains a global time, stored in @ldvTime@.
85 It is incremented each time a heap census is performed.
86 During a heap census, the profiler scans all live closures and computes the
88 1) the total size of all closures which have never been used;
89 2) the total size of all closures which have been used at least once
90 in the past.\footnote{There is another category of closures, namely,
91 \emph{inherently used} closures. We will explain
92 in Section~\ref{sec-heap-censuses}.}
93 It is not until the whole program execution finishes that the profiler
94 can actually decide the total size corresponding to each of the four phases for
95 a particular heap census. It is only when a closure is destroyed that the profiler
96 can determine how long the closure has been in a specific phase.
97 Therefore, it is not sufficient to perform heap censuses periodically in order to
98 compute the profiling statistics: the runtime system needs to intercept
99 all events associated with any closures and update necessary information.
101 All events associated with closures are handled by one of the three
103 in @includes/StgLdv.h@: @LDV_recordCreate()@, @LDV_recordUse()@, and
107 \item{@LDV_recordCreate()@} is called when a closure is created and updates its
110 \item{@LDV_recordUse()@} is called when a closure is used and updates its most recent
113 \item{@LDV_recordDead()@} is called when a closure @c@ is removed from the graph.
114 It does not update its LDV word (because @c@ is about to be destroyed).
115 Instead, it updates the statistics on LDVU profiling according to the following
117 if @c@ has never been used (which is indicated by the state flag in its LDV
119 @c@ contributes to the void phase from its creation time to the last census
120 time; if @c@ was used at least once (which is also indicated by the state flag),
121 @c@ contributes to the @drag@ phase after its last use time.
124 At the end of the program execution, the profiler performs a last census during
125 which all closures in the heap are declared to be dead and @LDV_recordDead()@
126 is invoked on each of them.
127 Then, the profiler computes the final statistics.
131 We choose to share the LDV word for both retainer profiling and LDVU
132 profiling, which cannot take place simultaneously.
133 This is the reason why there is a
134 union structure inside the @StgProHeader@ structure.
135 The field @hp.ldvw@ in the @StgProfHeader@ structure corresponds to the LDV
141 retainerSet *rs; // Retainer Set
142 StgWord ldvw; // Lag/Drag/Void Word
146 For instance, the LDV word of a closure @c@ can now be accessed with
147 @c->header.prof.hp.ldvw@ (or by @LDVW(c)@ where @LDVW()@ is a macro in
148 @includes/StgLdvProf.h@).
150 An LDV word is divided into three fields, whose position is specified
151 by three constants in @includes/StgLdvProf.h@:
153 \item{@LDV_STATE_MASK@} corresponds to the state flag.
154 \item{@LDV_CREATE_MASK@} corresponds to the creation time.
155 \item{@LDV_LAST_MASK@} corresponds to the most recent use time.
157 The constant @LDV_SHIFT@ specifies how many bits are allocated for
158 creation time or most recent use time.
159 For instance, the creation time of a closure @c@ can be obtained by
160 @(LDVW(c) & LDV_CREATE_MASK) >> LDV_SHIFT@.
162 The creation time field and the most recent use time field can be set only by the
163 macros @LDV_recordCreate()@ and @LDV_recordUse()@.
164 @LDV_recordCreate()@ must be called whenever a new dynamic closure is created,
165 and this is handily accomplished by rewriting the macro @SET_PROF_HDR()@
166 (in @includes/ClosureMacros.h@) (we do not need to change @SET_STATIC_PROF_HDR()@
167 because static closures are not involved in LDVU profiling at all):
170 #define SET_PROF_HDR(c,ccs_) \
171 ((c)->header.prof.ccs = ccs_, \
172 LDV_recordCreate((c)))
175 There are a few cases in which the info table of a closure changes through
176 an explicit invocation of @SET_INFO()@ or a direct assignment to its @header.info@
177 field: 1) an indirection closure is replaced by an old-generation
178 indirection closure; 2) a thunk is replaced by a blackhole; 3) a thunk is replaced
179 by an indirection closure when its evaluation result becomes available.
181 \emph{We regard such a situation as
182 the destruction of an old closure followed by the creation of a new closure
183 at the same memory address.}\footnote{This would be unnecessary if the two closures
184 are of the same size, but it is not always the case. We choose to distinguish
185 the two closures for the sake of consistency.}
186 For instance, when an @IND_PERM@ closure is replaced by an @IND_OLDGEN_PERM@
187 closures (during scavenging in @GC.c@), we wrap the invocation of @SET_INFO()@ with
188 the invocations of @LDV_recordDead()@ and @LDV_recordCreate()@ as follows
189 (@LDV_recordDead()@ requires the actual size of the closures being destroyed):
192 LDV_recordDead((StgClosure *)p, sizeofW(StgInd) - sizeofW(StgProfHeader));
193 SET_INFO(((StgClosure *)p), &stg_IND_OLDGEN_PERM_info);
194 LDV_recordCreate((StgClosure *)p);
198 A direct assignment to the @header.info@ field implies that its cost centre
199 field is not initialized. This is no problem in the case of @EVACUATED@ closures
201 not be used again after a garbage collection. However, I am not sure if this is safe
202 for @BLACKHOLE_BQ@ closures (in @StgMiscClosures.hc@) when retainer profiling,
203 which employs cost centre stacks, is going on.
204 If it is safe, please leave a comment there.
206 @LDV_recordUse()@ is called on a closure whenever it is used, or \emph{entered}.
207 Its state flag changes if necessary to indicate that it has been used, and
208 the current global time is stored in its last use time field.
210 \section{Global Time \texttt{ldvTime} and Retainer Profiling}
212 The global time, stored in @ldvTime@, records the current time period.
213 It is initialized to $1$ and incremented after each time a heap census
214 is completed through an invocation of @LdvCensus()@. Note that each
215 value of @ldvTime@ represents a time \emph{period}, not a point in
218 All closures created between two successive invocations of
219 @LdvCensus()@ have the same creation time. If a closure is used at
220 least once between two successive heap censuses, we consider the
221 closure to be in the use phase during the corresponding time period
222 (because we just set its last use time field to the current value of
223 @ldvTime@ whenever it is used). Notice that a closure with a creation
224 time $t_c$ may be destroyed before the actual heap census for time
225 $t_c$ and thus may \emph{not} be observed during the heap census for
226 time $t_c$. Such a closure does not show up in the profile at all.
228 In addition, the value of @ldvTime@ indicates which of LDVU profiling
229 and retainer profiling is currently active: during LDVU profiling, it
230 is initialized to $1$ in @initLdvProfiling()@ and then increments as
231 LDVU profiling proceeds; during retainer profiling, however, it is
232 always fixed to $0$. Thus, wherever a piece of code shared by both
233 retainer profiling and LDVU profiling comes to play, we usually need
234 to first examine the value of @ldvTime@ if necessary. For instance,
235 consider the macro @LDV_recordUse()@:
238 #define LDV_recordUse(c) \
240 LDVW((c)) = (LDVW((c)) & LDV_CREATE_MASK) | ldvTime | LDV_STATE_USE;
243 If retainer profiling is being performed, @ldvTime@ is equal to $0$,
244 and @LDV_recordUse()@ causes no side effect.\footnote{Due to this
245 interference with LDVU profiling, retainer profiling slows down a bit;
246 for instance, checking @ldvTime@ against $0$ in the above example
247 would always evaluate to @rtsFalse@ during retainer profiling.
248 However, this is the price to be paid for our decision not to employ a
249 separate field for LDVU profiling.}
251 As another example, consider @LDV_recordCreate()@:
254 #define LDV_recordCreate(c) \
255 LDVW((c)) = (ldvTime << LDV_SHIFT) | LDV_STATE_CREATE
258 The above definition of @LDV_recordCreate()@ works without any problem
259 even for retainer profiling: during retainer profiling,
260 a retainer set field (@hp.ldvw@) must be initialized to a null pointer.
261 Since @ldvTime@ is fixed to $0$, @LDV_recordCreate()@ initializes
262 retainer set fields correctly.
264 \section{Heap Censuses}
265 \label{sec-heap-censuses}
267 The LDVU profiler performs heap censuses periodically by invoking the
268 function @LdvCensus()@. Because we need to know exactly which
269 closures in the heap are live at census time, we always precede the
270 census with a major garbage collection.
272 During a census, we examine each closure one by one and compute the
273 following three quantities:
276 \item the total size of all \emph{inherently used} closures.
277 \item the total size of all closures which have not been used (yet).
278 \item the total size of all closures which have been used at least once.
281 For most closures, a \emph{use} consists of entering the closure. For
282 unlifted objects which are never entered (eg. @ARR_WORDS@), it would
283 be difficult to determine their points of use because such points are
284 scattered around the implementation in various primitive operations.
285 For this reason we consider all unlifted objects as ``inherently
286 used''. The following types of closures are considered to be
287 inherently used: @TSO@, @MVAR@, @MUT_ARR_PTRS@, @MUT_ARR_PTRS_FROZEN@,
288 @ARR_WORDS@, @WEAK@, @MUT_VAR@, @MUT_CONS@, @FOREIGN@, @BCO@, and
291 The three quantities are stored in an @LdvGenInfo@ array @gi[]@.
292 @gi[]@ is indexed by time period. For instance, @gi[ldvTime]@ stores
293 the three quantaties for the current global time period. The
294 structure @LdvGenInfo@ is defined as follows:
299 int inherentlyUsed; // total size of 'inherently used' closures
300 int notUsed; // total size of 'not used yet' closures
301 int used; // total size of 'used at least once' closures
306 The above three quantities account for mutually exclusive sets of closures.
307 In other words, if a closure is not inherently used, it belongs to
308 either the second or the third.
310 \subsection{Taking a Census of the Live Heap}
312 During a heap census, we need to visit every live closure once, so we
313 perform a linear scan of the live heap after a major GC. We can take
314 advantage of the following facts to implement a linear scan for heap
318 \item The nursery is empty. The small object pool and the large object pool,
319 however, may \emph{not} be empty. This is because the garbage collector
320 invokes @scheduleFinalizer()@ after removing dead closures, and
321 @scheduleFinalizer()@ may create new closures through @allocate()@.
322 \item @IND@, @IND_OLDGEN@, and @EVACUATED@ closures do not appear in
326 There is one small complication when traversing the live heap: the
327 garbage collector may have replaced @WEAK@ objects with @DEAD_WEAK@
328 objects, which have a smaller size and hence leave some space before
329 the next object. To avoid this problem we change the size of
330 @DEAD_WEAK@ objects to match that of @WEAK@ objects when profiling is
331 enabled (see @StgMiscClosures.hc@).
333 \section{Destruction of Closures}
335 In order to compute the total size of closures for each of the four
336 phases, we must report the destruction of every closure (except
337 inherently used closures) to the LDVU profiler by invoking
338 @LDV_recordDead()@. @LDV_recordDead()@ must not be called on any
339 inherently used closure because any invocation of @LDV_recordDead()@
340 affects the statistics regarding void and drag phases, which no
341 inherently used closure can be in.
343 @LDV_recordDead()@ updates two fields @voidNew@ and @dragNew@ in the
344 @LdvGenInfo@ array @gi[]@:
355 @gi[ldvTime].voidNew@ accumulates the size of all closures satisfying
356 the following two conditions: 1) observed during the heap census at
357 time @ldvTime@; 2) now known to have been in the void phase at time
358 @ldvTime@. It is updated when a closure which has never been used is
359 destroyed. Suppose that a closure @c@ which has never been used is
360 about to be destroyed. If its creation time is $t_c$, we judge that
361 @c@ has been in the void phase all its lifetime, namely, from time
362 $t_c$ to @ldvTime@. Since @c@ will not be observed during the next
363 heap census, which corresponds to time @ldvTime@, @c@ contributes to
364 the void phase of times $t_c$ through @ldvTime@ - 1. Therefore, we
365 increase the @voidNew@ field of @gi[@$t_c$@]@ through @gi[ldvTime - 1]@
366 by the size of @c@.\footnote{In the actual implementation, we
367 update @gi[$t_c$]@ and @gi[ldvTime]@ (not @gi[ldvTime@$ - $1@]@) only:
368 @gi[$t_c$]@ and @gi[ldvTime]@ are increased and decreased by the size
369 of @c@, respectively. After finishing the program execution, we can
370 correctly adjust all the fields as follows: @gi[$t_c$]@ is computed as
371 $\sum_{i=0}^{t_c}$@gi[$i$]@. }
373 @gi[ldvTime].dragNew@ accumulates the size of all closures satisfying the following
374 two conditions: 1) observed during the heap census at time @ldvTime@;
375 2) now known to have been in the drag phase at time @ldvTime@.
376 It is updated when a closure which has been used at least once is destroyed.
377 Suppose that a closure @c@ which has been used last at time $t_l$ is about to
379 We judge that @c@ has been in the drag phase from time $t_l + 1$ to
380 time @ldvTime@$ - 1$ (if $t_l + 1 > $@ldvTime@$ - 1$, nothing happens).
381 Therefore, we increase the @dragNew@ field of @gi[@$t_l + 1$@]@ through
382 @gi[ldvTime@$ - 1$@]@
383 by the size of @c@.\footnote{As in the case of @voidNew@, we update
384 @gi[@$t_l + 1$@]@ and @gi[ldvTime]@ only.}
386 Now we need to find out all the cases of closure destruction.
387 There are four cases in which a closure is destroyed:
390 \item A closure is overwritten with a blackhole:
391 @UPD_BH_UPDATABLE()@ and @UPD_BH_SINGLE_ENTRY()@ in @includes/StgMacros.h@,
392 @threadLazyBlackHole()@ and @threadSqueezeStack()@ in @GC.c@,
393 the entry code for @BLACKHOLE@ closures in @StgMiscClosures.hc@ (a
394 @BLACKHOLE@ closure is changed into a @BLACKHOLE_BQ@ closure).
395 We call either @LDV_recordDead()@ or @LDV_recordDead_FILL_SLOP_DYNAMIC()@.
397 \item A weak pointer is overwritten with a dead weak pointer:
398 @finalizzeWeakzh_fast()@ in @PrimOps.hc@,
399 @finalizeWeakPointersNow()@ and @scheduleFinalizers()@ in @Weak.c@.
400 Since a weak pointer is inherently used, we do not call @LDV_recordDead()@.
402 \item A closure is overwritten with an indirection closure:
403 @updateWithIndirection()@ and @updateWithPermIndirection()@ in @Storage.h@,
404 @scavenge()@ in @GC.c@, in which an @IND_PERM@ closure is explicitly replaced
405 with an @IND_OLDGEN_PERM@ closure during scavenging.
406 We call either @LDV_recordDead()@ or @LDV_recordDead_FILL_SLOP_DYNAMIC()@.
408 \item Closures are removed permanently from the graph during garbage
409 collections. We locate and dispose of all dead closures by linearly
410 scanning the from-space right before tidying up. This is feasible
411 because any closures which is about to be removed from the graph still
412 remains in the from-space until tidying up is completed. The next
413 subsection explains how to implement this idea.
416 \subsection{Linear scan of the from-space during garbage collections}
418 In order to implement linear scan of the from-space during a garbage collection
420 we need to take into consideration the following facts:
423 \item The pointer @free@ of a block in the nursery may incorrectly point to
424 a byte past its actual boundary.
426 the Haskell mutator first increases @hpLim@ without comparing it with the
427 actual boundary when allocating fresh memory for a new closure.
428 @hpLim@ is later assigned to the pointer @free@ of the corresponding memory
429 block, which means that during a heap census, the pointer @hpLim@ may not
431 Notice that @hpLim@ is not available during LDVU profiling; it is valid
432 only during the Haskell mutator time.
434 \item The from-space may well contain a good number of @EVACUATED@ closures,
435 and they must be skipped over.
437 \item The from-space includes the nursery.
438 Furthermore, a closure in the nursery may not necessarily be adjacent to the next
439 closure because slop words may lie between the two closures;
440 the Haskell mutator may allocate more space than actually needed in the
441 nursery when creating a closure, potentially leaving slop words.
444 The first problem is easily solved by limiting the scan up to the
445 actual block boundary for each nursery block (see
446 @processNurseryForDead()@). In other words, for a nursery block
447 descriptor @bd@, whichever of @bd->start@$ + $@BLOCK_SIZE_W@ and
448 @bd->free@ is smaller is used as the actual boundary.
450 We solve the second problem by exploiting LDV words of @EVACUATED@
451 closures: we store the size of an evacuated closure, which now resides
452 in the to-space, in the LDV word of the new @EVACUATED@ closure
453 occupying its memory. This is easily implemented by inserting a call
454 to the macro @SET_EVACUAEE_FOR_LDV()@ in @copy()@ and @copyPart()@ (in
455 @GC.c@). Thus, when we encounter an @EVACUATED@ closure while
456 linearly scanning the nursery, we can skip a correct number of words
457 by referring to its LDV word.
459 The third problem could be partially solved by always monitoring @Hp@
460 during the Haskell mutator time: whenever @Hp@ is increased, we fill
461 with zeroes as many words as the change of @HP@. Then, we could skip
462 any trailing zero words when linearly scanning the nursery.
463 Alternatively we could initialize the entire nursery with zeroes after
464 each garbage collection and not worry about any change made to @Hp@
465 during the Haskell mutator time. The number of zero words to be
466 written to the nursery could be reduced in the first approach, for we
467 do not have to fill the header for a new closure. Nevertheless we
468 choose to employ the second approach because it simplifies the
469 implementation code significantly (see @resetNurseries()@ in
470 @Storage.c@). Moreover, the second approach compensates for its
471 redundant initialization cost by providing faster execution (due to a
472 single memory write loop in contrast to frequent memory write loops in
473 the first approach). Also, we attribute the initialization cost to
474 the runtime system and thus the Haskell mutator behavior is little
477 There is further complication though: occasionally a closure is
478 overwritten with a closure of a smaller size, leaving some slop
479 between itself and the next closure in the heap. There are two cases:
482 \item A closure is overwritten with a blackhole.
483 \item A closure is overwritten with an indirection closure.
486 In either case, an existing closure is destroyed after being replaced
487 with a new closure. If the two closures are of the same size, no slop
488 words are introduced and we only need to invoke @LDV_recordDead()@ on
489 the existing closure, which cannot be an inherently used closure. If
490 not, that is, the new closure is smaller than the existing closure
491 (the opposite cannot happen), we need to fill one or more slop words
492 with zeroes as well as invoke @LDV_recordDead()@ on the existing
493 closure. The macro @LDV_recordDead_FILL_SLOP_DYNAMIC()@ accomplishes
494 these two tasks: it determines the size of the existing closure,
495 invokes @LDV_recordDead()@, and fills the slop words with zeroes.
496 After excluding all cases in which the two closures are of the same
497 size, we invoke @LDV_recordDead_FILL_SLOP_DYNAMIC()@ only from:
500 \item @threadLazyBlackHole()@ and @threadSqueezeStack()@ in @GC.c@
501 (for lazy blackholing),
502 \item @UPD_BH_UPDATABLE()@ and @UPD_BH_SINGLE_ENTRY()@ in
503 @includes/StgMacros.h@ (for eager blackholing, which isn't the
505 \item @updateWithIndirection()@ and @updateWithPermIndirection()@
506 in @Storage.h@.\footnote{Actually slop words created in
507 @updateWithIndirection()@ cannot survive major garbage collections.
508 Still we invoke @LDV\_recordDead\_FILL\_SLOP\_DYNAMIC()@ to support linear
509 scan of the heap during a garbage collection, which is discussed in the next
513 The linear scan of the from-space is initiated by the garbage
514 collector. From the function @LdvCensusForDead()@, every dead closure
515 in the from-space is visited through an invocation of
516 @processHeapClosureForDead()@.
518 \subsection{Final scan of the heap}
520 Since a closure surviving the final garbage collection is implicitly destroyed
521 when the runtime system shuts down, we must invoke @processHeapClosureForDead@
522 on \emph{every} closure in the heap once more after the final garbage collection.
523 The function @LdvCensusKillAll()@, which is invoked from @shutdownHaskell()@
524 in @RtsStartup.c@, traverses the entire heap and visits each closure.
525 It also stops LDVU profiling by resetting @ldvTime@ to $0$.
527 It may be that after LDVU profiling stops, new closures may be created
528 and even garbage collections may be performed.
529 We choose to ignore these closures because they are all concerned about
530 finalizing weak pointers (in @finalizeWeakPointersNow()@).
531 It can be catastrophic to invoke @LdvCensusKillAll()@ after finishing
532 @finalizeWeakPointersNow()@: @finalizeWeakPointersNow()@ calls
533 @rts_evalIO()@, which is essentially initiating a new program execution,
534 and no assumptions made upon LDVU profiling hold any longer.
536 \section{Time of Use}
538 In order to yield correct LDVU profiling results, we must make sure
539 that @LDV_recordUse()@ be called on a closure whenever it is used;
540 otherwise, most of closures would be reported to be in the void phase.
541 @includes/StgLdvProf.h@ provides an entry macro @LDV_ENTER@ which
542 expands to @LDV_recordUse()@. The compiler arranges to invoke
543 @LDV_ENTER@ in the entry code for every dynamic closure it generates
544 code for (constructors, thunks and functions). We also have to add
545 @LDV_ENTER@ calls to the closures statically compiled into the RTS:
546 @PAP@s, @AP_UPD@s, standard thunk forms (in @StgStdThunks.hc@, and
547 several others in @StgMiscClosures.hc@.
549 \section{Computing Final Statistics}
551 After the final scan of the heap, we can accurately determine the total
552 size of closures in one of the four phases at the moment of each heap census.
553 The structure @LdvGenInfo@ is augmented with two additional fields
554 @voidTotal@ and @dragTotal@:
565 @gi[@$i$@].voidTotal@ and @gi[@$i$@].dragTotal@ are computed
566 from @gi[@$i$@].voidNew@ and @gi[@$i$@].dragNew@, respectively.\footnote{Due
567 to a slight optimization described before, @gi[@$i$@].voidTotal@ is actually
568 computed as $\sum_{1 \leq j \leq i}$@gi[@$j$@].voidNew@.
569 @gi[@$i$@].dragTotal@ is computed in a similar way.}
570 Then, the total size of closures in the lag phase @gi[@$i$@].lagTotal@ is computed
571 as @gi[@$i$@].notUsed@$-$@gi[@$i$@].voidTotal@ (because any unused closure
572 is either in the void phase or in the lag phase).
574 the total size of closures in the use phase @gi[@$i$@].useTotal@ is computed
575 as @gi[@$i$@].used@$-$@gi[@$i$@].dragTotal@ (because any used closure
576 is either in the use phase or in the drag phase).
577 @endLdvProfiling()@, called from @endHeapProfiling@ in @ProfHeap.c@, computes these
582 The runtime system option @-hL@ tells the executable program to
583 perform LDVU profiling and produce a @.hp@ file:
589 The option @-i@ can be used to
590 specify a desired interval at which LDVU profiling is performed.
591 The default and minimum value is half a second:
594 $ Foo.out +RTS -hL -i2.5 -RTS
597 The @.hp@ file can be supplied to the @hp2ps@ program to create a postscript
598 file showing the progress of LDVU profiling in a graph:
605 The horizontal axis of the graph is in the Haskell mutator time, which excludes
606 the runtime system time such as garbage collection time and LDVU profiling
608 The Haskell mutator runs a bit slower than it would without performing
609 LDVU profiling, but the difference is minute.
610 Also, the timer employed to periodically perform retainer profiling
611 is not perfectly accurate. Therefore, the result may slightly vary for each
612 execution of retainer profiling.
614 \textbf{To do:} Currently the LDVU profiling is not supported with @-G1@ option.
616 \textbf{To do:} When we perform LDVU profiling, the Haskell mutator time seems to
617 be affected by @-S@ or @-s@ runtime option. For instance, the following
618 two options should result in nearly same profiling outputs, but
619 the second run (without @-Sstderr@ option) spends almost twice as
620 long in the Haskell mutator as the first run:
621 1) @+RTS -Sstderr -hL -RTS@; 2) @+RTS -hL -RTS@.
622 This is quite a subtle bug because this wierd phenomenon is not
623 observed in retainer profiling, yet the implementation of
624 @mut_user_time_during_LDV()@ is completely analogous to that of
625 @mut_user_time_during_RP()@. The overall shapes of the resultant graphs
626 are almost the same, though.
630 This section gives a summary of changes made to the GHC in
631 implementing LDVU profiling.
632 Only three files (@includes/StgLdvProf.h@, @LdvProfile.c@, and
633 @LdvProfile.h@) are new, and all others exist in the GHC.
635 @\includes@ directory:
638 \item[StgLdvProf.h] defines type @LdvGenInfo@, constants, and macros related
640 \item[ClosureMacros.h] changes macro @SET_PROF_HDR()@.
641 \item[Stg.h] includes th header file @StgLdvProf.h@.
642 \item[StgMacros.h] changes macros @UPD_BH_UPDATABLE()@ and @UPD_BH_SINGLE_ENTRY()@.
648 \item[GC.c] invokes @LdvCensusForDead()@ before tidying up, sets @hasBeenAnyGC@ to
649 @rtsTrue@, and changes @copy()@ and @copyPart()@.
650 Invokes @LDV_recordDead()@ and @LDV_recordDead_FILL_SLOP_DYNAMIC()@.
651 \item[Itimer.c] changes @handle_tick()@.
652 \item[LdvProfile.c] implements the LDVU profiling engine.
653 \item[LdvProfile.h] is the header for @LdvProfile.c@.
654 \item[PrimOps.hc] changes @finalizzeWeakzh_fast()@.
655 \item[ProfHeap.c] changes @initHeapProfiling()@ and @endHeapProfiling()@.
656 \item[Profiling.c] changes @initProfilingLogFile@ and @report_ccs_profiling()@.
657 \item[Proftimer.c] declares @ticks_to_retainer_ldv_profiling@,
658 @performRetainerLdvProfiling@, and @doContextSwitch@.
659 \item[Proftimer.h] is the header for @Proftimer.c@. Defines @PROFILING_MIN_PERIOD@,
660 which specifies the minimum profiling period and the default profiling period.
661 %\item[RtsAPI.c] implements @setProfileHeader()@.
663 sets @RtsFlags.ProfFlags.doHeapProfile@,
664 adds a string to @usage_text[]@ in @setupRtsFlags()@.
665 \item[RtsFlags.h] defines constants @HEAP_BY_LDV@ and @LDVchar@.
666 \item[RtsStartup.c] changes @shutDownHaskell()@.
667 \item[Schedule.c] changes @schedule()@.
669 declares @LDV_start_time@, @LDV_tot_time@, @LDVe_start_time@,
671 Changes @mut_user_time_during_GC()@, @mut_user_time()@,
673 @stat_endExit()@, and
676 @mut_user_time_during_LDV()@,
677 @stat_startLDV()@, and
679 \item[Stats.h] is hte header for @Stats.c@.
680 \item[StgMiscClosures.hc] inserts entry macros in
681 @stg_IND_entry()@, @stg_IND_PERM_entry()@, @stg_IND_OLDGEN_entry()@,
682 @stg_IND_OLDGEN_PERM_entry()@, @stg_BLACKHOLE_entry()@, @stg_BLACKHOLE_BQ_entry()@,
683 and @stg_CAF_BLACKHOLE_entry()@.
684 Invokes @LDV_recordDead()@ in @stg_BLACKHOLE_entry@.
685 Redefines @stg_DEAD_WEAK_info@.
686 \item[Storage.c] changes @initStorage()@, @resetNurseries()@, and @allocNursery()@.
687 \item[Storage.h] changes @updateWithIndirection()@ and @updateWithPermIndirection()@.
688 \item[Updates.hc] inserts entry macros in @stg_PAP_entry()@ and @stg_AP_UPD_entry()@.
689 \item[Weak.c] changes @scheduleFinalizers()@.
692 \bibliographystyle{plain}
693 \bibliography{reference}