1 \documentclass{article}
2 \usepackage{code,a4wide}
4 \usepackage{graphics,epsfig,epic,eepic,epsfig}
6 \setlength{\parskip}{0.25cm}
7 \setlength{\parsep}{0.25cm}
8 \setlength{\topsep}{0cm}
9 \setlength{\parindent}{0cm}
10 \renewcommand{\textfraction}{0.2}
11 \renewcommand{\floatpagefraction}{0.7}
15 \newcommand{\block}{block}
16 \newcommand{\Block}{Block}
17 \newcommand{\segment}{segment}
18 \newcommand{\Segment}{Segment}
19 \newcommand{\step}{step}
20 \newcommand{\Step}{Step}
22 \newcommand{\note}[1]{{\em $\spadesuit$ #1}}
25 \title{Implementation of Lag/Drag/Void/Use Profiling}
31 \section{Lag/Drag/Void/Use Profiling}
33 \emph{Lag/Drag/Void/Use} (LDVU) profiling~\cite{RR} is a profiling technique
34 which yields a summary of the biography of all the dynamic closures created
35 during program execution.
36 In this profiling scheme,
37 the biography of a closure is determined by four important events associated
38 with the closure: \emph{creation}, \emph{first use},
39 \emph{last use}, and \emph{destruction} (see Figure~\ref{fig-ldv}).
40 The intervals between these successive events correspond to three phases
41 for the closure: \emph{lag} (between creation and first use),
42 \emph{use} (between first use and last use), and
43 \emph{drag} (between last use and destruction).
44 If the closure is never used, it is considered to remain in the \emph{void}
45 phase all its lifetime.
50 \caption{The biography of a closure}
55 The LDVU profiler regularly performs heap censuses during program execution.
56 Each time a heap census is performed, the LDVU profiler increments a global
57 time, which is used for timing all the events (such as creation and destruction
58 of a closure) occurring during program execution.
59 Hence, for instance, all closures creating between two successive heap censuses
60 have the same creation time and belong to the same \emph{generation}.\footnote{In
61 this document, a generation is related with heap censuses, not garbage collections
62 as in other profiling schemes.}
63 After the program terminates, it yields a post-mortem report on how much
64 of the \emph{live} graph is in one of the four phases at the moment of each
67 It must be emphasized that the LDVU profiler considers only live closures;
68 it should not take into consideration dead closures which do not constitute
69 the graph. Therefore, the result of LDVU profiling does not depend on the
70 frequency of garbage collections.
72 This document describes the implementation of LDVU profiling on the Glasgow
73 Haskell Compiler runtime system.\footnote{Unless otherwise noted, all identifiers
74 are defined in @LdvProfile.c@}.
76 \section{An Overview of the Implementation}
78 Every closure is augmented with an additional word in its profiling header
79 to accommodate three additional pieces of information:
80 1) state flag indicating whether the closure has been used at least once or not.
81 2) creation time; 3) time of most recent use if any so far.
82 We refer to such a word as an LDV word.
84 The LDVU profiler maintains a global time, stored in @ldvTime@.
85 It is incremented each time a heap census is performed.
86 During a heap census, the profiler scans all live closures and computes the
88 1) the total size of all closures which have never been used;
89 2) the total size of all closures which have been used at least once
90 in the past.\footnote{There is another category of closures, namely,
91 \emph{inherently used} closures. We will explain
92 in Section~\ref{sec-heap-censuses}.}
93 It is not until the whole program execution finishes that the profiler
94 can actually decide the total size corresponding to each of the four phases for
95 a particular heap census. It is only when a closure is destroyed that the profiler
96 can determine how long the closure has been in a specific phase.
97 Therefore, it is not sufficient to perform heap censuses periodically in order to
98 compute the profiling statistics: the runtime system needs to intercept
99 all events associated with any closures and update necessary information.
101 All events associated with closures are handled by one of the three
103 in @includes/StgLdv.h@: @LDV_recordCreate()@, @LDV_recordUse()@, and
107 \item{@LDV_recordCreate()@} is called when a closure is created and updates its
110 \item{@LDV_recordUse()@} is called when a closure is used and updates its most recent
113 \item{@LDV_recordDead()@} is called when a closure @c@ is removed from the graph.
114 It does not update its LDV word (because @c@ is about to be destroyed).
115 Instead, it updates the statistics on LDVU profiling according to the following
117 if @c@ has never been used (which is indicated by the state flag in its LDV
119 @c@ contributes to the void phase from its creation time to the last census
120 time; if @c@ was used at least once (which is also indicated by the state flag),
121 @c@ contributes to the @drag@ phase after its last use time.
124 At the end of the program execution, the profiler performs a last census during
125 which all closures in the heap are declared to be dead and @LDV_recordDead()@
126 is invoked on each of them.
127 Then, the profiler computes the final statistics.
131 We choose to share the LDV word for both retainer profiling and LDVU
132 profiling, which cannot take place simultaneously.
133 This is the reason why there is a
134 union structure inside the @StgProHeader@ structure.
135 The field @hp.ldvw@ in the @StgProfHeader@ structure corresponds to the LDV
141 retainerSet *rs; // Retainer Set
142 StgWord ldvw; // Lag/Drag/Void Word
146 For instance, the LDV word of a closure @c@ can now be accessed with
147 @c->header.prof.hp.ldvw@ (or by @LDVW(c)@ where @LDVW()@ is a macro in
148 @includes/StgLdvProf.h@).
150 An LDV word is divided into three fields, whose position is specified
151 by three constants in @includes/StgLdvProf.h@:
153 \item{@LDV_STATE_MASK@} corresponds to the state flag.
154 \item{@LDV_CREATE_MASK@} corresponds to the creation time.
155 \item{@LDV_LAST_MASK@} corresponds to the most recent use time.
157 The constant @LDV_SHIFT@ specifies how many bits are allocated for
158 creation time or most recent use time.
159 For instance, the creation time of a closure @c@ can be obtained by
160 @(LDVW(c) & LDV_CREATE_MASK) >> LDV_SHIFT@.
162 The creation time field and the most recent use time field can be set only by the
163 macros @LDV_recordCreate()@ and @LDV_recordUse()@.
164 @LDV_recordCreate()@ must be called whenever a new dynamic closure is created,
165 and this is handily accomplished by rewriting the macro @SET_PROF_HDR()@
166 (in @includes/ClosureMacros.h@) (we do not need to change @SET_STATIC_PROF_HDR()@
167 because static closures are not involved in LDVU profiling at all):
170 #define SET_PROF_HDR(c,ccs_) \
171 ((c)->header.prof.ccs = ccs_, \
172 LDV_recordCreate((c)))
175 There are a few cases in which the info table of a closure changes through
176 an explicit invocation of @SET_INFO()@ or a direct assignment to its @header.info@
177 field: 1) an indirection closure is replaced by an old-generation
178 indirection closure; 2) a thunk is replaced by a blackhole; 3) a thunk is replaced
179 by an indirection closure when its evaluation result becomes available.
181 \emph{We regard such a situation as
182 the destruction of an old closure followed by the creation of a new closure
183 at the same memory address.}\footnote{This would be unnecessary if the two closures
184 are of the same size, but it is not always the case. We choose to distinguish
185 the two closures for the sake of consistency.}
186 For instance, when an @IND_PERM@ closure is replaced by an @IND_OLDGEN_PERM@
187 closures (during scavenging in @GC.c@), we wrap the invocation of @SET_INFO()@ with
188 the invocations of @LDV_recordDead()@ and @LDV_recordCreate()@ as follows
189 (@LDV_recordDead()@ requires the actual size of the closures being destroyed):
192 LDV_recordDead((StgClosure *)p, sizeofW(StgInd) - sizeofW(StgProfHeader));
193 SET_INFO(((StgClosure *)p), &stg_IND_OLDGEN_PERM_info);
194 LDV_recordCreate((StgClosure *)p);
198 A direct assignment to the @header.info@ field implies that its cost centre
199 field is not initialized. This is no problem in the case of @EVACUATED@ closures
201 not be used again after a garbage collection. However, I am not sure if this is safe
202 for @BLACKHOLE_BQ@ closures (in @StgMiscClosures.hc@) when retainer profiling,
203 which employs cost centre stacks, is going on.
204 If it is safe, please leave a comment there.
206 @LDV_recordUse()@ is called on a closure whenever it is used, or \emph{entered}.
207 Its state flag changes if necessary to indicate that it has been used, and
208 the current global time is stored in its last use time field.
210 \section{Global Time \texttt{ldvTime} and Retainer Profiling}
212 The global time, stored in @ldvTime@, is used primarily to record the number
213 of times heap censuses have been performed for LDV profiling.
214 It is initialized to $1$ and incremented each time a heap census is completed
215 through an invocation of @LdvCensus()@ (i.e., at the end of @LdvCensus()@).
216 Its implication is that
217 all closures created between two successive invocations of @LdvCensus()@
218 have the same creation time.
219 Another implication is that
220 if a closure is used at least once between two successive heap
221 censuses, we consider the closure to be in the use phase
222 during the corresponding time period
223 (because we just set its last use time field to the current value
224 of @ldvTime@ whenever it is used).
225 Notice that a closure with a creation time $t_c$ may be destroyed before
226 the major garbage collection before the actual heap census for time $t_c$ and thus
227 may \emph{not} be observed during the heap census for time $t_c$.
228 Such a closure does not affect the profiling statistics at all.
230 In addition, the global time @ldvTime@ indicates
231 which of LDVU profiling and retainer profiling is currently active:
232 during LDVU profiling, it is initialized to $1$ in @initLdvProfiling()@
233 and then increments as LDVU profiling proceeds;
234 during retainer profiling, however, it is always fixed to $0$.
236 Thus, wherever a piece of code shared by both retainer profiling and
237 LDVU profiling comes to play, we usually need to first examine the value of @ldvTime@
238 if necessary. For instance, consider the macro @LDV_recordUse()@:
241 #define LDV_recordUse(c) \
243 LDVW((c)) = (LDVW((c)) & LDV_CREATE_MASK) | ldvTime | LDV_STATE_USE;
246 If retainer profiling is being performed, @ldvTime@ is equal to $0$, and
247 @LDV_recordUse()@ causes no side effect.\footnote{Due to this interference
248 with LDVU profiling, retainer profiling slows down a bit; for instance,
249 checking @ldvTime@ against $0$ in the above example would always evaluate to
250 @rtsFalse@ during retainer profiling.
251 However, this is the price to be paid for our decision not to employ a
252 separate field for LDVU profiling.}
254 As another example, consider @LDV_recordCreate()@:
257 #define LDV_recordCreate(c) \
258 LDVW((c)) = (ldvTime << LDV_SHIFT) | LDV_STATE_CREATE
261 The above definition of @LDV_recordCreate()@ works without any problem
262 even for retainer profiling: during retainer profiling,
263 a retainer set field (@hp.ldvw@) must be initialized to a null pointer.
264 Since @ldvTime@ is fixed to $0$, @LDV_recordCreate()@ initializes
265 retainer set fields correctly.
267 \section{Heap Censuses}
268 \label{sec-heap-censuses}
270 The LDVU profiler performs heap censuses periodically by invoking the
271 function @LdvCensus()@
272 (a heap census can take place at any random moment during program execution).
273 Since LDVU profiling deals only with live closures, however, we choose to have
274 a heap census preceded by a major garbage collection so that only live
275 closures reside in the heap
276 (see @schedule()@ in @Schedule.c@).\footnote{It turns out that this
277 choice does not considerably simplify the implementation; we need to
278 finish a complete linear scan of the entire live heap at any rate.
279 As a side note, in~\cite{RR}, a heap census is \emph{followed} by a garbage
280 collection, which is the opposite of our strategy.}
281 This design choice is necessary for the LDVU profiling result not to be
282 affected by the amount of heap memory available to programs.
283 Without a a major garbage collection performed before a heap census,
284 the size of closures in the drag or void phases would heavily depend on
285 the amount of available heap memory.
287 During a census, we examine each closure one by one and compute the following
291 \item the total size of all \emph{inherently used} closures.
292 \item the total size of all closures which have never been used.
293 \item the total size of all closures which have been used at least once.
296 An inherently used closure is, by definition, one which is deemed to
297 be in use even at the point of its birth. A closure which cannot be
298 entered (e.g., @ARR_WORDS@) belongs to this category (otherwise
299 such a closure would stay in the void phase all the time).
300 In the current implementation, the following types of closures are
301 considered to be inherently used:
302 @TSO@, @MVAR@, @MUT_ARR_PTRS@, @MUT_ARR_PTRS_FROZEN@, @ARR_WORDS@,
303 @WEAK@, @MUT_VAR@, @MUT_CONS@, @FOREIGN@, @BCO@, and @STABLE_NAME@.
305 The three quantities are stored in an @LdvGenInfo@ array @gi[]@.
306 @gi[]@ is indexed by time.
307 For instance, @gi[ldvTime]@ stores the three quantaties for the current
309 The structure @LdvGenInfo@ is defined as follows:
314 int inherentlyUsed; // total size of 'inherently used' closures
315 int notUsed; // total size of 'never used' closures
316 int used; // total size of 'used at least once' closures
321 The above three quantities account for mutually exclusive sets of closures.
322 In other words, if a closure is not inherently used, it belongs to
323 either the second or the third.
325 \subsection{Linear Scan of the Heap during Heap Censuses}
327 During a heap census, we need to visit every live closure once, and a linear
328 scan of the heap is sufficient to our goal.
329 Since we assume that a major garbage collection cleans up the heap before
330 any heap census, we can take advantage of the following facts to implement
331 a linear scan for heap censuses:
334 \item The nursery is empty. The small object pool and the large object pool,
335 however, may \emph{not} be empty. This is because the garbage collector
336 invokes @scheduleFinalizer()@ after removing dead closures, and
337 @scheduleFinalizer()@ may create new closures through @allocate()@.
338 \item @IND@, @IND_OLDGEN@, and @EVACUATED@ closures do not appear in the heap.
341 The implementation of the linear scan strategy is complicated by the
343 First, either the small object pool (accessible from @small_alloc_list@)
344 or the large object pool (accessible from @g0s0->large_objects@)
345 may contain an initial evaluation stack, which is allocated via @allocate()@ in
346 @initModules()@ (in @RtsStartup.c@).
347 The initial evaluation stack is heterogenous from any closure; it may not
348 consist of consecutively allocated closures, which makes it
349 hard to discern the initial evaluation stack among ordinary closures (unless
350 we keep track of its position and size).
351 Second, a closure may not necessarily be adjacent to the next closure in the heap
352 because there may appear \emph{slop words} between two closures.
353 This happens when a closure is replaced by another closure with different
356 We solve the first problem simply by postponing heap censuses until the first
357 garbage collection, whether it is a major garbage collection or not.
358 This simple idea works because the initial evaluation stack is removed from
359 the heap during a garbage collection. The boolean variable @hasBeenAnyGC@
360 is set to @rtsTrue@ the first time a garbage collection is performed, and
361 hence @LdvCensus()@ returns immediately unless @hasBeenAnyGC@ has a @rtsTrue@
363 In practice, however, this premature return seldom happens: minor garbage
364 collections occur frequently enough that the first invocation of @LdvCensus()@ is
365 preceded by a minor garbage collection in most cases.
367 We solve the second problem by filling all possible slop words with zeroes.
368 Then we can find the next closure after any give closure by skipping zeroes
369 if any. First of all, we notice that slop words surviving a major garbage
370 collection can be generated only in the following two cases:
373 \item A closure is overwritten with a blackhole.
374 \item A closure is overwritten with an indirection closure.
375 %\item A weak pointer is overwritten with a dead weak pointer.
378 In either case, an existing closure is destroyed after being replaced with a
380 If the two closures are of the same size, no slop words are introduce and
381 we only need to invoke
382 @LDV_recordDead()@ on the existing closure, which cannot be an inherently used
384 If not, that is, the new closure is smaller than the existing closure
385 (the opposite cannot happen), we need to fill one or more slop words with zeroes
386 as well as invoke @LDV_recordDead()@ on the existing closure.
387 The macro @LDV_recordDead_FILL_SLOP_DYNAMIC()@ accomplishes these two tasks:
388 it determines the size of the existing closure, invokes @LDV_recordDead()@, and
389 fills the slop words with zeroes.
390 After excluding all cases in which the two closures are of the same size,
391 we invoke @LDV_recordDead_FILL_SLOP_DYNAMIC()@ only from:
394 \item @UPD_BH_UPDATABLE()@ and @UPD_BH_SINGLE_ENTRY()@ in @includes/StgMacros.h@,
395 @threadLazyBlackHole()@ and @threadSqueezeStack()@ in @GC.c@.
396 \item @updateWithIndirection()@ and @updateWithPermIndirection()@
397 in @Storage.h@.\footnote{Actually slop words created in
398 @updateWithIndirection()@ cannot survive major garbage collections.
399 Still we invoke @LDV\_recordDead\_FILL\_SLOP\_DYNAMIC()@ to support linear
400 scan of the heap during a garbage collection, which is discussed in the next
404 Notice that weak pointers being overwritten with dead weak pointers can be handled
405 properly without recourse to @LDV_recordDead_FILL_SLOP_DYNAMIC()@. The reason is
406 that dead weak pointers are of the same size as normal weak
407 pointers.\footnote{This is not the case without profiling. It was not the case
408 in the previous version of
409 GHC, either. See the comments on @stg\_DEAD\_WEAK\_info@ in @StgMiscClosures.hc@.}
411 \section{Destruction of Closures}
413 In order to compute the total size of closures for each of the four phases,
414 we must report the destruction of every closure (except inherently used closures)
415 to the LDVU profiler by invoking @LDV_recordDead()@.
416 @LDV_recordDead()@ must not be called on any inherently used closure
417 because any invocation of @LDV_recordDead()@ affects the
418 statistics regarding void and drag phases, which no inherently used
421 @LDV_recordDead()@ updates two fields @voidNew@ and @dragNew@ in the @LdvGenInfo@
433 @gi[ldvTime].voidNew@ accumulates the size of all closures satisfying the following
434 two conditions: 1) observed during the heap census at time @ldvTime@;
435 2) now known to have been in the void phase at time @ldvTime@.
436 It is updated when a closure which has never been used is destroyed.
437 Suppose that a closure @c@ which has never been used is about to be destroyed.
438 If its creation time is $t_c$, we judge that @c@ has been in the
439 void phase all its lifetime, namely, from time $t_c$ to @ldvTime@.
440 Since @c@ will not be observed during the next heap census, which corresponds
441 to time @ldvTime@, @c@ contributes to the void phase of times $t_c$ through
443 Therefore, we increase the @voidNew@ field of @gi[@$t_c$@]@ through @gi[ldvTime - 1]@
444 by the size of @c@.\footnote{In the actual implementation, we update @gi[$t_c$]@
445 and @gi[ldvTime]@ (not @gi[ldvTime@$ - $1@]@) only: @gi[$t_c$]@ and @gi[ldvTime]@
446 are increased and decreased by the size of @c@, respectively.
447 After finishing the program execution, we can correctly adjust all the fields
449 @gi[$t_c$]@ is computed as $\sum_{i=0}^{t_c}$@gi[$i$]@.
452 @gi[ldvTime].dragNew@ accumulates the size of all closures satisfying the following
453 two conditions: 1) observed during the heap census at time @ldvTime@;
454 2) now known to have been in the drag phase at time @ldvTime@.
455 It is updated when a closure which has been used at least once is destroyed.
456 Suppose that a closure @c@ which has been used last at time $t_l$ is about to
458 We judge that @c@ has been in the drag phase from time $t_l + 1$ to
459 time @ldvTime@$ - 1$ (if $t_l + 1 > $@ldvTime@$ - 1$, nothing happens).
460 Therefore, we increase the @dragNew@ field of @gi[@$t_l + 1$@]@ through
461 @gi[ldvTime@$ - 1$@]@
462 by the size of @c@.\footnote{As in the case of @voidNew@, we update
463 @gi[@$t_l + 1$@]@ and @gi[ldvTime]@ only.}
465 Now we need to find out all the cases of closure destruction.
466 There are four cases in which a closure is destroyed:
469 \item A closure is overwritten with a blackhole:
470 @UPD_BH_UPDATABLE()@ and @UPD_BH_SINGLE_ENTRY()@ in @includes/StgMacros.h@,
471 @threadLazyBlackHole()@ and @threadSqueezeStack()@ in @GC.c@,
472 the entry code for @BLACKHOLE@ closures in @StgMiscClosures.hc@ (a
473 @BLACKHOLE@ closure is changed into a @BLACKHOLE_BQ@ closure).
474 We call either @LDV_recordDead()@ or @LDV_recordDead_FILL_SLOP_DYNAMIC()@.
476 \item A weak pointer is overwritten with a dead weak pointer:
477 @finalizzeWeakzh_fast()@ in @PrimOps.hc@,
478 @finalizeWeakPointersNow()@ and @scheduleFinalizers()@ in @Weak.c@.
479 Since a weak pointer is inherently used, we do not call @LDV_recordDead()@.
481 \item A closure is overwritten with an indirection closure:
482 @updateWithIndirection()@ and @updateWithPermIndirection()@ in @Storage.h@,
483 @scavenge()@ in @GC.c@, in which an @IND_PERM@ closure is explicitly replaced
484 with an @IND_OLDGEN_PERM@ closure during scavenging.
485 We call either @LDV_recordDead()@ or @LDV_recordDead_FILL_SLOP_DYNAMIC()@.
487 \item Closures are removed permanently from the graph during garbage collections.
488 We locate and dispose of all dead closures by
489 linearly scanning the from-space right before tidying up.
490 This is feasible because any closures which is about to be removed from the graph
491 still remains in the from-space until tidying up is completed.
492 The next subsection explains how to implement this idea.
495 \textbf{To do:} if okay, replace the invocation of @SET_HDR()@ with that of
496 @SET_INFO()@ and remove @LDV_recordCreate()@ in some of the above cases.
498 \subsection{Linear scan of the from-space during garbage collections}
500 In order to implement linear scan of the from-space during a garbage collection
502 we need to take into consideration the following facts:
505 \item The from-space includes the nursery.
506 Furthermore, a closure in the nursery may not necessarily be adjacent to the next
507 closure because slop words may lie between the two closures;
508 the Haskell mutator may allocate more space than actually needed in the
509 nursery when creating a closure, potentially leaving slop words.
511 \item The pointer @free@ of a block in the nursery may incorrectly point to
512 a byte past its actual boundary.
514 the Haskell mutator first increases @hpLim@ without comparing it with the
515 actual boundary when allocating fresh memory for a new closure.
516 @hpLim@ is later assigned to the pointer @free@ of the corresponding memory
517 block, which means that during a heap census, the pointer @hpLim@ may not
519 Notice that @hpLim@ is not available during LDVU profiling; it is valid
520 only during the Haskell mutator time.
522 \item The from-space may well contain a good number of @EVACUATED@ closures,
523 and they must be skipped over.
526 The first problem could be solved by always monitoring @Hp@ during the Haskell
527 mutator time: whenever @Hp@ is increased, we fill with zeroes
528 as many words as the change of @HP@. Then, we could skip any trailing
529 zero words when linearly scanning the nursery.
530 Alternatively we could initialize the entire nursery with zeroes after
531 each garbage collection and not worry about any change made to @Hp@ during the
532 Haskell mutator time.
533 The number of zero words to be written to the nursery could be reduced
534 in the first approach, for we do not have to fill the header for a new closure.
535 Nevertheless we choose to employ the second approach because
536 it simplifies the implementation code significantly
537 (see @resetNurseries()@ in @Storage.c@).
538 Moreover, the second approach compensates for its redundant initialization cost
539 by providing faster execution (due to a single memory write loop in contrast
540 to frequent memory write loops in the first approach).
541 Also, we attribute the initialization cost to the runtime system and thus
542 the Haskell mutator behavior is little affected.
544 The second problem is easily solved by limiting the scan up to the actual
545 block boundary for each nursery block (see @processNurseryForDead()@).
546 In other words, for a nursery block descriptor @bd@,
547 whichever of @bd->start@$ + $@BLOCK_SIZE_W@ and @bd->free@ is smaller
548 is used as the actual boundary.
550 We solve the third problem by exploiting LDV words of @EVACUATED@ closures:
551 we store the size of an evacuated closure, which now resides in the to-space,
552 in the LDV word of the new @EVACUATED@ closure occupying its memory.
553 This is easily implemented by inserting a call to the macro
554 @SET_EVACUAEE_FOR_LDV()@ in @copy()@ and @copyPart()@ (in @GC.c@).
555 Thus, when we encounter an @EVACUATED@ closure while linearly scanning the
556 nursery, we can skip a correct number of words by referring to its LDV word.
558 The linear scan of the from-space is initiated by the garbage collector.
559 From the function @LdvCensusForDead()@, every dead closure in the from-space is
560 visited through an invocation of @processHeapClosureForDead()@.
562 \subsection{Final scan of the heap}
564 Since a closure surviving the final garbage collection is implicitly destroyed
565 when the runtime system shuts down, we must invoke @processHeapClosureForDead@
566 on \emph{every} closure in the heap once more after the final garbage collection.
567 The function @LdvCensusKillAll()@, which is invoked from @shutdownHaskell()@
568 in @RtsStartup.c@, traverses the entire heap and visits each closure.
569 It also stops LDVU profiling by resetting @ldvTime@ to $0$.
571 It may be that after LDVU profiling stops, new closures may be created
572 and even garbage collections may be performed.
573 We choose to ignore these closures because they are all concerned about
574 finalizing weak pointers (in @finalizeWeakPointersNow()@).
575 It can be catastrophic to invoke @LdvCensusKillAll()@ after finishing
576 @finalizeWeakPointersNow()@: @finalizeWeakPointersNow()@ calls
577 @rts_evalIO()@, which is essentially initiating a new program execution,
578 and no assumptions made upon LDVU profiling hold any longer.
580 \section{Time of Use}
582 In order to yield correct LDVU profiling results, we must make sure that
583 @LDV_recordUse()@ be called on a closure whenever it is used; otherwise,
584 most of closures would be reported to be in the void phase.
585 @includes/StgLdvProf.h@ provides nine entry macros (e.g., @LDV_ENT_THK()@).
586 Each of the entry macros corresponds to a type of closures which can be entered.
587 For instance, @LDV_ENT_THK()@ is supposed to be invoked whenever a thunk
588 is used. In the current implementation, all these macros expand to
589 @LDV_recordUse()@ with no additional work.
591 \textbf{To do:} modify the compiler so that it inserts the above macros
592 at appropriate places in the generated C code.
594 \section{Computing Final Statistics}
596 After the final scan of the heap, we can accurately determine the total
597 size of closures in one of the four phases at the moment of each heap census.
598 The structure @LdvGenInfo@ is augmented with two additional fields
599 @voidTotal@ and @dragTotal@:
610 @gi[@$i$@].voidTotal@ and @gi[@$i$@].dragTotal@ are computed
611 from @gi[@$i$@].voidNew@ and @gi[@$i$@].dragNew@, respectively.\footnote{Due
612 to a slight optimization described before, @gi[@$i$@].voidTotal@ is actually
613 computed as $\sum_{1 \leq j \leq i}$@gi[@$j$@].voidNew@.
614 @gi[@$i$@].dragTotal@ is computed in a similar way.}
615 Then, the total size of closures in the lag phase @gi[@$i$@].lagTotal@ is computed
616 as @gi[@$i$@].notUsed@$-$@gi[@$i$@].voidTotal@ (because any unused closure
617 is either in the void phase or in the lag phase).
619 the total size of closures in the use phase @gi[@$i$@].useTotal@ is computed
620 as @gi[@$i$@].used@$-$@gi[@$i$@].dragTotal@ (because any used closure
621 is either in the use phase or in the drag phase).
622 @endLdvProfiling()@, called from @endHeapProfiling@ in @ProfHeap.c@, computes these
627 The runtime system option @-hL@ tells the executable program to
628 perform LDVU profiling and produce a @.hp@ file:
634 The option @-i@ can be used to
635 specify a desired interval at which LDVU profiling is performed.
636 The default and minimum value is half a second:
639 $ Foo.out +RTS -hL -i2.5 -RTS
642 The @.hp@ file can be supplied to the @hp2ps@ program to create a postscript
643 file showing the progress of LDVU profiling in a graph:
650 The horizontal axis of the graph is in the Haskell mutator time, which excludes
651 the runtime system time such as garbage collection time and LDVU profiling
653 The Haskell mutator runs a bit slower than it would without performing
654 LDVU profiling, but the difference is minute.
655 Also, the timer employed to periodically perform retainer profiling
656 is not perfectly accurate. Therefore, the result may slightly vary for each
657 execution of retainer profiling.
659 \textbf{To do:} Currently the LDVU profiling is not supported with @-G1@ option.
663 This section gives a summary of changes made to the GHC in
664 implementing LDVU profiling.
665 Only three files (@includes/StgLdvProf.h@, @LdvProfile.c@, and
666 @LdvProfile.h@) are new, and all others exist in the GHC.
668 @\includes@ directory:
671 \item[StgLdvProf.h] defines type @LdvGenInfo@, constants, and macros related
673 \item[ClosureMacros.h] changes macro @SET_PROF_HDR()@.
674 \item[Stg.h] includes th header file @StgLdvProf.h@.
675 \item[StgMacros.h] changes macros @UPD_BH_UPDATABLE()@ and @UPD_BH_SINGLE_ENTRY()@.
681 \item[GC.c] invokes @LdvCensusForDead()@ before tidying up, sets @hasBeenAnyGC@ to
682 @rtsTrue@, and changes @copy()@ and @copyPart()@.
683 Invokes @LDV_recordDead()@ and @LDV_recordDead_FILL_SLOP_DYNAMIC()@.
684 \item[Itimer.c] changes @handle_tick()@.
685 \item[LdvProfile.c] implements the LDVU profiling engine.
686 \item[LdvProfile.h] is the header for @LdvProfile.c@.
687 \item[PrimOps.hc] changes @finalizzeWeakzh_fast()@.
688 \item[ProfHeap.c] changes @initHeapProfiling()@ and @endHeapProfiling()@.
689 \item[Profiling.c] changes @initProfilingLogFile@ and @report_ccs_profiling()@.
690 \item[Proftimer.c] declares @ticks_to_retainer_ldv_profiling@,
691 @performRetainerLdvProfiling@, and @doContextSwitch@.
692 \item[Proftimer.h] is the header for @Proftimer.c@.
693 \item[RtsAPI.c] implements @setProfileHeader()@.
695 sets @RtsFlags.ProfFlags.doHeapProfile@,
696 adds a string to @usage_text[]@ in @setupRtsFlags()@.
697 \item[RtsFlags.h] defines constants @HEAP_BY_LDV@ and @LDVchar@.
698 \item[RtsStartup.c] changes @shutDownHaskell()@.
699 \item[Schedule.c] changes @schedule()@.
701 declares @LDV_start_time@, @LDV_tot_time@, @LDVe_start_time@,
703 Changes @mut_user_time_during_GC()@, @mut_user_time()@,
705 @stat_endExit()@, and
708 @mut_user_time_during_LDV()@,
709 @stat_startLDV()@, and
711 \item[Stats.h] is hte header for @Stats.c@.
712 \item[StgMiscClosures.hc] inserts entry macros in
713 @stg_IND_entry()@, @stg_IND_PERM_entry()@, @stg_IND_OLDGEN_entry()@,
714 @stg_IND_OLDGEN_PERM_entry()@, @stg_BLACKHOLE_entry()@, @stg_BLACKHOLE_BQ_entry()@,
715 and @stg_CAF_BLACKHOLE_entry()@.
716 Invokes @LDV_recordDead()@ in @stg_BLACKHOLE_entry@.
717 Redefines @stg_DEAD_WEAK_info@.
718 \item[Storage.c] changes @initStorage()@, @resetNurseries()@, and @allocNursery()@.
719 \item[Storage.h] changes @updateWithIndirection()@ and @updateWithPermIndirection()@.
720 \item[Updates.hc] inserts entry macros in @stg_PAP_entry()@ and @stg_AP_UPD_entry()@.
721 \item[Weak.c] changes @scheduleFinalizers()@.
724 \bibliographystyle{plain}
725 \bibliography{reference}