1 \documentclass{article}
2 \usepackage{code,a4wide}
4 \usepackage{graphics,epsfig,epic,eepic,epsfig}
6 \setlength{\parskip}{0.25cm}
7 \setlength{\parsep}{0.25cm}
8 \setlength{\topsep}{0cm}
9 \setlength{\parindent}{0cm}
10 \renewcommand{\textfraction}{0.2}
11 \renewcommand{\floatpagefraction}{0.7}
15 \newcommand{\block}{block}
16 \newcommand{\Block}{Block}
17 \newcommand{\segment}{segment}
18 \newcommand{\Segment}{Segment}
19 \newcommand{\step}{step}
20 \newcommand{\Step}{Step}
22 \newcommand{\note}[1]{{\em $\spadesuit$ #1}}
25 \title{Implementation of Lag/Drag/Void/Use Profiling}
31 \section{Lag/Drag/Void/Use Profiling}
33 \emph{Lag/Drag/Void/Use} (LDVU) profiling~\cite{RR} is a profiling technique
34 which yields a summary of the biography of all the dynamic closures created
35 during program execution.
36 In this profiling scheme,
37 the biography of a closure is determined by four important events associated
38 with the closure: \emph{creation}, \emph{first use},
39 \emph{last use}, and \emph{destruction}.
40 The intervals between these successive events correspond to three phases
41 for the closure: \emph{lag} (between creation and first use),
42 \emph{use} (between first use and last use), and
43 \emph{drag} (between last use and destruction).
44 If the closure is never used, it is considered to remain in the \emph{void}
45 phase all its lifetime.
47 The LDVU profiler regularly performs heap censuses during program execution.
48 Each time a heap census is performed, the LDVU profiler increments a global
49 time, which is used for timing all the events (such as creation and destruction
50 of a closure) occurring during program execution.
51 Hence, for instance, all closures creating between two successive heap censuses
52 have the same creation time and belong to the same \emph{generation}.\footnote{In
53 this document, a generation is related with heap censuses, not garbage collections
54 as in other profiling schemes.}
55 After the program terminates, it yields a post-mortem report on how much
56 of the \emph{live} graph is in one of the four phases at the moment of each
59 It must be emphasized that the LDVU profiler considers only live closures;
60 it should not take into consideration dead closures which do not constitute
61 the graph. Therefore, the result of LDVU profiling does not depend on the
62 frequency of garbage collections.
64 This document describes the implementation of LDVU profiling on the Glasgow
65 Haskell Compiler runtime system.\footnote{Unless otherwise noted, all identifiers
66 are defined in @LdvProfile.c@}.
68 \section{An Overview of the Implementation}
70 Every closure is augmented with an additional word in its profiling header
71 to accommodate three additional pieces of information:
72 1) state flag indicating whether the closure has been used at least once or not.
73 2) creation time; 3) time of most recent use if any so far.
74 We refer to such a word as an LDV word.
76 The LDVU profiler maintains a global time, stored in @ldvTime@.
77 It is incremented each time a heap census is performed.
78 During a heap census, the profiler scans all live closures and computes the
80 1) the total size of all closures which have never been used;
81 2) the total size of all closures which have been used at least once
82 in the past.\footnote{There is another category of closures, namely,
83 \emph{inherently used} closures. We will explain this later.}
84 It is not until the whole program execution finishes that the profiler
85 can actually decide the total size corresponding to each of the four phases for
86 a particular heap census. It is only when a closure is destroyed that the profiler
87 can determine how long the closure has been in a specific phase.
88 Therefore, it is not sufficient to perform heap censuses periodically in order to
89 compute the profiling statistics: the runtime system needs to intercept
90 all events associated with any closures and update necessary information.
92 All events associated with closures are handled by one of the three
94 in @includes/StgLdv.h@: @LDV_recordCreate()@, @LDV_recordUse()@, and
96 @LDV_recordCreate()@ is called when a closure is created and updates its creation
98 @LDV_recordUse()@ is called when a closure is used and updates its most recent
101 @LDV_recordDead()@ is called when a closure @c@ is removed from the graph.
102 It does not update its LDV word (because @c@ is about to be destroyed).
103 Instead, it updates the statistics on LDVU profiling according to the following
105 if @c@ has never been used (which is indicated by its state flag),
106 @c@ contributes to the void phase from its creation time to the last census
107 time; if @c@ was used at least once (which is also indicated by its state flag),
108 @c@ contributes to the @drag@ phase after its last use time.
110 At the end of the program execution, the profiler performs a last census during
111 which all closures in the heap are declared to be dead and @LDV_recordDead()@
112 is invoked on each of them.
113 Then, the profiler computes the final statistics.
117 The field @hp.ldvw@ in the @StgProfHeader@ structure corresponds to the LDV
118 word:\footnote{We share the LDV word for both retainer profiling and LDVU
119 profiling, which cannot take place simultaneously. This is the reason why there is a
120 union structure inside the @StgProHeader@ structure.}
125 retainerSet *rs; // Retainer Set
126 StgWord ldvw; // Lag/Drag/Void Word
130 For instance, the LDV word of a closure @c@ can now be accessed with
131 @c->header.prof.hp.ldvw@ (or by @LDVW(c)@ where @LDVW()@ is a macro in
132 @includes/StgLdvProf.h@).
134 An LDV word is divided into three fields, whose position is specified
135 by three constants in @includes/StgLdvProf.h@:
136 @LDV_STATE_MASK@ (state flag), @LDV_CREATE_MASK@ (creation time), and
137 @LDV_LAST_MASK@ (most recent use time).
138 The constant @LDV_SHIFT@ specifies how many bits are allocated for
139 creation time or most recent use time.
140 For instance, the creation time of a closure @c@ can be obtained by
141 @(LDVW(c) & LDV_CREATE_MASK) >> LDV_SHIFT@.
143 The creation time field and the most recent use time field can be set only by the
144 macros @LDV_recordCreate()@ and @LDV_recordUse()@.
145 @LDV_recordCreate()@ must be called whenever a new dynamic closure is created,
146 and this is handily accomplished by rewriting the macro @SET_PROF_HDR()@
147 (in @includes/ClosureMacros.h@) (we do not need to change @SET_STATIC_PROF_HDR()@
148 because static closures are not involved in LDVU profiling at all):
151 #define SET_PROF_HDR(c,ccs_) \
152 ((c)->header.prof.ccs = ccs_, \
153 LDV_recordCreate((c)))
156 There are a few cases in which the info table of a closure changes through
157 an explicit invocation of @SET_INFO()@ or a direct assignment to its @header.info@
158 field: 1) an indirection closure is replaced by an old-generation
159 indirection closure; 2) a thunk is replaced by a blackhole; 3) a thunk is replaced
160 by an indirection closure when its evaluation result becomes available.\footnote{A
161 direct assignment to the @header.info@ field implies that its cost centre
162 field is not initialized. This is no problem in the case of @EVACUATED@ closures
164 not be used again after a garbage collection. However, I am not sure if this is safe
165 for @BLACKHOLE\_BQ@ closures (in @StgMiscClosures.hc@) when retainer profiling,
166 which employs cost centre stacks, is going on.
167 If it is safe, please leave a comment there.}
168 We regard such a situation as
169 the destruction of an old closure followed by the creation of a new closure
170 at the same memory address.\footnote{This would be unnecessary if the two closures
171 are of the same size, but it is not always the case. We choose to distinguish
172 the two closures for the sake of consistency.}
173 For instance, when an @IND_PERM@ closure is replaced by an @IND_OLDGEN_PERM@
174 closures (during scavenging in @GC.c@), we wrap the invocation of @SET_INFO()@ with
175 the invocations of @LDV_recordDead()@ and @LDV_recordCreate()@ as follows
176 (@LDV_recordDead()@ requires the actual size of the closures being destroyed):
179 LDV_recordDead((StgClosure *)p, sizeofW(StgInd) - sizeofW(StgProfHeader));
180 SET_INFO(((StgClosure *)p), &stg_IND_OLDGEN_PERM_info);
181 LDV_recordCreate((StgClosure *)p);
184 @LDV_recordUse()@ is called on a closure whenever it is used, or \emph{entered}.
185 Its state flag changes if necessary to indicate that it has been used, and
186 the current global time is stored in its last use time field.
188 \section{Global Time \texttt{ldvTime} and Retainer Profiling}
190 The global time, stored in @ldvTime@, is used primarily to record the number
191 of times heap censuses have been performed for LDV profiling.
192 It is initialized to $1$ and incremented each time a heap census is completed
193 through an invocation of @LdvCensus()@ (i.e., at the end of @LdvCensus()@).
194 Its implication is that
195 all closures created between two successive invocations of @LdvCensus()@
196 belong to the same generation and have the same creation time.
197 Another implication is that
198 if a closure is used at least once between two successive heap
199 censuses, we consider the closure to be in the use phase
200 during the corresponding time period
201 (because we just set its last use time field to the current value
202 of @ldvTime@ whenever it is used).
203 Notice that a closure with a creation time $t_c$ may be destroyed before
204 the major garbage collection before the actual heap census for time $t_c$ and thus
205 may \emph{not} be observed during the heap census for time $t_c$.
206 Such a closure does not affect the profiling statistics at all.
208 In addition, the global time @ldvTime@ indicates
209 which of LDVU profiling and retainer profiling is currently active:
210 during LDVU profiling, it is initialized to $1$ in @initLdvProfiling()@
211 and then increments as LDVU profiling proceeds;
212 during retainer profiling, however, it is always fixed to $0$.
214 Thus, wherever a piece of code shared by both retainer profiling and
215 LDVU profiling comes to play, we usually need to first examine the value of @ldvTime@
216 if necessary. For instance, consider the macro @LDV_recordUse()@:
219 #define LDV_recordUse(c) \
221 LDVW((c)) = (LDVW((c)) & LDV_CREATE_MASK) | ldvTime | LDV_STATE_USE;
224 If retainer profiling is being performed, @ldvTime@ is equal to $0$, and
225 @LDV_recordUse()@ causes no side effect.\footnote{Due to this interference
226 with LDVU profiling, retainer profiling slows down a bit; for instance,
227 checking @ldvTime@ against $0$ in the above example would always evaluate to
228 @rtsFalse@ during retainer profiling.
229 However, this is the price to be paid for our decision not to employ a
230 separate field for LDVU profiling.}
232 As another example, consider @LDV_recordCreate()@:
235 #define LDV_recordCreate(c) \
236 LDVW((c)) = (ldvTime << LDV_SHIFT) | LDV_STATE_CREATE
239 The above definition of @LDV_recordCreate()@ works without any problem
240 even for retainer profiling: during retainer profiling,
241 a retainer set field (@hp.ldvw@) must be initialized to a null pointer.
242 Since @ldvTime@ is fixed to $0$, @LDV_recordCreate()@ initializes
243 retainer set fields correctly.
245 \section{Heap Censuses}
247 The LDVU profiler performs heap censuses periodically by invoking the
248 function @LdvCensus()@
250 take place at any random moment during program execution).
251 Since LDVU profiling deals only with live closures, we choose to have
252 a heap census preceded by a major garbage collection so that only live
253 closures reside in the heap
254 (see @schedule()@ in @Schedule.c@).\footnote{It turns out that this
255 choice does not considerably simplify the implementation; we need to
256 finish a complete linear scan of the entire live heap at any rate.
257 As a side note, in~\cite{RR}, a heap census is \emph{followed} by a garbage
258 collection, which is the opposite of our strategy.}
260 During a census, we examine each closure one by one and computes the following
264 \item the total size of all \emph{inherently used} closures.
265 \item the total size of all closures which have never been used.
266 \item the total size of all closures which have been used at least once.
269 An inherently used closure is, by definition, one which is deemed to
270 be in use even at the point of its birth. A closure which cannot be
271 entered (e.g., @ARR_WORDS@) belongs to this category (otherwise
272 such a closure would stay in the void phase all the time).
273 In the current implementation, the following types of closures are
274 considered to be inherently used:
275 @TSO@, @MVAR@, @MUT_ARR_PTRS@, @MUT_ARR_PTRS_FROZEN@, @ARR_WORDS@,
276 @WEAK@, @MUT_VAR@, @MUT_CONS@, @FOREIGN@, @BCO@, and @STABLE_NAME@.
278 The three quantities are stored in @gi[ldvTime]@, a cell in the @LdvGenInfo@ array
280 The structure @LdvGenInfo@ is defined as follows:
285 int inherentlyUsed; // total size of 'inherently used' closures
286 int notUsed; // total size of 'never used' closures
287 int used; // total size of 'used at least once' closures
292 The above three quantities account for mutually exclusive sets of closures.
293 In other words, the second and the third are computed from those closures
294 which are not inherently used. Their state flag indicates which of the
295 second and the third their size should be attributed to.
297 \subsection{Linear Scan of the Heap during Heap Censuses}
299 During a heap census, we need to visit every live closure once and a linear
300 scan of the heap is sufficient to our goal.
301 Since we assume that a major garbage collection cleans up the heap before
302 any heap census, we can take advantage of the following facts to implement
303 a linear scan for heap censuses:
306 \item The nursery is empty. The small object pool and the large object pool,
307 however, may \emph{not} be empty. This is because the garbage collector
308 invokes @scheduleFinalizer()@ after removing dead closures, and
309 @scheduleFinalizer()@ may create new closures through @allocate()@.
310 \item @IND@, @IND_OLDGEN@, and @EVACUATED@ closures do not appear in the heap.
313 The implementation of the linear scan strategy is complicated by the
314 following three facts.
315 First, either the small object pool (accessible from @small_alloc_list@)
316 or the large object pool (accessible from @g0s0->large_objects@)
317 may contain an initial evaluation stack, which is allocated via @allocate()@ in
318 @initModules()@ (in @RtsStartup.c@).
319 The initial evaluation stack is heterogenous from any closure; it may not
320 consist of consecutively allocated closures.
321 It is hard to discern the initial evaluation stack (unless
322 we keep track of its position and size).
323 Second, a closure may not necessarily be adjacent to the next closure in the heap
324 because there may appear \emph{slop words} between two closures.
325 This happens when a closure is replaced by another closure with different
328 We solve the first problem simply by postponing heap censuses until the first
329 garbage collection, whether it is a major garbage collection or not.
330 This simple idea works because the initial evaluation stack is removed from
331 the heap during a garbage collection. The boolean variable @hasBeenAnyGC@
332 is set to @rtsTrue@ the first time a garbage collection is performed, and
333 hence @LdvCensus()@ returns immediately unless @hasBeenAnyGC@ has a @rtsTrue@
335 In practice, however, this premature return seldom happens: minor garbage
336 collections occur frequently enough that the first invocation of @LdvCensus()@ is
337 preceded by a minor garbage collection in most cases.
339 We solve the second problem by filling all possible slop words with zeroes.
340 Then we can find the next closure after any give closure by skipping zeroes
341 if any. First of all, we notice that slop words surviving a major garbage
342 collection can be generated only in the following two cases:
345 \item A closure is overwritten with a blackhole.
346 \item A closure is overwritten with an indirection closure.
347 %\item A weak pointer is overwritten with a dead weak pointer.
350 In either case, an existing closure is destroyed after being replaced with a
352 If the two closures are of the same size, no slop words are introduce and
353 we only need to invoke
354 @LDV_recordDead()@ on the existing closure, which cannot be an inherently used
356 If not, that is, the new closure is smaller than the existing closure
357 (the opposite cannot happen), we need to fill one or more slop words with zeroes
358 as well as invoke @LDV_recordDead()@ on the existing closure.
359 The macro @LDV_recordDead_FILL_SLOP_DYNAMIC()@ accomplishes these two tasks:
360 it determines the size of the existing closure, invokes @LDV_recordDead()@, and
361 fills the slop words with zeroes.
362 After excluding all cases in which the two closures are of the same size,
363 we invoke @LDV_recordDead_FILL_SLOP_DYNAMIC()@ only from:
366 \item @UPD_BH_UPDATABLE()@ and @UPD_BH_SINGLE_ENTRY()@ in @includes/StgMacros.h@,
367 @threadLazyBlackHole()@ and @threadSqueezeStack()@ in @GC.c@.
368 \item @updateWithIndirection()@ and @updateWithPermIndirection()@
369 in @Storage.h@.\footnote{Actually slop words created in
370 @updateWithIndirection()@ cannot survive major garbage collections.
371 Still we invoke @LDV\_recordDead\_FILL\_SLOP\_DYNAMIC()@ to support linear
372 scan of the heap during a garbage collection, which is discussed in the next
376 Notice that weak pointers being overwritten with dead weak pointers can be handled
377 properly without recourse to @LDV_recordDead_FILL_SLOP_DYNAMIC()@. The reason is
378 that dead weak pointers are of the same size as normal weak
379 pointers.\footnote{This is not the case without profiling. It was not the case
380 in the previous version of
381 GHC, either. See the comments on @stg\_DEAD\_WEAK\_info@ in @StgMiscClosures.hc@.}
383 \section{Destruction of Closures}
385 In order to compute the total size of closures for each of the four phases,
386 we must report the destruction of every closure (except inherently used closures)
387 to the LDVU profiler by invoking @LDV_recordDead()@.
388 @LDV_recordDead()@ must not be called on any inherently used closure
389 because any invocation of @LDV_recordDead()@ affects the
390 statistics regarding void and drag phases, which no inherently used
393 @LDV_recordDead()@ updates two fields @voidNew@ and @dragNew@ in the @LdvGenInfo@
405 @gi[ldvTime].voidNew@ accumulates the size of all closures satisfying the following
406 two conditions: 1) observed during the heap census at time @ldvTime@;
407 2) now known to have been in the void phase at time @ldvTime@.
408 It is updated when a closure which has never been used is destroyed.
409 Suppose that a closure @c@ which has never been used is about to be destroyed.
410 If its creation time is $t_c$, we judge that @c@ has been in the
411 void phase all its lifetime, namely, from time $t_c$ to @ldvTime@.
412 Since @c@ will not be observed during the next heap census, which corresponds
413 to time @ldvTime@, @c@ contributes to the void phase of times $t_c$ through
415 Therefore, we increase the @voidNew@ field of @gi[@$t_c$@]@ through @gi[ldvTime - 1]@
416 by the size of @c@.\footnote{In the actual implementation, we update @gi[$t_c$]@
417 and @gi[ldvTime]@ (not @gi[ldvTime@$ - $1@]@) only: @gi[$t_c$]@ and @gi[ldvTime]@
418 are increased and decreased by the size of @c@, respectively.
419 After finishing the program execution, we can correctly adjust all the fields.}
421 @gi[ldvTime].dragNew@ accumulates the size of all closures satisfying the following
422 two conditions: 1) observed during the heap census at time @ldvTime@;
423 2) now known to have been in the drag phase at time @ldvTime@.
424 It is updated when a closure which has been used at least once is destroyed.
425 Suppose that a closure @c@ which has been used last at time $t_l$ is about to
427 We judge that @c@ has been in the drag phase from time $t_l + 1$ to
428 time @ldvTime@$ - 1$ (if $t_l + 1 > $@ldvTime@$ - 1$, nothing happens).
429 Therefore, we increase the @dragNew@ field of @gi[@$t_l + 1$@]@ through
430 @gi[ldvTime@$ - 1$@]@
431 by the size of @c@.\footnote{As in the case of @voidNew@, we update
432 @gi[@$t_l + 1$@]@ and @gi[ldvTime]@ only.}
434 Now we need to find out all the cases of closure destruction.
435 There are four cases in which a closure is destroyed:
438 \item A closure is overwritten with a blackhole:
439 @UPD_BH_UPDATABLE()@ and @UPD_BH_SINGLE_ENTRY()@ in @includes/StgMacros.h@,
440 @threadLazyBlackHole()@ and @threadSqueezeStack()@ in @GC.c@,
441 the entry code for @BLACKHOLE@ closures in @StgMiscClosures.hc@ (a
442 @BLACKHOLE@ closure is changed into a @BLACKHOLE_BQ@ closure).
443 We call either @LDV_recordDead()@ or @LDV_recordDead_FILL_SLOP_DYNAMIC()@.
445 \item A weak pointer is overwritten with a dead weak pointer:
446 @finalizzeWeakzh_fast()@ in @PrimOps.hc@,
447 @finalizeWeakPointersNow()@ and @scheduleFinalizers()@ in @Weak.c@.
448 Since a weak pointer is inherently used, we do not call @LDV_recordDead()@.
450 \item A closure is overwritten with an indirection closure:
451 @updateWithIndirection()@ and @updateWithPermIndirection()@ in @Storage.h@,
452 @scavenge()@ in @GC.c@, in which an @IND_PERM@ closure is explicitly replaced
453 with an @IND_OLDGEN_PERM@ closure during scavenging.
454 We call either @LDV_recordDead()@ or @LDV_recordDead_FILL_SLOP_DYNAMIC()@.
456 \item Closures are removed permanently from the graph during garbage collections.
457 We locate and dispose of all dead closures by
458 linearly scanning the from-space right before tidying up.
459 This is feasible because any closures which is about to be removed from the graph
460 still remains in the from-space until tidying up is completed.
461 The next subsection explains how to implement this idea.
464 \textbf{To do:} if okay, replace the invocation of @SET_HDR()@ with that of
465 @SET_INFO()@ and remove @LDV_recordCreate()@ in some of the above cases.
467 \subsection{Linear scan of the from-space during garbage collections}
469 In order to implement linear scan of the from-space during a garbage collection
471 we need to take into consideration the following facts:
474 \item The from-space includes the nursery.
475 Furthermore, a closure in the nursery may not necessarily be adjacent to the next
476 closure because slop words lie between two closures;
477 the Haskell mutator may allocate more space than actually needed in the
478 nursery when creating a closure, potentially leaving slop words.
480 \item The pointer @free@ of a block in the nursery may incorrectly point to
481 a byte past its actual boundary.
483 the Haskell mutator first increases @hpLim@ without comparing it with the
484 actual boundary when allocating fresh memory for a new closure.
485 @hpLim@ is later assigned to the pointer @free@ of the corresponding memory
486 block, which means that during a heap census, the pointer @hpLim@ may not
488 Notice that @hpLim@ is not available during LDVU profiling; it is valid
489 only during the Haskell mutator time.
491 \item The from-space may well contain a good number of @EVACUATED@ closures,
492 and they must be skipped over.
495 The first problem could be solved by always monitoring @Hp@ during the Haskell
496 mutator time: whenever @Hp@ is increased, we fill with zeroes
497 as many words as the change of @HP@. Then, we could skip any trailing
498 zero words when linearly scanning the nursery.
499 Alternatively we could initialize the entire nursery with zeroes after
500 each garbage collection and not worry about any change made to @Hp@ during the
501 Haskell mutator time.
502 The number of zero words to be written to the nursery could be reduced
503 in the first approach, for we do not have to fill the header for a new closure.
504 Nevertheless we choose to employ the second approach because
505 it simplifies the implementation code significantly
506 (see @resetNurseries()@ in @Storage.c@).
507 Moreover, the second approach compensates for its redundant initialization cost
508 by providing faster execution (due to a single memory write loop in contrast
509 to frequent memory write loops in the first approach).
510 Also, we attribute the initialization cost to the runtime system and thus
511 the Haskell mutator behavior is little affected.
513 The second problem is easily solved by limiting the scan up to the actual
514 block boundary for each nursery block (see @processNurseryForDead()@).
515 In other words, for a nursery block descriptor @bd@,
516 whichever of @bd->start@$ + $@BLOCK_SIZE_W@ and @bd->free@ is smaller
517 is used as the actual boundary.
519 We solve the third problem by exploiting LDV words of @EVACUATED@ closures:
520 we store the size of an evacuated closure, which now resides in the to-space,
521 in the LDV word of the new @EVACUATED@ closure occupying its memory.
522 This is easily implemented by inserting a call to the macro
523 @SET_EVACUAEE_FOR_LDV()@ in @copy()@ and @copyPart()@ (in @GC.c@).
524 Thus, when we encounter an @EVACUATED@ closure while linearly scanning the
525 nursery, we can skip a correct number of words by referring to its LDV word.
527 The linear scan of the from-space is initiated by the garbage collector.
528 From the function @LdvCensusForDead()@, every dead closure in the from-space is
529 visited through an invocation of @processHeapClosureForDead()@.
531 \subsection{Final scan of the heap}
533 Since a closure surviving the final garbage collection is implicitly destroyed
534 when the runtime system shuts down, we must invoke @processHeapClosureForDead@
535 on \emph{every} closure in the heap once more after the final garbage collection.
536 The function @LdvCensusKillAll()@, which is invoked from @shutdownHaskell()@
537 in @RtsStartup.c@, traverses the entire heap and visits each closure.
538 It also stops LDVU profiling by resetting @ldvTime@ to $0$.
540 It may be that after LDVU profiling stops, new closures may be created
541 and even garbage collections may be performed.
542 We choose to ignore these closures because they are all concerned about
543 finalizing weak pointers (in @finalizeWeakPointersNow()@).
544 It can be catastrophic to invoke @LdvCensusKillAll()@ after finishing
545 @finalizeWeakPointersNow()@: @finalizeWeakPointersNow()@ calls
546 @rts_evalIO()@, which is essentially initiating a new program execution,
547 and no assumptions made upon LDVU profiling hold any longer.
549 \section{Time of Use}
551 In order to yield correct LDVU profiling results, we must make sure that
552 @LDV_recordUse()@ be called on a closure whenever it is used; otherwise,
553 most of closures would be reported to be in the void phase.
554 @includes/StgLdvProf.h@ provides nine entry macros (e.g., @LDV_ENT_THK()@).
555 Each of the entry macros corresponds to a type of closures which can be entered.
556 For instance, @LDV_ENT_THK()@ is supposed to be invoked whenever a thunk
557 is used. In the current implementation, all these macros expand to
558 @LDV_recordUse()@ with no additional work.
560 \textbf{To do:} modify the compiler so that it inserts the above macros
561 at appropriate places in the generated C code.
563 \section{Computing Final Statistics}
565 After the final scan of the heap, we can accurately determine the total
566 size of closures in one of the four phases at the moment of each heap census.
567 The structure @LdvGenInfo@ is augmented with two additional fields
568 @voidTotal@ and @dragTotal@:
579 @gi[@$i$@].voidTotal@ and @gi[@$i$@].dragTotal@ are computed
580 from @gi[@$i$@].voidNew@ and @gi[@$i$@].dragNew@, respectively.\footnote{Due
581 to a slight optimization described before, @gi[@$i$@].voidTotal@ is actually
582 computed as $\sum_{1 \leq j \leq i}$@gi[@$j$@].voidNew@.
583 @gi[@$i$@].dragTotal@ is computed in a similar way.}
584 Then, the total size of closures in the lag phase @gi[@$i$@].lagTotal@ is computed
585 as @gi[@$i$@].notUsed@$-$@gi[@$i$@].voidTotal@ (because any unused closure
586 is either in the void phase or in the lag phase).
588 the total size of closures in the use phase @gi[@$i$@].useTotal@ is computed
589 as @gi[@$i$@].used@$-$@gi[@$i$@].dragTotal@ (because any used closure
590 is either in the use phase or in the drag phase).
591 @endLdvProfiling()@, called from @endHeapProfiling@ in @ProfHeap.c@, computes these
596 The runtime system option @-hL@ tells the executable program to
597 perform LDVU profiling and produce a @.hp@ file:
603 The option @-i@ can be used to
604 specify a desired interval at which LDVU profiling is performed.
605 The default and minimum value is half a second:
608 $ Foo.out +RTS -hL -i2.5 -RTS
611 The @.hp@ file can be supplied to the @hp2ps@ program to create a postscript
612 file showing the progress of LDVU profiling in a graph:
619 The horizontal axis of the graph is in the Haskell mutator time, which excludes
620 the runtime system time such as garbage collection time and LDVU profiling
622 The Haskell mutator runs a bit slower than it would without performing
623 LDVU profiling, but the difference is minute.
624 Also, the timer employed to periodically perform retainer profiling
625 is not perfectly accurate. Therefore, the result may slightly vary for each
626 execution of retainer profiling.
628 \textbf{To do:} Currently the LDVU profiling is not supported with @-G1@ option.
632 This section gives a summary of changes made to the GHC in
633 implementing LDVU profiling.
634 Only three files (@includes/StgLdvProf.h@, @LdvProfile.c@, and
635 @LdvProfile.h@) are new, and all others exist in the GHC.
637 @\includes@ directory:
640 \item[StgLdvProf.h] defines type @LdvGenInfo@, constants, and macros related
642 \item[ClosureMacros.h] changes macro @SET_PROF_HDR()@.
643 \item[Stg.h] includes th header file @StgLdvProf.h@.
644 \item[StgMacros.h] changes macros @UPD_BH_UPDATABLE()@ and @UPD_BH_SINGLE_ENTRY()@.
650 \item[GC.c] invokes @LdvCensusForDead()@ before tidying up, sets @hasBeenAnyGC@ to
651 @rtsTrue@, and changes @copy()@ and @copyPart()@.
652 Invokes @LDV_recordDead()@ and @LDV_recordDead_FILL_SLOP_DYNAMIC()@.
653 \item[Itimer.c] changes @handle_tick()@.
654 \item[LdvProfile.c] implements the LDVU profiling engine.
655 \item[LdvProfile.h] is the header for @LdvProfile.c@.
656 \item[PrimOps.hc] changes @finalizzeWeakzh_fast()@.
657 \item[ProfHeap.c] changes @initHeapProfiling()@ and @endHeapProfiling()@.
658 \item[Profiling.c] changes @initProfilingLogFile@ and @report_ccs_profiling()@.
659 \item[Proftimer.c] declares @ticks_to_retainer_ldv_profiling@,
660 @performRetainerLdvProfiling@, and @doContextSwitch@.
661 \item[Proftimer.h] is the header for @Proftimer.c@.
662 \item[RtsAPI.c] implements @setProfileHeader()@.
664 sets @RtsFlags.ProfFlags.doHeapProfile@,
665 adds a string to @usage_text[]@ in @setupRtsFlags()@.
666 \item[RtsFlags.h] defines constants @HEAP_BY_LDV@ and @LDVchar@.
667 \item[RtsStartup.c] changes @shutDownHaskell()@.
668 \item[Schedule.c] changes @schedule()@.
670 declares @LDV_start_time@, @LDV_tot_time@, @LDVe_start_time@,
672 Changes @mut_user_time_during_GC()@, @mut_user_time()@,
674 @stat_endExit()@, and
677 @mut_user_time_during_LDV()@,
678 @stat_startLDV()@, and
680 \item[Stats.h] is hte header for @Stats.c@.
681 \item[StgMiscClosures.hc] inserts entry macros in
682 @stg_IND_entry()@, @stg_IND_PERM_entry()@, @stg_IND_OLDGEN_entry()@,
683 @stg_IND_OLDGEN_PERM_entry()@, @stg_BLACKHOLE_entry()@, @stg_BLACKHOLE_BQ_entry()@,
684 and @stg_CAF_BLACKHOLE_entry()@.
685 Invokes @LDV_recordDead()@ in @stg_BLACKHOLE_entry@.
686 Redefines @stg_DEAD_WEAK_info@.
687 \item[Storage.c] changes @initStorage()@, @resetNurseries()@, and @allocNursery()@.
688 \item[Storage.h] changes @updateWithIndirection()@ and @updateWithPermIndirection()@.
689 \item[Updates.hc] inserts entry macros in @stg_PAP_entry()@ and @stg_AP_UPD_entry()@.
690 \item[Weak.c] changes @scheduleFinalizers()@.
693 \bibliographystyle{plain}
694 \bibliography{reference}