1 \documentclass{article}
2 \usepackage{code,a4wide}
4 \usepackage{graphics,epsfig,epic,eepic,epsfig}
6 \setlength{\parskip}{0.25cm}
7 \setlength{\parsep}{0.25cm}
8 \setlength{\topsep}{0cm}
9 \setlength{\parindent}{0cm}
10 \renewcommand{\textfraction}{0.2}
11 \renewcommand{\floatpagefraction}{0.7}
15 \newcommand{\block}{block}
16 \newcommand{\Block}{Block}
17 \newcommand{\segment}{segment}
18 \newcommand{\Segment}{Segment}
19 \newcommand{\step}{step}
20 \newcommand{\Step}{Step}
22 \newcommand{\note}[1]{{\em $\spadesuit$ #1}}
25 \title{Implementation of Retainer Profiling}
31 \section{Retainer Profiling}
33 Retainer profiling~\cite{CN} is a profiling technique which is based upon a
34 special view of production and consumption of heap objects at runtime:
35 while producers build heap objects to form new pieces of graph,
36 consumers hold pointers to these heap objects, or \emph{retain} them, so
37 that they are not freed during garbage collections.
38 On this basis, we refereed to such consumers as \emph{retainers}.
39 Notice that an object can have more than one retainer because it can
40 be pointed to by multiple objects.
42 For each live object in the heap, retainer profiling computes
43 all its retainers, or its \emph{retainer set}.
44 A naive implementation of retainer profiling could consider every
45 immediate ancestor of an object as its retainer.
46 Although this approach appears to provide an accurate report on the
47 relationship between retainers and retainees, the result can hardly be useful.
48 For instance, it is pointless to scrutinize a list and treat each cons
49 cell as a retainer of the following cons cell.
50 This observation suggests that we need to have a way of designating only
51 certain types of objects as candidates for retainers.
52 In other words, we need to employ an oracle which tells whether a given
53 object can be a retainer or not.
55 Since no retainer of a particular object needs to be using the
56 object actively, we can find all its retainers simply by traversing
57 the graph. In other words, we do not have to distinguish those retainers
58 actively exploiting it from other retainers just holding pointers
59 to it (either directly or indirectly).
60 Thus, retainer profiling can be accomplished simply by traversing the
63 Figure~\ref{fig-retaineralgorithm} shows the algorithm for retainer
64 profiling. The traversal begins at every root, and proceeds
65 in a depth first manner (or a breadth first manner).
66 The function @R()@ returns the identity of a given retainer such
67 as its memory address, its information table address, or
68 the name of the module
70 Notice that the retainer identity function does not need to be a
72 multiple objects can share the same identity.
73 Such a retainer identity function reduces the cost of traversal.
74 For instance, when an object
75 is reached from several retainers which share the same identity, we need to
76 consider only the first visit to the object.
77 In other words, whichever retainer (among those sharing the same identity)
78 leads to the object for the first time affects the retainer set of the object
80 and all the other retainers can be ignored.
81 Thus, the more coarse the function @R()@ is, the less
82 it costs to traverse the graph for retainer profiling.
83 The function @isRetainer()@ tells whether a given object is a retainer or not.
84 Hence, the behavior of the retainer profiling algorithm is determined
85 uniquely by: 1) the set of roots; 2) the function @R()@; 3) the function
98 if c is a retainer then
104 if R(r) is a member of c.retainerSet then
106 add R(r) to c.retainerSet
107 if isRetainer(c) then
111 for every successor c' of c
114 \caption{Retainer profiling algorithm}
115 \label{fig-retaineralgorithm}
119 Another way of formulating retainer profiling is in terms of fixed point
120 equations on retainer sets.
121 To be specific, given the two functions @isRetainer()@ and @R()@,
122 the retainer set of every object is computed as the least fixed point
123 solution of the following equations:
125 \item For every root @r@, @R(r)@ $\in$ @r.retainerSet@.
126 \item For every object @c@,
127 $\bigcup_{\mathit{each\ ancestor\ @a@\ of\ @c@}}$ @from(a)@ $\subseteq$
129 \item @from(a)@ = if @isRetainer(a)@ then @a.retainerSet@ else $\{@a@\}$.
132 This document describes the implementation of retainer profiling on
133 the Glasgow Haskell Compiler runtime system.
134 It explains every detail of the development so that it can be (hopefully)
135 a complete maintenance guide.
136 A secondary goal is to help (hopefully) those who wish to extend the system
137 to implement another profiling scheme.\footnote{Unless otherwise mentioned,
138 all identifiers are defined in @RetainerProfile.c@}
140 \section{Installing the GHC}
142 Installing the GHC is done as follows:
145 \item Get the source code from the CVS repository.
147 ./ cvs checkout fpconfig
148 ./ cvs checkout fptools
151 \item Set up the basic configuration.
154 ./fptools/ghc/ autoconf
158 \item Set up the configuration for development and debugging.
160 ./fptools/mk vi build.mk
161 GhcHcOpts = -O -fasm -Rghc-timing
168 @GhcLibWays@ tells the compiler to build the code for profiling as well.
169 @GhcRtsHcOpts@ has additional flags for @gcc@ when compiling @.hc@ files.
170 @GhcRtsCcOpts@ has additional flags for @gcc@ when compiling @.c@ files.
171 Since we will implement retainer profiling in @.c@ files, we turn on the
173 The empty setting for @STRIP@ tells the compiler not to remove source code
174 information (generated due to the @-g@ option) from executable files so that
175 they can be examined with @gdb@.
177 \item Remove unnecessary files if needed and build everything.
179 ./fptools/ rm dll green-card haggis happy hdirect hood libraries
184 \section{Adding Retainer Set Fields}
186 Since every Haskell closure now needs to store its retainer set at runtime,
187 it must be augmented with a new field,
188 namely, a emph{retainer set field}.
189 This section explains how to add such a field to Haskell closures.
190 It should be clear how to generalize the idea for adding
191 any number of new fields.\footnote{The GHC provides two
192 ways of building executable programs from
193 source files: normal way and profiling way.
194 We are concerned only about profiling way, and all the pieces of code
195 implementing profiling way are wrapped by the @PROFILING@
196 pre-processing directive (as in @\#ifdef PROFILING@).
197 Therefore, all the additions and changes that we make to the source code
198 are assumed to be wrapped by the @PROFILING@ pre-processing
199 directive as well unless otherwise mentioned.}
201 \subsection{Adding a new field to Haskell closures}
203 We want to add a retainer set field of type @retainerSet@ to every
204 closure, so we create a new file @includes/StgRetainerProf.h@ where
205 we define the type @retainerSet@.
206 The actual definition of @retainerSet@ will be given later.
209 /* includes/StgRetainerProf.h */
210 typedef ... retainerSet;
213 We make type @retainerSet@ to be publicly available by including
214 @includes/StgRetainerProf.h@ itself to @includes/Stg.h@ (not wrapped
219 #include "StgRetainerProf.h"
222 Then we add a retainer set field @rs@ to the @StgProfHeader@ structure.
225 /* include/Closures.h */
227 CostCentreStack *ccs;
232 Now every closure @c@ (including static closures) has a retainer set field,
233 which can be accessed with @c->header.prof.rs@ (where @c@ denotes the
234 address of the closure).
236 \subsection{Changing constants}
238 We are ready to reflect the new size of Haskell closures to other part
240 This is accomplished by changing a few constants which specify the size
241 of certain types of closures and their layout.
243 When building the runtime system, the @gcc@ compiler correctly figures out
244 the size of every structure on its own.
246 the GHC simply reads @includes/Constants.h@ to to determine the size of
247 closures assumed by the runtime system.
248 Thus, we change the constants used by the GHC itself (as opposed to
249 the runtime system). They are all found in @includes/Constants.h@.
250 We increase each of them by 1 to reflect the retainer set field which is one
253 /* includes/Constants.h */
254 #define PROF_HDR_SIZE 2
255 #define SCC_UF_SIZE 5
256 #define SCC_SEQ_FRAME_SIZE 4
258 @PROF_HDR_SIZE@ denotes the size of the structure @StgProfHeader@, which
259 is now two words long.
260 @SCC_UF_SIZE@ and @SCC_SEQ_FRAME_SIZE@ denote the size of the structures
261 @StgUpdateFrame@ and @StgSeqFrame@ (in @includes/Closures.h@) in
264 Since these constants are used when building the GHC (by another compiler),
265 we must rebuild the GHC. In other words, when executed, the code generated by
266 the GHC must now allocate one more word for the retainer set field than before,
267 and hence it is inevitable to rebuild the GHC, which is done as follows:
270 ./fptools/ghc/ make clean
271 ./fptools/ghc/ make boot
275 The second command @make boot@ instructs the build system to analyze
276 the source code dependency so that the next execution of @make@ correctly
277 finds all required files.
279 Next we change four bitmap constants which specify the layout of
280 certain types of closures.
281 As an example, let us consider @RET_BITMAP@, which specifies the layout
282 of thunk selectors (corresponding to closure type @THUNK_SELECTOR@).
283 Without a retainer set field, there is only one non-pointer (represented
284 by $1$) followed by one or more pointers (represented by $0$) in the closure
285 body and the bitmap representation is $0b1$, or $1$.
286 With a retainer set field, which is not a pointer to another closure and thus
287 represented by $1$, there are two non-pointers, and the bitmap representation
288 is $0b11$, or $3$. Notice that the bitmap is interpreted in reverse order:
289 the least significant bit corresponds to the first word in the closure body,
290 and the second least significant bit to the second word, and so forth.
291 The same rule applies to the other three bitmap constants:
292 @CATCH_FRAME_BITMAP@ (for closure type @CATCH_FRAME@ and structure
294 @STOP_THREAD_BITMAP@ (for closure type @STOP_FRAME@ and structure
296 @UPD_FRAME_BITMAP@ (for closure type @UPDATE_FRAME@ and structure
300 /* rts/StgStdThunks.hc */
302 /* rts/Exception.hc */
303 #define CATCH_FRAME_BITMAP 15
304 /* rts/StgStartup.hc */
305 #define STOP_THREAD_BITMAP 3
307 #define UPD_FRAME_BITMAP 7
310 For most closure types, the new definition of @StgProfHeader@ is
311 automatically propagated to their corresponding structures.
312 However, there are six closures types which are not affected by
313 @StgProfHeader@. They are all stack closures:
314 @RET_DYN@, @RET_BCO@, @RET_SMALL@, @RET_VEC_SMALL@, @RET_BIG@, and
316 If you want a new field to be added to these closures, you may
317 have to modify their corresponding structures.
319 \textbf{To do:} Currently the above changes introduce a new bug in the
320 runtime system. For instance, @nofib/real/symalg@ ends up with a division-by-zero
321 exception if we add a new field.
323 \subsection{Initialization code}
325 When a new closure is created, its retainer set field may have to be
326 initialized according to the way that retainer profiling is implemented.
327 For instance, we could use as an initial value a pointer to an empty retainer
329 Alternatively we could assign a null pointer to indicate that its retainer
330 set has not been computed yet, which we adopt in our implementation.
331 In either case, we have to visit the new closure and execute some initialization
332 code on it so that its retainer set field is set to an appropriate value.
334 There are four parts in the source code which need to be modified.
335 dynamic closure initialization, static closure initialization,
336 closure creation through C function invocation,
337 and update frame initialization.
338 The first is accomplished by modifying the macro @SET_PROF_HDR()@ (in
339 @include/ClosureMacros.h@). When a closure @c@ is created at runtime,
340 @SET_PROF_HDR()@ is invoked immediately with @c@ as its first argument.
341 Thus, the following code initializes the retainer set field of every
342 dynamic closure to a null pointer.
345 /* include/ClosureMacros.h */
346 #define SET_PROF_HDR(c,ccs_) \
347 ((c)->header.prof.ccs = ccs_, (c)->header.prof.rs = NULL)
350 Similarly, the macro @SET_STATIC_PROF_HDR()@ (in the
351 same file) specifies how the retainer set field of every static closure
352 is initialized, which is rewritten as follows:
355 /* include/ClosureMacros.h */
356 #define SET_STATIC_PROF_HDR(ccs_) \
357 prof : { ccs : ccs_, rs : NULL },
360 There is another way of creating dynamic closures through explicit C
361 function invocations at runtime.
362 Such functions are all defined in @RtsAPI.c@: @rts_mkChar()@, @rts_mkInt()@,
363 @rts_mkWord()@, and so forth.
364 Each function allocates memory for a new closure,
365 initializes it, and returns its address.
366 Therefore, we can simply insert in each function another initialization code
367 for retainer set fields.
368 To this end, we define a macro @setRetainerField()@ and insert it
372 #define setRetainerField(p) \
373 (p)->header.prof.rs = NULL
376 For instance, @rts_mkChar()@ is now defined as follows:
381 rts_mkChar (HsChar c)
383 StgClosure *p = (StgClosure *)allocate(CONSTR_sizeW(0,1));
390 Finally we may need to initialize the retainer set field of an update frame
391 (stack closure) when it is pushed onto the stack for the first
392 time.\footnote{In our implementation of
393 retainer profiling, the retainer set field is not used for any stack closure.
394 Hence, the following modification is entirely unnecessary.
395 Also, update frames are the only exception to the standard way of creating
396 stack closures: all the other types of stack closures with a retainer set
397 field are eventually initialized by
398 the macro @SET\_HDR()@ (in @includes/ClosureMacros.h@), which in turn
399 invokes @SET\_PROF\_HDR()@. This is not the case for update frames.
400 Compare @PUSH\_UPD\_FRAME()@ (in @includes/Updates.h@) and
401 @PUSH\_SEQ\_FRAME()@ (in @includes/StgMacros.h@) for clarification.}
402 For instance, if we want to initialize the retainer set field of update
403 frames to a null pointer, we can rewrite the macro @PUSH_STD_CCCS()@
404 (in @includes/Updates.h@) as follows:
407 /* includes/Updates.h */
408 #define PUSH_STD_CCCS(frame) \
409 (frame->header.prof.ccs = CCCS, frame->header.prof.rs = NULL)
412 \section{Retainer Sets}
414 At the end of retainer profiling, every live closure (except stack
415 closures, for which we do not compute retainer sets) is associated with
416 a retainer set; there can be no closure without an associated retainer set
417 because every live closure is visited during traversal.
418 Since many closures may well be associated with a common retainer set,
419 we want to avoid creating any particular retainer set more than once.
420 This section presents the details of manipulating retainer sets in our
423 \subsection{Identity of a retainer}
425 The function @R()@ in Figure~\ref{fig-retaineralgorithm} returns
426 the identity of a retainer. In order to implement it, we need
427 a type for retainer identity.
428 The type @retainer@ (in @includes/StgRetainerProf.h@) is introduced for
431 There are various ways of defining the type @retainer@.
432 For instance, we can designate the information table address of a retainer as
433 its identity as follows:
436 struct _StgInfoTable;
437 typedef struct _StgInfoTable *retainer;
440 We can also use the cost centre stack associated with the retainer:
443 typedef CostCentreStack *retainer;
446 The function @R()@ is embodied as the function @getRetainerFrom()@ in the
447 implementation, whose type is @(StgClosure *)@ $\rightarrow$ @retainer@.
448 It is straightforward to define @getRetainerFrom()@ according to the definition
449 of @retainer@, as illustrated below:
452 retainer getRetainerFrom(StgClosure *c) { return get_itbl(c); }
453 retainer getRetainerFrom(StgClosure *c) { return c->header.prof.ccs; }
456 \subsection{Retainer sets and the cost function}
458 A retainer set is stored in the structure @retainerSet@
459 (in @includes/StgRetainerProf.h@):
462 typedef struct _retainerSet {
471 The field @num@ gives the number of retainers in the retainer set, which
472 are all stored in the array @element[]@. Thus, the size of @element[]@
473 is assumed to be @num@.
474 The field @cost@ gives the sum of the \emph{costs} of those closures
475 associated with the retainer set: if a closure @c@ is
476 associated with the retainer set, that is, if @c@ is retained by each
477 retainer in the retainer set and none else,
478 the cost of @c@ is added to the field @cost@.
479 The field @id@ gives a unique identification number for the retainer set.
481 We define a \emph{cost function}, which returns the cost of a given closure,
482 in order to compute the field @cost@.
483 The cost function can be defined in several ways.
484 A typical definition is on the size of a closure, which results in
485 the field @cost@ accumulating the size of all the closures retained by a
487 If we just want to count the number of closures retained by the
488 retainer set, we can simply set the cost of every closure to one regardless
490 Furthermore, we can define the cost function flexibly according to
492 For instance, we can set the size of any static closure to zero so that
493 it is not taken into account at all in computing the field @cost@.
494 Notice that static closures are also visited during traversal because they
495 may lead to other dynamic closures (e.g., static indirection closures of
496 closure type @IND_STATIC@).
497 This is especially desirable because we usually focus on the heap use.
498 We can also selectively choose certain dynamic closure types not to contribute
501 In our implementation, there are two functions related with the cost function:
502 @cost()@ and @costPure()@.
503 @cost()@ returns the size of the entire memory allocated for a given closure
504 (even including the two fields in the structure @StgProfHeader@).
505 It returns zero for static closures.
506 @costPure()@ returns the size of the memory which would be allocated for
507 a given closure with no profiling.
508 It is defined in terms of @cost()@, and it suffices to change only @cost()@
509 when a new scheme for the cost function is desired.
510 @costPure()@ is put to actual use in computing the field @cost@ because it
511 effectively hides the memory overhead incurred by profiling.
513 \subsection{Implementation}
515 The algorithm for retainer profiling in Figure~\ref{fig-retaineralgorithm}
516 adds at most one retainer to an existing retainer set (or an empty retainer set)
517 at any moment; it does not require a retainer set union operation.
518 This observation simplifies the implementation, and
519 we employ the following two functions for creating new retainer sets:
520 @singleton()@, which creates a singleton retainer set, and
521 @addElement()@, which adds an element to an existing retainer set.
523 It is a frequent operation during retainer profiling to search for a retainer
524 set, which may or may not exist, built from a given retainer set and a
526 To efficiently implement this operation,
527 we choose to store all retainer sets in a hash table and
528 the structure @retainerSet@ is now extended with two new fields
529 @hashKey@ and @link@.
530 The field @hashKey@ stores the hash key which is obtained
531 from the retainers in a retainer set.
532 The field @link@ points to the next retainer set in the same bucket:
535 typedef struct _retainerSet {
538 struct _retainerSet *link;
543 The hashing function must be defined in such a way that a retainer set
544 can have only one unique hash key regardless of the order its elements
545 are stored, i.e., the hashing function must be additive.
547 It is often observed that two successive executions of retainer profiling share
548 a number of retainer sets in common, especially if the two executions are
550 This also implies that the number of all retainer sets which can be created
551 at any moment does not grow indefinitely regardless of the interval at which
552 retainer profiling is performed; it does not grow commensurately with the
553 number of times retainer profiling is executed.
554 This observation eliminates the need to free the memory allocated for
555 retainer sets; we can simply set the @cost@ field of every retainer set
556 to zero after each retainer profiling and reuse it during the next time.
558 \section{Graph Traversal}
560 At the heart of retainer profiling lies \emph{graph traversal};
561 the algorithm in Figure~\ref{fig-retaineralgorithm} is supposed to visit
562 every closure in the graph at least once and yield statistics on the heap use.
563 Since only live closures are reachable from the root, the algorithm
564 does not deal with dead closures.
566 This section presents details on how to achieve an efficient implementation of
567 graph traversal without incurring extra memory overhead and compromising speed.
571 Traversing a graph itself can be done in a straightforward way;
572 we choose either depth first search or breadth first search, and traverse
573 the graph starting from a given set of roots.
574 After a complete traversal, each live closure @c@ (including static closures)
575 has an associated retainer set, whose address is stored in the field
578 A real complication arises when retainer profiling is performed once again:
579 all live closures which have survived all garbage collections since
580 the previous retainer profiling
581 still have an associated retainer set (indicated by
582 a non-null pointer in their retainer set field), which is no longer
583 valid. Any new closure created since then has
584 a null pointer in its retainer set field at the beginning of retainer
585 profiling and will become associated with a retainer set.
586 Thus, we can no longer distinguish valid retainer set fields
589 A simple remedy is to linearly scan the heap at the beginning of each
590 retainer profiling and set all retainer set fields to a null pointer.
591 It resets the retainer set field of each dynamic closure, whether it is
592 live or not with respect to the given set of root.
593 This is feasible because any closure in the heap directly adjoins the
594 next closure, if any.
595 The problem is that we have no way of visiting all live static closures,
596 for which we compute retainer sets.
598 A moment of thought, however, reveals that we can completely avoid computing
599 retainer sets for static closures. This is because retainer profiling is
600 concerned only about the heap, which consists of dynamic closures and no
601 static closures. In other words, we can treat every static closure as
602 a bridge connecting two dynamic closures.
603 For instance, if a dynamic closure @c@$_1$ has a pointer to a static
604 closure @s@ and @c@ has a pointer to another dynamic closure @c@$_2$,
605 we can think of the pointer in @c@$_1$ as a direct pointer to @c@$_2$.
606 The big problem is that if the graph has a cycle containing static closures,
607 an infinite loop occurs. In other words, we have no way of telling whether
608 a static closure has been visited or not and are forced to compute
609 retainer sets for static closures as well.\footnote{For instance,
610 a static closure is allowed to have a self-reference in its SRT, which
611 is also followed during retainer profiling.}
613 Another remedy is to stores in every closure a time stamp for the
614 retainer set field. The time stamp indicates whether the retainer
615 set field is valid or no longer valid (i.e., it is for the previous
617 At the cost of one extra field in each closure, we can achieve an
618 elegant implementation with little complication.
619 However, it turns out that the memory overhead is too big.\footnote{A typical
620 dynamic closure is only two or three words long.}
621 Thus, our goal is to stick to the definition of the structure @StgProfHeader@
622 given earlier and yet to achieve an elegant solution.
624 \subsection{Basic plan}
626 Since we visit every live object and update its retainer set field,
627 any retainer set field can either be valid (the corresponding retainer
628 set is valid) or point to a retainer set created during the previous
630 In order to distinguish valid retainer set fields
631 from invalid ones, we exploit the least significant bit of the retainer
632 set field: we maintain a one bit mark which flips over every time
633 retainer profiling is performed, and judge that a retainer set field is
634 valid only if its least significant bit matches the mark.
635 The variable @flip@ serves for this purpose.
636 The macros @isRetainerSetFieldValid()@ tests if the retainer set field
637 of a give closure @c@ is valid:
640 #define isRetainerSetFieldValid(c) \
641 ((((StgWord)(c)->header.prof.rs & 1) ^ flip) == 0)
644 As an example, a retainer set field can be set to a null value conforming
645 the current value of @flip@ by the macro @setRetainerSetToNull()@:
648 #define setRetainerSetToNull(c) \
649 (c)->header.prof.rs = (retainerSet *)((StgWord)NULL | flip)
652 Now, when a dynamic closure @c@ is created, its retainer set field is
653 initialized to a null value conforming to the current value of
654 @flip@:\footnote{Actually this is not mandatory: even when the null
655 value does not conform to the current value of @flip@, it will be replaced
656 by a correct null value when @c@ is visited later.}
660 #define SET_PROF_HDR(c,ccs_) \
661 ((c)->header.prof.ccs = ccs_, (c)->header.prof.rs = (retainerSet *)((StgWord)NULL | flip))
664 We do not need to revise @SET_STATIC_PROF_HDR()@ if the initial value of
665 @flip@ is set to $0$.\footnote{For the same reason, an initial value $1$
666 does not compromise the correctness of the implementation.}
668 \subsection{Set of roots}
670 The set of roots consists of all thread closures (running, sleeping, or
671 blocked) existing at the beginning of a retainer profiling.
672 It is handily obtained in an indirect way by invoking the function
673 @GetRoots()@ (in @Schedule.c@) with an appropriate argument, which must be
675 @GetRoots()@ invokes on each root known to the runtime system its argument.
676 Thus, we implement a function @retainClosure()@, which initiates traversal
677 from a given root and updates the retainer set of every closure reachable
679 and invokes @GetRoots()@ with @retainClosure@ as an argument.
681 In addition to the thread closures, weak pointers are also considered
682 as roots; they may not be reachable from any thread closure yet are still
684 A weak pointer has three pointer fields: @key@, @value@, and
685 @finalizer@ (see the structure @StgWeak@ in @includes/Closures.h@).
686 It turns out that these pointers may not be valid all the time:
687 at a certain point during execution, for instance, the pointer @key@ may point
689 However, right after a major garbage collection, all the three pointers are
690 guaranteed to be valid, i.e., they all point to live closures.
691 This facilitates the handling of weak pointers if we choose to
692 perform retainer profiling immediately after a major garbage collection.
693 All weak pointers are found in the linked list @weak_ptr_list@
696 See the function @computeRetainerSet()@ for details.
698 \subsection{Static closures}
700 When a live dynamic closure @c@ is visited for the first time during traversal,
701 its retainer set field is checked against the current value of @flip@.
702 If it was created at some point since the previous retainer profiling,
703 its retainer set field is already set to a correct null value.
704 Otherwise, it must have been visited
705 during the previous retainer profiling and thus its retainer set field is
706 invalid and will be set to a correct null value.
707 Therefore it is unnecessary to visit all dynamic closures and set their
708 retainer set field to a correct null value at the beginning of each retainer
711 However, this operation is required for static closures.
712 The reason is that a static closure, which is never garbage collected,
713 may appear alternately in the set of live closures.
714 In other words, a currently live static closure may become dead and
715 be resuscitated again.
716 Therefore, for a static closure, it does not help to check if its
717 retainer set field conforms to the current value of @flip@.
719 if a static closure happens to belong to the set of live closures every other
720 retainer profiling, its retainer set field will never set to a null value,
722 Therefore, we choose to visit all live static closures at the beginning
723 of each retainer profiling and set their retainer set field to a
726 In order to find all live static closures, we have each retainer
727 profiling preceded by a major garbage collection, which knows all live
728 static closures.\footnote{This is a heavy
729 restriction on retainer profiling, which makes retainer profiling partially
730 dependent on garbage collection.
731 However, it does not affect any retainer profiling result because
732 retainer profiling considers only live closures, which survive any
734 To be specific, the garbage collector builds a linked list
735 @scavenged_static_objects@ (in @GC.c@) during a major garbage collection,
736 which stores all live static closures of our interest.
738 A static closure of closure type @IND\_STATIC@ may be put in the
739 list @mut\_once\_list@ of the oldest generation, instead of the list
740 @scavenged\_static\_objects@.
741 In our implementation, such a closure is just skipped over because it
742 contains only a pointer to a dynamic closure, and we do not compute
744 Thus, there is no need to traverse the list @mut\_once\_list@ of the oldest
746 Since it destroys the linked list after finishing the major garbage collection
747 (by invoking the function @zero_static_object_list()@ with
748 @scavenged_static_objects@ as its argument),
749 we traverse the linked list to set the retainer set field of each
750 live static closure to a correct null value before its destruction.
751 This is done by invoking the function
752 @resetStaticObjectForRetainerProfiling()@.
754 \subsection{Traversal}
756 The traversal proceeds in a depth first manner and is implemented
757 with two mutually recursive functions: @retainStack()@ and @retainerClosure()@.
758 @retainerStack()@ can be invoked on dynamic closures holding a stack chunk:
759 closure types @TSO@, @PAP@, and @AP_UPD@.
760 It in turn invokes @retainerClosure()@ on each closure reachable from
761 stack closures in the stack chunk. Notice that it does not invoke
762 @retainerClosure()@ on those stack closures because we do not compute
763 retainer sets for stack closures.
764 @retainerClosure()@ iteratively traverses all live closures reachable
765 from a given closure.
766 It maintains its own stack to record the next scan position in every closure
767 currently under consideration.\footnote{A recursive version of
768 @retainerClosure()@ could be implemented easily.
769 @retainerClosure()@ in our implementation is an iterative version.}
770 When it encounters a closure holding a stack chunk, it invokes @retainerStack()@
773 the traversal is triggered simply by invoking @retainerClosure()@ on every root.
775 \subsection{Sanity check}
777 Since we assume that a retainer profiling is preceded by a major garbage
779 we expect that the size of all the dynamic closures visited during
780 any retainer profiling adds up exactly to the total size of the heap.
781 In fact, this is not the case; there can be closures not reachable from
782 the set of roots yet residing in the heap even after a major garbage
785 First, a dead weak pointer remains in the heap until its finalizer
786 finishes. Although its finalizer thread closure is part of the set of roots,
787 the dead weak pointer itself is not reachable from any root.
788 Since it cannot be visited during retainer profiling anyhow, we do not
789 need to located it and set its retainer set field
790 appropriately.\footnote{Dead weak pointers are identified with their
791 information table @stg\_DEAD\_WEAK\_info@ (in @StgMiscClosures.hc@).
792 Notice that their closure type is @CONSTR@, \emph{not} @WEAK@;
793 their information table is replaced by @stg\_DEAD\_WEAK\_info@ in the
794 function @scheduleFinalizers()@ (in @GC.c@).}
797 mutable variables (of closure type @MUT_VAR@) may remain in the heap
798 even when they are not reachable from the set of roots while
799 dynamic closures pointed to by them must be live.\footnote{I do not
800 understand clearly why this happens :(}
801 Since such mutable variables may become live again (in the sense that
802 they become reachable from the set of roots), we must locate them
803 and set their retainer set field appropriately after each retainer
804 profiling. This is handily accomplished by traversing the list
805 @mut_once_list@ in every generation.
807 \section{Retainer Profiling Schemes}
809 A retainer profiling scheme specifies \emph{what} retainer profiling
810 yields (as opposed to \emph{how} retainer profiling computes the retainer
811 set for every live object).
812 It is determined primarily by the meaning of retainer identity,
813 that is, the type @retainer@ (in @includes/StgRetainerProf.h@).
814 The function @getRetainerFrom()@ must be defined according to the
815 definition of the type @retainer@.
817 In order for a new retain profiling scheme to fully work, we need to follow
821 \item Define the type @retainer@ as desired.
822 \item Write @getRetainerFrom()@.
823 \item Write two hashing functions @hashkeySingletone()@ and
824 @hashKeyAddElement()@, which return the hash key from a single
825 retainer and a retainer set with another retainer, respectively.
826 \item Write two printing functions @printRetainer()@ and
827 @printRetainerSetShort()@.
828 These functions are employed when a retainer or a retainer set is
829 printed in the output file.
832 In our implementation, we use cost centre stacks for retainer identity:
835 typedef CostCentreStack *retainer;
838 retainer getRetainerFrom(StgClosure *c) { return c->header.prof.ccs; }
841 void printRetainer(FILE *f, retainer cc)
843 fprintf(f,"%s[%s]", cc->label, cc->module);
849 Since cost centre stacks are used as retainer identity, a source program
850 must be given proper cost centre annotations by programmers.
852 we can ask the compiler to automatically insert cost centre annotations.
853 For instance, the compiler option @-auto-all@ inserts a cost centre
854 annotation around every top-level function as shown below
855 (the @-p@ option is a must
856 because we must build the executable file in a profiling way):
859 $ ghc-inplace -o Foo.out -p -auto-all Foo.hs
862 The runtime system option @-hR@ tells the executable program to
863 gather profiling statistics and report them in a @.prof@ file:
866 $ Foo.out +RTS -hR -RTS
869 The option @-i@ can be used to
870 specify a desired interval at which retainer profiling is performed.
871 The default and minimum value is half a second:
874 $ Foo.out +RTS -hR -i2.5 -RTS
877 Then, two text files are generated: a @.prof@ file and a @.hp@ file.
878 The @.prof@ file records the progress of retainer profiling:
879 for each retainer profiling performed during program execution,
881 the Haskell mutator time (as opposed to the user time) at which
882 the retainer profiling starts,
883 the average number of times a closure is visited,
884 the sum of costs assigned to all retainer sets (obtained from the field
885 @cost@ in each retainer set),
886 and the number of all retainer sets created \emph{since} the beginning
887 of program execution.
888 A typical entry in a @.prof@ file looks like:
891 Retainer Profiling: 3, at 3.530000 seconds
892 Average number of visits per object = 1.687765
893 Current total costs = 801844
894 Number of retainer sets = 118
897 The sum of costs assigned to all retainer sets may \emph{not} be equal to the
899 The discrepancy is attributed to those live object which are not reachable
900 from the set of roots.
901 Still it is a good estimate of the size of the heap at the moment when
902 the retainer profiling was performed.
904 The @.prof@ file also shows the contents of every retainer set which
905 has been assigned a positive cost (i.e., the field @cost@) at least once;
906 not every retainer set created is assigned a positive cost because quite
907 a few retainer sets are created as intermediate retainer sets before
908 creating a real retainer set. This results from the restriction on the way
909 retainer sets are created (only one retainer can be added to an existing
910 retainer set at a time).
912 An example of the contents of a retainer set is:
915 SET 71 = {<doFile[Main],main[Main],MAIN[MAIN]>, <synth_2[Main],doFile[Main],main[Main],MAIN[MAIN]>}
918 The retainer set has an identification number $71$.
919 It is associated with two retainers, whose retainer identities are shown
920 inside angle brackets @<...>@.
921 For instance, the first retainer is created when the cost centre stack
922 is @doFile[Main],main[Main],MAIN[MAIN]@, shown from the top to the bottom.
923 Each entry in angle brackets consists of a cost centre name (e.g., @doFile@)
924 and its module name (e.g., @Main@).
926 The @.hp@ file can be supplied to the @hp2ps@ program to create a postscript
927 file showing the progress of retainer profiling in a graph:
934 An example of such a graph is shown in Figure~\ref{fig-cacheprof}.
935 It shows the cost assigned to each retainer set at the point
936 when a retainer profiling is performed (marked by a corresponding inverted
937 triangles on the horizontal axis).
938 The abbreviated contents of each retainer set is displayed in the right column.
939 Due to the space limitation,
940 it shows only topmost cost centres (without module names)
941 instead of printing the full contents.
942 For instance, @(71)doFile,synth_2@ corresponds to a retainer set shown above
943 (@71@ is its identification number).
944 The contents may be truncated if it is too long.
946 Notice that the time is in the Haskell mutator time, which excludes
947 the runtime system time such as garbage collection time and retainer profiling
948 time. Thus, the actual execution takes longer than indicated in the
949 graph. Also, the timer employed to periodically perform retainer profiling
950 is not perfectly accurate. Therefore, the result may slightly vary for each
951 execution of retainer profiling.
955 \epsfig{file=cacheprof_p.eps,width=5in}
956 \caption{A graph showing the progress of retainer profiling}
957 \label{fig-cacheprof}
960 \section{Comparision with nhc}
964 This section gives a summary of changes made to the GHC in
965 implementing retainer profiling.
966 Only three files (@includes/StgRetainerProf.h@, @RetainerProfile.c@, and
967 @RetainerProfile.h@) are new, and all others exist in the GHC.
969 @\includes@ directory:
972 \item[StgRetainerProf.h] defines types @retainer@ and @retainerSet@.
973 \item[Stg.h] includes the header file @StgRetainerProf.h@.
974 \item[Closures.h] changes structure @StgProfHeader@.
975 \item[Constants.h] changes constants @PROF_HDR_SIZE@, @SCC_UF_SIZE@, and
976 @SCC_SEQ_FRAME_SIZE@.
977 \item[ClosureMacros.h] changes macros @SET_PROF_HDR()@ and
978 @SET_STATIC_PROF_HDR()@.
979 \item[Updates.h] changes macro @PUSH_STD_CCCS()@.
985 \item[Exception.hc] changes constant @CATCH_FRAME_BITMAP@,
986 \item[StgStartup.hc] changes constant @STOP_THREAD_BITMAP@.
987 \item[StgStdThunks.hc] changes constant @RET_BITMAP@.
988 \item[Updates.hc] changes constant @UPD_FRAME_BITMAP@.
989 \item[RetainerProfile.c] implements the retainer profiling engine.
990 \item[RetainerProfile.h] is the header for @RetainerProfile.c@.
991 \item[GC.c] invokes @resetStaticObjectForRetainerProfiling()@ in
993 \item[Itimer.c] changes @handle_tick()@.
994 \item[ProfHeap.c] changes @initHeapProfiling()@ and @endHeapProfiling()@.
995 \item[Profiling.c] changes @initProfilingLogFile()@ and
996 @report_ccs_profiling()@.
997 \item[Proftimer.c] declares @ticks_to_retainer_profiling@,
998 @performRetainerProfiling@, and @doContextSwitch@.
999 \item[Proftimer.h] is the header for @Proftimer.c@.
1000 \item[RtsAPI.c] implements @setRetainerField()@.
1002 sets @RtsFlags.ProfFlags.doHeapProfile@ and
1003 adds a string to @usage_text[]@ in @setupRtsFlags()@.
1004 \item[RtsFlags.h] defines constants @HEAP_BY_RETAINER@ and @RETAINERchar@.
1005 \item[RtsStartup.c] includes the header file @RetainerProfile.h@.
1006 Changes @shutdownHaskell()@.
1007 \item[Schedule.c] changes @schedule()@.
1009 declares @RP_start_time@, @RP_tot_time@, @RPe_start_time@,
1011 Changes @mut_user_time_during_GC()@, @mut_user_time()@,
1013 @stat_endExit()@, and
1016 @mut_user_time_during_RP()@,
1017 @stat_startRP()@, and
1019 \item[Stats.h] is the header for @Stats.c@.
1020 \item[StgMiscClosures.hc] redefines @stg_DEAD_WEAK_info@.
1021 \item[Storage.c] changes @initStorage()@, @memInventory()@.
1024 \bibliographystyle{plain}
1025 \bibliography{reference}