ghc/docs/storage-mgt/rp.tex

   1 \documentclass{article}
   2 \usepackage{code,a4wide}
   3
   4 \usepackage{graphics,epsfig,epic,eepic,epsfig}
   5
   6 \setlength{\parskip}{0.25cm}
   7 \setlength{\parsep}{0.25cm}
   8 \setlength{\topsep}{0cm}
   9 \setlength{\parindent}{0cm}
  10 \renewcommand{\textfraction}{0.2}
  11 \renewcommand{\floatpagefraction}{0.7}
  12
  13
  14 % Terminology
  15 \newcommand{\block}{block}
  16 \newcommand{\Block}{Block}
  17 \newcommand{\segment}{segment}
  18 \newcommand{\Segment}{Segment}
  19 \newcommand{\step}{step}
  20 \newcommand{\Step}{Step}
  21
  22 \newcommand{\note}[1]{{\em $\spadesuit$ #1}}
  23
  24 \begin{document}
  25 \title{Implementation of Retainer Profiling}
  26 \author{Sungwoo Park}
  27
  28 \makeatactive
  29 \maketitle
  30
  31 \section{Retainer Profiling}
  32
  33 Retainer profiling~\cite{CN} is a profiling technique which is based upon a
  34 special view of production and consumption of heap objects at runtime:
  35 while producers build heap objects to form new pieces of graph,
  36 consumers hold pointers to these heap objects, or \emph{retain} them, so
  37 that they are not freed during garbage collections.
  38 On this basis, we refereed to such consumers as \emph{retainers}.
  39 Notice that an object can have more than one retainer because it can
  40 be pointed to by multiple objects.
  41
  42 For each live object in the heap, retainer profiling computes
  43 all its retainers, or its \emph{retainer set}.
  44 A naive implementation of retainer profiling could consider every
  45 immediate ancestor of an object as its retainer.
  46 Although this approach appears to provide an accurate report on the
  47 relationship between retainers and retainees, the result can hardly be useful.
  48 For instance, it is pointless to scrutinize a list and treat each cons
  49 cell as a retainer of the following cons cell.
  50 This observation suggests that we need to have a way of designating only
  51 certain types of objects as candidates for retainers.
  52 In other words, we need to employ an oracle which tells whether a given
  53 object can be a retainer or not.
  54
  55 Since no retainer of a particular object needs to be using the
  56 object actively, we can find all its retainers simply by traversing
  57 the graph. In other words, we do not have to distinguish those retainers
  58 actively exploiting it from other retainers just holding pointers
  59 to it (either directly or indirectly).
  60 Thus, retainer profiling can be accomplished simply by traversing the
  61 graph.
  62
  63 Figure~\ref{fig-retaineralgorithm} shows the algorithm for retainer
  64 profiling. The traversal begins at every root, and proceeds
  65 in a depth first manner (or a breadth first manner).
  66 The function @R()@ returns the identity of a given retainer such
  67 as its memory address, its information table address, or
  68 the name of the module
  69 which creates it.
  70 Notice that the retainer identity function does not need to be a
  71 one-to-one mapping:
  72 multiple objects can share the same identity.
  73 Such a retainer identity function reduces the cost of traversal.
  74 For instance, when an object
  75 is reached from several retainers which share the same identity, we need to
  76 consider only the first visit to the object.
  77 In other words, whichever retainer (among those sharing the same identity)
  78 leads to the object for the first time affects the retainer set of the object
  79 in consideration
  80 and all the other retainers can be ignored.
  81 Thus, the more coarse the function @R()@ is, the less
  82 it costs to traverse the graph for retainer profiling.
  83 The function @isRetainer()@ tells whether a given object is a retainer or not.
  84 Hence, the behavior of the retainer profiling algorithm is determined
  85 uniquely by: 1) the set of roots; 2) the function @R()@; 3) the function
  86 @isRetainer()@.
  87
  88 \begin{figure}[ht]
  89 \begin{center}
  90 \begin{code}
  91 for every root r
  92   retain(r, r)
  93
  94 R(r) =
  95   the identity of r
  96
  97 isRetainer(c) =
  98   if c is a retainer then
  99     true
 100   else
 101     false
 102
 103 retain(c, r) =
 104   if R(r) is a member of c.retainerSet then
 105     return
 106   add R(r) to c.retainerSet
 107   if isRetainer(c) then
 108     r' := c
 109   else
 110     r' := r
 111   for every successor c' of c
 112     retain(c', r')
 113 \end{code}
 114 \caption{Retainer profiling algorithm}
 115 \label{fig-retaineralgorithm}
 116 \end{center}
 117 \end{figure}
 118
 119 Another way of formulating retainer profiling is in terms of fixed point
 120 equations on retainer sets.
 121 To be specific, given the two functions @isRetainer()@ and @R()@,
 122 the retainer set of every object is computed as the least fixed point
 123 solution of the following equations:
 124 \begin{itemize}
 125 \item For every root @r@, @R(r)@ $\in$ @r.retainerSet@.
 126 \item For every object @c@,
 127 $\bigcup_{\mathit{each\ ancestor\ @a@\ of\ @c@}}$ @from(a)@ $\subseteq$
 128 @c.retainerSet@.
 129 \item @from(a)@ = if @isRetainer(a)@ then @a.retainerSet@ else $\{@a@\}$.
 130 \end{itemize}
 131
 132 This document describes the implementation of retainer profiling on
 133 the Glasgow Haskell Compiler runtime system.
 134 It explains every detail of the development so that it can be (hopefully)
 135 a complete maintenance guide.
 136 A secondary goal is to help (hopefully) those who wish to extend the system
 137 to implement another profiling scheme.\footnote{Unless otherwise mentioned,
 138 all identifiers are defined in @RetainerProfile.c@}
 139
 140 \section{Installing the GHC}
 141
 142 Installing the GHC is done as follows:
 143
 144 \begin{enumerate}
 145 \item Get the source code from the CVS repository.
 146 \begin{code}
 147 ./              cvs checkout fpconfig
 148 ./              cvs checkout fptools
 149 \end{code}
 150
 151 \item Set up the basic configuration.
 152 \begin{code}
 153 ./fptools/      autoconf
 154 ./fptools/ghc/  autoconf
 155 ./fptools/      configure
 156 \end{code}
 157
 158 \item Set up the configuration for development and debugging.
 159 \begin{code}
 160 ./fptools/mk    vi build.mk
 161     GhcHcOpts = -O -fasm -Rghc-timing
 162     SplitObjs = NO
 163     GhcLibWays = p
 164     GhcRtsHcOpts =
 165     GhcRtsCcOpts = -g
 166     STRIP =:
 167 \end{code}
 168 @GhcLibWays@ tells the compiler to build the code for profiling as well.
 169 @GhcRtsHcOpts@ has additional flags for @gcc@ when compiling @.hc@ files.
 170 @GhcRtsCcOpts@ has additional flags for @gcc@ when compiling @.c@ files.
 171 Since we will implement retainer profiling in @.c@ files, we turn on the
 172 debugging flag @-g@.
 173 The empty setting for @STRIP@ tells the compiler not to remove source code
 174 information (generated due to the @-g@ option) from executable files so that
 175 they can be examined with @gdb@.
 176
 177 \item Remove unnecessary files if needed and build everything.
 178 \begin{code}
 179 ./fptools/      rm dll green-card haggis happy hdirect hood libraries
 180 ./fptools/      make
 181 \end{code}
 182 \end{enumerate}
 183
 184 \section{Adding Retainer Set Fields}
 185
 186 Since every Haskell closure now needs to store its retainer set at runtime,
 187 it must be augmented with a new field,
 188 namely, a emph{retainer set field}.
 189 This section explains how to add such a field to Haskell closures.
 190 It should be clear how to generalize the idea for adding
 191 any number of new fields.\footnote{The GHC provides two
 192 ways of building executable programs from
 193 source files: normal way and profiling way.
 194 We are concerned only about profiling way, and all the pieces of code
 195 implementing profiling way are wrapped by the @PROFILING@
 196 pre-processing directive (as in @\#ifdef PROFILING@).
 197 Therefore, all the additions and changes that we make to the source code
 198 are assumed to be wrapped by the @PROFILING@ pre-processing
 199 directive as well unless otherwise mentioned.}
 200
 201 \subsection{Adding a new field to Haskell closures}
 202
 203 We want to add a retainer set field of type @retainerSet@ to every
 204 closure, so we create a new file @includes/StgRetainerProf.h@ where
 205 we define the type @retainerSet@.
 206 The actual definition of @retainerSet@ will be given later.
 207
 208 \begin{code}
 209 /* includes/StgRetainerProf.h */
 210 typedef ... retainerSet;
 211 \end{code}
 212
 213 We make type @retainerSet@ to be publicly available by including
 214 @includes/StgRetainerProf.h@ itself to @includes/Stg.h@ (not wrapped
 215 by @PROFILING@).
 216
 217 \begin{code}
 218 /* includes/Stg.h */
 219 #include "StgRetainerProf.h"
 220 \end{code}
 221
 222 Then we add a retainer set field @rs@ to the @StgProfHeader@ structure.
 223
 224 \begin{code}
 225 /* include/Closures.h */
 226 typedef struct {
 227    CostCentreStack *ccs;
 228    retainerSet *rs;
 229 } StgProfHeader;
 230 \end{code}
 231
 232 Now every closure @c@ (including static closures) has a retainer set field,
 233 which can be accessed with @c->header.prof.rs@ (where @c@ denotes the
 234 address of the closure).
 235
 236 \subsection{Changing constants}
 237
 238 We are ready to reflect the new size of Haskell closures to other part
 239 of the source code.
 240 This is accomplished by changing a few constants which specify the size
 241 of certain types of closures and their layout.
 242
 243 When building the runtime system, the @gcc@ compiler correctly figures out
 244 the size of every structure on its own.
 245 However,
 246 the GHC simply reads @includes/Constants.h@ to to determine the size of
 247 closures assumed by the runtime system.
 248 Thus, we change the constants used by the GHC itself (as opposed to
 249 the runtime system). They are all found in @includes/Constants.h@.
 250 We increase each of them by 1 to reflect the retainer set field which is one
 251 word long:
 252 \begin{code}
 253 /* includes/Constants.h */
 254 #define PROF_HDR_SIZE       2
 255 #define SCC_UF_SIZE         5
 256 #define SCC_SEQ_FRAME_SIZE  4
 257 \end{code}
 258 @PROF_HDR_SIZE@ denotes the size of the structure @StgProfHeader@, which
 259 is now two words long.
 260 @SCC_UF_SIZE@ and @SCC_SEQ_FRAME_SIZE@ denote the size of the structures
 261 @StgUpdateFrame@ and @StgSeqFrame@ (in @includes/Closures.h@) in
 262 words.
 263
 264 Since these constants are used when building the GHC (by another compiler),
 265 we must rebuild the GHC. In other words, when executed, the code generated by
 266 the GHC must now allocate one more word for the retainer set field than before,
 267 and hence it is inevitable to rebuild the GHC, which is done as follows:
 268
 269 \begin{code}
 270 ./fptools/ghc/  make clean
 271 ./fptools/ghc/  make boot
 272 ./fptools/ghc/  make
 273 \end{code}
 274
 275 The second command @make boot@ instructs the build system to analyze
 276 the source code dependency so that the next execution of @make@ correctly
 277 finds all required files.
 278
 279 Next we change four bitmap constants which specify the layout of
 280 certain types of closures.
 281 As an example, let us consider @RET_BITMAP@, which specifies the layout
 282 of thunk selectors (corresponding to closure type @THUNK_SELECTOR@).
 283 Without a retainer set field, there is only one non-pointer (represented
 284 by $1$) followed by one or more pointers (represented by $0$) in the closure
 285 body and the bitmap representation is $0b1$, or $1$.
 286 With a retainer set field, which is not a pointer to another closure and thus
 287 represented by $1$, there are two non-pointers, and the bitmap representation
 288 is $0b11$, or $3$. Notice that the bitmap is interpreted in reverse order:
 289 the least significant bit corresponds to the first word in the closure body,
 290 and the second least significant bit to the second word, and so forth.
 291 The same rule applies to the other three bitmap constants:
 292 @CATCH_FRAME_BITMAP@ (for closure type @CATCH_FRAME@ and structure
 293 @StgCatchFrame@),
 294 @STOP_THREAD_BITMAP@ (for closure type @STOP_FRAME@ and structure
 295 @StgStopFrame@), and
 296 @UPD_FRAME_BITMAP@ (for closure type @UPDATE_FRAME@ and structure
 297 @StgUpdateFrame@).
 298
 299 \begin{code}
 300 /* rts/StgStdThunks.hc */
 301 #define RET_BITMAP          3
 302 /* rts/Exception.hc */
 303 #define CATCH_FRAME_BITMAP  15
 304 /* rts/StgStartup.hc */
 305 #define STOP_THREAD_BITMAP  3
 306 /* rts/updates.hc */
 307 #define UPD_FRAME_BITMAP    7
 308 \end{code}
 309
 310 For most closure types, the new definition of @StgProfHeader@ is
 311 automatically propagated to their corresponding structures.
 312 However, there are six closures types which are not affected by
 313 @StgProfHeader@. They are all stack closures:
 314 @RET_DYN@, @RET_BCO@, @RET_SMALL@, @RET_VEC_SMALL@, @RET_BIG@, and
 315 @RET_VEC_BIG@.
 316 If you want a new field to be added to these closures, you may
 317 have to modify their corresponding structures.
 318
 319 \textbf{To do:} Currently the above changes introduce a new bug in the
 320 runtime system. For instance, @nofib/real/symalg@ ends up with a division-by-zero
 321 exception if we add a new field.
 322
 323 \subsection{Initialization code}
 324
 325 When a new closure is created, its retainer set field may have to be
 326 initialized according to the way that retainer profiling is implemented.
 327 For instance, we could use as an initial value a pointer to an empty retainer
 328 set.
 329 Alternatively we could assign a null pointer to indicate that its retainer
 330 set has not been computed yet, which we adopt in our implementation.
 331 In either case, we have to visit the new closure and execute some initialization
 332 code on it so that its retainer set field is set to an appropriate value.
 333
 334 There are four parts in the source code which need to be modified.
 335 dynamic closure initialization, static closure initialization,
 336 closure creation through C function invocation,
 337 and update frame initialization.
 338 The first is accomplished by modifying the macro @SET_PROF_HDR()@ (in
 339 @include/ClosureMacros.h@). When a closure  @c@ is created at runtime,
 340 @SET_PROF_HDR()@ is invoked immediately with @c@ as its first argument.
 341 Thus, the following code initializes the retainer set field of every
 342 dynamic closure to a null pointer.
 343
 344 \begin{code}
 345 /* include/ClosureMacros.h */
 346 #define SET_PROF_HDR(c,ccs_)            \
 347         ((c)->header.prof.ccs = ccs_, (c)->header.prof.rs = NULL)
 348 \end{code}
 349
 350 Similarly, the macro @SET_STATIC_PROF_HDR()@ (in the
 351 same file) specifies how the retainer set field of every static closure
 352 is initialized, which is rewritten as follows:
 353
 354 \begin{code}
 355 /* include/ClosureMacros.h */
 356 #define SET_STATIC_PROF_HDR(ccs_)       \
 357         prof : { ccs : ccs_, rs : NULL },
 358 \end{code}
 359
 360 There is another way of creating dynamic closures through explicit C
 361 function invocations at runtime.
 362 Such functions are all defined in @RtsAPI.c@: @rts_mkChar()@, @rts_mkInt()@,
 363 @rts_mkWord()@, and so forth.
 364 Each function allocates memory for a new closure,
 365 initializes it, and returns its address.
 366 Therefore, we can simply insert in each function another initialization code
 367 for retainer set fields.
 368 To this end, we define a macro @setRetainerField()@ and insert it
 369 in each function:
 370
 371 \begin{code}
 372 #define setRetainerField(p)             \
 373   (p)->header.prof.rs = NULL
 374 \end{code}
 375
 376 For instance, @rts_mkChar()@ is now defined as follows:
 377
 378 \begin{code}
 379 /* RtsAPI.c */
 380 HaskellObj
 381 rts_mkChar (HsChar c)
 382 {
 383   StgClosure *p = (StgClosure *)allocate(CONSTR_sizeW(0,1));
 384   ...
 385   setRetainerField(p);
 386   return p;
 387 }
 388 \end{code}
 389
 390 Finally we may need to initialize the retainer set field of an update frame
 391 (stack closure) when it is pushed onto the stack for the first
 392 time.\footnote{In our implementation of
 393 retainer profiling, the retainer set field is not used for any stack closure.
 394 Hence, the following modification is entirely unnecessary.
 395 Also, update frames are the only exception to the standard way of creating
 396 stack closures: all the other types of stack closures with a retainer set
 397 field are eventually initialized by
 398 the macro @SET\_HDR()@ (in @includes/ClosureMacros.h@), which in turn
 399 invokes @SET\_PROF\_HDR()@. This is not the case for update frames.
 400 Compare @PUSH\_UPD\_FRAME()@ (in @includes/Updates.h@) and
 401 @PUSH\_SEQ\_FRAME()@ (in @includes/StgMacros.h@) for clarification.}
 402 For instance, if we want to initialize the retainer set field of update
 403 frames to a null pointer, we can rewrite the macro @PUSH_STD_CCCS()@
 404 (in @includes/Updates.h@) as follows:
 405
 406 \begin{code}
 407 /* includes/Updates.h */
 408 #define PUSH_STD_CCCS(frame)            \
 409   (frame->header.prof.ccs = CCCS, frame->header.prof.rs = NULL)
 410 \end{code}
 411
 412 \section{Retainer Sets}
 413
 414 At the end of retainer profiling, every live closure (except stack
 415 closures, for which we do not compute retainer sets) is associated with
 416 a retainer set; there can be no closure without an associated retainer set
 417 because every live closure is visited during traversal.
 418 Since many closures may well be associated with a common retainer set,
 419 we want to avoid creating any particular retainer set more than once.
 420 This section presents the details of manipulating retainer sets in our
 421 implementation.
 422
 423 \subsection{Identity of a retainer}
 424
 425 The function @R()@ in Figure~\ref{fig-retaineralgorithm} returns
 426 the identity of a retainer. In order to implement it, we need
 427 a type for retainer identity.
 428 The type @retainer@ (in @includes/StgRetainerProf.h@) is introduced for
 429 this purpose.
 430
 431 There are various ways of defining the type @retainer@.
 432 For instance, we can designate the information table address of a retainer as
 433 its identity as follows:
 434
 435 \begin{code}
 436 struct _StgInfoTable;
 437 typedef struct _StgInfoTable *retainer;
 438 \end{code}
 439
 440 We can also use the cost centre stack associated with the retainer:
 441
 442 \begin{code}
 443 typedef CostCentreStack *retainer;
 444 \end{code}
 445
 446 The function @R()@ is embodied as the function @getRetainerFrom()@ in the
 447 implementation, whose type is @(StgClosure *)@ $\rightarrow$ @retainer@.
 448 It is straightforward to define @getRetainerFrom()@ according to the definition
 449 of @retainer@, as illustrated below:
 450
 451 \begin{code}
 452 retainer getRetainerFrom(StgClosure *c) { return get_itbl(c); }
 453 retainer getRetainerFrom(StgClosure *c) { return c->header.prof.ccs; }
 454 \end{code}
 455
 456 \subsection{Retainer sets and the cost function}
 457
 458 A retainer set is stored in the structure @retainerSet@
 459 (in @includes/StgRetainerProf.h@):
 460
 461 \begin{code}
 462 typedef struct _retainerSet {
 463   nat num;
 464   nat cost;
 465   ...
 466   int id;
 467   retainer element[0];
 468 } retainerSet;
 469 \end{code}
 470
 471 The field @num@ gives the number of retainers in the retainer set, which
 472 are all stored in the array @element[]@. Thus, the size of @element[]@
 473 is assumed to be @num@.
 474 The field @cost@ gives the sum of the \emph{costs} of those closures
 475 associated with the retainer set: if a closure @c@ is
 476 associated with the retainer set, that is, if @c@ is retained by each
 477 retainer in the retainer set and none else,
 478 the cost of @c@ is added to the field @cost@.
 479 The field @id@ gives a unique identification number for the retainer set.
 480
 481 We define a \emph{cost function}, which returns the cost of a given closure,
 482 in order to compute the field @cost@.
 483 The cost function can be defined in several ways.
 484 A typical definition is on the size of a closure, which results in
 485 the field @cost@ accumulating the size of all the closures retained by a
 486 retainer set.
 487 If we just want to count the number of closures retained by the
 488 retainer set, we can simply set the cost of every closure to one regardless
 489 of its closure type.
 490 Furthermore, we can define the cost function flexibly according to
 491 the closure type.
 492 For instance, we can set the size of any static closure to zero so that
 493 it is not taken into account at all in computing the field @cost@.
 494 Notice that static closures are also visited during traversal because they
 495 may lead to other dynamic closures (e.g., static indirection closures of
 496 closure type @IND_STATIC@).
 497 This is especially desirable because we usually focus on the heap use.
 498 We can also selectively choose certain dynamic closure types not to contribute
 499 to the field @cost@.
 500
 501 In our implementation, there are two functions related with the cost function:
 502 @cost()@ and @costPure()@.
 503 @cost()@ returns the size of the entire memory allocated for a given closure
 504 (even including the two fields in the structure @StgProfHeader@).
 505 It returns zero for static closures.
 506 @costPure()@ returns the size of the memory which would be allocated for
 507 a given closure with no profiling.
 508 It is defined in terms of @cost()@, and it suffices to change only @cost()@
 509 when a new scheme for the cost function is desired.
 510 @costPure()@ is put to actual use in computing the field @cost@ because it
 511 effectively hides the memory overhead incurred by profiling.
 512
 513 \subsection{Implementation}
 514
 515 The algorithm for retainer profiling in Figure~\ref{fig-retaineralgorithm}
 516 adds at most one retainer to an existing retainer set (or an empty retainer set)
 517 at any moment; it does not require a retainer set union operation.
 518 This observation simplifies the implementation, and
 519 we employ the following two functions for creating new retainer sets:
 520 @singleton()@, which creates a singleton retainer set, and
 521 @addElement()@, which adds an element to an existing retainer set.
 522
 523 It is a frequent operation during retainer profiling to search for a retainer
 524 set, which may or may not exist, built from a given retainer set and a
 525 particular retainer.
 526 To efficiently implement this operation,
 527 we choose to store all retainer sets in a hash table and
 528 the structure @retainerSet@ is now extended with two new fields
 529 @hashKey@ and @link@.
 530 The field @hashKey@ stores the hash key which is obtained
 531 from the retainers in a retainer set.
 532 The field @link@ points to the next retainer set in the same bucket:
 533
 534 \begin{code}
 535 typedef struct _retainerSet {
 536   ...
 537   StgWord hashKey;
 538   struct _retainerSet *link;
 539   ...
 540 } retainerSet;
 541 \end{code}
 542
 543 The hashing function must be defined in such a way that a retainer set
 544 can have only one unique hash key regardless of the order its elements
 545 are stored, i.e., the hashing function must be additive.
 546
 547 It is often observed that two successive executions of retainer profiling share
 548 a number of retainer sets in common, especially if the two executions are
 549 close in time.
 550 This also implies that the number of all retainer sets which can be created
 551 at any moment does not grow indefinitely regardless of the interval at which
 552 retainer profiling is performed; it does not grow commensurately with the
 553 number of times retainer profiling is executed.
 554 This observation eliminates the need to free the memory allocated for
 555 retainer sets; we can simply set the @cost@ field of every retainer set
 556 to zero after each retainer profiling and reuse it during the next time.
 557
 558 \section{Graph Traversal}
 559
 560 At the heart of retainer profiling lies \emph{graph traversal};
 561 the algorithm in Figure~\ref{fig-retaineralgorithm} is supposed to visit
 562 every closure in the graph at least once and yield statistics on the heap use.
 563 Since only live closures are reachable from the root, the algorithm
 564 does not deal with dead closures.
 565
 566 This section presents details on how to achieve an efficient implementation of
 567 graph traversal without incurring extra memory overhead and compromising speed.
 568
 569 \subsection{Goal}
 570
 571 Traversing a graph itself can be done in a straightforward way;
 572 we choose either depth first search or breadth first search, and traverse
 573 the graph starting from a given set of roots.
 574 After a complete traversal, each live closure @c@ (including static closures)
 575 has an associated retainer set, whose address is stored in the field
 576 @c->header.prof.rs@.
 577
 578 A real complication arises when retainer profiling is performed once again:
 579 all live closures which have survived all garbage collections since
 580 the previous retainer profiling
 581 still have an associated retainer set (indicated by
 582 a non-null pointer in their retainer set field), which is no longer
 583 valid. Any new closure created since then has
 584 a null pointer in its retainer set field at the beginning of retainer
 585 profiling and will become associated with a retainer set.
 586 Thus, we can no longer distinguish valid retainer set fields
 587 from invalid ones.
 588
 589 A simple remedy is to linearly scan the heap at the beginning of each
 590 retainer profiling and set all retainer set fields to a null pointer.
 591 It resets the retainer set field of each dynamic closure, whether it is
 592 live or not with respect to the given set of root.
 593 This is feasible because any closure in the heap directly adjoins the
 594 next closure, if any.
 595 The problem is that we have no way of visiting all live static closures,
 596 for which we compute retainer sets.
 597
 598 A moment of thought, however, reveals that we can completely avoid computing
 599 retainer sets for static closures. This is because retainer profiling is
 600 concerned only about the heap, which consists of dynamic closures and no
 601 static closures. In other words, we can treat every static closure as
 602 a bridge connecting two dynamic closures.
 603 For instance, if a dynamic closure @c@$_1$ has a pointer to a static
 604 closure @s@ and @c@ has a pointer to another dynamic closure @c@$_2$,
 605 we can think of the pointer in @c@$_1$ as a direct pointer to @c@$_2$.
 606 The big problem is that if the graph has a cycle containing static closures,
 607 an infinite loop occurs. In other words, we have no way of telling whether
 608 a static closure has been visited or not and are forced to compute
 609 retainer sets for static closures as well.\footnote{For instance,
 610 a static closure is allowed to have a self-reference in its SRT, which
 611 is also followed during retainer profiling.}
 612
 613 Another remedy is to stores in every closure a time stamp for the
 614 retainer set field. The time stamp indicates whether the retainer
 615 set field is valid or no longer valid (i.e., it is for the previous
 616 retainer profiling).
 617 At the cost of one extra field in each closure, we can achieve an
 618 elegant implementation with little complication.
 619 However, it turns out that the memory overhead is too big.\footnote{A typical
 620 dynamic closure is only two or three words long.}
 621 Thus, our goal is to stick to the definition of the structure @StgProfHeader@
 622 given earlier and yet to achieve an elegant solution.
 623
 624 \subsection{Basic plan}
 625
 626 Since we visit every live object and update its retainer set field,
 627 any retainer set field can either be valid (the corresponding retainer
 628 set is valid) or point to a retainer set created during the previous
 629 retainer profiling.
 630 In order to distinguish valid retainer set fields
 631 from invalid ones, we exploit the least significant bit of the retainer
 632 set field: we maintain a one bit mark which flips over every time
 633 retainer profiling is performed, and judge that a retainer set field is
 634 valid only if its least significant bit matches the mark.
 635 The variable @flip@ serves for this purpose.
 636 The macros @isRetainerSetFieldValid()@ tests if the retainer set field
 637 of a give closure @c@ is valid:
 638
 639 \begin{code}
 640 #define isRetainerSetFieldValid(c) \
 641   ((((StgWord)(c)->header.prof.rs & 1) ^ flip) == 0)
 642 \end{code}
 643
 644 As an example, a retainer set field can be set to a null value conforming
 645 the current value of @flip@ by the macro @setRetainerSetToNull()@:
 646
 647 \begin{code}
 648 #define setRetainerSetToNull(c)   \
 649   (c)->header.prof.rs = (retainerSet *)((StgWord)NULL | flip)
 650 \end{code}
 651
 652 Now, when a dynamic closure @c@ is created, its retainer set field is
 653 initialized to a null value conforming to the current value of
 654 @flip@:\footnote{Actually this is not mandatory: even when the null
 655 value does not conform to the current value of @flip@, it will be replaced
 656 by a correct null value when @c@ is visited later.}
 657
 658 \begin{code}
 659 extern StgWord flip;
 660 #define SET_PROF_HDR(c,ccs_)            \
 661         ((c)->header.prof.ccs = ccs_, (c)->header.prof.rs = (retainerSet *)((StgWord)NULL | flip))
 662 \end{code}
 663
 664 We do not need to revise @SET_STATIC_PROF_HDR()@ if the initial value of
 665 @flip@ is set to $0$.\footnote{For the same reason, an initial value $1$
 666 does not compromise the correctness of the implementation.}
 667
 668 \subsection{Set of roots}
 669
 670 The set of roots consists of all thread closures (running, sleeping, or
 671 blocked) existing at the beginning of a retainer profiling.
 672 It is handily obtained in an indirect way by invoking the function
 673 @GetRoots()@ (in @Schedule.c@) with an appropriate argument, which must be
 674 a function:
 675 @GetRoots()@ invokes on each root known to the runtime system its argument.
 676 Thus, we implement a function @retainClosure()@, which initiates traversal
 677 from a given root and updates the retainer set of every closure reachable
 678 from the root,
 679 and invokes @GetRoots()@ with @retainClosure@ as an argument.
 680
 681 In addition to the thread closures, weak pointers are also considered
 682 as roots; they may not be reachable from any thread closure yet are still
 683 being in used.
 684 A weak pointer has three pointer fields: @key@, @value@, and
 685 @finalizer@ (see the structure @StgWeak@ in @includes/Closures.h@).
 686 It turns out that these pointers may not be valid all the time:
 687 at a certain point during execution, for instance, the pointer @key@ may point
 688 to a dead closure.
 689 However, right after a major garbage collection, all the three pointers are
 690 guaranteed to be valid, i.e., they all point to live closures.
 691 This facilitates the handling of weak pointers if we choose to
 692 perform retainer profiling immediately after a major garbage collection.
 693 All weak pointers are found in the linked list @weak_ptr_list@
 694 (in @Weak.c@).
 695
 696 See the function @computeRetainerSet()@ for details.
 697
 698 \subsection{Static closures}
 699
 700 When a live dynamic closure @c@ is visited for the first time during traversal,
 701 its retainer set field is checked against the current value of @flip@.
 702 If it was created at some point since the previous retainer profiling,
 703 its retainer set field is already set to a correct null value.
 704 Otherwise, it must have been visited
 705 during the previous retainer profiling and thus its retainer set field is
 706 invalid and will be set to a correct null value.
 707 Therefore it is unnecessary to visit all dynamic closures and set their
 708 retainer set field to a correct null value at the beginning of each retainer
 709 profiling.
 710
 711 However, this operation is required for static closures.
 712 The reason is that a static closure, which is never garbage collected,
 713 may appear alternately in the set of live closures.
 714 In other words, a currently live static closure may become dead and
 715 be resuscitated again.
 716 Therefore, for a static closure, it does not help to check if its
 717 retainer set field conforms to the current value of @flip@.
 718 For instance,
 719 if a static closure happens to belong to the set of live closures every other
 720 retainer profiling, its retainer set field will never set to a null value,
 721 which is disastrous.
 722 Therefore, we choose to visit all live static closures at the beginning
 723 of each retainer profiling and set their retainer set field to a
 724 correct null value.
 725
 726 In order to find all live static closures, we have each retainer
 727 profiling preceded by a major garbage collection, which knows all live
 728 static closures.\footnote{This is a heavy
 729 restriction on retainer profiling, which makes retainer profiling partially
 730 dependent on garbage collection.
 731 However, it does not affect any retainer profiling result because
 732 retainer profiling considers only live closures, which survive any
 733 garbage collection.}
 734 To be specific, the garbage collector builds a linked list
 735 @scavenged_static_objects@ (in @GC.c@) during a major garbage collection,
 736 which stores all live static closures of our interest.
 737 \footnote{
 738 A static closure of closure type @IND\_STATIC@ may be put in the
 739 list @mut\_once\_list@ of the oldest generation, instead of the list
 740 @scavenged\_static\_objects@.
 741 In our implementation, such a closure is just skipped over because it
 742 contains only a pointer to a dynamic closure, and we do not compute
 743 its retainer set.
 744 Thus, there is no need to traverse the list @mut\_once\_list@ of the oldest
 745 generation.}
 746 Since it destroys the linked list after finishing the major garbage collection
 747 (by invoking the function @zero_static_object_list()@ with
 748 @scavenged_static_objects@ as its argument),
 749 we traverse the linked list to set the retainer set field of each
 750 live static closure to a correct null value before its destruction.
 751 This is done by invoking the function
 752 @resetStaticObjectForRetainerProfiling()@.
 753
 754 \subsection{Traversal}
 755
 756 The traversal proceeds in a depth first manner and is implemented
 757 with two mutually recursive functions: @retainStack()@ and @retainerClosure()@.
 758 @retainerStack()@ can be invoked on dynamic closures holding a stack chunk:
 759 closure types @TSO@, @PAP@, and @AP_UPD@.
 760 It in turn invokes @retainerClosure()@ on each closure reachable from
 761 stack closures in the stack chunk. Notice that it does not invoke
 762 @retainerClosure()@ on those stack closures because we do not compute
 763 retainer sets for stack closures.
 764 @retainerClosure()@ iteratively traverses all live closures reachable
 765 from a given closure.
 766 It maintains its own stack to record the next scan position in every closure
 767 currently under consideration.\footnote{A recursive version of
 768 @retainerClosure()@ could be implemented easily.
 769 @retainerClosure()@ in our implementation is an iterative version.}
 770 When it encounters a closure holding a stack chunk, it invokes @retainerStack()@
 771 on that closure.
 772 Hence,
 773 the traversal is triggered simply by invoking @retainerClosure()@ on every root.
 774
 775 \subsection{Sanity check}
 776
 777 Since we assume that a retainer profiling is preceded by a major garbage
 778 collection,
 779 we expect that the size of all the dynamic closures visited during
 780 any retainer profiling adds up exactly to the total size of the heap.
 781 In fact, this is not the case; there can be closures not reachable from
 782 the set of roots yet residing in the heap even after a major garbage
 783 collection.
 784
 785 First, a dead weak pointer remains in the heap until its finalizer
 786 finishes. Although its finalizer thread closure is part of the set of roots,
 787 the dead weak pointer itself is not reachable from any root.
 788 Since it cannot be visited during retainer profiling anyhow, we do not
 789 need to located it and set its retainer set field
 790 appropriately.\footnote{Dead weak pointers are identified with their
 791 information table @stg\_DEAD\_WEAK\_info@ (in @StgMiscClosures.hc@).
 792 Notice that their closure type is @CONSTR@, \emph{not} @WEAK@;
 793 their information table is replaced by @stg\_DEAD\_WEAK\_info@ in the
 794 function @scheduleFinalizers()@ (in @GC.c@).}
 795
 796 Second,
 797 mutable variables (of closure type @MUT_VAR@) may remain in the heap
 798 even when they are not reachable from the set of roots while
 799 dynamic closures pointed to by them must be live.\footnote{I do not
 800 understand clearly why this happens :(}
 801 Since such mutable variables may become live again (in the sense that
 802 they become reachable from the set of roots), we must locate them
 803 and set their retainer set field appropriately after each retainer
 804 profiling. This is handily accomplished by traversing the list
 805 @mut_once_list@ in every generation.
 806
 807 \section{Retainer Profiling Schemes}
 808
 809 A retainer profiling scheme specifies \emph{what} retainer profiling
 810 yields (as opposed to \emph{how} retainer profiling computes the retainer
 811 set for every live object).
 812 It is determined primarily by the meaning of retainer identity,
 813 that is, the type @retainer@ (in @includes/StgRetainerProf.h@).
 814 The function @getRetainerFrom()@ must be defined according to the
 815 definition of the type @retainer@.
 816
 817 In order for a new retain profiling scheme to fully work, we need to follow
 818 four steps:
 819
 820 \begin{enumerate}
 821 \item Define the type @retainer@ as desired.
 822 \item Write @getRetainerFrom()@.
 823 \item Write two hashing functions @hashkeySingletone()@ and
 824       @hashKeyAddElement()@, which return the hash key from a single
 825       retainer and a retainer set with another retainer, respectively.
 826 \item Write two printing functions @printRetainer()@ and
 827       @printRetainerSetShort()@.
 828       These functions are employed when a retainer or a retainer set is
 829       printed in the output file.
 830 \end{enumerate}
 831
 832 In our implementation, we use cost centre stacks for retainer identity:
 833
 834 \begin{code}
 835 typedef CostCentreStack *retainer;
 836 \end{code}
 837 \begin{code}
 838 retainer getRetainerFrom(StgClosure *c) { return c->header.prof.ccs; }
 839 \end{code}
 840 \begin{code}
 841 void printRetainer(FILE *f, retainer cc)
 842 {
 843   fprintf(f,"%s[%s]", cc->label, cc->module);
 844 }
 845 \end{code}
 846
 847 \section{Usage}
 848
 849 Since cost centre stacks are used as retainer identity, a source program
 850 must be given proper cost centre annotations by programmers.
 851 Alternatively,
 852 we can ask the compiler to automatically insert cost centre annotations.
 853 For instance, the compiler option @-auto-all@ inserts a cost centre
 854 annotation around every top-level function as shown below
 855 (the @-p@ option is a must
 856 because we must build the executable file in a profiling way):
 857
 858 \begin{code}
 859 $ ghc-inplace -o Foo.out -p -auto-all Foo.hs
 860 \end{code}
 861
 862 The runtime system option @-hR@ tells the executable program to
 863 gather profiling statistics and report them in a @.prof@ file:
 864
 865 \begin{code}
 866 $ Foo.out +RTS -hR -RTS
 867 \end{code}
 868
 869 The option @-i@ can be used to
 870 specify a desired interval at which retainer profiling is performed.
 871 The default and minimum value is half a second:
 872
 873 \begin{code}
 874 $ Foo.out +RTS -hR -i2.5 -RTS
 875 \end{code}
 876
 877 Then, two text files are generated: a @.prof@ file and a @.hp@ file.
 878 The @.prof@ file records the progress of retainer profiling:
 879 for each retainer profiling performed during program execution,
 880 it shows
 881 the Haskell mutator time (as opposed to the user time) at which
 882 the retainer profiling starts,
 883 the average number of times a closure is visited,
 884 the sum of costs assigned to all retainer sets (obtained from the field
 885 @cost@ in each retainer set),
 886 and the number of all retainer sets created \emph{since} the beginning
 887 of program execution.
 888 A typical entry in a @.prof@ file looks like:
 889
 890 \begin{code}
 891 Retainer Profiling: 3, at 3.530000 seconds
 892   Average number of visits per object = 1.687765
 893   Current total costs = 801844
 894   Number of retainer sets = 118
 895 \end{code}
 896
 897 The sum of costs assigned to all retainer sets may \emph{not} be equal to the
 898 size of the heap.
 899 The discrepancy is attributed to those live object which are not reachable
 900 from the set of roots.
 901 Still it is a good estimate of the size of the heap at the moment when
 902 the retainer profiling was performed.
 903
 904 The @.prof@ file also shows the contents of every retainer set which
 905 has been assigned a positive cost (i.e., the field @cost@) at least once;
 906 not every retainer set created is assigned a positive cost because quite
 907 a few retainer sets are created as intermediate retainer sets before
 908 creating a real retainer set. This results from the restriction on the way
 909 retainer sets are created (only one retainer can be added to an existing
 910 retainer set at a time).
 911
 912 An example of the contents of a retainer set is:
 913
 914 \begin{code}
 915 SET 71 = {<doFile[Main],main[Main],MAIN[MAIN]>, <synth_2[Main],doFile[Main],main[Main],MAIN[MAIN]>}
 916 \end{code}
 917
 918 The retainer set has an identification number $71$.
 919 It is associated with two retainers, whose retainer identities are shown
 920 inside angle brackets @<...>@.
 921 For instance, the first retainer is created when the cost centre stack
 922 is @doFile[Main],main[Main],MAIN[MAIN]@, shown from the top to the bottom.
 923 Each entry in angle brackets consists of a cost centre name (e.g., @doFile@)
 924 and its module name (e.g., @Main@).
 925
 926 The @.hp@ file can be supplied to the @hp2ps@ program to create a postscript
 927 file showing the progress of retainer profiling in a graph:
 928
 929 \begin{code}
 930 $ hp2ps Foo.hs
 931 $ gv Foo.ps
 932 \end{code}
 933
 934 An example of such a graph is shown in Figure~\ref{fig-cacheprof}.
 935 It shows the cost assigned to each retainer set at the point
 936 when a retainer profiling is performed (marked by a corresponding inverted
 937 triangles on the horizontal axis).
 938 The abbreviated contents of each retainer set is displayed in the right column.
 939 Due to the space limitation,
 940 it shows only topmost cost centres (without module names)
 941 instead of printing the full contents.
 942 For instance, @(71)doFile,synth_2@ corresponds to a retainer set shown above
 943 (@71@ is its identification number).
 944 The contents may be truncated if it is too long.
 945
 946 Notice that the time is in the Haskell mutator time, which excludes
 947 the runtime system time such as garbage collection time and retainer profiling
 948 time. Thus, the actual execution takes longer than indicated in the
 949 graph. Also, the timer employed to periodically perform retainer profiling
 950 is not perfectly accurate. Therefore, the result may slightly vary for each
 951 execution of retainer profiling.
 952
 953 \begin{figure}[ht]
 954 \centering
 955 \epsfig{file=cacheprof_p.eps,width=5in}
 956 \caption{A graph showing the progress of retainer profiling}
 957 \label{fig-cacheprof}
 958 \end{figure}
 959
 960 \section{Comparision with nhc}
 961
 962 \section{Files}
 963
 964 This section gives a summary of changes made to the GHC in
 965 implementing retainer profiling.
 966 Only three files (@includes/StgRetainerProf.h@, @RetainerProfile.c@, and
 967 @RetainerProfile.h@) are new, and all others exist in the GHC.
 968
 969 @\includes@ directory:
 970
 971 \begin{description}
 972 \item[StgRetainerProf.h] defines types @retainer@ and @retainerSet@.
 973 \item[Stg.h] includes the header file @StgRetainerProf.h@.
 974 \item[Closures.h] changes structure @StgProfHeader@.
 975 \item[Constants.h] changes constants @PROF_HDR_SIZE@, @SCC_UF_SIZE@, and
 976   @SCC_SEQ_FRAME_SIZE@.
 977 \item[ClosureMacros.h] changes macros @SET_PROF_HDR()@ and
 978   @SET_STATIC_PROF_HDR()@.
 979 \item[Updates.h] changes macro @PUSH_STD_CCCS()@.
 980 \end{description}
 981
 982 @\rts@ directory:
 983
 984 \begin{description}
 985 \item[Exception.hc] changes constant @CATCH_FRAME_BITMAP@,
 986 \item[StgStartup.hc] changes constant @STOP_THREAD_BITMAP@.
 987 \item[StgStdThunks.hc] changes constant @RET_BITMAP@.
 988 \item[Updates.hc] changes constant @UPD_FRAME_BITMAP@.
 989 \item[RetainerProfile.c] implements the retainer profiling engine.
 990 \item[RetainerProfile.h] is the header for @RetainerProfile.c@.
 991 \item[GC.c] invokes @resetStaticObjectForRetainerProfiling()@ in
 992   @GarbageCollect()@.
 993 \item[Itimer.c] changes @handle_tick()@.
 994 \item[ProfHeap.c] changes @initHeapProfiling()@ and @endHeapProfiling()@.
 995 \item[Profiling.c] changes @initProfilingLogFile()@ and
 996   @report_ccs_profiling()@.
 997 \item[Proftimer.c] declares @ticks_to_retainer_profiling@,
 998   @performRetainerProfiling@, and @doContextSwitch@.
 999 \item[Proftimer.h] is the header for @Proftimer.c@.
1000 \item[RtsAPI.c] implements @setRetainerField()@.
1001 \item[RtsFlags.c]
1002   sets @RtsFlags.ProfFlags.doHeapProfile@ and
1003   adds a string to  @usage_text[]@ in @setupRtsFlags()@.
1004 \item[RtsFlags.h] defines constants @HEAP_BY_RETAINER@ and @RETAINERchar@.
1005 \item[RtsStartup.c] includes the header file @RetainerProfile.h@.
1006   Changes @shutdownHaskell()@.
1007 \item[Schedule.c] changes @schedule()@.
1008 \item[Stats.c]
1009   declares @RP_start_time@, @RP_tot_time@, @RPe_start_time@,
1010   @RPe_tot_time@.
1011   Changes @mut_user_time_during_GC()@, @mut_user_time()@,
1012   @stat_startExit()@,
1013   @stat_endExit()@, and
1014   @stat_exit()@.
1015   Defines
1016   @mut_user_time_during_RP()@,
1017   @stat_startRP()@, and
1018   @stat_endRP()@.
1019 \item[Stats.h] is the header for @Stats.c@.
1020 \item[StgMiscClosures.hc] redefines @stg_DEAD_WEAK_info@.
1021 \item[Storage.c] changes @initStorage()@, @memInventory()@.
1022 \end{description}
1023
1024 \bibliographystyle{plain}
1025 \bibliography{reference}
1026
1027 \end{document}