compiler/profiling/NOTES

   1 Profiling Implementation Notes -- June/July/Sept 1994
   2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   3
   4 Simon and Will
   5
   6 Pre-code-generator-ish
   7 ~~~~~~~~~~~~~~~~~~~~~~
   8
   9 * Automagic insertion of _sccs_ on...
  10
  11   - If -auto is specified, add _scc_ on each *exported* top-level definition.
  12     NB this includes CAFs.  Done by addAutoCostCentres (Core-to-Core pass).
  13
  14   - If -auto-all is specified, add _scc_ on *all* top-level definitions.
  15     Done by same pass.
  16
  17   - Always: just before code generation of module M, onto any CAF
  18     which hasn't already got an explicit cost centre attached, pin
  19     "AllCAFs-M".
  20
  21     Done by finalStgMassageForProfiling (final STG-to-STG pass)
  22
  23     Only the one-off costs of evaluating the CAFs will be attributed
  24     to the AllCAFs-M cost centre.  We hope that these costs will be
  25     small; since the _scc_s are introduced automatically it's
  26     confusing to attribute any significant costs to them.  However if
  27     there *are* significant one-off costs we'd better know about it.
  28
  29     Why so late in the compilation process?  We aren't *absolutely*
  30     sure what is and isn't a CAF until *just* before code generation.
  31     So we don't want to mark them as such until then.
  32
  33   - Individual DICTs
  34
  35     We do it in the desugarer, because that's the *only* point at
  36     which we *know* exactly what bindings are introduced by
  37     overloading.  NB should include bindings for selected methods, eg
  38
  39         f d = let op = _scc_ DICT op_sel d in
  40               ...op...op...op
  41
  42     The DICT CC ensures that:
  43     (a) [minor] that the selection cost is separately attributed
  44     (b) [major] that the cost of executing op is attributed to
  45         its call site, eg
  46
  47         ...(scc "a" op)...(scc "b" op)...(scc "c" op)...
  48
  49 * Automagic "boxing" of higher-order args:
  50
  51         finalStgMassageForProfiling (final STG-to-STG pass)
  52
  53         This (as well as CAF stuff above) is really quite separate
  54         from the other business of finalStgMassageForProfiling
  55         (collecting up CostCentres that need to be
  56         declared/registered).
  57
  58         But throwing it all into the pot together means that we don't
  59         have to have Yet Another STG Syntax Walker.
  60
  61         Furthermore, these "boxes" are really just let-bindings that
  62         many other parts of the compiler will happily substitute away!
  63         Doing them at the very last instant prevents this.
  64
  65         A down side of doing these so late is that we get lots of
  66         "let"s, which if generated earlier and not substituted away,
  67         could be floated outwards.  Having them floated outwards would
  68         lessen the chance of skewing profiling results (because of
  69         gratuitous "let"s added by the compiler into the inner loop of
  70         some program...).  The allocation itself will be attributed to
  71         profiling overhead; the only thing which'll be skewed is time measurement.
  72
  73         So if we have, post-boxing-higher-order-args...
  74
  75             _scc_ "foo" ( let f' = [f] \ [] f
  76                           in
  77                           map f' xs )
  78
  79         ... we want "foo" to be put in the thunk for "f'", but we want the
  80         allocation cost (heap census stuff) to be attr to OVERHEAD.
  81
  82         As an example of what could be improved
  83                 f = _scc_ "f" (g h)
  84         To save dynamic allocation, we could have a static closure for h:
  85                 h_inf = _scc_ "f" h
  86                 f = _scc_ "f" (g h_inf)
  87
  88
  89
  90
  91
  92 Code generator-ish
  93 ~~~~~~~~~~~~~~~~~~
  94
  95 (1) _Entry_ code for a closure *usually* sets CC from the closure,
  96                  at the fast entry point
  97
  98     Exceptions:
  99
 100     (a) Top-level subsumed functions (i.e., w/ no _scc_ on them)
 101
 102         Refrain from setting CC from the closure
 103
 104     (b) Constructors
 105
 106         Again, refrain.  (This is *new*)
 107
 108         Reasons: (i) The CC will be zapped very shortly by the restore
 109         of the enclosing CC when we return to the eval'ing "case".
 110         (ii) Any intervening updates will indirect to this existing
 111         constructor (...mumble... new update mechanism... mumble...)
 112
 113 (2) "_scc_ cc expr"
 114
 115     Set current CC to "cc".
 116     No later "restore" of the previous CC is reqd.
 117
 118 (3) "case e of { ...alts... }" expression (eval)
 119
 120     Save CC before eval'ing scrutinee
 121     Restore CC at the start of the case-alternative(s)
 122
 123 (4) _Updates_ : updatee gets current CC
 124
 125     (???? not sure this is OK yet 94/07/04)
 126
 127     Reasons:
 128
 129     * Constructors : want to be insensitive to return-in-heap vs
 130         return-in-regs.  For example,
 131
 132         f x = _scc_ "f" (x, x)
 133
 134         The pair (x,x) would get CC of "f" if returned-in-heap;
 135         therefore, updatees should get CC of "f".
 136
 137     * PAPs : Example:
 138
 139         f x = _scc_ "f" (let g = \ y -> ... in g)
 140
 141         At the moment of update (updatePAP?), CC is "f", which
 142         is what we want to set it to if the "updatee" is entered
 143
 144         When we enter the PAP ("please put the arguments back so I can
 145         use them"), we restore the setup as at the moment the
 146         arg-satisfaction check failed.
 147
 148         Be careful!  UPDATE_PAP is called from the arg-satis check,
 149         which is before the fast entry point.  So the cost centre
 150         won't yet have been set from the closure which has just
 151         been entered.  Solution: in UPDATE_PAP see if the cost centre inside
 152         the function closure which is being entered is "SUB"; if so, use
 153         the current cost centre to update the updatee; otherwise use that
 154         inside the function closure.  (See the computation of cc_pap
 155         in rule 16_l for lexical semantics.)
 156
 157
 158 (5) CAFs
 159
 160 CAFs get their own cost centre.  Ie
 161
 162                         x = e
 163 is transformed to
 164                         x = _scc_ "CAF:x" e
 165
 166 Or sometimes we lump all the CAFs in a module together.
 167 (Reporting issue or code-gen issue?)
 168
 169
 170
 171 Hybrid stuff
 172 ~~~~~~~~~~~~
 173
 174 The problem:
 175
 176   f = _scc_ "CAF:f" (let g = \xy -> ...
 177                    in (g,g))
 178
 179 Now, g has cost-centre "CAF:f", and is returned as part of
 180 the result.  So whenever the function embedded in the result
 181 is called, the costs will accumulate to "CAF:f".  This is
 182 particularly (de)pressing for dictionaries, which contain lots
 183 of functions.
 184
 185 Solution:
 186
 187   A.  Whenever in case (1) above we would otherwise "set the CC from the
 188   closure", we *refrain* from doing so if
 189         (a) the closure is a function, not a thunk; and
 190         (b) the cost-centre in the closure is a CAF cost centre.
 191
 192   B.  Whenever we enter a thunk [at least, one which might return a function]
 193   we save the current cost centre in the update frame.  Then, UPDATE_PAP
 194   restores the saved cost centre from the update frame iff the cost
 195   centre at the point of update (cc_pap in (4) above) is a CAF cost centre.
 196
 197   It isn't necessary to save and possibly-restore the cost centre for
 198   thunks which will certainly return a constructor, because the
 199   cost centre is about to be restored anyway by the enclosing case.
 200
 201 Both A and B are runtime tests.  For A, consider:
 202
 203   f = _scc_ "CAF:f" (g 2)
 204
 205   h y = _scc_ "h" g (y+y)
 206
 207   g x = let w = \p -> ...
 208         in (w,w)
 209
 210
 211 Now, in the call to g from h, the cost-centre on w will be "h", and
 212 indeed all calls to the result of the call should be attributed to
 213 "h".
 214
 215   ... _scc_ "x1" (let (t,_) = h 2 in t 3) ...
 216
 217   Costs of executing (w 3) attributed to "h".
 218
 219 But in the call to g from f, the cost-centre on w will be
 220 "CAF:f", and calls to w should be attributed to the call site.
 221
 222   ..._scc_ "x2" (let (t,_) = f in t 3)...
 223
 224   Costs of executing (w 3) attributed to "x2".
 225
 226
 227         Remaining problem
 228
 229 Consider
 230
 231         _scc_ "CAF:f" (if expensive then g 2 else g 3)
 232
 233 where g is a function with arity 2.  In theory we should
 234 restore the enclosing cost centre once we've reduced to
 235 (g 2) or (g 3).  In practice this is pretty tiresome; and pretty rare.
 236
 237 A quick fix: given (_scc_ "CAF" e) where e might be function-valued
 238 (in practice we usually know, because CAF sccs are top level), transform to
 239
 240         _scc_ "CAF" (let f = e in f)
 241
 242
 243
 244
 245
 246 ============
 247
 248 scc cc x  ===>   x
 249
 250   UNLESS
 251
 252 (a)  cc is a user-defined, non-dup'd cost
 253      centre (so we care about entry counts)
 254
 255 OR
 256
 257 (b) cc is not a CAF/DICT cost centre and x is top-level subsumed
 258      function.
 259         [If x is lambda/let bound it'll have a cost centre
 260          attached dynamically.]
 261
 262         To repeat, the transformation is OK if
 263                 x is a not top-level subsumed function
 264         OR
 265                 cc is a CAF/DICT cost centre and x is a top-level
 266                 subsumed function
 267
 268
 269
 270 (scc cc e) x  ===>  (scc cc e x)
 271
 272         OK????? IFF
 273
 274 cc is not CAF/DICT  --- remains to be proved!!!!!!
 275 True for lex
 276 False for eval
 277 Can we tell which in hybrid?
 278
 279 eg  Is this ok?
 280
 281         (scc "f" (scc "CAF" (\x.b))) y ==>   (scc "f" (scc "CAF" (\x.b) y))
 282
 283
 284 \x -> (scc cc e)    ===>   (scc cc \x->e)
 285
 286         OK IFF cc is not CAF/DICT
 287
 288
 289 scc cc1 (scc cc2 e))   ===>  scc cc2 e
 290
 291         IFF not interested in cc1's entry count
 292         AND cc2 is not CAF/DICT
 293
 294 (scc cc1 ... (scc cc2 e) ...)   ===>  (scc cc1 ... e ...)
 295
 296         IFF cc2 is CAF/DICT
 297         AND e is a lambda not appearing as the RHS of a let
 298             OR
 299             e is a variable not bound to SUB
 300
 301