1 Profiling Implementation Notes -- June/July/Sept 1994
2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9 * Automagic insertion of _sccs_ on...
11 - If -auto is specified, add _scc_ on each *exported* top-level definition.
12 NB this includes CAFs. Done by addAutoCostCentres (Core-to-Core pass).
14 - If -auto-all is specified, add _scc_ on *all* top-level definitions.
17 - Always: just before code generation of module M, onto any CAF
18 which hasn't already got an explicit cost centre attached, pin
21 Done by finalStgMassageForProfiling (final STG-to-STG pass)
23 Only the one-off costs of evaluating the CAFs will be attributed
24 to the AllCAFs-M cost centre. We hope that these costs will be
25 small; since the _scc_s are introduced automatically it's
26 confusing to attribute any significant costs to them. However if
27 there *are* significant one-off costs we'd better know about it.
29 Why so late in the compilation process? We aren't *absolutely*
30 sure what is and isn't a CAF until *just* before code generation.
31 So we don't want to mark them as such until then.
35 We do it in the desugarer, because that's the *only* point at
36 which we *know* exactly what bindings are introduced by
37 overloading. NB should include bindings for selected methods, eg
39 f d = let op = _scc_ DICT op_sel d in
42 The DICT CC ensures that:
43 (a) [minor] that the selection cost is separately attributed
44 (b) [major] that the cost of executing op is attributed to
47 ...(scc "a" op)...(scc "b" op)...(scc "c" op)...
49 * Automagic "boxing" of higher-order args:
51 finalStgMassageForProfiling (final STG-to-STG pass)
53 This (as well as CAF stuff above) is really quite separate
54 from the other business of finalStgMassageForProfiling
55 (collecting up CostCentres that need to be
58 But throwing it all into the pot together means that we don't
59 have to have Yet Another STG Syntax Walker.
61 Furthermore, these "boxes" are really just let-bindings that
62 many other parts of the compiler will happily substitute away!
63 Doing them at the very last instant prevents this.
65 A down side of doing these so late is that we get lots of
66 "let"s, which if generated earlier and not substituted away,
67 could be floated outwards. Having them floated outwards would
68 lessen the chance of skewing profiling results (because of
69 gratuitous "let"s added by the compiler into the inner loop of
70 some program...). The allocation itself will be attributed to
71 profiling overhead; the only thing which'll be skewed is time measurement.
73 So if we have, post-boxing-higher-order-args...
75 _scc_ "foo" ( let f' = [f] \ [] f
79 ... we want "foo" to be put in the thunk for "f'", but we want the
80 allocation cost (heap census stuff) to be attr to OVERHEAD.
82 As an example of what could be improved
84 To save dynamic allocation, we could have a static closure for h:
86 f = _scc_ "f" (g h_inf)
95 (1) _Entry_ code for a closure *usually* sets CC from the closure,
96 at the fast entry point
100 (a) Top-level subsumed functions (i.e., w/ no _scc_ on them)
102 Refrain from setting CC from the closure
106 Again, refrain. (This is *new*)
108 Reasons: (i) The CC will be zapped very shortly by the restore
109 of the enclosing CC when we return to the eval'ing "case".
110 (ii) Any intervening updates will indirect to this existing
111 constructor (...mumble... new update mechanism... mumble...)
115 Set current CC to "cc".
116 No later "restore" of the previous CC is reqd.
118 (3) "case e of { ...alts... }" expression (eval)
120 Save CC before eval'ing scrutinee
121 Restore CC at the start of the case-alternative(s)
123 (4) _Updates_ : updatee gets current CC
125 (???? not sure this is OK yet 94/07/04)
129 * Constructors : want to be insensitive to return-in-heap vs
130 return-in-regs. For example,
132 f x = _scc_ "f" (x, x)
134 The pair (x,x) would get CC of "f" if returned-in-heap;
135 therefore, updatees should get CC of "f".
139 f x = _scc_ "f" (let g = \ y -> ... in g)
141 At the moment of update (updatePAP?), CC is "f", which
142 is what we want to set it to if the "updatee" is entered
144 When we enter the PAP ("please put the arguments back so I can
145 use them"), we restore the setup as at the moment the
146 arg-satisfaction check failed.
148 Be careful! UPDATE_PAP is called from the arg-satis check,
149 which is before the fast entry point. So the cost centre
150 won't yet have been set from the closure which has just
151 been entered. Solution: in UPDATE_PAP see if the cost centre inside
152 the function closure which is being entered is "SUB"; if so, use
153 the current cost centre to update the updatee; otherwise use that
154 inside the function closure. (See the computation of cc_pap
155 in rule 16_l for lexical semantics.)
160 CAFs get their own cost centre. Ie
166 Or sometimes we lump all the CAFs in a module together.
167 (Reporting issue or code-gen issue?)
176 f = _scc_ "CAF:f" (let g = \xy -> ...
179 Now, g has cost-centre "CAF:f", and is returned as part of
180 the result. So whenever the function embedded in the result
181 is called, the costs will accumulate to "CAF:f". This is
182 particularly (de)pressing for dictionaries, which contain lots
187 A. Whenever in case (1) above we would otherwise "set the CC from the
188 closure", we *refrain* from doing so if
189 (a) the closure is a function, not a thunk; and
190 (b) the cost-centre in the closure is a CAF cost centre.
192 B. Whenever we enter a thunk [at least, one which might return a function]
193 we save the current cost centre in the update frame. Then, UPDATE_PAP
194 restores the saved cost centre from the update frame iff the cost
195 centre at the point of update (cc_pap in (4) above) is a CAF cost centre.
197 It isn't necessary to save and possibly-restore the cost centre for
198 thunks which will certainly return a constructor, because the
199 cost centre is about to be restored anyway by the enclosing case.
201 Both A and B are runtime tests. For A, consider:
203 f = _scc_ "CAF:f" (g 2)
205 h y = _scc_ "h" g (y+y)
207 g x = let w = \p -> ...
211 Now, in the call to g from h, the cost-centre on w will be "h", and
212 indeed all calls to the result of the call should be attributed to
215 ... _scc_ "x1" (let (t,_) = h 2 in t 3) ...
217 Costs of executing (w 3) attributed to "h".
219 But in the call to g from f, the cost-centre on w will be
220 "CAF:f", and calls to w should be attributed to the call site.
222 ..._scc_ "x2" (let (t,_) = f in t 3)...
224 Costs of executing (w 3) attributed to "x2".
231 _scc_ "CAF:f" (if expensive then g 2 else g 3)
233 where g is a function with arity 2. In theory we should
234 restore the enclosing cost centre once we've reduced to
235 (g 2) or (g 3). In practice this is pretty tiresome; and pretty rare.
237 A quick fix: given (_scc_ "CAF" e) where e might be function-valued
238 (in practice we usually know, because CAF sccs are top level), transform to
240 _scc_ "CAF" (let f = e in f)
252 (a) cc is a user-defined, non-dup'd cost
253 centre (so we care about entry counts)
257 (b) cc is not a CAF/DICT cost centre and x is top-level subsumed
259 [If x is lambda/let bound it'll have a cost centre
260 attached dynamically.]
262 To repeat, the transformation is OK if
263 x is a not top-level subsumed function
265 cc is a CAF/DICT cost centre and x is a top-level
270 (scc cc e) x ===> (scc cc e x)
274 cc is not CAF/DICT --- remains to be proved!!!!!!
277 Can we tell which in hybrid?
281 (scc "f" (scc "CAF" (\x.b))) y ==> (scc "f" (scc "CAF" (\x.b) y))
284 \x -> (scc cc e) ===> (scc cc \x->e)
286 OK IFF cc is not CAF/DICT
289 scc cc1 (scc cc2 e)) ===> scc cc2 e
291 IFF not interested in cc1's entry count
292 AND cc2 is not CAF/DICT
294 (scc cc1 ... (scc cc2 e) ...) ===> (scc cc1 ... e ...)
297 AND e is a lambda not appearing as the RHS of a let
299 e is a variable not bound to SUB