1 Ways to save code space
2 ~~~~~~~~~~~~~~~~~~~~~~~
11 We have lots of thunks of the form
17 where f is know function of arity 3 (ie saturated).
18 At the moment we generate special code for this one closure,
20 pushes an update frame
21 loads p,q,r into registers from the closure (or using
22 immediate loads if they are literals),
25 Since there are quite a lot of thunks of this form, the idea is to
26 generate some code (and its info table) just once, *with the
27 definition of f*, which does exactly as described above. We can then
28 use this code for every thunk of (exactly) this form. Call this
29 the "heap entry" for f:
31 slow entry: args on stack
32 fast entry: args in regs
33 heap entry: args in closure pointed to by Node
35 So the thunk for x would look like this:
41 | common heap entry code for f
42 ------> push update frame
43 R2 := R1[2] -- Second arg
44 R3 := R1[3] -- Third arg
45 R1 := R1[1] -- First arg
48 The jump to f_fast can be implemented as a fall-through. (The
49 slow entry point can take a jump instead!)
51 Of course there are also lots of thunks which *aren't* of the heap-entry
54 x = let v = ... in ...
57 Things to watch out for:
59 * Literal args. Consider
63 We don't *have* to use the heap entry for f (we could generate special
64 code+info table as we do now), but we *can* use it provided we
65 generate a thunk with 2 and 4 stored in it as well as p:
71 | common heap entry code for f
72 ------> push update frame
73 R2 := R1[2] -- Second arg
74 R3 := R1[3] -- Third arg
75 R1 := R1[1] -- First arg
78 (If we have special code the thunk needs only p stored in it, because
79 the special code can use immediate constants for 2 and 4:
86 ----> push update frame
87 R2 := R1[1] -- Second arg
93 * Single-entry thunks. If x is a single-entry thunk, there's no need to
94 push an update frame. That suggests:
100 | heap entry code for f
101 ----> -- NO! NO! push update frame
102 R2 := R1[2] -- Second arg
103 R3 := R1[3] -- Third arg
104 R1 := R1[1] -- First arg
107 Let's call the two variants the
109 and no-update heap entry
111 We can't fall through from the standard heap-entry code (which pushes
112 an update frame) to the arg-loading code, because both need an info table.
113 We have to take a jump.
115 For non-exported functions we may be able to see that only one of the
116 two heap entries is required.
118 * Local functions. When f is a *local* (ie not top-level) function, its
119 fast-entry convention is that
120 R1 = the function closure
126 f = \r -> ..r..p...q...
134 The shape of the heap-entry closure for f must be
140 -------> heap entry code
141 must load *f* into R1 as well as q and
148 Avoiding generating entries and info tables
149 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
150 At present, for every function we generate all of the following,
151 just in case. But they aren't always all needed, as noted below:
153 [NB: all of this applies only to *functions*. Thunks always
154 have closure, info table, and entry code.]
157 * Fast-entry code ALWAYS NEEDED
160 Needed iff (a) we have any un-saturated calls to the function
161 OR (b) the function is passed as an arg
163 * The function closure
164 Needed iff (a) we have any un-saturated calls to the function
165 OR (b) the function is passed as an arg
166 OR (c) if the function has free vars (ie top level)
168 Why case (a) here? Because if the arg-satis check fails,
169 UpdatePAP stuffs a pointer to the function closure in the PAP.
170 [Could be changed; UpdatePAP could stuff in a code ptr instead,
171 but doesn't seem worth it.]
173 [NB: these conditions imply that we might need the closure
174 without the slow-entry code. Here's how.
176 f x y = let g w = ...x..y..w...
180 Here we need a closure for g which contains x and y,
181 but since the calls are all saturated we just jump to the
182 fast entry point for g, with R1 pointing to the closure for g.]
185 * Slow-entry info table
186 Needed iff (a) we have any un-saturated calls to the function
187 OR (b) the function is passed as an arg
188 OR (c) the function has free vars (ie top level)
190 NB. (c) is only required so that the function closure has
191 an info table to point to, to keep the storage manager happy.
192 If (c) alone is true we could fake up an info table by choosing
193 one of a standard family of info tables, whose entry code just
196 If (c) is retained, then we'll sometimes generate an info table
197 (for storage mgr purposes) without slow-entry code. Then we need
198 to use an error label in the info table to substitute for the absent
201 * Standard heap-entry code
202 Standard heap-entry info table
203 Needed iff we have any updatable thunks of the standard heap-entry shape.
205 * Single-update heap-entry code
206 Single-update heap-entry info table
207 Needed iff we have any non-updatable thunks of the
208 standard heap-entry shape.
211 All are needed if the function is exported, just to play safe.
213 Idea: generate just the stuff we need!
218 staticClosureRequired -- Assumption: it's a top-level, no-free-var binding
222 staticClosureRequired (StgBinderInfo arg_occ unsat_occ _ _) args
223 = arg_occ || -- There's an argument occurrence
224 unsat_occ || -- There's an unsaturated call
225 null args -- It's a thunk
227 staticClosureRequired NoStgBinderInfo args = True
231 slowFunEntryCodeRequired -- Assumption: it's a function, not a thunk.
234 slowFunEntryCodeRequired (StgBinderInfo arg_occ unsat_occ _ _)
235 = arg_occ || -- There's an argument occurrence
236 unsat_occ -- There's an unsaturated call
237 slowFunEntryCodeRequired NoStgBinderInfo = True
240 funInfoTableRequired -- Assumption: it's a function, not a thunk.
241 :: Bool -- Top level?
244 funInfoTableRequired top_level (StgBinderInfo arg_occ unsat_occ _ _)
246 arg_occ || -- There's an argument occurrence
247 unsat_occ -- There's an unsaturated call
249 funInfoTableRequired top_level NoStgBinderInfo = True