\documentclass{article}
\setlength{\parskip}{0.25cm}
\setlength{\parsep}{0.25cm}
\setlength{\topsep}{0cm}
\setlength{\parindent}{0cm}
\renewcommand{\textfraction}{0.2}
\renewcommand{\floatpagefraction}{0.7}
\newcommand{\block}{block}
\newcommand{\Block}{Block}
\newcommand{\segment}{segment}
\newcommand{\Segment}{Segment}
\newcommand{\step}{step}
\newcommand{\Step}{Step}
\newcommand{\note}[1]{{\em $\spadesuit$ #1}}
\title{The GHC storage manager}
\author{Simon Peyton Jones}
\section{Introduction}

This is a draft design document, not a completed thing.

Storage management goals:

\item Generational collection, supporting multiple generations.

\item The ability to pin the allocation
area into a few pages that we hope will fit entirely in the cache.

\item Objects can age within a generation before getting promoted.

\item The heap can grow as needed, rather than having to be pre-sized.

\item We support mark/sweep/compact collection for older generations.
This is a Good Thing when the live memory approaches the available
physical memory, because it reduces paging.

\item Little OS support needed. No @mmap@ etc. All that we require is
the ability to @malloc@ (or whatever) a new chunk of memory.
There can be intervening ``sandbars'' allocated by other programs
(e.g. DLLs or other @malloc@'d structures) between chunks of heap.

\item Possible feature: ability to support mostly-copying conservative
collection.[.bartlett mostly 1989, morrisset mostly.]
Language-support goals:

\item The garbage collector ``shorts out'' indirection objects introduced
by the mutator (notably when overwriting a thunk with an indirection).

\item The garbage collector executes selector thunks.
For example, a thunk for
@(fst x)@ where @x@ is a pointer to a pair @(a,b)@ would be
evaluated by the garbage collector to just @a@. This is an important
strategy for plugging space leaks.

\item The garbage collector traverses the code tree, as well as
the heap data structures, to find which CAFs are live. This is a Royal Pain.
\item The garbage collector finalises some objects (typically a tiny minority).
At the moment ``finalisation'' means ``call a C routine when this thing
dies'' but it would be more general to schedule a call to a Haskell
function.

Instrumentation goals:

\item The garbage collector can gather heap-census information for profiling.
To this end we can force GC to happen more often than it otherwise would,
and the collector can gather information about the type and cost-centre
associated with each heap object.
Every garbage collector makes assumptions about the sort of
objects it manipulates, and how the mutator behaves. Here we
collect our assumptions:

\item Each heap object starts with a one-word header that encodes, or points
to, information about the layout of the object. The details of this
encoding aren't fundamental to the collector, but are, of course, rather

\item Every heap pointer points to the beginning of
an object; there are no pointers into the middle of objects.

\item There are two spare bits in the least significant bits of a pointer, because
every heap object is at least word aligned.
Heap objects are packed contiguously, with no ``slop'' between them.
This assumption makes it possible to scan a region of heap linearly.
It's easy to arrange that objects are packed contiguously in the
to-space of a copying collector, but the no-slop assumption also
applies during mutation. This is a bit of a pain, because we often
update a thunk with an indirection to the thunk's value; and the
indirection isn't necessarily the same size as the thunk. This can be
solved, but it takes a few more instructions.

We know of two reasons to uphold the no-slop assumption:
\item A {\em mostly-copying conservative (MCC) collector} has to deal with
possible pointers from conservative regions.[.morrisset mostly.]
Given a possible pointer the MCC collector must find the head of the object into
which the putative pointer points. One easy way to do this is
to scan forward from a known point; but only if there is no
slop.

\item An {\em incremental collector}
will scan linearly during mutation.[.field while.]
The basic memory structure is similar to other BIBOP-based
collectors.[.stop the bibop, reppy garbage collector, moss garbage toolkit.]

A {\em \block} is a contiguous chunk of $2^K$ bytes, starting on a
$2^K$-byte boundary. We expect a \block{} to be between 4k and 64k bytes.

A \block{} is the unit of allocation for the storage manager. That is,
the {\em \block{} allocator} hands over store to the mutator in multiples of
one \block. (The mutator may then allocate multiple heap objects within
a \block, of course.)

\Block{}s are often allocated individually, but for large
objects, or just to reduce inter-block linkage costs, a
contiguous group of \block{}s can be allocated; we call
this a {\em \block{} group} or sometimes just {\em group}.
The first \block{} of a group is called the {\em group head}.
\subsection{\Block{} descriptors}

Each \block{} has an associated {\em \block{} descriptor}, a
record with the following structure:

The function @BDescr(a)@ takes an address @a@ and returns
the address of its \block{} descriptor. It is expected that
@BDescr@ only takes a few instructions (see below).
The fields of a \block{} descriptor have the following
purposes:

\begin{tabular}{lp{4.5in}}
@Gen@ & The generation to which this block belongs. The
youngest generation is 0. Valid for all blocks whether or not
they are group heads. \\
@Step@ & The number of GCs that this block has survived
while within its current generation. Only valid for group heads. \\
@Start@ & Points to the first byte of the block itself.
Valid for all blocks. \\
@Free@ & For a group head, @Free@ points to the first free (un-allocated)
byte in the group. For non-group heads, @Free@ is set to zero;
this identifies the \block{} as a non-head. \\
@Link@ & Used to chain \block{}s together. For non-group-heads,
@Link@ points to the (\block{} descriptor of the) group head.
\end{tabular}
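The descriptor record described above can be sketched in C as follows. This is a sketch only: the field types, their ordering, and the word type are illustrative assumptions, not fixed by this document; all that matters here is that the record carries the five fields above and stays small enough to pack into a per-megablock descriptor array.

```c
#include <stddef.h>

/* Sketch of a block descriptor -- field types and ordering are
   illustrative assumptions.  Kept at or under 32 bytes so that
   descriptors pack into the array at the start of a megablock. */
typedef struct bdescr_ {
    void           *Start;  /* first byte of the block itself; valid for all blocks */
    void           *Free;   /* group head: first free byte; non-head: zero          */
    struct bdescr_ *Link;   /* chains blocks; for non-heads, points to group head   */
    unsigned short  Gen;    /* generation number; youngest generation is 0          */
    unsigned short  Step;   /* GCs survived within the current generation           */
} bdescr;
```

With three pointers and two small integers, the record fits in 32 bytes on a 64-bit machine (and well under that on a 32-bit one), which is what the descriptor-array layout in the implementation section below relies on.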
\subsection{\Step{}s}

A {\em \step} is a linked list of \block{} groups, {\em not} necessarily contiguous,
all belonging to the same generation,
and all of which have survived the same number of garbage collections.

We think of a \step{} as {\em logically} contiguous (via the @Link@ fields) even though
it is not physically contiguous.

The \step{} number of a \step{} indicates its age (in GCs) within its generation.
(\Step{} zero is the youngest \step{} in a generation.) During GC of generation G,
live objects
from \step{} N are moved to \step{} N+1, and live objects from \step{} @MaxStep(G)@
are promoted to generation G+1. In this way, objects are given a decent chance
to die before being promoted.

In effect, we record the age for a set of objects grouped by address (the \step{}),
rather than recording the age for each object individually. This is quite
\subsection{Generations}

A {\em generation} is a set of \step{}s
(not necessarily contiguous). Most allocation takes
place in generation zero, the youngest generation.

The big difference between generations and \step{}s is that:

\item Garbage collection applies to all of generation G and all younger
generations (for some G). You cannot do garbage collection on just
part of a generation.

\item Consequently we have to track inter-generational pointers from
older to younger generations; we do not
track inter-\step{} pointers within a generation.
\subsection{The \block{} allocator}

The {\em \block{} allocator} is responsible for
allocating \block{}s. It mediates between the higher levels of the
storage manager and the operating system.

It supports the following API:

\item[@InitialiseBlockAllocator()@] initialises the \block{} allocator.
\item[@bdescr *AllocBlocks( int n )@] allocates a {\em contiguous}
group of @n@ \block{}s, returning a pointer to the \block{}
descriptor of the first. Returns @NULL@ if the request can't be
satisfied. The \block{} descriptors of the group are initialised by the \block{} allocator.
\item[@FreeBlocks( bdescr *p )@] returns a \block{} group to the \block{} allocator.
\item[@BDescr( void *p )@] takes a pointer @p@ to any
byte within a \block{} and returns a pointer to its \block{}
descriptor. It does no error checking, and fails horribly if @p@
does not point into a \block{} allocated by the \block{} allocator.
It is probably implemented as an @inline@ procedure.
\subsection{Implementation}

Our current implementation plan for the \block{} allocator is as
follows. It allocates memory from the OS in {\em megablocks}. A
megablock is a contiguous chunk of memory of $2^M$ bytes or less,
starting on a $2^M$ byte boundary. All the \block{} descriptors are
laid out in an array at the start of the megablock. Each is at most 32 bytes
long. So to get from an address @p@ to its \block{} descriptor, @BDescr@
does something like this:

bdescr *BDescr( void *p ) {
  return (bdescr *)( ((p >> (K-5)) & Bmask) | (p & Mmask) );
}

Depending on how many \block{}s fit in the megablock, the first
one or more descriptors will be unused.
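To see the arithmetic concretely, here is a self-contained worked example with assumed parameters (none of these constants are fixed by this document): megablocks of $2^M$ bytes with M = 20, \block{}s of $2^K$ bytes with K = 12, and 32-byte ($2^5$) descriptors. @Bmask@ selects the descriptor-offset bits and @Mmask@ the megablock base bits.

```c
#include <stdint.h>

/* Worked example of the BDescr address arithmetic.  Assumed parameters:
   megablocks of 2^M bytes (M = 20), blocks of 2^K bytes (K = 12),
   descriptors of 2^5 = 32 bytes each, laid out at the megablock start. */
enum { M = 20, K = 12 };

#define MMASK (~(((uintptr_t)1 << M) - 1))              /* megablock base bits */
#define BMASK (((((uintptr_t)1 << (M - K)) - 1)) << 5)  /* descriptor offset   */

static uintptr_t bdescr_addr(uintptr_t p) {
    /* p >> K is the block number; multiplying by the 32-byte descriptor
       size is a further << 5, hence the combined shift by (K - 5).
       OR-ing in the megablock base gives the descriptor's address. */
    return ((p >> (K - 5)) & BMASK) | (p & MMASK);
}
```

Two addresses in the same \block{} map to the same descriptor, and adjacent \block{}s map to descriptors 32 bytes apart, which is exactly the invariant the descriptor array needs.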
(An alternative design has the \block{} descriptors at the start of
each \block{}. This makes @BDescr@ shorter, but is pessimal for
cache locality when fiddling with \block{} descriptors.
It also means that only the first block in a contiguous chunk of
blocks can have a descriptor. That in turn makes life difficult
for a mostly-copying conservative collector. Given a possible pointer
from a conservative region, an MCC collector needs
to find the start of the object into which the putative pointer
points. One way to do this is to scan from the start of the
contiguous chunk of \block{}s into which the pointer points; and to
do that it helps to have a descriptor for each \block{}.)

The \block{} allocator maintains a pool of free \block{}s.
Because it may be asked for a contiguous chunk of \block{}s it
needs to keep the free list ordered (or something similar).
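To illustrate why an address-ordered free list helps, here is a toy sketch of satisfying a contiguous-group request with a single scan. The representation (free \block{}s identified just by block number, a singly linked sorted list) is an illustrative assumption, not the design itself.

```c
#include <stddef.h>

/* Toy sketch: with the free list kept in address order, a request for
   a contiguous group of n blocks is a single scan looking for a run of
   n consecutive block numbers. */
typedef struct FreeNode_ {
    size_t            blockno;  /* block number; list sorted ascending */
    struct FreeNode_ *next;
} FreeNode;

/* Returns the first block number of a run of n consecutive free
   blocks, or (size_t)-1 if no such run exists. */
static size_t find_contiguous(FreeNode *free_list, size_t n) {
    size_t run_start = 0, run_len = 0;
    for (FreeNode *f = free_list; f != NULL; f = f->next) {
        if (run_len == 0 || f->blockno != run_start + run_len) {
            run_start = f->blockno;   /* start a new run */
            run_len = 1;
        } else {
            run_len++;                /* extend the current run */
        }
        if (run_len == n) return run_start;
    }
    return (size_t)-1;
}
```

An unordered pool would need sorting (or an auxiliary structure) before any such scan, which is the point of keeping the list ordered in the first place.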
\section{The Storage Manager API}

The storage manager supports the following API.

\item[@void *Allocate( int n )@] allocates @n@ bytes and returns
a pointer to the first byte of the block. Returns @NULL@ if
the request could not be satisfied --- @Allocate@ does not call
the garbage collector itself. It is the responsibility of the caller
to completely fill in the allocated chunk with objects packed end to
end.

\item[@OpenNursery( void **p\_hp, void **p\_hplim )@]
returns pointers in @*p_hp@ and @*p_hplim@, to the beginning and end
of a free chunk of store.
More precisely, @*p_hp@ points to the first byte of the region
and @*p_hplim@ to the first byte beyond the end of the region.
@*p_hp@ is set to @NULL@ if there is no chunk available.

@OpenNursery@ is used by the main mutator to get a chunk of
contiguous store to allocate in. It allocates until @hp@ gets
too close to @hplim@ and then calls @ExtendNursery@ to try
to get a new region of nursery.
\item[@ExtendNursery( void **p\_hp, void **p\_hplim )@]
returns to the storage manager the region handed out by
the previous @OpenNursery@ or @ExtendNursery@. Suppose that
this previous call returned with @*p_hp@ set
to $hp$ and @*p_hplim@ set to $hplim$.
Then the @*p_hplim@ given to @ExtendNursery@ should be the same
as $hplim$, and @*p_hp@ should be greater than or equal to $hp$ and
less than or equal to $hplim$.

@ExtendNursery@ tries to find another suitable region. If it
succeeds, it sets @*p_hp@ and @*p_hplim@ appropriately; if not, it sets
@*p_hp@ to @NULL@. The new region is not necessarily contiguous
with the previous one.

\note{Maybe @ExtendNursery@ should be a macro, so that we can pass a
globally allocated register to it? (We can't take the address of a register.)
If so then presumably so should @OpenNursery@.}
\item[@ZapNursery()@] arranges that the next call to @ExtendNursery@
will fail. @ZapNursery@ is called by asynchronous interrupts to arrange that
execution is brought to an orderly halt.

\item[@GarbageCollect( void *Roots )@] performs garbage collection.
The parameter @Roots@ is a procedure which is called by the garbage
collector when it wishes to find all the roots. This procedure
should in turn call @MarkRoot@ on each such root.

\item[@void *MarkRoot( void *p )@] informs the garbage collector that
@p@ is a root. It returns the new location of the object.

\item[@RecordMutable( void *p )@] informs the storage manager that
@p@ points to an object that may contain pointers to objects in
a younger generation.

It is not necessary to call @RecordMutable@ on objects known to
be in the nursery.

It is only necessary to call @RecordMutable@ once. For objects that
are genuinely mutable (like mutable arrays), the object is permanently
recorded as mutable. On the other hand, thunks that are updated only
once can be dropped from the mutables pool once their pointer has been
dealt with (promoted).

\item[@boolean InNursery( void *p )@] tells whether @p@ points into
the nursery. This should be a rather fast test; it is used to guard
calls to @RecordMutable@.
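The guard works as follows: an object inside the nursery is already in the youngest generation, so only stores into objects living outside it can create the old-to-young pointers that @RecordMutable@ tracks. A toy sketch (the nursery bounds, the mutables pool, and @update_field@ are all stand-ins invented for this example; in the real system @InNursery@ and @RecordMutable@ belong to the storage manager):

```c
#include <stddef.h>
#include <stdbool.h>

static void *nursery[512];   /* pretend nursery          */
static void *mutables[16];   /* pretend mutables pool    */
static int   n_mutables = 0;

static bool InNursery(void *p) {
    char *cp = (char *)p;
    return cp >= (char *)nursery && cp < (char *)nursery + sizeof nursery;
}

static void RecordMutable(void *p) {
    mutables[n_mutables++] = p;
}

/* Store a (possibly young) pointer into a field of some object.  Only
   objects outside the nursery can hold old-to-young pointers that a
   generational GC would otherwise miss, so only those are recorded. */
static void update_field(void **field_of_obj, void *value) {
    *field_of_obj = value;
    if (!InNursery((void *)field_of_obj))
        RecordMutable((void *)field_of_obj);
}
```

This is why the section above insists that @InNursery@ be a fast test: it sits on the mutator's write path.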
We want to experiment with the idea of keeping the youngest generation
entirely in the cache, so that an object might be allocated, used,
and garbage collected without ever hitting main memory.

To do this, generation zero consists of a fixed number of one-\block{}
\segment{}s, carefully allocated in a contiguous address range so that
they fit snugly next to each other in the cache.
(Large objects are allocated directly into generation 1,
so that there is no need to find contiguous \block{}s in generation 0 --- Section~\ref{sect:large}.)

After a GC, the storage manager decides how many generation-0 \segment{}s
can be
allocated before the next minor GC; it chains these \segment{}s together
through their @SegLink@ fields. The storage manager then makes the allocation pointer
(@Hp@) point to the start of the first \segment{}, and the limit pointer
(@HpLim@) point to the end of that \segment{}.
Objects are allocated by incrementing @Hp@ until
it hits @HpLim@. If the @SegLink@ field of the current \segment{} is NIL,
then it is time for GC; otherwise @Hp@ is set to point to the first data
word of the next \segment{},
and @HpLim@ to the end of that \segment{}, and allocation
continues. So between GCs allocation is only interrupted briefly
(i.e. for a handful of instructions)
to move @Hp@/@HpLim@ on to the next \segment{}. Actually, this regular
interruption can be useful, because it makes a good moment to pre-empt
a thread to schedule another, or to respond to an interrupt.
(Pre-emption cannot occur at arbitrary times because we are assuming
an accurate garbage collector.)
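The @Hp@/@HpLim@/@SegLink@ dance above can be sketched as follows. The segment layout, sizes, and names are illustrative assumptions (and a real allocator must also deal with slop when an object does not exactly fill a segment, which this toy sidesteps):

```c
#include <stddef.h>

typedef unsigned long W_;   /* machine word, illustrative */

#define SEG_WORDS 64        /* toy segment size, in words */

typedef struct Segment_ {
    struct Segment_ *SegLink;          /* next segment, or NULL (NIL)  */
    W_               data[SEG_WORDS];
} Segment;

static W_ *Hp, *HpLim;

static void open_segment(Segment *seg) {
    Hp    = seg->data;                 /* first data word              */
    HpLim = seg->data + SEG_WORDS;     /* one past the last word       */
}

/* Allocate n words by bumping Hp; returns NULL when SegLink is NIL,
   i.e. when it is time for a GC. */
static W_ *alloc_words(Segment **cur, size_t n) {
    while (Hp + n > HpLim) {
        if ((*cur)->SegLink == NULL) return NULL;   /* time for GC     */
        *cur = (*cur)->SegLink;   /* brief interruption: next segment  */
        open_segment(*cur);
    }
    W_ *obj = Hp;
    Hp += n;
    return obj;
}
```

The point the text makes is visible here: the common path is two instructions' worth of pointer bumping, and the segment switch (the @while@ body) runs only once per segment, which is also a convenient pre-emption point.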
We expect there to be two \step{}s in generation zero, \step{} 0 and \step{} 1.
At a minor GC we promote live objects in \step{} 1, and copy \step{} 0 to
become a new \step{} 1. The new \step{} 1 should also fit into the cache.
Suppose there are 16 generation-zero \segment{}s. A reasonable budget
might be:

10 \segment{}s to allocate into (\step{} 0) \\
3 \segment{}s for \step{} 1 \\
3 \segment{}s free for the new \step{} 1

This assumes a 30\% survival rate from \step{} 0 to \step{} 1. We can do in-flight
measurements to tune this. And (MOST IMPORTANT) if we get it wrong, and
get a blip in which 9 \segment{}s survive minor GC, all is not lost... we simply
have to use some new \block{}s and generation zero gets bigger for a while
and perhaps less cache-resident. (This contrasts with systems which rely
on contiguous partitions, where an overflow is catastrophic, and which must
therefore over-reserve. We get better address-space usage by not having
to be so conservative. Again, this isn't a new observation.)
\section{Doing garbage collection}

When GC happens, we first decide which generation to collect. (How?)
When we collect generation G we collect all younger generations too.

To begin with we assume a copying style garbage collection
algorithm. The structure of a full GC is as follows:

\item For each root @r@ do @r:=Evacuate(r)@.

A {\em root} is either a pointer held by the RTS (notably the queue
of runnable threads), or a pointer from a generation > G that might
point into a generation <= G.

The @Evacuate@ operation does this:

/* Ignore pointers into non-collected generations */
if generation(p) > G then return(p);

/* If the object has already been copied, return its new location */
if p points to a forwarding pointer then return(that forwarding pointer);

/* Copy the object and return its new location */
p' := copy the object pointed to by p into to-space(g);
overwrite the old object with a forwarding pointer to p';
return(p');

The @Scavenge@ operation does this:

While there is an un-scavenged object O in to-space do
  for each pointer q in O do
    q := Evacuate(q)
  remember that O has been scavenged
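As a concrete toy rendering of @Evacuate@ and @Scavenge@, here is a minimal two-space (Cheney-style) copying collector over two-field cells. Everything about the representation is an illustrative assumption: one space instead of per-\step{} to-spaces, a single root, low-bit-tagged integers (exploiting the word alignment noted earlier), and a side table standing in for forwarding pointers that a real collector would store in the object itself.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_CELLS 256

typedef struct Cell_ {
    uintptr_t field[2];   /* tagged int (low bit 1) or Cell pointer */
} Cell;

static Cell space_a[HEAP_CELLS], space_b[HEAP_CELLS];
static Cell *from = space_a, *to = space_b;
static size_t alloc_next = 0;          /* allocation index in from-space */
static size_t to_next;                 /* allocation pointer in to-space */
static Cell *forward[HEAP_CELLS];      /* stand-in for forwarding pointers */

#define TAG_INT(v) (((uintptr_t)(v) << 1) | 1)
#define IS_INT(w)  ((w) & 1)

static Cell *allocate(uintptr_t f0, uintptr_t f1) {
    Cell *c = &from[alloc_next++];
    c->field[0] = f0; c->field[1] = f1;
    return c;
}

/* Evacuate: copy one object to to-space, once, leaving a forwarding entry. */
static Cell *evacuate(Cell *p) {
    size_t i = (size_t)(p - from);
    if (forward[i] != NULL) return forward[i];   /* already copied */
    Cell *p2 = &to[to_next++];
    *p2 = *p;
    forward[i] = p2;
    return p2;
}

/* Collect with a single root; returns the root's new location. */
static Cell *garbage_collect(Cell *root) {
    size_t sweep = 0;                            /* sweep (scan) pointer */
    for (size_t i = 0; i < HEAP_CELLS; i++) forward[i] = NULL;
    to_next = 0;
    root = evacuate(root);
    while (sweep < to_next) {                    /* scavenge loop */
        Cell *o = &to[sweep++];
        for (int f = 0; f < 2; f++)
            if (!IS_INT(o->field[f]))
                o->field[f] = (uintptr_t)evacuate((Cell *)o->field[f]);
    }
    Cell *tmp = from; from = to; to = tmp;       /* flip the semispaces */
    alloc_next = to_next;
    return root;
}
```

Note that GC is done precisely when the sweep pointer catches up with the to-space allocation pointer, which is the single-space analogue of the multi-\step{} termination condition discussed below.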
We say ``copy into to-space'' and ``find an un-scavenged object in to-space'',
but actually each \step{} of each generation has a separate to-space,
complete with allocation pointer and sweep pointer. (The sweep pointer
finds the un-scavenged objects.) When copying
into a to-space a check must be made for to-space
overflow and a new \segment{} chained on if necessary.
This is much as when the
mutator is doing allocation. The difference is that we are allocating
in multiple \segment{}s in an interleaved fashion, so we can't hold
the allocation pointers in registers.

Only when all the sweep pointers
have reached the corresponding allocation pointer is GC done.
This entails multiple passes through the \step{}s until a pass
finds no un-scavenged objects. [NB: if we treated inter-generational pointers
between generations <= G as roots, then we could avoid some of these multiple
passes. But we'd collect less garbage, and hence copy more
objects, so it would probably be slower.]

Notice that there is no issue about partitioning the address space
among generations, as there might be if each lived in a contiguous address
space.
\section{Large objects}
\label{sect:large}

Large objects are allocated directly in generation 1, each alone in its
own \segment{}. This means that they never, ever, need to be copied! Instead,
we simply change their generation or \step{} number. We also need to chain the
object onto a list of large objects to be swept for pointers. This act also
serves to mark the object, so we don't chain it on twice, or increment its
\step{} number twice.

Stack objects probably count as large objects. This is one reason for not
wanting \block{}s to be too big.
So far we have assumed copying collection. Actually, for
older generations we may want to use
mark/sweep/compact because it is a bit less greedy on address
space, and may help paging behaviour.

What I have in mind is a pointer-reversing mark phase, so that we don't have
to allocate space for a mark stack, although one could argue that a mark
stack will almost invariably be small, and if it gets big then you can always
malloc some more space. Mumble mumble.

\bibliographystyle{plain}
\bibliography{bibl,simon}