From: simonpj
Date: Mon, 4 Jun 2001 13:40:39 +0000 (+0000)
Subject: [project @ 2001-06-04 13:40:39 by simonpj]
X-Git-Tag: Approximately_9120_patches~1818
X-Git-Url: http://git.megacz.com/?a=commitdiff_plain;h=22e1cbe9a1886d3f12c6d93d39b7a76d40182920;p=ghc-hetmet.git

[project @ 2001-06-04 13:40:39 by simonpj]

Add storage mgt doc
---

diff --git a/ghc/docs/storage-mgt/Makefile b/ghc/docs/storage-mgt/Makefile
new file mode 100644
index 0000000..f1f0281
--- /dev/null
+++ b/ghc/docs/storage-mgt/Makefile
@@ -0,0 +1,23 @@
+# General makefile for LaTeX stuff
+
+ps: sm.ps
+
+
+######## General rules
+.SUFFIXES:
+.PRECIOUS: %.tex %.ps %.bbl
+
+%.dvi: %.tex $(addsuffix .tex, $(basename $(wildcard *.verb *.fig))) $(wildcard *.bib)
+	latex $<
+	@if grep -s "\citation" $*.aux; then bibtex $*; fi
+
+%.ps: %.dvi
+	dvips -f < $< > $@
+
+clean:
+	rm -f *.aux *.log
+
+nuke: clean
+	rm -f *.dvi *.ps *.bbl *.blg
+
+# End of file
diff --git a/ghc/docs/storage-mgt/sm.tex b/ghc/docs/storage-mgt/sm.tex
new file mode 100644
index 0000000..7c3af13
--- /dev/null
+++ b/ghc/docs/storage-mgt/sm.tex
@@ -0,0 +1,504 @@
+\documentclass{article}
+\usepackage{code}
+
+\setlength{\parskip}{0.25cm}
+\setlength{\parsep}{0.25cm}
+\setlength{\topsep}{0cm}
+\setlength{\parindent}{0cm}
+\renewcommand{\textfraction}{0.2}
+\renewcommand{\floatpagefraction}{0.7}
+
+
+% Terminology
+\newcommand{\block}{block}
+\newcommand{\Block}{Block}
+\newcommand{\segment}{segment}
+\newcommand{\Segment}{Segment}
+\newcommand{\step}{step}
+\newcommand{\Step}{Step}
+
+\newcommand{\note}[1]{{\em $\spadesuit$ #1}}
+
+\begin{document}
+\title{The GHC storage manager}
+\author{Simon Peyton Jones}
+
+\makeatactive
+\maketitle
+
+\section{Introduction}
+
+This is a draft design document, not a completed thing.
+
+\section{Goals}
+
+Storage management goals:
+
+\begin{itemize}
+\item Generational collection, supporting multiple generations.
+
+\item The ability to pin the allocation
+area into a few pages that we hope will fit entirely in the cache.
+
+\item The ability for objects to age within a generation before being
+promoted.
+
+\item A heap that can grow as needed, rather than having to be pre-sized
+by the programmer.
+
+\item Support for mark/sweep/compact collection in older generations.
+This is a Good Thing when the live memory approaches the available
+physical memory, because it reduces paging.
+
+\item Little OS support needed.  No mmap etc.  All that we require is
+the ability to @malloc@ (or whatever) a new chunk of memory.
+There can be intervening ``sandbars'' allocated by other programs
+(e.g. DLLs or other @malloc@'d structures) between chunks of heap.
+
+\item Possible feature: the ability to support mostly-copying conservative
+collection.[.bartlett mostly 1989, morrisset mostly.]
+Not top priority.
+\end{itemize}
+
+Language-support goals:
+\begin{itemize}
+\item The garbage collector ``shorts out'' indirection objects introduced
+by the mutator (notably when overwriting a thunk with an indirection).
+
+\item The garbage collector executes selector thunks.
+For example, a thunk for
+@(fst x)@ where @x@ is a pointer to a pair @(a,b)@ would be
+evaluated by the garbage collector to just @a@.  This is an important
+strategy for plugging space leaks.
+
+\item The garbage collector traverses the code tree, as well as
+the heap data structures, to find which CAFs are live.  This is a Royal Pain.
+
+\item The garbage collector finalises some objects (typically a tiny minority).
+At the moment ``finalisation'' means ``call a C routine when this thing
+dies'', but it would be more general to schedule a call to a Haskell
+procedure.
+\end{itemize}
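+
+To make the first two of these goals concrete, here is a sketch of the
+corresponding cases of an evacuation routine.  The object layout and all
+the names below are illustrative only, not GHC's actual representation:
+\begin{code}
+/* Sketch only: a toy closure layout, not the real one. */
+enum Tag { CONSTR, IND, SELECTOR };
+
+typedef struct Closure_ {
+  enum Tag         tag;
+  int              field_no;    /* for SELECTOR: which field to select */
+  struct Closure_ *payload[2];  /* fields; interpretation depends on tag */
+} Closure;
+
+Closure *copy_object( Closure *p );  /* copy into to-space (not shown) */
+
+Closure *evacuate( Closure *p )
+{
+  switch (p->tag) {
+  case IND:
+    /* Short out the indirection: evacuate its target, so that the
+       indirection itself dies at this collection. */
+    return evacuate( p->payload[0] );
+
+  case SELECTOR:
+    /* A thunk for, say, (fst x): if the selectee is an evaluated
+       constructor, evacuate the selected component directly,
+       plugging the space leak. */
+    if (p->payload[0]->tag == CONSTR)
+      return evacuate( p->payload[0]->payload[p->field_no] );
+    return copy_object( p );  /* selectee not yet evaluated */
+
+  default:
+    return copy_object( p );
+  }
+}
+\end{code}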
+
+Instrumentation goals:
+
+\begin{itemize}
+\item The garbage collector can gather heap-census information for profiling.
+To this end we can force GC to happen more often than it otherwise would,
+and the collector can gather information about the type and cost-centre
+associated with each heap object.
+\end{itemize}
+
+
+\section{Assumptions}
+
+Every garbage collector relies on assumptions about the sort of
+objects it manipulates, and about how the mutator behaves.  Here we
+collect our assumptions:
+\begin{itemize}
+\item Each heap object starts with a one-word header that encodes, or points
+to, information about the layout of the object.  The details of this
+encoding aren't fundamental to the collector, but are, of course, rather
+important.
+
+\item Every heap pointer points to the beginning of
+an object; there are no pointers into the middle of objects.
+
+\item There are two spare bits at the bottom of a pointer, because
+every heap object is at least word aligned.
+
+\item
+Heap objects are packed contiguously, with no ``slop'' between them.
+This assumption makes it possible to scan a region of heap linearly.
+It's easy to arrange that objects are packed contiguously in the
+to-space of a copying collector, but the no-slop assumption also
+applies during mutation.  This is a bit of a pain, because we often
+update a thunk with an indirection to the thunk's value; and the
+indirection isn't necessarily the same size as the thunk.  This can be
+solved, but it takes a few more instructions.
+
+We know of two reasons to uphold the no-slop assumption:
+\begin{itemize}
+\item A {\em mostly-copying conservative (MCC) collector} has to deal with
+possible pointers from conservative regions.[.morrisset mostly.]
+Given a possible pointer the MCC collector must find the head of the object
+into which the putative pointer points.  One easy way to do this is
+to scan forward from a known point; but only if there is no
+slop.
+\item An {\em incremental collector}
+will scan linearly during mutation.[.field while.]
+\end{itemize}
+\end{itemize}
+
+
+\section{\Block{}s}
+
+The basic memory structure is similar to other BIBOP-based
+collectors.[.stop the bibop, reppy garbage collector, moss garbage toolkit.]
+
+A {\em \block} is a contiguous chunk of $2^K$ bytes, starting on a
+$2^K$-byte boundary.  We expect a \block{} to be between 4k and 64k bytes.
+
+A \block{} is the unit of allocation for the storage manager.  That is,
+the {\em block allocator} hands over store to the mutator in multiples of
+one \block.  (The mutator may then allocate multiple heap objects within
+a \block, of course.)
+
+\Block{}s are often allocated individually, but for large
+objects, or just to reduce inter-block linkage costs, a
+contiguous group of \block{}s can be allocated; we call
+this a {\em \block{} group} or sometimes just {\em group}.
+The first \block{} of a group is called the {\em group head}.
+
+\subsection{\Block{} descriptors}
+
+Each \block{} has an associated {\em \block{} descriptor}, a
+record with the following structure:
+\begin{code}
+  typedef struct bdescr_ {
+    int             Gen;    /* generation number */
+    int             Step;   /* step number within the generation */
+    void           *Start;  /* first byte of the block */
+    void           *Free;   /* first free byte (group heads only) */
+    struct bdescr_ *Link;   /* chains blocks together */
+  } bdescr;
+\end{code}
+The function @BDescr(a)@ takes an address @a@ and returns
+the address of its \block{} descriptor.  It is expected that
+@BDescr@ only takes a few instructions (see below).
+
+The fields of a \block{} descriptor have the following
+purposes:
+\begin{center}
+\begin{tabular}{lp{4.5in}}
+@Gen@ & The generation to which this block belongs.  The
+youngest generation is 0.  Valid for all blocks, whether or not
+they are group heads.
+\\ \\
+@Step@ & The number of GCs that this block has survived
+while within its current generation.  Only valid for group heads.
+\\ \\
+@Start@ & Points to the first byte of the block itself.
+Valid for all blocks.
+\\ \\
+@Free@ & For a group head, @Free@ points to the first free (un-allocated)
+byte in the group.  For non-group heads, @Free@ is set to zero;
+this identifies the \block{} as a non-head.
+\\ \\
+@Link@ & Used to chain \block{}s together.  For non-group-heads,
+@Link@ points to the (\block{} descriptor of the) group head.
+\end{tabular}
+\end{center}
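+
+As an illustration (a sketch, not part of the design), these fields
+support operations like the following: testing for a group head,
+finding the group head from any descriptor, and walking the groups of
+a \step{}.  The helper names are made up:
+\begin{code}
+/* Sketch only, in terms of the bdescr layout above. */
+int IsGroupHead( bdescr *bd )
+{
+  return bd->Free != 0;       /* Free is zero for non-heads */
+}
+
+bdescr *GroupHead( bdescr *bd )
+{
+  /* non-heads point straight at their group head */
+  return IsGroupHead(bd) ? bd : bd->Link;
+}
+
+/* Visit every group of a step, given its first group head; for
+   group heads, Link chains the groups of the step together. */
+void WalkStep( bdescr *first, void (*visit)(bdescr *) )
+{
+  bdescr *bd;
+  for (bd = first; bd != NULL; bd = bd->Link)
+    visit( bd );
+}
+\end{code}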
+
+\subsection{\Step{}s}
+
+A {\em \step} is a linked list of \block{} groups, {\em not} necessarily
+contiguous, all belonging to the same generation,
+and all of which have survived the same number of garbage collections.
+
+We think of a \step{} as {\em logically} contiguous (via the @Link@ fields
+of its group heads) even though it is not physically contiguous.
+
+The \step{} number of a \step{} indicates its age (in GCs) within its
+generation.  (\Step{} zero is the youngest \step{} in a generation.)
+During GC of generation G, live objects
+from \step{} N are moved to \step{} N+1, and live objects from \step{}
+@MaxStep@(G) are promoted to generation G+1.  In this way, objects are
+given a decent chance to die before being promoted.
+
+In effect, we record the age for a set of objects grouped by address
+(the \step{}), rather than recording the age for each object
+individually.  This is quite a conventional idea.
+
+\subsection{Generations}
+
+A {\em generation} is a set of \step{}s
+(not necessarily contiguous).  Most allocation takes
+place in generation zero, the youngest generation.
+
+The big difference between generations and \step{}s is this:
+\begin{itemize}
+\item Garbage collection applies to all of generation G and all younger
+  generations (for some G).  You cannot do garbage collection on just
+  part of a generation.
+
+\item Consequently we have to track inter-generational pointers from
+  older to younger generations; we do not
+  track inter-\step{} pointers within a generation.
+\end{itemize}
+
+\subsection{The \block{} allocator}
+
+The {\em \block{} allocator} is responsible for
+allocating \block{}s.  It mediates between the higher levels of the
+storage manager and the operating system.
+
+It supports the following API:
+\begin{description}
+\item[@InitialiseBlockAllocator()@] initialises the \block{} allocator.
+\item[@bdescr *AllocBlocks( int n )@] allocates a {\em contiguous}
+group of @n@ \block{}s, returning a pointer to the \block{}
+descriptor of the first.  Returns @NULL@ if the request can't be
+satisfied.  The \block{} descriptors of the group are initialised by
+the \block{} allocator.
+\item[@FreeBlocks( bdescr *p )@] returns a \block{} group to the
+\block{} allocator.
+\item[@BDescr( void *p )@] takes a pointer @p@ to any
+byte within a \block{} and returns a pointer to its \block{}
+descriptor.  It does no error checking, and fails horribly if @p@
+does not point into a \block{} allocated by the \block{} allocator.
+It is probably implemented as an @inline@ procedure.
+\end{description}
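+
+By way of illustration, a client might use this API as follows.  This
+is a sketch: @ExampleClient@ and the 16-byte allocation are made up,
+and @GroupHead@ is the hypothetical helper sketched earlier:
+\begin{code}
+/* Sketch of block-allocator usage, in terms of the API above. */
+void ExampleClient( void )
+{
+  bdescr *bd = AllocBlocks( 4 );      /* a contiguous 4-block group */
+  if (bd == NULL)
+    return;                           /* request can't be satisfied */
+
+  /* Free points at the first un-allocated byte of the group, so
+     the client can hand out space by bumping it. */
+  void *obj = bd->Free;
+  bd->Free  = (char *)bd->Free + 16;  /* claim 16 bytes */
+
+  /* From any interior address we can recover the group head;
+     here this is just bd again. */
+  FreeBlocks( GroupHead( BDescr( obj ) ) );
+}
+\end{code}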
+
+\subsection{Implementation}
+
+Our current implementation plan for the \block{} allocator is as
+follows.  It allocates memory from the OS in {\em megablocks}.  A
+megablock is a contiguous chunk of memory of $2^M$ bytes or less,
+starting on a $2^M$-byte boundary.  All the \block{} descriptors are
+laid out in an array at the start of the megablock.  Each is less
+than 32 bytes long.  So to get from an address @p@ to its \block{}
+descriptor, @BDescr@ does this:
+\begin{code}
+  /* Mmask keeps the megablock base; Bmask extracts the block
+     number within the megablock; descriptors are 32 bytes apart. */
+  bdescr *BDescr( void *p ) {
+    unsigned long w = (unsigned long)p;
+    return (bdescr *)(((w >> (K-5)) & Bmask) | (w & Mmask));
+  }
+\end{code}
+Depending on how many \block{}s fit in the megablock, the first
+one or more descriptors will be unused.
+
+(An alternative design has the \block{} descriptors at the start of
+each \block{}.  This makes @BDescr@ shorter, but is pessimal for
+cache locality when fiddling with \block{} descriptors.
+It also means that only the first block in a contiguous chunk of
+blocks can have a descriptor.  That in turn makes life difficult
+for a mostly-copying conservative collector.  Given a possible pointer
+from a conservative region, an MCC collector needs
+to find the start of the object into which the putative pointer
+points.  One way to do this is to scan from the start of the
+contiguous chunk of \block{}s into which the pointer points; and to
+do that it helps to have a descriptor for each \block{}.)
+
+The \block{} allocator maintains a pool of free \block{}s.
+Because it may be asked for a contiguous chunk of \block{}s it
+needs to keep the free-list ordered (or something similar).
+
+
+\section{The Storage Manager API}
+
+The storage manager supports the following API.
+\begin{description}
+\item[@void *Allocate( int n )@] allocates @n@ bytes and returns
+a pointer to the first byte of the allocated chunk.  Returns @NULL@ if
+the request could not be satisfied --- @Allocate@ does not call
+the garbage collector itself.  It is the responsibility of the caller
+to completely fill in the allocated chunk with objects packed end to
+end.
+
+\item[@OpenNursery( void **p\_hp, void **p\_hplim )@]
+returns pointers in @*p_hp@ and @*p_hplim@, to the beginning and end
+of a free chunk of store.
+More precisely, @*p_hp@ points to the first byte of the region
+and @*p_hplim@ to the first byte beyond the end of the region.
+@*p_hp@ is set to @NULL@ if there is no chunk available.
+
+@OpenNursery@ is used by the main mutator to get a chunk of
+contiguous store to allocate in.  It allocates until @hp@ gets
+too close to @hplim@ and then calls @ExtendNursery@ to try
+to get a new region of nursery.
+
+\item[@ExtendNursery( void **p\_hp, void **p\_hplim )@]
+returns to the storage manager the region handed out by
+the previous @OpenNursery@ or @ExtendNursery@ call.  Suppose that
+this previous call returned with @*p_hp@ set
+to $hp$ and @*p_hplim@ set to
+$hplim$.
+Then the @*p_hplim@ given to @ExtendNursery@ should be the same
+as $hplim$, and @*p_hp@ should be greater than or equal to $hp$ and
+less than or equal to $hplim$.
+
+@ExtendNursery@ then tries to find another suitable region.  If it
+succeeds, it sets @*p_hp@ and @*p_hplim@ appropriately; if not, it
+sets @*p_hp@ to @NULL@.  The new region is not necessarily contiguous
+with the old one.
+
+\note{Maybe @ExtendNursery@ should be a macro, so that we can pass a
+globally allocated register to it?  (We can't take the address of a
+register.)  If so then presumably so should @OpenNursery@.}
+
+\item[@ZapNursery()@] arranges that the next call to @ExtendNursery@
+will fail.  @ZapNursery@ is called by asynchronous interrupts to arrange
+that execution is brought to an orderly halt.
+
+\item[@GarbageCollect( void (*Roots)(void) )@] performs garbage
+collection.  The parameter @Roots@ is a procedure which the garbage
+collector calls when it wishes to find all the roots.  This procedure
+should in turn call @MarkRoot@ on each such root.
+
+\item[@void *MarkRoot( void *p )@] informs the garbage collector that
+@p@ is a root.  It returns the new location of the object.
+
+\item[@RecordMutable( void *p )@] informs the storage manager that
+@p@ points to an object that may contain pointers to objects in
+a younger generation.
+
+It is not necessary to call @RecordMutable@ on objects known to
+be in the nursery.
+
+It is only necessary to call @RecordMutable@ once.  For objects that
+are genuinely mutable (like mutable arrays), the object is permanently
+recorded as mutable.  On the other hand, thunks that are updated only
+once can be dropped from the mutables pool once their pointer has been
+dealt with (promoted).
+
+\item[@boolean InNursery( void *p )@] tells whether @p@ points into
+the nursery.  This should be a rather fast test; it is used to guard
+calls to @RecordMutable@.
+\end{description}
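+
+For example, a write barrier in the mutator might use the last two
+operations together like this (a sketch; @UpdateMutableField@ is a
+made-up mutation site, not part of the API):
+\begin{code}
+/* Sketch: mutate a field of obj, telling the storage manager when
+   obj might now hold a pointer into a younger generation. */
+void UpdateMutableField( void **field, void *obj, void *new_value )
+{
+  *field = new_value;
+
+  /* Objects known to be in the nursery need not be recorded;
+     InNursery is the fast test guarding RecordMutable. */
+  if (!InNursery( obj ))
+    RecordMutable( obj );
+}
+\end{code}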
+
+\section{Allocation}
+
+We want to experiment with the idea of keeping the youngest generation
+entirely in the cache, so that an object might be allocated, used,
+and garbage collected without ever hitting main memory.
+
+To do this, generation zero consists of a fixed number of one-\block{}
+\segment{}s, carefully allocated in a contiguous address range so that
+they fit snugly next to each other in the cache.
+(Large objects are allocated directly into generation 1,
+so that there is no need to find contiguous \block{}s in generation 0
+--- Section~\ref{sect:large}.)
+
+After a GC, the storage manager decides how many generation-0
+\segment{}s (= \block{}s) can be
+allocated before the next minor GC; it chains these \segment{}s together
+through their @SegLink@ fields.  The storage manager then makes the
+allocation pointer (@Hp@) point to the start of the first \segment{},
+and the limit pointer (@HpLim@) point to the end of that \segment{}.
+Objects are allocated by incrementing @Hp@ until
+it hits @HpLim@.  If the @SegLink@ field of the current \segment{} is NIL,
+then it is time for GC; otherwise @Hp@ is set to point to the first data
+word of the next \segment{},
+and @HpLim@ to the end of that \segment{}, and allocation
+continues.  So between GCs allocation is only interrupted briefly
+(i.e. a handful of instructions)
+to move @Hp@/@HpLim@ on to the next \segment{}.  Actually, this regular
+interruption can be useful, because it makes a good moment to pre-empt
+a thread to schedule another, or to respond to an interrupt.
+(Pre-emption cannot occur at arbitrary times because we are assuming
+an accurate garbage collector.)
+
+We expect there to be two \step{}s in generation zero, \step{} 0 and
+\step{} 1.  At a minor GC we promote live objects in \step{} 1, and
+copy \step{} 0 to become a new \step{} 1.  The new \step{} 1 should
+also fit into the cache.
+Suppose there are 16 generation-zero \segment{}s.  A reasonable budget
+might be this:
+\begin{center}
+\begin{tabular}{l}
+ 10 \segment{}s to allocate into (\step{} 0) \\
+ 3 \segment{}s for \step{} 1 \\
+ 3 \segment{}s free for the new \step{} 1
+\end{tabular}
+\end{center}
+
+This assumes a 30\% survival rate from \step{} 0 to \step{} 1.  We can
+do in-flight measurements to tune this.  And (MOST IMPORTANT) if we get
+it wrong, and get a blip in which 9 \segment{}s survive minor GC, all
+is not lost... we simply
+have to use some new \block{}s, and generation zero gets bigger for a
+while and perhaps less cache-resident.  (This contrasts with systems
+which rely on contiguous partitions, where an overflow is catastrophic,
+and which must therefore over-reserve.  We get better address-space
+usage by not having to be so conservative.  Again, this isn't a new
+observation.)
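+
+In code, the fast path and the \segment{}-advance slow path just
+described might look something like this.  This is a sketch: it
+reuses the descriptor's @Link@ field as the @SegLink@ chain, and
+@BLOCK_WORDS@ (words per \block{}) and the other helper names are
+made up:
+\begin{code}
+typedef unsigned long W_;   /* one machine word */
+
+extern W_     *Hp;          /* allocation pointer */
+extern W_     *HpLim;       /* limit pointer */
+extern bdescr *CurrentSeg;  /* the segment Hp points into */
+
+/* Allocate n words in generation zero; n is assumed small
+   (large objects go straight to generation 1). */
+W_ *AllocateWords( int n )
+{
+  if (Hp + n > HpLim) {                /* fast path failed...      */
+    bdescr *next = CurrentSeg->Link;   /* ...advance to the next   */
+    if (next == NULL)                  /* segment, if there is one */
+      return NULL;                     /* time for a minor GC      */
+    CurrentSeg = next;
+    Hp    = (W_ *)next->Start;         /* first word of the segment */
+    HpLim = (W_ *)next->Start + BLOCK_WORDS;
+  }
+  Hp += n;
+  return Hp - n;                       /* the newly allocated space */
+}
+\end{code}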
+
+\section{Doing garbage collection}
+
+When GC happens, we first decide which generation to collect.  (How?)
+When we collect generation G we collect all younger generations too.
+
+To begin with we assume a copying-style garbage collection
+algorithm.  The structure of a full GC is as follows:
+
+\begin{enumerate}
+\item Initialise
+\item For each root @r@ do @r:=Evacuate(r)@.
+\item Scavenge
+\item Tidy up
+\end{enumerate}
+
+A {\em root} is either a pointer held by the RTS (notably the queue
+of runnable threads), or a pointer from a generation $>$ G that might
+point into a generation $\leq$ G.
+
+The @Evacuate@ operation does this:
+
+\begin{code}
+Evacuate(p) {
+  /* Ignore pointers into non-collected generations */
+  if generation(p) > G then return(p);
+
+  /* If the object has already been copied, just return its
+     forwarding pointer */
+  if p points to a forwarding pointer then
+    return(the forwarded location);
+
+  /* Copy the object and return its new location */
+  p' := copy the object pointed to by p into the to-space
+        of its generation and step;
+  overwrite the old copy with a forwarding pointer to p';
+
+  return(p');
+}
+\end{code}
+The @Scavenge@ operation does this:
+\begin{code}
+Scavenge() {
+  While there is an un-scavenged object O in to-space do
+    for each pointer q in O do
+      q := Evacuate(q)
+    remember that O has been scavenged
+}
+\end{code}
+We say ``copy into to-space'' and ``find an un-scavenged object in
+to-space'', but actually each \step{} of each generation has a separate
+to-space, complete with allocation pointer and sweep pointer.  (The
+sweep pointer finds the un-scavenged objects.)  When copying
+into a to-space a check must be made for to-space
+overflow and a new \segment{} chained on if necessary.
+This is much as when the
+mutator is doing allocation.  The difference is that we are allocating
+in multiple \segment{}s in an interleaved fashion, so we can't hold
+the allocation pointers in registers.
+
+Only when all the sweep pointers
+have reached the corresponding allocation pointers is GC done.
+This entails multiple passes through the \step{}s until a pass
+finds no un-scavenged objects.  [NB: if we treated inter-generational
+pointers between generations $\leq$ G as roots, then we could avoid
+some of these multiple passes.  But we'd collect less garbage, and
+hence copy more objects, so it would probably be slower.]
+
+Notice that there is no issue about partitioning the address space
+among generations, as there might be if each lived in a contiguous
+address space.
+
+\section{Large objects}
+\label{sect:large}
+
+Large objects are allocated directly in generation 1, each alone in its
+own \segment{}.  This means that they never, ever, need to be copied!
+Instead, we simply change their generation or \step{} number.  We also
+need to chain the object onto a list of large objects to be swept for
+pointers.  This act also serves to mark the object, so we don't chain
+it on twice, or increment its \step{} number twice.
+
+Stack objects probably count as large objects.  This is one reason for
+not wanting \block{}s to be too big.
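+
+Following the description above, the promotion path for a large object
+might look like this.  This is a sketch only: it assumes a spare mark
+flag in the descriptor (not shown in the @bdescr@ definition above),
+assumes the head's @Link@ field is free for the sweep list during GC,
+and @LargeObjects@ is a hypothetical list head:
+\begin{code}
+extern bdescr *LargeObjects;  /* large objects still to be swept */
+
+void PromoteLargeObject( bdescr *bd )
+{
+  if (bd->Marked)            /* already chained on: do nothing, so */
+    return;                  /* we never bump its Step twice       */
+
+  bd->Marked = 1;            /* chaining it on also marks it */
+  bd->Link   = LargeObjects;
+  LargeObjects = bd;
+
+  bd->Step = bd->Step + 1;   /* age it in place, instead of copying */
+}
+\end{code}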
+
+\section{Mark sweep}
+
+So far we have assumed copying collection.  Actually, for
+older generations we may want to use
+mark/sweep/compact because it is a bit less greedy on address
+space, and may help paging behaviour.
+
+What I have in mind is a pointer-reversing mark phase, so that we don't
+have to allocate space for a mark stack; although one could argue that
+a mark stack will almost invariably be small, and that if it gets big
+you can always @malloc@ some more space.  Mumble mumble.
+
+\bibliographystyle{plain}
+\bibliography{bibl,simon}
+
+\end{document}