ghc/docs/libraries/Weak.sgml

   1 <sect> <idx/Weak/
   2 <label id="sec:Weak">
   3 <p>
   4
   5 The <tt/Weak/ library provides a "weak pointer" abstraction, giving
   6 the user some control over the garbage collection of specified
   7 objects, and allowing objects to be "finalized" with an arbitrary
   8 Haskell IO computation when they die.
   9
  10 Weak pointers partially replace the old foreign object interface, as
  11 we will explain later.
  12
  13 <sect1>Module Signature
  14 <p>
  15
  16 <tscreen><verb>
  17 module Weak (
  18         Weak,                   -- abstract
  19         -- instance Eq (Weak v)
  20
  21         mkWeak,                 -- :: k -> v -> Maybe (IO ()) -> IO (Weak v)
  22         deRefWeak,              -- :: Weak v -> IO (Maybe v)
  23         finalize,               -- :: Weak v -> IO ()
  24
  25         -- Not yet implemented
  26         -- replaceFinalizer     -- :: Weak v -> IO () -> IO ()
  27
  28         mkWeakPtr,              -- :: k -> Maybe (IO ()) -> IO (Weak k)
  29         mkWeakPair,             -- :: k -> v -> Maybe (IO ()) -> IO (Weak (k,v))
  30         addFinalizer,           -- :: k -> IO () -> IO ()
  31         addForeignFinalizer     -- :: ForeignObj -> IO () -> IO ()
  32    ) where
  33 </verb></tscreen>
  34
  35 <sect1>Weak pointers
  36 <p>
  37
  38 In general terms, a weak pointer is a reference to an object that is
  39 not followed by the garbage collector --- that is, the existence of a
  40 weak pointer to an object has no effect on the lifetime of that
  41 object.  A weak pointer can be de-referenced to find out
  42 whether the object it refers to is still alive or not, and if so
  43 to return the object itself.
  44
  45 Weak pointers are particularly useful for caches and memo tables.
  46 To build a memo table, you build a data structure
  47 mapping from the function argument (the key) to its result (the
  48 value).  When you apply the function to a new argument you first
  49 check whether the key/value pair is already in the memo table.
  50 The key point is that the memo table itself should not keep the
  51 key and value alive.  So the table should contain a weak pointer
  52 to the key, not an ordinary pointer.  The pointer to the value must
  53 not be weak, because the only reference to the value might indeed be
  54 from the memo table.
  55
  56 So it looks as if the memo table will keep all its values
  57 alive for ever.  One way to solve this is to purge the table
  58 occasionally, by deleting entries whose keys have died.
  59
  60 The weak pointers in this library
  61 support another approach, called <em/finalization/.
  62 When the key referred to by a weak pointer dies, the storage manager
  63 arranges to run a programmer-specified finalizer.  In the case of memo
  64 tables, for example, the finalizer could remove the key/value pair
  65 from the memo table.
  66
  67 Another difficulty with the memo table is that the value of a
  68 key/value pair might itself contain a pointer to the key.
  69 So the memo table keeps the value alive, which keeps the key alive,
  70 even though there may be no other references to the key so both should
  71 die.  The weak pointers in this library provide a slight
  72 generalisation of the basic weak-pointer idea, in which each
  73 weak pointer actually contains both a key and a value.
  74 We describe this in more detail below.
  75
  76 <sect1>The simple interface
  77 <p>
  78
  79 <tscreen><verb>
  80         mkWeakPtr    :: a -> Maybe (IO ()) -> IO (Weak a)
  81         deRefWeak    :: Weak a -> IO (Maybe a)
  82         addFinalizer :: a -> IO () -> IO ()
  83 </verb></tscreen>
  84
  85 <tt/mkWeakPtr/ takes a value of any type <tt/a/, and maybe a finalizer
  86 of type <tt/IO ()/, and returns a weak pointer object referring to the
  87 value, of type <tt/Weak a/.  It is in the <tt/IO/ monad because it has
  88 the side effect of arranging that the finalizer (if there is one) will
  89 be run when the object dies.  In what follows, a ``weak pointer
  90 object'', or ``weak pointer'' for short, means precisely ``a Haskell
  91 value of type <tt/Weak t/'' for some type <tt/t/.  A weak pointer
  92 (object) is a first-class Haskell value; it can be passed to
  93 functions, stored in data structures, and so on.
  94
  95 <tt/deRefWeak/ dereferences a weak pointer, returning <tt/Just v/ if
  96 the value is still alive.  If the key has already died, then
  97 <tt/deRefWeak/ returns <tt/Nothing/; that's why it's in the <tt/IO/
  98 monad - the return value of <tt/deRefWeak/ depends on when the garbage
  99 collector runs.
 100
 101 <tt/addFinalizer/ is just another name for <tt/mkWeakPtr/ except that
 102 it throws the weak pointer itself away.  (The runtime system will
 103 remember that the weak pointer and hence the finalizer exists even if
 104 the program has forgotten it.)
 105
 106 <tscreen><verb>
 107   addFinalizer :: a -> IO () -> IO ()
 108   addFinalizer v f = do { mkWeakPtr v f; return () }
 109 </verb></tscreen>
 110
 111 The effect of <tt/addFinalizer/ is simply that the finalizer runs when
 112 the referenced object dies.
 113
 114 The following properties hold:
 115
 116 <itemize>
 117 <item> <tt/deRefWeak/ returns the original object until
 118 that object is considered dead; it returns <tt/Nothing/
 119 subsequently.
 120 <item>
 121 Every finalizer will eventually be run, exactly once, either
 122 soon after the object dies, or at the end of the program.
 123 There is no requirement for the programmer to hold onto the
 124 weak pointer itself; finalization is completely unaffected by
 125 whether the weak pointer itself is alive.
 126 <item>
 127 There may be multiple weak pointers to a single object.
 128 In this case, the finalizers for each of these weak pointers will
 129 all be run in some arbitrary order, or perhaps concurrently,
 130 when the object dies.  If the programmer specifies a finalizer that
 131 assumes it has the only reference to an object
 132 (for example, a file that it wishes to close), then the programmer
 133 must ensure that there is only one such finalizer.
 134 <item>
 135 The storage manager attempts to run the finalizer(s) for an
 136 object soon after the object dies, but promptness is not guaranteed.
 137 (What is guaranteed is that the finalizer will
 138 eventually run, exactly once.)
 139 <item>
 140 At the moment when a finalizer is run, a call to <tt/deRefWeak/
 141 will return <tt/Nothing/.
 142 <item>
 143 A finalizer may contain a pointer to the object, but that pointer
 144 will not keep the object alive.  For example:
 145 <tscreen><verb>
 146   f :: Show a => a -> IO a
 147   f x = addFinalizer x (print (show x))
 148 </verb></tscreen>
 149 Here the finalizer <tt/print (show x)/ contains a reference to <tt/x/
 150 itself, but that does not keep <tt/x/ alive.  When that is the only
 151 reference to <tt/x/, the finalizer is run; and the message appears
 152 on the screen.
 153 <item>
 154 A finalizer may even resurrect the object, by (say) storing it in
 155 some global data structure.
 156 </itemize>
 157
 158 <sect1>The general interface
 159 <p>
 160
 161 The <tt/Weak/ library offers a slight generalisation of
 162 the simple weak pointers described so far:
 163 <tscreen><verb>
 164         mkWeak :: k -> v -> Maybe (IO ()) -> IO (Weak v)
 165 </verb></tscreen>
 166 <tt/mkWeak/ takes a key of any type <tt/k/ and a value of any type
 167 <tt/v/, as well as a finalizer, and returns a weak pointer of type
 168 <tt/Weak v/.
 169
 170 <tt/deRefWeak/ returns the <em/value/ only, not the key, as its
 171 type (given above) implies:
 172 <tscreen><verb>
 173         deRefWeak :: Weak a -> IO (Maybe a)
 174 </verb></tscreen>
 175 However, <tt/deRefWeak/ returns <tt/Nothing/ if the <em/key/, not the
 176 value, has died.  Furthermore, references from the value to the key
 177 do not keep the key alive, in the same way that the finalizer does
 178 not keep the key alive.
 179
 180 Simple weak pointers are readily defined in terms of these more general
 181 weak pointers:
 182 <tscreen><verb>
 183   mkWeakPtr :: a -> Maybe (IO ()) -> IO (Weak a)
 184   mkWeakPtr v f = mkWeak v v f
 185 </verb></tscreen>
 186
 187 These more general weak pointers are enough to implement memo
 188 tables properly.
 189
 190 A weak pointer can be finalized early, using the <tt/finalize/ operation:
 191
 192 <tscreen><verb>
 193 finalize :: Weak v -> IO ()
 194 </verb></tscreen>
 195
 196 <sect1> A precise semantics
 197 <p>
 198 The above informal specification is fine for simple situations,
 199 but matters can get complicated.  In particular, it needs to
 200 be clear exactly when a key dies, so that any weak pointers
 201 that refer to it can be finalized.
 202 Suppose, for example, the value of one weak pointer refers
 203 to the key of another...does that keep the key alive?
 204
 205 The behaviour is simply this:
 206
 207 <itemize>
 208 <item> If a weak pointer (object) refers to an <em/unreachable/
 209 key, it may be finalized.
 210 <item> Finalization means (a) arrange that subsequent calls
 211 to <tt/deRefWeak/ return <tt/Nothing/; and (b) run the finalizer.
 212 </itemize>
 213
 214 This behaviour depends on what it means for a key to be reachable.
 215 Informally,
 216 something is reachable if it can be reached by following ordinary
 217 pointers from the root set, but not following weak pointers.
 218 We define reachability more precisely as
 219 follows
 220 A heap object is reachable if:
 221
 222 <itemize>
 223 <item> It is a member of the <em/root set/.
 224 <item> It is directly pointed to by a reachable object, other than
 225 a weak pointer object.
 226 <item> It is a weak pointer object whose key is reachable.
 227 <item> It is the value or finalizer of an object whose key is
 228 reachable.
 229 </itemize>
 230
 231 The root set consists of all runnable threads, and all stable pointers
 232 (see Section <ref id="sec:stable-pointers" name="Stable Pointers">).
 233 NOTE: currently all top-level objects are considered to be reachable,
 234 although we hope to remove this restriction in the future.  A
 235 <tt/Char/ or small <tt/Int/ will also be constantly reachable, since
 236 the garbage collector replaces heap-resident <tt/Char/s and small
 237 <tt/Int/s with pointers to static copies.
 238
 239 Notice that a pointer to the key from its associated
 240 value or finalizer does not make the key reachable.
 241 However, if the key is reachable some other way, then the value
 242 and the finalizer are reachable, and so, therefore, are any other
 243 keys they refer to directly or indirectly.
 244
 245
 246 <sect1>Finalization for foreign objects
 247 <label id="foreign-finalizers">
 248 <p>
 249
 250 A foreign object is some data that lives outside the Haskell heap, for
 251 example some <tt/malloc/ed data in C land.  It's useful to be able to
 252 know when the Haskell program no longer needs the <tt/malloc/ed data,
 253 so it can be <tt/free/d.  We can use weak pointers and finalizers for
 254 this, but we have to be careful: the foreign data is usually
 255 referenced by an address, ie. an <tt/Addr/ (see Section <ref
 256 name="Addr" id="sec:Addr">), and we must retain the invariant that
 257 <em/if the Haskell program still needs the foreign object, then it
 258 retains the <tt/Addr/ object in the heap/.  This invariant isn't
 259 guaranteed to hold if we use <tt/Addr/, because an <tt/Addr/ consists
 260 of a box around a raw address <tt/Addr#/.  If the Haskell program can
 261 manipulate the <tt/Addr#/ object independently of the heap-resident
 262 <tt/Addr/, then the foreign object could be inadvertently finalized
 263 early, because a weak pointer to the <tt/Addr/ would find no more
 264 references to its key and trigger the finalizer despite the fact that
 265 the program still holds the <tt/Addr#/ and intends to use it.
 266
 267 To avoid this somewhat subtle race condition, we use another type of
 268 foreign address, called <tt/ForeignObj/ (see Section <ref
 269 id="sec:Foreign" name="Foreign">).  Historical note: <tt/ForeignObj/
 270 is identical to the old <tt/ForeignObj/ except that it no longer
 271 supports finalization - that's provided by the weak
 272 pointer/finalization mechanism above.
 273
 274 A <tt/ForeignObj/ is basically an address, but the <tt/ForeignObj/
 275 itself is a heap-resident object and can therefore be watched by weak
 276 pointers.  A <tt/ForeignObj/ can be passed to C functions (in which
 277 case the C function gets a straightforward pointer), but it cannot be
 278 decomposed into an <tt/Addr#/.