ghc/docs/libraries/Weak.sgml

   1 <sect> <idx/Weak/
   2 <label id="sec:Weak">
   3 <p>
   4
   5 The <tt/Weak/ library provides a "weak pointer" abstraction, giving
   6 the user some control over the garbage collection of specified
   7 objects, and allowing objects to be "finalised" with an arbitrary
   8 Haskell IO computation when they die.
   9
  10 Weak pointers partially replace the old foreign object interface, as
  11 we will explain later.
  12
  13 <sect1>Module Signature
  14 <p>
  15
  16 <tscreen><verb>
  17 module Weak (
  18         Weak,                   -- abstract
  19         -- instance Eq (Weak v)
  20
  21         mkWeak,                 -- :: k -> v -> IO () -> IO (Weak v)
  22         deRefWeak,              -- :: Weak v -> IO (Maybe v)
  23         finalise,               -- :: Weak v -> IO ()
  24
  25         -- Not yet implemented
  26         -- replaceFinaliser     -- :: Weak v -> IO () -> IO ()
  27
  28         mkWeakPtr,              -- :: k -> IO () -> IO (Weak k)
  29         mkWeakPair,             -- :: k -> v -> IO () -> IO (Weak (k,v))
  30         mkWeakNoFinaliser,      -- :: k -> v -> IO (Weak v)
  31         addFinaliser,           -- :: k -> IO () -> IO ()
  32         addForeignFinaliser     -- :: ForeignObj -> IO () -> IO ()
  33    ) where
  34 </verb></tscreen>
  35
  36 <sect1>Weak pointers
  37 <p>
  38
  39 In general terms, a weak pointer is a reference to an object that is
  40 not followed by the garbage collector --- that is, the existence of a
  41 weak pointer to an object has no effect on the lifetime of that
  42 object.  A weak pointer can be de-referenced to find out
  43 whether the object it refers to is still alive or not, and if so
  44 to return the object itself.
  45
  46 Weak pointers are particularly useful for caches and memo tables.
  47 To build a memo table, you build a data structure
  48 mapping from the function argument (the key) to its result (the
  49 value).  When you apply the function to a new argument you first
  50 check whether the key/value pair is already in the memo table.
  51 The key point is that the memo table itself should not keep the
  52 key and value alive.  So the table should contain a weak pointer
  53 to the key, not an ordinary pointer.  The pointer to the value must
  54 not be weak, because the only reference to the value might indeed be
  55 from the memo table.
  56
  57 So it looks as if the memo table will keep all its values
  58 alive for ever.  One way to solve this is to purge the table
  59 occasionally, by deleting entries whose keys have died.
  60
  61 The weak pointers in this library
  62 support another approach, called <em/finalisation/.
  63 When the key referred to by a weak pointer dies, the storage manager
  64 arranges to run a programmer-specified finaliser.  In the case of memo
  65 tables, for example, the finaliser could remove the key/value pair
  66 from the memo table.
  67
  68 Another difficulty with the memo table is that the value of a
  69 key/value pair might itself contain a pointer to the key.
  70 So the memo table keeps the value alive, which keeps the key alive,
  71 even though there may be no other references to the key so both should
  72 die.  The weak pointers in this library provide a slight
  73 generalisation of the basic weak-pointer idea, in which each
  74 weak pointer actually contains both a key and a value.
  75 We describe this in more detail below.
  76
  77 <sect1>The simple interface
  78 <p>
  79
  80 <tscreen><verb>
  81         mkWeakPtr    :: a -> IO () -> IO (Weak a)
  82         deRefWeak    :: Weak a -> IO (Maybe a)
  83         addFinaliser :: a -> IO () -> IO ()
  84 </verb></tscreen>
  85
  86 <tt/mkWeakPtr/ takes a value of any type <tt/a/, and a finaliser of
  87 type <tt/IO ()/, and returns a weak pointer object referring
  88 to the value, of type <tt/Weak a/.
  89 It is in the <tt/IO/ monad because it has the
  90 side effect of arranging that the finaliser will be run when the
  91 object dies.  In what follows, a ``weak pointer object'', or ``weak
  92 pointer'' for short, means precisely ``a Haskell value of
  93 type <tt/Weak t/'' for some type <tt/t/.
  94 A weak pointer (object) is a first-class Haskell value; it can be passed to
  95 functions, stored in data structures, and so on.
  96
  97 <tt/deRefWeak/ dereferences a weak pointer, returning <tt/Just v/ if
  98 the value is still alive.  If the key has already died, then
  99 <tt/deRefWeak/ returns <tt/Nothing/; that's why it's in the <tt/IO/
 100 monad - the return value of <tt/deRefWeak/ depends on when the garbage
 101 collector runs.
 102
 103 <tt/addFinaliser/ is just another name for <tt/mkWeakPtr/ except that
 104 it throws the weak pointer itself away.  (The runtime system will
 105 remember that the weak pointer and hence the finaliser exists even if
 106 the program has forgotten it.)
 107
 108 <tscreen><verb>
 109   addFinaliser :: a -> IO () -> IO ()
 110   addFinaliser v f = do { mkWeakPtr v f; return () }
 111 </verb></tscreen>
 112
 113 The effect of <tt/addFinaliser/ is simply that the finaliser runs when
 114 the referenced object dies.
 115
 116 The following properties hold:
 117
 118 <itemize>
 119 <item> <tt/deRefWeak/ returns the original object until
 120 that object is considered dead; it returns <tt/Nothing/
 121 subsequently.
 122 <item>
 123 Every finaliser will eventually be run, exactly once, either
 124 soon after the object dies, or at the end of the program.
 125 There is no requirement for the programmer to hold onto the
 126 weak pointer itself; finalisation is completely unaffected by
 127 whether the weak pointer itself is alive.
 128 <item>
 129 There may be multiple weak pointers to a single object.
 130 In this case, the finalisers for each of these weak pointers will
 131 all be run in some arbitrary order, or perhaps concurrently,
 132 when the object dies.  If the programmer specifies a finaliser that
 133 assumes it has the only reference to an object
 134 (for example, a file that it wishes to close), then the programmer
 135 must ensure that there is only one such finaliser.
 136 <item>
 137 The storage manager attempts to run the finaliser(s) for an
 138 object soon after the object dies, but promptness is not guaranteed.
 139 (What is guaranteed is that the finaliser will
 140 eventually run, exactly once.)
 141 <item>
 142 At the moment when a finaliser is run, a call to <tt/deRefWeak/
 143 will return <tt/Nothing/.
 144 <item>
 145 A finaliser may contain a pointer to the object, but that pointer
 146 will not keep the object alive.  For example:
 147 <tscreen><verb>
 148   f :: Show a => a -> IO a
 149   f x = addFinaliser x (print (show x))
 150 </verb></tscreen>
 151 Here the finaliser <tt/print (show x)/ contains a reference to <tt/x/
 152 itself, but that does not keep <tt/x/ alive.  When that is the only
 153 reference to <tt/x/, the finaliser is run; and the message appears
 154 on the screen.
 155 <item>
 156 A finaliser may even resurrect the object, by (say) storing it in
 157 some global data structure.
 158 </itemize>
 159
 160 <sect1>The general interface
 161 <p>
 162
 163 The <tt/Weak/ library offers a slight generalisation of
 164 the simple weak pointers described so far:
 165 <tscreen><verb>
 166         mkWeak :: k -> v -> IO () -> IO (Weak v)
 167 </verb></tscreen>
 168 <tt/mkWeak/ takes a key of any type <tt/k/ and a value of any type
 169 <tt/v/, as well as a finaliser, and returns a weak pointer of type
 170 <tt/Weak v/.
 171
 172 <tt/deRefWeak/ returns the <em/value/ only, not the key, as its
 173 type (given above) implies:
 174 <tscreen><verb>
 175         deRefWeak :: Weak a -> IO (Maybe a)
 176 </verb></tscreen>
 177 However, <tt/deRefWeak/ returns <tt/Nothing/ if the <em/key/, not the
 178 value, has died.  Furthermore, references from the value to the key
 179 do not keep the key alive, in the same way that the finaliser does
 180 not keep the key alive.
 181
 182 Simple weak pointers are readily defined in terms of these more general
 183 weak pointers:
 184 <tscreen><verb>
 185   mkWeakPtr :: a -> IO () -> IO (Weak a)
 186   mkWeakPtr v f = mkWeak v v f
 187 </verb></tscreen>
 188
 189 These more general weak pointers are enough to implement memo
 190 tables properly.
 191
 192 A weak pointer can be finalised early, using the <tt/finalise/ operation:
 193
 194 <tscreen><verb>
 195 finalise :: Weak v -> IO ()
 196 </verb></tscreen>
 197
 198 When you don't need a finaliser, we provide the following operation:
 199
 200 <tscreen><verb>
 201 mkWeakNoFinaliser :: k -> v -> IO (Weak v)
 202 mkWeakNoFinaliser k v = mkWeak k v (return ())
 203 </verb></tscreen>
 204
 205 Which creates a weak pointer with a null finaliser.  Lots of null
 206 finalisers can be expensive, because each one runs in a separate
 207 thread, so the intention is that <tt/mkWeakNoFinaliser> avoids all the
 208 extra costs by generating a special kind of weak pointer without a
 209 finaliser.  So although the semantics of mkWeakNoFinaliser is as given
 210 above, its actual implementation is somewhat different.
 211
 212 <sect1> A precise semantics
 213 <p>
 214 The above informal specification is fine for simple situations,
 215 but matters can get complicated.  In particular, it needs to
 216 be clear exactly when a key dies, so that any weak pointers
 217 that refer to it can be finalised.
 218 Suppose, for example, the value of one weak pointer refers
 219 to the key of another...does that keep the key alive?
 220
 221 The behaviour is simply this:
 222
 223 <itemize>
 224 <item> If a weak pointer (object) refers to an <em/unreachable/
 225 key, it may be finalised.
 226 <item> Finalisation means (a) arrange that subsequent calls
 227 to <tt/deRefWeak/ return <tt/Nothing/; and (b) run the finaliser.
 228 </itemize>
 229
 230 This behaviour depends on what it means for a key to be reachable.
 231 Informally,
 232 something is reachable if it can be reached by following ordinary
 233 pointers from the root set, but not following weak pointers.
 234 We define reachability more precisely as
 235 follows
 236 A heap object is reachable if:
 237
 238 <itemize>
 239 <item> It is a member of the <em/root set/.
 240 <item> It is directly pointed to by a reachable object, other than
 241 a weak pointer object.
 242 <item> It is a weak pointer object whose key is reachable.
 243 <item> It is the value or finaliser of an object whose key is
 244 reachable.
 245 </itemize>
 246
 247 The root set consists of all runnable threads, and all stable pointers
 248 (see Section <ref id="sec:stable-pointers" name="Stable Pointers">).
 249 NOTE: currently all top-level objects are considered to be reachable,
 250 although we hope to remove this restriction in the future.  A
 251 <tt/Char/ or small <tt/Int/ will also be constantly reachable, since
 252 the garbage collector replaces heap-resident <tt/Char/s and small
 253 <tt/Int/s with pointers to static copies.
 254
 255 Notice that a pointer to the key from its associated
 256 value or finaliser does not make the key reachable.
 257 However, if the key is reachable some other way, then the value
 258 and the finaliser are reachable, and so, therefore, are any other
 259 keys they refer to directly or indirectly.
 260
 261
 262 <sect1>Finalisation for foreign objects
 263 <label id="foreign-finalisers">
 264 <p>
 265
 266 A foreign object is some data that lives outside the Haskell heap, for
 267 example some <tt/malloc/ed data in C land.  It's useful to be able to
 268 know when the Haskell program no longer needs the <tt/malloc/ed data,
 269 so it can be <tt/free/d.  We can use weak pointers and finalisers for
 270 this, but we have to be careful: the foreign data is usually
 271 referenced by an address, ie. an <tt/Addr/ (see Section <ref
 272 name="Addr" id="sec:Addr">), and we must retain the invariant that
 273 <em/if the Haskell program still needs the foreign object, then it
 274 retains the <tt/Addr/ object in the heap/.  This invariant isn't
 275 guaranteed to hold if we use <tt/Addr/, because an <tt/Addr/ consists
 276 of a box around a raw address <tt/Addr#/.  If the Haskell program can
 277 manipulate the <tt/Addr#/ object independently of the heap-resident
 278 <tt/Addr/, then the foreign object could be inadvertently finalised
 279 early, because a weak pointer to the <tt/Addr/ would find no more
 280 references to its key and trigger the finaliser despite the fact that
 281 the program still holds the <tt/Addr#/ and intends to use it.
 282
 283 To avoid this somewhat subtle race condition, we use another type of
 284 foreign address, called <tt/ForeignObj/ (see Section <ref
 285 id="sec:Foreign" name="Foreign">).  Historical note: <tt/ForeignObj/
 286 is identical to the old <tt/ForeignObj/ except that it no longer
 287 supports finalisation - that's provided by the weak
 288 pointer/finalisation mechanism above.
 289
 290 A <tt/ForeignObj/ is basically an address, but the <tt/ForeignObj/
 291 itself is a heap-resident object and can therefore be watched by weak
 292 pointers.  A <tt/ForeignObj/ can be passed to C functions (in which
 293 case the C function gets a straightforward pointer), but it cannot be
 294 decomposed into an <tt/Addr#/.