ghc/docs/libraries/Weak.sgml

   1 <sect> <idx/Weak/
   2 <label id="sec:Weak">
   3 <p>
   4
   5 The <tt/Weak/ library provides a "weak pointer" abstraction, giving
   6 the user some control over the garbage collection of specified
   7 objects, and allowing objects to be "finalised" with an arbitrary
   8 Haskell IO computation when they die.
   9
  10 Weak pointers partially replace the old foreign object interface, as
  11 we will explain later.
  12
  13 <sect1>Module Signature
  14 <p>
  15
  16 <tscreen><verb>
  17 module Weak (
  18         Weak,                   -- abstract
  19         -- instance Eq (Weak v)
  20
  21         mkWeak,                 -- :: k -> v -> IO () -> IO (Weak v)
  22         deRefWeak               -- :: Weak v -> IO (Maybe v)
  23
  24         -- Not yet implemented
  25         -- finalise             -- :: Weak v -> IO ()
  26         -- replaceFinaliser     -- :: Weak v -> IO () -> IO ()
  27
  28         mkWeakPtr               -- :: k -> IO () -> IO (Weak k)
  29         mkWeakPair              -- :: k -> v -> IO () -> IO (Weak (k,v))
  30         addFinaliser            -- :: k -> IO () -> IO ()
  31         addForeignFinaliser     -- :: ForeignObj -> IO () -> IO ()
  32    ) where
  33 </verb></tscreen>
  34
  35 <sect1>Weak pointers
  36 <p>
  37
  38 In general terms, a weak pointer is a reference to an object that is
  39 not followed by the garbage collector --- that is, the existence of a
  40 weak pointer to an object has no effect on the lifetime of that
  41 object.  A weak pointer can be de-referenced to find out
  42 whether the object it refers to is still alive or not, and if so
  43 to return the object itself.
  44
  45 Weak pointers are particularly useful for caches and memo tables.
  46 To build a memo table, you build a data structure
  47 mapping from the function argument (the key) to its result (the
  48 value).  When you apply the function to a new argument you first
  49 check whether the key/value pair is already in the memo table.
  50 The key point is that the memo table itself should not keep the
  51 key and value alive.  So the table should contain a weak pointer
  52 to the key, not an ordinary pointer.  The pointer to the value must
  53 not be weak, because the only reference to the value might indeed be
  54 from the memo table.
  55
  56 So it looks as if the memo table will keep all its values
  57 alive for ever.  One way to solve this is to purge the table
  58 occasionally, by deleting entries whose keys have died.
  59
  60 The weak pointers in this library
  61 support another approach, called <em/finalisation/.
  62 When the key referred to by a weak pointer dies, the storage manager
  63 arranges to run a programmer-specified finaliser.  In the case of memo
  64 tables, for example, the finaliser could remove the key/value pair
  65 from the memo table.
  66
  67 Another difficulty with the memo table is that the value of a
  68 key/value pair might itself contain a pointer to the key.
  69 So the memo table keeps the value alive, which keeps the key alive,
  70 even though there may be no other references to the key so both should
  71 die.  The weak pointers in this library provide a slight
  72 generalisation of the basic weak-pointer idea, in which each
  73 weak pointer actually contains both a key and a value.
  74 We describe this in more detail below.
  75
  76 <sect1>The simple interface
  77 <p>
  78
  79 <tscreen><verb>
  80         mkWeakPtr    :: a -> IO () -> IO (Weak a)
  81         deRefWeak    :: Weak a -> IO (Maybe a)
  82         addFinaliser :: a -> IO () -> IO ()
  83 </verb></tscreen>
  84
  85 <tt/mkWeakPtr/ takes a value of any type <tt/a/, and a finaliser of
  86 type <tt/IO ()/, and returns a weak pointer object referring
  87 to the value, of type <tt/Weak a/.
  88 It is in the <tt/IO/ monad because it has the
  89 side effect of arranging that the finaliser will be run when the
  90 object dies.  In what follows, a ``weak pointer object'', or ``weak
  91 pointer'' for short, means precisely ``a Haskell value of
  92 type <tt/Weak t/'' for some type <tt/t/.
  93 A weak pointer (object) is a first-class Haskell value; it can be passed to
  94 functions, stored in data structures, and so on.
  95
  96 <tt/deRefWeak/ dereferences a weak pointer, returning <tt/Just v/ if
  97 the value is still alive.  If the key has already died, then
  98 <tt/deRefWeak/ returns <tt/Nothing/; that's why it's in the <tt/IO/
  99 monad - the return value of <tt/deRefWeak/ depends on when the garbage
 100 collector runs.
 101
 102 <tt/addFinaliser/ is just another name for <tt/mkWeakPtr/ except that
 103 it throws the weak pointer itself away.  (The runtime system will
 104 remember that the weak pointer and hence the finaliser exists even if
 105 the program has forgotten it.)
 106
 107 <tscreen><verb>
 108   addFinaliser :: a -> IO () -> IO ()
 109   addFinaliser v f = do { mkWeakPtr v f; return () }
 110 </verb></tscreen>
 111
 112 The effect of <tt/addFinaliser/ is simply that the finaliser runs when
 113 the referenced object dies.
 114
 115 The following properties hold:
 116
 117 <itemize>
 118 <item> <tt/deRefWeak/ returns the original object until
 119 that object is considered dead; it returns <tt/Nothing/
 120 subsequently.
 121 <item>
 122 Every finaliser will eventually be run, exactly once, either
 123 soon after the object dies, or at the end of the program.
 124 There is no requirement for the programmer to hold onto the
 125 weak pointer itself; finalisation is completely unaffected by
 126 whether the weak pointer itself is alive.
 127 <item>
 128 There may be multiple weak pointers to a single object.
 129 In this case, the finalisers for each of these weak pointers will
 130 all be run in some arbitrary order, or perhaps concurrently,
 131 when the object dies.  If the programmer specifies a finaliser that
 132 assumes it has the only reference to an object
 133 (for example, a file that it wishes to close), then the programmer
 134 must ensure that there is only one such finaliser.
 135 <item>
 136 The storage manager attempts to run the finaliser(s) for an
 137 object soon after the object dies, but promptness is not guaranteed.
 138 (What is guaranteed is that the finaliser will
 139 eventually run, exactly once.)
 140 <item>
 141 At the moment when a finaliser is run, a call to <tt/deRefWeak/
 142 will return <tt/Nothing/.
 143 <item>
 144 A finaliser may contain a pointer to the object, but that pointer
 145 will not keep the object alive.  For example:
 146 <tscreen><verb>
 147   f :: Show a => a -> IO a
 148   f x = addFinaliser x (print (show x))
 149 </verb></tscreen>
 150 Here the finaliser <tt/print (show x)/ contains a reference to <tt/x/
 151 itself, but that does not keep <tt/x/ alive.  When that is the only
 152 reference to <tt/x/, the finaliser is run; and the message appears
 153 on the screen.
 154 <item>
 155 A finaliser may even resurrect the object, by (say) storing it in
 156 some global data structure.
 157 </itemize>
 158
 159 <sect1>The general interface
 160 <p>
 161
 162 The <tt/Weak/ library offers a slight generalisation of
 163 the simple weak pointers described so far:
 164 <tscreen><verb>
 165         mkWeak :: k -> v -> IO () -> IO (Weak v)
 166 </verb></tscreen>
 167 <tt/mkWeak/ takes a key of any type <tt/k/ and a value of any type
 168 <tt/v/, as well as a finaliser, and returns a weak pointer of type
 169 <tt/Weak v/.
 170
 171 <tt/deRefWeak/ returns the <em/value/ only, not the key, as its
 172 type (given above) implies:
 173 <tscreen><verb>
 174         deRefWeak :: Weak a -> IO (Maybe a)
 175 </verb></tscreen>
 176 However, <tt/deRefWeak/ returns <tt/Nothing/ if the <em/key/, not the
 177 value, has died.  Furthermore, references from the value to the key
 178 do not keep the key alive, in the same way that the finaliser does
 179 not keep the key alive.
 180
 181 Simple weak pointers are readily defined in terms of these more general
 182 weak pointers:
 183 <tscreen><verb>
 184   mkWeakPtr :: a -> IO () -> IO (Weak a)
 185   mkWeakPtr v f = mkWeak v v f
 186 </verb></tscreen>
 187
 188 These more general weak pointers are enough to implement memo
 189 tables properly.
 190
 191 <sect1> A precise semantics
 192 <p>
 193 The above informal specification is fine for simple situations,
 194 but matters can get complicated.  In particular, it needs to
 195 be clear exactly when a key dies, so that any weak pointers
 196 that refer to it can be finalised.
 197 Suppose, for example, the value of one weak pointer refers
 198 to the key of another...does that keep the key alive?
 199
 200 The behaviour is simply this:
 201
 202 <itemize>
 203 <item> If a weak pointer (object) refers to an <em/unreachable/
 204 key, it may be finalised.
 205 <item> Finalisation means (a) arrange that subsequent calls
 206 to <tt/deRefWeak/ return <tt/Nothing/; and (b) run the finaliser.
 207 </itemize>
 208
 209 This behaviour depends on what it means for a key to be reachable.
 210 Informally,
 211 something is reachable if it can be reached by following ordinary
 212 pointers from the root set, but not following weak pointers.
 213 We define reachability more precisely as
 214 follows
 215 A heap object is reachable if:
 216
 217 <itemize>
 218 <item> It is directly pointed to by a reachable object, other than
 219 a weak pointer object.
 220 <item> It is a weak pointer object whose key is reachable.
 221 <item> It is the value or finaliser of an object whose key is
 222 reachable.
 223 </itemize>
 224
 225 Notice that a pointer to the key from its associated
 226 value or finaliser does not make the key reachable.
 227 However, if the key is reachable some other way, then the value
 228 and the finaliser are reachable, and so, therefore, are any other
 229 keys they refer to directly or indirectly.
 230
 231
 232 <sect1>Finalisation for foreign objects
 233 <p>
 234
 235 A foreign object is some data that lives outside the Haskell heap, for
 236 example some <tt/malloc/ed data in C land.  It's useful to be able to
 237 know when the Haskell program no longer needs the <tt/malloc/ed data,
 238 so it can be <tt/free/d.  We can use weak pointers and finalisers for
 239 this, but we have to be careful: the foreign data is usually
 240 referenced by an address, ie. an <tt/Addr/ (see Section <ref
 241 name="Addr" id="sec:Addr">), and we must retain the invariant that
 242 <em/if the Haskell program still needs the foreign object, then it
 243 retains the <tt/Addr/ object in the heap/.  This invariant isn't
 244 guaranteed to hold if we use <tt/Addr/, because an <tt/Addr/ consists
 245 of a box around a raw address <tt/Addr#/.  If the Haskell program can
 246 manipulate the <tt/Addr#/ object independently of the heap-resident
 247 <tt/Addr/, then the foreign object could be inadvertently finalised
 248 early, because a weak pointer to the <tt/Addr/ would find no more
 249 references to its key and trigger the finaliser despite the fact that
 250 the program still holds the <tt/Addr#/ and intends to use it.
 251
 252 To avoid this somewhat subtle race condition, we use another type of
 253 foreign address, called <tt/ForeignObj/.  Historical note:
 254 <tt/ForeignObj/ is identical to the old <tt/ForeignObj/ except that it
 255 no longer supports finalisation - that's provided by the weak
 256 pointer/finalisation mechanism above.
 257
 258 A <tt/ForeignObj/ is basically an address, but the <tt/ForeignObj/
 259 itself is a heap-resident object and can therefore be watched by weak
 260 pointers.  A <tt/ForeignObj/ can be passed to C functions (in which
 261 case the C function gets a straightforward pointer), but it cannot be
 262 decomposed into an <tt/Addr#/.  Operations on <tt/ForeignObj/ are
 263 provided by the <tt/Foreign/ module (see Section <ref name="Foreign"
 264 id="sec:Foreign">).