ghc/docs/state_interface/state-interface.verb

   1 \documentstyle[a4wide,grasp]{article}
   2 \renewcommand{\textfraction}{0.1}
   3 \renewcommand{\floatpagefraction}{0.9}
   4 \renewcommand{\dblfloatpagefraction}{0.9}
   5
   6 \sloppy
   7
   8
   9 \begin{document}
  10
  11 \title{GHC prelude: types and operations}
  12 \author{Simon L Peyton Jones \and John Launchbury \and Will Partain}
  13
  14 \maketitle
  15 \tableofcontents
  16
  17 This ``state interface document'' corresponds to Glasgow Haskell
  18 version~0.23.
  19
  20 \section{Really primitive stuff}
  21
  22 This section defines all the types which are primitive in Glasgow Haskell, and the
  23 operations provided for them.
  24
  25 A primitive type is one which cannot be defined in Haskell, and which is
  26 therefore built into the language and compiler.
  27 Primitive types are always unboxed; that is, a value of primitive type cannot be
  28 bottom.
  29
  30 Primitive values are often represented by a simple bit-pattern, such as @Int#@,
  31 @Float#@, @Double#@.  But this is not necessarily the case: a primitive value
  32 might be represented by a pointer to a heap-allocated object.  Examples include
  33 @Array#@, the type of primitive arrays.  You might think this odd: doesn't being
  34 heap-allocated mean that it has a box?  No, it does not.  A primitive array is
  35 heap-allocated because it is too big a value to fit in a register, and would be
  36 too expensive to copy around; in a sense, it is accidental that it is represented
  37 by a pointer.  If a pointer represents a primitive value, then it really does
  38 point to that value: no unevaluated thunks, no indirections...nothing can be at
  39 the other end of the pointer than the primitive value.
  40
  41 This section also describes a few non-primitive types, which are needed
  42 to express the result types of some primitive operations.
  43
  44 \subsection{Character and numeric types}
  45
  46 There are the following obvious primitive types:
  47 @
  48 type Char#
  49 type Int#       -- see also Word# and Addr#, later
  50 type Float#
  51 type Double#
  52 @
  53 If you want to know their exact equivalents in C, see
  54 @ghc/includes/StgTypes.lh@ in the GHC source.
  55
  56 Literals for these types may be written as follows:
  57 @
  58 1#              an Int#
  59 1.2#            a Float#
  60 1.34##          a Double#
  61 'a'#            a Char#; for weird characters, use '\o<octal>'#
  62 "a"#            an Addr# (a `char *')
  63 @
  64
  65 \subsubsection{Comparison operations}
  66 @
  67 {gt,ge,eq,ne,lt,le}Char# :: Char# -> Char# -> Bool
  68     -- ditto for Int#, Word#, Float#, Double#, and Addr#
  69 @
  70
  71 \subsubsection{Unboxed-character operations}
  72 @
  73 ord# :: Char# -> Int#
  74 chr# :: Int# -> Char#
  75 @
  76
  77 \subsubsection{Unboxed-@Int@ operations}
  78 @
  79 {plus,minus,times,quot,div,rem}Int# :: Int# -> Int# -> Int#
  80 negateInt# :: Int# -> Int#
  81 @
  82 NB: No error/overflow checking!
  83
  84 \subsubsection{Unboxed-@Float@ and @Double@ operations}
  85 @
  86 {plus,minus,times,divide}Float# :: Float# -> Float# -> Float#
  87 negateFloat# :: Float# -> Float#
  88
  89 float2Int#      :: Float# -> Int#   -- just a cast, no checking!
  90 int2Float#      :: Int# -> Float#
  91
  92 expFloat#       :: Float# -> Float#
  93 logFloat#       :: Float# -> Float#
  94 sqrtFloat#      :: Float# -> Float#
  95 sinFloat#       :: Float# -> Float#
  96 cosFloat#       :: Float# -> Float#
  97 tanFloat#       :: Float# -> Float#
  98 asinFloat#      :: Float# -> Float#
  99 acosFloat#      :: Float# -> Float#
 100 atanFloat#      :: Float# -> Float#
 101 sinhFloat#      :: Float# -> Float#
 102 coshFloat#      :: Float# -> Float#
 103 tanhFloat#      :: Float# -> Float#
 104 powerFloat#     :: Float# -> Float# -> Float#
 105 @
 106 There's an exactly-matching set of unboxed-@Double@ ops; replace
 107 @Float#@ with @Double#@ in the list above.  There are two
 108 coercion functions for @Float#@/@Double#@:
 109 @
 110 float2Double#   :: Float# -> Double#
 111 double2Float#   :: Double# -> Float#
 112 @
 113 The primitive versions of @encodeFloat@/@decodeFloat@:
 114 @
 115 encodeFloat#    :: Int# -> Int# -> ByteArray#   -- Integer mantissa
 116                 -> Int#                         -- Int exponent
 117                 -> Float#
 118
 119 decodeFloat#    :: Float#
 120                 -> _ReturnIntAndGMP
 121 @
 122 (And the same for @Double#@s.)
 123
 124 \subsection{Operations on/for @Integers@ (interface to GMP)}
 125 \label{sect:horrid-Integer-pairing-types}
 126
 127 We implement @Integers@ (arbitrary-precision integers) using the GNU
 128 multiple-precision (GMP) package.
 129
 130 The data type for @Integer@ must mirror that for @MP_INT@ in @gmp.h@
 131 (see @gmp.info@).  It comes out as:
 132 @
 133 data Integer = J# Int# Int# ByteArray#
 134 @
 135 So, @Integer@ is really just a ``pairing'' type for a particular
 136 collection of primitive types.
 137
 138 The operations in the GMP return other combinations of
 139 GMP-plus-something, so we need ``pairing'' types for those, too:
 140 @
 141 type _ReturnGMP       = Integer -- synonym
 142 data _Return2GMPs     = _Return2GMPs Int# Int# ByteArray#
 143                                      Int# Int# ByteArray#
 144 data _ReturnIntAndGMP = _ReturnIntAndGMP Int#
 145                                          Int# Int# ByteArray#
 146
 147 -- ????? something to return a string of bytes (in the heap?)
 148 @
 149 The primitive ops to support @Integers@ use the ``pieces'' of the
 150 representation, and are as follows:
 151 @
 152 negateInteger#  :: Int# -> Int# -> ByteArray# -> Integer
 153
 154 {plus,minus,times}Integer# :: Int# -> Int# -> ByteArray#
 155                            -> Int# -> Int# -> ByteArray#
 156                            -> Integer
 157
 158 cmpInteger# :: Int# -> Int# -> ByteArray#
 159             -> Int# -> Int# -> ByteArray#
 160             -> Int# -- -1 for <; 0 for ==; +1 for >
 161
 162 divModInteger#, quotRemInteger#
 163         :: Int# -> Int# -> ByteArray#
 164         -> Int# -> Int# -> ByteArray#
 165         -> _Return2GMPs
 166
 167 integer2Int# :: Int# -> Int# -> ByteArray#
 168              -> Int#
 169
 170 int2Integer#  :: Int#  -> Integer -- NB: no error-checking on these two!
 171 word2Integer# :: Word# -> Integer
 172
 173 addr2Integer# :: Addr# -> Integer
 174         -- the Addr# is taken to be a `char *' string
 175         -- to be converted into an Integer
 176 @
 177
 178
 179 \subsection{Words and addresses}
 180
 181 A @Word#@ is used for bit-twiddling operations.  It is the same size as
 182 an @Int#@, but has no sign nor any arithmetic operations.
 183 @
 184 type Word#      -- Same size/etc as Int# but *unsigned*
 185 type Addr#      -- A pointer from outside the "Haskell world" (from C, probably);
 186                 -- described under "arrays"
 187 @
 188 @Word#@s and @Addr#@s have the usual comparison operations.
 189 Other unboxed-@Word@ ops (bit-twiddling and coercions):
 190 @
 191 and#, or# :: Word# -> Word# -> Word#
 192
 193 not# :: Word# -> Word#
 194
 195 shiftL#, shiftRA#, shiftRL# :: Word# -> Int# -> Word#
 196         -- shift left, right arithmetic, right logical
 197
 198 iShiftL#, iShiftRA#, iShiftRL# :: Int# -> Int# -> Int#
 199         -- same shift ops, but on Int#s
 200
 201 int2Word#       :: Int#  -> Word# -- just a cast, really
 202 word2Int#       :: Word# -> Int#
 203 @
 204
 205 Unboxed-@Addr@ ops (C casts, really):
 206 @
 207 int2Addr#       :: Int#  -> Addr#
 208 addr2Int#       :: Addr# -> Int#
 209 @
 210 Operations for indexing off of C pointers (@Addr#@s) to snatch values
 211 are listed under ``arrays''.
 212
 213 \subsection{Arrays}
 214
 215 The type @Array# elt@ is the type of primitive,
 216 unboxed arrays of values of type @elt@.
 217 @
 218 type Array# elt
 219 @
 220 @Array#@ is more primitive than a Haskell
 221 array --- indeed, Haskell arrays are implemented using @Array#@ ---
 222 in that an @Array#@ is indexed only by @Int#@s, starting at zero.  It is also
 223 more primitive by virtue of being unboxed.  That doesn't mean that it isn't
 224 a heap-allocated object --- of course, it is.  Rather, being unboxed means
 225 that it is represented by a pointer to the array itself, and not to a thunk
 226 which will evaluate to the array (or to bottom).
 227 The components of an @Array#@ are themselves boxed.
 228
 229 The type @ByteArray#@ is similar to @Array#@, except that it contains
 230 just a string of (non-pointer) bytes.
 231 @
 232 type ByteArray#
 233 @
 234 Arrays of these types are useful when a Haskell program wishes to
 235 construct a value to pass to a C procedure.  It is also possible to
 236 use them to build (say) arrays of unboxed characters for internal use
 237 in a Haskell program.  Given these uses, @ByteArray#@ is deliberately
 238 a bit vague about the type of its components.  Operations are provided
 239 to extract values of type @Char#@, @Int#@, @Float#@, @Double#@, and
 240 @Addr#@ from arbitrary offsets within a @ByteArray#@.  (For type @Foo#@,
 241 the $i$th offset gets you the $i$th @Foo#@, not the @Foo#@ at byte-position $i$.  Mumble.)
 242 (If you want a @Word#@, grab an @Int#@, then coerce it.)
 243
 244 Lastly, we have static byte-arrays, of type @Addr#@ [mentioned
 245 previously].  (Remember the duality between arrays and pointers in C.)
 246 Arrays of this types are represented by a pointer to an array in the
 247 world outside Haskell, so this pointer is not followed by the garbage
 248 collector.  In other respects they are just like @ByteArray#@.  They
 249 are only needed in order to pass values from C to Haskell.
 250
 251 \subsubsection{Reading and writing.}
 252
 253 Primitive arrays are linear, and indexed starting at zero.
 254
 255 The size and indices of a @ByteArray#@, @Addr#@, and
 256 @MutableByteArray#@ are all in bytes.  It's up to the program to
 257 calculate the correct byte offset from the start of the array.  This
 258 allows a @ByteArray#@ to contain a mixture of values of different
 259 type, which is often needed when preparing data for and unpicking
 260 results from C.  (Umm... not true of indices... WDP 95/09)
 261
 262 {\em Should we provide some @sizeOfDouble#@ constants?}
 263
 264 Out-of-range errors on indexing should be caught by the code which
 265 uses the primitive operation; the primitive operations themselves do
 266 {\em not} check for out-of-range indexes. The intention is that the
 267 primitive ops compile to one machine instruction or thereabouts.
 268
 269 We use the terms ``reading'' and ``writing'' to refer to accessing {\em mutable}
 270 arrays (see Section~\ref{sect:mutable}), and ``indexing''
 271 to refer to reading a value from an {\em immutable}
 272 array.
 273
 274 If you want to read/write a @Word#@, read an @Int#@ and coerce.
 275
 276 Immutable byte arrays are straightforward to index (all indices in bytes):
 277 @
 278 indexCharArray#   :: ByteArray# -> Int# -> Char#
 279 indexIntArray#    :: ByteArray# -> Int# -> Int#
 280 indexAddrArray#   :: ByteArray# -> Int# -> Addr#
 281 indexFloatArray#  :: ByteArray# -> Int# -> Float#
 282 indexDoubleArray# :: ByteArray# -> Int# -> Double#
 283
 284 indexCharOffAddr#   :: Addr# -> Int# -> Char#
 285 indexIntOffAddr#    :: Addr# -> Int# -> Int#
 286 indexFloatOffAddr#  :: Addr# -> Int# -> Float#
 287 indexDoubleOffAddr# :: Addr# -> Int# -> Double#
 288 indexAddrOffAddr#   :: Addr# -> Int# -> Addr#   -- Get an Addr# from an Addr# offset
 289 @
 290 The last of these, @indexAddrOffAddr#@, extracts an @Addr#@ using an offset
 291 from another @Addr#@, thereby providing the ability to follow a chain of
 292 C pointers.
 293
 294 Something a bit more interesting goes on when indexing arrays of boxed
 295 objects, because the result is simply the boxed object. So presumably
 296 it should be entered --- we never usually return an unevaluated
 297 object!  This is a pain: primitive ops aren't supposed to do
 298 complicated things like enter objects.  The current solution is to
 299 return a lifted value, but I don't like it!
 300 @
 301 indexArray#       :: Array# elt -> Int# -> _Lift elt    -- Yuk!
 302 @
 303
 304 \subsubsection{The state type}
 305
 306 The primitive type @State#@ represents the state of a state transformer.
 307 It is parameterised on the desired type of state, which serves to keep
 308 states from distinct threads distinct from one another.  But the {\em only}
 309 effect of this parameterisation is in the type system: all values of type
 310 @State#@ are represented in the same way.  Indeed, they are all
 311 represented by nothing at all!  The code generator ``knows'' to generate no
 312 code, and allocate no registers etc, for primitive states.
 313 @
 314 type State# s
 315 @
 316
 317 The type @_RealWorld@ is truly opaque: there are no values defined
 318 of this type, and no operations over it.  It is ``primitive'' in that
 319 sense---but it is {\em not unboxed!} Its only role in life is to be the type
 320 which distinguishes the @PrimIO@ state transformer (see
 321 Section~\ref{sect:io-spec}).
 322 @
 323 data _RealWorld
 324 @
 325
 326 \subsubsection{States}
 327
 328 A single, primitive, value of type @State# _RealWorld@ is provided.
 329 @
 330 realWorld# :: State# _RealWorld
 331 @
 332 (Note: in the compiler, not a @PrimOp@; just a mucho magic @Id@.)
 333
 334 \subsection{State pairing types}
 335 \label{sect:horrid-pairing-types}
 336
 337 This subsection defines some types which, while they aren't quite primitive
 338 because we can define them in Haskell, are very nearly so.  They define
 339 constructors which pair a primitive state with a value of each primitive type.
 340 They are required to express the result type of the primitive operations in the
 341 state monad.
 342 @
 343 data StateAndPtr#    s elt = StateAndPtr#    (State# s) elt
 344
 345 data StateAndChar#   s     = StateAndChar#   (State# s) Char#
 346 data StateAndInt#    s     = StateAndInt#    (State# s) Int#
 347 data StateAndWord#   s     = StateAndWord#   (State# s) Word#
 348 data StateAndFloat#  s     = StateAndFloat#  (State# s) Float#
 349 data StateAndDouble# s     = StateAndDouble# (State# s) Double#
 350 data StateAndAddr#   s     = StateAndAddr#   (State# s) Addr#
 351
 352 data StateAndStablePtr# s a = StateAndStablePtr#  (State# s) (StablePtr# a)
 353 data StateAndForeignObj# s  = StateAndForeignObj# (State# s) ForeignObj#
 354 data StateAndSynchVar#  s a = StateAndSynchVar#  (State# s) (SynchVar# a)
 355
 356 data StateAndArray#            s elt = StateAndArray#        (State# s) (Array# elt)
 357 data StateAndMutableArray#     s elt = StateAndMutableArray# (State# s) (MutableArray# s elt)
 358 data StateAndByteArray#        s = StateAndByteArray#        (State# s) ByteArray#
 359 data StateAndMutableByteArray# s = StateAndMutableByteArray# (State# s) (MutableByteArray# s)
 360 @
 361
 362
 363 \subsection{Mutable arrays}
 364 \label{sect:mutable}
 365
 366 Corresponding to @Array#@ and @ByteArray#@,
 367 we have the types of mutable versions of each.
 368 In each case, the representation is a pointer
 369 to a suitable block of (mutable) heap-allocated storage.
 370 @
 371 type MutableArray# s elt
 372 type MutableByteArray# s
 373 @
 374 \subsubsection{Allocation.}
 375
 376 Mutable arrays can be allocated.
 377 Only pointer-arrays are initialised; arrays of non-pointers are filled
 378 in by ``user code'' rather than by the array-allocation primitive.
 379 Reason: only the pointer case has to worry about GC striking with a
 380 partly-initialised array.
 381 @
 382 newArray#       :: Int# -> elt -> State# s -> StateAndMutableArray# s elt
 383
 384 newCharArray#   :: Int# -> State# s -> StateAndMutableByteArray# s
 385 newIntArray#    :: Int# -> State# s -> StateAndMutableByteArray# s
 386 newAddrArray#   :: Int# -> State# s -> StateAndMutableByteArray# s
 387 newFloatArray#  :: Int# -> State# s -> StateAndMutableByteArray# s
 388 newDoubleArray# :: Int# -> State# s -> StateAndMutableByteArray# s
 389 @
 390 The size of a @ByteArray#@ is given in bytes.
 391
 392 \subsubsection{Reading and writing}
 393
 394 %OLD: Remember, offsets in a @MutableByteArray#@ are in bytes.
 395 @
 396 readArray#       :: MutableArray# s elt -> Int# -> State# s -> StateAndPtr#    s elt
 397 readCharArray#   :: MutableByteArray# s -> Int# -> State# s -> StateAndChar#   s
 398 readIntArray#    :: MutableByteArray# s -> Int# -> State# s -> StateAndInt#    s
 399 readAddrArray#   :: MutableByteArray# s -> Int# -> State# s -> StateAndAddr#   s
 400 readFloatArray#  :: MutableByteArray# s -> Int# -> State# s -> StateAndFloat#  s
 401 readDoubleArray# :: MutableByteArray# s -> Int# -> State# s -> StateAndDouble# s
 402
 403 writeArray#       :: MutableArray# s elt -> Int# -> elt     -> State# s -> State# s
 404 writeCharArray#   :: MutableByteArray# s -> Int# -> Char#   -> State# s -> State# s
 405 writeIntArray#    :: MutableByteArray# s -> Int# -> Int#    -> State# s -> State# s
 406 writeAddrArray#   :: MutableByteArray# s -> Int# -> Addr#   -> State# s -> State# s
 407 writeFloatArray#  :: MutableByteArray# s -> Int# -> Float#  -> State# s -> State# s
 408 writeDoubleArray# :: MutableByteArray# s -> Int# -> Double# -> State# s -> State# s
 409 @
 410
 411 \subsubsection{Equality.}
 412
 413 One can take ``equality'' of mutable arrays.  What is compared is the
 414 {\em name} or reference to the mutable array, not its contents.
 415 @
 416 sameMutableArray#     :: MutableArray# s elt -> MutableArray# s elt -> Bool
 417 sameMutableByteArray# :: MutableByteArray# s -> MutableByteArray# s -> Bool
 418 @
 419
 420 \subsubsection{Freezing mutable arrays}
 421
 422 Only unsafe-freeze has a primitive.  (Safe freeze is done directly in Haskell
 423 by copying the array and then using @unsafeFreeze@.)
 424 @
 425 unsafeFreezeArray#     :: MutableArray# s elt -> State# s -> StateAndArray#     s elt
 426 unsafeFreezeByteArray# :: MutableByteArray# s -> State# s -> StateAndByteArray# s
 427 @
 428
 429 \subsubsection{Stable pointers}
 430
 431 {\em Andy's comment.} {\bf Errors:} The following is not strictly true: the current
 432 implementation is not as polymorphic as claimed.  The reason for this
 433 is that the C programmer will have to use a different entry-routine
 434 for each type of stable pointer.  At present, we only supply a very
 435 limited number (1) of these routines.  It might be possible to
 436 increase the range of these routines by providing general purpose
 437 entry points to apply stable pointers to (stable pointers to)
 438 arguments and to enter (stable pointers to) boxed primitive values.
 439 {\em End of Andy's comment.}
 440
 441 A stable pointer is a name for a Haskell object which can be passed to the
 442 external world.  It is ``stable'' in the sense that the name does not change when
 443 the Haskell garbage collector runs --- in contrast to the address of the object
 444 which may well change.
 445
 446 The stable pointer type is parameterised by the type of the thing which is named.
 447 @
 448 type StablePtr# a
 449 @
 450 A stable pointer is represented by an index into the (static)
 451 @StablePointerTable@.  The Haskell garbage collector treats the
 452 @StablePointerTable@ as a source of roots for GC.
 453
 454 The @makeStablePointer@ function converts a value into a stable pointer.
 455 It is part of the @PrimIO@ monad, because we want to be sure we don't
 456 allocate one twice by accident, and then only free one of the copies.
 457 @
 458 makeStablePointer#  :: a -> State# _RealWorld -> StateAndStablePtr# _RealWorld a
 459 freeStablePointer#  :: StablePtr# a -> State# _RealWorld -> State# _RealWorld
 460 deRefStablePointer# :: StablePtr# a -> State# _RealWorld -> StateAndPtr _RealWorld a
 461 @
 462 There is also a C procedure @FreeStablePtr@ which frees a stable pointer.
 463
 464 %
 465 % Rewritten and updated for MallocPtr++ -- 4/96 SOF
 466 %
 467 \subsubsection{Foreign objects}
 468
 469 A \tr{ForeignObj} is a reference to an object outside the Haskell
 470 world (i.e., from the C world, or a reference to an object on another
 471 machine completely.), where the Haskell world has been told ``Let me
 472 know when you're finished with this ...''.
 473
 474 The \tr{ForeignObj} type is just a special @Addr#@ ({\em not} parameterised).
 475 @
 476 type ForeignObj#
 477 @
 478
 479 A typical use of \tr{ForeignObj} is in constructing Haskell bindings
 480 to external libraries. A good example is that of writing a binding to
 481 an image-processing library (which was actually the main motivation
 482 for implementing \tr{ForeignObj}'s precursor, \tr{MallocPtr}). The
 483 images manipulated are not stored in the Haskell heap, either because
 484 the library insist on allocating them internally or we (sensibly)
 485 decide to spare the GC from having to heave heavy images around.
 486
 487 @
 488 data Image = Image ForeignObj#
 489
 490 instance _CCallable Image
 491 @
 492
 493 The \tr{ForeignObj#} type is then used to refer to the externally
 494 allocated image, and to acheive some type safety, the Haskell binding
 495 defines the @Image@ data type. So, a value of type \tr{ForeignObj#} is
 496 used to ``box'' up an external reference into a Haskell heap object
 497 that we can then indirectly reference:
 498
 499 @
 500 createImage :: (Int,Int) -> PrimIO Image
 501 @
 502
 503 So far, this looks just like an @Addr#@ type, but \tr{ForeignObj#}
 504 offers a bit more, namely that we can specify a {\em finalisation
 505 routine} to invoke when the \tr{ForeignObj#} is discarded by the
 506 GC. The garbage collector invokes the finalisation routine associated
 507 with the \tr{ForeignObj#}, saying `` Thanks, I'm through with this
 508 now..'' For the image-processing library, the finalisation routine could for
 509 the images free up memory allocated for them. The finalisation routine has
 510 currently to be written in C (the finalisation routine can in turn call on
 511 @FreeStablePtr@ to deallocate a stable pointer.).
 512
 513 Associating a finalisation routine with an external object is done by
 514 \tr{makeForeignObj#}:
 515
 516 @
 517 makeForeignObj# :: Addr# -- foreign reference
 518                 -> Addr# -- pointer to finalisation routine
 519                 -> StateAndForeignObj# _RealWorld ForeignObj#
 520 @
 521
 522 (Implementation: a linked list of all @ForeignObj#@s is maintained to allow the
 523  garbage collector to detect when a @ForeignObj#@ becomes garbage.)
 524
 525 Like @Array@, @ForeignObj#@s are represented by heap objects.
 526
 527 {\bf ToDo:} Decide whether @FreeCHeapPointer@ is allowed to call on a
 528 stable pointer. (I sincerely hope not since we will still be in the
 529 GC at this point.)
 530
 531 \subsubsection{Synchronizing variables (I-vars, M-vars)}
 532
 533 ToDo ToDo ToDo
 534
 535 @
 536 type SynchVar# s elt    -- primitive
 537
 538 newSynchVar#:: State# s -> StateAndSynchVar# s elt
 539
 540 takeMVar#   :: SynchVar# s elt -> State# s -> StateAndPtr# s elt
 541 putMVar#    :: SynchVar# s elt -> State# s -> State# s
 542
 543 readIVar#   :: SynchVar# s elt -> State# s -> StateAndPtr# s elt
 544 writeIVar#  :: SynchVar# s elt -> State# s -> State# s
 545 @
 546
 547 \subsubsection{Controlling the garbage collector}
 548
 549 The C function {\tt PerformGC\/}, allows the C world to force Haskell
 550 to do a garbage collection.  It can only be called while Haskell
 551 is performing a C Call.
 552
 553 Note that this function can be used to define a Haskell IO operation
 554 with the same effect:
 555 @
 556 >       performGCIO :: PrimIO ()
 557 >       performGCIO = _ccall_gc_ PerformGC
 558 @
 559
 560 {\bf ToDo:} Is there any need for abnormal/normal termination to force
 561 a GC too?  Is there any need for a function that provides finer
 562 control over GC: argument = amount of space required; result = amount
 563 of space recovered.
 564
 565 \subsection{@spark#@ primitive operation (for parallel execution)}
 566
 567 {\em ToDo: say something}  It's used in the unfolding for @par@.
 568
 569 \subsection{The @errorIO#@ primitive operation}
 570
 571 The @errorIO#@ primitive takes an argument of type @PrimIO@.  It aborts execution of
 572 the current program, and continues instead by performing the given @PrimIO@ value
 573 on the current state of the world.
 574 @
 575 errorIO# :: PrimIO () -> a
 576 @
 577
 578 \subsection{C Calls}
 579
 580 {\bf ToDo:} current implementation has state variable as second
 581 argument not last argument.
 582
 583 The @ccall#@ primitive can't be given an ordinary type, because it has
 584 a variable number of arguments.  The nearest we can get is:
 585 @
 586 ccall# :: CRoutine -> a1# -> ... -> an# -> State# _RealWorld -> StateAndR# _RealWorld
 587 @
 588 where the type variables @a1#@\ldots@an#@ and @r#@ can be instantiated by any
 589 primitive type, and @StateAndR#@ is the appropriate pairing type from
 590 Section~\ref{sect:horrid-pairing-types}.  The @CRoutine@
 591 isn't a proper Haskell type at all; it just reminds us that @ccall#@ needs to
 592 know what C routine to call.
 593
 594 This notation is really short for a massive family of @ccall#@ primitives, one
 595 for each combination of types.  (Of course, the compiler simply remembers the
 596 types involved, and generates appropriate code when it finally spits out the C.)
 597
 598 Unlike all the other primitive operators, @ccall#@ is not bound to an in-scope
 599 identifier.  The only way it is possible to generate a @ccall#@ is via the
 600 @_ccall_@ construct.
 601
 602 All this applies equally to @casm#@:
 603 @
 604 casm#  :: CAsmString -> a1# -> ... -> an# -> State# _RealWorld -> StateAndR# _RealWorld
 605 @
 606
 607 %------------------------------------------------------------
 608 \section{Library stuff built with the Really Primitive Stuff}
 609
 610 \subsection{The state transformer monad}
 611
 612 \subsubsection{Types}
 613
 614 A state transformer is a function from a state to a pair of a result and a new
 615 state.
 616 @
 617 type _ST s a = _State s -> (a, _State s)
 618 @
 619 The @_ST@ type is {\em abstract}, so that the programmer cannot see its
 620 representation.  If he could, he could write bad things like:
 621 @
 622 bad :: _ST s a
 623 bad = \s -> ...(f s)...(g s)...
 624 @
 625 Here, @s@ is duplicated, which would be bad news.
 626
 627 A state is represented by a primitive state value, of type @State# s@,
 628 wrapped up in a @_State@ constructor.  The reason for boxing it in this
 629 way is so that we can be strict or lazy in the state.  (Remember, all
 630 primitive types are unboxed, and hence can't be bottom; but types built
 631 with @data@ are all boxed.)
 632 @
 633 data _State s = S# (State# s)
 634 @
 635
 636 \subsubsection{The state transformer combinators}
 637
 638 Now for the combinators, all of which live inside the @_ST@
 639 abstraction.  Notice that @returnST@ and @thenST@ are lazy in the
 640 state.
 641 @
 642 returnST :: a -> _ST s a
 643 returnST a s = (a, s)
 644
 645 thenST :: _ST s a -> (a -> _ST s b) -> _ST s b
 646 thenST m k s = let (r,new_s) = m s
 647                in
 648                k r new_s
 649
 650 fixST :: (a -> _ST s a) -> _ST s a
 651 fixST k s = let ans = k r s
 652                 (r,new_s) = ans
 653             in
 654             ans
 655 @
 656 The interesting one is, of course, @_runST@.  We can't infer its type!
 657 (It has a funny name because it must be wired into the compiler.)
 658 @
 659 -- _runST :: forall a. (forall s. _ST s a) -> a
 660 _runST m = case m (S# realWorld#) of
 661            (r,_) -> r
 662 @
 663
 664 \subsubsection{Other useful combinators}
 665
 666 There are various other standard combinators, all defined in terms the
 667 fundamental combinators above. The @seqST@ combinator is like
 668 @thenST@, except that it discards the result of the first state
 669 transformer:
 670 @
 671 seqST :: _ST s a -> _ST s b -> _ST s b
 672 seqST m1 m2 = m1 `thenST` (\_ -> m2)
 673 @
 674
 675 We also have {\em strict} (... in the state...) variants of the
 676 then/return combinators (same types as their pals):
 677 @
 678 returnStrictlyST a s@(S# _) = (a, s)
 679
 680 thenStrictlyST m k s@(S# _)
 681   = case (m s) of { (r, new_s@(S# _)) ->
 682     k r new_s }
 683
 684 seqStrictlyST m k = ... ditto, for seqST ...
 685 @
 686
 687 The combinator @listST@ takes a list of state transformers, and
 688 composes them in sequence, returning a list of their results:
 689 @
 690 listST :: [_ST s a] -> _ST s [a]
 691 listST []     = returnST []
 692 listST (m:ms) = m               `thenST` \ r ->
 693                 listST ms       `thenST` \ rs ->
 694                 returnST (r:rs)
 695 @
 696 The @mapST@ combinator ``lifts'' a function from a value to state
 697 transformers to one which works over a list of values:
 698 @
 699 mapST :: (a -> _ST s b) -> [a] -> _ST s [b]
 700 mapST f ms = listST (map f ms)
 701 @
 702 The @mapAndUnzipST@ combinator is similar to @mapST@, except that here the
 703 function returns a pair:
 704 @
 705 mapAndUnzipST :: (a -> _ST s (b,c)) -> [a] -> _ST s ([b],[c])
 706 mapAndUnzipST f (m:ms)
 707   = f m                 `thenST` \ ( r1,  r2) ->
 708     mapAndUnzipST f ms  `thenST` \ (rs1, rs2) ->
 709     returnST (r1:rs1, r2:rs2)
 710 @
 711
 712 \subsubsection{The @PrimIO@ monad}
 713 \label{sect:io-spec}
 714
 715 The @PrimIO@ type is defined in as a state transformer which manipulates the
 716 @_RealWorld@.
 717 @
 718 type PrimIO a = _ST _RealWorld a      -- Transparent
 719 @
 720 The @PrimIO@ type is an ordinary type synonym, transparent to the programmer.
 721
 722 The type @_RealWorld@ and value @realWorld#@ do not need to be hidden (although
 723 there is no particular point in exposing them).  Even having a value of type
 724 @realWorld#@ does not compromise safety, since the type @_ST@ is hidden.
 725
 726 It is type-correct to use @returnST@ in an I/O context, but it is a
 727 bit more efficient to use @returnPrimIO@.  The latter is strict in the
 728 state, which propagates backwards to all the earlier combinators
 729 (provided they are unfolded).  Why is it safe for @returnPrimIO@ to be
 730 strict in the state?  Because every context in which an I/O state
 731 transformer is used will certainly evaluate the resulting state; it is
 732 the state of the real world!
 733 @
 734 returnPrimIO :: a -> PrimIO a
 735 returnPrimIO a s@(S# _) -> (a, s)
 736 @
 737 We provide strict versions of the other combinators too.
 738 @
 739 thenPrimIO m k s = case m s of
 740                      (r,s) -> k r s
 741 @
 742 @fixPrimIO@ has to be lazy, though!
 743 @
 744 fixPrimIO  = fixST
 745 @
 746 The other combinators are just the same as before, but use the strict
 747 @thenPrimIO@ and @returnPrimIO@ for efficiency.
 748 @
 749 foldrPrimIO f z []     = z
 750 foldrPrimIO f z (m:ms) = foldrPrimIO f z ms `thenPrimIO` \ ms' ->
 751                          f m ms'
 752
 753 listPrimIO ms = foldrPrimIO (\ a xs -> a `thenPrimIO` \ x -> returnPrimIO (x : xs))
 754                 (returnPrimIO []) ms
 755
 756 mapPrimIO f ms = listPrimIO (map f ms)
 757
 758 mapAndUnzipPrimIO f (m:ms)
 759   = f m                     `thenPrimIO` \ ( r1,  r2) ->
 760     mapAndUnzipPrimIO f ms  `thenPrimIO` \ (rs1, rs2) ->
 761     returnPrimIO (r1:rs1, r2:rs2)
 762 @
 763
 764 \subsection{Arrays}
 765
 766 \subsubsection{Types}
 767
 768 @
 769 data Array      ix elt = _Array     (ix,ix) (Array# elt)
 770 data _ByteArray ix     = _ByteArray (ix,ix) ByteArray#
 771
 772 data _MutableArray     s ix elt = _MutableArray     (ix,ix) (MutableArray# s elt)
 773 data _MutableByteArray s ix     = _MutableByteArray (ix,ix) (MutableByteArray# s)
 774 @
 775
 776 \subsubsection{Operations on immutable arrays}
 777
 778 Ordinary array indexing is straightforward.
 779 @
 780 (!) :: Ix ix => Array ix elt -> ix -> elt
 781 @
 782 QUESTIONs: should @_ByteArray@s be indexed by Ints or ix?  With byte offsets
 783 or sized ones? (sized ones [WDP])
 784 @
 785 indexCharArray   :: Ix ix => _ByteArray ix -> ix -> Char
 786 indexIntArray    :: Ix ix => _ByteArray ix -> ix -> Int
 787 indexAddrArray   :: Ix ix => _ByteArray ix -> ix -> _Addr
 788 indexFloatArray  :: Ix ix => _ByteArray ix -> ix -> Float
 789 indexDoubleArray :: Ix ix => _ByteArray ix -> ix -> Double
 790 @
 791 @Addr@s are indexed straightforwardly by @Int@s.  Unlike the primitive
 792 operations, though, the offsets assume that the array consists entirely of the
 793 type of value being indexed, and so there's an implicit multiplication by
 794 the size of that value.  To access @Addr@s with mixed values requires
 795 you to do a DIY job using the primitives.
 796 @
 797 indexAddrChar :: Addr -> Int -> Char
 798 ...etc...
 799 indexStaticCharArray   :: Addr -> Int -> Char
 800 indexStaticIntArray    :: Addr -> Int -> Int
 801 indexStaticFloatArray  :: Addr -> Int -> Float
 802 indexStaticDoubleArray :: Addr -> Int -> Double
 803 indexStaticArray       :: Addr -> Int -> Addr
 804 @
 805
 806 \subsubsection{Operations on mutable arrays}
 807 @
 808 newArray     :: Ix ix => (ix,ix) -> elt -> _ST s (_MutableArray s ix elt)
 809 newCharArray :: Ix ix => (ix,ix) -> _ST s (_MutableByteArray s ix)
 810 ...
 811 @
 812
 813 @
 814 readArray   :: Ix ix => _MutableArray s ix elt -> ix -> _ST s elt
 815 readCharArray   :: Ix ix => _MutableByteArray s ix -> ix -> _ST s Char
 816 ...
 817 @
 818
 819 @
 820 writeArray  :: Ix ix => _MutableArray s ix elt -> ix -> elt -> _ST s ()
 821 writeCharArray  :: Ix ix => _MutableByteArray s ix -> ix -> Char -> _ST s ()
 822 ...
 823 @
 824
 825 @
 826 freezeArray :: Ix ix => _MutableArray s ix elt -> _ST s (Array ix elt)
 827 freezeCharArray :: Ix ix => _MutableByteArray s ix -> _ST s (_ByteArray ix)
 828 ...
 829 @
 830
 831 We have no need on one-function-per-type for unsafe freezing:
 832 @
 833 unsafeFreezeArray :: Ix ix => _MutableArray s ix elt -> _ST s (Array ix elt)
 834 unsafeFreezeByteArray :: Ix ix => _MutableByteArray s ix -> _ST s (_ByteArray ix)
 835 @
 836
 837 Sometimes we want to snaffle the bounds of one of these beasts:
 838 @
 839 boundsOfArray     :: Ix ix => _MutableArray s ix elt -> (ix, ix)
 840 boundsOfByteArray :: Ix ix => _MutableByteArray s ix -> (ix, ix)
 841 @
 842
 843 Lastly, ``equality'':
 844 @
 845 sameMutableArray     :: _MutableArray s ix elt -> _MutableArray s ix elt -> Bool
 846 sameMutableByteArray :: _MutableByteArray s ix -> _MutableByteArray s ix -> Bool
 847 @
 848
 849
 850 \subsection{Variables}
 851
 852 \subsubsection{Types}
 853
 854 Mutable variables are (for now anyway) implemented as arrays.  The @MutableVar@ type
 855 is opaque, so we can change the implementation later if we want.
 856 @
 857 type MutableVar s a = _MutableArray s Int a
 858 @
 859
 860 \subsubsection{Operations}
 861 @
 862 newVar   :: a -> _ST s (MutableVar s a)
 863 readVar  :: MutableVar s a -> _ST s a
 864 writeVar :: MutableVar s a -> a -> _ST s ()
 865 sameVar  :: MutableVar s a -> MutableVar s a -> Bool
 866 @
 867
 868 \subsection{Stable pointers}
 869
 870 Nothing exciting here, just simple boxing up.
 871 @
 872 data _StablePtr a = _StablePtr (StablePtr# a)
 873
 874 makeStablePointer :: a -> _StablePtr a
 875 freeStablePointer :: _StablePtr a -> PrimIO ()
 876 @
 877
 878 \subsection{Foreign objects}
 879
 880 Again, just boxing up.
 881 @
 882 data _ForeignObj = _ForeignObj ForeignObj#
 883
 884 makeForeignObj :: _Addr -> _Addr -> PrimIO _ForeignObj
 885 @
 886
 887 \subsection{C calls}
 888
 889 Everything in this section goes for @_casm_@ too.
 890
 891 {\em ToDo: mention @_ccall_gc_@ and @_casm_gc_@...}
 892
 893 The @_ccall_@ construct has the following form:
 894 $$@_ccall_@~croutine~a_1~\ldots~a_n$$
 895 This whole construct has type $@PrimIO@~res$.
 896 The rules are these:
 897 \begin{itemize}
 898 \item
 899 The first ``argument'', $croutine$, must be the literal name of a C procedure.
 900 It cannot be a Haskell expression which evaluates to a string, etc; it must be
 901 simply the name of the procedure.
 902 \item
 903 The arguments $a_1, \ldots,a_n$ must be of {\em C-callable} type.
 904 \item
 905 The whole construct has type $@PrimIO@~ty$, where $ty$ is a {\em C-returnable} type.
 906 \end{itemize}
 907 A {\em boxed-primitive} type is both C-callable and C-returnable.
 908 A boxed primitive type is anything declared by:
 909 @
 910 data T = C# t
 911 @
 912 where @t@ is a primitive type.  Note that
 913 programmer-defined boxed-primitive types are perfectly OK:
 914 @
 915 data Widget = W# Int#
 916 data Screen = S# CHeapPtr#
 917 @
 918
 919 There are other types that can be passed to C (C-callable).  This
 920 table summarises (including the standard boxed-primitive types):
 921 @
 922 Boxed               Type of transferd   Corresp.     Which is
 923 Type                Prim. component     C type       *probably*...
 924 ------              ---------------     ------       -------------
 925 Char                Char#               StgChar       unsigned char
 926 Int                 Int#                StgInt        long int
 927 _Word               Word#               StgWord       unsigned long int
 928 _Addr               Addr#               StgAddr       char *
 929 Float               Float#              StgFloat      float
 930 Double              Double#             StgDouble     double
 931
 932 Array               Array#              StgArray      StgPtr
 933 _ByteArray          ByteArray#          StgByteArray  StgPtr
 934 _MutableArray       MutableArray#       StgArray      StgPtr
 935 _MutableByteArray   MutableByteArray#   StgByteArray  StgPtr
 936
 937 _State              State#              nothing!
 938
 939 _StablePtr          StablePtr#          StgStablePtr  StgPtr
 940 _ForeignObj         ForeignObj#         StgForeignObj StgPtr
 941 @
 942
 943 All of the above are {\em C-returnable} except:
 944 @
 945         Array, _ByteArray, _MutableArray, _MutableByteArray
 946 @
 947
 948 {\bf ToDo:} I'm pretty wary of @Array@ and @_MutableArray@ being in
 949 this list, and not too happy about @_State@ [WDP].
 950
 951 {\bf ToDo:} Can code generator pass all the primitive types?  Should this be
 952 extended to include {\tt Bool\/} (or any enumeration type?)
 953
 954 The type checker must be able to figure out just which of the C-callable/returnable
 955 types is being used.  If it can't, you have to add type signatures. For example,
 956 @
 957 f x = _ccall_ foo x
 958 @
 959 is not good enough, because the compiler can't work out what type @x@ is, nor
 960 what type the @_ccall_@ returns.  You have to write, say:
 961 @
 962 f :: Int -> PrimIO Float
 963 f x = _ccall_ foo x
 964 @
 965
 966 \subsubsection{Implementation}
 967
 968 The desugarer unwraps the @_ccall_@ construct by inserting the necessary
 969 evaluations etc to unbox the arguments.  For example, the body of the definition
 970 of @f@ above would become:
 971 @
 972         (\ s -> case x of { I# x# ->
 973                 case s of { S# s# ->
 974                 case ccall# [Int#,Float#] x# s# of { StateAndFloat# f# new_s# ->
 975                 (F# f#, S# new_s#)
 976                 }}})
 977 @
 978 Notice that the state, too, is unboxed.
 979
 980 The code generator must deal specially with primitive objects which
 981 are stored on the heap.
 982
 983 ... details omitted ...
 984
 985 %
 986 %More importantly, it must construct a C Heap Pointer heap-object after
 987 %a @_ccall_@ which returns a @MallocPtr#@.
 988 %
 989
 990 %--------------------------------------------------------
 991 \section{Non-primitive stuff that must be wired into GHC}
 992
 993 @
 994 data Char    = C# Char#
 995 data Int     = I# Int#
 996 data _Word   = W# Word#
 997 data _Addr   = A# Addr#
 998
 999 data Float   = F# Float#
1000 data Double  = D# Double#
1001 data Integer = J# Int# Int# ByteArray#
1002
1003 -- and the other boxed-primitive types:
1004     Array, _ByteArray, _MutableArray, _MutableByteArray,
1005     _StablePtr, _ForeignObj
1006
1007 data Bool     = False | True
1008 data CMP_TAG# = LT# | EQ# | GT#  -- used in derived comparisons
1009
1010 data List a = [] | a : (List a)
1011 -- tuples...
1012
1013 data Ratio a  = a :% a
1014 type Rational = Ratio Integer
1015
1016 data {Request,Response,etc} -- so we can check the type of "main"
1017
1018 data _Lift a = _Lift a    -- used Yukkily as described elsewhere
1019
1020 type String  = [Char]    -- convenience, only
1021 @
1022
1023 %------------------------------------------------------------
1024 \section{Programmer interface(s)}
1025
1026 \subsection{The bog-standard interface}
1027
1028 If you rely on the implicit @import Prelude@ that GHC normally does
1029 for you, and if you don't use any weird flags (notably
1030 @-fglasgow-exts@), and if you don't import one of the fairly-magic
1031 @PreludeGla*@ interfaces, then GHC should work {\em exactly} as the
1032 Haskell report says, and the full user namespaces should be available
1033 to you.
1034
1035 Exception: until we burn in the new names @_scc_@ and @_ccall_@, the
1036 names @scc@ and @ccall@ are still available.
1037
1038 \subsection{If you mess about with @import Prelude@...}
1039
1040 Innocent renaming and hiding, e.g.,
1041 @
1042 import Prelude hiding ( fromIntegral ) renaming (map to mop)
1043 @
1044 should work just fine (even it {\em is} atrocious programming practice).
1045
1046 There are some things you can do that will make GHC crash, e.g.,
1047 hiding a standard class:
1048 @
1049 import Prelude hiding ( Eq(..) )
1050 @
1051 Don't do that.
1052
1053 \subsection{Turning on Glasgow extensions with @-fglasgow-exts@}
1054
1055 If you turn on @-fglasgow-exts@, then all the primitive types and
1056 operations described herein are available.
1057
1058 It is possible that some name conflicts between your code and the
1059 wired-in things might spring to life (though we doubt it...).
1060 Change your names :-)
1061
1062 \subsection{@import PreludeGlaST@}
1063
1064 @
1065 type ST s a = _ST s a   -- so you don't need -fglasgow-exts...
1066 @
1067
1068 \subsection{@import PreludeGlaMisc@}
1069
1070 \end{document}
1071