\documentstyle[a4wide,grasp]{article} \renewcommand{\textfraction}{0.1} \renewcommand{\floatpagefraction}{0.9} \renewcommand{\dblfloatpagefraction}{0.9} \sloppy \renewcommand{\today}{July 1996} \begin{document} \title{GHC prelude: types and operations} \author{Simon L Peyton Jones \and John Launchbury \and Will Partain} \maketitle \tableofcontents This ``state interface document'' corresponds to Glasgow Haskell version~2.01. \section{Really primitive stuff} This section defines all the types which are primitive in Glasgow Haskell, and the operations provided for them. A primitive type is one which cannot be defined in Haskell, and which is therefore built into the language and compiler. Primitive types are always unboxed; that is, a value of primitive type cannot be bottom. Primitive values are often represented by a simple bit-pattern, such as @Int#@, @Float#@, @Double#@. But this is not necessarily the case: a primitive value might be represented by a pointer to a heap-allocated object. Examples include @Array#@, the type of primitive arrays. You might think this odd: doesn't being heap-allocated mean that it has a box? No, it does not. A primitive array is heap-allocated because it is too big a value to fit in a register, and would be too expensive to copy around; in a sense, it is accidental that it is represented by a pointer. If a pointer represents a primitive value, then it really does point to that value: no unevaluated thunks, no indirections...nothing can be at the other end of the pointer than the primitive value. This section also describes a few non-primitive types, which are needed to express the result types of some primitive operations. \subsection{Character and numeric types} There are the following obvious primitive types: @ type Char# type Int# -- see also Word# and Addr#, later type Float# type Double# @ If you want to know their exact equivalents in C, see @ghc/includes/StgTypes.lh@ in the GHC source. Literals for these types may be written as follows: @ 1# an Int# 1.2# a Float# 1.34## a Double# 'a'# a Char#; for weird characters, use '\o'# "a"# an Addr# (a `char *') @ \subsubsection{Comparison operations} @ {gt,ge,eq,ne,lt,le}Char# :: Char# -> Char# -> Bool -- ditto for Int#, Word#, Float#, Double#, and Addr# @ \subsubsection{Unboxed-character operations} @ ord# :: Char# -> Int# chr# :: Int# -> Char# @ \subsubsection{Unboxed-@Int@ operations} @ {plus,minus,times,quot,div,rem}Int# :: Int# -> Int# -> Int# negateInt# :: Int# -> Int# @ NB: No error/overflow checking! \subsubsection{Unboxed-@Double@ and @Float@ operations} @ {plus,minus,times,divide}Double# :: Double# -> Double# -> Double# negateDouble# :: Double# -> Double# float2Int# :: Double# -> Int# -- just a cast, no checking! int2Double# :: Int# -> Double# expDouble# :: Double# -> Double# logDouble# :: Double# -> Double# sqrtDouble# :: Double# -> Double# sinDouble# :: Double# -> Double# cosDouble# :: Double# -> Double# tanDouble# :: Double# -> Double# asinDouble# :: Double# -> Double# acosDouble# :: Double# -> Double# atanDouble# :: Double# -> Double# sinhDouble# :: Double# -> Double# coshDouble# :: Double# -> Double# tanhDouble# :: Double# -> Double# powerDouble# :: Double# -> Double# -> Double# @ There's an exactly-matching set of unboxed-@Float@ ops; replace @Double#@ with @Float#@ in the list above. There are two coercion functions for @Float#@/@Double#@: @ float2Double# :: Float# -> Double# double2Float# :: Double# -> Float# @ The primitive versions of @encodeDouble@/@decodeDouble@: @ encodeDouble# :: Int# -> Int# -> ByteArray# -- Integer mantissa -> Int# -- Int exponent -> Double# decodeDouble# :: Double# -> GHCbase.ReturnIntAndGMP @ (And the same for @Float#@s.) \subsection{Operations on/for @Integers@ (interface to GMP)} \label{sect:horrid-Integer-pairing-types} We implement @Integers@ (arbitrary-precision integers) using the GNU multiple-precision (GMP) package. NB: some of this might change if we upgrade to using GMP~2.x. The data type for @Integer@ must mirror that for @MP_INT@ in @gmp.h@ (see @gmp.info@). It comes out as: @ data Integer = J# Int# Int# ByteArray# @ So, @Integer@ is really just a ``pairing'' type for a particular collection of primitive types. The operations in the GMP return other combinations of GMP-plus-something, so we need ``pairing'' types for those, too: @ data Return2GMPs = Return2GMPs Int# Int# ByteArray# Int# Int# ByteArray# data ReturnIntAndGMP = ReturnIntAndGMP Int# Int# Int# ByteArray# -- ????? something to return a string of bytes (in the heap?) @ The primitive ops to support @Integers@ use the ``pieces'' of the representation, and are as follows: @ negateInteger# :: Int# -> Int# -> ByteArray# -> Integer {plus,minus,times}Integer# :: Int# -> Int# -> ByteArray# -> Int# -> Int# -> ByteArray# -> Integer cmpInteger# :: Int# -> Int# -> ByteArray# -> Int# -> Int# -> ByteArray# -> Int# -- -1 for <; 0 for ==; +1 for > divModInteger#, quotRemInteger# :: Int# -> Int# -> ByteArray# -> Int# -> Int# -> ByteArray# -> GHCbase.Return2GMPs integer2Int# :: Int# -> Int# -> ByteArray# -> Int# int2Integer# :: Int# -> Integer -- NB: no error-checking on these two! word2Integer# :: Word# -> Integer addr2Integer# :: Addr# -> Integer -- the Addr# is taken to be a `char *' string -- to be converted into an Integer @ \subsection{Words and addresses} A @Word#@ is used for bit-twiddling operations. It is the same size as an @Int#@, but has no sign nor any arithmetic operations. @ type Word# -- Same size/etc as Int# but *unsigned* type Addr# -- A pointer from outside the "Haskell world" (from C, probably); -- described under "arrays" @ @Word#@s and @Addr#@s have the usual comparison operations. Other unboxed-@Word@ ops (bit-twiddling and coercions): @ and#, or# :: Word# -> Word# -> Word# not# :: Word# -> Word# shiftL#, shiftRA#, shiftRL# :: Word# -> Int# -> Word# -- shift left, right arithmetic, right logical iShiftL#, iShiftRA#, iShiftRL# :: Int# -> Int# -> Int# -- same shift ops, but on Int#s int2Word# :: Int# -> Word# -- just a cast, really word2Int# :: Word# -> Int# @ Unboxed-@Addr@ ops (C casts, really): @ int2Addr# :: Int# -> Addr# addr2Int# :: Addr# -> Int# @ Operations for indexing off of C pointers (@Addr#@s) to snatch values are listed under ``arrays''. \subsection{Arrays} The type @Array# elt@ is the type of primitive, unboxed arrays of values of type @elt@. @ type Array# elt @ @Array#@ is more primitive than a Haskell array --- indeed, Haskell arrays are implemented using @Array#@ --- in that an @Array#@ is indexed only by @Int#@s, starting at zero. It is also more primitive by virtue of being unboxed. That doesn't mean that it isn't a heap-allocated object --- of course, it is. Rather, being unboxed means that it is represented by a pointer to the array itself, and not to a thunk which will evaluate to the array (or to bottom). The components of an @Array#@ are themselves boxed. The type @ByteArray#@ is similar to @Array#@, except that it contains just a string of (non-pointer) bytes. @ type ByteArray# @ Arrays of these types are useful when a Haskell program wishes to construct a value to pass to a C procedure. It is also possible to use them to build (say) arrays of unboxed characters for internal use in a Haskell program. Given these uses, @ByteArray#@ is deliberately a bit vague about the type of its components. Operations are provided to extract values of type @Char#@, @Int#@, @Float#@, @Double#@, and @Addr#@ from arbitrary offsets within a @ByteArray#@. (For type @Foo#@, the $i$th offset gets you the $i$th @Foo#@, not the @Foo#@ at byte-position $i$. Mumble.) (If you want a @Word#@, grab an @Int#@, then coerce it.) Lastly, we have static byte-arrays, of type @Addr#@ [mentioned previously]. (Remember the duality between arrays and pointers in C.) Arrays of this types are represented by a pointer to an array in the world outside Haskell, so this pointer is not followed by the garbage collector. In other respects they are just like @ByteArray#@. They are only needed in order to pass values from C to Haskell. \subsubsection{Reading and writing.} Primitive arrays are linear, and indexed starting at zero. The size and indices of a @ByteArray#@, @Addr#@, and @MutableByteArray#@ are all in bytes. It's up to the program to calculate the correct byte offset from the start of the array. This allows a @ByteArray#@ to contain a mixture of values of different type, which is often needed when preparing data for and unpicking results from C. (Umm... not true of indices... WDP 95/09) {\em Should we provide some @sizeOfDouble#@ constants?} Out-of-range errors on indexing should be caught by the code which uses the primitive operation; the primitive operations themselves do {\em not} check for out-of-range indexes. The intention is that the primitive ops compile to one machine instruction or thereabouts. We use the terms ``reading'' and ``writing'' to refer to accessing {\em mutable} arrays (see Section~\ref{sect:mutable}), and ``indexing'' to refer to reading a value from an {\em immutable} array. If you want to read/write a @Word#@, read an @Int#@ and coerce. Immutable byte arrays are straightforward to index (all indices in bytes): @ indexCharArray# :: ByteArray# -> Int# -> Char# indexIntArray# :: ByteArray# -> Int# -> Int# indexAddrArray# :: ByteArray# -> Int# -> Addr# indexFloatArray# :: ByteArray# -> Int# -> Float# indexDoubleArray# :: ByteArray# -> Int# -> Double# indexCharOffAddr# :: Addr# -> Int# -> Char# indexIntOffAddr# :: Addr# -> Int# -> Int# indexFloatOffAddr# :: Addr# -> Int# -> Float# indexDoubleOffAddr# :: Addr# -> Int# -> Double# indexAddrOffAddr# :: Addr# -> Int# -> Addr# -- Get an Addr# from an Addr# offset @ The last of these, @indexAddrOffAddr#@, extracts an @Addr#@ using an offset from another @Addr#@, thereby providing the ability to follow a chain of C pointers. Something a bit more interesting goes on when indexing arrays of boxed objects, because the result is simply the boxed object. So presumably it should be entered --- we never usually return an unevaluated object! This is a pain: primitive ops aren't supposed to do complicated things like enter objects. The current solution is to return a lifted value, but I don't like it! @ indexArray# :: Array# elt -> Int# -> GHCbase.Lift elt -- Yuk! @ \subsubsection{The state type} The primitive type @State#@ represents the state of a state transformer. It is parameterised on the desired type of state, which serves to keep states from distinct threads distinct from one another. But the {\em only} effect of this parameterisation is in the type system: all values of type @State#@ are represented in the same way. Indeed, they are all represented by nothing at all! The code generator ``knows'' to generate no code, and allocate no registers etc, for primitive states. @ type State# s @ The type @GHCbuiltins.RealWorld@ is truly opaque: there are no values defined of this type, and no operations over it. It is ``primitive'' in that sense---but it is {\em not unboxed!} Its only role in life is to be the type which distinguishes the @PrimIO@ state transformer (see Section~\ref{sect:io-spec}). @ data RealWorld @ \subsubsection{States} A single, primitive, value of type @State# RealWorld@ is provided. @ realWorld# :: State# GHCbuiltins.RealWorld @ (Note: in the compiler, not a @PrimOp@; just a mucho magic @Id@.) \subsection{State pairing types} \label{sect:horrid-pairing-types} This subsection defines some types which, while they aren't quite primitive because we can define them in Haskell, are very nearly so. They define constructors which pair a primitive state with a value of each primitive type. They are required to express the result type of the primitive operations in the state monad. @ data StateAndPtr# s elt = StateAndPtr# (State# s) elt data StateAndChar# s = StateAndChar# (State# s) Char# data StateAndInt# s = StateAndInt# (State# s) Int# data StateAndWord# s = StateAndWord# (State# s) Word# data StateAndFloat# s = StateAndFloat# (State# s) Float# data StateAndDouble# s = StateAndDouble# (State# s) Double# data StateAndAddr# s = StateAndAddr# (State# s) Addr# data StateAndStablePtr# s a = StateAndStablePtr# (State# s) (StablePtr# a) data StateAndForeignObj# s = StateAndForeignObj# (State# s) ForeignObj# data StateAndSynchVar# s a = StateAndSynchVar# (State# s) (SynchVar# a) data StateAndArray# s elt = StateAndArray# (State# s) (Array# elt) data StateAndMutableArray# s elt = StateAndMutableArray# (State# s) (MutableArray# s elt) data StateAndByteArray# s = StateAndByteArray# (State# s) ByteArray# data StateAndMutableByteArray# s = StateAndMutableByteArray# (State# s) (MutableByteArray# s) @ \subsection{Mutable arrays} \label{sect:mutable} Corresponding to @Array#@ and @ByteArray#@, we have the types of mutable versions of each. In each case, the representation is a pointer to a suitable block of (mutable) heap-allocated storage. @ type MutableArray# s elt type MutableByteArray# s @ \subsubsection{Allocation.} Mutable arrays can be allocated. Only pointer-arrays are initialised; arrays of non-pointers are filled in by ``user code'' rather than by the array-allocation primitive. Reason: only the pointer case has to worry about GC striking with a partly-initialised array. @ newArray# :: Int# -> elt -> State# s -> StateAndMutableArray# s elt newCharArray# :: Int# -> State# s -> StateAndMutableByteArray# s newIntArray# :: Int# -> State# s -> StateAndMutableByteArray# s newAddrArray# :: Int# -> State# s -> StateAndMutableByteArray# s newFloatArray# :: Int# -> State# s -> StateAndMutableByteArray# s newDoubleArray# :: Int# -> State# s -> StateAndMutableByteArray# s @ The size of a @ByteArray#@ is given in bytes. \subsubsection{Reading and writing} %OLD: Remember, offsets in a @MutableByteArray#@ are in bytes. @ readArray# :: MutableArray# s elt -> Int# -> State# s -> StateAndPtr# s elt readCharArray# :: MutableByteArray# s -> Int# -> State# s -> StateAndChar# s readIntArray# :: MutableByteArray# s -> Int# -> State# s -> StateAndInt# s readAddrArray# :: MutableByteArray# s -> Int# -> State# s -> StateAndAddr# s readFloatArray# :: MutableByteArray# s -> Int# -> State# s -> StateAndFloat# s readDoubleArray# :: MutableByteArray# s -> Int# -> State# s -> StateAndDouble# s writeArray# :: MutableArray# s elt -> Int# -> elt -> State# s -> State# s writeCharArray# :: MutableByteArray# s -> Int# -> Char# -> State# s -> State# s writeIntArray# :: MutableByteArray# s -> Int# -> Int# -> State# s -> State# s writeAddrArray# :: MutableByteArray# s -> Int# -> Addr# -> State# s -> State# s writeFloatArray# :: MutableByteArray# s -> Int# -> Float# -> State# s -> State# s writeDoubleArray# :: MutableByteArray# s -> Int# -> Double# -> State# s -> State# s @ \subsubsection{Equality.} One can take ``equality'' of mutable arrays. What is compared is the {\em name} or reference to the mutable array, not its contents. @ sameMutableArray# :: MutableArray# s elt -> MutableArray# s elt -> Bool sameMutableByteArray# :: MutableByteArray# s -> MutableByteArray# s -> Bool @ \subsubsection{Freezing mutable arrays} Only unsafe-freeze has a primitive. (Safe freeze is done directly in Haskell by copying the array and then using @unsafeFreeze@.) @ unsafeFreezeArray# :: MutableArray# s elt -> State# s -> StateAndArray# s elt unsafeFreezeByteArray# :: MutableByteArray# s -> State# s -> StateAndByteArray# s @ \subsubsection{Stable pointers} {\em Andy's comment.} {\bf Errors:} The following is not strictly true: the current implementation is not as polymorphic as claimed. The reason for this is that the C programmer will have to use a different entry-routine for each type of stable pointer. At present, we only supply a very limited number (1) of these routines. It might be possible to increase the range of these routines by providing general purpose entry points to apply stable pointers to (stable pointers to) arguments and to enter (stable pointers to) boxed primitive values. {\em End of Andy's comment.} A stable pointer is a name for a Haskell object which can be passed to the external world. It is ``stable'' in the sense that the name does not change when the Haskell garbage collector runs --- in contrast to the address of the object which may well change. The stable pointer type is parameterised by the type of the thing which is named. @ type StablePtr# a @ A stable pointer is represented by an index into the (static) @StablePointerTable@. The Haskell garbage collector treats the @StablePointerTable@ as a source of roots for GC. The @makeStablePointer@ function converts a value into a stable pointer. It is part of the @PrimIO@ monad, because we want to be sure we don't allocate one twice by accident, and then only free one of the copies. @ makeStablePointer# :: a -> State# RealWorld -> StateAndStablePtr# RealWorld a freeStablePointer# :: StablePtr# a -> State# RealWorld -> State# RealWorld deRefStablePointer# :: StablePtr# a -> State# RealWorld -> StateAndPtr RealWorld a @ There is also a C procedure @FreeStablePtr@ which frees a stable pointer. % % Rewritten and updated for MallocPtr++ -- 4/96 SOF % \subsubsection{Foreign objects} A @ForeignObj@ is a reference to an object outside the Haskell world (i.e., from the C world, or a reference to an object on another machine completely.), where the Haskell world has been told ``Let me know when you're finished with this ...''. The @ForeignObj@ type is just a special @Addr#@ ({\em not} parameterised). @ type ForeignObj# @ A typical use of @ForeignObj@ is in constructing Haskell bindings to external libraries. A good example is that of writing a binding to an image-processing library (which was actually the main motivation for implementing @ForeignObj@'s precursor, @MallocPtr@). The images manipulated are not stored in the Haskell heap, either because the library insist on allocating them internally or we (sensibly) decide to spare the GC from having to heave heavy images around. @ data Image = Image ForeignObj# instance CCallable Image @ The @ForeignObj#@ type is then used to refer to the externally allocated image, and to acheive some type safety, the Haskell binding defines the @Image@ data type. So, a value of type @ForeignObj#@ is used to ``box'' up an external reference into a Haskell heap object that we can then indirectly reference: @ createImage :: (Int,Int) -> PrimIO Image @ So far, this looks just like an @Addr#@ type, but @ForeignObj#@ offers a bit more, namely that we can specify a {\em finalisation routine} to invoke when the @ForeignObj#@ is discarded by the GC. The garbage collector invokes the finalisation routine associated with the @ForeignObj#@, saying `` Thanks, I'm through with this now..'' For the image-processing library, the finalisation routine could for the images free up memory allocated for them. The finalisation routine has currently to be written in C (the finalisation routine can in turn call on @FreeStablePtr@ to deallocate a stable pointer.). Associating a finalisation routine with an external object is done by @makeForeignObj#@: @ makeForeignObj# :: Addr# -- foreign reference -> Addr# -- pointer to finalisation routine -> StateAndForeignObj# RealWorld ForeignObj# @ (Implementation: a linked list of all @ForeignObj#@s is maintained to allow the garbage collector to detect when a @ForeignObj#@ becomes garbage.) Like @Array@, @ForeignObj#@s are represented by heap objects. {\bf ToDo:} Decide whether @FreeCHeapPointer@ is allowed to call on a stable pointer. (I sincerely hope not since we will still be in the GC at this point.) \subsubsection{Synchronizing variables (I-vars, M-vars)} ToDo ToDo ToDo @ type SynchVar# s elt -- primitive newSynchVar#:: State# s -> StateAndSynchVar# s elt takeMVar# :: SynchVar# s elt -> State# s -> StateAndPtr# s elt putMVar# :: SynchVar# s elt -> State# s -> State# s readIVar# :: SynchVar# s elt -> State# s -> StateAndPtr# s elt writeIVar# :: SynchVar# s elt -> State# s -> State# s @ \subsubsection{Controlling the garbage collector} The C function {\tt PerformGC\/}, allows the C world to force Haskell to do a garbage collection. It can only be called while Haskell is performing a C Call. Note that this function can be used to define a Haskell IO operation with the same effect: @ > performGCIO :: PrimIO () > performGCIO = _ccall_gc_ PerformGC @ {\bf ToDo:} Is there any need for abnormal/normal termination to force a GC too? Is there any need for a function that provides finer control over GC: argument = amount of space required; result = amount of space recovered. \subsection{@spark#@ primitive operation (for parallel execution)} {\em ToDo: say something} It's used in the unfolding for @par@. \subsection{The @errorIO#@ primitive operation} The @errorIO#@ primitive takes an argument much like @PrimIO@. It aborts execution of the current program, and continues instead by performing the given @PrimIO@-like value on the current state of the world. @ errorIO# :: (State RealWorld -> ((), State RealWorld)) -> a @ \subsection{C Calls} {\bf ToDo:} current implementation has state variable as second argument not last argument. The @ccall#@ primitive can't be given an ordinary type, because it has a variable number of arguments. The nearest we can get is: @ ccall# :: CRoutine -> a1# -> ... -> an# -> State# RealWorld -> StateAndR# RealWorld @ where the type variables @a1#@\ldots@an#@ and @r#@ can be instantiated by any primitive type, and @StateAndR#@ is the appropriate pairing type from Section~\ref{sect:horrid-pairing-types}. The @CRoutine@ isn't a proper Haskell type at all; it just reminds us that @ccall#@ needs to know what C routine to call. This notation is really short for a massive family of @ccall#@ primitives, one for each combination of types. (Of course, the compiler simply remembers the types involved, and generates appropriate code when it finally spits out the C.) Unlike all the other primitive operators, @ccall#@ is not bound to an in-scope identifier. The only way it is possible to generate a @ccall#@ is via the @_ccall_@ construct. All this applies equally to @casm#@: @ casm# :: CAsmString -> a1# -> ... -> an# -> State# RealWorld -> StateAndR# RealWorld @ %------------------------------------------------------------ \section{Library stuff built with the Really Primitive Stuff} \subsection{The state transformer monad} \subsubsection{Types} A state transformer is a function from a state to a pair of a result and a new state. @ newtype ST s a = ST (State s -> (a, State s)) @ The @ST@ type is {\em abstract}, so that the programmer cannot see its representation. If he could, he could write bad things like: @ bad :: ST s a bad = ST $ \ s -> ...(f s)...(g s)... @ Here, @s@ is duplicated, which would be bad news. A state is represented by a primitive state value, of type @State# s@, wrapped up in a @State@ constructor. The reason for boxing it in this way is so that we can be strict or lazy in the state. (Remember, all primitive types are unboxed, and hence can't be bottom; but types built with @data@ are all boxed.) @ data State s = S# (State# s) @ \subsubsection{The state transformer combinators} Now for the combinators, all of which live inside the @ST@ abstraction. Notice that @returnST@ and @thenST@ are lazy in the state. @ returnST :: a -> ST s a returnST a s = (a, s) thenST :: ST s a -> (a -> ST s b) -> ST s b thenST m k s = let (r,new_s) = m s in k r new_s fixST :: (a -> ST s a) -> ST s a fixST k s = let ans = k r s (r,new_s) = ans in ans @ The interesting one is, of course, @runST@. We can't infer its type! (It has a funny name because it must be wired into the compiler.) @ -- runST :: forall a. (forall s. ST s a) -> a runST m = case m (S# realWorld#) of (r,_) -> r @ \subsubsection{Other useful combinators} There are various other standard combinators, all defined in terms the fundamental combinators above. The @seqST@ combinator is like @thenST@, except that it discards the result of the first state transformer: @ seqST :: ST s a -> ST s b -> ST s b seqST m1 m2 = m1 `thenST` (\_ -> m2) @ We also have {\em strict} (... in the state...) variants of the then/return combinators (same types as their pals): @ returnStrictlyST a s@(S# _) = (a, s) thenStrictlyST m k s@(S# _) = case (m s) of { (r, new_s@(S# _)) -> k r new_s } seqStrictlyST m k = ... ditto, for seqST ... @ The combinator @listST@ takes a list of state transformers, and composes them in sequence, returning a list of their results: @ listST :: [ST s a] -> ST s [a] listST [] = returnST [] listST (m:ms) = m `thenST` \ r -> listST ms `thenST` \ rs -> returnST (r:rs) @ The @mapST@ combinator ``lifts'' a function from a value to state transformers to one which works over a list of values: @ mapST :: (a -> ST s b) -> [a] -> ST s [b] mapST f ms = listST (map f ms) @ The @mapAndUnzipST@ combinator is similar to @mapST@, except that here the function returns a pair: @ mapAndUnzipST :: (a -> ST s (b,c)) -> [a] -> ST s ([b],[c]) mapAndUnzipST f (m:ms) = f m `thenST` \ ( r1, r2) -> mapAndUnzipST f ms `thenST` \ (rs1, rs2) -> returnST (r1:rs1, r2:rs2) @ \subsubsection{The @PrimIO@ monad} \label{sect:io-spec} The @PrimIO@ type is defined in as a state transformer which manipulates the @RealWorld@. @ type PrimIO a = ST RealWorld a -- Transparent @ The @PrimIO@ type is an ordinary type synonym, transparent to the programmer. The type @RealWorld@ and value @realWorld#@ do not need to be hidden (although there is no particular point in exposing them). Even having a value of type @realWorld#@ does not compromise safety, since the type @ST@ is hidden. It is type-correct to use @returnST@ in an I/O context, but it is a bit more efficient to use @returnPrimIO@. The latter is strict in the state, which propagates backwards to all the earlier combinators (provided they are unfolded). Why is it safe for @returnPrimIO@ to be strict in the state? Because every context in which an I/O state transformer is used will certainly evaluate the resulting state; it is the state of the real world! @ returnPrimIO :: a -> PrimIO a returnPrimIO a s@(S# _) -> (a, s) @ We provide strict versions of the other combinators too. @ thenPrimIO m k s = case m s of (r,s) -> k r s @ @fixPrimIO@ has to be lazy, though! @ fixPrimIO = fixST @ The other combinators are just the same as before, but use the strict @thenPrimIO@ and @returnPrimIO@ for efficiency. @ foldrPrimIO f z [] = z foldrPrimIO f z (m:ms) = foldrPrimIO f z ms `thenPrimIO` \ ms' -> f m ms' listPrimIO ms = foldrPrimIO (\ a xs -> a `thenPrimIO` \ x -> returnPrimIO (x : xs)) (returnPrimIO []) ms mapPrimIO f ms = listPrimIO (map f ms) mapAndUnzipPrimIO f (m:ms) = f m `thenPrimIO` \ ( r1, r2) -> mapAndUnzipPrimIO f ms `thenPrimIO` \ (rs1, rs2) -> returnPrimIO (r1:rs1, r2:rs2) @ \subsection{Arrays} \subsubsection{Types} @ data Array ix elt = Array (ix,ix) (Array# elt) data ByteArray ix = ByteArray (ix,ix) ByteArray# data MutableArray s ix elt = MutableArray (ix,ix) (MutableArray# s elt) data MutableByteArray s ix = MutableByteArray (ix,ix) (MutableByteArray# s) @ \subsubsection{Operations on immutable arrays} Ordinary array indexing is straightforward. @ (!) :: Ix ix => Array ix elt -> ix -> elt @ QUESTIONs: should @ByteArray@s be indexed by Ints or ix? With byte offsets or sized ones? (sized ones [WDP]) @ indexCharArray :: Ix ix => ByteArray ix -> ix -> Char indexIntArray :: Ix ix => ByteArray ix -> ix -> Int indexAddrArray :: Ix ix => ByteArray ix -> ix -> Addr indexFloatArray :: Ix ix => ByteArray ix -> ix -> Float indexDoubleArray :: Ix ix => ByteArray ix -> ix -> Double @ @Addr@s are indexed straightforwardly by @Int@s. Unlike the primitive operations, though, the offsets assume that the array consists entirely of the type of value being indexed, and so there's an implicit multiplication by the size of that value. To access @Addr@s with mixed values requires you to do a DIY job using the primitives. @ indexAddrChar :: Addr -> Int -> Char ...etc... indexStaticCharArray :: Addr -> Int -> Char indexStaticIntArray :: Addr -> Int -> Int indexStaticFloatArray :: Addr -> Int -> Float indexStaticDoubleArray :: Addr -> Int -> Double indexStaticArray :: Addr -> Int -> Addr @ \subsubsection{Operations on mutable arrays} @ newArray :: Ix ix => (ix,ix) -> elt -> ST s (MutableArray s ix elt) newCharArray :: Ix ix => (ix,ix) -> ST s (MutableByteArray s ix) ... @ @ readArray :: Ix ix => MutableArray s ix elt -> ix -> ST s elt readCharArray :: Ix ix => MutableByteArray s ix -> ix -> ST s Char ... @ @ writeArray :: Ix ix => MutableArray s ix elt -> ix -> elt -> ST s () writeCharArray :: Ix ix => MutableByteArray s ix -> ix -> Char -> ST s () ... @ @ freezeArray :: Ix ix => MutableArray s ix elt -> ST s (Array ix elt) freezeCharArray :: Ix ix => MutableByteArray s ix -> ST s (ByteArray ix) ... @ We have no need on one-function-per-type for unsafe freezing: @ unsafeFreezeArray :: Ix ix => MutableArray s ix elt -> ST s (Array ix elt) unsafeFreezeByteArray :: Ix ix => MutableByteArray s ix -> ST s (ByteArray ix) @ Sometimes we want to snaffle the bounds of one of these beasts: @ boundsOfArray :: Ix ix => MutableArray s ix elt -> (ix, ix) boundsOfByteArray :: Ix ix => MutableByteArray s ix -> (ix, ix) @ Lastly, ``equality'': @ sameMutableArray :: MutableArray s ix elt -> MutableArray s ix elt -> Bool sameMutableByteArray :: MutableByteArray s ix -> MutableByteArray s ix -> Bool @ \subsection{Variables} \subsubsection{Types} Mutable variables are (for now anyway) implemented as arrays. The @MutableVar@ type is opaque, so we can change the implementation later if we want. @ type MutableVar s a = MutableArray s Int a @ \subsubsection{Operations} @ newVar :: a -> ST s (MutableVar s a) readVar :: MutableVar s a -> ST s a writeVar :: MutableVar s a -> a -> ST s () sameVar :: MutableVar s a -> MutableVar s a -> Bool @ \subsection{Stable pointers} Nothing exciting here, just simple boxing up. @ data StablePtr a = StablePtr (StablePtr# a) makeStablePointer :: a -> StablePtr a freeStablePointer :: StablePtr a -> PrimIO () @ \subsection{Foreign objects} Again, just boxing up. @ data ForeignObj = ForeignObj ForeignObj# makeForeignObj :: Addr -> Addr -> PrimIO ForeignObj @ \subsection{C calls} Everything in this section goes for @_casm_@ too. {\em ToDo: mention @_ccall_gc_@ and @_casm_gc_@...} The @_ccall_@ construct has the following form: $$@_ccall_@~croutine~a_1~\ldots~a_n$$ This whole construct has type $@PrimIO@~res$. The rules are these: \begin{itemize} \item The first ``argument'', $croutine$, must be the literal name of a C procedure. It cannot be a Haskell expression which evaluates to a string, etc; it must be simply the name of the procedure. \item The arguments $a_1, \ldots,a_n$ must be of {\em C-callable} type. \item The whole construct has type $@PrimIO@~ty$, where $ty$ is a {\em C-returnable} type. \end{itemize} A {\em boxed-primitive} type is both C-callable and C-returnable. A boxed primitive type is anything declared by: @ data T = C# t @ where @t@ is a primitive type. Note that programmer-defined boxed-primitive types are perfectly OK: @ data Widget = W# Int# data Screen = S# CHeapPtr# @ There are other types that can be passed to C (C-callable). This table summarises (including the standard boxed-primitive types): @ Boxed Type of transferd Corresp. Which is Type Prim. component C type *probably*... ------ --------------- ------ ------------- Char Char# StgChar unsigned char Int Int# StgInt long int Word Word# StgWord unsigned long int Addr Addr# StgAddr char * Float Float# StgFloat float Double Double# StgDouble double Array Array# StgArray StgPtr ByteArray ByteArray# StgByteArray StgPtr MutableArray MutableArray# StgArray StgPtr MutableByteArray MutableByteArray# StgByteArray StgPtr State State# nothing! StablePtr StablePtr# StgStablePtr StgPtr ForeignObj ForeignObj# StgForeignObj StgPtr @ All of the above are {\em C-returnable} except: @ Array, ByteArray, MutableArray, MutableByteArray @ {\bf ToDo:} I'm pretty wary of @Array@ and @MutableArray@ being in this list, and not too happy about @State@ [WDP]. {\bf ToDo:} Can code generator pass all the primitive types? Should this be extended to include {\tt Bool\/} (or any enumeration type?) The type checker must be able to figure out just which of the C-callable/returnable types is being used. If it can't, you have to add type signatures. For example, @ f x = _ccall_ foo x @ is not good enough, because the compiler can't work out what type @x@ is, nor what type the @_ccall_@ returns. You have to write, say: @ f :: Int -> PrimIO Float f x = _ccall_ foo x @ \subsubsection{Implementation} The desugarer unwraps the @_ccall_@ construct by inserting the necessary evaluations etc to unbox the arguments. For example, the body of the definition of @f@ above would become: @ (\ s -> case x of { I# x# -> case s of { S# s# -> case ccall# [Int#,Float#] x# s# of { StateAndFloat# f# new_s# -> (F# f#, S# new_s#) }}}) @ Notice that the state, too, is unboxed. The code generator must deal specially with primitive objects which are stored on the heap. ... details omitted ... % %More importantly, it must construct a C Heap Pointer heap-object after %a @_ccall_@ which returns a @MallocPtr#@. % %-------------------------------------------------------- \section{Non-primitive stuff that must be wired into GHC} @ data Char = C# Char# data Int = I# Int# data Word = W# Word# data Addr = A# Addr# data Float = F# Float# data Double = D# Double# data Integer = J# Int# Int# ByteArray# -- and the other boxed-primitive types: Array, ByteArray, MutableArray, MutableByteArray, StablePtr, ForeignObj data Bool = False | True data Ordering = LT | EQ | GT -- used in derived comparisons data List a = [] | a : (List a) -- tuples... data Lift a = Lift a -- used Yukkily as described elsewhere type String = [Char] -- convenience, only @ %------------------------------------------------------------ \section{Programmer interface(s)} \subsection{The bog-standard interface} If you rely on the implicit @import Prelude@ that GHC normally does for you, and if you don't use any weird flags (notably @-fglasgow-exts@), and if you don't import one of the fairly-magic @PreludeGla*@ interfaces, then GHC should work {\em exactly} as the Haskell report says, and the full user namespaces should be available to you. \subsection{If you mess about with @import Prelude@...} Innocent hiding, e.g., @ import Prelude hiding ( fromIntegral ) @ should work just fine. There are some things you can do that will make GHC crash, e.g., hiding a standard class: @ import Prelude hiding ( Eq(..) ) @ Don't do that. \subsection{Turning on Glasgow extensions with @-fglasgow-exts@} If you turn on @-fglasgow-exts@, then all the primitive types and operations described herein are available. It is possible that some name conflicts between your code and the wired-in things might spring to life (though we doubt it...). Change your names :-) \end{document}