Unboxed types and primitive operationsGHC.Prim moduleThis chapter defines all the types which are primitive in
Glasgow Haskell, and the operations provided for them. You bring
them into scope by importing module GHC.Prim.Note: while you really can use this stuff to write fast code,
we generally find it a lot less painful, and more satisfying in the
long run, to use higher-level language features and libraries. With
any luck, the code you write will be optimised to the efficient
unboxed version in any case. And if it isn't, we'd like to know
about it.Unboxed types
Unboxed types (Glasgow extension)Most types in GHC are boxed, which means
that values of that type are represented by a pointer to a heap
object. The representation of a Haskell Int, for
example, is a two-word heap object. An unboxed
type, however, is represented by the value itself, no pointers or heap
allocation are involved.
Unboxed types correspond to the “raw machine” types you
would use in C: Int# (long int),
Double# (double), Addr#
(void *), etc. The primitive operations
(PrimOps) on these types are what you might expect; e.g.,
(+#) is addition on
Int#s, and is the machine-addition that we all
know and love—usually one instruction.
Primitive (unboxed) types cannot be defined in Haskell, and are
therefore built into the language and compiler. Primitive types are
always unlifted; that is, a value of a primitive type cannot be
bottom. We use the convention that primitive types, values, and
operations have a # suffix.
Primitive values are often represented by a simple bit-pattern, such
as Int#, Float#,
Double#. But this is not necessarily the case:
a primitive value might be represented by a pointer to a
heap-allocated object. Examples include
Array#, the type of primitive arrays. A
primitive array is heap-allocated because it is too big a value to fit
in a register, and would be too expensive to copy around; in a sense,
it is accidental that it is represented by a pointer. If a pointer
represents a primitive value, then it really does point to that value:
no unevaluated thunks, no indirections…nothing can be at the
other end of the pointer than the primitive value.
There are some restrictions on the use of primitive types, the main
one being that you can't pass a primitive value to a polymorphic
function or store one in a polymorphic data type. This rules out
things like [Int#] (i.e. lists of primitive
integers). The reason for this restriction is that polymorphic
arguments and constructor fields are assumed to be pointers: if an
unboxed integer is stored in one of these, the garbage collector would
attempt to follow it, leading to unpredictable space leaks. Or a
seq operation on the polymorphic component may
attempt to dereference the pointer, with disastrous results. Even
worse, the unboxed value might be larger than a pointer
(Double# for instance).
Nevertheless, A numerically-intensive program using unboxed types can
go a lot faster than its “standard”
counterpart—we saw a threefold speedup on one example.
Unboxed Tuples
Unboxed tuples aren't really exported by GHC.Prim,
they're available by default with . An
unboxed tuple looks like this:
(# e_1, ..., e_n #)
where e_1..e_n are expressions of any
type (primitive or non-primitive). The type of an unboxed tuple looks
the same.
Unboxed tuples are used for functions that need to return multiple
values, but they avoid the heap allocation normally associated with
using fully-fledged tuples. When an unboxed tuple is returned, the
components are put directly into registers or on the stack; the
unboxed tuple itself does not have a composite representation. Many
of the primitive operations listed in this section return unboxed
tuples.
There are some pretty stringent restrictions on the use of unboxed tuples:
Unboxed tuple types are subject to the same restrictions as
other unboxed types; i.e. they may not be stored in polymorphic data
structures or passed to polymorphic functions.
Unboxed tuples may only be constructed as the direct result of
a function, and may only be deconstructed with a case expression.
eg. the following are valid:
f x y = (# x+1, y-1 #)
g x = case f x x of { (# a, b #) -> a + b }
but the following are invalid:
f x y = g (# x, y #)
g (# x, y #) = x + y
No variable can have an unboxed tuple type. This is illegal:
f :: (# Int, Int #) -> (# Int, Int #)
f x = x
because x has an unboxed tuple type.
Note: we may relax some of these restrictions in the future.
The IO and ST monads use unboxed
tuples to avoid unnecessary allocation during sequences of operations.
Character and numeric typescharacter types, primitivenumeric types, primitiveinteger types, primitivefloating point types, primitive
There are the following obvious primitive types:
type Char#
type Int#
type Word#
type Addr#
type Float#
type Double#
type Int64#
type Word64#
Char#Int#Word#Addr#Float#Double#Int64#Word64#
If you really want to know their exact equivalents in C, see
ghc/includes/StgTypes.h in the GHC source tree.
Literals for these types may be written as follows:
1# an Int#
1.2# a Float#
1.34## a Double#
'a'# a Char#; for weird characters, use e.g. '\o<octal>'#
"a"# an Addr# (a `char *'); only characters '\0'..'\255' allowed
literals, primitiveconstants, primitivenumbers, primitiveComparison operationscomparisons, primitiveoperators, comparison
{>,>=,==,/=,<,<=}# :: Int# -> Int# -> Bool
{gt,ge,eq,ne,lt,le}Char# :: Char# -> Char# -> Bool
-- ditto for Word# and Addr#
>#>=#==#/=#<#<=#gt{Char,Word,Addr}#ge{Char,Word,Addr}#eq{Char,Word,Addr}#ne{Char,Word,Addr}#lt{Char,Word,Addr}#le{Char,Word,Addr}#Primitive-character operationscharacters, primitive operationsoperators, primitive character
ord# :: Char# -> Int#
chr# :: Int# -> Char#
ord#chr#Primitive-Int operationsintegers, primitive operationsoperators, primitive integer
{+,-,*,quotInt,remInt,gcdInt}# :: Int# -> Int# -> Int#
negateInt# :: Int# -> Int#
iShiftL#, iShiftRA#, iShiftRL# :: Int# -> Int# -> Int#
-- shift left, right arithmetic, right logical
addIntC#, subIntC#, mulIntC# :: Int# -> Int# -> (# Int#, Int# #)
-- add, subtract, multiply with carry
+#-#*#quotInt#remInt#gcdInt#iShiftL#iShiftRA#iShiftRL#addIntC#subIntC#mulIntC#shift operations, integerNote: No error/overflow checking!
Primitive-Double and Float operationsfloating point numbers, primitiveoperators, primitive floating point
{+,-,*,/}## :: Double# -> Double# -> Double#
{<,<=,==,/=,>=,>}## :: Double# -> Double# -> Bool
negateDouble# :: Double# -> Double#
double2Int# :: Double# -> Int#
int2Double# :: Int# -> Double#
{plus,minux,times,divide}Float# :: Float# -> Float# -> Float#
{gt,ge,eq,ne,lt,le}Float# :: Float# -> Float# -> Bool
negateFloat# :: Float# -> Float#
float2Int# :: Float# -> Int#
int2Float# :: Int# -> Float#
+##-##*##/##<##<=##==##=/##>=##>##negateDouble#double2Int#int2Double#plusFloat#minusFloat#timesFloat#divideFloat#gtFloat#geFloat#eqFloat#neFloat#ltFloat#leFloat#negateFloat#float2Int#int2Float#
And a full complement of trigonometric functions:
expDouble# :: Double# -> Double#
logDouble# :: Double# -> Double#
sqrtDouble# :: Double# -> Double#
sinDouble# :: Double# -> Double#
cosDouble# :: Double# -> Double#
tanDouble# :: Double# -> Double#
asinDouble# :: Double# -> Double#
acosDouble# :: Double# -> Double#
atanDouble# :: Double# -> Double#
sinhDouble# :: Double# -> Double#
coshDouble# :: Double# -> Double#
tanhDouble# :: Double# -> Double#
powerDouble# :: Double# -> Double# -> Double#
trigonometric functions, primitive
similarly for Float#.
There are two coercion functions for Float#/Double#:
float2Double# :: Float# -> Double#
double2Float# :: Double# -> Float#
float2Double#double2Float#
The primitive version of decodeDouble
(encodeDouble is implemented as an external C
function):
decodeDouble# :: Double# -> PrelNum.ReturnIntAndGMP
encodeDouble#decodeDouble#
(And the same for Float#s.)
Operations on/for Integers (interface to GMP)
arbitrary precision integersInteger, operations on
We implement Integers (arbitrary-precision
integers) using the GNU multiple-precision (GMP) package (version
2.0.2).
The data type for Integer is either a small
integer, represented by an Int, or a large integer
represented using the pieces required by GMP's
MP_INT in gmp.h (see
gmp.info in
ghc/includes/runtime/gmp). It comes out as:
data Integer = S# Int# -- small integers
| J# Int# ByteArray# -- large integers
Integer type The primitive
ops to support large Integers use the
“pieces” of the representation, and are as follows:
negateInteger# :: Int# -> ByteArray# -> Integer
{plus,minus,times}Integer#, gcdInteger#,
quotInteger#, remInteger#, divExactInteger#
:: Int# -> ByteArray#
-> Int# -> ByteArray#
-> (# Int#, ByteArray# #)
cmpInteger#
:: Int# -> ByteArray#
-> Int# -> ByteArray#
-> Int# -- -1 for <; 0 for ==; +1 for >
cmpIntegerInt#
:: Int# -> ByteArray#
-> Int#
-> Int# -- -1 for <; 0 for ==; +1 for >
gcdIntegerInt# ::
:: Int# -> ByteArray#
-> Int#
-> Int#
divModInteger#, quotRemInteger#
:: Int# -> ByteArray#
-> Int# -> ByteArray#
-> (# Int#, ByteArray#,
Int#, ByteArray# #)
integer2Int# :: Int# -> ByteArray# -> Int#
int2Integer# :: Int# -> Integer -- NB: no error-checking on these two!
word2Integer# :: Word# -> Integer
addr2Integer# :: Addr# -> Integer
-- the Addr# is taken to be a `char *' string
-- to be converted into an Integer.
negateInteger#plusInteger#minusInteger#timesInteger#quotInteger#remInteger#gcdInteger#gcdIntegerInt#divExactInteger#cmpInteger#divModInteger#quotRemInteger#integer2Int#int2Integer#word2Integer#addr2Integer#Words and addressesword, primitive typeaddress, primitive typeunsigned integer, primitive typepointer, primitive type
A Word# is used for bit-twiddling operations.
It is the same size as an Int#, but has no sign
nor any arithmetic operations.
type Word# -- Same size/etc as Int# but *unsigned*
type Addr# -- A pointer from outside the "Haskell world" (from C, probably);
-- described under "arrays"
Word#Addr#Word#s and Addr#s have
the usual comparison operations. Other
unboxed-Word ops (bit-twiddling and coercions):
{gt,ge,eq,ne,lt,le}Word# :: Word# -> Word# -> Bool
and#, or#, xor# :: Word# -> Word# -> Word#
-- standard bit ops.
quotWord#, remWord# :: Word# -> Word# -> Word#
-- word (i.e. unsigned) versions are different from int
-- versions, so we have to provide these explicitly.
not# :: Word# -> Word#
shiftL#, shiftRL# :: Word# -> Int# -> Word#
-- shift left, right logical
int2Word# :: Int# -> Word# -- just a cast, really
word2Int# :: Word# -> Int#
bit operations, Word and AddrgtWord#geWord#eqWord#neWord#ltWord#leWord#and#or#xor#not#quotWord#remWord#shiftL#shiftRA#shiftRL#int2Word#word2Int#
Unboxed-Addr ops (C casts, really):
{gt,ge,eq,ne,lt,le}Addr# :: Addr# -> Addr# -> Bool
int2Addr# :: Int# -> Addr#
addr2Int# :: Addr# -> Int#
addr2Integer# :: Addr# -> (# Int#, ByteArray# #)
gtAddr#geAddr#eqAddr#neAddr#ltAddr#leAddr#int2Addr#addr2Int#addr2Integer#
The casts between Int#,
Word# and Addr#
correspond to null operations at the machine level, but are required
to keep the Haskell type checker happy.
Operations for indexing off of C pointers
(Addr#s) to snatch values are listed under
“arrays”.
Arraysarrays, primitive
The type Array# elt is the type of primitive,
unpointed arrays of values of type elt.
type Array# elt
Array#Array# is more primitive than a Haskell
array—indeed, the Haskell Array interface is
implemented using Array#—in that an
Array# is indexed only by
Int#s, starting at zero. It is also more
primitive by virtue of being unboxed. That doesn't mean that it isn't
a heap-allocated object—of course, it is. Rather, being unboxed
means that it is represented by a pointer to the array itself, and not
to a thunk which will evaluate to the array (or to bottom). The
components of an Array# are themselves boxed.
The type ByteArray# is similar to
Array#, except that it contains just a string
of (non-pointer) bytes.
type ByteArray#
ByteArray#
Arrays of these types are useful when a Haskell program wishes to
construct a value to pass to a C procedure. It is also possible to use
them to build (say) arrays of unboxed characters for internal use in a
Haskell program. Given these uses, ByteArray#
is deliberately a bit vague about the type of its components.
Operations are provided to extract values of type
Char#, Int#,
Float#, Double#, and
Addr# from arbitrary offsets within a
ByteArray#. (For type
Foo#, the $i$th offset gets you the $i$th
Foo#, not the Foo# at
byte-position $i$. Mumble.) (If you want a
Word#, grab an Int#,
then coerce it.)
Lastly, we have static byte-arrays, of type
Addr# [mentioned previously]. (Remember
the duality between arrays and pointers in C.) Arrays of this types
are represented by a pointer to an array in the world outside Haskell,
so this pointer is not followed by the garbage collector. In other
respects they are just like ByteArray#. They
are only needed in order to pass values from C to Haskell.
Reading and writing
Primitive arrays are linear, and indexed starting at zero.
The size and indices of a ByteArray#, Addr#, and
MutableByteArray# are all in bytes. It's up to the program to
calculate the correct byte offset from the start of the array. This
allows a ByteArray# to contain a mixture of values of different
type, which is often needed when preparing data for and unpicking
results from C. (Umm…not true of indices…WDP 95/09)
Should we provide some sizeOfDouble# constants?
Out-of-range errors on indexing should be caught by the code which
uses the primitive operation; the primitive operations themselves do
not check for out-of-range indexes. The intention is that the
primitive ops compile to one machine instruction or thereabouts.
We use the terms “reading” and “writing” to refer to accessing
mutable arrays (see ), and
“indexing” to refer to reading a value from an immutable
array.
Immutable byte arrays are straightforward to index (all indices are in
units of the size of the object being read):
indexCharArray# :: ByteArray# -> Int# -> Char#
indexIntArray# :: ByteArray# -> Int# -> Int#
indexAddrArray# :: ByteArray# -> Int# -> Addr#
indexFloatArray# :: ByteArray# -> Int# -> Float#
indexDoubleArray# :: ByteArray# -> Int# -> Double#
indexCharOffAddr# :: Addr# -> Int# -> Char#
indexIntOffAddr# :: Addr# -> Int# -> Int#
indexFloatOffAddr# :: Addr# -> Int# -> Float#
indexDoubleOffAddr# :: Addr# -> Int# -> Double#
indexAddrOffAddr# :: Addr# -> Int# -> Addr#
-- Get an Addr# from an Addr# offset
indexCharArray#indexIntArray#indexAddrArray#indexFloatArray#indexDoubleArray#indexCharOffAddr#indexIntOffAddr#indexFloatOffAddr#indexDoubleOffAddr#indexAddrOffAddr#
The last of these, indexAddrOffAddr#, extracts an Addr# using an offset
from another Addr#, thereby providing the ability to follow a chain of
C pointers.
Something a bit more interesting goes on when indexing arrays of boxed
objects, because the result is simply the boxed object. So presumably
it should be entered—we never usually return an unevaluated
object! This is a pain: primitive ops aren't supposed to do
complicated things like enter objects. The current solution is to
return a single element unboxed tuple (see ).
indexArray# :: Array# elt -> Int# -> (# elt #)
indexArray#The state typestate, primitive typeState#
The primitive type State# represents the state of a state
transformer. It is parameterised on the desired type of state, which
serves to keep states from distinct threads distinct from one another.
But the only effect of this parameterisation is in the type
system: all values of type State# are represented in the same way.
Indeed, they are all represented by nothing at all! The code
generator “knows” to generate no code, and allocate no registers
etc, for primitive states.
type State# s
The type GHC.RealWorld is truly opaque: there are no values defined
of this type, and no operations over it. It is “primitive” in that
sense - but it is not unlifted! Its only role in life is to be
the type which distinguishes the IO state transformer.
data RealWorld
State of the world
A single, primitive, value of type State# RealWorld is provided.
realWorld# :: State# RealWorld
realWorld# state object
(Note: in the compiler, not a PrimOp; just a mucho magic
Id. Exported from GHC, though).
Mutable arraysmutable arraysarrays, mutable
Corresponding to Array# and ByteArray#, we have the types of
mutable versions of each. In each case, the representation is a
pointer to a suitable block of (mutable) heap-allocated storage.
type MutableArray# s elt
type MutableByteArray# s
MutableArray#MutableByteArray#Allocationmutable arrays, allocationarrays, allocationallocation, of mutable arrays
Mutable arrays can be allocated. Only pointer-arrays are initialised;
arrays of non-pointers are filled in by “user code” rather than by
the array-allocation primitive. Reason: only the pointer case has to
worry about GC striking with a partly-initialised array.
newArray# :: Int# -> elt -> State# s -> (# State# s, MutableArray# s elt #)
newCharArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s elt #)
newIntArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s elt #)
newAddrArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s elt #)
newFloatArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s elt #)
newDoubleArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s elt #)
newArray#newCharArray#newIntArray#newAddrArray#newFloatArray#newDoubleArray#
The size of a ByteArray# is given in bytes.
Reading and writingarrays, reading and writing
readArray# :: MutableArray# s elt -> Int# -> State# s -> (# State# s, elt #)
readCharArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Char# #)
readIntArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Int# #)
readAddrArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Addr# #)
readFloatArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Float# #)
readDoubleArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Double# #)
writeArray# :: MutableArray# s elt -> Int# -> elt -> State# s -> State# s
writeCharArray# :: MutableByteArray# s -> Int# -> Char# -> State# s -> State# s
writeIntArray# :: MutableByteArray# s -> Int# -> Int# -> State# s -> State# s
writeAddrArray# :: MutableByteArray# s -> Int# -> Addr# -> State# s -> State# s
writeFloatArray# :: MutableByteArray# s -> Int# -> Float# -> State# s -> State# s
writeDoubleArray# :: MutableByteArray# s -> Int# -> Double# -> State# s -> State# s
readArray#readCharArray#readIntArray#readAddrArray#readFloatArray#readDoubleArray#writeArray#writeCharArray#writeIntArray#writeAddrArray#writeFloatArray#writeDoubleArray#Equalityarrays, testing for equality
One can take “equality” of mutable arrays. What is compared is the
name or reference to the mutable array, not its contents.
sameMutableArray# :: MutableArray# s elt -> MutableArray# s elt -> Bool
sameMutableByteArray# :: MutableByteArray# s -> MutableByteArray# s -> Bool
sameMutableArray#sameMutableByteArray#Freezing mutable arraysarrays, freezing mutablefreezing mutable arraysmutable arrays, freezing
Only unsafe-freeze has a primitive. (Safe freeze is done directly in Haskell
by copying the array and then using unsafeFreeze.)
unsafeFreezeArray# :: MutableArray# s elt -> State# s -> (# State# s, Array# s elt #)
unsafeFreezeByteArray# :: MutableByteArray# s -> State# s -> (# State# s, ByteArray# #)
unsafeFreezeArray#unsafeFreezeByteArray#Synchronizing variables (M-vars)synchronising variables (M-vars)M-Vars
Synchronising variables are the primitive type used to implement
Concurrent Haskell's MVars (see the Concurrent Haskell paper for
the operational behaviour of these operations).
type MVar# s elt -- primitive
newMVar# :: State# s -> (# State# s, MVar# s elt #)
takeMVar# :: SynchVar# s elt -> State# s -> (# State# s, elt #)
putMVar# :: SynchVar# s elt -> State# s -> State# s
SynchVar#newSynchVar#takeMVarputMVarPrimitive arrays, mutable and otherwise
primitive arrays (Glasgow extension)arrays, primitive (Glasgow extension)
GHC knows about quite a few flavours of Large Swathes of Bytes.
First, GHC distinguishes between primitive arrays of (boxed) Haskell
objects (type Array# obj) and primitive arrays of bytes (type
ByteArray#).
Second, it distinguishes between…
Immutable:
Arrays that do not change (as with “standard” Haskell arrays); you
can only read from them. Obviously, they do not need the care and
attention of the state-transformer monad.
Mutable:
Arrays that may be changed or “mutated.” All the operations on them
live within the state-transformer monad and the updates happen
in-place.
“Static” (in C land):
A C routine may pass an Addr# pointer back into Haskell land. There
are then primitive operations with which you may merrily grab values
over in C land, by indexing off the “static” pointer.
“Stable” pointers:
If, for some reason, you wish to hand a Haskell pointer (i.e.,
not an unboxed value) to a C routine, you first make the
pointer “stable,” so that the garbage collector won't forget that it
exists. That is, GHC provides a safe way to pass Haskell pointers to
C.
Please see for more details.
“Foreign objects”:
A “foreign object” is a safe way to pass an external object (a
C-allocated pointer, say) to Haskell and have Haskell do the Right
Thing when it no longer references the object. So, for example, C
could pass a large bitmap over to Haskell and say “please free this
memory when you're done with it.”
Please see for more details.
The libraries documentatation gives more details on all these
“primitive array” types and the operations on them.