ghc/docs/users_guide/glasgow_exts.sgml

   1 <Para>
   2 <IndexTerm><Primary>language, GHC</Primary></IndexTerm>
   3 <IndexTerm><Primary>extensions, GHC</Primary></IndexTerm>
   4 As with all known Haskell systems, GHC implements some extensions to
   5 the language.  To use them, you'll need to give a <Option>-fglasgow-exts</Option>
   6 <IndexTerm><Primary>-fglasgow-exts option</Primary></IndexTerm> option.
   7 </Para>
   8
   9 <Para>
  10 Virtually all of the Glasgow extensions serve to give you access to
  11 the underlying facilities with which we implement Haskell.  Thus, you
  12 can get at the Raw Iron, if you are willing to write some non-standard
  13 code at a more primitive level.  You need not be &ldquo;stuck&rdquo; on
  14 performance because of the implementation costs of Haskell's
  15 &ldquo;high-level&rdquo; features&mdash;you can always code &ldquo;under&rdquo; them.  In an extreme case, you can write all your time-critical code in C, and then just glue it together with Haskell!
  16 </Para>
  17
  18 <Para>
  19 Executive summary of our extensions:
  20 </Para>
  21
  22 <Para>
  23 <VariableList>
  24
  25 <VarListEntry>
  26 <Term>Unboxed types and primitive operations:</Term>
  27 <ListItem>
  28 <Para>
  29 You can get right down to the raw machine types and operations;
  30 included in this are &ldquo;primitive arrays&rdquo; (direct access to Big Wads
  31 of Bytes).  Please see <XRef LinkEnd="glasgow-unboxed"> and following.
  32 </Para>
  33 </ListItem>
  34 </VarListEntry>
  35
  36 <VarListEntry>
  37 <Term>Multi-parameter type classes:</Term>
  38 <ListItem>
  39 <Para>
  40 GHC's type system supports extended type classes with multiple
  41 parameters.  Please see <XRef LinkEnd="multi-param-type-classes">.
  42 </Para>
  43 </ListItem>
  44 </VarListEntry>
  45
  46 <VarListEntry>
  47 <Term>Local universal quantification:</Term>
  48 <ListItem>
  49 <Para>
  50 GHC's type system supports explicit universal quantification in
  51 constructor fields and function arguments.  This is useful for things
  52 like defining <Literal>runST</Literal> from the state-thread world.  See <XRef LinkEnd="universal-quantification">.
  53 </Para>
  54 </ListItem>
  55 </VarListEntry>
  56
  57 <VarListEntry>
  58 <Term>Existential quantification in data types:</Term>
  59 <ListItem>
  60 <Para>
  61 Some or all of the type variables in a datatype declaration may be
  62 <Emphasis>existentially quantified</Emphasis>.  More details in <XRef LinkEnd="existential-quantification">.
  63 </Para>
  64 </ListItem>
  65 </VarListEntry>
  66
  67 <VarListEntry>
  68 <Term>Scoped type variables:</Term>
  69 <ListItem>
  70 <Para>
  71 Scoped type variables enable the programmer to supply type signatures
  72 for some nested declarations, where this would not be legal in Haskell
  73 98.  Details in <XRef LinkEnd="scoped-type-variables">.
  74 </Para>
  75 </ListItem>
  76 </VarListEntry>
  77
  78 <VarListEntry>
  79 <Term>Pattern guards</Term>
  80 <ListItem>
  81 <Para>
  82 Instead of being a boolean expression, a guard is a list of qualifiers, exactly as in a list comprehension. See <XRef LinkEnd="pattern-guards">.
  83 </Para>
  84 </ListItem>
  85 </VarListEntry>
  86
  87 <VarListEntry>
  88 <Term>Foreign calling:</Term>
  89 <ListItem>
  90 <Para>
  91 Just what it sounds like.  We provide <Emphasis>lots</Emphasis> of rope that you
  92 can dangle around your neck.  Please see <XRef LinkEnd="ffi">.
  93 </Para>
  94 </ListItem>
  95 </VarListEntry>
  96
  97 <VarListEntry>
  98 <Term>Pragmas</Term>
  99 <ListItem>
 100 <Para>
 101 Pragmas are special instructions to the compiler placed in the source
 102 file.  The pragmas GHC supports are described in <XRef LinkEnd="pragmas">.
 103 </Para>
 104 </ListItem>
 105 </VarListEntry>
 106
 107 <VarListEntry>
 108 <Term>Rewrite rules:</Term>
 109 <ListItem>
 110 <Para>
 111 The programmer can specify rewrite rules as part of the source program
 112 (in a pragma).  GHC applies these rewrite rules wherever it can.
 113 Details in <XRef LinkEnd="rewrite-rules">.
 114 </Para>
 115 </ListItem>
 116 </VarListEntry>
 117 </VariableList>
 118 </Para>
 119
 120 <Para>
 121 Before you get too carried away working at the lowest level (e.g.,
 122 sloshing <Literal>MutableByteArray&num;</Literal>s around your
 123 program), you may wish to check if there are libraries that provide a
 124 &ldquo;Haskellised veneer&rdquo; over the features you want.  See
 125 <xref linkend="book-hslibs">.
 126 </Para>
 127
 128 <Sect1 id="primitives">
 129 <Title>Unboxed types and primitive operations
 130 </Title>
 131 <IndexTerm><Primary>PrelGHC module</Primary></IndexTerm>
 132
 133 <Para>
 134 The <Literal>PrelGHC</Literal> module defines all the types which are primitive in Glasgow
 135 Haskell, and the operations provided for them.
 136 </Para>
 137
 138 <Sect2 id="glasgow-unboxed">
 139 <Title>Unboxed types
 140 </Title>
 141
 142 <Para>
 143 <IndexTerm><Primary>Unboxed types (Glasgow extension)</Primary></IndexTerm>
 144 </Para>
 145
 146 <para>Most types in GHC are <firstterm>boxed</firstterm>, which means
 147 that values of that type are represented by a pointer to a heap
 148 object.  The representation of a Haskell <literal>Int</literal>, for
 149 example, is a two-word heap object.  An <firstterm>unboxed</firstterm>
 150 type, however, is represented by the value itself; no pointers or heap
 151 allocation are involved.
 152 </para>
 153
 154 <Para>
 155 Unboxed types correspond to the &ldquo;raw machine&rdquo; types you
 156 would use in C: <Literal>Int&num;</Literal> (long int),
 157 <Literal>Double&num;</Literal> (double), <Literal>Addr&num;</Literal>
 158 (void *), etc.  The <Emphasis>primitive operations</Emphasis>
 159 (PrimOps) on these types are what you might expect; e.g.,
 160 <Literal>(+&num;)</Literal> is addition on
 161 <Literal>Int&num;</Literal>s, and is the machine-addition that we all
 162 know and love&mdash;usually one instruction.
 163 </Para>
 164
 165 <Para>
 166 Primitive (unboxed) types cannot be defined in Haskell, and are
 167 therefore built into the language and compiler.  Primitive types are
 168 always unlifted; that is, a value of a primitive type cannot be
 169 bottom.  We use the convention that primitive types, values, and
 170 operations have a <Literal>&num;</Literal> suffix.
 171 </Para>
 172
 173 <Para>
 174 Primitive values are often represented by a simple bit-pattern, such
 175 as <Literal>Int&num;</Literal>, <Literal>Float&num;</Literal>,
 176 <Literal>Double&num;</Literal>.  But this is not necessarily the case:
 177 a primitive value might be represented by a pointer to a
 178 heap-allocated object.  Examples include
 179 <Literal>Array&num;</Literal>, the type of primitive arrays.  A
 180 primitive array is heap-allocated because it is too big a value to fit
 181 in a register, and would be too expensive to copy around; in a sense,
 182 it is accidental that it is represented by a pointer.  If a pointer
 183 represents a primitive value, then it really does point to that value:
 184 no unevaluated thunks, no indirections&hellip;nothing other than the
 185 primitive value can be at the other end of the pointer.
 186 </Para>
 187
 188 <Para>
 189 There are some restrictions on the use of primitive types, the main
 190 one being that you can't pass a primitive value to a polymorphic
 191 function or store one in a polymorphic data type.  This rules out
 192 things like <Literal>[Int&num;]</Literal> (i.e. lists of primitive
 193 integers).  The reason for this restriction is that polymorphic
 194 arguments and constructor fields are assumed to be pointers: if an
 195 unboxed integer is stored in one of these, the garbage collector would
 196 attempt to follow it, leading to unpredictable space leaks.  Or a
 197 <Function>seq</Function> operation on the polymorphic component may
 198 attempt to dereference the pointer, with disastrous results.  Even
 199 worse, the unboxed value might be larger than a pointer
 200 (<Literal>Double&num;</Literal> for instance).
 201 </Para>
 202
 203 <Para>
 204 Nevertheless, a numerically-intensive program using unboxed types can
 205 go a <Emphasis>lot</Emphasis> faster than its &ldquo;standard&rdquo;
 206 counterpart&mdash;we saw a threefold speedup on one example.
 207 </Para>
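<Para>
As a minimal sketch of the style this enables, the loop below sums the
integers from 1 to <Literal>n</Literal> entirely on
<Literal>Int&num;</Literal> values.  It assumes a recent GHC, where the
primitives are imported from <Literal>GHC.Exts</Literal> under the
<Literal>MagicHash</Literal> extension and comparisons return an
<Literal>Int&num;</Literal> that <Literal>isTrue&num;</Literal> converts
to <Literal>Bool</Literal>.  The name <Literal>sumTo</Literal> is purely
illustrative, and the argument is assumed to be non-negative.
</Para>

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

-- Assumption: a recent GHC that exports the primitives from GHC.Exts.
import GHC.Exts (Int (I#), Int#, isTrue#, (+#), (-#), (==#))

-- | Sum 1..n using only unboxed arithmetic in the loop.
-- Illustrative only; assumes n >= 0.
sumTo :: Int -> Int
sumTo (I# n) = I# (go 0# n)
  where
    -- The accumulator and counter are raw machine integers,
    -- so the loop itself allocates nothing on the heap.
    go :: Int# -> Int# -> Int#
    go acc k
      | isTrue# (k ==# 0#) = acc
      | otherwise          = go (acc +# k) (k -# 1#)

main :: IO ()
main = print (sumTo 10)
```

<Para>
The boxed <Literal>Int</Literal> appears only at the boundary: it is
unpacked once on entry and repacked once on exit.
</Para>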
 208
 209 </sect2>
 210
 211 <Sect2 id="unboxed-tuples">
 212 <Title>Unboxed Tuples
 213 </Title>
 214
 215 <Para>
 216 Unboxed tuples aren't really exported by <Literal>PrelGHC</Literal>;
 217 they're available by default with <Option>-fglasgow-exts</Option>.  An
 218 unboxed tuple looks like this:
 219 </Para>
 220
 221 <Para>
 222
 223 <ProgramListing>
 224 (# e_1, ..., e_n #)
 225 </ProgramListing>
 226
 227 </Para>
 228
 229 <Para>
 230 where <Literal>e&lowbar;1..e&lowbar;n</Literal> are expressions of any
 231 type (primitive or non-primitive).  The type of an unboxed tuple looks
 232 the same.
 233 </Para>
 234
 235 <Para>
 236 Unboxed tuples are used for functions that need to return multiple
 237 values, but they avoid the heap allocation normally associated with
 238 using fully-fledged tuples.  When an unboxed tuple is returned, the
 239 components are put directly into registers or on the stack; the
 240 unboxed tuple itself does not have a composite representation.  Many
 241 of the primitive operations listed in this section return unboxed
 242 tuples.
 243 </Para>
 244
 245 <Para>
 246 There are some pretty stringent restrictions on the use of unboxed tuples:
 247 </Para>
 248
 249 <Para>
 250
 251 <ItemizedList>
 252 <ListItem>
 253
 254 <Para>
 255  Unboxed tuple types are subject to the same restrictions as
 256 other unboxed types; i.e. they may not be stored in polymorphic data
 257 structures or passed to polymorphic functions.
 258
 259 </Para>
 260 </ListItem>
 261 <ListItem>
 262
 263 <Para>
 264  Unboxed tuples may only be constructed as the direct result of
 265 a function, and may only be deconstructed with a <Literal>case</Literal> expression.
 266 For example, the following are valid:
 267
 268
 269 <ProgramListing>
 270 f x y = (# x+1, y-1 #)
 271 g x = case f x x of { (# a, b #) -&#62; a + b }
 272 </ProgramListing>
 273
 274
 275 but the following are invalid:
 276
 277
 278 <ProgramListing>
 279 f x y = g (# x, y #)
 280 g (# x, y #) = x + y
 281 </ProgramListing>
 282
 283
 284 </Para>
 285 </ListItem>
 286 <ListItem>
 287
 288 <Para>
 289  No variable can have an unboxed tuple type.  This is illegal:
 290
 291
 292 <ProgramListing>
 293 f :: (# Int, Int #) -&#62; (# Int, Int #)
 294 f x = x
 295 </ProgramListing>
 296
 297
 298 because <VarName>x</VarName> has an unboxed tuple type.
 299
 300 </Para>
 301 </ListItem>
 302
 303 </ItemizedList>
 304
 305 </Para>
 306
 307 <Para>
 308 Note: we may relax some of these restrictions in the future.
 309 </Para>
 310
 311 <Para>
 312 The <Literal>IO</Literal> and <Literal>ST</Literal> monads use unboxed tuples to avoid unnecessary
 313 allocation during sequences of operations.
 314 </Para>
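<Para>
The construct-then-<Literal>case</Literal> pattern described above can
be sketched as a complete program.  This assumes a recent GHC, where
the syntax is enabled by the <Literal>UnboxedTuples</Literal> extension
rather than <Option>-fglasgow-exts</Option>; the names
<Literal>quotRemU</Literal> and <Literal>sumParts</Literal> are
hypothetical.
</Para>

```haskell
{-# LANGUAGE UnboxedTuples #-}
module Main (main) where

-- | Return quotient and remainder as an unboxed tuple: the pair is the
-- direct result of the function, so no tuple cell needs to be allocated.
quotRemU :: Int -> Int -> (# Int, Int #)
quotRemU x y = (# x `quot` y, x `rem` y #)

-- | Take the unboxed tuple apart with a case expression.
sumParts :: Int -> Int -> Int
sumParts x y =
  case quotRemU x y of
    (# q, r #) -> q + r

main :: IO ()
main = print (sumParts 17 5)
```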
 315
 316 </Sect2>
 317
 318 <Sect2>
 319 <Title>Character and numeric types</Title>
 320
 321 <Para>
 322 <IndexTerm><Primary>character types, primitive</Primary></IndexTerm>
 323 <IndexTerm><Primary>numeric types, primitive</Primary></IndexTerm>
 324 <IndexTerm><Primary>integer types, primitive</Primary></IndexTerm>
 325 <IndexTerm><Primary>floating point types, primitive</Primary></IndexTerm>
 326 There are the following obvious primitive types:
 327 </Para>
 328
 329 <Para>
 330
 331 <ProgramListing>
 332 type Char#
 333 type Int#
 334 type Word#
 335 type Addr#
 336 type Float#
 337 type Double#
 338 type Int64#
 339 type Word64#
 340 </ProgramListing>
 341
 342 <IndexTerm><Primary><literal>Char&num;</literal></Primary></IndexTerm>
 343 <IndexTerm><Primary><literal>Int&num;</literal></Primary></IndexTerm>
 344 <IndexTerm><Primary><literal>Word&num;</literal></Primary></IndexTerm>
 345 <IndexTerm><Primary><literal>Addr&num;</literal></Primary></IndexTerm>
 346 <IndexTerm><Primary><literal>Float&num;</literal></Primary></IndexTerm>
 347 <IndexTerm><Primary><literal>Double&num;</literal></Primary></IndexTerm>
 348 <IndexTerm><Primary><literal>Int64&num;</literal></Primary></IndexTerm>
 349 <IndexTerm><Primary><literal>Word64&num;</literal></Primary></IndexTerm>
 350 </Para>
 351
 352 <Para>
 353 If you really want to know their exact equivalents in C, see
 354 <Filename>ghc/includes/StgTypes.h</Filename> in the GHC source tree.
 355 </Para>
 356
 357 <Para>
 358 Literals for these types may be written as follows:
 359 </Para>
 360
 361 <Para>
 362
 363 <ProgramListing>
 364 1#              an Int#
 365 1.2#            a Float#
 366 1.34##          a Double#
 367 'a'#            a Char#; for weird characters, use '\o&#60;octal&#62;'#
 368 "a"#            an Addr# (a `char *')
 369 </ProgramListing>
 370
 371 <IndexTerm><Primary>literals, primitive</Primary></IndexTerm>
 372 <IndexTerm><Primary>constants, primitive</Primary></IndexTerm>
 373 <IndexTerm><Primary>numbers, primitive</Primary></IndexTerm>
 374 </Para>
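<Para>
A small sketch of these literals in use follows.  It assumes a recent
GHC with the <Literal>MagicHash</Literal> extension, and that the boxed
types' constructors (<Literal>I&num;</Literal>,
<Literal>F&num;</Literal>, <Literal>D&num;</Literal>,
<Literal>C&num;</Literal>) are imported from
<Literal>GHC.Exts</Literal>; the binding names are illustrative.
</Para>

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

-- Assumption: a recent GHC that exports these constructors from GHC.Exts.
import GHC.Exts (Char (C#), Double (D#), Float (F#), Int (I#))

-- Each primitive literal is wrapped in the constructor of its boxed type.
boxedInt :: Int
boxedInt = I# 1#

boxedFloat :: Float
boxedFloat = F# 1.2#

boxedDouble :: Double
boxedDouble = D# 1.34##

boxedChar :: Char
boxedChar = C# 'a'#

main :: IO ()
main = do
  print boxedInt
  print boxedFloat
  print boxedDouble
  print boxedChar
```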
 375
 376 </Sect2>
 377
 378 <Sect2>
 379 <Title>Comparison operations</Title>
 380
 381 <Para>
 382 <IndexTerm><Primary>comparisons, primitive</Primary></IndexTerm>
 383 <IndexTerm><Primary>operators, comparison</Primary></IndexTerm>
 384 </Para>
 385
 386 <Para>
 387
 388 <ProgramListing>
 389 {&#62;,&#62;=,==,/=,&#60;,&#60;=}# :: Int# -&#62; Int# -&#62; Bool
 390
 391 {gt,ge,eq,ne,lt,le}Char# :: Char# -&#62; Char# -&#62; Bool
 392     -- ditto for Word# and Addr#
 393 </ProgramListing>
 394
 395 <IndexTerm><Primary><literal>&#62;&num;</literal></Primary></IndexTerm>
 396 <IndexTerm><Primary><literal>&#62;=&num;</literal></Primary></IndexTerm>
 397 <IndexTerm><Primary><literal>==&num;</literal></Primary></IndexTerm>
 398 <IndexTerm><Primary><literal>/=&num;</literal></Primary></IndexTerm>
 399 <IndexTerm><Primary><literal>&#60;&num;</literal></Primary></IndexTerm>
 400 <IndexTerm><Primary><literal>&#60;=&num;</literal></Primary></IndexTerm>
 401 <IndexTerm><Primary><literal>gt&lcub;Char,Word,Addr&rcub;&num;</literal></Primary></IndexTerm>
 402 <IndexTerm><Primary><literal>ge&lcub;Char,Word,Addr&rcub;&num;</literal></Primary></IndexTerm>
 403 <IndexTerm><Primary><literal>eq&lcub;Char,Word,Addr&rcub;&num;</literal></Primary></IndexTerm>
 404 <IndexTerm><Primary><literal>ne&lcub;Char,Word,Addr&rcub;&num;</literal></Primary></IndexTerm>
 405 <IndexTerm><Primary><literal>lt&lcub;Char,Word,Addr&rcub;&num;</literal></Primary></IndexTerm>
 406 <IndexTerm><Primary><literal>le&lcub;Char,Word,Addr&rcub;&num;</literal></Primary></IndexTerm>
 407 </Para>
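<Para>
The following sketch uses two of these comparisons.  One caveat, stated
as an assumption about the compiler used to run it: in recent versions
of GHC the comparison primitives return an <Literal>Int&num;</Literal>
(1 for true, 0 for false) rather than the <Literal>Bool</Literal> shown
above, so the example converts the result with
<Literal>isTrue&num;</Literal>.  The names <Literal>maxInt</Literal> and
<Literal>isUpperAscii</Literal> are hypothetical.
</Para>

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

-- Assumption: a recent GHC, where comparisons yield Int# instead of Bool.
import GHC.Exts (Char (C#), Int (I#), geChar#, isTrue#, leChar#, (>#))

-- | Larger of two Ints, compared on the unboxed values.
maxInt :: Int -> Int -> Int
maxInt (I# x) (I# y)
  | isTrue# (x ># y) = I# x
  | otherwise        = I# y

-- | True for 'A'..'Z', using the Char# comparison primitives.
isUpperAscii :: Char -> Bool
isUpperAscii (C# c) =
  isTrue# (c `geChar#` 'A'#) && isTrue# (c `leChar#` 'Z'#)

main :: IO ()
main = do
  print (maxInt 3 7)
  print (isUpperAscii 'Q')
```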
 408
 409 </Sect2>
 410
 411 <Sect2>
 412 <Title>Primitive-character operations</Title>
 413
 414 <Para>
 415 <IndexTerm><Primary>characters, primitive operations</Primary></IndexTerm>
 416 <IndexTerm><Primary>operators, primitive character</Primary></IndexTerm>
 417 </Para>
 418
 419 <Para>
 420
 421 <ProgramListing>
 422 ord# :: Char# -&#62; Int#
 423 chr# :: Int# -&#62; Char#
 424 </ProgramListing>
 425
 426 <IndexTerm><Primary><literal>ord&num;</literal></Primary></IndexTerm>
 427 <IndexTerm><Primary><literal>chr&num;</literal></Primary></IndexTerm>
 428 </Para>
 429
 430 </Sect2>
 431
 432 <Sect2>
 433 <Title>Primitive-<Literal>Int</Literal> operations</Title>
 434
 435 <Para>
 436 <IndexTerm><Primary>integers, primitive operations</Primary></IndexTerm>
 437 <IndexTerm><Primary>operators, primitive integer</Primary></IndexTerm>
 438 </Para>
 439
 440 <Para>
 441
 442 <ProgramListing>
 443 {+,-,*,quotInt,remInt,gcdInt}# :: Int# -&#62; Int# -&#62; Int#
 444 negateInt# :: Int# -&#62; Int#
 445
 446 iShiftL#, iShiftRA#, iShiftRL# :: Int# -&#62; Int# -&#62; Int#
 447         -- shift left, right arithmetic, right logical
 448
 449 addIntC#, subIntC#, mulIntC# :: Int# -> Int# -> (# Int#, Int# #)
 450         -- add, subtract, multiply with carry
 451 </ProgramListing>
 452
 453 <IndexTerm><Primary><literal>+&num;</literal></Primary></IndexTerm>
 454 <IndexTerm><Primary><literal>-&num;</literal></Primary></IndexTerm>
 455 <IndexTerm><Primary><literal>*&num;</literal></Primary></IndexTerm>
 456 <IndexTerm><Primary><literal>quotInt&num;</literal></Primary></IndexTerm>
 457 <IndexTerm><Primary><literal>remInt&num;</literal></Primary></IndexTerm>
 458 <IndexTerm><Primary><literal>gcdInt&num;</literal></Primary></IndexTerm>
 459 <IndexTerm><Primary><literal>iShiftL&num;</literal></Primary></IndexTerm>
 460 <IndexTerm><Primary><literal>iShiftRA&num;</literal></Primary></IndexTerm>
 461 <IndexTerm><Primary><literal>iShiftRL&num;</literal></Primary></IndexTerm>
 462 <IndexTerm><Primary><literal>addIntC&num;</literal></Primary></IndexTerm>
 463 <IndexTerm><Primary><literal>subIntC&num;</literal></Primary></IndexTerm>
 464 <IndexTerm><Primary><literal>mulIntC&num;</literal></Primary></IndexTerm>
 465 <IndexTerm><Primary>shift operations, integer</Primary></IndexTerm>
 466 </Para>
 467
 468 <Para>
 469 <Emphasis>Note:</Emphasis> No error/overflow checking!
 470 </Para>
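<Para>
Since the plain operations do not check for overflow, the variants that
return an extra flag are the way to detect it.  Here is a sketch of a
checked addition built on <Literal>addIntC&num;</Literal>, assuming a
recent GHC in which the second component of the result is non-zero
exactly when the addition overflowed.  The name
<Literal>checkedAdd</Literal> is hypothetical.
</Para>

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

-- Assumption: a recent GHC that exports addIntC# from GHC.Exts.
import GHC.Exts (Int (I#), addIntC#, isTrue#, (==#))

-- | Add two Ints, returning Nothing if the result overflowed.
checkedAdd :: Int -> Int -> Maybe Int
checkedAdd (I# x) (I# y) =
  case addIntC# x y of
    (# r, flag #)
      | isTrue# (flag ==# 0#) -> Just (I# r)   -- flag clear: result is exact
      | otherwise             -> Nothing       -- flag set: overflow

main :: IO ()
main = do
  print (checkedAdd 1 2)
  print (checkedAdd maxBound 1)
```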
 471
 472 </Sect2>
 473
 474 <Sect2>
 475 <Title>Primitive-<Literal>Double</Literal> and <Literal>Float</Literal> operations</Title>
 476
 477 <Para>
 478 <IndexTerm><Primary>floating point numbers, primitive</Primary></IndexTerm>
 479 <IndexTerm><Primary>operators, primitive floating point</Primary></IndexTerm>
 480 </Para>
 481
 482 <Para>
 483
 484 <ProgramListing>
 485 {+,-,*,/}##         :: Double# -&#62; Double# -&#62; Double#
 486 {&#60;,&#60;=,==,/=,&#62;=,&#62;}## :: Double# -&#62; Double# -&#62; Bool
 487 negateDouble#       :: Double# -&#62; Double#
 488 double2Int#         :: Double# -&#62; Int#
 489 int2Double#         :: Int#    -&#62; Double#
 490
 491 {plus,minus,times,divide}Float# :: Float# -&#62; Float# -&#62; Float#
 492 {gt,ge,eq,ne,lt,le}Float# :: Float# -&#62; Float# -&#62; Bool
 493 negateFloat#        :: Float# -&#62; Float#
 494 float2Int#          :: Float# -&#62; Int#
 495 int2Float#          :: Int#   -&#62; Float#
 496 </ProgramListing>
 497
 498 </Para>
 499
 500 <Para>
 501 <IndexTerm><Primary><literal>+&num;&num;</literal></Primary></IndexTerm>
 502 <IndexTerm><Primary><literal>-&num;&num;</literal></Primary></IndexTerm>
 503 <IndexTerm><Primary><literal>*&num;&num;</literal></Primary></IndexTerm>
 504 <IndexTerm><Primary><literal>/&num;&num;</literal></Primary></IndexTerm>
 505 <IndexTerm><Primary><literal>&#60;&num;&num;</literal></Primary></IndexTerm>
 506 <IndexTerm><Primary><literal>&#60;=&num;&num;</literal></Primary></IndexTerm>
 507 <IndexTerm><Primary><literal>==&num;&num;</literal></Primary></IndexTerm>
 508 <IndexTerm><Primary><literal>/=&num;&num;</literal></Primary></IndexTerm>
 509 <IndexTerm><Primary><literal>&#62;=&num;&num;</literal></Primary></IndexTerm>
 510 <IndexTerm><Primary><literal>&#62;&num;&num;</literal></Primary></IndexTerm>
 511 <IndexTerm><Primary><literal>negateDouble&num;</literal></Primary></IndexTerm>
 512 <IndexTerm><Primary><literal>double2Int&num;</literal></Primary></IndexTerm>
 513 <IndexTerm><Primary><literal>int2Double&num;</literal></Primary></IndexTerm>
 514 </Para>
 515
 516 <Para>
 517 <IndexTerm><Primary><literal>plusFloat&num;</literal></Primary></IndexTerm>
 518 <IndexTerm><Primary><literal>minusFloat&num;</literal></Primary></IndexTerm>
 519 <IndexTerm><Primary><literal>timesFloat&num;</literal></Primary></IndexTerm>
 520 <IndexTerm><Primary><literal>divideFloat&num;</literal></Primary></IndexTerm>
 521 <IndexTerm><Primary><literal>gtFloat&num;</literal></Primary></IndexTerm>
 522 <IndexTerm><Primary><literal>geFloat&num;</literal></Primary></IndexTerm>
 523 <IndexTerm><Primary><literal>eqFloat&num;</literal></Primary></IndexTerm>
 524 <IndexTerm><Primary><literal>neFloat&num;</literal></Primary></IndexTerm>
 525 <IndexTerm><Primary><literal>ltFloat&num;</literal></Primary></IndexTerm>
 526 <IndexTerm><Primary><literal>leFloat&num;</literal></Primary></IndexTerm>
 527 <IndexTerm><Primary><literal>negateFloat&num;</literal></Primary></IndexTerm>
 528 <IndexTerm><Primary><literal>float2Int&num;</literal></Primary></IndexTerm>
 529 <IndexTerm><Primary><literal>int2Float&num;</literal></Primary></IndexTerm>
 530 </Para>
 531
 532 <Para>
 533 And a full complement of trigonometric functions:
 534 </Para>
 535
 536 <Para>
 537
 538 <ProgramListing>
 539 expDouble#      :: Double# -&#62; Double#
 540 logDouble#      :: Double# -&#62; Double#
 541 sqrtDouble#     :: Double# -&#62; Double#
 542 sinDouble#      :: Double# -&#62; Double#
 543 cosDouble#      :: Double# -&#62; Double#
 544 tanDouble#      :: Double# -&#62; Double#
 545 asinDouble#     :: Double# -&#62; Double#
 546 acosDouble#     :: Double# -&#62; Double#
 547 atanDouble#     :: Double# -&#62; Double#
 548 sinhDouble#     :: Double# -&#62; Double#
 549 coshDouble#     :: Double# -&#62; Double#
 550 tanhDouble#     :: Double# -&#62; Double#
 551 powerDouble#    :: Double# -&#62; Double# -&#62; Double#
 552 </ProgramListing>
 553
 554 <IndexTerm><Primary>trigonometric functions, primitive</Primary></IndexTerm>
 555 </Para>
 556
 557 <Para>
 558 Similarly for <Literal>Float&num;</Literal>.
 559 </Para>
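<Para>
As an illustration of the <Literal>Double&num;</Literal> operations,
the sketch below computes the length of a hypotenuse.  It assumes a
recent GHC that exports <Literal>sqrtDouble&num;</Literal> and the
<Literal>&num;&num;</Literal> operators from
<Literal>GHC.Exts</Literal>; the name <Literal>hypot</Literal> is
hypothetical, and the sub-expressions are parenthesised explicitly
rather than relying on operator precedence.
</Para>

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

-- Assumption: a recent GHC that exports these primitives from GHC.Exts.
import GHC.Exts (Double (D#), sqrtDouble#, (*##), (+##))

-- | sqrt (x^2 + y^2), computed on unboxed doubles.
hypot :: Double -> Double -> Double
hypot (D# x) (D# y) = D# (sqrtDouble# ((x *## x) +## (y *## y)))

main :: IO ()
main = print (hypot 3 4)
```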
 560
 561 <Para>
 562 There are two coercion functions for <Literal>Float&num;</Literal>/<Literal>Double&num;</Literal>:
 563 </Para>
 564
 565 <Para>
 566
 567 <ProgramListing>
 568 float2Double#   :: Float# -&#62; Double#
 569 double2Float#   :: Double# -&#62; Float#
 570 </ProgramListing>
 571
 572 <IndexTerm><Primary><literal>float2Double&num;</literal></Primary></IndexTerm>
 573 <IndexTerm><Primary><literal>double2Float&num;</literal></Primary></IndexTerm>
 574 </Para>
 575
 576 <Para>
 577 The primitive version of <Function>decodeDouble</Function>
 578 (<Function>encodeDouble</Function> is implemented as an external C
 579 function):
 580 </Para>
 581
 582 <Para>
 583
 584 <ProgramListing>
 585 decodeDouble#   :: Double# -&#62; PrelNum.ReturnIntAndGMP
 586 </ProgramListing>
 587
 588 <IndexTerm><Primary><literal>encodeDouble&num;</literal></Primary></IndexTerm>
 589 <IndexTerm><Primary><literal>decodeDouble&num;</literal></Primary></IndexTerm>
 590 </Para>
 591
 592 <Para>
 593 (And the same for <Literal>Float&num;</Literal>s.)
 594 </Para>
 595
 596 </Sect2>
 597
 598 <Sect2 id="integer-operations">
 599 <Title>Operations on/for <Literal>Integers</Literal> (interface to GMP)
 600 </Title>
 601
 602 <Para>
 603 <IndexTerm><Primary>arbitrary precision integers</Primary></IndexTerm>
 604 <IndexTerm><Primary>Integer, operations on</Primary></IndexTerm>
 605 </Para>
 606
 607 <Para>
 608 We implement <Literal>Integers</Literal> (arbitrary-precision
 609 integers) using the GNU multiple-precision (GMP) package (version
 610 2.0.2).
 611 </Para>
 612
 613 <Para>
 614 The data type for <Literal>Integer</Literal> is either a small
 615 integer, represented by an <Literal>Int</Literal>, or a large integer
 616 represented using the pieces required by GMP's
 617 <Literal>MP&lowbar;INT</Literal> in <Filename>gmp.h</Filename> (see
 618 <Filename>gmp.info</Filename> in
 619 <Filename>ghc/includes/runtime/gmp</Filename>).  It comes out as:
 620 </Para>
 621
 622 <Para>
 623
 624 <ProgramListing>
 625 data Integer = S# Int#             -- small integers
 626              | J# Int# ByteArray#  -- large integers
 627 </ProgramListing>
 628
 629 <IndexTerm><Primary>Integer type</Primary></IndexTerm> The primitive
 630 ops to support large <Literal>Integers</Literal> use the
 631 &ldquo;pieces&rdquo; of the representation, and are as follows:
 632 </Para>
 633
 634 <Para>
 635
 636 <ProgramListing>
 637 negateInteger#  :: Int# -&#62; ByteArray# -&#62; Integer
 638
 639 {plus,minus,times}Integer#, gcdInteger#,
 640   quotInteger#, remInteger#, divExactInteger#
 641         :: Int# -> ByteArray#
 642         -> Int# -> ByteArray#
 643         -> (# Int#, ByteArray# #)
 644
 645 cmpInteger#
 646         :: Int# -> ByteArray#
 647         -> Int# -> ByteArray#
 648         -> Int# -- -1 for &#60;; 0 for ==; +1 for >
 649
 650 cmpIntegerInt#
 651         :: Int# -> ByteArray#
 652         -> Int#
 653         -> Int# -- -1 for &#60;; 0 for ==; +1 for >
 654
 655 gcdIntegerInt#
 656         :: Int# -> ByteArray#
 657         -> Int#
 658         -> Int#
 659
 660 divModInteger#, quotRemInteger#
 661         :: Int# -> ByteArray#
 662         -> Int# -> ByteArray#
 663         -> (# Int#, ByteArray#,
 664                   Int#, ByteArray# #)
 665
 666 integer2Int# :: Int# -> ByteArray# -> Int#
 667
 668 int2Integer#  :: Int#  -> Integer -- NB: no error-checking on these two!
 669 word2Integer# :: Word# -> Integer
 670
 671 addr2Integer# :: Addr# -> Integer
 672         -- the Addr# is taken to be a `char *' string
 673         -- to be converted into an Integer.
 674 </ProgramListing>
 675
 676 <IndexTerm><Primary><literal>negateInteger&num;</literal></Primary></IndexTerm>
 677 <IndexTerm><Primary><literal>plusInteger&num;</literal></Primary></IndexTerm>
 678 <IndexTerm><Primary><literal>minusInteger&num;</literal></Primary></IndexTerm>
 679 <IndexTerm><Primary><literal>timesInteger&num;</literal></Primary></IndexTerm>
 680 <IndexTerm><Primary><literal>quotInteger&num;</literal></Primary></IndexTerm>
 681 <IndexTerm><Primary><literal>remInteger&num;</literal></Primary></IndexTerm>
 682 <IndexTerm><Primary><literal>gcdInteger&num;</literal></Primary></IndexTerm>
 683 <IndexTerm><Primary><literal>gcdIntegerInt&num;</literal></Primary></IndexTerm>
 684 <IndexTerm><Primary><literal>divExactInteger&num;</literal></Primary></IndexTerm>
 685 <IndexTerm><Primary><literal>cmpInteger&num;</literal></Primary></IndexTerm>
 686 <IndexTerm><Primary><literal>divModInteger&num;</literal></Primary></IndexTerm>
 687 <IndexTerm><Primary><literal>quotRemInteger&num;</literal></Primary></IndexTerm>
 688 <IndexTerm><Primary><literal>integer2Int&num;</literal></Primary></IndexTerm>
 689 <IndexTerm><Primary><literal>int2Integer&num;</literal></Primary></IndexTerm>
 690 <IndexTerm><Primary><literal>word2Integer&num;</literal></Primary></IndexTerm>
 691 <IndexTerm><Primary><literal>addr2Integer&num;</literal></Primary></IndexTerm>
 692 </Para>
 693
 694 </Sect2>
 695
 696 <Sect2>
 697 <Title>Words and addresses</Title>
 698
 699 <Para>
 700 <IndexTerm><Primary>word, primitive type</Primary></IndexTerm>
 701 <IndexTerm><Primary>address, primitive type</Primary></IndexTerm>
 702 <IndexTerm><Primary>unsigned integer, primitive type</Primary></IndexTerm>
 703 <IndexTerm><Primary>pointer, primitive type</Primary></IndexTerm>
 704 </Para>
 705
 706 <Para>
 707 A <Literal>Word&num;</Literal> is used for bit-twiddling operations.
 708 It is the same size as an <Literal>Int&num;</Literal>, but has neither a sign
 709 nor any arithmetic operations.
 710
 711 <ProgramListing>
 712 type Word#      -- Same size/etc as Int# but *unsigned*
 713 type Addr#      -- A pointer from outside the "Haskell world" (from C, probably);
 714                 -- described under "arrays"
 715 </ProgramListing>
 716
 717 <IndexTerm><Primary><literal>Word&num;</literal></Primary></IndexTerm>
 718 <IndexTerm><Primary><literal>Addr&num;</literal></Primary></IndexTerm>
 719 </Para>
 720
 721 <Para>
 722 <Literal>Word&num;</Literal>s and <Literal>Addr&num;</Literal>s have
 723 the usual comparison operations.  Other
 724 unboxed-<Literal>Word</Literal> ops (bit-twiddling and coercions):
 725 </Para>
 726
 727 <Para>
 728
 729 <ProgramListing>
 730 {gt,ge,eq,ne,lt,le}Word# :: Word# -> Word# -> Bool
 731
 732 and#, or#, xor# :: Word# -> Word# -> Word#
 733         -- standard bit ops.
 734
 735 quotWord#, remWord# :: Word# -> Word# -> Word#
 736         -- word (i.e. unsigned) versions are different from int
 737         -- versions, so we have to provide these explicitly.
 738
 739 not# :: Word# -> Word#
 740
 741 shiftL#, shiftRL# :: Word# -> Int# -> Word#
 742         -- shift left, right logical
 743
 744 int2Word#       :: Int#  -> Word# -- just a cast, really
 745 word2Int#       :: Word# -> Int#
 746 </ProgramListing>
 747
 748 <IndexTerm><Primary>bit operations, Word and Addr</Primary></IndexTerm>
 749 <IndexTerm><Primary><literal>gtWord&num;</literal></Primary></IndexTerm>
 750 <IndexTerm><Primary><literal>geWord&num;</literal></Primary></IndexTerm>
 751 <IndexTerm><Primary><literal>eqWord&num;</literal></Primary></IndexTerm>
 752 <IndexTerm><Primary><literal>neWord&num;</literal></Primary></IndexTerm>
 753 <IndexTerm><Primary><literal>ltWord&num;</literal></Primary></IndexTerm>
 754 <IndexTerm><Primary><literal>leWord&num;</literal></Primary></IndexTerm>
 755 <IndexTerm><Primary><literal>and&num;</literal></Primary></IndexTerm>
 756 <IndexTerm><Primary><literal>or&num;</literal></Primary></IndexTerm>
 757 <IndexTerm><Primary><literal>xor&num;</literal></Primary></IndexTerm>
 758 <IndexTerm><Primary><literal>not&num;</literal></Primary></IndexTerm>
 759 <IndexTerm><Primary><literal>quotWord&num;</literal></Primary></IndexTerm>
 760 <IndexTerm><Primary><literal>remWord&num;</literal></Primary></IndexTerm>
 761 <IndexTerm><Primary><literal>shiftL&num;</literal></Primary></IndexTerm>
 762 <IndexTerm><Primary><literal>shiftRA&num;</literal></Primary></IndexTerm>
 763 <IndexTerm><Primary><literal>shiftRL&num;</literal></Primary></IndexTerm>
 764 <IndexTerm><Primary><literal>int2Word&num;</literal></Primary></IndexTerm>
 765 <IndexTerm><Primary><literal>word2Int&num;</literal></Primary></IndexTerm>
 766 </Para>
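<Para>
A short sketch of the bit-twiddling operations follows.  It assumes a
recent GHC that exports <Literal>shiftRL&num;</Literal>,
<Literal>and&num;</Literal> and <Literal>int2Word&num;</Literal> from
<Literal>GHC.Exts</Literal>, and in which <Literal>255&num;&num;</Literal>
is a <Literal>Word&num;</Literal> literal.  The names
<Literal>byteAt</Literal> and <Literal>toWord</Literal> are
hypothetical.
</Para>

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

-- Assumption: a recent GHC that exports these primitives from GHC.Exts.
import GHC.Exts (Int (I#), Word (W#), and#, int2Word#, shiftRL#, (*#))

-- | Extract byte number i of a word, counting from the least
-- significant byte: shift right by 8*i bits, then mask the low 8 bits.
byteAt :: Word -> Int -> Word
byteAt (W# w) (I# i) = W# ((w `shiftRL#` (i *# 8#)) `and#` 255##)

-- | Reinterpret an Int as a Word; just a cast at the machine level.
toWord :: Int -> Word
toWord (I# i) = W# (int2Word# i)

main :: IO ()
main = do
  print (byteAt 0x1234 0)
  print (byteAt 0x1234 1)
  print (toWord (-1) == maxBound)
```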
 767
 768 <Para>
 769 Unboxed-<Literal>Addr</Literal> ops (C casts, really):
 770
 771 <ProgramListing>
 772 {gt,ge,eq,ne,lt,le}Addr# :: Addr# -> Addr# -> Bool
 773
 774 int2Addr#       :: Int#  -> Addr#
 775 addr2Int#       :: Addr# -> Int#
 776 addr2Integer#   :: Addr# -> (# Int#, ByteArray# #)
 777 </ProgramListing>
 778
 779 <IndexTerm><Primary><literal>gtAddr&num;</literal></Primary></IndexTerm>
 780 <IndexTerm><Primary><literal>geAddr&num;</literal></Primary></IndexTerm>
 781 <IndexTerm><Primary><literal>eqAddr&num;</literal></Primary></IndexTerm>
 782 <IndexTerm><Primary><literal>neAddr&num;</literal></Primary></IndexTerm>
 783 <IndexTerm><Primary><literal>ltAddr&num;</literal></Primary></IndexTerm>
 784 <IndexTerm><Primary><literal>leAddr&num;</literal></Primary></IndexTerm>
 785 <IndexTerm><Primary><literal>int2Addr&num;</literal></Primary></IndexTerm>
 786 <IndexTerm><Primary><literal>addr2Int&num;</literal></Primary></IndexTerm>
 787 <IndexTerm><Primary><literal>addr2Integer&num;</literal></Primary></IndexTerm>
 788 </Para>
 789
 790 <Para>
 791 The casts between <Literal>Int&num;</Literal>,
 792 <Literal>Word&num;</Literal> and <Literal>Addr&num;</Literal>
 793 correspond to null operations at the machine level, but are required
 794 to keep the Haskell type checker happy.
 795 </Para>
 796
 797 <Para>
 798 Operations for indexing off of C pointers
 799 (<Literal>Addr&num;</Literal>s) to snatch values are listed under
 800 &ldquo;arrays&rdquo;.
 801 </Para>
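<Para>
To give a flavour of indexing off an <Literal>Addr&num;</Literal>, the
sketch below looks up a hexadecimal digit in a static string literal.
It assumes a recent GHC that exports
<Literal>indexCharOffAddr&num;</Literal> from
<Literal>GHC.Exts</Literal>; the name <Literal>hexDigit</Literal> is
hypothetical.  The primitive does no bounds checking, so the index is
reduced modulo 16 before use.
</Para>

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

-- Assumption: a recent GHC that exports indexCharOffAddr# from GHC.Exts.
import GHC.Exts (Char (C#), Int (I#), indexCharOffAddr#)

-- | Hex digit for n `mod` 16, read from a static byte array.
-- The literal "..."# has type Addr#, i.e. a pointer to bytes that live
-- outside the Haskell heap.
hexDigit :: Int -> Char
hexDigit n =
  case n `mod` 16 of                 -- keep the index within 0..15
    I# i -> C# (indexCharOffAddr# "0123456789abcdef"# i)

main :: IO ()
main = putStrLn (map hexDigit [0 .. 15])
```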
 802
 803 </Sect2>
 804
 805 <Sect2>
 806 <Title>Arrays</Title>
 807
 808 <Para>
 809 <IndexTerm><Primary>arrays, primitive</Primary></IndexTerm>
 810 </Para>
 811
 812 <Para>
 813 The type <Literal>Array&num; elt</Literal> is the type of primitive,
 814 unpointed arrays of values of type <Literal>elt</Literal>.
 815 </Para>
 816
 817 <Para>
 818
 819 <ProgramListing>
 820 type Array# elt
 821 </ProgramListing>
 822
 823 <IndexTerm><Primary><literal>Array&num;</literal></Primary></IndexTerm>
 824 </Para>
 825
 826 <Para>
 827 <Literal>Array&num;</Literal> is more primitive than a Haskell
 828 array&mdash;indeed, the Haskell <Literal>Array</Literal> interface is
 829 implemented using <Literal>Array&num;</Literal>&mdash;in that an
 830 <Literal>Array&num;</Literal> is indexed only by
 831 <Literal>Int&num;</Literal>s, starting at zero.  It is also more
 832 primitive by virtue of being unboxed.  That doesn't mean that it isn't
 833 a heap-allocated object&mdash;of course, it is.  Rather, being unboxed
 834 means that it is represented by a pointer to the array itself, and not
 835 to a thunk which will evaluate to the array (or to bottom).  The
 836 components of an <Literal>Array&num;</Literal> are themselves boxed.
 837 </Para>
 838
 839 <Para>
 840 The type <Literal>ByteArray&num;</Literal> is similar to
 841 <Literal>Array&num;</Literal>, except that it contains just a string
 842 of (non-pointer) bytes.
 843 </Para>
 844
 845 <Para>
 846
 847 <ProgramListing>
 848 type ByteArray#
 849 </ProgramListing>
 850
 851 <IndexTerm><Primary><literal>ByteArray&num;</literal></Primary></IndexTerm>
 852 </Para>
 853
<Para>
Arrays of these types are useful when a Haskell program wishes to
construct a value to pass to a C procedure. It is also possible to use
them to build (say) arrays of unboxed characters for internal use in a
Haskell program.  Given these uses, <Literal>ByteArray&num;</Literal>
is deliberately a bit vague about the type of its components.
Operations are provided to extract values of type
<Literal>Char&num;</Literal>, <Literal>Int&num;</Literal>,
<Literal>Float&num;</Literal>, <Literal>Double&num;</Literal>, and
<Literal>Addr&num;</Literal> from arbitrary offsets within a
<Literal>ByteArray&num;</Literal>.  (For type
<Literal>Foo&num;</Literal>, offset <Emphasis>i</Emphasis> gets you the
<Emphasis>i</Emphasis>th <Literal>Foo&num;</Literal>, not the
<Literal>Foo&num;</Literal> at byte position <Emphasis>i</Emphasis>.)
(If you want a <Literal>Word&num;</Literal>, grab an
<Literal>Int&num;</Literal>, then coerce it.)
</Para>
 871
<Para>
Lastly, we have static byte-arrays, of type
<Literal>Addr&num;</Literal> (mentioned previously).  (Remember
the duality between arrays and pointers in C.)  Arrays of this type
are represented by a pointer to an array in the world outside Haskell,
so this pointer is not followed by the garbage collector.  In other
respects they are just like <Literal>ByteArray&num;</Literal>.  They
are only needed in order to pass values from C to Haskell.
</Para>
 881
 882 </Sect2>
 883
 884 <Sect2>
 885 <Title>Reading and writing</Title>
 886
 887 <Para>
 888 Primitive arrays are linear, and indexed starting at zero.
 889 </Para>
 890
<Para>
The size of a <Literal>ByteArray&num;</Literal> or
<Literal>MutableByteArray&num;</Literal> is given in bytes.  Indices,
however, count elements of the type being accessed rather than bytes
(as noted above), so it is up to the program to calculate the correct
offset from the start of the array.  Because the array itself is
untyped, a <Literal>ByteArray&num;</Literal> can contain a mixture of
values of different types, which is often needed when preparing data
for, and unpicking results from, C.
</Para>
 899
 900 <Para>
 901 <Emphasis>Should we provide some <Literal>sizeOfDouble&num;</Literal> constants?</Emphasis>
 902 </Para>
 903
 904 <Para>
 905 Out-of-range errors on indexing should be caught by the code which
 906 uses the primitive operation; the primitive operations themselves do
 907 <Emphasis>not</Emphasis> check for out-of-range indexes. The intention is that the
 908 primitive ops compile to one machine instruction or thereabouts.
 909 </Para>
 910
 911 <Para>
 912 We use the terms &ldquo;reading&rdquo; and &ldquo;writing&rdquo; to refer to accessing
 913 <Emphasis>mutable</Emphasis> arrays (see <XRef LinkEnd="sect-mutable">), and
 914 &ldquo;indexing&rdquo; to refer to reading a value from an <Emphasis>immutable</Emphasis>
 915 array.
 916 </Para>
 917
<Para>
Immutable byte arrays are straightforward to index (indices count elements of the result type, not bytes):

<ProgramListing>
indexCharArray#   :: ByteArray# -> Int# -> Char#
indexIntArray#    :: ByteArray# -> Int# -> Int#
indexAddrArray#   :: ByteArray# -> Int# -> Addr#
indexFloatArray#  :: ByteArray# -> Int# -> Float#
indexDoubleArray# :: ByteArray# -> Int# -> Double#

indexCharOffAddr#   :: Addr# -> Int# -> Char#
indexIntOffAddr#    :: Addr# -> Int# -> Int#
indexFloatOffAddr#  :: Addr# -> Int# -> Float#
indexDoubleOffAddr# :: Addr# -> Int# -> Double#
indexAddrOffAddr#   :: Addr# -> Int# -> Addr#
 -- Get an Addr# from an Addr# offset
</ProgramListing>
 935
 936 <IndexTerm><Primary><literal>indexCharArray&num;</literal></Primary></IndexTerm>
 937 <IndexTerm><Primary><literal>indexIntArray&num;</literal></Primary></IndexTerm>
 938 <IndexTerm><Primary><literal>indexAddrArray&num;</literal></Primary></IndexTerm>
 939 <IndexTerm><Primary><literal>indexFloatArray&num;</literal></Primary></IndexTerm>
 940 <IndexTerm><Primary><literal>indexDoubleArray&num;</literal></Primary></IndexTerm>
 941 <IndexTerm><Primary><literal>indexCharOffAddr&num;</literal></Primary></IndexTerm>
 942 <IndexTerm><Primary><literal>indexIntOffAddr&num;</literal></Primary></IndexTerm>
 943 <IndexTerm><Primary><literal>indexFloatOffAddr&num;</literal></Primary></IndexTerm>
 944 <IndexTerm><Primary><literal>indexDoubleOffAddr&num;</literal></Primary></IndexTerm>
 945 <IndexTerm><Primary><literal>indexAddrOffAddr&num;</literal></Primary></IndexTerm>
 946 </Para>
 947
 948 <Para>
 949 The last of these, <Function>indexAddrOffAddr&num;</Function>, extracts an <Literal>Addr&num;</Literal> using an offset
 950 from another <Literal>Addr&num;</Literal>, thereby providing the ability to follow a chain of
 951 C pointers.
 952 </Para>
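
<Para>
As a rough sketch of indexing off a static address, the fragment below
reads characters out of a primitive string literal.  It assumes a recent
GHC in which the primitives are exported from the module
<Literal>GHC.Exts</Literal> and the <Literal>MagicHash</Literal> language
pragma enables the <Literal>&num;</Literal> suffix; the names
<Function>charAt</Function> and <Function>firstWord</Function> are
invented for illustration.
</Para>

<ProgramListing>
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Char (C#), Int (I#), indexCharOffAddr#)

-- | Character at offset i of a static, NUL-terminated C string.
-- The literal "..."# has type Addr#.  No bounds check is performed.
charAt :: Int -> Char
charAt (I# i) = C# (indexCharOffAddr# "hello world"# i)

-- | Walk the string from offset 0 until the first space or NUL.
firstWord :: String
firstWord = go 0
  where
    go n = case charAt n of
      ' '  -> []
      '\0' -> []
      c    -> c : go (n + 1)
</ProgramListing>

<Para>
With these definitions <Literal>charAt 1</Literal> is the character
<Literal>'e'</Literal> and <Function>firstWord</Function> is the string
<Literal>"hello"</Literal>.  Keeping the range check in the caller (here,
stopping at the terminating NUL) is what allows the primitive itself to
stay cheap.
</Para>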
 953
 954 <Para>
 955 Something a bit more interesting goes on when indexing arrays of boxed
 956 objects, because the result is simply the boxed object. So presumably
 957 it should be entered&mdash;we never usually return an unevaluated
 958 object!  This is a pain: primitive ops aren't supposed to do
 959 complicated things like enter objects.  The current solution is to
 960 return a single element unboxed tuple (see <XRef LinkEnd="unboxed-tuples">).
 961 </Para>
 962
 963 <Para>
 964
 965 <ProgramListing>
 966 indexArray#       :: Array# elt -> Int# -> (# elt #)
 967 </ProgramListing>
 968
 969 <IndexTerm><Primary><literal>indexArray&num;</literal></Primary></IndexTerm>
 970 </Para>
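
<Para>
The following sketch suggests why the one-element unboxed tuple is
useful: scrutinising the result of <Function>indexArray&num;</Function>
performs the lookup, but leaves the element itself unevaluated.  It
assumes a recent GHC (primitives imported from
<Literal>GHC.Exts</Literal>, the <Literal>ST</Literal> constructor from
<Literal>GHC.ST</Literal>, and the <Literal>MagicHash</Literal> and
<Literal>UnboxedTuples</Literal> pragmas); <Literal>Arr</Literal>,
<Function>replicateArr</Function>, <Function>index</Function> and
<Function>touch</Function> are hypothetical names.
</Para>

<ProgramListing>
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (Array#, Int (I#), indexArray#, newArray#, unsafeFreezeArray#)
import GHC.ST (ST (ST), runST)

-- Boxed wrapper so that the unlifted Array# can be returned from runST.
data Arr a = Arr (Array# a)

-- | An array holding n copies of x (x itself is not evaluated).
replicateArr :: Int -> a -> Arr a
replicateArr (I# n) x = runST (ST (\s0 ->
  case newArray# n x s0 of
    (# s1, marr #) -> case unsafeFreezeArray# marr s1 of
      (# s2, arr #) -> (# s2, Arr arr #)))

-- | Fetch element i; the caller decides whether to force it.
index :: Arr a -> Int -> a
index (Arr arr) (I# i) = case indexArray# arr i of
  (# x #) -> x

-- | Perform the lookup but ignore the element, which is never evaluated.
touch :: Arr a -> Int -> ()
touch (Arr arr) (I# i) = case indexArray# arr i of
  (# _ #) -> ()
</ProgramListing>

<Para>
Here <Literal>touch (replicateArr 3 undefined) 0</Literal> returns
<Literal>()</Literal> without evaluating the undefined element, whereas
forcing the result of <Function>index</Function> on the same array would
raise the error.
</Para>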
 971
 972 </Sect2>
 973
 974 <Sect2>
 975 <Title>The state type</Title>
 976
 977 <Para>
 978 <IndexTerm><Primary><literal>state, primitive type</literal></Primary></IndexTerm>
 979 <IndexTerm><Primary><literal>State&num;</literal></Primary></IndexTerm>
 980 </Para>
 981
 982 <Para>
 983 The primitive type <Literal>State&num;</Literal> represents the state of a state
 984 transformer.  It is parameterised on the desired type of state, which
 985 serves to keep states from distinct threads distinct from one another.
 986 But the <Emphasis>only</Emphasis> effect of this parameterisation is in the type
 987 system: all values of type <Literal>State&num;</Literal> are represented in the same way.
 988 Indeed, they are all represented by nothing at all!  The code
 989 generator &ldquo;knows&rdquo; to generate no code, and allocate no registers
 990 etc, for primitive states.
 991 </Para>
 992
 993 <Para>
 994
 995 <ProgramListing>
 996 type State# s
 997 </ProgramListing>
 998
 999 </Para>
1000
1001 <Para>
1002 The type <Literal>GHC.RealWorld</Literal> is truly opaque: there are no values defined
1003 of this type, and no operations over it.  It is &ldquo;primitive&rdquo; in that
1004 sense - but it is <Emphasis>not unlifted!</Emphasis> Its only role in life is to be
1005 the type which distinguishes the <Literal>IO</Literal> state transformer.
1006 </Para>
1007
1008 <Para>
1009
1010 <ProgramListing>
1011 data RealWorld
1012 </ProgramListing>
1013
1014 </Para>
1015
1016 </Sect2>
1017
1018 <Sect2>
1019 <Title>State of the world</Title>
1020
1021 <Para>
1022 A single, primitive, value of type <Literal>State&num; RealWorld</Literal> is provided.
1023 </Para>
1024
1025 <Para>
1026
1027 <ProgramListing>
1028 realWorld# :: State# RealWorld
1029 </ProgramListing>
1030
1031 <IndexTerm><Primary>realWorld&num; state object</Primary></IndexTerm>
1032 </Para>
1033
1034 <Para>
1035 (Note: in the compiler, not a <Literal>PrimOp</Literal>; just a mucho magic
1036 <Literal>Id</Literal>. Exported from <Literal>GHC</Literal>, though).
1037 </Para>
1038
1039 </Sect2>
1040
1041 <Sect2 id="sect-mutable">
1042 <Title>Mutable arrays</Title>
1043
1044 <Para>
1045 <IndexTerm><Primary>mutable arrays</Primary></IndexTerm>
1046 <IndexTerm><Primary>arrays, mutable</Primary></IndexTerm>
1047 Corresponding to <Literal>Array&num;</Literal> and <Literal>ByteArray&num;</Literal>, we have the types of
1048 mutable versions of each.  In each case, the representation is a
1049 pointer to a suitable block of (mutable) heap-allocated storage.
1050 </Para>
1051
1052 <Para>
1053
1054 <ProgramListing>
1055 type MutableArray# s elt
1056 type MutableByteArray# s
1057 </ProgramListing>
1058
1059 <IndexTerm><Primary><literal>MutableArray&num;</literal></Primary></IndexTerm>
1060 <IndexTerm><Primary><literal>MutableByteArray&num;</literal></Primary></IndexTerm>
1061 </Para>
1062
1063 <Sect3>
1064 <Title>Allocation</Title>
1065
1066 <Para>
1067 <IndexTerm><Primary>mutable arrays, allocation</Primary></IndexTerm>
1068 <IndexTerm><Primary>arrays, allocation</Primary></IndexTerm>
1069 <IndexTerm><Primary>allocation, of mutable arrays</Primary></IndexTerm>
1070 </Para>
1071
1072 <Para>
1073 Mutable arrays can be allocated. Only pointer-arrays are initialised;
1074 arrays of non-pointers are filled in by &ldquo;user code&rdquo; rather than by
1075 the array-allocation primitive.  Reason: only the pointer case has to
1076 worry about GC striking with a partly-initialised array.
1077 </Para>
1078
1079 <Para>
1080
<ProgramListing>
newArray#       :: Int# -> elt -> State# s -> (# State# s, MutableArray# s elt #)

newCharArray#   :: Int# -> State# s -> (# State# s, MutableByteArray# s #)
newIntArray#    :: Int# -> State# s -> (# State# s, MutableByteArray# s #)
newAddrArray#   :: Int# -> State# s -> (# State# s, MutableByteArray# s #)
newFloatArray#  :: Int# -> State# s -> (# State# s, MutableByteArray# s #)
newDoubleArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s #)
</ProgramListing>
1090
1091 <IndexTerm><Primary><literal>newArray&num;</literal></Primary></IndexTerm>
1092 <IndexTerm><Primary><literal>newCharArray&num;</literal></Primary></IndexTerm>
1093 <IndexTerm><Primary><literal>newIntArray&num;</literal></Primary></IndexTerm>
1094 <IndexTerm><Primary><literal>newAddrArray&num;</literal></Primary></IndexTerm>
1095 <IndexTerm><Primary><literal>newFloatArray&num;</literal></Primary></IndexTerm>
1096 <IndexTerm><Primary><literal>newDoubleArray&num;</literal></Primary></IndexTerm>
1097 </Para>
1098
1099 <Para>
1100 The size of a <Literal>ByteArray&num;</Literal> is given in bytes.
1101 </Para>
1102
1103 </Sect3>
1104
1105 <Sect3>
1106 <Title>Reading and writing</Title>
1107
1108 <Para>
1109 <IndexTerm><Primary>arrays, reading and writing</Primary></IndexTerm>
1110 </Para>
1111
1112 <Para>
1113
1114 <ProgramListing>
1115 readArray#       :: MutableArray# s elt -> Int# -> State# s -> (# State# s, elt #)
1116 readCharArray#   :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Char# #)
1117 readIntArray#    :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Int# #)
1118 readAddrArray#   :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Addr# #)
1119 readFloatArray#  :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Float# #)
1120 readDoubleArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Double# #)
1121
1122 writeArray#       :: MutableArray# s elt -> Int# -> elt     -> State# s -> State# s
1123 writeCharArray#   :: MutableByteArray# s -> Int# -> Char#   -> State# s -> State# s
1124 writeIntArray#    :: MutableByteArray# s -> Int# -> Int#    -> State# s -> State# s
1125 writeAddrArray#   :: MutableByteArray# s -> Int# -> Addr#   -> State# s -> State# s
1126 writeFloatArray#  :: MutableByteArray# s -> Int# -> Float#  -> State# s -> State# s
1127 writeDoubleArray# :: MutableByteArray# s -> Int# -> Double# -> State# s -> State# s
1128 </ProgramListing>
1129
1130 <IndexTerm><Primary><literal>readArray&num;</literal></Primary></IndexTerm>
1131 <IndexTerm><Primary><literal>readCharArray&num;</literal></Primary></IndexTerm>
1132 <IndexTerm><Primary><literal>readIntArray&num;</literal></Primary></IndexTerm>
1133 <IndexTerm><Primary><literal>readAddrArray&num;</literal></Primary></IndexTerm>
1134 <IndexTerm><Primary><literal>readFloatArray&num;</literal></Primary></IndexTerm>
1135 <IndexTerm><Primary><literal>readDoubleArray&num;</literal></Primary></IndexTerm>
1136 <IndexTerm><Primary><literal>writeArray&num;</literal></Primary></IndexTerm>
1137 <IndexTerm><Primary><literal>writeCharArray&num;</literal></Primary></IndexTerm>
1138 <IndexTerm><Primary><literal>writeIntArray&num;</literal></Primary></IndexTerm>
1139 <IndexTerm><Primary><literal>writeAddrArray&num;</literal></Primary></IndexTerm>
1140 <IndexTerm><Primary><literal>writeFloatArray&num;</literal></Primary></IndexTerm>
1141 <IndexTerm><Primary><literal>writeDoubleArray&num;</literal></Primary></IndexTerm>
1142 </Para>
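
<Para>
A minimal sketch of allocating, writing and reading follows.  It is
written against a recent GHC, which is an assumption on our part: there
the primitives come from <Literal>GHC.Exts</Literal>, and a single
allocator <Function>newByteArray&num;</Function> (size in bytes) stands
in for the per-type byte-array allocators listed above.  The names
<Function>pairViaArray</Function> and
<Function>sumViaByteArray</Function> are invented for illustration.
</Para>

<ProgramListing>
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
  ( Int (I#), newArray#, newByteArray#, readArray#, readIntArray#
  , writeArray#, writeIntArray#, (+#) )
import GHC.ST (ST (ST), runST)

-- | Boxed case: allocate two slots initialised to x, overwrite slot 1
-- with y, and read both slots back.
pairViaArray :: a -> a -> (a, a)
pairViaArray x y = runST (ST (\s0 ->
  case newArray# 2# x s0 of
    (# s1, marr #) -> case writeArray# marr 1# y s1 of
      s2 -> case readArray# marr 0# s2 of
        (# s3, a #) -> case readArray# marr 1# s3 of
          (# s4, b #) -> (# s4, (a, b) #)))

-- | Unboxed case: store two Int#s in a byte array and add them.
-- The size is in bytes (16 is enough for two Int#s on 32- and 64-bit
-- targets); the indices 0# and 1# count Int#-sized elements.
sumViaByteArray :: Int -> Int -> Int
sumViaByteArray (I# m) (I# n) = runST (ST (\s0 ->
  case newByteArray# 16# s0 of
    (# s1, mba #) -> case writeIntArray# mba 0# m s1 of
      s2 -> case writeIntArray# mba 1# n s2 of
        s3 -> case readIntArray# mba 0# s3 of
          (# s4, a #) -> case readIntArray# mba 1# s4 of
            (# s5, b #) -> (# s5, I# (a +# b) #)))
</ProgramListing>

<Para>
Each operation consumes the state token produced by the previous one,
which is what fixes the order of the reads and writes.  With these
definitions <Literal>pairViaArray 'a' 'b'</Literal> is
<Literal>('a','b')</Literal> and <Literal>sumViaByteArray 40 2</Literal>
is <Literal>42</Literal>.
</Para>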
1143
1144 </Sect3>
1145
1146 <Sect3>
1147 <Title>Equality</Title>
1148
1149 <Para>
1150 <IndexTerm><Primary>arrays, testing for equality</Primary></IndexTerm>
1151 </Para>
1152
1153 <Para>
1154 One can take &ldquo;equality&rdquo; of mutable arrays.  What is compared is the
1155 <Emphasis>name</Emphasis> or reference to the mutable array, not its contents.
1156 </Para>
1157
1158 <Para>
1159
1160 <ProgramListing>
1161 sameMutableArray#     :: MutableArray# s elt -> MutableArray# s elt -> Bool
1162 sameMutableByteArray# :: MutableByteArray# s -> MutableByteArray# s -> Bool
1163 </ProgramListing>
1164
1165 <IndexTerm><Primary><literal>sameMutableArray&num;</literal></Primary></IndexTerm>
1166 <IndexTerm><Primary><literal>sameMutableByteArray&num;</literal></Primary></IndexTerm>
1167 </Para>
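
<Para>
To illustrate that it is the reference which is compared, the sketch
below allocates two arrays with identical contents.  It assumes a recent
GHC, where (unlike the signature above) the comparison returns an
<Literal>Int&num;</Literal> that <Function>isTrue&num;</Function> turns
into a <Literal>Bool</Literal>; <Function>identityDemo</Function> is a
hypothetical name.
</Para>

<ProgramListing>
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (isTrue#, newArray#, sameMutableArray#)
import GHC.ST (ST (ST), runST)

-- | Compare an array with itself, and with a second array that has
-- the same size and the same contents.
identityDemo :: (Bool, Bool)
identityDemo = runST (ST (\s0 ->
  case newArray# 1# 'x' s0 of
    (# s1, a #) -> case newArray# 1# 'x' s1 of
      (# s2, b #) ->
        (# s2, ( isTrue# (sameMutableArray# a a)
               , isTrue# (sameMutableArray# a b) ) #)))
</ProgramListing>

<Para>
<Function>identityDemo</Function> evaluates to
<Literal>(True,False)</Literal>: the second pair of arrays is equal
element by element, but they are distinct objects.
</Para>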
1168
1169 </Sect3>
1170
1171 <Sect3>
1172 <Title>Freezing mutable arrays</Title>
1173
1174 <Para>
1175 <IndexTerm><Primary>arrays, freezing mutable</Primary></IndexTerm>
1176 <IndexTerm><Primary>freezing mutable arrays</Primary></IndexTerm>
1177 <IndexTerm><Primary>mutable arrays, freezing</Primary></IndexTerm>
1178 </Para>
1179
1180 <Para>
1181 Only unsafe-freeze has a primitive.  (Safe freeze is done directly in Haskell
1182 by copying the array and then using <Function>unsafeFreeze</Function>.)
1183 </Para>
1184
1185 <Para>
1186
<ProgramListing>
unsafeFreezeArray#     :: MutableArray# s elt -> State# s -> (# State# s, Array# elt #)
unsafeFreezeByteArray# :: MutableByteArray# s -> State# s -> (# State# s, ByteArray# #)
</ProgramListing>
1191
1192 <IndexTerm><Primary><literal>unsafeFreezeArray&num;</literal></Primary></IndexTerm>
1193 <IndexTerm><Primary><literal>unsafeFreezeByteArray&num;</literal></Primary></IndexTerm>
1194 </Para>
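
<Para>
A typical use of the unsafe freeze is to build an immutable array by
filling a mutable one in place and then freezing it without a copy.  The
sketch below assumes a recent GHC (primitives from
<Literal>GHC.Exts</Literal>, including
<Function>sizeofArray&num;</Function>); <Literal>Arr</Literal>,
<Function>fill</Function>, <Function>fromList</Function> and
<Function>toList</Function> are hypothetical names.
</Para>

<ProgramListing>
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
  ( Array#, Int (I#), Int#, MutableArray#, State#, indexArray#, newArray#
  , sizeofArray#, unsafeFreezeArray#, writeArray#, (+#) )
import GHC.ST (ST (ST), runST)

-- Boxed wrapper so that the unlifted Array# can be returned from runST.
data Arr a = Arr (Array# a)

-- | Write the list elements into consecutive slots, starting at slot i.
fill :: MutableArray# s a -> Int# -> [a] -> State# s -> State# s
fill _    _ []       s = s
fill marr i (y : ys) s = fill marr (i +# 1#) ys (writeArray# marr i y s)

-- | Allocate, fill in place, then freeze without copying.  This is safe
-- only because marr is never written to after the freeze.
fromList :: [a] -> Arr a
fromList xs = runST (ST (\s0 ->
  case length xs of
    I# n -> case newArray# n (error "fromList: unset slot") s0 of
      (# s1, marr #) -> case fill marr 0# xs s1 of
        s2 -> case unsafeFreezeArray# marr s2 of
          (# s3, arr #) -> (# s3, Arr arr #)))

-- | Read the frozen array back out as a list.
toList :: Arr a -> [a]
toList (Arr arr) = map at [0 .. I# (sizeofArray# arr) - 1]
  where
    at (I# i) = case indexArray# arr i of (# x #) -> x
</ProgramListing>

<Para>
For example, <Literal>toList (fromList "abc")</Literal> gives back
<Literal>"abc"</Literal>.  If the mutable array could still be written
to after the freeze, the &ldquo;immutable&rdquo; array would change
under the reader's feet, which is why the primitive is called unsafe.
</Para>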
1195
1196 </Sect3>
1197
1198 </Sect2>
1199
1200 <Sect2>
<Title>Synchronising variables (M-vars)</Title>
1202
1203 <Para>
1204 <IndexTerm><Primary>synchronising variables (M-vars)</Primary></IndexTerm>
1205 <IndexTerm><Primary>M-Vars</Primary></IndexTerm>
1206 </Para>
1207
1208 <Para>
1209 Synchronising variables are the primitive type used to implement
1210 Concurrent Haskell's MVars (see the Concurrent Haskell paper for
1211 the operational behaviour of these operations).
1212 </Para>
1213
1214 <Para>
1215
<ProgramListing>
type MVar# s elt        -- primitive

newMVar#    :: State# s -> (# State# s, MVar# s elt #)
takeMVar#   :: MVar# s elt -> State# s -> (# State# s, elt #)
putMVar#    :: MVar# s elt -> elt -> State# s -> State# s
</ProgramListing>
1223
<IndexTerm><Primary><literal>MVar&num;</literal></Primary></IndexTerm>
<IndexTerm><Primary><literal>newMVar&num;</literal></Primary></IndexTerm>
<IndexTerm><Primary><literal>takeMVar&num;</literal></Primary></IndexTerm>
<IndexTerm><Primary><literal>putMVar&num;</literal></Primary></IndexTerm>
1228 </Para>
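
<Para>
As an illustration of how a boxed interface might be layered on these
primitives, the sketch below wraps <Literal>MVar&num;</Literal> in an
ordinary data type and lifts the three operations into
<Literal>IO</Literal>.  It assumes a recent GHC, with the primitives
imported from <Literal>GHC.Exts</Literal> and the <Literal>IO</Literal>
constructor from <Literal>GHC.IO</Literal>; <Literal>Box</Literal>,
<Function>newBox</Function>, <Function>putBox</Function>,
<Function>takeBox</Function> and <Function>roundTrip</Function> are
hypothetical names, not the library's own.
</Para>

<ProgramListing>
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts (MVar#, RealWorld, newMVar#, putMVar#, takeMVar#)
import GHC.IO (IO (IO))

-- Boxed wrapper around the primitive synchronising variable.
data Box a = Box (MVar# RealWorld a)

-- | Create an empty variable.
newBox :: IO (Box a)
newBox = IO (\s0 ->
  case newMVar# s0 of
    (# s1, mv #) -> (# s1, Box mv #))

-- | Fill the variable (blocks while it is already full).
putBox :: Box a -> a -> IO ()
putBox (Box mv) x = IO (\s0 ->
  case putMVar# mv x s0 of
    s1 -> (# s1, () #))

-- | Empty the variable (blocks while it is empty).
takeBox :: Box a -> IO a
takeBox (Box mv) = IO (\s0 -> takeMVar# mv s0)

-- | Put a value in and take it straight back out.
roundTrip :: a -> IO a
roundTrip x = do
  b &lt;- newBox
  putBox b x
  takeBox b
</ProgramListing>

<Para>
In a single thread, <Literal>roundTrip 'q'</Literal> returns
<Literal>'q'</Literal>; the blocking behaviour only becomes visible once
several threads share the same variable.
</Para>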
1229
1230 </Sect2>
1231
1232 </Sect1>
1233
1234 <Sect1 id="glasgow-ST-monad">
1235 <Title>Primitive state-transformer monad
1236 </Title>
1237
1238 <Para>
1239 <IndexTerm><Primary>state transformers (Glasgow extensions)</Primary></IndexTerm>
1240 <IndexTerm><Primary>ST monad (Glasgow extension)</Primary></IndexTerm>
1241 </Para>
1242
1243 <Para>
1244 This monad underlies our implementation of arrays, mutable and
1245 immutable, and our implementation of I/O, including &ldquo;C calls&rdquo;.
1246 </Para>
1247
1248 <Para>
1249 The <Literal>ST</Literal> library, which provides access to the
1250 <Function>ST</Function> monad, is described in <xref
1251 linkend="sec-ST">.
1252 </Para>
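
<Para>
To give a feel for how such a monad can be built on
<Literal>State&num;</Literal>, here is a minimal sketch of a
state-transformer type with its two sequencing operations.  It is not the
library's definition: <Literal>Trans</Literal>,
<Function>returnT</Function>, <Function>thenT</Function> and
<Function>runT</Function> are hypothetical names, and we assume a recent
GHC in which <Literal>State&num;</Literal> comes from
<Literal>GHC.Exts</Literal> and the <Literal>ST</Literal> constructor
from <Literal>GHC.ST</Literal>.
</Para>

<ProgramListing>
{-# LANGUAGE MagicHash, RankNTypes, UnboxedTuples #-}
import GHC.Exts (State#)
import GHC.ST (ST (ST), runST)

-- | A state transformer: takes the current state token and returns the
-- new token together with a result.
newtype Trans s a = Trans (State# s -> (# State# s, a #))

-- | Return a value without touching the state.
returnT :: a -> Trans s a
returnT x = Trans (\s -> (# s, x #))

-- | Sequencing: the token produced by the first step feeds the second.
thenT :: Trans s a -> (a -> Trans s b) -> Trans s b
thenT (Trans m) k = Trans (\s0 ->
  case m s0 of
    (# s1, x #) -> case k x of
      Trans m' -> m' s1)

-- | Run a transformer by handing it to the library's runST.
runT :: (forall s. Trans s a) -> a
runT t = runST (case t of Trans m -> ST m)
</ProgramListing>

<Para>
For instance,
<Literal>runT (returnT 20 `thenT` \x -> returnT (x + 22))</Literal>
evaluates to <Literal>42</Literal>.  The universally quantified
<Literal>s</Literal> in the type of <Function>runT</Function> is what
prevents a mutable object created in one state thread from escaping
into another.
</Para>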
1253
1254 </Sect1>
1255
1256 <Sect1 id="glasgow-prim-arrays">
1257 <Title>Primitive arrays, mutable and otherwise
1258 </Title>
1259
1260 <Para>
1261 <IndexTerm><Primary>primitive arrays (Glasgow extension)</Primary></IndexTerm>
1262 <IndexTerm><Primary>arrays, primitive (Glasgow extension)</Primary></IndexTerm>
1263 </Para>
1264
1265 <Para>
1266 GHC knows about quite a few flavours of Large Swathes of Bytes.
1267 </Para>
1268
1269 <Para>
1270 First, GHC distinguishes between primitive arrays of (boxed) Haskell
1271 objects (type <Literal>Array&num; obj</Literal>) and primitive arrays of bytes (type
1272 <Literal>ByteArray&num;</Literal>).
1273 </Para>
1274
1275 <Para>
1276 Second, it distinguishes between&hellip;
1277 <VariableList>
1278
1279 <VarListEntry>
1280 <Term>Immutable:</Term>
1281 <ListItem>
1282 <Para>
1283 Arrays that do not change (as with &ldquo;standard&rdquo; Haskell arrays); you
1284 can only read from them.  Obviously, they do not need the care and
1285 attention of the state-transformer monad.
1286 </Para>
1287 </ListItem>
1288 </VarListEntry>
1289 <VarListEntry>
1290 <Term>Mutable:</Term>
1291 <ListItem>
1292 <Para>
1293 Arrays that may be changed or &ldquo;mutated.&rdquo;  All the operations on them
1294 live within the state-transformer monad and the updates happen
1295 <Emphasis>in-place</Emphasis>.
1296 </Para>
1297 </ListItem>
1298 </VarListEntry>
1299 <VarListEntry>
1300 <Term>&ldquo;Static&rdquo; (in C land):</Term>
1301 <ListItem>
1302 <Para>
1303 A C routine may pass an <Literal>Addr&num;</Literal> pointer back into Haskell land.  There
1304 are then primitive operations with which you may merrily grab values
1305 over in C land, by indexing off the &ldquo;static&rdquo; pointer.
1306 </Para>
1307 </ListItem>
1308 </VarListEntry>
1309 <VarListEntry>
1310 <Term>&ldquo;Stable&rdquo; pointers:</Term>
1311 <ListItem>
1312 <Para>
1313 If, for some reason, you wish to hand a Haskell pointer (i.e.,
1314 <Emphasis>not</Emphasis> an unboxed value) to a C routine, you first make the
1315 pointer &ldquo;stable,&rdquo; so that the garbage collector won't forget that it
1316 exists.  That is, GHC provides a safe way to pass Haskell pointers to
1317 C.
1318 </Para>
1319
1320 <Para>
1321 Please see <XRef LinkEnd="glasgow-stablePtrs"> for more details.
1322 </Para>
1323 </ListItem>
1324 </VarListEntry>
1325 <VarListEntry>
1326 <Term>&ldquo;Foreign objects&rdquo;:</Term>
1327 <ListItem>
1328 <Para>
1329 A &ldquo;foreign object&rdquo; is a safe way to pass an external object (a
1330 C-allocated pointer, say) to Haskell and have Haskell do the Right
1331 Thing when it no longer references the object.  So, for example, C
1332 could pass a large bitmap over to Haskell and say &ldquo;please free this
1333 memory when you're done with it.&rdquo;
1334 </Para>
1335
1336 <Para>
1337 Please see <XRef LinkEnd="glasgow-foreignObjs"> for more details.
1338 </Para>
1339 </ListItem>
1340 </VarListEntry>
1341 </VariableList>
1342 </Para>
1343
<Para>
The libraries documentation gives more details on all these
&ldquo;primitive array&rdquo; types and the operations on them.
</Para>
1348
1349 </Sect1>
1350
1351
1352 <Sect1 id="pattern-guards">
1353 <Title>Pattern guards</Title>
1354
1355 <Para>
1356 <IndexTerm><Primary>Pattern guards (Glasgow extension)</Primary></IndexTerm>
1357 The discussion that follows is an abbreviated version of Simon Peyton Jones's original <ULink URL="http://research.microsoft.com/~simonpj/Haskell/guards.html">proposal</ULink>. (Note that the proposal was written before pattern guards were implemented, so refers to them as unimplemented.)
1358 </Para>
1359
1360 <Para>
1361 Suppose we have an abstract data type of finite maps, with a
1362 lookup operation:
1363
1364 <ProgramListing>
1365 lookup :: FiniteMap -> Int -> Maybe Int
1366 </ProgramListing>
1367
1368 The lookup returns <Function>Nothing</Function> if the supplied key is not in the domain of the mapping, and <Function>(Just v)</Function> otherwise,
1369 where <VarName>v</VarName> is the value that the key maps to.  Now consider the following definition:
1370 </Para>
1371
<ProgramListing>
clunky env var1 var2 | ok1 && ok2 = val1 + val2
                     | otherwise  = var1 + var2
  where
    m1 = lookup env var1
    m2 = lookup env var2
    ok1 = maybeToBool m1
    ok2 = maybeToBool m2
    val1 = expectJust m1
    val2 = expectJust m2
</ProgramListing>
1383
1384 <Para>
1385 The auxiliary functions are
1386 </Para>
1387
1388 <ProgramListing>
1389 maybeToBool :: Maybe a -&gt; Bool
1390 maybeToBool (Just x) = True
1391 maybeToBool Nothing  = False
1392
1393 expectJust :: Maybe a -&gt; a
1394 expectJust (Just x) = x
1395 expectJust Nothing  = error "Unexpected Nothing"
1396 </ProgramListing>
1397
<Para>
What is <Function>clunky</Function> doing? The guard <Literal>ok1 &&
ok2</Literal> checks that both lookups succeed, using
<Function>maybeToBool</Function> to convert the <Function>Maybe</Function>
values to booleans. The (lazily evaluated) <Function>expectJust</Function>
calls extract the values from the results of the lookups, and bind the
returned values to <VarName>val1</VarName> and <VarName>val2</VarName>
respectively.  If either lookup fails, then <Function>clunky</Function> takes the
<Literal>otherwise</Literal> case and returns the sum of its arguments.
</Para>
1408
1409 <Para>
1410 This is certainly legal Haskell, but it is a tremendously verbose and
1411 un-obvious way to achieve the desired effect.  Arguably, a more direct way
1412 to write clunky would be to use case expressions:
1413 </Para>
1414
<ProgramListing>
clunky env var1 var2 = case lookup env var1 of
    Nothing -&gt; fail
    Just val1 -&gt; case lookup env var2 of
      Nothing -&gt; fail
      Just val2 -&gt; val1 + val2
  where
    fail = var1 + var2
</ProgramListing>
1424
1425 <Para>
1426 This is a bit shorter, but hardly better.  Of course, we can rewrite any set
1427 of pattern-matching, guarded equations as case expressions; that is
1428 precisely what the compiler does when compiling equations! The reason that
1429 Haskell provides guarded equations is because they allow us to write down
1430 the cases we want to consider, one at a time, independently of each other.
1431 This structure is hidden in the case version.  Two of the right-hand sides
1432 are really the same (<Function>fail</Function>), and the whole expression
1433 tends to become more and more indented.
1434 </Para>
1435
1436 <Para>
1437 Here is how I would write clunky:
1438 </Para>
1439
<ProgramListing>
clunky env var1 var2
  | Just val1 &lt;- lookup env var1
  , Just val2 &lt;- lookup env var2
  = val1 + val2
...other equations for clunky...
</ProgramListing>
1447
<Para>
The semantics should be clear enough.  The qualifiers are matched in order.
For a <Literal>&lt;-</Literal> qualifier, which I call a pattern guard, the
right hand side is evaluated and matched against the pattern on the left.
If the match fails then the whole guard fails and the next equation is
tried.  If it succeeds, then the appropriate binding takes place, and the
next qualifier is matched, in the augmented environment.  Unlike list
comprehensions, however, the type of the expression to the right of the
<Literal>&lt;-</Literal> is the same as the type of the pattern to its
left.  The bindings introduced by pattern guards scope over all the
remaining guard qualifiers, and over the right hand side of the equation.
</Para>
1460
<Para>
Just as with list comprehensions, boolean expressions can be freely mixed
among the pattern guards.  For example:
</Para>
1465
1466 <ProgramListing>
1467 f x | [y] <- x
1468     , y > 3
1469     , Just z <- h y
1470     = ...
1471 </ProgramListing>
1472
1473 <Para>
1474 Haskell's current guards therefore emerge as a special case, in which the
1475 qualifier list has just one element, a boolean expression.
1476 </Para>
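
<Para>
For readers who want something to load and try, here is a small
self-contained sketch of both forms.  An association list stands in for
the abstract finite map, and <Literal>Env</Literal>,
<Function>lookupEnv</Function>, <Function>bigHalf</Function> and
<Function>halve</Function> are hypothetical names chosen for this
example only.
</Para>

<ProgramListing>
-- An association list stands in for the abstract finite map.
type Env = [(Int, Int)]

lookupEnv :: Env -> Int -> Maybe Int
lookupEnv env key = lookup key env

-- | Sum of the looked-up values if both keys are bound,
-- otherwise the sum of the keys themselves.
clunky :: Env -> Int -> Int -> Int
clunky env var1 var2
  | Just val1 &lt;- lookupEnv env var1
  , Just val2 &lt;- lookupEnv env var2
  = val1 + val2
clunky _ var1 var2 = var1 + var2

-- | Half of n, provided n is even.
halve :: Int -> Maybe Int
halve n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing

-- | Pattern guards mixed with a boolean guard: succeeds only for a
-- singleton list whose element exceeds 3 and can be halved.
bigHalf :: [Int] -> Maybe Int
bigHalf xs
  | [y] &lt;- xs
  , y > 3
  , Just z &lt;- halve y
  = Just z
  | otherwise = Nothing
</ProgramListing>

<Para>
With the environment <Literal>[(1,10),(2,20)]</Literal>,
<Literal>clunky env 1 2</Literal> is <Literal>30</Literal>; if key
<Literal>2</Literal> is missing the first equation fails as a whole and
the result is <Literal>3</Literal>.  Likewise
<Literal>bigHalf [8]</Literal> is <Literal>Just 4</Literal>, while
<Literal>bigHalf [2]</Literal> and <Literal>bigHalf [5]</Literal> are
both <Literal>Nothing</Literal>.
</Para>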
1477 </Sect1>
1478
1479 <Sect1 id="sec-ffi">
1480 <Title>The foreign interface</Title>
1481
1482 <Para>
1483 The foreign interface consists of language and library support. The former
1484 is described later in <XRef LinkEnd="ffi">; the latter is outlined below,
1485 and detailed in <XRef LinkEnd="sec-Foreign">.
1486 </Para>
1487
1488 <Sect2 id="glasgow-foreign-headers">
1489 <Title>Using function headers
1490 </Title>
1491
1492 <Para>
1493 <IndexTerm><Primary>C calls, function headers</Primary></IndexTerm>
1494 </Para>
1495
<Para>
When generating C (using the <Option>-fvia-C</Option> option), one can assist the
C compiler in detecting type errors by using the <Command>-&num;include</Command> option
to provide <Filename>.h</Filename> files containing function headers.
</Para>
1501
1502 <Para>
1503 For example,
1504 </Para>
1505
1506 <Para>
1507
1508 <ProgramListing>
1509 #include "HsFFI.h"
1510
1511 void         initialiseEFS (HsInt size);
1512 HsInt        terminateEFS (void);
1513 HsForeignObj emptyEFS(void);
1514 HsForeignObj updateEFS (HsForeignObj a, HsInt i, HsInt x);
1515 HsInt        lookupEFS (HsForeignObj a, HsInt i);
1516 </ProgramListing>
1517 </Para>
1518
1519       <para>The types <literal>HsInt</literal>,
1520       <literal>HsForeignObj</literal> etc. are described in <xref
1521       linkend="sec-mapping-table">.</Para>
1522
1523       <Para>Note that this approach is only
1524       <Emphasis>essential</Emphasis> for returning
1525       <Literal>float</Literal>s (or if <Literal>sizeof(int) !=
1526       sizeof(int *)</Literal> on your architecture) but is a Good
1527       Thing for anyone who cares about writing solid code.  You're
1528       crazy not to do it.</Para>
1529
1530 </Sect2>
1531
1532 <Sect2 id="glasgow-stablePtrs">
1533 <Title>Subverting automatic unboxing with &ldquo;stable pointers&rdquo;
1534 </Title>
1535
1536 <Para>
1537 <IndexTerm><Primary>stable pointers (Glasgow extension)</Primary></IndexTerm>
1538 </Para>
1539
<Para>
The arguments of a <Function>&lowbar;ccall&lowbar;</Function> are automatically unboxed before the
call.  There are two reasons why this is usually the Right Thing to
do:
</Para>
1545
1546 <Para>
1547
1548 <ItemizedList>
1549 <ListItem>
1550
1551 <Para>
1552 C is a strict language: it would be excessively tedious to pass
1553 unevaluated arguments and require the C programmer to force their
1554 evaluation before using them.
1555
1556 </Para>
1557 </ListItem>
1558 <ListItem>
1559
1560 <Para>
1561  Boxed values are stored on the Haskell heap and may be moved
1562 within the heap if a garbage collection occurs&mdash;that is, pointers
1563 to boxed objects are not <Emphasis>stable</Emphasis>.
1564 </Para>
1565 </ListItem>
1566
1567 </ItemizedList>
1568
1569 </Para>
1570
<Para>
It is possible to subvert the unboxing process by creating a &ldquo;stable
pointer&rdquo; to a value and passing the stable pointer instead.  For
example, to pass an integer lazily to a C function <Function>storeC</Function> and
later retrieve it from a C function <Function>fetchC</Function>, one might write:
</Para>
1577
1578 <Para>
1579
<ProgramListing>
storeH :: Int -> IO ()
storeH x = makeStablePtr x              >>= \ stable_x ->
           _ccall_ storeC stable_x

fetchH :: IO Int
fetchH   = _ccall_ fetchC               >>= \ stable_x ->
           deRefStablePtr stable_x      >>= \ x ->
           freeStablePtr stable_x       >>
           return x
</ProgramListing>
1591
1592 </Para>
1593
1594 <Para>
1595 The garbage collector will refrain from throwing a stable pointer away
1596 until you explicitly call one of the following from C or Haskell.
1597 </Para>
1598
1599 <Para>
1600
1601 <ProgramListing>
1602 void freeStablePointer( StgStablePtr stablePtrToToss )
1603 freeStablePtr :: StablePtr a -> IO ()
1604 </ProgramListing>
1605
1606 </Para>
1607
1608 <Para>
1609 As with the use of <Function>free</Function> in C programs, GREAT CARE SHOULD BE
1610 EXERCISED to ensure these functions are called at the right time: too
1611 early and you get dangling references (and, if you're lucky, an error
1612 message from the runtime system); too late and you get space leaks.
1613 </Para>
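
<Para>
The life cycle of a stable pointer can be tried out without any C code
at all.  The sketch below is written against the
<Literal>Foreign.StablePtr</Literal> module of current GHC libraries,
where (as far as we know) the allocation function is called
<Function>newStablePtr</Function> rather than
<Function>makeStablePtr</Function>; the functions
<Function>store</Function> and <Function>fetch</Function> are
hypothetical stand-ins for the C side, which would simply hold on to the
raw pointer.
</Para>

<ProgramListing>
import Foreign.Ptr (Ptr)
import Foreign.StablePtr
  ( StablePtr, castPtrToStablePtr, castStablePtrToPtr
  , deRefStablePtr, freeStablePtr, newStablePtr )

-- | Stand-in for storeC: make the value stable and hand out the raw
-- pointer that C would keep.  The value x is not evaluated here.
store :: Int -> IO (Ptr ())
store x = do
  stable &lt;- newStablePtr x
  return (castStablePtrToPtr stable)

-- | Stand-in for fetchC: recover the value, then release the stable
-- pointer exactly once, after its last use.
fetch :: Ptr () -> IO Int
fetch raw = do
  let stable = castPtrToStablePtr raw :: StablePtr Int
  x &lt;- deRefStablePtr stable
  freeStablePtr stable
  return x
</ProgramListing>

<Para>
Running <Literal>store 42 >>= fetch</Literal> yields
<Literal>42</Literal>.  Calling <Function>fetch</Function> a second time
on the same raw pointer would dereference a freed stable pointer, which
is exactly the dangling-reference hazard described above.
</Para>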
1614
<Para>
To force evaluation of the argument from within the C code, one would
call one of the following C functions (according to the type of the argument).
</Para>
1619
1620 <Para>
1621
1622 <ProgramListing>
1623 void     performIO  ( StgStablePtr stableIndex /* StablePtr s (IO ()) */ );
1624 StgInt   enterInt   ( StgStablePtr stableIndex /* StablePtr s Int */ );
1625 StgFloat enterFloat ( StgStablePtr stableIndex /* StablePtr s Float */ );
1626 </ProgramListing>
1627
1628 </Para>
1629
1630 <Para>
1631 <IndexTerm><Primary>performIO</Primary></IndexTerm>
1632 <IndexTerm><Primary>enterInt</Primary></IndexTerm>
1633 <IndexTerm><Primary>enterFloat</Primary></IndexTerm>
1634 </Para>
1635
1636 <Para>
1637 Nota Bene: <Function>&lowbar;ccall&lowbar;GC&lowbar;</Function><IndexTerm><Primary>&lowbar;ccall&lowbar;GC&lowbar;</Primary></IndexTerm> must be used if any of
1638 these functions are used.
1639 </Para>
1640
1641 </Sect2>
1642
1643 <Sect2 id="glasgow-foreignObjs">
1644 <Title>Foreign objects: pointing outside the Haskell heap
1645 </Title>
1646
1647 <Para>
1648 <IndexTerm><Primary>foreign objects (Glasgow extension)</Primary></IndexTerm>
1649 </Para>
1650
1651 <Para>
1652 There are two types that GHC programs can use to reference
1653 (heap-allocated) objects outside the Haskell world: <Literal>Addr</Literal> and
1654 <Literal>ForeignObj</Literal>.
1655 </Para>
1656
<Para>
If you use <Literal>Addr</Literal>, it is up to you, the programmer, to arrange
allocation and deallocation of the objects.
</Para>
1661
<Para>
If you use <Literal>ForeignObj</Literal>, GHC's garbage collector will call upon the
user-supplied <Emphasis>finaliser</Emphasis> function to free the object when the
Haskell world can no longer access the object.  (An object is
associated with a finaliser function when the abstract
Haskell type <Literal>ForeignObj</Literal> is created.)  The finaliser function is
expressed in C, and is passed the object as its argument:
</Para>
1670
1671 <Para>
1672
1673 <ProgramListing>
1674 void foreignFinaliser ( StgForeignObj fo )
1675 </ProgramListing>
1676
1677 </Para>
1678
<Para>
Since <Literal>ForeignObj</Literal>s are only released when a garbage
collection occurs, we provide ways of triggering a garbage collection
from within C and from within Haskell.
</Para>
1685
1686 <Para>
1687
1688 <ProgramListing>
1689 void GarbageCollect()
1690 performGC :: IO ()
1691 </ProgramListing>
1692
1693 </Para>
1694
1695 <Para>
1696 More information on the programmers' interface to <Literal>ForeignObj</Literal> can be
1697 found in the library documentation.
1698 </Para>
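
<Para>
As a hedged illustration of the same idea in current GHC libraries, the
sketch below uses <Literal>ForeignPtr</Literal> from
<Literal>Foreign.ForeignPtr</Literal>, which we take to be the closest
present-day counterpart of <Literal>ForeignObj</Literal>; that
correspondence, and the names <Function>newCell</Function>,
<Function>readCell</Function> and <Function>demo</Function>, are
assumptions made for this example.  The finaliser here is the C
function <Function>free</Function>, supplied by the library as
<Function>finalizerFree</Function>.
</Para>

<ProgramListing>
import Foreign.C.Types (CInt)
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree, malloc)
import Foreign.Storable (peek, poke)
import System.Mem (performGC)

-- | Allocate a C int outside the Haskell heap and attach free() as
-- its finaliser, so nobody has to release it by hand.
newCell :: CInt -> IO (ForeignPtr CInt)
newCell v = do
  p &lt;- malloc
  poke p v
  newForeignPtr finalizerFree p

-- | Read the C int; the object is kept alive for the duration.
readCell :: ForeignPtr CInt -> IO CInt
readCell fp = withForeignPtr fp peek

-- | Create a cell, read it, then request a garbage collection so that
-- the now-unreachable cell becomes eligible for finalisation.
demo :: IO CInt
demo = do
  cell &lt;- newCell 7
  v &lt;- readCell cell
  performGC
  return v
</ProgramListing>

<Para>
<Function>demo</Function> returns <Literal>7</Literal>.  Note that
<Function>performGC</Function> only requests a collection; exactly when
the finaliser runs afterwards is up to the runtime system.
</Para>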
1699
1700 </Sect2>
1701
1702 <Sect2 id="glasgow-avoiding-monads">
1703 <Title>Avoiding monads
1704 </Title>
1705
1706 <Para>
1707 <IndexTerm><Primary>C calls to `pure C'</Primary></IndexTerm>
1708 <IndexTerm><Primary>unsafePerformIO</Primary></IndexTerm>
1709 </Para>
1710
1711 <Para>
1712 The <Function>&lowbar;ccall&lowbar;</Function> construct is part of the <Literal>IO</Literal> monad because 9 out of 10
1713 uses will be to call imperative functions with side effects such as
1714 <Function>printf</Function>.  Use of the monad ensures that these operations happen in a
1715 predictable order in spite of laziness and compiler optimisations.
1716 </Para>
1717
1718 <Para>
1719 To avoid having to be in the monad to call a C function, it is
1720 possible to use <Function>unsafePerformIO</Function>, which is available from the
1721 <Literal>IOExts</Literal> module.  There are three situations where one might like to
1722 call a C function from outside the IO world:
1723 </Para>
1724
1725 <Para>
1726
1727 <ItemizedList>
1728 <ListItem>
1729
1730 <Para>
1731 Calling a function with no side-effects:
1732
1733 <ProgramListing>
1734 atan2d :: Double -> Double -> Double
1735 atan2d y x = unsafePerformIO (_ccall_ atan2d y x)
1736
1737 sincosd :: Double -> (Double, Double)
1738 sincosd x = unsafePerformIO $ do
1739         da &#60;- newDoubleArray (0, 1)
1740         _casm_ &ldquo;sincosd( %0, &amp;((double *)%1[0]), &amp;((double *)%1[1]) );&rdquo; x da
1741         s &#60;- readDoubleArray da 0
1742         c &#60;- readDoubleArray da 1
1743         return (s, c)
1744 </ProgramListing>
1745
1746
1747 </Para>
1748 </ListItem>
1749 <ListItem>
1750
1751 <Para>
1752  Calling a set of functions which have side-effects but which can
1753 be used in a purely functional manner.
1754
1755 For example, an imperative implementation of a purely functional
1756 lookup-table might be accessed using the following functions.
1757
1758
1759 <ProgramListing>
1760 empty  :: EFS x
1761 update :: EFS x -> Int -> x -> EFS x
1762 lookup :: EFS a -> Int -> a
1763
1764 empty = unsafePerformIO (_ccall_ emptyEFS)
1765
1766 update a i x = unsafePerformIO $
1767         makeStablePtr x         >>= \ stable_x ->
1768         _ccall_ updateEFS a i stable_x
1769
1770 lookup a i = unsafePerformIO $
1771         _ccall_ lookupEFS a i   >>= \ stable_x ->
1772         deRefStablePtr stable_x
1773 </ProgramListing>
1774
1775
1776 You will almost always want to use <Literal>ForeignObj</Literal>s with this.
1777
1778 </Para>
1779 </ListItem>
1780 <ListItem>
1781
1782 <Para>
1783  Calling a side-effecting function even though the results will
1784 be unpredictable.  For example the <Function>trace</Function> function is defined by:
1785
1786
1787 <ProgramListing>
1788 trace :: String -> a -> a
1789 trace string expr
1790   = unsafePerformIO (
1791         ((_ccall_ PreTraceHook sTDERR{-msg-}):: IO ())  >>
1792         fputs sTDERR string                             >>
1793         ((_ccall_ PostTraceHook sTDERR{-msg-}):: IO ()) >>
1794         return expr )
1795   where
1796     sTDERR = (&ldquo;stderr&rdquo; :: Addr)
1797 </ProgramListing>
1798
1799
1800 (This kind of use is not highly recommended&mdash;it is only really
1801 useful in debugging code.)
1802 </Para>
1803 </ListItem>
1804
1805 </ItemizedList>
1806
1807 </Para>
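
<Para>
For comparison, the first situation (a C function with no side effects)
might look as follows on a present-day GHC.  This is a sketch under that
assumption, not the mechanism described above: it uses a
<Literal>foreign import</Literal> declaration in place of
<Function>&lowbar;ccall&lowbar;</Function>, it takes
<Function>unsafePerformIO</Function> from <Literal>System.IO.Unsafe</Literal>,
and it calls the C library's <Function>atan2</Function>.  The names
<Function>c_atan2</Function> and <Function>atan2C</Function> are hypothetical.
</Para>

<Para>

<ProgramListing>
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble (..))
import System.IO.Unsafe (unsafePerformIO)

-- Import the C function with an IO result, mirroring what _ccall_ yields.
foreign import ccall unsafe "math.h atan2"
  c_atan2 :: CDouble -> CDouble -> IO CDouble

-- Wrap it as a pure function.  This is only defensible because atan2
-- has no side effects and always returns the same result for the same
-- arguments.
atan2C :: Double -> Double -> Double
atan2C y x =
  realToFrac (unsafePerformIO (c_atan2 (realToFrac y) (realToFrac x)))
</ProgramListing>

</Para>

<Para>
The obligation is the same as in the examples above: the programmer, not
the compiler, vouches that the call is safe to treat as pure.
</Para>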
1808
1809 </Sect2>
1810
1811 <Sect2 id="ccall-gotchas">
1812 <Title>C-calling &ldquo;gotchas&rdquo; checklist
1813 </Title>
1814
1815 <Para>
1816 <IndexTerm><Primary>C call dangers</Primary></IndexTerm>
1817 <IndexTerm><Primary>CCallable</Primary></IndexTerm>
1818 <IndexTerm><Primary>CReturnable</Primary></IndexTerm>
1819 </Para>
1820
1821 <Para>
1822 And some advice, too.
1823 </Para>
1824
1825 <Para>
1826
1827 <ItemizedList>
1828 <ListItem>
1829
1830 <Para>
1831  For modules that use <Function>&lowbar;ccall&lowbar;</Function>s, etc., compile with
1832 <Option>-fvia-C</Option>.<IndexTerm><Primary>-fvia-C option</Primary></IndexTerm> You don't have to, but you should.
1833
1834 Also, use the <Option>-&num;include "prototypes.h"</Option> flag (hack) to inform the C
1835 compiler of the fully-prototyped types of all the C functions you
1836 call.  (<XRef LinkEnd="glasgow-foreign-headers"> says more about this&hellip;)
1837
1838 This scheme is the <Emphasis>only</Emphasis> way that you will get <Emphasis>any</Emphasis>
1839 typechecking of your <Function>&lowbar;ccall&lowbar;</Function>s.  (It shouldn't be that way, but&hellip;).
1840 GHC will pass the flag <Option>-Wimplicit</Option> to <Command>gcc</Command> so that you'll get warnings
1841 if any <Function>&lowbar;ccall&lowbar;</Function>ed functions have no prototypes.
1842
1843 </Para>
1844 </ListItem>
1845 <ListItem>
1846
1847 <Para>
1848 Try to avoid <Function>&lowbar;ccall&lowbar;</Function>s to C&nbsp;functions that take <Literal>float</Literal>
1849 arguments or return <Literal>float</Literal> results.  Reason: if you do, you will
1850 become entangled in (ANSI?) C's rules for when arguments/results are
1851 promoted to <Literal>doubles</Literal>.  It's a nightmare and just not worth it.
1852 Use <Literal>doubles</Literal> if possible.
1853
1854 If you do use <Literal>floats</Literal>, check and re-check that the right thing is
1855 happening.  Perhaps compile with <Option>-keep-hc-file-too</Option> and look at
1856 the intermediate C (<Function>.hc</Function>).
1857
1858 </Para>
1859 </ListItem>
1860 <ListItem>
1861
1862 <Para>
1863  The compiler uses two non-standard type-classes when
1864 type-checking the arguments and results of <Function>&lowbar;ccall&lowbar;</Function>: the arguments
1865 (respectively result) of <Function>&lowbar;ccall&lowbar;</Function> must be instances of the class
1866 <Literal>CCallable</Literal> (respectively <Literal>CReturnable</Literal>).  Both classes may be
1867 imported from the module <Literal>CCall</Literal>, but this should only be
1868 necessary if you want to define a new instance.  (Neither class
1869 defines any methods&mdash;their only function is to keep the
1870 type-checker happy.)
1871
1872 The type checker must be able to figure out just which of the
1873 C-callable/returnable types is being used.  If it can't, you have to
1874 add type signatures. For example,
1875
1876
1877 <ProgramListing>
1878 f x = _ccall_ foo x
1879 </ProgramListing>
1880
1881
1882 is not good enough, because the compiler can't work out what type <VarName>x</VarName>
1883 is, nor what type the <Function>&lowbar;ccall&lowbar;</Function> returns.  You have to write, say:
1884
1885
1886 <ProgramListing>
1887 f :: Int -> IO Double
1888 f x = _ccall_ foo x
1889 </ProgramListing>
1890
1891
1892 This table summarises the standard instances of these classes.
1893
1894 <InformalTable>
1895 <TGroup Cols="4">
1896 <ColSpec Align="Left" Colsep="0">
1897 <ColSpec Align="Left" Colsep="0">
1898 <ColSpec Align="Left" Colsep="0">
1899 <ColSpec Align="Left" Colsep="0">
1900 <TBody>
1901 <Row>
1902 <Entry><Emphasis>Type</Emphasis> </Entry>
1903 <Entry><Emphasis>CCallable</Emphasis></Entry>
1904 <Entry><Emphasis>CReturnable</Emphasis> </Entry>
1905 <Entry><Emphasis>Which is probably&hellip;</Emphasis> </Entry>
1906 </Row>
1907 <Row>
1908 <Entry>
1909 <Literal>Char</Literal> </Entry>
1910 <Entry> Yes </Entry>
1911 <Entry> Yes </Entry>
1912 <Entry> <Literal>unsigned char</Literal> </Entry>
1913 </Row>
1914 <Row>
1915 <Entry>
1916 <Literal>Int</Literal> </Entry>
1917 <Entry> Yes </Entry>
1918 <Entry> Yes </Entry>
1919 <Entry> <Literal>long int</Literal> </Entry>
1920 </Row>
1921 <Row>
1922 <Entry>
1923 <Literal>Word</Literal> </Entry>
1924 <Entry> Yes </Entry>
1925 <Entry> Yes </Entry>
1926 <Entry> <Literal>unsigned long int</Literal> </Entry>
1927 </Row>
1928 <Row>
1929 <Entry>
1930 <Literal>Addr</Literal> </Entry>
1931 <Entry> Yes </Entry>
1932 <Entry> Yes </Entry>
1933 <Entry> <Literal>void *</Literal> </Entry>
1934 </Row>
1935 <Row>
1936 <Entry>
1937 <Literal>Float</Literal> </Entry>
1938 <Entry> Yes </Entry>
1939 <Entry> Yes </Entry>
1940 <Entry> <Literal>float</Literal> </Entry>
1941 </Row>
1942 <Row>
1943 <Entry>
1944 <Literal>Double</Literal> </Entry>
1945 <Entry> Yes </Entry>
1946 <Entry> Yes </Entry>
1947 <Entry> <Literal>double</Literal> </Entry>
1948 </Row>
1949 <Row>
1950 <Entry>
1951 <Literal>()</Literal> </Entry>
1952 <Entry> No </Entry>
1953 <Entry> Yes </Entry>
1954 <Entry> <Literal>void</Literal> </Entry>
1955 </Row>
1956 <Row>
1957 <Entry>
1958 <Literal>[Char]</Literal> </Entry>
1959 <Entry> Yes </Entry>
1960 <Entry> No </Entry>
1961 <Entry> <Literal>char *</Literal> (null-terminated) </Entry>
1962 </Row>
1963 <Row>
1964 <Entry>
1965 <Literal>Array</Literal> </Entry>
1966 <Entry> Yes </Entry>
1967 <Entry> No </Entry>
1968 <Entry> <Literal>unsigned long *</Literal> </Entry>
1969 </Row>
1970 <Row>
1971 <Entry>
1972 <Literal>ByteArray</Literal> </Entry>
1973 <Entry> Yes </Entry>
1974 <Entry> No </Entry>
1975 <Entry> <Literal>unsigned long *</Literal> </Entry>
1976 </Row>
1977 <Row>
1978 <Entry>
1979 <Literal>MutableArray</Literal> </Entry>
1980 <Entry> Yes </Entry>
1981 <Entry> No </Entry>
1982 <Entry> <Literal>unsigned long *</Literal> </Entry>
1983 </Row>
1984 <Row>
1985 <Entry>
1986 <Literal>MutableByteArray</Literal> </Entry>
1987 <Entry> Yes </Entry>
1988 <Entry> No </Entry>
1989 <Entry> <Literal>unsigned long *</Literal> </Entry>
1990 </Row>
1991 <Row>
1992 <Entry>
1993 <Literal>State</Literal> </Entry>
1994 <Entry> Yes </Entry>
1995 <Entry> Yes </Entry>
1996 <Entry> nothing!</Entry>
1997 </Row>
1998 <Row>
1999 <Entry>
2000 <Literal>StablePtr</Literal> </Entry>
2001 <Entry> Yes </Entry>
2002 <Entry> Yes </Entry>
2003 <Entry> <Literal>unsigned long *</Literal> </Entry>
2004 </Row>
2005 <Row>
2006 <Entry>
2007 <Literal>ForeignObjs</Literal> </Entry>
2008 <Entry> Yes </Entry>
2009 <Entry> Yes </Entry>
2010 <Entry> see later </Entry>
2011 </Row>
2012
2013 </TBody>
2014
2015 </TGroup>
2016 </InformalTable>
2017
2018 Actually, the <Literal>Word</Literal> type is defined as being the same size as a
2019 pointer on the target architecture, which is <Emphasis>probably</Emphasis>
2020 <Literal>unsigned long int</Literal>.
2021
2022 The brave and careful programmer can add their own instances of these
2023 classes for the following types:
2024
2025
2026 <ItemizedList>
2027 <ListItem>
2028
2029 <Para>
2030 A <Emphasis>boxed-primitive</Emphasis> type may be made an instance of both
2031 <Literal>CCallable</Literal> and <Literal>CReturnable</Literal>.
2032
2033 A boxed primitive type is any data type with a
2034 single unary constructor with a single primitive argument.  For
2035 example, the following are all boxed primitive types:
2036
2037
2038 <ProgramListing>
2039 Int
2040 Double
2041 data XDisplay = XDisplay Addr#
2042 data EFS a = EFS# ForeignObj#
2043 </ProgramListing>
2044
2045
2046
2047 <ProgramListing>
2048 instance CCallable   (EFS a)
2049 instance CReturnable (EFS a)
2050 </ProgramListing>
2051
2052
2053 </Para>
2054 </ListItem>
2055 <ListItem>
2056
2057 <Para>
2058  Any datatype with a single nullary constructor may be made an
2059 instance of <Literal>CReturnable</Literal>.  For example:
2060
2061
2062 <ProgramListing>
2063 data MyVoid = MyVoid
2064 instance CReturnable MyVoid
2065 </ProgramListing>
2066
2067
2068 </Para>
2069 </ListItem>
2070 <ListItem>
2071
2072 <Para>
2073  As at version 2.09, <Literal>String</Literal> (i.e., <Literal>[Char]</Literal>) is still
2074 not a <Literal>CReturnable</Literal> type.
2075
2076 Also, the now-builtin type <Literal>PackedString</Literal> is neither
2077 <Literal>CCallable</Literal> nor <Literal>CReturnable</Literal>.  (But there are functions in
2078 the PackedString interface to let you get at the necessary bits&hellip;)
2079 </Para>
2080 </ListItem>
2081
2082 </ItemizedList>
2083
2084
2085 </Para>
2086 </ListItem>
2087 <ListItem>
2088
2089 <Para>
2090  The code-generator will complain if you attempt to use <Literal>&percnt;r</Literal> in
2091 a <Literal>&lowbar;casm&lowbar;</Literal> whose result type is <Literal>IO ()</Literal>; or if you don't use <Literal>&percnt;r</Literal>
2092 <Emphasis>precisely</Emphasis> once for any other result type.  These messages are
2093 supposed to be helpful and catch bugs&mdash;please tell us if they wreck
2094 your life.
2095
2096 </Para>
2097 </ListItem>
2098 <ListItem>
2099
2100 <Para>
2101  If you call out to C code which may trigger the Haskell garbage
2102 collector or create new threads (examples of this later&hellip;), then you
2103 must use the <Function>&lowbar;ccall&lowbar;GC&lowbar;</Function><IndexTerm><Primary>&lowbar;ccall&lowbar;GC&lowbar; primitive</Primary></IndexTerm> or
2104 <Function>&lowbar;casm&lowbar;GC&lowbar;</Function><IndexTerm><Primary>&lowbar;casm&lowbar;GC&lowbar; primitive</Primary></IndexTerm> variant of C-calls.  (This
2105 does not work with the native code generator&mdash;use <Option>-fvia-C</Option>.) This
2106 stuff is hairy with a capital H!
2107 </Para>
2108 </ListItem>
2109
2110 </ItemizedList>
2111
2112 </Para>
2113
2114 </Sect2>
2115
2116 </Sect1>
2117
2118 <Sect1 id="multi-param-type-classes">
2119 <Title>Multi-parameter type classes
2120 </Title>
2121
2122 <Para>
2123 This section documents GHC's implementation of multi-parameter type
2124 classes.  There's lots of background in the paper <ULink
2125 URL="http://research.microsoft.com/~simonpj/multi.ps.gz" >Type
2126 classes: exploring the design space</ULink > (Simon Peyton Jones, Mark
2127 Jones, Erik Meijer).
2128 </Para>
2129
2130 <Para>
I'd like to thank people who reported shortcomings in the GHC 3.02
2132 implementation.  Our default decisions were all conservative ones, and
2133 the experience of these heroic pioneers has given useful concrete
2134 examples to support several generalisations.  (These appear below as
2135 design choices not implemented in 3.02.)
2136 </Para>
2137
2138 <Para>
2139 I've discussed these notes with Mark Jones, and I believe that Hugs
2140 will migrate towards the same design choices as I outline here.
2141 Thanks to him, and to many others who have offered very useful
2142 feedback.
2143 </Para>
2144
2145 <Sect2>
2146 <Title>Types</Title>
2147
2148 <Para>
2149 There are the following restrictions on the form of a qualified
2150 type:
2151 </Para>
2152
2153 <Para>
2154
2155 <ProgramListing>
  forall tv1..tvn. (c1, ...,cn) => type
2157 </ProgramListing>
2158
2159 </Para>
2160
2161 <Para>
2162 (Here, I write the "foralls" explicitly, although the Haskell source
2163 language omits them; in Haskell 1.4, all the free type variables of an
2164 explicit source-language type signature are universally quantified,
2165 except for the class type variables in a class declaration.  However,
2166 in GHC, you can give the foralls if you want.  See <XRef LinkEnd="universal-quantification">).
2167 </Para>
2168
2169 <Para>
2170
2171 <OrderedList>
2172 <ListItem>
2173
2174 <Para>
2175  <Emphasis>Each universally quantified type variable
2176 <Literal>tvi</Literal> must be mentioned (i.e. appear free) in <Literal>type</Literal></Emphasis>.
2177
2178 The reason for this is that a value with a type that does not obey
2179 this restriction could not be used without introducing
2180 ambiguity. Here, for example, is an illegal type:
2181
2182
2183 <ProgramListing>
2184   forall a. Eq a => Int
2185 </ProgramListing>
2186
2187
2188 When a value with this type was used, the constraint <Literal>Eq tv</Literal>
2189 would be introduced where <Literal>tv</Literal> is a fresh type variable, and
2190 (in the dictionary-translation implementation) the value would be
2191 applied to a dictionary for <Literal>Eq tv</Literal>.  The difficulty is that we
2192 can never know which instance of <Literal>Eq</Literal> to use because we never
2193 get any more information about <Literal>tv</Literal>.
2194
2195 </Para>
2196 </ListItem>
2197 <ListItem>
2198
2199 <Para>
2200  <Emphasis>Every constraint <Literal>ci</Literal> must mention at least one of the
2201 universally quantified type variables <Literal>tvi</Literal></Emphasis>.
2202
For example, this type is OK because <Literal>C a b</Literal> mentions the
universally quantified type variable <Literal>a</Literal>:
2205
2206
2207 <ProgramListing>
2208   forall a. C a b => burble
2209 </ProgramListing>
2210
2211
2212 The next type is illegal because the constraint <Literal>Eq b</Literal> does not
2213 mention <Literal>a</Literal>:
2214
2215
2216 <ProgramListing>
2217   forall a. Eq b => burble
2218 </ProgramListing>
2219
2220
2221 The reason for this restriction is milder than the other one.  The
2222 excluded types are never useful or necessary (because the offending
2223 context doesn't need to be witnessed at this point; it can be floated
2224 out).  Furthermore, floating them out increases sharing. Lastly,
2225 excluding them is a conservative choice; it leaves a patch of
2226 territory free in case we need it later.
2227
2228 </Para>
2229 </ListItem>
2230
2231 </OrderedList>
2232
2233 </Para>
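
<Para>
The first restriction can be tried out with a small sketch.  It assumes a
present-day GHC, where the <Literal>forall</Literal> syntax is switched on by
the <Literal>ExplicitForAll</Literal> pragma rather than by
<Option>-fglasgow-exts</Option>; the function names <Function>count</Function>
and <Function>bad</Function> are hypothetical.
</Para>

<Para>

<ProgramListing>
{-# LANGUAGE ExplicitForAll #-}

-- Accepted: the quantified variable a occurs in the type after the
-- context, and the constraint Eq a mentions it.
count :: forall a. Eq a => a -> [a] -> Int
count x = length . filter (== x)

-- Rejected (left commented out): a is quantified but never occurs in
-- the type, so no use site could determine which Eq dictionary to pass.
-- bad :: forall a. Eq a => Int
-- bad = 0
</ProgramListing>

</Para>

<Para>
Uncommenting <Function>bad</Function> should produce an ambiguity error at
the signature itself, before any use of the value is considered.
</Para>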
2234
2235 <Para>
2236 These restrictions apply to all types, whether declared in a type signature
2237 or inferred.
2238 </Para>
2239
2240 <Para>
2241 Unlike Haskell 1.4, constraints in types do <Emphasis>not</Emphasis> have to be of
2242 the form <Emphasis>(class type-variables)</Emphasis>.  Thus, these type signatures
are perfectly OK:
2244 </Para>
2245
2246 <Para>
2247
2248 <ProgramListing>
2249   f :: Eq (m a) => [m a] -> [m a]
2250   g :: Eq [a] => ...
2251 </ProgramListing>
2252
2253 </Para>
2254
2255 <Para>
2256 This choice recovers principal types, a property that Haskell 1.4 does not have.
2257 </Para>
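
<Para>
Signatures of this shape can be exercised as in the sketch below.  It
assumes a present-day GHC, where the relaxation is enabled by the
<Literal>FlexibleContexts</Literal> pragma; <Function>dedup</Function> and
<Function>sameList</Function> are hypothetical names chosen to give the two
signature forms a body.
</Para>

<Para>

<ProgramListing>
{-# LANGUAGE FlexibleContexts #-}
import Data.List (nub)

-- The constraint mentions the applied type (m a), not a bare variable.
dedup :: Eq (m a) => [m a] -> [m a]
dedup = nub

-- The constraint mentions the list type [a] directly.
sameList :: Eq [a] => [a] -> [a] -> Bool
sameList xs ys = xs == ys
</ProgramListing>

</Para>

<Para>
At a call such as <Literal>dedup [Just 1, Just 1, Nothing]</Literal> the
constraint becomes <Literal>Eq (Maybe Integer)</Literal>, which is solved
from the ordinary instances.
</Para>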
2258
2259 </Sect2>
2260
2261 <Sect2>
2262 <Title>Class declarations</Title>
2263
2264 <Para>
2265
2266 <OrderedList>
2267 <ListItem>
2268
2269 <Para>
2270  <Emphasis>Multi-parameter type classes are permitted</Emphasis>. For example:
2271
2272
2273 <ProgramListing>
2274   class Collection c a where
2275     union :: c a -> c a -> c a
2276     ...etc.
2277 </ProgramListing>
2278
2279
2280
2281 </Para>
2282 </ListItem>
2283 <ListItem>
2284
2285 <Para>
2286  <Emphasis>The class hierarchy must be acyclic</Emphasis>.  However, the definition
2287 of "acyclic" involves only the superclass relationships.  For example,
2288 this is OK:
2289
2290
2291 <ProgramListing>
2292   class C a where {
2293     op :: D b => a -> b -> b
2294   }
2295
2296   class C a => D a where { ... }
2297 </ProgramListing>
2298
2299
2300 Here, <Literal>C</Literal> is a superclass of <Literal>D</Literal>, but it's OK for a
2301 class operation <Literal>op</Literal> of <Literal>C</Literal> to mention <Literal>D</Literal>.  (It
2302 would not be OK for <Literal>D</Literal> to be a superclass of <Literal>C</Literal>.)
2303
2304 </Para>
2305 </ListItem>
2306 <ListItem>
2307
2308 <Para>
2309  <Emphasis>There are no restrictions on the context in a class declaration
2310 (which introduces superclasses), except that the class hierarchy must
2311 be acyclic</Emphasis>.  So these class declarations are OK:
2312
2313
2314 <ProgramListing>
2315   class Functor (m k) => FiniteMap m k where
2316     ...
2317
2318   class (Monad m, Monad (t m)) => Transform t m where
2319     lift :: m a -> (t m) a
2320 </ProgramListing>
2321
2322
2323 </Para>
2324 </ListItem>
2325 <ListItem>
2326
2327 <Para>
2328  <Emphasis>In the signature of a class operation, every constraint
2329 must mention at least one type variable that is not a class type
2330 variable</Emphasis>.
2331
2332 Thus:
2333
2334
2335 <ProgramListing>
2336   class Collection c a where
2337     mapC :: Collection c b => (a->b) -> c a -> c b
2338 </ProgramListing>
2339
2340
is OK because the constraint <Literal>(Collection c b)</Literal> mentions
<Literal>b</Literal>, even though it also mentions the class variable
<Literal>c</Literal>.  On the other hand:
2344
2345
2346 <ProgramListing>
2347   class C a where
2348     op :: Eq a => (a,b) -> (a,b)
2349 </ProgramListing>
2350
2351
is not OK because the constraint <Literal>(Eq a)</Literal> mentions only the class
type variable <Literal>a</Literal>, and not <Literal>b</Literal>.  However, any such
2354 example is easily fixed by moving the offending context up to the
2355 superclass context:
2356
2357
2358 <ProgramListing>
2359   class Eq a => C a where
    op :: (a,b) -> (a,b)
2361 </ProgramListing>
2362
2363
A yet more relaxed rule would allow the context of a class-op signature
to mention only class type variables.  However, that conflicts with
restriction 2 on types, above.
2367
2368 </Para>
2369 </ListItem>
2370 <ListItem>
2371
2372 <Para>
2373  <Emphasis>The type of each class operation must mention <Emphasis>all</Emphasis> of
2374 the class type variables</Emphasis>.  For example:
2375
2376
2377 <ProgramListing>
2378   class Coll s a where
2379     empty  :: s
2380     insert :: s -> a -> s
2381 </ProgramListing>
2382
2383
is not OK, because the type of <Literal>empty</Literal> doesn't mention
<Literal>a</Literal>.  This rule is a consequence of restriction 1 on
types, above, and has the same motivation.
2387
2388 Sometimes, offending class declarations exhibit misunderstandings.  For
2389 example, <Literal>Coll</Literal> might be rewritten
2390
2391
2392 <ProgramListing>
2393   class Coll s a where
2394     empty  :: s a
2395     insert :: s a -> a -> s a
2396 </ProgramListing>
2397
2398
2399 which makes the connection between the type of a collection of
2400 <Literal>a</Literal>'s (namely <Literal>(s a)</Literal>) and the element type <Literal>a</Literal>.
2401 Occasionally this really doesn't work, in which case you can split the
2402 class like this:
2403
2404
2405 <ProgramListing>
2406   class CollE s where
2407     empty  :: s
2408
2409   class CollE s => Coll s a where
2410     insert :: s -> a -> s
2411 </ProgramListing>
2412
2413
2414 </Para>
2415 </ListItem>
2416
2417 </OrderedList>
2418
2419 </Para>
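
<Para>
The rules above can be seen working together in a small, complete class.
The sketch assumes a present-day GHC, where the
<Literal>MultiParamTypeClasses</Literal> and <Literal>FlexibleInstances</Literal>
pragmas take the place of <Option>-fglasgow-exts</Option>.  The method names
<Function>cinsert</Function>, <Function>cunion</Function> and
<Function>toL</Function>, and the list instance, are hypothetical; only the
class name <Literal>Collection</Literal> is borrowed from the text.
</Para>

<Para>

<ProgramListing>
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- A two-parameter class: c is the container, a the element type.
-- Every method mentions both class variables, as rule 5 requires.
class Collection c a where
  cinsert :: a -> c a -> c a
  cunion  :: c a -> c a -> c a
  toL     :: c a -> [a]

-- Lists as duplicate-free collections of any type with equality.
instance Eq a => Collection [] a where
  cinsert x xs = if x `elem` xs then xs else x : xs
  cunion xs ys = foldr cinsert ys xs
  toL          = id
</ProgramListing>

</Para>

<Para>
With this instance, <Literal>toL (cunion [1,2] [2,3 :: Int])</Literal>
evaluates to <Literal>[1,2,3]</Literal>.
</Para>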
2420
2421 </Sect2>
2422
2423 <Sect2>
2424 <Title>Instance declarations</Title>
2425
2426 <Para>
2427
2428 <OrderedList>
2429 <ListItem>
2430
2431 <Para>
2432  <Emphasis>Instance declarations may not overlap</Emphasis>.  The two instance
2433 declarations
2434
2435
2436 <ProgramListing>
2437   instance context1 => C type1 where ...
2438   instance context2 => C type2 where ...
2439 </ProgramListing>
2440
2441
"overlap" if <Literal>type1</Literal> and <Literal>type2</Literal> unify.
2443
However, if you give the command line option
<Option>-fallow-overlapping-instances</Option><IndexTerm><Primary>-fallow-overlapping-instances
option</Primary></IndexTerm> then two instance declarations are permitted
iff
2448
2449
2450 <ItemizedList>
2451 <ListItem>
2452
2453 <Para>
2454  EITHER <Literal>type1</Literal> and <Literal>type2</Literal> do not unify
2455 </Para>
2456 </ListItem>
2457 <ListItem>
2458
2459 <Para>
2460  OR <Literal>type2</Literal> is a substitution instance of <Literal>type1</Literal>
2461 (but not identical to <Literal>type1</Literal>)
2462 </Para>
2463 </ListItem>
2464 <ListItem>
2465
2466 <Para>
2467  OR vice versa
2468 </Para>
2469 </ListItem>
2470
2471 </ItemizedList>
2472
2473
2474 Notice that these rules
2475
2476
2477 <ItemizedList>
2478 <ListItem>
2479
2480 <Para>
2481  make it clear which instance decl to use
2482 (pick the most specific one that matches)
2483
2484 </Para>
2485 </ListItem>
2486 <ListItem>
2487
2488 <Para>
 do not mention the contexts <Literal>context1</Literal>, <Literal>context2</Literal>.
Reason: you can pick which instance decl
"matches" based on the type.
2492 </Para>
2493 </ListItem>
2494
2495 </ItemizedList>
2496
2497
2498 Regrettably, GHC doesn't guarantee to detect overlapping instance
2499 declarations if they appear in different modules.  GHC can "see" the
2500 instance declarations in the transitive closure of all the modules
2501 imported by the one being compiled, so it can "see" all instance decls
2502 when it is compiling <Literal>Main</Literal>.  However, it currently chooses not
2503 to look at ones that can't possibly be of use in the module currently
2504 being compiled, in the interests of efficiency.  (Perhaps we should
2505 change that decision, at least for <Literal>Main</Literal>.)
2506
2507 </Para>
2508 </ListItem>
2509 <ListItem>
2510
2511 <Para>
2512  <Emphasis>There are no restrictions on the type in an instance
2513 <Emphasis>head</Emphasis>, except that at least one must not be a type variable</Emphasis>.
2514 The instance "head" is the bit after the "=>" in an instance decl. For
2515 example, these are OK:
2516
2517
2518 <ProgramListing>
2519   instance C Int a where ...
2520
2521   instance D (Int, Int) where ...
2522
2523   instance E [[a]] where ...
2524 </ProgramListing>
2525
2526
2527 Note that instance heads <Emphasis>may</Emphasis> contain repeated type variables.
2528 For example, this is OK:
2529
2530
2531 <ProgramListing>
2532   instance Stateful (ST s) (MutVar s) where ...
2533 </ProgramListing>
2534
2535
2536 The "at least one not a type variable" restriction is to ensure that
2537 context reduction terminates: each reduction step removes one type
2538 constructor.  For example, the following would make the type checker
2539 loop if it wasn't excluded:
2540
2541
2542 <ProgramListing>
2543   instance C a => C a where ...
2544 </ProgramListing>
2545
2546
2547 There are two situations in which the rule is a bit of a pain. First,
2548 if one allows overlapping instance declarations then it's quite
2549 convenient to have a "default instance" declaration that applies if
2550 something more specific does not:
2551
2552
2553 <ProgramListing>
2554   instance C a where
2555     op = ... -- Default
2556 </ProgramListing>
2557
2558
2559 Second, sometimes you might want to use the following to get the
2560 effect of a "class synonym":
2561
2562
2563 <ProgramListing>
2564   class (C1 a, C2 a, C3 a) => C a where { }
2565
2566   instance (C1 a, C2 a, C3 a) => C a where { }
2567 </ProgramListing>
2568
2569
2570 This allows you to write shorter signatures:
2571
2572
2573 <ProgramListing>
2574   f :: C a => ...
2575 </ProgramListing>
2576
2577
2578 instead of
2579
2580
2581 <ProgramListing>
2582   f :: (C1 a, C2 a, C3 a) => ...
2583 </ProgramListing>
2584
2585
2586 I'm on the lookout for a simple rule that preserves decidability while
2587 allowing these idioms.  The experimental flag
2588 <Option>-fallow-undecidable-instances</Option><IndexTerm><Primary>-fallow-undecidable-instances
2589 option</Primary></IndexTerm> lifts this restriction, allowing all the types in an
2590 instance head to be type variables.
2591
2592 </Para>
2593 </ListItem>
2594 <ListItem>
2595
2596 <Para>
2597  <Emphasis>Unlike Haskell 1.4, instance heads may use type
2598 synonyms</Emphasis>.  As always, using a type synonym is just shorthand for
2599 writing the RHS of the type synonym definition.  For example:
2600
2601
2602 <ProgramListing>
2603   type Point = (Int,Int)
2604   instance C Point   where ...
2605   instance C [Point] where ...
2606 </ProgramListing>
2607
2608
2609 is legal.  However, if you added
2610
2611
2612 <ProgramListing>
2613   instance C (Int,Int) where ...
2614 </ProgramListing>
2615
2616
2617 as well, then the compiler will complain about the overlapping
2618 (actually, identical) instance declarations.  As always, type synonyms
2619 must be fully applied.  You cannot, for example, write:
2620
2621
2622 <ProgramListing>
2623   type P a = [[a]]
2624   instance Monad P where ...
2625 </ProgramListing>
2626
2627
2628 This design decision is independent of all the others, and easily
2629 reversed, but it makes sense to me.
2630
2631 </Para>
2632 </ListItem>
2633 <ListItem>
2634
2635 <Para>
2636 <Emphasis>The types in an instance-declaration <Emphasis>context</Emphasis> must all
2637 be type variables</Emphasis>. Thus
2638
2639
2640 <ProgramListing>
2641 instance C a b => Eq (a,b) where ...
2642 </ProgramListing>
2643
2644
2645 is OK, but
2646
2647
2648 <ProgramListing>
2649 instance C Int b => Foo b where ...
2650 </ProgramListing>
2651
2652
2653 is not OK.  Again, the intent here is to make sure that context
2654 reduction terminates.
2655
2656 Voluminous correspondence on the Haskell mailing list has convinced me
2657 that it's worth experimenting with a more liberal rule.  If you use
the flag <Option>-fallow-undecidable-instances</Option>, you can use arbitrary
2659 types in an instance context.  Termination is ensured by having a
2660 fixed-depth recursion stack.  If you exceed the stack depth you get a
2661 sort of backtrace, and the opportunity to increase the stack depth
2662 with <Option>-fcontext-stack</Option><Emphasis>N</Emphasis>.
2663
2664 </Para>
2665 </ListItem>
2666
2667 </OrderedList>
2668
2669 </Para>
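
<Para>
The "pick the most specific one that matches" behaviour of overlapping
instances can be observed with the sketch below.  It assumes a present-day
GHC, which marks the more specific instance with an
<Literal>OVERLAPPING</Literal> pragma instead of relying on the
<Option>-fallow-overlapping-instances</Option> flag, and which needs
<Literal>FlexibleInstances</Literal> for the <Literal>[Char]</Literal> head.
The class <Literal>Describe</Literal> is hypothetical.
</Para>

<Para>

<ProgramListing>
{-# LANGUAGE FlexibleInstances #-}

class Describe a where
  describe :: a -> String

-- The general instance: applies to any list.
instance Describe [a] where
  describe _ = "some list"

-- A substitution instance of the one above (a := Char), so the two
-- overlap; this one is chosen whenever the argument is a String.
instance {-# OVERLAPPING #-} Describe [Char] where
  describe s = "string: " ++ s
</ProgramListing>

</Para>

<Para>
Here <Literal>describe "hi"</Literal> selects the <Literal>[Char]</Literal>
instance, while <Literal>describe [True]</Literal> falls back to the general
one.  Neither instance has a context, but the choice would be made on the
types alone even if they did.
</Para>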
2670
2671 </Sect2>
2672
2673 </Sect1>
2674
2675 <Sect1 id="universal-quantification">
2676 <Title>Explicit universal quantification
2677 </Title>
2678
2679 <Para>
2680 GHC now allows you to write explicitly quantified types.  GHC's
2681 syntax for this now agrees with Hugs's, namely:
2682 </Para>
2683
2684 <Para>
2685
2686 <ProgramListing>
2687         forall a b. (Ord a, Eq  b) => a -> b -> a
2688 </ProgramListing>
2689
2690 </Para>
2691
2692 <Para>
2693 The context is, of course, optional.  You can't use <Literal>forall</Literal> as
2694 a type variable any more!
2695 </Para>
2696
2697 <Para>
2698 Haskell type signatures are implicitly quantified.  The <Literal>forall</Literal>
2699 allows us to say exactly what this means.  For example:
2700 </Para>
2701
2702 <Para>
2703
2704 <ProgramListing>
2705         g :: b -> b
2706 </ProgramListing>
2707
2708 </Para>
2709
2710 <Para>
2711 means this:
2712 </Para>
2713
2714 <Para>
2715
2716 <ProgramListing>
2717         g :: forall b. (b -> b)
2718 </ProgramListing>
2719
2720 </Para>
2721
2722 <Para>
2723 The two are treated identically.
2724 </Para>
2725
2726 <Sect2 id="univ">
2727 <Title>Universally-quantified data type fields
2728 </Title>
2729
2730 <Para>
2731 In a <Literal>data</Literal> or <Literal>newtype</Literal> declaration one can quantify
2732 the types of the constructor arguments.  Here are several examples:
2733 </Para>
2734
2735 <Para>
2736
2737 <ProgramListing>
2738 data T a = T1 (forall b. b -> b -> b) a
2739
2740 data MonadT m = MkMonad { return :: forall a. a -> m a,
2741                           bind   :: forall a b. m a -> (a -> m b) -> m b
2742                         }
2743
2744 newtype Swizzle = MkSwizzle (Ord a => [a] -> [a])
2745 </ProgramListing>
2746
2747 </Para>
2748
2749 <Para>
The constructors now have so-called <Emphasis>rank 2</Emphasis> polymorphic
types, in which there is a for-all in the argument types:
2752 </Para>
2753
2754 <Para>
2755
2756 <ProgramListing>
2757 T1 :: forall a. (forall b. b -> b -> b) -> a -> T a
2758 MkMonad :: forall m. (forall a. a -> m a)
2759                   -> (forall a b. m a -> (a -> m b) -> m b)
2760                   -> MonadT m
2761 MkSwizzle :: (Ord a => [a] -> [a]) -> Swizzle
2762 </ProgramListing>
2763
2764 </Para>
2765
2766 <Para>
2767 Notice that you don't need to use a <Literal>forall</Literal> if there's an
2768 explicit context.  For example in the first argument of the
2769 constructor <Function>MkSwizzle</Function>, an implicit "<Literal>forall a.</Literal>" is
2770 prefixed to the argument type.  The implicit <Literal>forall</Literal>
2771 quantifies all type variables that are not already in scope, and are
2772 mentioned in the type quantified over.
2773 </Para>
2774
2775 <Para>
2776 As for type signatures, implicit quantification happens for non-overloaded
2777 types too.  So if you write this:
2778
2779 <ProgramListing>
2780   data T a = MkT (Either a b) (b -> b)
2781 </ProgramListing>
2782
2783 it's just as if you had written this:
2784
2785 <ProgramListing>
2786   data T a = MkT (forall b. Either a b) (forall b. b -> b)
2787 </ProgramListing>
2788
2789 That is, since the type variable <Literal>b</Literal> isn't in scope, it's
2790 implicitly universally quantified.  (Arguably, it would be better
2791 to <Emphasis>require</Emphasis> explicit quantification on constructor arguments
2792 where that is what is wanted.  Feedback welcomed.)
2793 </Para>
2794
2795 </Sect2>
2796
2797 <Sect2>
2798 <Title>Construction </Title>
2799
2800 <Para>
2801 You construct values of types <Literal>T, MonadT, Swizzle</Literal> by applying
2802 the constructor to suitable values, just as usual.  For example,
2803 </Para>
2804
2805 <Para>
2806
2807 <ProgramListing>
2808 (T1 (\x y -> x) 3) :: T Int
2809
2810 (MkSwizzle sort)    :: Swizzle
2811 (MkSwizzle reverse) :: Swizzle
2812
2813 (let r x = Just x
2814      b m k = case m of
2815                 Just y -> k y
2816                 Nothing -> Nothing
2817   in
2818   MkMonad r b) :: MonadT Maybe
2819 </ProgramListing>
2820
2821 </Para>
2822
2823 <Para>
2824 The type of the argument can, as usual, be more general than the type
2825 required, as <Literal>(MkSwizzle reverse)</Literal> shows.  (<Function>reverse</Function>
2826 does not need the <Literal>Ord</Literal> constraint.)
2827 </Para>
2828
2829 </Sect2>
2830
2831 <Sect2>
2832 <Title>Pattern matching</Title>
2833
2834 <Para>
2835 When you use pattern matching, the bound variables may now have
2836 polymorphic types.  For example:
2837 </Para>
2838
2839 <Para>
2840
2841 <ProgramListing>
2842         f :: T a -> a -> (a, Char)
2843         f (T1 f k) x = (f k x, f 'c' 'd')
2844
2845         g :: (Ord a, Ord b) => Swizzle -> [a] -> (a -> b) -> [b]
2846         g (MkSwizzle s) xs f = s (map f (s xs))
2847
2848         h :: MonadT m -> [m a] -> m [a]
2849         h m [] = return m []
2850         h m (x:xs) = bind m x           $ \y ->
2851                       bind m (h m xs)   $ \ys ->
2852                       return m (y:ys)
2853 </ProgramListing>
2854
2855 </Para>
2856
2857 <Para>
2858 In the function <Function>h</Function> we use the record selectors <Literal>return</Literal>
2859 and <Literal>bind</Literal> to extract the polymorphic bind and return functions
2860 from the <Literal>MonadT</Literal> data structure, rather than using pattern
2861 matching.
2862 </Para>
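
<Para>
The declarations, construction and pattern matching described so far can be
collected into one small program.  The following is only a sketch, and it
makes assumptions that are not part of the text above: it targets a recent
GHC, where a <Literal>LANGUAGE RankNTypes</Literal> pragma is assumed to enable
these types; the <Literal>forall</Literal> in <Literal>Swizzle</Literal> is written out
explicitly; and the names <Function>ret</Function>, <Function>useT</Function>,
<Function>swizzleMap</Function>, <Function>sortSwizzle</Function>, <Function>maybeMonad</Function>
and <Function>sequenceT</Function> are invented for illustration
(<Function>ret</Function> avoids a clash with the Prelude's <Function>return</Function>).
</Para>

<Para>

<ProgramListing>
{-# LANGUAGE RankNTypes #-}
import Data.List (sort)

data T a = T1 (forall b. b -> b -> b) a

newtype Swizzle = MkSwizzle (forall a. Ord a => [a] -> [a])

data MonadT m = MkMonad
  { ret  :: forall a. a -> m a
  , bind :: forall a b. m a -> (a -> m b) -> m b
  }

-- The polymorphic field 'pick' is used at type a and at type Char.
useT :: T a -> a -> (a, Char)
useT (T1 pick k) x = (pick k x, pick 'c' 'd')

-- The swizzle 's' is applied to a list of a and to a list of b.
swizzleMap :: (Ord a, Ord b) => Swizzle -> [a] -> (a -> b) -> [b]
swizzleMap (MkSwizzle s) xs f = s (map f (s xs))

sortSwizzle :: Swizzle
sortSwizzle = MkSwizzle sort

-- A MonadT value for Maybe, built from ordinary functions.
maybeMonad :: MonadT Maybe
maybeMonad = MkMonad Just (\m k -> maybe Nothing k m)

-- Runs a list of computations in order, using only the record fields.
sequenceT :: MonadT m -> [m a] -> m [a]
sequenceT m []     = ret m []
sequenceT m (x:xs) =
  bind m x (\y -> bind m (sequenceT m xs) (\ys -> ret m (y:ys)))
</ProgramListing>

</Para>

<Para>
With these definitions, <Literal>swizzleMap sortSwizzle [3,1,2] negate</Literal>
evaluates to <Literal>[-3,-2,-1]</Literal>, and
<Literal>sequenceT maybeMonad [Just 1, Just 2]</Literal> evaluates to
<Literal>Just [1,2]</Literal>.
</Para>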
2863
2864 <Para>
2865 You cannot pattern-match against an argument that is polymorphic.
2866 For example:
2867
2868 <ProgramListing>
2869         newtype TIM s a = TIM (ST s (Maybe a))
2870
2871         runTIM :: (forall s. TIM s a) -> Maybe a
2872         runTIM (TIM m) = runST m
2873 </ProgramListing>
2874
2875 </Para>
2876
2877 <Para>
2878 Here the pattern-match fails, because you can't pattern-match against
2879 an argument of type <Literal>(forall s. TIM s a)</Literal>.  Instead you
2880 must bind the variable and pattern match in the right hand side:
2881
2882 <ProgramListing>
2883         runTIM :: (forall s. TIM s a) -> Maybe a
2884         runTIM tm = runST (case tm of { TIM m -> m })
2885 </ProgramListing>
2886
2887 The <Literal>tm</Literal> on the right hand side is (invisibly) instantiated, like
2888 any polymorphic value at its occurrence site, and now you can pattern-match
2889 against it.
2890 </Para>
2891
2892 </Sect2>
2893
2894 <Sect2>
2895 <Title>The partial-application restriction</Title>
2896
2897 <Para>
2898 There is really only one way in which data structures with polymorphic
2899 components might surprise you: you must not partially apply them.
2900 For example, this is illegal:
2901 </Para>
2902
2903 <Para>
2904
2905 <ProgramListing>
2906         map MkSwizzle [sort, reverse]
2907 </ProgramListing>
2908
2909 </Para>
2910
2911 <Para>
2912 The restriction is this: <Emphasis>every subexpression of the program must
2913 have a type that has no for-alls, except that in a function
2914 application (f e1&hellip;en) the partial applications are not subject to
2915 this rule</Emphasis>.  The restriction makes type inference feasible.
2916 </Para>
2917
2918 <Para>
2919 In the illegal example, the sub-expression <Literal>MkSwizzle</Literal> has the
2920 polymorphic type <Literal>(Ord b => [b] -> [b]) -> Swizzle</Literal> and is not
2921 a sub-expression of an enclosing application.  On the other hand, this
2922 expression is OK:
2923 </Para>
2924
2925 <Para>
2926
2927 <ProgramListing>
2928         map (T1 (\a b -> a)) [1,2,3]
2929 </ProgramListing>
2930
2931 </Para>
2932
2933 <Para>
2934 even though it involves a partial application of <Function>T1</Function>, because
2935 the sub-expression <Literal>T1 (\a b -> a)</Literal> has type <Literal>Int -> T
2936 Int</Literal>.
2937 </Para>
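
<Para>
As a sketch of how the restriction plays out in a complete program, the
fragment below keeps every use of <Function>MkSwizzle</Function> fully applied.
It assumes a recent GHC in which a <Literal>LANGUAGE RankNTypes</Literal> pragma
enables these types and in which the <Literal>forall</Literal> in
<Literal>Swizzle</Literal> is written explicitly; the names
<Function>swizzles</Function>, <Function>ts</Function>, <Function>applySwizzle</Function> and
<Function>payload</Function> are invented for illustration.
</Para>

<Para>

<ProgramListing>
{-# LANGUAGE RankNTypes #-}
import Data.List (sort)

data T a = T1 (forall b. b -> b -> b) a

newtype Swizzle = MkSwizzle (forall a. Ord a => [a] -> [a])

-- Not accepted: MkSwizzle appears on its own, not applied to its argument.
--   swizzles = map MkSwizzle [sort, reverse]

-- Accepted: every occurrence of MkSwizzle is applied to its argument.
swizzles :: [Swizzle]
swizzles = [MkSwizzle sort, MkSwizzle reverse]

-- Accepted: T1 is partially applied, but its polymorphic argument is given.
ts :: [T Int]
ts = map (T1 (\a _ -> a)) [1, 2, 3]

applySwizzle :: Ord a => Swizzle -> [a] -> [a]
applySwizzle (MkSwizzle s) = s

payload :: T a -> a
payload (T1 _ x) = x
</ProgramListing>

</Para>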
2938
2939 </Sect2>
2940
2941 <Sect2 id="sigs">
2942 <Title>Type signatures
2943 </Title>
2944
2945 <Para>
2946 Once you have data constructors with universally-quantified fields, or
2947 constants such as <Constant>runST</Constant> that have rank-2 types, it isn't long
2948 before you discover that you need more!  Consider:
2949 </Para>
2950
2951 <Para>
2952
2953 <ProgramListing>
2954   mkTs f x y = [T1 f x, T1 f y]
2955 </ProgramListing>
2956
2957 </Para>
2958
2959 <Para>
2960 <Function>mkTs</Function> is a function that constructs some values of type
2961 <Literal>T</Literal>, using some pieces passed to it.  The trouble is that since
2962 <Literal>f</Literal> is a function argument, Haskell assumes that it is
2963 monomorphic, so we'll get a type error when applying <Function>T1</Function> to
2964 it.  This is a rather silly example, but the problem really bites in
2965 practice.  Lots of people trip over the fact that you can't make
2966 "wrapper functions" for <Constant>runST</Constant> for exactly the same reason.
2967 In short, it is impossible to build abstractions around functions with
2968 rank-2 types.
2969 </Para>
2970
2971 <Para>
2972 The solution is fairly clear.  We provide the ability to give a rank-2
2973 type signature for <Emphasis>ordinary</Emphasis> functions (not only data
2974 constructors), thus:
2975 </Para>
2976
2977 <Para>
2978
2979 <ProgramListing>
2980   mkTs :: (forall b. b -> b -> b) -> a -> a -> [T a]
2981   mkTs f x y = [T1 f x, T1 f y]
2982 </ProgramListing>
2983
2984 </Para>
2985
2986 <Para>
2987 This type signature tells the compiler to attribute <Literal>f</Literal> with
2988 the polymorphic type <Literal>(forall b. b -> b -> b)</Literal> when type
2989 checking the body of <Function>mkTs</Function>, so now the application of
2990 <Function>T1</Function> is fine.
2991 </Para>
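
<Para>
The sketch below puts <Function>mkTs</Function> into a complete program and adds a
small wrapper around <Constant>runST</Constant>, which needs a rank-2 signature
for the same reason.  Note that the signature of <Function>mkTs</Function> lists one
<Literal>a</Literal> for each of <VarName>x</VarName> and <VarName>y</VarName>.  The sketch assumes a
recent GHC with a <Literal>LANGUAGE RankNTypes</Literal> pragma and the modules
<Literal>Control.Monad.ST</Literal> and <Literal>Data.STRef</Literal>; the names
<Function>pickWith</Function>, <Function>runBoth</Function> and <Function>counter</Function> are
invented for illustration.
</Para>

<Para>

<ProgramListing>
{-# LANGUAGE RankNTypes #-}
import Control.Monad.ST (ST, runST)
import Data.STRef (newSTRef, modifySTRef, readSTRef)

data T a = T1 (forall b. b -> b -> b) a

mkTs :: (forall b. b -> b -> b) -> a -> a -> [T a]
mkTs f x y = [T1 f x, T1 f y]

-- Applies the stored polymorphic function to the stored value and a default.
pickWith :: a -> T a -> a
pickWith d (T1 f x) = f x d

-- A wrapper around runST: both arguments must stay polymorphic in s.
runBoth :: (forall s. ST s a) -> (forall s. ST s b) -> (a, b)
runBoth m1 m2 = (runST m1, runST m2)

-- A small state-thread computation: store n, add one, read it back.
counter :: Int -> ST s Int
counter n = newSTRef n >>= \r -> modifySTRef r (+ 1) >> readSTRef r
</ProgramListing>

</Para>

<Para>
Here <Literal>map (pickWith 0) (mkTs (\a _ -> a) 1 2)</Literal> evaluates to
<Literal>[1,2]</Literal>, and <Literal>runBoth (counter 1) (counter 10)</Literal> evaluates
to <Literal>(2,11)</Literal>.
</Para>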
2992
2993 <Para>
2994 There are two restrictions:
2995 </Para>
2996
2997 <Para>
2998
2999 <ItemizedList>
3000 <ListItem>
3001
3002 <Para>
3003  You can only define a rank 2 type, specified by the following
3004 grammar:
3005
3006
3007 <ProgramListing>
3008 rank2type ::= [forall tyvars .] [context =>] funty
3009 funty     ::= ([forall tyvars .] [context =>] ty) -> funty
3010             | ty
3011 ty        ::= ...current Haskell monotype syntax...
3012 </ProgramListing>
3013
3014
3015 Informally, the universal quantification must all be right at the beginning,
3016 or at the top level of a function argument.
3017
3018 </Para>
3019 </ListItem>
3020 <ListItem>
3021
3022 <Para>
3023  There is a restriction on the definition of a function whose
3024 type signature is a rank-2 type: the polymorphic arguments must be
3025 matched on the left hand side of the "<Literal>=</Literal>" sign.  You can't
3026 define <Function>mkTs</Function> like this:
3027
3028
3029 <ProgramListing>
3030 mkTs :: (forall b. b -> b -> b) -> a -> a -> [T a]
3031 mkTs = \ f x y -> [T1 f x, T1 f y]
3032 </ProgramListing>
3033
3034
3035
3036 The same partial-application rule applies to ordinary functions with
3037 rank-2 types as applied to data constructors.
3038
3039 </Para>
3040 </ListItem>
3041
3042 </ItemizedList>
3043
3044 </Para>
3045
3046 </Sect2>
3047
3048
3049 <Sect2 id="hoist">
3050 <Title>Type synonyms and hoisting
3051 </Title>
3052
3053 <Para>
3054 GHC also allows you to write a <Literal>forall</Literal> in a type synonym, thus:
3055 <ProgramListing>
3056   type Discard a = forall b. a -> b -> a
3057
3058   f :: Discard a
3059   f x y = x
3060 </ProgramListing>
3061 However, it is often convenient to use this sort of synonym at the right hand
3062 end of an arrow, thus:
3063 <ProgramListing>
3064   type Discard a = forall b. a -> b -> a
3065
3066   g :: Int -> Discard Int
3067   g x y z = x+y
3068 </ProgramListing>
3069 Simply expanding the type synonym would give
3070 <ProgramListing>
3071   g :: Int -> (forall b. Int -> b -> Int)
3072 </ProgramListing>
3073 but GHC "hoists" the <Literal>forall</Literal> to give the isomorphic type
3074 <ProgramListing>
3075   g :: forall b. Int -> Int -> b -> Int
3076 </ProgramListing>
3077 In general, the rule is this: <Emphasis>to determine the type specified by any explicit
3078 user-written type (e.g. in a type signature), GHC expands type synonyms and then repeatedly
3079 performs the transformation:</Emphasis>
3080 <ProgramListing>
3081   <Emphasis>type1</Emphasis> -> forall a. <Emphasis>type2</Emphasis>
3082 ==>
3083   forall a. <Emphasis>type1</Emphasis> -> <Emphasis>type2</Emphasis>
3084 </ProgramListing>
3085 (In fact, GHC tries to retain as much synonym information as possible for use in
3086 error messages, but that is a usability issue.)  This rule applies, of course, whether
3087 or not the <Literal>forall</Literal> comes from a synonym. For example, here is another
3088 valid way to write <Literal>g</Literal>'s type signature:
3089 <ProgramListing>
3090   g :: Int -> Int -> forall b. b -> Int
3091 </ProgramListing>
3092 </Para>
3093 </Sect2>
3094
3095 </Sect1>
3096
3097 <Sect1 id="existential-quantification">
3098 <Title>Existentially quantified data constructors
3099 </Title>
3100
3101 <Para>
3102 The idea of using existential quantification in data type declarations
3103 was suggested by Laufer (I believe, though doubtless someone will
3104 correct me), and implemented in Hope+. It's been in Lennart
3105 Augustsson's <Command>hbc</Command> Haskell compiler for several years, and
3106 proved very useful.  Here's the idea.  Consider the declaration:
3107 </Para>
3108
3109 <Para>
3110
3111 <ProgramListing>
3112   data Foo = forall a. MkFoo a (a -> Bool)
3113            | Nil
3114 </ProgramListing>
3115
3116 </Para>
3117
3118 <Para>
3119 The data type <Literal>Foo</Literal> has two constructors with types:
3120 </Para>
3121
3122 <Para>
3123
3124 <ProgramListing>
3125   MkFoo :: forall a. a -> (a -> Bool) -> Foo
3126   Nil   :: Foo
3127 </ProgramListing>
3128
3129 </Para>
3130
3131 <Para>
3132 Notice that the type variable <Literal>a</Literal> in the type of <Function>MkFoo</Function>
3133 does not appear in the data type itself, which is plain <Literal>Foo</Literal>.
3134 For example, the following expression is fine:
3135 </Para>
3136
3137 <Para>
3138
3139 <ProgramListing>
3140   [MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]
3141 </ProgramListing>
3142
3143 </Para>
3144
3145 <Para>
3146 Here, <Literal>(MkFoo 3 even)</Literal> packages an integer with a function
3147 <Function>even</Function> that maps an integer to <Literal>Bool</Literal>; and <Function>MkFoo 'c'
3148 isUpper</Function> packages a character with a compatible function.  These
3149 two things are each of type <Literal>Foo</Literal> and can be put in a list.
3150 </Para>
3151
3152 <Para>
3153 What can we do with a value of type <Literal>Foo</Literal>?  In particular,
3154 what happens when we pattern-match on <Function>MkFoo</Function>?
3155 </Para>
3156
3157 <Para>
3158
3159 <ProgramListing>
3160   f (MkFoo val fn) = ???
3161 </ProgramListing>
3162
3163 </Para>
3164
3165 <Para>
3166 Since all we know about <Literal>val</Literal> and <Function>fn</Function> is that they
3167 are compatible, the only (useful) thing we can do with them is to
3168 apply <Function>fn</Function> to <Literal>val</Literal> to get a boolean.  For example:
3169 </Para>
3170
3171 <Para>
3172
3173 <ProgramListing>
3174   f :: Foo -> Bool
3175   f (MkFoo val fn) = fn val
3176 </ProgramListing>
3177
3178 </Para>
3179
3180 <Para>
3181 What this allows us to do is to package heterogeneous values
3182 together with a bunch of functions that manipulate them, and then treat
3183 that collection of packages in a uniform manner.  You can express
3184 quite a bit of object-oriented-like programming this way.
3185 </Para>
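
<Para>
Put together, the example reads as follows.  This is a sketch that assumes a
recent GHC, where a <Literal>LANGUAGE ExistentialQuantification</Literal> pragma
is assumed to enable the declaration; the names <Function>check</Function>,
<Function>foos</Function> and <Function>results</Function> are invented for illustration, and
returning <Literal>False</Literal> for <Function>Nil</Function> is an arbitrary choice.
</Para>

<Para>

<ProgramListing>
{-# LANGUAGE ExistentialQuantification #-}
import Data.Char (isUpper)

data Foo = forall a. MkFoo a (a -> Bool)
         | Nil

-- The only useful thing to do with the hidden value is to apply the
-- function packaged with it.
check :: Foo -> Bool
check (MkFoo val fn) = fn val
check Nil            = False

-- Values of different hidden types live in one list.
foos :: [Foo]
foos = [MkFoo (4 :: Int) even, MkFoo 'c' isUpper, Nil]

results :: [Bool]
results = map check foos
</ProgramListing>

</Para>

<Para>
Here <Literal>results</Literal> evaluates to <Literal>[True,False,False]</Literal>.
</Para>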
3186
3187 <Sect2 id="existential">
3188 <Title>Why existential?
3189 </Title>
3190
3191 <Para>
3192 What has this to do with <Emphasis>existential</Emphasis> quantification?
3193 Simply that <Function>MkFoo</Function> has the (nearly) isomorphic type
3194 </Para>
3195
3196 <Para>
3197
3198 <ProgramListing>
3199   MkFoo :: (exists a . (a, a -> Bool)) -> Foo
3200 </ProgramListing>
3201
3202 </Para>
3203
3204 <Para>
3205 But Haskell programmers can safely think of the ordinary
3206 <Emphasis>universally</Emphasis> quantified type given above, thereby avoiding
3207 adding a new existential quantification construct.
3208 </Para>
3209
3210 </Sect2>
3211
3212 <Sect2>
3213 <Title>Type classes</Title>
3214
3215 <Para>
3216 An easy extension (implemented in <Command>hbc</Command>) is to allow
3217 arbitrary contexts before the constructor.  For example:
3218 </Para>
3219
3220 <Para>
3221
3222 <ProgramListing>
3223 data Baz = forall a. Eq a => Baz1 a a
3224          | forall b. Show b => Baz2 b (b -> b)
3225 </ProgramListing>
3226
3227 </Para>
3228
3229 <Para>
3230 The two constructors have the types you'd expect:
3231 </Para>
3232
3233 <Para>
3234
3235 <ProgramListing>
3236 Baz1 :: forall a. Eq a => a -> a -> Baz
3237 Baz2 :: forall b. Show b => b -> (b -> b) -> Baz
3238 </ProgramListing>
3239
3240 </Para>
3241
3242 <Para>
3243 But when pattern matching on <Function>Baz1</Function> the matched values can be compared
3244 for equality, and when pattern matching on <Function>Baz2</Function> the first matched
3245 value can be converted to a string (as well as applying the function to it).
3246 So this program is legal:
3247 </Para>
3248
3249 <Para>
3250
3251 <ProgramListing>
3252   f :: Baz -> String
3253   f (Baz1 p q) | p == q    = "Yes"
3254                | otherwise = "No"
3255   f (Baz2 v fn)            = show (fn v)
3256 </ProgramListing>
3257
3258 </Para>
3259
3260 <Para>
3261 Operationally, in a dictionary-passing implementation, the
3262 constructors <Function>Baz1</Function> and <Function>Baz2</Function> must store the
3263 dictionaries for <Literal>Eq</Literal> and <Literal>Show</Literal> respectively, and
3264 extract them on pattern matching.
3265 </Para>
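
<Para>
A self-contained sketch of the same idea follows.  It assumes a recent GHC with
a <Literal>LANGUAGE ExistentialQuantification</Literal> pragma; the names
<Function>describe</Function> and <Function>bazzes</Function> are invented for illustration.
Each constructor application captures the dictionary for the type it is used
at, and <Function>describe</Function> uses that dictionary without knowing the type.
</Para>

<Para>

<ProgramListing>
{-# LANGUAGE ExistentialQuantification #-}

data Baz = forall a. Eq a => Baz1 a a
         | forall b. Show b => Baz2 b (b -> b)

describe :: Baz -> String
describe (Baz1 p q) | p == q    = "Yes"   -- uses the stored Eq dictionary
                    | otherwise = "No"
describe (Baz2 v fn)            = show (fn v)   -- uses the stored Show dictionary

bazzes :: [Baz]
bazzes = [Baz1 'x' 'x', Baz1 (1 :: Int) 2, Baz2 (41 :: Int) (+ 1)]
</ProgramListing>

</Para>

<Para>
Here <Literal>map describe bazzes</Literal> evaluates to
<Literal>["Yes","No","42"]</Literal>.
</Para>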
3266
3267 <Para>
3268 Notice the way that the syntax fits smoothly with that used for
3269 universal quantification earlier.
3270 </Para>
3271
3272 </Sect2>
3273
3274 <Sect2>
3275 <Title>Restrictions</Title>
3276
3277 <Para>
3278 There are several restrictions on the ways in which existentially-quantified
3279 constructors can be used.
3280 </Para>
3281
3282 <Para>
3283
3284 <ItemizedList>
3285 <ListItem>
3286
3287 <Para>
3288  When pattern matching, each pattern match introduces a new,
3289 distinct, type for each existential type variable.  These types cannot
3290 be unified with any other type, nor can they escape from the scope of
3291 the pattern match.  For example, these fragments are incorrect:
3292
3293
3294 <ProgramListing>
3295 f1 (MkFoo a f) = a
3296 </ProgramListing>
3297
3298
3299 Here, the type bound by <Function>MkFoo</Function> "escapes", because <Literal>a</Literal>
3300 is the result of <Function>f1</Function>.  One way to see why this is wrong is to
3301 ask what type <Function>f1</Function> has:
3302
3303
3304 <ProgramListing>
3305   f1 :: Foo -> a             -- Weird!
3306 </ProgramListing>
3307
3308
3309 What is this "<Literal>a</Literal>" in the result type? Clearly we don't mean
3310 this:
3311
3312
3313 <ProgramListing>
3314   f1 :: forall a. Foo -> a   -- Wrong!
3315 </ProgramListing>
3316
3317
3318 The original program is just plain wrong.  Here's another sort of error:
3319
3320
3321 <ProgramListing>
3322   f2 (Baz1 a b) (Baz1 p q) = a==q
3323 </ProgramListing>
3324
3325
3326 It's ok to say <Literal>a==b</Literal> or <Literal>p==q</Literal>, but
3327 <Literal>a==q</Literal> is wrong because it equates the two distinct types arising
3328 from the two <Function>Baz1</Function> constructors.
3329
3330
3331 </Para>
3332 </ListItem>
3333 <ListItem>
3334
3335 <Para>
3336 You can't pattern-match on an existentially quantified
3337 constructor in a <Literal>let</Literal> or <Literal>where</Literal> group of
3338 bindings. So this is illegal:
3339
3340
3341 <ProgramListing>
3342   f3 x = a==b where { Baz1 a b = x }
3343 </ProgramListing>
3344
3345
3346 You can only pattern-match
3347 on an existentially-quantified constructor in a <Literal>case</Literal> expression or
3348 in the patterns of a function definition.
3349
3350 The reason for this restriction is really an implementation one.
3351 Type-checking binding groups is already a nightmare without
3352 existentials complicating the picture.  Also an existential pattern
3353 binding at the top level of a module doesn't make sense, because it's
3354 not clear how to prevent the existentially-quantified type "escaping".
3355 So for now, there's a simple-to-state restriction.  We'll see how
3356 annoying it is.
3357
3358 </Para>
3359 </ListItem>
3360 <ListItem>
3361
3362 <Para>
3363 You can't use existential quantification for <Literal>newtype</Literal>
3364 declarations.  So this is illegal:
3365
3366
3367 <ProgramListing>
3368   newtype T = forall a. Ord a => MkT a
3369 </ProgramListing>
3370
3371
3372 Reason: a value of type <Literal>T</Literal> must be represented as a pair
3373 of a dictionary for <Literal>Ord t</Literal> and a value of type <Literal>t</Literal>.
3374 That contradicts the idea that <Literal>newtype</Literal> should have no
3375 concrete representation.  You can get just the same efficiency and effect
3376 by using <Literal>data</Literal> instead of <Literal>newtype</Literal>.  If there is no
3377 overloading involved, then there is more of a case for allowing
3378 an existentially-quantified <Literal>newtype</Literal>,
3379 because the <Literal>data</Literal> version does carry an implementation cost,
3380 but single-field existentially quantified constructors aren't much
3381 use.  So the simple restriction (no existential stuff on <Literal>newtype</Literal>)
3382 stands, unless there are convincing reasons to change it.
3383
3384
3385 </Para>
3386 </ListItem>
3387 <ListItem>
3388
3389 <Para>
3390  You can't use <Literal>deriving</Literal> to define instances of a
3391 data type with existentially quantified data constructors.
3392
3393 Reason: in most cases it would not make sense. For example:
3394
3395 <ProgramListing>
3396 data T = forall a. MkT [a] deriving( Eq )
3397 </ProgramListing>
3398
3399 To derive <Literal>Eq</Literal> in the standard way we would need to have equality
3400 between the single component of two <Function>MkT</Function> constructors:
3401
3402 <ProgramListing>
3403 instance Eq T where
3404   (MkT a) == (MkT b) = ???
3405 </ProgramListing>
3406
3407 But <VarName>a</VarName> and <VarName>b</VarName> have distinct types, and so can't be compared.
3408 It's just about possible to imagine examples in which the derived instance
3409 would make sense, but it seems altogether simpler simply to prohibit such
3410 declarations.  Define your own instances!
3411 </Para>
3412 </ListItem>
3413
3414 </ItemizedList>
3415
3416 </Para>
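
<Para>
Two of these restrictions have straightforward workarounds, sketched below
under the assumption of a recent GHC with a
<Literal>LANGUAGE ExistentialQuantification</Literal> pragma.  The first moves the
pattern match from a <Literal>where</Literal> binding into a <Literal>case</Literal>
expression; the second writes an <Literal>Eq</Literal> instance by hand.  Comparing
only the lengths of the lists is an arbitrary choice made for this
illustration, not a recommended definition of equality.
</Para>

<Para>

<ProgramListing>
{-# LANGUAGE ExistentialQuantification #-}

data Baz = forall a. Eq a => Baz1 a a

data T = forall a. MkT [a]

-- Pattern match in a case expression rather than in a where binding.
f3 :: Baz -> Bool
f3 x = case x of { Baz1 a b -> a == b }

-- A hand-written instance: the elements have unrelated hidden types,
-- so this instance inspects only the lengths of the two lists.
instance Eq T where
  MkT xs == MkT ys = length xs == length ys
</ProgramListing>

</Para>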
3417
3418 </Sect2>
3419
3420 </Sect1>
3421
3422 <Sect1 id="sec-assertions">
3423 <Title>Assertions
3424 <IndexTerm><Primary>Assertions</Primary></IndexTerm>
3425 </Title>
3426
3427 <Para>
3428 If you want to make use of assertions in your standard Haskell code, you
3429 could define a function like the following:
3430 </Para>
3431
3432 <Para>
3433
3434 <ProgramListing>
3435 assert :: Bool -> a -> a
3436 assert False x = error "assertion failed!"
3437 assert _     x = x
3438 </ProgramListing>
3439
3440 </Para>
3441
3442 <Para>
3443 which works, but gives you back a less than useful error message --
3444 an assertion failed, but which and where?
3445 </Para>
3446
3447 <Para>
3448 One way out is to define an extended <Function>assert</Function> function which also
3449 takes a descriptive string to include in the error message and
3450 perhaps combine this with the use of a pre-processor which inserts
3451 the source location where <Function>assert</Function> was used.
3452 </Para>
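
<Para>
Such an extended function might look like the sketch below.  The name
<Function>assertMsg</Function> and the wording of the message are invented for
illustration; the source location is not included, since plain Haskell code
has no way of obtaining it.
</Para>

<Para>

<ProgramListing>
-- Like assert, but the caller supplies a description of the check.
assertMsg :: String -> Bool -> a -> a
assertMsg msg False _ = error ("assertion failed: " ++ msg)
assertMsg _   _     x = x

kelvinToC :: Double -> Double
kelvinToC k =
  assertMsg "kelvinToC: negative temperature" (k >= 0.0) (k - 273.15)
</ProgramListing>

</Para>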
3453
3454 <Para>
3455 GHC offers a helping hand here, doing all of this for you. For every
3456 use of <Function>assert</Function> in the user's source:
3457 </Para>
3458
3459 <Para>
3460
3461 <ProgramListing>
3462 kelvinToC :: Double -> Double
3463 kelvinToC k = assert (k &gt;= 0.0) (k-273.15)
3464 </ProgramListing>
3465
3466 </Para>
3467
3468 <Para>
3469 GHC will rewrite this to also include the source location where the
3470 assertion was made,
3471 </Para>
3472
3473 <Para>
3474
3475 <ProgramListing>
3476 assert pred val ==> assertError "Main.hs|15" pred val
3477 </ProgramListing>
3478
3479 </Para>
3480
3481 <Para>
3482 The rewrite is only performed by the compiler when it spots
3483 applications of <Function>Exception.assert</Function>, so you can still define and
3484 use your own versions of <Function>assert</Function>, should you so wish. If not,
3485 import <Literal>Exception</Literal> to make use of <Function>assert</Function> in your code.
3486 </Para>
3487
3488 <Para>
3489 To have the compiler ignore uses of assert, use the compiler option
3490 <Option>-fignore-asserts</Option>. <IndexTerm><Primary>-fignore-asserts option</Primary></IndexTerm> That is,
3491 expressions of the form <Literal>assert pred e</Literal> will be rewritten to <Literal>e</Literal>.
3492 </Para>
3493
3494 <Para>
3495 Assertion failures can be caught; see the documentation for the
3496 <literal>Exception</literal> library (<xref linkend="sec-Exception">)
3497 for the details.
3498 </Para>
3499
3500 </Sect1>
3501
3502 <Sect1 id="scoped-type-variables">
3503 <Title>Scoped Type Variables
3504 </Title>
3505
3506 <Para>
3507 A <Emphasis>pattern type signature</Emphasis> can introduce a <Emphasis>scoped type
3508 variable</Emphasis>.  For example
3509 </Para>
3510
3511 <Para>
3512
3513 <ProgramListing>
3514 f (xs::[a]) = ys ++ ys
3515            where
3516               ys :: [a]
3517               ys = reverse xs
3518 </ProgramListing>
3519
3520 </Para>
3521
3522 <Para>
3523 The pattern <Literal>(xs::[a])</Literal> includes a type signature for <VarName>xs</VarName>.
3524 This brings the type variable <Literal>a</Literal> into scope; it scopes over
3525 all the patterns and right hand sides for this equation for <Function>f</Function>.
3526 In particular, it is in scope at the type signature for <VarName>ys</VarName>.
3527 </Para>
3528
3529 <Para>
3530 At ordinary type signatures, such as that for <VarName>ys</VarName>, any type variables
3531 mentioned in the type signature <Emphasis>that are not in scope</Emphasis> are
3532 implicitly universally quantified.  (If there are no type variables in
3533 scope, all type variables mentioned in the signature are universally
3534 quantified, which is just as in Haskell 98.)  In this case, since <VarName>a</VarName>
3535 is in scope, it is not universally quantified, so the type of <VarName>ys</VarName> is
3536 the same as that of <VarName>xs</VarName>.  In Haskell 98 it is not possible to declare
3537 a type for <VarName>ys</VarName>; a major benefit of scoped type variables is that
3538 it becomes possible to do so.
3539 </Para>
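
<Para>
As a complete program, the example might read as follows.  This sketch assumes
a recent GHC, where a <Literal>LANGUAGE ScopedTypeVariables</Literal> pragma is
assumed to enable pattern type signatures; the name
<Function>twiceReversed</Function> and the separate top-level signature are additions
made for illustration.
</Para>

<Para>

<ProgramListing>
{-# LANGUAGE ScopedTypeVariables #-}

twiceReversed :: [b] -> [b]
twiceReversed (xs :: [a]) = ys ++ ys
  where
    ys :: [a]        -- the a bound by the pattern, not a fresh variable
    ys = reverse xs
</ProgramListing>

</Para>

<Para>
Here <Literal>twiceReversed [1,2,3]</Literal> evaluates to
<Literal>[3,2,1,3,2,1]</Literal>.
</Para>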
3540
3541 <Para>
3542 Scoped type variables are implemented in both GHC and Hugs.  Where the
3543 implementations differ from the specification below, those differences
3544 are noted.
3545 </Para>
3546
3547 <Para>
3548 So much for the basic idea.  Here are the details.
3549 </Para>
3550
3551 <Sect2>
3552 <Title>Scope and implicit quantification</Title>
3553
3554 <Para>
3555
3556 <ItemizedList>
3557 <ListItem>
3558
3559 <Para>
3560  All the type variables mentioned in the patterns for a single
3561 function definition equation, that are not already in scope,
3562 are brought into scope by the patterns.  We describe this set as
3563 the <Emphasis>type variables bound by the equation</Emphasis>.
3564
3565 </Para>
3566 </ListItem>
3567 <ListItem>
3568
3569 <Para>
3570  The type variables thus brought into scope may be mentioned
3571 in ordinary type signatures or pattern type signatures anywhere within
3572 their scope.
3573
3574 </Para>
3575 </ListItem>
3576 <ListItem>
3577
3578 <Para>
3579  In ordinary type signatures, any type variable mentioned in the
3580 signature that is in scope is <Emphasis>not</Emphasis> universally quantified.
3581
3582 </Para>
3583 </ListItem>
3584 <ListItem>
3585
3586 <Para>
3587  Ordinary type signatures do not bring any new type variables
3588 into scope (except in the type signature itself!). So this is illegal:
3589
3590
3591 <ProgramListing>
3592   f :: a -> a
3593   f x = x::a
3594 </ProgramListing>
3595
3596
3597 It's illegal because <VarName>a</VarName> is not in scope in the body of <Function>f</Function>,
3598 so the ordinary signature <Literal>x::a</Literal> is equivalent to <Literal>x::forall a.a</Literal>;
3599 and that is an incorrect typing.
3600
3601 </Para>
3602 </ListItem>
3603 <ListItem>
3604
3605 <Para>
3606  There is no implicit universal quantification on pattern type
3607 signatures, nor may one write an explicit <Literal>forall</Literal> type in a pattern
3608 type signature.  The pattern type signature is a monotype.
3609
3610 </Para>
3611 </ListItem>
3612 <ListItem>
3613
3614 <Para>
3615
3616 The type variables in the head of a <Literal>class</Literal> or <Literal>instance</Literal> declaration
3617 scope over the methods defined in the <Literal>where</Literal> part.  For example:
3618
3619
3620 <ProgramListing>
3621   class C a where
3622     op :: [a] -> a
3623
3624     op xs = let ys::[a]
3625                 ys = reverse xs
3626             in
3627             head ys
3628 </ProgramListing>
3629
3630
3631 (Not implemented in Hugs yet, Dec 98).
3632 </Para>
3633 </ListItem>
3634
3635 </ItemizedList>
3636
3637 </Para>
3638
3639 </Sect2>
3640
3641 <Sect2>
3642 <Title>Polymorphism</Title>
3643
3644 <Para>
3645
3646 <ItemizedList>
3647 <ListItem>
3648
3649 <Para>
3650  Pattern type signatures are completely orthogonal to ordinary, separate
3651 type signatures.  The two can be used independently or together.  There is
3652 no scoping associated with the names of the type variables in a separate type signature.
3653
3654
3655 <ProgramListing>
3656    f :: [a] -> [a]
3657    f (xs::[b]) = reverse xs
3658 </ProgramListing>
3659
3660
3661 </Para>
3662 </ListItem>
3663 <ListItem>
3664
3665 <Para>
3666  The function must be polymorphic in the type variables
3667 bound by all its equations.  Operationally, the type variables bound
3668 by one equation must not:
3669
3670
3671 <ItemizedList>
3672 <ListItem>
3673
3674 <Para>
3675  Be unified with a type (such as <Literal>Int</Literal>, or <Literal>[a]</Literal>).
3676 </Para>
3677 </ListItem>
3678 <ListItem>
3679
3680 <Para>
3681  Be unified with a type variable free in the environment.
3682 </Para>
3683 </ListItem>
3684 <ListItem>
3685
3686 <Para>
3687  Be unified with each other.  (They may unify with the type variables
3688 bound by another equation for the same function, of course.)
3689 </Para>
3690 </ListItem>
3691
3692 </ItemizedList>
3693
3694
3695 For example, the following all fail to type check:
3696
3697
3698 <ProgramListing>
3699   f (x::a) (y::b) = [x,y]       -- a unifies with b
3700
3701   g (x::a) = x + 1::Int         -- a unifies with Int
3702
3703   h x = let k (y::a) = [x,y]    -- a is free in the
3704         in k x                  -- environment
3705
3706   k (x::a) True    = ...        -- a unifies with Int
3707   k (x::Int) False = ...
3708
3709   w :: [b] -> [b]
3710   w (x::a) = x                  -- a unifies with [b]
3711 </ProgramListing>
3712
3713
3714 </Para>
3715 </ListItem>
3716 <ListItem>
3717
3718 <Para>
3719  The pattern-bound type variable may, however, be constrained
3720 by the context of the principal type, thus:
3721
3722
3723 <ProgramListing>
3724   f (x::a) (y::a) = x+y*2
3725 </ProgramListing>
3726
3727
3728 gets the inferred type: <Literal>forall a. Num a =&gt; a -&gt; a -&gt; a</Literal>.
3729 </Para>
3730 </ListItem>
3731
3732 </ItemizedList>
3733
3734 </Para>
3735
3736 </Sect2>
3737
3738 <Sect2>
3739 <Title>Result type signatures</Title>
3740
3741 <Para>
3742
3743 <ItemizedList>
3744 <ListItem>
3745
3746 <Para>
3747  The result type of a function can be given a signature,
3748 thus:
3749
3750
3751 <ProgramListing>
3752   f (x::a) :: [a] = [x,x,x]
3753 </ProgramListing>
3754
3755
3756 The final <Literal>:: [a]</Literal> after all the patterns gives a signature to the
3757 result type.  Sometimes this is the only way of naming the type variable
3758 you want:
3759
3760
3761 <ProgramListing>
3762   f :: Int -> [a] -> [a]
3763   f n :: ([a] -> [a]) = let g (x::a, y::a) = (y,x)
3764                         in \xs -> map g (reverse xs `zip` xs)
3765 </ProgramListing>
3766
3767
3768 </Para>
3769 </ListItem>
3770
3771 </ItemizedList>
3772
3773 </Para>
3774
3775 <Para>
3776 Result type signatures are not yet implemented in Hugs.
3777 </Para>
3778
3779 </Sect2>
3780
3781 <Sect2>
3782 <Title>Pattern signatures on other constructs</Title>
3783
3784 <Para>
3785
3786 <ItemizedList>
3787 <ListItem>
3788
3789 <Para>
3790  A pattern type signature can be on an arbitrary sub-pattern, not
3791 just on a variable:
3792
3793
3794 <ProgramListing>
3795   f ((x,y)::(a,b)) = (y,x) :: (b,a)
3796 </ProgramListing>
3797
3798
3799 </Para>
3800 </ListItem>
3801 <ListItem>
3802
3803 <Para>
3804  Pattern type signatures, including the result part, can be used
3805 in lambda abstractions:
3806
3807
3808 <ProgramListing>
3809   (\ (x::a, y) :: a -> x)
3810 </ProgramListing>
3811
3812
3813 Type variables bound by these patterns must be polymorphic in
3814 the sense defined above.
3815 For example:
3816
3817
3818 <ProgramListing>
3819   f1 (x::c) = f1 x      -- ok
3820   f2 = \(x::c) -> f2 x  -- not ok
3821 </ProgramListing>
3822
3823
3824 Here, <Function>f1</Function> is OK, but <Function>f2</Function> is not, because <VarName>c</VarName> gets unified
3825 with a type variable free in the environment, in this
3826 case, the type of <Function>f2</Function>, which is in the environment when
3827 the lambda abstraction is checked.
3828
3829 </Para>
3830 </ListItem>
3831 <ListItem>
3832
3833 <Para>
3834  Pattern type signatures, including the result part, can be used
3835 in <Literal>case</Literal> expressions:
3836
3837
3838 <ProgramListing>
3839   case e of { (x::a, y) :: a -> x }
3840 </ProgramListing>
3841
3842
3843 The pattern-bound type variables must, as usual,
3844 be polymorphic in the following sense: each case alternative,
3845 considered as a lambda abstraction, must be polymorphic.
3846 Thus this is OK:
3847
3848
3849 <ProgramListing>
3850   case (True,False) of { (x::a, y) -> x }
3851 </ProgramListing>
3852
3853
3854 Even though the context is that of a pair of booleans,
3855 the alternative itself is polymorphic.  Of course, it is
3856 also OK to say:
3857
3858
3859 <ProgramListing>
3860   case (True,False) of { (x::Bool, y) -> x }
3861 </ProgramListing>
3862
3863
3864 </Para>
3865 </ListItem>
3866 <ListItem>
3867
3868 <Para>
3869 To avoid ambiguity, the type after the &ldquo;<Literal>::</Literal>&rdquo; in a result
3870 pattern signature on a lambda or <Literal>case</Literal> must be atomic (i.e. a single
3871 token or a parenthesised type of some sort).  To see why,
3872 consider how one would parse this:
3873
3874
3875 <ProgramListing>
3876   \ x :: a -> b -> x
3877 </ProgramListing>
3878
3879
3880 </Para>
3881 </ListItem>
3882 <ListItem>
3883
3884 <Para>
3885  Pattern type signatures that bind new type variables
3886 may not be used in pattern bindings at all.
3887 So this is illegal:
3888
3889
3890 <ProgramListing>
3891   f x = let (y, z::a) = x in ...
3892 </ProgramListing>
3893
3894
3895 But these are OK, because they do not bind fresh type variables:
3896
3897
3898 <ProgramListing>
3899   f1 x            = let (y, z::Int) = x in ...
3900   f2 (x::(Int,a)) = let (y, z::a)   = x in ...
3901 </ProgramListing>
3902
3903
However, a single variable is considered a degenerate function binding,
rather than a degenerate pattern binding, so this is permitted, even
3906 though it binds a type variable:
3907
3908
3909 <ProgramListing>
3910   f :: (b->b) = \(x::b) -> x
3911 </ProgramListing>
3912
3913
3914 </Para>
3915 </ListItem>
3916
3917 </ItemizedList>
3918
Such degenerate function bindings do not fall under the monomorphism
restriction.  Thus:
3921 </Para>
3922
3923 <Para>
3924
3925 <ProgramListing>
  g :: a -> a -> Bool = \x y -> x==y
3927 </ProgramListing>
3928
3929 </Para>
3930
3931 <Para>
3932 Here <Function>g</Function> has type <Literal>forall a. Eq a =&gt; a -&gt; a -&gt; Bool</Literal>, just as if
3933 <Function>g</Function> had a separate type signature.  Lacking a type signature, <Function>g</Function>
3934 would get a monomorphic type.
3935 </Para>
3936
3937 </Sect2>
3938
3939 <Sect2>
3940 <Title>Existentials</Title>
3941
3942 <Para>
3943
3944 <ItemizedList>
3945 <ListItem>
3946
3947 <Para>
3948  Pattern type signatures can bind existential type variables.
3949 For example:
3950
3951
3952 <ProgramListing>
3953   data T = forall a. MkT [a]
3954
3955   f :: T -> T
3956   f (MkT [t::a]) = MkT t3
3957                  where
3958                    t3::[a] = [t,t,t]
3959 </ProgramListing>
3960
3961
3962 </Para>
3963 </ListItem>
3964
3965 </ItemizedList>
3966
3967 </Para>
3968
3969 </Sect2>
3970
3971 </Sect1>
3972
3973 <Sect1 id="pragmas">
3974 <Title>Pragmas
3975 </Title>
3976
3977 <Para>
3978 GHC supports several pragmas, or instructions to the compiler placed
3979 in the source code.  Pragmas don't affect the meaning of the program,
3980 but they might affect the efficiency of the generated code.
3981 </Para>
3982
3983 <Sect2 id="inline-pragma">
3984 <Title>INLINE pragma
3985
3986 <IndexTerm><Primary>INLINE pragma</Primary></IndexTerm>
3987 <IndexTerm><Primary>pragma, INLINE</Primary></IndexTerm></Title>
3988
3989 <Para>
3990 GHC (with <Option>-O</Option>, as always) tries to inline (or &ldquo;unfold&rdquo;)
3991 functions/values that are &ldquo;small enough,&rdquo; thus avoiding the call
3992 overhead and possibly exposing other more-wonderful optimisations.
3993 </Para>
3994
3995 <Para>
3996 You will probably see these unfoldings (in Core syntax) in your
3997 interface files.
3998 </Para>
3999
4000 <Para>
4001 Normally, if GHC decides a function is &ldquo;too expensive&rdquo; to inline, it
4002 will not do so, nor will it export that unfolding for other modules to
4003 use.
4004 </Para>
4005
4006 <Para>
4007 The sledgehammer you can bring to bear is the
4008 <Literal>INLINE</Literal><IndexTerm><Primary>INLINE pragma</Primary></IndexTerm> pragma, used thusly:
4009
4010 <ProgramListing>
4011 key_function :: Int -> String -> (Bool, Double)
4012
4013 #ifdef __GLASGOW_HASKELL__
4014 {-# INLINE key_function #-}
4015 #endif
4016 </ProgramListing>
4017
4018 (You don't need to do the C pre-processor carry-on unless you're going
4019 to stick the code through HBC&mdash;it doesn't like <Literal>INLINE</Literal> pragmas.)
4020 </Para>
4021
4022 <Para>
4023 The major effect of an <Literal>INLINE</Literal> pragma is to declare a function's
4024 &ldquo;cost&rdquo; to be very low.  The normal unfolding machinery will then be
4025 very keen to inline it.
4026 </Para>
4027
4028 <Para>
4029 An <Literal>INLINE</Literal> pragma for a function can be put anywhere its type
4030 signature could be put.
4031 </Para>
4032
4033 <Para>
4034 <Literal>INLINE</Literal> pragmas are a particularly good idea for the
4035 <Literal>then</Literal>/<Literal>return</Literal> (or <Literal>bind</Literal>/<Literal>unit</Literal>) functions in a monad.
4036 For example, in GHC's own <Literal>UniqueSupply</Literal> monad code, we have:
4037
4038 <ProgramListing>
4039 #ifdef __GLASGOW_HASKELL__
4040 {-# INLINE thenUs #-}
4041 {-# INLINE returnUs #-}
4042 #endif
4043 </ProgramListing>
4044
4045 </Para>
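<Para>
As a rough illustration of the advice above, here is a minimal sketch of a
state-like monad whose bind and unit functions carry <Literal>INLINE</Literal>
pragmas.  The <Literal>Supply</Literal> type and the names <Function>thenS</Function>,
<Function>returnS</Function>, <Function>fresh</Function> and <Function>runSupply</Function>
are invented for this example; this is not GHC's actual <Literal>UniqueSupply</Literal> code.

<ProgramListing>
module SupplySketch where

-- Hypothetical supply monad: a computation takes the next free number and
-- returns a result together with the updated counter.
newtype Supply a = Supply { unSupply :: Int -> (a, Int) }

{-# INLINE thenS #-}
thenS :: Supply a -> (a -> Supply b) -> Supply b
thenS m k = Supply (\n -> let (a, n') = unSupply m n
                          in  unSupply (k a) n')

{-# INLINE returnS #-}
returnS :: a -> Supply a
returnS a = Supply (\n -> (a, n))

-- Hand out the current counter value and bump the counter.
fresh :: Supply Int
fresh = Supply (\n -> (n, n + 1))

-- Run a computation from a starting counter, discarding the final counter.
runSupply :: Supply a -> Int -> a
runSupply m n = fst (unSupply m n)

-- Every use of thenS/returnS below is a candidate for unfolding.
twoFresh :: Supply (Int, Int)
twoFresh = fresh `thenS` \a ->
           fresh `thenS` \b ->
           returnS (a, b)
</ProgramListing>

The pragmas do not change what <Function>twoFresh</Function> computes; they only
make the chain of <Function>thenS</Function> calls cheap to unfold.
</Para>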
4046
4047 </Sect2>
4048
4049 <Sect2 id="noinline-pragma">
4050 <Title>NOINLINE pragma
4051 </Title>
4052
4053 <Para>
4054 <IndexTerm><Primary>NOINLINE pragma</Primary></IndexTerm>
4055 <IndexTerm><Primary>pragma, NOINLINE</Primary></IndexTerm>
4056 </Para>
4057
4058 <Para>
4059 The <Literal>NOINLINE</Literal> pragma does exactly what you'd expect: it stops the
4060 named function from being inlined by the compiler.  You shouldn't ever
4061 need to do this, unless you're very cautious about code size.
4062 </Para>
4063
4064 </Sect2>
4065
4066 <Sect2 id="specialize-pragma">
4067 <Title>SPECIALIZE pragma
4068 </Title>
4069
4070 <Para>
4071 <IndexTerm><Primary>SPECIALIZE pragma</Primary></IndexTerm>
4072 <IndexTerm><Primary>pragma, SPECIALIZE</Primary></IndexTerm>
4073 <IndexTerm><Primary>overloading, death to</Primary></IndexTerm>
4074 </Para>
4075
4076 <Para>
4077 (UK spelling also accepted.)  For key overloaded functions, you can
4078 create extra versions (NB: more code space) specialised to particular
4079 types.  Thus, if you have an overloaded function:
4080 </Para>
4081
4082 <Para>
4083
4084 <ProgramListing>
4085 hammeredLookup :: Ord key => [(key, value)] -> key -> value
4086 </ProgramListing>
4087
4088 </Para>
4089
4090 <Para>
4091 If it is heavily used on lists with <Literal>Widget</Literal> keys, you could
4092 specialise it as follows:
4093
4094 <ProgramListing>
4095 {-# SPECIALIZE hammeredLookup :: [(Widget, value)] -> Widget -> value #-}
4096 </ProgramListing>
4097
4098 </Para>
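<Para>
To make the fragment above concrete, here is a small self-contained sketch.
The <Literal>Widget</Literal> type, the body of <Function>hammeredLookup</Function>
and the <VarName>widgetNames</VarName> table are assumptions made up for illustration;
only the type signature and the pragma come from the text.

<ProgramListing>
module LookupSketch where

-- Hypothetical key type; any type with an Ord instance would do.
newtype Widget = Widget Int deriving (Eq, Ord, Show)

-- Overloaded lookup in an association list (assumed definition).
hammeredLookup :: Ord key => [(key, value)] -> key -> value
hammeredLookup [] _ = error "hammeredLookup: key not found"
hammeredLookup ((k, v) : rest) key =
  case compare k key of
    EQ -> v
    _  -> hammeredLookup rest key

-- Ask for an extra copy with the Ord dictionary fixed to Widget's.
{-# SPECIALIZE hammeredLookup :: [(Widget, value)] -> Widget -> value #-}

widgetNames :: [(Widget, String)]
widgetNames = [(Widget 1, "sprocket"), (Widget 2, "flange")]
</ProgramListing>

A call such as <Literal>hammeredLookup widgetNames (Widget 2)</Literal> has the
specialised type, so with <Option>-O</Option> it can use the specialised copy
rather than passing an <Literal>Ord</Literal> dictionary at run time.
</Para>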
4099
4100 <Para>
4101 To get very fancy, you can also specify a named function to use for
4102 the specialised value, by adding <Literal>= blah</Literal>, as in:
4103
4104 <ProgramListing>
4105 {-# SPECIALIZE hammeredLookup :: ...as before... = blah #-}
4106 </ProgramListing>
4107
4108 It's <Emphasis>Your Responsibility</Emphasis> to make sure that <Function>blah</Function> really
4109 behaves as a specialised version of <Function>hammeredLookup</Function>!!!
4110 </Para>
4111
4112 <Para>
4113 NOTE: the <Literal>=blah</Literal> feature isn't implemented in GHC 4.xx.
4114 </Para>
4115
4116 <Para>
4117 An example in which the <Literal>= blah</Literal> form will Win Big:
4118
4119 <ProgramListing>
4120 toDouble :: Real a => a -> Double
4121 toDouble = fromRational . toRational
4122
4123 {-# SPECIALIZE toDouble :: Int -> Double = i2d #-}
4124 i2d (I# i) = D# (int2Double# i) -- uses Glasgow prim-op directly
4125 </ProgramListing>
4126
4127 The <Function>i2d</Function> function is virtually one machine instruction; the
4128 default conversion&mdash;via an intermediate <Literal>Rational</Literal>&mdash;is obscenely
4129 expensive by comparison.
4130 </Para>
4131
4132 <Para>
4133 By using the US spelling, your <Literal>SPECIALIZE</Literal> pragma will work with
4134 HBC, too.  Note that HBC doesn't support the <Literal>= blah</Literal> form.
4135 </Para>
4136
4137 <Para>
4138 A <Literal>SPECIALIZE</Literal> pragma for a function can be put anywhere its type
4139 signature could be put.
4140 </Para>
4141
4142 </Sect2>
4143
4144 <Sect2 id="specialize-instance-pragma">
4145 <Title>SPECIALIZE instance pragma
4146 </Title>
4147
4148 <Para>
4149 <IndexTerm><Primary>SPECIALIZE pragma</Primary></IndexTerm>
4150 <IndexTerm><Primary>overloading, death to</Primary></IndexTerm>
4151 Same idea, except for instance declarations.  For example:
4152
4153 <ProgramListing>
4154 instance (Eq a) => Eq (Foo a) where { ... usual stuff ... }
4155
{-# SPECIALIZE instance Eq (Foo [(Int, Bar)]) #-}
4157 </ProgramListing>
4158
4159 Compatible with HBC, by the way.
4160 </Para>
4161
4162 </Sect2>
4163
4164 <Sect2 id="line-pragma">
4165 <Title>LINE pragma
4166 </Title>
4167
4168 <Para>
4169 <IndexTerm><Primary>LINE pragma</Primary></IndexTerm>
4170 <IndexTerm><Primary>pragma, LINE</Primary></IndexTerm>
4171 </Para>
4172
4173 <Para>
4174 This pragma is similar to C's <Literal>&num;line</Literal> pragma, and is mainly for use in
4175 automatically generated Haskell code.  It lets you specify the line
4176 number and filename of the original code; for example
4177 </Para>
4178
4179 <Para>
4180
4181 <ProgramListing>
4182 {-# LINE 42 "Foo.vhs" #-}
4183 </ProgramListing>
4184
4185 </Para>
4186
4187 <Para>
4188 if you'd generated the current file from something called <Filename>Foo.vhs</Filename>
4189 and this line corresponds to line 42 in the original.  GHC will adjust
4190 its error messages to refer to the line/file named in the <Literal>LINE</Literal>
4191 pragma.
4192 </Para>
4193
4194 </Sect2>
4195
4196 <Sect2>
4197 <Title>RULES pragma</Title>
4198
4199 <Para>
4200 The RULES pragma lets you specify rewrite rules.  It is described in
4201 <XRef LinkEnd="rewrite-rules">.
4202 </Para>
4203
4204 </Sect2>
4205
4206 </Sect1>
4207
4208 <Sect1 id="rewrite-rules">
4209 <Title>Rewrite rules
4210
<IndexTerm><Primary>RULES pragma</Primary></IndexTerm>
4212 <IndexTerm><Primary>pragma, RULES</Primary></IndexTerm>
4213 <IndexTerm><Primary>rewrite rules</Primary></IndexTerm></Title>
4214
4215 <Para>
4216 The programmer can specify rewrite rules as part of the source program
4217 (in a pragma).  GHC applies these rewrite rules wherever it can.
4218 </Para>
4219
4220 <Para>
4221 Here is an example:
4222
4223 <ProgramListing>
4224   {-# RULES
4225         "map/map"       forall f g xs. map f (map g xs) = map (f.g) xs
4226   #-}
4227 </ProgramListing>
4228
4229 </Para>
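<Para>
For experimenting, it can be easier to try such a rule on a private copy of
<Function>map</Function> than on the Prelude's.  The following is a minimal sketch
under that assumption; <Function>myMap</Function> and
<Function>incrementDoubled</Function> are names invented here.

<ProgramListing>
module MapMapSketch where

-- A local copy of map.  It is recursive, so GHC is unlikely to inline it,
-- and calls to it stay visible to the rule matcher.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []       = []
myMap f (x : xs) = f x : myMap f xs

{-# RULES
"myMap/myMap"   forall f g xs.  myMap f (myMap g xs) = myMap (f . g) xs
  #-}

-- With -O the two traversals below may be rewritten into one;
-- the result is the same either way.
incrementDoubled :: [Int] -> [Int]
incrementDoubled xs = myMap (+ 1) (myMap (* 2) xs)
</ProgramListing>

Whether the rule fired can be checked with the flags described at the end of
this section.
</Para>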
4230
4231 <Sect2>
4232 <Title>Syntax</Title>
4233
4234 <Para>
4235 From a syntactic point of view:
4236
4237 <ItemizedList>
4238 <ListItem>
4239
4240 <Para>
4241  Each rule has a name, enclosed in double quotes.  The name itself has
4242 no significance at all.  It is only used when reporting how many times the rule fired.
4243 </Para>
4244 </ListItem>
4245 <ListItem>
4246
4247 <Para>
4248  There may be zero or more rules in a <Literal>RULES</Literal> pragma.
4249 </Para>
4250 </ListItem>
4251 <ListItem>
4252
4253 <Para>
4254  Layout applies in a <Literal>RULES</Literal> pragma.  Currently no new indentation level
4255 is set, so you must lay out your rules starting in the same column as the
4256 enclosing definitions.
4257 </Para>
4258 </ListItem>
4259 <ListItem>
4260
4261 <Para>
4262  Each variable mentioned in a rule must either be in scope (e.g. <Function>map</Function>),
4263 or bound by the <Literal>forall</Literal> (e.g. <Function>f</Function>, <Function>g</Function>, <Function>xs</Function>).  The variables bound by
4264 the <Literal>forall</Literal> are called the <Emphasis>pattern</Emphasis> variables.  They are separated
4265 by spaces, just like in a type <Literal>forall</Literal>.
4266 </Para>
4267 </ListItem>
4268 <ListItem>
4269
4270 <Para>
4271  A pattern variable may optionally have a type signature.
4272 If the type of the pattern variable is polymorphic, it <Emphasis>must</Emphasis> have a type signature.
4273 For example, here is the <Literal>foldr/build</Literal> rule:
4274
4275 <ProgramListing>
4276 "fold/build"  forall k z (g::forall b. (a->b->b) -> b -> b) .
4277               foldr k z (build g) = g k z
4278 </ProgramListing>
4279
4280 Since <Function>g</Function> has a polymorphic type, it must have a type signature.
4281
4282 </Para>
4283 </ListItem>
4284 <ListItem>
4285
4286 <Para>
4287 The left hand side of a rule must consist of a top-level variable applied
4288 to arbitrary expressions.  For example, this is <Emphasis>not</Emphasis> OK:
4289
4290 <ProgramListing>
4291 "wrong1"   forall e1 e2.  case True of { True -> e1; False -> e2 } = e1
4292 "wrong2"   forall f.      f True = True
4293 </ProgramListing>
4294
In <Literal>"wrong1"</Literal>, the LHS is not an application; in <Literal>"wrong2"</Literal>, the LHS has a pattern variable
in the head.
4297 </Para>
4298 </ListItem>
4299 <ListItem>
4300
4301 <Para>
4302  A rule does not need to be in the same module as (any of) the
4303 variables it mentions, though of course they need to be in scope.
4304 </Para>
4305 </ListItem>
4306 <ListItem>
4307
4308 <Para>
4309  Rules are automatically exported from a module, just as instance declarations are.
4310 </Para>
4311 </ListItem>
4312
4313 </ItemizedList>
4314
4315 </Para>
4316
4317 </Sect2>
4318
4319 <Sect2>
4320 <Title>Semantics</Title>
4321
4322 <Para>
4323 From a semantic point of view:
4324
4325 <ItemizedList>
4326 <ListItem>
4327
4328 <Para>
4329 Rules are only applied if you use the <Option>-O</Option> flag.
4330 </Para>
4331 </ListItem>
4332
4333 <ListItem>
4334 <Para>
4335  Rules are regarded as left-to-right rewrite rules.
4336 When GHC finds an expression that is a substitution instance of the LHS
4337 of a rule, it replaces the expression by the (appropriately-substituted) RHS.
4338 By "a substitution instance" we mean that the LHS can be made equal to the
4339 expression by substituting for the pattern variables.
4340
4341 </Para>
4342 </ListItem>
4343 <ListItem>
4344
4345 <Para>
4346  The LHS and RHS of a rule are typechecked, and must have the
4347 same type.
4348
4349 </Para>
4350 </ListItem>
4351 <ListItem>
4352
4353 <Para>
4354  GHC makes absolutely no attempt to verify that the LHS and RHS
of a rule have the same meaning.  That is undecidable in general, and
4356 infeasible in most interesting cases.  The responsibility is entirely the programmer's!
4357
4358 </Para>
4359 </ListItem>
4360 <ListItem>
4361
4362 <Para>
4363  GHC makes no attempt to make sure that the rules are confluent or
4364 terminating.  For example:
4365
4366 <ProgramListing>
  "loop"        forall x y.  f x y = f y x
4368 </ProgramListing>
4369
4370 This rule will cause the compiler to go into an infinite loop.
4371
4372 </Para>
4373 </ListItem>
4374 <ListItem>
4375
4376 <Para>
4377  If more than one rule matches a call, GHC will choose one arbitrarily to apply.
4378
4379 </Para>
4380 </ListItem>
4381 <ListItem>
4382 <Para>
4383  GHC currently uses a very simple, syntactic, matching algorithm
4384 for matching a rule LHS with an expression.  It seeks a substitution
4385 which makes the LHS and expression syntactically equal modulo alpha
4386 conversion.  The pattern (rule), but not the expression, is eta-expanded if
necessary.  (Eta-expanding the expression can lead to laziness bugs.)
Matching is not done modulo beta conversion (that would be higher-order matching).
4389 </Para>
4390
4391 <Para>
4392 Matching is carried out on GHC's intermediate language, which includes
4393 type abstractions and applications.  So a rule only matches if the
4394 types match too.  See <XRef LinkEnd="rule-spec"> below.
4395 </Para>
4396 </ListItem>
4397 <ListItem>
4398
4399 <Para>
4400  GHC keeps trying to apply the rules as it optimises the program.
4401 For example, consider:
4402
4403 <ProgramListing>
4404   let s = map f
4405       t = map g
4406   in
4407   s (t xs)
4408 </ProgramListing>
4409
4410 The expression <Literal>s (t xs)</Literal> does not match the rule <Literal>"map/map"</Literal>, but GHC
4411 will substitute for <VarName>s</VarName> and <VarName>t</VarName>, giving an expression which does match.
4412 If <VarName>s</VarName> or <VarName>t</VarName> was (a) used more than once, and (b) large or a redex, then it would
4413 not be substituted, and the rule would not fire.
4414
4415 </Para>
4416 </ListItem>
4417 <ListItem>
4418
4419 <Para>
4420  In the earlier phases of compilation, GHC inlines <Emphasis>nothing
4421 that appears on the LHS of a rule</Emphasis>, because once you have substituted
4422 for something you can't match against it (given the simple minded
4423 matching).  So if you write the rule
4424
4425 <ProgramListing>
        "map/map"       forall f g.  map f . map g = map (f.g)
4427 </ProgramListing>
4428
4429 this <Emphasis>won't</Emphasis> match the expression <Literal>map f (map g xs)</Literal>.
4430 It will only match something written with explicit use of ".".
4431 Well, not quite.  It <Emphasis>will</Emphasis> match the expression
4432
4433 <ProgramListing>
4434 wibble f g xs
4435 </ProgramListing>
4436
4437 where <Function>wibble</Function> is defined:
4438
4439 <ProgramListing>
4440 wibble f g = map f . map g
4441 </ProgramListing>
4442
4443 because <Function>wibble</Function> will be inlined (it's small).
4444
4445 Later on in compilation, GHC starts inlining even things on the
4446 LHS of rules, but still leaves the rules enabled.  This inlining
4447 policy is controlled by the per-simplification-pass flag <Option>-finline-phase</Option><Emphasis>n</Emphasis>.
4448
4449 </Para>
4450 </ListItem>
4451 <ListItem>
4452
4453 <Para>
4454  All rules are implicitly exported from the module, and are therefore
4455 in force in any module that imports the module that defined the rule, directly
4456 or indirectly.  (That is, if A imports B, which imports C, then C's rules are
4457 in force when compiling A.)  The situation is very similar to that for instance
4458 declarations.
4459 </Para>
4460 </ListItem>
4461
4462 </ItemizedList>
4463
4464 </Para>
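<Para>
Because matching is purely syntactic, one way to cover both the applied and the
composed form of a call is to state a rule for each.  The sketch below does this
for a private copy of <Function>map</Function>; the names <Function>myMap</Function>,
<Function>applied</Function> and <Function>composed</Function> are invented for the
example, and whether either rule actually fires still depends on
<Option>-O</Option> and on what has been inlined by the time the matcher runs.

<ProgramListing>
module ComposeRuleSketch where

myMap :: (a -> b) -> [a] -> [b]
myMap _ []       = []
myMap f (x : xs) = f x : myMap f xs

{-# RULES
"myMap/myMap"   forall f g xs.  myMap f (myMap g xs) = myMap (f . g) xs
"myMap.myMap"   forall f g.     myMap f . myMap g    = myMap (f . g)
  #-}

-- Written with explicit application: a candidate for "myMap/myMap".
applied :: [Int] -> [Int]
applied xs = myMap (+ 1) (myMap (* 2) xs)

-- Written with explicit composition: a candidate for "myMap.myMap".
composed :: [Int] -> [Int]
composed = myMap (+ 1) . myMap (* 2)
</ProgramListing>

Both functions compute the same list whether or not a rule fires, which is
exactly the property the programmer is promising when writing the rules.
</Para>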
4465
4466 </Sect2>
4467
4468 <Sect2>
4469 <Title>List fusion</Title>
4470
4471 <Para>
4472 The RULES mechanism is used to implement fusion (deforestation) of common list functions.
4473 If a "good consumer" consumes an intermediate list constructed by a "good producer", the
4474 intermediate list should be eliminated entirely.
4475 </Para>
4476
4477 <Para>
4478 The following are good producers:
4479
4480 <ItemizedList>
4481 <ListItem>
4482
4483 <Para>
4484  List comprehensions
4485 </Para>
4486 </ListItem>
4487 <ListItem>
4488
4489 <Para>
4490  Enumerations of <Literal>Int</Literal> and <Literal>Char</Literal> (e.g. <Literal>['a'..'z']</Literal>).
4491 </Para>
4492 </ListItem>
4493 <ListItem>
4494
4495 <Para>
4496  Explicit lists (e.g. <Literal>[True, False]</Literal>)
4497 </Para>
4498 </ListItem>
4499 <ListItem>
4500
4501 <Para>
 The cons constructor (e.g. <Literal>3:4:[]</Literal>)
4503 </Para>
4504 </ListItem>
4505 <ListItem>
4506
4507 <Para>
4508  <Function>++</Function>
4509 </Para>
4510 </ListItem>
4511 <ListItem>
4512
4513 <Para>
4514  <Function>map</Function>
4515 </Para>
4516 </ListItem>
4517 <ListItem>
4518
4519 <Para>
4520  <Function>filter</Function>
4521 </Para>
4522 </ListItem>
4523 <ListItem>
4524
4525 <Para>
4526  <Function>iterate</Function>, <Function>repeat</Function>
4527 </Para>
4528 </ListItem>
4529 <ListItem>
4530
4531 <Para>
4532  <Function>zip</Function>, <Function>zipWith</Function>
4533 </Para>
4534 </ListItem>
4535
4536 </ItemizedList>
4537
4538 </Para>
4539
4540 <Para>
4541 The following are good consumers:
4542
4543 <ItemizedList>
4544 <ListItem>
4545
4546 <Para>
4547  List comprehensions
4548 </Para>
4549 </ListItem>
4550 <ListItem>
4551
4552 <Para>
4553  <Function>array</Function> (on its second argument)
4554 </Para>
4555 </ListItem>
4556 <ListItem>
4557
4558 <Para>
4559  <Function>length</Function>
4560 </Para>
4561 </ListItem>
4562 <ListItem>
4563
4564 <Para>
4565  <Function>++</Function> (on its first argument)
4566 </Para>
4567 </ListItem>
4568 <ListItem>
4569
4570 <Para>
4571  <Function>map</Function>
4572 </Para>
4573 </ListItem>
4574 <ListItem>
4575
4576 <Para>
4577  <Function>filter</Function>
4578 </Para>
4579 </ListItem>
4580 <ListItem>
4581
4582 <Para>
4583  <Function>concat</Function>
4584 </Para>
4585 </ListItem>
4586 <ListItem>
4587
4588 <Para>
4589  <Function>unzip</Function>, <Function>unzip2</Function>, <Function>unzip3</Function>, <Function>unzip4</Function>
4590 </Para>
4591 </ListItem>
4592 <ListItem>
4593
4594 <Para>
4595  <Function>zip</Function>, <Function>zipWith</Function> (but on one argument only; if both are good producers, <Function>zip</Function>
4596 will fuse with one but not the other)
4597 </Para>
4598 </ListItem>
4599 <ListItem>
4600
4601 <Para>
4602  <Function>partition</Function>
4603 </Para>
4604 </ListItem>
4605 <ListItem>
4606
4607 <Para>
4608  <Function>head</Function>
4609 </Para>
4610 </ListItem>
4611 <ListItem>
4612
4613 <Para>
4614  <Function>and</Function>, <Function>or</Function>, <Function>any</Function>, <Function>all</Function>
4615 </Para>
4616 </ListItem>
4617 <ListItem>
4618
4619 <Para>
4620  <Function>sequence&lowbar;</Function>
4621 </Para>
4622 </ListItem>
4623 <ListItem>
4624
4625 <Para>
4626  <Function>msum</Function>
4627 </Para>
4628 </ListItem>
4629 <ListItem>
4630
4631 <Para>
4632  <Function>sortBy</Function>
4633 </Para>
4634 </ListItem>
4635
4636 </ItemizedList>
4637
4638 </Para>
4639
4640 <Para>
4641 So, for example, the following should generate no intermediate lists:
4642
4643 <ProgramListing>
4644 array (1,10) [(i,i*i) | i &#60;- map (+ 1) [0..9]]
4645 </ProgramListing>
4646
4647 </Para>
4648
4649 <Para>
4650 This list could readily be extended; if there are Prelude functions that you use
4651 a lot which are not included, please tell us.
4652 </Para>
4653
4654 <Para>
4655 If you want to write your own good consumers or producers, look at the
4656 Prelude definitions of the above functions to see how to do so.
4657 </Para>
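<Para>
The general shape of a producer/consumer pair can be sketched without touching
the Prelude.  In the following, <Function>myBuild</Function>,
<Function>myFoldr</Function>, <Function>countDown</Function> and
<Function>sumCountDown</Function> are stand-ins invented for this example, not the
library definitions.  The <Literal>LANGUAGE</Literal> line is an assumption about
the compiler: it is what recent GHCs expect in place of
<Option>-fglasgow-exts</Option>.  The <Literal>NOINLINE</Literal> pragmas are a blunt
way of keeping the calls intact for the matcher.

<ProgramListing>
{-# LANGUAGE RankNTypes, ScopedTypeVariables #-}
module FusionSketch where

-- Private stand-in for build: the producer is abstracted over cons and nil.
myBuild :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a]
myBuild g = g (:) []
{-# NOINLINE myBuild #-}

-- Private stand-in for foldr.
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []       = z
myFoldr k z (x : xs) = k x (myFoldr k z xs)
{-# NOINLINE myFoldr #-}

-- g is polymorphic, so it needs a type signature in the rule.
{-# RULES
"myFoldr/myBuild"  forall k z (g :: forall b. (a -> b -> b) -> b -> b).
                   myFoldr k z (myBuild g) = g k z
  #-}

-- A producer: uses the supplied cons and nil instead of (:) and [].
countDown :: Int -> [Int]
countDown n = myBuild (\cons nil ->
  let go i = if i > 0 then cons i (go (i - 1)) else nil in go n)

-- A consumer applied directly to a producer, the shape the rule expects.
sumCountDown :: Int -> Int
sumCountDown n = myFoldr (+) 0 (myBuild (\cons nil ->
  let go i = if i > 0 then cons i (go (i - 1)) else nil in go n))
</ProgramListing>

If the rule fires, <Function>sumCountDown</Function> becomes a loop that adds the
numbers directly and never allocates the list; if it does not, the answer is
the same but the list is built and then consumed.
</Para>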
4658
4659 </Sect2>
4660
4661 <Sect2 id="rule-spec">
4662 <Title>Specialisation
4663 </Title>
4664
4665 <Para>
4666 Rewrite rules can be used to get the same effect as a feature
present in earlier versions of GHC:
4668
4669 <ProgramListing>
4670   {-# SPECIALIZE fromIntegral :: Int8 -> Int16 = int8ToInt16 #-}
4671 </ProgramListing>
4672
4673 This told GHC to use <Function>int8ToInt16</Function> instead of <Function>fromIntegral</Function> whenever
4674 the latter was called with type <Literal>Int8 -&gt; Int16</Literal>.  That is, rather than
4675 specialising the original definition of <Function>fromIntegral</Function> the programmer is
4676 promising that it is safe to use <Function>int8ToInt16</Function> instead.
4677 </Para>
4678
4679 <Para>
4680 This feature is no longer in GHC.  But rewrite rules let you do the
4681 same thing:
4682
4683 <ProgramListing>
4684 {-# RULES
4685   "fromIntegral/Int8/Int16" fromIntegral = int8ToInt16
4686 #-}
4687 </ProgramListing>
4688
4689 This slightly odd-looking rule instructs GHC to replace <Function>fromIntegral</Function>
4690 by <Function>int8ToInt16</Function> <Emphasis>whenever the types match</Emphasis>.  Speaking more operationally,
4691 GHC adds the type and dictionary applications to get the typed rule
4692
4693 <ProgramListing>
4694 forall (d1::Integral Int8) (d2::Num Int16) .
4695         fromIntegral Int8 Int16 d1 d2 = int8ToInt16
4696 </ProgramListing>
4697
4698 What is more,
this rule does not need to be in the same file as <Function>fromIntegral</Function>,
4700 unlike the <Literal>SPECIALISE</Literal> pragmas which currently do (so that they
4701 have an original definition available to specialise).
4702 </Para>
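<Para>
The same trick can be tried on a small overloaded function of your own.  The
sketch below reuses the <Function>toDouble</Function> function from the
<Literal>SPECIALIZE</Literal> section; <Function>intToDouble</Function> and
<Function>halfOf</Function> are helpers invented for the example, and the
<Literal>NOINLINE</Literal> pragma is there only so that calls to
<Function>toDouble</Function> survive long enough to be matched.

<ProgramListing>
module RuleSpecSketch where

-- Overloaded conversion that goes via an intermediate Rational.
toDouble :: Real a => a -> Double
toDouble = fromRational . toRational
{-# NOINLINE toDouble #-}

-- Hand-written replacement for the Int case (assumed helper).
intToDouble :: Int -> Double
intToDouble = fromIntegral

-- No forall is needed: the only thing to match is the type at which
-- toDouble is used, here Int -> Double.
{-# RULES
"toDouble/Int"  toDouble = intToDouble
  #-}

halfOf :: Int -> Double
halfOf n = toDouble n / 2
</ProgramListing>

As with any rule, it is up to you to make sure that
<Function>intToDouble</Function> really agrees with <Function>toDouble</Function>
at type <Literal>Int</Literal>.
</Para>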
4703
4704 </Sect2>
4705
4706 <Sect2>
4707 <Title>Controlling what's going on</Title>
4708
4709 <Para>
4710
4711 <ItemizedList>
4712 <ListItem>
4713
4714 <Para>
4715  Use <Option>-ddump-rules</Option> to see what transformation rules GHC is using.
4716 </Para>
4717 </ListItem>
4718 <ListItem>
4719
4720 <Para>
4721  Use <Option>-ddump-simpl-stats</Option> to see what rules are being fired.
4722 If you add <Option>-dppr-debug</Option> you get a more detailed listing.
4723 </Para>
4724 </ListItem>
4725 <ListItem>
4726
4727 <Para>
 The definition of (say) <Function>build</Function> in <FileName>PrelBase.lhs</FileName> looks like this:
4729
4730 <ProgramListing>
4731         build   :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a]
4732         {-# INLINE build #-}
4733         build g = g (:) []
4734 </ProgramListing>
4735
4736 Notice the <Literal>INLINE</Literal>!  That prevents <Literal>(:)</Literal> from being inlined when compiling
4737 <Literal>PrelBase</Literal>, so that an importing module will &ldquo;see&rdquo; the <Literal>(:)</Literal>, and can
4738 match it on the LHS of a rule.  <Literal>INLINE</Literal> prevents any inlining happening
4739 in the RHS of the <Literal>INLINE</Literal> thing.  I regret the delicacy of this.
4740
4741 </Para>
4742 </ListItem>
4743 <ListItem>
4744
4745 <Para>
4746  In <Filename>ghc/lib/std/PrelBase.lhs</Filename> look at the rules for <Function>map</Function> to
4747 see how to write rules that will do fusion and yet give an efficient
program even if fusion doesn't happen.  There are more rules in <Filename>PrelList.lhs</Filename>.
4749 </Para>
4750 </ListItem>
4751
4752 </ItemizedList>
4753
4754 </Para>
4755
4756 </Sect2>
4757
4758 </Sect1>
4759
4760 <!-- Emacs stuff:
4761      ;;; Local Variables: ***
4762      ;;; mode: sgml ***
4763      ;;; sgml-parent-document: ("users_guide.sgml" "book" "chapter" "sect1") ***
4764      ;;; End: ***
4765  -->