ghc/docs/users_guide/glasgow_exts.sgml

   1 <Para>
   2 <IndexTerm><Primary>language, GHC</Primary></IndexTerm>
   3 <IndexTerm><Primary>extensions, GHC</Primary></IndexTerm>
   4 As with all known Haskell systems, GHC implements some extensions to
   5 the language.  To use them, you'll need to give a <Option>-fglasgow-exts</Option>
   6 <IndexTerm><Primary>-fglasgow-exts option</Primary></IndexTerm> option.
   7 </Para>
   8
   9 <Para>
  10 Virtually all of the Glasgow extensions serve to give you access to
  11 the underlying facilities with which we implement Haskell.  Thus, you
  12 can get at the Raw Iron, if you are willing to write some non-standard
  13 code at a more primitive level.  You need not be &ldquo;stuck&rdquo; on
  14 performance because of the implementation costs of Haskell's
  15 &ldquo;high-level&rdquo; features&mdash;you can always code &ldquo;under&rdquo; them.  In an extreme case, you can write all your time-critical code in C, and then just glue it together with Haskell!
  16 </Para>
  17
  18 <Para>
  19 Executive summary of our extensions:
  20 </Para>
  21
  22 <Para>
  23 <VariableList>
  24
  25 <VarListEntry>
  26 <Term>Unboxed types and primitive operations:</Term>
  27 <ListItem>
  28 <Para>
  29 You can get right down to the raw machine types and operations;
  30 included in this are &ldquo;primitive arrays&rdquo; (direct access to Big Wads
  31 of Bytes).  Please see <XRef LinkEnd="glasgow-unboxed"> and following.
  32 </Para>
  33 </ListItem>
  34 </VarListEntry>
  35
  36 <VarListEntry>
  37 <Term>Multi-parameter type classes:</Term>
  38 <ListItem>
  39 <Para>
  40 GHC's type system supports extended type classes with multiple
  41 parameters.  Please see <XRef LinkEnd="multi-param-type-classes">.
  42 </Para>
  43 </ListItem>
  44 </VarListEntry>
  45
  46 <VarListEntry>
  47 <Term>Local universal quantification:</Term>
  48 <ListItem>
  49 <Para>
  50 GHC's type system supports explicit universal quantification in
  51 constructor fields and function arguments.  This is useful for things
  52 like defining <Literal>runST</Literal> from the state-thread world.  See <XRef LinkEnd="universal-quantification">.
  53 </Para>
  54 </ListItem>
  55 </VarListEntry>
  56
  57 <VarListEntry>
  58 <Term>Extistentially quantification in data types:</Term>
  59 <ListItem>
  60 <Para>
  61 Some or all of the type variables in a datatype declaration may be
  62 <Emphasis>existentially quantified</Emphasis>.  More details in <XRef LinkEnd="existential-quantification">.
  63 </Para>
  64 </ListItem>
  65 </VarListEntry>
  66
  67 <VarListEntry>
  68 <Term>Scoped type variables:</Term>
  69 <ListItem>
  70 <Para>
  71 Scoped type variables enable the programmer to supply type signatures
  72 for some nested declarations, where this would not be legal in Haskell
  73 98.  Details in <XRef LinkEnd="scoped-type-variables">.
  74 </Para>
  75 </ListItem>
  76 </VarListEntry>
  77
  78 <VarListEntry>
  79 <Term>Pattern guards</Term>
  80 <ListItem>
  81 <Para>
  82 Instead of being a boolean expression, a guard is a list of qualifiers, exactly as in a list comprehension. See <XRef LinkEnd="pattern-guards">.
  83 </Para>
  84 </ListItem>
  85 </VarListEntry>
  86
  87 <VarListEntry>
  88 <Term>Foreign calling:</Term>
  89 <ListItem>
  90 <Para>
  91 Just what it sounds like.  We provide <Emphasis>lots</Emphasis> of rope that you
  92 can dangle around your neck.  Please see <XRef LinkEnd="ffi">.
  93 </Para>
  94 </ListItem>
  95 </VarListEntry>
  96
  97 <VarListEntry>
  98 <Term>Pragmas</Term>
  99 <ListItem>
 100 <Para>
 101 Pragmas are special instructions to the compiler placed in the source
 102 file.  The pragmas GHC supports are described in <XRef LinkEnd="pragmas">.
 103 </Para>
 104 </ListItem>
 105 </VarListEntry>
 106
 107 <VarListEntry>
 108 <Term>Rewrite rules:</Term>
 109 <ListItem>
 110 <Para>
 111 The programmer can specify rewrite rules as part of the source program
 112 (in a pragma).  GHC applies these rewrite rules wherever it can.
 113 Details in <XRef LinkEnd="rewrite-rules">.
 114 </Para>
 115 </ListItem>
 116 </VarListEntry>
 117 </VariableList>
 118 </Para>
 119
 120 <Para>
 121 Before you get too carried away working at the lowest level (e.g.,
 122 sloshing <Literal>MutableByteArray&num;</Literal>s around your
 123 program), you may wish to check if there are libraries that provide a
 124 &ldquo;Haskellised veneer&rdquo; over the features you want.  See
 125 <xref linkend="book-hslibs">.
 126 </Para>
 127
 128 <Sect1 id="primitives">
 129 <Title>Unboxed types and primitive operations
 130 </Title>
 131 <IndexTerm><Primary>PrelGHC module</Primary></IndexTerm>
 132
 133 <Para>
 134 This module defines all the types which are primitive in Glasgow
 135 Haskell, and the operations provided for them.
 136 </Para>
 137
 138 <Sect2 id="glasgow-unboxed">
 139 <Title>Unboxed types
 140 </Title>
 141
 142 <Para>
 143 <IndexTerm><Primary>Unboxed types (Glasgow extension)</Primary></IndexTerm>
 144 </Para>
 145
 146 <para>Most types in GHC are <firstterm>boxed</firstterm>, which means
 147 that values of that type are represented by a pointer to a heap
 148 object.  The representation of a Haskell <literal>Int</literal>, for
 149 example, is a two-word heap object.  An <firstterm>unboxed</firstterm>
 150 type, however, is represented by the value itself, no pointers or heap
 151 allocation are involved.
 152 </para>
 153
 154 <Para>
 155 Unboxed types correspond to the &ldquo;raw machine&rdquo; types you
 156 would use in C: <Literal>Int&num;</Literal> (long int),
 157 <Literal>Double&num;</Literal> (double), <Literal>Addr&num;</Literal>
 158 (void *), etc.  The <Emphasis>primitive operations</Emphasis>
 159 (PrimOps) on these types are what you might expect; e.g.,
 160 <Literal>(+&num;)</Literal> is addition on
 161 <Literal>Int&num;</Literal>s, and is the machine-addition that we all
 162 know and love&mdash;usually one instruction.
 163 </Para>
 164
 165 <Para>
 166 Primitive (unboxed) types cannot be defined in Haskell, and are
 167 therefore built into the language and compiler.  Primitive types are
 168 always unlifted; that is, a value of a primitive type cannot be
 169 bottom.  We use the convention that primitive types, values, and
 170 operations have a <Literal>&num;</Literal> suffix.
 171 </Para>
 172
 173 <Para>
 174 Primitive values are often represented by a simple bit-pattern, such
 175 as <Literal>Int&num;</Literal>, <Literal>Float&num;</Literal>,
 176 <Literal>Double&num;</Literal>.  But this is not necessarily the case:
 177 a primitive value might be represented by a pointer to a
 178 heap-allocated object.  Examples include
 179 <Literal>Array&num;</Literal>, the type of primitive arrays.  A
 180 primitive array is heap-allocated because it is too big a value to fit
 181 in a register, and would be too expensive to copy around; in a sense,
 182 it is accidental that it is represented by a pointer.  If a pointer
 183 represents a primitive value, then it really does point to that value:
 184 no unevaluated thunks, no indirections&hellip;nothing can be at the
 185 other end of the pointer than the primitive value.
 186 </Para>
 187
 188 <Para>
 189 There are some restrictions on the use of primitive types, the main
 190 one being that you can't pass a primitive value to a polymorphic
 191 function or store one in a polymorphic data type.  This rules out
 192 things like <Literal>[Int&num;]</Literal> (i.e. lists of primitive
 193 integers).  The reason for this restriction is that polymorphic
 194 arguments and constructor fields are assumed to be pointers: if an
 195 unboxed integer is stored in one of these, the garbage collector would
 196 attempt to follow it, leading to unpredictable space leaks.  Or a
 197 <Function>seq</Function> operation on the polymorphic component may
 198 attempt to dereference the pointer, with disastrous results.  Even
 199 worse, the unboxed value might be larger than a pointer
 200 (<Literal>Double&num;</Literal> for instance).
 201 </Para>
 202
 203 <Para>
 204 Nevertheless, A numerically-intensive program using unboxed types can
 205 go a <Emphasis>lot</Emphasis> faster than its &ldquo;standard&rdquo;
 206 counterpart&mdash;we saw a threefold speedup on one example.
 207 </Para>
 208
 209 </sect2>
 210
 211 <Sect2 id="unboxed-tuples">
 212 <Title>Unboxed Tuples
 213 </Title>
 214
 215 <Para>
 216 Unboxed tuples aren't really exported by <Literal>PrelGHC</Literal>,
 217 they're available by default with <Option>-fglasgow-exts</Option>.  An
 218 unboxed tuple looks like this:
 219 </Para>
 220
 221 <Para>
 222
 223 <ProgramListing>
 224 (# e_1, ..., e_n #)
 225 </ProgramListing>
 226
 227 </Para>
 228
 229 <Para>
 230 where <Literal>e&lowbar;1..e&lowbar;n</Literal> are expressions of any
 231 type (primitive or non-primitive).  The type of an unboxed tuple looks
 232 the same.
 233 </Para>
 234
 235 <Para>
 236 Unboxed tuples are used for functions that need to return multiple
 237 values, but they avoid the heap allocation normally associated with
 238 using fully-fledged tuples.  When an unboxed tuple is returned, the
 239 components are put directly into registers or on the stack; the
 240 unboxed tuple itself does not have a composite representation.  Many
 241 of the primitive operations listed in this section return unboxed
 242 tuples.
 243 </Para>
 244
 245 <Para>
 246 There are some pretty stringent restrictions on the use of unboxed tuples:
 247 </Para>
 248
 249 <Para>
 250
 251 <ItemizedList>
 252 <ListItem>
 253
 254 <Para>
 255  Unboxed tuple types are subject to the same restrictions as
 256 other unboxed types; i.e. they may not be stored in polymorphic data
 257 structures or passed to polymorphic functions.
 258
 259 </Para>
 260 </ListItem>
 261 <ListItem>
 262
 263 <Para>
 264  Unboxed tuples may only be constructed as the direct result of
 265 a function, and may only be deconstructed with a <Literal>case</Literal> expression.
 266 eg. the following are valid:
 267
 268
 269 <ProgramListing>
 270 f x y = (# x+1, y-1 #)
 271 g x = case f x x of { (# a, b #) -&#62; a + b }
 272 </ProgramListing>
 273
 274
 275 but the following are invalid:
 276
 277
 278 <ProgramListing>
 279 f x y = g (# x, y #)
 280 g (# x, y #) = x + y
 281 </ProgramListing>
 282
 283
 284 </Para>
 285 </ListItem>
 286 <ListItem>
 287
 288 <Para>
 289  No variable can have an unboxed tuple type.  This is illegal:
 290
 291
 292 <ProgramListing>
 293 f :: (# Int, Int #) -&#62; (# Int, Int #)
 294 f x = x
 295 </ProgramListing>
 296
 297
 298 because <VarName>x</VarName> has an unboxed tuple type.
 299
 300 </Para>
 301 </ListItem>
 302
 303 </ItemizedList>
 304
 305 </Para>
 306
 307 <Para>
 308 Note: we may relax some of these restrictions in the future.
 309 </Para>
 310
 311 <Para>
 312 The <Literal>IO</Literal> and <Literal>ST</Literal> monads use unboxed tuples to avoid unnecessary
 313 allocation during sequences of operations.
 314 </Para>
 315
 316 </Sect2>
 317
 318 <Sect2>
 319 <Title>Character and numeric types</Title>
 320
 321 <Para>
 322 <IndexTerm><Primary>character types, primitive</Primary></IndexTerm>
 323 <IndexTerm><Primary>numeric types, primitive</Primary></IndexTerm>
 324 <IndexTerm><Primary>integer types, primitive</Primary></IndexTerm>
 325 <IndexTerm><Primary>floating point types, primitive</Primary></IndexTerm>
 326 There are the following obvious primitive types:
 327 </Para>
 328
 329 <Para>
 330
 331 <ProgramListing>
 332 type Char#
 333 type Int#
 334 type Word#
 335 type Addr#
 336 type Float#
 337 type Double#
 338 type Int64#
 339 type Word64#
 340 </ProgramListing>
 341
 342 <IndexTerm><Primary><literal>Char&num;</literal></Primary></IndexTerm>
 343 <IndexTerm><Primary><literal>Int&num;</literal></Primary></IndexTerm>
 344 <IndexTerm><Primary><literal>Word&num;</literal></Primary></IndexTerm>
 345 <IndexTerm><Primary><literal>Addr&num;</literal></Primary></IndexTerm>
 346 <IndexTerm><Primary><literal>Float&num;</literal></Primary></IndexTerm>
 347 <IndexTerm><Primary><literal>Double&num;</literal></Primary></IndexTerm>
 348 <IndexTerm><Primary><literal>Int64&num;</literal></Primary></IndexTerm>
 349 <IndexTerm><Primary><literal>Word64&num;</literal></Primary></IndexTerm>
 350 </Para>
 351
 352 <Para>
 353 If you really want to know their exact equivalents in C, see
 354 <Filename>ghc/includes/StgTypes.h</Filename> in the GHC source tree.
 355 </Para>
 356
 357 <Para>
 358 Literals for these types may be written as follows:
 359 </Para>
 360
 361 <Para>
 362
 363 <ProgramListing>
 364 1#              an Int#
 365 1.2#            a Float#
 366 1.34##          a Double#
 367 'a'#            a Char#; for weird characters, use '\o&#60;octal&#62;'#
 368 "a"#            an Addr# (a `char *')
 369 </ProgramListing>
 370
 371 <IndexTerm><Primary>literals, primitive</Primary></IndexTerm>
 372 <IndexTerm><Primary>constants, primitive</Primary></IndexTerm>
 373 <IndexTerm><Primary>numbers, primitive</Primary></IndexTerm>
 374 </Para>
 375
 376 </Sect2>
 377
 378 <Sect2>
 379 <Title>Comparison operations</Title>
 380
 381 <Para>
 382 <IndexTerm><Primary>comparisons, primitive</Primary></IndexTerm>
 383 <IndexTerm><Primary>operators, comparison</Primary></IndexTerm>
 384 </Para>
 385
 386 <Para>
 387
 388 <ProgramListing>
 389 {&#62;,&#62;=,==,/=,&#60;,&#60;=}# :: Int# -&#62; Int# -&#62; Bool
 390
 391 {gt,ge,eq,ne,lt,le}Char# :: Char# -&#62; Char# -&#62; Bool
 392     -- ditto for Word# and Addr#
 393 </ProgramListing>
 394
 395 <IndexTerm><Primary><literal>&#62;&num;</literal></Primary></IndexTerm>
 396 <IndexTerm><Primary><literal>&#62;=&num;</literal></Primary></IndexTerm>
 397 <IndexTerm><Primary><literal>==&num;</literal></Primary></IndexTerm>
 398 <IndexTerm><Primary><literal>/=&num;</literal></Primary></IndexTerm>
 399 <IndexTerm><Primary><literal>&#60;&num;</literal></Primary></IndexTerm>
 400 <IndexTerm><Primary><literal>&#60;=&num;</literal></Primary></IndexTerm>
 401 <IndexTerm><Primary><literal>gt&lcub;Char,Word,Addr&rcub;&num;</literal></Primary></IndexTerm>
 402 <IndexTerm><Primary><literal>ge&lcub;Char,Word,Addr&rcub;&num;</literal></Primary></IndexTerm>
 403 <IndexTerm><Primary><literal>eq&lcub;Char,Word,Addr&rcub;&num;</literal></Primary></IndexTerm>
 404 <IndexTerm><Primary><literal>ne&lcub;Char,Word,Addr&rcub;&num;</literal></Primary></IndexTerm>
 405 <IndexTerm><Primary><literal>lt&lcub;Char,Word,Addr&rcub;&num;</literal></Primary></IndexTerm>
 406 <IndexTerm><Primary><literal>le&lcub;Char,Word,Addr&rcub;&num;</literal></Primary></IndexTerm>
 407 </Para>
 408
 409 </Sect2>
 410
 411 <Sect2>
 412 <Title>Primitive-character operations</Title>
 413
 414 <Para>
 415 <IndexTerm><Primary>characters, primitive operations</Primary></IndexTerm>
 416 <IndexTerm><Primary>operators, primitive character</Primary></IndexTerm>
 417 </Para>
 418
 419 <Para>
 420
 421 <ProgramListing>
 422 ord# :: Char# -&#62; Int#
 423 chr# :: Int# -&#62; Char#
 424 </ProgramListing>
 425
 426 <IndexTerm><Primary><literal>ord&num;</literal></Primary></IndexTerm>
 427 <IndexTerm><Primary><literal>chr&num;</literal></Primary></IndexTerm>
 428 </Para>
 429
 430 </Sect2>
 431
 432 <Sect2>
 433 <Title>Primitive-<Literal>Int</Literal> operations</Title>
 434
 435 <Para>
 436 <IndexTerm><Primary>integers, primitive operations</Primary></IndexTerm>
 437 <IndexTerm><Primary>operators, primitive integer</Primary></IndexTerm>
 438 </Para>
 439
 440 <Para>
 441
 442 <ProgramListing>
 443 {+,-,*,quotInt,remInt,gcdInt}# :: Int# -&#62; Int# -&#62; Int#
 444 negateInt# :: Int# -&#62; Int#
 445
 446 iShiftL#, iShiftRA#, iShiftRL# :: Int# -&#62; Int# -&#62; Int#
 447         -- shift left, right arithmetic, right logical
 448
 449 addIntC#, subIntC#, mulIntC# :: Int# -> Int# -> (# Int#, Int# #)
 450         -- add, subtract, multiply with carry
 451 </ProgramListing>
 452
 453 <IndexTerm><Primary><literal>+&num;</literal></Primary></IndexTerm>
 454 <IndexTerm><Primary><literal>-&num;</literal></Primary></IndexTerm>
 455 <IndexTerm><Primary><literal>*&num;</literal></Primary></IndexTerm>
 456 <IndexTerm><Primary><literal>quotInt&num;</literal></Primary></IndexTerm>
 457 <IndexTerm><Primary><literal>remInt&num;</literal></Primary></IndexTerm>
 458 <IndexTerm><Primary><literal>gcdInt&num;</literal></Primary></IndexTerm>
 459 <IndexTerm><Primary><literal>iShiftL&num;</literal></Primary></IndexTerm>
 460 <IndexTerm><Primary><literal>iShiftRA&num;</literal></Primary></IndexTerm>
 461 <IndexTerm><Primary><literal>iShiftRL&num;</literal></Primary></IndexTerm>
 462 <IndexTerm><Primary><literal>addIntC&num;</literal></Primary></IndexTerm>
 463 <IndexTerm><Primary><literal>subIntC&num;</literal></Primary></IndexTerm>
 464 <IndexTerm><Primary><literal>mulIntC&num;</literal></Primary></IndexTerm>
 465 <IndexTerm><Primary>shift operations, integer</Primary></IndexTerm>
 466 </Para>
 467
 468 <Para>
 469 <Emphasis>Note:</Emphasis> No error/overflow checking!
 470 </Para>
 471
 472 </Sect2>
 473
 474 <Sect2>
 475 <Title>Primitive-<Literal>Double</Literal> and <Literal>Float</Literal> operations</Title>
 476
 477 <Para>
 478 <IndexTerm><Primary>floating point numbers, primitive</Primary></IndexTerm>
 479 <IndexTerm><Primary>operators, primitive floating point</Primary></IndexTerm>
 480 </Para>
 481
 482 <Para>
 483
 484 <ProgramListing>
 485 {+,-,*,/}##         :: Double# -&#62; Double# -&#62; Double#
 486 {&#60;,&#60;=,==,/=,&#62;=,&#62;}## :: Double# -&#62; Double# -&#62; Bool
 487 negateDouble#       :: Double# -&#62; Double#
 488 double2Int#         :: Double# -&#62; Int#
 489 int2Double#         :: Int#    -&#62; Double#
 490
 491 {plus,minux,times,divide}Float# :: Float# -&#62; Float# -&#62; Float#
 492 {gt,ge,eq,ne,lt,le}Float# :: Float# -&#62; Float# -&#62; Bool
 493 negateFloat#        :: Float# -&#62; Float#
 494 float2Int#          :: Float# -&#62; Int#
 495 int2Float#          :: Int#   -&#62; Float#
 496 </ProgramListing>
 497
 498 </Para>
 499
 500 <Para>
 501 <IndexTerm><Primary><literal>+&num;&num;</literal></Primary></IndexTerm>
 502 <IndexTerm><Primary><literal>-&num;&num;</literal></Primary></IndexTerm>
 503 <IndexTerm><Primary><literal>*&num;&num;</literal></Primary></IndexTerm>
 504 <IndexTerm><Primary><literal>/&num;&num;</literal></Primary></IndexTerm>
 505 <IndexTerm><Primary><literal>&#60;&num;&num;</literal></Primary></IndexTerm>
 506 <IndexTerm><Primary><literal>&#60;=&num;&num;</literal></Primary></IndexTerm>
 507 <IndexTerm><Primary><literal>==&num;&num;</literal></Primary></IndexTerm>
 508 <IndexTerm><Primary><literal>=/&num;&num;</literal></Primary></IndexTerm>
 509 <IndexTerm><Primary><literal>&#62;=&num;&num;</literal></Primary></IndexTerm>
 510 <IndexTerm><Primary><literal>&#62;&num;&num;</literal></Primary></IndexTerm>
 511 <IndexTerm><Primary><literal>negateDouble&num;</literal></Primary></IndexTerm>
 512 <IndexTerm><Primary><literal>double2Int&num;</literal></Primary></IndexTerm>
 513 <IndexTerm><Primary><literal>int2Double&num;</literal></Primary></IndexTerm>
 514 </Para>
 515
 516 <Para>
 517 <IndexTerm><Primary><literal>plusFloat&num;</literal></Primary></IndexTerm>
 518 <IndexTerm><Primary><literal>minusFloat&num;</literal></Primary></IndexTerm>
 519 <IndexTerm><Primary><literal>timesFloat&num;</literal></Primary></IndexTerm>
 520 <IndexTerm><Primary><literal>divideFloat&num;</literal></Primary></IndexTerm>
 521 <IndexTerm><Primary><literal>gtFloat&num;</literal></Primary></IndexTerm>
 522 <IndexTerm><Primary><literal>geFloat&num;</literal></Primary></IndexTerm>
 523 <IndexTerm><Primary><literal>eqFloat&num;</literal></Primary></IndexTerm>
 524 <IndexTerm><Primary><literal>neFloat&num;</literal></Primary></IndexTerm>
 525 <IndexTerm><Primary><literal>ltFloat&num;</literal></Primary></IndexTerm>
 526 <IndexTerm><Primary><literal>leFloat&num;</literal></Primary></IndexTerm>
 527 <IndexTerm><Primary><literal>negateFloat&num;</literal></Primary></IndexTerm>
 528 <IndexTerm><Primary><literal>float2Int&num;</literal></Primary></IndexTerm>
 529 <IndexTerm><Primary><literal>int2Float&num;</literal></Primary></IndexTerm>
 530 </Para>
 531
 532 <Para>
 533 And a full complement of trigonometric functions:
 534 </Para>
 535
 536 <Para>
 537
 538 <ProgramListing>
 539 expDouble#      :: Double# -&#62; Double#
 540 logDouble#      :: Double# -&#62; Double#
 541 sqrtDouble#     :: Double# -&#62; Double#
 542 sinDouble#      :: Double# -&#62; Double#
 543 cosDouble#      :: Double# -&#62; Double#
 544 tanDouble#      :: Double# -&#62; Double#
 545 asinDouble#     :: Double# -&#62; Double#
 546 acosDouble#     :: Double# -&#62; Double#
 547 atanDouble#     :: Double# -&#62; Double#
 548 sinhDouble#     :: Double# -&#62; Double#
 549 coshDouble#     :: Double# -&#62; Double#
 550 tanhDouble#     :: Double# -&#62; Double#
 551 powerDouble#    :: Double# -&#62; Double# -&#62; Double#
 552 </ProgramListing>
 553
 554 <IndexTerm><Primary>trigonometric functions, primitive</Primary></IndexTerm>
 555 </Para>
 556
 557 <Para>
 558 similarly for <Literal>Float&num;</Literal>.
 559 </Para>
 560
 561 <Para>
 562 There are two coercion functions for <Literal>Float&num;</Literal>/<Literal>Double&num;</Literal>:
 563 </Para>
 564
 565 <Para>
 566
 567 <ProgramListing>
 568 float2Double#   :: Float# -&#62; Double#
 569 double2Float#   :: Double# -&#62; Float#
 570 </ProgramListing>
 571
 572 <IndexTerm><Primary><literal>float2Double&num;</literal></Primary></IndexTerm>
 573 <IndexTerm><Primary><literal>double2Float&num;</literal></Primary></IndexTerm>
 574 </Para>
 575
 576 <Para>
 577 The primitive version of <Function>decodeDouble</Function>
 578 (<Function>encodeDouble</Function> is implemented as an external C
 579 function):
 580 </Para>
 581
 582 <Para>
 583
 584 <ProgramListing>
 585 decodeDouble#   :: Double# -&#62; PrelNum.ReturnIntAndGMP
 586 </ProgramListing>
 587
 588 <IndexTerm><Primary><literal>encodeDouble&num;</literal></Primary></IndexTerm>
 589 <IndexTerm><Primary><literal>decodeDouble&num;</literal></Primary></IndexTerm>
 590 </Para>
 591
 592 <Para>
 593 (And the same for <Literal>Float&num;</Literal>s.)
 594 </Para>
 595
 596 </Sect2>
 597
 598 <Sect2 id="integer-operations">
 599 <Title>Operations on/for <Literal>Integers</Literal> (interface to GMP)
 600 </Title>
 601
 602 <Para>
 603 <IndexTerm><Primary>arbitrary precision integers</Primary></IndexTerm>
 604 <IndexTerm><Primary>Integer, operations on</Primary></IndexTerm>
 605 </Para>
 606
 607 <Para>
 608 We implement <Literal>Integers</Literal> (arbitrary-precision
 609 integers) using the GNU multiple-precision (GMP) package (version
 610 2.0.2).
 611 </Para>
 612
 613 <Para>
 614 The data type for <Literal>Integer</Literal> is either a small
 615 integer, represented by an <Literal>Int</Literal>, or a large integer
 616 represented using the pieces required by GMP's
 617 <Literal>MP&lowbar;INT</Literal> in <Filename>gmp.h</Filename> (see
 618 <Filename>gmp.info</Filename> in
 619 <Filename>ghc/includes/runtime/gmp</Filename>).  It comes out as:
 620 </Para>
 621
 622 <Para>
 623
 624 <ProgramListing>
 625 data Integer = S# Int#             -- small integers
 626              | J# Int# ByteArray#  -- large integers
 627 </ProgramListing>
 628
 629 <IndexTerm><Primary>Integer type</Primary></IndexTerm> The primitive
 630 ops to support large <Literal>Integers</Literal> use the
 631 &ldquo;pieces&rdquo; of the representation, and are as follows:
 632 </Para>
 633
 634 <Para>
 635
 636 <ProgramListing>
 637 negateInteger#  :: Int# -&#62; ByteArray# -&#62; Integer
 638
 639 {plus,minus,times}Integer#, gcdInteger#,
 640   quotInteger#, remInteger#, divExactInteger#
 641         :: Int# -> ByteArray#
 642         -> Int# -> ByteArray#
 643         -> (# Int#, ByteArray# #)
 644
 645 cmpInteger#
 646         :: Int# -> ByteArray#
 647         -> Int# -> ByteArray#
 648         -> Int# -- -1 for &#60;; 0 for ==; +1 for >
 649
 650 cmpIntegerInt#
 651         :: Int# -> ByteArray#
 652         -> Int#
 653         -> Int# -- -1 for &#60;; 0 for ==; +1 for >
 654
 655 gcdIntegerInt# ::
 656         :: Int# -> ByteArray#
 657         -> Int#
 658         -> Int#
 659
 660 divModInteger#, quotRemInteger#
 661         :: Int# -> ByteArray#
 662         -> Int# -> ByteArray#
 663         -> (# Int#, ByteArray#,
 664                   Int#, ByteArray# #)
 665
 666 integer2Int# :: Int# -> ByteArray# -> Int#
 667
 668 int2Integer#  :: Int#  -> Integer -- NB: no error-checking on these two!
 669 word2Integer# :: Word# -> Integer
 670
 671 addr2Integer# :: Addr# -> Integer
 672         -- the Addr# is taken to be a `char *' string
 673         -- to be converted into an Integer.
 674 </ProgramListing>
 675
 676 <IndexTerm><Primary><literal>negateInteger&num;</literal></Primary></IndexTerm>
 677 <IndexTerm><Primary><literal>plusInteger&num;</literal></Primary></IndexTerm>
 678 <IndexTerm><Primary><literal>minusInteger&num;</literal></Primary></IndexTerm>
 679 <IndexTerm><Primary><literal>timesInteger&num;</literal></Primary></IndexTerm>
 680 <IndexTerm><Primary><literal>quotInteger&num;</literal></Primary></IndexTerm>
 681 <IndexTerm><Primary><literal>remInteger&num;</literal></Primary></IndexTerm>
 682 <IndexTerm><Primary><literal>gcdInteger&num;</literal></Primary></IndexTerm>
 683 <IndexTerm><Primary><literal>gcdIntegerInt&num;</literal></Primary></IndexTerm>
 684 <IndexTerm><Primary><literal>divExactInteger&num;</literal></Primary></IndexTerm>
 685 <IndexTerm><Primary><literal>cmpInteger&num;</literal></Primary></IndexTerm>
 686 <IndexTerm><Primary><literal>divModInteger&num;</literal></Primary></IndexTerm>
 687 <IndexTerm><Primary><literal>quotRemInteger&num;</literal></Primary></IndexTerm>
 688 <IndexTerm><Primary><literal>integer2Int&num;</literal></Primary></IndexTerm>
 689 <IndexTerm><Primary><literal>int2Integer&num;</literal></Primary></IndexTerm>
 690 <IndexTerm><Primary><literal>word2Integer&num;</literal></Primary></IndexTerm>
 691 <IndexTerm><Primary><literal>addr2Integer&num;</literal></Primary></IndexTerm>
 692 </Para>
 693
 694 </Sect2>
 695
 696 <Sect2>
 697 <Title>Words and addresses</Title>
 698
 699 <Para>
 700 <IndexTerm><Primary>word, primitive type</Primary></IndexTerm>
 701 <IndexTerm><Primary>address, primitive type</Primary></IndexTerm>
 702 <IndexTerm><Primary>unsigned integer, primitive type</Primary></IndexTerm>
 703 <IndexTerm><Primary>pointer, primitive type</Primary></IndexTerm>
 704 </Para>
 705
 706 <Para>
 707 A <Literal>Word&num;</Literal> is used for bit-twiddling operations.
 708 It is the same size as an <Literal>Int&num;</Literal>, but has no sign
 709 nor any arithmetic operations.
 710
 711 <ProgramListing>
 712 type Word#      -- Same size/etc as Int# but *unsigned*
 713 type Addr#      -- A pointer from outside the "Haskell world" (from C, probably);
 714                 -- described under "arrays"
 715 </ProgramListing>
 716
 717 <IndexTerm><Primary><literal>Word&num;</literal></Primary></IndexTerm>
 718 <IndexTerm><Primary><literal>Addr&num;</literal></Primary></IndexTerm>
 719 </Para>
 720
 721 <Para>
 722 <Literal>Word&num;</Literal>s and <Literal>Addr&num;</Literal>s have
 723 the usual comparison operations.  Other
 724 unboxed-<Literal>Word</Literal> ops (bit-twiddling and coercions):
 725 </Para>
 726
 727 <Para>
 728
 729 <ProgramListing>
 730 {gt,ge,eq,ne,lt,le}Word# :: Word# -> Word# -> Bool
 731
 732 and#, or#, xor# :: Word# -> Word# -> Word#
 733         -- standard bit ops.
 734
 735 quotWord#, remWord# :: Word# -> Word# -> Word#
 736         -- word (i.e. unsigned) versions are different from int
 737         -- versions, so we have to provide these explicitly.
 738
 739 not# :: Word# -> Word#
 740
 741 shiftL#, shiftRL# :: Word# -> Int# -> Word#
 742         -- shift left, right logical
 743
 744 int2Word#       :: Int#  -> Word# -- just a cast, really
 745 word2Int#       :: Word# -> Int#
 746 </ProgramListing>
 747
 748 <IndexTerm><Primary>bit operations, Word and Addr</Primary></IndexTerm>
 749 <IndexTerm><Primary><literal>gtWord&num;</literal></Primary></IndexTerm>
 750 <IndexTerm><Primary><literal>geWord&num;</literal></Primary></IndexTerm>
 751 <IndexTerm><Primary><literal>eqWord&num;</literal></Primary></IndexTerm>
 752 <IndexTerm><Primary><literal>neWord&num;</literal></Primary></IndexTerm>
 753 <IndexTerm><Primary><literal>ltWord&num;</literal></Primary></IndexTerm>
 754 <IndexTerm><Primary><literal>leWord&num;</literal></Primary></IndexTerm>
 755 <IndexTerm><Primary><literal>and&num;</literal></Primary></IndexTerm>
 756 <IndexTerm><Primary><literal>or&num;</literal></Primary></IndexTerm>
 757 <IndexTerm><Primary><literal>xor&num;</literal></Primary></IndexTerm>
 758 <IndexTerm><Primary><literal>not&num;</literal></Primary></IndexTerm>
 759 <IndexTerm><Primary><literal>quotWord&num;</literal></Primary></IndexTerm>
 760 <IndexTerm><Primary><literal>remWord&num;</literal></Primary></IndexTerm>
 761 <IndexTerm><Primary><literal>shiftL&num;</literal></Primary></IndexTerm>
 762 <IndexTerm><Primary><literal>shiftRA&num;</literal></Primary></IndexTerm>
 763 <IndexTerm><Primary><literal>shiftRL&num;</literal></Primary></IndexTerm>
 764 <IndexTerm><Primary><literal>int2Word&num;</literal></Primary></IndexTerm>
 765 <IndexTerm><Primary><literal>word2Int&num;</literal></Primary></IndexTerm>
 766 </Para>
 767
 768 <Para>
 769 Unboxed-<Literal>Addr</Literal> ops (C casts, really):
 770
 771 <ProgramListing>
 772 {gt,ge,eq,ne,lt,le}Addr# :: Addr# -> Addr# -> Bool
 773
 774 int2Addr#       :: Int#  -> Addr#
 775 addr2Int#       :: Addr# -> Int#
 776 addr2Integer#   :: Addr# -> (# Int#, ByteArray# #)
 777 </ProgramListing>
 778
 779 <IndexTerm><Primary><literal>gtAddr&num;</literal></Primary></IndexTerm>
 780 <IndexTerm><Primary><literal>geAddr&num;</literal></Primary></IndexTerm>
 781 <IndexTerm><Primary><literal>eqAddr&num;</literal></Primary></IndexTerm>
 782 <IndexTerm><Primary><literal>neAddr&num;</literal></Primary></IndexTerm>
 783 <IndexTerm><Primary><literal>ltAddr&num;</literal></Primary></IndexTerm>
 784 <IndexTerm><Primary><literal>leAddr&num;</literal></Primary></IndexTerm>
 785 <IndexTerm><Primary><literal>int2Addr&num;</literal></Primary></IndexTerm>
 786 <IndexTerm><Primary><literal>addr2Int&num;</literal></Primary></IndexTerm>
 787 <IndexTerm><Primary><literal>addr2Integer&num;</literal></Primary></IndexTerm>
 788 </Para>
 789
 790 <Para>
 791 The casts between <Literal>Int&num;</Literal>,
 792 <Literal>Word&num;</Literal> and <Literal>Addr&num;</Literal>
 793 correspond to null operations at the machine level, but are required
 794 to keep the Haskell type checker happy.
 795 </Para>
 796
 797 <Para>
 798 Operations for indexing off of C pointers
 799 (<Literal>Addr&num;</Literal>s) to snatch values are listed under
 800 &ldquo;arrays&rdquo;.
 801 </Para>
 802
 803 </Sect2>
 804
 805 <Sect2>
 806 <Title>Arrays</Title>
 807
 808 <Para>
 809 <IndexTerm><Primary>arrays, primitive</Primary></IndexTerm>
 810 </Para>
 811
 812 <Para>
 813 The type <Literal>Array&num; elt</Literal> is the type of primitive,
 814 unpointed arrays of values of type <Literal>elt</Literal>.
 815 </Para>
 816
 817 <Para>
 818
 819 <ProgramListing>
 820 type Array# elt
 821 </ProgramListing>
 822
 823 <IndexTerm><Primary><literal>Array&num;</literal></Primary></IndexTerm>
 824 </Para>
 825
 826 <Para>
 827 <Literal>Array&num;</Literal> is more primitive than a Haskell
 828 array&mdash;indeed, the Haskell <Literal>Array</Literal> interface is
 829 implemented using <Literal>Array&num;</Literal>&mdash;in that an
 830 <Literal>Array&num;</Literal> is indexed only by
 831 <Literal>Int&num;</Literal>s, starting at zero.  It is also more
 832 primitive by virtue of being unboxed.  That doesn't mean that it isn't
 833 a heap-allocated object&mdash;of course, it is.  Rather, being unboxed
 834 means that it is represented by a pointer to the array itself, and not
 835 to a thunk which will evaluate to the array (or to bottom).  The
 836 components of an <Literal>Array&num;</Literal> are themselves boxed.
 837 </Para>
 838
 839 <Para>
 840 The type <Literal>ByteArray&num;</Literal> is similar to
 841 <Literal>Array&num;</Literal>, except that it contains just a string
 842 of (non-pointer) bytes.
 843 </Para>
 844
 845 <Para>
 846
 847 <ProgramListing>
 848 type ByteArray#
 849 </ProgramListing>
 850
 851 <IndexTerm><Primary><literal>ByteArray&num;</literal></Primary></IndexTerm>
 852 </Para>
 853
 854 <Para>
 855 Arrays of these types are useful when a Haskell program wishes to
 856 construct a value to pass to a C procedure. It is also possible to use
 857 them to build (say) arrays of unboxed characters for internal use in a
 858 Haskell program.  Given these uses, <Literal>ByteArray&num;</Literal>
 859 is deliberately a bit vague about the type of its components.
 860 Operations are provided to extract values of type
 861 <Literal>Char&num;</Literal>, <Literal>Int&num;</Literal>,
 862 <Literal>Float&num;</Literal>, <Literal>Double&num;</Literal>, and
 863 <Literal>Addr&num;</Literal> from arbitrary offsets within a
 864 <Literal>ByteArray&num;</Literal>.  (For type
 865 <Literal>Foo&num;</Literal>, the $i$th offset gets you the $i$th
 866 <Literal>Foo&num;</Literal>, not the <Literal>Foo&num;</Literal> at
 867 byte-position $i$.  Mumble.)  (If you want a
 868 <Literal>Word&num;</Literal>, grab an <Literal>Int&num;</Literal>,
 869 then coerce it.)
 870 </Para>
 871
 872 <Para>
 873 Lastly, we have static byte-arrays, of type
 874 <Literal>Addr&num;</Literal> &lsqb;mentioned previously].  (Remember
 875 the duality between arrays and pointers in C.)  Arrays of this types
 876 are represented by a pointer to an array in the world outside Haskell,
 877 so this pointer is not followed by the garbage collector.  In other
 878 respects they are just like <Literal>ByteArray&num;</Literal>.  They
 879 are only needed in order to pass values from C to Haskell.
 880 </Para>
 881
 882 </Sect2>
 883
 884 <Sect2>
 885 <Title>Reading and writing</Title>
 886
 887 <Para>
 888 Primitive arrays are linear, and indexed starting at zero.
 889 </Para>
 890
 891 <Para>
 892 The size and indices of a <Literal>ByteArray&num;</Literal>, <Literal>Addr&num;</Literal>, and
 893 <Literal>MutableByteArray&num;</Literal> are all in bytes.  It's up to the program to
 894 calculate the correct byte offset from the start of the array.  This
 895 allows a <Literal>ByteArray&num;</Literal> to contain a mixture of values of different
 896 type, which is often needed when preparing data for and unpicking
 897 results from C.  (Umm&hellip;not true of indices&hellip;WDP 95/09)
 898 </Para>
 899
 900 <Para>
 901 <Emphasis>Should we provide some <Literal>sizeOfDouble&num;</Literal> constants?</Emphasis>
 902 </Para>
 903
 904 <Para>
 905 Out-of-range errors on indexing should be caught by the code which
 906 uses the primitive operation; the primitive operations themselves do
 907 <Emphasis>not</Emphasis> check for out-of-range indexes. The intention is that the
 908 primitive ops compile to one machine instruction or thereabouts.
 909 </Para>
 910
 911 <Para>
 912 We use the terms &ldquo;reading&rdquo; and &ldquo;writing&rdquo; to refer to accessing
 913 <Emphasis>mutable</Emphasis> arrays (see <XRef LinkEnd="sect-mutable">), and
 914 &ldquo;indexing&rdquo; to refer to reading a value from an <Emphasis>immutable</Emphasis>
 915 array.
 916 </Para>
 917
 918 <Para>
 919 Immutable byte arrays are straightforward to index (all indices in bytes):
 920
 921 <ProgramListing>
 922 indexCharArray#   :: ByteArray# -> Int# -> Char#
 923 indexIntArray#    :: ByteArray# -> Int# -> Int#
 924 indexAddrArray#   :: ByteArray# -> Int# -> Addr#
 925 indexFloatArray#  :: ByteArray# -> Int# -> Float#
 926 indexDoubleArray# :: ByteArray# -> Int# -> Double#
 927
 928 indexCharOffAddr#   :: Addr# -> Int# -> Char#
 929 indexIntOffAddr#    :: Addr# -> Int# -> Int#
 930 indexFloatOffAddr#  :: Addr# -> Int# -> Float#
 931 indexDoubleOffAddr# :: Addr# -> Int# -> Double#
 932 indexAddrOffAddr#   :: Addr# -> Int# -> Addr#
 933  -- Get an Addr# from an Addr# offset
 934 </ProgramListing>
 935
 936 <IndexTerm><Primary><literal>indexCharArray&num;</literal></Primary></IndexTerm>
 937 <IndexTerm><Primary><literal>indexIntArray&num;</literal></Primary></IndexTerm>
 938 <IndexTerm><Primary><literal>indexAddrArray&num;</literal></Primary></IndexTerm>
 939 <IndexTerm><Primary><literal>indexFloatArray&num;</literal></Primary></IndexTerm>
 940 <IndexTerm><Primary><literal>indexDoubleArray&num;</literal></Primary></IndexTerm>
 941 <IndexTerm><Primary><literal>indexCharOffAddr&num;</literal></Primary></IndexTerm>
 942 <IndexTerm><Primary><literal>indexIntOffAddr&num;</literal></Primary></IndexTerm>
 943 <IndexTerm><Primary><literal>indexFloatOffAddr&num;</literal></Primary></IndexTerm>
 944 <IndexTerm><Primary><literal>indexDoubleOffAddr&num;</literal></Primary></IndexTerm>
 945 <IndexTerm><Primary><literal>indexAddrOffAddr&num;</literal></Primary></IndexTerm>
 946 </Para>
 947
 948 <Para>
 949 The last of these, <Function>indexAddrOffAddr&num;</Function>, extracts an <Literal>Addr&num;</Literal> using an offset
 950 from another <Literal>Addr&num;</Literal>, thereby providing the ability to follow a chain of
 951 C pointers.
 952 </Para>
 953
 954 <Para>
 955 Something a bit more interesting goes on when indexing arrays of boxed
 956 objects, because the result is simply the boxed object. So presumably
 957 it should be entered&mdash;we never usually return an unevaluated
 958 object!  This is a pain: primitive ops aren't supposed to do
 959 complicated things like enter objects.  The current solution is to
 960 return a single element unboxed tuple (see <XRef LinkEnd="unboxed-tuples">).
 961 </Para>
 962
 963 <Para>
 964
 965 <ProgramListing>
 966 indexArray#       :: Array# elt -> Int# -> (# elt #)
 967 </ProgramListing>
 968
 969 <IndexTerm><Primary><literal>indexArray&num;</literal></Primary></IndexTerm>
 970 </Para>
 971
 972 </Sect2>
 973
 974 <Sect2>
 975 <Title>The state type</Title>
 976
 977 <Para>
 978 <IndexTerm><Primary><literal>state, primitive type</literal></Primary></IndexTerm>
 979 <IndexTerm><Primary><literal>State&num;</literal></Primary></IndexTerm>
 980 </Para>
 981
 982 <Para>
 983 The primitive type <Literal>State&num;</Literal> represents the state of a state
 984 transformer.  It is parameterised on the desired type of state, which
 985 serves to keep states from distinct threads distinct from one another.
 986 But the <Emphasis>only</Emphasis> effect of this parameterisation is in the type
 987 system: all values of type <Literal>State&num;</Literal> are represented in the same way.
 988 Indeed, they are all represented by nothing at all!  The code
 989 generator &ldquo;knows&rdquo; to generate no code, and allocate no registers
 990 etc, for primitive states.
 991 </Para>
 992
 993 <Para>
 994
 995 <ProgramListing>
 996 type State# s
 997 </ProgramListing>
 998
 999 </Para>
1000
1001 <Para>
1002 The type <Literal>GHC.RealWorld</Literal> is truly opaque: there are no values defined
1003 of this type, and no operations over it.  It is &ldquo;primitive&rdquo; in that
1004 sense - but it is <Emphasis>not unlifted!</Emphasis> Its only role in life is to be
1005 the type which distinguishes the <Literal>IO</Literal> state transformer.
1006 </Para>
1007
1008 <Para>
1009
1010 <ProgramListing>
1011 data RealWorld
1012 </ProgramListing>
1013
1014 </Para>
1015
1016 </Sect2>
1017
1018 <Sect2>
1019 <Title>State of the world</Title>
1020
1021 <Para>
1022 A single, primitive, value of type <Literal>State&num; RealWorld</Literal> is provided.
1023 </Para>
1024
1025 <Para>
1026
1027 <ProgramListing>
1028 realWorld# :: State# RealWorld
1029 </ProgramListing>
1030
1031 <IndexTerm><Primary>realWorld&num; state object</Primary></IndexTerm>
1032 </Para>
1033
1034 <Para>
1035 (Note: in the compiler, not a <Literal>PrimOp</Literal>; just a mucho magic
1036 <Literal>Id</Literal>. Exported from <Literal>GHC</Literal>, though).
1037 </Para>
1038
1039 </Sect2>
1040
1041 <Sect2 id="sect-mutable">
1042 <Title>Mutable arrays</Title>
1043
1044 <Para>
1045 <IndexTerm><Primary>mutable arrays</Primary></IndexTerm>
1046 <IndexTerm><Primary>arrays, mutable</Primary></IndexTerm>
1047 Corresponding to <Literal>Array&num;</Literal> and <Literal>ByteArray&num;</Literal>, we have the types of
1048 mutable versions of each.  In each case, the representation is a
1049 pointer to a suitable block of (mutable) heap-allocated storage.
1050 </Para>
1051
1052 <Para>
1053
1054 <ProgramListing>
1055 type MutableArray# s elt
1056 type MutableByteArray# s
1057 </ProgramListing>
1058
1059 <IndexTerm><Primary><literal>MutableArray&num;</literal></Primary></IndexTerm>
1060 <IndexTerm><Primary><literal>MutableByteArray&num;</literal></Primary></IndexTerm>
1061 </Para>
1062
1063 <Sect3>
1064 <Title>Allocation</Title>
1065
1066 <Para>
1067 <IndexTerm><Primary>mutable arrays, allocation</Primary></IndexTerm>
1068 <IndexTerm><Primary>arrays, allocation</Primary></IndexTerm>
1069 <IndexTerm><Primary>allocation, of mutable arrays</Primary></IndexTerm>
1070 </Para>
1071
1072 <Para>
1073 Mutable arrays can be allocated. Only pointer-arrays are initialised;
1074 arrays of non-pointers are filled in by &ldquo;user code&rdquo; rather than by
1075 the array-allocation primitive.  Reason: only the pointer case has to
1076 worry about GC striking with a partly-initialised array.
1077 </Para>
1078
1079 <Para>
1080
1081 <ProgramListing>
1082 newArray#       :: Int# -> elt -> State# s -> (# State# s, MutableArray# s elt #)
1083
1084 newCharArray#   :: Int# -> State# s -> (# State# s, MutableByteArray# s elt #)
1085 newIntArray#    :: Int# -> State# s -> (# State# s, MutableByteArray# s elt #)
1086 newAddrArray#   :: Int# -> State# s -> (# State# s, MutableByteArray# s elt #)
1087 newFloatArray#  :: Int# -> State# s -> (# State# s, MutableByteArray# s elt #)
1088 newDoubleArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s elt #)
1089 </ProgramListing>
1090
1091 <IndexTerm><Primary><literal>newArray&num;</literal></Primary></IndexTerm>
1092 <IndexTerm><Primary><literal>newCharArray&num;</literal></Primary></IndexTerm>
1093 <IndexTerm><Primary><literal>newIntArray&num;</literal></Primary></IndexTerm>
1094 <IndexTerm><Primary><literal>newAddrArray&num;</literal></Primary></IndexTerm>
1095 <IndexTerm><Primary><literal>newFloatArray&num;</literal></Primary></IndexTerm>
1096 <IndexTerm><Primary><literal>newDoubleArray&num;</literal></Primary></IndexTerm>
1097 </Para>
1098
1099 <Para>
1100 The size of a <Literal>ByteArray&num;</Literal> is given in bytes.
1101 </Para>
1102
1103 </Sect3>
1104
1105 <Sect3>
1106 <Title>Reading and writing</Title>
1107
1108 <Para>
1109 <IndexTerm><Primary>arrays, reading and writing</Primary></IndexTerm>
1110 </Para>
1111
1112 <Para>
1113
1114 <ProgramListing>
1115 readArray#       :: MutableArray# s elt -> Int# -> State# s -> (# State# s, elt #)
1116 readCharArray#   :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Char# #)
1117 readIntArray#    :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Int# #)
1118 readAddrArray#   :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Addr# #)
1119 readFloatArray#  :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Float# #)
1120 readDoubleArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Double# #)
1121
1122 writeArray#       :: MutableArray# s elt -> Int# -> elt     -> State# s -> State# s
1123 writeCharArray#   :: MutableByteArray# s -> Int# -> Char#   -> State# s -> State# s
1124 writeIntArray#    :: MutableByteArray# s -> Int# -> Int#    -> State# s -> State# s
1125 writeAddrArray#   :: MutableByteArray# s -> Int# -> Addr#   -> State# s -> State# s
1126 writeFloatArray#  :: MutableByteArray# s -> Int# -> Float#  -> State# s -> State# s
1127 writeDoubleArray# :: MutableByteArray# s -> Int# -> Double# -> State# s -> State# s
1128 </ProgramListing>
1129
1130 <IndexTerm><Primary><literal>readArray&num;</literal></Primary></IndexTerm>
1131 <IndexTerm><Primary><literal>readCharArray&num;</literal></Primary></IndexTerm>
1132 <IndexTerm><Primary><literal>readIntArray&num;</literal></Primary></IndexTerm>
1133 <IndexTerm><Primary><literal>readAddrArray&num;</literal></Primary></IndexTerm>
1134 <IndexTerm><Primary><literal>readFloatArray&num;</literal></Primary></IndexTerm>
1135 <IndexTerm><Primary><literal>readDoubleArray&num;</literal></Primary></IndexTerm>
1136 <IndexTerm><Primary><literal>writeArray&num;</literal></Primary></IndexTerm>
1137 <IndexTerm><Primary><literal>writeCharArray&num;</literal></Primary></IndexTerm>
1138 <IndexTerm><Primary><literal>writeIntArray&num;</literal></Primary></IndexTerm>
1139 <IndexTerm><Primary><literal>writeAddrArray&num;</literal></Primary></IndexTerm>
1140 <IndexTerm><Primary><literal>writeFloatArray&num;</literal></Primary></IndexTerm>
1141 <IndexTerm><Primary><literal>writeDoubleArray&num;</literal></Primary></IndexTerm>
1142 </Para>
1143
1144 </Sect3>
1145
1146 <Sect3>
1147 <Title>Equality</Title>
1148
1149 <Para>
1150 <IndexTerm><Primary>arrays, testing for equality</Primary></IndexTerm>
1151 </Para>
1152
1153 <Para>
1154 One can take &ldquo;equality&rdquo; of mutable arrays.  What is compared is the
1155 <Emphasis>name</Emphasis> or reference to the mutable array, not its contents.
1156 </Para>
1157
1158 <Para>
1159
1160 <ProgramListing>
1161 sameMutableArray#     :: MutableArray# s elt -> MutableArray# s elt -> Bool
1162 sameMutableByteArray# :: MutableByteArray# s -> MutableByteArray# s -> Bool
1163 </ProgramListing>
1164
1165 <IndexTerm><Primary><literal>sameMutableArray&num;</literal></Primary></IndexTerm>
1166 <IndexTerm><Primary><literal>sameMutableByteArray&num;</literal></Primary></IndexTerm>
1167 </Para>
1168
1169 </Sect3>
1170
1171 <Sect3>
1172 <Title>Freezing mutable arrays</Title>
1173
1174 <Para>
1175 <IndexTerm><Primary>arrays, freezing mutable</Primary></IndexTerm>
1176 <IndexTerm><Primary>freezing mutable arrays</Primary></IndexTerm>
1177 <IndexTerm><Primary>mutable arrays, freezing</Primary></IndexTerm>
1178 </Para>
1179
1180 <Para>
1181 Only unsafe-freeze has a primitive.  (Safe freeze is done directly in Haskell
1182 by copying the array and then using <Function>unsafeFreeze</Function>.)
1183 </Para>
1184
1185 <Para>
1186
1187 <ProgramListing>
1188 unsafeFreezeArray#     :: MutableArray# s elt -> State# s -> (# State# s, Array# s elt #)
1189 unsafeFreezeByteArray# :: MutableByteArray# s -> State# s -> (# State# s, ByteArray# #)
1190 </ProgramListing>
1191
1192 <IndexTerm><Primary><literal>unsafeFreezeArray&num;</literal></Primary></IndexTerm>
1193 <IndexTerm><Primary><literal>unsafeFreezeByteArray&num;</literal></Primary></IndexTerm>
1194 </Para>
1195
1196 </Sect3>
1197
1198 </Sect2>
1199
1200 <Sect2>
1201 <Title>Synchronizing variables (M-vars)</Title>
1202
1203 <Para>
1204 <IndexTerm><Primary>synchronising variables (M-vars)</Primary></IndexTerm>
1205 <IndexTerm><Primary>M-Vars</Primary></IndexTerm>
1206 </Para>
1207
1208 <Para>
1209 Synchronising variables are the primitive type used to implement
1210 Concurrent Haskell's MVars (see the Concurrent Haskell paper for
1211 the operational behaviour of these operations).
1212 </Para>
1213
1214 <Para>
1215
1216 <ProgramListing>
1217 type MVar# s elt        -- primitive
1218
1219 newMVar#    :: State# s -> (# State# s, MVar# s elt #)
1220 takeMVar#   :: SynchVar# s elt -> State# s -> (# State# s, elt #)
1221 putMVar#    :: SynchVar# s elt -> State# s -> State# s
1222 </ProgramListing>
1223
1224 <IndexTerm><Primary><literal>SynchVar&num;</literal></Primary></IndexTerm>
1225 <IndexTerm><Primary><literal>newSynchVar&num;</literal></Primary></IndexTerm>
1226 <IndexTerm><Primary><literal>takeMVar</literal></Primary></IndexTerm>
1227 <IndexTerm><Primary><literal>putMVar</literal></Primary></IndexTerm>
1228 </Para>
1229
1230 </Sect2>
1231
1232 </Sect1>
1233
1234 <Sect1 id="glasgow-ST-monad">
1235 <Title>Primitive state-transformer monad
1236 </Title>
1237
1238 <Para>
1239 <IndexTerm><Primary>state transformers (Glasgow extensions)</Primary></IndexTerm>
1240 <IndexTerm><Primary>ST monad (Glasgow extension)</Primary></IndexTerm>
1241 </Para>
1242
1243 <Para>
1244 This monad underlies our implementation of arrays, mutable and
1245 immutable, and our implementation of I/O, including &ldquo;C calls&rdquo;.
1246 </Para>
1247
1248 <Para>
1249 The <Literal>ST</Literal> library, which provides access to the
1250 <Function>ST</Function> monad, is described in <xref
1251 linkend="sec-ST">.
1252 </Para>
1253
1254 </Sect1>
1255
1256 <Sect1 id="glasgow-prim-arrays">
1257 <Title>Primitive arrays, mutable and otherwise
1258 </Title>
1259
1260 <Para>
1261 <IndexTerm><Primary>primitive arrays (Glasgow extension)</Primary></IndexTerm>
1262 <IndexTerm><Primary>arrays, primitive (Glasgow extension)</Primary></IndexTerm>
1263 </Para>
1264
1265 <Para>
1266 GHC knows about quite a few flavours of Large Swathes of Bytes.
1267 </Para>
1268
1269 <Para>
1270 First, GHC distinguishes between primitive arrays of (boxed) Haskell
1271 objects (type <Literal>Array&num; obj</Literal>) and primitive arrays of bytes (type
1272 <Literal>ByteArray&num;</Literal>).
1273 </Para>
1274
1275 <Para>
1276 Second, it distinguishes between&hellip;
1277 <VariableList>
1278
1279 <VarListEntry>
1280 <Term>Immutable:</Term>
1281 <ListItem>
1282 <Para>
1283 Arrays that do not change (as with &ldquo;standard&rdquo; Haskell arrays); you
1284 can only read from them.  Obviously, they do not need the care and
1285 attention of the state-transformer monad.
1286 </Para>
1287 </ListItem>
1288 </VarListEntry>
1289 <VarListEntry>
1290 <Term>Mutable:</Term>
1291 <ListItem>
1292 <Para>
1293 Arrays that may be changed or &ldquo;mutated.&rdquo;  All the operations on them
1294 live within the state-transformer monad and the updates happen
1295 <Emphasis>in-place</Emphasis>.
1296 </Para>
1297 </ListItem>
1298 </VarListEntry>
1299 <VarListEntry>
1300 <Term>&ldquo;Static&rdquo; (in C land):</Term>
1301 <ListItem>
1302 <Para>
1303 A C routine may pass an <Literal>Addr&num;</Literal> pointer back into Haskell land.  There
1304 are then primitive operations with which you may merrily grab values
1305 over in C land, by indexing off the &ldquo;static&rdquo; pointer.
1306 </Para>
1307 </ListItem>
1308 </VarListEntry>
1309 <VarListEntry>
1310 <Term>&ldquo;Stable&rdquo; pointers:</Term>
1311 <ListItem>
1312 <Para>
1313 If, for some reason, you wish to hand a Haskell pointer (i.e.,
1314 <Emphasis>not</Emphasis> an unboxed value) to a C routine, you first make the
1315 pointer &ldquo;stable,&rdquo; so that the garbage collector won't forget that it
1316 exists.  That is, GHC provides a safe way to pass Haskell pointers to
1317 C.
1318 </Para>
1319
1320 <Para>
1321 Please see <XRef LinkEnd="glasgow-stablePtrs"> for more details.
1322 </Para>
1323 </ListItem>
1324 </VarListEntry>
1325 <VarListEntry>
1326 <Term>&ldquo;Foreign objects&rdquo;:</Term>
1327 <ListItem>
1328 <Para>
1329 A &ldquo;foreign object&rdquo; is a safe way to pass an external object (a
1330 C-allocated pointer, say) to Haskell and have Haskell do the Right
1331 Thing when it no longer references the object.  So, for example, C
1332 could pass a large bitmap over to Haskell and say &ldquo;please free this
1333 memory when you're done with it.&rdquo;
1334 </Para>
1335
1336 <Para>
1337 Please see <XRef LinkEnd="glasgow-foreignObjs"> for more details.
1338 </Para>
1339 </ListItem>
1340 </VarListEntry>
1341 </VariableList>
1342 </Para>
1343
1344 <Para>
1345 The libraries documentatation gives more details on all these
1346 &ldquo;primitive array&rdquo; types and the operations on them.
1347 </Para>
1348
1349 </Sect1>
1350
1351
1352 <Sect1 id="pattern-guards">
1353 <Title>Pattern guards</Title>
1354
1355 <Para>
1356 <IndexTerm><Primary>Pattern guards (Glasgow extension)</Primary></IndexTerm>
1357 The discussion that follows is an abbreviated version of Simon Peyton Jones's original <ULink URL="http://research.microsoft.com/~simonpj/Haskell/guards.html">proposal</ULink>. (Note that the proposal was written before pattern guards were implemented, so refers to them as unimplemented.)
1358 </Para>
1359
1360 <Para>
1361 Suppose we have an abstract data type of finite maps, with a
1362 lookup operation:
1363
1364 <ProgramListing>
1365 lookup :: FiniteMap -> Int -> Maybe Int
1366 </ProgramListing>
1367
1368 The lookup returns <Function>Nothing</Function> if the supplied key is not in the domain of the mapping, and <Function>(Just v)</Function> otherwise,
1369 where <VarName>v</VarName> is the value that the key maps to.  Now consider the following definition:
1370 </Para>
1371
1372 <ProgramListing>
1373 clunky env var1 var2 | ok1 && ok2 = val1 + val2
1374 | otherwise  = var1 + var2
1375 where
1376   m1 = lookup env var1
1377   m2 = lookup env var2
1378   ok1 = maybeToBool m1
1379   ok2 = maybeToBool m2
1380   val1 = expectJust m1
1381   val2 = expectJust m2
1382 </ProgramListing>
1383
1384 <Para>
1385 The auxiliary functions are
1386 </Para>
1387
1388 <ProgramListing>
1389 maybeToBool :: Maybe a -&gt; Bool
1390 maybeToBool (Just x) = True
1391 maybeToBool Nothing  = False
1392
1393 expectJust :: Maybe a -&gt; a
1394 expectJust (Just x) = x
1395 expectJust Nothing  = error "Unexpected Nothing"
1396 </ProgramListing>
1397
1398 <Para>
1399 What is <Function>clunky</Function> doing? The guard <Literal>ok1 &&
1400 ok2</Literal> checks that both lookups succeed, using
1401 <Function>maybeToBool</Function> to convert the <Function>Maybe</Function>
1402 types to booleans. The (lazily evaluated) <Function>expectJust</Function>
1403 calls extract the values from the results of the lookups, and binds the
1404 returned values to <VarName>val1</VarName> and <VarName>val2</VarName>
1405 respectively.  If either lookup fails, then clunky takes the
1406 <Literal>otherwise</Literal> case and returns the sum of its arguments.
1407 </Para>
1408
1409 <Para>
1410 This is certainly legal Haskell, but it is a tremendously verbose and
1411 un-obvious way to achieve the desired effect.  Arguably, a more direct way
1412 to write clunky would be to use case expressions:
1413 </Para>
1414
1415 <ProgramListing>
1416 clunky env var1 var1 = case lookup env var1 of
1417   Nothing -&gt; fail
1418   Just val1 -&gt; case lookup env var2 of
1419     Nothing -&gt; fail
1420     Just val2 -&gt; val1 + val2
1421 where
1422   fail = val1 + val2
1423 </ProgramListing>
1424
1425 <Para>
1426 This is a bit shorter, but hardly better.  Of course, we can rewrite any set
1427 of pattern-matching, guarded equations as case expressions; that is
1428 precisely what the compiler does when compiling equations! The reason that
1429 Haskell provides guarded equations is because they allow us to write down
1430 the cases we want to consider, one at a time, independently of each other.
1431 This structure is hidden in the case version.  Two of the right-hand sides
1432 are really the same (<Function>fail</Function>), and the whole expression
1433 tends to become more and more indented.
1434 </Para>
1435
1436 <Para>
1437 Here is how I would write clunky:
1438 </Para>
1439
1440 <ProgramListing>
1441 clunky env var1 var1
1442   | Just val1 &lt;- lookup env var1
1443   , Just val2 &lt;- lookup env var2
1444   = val1 + val2
1445 ...other equations for clunky...
1446 </ProgramListing>
1447
1448 <Para>
1449 The semantics should be clear enough.  The qualifers are matched in order.
1450 For a <Literal>&lt;-</Literal> qualifier, which I call a pattern guard, the
1451 right hand side is evaluated and matched against the pattern on the left.
1452 If the match fails then the whole guard fails and the next equation is
1453 tried.  If it succeeds, then the appropriate binding takes place, and the
1454 next qualifier is matched, in the augmented environment.  Unlike list
1455 comprehensions, however, the type of the expression to the right of the
1456 <Literal>&lt;-</Literal> is the same as the type of the pattern to its
1457 left.  The bindings introduced by pattern guards scope over all the
1458 remaining guard qualifiers, and over the right hand side of the equation.
1459 </Para>
1460
1461 <Para>
1462 Just as with list comprehensions, boolean expressions can be freely mixed
1463 with among the pattern guards.  For example:
1464 </Para>
1465
1466 <ProgramListing>
1467 f x | [y] <- x
1468     , y > 3
1469     , Just z <- h y
1470     = ...
1471 </ProgramListing>
1472
1473 <Para>
1474 Haskell's current guards therefore emerge as a special case, in which the
1475 qualifier list has just one element, a boolean expression.
1476 </Para>
1477 </Sect1>
1478
1479 <Sect1 id="sec-ffi">
1480 <Title>The foreign interface</Title>
1481
1482 <Para>
1483 The foreign interface consists of language and library support. The former
1484 is described later in <XRef LinkEnd="ffi">; the latter is outlined below,
1485 and detailed in <XRef LinkEnd="sec-Foreign">.
1486 </Para>
1487
1488 <Sect2 id="glasgow-foreign-headers">
1489 <Title>Using function headers
1490 </Title>
1491
1492 <Para>
1493 <IndexTerm><Primary>C calls, function headers</Primary></IndexTerm>
1494 </Para>
1495
1496 <Para>
1497 When generating C (using the <Option>-fvia-C</Option> directive), one can assist the
1498 C compiler in detecting type errors by using the <Command>-&num;include</Command> directive
1499 to provide <Filename>.h</Filename> files containing function headers.
1500 </Para>
1501
1502 <Para>
1503 For example,
1504 </Para>
1505
1506 <Para>
1507
1508 <ProgramListing>
1509 typedef unsigned long *StgForeignObj;
1510 typedef long StgInt;
1511
1512 void          initialiseEFS (StgInt size);
1513 StgInt        terminateEFS (void);
1514 StgForeignObj emptyEFS(void);
1515 StgForeignObj updateEFS (StgForeignObj a, StgInt i, StgInt x);
1516 StgInt        lookupEFS (StgForeignObj a, StgInt i);
1517 </ProgramListing>
1518
1519 </Para>
1520
1521 <Para>
1522 You can find appropriate definitions for <Literal>StgInt</Literal>, <Literal>StgForeignObj</Literal>,
1523 etc using <Command>gcc</Command> on your architecture by consulting
1524 <Filename>ghc/includes/StgTypes.h</Filename>.  The following table summarises the
1525 relationship between Haskell types and C types.
1526 </Para>
1527
1528 <Para>
1529
1530 <InformalTable>
1531 <TGroup Cols="2">
1532 <ColSpec Align="Left" Colsep="0">
1533 <ColSpec Align="Left" Colsep="0">
1534 <TBody>
1535 <Row>
1536 <Entry><Emphasis>C type name</Emphasis> </Entry>
1537 <Entry> <Emphasis>Haskell Type</Emphasis> </Entry>
1538 </Row>
1539
1540 <Row>
1541 <Entry>
1542 <Literal>StgChar</Literal> </Entry>
1543 <Entry> <Literal>Char&num;</Literal> </Entry>
1544 </Row>
1545 <Row>
1546 <Entry>
1547 <Literal>StgInt</Literal> </Entry>
1548 <Entry> <Literal>Int&num;</Literal> </Entry>
1549 </Row>
1550 <Row>
1551 <Entry>
1552 <Literal>StgWord</Literal> </Entry>
1553 <Entry> <Literal>Word&num;</Literal> </Entry>
1554 </Row>
1555 <Row>
1556 <Entry>
1557 <Literal>StgAddr</Literal> </Entry>
1558 <Entry> <Literal>Addr&num;</Literal> </Entry>
1559 </Row>
1560 <Row>
1561 <Entry>
1562 <Literal>StgFloat</Literal> </Entry>
1563 <Entry> <Literal>Float&num;</Literal> </Entry>
1564 </Row>
1565 <Row>
1566 <Entry>
1567 <Literal>StgDouble</Literal> </Entry>
1568 <Entry> <Literal>Double&num;</Literal> </Entry>
1569 </Row>
1570 <Row>
1571 <Entry>
1572 <Literal>StgArray</Literal> </Entry>
1573 <Entry> <Literal>Array&num;</Literal> </Entry>
1574 </Row>
1575 <Row>
1576 <Entry>
1577 <Literal>StgByteArray</Literal> </Entry>
1578 <Entry> <Literal>ByteArray&num;</Literal> </Entry>
1579 </Row>
1580 <Row>
1581 <Entry>
1582 <Literal>StgArray</Literal> </Entry>
1583 <Entry> <Literal>MutableArray&num;</Literal> </Entry>
1584 </Row>
1585 <Row>
1586 <Entry>
1587 <Literal>StgByteArray</Literal> </Entry>
1588 <Entry> <Literal>MutableByteArray&num;</Literal> </Entry>
1589 </Row>
1590 <Row>
1591 <Entry>
1592 <Literal>StgStablePtr</Literal> </Entry>
1593 <Entry> <Literal>StablePtr&num;</Literal> </Entry>
1594 </Row>
1595 <Row>
1596 <Entry>
1597 <Literal>StgForeignObj</Literal> </Entry>
1598 <Entry> <Literal>ForeignObj&num;</Literal></Entry>
1599 </Row>
1600 </TBody>
1601
1602 </TGroup>
1603 </InformalTable>
1604 </Para>
1605
1606 <Para>
1607 Note that this approach is only <Emphasis>essential</Emphasis> for returning
1608 <Literal>float</Literal>s (or if <Literal>sizeof(int) != sizeof(int *)</Literal> on your
1609 architecture) but is a Good Thing for anyone who cares about writing
1610 solid code.  You're crazy not to do it.
1611 </Para>
1612
1613 </Sect2>
1614
1615 <Sect2 id="glasgow-stablePtrs">
1616 <Title>Subverting automatic unboxing with &ldquo;stable pointers&rdquo;
1617 </Title>
1618
1619 <Para>
1620 <IndexTerm><Primary>stable pointers (Glasgow extension)</Primary></IndexTerm>
1621 </Para>
1622
1623 <Para>
1624 The arguments of a <Function>&lowbar;ccall&lowbar;</Function> automatically unboxed before the
1625 call.  There are two reasons why this is usually the Right Thing to
1626 do:
1627 </Para>
1628
1629 <Para>
1630
1631 <ItemizedList>
1632 <ListItem>
1633
1634 <Para>
1635 C is a strict language: it would be excessively tedious to pass
1636 unevaluated arguments and require the C programmer to force their
1637 evaluation before using them.
1638
1639 </Para>
1640 </ListItem>
1641 <ListItem>
1642
1643 <Para>
1644  Boxed values are stored on the Haskell heap and may be moved
1645 within the heap if a garbage collection occurs&mdash;that is, pointers
1646 to boxed objects are not <Emphasis>stable</Emphasis>.
1647 </Para>
1648 </ListItem>
1649
1650 </ItemizedList>
1651
1652 </Para>
1653
1654 <Para>
1655 It is possible to subvert the unboxing process by creating a &ldquo;stable
1656 pointer&rdquo; to a value and passing the stable pointer instead.  For
1657 example, to pass/return an integer lazily to C functions <Function>storeC</Function> and
1658 <Function>fetchC</Function> might write:
1659 </Para>
1660
1661 <Para>
1662
1663 <ProgramListing>
1664 storeH :: Int -> IO ()
1665 storeH x = makeStablePtr x              >>= \ stable_x ->
1666            _ccall_ storeC stable_x
1667
1668 fetchH :: IO Int
1669 fetchH x = _ccall_ fetchC               >>= \ stable_x ->
1670            deRefStablePtr stable_x      >>= \ x ->
1671            freeStablePtr stable_x       >>
1672            return x
1673 </ProgramListing>
1674
1675 </Para>
1676
1677 <Para>
1678 The garbage collector will refrain from throwing a stable pointer away
1679 until you explicitly call one of the following from C or Haskell.
1680 </Para>
1681
1682 <Para>
1683
1684 <ProgramListing>
1685 void freeStablePointer( StgStablePtr stablePtrToToss )
1686 freeStablePtr :: StablePtr a -> IO ()
1687 </ProgramListing>
1688
1689 </Para>
1690
1691 <Para>
1692 As with the use of <Function>free</Function> in C programs, GREAT CARE SHOULD BE
1693 EXERCISED to ensure these functions are called at the right time: too
1694 early and you get dangling references (and, if you're lucky, an error
1695 message from the runtime system); too late and you get space leaks.
1696 </Para>
1697
1698 <Para>
1699 And to force evaluation of the argument within <Function>fooC</Function>, one would
1700 call one of the following C functions (according to type of argument).
1701 </Para>
1702
1703 <Para>
1704
1705 <ProgramListing>
1706 void     performIO  ( StgStablePtr stableIndex /* StablePtr s (IO ()) */ );
1707 StgInt   enterInt   ( StgStablePtr stableIndex /* StablePtr s Int */ );
1708 StgFloat enterFloat ( StgStablePtr stableIndex /* StablePtr s Float */ );
1709 </ProgramListing>
1710
1711 </Para>
1712
1713 <Para>
1714 <IndexTerm><Primary>performIO</Primary></IndexTerm>
1715 <IndexTerm><Primary>enterInt</Primary></IndexTerm>
1716 <IndexTerm><Primary>enterFloat</Primary></IndexTerm>
1717 </Para>
1718
1719 <Para>
1720 Nota Bene: <Function>&lowbar;ccall&lowbar;GC&lowbar;</Function><IndexTerm><Primary>&lowbar;ccall&lowbar;GC&lowbar;</Primary></IndexTerm> must be used if any of
1721 these functions are used.
1722 </Para>
1723
1724 </Sect2>
1725
1726 <Sect2 id="glasgow-foreignObjs">
1727 <Title>Foreign objects: pointing outside the Haskell heap
1728 </Title>
1729
1730 <Para>
1731 <IndexTerm><Primary>foreign objects (Glasgow extension)</Primary></IndexTerm>
1732 </Para>
1733
1734 <Para>
1735 There are two types that GHC programs can use to reference
1736 (heap-allocated) objects outside the Haskell world: <Literal>Addr</Literal> and
1737 <Literal>ForeignObj</Literal>.
1738 </Para>
1739
1740 <Para>
1741 If you use <Literal>Addr</Literal>, it is up to you to the programmer to arrange
1742 allocation and deallocation of the objects.
1743 </Para>
1744
1745 <Para>
1746 If you use <Literal>ForeignObj</Literal>, GHC's garbage collector will call upon the
1747 user-supplied <Emphasis>finaliser</Emphasis> function to free the object when the
1748 Haskell world no longer can access the object.  (An object is
1749 associated with a finaliser function when the abstract
1750 Haskell type <Literal>ForeignObj</Literal> is created). The finaliser function is
1751 expressed in C, and is passed as argument the object:
1752 </Para>
1753
1754 <Para>
1755
1756 <ProgramListing>
1757 void foreignFinaliser ( StgForeignObj fo )
1758 </ProgramListing>
1759
1760 </Para>
1761
1762 <Para>
1763 when the Haskell world can no longer access the object.  Since
1764 <Literal>ForeignObj</Literal>s only get released when a garbage collection occurs, we
1765 provide ways of triggering a garbage collection from within C and from
1766 within Haskell.
1767 </Para>
1768
1769 <Para>
1770
1771 <ProgramListing>
1772 void GarbageCollect()
1773 performGC :: IO ()
1774 </ProgramListing>
1775
1776 </Para>
1777
1778 <Para>
1779 More information on the programmers' interface to <Literal>ForeignObj</Literal> can be
1780 found in the library documentation.
1781 </Para>
1782
1783 </Sect2>
1784
1785 <Sect2 id="glasgow-avoiding-monads">
1786 <Title>Avoiding monads
1787 </Title>
1788
1789 <Para>
1790 <IndexTerm><Primary>C calls to `pure C'</Primary></IndexTerm>
1791 <IndexTerm><Primary>unsafePerformIO</Primary></IndexTerm>
1792 </Para>
1793
1794 <Para>
1795 The <Function>&lowbar;ccall&lowbar;</Function> construct is part of the <Literal>IO</Literal> monad because 9 out of 10
1796 uses will be to call imperative functions with side effects such as
1797 <Function>printf</Function>.  Use of the monad ensures that these operations happen in a
1798 predictable order in spite of laziness and compiler optimisations.
1799 </Para>
1800
1801 <Para>
1802 To avoid having to be in the monad to call a C function, it is
1803 possible to use <Function>unsafePerformIO</Function>, which is available from the
1804 <Literal>IOExts</Literal> module.  There are three situations where one might like to
1805 call a C function from outside the IO world:
1806 </Para>
1807
1808 <Para>
1809
1810 <ItemizedList>
1811 <ListItem>
1812
1813 <Para>
1814 Calling a function with no side-effects:
1815
1816 <ProgramListing>
1817 atan2d :: Double -> Double -> Double
1818 atan2d y x = unsafePerformIO (_ccall_ atan2d y x)
1819
1820 sincosd :: Double -> (Double, Double)
1821 sincosd x = unsafePerformIO $ do
1822         da &#60;- newDoubleArray (0, 1)
1823         _casm_ &ldquo;sincosd( %0, &amp;((double *)%1[0]), &amp;((double *)%1[1]) );&rdquo; x da
1824         s &#60;- readDoubleArray da 0
1825         c &#60;- readDoubleArray da 1
1826         return (s, c)
1827 </ProgramListing>
1828
1829
1830 </Para>
1831 </ListItem>
1832 <ListItem>
1833
1834 <Para>
1835  Calling a set of functions which have side-effects but which can
1836 be used in a purely functional manner.
1837
1838 For example, an imperative implementation of a purely functional
1839 lookup-table might be accessed using the following functions.
1840
1841
1842 <ProgramListing>
1843 empty  :: EFS x
1844 update :: EFS x -> Int -> x -> EFS x
1845 lookup :: EFS a -> Int -> a
1846
1847 empty = unsafePerformIO (_ccall_ emptyEFS)
1848
1849 update a i x = unsafePerformIO $
1850         makeStablePtr x         >>= \ stable_x ->
1851         _ccall_ updateEFS a i stable_x
1852
1853 lookup a i = unsafePerformIO $
1854         _ccall_ lookupEFS a i   >>= \ stable_x ->
1855         deRefStablePtr stable_x
1856 </ProgramListing>
1857
1858
1859 You will almost always want to use <Literal>ForeignObj</Literal>s with this.
1860
1861 </Para>
1862 </ListItem>
1863 <ListItem>
1864
1865 <Para>
1866  Calling a side-effecting function even though the results will
1867 be unpredictable.  For example the <Function>trace</Function> function is defined by:
1868
1869
1870 <ProgramListing>
1871 trace :: String -> a -> a
1872 trace string expr
1873   = unsafePerformIO (
1874         ((_ccall_ PreTraceHook sTDERR{-msg-}):: IO ())  >>
1875         fputs sTDERR string                             >>
1876         ((_ccall_ PostTraceHook sTDERR{-msg-}):: IO ()) >>
1877         return expr )
1878   where
1879     sTDERR = (&ldquo;stderr&rdquo; :: Addr)
1880 </ProgramListing>
1881
1882
1883 (This kind of use is not highly recommended&mdash;it is only really
1884 useful in debugging code.)
1885 </Para>
1886 </ListItem>
1887
1888 </ItemizedList>
1889
1890 </Para>
1891
1892 </Sect2>
1893
1894 <Sect2 id="ccall-gotchas">
1895 <Title>C-calling &ldquo;gotchas&rdquo; checklist
1896 </Title>
1897
1898 <Para>
1899 <IndexTerm><Primary>C call dangers</Primary></IndexTerm>
1900 <IndexTerm><Primary>CCallable</Primary></IndexTerm>
1901 <IndexTerm><Primary>CReturnable</Primary></IndexTerm>
1902 </Para>
1903
1904 <Para>
1905 And some advice, too.
1906 </Para>
1907
1908 <Para>
1909
1910 <ItemizedList>
1911 <ListItem>
1912
1913 <Para>
1914  For modules that use <Function>&lowbar;ccall&lowbar;</Function>s, etc., compile with
1915 <Option>-fvia-C</Option>.<IndexTerm><Primary>-fvia-C option</Primary></IndexTerm> You don't have to, but you should.
1916
1917 Also, use the <Option>-&num;include "prototypes.h"</Option> flag (hack) to inform the C
1918 compiler of the fully-prototyped types of all the C functions you
1919 call.  (<XRef LinkEnd="glasgow-foreign-headers"> says more about this&hellip;)
1920
1921 This scheme is the <Emphasis>only</Emphasis> way that you will get <Emphasis>any</Emphasis>
1922 typechecking of your <Function>&lowbar;ccall&lowbar;</Function>s.  (It shouldn't be that way, but&hellip;).
1923 GHC will pass the flag <Option>-Wimplicit</Option> to <Command>gcc</Command> so that you'll get warnings
1924 if any <Function>&lowbar;ccall&lowbar;</Function>ed functions have no prototypes.
1925
1926 </Para>
1927 </ListItem>
1928 <ListItem>
1929
1930 <Para>
1931 Try to avoid <Function>&lowbar;ccall&lowbar;</Function>s to C&nbsp;functions that take <Literal>float</Literal>
1932 arguments or return <Literal>float</Literal> results.  Reason: if you do, you will
1933 become entangled in (ANSI?) C's rules for when arguments/results are
1934 promoted to <Literal>doubles</Literal>.  It's a nightmare and just not worth it.
1935 Use <Literal>doubles</Literal> if possible.
1936
1937 If you do use <Literal>floats</Literal>, check and re-check that the right thing is
1938 happening.  Perhaps compile with <Option>-keep-hc-file-too</Option> and look at
1939 the intermediate C (<Function>.hc</Function>).
1940
1941 </Para>
1942 </ListItem>
1943 <ListItem>
1944
1945 <Para>
1946  The compiler uses two non-standard type-classes when
1947 type-checking the arguments and results of <Function>&lowbar;ccall&lowbar;</Function>: the arguments
1948 (respectively result) of <Function>&lowbar;ccall&lowbar;</Function> must be instances of the class
1949 <Literal>CCallable</Literal> (respectively <Literal>CReturnable</Literal>).  Both classes may be
1950 imported from the module <Literal>CCall</Literal>, but this should only be
1951 necessary if you want to define a new instance.  (Neither class
1952 defines any methods&mdash;their only function is to keep the
1953 type-checker happy.)
1954
1955 The type checker must be able to figure out just which of the
1956 C-callable/returnable types is being used.  If it can't, you have to
1957 add type signatures. For example,
1958
1959
1960 <ProgramListing>
1961 f x = _ccall_ foo x
1962 </ProgramListing>
1963
1964
1965 is not good enough, because the compiler can't work out what type <VarName>x</VarName>
1966 is, nor what type the <Function>&lowbar;ccall&lowbar;</Function> returns.  You have to write, say:
1967
1968
1969 <ProgramListing>
1970 f :: Int -> IO Double
1971 f x = _ccall_ foo x
1972 </ProgramListing>
1973
1974
1975 This table summarises the standard instances of these classes.
1976
1977 <InformalTable>
1978 <TGroup Cols="4">
1979 <ColSpec Align="Left" Colsep="0">
1980 <ColSpec Align="Left" Colsep="0">
1981 <ColSpec Align="Left" Colsep="0">
1982 <ColSpec Align="Left" Colsep="0">
1983 <TBody>
1984 <Row>
1985 <Entry><Emphasis>Type</Emphasis> </Entry>
1986 <Entry><Emphasis>CCallable</Emphasis></Entry>
1987 <Entry><Emphasis>CReturnable</Emphasis> </Entry>
1988 <Entry><Emphasis>Which is probably&hellip;</Emphasis> </Entry>
1989 </Row>
1990 <Row>
1991 <Entry>
1992 <Literal>Char</Literal> </Entry>
1993 <Entry> Yes </Entry>
1994 <Entry> Yes </Entry>
1995 <Entry> <Literal>unsigned char</Literal> </Entry>
1996 </Row>
1997 <Row>
1998 <Entry>
1999 <Literal>Int</Literal> </Entry>
2000 <Entry> Yes </Entry>
2001 <Entry> Yes </Entry>
2002 <Entry> <Literal>long int</Literal> </Entry>
2003 </Row>
2004 <Row>
2005 <Entry>
2006 <Literal>Word</Literal> </Entry>
2007 <Entry> Yes </Entry>
2008 <Entry> Yes </Entry>
2009 <Entry> <Literal>unsigned long int</Literal> </Entry>
2010 </Row>
2011 <Row>
2012 <Entry>
2013 <Literal>Addr</Literal> </Entry>
2014 <Entry> Yes </Entry>
2015 <Entry> Yes </Entry>
2016 <Entry> <Literal>void *</Literal> </Entry>
2017 </Row>
2018 <Row>
2019 <Entry>
2020 <Literal>Float</Literal> </Entry>
2021 <Entry> Yes </Entry>
2022 <Entry> Yes </Entry>
2023 <Entry> <Literal>float</Literal> </Entry>
2024 </Row>
2025 <Row>
2026 <Entry>
2027 <Literal>Double</Literal> </Entry>
2028 <Entry> Yes </Entry>
2029 <Entry> Yes </Entry>
2030 <Entry> <Literal>double</Literal> </Entry>
2031 </Row>
2032 <Row>
2033 <Entry>
2034 <Literal>()</Literal> </Entry>
2035 <Entry> No </Entry>
2036 <Entry> Yes </Entry>
2037 <Entry> <Literal>void</Literal> </Entry>
2038 </Row>
2039 <Row>
2040 <Entry>
2041 <Literal>[Char]</Literal> </Entry>
2042 <Entry> Yes </Entry>
2043 <Entry> No </Entry>
2044 <Entry> <Literal>char *</Literal> (null-terminated) </Entry>
2045 </Row>
2046 <Row>
2047 <Entry>
2048 <Literal>Array</Literal> </Entry>
2049 <Entry> Yes </Entry>
2050 <Entry> No </Entry>
2051 <Entry> <Literal>unsigned long *</Literal> </Entry>
2052 </Row>
2053 <Row>
2054 <Entry>
2055 <Literal>ByteArray</Literal> </Entry>
2056 <Entry> Yes </Entry>
2057 <Entry> No </Entry>
2058 <Entry> <Literal>unsigned long *</Literal> </Entry>
2059 </Row>
2060 <Row>
2061 <Entry>
2062 <Literal>MutableArray</Literal> </Entry>
2063 <Entry> Yes </Entry>
2064 <Entry> No </Entry>
2065 <Entry> <Literal>unsigned long *</Literal> </Entry>
2066 </Row>
2067 <Row>
2068 <Entry>
2069 <Literal>MutableByteArray</Literal> </Entry>
2070 <Entry> Yes </Entry>
2071 <Entry> No </Entry>
2072 <Entry> <Literal>unsigned long *</Literal> </Entry>
2073 </Row>
2074 <Row>
2075 <Entry>
2076 <Literal>State</Literal> </Entry>
2077 <Entry> Yes </Entry>
2078 <Entry> Yes </Entry>
2079 <Entry> nothing!</Entry>
2080 </Row>
2081 <Row>
2082 <Entry>
2083 <Literal>StablePtr</Literal> </Entry>
2084 <Entry> Yes </Entry>
2085 <Entry> Yes </Entry>
2086 <Entry> <Literal>unsigned long *</Literal> </Entry>
2087 </Row>
2088 <Row>
2089 <Entry>
2090 <Literal>ForeignObjs</Literal> </Entry>
2091 <Entry> Yes </Entry>
2092 <Entry> Yes </Entry>
2093 <Entry> see later </Entry>
2094 </Row>
2095
2096 </TBody>
2097
2098 </TGroup>
2099 </InformalTable>
2100
2101 Actually, the <Literal>Word</Literal> type is defined as being the same size as a
2102 pointer on the target architecture, which is <Emphasis>probably</Emphasis>
2103 <Literal>unsigned long int</Literal>.
2104
2105 The brave and careful programmer can add their own instances of these
2106 classes for the following types:
2107
2108
2109 <ItemizedList>
2110 <ListItem>
2111
2112 <Para>
2113 A <Emphasis>boxed-primitive</Emphasis> type may be made an instance of both
2114 <Literal>CCallable</Literal> and <Literal>CReturnable</Literal>.
2115
2116 A boxed primitive type is any data type with a
2117 single unary constructor with a single primitive argument.  For
2118 example, the following are all boxed primitive types:
2119
2120
2121 <ProgramListing>
2122 Int
2123 Double
2124 data XDisplay = XDisplay Addr#
2125 data EFS a = EFS# ForeignObj#
2126 </ProgramListing>
2127
2128
2129
2130 <ProgramListing>
2131 instance CCallable   (EFS a)
2132 instance CReturnable (EFS a)
2133 </ProgramListing>
2134
2135
2136 </Para>
2137 </ListItem>
2138 <ListItem>
2139
2140 <Para>
2141  Any datatype with a single nullary constructor may be made an
2142 instance of <Literal>CReturnable</Literal>.  For example:
2143
2144
2145 <ProgramListing>
2146 data MyVoid = MyVoid
2147 instance CReturnable MyVoid
2148 </ProgramListing>
2149
2150
2151 </Para>
2152 </ListItem>
2153 <ListItem>
2154
2155 <Para>
2156  As at version 2.09, <Literal>String</Literal> (i.e., <Literal>[Char]</Literal>) is still
2157 not a <Literal>CReturnable</Literal> type.
2158
2159 Also, the now-builtin type <Literal>PackedString</Literal> is neither
2160 <Literal>CCallable</Literal> nor <Literal>CReturnable</Literal>.  (But there are functions in
2161 the PackedString interface to let you get at the necessary bits&hellip;)
2162 </Para>
2163 </ListItem>
2164
2165 </ItemizedList>
2166
2167
2168 </Para>
2169 </ListItem>
2170 <ListItem>
2171
2172 <Para>
2173  The code-generator will complain if you attempt to use <Literal>&percnt;r</Literal> in
2174 a <Literal>&lowbar;casm&lowbar;</Literal> whose result type is <Literal>IO ()</Literal>; or if you don't use <Literal>&percnt;r</Literal>
2175 <Emphasis>precisely</Emphasis> once for any other result type.  These messages are
2176 supposed to be helpful and catch bugs&mdash;please tell us if they wreck
2177 your life.
2178
2179 </Para>
2180 </ListItem>
2181 <ListItem>
2182
2183 <Para>
2184  If you call out to C code which may trigger the Haskell garbage
2185 collector or create new threads (examples of this later&hellip;), then you
2186 must use the <Function>&lowbar;ccall&lowbar;GC&lowbar;</Function><IndexTerm><Primary>&lowbar;ccall&lowbar;GC&lowbar; primitive</Primary></IndexTerm> or
2187 <Function>&lowbar;casm&lowbar;GC&lowbar;</Function><IndexTerm><Primary>&lowbar;casm&lowbar;GC&lowbar; primitive</Primary></IndexTerm> variant of C-calls.  (This
2188 does not work with the native code generator&mdash;use <Option>-fvia-C</Option>.) This
2189 stuff is hairy with a capital H!
2190 </Para>
2191 </ListItem>
2192
2193 </ItemizedList>
2194
2195 </Para>
2196
2197 </Sect2>
2198
2199 </Sect1>
2200
2201 <Sect1 id="multi-param-type-classes">
2202 <Title>Multi-parameter type classes
2203 </Title>
2204
2205 <Para>
2206 This section documents GHC's implementation of multi-parameter type
2207 classes.  There's lots of background in the paper <ULink
2208 URL="http://research.microsoft.com/~simonpj/multi.ps.gz" >Type
2209 classes: exploring the design space</ULink > (Simon Peyton Jones, Mark
2210 Jones, Erik Meijer).
2211 </Para>
2212
2213 <Para>
2214 I'd like to thank people who reported shorcomings in the GHC 3.02
2215 implementation.  Our default decisions were all conservative ones, and
2216 the experience of these heroic pioneers has given useful concrete
2217 examples to support several generalisations.  (These appear below as
2218 design choices not implemented in 3.02.)
2219 </Para>
2220
2221 <Para>
2222 I've discussed these notes with Mark Jones, and I believe that Hugs
2223 will migrate towards the same design choices as I outline here.
2224 Thanks to him, and to many others who have offered very useful
2225 feedback.
2226 </Para>
2227
2228 <Sect2>
2229 <Title>Types</Title>
2230
2231 <Para>
2232 There are the following restrictions on the form of a qualified
2233 type:
2234 </Para>
2235
2236 <Para>
2237
2238 <ProgramListing>
2239   forall tv1..tvn (c1, ...,cn) => type
2240 </ProgramListing>
2241
2242 </Para>
2243
2244 <Para>
2245 (Here, I write the "foralls" explicitly, although the Haskell source
2246 language omits them; in Haskell 1.4, all the free type variables of an
2247 explicit source-language type signature are universally quantified,
2248 except for the class type variables in a class declaration.  However,
2249 in GHC, you can give the foralls if you want.  See <XRef LinkEnd="universal-quantification">).
2250 </Para>
2251
2252 <Para>
2253
2254 <OrderedList>
2255 <ListItem>
2256
2257 <Para>
2258  <Emphasis>Each universally quantified type variable
2259 <Literal>tvi</Literal> must be mentioned (i.e. appear free) in <Literal>type</Literal></Emphasis>.
2260
2261 The reason for this is that a value with a type that does not obey
2262 this restriction could not be used without introducing
2263 ambiguity. Here, for example, is an illegal type:
2264
2265
2266 <ProgramListing>
2267   forall a. Eq a => Int
2268 </ProgramListing>
2269
2270
2271 When a value with this type was used, the constraint <Literal>Eq tv</Literal>
2272 would be introduced where <Literal>tv</Literal> is a fresh type variable, and
2273 (in the dictionary-translation implementation) the value would be
2274 applied to a dictionary for <Literal>Eq tv</Literal>.  The difficulty is that we
2275 can never know which instance of <Literal>Eq</Literal> to use because we never
2276 get any more information about <Literal>tv</Literal>.
2277
2278 </Para>
2279 </ListItem>
2280 <ListItem>
2281
2282 <Para>
2283  <Emphasis>Every constraint <Literal>ci</Literal> must mention at least one of the
2284 universally quantified type variables <Literal>tvi</Literal></Emphasis>.
2285
2286 For example, this type is OK because <Literal>C a b</Literal> mentions the
2287 universally quantified type variable <Literal>b</Literal>:
2288
2289
2290 <ProgramListing>
2291   forall a. C a b => burble
2292 </ProgramListing>
2293
2294
2295 The next type is illegal because the constraint <Literal>Eq b</Literal> does not
2296 mention <Literal>a</Literal>:
2297
2298
2299 <ProgramListing>
2300   forall a. Eq b => burble
2301 </ProgramListing>
2302
2303
2304 The reason for this restriction is milder than the other one.  The
2305 excluded types are never useful or necessary (because the offending
2306 context doesn't need to be witnessed at this point; it can be floated
2307 out).  Furthermore, floating them out increases sharing. Lastly,
2308 excluding them is a conservative choice; it leaves a patch of
2309 territory free in case we need it later.
2310
2311 </Para>
2312 </ListItem>
2313
2314 </OrderedList>
2315
2316 </Para>
2317
2318 <Para>
2319 These restrictions apply to all types, whether declared in a type signature
2320 or inferred.
2321 </Para>
2322
2323 <Para>
2324 Unlike Haskell 1.4, constraints in types do <Emphasis>not</Emphasis> have to be of
2325 the form <Emphasis>(class type-variables)</Emphasis>.  Thus, these type signatures
2326 are perfectly OK
2327 </Para>
2328
2329 <Para>
2330
2331 <ProgramListing>
2332   f :: Eq (m a) => [m a] -> [m a]
2333   g :: Eq [a] => ...
2334 </ProgramListing>
2335
2336 </Para>
2337
2338 <Para>
2339 This choice recovers principal types, a property that Haskell 1.4 does not have.
2340 </Para>
2341
2342 </Sect2>
2343
2344 <Sect2>
2345 <Title>Class declarations</Title>
2346
2347 <Para>
2348
2349 <OrderedList>
2350 <ListItem>
2351
2352 <Para>
2353  <Emphasis>Multi-parameter type classes are permitted</Emphasis>. For example:
2354
2355
2356 <ProgramListing>
2357   class Collection c a where
2358     union :: c a -> c a -> c a
2359     ...etc.
2360 </ProgramListing>
2361
2362
2363
2364 </Para>
2365 </ListItem>
2366 <ListItem>
2367
2368 <Para>
2369  <Emphasis>The class hierarchy must be acyclic</Emphasis>.  However, the definition
2370 of "acyclic" involves only the superclass relationships.  For example,
2371 this is OK:
2372
2373
2374 <ProgramListing>
2375   class C a where {
2376     op :: D b => a -> b -> b
2377   }
2378
2379   class C a => D a where { ... }
2380 </ProgramListing>
2381
2382
2383 Here, <Literal>C</Literal> is a superclass of <Literal>D</Literal>, but it's OK for a
2384 class operation <Literal>op</Literal> of <Literal>C</Literal> to mention <Literal>D</Literal>.  (It
2385 would not be OK for <Literal>D</Literal> to be a superclass of <Literal>C</Literal>.)
2386
2387 </Para>
2388 </ListItem>
2389 <ListItem>
2390
2391 <Para>
2392  <Emphasis>There are no restrictions on the context in a class declaration
2393 (which introduces superclasses), except that the class hierarchy must
2394 be acyclic</Emphasis>.  So these class declarations are OK:
2395
2396
2397 <ProgramListing>
2398   class Functor (m k) => FiniteMap m k where
2399     ...
2400
2401   class (Monad m, Monad (t m)) => Transform t m where
2402     lift :: m a -> (t m) a
2403 </ProgramListing>
2404
2405
2406 </Para>
2407 </ListItem>
2408 <ListItem>
2409
2410 <Para>
2411  <Emphasis>In the signature of a class operation, every constraint
2412 must mention at least one type variable that is not a class type
2413 variable</Emphasis>.
2414
2415 Thus:
2416
2417
2418 <ProgramListing>
2419   class Collection c a where
2420     mapC :: Collection c b => (a->b) -> c a -> c b
2421 </ProgramListing>
2422
2423
2424 is OK because the constraint <Literal>(Collection a b)</Literal> mentions
2425 <Literal>b</Literal>, even though it also mentions the class variable
2426 <Literal>a</Literal>.  On the other hand:
2427
2428
2429 <ProgramListing>
2430   class C a where
2431     op :: Eq a => (a,b) -> (a,b)
2432 </ProgramListing>
2433
2434
2435 is not OK because the constraint <Literal>(Eq a)</Literal> mentions on the class
2436 type variable <Literal>a</Literal>, but not <Literal>b</Literal>.  However, any such
2437 example is easily fixed by moving the offending context up to the
2438 superclass context:
2439
2440
2441 <ProgramListing>
2442   class Eq a => C a where
2443     op ::(a,b) -> (a,b)
2444 </ProgramListing>
2445
2446
2447 A yet more relaxed rule would allow the context of a class-op signature
2448 to mention only class type variables.  However, that conflicts with
2449 Rule 1(b) for types above.
2450
2451 </Para>
2452 </ListItem>
2453 <ListItem>
2454
2455 <Para>
2456  <Emphasis>The type of each class operation must mention <Emphasis>all</Emphasis> of
2457 the class type variables</Emphasis>.  For example:
2458
2459
2460 <ProgramListing>
2461   class Coll s a where
2462     empty  :: s
2463     insert :: s -> a -> s
2464 </ProgramListing>
2465
2466
2467 is not OK, because the type of <Literal>empty</Literal> doesn't mention
2468 <Literal>a</Literal>.  This rule is a consequence of Rule 1(a), above, for
2469 types, and has the same motivation.
2470
2471 Sometimes, offending class declarations exhibit misunderstandings.  For
2472 example, <Literal>Coll</Literal> might be rewritten
2473
2474
2475 <ProgramListing>
2476   class Coll s a where
2477     empty  :: s a
2478     insert :: s a -> a -> s a
2479 </ProgramListing>
2480
2481
2482 which makes the connection between the type of a collection of
2483 <Literal>a</Literal>'s (namely <Literal>(s a)</Literal>) and the element type <Literal>a</Literal>.
2484 Occasionally this really doesn't work, in which case you can split the
2485 class like this:
2486
2487
2488 <ProgramListing>
2489   class CollE s where
2490     empty  :: s
2491
2492   class CollE s => Coll s a where
2493     insert :: s -> a -> s
2494 </ProgramListing>
2495
2496
2497 </Para>
2498 </ListItem>
2499
2500 </OrderedList>
2501
2502 </Para>
2503
2504 </Sect2>
2505
2506 <Sect2>
2507 <Title>Instance declarations</Title>
2508
2509 <Para>
2510
2511 <OrderedList>
2512 <ListItem>
2513
2514 <Para>
2515  <Emphasis>Instance declarations may not overlap</Emphasis>.  The two instance
2516 declarations
2517
2518
2519 <ProgramListing>
2520   instance context1 => C type1 where ...
2521   instance context2 => C type2 where ...
2522 </ProgramListing>
2523
2524
2525 "overlap" if <Literal>type1</Literal> and <Literal>type2</Literal> unify
2526
2527 However, if you give the command line option
2528 <Option>-fallow-overlapping-instances</Option><IndexTerm><Primary>-fallow-overlapping-instances
2529 option</Primary></IndexTerm> then two overlapping instance declarations are permitted
2530 iff
2531
2532
2533 <ItemizedList>
2534 <ListItem>
2535
2536 <Para>
2537  EITHER <Literal>type1</Literal> and <Literal>type2</Literal> do not unify
2538 </Para>
2539 </ListItem>
2540 <ListItem>
2541
2542 <Para>
2543  OR <Literal>type2</Literal> is a substitution instance of <Literal>type1</Literal>
2544 (but not identical to <Literal>type1</Literal>)
2545 </Para>
2546 </ListItem>
2547 <ListItem>
2548
2549 <Para>
2550  OR vice versa
2551 </Para>
2552 </ListItem>
2553
2554 </ItemizedList>
2555
2556
2557 Notice that these rules
2558
2559
2560 <ItemizedList>
2561 <ListItem>
2562
2563 <Para>
2564  make it clear which instance decl to use
2565 (pick the most specific one that matches)
2566
2567 </Para>
2568 </ListItem>
2569 <ListItem>
2570
2571 <Para>
2572  do not mention the contexts <Literal>context1</Literal>, <Literal>context2</Literal>
2573 Reason: you can pick which instance decl
2574 "matches" based on the type.
2575 </Para>
2576 </ListItem>
2577
2578 </ItemizedList>
2579
2580
2581 Regrettably, GHC doesn't guarantee to detect overlapping instance
2582 declarations if they appear in different modules.  GHC can "see" the
2583 instance declarations in the transitive closure of all the modules
2584 imported by the one being compiled, so it can "see" all instance decls
2585 when it is compiling <Literal>Main</Literal>.  However, it currently chooses not
2586 to look at ones that can't possibly be of use in the module currently
2587 being compiled, in the interests of efficiency.  (Perhaps we should
2588 change that decision, at least for <Literal>Main</Literal>.)
2589
2590 </Para>
2591 </ListItem>
2592 <ListItem>
2593
2594 <Para>
2595  <Emphasis>There are no restrictions on the type in an instance
2596 <Emphasis>head</Emphasis>, except that at least one must not be a type variable</Emphasis>.
2597 The instance "head" is the bit after the "=>" in an instance decl. For
2598 example, these are OK:
2599
2600
2601 <ProgramListing>
2602   instance C Int a where ...
2603
2604   instance D (Int, Int) where ...
2605
2606   instance E [[a]] where ...
2607 </ProgramListing>
2608
2609
2610 Note that instance heads <Emphasis>may</Emphasis> contain repeated type variables.
2611 For example, this is OK:
2612
2613
2614 <ProgramListing>
2615   instance Stateful (ST s) (MutVar s) where ...
2616 </ProgramListing>
2617
2618
2619 The "at least one not a type variable" restriction is to ensure that
2620 context reduction terminates: each reduction step removes one type
2621 constructor.  For example, the following would make the type checker
2622 loop if it wasn't excluded:
2623
2624
2625 <ProgramListing>
2626   instance C a => C a where ...
2627 </ProgramListing>
2628
2629
2630 There are two situations in which the rule is a bit of a pain. First,
2631 if one allows overlapping instance declarations then it's quite
2632 convenient to have a "default instance" declaration that applies if
2633 something more specific does not:
2634
2635
2636 <ProgramListing>
2637   instance C a where
2638     op = ... -- Default
2639 </ProgramListing>
2640
2641
2642 Second, sometimes you might want to use the following to get the
2643 effect of a "class synonym":
2644
2645
2646 <ProgramListing>
2647   class (C1 a, C2 a, C3 a) => C a where { }
2648
2649   instance (C1 a, C2 a, C3 a) => C a where { }
2650 </ProgramListing>
2651
2652
2653 This allows you to write shorter signatures:
2654
2655
2656 <ProgramListing>
2657   f :: C a => ...
2658 </ProgramListing>
2659
2660
2661 instead of
2662
2663
2664 <ProgramListing>
2665   f :: (C1 a, C2 a, C3 a) => ...
2666 </ProgramListing>
2667
2668
2669 I'm on the lookout for a simple rule that preserves decidability while
2670 allowing these idioms.  The experimental flag
2671 <Option>-fallow-undecidable-instances</Option><IndexTerm><Primary>-fallow-undecidable-instances
2672 option</Primary></IndexTerm> lifts this restriction, allowing all the types in an
2673 instance head to be type variables.
2674
2675 </Para>
2676 </ListItem>
2677 <ListItem>
2678
2679 <Para>
2680  <Emphasis>Unlike Haskell 1.4, instance heads may use type
2681 synonyms</Emphasis>.  As always, using a type synonym is just shorthand for
2682 writing the RHS of the type synonym definition.  For example:
2683
2684
2685 <ProgramListing>
2686   type Point = (Int,Int)
2687   instance C Point   where ...
2688   instance C [Point] where ...
2689 </ProgramListing>
2690
2691
2692 is legal.  However, if you added
2693
2694
2695 <ProgramListing>
2696   instance C (Int,Int) where ...
2697 </ProgramListing>
2698
2699
2700 as well, then the compiler will complain about the overlapping
2701 (actually, identical) instance declarations.  As always, type synonyms
2702 must be fully applied.  You cannot, for example, write:
2703
2704
2705 <ProgramListing>
2706   type P a = [[a]]
2707   instance Monad P where ...
2708 </ProgramListing>
2709
2710
2711 This design decision is independent of all the others, and easily
2712 reversed, but it makes sense to me.
2713
2714 </Para>
2715 </ListItem>
2716 <ListItem>
2717
2718 <Para>
2719 <Emphasis>The types in an instance-declaration <Emphasis>context</Emphasis> must all
2720 be type variables</Emphasis>. Thus
2721
2722
2723 <ProgramListing>
2724 instance C a b => Eq (a,b) where ...
2725 </ProgramListing>
2726
2727
2728 is OK, but
2729
2730
2731 <ProgramListing>
2732 instance C Int b => Foo b where ...
2733 </ProgramListing>
2734
2735
2736 is not OK.  Again, the intent here is to make sure that context
2737 reduction terminates.
2738
2739 Voluminous correspondence on the Haskell mailing list has convinced me
2740 that it's worth experimenting with a more liberal rule.  If you use
2741 the flag <Option>-fallow-undecidable-instances</Option> can use arbitrary
2742 types in an instance context.  Termination is ensured by having a
2743 fixed-depth recursion stack.  If you exceed the stack depth you get a
2744 sort of backtrace, and the opportunity to increase the stack depth
2745 with <Option>-fcontext-stack</Option><Emphasis>N</Emphasis>.
2746
2747 </Para>
2748 </ListItem>
2749
2750 </OrderedList>
2751
2752 </Para>
2753
2754 </Sect2>
2755
2756 </Sect1>
2757
2758 <Sect1 id="universal-quantification">
2759 <Title>Explicit universal quantification
2760 </Title>
2761
2762 <Para>
2763 GHC now allows you to write explicitly quantified types.  GHC's
2764 syntax for this now agrees with Hugs's, namely:
2765 </Para>
2766
2767 <Para>
2768
2769 <ProgramListing>
2770         forall a b. (Ord a, Eq  b) => a -> b -> a
2771 </ProgramListing>
2772
2773 </Para>
2774
2775 <Para>
2776 The context is, of course, optional.  You can't use <Literal>forall</Literal> as
2777 a type variable any more!
2778 </Para>
2779
2780 <Para>
2781 Haskell type signatures are implicitly quantified.  The <Literal>forall</Literal>
2782 allows us to say exactly what this means.  For example:
2783 </Para>
2784
2785 <Para>
2786
2787 <ProgramListing>
2788         g :: b -> b
2789 </ProgramListing>
2790
2791 </Para>
2792
2793 <Para>
2794 means this:
2795 </Para>
2796
2797 <Para>
2798
2799 <ProgramListing>
2800         g :: forall b. (b -> b)
2801 </ProgramListing>
2802
2803 </Para>
2804
2805 <Para>
2806 The two are treated identically.
2807 </Para>
2808
2809 <Sect2 id="univ">
2810 <Title>Universally-quantified data type fields
2811 </Title>
2812
2813 <Para>
2814 In a <Literal>data</Literal> or <Literal>newtype</Literal> declaration one can quantify
2815 the types of the constructor arguments.  Here are several examples:
2816 </Para>
2817
2818 <Para>
2819
2820 <ProgramListing>
2821 data T a = T1 (forall b. b -> b -> b) a
2822
2823 data MonadT m = MkMonad { return :: forall a. a -> m a,
2824                           bind   :: forall a b. m a -> (a -> m b) -> m b
2825                         }
2826
2827 newtype Swizzle = MkSwizzle (Ord a => [a] -> [a])
2828 </ProgramListing>
2829
2830 </Para>
2831
2832 <Para>
2833 The constructors now have so-called <Emphasis>rank 2</Emphasis> polymorphic
2834 types, in which there is a for-all in the argument types.:
2835 </Para>
2836
2837 <Para>
2838
2839 <ProgramListing>
2840 T1 :: forall a. (forall b. b -> b -> b) -> a -> T a
2841 MkMonad :: forall m. (forall a. a -> m a)
2842                   -> (forall a b. m a -> (a -> m b) -> m b)
2843                   -> MonadT m
2844 MkSwizzle :: (Ord a => [a] -> [a]) -> Swizzle
2845 </ProgramListing>
2846
2847 </Para>
2848
2849 <Para>
2850 Notice that you don't need to use a <Literal>forall</Literal> if there's an
2851 explicit context.  For example in the first argument of the
2852 constructor <Function>MkSwizzle</Function>, an implicit "<Literal>forall a.</Literal>" is
2853 prefixed to the argument type.  The implicit <Literal>forall</Literal>
2854 quantifies all type variables that are not already in scope, and are
2855 mentioned in the type quantified over.
2856 </Para>
2857
2858 <Para>
2859 As for type signatures, implicit quantification happens for non-overloaded
2860 types too.  So if you write this:
2861
2862 <ProgramListing>
2863   data T a = MkT (Either a b) (b -> b)
2864 </ProgramListing>
2865
2866 it's just as if you had written this:
2867
2868 <ProgramListing>
2869   data T a = MkT (forall b. Either a b) (forall b. b -> b)
2870 </ProgramListing>
2871
2872 That is, since the type variable <Literal>b</Literal> isn't in scope, it's
2873 implicitly universally quantified.  (Arguably, it would be better
2874 to <Emphasis>require</Emphasis> explicit quantification on constructor arguments
2875 where that is what is wanted.  Feedback welcomed.)
2876 </Para>
2877
2878 </Sect2>
2879
2880 <Sect2>
2881 <Title>Construction </Title>
2882
2883 <Para>
2884 You construct values of types <Literal>T1, MonadT, Swizzle</Literal> by applying
2885 the constructor to suitable values, just as usual.  For example,
2886 </Para>
2887
2888 <Para>
2889
2890 <ProgramListing>
2891 (T1 (\xy->x) 3) :: T Int
2892
2893 (MkSwizzle sort)    :: Swizzle
2894 (MkSwizzle reverse) :: Swizzle
2895
2896 (let r x = Just x
2897      b m k = case m of
2898                 Just y -> k y
2899                 Nothing -> Nothing
2900   in
2901   MkMonad r b) :: MonadT Maybe
2902 </ProgramListing>
2903
2904 </Para>
2905
2906 <Para>
2907 The type of the argument can, as usual, be more general than the type
2908 required, as <Literal>(MkSwizzle reverse)</Literal> shows.  (<Function>reverse</Function>
2909 does not need the <Literal>Ord</Literal> constraint.)
2910 </Para>
2911
2912 </Sect2>
2913
2914 <Sect2>
2915 <Title>Pattern matching</Title>
2916
2917 <Para>
2918 When you use pattern matching, the bound variables may now have
2919 polymorphic types.  For example:
2920 </Para>
2921
2922 <Para>
2923
2924 <ProgramListing>
2925         f :: T a -> a -> (a, Char)
2926         f (T1 f k) x = (f k x, f 'c' 'd')
2927
2928         g :: (Ord a, Ord b) => Swizzle -> [a] -> (a -> b) -> [b]
2929         g (MkSwizzle s) xs f = s (map f (s xs))
2930
2931         h :: MonadT m -> [m a] -> m [a]
2932         h m [] = return m []
2933         h m (x:xs) = bind m x           $ \y ->
2934                       bind m (h m xs)   $ \ys ->
2935                       return m (y:ys)
2936 </ProgramListing>
2937
2938 </Para>
2939
2940 <Para>
2941 In the function <Function>h</Function> we use the record selectors <Literal>return</Literal>
2942 and <Literal>bind</Literal> to extract the polymorphic bind and return functions
2943 from the <Literal>MonadT</Literal> data structure, rather than using pattern
2944 matching.
2945 </Para>
2946
2947 <Para>
2948 You cannot pattern-match against an argument that is polymorphic.
2949 For example:
2950
2951 <ProgramListing>
2952         newtype TIM s a = TIM (ST s (Maybe a))
2953
2954         runTIM :: (forall s. TIM s a) -> Maybe a
2955         runTIM (TIM m) = runST m
2956 </ProgramListing>
2957
2958 </Para>
2959
2960 <Para>
2961 Here the pattern-match fails, because you can't pattern-match against
2962 an argument of type <Literal>(forall s. TIM s a)</Literal>.  Instead you
2963 must bind the variable and pattern match in the right hand side:
2964
2965 <ProgramListing>
2966         runTIM :: (forall s. TIM s a) -> Maybe a
2967         runTIM tm = case tm of { TIM m -> runST m }
2968 </ProgramListing>
2969
2970 The <Literal>tm</Literal> on the right hand side is (invisibly) instantiated, like
2971 any polymorphic value at its occurrence site, and now you can pattern-match
2972 against it.
2973 </Para>
2974
2975 </Sect2>
2976
2977 <Sect2>
2978 <Title>The partial-application restriction</Title>
2979
2980 <Para>
2981 There is really only one way in which data structures with polymorphic
2982 components might surprise you: you must not partially apply them.
2983 For example, this is illegal:
2984 </Para>
2985
2986 <Para>
2987
2988 <ProgramListing>
2989         map MkSwizzle [sort, reverse]
2990 </ProgramListing>
2991
2992 </Para>
2993
2994 <Para>
2995 The restriction is this: <Emphasis>every subexpression of the program must
2996 have a type that has no for-alls, except that in a function
2997 application (f e1&hellip;en) the partial applications are not subject to
2998 this rule</Emphasis>.  The restriction makes type inference feasible.
2999 </Para>
3000
3001 <Para>
3002 In the illegal example, the sub-expression <Literal>MkSwizzle</Literal> has the
3003 polymorphic type <Literal>(Ord b => [b] -> [b]) -> Swizzle</Literal> and is not
3004 a sub-expression of an enclosing application.  On the other hand, this
3005 expression is OK:
3006 </Para>
3007
3008 <Para>
3009
3010 <ProgramListing>
3011         map (T1 (\a b -> a)) [1,2,3]
3012 </ProgramListing>
3013
3014 </Para>
3015
3016 <Para>
3017 even though it involves a partial application of <Function>T1</Function>, because
3018 the sub-expression <Literal>T1 (\a b -> a)</Literal> has type <Literal>Int -> T
3019 Int</Literal>.
3020 </Para>
3021
3022 </Sect2>
3023
3024 <Sect2 id="sigs">
3025 <Title>Type signatures
3026 </Title>
3027
3028 <Para>
3029 Once you have data constructors with universally-quantified fields, or
3030 constants such as <Constant>runST</Constant> that have rank-2 types, it isn't long
3031 before you discover that you need more!  Consider:
3032 </Para>
3033
3034 <Para>
3035
3036 <ProgramListing>
3037   mkTs f x y = [T1 f x, T1 f y]
3038 </ProgramListing>
3039
3040 </Para>
3041
3042 <Para>
3043 <Function>mkTs</Function> is a fuction that constructs some values of type
3044 <Literal>T</Literal>, using some pieces passed to it.  The trouble is that since
3045 <Literal>f</Literal> is a function argument, Haskell assumes that it is
3046 monomorphic, so we'll get a type error when applying <Function>T1</Function> to
3047 it.  This is a rather silly example, but the problem really bites in
3048 practice.  Lots of people trip over the fact that you can't make
3049 "wrappers functions" for <Constant>runST</Constant> for exactly the same reason.
3050 In short, it is impossible to build abstractions around functions with
3051 rank-2 types.
3052 </Para>
3053
3054 <Para>
3055 The solution is fairly clear.  We provide the ability to give a rank-2
3056 type signature for <Emphasis>ordinary</Emphasis> functions (not only data
3057 constructors), thus:
3058 </Para>
3059
3060 <Para>
3061
3062 <ProgramListing>
3063   mkTs :: (forall b. b -> b -> b) -> a -> [T a]
3064   mkTs f x y = [T1 f x, T1 f y]
3065 </ProgramListing>
3066
3067 </Para>
3068
3069 <Para>
3070 This type signature tells the compiler to attribute <Literal>f</Literal> with
3071 the polymorphic type <Literal>(forall b. b -> b -> b)</Literal> when type
3072 checking the body of <Function>mkTs</Function>, so now the application of
3073 <Function>T1</Function> is fine.
3074 </Para>
3075
3076 <Para>
3077 There are two restrictions:
3078 </Para>
3079
3080 <Para>
3081
3082 <ItemizedList>
3083 <ListItem>
3084
3085 <Para>
3086  You can only define a rank 2 type, specified by the following
3087 grammar:
3088
3089
3090 <ProgramListing>
3091 rank2type ::= [forall tyvars .] [context =>] funty
3092 funty     ::= ([forall tyvars .] [context =>] ty) -> funty
3093             | ty
3094 ty        ::= ...current Haskell monotype syntax...
3095 </ProgramListing>
3096
3097
3098 Informally, the universal quantification must all be right at the beginning,
3099 or at the top level of a function argument.
3100
3101 </Para>
3102 </ListItem>
3103 <ListItem>
3104
3105 <Para>
3106  There is a restriction on the definition of a function whose
3107 type signature is a rank-2 type: the polymorphic arguments must be
3108 matched on the left hand side of the "<Literal>=</Literal>" sign.  You can't
3109 define <Function>mkTs</Function> like this:
3110
3111
3112 <ProgramListing>
3113 mkTs :: (forall b. b -> b -> b) -> a -> [T a]
3114 mkTs = \ f x y -> [T1 f x, T1 f y]
3115 </ProgramListing>
3116
3117
3118
3119 The same partial-application rule applies to ordinary functions with
3120 rank-2 types as applied to data constructors.
3121
3122 </Para>
3123 </ListItem>
3124
3125 </ItemizedList>
3126
3127 </Para>
3128
3129 </Sect2>
3130
3131
3132 <Sect2 id="hoist">
3133 <Title>Type synonyms and hoisting
3134 </Title>
3135
3136 <Para>
3137 GHC also allows you to write a <Literal>forall</Literal> in a type synonym, thus:
3138 <ProgramListing>
3139   type Discard a = forall b. a -> b -> a
3140
3141   f :: Discard a
3142   f x y = x
3143 </ProgramListing>
3144 However, it is often convenient to use these sort of synonyms at the right hand
3145 end of an arrow, thus:
3146 <ProgramListing>
3147   type Discard a = forall b. a -> b -> a
3148
3149   g :: Int -> Discard Int
3150   g x y z = x+y
3151 </ProgramListing>
3152 Simply expanding the type synonym would give
3153 <ProgramListing>
3154   g :: Int -> (forall b. Int -> b -> Int)
3155 </ProgramListing>
3156 but GHC "hoists" the <Literal>forall</Literal> to give the isomorphic type
3157 <ProgramListing>
3158   g :: forall b. Int -> Int -> b -> Int
3159 </ProgramListing>
3160 In general, the rule is this: <Emphasis>to determine the type specified by any explicit
3161 user-written type (e.g. in a type signature), GHC expands type synonyms and then repeatedly
3162 performs the transformation:</Emphasis>
3163 <ProgramListing>
3164   <Emphasis>type1</Emphasis> -> forall a. <Emphasis>type2</Emphasis>
3165 ==>
3166   forall a. <Emphasis>type1</Emphasis> -> <Emphasis>type2</Emphasis>
3167 </ProgramListing>
3168 (In fact, GHC tries to retain as much synonym information as possible for use in
3169 error messages, but that is a usability issue.)  This rule applies, of course, whether
3170 or not the <Literal>forall</Literal> comes from a synonym. For example, here is another
3171 valid way to write <Literal>g</Literal>'s type signature:
3172 <ProgramListing>
3173   g :: Int -> Int -> forall b. b -> Int
3174 </ProgramListing>
3175 </Para>
3176 </Sect2>
3177
3178 </Sect1>
3179
3180 <Sect1 id="existential-quantification">
3181 <Title>Existentially quantified data constructors
3182 </Title>
3183
3184 <Para>
3185 The idea of using existential quantification in data type declarations
3186 was suggested by Laufer (I believe, thought doubtless someone will
3187 correct me), and implemented in Hope+. It's been in Lennart
3188 Augustsson's <Command>hbc</Command> Haskell compiler for several years, and
3189 proved very useful.  Here's the idea.  Consider the declaration:
3190 </Para>
3191
3192 <Para>
3193
3194 <ProgramListing>
3195   data Foo = forall a. MkFoo a (a -> Bool)
3196            | Nil
3197 </ProgramListing>
3198
3199 </Para>
3200
3201 <Para>
3202 The data type <Literal>Foo</Literal> has two constructors with types:
3203 </Para>
3204
3205 <Para>
3206
3207 <ProgramListing>
3208   MkFoo :: forall a. a -> (a -> Bool) -> Foo
3209   Nil   :: Foo
3210 </ProgramListing>
3211
3212 </Para>
3213
3214 <Para>
3215 Notice that the type variable <Literal>a</Literal> in the type of <Function>MkFoo</Function>
3216 does not appear in the data type itself, which is plain <Literal>Foo</Literal>.
3217 For example, the following expression is fine:
3218 </Para>
3219
3220 <Para>
3221
3222 <ProgramListing>
3223   [MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]
3224 </ProgramListing>
3225
3226 </Para>
3227
3228 <Para>
3229 Here, <Literal>(MkFoo 3 even)</Literal> packages an integer with a function
3230 <Function>even</Function> that maps an integer to <Literal>Bool</Literal>; and <Function>MkFoo 'c'
3231 isUpper</Function> packages a character with a compatible function.  These
3232 two things are each of type <Literal>Foo</Literal> and can be put in a list.
3233 </Para>
3234
3235 <Para>
3236 What can we do with a value of type <Literal>Foo</Literal>?.  In particular,
3237 what happens when we pattern-match on <Function>MkFoo</Function>?
3238 </Para>
3239
3240 <Para>
3241
3242 <ProgramListing>
3243   f (MkFoo val fn) = ???
3244 </ProgramListing>
3245
3246 </Para>
3247
3248 <Para>
3249 Since all we know about <Literal>val</Literal> and <Function>fn</Function> is that they
3250 are compatible, the only (useful) thing we can do with them is to
3251 apply <Function>fn</Function> to <Literal>val</Literal> to get a boolean.  For example:
3252 </Para>
3253
3254 <Para>
3255
3256 <ProgramListing>
3257   f :: Foo -> Bool
3258   f (MkFoo val fn) = fn val
3259 </ProgramListing>
3260
3261 </Para>
3262
3263 <Para>
3264 What this allows us to do is to package heterogenous values
3265 together with a bunch of functions that manipulate them, and then treat
3266 that collection of packages in a uniform manner.  You can express
3267 quite a bit of object-oriented-like programming this way.
3268 </Para>
3269
3270 <Sect2 id="existential">
3271 <Title>Why existential?
3272 </Title>
3273
3274 <Para>
3275 What has this to do with <Emphasis>existential</Emphasis> quantification?
3276 Simply that <Function>MkFoo</Function> has the (nearly) isomorphic type
3277 </Para>
3278
3279 <Para>
3280
3281 <ProgramListing>
3282   MkFoo :: (exists a . (a, a -> Bool)) -> Foo
3283 </ProgramListing>
3284
3285 </Para>
3286
3287 <Para>
3288 But Haskell programmers can safely think of the ordinary
3289 <Emphasis>universally</Emphasis> quantified type given above, thereby avoiding
3290 adding a new existential quantification construct.
3291 </Para>
3292
3293 </Sect2>
3294
3295 <Sect2>
3296 <Title>Type classes</Title>
3297
3298 <Para>
3299 An easy extension (implemented in <Command>hbc</Command>) is to allow
3300 arbitrary contexts before the constructor.  For example:
3301 </Para>
3302
3303 <Para>
3304
3305 <ProgramListing>
3306 data Baz = forall a. Eq a => Baz1 a a
3307          | forall b. Show b => Baz2 b (b -> b)
3308 </ProgramListing>
3309
3310 </Para>
3311
3312 <Para>
3313 The two constructors have the types you'd expect:
3314 </Para>
3315
3316 <Para>
3317
3318 <ProgramListing>
3319 Baz1 :: forall a. Eq a => a -> a -> Baz
3320 Baz2 :: forall b. Show b => b -> (b -> b) -> Baz
3321 </ProgramListing>
3322
3323 </Para>
3324
3325 <Para>
3326 But when pattern matching on <Function>Baz1</Function> the matched values can be compared
3327 for equality, and when pattern matching on <Function>Baz2</Function> the first matched
3328 value can be converted to a string (as well as applying the function to it).
3329 So this program is legal:
3330 </Para>
3331
3332 <Para>
3333
3334 <ProgramListing>
3335   f :: Baz -> String
3336   f (Baz1 p q) | p == q    = "Yes"
3337                | otherwise = "No"
3338   f (Baz1 v fn)            = show (fn v)
3339 </ProgramListing>
3340
3341 </Para>
3342
3343 <Para>
3344 Operationally, in a dictionary-passing implementation, the
3345 constructors <Function>Baz1</Function> and <Function>Baz2</Function> must store the
3346 dictionaries for <Literal>Eq</Literal> and <Literal>Show</Literal> respectively, and
3347 extract it on pattern matching.
3348 </Para>
3349
3350 <Para>
3351 Notice the way that the syntax fits smoothly with that used for
3352 universal quantification earlier.
3353 </Para>
3354
3355 </Sect2>
3356
3357 <Sect2>
3358 <Title>Restrictions</Title>
3359
3360 <Para>
3361 There are several restrictions on the ways in which existentially-quantified
3362 constructors can be use.
3363 </Para>
3364
3365 <Para>
3366
3367 <ItemizedList>
3368 <ListItem>
3369
3370 <Para>
3371  When pattern matching, each pattern match introduces a new,
3372 distinct, type for each existential type variable.  These types cannot
3373 be unified with any other type, nor can they escape from the scope of
3374 the pattern match.  For example, these fragments are incorrect:
3375
3376
3377 <ProgramListing>
3378 f1 (MkFoo a f) = a
3379 </ProgramListing>
3380
3381
3382 Here, the type bound by <Function>MkFoo</Function> "escapes", because <Literal>a</Literal>
3383 is the result of <Function>f1</Function>.  One way to see why this is wrong is to
3384 ask what type <Function>f1</Function> has:
3385
3386
3387 <ProgramListing>
3388   f1 :: Foo -> a             -- Weird!
3389 </ProgramListing>
3390
3391
3392 What is this "<Literal>a</Literal>" in the result type? Clearly we don't mean
3393 this:
3394
3395
3396 <ProgramListing>
3397   f1 :: forall a. Foo -> a   -- Wrong!
3398 </ProgramListing>
3399
3400
3401 The original program is just plain wrong.  Here's another sort of error
3402
3403
3404 <ProgramListing>
3405   f2 (Baz1 a b) (Baz1 p q) = a==q
3406 </ProgramListing>
3407
3408
3409 It's ok to say <Literal>a==b</Literal> or <Literal>p==q</Literal>, but
3410 <Literal>a==q</Literal> is wrong because it equates the two distinct types arising
3411 from the two <Function>Baz1</Function> constructors.
3412
3413
3414 </Para>
3415 </ListItem>
3416 <ListItem>
3417
3418 <Para>
3419 You can't pattern-match on an existentially quantified
3420 constructor in a <Literal>let</Literal> or <Literal>where</Literal> group of
3421 bindings. So this is illegal:
3422
3423
3424 <ProgramListing>
3425   f3 x = a==b where { Baz1 a b = x }
3426 </ProgramListing>
3427
3428
3429 You can only pattern-match
3430 on an existentially-quantified constructor in a <Literal>case</Literal> expression or
3431 in the patterns of a function definition.
3432
3433 The reason for this restriction is really an implementation one.
3434 Type-checking binding groups is already a nightmare without
3435 existentials complicating the picture.  Also an existential pattern
3436 binding at the top level of a module doesn't make sense, because it's
3437 not clear how to prevent the existentially-quantified type "escaping".
3438 So for now, there's a simple-to-state restriction.  We'll see how
3439 annoying it is.
3440
3441 </Para>
3442 </ListItem>
3443 <ListItem>
3444
3445 <Para>
3446 You can't use existential quantification for <Literal>newtype</Literal>
3447 declarations.  So this is illegal:
3448
3449
3450 <ProgramListing>
3451   newtype T = forall a. Ord a => MkT a
3452 </ProgramListing>
3453
3454
3455 Reason: a value of type <Literal>T</Literal> must be represented as a pair
3456 of a dictionary for <Literal>Ord t</Literal> and a value of type <Literal>t</Literal>.
3457 That contradicts the idea that <Literal>newtype</Literal> should have no
3458 concrete representation.  You can get just the same efficiency and effect
3459 by using <Literal>data</Literal> instead of <Literal>newtype</Literal>.  If there is no
3460 overloading involved, then there is more of a case for allowing
3461 an existentially-quantified <Literal>newtype</Literal>, because the <Literal>data</Literal>
3462 because the <Literal>data</Literal> version does carry an implementation cost,
3463 but single-field existentially quantified constructors aren't much
3464 use.  So the simple restriction (no existential stuff on <Literal>newtype</Literal>)
3465 stands, unless there are convincing reasons to change it.
3466
3467
3468 </Para>
3469 </ListItem>
3470 <ListItem>
3471
3472 <Para>
3473  You can't use <Literal>deriving</Literal> to define instances of a
3474 data type with existentially quantified data constructors.
3475
3476 Reason: in most cases it would not make sense. For example:&num;
3477
3478 <ProgramListing>
3479 data T = forall a. MkT [a] deriving( Eq )
3480 </ProgramListing>
3481
3482 To derive <Literal>Eq</Literal> in the standard way we would need to have equality
3483 between the single component of two <Function>MkT</Function> constructors:
3484
3485 <ProgramListing>
3486 instance Eq T where
3487   (MkT a) == (MkT b) = ???
3488 </ProgramListing>
3489
3490 But <VarName>a</VarName> and <VarName>b</VarName> have distinct types, and so can't be compared.
3491 It's just about possible to imagine examples in which the derived instance
3492 would make sense, but it seems altogether simpler simply to prohibit such
3493 declarations.  Define your own instances!
3494 </Para>
3495 </ListItem>
3496
3497 </ItemizedList>
3498
3499 </Para>
3500
3501 </Sect2>
3502
3503 </Sect1>
3504
3505 <Sect1 id="sec-assertions">
3506 <Title>Assertions
3507 <IndexTerm><Primary>Assertions</Primary></IndexTerm>
3508 </Title>
3509
3510 <Para>
3511 If you want to make use of assertions in your standard Haskell code, you
3512 could define a function like the following:
3513 </Para>
3514
3515 <Para>
3516
3517 <ProgramListing>
3518 assert :: Bool -> a -> a
3519 assert False x = error "assertion failed!"
3520 assert _     x = x
3521 </ProgramListing>
3522
3523 </Para>
3524
3525 <Para>
3526 which works, but gives you back a less than useful error message --
3527 an assertion failed, but which and where?
3528 </Para>
3529
3530 <Para>
3531 One way out is to define an extended <Function>assert</Function> function which also
3532 takes a descriptive string to include in the error message and
3533 perhaps combine this with the use of a pre-processor which inserts
3534 the source location where <Function>assert</Function> was used.
3535 </Para>
3536
3537 <Para>
3538 Ghc offers a helping hand here, doing all of this for you. For every
3539 use of <Function>assert</Function> in the user's source:
3540 </Para>
3541
3542 <Para>
3543
3544 <ProgramListing>
3545 kelvinToC :: Double -> Double
3546 kelvinToC k = assert (k &amp;gt;= 0.0) (k+273.15)
3547 </ProgramListing>
3548
3549 </Para>
3550
3551 <Para>
3552 Ghc will rewrite this to also include the source location where the
3553 assertion was made,
3554 </Para>
3555
3556 <Para>
3557
3558 <ProgramListing>
3559 assert pred val ==> assertError "Main.hs|15" pred val
3560 </ProgramListing>
3561
3562 </Para>
3563
3564 <Para>
3565 The rewrite is only performed by the compiler when it spots
3566 applications of <Function>Exception.assert</Function>, so you can still define and
3567 use your own versions of <Function>assert</Function>, should you so wish. If not,
3568 import <Literal>Exception</Literal> to make use <Function>assert</Function> in your code.
3569 </Para>
3570
3571 <Para>
3572 To have the compiler ignore uses of assert, use the compiler option
3573 <Option>-fignore-asserts</Option>. <IndexTerm><Primary>-fignore-asserts option</Primary></IndexTerm> That is,
3574 expressions of the form <Literal>assert pred e</Literal> will be rewritten to <Literal>e</Literal>.
3575 </Para>
3576
3577 <Para>
3578 Assertion failures can be caught, see the documentation for the
3579 <literal>Exception</literal> library (<xref linkend="sec-Exception">)
3580 for the details.
3581 </Para>
3582
3583 </Sect1>
3584
3585 <Sect1 id="scoped-type-variables">
3586 <Title>Scoped Type Variables
3587 </Title>
3588
3589 <Para>
3590 A <Emphasis>pattern type signature</Emphasis> can introduce a <Emphasis>scoped type
3591 variable</Emphasis>.  For example
3592 </Para>
3593
3594 <Para>
3595
3596 <ProgramListing>
3597 f (xs::[a]) = ys ++ ys
3598            where
3599               ys :: [a]
3600               ys = reverse xs
3601 </ProgramListing>
3602
3603 </Para>
3604
3605 <Para>
3606 The pattern <Literal>(xs::[a])</Literal> includes a type signature for <VarName>xs</VarName>.
3607 This brings the type variable <Literal>a</Literal> into scope; it scopes over
3608 all the patterns and right hand sides for this equation for <Function>f</Function>.
3609 In particular, it is in scope at the type signature for <VarName>y</VarName>.
3610 </Para>
3611
3612 <Para>
3613 At ordinary type signatures, such as that for <VarName>ys</VarName>, any type variables
3614 mentioned in the type signature <Emphasis>that are not in scope</Emphasis> are
3615 implicitly universally quantified.  (If there are no type variables in
3616 scope, all type variables mentioned in the signature are universally
3617 quantified, which is just as in Haskell 98.)  In this case, since <VarName>a</VarName>
3618 is in scope, it is not universally quantified, so the type of <VarName>ys</VarName> is
3619 the same as that of <VarName>xs</VarName>.  In Haskell 98 it is not possible to declare
3620 a type for <VarName>ys</VarName>; a major benefit of scoped type variables is that
3621 it becomes possible to do so.
3622 </Para>
3623
3624 <Para>
3625 Scoped type variables are implemented in both GHC and Hugs.  Where the
3626 implementations differ from the specification below, those differences
3627 are noted.
3628 </Para>
3629
3630 <Para>
3631 So much for the basic idea.  Here are the details.
3632 </Para>
3633
3634 <Sect2>
3635 <Title>Scope and implicit quantification</Title>
3636
3637 <Para>
3638
3639 <ItemizedList>
3640 <ListItem>
3641
3642 <Para>
3643  All the type variables mentioned in the patterns for a single
3644 function definition equation, that are not already in scope,
3645 are brought into scope by the patterns.  We describe this set as
3646 the <Emphasis>type variables bound by the equation</Emphasis>.
3647
3648 </Para>
3649 </ListItem>
3650 <ListItem>
3651
3652 <Para>
3653  The type variables thus brought into scope may be mentioned
3654 in ordinary type signatures or pattern type signatures anywhere within
3655 their scope.
3656
3657 </Para>
3658 </ListItem>
3659 <ListItem>
3660
3661 <Para>
3662  In ordinary type signatures, any type variable mentioned in the
3663 signature that is in scope is <Emphasis>not</Emphasis> universally quantified.
3664
3665 </Para>
3666 </ListItem>
3667 <ListItem>
3668
3669 <Para>
3670  Ordinary type signatures do not bring any new type variables
3671 into scope (except in the type signature itself!). So this is illegal:
3672
3673
3674 <ProgramListing>
3675   f :: a -> a
3676   f x = x::a
3677 </ProgramListing>
3678
3679
3680 It's illegal because <VarName>a</VarName> is not in scope in the body of <Function>f</Function>,
3681 so the ordinary signature <Literal>x::a</Literal> is equivalent to <Literal>x::forall a.a</Literal>;
3682 and that is an incorrect typing.
3683
3684 </Para>
3685 </ListItem>
3686 <ListItem>
3687
3688 <Para>
3689  There is no implicit universal quantification on pattern type
3690 signatures, nor may one write an explicit <Literal>forall</Literal> type in a pattern
3691 type signature.  The pattern type signature is a monotype.
3692
3693 </Para>
3694 </ListItem>
3695 <ListItem>
3696
3697 <Para>
3698
3699 The type variables in the head of a <Literal>class</Literal> or <Literal>instance</Literal> declaration
3700 scope over the methods defined in the <Literal>where</Literal> part.  For example:
3701
3702
3703 <ProgramListing>
3704   class C a where
3705     op :: [a] -> a
3706
3707     op xs = let ys::[a]
3708                 ys = reverse xs
3709             in
3710             head ys
3711 </ProgramListing>
3712
3713
3714 (Not implemented in Hugs yet, Dec 98).
3715 </Para>
3716 </ListItem>
3717
3718 </ItemizedList>
3719
3720 </Para>
3721
3722 </Sect2>
3723
3724 <Sect2>
3725 <Title>Polymorphism</Title>
3726
3727 <Para>
3728
3729 <ItemizedList>
3730 <ListItem>
3731
3732 <Para>
3733  Pattern type signatures are completely orthogonal to ordinary, separate
3734 type signatures.  The two can be used independently or together.  There is
3735 no scoping associated with the names of the type variables in a separate type signature.
3736
3737
3738 <ProgramListing>
3739    f :: [a] -> [a]
3740    f (xs::[b]) = reverse xs
3741 </ProgramListing>
3742
3743
3744 </Para>
3745 </ListItem>
3746 <ListItem>
3747
3748 <Para>
3749  The function must be polymorphic in the type variables
3750 bound by all its equations.  Operationally, the type variables bound
3751 by one equation must not:
3752
3753
3754 <ItemizedList>
3755 <ListItem>
3756
3757 <Para>
3758  Be unified with a type (such as <Literal>Int</Literal>, or <Literal>[a]</Literal>).
3759 </Para>
3760 </ListItem>
3761 <ListItem>
3762
3763 <Para>
3764  Be unified with a type variable free in the environment.
3765 </Para>
3766 </ListItem>
3767 <ListItem>
3768
3769 <Para>
3770  Be unified with each other.  (They may unify with the type variables
3771 bound by another equation for the same function, of course.)
3772 </Para>
3773 </ListItem>
3774
3775 </ItemizedList>
3776
3777
3778 For example, the following all fail to type check:
3779
3780
3781 <ProgramListing>
3782   f (x::a) (y::b) = [x,y]       -- a unifies with b
3783
3784   g (x::a) = x + 1::Int         -- a unifies with Int
3785
3786   h x = let k (y::a) = [x,y]    -- a is free in the
3787         in k x                  -- environment
3788
3789   k (x::a) True    = ...        -- a unifies with Int
3790   k (x::Int) False = ...
3791
3792   w :: [b] -> [b]
3793   w (x::a) = x                  -- a unifies with [b]
3794 </ProgramListing>
3795
3796
3797 </Para>
3798 </ListItem>
3799 <ListItem>
3800
3801 <Para>
3802  The pattern-bound type variable may, however, be constrained
3803 by the context of the principal type, thus:
3804
3805
3806 <ProgramListing>
3807   f (x::a) (y::a) = x+y*2
3808 </ProgramListing>
3809
3810
3811 gets the inferred type: <Literal>forall a. Num a =&gt; a -&gt; a -&gt; a</Literal>.
3812 </Para>
3813 </ListItem>
3814
3815 </ItemizedList>
3816
3817 </Para>
3818
3819 </Sect2>
3820
3821 <Sect2>
3822 <Title>Result type signatures</Title>
3823
3824 <Para>
3825
3826 <ItemizedList>
3827 <ListItem>
3828
3829 <Para>
3830  The result type of a function can be given a signature,
3831 thus:
3832
3833
3834 <ProgramListing>
3835   f (x::a) :: [a] = [x,x,x]
3836 </ProgramListing>
3837
3838
3839 The final <Literal>:: [a]</Literal> after all the patterns gives a signature to the
3840 result type.  Sometimes this is the only way of naming the type variable
3841 you want:
3842
3843
3844 <ProgramListing>
3845   f :: Int -> [a] -> [a]
3846   f n :: ([a] -> [a]) = let g (x::a, y::a) = (y,x)
3847                         in \xs -> map g (reverse xs `zip` xs)
3848 </ProgramListing>
3849
3850
3851 </Para>
3852 </ListItem>
3853
3854 </ItemizedList>
3855
3856 </Para>
3857
3858 <Para>
3859 Result type signatures are not yet implemented in Hugs.
3860 </Para>
3861
3862 </Sect2>
3863
3864 <Sect2>
3865 <Title>Pattern signatures on other constructs</Title>
3866
3867 <Para>
3868
3869 <ItemizedList>
3870 <ListItem>
3871
3872 <Para>
3873  A pattern type signature can be on an arbitrary sub-pattern, not
3874 just on a variable:
3875
3876
3877 <ProgramListing>
3878   f ((x,y)::(a,b)) = (y,x) :: (b,a)
3879 </ProgramListing>
3880
3881
3882 </Para>
3883 </ListItem>
3884 <ListItem>
3885
3886 <Para>
3887  Pattern type signatures, including the result part, can be used
3888 in lambda abstractions:
3889
3890
3891 <ProgramListing>
3892   (\ (x::a, y) :: a -> x)
3893 </ProgramListing>
3894
3895
3896 Type variables bound by these patterns must be polymorphic in
3897 the sense defined above.
3898 For example:
3899
3900
3901 <ProgramListing>
3902   f1 (x::c) = f1 x      -- ok
3903   f2 = \(x::c) -> f2 x  -- not ok
3904 </ProgramListing>
3905
3906
3907 Here, <Function>f1</Function> is OK, but <Function>f2</Function> is not, because <VarName>c</VarName> gets unified
3908 with a type variable free in the environment, in this
3909 case, the type of <Function>f2</Function>, which is in the environment when
3910 the lambda abstraction is checked.
3911
3912 </Para>
3913 </ListItem>
3914 <ListItem>
3915
3916 <Para>
3917  Pattern type signatures, including the result part, can be used
3918 in <Literal>case</Literal> expressions:
3919
3920
3921 <ProgramListing>
3922   case e of { (x::a, y) :: a -> x }
3923 </ProgramListing>
3924
3925
3926 The pattern-bound type variables must, as usual,
3927 be polymorphic in the following sense: each case alternative,
3928 considered as a lambda abstraction, must be polymorphic.
3929 Thus this is OK:
3930
3931
3932 <ProgramListing>
3933   case (True,False) of { (x::a, y) -> x }
3934 </ProgramListing>
3935
3936
3937 Even though the context is that of a pair of booleans,
3938 the alternative itself is polymorphic.  Of course, it is
3939 also OK to say:
3940
3941
3942 <ProgramListing>
3943   case (True,False) of { (x::Bool, y) -> x }
3944 </ProgramListing>
3945
3946
3947 </Para>
3948 </ListItem>
3949 <ListItem>
3950
3951 <Para>
3952 To avoid ambiguity, the type after the &ldquo;<Literal>::</Literal>&rdquo; in a result
3953 pattern signature on a lambda or <Literal>case</Literal> must be atomic (i.e. a single
3954 token or a parenthesised type of some sort).  To see why,
3955 consider how one would parse this:
3956
3957
3958 <ProgramListing>
3959   \ x :: a -> b -> x
3960 </ProgramListing>
3961
3962
3963 </Para>
3964 </ListItem>
3965 <ListItem>
3966
3967 <Para>
3968  Pattern type signatures that bind new type variables
3969 may not be used in pattern bindings at all.
3970 So this is illegal:
3971
3972
3973 <ProgramListing>
3974   f x = let (y, z::a) = x in ...
3975 </ProgramListing>
3976
3977
3978 But these are OK, because they do not bind fresh type variables:
3979
3980
3981 <ProgramListing>
3982   f1 x            = let (y, z::Int) = x in ...
3983   f2 (x::(Int,a)) = let (y, z::a)   = x in ...
3984 </ProgramListing>
3985
3986
3987 However a single variable is considered a degenerate function binding,
3988 rather than a degerate pattern binding, so this is permitted, even
3989 though it binds a type variable:
3990
3991
3992 <ProgramListing>
3993   f :: (b->b) = \(x::b) -> x
3994 </ProgramListing>
3995
3996
3997 </Para>
3998 </ListItem>
3999
4000 </ItemizedList>
4001
4002 Such degnerate function bindings do not fall under the monomorphism
4003 restriction.  Thus:
4004 </Para>
4005
4006 <Para>
4007
4008 <ProgramListing>
4009   g :: a -> a -> Bool = \x y. x==y
4010 </ProgramListing>
4011
4012 </Para>
4013
4014 <Para>
4015 Here <Function>g</Function> has type <Literal>forall a. Eq a =&gt; a -&gt; a -&gt; Bool</Literal>, just as if
4016 <Function>g</Function> had a separate type signature.  Lacking a type signature, <Function>g</Function>
4017 would get a monomorphic type.
4018 </Para>
4019
4020 </Sect2>
4021
4022 <Sect2>
4023 <Title>Existentials</Title>
4024
4025 <Para>
4026
4027 <ItemizedList>
4028 <ListItem>
4029
4030 <Para>
4031  Pattern type signatures can bind existential type variables.
4032 For example:
4033
4034
4035 <ProgramListing>
4036   data T = forall a. MkT [a]
4037
4038   f :: T -> T
4039   f (MkT [t::a]) = MkT t3
4040                  where
4041                    t3::[a] = [t,t,t]
4042 </ProgramListing>
4043
4044
4045 </Para>
4046 </ListItem>
4047
4048 </ItemizedList>
4049
4050 </Para>
4051
4052 </Sect2>
4053
4054 </Sect1>
4055
4056 <Sect1 id="pragmas">
4057 <Title>Pragmas
4058 </Title>
4059
4060 <Para>
4061 GHC supports several pragmas, or instructions to the compiler placed
4062 in the source code.  Pragmas don't affect the meaning of the program,
4063 but they might affect the efficiency of the generated code.
4064 </Para>
4065
4066 <Sect2 id="inline-pragma">
4067 <Title>INLINE pragma
4068
4069 <IndexTerm><Primary>INLINE pragma</Primary></IndexTerm>
4070 <IndexTerm><Primary>pragma, INLINE</Primary></IndexTerm></Title>
4071
4072 <Para>
4073 GHC (with <Option>-O</Option>, as always) tries to inline (or &ldquo;unfold&rdquo;)
4074 functions/values that are &ldquo;small enough,&rdquo; thus avoiding the call
4075 overhead and possibly exposing other more-wonderful optimisations.
4076 </Para>
4077
4078 <Para>
4079 You will probably see these unfoldings (in Core syntax) in your
4080 interface files.
4081 </Para>
4082
4083 <Para>
4084 Normally, if GHC decides a function is &ldquo;too expensive&rdquo; to inline, it
4085 will not do so, nor will it export that unfolding for other modules to
4086 use.
4087 </Para>
4088
4089 <Para>
4090 The sledgehammer you can bring to bear is the
4091 <Literal>INLINE</Literal><IndexTerm><Primary>INLINE pragma</Primary></IndexTerm> pragma, used thusly:
4092
4093 <ProgramListing>
4094 key_function :: Int -> String -> (Bool, Double)
4095
4096 #ifdef __GLASGOW_HASKELL__
4097 {-# INLINE key_function #-}
4098 #endif
4099 </ProgramListing>
4100
4101 (You don't need to do the C pre-processor carry-on unless you're going
4102 to stick the code through HBC&mdash;it doesn't like <Literal>INLINE</Literal> pragmas.)
4103 </Para>
4104
4105 <Para>
4106 The major effect of an <Literal>INLINE</Literal> pragma is to declare a function's
4107 &ldquo;cost&rdquo; to be very low.  The normal unfolding machinery will then be
4108 very keen to inline it.
4109 </Para>
4110
4111 <Para>
4112 An <Literal>INLINE</Literal> pragma for a function can be put anywhere its type
4113 signature could be put.
4114 </Para>
4115
4116 <Para>
4117 <Literal>INLINE</Literal> pragmas are a particularly good idea for the
4118 <Literal>then</Literal>/<Literal>return</Literal> (or <Literal>bind</Literal>/<Literal>unit</Literal>) functions in a monad.
4119 For example, in GHC's own <Literal>UniqueSupply</Literal> monad code, we have:
4120
4121 <ProgramListing>
4122 #ifdef __GLASGOW_HASKELL__
4123 {-# INLINE thenUs #-}
4124 {-# INLINE returnUs #-}
4125 #endif
4126 </ProgramListing>
4127
4128 </Para>
4129
4130 </Sect2>
4131
4132 <Sect2 id="noinline-pragma">
4133 <Title>NOINLINE pragma
4134 </Title>
4135
4136 <Para>
4137 <IndexTerm><Primary>NOINLINE pragma</Primary></IndexTerm>
4138 <IndexTerm><Primary>pragma, NOINLINE</Primary></IndexTerm>
4139 </Para>
4140
4141 <Para>
4142 The <Literal>NOINLINE</Literal> pragma does exactly what you'd expect: it stops the
4143 named function from being inlined by the compiler.  You shouldn't ever
4144 need to do this, unless you're very cautious about code size.
4145 </Para>
4146
4147 </Sect2>
4148
4149 <Sect2 id="specialize-pragma">
4150 <Title>SPECIALIZE pragma
4151 </Title>
4152
4153 <Para>
4154 <IndexTerm><Primary>SPECIALIZE pragma</Primary></IndexTerm>
4155 <IndexTerm><Primary>pragma, SPECIALIZE</Primary></IndexTerm>
4156 <IndexTerm><Primary>overloading, death to</Primary></IndexTerm>
4157 </Para>
4158
4159 <Para>
4160 (UK spelling also accepted.)  For key overloaded functions, you can
4161 create extra versions (NB: more code space) specialised to particular
4162 types.  Thus, if you have an overloaded function:
4163 </Para>
4164
4165 <Para>
4166
4167 <ProgramListing>
4168 hammeredLookup :: Ord key => [(key, value)] -> key -> value
4169 </ProgramListing>
4170
4171 </Para>
4172
4173 <Para>
4174 If it is heavily used on lists with <Literal>Widget</Literal> keys, you could
4175 specialise it as follows:
4176
4177 <ProgramListing>
4178 {-# SPECIALIZE hammeredLookup :: [(Widget, value)] -> Widget -> value #-}
4179 </ProgramListing>
4180
4181 </Para>
4182
4183 <Para>
4184 To get very fancy, you can also specify a named function to use for
4185 the specialised value, by adding <Literal>= blah</Literal>, as in:
4186
4187 <ProgramListing>
4188 {-# SPECIALIZE hammeredLookup :: ...as before... = blah #-}
4189 </ProgramListing>
4190
4191 It's <Emphasis>Your Responsibility</Emphasis> to make sure that <Function>blah</Function> really
4192 behaves as a specialised version of <Function>hammeredLookup</Function>!!!
4193 </Para>
4194
4195 <Para>
4196 NOTE: the <Literal>=blah</Literal> feature isn't implemented in GHC 4.xx.
4197 </Para>
4198
4199 <Para>
4200 An example in which the <Literal>= blah</Literal> form will Win Big:
4201
4202 <ProgramListing>
4203 toDouble :: Real a => a -> Double
4204 toDouble = fromRational . toRational
4205
4206 {-# SPECIALIZE toDouble :: Int -> Double = i2d #-}
4207 i2d (I# i) = D# (int2Double# i) -- uses Glasgow prim-op directly
4208 </ProgramListing>
4209
4210 The <Function>i2d</Function> function is virtually one machine instruction; the
4211 default conversion&mdash;via an intermediate <Literal>Rational</Literal>&mdash;is obscenely
4212 expensive by comparison.
4213 </Para>
4214
4215 <Para>
4216 By using the US spelling, your <Literal>SPECIALIZE</Literal> pragma will work with
4217 HBC, too.  Note that HBC doesn't support the <Literal>= blah</Literal> form.
4218 </Para>
4219
4220 <Para>
4221 A <Literal>SPECIALIZE</Literal> pragma for a function can be put anywhere its type
4222 signature could be put.
4223 </Para>
4224
4225 </Sect2>
4226
4227 <Sect2 id="specialize-instance-pragma">
4228 <Title>SPECIALIZE instance pragma
4229 </Title>
4230
4231 <Para>
4232 <IndexTerm><Primary>SPECIALIZE pragma</Primary></IndexTerm>
4233 <IndexTerm><Primary>overloading, death to</Primary></IndexTerm>
4234 Same idea, except for instance declarations.  For example:
4235
4236 <ProgramListing>
4237 instance (Eq a) => Eq (Foo a) where { ... usual stuff ... }
4238
4239 {-# SPECIALIZE instance Eq (Foo [(Int, Bar)] #-}
4240 </ProgramListing>
4241
4242 Compatible with HBC, by the way.
4243 </Para>
4244
4245 </Sect2>
4246
4247 <Sect2 id="line-pragma">
4248 <Title>LINE pragma
4249 </Title>
4250
4251 <Para>
4252 <IndexTerm><Primary>LINE pragma</Primary></IndexTerm>
4253 <IndexTerm><Primary>pragma, LINE</Primary></IndexTerm>
4254 </Para>
4255
4256 <Para>
4257 This pragma is similar to C's <Literal>&num;line</Literal> pragma, and is mainly for use in
4258 automatically generated Haskell code.  It lets you specify the line
4259 number and filename of the original code; for example
4260 </Para>
4261
4262 <Para>
4263
4264 <ProgramListing>
4265 {-# LINE 42 "Foo.vhs" #-}
4266 </ProgramListing>
4267
4268 </Para>
4269
4270 <Para>
4271 if you'd generated the current file from something called <Filename>Foo.vhs</Filename>
4272 and this line corresponds to line 42 in the original.  GHC will adjust
4273 its error messages to refer to the line/file named in the <Literal>LINE</Literal>
4274 pragma.
4275 </Para>
4276
4277 </Sect2>
4278
4279 <Sect2>
4280 <Title>RULES pragma</Title>
4281
4282 <Para>
4283 The RULES pragma lets you specify rewrite rules.  It is described in
4284 <XRef LinkEnd="rewrite-rules">.
4285 </Para>
4286
4287 </Sect2>
4288
4289 </Sect1>
4290
4291 <Sect1 id="rewrite-rules">
4292 <Title>Rewrite rules
4293
4294 <IndexTerm><Primary>RULES pagma</Primary></IndexTerm>
4295 <IndexTerm><Primary>pragma, RULES</Primary></IndexTerm>
4296 <IndexTerm><Primary>rewrite rules</Primary></IndexTerm></Title>
4297
4298 <Para>
4299 The programmer can specify rewrite rules as part of the source program
4300 (in a pragma).  GHC applies these rewrite rules wherever it can.
4301 </Para>
4302
4303 <Para>
4304 Here is an example:
4305
4306 <ProgramListing>
4307   {-# RULES
4308         "map/map"       forall f g xs. map f (map g xs) = map (f.g) xs
4309   #-}
4310 </ProgramListing>
4311
4312 </Para>
4313
4314 <Sect2>
4315 <Title>Syntax</Title>
4316
4317 <Para>
4318 From a syntactic point of view:
4319
4320 <ItemizedList>
4321 <ListItem>
4322
4323 <Para>
4324  Each rule has a name, enclosed in double quotes.  The name itself has
4325 no significance at all.  It is only used when reporting how many times the rule fired.
4326 </Para>
4327 </ListItem>
4328 <ListItem>
4329
4330 <Para>
4331  There may be zero or more rules in a <Literal>RULES</Literal> pragma.
4332 </Para>
4333 </ListItem>
4334 <ListItem>
4335
4336 <Para>
4337  Layout applies in a <Literal>RULES</Literal> pragma.  Currently no new indentation level
4338 is set, so you must lay out your rules starting in the same column as the
4339 enclosing definitions.
4340 </Para>
4341 </ListItem>
4342 <ListItem>
4343
4344 <Para>
4345  Each variable mentioned in a rule must either be in scope (e.g. <Function>map</Function>),
4346 or bound by the <Literal>forall</Literal> (e.g. <Function>f</Function>, <Function>g</Function>, <Function>xs</Function>).  The variables bound by
4347 the <Literal>forall</Literal> are called the <Emphasis>pattern</Emphasis> variables.  They are separated
4348 by spaces, just like in a type <Literal>forall</Literal>.
4349 </Para>
4350 </ListItem>
4351 <ListItem>
4352
4353 <Para>
4354  A pattern variable may optionally have a type signature.
4355 If the type of the pattern variable is polymorphic, it <Emphasis>must</Emphasis> have a type signature.
4356 For example, here is the <Literal>foldr/build</Literal> rule:
4357
4358 <ProgramListing>
4359 "fold/build"  forall k z (g::forall b. (a->b->b) -> b -> b) .
4360               foldr k z (build g) = g k z
4361 </ProgramListing>
4362
4363 Since <Function>g</Function> has a polymorphic type, it must have a type signature.
4364
4365 </Para>
4366 </ListItem>
4367 <ListItem>
4368
4369 <Para>
4370 The left hand side of a rule must consist of a top-level variable applied
4371 to arbitrary expressions.  For example, this is <Emphasis>not</Emphasis> OK:
4372
4373 <ProgramListing>
4374 "wrong1"   forall e1 e2.  case True of { True -> e1; False -> e2 } = e1
4375 "wrong2"   forall f.      f True = True
4376 </ProgramListing>
4377
4378 In <Literal>"wrong1"</Literal>, the LHS is not an application; in <Literal>"wrong1"</Literal>, the LHS has a pattern variable
4379 in the head.
4380 </Para>
4381 </ListItem>
4382 <ListItem>
4383
4384 <Para>
4385  A rule does not need to be in the same module as (any of) the
4386 variables it mentions, though of course they need to be in scope.
4387 </Para>
4388 </ListItem>
4389 <ListItem>
4390
4391 <Para>
4392  Rules are automatically exported from a module, just as instance declarations are.
4393 </Para>
4394 </ListItem>
4395
4396 </ItemizedList>
4397
4398 </Para>
4399
4400 </Sect2>
4401
4402 <Sect2>
4403 <Title>Semantics</Title>
4404
4405 <Para>
4406 From a semantic point of view:
4407
4408 <ItemizedList>
4409 <ListItem>
4410
4411 <Para>
4412 Rules are only applied if you use the <Option>-O</Option> flag.
4413 </Para>
4414 </ListItem>
4415
4416 <ListItem>
4417 <Para>
4418  Rules are regarded as left-to-right rewrite rules.
4419 When GHC finds an expression that is a substitution instance of the LHS
4420 of a rule, it replaces the expression by the (appropriately-substituted) RHS.
4421 By "a substitution instance" we mean that the LHS can be made equal to the
4422 expression by substituting for the pattern variables.
4423
4424 </Para>
4425 </ListItem>
4426 <ListItem>
4427
4428 <Para>
4429  The LHS and RHS of a rule are typechecked, and must have the
4430 same type.
4431
4432 </Para>
4433 </ListItem>
4434 <ListItem>
4435
4436 <Para>
4437  GHC makes absolutely no attempt to verify that the LHS and RHS
4438 of a rule have the same meaning.  That is undecideable in general, and
4439 infeasible in most interesting cases.  The responsibility is entirely the programmer's!
4440
4441 </Para>
4442 </ListItem>
4443 <ListItem>
4444
4445 <Para>
4446  GHC makes no attempt to make sure that the rules are confluent or
4447 terminating.  For example:
4448
4449 <ProgramListing>
4450   "loop"        forall x,y.  f x y = f y x
4451 </ProgramListing>
4452
4453 This rule will cause the compiler to go into an infinite loop.
4454
4455 </Para>
4456 </ListItem>
4457 <ListItem>
4458
4459 <Para>
4460  If more than one rule matches a call, GHC will choose one arbitrarily to apply.
4461
4462 </Para>
4463 </ListItem>
4464 <ListItem>
4465 <Para>
4466  GHC currently uses a very simple, syntactic, matching algorithm
4467 for matching a rule LHS with an expression.  It seeks a substitution
4468 which makes the LHS and expression syntactically equal modulo alpha
4469 conversion.  The pattern (rule), but not the expression, is eta-expanded if
4470 necessary.  (Eta-expanding the epression can lead to laziness bugs.)
4471 But not beta conversion (that's called higher-order matching).
4472 </Para>
4473
4474 <Para>
4475 Matching is carried out on GHC's intermediate language, which includes
4476 type abstractions and applications.  So a rule only matches if the
4477 types match too.  See <XRef LinkEnd="rule-spec"> below.
4478 </Para>
4479 </ListItem>
4480 <ListItem>
4481
4482 <Para>
4483  GHC keeps trying to apply the rules as it optimises the program.
4484 For example, consider:
4485
4486 <ProgramListing>
4487   let s = map f
4488       t = map g
4489   in
4490   s (t xs)
4491 </ProgramListing>
4492
4493 The expression <Literal>s (t xs)</Literal> does not match the rule <Literal>"map/map"</Literal>, but GHC
4494 will substitute for <VarName>s</VarName> and <VarName>t</VarName>, giving an expression which does match.
4495 If <VarName>s</VarName> or <VarName>t</VarName> was (a) used more than once, and (b) large or a redex, then it would
4496 not be substituted, and the rule would not fire.
4497
4498 </Para>
4499 </ListItem>
4500 <ListItem>
4501
4502 <Para>
4503  In the earlier phases of compilation, GHC inlines <Emphasis>nothing
4504 that appears on the LHS of a rule</Emphasis>, because once you have substituted
4505 for something you can't match against it (given the simple minded
4506 matching).  So if you write the rule
4507
4508 <ProgramListing>
4509         "map/map"       forall f,g.  map f . map g = map (f.g)
4510 </ProgramListing>
4511
4512 this <Emphasis>won't</Emphasis> match the expression <Literal>map f (map g xs)</Literal>.
4513 It will only match something written with explicit use of ".".
4514 Well, not quite.  It <Emphasis>will</Emphasis> match the expression
4515
4516 <ProgramListing>
4517 wibble f g xs
4518 </ProgramListing>
4519
4520 where <Function>wibble</Function> is defined:
4521
4522 <ProgramListing>
4523 wibble f g = map f . map g
4524 </ProgramListing>
4525
4526 because <Function>wibble</Function> will be inlined (it's small).
4527
4528 Later on in compilation, GHC starts inlining even things on the
4529 LHS of rules, but still leaves the rules enabled.  This inlining
4530 policy is controlled by the per-simplification-pass flag <Option>-finline-phase</Option><Emphasis>n</Emphasis>.
4531
4532 </Para>
4533 </ListItem>
4534 <ListItem>
4535
4536 <Para>
4537  All rules are implicitly exported from the module, and are therefore
4538 in force in any module that imports the module that defined the rule, directly
4539 or indirectly.  (That is, if A imports B, which imports C, then C's rules are
4540 in force when compiling A.)  The situation is very similar to that for instance
4541 declarations.
4542 </Para>
4543 </ListItem>
4544
4545 </ItemizedList>
4546
4547 </Para>
4548
4549 </Sect2>
4550
4551 <Sect2>
4552 <Title>List fusion</Title>
4553
4554 <Para>
4555 The RULES mechanism is used to implement fusion (deforestation) of common list functions.
4556 If a "good consumer" consumes an intermediate list constructed by a "good producer", the
4557 intermediate list should be eliminated entirely.
4558 </Para>
4559
4560 <Para>
4561 The following are good producers:
4562
4563 <ItemizedList>
4564 <ListItem>
4565
4566 <Para>
4567  List comprehensions
4568 </Para>
4569 </ListItem>
4570 <ListItem>
4571
4572 <Para>
4573  Enumerations of <Literal>Int</Literal> and <Literal>Char</Literal> (e.g. <Literal>['a'..'z']</Literal>).
4574 </Para>
4575 </ListItem>
4576 <ListItem>
4577
4578 <Para>
4579  Explicit lists (e.g. <Literal>[True, False]</Literal>)
4580 </Para>
4581 </ListItem>
4582 <ListItem>
4583
4584 <Para>
4585  The cons constructor (e.g <Literal>3:4:[]</Literal>)
4586 </Para>
4587 </ListItem>
4588 <ListItem>
4589
4590 <Para>
4591  <Function>++</Function>
4592 </Para>
4593 </ListItem>
4594 <ListItem>
4595
4596 <Para>
4597  <Function>map</Function>
4598 </Para>
4599 </ListItem>
4600 <ListItem>
4601
4602 <Para>
4603  <Function>filter</Function>
4604 </Para>
4605 </ListItem>
4606 <ListItem>
4607
4608 <Para>
4609  <Function>iterate</Function>, <Function>repeat</Function>
4610 </Para>
4611 </ListItem>
4612 <ListItem>
4613
4614 <Para>
4615  <Function>zip</Function>, <Function>zipWith</Function>
4616 </Para>
4617 </ListItem>
4618
4619 </ItemizedList>
4620
4621 </Para>
4622
4623 <Para>
4624 The following are good consumers:
4625
4626 <ItemizedList>
4627 <ListItem>
4628
4629 <Para>
4630  List comprehensions
4631 </Para>
4632 </ListItem>
4633 <ListItem>
4634
4635 <Para>
4636  <Function>array</Function> (on its second argument)
4637 </Para>
4638 </ListItem>
4639 <ListItem>
4640
4641 <Para>
4642  <Function>length</Function>
4643 </Para>
4644 </ListItem>
4645 <ListItem>
4646
4647 <Para>
4648  <Function>++</Function> (on its first argument)
4649 </Para>
4650 </ListItem>
4651 <ListItem>
4652
4653 <Para>
4654  <Function>map</Function>
4655 </Para>
4656 </ListItem>
4657 <ListItem>
4658
4659 <Para>
4660  <Function>filter</Function>
4661 </Para>
4662 </ListItem>
4663 <ListItem>
4664
4665 <Para>
4666  <Function>concat</Function>
4667 </Para>
4668 </ListItem>
4669 <ListItem>
4670
4671 <Para>
4672  <Function>unzip</Function>, <Function>unzip2</Function>, <Function>unzip3</Function>, <Function>unzip4</Function>
4673 </Para>
4674 </ListItem>
4675 <ListItem>
4676
4677 <Para>
4678  <Function>zip</Function>, <Function>zipWith</Function> (but on one argument only; if both are good producers, <Function>zip</Function>
4679 will fuse with one but not the other)
4680 </Para>
4681 </ListItem>
4682 <ListItem>
4683
4684 <Para>
4685  <Function>partition</Function>
4686 </Para>
4687 </ListItem>
4688 <ListItem>
4689
4690 <Para>
4691  <Function>head</Function>
4692 </Para>
4693 </ListItem>
4694 <ListItem>
4695
4696 <Para>
4697  <Function>and</Function>, <Function>or</Function>, <Function>any</Function>, <Function>all</Function>
4698 </Para>
4699 </ListItem>
4700 <ListItem>
4701
4702 <Para>
4703  <Function>sequence&lowbar;</Function>
4704 </Para>
4705 </ListItem>
4706 <ListItem>
4707
4708 <Para>
4709  <Function>msum</Function>
4710 </Para>
4711 </ListItem>
4712 <ListItem>
4713
4714 <Para>
4715  <Function>sortBy</Function>
4716 </Para>
4717 </ListItem>
4718
4719 </ItemizedList>
4720
4721 </Para>
4722
4723 <Para>
4724 So, for example, the following should generate no intermediate lists:
4725
4726 <ProgramListing>
4727 array (1,10) [(i,i*i) | i &#60;- map (+ 1) [0..9]]
4728 </ProgramListing>
4729
4730 </Para>
4731
4732 <Para>
4733 This list could readily be extended; if there are Prelude functions that you use
4734 a lot which are not included, please tell us.
4735 </Para>
4736
4737 <Para>
4738 If you want to write your own good consumers or producers, look at the
4739 Prelude definitions of the above functions to see how to do so.
4740 </Para>
4741
4742 </Sect2>
4743
4744 <Sect2 id="rule-spec">
4745 <Title>Specialisation
4746 </Title>
4747
4748 <Para>
4749 Rewrite rules can be used to get the same effect as a feature
4750 present in earlier version of GHC:
4751
4752 <ProgramListing>
4753   {-# SPECIALIZE fromIntegral :: Int8 -> Int16 = int8ToInt16 #-}
4754 </ProgramListing>
4755
4756 This told GHC to use <Function>int8ToInt16</Function> instead of <Function>fromIntegral</Function> whenever
4757 the latter was called with type <Literal>Int8 -&gt; Int16</Literal>.  That is, rather than
4758 specialising the original definition of <Function>fromIntegral</Function> the programmer is
4759 promising that it is safe to use <Function>int8ToInt16</Function> instead.
4760 </Para>
4761
4762 <Para>
4763 This feature is no longer in GHC.  But rewrite rules let you do the
4764 same thing:
4765
4766 <ProgramListing>
4767 {-# RULES
4768   "fromIntegral/Int8/Int16" fromIntegral = int8ToInt16
4769 #-}
4770 </ProgramListing>
4771
4772 This slightly odd-looking rule instructs GHC to replace <Function>fromIntegral</Function>
4773 by <Function>int8ToInt16</Function> <Emphasis>whenever the types match</Emphasis>.  Speaking more operationally,
4774 GHC adds the type and dictionary applications to get the typed rule
4775
4776 <ProgramListing>
4777 forall (d1::Integral Int8) (d2::Num Int16) .
4778         fromIntegral Int8 Int16 d1 d2 = int8ToInt16
4779 </ProgramListing>
4780
4781 What is more,
4782 this rule does not need to be in the same file as fromIntegral,
4783 unlike the <Literal>SPECIALISE</Literal> pragmas which currently do (so that they
4784 have an original definition available to specialise).
4785 </Para>
4786
4787 </Sect2>
4788
4789 <Sect2>
4790 <Title>Controlling what's going on</Title>
4791
4792 <Para>
4793
4794 <ItemizedList>
4795 <ListItem>
4796
4797 <Para>
4798  Use <Option>-ddump-rules</Option> to see what transformation rules GHC is using.
4799 </Para>
4800 </ListItem>
4801 <ListItem>
4802
4803 <Para>
4804  Use <Option>-ddump-simpl-stats</Option> to see what rules are being fired.
4805 If you add <Option>-dppr-debug</Option> you get a more detailed listing.
4806 </Para>
4807 </ListItem>
4808 <ListItem>
4809
4810 <Para>
4811  The defintion of (say) <Function>build</Function> in <FileName>PrelBase.lhs</FileName> looks llike this:
4812
4813 <ProgramListing>
4814         build   :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a]
4815         {-# INLINE build #-}
4816         build g = g (:) []
4817 </ProgramListing>
4818
4819 Notice the <Literal>INLINE</Literal>!  That prevents <Literal>(:)</Literal> from being inlined when compiling
4820 <Literal>PrelBase</Literal>, so that an importing module will &ldquo;see&rdquo; the <Literal>(:)</Literal>, and can
4821 match it on the LHS of a rule.  <Literal>INLINE</Literal> prevents any inlining happening
4822 in the RHS of the <Literal>INLINE</Literal> thing.  I regret the delicacy of this.
4823
4824 </Para>
4825 </ListItem>
4826 <ListItem>
4827
4828 <Para>
4829  In <Filename>ghc/lib/std/PrelBase.lhs</Filename> look at the rules for <Function>map</Function> to
4830 see how to write rules that will do fusion and yet give an efficient
4831 program even if fusion doesn't happen.  More rules in <Filename>PrelList.lhs</Filename>.
4832 </Para>
4833 </ListItem>
4834
4835 </ItemizedList>
4836
4837 </Para>
4838
4839 </Sect2>
4840
4841 </Sect1>
4842
4843 <!-- Emacs stuff:
4844      ;;; Local Variables: ***
4845      ;;; mode: sgml ***
4846      ;;; sgml-parent-document: ("users_guide.sgml" "book" "chapter" "sect1") ***
4847      ;;; End: ***
4848  -->