ghc/docs/users_guide/glasgow_exts.vsgml

   1 %
   2 % $Id: glasgow_exts.vsgml,v 1.20 1999/11/25 10:28:41 simonpj Exp $
   3 %
   4 % GHC Language Extensions.
   5 %
   6
   7 As with all known Haskell systems, GHC implements some extensions to
   8 the language.  To use them, you'll need to give a @-fglasgow-exts@%
   9 <nidx>-fglasgow-exts option</nidx> option.
  10
  11 Virtually all of the Glasgow extensions serve to give you access to
  12 the underlying facilities with which we implement Haskell.  Thus, you
  13 can get at the Raw Iron, if you are willing to write some non-standard
  14 code at a more primitive level.  You need not be ``stuck'' on
  15 performance because of the implementation costs of Haskell's
  16 ``high-level'' features---you can always code ``under'' them.  In an
  17 extreme case, you can write all your time-critical code in C, and then
  18 just glue it together with Haskell!
  19
  20 Executive summary of our extensions:
  21
  22 <descrip>
  23
  24 <tag>Unboxed types and primitive operations:</tag>
  25
  26 You can get right down to the raw machine types and operations;
  27 included in this are ``primitive arrays'' (direct access to Big Wads
  28 of Bytes).  Please see Section <ref name="Unboxed types"
  29 id="glasgow-unboxed"> and following.
  30
  31 <tag>Multi-parameter type classes:</tag>
  32
  33 GHC's type system supports extended type classes with multiple
  34 parameters.  Please see Section <ref name="Mult-parameter type
  35 classes" id="multi-param-type-classes">.
  36
  37 <tag>Local universal quantification:</tag>
  38
  39 GHC's type system supports explicit universal quantification in
  40 constructor fields and function arguments.  This is useful for things
  41 like defining @runST@ from the state-thread world.  See Section <ref
  42 name="Local universal quantification" id="universal-quantification">.
  43
  44 <tag>Extistentially quantification in data types:</tag>
  45
  46 Some or all of the type variables in a datatype declaration may be
  47 <em>existentially quantified</em>.  More details in Section <ref
  48 name="Existential Quantification" id="existential-quantification">.
  49
  50 <tag>Scoped type variables:</tag>
  51
  52 Scoped type variables enable the programmer to supply type signatures
  53 for some nested declarations, where this would not be legal in Haskell
  54 98.  Details in Section <ref name="Scoped Type Variables"
  55 id="scoped-type-variables">.
  56
  57 <tag>Calling out to C:</tag>
  58
  59 Just what it sounds like.  We provide <em>lots</em> of rope that you
  60 can dangle around your neck.  Please see Section <ref name="Calling~C
  61 directly from Haskell" id="glasgow-ccalls">.
  62
  63 <tag>Pragmas</tag>
  64
  65 Pragmas are special instructions to the compiler placed in the source
  66 file.  The pragmas GHC supports are described in Section <ref
  67 name="Pragmas" id="pragmas">.
  68
  69 <tag>Rewrite rules:</tag>
  70
  71 The programmer can specify rewrite rules as part of the source program
  72 (in a pragma).  GHC applies these rewrite rules wherever it can.
  73 Details in Section <ref name="Rewrite Rules"
  74 id="rewrite-rules">.
  75
  76 <tag>Pattern guards</tag>
  77
  78 add a more flexible syntax and semantics for guards in function definitions.
  79 This gives expressiveness somewhat comparable to that of ``views''.
  80 </descrip>
  81
  82 Before you get too carried away working at the lowest level (e.g.,
  83 sloshing @MutableByteArray#@s around your program), you may wish to
  84 check if there are system libraries that provide a ``Haskellised
  85 veneer'' over the features you want.  See Section <ref name="GHC
  86 Prelude and libraries" id="ghc-prelude">.
  87
  88 %************************************************************************
  89 %*                                                                      *
  90 <sect1>Unboxed types
  91 <label id="glasgow-unboxed">
  92 <p>
  93 <nidx>Unboxed types (Glasgow extension)</nidx>
  94 %*                                                                      *
  95 %************************************************************************
  96
  97 These types correspond to the ``raw machine'' types you would use in
  98 C: @Int#@ (long int), @Double#@ (double), @Addr#@ (void *), etc.  The
  99 <em>primitive operations</em> (PrimOps) on these types are what you
 100 might expect; e.g., @(+#)@ is addition on @Int#@s, and is the
 101 machine-addition that we all know and love---usually one instruction.
 102
 103 There are some restrictions on the use of unboxed types, the main one
 104 being that you can't pass an unboxed value to a polymorphic function
 105 or store one in a polymorphic data type.  This rules out things like
 106 @[Int#]@ (ie. lists of unboxed integers).  The reason for this
 107 restriction is that polymorphic arguments and constructor fields are
 108 assumed to be pointers: if an unboxed integer is stored in one of
 109 these, the garbage collector would attempt to follow it, leading to
 110 unpredictable space leaks.  Or a @seq@ operation on the polymorphic
 111 component may attempt to dereference the pointer, with disastrous
 112 results.  Even worse, the unboxed value might be larger than a pointer
 113 (@Double#@ for instance).
 114
 115 Nevertheless, A numerically-intensive program using unboxed types can
 116 go a <em>lot</em> faster than its ``standard'' counterpart---we saw a
 117 threefold speedup on one example.
 118
 119 Please see Section <ref name="The module PrelGHC: really primitive
 120 stuff" id="ghc-libs-ghc"> for the details of unboxed types and the
 121 operations on them.
 122
 123 %************************************************************************
 124 %*                                                                      *
 125 <sect1>Primitive state-transformer monad
 126 <label id="glasgow-ST-monad">
 127 <p>
 128 <nidx>state transformers (Glasgow extensions)</nidx>
 129 <nidx>ST monad (Glasgow extension)</nidx>
 130 %*                                                                      *
 131 %************************************************************************
 132
 133 This monad underlies our implementation of arrays, mutable and
 134 immutable, and our implementation of I/O, including ``C calls''.
 135
 136 The @ST@ library, which provides access to the @ST@ monad, is a
 137 GHC/Hugs extension library and is described in the separate <htmlurl
 138 name="GHC/Hugs Extension Libraries" url="libs.html"> document.
 139
 140 %************************************************************************
 141 %*                                                                      *
 142 <sect1>Primitive arrays, mutable and otherwise
 143 <label id="glasgow-prim-arrays">
 144 <p>
 145 <nidx>primitive arrays (Glasgow extension)</nidx>
 146 <nidx>arrays, primitive (Glasgow extension)</nidx>
 147 %*                                                                      *
 148 %************************************************************************
 149
 150 GHC knows about quite a few flavours of Large Swathes of Bytes.
 151
 152 First, GHC distinguishes between primitive arrays of (boxed) Haskell
 153 objects (type @Array# obj@) and primitive arrays of bytes (type
 154 @ByteArray#@).
 155
 156 Second, it distinguishes between...
 157 <descrip>
 158 <tag>Immutable:</tag>
 159 Arrays that do not change (as with ``standard'' Haskell arrays); you
 160 can only read from them.  Obviously, they do not need the care and
 161 attention of the state-transformer monad.
 162
 163 <tag>Mutable:</tag>
 164 Arrays that may be changed or ``mutated.''  All the operations on them
 165 live within the state-transformer monad and the updates happen
 166 <em>in-place</em>.
 167
 168 <tag>``Static'' (in C land):</tag>
 169 A C routine may pass an @Addr#@ pointer back into Haskell land.  There
 170 are then primitive operations with which you may merrily grab values
 171 over in C land, by indexing off the ``static'' pointer.
 172
 173 <tag>``Stable'' pointers:</tag>
 174 If, for some reason, you wish to hand a Haskell pointer (i.e.,
 175 <em>not</em> an unboxed value) to a C routine, you first make the
 176 pointer ``stable,'' so that the garbage collector won't forget that it
 177 exists.  That is, GHC provides a safe way to pass Haskell pointers to
 178 C.
 179
 180 Please see Section <ref name="Subverting automatic unboxing with
 181 ``stable pointers''" id="glasgow-stablePtrs"> for more details.
 182
 183 <tag>``Foreign objects'':</tag>
 184 A ``foreign object'' is a safe way to pass an external object (a
 185 C-allocated pointer, say) to Haskell and have Haskell do the Right
 186 Thing when it no longer references the object.  So, for example, C
 187 could pass a large bitmap over to Haskell and say ``please free this
 188 memory when you're done with it.''
 189
 190 Please see Section <ref name="Pointing outside the Haskell heap"
 191 id="glasgow-foreignObjs"> for more details.
 192
 193 </descrip>
 194
 195 The libraries section gives more details on all these ``primitive
 196 array'' types and the operations on them, Section <ref name="The GHC
 197 Prelude and Libraries" id="ghc-prelude">.  Some of these extensions
 198 are also supported by Hugs, and the supporting libraries are described
 199 in the <htmlurl name="GHC/Hugs Extension Libraries" url="libs.html">
 200 document.
 201
 202 %************************************************************************
 203 %*                                                                      *
 204 <sect1>Calling~C directly from Haskell
 205 <label id="glasgow-ccalls">
 206 <p>
 207 <nidx>C calls (Glasgow extension)</nidx>
 208 <nidx>_ccall_ (Glasgow extension)</nidx>
 209 <nidx>_casm_ (Glasgow extension)</nidx>
 210 %*                                                                      *
 211 %************************************************************************
 212
 213 GOOD ADVICE: Because this stuff is not Entirely Stable as far as names
 214 and things go, you would be well-advised to keep your C-callery
 215 corraled in a few modules, rather than sprinkled all over your code.
 216 It will then be quite easy to update later on.
 217
 218 %************************************************************************
 219 %*                                                                      *
 220 <sect2>@_ccall_@ and @_casm_@: an introduction
 221 <label id="ccall-intro">
 222 <p>
 223 %*                                                                      *
 224 %************************************************************************
 225
 226 The simplest way to use a simple C function
 227
 228 <tscreen><verb>
 229 double fooC( FILE *in, char c, int i, double d, unsigned int u )
 230 </verb></tscreen>
 231
 232 is to provide a Haskell wrapper:
 233
 234 <tscreen><verb>
 235 fooH :: Char -> Int -> Double -> Word -> IO Double
 236 fooH c i d w = _ccall_ fooC (``stdin''::Addr) c i d w
 237 </verb></tscreen>
 238
 239 The function @fooH@ will unbox all of its arguments, call the C
 240 function @fooC@ and box the corresponding arguments.
 241
 242 One of the annoyances about @_ccall_@s is when the C types don't quite
 243 match the Haskell compiler's ideas.  For this, the @_casm_@ variant
 244 may be just the ticket (NB: <em>no chance</em> of such code going
 245 through a native-code generator):
 246
 247 <tscreen><verb>
 248 import Addr
 249 import CString
 250
 251 oldGetEnv name
 252   = _casm_ ``%r = getenv((char *) %0);'' name >>= \ litstring ->
 253     return (
 254         if (litstring == nullAddr) then
 255             Left ("Fail:oldGetEnv:"++name)
 256         else
 257             Right (unpackCString litstring)
 258     )
 259 </verb></tscreen>
 260
 261 The first literal-literal argument to a @_casm_@ is like a @printf@
 262 format: @%r@ is replaced with the ``result,'' @%0@--@%n-1@ are
 263 replaced with the 1st--nth arguments.  As you can see above, it is an
 264 easy way to do simple C~casting.  Everything said about @_ccall_@ goes
 265 for @_casm_@ as well.
 266
 267 The use of @_casm_@ in your code does pose a problem to the compiler
 268 when it comes to generating an interface file for a freshly compiled
 269 module. Included in an interface file is the unfolding (if any) of a
 270 declaration. However, if a declaration's unfolding happens to contain
 271 a @_casm_@, its unfolding will <em/not/ be emitted into the interface
 272 file even if it qualifies by all the other criteria. The reason why
 273 the compiler prevents this from happening is that unfolding @_casm_@s
 274 into an interface file unduly constrains how code that import your
 275 module have to be compiled. If an imported declaration is unfolded and
 276 it contains a @_casm_@, you now have to be using a compiler backend
 277 capable of dealing with it (i.e., the C compiler backend). If you are
 278 using the C compiler backend, the unfolded @_casm_@ may still cause you
 279 problems since the C code snippet it contains may mention CPP symbols
 280 that were in scope when compiling the original module are not when
 281 compiling the importing module.
 282
 283 If you're willing to put up with the drawbacks of doing cross-module
 284 inlining of C code (GHC - A Better C Compiler :-), the option
 285 @-funfold-casms-in-hi-file@ will turn off the default behaviour.
 286 <nidx>-funfold-casms-in-hi-file option</nidx>
 287
 288 %************************************************************************
 289 %*                                                                      *
 290 <sect2>Literal-literals
 291 <label id="glasgow-literal-literals">
 292 <p>
 293 <nidx>Literal-literals</nidx>
 294 %*                                                                      *
 295 %************************************************************************
 296
 297 The literal-literal argument to @_casm_@ can be made use of separately
 298 from the @_casm_@ construct itself. Indeed, we've already used it:
 299
 300 <tscreen><verb>
 301 fooH :: Char -> Int -> Double -> Word -> IO Double
 302 fooH c i d w = _ccall_ fooC (``stdin''::Addr) c i d w
 303 </verb></tscreen>
 304
 305 The first argument that's passed to @fooC@ is given as a literal-literal,
 306 that is, a literal chunk of C code that will be inserted into the generated
 307 @.hc@ code at the right place.
 308
 309 A literal-literal is restricted to having a type that's an instance of
 310 the @CCallable@ class, see <ref name="CCallable" id="ccall-gotchas">
 311 for more information.
 312
 313 Notice that literal-literals are by their very nature unfriendly to
 314 native code generators, so exercise judgement about whether or not to
 315 make use of them in your code.
 316
 317 %************************************************************************
 318 %*                                                                      *
 319 <sect2>Using function headers
 320 <label id="glasgow-foreign-headers">
 321 <p>
 322 <nidx>C calls, function headers</nidx>
 323 %*                                                                      *
 324 %************************************************************************
 325
 326 When generating C (using the @-fvia-C@ directive), one can assist the
 327 C compiler in detecting type errors by using the @-#include@ directive
 328 to provide @.h@ files containing function headers.
 329
 330 For example,
 331
 332 <tscreen><verb>
 333 typedef unsigned long *StgForeignObj;
 334 typedef long StgInt;
 335
 336 void          initialiseEFS (StgInt size);
 337 StgInt        terminateEFS (void);
 338 StgForeignObj emptyEFS(void);
 339 StgForeignObj updateEFS (StgForeignObj a, StgInt i, StgInt x);
 340 StgInt        lookupEFS (StgForeignObj a, StgInt i);
 341 </verb></tscreen>
 342
 343 You can find appropriate definitions for @StgInt@, @StgForeignObj@,
 344 etc using @gcc@ on your architecture by consulting
 345 @ghc/includes/StgTypes.h@.  The following table summarises the
 346 relationship between Haskell types and C types.
 347
 348 <tabular ca="ll">
 349 <bf>C type name</bf>      | <bf>Haskell Type</bf> @@
 350 @@
 351 @StgChar@          | @Char#@ @@
 352 @StgInt@           | @Int#@ @@
 353 @StgWord@          | @Word#@ @@
 354 @StgAddr@          | @Addr#@ @@
 355 @StgFloat@         | @Float#@ @@
 356 @StgDouble@        | @Double#@ @@
 357
 358 @StgArray@         | @Array#@ @@
 359 @StgByteArray@     | @ByteArray#@ @@
 360 @StgArray@         | @MutableArray#@ @@
 361 @StgByteArray@     | @MutableByteArray#@ @@
 362
 363 @StgStablePtr@     | @StablePtr#@ @@
 364 @StgForeignObj@    | @ForeignObj#@
 365 </tabular>
 366
 367 Note that this approach is only <em>essential</em> for returning
 368 @float@s (or if @sizeof(int) != sizeof(int *)@ on your
 369 architecture) but is a Good Thing for anyone who cares about writing
 370 solid code.  You're crazy not to do it.
 371
 372 %************************************************************************
 373 %*                                                                      *
 374 <sect2>Subverting automatic unboxing with ``stable pointers''
 375 <label id="glasgow-stablePtrs">
 376 <p>
 377 <nidx>stable pointers (Glasgow extension)</nidx>
 378 %*                                                                      *
 379 %************************************************************************
 380
 381 The arguments of a @_ccall_@ are automatically unboxed before the
 382 call.  There are two reasons why this is usually the Right Thing to
 383 do:
 384
 385 <itemize>
 386 <item>
 387 C is a strict language: it would be excessively tedious to pass
 388 unevaluated arguments and require the C programmer to force their
 389 evaluation before using them.
 390
 391 <item> Boxed values are stored on the Haskell heap and may be moved
 392 within the heap if a garbage collection occurs---that is, pointers
 393 to boxed objects are not <em>stable</em>.
 394 </itemize>
 395
 396 It is possible to subvert the unboxing process by creating a ``stable
 397 pointer'' to a value and passing the stable pointer instead.  For
 398 example, to pass/return an integer lazily to C functions @storeC@ and
 399 @fetchC@, one might write:
 400
 401 <tscreen><verb>
 402 storeH :: Int -> IO ()
 403 storeH x = makeStablePtr x              >>= \ stable_x ->
 404            _ccall_ storeC stable_x
 405
 406 fetchH :: IO Int
 407 fetchH x = _ccall_ fetchC               >>= \ stable_x ->
 408            deRefStablePtr stable_x      >>= \ x ->
 409            freeStablePtr stable_x       >>
 410            return x
 411 </verb></tscreen>
 412
 413 The garbage collector will refrain from throwing a stable pointer away
 414 until you explicitly call one of the following from C or Haskell.
 415
 416 <tscreen><verb>
 417 void freeStablePointer( StgStablePtr stablePtrToToss )
 418 freeStablePtr :: StablePtr a -> IO ()
 419 </verb></tscreen>
 420
 421 As with the use of @free@ in C programs, GREAT CARE SHOULD BE
 422 EXERCISED to ensure these functions are called at the right time: too
 423 early and you get dangling references (and, if you're lucky, an error
 424 message from the runtime system); too late and you get space leaks.
 425
 426 And to force evaluation of the argument within @fooC@, one would
 427 call one of the following C functions (according to type of argument).
 428
 429 <tscreen><verb>
 430 void     performIO  ( StgStablePtr stableIndex /* StablePtr s (IO ()) */ );
 431 StgInt   enterInt   ( StgStablePtr stableIndex /* StablePtr s Int */ );
 432 StgFloat enterFloat ( StgStablePtr stableIndex /* StablePtr s Float */ );
 433 </verb></tscreen>
 434
 435 <nidx>performIO</nidx>
 436 <nidx>enterInt</nidx>
 437 <nidx>enterFloat</nidx>
 438
 439 Note Bene: @_ccall_GC_@<nidx>_ccall_GC_</nidx> must be used if any of
 440 these functions are used.
 441
 442 %************************************************************************
 443 %*                                                                      *
 444 <sect2>Foreign objects: pointing outside the Haskell heap
 445 <label id="glasgow-foreignObjs">
 446 <p>
 447 <nidx>foreign objects (Glasgow extension)</nidx>
 448 %*                                                                      *
 449 %************************************************************************
 450
 451 There are two types that @ghc@ programs can use to reference
 452 (heap-allocated) objects outside the Haskell world: @Addr@ and
 453 @ForeignObj@.
 454
 455 If you use @Addr@, it is up to you to the programmer to arrange
 456 allocation and deallocation of the objects.
 457
 458 If you use @ForeignObj@, @ghc@'s garbage collector will call upon the
 459 user-supplied <em>finaliser</em> function to free the object when the
 460 Haskell world no longer can access the object.  (An object is
 461 associated with a finaliser function when the abstract
 462  Haskell type @ForeignObj@ is created). The finaliser function is
 463 expressed in C, and is passed as argument the object:
 464
 465 <tscreen><verb>
 466 void foreignFinaliser ( StgForeignObj fo )
 467 </verb></tscreen>
 468
 469 when the Haskell world can no longer access the object.  Since
 470 @ForeignObj@s only get released when a garbage collection occurs, we
 471 provide ways of triggering a garbage collection from within C and from
 472 within Haskell.
 473
 474 <tscreen><verb>
 475 void GarbageCollect()
 476 performGC :: IO ()
 477 </verb></tscreen>
 478
 479 More information on the programmers' interface to @ForeignObj@ can be
 480 found in the library documentation.
 481
 482 %************************************************************************
 483 %*                                                                      *
 484 <sect2>Avoiding monads
 485 <label id="glasgow-avoiding-monads">
 486 <p>
 487 <nidx>C calls to `pure C'</nidx>
 488 <nidx>unsafePerformIO</nidx>
 489 %*                                                                      *
 490 %************************************************************************
 491
 492 The @_ccall_@ construct is part of the @IO@ monad because 9 out of 10
 493 uses will be to call imperative functions with side effects such as
 494 @printf@.  Use of the monad ensures that these operations happen in a
 495 predictable order in spite of laziness and compiler optimisations.
 496
 497 To avoid having to be in the monad to call a C function, it is
 498 possible to use @unsafePerformIO@, which is available from the
 499 @IOExts@ module.  There are three situations where one might like to
 500 call a C function from outside the IO world:
 501
 502 <itemize>
 503 <item>
 504 Calling a function with no side-effects:
 505 <tscreen><verb>
 506 atan2d :: Double -> Double -> Double
 507 atan2d y x = unsafePerformIO (_ccall_ atan2d y x)
 508
 509 sincosd :: Double -> (Double, Double)
 510 sincosd x = unsafePerformIO $ do
 511         da <- newDoubleArray (0, 1)
 512         _casm_ ``sincosd( %0, &((double *)%1[0]), &((double *)%1[1]) );'' x da
 513         s <- readDoubleArray da 0
 514         c <- readDoubleArray da 1
 515         return (s, c)
 516 </verb></tscreen>
 517
 518 <item> Calling a set of functions which have side-effects but which can
 519 be used in a purely functional manner.
 520
 521 For example, an imperative implementation of a purely functional
 522 lookup-table might be accessed using the following functions.
 523
 524 <tscreen><verb>
 525 empty  :: EFS x
 526 update :: EFS x -> Int -> x -> EFS x
 527 lookup :: EFS a -> Int -> a
 528
 529 empty = unsafePerformIO (_ccall_ emptyEFS)
 530
 531 update a i x = unsafePerformIO $
 532         makeStablePtr x         >>= \ stable_x ->
 533         _ccall_ updateEFS a i stable_x
 534
 535 lookup a i = unsafePerformIO $
 536         _ccall_ lookupEFS a i   >>= \ stable_x ->
 537         deRefStablePtr stable_x
 538 </verb></tscreen>
 539
 540 You will almost always want to use @ForeignObj@s with this.
 541
 542 <item> Calling a side-effecting function even though the results will
 543 be unpredictable.  For example the @trace@ function is defined by:
 544
 545 <tscreen><verb>
 546 trace :: String -> a -> a
 547 trace string expr
 548   = unsafePerformIO (
 549         ((_ccall_ PreTraceHook sTDERR{-msg-}):: IO ())  >>
 550         fputs sTDERR string                             >>
 551         ((_ccall_ PostTraceHook sTDERR{-msg-}):: IO ()) >>
 552         return expr )
 553   where
 554     sTDERR = (``stderr'' :: Addr)
 555 </verb></tscreen>
 556
 557 (This kind of use is not highly recommended --- it is only really
 558 useful in debugging code.)
 559 </itemize>
 560
 561 %************************************************************************
 562 %*                                                                      *
 563 <sect2>C-calling ``gotchas'' checklist
 564 <label id="ccall-gotchas">
 565 <p>
 566 <nidx>C call dangers</nidx>
 567 <nidx>CCallable</nidx>
 568 <nidx>CReturnable</nidx>
 569 %*                                                                      *
 570 %************************************************************************
 571
 572 And some advice, too.
 573
 574 <itemize>
 575 <item> For modules that use @_ccall_@s, etc., compile with
 576 @-fvia-C@.<nidx>-fvia-C option</nidx> You don't have to, but you should.
 577
 578 Also, use the @-#include "prototypes.h"@ flag (hack) to inform the C
 579 compiler of the fully-prototyped types of all the C functions you
 580 call.  (Section <ref name="Using function headers"
 581 id="glasgow-foreign-headers"> says more about this...)
 582
 583 This scheme is the <em>only</em> way that you will get <em>any</em>
 584 typechecking of your @_ccall_@s.  (It shouldn't be that way, but...).
 585 GHC will pass the flag @-Wimplicit@ to gcc so that you'll get warnings
 586 if any @_ccall_@ed functions have no prototypes.
 587
 588 <item>
 589 Try to avoid @_ccall_@s to C~functions that take @float@
 590 arguments or return @float@ results.  Reason: if you do, you will
 591 become entangled in (ANSI?) C's rules for when arguments/results are
 592 promoted to @doubles@.  It's a nightmare and just not worth it.
 593 Use @doubles@ if possible.
 594
 595 If you do use @floats@, check and re-check that the right thing is
 596 happening.  Perhaps compile with @-keep-hc-file-too@ and look at
 597 the intermediate C (@.hc@ file).
 598
 599 <item> The compiler uses two non-standard type-classes when
 600 type-checking the arguments and results of @_ccall_@: the arguments
 601 (respectively result) of @_ccall_@ must be instances of the class
 602 @CCallable@ (respectively @CReturnable@).  Both classes may be
 603 imported from the module @CCall@, but this should only be
 604 necessary if you want to define a new instance.  (Neither class
 605 defines any methods --- their only function is to keep the
 606 type-checker happy.)
 607
 608 The type checker must be able to figure out just which of the
 609 C-callable/returnable types is being used.  If it can't, you have to
 610 add type signatures. For example,
 611
 612 <tscreen><verb>
 613 f x = _ccall_ foo x
 614 </verb></tscreen>
 615
 616 is not good enough, because the compiler can't work out what type @x@
 617 is, nor what type the @_ccall_@ returns.  You have to write, say:
 618
 619 <tscreen><verb>
 620 f :: Int -> IO Double
 621 f x = _ccall_ foo x
 622 </verb></tscreen>
 623
 624 This table summarises the standard instances of these classes.
 625
 626 % ToDo: check this table against implementation!
 627
 628 <tabular ca="llll">
 629 <bf>Type</bf>       |<bf>CCallable</bf>|<bf>CReturnable</bf> | <bf>Which is probably...</bf> @@
 630
 631 @Char@              | Yes  | Yes   | @unsigned char@ @@
 632 @Int@               | Yes  | Yes   | @long int@ @@
 633 @Word@              | Yes  | Yes   | @unsigned long int@ @@
 634 @Addr@              | Yes  | Yes   | @void *@ @@
 635 @Float@             | Yes  | Yes   | @float@ @@
 636 @Double@            | Yes  | Yes   | @double@ @@
 637 @()@                | No   | Yes   | @void@ @@
 638 @[Char]@            | Yes  | No    | @char *@ (null-terminated) @@
 639
 640 @Array@             | Yes  | No    | @unsigned long *@ @@
 641 @ByteArray@         | Yes  | No    | @unsigned long *@ @@
 642 @MutableArray@      | Yes  | No    | @unsigned long *@ @@
 643 @MutableByteArray@  | Yes  | No    | @unsigned long *@ @@
 644
 645 @State@             | Yes  | Yes   | nothing!@@
 646
 647 @StablePtr@         | Yes  | Yes   | @unsigned long *@ @@
 648 @ForeignObjs@       | Yes  | Yes   | see later @@
 649 </tabular>
 650
 651 Actually, the @Word@ type is defined as being the same size as a
 652 pointer on the target architecture, which is <em>probably</em>
 653 @unsigned long int@.
 654
 655 The brave and careful programmer can add their own instances of these
 656 classes for the following types:
 657
 658 <itemize>
 659 <item>
 660 A <em>boxed-primitive</em> type may be made an instance of both
 661 @CCallable@ and @CReturnable@.
 662
 663 A boxed primitive type is any data type with a
 664 single unary constructor with a single primitive argument.  For
 665 example, the following are all boxed primitive types:
 666
 667 <tscreen><verb>
 668 Int
 669 Double
 670 data XDisplay = XDisplay Addr#
 671 data EFS a = EFS# ForeignObj#
 672 </verb></tscreen>
 673
 674 <tscreen><verb>
 675 instance CCallable   (EFS a)
 676 instance CReturnable (EFS a)
 677 </verb></tscreen>
 678
 679 <item> Any datatype with a single nullary constructor may be made an
 680 instance of @CReturnable@.  For example:
 681
 682 <tscreen><verb>
 683 data MyVoid = MyVoid
 684 instance CReturnable MyVoid
 685 </verb></tscreen>
 686
 687 <item> As at version 2.09, @String@ (i.e., @[Char]@) is still
 688 not a @CReturnable@ type.
 689
 690 Also, the now-builtin type @PackedString@ is neither
 691 @CCallable@ nor @CReturnable@.  (But there are functions in
 692 the PackedString interface to let you get at the necessary bits...)
 693 </itemize>
 694
 695 <item> The code-generator will complain if you attempt to use @%r@ in
 696 a @_casm_@ whose result type is @IO ()@; or if you don't use @%r@
 697 <em>precisely</em> once for any other result type.  These messages are
 698 supposed to be helpful and catch bugs---please tell us if they wreck
 699 your life.
 700
 701 <item> If you call out to C code which may trigger the Haskell garbage
 702 collector or create new threads (examples of this later...), then you
 703 must use the @_ccall_GC_@<nidx>_ccall_GC_ primitive</nidx> or
 704 @_casm_GC_@<nidx>_casm_GC_ primitive</nidx> variant of C-calls.  (This
 705 does not work with the native code generator - use @\fvia-C@.) This
 706 stuff is hairy with a capital H!  </itemize>
 707
 708 <sect1> Multi-parameter type classes
 709 <label id="multi-param-type-classes">
 710 <p>
 711
 712 This section documents GHC's implementation of multi-paramter type
 713 classes.  There's lots of background in the paper <url name="Type
 714 classes: exploring the design space"
 715 url="http://www.dcs.gla.ac.uk/~simonpj/multi.ps.gz"> (Simon Peyton
 716 Jones, Mark Jones, Erik Meijer).
 717
 718 I'd like to thank people who reported shorcomings in the GHC 3.02
 719 implementation.  Our default decisions were all conservative ones, and
 720 the experience of these heroic pioneers has given useful concrete
 721 examples to support several generalisations.  (These appear below as
 722 design choices not implemented in 3.02.)
 723
 724 I've discussed these notes with Mark Jones, and I believe that Hugs
 725 will migrate towards the same design choices as I outline here.
 726 Thanks to him, and to many others who have offered very useful
 727 feedback.
 728
 729 <sect2>Types
 730 <p>
 731
 732 There are the following restrictions on the form of a qualified
 733 type:
 734
 735 <tscreen><verb>
 736   forall tv1..tvn (c1, ...,cn) => type
 737 </verb></tscreen>
 738
 739 (Here, I write the "foralls" explicitly, although the Haskell source
 740 language omits them; in Haskell 1.4, all the free type variables of an
 741 explicit source-language type signature are universally quantified,
 742 except for the class type variables in a class declaration.  However,
 743 in GHC, you can give the foralls if you want.  See Section <ref
 744 name="Explicit universal quantification"
 745 id="universal-quantification">).
 746
 747 <enum>
 748
 749 <item> <bf>Each universally quantified type variable
 750 @tvi@ must be mentioned (i.e. appear free) in @type@</bf>.
 751
 752 The reason for this is that a value with a type that does not obey
 753 this restriction could not be used without introducing
 754 ambiguity. Here, for example, is an illegal type:
 755
 756 <tscreen><verb>
 757   forall a. Eq a => Int
 758 </verb></tscreen>
 759
 760 When a value with this type was used, the constraint <tt>Eq tv</tt>
 761 would be introduced where <tt>tv</tt> is a fresh type variable, and
 762 (in the dictionary-translation implementation) the value would be
 763 applied to a dictionary for <tt>Eq tv</tt>.  The difficulty is that we
 764 can never know which instance of <tt>Eq</tt> to use because we never
 765 get any more information about <tt>tv</tt>.
 766
 767 <item> <bf>Every constraint @ci@ must mention at least one of the
 768 universally quantified type variables @tvi@</bf>.
 769
 770 For example, this type is OK because <tt>C a b</tt> mentions the
 771 universally quantified type variable <tt>b</tt>:
 772
 773 <tscreen><verb>
 774   forall a. C a b => burble
 775 </verb></tscreen>
 776
 777 The next type is illegal because the constraint <tt>Eq b</tt> does not
 778 mention <tt>a</tt>:
 779
 780 <tscreen><verb>
 781   forall a. Eq b => burble
 782 </verb></tscreen>
 783
 784 The reason for this restriction is milder than the other one.  The
 785 excluded types are never useful or necessary (because the offending
 786 context doesn't need to be witnessed at this point; it can be floated
 787 out).  Furthermore, floating them out increases sharing. Lastly,
 788 excluding them is a conservative choice; it leaves a patch of
 789 territory free in case we need it later.
 790
 791 </enum>
 792
 793 These restrictions apply to all types, whether declared in a type signature
 794 or inferred.
 795
 796 Unlike Haskell 1.4, constraints in types do <bf>not</bf> have to be of
 797 the form <em>(class type-variables)</em>.  Thus, these type signatures
 798 are perfectly OK
 799
 800 <tscreen><verb>
 801   f :: Eq (m a) => [m a] -> [m a]
 802   g :: Eq [a] => ...
 803 </verb></tscreen>
 804
 805 This choice recovers principal types, a property that Haskell 1.4 does not have.
 806
 807 <sect2>Class declarations
 808 <p>
 809
 810 <enum>
 811
 812 <item> <bf>Multi-parameter type classes are permitted</bf>. For example:
 813
 814 <tscreen><verb>
 815   class Collection c a where
 816     union :: c a -> c a -> c a
 817     ...etc..
 818 </verb></tscreen>
 819
 820
 821 <item> <bf>The class hierarchy must be acyclic</bf>.  However, the definition
 822 of "acyclic" involves only the superclass relationships.  For example,
 823 this is OK:
 824
 825 <tscreen><verb>
 826   class C a where {
 827     op :: D b => a -> b -> b
 828   }
 829
 830   class C a => D a where { ... }
 831 </verb></tscreen>
 832
 833 Here, <tt>C</tt> is a superclass of <tt>D</tt>, but it's OK for a
 834 class operation <tt>op</tt> of <tt>C</tt> to mention <tt>D</tt>.  (It
 835 would not be OK for <tt>D</tt> to be a superclass of <tt>C</tt>.)
 836
 837 <item> <bf>There are no restrictions on the context in a class declaration
 838 (which introduces superclasses), except that the class hierarchy must
 839 be acyclic</bf>.  So these class declarations are OK:
 840
 841 <tscreen><verb>
 842   class Functor (m k) => FiniteMap m k where
 843     ...
 844
 845   class (Monad m, Monad (t m)) => Transform t m where
 846     lift :: m a -> (t m) a
 847 </verb></tscreen>
 848
 849 <item> <bf>In the signature of a class operation, every constraint
 850 must mention at least one type variable that is not a class type
 851 variable</bf>.
 852
 853 Thus:
 854
 855 <tscreen><verb>
 856   class Collection c a where
 857     mapC :: Collection c b => (a->b) -> c a -> c b
 858 </verb></tscreen>
 859
 860 is OK because the constraint <tt>(Collection a b)</tt> mentions
 861 <tt>b</tt>, even though it also mentions the class variable
 862 <tt>a</tt>.  On the other hand:
 863
 864 <tscreen><verb>
 865   class C a where
 866     op :: Eq a => (a,b) -> (a,b)
 867 </verb></tscreen>
 868
 869 is not OK because the constraint <tt>(Eq a)</tt> mentions on the class
 870 type variable <tt>a</tt>, but not <tt>b</tt>.  However, any such
 871 example is easily fixed by moving the offending context up to the
 872 superclass context:
 873
 874 <tscreen><verb>
 875   class Eq a => C a where
 876     op ::(a,b) -> (a,b)
 877 </verb></tscreen>
 878
 879 A yet more relaxed rule would allow the context of a class-op signature
 880 to mention only class type variables.  However, that conflicts with
 881 Rule 1(b) for types above.
 882
 883 <item> <bf>The type of each class operation must mention <em/all/ of
 884 the class type variables</bf>.  For example:
 885
 886 <tscreen><verb>
 887   class Coll s a where
 888     empty  :: s
 889     insert :: s -> a -> s
 890 </verb></tscreen>
 891
 892 is not OK, because the type of <tt>empty</tt> doesn't mention
 893 <tt>a</tt>.  This rule is a consequence of Rule 1(a), above, for
 894 types, and has the same motivation.
 895
 896 Sometimes, offending class declarations exhibit misunderstandings.  For
 897 example, <tt>Coll</tt> might be rewritten
 898
 899 <tscreen><verb>
 900   class Coll s a where
 901     empty  :: s a
 902     insert :: s a -> a -> s a
 903 </verb></tscreen>
 904
 905 which makes the connection between the type of a collection of
 906 <tt>a</tt>'s (namely <tt>(s a)</tt>) and the element type <tt>a</tt>.
 907 Occasionally this really doesn't work, in which case you can split the
 908 class like this:
 909
 910 <tscreen><verb>
 911   class CollE s where
 912     empty  :: s
 913
 914   class CollE s => Coll s a where
 915     insert :: s -> a -> s
 916 </verb></tscreen>
 917
 918 </enum>
 919
 920 <sect2>Instance declarations
 921 <p>
 922
 923 <enum>
 924
 925 <item> <bf>Instance declarations may not overlap</bf>.  The two instance
 926 declarations
 927
 928 <tscreen><verb>
 929   instance context1 => C type1 where ...
 930   instance context2 => C type2 where ...
 931 </verb></tscreen>
 932
 933 "overlap" if @type1@ and @type2@ unify
 934
 935 However, if you give the command line option
 936 @-fallow-overlapping-instances@<nidx>-fallow-overlapping-instances
 937 option</nidx> then two overlapping instance declarations are permitted
 938 iff
 939
 940 <itemize>
 941 <item> EITHER @type1@ and @type2@ do not unify
 942 <item> OR @type2@ is a substitution instance of @type1@
 943                 (but not identical to @type1@)
 944 <item> OR vice versa
 945 </itemize>
 946
 947 Notice that these rules
 948
 949 <itemize>
 950 <item> make it clear which instance decl to use
 951            (pick the most specific one that matches)
 952
 953 <item> do not mention the contexts @context1@, @context2@
 954             Reason: you can pick which instance decl
 955             "matches" based on the type.
 956 </itemize>
 957
 958 Regrettably, GHC doesn't guarantee to detect overlapping instance
 959 declarations if they appear in different modules.  GHC can "see" the
 960 instance declarations in the transitive closure of all the modules
 961 imported by the one being compiled, so it can "see" all instance decls
 962 when it is compiling <tt>Main</tt>.  However, it currently chooses not
 963 to look at ones that can't possibly be of use in the module currently
 964 being compiled, in the interests of efficiency.  (Perhaps we should
 965 change that decision, at least for <tt>Main</tt>.)
 966
 967 <item> <bf>There are no restrictions on the type in an instance
 968 <em/head/, except that at least one must not be a type variable</bf>.
 969 The instance "head" is the bit after the "=>" in an instance decl. For
 970 example, these are OK:
 971
 972 <tscreen><verb>
 973   instance C Int a where ...
 974
 975   instance D (Int, Int) where ...
 976
 977   instance E [[a]] where ...
 978 </verb></tscreen>
 979
 980 Note that instance heads <bf>may</bf> contain repeated type variables.
 981 For example, this is OK:
 982
 983 <tscreen><verb>
 984   instance Stateful (ST s) (MutVar s) where ...
 985 </verb></tscreen>
 986
 987 The "at least one not a type variable" restriction is to ensure that
 988 context reduction terminates: each reduction step removes one type
 989 constructor.  For example, the following would make the type checker
 990 loop if it wasn't excluded:
 991
 992 <tscreen><verb>
 993   instance C a => C a where ...
 994 </verb></tscreen>
 995
 996 There are two situations in which the rule is a bit of a pain. First,
 997 if one allows overlapping instance declarations then it's quite
 998 convenient to have a "default instance" declaration that applies if
 999 something more specific does not:
1000
1001 <tscreen><verb>
1002   instance C a where
1003     op = ... -- Default
1004 </verb></tscreen>
1005
1006 Second, sometimes you might want to use the following to get the
1007 effect of a "class synonym":
1008
1009 <tscreen><verb>
1010   class (C1 a, C2 a, C3 a) => C a where { }
1011
1012   instance (C1 a, C2 a, C3 a) => C a where { }
1013 </verb></tscreen>
1014
1015 This allows you to write shorter signatures:
1016
1017 <tscreen><verb>
1018   f :: C a => ...
1019 </verb></tscreen>
1020
1021 instead of
1022
1023 <tscreen><verb>
1024   f :: (C1 a, C2 a, C3 a) => ...
1025 </verb></tscreen>
1026
1027 I'm on the lookout for a simple rule that preserves decidability while
1028 allowing these idioms.  The experimental flag
1029 @-fallow-undecidable-instances@<nidx>-fallow-undecidable-instances
1030 option</nidx> lifts this restriction, allowing all the types in an
1031 instance head to be type variables.
1032
1033 <item> <bf>Unlike Haskell 1.4, instance heads may use type
1034 synonyms</bf>.  As always, using a type synonym is just shorthand for
1035 writing the RHS of the type synonym definition.  For example:
1036
1037 <tscreen><verb>
1038   type Point = (Int,Int)
1039   instance C Point   where ...
1040   instance C [Point] where ...
1041 </verb></tscreen>
1042
1043 is legal.  However, if you added
1044
1045 <tscreen><verb>
1046   instance C (Int,Int) where ...
1047 </verb></tscreen>
1048
1049 as well, then the compiler will complain about the overlapping
1050 (actually, identical) instance declarations.  As always, type synonyms
1051 must be fully applied.  You cannot, for example, write:
1052
1053 <tscreen><verb>
1054   type P a = [[a]]
1055   instance Monad P where ...
1056 </verb></tscreen>
1057
1058 This design decision is independent of all the others, and easily
1059 reversed, but it makes sense to me.
1060
1061 <item><bf>The types in an instance-declaration <em/context/ must all
1062 be type variables</bf>. Thus
1063
1064 <tscreen><verb>
1065   instance C a b => Eq (a,b) where ...
1066 </verb></tscreen>
1067
1068 is OK, but
1069
1070 <tscreen><verb>
1071   instance C Int b => Foo b where ...
1072 </verb></tscreen>
1073
1074 is not OK.  Again, the intent here is to make sure that context
1075 reduction terminates.
1076
1077 Voluminous correspondence on the Haskell mailing list has convinced me
1078 that it's worth experimenting with a more liberal rule.  If you use
1079 the flag <tt>-fallow-undecidable-instances</tt> you can use arbitrary
1080 types in an instance context.  Termination is ensured by having a
1081 fixed-depth recursion stack.  If you exceed the stack depth you get a
1082 sort of backtrace, and the opportunity to increase the stack depth
1083 with <tt>-fcontext-stack</tt><em/N/.
1084
1085 </enum>
1086
1087 % -----------------------------------------------------------------------------
1088 <sect1>Explicit universal quantification
1089 <label id="universal-quantification">
1090 <p>
1091
1092 GHC now allows you to write explicitly quantified types.  GHC's
1093 syntax for this now agrees with Hugs's, namely:
1094
1095 <tscreen><verb>
1096         forall a b. (Ord a, Eq  b) => a -> b -> a
1097 </verb></tscreen>
1098
1099 The context is, of course, optional.  You can't use <tt>forall</tt> as
1100 a type variable any more!
1101
1102 Haskell type signatures are implicitly quantified.  The <tt>forall</tt>
1103 allows us to say exactly what this means.  For example:
1104
1105 <tscreen><verb>
1106         g :: b -> b
1107 </verb></tscreen>
1108
1109 means this:
1110
1111 <tscreen><verb>
1112         g :: forall b. (b -> b)
1113 </verb></tscreen>
1114
1115 The two are treated identically.
1116
1117 <sect2>Universally-quantified data type fields
1118 <label id="univ">
1119 <p>
1120
1121 In a <tt>data</tt> or <tt>newtype</tt> declaration one can quantify
1122 the types of the constructor arguments.  Here are several examples:
1123
1124 <tscreen><verb>
1125 data T a = T1 (forall b. b -> b -> b) a
1126
1127 data MonadT m = MkMonad { return :: forall a. a -> m a,
1128                           bind   :: forall a b. m a -> (a -> m b) -> m b
1129                         }
1130
1131 newtype Swizzle = MkSwizzle (Ord a => [a] -> [a])
1132 </verb></tscreen>
1133
1134 The constructors now have so-called <em/rank 2/ polymorphic
1135 types, in which there is a for-all in the argument types.:
1136
1137 <tscreen><verb>
1138 T1 :: forall a. (forall b. b -> b -> b) -> a -> T1 a
1139 MkMonad :: forall m. (forall a. a -> m a)
1140                   -> (forall a b. m a -> (a -> m b) -> m b)
1141                   -> MonadT m
1142 MkSwizzle :: (Ord a => [a] -> [a]) -> Swizzle
1143 </verb></tscreen>
1144
1145 Notice that you don't need to use a <tt>forall</tt> if there's an
1146 explicit context.  For example in the first argument of the
1147 constructor <tt>MkSwizzle</tt>, an implicit "<tt>forall a.</tt>" is
1148 prefixed to the argument type.  The implicit <tt>forall</tt>
1149 quantifies all type variables that are not already in scope, and are
1150 mentioned in the type quantified over.
1151
1152 As for type signatures, implicit quantification happens for non-overloaded
1153 types too.  So if you write this:
1154 <tscreen><verb>
1155   data T a = MkT (Either a b) (b -> b)
1156 </verb></tscreen>
1157 it's just as if you had written this:
1158 <tscreen><verb>
1159   data T a = MkT (forall b. Either a b) (forall b. b -> b)
1160 </verb></tscreen>
1161 That is, since the type variable <tt>b</tt> isn't in scope, it's
1162 implicitly universally quantified.  (Arguably, it would be better
1163 to <em>require</em> explicit quantification on constructor arguments
1164 where that is what is wanted.  Feedback welcomed.)
1165
1166 <sect2> Construction
1167 <p>
1168
1169 You construct values of types <tt>T1, MonadT, Swizzle</tt> by applying
1170 the constructor to suitable values, just as usual.  For example,
1171
1172 <tscreen><verb>
1173 (T1 (\xy->x) 3) :: T Int
1174
1175 (MkSwizzle sort)    :: Swizzle
1176 (MkSwizzle reverse) :: Swizzle
1177
1178 (let r x = Just x
1179      b m k = case m of
1180                 Just y -> k y
1181                 Nothing -> Nothing
1182   in
1183   MkMonad r b) :: MonadT Maybe
1184 </verb></tscreen>
1185
1186 The type of the argument can, as usual, be more general than the type
1187 required, as <tt>(MkSwizzle reverse)</tt> shows.  (<tt>reverse</tt>
1188 does not need the <tt>Ord</tt> constraint.)
1189
1190 <sect2>Pattern matching
1191 <p>
1192
1193 When you use pattern matching, the bound variables may now have
1194 polymorphic types.  For example:
1195
1196 <tscreen><verb>
1197         f :: T a -> a -> (a, Char)
1198         f (T1 f k) x = (f k x, f 'c' 'd')
1199
1200         g :: (Ord a, Ord b) => Swizzle -> [a] -> (a -> b) -> [b]
1201         g (MkSwizzle s) xs f = s (map f (s xs))
1202
1203         h :: MonadT m -> [m a] -> m [a]
1204         h m [] = return m []
1205         h m (x:xs) = bind m x           $ \y ->
1206                       bind m (h m xs)   $ \ys ->
1207                       return m (y:ys)
1208 </verb></tscreen>
1209
1210 In the function <tt>h</tt> we use the record selectors <tt>return</tt>
1211 and <tt>bind</tt> to extract the polymorphic bind and return functions
1212 from the <tt>MonadT</tt> data structure, rather than using pattern
1213 matching.
1214
1215 You cannot pattern-match against an argument that is polymorphic.
1216 For example:
1217 <tscreen><verb>
1218         newtype TIM s a = TIM (ST s (Maybe a))
1219
1220         runTIM :: (forall s. TIM s a) -> Maybe a
1221         runTIM (TIM m) = runST m
1222 </verb></tscreen>
1223
1224 Here the pattern-match fails, because you can't pattern-match against
1225 an argument of type <tt>(forall s. TIM s a)</tt>.  Instead you
1226 must bind the variable and pattern match in the right hand side:
1227 <tscreen><verb>
1228         runTIM :: (forall s. TIM s a) -> Maybe a
1229         runTIM tm = case tm of { TIM m -> runST m }
1230 </verb></tscreen>
1231 The <tt>tm</tt> on the right hand side is (invisibly) instantiated, like
1232 any polymorphic value at its occurrence site, and now you can pattern-match
1233 against it.
1234
1235 <sect2>The partial-application restriction
1236 <p>
1237
1238 There is really only one way in which data structures with polymorphic
1239 components might surprise you: you must not partially apply them.
1240 For example, this is illegal:
1241
1242 <tscreen><verb>
1243         map MkSwizzle [sort, reverse]
1244 </verb></tscreen>
1245
1246 The restriction is this: <em>every subexpression of the program must
1247 have a type that has no for-alls, except that in a function
1248 application (f e1 ... en) the partial applications are not subject to
1249 this rule</em>.  The restriction makes type inference feasible.
1250
1251 In the illegal example, the sub-expression <tt>MkSwizzle</tt> has the
1252 polymorphic type <tt>(Ord b => [b] -> [b]) -> Swizzle</tt> and is not
1253 a sub-expression of an enclosing application.  On the other hand, this
1254 expression is OK:
1255
1256 <tscreen><verb>
1257         map (T1 (\a b -> a)) [1,2,3]
1258 </verb></tscreen>
1259
1260 even though it involves a partial application of <tt>T1</tt>, because
1261 the sub-expression <tt>T1 (\a b -> a)</tt> has type <tt>Int -> T
1262 Int</tt>.
1263
1264 <sect2>Type signatures
1265 <label id="sigs">
1266 <p>
1267
1268 Once you have data constructors with universally-quantified fields, or
1269 constants such as <tt>runST</tt> that have rank-2 types, it isn't long
1270 before you discover that you need more!  Consider:
1271
1272 <tscreen><verb>
1273   mkTs f x y = [T1 f x, T1 f y]
1274 </verb></tscreen>
1275
1276 <tt>mkTs</tt> is a fuction that constructs some values of type
1277 <tt>T</tt>, using some pieces passed to it.  The trouble is that since
1278 <tt>f</tt> is a function argument, Haskell assumes that it is
1279 monomorphic, so we'll get a type error when applying <tt>T1</tt> to
1280 it.  This is a rather silly example, but the problem really bites in
1281 practice.  Lots of people trip over the fact that you can't make
1282 "wrappers functions" for <tt>runST</tt> for exactly the same reason.
1283 In short, it is impossible to build abstractions around functions with
1284 rank-2 types.
1285
1286 The solution is fairly clear.  We provide the ability to give a rank-2
1287 type signature for <em>ordinary</em> functions (not only data
1288 constructors), thus:
1289
1290 <tscreen><verb>
1291   mkTs :: (forall b. b -> b -> b) -> a -> [T a]
1292   mkTs f x y = [T1 f x, T1 f y]
1293 </verb></tscreen>
1294
1295 This type signature tells the compiler to attribute <tt>f</tt> with
1296 the polymorphic type <tt>(forall b. b -> b -> b)</tt> when type
1297 checking the body of <tt>mkTs</tt>, so now the application of
1298 <tt>T1</tt> is fine.
1299
1300 There are two restrictions:
1301
1302 <itemize>
1303 <item> You can only define a rank 2 type, specified by the following
1304 grammar:
1305
1306 <tscreen><verb>
1307    rank2type ::= [forall tyvars .] [context =>] funty
1308    funty     ::= ([forall tyvars .] [context =>] ty) -> funty
1309                | ty
1310    ty        ::= ...current Haskell monotype syntax...
1311 </verb></tscreen>
1312
1313 Informally, the universal quantification must all be right at the beginning,
1314 or at the top level of a function argument.
1315
1316 <item> There is a restriction on the definition of a function whose
1317 type signature is a rank-2 type: the polymorphic arguments must be
1318 matched on the left hand side of the "<tt>=</tt>" sign.  You can't
1319 define <tt>mkTs</tt> like this:
1320
1321 <tscreen><verb>
1322   mkTs :: (forall b. b -> b -> b) -> a -> [T a]
1323   mkTs = \ f x y -> [T1 f x, T1 f y]
1324 </verb></tscreen>
1325
1326
1327 The same partial-application rule applies to ordinary functions with
1328 rank-2 types as applied to data constructors.
1329
1330 </itemize>
1331
1332 % -----------------------------------------------------------------------------
1333 <sect1>Existentially quantified data constructors
1334 <label id="existential-quantification">
1335 <p>
1336
1337 The idea of using existential quantification in data type declarations
1338 was suggested by Laufer (I believe, thought doubtless someone will
1339 correct me), and implemented in Hope+. It's been in Lennart
1340 Augustsson's <tt>hbc</tt> Haskell compiler for several years, and
1341 proved very useful.  Here's the idea.  Consider the declaration:
1342
1343 <tscreen><verb>
1344   data Foo = forall a. MkFoo a (a -> Bool)
1345            | Nil
1346 </verb></tscreen>
1347
1348 The data type <tt>Foo</tt> has two constructors with types:
1349
1350 <tscreen><verb>
1351   MkFoo :: forall a. a -> (a -> Bool) -> Foo
1352   Nil   :: Foo
1353 </verb></tscreen>
1354
1355 Notice that the type variable <tt>a</tt> in the type of <tt>MkFoo</tt>
1356 does not appear in the data type itself, which is plain <tt>Foo</tt>.
1357 For example, the following expression is fine:
1358
1359 <tscreen><verb>
1360   [MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]
1361 </verb></tscreen>
1362
1363 Here, <tt>(MkFoo 3 even)</tt> packages an integer with a function
1364 <tt>even</tt> that maps an integer to <tt>Bool</tt>; and <tt>MkFoo 'c'
1365 isUpper</tt> packages a character with a compatible function.  These
1366 two things are each of type <tt>Foo</tt> and can be put in a list.
1367
1368 What can we do with a value of type <tt>Foo</tt>?.  In particular,
1369 what happens when we pattern-match on <tt>MkFoo</tt>?
1370
1371 <tscreen><verb>
1372   f (MkFoo val fn) = ???
1373 </verb></tscreen>
1374
1375 Since all we know about <tt>val</tt> and <tt>fn</tt> is that they
1376 are compatible, the only (useful) thing we can do with them is to
1377 apply <tt>fn</tt> to <tt>val</tt> to get a boolean.  For example:
1378
1379 <tscreen><verb>
1380   f :: Foo -> Bool
1381   f (MkFoo val fn) = fn val
1382 </verb></tscreen>
1383
1384 What this allows us to do is to package heterogenous values
1385 together with a bunch of functions that manipulate them, and then treat
1386 that collection of packages in a uniform manner.  You can express
1387 quite a bit of object-oriented-like programming this way.
1388
1389 <sect2>Why existential?
1390 <label id="existential">
1391 <p>
1392
1393 What has this to do with <em>existential</em> quantification?
1394 Simply that <tt>MkFoo</tt> has the (nearly) isomorphic type
1395
1396 <tscreen><verb>
1397   MkFoo :: (exists a . (a, a -> Bool)) -> Foo
1398 </verb></tscreen>
1399
1400 But Haskell programmers can safely think of the ordinary
1401 <em>universally</em> quantified type given above, thereby avoiding
1402 adding a new existential quantification construct.
1403
1404 <sect2>Type classes
1405 <p>
1406
1407 An easy extension (implemented in <tt>hbc</tt>) is to allow
1408 arbitrary contexts before the constructor.  For example:
1409
1410 <tscreen><verb>
1411   data Baz = forall a. Eq a => Baz1 a a
1412            | forall b. Show b => Baz2 b (b -> b)
1413 </verb></tscreen>
1414
1415 The two constructors have the types you'd expect:
1416
1417 <tscreen><verb>
1418   Baz1 :: forall a. Eq a => a -> a -> Baz
1419   Baz2 :: forall b. Show b => b -> (b -> b) -> Baz
1420 </verb></tscreen>
1421
1422 But when pattern matching on <tt>Baz1</tt> the matched values can be compared
1423 for equality, and when pattern matching on <tt>Baz2</tt> the first matched
1424 value can be converted to a string (as well as applying the function to it).
1425 So this program is legal:
1426
1427 <tscreen><verb>
1428   f :: Baz -> String
1429   f (Baz1 p q) | p == q    = "Yes"
1430                | otherwise = "No"
1431   f (Baz1 v fn)            = show (fn v)
1432 </verb></tscreen>
1433
1434 Operationally, in a dictionary-passing implementation, the
1435 constructors <tt>Baz1</tt> and <tt>Baz2</tt> must store the
1436 dictionaries for <tt>Eq</tt> and <tt>Show</tt> respectively, and
1437 extract it on pattern matching.
1438
1439 Notice the way that the syntax fits smoothly with that used for
1440 universal quantification earlier.
1441
1442 <sect2>Restrictions
1443 <p>
1444
1445 There are several restrictions on the ways in which existentially-quantified
1446 constructors can be use.
1447
1448 <itemize>
1449
1450 <item> When pattern matching, each pattern match introduces a new,
1451 distinct, type for each existential type variable.  These types cannot
1452 be unified with any other type, nor can they escape from the scope of
1453 the pattern match.  For example, these fragments are incorrect:
1454
1455 <tscreen><verb>
1456   f1 (MkFoo a f) = a
1457 </verb></tscreen>
1458
1459 Here, the type bound by <tt>MkFoo</tt> "escapes", because <tt>a</tt>
1460 is the result of <tt>f1</tt>.  One way to see why this is wrong is to
1461 ask what type <tt>f1</tt> has:
1462
1463 <tscreen><verb>
1464   f1 :: Foo -> a             -- Weird!
1465 </verb></tscreen>
1466
1467 What is this "<tt>a</tt>" in the result type? Clearly we don't mean
1468 this:
1469
1470 <tscreen><verb>
1471   f1 :: forall a. Foo -> a   -- Wrong!
1472 </verb></tscreen>
1473
1474 The original program is just plain wrong.  Here's another sort of error
1475
1476 <tscreen><verb>
1477   f2 (Baz1 a b) (Baz1 p q) = a==q
1478 </verb></tscreen>
1479
1480 It's ok to say <tt>a==b</tt> or <tt>p==q</tt>, but
1481 <tt>a==q</tt> is wrong because it equates the two distinct types arising
1482 from the two <tt>Baz1</tt> constructors.
1483
1484
1485 <item>You can't pattern-match on an existentially quantified
1486 constructor in a <tt>let</tt> or <tt>where</tt> group of
1487 bindings. So this is illegal:
1488
1489 <tscreen><verb>
1490   f3 x = a==b where { Baz1 a b = x }
1491 </verb></tscreen>
1492
1493 You can only pattern-match
1494 on an existentially-quantified constructor in a <tt>case</tt> expression or
1495 in the patterns of a function definition.
1496
1497 The reason for this restriction is really an implementation one.
1498 Type-checking binding groups is already a nightmare without
1499 existentials complicating the picture.  Also an existential pattern
1500 binding at the top level of a module doesn't make sense, because it's
1501 not clear how to prevent the existentially-quantified type "escaping".
1502 So for now, there's a simple-to-state restriction.  We'll see how
1503 annoying it is.
1504
1505 <item>You can't use existential quantification for <tt>newtype</tt>
1506 declarations.  So this is illegal:
1507
1508 <tscreen><verb>
1509   newtype T = forall a. Ord a => MkT a
1510 </verb></tscreen>
1511
1512 Reason: a value of type <tt>T</tt> must be represented as a pair
1513 of a dictionary for <tt>Ord t</tt> and a value of type <tt>t</tt>.
1514 That contradicts the idea that <tt>newtype</tt> should have no
1515 concrete representation.  You can get just the same efficiency and effect
1516 by using <tt>data</tt> instead of <tt>newtype</tt>.  If there is no
1517 overloading involved, then there is more of a case for allowing
1518 an existentially-quantified <tt>newtype</tt>, because the <tt>data</tt>
1519 because the <tt>data</tt> version does carry an implementation cost,
1520 but single-field existentially quantified constructors aren't much
1521 use.  So the simple restriction (no existential stuff on <tt>newtype</tt>)
1522 stands, unless there are convincing reasons to change it.
1523
1524
1525 <item> You can't use <tt>deriving</tt> to define instances of a
1526 data type with existentially quantified data constructors.
1527
1528 Reason: in most cases it would not make sense. For example:#
1529 <tscreen><verb>
1530   data T = forall a. MkT [a] deriving( Eq )
1531 </verb></tscreen>
1532 To derive <tt>Eq</tt> in the standard way we would need to have equality
1533 between the single component of two <tt>MkT</tt> constructors:
1534 <tscreen><verb>
1535   instance Eq T where
1536     (MkT a) == (MkT b) = ???
1537 </verb></tscreen>
1538 But <tt>a</tt> and <tt>b</tt> have distinct types, and so can't be compared.
1539 It's just about possible to imagine examples in which the derived instance
1540 would make sense, but it seems altogether simpler simply to prohibit such
1541 declarations.  Define your own instances!
1542 </itemize>
1543
1544
1545 <sect1> <idx/Assertions/
1546 <label id="sec:assertions">
1547 <p>
1548
1549 If you want to make use of assertions in your standard Haskell code, you
1550 could define a function like the following:
1551
1552 <tscreen><verb>
1553 assert :: Bool -> a -> a
1554 assert False x = error "assertion failed!"
1555 assert _     x = x
1556 </verb></tscreen>
1557
1558 which works, but gives you back a less than useful error message --
1559 an assertion failed, but which and where?
1560
1561 One way out is to define an extended <tt/assert/ function which also
1562 takes a descriptive string to include in the error message and
1563 perhaps combine this with the use of a pre-processor which inserts
1564 the source location where <tt/assert/ was used.
1565
1566 Ghc offers a helping hand here, doing all of this for you. For every
1567 use of <tt/assert/ in the user's source:
1568
1569 <tscreen><verb>
1570 kelvinToC :: Double -> Double
1571 kelvinToC k = assert (k &gt;= 0.0) (k+273.15)
1572 </verb></tscreen>
1573
1574 Ghc will rewrite this to also include the source location where the
1575 assertion was made,
1576
1577 <tscreen><verb>
1578 assert pred val ==> assertError "Main.hs|15" pred val
1579 </verb></tscreen>
1580
1581 The rewrite is only performed by the compiler when it spots
1582 applications of <tt>Exception.assert</tt>, so you can still define and
1583 use your own versions of <tt/assert/, should you so wish. If not,
1584 import <tt/Exception/ to make use <tt/assert/ in your code.
1585
1586 To have the compiler ignore uses of assert, use the compiler option
1587 @-fignore-asserts@. <nidx>-fignore-asserts option</nidx> That is,
1588 expressions of the form @assert pred e@ will be rewritten to @e@.
1589
1590 Assertion failures can be caught, see the documentation for the
1591 Hugs/GHC Exception library for information of how.
1592
1593 % -----------------------------------------------------------------------------
1594 <sect1>Scoped Type Variables
1595 <label id="scoped-type-variables">
1596 <p>
1597
1598 A <em/pattern type signature/ can introduce a <em/scoped type
1599 variable/.  For example
1600
1601 <tscreen><verb>
1602 f (xs::[a]) = ys ++ ys
1603            where
1604               ys :: [a]
1605               ys = reverse xs
1606 </verb></tscreen>
1607
1608 The pattern @(xs::[a])@ includes a type signature for @xs@.
1609 This brings the type variable @a@ into scope; it scopes over
1610 all the patterns and right hand sides for this equation for @f@.
1611 In particular, it is in scope at the type signature for @y@.
1612
1613 At ordinary type signatures, such as that for @ys@, any type variables
1614 mentioned in the type signature <em/that are not in scope/ are
1615 implicitly universally quantified.  (If there are no type variables in
1616 scope, all type variables mentioned in the signature are universally
1617 quantified, which is just as in Haskell 98.)  In this case, since @a@
1618 is in scope, it is not universally quantified, so the type of @ys@ is
1619 the same as that of @xs@.  In Haskell 98 it is not possible to declare
1620 a type for @ys@; a major benefit of scoped type variables is that
1621 it becomes possible to do so.
1622
1623 Scoped type variables are implemented in both GHC and Hugs.  Where the
1624 implementations differ from the specification below, those differences
1625 are noted.
1626
1627 So much for the basic idea.  Here are the details.
1628
1629 <sect2>Scope and implicit quantification
1630 <p>
1631
1632 <itemize>
1633 <item> All the type variables mentioned in the patterns for a single
1634 function definition equation, that are not already in scope,
1635 are brought into scope by the patterns.  We describe this set as
1636 the <em/type variables bound by the equation/.
1637
1638 <item> The type variables thus brought into scope may be mentioned
1639 in ordinary type signatures or pattern type signatures anywhere within
1640 their scope.
1641
1642 <item> In ordinary type signatures, any type variable mentioned in the
1643 signature that is in scope is <em/not/ universally quantified.
1644
1645 <item> Ordinary type signatures do not bring any new type variables
1646 into scope (except in the type signature itself!). So this is illegal:
1647
1648 <tscreen><verb>
1649   f :: a -> a
1650   f x = x::a
1651 </verb></tscreen>
1652
1653 It's illegal because @a@ is not in scope in the body of @f@,
1654 so the ordinary signature @x::a@ is equivalent to @x::forall a.a@;
1655 and that is an incorrect typing.
1656
1657 <item> There is no implicit universal quantification on pattern type
1658 signatures, nor may one write an explicit @forall@ type in a pattern
1659 type signature.  The pattern type signature is a monotype.
1660
1661 <item>
1662 The type variables in the head of a @class@ or @instance@ declaration
1663 scope over the methods defined in the @where@ part.  For example:
1664
1665 <tscreen><verb>
1666   class C a where
1667     op :: [a] -> a
1668
1669     op xs = let ys::[a]
1670                 ys = reverse xs
1671             in
1672             head ys
1673 </verb></tscreen>
1674
1675 (Not implemented in Hugs yet, Dec 98).
1676 </itemize>
1677
1678 <sect2>Polymorphism
1679 <p>
1680
1681 <itemize>
1682 <item> Pattern type signatures are completely orthogonal to ordinary, separate
1683 type signatures.  The two can be used independently or together.  There is
1684 no scoping associated with the names of the type variables in a separate type signature.
1685
1686 <tscreen><verb>
1687    f :: [a] -> [a]
1688    f (xs::[b]) = reverse xs
1689 </verb></tscreen>
1690
1691 <item> The function must be polymorphic in the type variables
1692 bound by all its equations.  Operationally, the type variables bound
1693 by one equation must not:
1694
1695 <itemize>
1696 <item> Be unified with a type (such as @Int@, or @[a]@).
1697 <item> Be unified with a type variable free in the environment.
1698 <item> Be unified with each other.  (They may unify with the type variables
1699 bound by another equation for the same function, of course.)
1700 </itemize>
1701
1702 For example, the following all fail to type check:
1703
1704 <tscreen><verb>
1705   f (x::a) (y::b) = [x,y]       -- a unifies with b
1706
1707   g (x::a) = x + 1::Int         -- a unifies with Int
1708
1709   h x = let k (y::a) = [x,y]    -- a is free in the
1710         in k x                  -- environment
1711
1712   k (x::a) True    = ...        -- a unifies with Int
1713   k (x::Int) False = ...
1714
1715   w :: [b] -> [b]
1716   w (x::a) = x                  -- a unifies with [b]
1717 </verb></tscreen>
1718
1719 <item> The pattern-bound type variable may, however, be constrained
1720 by the context of the principal type, thus:
1721
1722 <tscreen><verb>
1723   f (x::a) (y::a) = x+y*2
1724 </verb></tscreen>
1725
1726 gets the inferred type: @forall a. Num a => a -> a -> a@.
1727 </itemize>
1728
1729 <sect2>Result type signatures
1730 <p>
1731
1732 <itemize>
1733 <item> The result type of a function can be given a signature,
1734 thus:
1735
1736 <tscreen><verb>
1737   f (x::a) :: [a] = [x,x,x]
1738 </verb></tscreen>
1739
1740 The final @":: [a]"@ after all the patterns gives a signature to the
1741 result type.  Sometimes this is the only way of naming the type variable
1742 you want:
1743
1744 <tscreen><verb>
1745   f :: Int -> [a] -> [a]
1746   f n :: ([a] -> [a]) = let g (x::a, y::a) = (y,x)
1747                         in \xs -> map g (reverse xs `zip` xs)
1748 </verb></tscreen>
1749
1750 </itemize>
1751
1752 Result type signatures are not yet implemented in Hugs.
1753
1754 <sect2>Pattern signatures on other constructs
1755 <p>
1756
1757 <itemize>
1758 <item> A pattern type signature can be on an arbitrary sub-pattern, not
1759 just on a variable:
1760
1761 <tscreen><verb>
1762   f ((x,y)::(a,b)) = (y,x) :: (b,a)
1763 </verb></tscreen>
1764
1765 <item> Pattern type signatures, including the result part, can be used
1766 in lambda abstractions:
1767
1768 <tscreen><verb>
1769   (\ (x::a, y) :: a -> x)
1770 </verb></tscreen>
1771
1772 Type variables bound by these patterns must be polymorphic in
1773 the sense defined above.
1774 For example:
1775
1776 <tscreen><verb>
1777   f1 (x::c) = f1 x      -- ok
1778   f2 = \(x::c) -> f2 x  -- not ok
1779 </verb></tscreen>
1780
1781 Here, @f1@ is OK, but @f2@ is not, because @c@ gets unified
1782 with a type variable free in the environment, in this
1783 case, the type of @f2@, which is in the environment when
1784 the lambda abstraction is checked.
1785
1786 <item> Pattern type signatures, including the result part, can be used
1787 in @case@ expressions:
1788
1789 <tscreen><verb>
1790   case e of { (x::a, y) :: a -> x }
1791 </verb></tscreen>
1792
1793 The pattern-bound type variables must, as usual,
1794 be polymorphic in the following sense: each case alternative,
1795 considered as a lambda abstraction, must be polymorphic.
1796 Thus this is OK:
1797
1798 <tscreen><verb>
1799   case (True,False) of { (x::a, y) -> x }
1800 </verb></tscreen>
1801
1802 Even though the context is that of a pair of booleans,
1803 the alternative itself is polymorphic.  Of course, it is
1804 also OK to say:
1805
1806 <tscreen><verb>
1807   case (True,False) of { (x::Bool, y) -> x }
1808 </verb></tscreen>
1809
1810 <item>
1811 To avoid ambiguity, the type after the ``@::@'' in a result
1812 pattern signature on a lambda or @case@ must be atomic (i.e. a single
1813 token or a parenthesised type of some sort).  To see why,
1814 consider how one would parse this:
1815
1816 <tscreen><verb>
1817   \ x :: a -> b -> x
1818 </verb></tscreen>
1819
1820 <item> Pattern type signatures that bind new type variables
1821 may not be used in pattern bindings at all.
1822 So this is illegal:
1823
1824 <tscreen><verb>
1825   f x = let (y, z::a) = x in ...
1826 </verb></tscreen>
1827
1828 But these are OK, because they do not bind fresh type variables:
1829
1830 <tscreen><verb>
1831   f1 x            = let (y, z::Int) = x in ...
1832   f2 (x::(Int,a)) = let (y, z::a)   = x in ...
1833 </verb></tscreen>
1834
1835 However a single variable is considered a degenerate function binding,
1836 rather than a degerate pattern binding, so this is permitted, even
1837 though it binds a type variable:
1838
1839 <tscreen><verb>
1840   f :: (b->b) = \(x::b) -> x
1841 </verb></tscreen>
1842
1843 </itemize>
1844 Such degnerate function bindings do not fall under the monomorphism
1845 restriction.  Thus:
1846
1847 <tscreen><verb>
1848   g :: a -> a -> Bool = \x y. x==y
1849 </verb></tscreen>
1850
1851 Here @g@ has type @forall a. Eq a => a -> a -> Bool@, just as if
1852 @g@ had a separate type signature.  Lacking a type signature, @g@
1853 would get a monomorphic type.
1854
1855 <sect2>Existentials
1856 <p>
1857
1858 <itemize>
1859 <item> Pattern type signatures can bind existential type variables.
1860 For example:
1861
1862 <tscreen><verb>
1863   data T = forall a. MkT [a]
1864
1865   f :: T -> T
1866   f (MkT [t::a]) = MkT t3
1867                  where
1868                    t3::[a] = [t,t,t]
1869 </verb></tscreen>
1870
1871 </itemize>
1872
1873 %-----------------------------------------------------------------------------
1874 <sect1>Pragmas
1875 <label id="pragmas">
1876 <p>
1877
1878 GHC supports several pragmas, or instructions to the compiler placed
1879 in the source code.  Pragmas don't affect the meaning of the program,
1880 but they might affect the efficiency of the generated code.
1881
1882 <sect2>INLINE pragma
1883 <label id="inline-pragma">
1884 <nidx>INLINE pragma</nidx>
1885 <nidx>pragma, INLINE</nidx>
1886 <p>
1887
1888 GHC (with @-O@, as always) tries to inline (or ``unfold'')
1889 functions/values that are ``small enough,'' thus avoiding the call
1890 overhead and possibly exposing other more-wonderful optimisations.
1891
1892 You will probably see these unfoldings (in Core syntax) in your
1893 interface files.
1894
1895 Normally, if GHC decides a function is ``too expensive'' to inline, it
1896 will not do so, nor will it export that unfolding for other modules to
1897 use.
1898
1899 The sledgehammer you can bring to bear is the
1900 @INLINE@<nidx>INLINE pragma</nidx> pragma, used thusly:
1901 <tscreen><verb>
1902 key_function :: Int -> String -> (Bool, Double)
1903
1904 #ifdef __GLASGOW_HASKELL__
1905 {-# INLINE key_function #-}
1906 #endif
1907 </verb></tscreen>
1908 (You don't need to do the C pre-processor carry-on unless you're going
1909 to stick the code through HBC---it doesn't like @INLINE@ pragmas.)
1910
1911 The major effect of an @INLINE@ pragma is to declare a function's
1912 ``cost'' to be very low.  The normal unfolding machinery will then be
1913 very keen to inline it.
1914
1915 An @INLINE@ pragma for a function can be put anywhere its type
1916 signature could be put.
1917
1918 @INLINE@ pragmas are a particularly good idea for the
1919 @then@/@return@ (or @bind@/@unit@) functions in a monad.
1920 For example, in GHC's own @UniqueSupply@ monad code, we have:
1921 <tscreen><verb>
1922 #ifdef __GLASGOW_HASKELL__
1923 {-# INLINE thenUs #-}
1924 {-# INLINE returnUs #-}
1925 #endif
1926 </verb></tscreen>
1927
1928 <sect2>NOINLINE pragma
1929 <label id="noinline-pragma">
1930 <p>
1931 <nidx>NOINLINE pragma</nidx>
1932 <nidx>pragma, NOINLINE</nidx>
1933
1934 The @NOINLINE@ pragma does exactly what you'd expect: it stops the
1935 named function from being inlined by the compiler.  You shouldn't ever
1936 need to do this, unless you're very cautious about code size.
1937
1938 <sect2>SPECIALIZE pragma
1939 <label id="specialize-pragma">
1940 <p>
1941 <nidx>SPECIALIZE pragma</nidx>
1942 <nidx>pragma, SPECIALIZE</nidx>
1943 <nidx>overloading, death to</nidx>
1944
1945 (UK spelling also accepted.)  For key overloaded functions, you can
1946 create extra versions (NB: more code space) specialised to particular
1947 types.  Thus, if you have an overloaded function:
1948
1949 <tscreen><verb>
1950 hammeredLookup :: Ord key => [(key, value)] -> key -> value
1951 </verb></tscreen>
1952
1953 If it is heavily used on lists with @Widget@ keys, you could
1954 specialise it as follows:
1955 <tscreen><verb>
1956 {-# SPECIALIZE hammeredLookup :: [(Widget, value)] -> Widget -> value #-}
1957 </verb></tscreen>
1958
1959 To get very fancy, you can also specify a named function to use for
1960 the specialised value, by adding @= blah@, as in:
1961 <tscreen><verb>
1962 {-# SPECIALIZE hammeredLookup :: ...as before... = blah #-}
1963 </verb></tscreen>
1964 It's <em>Your Responsibility</em> to make sure that @blah@ really
1965 behaves as a specialised version of @hammeredLookup@!!!
1966
1967 NOTE: the @=blah@ feature isn't implemented in GHC 4.xx.
1968
1969 An example in which the @= blah@ form will Win Big:
1970 <tscreen><verb>
1971 toDouble :: Real a => a -> Double
1972 toDouble = fromRational . toRational
1973
1974 {-# SPECIALIZE toDouble :: Int -> Double = i2d #-}
1975 i2d (I# i) = D# (int2Double# i) -- uses Glasgow prim-op directly
1976 </verb></tscreen>
1977 The @i2d@ function is virtually one machine instruction; the
1978 default conversion---via an intermediate @Rational@---is obscenely
1979 expensive by comparison.
1980
1981 By using the US spelling, your @SPECIALIZE@ pragma will work with
1982 HBC, too.  Note that HBC doesn't support the @= blah@ form.
1983
1984 A @SPECIALIZE@ pragma for a function can be put anywhere its type
1985 signature could be put.
1986
1987 <sect2>SPECIALIZE instance pragma
1988 <label id="specialize-instance-pragma">
1989 <p>
1990 <nidx>SPECIALIZE pragma</nidx>
1991 <nidx>overloading, death to</nidx>
1992 Same idea, except for instance declarations.  For example:
1993 <tscreen><verb>
1994 instance (Eq a) => Eq (Foo a) where { ... usual stuff ... }
1995
1996 {-# SPECIALIZE instance Eq (Foo [(Int, Bar)] #-}
1997 </verb></tscreen>
1998 Compatible with HBC, by the way.
1999
2000 <sect2>LINE pragma
2001 <label id="line-pragma">
2002 <p>
2003 <nidx>LINE pragma</nidx>
2004 <nidx>pragma, LINE</nidx>
2005
2006 This pragma is similar to C's @#line@ pragma, and is mainly for use in
2007 automatically generated Haskell code.  It lets you specify the line
2008 number and filename of the original code; for example
2009
2010 <tscreen><verb>
2011 {-# LINE 42 "Foo.vhs" #-}
2012 </verb></tscreen>
2013
2014 if you'd generated the current file from something called @Foo.vhs@
2015 and this line corresponds to line 42 in the original.  GHC will adjust
2016 its error messages to refer to the line/file named in the @LINE@
2017 pragma.
2018
2019 <sect2>RULES pragma
2020 <p>
2021 The RULES pragma lets you specify rewrite rules.  It is described in
2022 Section <ref name="Rewrite Rules"
2023 id="rewrite-rules">.
2024
2025 %-----------------------------------------------------------------------------
2026 <sect1>Rewrite rules
2027 <label id="rewrite-rules">
2028 <nidx>RULES pagma</nidx>
2029 <nidx>pragma, RULES</nidx>
2030 <nidx>rewrite rules</nidx>
2031 <p>
2032
2033 The programmer can specify rewrite rules as part of the source program
2034 (in a pragma).  GHC applies these rewrite rules wherever it can.
2035
2036 Here is an example:
2037 <tscreen><verb>
2038   {-# RULES
2039         "map/map"       forall f g xs. map f (map g xs) = map (f.g) xs
2040   #-}
2041 </verb></tscreen>
2042
2043 <sect2>Syntax
2044 <p>
2045
2046 From a syntactic point of view:
2047 <itemize>
2048 <item> Each rule has a name, enclosed in double quotes.  The name itself has
2049 no significance at all.  It is only used when reporting how many times the rule fired.
2050 <item> There may be zero or more rules in a @RULES@ pragma.
2051 <item> Layout applies in a @RULES@ pragma.  Currently no new indentation level
2052 is set, so you must lay out your rules starting in the same column as the
2053 enclosing definitions.
2054 <item> Each variable mentioned in a rule must either be in scope (e.g. @map@),
2055 or bound by the @forall@ (e.g. @f@, @g@, @xs@).  The variables bound by
2056 the @forall@ are called the <em>pattern</em> variables.  They are separated
2057 by spaces, just like in a type @forall@.
2058 <item> A pattern variable may optionally have a type signature.
2059 If the type of the pattern variable is polymorphic, it <em>must</em> have a type signature.
2060 For example, here is the @foldr/build@ rule:
2061 <tscreen><verb>
2062   "fold/build"  forall k z (g::forall b. (a->b->b) -> b -> b) .
2063                 foldr k z (build g) = g k z
2064 </verb></tscreen>
2065 Since @g@ has a polymorphic type, it must have a type signature.
2066
2067 <item> The left hand side of a rule must consist of a top-level variable applied
2068 to arbitrary expressions.  For example, this is <em>not</em> OK:
2069 <tscreen><verb>
2070   "wrong1"   forall e1 e2.  case True of { True -> e1; False -> e2 } = e1
2071   "wrong2"   forall f.      f True = True
2072 </verb></tscreen>
2073 In @"wrong1"@, the LHS is not an application; in @"wrong1"@, the LHS has a pattern variable
2074 in the head.
2075 <item> A rule does not need to be in the same module as (any of) the
2076 variables it mentions, though of course they need to be in scope.
2077 <item> Rules are automatically exported from a module, just as instance declarations are.
2078 </itemize>
2079
2080 <sect2>Semantics
2081 <p>
2082
2083 From a semantic point of view:
2084 <itemize>
2085 <item> Rules are only applied if you use the @-O@ flag.
2086
2087 <item> Rules are regarded as left-to-right rewrite rules.
2088 When GHC finds an expression that is a substitution instance of the LHS
2089 of a rule, it replaces the expression by the (appropriately-substituted) RHS.
2090 By "a substitution instance" we mean that the LHS can be made equal to the
2091 expression by substituting for the pattern variables.
2092
2093 <item> The LHS and RHS of a rule are typechecked, and must have the
2094 same type.
2095
2096 <item> GHC makes absolutely no attempt to verify that the LHS and RHS
2097 of a rule have the same meaning.  That is undecideable in general, and
2098 infeasible in most interesting cases.  The responsibility is entirely the programmer's!
2099
2100 <item> GHC makes no attempt to make sure that the rules are confluent or
2101 terminating.  For example:
2102 <tscreen><verb>
2103   "loop"        forall x,y.  f x y = f y x
2104 </verb></tscreen>
2105 This rule will cause the compiler to go into an infinite loop.
2106
2107 <item> If more than one rule matches a call, GHC will choose one arbitrarily to apply.
2108
2109 <item> GHC currently uses a very simple, syntactic, matching algorithm
2110 for matching a rule LHS with an expression.  It seeks a substitution
2111 which makes the LHS and expression syntactically equal modulo alpha
2112 conversion.  The pattern (rule), but not the expression, is eta-expanded if
2113 necessary.  (Eta-expanding the epression can lead to laziness bugs.)
2114 But not beta conversion (that's called higher-order matching).
2115 <p>
2116 Matching is carried out on GHC's intermediate language, which includes
2117 type abstractions and applications.  So a rule only matches if the
2118 types match too.  See Section <ref name="Specialisation" id="rule-spec"> below.
2119
2120 <item> GHC keeps trying to apply the rules as it optimises the program.
2121 For example, consider:
2122 <tscreen><verb>
2123   let s = map f
2124       t = map g
2125   in
2126   s (t xs)
2127 </verb></tscreen>
2128 The expression @s (t xs)@ does not match the rule @"map/map"@, but GHC
2129 will substitute for @s@ and @t@, giving an expression which does match.
2130 If @s@ or @t@ was (a) used more than once, and (b) large or a redex, then it would
2131 not be substituted, and the rule would not fire.
2132
2133 <item> In the earlier phases of compilation, GHC inlines <em>nothing
2134 that appears on the LHS of a rule</em>, because once you have substituted
2135 for something you can't match against it (given the simple minded
2136 matching).  So if you write the rule
2137 <tscreen><verb>
2138         "map/map"       forall f,g.  map f . map g = map (f.g)
2139 </verb></tscreen>
2140 this <em>won't</em> match the expression @map f (map g xs)@.
2141 It will only match something written with explicit use of ".".
2142 Well, not quite.  It <em>will</em> match the expression
2143 <tscreen><verb>
2144         wibble f g xs
2145 </verb></tscreen>
2146 where @wibble@ is defined:
2147 <tscreen><verb>
2148         wibble f g = map f . map g
2149 </verb></tscreen>
2150 because @wibble@ will be inlined (it's small).
2151
2152 Later on in compilation, GHC starts inlining even things on the
2153 LHS of rules, but still leaves the rules enabled.  This inlining
2154 policy is controlled by the per-simplification-pass flag @-finline-phase@n.
2155
2156 <item> All rules are implicitly exported from the module, and are therefore
2157 in force in any module that imports the module that defined the rule, directly
2158 or indirectly.  (That is, if A imports B, which imports C, then C's rules are
2159 in force when compiling A.)  The situation is very similar to that for instance
2160 declarations.
2161 </itemize>
2162
2163 <sect2>List fusion
2164 <p>
2165
2166 The RULES mechanism is used to implement fusion (deforestation) of common list functions.
2167 If a "good consumer" consumes an intermediate list constructed by a "good producer", the
2168 intermediate list should be eliminated entirely.
2169 <p>
2170 The following are good producers:
2171 <itemize>
2172 <item> List comprehensions
2173 <item> Enumerations of @Int@ and @Char@ (e.g. @['a'..'z']@).
2174 <item> Explicit lists (e.g. @[True, False]@)
2175 <item> The cons constructor (e.g @3:4:[]@)
2176 <item> @++@
2177 <item> @map@
2178 <item> @filter@
2179 <item> @iterate@, @repeat@
2180 <item> @zip@, @zipWith@
2181 </itemize>
2182
2183 The following are good consumers:
2184 <itemize>
2185 <item> List comprehensions
2186 <item> @array@ (on its second argument)
2187 <item> @length@
2188 <item> @++@ (on its first argument)
2189 <item> @map@
2190 <item> @filter@
2191 <item> @concat@
2192 <item> @unzip@, @unzip2@, @unzip3@, @unzip4@
2193 <item> @zip@, @zipWith@ (but on one argument only; if both are good producers, @zip@
2194 will fuse with one but not the other)
2195 <item> @partition@
2196 <item> @head@
2197 <item> @and@, @or@, @any@, @all@
2198 <item> @sequence_@
2199 <item> @msum@
2200 <item> @sortBy@
2201 </itemize>
2202
2203 So, for example, the following should generate no intermediate lists:
2204 <tscreen><verb>
2205         array (1,10) [(i,i*i) | i <- map (+ 1) [0..9]]
2206 </verb></tscreen>
2207
2208 This list could readily be extended; if there are Prelude functions that you use
2209 a lot which are not included, please tell us.
2210 <p>
2211 If you want to write your own good consumers or producers, look at the
2212 Prelude definitions of the above functions to see how to do so.
2213
2214 <sect2>Specialisation
2215 <label id="rule-spec">
2216 <p>
2217
2218 Rewrite rules can be used to get the same effect as a feature
2219 present in earlier version of GHC:
2220 <tscreen><verb>
2221   {-# SPECIALIZE fromIntegral :: Int8 -> Int16 = int8ToInt16 #-}
2222 </verb></tscreen>
2223 This told GHC to use @int8ToInt16@ instead of @fromIntegral@ whenever
2224 the latter was called with type @Int8 -> Int16@.  That is, rather than
2225 specialising the original definition of @fromIntegral@ the programmer is
2226 promising that it is safe to use @int8ToInt16@ instead.
2227
2228 This feature is no longer in GHC.  But rewrite rules let you do the
2229 same thing:
2230 <tscreen><verb>
2231   {-# RULES
2232     "fromIntegral/Int8/Int16" fromIntegral = int8ToInt16
2233   #-}
2234 </verb></tscreen>
2235 This slightly odd-looking rule instructs GHC to replace @fromIntegral@
2236 by @int8ToInt16@ <em>whenever the types match</em>.  Speaking more operationally,
2237 GHC adds the type and dictionary applications to get the typed rule
2238 <tscreen><verb>
2239         forall (d1::Integral Int8) (d2::Num Int16) .
2240                 fromIntegral Int8 Int16 d1 d2 = int8ToInt16
2241 </verb></tscreen>
2242 What is more,
2243 this rule does not need to be in the same file as fromIntegral,
2244 unlike the @SPECIALISE@ pragmas which currently do (so that they
2245 have an original definition available to specialise).
2246
2247 <sect2>Controlling what's going on
2248 <p>
2249
2250 <itemize>
2251 <item> Use @-ddump-rules@ to see what transformation rules GHC is using.
2252 <item> Use @-ddump-simpl-stats@ to see what rules are being fired.
2253 If you add @-dppr-debug@ you get a more detailed listing.
2254 <item> The defintion of (say) @build@ in @PrelBase.lhs@ looks llike this:
2255 <tscreen><verb>
2256         build   :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a]
2257         {-# INLINE build #-}
2258         build g = g (:) []
2259 </verb></tscreen>
2260 Notice the @INLINE@!  That prevents @(:)@ from being inlined when compiling
2261 @PrelBase@, so that an importing module will ``see'' the @(:)@, and can
2262 match it on the LHS of a rule.  @INLINE@ prevents any inlining happening
2263 in the RHS of the @INLINE@ thing.  I regret the delicacy of this.
2264
2265 <item> In @ghc/lib/std/PrelBase.lhs@ look at the rules for @map@ to
2266 see how to write rules that will do fusion and yet give an efficient
2267 program even if fusion doesn't happen.  More rules in @PrelList.lhs@.
2268 </itemize>
2269
2270
2271 %-----------------------------------------------------------------------------
2272 <sect1>Pattern guards
2273 <label id="pattern-guards">
2274 <p>
2275 GHC supports the ``pattern-guards'' extension to
2276 the guards that form part of Haskell function
2277 definitions.   The general aim is similar to that of views [1,2],
2278 but the expressive power of this proposal is a little different, in places
2279 more expressive than views, and in places less so.
2280
2281 <sect2>What's the problem?
2282 <p>
2283 Consider the following Haskell function definition
2284 <tscreen><verb>
2285   filter p []           = []
2286   filter p (y:ys) | p y = y : filter p ys
2287   | otherwise = filter p ys
2288 </verb></tscreen>
2289
2290 <p>The decision of which right-hand side to choose is made in
2291 two stages: first, pattern matching selects a guarded group,
2292 and second, the boolean-valued guards select among the right-hand
2293 sides of the group.  In these two stages, only the pattern-matching
2294 stage can bind variables.  A guard is simply a boolean valued expression.
2295
2296 <p>So pattern-matching combines selection with binding, whereas guards simply
2297 perform selection.  Sometimes this is a tremendous nuisance.  For example,
2298 suppose we have an abstract data type of finite maps, with a lookup
2299 operation:
2300 <tscreen><verb>
2301   lookup :: FinteMap -> Int -> Maybe Int
2302 </verb></tscreen>
2303
2304 <p>The lookup returns Nothing if the supplied key is not in the
2305 domain of the mapping, and (Just v) otherwise, where v is
2306 the value that the key maps to.  Now consider the following
2307 definition:
2308 <tscreen><verb>
2309    clunky env var1 var2 | ok1 && ok2 = val1 + val2
2310                         | otherwise  = var1 + var2
2311      where
2312         m1   = lookup env var1
2313         m2   = lookup env var2
2314         ok1  = maybeToBool m1
2315         ok2  = maybeToBool m2
2316         val1 = expectJust m1
2317         val2 = expectJust m2
2318 </verb></tscreen>
2319 The auxiliary functions are
2320 <tscreen><verb>
2321   maybeToBool :: Maybe a -> Bool
2322   maybeToBool (Just x) = True
2323   maybeToBool Nothing  = False
2324
2325   expectJust :: Maybe a -> a
2326   expectJust (Just x) = x
2327   expectJust Nothing  = error "Unexpected Nothing"
2328 </verb></tscreen>
2329 <p>What is <tt>clunky</tt> doing?  The guard <tt>ok1 && ok2</tt> checks that both
2330 lookups succeed, using <tt>maybeToBool</tt> to convert the maybe types to
2331 booleans.  The (lazily evaluated) <tt>expectJust</tt> calls extract the values
2332 from the results of the lookups, and binds the returned values to
2333 <tt>val1</tt> and <tt>val2</tt> respectively.  If either lookup fails, then <tt>clunky</tt>
2334 takes the <tt>otherwise</tt> case and returns the sum of its arguments.
2335
2336 <p>This is certainly legal Haskell, but it is a tremendously verbose
2337 and un-obvious way to achieve the desired effect.  Arguably, a more
2338 direct way to write <tt>clunky</tt> would be to use case expressions:
2339 <tscreen><verb>
2340   clunky env var1 var1  = case lookup env var1 of
2341                             Nothing -> fail
2342                             Just val1 -> case lookup env var2 of
2343                                            Nothing -> fail
2344                                            Just val2 -> val1 + val2
2345                         where
2346                           fail = val1 + val2
2347 </verb></tscreen>
2348 <p>This is a bit shorter, but hardly better.  Of course, we can rewrite
2349 any set of pattern-matching, guarded equations as case expressions;
2350 that is precisely what the compiler does when compiling equations!
2351 The reason that Haskell provides guarded equations is because they
2352 allow us to write down the cases we want to consider, one at a time,
2353 independently of each other.  This structure is hidden in the case
2354 version.  Two of the right-hand sides are really the same (<tt>fail</tt>),
2355 and the whole expression tends to become more and more indented.
2356
2357 <p>Worse, if this was just one equation of <tt>clunky</tt>, with others that
2358 follow, then the thing wouldn't work at all.  That is, suppose we have
2359 <tscreen><verb>
2360   clunky' env (var1:var2:vars) | ok1 && ok2 = val1 + val2
2361         where
2362           m1 = lookup env var1
2363           ...as before...
2364
2365   clunky' env [var1] = ...some stuff...
2366   clunky' env []     = ...more stuff...
2367 </verb></tscreen>
2368 Now, if either the lookups fail we want to fall through to the second
2369 and third equations for <tt>clunky'</tt>.  If we write the definition in the
2370 form of a case expression we are forced to make the latter two
2371 equations for <tt>clunky'</tt> into a separate definition and call it in
2372 the right hand side of <tt>fail</tt>.  Ugh.  Ugh.  Ugh.  This is precisely
2373 why Haskell provides guards at all, rather than relying on if-then-else
2374 expressions: if the guard fails we fall through to the next equation,
2375 whereas we can't do that with a conditional.
2376
2377
2378 <p>What is frustrating about this is that the solution is so tantalisingly
2379 near at hand!  What we want to do is to pattern-match on the result of
2380 the lookup.  We can do it like this:
2381 <tscreen><verb>
2382   clunky' env vars@(var1:var2:vars)
2383     = clunky_help (lookup env var1) (lookup env var2) vars
2384     where
2385       clunky_help (Just val1) (Just val2) vars   = val1 + val2
2386       clunky_help _           _           [var1] = ...some stuff...
2387       clunky_help _           _           []     = ...more stuff...
2388 </verb></tscreen>
2389 <p>Now we do get three equations, one for each right-hand side, but
2390 it is still clunky.  In a big set of equations it becomes hard to
2391 remember what each <tt>Just</tt> pattern corresponds to.  Worse, we can't
2392 use one lookup in the next.  For example, suppose our function was
2393 like this:
2394 <tscreen><verb>
2395   clunky'' env var1 var2
2396          | ok1 && ok2 = val2
2397          | otherwise  = var1 + var2
2398          where
2399              m1 = lookup env var1
2400              m2 = lookup env (var2 + val1)
2401              ok1 = maybeToBool m1
2402              ok2 = maybeToBool m2
2403              val1 = expectJust m1
2404              val2 = expectJust m2
2405 </verb></tscreen>
2406 <p>Notice that the second lookup uses val1, the result of the first lookup.
2407 To express this with a <tt>clunky_help</tt> function requires a second helper
2408 function nested inside the first.  Dire stuff.
2409
2410 <p>So the original definition, using <tt>maybeToBool</tt> and <tt>expectJust</tt> has the
2411 merit that it scales nicely, to accommodate both multiple equations
2412 and successive lookups.  Yet it stinks.
2413
2414
2415 <sect2>The pattern guards extension
2416 <p>
2417 The extension that GHC implements is simple:
2418 <em>instead of being a boolean expression,
2419 a guard is a list of qualifiers,
2420 exactly as in a list comprehension</em>.
2421
2422 <p>That is, the only syntax change is to replace
2423 <em>exp</em> by <em>quals</em> in the syntax of guarded equations.
2424
2425 <p>Here is how you can now write <tt>clunky</tt>:
2426 <tscreen><verb>
2427   clunky env var1 var1
2428     | Just val1 <- lookup env var1
2429     , Just val2 <- lookup env var2
2430     = val1 + val2
2431   ...other equations for clunky...
2432 </verb></tscreen>
2433 <p>The semantics should be clear enough.  The qualifers are matched in
2434 order.  For a <tt><-</tt> qualifier, which I call a <em>pattern guard</em>, the
2435 right hand side is evaluated and matched against the pattern on the
2436 left.  If the match fails then the whole guard fails and the next
2437 equation is tried.  If it succeeds, then the appropriate binding takes
2438 place, and the next qualifier is matched, in the augmented
2439 environment.  Unlike list comprehensions, however, the type of the
2440 expression to the right of the <tt><-</tt> is the same as the type of the
2441 pattern to its left.  The bindings introduced by pattern guards scope
2442 over all the remaining guard qualifiers, and over the right hand side
2443 of the equation.
2444
2445 <p>Just as with list comprehensions, boolean expressions can be freely mixed
2446 with among the pattern guards.  For example:
2447 <tscreen><verb>
2448   f x | [y] <- x
2449       , y > 3
2450       , Just z <- h y
2451       = ...
2452 </verb></tscreen>
2453 <p>Haskell's current guards therefore emerge as a special case, in which the
2454 qualifier list has just one element, a boolean expression.
2455
2456 <p>Just as with list comprehensions, a <tt>let</tt> qualifier can introduce a binding.
2457 It is also possible to do this with pattern guard with a simple
2458 variable pattern <tt>a <- e</tt>
2459 However a <tt>let</tt> qualifier is a little more powerful, because it can
2460 introduce a recursive or mutually-recursive binding.  It is not clear
2461 whether this power is particularly useful, but it seems more uniform to
2462 have exactly the same syntax as list comprehensions.
2463
2464 <p>One could argue that the notation <tt><-</tt> is misleading, suggesting
2465 the idea of <em>drawn from</em> as in a list comprehension.  But it's very
2466 nice to reuse precisely the list-comprehension syntax.  Furthermore,
2467 the only viable alternative is <tt>=</tt>, and that would lead to parsing
2468 difficulties, because we rely on the <tt>=</tt> to herald the arrival of
2469 the right-hand side of the equation.  Consider <tt>f x | y = h x = 3</tt>.
2470
2471 <sect2>Views
2472
2473 <p>One very useful application of pattern guards is to abstract data types.
2474 Given an abstract data type it's quite common to have conditional
2475 selectors.  For example:
2476 <tscreen><verb>
2477   addressMaybe :: Person -> Maybe String
2478 </verb></tscreen>
2479 <p>The function <tt>addressMaybe</tt> extracts a string from the abstract data type
2480 <tt>Person</tt>, but returns <tt>Nothing</tt> if the person has no address.  Inside
2481 GHC we have lots of functions like:
2482 <tscreen><verb>
2483   getFunTyMaybe :: Type -> Maybe (Type,Type)
2484 </verb></tscreen>
2485 <p>This returns <tt>Nothing</tt> if the argument is not a function type, and
2486 <tt>(Just arg_ty res_ty)</tt> if the argument is a function type.  The data
2487 type <tt>Type</tt> is abstract.
2488
2489 <p>Since <tt>Type</tt> and <tt>Person</tt> are abstract we can't pattern-match on them,
2490 but it's really nice to be able to say:
2491 <tscreen><verb>
2492   f person | Just address <- addressMaybe person
2493     = ...
2494     | otherwise
2495     = ...
2496 </verb></tscreen>
2497 <p>Thus, pattern guards can be seen as addressing a similar goal to
2498 that of views, namely reconciling pattern matching with data abstraction.
2499 Views were proposed by Wadler ages ago [1], and are the subject of a
2500 recent concrete proposal for a Haskell language extension [2].
2501
2502 <p>It is natural to ask whether views subsume pattern guards or vice versa.
2503 The answer is "neither".
2504
2505 <sect3>Do views subsume pattern guards?
2506
2507 <p>The views proposal [2] points out that you can use views to simulate
2508 (some) guards and, as we saw above, views have similar purpose and
2509 functionality to at least some applications of pattern guards.
2510
2511 <p>However, views give a view on a <em>single</em> value, whereas guards allow
2512 arbitrary function calls to combine in-scope values.  For example,
2513 <tt>clunky</tt> matches <tt>(Just val1)</tt> against <tt>(lookup env var1)</tt>. We do not want a
2514 view of <tt>env</tt> nor of <tt>var1</tt> but rather of their combination by
2515 <tt>lookup</tt>.  Views simply do not help with <tt>clunky</tt>.
2516
2517 <p>Views are capable of dealing with the data abstraction issue of
2518 course.  However, each conditional selector (such as <tt>getFunTyMaybe</tt>)
2519 would require its own view, complete with its own viewtype:
2520 <tscreen><verb>
2521   view FunType of Type  = FunType Type Type
2522                         | NotFunType
2523           where
2524             funType (Fun arg res) = FunType arg res
2525             funType other_type    = NotFunType
2526 </verb></tscreen>
2527 This seems a bit heavyweight (three new names instead of one)
2528 compared with
2529 <tscreen><verb>
2530   getFunTypeMaybe (Fun arg res) = Just (arg,res)
2531   getFunTypeMaybe other_type    = Nothing
2532 </verb></tscreen>
2533 <p>Here we can re-use the existing <tt>Maybe</tt> type.  Not only does this
2534 save defining new types, but it allows the existing library of
2535 functions on <tt>Maybe</tt> types to be applied directly to the result
2536 of <tt>getFunTypeMaybe</tt>.
2537
2538 <p>Just to put this point another way, suppose we had a function
2539 <tscreen><verb>
2540   tyvarsOf :: Type -> [TyVar]
2541 </verb></tscreen>
2542 that returns the free type variables of a type.
2543 Would anyone suggest that we make this into a view of <tt>Type</tt>?
2544 <tscreen><verb>
2545   view TyVarsOf of Type = TyVarsOf [TyVar]
2546                         where
2547                           tyVarsOf ty = ...
2548 </verb></tscreen>
2549 Now we could write
2550 <tscreen><verb>
2551   f :: Type -> Int
2552   f (TyVarsOf tyvars) = length tyvars
2553 </verb></tscreen>
2554 instead of
2555 <tscreen><verb>
2556   f :: Type -> Int
2557   f ty = length (tyvarsOf ty)
2558 </verb></tscreen>
2559 Surely not!  So why do so just because the value returned is a <tt>Maybe</tt> type?
2560
2561 <sect3>Do pattern guards subsume views?
2562
2563 <p>There are two ways in which views might be desired even if you
2564 had pattern guards:<p>
2565 <itemize>
2566 <item>
2567 We might prefer to write (using views)
2568 <tscreen><verb>
2569   addCpx (Rect r1 i1) (Rect r1 i2) = rect (r1+r2) (c1+c2)
2570 </verb></tscreen>
2571 rather than (using pattern guards)
2572 <tscreen><verb>
2573   addCpx c1 c2
2574     | Rect r1 i1 <- getRect c1
2575     , Rect r1 i2 <- getRect c2
2576     = mkRect (r1+r2) (c1+c2)
2577 </verb></tscreen>(One might argue, though, that the latter accurately indicates that there may be some work involved in matching against a view, compared to ordinary pattern matching.)
2578 </item>
2579 <item>
2580 The pattern-guard notation gets a bit more clunky if we want a view that has more than one information-carrying constructor. For example, consider the following view:
2581 <tscreen><verb>
2582   view AbsInt of Int = Pos Int | Neg Int
2583     where
2584       absInt n = if n>=0 then Pos n else Neg (-n)
2585 </verb></tscreen>
2586 Here the view returns a Pos or Neg constructor, each of which contains the absolute value of the original Int.  Now we can say
2587 <tscreen><verb>
2588   f (Pos n) = n+1
2589   f (Neg n) = n-1
2590 </verb></tscreen>
2591 Then <tt>f 4 = 5</tt>, <tt>f (-3) = -4</tt>.
2592
2593 Without views, but with pattern guards, we could write this:
2594 <tscreen><verb>
2595   data AbsInt = Pos Int | Neg Int
2596   absInt n = if n>=0 then Pos n else Neg n
2597
2598   f n | Pos n' <- abs_n = n'+1
2599       | Neg n' <- abs_n = n'-1
2600       where
2601         abs_n = absInt n
2602 </verb></tscreen>
2603 <p>Here we've used a where clause to ensure that <tt>absInt</tt> is only called once (though we could instead duplicate the call to <tt>absInt</tt> and hope the compile spots the common subexpression).
2604
2605 <p>The view version is undoubtedly more compact. (Again, one might wonder, though, whether it perhaps conceals too much.)
2606 </item>
2607 <item>
2608 When nested pattern guards are used, though, the use of a where clause fails.  For example, consider the following silly function using the <tt>AbsInt</tt> view
2609 <tscreen><verb>
2610   g (Pos (Pos n)) = n+1
2611   g (Pos (Neg n)) = n-1 -- A bit silly
2612 </verb></tscreen>
2613 Without views we have to write
2614 <tscreen><verb>
2615   g n | n1 <- abs_n
2616       , Pos n2 <- absInt n1
2617       = n2+1
2618       | Pos n1 <- abs_n
2619       , Neg n2 <- absInt n1
2620       = n2-1
2621       where
2622         abs_n = absInt n
2623 </verb></tscreen>
2624 <p>We can share the first call to <tt>absInt</tt> but not the second.  This is a compilation issue.  Just as we might hope that the compiler would spot the common sub-expression if we replaced <tt>abs_n by (absInt n)</tt>, so we might hope that it would optimise the second.
2625 The views optimisation seems more simple to spot, somehow.
2626 </item>
2627 </itemize>
2628
2629 <sect3>Views --- summary
2630 <p>
2631 My gut feel at the moment is that the pattern-guard proposal
2632 <itemize>
2633 <item>is much simpler to specify and implement than views
2634 <item> gets some expressiveness that is simply inaccessible to views.
2635 <item>successfully reconciles pattern matching with data abstraction,
2636 albeit with a slightly less compact notation than views --
2637 but the extra notation carries useful clues
2638 <item>is less heavyweight to use when defining many information
2639 extraction functions over an ADT
2640 </itemize>
2641 So I think the case for pattern guards is stronger than that for views,
2642 and (if implemented) reduces, without eliminating, the need for views.
2643
2644 <sect2>Argument evaluation order
2645
2646 <p>Haskell specifies that patterns are evaluated left to right.  Thus
2647 <tscreen><verb>
2648   f (x:xs) (y:ys) = ...
2649   f xs     ys     = ...
2650 </verb></tscreen>
2651 Here, the first argument is evaluated and matched against <tt>(x:xs)</tt> and
2652 then the second argument is evaluated and matched against <tt>(y:ys)</tt>.
2653 If you want to match the second argument first --- a significant change
2654 since it changes the semantics of the function --- you are out of luck.
2655 You must either change the order of the arguments, or use case expressions
2656 instead.
2657
2658 <p>With pattern guards you can say what you want, without changing the
2659 argument order:
2660 <tscreen><verb>
2661   f xs ys | (y:ys) <- ys
2662             (x:xs) <- xs
2663           = ...
2664   f xs ys = ...
2665 </verb></tscreen>
2666 (Since a pattern guard is a non recursive binding I have shadowed
2667 xs and ys, just to remind us that it's OK to do so.)
2668
2669 <p>I can't say that this is a very important feature in practice, but
2670 it's worth noting.
2671
2672 <sect2>References
2673
2674 <p>[1] P Wadler, "Views: a way for pattern matching to cohabit with
2675 data abstraction", POPL 14 (1987), 307-313
2676
2677 <p>[2] W Burton, E Meijer, P Sansom, S Thompson, P Wadler, "A (sic) extension
2678 to Haskell 1.3 for views", sent to the Haskell mailing list
2679 23 Oct 1996