ghc/docs/users_guide/glasgow_exts.vsgml

   1 %
   2 % $Id: glasgow_exts.vsgml,v 1.6 1999/03/16 17:07:21 simonm Exp $
   3 %
   4 % GHC Language Extensions.
   5 %
   6
   7 As with all known Haskell systems, GHC implements some extensions to
   8 the language.  To use them, you'll need to give a @-fglasgow-exts@%
   9 <nidx>-fglasgow-exts option</nidx> option.
  10
  11 Virtually all of the Glasgow extensions serve to give you access to
  12 the underlying facilities with which we implement Haskell.  Thus, you
  13 can get at the Raw Iron, if you are willing to write some non-standard
  14 code at a more primitive level.  You need not be ``stuck'' on
  15 performance because of the implementation costs of Haskell's
  16 ``high-level'' features---you can always code ``under'' them.  In an
  17 extreme case, you can write all your time-critical code in C, and then
  18 just glue it together with Haskell!
  19
  20 Executive summary of our extensions:
  21
  22 <descrip>
  23
  24 <tag>Unboxed types and primitive operations:</tag>
  25
  26 You can get right down to the raw machine types and operations;
  27 included in this are ``primitive arrays'' (direct access to Big Wads
  28 of Bytes).  Please see Section <ref name="Unboxed types"
  29 id="glasgow-unboxed"> and following.
  30
  31 <tag>Multi-parameter type classes:</tag>
  32
  33 GHC's type system supports extended type classes with multiple
  34 parameters.  Please see Section <ref name="Mult-parameter type
  35 classes" id="multi-param-type-classes">.
  36
  37 <tag>Local universal quantification:</tag>
  38
  39 GHC's type system supports explicit unversal quantification in
  40 constructor fields and function arguments.  This is useful for things
  41 like defining @runST@ from the state-thread world.  See Section <ref
  42 name="Local universal quantification" id="universal-quantification">.
  43
  44 <tag>Extistentially quantification in data types:</tag>
  45
  46 Some or all of the type variables in a datatype declaration may be
  47 <em>existentially quantified</em>.  More details in Section <ref
  48 name="Existential Quantification" id="existential-quantification">.
  49
  50 <tag>Scoped type variables:</tag>
  51
  52 Scoped type variables enable the programmer to supply type signatures
  53 for some nested declarations, where this would not be legal in Haskell
  54 98.  Details in Section <ref name="Scoped Type Variables"
  55 id="scoped-type-variables">.
  56
  57 <tag>Calling out to C:</tag>
  58
  59 Just what it sounds like.  We provide <em>lots</em> of rope that you
  60 can dangle around your neck.  Please see Section <ref name="Calling~C
  61 directly from Haskell" id="glasgow-ccalls">.
  62
  63 <tag>Pragmas</tag>
  64
  65 Pragmas are special instructions to the compiler placed in the source
  66 file.  The pragmas GHC supports are described in Section <ref
  67 name="Pragmas" id="pragmas">.
  68
  69 </descrip>
  70
  71 Before you get too carried away working at the lowest level (e.g.,
  72 sloshing @MutableByteArray#@s around your program), you may wish to
  73 check if there are system libraries that provide a ``Haskellised
  74 veneer'' over the features you want.  See Section <ref name="GHC
  75 Prelude and libraries" id="ghc-prelude">.
  76
  77 %************************************************************************
  78 %*                                                                      *
  79 <sect1>Unboxed types
  80 <label id="glasgow-unboxed">
  81 <p>
  82 <nidx>Unboxed types (Glasgow extension)</nidx>
  83 %*                                                                      *
  84 %************************************************************************
  85
  86 These types correspond to the ``raw machine'' types you would use in
  87 C: @Int#@ (long int), @Double#@ (double), @Addr#@ (void *), etc.  The
  88 <em>primitive operations</em> (PrimOps) on these types are what you
  89 might expect; e.g., @(+#)@ is addition on @Int#@s, and is the
  90 machine-addition that we all know and love---usually one instruction.
  91
  92 There are some restrictions on the use of unboxed types, the main one
  93 being that you can't pass an unboxed value to a polymorphic function
  94 or store one in a polymorphic data type.  This rules out things like
  95 @[Int#]@ (ie. lists of unboxed integers).  The reason for this
  96 restriction is that polymorphic arguments and constructor fields are
  97 assumed to be pointers: if an unboxed integer is stored in one of
  98 these, the garbage collector would attempt to follow it, leading to
  99 unpredictable space leaks.  Or a @seq@ operation on the polymorphic
 100 component may attempt to dereference the pointer, with disastrous
 101 results.  Even worse, the unboxed value might be larger than a pointer
 102 (@Double#@ for instance).
 103
 104 Nevertheless, A numerically-intensive program using unboxed types can
 105 go a <em>lot</em> faster than its ``standard'' counterpart---we saw a
 106 threefold speedup on one example.
 107
 108 Please see Section <ref name="The module PrelGHC: really primitive
 109 stuff" id="ghc-libs-ghc"> for the details of unboxed types and the
 110 operations on them.
 111
 112 %************************************************************************
 113 %*                                                                      *
 114 <sect1>Primitive state-transformer monad
 115 <label id="glasgow-ST-monad">
 116 <p>
 117 <nidx>state transformers (Glasgow extensions)</nidx>
 118 <nidx>ST monad (Glasgow extension)</nidx>
 119 %*                                                                      *
 120 %************************************************************************
 121
 122 This monad underlies our implementation of arrays, mutable and
 123 immutable, and our implementation of I/O, including ``C calls''.
 124
 125 The @ST@ library, which provides access to the @ST@ monad, is a
 126 GHC/Hugs extension library and is described in the separate <htmlurl
 127 name="GHC/Hugs Extension Libraries" url="libs.html"> document.
 128
 129 %************************************************************************
 130 %*                                                                      *
 131 <sect1>Primitive arrays, mutable and otherwise
 132 <label id="glasgow-prim-arrays">
 133 <p>
 134 <nidx>primitive arrays (Glasgow extension)</nidx>
 135 <nidx>arrays, primitive (Glasgow extension)</nidx>
 136 %*                                                                      *
 137 %************************************************************************
 138
 139 GHC knows about quite a few flavours of Large Swathes of Bytes.
 140
 141 First, GHC distinguishes between primitive arrays of (boxed) Haskell
 142 objects (type @Array# obj@) and primitive arrays of bytes (type
 143 @ByteArray#@).
 144
 145 Second, it distinguishes between...
 146 <descrip>
 147 <tag>Immutable:</tag>
 148 Arrays that do not change (as with ``standard'' Haskell arrays); you
 149 can only read from them.  Obviously, they do not need the care and
 150 attention of the state-transformer monad.
 151
 152 <tag>Mutable:</tag>
 153 Arrays that may be changed or ``mutated.''  All the operations on them
 154 live within the state-transformer monad and the updates happen
 155 <em>in-place</em>.
 156
 157 <tag>``Static'' (in C land):</tag>
 158 A C routine may pass an @Addr#@ pointer back into Haskell land.  There
 159 are then primitive operations with which you may merrily grab values
 160 over in C land, by indexing off the ``static'' pointer.
 161
 162 <tag>``Stable'' pointers:</tag>
 163 If, for some reason, you wish to hand a Haskell pointer (i.e.,
 164 <em>not</em> an unboxed value) to a C routine, you first make the
 165 pointer ``stable,'' so that the garbage collector won't forget that it
 166 exists.  That is, GHC provides a safe way to pass Haskell pointers to
 167 C.
 168
 169 Please see Section <ref name="Subverting automatic unboxing with
 170 ``stable pointers''" id="glasgow-stablePtrs"> for more details.
 171
 172 <tag>``Foreign objects'':</tag>
 173 A ``foreign object'' is a safe way to pass an external object (a
 174 C-allocated pointer, say) to Haskell and have Haskell do the Right
 175 Thing when it no longer references the object.  So, for example, C
 176 could pass a large bitmap over to Haskell and say ``please free this
 177 memory when you're done with it.''
 178
 179 Please see Section <ref name="Pointing outside the Haskell heap"
 180 id="glasgow-foreignObjs"> for more details.
 181
 182 </descrip>
 183
 184 The libraries section gives more details on all these ``primitive
 185 array'' types and the operations on them, Section <ref name="The GHC
 186 Prelude and Libraries" id="ghc-prelude">.  Some of these extensions
 187 are also supported by Hugs, and the supporting libraries are described
 188 in the <htmlurl name="GHC/Hugs Extension Libraries" url="libs.html">
 189 document.
 190
 191 %************************************************************************
 192 %*                                                                      *
 193 <sect1>Calling~C directly from Haskell
 194 <label id="glasgow-ccalls">
 195 <p>
 196 <nidx>C calls (Glasgow extension)</nidx>
 197 <nidx>_ccall_ (Glasgow extension)</nidx>
 198 <nidx>_casm_ (Glasgow extension)</nidx>
 199 %*                                                                      *
 200 %************************************************************************
 201
 202 GOOD ADVICE: Because this stuff is not Entirely Stable as far as names
 203 and things go, you would be well-advised to keep your C-callery
 204 corraled in a few modules, rather than sprinkled all over your code.
 205 It will then be quite easy to update later on.
 206
 207 %************************************************************************
 208 %*                                                                      *
 209 <sect2>@_ccall_@ and @_casm_@: an introduction
 210 <label id="ccall-intro">
 211 <p>
 212 %*                                                                      *
 213 %************************************************************************
 214
 215 The simplest way to use a simple C function
 216
 217 <tscreen><verb>
 218 double fooC( FILE *in, char c, int i, double d, unsigned int u )
 219 </verb></tscreen>
 220
 221 is to provide a Haskell wrapper:
 222
 223 <tscreen><verb>
 224 fooH :: Char -> Int -> Double -> Word -> IO Double
 225 fooH c i d w = _ccall_ fooC (``stdin''::Addr) c i d w
 226 </verb></tscreen>
 227
 228 The function @fooH@ will unbox all of its arguments, call the C
 229 function @fooC@ and box the corresponding arguments.
 230
 231 One of the annoyances about @_ccall_@s is when the C types don't quite
 232 match the Haskell compiler's ideas.  For this, the @_casm_@ variant
 233 may be just the ticket (NB: <em>no chance</em> of such code going
 234 through a native-code generator):
 235
 236 <tscreen><verb>
 237 oldGetEnv name
 238   = _casm_ ``%r = getenv((char *) %0);'' name >>= \ litstring@(A# str#) ->
 239     return (
 240         if (litstring == ``NULL'') then
 241             Left ("Fail:oldGetEnv:"++name)
 242         else
 243             Right (unpackCString# str#)
 244     )
 245 </verb></tscreen>
 246
 247 The first literal-literal argument to a @_casm_@ is like a @printf@
 248 format: @%r@ is replaced with the ``result,'' @%0@--@%n-1@ are
 249 replaced with the 1st--nth arguments.  As you can see above, it is an
 250 easy way to do simple C~casting.  Everything said about @_ccall_@ goes
 251 for @_casm_@ as well.
 252
 253 The use of @_casm_@ in your code does pose a problem to the compiler
 254 when it comes to generating an interface file for a freshly compiled
 255 module. Included in an interface file is the unfolding (if any) of a
 256 declaration. However, if a declaration's unfolding happens to contain
 257 a @_casm_@, its unfolding will <em/not/ be emitted into the interface
 258 file even if it qualifies by all the other criteria. The reason why
 259 the compiler prevents this from happening is that unfolding @_casm_@s
 260 into an interface file unduly constrains how code that import your
 261 module have to be compiled. If an imported declaration is unfolded and
 262 it contains a @_casm_@, you now have to be using a compiler backend
 263 capable of dealing with it (i.e., the C compiler backend). If you are
 264 using the C compiler backend, the unfolded @_casm_@ may still cause you
 265 problems since the C code snippet it contains may mention CPP symbols
 266 that were in scope when compiling the original module are not when
 267 compiling the importing module.
 268
 269 If you're willing to put up with the drawbacks of doing cross-module
 270 inlining of C code (GHC - A Better C Compiler :-), the option
 271 @-funfold-casms-in-hi-file@ will turn off the default behaviour.
 272 <nidx>-funfold-casms-in-hi-file option</nidx>
 273
 274 %************************************************************************
 275 %*                                                                      *
 276 <sect2>Using function headers
 277 <label id="glasgow-foreign-headers">
 278 <p>
 279 <nidx>C calls, function headers</nidx>
 280 %*                                                                      *
 281 %************************************************************************
 282
 283 When generating C (using the @-fvia-C@ directive), one can assist the
 284 C compiler in detecting type errors by using the @-#include@ directive
 285 to provide @.h@ files containing function headers.
 286
 287 For example,
 288
 289 <tscreen><verb>
 290 typedef unsigned long *StgForeignObj;
 291 typedef long StgInt;
 292
 293 void          initialiseEFS (StgInt size);
 294 StgInt        terminateEFS (void);
 295 StgForeignObj emptyEFS(void);
 296 StgForeignObj updateEFS (StgForeignObj a, StgInt i, StgInt x);
 297 StgInt        lookupEFS (StgForeignObj a, StgInt i);
 298 </verb></tscreen>
 299
 300 You can find appropriate definitions for @StgInt@, @StgForeignObj@,
 301 etc using @gcc@ on your architecture by consulting
 302 @ghc/includes/StgTypes.h@.  The following table summarises the
 303 relationship between Haskell types and C types.
 304
 305 <tabular ca="ll">
 306 <bf>C type name</bf>      | <bf>Haskell Type</bf> @@
 307 @@
 308 @StgChar@          | @Char#@ @@
 309 @StgInt@           | @Int#@ @@
 310 @StgWord@          | @Word#@ @@
 311 @StgAddr@          | @Addr#@ @@
 312 @StgFloat@         | @Float#@ @@
 313 @StgDouble@        | @Double#@ @@
 314
 315 @StgArray@         | @Array#@ @@
 316 @StgByteArray@     | @ByteArray#@ @@
 317 @StgArray@         | @MutableArray#@ @@
 318 @StgByteArray@     | @MutableByteArray#@ @@
 319
 320 @StgStablePtr@     | @StablePtr#@ @@
 321 @StgForeignObj@    | @ForeignObj#@
 322 </tabular>
 323
 324 Note that this approach is only <em>essential</em> for returning
 325 @float@s (or if @sizeof(int) != sizeof(int *)@ on your
 326 architecture) but is a Good Thing for anyone who cares about writing
 327 solid code.  You're crazy not to do it.
 328
 329 %************************************************************************
 330 %*                                                                      *
 331 <sect2>Subverting automatic unboxing with ``stable pointers''
 332 <label id="glasgow-stablePtrs">
 333 <p>
 334 <nidx>stable pointers (Glasgow extension)</nidx>
 335 %*                                                                      *
 336 %************************************************************************
 337
 338 The arguments of a @_ccall_@ are automatically unboxed before the
 339 call.  There are two reasons why this is usually the Right Thing to
 340 do:
 341
 342 <itemize>
 343 <item>
 344 C is a strict language: it would be excessively tedious to pass
 345 unevaluated arguments and require the C programmer to force their
 346 evaluation before using them.
 347
 348 <item> Boxed values are stored on the Haskell heap and may be moved
 349 within the heap if a garbage collection occurs---that is, pointers
 350 to boxed objects are not <em>stable</em>.
 351 </itemize>
 352
 353 It is possible to subvert the unboxing process by creating a ``stable
 354 pointer'' to a value and passing the stable pointer instead.  For
 355 example, to pass/return an integer lazily to C functions @storeC@ and
 356 @fetchC@, one might write:
 357
 358 <tscreen><verb>
 359 storeH :: Int -> IO ()
 360 storeH x = makeStablePtr x              >>= \ stable_x ->
 361            _ccall_ storeC stable_x
 362
 363 fetchH :: IO Int
 364 fetchH x = _ccall_ fetchC               >>= \ stable_x ->
 365            deRefStablePtr stable_x      >>= \ x ->
 366            freeStablePtr stable_x       >>
 367            return x
 368 </verb></tscreen>
 369
 370 The garbage collector will refrain from throwing a stable pointer away
 371 until you explicitly call one of the following from C or Haskell.
 372
 373 <tscreen><verb>
 374 void freeStablePointer( StgStablePtr stablePtrToToss )
 375 freeStablePtr :: StablePtr a -> IO ()
 376 </verb></tscreen>
 377
 378 As with the use of @free@ in C programs, GREAT CARE SHOULD BE
 379 EXERCISED to ensure these functions are called at the right time: too
 380 early and you get dangling references (and, if you're lucky, an error
 381 message from the runtime system); too late and you get space leaks.
 382
 383 And to force evaluation of the argument within @fooC@, one would
 384 call one of the following C functions (according to type of argument).
 385
 386 <tscreen><verb>
 387 void     performIO  ( StgStablePtr stableIndex /* StablePtr s (IO ()) */ );
 388 StgInt   enterInt   ( StgStablePtr stableIndex /* StablePtr s Int */ );
 389 StgFloat enterFloat ( StgStablePtr stableIndex /* StablePtr s Float */ );
 390 </verb></tscreen>
 391
 392 <nidx>performIO</nidx>
 393 <nidx>enterInt</nidx>
 394 <nidx>enterFloat</nidx>
 395
 396 Note Bene: @_ccall_GC_@<nidx>_ccall_GC_</nidx> must be used if any of
 397 these functions are used.
 398
 399 %************************************************************************
 400 %*                                                                      *
 401 <sect2>Foreign objects: pointing outside the Haskell heap
 402 <label id="glasgow-foreignObjs">
 403 <p>
 404 <nidx>foreign objects (Glasgow extension)</nidx>
 405 %*                                                                      *
 406 %************************************************************************
 407
 408 There are two types that @ghc@ programs can use to reference
 409 (heap-allocated) objects outside the Haskell world: @Addr@ and
 410 @ForeignObj@.
 411
 412 If you use @Addr@, it is up to you to the programmer to arrange
 413 allocation and deallocation of the objects.
 414
 415 If you use @ForeignObj@, @ghc@'s garbage collector will call upon the
 416 user-supplied <em>finaliser</em> function to free the object when the
 417 Haskell world no longer can access the object.  (An object is
 418 associated with a finaliser function when the abstract
 419  Haskell type @ForeignObj@ is created). The finaliser function is
 420 expressed in C, and is passed as argument the object:
 421
 422 <tscreen><verb>
 423 void foreignFinaliser ( StgForeignObj fo )
 424 </verb></tscreen>
 425
 426 when the Haskell world can no longer access the object.  Since
 427 @ForeignObj@s only get released when a garbage collection occurs, we
 428 provide ways of triggering a garbage collection from within C and from
 429 within Haskell.
 430
 431 <tscreen><verb>
 432 void GarbageCollect()
 433 performGC :: IO ()
 434 </verb></tscreen>
 435
 436 More information on the programmers' interface to @ForeignObj@ can be
 437 found in the library documentation.
 438
 439 %************************************************************************
 440 %*                                                                      *
 441 <sect2>Avoiding monads
 442 <label id="glasgow-avoiding-monads">
 443 <p>
 444 <nidx>C calls to `pure C'</nidx>
 445 <nidx>unsafePerformIO</nidx>
 446 %*                                                                      *
 447 %************************************************************************
 448
 449 The @_ccall_@ construct is part of the @IO@ monad because 9 out of 10
 450 uses will be to call imperative functions with side effects such as
 451 @printf@.  Use of the monad ensures that these operations happen in a
 452 predictable order in spite of laziness and compiler optimisations.
 453
 454 To avoid having to be in the monad to call a C function, it is
 455 possible to use @unsafePerformIO@, which is available from the
 456 @IOExts@ module.  There are three situations where one might like to
 457 call a C function from outside the IO world:
 458
 459 <itemize>
 460 <item>
 461 Calling a function with no side-effects:
 462 <tscreen><verb>
 463 atan2d :: Double -> Double -> Double
 464 atan2d y x = unsafePerformIO (_ccall_ atan2d y x)
 465
 466 sincosd :: Double -> (Double, Double)
 467 sincosd x = unsafePerformIO $ do
 468         da <- newDoubleArray (0, 1)
 469         _casm_ ``sincosd( %0, &((double *)%1[0]), &((double *)%1[1]) );'' x da
 470         s <- readDoubleArray da 0
 471         c <- readDoubleArray da 1
 472         return (s, c)
 473 </verb></tscreen>
 474
 475 <item> Calling a set of functions which have side-effects but which can
 476 be used in a purely functional manner.
 477
 478 For example, an imperative implementation of a purely functional
 479 lookup-table might be accessed using the following functions.
 480
 481 <tscreen><verb>
 482 empty  :: EFS x
 483 update :: EFS x -> Int -> x -> EFS x
 484 lookup :: EFS a -> Int -> a
 485
 486 empty = unsafePerformIO (_ccall_ emptyEFS)
 487
 488 update a i x = unsafePerformIO $
 489         makeStablePtr x         >>= \ stable_x ->
 490         _ccall_ updateEFS a i stable_x
 491
 492 lookup a i = unsafePerformIO $
 493         _ccall_ lookupEFS a i   >>= \ stable_x ->
 494         deRefStablePtr stable_x
 495 </verb></tscreen>
 496
 497 You will almost always want to use @ForeignObj@s with this.
 498
 499 <item> Calling a side-effecting function even though the results will
 500 be unpredictable.  For example the @trace@ function is defined by:
 501
 502 <tscreen><verb>
 503 trace :: String -> a -> a
 504 trace string expr
 505   = unsafePerformIO (
 506         ((_ccall_ PreTraceHook sTDERR{-msg-}):: IO ())  >>
 507         fputs sTDERR string                             >>
 508         ((_ccall_ PostTraceHook sTDERR{-msg-}):: IO ()) >>
 509         return expr )
 510   where
 511     sTDERR = (``stderr'' :: Addr)
 512 </verb></tscreen>
 513
 514 (This kind of use is not highly recommended --- it is only really
 515 useful in debugging code.)
 516 </itemize>
 517
 518 %************************************************************************
 519 %*                                                                      *
 520 <sect2>C-calling ``gotchas'' checklist
 521 <label id="ccall-gotchas">
 522 <p>
 523 <nidx>C call dangers</nidx>
 524 %*                                                                      *
 525 %************************************************************************
 526
 527 And some advice, too.
 528
 529 <itemize>
 530 <item> For modules that use @_ccall_@s, etc., compile with
 531 @-fvia-C@.<nidx>-fvia-C option</nidx> You don't have to, but you should.
 532
 533 Also, use the @-#include "prototypes.h"@ flag (hack) to inform the C
 534 compiler of the fully-prototyped types of all the C functions you
 535 call.  (Section <ref name="Using function headers"
 536 id="glasgow-foreign-headers"> says more about this...)
 537
 538 This scheme is the <em>only</em> way that you will get <em>any</em>
 539 typechecking of your @_ccall_@s.  (It shouldn't be that way, but...).
 540 GHC will pass the flag @-Wimplicit@ to gcc so that you'll get warnings
 541 if any @_ccall_@ed functions have no prototypes.
 542
 543 <item>
 544 Try to avoid @_ccall_@s to C~functions that take @float@
 545 arguments or return @float@ results.  Reason: if you do, you will
 546 become entangled in (ANSI?) C's rules for when arguments/results are
 547 promoted to @doubles@.  It's a nightmare and just not worth it.
 548 Use @doubles@ if possible.
 549
 550 If you do use @floats@, check and re-check that the right thing is
 551 happening.  Perhaps compile with @-keep-hc-file-too@ and look at
 552 the intermediate C (@.hc@ file).
 553
 554 <item> The compiler uses two non-standard type-classes when
 555 type-checking the arguments and results of @_ccall_@: the arguments
 556 (respectively result) of @_ccall_@ must be instances of the class
 557 @CCallable@ (respectively @CReturnable@).  Both classes may be
 558 imported from the module @CCall@, but this should only be
 559 necessary if you want to define a new instance.  (Neither class
 560 defines any methods --- their only function is to keep the
 561 type-checker happy.)
 562
 563 The type checker must be able to figure out just which of the
 564 C-callable/returnable types is being used.  If it can't, you have to
 565 add type signatures. For example,
 566
 567 <tscreen><verb>
 568 f x = _ccall_ foo x
 569 </verb></tscreen>
 570
 571 is not good enough, because the compiler can't work out what type @x@
 572 is, nor what type the @_ccall_@ returns.  You have to write, say:
 573
 574 <tscreen><verb>
 575 f :: Int -> IO Double
 576 f x = _ccall_ foo x
 577 </verb></tscreen>
 578
 579 This table summarises the standard instances of these classes.
 580
 581 % ToDo: check this table against implementation!
 582
 583 <tabular ca="llll">
 584 <bf>Type</bf>       |<bf>CCallable</bf>|<bf>CReturnable</bf> | <bf>Which is probably...</bf> @@
 585
 586 @Char@              | Yes  | Yes   | @unsigned char@ @@
 587 @Int@               | Yes  | Yes   | @long int@ @@
 588 @Word@              | Yes  | Yes   | @unsigned long int@ @@
 589 @Addr@              | Yes  | Yes   | @void *@ @@
 590 @Float@             | Yes  | Yes   | @float@ @@
 591 @Double@            | Yes  | Yes   | @double@ @@
 592 @()@                | No   | Yes   | @void@ @@
 593 @[Char]@            | Yes  | No    | @char *@ (null-terminated) @@
 594
 595 @Array@             | Yes  | No    | @unsigned long *@ @@
 596 @ByteArray@         | Yes  | No    | @unsigned long *@ @@
 597 @MutableArray@      | Yes  | No    | @unsigned long *@ @@
 598 @MutableByteArray@  | Yes  | No    | @unsigned long *@ @@
 599
 600 @State@             | Yes  | Yes   | nothing!@@
 601
 602 @StablePtr@         | Yes  | Yes   | @unsigned long *@ @@
 603 @ForeignObjs@       | Yes  | Yes   | see later @@
 604 </tabular>
 605
 606 Actually, the @Word@ type is defined as being the same size as a
 607 pointer on the target architecture, which is <em>probably</em>
 608 @unsigned long int@.
 609
 610 The brave and careful programmer can add their own instances of these
 611 classes for the following types:
 612
 613 <itemize>
 614 <item>
 615 A <em>boxed-primitive</em> type may be made an instance of both
 616 @CCallable@ and @CReturnable@.
 617
 618 A boxed primitive type is any data type with a
 619 single unary constructor with a single primitive argument.  For
 620 example, the following are all boxed primitive types:
 621
 622 <tscreen><verb>
 623 Int
 624 Double
 625 data XDisplay = XDisplay Addr#
 626 data EFS a = EFS# ForeignObj#
 627 </verb></tscreen>
 628
 629 <tscreen><verb>
 630 instance CCallable   (EFS a)
 631 instance CReturnable (EFS a)
 632 </verb></tscreen>
 633
 634 <item> Any datatype with a single nullary constructor may be made an
 635 instance of @CReturnable@.  For example:
 636
 637 <tscreen><verb>
 638 data MyVoid = MyVoid
 639 instance CReturnable MyVoid
 640 </verb></tscreen>
 641
 642 <item> As at version 2.09, @String@ (i.e., @[Char]@) is still
 643 not a @CReturnable@ type.
 644
 645 Also, the now-builtin type @PackedString@ is neither
 646 @CCallable@ nor @CReturnable@.  (But there are functions in
 647 the PackedString interface to let you get at the necessary bits...)
 648 </itemize>
 649
 650 <item> The code-generator will complain if you attempt to use @%r@ in
 651 a @_casm_@ whose result type is @IO ()@; or if you don't use @%r@
 652 <em>precisely</em> once for any other result type.  These messages are
 653 supposed to be helpful and catch bugs---please tell us if they wreck
 654 your life.
 655
 656 <item> If you call out to C code which may trigger the Haskell garbage
 657 collector or create new threads (examples of this later...), then you
 658 must use the @_ccall_GC_@<nidx>_ccall_GC_ primitive</nidx> or
 659 @_casm_GC_@<nidx>_casm_GC_ primitive</nidx> variant of C-calls.  (This
 660 does not work with the native code generator - use @\fvia-C@.) This
 661 stuff is hairy with a capital H!  </itemize>
 662
 663 <sect1> Multi-parameter type classes
 664 <label id="multi-param-type-classes">
 665 <p>
 666
 667 This section documents GHC's implementation of multi-paramter type
 668 classes.  There's lots of background in the paper <url name="Type
 669 classes: exploring the design space"
 670 url="http://www.dcs.gla.ac.uk/~simonpj/multi.ps.gz"> (Simon Peyton
 671 Jones, Mark Jones, Erik Meijer).
 672
 673 I'd like to thank people who reported shorcomings in the GHC 3.02
 674 implementation.  Our default decisions were all conservative ones, and
 675 the experience of these heroic pioneers has given useful concrete
 676 examples to support several generalisations.  (These appear below as
 677 design choices not implemented in 3.02.)
 678
 679 I've discussed these notes with Mark Jones, and I believe that Hugs
 680 will migrate towards the same design choices as I outline here.
 681 Thanks to him, and to many others who have offered very useful
 682 feedback.
 683
 684 <sect2>Types
 685 <p>
 686
 687 There are the following restrictions on the form of a qualified
 688 type:
 689
 690 <tscreen><verb>
 691   forall tv1..tvn (c1, ...,cn) => type
 692 </verb></tscreen>
 693
 694 (Here, I write the "foralls" explicitly, although the Haskell source
 695 language omits them; in Haskell 1.4, all the free type variables of an
 696 explicit source-language type signature are universally quantified,
 697 except for the class type variables in a class declaration.  However,
 698 in GHC, you can give the foralls if you want.  See Section <ref
 699 name="Explicit universal quantification"
 700 id="universal-quantification">).
 701
 702 <enum>
 703
 704 <item> <bf>Each universally quantified type variable
 705 @tvi@ must be mentioned (i.e. appear free) in @type@</bf>.
 706
 707 The reason for this is that a value with a type that does not obey
 708 this restriction could not be used without introducing
 709 ambiguity. Here, for example, is an illegal type:
 710
 711 <tscreen><verb>
 712   forall a. Eq a => Int
 713 </verb></tscreen>
 714
 715 When a value with this type was used, the constraint <tt>Eq tv</tt>
 716 would be introduced where <tt>tv</tt> is a fresh type variable, and
 717 (in the dictionary-translation implementation) the value would be
 718 applied to a dictionary for <tt>Eq tv</tt>.  The difficulty is that we
 719 can never know which instance of <tt>Eq</tt> to use because we never
 720 get any more information about <tt>tv</tt>.
 721
 722 <item> <bf>Every constraint @ci@ must mention at least one of the
 723 universally quantified type variables @tvi@</bf>.
 724
 725 For example, this type is OK because <tt>C a b</tt> mentions the
 726 universally quantified type variable <tt>b</tt>:
 727
 728 <tscreen><verb>
 729   forall a. C a b => burble
 730 </verb></tscreen>
 731
 732 The next type is illegal because the constraint <tt>Eq b</tt> does not
 733 mention <tt>a</tt>:
 734
 735 <tscreen><verb>
 736   forall a. Eq b => burble
 737 </verb></tscreen>
 738
 739 The reason for this restriction is milder than the other one.  The
 740 excluded types are never useful or necessary (because the offending
 741 context doesn't need to be witnessed at this point; it can be floated
 742 out).  Furthermore, floating them out increases sharing. Lastly,
 743 excluding them is a conservative choice; it leaves a patch of
 744 territory free in case we need it later.
 745
 746 </enum>
 747
 748 These restrictions apply to all types, whether declared in a type signature
 749 or inferred.
 750
 751 Unlike Haskell 1.4, constraints in types do <bf>not</bf> have to be of
 752 the form <em>(class type-variables)</em>.  Thus, these type signatures
 753 are perfectly OK
 754
 755 <tscreen><verb>
 756   f :: Eq (m a) => [m a] -> [m a]
 757   g :: Eq [a] => ...
 758 </verb></tscreen>
 759
 760 This choice recovers principal types, a property that Haskell 1.4 does not have.
 761
 762 <sect2>Class declarations
 763 <p>
 764
 765 <enum>
 766
 767 <item> <bf>Multi-parameter type classes are permitted</bf>. For example:
 768
 769 <tscreen><verb>
 770   class Collection c a where
 771     union :: c a -> c a -> c a
 772     ...etc..
 773 </verb></tscreen>
 774
 775
 776 <item> <bf>The class hierarchy must be acyclic</bf>.  However, the definition
 777 of "acyclic" involves only the superclass relationships.  For example,
 778 this is OK:
 779
 780 <tscreen><verb>
 781   class C a where {
 782     op :: D b => a -> b -> b
 783   }
 784
 785   class C a => D a where { ... }
 786 </verb></tscreen>
 787
 788 Here, <tt>C</tt> is a superclass of <tt>D</tt>, but it's OK for a
 789 class operation <tt>op</tt> of <tt>C</tt> to mention <tt>D</tt>.  (It
 790 would not be OK for <tt>D</tt> to be a superclass of <tt>C</tt>.)
 791
 792 <item> <bf>There are no restrictions on the context in a class declaration
 793 (which introduces superclasses), except that the class hierarchy must
 794 be acyclic</bf>.  So these class declarations are OK:
 795
 796 <tscreen><verb>
 797   class Functor (m k) => FiniteMap m k where
 798     ...
 799
 800   class (Monad m, Monad (t m)) => Transform t m where
 801     lift :: m a -> (t m) a
 802 </verb></tscreen>
 803
 804 <item> <bf>In the signature of a class operation, every constraint
 805 must mention at least one type variable that is not a class type
 806 variable</bf>.
 807
 808 Thus:
 809
 810 <tscreen><verb>
 811   class Collection c a where
 812     mapC :: Collection c b => (a->b) -> c a -> c b
 813 </verb></tscreen>
 814
 815 is OK because the constraint <tt>(Collection a b)</tt> mentions
 816 <tt>b</tt>, even though it also mentions the class variable
 817 <tt>a</tt>.  On the other hand:
 818
 819 <tscreen><verb>
 820   class C a where
 821     op :: Eq a => (a,b) -> (a,b)
 822 </verb></tscreen>
 823
 824 is not OK because the constraint <tt>(Eq a)</tt> mentions on the class
 825 type variable <tt>a</tt>, but not <tt>b</tt>.  However, any such
 826 example is easily fixed by moving the offending context up to the
 827 superclass context:
 828
 829 <tscreen><verb>
 830   class Eq a => C a where
 831     op ::(a,b) -> (a,b)
 832 </verb></tscreen>
 833
 834 A yet more relaxed rule would allow the context of a class-op signature
 835 to mention only class type variables.  However, that conflicts with
 836 Rule 1(b) for types above.
 837
 838 <item> <bf>The type of each class operation must mention <em/all/ of
 839 the class type variables</bf>.  For example:
 840
 841 <tscreen><verb>
 842   class Coll s a where
 843     empty  :: s
 844     insert :: s -> a -> s
 845 </verb></tscreen>
 846
 847 is not OK, because the type of <tt>empty</tt> doesn't mention
 848 <tt>a</tt>.  This rule is a consequence of Rule 1(a), above, for
 849 types, and has the same motivation.
 850
 851 Sometimes, offending class declarations exhibit misunderstandings.  For
 852 example, <tt>Coll</tt> might be rewritten
 853
 854 <tscreen><verb>
 855   class Coll s a where
 856     empty  :: s a
 857     insert :: s a -> a -> s a
 858 </verb></tscreen>
 859
 860 which makes the connection between the type of a collection of
 861 <tt>a</tt>'s (namely <tt>(s a)</tt>) and the element type <tt>a</tt>.
 862 Occasionally this really doesn't work, in which case you can split the
 863 class like this:
 864
 865 <tscreen><verb>
 866   class CollE s where
 867     empty  :: s
 868
 869   class CollE s => Coll s a where
 870     insert :: s -> a -> s
 871 </verb></tscreen>
 872
 873 </enum>
 874
 875 <sect2>Instance declarations
 876 <p>
 877
 878 <enum>
 879
 880 <item> <bf>Instance declarations may not overlap</bf>.  The two instance
 881 declarations
 882
 883 <tscreen><verb>
 884   instance context1 => C type1 where ...
 885   instance context2 => C type2 where ...
 886 </verb></tscreen>
 887
 888 "overlap" if @type1@ and @type2@ unify
 889
 890 However, if you give the command line option
 891 @-fallow-overlapping-instances@<nidx>-fallow-overlapping-instances
 892 option</nidx> then two overlapping instance declarations are permitted
 893 iff
 894
 895 <itemize>
 896 <item> EITHER @type1@ and @type2@ do not unify
 897 <item> OR @type2@ is a substitution instance of @type1@
 898                 (but not identical to @type1@)
 899 <item> OR vice versa
 900 </itemize>
 901
 902 Notice that these rules
 903
 904 <itemize>
 905 <item> make it clear which instance decl to use
 906            (pick the most specific one that matches)
 907
 908 <item> do not mention the contexts @context1@, @context2@
 909             Reason: you can pick which instance decl
 910             "matches" based on the type.
 911 </itemize>
 912
 913 Regrettably, GHC doesn't guarantee to detect overlapping instance
 914 declarations if they appear in different modules.  GHC can "see" the
 915 instance declarations in the transitive closure of all the modules
 916 imported by the one being compiled, so it can "see" all instance decls
 917 when it is compiling <tt>Main</tt>.  However, it currently chooses not
 918 to look at ones that can't possibly be of use in the module currently
 919 being compiled, in the interests of efficiency.  (Perhaps we should
 920 change that decision, at least for <tt>Main</tt>.)
 921
 922 <item> <bf>There are no restrictions on the type in an instance
 923 <em/head/, except that at least one must not be a type variable</bf>.
 924 The instance "head" is the bit after the "=>" in an instance decl. For
 925 example, these are OK:
 926
 927 <tscreen><verb>
 928   instance C Int a where ...
 929
 930   instance D (Int, Int) where ...
 931
 932   instance E [[a]] where ...
 933 </verb></tscreen>
 934
 935 Note that instance heads <bf>may</bf> contain repeated type variables.
 936 For example, this is OK:
 937
 938 <tscreen><verb>
 939   instance Stateful (ST s) (MutVar s) where ...
 940 </verb></tscreen>
 941
 942 The "at least one not a type variable" restriction is to ensure that
 943 context reduction terminates: each reduction step removes one type
 944 constructor.  For example, the following would make the type checker
 945 loop if it wasn't excluded:
 946
 947 <tscreen><verb>
 948   instance C a => C a where ...
 949 </verb></tscreen>
 950
 951 There are two situations in which the rule is a bit of a pain. First,
 952 if one allows overlapping instance declarations then it's quite
 953 convenient to have a "default instance" declaration that applies if
 954 something more specific does not:
 955
 956 <tscreen><verb>
 957   instance C a where
 958     op = ... -- Default
 959 </verb></tscreen>
 960
 961 Second, sometimes you might want to use the following to get the
 962 effect of a "class synonym":
 963
 964 <tscreen><verb>
 965   class (C1 a, C2 a, C3 a) => C a where { }
 966
 967   instance (C1 a, C2 a, C3 a) => C a where { }
 968 </verb></tscreen>
 969
 970 This allows you to write shorter signatures:
 971
 972 <tscreen><verb>
 973   f :: C a => ...
 974 </verb></tscreen>
 975
 976 instead of
 977
 978 <tscreen><verb>
 979   f :: (C1 a, C2 a, C3 a) => ...
 980 </verb></tscreen>
 981
 982 I'm on the lookout for a simple rule that preserves decidability while
 983 allowing these idioms.  The experimental flag
 984 @-fallow-undecidable-instances@<nidx>-fallow-undecidable-instances
 985 option</nidx> lifts this restriction, allowing all the types in an
 986 instance head to be type variables.
 987
 988 <item> <bf>Unlike Haskell 1.4, instance heads may use type
 989 synonyms</bf>.  As always, using a type synonym is just shorthand for
 990 writing the RHS of the type synonym definition.  For example:
 991
 992 <tscreen><verb>
 993   type Point = (Int,Int)
 994   instance C Point   where ...
 995   instance C [Point] where ...
 996 </verb></tscreen>
 997
 998 is legal.  However, if you added
 999
1000 <tscreen><verb>
1001   instance C (Int,Int) where ...
1002 </verb></tscreen>
1003
1004 as well, then the compiler will complain about the overlapping
1005 (actually, identical) instance declarations.  As always, type synonyms
1006 must be fully applied.  You cannot, for example, write:
1007
1008 <tscreen><verb>
1009   type P a = [[a]]
1010   instance Monad P where ...
1011 </verb></tscreen>
1012
1013 This design decision is independent of all the others, and easily
1014 reversed, but it makes sense to me.
1015
1016 <item><bf>The types in an instance-declaration <em/context/ must all
1017 be type variables</bf>. Thus
1018
1019 <tscreen><verb>
1020   instance C a b => Eq (a,b) where ...
1021 </verb></tscreen>
1022
1023 is OK, but
1024
1025 <tscreen><verb>
1026   instance C Int b => Foo b where ...
1027 </verb></tscreen>
1028
1029 is not OK.  Again, the intent here is to make sure that context
1030 reduction terminates.
1031
1032 Voluminous correspondence on the Haskell mailing list has convinced me
1033 that it's worth experimenting with a more liberal rule.  If you use
1034 the flag <tt>-fallow-undecidable-instances</tt> you can use arbitrary
1035 types in an instance context.  Termination is ensured by having a
1036 fixed-depth recursion stack.  If you exceed the stack depth you get a
1037 sort of backtrace, and the opportunity to increase the stack depth
1038 with <tt>-fcontext-stack</tt><em/N/.
1039
1040 </enum>
1041
1042 % -----------------------------------------------------------------------------
1043 <sect1>Explicit universal quantification
1044 <label id="universal-quantification">
1045 <p>
1046
1047 GHC now allows you to write explicitly quantified types.  GHC's
1048 syntax for this now agrees with Hugs's, namely:
1049
1050 <tscreen><verb>
1051         forall a b. (Ord a, Eq  b) => a -> b -> a
1052 </verb></tscreen>
1053
1054 The context is, of course, optional.  You can't use <tt>forall</tt> as
1055 a type variable any more!
1056
1057 Haskell type signatures are implicitly quantified.  The <tt>forall</tt>
1058 allows us to say exactly what this means.  For example:
1059
1060 <tscreen><verb>
1061         g :: b -> b
1062 </verb></tscreen>
1063
1064 means this:
1065
1066 <tscreen><verb>
1067         g :: forall b. (b -> b)
1068 </verb></tscreen>
1069
1070 The two are treated identically.
1071
1072 <sect2>Universally-quantified data type fields
1073 <label id="univ">
1074 <p>
1075
1076 In a <tt>data</tt> or <tt>newtype</tt> declaration one can quantify
1077 the types of the constructor arguments.  Here are several examples:
1078
1079 <tscreen><verb>
1080 data T a = T1 (forall b. b -> b -> b) a
1081
1082 data MonadT m = MkMonad { return :: forall a. a -> m a,
1083                           bind   :: forall a b. m a -> (a -> m b) -> m b
1084                         }
1085
1086 newtype Swizzle = MkSwizzle (Ord a => [a] -> [a])
1087 </verb></tscreen>
1088
1089 The constructors now have so-called <em/rank 2/ polymorphic
1090 types, in which there is a for-all in the argument types.:
1091
1092 <tscreen><verb>
1093 T1 :: forall a. (forall b. b -> b -> b) -> a -> T1 a
1094 MkMonad :: forall m. (forall a. a -> m a)
1095                   -> (forall a b. m a -> (a -> m b) -> m b)
1096                   -> MonadT m
1097 MkSwizzle :: (Ord a => [a] -> [a]) -> Swizzle
1098 </verb></tscreen>
1099
1100 Notice that you don't need to use a <tt>forall</tt> if there's an
1101 explicit context.  For example in the first argument of the
1102 constructor <tt>MkSwizzle</tt>, an implicit "<tt>forall a.</tt>" is
1103 prefixed to the argument type.  The implicit <tt>forall</tt>
1104 quantifies all type variables that are not already in scope, and are
1105 mentioned in the type quantified over.
1106
1107 As for type signatures, implicit quantification happens for non-overloaded
1108 types too.  So if you write this:
1109 <tscreen><verb>
1110   data T a = MkT (Either a b) (b -> b)
1111 </verb></tscreen>
1112 it's just as if you had written this:
1113 <tscreen><verb>
1114   data T a = MkT (forall b. Either a b) (forall b. b -> b)
1115 </verb></tscreen>
1116 That is, since the type variable <tt>b</tt> isn't in scope, it's
1117 implicitly universally quantified.  (Arguably, it would be better
1118 to <em>require</em> explicit quantification on constructor arguments
1119 where that is what is wanted.  Feedback welcomed.)
1120
1121 <sect2> Construction
1122 <p>
1123
1124 You construct values of types <tt>T1, MonadT, Swizzle</tt> by applying
1125 the constructor to suitable values, just as usual.  For example,
1126
1127 <tscreen><verb>
1128 (T1 (\xy->x) 3) :: T Int
1129
1130 (MkSwizzle sort)    :: Swizzle
1131 (MkSwizzle reverse) :: Swizzle
1132
1133 (let r x = Just x
1134      b m k = case m of
1135                 Just y -> k y
1136                 Nothing -> Nothing
1137   in
1138   MkMonad r b) :: MonadT Maybe
1139 </verb></tscreen>
1140
1141 The type of the argument can, as usual, be more general than the type
1142 required, as <tt>(MkSwizzle reverse)</tt> shows.  (<tt>reverse</tt>
1143 does not need the <tt>Ord</tt> constraint.)
1144
1145 <sect2>Pattern matching
1146 <p>
1147
1148 When you use pattern matching, the bound variables may now have
1149 polymorphic types.  For example:
1150
1151 <tscreen><verb>
1152         f :: T a -> a -> (a, Char)
1153         f (T1 f k) x = (f k x, f 'c' 'd')
1154
1155         g :: (Ord a, Ord b) => Swizzle -> [a] -> (a -> b) -> [b]
1156         g (MkSwizzle s) xs f = s (map f (s xs))
1157
1158         h :: MonadT m -> [m a] -> m [a]
1159         h m [] = return m []
1160         h m (x:xs) = bind m x           $ \y ->
1161                       bind m (h m xs)   $ \ys ->
1162                       return m (y:ys)
1163 </verb></tscreen>
1164
1165 In the function <tt>h</tt> we use the record selectors <tt>return</tt>
1166 and <tt>bind</tt> to extract the polymorphic bind and return functions
1167 from the <tt>MonadT</tt> data structure, rather than using pattern
1168 matching.
1169
1170 <sect2>The partial-application restriction
1171 <p>
1172
1173 There is really only one way in which data structures with polymorphic
1174 components might surprise you: you must not partially apply them.
1175 For example, this is illegal:
1176
1177 <tscreen><verb>
1178         map MkSwizzle [sort, reverse]
1179 </verb></tscreen>
1180
1181 The restriction is this: <em>every subexpression of the program must
1182 have a type that has no for-alls, except that in a function
1183 application (f e1 ... en) the partial applications are not subject to
1184 this rule</em>.  The restriction makes type inference feasible.
1185
1186 In the illegal example, the sub-expression <tt>MkSwizzle</tt> has the
1187 polymorphic type <tt>(Ord b => [b] -> [b]) -> Swizzle</tt> and is not
1188 a sub-expression of an enclosing application.  On the other hand, this
1189 expression is OK:
1190
1191 <tscreen><verb>
1192         map (T1 (\a b -> a)) [1,2,3]
1193 </verb></tscreen>
1194
1195 even though it involves a partial application of <tt>T1</tt>, because
1196 the sub-expression <tt>T1 (\a b -> a)</tt> has type <tt>Int -> T
1197 Int</tt>.
1198
1199 <sect2>Type signatures
1200 <label id="sigs">
1201 <p>
1202
1203 Once you have data constructors with universally-quantified fields, or
1204 constants such as <tt>runST</tt> that have rank-2 types, it isn't long
1205 before you discover that you need more!  Consider:
1206
1207 <tscreen><verb>
1208   mkTs f x y = [T1 f x, T1 f y]
1209 </verb></tscreen>
1210
1211 <tt>mkTs</tt> is a fuction that constructs some values of type
1212 <tt>T</tt>, using some pieces passed to it.  The trouble is that since
1213 <tt>f</tt> is a function argument, Haskell assumes that it is
1214 monomorphic, so we'll get a type error when applying <tt>T1</tt> to
1215 it.  This is a rather silly example, but the problem really bites in
1216 practice.  Lots of people trip over the fact that you can't make
1217 "wrappers functions" for <tt>runST</tt> for exactly the same reason.
1218 In short, it is impossible to build abstractions around functions with
1219 rank-2 types.
1220
1221 The solution is fairly clear.  We provide the ability to give a rank-2
1222 type signature for <em>ordinary</em> functions (not only data
1223 constructors), thus:
1224
1225 <tscreen><verb>
1226   mkTs :: (forall b. b -> b -> b) -> a -> [T a]
1227   mkTs f x y = [T1 f x, T1 f y]
1228 </verb></tscreen>
1229
1230 This type signature tells the compiler to attribute <tt>f</tt> with
1231 the polymorphic type <tt>(forall b. b -> b -> b)</tt> when type
1232 checking the body of <tt>mkTs</tt>, so now the application of
1233 <tt>T1</tt> is fine.
1234
1235 There are two restrictions:
1236
1237 <itemize>
1238 <item> You can only define a rank 2 type, specified by the following
1239 grammar:
1240
1241 <tscreen><verb>
1242    rank2type ::= [forall tyvars .] [context =>] funty
1243    funty     ::= ([forall tyvars .] [context =>] ty) -> funty
1244                | ty
1245    ty        ::= ...current Haskell monotype syntax...
1246 </verb></tscreen>
1247
1248 Informally, the universal quantification must all be right at the beginning,
1249 or at the top level of a function argument.
1250
1251 <item> There is a restriction on the definition of a function whose
1252 type signature is a rank-2 type: the polymorphic arguments must be
1253 matched on the left hand side of the "<tt>=</tt>" sign.  You can't
1254 define <tt>mkTs</tt> like this:
1255
1256 <tscreen><verb>
1257   mkTs :: (forall b. b -> b -> b) -> a -> [T a]
1258   mkTs = \ f x y -> [T1 f x, T1 f y]
1259 </verb></tscreen>
1260
1261
1262 The same partial-application rule applies to ordinary functions with
1263 rank-2 types as applied to data constructors.
1264
1265 </itemize>
1266
1267 % -----------------------------------------------------------------------------
1268 <sect1>Existentially quantified data constructors
1269 <label id="existential-quantification">
1270 <p>
1271
1272 The idea of using existential quantification in data type declarations
1273 was suggested by Laufer (I believe, thought doubtless someone will
1274 correct me), and implemented in Hope+. It's been in Lennart
1275 Augustsson's <tt>hbc</tt> Haskell compiler for several years, and
1276 proved very useful.  Here's the idea.  Consider the declaration:
1277
1278 <tscreen><verb>
1279   data Foo = forall a. MkFoo a (a -> Bool)
1280            | Nil
1281 </verb></tscreen>
1282
1283 The data type <tt>Foo</tt> has two constructors with types:
1284
1285 <tscreen><verb>
1286   MkFoo :: forall a. a -> (a -> Bool) -> Foo
1287   Nil   :: Foo
1288 </verb></tscreen>
1289
1290 Notice that the type variable <tt>a</tt> in the type of <tt>MkFoo</tt>
1291 does not appear in the data type itself, which is plain <tt>Foo</tt>.
1292 For example, the following expression is fine:
1293
1294 <tscreen><verb>
1295   [MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]
1296 </verb></tscreen>
1297
1298 Here, <tt>(MkFoo 3 even)</tt> packages an integer with a function
1299 <tt>even</tt> that maps an integer to <tt>Bool</tt>; and <tt>MkFoo 'c'
1300 isUpper</tt> packages a character with a compatible function.  These
1301 two things are each of type <tt>Foo</tt> and can be put in a list.
1302
1303 What can we do with a value of type <tt>Foo</tt>?.  In particular,
1304 what happens when we pattern-match on <tt>MkFoo</tt>?
1305
1306 <tscreen><verb>
1307   f (MkFoo val fn) = ???
1308 </verb></tscreen>
1309
1310 Since all we know about <tt>val</tt> and <tt>fn</tt> is that they
1311 are compatible, the only (useful) thing we can do with them is to
1312 apply <tt>fn</tt> to <tt>val</tt> to get a boolean.  For example:
1313
1314 <tscreen><verb>
1315   f :: Foo -> Bool
1316   f (MkFoo val fn) = fn val
1317 </verb></tscreen>
1318
1319 What this allows us to do is to package heterogenous values
1320 together with a bunch of functions that manipulate them, and then treat
1321 that collection of packages in a uniform manner.  You can express
1322 quite a bit of object-oriented-like programming this way.
1323
1324 <sect2>Why existential?
1325 <label id="existential">
1326 <p>
1327
1328 What has this to do with <em>existential</em> quantification?
1329 Simply that <tt>MkFoo</tt> has the (nearly) isomorphic type
1330
1331 <tscreen><verb>
1332   MkFoo :: (exists a . (a, a -> Bool)) -> Foo
1333 </verb></tscreen>
1334
1335 But Haskell programmers can safely think of the ordinary
1336 <em>universally</em> quantified type given above, thereby avoiding
1337 adding a new existential quantification construct.
1338
1339 <sect2>Type classes
1340 <p>
1341
1342 An easy extension (implemented in <tt>hbc</tt>) is to allow
1343 arbitrary contexts before the constructor.  For example:
1344
1345 <tscreen><verb>
1346   data Baz = forall a. Eq a => Baz1 a a
1347            | forall b. Show b => Baz2 b (b -> b)
1348 </verb></tscreen>
1349
1350 The two constructors have the types you'd expect:
1351
1352 <tscreen><verb>
1353   Baz1 :: forall a. Eq a => a -> a -> Baz
1354   Baz2 :: forall b. Show b => b -> (b -> b) -> Baz
1355 </verb></tscreen>
1356
1357 But when pattern matching on <tt>Baz1</tt> the matched values can be compared
1358 for equality, and when pattern matching on <tt>Baz2</tt> the first matched
1359 value can be converted to a string (as well as applying the function to it).
1360 So this program is legal:
1361
1362 <tscreen><verb>
1363   f :: Baz -> String
1364   f (Baz1 p q) | p == q    = "Yes"
1365                | otherwise = "No"
1366   f (Baz1 v fn)            = show (fn v)
1367 </verb></tscreen>
1368
1369 Operationally, in a dictionary-passing implementation, the
1370 constructors <tt>Baz1</tt> and <tt>Baz2</tt> must store the
1371 dictionaries for <tt>Eq</tt> and <tt>Show</tt> respectively, and
1372 extract it on pattern matching.
1373
1374 Notice the way that the syntax fits smoothly with that used for
1375 universal quantification earlier.
1376
1377 <sect2>Restrictions
1378 <p>
1379
1380 There are several restrictions on the ways in which existentially-quantified
1381 constructors can be use.
1382
1383 <itemize>
1384
1385 <item> When pattern matching, each pattern match introduces a new,
1386 distinct, type for each existential type variable.  These types cannot
1387 be unified with any other type, nor can they escape from the scope of
1388 the pattern match.  For example, these fragments are incorrect:
1389
1390 <tscreen><verb>
1391   f1 (MkFoo a f) = a
1392 </verb></tscreen>
1393
1394 Here, the type bound by <tt>MkFoo</tt> "escapes", because <tt>a</tt>
1395 is the result of <tt>f1</tt>.  One way to see why this is wrong is to
1396 ask what type <tt>f1</tt> has:
1397
1398 <tscreen><verb>
1399   f1 :: Foo -> a             -- Weird!
1400 </verb></tscreen>
1401
1402 What is this "<tt>a</tt>" in the result type? Clearly we don't mean
1403 this:
1404
1405 <tscreen><verb>
1406   f1 :: forall a. Foo -> a   -- Wrong!
1407 </verb></tscreen>
1408
1409 The original program is just plain wrong.  Here's another sort of error
1410
1411 <tscreen><verb>
1412   f2 (Baz1 a b) (Baz1 p q) = a==q
1413 </verb></tscreen>
1414
1415 It's ok to say <tt>a==b</tt> or <tt>p==q</tt>, but
1416 <tt>a==q</tt> is wrong because it equates the two distinct types arising
1417 from the two <tt>Baz1</tt> constructors.
1418
1419
1420 <item>You can't pattern-match on an existentially quantified
1421 constructor in a <tt>let</tt> or <tt>where</tt> group of
1422 bindings. So this is illegal:
1423
1424 <tscreen><verb>
1425   f3 x = a==b where { Baz1 a b = x }
1426 </verb></tscreen>
1427
1428 You can only pattern-match
1429 on an existentially-quantified constructor in a <tt>case</tt> expression or
1430 in the patterns of a function definition.
1431
1432 The reason for this restriction is really an implementation one.
1433 Type-checking binding groups is already a nightmare without
1434 existentials complicating the picture.  Also an existential pattern
1435 binding at the top level of a module doesn't make sense, because it's
1436 not clear how to prevent the existentially-quantified type "escaping".
1437 So for now, there's a simple-to-state restriction.  We'll see how
1438 annoying it is.
1439
1440 <item>You can't use existential quantification for <tt>newtype</tt>
1441 declarations.  So this is illegal:
1442
1443 <tscreen><verb>
1444   newtype T = forall a. Ord a => MkT a
1445 </verb></tscreen>
1446
1447 Reason: a value of type <tt>T</tt> must be represented as a pair
1448 of a dictionary for <tt>Ord t</tt> and a value of type <tt>t</tt>.
1449 That contradicts the idea that <tt>newtype</tt> should have no
1450 concrete representation.  You can get just the same efficiency and effect
1451 by using <tt>data</tt> instead of <tt>newtype</tt>.  If there is no
1452 overloading involved, then there is more of a case for allowing
1453 an existentially-quantified <tt>newtype</tt>, because the <tt>data</tt>
1454 because the <tt>data</tt> version does carry an implementation cost,
1455 but single-field existentially quantified constructors aren't much
1456 use.  So the simple restriction (no existential stuff on <tt>newtype</tt>)
1457 stands, unless there are convincing reasons to change it.
1458 </itemize>
1459
1460 % -----------------------------------------------------------------------------
1461 <sect1>Scoped Type Variables
1462 <label id="scoped-type-variables">
1463 <p>
1464
1465 A <em/pattern type signature/ can introduce a <em/scoped type
1466 variable/.  For example
1467
1468 <tscreen><verb>
1469 f (xs::[a]) = ys ++ ys
1470            where
1471               ys :: [a]
1472               ys = reverse xs
1473 </verb></tscreen>
1474
1475 The pattern @(xs::[a])@ includes a type signature for @xs@.
1476 This brings the type variable @a@ into scope; it scopes over
1477 all the patterns and right hand sides for this equation for @f@.
1478 In particular, it is in scope at the type signature for @y@.
1479
1480 At ordinary type signatures, such as that for @ys@, any type variables
1481 mentioned in the type signature <em/that are not in scope/ are
1482 implicitly universally quantified.  (If there are no type variables in
1483 scope, all type variables mentioned in the signature are universally
1484 quantified, which is just as in Haskell 98.)  In this case, since @a@
1485 is in scope, it is not universally quantified, so the type of @ys@ is
1486 the same as that of @xs@.  In Haskell 98 it is not possible to declare
1487 a type for @ys@; a major benefit of scoped type variables is that
1488 it becomes possible to do so.
1489
1490 Scoped type variables are implemented in both GHC and Hugs.  Where the
1491 implementations differ from the specification below, those differences
1492 are noted.
1493
1494 So much for the basic idea.  Here are the details.
1495
1496 <sect2>Scope and implicit quantification
1497 <p>
1498
1499 <itemize>
1500 <item> All the type variables mentioned in the patterns for a single
1501 function definition equation, that are not already in scope,
1502 are brought into scope by the patterns.  We describe this set as
1503 the <em/type variables bound by the equation/.
1504
1505 <item> The type variables thus brought into scope may be mentioned
1506 in ordinary type signatures or pattern type signatures anywhere within
1507 their scope.
1508
1509 <item> In ordinary type signatures, any type variable mentioned in the
1510 signature that is in scope is <em/not/ universally quantified.
1511
1512 <item> Ordinary type signatures do not bring any new type variables
1513 into scope (except in the type signature itself!). So this is illegal:
1514
1515 <tscreen><verb>
1516   f :: a -> a
1517   f x = x::a
1518 </verb></tscreen>
1519
1520 It's illegal because @a@ is not in scope in the body of @f@,
1521 so the ordinary signature @x::a@ is equivalent to @x::forall a.a@;
1522 and that is an incorrect typing.
1523
1524 <item> There is no implicit universal quantification on pattern type
1525 signatures, nor may one write an explicit @forall@ type in a pattern
1526 type signature.  The pattern type signature is a monotype.
1527
1528 <item>
1529 The type variables in the head of a @class@ or @instance@ declaration
1530 scope over the methods defined in the @where@ part.  For example:
1531
1532 <tscreen><verb>
1533   class C a where
1534     op :: [a] -> a
1535
1536     op xs = let ys::[a]
1537                 ys = reverse xs
1538             in
1539             head ys
1540 </verb></tscreen>
1541
1542 (Not implemented in Hugs yet, Dec 98).
1543 </itemize>
1544
1545 <sect2>Polymorphism
1546 <p>
1547
1548 <itemize>
1549 <item> Pattern type signatures are completely orthogonal to ordinary, separate
1550 type signatures.  The two can be used independently or together.  There is
1551 no scoping associated with the names of the type variables in a separate type signature.
1552
1553 <tscreen><verb>
1554    f :: [a] -> [a]
1555    f (xs::[b]) = reverse xs
1556 </verb></tscreen>
1557
1558 <item> The function must be polymorphic in the type variables
1559 bound by all its equations.  Operationally, the type variables bound
1560 by one equation must not:
1561
1562 <itemize>
1563 <item> Be unified with a type (such as @Int@, or @[a]@).
1564 <item> Be unified with a type variable free in the environment.
1565 <item> Be unified with each other.  (They may unify with the type variables
1566 bound by another equation for the same function, of course.)
1567 </itemize>
1568
1569 For example, the following all fail to type check:
1570
1571 <tscreen><verb>
1572   f (x::a) (y::b) = [x,y]       -- a unifies with b
1573
1574   g (x::a) = x + 1::Int         -- a unifies with Int
1575
1576   h x = let k (y::a) = [x,y]    -- a is free in the
1577         in k x                  -- environment
1578
1579   k (x::a) True    = ...        -- a unifies with Int
1580   k (x::Int) False = ...
1581
1582   w :: [b] -> [b]
1583   w (x::a) = x                  -- a unifies with [b]
1584 </verb></tscreen>
1585
1586 <item> The pattern-bound type variable may, however, be constrained
1587 by the context of the principal type, thus:
1588
1589 <tscreen><verb>
1590   f (x::a) (y::a) = x+y*2
1591 </verb></tscreen>
1592
1593 gets the inferred type: @forall a. Num a => a -> a -> a@.
1594 </itemize>
1595
1596 <sect2>Result type signatures
1597 <p>
1598
1599 <itemize>
1600 <item> The result type of a function can be given a signature,
1601 thus:
1602
1603 <tscreen><verb>
1604   f (x::a) :: [a] = [x,x,x]
1605 </verb></tscreen>
1606
1607 The final @":: [a]"@ after all the patterns gives a signature to the
1608 result type.  Sometimes this is the only way of naming the type variable
1609 you want:
1610
1611 <tscreen><verb>
1612   f :: Int -> [a] -> [a]
1613   f n :: ([a] -> [a]) = let g (x::a, y::a) = (y,x)
1614                         in \xs -> map g (reverse xs `zip` xs)
1615 </verb></tscreen>
1616
1617 </itemize>
1618
1619 Result type signatures are not yet implemented in Hugs.
1620
1621 <sect2>Pattern signatures on other constructs
1622 <p>
1623
1624 <itemize>
1625 <item> A pattern type signature can be on an arbitrary sub-pattern, not
1626 just on a variable:
1627
1628 <tscreen><verb>
1629   f ((x,y)::(a,b)) = (y,x) :: (b,a)
1630 </verb></tscreen>
1631
1632 <item> Pattern type signatures, including the result part, can be used
1633 in lambda abstractions:
1634
1635 <tscreen><verb>
1636   (\ (x::a, y) :: a -> x)
1637 </verb></tscreen>
1638
1639 Type variables bound by these patterns must be polymorphic in
1640 the sense defined above.
1641 For example:
1642
1643 <tscreen><verb>
1644   f1 (x::c) = f1 x      -- ok
1645   f2 = \(x::c) -> f2 x  -- not ok
1646 </verb></tscreen>
1647
1648 Here, @f1@ is OK, but @f2@ is not, because @c@ gets unified
1649 with a type variable free in the environment, in this
1650 case, the type of @f2@, which is in the environment when
1651 the lambda abstraction is checked.
1652
1653 <item> Pattern type signatures, including the result part, can be used
1654 in @case@ expressions:
1655
1656 <tscreen><verb>
1657   case e of { (x::a, y) :: a -> x }
1658 </verb></tscreen>
1659
1660 The pattern-bound type variables must, as usual,
1661 be polymorphic in the following sense: each case alternative,
1662 considered as a lambda abstraction, must be polymorphic.
1663 Thus this is OK:
1664
1665 <tscreen><verb>
1666   case (True,False) of { (x::a, y) -> x }
1667 </verb></tscreen>
1668
1669 Even though the context is that of a pair of booleans,
1670 the alternative itself is polymorphic.  Of course, it is
1671 also OK to say:
1672
1673 <tscreen><verb>
1674   case (True,False) of { (x::Bool, y) -> x }
1675 </verb></tscreen>
1676
1677 <item>
1678 To avoid ambiguity, the type after the ``@::@'' in a result
1679 pattern signature on a lambda or @case@ must be atomic (i.e. a single
1680 token or a parenthesised type of some sort).  To see why,
1681 consider how one would parse this:
1682
1683 <tscreen><verb>
1684   \ x :: a -> b -> x
1685 </verb></tscreen>
1686
1687 <item> Pattern type signatures that bind new type variables
1688 may not be used in pattern bindings at all.
1689 So this is illegal:
1690
1691 <tscreen><verb>
1692   f x = let (y, z::a) = x in ...
1693 </verb></tscreen>
1694
1695 But these are OK, because they do not bind fresh type variables:
1696
1697 <tscreen><verb>
1698   f1 x            = let (y, z::Int) = x in ...
1699   f2 (x::(Int,a)) = let (y, z::a)   = x in ...
1700 </verb></tscreen>
1701
1702 However a single variable is considered a degenerate function binding,
1703 rather than a degerate pattern binding, so this is permitted, even
1704 though it binds a type variable:
1705
1706 <tscreen><verb>
1707   f :: (b->b) = \(x::b) -> x
1708 </verb></tscreen>
1709
1710 </itemize>
1711 Such degnerate function bindings do not fall under the monomorphism
1712 restriction.  Thus:
1713
1714 <tscreen><verb>
1715   g :: a -> a -> Bool = \x y. x==y
1716 </verb></tscreen>
1717
1718 Here @g@ has type @forall a. Eq a => a -> a -> Bool@, just as if
1719 @g@ had a separate type signature.  Lacking a type signature, @g@
1720 would get a monomorphic type.
1721
1722 <sect2>Existentials
1723 <p>
1724
1725 <itemize>
1726 <item> Pattern type signatures can bind existential type variables.
1727 For example:
1728
1729 <tscreen><verb>
1730   data T = forall a. MkT [a]
1731
1732   f :: T -> T
1733   f (MkT [t::a]) = MkT t3
1734                  where
1735                    t3::[a] = [t,t,t]
1736 </verb></tscreen>
1737
1738 </itemize>
1739
1740 %-----------------------------------------------------------------------------
1741 <sect1>Pragmas
1742 <label id="pragmas">
1743 <p>
1744
1745 GHC supports several pragmas, or instructions to the compiler placed
1746 in the source code.  Pragmas don't affect the meaning of the program,
1747 but they might affect the efficiency of the generated code.
1748
1749 <sect2>INLINE pragma
1750 <label id="inline-pragma">
1751 <nidx>INLINE pragma</nidx>
1752 <nidx>pragma, INLINE</nidx>
1753 <p>
1754
1755 GHC (with @-O@, as always) tries to inline (or ``unfold'')
1756 functions/values that are ``small enough,'' thus avoiding the call
1757 overhead and possibly exposing other more-wonderful optimisations.
1758
1759 You will probably see these unfoldings (in Core syntax) in your
1760 interface files.
1761
1762 Normally, if GHC decides a function is ``too expensive'' to inline, it
1763 will not do so, nor will it export that unfolding for other modules to
1764 use.
1765
1766 The sledgehammer you can bring to bear is the
1767 @INLINE@<nidx>INLINE pragma</nidx> pragma, used thusly:
1768 <tscreen><verb>
1769 key_function :: Int -> String -> (Bool, Double)
1770
1771 #ifdef __GLASGOW_HASKELL__
1772 {-# INLINE key_function #-}
1773 #endif
1774 </verb></tscreen>
1775 (You don't need to do the C pre-processor carry-on unless you're going
1776 to stick the code through HBC---it doesn't like @INLINE@ pragmas.)
1777
1778 The major effect of an @INLINE@ pragma is to declare a function's
1779 ``cost'' to be very low.  The normal unfolding machinery will then be
1780 very keen to inline it.
1781
1782 An @INLINE@ pragma for a function can be put anywhere its type
1783 signature could be put.
1784
1785 @INLINE@ pragmas are a particularly good idea for the
1786 @then@/@return@ (or @bind@/@unit@) functions in a monad.
1787 For example, in GHC's own @UniqueSupply@ monad code, we have:
1788 <tscreen><verb>
1789 #ifdef __GLASGOW_HASKELL__
1790 {-# INLINE thenUs #-}
1791 {-# INLINE returnUs #-}
1792 #endif
1793 </verb></tscreen>
1794
1795 <sect2>NOINLINE pragma
1796 <label id="noinline-pragma">
1797 <p>
1798 <nidx>NOINLINE pragma</nidx>
1799 <nidx>pragma, NOINLINE</nidx>
1800
1801 The @NOINLINE@ pragma does exactly what you'd expect: it stops the
1802 named function from being inlined by the compiler.  You shouldn't ever
1803 need to do this, unless you're very cautious about code size.
1804
1805 <sect2>SPECIALIZE pragma
1806 <label id="specialize-pragma">
1807 <p>
1808 <nidx>SPECIALIZE pragma</nidx>
1809 <nidx>pragma, SPECIALIZE</nidx>
1810 <nidx>overloading, death to</nidx>
1811
1812 (UK spelling also accepted.)  For key overloaded functions, you can
1813 create extra versions (NB: more code space) specialised to particular
1814 types.  Thus, if you have an overloaded function:
1815
1816 <tscreen><verb>
1817 hammeredLookup :: Ord key => [(key, value)] -> key -> value
1818 </verb></tscreen>
1819
1820 If it is heavily used on lists with @Widget@ keys, you could
1821 specialise it as follows:
1822 <tscreen><verb>
1823 {-# SPECIALIZE hammeredLookup :: [(Widget, value)] -> Widget -> value #-}
1824 </verb></tscreen>
1825
1826 To get very fancy, you can also specify a named function to use for
1827 the specialised value, by adding @= blah@, as in:
1828 <tscreen><verb>
1829 {-# SPECIALIZE hammeredLookup :: ...as before... = blah #-}
1830 </verb></tscreen>
1831 It's <em>Your Responsibility</em> to make sure that @blah@ really
1832 behaves as a specialised version of @hammeredLookup@!!!
1833
1834 NOTE: the @=blah@ feature isn't implemented in GHC 4.xx.
1835
1836 An example in which the @= blah@ form will Win Big:
1837 <tscreen><verb>
1838 toDouble :: Real a => a -> Double
1839 toDouble = fromRational . toRational
1840
1841 {-# SPECIALIZE toDouble :: Int -> Double = i2d #-}
1842 i2d (I# i) = D# (int2Double# i) -- uses Glasgow prim-op directly
1843 </verb></tscreen>
1844 The @i2d@ function is virtually one machine instruction; the
1845 default conversion---via an intermediate @Rational@---is obscenely
1846 expensive by comparison.
1847
1848 By using the US spelling, your @SPECIALIZE@ pragma will work with
1849 HBC, too.  Note that HBC doesn't support the @= blah@ form.
1850
1851 A @SPECIALIZE@ pragma for a function can be put anywhere its type
1852 signature could be put.
1853
1854 <sect2>SPECIALIZE instance pragma
1855 <label id="specialize-instance-pragma">
1856 <p>
1857 <nidx>SPECIALIZE pragma</nidx>
1858 <nidx>overloading, death to</nidx>
1859 Same idea, except for instance declarations.  For example:
1860 <tscreen><verb>
1861 instance (Eq a) => Eq (Foo a) where { ... usual stuff ... }
1862
1863 {-# SPECIALIZE instance Eq (Foo [(Int, Bar)] #-}
1864 </verb></tscreen>
1865 Compatible with HBC, by the way.
1866
1867 <sect2>LINE pragma
1868 <label id="line-pragma">
1869 <p>
1870 <nidx>LINE pragma</nidx>
1871 <nidx>pragma, LINE</nidx>
1872
1873 This pragma is similar to C's @#line@ pragma, and is mainly for use in
1874 automatically generated Haskell code.  It lets you specify the line
1875 number and filename of the original code; for example
1876
1877 <tscreen><verb>
1878 {-# LINE 42 "Foo.vhs" #-}
1879 </verb></tscreen>
1880
1881 if you'd generated the current file from something called @Foo.vhs@
1882 and this line corresponds to line 42 in the original.  GHC will adjust
1883 its error messages to refer to the line/file named in the @LINE@
1884 pragma.