ghc/docs/users_guide/glasgow_exts.vsgml

   1 %
   2 % $Id: glasgow_exts.vsgml,v 1.3 1998/12/02 13:20:38 simonm Exp $
   3 %
   4 % GHC Language Extensions.
   5 %
   6
   7 As with all known Haskell systems, GHC implements some extensions to
   8 the language.  To use them, you'll need to give a @-fglasgow-exts@%
   9 <nidx>-fglasgow-exts option</nidx> option.
  10
  11 Virtually all of the Glasgow extensions serve to give you access to
  12 the underlying facilities with which we implement Haskell.  Thus, you
  13 can get at the Raw Iron, if you are willing to write some non-standard
  14 code at a more primitive level.  You need not be ``stuck'' on
  15 performance because of the implementation costs of Haskell's
  16 ``high-level'' features---you can always code ``under'' them.  In an
  17 extreme case, you can write all your time-critical code in C, and then
  18 just glue it together with Haskell!
  19
  20 Executive summary of our extensions:
  21
  22 <descrip>
  23
  24 <tag>Unboxed types and primitive operations:</tag>
  25
  26 You can get right down to the raw machine types and operations;
  27 included in this are ``primitive arrays'' (direct access to Big Wads
  28 of Bytes).  Please see Section <ref name="Unboxed types"
  29 id="glasgow-unboxed"> and following.
  30
  31 <tag>Multi-parameter type classes:</tag>
  32
  33 GHC's type system supports extended type classes with multiple
  34 parameters.  Please see Section <ref name="Mult-parameter type
  35 classes" id="multi-param-type-classes">.
  36
  37 <tag>Local universal quantification:</tag>
  38
  39 GHC's type system supports explicit unversal quantification in
  40 constructor fields and function arguments.  This is useful for things
  41 like defining @runST@ from the state-thread world.  See Section <ref
  42 name="Local universal quantification" id="universal-quantification">.
  43
  44 <tag>Extistentially quantification in data types:</tag>
  45
  46 Some or all of the type variables in a datatype declaration may be
  47 <em>existentially quantified</em>.  More details in Section <ref
  48 name="Existential Quantification" id="existential-quantification">.
  49
  50 <tag>Calling out to C:</tag>
  51
  52 Just what it sounds like.  We provide <em>lots</em> of rope that you
  53 can dangle around your neck.  Please see Section <ref name="Calling~C
  54 directly from Haskell" id="glasgow-ccalls">.
  55
  56 </descrip>
  57
  58 Before you get too carried away working at the lowest level (e.g.,
  59 sloshing @MutableByteArray#@s around your program), you may wish to
  60 check if there are system libraries that provide a ``Haskellised
  61 veneer'' over the features you want.  See Section <ref name="GHC
  62 Prelude and libraries" id="ghc-prelude">.
  63
  64 %************************************************************************
  65 %*                                                                      *
  66 <sect1>Unboxed types
  67 <label id="glasgow-unboxed">
  68 <p>
  69 <nidx>Unboxed types (Glasgow extension)</nidx>
  70 %*                                                                      *
  71 %************************************************************************
  72
  73 These types correspond to the ``raw machine'' types you would use in
  74 C: @Int#@ (long int), @Double#@ (double), @Addr#@ (void *), etc.  The
  75 <em>primitive operations</em> (PrimOps) on these types are what you
  76 might expect; e.g., @(+#)@ is addition on @Int#@s, and is the
  77 machine-addition that we all know and love---usually one instruction.
  78
  79 There are some restrictions on the use of unboxed types, the main one
  80 being that you can't pass an unboxed value to a polymorphic function
  81 or store one in a polymorphic data type.  This rules out things like
  82 @[Int#]@ (ie. lists of unboxed integers).  The reason for this
  83 restriction is that polymorphic arguments and constructor fields are
  84 assumed to be pointers: if an unboxed integer is stored in one of
  85 these, the garbage collector would attempt to follow it, leading to
  86 unpredictable space leaks.  Or a @seq@ operation on the polymorphic
  87 component may attempt to dereference the pointer, with disastrous
  88 results.  Even worse, the unboxed value might be larger than a pointer
  89 (@Double#@ for instance).
  90
  91 Nevertheless, A numerically-intensive program using unboxed types can
  92 go a <em>lot</em> faster than its ``standard'' counterpart---we saw a
  93 threefold speedup on one example.
  94
  95 Please see Section <ref name="The module PrelGHC: really primitive
  96 stuff" id="ghc-libs-ghc"> for the details of unboxed types and the
  97 operations on them.
  98
  99 %************************************************************************
 100 %*                                                                      *
 101 <sect1>Primitive state-transformer monad
 102 <label id="glasgow-ST-monad">
 103 <p>
 104 <nidx>state transformers (Glasgow extensions)</nidx>
 105 <nidx>ST monad (Glasgow extension)</nidx>
 106 %*                                                                      *
 107 %************************************************************************
 108
 109 This monad underlies our implementation of arrays, mutable and
 110 immutable, and our implementation of I/O, including ``C calls''.
 111
 112 The @ST@ library, which provides access to the @ST@ monad, is a
 113 GHC/Hugs extension library and is described in the separate <htmlurl
 114 name="GHC/Hugs Extension Libraries" url="libs.html"> document.
 115
 116 %************************************************************************
 117 %*                                                                      *
 118 <sect1>Primitive arrays, mutable and otherwise
 119 <label id="glasgow-prim-arrays">
 120 <p>
 121 <nidx>primitive arrays (Glasgow extension)</nidx>
 122 <nidx>arrays, primitive (Glasgow extension)</nidx>
 123 %*                                                                      *
 124 %************************************************************************
 125
 126 GHC knows about quite a few flavours of Large Swathes of Bytes.
 127
 128 First, GHC distinguishes between primitive arrays of (boxed) Haskell
 129 objects (type @Array# obj@) and primitive arrays of bytes (type
 130 @ByteArray#@).
 131
 132 Second, it distinguishes between...
 133 <descrip>
 134 <tag>Immutable:</tag>
 135 Arrays that do not change (as with ``standard'' Haskell arrays); you
 136 can only read from them.  Obviously, they do not need the care and
 137 attention of the state-transformer monad.
 138
 139 <tag>Mutable:</tag>
 140 Arrays that may be changed or ``mutated.''  All the operations on them
 141 live within the state-transformer monad and the updates happen
 142 <em>in-place</em>.
 143
 144 <tag>``Static'' (in C land):</tag>
 145 A C routine may pass an @Addr#@ pointer back into Haskell land.  There
 146 are then primitive operations with which you may merrily grab values
 147 over in C land, by indexing off the ``static'' pointer.
 148
 149 <tag>``Stable'' pointers:</tag>
 150 If, for some reason, you wish to hand a Haskell pointer (i.e.,
 151 <em>not</em> an unboxed value) to a C routine, you first make the
 152 pointer ``stable,'' so that the garbage collector won't forget that it
 153 exists.  That is, GHC provides a safe way to pass Haskell pointers to
 154 C.
 155
 156 Please see Section <ref name="Subverting automatic unboxing with
 157 ``stable pointers''" id="glasgow-stablePtrs"> for more details.
 158
 159 <tag>``Foreign objects'':</tag>
 160 A ``foreign object'' is a safe way to pass an external object (a
 161 C-allocated pointer, say) to Haskell and have Haskell do the Right
 162 Thing when it no longer references the object.  So, for example, C
 163 could pass a large bitmap over to Haskell and say ``please free this
 164 memory when you're done with it.''
 165
 166 Please see Section <ref name="Pointing outside the Haskell heap"
 167 id="glasgow-foreignObjs"> for more details.
 168
 169 </descrip>
 170
 171 The libraries section gives more details on all these ``primitive
 172 array'' types and the operations on them, Section <ref name="The GHC
 173 Prelude and Libraries" id="ghc-prelude">.  Some of these extensions
 174 are also supported by Hugs, and the supporting libraries are described
 175 in the <htmlurl name="GHC/Hugs Extension Libraries" url="libs.html">
 176 document.
 177
 178 %************************************************************************
 179 %*                                                                      *
 180 <sect1>Calling~C directly from Haskell
 181 <label id="glasgow-ccalls">
 182 <p>
 183 <nidx>C calls (Glasgow extension)</nidx>
 184 <nidx>_ccall_ (Glasgow extension)</nidx>
 185 <nidx>_casm_ (Glasgow extension)</nidx>
 186 %*                                                                      *
 187 %************************************************************************
 188
 189 GOOD ADVICE: Because this stuff is not Entirely Stable as far as names
 190 and things go, you would be well-advised to keep your C-callery
 191 corraled in a few modules, rather than sprinkled all over your code.
 192 It will then be quite easy to update later on.
 193
 194 %************************************************************************
 195 %*                                                                      *
 196 <sect2>@_ccall_@ and @_casm_@: an introduction
 197 <label id="ccall-intro">
 198 <p>
 199 %*                                                                      *
 200 %************************************************************************
 201
 202 The simplest way to use a simple C function
 203
 204 <tscreen><verb>
 205 double fooC( FILE *in, char c, int i, double d, unsigned int u )
 206 </verb></tscreen>
 207
 208 is to provide a Haskell wrapper:
 209
 210 <tscreen><verb>
 211 fooH :: Char -> Int -> Double -> Word -> IO Double
 212 fooH c i d w = _ccall_ fooC (``stdin''::Addr) c i d w
 213 </verb></tscreen>
 214
 215 The function @fooH@ will unbox all of its arguments, call the C
 216 function @fooC@ and box the corresponding arguments.
 217
 218 One of the annoyances about @_ccall_@s is when the C types don't quite
 219 match the Haskell compiler's ideas.  For this, the @_casm_@ variant
 220 may be just the ticket (NB: <em>no chance</em> of such code going
 221 through a native-code generator):
 222
 223 <tscreen><verb>
 224 oldGetEnv name
 225   = _casm_ ``%r = getenv((char *) %0);'' name >>= \ litstring@(A# str#) ->
 226     return (
 227         if (litstring == ``NULL'') then
 228             Left ("Fail:oldGetEnv:"++name)
 229         else
 230             Right (unpackCString# str#)
 231     )
 232 </verb></tscreen>
 233
 234 The first literal-literal argument to a @_casm_@ is like a @printf@
 235 format: @%r@ is replaced with the ``result,'' @%0@--@%n-1@ are
 236 replaced with the 1st--nth arguments.  As you can see above, it is an
 237 easy way to do simple C~casting.  Everything said about @_ccall_@ goes
 238 for @_casm_@ as well.
 239
 240 The use of @_casm_@ in your code does pose a problem to the compiler
 241 when it comes to generating an interface file for a freshly compiled
 242 module. Included in an interface file is the unfolding (if any) of a
 243 declaration. However, if a declaration's unfolding happens to contain
 244 a @_casm_@, its unfolding will <em/not/ be emitted into the interface
 245 file even if it qualifies by all the other criteria. The reason why
 246 the compiler prevents this from happening is that unfolding @_casm_@s
 247 into an interface file unduly constrains how code that import your
 248 module have to be compiled. If an imported declaration is unfolded and
 249 it contains a @_casm_@, you now have to be using a compiler backend
 250 capable of dealing with it (i.e., the C compiler backend). If you are
 251 using the C compiler backend, the unfolded @_casm_@ may still cause you
 252 problems since the C code snippet it contains may mention CPP symbols
 253 that were in scope when compiling the original module are not when
 254 compiling the importing module.
 255
 256 If you're willing to put up with the drawbacks of doing cross-module
 257 inlining of C code (GHC - A Better C Compiler :-), the option
 258 @-funfold-casms-in-hi-file@ will turn off the default behaviour.
 259 <nidx>-funfold-casms-in-hi-file option</nidx>
 260
 261 %************************************************************************
 262 %*                                                                      *
 263 <sect2>Using function headers
 264 <label id="glasgow-foreign-headers">
 265 <p>
 266 <nidx>C calls, function headers</nidx>
 267 %*                                                                      *
 268 %************************************************************************
 269
 270 When generating C (using the @-fvia-C@ directive), one can assist the
 271 C compiler in detecting type errors by using the @-#include@ directive
 272 to provide @.h@ files containing function headers.
 273
 274 For example,
 275
 276 <tscreen><verb>
 277 typedef unsigned long *StgForeignObj;
 278 typedef long StgInt;
 279
 280 void          initialiseEFS (StgInt size);
 281 StgInt        terminateEFS (void);
 282 StgForeignObj emptyEFS(void);
 283 StgForeignObj updateEFS (StgForeignObj a, StgInt i, StgInt x);
 284 StgInt        lookupEFS (StgForeignObj a, StgInt i);
 285 </verb></tscreen>
 286
 287 You can find appropriate definitions for @StgInt@, @StgForeignObj@,
 288 etc using @gcc@ on your architecture by consulting
 289 @ghc/includes/StgTypes.h@.  The following table summarises the
 290 relationship between Haskell types and C types.
 291
 292 <tabular ca="ll">
 293 <bf>C type name</bf>      | <bf>Haskell Type</bf> @@
 294 @@
 295 @StgChar@          | @Char#@ @@
 296 @StgInt@           | @Int#@ @@
 297 @StgWord@          | @Word#@ @@
 298 @StgAddr@          | @Addr#@ @@
 299 @StgFloat@         | @Float#@ @@
 300 @StgDouble@        | @Double#@ @@
 301
 302 @StgArray@         | @Array#@ @@
 303 @StgByteArray@     | @ByteArray#@ @@
 304 @StgArray@         | @MutableArray#@ @@
 305 @StgByteArray@     | @MutableByteArray#@ @@
 306
 307 @StgStablePtr@     | @StablePtr#@ @@
 308 @StgForeignObj@    | @ForeignObj#@
 309 </tabular>
 310
 311 Note that this approach is only <em>essential</em> for returning
 312 @float@s (or if @sizeof(int) != sizeof(int *)@ on your
 313 architecture) but is a Good Thing for anyone who cares about writing
 314 solid code.  You're crazy not to do it.
 315
 316 %************************************************************************
 317 %*                                                                      *
 318 <sect2>Subverting automatic unboxing with ``stable pointers''
 319 <label id="glasgow-stablePtrs">
 320 <p>
 321 <nidx>stable pointers (Glasgow extension)</nidx>
 322 %*                                                                      *
 323 %************************************************************************
 324
 325 The arguments of a @_ccall_@ are automatically unboxed before the
 326 call.  There are two reasons why this is usually the Right Thing to
 327 do:
 328
 329 <itemize>
 330 <item>
 331 C is a strict language: it would be excessively tedious to pass
 332 unevaluated arguments and require the C programmer to force their
 333 evaluation before using them.
 334
 335 <item> Boxed values are stored on the Haskell heap and may be moved
 336 within the heap if a garbage collection occurs---that is, pointers
 337 to boxed objects are not <em>stable</em>.
 338 </itemize>
 339
 340 It is possible to subvert the unboxing process by creating a ``stable
 341 pointer'' to a value and passing the stable pointer instead.  For
 342 example, to pass/return an integer lazily to C functions @storeC@ and
 343 @fetchC@, one might write:
 344
 345 <tscreen><verb>
 346 storeH :: Int -> IO ()
 347 storeH x = makeStablePtr x              >>= \ stable_x ->
 348            _ccall_ storeC stable_x
 349
 350 fetchH :: IO Int
 351 fetchH x = _ccall_ fetchC               >>= \ stable_x ->
 352            deRefStablePtr stable_x      >>= \ x ->
 353            freeStablePtr stable_x       >>
 354            return x
 355 </verb></tscreen>
 356
 357 The garbage collector will refrain from throwing a stable pointer away
 358 until you explicitly call one of the following from C or Haskell.
 359
 360 <tscreen><verb>
 361 void freeStablePointer( StgStablePtr stablePtrToToss )
 362 freeStablePtr :: StablePtr a -> IO ()
 363 </verb></tscreen>
 364
 365 As with the use of @free@ in C programs, GREAT CARE SHOULD BE
 366 EXERCISED to ensure these functions are called at the right time: too
 367 early and you get dangling references (and, if you're lucky, an error
 368 message from the runtime system); too late and you get space leaks.
 369
 370 And to force evaluation of the argument within @fooC@, one would
 371 call one of the following C functions (according to type of argument).
 372
 373 <tscreen><verb>
 374 void     performIO  ( StgStablePtr stableIndex /* StablePtr s (IO ()) */ );
 375 StgInt   enterInt   ( StgStablePtr stableIndex /* StablePtr s Int */ );
 376 StgFloat enterFloat ( StgStablePtr stableIndex /* StablePtr s Float */ );
 377 </verb></tscreen>
 378
 379 <nidx>performIO</nidx>
 380 <nidx>enterInt</nidx>
 381 <nidx>enterFloat</nidx>
 382
 383 Note Bene: @_ccall_GC_@<nidx>_ccall_GC_</nidx> must be used if any of
 384 these functions are used.
 385
 386 %************************************************************************
 387 %*                                                                      *
 388 <sect2>Foreign objects: pointing outside the Haskell heap
 389 <label id="glasgow-foreignObjs">
 390 <p>
 391 <nidx>foreign objects (Glasgow extension)</nidx>
 392 %*                                                                      *
 393 %************************************************************************
 394
 395 There are two types that @ghc@ programs can use to reference
 396 (heap-allocated) objects outside the Haskell world: @Addr@ and
 397 @ForeignObj@.
 398
 399 If you use @Addr@, it is up to you to the programmer to arrange
 400 allocation and deallocation of the objects.
 401
 402 If you use @ForeignObj@, @ghc@'s garbage collector will call upon the
 403 user-supplied <em>finaliser</em> function to free the object when the
 404 Haskell world no longer can access the object.  (An object is
 405 associated with a finaliser function when the abstract
 406  Haskell type @ForeignObj@ is created). The finaliser function is
 407 expressed in C, and is passed as argument the object:
 408
 409 <tscreen><verb>
 410 void foreignFinaliser ( StgForeignObj fo )
 411 </verb></tscreen>
 412
 413 when the Haskell world can no longer access the object.  Since
 414 @ForeignObj@s only get released when a garbage collection occurs, we
 415 provide ways of triggering a garbage collection from within C and from
 416 within Haskell.
 417
 418 <tscreen><verb>
 419 void GarbageCollect()
 420 performGC :: IO ()
 421 </verb></tscreen>
 422
 423 More information on the programmers' interface to @ForeignObj@ can be
 424 found in Section <ref name="Foreign objects" id="sec:foreign-obj">.
 425
 426 %************************************************************************
 427 %*                                                                      *
 428 <sect2>Avoiding monads
 429 <label id="glasgow-avoiding-monads">
 430 <p>
 431 <nidx>C calls to `pure C'</nidx>
 432 <nidx>unsafePerformIO</nidx>
 433 %*                                                                      *
 434 %************************************************************************
 435
 436 The @_ccall_@ construct is part of the @IO@ monad because 9 out of 10
 437 uses will be to call imperative functions with side effects such as
 438 @printf@.  Use of the monad ensures that these operations happen in a
 439 predictable order in spite of laziness and compiler optimisations.
 440
 441 To avoid having to be in the monad to call a C function, it is
 442 possible to use @unsafePerformIO@, which is available from the
 443 @IOExts@ module.  There are three situations where one might like to
 444 call a C function from outside the IO world:
 445
 446 <itemize>
 447 <item>
 448 Calling a function with no side-effects:
 449 <tscreen><verb>
 450 atan2d :: Double -> Double -> Double
 451 atan2d y x = unsafePerformIO (_ccall_ atan2d y x)
 452
 453 sincosd :: Double -> (Double, Double)
 454 sincosd x = unsafePerformIO $ do
 455         da <- newDoubleArray (0, 1)
 456         _casm_ ``sincosd( %0, &((double *)%1[0]), &((double *)%1[1]) );'' x da
 457         s <- readDoubleArray da 0
 458         c <- readDoubleArray da 1
 459         return (s, c)
 460 </verb></tscreen>
 461
 462 <item> Calling a set of functions which have side-effects but which can
 463 be used in a purely functional manner.
 464
 465 For example, an imperative implementation of a purely functional
 466 lookup-table might be accessed using the following functions.
 467
 468 <tscreen><verb>
 469 empty  :: EFS x
 470 update :: EFS x -> Int -> x -> EFS x
 471 lookup :: EFS a -> Int -> a
 472
 473 empty = unsafePerformIO (_ccall_ emptyEFS)
 474
 475 update a i x = unsafePerformIO $
 476         makeStablePtr x         >>= \ stable_x ->
 477         _ccall_ updateEFS a i stable_x
 478
 479 lookup a i = unsafePerformIO $
 480         _ccall_ lookupEFS a i   >>= \ stable_x ->
 481         deRefStablePtr stable_x
 482 </verb></tscreen>
 483
 484 You will almost always want to use @ForeignObj@s with this.
 485
 486 <item> Calling a side-effecting function even though the results will
 487 be unpredictable.  For example the @trace@ function is defined by:
 488
 489 <tscreen><verb>
 490 trace :: String -> a -> a
 491 trace string expr
 492   = unsafePerformIO (
 493         ((_ccall_ PreTraceHook sTDERR{-msg-}):: IO ())  >>
 494         fputs sTDERR string                             >>
 495         ((_ccall_ PostTraceHook sTDERR{-msg-}):: IO ()) >>
 496         return expr )
 497   where
 498     sTDERR = (``stderr'' :: Addr)
 499 </verb></tscreen>
 500
 501 (This kind of use is not highly recommended --- it is only really
 502 useful in debugging code.)
 503 </itemize>
 504
 505 %************************************************************************
 506 %*                                                                      *
 507 <sect2>C-calling ``gotchas'' checklist
 508 <label id="ccall-gotchas">
 509 <p>
 510 <nidx>C call dangers</nidx>
 511 %*                                                                      *
 512 %************************************************************************
 513
 514 And some advice, too.
 515
 516 <itemize>
 517 <item> For modules that use @_ccall_@s, etc., compile with
 518 @-fvia-C@.<nidx>-fvia-C option</nidx> You don't have to, but you should.
 519
 520 Also, use the @-#include "prototypes.h"@ flag (hack) to inform the C
 521 compiler of the fully-prototyped types of all the C functions you
 522 call.  (Section <ref name="Using function headers"
 523 id="glasgow-foreign-headers"> says more about this...)
 524
 525 This scheme is the <em>only</em> way that you will get <em>any</em>
 526 typechecking of your @_ccall_@s.  (It shouldn't be that way, but...).
 527 GHC will pass the flag @-Wimplicit@ to gcc so that you'll get warnings
 528 if any @_ccall_@ed functions have no prototypes.
 529
 530 <item>
 531 Try to avoid @_ccall_@s to C~functions that take @float@
 532 arguments or return @float@ results.  Reason: if you do, you will
 533 become entangled in (ANSI?) C's rules for when arguments/results are
 534 promoted to @doubles@.  It's a nightmare and just not worth it.
 535 Use @doubles@ if possible.
 536
 537 If you do use @floats@, check and re-check that the right thing is
 538 happening.  Perhaps compile with @-keep-hc-file-too@ and look at
 539 the intermediate C (@.hc@ file).
 540
 541 <item> The compiler uses two non-standard type-classes when
 542 type-checking the arguments and results of @_ccall_@: the arguments
 543 (respectively result) of @_ccall_@ must be instances of the class
 544 @CCallable@ (respectively @CReturnable@).  Both classes may be
 545 imported from the module @CCall@, but this should only be
 546 necessary if you want to define a new instance.  (Neither class
 547 defines any methods --- their only function is to keep the
 548 type-checker happy.)
 549
 550 The type checker must be able to figure out just which of the
 551 C-callable/returnable types is being used.  If it can't, you have to
 552 add type signatures. For example,
 553
 554 <tscreen><verb>
 555 f x = _ccall_ foo x
 556 </verb></tscreen>
 557
 558 is not good enough, because the compiler can't work out what type @x@
 559 is, nor what type the @_ccall_@ returns.  You have to write, say:
 560
 561 <tscreen><verb>
 562 f :: Int -> IO Double
 563 f x = _ccall_ foo x
 564 </verb></tscreen>
 565
 566 This table summarises the standard instances of these classes.
 567
 568 % ToDo: check this table against implementation!
 569
 570 <tabular ca="llll">
 571 <bf>Type</bf>       |<bf>CCallable</bf>|<bf>CReturnable</bf> | <bf>Which is probably...</bf> @@
 572
 573 @Char@              | Yes  | Yes   | @unsigned char@ @@
 574 @Int@               | Yes  | Yes   | @long int@ @@
 575 @Word@              | Yes  | Yes   | @unsigned long int@ @@
 576 @Addr@              | Yes  | Yes   | @void *@ @@
 577 @Float@             | Yes  | Yes   | @float@ @@
 578 @Double@            | Yes  | Yes   | @double@ @@
 579 @()@                | No   | Yes   | @void@ @@
 580 @[Char]@            | Yes  | No    | @char *@ (null-terminated) @@
 581
 582 @Array@             | Yes  | No    | @unsigned long *@ @@
 583 @ByteArray@         | Yes  | No    | @unsigned long *@ @@
 584 @MutableArray@      | Yes  | No    | @unsigned long *@ @@
 585 @MutableByteArray@  | Yes  | No    | @unsigned long *@ @@
 586
 587 @State@             | Yes  | Yes   | nothing!@@
 588
 589 @StablePtr@         | Yes  | Yes   | @unsigned long *@ @@
 590 @ForeignObjs@       | Yes  | Yes   | see later @@
 591 </tabular>
 592
 593 Actually, the @Word@ type is defined as being the same size as a
 594 pointer on the target architecture, which is <em>probably</em>
 595 @unsigned long int@.
 596
 597 The brave and careful programmer can add their own instances of these
 598 classes for the following types:
 599
 600 <itemize>
 601 <item>
 602 A <em>boxed-primitive</em> type may be made an instance of both
 603 @CCallable@ and @CReturnable@.
 604
 605 A boxed primitive type is any data type with a
 606 single unary constructor with a single primitive argument.  For
 607 example, the following are all boxed primitive types:
 608
 609 <tscreen><verb>
 610 Int
 611 Double
 612 data XDisplay = XDisplay Addr#
 613 data EFS a = EFS# ForeignObj#
 614 </verb></tscreen>
 615
 616 <tscreen><verb>
 617 instance CCallable   (EFS a)
 618 instance CReturnable (EFS a)
 619 </verb></tscreen>
 620
 621 <item> Any datatype with a single nullary constructor may be made an
 622 instance of @CReturnable@.  For example:
 623
 624 <tscreen><verb>
 625 data MyVoid = MyVoid
 626 instance CReturnable MyVoid
 627 </verb></tscreen>
 628
 629 <item> As at version 2.09, @String@ (i.e., @[Char]@) is still
 630 not a @CReturnable@ type.
 631
 632 Also, the now-builtin type @PackedString@ is neither
 633 @CCallable@ nor @CReturnable@.  (But there are functions in
 634 the PackedString interface to let you get at the necessary bits...)
 635 </itemize>
 636
 637 <item> The code-generator will complain if you attempt to use @%r@ in
 638 a @_casm_@ whose result type is @IO ()@; or if you don't use @%r@
 639 <em>precisely</em> once for any other result type.  These messages are
 640 supposed to be helpful and catch bugs---please tell us if they wreck
 641 your life.
 642
 643 <item> If you call out to C code which may trigger the Haskell garbage
 644 collector or create new threads (examples of this later...), then you
 645 must use the @_ccall_GC_@<nidx>_ccall_GC_ primitive</nidx> or
 646 @_casm_GC_@<nidx>_casm_GC_ primitive</nidx> variant of C-calls.  (This
 647 does not work with the native code generator - use @\fvia-C@.) This
 648 stuff is hairy with a capital H!  </itemize>
 649
 650 <sect1> Multi-parameter type classes
 651 <label id="multi-param-type-classes">
 652 <p>
 653
 654 This section documents GHC's implementation of multi-paramter type
 655 classes.  There's lots of background in the paper <url name="Type
 656 classes: exploring the design space"
 657 url="http://www.dcs.gla.ac.uk/~simonpj/multi.ps.gz"> (Simon Peyton
 658 Jones, Mark Jones, Erik Meijer).
 659
 660 I'd like to thank people who reported shorcomings in the GHC 3.02
 661 implementation.  Our default decisions were all conservative ones, and
 662 the experience of these heroic pioneers has given useful concrete
 663 examples to support several generalisations.  (These appear below as
 664 design choices not implemented in 3.02.)
 665
 666 I've discussed these notes with Mark Jones, and I believe that Hugs
 667 will migrate towards the same design choices as I outline here.
 668 Thanks to him, and to many others who have offered very useful
 669 feedback.
 670
 671 <sect2>Types
 672 <p>
 673
 674 There are the following restrictions on the form of a qualified
 675 type:
 676
 677 <tscreen><verb>
 678   forall tv1..tvn (c1, ...,cn) => type
 679 </verb></tscreen>
 680
 681 (Here, I write the "foralls" explicitly, although the Haskell source
 682 language omits them; in Haskell 1.4, all the free type variables of an
 683 explicit source-language type signature are universally quantified,
 684 except for the class type variables in a class declaration.  However,
 685 in GHC, you can give the foralls if you want.  See Section <ref
 686 name="Explicit universal quantification"
 687 id="universal-quantification">).
 688
 689 <enum>
 690
 691 <item> <bf>Each universally quantified type variable
 692 @tvi@ must be mentioned (i.e. appear free) in @type@</bf>.
 693
 694 The reason for this is that a value with a type that does not obey
 695 this restriction could not be used without introducing
 696 ambiguity. Here, for example, is an illegal type:
 697
 698 <tscreen><verb>
 699   forall a. Eq a => Int
 700 </verb></tscreen>
 701
 702 When a value with this type was used, the constraint <tt>Eq tv</tt>
 703 would be introduced where <tt>tv</tt> is a fresh type variable, and
 704 (in the dictionary-translation implementation) the value would be
 705 applied to a dictionary for <tt>Eq tv</tt>.  The difficulty is that we
 706 can never know which instance of <tt>Eq</tt> to use because we never
 707 get any more information about <tt>tv</tt>.
 708
 709 <item> <bf>Every constraint @ci@ must mention at least one of the
 710 universally quantified type variables @tvi@</bf>.
 711
 712 For example, this type is OK because <tt>C a b</tt> mentions the
 713 universally quantified type variable <tt>b</tt>:
 714
 715 <tscreen><verb>
 716   forall a. C a b => burble
 717 </verb></tscreen>
 718
 719 The next type is illegal because the constraint <tt>Eq b</tt> does not
 720 mention <tt>a</tt>:
 721
 722 <tscreen><verb>
 723   forall a. Eq b => burble
 724 </verb></tscreen>
 725
 726 The reason for this restriction is milder than the other one.  The
 727 excluded types are never useful or necessary (because the offending
 728 context doesn't need to be witnessed at this point; it can be floated
 729 out).  Furthermore, floating them out increases sharing. Lastly,
 730 excluding them is a conservative choice; it leaves a patch of
 731 territory free in case we need it later.
 732
 733 </enum>
 734
 735 These restrictions apply to all types, whether declared in a type signature
 736 or inferred.
 737
 738 Unlike Haskell 1.4, constraints in types do <bf>not</bf> have to be of
 739 the form <em>(class type-variables)</em>.  Thus, these type signatures
 740 are perfectly OK
 741
 742 <tscreen><verb>
 743   f :: Eq (m a) => [m a] -> [m a]
 744   g :: Eq [a] => ...
 745 </verb></tscreen>
 746
 747 This choice recovers principal types, a property that Haskell 1.4 does not have.
 748
 749 <sect2>Class declarations
 750 <p>
 751
 752 <enum>
 753
 754 <item> <bf>Multi-parameter type classes are permitted</bf>. For example:
 755
 756 <tscreen><verb>
 757   class Collection c a where
 758     union :: c a -> c a -> c a
 759     ...etc..
 760 </verb></tscreen>
 761
 762
 763 <item> <bf>The class hierarchy must be acyclic</bf>.  However, the definition
 764 of "acyclic" involves only the superclass relationships.  For example,
 765 this is OK:
 766
 767 <tscreen><verb>
 768   class C a where {
 769     op :: D b => a -> b -> b
 770   }
 771
 772   class C a => D a where { ... }
 773 </verb></tscreen>
 774
 775 Here, <tt>C</tt> is a superclass of <tt>D</tt>, but it's OK for a
 776 class operation <tt>op</tt> of <tt>C</tt> to mention <tt>D</tt>.  (It
 777 would not be OK for <tt>D</tt> to be a superclass of <tt>C</tt>.)
 778
 779 <item> <bf>There are no restrictions on the context in a class declaration
 780 (which introduces superclasses), except that the class hierarchy must
 781 be acyclic</bf>.  So these class declarations are OK:
 782
 783 <tscreen><verb>
 784   class Functor (m k) => FiniteMap m k where
 785     ...
 786
 787   class (Monad m, Monad (t m)) => Transform t m where
 788     lift :: m a -> (t m) a
 789 </verb></tscreen>
 790
 791 <item> <bf>In the signature of a class operation, every constraint
 792 must mention at least one type variable that is not a class type
 793 variable</bf>.
 794
 795 Thus:
 796
 797 <tscreen><verb>
 798   class Collection c a where
 799     mapC :: Collection c b => (a->b) -> c a -> c b
 800 </verb></tscreen>
 801
 802 is OK because the constraint <tt>(Collection a b)</tt> mentions
 803 <tt>b</tt>, even though it also mentions the class variable
 804 <tt>a</tt>.  On the other hand:
 805
 806 <tscreen><verb>
 807   class C a where
 808     op :: Eq a => (a,b) -> (a,b)
 809 </verb></tscreen>
 810
 811 is not OK because the constraint <tt>(Eq a)</tt> mentions on the class
 812 type variable <tt>a</tt>, but not <tt>b</tt>.  However, any such
 813 example is easily fixed by moving the offending context up to the
 814 superclass context:
 815
 816 <tscreen><verb>
 817   class Eq a => C a where
 818     op ::(a,b) -> (a,b)
 819 </verb></tscreen>
 820
 821 A yet more relaxed rule would allow the context of a class-op signature
 822 to mention only class type variables.  However, that conflicts with
 823 Rule 1(b) for types above.
 824
 825 <item> <bf>The type of each class operation must mention <em/all/ of
 826 the class type variables</bf>.  For example:
 827
 828 <tscreen><verb>
 829   class Coll s a where
 830     empty  :: s
 831     insert :: s -> a -> s
 832 </verb></tscreen>
 833
 834 is not OK, because the type of <tt>empty</tt> doesn't mention
 835 <tt>a</tt>.  This rule is a consequence of Rule 1(a), above, for
 836 types, and has the same motivation.
 837
 838 Sometimes, offending class declarations exhibit misunderstandings.  For
 839 example, <tt>Coll</tt> might be rewritten
 840
 841 <tscreen><verb>
 842   class Coll s a where
 843     empty  :: s a
 844     insert :: s a -> a -> s a
 845 </verb></tscreen>
 846
 847 which makes the connection between the type of a collection of
 848 <tt>a</tt>'s (namely <tt>(s a)</tt>) and the element type <tt>a</tt>.
 849 Occasionally this really doesn't work, in which case you can split the
 850 class like this:
 851
 852 <tscreen><verb>
 853   class CollE s where
 854     empty  :: s
 855
 856   class CollE s => Coll s a where
 857     insert :: s -> a -> s
 858 </verb></tscreen>
 859
 860 </enum>
 861
 862 <sect2>Instance declarations
 863 <p>
 864
 865 <enum>
 866
 867 <item> <bf>Instance declarations may not overlap</bf>.  The two instance
 868 declarations
 869
 870 <tscreen><verb>
 871   instance context1 => C type1 where ...
 872   instance context2 => C type2 where ...
 873 </verb></tscreen>
 874
 875 "overlap" if @type1@ and @type2@ unify
 876
 877 However, if you give the command line option
 878 @-fallow-overlapping-instances@<nidx>-fallow-overlapping-instances
 879 option</nidx> then two overlapping instance declarations are permitted
 880 iff
 881
 882 <itemize>
 883 <item> EITHER @type1@ and @type2@ do not unify
 884 <item> OR @type2@ is a substitution instance of @type1@
 885                 (but not identical to @type1@)
 886 <item> OR vice versa
 887 </itemize>
 888
 889 Notice that these rules
 890
 891 <itemize>
 892 <item> make it clear which instance decl to use
 893            (pick the most specific one that matches)
 894
 895 <item> do not mention the contexts @context1@, @context2@
 896             Reason: you can pick which instance decl
 897             "matches" based on the type.
 898 </itemize>
 899
 900 Regrettably, GHC doesn't guarantee to detect overlapping instance
 901 declarations if they appear in different modules.  GHC can "see" the
 902 instance declarations in the transitive closure of all the modules
 903 imported by the one being compiled, so it can "see" all instance decls
 904 when it is compiling <tt>Main</tt>.  However, it currently chooses not
 905 to look at ones that can't possibly be of use in the module currently
 906 being compiled, in the interests of efficiency.  (Perhaps we should
 907 change that decision, at least for <tt>Main</tt>.)
 908
 909 <item> <bf>There are no restrictions on the type in an instance
 910 <em/head/, except that at least one must not be a type variable</bf>.
 911 The instance "head" is the bit after the "=>" in an instance decl. For
 912 example, these are OK:
 913
 914 <tscreen><verb>
 915   instance C Int a where ...
 916
 917   instance D (Int, Int) where ...
 918
 919   instance E [[a]] where ...
 920 </verb></tscreen>
 921
 922 Note that instance heads <bf>may</bf> contain repeated type variables.
 923 For example, this is OK:
 924
 925 <tscreen><verb>
 926   instance Stateful (ST s) (MutVar s) where ...
 927 </verb></tscreen>
 928
 929 The "at least one not a type variable" restriction is to ensure that
 930 context reduction terminates: each reduction step removes one type
 931 constructor.  For example, the following would make the type checker
 932 loop if it wasn't excluded:
 933
 934 <tscreen><verb>
 935   instance C a => C a where ...
 936 </verb></tscreen>
 937
 938 There are two situations in which the rule is a bit of a pain. First,
 939 if one allows overlapping instance declarations then it's quite
 940 convenient to have a "default instance" declaration that applies if
 941 something more specific does not:
 942
 943 <tscreen><verb>
 944   instance C a where
 945     op = ... -- Default
 946 </verb></tscreen>
 947
 948 Second, sometimes you might want to use the following to get the
 949 effect of a "class synonym":
 950
 951 <tscreen><verb>
 952   class (C1 a, C2 a, C3 a) => C a where { }
 953
 954   instance (C1 a, C2 a, C3 a) => C a where { }
 955 </verb></tscreen>
 956
 957 This allows you to write shorter signatures:
 958
 959 <tscreen><verb>
 960   f :: C a => ...
 961 </verb></tscreen>
 962
 963 instead of
 964
 965 <tscreen><verb>
 966   f :: (C1 a, C2 a, C3 a) => ...
 967 </verb></tscreen>
 968
 969 I'm on the lookout for a simple rule that preserves decidability while
 970 allowing these idioms.  The experimental flag
 971 @-fallow-undecidable-instances@<nidx>-fallow-undecidable-instances
 972 option</nidx> lifts this restriction, allowing all the types in an
 973 instance head to be type variables.
 974
 975 <item> <bf>Unlike Haskell 1.4, instance heads may use type
 976 synonyms</bf>.  As always, using a type synonym is just shorthand for
 977 writing the RHS of the type synonym definition.  For example:
 978
 979 <tscreen><verb>
 980   type Point = (Int,Int)
 981   instance C Point   where ...
 982   instance C [Point] where ...
 983 </verb></tscreen>
 984
 985 is legal.  However, if you added
 986
 987 <tscreen><verb>
 988   instance C (Int,Int) where ...
 989 </verb></tscreen>
 990
 991 as well, then the compiler will complain about the overlapping
 992 (actually, identical) instance declarations.  As always, type synonyms
 993 must be fully applied.  You cannot, for example, write:
 994
 995 <tscreen><verb>
 996   type P a = [[a]]
 997   instance Monad P where ...
 998 </verb></tscreen>
 999
1000 This design decision is independent of all the others, and easily
1001 reversed, but it makes sense to me.
1002
1003 <item><bf>The types in an instance-declaration <em/context/ must all
1004 be type variables</bf>. Thus
1005
1006 <tscreen><verb>
1007   instance C a b => Eq (a,b) where ...
1008 </verb></tscreen>
1009
1010 is OK, but
1011
1012 <tscreen><verb>
1013   instance C Int b => Foo b where ...
1014 </verb></tscreen>
1015
1016 is not OK.  Again, the intent here is to make sure that context
1017 reduction terminates.
1018
1019 Voluminous correspondence on the Haskell mailing list has convinced me
1020 that it's worth experimenting with a more liberal rule.  If you use
1021 the flag <tt>-fallow-undecidable-instances</tt> you can use arbitrary
1022 types in an instance context.  Termination is ensured by having a
1023 fixed-depth recursion stack.  If you exceed the stack depth you get a
1024 sort of backtrace, and the opportunity to increase the stack depth
1025 with <tt>-fcontext-stack</tt><em/N/.
1026
1027 </enum>
1028
1029 % -----------------------------------------------------------------------------
1030 <sect1>Explicit universal quantification
1031 <label id="universal-quantification">
1032 <p>
1033
1034 GHC now allows you to write explicitly quantified types.  GHC's
1035 syntax for this now agrees with Hugs's, namely:
1036
1037 <tscreen><verb>
1038         forall a b. (Ord a, Eq  b) => a -> b -> a
1039 </verb></tscreen>
1040
1041 The context is, of course, optional.  You can't use <tt>forall</tt> as
1042 a type variable any more!
1043
1044 Haskell type signatures are implicitly quantified.  The <tt>forall</tt>
1045 allows us to say exactly what this means.  For example:
1046
1047 <tscreen><verb>
1048         g :: b -> b
1049 </verb></tscreen>
1050
1051 means this:
1052
1053 <tscreen><verb>
1054         g :: forall b. (b -> b)
1055 </verb></tscreen>
1056
1057 The two are treated identically.
1058
1059 <sect2>Universally-quantified data type fields
1060 <label id="univ">
1061 <p>
1062
1063 In a <tt>data</tt> or <tt>newtype</tt> declaration one can quantify
1064 the types of the constructor arguments.  Here are several examples:
1065
1066 <tscreen><verb>
1067 data T a = T1 (forall b. b -> b -> b) a
1068
1069 data MonadT m = MkMonad { return :: forall a. a -> m a,
1070                           bind   :: forall a b. m a -> (a -> m b) -> m b
1071                         }
1072
1073 newtype Swizzle = MkSwizzle (Ord a => [a] -> [a])
1074 </verb></tscreen>
1075
1076 The constructors now have so-called <em/rank 2/ polymorphic
1077 types, in which there is a for-all in the argument types.:
1078
1079 <tscreen><verb>
1080 T1 :: forall a. (forall b. b -> b -> b) -> a -> T1 a
1081 MkMonad :: forall m. (forall a. a -> m a)
1082                   -> (forall a b. m a -> (a -> m b) -> m b)
1083                   -> MonadT m
1084 MkSwizzle :: (Ord a => [a] -> [a]) -> Swizzle
1085 </verb></tscreen>
1086
1087 Notice that you don't need to use a <tt>forall</tt> if there's an
1088 explicit context.  For example in the first argument of the
1089 constructor <tt>MkSwizzle</tt>, an implicit "<tt>forall a.</tt>" is
1090 prefixed to the argument type.  The implicit <tt>forall</tt>
1091 quantifies all type variables that are not already in scope, and are
1092 mentioned in the type quantified over.
1093
1094 As for type signatures, implicit quantification happens for non-overloaded
1095 types too.  So if you write this:
1096 <tscreen><verb>
1097   data T a = MkT (Either a b) (b -> b)
1098 </verb></tscreen>
1099 it's just as if you had written this:
1100 <tscreen><verb>
1101   data T a = MkT (forall b. Either a b) (forall b. b -> b)
1102 </verb></tscreen>
1103 That is, since the type variable <tt>b</tt> isn't in scope, it's
1104 implicitly universally quantified.  (Arguably, it would be better
1105 to <em>require</em> explicit quantification on constructor arguments
1106 where that is what is wanted.  Feedback welcomed.)
1107
1108 <sect2> Construction
1109 <p>
1110
1111 You construct values of types <tt>T1, MonadT, Swizzle</tt> by applying
1112 the constructor to suitable values, just as usual.  For example,
1113
1114 <tscreen><verb>
1115 (T1 (\xy->x) 3) :: T Int
1116
1117 (MkSwizzle sort)    :: Swizzle
1118 (MkSwizzle reverse) :: Swizzle
1119
1120 (let r x = Just x
1121      b m k = case m of
1122                 Just y -> k y
1123                 Nothing -> Nothing
1124   in
1125   MkMonad r b) :: MonadT Maybe
1126 </verb></tscreen>
1127
1128 The type of the argument can, as usual, be more general than the type
1129 required, as <tt>(MkSwizzle reverse)</tt> shows.  (<tt>reverse</tt>
1130 does not need the <tt>Ord</tt> constraint.)
1131
1132 <sect2>Pattern matching
1133 <p>
1134
1135 When you use pattern matching, the bound variables may now have
1136 polymorphic types.  For example:
1137
1138 <tscreen><verb>
1139         f :: T a -> a -> (a, Char)
1140         f (T1 f k) x = (f k x, f 'c' 'd')
1141
1142         g :: (Ord a, Ord b) => Swizzle -> [a] -> (a -> b) -> [b]
1143         g (MkSwizzle s) xs f = s (map f (s xs))
1144
1145         h :: MonadT m -> [m a] -> m [a]
1146         h m [] = return m []
1147         h m (x:xs) = bind m x           $ \y ->
1148                       bind m (h m xs)   $ \ys ->
1149                       return m (y:ys)
1150 </verb></tscreen>
1151
1152 In the function <tt>h</tt> we use the record selectors <tt>return</tt>
1153 and <tt>bind</tt> to extract the polymorphic bind and return functions
1154 from the <tt>MonadT</tt> data structure, rather than using pattern
1155 matching.
1156
1157 <sect2>The partial-application restriction
1158 <p>
1159
1160 There is really only one way in which data structures with polymorphic
1161 components might surprise you: you must not partially apply them.
1162 For example, this is illegal:
1163
1164 <tscreen><verb>
1165         map MkSwizzle [sort, reverse]
1166 </verb></tscreen>
1167
1168 The restriction is this: <em>every subexpression of the program must
1169 have a type that has no for-alls, except that in a function
1170 application (f e1 ... en) the partial applications are not subject to
1171 this rule</em>.  The restriction makes type inference feasible.
1172
1173 In the illegal example, the sub-expression <tt>MkSwizzle</tt> has the
1174 polymorphic type <tt>(Ord b => [b] -> [b]) -> Swizzle</tt> and is not
1175 a sub-expression of an enclosing application.  On the other hand, this
1176 expression is OK:
1177
1178 <tscreen><verb>
1179         map (T1 (\a b -> a)) [1,2,3]
1180 </verb></tscreen>
1181
1182 even though it involves a partial application of <tt>T1</tt>, because
1183 the sub-expression <tt>T1 (\a b -> a)</tt> has type <tt>Int -> T
1184 Int</tt>.
1185
1186 <sect2>Type signatures
1187 <label id="sigs">
1188 <p>
1189
1190 Once you have data constructors with universally-quantified fields, or
1191 constants such as <tt>runST</tt> that have rank-2 types, it isn't long
1192 before you discover that you need more!  Consider:
1193
1194 <tscreen><verb>
1195   mkTs f x y = [T1 f x, T1 f y]
1196 </verb></tscreen>
1197
1198 <tt>mkTs</tt> is a fuction that constructs some values of type
1199 <tt>T</tt>, using some pieces passed to it.  The trouble is that since
1200 <tt>f</tt> is a function argument, Haskell assumes that it is
1201 monomorphic, so we'll get a type error when applying <tt>T1</tt> to
1202 it.  This is a rather silly example, but the problem really bites in
1203 practice.  Lots of people trip over the fact that you can't make
1204 "wrappers functions" for <tt>runST</tt> for exactly the same reason.
1205 In short, it is impossible to build abstractions around functions with
1206 rank-2 types.
1207
1208 The solution is fairly clear.  We provide the ability to give a rank-2
1209 type signature for <em>ordinary</em> functions (not only data
1210 constructors), thus:
1211
1212 <tscreen><verb>
1213   mkTs :: (forall b. b -> b -> b) -> a -> [T a]
1214   mkTs f x y = [T1 f x, T1 f y]
1215 </verb></tscreen>
1216
1217 This type signature tells the compiler to attribute <tt>f</tt> with
1218 the polymorphic type <tt>(forall b. b -> b -> b)</tt> when type
1219 checking the body of <tt>mkTs</tt>, so now the application of
1220 <tt>T1</tt> is fine.
1221
1222 There are two restrictions:
1223
1224 <itemize>
1225 <item> You can only define a rank 2 type, specified by the following
1226 grammar:
1227
1228 <tscreen><verb>
1229    rank2type ::= [forall tyvars .] [context =>] funty
1230    funty     ::= ([forall tyvars .] [context =>] ty) -> funty
1231                | ty
1232    ty        ::= ...current Haskell monotype syntax...
1233 </verb></tscreen>
1234
1235 Informally, the universal quantification must all be right at the beginning,
1236 or at the top level of a function argument.
1237
1238 <item> There is a restriction on the definition of a function whose
1239 type signature is a rank-2 type: the polymorphic arguments must be
1240 matched on the left hand side of the "<tt>=</tt>" sign.  You can't
1241 define <tt>mkTs</tt> like this:
1242
1243 <tscreen><verb>
1244   mkTs :: (forall b. b -> b -> b) -> a -> [T a]
1245   mkTs = \ f x y -> [T1 f x, T1 f y]
1246 </verb></tscreen>
1247
1248
1249 The same partial-application rule applies to ordinary functions with
1250 rank-2 types as applied to data constructors.
1251
1252 </itemize>
1253
1254 % -----------------------------------------------------------------------------
1255 <sect1>Existentially quantified data constructors
1256 <label id="existential-quantification">
1257 <p>
1258
1259 The idea of using existential quantification in data type declarations
1260 was suggested by Laufer (I believe, thought doubtless someone will
1261 correct me), and implemented in Hope+. It's been in Lennart
1262 Augustsson's <tt>hbc</tt> Haskell compiler for several years, and
1263 proved very useful.  Here's the idea.  Consider the declaration:
1264
1265 <tscreen><verb>
1266   data Foo = forall a. MkFoo a (a -> Bool)
1267            | Nil
1268 </verb></tscreen>
1269
1270 The data type <tt>Foo</tt> has two constructors with types:
1271
1272 <tscreen><verb>
1273   MkFoo :: forall a. a -> (a -> Bool) -> Foo
1274   Nil   :: Foo
1275 </verb></tscreen>
1276
1277 Notice that the type variable <tt>a</tt> in the type of <tt>MkFoo</tt>
1278 does not appear in the data type itself, which is plain <tt>Foo</tt>.
1279 For example, the following expression is fine:
1280
1281 <tscreen><verb>
1282   [MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]
1283 </verb></tscreen>
1284
1285 Here, <tt>(MkFoo 3 even)</tt> packages an integer with a function
1286 <tt>even</tt> that maps an integer to <tt>Bool</tt>; and <tt>MkFoo 'c'
1287 isUpper</tt> packages a character with a compatible function.  These
1288 two things are each of type <tt>Foo</tt> and can be put in a list.
1289
1290 What can we do with a value of type <tt>Foo</tt>?.  In particular,
1291 what happens when we pattern-match on <tt>MkFoo</tt>?
1292
1293 <tscreen><verb>
1294   f (MkFoo val fn) = ???
1295 </verb></tscreen>
1296
1297 Since all we know about <tt>val</tt> and <tt>fn</tt> is that they
1298 are compatible, the only (useful) thing we can do with them is to
1299 apply <tt>fn</tt> to <tt>val</tt> to get a boolean.  For example:
1300
1301 <tscreen><verb>
1302   f :: Foo -> Bool
1303   f (MkFoo val fn) = fn val
1304 </verb></tscreen>
1305
1306 What this allows us to do is to package heterogenous values
1307 together with a bunch of functions that manipulate them, and then treat
1308 that collection of packages in a uniform manner.  You can express
1309 quite a bit of object-oriented-like programming this way.
1310
1311 <sect2>Why existential?
1312 <label id="existential">
1313 <p>
1314
1315 What has this to do with <em>existential</em> quantification?
1316 Simply that <tt>MkFoo</tt> has the (nearly) isomorphic type
1317
1318 <tscreen><verb>
1319   MkFoo :: (exists a . (a, a -> Bool)) -> Foo
1320 </verb></tscreen>
1321
1322 But Haskell programmers can safely think of the ordinary
1323 <em>universally</em> quantified type given above, thereby avoiding
1324 adding a new existential quantification construct.
1325
1326 <sect2>Type classes
1327 <p>
1328
1329 An easy extension (implemented in <tt>hbc</tt>) is to allow
1330 arbitrary contexts before the constructor.  For example:
1331
1332 <tscreen><verb>
1333   data Baz = forall a. Eq a => Baz1 a a
1334            | forall b. Show b => Baz2 b (b -> b)
1335 </verb></tscreen>
1336
1337 The two constructors have the types you'd expect:
1338
1339 <tscreen><verb>
1340   Baz1 :: forall a. Eq a => a -> a -> Baz
1341   Baz2 :: forall b. Show b => b -> (b -> b) -> Baz
1342 </verb></tscreen>
1343
1344 But when pattern matching on <tt>Baz1</tt> the matched values can be compared
1345 for equality, and when pattern matching on <tt>Baz2</tt> the first matched
1346 value can be converted to a string (as well as applying the function to it).
1347 So this program is legal:
1348
1349 <tscreen><verb>
1350   f :: Baz -> String
1351   f (Baz1 p q) | p == q    = "Yes"
1352                | otherwise = "No"
1353   f (Baz1 v fn)            = show (fn v)
1354 </verb></tscreen>
1355
1356 Operationally, in a dictionary-passing implementation, the
1357 constructors <tt>Baz1</tt> and <tt>Baz2</tt> must store the
1358 dictionaries for <tt>Eq</tt> and <tt>Show</tt> respectively, and
1359 extract it on pattern matching.
1360
1361 Notice the way that the syntax fits smoothly with that used for
1362 universal quantification earlier.
1363
1364 <sect2>Restrictions
1365 <p>
1366
1367 There are several restrictions on the ways in which existentially-quantified
1368 constructors can be use.
1369
1370 <itemize>
1371
1372 <item> When pattern matching, each pattern match introduces a new,
1373 distinct, type for each existential type variable.  These types cannot
1374 be unified with any other type, nor can they escape from the scope of
1375 the pattern match.  For example, these fragments are incorrect:
1376
1377 <tscreen><verb>
1378   f1 (MkFoo a f) = a
1379 </verb></tscreen>
1380
1381 Here, the type bound by <tt>MkFoo</tt> "escapes", because <tt>a</tt>
1382 is the result of <tt>f1</tt>.  One way to see why this is wrong is to
1383 ask what type <tt>f1</tt> has:
1384
1385 <tscreen><verb>
1386   f1 :: Foo -> a             -- Weird!
1387 </verb></tscreen>
1388
1389 What is this "<tt>a</tt>" in the result type? Clearly we don't mean
1390 this:
1391
1392 <tscreen><verb>
1393   f1 :: forall a. Foo -> a   -- Wrong!
1394 </verb></tscreen>
1395
1396 The original program is just plain wrong.  Here's another sort of error
1397
1398 <tscreen><verb>
1399   f2 (Baz1 a b) (Baz1 p q) = a==q
1400 </verb></tscreen>
1401
1402 It's ok to say <tt>a==b</tt> or <tt>p==q</tt>, but
1403 <tt>a==q</tt> is wrong because it equates the two distinct types arising
1404 from the two <tt>Baz1</tt> constructors.
1405
1406
1407 <item>You can't pattern-match on an existentially quantified
1408 constructor in a <tt>let</tt> or <tt>where</tt> group of
1409 bindings. So this is illegal:
1410
1411 <tscreen><verb>
1412   f3 x = a==b where { Baz1 a b = x }
1413 </verb></tscreen>
1414
1415 You can only pattern-match
1416 on an existentially-quantified constructor in a <tt>case</tt> expression or
1417 in the patterns of a function definition.
1418
1419 The reason for this restriction is really an implementation one.
1420 Type-checking binding groups is already a nightmare without
1421 existentials complicating the picture.  Also an existential pattern
1422 binding at the top level of a module doesn't make sense, because it's
1423 not clear how to prevent the existentially-quantified type "escaping".
1424 So for now, there's a simple-to-state restriction.  We'll see how
1425 annoying it is.
1426
1427 <item>You can't use existential quantification for <tt>newtype</tt>
1428 declarations.  So this is illegal:
1429
1430 <tscreen><verb>
1431   newtype T = forall a. Ord a => MkT a
1432 </verb></tscreen>
1433
1434 Reason: a value of type <tt>T</tt> must be represented as a pair
1435 of a dictionary for <tt>Ord t</tt> and a value of type <tt>t</tt>.
1436 That contradicts the idea that <tt>newtype</tt> should have no
1437 concrete representation.  You can get just the same efficiency and effect
1438 by using <tt>data</tt> instead of <tt>newtype</tt>.  If there is no
1439 overloading involved, then there is more of a case for allowing
1440 an existentially-quantified <tt>newtype</tt>, because the <tt>data</tt>
1441 because the <tt>data</tt> version does carry an implementation cost,
1442 but single-field existentially quantified constructors aren't much
1443 use.  So the simple restriction (no existential stuff on <tt>newtype</tt>)
1444 stands, unless there are convincing reasons to change it.
1445 </itemize>