ghc/docs/users_guide/glasgow_exts.vsgml

   1 %
   2 % $Id: glasgow_exts.vsgml,v 1.5 1999/02/22 10:22:35 sof Exp $
   3 %
   4 % GHC Language Extensions.
   5 %
   6
   7 As with all known Haskell systems, GHC implements some extensions to
   8 the language.  To use them, you'll need to give a @-fglasgow-exts@%
   9 <nidx>-fglasgow-exts option</nidx> option.
  10
  11 Virtually all of the Glasgow extensions serve to give you access to
  12 the underlying facilities with which we implement Haskell.  Thus, you
  13 can get at the Raw Iron, if you are willing to write some non-standard
  14 code at a more primitive level.  You need not be ``stuck'' on
  15 performance because of the implementation costs of Haskell's
  16 ``high-level'' features---you can always code ``under'' them.  In an
  17 extreme case, you can write all your time-critical code in C, and then
  18 just glue it together with Haskell!
  19
  20 Executive summary of our extensions:
  21
  22 <descrip>
  23
  24 <tag>Unboxed types and primitive operations:</tag>
  25
  26 You can get right down to the raw machine types and operations;
  27 included in this are ``primitive arrays'' (direct access to Big Wads
  28 of Bytes).  Please see Section <ref name="Unboxed types"
  29 id="glasgow-unboxed"> and following.
  30
  31 <tag>Multi-parameter type classes:</tag>
  32
  33 GHC's type system supports extended type classes with multiple
  34 parameters.  Please see Section <ref name="Mult-parameter type
  35 classes" id="multi-param-type-classes">.
  36
  37 <tag>Local universal quantification:</tag>
  38
  39 GHC's type system supports explicit unversal quantification in
  40 constructor fields and function arguments.  This is useful for things
  41 like defining @runST@ from the state-thread world.  See Section <ref
  42 name="Local universal quantification" id="universal-quantification">.
  43
  44 <tag>Extistentially quantification in data types:</tag>
  45
  46 Some or all of the type variables in a datatype declaration may be
  47 <em>existentially quantified</em>.  More details in Section <ref
  48 name="Existential Quantification" id="existential-quantification">.
  49
  50 <tag>Scoped type variables:</tag>
  51
  52 Scoped type variables enable the programmer to supply type signatures
  53 for some nested declarations, where this would not be legal in Haskell
  54 98.  Details in Section <ref name="Scoped Type Variables"
  55 id="scoped-type-variables">.
  56
  57 <tag>Calling out to C:</tag>
  58
  59 Just what it sounds like.  We provide <em>lots</em> of rope that you
  60 can dangle around your neck.  Please see Section <ref name="Calling~C
  61 directly from Haskell" id="glasgow-ccalls">.
  62
  63 </descrip>
  64
  65 Before you get too carried away working at the lowest level (e.g.,
  66 sloshing @MutableByteArray#@s around your program), you may wish to
  67 check if there are system libraries that provide a ``Haskellised
  68 veneer'' over the features you want.  See Section <ref name="GHC
  69 Prelude and libraries" id="ghc-prelude">.
  70
  71 %************************************************************************
  72 %*                                                                      *
  73 <sect1>Unboxed types
  74 <label id="glasgow-unboxed">
  75 <p>
  76 <nidx>Unboxed types (Glasgow extension)</nidx>
  77 %*                                                                      *
  78 %************************************************************************
  79
  80 These types correspond to the ``raw machine'' types you would use in
  81 C: @Int#@ (long int), @Double#@ (double), @Addr#@ (void *), etc.  The
  82 <em>primitive operations</em> (PrimOps) on these types are what you
  83 might expect; e.g., @(+#)@ is addition on @Int#@s, and is the
  84 machine-addition that we all know and love---usually one instruction.
  85
  86 There are some restrictions on the use of unboxed types, the main one
  87 being that you can't pass an unboxed value to a polymorphic function
  88 or store one in a polymorphic data type.  This rules out things like
  89 @[Int#]@ (ie. lists of unboxed integers).  The reason for this
  90 restriction is that polymorphic arguments and constructor fields are
  91 assumed to be pointers: if an unboxed integer is stored in one of
  92 these, the garbage collector would attempt to follow it, leading to
  93 unpredictable space leaks.  Or a @seq@ operation on the polymorphic
  94 component may attempt to dereference the pointer, with disastrous
  95 results.  Even worse, the unboxed value might be larger than a pointer
  96 (@Double#@ for instance).
  97
  98 Nevertheless, A numerically-intensive program using unboxed types can
  99 go a <em>lot</em> faster than its ``standard'' counterpart---we saw a
 100 threefold speedup on one example.
 101
 102 Please see Section <ref name="The module PrelGHC: really primitive
 103 stuff" id="ghc-libs-ghc"> for the details of unboxed types and the
 104 operations on them.
 105
 106 %************************************************************************
 107 %*                                                                      *
 108 <sect1>Primitive state-transformer monad
 109 <label id="glasgow-ST-monad">
 110 <p>
 111 <nidx>state transformers (Glasgow extensions)</nidx>
 112 <nidx>ST monad (Glasgow extension)</nidx>
 113 %*                                                                      *
 114 %************************************************************************
 115
 116 This monad underlies our implementation of arrays, mutable and
 117 immutable, and our implementation of I/O, including ``C calls''.
 118
 119 The @ST@ library, which provides access to the @ST@ monad, is a
 120 GHC/Hugs extension library and is described in the separate <htmlurl
 121 name="GHC/Hugs Extension Libraries" url="libs.html"> document.
 122
 123 %************************************************************************
 124 %*                                                                      *
 125 <sect1>Primitive arrays, mutable and otherwise
 126 <label id="glasgow-prim-arrays">
 127 <p>
 128 <nidx>primitive arrays (Glasgow extension)</nidx>
 129 <nidx>arrays, primitive (Glasgow extension)</nidx>
 130 %*                                                                      *
 131 %************************************************************************
 132
 133 GHC knows about quite a few flavours of Large Swathes of Bytes.
 134
 135 First, GHC distinguishes between primitive arrays of (boxed) Haskell
 136 objects (type @Array# obj@) and primitive arrays of bytes (type
 137 @ByteArray#@).
 138
 139 Second, it distinguishes between...
 140 <descrip>
 141 <tag>Immutable:</tag>
 142 Arrays that do not change (as with ``standard'' Haskell arrays); you
 143 can only read from them.  Obviously, they do not need the care and
 144 attention of the state-transformer monad.
 145
 146 <tag>Mutable:</tag>
 147 Arrays that may be changed or ``mutated.''  All the operations on them
 148 live within the state-transformer monad and the updates happen
 149 <em>in-place</em>.
 150
 151 <tag>``Static'' (in C land):</tag>
 152 A C routine may pass an @Addr#@ pointer back into Haskell land.  There
 153 are then primitive operations with which you may merrily grab values
 154 over in C land, by indexing off the ``static'' pointer.
 155
 156 <tag>``Stable'' pointers:</tag>
 157 If, for some reason, you wish to hand a Haskell pointer (i.e.,
 158 <em>not</em> an unboxed value) to a C routine, you first make the
 159 pointer ``stable,'' so that the garbage collector won't forget that it
 160 exists.  That is, GHC provides a safe way to pass Haskell pointers to
 161 C.
 162
 163 Please see Section <ref name="Subverting automatic unboxing with
 164 ``stable pointers''" id="glasgow-stablePtrs"> for more details.
 165
 166 <tag>``Foreign objects'':</tag>
 167 A ``foreign object'' is a safe way to pass an external object (a
 168 C-allocated pointer, say) to Haskell and have Haskell do the Right
 169 Thing when it no longer references the object.  So, for example, C
 170 could pass a large bitmap over to Haskell and say ``please free this
 171 memory when you're done with it.''
 172
 173 Please see Section <ref name="Pointing outside the Haskell heap"
 174 id="glasgow-foreignObjs"> for more details.
 175
 176 </descrip>
 177
 178 The libraries section gives more details on all these ``primitive
 179 array'' types and the operations on them, Section <ref name="The GHC
 180 Prelude and Libraries" id="ghc-prelude">.  Some of these extensions
 181 are also supported by Hugs, and the supporting libraries are described
 182 in the <htmlurl name="GHC/Hugs Extension Libraries" url="libs.html">
 183 document.
 184
 185 %************************************************************************
 186 %*                                                                      *
 187 <sect1>Calling~C directly from Haskell
 188 <label id="glasgow-ccalls">
 189 <p>
 190 <nidx>C calls (Glasgow extension)</nidx>
 191 <nidx>_ccall_ (Glasgow extension)</nidx>
 192 <nidx>_casm_ (Glasgow extension)</nidx>
 193 %*                                                                      *
 194 %************************************************************************
 195
 196 GOOD ADVICE: Because this stuff is not Entirely Stable as far as names
 197 and things go, you would be well-advised to keep your C-callery
 198 corraled in a few modules, rather than sprinkled all over your code.
 199 It will then be quite easy to update later on.
 200
 201 %************************************************************************
 202 %*                                                                      *
 203 <sect2>@_ccall_@ and @_casm_@: an introduction
 204 <label id="ccall-intro">
 205 <p>
 206 %*                                                                      *
 207 %************************************************************************
 208
 209 The simplest way to use a simple C function
 210
 211 <tscreen><verb>
 212 double fooC( FILE *in, char c, int i, double d, unsigned int u )
 213 </verb></tscreen>
 214
 215 is to provide a Haskell wrapper:
 216
 217 <tscreen><verb>
 218 fooH :: Char -> Int -> Double -> Word -> IO Double
 219 fooH c i d w = _ccall_ fooC (``stdin''::Addr) c i d w
 220 </verb></tscreen>
 221
 222 The function @fooH@ will unbox all of its arguments, call the C
 223 function @fooC@ and box the corresponding arguments.
 224
 225 One of the annoyances about @_ccall_@s is when the C types don't quite
 226 match the Haskell compiler's ideas.  For this, the @_casm_@ variant
 227 may be just the ticket (NB: <em>no chance</em> of such code going
 228 through a native-code generator):
 229
 230 <tscreen><verb>
 231 oldGetEnv name
 232   = _casm_ ``%r = getenv((char *) %0);'' name >>= \ litstring@(A# str#) ->
 233     return (
 234         if (litstring == ``NULL'') then
 235             Left ("Fail:oldGetEnv:"++name)
 236         else
 237             Right (unpackCString# str#)
 238     )
 239 </verb></tscreen>
 240
 241 The first literal-literal argument to a @_casm_@ is like a @printf@
 242 format: @%r@ is replaced with the ``result,'' @%0@--@%n-1@ are
 243 replaced with the 1st--nth arguments.  As you can see above, it is an
 244 easy way to do simple C~casting.  Everything said about @_ccall_@ goes
 245 for @_casm_@ as well.
 246
 247 The use of @_casm_@ in your code does pose a problem to the compiler
 248 when it comes to generating an interface file for a freshly compiled
 249 module. Included in an interface file is the unfolding (if any) of a
 250 declaration. However, if a declaration's unfolding happens to contain
 251 a @_casm_@, its unfolding will <em/not/ be emitted into the interface
 252 file even if it qualifies by all the other criteria. The reason why
 253 the compiler prevents this from happening is that unfolding @_casm_@s
 254 into an interface file unduly constrains how code that import your
 255 module have to be compiled. If an imported declaration is unfolded and
 256 it contains a @_casm_@, you now have to be using a compiler backend
 257 capable of dealing with it (i.e., the C compiler backend). If you are
 258 using the C compiler backend, the unfolded @_casm_@ may still cause you
 259 problems since the C code snippet it contains may mention CPP symbols
 260 that were in scope when compiling the original module are not when
 261 compiling the importing module.
 262
 263 If you're willing to put up with the drawbacks of doing cross-module
 264 inlining of C code (GHC - A Better C Compiler :-), the option
 265 @-funfold-casms-in-hi-file@ will turn off the default behaviour.
 266 <nidx>-funfold-casms-in-hi-file option</nidx>
 267
 268 %************************************************************************
 269 %*                                                                      *
 270 <sect2>Using function headers
 271 <label id="glasgow-foreign-headers">
 272 <p>
 273 <nidx>C calls, function headers</nidx>
 274 %*                                                                      *
 275 %************************************************************************
 276
 277 When generating C (using the @-fvia-C@ directive), one can assist the
 278 C compiler in detecting type errors by using the @-#include@ directive
 279 to provide @.h@ files containing function headers.
 280
 281 For example,
 282
 283 <tscreen><verb>
 284 typedef unsigned long *StgForeignObj;
 285 typedef long StgInt;
 286
 287 void          initialiseEFS (StgInt size);
 288 StgInt        terminateEFS (void);
 289 StgForeignObj emptyEFS(void);
 290 StgForeignObj updateEFS (StgForeignObj a, StgInt i, StgInt x);
 291 StgInt        lookupEFS (StgForeignObj a, StgInt i);
 292 </verb></tscreen>
 293
 294 You can find appropriate definitions for @StgInt@, @StgForeignObj@,
 295 etc using @gcc@ on your architecture by consulting
 296 @ghc/includes/StgTypes.h@.  The following table summarises the
 297 relationship between Haskell types and C types.
 298
 299 <tabular ca="ll">
 300 <bf>C type name</bf>      | <bf>Haskell Type</bf> @@
 301 @@
 302 @StgChar@          | @Char#@ @@
 303 @StgInt@           | @Int#@ @@
 304 @StgWord@          | @Word#@ @@
 305 @StgAddr@          | @Addr#@ @@
 306 @StgFloat@         | @Float#@ @@
 307 @StgDouble@        | @Double#@ @@
 308
 309 @StgArray@         | @Array#@ @@
 310 @StgByteArray@     | @ByteArray#@ @@
 311 @StgArray@         | @MutableArray#@ @@
 312 @StgByteArray@     | @MutableByteArray#@ @@
 313
 314 @StgStablePtr@     | @StablePtr#@ @@
 315 @StgForeignObj@    | @ForeignObj#@
 316 </tabular>
 317
 318 Note that this approach is only <em>essential</em> for returning
 319 @float@s (or if @sizeof(int) != sizeof(int *)@ on your
 320 architecture) but is a Good Thing for anyone who cares about writing
 321 solid code.  You're crazy not to do it.
 322
 323 %************************************************************************
 324 %*                                                                      *
 325 <sect2>Subverting automatic unboxing with ``stable pointers''
 326 <label id="glasgow-stablePtrs">
 327 <p>
 328 <nidx>stable pointers (Glasgow extension)</nidx>
 329 %*                                                                      *
 330 %************************************************************************
 331
 332 The arguments of a @_ccall_@ are automatically unboxed before the
 333 call.  There are two reasons why this is usually the Right Thing to
 334 do:
 335
 336 <itemize>
 337 <item>
 338 C is a strict language: it would be excessively tedious to pass
 339 unevaluated arguments and require the C programmer to force their
 340 evaluation before using them.
 341
 342 <item> Boxed values are stored on the Haskell heap and may be moved
 343 within the heap if a garbage collection occurs---that is, pointers
 344 to boxed objects are not <em>stable</em>.
 345 </itemize>
 346
 347 It is possible to subvert the unboxing process by creating a ``stable
 348 pointer'' to a value and passing the stable pointer instead.  For
 349 example, to pass/return an integer lazily to C functions @storeC@ and
 350 @fetchC@, one might write:
 351
 352 <tscreen><verb>
 353 storeH :: Int -> IO ()
 354 storeH x = makeStablePtr x              >>= \ stable_x ->
 355            _ccall_ storeC stable_x
 356
 357 fetchH :: IO Int
 358 fetchH x = _ccall_ fetchC               >>= \ stable_x ->
 359            deRefStablePtr stable_x      >>= \ x ->
 360            freeStablePtr stable_x       >>
 361            return x
 362 </verb></tscreen>
 363
 364 The garbage collector will refrain from throwing a stable pointer away
 365 until you explicitly call one of the following from C or Haskell.
 366
 367 <tscreen><verb>
 368 void freeStablePointer( StgStablePtr stablePtrToToss )
 369 freeStablePtr :: StablePtr a -> IO ()
 370 </verb></tscreen>
 371
 372 As with the use of @free@ in C programs, GREAT CARE SHOULD BE
 373 EXERCISED to ensure these functions are called at the right time: too
 374 early and you get dangling references (and, if you're lucky, an error
 375 message from the runtime system); too late and you get space leaks.
 376
 377 And to force evaluation of the argument within @fooC@, one would
 378 call one of the following C functions (according to type of argument).
 379
 380 <tscreen><verb>
 381 void     performIO  ( StgStablePtr stableIndex /* StablePtr s (IO ()) */ );
 382 StgInt   enterInt   ( StgStablePtr stableIndex /* StablePtr s Int */ );
 383 StgFloat enterFloat ( StgStablePtr stableIndex /* StablePtr s Float */ );
 384 </verb></tscreen>
 385
 386 <nidx>performIO</nidx>
 387 <nidx>enterInt</nidx>
 388 <nidx>enterFloat</nidx>
 389
 390 Note Bene: @_ccall_GC_@<nidx>_ccall_GC_</nidx> must be used if any of
 391 these functions are used.
 392
 393 %************************************************************************
 394 %*                                                                      *
 395 <sect2>Foreign objects: pointing outside the Haskell heap
 396 <label id="glasgow-foreignObjs">
 397 <p>
 398 <nidx>foreign objects (Glasgow extension)</nidx>
 399 %*                                                                      *
 400 %************************************************************************
 401
 402 There are two types that @ghc@ programs can use to reference
 403 (heap-allocated) objects outside the Haskell world: @Addr@ and
 404 @ForeignObj@.
 405
 406 If you use @Addr@, it is up to you to the programmer to arrange
 407 allocation and deallocation of the objects.
 408
 409 If you use @ForeignObj@, @ghc@'s garbage collector will call upon the
 410 user-supplied <em>finaliser</em> function to free the object when the
 411 Haskell world no longer can access the object.  (An object is
 412 associated with a finaliser function when the abstract
 413  Haskell type @ForeignObj@ is created). The finaliser function is
 414 expressed in C, and is passed as argument the object:
 415
 416 <tscreen><verb>
 417 void foreignFinaliser ( StgForeignObj fo )
 418 </verb></tscreen>
 419
 420 when the Haskell world can no longer access the object.  Since
 421 @ForeignObj@s only get released when a garbage collection occurs, we
 422 provide ways of triggering a garbage collection from within C and from
 423 within Haskell.
 424
 425 <tscreen><verb>
 426 void GarbageCollect()
 427 performGC :: IO ()
 428 </verb></tscreen>
 429
 430 More information on the programmers' interface to @ForeignObj@ can be
 431 found in the library documentation.
 432
 433 %************************************************************************
 434 %*                                                                      *
 435 <sect2>Avoiding monads
 436 <label id="glasgow-avoiding-monads">
 437 <p>
 438 <nidx>C calls to `pure C'</nidx>
 439 <nidx>unsafePerformIO</nidx>
 440 %*                                                                      *
 441 %************************************************************************
 442
 443 The @_ccall_@ construct is part of the @IO@ monad because 9 out of 10
 444 uses will be to call imperative functions with side effects such as
 445 @printf@.  Use of the monad ensures that these operations happen in a
 446 predictable order in spite of laziness and compiler optimisations.
 447
 448 To avoid having to be in the monad to call a C function, it is
 449 possible to use @unsafePerformIO@, which is available from the
 450 @IOExts@ module.  There are three situations where one might like to
 451 call a C function from outside the IO world:
 452
 453 <itemize>
 454 <item>
 455 Calling a function with no side-effects:
 456 <tscreen><verb>
 457 atan2d :: Double -> Double -> Double
 458 atan2d y x = unsafePerformIO (_ccall_ atan2d y x)
 459
 460 sincosd :: Double -> (Double, Double)
 461 sincosd x = unsafePerformIO $ do
 462         da <- newDoubleArray (0, 1)
 463         _casm_ ``sincosd( %0, &((double *)%1[0]), &((double *)%1[1]) );'' x da
 464         s <- readDoubleArray da 0
 465         c <- readDoubleArray da 1
 466         return (s, c)
 467 </verb></tscreen>
 468
 469 <item> Calling a set of functions which have side-effects but which can
 470 be used in a purely functional manner.
 471
 472 For example, an imperative implementation of a purely functional
 473 lookup-table might be accessed using the following functions.
 474
 475 <tscreen><verb>
 476 empty  :: EFS x
 477 update :: EFS x -> Int -> x -> EFS x
 478 lookup :: EFS a -> Int -> a
 479
 480 empty = unsafePerformIO (_ccall_ emptyEFS)
 481
 482 update a i x = unsafePerformIO $
 483         makeStablePtr x         >>= \ stable_x ->
 484         _ccall_ updateEFS a i stable_x
 485
 486 lookup a i = unsafePerformIO $
 487         _ccall_ lookupEFS a i   >>= \ stable_x ->
 488         deRefStablePtr stable_x
 489 </verb></tscreen>
 490
 491 You will almost always want to use @ForeignObj@s with this.
 492
 493 <item> Calling a side-effecting function even though the results will
 494 be unpredictable.  For example the @trace@ function is defined by:
 495
 496 <tscreen><verb>
 497 trace :: String -> a -> a
 498 trace string expr
 499   = unsafePerformIO (
 500         ((_ccall_ PreTraceHook sTDERR{-msg-}):: IO ())  >>
 501         fputs sTDERR string                             >>
 502         ((_ccall_ PostTraceHook sTDERR{-msg-}):: IO ()) >>
 503         return expr )
 504   where
 505     sTDERR = (``stderr'' :: Addr)
 506 </verb></tscreen>
 507
 508 (This kind of use is not highly recommended --- it is only really
 509 useful in debugging code.)
 510 </itemize>
 511
 512 %************************************************************************
 513 %*                                                                      *
 514 <sect2>C-calling ``gotchas'' checklist
 515 <label id="ccall-gotchas">
 516 <p>
 517 <nidx>C call dangers</nidx>
 518 %*                                                                      *
 519 %************************************************************************
 520
 521 And some advice, too.
 522
 523 <itemize>
 524 <item> For modules that use @_ccall_@s, etc., compile with
 525 @-fvia-C@.<nidx>-fvia-C option</nidx> You don't have to, but you should.
 526
 527 Also, use the @-#include "prototypes.h"@ flag (hack) to inform the C
 528 compiler of the fully-prototyped types of all the C functions you
 529 call.  (Section <ref name="Using function headers"
 530 id="glasgow-foreign-headers"> says more about this...)
 531
 532 This scheme is the <em>only</em> way that you will get <em>any</em>
 533 typechecking of your @_ccall_@s.  (It shouldn't be that way, but...).
 534 GHC will pass the flag @-Wimplicit@ to gcc so that you'll get warnings
 535 if any @_ccall_@ed functions have no prototypes.
 536
 537 <item>
 538 Try to avoid @_ccall_@s to C~functions that take @float@
 539 arguments or return @float@ results.  Reason: if you do, you will
 540 become entangled in (ANSI?) C's rules for when arguments/results are
 541 promoted to @doubles@.  It's a nightmare and just not worth it.
 542 Use @doubles@ if possible.
 543
 544 If you do use @floats@, check and re-check that the right thing is
 545 happening.  Perhaps compile with @-keep-hc-file-too@ and look at
 546 the intermediate C (@.hc@ file).
 547
 548 <item> The compiler uses two non-standard type-classes when
 549 type-checking the arguments and results of @_ccall_@: the arguments
 550 (respectively result) of @_ccall_@ must be instances of the class
 551 @CCallable@ (respectively @CReturnable@).  Both classes may be
 552 imported from the module @CCall@, but this should only be
 553 necessary if you want to define a new instance.  (Neither class
 554 defines any methods --- their only function is to keep the
 555 type-checker happy.)
 556
 557 The type checker must be able to figure out just which of the
 558 C-callable/returnable types is being used.  If it can't, you have to
 559 add type signatures. For example,
 560
 561 <tscreen><verb>
 562 f x = _ccall_ foo x
 563 </verb></tscreen>
 564
 565 is not good enough, because the compiler can't work out what type @x@
 566 is, nor what type the @_ccall_@ returns.  You have to write, say:
 567
 568 <tscreen><verb>
 569 f :: Int -> IO Double
 570 f x = _ccall_ foo x
 571 </verb></tscreen>
 572
 573 This table summarises the standard instances of these classes.
 574
 575 % ToDo: check this table against implementation!
 576
 577 <tabular ca="llll">
 578 <bf>Type</bf>       |<bf>CCallable</bf>|<bf>CReturnable</bf> | <bf>Which is probably...</bf> @@
 579
 580 @Char@              | Yes  | Yes   | @unsigned char@ @@
 581 @Int@               | Yes  | Yes   | @long int@ @@
 582 @Word@              | Yes  | Yes   | @unsigned long int@ @@
 583 @Addr@              | Yes  | Yes   | @void *@ @@
 584 @Float@             | Yes  | Yes   | @float@ @@
 585 @Double@            | Yes  | Yes   | @double@ @@
 586 @()@                | No   | Yes   | @void@ @@
 587 @[Char]@            | Yes  | No    | @char *@ (null-terminated) @@
 588
 589 @Array@             | Yes  | No    | @unsigned long *@ @@
 590 @ByteArray@         | Yes  | No    | @unsigned long *@ @@
 591 @MutableArray@      | Yes  | No    | @unsigned long *@ @@
 592 @MutableByteArray@  | Yes  | No    | @unsigned long *@ @@
 593
 594 @State@             | Yes  | Yes   | nothing!@@
 595
 596 @StablePtr@         | Yes  | Yes   | @unsigned long *@ @@
 597 @ForeignObjs@       | Yes  | Yes   | see later @@
 598 </tabular>
 599
 600 Actually, the @Word@ type is defined as being the same size as a
 601 pointer on the target architecture, which is <em>probably</em>
 602 @unsigned long int@.
 603
 604 The brave and careful programmer can add their own instances of these
 605 classes for the following types:
 606
 607 <itemize>
 608 <item>
 609 A <em>boxed-primitive</em> type may be made an instance of both
 610 @CCallable@ and @CReturnable@.
 611
 612 A boxed primitive type is any data type with a
 613 single unary constructor with a single primitive argument.  For
 614 example, the following are all boxed primitive types:
 615
 616 <tscreen><verb>
 617 Int
 618 Double
 619 data XDisplay = XDisplay Addr#
 620 data EFS a = EFS# ForeignObj#
 621 </verb></tscreen>
 622
 623 <tscreen><verb>
 624 instance CCallable   (EFS a)
 625 instance CReturnable (EFS a)
 626 </verb></tscreen>
 627
 628 <item> Any datatype with a single nullary constructor may be made an
 629 instance of @CReturnable@.  For example:
 630
 631 <tscreen><verb>
 632 data MyVoid = MyVoid
 633 instance CReturnable MyVoid
 634 </verb></tscreen>
 635
 636 <item> As at version 2.09, @String@ (i.e., @[Char]@) is still
 637 not a @CReturnable@ type.
 638
 639 Also, the now-builtin type @PackedString@ is neither
 640 @CCallable@ nor @CReturnable@.  (But there are functions in
 641 the PackedString interface to let you get at the necessary bits...)
 642 </itemize>
 643
 644 <item> The code-generator will complain if you attempt to use @%r@ in
 645 a @_casm_@ whose result type is @IO ()@; or if you don't use @%r@
 646 <em>precisely</em> once for any other result type.  These messages are
 647 supposed to be helpful and catch bugs---please tell us if they wreck
 648 your life.
 649
 650 <item> If you call out to C code which may trigger the Haskell garbage
 651 collector or create new threads (examples of this later...), then you
 652 must use the @_ccall_GC_@<nidx>_ccall_GC_ primitive</nidx> or
 653 @_casm_GC_@<nidx>_casm_GC_ primitive</nidx> variant of C-calls.  (This
 654 does not work with the native code generator - use @\fvia-C@.) This
 655 stuff is hairy with a capital H!  </itemize>
 656
 657 <sect1> Multi-parameter type classes
 658 <label id="multi-param-type-classes">
 659 <p>
 660
 661 This section documents GHC's implementation of multi-paramter type
 662 classes.  There's lots of background in the paper <url name="Type
 663 classes: exploring the design space"
 664 url="http://www.dcs.gla.ac.uk/~simonpj/multi.ps.gz"> (Simon Peyton
 665 Jones, Mark Jones, Erik Meijer).
 666
 667 I'd like to thank people who reported shorcomings in the GHC 3.02
 668 implementation.  Our default decisions were all conservative ones, and
 669 the experience of these heroic pioneers has given useful concrete
 670 examples to support several generalisations.  (These appear below as
 671 design choices not implemented in 3.02.)
 672
 673 I've discussed these notes with Mark Jones, and I believe that Hugs
 674 will migrate towards the same design choices as I outline here.
 675 Thanks to him, and to many others who have offered very useful
 676 feedback.
 677
 678 <sect2>Types
 679 <p>
 680
 681 There are the following restrictions on the form of a qualified
 682 type:
 683
 684 <tscreen><verb>
 685   forall tv1..tvn (c1, ...,cn) => type
 686 </verb></tscreen>
 687
 688 (Here, I write the "foralls" explicitly, although the Haskell source
 689 language omits them; in Haskell 1.4, all the free type variables of an
 690 explicit source-language type signature are universally quantified,
 691 except for the class type variables in a class declaration.  However,
 692 in GHC, you can give the foralls if you want.  See Section <ref
 693 name="Explicit universal quantification"
 694 id="universal-quantification">).
 695
 696 <enum>
 697
 698 <item> <bf>Each universally quantified type variable
 699 @tvi@ must be mentioned (i.e. appear free) in @type@</bf>.
 700
 701 The reason for this is that a value with a type that does not obey
 702 this restriction could not be used without introducing
 703 ambiguity. Here, for example, is an illegal type:
 704
 705 <tscreen><verb>
 706   forall a. Eq a => Int
 707 </verb></tscreen>
 708
 709 When a value with this type was used, the constraint <tt>Eq tv</tt>
 710 would be introduced where <tt>tv</tt> is a fresh type variable, and
 711 (in the dictionary-translation implementation) the value would be
 712 applied to a dictionary for <tt>Eq tv</tt>.  The difficulty is that we
 713 can never know which instance of <tt>Eq</tt> to use because we never
 714 get any more information about <tt>tv</tt>.
 715
 716 <item> <bf>Every constraint @ci@ must mention at least one of the
 717 universally quantified type variables @tvi@</bf>.
 718
 719 For example, this type is OK because <tt>C a b</tt> mentions the
 720 universally quantified type variable <tt>b</tt>:
 721
 722 <tscreen><verb>
 723   forall a. C a b => burble
 724 </verb></tscreen>
 725
 726 The next type is illegal because the constraint <tt>Eq b</tt> does not
 727 mention <tt>a</tt>:
 728
 729 <tscreen><verb>
 730   forall a. Eq b => burble
 731 </verb></tscreen>
 732
 733 The reason for this restriction is milder than the other one.  The
 734 excluded types are never useful or necessary (because the offending
 735 context doesn't need to be witnessed at this point; it can be floated
 736 out).  Furthermore, floating them out increases sharing. Lastly,
 737 excluding them is a conservative choice; it leaves a patch of
 738 territory free in case we need it later.
 739
 740 </enum>
 741
 742 These restrictions apply to all types, whether declared in a type signature
 743 or inferred.
 744
 745 Unlike Haskell 1.4, constraints in types do <bf>not</bf> have to be of
 746 the form <em>(class type-variables)</em>.  Thus, these type signatures
 747 are perfectly OK
 748
 749 <tscreen><verb>
 750   f :: Eq (m a) => [m a] -> [m a]
 751   g :: Eq [a] => ...
 752 </verb></tscreen>
 753
 754 This choice recovers principal types, a property that Haskell 1.4 does not have.
 755
 756 <sect2>Class declarations
 757 <p>
 758
 759 <enum>
 760
 761 <item> <bf>Multi-parameter type classes are permitted</bf>. For example:
 762
 763 <tscreen><verb>
 764   class Collection c a where
 765     union :: c a -> c a -> c a
 766     ...etc..
 767 </verb></tscreen>
 768
 769
 770 <item> <bf>The class hierarchy must be acyclic</bf>.  However, the definition
 771 of "acyclic" involves only the superclass relationships.  For example,
 772 this is OK:
 773
 774 <tscreen><verb>
 775   class C a where {
 776     op :: D b => a -> b -> b
 777   }
 778
 779   class C a => D a where { ... }
 780 </verb></tscreen>
 781
 782 Here, <tt>C</tt> is a superclass of <tt>D</tt>, but it's OK for a
 783 class operation <tt>op</tt> of <tt>C</tt> to mention <tt>D</tt>.  (It
 784 would not be OK for <tt>D</tt> to be a superclass of <tt>C</tt>.)
 785
 786 <item> <bf>There are no restrictions on the context in a class declaration
 787 (which introduces superclasses), except that the class hierarchy must
 788 be acyclic</bf>.  So these class declarations are OK:
 789
 790 <tscreen><verb>
 791   class Functor (m k) => FiniteMap m k where
 792     ...
 793
 794   class (Monad m, Monad (t m)) => Transform t m where
 795     lift :: m a -> (t m) a
 796 </verb></tscreen>
 797
 798 <item> <bf>In the signature of a class operation, every constraint
 799 must mention at least one type variable that is not a class type
 800 variable</bf>.
 801
 802 Thus:
 803
 804 <tscreen><verb>
 805   class Collection c a where
 806     mapC :: Collection c b => (a->b) -> c a -> c b
 807 </verb></tscreen>
 808
 809 is OK because the constraint <tt>(Collection a b)</tt> mentions
 810 <tt>b</tt>, even though it also mentions the class variable
 811 <tt>a</tt>.  On the other hand:
 812
 813 <tscreen><verb>
 814   class C a where
 815     op :: Eq a => (a,b) -> (a,b)
 816 </verb></tscreen>
 817
 818 is not OK because the constraint <tt>(Eq a)</tt> mentions on the class
 819 type variable <tt>a</tt>, but not <tt>b</tt>.  However, any such
 820 example is easily fixed by moving the offending context up to the
 821 superclass context:
 822
 823 <tscreen><verb>
 824   class Eq a => C a where
 825     op ::(a,b) -> (a,b)
 826 </verb></tscreen>
 827
 828 A yet more relaxed rule would allow the context of a class-op signature
 829 to mention only class type variables.  However, that conflicts with
 830 Rule 1(b) for types above.
 831
 832 <item> <bf>The type of each class operation must mention <em/all/ of
 833 the class type variables</bf>.  For example:
 834
 835 <tscreen><verb>
 836   class Coll s a where
 837     empty  :: s
 838     insert :: s -> a -> s
 839 </verb></tscreen>
 840
 841 is not OK, because the type of <tt>empty</tt> doesn't mention
 842 <tt>a</tt>.  This rule is a consequence of Rule 1(a), above, for
 843 types, and has the same motivation.
 844
 845 Sometimes, offending class declarations exhibit misunderstandings.  For
 846 example, <tt>Coll</tt> might be rewritten
 847
 848 <tscreen><verb>
 849   class Coll s a where
 850     empty  :: s a
 851     insert :: s a -> a -> s a
 852 </verb></tscreen>
 853
 854 which makes the connection between the type of a collection of
 855 <tt>a</tt>'s (namely <tt>(s a)</tt>) and the element type <tt>a</tt>.
 856 Occasionally this really doesn't work, in which case you can split the
 857 class like this:
 858
 859 <tscreen><verb>
 860   class CollE s where
 861     empty  :: s
 862
 863   class CollE s => Coll s a where
 864     insert :: s -> a -> s
 865 </verb></tscreen>
 866
 867 </enum>
 868
 869 <sect2>Instance declarations
 870 <p>
 871
 872 <enum>
 873
 874 <item> <bf>Instance declarations may not overlap</bf>.  The two instance
 875 declarations
 876
 877 <tscreen><verb>
 878   instance context1 => C type1 where ...
 879   instance context2 => C type2 where ...
 880 </verb></tscreen>
 881
 882 "overlap" if @type1@ and @type2@ unify
 883
 884 However, if you give the command line option
 885 @-fallow-overlapping-instances@<nidx>-fallow-overlapping-instances
 886 option</nidx> then two overlapping instance declarations are permitted
 887 iff
 888
 889 <itemize>
 890 <item> EITHER @type1@ and @type2@ do not unify
 891 <item> OR @type2@ is a substitution instance of @type1@
 892                 (but not identical to @type1@)
 893 <item> OR vice versa
 894 </itemize>
 895
 896 Notice that these rules
 897
 898 <itemize>
 899 <item> make it clear which instance decl to use
 900            (pick the most specific one that matches)
 901
 902 <item> do not mention the contexts @context1@, @context2@
 903             Reason: you can pick which instance decl
 904             "matches" based on the type.
 905 </itemize>
 906
 907 Regrettably, GHC doesn't guarantee to detect overlapping instance
 908 declarations if they appear in different modules.  GHC can "see" the
 909 instance declarations in the transitive closure of all the modules
 910 imported by the one being compiled, so it can "see" all instance decls
 911 when it is compiling <tt>Main</tt>.  However, it currently chooses not
 912 to look at ones that can't possibly be of use in the module currently
 913 being compiled, in the interests of efficiency.  (Perhaps we should
 914 change that decision, at least for <tt>Main</tt>.)
 915
 916 <item> <bf>There are no restrictions on the type in an instance
 917 <em/head/, except that at least one must not be a type variable</bf>.
 918 The instance "head" is the bit after the "=>" in an instance decl. For
 919 example, these are OK:
 920
 921 <tscreen><verb>
 922   instance C Int a where ...
 923
 924   instance D (Int, Int) where ...
 925
 926   instance E [[a]] where ...
 927 </verb></tscreen>
 928
 929 Note that instance heads <bf>may</bf> contain repeated type variables.
 930 For example, this is OK:
 931
 932 <tscreen><verb>
 933   instance Stateful (ST s) (MutVar s) where ...
 934 </verb></tscreen>
 935
 936 The "at least one not a type variable" restriction is to ensure that
 937 context reduction terminates: each reduction step removes one type
 938 constructor.  For example, the following would make the type checker
 939 loop if it wasn't excluded:
 940
 941 <tscreen><verb>
 942   instance C a => C a where ...
 943 </verb></tscreen>
 944
 945 There are two situations in which the rule is a bit of a pain. First,
 946 if one allows overlapping instance declarations then it's quite
 947 convenient to have a "default instance" declaration that applies if
 948 something more specific does not:
 949
 950 <tscreen><verb>
 951   instance C a where
 952     op = ... -- Default
 953 </verb></tscreen>
 954
 955 Second, sometimes you might want to use the following to get the
 956 effect of a "class synonym":
 957
 958 <tscreen><verb>
 959   class (C1 a, C2 a, C3 a) => C a where { }
 960
 961   instance (C1 a, C2 a, C3 a) => C a where { }
 962 </verb></tscreen>
 963
 964 This allows you to write shorter signatures:
 965
 966 <tscreen><verb>
 967   f :: C a => ...
 968 </verb></tscreen>
 969
 970 instead of
 971
 972 <tscreen><verb>
 973   f :: (C1 a, C2 a, C3 a) => ...
 974 </verb></tscreen>
 975
 976 I'm on the lookout for a simple rule that preserves decidability while
 977 allowing these idioms.  The experimental flag
 978 @-fallow-undecidable-instances@<nidx>-fallow-undecidable-instances
 979 option</nidx> lifts this restriction, allowing all the types in an
 980 instance head to be type variables.
 981
 982 <item> <bf>Unlike Haskell 1.4, instance heads may use type
 983 synonyms</bf>.  As always, using a type synonym is just shorthand for
 984 writing the RHS of the type synonym definition.  For example:
 985
 986 <tscreen><verb>
 987   type Point = (Int,Int)
 988   instance C Point   where ...
 989   instance C [Point] where ...
 990 </verb></tscreen>
 991
 992 is legal.  However, if you added
 993
 994 <tscreen><verb>
 995   instance C (Int,Int) where ...
 996 </verb></tscreen>
 997
 998 as well, then the compiler will complain about the overlapping
 999 (actually, identical) instance declarations.  As always, type synonyms
1000 must be fully applied.  You cannot, for example, write:
1001
1002 <tscreen><verb>
1003   type P a = [[a]]
1004   instance Monad P where ...
1005 </verb></tscreen>
1006
1007 This design decision is independent of all the others, and easily
1008 reversed, but it makes sense to me.
1009
1010 <item><bf>The types in an instance-declaration <em/context/ must all
1011 be type variables</bf>. Thus
1012
1013 <tscreen><verb>
1014   instance C a b => Eq (a,b) where ...
1015 </verb></tscreen>
1016
1017 is OK, but
1018
1019 <tscreen><verb>
1020   instance C Int b => Foo b where ...
1021 </verb></tscreen>
1022
1023 is not OK.  Again, the intent here is to make sure that context
1024 reduction terminates.
1025
1026 Voluminous correspondence on the Haskell mailing list has convinced me
1027 that it's worth experimenting with a more liberal rule.  If you use
1028 the flag <tt>-fallow-undecidable-instances</tt> you can use arbitrary
1029 types in an instance context.  Termination is ensured by having a
1030 fixed-depth recursion stack.  If you exceed the stack depth you get a
1031 sort of backtrace, and the opportunity to increase the stack depth
1032 with <tt>-fcontext-stack</tt><em/N/.
1033
1034 </enum>
1035
1036 % -----------------------------------------------------------------------------
1037 <sect1>Explicit universal quantification
1038 <label id="universal-quantification">
1039 <p>
1040
1041 GHC now allows you to write explicitly quantified types.  GHC's
1042 syntax for this now agrees with Hugs's, namely:
1043
1044 <tscreen><verb>
1045         forall a b. (Ord a, Eq  b) => a -> b -> a
1046 </verb></tscreen>
1047
1048 The context is, of course, optional.  You can't use <tt>forall</tt> as
1049 a type variable any more!
1050
1051 Haskell type signatures are implicitly quantified.  The <tt>forall</tt>
1052 allows us to say exactly what this means.  For example:
1053
1054 <tscreen><verb>
1055         g :: b -> b
1056 </verb></tscreen>
1057
1058 means this:
1059
1060 <tscreen><verb>
1061         g :: forall b. (b -> b)
1062 </verb></tscreen>
1063
1064 The two are treated identically.
1065
1066 <sect2>Universally-quantified data type fields
1067 <label id="univ">
1068 <p>
1069
1070 In a <tt>data</tt> or <tt>newtype</tt> declaration one can quantify
1071 the types of the constructor arguments.  Here are several examples:
1072
1073 <tscreen><verb>
1074 data T a = T1 (forall b. b -> b -> b) a
1075
1076 data MonadT m = MkMonad { return :: forall a. a -> m a,
1077                           bind   :: forall a b. m a -> (a -> m b) -> m b
1078                         }
1079
1080 newtype Swizzle = MkSwizzle (Ord a => [a] -> [a])
1081 </verb></tscreen>
1082
1083 The constructors now have so-called <em/rank 2/ polymorphic
1084 types, in which there is a for-all in the argument types.:
1085
1086 <tscreen><verb>
1087 T1 :: forall a. (forall b. b -> b -> b) -> a -> T1 a
1088 MkMonad :: forall m. (forall a. a -> m a)
1089                   -> (forall a b. m a -> (a -> m b) -> m b)
1090                   -> MonadT m
1091 MkSwizzle :: (Ord a => [a] -> [a]) -> Swizzle
1092 </verb></tscreen>
1093
1094 Notice that you don't need to use a <tt>forall</tt> if there's an
1095 explicit context.  For example in the first argument of the
1096 constructor <tt>MkSwizzle</tt>, an implicit "<tt>forall a.</tt>" is
1097 prefixed to the argument type.  The implicit <tt>forall</tt>
1098 quantifies all type variables that are not already in scope, and are
1099 mentioned in the type quantified over.
1100
1101 As for type signatures, implicit quantification happens for non-overloaded
1102 types too.  So if you write this:
1103 <tscreen><verb>
1104   data T a = MkT (Either a b) (b -> b)
1105 </verb></tscreen>
1106 it's just as if you had written this:
1107 <tscreen><verb>
1108   data T a = MkT (forall b. Either a b) (forall b. b -> b)
1109 </verb></tscreen>
1110 That is, since the type variable <tt>b</tt> isn't in scope, it's
1111 implicitly universally quantified.  (Arguably, it would be better
1112 to <em>require</em> explicit quantification on constructor arguments
1113 where that is what is wanted.  Feedback welcomed.)
1114
1115 <sect2> Construction
1116 <p>
1117
1118 You construct values of types <tt>T1, MonadT, Swizzle</tt> by applying
1119 the constructor to suitable values, just as usual.  For example,
1120
1121 <tscreen><verb>
1122 (T1 (\xy->x) 3) :: T Int
1123
1124 (MkSwizzle sort)    :: Swizzle
1125 (MkSwizzle reverse) :: Swizzle
1126
1127 (let r x = Just x
1128      b m k = case m of
1129                 Just y -> k y
1130                 Nothing -> Nothing
1131   in
1132   MkMonad r b) :: MonadT Maybe
1133 </verb></tscreen>
1134
1135 The type of the argument can, as usual, be more general than the type
1136 required, as <tt>(MkSwizzle reverse)</tt> shows.  (<tt>reverse</tt>
1137 does not need the <tt>Ord</tt> constraint.)
1138
1139 <sect2>Pattern matching
1140 <p>
1141
1142 When you use pattern matching, the bound variables may now have
1143 polymorphic types.  For example:
1144
1145 <tscreen><verb>
1146         f :: T a -> a -> (a, Char)
1147         f (T1 f k) x = (f k x, f 'c' 'd')
1148
1149         g :: (Ord a, Ord b) => Swizzle -> [a] -> (a -> b) -> [b]
1150         g (MkSwizzle s) xs f = s (map f (s xs))
1151
1152         h :: MonadT m -> [m a] -> m [a]
1153         h m [] = return m []
1154         h m (x:xs) = bind m x           $ \y ->
1155                       bind m (h m xs)   $ \ys ->
1156                       return m (y:ys)
1157 </verb></tscreen>
1158
1159 In the function <tt>h</tt> we use the record selectors <tt>return</tt>
1160 and <tt>bind</tt> to extract the polymorphic bind and return functions
1161 from the <tt>MonadT</tt> data structure, rather than using pattern
1162 matching.
1163
1164 <sect2>The partial-application restriction
1165 <p>
1166
1167 There is really only one way in which data structures with polymorphic
1168 components might surprise you: you must not partially apply them.
1169 For example, this is illegal:
1170
1171 <tscreen><verb>
1172         map MkSwizzle [sort, reverse]
1173 </verb></tscreen>
1174
1175 The restriction is this: <em>every subexpression of the program must
1176 have a type that has no for-alls, except that in a function
1177 application (f e1 ... en) the partial applications are not subject to
1178 this rule</em>.  The restriction makes type inference feasible.
1179
1180 In the illegal example, the sub-expression <tt>MkSwizzle</tt> has the
1181 polymorphic type <tt>(Ord b => [b] -> [b]) -> Swizzle</tt> and is not
1182 a sub-expression of an enclosing application.  On the other hand, this
1183 expression is OK:
1184
1185 <tscreen><verb>
1186         map (T1 (\a b -> a)) [1,2,3]
1187 </verb></tscreen>
1188
1189 even though it involves a partial application of <tt>T1</tt>, because
1190 the sub-expression <tt>T1 (\a b -> a)</tt> has type <tt>Int -> T
1191 Int</tt>.
1192
1193 <sect2>Type signatures
1194 <label id="sigs">
1195 <p>
1196
1197 Once you have data constructors with universally-quantified fields, or
1198 constants such as <tt>runST</tt> that have rank-2 types, it isn't long
1199 before you discover that you need more!  Consider:
1200
1201 <tscreen><verb>
1202   mkTs f x y = [T1 f x, T1 f y]
1203 </verb></tscreen>
1204
1205 <tt>mkTs</tt> is a fuction that constructs some values of type
1206 <tt>T</tt>, using some pieces passed to it.  The trouble is that since
1207 <tt>f</tt> is a function argument, Haskell assumes that it is
1208 monomorphic, so we'll get a type error when applying <tt>T1</tt> to
1209 it.  This is a rather silly example, but the problem really bites in
1210 practice.  Lots of people trip over the fact that you can't make
1211 "wrappers functions" for <tt>runST</tt> for exactly the same reason.
1212 In short, it is impossible to build abstractions around functions with
1213 rank-2 types.
1214
1215 The solution is fairly clear.  We provide the ability to give a rank-2
1216 type signature for <em>ordinary</em> functions (not only data
1217 constructors), thus:
1218
1219 <tscreen><verb>
1220   mkTs :: (forall b. b -> b -> b) -> a -> [T a]
1221   mkTs f x y = [T1 f x, T1 f y]
1222 </verb></tscreen>
1223
1224 This type signature tells the compiler to attribute <tt>f</tt> with
1225 the polymorphic type <tt>(forall b. b -> b -> b)</tt> when type
1226 checking the body of <tt>mkTs</tt>, so now the application of
1227 <tt>T1</tt> is fine.
1228
1229 There are two restrictions:
1230
1231 <itemize>
1232 <item> You can only define a rank 2 type, specified by the following
1233 grammar:
1234
1235 <tscreen><verb>
1236    rank2type ::= [forall tyvars .] [context =>] funty
1237    funty     ::= ([forall tyvars .] [context =>] ty) -> funty
1238                | ty
1239    ty        ::= ...current Haskell monotype syntax...
1240 </verb></tscreen>
1241
1242 Informally, the universal quantification must all be right at the beginning,
1243 or at the top level of a function argument.
1244
1245 <item> There is a restriction on the definition of a function whose
1246 type signature is a rank-2 type: the polymorphic arguments must be
1247 matched on the left hand side of the "<tt>=</tt>" sign.  You can't
1248 define <tt>mkTs</tt> like this:
1249
1250 <tscreen><verb>
1251   mkTs :: (forall b. b -> b -> b) -> a -> [T a]
1252   mkTs = \ f x y -> [T1 f x, T1 f y]
1253 </verb></tscreen>
1254
1255
1256 The same partial-application rule applies to ordinary functions with
1257 rank-2 types as applied to data constructors.
1258
1259 </itemize>
1260
1261 % -----------------------------------------------------------------------------
1262 <sect1>Existentially quantified data constructors
1263 <label id="existential-quantification">
1264 <p>
1265
1266 The idea of using existential quantification in data type declarations
1267 was suggested by Laufer (I believe, thought doubtless someone will
1268 correct me), and implemented in Hope+. It's been in Lennart
1269 Augustsson's <tt>hbc</tt> Haskell compiler for several years, and
1270 proved very useful.  Here's the idea.  Consider the declaration:
1271
1272 <tscreen><verb>
1273   data Foo = forall a. MkFoo a (a -> Bool)
1274            | Nil
1275 </verb></tscreen>
1276
1277 The data type <tt>Foo</tt> has two constructors with types:
1278
1279 <tscreen><verb>
1280   MkFoo :: forall a. a -> (a -> Bool) -> Foo
1281   Nil   :: Foo
1282 </verb></tscreen>
1283
1284 Notice that the type variable <tt>a</tt> in the type of <tt>MkFoo</tt>
1285 does not appear in the data type itself, which is plain <tt>Foo</tt>.
1286 For example, the following expression is fine:
1287
1288 <tscreen><verb>
1289   [MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]
1290 </verb></tscreen>
1291
1292 Here, <tt>(MkFoo 3 even)</tt> packages an integer with a function
1293 <tt>even</tt> that maps an integer to <tt>Bool</tt>; and <tt>MkFoo 'c'
1294 isUpper</tt> packages a character with a compatible function.  These
1295 two things are each of type <tt>Foo</tt> and can be put in a list.
1296
1297 What can we do with a value of type <tt>Foo</tt>?.  In particular,
1298 what happens when we pattern-match on <tt>MkFoo</tt>?
1299
1300 <tscreen><verb>
1301   f (MkFoo val fn) = ???
1302 </verb></tscreen>
1303
1304 Since all we know about <tt>val</tt> and <tt>fn</tt> is that they
1305 are compatible, the only (useful) thing we can do with them is to
1306 apply <tt>fn</tt> to <tt>val</tt> to get a boolean.  For example:
1307
1308 <tscreen><verb>
1309   f :: Foo -> Bool
1310   f (MkFoo val fn) = fn val
1311 </verb></tscreen>
1312
1313 What this allows us to do is to package heterogenous values
1314 together with a bunch of functions that manipulate them, and then treat
1315 that collection of packages in a uniform manner.  You can express
1316 quite a bit of object-oriented-like programming this way.
1317
1318 <sect2>Why existential?
1319 <label id="existential">
1320 <p>
1321
1322 What has this to do with <em>existential</em> quantification?
1323 Simply that <tt>MkFoo</tt> has the (nearly) isomorphic type
1324
1325 <tscreen><verb>
1326   MkFoo :: (exists a . (a, a -> Bool)) -> Foo
1327 </verb></tscreen>
1328
1329 But Haskell programmers can safely think of the ordinary
1330 <em>universally</em> quantified type given above, thereby avoiding
1331 adding a new existential quantification construct.
1332
1333 <sect2>Type classes
1334 <p>
1335
1336 An easy extension (implemented in <tt>hbc</tt>) is to allow
1337 arbitrary contexts before the constructor.  For example:
1338
1339 <tscreen><verb>
1340   data Baz = forall a. Eq a => Baz1 a a
1341            | forall b. Show b => Baz2 b (b -> b)
1342 </verb></tscreen>
1343
1344 The two constructors have the types you'd expect:
1345
1346 <tscreen><verb>
1347   Baz1 :: forall a. Eq a => a -> a -> Baz
1348   Baz2 :: forall b. Show b => b -> (b -> b) -> Baz
1349 </verb></tscreen>
1350
1351 But when pattern matching on <tt>Baz1</tt> the matched values can be compared
1352 for equality, and when pattern matching on <tt>Baz2</tt> the first matched
1353 value can be converted to a string (as well as applying the function to it).
1354 So this program is legal:
1355
1356 <tscreen><verb>
1357   f :: Baz -> String
1358   f (Baz1 p q) | p == q    = "Yes"
1359                | otherwise = "No"
1360   f (Baz1 v fn)            = show (fn v)
1361 </verb></tscreen>
1362
1363 Operationally, in a dictionary-passing implementation, the
1364 constructors <tt>Baz1</tt> and <tt>Baz2</tt> must store the
1365 dictionaries for <tt>Eq</tt> and <tt>Show</tt> respectively, and
1366 extract it on pattern matching.
1367
1368 Notice the way that the syntax fits smoothly with that used for
1369 universal quantification earlier.
1370
1371 <sect2>Restrictions
1372 <p>
1373
1374 There are several restrictions on the ways in which existentially-quantified
1375 constructors can be use.
1376
1377 <itemize>
1378
1379 <item> When pattern matching, each pattern match introduces a new,
1380 distinct, type for each existential type variable.  These types cannot
1381 be unified with any other type, nor can they escape from the scope of
1382 the pattern match.  For example, these fragments are incorrect:
1383
1384 <tscreen><verb>
1385   f1 (MkFoo a f) = a
1386 </verb></tscreen>
1387
1388 Here, the type bound by <tt>MkFoo</tt> "escapes", because <tt>a</tt>
1389 is the result of <tt>f1</tt>.  One way to see why this is wrong is to
1390 ask what type <tt>f1</tt> has:
1391
1392 <tscreen><verb>
1393   f1 :: Foo -> a             -- Weird!
1394 </verb></tscreen>
1395
1396 What is this "<tt>a</tt>" in the result type? Clearly we don't mean
1397 this:
1398
1399 <tscreen><verb>
1400   f1 :: forall a. Foo -> a   -- Wrong!
1401 </verb></tscreen>
1402
1403 The original program is just plain wrong.  Here's another sort of error
1404
1405 <tscreen><verb>
1406   f2 (Baz1 a b) (Baz1 p q) = a==q
1407 </verb></tscreen>
1408
1409 It's ok to say <tt>a==b</tt> or <tt>p==q</tt>, but
1410 <tt>a==q</tt> is wrong because it equates the two distinct types arising
1411 from the two <tt>Baz1</tt> constructors.
1412
1413
1414 <item>You can't pattern-match on an existentially quantified
1415 constructor in a <tt>let</tt> or <tt>where</tt> group of
1416 bindings. So this is illegal:
1417
1418 <tscreen><verb>
1419   f3 x = a==b where { Baz1 a b = x }
1420 </verb></tscreen>
1421
1422 You can only pattern-match
1423 on an existentially-quantified constructor in a <tt>case</tt> expression or
1424 in the patterns of a function definition.
1425
1426 The reason for this restriction is really an implementation one.
1427 Type-checking binding groups is already a nightmare without
1428 existentials complicating the picture.  Also an existential pattern
1429 binding at the top level of a module doesn't make sense, because it's
1430 not clear how to prevent the existentially-quantified type "escaping".
1431 So for now, there's a simple-to-state restriction.  We'll see how
1432 annoying it is.
1433
1434 <item>You can't use existential quantification for <tt>newtype</tt>
1435 declarations.  So this is illegal:
1436
1437 <tscreen><verb>
1438   newtype T = forall a. Ord a => MkT a
1439 </verb></tscreen>
1440
1441 Reason: a value of type <tt>T</tt> must be represented as a pair
1442 of a dictionary for <tt>Ord t</tt> and a value of type <tt>t</tt>.
1443 That contradicts the idea that <tt>newtype</tt> should have no
1444 concrete representation.  You can get just the same efficiency and effect
1445 by using <tt>data</tt> instead of <tt>newtype</tt>.  If there is no
1446 overloading involved, then there is more of a case for allowing
1447 an existentially-quantified <tt>newtype</tt>, because the <tt>data</tt>
1448 because the <tt>data</tt> version does carry an implementation cost,
1449 but single-field existentially quantified constructors aren't much
1450 use.  So the simple restriction (no existential stuff on <tt>newtype</tt>)
1451 stands, unless there are convincing reasons to change it.
1452 </itemize>
1453
1454 % -----------------------------------------------------------------------------
1455 <sect1>Scoped Type Variables
1456 <label id="scoped-type-variables">
1457 <p>
1458
1459 A <em/pattern type signature/ can introduce a <em/scoped type
1460 variable/.  For example
1461
1462 <tscreen><verb>
1463 f (xs::[a]) = ys ++ ys
1464            where
1465               ys :: [a]
1466               ys = reverse xs
1467 </verb></tscreen>
1468
1469 The pattern @(xs::[a])@ includes a type signature for @xs@.
1470 This brings the type variable @a@ into scope; it scopes over
1471 all the patterns and right hand sides for this equation for @f@.
1472 In particular, it is in scope at the type signature for @y@.
1473
1474 At ordinary type signatures, such as that for @ys@, any type variables
1475 mentioned in the type signature <em/that are not in scope/ are
1476 implicitly universally quantified.  (If there are no type variables in
1477 scope, all type variables mentioned in the signature are universally
1478 quantified, which is just as in Haskell 98.)  In this case, since @a@
1479 is in scope, it is not universally quantified, so the type of @ys@ is
1480 the same as that of @xs@.  In Haskell 98 it is not possible to declare
1481 a type for @ys@; a major benefit of scoped type variables is that
1482 it becomes possible to do so.
1483
1484 Scoped type variables are implemented in both GHC and Hugs.  Where the
1485 implementations differ from the specification below, those differences
1486 are noted.
1487
1488 So much for the basic idea.  Here are the details.
1489
1490 <sect2>Scope and implicit quantification
1491 <p>
1492
1493 <itemize>
1494 <item> All the type variables mentioned in the patterns for a single
1495 function definition equation, that are not already in scope,
1496 are brought into scope by the patterns.  We describe this set as
1497 the <em/type variables bound by the equation/.
1498
1499 <item> The type variables thus brought into scope may be mentioned
1500 in ordinary type signatures or pattern type signatures anywhere within
1501 their scope.
1502
1503 <item> In ordinary type signatures, any type variable mentioned in the
1504 signature that is in scope is <em/not/ universally quantified.
1505
1506 <item> Ordinary type signatures do not bring any new type variables
1507 into scope (except in the type signature itself!). So this is illegal:
1508
1509 <tscreen><verb>
1510   f :: a -> a
1511   f x = x::a
1512 </verb></tscreen>
1513
1514 It's illegal because @a@ is not in scope in the body of @f@,
1515 so the ordinary signature @x::a@ is equivalent to @x::forall a.a@;
1516 and that is an incorrect typing.
1517
1518 <item> There is no implicit universal quantification on pattern type
1519 signatures, nor may one write an explicit @forall@ type in a pattern
1520 type signature.  The pattern type signature is a monotype.
1521
1522 <item>
1523 The type variables in the head of a @class@ or @instance@ declaration
1524 scope over the methods defined in the @where@ part.  For example:
1525
1526 <tscreen><verb>
1527   class C a where
1528     op :: [a] -> a
1529
1530     op xs = let ys::[a]
1531                 ys = reverse xs
1532             in
1533             head ys
1534 </verb></tscreen>
1535
1536 (Not implemented in Hugs yet, Dec 98).
1537 </itemize>
1538
1539 <sect2>Polymorphism
1540 <p>
1541
1542 <itemize>
1543 <item> Pattern type signatures are completely orthogonal to ordinary, separate
1544 type signatures.  The two can be used independently or together.  There is
1545 no scoping associated with the names of the type variables in a separate type signature.
1546
1547 <tscreen><verb>
1548    f :: [a] -> [a]
1549    f (xs::[b]) = reverse xs
1550 </verb></tscreen>
1551
1552 <item> The function must be polymorphic in the type variables
1553 bound by all its equations.  Operationally, the type variables bound
1554 by one equation must not:
1555
1556 <itemize>
1557 <item> Be unified with a type (such as @Int@, or @[a]@).
1558 <item> Be unified with a type variable free in the environment.
1559 <item> Be unified with each other.  (They may unify with the type variables
1560 bound by another equation for the same function, of course.)
1561 </itemize>
1562
1563 For example, the following all fail to type check:
1564
1565 <tscreen><verb>
1566   f (x::a) (y::b) = [x,y]       -- a unifies with b
1567
1568   g (x::a) = x + 1::Int         -- a unifies with Int
1569
1570   h x = let k (y::a) = [x,y]    -- a is free in the
1571         in k x                  -- environment
1572
1573   k (x::a) True    = ...        -- a unifies with Int
1574   k (x::Int) False = ...
1575
1576   w :: [b] -> [b]
1577   w (x::a) = x                  -- a unifies with [b]
1578 </verb></tscreen>
1579
1580 <item> The pattern-bound type variable may, however, be constrained
1581 by the context of the principal type, thus:
1582
1583 <tscreen><verb>
1584   f (x::a) (y::a) = x+y*2
1585 </verb></tscreen>
1586
1587 gets the inferred type: @forall a. Num a => a -> a -> a@.
1588 </itemize>
1589
1590 <sect2>Result type signatures
1591 <p>
1592
1593 <itemize>
1594 <item> The result type of a function can be given a signature,
1595 thus:
1596
1597 <tscreen><verb>
1598   f (x::a) :: [a] = [x,x,x]
1599 </verb></tscreen>
1600
1601 The final @":: [a]"@ after all the patterns gives a signature to the
1602 result type.  Sometimes this is the only way of naming the type variable
1603 you want:
1604
1605 <tscreen><verb>
1606   f :: Int -> [a] -> [a]
1607   f n :: ([a] -> [a]) = let g (x::a, y::a) = (y,x)
1608                         in \xs -> map g (reverse xs `zip` xs)
1609 </verb></tscreen>
1610
1611 </itemize>
1612
1613 Result type signatures are not yet implemented in Hugs.
1614
1615 <sect2>Pattern signatures on other constructs
1616 <p>
1617
1618 <itemize>
1619 <item> A pattern type signature can be on an arbitrary sub-pattern, not
1620 just on a variable:
1621
1622 <tscreen><verb>
1623   f ((x,y)::(a,b)) = (y,x) :: (b,a)
1624 </verb></tscreen>
1625
1626 <item> Pattern type signatures, including the result part, can be used
1627 in lambda abstractions:
1628
1629 <tscreen><verb>
1630   (\ (x::a, y) :: a -> x)
1631 </verb></tscreen>
1632
1633 Type variables bound by these patterns must be polymorphic in
1634 the sense defined above.
1635 For example:
1636
1637 <tscreen><verb>
1638   f1 (x::c) = f1 x      -- ok
1639   f2 = \(x::c) -> f2 x  -- not ok
1640 </verb></tscreen>
1641
1642 Here, @f1@ is OK, but @f2@ is not, because @c@ gets unified
1643 with a type variable free in the environment, in this
1644 case, the type of @f2@, which is in the environment when
1645 the lambda abstraction is checked.
1646
1647 <item> Pattern type signatures, including the result part, can be used
1648 in @case@ expressions:
1649
1650 <tscreen><verb>
1651   case e of { (x::a, y) :: a -> x }
1652 </verb></tscreen>
1653
1654 The pattern-bound type variables must, as usual,
1655 be polymorphic in the following sense: each case alternative,
1656 considered as a lambda abstraction, must be polymorphic.
1657 Thus this is OK:
1658
1659 <tscreen><verb>
1660   case (True,False) of { (x::a, y) -> x }
1661 </verb></tscreen>
1662
1663 Even though the context is that of a pair of booleans,
1664 the alternative itself is polymorphic.  Of course, it is
1665 also OK to say:
1666
1667 <tscreen><verb>
1668   case (True,False) of { (x::Bool, y) -> x }
1669 </verb></tscreen>
1670
1671 <item>
1672 To avoid ambiguity, the type after the ``@::@'' in a result
1673 pattern signature on a lambda or @case@ must be atomic (i.e. a single
1674 token or a parenthesised type of some sort).  To see why,
1675 consider how one would parse this:
1676
1677 <tscreen><verb>
1678   \ x :: a -> b -> x
1679 </verb></tscreen>
1680
1681 <item> Pattern type signatures that bind new type variables
1682 may not be used in pattern bindings at all.
1683 So this is illegal:
1684
1685 <tscreen><verb>
1686   f x = let (y, z::a) = x in ...
1687 </verb></tscreen>
1688
1689 But these are OK, because they do not bind fresh type variables:
1690
1691 <tscreen><verb>
1692   f1 x            = let (y, z::Int) = x in ...
1693   f2 (x::(Int,a)) = let (y, z::a)   = x in ...
1694 </verb></tscreen>
1695
1696 However a single variable is considered a degenerate function binding,
1697 rather than a degerate pattern binding, so this is permitted, even
1698 though it binds a type variable:
1699
1700 <tscreen><verb>
1701   f :: (b->b) = \(x::b) -> x
1702 </verb></tscreen>
1703
1704 </itemize>
1705 Such degnerate function bindings do not fall under the monomorphism
1706 restriction.  Thus:
1707
1708 <tscreen><verb>
1709   g :: a -> a -> Bool = \x y. x==y
1710 </verb></tscreen>
1711
1712 Here @g@ has type @forall a. Eq a => a -> a -> Bool@, just as if
1713 @g@ had a separate type signature.  Lacking a type signature, @g@
1714 would get a monomorphic type.
1715
1716 <sect2>Existentials
1717 <p>
1718
1719 <itemize>
1720 <item> Pattern type signatures can bind existential type variables.
1721 For example:
1722
1723 <tscreen><verb>
1724   data T = forall a. MkT [a]
1725
1726   f :: T -> T
1727   f (MkT [t::a]) = MkT t3
1728                  where
1729                    t3::[a] = [t,t,t]
1730 </verb></tscreen>
1731
1732 </itemize>