2 % $Id: glasgow_exts.vsgml,v 1.3 1998/12/02 13:20:38 simonm Exp $
4 % GHC Language Extensions.
7 As with all known Haskell systems, GHC implements some extensions to
8 the language. To use them, you'll need to give a @-fglasgow-exts@%
9 <nidx>-fglasgow-exts option</nidx> option.
11 Virtually all of the Glasgow extensions serve to give you access to
12 the underlying facilities with which we implement Haskell. Thus, you
13 can get at the Raw Iron, if you are willing to write some non-standard
14 code at a more primitive level. You need not be ``stuck'' on
15 performance because of the implementation costs of Haskell's
16 ``high-level'' features---you can always code ``under'' them. In an
17 extreme case, you can write all your time-critical code in C, and then
18 just glue it together with Haskell!
20 Executive summary of our extensions:
24 <tag>Unboxed types and primitive operations:</tag>
26 You can get right down to the raw machine types and operations;
27 included in this are ``primitive arrays'' (direct access to Big Wads
28 of Bytes). Please see Section <ref name="Unboxed types"
29 id="glasgow-unboxed"> and following.
31 <tag>Multi-parameter type classes:</tag>
33 GHC's type system supports extended type classes with multiple
34 parameters. Please see Section <ref name="Mult-parameter type
35 classes" id="multi-param-type-classes">.
37 <tag>Local universal quantification:</tag>
39 GHC's type system supports explicit unversal quantification in
40 constructor fields and function arguments. This is useful for things
41 like defining @runST@ from the state-thread world. See Section <ref
42 name="Local universal quantification" id="universal-quantification">.
44 <tag>Extistentially quantification in data types:</tag>
46 Some or all of the type variables in a datatype declaration may be
47 <em>existentially quantified</em>. More details in Section <ref
48 name="Existential Quantification" id="existential-quantification">.
50 <tag>Calling out to C:</tag>
52 Just what it sounds like. We provide <em>lots</em> of rope that you
53 can dangle around your neck. Please see Section <ref name="Calling~C
54 directly from Haskell" id="glasgow-ccalls">.
58 Before you get too carried away working at the lowest level (e.g.,
59 sloshing @MutableByteArray#@s around your program), you may wish to
60 check if there are system libraries that provide a ``Haskellised
61 veneer'' over the features you want. See Section <ref name="GHC
62 Prelude and libraries" id="ghc-prelude">.
64 %************************************************************************
67 <label id="glasgow-unboxed">
69 <nidx>Unboxed types (Glasgow extension)</nidx>
71 %************************************************************************
73 These types correspond to the ``raw machine'' types you would use in
74 C: @Int#@ (long int), @Double#@ (double), @Addr#@ (void *), etc. The
75 <em>primitive operations</em> (PrimOps) on these types are what you
76 might expect; e.g., @(+#)@ is addition on @Int#@s, and is the
77 machine-addition that we all know and love---usually one instruction.
79 There are some restrictions on the use of unboxed types, the main one
80 being that you can't pass an unboxed value to a polymorphic function
81 or store one in a polymorphic data type. This rules out things like
82 @[Int#]@ (ie. lists of unboxed integers). The reason for this
83 restriction is that polymorphic arguments and constructor fields are
84 assumed to be pointers: if an unboxed integer is stored in one of
85 these, the garbage collector would attempt to follow it, leading to
86 unpredictable space leaks. Or a @seq@ operation on the polymorphic
87 component may attempt to dereference the pointer, with disastrous
88 results. Even worse, the unboxed value might be larger than a pointer
89 (@Double#@ for instance).
91 Nevertheless, A numerically-intensive program using unboxed types can
92 go a <em>lot</em> faster than its ``standard'' counterpart---we saw a
93 threefold speedup on one example.
95 Please see Section <ref name="The module PrelGHC: really primitive
96 stuff" id="ghc-libs-ghc"> for the details of unboxed types and the
99 %************************************************************************
101 <sect1>Primitive state-transformer monad
102 <label id="glasgow-ST-monad">
104 <nidx>state transformers (Glasgow extensions)</nidx>
105 <nidx>ST monad (Glasgow extension)</nidx>
107 %************************************************************************
109 This monad underlies our implementation of arrays, mutable and
110 immutable, and our implementation of I/O, including ``C calls''.
112 The @ST@ library, which provides access to the @ST@ monad, is a
113 GHC/Hugs extension library and is described in the separate <htmlurl
114 name="GHC/Hugs Extension Libraries" url="libs.html"> document.
116 %************************************************************************
118 <sect1>Primitive arrays, mutable and otherwise
119 <label id="glasgow-prim-arrays">
121 <nidx>primitive arrays (Glasgow extension)</nidx>
122 <nidx>arrays, primitive (Glasgow extension)</nidx>
124 %************************************************************************
126 GHC knows about quite a few flavours of Large Swathes of Bytes.
128 First, GHC distinguishes between primitive arrays of (boxed) Haskell
129 objects (type @Array# obj@) and primitive arrays of bytes (type
132 Second, it distinguishes between...
134 <tag>Immutable:</tag>
135 Arrays that do not change (as with ``standard'' Haskell arrays); you
136 can only read from them. Obviously, they do not need the care and
137 attention of the state-transformer monad.
140 Arrays that may be changed or ``mutated.'' All the operations on them
141 live within the state-transformer monad and the updates happen
144 <tag>``Static'' (in C land):</tag>
145 A C routine may pass an @Addr#@ pointer back into Haskell land. There
146 are then primitive operations with which you may merrily grab values
147 over in C land, by indexing off the ``static'' pointer.
149 <tag>``Stable'' pointers:</tag>
150 If, for some reason, you wish to hand a Haskell pointer (i.e.,
151 <em>not</em> an unboxed value) to a C routine, you first make the
152 pointer ``stable,'' so that the garbage collector won't forget that it
153 exists. That is, GHC provides a safe way to pass Haskell pointers to
156 Please see Section <ref name="Subverting automatic unboxing with
157 ``stable pointers''" id="glasgow-stablePtrs"> for more details.
159 <tag>``Foreign objects'':</tag>
160 A ``foreign object'' is a safe way to pass an external object (a
161 C-allocated pointer, say) to Haskell and have Haskell do the Right
162 Thing when it no longer references the object. So, for example, C
163 could pass a large bitmap over to Haskell and say ``please free this
164 memory when you're done with it.''
166 Please see Section <ref name="Pointing outside the Haskell heap"
167 id="glasgow-foreignObjs"> for more details.
171 The libraries section gives more details on all these ``primitive
172 array'' types and the operations on them, Section <ref name="The GHC
173 Prelude and Libraries" id="ghc-prelude">. Some of these extensions
174 are also supported by Hugs, and the supporting libraries are described
175 in the <htmlurl name="GHC/Hugs Extension Libraries" url="libs.html">
178 %************************************************************************
180 <sect1>Calling~C directly from Haskell
181 <label id="glasgow-ccalls">
183 <nidx>C calls (Glasgow extension)</nidx>
184 <nidx>_ccall_ (Glasgow extension)</nidx>
185 <nidx>_casm_ (Glasgow extension)</nidx>
187 %************************************************************************
189 GOOD ADVICE: Because this stuff is not Entirely Stable as far as names
190 and things go, you would be well-advised to keep your C-callery
191 corraled in a few modules, rather than sprinkled all over your code.
192 It will then be quite easy to update later on.
194 %************************************************************************
196 <sect2>@_ccall_@ and @_casm_@: an introduction
197 <label id="ccall-intro">
200 %************************************************************************
202 The simplest way to use a simple C function
205 double fooC( FILE *in, char c, int i, double d, unsigned int u )
208 is to provide a Haskell wrapper:
211 fooH :: Char -> Int -> Double -> Word -> IO Double
212 fooH c i d w = _ccall_ fooC (``stdin''::Addr) c i d w
215 The function @fooH@ will unbox all of its arguments, call the C
216 function @fooC@ and box the corresponding arguments.
218 One of the annoyances about @_ccall_@s is when the C types don't quite
219 match the Haskell compiler's ideas. For this, the @_casm_@ variant
220 may be just the ticket (NB: <em>no chance</em> of such code going
221 through a native-code generator):
225 = _casm_ ``%r = getenv((char *) %0);'' name >>= \ litstring@(A# str#) ->
227 if (litstring == ``NULL'') then
228 Left ("Fail:oldGetEnv:"++name)
230 Right (unpackCString# str#)
234 The first literal-literal argument to a @_casm_@ is like a @printf@
235 format: @%r@ is replaced with the ``result,'' @%0@--@%n-1@ are
236 replaced with the 1st--nth arguments. As you can see above, it is an
237 easy way to do simple C~casting. Everything said about @_ccall_@ goes
238 for @_casm_@ as well.
240 The use of @_casm_@ in your code does pose a problem to the compiler
241 when it comes to generating an interface file for a freshly compiled
242 module. Included in an interface file is the unfolding (if any) of a
243 declaration. However, if a declaration's unfolding happens to contain
244 a @_casm_@, its unfolding will <em/not/ be emitted into the interface
245 file even if it qualifies by all the other criteria. The reason why
246 the compiler prevents this from happening is that unfolding @_casm_@s
247 into an interface file unduly constrains how code that import your
248 module have to be compiled. If an imported declaration is unfolded and
249 it contains a @_casm_@, you now have to be using a compiler backend
250 capable of dealing with it (i.e., the C compiler backend). If you are
251 using the C compiler backend, the unfolded @_casm_@ may still cause you
252 problems since the C code snippet it contains may mention CPP symbols
253 that were in scope when compiling the original module are not when
254 compiling the importing module.
256 If you're willing to put up with the drawbacks of doing cross-module
257 inlining of C code (GHC - A Better C Compiler :-), the option
258 @-funfold-casms-in-hi-file@ will turn off the default behaviour.
259 <nidx>-funfold-casms-in-hi-file option</nidx>
261 %************************************************************************
263 <sect2>Using function headers
264 <label id="glasgow-foreign-headers">
266 <nidx>C calls, function headers</nidx>
268 %************************************************************************
270 When generating C (using the @-fvia-C@ directive), one can assist the
271 C compiler in detecting type errors by using the @-#include@ directive
272 to provide @.h@ files containing function headers.
277 typedef unsigned long *StgForeignObj;
280 void initialiseEFS (StgInt size);
281 StgInt terminateEFS (void);
282 StgForeignObj emptyEFS(void);
283 StgForeignObj updateEFS (StgForeignObj a, StgInt i, StgInt x);
284 StgInt lookupEFS (StgForeignObj a, StgInt i);
287 You can find appropriate definitions for @StgInt@, @StgForeignObj@,
288 etc using @gcc@ on your architecture by consulting
289 @ghc/includes/StgTypes.h@. The following table summarises the
290 relationship between Haskell types and C types.
293 <bf>C type name</bf> | <bf>Haskell Type</bf> @@
295 @StgChar@ | @Char#@ @@
297 @StgWord@ | @Word#@ @@
298 @StgAddr@ | @Addr#@ @@
299 @StgFloat@ | @Float#@ @@
300 @StgDouble@ | @Double#@ @@
302 @StgArray@ | @Array#@ @@
303 @StgByteArray@ | @ByteArray#@ @@
304 @StgArray@ | @MutableArray#@ @@
305 @StgByteArray@ | @MutableByteArray#@ @@
307 @StgStablePtr@ | @StablePtr#@ @@
308 @StgForeignObj@ | @ForeignObj#@
311 Note that this approach is only <em>essential</em> for returning
312 @float@s (or if @sizeof(int) != sizeof(int *)@ on your
313 architecture) but is a Good Thing for anyone who cares about writing
314 solid code. You're crazy not to do it.
316 %************************************************************************
318 <sect2>Subverting automatic unboxing with ``stable pointers''
319 <label id="glasgow-stablePtrs">
321 <nidx>stable pointers (Glasgow extension)</nidx>
323 %************************************************************************
325 The arguments of a @_ccall_@ are automatically unboxed before the
326 call. There are two reasons why this is usually the Right Thing to
331 C is a strict language: it would be excessively tedious to pass
332 unevaluated arguments and require the C programmer to force their
333 evaluation before using them.
335 <item> Boxed values are stored on the Haskell heap and may be moved
336 within the heap if a garbage collection occurs---that is, pointers
337 to boxed objects are not <em>stable</em>.
340 It is possible to subvert the unboxing process by creating a ``stable
341 pointer'' to a value and passing the stable pointer instead. For
342 example, to pass/return an integer lazily to C functions @storeC@ and
343 @fetchC@, one might write:
346 storeH :: Int -> IO ()
347 storeH x = makeStablePtr x >>= \ stable_x ->
348 _ccall_ storeC stable_x
351 fetchH x = _ccall_ fetchC >>= \ stable_x ->
352 deRefStablePtr stable_x >>= \ x ->
353 freeStablePtr stable_x >>
357 The garbage collector will refrain from throwing a stable pointer away
358 until you explicitly call one of the following from C or Haskell.
361 void freeStablePointer( StgStablePtr stablePtrToToss )
362 freeStablePtr :: StablePtr a -> IO ()
365 As with the use of @free@ in C programs, GREAT CARE SHOULD BE
366 EXERCISED to ensure these functions are called at the right time: too
367 early and you get dangling references (and, if you're lucky, an error
368 message from the runtime system); too late and you get space leaks.
370 And to force evaluation of the argument within @fooC@, one would
371 call one of the following C functions (according to type of argument).
374 void performIO ( StgStablePtr stableIndex /* StablePtr s (IO ()) */ );
375 StgInt enterInt ( StgStablePtr stableIndex /* StablePtr s Int */ );
376 StgFloat enterFloat ( StgStablePtr stableIndex /* StablePtr s Float */ );
379 <nidx>performIO</nidx>
380 <nidx>enterInt</nidx>
381 <nidx>enterFloat</nidx>
383 Note Bene: @_ccall_GC_@<nidx>_ccall_GC_</nidx> must be used if any of
384 these functions are used.
386 %************************************************************************
388 <sect2>Foreign objects: pointing outside the Haskell heap
389 <label id="glasgow-foreignObjs">
391 <nidx>foreign objects (Glasgow extension)</nidx>
393 %************************************************************************
395 There are two types that @ghc@ programs can use to reference
396 (heap-allocated) objects outside the Haskell world: @Addr@ and
399 If you use @Addr@, it is up to you to the programmer to arrange
400 allocation and deallocation of the objects.
402 If you use @ForeignObj@, @ghc@'s garbage collector will call upon the
403 user-supplied <em>finaliser</em> function to free the object when the
404 Haskell world no longer can access the object. (An object is
405 associated with a finaliser function when the abstract
406 Haskell type @ForeignObj@ is created). The finaliser function is
407 expressed in C, and is passed as argument the object:
410 void foreignFinaliser ( StgForeignObj fo )
413 when the Haskell world can no longer access the object. Since
414 @ForeignObj@s only get released when a garbage collection occurs, we
415 provide ways of triggering a garbage collection from within C and from
419 void GarbageCollect()
423 More information on the programmers' interface to @ForeignObj@ can be
424 found in Section <ref name="Foreign objects" id="sec:foreign-obj">.
426 %************************************************************************
428 <sect2>Avoiding monads
429 <label id="glasgow-avoiding-monads">
431 <nidx>C calls to `pure C'</nidx>
432 <nidx>unsafePerformIO</nidx>
434 %************************************************************************
436 The @_ccall_@ construct is part of the @IO@ monad because 9 out of 10
437 uses will be to call imperative functions with side effects such as
438 @printf@. Use of the monad ensures that these operations happen in a
439 predictable order in spite of laziness and compiler optimisations.
441 To avoid having to be in the monad to call a C function, it is
442 possible to use @unsafePerformIO@, which is available from the
443 @IOExts@ module. There are three situations where one might like to
444 call a C function from outside the IO world:
448 Calling a function with no side-effects:
450 atan2d :: Double -> Double -> Double
451 atan2d y x = unsafePerformIO (_ccall_ atan2d y x)
453 sincosd :: Double -> (Double, Double)
454 sincosd x = unsafePerformIO $ do
455 da <- newDoubleArray (0, 1)
456 _casm_ ``sincosd( %0, &((double *)%1[0]), &((double *)%1[1]) );'' x da
457 s <- readDoubleArray da 0
458 c <- readDoubleArray da 1
462 <item> Calling a set of functions which have side-effects but which can
463 be used in a purely functional manner.
465 For example, an imperative implementation of a purely functional
466 lookup-table might be accessed using the following functions.
470 update :: EFS x -> Int -> x -> EFS x
471 lookup :: EFS a -> Int -> a
473 empty = unsafePerformIO (_ccall_ emptyEFS)
475 update a i x = unsafePerformIO $
476 makeStablePtr x >>= \ stable_x ->
477 _ccall_ updateEFS a i stable_x
479 lookup a i = unsafePerformIO $
480 _ccall_ lookupEFS a i >>= \ stable_x ->
481 deRefStablePtr stable_x
484 You will almost always want to use @ForeignObj@s with this.
486 <item> Calling a side-effecting function even though the results will
487 be unpredictable. For example the @trace@ function is defined by:
490 trace :: String -> a -> a
493 ((_ccall_ PreTraceHook sTDERR{-msg-}):: IO ()) >>
494 fputs sTDERR string >>
495 ((_ccall_ PostTraceHook sTDERR{-msg-}):: IO ()) >>
498 sTDERR = (``stderr'' :: Addr)
501 (This kind of use is not highly recommended --- it is only really
502 useful in debugging code.)
505 %************************************************************************
507 <sect2>C-calling ``gotchas'' checklist
508 <label id="ccall-gotchas">
510 <nidx>C call dangers</nidx>
512 %************************************************************************
514 And some advice, too.
517 <item> For modules that use @_ccall_@s, etc., compile with
518 @-fvia-C@.<nidx>-fvia-C option</nidx> You don't have to, but you should.
520 Also, use the @-#include "prototypes.h"@ flag (hack) to inform the C
521 compiler of the fully-prototyped types of all the C functions you
522 call. (Section <ref name="Using function headers"
523 id="glasgow-foreign-headers"> says more about this...)
525 This scheme is the <em>only</em> way that you will get <em>any</em>
526 typechecking of your @_ccall_@s. (It shouldn't be that way, but...).
527 GHC will pass the flag @-Wimplicit@ to gcc so that you'll get warnings
528 if any @_ccall_@ed functions have no prototypes.
531 Try to avoid @_ccall_@s to C~functions that take @float@
532 arguments or return @float@ results. Reason: if you do, you will
533 become entangled in (ANSI?) C's rules for when arguments/results are
534 promoted to @doubles@. It's a nightmare and just not worth it.
535 Use @doubles@ if possible.
537 If you do use @floats@, check and re-check that the right thing is
538 happening. Perhaps compile with @-keep-hc-file-too@ and look at
539 the intermediate C (@.hc@ file).
541 <item> The compiler uses two non-standard type-classes when
542 type-checking the arguments and results of @_ccall_@: the arguments
543 (respectively result) of @_ccall_@ must be instances of the class
544 @CCallable@ (respectively @CReturnable@). Both classes may be
545 imported from the module @CCall@, but this should only be
546 necessary if you want to define a new instance. (Neither class
547 defines any methods --- their only function is to keep the
550 The type checker must be able to figure out just which of the
551 C-callable/returnable types is being used. If it can't, you have to
552 add type signatures. For example,
558 is not good enough, because the compiler can't work out what type @x@
559 is, nor what type the @_ccall_@ returns. You have to write, say:
562 f :: Int -> IO Double
566 This table summarises the standard instances of these classes.
568 % ToDo: check this table against implementation!
571 <bf>Type</bf> |<bf>CCallable</bf>|<bf>CReturnable</bf> | <bf>Which is probably...</bf> @@
573 @Char@ | Yes | Yes | @unsigned char@ @@
574 @Int@ | Yes | Yes | @long int@ @@
575 @Word@ | Yes | Yes | @unsigned long int@ @@
576 @Addr@ | Yes | Yes | @void *@ @@
577 @Float@ | Yes | Yes | @float@ @@
578 @Double@ | Yes | Yes | @double@ @@
579 @()@ | No | Yes | @void@ @@
580 @[Char]@ | Yes | No | @char *@ (null-terminated) @@
582 @Array@ | Yes | No | @unsigned long *@ @@
583 @ByteArray@ | Yes | No | @unsigned long *@ @@
584 @MutableArray@ | Yes | No | @unsigned long *@ @@
585 @MutableByteArray@ | Yes | No | @unsigned long *@ @@
587 @State@ | Yes | Yes | nothing!@@
589 @StablePtr@ | Yes | Yes | @unsigned long *@ @@
590 @ForeignObjs@ | Yes | Yes | see later @@
593 Actually, the @Word@ type is defined as being the same size as a
594 pointer on the target architecture, which is <em>probably</em>
597 The brave and careful programmer can add their own instances of these
598 classes for the following types:
602 A <em>boxed-primitive</em> type may be made an instance of both
603 @CCallable@ and @CReturnable@.
605 A boxed primitive type is any data type with a
606 single unary constructor with a single primitive argument. For
607 example, the following are all boxed primitive types:
612 data XDisplay = XDisplay Addr#
613 data EFS a = EFS# ForeignObj#
617 instance CCallable (EFS a)
618 instance CReturnable (EFS a)
621 <item> Any datatype with a single nullary constructor may be made an
622 instance of @CReturnable@. For example:
626 instance CReturnable MyVoid
629 <item> As at version 2.09, @String@ (i.e., @[Char]@) is still
630 not a @CReturnable@ type.
632 Also, the now-builtin type @PackedString@ is neither
633 @CCallable@ nor @CReturnable@. (But there are functions in
634 the PackedString interface to let you get at the necessary bits...)
637 <item> The code-generator will complain if you attempt to use @%r@ in
638 a @_casm_@ whose result type is @IO ()@; or if you don't use @%r@
639 <em>precisely</em> once for any other result type. These messages are
640 supposed to be helpful and catch bugs---please tell us if they wreck
643 <item> If you call out to C code which may trigger the Haskell garbage
644 collector or create new threads (examples of this later...), then you
645 must use the @_ccall_GC_@<nidx>_ccall_GC_ primitive</nidx> or
646 @_casm_GC_@<nidx>_casm_GC_ primitive</nidx> variant of C-calls. (This
647 does not work with the native code generator - use @\fvia-C@.) This
648 stuff is hairy with a capital H! </itemize>
650 <sect1> Multi-parameter type classes
651 <label id="multi-param-type-classes">
654 This section documents GHC's implementation of multi-paramter type
655 classes. There's lots of background in the paper <url name="Type
656 classes: exploring the design space"
657 url="http://www.dcs.gla.ac.uk/~simonpj/multi.ps.gz"> (Simon Peyton
658 Jones, Mark Jones, Erik Meijer).
660 I'd like to thank people who reported shorcomings in the GHC 3.02
661 implementation. Our default decisions were all conservative ones, and
662 the experience of these heroic pioneers has given useful concrete
663 examples to support several generalisations. (These appear below as
664 design choices not implemented in 3.02.)
666 I've discussed these notes with Mark Jones, and I believe that Hugs
667 will migrate towards the same design choices as I outline here.
668 Thanks to him, and to many others who have offered very useful
674 There are the following restrictions on the form of a qualified
678 forall tv1..tvn (c1, ...,cn) => type
681 (Here, I write the "foralls" explicitly, although the Haskell source
682 language omits them; in Haskell 1.4, all the free type variables of an
683 explicit source-language type signature are universally quantified,
684 except for the class type variables in a class declaration. However,
685 in GHC, you can give the foralls if you want. See Section <ref
686 name="Explicit universal quantification"
687 id="universal-quantification">).
691 <item> <bf>Each universally quantified type variable
692 @tvi@ must be mentioned (i.e. appear free) in @type@</bf>.
694 The reason for this is that a value with a type that does not obey
695 this restriction could not be used without introducing
696 ambiguity. Here, for example, is an illegal type:
699 forall a. Eq a => Int
702 When a value with this type was used, the constraint <tt>Eq tv</tt>
703 would be introduced where <tt>tv</tt> is a fresh type variable, and
704 (in the dictionary-translation implementation) the value would be
705 applied to a dictionary for <tt>Eq tv</tt>. The difficulty is that we
706 can never know which instance of <tt>Eq</tt> to use because we never
707 get any more information about <tt>tv</tt>.
709 <item> <bf>Every constraint @ci@ must mention at least one of the
710 universally quantified type variables @tvi@</bf>.
712 For example, this type is OK because <tt>C a b</tt> mentions the
713 universally quantified type variable <tt>b</tt>:
716 forall a. C a b => burble
719 The next type is illegal because the constraint <tt>Eq b</tt> does not
723 forall a. Eq b => burble
726 The reason for this restriction is milder than the other one. The
727 excluded types are never useful or necessary (because the offending
728 context doesn't need to be witnessed at this point; it can be floated
729 out). Furthermore, floating them out increases sharing. Lastly,
730 excluding them is a conservative choice; it leaves a patch of
731 territory free in case we need it later.
735 These restrictions apply to all types, whether declared in a type signature
738 Unlike Haskell 1.4, constraints in types do <bf>not</bf> have to be of
739 the form <em>(class type-variables)</em>. Thus, these type signatures
743 f :: Eq (m a) => [m a] -> [m a]
747 This choice recovers principal types, a property that Haskell 1.4 does not have.
749 <sect2>Class declarations
754 <item> <bf>Multi-parameter type classes are permitted</bf>. For example:
757 class Collection c a where
758 union :: c a -> c a -> c a
763 <item> <bf>The class hierarchy must be acyclic</bf>. However, the definition
764 of "acyclic" involves only the superclass relationships. For example,
769 op :: D b => a -> b -> b
772 class C a => D a where { ... }
775 Here, <tt>C</tt> is a superclass of <tt>D</tt>, but it's OK for a
776 class operation <tt>op</tt> of <tt>C</tt> to mention <tt>D</tt>. (It
777 would not be OK for <tt>D</tt> to be a superclass of <tt>C</tt>.)
779 <item> <bf>There are no restrictions on the context in a class declaration
780 (which introduces superclasses), except that the class hierarchy must
781 be acyclic</bf>. So these class declarations are OK:
784 class Functor (m k) => FiniteMap m k where
787 class (Monad m, Monad (t m)) => Transform t m where
788 lift :: m a -> (t m) a
791 <item> <bf>In the signature of a class operation, every constraint
792 must mention at least one type variable that is not a class type
798 class Collection c a where
799 mapC :: Collection c b => (a->b) -> c a -> c b
802 is OK because the constraint <tt>(Collection a b)</tt> mentions
803 <tt>b</tt>, even though it also mentions the class variable
804 <tt>a</tt>. On the other hand:
808 op :: Eq a => (a,b) -> (a,b)
811 is not OK because the constraint <tt>(Eq a)</tt> mentions on the class
812 type variable <tt>a</tt>, but not <tt>b</tt>. However, any such
813 example is easily fixed by moving the offending context up to the
817 class Eq a => C a where
821 A yet more relaxed rule would allow the context of a class-op signature
822 to mention only class type variables. However, that conflicts with
823 Rule 1(b) for types above.
825 <item> <bf>The type of each class operation must mention <em/all/ of
826 the class type variables</bf>. For example:
831 insert :: s -> a -> s
834 is not OK, because the type of <tt>empty</tt> doesn't mention
835 <tt>a</tt>. This rule is a consequence of Rule 1(a), above, for
836 types, and has the same motivation.
838 Sometimes, offending class declarations exhibit misunderstandings. For
839 example, <tt>Coll</tt> might be rewritten
844 insert :: s a -> a -> s a
847 which makes the connection between the type of a collection of
848 <tt>a</tt>'s (namely <tt>(s a)</tt>) and the element type <tt>a</tt>.
849 Occasionally this really doesn't work, in which case you can split the
856 class CollE s => Coll s a where
857 insert :: s -> a -> s
862 <sect2>Instance declarations
867 <item> <bf>Instance declarations may not overlap</bf>. The two instance
871 instance context1 => C type1 where ...
872 instance context2 => C type2 where ...
875 "overlap" if @type1@ and @type2@ unify
877 However, if you give the command line option
878 @-fallow-overlapping-instances@<nidx>-fallow-overlapping-instances
879 option</nidx> then two overlapping instance declarations are permitted
883 <item> EITHER @type1@ and @type2@ do not unify
884 <item> OR @type2@ is a substitution instance of @type1@
885 (but not identical to @type1@)
889 Notice that these rules
892 <item> make it clear which instance decl to use
893 (pick the most specific one that matches)
895 <item> do not mention the contexts @context1@, @context2@
896 Reason: you can pick which instance decl
897 "matches" based on the type.
900 Regrettably, GHC doesn't guarantee to detect overlapping instance
901 declarations if they appear in different modules. GHC can "see" the
902 instance declarations in the transitive closure of all the modules
903 imported by the one being compiled, so it can "see" all instance decls
904 when it is compiling <tt>Main</tt>. However, it currently chooses not
905 to look at ones that can't possibly be of use in the module currently
906 being compiled, in the interests of efficiency. (Perhaps we should
907 change that decision, at least for <tt>Main</tt>.)
909 <item> <bf>There are no restrictions on the type in an instance
910 <em/head/, except that at least one must not be a type variable</bf>.
911 The instance "head" is the bit after the "=>" in an instance decl. For
912 example, these are OK:
915 instance C Int a where ...
917 instance D (Int, Int) where ...
919 instance E [[a]] where ...
922 Note that instance heads <bf>may</bf> contain repeated type variables.
923 For example, this is OK:
926 instance Stateful (ST s) (MutVar s) where ...
929 The "at least one not a type variable" restriction is to ensure that
930 context reduction terminates: each reduction step removes one type
931 constructor. For example, the following would make the type checker
932 loop if it wasn't excluded:
935 instance C a => C a where ...
938 There are two situations in which the rule is a bit of a pain. First,
939 if one allows overlapping instance declarations then it's quite
940 convenient to have a "default instance" declaration that applies if
941 something more specific does not:
948 Second, sometimes you might want to use the following to get the
949 effect of a "class synonym":
952 class (C1 a, C2 a, C3 a) => C a where { }
954 instance (C1 a, C2 a, C3 a) => C a where { }
957 This allows you to write shorter signatures:
966 f :: (C1 a, C2 a, C3 a) => ...
969 I'm on the lookout for a simple rule that preserves decidability while
970 allowing these idioms. The experimental flag
971 @-fallow-undecidable-instances@<nidx>-fallow-undecidable-instances
972 option</nidx> lifts this restriction, allowing all the types in an
973 instance head to be type variables.
975 <item> <bf>Unlike Haskell 1.4, instance heads may use type
976 synonyms</bf>. As always, using a type synonym is just shorthand for
977 writing the RHS of the type synonym definition. For example:
980 type Point = (Int,Int)
981 instance C Point where ...
982 instance C [Point] where ...
985 is legal. However, if you added
988 instance C (Int,Int) where ...
991 as well, then the compiler will complain about the overlapping
992 (actually, identical) instance declarations. As always, type synonyms
993 must be fully applied. You cannot, for example, write:
997 instance Monad P where ...
1000 This design decision is independent of all the others, and easily
1001 reversed, but it makes sense to me.
1003 <item><bf>The types in an instance-declaration <em/context/ must all
1004 be type variables</bf>. Thus
1007 instance C a b => Eq (a,b) where ...
1013 instance C Int b => Foo b where ...
1016 is not OK. Again, the intent here is to make sure that context
1017 reduction terminates.
1019 Voluminous correspondence on the Haskell mailing list has convinced me
1020 that it's worth experimenting with a more liberal rule. If you use
1021 the flag <tt>-fallow-undecidable-instances</tt> you can use arbitrary
1022 types in an instance context. Termination is ensured by having a
1023 fixed-depth recursion stack. If you exceed the stack depth you get a
1024 sort of backtrace, and the opportunity to increase the stack depth
1025 with <tt>-fcontext-stack</tt><em/N/.
1029 % -----------------------------------------------------------------------------
1030 <sect1>Explicit universal quantification
1031 <label id="universal-quantification">
1034 GHC now allows you to write explicitly quantified types. GHC's
1035 syntax for this now agrees with Hugs's, namely:
1038 forall a b. (Ord a, Eq b) => a -> b -> a
1041 The context is, of course, optional. You can't use <tt>forall</tt> as
1042 a type variable any more!
1044 Haskell type signatures are implicitly quantified. The <tt>forall</tt>
1045 allows us to say exactly what this means. For example:
1054 g :: forall b. (b -> b)
1057 The two are treated identically.
1059 <sect2>Universally-quantified data type fields
1063 In a <tt>data</tt> or <tt>newtype</tt> declaration one can quantify
1064 the types of the constructor arguments. Here are several examples:
1067 data T a = T1 (forall b. b -> b -> b) a
1069 data MonadT m = MkMonad { return :: forall a. a -> m a,
1070 bind :: forall a b. m a -> (a -> m b) -> m b
1073 newtype Swizzle = MkSwizzle (Ord a => [a] -> [a])
1076 The constructors now have so-called <em/rank 2/ polymorphic
1077 types, in which there is a for-all in the argument types.:
1080 T1 :: forall a. (forall b. b -> b -> b) -> a -> T1 a
1081 MkMonad :: forall m. (forall a. a -> m a)
1082 -> (forall a b. m a -> (a -> m b) -> m b)
1084 MkSwizzle :: (Ord a => [a] -> [a]) -> Swizzle
1087 Notice that you don't need to use a <tt>forall</tt> if there's an
1088 explicit context. For example in the first argument of the
1089 constructor <tt>MkSwizzle</tt>, an implicit "<tt>forall a.</tt>" is
1090 prefixed to the argument type. The implicit <tt>forall</tt>
1091 quantifies all type variables that are not already in scope, and are
1092 mentioned in the type quantified over.
1094 As for type signatures, implicit quantification happens for non-overloaded
1095 types too. So if you write this:
1097 data T a = MkT (Either a b) (b -> b)
1099 it's just as if you had written this:
1101 data T a = MkT (forall b. Either a b) (forall b. b -> b)
1103 That is, since the type variable <tt>b</tt> isn't in scope, it's
1104 implicitly universally quantified. (Arguably, it would be better
1105 to <em>require</em> explicit quantification on constructor arguments
1106 where that is what is wanted. Feedback welcomed.)
1108 <sect2> Construction
1111 You construct values of types <tt>T1, MonadT, Swizzle</tt> by applying
1112 the constructor to suitable values, just as usual. For example,
1115 (T1 (\xy->x) 3) :: T Int
1117 (MkSwizzle sort) :: Swizzle
1118 (MkSwizzle reverse) :: Swizzle
1125 MkMonad r b) :: MonadT Maybe
1128 The type of the argument can, as usual, be more general than the type
1129 required, as <tt>(MkSwizzle reverse)</tt> shows. (<tt>reverse</tt>
1130 does not need the <tt>Ord</tt> constraint.)
1132 <sect2>Pattern matching
1135 When you use pattern matching, the bound variables may now have
1136 polymorphic types. For example:
1139 f :: T a -> a -> (a, Char)
1140 f (T1 f k) x = (f k x, f 'c' 'd')
1142 g :: (Ord a, Ord b) => Swizzle -> [a] -> (a -> b) -> [b]
1143 g (MkSwizzle s) xs f = s (map f (s xs))
1145 h :: MonadT m -> [m a] -> m [a]
1146 h m [] = return m []
1147 h m (x:xs) = bind m x $ \y ->
1148 bind m (h m xs) $ \ys ->
1152 In the function <tt>h</tt> we use the record selectors <tt>return</tt>
1153 and <tt>bind</tt> to extract the polymorphic bind and return functions
1154 from the <tt>MonadT</tt> data structure, rather than using pattern
1157 <sect2>The partial-application restriction
1160 There is really only one way in which data structures with polymorphic
1161 components might surprise you: you must not partially apply them.
1162 For example, this is illegal:
1165 map MkSwizzle [sort, reverse]
1168 The restriction is this: <em>every subexpression of the program must
1169 have a type that has no for-alls, except that in a function
1170 application (f e1 ... en) the partial applications are not subject to
1171 this rule</em>. The restriction makes type inference feasible.
1173 In the illegal example, the sub-expression <tt>MkSwizzle</tt> has the
1174 polymorphic type <tt>(Ord b => [b] -> [b]) -> Swizzle</tt> and is not
1175 a sub-expression of an enclosing application. On the other hand, this
1179 map (T1 (\a b -> a)) [1,2,3]
1182 even though it involves a partial application of <tt>T1</tt>, because
1183 the sub-expression <tt>T1 (\a b -> a)</tt> has type <tt>Int -> T
1186 <sect2>Type signatures
1190 Once you have data constructors with universally-quantified fields, or
1191 constants such as <tt>runST</tt> that have rank-2 types, it isn't long
1192 before you discover that you need more! Consider:
1195 mkTs f x y = [T1 f x, T1 f y]
1198 <tt>mkTs</tt> is a fuction that constructs some values of type
1199 <tt>T</tt>, using some pieces passed to it. The trouble is that since
1200 <tt>f</tt> is a function argument, Haskell assumes that it is
1201 monomorphic, so we'll get a type error when applying <tt>T1</tt> to
1202 it. This is a rather silly example, but the problem really bites in
1203 practice. Lots of people trip over the fact that you can't make
1204 "wrappers functions" for <tt>runST</tt> for exactly the same reason.
1205 In short, it is impossible to build abstractions around functions with
1208 The solution is fairly clear. We provide the ability to give a rank-2
1209 type signature for <em>ordinary</em> functions (not only data
1210 constructors), thus:
1213 mkTs :: (forall b. b -> b -> b) -> a -> [T a]
1214 mkTs f x y = [T1 f x, T1 f y]
1217 This type signature tells the compiler to attribute <tt>f</tt> with
1218 the polymorphic type <tt>(forall b. b -> b -> b)</tt> when type
1219 checking the body of <tt>mkTs</tt>, so now the application of
1220 <tt>T1</tt> is fine.
1222 There are two restrictions:
1225 <item> You can only define a rank 2 type, specified by the following
1229 rank2type ::= [forall tyvars .] [context =>] funty
1230 funty ::= ([forall tyvars .] [context =>] ty) -> funty
1232 ty ::= ...current Haskell monotype syntax...
1235 Informally, the universal quantification must all be right at the beginning,
1236 or at the top level of a function argument.
1238 <item> There is a restriction on the definition of a function whose
1239 type signature is a rank-2 type: the polymorphic arguments must be
1240 matched on the left hand side of the "<tt>=</tt>" sign. You can't
1241 define <tt>mkTs</tt> like this:
1244 mkTs :: (forall b. b -> b -> b) -> a -> [T a]
1245 mkTs = \ f x y -> [T1 f x, T1 f y]
1249 The same partial-application rule applies to ordinary functions with
1250 rank-2 types as applied to data constructors.
1254 % -----------------------------------------------------------------------------
1255 <sect1>Existentially quantified data constructors
1256 <label id="existential-quantification">
1259 The idea of using existential quantification in data type declarations
1260 was suggested by Laufer (I believe, thought doubtless someone will
1261 correct me), and implemented in Hope+. It's been in Lennart
1262 Augustsson's <tt>hbc</tt> Haskell compiler for several years, and
1263 proved very useful. Here's the idea. Consider the declaration:
1266 data Foo = forall a. MkFoo a (a -> Bool)
1270 The data type <tt>Foo</tt> has two constructors with types:
1273 MkFoo :: forall a. a -> (a -> Bool) -> Foo
1277 Notice that the type variable <tt>a</tt> in the type of <tt>MkFoo</tt>
1278 does not appear in the data type itself, which is plain <tt>Foo</tt>.
1279 For example, the following expression is fine:
1282 [MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]
1285 Here, <tt>(MkFoo 3 even)</tt> packages an integer with a function
1286 <tt>even</tt> that maps an integer to <tt>Bool</tt>; and <tt>MkFoo 'c'
1287 isUpper</tt> packages a character with a compatible function. These
1288 two things are each of type <tt>Foo</tt> and can be put in a list.
1290 What can we do with a value of type <tt>Foo</tt>?. In particular,
1291 what happens when we pattern-match on <tt>MkFoo</tt>?
1294 f (MkFoo val fn) = ???
1297 Since all we know about <tt>val</tt> and <tt>fn</tt> is that they
1298 are compatible, the only (useful) thing we can do with them is to
1299 apply <tt>fn</tt> to <tt>val</tt> to get a boolean. For example:
1303 f (MkFoo val fn) = fn val
1306 What this allows us to do is to package heterogenous values
1307 together with a bunch of functions that manipulate them, and then treat
1308 that collection of packages in a uniform manner. You can express
1309 quite a bit of object-oriented-like programming this way.
1311 <sect2>Why existential?
1312 <label id="existential">
1315 What has this to do with <em>existential</em> quantification?
1316 Simply that <tt>MkFoo</tt> has the (nearly) isomorphic type
1319 MkFoo :: (exists a . (a, a -> Bool)) -> Foo
1322 But Haskell programmers can safely think of the ordinary
1323 <em>universally</em> quantified type given above, thereby avoiding
1324 adding a new existential quantification construct.
1329 An easy extension (implemented in <tt>hbc</tt>) is to allow
1330 arbitrary contexts before the constructor. For example:
1333 data Baz = forall a. Eq a => Baz1 a a
1334 | forall b. Show b => Baz2 b (b -> b)
1337 The two constructors have the types you'd expect:
1340 Baz1 :: forall a. Eq a => a -> a -> Baz
1341 Baz2 :: forall b. Show b => b -> (b -> b) -> Baz
1344 But when pattern matching on <tt>Baz1</tt> the matched values can be compared
1345 for equality, and when pattern matching on <tt>Baz2</tt> the first matched
1346 value can be converted to a string (as well as applying the function to it).
1347 So this program is legal:
1351 f (Baz1 p q) | p == q = "Yes"
1353 f (Baz1 v fn) = show (fn v)
1356 Operationally, in a dictionary-passing implementation, the
1357 constructors <tt>Baz1</tt> and <tt>Baz2</tt> must store the
1358 dictionaries for <tt>Eq</tt> and <tt>Show</tt> respectively, and
1359 extract it on pattern matching.
1361 Notice the way that the syntax fits smoothly with that used for
1362 universal quantification earlier.
1367 There are several restrictions on the ways in which existentially-quantified
1368 constructors can be use.
1372 <item> When pattern matching, each pattern match introduces a new,
1373 distinct, type for each existential type variable. These types cannot
1374 be unified with any other type, nor can they escape from the scope of
1375 the pattern match. For example, these fragments are incorrect:
1381 Here, the type bound by <tt>MkFoo</tt> "escapes", because <tt>a</tt>
1382 is the result of <tt>f1</tt>. One way to see why this is wrong is to
1383 ask what type <tt>f1</tt> has:
1386 f1 :: Foo -> a -- Weird!
1389 What is this "<tt>a</tt>" in the result type? Clearly we don't mean
1393 f1 :: forall a. Foo -> a -- Wrong!
1396 The original program is just plain wrong. Here's another sort of error
1399 f2 (Baz1 a b) (Baz1 p q) = a==q
1402 It's ok to say <tt>a==b</tt> or <tt>p==q</tt>, but
1403 <tt>a==q</tt> is wrong because it equates the two distinct types arising
1404 from the two <tt>Baz1</tt> constructors.
1407 <item>You can't pattern-match on an existentially quantified
1408 constructor in a <tt>let</tt> or <tt>where</tt> group of
1409 bindings. So this is illegal:
1412 f3 x = a==b where { Baz1 a b = x }
1415 You can only pattern-match
1416 on an existentially-quantified constructor in a <tt>case</tt> expression or
1417 in the patterns of a function definition.
1419 The reason for this restriction is really an implementation one.
1420 Type-checking binding groups is already a nightmare without
1421 existentials complicating the picture. Also an existential pattern
1422 binding at the top level of a module doesn't make sense, because it's
1423 not clear how to prevent the existentially-quantified type "escaping".
1424 So for now, there's a simple-to-state restriction. We'll see how
1427 <item>You can't use existential quantification for <tt>newtype</tt>
1428 declarations. So this is illegal:
1431 newtype T = forall a. Ord a => MkT a
1434 Reason: a value of type <tt>T</tt> must be represented as a pair
1435 of a dictionary for <tt>Ord t</tt> and a value of type <tt>t</tt>.
1436 That contradicts the idea that <tt>newtype</tt> should have no
1437 concrete representation. You can get just the same efficiency and effect
1438 by using <tt>data</tt> instead of <tt>newtype</tt>. If there is no
1439 overloading involved, then there is more of a case for allowing
1440 an existentially-quantified <tt>newtype</tt>, because the <tt>data</tt>
1441 because the <tt>data</tt> version does carry an implementation cost,
1442 but single-field existentially quantified constructors aren't much
1443 use. So the simple restriction (no existential stuff on <tt>newtype</tt>)
1444 stands, unless there are convincing reasons to change it.