docs/users_guide/glasgow_exts.xml

   1 <?xml version="1.0" encoding="iso-8859-1"?>
   2 <para>
   3 <indexterm><primary>language, GHC</primary></indexterm>
   4 <indexterm><primary>extensions, GHC</primary></indexterm>
   5 As with all known Haskell systems, GHC implements some extensions to
   6 the language.  They are all enabled by options; by default GHC
   7 understands only plain Haskell 98.
   8 </para>
   9
  10 <para>
  11 Some of the Glasgow extensions serve to give you access to the
  12 underlying facilities with which we implement Haskell.  Thus, you can
  13 get at the Raw Iron, if you are willing to write some non-portable
  14 code at a more primitive level.  You need not be &ldquo;stuck&rdquo;
  15 on performance because of the implementation costs of Haskell's
  16 &ldquo;high-level&rdquo; features&mdash;you can always code
  17 &ldquo;under&rdquo; them.  In an extreme case, you can write all your
  18 time-critical code in C, and then just glue it together with Haskell!
  19 </para>
  20
  21 <para>
  22 Before you get too carried away working at the lowest level (e.g.,
  23 sloshing <literal>MutableByteArray&num;</literal>s around your
  24 program), you may wish to check if there are libraries that provide a
  25 &ldquo;Haskellised veneer&rdquo; over the features you want.  The
  26 separate <ulink url="../libraries/index.html">libraries
  27 documentation</ulink> describes all the libraries that come with GHC.
  28 </para>
  29
  30 <!-- LANGUAGE OPTIONS -->
  31   <sect1 id="options-language">
  32     <title>Language options</title>
  33
  34     <indexterm><primary>language</primary><secondary>option</secondary>
  35     </indexterm>
  36     <indexterm><primary>options</primary><secondary>language</secondary>
  37     </indexterm>
  38     <indexterm><primary>extensions</primary><secondary>options controlling</secondary>
  39     </indexterm>
  40
  41     <para>The language option flags control what variations of the language are
  42     permitted.  Leaving out all of them gives you standard Haskell
  43     98.</para>
  44
  45     <para>Generally speaking, all the language options are introduced by "<option>-X</option>",
  46     e.g. <option>-XTemplateHaskell</option>.
  47     </para>
  48
  49    <para> All the language options can be turned off by using the prefix "<option>No</option>";
  50       e.g. "<option>-XNoTemplateHaskell</option>".</para>
  51
  52    <para> Language options recognised by Cabal can also be enabled using the <literal>LANGUAGE</literal> pragma,
  53    thus <literal>{-# LANGUAGE TemplateHaskell #-}</literal> (see <xref linkend="language-pragma"/>). </para>
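
<para>As a minimal sketch of the pragma form (the module below and the choice of
<literal>ParallelListComp</literal> are illustrative assumptions), the following file
enables one extension for itself; compiling it with <option>-XParallelListComp</option>
and no pragma should have the same effect:</para>

<programlisting>
{-# LANGUAGE ParallelListComp #-}
module Main where

-- Two generators separated by '|' are drawn from in lock-step,
-- which is the syntax that ParallelListComp enables.
pairs :: [(Int, Char)]
pairs = [ (n, c) | n &lt;- [1, 2, 3] | c &lt;- "abc" ]

main :: IO ()
main = print pairs   -- prints [(1,'a'),(2,'b'),(3,'c')]
</programlisting>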
  54
  55     <para>The flag <option>-fglasgow-exts</option>
  56           <indexterm><primary><option>-fglasgow-exts</option></primary></indexterm>
  57           is equivalent to enabling the following extensions:
  58           <option>-XPrintExplicitForalls</option>,
  59           <option>-XForeignFunctionInterface</option>,
  60           <option>-XUnliftedFFITypes</option>,
  61           <option>-XGADTs</option>,
  62           <option>-XImplicitParams</option>,
  63           <option>-XScopedTypeVariables</option>,
  64           <option>-XUnboxedTuples</option>,
  65           <option>-XTypeSynonymInstances</option>,
  66           <option>-XStandaloneDeriving</option>,
  67           <option>-XDeriveDataTypeable</option>,
  68           <option>-XFlexibleContexts</option>,
  69           <option>-XFlexibleInstances</option>,
  70           <option>-XConstrainedClassMethods</option>,
  71           <option>-XMultiParamTypeClasses</option>,
  72           <option>-XFunctionalDependencies</option>,
  73           <option>-XMagicHash</option>,
  74           <option>-XPolymorphicComponents</option>,
  75           <option>-XExistentialQuantification</option>,
  76           <option>-XUnicodeSyntax</option>,
  77           <option>-XPostfixOperators</option>,
  78           <option>-XPatternGuards</option>,
  79           <option>-XLiberalTypeSynonyms</option>,
  80           <option>-XRankNTypes</option>,
  81           <option>-XImpredicativeTypes</option>,
  82           <option>-XTypeOperators</option>,
  83           <option>-XRecursiveDo</option>,
  84           <option>-XParallelListComp</option>,
  85           <option>-XEmptyDataDecls</option>,
  86           <option>-XKindSignatures</option>,
  87           <option>-XGeneralizedNewtypeDeriving</option>,
  88           <option>-XTypeFamilies</option>.
  89             Enabling these options is the <emphasis>only</emphasis>
  90             effect of <option>-fglasgow-exts</option>.
  91           We are trying to move away from this portmanteau flag,
  92           and towards enabling features individually.</para>
  93
  94   </sect1>
  95
  96 <!-- UNBOXED TYPES AND PRIMITIVE OPERATIONS -->
  97 <sect1 id="primitives">
  98   <title>Unboxed types and primitive operations</title>
  99
 100 <para>GHC is built on a raft of primitive data types and operations;
 101 "primitive" in the sense that they cannot be defined in Haskell itself.
 102 While you really can use this stuff to write fast code,
 103   we generally find it a lot less painful, and more satisfying in the
 104   long run, to use higher-level language features and libraries.  With
 105   any luck, the code you write will be optimised to the efficient
 106   unboxed version in any case.  And if it isn't, we'd like to know
 107   about it.</para>
 108
 109 <para>All these primitive data types and operations are exported by the
 110 library <literal>GHC.Prim</literal>, for which there is
 111 <ulink url="../libraries/base/GHC.Prim.html">detailed online documentation</ulink>.
 112 (This documentation is generated from the file <filename>compiler/prelude/primops.txt.pp</filename>.)
 113 </para>
 114 <para>
 115 If you want to mention any of the primitive data types or operations in your
 116 program, you must first import <literal>GHC.Prim</literal> to bring them
 117 into scope.  Many of them have names ending in "&num;", and to mention such
 118 names you need the <option>-XMagicHash</option> extension (<xref linkend="magic-hash"/>).
 119 </para>
 120
 121 <para>The primops make extensive use of <link linkend="glasgow-unboxed">unboxed types</link>
 122 and <link linkend="unboxed-tuples">unboxed tuples</link>, which
 123 we briefly summarise here. </para>
 124
 125 <sect2 id="glasgow-unboxed">
 126 <title>Unboxed types
 127 </title>
 128
 129 <para>
 130 <indexterm><primary>Unboxed types (Glasgow extension)</primary></indexterm>
 131 </para>
 132
 133 <para>Most types in GHC are <firstterm>boxed</firstterm>, which means
 134 that values of that type are represented by a pointer to a heap
 135 object.  The representation of a Haskell <literal>Int</literal>, for
 136 example, is a two-word heap object.  An <firstterm>unboxed</firstterm>
 137 type, however, is represented by the value itself; no pointers or heap
 138 allocation are involved.
 139 </para>
 140
 141 <para>
 142 Unboxed types correspond to the &ldquo;raw machine&rdquo; types you
 143 would use in C: <literal>Int&num;</literal> (long int),
 144 <literal>Double&num;</literal> (double), <literal>Addr&num;</literal>
 145 (void *), etc.  The <emphasis>primitive operations</emphasis>
 146 (PrimOps) on these types are what you might expect; e.g.,
 147 <literal>(+&num;)</literal> is addition on
 148 <literal>Int&num;</literal>s, and is the machine-addition that we all
 149 know and love&mdash;usually one instruction.
 150 </para>
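
<para>
A minimal sketch of working with unboxed integers follows.  It assumes that the
primitives are imported from <literal>GHC.Exts</literal>, which re-exports them
together with the <literal>I#</literal> constructor of <literal>Int</literal>;
the function names are hypothetical.
</para>

<programlisting>
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Int (I#), Int#, (+#), (*#))

-- Arithmetic directly on raw machine integers: no boxes, no thunks.
sumSquares# :: Int# -> Int# -> Int#
sumSquares# x y = (x *# x) +# (y *# y)

-- Boxed wrapper: unpack with I#, compute unboxed, re-box the result.
sumSquares :: Int -> Int -> Int
sumSquares (I# x) (I# y) = I# (sumSquares# x y)

main :: IO ()
main = print (sumSquares 3 4)   -- prints 25
</programlisting>

<para>
In practice GHC's optimiser will often derive code like
<literal>sumSquares#</literal> from an ordinary <literal>Int</literal>
definition, so writing it by hand is rarely necessary.
</para>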
 151
 152 <para>
 153 Primitive (unboxed) types cannot be defined in Haskell, and are
 154 therefore built into the language and compiler.  Primitive types are
 155 always unlifted; that is, a value of a primitive type cannot be
 156 bottom.  We use the convention (but it is only a convention)
 157 that primitive types, values, and
 158 operations have a <literal>&num;</literal> suffix (see <xref linkend="magic-hash"/>).
 159 For some primitive types we have special syntax for literals, also
 160 described in the <link linkend="magic-hash">same section</link>.
 161 </para>
 162
 163 <para>
 164 Primitive values are often represented by a simple bit-pattern, such
 165 as <literal>Int&num;</literal>, <literal>Float&num;</literal>,
 166 <literal>Double&num;</literal>.  But this is not necessarily the case:
 167 a primitive value might be represented by a pointer to a
 168 heap-allocated object.  Examples include
 169 <literal>Array&num;</literal>, the type of primitive arrays.  A
 170 primitive array is heap-allocated because it is too big a value to fit
 171 in a register, and would be too expensive to copy around; in a sense,
 172 it is accidental that it is represented by a pointer.  If a pointer
 173 represents a primitive value, then it really does point to that value:
 174 no unevaluated thunks, no indirections&hellip;nothing can be at the
 175 other end of the pointer than the primitive value.
 176 A numerically-intensive program using unboxed types can
 177 go a <emphasis>lot</emphasis> faster than its &ldquo;standard&rdquo;
 178 counterpart&mdash;we saw a threefold speedup on one example.
 179 </para>
 180
 181 <para>
 182 There are some restrictions on the use of primitive types:
 183 <itemizedlist>
 184 <listitem><para>The main restriction
 185 is that you can't pass a primitive value to a polymorphic
 186 function or store one in a polymorphic data type.  This rules out
 187 things like <literal>[Int&num;]</literal> (i.e. lists of primitive
 188 integers).  The reason for this restriction is that polymorphic
 189 arguments and constructor fields are assumed to be pointers: if an
 190 unboxed integer is stored in one of these, the garbage collector would
 191 attempt to follow it, leading to unpredictable space leaks.  Or a
 192 <function>seq</function> operation on the polymorphic component may
 193 attempt to dereference the pointer, with disastrous results.  Even
 194 worse, the unboxed value might be larger than a pointer
 195 (<literal>Double&num;</literal> for instance).
 196 </para>
 197 </listitem>
 198 <listitem><para> You cannot define a newtype whose representation type
 199 (the argument type of the data constructor) is an unboxed type.  Thus,
 200 this is illegal:
 201 <programlisting>
 202   newtype A = MkA Int#
 203 </programlisting>
 204 </para></listitem>
 205 <listitem><para> You cannot bind a variable with an unboxed type
 206 in a <emphasis>top-level</emphasis> binding.
 207 </para></listitem>
 208 <listitem><para> You cannot bind a variable with an unboxed type
 209 in a <emphasis>recursive</emphasis> binding.
 210 </para></listitem>
 211 <listitem><para> You may bind unboxed variables in a (non-recursive,
 212 non-top-level) pattern binding, but any such variable causes the entire
 213 pattern-match
 214 to become strict.  For example:
 215 <programlisting>
 216   data Foo = Foo Int Int#
 217
 218   f x = let (Foo a b, w) = ..rhs.. in ..body..
 219 </programlisting>
 220 Since <literal>b</literal> has type <literal>Int#</literal>, the entire pattern
 221 match
 222 is strict, and the program behaves as if you had written
 223 <programlisting>
 224   data Foo = Foo Int Int#
 225
 226   f x = case ..rhs.. of { (Foo a b, w) -> ..body.. }
 227 </programlisting>
 228 </para>
 229 </listitem>
 230 </itemizedlist>
 231 </para>
 232
 233 </sect2>
 234
 235 <sect2 id="unboxed-tuples">
 236 <title>Unboxed Tuples
 237 </title>
 238
 239 <para>
 240 Unboxed tuples are not exported by <literal>GHC.Exts</literal> or any other module;
 241 they are a syntactic extension enabled by <option>-XUnboxedTuples</option>, which <option>-fglasgow-exts</option> implies.  An
 242 unboxed tuple looks like this:
 243 </para>
 244
 245 <para>
 246
 247 <programlisting>
 248 (# e_1, ..., e_n #)
 249 </programlisting>
 250
 251 </para>
 252
 253 <para>
 254 where <literal>e&lowbar;1..e&lowbar;n</literal> are expressions of any
 255 type (primitive or non-primitive).  The type of an unboxed tuple looks
 256 the same.
 257 </para>
 258
 259 <para>
 260 Unboxed tuples are used for functions that need to return multiple
 261 values, but they avoid the heap allocation normally associated with
 262 using fully-fledged tuples.  When an unboxed tuple is returned, the
 263 components are put directly into registers or on the stack; the
 264 unboxed tuple itself does not have a composite representation.  Many
 265 of the primitive operations listed in <literal>primops.txt.pp</literal> return unboxed
 266 tuples.
 267 In particular, the <literal>IO</literal> and <literal>ST</literal> monads use unboxed
 268 tuples to avoid unnecessary allocation during sequences of operations.
 269 </para>
 270
 271 <para>
 272 There are some pretty stringent restrictions on the use of unboxed tuples:
 273 <itemizedlist>
 274 <listitem>
 275
 276 <para>
 277 Values of unboxed tuple types are subject to the same restrictions as
 278 other unboxed types; i.e. they may not be stored in polymorphic data
 279 structures or passed to polymorphic functions.
 280
 281 </para>
 282 </listitem>
 283 <listitem>
 284
 285 <para>
 286 No variable can have an unboxed tuple type, nor may a constructor or function
 287 argument have an unboxed tuple type.  The following are all illegal:
 288
 289
 290 <programlisting>
 291   data Foo = Foo (# Int, Int #)
 292
 293   f :: (# Int, Int #) -&#62; (# Int, Int #)
 294   f x = x
 295
 296   g :: (# Int, Int #) -&#62; Int
 297   g (# a,b #) = a
 298
 299   h x = let y = (# x,x #) in ...
 300 </programlisting>
 301 </para>
 302 </listitem>
 303 </itemizedlist>
 304 </para>
 305 <para>
 306 The typical use of unboxed tuples is simply to return multiple values,
 307 binding those multiple results with a <literal>case</literal> expression, thus:
 308 <programlisting>
 309   f x y = (# x+1, y-1 #)
 310   g x = case f x x of { (# a, b #) -&#62; a + b }
 311 </programlisting>
 312 You can have an unboxed tuple in a pattern binding, thus
 313 <programlisting>
 314   f x = let (# p,q #) = h x in ..body..
 315 </programlisting>
 316 If the types of <literal>p</literal> and <literal>q</literal> are not unboxed,
 317 the resulting binding is lazy like any other Haskell pattern binding.  The
 318 above example desugars like this:
 319 <programlisting>
 320   f x = let t = case h x of { (# p,q #) -> (p,q) }
 321             p = fst t
 322             q = snd t
 323         in ..body..
 324 </programlisting>
 325 Indeed, the bindings can even be recursive.
 326 </para>
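
<para>
The multiple-return idiom can be sketched as a complete program.  The function
names <literal>quotRem'</literal> and <literal>describe</literal> are
hypothetical and only serve the illustration.
</para>

<programlisting>
{-# LANGUAGE UnboxedTuples #-}
module Main where

-- Return quotient and remainder together without allocating a tuple.
quotRem' :: Int -> Int -> (# Int, Int #)
quotRem' n d = (# n `quot` d, n `rem` d #)

-- The results are bound with a case expression at the call site.
describe :: Int -> Int -> String
describe n d = case quotRem' n d of
  (# q, r #) -> show q ++ " remainder " ++ show r

main :: IO ()
main = putStrLn (describe 17 5)   -- prints 3 remainder 2
</programlisting>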
 327
 328 </sect2>
 329 </sect1>
 330
 331
 332 <!-- ====================== SYNTACTIC EXTENSIONS =======================  -->
 333
 334 <sect1 id="syntax-extns">
 335 <title>Syntactic extensions</title>
 336
 337     <sect2 id="magic-hash">
 338       <title>The magic hash</title>
 339       <para>The language extension <option>-XMagicHash</option> allows "&num;" as a
 340         postfix modifier to identifiers.  Thus, "x&num;" is a valid variable, and "T&num;" is
 341         a valid type constructor or data constructor.</para>
 342
 343       <para>The hash sign does not change semantics at all.  We tend to use variable
 344         names ending in "&num;" for unboxed values or types (e.g. <literal>Int&num;</literal>),
 345         but there is no requirement to do so; they are just plain ordinary variables.
 346         Nor does the <option>-XMagicHash</option> extension bring anything into scope.
 347         For example, to bring <literal>Int&num;</literal> into scope you must
 348         import <literal>GHC.Prim</literal> (see <xref linkend="primitives"/>);
 349         the <option>-XMagicHash</option> extension
 350         then allows you to <emphasis>refer</emphasis> to the <literal>Int&num;</literal>
 351         that is now in scope.</para>
 352       <para> The <option>-XMagicHash</option> extension also enables some new forms of literals (see <xref linkend="glasgow-unboxed"/>):
 353         <itemizedlist>
 354           <listitem><para> <literal>'x'&num;</literal> has type <literal>Char&num;</literal></para> </listitem>
 355           <listitem><para> <literal>&quot;foo&quot;&num;</literal> has type <literal>Addr&num;</literal></para> </listitem>
 356           <listitem><para> <literal>3&num;</literal> has type <literal>Int&num;</literal>. In general,
 357           any Haskell 98 integer lexeme followed by a <literal>&num;</literal> is an <literal>Int&num;</literal> literal, e.g.
 358             <literal>-0x3A&num;</literal> as well as <literal>32&num;</literal>.</para></listitem>
 359           <listitem><para> <literal>3&num;&num;</literal> has type <literal>Word&num;</literal>. In general,
 360           any non-negative Haskell 98 integer lexeme followed by <literal>&num;&num;</literal>
 361               is a <literal>Word&num;</literal>. </para> </listitem>
 362           <listitem><para> <literal>3.2&num;</literal> has type <literal>Float&num;</literal>.</para> </listitem>
 363           <listitem><para> <literal>3.2&num;&num;</literal> has type <literal>Double&num;</literal></para> </listitem>
 364           </itemizedlist>
 365       </para>
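
      <para>A small sketch of these literal forms follows.  It assumes the boxing
        constructors <literal>I#</literal>, <literal>W#</literal>, <literal>F#</literal>,
        <literal>D#</literal> and <literal>C#</literal> are imported from
        <literal>GHC.Exts</literal>; each unboxed literal is wrapped in its boxed
        type so that it can be printed.</para>

<programlisting>
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Char (C#), Double (D#), Float (F#), Int (I#), Word (W#))

main :: IO ()
main = do
  print (I# 32#)     -- Int# literal,    prints 32
  print (W# 3##)     -- Word# literal,   prints 3
  print (F# 3.2#)    -- Float# literal,  prints 3.2
  print (D# 3.2##)   -- Double# literal, prints 3.2
  print (C# 'x'#)    -- Char# literal,   prints 'x'
</programlisting>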
 366    </sect2>
 367
 368     <sect2>
 369       <title>New qualified operator syntax</title>
 370
 371       <para>A new syntax for referencing qualified operators is
 372         planned to be introduced by Haskell', and is enabled in GHC
 373         with
 374         the <option>-XNewQualifiedOperators</option><indexterm><primary><option>-XNewQualifiedOperators</option></primary></indexterm>
 375         option.  In the new syntax, the prefix form of a qualified
 376         operator is
 377         written <literal><replaceable>module</replaceable>.(<replaceable>symbol</replaceable>)</literal>
 378         (in Haskell 98 this would
 379         be <literal>(<replaceable>module</replaceable>.<replaceable>symbol</replaceable>)</literal>),
 380         and the infix form is
 381         written <literal>`<replaceable>module</replaceable>.(<replaceable>symbol</replaceable>)`</literal>
 382         (in Haskell 98 this would
 383         be <literal>`<replaceable>module</replaceable>.<replaceable>symbol</replaceable>`</literal>).
 384         For example:
 385 <programlisting>
 386   add x y = Prelude.(+) x y
 387   subtract y = (`Prelude.(-)` y)
 388 </programlisting>
 389         The new form of qualified operators is intended to regularise
 390         the syntax by eliminating odd cases
 391         like <literal>Prelude..</literal>.  For example,
 392         when <literal>NewQualifiedOperators</literal> is on, it is possible to
 393         write the enumerated sequence <literal>[Monday..]</literal>
 394         without spaces, whereas in Haskell 98 this would be a
 395         reference to the operator &lsquo;<literal>.</literal>&rsquo;
 396         from module <literal>Monday</literal>.</para>
 397
 398       <para>When <option>-XNewQualifiedOperators</option> is on, the old Haskell
 399         98 syntax for qualified operators is not accepted, so this
 400         option may cause existing Haskell 98 code to break.</para>
 401
 402     </sect2>
 403
 404
 405     <!-- ====================== HIERARCHICAL MODULES =======================  -->
 406
 407
 408     <sect2 id="hierarchical-modules">
 409       <title>Hierarchical Modules</title>
 410
 411       <para>GHC supports a small extension to the syntax of module
 412       names: a module name is allowed to contain a dot
 413       <literal>&lsquo;.&rsquo;</literal>.  This is also known as the
 414       &ldquo;hierarchical module namespace&rdquo; extension, because
 415       it extends the normally flat Haskell module namespace into a
 416       more flexible hierarchy of modules.</para>
 417
 418       <para>This extension has very little impact on the language
 419       itself; module names are <emphasis>always</emphasis> fully
 420       qualified, so you can just think of the fully qualified module
 421       name as <quote>the module name</quote>.  In particular, this
 422       means that the full module name must be given after the
 423       <literal>module</literal> keyword at the beginning of the
 424       module; for example, the module <literal>A.B.C</literal> must
 425       begin</para>
 426
 427 <programlisting>module A.B.C</programlisting>
 428
 429
 430       <para>It is a common strategy to use the <literal>as</literal>
 431       keyword to save some typing when using qualified names with
 432       hierarchical modules.  For example:</para>
 433
 434 <programlisting>
 435 import qualified Control.Monad.ST.Strict as ST
 436 </programlisting>
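
      <para>A minimal sketch of this strategy in a complete module; the aliases
      <literal>C</literal> and <literal>L</literal> and the function
      <literal>shout</literal> are arbitrary choices made for the example:</para>

<programlisting>
module Main where

import qualified Data.Char as C
import qualified Data.List as L

-- Every qualified name goes through the short alias, not the full module name.
shout :: [String] -> String
shout ws = L.intercalate " " (map (map C.toUpper) ws)

main :: IO ()
main = putStrLn (shout ["hierarchical", "modules"])   -- prints HIERARCHICAL MODULES
</programlisting>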
 437
 438       <para>For details on how GHC searches for source and interface
 439       files in the presence of hierarchical modules, see <xref
 440       linkend="search-path"/>.</para>
 441
 442       <para>GHC comes with a large collection of libraries arranged
 443       hierarchically; see the accompanying <ulink
 444       url="../libraries/index.html">library
 445       documentation</ulink>.  More libraries to install are available
 446       from <ulink
 447       url="http://hackage.haskell.org/packages/hackage.html">HackageDB</ulink>.</para>
 448     </sect2>
 449
 450     <!-- ====================== PATTERN GUARDS =======================  -->
 451
 452 <sect2 id="pattern-guards">
 453 <title>Pattern guards</title>
 454
 455 <para>
 456 <indexterm><primary>Pattern guards (Glasgow extension)</primary></indexterm>
 457 The discussion that follows is an abbreviated version of Simon Peyton Jones's original <ulink url="http://research.microsoft.com/~simonpj/Haskell/guards.html">proposal</ulink>. (Note that the proposal was written before pattern guards were implemented, so refers to them as unimplemented.)
 458 </para>
 459
 460 <para>
 461 Suppose we have an abstract data type of finite maps, with a
 462 lookup operation:
 463
 464 <programlisting>
 465 lookup :: FiniteMap -> Int -> Maybe Int
 466 </programlisting>
 467
 468 The lookup returns <function>Nothing</function> if the supplied key is not in the domain of the mapping, and <function>(Just v)</function> otherwise,
 469 where <varname>v</varname> is the value that the key maps to.  Now consider the following definition:
 470 </para>
 471
 472 <programlisting>
 473 clunky env var1 var2 | ok1 &amp;&amp; ok2 = val1 + val2
 474                      | otherwise  = var1 + var2
 475   where
 476     m1 = lookup env var1
 477     m2 = lookup env var2
 478     ok1 = maybeToBool m1
 479     ok2 = maybeToBool m2
 480     val1 = expectJust m1
 481     val2 = expectJust m2
 482 </programlisting>
 483
 484 <para>
 485 The auxiliary functions are
 486 </para>
 487
 488 <programlisting>
 489 maybeToBool :: Maybe a -&gt; Bool
 490 maybeToBool (Just x) = True
 491 maybeToBool Nothing  = False
 492
 493 expectJust :: Maybe a -&gt; a
 494 expectJust (Just x) = x
 495 expectJust Nothing  = error "Unexpected Nothing"
 496 </programlisting>
 497
 498 <para>
 499 What is <function>clunky</function> doing? The guard <literal>ok1 &amp;&amp;
 500 ok2</literal> checks that both lookups succeed, using
 501 <function>maybeToBool</function> to convert the <function>Maybe</function>
 502 types to booleans. The (lazily evaluated) <function>expectJust</function>
 503 calls extract the values from the results of the lookups, and bind the
 504 returned values to <varname>val1</varname> and <varname>val2</varname>
 505 respectively.  If either lookup fails, then clunky takes the
 506 <literal>otherwise</literal> case and returns the sum of its arguments.
 507 </para>
 508
 509 <para>
 510 This is certainly legal Haskell, but it is a tremendously verbose and
 511 un-obvious way to achieve the desired effect.  Arguably, a more direct way
 512 to write clunky would be to use case expressions:
 513 </para>
 514
 515 <programlisting>
 516 clunky env var1 var2 = case lookup env var1 of
 517     Nothing -&gt; fail
 518     Just val1 -&gt; case lookup env var2 of
 519       Nothing -&gt; fail
 520       Just val2 -&gt; val1 + val2
 521   where
 522     fail = var1 + var2
 523 </programlisting>
 524
 525 <para>
 526 This is a bit shorter, but hardly better.  Of course, we can rewrite any set
 527 of pattern-matching, guarded equations as case expressions; that is
 528 precisely what the compiler does when compiling equations! The reason that
 529 Haskell provides guarded equations is that they allow us to write down
 530 the cases we want to consider, one at a time, independently of each other.
 531 This structure is hidden in the case version.  Two of the right-hand sides
 532 are really the same (<function>fail</function>), and the whole expression
 533 tends to become more and more indented.
 534 </para>
 535
 536 <para>
 537 Here is how I would write clunky:
 538 </para>
 539
 540 <programlisting>
 541 clunky env var1 var2
 542   | Just val1 &lt;- lookup env var1
 543   , Just val2 &lt;- lookup env var2
 544   = val1 + val2
 545 ...other equations for clunky...
 546 </programlisting>
 547
 548 <para>
 549 The semantics should be clear enough.  The qualifiers are matched in order.
 550 For a <literal>&lt;-</literal> qualifier, which I call a pattern guard, the
 551 right hand side is evaluated and matched against the pattern on the left.
 552 If the match fails then the whole guard fails and the next equation is
 553 tried.  If it succeeds, then the appropriate binding takes place, and the
 554 next qualifier is matched, in the augmented environment.  Unlike list
 555 comprehensions, however, the type of the expression to the right of the
 556 <literal>&lt;-</literal> is the same as the type of the pattern to its
 557 left.  The bindings introduced by pattern guards scope over all the
 558 remaining guard qualifiers, and over the right hand side of the equation.
 559 </para>
 560
 561 <para>
 562 Just as with list comprehensions, boolean expressions can be freely mixed
 563 among the pattern guards.  For example:
 564 </para>
 565
 566 <programlisting>
 567 f x | [y] &lt;- x
 568     , y > 3
 569     , Just z &lt;- h y
 570     = ...
 571 </programlisting>
 572
 573 <para>
 574 Haskell's current guards therefore emerge as a special case, in which the
 575 qualifier list has just one element, a boolean expression.
 576 </para>
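
<para>
For experimentation, the pattern-guard version of <function>clunky</function>
can be completed into a runnable sketch.  The association-list type
<literal>Env</literal> and the helper <function>lookupEnv</function> are
hypothetical stand-ins for the abstract finite map and its lookup operation,
and the final equation plays the role of the other equations for
<function>clunky</function>.
</para>

<programlisting>
{-# LANGUAGE PatternGuards #-}
module Main where

-- Hypothetical stand-in for the abstract FiniteMap: an association list.
type Env = [(Int, Int)]

lookupEnv :: Env -> Int -> Maybe Int
lookupEnv env key = lookup key env

clunky :: Env -> Int -> Int -> Int
clunky env var1 var2
  | Just val1 &lt;- lookupEnv env var1
  , Just val2 &lt;- lookupEnv env var2
  = val1 + val2
clunky _ var1 var2 = var1 + var2   -- reached when either lookup fails

main :: IO ()
main = do
  print (clunky [(1, 10), (2, 20)] 1 2)   -- both keys present: prints 30
  print (clunky [(1, 10)] 1 2)            -- key 2 missing: prints 3
</programlisting>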
 577 </sect2>
 578
 579     <!-- ===================== View patterns ===================  -->
 580
 581 <sect2 id="view-patterns">
 582 <title>View patterns
 583 </title>
 584
 585 <para>
 586 View patterns are enabled by the flag <literal>-XViewPatterns</literal>.
 587 More information and examples of view patterns can be found on the
 588 <ulink url="http://hackage.haskell.org/trac/ghc/wiki/ViewPatterns">Wiki
 589 page</ulink>.
 590 </para>
 591
 592 <para>
 593 View patterns are somewhat like pattern guards that can be nested inside
 594 of other patterns.  They are a convenient way of pattern-matching
 595 against values of abstract types. For example, in a programming language
 596 implementation, we might represent the syntax of the types of the
 597 language as follows:
 598
 599 <programlisting>
 600 type Typ
 601
 602 data TypView = Unit
 603              | Arrow Typ Typ
 604
 605 view :: Typ -> TypView
 606
 607 -- additional operations for constructing Typ's ...
 608 </programlisting>
 609
 610 The representation of Typ is held abstract, permitting implementations
 611 to use a fancy representation (e.g., hash-consing to manage sharing).
 612
 613 Without view patterns, using this signature a little inconvenient:
 614 <programlisting>
 615 size :: Typ -> Integer
 616 size t = case view t of
 617   Unit -> 1
 618   Arrow t1 t2 -> size t1 + size t2
 619 </programlisting>
 620
 621 It is necessary to iterate the case, rather than using an equational
 622 function definition. And the situation is even worse when the matching
 623 against <literal>t</literal> is buried deep inside another pattern.
 624 </para>
 625
 626 <para>
 627 View patterns permit calling the view function inside the pattern and
 628 matching against the result:
 629 <programlisting>
 630 size (view -> Unit) = 1
 631 size (view -> Arrow t1 t2) = size t1 + size t2
 632 </programlisting>
 633
 634 That is, we add a new form of pattern, written
 635 <replaceable>expression</replaceable> <literal>-></literal>
 636 <replaceable>pattern</replaceable> that means "apply the expression to
 637 whatever we're trying to match against, and then match the result of
 638 that application against the pattern". The expression can be any Haskell
 639 expression of function type, and view patterns can be used wherever
 640 patterns are used.
 641 </para>
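
<para>
A runnable sketch of the <literal>size</literal> example follows.  Since the
text keeps <literal>Typ</literal> abstract, the newtype representation and the
constructor functions <literal>unit</literal> and <literal>arrow</literal> are
assumptions made only so that the program is self-contained.
</para>

<programlisting>
{-# LANGUAGE ViewPatterns #-}
module Main where

-- Assumed concrete representation; a real implementation could hide it.
newtype Typ = Typ TypView

data TypView = Unit
             | Arrow Typ Typ

view :: Typ -> TypView
view (Typ v) = v

-- Hypothetical smart constructors for building Typ values.
unit :: Typ
unit = Typ Unit

arrow :: Typ -> Typ -> Typ
arrow a b = Typ (Arrow a b)

-- The view function is applied inside the pattern itself.
size :: Typ -> Integer
size (view -> Unit)        = 1
size (view -> Arrow t1 t2) = size t1 + size t2

main :: IO ()
main = print (size (arrow unit (arrow unit unit)))   -- prints 3
</programlisting>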
 642
 643 <para>
 644 The semantics of a pattern <literal>(</literal>
 645 <replaceable>exp</replaceable> <literal>-></literal>
 646 <replaceable>pat</replaceable> <literal>)</literal> are as follows:
 647
 648 <itemizedlist>
 649
 650 <listitem>
 651
 652 <para>Scoping: The variables bound by the view pattern are the variables bound by
 653 <replaceable>pat</replaceable>.
 654 </para>
 655
 656 <para>
 657 Any variables in <replaceable>exp</replaceable> are bound occurrences,
 658 but variables bound "to the left" in a pattern are in scope.  This
 659 feature permits, for example, one argument to a function to be used in
 660 the view of another argument.  For example, the function
 661 <literal>clunky</literal> from <xref linkend="pattern-guards" /> can be
 662 written using view patterns as follows:
 663
 664 <programlisting>
 665 clunky env (lookup env -> Just val1) (lookup env -> Just val2) = val1 + val2
 666 ...other equations for clunky...
 667 </programlisting>
 668 </para>
 669
 670 <para>
 671 More precisely, the scoping rules are:
 672 <itemizedlist>
 673 <listitem>
 674 <para>
 675 In a single pattern, variables bound by patterns to the left of a view
 676 pattern expression are in scope. For example:
 677 <programlisting>
 678 example :: Maybe ((String -> Integer,Integer), String) -> Bool
 679 example (Just ((f,_), f -> 4)) = True
 680 </programlisting>
 681
 682 Additionally, in function definitions, variables bound by matching earlier curried
 683 arguments may be used in view pattern expressions in later arguments:
 684 <programlisting>
 685 example :: (String -> Integer) -> String -> Bool
 686 example f (f -> 4) = True
 687 </programlisting>
 688 That is, the scoping is the same as it would be if the curried arguments
 689 were collected into a tuple.
 690 </para>
 691 </listitem>
 692
 693 <listitem>
 694 <para>
 695 In mutually recursive bindings, such as <literal>let</literal>,
 696 <literal>where</literal>, or the top level, view patterns in one
 697 declaration may not mention variables bound by other declarations.  That
 698 is, each declaration must be self-contained.  For example, the following
 699 program is not allowed:
 700 <programlisting>
 701 let {(x -> y) = e1 ;
 702      (y -> x) = e2 } in x
 703 </programlisting>
 704
 705 (We may lift this
 706 restriction in the future; the only cost is that type checking patterns
 707 would get a little more complicated.)
 708
 709
 710 </para>
 711 </listitem>
 712 </itemizedlist>
 713
 714 </para>
 715 </listitem>
 716
 717 <listitem><para> Typing: If <replaceable>exp</replaceable> has type
 718 <replaceable>T1</replaceable> <literal>-></literal>
 719 <replaceable>T2</replaceable> and <replaceable>pat</replaceable> matches
 720 a <replaceable>T2</replaceable>, then the whole view pattern matches a
 721 <replaceable>T1</replaceable>.
 722 </para></listitem>
 723
 724 <listitem><para> Matching: To the equations in Section 3.17.3 of the
 725 <ulink url="http://www.haskell.org/onlinereport/">Haskell 98
 726 Report</ulink>, add the following:
 727 <programlisting>
 728 case v of { (e -> p) -> e1 ; _ -> e2 }
 729  =
 730 case (e v) of { p -> e1 ; _ -> e2 }
 731 </programlisting>
 732 That is, to match a variable <replaceable>v</replaceable> against a pattern
 733 <literal>(</literal> <replaceable>exp</replaceable>
 734 <literal>-></literal> <replaceable>pat</replaceable>
 735 <literal>)</literal>, evaluate <literal>(</literal>
 736 <replaceable>exp</replaceable> <replaceable> v</replaceable>
 737 <literal>)</literal> and match the result against
 738 <replaceable>pat</replaceable>.
 739 </para></listitem>
 740
 741 <listitem><para> Efficiency: When the same view function is applied in
 742 multiple branches of a function definition or a case expression (e.g.,
 743 in <literal>size</literal> above), GHC makes an attempt to collect these
 744 applications into a single nested case expression, so that the view
 745 function is only applied once.  Pattern compilation in GHC follows the
 746 matrix algorithm described in Chapter 4 of <ulink
 747 url="http://research.microsoft.com/~simonpj/Papers/slpj-book-1987/">The
 748 Implementation of Functional Programming Languages</ulink>.  When the
 749 top rows of the first column of a matrix are all view patterns with the
 750 "same" expression, these patterns are transformed into a single nested
 751 case.  This includes, for example, adjacent view patterns that line up
 752 in a tuple, as in
 753 <programlisting>
 754 f ((view -> A, p1), p2) = e1
 755 f ((view -> B, p3), p4) = e2
 756 </programlisting>
 757 </para>
 758
 759 <para> The current notion of when two view pattern expressions are "the
 760 same" is very restricted: it is not even full syntactic equality.
 761 However, it does include variables, literals, applications, and tuples;
 762 e.g., two instances of <literal>view ("hi", "there")</literal> will be
 763 collected.  However, the current implementation does not compare up to
 764 alpha-equivalence, so two instances of <literal>(x, view x ->
 765 y)</literal> will not be coalesced.
 766 </para>
 767
 768 </listitem>
 769
 770 </itemizedlist>
 771 </para>
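<para>
As a minimal sketch of the matching rule above, the two functions below
should behave identically: the first uses a view pattern and the second
is its translation into an ordinary <literal>case</literal>.  The names
<literal>isYes</literal> and <literal>isYes'</literal> are invented for
this illustration, and the example assumes the extension is switched on
with a <literal>ViewPatterns</literal> language pragma.
<programlisting>
{-# LANGUAGE ViewPatterns #-}
module Main where

import Data.Char (toLower)

-- View-pattern form: apply (map toLower) to the argument, then match
-- the result against the string literal "yes".
isYes :: String -> Bool
isYes (map toLower -> "yes") = True
isYes _                      = False

-- The same function written out by hand, following
--   case v of { (e -> p) -> e1 ; _ -> e2 }  =  case (e v) of { p -> e1 ; _ -> e2 }
isYes' :: String -> Bool
isYes' v = case map toLower v of
             "yes" -> True
             _     -> False

main :: IO ()
main = print (map isYes ["YES", "no"], map isYes' ["YES", "no"])
-- prints ([True,False],[True,False])
</programlisting>
</para>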
 772
 773 </sect2>
 774
 775     <!-- ===================== Recursive do-notation ===================  -->
 776
 777 <sect2 id="mdo-notation">
 778 <title>The recursive do-notation
 779 </title>
 780
 781 <para> The recursive do-notation (also known as mdo-notation) is implemented as described in
 782 <ulink url="http://citeseer.ist.psu.edu/erk02recursive.html">A recursive do for Haskell</ulink>,
 783 by Levent Erkok, John Launchbury,
 784 Haskell Workshop 2002, pages: 29-37. Pittsburgh, Pennsylvania.
 785 This paper is essential reading for anyone making non-trivial use of mdo-notation,
 786 and we do not repeat it here.
 787 </para>
 788 <para>
 789 The do-notation of Haskell does not allow <emphasis>recursive bindings</emphasis>,
 790 that is, the variables bound in a do-expression are visible only in the textually following
 791 code block. Compare this to a let-expression, where bound variables are visible in the entire binding
 792 group. It turns out that several applications can benefit from recursive bindings in
 793 the do-notation, and this extension provides the necessary syntactic support.
 794 </para>
 795 <para>
 796 Here is a simple (yet contrived) example:
 797 </para>
 798 <programlisting>
 799 import Control.Monad.Fix
 800
 801 justOnes = mdo xs &lt;- Just (1:xs)
 802                return xs
 803 </programlisting>
 804 <para>
As you can guess, <literal>justOnes</literal> will evaluate to <literal>Just [1,1,1,...</literal>.
 806 </para>
 807
 808 <para>
The <literal>Control.Monad.Fix</literal> library introduces the <literal>MonadFix</literal> class. Its definition is:
 810 </para>
 811 <programlisting>
 812 class Monad m => MonadFix m where
 813    mfix :: (a -> m a) -> m a
 814 </programlisting>
 815 <para>
 816 The function <literal>mfix</literal>
 817 dictates how the required recursion operation should be performed.  For example,
 818 <literal>justOnes</literal> desugars as follows:
 819 <programlisting>
justOnes = mfix (\xs' -&gt; do { xs &lt;- Just (1:xs'); return xs })
 821 </programlisting>
 822 For full details of the way in which mdo is typechecked and desugared, see
 823 the paper <ulink url="http://citeseer.ist.psu.edu/erk02recursive.html">A recursive do for Haskell</ulink>.
 824 In particular, GHC implements the segmentation technique described in Section 3.2 of the paper.
 825 </para>
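<para>
The following self-contained sketch puts the two definitions side by side
so that the desugaring can be checked by hand.  The name
<literal>justOnes'</literal> is introduced here for illustration only;
the <literal>RecursiveDo</literal> pragma is assumed to be the
source-file equivalent of the <literal>-XRecursiveDo</literal> flag.
<programlisting>
{-# LANGUAGE RecursiveDo #-}
module Main where

import Control.Monad.Fix (mfix)

-- Recursive binding: xs is used on the right-hand side of its own binding.
justOnes :: Maybe [Integer]
justOnes = mdo xs &lt;- Just (1:xs)
               return xs

-- Hand-written equivalent using mfix directly.
justOnes' :: Maybe [Integer]
justOnes' = mfix (\xs' -> do { xs &lt;- Just (1:xs'); return xs })

main :: IO ()
main = print (fmap (take 3) justOnes, fmap (take 3) justOnes')
-- prints (Just [1,1,1],Just [1,1,1])
</programlisting>
Only a finite prefix is printed, since both lists are infinite.
</para>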
 826 <para>
 827 If recursive bindings are required for a monad,
 828 then that monad must be declared an instance of the <literal>MonadFix</literal> class.
 829 The following instances of <literal>MonadFix</literal> are automatically provided: List, Maybe, IO.
 830 Furthermore, the Control.Monad.ST and Control.Monad.ST.Lazy modules provide the instances of the MonadFix class
 831 for Haskell's internal state monad (strict and lazy, respectively).
 832 </para>
 833 <para>
 834 Here are some important points in using the recursive-do notation:
 835 <itemizedlist>
 836 <listitem><para>
 837 The recursive version of the do-notation uses the keyword <literal>mdo</literal> (rather
 838 than <literal>do</literal>).
 839 </para></listitem>
 840
 841 <listitem><para>
 842 It is enabled with the flag <literal>-XRecursiveDo</literal>, which is in turn implied by
 843 <literal>-fglasgow-exts</literal>.
 844 </para></listitem>
 845
 846 <listitem><para>
 847 Unlike ordinary do-notation, but like <literal>let</literal> and <literal>where</literal> bindings,
 848 name shadowing is not allowed; that is, all the names bound in a single <literal>mdo</literal> must
 849 be distinct (Section 3.3 of the paper).
 850 </para></listitem>
 851
 852 <listitem><para>
 853 Variables bound by a <literal>let</literal> statement in an <literal>mdo</literal>
 854 are monomorphic in the <literal>mdo</literal> (Section 3.1 of the paper).  However
 855 GHC breaks the <literal>mdo</literal> into segments to enhance polymorphism,
 856 and improve termination (Section 3.2 of the paper).
 857 </para></listitem>
 858 </itemizedlist>
 859 </para>
 860
 861 <para>
The web page <ulink url="http://www.cse.ogi.edu/PacSoft/projects/rmb/">http://www.cse.ogi.edu/PacSoft/projects/rmb/</ulink>
contains up-to-date information on recursive monadic bindings.
 864 </para>
 865
 866 <para>
 867 Historical note: The old implementation of the mdo-notation (and most
 868 of the existing documents) used the name
 869 <literal>MonadRec</literal> for the class and the corresponding library.
 870 This name is not supported by GHC.
 871 </para>
 872
 873 </sect2>
 874
 875
 876    <!-- ===================== PARALLEL LIST COMPREHENSIONS ===================  -->
 877
 878   <sect2 id="parallel-list-comprehensions">
 879     <title>Parallel List Comprehensions</title>
 880     <indexterm><primary>list comprehensions</primary><secondary>parallel</secondary>
 881     </indexterm>
 882     <indexterm><primary>parallel list comprehensions</primary>
 883     </indexterm>
 884
 885     <para>Parallel list comprehensions are a natural extension to list
 886     comprehensions.  List comprehensions can be thought of as a nice
 887     syntax for writing maps and filters.  Parallel comprehensions
 888     extend this to include the zipWith family.</para>
 889
 890     <para>A parallel list comprehension has multiple independent
 891     branches of qualifier lists, each separated by a `|' symbol.  For
 892     example, the following zips together two lists:</para>
 893
 894 <programlisting>
 895    [ (x, y) | x &lt;- xs | y &lt;- ys ]
 896 </programlisting>
 897
 898     <para>The behavior of parallel list comprehensions follows that of
 899     zip, in that the resulting list will have the same length as the
 900     shortest branch.</para>
 901
 902     <para>We can define parallel list comprehensions by translation to
 903     regular comprehensions.  Here's the basic idea:</para>
 904
 905     <para>Given a parallel comprehension of the form: </para>
 906
 907 <programlisting>
 908    [ e | p1 &lt;- e11, p2 &lt;- e12, ...
 909        | q1 &lt;- e21, q2 &lt;- e22, ...
 910        ...
 911    ]
 912 </programlisting>
 913
 914     <para>This will be translated to: </para>
 915
 916 <programlisting>
 917    [ e | ((p1,p2), (q1,q2), ...) &lt;- zipN [(p1,p2) | p1 &lt;- e11, p2 &lt;- e12, ...]
 918                                          [(q1,q2) | q1 &lt;- e21, q2 &lt;- e22, ...]
 919                                          ...
 920    ]
 921 </programlisting>
 922
 923     <para>where `zipN' is the appropriate zip for the given number of
 924     branches.</para>
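<para>
A small sketch of the translation, assuming the extension is enabled with
the <literal>ParallelListComp</literal> language pragma (the names
<literal>viaComprehension</literal> and <literal>viaZip</literal> are
invented for this example):
<programlisting>
{-# LANGUAGE ParallelListComp #-}
module Main where

-- Two branches; the first branch has two qualifiers of its own.
viaComprehension :: [(Int, Char, Int)]
viaComprehension = [ (x, y, z) | x &lt;- [1, 2], y &lt;- "ab"
                               | z &lt;- [10, 20, 30] ]

-- The same list after the translation described above, with zip as zipN.
viaZip :: [(Int, Char, Int)]
viaZip = [ (x, y, z) | ((x, y), z) &lt;- zip [ (x, y) | x &lt;- [1, 2], y &lt;- "ab" ]
                                             [ z | z &lt;- [10, 20, 30] ] ]

main :: IO ()
main = print (viaComprehension == viaZip, viaComprehension)
-- prints (True,[(1,'a',10),(1,'b',20),(2,'a',30)])
</programlisting>
The first branch produces four pairs but the second only three elements,
so the result has three elements.
</para>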
 925
 926   </sect2>
 927
 928   <!-- ===================== TRANSFORM LIST COMPREHENSIONS ===================  -->
 929
 930   <sect2 id="generalised-list-comprehensions">
 931     <title>Generalised (SQL-Like) List Comprehensions</title>
 932     <indexterm><primary>list comprehensions</primary><secondary>generalised</secondary>
 933     </indexterm>
 934     <indexterm><primary>extended list comprehensions</primary>
 935     </indexterm>
 936     <indexterm><primary>group</primary></indexterm>
 937     <indexterm><primary>sql</primary></indexterm>
 938
 939
 940     <para>Generalised list comprehensions are a further enhancement to the
    list comprehension syntactic sugar to allow operations such as sorting
 942     and grouping which are familiar from SQL.   They are fully described in the
 943         paper <ulink url="http://research.microsoft.com/~simonpj/papers/list-comp">
 944           Comprehensive comprehensions: comprehensions with "order by" and "group by"</ulink>,
 945     except that the syntax we use differs slightly from the paper.</para>
 946 <para>Here is an example:
 947 <programlisting>
 948 employees = [ ("Simon", "MS", 80)
 949 , ("Erik", "MS", 100)
 950 , ("Phil", "Ed", 40)
 951 , ("Gordon", "Ed", 45)
 952 , ("Paul", "Yale", 60)]
 953
 954 output = [ (the dept, sum salary)
 955 | (name, dept, salary) &lt;- employees
 956 , then group by dept
 957 , then sortWith by (sum salary)
 958 , then take 5 ]
 959 </programlisting>
 960 In this example, the list <literal>output</literal> would take on
 961     the value:
 962
 963 <programlisting>
 964 [("Yale", 60), ("Ed", 85), ("MS", 180)]
 965 </programlisting>
 966 </para>
 967 <para>There are three new keywords: <literal>group</literal>, <literal>by</literal>, and <literal>using</literal>.
 968 (The function <literal>sortWith</literal> is not a keyword; it is an ordinary
 969 function that is exported by <literal>GHC.Exts</literal>.)</para>
 970
 971 <para>There are five new forms of comprehension qualifier,
 972 all introduced by the (existing) keyword <literal>then</literal>:
 973     <itemizedlist>
 974     <listitem>
 975
 976 <programlisting>
 977 then f
 978 </programlisting>
 979
    <para>This statement requires that <literal>f</literal> have the type
    <literal>forall a. [a] -> [a]</literal>. You can see an example of its use in the
    motivating example, where this form is used to apply <literal>take 5</literal>.</para>
 983
 984     </listitem>
 985
 986
 987     <listitem>
 988 <para>
 989 <programlisting>
 990 then f by e
 991 </programlisting>
 992
 993     This form is similar to the previous one, but allows you to create a function
 994     which will be passed as the first argument to f. As a consequence f must have
 995     the type <literal>forall a. (a -> t) -> [a] -> [a]</literal>. As you can see
 996     from the type, this function lets f &quot;project out&quot; some information
 997     from the elements of the list it is transforming.</para>
 998
 999     <para>An example is shown in the opening example, where <literal>sortWith</literal>
1000     is supplied with a function that lets it find out the <literal>sum salary</literal>
1001     for any item in the list comprehension it transforms.</para>
1002
1003     </listitem>
1004
1005
1006     <listitem>
1007
1008 <programlisting>
1009 then group by e using f
1010 </programlisting>
1011
1012     <para>This is the most general of the grouping-type statements. In this form,
1013     f is required to have type <literal>forall a. (a -> t) -> [a] -> [[a]]</literal>.
1014     As with the <literal>then f by e</literal> case above, the first argument
1015     is a function supplied to f by the compiler which lets it compute e on every
1016     element of the list being transformed. However, unlike the non-grouping case,
1017     f additionally partitions the list into a number of sublists: this means that
1018     at every point after this statement, binders occurring before it in the comprehension
1019     refer to <emphasis>lists</emphasis> of possible values, not single values. To help understand
1020     this, let's look at an example:</para>
1021
1022 <programlisting>
1023 -- This works similarly to groupWith in GHC.Exts, but doesn't sort its input first
1024 groupRuns :: Eq b => (a -> b) -> [a] -> [[a]]
1025 groupRuns f = groupBy (\x y -> f x == f y)
1026
1027 output = [ (the x, y)
1028 | x &lt;- ([1..3] ++ [1..2])
1029 , y &lt;- [4..6]
1030 , then group by x using groupRuns ]
1031 </programlisting>
1032
1033     <para>This results in the variable <literal>output</literal> taking on the value below:</para>
1034
1035 <programlisting>
1036 [(1, [4, 5, 6]), (2, [4, 5, 6]), (3, [4, 5, 6]), (1, [4, 5, 6]), (2, [4, 5, 6])]
1037 </programlisting>
1038
1039     <para>Note that we have used the <literal>the</literal> function to change the type
1040     of x from a list to its original numeric type. The variable y, in contrast, is left
1041     unchanged from the list form introduced by the grouping.</para>
1042
1043     </listitem>
1044
1045     <listitem>
1046
1047 <programlisting>
1048 then group by e
1049 </programlisting>
1050
1051     <para>This form of grouping is essentially the same as the one described above. However,
1052     since no function to use for the grouping has been supplied it will fall back on the
1053     <literal>groupWith</literal> function defined in
1054     <ulink url="../libraries/base/GHC-Exts.html"><literal>GHC.Exts</literal></ulink>. This
1055     is the form of the group statement that we made use of in the opening example.</para>
1056
1057     </listitem>
1058
1059
1060     <listitem>
1061
1062 <programlisting>
1063 then group using f
1064 </programlisting>
1065
1066     <para>With this form of the group statement, f is required to simply have the type
1067     <literal>forall a. [a] -> [[a]]</literal>, which will be used to group up the
1068     comprehension so far directly. An example of this form is as follows:</para>
1069
1070 <programlisting>
1071 output = [ x
1072 | y &lt;- [1..5]
1073 , x &lt;- "hello"
1074 , then group using inits]
1075 </programlisting>
1076
1077     <para>This will yield a list containing every prefix of the word "hello" written out 5 times:</para>
1078
1079 <programlisting>
1080 ["","h","he","hel","hell","hello","helloh","hellohe","hellohel","hellohell","hellohello","hellohelloh",...]
1081 </programlisting>
1082
1083     </listitem>
1084 </itemizedlist>
1085 </para>
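<para>
The two non-grouping forms can be tried out with a short program such as
the sketch below.  It assumes the extension is enabled with the
<literal>TransformListComp</literal> language pragma; the name
<literal>shortest</literal> and the word list are made up for the example.
<programlisting>
{-# LANGUAGE TransformListComp #-}
module Main where

import GHC.Exts (sortWith)

shortest :: [String]
shortest = [ w | w &lt;- ["pear", "fig", "banana", "ab"]
               , then sortWith by (length w)   -- "then f by e": order by length
               , then take 3 ]                 -- "then f": keep the first three

main :: IO ()
main = print shortest
-- prints ["ab","fig","pear"]
</programlisting>
</para>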
1086   </sect2>
1087
1088    <!-- ===================== REBINDABLE SYNTAX ===================  -->
1089
1090 <sect2 id="rebindable-syntax">
1091 <title>Rebindable syntax and the implicit Prelude import</title>
1092
1093  <para><indexterm><primary>-XNoImplicitPrelude
1094  option</primary></indexterm> GHC normally imports
1095  <filename>Prelude.hi</filename> files for you.  If you'd
1096  rather it didn't, then give it a
1097  <option>-XNoImplicitPrelude</option> option.  The idea is
1098  that you can then import a Prelude of your own.  (But don't
1099  call it <literal>Prelude</literal>; the Haskell module
1100  namespace is flat, and you must not conflict with any
1101  Prelude module.)</para>
1102
1103             <para>Suppose you are importing a Prelude of your own
1104               in order to define your own numeric class
1105             hierarchy.  It completely defeats that purpose if the
1106             literal "1" means "<literal>Prelude.fromInteger
1107             1</literal>", which is what the Haskell Report specifies.
1108             So the <option>-XNoImplicitPrelude</option>
1109               flag <emphasis>also</emphasis> causes
1110             the following pieces of built-in syntax to refer to
1111             <emphasis>whatever is in scope</emphasis>, not the Prelude
1112             versions:
1113             <itemizedlist>
1114               <listitem>
1115                 <para>An integer literal <literal>368</literal> means
1116                 "<literal>fromInteger (368::Integer)</literal>", rather than
1117                 "<literal>Prelude.fromInteger (368::Integer)</literal>".
1118 </para> </listitem>
1119
      <listitem><para>Fractional literals are handled in just the same way,
1121           except that the translation is
1122               <literal>fromRational (3.68::Rational)</literal>.
1123 </para> </listitem>
1124
1125           <listitem><para>The equality test in an overloaded numeric pattern
1126               uses whatever <literal>(==)</literal> is in scope.
1127 </para> </listitem>
1128
1129           <listitem><para>The subtraction operation, and the
1130           greater-than-or-equal test, in <literal>n+k</literal> patterns
1131               use whatever <literal>(-)</literal> and <literal>(>=)</literal> are in scope.
1132               </para></listitem>
1133
1134               <listitem>
1135                 <para>Negation (e.g. "<literal>- (f x)</literal>")
1136                 means "<literal>negate (f x)</literal>", both in numeric
1137                 patterns, and expressions.
1138               </para></listitem>
1139
1140               <listitem>
1141           <para>"Do" notation is translated using whatever
1142               functions <literal>(>>=)</literal>,
1143               <literal>(>>)</literal>, and <literal>fail</literal>,
1144               are in scope (not the Prelude
1145               versions).  List comprehensions, mdo (<xref linkend="mdo-notation"/>), and parallel array
1146               comprehensions, are unaffected.  </para></listitem>
1147
1148               <listitem>
1149                 <para>Arrow
1150                 notation (see <xref linkend="arrow-notation"/>)
1151                 uses whatever <literal>arr</literal>,
1152                 <literal>(>>>)</literal>, <literal>first</literal>,
1153                 <literal>app</literal>, <literal>(|||)</literal> and
1154                 <literal>loop</literal> functions are in scope. But unlike the
1155                 other constructs, the types of these functions must match the
1156                 Prelude types very closely.  Details are in flux; if you want
1157                 to use this, ask!
1158               </para></listitem>
1159             </itemizedlist>
1160 In all cases (apart from arrow notation), the static semantics should be that of the desugared form,
1161 even if that is a little unexpected. For example, the
1162 static semantics of the literal <literal>368</literal>
1163 is exactly that of <literal>fromInteger (368::Integer)</literal>; it's fine for
1164 <literal>fromInteger</literal> to have any of the types:
1165 <programlisting>
1166 fromInteger :: Integer -> Integer
1167 fromInteger :: forall a. Foo a => Integer -> a
1168 fromInteger :: Num a => a -> Integer
1169 fromInteger :: Integer -> Bool -> Bool
1170 </programlisting>
1171 </para>
1172
1173              <para>Be warned: this is an experimental facility, with
1174              fewer checks than usual.  Use <literal>-dcore-lint</literal>
1175              to typecheck the desugared program.  If Core Lint is happy
1176              you should be all right.</para>
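<para>
As a rough sketch of the literal rule, the module below supplies its own
<literal>fromInteger</literal>, so the literal <literal>368</literal> is
built with it rather than with the Prelude version.  The type
<literal>Cents</literal> and its field are hypothetical.  One assumption
to note: the sketch uses the <literal>RebindableSyntax</literal> pragma,
which is how later GHC releases spell this behaviour; with a compiler
matching the description above, <literal>NoImplicitPrelude</literal>
would be used instead.
<programlisting>
{-# LANGUAGE RebindableSyntax #-}
module Main where

-- No Prelude is imported implicitly, so bring in what is needed by hand,
-- leaving out the Prelude's fromInteger.
import Prelude hiding (fromInteger)

newtype Cents = Cents { unCents :: Integer }

-- The fromInteger that integer literals will now use.
fromInteger :: Integer -> Cents
fromInteger = Cents

price :: Cents
price = 368          -- means: fromInteger (368 :: Integer)

main :: IO ()
main = print (unCents price)
-- prints 368
</programlisting>
</para>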
1177
1178 </sect2>
1179
1180 <sect2 id="postfix-operators">
1181 <title>Postfix operators</title>
1182
1183 <para>
1184 GHC allows a small extension to the syntax of left operator sections, which
1185 allows you to define postfix operators.  The extension is this:  the left section
1186 <programlisting>
1187   (e !)
1188 </programlisting>
1189 is equivalent (from the point of view of both type checking and execution) to the expression
1190 <programlisting>
1191   ((!) e)
1192 </programlisting>
(for any expression <literal>e</literal> and operator <literal>(!)</literal>).
1194 The strict Haskell 98 interpretation is that the section is equivalent to
1195 <programlisting>
1196   (\y -> (!) e y)
1197 </programlisting>
1198 That is, the operator must be a function of two arguments.  GHC allows it to
1199 take only one argument, and that in turn allows you to write the function
1200 postfix.
1201 </para>
1202 <para>Since this extension goes beyond Haskell 98, it should really be enabled
1203 by a flag; but in fact it is enabled all the time.  (No Haskell 98 programs
1204 change their behaviour, of course.)
1205 </para>
1206 <para>The extension does not extend to the left-hand side of function
1207 definitions; you must define such a function in prefix form.</para>
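<para>
For instance, a factorial operator might be written as in the sketch
below.  The operator definition is invented for illustration, and the
<literal>PostfixOperators</literal> pragma is included on the assumption
that the compiler in use requires the extension to be requested
explicitly, as later GHC releases do.
<programlisting>
{-# LANGUAGE PostfixOperators #-}
module Main where

-- Defined in prefix form, since the extension does not cover left-hand sides.
(!) :: Integer -> Integer
(!) n = product [1 .. n]

main :: IO ()
main = print ((5 !))     -- the section (5 !) means ((!) 5)
-- prints 120
</programlisting>
</para>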
1208
1209 </sect2>
1210
1211 <sect2 id="disambiguate-fields">
1212 <title>Record field disambiguation</title>
1213 <para>
1214 In record construction and record pattern matching
1215 it is entirely unambiguous which field is referred to, even if there are two different
1216 data types in scope with a common field name.  For example:
1217 <programlisting>
1218 module M where
1219   data S = MkS { x :: Int, y :: Bool }
1220
1221 module Foo where
1222   import M
1223
1224   data T = MkT { x :: Int }
1225
1226   ok1 (MkS { x = n }) = n+1   -- Unambiguous
1227
1228   ok2 n = MkT { x = n+1 }     -- Unambiguous
1229
1230   bad1 k = k { x = 3 }  -- Ambiguous
1231   bad2 k = x k          -- Ambiguous
1232 </programlisting>
1233 Even though there are two <literal>x</literal>'s in scope,
1234 it is clear that the <literal>x</literal> in the pattern in the
1235 definition of <literal>ok1</literal> can only mean the field
1236 <literal>x</literal> from type <literal>S</literal>. Similarly for
1237 the function <literal>ok2</literal>.  However, in the record update
1238 in <literal>bad1</literal> and the record selection in <literal>bad2</literal>
1239 it is not clear which of the two types is intended.
1240 </para>
1241 <para>
1242 Haskell 98 regards all four as ambiguous, but with the
1243 <option>-fdisambiguate-record-fields</option> flag, GHC will accept
1244 the former two.  The rules are precisely the same as those for instance
1245 declarations in Haskell 98, where the method names on the left-hand side
1246 of the method bindings in an instance declaration refer unambiguously
1247 to the method of that class (provided they are in scope at all), even
1248 if there are other variables in scope with the same name.
1249 This reduces the clutter of qualified names when you import two
1250 records from different modules that use the same field name.
1251 </para>
1252 </sect2>
1253
1254     <!-- ===================== Record puns ===================  -->
1255
1256 <sect2 id="record-puns">
1257 <title>Record puns
1258 </title>
1259
1260 <para>
1261 Record puns are enabled by the flag <literal>-XNamedFieldPuns</literal>.
1262 </para>
1263
1264 <para>
1265 When using records, it is common to write a pattern that binds a
1266 variable with the same name as a record field, such as:
1267
1268 <programlisting>
1269 data C = C {a :: Int}
1270 f (C {a = a}) = a
1271 </programlisting>
1272 </para>
1273
1274 <para>
1275 Record punning permits the variable name to be elided, so one can simply
1276 write
1277
1278 <programlisting>
1279 f (C {a}) = a
1280 </programlisting>
1281
1282 to mean the same pattern as above.  That is, in a record pattern, the
1283 pattern <literal>a</literal> expands into the pattern <literal>a =
1284 a</literal> for the same name <literal>a</literal>.
1285 </para>
1286
1287 <para>
1288 Note that puns and other patterns can be mixed in the same record:
1289 <programlisting>
1290 data C = C {a :: Int, b :: Int}
1291 f (C {a, b = 4}) = a
1292 </programlisting>
1293 and that puns can be used wherever record patterns occur (e.g. in
1294 <literal>let</literal> bindings or at the top-level).
1295 </para>
1296
1297 <para>
1298 Record punning can also be used in an expression, writing, for example,
1299 <programlisting>
1300 let a = 1 in C {a}
1301 </programlisting>
1302 instead of
1303 <programlisting>
1304 let a = 1 in C {a = a}
1305 </programlisting>
1306
1307 Note that this expansion is purely syntactic, so the record pun
1308 expression refers to the nearest enclosing variable that is spelled the
1309 same as the field name.
1310 </para>
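<para>
The fragments above can be combined into a complete module; the following
is only a sketch, with the <literal>Point</literal> type and the function
<literal>shiftX</literal> invented for the purpose.
<programlisting>
{-# LANGUAGE NamedFieldPuns #-}
module Main where

data Point = Point { px :: Int, py :: Int }

-- Puns in the pattern bind px and py; in the expression, the pun py
-- stands for py = py, mixed with an ordinary field assignment for px.
shiftX :: Int -> Point -> Point
shiftX dx (Point {px, py}) = Point {px = px + dx, py}

main :: IO ()
main = case shiftX 10 (Point {px = 1, py = 2}) of
         Point {px, py} -> print (px, py)
-- prints (11,2)
</programlisting>
</para>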
1311
1312 </sect2>
1313
1314     <!-- ===================== Record wildcards ===================  -->
1315
1316 <sect2 id="record-wildcards">
1317 <title>Record wildcards
1318 </title>
1319
1320 <para>
1321 Record wildcards are enabled by the flag <literal>-XRecordWildCards</literal>.
1322 </para>
1323
1324 <para>
1325 For records with many fields, it can be tiresome to write out each field
1326 individually in a record pattern, as in
1327 <programlisting>
1328 data C = C {a :: Int, b :: Int, c :: Int, d :: Int}
1329 f (C {a = 1, b = b, c = c, d = d}) = b + c + d
1330 </programlisting>
1331 </para>
1332
1333 <para>
1334 Record wildcard syntax permits a (<literal>..</literal>) in a record
1335 pattern, where each elided field <literal>f</literal> is replaced by the
1336 pattern <literal>f = f</literal>.  For example, the above pattern can be
1337 written as
1338 <programlisting>
1339 f (C {a = 1, ..}) = b + c + d
1340 </programlisting>
1341 </para>
1342
1343 <para>
1344 Note that wildcards can be mixed with other patterns, including puns
1345 (<xref linkend="record-puns"/>); for example, in a pattern <literal>C {a
= 1, b, ..}</literal>.  Additionally, record wildcards can be used
1347 wherever record patterns occur, including in <literal>let</literal>
1348 bindings and at the top-level.  For example, the top-level binding
1349 <programlisting>
1350 C {a = 1, ..} = e
1351 </programlisting>
1352 defines <literal>b</literal>, <literal>c</literal>, and
1353 <literal>d</literal>.
1354 </para>
1355
1356 <para>
1357 Record wildcards can also be used in expressions, writing, for example,
1358
1359 <programlisting>
1360 let {a = 1; b = 2; c = 3; d = 4} in C {..}
1361 </programlisting>
1362
1363 in place of
1364
1365 <programlisting>
1366 let {a = 1; b = 2; c = 3; d = 4} in C {a=a, b=b, c=c, d=d}
1367 </programlisting>
1368
1369 Note that this expansion is purely syntactic, so the record wildcard
1370 expression refers to the nearest enclosing variables that are spelled
1371 the same as the omitted field names.
1372 </para>
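<para>
Putting the pattern and expression forms together gives a complete
program along the following lines.  This is a sketch only; the name
<literal>example</literal> and the fallback equation for
<literal>f</literal> are additions made for the illustration.
<programlisting>
{-# LANGUAGE RecordWildCards #-}
module Main where

data C = C { a :: Int, b :: Int, c :: Int, d :: Int }

-- Wildcard in a pattern: b, c and d are bound by the elided fields.
f :: C -> Int
f (C {a = 1, ..}) = b + c + d
f _               = 0

-- Wildcard in an expression: every field is taken from the enclosing let.
example :: C
example = let {a = 1; b = 2; c = 3; d = 4} in C {..}

main :: IO ()
main = print (f example)
-- prints 9
</programlisting>
</para>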
1373
1374 </sect2>
1375
1376     <!-- ===================== Local fixity declarations ===================  -->
1377
1378 <sect2 id="local-fixity-declarations">
1379 <title>Local Fixity Declarations
1380 </title>
1381
1382 <para>A careful reading of the Haskell 98 Report reveals that fixity
1383 declarations (<literal>infix</literal>, <literal>infixl</literal>, and
1384 <literal>infixr</literal>) are permitted to appear inside local bindings
such as those introduced by <literal>let</literal> and
1386 <literal>where</literal>.  However, the Haskell Report does not specify
1387 the semantics of such bindings very precisely.
1388 </para>
1389
1390 <para>In GHC, a fixity declaration may accompany a local binding:
1391 <programlisting>
1392 let f = ...
1393     infixr 3 `f`
1394 in
1395     ...
1396 </programlisting>
1397 and the fixity declaration applies wherever the binding is in scope.
1398 For example, in a <literal>let</literal>, it applies in the right-hand
1399 sides of other <literal>let</literal>-bindings and the body of the
<literal>let</literal>.  Or, in recursive <literal>do</literal>
1401 expressions (<xref linkend="mdo-notation"/>), the local fixity
1402 declarations of a <literal>let</literal> statement scope over other
1403 statements in the group, just as the bound name does.
1404 </para>
1405
1406 <para>
Moreover, a local fixity declaration <emphasis>must</emphasis> accompany a local binding of
that name: it is not possible to revise the fixity of a name bound
1409 elsewhere, as in
1410 <programlisting>
1411 let infixr 9 $ in ...
1412 </programlisting>
1413
1414 Because local fixity declarations are technically Haskell 98, no flag is
1415 necessary to enable them.
1416 </para>
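<para>
A small sketch of a local fixity declaration in use; the function
<literal>plus</literal> and the chosen precedence are hypothetical.
Without the declaration, <literal>`plus`</literal> would have the default
fixity <literal>infixl 9</literal> and bind more tightly than
<literal>*</literal>.
<programlisting>
module Main where

result :: Int
result =
  let infixl 1 `plus`          -- local fixity, accompanying the binding below
      plus :: Int -> Int -> Int
      plus = (+)
  in 2 * 3 `plus` 4            -- parsed as (2 * 3) `plus` 4

main :: IO ()
main = print result
-- prints 10
</programlisting>
</para>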
1417 </sect2>
1418
1419 <sect2 id="package-imports">
1420   <title>Package-qualified imports</title>
1421
1422   <para>With the <option>-XPackageImports</option> flag, GHC allows
1423   import declarations to be qualified by the package name that the
1424     module is intended to be imported from.  For example:</para>
1425
1426 <programlisting>
1427 import "network" Network.Socket
1428 </programlisting>
1429
1430   <para>would import the module <literal>Network.Socket</literal> from
1431     the package <literal>network</literal> (any version).  This may
1432     be used to disambiguate an import when the same module is
1433     available from multiple packages, or is present in both the
1434     current package being built and an external package.</para>
1435
  <para>Note: you probably don't need to use this feature; it was
1437     added mainly so that we can build backwards-compatible versions of
1438     packages when APIs change.  It can lead to fragile dependencies in
1439     the common case: modules occasionally move from one package to
1440     another, rendering any package-qualified imports broken.</para>
1441 </sect2>
1442
1443 <sect2 id="syntax-stolen">
1444 <title>Summary of stolen syntax</title>
1445
1446     <para>Turning on an option that enables special syntax
1447     <emphasis>might</emphasis> cause working Haskell 98 code to fail
1448     to compile, perhaps because it uses a variable name which has
1449     become a reserved word.  This section lists the syntax that is
1450     "stolen" by language extensions.
1451      We use
1452     notation and nonterminal names from the Haskell 98 lexical syntax
1453     (see the Haskell 98 Report).
1454     We only list syntax changes here that might affect
1455     existing working programs (i.e. "stolen" syntax).  Many of these
1456     extensions will also enable new context-free syntax, but in all
1457     cases programs written to use the new syntax would not be
1458     compilable without the option enabled.</para>
1459
1460 <para>There are two classes of special
1461     syntax:
1462
1463     <itemizedlist>
1464       <listitem>
1465         <para>New reserved words and symbols: character sequences
1466         which are no longer available for use as identifiers in the
1467         program.</para>
1468       </listitem>
1469       <listitem>
1470         <para>Other special syntax: sequences of characters that have
1471         a different meaning when this particular option is turned
1472         on.</para>
1473       </listitem>
1474     </itemizedlist>
1475
1476 The following syntax is stolen:
1477
1478     <variablelist>
1479       <varlistentry>
1480         <term>
1481           <literal>forall</literal>
1482           <indexterm><primary><literal>forall</literal></primary></indexterm>
1483         </term>
1484         <listitem><para>
1485         Stolen (in types) by: <option>-XScopedTypeVariables</option>,
1486             <option>-XLiberalTypeSynonyms</option>,
1487             <option>-XRank2Types</option>,
1488             <option>-XRankNTypes</option>,
1489             <option>-XPolymorphicComponents</option>,
1490             <option>-XExistentialQuantification</option>
1491           </para></listitem>
1492       </varlistentry>
1493
1494       <varlistentry>
1495         <term>
1496           <literal>mdo</literal>
1497           <indexterm><primary><literal>mdo</literal></primary></indexterm>
1498         </term>
1499         <listitem><para>
        Stolen by: <option>-XRecursiveDo</option>
1501           </para></listitem>
1502       </varlistentry>
1503
1504       <varlistentry>
1505         <term>
1506           <literal>foreign</literal>
1507           <indexterm><primary><literal>foreign</literal></primary></indexterm>
1508         </term>
1509         <listitem><para>
        Stolen by: <option>-XForeignFunctionInterface</option>
1511           </para></listitem>
1512       </varlistentry>
1513
1514       <varlistentry>
1515         <term>
1516           <literal>rec</literal>,
1517           <literal>proc</literal>, <literal>-&lt;</literal>,
1518           <literal>&gt;-</literal>, <literal>-&lt;&lt;</literal>,
1519           <literal>&gt;&gt;-</literal>, and <literal>(|</literal>,
1520           <literal>|)</literal> brackets
1521           <indexterm><primary><literal>proc</literal></primary></indexterm>
1522         </term>
1523         <listitem><para>
        Stolen by: <option>-XArrows</option>
1525           </para></listitem>
1526       </varlistentry>
1527
1528       <varlistentry>
1529         <term>
1530           <literal>?<replaceable>varid</replaceable></literal>,
1531           <literal>%<replaceable>varid</replaceable></literal>
1532           <indexterm><primary>implicit parameters</primary></indexterm>
1533         </term>
1534         <listitem><para>
        Stolen by: <option>-XImplicitParams</option>
1536           </para></listitem>
1537       </varlistentry>
1538
1539       <varlistentry>
1540         <term>
1541           <literal>[|</literal>,
1542           <literal>[e|</literal>, <literal>[p|</literal>,
1543           <literal>[d|</literal>, <literal>[t|</literal>,
1544           <literal>$(</literal>,
1545           <literal>$<replaceable>varid</replaceable></literal>
1546           <indexterm><primary>Template Haskell</primary></indexterm>
1547         </term>
1548         <listitem><para>
        Stolen by: <option>-XTemplateHaskell</option>
1550           </para></listitem>
1551       </varlistentry>
1552
1553       <varlistentry>
1554         <term>
1555           <literal>[:<replaceable>varid</replaceable>|</literal>
1556           <indexterm><primary>quasi-quotation</primary></indexterm>
1557         </term>
1558         <listitem><para>
        Stolen by: <option>-XQuasiQuotes</option>
1560           </para></listitem>
1561       </varlistentry>
1562
1563       <varlistentry>
1564         <term>
1565               <replaceable>varid</replaceable>{<literal>&num;</literal>},
1566               <replaceable>char</replaceable><literal>&num;</literal>,
1567               <replaceable>string</replaceable><literal>&num;</literal>,
1568               <replaceable>integer</replaceable><literal>&num;</literal>,
1569               <replaceable>float</replaceable><literal>&num;</literal>,
1570               <replaceable>float</replaceable><literal>&num;&num;</literal>,
              <literal>(&num;</literal>, <literal>&num;)</literal>
1572         </term>
1573         <listitem><para>
        Stolen by: <option>-XMagicHash</option>
1575           </para></listitem>
1576       </varlistentry>
1577     </variablelist>
1578 </para>
1579 </sect2>
1580 </sect1>
1581
1582
1583 <!-- TYPE SYSTEM EXTENSIONS -->
1584 <sect1 id="data-type-extensions">
1585 <title>Extensions to data types and type synonyms</title>
1586
1587 <sect2 id="nullary-types">
1588 <title>Data types with no constructors</title>
1589
1590 <para>With the <option>-fglasgow-exts</option> flag, GHC lets you declare
1591 a data type with no constructors.  For example:</para>
1592
1593 <programlisting>
1594   data S      -- S :: *
1595   data T a    -- T :: * -> *
1596 </programlisting>
1597
1598 <para>Syntactically, the declaration lacks the "= constrs" part.  The
1599 type can be parameterised over types of any kind, but if the kind is
1600 not <literal>*</literal> then an explicit kind annotation must be used
1601 (see <xref linkend="kinding"/>).</para>
1602
1603 <para>Such data types have only one value, namely bottom.
1604 Nevertheless, they can be useful when defining "phantom types".</para>
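<para>As a minimal sketch of the phantom-type idiom, constructor-less types
can serve purely as compile-time tags.  The names <literal>Metres</literal>,
<literal>Feet</literal> and <literal>Distance</literal> are invented for this
illustration, and the module assumes a GHC that accepts the
<literal>EmptyDataDecls</literal> language pragma:
<programlisting>
{-# LANGUAGE EmptyDataDecls #-}
module Main where

-- Constructor-less types, used only as tags (hypothetical names).
data Metres
data Feet

-- The parameter 'unit' is a phantom: it does not occur on the right-hand side.
newtype Distance unit = Distance Double

-- Only distances that carry the same tag can be added.
addDistance :: Distance unit -> Distance unit -> Distance unit
addDistance (Distance x) (Distance y) = Distance (x + y)

toDouble :: Distance unit -> Double
toDouble (Distance x) = x

main :: IO ()
main = print (toDouble (addDistance (Distance 1.5 :: Distance Metres) (Distance 2.5)))
</programlisting>
The program prints <literal>4.0</literal>.  Adding a
<literal>Distance Metres</literal> to a <literal>Distance Feet</literal> is
rejected by the type checker, even though both are plain
<literal>Double</literal>s at run time.</para>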
1605 </sect2>
1606
1607 <sect2 id="infix-tycons">
1608 <title>Infix type constructors, classes, and type variables</title>
1609
1610 <para>
1611 GHC allows type constructors, classes, and type variables to be operators, and
1612 to be written infix, very much like expressions.  More specifically:
1613 <itemizedlist>
1614 <listitem><para>
1615   A type constructor or class can be an operator, beginning with a colon; e.g. <literal>:*:</literal>.
1616   The lexical syntax is the same as that for data constructors.
1617   </para></listitem>
1618 <listitem><para>
1619   Data type and type-synonym declarations can be written infix, parenthesised
1620   if you want further arguments.  E.g.
1621 <screen>
1622   data a :*: b = Foo a b
1623   type a :+: b = Either a b
1624   class a :=: b where ...
1625
1626   data (a :**: b) x = Baz a b x
1627   type (a :++: b) y = Either (a,b) y
1628 </screen>
1629   </para></listitem>
1630 <listitem><para>
1631   Types, and class constraints, can be written infix.  For example
1632   <screen>
1633         x :: Int :*: Bool
1634         f :: (a :=: b) => a -> b
1635   </screen>
1636   </para></listitem>
1637 <listitem><para>
1638   A type variable can be an (unqualified) operator e.g. <literal>+</literal>.
1639   The lexical syntax is the same as that for variable operators, excluding "(.)",
1640   "(!)", and "(*)".  In a binding position, the operator must be
1641   parenthesised.  For example:
1642 <programlisting>
1643    type T (+) = Int + Int
1644    f :: T Either
1645    f = Left 3
1646
1647    liftA2 :: Arrow (~>)
1648           => (a -> b -> c) -> (e ~> a) -> (e ~> b) -> (e ~> c)
1649    liftA2 = ...
1650 </programlisting>
1651   </para></listitem>
1652 <listitem><para>
1653   Back-quotes work
1654   as for expressions, both for type constructors and type variables;  e.g. <literal>Int `Either` Bool</literal>, or
1655   <literal>Int `a` Bool</literal>.  Similarly, parentheses work the same; e.g.  <literal>(:*:) Int Bool</literal>.
1656   </para></listitem>
1657 <listitem><para>
1658   Fixities may be declared for type constructors, or classes, just as for data constructors.  However,
1659   one cannot distinguish between the two in a fixity declaration; a fixity declaration
1660   sets the fixity for a data constructor and the corresponding type constructor.  For example:
1661 <screen>
1662   infixl 7 T, :*:
1663 </screen>
1664   sets the fixity for both type constructor <literal>T</literal> and data constructor <literal>T</literal>,
1665   and similarly for <literal>:*:</literal>.
1667   </para></listitem>
1668 <listitem><para>
1669   Function arrow is <literal>infixr</literal> with fixity 0.  (This might change; I'm not sure what it should be.)
1670   </para></listitem>
1671
1672 </itemizedlist>
1673 </para>
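<para>
The points above can be combined into one small module.  This is only a
sketch: it assumes that the <literal>TypeOperators</literal> language pragma
enables the syntax in your GHC, and the names <literal>triple</literal>,
<literal>firstOf</literal> and <literal>describe</literal> are invented for
the example.
<programlisting>
{-# LANGUAGE TypeOperators #-}
module Main where

-- An infix type constructor, with an infix data constructor of the same name.
data a :*: b = a :*: b

-- One fixity declaration covers both the type and the data constructor.
infixr 6 :*:

-- An infix type synonym.
type a :+: b = Either a b

-- Because of infixr, this type reads as Int :*: (Bool :*: Char).
triple :: Int :*: Bool :*: Char
triple = 1 :*: True :*: 'x'

firstOf :: (a :*: b) -> a
firstOf (x :*: _) = x

describe :: (Int :+: Bool) -> String
describe (Left n)  = "Int " ++ show n
describe (Right b) = "Bool " ++ show b

main :: IO ()
main = do
  print (firstOf triple)
  putStrLn (describe (Right True))
</programlisting>
The program prints <literal>1</literal> and then <literal>Bool True</literal>.
</para>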
1674 </sect2>
1675
1676 <sect2 id="type-synonyms">
1677 <title>Liberalised type synonyms</title>
1678
1679 <para>
1680 Type synonyms are like macros at the type level, but Haskell 98 imposes many rules
1681 on individual synonym declarations.
1682 With the <option>-XLiberalTypeSynonyms</option> extension,
1683 GHC does validity checking on types <emphasis>only after expanding type synonyms</emphasis>.
1684 That means that GHC can be very much more liberal about type synonyms than Haskell 98.
1685
1686 <itemizedlist>
1687 <listitem> <para>You can write a <literal>forall</literal> (including overloading)
1688 in a type synonym, thus:
1689 <programlisting>
1690   type Discard a = forall b. Show b => a -> b -> (a, String)
1691
1692   f :: Discard a
1693   f x y = (x, show y)
1694
1695   g :: Discard Int -> (Int,String)    -- A rank-2 type
1696   g f = f 3 True
1697 </programlisting>
1698 </para>
1699 </listitem>
1700
1701 <listitem><para>
1702 If you also use <option>-XUnboxedTuples</option>,
1703 you can write an unboxed tuple in a type synonym:
1704 <programlisting>
1705   type Pr = (# Int, Int #)
1706
1707   h :: Int -> Pr
1708   h x = (# x, x #)
1709 </programlisting>
1710 </para></listitem>
1711
1712 <listitem><para>
1713 You can apply a type synonym to a forall type:
1714 <programlisting>
1715   type Foo a = a -> a -> Bool
1716
1717   f :: Foo (forall b. b->b)
1718 </programlisting>
1719 After expanding the synonym, <literal>f</literal> has the legal (in GHC) type:
1720 <programlisting>
1721   f :: (forall b. b->b) -> (forall b. b->b) -> Bool
1722 </programlisting>
1723 </para></listitem>
1724
1725 <listitem><para>
1726 You can apply a type synonym to a partially applied type synonym:
1727 <programlisting>
1728   type Generic i o = forall x. i x -> o x
1729   type Id x = x
1730
1731   foo :: Generic Id []
1732 </programlisting>
1733 After expanding the synonym, <literal>foo</literal> has the legal (in GHC) type:
1734 <programlisting>
1735   foo :: forall x. x -> [x]
1736 </programlisting>
1737 </para></listitem>
1738
1739 </itemizedlist>
1740 </para>
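<para>
Two of the cases above can be put together as a complete module.  This is a
sketch rather than a definitive recipe: it assumes that
<literal>RankNTypes</literal> is switched on alongside
<literal>LiberalTypeSynonyms</literal> so that <literal>forall</literal> is
accepted in the synonyms, and the names <literal>wrap</literal> and
<literal>discard</literal> are invented for the example.
<programlisting>
{-# LANGUAGE RankNTypes, LiberalTypeSynonyms #-}
module Main where

type Generic i o = forall x. i x -> o x
type Id x = x

-- 'Id' is partially applied here.  After expansion the type is
-- forall x. x -> [x], which is an ordinary, valid type.
wrap :: Generic Id []
wrap x = [x]

type Discard a = forall b. Show b => a -> b -> (a, String)

discard :: Discard a
discard x y = (x, show y)

main :: IO ()
main = do
  print (wrap 'a')
  print (discard (1 :: Int) True)
</programlisting>
The program prints <literal>"a"</literal> and then
<literal>(1,"True")</literal>.
</para>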
1741
1742 <para>
GHC currently does kind checking before expanding synonyms (though even that
could be changed).
1745 </para>
1746 <para>
1747 After expanding type synonyms, GHC does validity checking on types, looking for
1748 the following mal-formedness which isn't detected simply by kind checking:
1749 <itemizedlist>
1750 <listitem><para>
1751 Type constructor applied to a type involving for-alls.
1752 </para></listitem>
1753 <listitem><para>
1754 Unboxed tuple on left of an arrow.
1755 </para></listitem>
1756 <listitem><para>
1757 Partially-applied type synonym.
1758 </para></listitem>
1759 </itemizedlist>
1760 So, for example,
1761 this will be rejected:
1762 <programlisting>
1763   type Pr = (# Int, Int #)
1764
1765   h :: Pr -> Int
1766   h x = ...
1767 </programlisting>
1768 because GHC does not allow  unboxed tuples on the left of a function arrow.
1769 </para>
1770 </sect2>
1771
1772
1773 <sect2 id="existential-quantification">
1774 <title>Existentially quantified data constructors
1775 </title>
1776
1777 <para>
1778 The idea of using existential quantification in data type declarations
1779 was suggested by Perry, and implemented in Hope+ (Nigel Perry, <emphasis>The Implementation
1780 of Practical Functional Programming Languages</emphasis>, PhD Thesis, University of
1781 London, 1991). It was later formalised by Laufer and Odersky
1782 (<emphasis>Polymorphic type inference and abstract data types</emphasis>,
1783 TOPLAS, 16(5), pp1411-1430, 1994).
1784 It's been in Lennart
1785 Augustsson's <command>hbc</command> Haskell compiler for several years, and
1786 proved very useful.  Here's the idea.  Consider the declaration:
1787 </para>
1788
1789 <para>
1790
1791 <programlisting>
1792   data Foo = forall a. MkFoo a (a -> Bool)
1793            | Nil
1794 </programlisting>
1795
1796 </para>
1797
1798 <para>
1799 The data type <literal>Foo</literal> has two constructors with types:
1800 </para>
1801
1802 <para>
1803
1804 <programlisting>
1805   MkFoo :: forall a. a -> (a -> Bool) -> Foo
1806   Nil   :: Foo
1807 </programlisting>
1808
1809 </para>
1810
1811 <para>
1812 Notice that the type variable <literal>a</literal> in the type of <function>MkFoo</function>
1813 does not appear in the data type itself, which is plain <literal>Foo</literal>.
1814 For example, the following expression is fine:
1815 </para>
1816
1817 <para>
1818
1819 <programlisting>
1820   [MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]
1821 </programlisting>
1822
1823 </para>
1824
1825 <para>
1826 Here, <literal>(MkFoo 3 even)</literal> packages an integer with a function
1827 <function>even</function> that maps an integer to <literal>Bool</literal>; and <function>MkFoo 'c'
1828 isUpper</function> packages a character with a compatible function.  These
1829 two things are each of type <literal>Foo</literal> and can be put in a list.
1830 </para>
1831
1832 <para>
What can we do with a value of type <literal>Foo</literal>?  In particular,
1834 what happens when we pattern-match on <function>MkFoo</function>?
1835 </para>
1836
1837 <para>
1838
1839 <programlisting>
1840   f (MkFoo val fn) = ???
1841 </programlisting>
1842
1843 </para>
1844
1845 <para>
1846 Since all we know about <literal>val</literal> and <function>fn</function> is that they
1847 are compatible, the only (useful) thing we can do with them is to
1848 apply <function>fn</function> to <literal>val</literal> to get a boolean.  For example:
1849 </para>
1850
1851 <para>
1852
1853 <programlisting>
1854   f :: Foo -> Bool
1855   f (MkFoo val fn) = fn val
1856 </programlisting>
1857
1858 </para>
1859
1860 <para>
1861 What this allows us to do is to package heterogeneous values
1862 together with a bunch of functions that manipulate them, and then treat
1863 that collection of packages in a uniform manner.  You can express
1864 quite a bit of object-oriented-like programming this way.
1865 </para>
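<para>
As a small, self-contained sketch of this idea, the module below maps one
function over a heterogeneous list of <literal>Foo</literal> values.  The
name <literal>applyFoo</literal> and the choice to treat
<literal>Nil</literal> as <literal>False</literal> are assumptions made for
the example.
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}
module Main where

import Data.Char (isUpper)

data Foo = forall a. MkFoo a (a -> Bool)
         | Nil

-- Apply the packaged function to the packaged value.
-- Treating Nil as False is an arbitrary choice for this sketch.
applyFoo :: Foo -> Bool
applyFoo (MkFoo val fn) = fn val
applyFoo Nil            = False

foos :: [Foo]
foos = [MkFoo (4 :: Int) even, MkFoo 'C' isUpper, Nil]

main :: IO ()
main = print (map applyFoo foos)
</programlisting>
The program prints <literal>[True,True,False]</literal>.
</para>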
1866
1867 <sect3 id="existential">
1868 <title>Why existential?
1869 </title>
1870
1871 <para>
1872 What has this to do with <emphasis>existential</emphasis> quantification?
1873 Simply that <function>MkFoo</function> has the (nearly) isomorphic type
1874 </para>
1875
1876 <para>
1877
1878 <programlisting>
1879   MkFoo :: (exists a . (a, a -> Bool)) -> Foo
1880 </programlisting>
1881
1882 </para>
1883
1884 <para>
1885 But Haskell programmers can safely think of the ordinary
1886 <emphasis>universally</emphasis> quantified type given above, thereby avoiding
1887 adding a new existential quantification construct.
1888 </para>
1889
1890 </sect3>
1891
1892 <sect3 id="existential-with-context">
1893 <title>Existentials and type classes</title>
1894
1895 <para>
1896 An easy extension is to allow
1897 arbitrary contexts before the constructor.  For example:
1898 </para>
1899
1900 <para>
1901
1902 <programlisting>
1903 data Baz = forall a. Eq a => Baz1 a a
1904          | forall b. Show b => Baz2 b (b -> b)
1905 </programlisting>
1906
1907 </para>
1908
1909 <para>
1910 The two constructors have the types you'd expect:
1911 </para>
1912
1913 <para>
1914
1915 <programlisting>
1916 Baz1 :: forall a. Eq a => a -> a -> Baz
1917 Baz2 :: forall b. Show b => b -> (b -> b) -> Baz
1918 </programlisting>
1919
1920 </para>
1921
1922 <para>
1923 But when pattern matching on <function>Baz1</function> the matched values can be compared
1924 for equality, and when pattern matching on <function>Baz2</function> the first matched
1925 value can be converted to a string (as well as applying the function to it).
1926 So this program is legal:
1927 </para>
1928
1929 <para>
1930
1931 <programlisting>
1932   f :: Baz -> String
1933   f (Baz1 p q) | p == q    = "Yes"
1934                | otherwise = "No"
1935   f (Baz2 v fn)            = show (fn v)
1936 </programlisting>
1937
1938 </para>
1939
1940 <para>
1941 Operationally, in a dictionary-passing implementation, the
1942 constructors <function>Baz1</function> and <function>Baz2</function> must store the
1943 dictionaries for <literal>Eq</literal> and <literal>Show</literal> respectively, and
extract them on pattern matching.
1945 </para>
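<para>
The declarations of this section can be assembled into a runnable module, as
sketched here.  The function is renamed to <literal>describe</literal> and
the sample values are invented for the example.
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}
module Main where

data Baz = forall a. Eq a => Baz1 a a
         | forall b. Show b => Baz2 b (b -> b)

-- The Eq and Show dictionaries come from the matched constructors,
-- so no class constraint appears in this signature.
describe :: Baz -> String
describe (Baz1 p q) | p == q    = "Yes"
                    | otherwise = "No"
describe (Baz2 v fn)            = show (fn v)

main :: IO ()
main = mapM_ (putStrLn . describe)
  [ Baz1 'x' 'x', Baz1 True False, Baz2 (20 :: Int) (+ 1) ]
</programlisting>
The program prints <literal>Yes</literal>, <literal>No</literal> and
<literal>21</literal>, one per line.
</para>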
1946
1947 </sect3>
1948
1949 <sect3 id="existential-records">
1950 <title>Record Constructors</title>
1951
1952 <para>
GHC allows existentials to be used with record syntax as well.  For example:
1954
1955 <programlisting>
1956 data Counter a = forall self. NewCounter
1957     { _this    :: self
1958     , _inc     :: self -> self
1959     , _display :: self -> IO ()
1960     , tag      :: a
1961     }
1962 </programlisting>
1963 Here <literal>tag</literal> is a public field, with a well-typed selector
1964 function <literal>tag :: Counter a -> a</literal>.  The <literal>self</literal>
1965 type is hidden from the outside; any attempt to apply <literal>_this</literal>,
1966 <literal>_inc</literal> or <literal>_display</literal> as functions will raise a
1967 compile-time error.  In other words, <emphasis>GHC defines a record selector function
1968 only for fields whose type does not mention the existentially-quantified variables</emphasis>.
1969 (This example used an underscore in the fields for which record selectors
1970 will not be defined, but that is only programming style; GHC ignores them.)
1971 </para>
1972
1973 <para>
1974 To make use of these hidden fields, we need to create some helper functions:
1975
1976 <programlisting>
1977 inc :: Counter a -> Counter a
1978 inc (NewCounter x i d t) = NewCounter
1979     { _this = i x, _inc = i, _display = d, tag = t }
1980
1981 display :: Counter a -> IO ()
1982 display NewCounter{ _this = x, _display = d } = d x
1983 </programlisting>
1984
1985 Now we can define counters with different underlying implementations:
1986
1987 <programlisting>
1988 counterA :: Counter String
1989 counterA = NewCounter
1990     { _this = 0, _inc = (1+), _display = print, tag = "A" }
1991
1992 counterB :: Counter String
1993 counterB = NewCounter
1994     { _this = "", _inc = ('#':), _display = putStrLn, tag = "B" }
1995
1996 main = do
1997     display (inc counterA)         -- prints "1"
1998     display (inc (inc counterB))   -- prints "##"
1999 </programlisting>
2000
2001 At the moment, record update syntax is only supported for Haskell 98 data types,
2002 so the following function does <emphasis>not</emphasis> work:
2003
2004 <programlisting>
2005 -- This is invalid; use explicit NewCounter instead for now
2006 setTag :: Counter a -> a -> Counter a
2007 setTag obj t = obj{ tag = t }
2008 </programlisting>
2009
2010 </para>
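<para>
The explicit-constructor workaround mentioned in the comment above can be
sketched as follows.  This is only an illustration: the
<literal>Counter</literal> declaration is repeated so that the module is
self-contained, and the <literal>Int</literal> state chosen for
<literal>counterA</literal> is arbitrary.
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}
module Main where

data Counter a = forall self. NewCounter
    { _this    :: self
    , _inc     :: self -> self
    , _display :: self -> IO ()
    , tag      :: a
    }

-- Rebuild the record by hand instead of using record update;
-- only the 'tag' field changes.
setTag :: Counter a -> a -> Counter a
setTag (NewCounter x i d _) t = NewCounter
    { _this = x, _inc = i, _display = d, tag = t }

counterA :: Counter String
counterA = NewCounter
    { _this = 0 :: Int, _inc = (1+), _display = print, tag = "A" }

main :: IO ()
main = putStrLn (tag (setTag counterA "renamed"))
</programlisting>
The program prints <literal>renamed</literal>.  Pattern matching on
<literal>NewCounter</literal> keeps the hidden <literal>self</literal> type
inside the function, so nothing escapes.
</para>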
2011
2012 </sect3>
2013
2014
2015 <sect3>
2016 <title>Restrictions</title>
2017
2018 <para>
2019 There are several restrictions on the ways in which existentially-quantified
constructors can be used.
2021 </para>
2022
2023 <para>
2024
2025 <itemizedlist>
2026 <listitem>
2027
2028 <para>
2029  When pattern matching, each pattern match introduces a new,
2030 distinct, type for each existential type variable.  These types cannot
2031 be unified with any other type, nor can they escape from the scope of
2032 the pattern match.  For example, these fragments are incorrect:
2033
2034
2035 <programlisting>
2036 f1 (MkFoo a f) = a
2037 </programlisting>
2038
2039
2040 Here, the type bound by <function>MkFoo</function> "escapes", because <literal>a</literal>
2041 is the result of <function>f1</function>.  One way to see why this is wrong is to
2042 ask what type <function>f1</function> has:
2043
2044
2045 <programlisting>
2046   f1 :: Foo -> a             -- Weird!
2047 </programlisting>
2048
2049
2050 What is this "<literal>a</literal>" in the result type? Clearly we don't mean
2051 this:
2052
2053
2054 <programlisting>
2055   f1 :: forall a. Foo -> a   -- Wrong!
2056 </programlisting>
2057
2058
The original program is just plain wrong.  Here's another sort of error:
2060
2061
2062 <programlisting>
2063   f2 (Baz1 a b) (Baz1 p q) = a==q
2064 </programlisting>
2065
2066
2067 It's ok to say <literal>a==b</literal> or <literal>p==q</literal>, but
2068 <literal>a==q</literal> is wrong because it equates the two distinct types arising
2069 from the two <function>Baz1</function> constructors.
2070
2071
2072 </para>
2073 </listitem>
2074 <listitem>
2075
2076 <para>
2077 You can't pattern-match on an existentially quantified
2078 constructor in a <literal>let</literal> or <literal>where</literal> group of
2079 bindings. So this is illegal:
2080
2081
2082 <programlisting>
2083   f3 x = a==b where { Baz1 a b = x }
2084 </programlisting>
2085
2086 Instead, use a <literal>case</literal> expression:
2087
2088 <programlisting>
2089   f3 x = case x of Baz1 a b -> a==b
2090 </programlisting>
2091
2092 In general, you can only pattern-match
2093 on an existentially-quantified constructor in a <literal>case</literal> expression or
2094 in the patterns of a function definition.
2095
2096 The reason for this restriction is really an implementation one.
2097 Type-checking binding groups is already a nightmare without
2098 existentials complicating the picture.  Also an existential pattern
2099 binding at the top level of a module doesn't make sense, because it's
2100 not clear how to prevent the existentially-quantified type "escaping".
2101 So for now, there's a simple-to-state restriction.  We'll see how
2102 annoying it is.
2103
2104 </para>
2105 </listitem>
2106 <listitem>
2107
2108 <para>
2109 You can't use existential quantification for <literal>newtype</literal>
2110 declarations.  So this is illegal:
2111
2112
2113 <programlisting>
2114   newtype T = forall a. Ord a => MkT a
2115 </programlisting>
2116
2117
2118 Reason: a value of type <literal>T</literal> must be represented as a
pair of a dictionary for <literal>Ord a</literal> and a value of type
<literal>a</literal>.  That contradicts the idea that
2121 <literal>newtype</literal> should have no concrete representation.
2122 You can get just the same efficiency and effect by using
2123 <literal>data</literal> instead of <literal>newtype</literal>.  If
2124 there is no overloading involved, then there is more of a case for
2125 allowing an existentially-quantified <literal>newtype</literal>,
2126 because the <literal>data</literal> version does carry an
2127 implementation cost, but single-field existentially quantified
2128 constructors aren't much use.  So the simple restriction (no
2129 existential stuff on <literal>newtype</literal>) stands, unless there
2130 are convincing reasons to change it.
2131
2132
2133 </para>
2134 </listitem>
2135 <listitem>
2136
2137 <para>
2138  You can't use <literal>deriving</literal> to define instances of a
2139 data type with existentially quantified data constructors.
2140
Reason: in most cases it would not make sense. For example:
2142
2143 <programlisting>
2144 data T = forall a. MkT [a] deriving( Eq )
2145 </programlisting>
2146
2147 To derive <literal>Eq</literal> in the standard way we would need to have equality
2148 between the single component of two <function>MkT</function> constructors:
2149
2150 <programlisting>
2151 instance Eq T where
2152   (MkT a) == (MkT b) = ???
2153 </programlisting>
2154
2155 But <varname>a</varname> and <varname>b</varname> have distinct types, and so can't be compared.
2156 It's just about possible to imagine examples in which the derived instance
2157 would make sense, but it seems altogether simpler simply to prohibit such
2158 declarations.  Define your own instances!
2159 </para>
2160 </listitem>
2161
2162 </itemizedlist>
2163
2164 </para>
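<para>
Writing an instance by hand is straightforward when the constructor carries
a suitable context.  The sketch below uses a hypothetical type
<literal>Showable</literal>, not one taken from this section, whose
constructor stores a <literal>Show</literal> dictionary that the instance
can use:
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}
module Main where

-- The Show context is stored in the constructor, so the instance
-- below can use it after pattern matching.
data Showable = forall a. Show a => MkShowable a

-- Written by hand, since 'deriving' is not available here.
instance Show Showable where
  show (MkShowable x) = show x

main :: IO ()
main = print [MkShowable (1 :: Int), MkShowable "two", MkShowable (Just True)]
</programlisting>
The program prints <literal>[1,"two",Just True]</literal>.  An
<literal>Eq</literal> instance cannot be written this way, because the two
arguments of <literal>(==)</literal> may hide different types.
</para>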
2165
2166 </sect3>
2167 </sect2>
2168
2169 <!-- ====================== Generalised algebraic data types =======================  -->
2170
2171 <sect2 id="gadt-style">
2172 <title>Declaring data types with explicit constructor signatures</title>
2173
2174 <para>GHC allows you to declare an algebraic data type by
2175 giving the type signatures of constructors explicitly.  For example:
2176 <programlisting>
2177   data Maybe a where
2178       Nothing :: Maybe a
2179       Just    :: a -> Maybe a
2180 </programlisting>
2181 The form is called a "GADT-style declaration"
2182 because Generalised Algebraic Data Types, described in <xref linkend="gadt"/>,
2183 can only be declared using this form.</para>
2184 <para>Notice that GADT-style syntax generalises existential types (<xref linkend="existential-quantification"/>).
2185 For example, these two declarations are equivalent:
2186 <programlisting>
2187   data Foo = forall a. MkFoo a (a -> Bool)
  data Foo' where { MkFoo' :: a -> (a->Bool) -> Foo' }
2189 </programlisting>
2190 </para>
2191 <para>Any data type that can be declared in standard Haskell-98 syntax
2192 can also be declared using GADT-style syntax.
2193 The choice is largely stylistic, but GADT-style declarations differ in one important respect:
2194 they treat class constraints on the data constructors differently.
2195 Specifically, if the constructor is given a type-class context, that
2196 context is made available by pattern matching.  For example:
2197 <programlisting>
2198   data Set a where
2199     MkSet :: Eq a => [a] -> Set a
2200
2201   makeSet :: Eq a => [a] -> Set a
2202   makeSet xs = MkSet (nub xs)
2203
2204   insert :: a -> Set a -> Set a
2205   insert a (MkSet as) | a `elem` as = MkSet as
2206                       | otherwise   = MkSet (a:as)
2207 </programlisting>
2208 A use of <literal>MkSet</literal> as a constructor (e.g. in the definition of <literal>makeSet</literal>)
2209 gives rise to a <literal>(Eq a)</literal>
2210 constraint, as you would expect.  The new feature is that pattern-matching on <literal>MkSet</literal>
2211 (as in the definition of <literal>insert</literal>) makes <emphasis>available</emphasis> an <literal>(Eq a)</literal>
2212 context.  In implementation terms, the <literal>MkSet</literal> constructor has a hidden field that stores
2213 the <literal>(Eq a)</literal> dictionary that is passed to <literal>MkSet</literal>; so
2214 when pattern-matching that dictionary becomes available for the right-hand side of the match.
2215 In the example, the equality dictionary is used to satisfy the equality constraint
2216 generated by the call to <literal>elem</literal>, so that the type of
2217 <literal>insert</literal> itself has no <literal>Eq</literal> constraint.
2218 </para>
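<para>
For reference, here is the <literal>Set</literal> example as a complete
module.  This is a sketch: the <literal>GADTs</literal> pragma, the import of
<literal>nub</literal> and the helper <literal>toList</literal> are additions
made so that it compiles and runs on its own.
<programlisting>
{-# LANGUAGE GADTs #-}
module Main where

import Data.List (nub)

data Set a where
  MkSet :: Eq a => [a] -> Set a

makeSet :: Eq a => [a] -> Set a
makeSet xs = MkSet (nub xs)

-- No Eq constraint here: the dictionary comes from the MkSet pattern.
insert :: a -> Set a -> Set a
insert a (MkSet as) | a `elem` as = MkSet as
                    | otherwise   = MkSet (a:as)

-- Hypothetical helper, added only to inspect the result.
toList :: Set a -> [a]
toList (MkSet as) = as

main :: IO ()
main = print (toList (insert 3 (insert 1 (makeSet [1, 2, 2 :: Int]))))
</programlisting>
The program prints <literal>[3,1,2]</literal>: the duplicate
<literal>2</literal> is removed by <literal>makeSet</literal>, inserting
<literal>1</literal> changes nothing, and <literal>3</literal> is added at
the front.
</para>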
2219 <para>
One possible application is to reify dictionaries:
2221 <programlisting>
2222    data NumInst a where
2223      MkNumInst :: Num a => NumInst a
2224
2225    intInst :: NumInst Int
2226    intInst = MkNumInst
2227
2228    plus :: NumInst a -> a -> a -> a
2229    plus MkNumInst p q = p + q
2230 </programlisting>
2231 Here, a value of type <literal>NumInst a</literal> is equivalent
2232 to an explicit <literal>(Num a)</literal> dictionary.
2233 </para>
2234 <para>
2235 All this applies to constructors declared using the syntax of <xref linkend="existential-with-context"/>.
2236 For example, the <literal>NumInst</literal> data type above could equivalently be declared
2237 like this:
2238 <programlisting>
2239    data NumInst a
      = Num a => MkNumInst
2241 </programlisting>
2242 Notice that, unlike the situation when declaring an existential, there is
2243 no <literal>forall</literal>, because the <literal>Num</literal> constrains the
2244 data type's universally quantified type variable <literal>a</literal>.
2245 A constructor may have both universal and existential type variables: for example,
2246 the following two declarations are equivalent:
2247 <programlisting>
2248    data T1 a
2249         = forall b. (Num a, Eq b) => MkT1 a b
2250    data T2 a where
2251         MkT2 :: (Num a, Eq b) => a -> b -> T2 a
2252 </programlisting>
2253 </para>
2254 <para>All this behaviour contrasts with Haskell 98's peculiar treatment of
2255 contexts on a data type declaration (Section 4.2.1 of the Haskell 98 Report).
2256 In Haskell 98 the definition
2257 <programlisting>
2258   data Eq a => Set' a = MkSet' [a]
2259 </programlisting>
2260 gives <literal>MkSet'</literal> the same type as <literal>MkSet</literal> above.  But instead of
2261 <emphasis>making available</emphasis> an <literal>(Eq a)</literal> constraint, pattern-matching
2262 on <literal>MkSet'</literal> <emphasis>requires</emphasis> an <literal>(Eq a)</literal> constraint!
2263 GHC faithfully implements this behaviour, odd though it is.  But for GADT-style declarations,
2264 GHC's behaviour is much more useful, as well as much more intuitive.
2265 </para>
2266
2267 <para>
2268 The rest of this section gives further details about GADT-style data
2269 type declarations.
2270
2271 <itemizedlist>
2272 <listitem><para>
2273 The result type of each data constructor must begin with the type constructor being defined.
2274 If the result type of all constructors
2275 has the form <literal>T a1 ... an</literal>, where <literal>a1 ... an</literal>
2276 are distinct type variables, then the data type is <emphasis>ordinary</emphasis>;
otherwise it is a <emphasis>generalised</emphasis> data type (<xref linkend="gadt"/>).
2278 </para></listitem>
2279
2280 <listitem><para>
2281 The type signature of
2282 each constructor is independent, and is implicitly universally quantified as usual.
2283 Different constructors may have different universally-quantified type variables
2284 and different type-class constraints.
2285 For example, this is fine:
2286 <programlisting>
2287   data T a where
2288     T1 :: Eq b => b -> T b
2289     T2 :: (Show c, Ix c) => c -> [c] -> T c
2290 </programlisting>
2291 </para></listitem>
2292
2293 <listitem><para>
2294 Unlike a Haskell-98-style
2295 data type declaration, the type variable(s) in the "<literal>data Set a where</literal>" header
2296 have no scope.  Indeed, one can write a kind signature instead:
2297 <programlisting>
2298   data Set :: * -> * where ...
2299 </programlisting>
2300 or even a mixture of the two:
2301 <programlisting>
2302   data Foo a :: (* -> *) -> * where ...
2303 </programlisting>
2304 The type variables (if given) may be explicitly kinded, so we could also write the header for <literal>Foo</literal>
2305 like this:
2306 <programlisting>
2307   data Foo a (b :: * -> *) where ...
2308 </programlisting>
2309 </para></listitem>
2310
2311
2312 <listitem><para>
2313 You can use strictness annotations, in the obvious places
2314 in the constructor type:
2315 <programlisting>
2316   data Term a where
2317       Lit    :: !Int -> Term Int
2318       If     :: Term Bool -> !(Term a) -> !(Term a) -> Term a
2319       Pair   :: Term a -> Term b -> Term (a,b)
2320 </programlisting>
2321 </para></listitem>
2322
2323 <listitem><para>
2324 You can use a <literal>deriving</literal> clause on a GADT-style data type
2325 declaration.   For example, these two declarations are equivalent
2326 <programlisting>
2327   data Maybe1 a where {
2328       Nothing1 :: Maybe1 a ;
2329       Just1    :: a -> Maybe1 a
2330     } deriving( Eq, Ord )
2331
2332   data Maybe2 a = Nothing2 | Just2 a
2333        deriving( Eq, Ord )
2334 </programlisting>
2335 </para></listitem>
2336
2337 <listitem><para>
2338 You can use record syntax on a GADT-style data type declaration:
2339
2340 <programlisting>
2341   data Person where
2342       Adult { name :: String, children :: [Person] } :: Person
2343       Child { name :: String } :: Person
2344 </programlisting>
2345 As usual, for every constructor that has a field <literal>f</literal>, the type of
2346 field <literal>f</literal> must be the same (modulo alpha conversion).
2347 </para>
2348 <para>
2349 At the moment, record updates are not yet possible with GADT-style declarations,
2350 so support is limited to record construction, selection and pattern matching.
2351 For example
2352 <programlisting>
2353   aPerson = Adult { name = "Fred", children = [] }
2354
  hasChildren :: Person -> Bool
2356   hasChildren (Adult { children = kids }) = not (null kids)
2357   hasChildren (Child {})                  = False
2358 </programlisting>
2359 </para></listitem>
2360
2361 <listitem><para>
2362 As in the case of existentials declared using the Haskell-98-like record syntax
2363 (<xref linkend="existential-records"/>),
2364 record-selector functions are generated only for those fields that have well-typed
2365 selectors.
2366 Here is the example of that section, in GADT-style syntax:
2367 <programlisting>
2368 data Counter a where
2369     NewCounter { _this    :: self
2370                , _inc     :: self -> self
2371                , _display :: self -> IO ()
2372                , tag      :: a
2373                }
2374         :: Counter a
2375 </programlisting>
2376 As before, only one selector function is generated here, that for <literal>tag</literal>.
2377 Nevertheless, you can still use all the field names in pattern matching and record construction.
2378 </para></listitem>
2379 </itemizedlist></para>
2380 </sect2>
2381
2382 <sect2 id="gadt">
2383 <title>Generalised Algebraic Data Types (GADTs)</title>
2384
2385 <para>Generalised Algebraic Data Types generalise ordinary algebraic data types
2386 by allowing constructors to have richer return types.  Here is an example:
2387 <programlisting>
2388   data Term a where
2389       Lit    :: Int -> Term Int
2390       Succ   :: Term Int -> Term Int
2391       IsZero :: Term Int -> Term Bool
2392       If     :: Term Bool -> Term a -> Term a -> Term a
2393       Pair   :: Term a -> Term b -> Term (a,b)
2394 </programlisting>
2395 Notice that the return type of the constructors is not always <literal>Term a</literal>, as is the
2396 case with ordinary data types.  This generality allows us to
2397 write a well-typed <literal>eval</literal> function
2398 for these <literal>Terms</literal>:
2399 <programlisting>
2400   eval :: Term a -> a
2401   eval (Lit i)      = i
2402   eval (Succ t)     = 1 + eval t
2403   eval (IsZero t)   = eval t == 0
2404   eval (If b e1 e2) = if eval b then eval e1 else eval e2
2405   eval (Pair e1 e2) = (eval e1, eval e2)
2406 </programlisting>
2407 The key point about GADTs is that <emphasis>pattern matching causes type refinement</emphasis>.
2408 For example, in the right hand side of the equation
2409 <programlisting>
2410   eval :: Term a -> a
2411   eval (Lit i) =  ...
2412 </programlisting>
2413 the type <literal>a</literal> is refined to <literal>Int</literal>.  That's the whole point!
2414 A precise specification of the type rules is beyond what this user manual aspires to,
2415 but the design closely follows that described in
2416 the paper <ulink
2417 url="http://research.microsoft.com/%7Esimonpj/papers/gadt/">Simple
2418 unification-based type inference for GADTs</ulink>
2419 (ICFP 2006).
2420 The general principle is this: <emphasis>type refinement is only carried out
2421 based on user-supplied type annotations</emphasis>.
2422 So if no type signature is supplied for <literal>eval</literal>, no type refinement happens,
2423 and lots of obscure error messages will
2424 occur.  However, the refinement is quite general.  For example, if we had:
2425 <programlisting>
2426   eval :: Term a -> a -> a
2427   eval (Lit i) j =  i+j
2428 </programlisting>
2429 the pattern match causes the type <literal>a</literal> to be refined to <literal>Int</literal> (because of the type
2430 of the constructor <literal>Lit</literal>), and that refinement also applies to the type of <literal>j</literal>, and
2431 the result type of the <literal>case</literal> expression.  Hence the addition <literal>i+j</literal> is legal.
2432 </para>
2433 <para>
2434 These and many other examples are given in papers by Hongwei Xi, and
2435 Tim Sheard. There is a longer introduction
2436 <ulink url="http://www.haskell.org/haskellwiki/GADT">on the wiki</ulink>,
2437 and Ralf Hinze's
2438 <ulink url="http://www.informatik.uni-bonn.de/~ralf/publications/With.pdf">Fun with phantom types</ulink> also has a number of examples. Note that papers
2439 may use different notation to that implemented in GHC.
2440 </para>
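<para>
As a minimal, self-contained sketch of the <literal>Term</literal> example above, the following module
puts the declaration and <literal>eval</literal> together.  The module layout and the
<literal>main</literal> driver are illustrative assumptions, not part of the original example.
<programlisting>
{-# LANGUAGE GADTs #-}
module Main where

data Term a where
    Lit    :: Int -> Term Int
    Succ   :: Term Int -> Term Int
    IsZero :: Term Int -> Term Bool
    If     :: Term Bool -> Term a -> Term a -> Term a
    Pair   :: Term a -> Term b -> Term (a,b)

-- The signature is what allows each equation to refine 'a'.
eval :: Term a -> a
eval (Lit i)      = i
eval (Succ t)     = 1 + eval t
eval (IsZero t)   = eval t == 0
eval (If b e1 e2) = if eval b then eval e1 else eval e2
eval (Pair e1 e2) = (eval e1, eval e2)

main :: IO ()
main = do
  print (eval (If (IsZero (Lit 0)) (Succ (Lit 1)) (Lit 5)))
  print (eval (Pair (Lit 3) (IsZero (Succ (Lit 0)))))
</programlisting>
The first call evaluates a <literal>Term Int</literal> and the second a <literal>Term (Int,Bool)</literal>,
using the same <literal>eval</literal>.
</para>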
2441 <para>
2442 The rest of this section outlines the extensions to GHC that support GADTs.   The extension is enabled with
2443 <option>-XGADTs</option>.  The <option>-XGADTs</option> flag also sets <option>-XRelaxedPolyRec</option>.
2444 <itemizedlist>
2445 <listitem><para>
2446 A GADT can only be declared using GADT-style syntax (<xref linkend="gadt-style"/>);
2447 the old Haskell-98 syntax for data declarations always declares an ordinary data type.
2448 The result type of each constructor must begin with the type constructor being defined,
2449 but for a GADT the arguments to the type constructor can be arbitrary monotypes.
2450 For example, in the <literal>Term</literal> data
2451 type above, the type of each constructor must end with <literal>Term ty</literal>, but
2452 the <literal>ty</literal> need not be a type variable (e.g. the <literal>Lit</literal>
2453 constructor).
2454 </para></listitem>
2455
2456 <listitem><para>
2457 It is permitted to declare an ordinary algebraic data type using GADT-style syntax.
2458 What makes a GADT into a GADT is not the syntax, but rather the presence of data constructors
2459 whose result type is not just <literal>T a b</literal>.
2460 </para></listitem>
2461
2462 <listitem><para>
2463 You cannot use a <literal>deriving</literal> clause for a GADT; only for
2464 an ordinary data type.
2465 </para></listitem>
2466
2467 <listitem><para>
2468 As mentioned in <xref linkend="gadt-style"/>, record syntax is supported.
2469 For example:
2470 <programlisting>
2471   data Term a where
2472       Lit    { val  :: Int }      :: Term Int
2473       Succ   { num  :: Term Int } :: Term Int
2474       Pred   { num  :: Term Int } :: Term Int
2475       IsZero { arg  :: Term Int } :: Term Bool
2476       Pair   { arg1 :: Term a
2477              , arg2 :: Term b
2478              }                    :: Term (a,b)
2479       If     { cnd  :: Term Bool
2480              , tru  :: Term a
2481              , fls  :: Term a
2482              }                    :: Term a
2483 </programlisting>
2484 However, for GADTs there is the following additional constraint:
2485 every constructor that has a field <literal>f</literal> must have
2486 the same result type (modulo alpha conversion).
2487 Hence, in the above example, we cannot merge the <literal>num</literal>
2488 and <literal>arg</literal> fields into a
2489 single name.  Although their field types are both <literal>Term Int</literal>,
2490 their selector functions actually have different types:
2491
2492 <programlisting>
2493   num :: Term Int -> Term Int
2494   arg :: Term Bool -> Term Int
2495 </programlisting>
2496 </para></listitem>
2497
2498 <listitem><para>
2499 When pattern-matching against data constructors drawn from a GADT,
2500 for example in a <literal>case</literal> expression, the following rules apply:
2501 <itemizedlist>
2502 <listitem><para>The type of the scrutinee must be rigid.</para></listitem>
2503 <listitem><para>The type of the entire <literal>case</literal> expression must be rigid.</para></listitem>
2504 <listitem><para>The type of any free variable mentioned in any of
2505 the <literal>case</literal> alternatives must be rigid.</para></listitem>
2506 </itemizedlist>
2507 A type is "rigid" if it is completely known to the compiler at its binding site.  The easiest
2508 way to ensure that a variable has a rigid type is to give it a type signature.
2509 For more precise details see <ulink url="http://research.microsoft.com/%7Esimonpj/papers/gadt">
2510 Simple unification-based type inference for GADTs
2511 </ulink>. The criteria implemented by GHC are given in the Appendix of that paper.
2512
2513 </para></listitem>
2514
2515 </itemizedlist>
2516 </para>
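<para>
The rigidity rules for pattern matching can be illustrated with a small sketch.  The names
<literal>evalCase</literal>, <literal>describe</literal> and <literal>go</literal> are hypothetical and chosen
only for this example; the point is that every function that matches on a GADT constructor carries
its own type signature, including the local one.
<programlisting>
{-# LANGUAGE GADTs #-}
module Main where

data Term a where
    Lit    :: Int -> Term Int
    IsZero :: Term Int -> Term Bool

-- The signature makes the type of the scrutinee 't' and of the whole
-- case expression rigid, so each alternative may refine 'a'.
evalCase :: Term a -> a
evalCase t = case t of
    Lit i    -> i
    IsZero u -> evalCase u == 0

-- A local helper that matches on GADT constructors is given its own
-- signature for the same reason.
describe :: Term a -> String
describe t = go t
  where
    go :: Term b -> String
    go (Lit i)    = "literal " ++ show i
    go (IsZero u) = "isZero (" ++ go u ++ ")"

main :: IO ()
main = do
  print (evalCase (IsZero (Lit 0)))
  putStrLn (describe (IsZero (Lit 7)))
</programlisting>
Removing the signature of <literal>evalCase</literal> is an easy way to see the kind of error message
that a non-rigid scrutinee produces.
</para>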
2517
2518 </sect2>
2519 </sect1>
2520
2521 <!-- ====================== End of Generalised algebraic data types =======================  -->
2522
2523 <sect1 id="deriving">
2524 <title>Extensions to the "deriving" mechanism</title>
2525
2526 <sect2 id="deriving-inferred">
2527 <title>Inferred context for deriving clauses</title>
2528
2529 <para>
2530 The Haskell Report is vague about exactly when a <literal>deriving</literal> clause is
2531 legal.  For example:
2532 <programlisting>
2533   data T0 f a = MkT0 a         deriving( Eq )
2534   data T1 f a = MkT1 (f a)     deriving( Eq )
2535   data T2 f a = MkT2 (f (f a)) deriving( Eq )
2536 </programlisting>
2537 The natural generated <literal>Eq</literal> code would result in these instance declarations:
2538 <programlisting>
2539   instance Eq a         => Eq (T0 f a) where ...
2540   instance Eq (f a)     => Eq (T1 f a) where ...
2541   instance Eq (f (f a)) => Eq (T2 f a) where ...
2542 </programlisting>
2543 The first of these is obviously fine. The second is still fine, although less obviously.
2544 The third is not Haskell 98, and risks losing termination of instances.
2545 </para>
2546 <para>
2547 GHC takes a conservative position: it accepts the first two, but not the third.  The rule is this:
2548 each constraint in the inferred instance context must consist only of type variables,
2549 with no repetitions.
2550 </para>
2551 <para>
2552 This rule is applied regardless of flags.  If you want a more exotic context, you can write
2553 it yourself, using the <link linkend="stand-alone-deriving">standalone deriving mechanism</link>.
2554 </para>
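<para>
As a sketch of the workaround just mentioned, <literal>T2</literal> can be given its
<literal>Eq</literal> instance by a stand-alone deriving declaration with a hand-written context.
The choice of the extra flags <option>-XFlexibleContexts</option> and
<option>-XUndecidableInstances</option> is an assumption about what the instance rules require for
this particular context.
<programlisting>
{-# LANGUAGE StandaloneDeriving, FlexibleContexts, UndecidableInstances #-}
module Main where

-- Accepted in a deriving clause: the inferred context is Eq (f a).
data T1 f a = MkT1 (f a)     deriving( Eq )

-- Not accepted in a deriving clause, so the context is written by hand.
data T2 f a = MkT2 (f (f a))

deriving instance Eq (f (f a)) => Eq (T2 f a)

main :: IO ()
main = do
  print (MkT1 (Just 'x') == MkT1 (Just 'x'))
  print (MkT2 [[1 :: Int]] == MkT2 [[2]])
</programlisting>
</para>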
2555 </sect2>
2556
2557 <sect2 id="stand-alone-deriving">
2558 <title>Stand-alone deriving declarations</title>
2559
2560 <para>
2561 GHC now allows stand-alone <literal>deriving</literal> declarations, enabled by <literal>-XStandaloneDeriving</literal>:
2562 <programlisting>
2563   data Foo a = Bar a | Baz String
2564
2565   deriving instance Eq a => Eq (Foo a)
2566 </programlisting>
2567 The syntax is identical to that of an ordinary instance declaration apart from (a) the keyword
2568 <literal>deriving</literal>, and (b) the absence of the <literal>where</literal> part.
2569 You must supply a context (in the example the context is <literal>(Eq a)</literal>),
2570 exactly as you would in an ordinary instance declaration.
2571 (In contrast the context is inferred in a <literal>deriving</literal> clause
2572 attached to a data type declaration.)
2573
2574 A <literal>deriving instance</literal> declaration
2575 must obey the same rules concerning form and termination as ordinary instance declarations,
2576 controlled by the same flags; see <xref linkend="instance-decls"/>.
2577 </para>
2578 <para>
2579 Unlike a <literal>deriving</literal>
2580 declaration attached to a <literal>data</literal> declaration, the instance can be more specific
2581 than the data type (assuming you also use
2582 <literal>-XFlexibleInstances</literal>, <xref linkend="instance-rules"/>).  Consider
2583 for example
2584 <programlisting>
2585   data Foo a = Bar a | Baz String
2586
2587   deriving instance Eq a => Eq (Foo [a])
2588   deriving instance Eq a => Eq (Foo (Maybe a))
2589 </programlisting>
2590 This will generate a derived instance for <literal>(Foo [a])</literal> and <literal>(Foo (Maybe a))</literal>,
2591 but other types such as <literal>(Foo (Int,Bool))</literal> will not be an instance of <literal>Eq</literal>.
2592 </para>
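<para>
A complete module for the example above might look like the following sketch; the
<literal>main</literal> driver is an illustrative addition.
<programlisting>
{-# LANGUAGE StandaloneDeriving, FlexibleInstances #-}
module Main where

data Foo a = Bar a | Baz String

deriving instance Eq a => Eq (Foo [a])
deriving instance Eq a => Eq (Foo (Maybe a))

main :: IO ()
main = do
  print (Bar "abc" == Bar "abc")        -- uses the instance for Foo [a]
  print (Bar (Just 'x') == Baz "x")     -- uses the instance for Foo (Maybe a)
</programlisting>
A comparison at type <literal>Foo (Int,Bool)</literal> would be rejected, because no instance
covers that type.
</para>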
2593
2594 <para>The stand-alone syntax is generalised for newtypes in exactly the same
2595 way that ordinary <literal>deriving</literal> clauses are generalised (<xref linkend="newtype-deriving"/>).
2596 For example:
2597 <programlisting>
2598   newtype Foo a = MkFoo (State Int a)
2599
2600   deriving instance MonadState Int Foo
2601 </programlisting>
2602 GHC always treats the <emphasis>last</emphasis> parameter of the instance
2603 (<literal>Foo</literal> in this example) as the type whose instance is being derived.
2604 </para>
2605
2606 </sect2>
2607
2608
2609 <sect2 id="deriving-typeable">
2610 <title>Deriving clause for classes <literal>Typeable</literal> and <literal>Data</literal></title>
2611
2612 <para>
2613 Haskell 98 allows the programmer to add "<literal>deriving( Eq, Ord )</literal>" to a data type
2614 declaration, to generate a standard instance declaration for classes specified in the <literal>deriving</literal> clause.
2615 In Haskell 98, the only classes that may appear in the <literal>deriving</literal> clause are the standard
2616 classes <literal>Eq</literal>, <literal>Ord</literal>,
2617 <literal>Enum</literal>, <literal>Ix</literal>, <literal>Bounded</literal>, <literal>Read</literal>, and <literal>Show</literal>.
2618 </para>
2619 <para>
2620 GHC extends this list with two more classes that may be automatically derived
2621 (provided the <option>-XDeriveDataTypeable</option> flag is specified):
2622 <literal>Typeable</literal>, and <literal>Data</literal>.  These classes are defined in the library
2623 modules <literal>Data.Typeable</literal> and <literal>Data.Generics</literal> respectively, and the
2624 appropriate class must be in scope before it can be mentioned in the <literal>deriving</literal> clause.
2625 </para>
2626 <para>An instance of <literal>Typeable</literal> can only be derived if the
2627 data type has seven or fewer type parameters, all of kind <literal>*</literal>.
2628 The reason for this is that the <literal>Typeable</literal> class is derived using the scheme
2629 described in
2630 <ulink url="http://research.microsoft.com/%7Esimonpj/papers/hmap/gmap2.ps">
2631 Scrap More Boilerplate: Reflection, Zips, and Generalised Casts
2632 </ulink>.
2633 (Section 7.4 of the paper describes the multiple <literal>Typeable</literal> classes that
2634 are used, and only <literal>Typeable1</literal> up to
2635 <literal>Typeable7</literal> are provided in the library.)
2636 In other cases, there is nothing to stop the programmer writing a <literal>TypeableX</literal>
2637 class, whose kind suits that of the data type constructor, and
2638 then writing the data type instance by hand.
2639 </para>
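<para>
The following sketch derives both classes for a hypothetical type <literal>Shape</literal>.
It assumes a library version in which <literal>Data</literal> and <literal>toConstr</literal> can be
imported from <literal>Data.Data</literal>; with other library versions the import may need to name
<literal>Data.Generics</literal> instead.
<programlisting>
{-# LANGUAGE DeriveDataTypeable #-}
module Main where

import Data.Typeable (Typeable, typeOf)
import Data.Data     (Data, toConstr)

-- 'Shape' is a made-up type used only for illustration.
data Shape = Circle Double | Square Double
    deriving (Show, Typeable, Data)

main :: IO ()
main = do
  print (typeOf (Circle 1.0))     -- run-time representation of the type
  print (toConstr (Square 2.0))   -- run-time representation of the constructor
</programlisting>
</para>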
2640 </sect2>
2641
2642 <sect2 id="newtype-deriving">
2643 <title>Generalised derived instances for newtypes</title>
2644
2645 <para>
2646 When you define an abstract type using <literal>newtype</literal>, you may want
2647 the new type to inherit some instances from its representation. In
2648 Haskell 98, you can inherit instances of <literal>Eq</literal>, <literal>Ord</literal>,
2649 <literal>Enum</literal> and <literal>Bounded</literal> by deriving them, but for any
2650 other classes you have to write an explicit instance declaration. For
2651 example, if you define
2652
2653 <programlisting>
2654   newtype Dollars = Dollars Int
2655 </programlisting>
2656
2657 and you want to use arithmetic on <literal>Dollars</literal>, you have to
2658 explicitly define an instance of <literal>Num</literal>:
2659
2660 <programlisting>
2661   instance Num Dollars where
2662     Dollars a + Dollars b = Dollars (a+b)
2663     ...
2664 </programlisting>
2665 All the instance does is apply and remove the <literal>newtype</literal>
2666 constructor. It is particularly galling that, since the constructor
2667 doesn't appear at run-time, this instance declaration defines a
2668 dictionary which is <emphasis>wholly equivalent</emphasis> to the <literal>Int</literal>
2669 dictionary, only slower!
2670 </para>
2671
2672
2673 <sect3> <title> Generalising the deriving clause </title>
2674 <para>
2675 GHC now permits such instances to be derived instead,
2676 using the flag <option>-XGeneralizedNewtypeDeriving</option>,
2677 so one can write
2678 <programlisting>
2679   newtype Dollars = Dollars Int deriving (Eq,Show,Num)
2680 </programlisting>
2681
2682 and the implementation uses the <emphasis>same</emphasis> <literal>Num</literal> dictionary
2683 for <literal>Dollars</literal> as for <literal>Int</literal>. Notionally, the compiler
2684 derives an instance declaration of the form
2685
2686 <programlisting>
2687   instance Num Int => Num Dollars
2688 </programlisting>
2689
2690 which just adds or removes the <literal>newtype</literal> constructor according to the type.
2691 </para>
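<para>
A small sketch of the <literal>Dollars</literal> example as a complete module; the helper
<literal>total</literal> and the <literal>main</literal> driver are illustrative additions.
<programlisting>
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main where

newtype Dollars = Dollars Int deriving (Eq,Show,Num)

-- 'sum' needs only the derived Num instance.
total :: [Dollars] -> Dollars
total = sum

main :: IO ()
main = do
  print (Dollars 3 + Dollars 4)
  print (total [Dollars 1, Dollars 2, 5])   -- the literal 5 is a Dollars too
</programlisting>
Note that the printed values mention the <literal>Dollars</literal> constructor: <literal>Show</literal>
is derived in the usual way, not by reusing the <literal>Int</literal> instance.
</para>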
2692 <para>
2693
2694 We can also derive instances of constructor classes in a similar
2695 way. For example, suppose we have implemented state and failure monad
2696 transformers, such that
2697
2698 <programlisting>
2699   instance Monad m => Monad (State s m)
2700   instance Monad m => Monad (Failure m)
2701 </programlisting>
2702 In Haskell 98, we can define a parsing monad by
2703 <programlisting>
2704   type Parser tok m a = State [tok] (Failure m) a
2705 </programlisting>
2706
2707 which is automatically a monad thanks to the instance declarations
2708 above. With the extension, we can make the parser type abstract,
2709 without needing to write an instance of class <literal>Monad</literal>, via
2710
2711 <programlisting>
2712   newtype Parser tok m a = Parser (State [tok] (Failure m) a)
2713                          deriving Monad
2714 </programlisting>
2715 In this case the derived instance declaration is of the form
2716 <programlisting>
2717   instance Monad (State [tok] (Failure m)) => Monad (Parser tok m)
2718 </programlisting>
2719
2720 Notice that, since <literal>Monad</literal> is a constructor class, the
2721 instance is a <emphasis>partial application</emphasis> of the new type, not the
2722 entire left hand side. We can imagine that the type declaration is
2723 "eta-converted" to generate the context of the instance
2724 declaration.
2725 </para>
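<para>
Deriving for a constructor class can be tried out without writing any monad transformers.
The sketch below wraps <literal>Maybe</literal> in a hypothetical abstract type
<literal>Attempt</literal>.  <literal>Functor</literal> and <literal>Applicative</literal> are listed
on the assumption that the base library in use makes them superclasses of <literal>Monad</literal>,
as recent versions do.
<programlisting>
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main where

-- 'Attempt' is a made-up abstract type represented by Maybe.
newtype Attempt a = Attempt (Maybe a)
    deriving (Show, Functor, Applicative, Monad)

safeDiv :: Int -> Int -> Attempt Int
safeDiv _ 0 = Attempt Nothing
safeDiv x y = Attempt (Just (x `div` y))

main :: IO ()
main = do
  print (safeDiv 100 5 >>= \q -> safeDiv q 2)
  print (safeDiv 1 0   >>= \q -> safeDiv q 2)
</programlisting>
The derived instance is for the partial application <literal>Attempt</literal>, just as the
instance in the text is for <literal>Parser tok m</literal>.
</para>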
2726 <para>
2727
2728 We can even derive instances of multi-parameter classes, provided the
2729 newtype is the last class parameter. In this case, a "partial
2730 application" of the class appears in the <literal>deriving</literal>
2731 clause. For example, given the class
2732
2733 <programlisting>
2734   class StateMonad s m | m -> s where ...
2735   instance Monad m => StateMonad s (State s m) where ...
2736 </programlisting>
2737 then we can derive an instance of <literal>StateMonad</literal> for <literal>Parser</literal>s by
2738 <programlisting>
2739   newtype Parser tok m a = Parser (State [tok] (Failure m) a)
2740                          deriving (Monad, StateMonad [tok])
2741 </programlisting>
2742
2743 The derived instance is obtained by completing the application of the
2744 class to the new type:
2745
2746 <programlisting>
2747   instance StateMonad [tok] (State [tok] (Failure m)) =>
2748            StateMonad [tok] (Parser tok m)
2749 </programlisting>
2750 </para>
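<para>
The same idea can be sketched with a smaller multi-parameter class.  The class
<literal>Container</literal>, its methods and the type <literal>Stack</literal> are hypothetical,
and the set of flags is an assumption about what this particular example needs.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
             FlexibleInstances, GeneralizedNewtypeDeriving #-}
module Main where

-- The type we derive for (the container) is the last class parameter.
class Container e c | c -> e where
    cinsert :: e -> c -> c
    ctoList :: c -> [e]

instance Container a [a] where
    cinsert = (:)
    ctoList = id

-- 'Container Int' is a partial application of the class.
newtype Stack = Stack [Int]
    deriving (Container Int)

main :: IO ()
main = print (ctoList (cinsert 1 (cinsert 2 (Stack []))))
</programlisting>
Here the derived instance has the form
<literal>instance Container Int [Int] => Container Int Stack</literal>.
</para>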
2751 <para>
2752
2753 As a result of this extension, all derived instances in newtype
2754  declarations are treated uniformly (and implemented just by reusing
2755 the dictionary for the representation type), <emphasis>except</emphasis>
2756 <literal>Show</literal> and <literal>Read</literal>, which really behave differently for
2757 the newtype and its representation.
2758 </para>
2759 </sect3>
2760
2761 <sect3> <title> A more precise specification </title>
2762 <para>
2763 Derived instance declarations are constructed as follows. Consider the
2764 declaration (after expansion of any type synonyms)
2765
2766 <programlisting>
2767   newtype T v1...vn = T' (t vk+1...vn) deriving (c1...cm)
2768 </programlisting>
2769
2770 where
2771  <itemizedlist>
2772 <listitem><para>
2773   The <literal>ci</literal> are partial applications of
2774   classes of the form <literal>C t1'...tj'</literal>, where the arity of <literal>C</literal>
2775   is exactly <literal>j+1</literal>.  That is, <literal>C</literal> lacks exactly one type argument.
2776 </para></listitem>
2777 <listitem><para>
2778   The <literal>k</literal> is chosen so that <literal>ci (T v1...vk)</literal> is well-kinded.
2779 </para></listitem>
2780 <listitem><para>
2781   The type <literal>t</literal> is an arbitrary type.
2782 </para></listitem>
2783 <listitem><para>
2784   The type variables <literal>vk+1...vn</literal> do not occur in <literal>t</literal>,
2785   nor in the <literal>ci</literal>, and
2786 </para></listitem>
2787 <listitem><para>
2788   None of the <literal>ci</literal> is <literal>Read</literal>, <literal>Show</literal>,
2789                 <literal>Typeable</literal>, or <literal>Data</literal>.  These classes
2790                 should not "look through" the type or its constructor.  You can still
2791                 derive these classes for a newtype, but it happens in the usual way, not
2792                 via this new mechanism.
2793 </para></listitem>
2794 </itemizedlist>
2795 Then, for each <literal>ci</literal>, the derived instance
2796 declaration is:
2797 <programlisting>
2798   instance ci t => ci (T v1...vk)
2799 </programlisting>
2800 As an example which does <emphasis>not</emphasis> work, consider
2801 <programlisting>
2802   newtype NonMonad m s = NonMonad (State s m s) deriving Monad
2803 </programlisting>
2804 Here we cannot derive the instance
2805 <programlisting>
2806   instance Monad (State s m) => Monad (NonMonad m)
2807 </programlisting>
2808
2809 because the type variable <literal>s</literal> occurs in <literal>State s m</literal>,
2810 and so cannot be "eta-converted" away. It is a good thing that this
2811 <literal>deriving</literal> clause is rejected, because <literal>NonMonad m</literal> is
2812 not, in fact, a monad, for the same reason. Try defining
2813 <literal>>>=</literal> with the correct type: you won't be able to.
2814 </para>
2815 <para>
2816
2817 Notice also that the <emphasis>order</emphasis> of class parameters becomes
2818 important, since we can only derive instances for the last one. If the
2819 <literal>StateMonad</literal> class above were instead defined as
2820
2821 <programlisting>
2822   class StateMonad m s | m -> s where ...
2823 </programlisting>
2824
2825 then we would not have been able to derive an instance for the
2826 <literal>Parser</literal> type above. We hypothesise that multi-parameter
2827 classes usually have one "main" parameter for which deriving new
2828 instances is most interesting.
2829 </para>
2830 <para>Lastly, all of this applies only for classes other than
2831 <literal>Read</literal>, <literal>Show</literal>, <literal>Typeable</literal>,
2832 and <literal>Data</literal>, for which the built-in derivation applies (section
2833 4.3.3. of the Haskell Report).
2834 (For the standard classes <literal>Eq</literal>, <literal>Ord</literal>,
2835 <literal>Ix</literal>, and <literal>Bounded</literal> it is immaterial whether
2836 the standard method is used or the one described here.)
2837 </para>
2838 </sect3>
2839 </sect2>
2840 </sect1>
2841
2842
2843 <!-- TYPE SYSTEM EXTENSIONS -->
2844 <sect1 id="type-class-extensions">
2845 <title>Class and instance declarations</title>
2846
2847 <sect2 id="multi-param-type-classes">
2848 <title>Class declarations</title>
2849
2850 <para>
2851 This section and the next one document GHC's type-class extensions.
2852 There's lots of background in the paper <ulink
2853 url="http://research.microsoft.com/~simonpj/Papers/type-class-design-space/">Type
2854 classes: exploring the design space</ulink> (Simon Peyton Jones, Mark
2855 Jones, Erik Meijer).
2856 </para>
2857 <para>
2858 All the extensions are enabled by the <option>-fglasgow-exts</option> flag.
2859 </para>
2860
2861 <sect3>
2862 <title>Multi-parameter type classes</title>
2863 <para>
2864 Multi-parameter type classes are permitted. For example:
2865
2866
2867 <programlisting>
2868   class Collection c a where
2869     union :: c a -> c a -> c a
2870     ...etc.
2871 </programlisting>
2872
2873 </para>
2874 </sect3>
2875
2876 <sect3>
2877 <title>The superclasses of a class declaration</title>
2878
2879 <para>
2880 There are no restrictions on the context in a class declaration
2881 (which introduces superclasses), except that the class hierarchy must
2882 be acyclic.  So these class declarations are OK:
2883
2884
2885 <programlisting>
2886   class Functor (m k) => FiniteMap m k where
2887     ...
2888
2889   class (Monad m, Monad (t m)) => Transform t m where
2890     lift :: m a -> (t m) a
2891 </programlisting>
2892
2893
2894 </para>
2895 <para>
2896 As in Haskell 98, the class hierarchy must be acyclic.  However, the definition
2897 of "acyclic" involves only the superclass relationships.  For example,
2898 this is OK:
2899
2900
2901 <programlisting>
2902   class C a where {
2903     op :: D b => a -> b -> b
2904   }
2905
2906   class C a => D a where { ... }
2907 </programlisting>
2908
2909
2910 Here, <literal>C</literal> is a superclass of <literal>D</literal>, but it's OK for a
2911 class operation <literal>op</literal> of <literal>C</literal> to mention <literal>D</literal>.  (It
2912 would not be OK for <literal>D</literal> to be a superclass of <literal>C</literal>.)
2913 </para>
2914 </sect3>
2915
2916
2917
2918
2919 <sect3 id="class-method-types">
2920 <title>Class method types</title>
2921
2922 <para>
2923 Haskell 98 prohibits class method types from mentioning constraints on the
2924 class type variable, thus:
2925 <programlisting>
2926   class Seq s a where
2927     fromList :: [a] -> s a
2928     elem     :: Eq a => a -> s a -> Bool
2929 </programlisting>
2930 The type of <literal>elem</literal> is illegal in Haskell 98, because it
2931 contains the constraint <literal>Eq a</literal>, which constrains only the
2932 class type variable (in this case <literal>a</literal>).
2933 GHC lifts this restriction (flag <option>-XConstrainedClassMethods</option>).
2934 </para>
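<para>
A sketch of the <literal>Seq</literal> class with one instance follows.  The list instance and the
<literal>main</literal> driver are illustrative assumptions, and the Prelude's own
<literal>elem</literal> is hidden to avoid a name clash.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, ConstrainedClassMethods #-}
module Main where

import Prelude hiding (elem)

class Seq s a where
    fromList :: [a] -> s a
    elem     :: Eq a => a -> s a -> Bool   -- constrains only the class variable 'a'

-- Hypothetical instance: ordinary lists as sequences.
instance Seq [] a where
    fromList = id
    elem x   = any (== x)

main :: IO ()
main = do
  let xs = fromList "hello" :: [Char]
  print (elem 'e' xs)
  print (elem 'z' xs)
</programlisting>
</para>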
2935
2936
2937 </sect3>
2938 </sect2>
2939
2940 <sect2 id="functional-dependencies">
2941 <title>Functional dependencies
2942 </title>
2943
2944 <para> Functional dependencies are implemented as described by Mark Jones
2945 in &ldquo;<ulink url="http://citeseer.ist.psu.edu/jones00type.html">Type Classes with Functional Dependencies</ulink>&rdquo;,
2946 in Proceedings of the 9th European Symposium on Programming,
2947 ESOP 2000, Berlin, Germany, March 2000, Springer-Verlag LNCS 1782.
2949 </para>
2950 <para>
2951 Functional dependencies are introduced by a vertical bar in the syntax of a
2952 class declaration;  e.g.
2953 <programlisting>
2954   class (Monad m) => MonadState s m | m -> s where ...
2955
2956   class Foo a b c | a b -> c where ...
2957 </programlisting>
2958 There should be more documentation, but there isn't (yet).  Yell if you need it.
2959 </para>
2960
2961 <sect3><title>Rules for functional dependencies </title>
2962 <para>
2963 In a class declaration, all of the class type variables must be reachable (in the sense
2964 mentioned in <xref linkend="type-restrictions"/>)
2965 from the free variables of each method type.
2966 For example:
2967
2968 <programlisting>
2969   class Coll s a where
2970     empty  :: s
2971     insert :: s -> a -> s
2972 </programlisting>
2973
2974 is not OK, because the type of <literal>empty</literal> doesn't mention
2975 <literal>a</literal>.  Functional dependencies can make the type variable
2976 reachable:
2977 <programlisting>
2978   class Coll s a | s -> a where
2979     empty  :: s
2980     insert :: s -> a -> s
2981 </programlisting>
2982
2983 Alternatively <literal>Coll</literal> might be rewritten
2984
2985 <programlisting>
2986   class Coll s a where
2987     empty  :: s a
2988     insert :: s a -> a -> s a
2989 </programlisting>
2990
2991
2992 which makes the connection between the type of a collection of
2993 <literal>a</literal>'s (namely <literal>(s a)</literal>) and the element type <literal>a</literal>.
2994 Occasionally this really doesn't work, in which case you can split the
2995 class like this:
2996
2997
2998 <programlisting>
2999   class CollE s where
3000     empty  :: s
3001
3002   class CollE s => Coll s a where
3003     insert :: s -> a -> s
3004 </programlisting>
3005 </para>
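<para>
The functional-dependency version of <literal>Coll</literal> can be exercised with a sketch like the
one below.  The list instance and the <literal>main</literal> driver are assumptions made for
illustration.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}
module Main where

class Coll s a | s -> a where
    empty  :: s
    insert :: s -> a -> s

-- Hypothetical instance: a list of 'a' is a collection of 'a'.
instance Coll [a] a where
    empty      = []
    insert s x = x : s

main :: IO ()
main = putStrLn (insert (insert empty 'a') 'b')
</programlisting>
Because of the dependency <literal>s -> a</literal>, fixing the collection type to
<literal>String</literal> is enough to determine the element type, even for <literal>empty</literal>.
</para>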
3006 </sect3>
3007
3008
3009 <sect3>
3010 <title>Background on functional dependencies</title>
3011
3012 <para>The following description of the motivation and use of functional dependencies is taken
3013 from the Hugs user manual, reproduced here (with minor changes) by kind
3014 permission of Mark Jones.
3015 </para>
3016 <para>
3017 Consider the following class, intended as part of a
3018 library for collection types:
3019 <programlisting>
3020    class Collects e ce where
3021        empty  :: ce
3022        insert :: e -> ce -> ce
3023        member :: e -> ce -> Bool
3024 </programlisting>
3025 The type variable e used here represents the element type, while ce is the type
3026 of the container itself. Within this framework, we might want to define
3027 instances of this class for lists or characteristic functions (both of which
3028 can be used to represent collections of any equality type), bit sets (which can
3029 be used to represent collections of characters), or hash tables (which can be
3030 used to represent any collection whose elements have a hash function). Omitting
3031 standard implementation details, this would lead to the following declarations:
3032 <programlisting>
3033    instance Eq e => Collects e [e] where ...
3034    instance Eq e => Collects e (e -> Bool) where ...
3035    instance Collects Char BitSet where ...
3036    instance (Hashable e, Collects e ce)
3037               => Collects e (Array Int ce) where ...
3038 </programlisting>
3039 All this looks quite promising; we have a class and a range of interesting
3040 implementations. Unfortunately, there are some serious problems with the class
3041 declaration. First, the empty function has an ambiguous type:
3042 <programlisting>
3043    empty :: Collects e ce => ce
3044 </programlisting>
3045 By "ambiguous" we mean that there is a type variable e that appears on the left
3046 of the <literal>=&gt;</literal> symbol, but not on the right. The problem with
3047 this is that, according to the theoretical foundations of Haskell overloading,
3048 we cannot guarantee a well-defined semantics for any term with an ambiguous
3049 type.
3050 </para>
3051 <para>
3052 We can sidestep this specific problem by removing the empty member from the
3053 class declaration. However, although the remaining members, insert and member,
3054 do not have ambiguous types, we still run into problems when we try to use
3055 them. For example, consider the following two functions:
3056 <programlisting>
3057    f x y = insert x . insert y
3058    g     = f True 'a'
3059 </programlisting>
3060 for which GHC infers the following types:
3061 <programlisting>
3062    f :: (Collects a c, Collects b c) => a -> b -> c -> c
3063    g :: (Collects Bool c, Collects Char c) => c -> c
3064 </programlisting>
3065 Notice that the type for f allows the two parameters x and y to be assigned
3066 different types, even though it attempts to insert each of the two values, one
3067 after the other, into the same collection. If we're trying to model collections
3068 that contain only one type of value, then this is clearly an inaccurate
3069 type. Worse still, the definition for g is accepted, without causing a type
3070 error. As a result, the error in this code will not be flagged at the point
3071 where it appears. Instead, it will show up only when we try to use g, which
3072 might even be in a different module.
3073 </para>
3074
3075 <sect4><title>An attempt to use constructor classes</title>
3076
3077 <para>
3078 Faced with the problems described above, some Haskell programmers might be
3079 tempted to use something like the following version of the class declaration:
3080 <programlisting>
3081    class Collects e c where
3082       empty  :: c e
3083       insert :: e -> c e -> c e
3084       member :: e -> c e -> Bool
3085 </programlisting>
3086 The key difference here is that we abstract over the type constructor c that is
3087 used to form the collection type c e, and not over that collection type itself,
3088 represented by ce in the original class declaration. This avoids the immediate
3089 problems that we mentioned above: empty has type <literal>Collects e c => c
3090 e</literal>, which is not ambiguous.
3091 </para>
3092 <para>
3093 The function f from the previous section has a more accurate type:
3094 <programlisting>
3095    f :: (Collects e c) => e -> e -> c e -> c e
3096 </programlisting>
3097 The function g from the previous section is now rejected with a type error as
3098 we would hope because the type of f does not allow the two arguments to have
3099 different types.
3100 This, then, is an example of a multiple parameter class that does actually work
3101 quite well in practice, without ambiguity problems.
3102 There is, however, a catch. This version of the Collects class is nowhere near
3103 as general as the original class seemed to be: only one of the four instances
3104 for <literal>Collects</literal>
3105 given above can be used with this version of Collects because only one of
3106 them, the instance for lists, has a collection type that can be written in
3107 the form c e, for some type constructor c, and element type e.
3108 </para>
3109 </sect4>
3110
3111 <sect4><title>Adding functional dependencies</title>
3112
3113 <para>
3114 To get a more useful version of the Collects class, Hugs provides a mechanism
3115 that allows programmers to specify dependencies between the parameters of a
3116 multiple parameter class. (For readers with an interest in theoretical
3117 foundations and previous work: the use of dependency information can be seen
3118 either as a generalization of the proposal for `parametric type classes' that was
3119 put forward by Chen, Hudak, and Odersky, or as a special case of Mark Jones's
3120 later framework for "improvement" of qualified types. The
3121 underlying ideas are also discussed in a more theoretical and abstract setting
3122 in a manuscript [implparam], where they are identified as one point in a
3123 general design space for systems of implicit parameterization.)
3124
3125 To start with an abstract example, consider a declaration such as:
3126 <programlisting>
3127    class C a b where ...
3128 </programlisting>
3129 which tells us simply that C can be thought of as a binary relation on types
3130 (or type constructors, depending on the kinds of a and b). Extra clauses can be
3131 included in the definition of classes to add information about dependencies
3132 between parameters, as in the following examples:
3133 <programlisting>
3134    class D a b | a -> b where ...
3135    class E a b | a -> b, b -> a where ...
3136 </programlisting>
3137 The notation <literal>a -&gt; b</literal> used here between the | and where
3138 symbols --- not to be
3139 confused with a function type --- indicates that the a parameter uniquely
3140 determines the b parameter, and might be read as "a determines b." Thus D is
3141 not just a relation, but actually a (partial) function. Similarly, from the two
3142 dependencies that are included in the definition of E, we can see that E
represents a (partial) one-to-one mapping between types.
3144 </para>
3145 <para>
3146 More generally, dependencies take the form <literal>x1 ... xn -&gt; y1 ... ym</literal>,
where x1, ..., xn, and y1, ..., ym are type variables with n&gt;0 and
3148 m&gt;=0, meaning that the y parameters are uniquely determined by the x
3149 parameters. Spaces can be used as separators if more than one variable appears
3150 on any single side of a dependency, as in <literal>t -&gt; a b</literal>. Note that a class may be
3151 annotated with multiple dependencies using commas as separators, as in the
3152 definition of E above. Some dependencies that we can write in this notation are
3153 redundant, and will be rejected because they don't serve any useful
3154 purpose, and may instead indicate an error in the program. Examples of
dependencies like this include <literal>a -&gt; a</literal>,
<literal>a -&gt; a a</literal>,
<literal>a -&gt;</literal>, etc. There can also be
some redundancy if multiple dependencies are given, as in
<literal>a -&gt; b</literal>,
<literal>b -&gt; c</literal>, <literal>a -&gt; c</literal>,
in which some subset implies the remaining dependencies. Examples like this are
3162 not treated as errors. Note that dependencies appear only in class
3163 declarations, and not in any other part of the language. In particular, the
3164 syntax for instance declarations, class constraints, and types is completely
3165 unchanged.
3166 </para>
3167 <para>
3168 By including dependencies in a class declaration, we provide a mechanism for
3169 the programmer to specify each multiple parameter class more precisely. The
3170 compiler, on the other hand, is responsible for ensuring that the set of
3171 instances that are in scope at any given point in the program is consistent
3172 with any declared dependencies. For example, the following pair of instance
3173 declarations cannot appear together in the same scope because they violate the
3174 dependency for D, even though either one on its own would be acceptable:
3175 <programlisting>
3176    instance D Bool Int where ...
3177    instance D Bool Char where ...
3178 </programlisting>
3179 Note also that the following declaration is not allowed, even by itself:
3180 <programlisting>
3181    instance D [a] b where ...
3182 </programlisting>
3183 The problem here is that this instance would allow one particular choice of [a]
3184 to be associated with more than one choice for b, which contradicts the
3185 dependency specified in the definition of D. More generally, this means that,
3186 in any instance of the form:
3187 <programlisting>
3188    instance D t s where ...
3189 </programlisting>
3190 for some particular types t and s, the only variables that can appear in s are
3191 the ones that appear in t, and hence, if the type t is known, then s will be
3192 uniquely determined.
3193 </para>
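<para>
As a minimal sketch of how a dependency takes part in type inference, consider
the following module. The class <literal>Convert</literal>, its instances and
the name <literal>total</literal> are hypothetical and chosen only for
illustration:
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}

-- Hypothetical class: the argument type a determines the result type b.
class Convert a b | a -> b where
   convert :: a -> b

instance Convert Bool Int where
   convert True  = 1
   convert False = 0

-- Allowed: the second parameter, Int, contains no type variables,
-- so it is trivially determined by [a].
instance Convert [a] Int where
   convert = length

-- Neither call to convert is annotated; the dependency fixes both
-- results to Int once the argument types are known.
total :: String
total = show (convert True + convert "abc")
</programlisting>
Under these assumptions <literal>total</literal> should evaluate to
<literal>"4"</literal>. Adding a further instance such as
<literal>instance Convert Bool Char</literal> would conflict with the
dependency in the way described above, so the module would be rejected.
</para>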
3194 <para>
3195 The benefit of including dependency information is that it allows us to define
3196 more general multiple parameter classes, without ambiguity problems, and with
3197 the benefit of more accurate types. To illustrate this, we return to the
3198 collection class example, and annotate the original definition of <literal>Collects</literal>
3199 with a simple dependency:
3200 <programlisting>
3201    class Collects e ce | ce -> e where
3202       empty  :: ce
3203       insert :: e -> ce -> ce
3204       member :: e -> ce -> Bool
3205 </programlisting>
3206 The dependency <literal>ce -&gt; e</literal> here specifies that the type e of elements is uniquely
3207 determined by the type of the collection ce. Note that both parameters of
3208 Collects are of kind *; there are no constructor classes here. Note too that
3209 all of the instances of Collects that we gave earlier can be used
3210 together with this new definition.
3211 </para>
3212 <para>
3213 What about the ambiguity problems that we encountered with the original
3214 definition? The empty function still has type Collects e ce => ce, but it is no
3215 longer necessary to regard that as an ambiguous type: Although the variable e
3216 does not appear on the right of the => symbol, the dependency for class
3217 Collects tells us that it is uniquely determined by ce, which does appear on
3218 the right of the => symbol. Hence the context in which empty is used can still
3219 give enough information to determine types for both ce and e, without
3220 ambiguity. More generally, we need only regard a type as ambiguous if it
3221 contains a variable on the left of the => that is not uniquely determined
3222 (either directly or indirectly) by the variables on the right.
3223 </para>
3224 <para>
3225 Dependencies also help to produce more accurate types for user defined
3226 functions, and hence to provide earlier detection of errors, and less cluttered
3227 types for programmers to work with. Recall the previous definition for a
3228 function f:
3229 <programlisting>
   f x y = insert x . insert y
3231 </programlisting>
3232 for which we originally obtained a type:
3233 <programlisting>
3234    f :: (Collects a c, Collects b c) => a -> b -> c -> c
3235 </programlisting>
3236 Given the dependency information that we have for Collects, however, we can
deduce that a and b must be equal because they both appear as the first
parameter in a Collects constraint with the same second parameter c. Hence we
3239 can infer a shorter and more accurate type for f:
3240 <programlisting>
3241    f :: (Collects a c) => a -> a -> c -> c
3242 </programlisting>
3243 In a similar way, the earlier definition of g will now be flagged as a type error.
3244 </para>
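<para>
The pieces of this section can be put together in one small module. This is
only a sketch: the list instance is written out here as an assumption about
what the earlier instance looks like, and <literal>example</literal> is a
hypothetical name.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

class Collects e ce | ce -> e where
   empty  :: ce
   insert :: e -> ce -> ce
   member :: e -> ce -> Bool

-- Lists as collections; the Eq e context is needed by member.
instance Eq e => Collects e [e] where
   empty  = []
   insert = (:)
   member = elem

-- The shorter type derived in the text: both elements have the same type.
f :: Collects a c => a -> a -> c -> c
f x y = insert x . insert y

-- empty is used at type String here, so the dependency ce -> e
-- fixes the element type to Char without any ambiguity.
example :: String
example = f 'a' 'b' empty
</programlisting>
With these definitions <literal>example</literal> should evaluate to
<literal>"ab"</literal>, and a call such as <literal>f True 'a'</literal>
should be rejected because the two element arguments have different types.
</para>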
3245 <para>
3246 Although we have given only a few examples here, it should be clear that the
3247 addition of dependency information can help to make multiple parameter classes
3248 more useful in practice, avoiding ambiguity problems, and allowing more general
3249 sets of instance declarations.
3250 </para>
3251 </sect4>
3252 </sect3>
3253 </sect2>
3254
3255 <sect2 id="instance-decls">
3256 <title>Instance declarations</title>
3257
3258 <sect3 id="instance-rules">
3259 <title>Relaxed rules for instance declarations</title>
3260
3261 <para>An instance declaration has the form
3262 <screen>
3263   instance ( <replaceable>assertion</replaceable><subscript>1</subscript>, ..., <replaceable>assertion</replaceable><subscript>n</subscript>) =&gt; <replaceable>class</replaceable> <replaceable>type</replaceable><subscript>1</subscript> ... <replaceable>type</replaceable><subscript>m</subscript> where ...
3264 </screen>
3265 The part before the "<literal>=&gt;</literal>" is the
3266 <emphasis>context</emphasis>, while the part after the
3267 "<literal>=&gt;</literal>" is the <emphasis>head</emphasis> of the instance declaration.
3268 </para>
3269
3270 <para>
3271 In Haskell 98 the head of an instance declaration
3272 must be of the form <literal>C (T a1 ... an)</literal>, where
3273 <literal>C</literal> is the class, <literal>T</literal> is a type constructor,
3274 and the <literal>a1 ... an</literal> are distinct type variables.
3275 Furthermore, the assertions in the context of the instance declaration
3276 must be of the form <literal>C a</literal> where <literal>a</literal>
3277 is a type variable that occurs in the head.
3278 </para>
3279 <para>
3280 The <option>-XFlexibleInstances</option> flag loosens these restrictions
3281 considerably.  Firstly, multi-parameter type classes are permitted.  Secondly,
3282 the context and head of the instance declaration can each consist of arbitrary
3283 (well-kinded) assertions <literal>(C t1 ... tn)</literal> subject only to the
3284 following rules:
3285 <orderedlist>
3286 <listitem><para>
3287 The Paterson Conditions: for each assertion in the context
3288 <orderedlist>
3289 <listitem><para>No type variable has more occurrences in the assertion than in the head</para></listitem>
3290 <listitem><para>The assertion has fewer constructors and variables (taken together
3291       and counting repetitions) than the head</para></listitem>
3292 </orderedlist>
3293 </para></listitem>
3294
3295 <listitem><para>The Coverage Condition.  For each functional dependency,
3296 <replaceable>tvs</replaceable><subscript>left</subscript> <literal>-&gt;</literal>
3297 <replaceable>tvs</replaceable><subscript>right</subscript>,  of the class,
3298 every type variable in
3299 S(<replaceable>tvs</replaceable><subscript>right</subscript>) must appear in
3300 S(<replaceable>tvs</replaceable><subscript>left</subscript>), where S is the
3301 substitution mapping each type variable in the class declaration to the
3302 corresponding type in the instance declaration.
3303 </para></listitem>
3304 </orderedlist>
3305 These restrictions ensure that context reduction terminates: each reduction
3306 step makes the problem smaller by at least one
3307 constructor.  Both the Paterson Conditions and the Coverage Condition are lifted
3308 if you give the <option>-XUndecidableInstances</option>
3309 flag (<xref linkend="undecidable-instances"/>).
3310 You can find lots of background material about the reason for these
3311 restrictions in the paper <ulink
3312 url="http://research.microsoft.com/%7Esimonpj/papers/fd%2Dchr/">
3313 Understanding functional dependencies via Constraint Handling Rules</ulink>.
3314 </para>
3315 <para>
3316 For example, these are OK:
3317 <programlisting>
3318   instance C Int [a]          -- Multiple parameters
3319   instance Eq (S [a])         -- Structured type in head
3320
3321       -- Repeated type variable in head
3322   instance C4 a a => C4 [a] [a]
3323   instance Stateful (ST s) (MutVar s)
3324
3325       -- Head can consist of type variables only
3326   instance C a
3327   instance (Eq a, Show b) => C2 a b
3328
3329       -- Non-type variables in context
3330   instance Show (s a) => Show (Sized s a)
3331   instance C2 Int a => C3 Bool [a]
3332   instance C2 Int a => C3 [a] b
3333 </programlisting>
3334 But these are not:
3335 <programlisting>
3336       -- Context assertion no smaller than head
3337   instance C a => C a where ...
      -- (C b b) has more occurrences of b than the head
3339   instance C b b => Foo [b] where ...
3340 </programlisting>
3341 </para>
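<para>
The following sketch shows a few of the permitted forms in a module that can
be loaded as it stands. The class <literal>Pretty</literal> is hypothetical,
and <option>-XFlexibleContexts</option> is switched on as well only as a
precaution for the last instance:
<programlisting>
{-# LANGUAGE FlexibleInstances, FlexibleContexts #-}

-- Hypothetical class used only for illustration.
class Pretty a where
   pretty :: a -> String

-- Structured type in the head: [Char] rather than [a].
instance Pretty [Char] where
   pretty s = "'" ++ s ++ "'"

-- Another structured head: Maybe applied to a concrete type.
instance Pretty (Maybe Int) where
   pretty Nothing  = "none"
   pretty (Just n) = "just " ++ show n

newtype Sized s a = Sized (s a)

-- Non-type-variable assertion in the context; it is smaller than the
-- head, so the Paterson Conditions are satisfied.
instance Pretty (s a) => Pretty (Sized s a) where
   pretty (Sized x) = "sized " ++ pretty x
</programlisting>
For instance, <literal>pretty (Sized (Just (3 :: Int)))</literal> should
give <literal>"sized just 3"</literal>.
</para>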
3342
3343 <para>
3344 The same restrictions apply to instances generated by
3345 <literal>deriving</literal> clauses.  Thus the following is accepted:
3346 <programlisting>
3347   data MinHeap h a = H a (h a)
3348     deriving (Show)
3349 </programlisting>
3350 because the derived instance
3351 <programlisting>
3352   instance (Show a, Show (h a)) => Show (MinHeap h a)
3353 </programlisting>
3354 conforms to the above rules.
3355 </para>
3356
3357 <para>
3358 A useful idiom permitted by the above rules is as follows.
3359 If one allows overlapping instance declarations then it's quite
3360 convenient to have a "default instance" declaration that applies if
3361 something more specific does not:
3362 <programlisting>
3363   instance C a where
3364     op = ... -- Default
3365 </programlisting>
3366 </para>
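<para>
A minimal sketch of the idiom, assuming overlapping instances are enabled for
the module (see <xref linkend="instance-overlap"/>); the class
<literal>Describe</literal> is hypothetical:
<programlisting>
{-# LANGUAGE FlexibleInstances, OverlappingInstances #-}

-- Hypothetical class used only for illustration.
class Describe a where
   describe :: a -> String

-- Default instance: applies when nothing more specific matches.
instance Describe a where
   describe _ = "some value"

-- More specific instances take priority over the default.
instance Describe Bool where
   describe b = if b then "yes" else "no"

instance Describe Int where
   describe n = "the Int " ++ show n
</programlisting>
Here <literal>describe True</literal> should give <literal>"yes"</literal>,
while <literal>describe 'x'</literal> falls back to the default and should give
<literal>"some value"</literal>.
</para>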
3367 </sect3>
3368
3369 <sect3 id="undecidable-instances">
3370 <title>Undecidable instances</title>
3371
3372 <para>
3373 Sometimes even the rules of <xref linkend="instance-rules"/> are too onerous.
3374 For example, sometimes you might want to use the following to get the
3375 effect of a "class synonym":
3376 <programlisting>
3377   class (C1 a, C2 a, C3 a) => C a where { }
3378
3379   instance (C1 a, C2 a, C3 a) => C a where { }
3380 </programlisting>
3381 This allows you to write shorter signatures:
3382 <programlisting>
3383   f :: C a => ...
3384 </programlisting>
3385 instead of
3386 <programlisting>
3387   f :: (C1 a, C2 a, C3 a) => ...
3388 </programlisting>
3389 The restrictions on functional dependencies (<xref
3390 linkend="functional-dependencies"/>) are particularly troublesome.
3391 It is tempting to introduce type variables in the context that do not appear in
3392 the head, something that is excluded by the normal rules. For example:
3393 <programlisting>
3394   class HasConverter a b | a -> b where
3395      convert :: a -> b
3396
3397   data Foo a = MkFoo a
3398
3399   instance (HasConverter a b,Show b) => Show (Foo a) where
3400      show (MkFoo value) = show (convert value)
3401 </programlisting>
3402 This is dangerous territory, however. Here, for example, is a program that would make the
3403 typechecker loop:
3404 <programlisting>
3405   class D a
3406   class F a b | a->b
3407   instance F [a] [[a]]
3408   instance (D c, F a c) => D [a]   -- 'c' is not mentioned in the head
3409 </programlisting>
3410 Similarly, it can be tempting to lift the coverage condition:
3411 <programlisting>
3412   class Mul a b c | a b -> c where
3413         (.*.) :: a -> b -> c
3414
3415   instance Mul Int Int Int where (.*.) = (*)
3416   instance Mul Int Float Float where x .*. y = fromIntegral x * y
3417   instance Mul a b c => Mul a [b] [c] where x .*. v = map (x.*.) v
3418 </programlisting>
3419 The third instance declaration does not obey the coverage condition;
3420 and indeed the (somewhat strange) definition:
3421 <programlisting>
3422   f = \ b x y -> if b then x .*. [y] else y
3423 </programlisting>
3424 makes instance inference go into a loop, because it requires the constraint
3425 <literal>(Mul a [b] b)</literal>.
3426 </para>
3427 <para>
3428 Nevertheless, GHC allows you to experiment with more liberal rules.  If you use
3429 the experimental flag <option>-XUndecidableInstances</option>
3430 <indexterm><primary>-XUndecidableInstances</primary></indexterm>,
3431 both the Paterson Conditions and the Coverage Condition
3432 (described in <xref linkend="instance-rules"/>) are lifted.  Termination is ensured by having a
3433 fixed-depth recursion stack.  If you exceed the stack depth you get a
3434 sort of backtrace, and the opportunity to increase the stack depth
3435 with <option>-fcontext-stack=</option><emphasis>N</emphasis>.
3436 </para>
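<para>
As a concrete sketch of the "class synonym" idiom, the module below uses two
standard classes in place of <literal>C1</literal>, <literal>C2</literal> and
<literal>C3</literal>. The names <literal>ShowOrd</literal> and
<literal>largest</literal> are hypothetical:
<programlisting>
{-# LANGUAGE FlexibleInstances, UndecidableInstances #-}

-- Hypothetical class synonym standing for (Show a, Ord a).
class (Show a, Ord a) => ShowOrd a

-- The context is no smaller than the head, so this instance
-- needs the Paterson Conditions to be lifted.
instance (Show a, Ord a) => ShowOrd a

-- One constraint in the signature instead of two.
largest :: ShowOrd a => [a] -> String
largest = show . maximum
</programlisting>
With these definitions <literal>largest [3, 1, 2 :: Int]</literal> should give
<literal>"3"</literal>.
</para>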
3437
3438 </sect3>
3439
3440
3441 <sect3 id="instance-overlap">
3442 <title>Overlapping instances</title>
3443 <para>
In general, <emphasis>GHC requires that it be unambiguous which instance
3445 declaration
3446 should be used to resolve a type-class constraint</emphasis>. This behaviour
3447 can be modified by two flags: <option>-XOverlappingInstances</option>
3448 <indexterm><primary>-XOverlappingInstances
3449 </primary></indexterm>
3450 and <option>-XIncoherentInstances</option>
3451 <indexterm><primary>-XIncoherentInstances
3452 </primary></indexterm>, as this section discusses.  Both these
3453 flags are dynamic flags, and can be set on a per-module basis, using
3454 an <literal>OPTIONS_GHC</literal> pragma if desired (<xref linkend="source-file-options"/>).</para>
3455 <para>
3456 When GHC tries to resolve, say, the constraint <literal>C Int Bool</literal>,
3457 it tries to match every instance declaration against the
3458 constraint,
3459 by instantiating the head of the instance declaration.  For example, consider
3460 these declarations:
3461 <programlisting>
3462   instance context1 => C Int a     where ...  -- (A)
3463   instance context2 => C a   Bool  where ...  -- (B)
3464   instance context3 => C Int [a]   where ...  -- (C)
3465   instance context4 => C Int [Int] where ...  -- (D)
3466 </programlisting>
3467 The instances (A) and (B) match the constraint <literal>C Int Bool</literal>,
3468 but (C) and (D) do not.  When matching, GHC takes
3469 no account of the context of the instance declaration
3470 (<literal>context1</literal> etc).
3471 GHC's default behaviour is that <emphasis>exactly one instance must match the
3472 constraint it is trying to resolve</emphasis>.
3473 It is fine for there to be a <emphasis>potential</emphasis> of overlap (by
3474 including both declarations (A) and (B), say); an error is only reported if a
3475 particular constraint matches more than one.
3476 </para>
3477
3478 <para>
3479 The <option>-XOverlappingInstances</option> flag instructs GHC to allow
3480 more than one instance to match, provided there is a most specific one.  For
3481 example, the constraint <literal>C Int [Int]</literal> matches instances (A),
3482 (C) and (D), but the last is more specific, and hence is chosen.  If there is no
3483 most-specific match, the program is rejected.
3484 </para>
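<para>
The selection of the most specific instance can be observed with a sketch like
the following. The method <literal>which</literal> and the list
<literal>results</literal> are hypothetical, and instance (B) is left out so
that every constraint used here has a most specific match:
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, OverlappingInstances #-}

class C a b where
   which :: a -> b -> String

instance C Int a     where which _ _ = "(A)"
instance C Int [a]   where which _ _ = "(C)"
instance C Int [Int] where which _ _ = "(D)"

results :: [String]
results =
   [ which (1 :: Int) True          -- only (A) matches
   , which (1 :: Int) "xs"          -- (A) and (C) match; (C) is more specific
   , which (1 :: Int) [2 :: Int]    -- (A), (C) and (D) match; (D) is most specific
   ]
</programlisting>
Under these assumptions <literal>results</literal> should be
<literal>["(A)","(C)","(D)"]</literal>.
</para>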
3485 <para>
3486 However, GHC is conservative about committing to an overlapping instance.  For example:
3487 <programlisting>
3488   f :: [b] -> [b]
3489   f x = ...
3490 </programlisting>
3491 Suppose that from the RHS of <literal>f</literal> we get the constraint
3492 <literal>C Int [b]</literal>.  But
3493 GHC does not commit to instance (C), because in a particular
call of <literal>f</literal>, <literal>b</literal> might be instantiated
3495 to <literal>Int</literal>, in which case instance (D) would be more specific still.
3496 So GHC rejects the program.
3497 (If you add the flag <option>-XIncoherentInstances</option>,
3498 GHC will instead pick (C), without complaining about
3499 the problem of subsequent instantiations.)
3500 </para>
3501 <para>
3502 Notice that we gave a type signature to <literal>f</literal>, so GHC had to
3503 <emphasis>check</emphasis> that <literal>f</literal> has the specified type.
3504 Suppose instead we do not give a type signature, asking GHC to <emphasis>infer</emphasis>
3505 it instead.  In this case, GHC will refrain from
3506 simplifying the constraint <literal>C Int [b]</literal> (for the same reason
3507 as before) but, rather than rejecting the program, it will infer the type
3508 <programlisting>
3509   f :: C Int [b] => [b] -> [b]
3510 </programlisting>
3511 That postpones the question of which instance to pick to the
3512 call site for <literal>f</literal>
3513 by which time more is known about the type <literal>b</literal>.
3514 You can write this type signature yourself if you use the
3515 <link linkend="flexible-contexts"><option>-XFlexibleContexts</option></link>
3516 flag.
3517 </para>
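<para>
A sketch of this postponement, with only instances (C) and (D) in scope. The
method <literal>tag</literal> and the names <literal>atChar</literal> and
<literal>atInt</literal> are hypothetical:
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, FlexibleContexts, OverlappingInstances #-}

class C a b where
   tag :: a -> b -> String

instance C Int [a]   where tag _ _ = "(C)"
instance C Int [Int] where tag _ _ = "(D)"

-- The constraint stays in the signature, so the instance is
-- chosen separately at each call site of f.
f :: C Int [b] => [b] -> String
f xs = tag (0 :: Int) xs

atChar, atInt :: String
atChar = f "ab"          -- b is Char: only (C) matches
atInt  = f [1 :: Int]    -- b is Int: (D) is more specific
</programlisting>
Here <literal>atChar</literal> should be <literal>"(C)"</literal> and
<literal>atInt</literal> should be <literal>"(D)"</literal>.
</para>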
3518 <para>
3519 Exactly the same situation can arise in instance declarations themselves.  Suppose we have
3520 <programlisting>
3521   class Foo a where
3522      f :: a -> a
3523   instance Foo [b] where
3524      f x = ...
3525 </programlisting>
3526 and, as before, the constraint <literal>C Int [b]</literal> arises from <literal>f</literal>'s
3527 right hand side.  GHC will reject the instance, complaining as before that it does not know how to resolve
3528 the constraint <literal>C Int [b]</literal>, because it matches more than one instance
3529 declaration.  The solution is to postpone the choice by adding the constraint to the context
3530 of the instance declaration, thus:
3531 <programlisting>
3532   instance C Int [b] => Foo [b] where
3533      f x = ...
3534 </programlisting>
3535 (You need <link linkend="instance-rules"><option>-XFlexibleInstances</option></link> to do this.)
3536 </para>
3537 <para>
3538 The willingness to be overlapped or incoherent is a property of
3539 the <emphasis>instance declaration</emphasis> itself, controlled by the
3540 presence or otherwise of the <option>-XOverlappingInstances</option>
and <option>-XIncoherentInstances</option> flags when the module containing
the instance is compiled.  Neither flag is required in a module that imports and uses the
3543 instance declaration.  Specifically, during the lookup process:
3544 <itemizedlist>
3545 <listitem><para>
3546 An instance declaration is ignored during the lookup process if (a) a more specific
3547 match is found, and (b) the instance declaration was compiled with
3548 <option>-XOverlappingInstances</option>.  The flag setting for the
3549 more-specific instance does not matter.
3550 </para></listitem>
3551 <listitem><para>
3552 Suppose an instance declaration does not match the constraint being looked up, but
3553 does unify with it, so that it might match when the constraint is further
instantiated.  Usually GHC will regard this as a reason for not committing to
some other instance.  But if the instance declaration was compiled with
3556 <option>-XIncoherentInstances</option>, GHC will skip the "does-it-unify?"
3557 check for that declaration.
3558 </para></listitem>
3559 </itemizedlist>
3560 These rules make it possible for a library author to design a library that relies on
3561 overlapping instances without the library client having to know.
3562 </para>
3563 <para>
3564 If an instance declaration is compiled without
3565 <option>-XOverlappingInstances</option>,
3566 then that instance can never be overlapped.  This could perhaps be
3567 inconvenient.  Perhaps the rule should instead say that the
3568 <emphasis>overlapping</emphasis> instance declaration should be compiled in
3569 this way, rather than the <emphasis>overlapped</emphasis> one.  Perhaps overlap
3570 at a usage site should be permitted regardless of how the instance declarations
3571 are compiled, if the <option>-XOverlappingInstances</option> flag is
3572 used at the usage site.  (Mind you, the exact usage site can occasionally be
3573 hard to pin down.)  We are interested to receive feedback on these points.
3574 </para>
3575 <para>The <option>-XIncoherentInstances</option> flag implies the
3576 <option>-XOverlappingInstances</option> flag, but not vice versa.
3577 </para>
3578 </sect3>
3579
3580 <sect3>
3581 <title>Type synonyms in the instance head</title>
3582
3583 <para>
3584 <emphasis>Unlike Haskell 98, instance heads may use type
3585 synonyms</emphasis>.  (The instance "head" is the bit after the "=>" in an instance decl.)
3586 As always, using a type synonym is just shorthand for
3587 writing the RHS of the type synonym definition.  For example:
3588
3589
3590 <programlisting>
3591   type Point = (Int,Int)
3592   instance C Point   where ...
3593   instance C [Point] where ...
3594 </programlisting>
3595
3596
3597 is legal.  However, if you added
3598
3599
3600 <programlisting>
3601   instance C (Int,Int) where ...
3602 </programlisting>
3603
3604
3605 as well, then the compiler will complain about the overlapping
3606 (actually, identical) instance declarations.  As always, type synonyms
3607 must be fully applied.  You cannot, for example, write:
3608
3609
3610 <programlisting>
3611   type P a = [[a]]
3612   instance Monad P where ...
3613 </programlisting>
3614
3615
3616 This design decision is independent of all the others, and easily
3617 reversed, but it makes sense to me.
3618
3619 </para>
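<para>
A small sketch of synonyms in instance heads. The class
<literal>Norm</literal> is hypothetical, and the two flags named in the pragma
are an assumption about what is needed to enable this behaviour:
<programlisting>
{-# LANGUAGE TypeSynonymInstances, FlexibleInstances #-}

type Point = (Int, Int)

-- Hypothetical class used only for illustration.
class Norm a where
   norm1 :: a -> Int

-- Equivalent to writing: instance Norm (Int, Int)
instance Norm Point where
   norm1 (x, y) = abs x + abs y

-- Equivalent to writing: instance Norm [(Int, Int)]
instance Norm [Point] where
   norm1 = sum . map norm1
</programlisting>
With these definitions
<literal>norm1 ([(1, -2), (3, 4)] :: [Point])</literal> should give
<literal>10</literal>.
</para>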
3620 </sect3>
3621
3622
3623 </sect2>
3624
3625 <sect2 id="overloaded-strings">
3626 <title>Overloaded string literals
3627 </title>
3628
3629 <para>
3630 GHC supports <emphasis>overloaded string literals</emphasis>.  Normally a
3631 string literal has type <literal>String</literal>, but with overloaded string
3632 literals enabled (with <literal>-XOverloadedStrings</literal>)
3633  a string literal has type <literal>(IsString a) => a</literal>.
3634 </para>
3635 <para>
3636 This means that the usual string syntax can be used, e.g., for packed strings
and other variations of string-like types.  String literals behave very much
like integer literals, i.e., they can be used in both expressions and patterns.
If used in a pattern the literal will be replaced by an equality test, in the same
3640 way as an integer literal is.
3641 </para>
3642 <para>
3643 The class <literal>IsString</literal> is defined as:
3644 <programlisting>
3645 class IsString a where
3646     fromString :: String -> a
3647 </programlisting>
3648 The only predefined instance is the obvious one to make strings work as usual:
3649 <programlisting>
3650 instance IsString [Char] where
3651     fromString cs = cs
3652 </programlisting>
3653 The class <literal>IsString</literal> is not in scope by default.  If you want to mention
3654 it explicitly (for example, to give an instance declaration for it), you can import it
3655 from module <literal>GHC.Exts</literal>.
3656 </para>
3657 <para>
3658 Haskell's defaulting mechanism is extended to cover string literals, when <option>-XOverloadedStrings</option> is specified.
3659 Specifically:
3660 <itemizedlist>
3661 <listitem><para>
3662 Each type in a default declaration must be an
3663 instance of <literal>Num</literal> <emphasis>or</emphasis> of <literal>IsString</literal>.
3664 </para></listitem>
3665
3666 <listitem><para>
3667 The standard defaulting rule (<ulink url="http://www.haskell.org/onlinereport/decls.html#sect4.3.4">Haskell Report, Section 4.3.4</ulink>)
3668 is extended thus: defaulting applies when all the unresolved constraints involve standard classes
3669 <emphasis>or</emphasis> <literal>IsString</literal>; and at least one is a numeric class
3670 <emphasis>or</emphasis> <literal>IsString</literal>.
3671 </para></listitem>
3672 </itemizedlist>
3673 </para>
3674 <para>
3675 A small example:
3676 <programlisting>
{-# LANGUAGE OverloadedStrings #-}
module Main where
3678
3679 import GHC.Exts( IsString(..) )
3680
3681 newtype MyString = MyString String deriving (Eq, Show)
3682 instance IsString MyString where
3683     fromString = MyString
3684
3685 greet :: MyString -> MyString
3686 greet "hello" = "world"
3687 greet other = other
3688
3689 main = do
3690     print $ greet "hello"
3691     print $ greet "fool"
3692 </programlisting>
3693 </para>
3694 <para>
3695 Note that deriving <literal>Eq</literal> is necessary for the pattern matching
3696 to work since it gets translated into an equality comparison.
3697 </para>
3698 </sect2>
3699
3700 </sect1>
3701
3702 <sect1 id="other-type-extensions">
3703 <title>Other type system extensions</title>
3704
3705 <sect2 id="type-restrictions">
3706 <title>Type signatures</title>
3707
3708 <sect3 id="flexible-contexts"><title>The context of a type signature</title>
3709 <para>
3710 The <option>-XFlexibleContexts</option> flag lifts the Haskell 98 restriction
3711 that the type-class constraints in a type signature must have the
3712 form <emphasis>(class type-variable)</emphasis> or
3713 <emphasis>(class (type-variable type-variable ...))</emphasis>.
3714 With <option>-XFlexibleContexts</option>
these type signatures are perfectly OK:
3716 <programlisting>
3717   g :: Eq [a] => ...
3718   g :: Ord (T a ()) => ...
3719 </programlisting>
3720 </para>
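<para>
A complete definition using such a signature might look like the sketch below;
the function <literal>sameList</literal> is hypothetical:
<programlisting>
{-# LANGUAGE FlexibleContexts #-}

-- The constraint mentions the structured type [a], not just a type variable.
sameList :: Eq [a] => [a] -> [a] -> Bool
sameList xs ys = xs == ys
</programlisting>
For example, <literal>sameList "ab" "ab"</literal> should give
<literal>True</literal>.
</para>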
3721 <para>
3722 GHC imposes the following restrictions on the constraints in a type signature.
3723 Consider the type:
3724
3725 <programlisting>
  forall tv1..tvn. (c1, ...,cn) => type
3727 </programlisting>
3728
3729 (Here, we write the "foralls" explicitly, although the Haskell source
3730 language omits them; in Haskell 98, all the free type variables of an
3731 explicit source-language type signature are universally quantified,
3732 except for the class type variables in a class declaration.  However,
3733 in GHC, you can give the foralls if you want.  See <xref linkend="universal-quantification"/>).
3734 </para>
3735
3736 <para>
3737
3738 <orderedlist>
3739 <listitem>
3740
3741 <para>
3742  <emphasis>Each universally quantified type variable
3743 <literal>tvi</literal> must be reachable from <literal>type</literal></emphasis>.
3744
3745 A type variable <literal>a</literal> is "reachable" if it appears
3746 in the same constraint as either a type variable free in
3747 <literal>type</literal>, or another reachable type variable.
3748 A value with a type that does not obey
3749 this reachability restriction cannot be used without introducing
3750 ambiguity; that is why the type is rejected.
3751 Here, for example, is an illegal type:
3752
3753
3754 <programlisting>
3755   forall a. Eq a => Int
3756 </programlisting>
3757
3758
3759 When a value with this type was used, the constraint <literal>Eq tv</literal>
3760 would be introduced where <literal>tv</literal> is a fresh type variable, and
3761 (in the dictionary-translation implementation) the value would be
3762 applied to a dictionary for <literal>Eq tv</literal>.  The difficulty is that we
3763 can never know which instance of <literal>Eq</literal> to use because we never
3764 get any more information about <literal>tv</literal>.
3765 </para>
3766 <para>
3767 Note
3768 that the reachability condition is weaker than saying that <literal>a</literal> is
3769 functionally dependent on a type variable free in
3770 <literal>type</literal> (see <xref
3771 linkend="functional-dependencies"/>).  The reason for this is there
3772 might be a "hidden" dependency, in a superclass perhaps.  So
3773 "reachable" is a conservative approximation to "functionally dependent".
3774 For example, consider:
3775 <programlisting>
3776   class C a b | a -> b where ...
3777   class C a b => D a b where ...
3778   f :: forall a b. D a b => a -> a
3779 </programlisting>
3780 This is fine, because in fact <literal>a</literal> does functionally determine <literal>b</literal>
3781 but that is not immediately apparent from <literal>f</literal>'s type.
3782 </para>
3783 </listitem>
3784 <listitem>
3785
3786 <para>
3787  <emphasis>Every constraint <literal>ci</literal> must mention at least one of the
3788 universally quantified type variables <literal>tvi</literal></emphasis>.
3789
3790 For example, this type is OK because <literal>C a b</literal> mentions the
universally quantified type variable <literal>a</literal>:
3792
3793
3794 <programlisting>
3795   forall a. C a b => burble
3796 </programlisting>
3797
3798
3799 The next type is illegal because the constraint <literal>Eq b</literal> does not
3800 mention <literal>a</literal>:
3801
3802
3803 <programlisting>
3804   forall a. Eq b => burble
3805 </programlisting>
3806
3807
3808 The reason for this restriction is milder than the other one.  The
3809 excluded types are never useful or necessary (because the offending
3810 context doesn't need to be witnessed at this point; it can be floated
3811 out).  Furthermore, floating them out increases sharing. Lastly,
3812 excluding them is a conservative choice; it leaves a patch of
3813 territory free in case we need it later.
3814
3815 </para>
3816 </listitem>
3817
3818 </orderedlist>
3819
3820 </para>
3821 </sect3>
3822
3823
3824
3825 </sect2>
3826
3827 <sect2 id="implicit-parameters">
3828 <title>Implicit parameters</title>
3829
3830 <para> Implicit parameters are implemented as described in
3831 "Implicit parameters: dynamic scoping with static types",
3832 J Lewis, MB Shields, E Meijer, J Launchbury,
3833 27th ACM Symposium on Principles of Programming Languages (POPL'00),
3834 Boston, Jan 2000.
3835 </para>
3836
3837 <para>(Most of the following, still rather incomplete, documentation is
3838 due to Jeff Lewis.)</para>
3839
3840 <para>Implicit parameter support is enabled with the option
3841 <option>-XImplicitParams</option>.</para>
3842
3843 <para>
3844 A variable is called <emphasis>dynamically bound</emphasis> when it is bound by the calling
3845 context of a function and <emphasis>statically bound</emphasis> when bound by the callee's
3846 context. In Haskell, all variables are statically bound. Dynamic
3847 binding of variables is a notion that goes back to Lisp, but was later
3848 discarded in more modern incarnations, such as Scheme. Dynamic binding
3849 can be very confusing in an untyped language, and unfortunately, typed
3850 languages, in particular Hindley-Milner typed languages like Haskell,
3851 only support static scoping of variables.
3852 </para>
3853 <para>
3854 However, by a simple extension to the type class system of Haskell, we
3855 can support dynamic binding. Basically, we express the use of a
3856 dynamically bound variable as a constraint on the type. These
3857 constraints lead to types of the form <literal>(?x::t') => t</literal>, which says "this
3858 function uses a dynamically-bound variable <literal>?x</literal>
3859 of type <literal>t'</literal>". For
3860 example, the following expresses the type of a sort function,
3861 implicitly parameterized by a comparison function named <literal>cmp</literal>.
3862 <programlisting>
3863   sort :: (?cmp :: a -> a -> Bool) => [a] -> [a]
3864 </programlisting>
3865 The dynamic binding constraints are just a new form of predicate in the type class system.
3866 </para>
3867 <para>
3868 An implicit parameter occurs in an expression using the special form <literal>?x</literal>,
3869 where <literal>x</literal> is
3870 any valid identifier (e.g. <literal>ord ?x</literal> is a valid expression).
3871 Use of this construct also introduces a new
3872 dynamic-binding constraint in the type of the expression.
3873 For example, the following definition
3874 shows how we can define an implicitly parameterized sort function in
3875 terms of an explicitly parameterized <literal>sortBy</literal> function:
3876 <programlisting>
3877   sortBy :: (a -> a -> Bool) -> [a] -> [a]
3878
3879   sort   :: (?cmp :: a -> a -> Bool) => [a] -> [a]
3880   sort    = sortBy ?cmp
3881 </programlisting>
3882 </para>
3883
3884 <sect3>
3885 <title>Implicit-parameter type constraints</title>
3886 <para>
3887 Dynamic binding constraints behave just like other type class
3888 constraints in that they are automatically propagated. Thus, when a
3889 function is used, its implicit parameters are inherited by the
3890 function that called it. For example, our <literal>sort</literal> function might be used
3891 to pick out the least value in a list:
3892 <programlisting>
3893   least   :: (?cmp :: a -> a -> Bool) => [a] -> a
3894   least xs = head (sort xs)
3895 </programlisting>
3896 Without lifting a finger, the <literal>?cmp</literal> parameter is
3897 propagated to become a parameter of <literal>least</literal> as well. With explicit
parameters, the default is that parameters must always be explicitly
propagated. With implicit parameters, the default is to always
3900 propagate them.
3901 </para>
3902 <para>
3903 An implicit-parameter type constraint differs from other type class constraints in the
3904 following way: All uses of a particular implicit parameter must have
3905 the same type. This means that the type of <literal>(?x, ?x)</literal>
3906 is <literal>(?x::a) => (a,a)</literal>, and not
3907 <literal>(?x::a, ?x::b) => (a, b)</literal>, as would be the case for type
3908 class constraints.
3909 </para>
3910
3911 <para> You can't have an implicit parameter in the context of a class or instance
3912 declaration.  For example, both these declarations are illegal:
3913 <programlisting>
3914   class (?x::Int) => C a where ...
3915   instance (?x::a) => Foo [a] where ...
3916 </programlisting>
Reason: exactly which implicit parameter you pick up depends on exactly where
you invoke a function. But the &ldquo;invocation&rdquo; of instance declarations is done
behind the scenes by the compiler, so it's hard to figure out exactly where it is done.
The easiest thing is to outlaw the offending types.</para>
3921 <para>
3922 Implicit-parameter constraints do not cause ambiguity.  For example, consider:
3923 <programlisting>
3924    f :: (?x :: [a]) => Int -> Int
3925    f n = n + length ?x
3926
3927    g :: (Read a, Show a) => String -> String
3928    g s = show (read s)
3929 </programlisting>
3930 Here, <literal>g</literal> has an ambiguous type, and is rejected, but <literal>f</literal>
3931 is fine.  The binding for <literal>?x</literal> at <literal>f</literal>'s call site is
3932 quite unambiguous, and fixes the type <literal>a</literal>.
3933 </para>
3934 </sect3>
3935
3936 <sect3>
3937 <title>Implicit-parameter bindings</title>
3938
3939 <para>
3940 An implicit parameter is <emphasis>bound</emphasis> using the standard
3941 <literal>let</literal> or <literal>where</literal> binding forms.
3942 For example, we define the <literal>min</literal> function by binding
<literal>?cmp</literal>.
3944 <programlisting>
  min :: Ord a => [a] -> a
3946   min  = let ?cmp = (&lt;=) in least
3947 </programlisting>
3948 </para>
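<para>
The definitions so far can be assembled into one module.  The following is
only a minimal sketch under stated assumptions: <literal>sortBy'</literal> is
given a simple insertion-sort definition purely to keep the module
self-contained, and the primed names (<literal>sortBy'</literal>,
<literal>sort'</literal>, <literal>minimum'</literal>,
<literal>maximum'</literal>) are hypothetical, chosen here to avoid clashing
with the Prelude.  The <literal>Ord a</literal> constraint is needed because
the comparison operators are class methods.
<programlisting>
  {-# LANGUAGE ImplicitParams #-}
  module Main where

  -- Explicitly parameterised insertion sort (an illustrative stand-in).
  sortBy' :: (a -> a -> Bool) -> [a] -> [a]
  sortBy' le = foldr insert []
    where
      insert x []     = [x]
      insert x (y:ys)
        | le x y    = x : y : ys
        | otherwise = y : insert x ys

  -- The comparison function is now an implicit parameter.
  sort' :: (?cmp :: a -> a -> Bool) => [a] -> [a]
  sort' = sortBy' ?cmp

  -- ?cmp is propagated from sort' to least without being mentioned.
  least :: (?cmp :: a -> a -> Bool) => [a] -> a
  least xs = head (sort' xs)

  -- Binding ?cmp discharges the implicit-parameter constraint.
  minimum', maximum' :: Ord a => [a] -> a
  minimum' = let ?cmp = (&lt;=) in least
  maximum' = let ?cmp = (>=) in least

  main :: IO ()
  main = do
    print (minimum' [3, 1, 2 :: Int])   -- prints 1
    print (maximum' "hello")            -- prints 'o'
</programlisting>
The same <literal>least</literal> serves both callers; only the binding of
<literal>?cmp</literal> at the point of use differs.
</para>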
3949 <para>
3950 A group of implicit-parameter bindings may occur anywhere a normal group of Haskell
3951 bindings can occur, except at top level.  That is, they can occur in a <literal>let</literal>
3952 (including in a list comprehension, or do-notation, or pattern guards),
3953 or a <literal>where</literal> clause.
3954 Note the following points:
3955 <itemizedlist>
3956 <listitem><para>
3957 An implicit-parameter binding group must be a
3958 collection of simple bindings to implicit-style variables (no
3959 function-style bindings, and no type signatures); these bindings are
neither polymorphic nor recursive.
3961 </para></listitem>
3962 <listitem><para>
3963 You may not mix implicit-parameter bindings with ordinary bindings in a
3964 single <literal>let</literal>
3965 expression; use two nested <literal>let</literal>s instead.
3966 (In the case of <literal>where</literal> you are stuck, since you can't nest <literal>where</literal> clauses.)
3967 </para></listitem>
3968
3969 <listitem><para>
3970 You may put multiple implicit-parameter bindings in a
3971 single binding group; but they are <emphasis>not</emphasis> treated
3972 as a mutually recursive group (as ordinary <literal>let</literal> bindings are).
Instead they are treated as a non-recursive group, simultaneously binding all the implicit
parameters.  The bindings are not nested, and may be re-ordered without changing
3975 the meaning of the program.
3976 For example, consider:
3977 <programlisting>
3978   f t = let { ?x = t; ?y = ?x+(1::Int) } in ?x + ?y
3979 </programlisting>
3980 The use of <literal>?x</literal> in the binding for <literal>?y</literal> does not "see"
3981 the binding for <literal>?x</literal>, so the type of <literal>f</literal> is
3982 <programlisting>
3983   f :: (?x::Int) => Int -> Int
3984 </programlisting>
3985 </para></listitem>
3986 </itemizedlist>
3987 </para>
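<para>
The difference between a single binding group and nested
<literal>let</literal>s can be observed directly.  In the sketch below,
<literal>f</literal> is the function from the last point above, while
<literal>g</literal> and the values passed in <literal>main</literal> are
illustrative assumptions added for contrast.
<programlisting>
  {-# LANGUAGE ImplicitParams #-}
  module Main where

  -- One binding group: the ?x on the right-hand side of ?y is the
  -- caller's ?x, not the ?x bound alongside it.
  f :: (?x :: Int) => Int -> Int
  f t = let { ?x = t; ?y = ?x + (1 :: Int) } in ?x + ?y

  -- Nested lets: the inner binding does see the outer ?x.
  g :: Int -> Int
  g t = let ?x = t in
        let ?y = ?x + (1 :: Int) in
        ?x + ?y

  main :: IO ()
  main = do
    print (let ?x = 10 in f 1)   -- 1 + (10 + 1), prints 12
    print (g 1)                  -- 1 + (1 + 1), prints 3
</programlisting>
</para>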
3988
3989 </sect3>
3990
3991 <sect3><title>Implicit parameters and polymorphic recursion</title>
3992
3993 <para>
3994 Consider these two definitions:
3995 <programlisting>
3996   len1 :: [a] -> Int
3997   len1 xs = let ?acc = 0 in len_acc1 xs
3998
3999   len_acc1 [] = ?acc
4000   len_acc1 (x:xs) = let ?acc = ?acc + (1::Int) in len_acc1 xs
4001
4002   ------------
4003
4004   len2 :: [a] -> Int
4005   len2 xs = let ?acc = 0 in len_acc2 xs
4006
4007   len_acc2 :: (?acc :: Int) => [a] -> Int
4008   len_acc2 [] = ?acc
4009   len_acc2 (x:xs) = let ?acc = ?acc + (1::Int) in len_acc2 xs
4010 </programlisting>
4011 The only difference between the two groups is that in the second group
<literal>len_acc2</literal> is given a type signature.
4013 In the former case, <literal>len_acc1</literal> is monomorphic in its own
4014 right-hand side, so the implicit parameter <literal>?acc</literal> is not
4015 passed to the recursive call.  In the latter case, because <literal>len_acc2</literal>
4016 has a type signature, the recursive call is made to the
4017 <emphasis>polymorphic</emphasis> version, which takes <literal>?acc</literal>
4018 as an implicit parameter.  So we get the following results in GHCi:
4019 <programlisting>
4020   Prog> len1 "hello"
4021   0
4022   Prog> len2 "hello"
4023   5
4024 </programlisting>
4025 Adding a type signature dramatically changes the result!  This is a rather
4026 counter-intuitive phenomenon, worth watching out for.
4027 </para>
4028 </sect3>
4029
4030 <sect3><title>Implicit parameters and monomorphism</title>
4031
4032 <para>GHC applies the dreaded Monomorphism Restriction (section 4.5.5 of the
4033 Haskell Report) to implicit parameters.  For example, consider:
4034 <programlisting>
  f :: Int -> Int
4036   f v = let ?x = 0     in
4037         let y = ?x + v in
4038         let ?x = 5     in
4039         y
4040 </programlisting>
4041 Since the binding for <literal>y</literal> falls under the Monomorphism
4042 Restriction it is not generalised, so the type of <literal>y</literal> is
4043 simply <literal>Int</literal>, not <literal>(?x::Int) => Int</literal>.
4044 Hence, <literal>(f 9)</literal> returns result <literal>9</literal>.
4045 If you add a type signature for <literal>y</literal>, then <literal>y</literal>
4046 will get type <literal>(?x::Int) => Int</literal>, so the occurrence of
4047 <literal>y</literal> in the body of the <literal>let</literal> will see the
4048 inner binding of <literal>?x</literal>, so <literal>(f 9)</literal> will return
4049 <literal>14</literal>.
4050 </para>
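<para>
The two variants can be put side by side.  This is a sketch only; the names
<literal>fMono</literal> and <literal>fSig</literal> are hypothetical and
stand for the function above without and with a signature on
<literal>y</literal>.
<programlisting>
  {-# LANGUAGE ImplicitParams #-}
  module Main where

  -- y falls under the Monomorphism Restriction, so its ?x is fixed
  -- where y is defined, that is, by the outer binding ?x = 0.
  fMono :: Int -> Int
  fMono v =
    let ?x = 0 in
    let y = ?x + v in
    let ?x = 5 in
    y

  -- With a signature, y keeps its implicit parameter, so its ?x is
  -- resolved where y is used, that is, by the inner binding ?x = 5.
  fSig :: Int -> Int
  fSig v =
    let ?x = 0 in
    let y :: (?x :: Int) => Int
        y = ?x + v
    in
    let ?x = 5 in
    y

  main :: IO ()
  main = do
    print (fMono 9)   -- prints 9
    print (fSig 9)    -- prints 14
</programlisting>
</para>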
4051 </sect3>
4052 </sect2>
4053
4054     <!--   ======================= COMMENTED OUT ========================
4055
4056     We intend to remove linear implicit parameters, so I'm at least removing
4057     them from the 6.6 user manual
4058
4059 <sect2 id="linear-implicit-parameters">
4060 <title>Linear implicit parameters</title>
4061 <para>
4062 Linear implicit parameters are an idea developed by Koen Claessen,
4063 Mark Shields, and Simon PJ.  They address the long-standing
4064 problem that monads seem over-kill for certain sorts of problem, notably:
4065 </para>
4066 <itemizedlist>
4067 <listitem> <para> distributing a supply of unique names </para> </listitem>
4068 <listitem> <para> distributing a supply of random numbers </para> </listitem>
4069 <listitem> <para> distributing an oracle (as in QuickCheck) </para> </listitem>
4070 </itemizedlist>
4071
4072 <para>
4073 Linear implicit parameters are just like ordinary implicit parameters,
4074 except that they are "linear"; that is, they cannot be copied, and
4075 must be explicitly "split" instead.  Linear implicit parameters are
4076 written '<literal>%x</literal>' instead of '<literal>?x</literal>'.
4077 (The '/' in the '%' suggests the split!)
4078 </para>
4079 <para>
4080 For example:
4081 <programlisting>
4082     import GHC.Exts( Splittable )
4083
4084     data NameSupply = ...
4085
4086     splitNS :: NameSupply -> (NameSupply, NameSupply)
4087     newName :: NameSupply -> Name
4088
4089     instance Splittable NameSupply where
4090         split = splitNS
4091
4092
4093     f :: (%ns :: NameSupply) => Env -> Expr -> Expr
4094     f env (Lam x e) = Lam x' (f env e)
4095                     where
4096                       x'   = newName %ns
4097                       env' = extend env x x'
4098     ...more equations for f...
4099 </programlisting>
4100 Notice that the implicit parameter %ns is consumed
4101 <itemizedlist>
4102 <listitem> <para> once by the call to <literal>newName</literal> </para> </listitem>
4103 <listitem> <para> once by the recursive call to <literal>f</literal> </para></listitem>
4104 </itemizedlist>
4105 </para>
4106 <para>
4107 So the translation done by the type checker makes
4108 the parameter explicit:
4109 <programlisting>
4110     f :: NameSupply -> Env -> Expr -> Expr
4111     f ns env (Lam x e) = Lam x' (f ns1 env e)
4112                        where
4113                          (ns1,ns2) = splitNS ns
4114                          x' = newName ns2
4115                          env = extend env x x'
4116 </programlisting>
4117 Notice the call to 'split' introduced by the type checker.
4118 How did it know to use 'splitNS'?  Because what it really did
4119 was to introduce a call to the overloaded function 'split',
4120 defined by the class <literal>Splittable</literal>:
4121 <programlisting>
4122         class Splittable a where
4123           split :: a -> (a,a)
4124 </programlisting>
4125 The instance for <literal>Splittable NameSupply</literal> tells GHC how to implement
4126 split for name supplies.  But we can simply write
4127 <programlisting>
4128         g x = (x, %ns, %ns)
4129 </programlisting>
4130 and GHC will infer
4131 <programlisting>
4132         g :: (Splittable a, %ns :: a) => b -> (b,a,a)
4133 </programlisting>
4134 The <literal>Splittable</literal> class is built into GHC.  It's exported by module
4135 <literal>GHC.Exts</literal>.
4136 </para>
4137 <para>
4138 Other points:
4139 <itemizedlist>
4140 <listitem> <para> '<literal>?x</literal>' and '<literal>%x</literal>'
4141 are entirely distinct implicit parameters: you
4142   can use them together and they won't interfere with each other. </para>
4143 </listitem>
4144
4145 <listitem> <para> You can bind linear implicit parameters in 'with' clauses. </para> </listitem>
4146
4147 <listitem> <para>You cannot have implicit parameters (whether linear or not)
4148   in the context of a class or instance declaration. </para></listitem>
4149 </itemizedlist>
4150 </para>
4151
4152 <sect3><title>Warnings</title>
4153
4154 <para>
4155 The monomorphism restriction is even more important than usual.
4156 Consider the example above:
4157 <programlisting>
4158     f :: (%ns :: NameSupply) => Env -> Expr -> Expr
4159     f env (Lam x e) = Lam x' (f env e)
4160                     where
4161                       x'   = newName %ns
4162                       env' = extend env x x'
4163 </programlisting>
4164 If we replaced the two occurrences of x' by (newName %ns), which is
4165 usually a harmless thing to do, we get:
4166 <programlisting>
4167     f :: (%ns :: NameSupply) => Env -> Expr -> Expr
4168     f env (Lam x e) = Lam (newName %ns) (f env e)
4169                     where
4170                       env' = extend env x (newName %ns)
4171 </programlisting>
4172 But now the name supply is consumed in <emphasis>three</emphasis> places
4173 (the two calls to newName,and the recursive call to f), so
4174 the result is utterly different.  Urk!  We don't even have
4175 the beta rule.
4176 </para>
4177 <para>
4178 Well, this is an experimental change.  With implicit
4179 parameters we have already lost beta reduction anyway, and
4180 (as John Launchbury puts it) we can't sensibly reason about
4181 Haskell programs without knowing their typing.
4182 </para>
4183
4184 </sect3>
4185
4186 <sect3><title>Recursive functions</title>
4187 <para>Linear implicit parameters can be particularly tricky when you have a recursive function
4188 Consider
4189 <programlisting>
4190         foo :: %x::T => Int -> [Int]
4191         foo 0 = []
4192         foo n = %x : foo (n-1)
4193 </programlisting>
4194 where T is some type in class Splittable.</para>
4195 <para>
4196 Do you get a list of all the same T's or all different T's
4197 (assuming that split gives two distinct T's back)?
4198 </para><para>
4199 If you supply the type signature, taking advantage of polymorphic
4200 recursion, you get what you'd probably expect.  Here's the
4201 translated term, where the implicit param is made explicit:
4202 <programlisting>
4203         foo x 0 = []
4204         foo x n = let (x1,x2) = split x
4205                   in x1 : foo x2 (n-1)
4206 </programlisting>
4207 But if you don't supply a type signature, GHC uses the Hindley
4208 Milner trick of using a single monomorphic instance of the function
4209 for the recursive calls. That is what makes Hindley Milner type inference
4210 work.  So the translation becomes
4211 <programlisting>
4212         foo x = let
4213                   foom 0 = []
4214                   foom n = x : foom (n-1)
4215                 in
4216                 foom
4217 </programlisting>
4218 Result: 'x' is not split, and you get a list of identical T's.  So the
4219 semantics of the program depends on whether or not foo has a type signature.
4220 Yikes!
4221 </para><para>
4222 You may say that this is a good reason to dislike linear implicit parameters
4223 and you'd be right.  That is why they are an experimental feature.
4224 </para>
4225 </sect3>
4226
4227 </sect2>
4228
4229 ================ END OF Linear Implicit Parameters commented out -->
4230
4231 <sect2 id="kinding">
4232 <title>Explicitly-kinded quantification</title>
4233
4234 <para>
4235 Haskell infers the kind of each type variable.  Sometimes it is nice to be able
4236 to give the kind explicitly as (machine-checked) documentation,
4237 just as it is nice to give a type signature for a function.  On some occasions,
4238 it is essential to do so.  For example, in his paper "Restricted Data Types in Haskell" (Haskell Workshop 1999)
4239 John Hughes had to define the data type:
4240 <screen>
4241      data Set cxt a = Set [a]
4242                     | Unused (cxt a -> ())
4243 </screen>
4244 The only use for the <literal>Unused</literal> constructor was to force the correct
4245 kind for the type variable <literal>cxt</literal>.
4246 </para>
4247 <para>
4248 GHC now instead allows you to specify the kind of a type variable directly, wherever
4249 a type variable is explicitly bound, with the flag <option>-XKindSignatures</option>.
4250 </para>
4251 <para>
4252 This flag enables kind signatures in the following places:
4253 <itemizedlist>
4254 <listitem><para><literal>data</literal> declarations:
4255 <screen>
4256   data Set (cxt :: * -> *) a = Set [a]
4257 </screen></para></listitem>
4258 <listitem><para><literal>type</literal> declarations:
4259 <screen>
4260   type T (f :: * -> *) = f Int
4261 </screen></para></listitem>
4262 <listitem><para><literal>class</literal> declarations:
4263 <screen>
4264   class (Eq a) => C (f :: * -> *) a where ...
4265 </screen></para></listitem>
4266 <listitem><para><literal>forall</literal>'s in type signatures:
4267 <screen>
4268   f :: forall (cxt :: * -> *). Set cxt Int
4269 </screen></para></listitem>
4270 </itemizedlist>
4271 </para>
4272
4273 <para>
4274 The parentheses are required.  Some of the spaces are required too, to
4275 separate the lexemes.  If you write <literal>(f::*->*)</literal> you
4276 will get a parse error, because "<literal>::*->*</literal>" is a
4277 single lexeme in Haskell.
4278 </para>
4279
4280 <para>
4281 As part of the same extension, you can put kind annotations in types
4282 as well.  Thus:
4283 <screen>
4284    f :: (Int :: *) -> Int
4285    g :: forall a. a -> (a :: *)
4286 </screen>
4287 The syntax is
4288 <screen>
   atype ::= '(' ctype '::' kind ')'
4290 </screen>
4291 The parentheses are required.
4292 </para>
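<para>
As a minimal sketch of these forms used together, consider the module below.
The functions <literal>size</literal> and <literal>firstT</literal> are
hypothetical helpers introduced only for this illustration, and the sketch
assumes the default kind-inference rules, with no other kind-related
extension switched on.
<programlisting>
  {-# LANGUAGE KindSignatures #-}
  module Main where

  -- cxt is not used on the right-hand side, so the annotation is what
  -- determines its kind here.
  data Set (cxt :: * -> *) a = Set [a]

  type T (f :: * -> *) = f Int

  size :: Set cxt a -> Int
  size (Set xs) = length xs

  -- A kind annotation inside a type; the parentheses are required.
  firstT :: T [] -> (Int :: *)
  firstT = head

  main :: IO ()
  main = do
    print (size (Set "abc" :: Set Maybe Char))   -- prints 3
    print (firstT [7, 8, 9])                     -- prints 7
</programlisting>
Because <literal>cxt</literal> is declared with kind
<literal>* -> *</literal>, the type <literal>Set Maybe Char</literal> is
well-kinded without resorting to a dummy constructor.
</para>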
4293 </sect2>
4294
4295
4296 <sect2 id="universal-quantification">
4297 <title>Arbitrary-rank polymorphism
4298 </title>
4299
4300 <para>
4301 Haskell type signatures are implicitly quantified.  The new keyword <literal>forall</literal>
4302 allows us to say exactly what this means.  For example:
4303 </para>
4304 <para>
4305 <programlisting>
4306         g :: b -> b
4307 </programlisting>
4308 means this:
4309 <programlisting>
4310         g :: forall b. (b -> b)
4311 </programlisting>
4312 The two are treated identically.
4313 </para>
4314
4315 <para>
4316 However, GHC's type system supports <emphasis>arbitrary-rank</emphasis>
4317 explicit universal quantification in
4318 types.
4319 For example, all the following types are legal:
4320 <programlisting>
4321     f1 :: forall a b. a -> b -> a
4322     g1 :: forall a b. (Ord a, Eq  b) => a -> b -> a
4323
4324     f2 :: (forall a. a->a) -> Int -> Int
4325     g2 :: (forall a. Eq a => [a] -> a -> Bool) -> Int -> Int
4326
4327     f3 :: ((forall a. a->a) -> Int) -> Bool -> Bool
4328
4329     f4 :: Int -> (forall a. a -> a)
4330 </programlisting>
Here, <literal>f1</literal> and <literal>g1</literal> have rank-1 types, and
4332 can be written in standard Haskell (e.g. <literal>f1 :: a->b->a</literal>).
4333 The <literal>forall</literal> makes explicit the universal quantification that
4334 is implicitly added by Haskell.
4335 </para>
4336 <para>
4337 The functions <literal>f2</literal> and <literal>g2</literal> have rank-2 types;
4338 the <literal>forall</literal> is on the left of a function arrow.  As <literal>g2</literal>
4339 shows, the polymorphic type on the left of the function arrow can be overloaded.
4340 </para>
4341 <para>
4342 The function <literal>f3</literal> has a rank-3 type;
4343 it has rank-2 types on the left of a function arrow.
4344 </para>
4345 <para>
4346 GHC has three flags to control higher-rank types:
4347 <itemizedlist>
4348 <listitem><para>
4349  <option>-XPolymorphicComponents</option>: data constructors (only) can have polymorphic argument types.
4350 </para></listitem>
4351 <listitem><para>
4352  <option>-XRank2Types</option>: any function (including data constructors) can have a rank-2 type.
4353 </para></listitem>
4354 <listitem><para>
4355  <option>-XRankNTypes</option>: any function (including data constructors) can have an arbitrary-rank type.
4356 That is,  you can nest <literal>forall</literal>s
4357 arbitrarily deep in function arrows.
4358 In particular, a forall-type (also called a "type scheme"),
including an optional type class context, is legal:
4360 <itemizedlist>
4361 <listitem> <para> On the left or right (see <literal>f4</literal>, for example)
4362 of a function arrow </para> </listitem>
4363 <listitem> <para> As the argument of a constructor, or type of a field, in a data type declaration. For
4364 example, any of the <literal>f1,f2,f3,g1,g2</literal> above would be valid
4365 field type signatures.</para> </listitem>
4366 <listitem> <para> As the type of an implicit parameter </para> </listitem>
4367 <listitem> <para> In a pattern type signature (see <xref linkend="scoped-type-variables"/>) </para> </listitem>
4368 </itemizedlist>
4369 </para></listitem>
4370 </itemizedlist>
4371 Of course <literal>forall</literal> becomes a keyword; you can't use <literal>forall</literal> as
4372 a type variable any more!
4373 </para>
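<para>
To make the rank-2 signatures above concrete, here is a sketch of possible
definitions for <literal>f2</literal> and <literal>g2</literal>.  The
function bodies and the arguments used in <literal>main</literal> are
assumptions made for illustration; only the type signatures come from the
list above.
<programlisting>
  {-# LANGUAGE RankNTypes #-}
  module Main where

  -- The argument g is used at two different types, Bool and Int.
  f2 :: (forall a. a -> a) -> Int -> Int
  f2 g n = if g True then g n else 0

  -- The overloaded argument is used at Int and at Char.
  g2 :: (forall a. Eq a => [a] -> a -> Bool) -> Int -> Int
  g2 member n
    | member [1, 2, 3] n, member "abc" 'a' = n
    | otherwise                            = 0

  main :: IO ()
  main = do
    print (f2 id 7)            -- prints 7
    print (g2 (flip elem) 2)   -- prints 2
    print (g2 (flip elem) 9)   -- prints 0
</programlisting>
Neither body would typecheck with a rank-1 signature, because a
lambda-bound variable would then have to be used at a single type.
</para>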
4374
4375
4376 <sect3 id="univ">
4377 <title>Examples
4378 </title>
4379
4380 <para>
4381 In a <literal>data</literal> or <literal>newtype</literal> declaration one can quantify
4382 the types of the constructor arguments.  Here are several examples:
4383 </para>
4384
4385 <para>
4386
4387 <programlisting>
4388 data T a = T1 (forall b. b -> b -> b) a
4389
4390 data MonadT m = MkMonad { return :: forall a. a -> m a,
4391                           bind   :: forall a b. m a -> (a -> m b) -> m b
4392                         }
4393
4394 newtype Swizzle = MkSwizzle (Ord a => [a] -> [a])
4395 </programlisting>
4396
4397 </para>
4398
4399 <para>
4400 The constructors have rank-2 types:
4401 </para>
4402
4403 <para>
4404
4405 <programlisting>
4406 T1 :: forall a. (forall b. b -> b -> b) -> a -> T a
4407 MkMonad :: forall m. (forall a. a -> m a)
4408                   -> (forall a b. m a -> (a -> m b) -> m b)
4409                   -> MonadT m
4410 MkSwizzle :: (Ord a => [a] -> [a]) -> Swizzle
4411 </programlisting>
4412
4413 </para>
4414
4415 <para>
4416 Notice that you don't need to use a <literal>forall</literal> if there's an
4417 explicit context.  For example in the first argument of the
4418 constructor <function>MkSwizzle</function>, an implicit "<literal>forall a.</literal>" is
4419 prefixed to the argument type.  The implicit <literal>forall</literal>
4420 quantifies all type variables that are not already in scope, and are
4421 mentioned in the type quantified over.
4422 </para>
4423
4424 <para>
4425 As for type signatures, implicit quantification happens for non-overloaded
4426 types too.  So if you write this:
4427
4428 <programlisting>
4429   data T a = MkT (Either a b) (b -> b)
4430 </programlisting>
4431
4432 it's just as if you had written this:
4433
4434 <programlisting>
4435   data T a = MkT (forall b. Either a b) (forall b. b -> b)
4436 </programlisting>
4437
4438 That is, since the type variable <literal>b</literal> isn't in scope, it's
4439 implicitly universally quantified.  (Arguably, it would be better
4440 to <emphasis>require</emphasis> explicit quantification on constructor arguments
4441 where that is what is wanted.  Feedback welcomed.)
4442 </para>
4443
4444 <para>
You construct values of types <literal>T, MonadT, Swizzle</literal> by applying
4446 the constructor to suitable values, just as usual.  For example,
4447 </para>
4448
4449 <para>
4450
4451 <programlisting>
4452     a1 :: T Int
    a1 = T1 (\x y->x) 3
4454
4455     a2, a3 :: Swizzle
4456     a2 = MkSwizzle sort
4457     a3 = MkSwizzle reverse
4458
4459     a4 :: MonadT Maybe
4460     a4 = let r x = Just x
4461              b m k = case m of
4462                        Just y -> k y
4463                        Nothing -> Nothing
4464          in
4465          MkMonad r b
4466
    mkTs :: (forall b. b -> b -> b) -> a -> a -> [T a]
4468     mkTs f x y = [T1 f x, T1 f y]
4469 </programlisting>
4470
4471 </para>
4472
4473 <para>
4474 The type of the argument can, as usual, be more general than the type
4475 required, as <literal>(MkSwizzle reverse)</literal> shows.  (<function>reverse</function>
4476 does not need the <literal>Ord</literal> constraint.)
4477 </para>
4478
4479 <para>
4480 When you use pattern matching, the bound variables may now have
4481 polymorphic types.  For example:
4482 </para>
4483
4484 <para>
4485
4486 <programlisting>
4487     f :: T a -> a -> (a, Char)
4488     f (T1 w k) x = (w k x, w 'c' 'd')
4489
4490     g :: (Ord a, Ord b) => Swizzle -> [a] -> (a -> b) -> [b]
4491     g (MkSwizzle s) xs f = s (map f (s xs))
4492
4493     h :: MonadT m -> [m a] -> m [a]
4494     h m [] = return m []
4495     h m (x:xs) = bind m x          $ \y ->
4496                  bind m (h m xs)   $ \ys ->
4497                  return m (y:ys)
4498 </programlisting>
4499
4500 </para>
4501
4502 <para>
4503 In the function <function>h</function> we use the record selectors <literal>return</literal>
4504 and <literal>bind</literal> to extract the polymorphic bind and return functions
4505 from the <literal>MonadT</literal> data structure, rather than using pattern
4506 matching.
4507 </para>
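<para>
Part of the above can be collected into a small program.  This is a sketch
rather than a definitive version: it writes the <literal>forall</literal> in
<literal>MkSwizzle</literal> explicitly instead of relying on implicit
quantification, it renames the function argument of <literal>g</literal> to
<literal>h</literal> for readability, and the values passed in
<literal>main</literal> are made up for the illustration.
<programlisting>
  {-# LANGUAGE RankNTypes #-}
  module Main where

  import Data.List (sort)

  data T a = T1 (forall b. b -> b -> b) a

  newtype Swizzle = MkSwizzle (forall a. Ord a => [a] -> [a])

  -- w is bound by pattern matching and is used at types a and Char.
  f :: T a -> a -> (a, Char)
  f (T1 w k) x = (w k x, w 'c' 'd')

  -- s is used at element types a and b.
  g :: (Ord a, Ord b) => Swizzle -> [a] -> (a -> b) -> [b]
  g (MkSwizzle s) xs h = s (map h (s xs))

  main :: IO ()
  main = do
    print (f (T1 (\x _ -> x) (3 :: Int)) 4)                 -- prints (3,'c')
    print (g (MkSwizzle sort) [3, 1, 2 :: Int] negate)      -- prints [-3,-2,-1]
    print (g (MkSwizzle reverse) [3, 1, 2 :: Int] negate)   -- prints [-3,-1,-2]
</programlisting>
</para>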
4508 </sect3>
4509
4510 <sect3>
4511 <title>Type inference</title>
4512
4513 <para>
4514 In general, type inference for arbitrary-rank types is undecidable.
4515 GHC uses an algorithm proposed by Odersky and Laufer ("Putting type annotations to work", POPL'96)
4516 to get a decidable algorithm by requiring some help from the programmer.
4517 We do not yet have a formal specification of "some help" but the rule is this:
4518 </para>
4519 <para>
4520 <emphasis>For a lambda-bound or case-bound variable, x, either the programmer
4521 provides an explicit polymorphic type for x, or GHC's type inference will assume
4522 that x's type has no foralls in it</emphasis>.
4523 </para>
4524 <para>
4525 What does it mean to "provide" an explicit type for x?  You can do that by
4526 giving a type signature for x directly, using a pattern type signature
4527 (<xref linkend="scoped-type-variables"/>), thus:
4528 <programlisting>
     \ (f :: forall a. a->a) -> (f True, f 'c')
4530 </programlisting>
4531 Alternatively, you can give a type signature to the enclosing
4532 context, which GHC can "push down" to find the type for the variable:
4533 <programlisting>
4534      (\ f -> (f True, f 'c')) :: (forall a. a->a) -> (Bool,Char)
4535 </programlisting>
4536 Here the type signature on the expression can be pushed inwards
4537 to give a type signature for f.  Similarly, and more commonly,
4538 one can give a type signature for the function itself:
4539 <programlisting>
4540      h :: (forall a. a->a) -> (Bool,Char)
4541      h f = (f True, f 'c')
4542 </programlisting>
4543 You don't need to give a type signature if the lambda bound variable
4544 is a constructor argument.  Here is an example we saw earlier:
4545 <programlisting>
4546     f :: T a -> a -> (a, Char)
4547     f (T1 w k) x = (w k x, w 'c' 'd')
4548 </programlisting>
4549 Here we do not need to give a type signature to <literal>w</literal>, because
4550 it is an argument of constructor <literal>T1</literal> and that tells GHC all
4551 it needs to know.
4552 </para>
4553
4554 </sect3>
4555
4556
4557 <sect3 id="implicit-quant">
4558 <title>Implicit quantification</title>
4559
4560 <para>
4561 GHC performs implicit quantification as follows.  <emphasis>At the top level (only) of
4562 user-written types, if and only if there is no explicit <literal>forall</literal>,
4563 GHC finds all the type variables mentioned in the type that are not already
4564 in scope, and universally quantifies them.</emphasis>  For example, the following pairs are
4565 equivalent:
4566 <programlisting>
4567   f :: a -> a
4568   f :: forall a. a -> a
4569
4570   g (x::a) = let
4571                 h :: a -> b -> b
4572                 h x y = y
4573              in ...
4574   g (x::a) = let
4575                 h :: forall b. a -> b -> b
4576                 h x y = y
4577              in ...
4578 </programlisting>
4579 </para>
4580 <para>
4581 Notice that GHC does <emphasis>not</emphasis> find the innermost possible quantification
4582 point.  For example:
4583 <programlisting>
4584   f :: (a -> a) -> Int
4585            -- MEANS
4586   f :: forall a. (a -> a) -> Int
4587            -- NOT
4588   f :: (forall a. a -> a) -> Int
4589
4590
4591   g :: (Ord a => a -> a) -> Int
4592            -- MEANS the illegal type
4593   g :: forall a. (Ord a => a -> a) -> Int
4594            -- NOT
4595   g :: (forall a. Ord a => a -> a) -> Int
4596 </programlisting>
4597 The latter produces an illegal type, which you might think is silly,
4598 but at least the rule is simple.  If you want the latter type, you
4599 can write your for-alls explicitly.  Indeed, doing so is strongly advised
4600 for rank-2 types.
4601 </para>
4602 </sect3>
4603 </sect2>
4604
4605
4606 <sect2 id="impredicative-polymorphism">
4607 <title>Impredicative polymorphism
4608 </title>
4609 <para>GHC supports <emphasis>impredicative polymorphism</emphasis>,
4610 enabled with <option>-XImpredicativeTypes</option>.
4611 This means
4612 that you can call a polymorphic function at a polymorphic type, and
4613 parameterise data structures over polymorphic types.  For example:
4614 <programlisting>
4615   f :: Maybe (forall a. [a] -> [a]) -> Maybe ([Int], [Char])
4616   f (Just g) = Just (g [3], g "hello")
4617   f Nothing  = Nothing
4618 </programlisting>
4619 Notice here that the <literal>Maybe</literal> type is parameterised by the
4620 <emphasis>polymorphic</emphasis> type <literal>(forall a. [a] ->
4621 [a])</literal>.
4622 </para>
4623 <para>The technical details of this extension are described in the paper
4624 <ulink url="http://research.microsoft.com/%7Esimonpj/papers/boxy/">Boxy types:
4625 type inference for higher-rank types and impredicativity</ulink>,
4626 which appeared at ICFP 2006.
4627 </para>
4628 </sect2>
4629
4630 <sect2 id="scoped-type-variables">
4631 <title>Lexically scoped type variables
4632 </title>
4633
4634 <para>
4635 GHC supports <emphasis>lexically scoped type variables</emphasis>, without
4636 which some type signatures are simply impossible to write. For example:
4637 <programlisting>
4638 f :: forall a. [a] -> [a]
4639 f xs = ys ++ ys
4640      where
4641        ys :: [a]
4642        ys = reverse xs
4643 </programlisting>
4644 The type signature for <literal>f</literal> brings the type variable <literal>a</literal> into scope; it scopes over
4645 the entire definition of <literal>f</literal>.
4646 In particular, it is in scope at the type signature for <varname>ys</varname>.
4647 In Haskell 98 it is not possible to declare
4648 a type for <varname>ys</varname>; a major benefit of scoped type variables is that
4649 it becomes possible to do so.
4650 </para>
4651 <para>Lexically-scoped type variables are enabled by
4652 <option>-XScopedTypeVariables</option>.  This flag implies <option>-XRelaxedPolyRec</option>.
4653 </para>
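<para>As a minimal sketch of the flag in use, the example above can be
placed in a complete module.  The module layout, the name
<literal>twice</literal> and the <literal>main</literal> action are
illustrative assumptions, not part of GHC:
<programlisting>
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

-- The explicit "forall a" brings 'a' into scope over the whole
-- definition of 'twice', including its where clause.
twice :: forall a. [a] -> [a]
twice xs = ys ++ ys
  where
    ys :: [a]          -- the same 'a' as in the signature of 'twice'
    ys = reverse xs

main :: IO ()
main = putStrLn (twice "abc")
</programlisting>
Without the explicit <literal>forall</literal>, GHC should reject the
signature for <literal>ys</literal>, since its <literal>a</literal> would
then be a fresh, implicitly quantified variable.
</para>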
4654 <para>Note: GHC 6.6 contains substantial changes to the way that scoped type
4655 variables work, compared to earlier releases.  Read this section
4656 carefully!</para>
4657
4658 <sect3>
4659 <title>Overview</title>
4660
<para>The design follows these principles:
4662 <itemizedlist>
4663 <listitem><para>A scoped type variable stands for a type <emphasis>variable</emphasis>, and not for
4664 a <emphasis>type</emphasis>. (This is a change from GHC's earlier
4665 design.)</para></listitem>
4666 <listitem><para>Furthermore, distinct lexical type variables stand for distinct
4667 type variables.  This means that every programmer-written type signature
4668 (including one that contains free scoped type variables) denotes a
4669 <emphasis>rigid</emphasis> type; that is, the type is fully known to the type
4670 checker, and no inference is involved.</para></listitem>
4671 <listitem><para>Lexical type variables may be alpha-renamed freely, without
4672 changing the program.</para></listitem>
4673 </itemizedlist>
4674 </para>
4675 <para>
4676 A <emphasis>lexically scoped type variable</emphasis> can be bound by:
4677 <itemizedlist>
4678 <listitem><para>A declaration type signature (<xref linkend="decl-type-sigs"/>)</para></listitem>
4679 <listitem><para>An expression type signature (<xref linkend="exp-type-sigs"/>)</para></listitem>
4680 <listitem><para>A pattern type signature (<xref linkend="pattern-type-sigs"/>)</para></listitem>
4681 <listitem><para>Class and instance declarations (<xref linkend="cls-inst-scoped-tyvars"/>)</para></listitem>
4682 </itemizedlist>
4683 </para>
4684 <para>
4685 In Haskell, a programmer-written type signature is implicitly quantified over
4686 its free type variables (<ulink
4687 url="http://www.haskell.org/onlinereport/decls.html#sect4.1.2">Section
4688 4.1.2</ulink>
4689 of the Haskell Report).
Lexically scoped type variables affect these implicit quantification rules
4691 as follows: any type variable that is in scope is <emphasis>not</emphasis> universally
4692 quantified. For example, if type variable <literal>a</literal> is in scope,
4693 then
4694 <programlisting>
4695   (e :: a -> a)     means     (e :: a -> a)
4696   (e :: b -> b)     means     (e :: forall b. b->b)
4697   (e :: a -> b)     means     (e :: forall b. a->b)
4698 </programlisting>
4699 </para>
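<para>The following sketch shows this rule at work.  The function names are
made up for illustration; only <literal>a</literal> is in scope at the local
signature, so <literal>b</literal> is quantified there as usual:
<programlisting>
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

tagAll :: forall a. a -> [Int] -> String -> ([(a, Int)], (a, String))
tagAll x ns s = (map attach ns, attach s)
  where
    -- 'a' is in scope, so it is NOT quantified here; 'b' is.
    -- The signature therefore means: forall b. b -> (a, b)
    attach :: b -> (a, b)
    attach y = (x, y)

main :: IO ()
main = print (tagAll 'k' [1, 2] "v")
</programlisting>
Because <literal>attach</literal> is polymorphic in <literal>b</literal>
alone, it can be used at both <literal>Int</literal> and
<literal>String</literal>, while every result shares the outer
<literal>a</literal>.
</para>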
4700
4701
4702 </sect3>
4703
4704
4705 <sect3 id="decl-type-sigs">
4706 <title>Declaration type signatures</title>
4707 <para>A declaration type signature that has <emphasis>explicit</emphasis>
4708 quantification (using <literal>forall</literal>) brings into scope the
4709 explicitly-quantified
4710 type variables, in the definition of the named function.  For example:
4711 <programlisting>
4712   f :: forall a. [a] -> [a]
4713   f (x:xs) = xs ++ [ x :: a ]
4714 </programlisting>
4715 The "<literal>forall a</literal>" brings "<literal>a</literal>" into scope in
4716 the definition of "<literal>f</literal>".
4717 </para>
4718 <para>This only happens if:
4719 <itemizedlist>
4720 <listitem><para> The quantification in <literal>f</literal>'s type
4721 signature is explicit.  For example:
4722 <programlisting>
4723   g :: [a] -> [a]
4724   g (x:xs) = xs ++ [ x :: a ]
4725 </programlisting>
This program will be rejected, because "<literal>a</literal>" does not scope
over the definition of "<literal>g</literal>", so "<literal>x::a</literal>"
means "<literal>x::forall a. a</literal>" by Haskell's usual implicit
quantification rules.
4730 </para></listitem>
4731 <listitem><para> The signature gives a type for a function binding or a bare variable binding,
4732 not a pattern binding.
4733 For example:
4734 <programlisting>
4735   f1 :: forall a. [a] -> [a]
4736   f1 (x:xs) = xs ++ [ x :: a ]   -- OK
4737
4738   f2 :: forall a. [a] -> [a]
4739   f2 = \(x:xs) -> xs ++ [ x :: a ]   -- OK
4740
4741   f3 :: forall a. [a] -> [a]
4742   Just f3 = Just (\(x:xs) -> xs ++ [ x :: a ])   -- Not OK!
4743 </programlisting>
4744 The binding for <literal>f3</literal> is a pattern binding, and so its type signature
4745 does not bring <literal>a</literal> into scope.   However <literal>f1</literal> is a
4746 function binding, and <literal>f2</literal> binds a bare variable; in both cases
4747 the type signature brings <literal>a</literal> into scope.
4748 </para></listitem>
4749 </itemizedlist>
4750 </para>
4751 </sect3>
4752
4753 <sect3 id="exp-type-sigs">
4754 <title>Expression type signatures</title>
4755
4756 <para>An expression type signature that has <emphasis>explicit</emphasis>
4757 quantification (using <literal>forall</literal>) brings into scope the
4758 explicitly-quantified
4759 type variables, in the annotated expression.  For example:
4760 <programlisting>
4761   f = runST ( (op >>= \(x :: STRef s Int) -> g x) :: forall s. ST s Bool )
4762 </programlisting>
Here, the type signature <literal>forall s. ST s Bool</literal> brings the
4764 type variable <literal>s</literal> into scope, in the annotated expression
4765 <literal>(op >>= \(x :: STRef s Int) -> g x)</literal>.
4766 </para>
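<para>A self-contained variant of this example is sketched below.  The name
<literal>startsAtZero</literal>, and the use of <literal>newSTRef</literal>
and <literal>readSTRef</literal> in place of <literal>op</literal> and
<literal>g</literal>, are assumptions made so that the fragment compiles on
its own:
<programlisting>
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

import Control.Monad.ST (ST, runST)
import Data.STRef (STRef, newSTRef, readSTRef)

-- The expression signature "forall s. ST s Bool" brings 's' into scope
-- inside the annotated expression, so the lambda's pattern signature
-- can mention it.
startsAtZero :: Bool
startsAtZero =
  runST ( (newSTRef 0 >>= \(x :: STRef s Int) -> fmap (== 0) (readSTRef x))
            :: forall s. ST s Bool )

main :: IO ()
main = print startsAtZero
</programlisting>
</para>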
4767
4768 </sect3>
4769
4770 <sect3 id="pattern-type-sigs">
4771 <title>Pattern type signatures</title>
4772 <para>
4773 A type signature may occur in any pattern; this is a <emphasis>pattern type
4774 signature</emphasis>.
4775 For example:
4776 <programlisting>
4777   -- f and g assume that 'a' is already in scope
4778   f = \(x::Int, y::a) -> x
4779   g (x::a) = x
4780   h ((x,y) :: (Int,Bool)) = (y,x)
4781 </programlisting>
4782 In the case where all the type variables in the pattern type signature are
4783 already in scope (i.e. bound by the enclosing context), matters are simple: the
4784 signature simply constrains the type of the pattern in the obvious way.
4785 </para>
4786 <para>
4787 Unlike expression and declaration type signatures, pattern type signatures are not implicitly generalised.
4788 The pattern in a <emphasis>pattern binding</emphasis> may only mention type variables
4789 that are already in scope.  For example:
4790 <programlisting>
4791   f :: forall a. [a] -> (Int, [a])
4792   f xs = (n, zs)
4793     where
4794       (ys::[a], n) = (reverse xs, length xs) -- OK
4795       zs::[a] = xs ++ ys                     -- OK
4796
4797       Just (v::b) = ...  -- Not OK; b is not in scope
4798 </programlisting>
4799 Here, the pattern signatures for <literal>ys</literal> and <literal>zs</literal>
4800 are fine, but the one for <literal>v</literal> is not because <literal>b</literal> is
4801 not in scope.
4802 </para>
4803 <para>
4804 However, in all patterns <emphasis>other</emphasis> than pattern bindings, a pattern
4805 type signature may mention a type variable that is not in scope; in this case,
4806 <emphasis>the signature brings that type variable into scope</emphasis>.
4807 This is particularly important for existential data constructors.  For example:
4808 <programlisting>
4809   data T = forall a. MkT [a]
4810
4811   k :: T -> T
4812   k (MkT [t::a]) = MkT t3
4813                  where
4814                    t3::[a] = [t,t,t]
4815 </programlisting>
4816 Here, the pattern type signature <literal>(t::a)</literal> mentions a lexical type
4817 variable that is not already in scope.  Indeed, it <emphasis>cannot</emphasis> already be in scope,
4818 because it is bound by the pattern match.  GHC's rule is that in this situation
4819 (and only then), a pattern type signature can mention a type variable that is
4820 not already in scope; the effect is to bring it into scope, standing for the
4821 existentially-bound type variable.
4822 </para>
4823 <para>
4824 When a pattern type signature binds a type variable in this way, GHC insists that the
4825 type variable is bound to a <emphasis>rigid</emphasis>, or fully-known, type variable.
4826 This means that any user-written type signature always stands for a completely known type.
4827 </para>
4828 <para>
4829 If all this seems a little odd, we think so too.  But we must have
4830 <emphasis>some</emphasis> way to bring such type variables into scope, else we
4831 could not name existentially-bound type variables in subsequent type signatures.
4832 </para>
4833 <para>
4834 This is (now) the <emphasis>only</emphasis> situation in which a pattern type
4835 signature is allowed to mention a lexical variable that is not already in
4836 scope.
4837 For example, both <literal>f</literal> and <literal>g</literal> would be
4838 illegal if <literal>a</literal> was not already in scope.
4839 </para>
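<para>The existential case can be tried out with the following sketch.
The <literal>Show</literal> context, the fall-through equation and the
<literal>render</literal> helper are additions made for illustration, so
that the result can be printed:
<programlisting>
{-# LANGUAGE ScopedTypeVariables, ExistentialQuantification #-}
module Main where

data T = forall a. Show a =&gt; MkT [a]

triple :: T -> T
triple (MkT [t :: a]) = MkT ts   -- the pattern signature binds 'a'
  where
    ts :: [a]                    -- ... so it can be named here
    ts = [t, t, t]
triple other = other             -- lists that are not singletons

render :: T -> String
render (MkT xs) = show xs

main :: IO ()
main = putStrLn (render (triple (MkT [True])))
</programlisting>
</para>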
4840
4841
4842 </sect3>
4843
4844 <!-- ==================== Commented out part about result type signatures
4845
4846 <sect3 id="result-type-sigs">
4847 <title>Result type signatures</title>
4848
4849 <para>
4850 The result type of a function, lambda, or case expression alternative can be given a signature, thus:
4851
4852 <programlisting>
4853   {- f assumes that 'a' is already in scope -}
4854   f x y :: [a] = [x,y,x]
4855
4856   g = \ x :: [Int] -> [3,4]
4857
4858   h :: forall a. [a] -> a
4859   h xs = case xs of
4860             (y:ys) :: a -> y
4861 </programlisting>
4862 The final <literal>:: [a]</literal> after the patterns of <literal>f</literal> gives the type of
4863 the result of the function.  Similarly, the body of the lambda in the RHS of
4864 <literal>g</literal> is <literal>[Int]</literal>, and the RHS of the case
4865 alternative in <literal>h</literal> is <literal>a</literal>.
4866 </para>
4867 <para> A result type signature never brings new type variables into scope.</para>
4868 <para>
4869 There are a couple of syntactic wrinkles.  First, notice that all three
4870 examples would parse quite differently with parentheses:
4871 <programlisting>
4872   {- f assumes that 'a' is already in scope -}
4873   f x (y :: [a]) = [x,y,x]
4874
4875   g = \ (x :: [Int]) -> [3,4]
4876
4877   h :: forall a. [a] -> a
4878   h xs = case xs of
4879             ((y:ys) :: a) -> y
4880 </programlisting>
4881 Now the signature is on the <emphasis>pattern</emphasis>; and
4882 <literal>h</literal> would certainly be ill-typed (since the pattern
4883 <literal>(y:ys)</literal> cannot have the type <literal>a</literal>.
4884
4885 Second, to avoid ambiguity, the type after the &ldquo;<literal>::</literal>&rdquo; in a result
4886 pattern signature on a lambda or <literal>case</literal> must be atomic (i.e. a single
4887 token or a parenthesised type of some sort).  To see why,
4888 consider how one would parse this:
4889 <programlisting>
4890   \ x :: a -> b -> x
4891 </programlisting>
4892 </para>
4893 </sect3>
4894
4895  -->
4896
4897 <sect3 id="cls-inst-scoped-tyvars">
4898 <title>Class and instance declarations</title>
4899 <para>
4900
4901 The type variables in the head of a <literal>class</literal> or <literal>instance</literal> declaration
4902 scope over the methods defined in the <literal>where</literal> part.  For example:
4903
4904
4905 <programlisting>
4906   class C a where
4907     op :: [a] -> a
4908
4909     op xs = let ys::[a]
4910                 ys = reverse xs
4911             in
4912             head ys
4913 </programlisting>
4914 </para>
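<para>The same scoping applies to instance declarations.  In the sketch
below, the class <literal>Pick</literal> and its instances are hypothetical
names introduced only for this illustration:
<programlisting>
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

import Data.Maybe (catMaybes)

class Pick a where
  pick :: [a] -> a
  -- Default method: the class variable 'a' is in scope here.
  pick xs = let ys :: [a]
                ys = reverse xs
            in head ys

instance Pick Int            -- uses the default method

instance Pick b =&gt; Pick (Maybe b) where
  -- The instance variable 'b' scopes over the method body.
  pick ms = Just (pick inner)
    where
      inner :: [b]
      inner = catMaybes ms

main :: IO ()
main = print (pick [Just 1, Nothing, Just (3 :: Int)])
</programlisting>
Like <literal>op</literal> above, <literal>pick</literal> is partial: it
fails on an empty list.
</para>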
4915 </sect3>
4916
4917 </sect2>
4918
4919
4920 <sect2 id="typing-binds">
4921 <title>Generalised typing of mutually recursive bindings</title>
4922
4923 <para>
4924 The Haskell Report specifies that a group of bindings (at top level, or in a
4925 <literal>let</literal> or <literal>where</literal>) should be sorted into
4926 strongly-connected components, and then type-checked in dependency order
4927 (<ulink url="http://www.haskell.org/onlinereport/decls.html#sect4.5.1">Haskell
4928 Report, Section 4.5.1</ulink>).
4929 As each group is type-checked, any binders of the group that
4930 have
4931 an explicit type signature are put in the type environment with the specified
4932 polymorphic type,
4933 and all others are monomorphic until the group is generalised
4934 (<ulink url="http://www.haskell.org/onlinereport/decls.html#sect4.5.2">Haskell Report, Section 4.5.2</ulink>).
4935 </para>
4936
4937 <para>Following a suggestion of Mark Jones, in his paper
4938 <ulink url="http://citeseer.ist.psu.edu/424440.html">Typing Haskell in
4939 Haskell</ulink>,
4940 GHC implements a more general scheme.  If <option>-XRelaxedPolyRec</option> is
4941 specified:
4942 <emphasis>the dependency analysis ignores references to variables that have an explicit
4943 type signature</emphasis>.
4944 As a result of this refined dependency analysis, the dependency groups are smaller, and more bindings will
4945 typecheck.  For example, consider:
4946 <programlisting>
4947   f :: Eq a =&gt; a -> Bool
4948   f x = (x == x) || g True || g "Yes"
4949
4950   g y = (y &lt;= y) || f True
4951 </programlisting>
4952 This is rejected by Haskell 98, but under Jones's scheme the definition for
4953 <literal>g</literal> is typechecked first, separately from that for
4954 <literal>f</literal>,
4955 because the reference to <literal>f</literal> in <literal>g</literal>'s right
4956 hand side is ignored by the dependency analysis.  Then <literal>g</literal>'s
4957 type is generalised, to get
4958 <programlisting>
4959   g :: Ord a =&gt; a -> Bool
4960 </programlisting>
4961 Now, the definition for <literal>f</literal> is typechecked, with this type for
4962 <literal>g</literal> in the type environment.
4963 </para>
4964
4965 <para>
4966 The same refined dependency analysis also allows the type signatures of
4967 mutually-recursive functions to have different contexts, something that is illegal in
4968 Haskell 98 (Section 4.5.2, last sentence).  With
4969 <option>-XRelaxedPolyRec</option>
GHC only insists that the type signatures of a <emphasis>refined</emphasis> group have identical
contexts; in practice this means that only variables bound by the same
4972 pattern binding must have the same context.  For example, this is fine:
4973 <programlisting>
4974   f :: Eq a =&gt; a -> Bool
4975   f x = (x == x) || g True
4976
4977   g :: Ord a =&gt; a -> Bool
4978   g y = (y &lt;= y) || f True
4979 </programlisting>
4980 </para>
4981 </sect2>
4982
4983 <sect2 id="type-families">
4984 <title>Type families
4985 </title>
4986
4987 <para>
4988 GHC supports the definition of type families indexed by types.  They may be
4989 seen as an extension of Haskell 98's class-based overloading of values to
4990 types.  When type families are declared in classes, they are also known as
4991 associated types.
4992 </para>
4993 <para>
4994 There are two forms of type families: data families and type synonym families.
4995 Currently, only the former are fully implemented, while we are still working
4996 on the latter.  As a result, the specification of the language extension is
4997 also still to some degree in flux.  Hence, a more detailed description of
4998 the language extension and its use is currently available
4999 from <ulink url="http://www.haskell.org/haskellwiki/GHC/Indexed_types">the Haskell
5000 wiki page on type families</ulink>.  The material will be moved to this user's
5001 guide when it has stabilised.
5002 </para>
5003 <para>
5004 Type families are enabled by the flag <option>-XTypeFamilies</option>.
5005 </para>
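<para>As a rough sketch of what a data family looks like, consider the
following.  The family <literal>XList</literal> and the class
<literal>XLength</literal> are illustrative names; the wiki page remains the
reference for the details:
<programlisting>
{-# LANGUAGE TypeFamilies #-}
module Main where

-- A data family: each instance may choose its own representation.
data family XList a

-- Lists of characters are stored as an ordinary cons list ...
data instance XList Char = XCons Char (XList Char) | XNil

-- ... whereas a list of units only needs to record its length.
data instance XList () = XListUnit Int

-- Functions over a data family are usually written as class methods,
-- because each instance has different constructors.
class XLength a where
  xlength :: XList a -> Int

instance XLength Char where
  xlength XNil           = 0
  xlength (XCons _ rest) = 1 + xlength rest

instance XLength () where
  xlength (XListUnit n) = n

main :: IO ()
main = print (xlength (XCons 'a' (XCons 'b' XNil)), xlength (XListUnit 5))
</programlisting>
</para>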
5006
5007
5008 </sect2>
5009
5010 </sect1>
5011 <!-- ==================== End of type system extensions =================  -->
5012
5013 <!-- ====================== TEMPLATE HASKELL =======================  -->
5014
5015 <sect1 id="template-haskell">
5016 <title>Template Haskell</title>
5017
5018 <para>Template Haskell allows you to do compile-time meta-programming in
5019 Haskell.
5020 The background to
5021 the main technical innovations is discussed in "<ulink
5022 url="http://research.microsoft.com/~simonpj/papers/meta-haskell/">
5023 Template Meta-programming for Haskell</ulink>" (Proc Haskell Workshop 2002).
5024 </para>
5025 <para>
5026 There is a Wiki page about
5027 Template Haskell at <ulink url="http://www.haskell.org/haskellwiki/Template_Haskell">
5028 http://www.haskell.org/haskellwiki/Template_Haskell</ulink>, and that is the best place to look for
5029 further details.
5030 You may also
5031 consult the <ulink
5032 url="http://www.haskell.org/ghc/docs/latest/html/libraries/index.html">online
5033 Haskell library reference material</ulink>
5034 (look for module <literal>Language.Haskell.TH</literal>).
5035 Many changes to the original design are described in
5036       <ulink url="http://research.microsoft.com/~simonpj/papers/meta-haskell/notes2.ps">
5037 Notes on Template Haskell version 2</ulink>.
5038 Not all of these changes are in GHC, however.
5039 </para>
5040
<para> The first example from the original paper is set out below (<xref linkend="th-example"/>)
5042 as a worked example to help get you started.
5043 </para>
5044
5045 <para>
The documentation here describes the realisation of Template Haskell in GHC.  It is not detailed enough to
serve as an introduction to Template Haskell; for that, see the <ulink url="http://haskell.org/haskellwiki/Template_Haskell">
Wiki page</ulink>.
5049 </para>
5050
5051     <sect2>
5052       <title>Syntax</title>
5053
5054       <para> Template Haskell has the following new syntactic
5055       constructions.  You need to use the flag
5056       <option>-XTemplateHaskell</option>
5057         <indexterm><primary><option>-XTemplateHaskell</option></primary>
5058       </indexterm>to switch these syntactic extensions on
5059       (<option>-XTemplateHaskell</option> is no longer implied by
5060       <option>-fglasgow-exts</option>).</para>
5061
5062         <itemizedlist>
5063               <listitem><para>
5064                   A splice is written <literal>$x</literal>, where <literal>x</literal> is an
5065                   identifier, or <literal>$(...)</literal>, where the "..." is an arbitrary expression.
5066                   There must be no space between the "$" and the identifier or parenthesis.  This use
5067                   of "$" overrides its meaning as an infix operator, just as "M.x" overrides the meaning
5068                   of "." as an infix operator.  If you want the infix operator, put spaces around it.
5069                   </para>
5070               <para> A splice can occur in place of
5071                   <itemizedlist>
5072                     <listitem><para> an expression; the spliced expression must
5073                     have type <literal>Q Exp</literal></para></listitem>
5074                     <listitem><para> a list of top-level declarations; the spliced expression must have type <literal>Q [Dec]</literal></para></listitem>
5075                     </itemizedlist>
5076                 </para>
            <para>Inside a splice you can only call functions defined in imported modules,
        not functions defined elsewhere in the same module.</para></listitem>
5079
5080
5081               <listitem><para>
                  A quotation is written in Oxford brackets, thus:
5083                   <itemizedlist>
5084                     <listitem><para> <literal>[| ... |]</literal>, where the "..." is an expression;
5085                              the quotation has type <literal>Q Exp</literal>.</para></listitem>
5086                     <listitem><para> <literal>[d| ... |]</literal>, where the "..." is a list of top-level declarations;
5087                              the quotation has type <literal>Q [Dec]</literal>.</para></listitem>
5088                     <listitem><para> <literal>[t| ... |]</literal>, where the "..." is a type;
                             the quotation has type <literal>Q Type</literal>.</para></listitem>
5090                   </itemizedlist></para></listitem>
5091
5092               <listitem><para>
5093                   A quasi-quotation can appear in either a pattern context or an
5094                   expression context and is also written in Oxford brackets:
5095                   <itemizedlist>
                    <listitem><para> <literal>[$<replaceable>varid</replaceable>| ... |]</literal>,
5097                         where the "..." is an arbitrary string; a full description of the
5098                         quasi-quotation facility is given in <xref linkend="th-quasiquotation"/>.</para></listitem>
5099                   </itemizedlist></para></listitem>
5100
5101               <listitem><para>
5102                   A name can be quoted with either one or two prefix single quotes:
5103                   <itemizedlist>
5104                     <listitem><para> <literal>'f</literal> has type <literal>Name</literal>, and names the function <literal>f</literal>.
5105                   Similarly <literal>'C</literal> has type <literal>Name</literal> and names the data constructor <literal>C</literal>.
5106                   In general <literal>'</literal><replaceable>thing</replaceable> interprets <replaceable>thing</replaceable> in an expression context.
5107                      </para></listitem>
5108                     <listitem><para> <literal>''T</literal> has type <literal>Name</literal>, and names the type constructor  <literal>T</literal>.
5109                   That is, <literal>''</literal><replaceable>thing</replaceable> interprets <replaceable>thing</replaceable> in a type context.
5110                      </para></listitem>
5111                   </itemizedlist>
                  These <literal>Name</literal>s can be used to construct Template Haskell expressions, patterns, declarations etc.  They
5113                   may also be given as an argument to the <literal>reify</literal> function.
5114                  </para>
5115                 </listitem>
5116
5117
5118         </itemizedlist>
<para>(Compared to the original paper, there are many differences of detail.
The syntax for a declaration splice uses "<literal>$</literal>" not "<literal>splice</literal>".
The type of the enclosed expression must be  <literal>Q [Dec]</literal>, not  <literal>[Q Dec]</literal>.
Type splices are not implemented, and neither are pattern splices or quotations.)</para>
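<para>The quotation and name-quoting forms can be explored without any
splice at all, as in this sketch.  It assumes that <literal>runQ</literal>
and <literal>pprint</literal> are available from
<literal>Language.Haskell.TH</literal>; the other names are made up for the
example:
<programlisting>
{-# LANGUAGE TemplateHaskell #-}
module Main where

import Language.Haskell.TH

-- An expression quotation; its type is Q Exp.
increment :: Q Exp
increment = [| \x -> x + 1 |]

-- A declaration quotation; its type is Q [Dec].
answerDecl :: Q [Dec]
answerDecl = [d| answer = 42 |]

-- Quoted names: one quote for values and data constructors,
-- two quotes for type constructors.
quotedNames :: [Name]
quotedNames = ['map, 'Just, ''Maybe]

main :: IO ()
main = do
  runQ increment  >>= putStrLn . pprint
  runQ answerDecl >>= putStrLn . pprint
  mapM_ print quotedNames
</programlisting>
The exact text printed depends on the version of the Template Haskell
library.
</para>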
5123
5124 </sect2>
5125
5126 <sect2>  <title> Using Template Haskell </title>
5127 <para>
5128 <itemizedlist>
5129     <listitem><para>
5130     The data types and monadic constructor functions for Template Haskell are in the library
    <literal>Language.Haskell.TH</literal>.
5132     </para></listitem>
5133
5134     <listitem><para>
5135     You can only run a function at compile time if it is imported from another module.  That is,
5136             you can't define a function in a module, and call it from within a splice in the same module.
5137             (It would make sense to do so, but it's hard to implement.)
5138    </para></listitem>
5139
5140    <listitem><para>
5141    You can only run a function at compile time if it is imported
5142    from another module <emphasis>that is not part of a mutually-recursive group of modules
5143    that includes the module currently being compiled</emphasis>.  Furthermore, all of the modules of
5144    the mutually-recursive group must be reachable by non-SOURCE imports from the module where the
5145    splice is to be run.</para>
5146    <para>
5147    For example, when compiling module A,
5148    you can only run Template Haskell functions imported from B if B does not import A (directly or indirectly).
5149    The reason should be clear: to run B we must compile and run A, but we are currently type-checking A.
5150    </para></listitem>
5151
5152     <listitem><para>
5153             The flag <literal>-ddump-splices</literal> shows the expansion of all top-level splices as they happen.
5154    </para></listitem>
5155     <listitem><para>
5156             If you are building GHC from source, you need at least a stage-2 bootstrap compiler to
5157               run Template Haskell.  A stage-1 compiler will reject the TH constructs.  Reason: TH
5158               compiles and runs a program, and then looks at the result.  So it's important that
5159               the program it compiles produces results whose representations are identical to
5160               those of the compiler itself.
5161    </para></listitem>
5162 </itemizedlist>
5163 </para>
5164 <para> Template Haskell works in any mode (<literal>--make</literal>, <literal>--interactive</literal>,
5165         or file-at-a-time).  There used to be a restriction to the former two, but that restriction
5166         has been lifted.
5167 </para>
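<para>The import restriction can be seen in a single-module sketch such as
the one below, where the splice calls only <literal>stringE</literal>, which
is imported from <literal>Language.Haskell.TH</literal>.  The names
<literal>localGreeting</literal> and <literal>greeting</literal> are
hypothetical:
<programlisting>
{-# LANGUAGE TemplateHaskell #-}
module Main where

import Language.Haskell.TH (Q, Exp, stringE)

-- Defined in this module, so it cannot be run by a splice in this module.
localGreeting :: Q Exp
localGreeting = stringE "Hello from a local definition"

-- Accepted: the splice only calls the imported function stringE.
greeting :: String
greeting = $(stringE "Hello from an imported function")

-- Rejected by the restriction described above if uncommented:
-- rejected :: String
-- rejected = $(localGreeting)

main :: IO ()
main = putStrLn greeting
</programlisting>
</para>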
5168 </sect2>
5169
5170 <sect2 id="th-example">  <title> A Template Haskell Worked Example </title>
5171 <para>To help you get over the confidence barrier, try out this skeletal worked example.
5172   First cut and paste the two modules below into "Main.hs" and "Printf.hs":</para>
5173
5174 <programlisting>
5175
5176 {- Main.hs -}
5177 module Main where
5178
5179 -- Import our template "pr"
5180 import Printf ( pr )
5181
5182 -- The splice operator $ takes the Haskell source code
5183 -- generated at compile time by "pr" and splices it into
5184 -- the argument of "putStrLn".
5185 main = putStrLn ( $(pr "Hello") )
5186
5187
5188 {- Printf.hs -}
5189 module Printf where
5190
5191 -- Skeletal printf from the paper.
5192 -- It needs to be in a separate module to the one where
5193 -- you intend to use it.
5194
5195 -- Import some Template Haskell syntax
5196 import Language.Haskell.TH
5197
5198 -- Describe a format string
5199 data Format = D | S | L String
5200
5201 -- Parse a format string.  This is left largely to you
5202 -- as we are here interested in building our first ever
5203 -- Template Haskell program and not in building printf.
5204 parse :: String -> [Format]
5205 parse s   = [ L s ]
5206
5207 -- Generate Haskell source code from a parsed representation
5208 -- of the format string.  This code will be spliced into
5209 -- the module which calls "pr", at compile time.
5210 gen :: [Format] -> Q Exp
5211 gen [D]   = [| \n -> show n |]
5212 gen [S]   = [| \s -> s |]
5213 gen [L s] = stringE s
5214
5215 -- Here we generate the Haskell code for the splice
5216 -- from an input format string.
5217 pr :: String -> Q Exp
5218 pr s = gen (parse s)
5219 </programlisting>
5220
<para>Now run the compiler (here we are at a Cygwin prompt on Windows):
5222 </para>
5223 <programlisting>
$ ghc --make -XTemplateHaskell Main.hs -o main.exe
5225 </programlisting>
5226
5227 <para>Run "main.exe" and here is your output:</para>
5228
5229 <programlisting>
5230 $ ./main
5231 Hello
5232 </programlisting>
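<para>The parsing and generation steps left open above could be filled in
along the following lines.  This is only a sketch under stated assumptions:
the format syntax (<literal>%d</literal> and <literal>%s</literal>), the
accumulator argument to <literal>gen</literal> and the
<literal>deriving</literal> clause are choices made for this illustration,
loosely following the paper:
<programlisting>
{-# LANGUAGE TemplateHaskell #-}
module Printf where

import Language.Haskell.TH

data Format = D | S | L String
  deriving (Eq, Show)

-- Split a format string into directives and literal text.
parse :: String -> [Format]
parse ""             = []
parse ('%':'d':rest) = D : parse rest
parse ('%':'s':rest) = S : parse rest
parse (c:rest)       = case parse rest of
                         L lit : fs -> L (c : lit) : fs   -- extend a literal
                         fs         -> L [c] : fs         -- start a new one

-- Build a lambda for each directive; 'acc' is the code for the
-- string produced so far.
gen :: [Format] -> Q Exp -> Q Exp
gen []         acc = acc
gen (D : fs)   acc = [| \n -> $(gen fs [| $acc ++ show n |]) |]
gen (S : fs)   acc = [| \s -> $(gen fs [| $acc ++ s |]) |]
gen (L s : fs) acc = gen fs [| $acc ++ $(stringE s) |]

pr :: String -> Q Exp
pr s = gen (parse s) [| "" |]
</programlisting>
With this version, a module that imports <literal>Printf</literal> could
write <literal>$(pr "Error in %s on line %d") "Main.hs" 42</literal>; the
splice expands to a two-argument function returning a
<literal>String</literal>.
</para>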
5233
5234 </sect2>
5235
5236 <sect2>
5237 <title>Using Template Haskell with Profiling</title>
5238 <indexterm><primary>profiling</primary><secondary>with Template Haskell</secondary></indexterm>
5239
5240 <para>Template Haskell relies on GHC's built-in bytecode compiler and
5241 interpreter to run the splice expressions.  The bytecode interpreter
5242 runs the compiled expression on top of the same runtime on which GHC
5243 itself is running; this means that the compiled code referred to by
5244 the interpreted expression must be compatible with this runtime, and
5245 in particular this means that object code that is compiled for
5246 profiling <emphasis>cannot</emphasis> be loaded and used by a splice
5247 expression, because profiled object code is only compatible with the
5248 profiling version of the runtime.</para>
5249
5250 <para>This causes difficulties if you have a multi-module program
5251 containing Template Haskell code and you need to compile it for
5252 profiling, because GHC cannot load the profiled object code and use it
5253 when executing the splices.  Fortunately GHC provides a workaround.
5254 The basic idea is to compile the program twice:</para>
5255
5256 <orderedlist>
5257 <listitem>
5258   <para>Compile the program or library first the normal way, without
5259   <option>-prof</option><indexterm><primary><option>-prof</option></primary></indexterm>.</para>
5260 </listitem>
5261 <listitem>
5262   <para>Then compile it again with <option>-prof</option>, and
5263   additionally use <option>-osuf
5264   p_o</option><indexterm><primary><option>-osuf</option></primary></indexterm>
5265   to name the object files differently (you can choose any suffix
5266   that isn't the normal object suffix here).  GHC will automatically
5267   load the object files built in the first step when executing splice
5268   expressions.  If you omit the <option>-osuf</option> flag when
5269   building with <option>-prof</option> and Template Haskell is used,
5270   GHC will emit an error message. </para>
5271 </listitem>
5272 </orderedlist>
5273 </sect2>
5274
5275 <sect2 id="th-quasiquotation">  <title> Template Haskell Quasi-quotation </title>
5276 <para>Quasi-quotation allows patterns and expressions to be written using
5277 programmer-defined concrete syntax; the motivation behind the extension and
5278 several examples are documented in
5279 "<ulink url="http://www.eecs.harvard.edu/~mainland/ghc-quasiquoting/">Why It's
5280 Nice to be Quoted: Quasiquoting for Haskell</ulink>" (Proc Haskell Workshop
5281 2007). The example below shows how to write a quasiquoter for a simple
5282 expression language.</para>
5283
5284 <para>
5285 In the example, the quasiquoter <literal>expr</literal> is bound to a value of
5286 type <literal>Language.Haskell.TH.Quote.QuasiQuoter</literal> which contains two
5287 functions for quoting expressions and patterns, respectively. The first argument
5288 to each quoter is the (arbitrary) string enclosed in the Oxford brackets. The
5289 context of the quasi-quotation statement determines which of the two parsers is
5290 called: if the quasi-quotation occurs in an expression context, the expression
5291 parser is called, and if it occurs in a pattern context, the pattern parser is
5292 called.</para>
5293
5294 <para>
5295 Note that in the example we make use of an antiquoted
5296 variable <literal>n</literal>, indicated by the syntax <literal>'int:n</literal>
5297 (this syntax for anti-quotation was defined by the parser's
5298 author, <emphasis>not</emphasis> by GHC). This binds <literal>n</literal> to the
5299 integer value argument of the constructor <literal>IntExpr</literal> when
5300 pattern matching. Please see the referenced paper for further details regarding
5301 anti-quotation as well as the description of a technique that uses SYB to
5302 leverage a single parser of type <literal>String -> a</literal> to generate both
5303 an expression parser that returns a value of type <literal>Q Exp</literal> and a
5304 pattern parser that returns a value of type <literal>Q Pat</literal>.
5305 </para>
5306
5307 <para>In general, a quasi-quote has the form
5308 <literal>[$<replaceable>quoter</replaceable>| <replaceable>string</replaceable> |]</literal>.
5309 The <replaceable>quoter</replaceable> must be the name of an imported quoter; it
5310 cannot be an arbitrary expression.  The quoted <replaceable>string</replaceable>
5311 can be arbitrary, and may contain newlines.
5312 </para>
5313 <para>
5314 Quasiquoters must obey the same stage restrictions as Template Haskell, e.g., in
5315 the example, <literal>expr</literal> cannot be defined
5316 in <literal>Main.hs</literal> where it is used, but must be imported.
5317 </para>
5318
5319 <programlisting>
5320
5321 {- Main.hs -}
5322 module Main where
5323
5324 import Expr
5325
5326 main :: IO ()
5327 main = do { print $ eval [$expr|1 + 2|]
5328           ; case IntExpr 1 of
5329               { [$expr|'int:n|] -> print n
5330               ;  _              -> return ()
5331               }
5332           }
5333
5334
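{- Expr.hs -}
{-# LANGUAGE DeriveDataTypeable #-}
module Expr where

-- Data and Typeable are needed by the deriving clauses below
import Data.Data (Data, Typeable)
import qualified Language.Haskell.TH as TH
import Language.Haskell.TH.Quote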
5340
5341 data Expr  =  IntExpr Integer
5342            |  AntiIntExpr String
5343            |  BinopExpr BinOp Expr Expr
5344            |  AntiExpr String
5345     deriving(Show, Typeable, Data)
5346
5347 data BinOp  =  AddOp
5348             |  SubOp
5349             |  MulOp
5350             |  DivOp
5351     deriving(Show, Typeable, Data)
5352
5353 eval :: Expr -> Integer
5354 eval (IntExpr n)        = n
5355 eval (BinopExpr op x y) = (opToFun op) (eval x) (eval y)
5356   where
5357     opToFun AddOp = (+)
5358     opToFun SubOp = (-)
5359     opToFun MulOp = (*)
5360     opToFun DivOp = div
5361
5362 expr = QuasiQuoter parseExprExp parseExprPat
5363
5364 -- Parse an Expr, returning its representation as
5365 -- either a Q Exp or a Q Pat. See the referenced paper
5366 -- for how to use SYB to do this by writing a single
5367 -- parser of type String -> Expr instead of two
5368 -- separate parsers.
5369
5370 parseExprExp :: String -> Q Exp
5371 parseExprExp ...
5372
5373 parseExprPat :: String -> Q Pat
5374 parseExprPat ...
5375 </programlisting>
5376
5377 <para>Now run the compiler:
5378 </para>
5379 <programlisting>
5380 $ ghc --make -XQuasiQuotes Main.hs -o main
5381 </programlisting>
5382
<para>Running <literal>main</literal> produces the following output:</para>
5384
5385 <programlisting>
5386 $ ./main
5387 3
5388 1
5389 </programlisting>
5390
5391 </sect2>
5392
5393 </sect1>
5394
5395 <!-- ===================== Arrow notation ===================  -->
5396
5397 <sect1 id="arrow-notation">
5398 <title>Arrow notation
5399 </title>
5400
5401 <para>Arrows are a generalization of monads introduced by John Hughes.
5402 For more details, see
5403 <itemizedlist>
5404
5405 <listitem>
5406 <para>
5407 &ldquo;Generalising Monads to Arrows&rdquo;,
5408 John Hughes, in <citetitle>Science of Computer Programming</citetitle> 37,
5409 pp67&ndash;111, May 2000.
5410 The paper that introduced arrows: a friendly introduction, motivated with
5411 programming examples.
5412 </para>
5413 </listitem>
5414
5415 <listitem>
5416 <para>
5417 &ldquo;<ulink url="http://www.soi.city.ac.uk/~ross/papers/notation.html">A New Notation for Arrows</ulink>&rdquo;,
5418 Ross Paterson, in <citetitle>ICFP</citetitle>, Sep 2001.
5419 Introduced the notation described here.
5420 </para>
5421 </listitem>
5422
5423 <listitem>
5424 <para>
5425 &ldquo;<ulink url="http://www.soi.city.ac.uk/~ross/papers/fop.html">Arrows and Computation</ulink>&rdquo;,
5426 Ross Paterson, in <citetitle>The Fun of Programming</citetitle>,
5427 Palgrave, 2003.
5428 </para>
5429 </listitem>
5430
5431 <listitem>
5432 <para>
5433 &ldquo;<ulink url="http://www.cs.chalmers.se/~rjmh/afp-arrows.pdf">Programming with Arrows</ulink>&rdquo;,
5434 John Hughes, in <citetitle>5th International Summer School on
5435 Advanced Functional Programming</citetitle>,
5436 <citetitle>Lecture Notes in Computer Science</citetitle> vol. 3622,
5437 Springer, 2004.
5438 This paper includes another introduction to the notation,
5439 with practical examples.
5440 </para>
5441 </listitem>
5442
5443 <listitem>
5444 <para>
5445 &ldquo;<ulink url="http://www.haskell.org/ghc/docs/papers/arrow-rules.pdf">Type and Translation Rules for Arrow Notation in GHC</ulink>&rdquo;,
5446 Ross Paterson and Simon Peyton Jones, September 16, 2004.
5447 A terse enumeration of the formal rules used
5448 (extracted from comments in the source code).
5449 </para>
5450 </listitem>
5451
5452 <listitem>
5453 <para>
5454 The arrows web page at
5455 <ulink url="http://www.haskell.org/arrows/"><literal>http://www.haskell.org/arrows/</literal></ulink>.
5456 </para>
5457 </listitem>
5458
5459 </itemizedlist>
5460 With the <option>-XArrows</option> flag, GHC supports the arrow
5461 notation described in the second of these papers,
5462 translating it using combinators from the
5463 <ulink url="../libraries/base/Control-Arrow.html"><literal>Control.Arrow</literal></ulink>
5464 module.
5465 What follows is a brief introduction to the notation;
5466 it won't make much sense unless you've read Hughes's paper.
5467 </para>
5468
5469 <para>The extension adds a new kind of expression for defining arrows:
5470 <screen>
5471 <replaceable>exp</replaceable><superscript>10</superscript> ::= ...
5472        |  proc <replaceable>apat</replaceable> -> <replaceable>cmd</replaceable>
5473 </screen>
5474 where <literal>proc</literal> is a new keyword.
5475 The variables of the pattern are bound in the body of the
5476 <literal>proc</literal>-expression,
5477 which is a new sort of thing called a <firstterm>command</firstterm>.
5478 The syntax of commands is as follows:
5479 <screen>
5480 <replaceable>cmd</replaceable>   ::= <replaceable>exp</replaceable><superscript>10</superscript> -&lt;  <replaceable>exp</replaceable>
5481        |  <replaceable>exp</replaceable><superscript>10</superscript> -&lt;&lt; <replaceable>exp</replaceable>
5482        |  <replaceable>cmd</replaceable><superscript>0</superscript>
5483 </screen>
5484 with <replaceable>cmd</replaceable><superscript>0</superscript> up to
5485 <replaceable>cmd</replaceable><superscript>9</superscript> defined using
5486 infix operators as for expressions, and
5487 <screen>
5488 <replaceable>cmd</replaceable><superscript>10</superscript> ::= \ <replaceable>apat</replaceable> ... <replaceable>apat</replaceable> -> <replaceable>cmd</replaceable>
5489        |  let <replaceable>decls</replaceable> in <replaceable>cmd</replaceable>
5490        |  if <replaceable>exp</replaceable> then <replaceable>cmd</replaceable> else <replaceable>cmd</replaceable>
5491        |  case <replaceable>exp</replaceable> of { <replaceable>calts</replaceable> }
5492        |  do { <replaceable>cstmt</replaceable> ; ... <replaceable>cstmt</replaceable> ; <replaceable>cmd</replaceable> }
5493        |  <replaceable>fcmd</replaceable>
5494
5495 <replaceable>fcmd</replaceable>  ::= <replaceable>fcmd</replaceable> <replaceable>aexp</replaceable>
5496        |  ( <replaceable>cmd</replaceable> )
5497        |  (| <replaceable>aexp</replaceable> <replaceable>cmd</replaceable> ... <replaceable>cmd</replaceable> |)
5498
5499 <replaceable>cstmt</replaceable> ::= let <replaceable>decls</replaceable>
5500        |  <replaceable>pat</replaceable> &lt;- <replaceable>cmd</replaceable>
5501        |  rec { <replaceable>cstmt</replaceable> ; ... <replaceable>cstmt</replaceable> [;] }
5502        |  <replaceable>cmd</replaceable>
5503 </screen>
5504 where <replaceable>calts</replaceable> are like <replaceable>alts</replaceable>
5505 except that the bodies are commands instead of expressions.
5506 </para>
5507
5508 <para>
5509 Commands produce values, but (like monadic computations)
5510 may yield more than one value,
5511 or none, and may do other things as well.
5512 For the most part, familiarity with monadic notation is a good guide to
5513 using commands.
However, the values of expressions, even monadic ones,
5515 are determined by the values of the variables they contain;
5516 this is not necessarily the case for commands.
5517 </para>
5518
5519 <para>
5520 A simple example of the new notation is the expression
5521 <screen>
5522 proc x -> f -&lt; x+1
5523 </screen>
5524 We call this a <firstterm>procedure</firstterm> or
5525 <firstterm>arrow abstraction</firstterm>.
5526 As with a lambda expression, the variable <literal>x</literal>
5527 is a new variable bound within the <literal>proc</literal>-expression.
5528 It refers to the input to the arrow.
In the above example, <literal>-&lt;</literal> is not an identifier but a
new reserved symbol used for building commands from an expression of arrow
type and an expression to be fed as input to that arrow.
(The weird look will make more sense later.)
It may be read as an analogue of application for arrows.
5534 The above example is equivalent to the Haskell expression
5535 <screen>
5536 arr (\ x -> x+1) >>> f
5537 </screen>
This translation would make no sense if the expression to the left of
<literal>-&lt;</literal> involved the bound variable <literal>x</literal>.
5540 More generally, the expression to the left of <literal>-&lt;</literal>
5541 may not involve any <firstterm>local variable</firstterm>,
5542 i.e. a variable bound in the current arrow abstraction.
5543 For such a situation there is a variant <literal>-&lt;&lt;</literal>, as in
5544 <screen>
5545 proc x -> f x -&lt;&lt; x+1
5546 </screen>
5547 which is equivalent to
5548 <screen>
5549 arr (\ x -> (f x, x+1)) >>> app
5550 </screen>
5551 so in this case the arrow must belong to the <literal>ArrowApply</literal>
5552 class.
5553 Such an arrow is equivalent to a monad, so if you're using this form
5554 you may find a monadic formulation more convenient.
5555 </para>
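<para>
As a minimal sketch of the two forms of arrow application, the following
module uses ordinary functions as the arrow type (functions are instances of
both <literal>Arrow</literal> and <literal>ArrowApply</literal>).
The names <literal>double</literal>, <literal>viaProc</literal>,
<literal>viaCombinators</literal> and <literal>viaApp</literal> are
hypothetical, chosen only for this illustration:
</para>
<programlisting>
{-# LANGUAGE Arrows #-}
module Main where

import Control.Arrow

-- A hypothetical arrow: plain functions are arrows.
double :: Int -> Int
double = (* 2)

-- First-order application: the arrow expression does not mention x.
viaProc :: Int -> Int
viaProc = proc x -> double -&lt; x+1

-- The combinator form that the text gives as equivalent.
viaCombinators :: Int -> Int
viaCombinators = arr (\ x -> x+1) >>> double

-- Higher-order application: the arrow expression (+ x) mentions x,
-- so -&lt;&lt; is required, and the arrow must belong to ArrowApply.
viaApp :: Int -> Int
viaApp = proc x -> (+ x) -&lt;&lt; x+1

main :: IO ()
main = do
  print (viaProc 3)         -- prints 8
  print (viaCombinators 3)  -- prints 8
  print (viaApp 3)          -- prints 7
</programlisting>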
5556
5557 <sect2>
5558 <title>do-notation for commands</title>
5559
5560 <para>
5561 Another form of command is a form of <literal>do</literal>-notation.
5562 For example, you can write
5563 <screen>
5564 proc x -> do
5565         y &lt;- f -&lt; x+1
5566         g -&lt; 2*y
5567         let z = x+y
5568         t &lt;- h -&lt; x*z
5569         returnA -&lt; t+z
5570 </screen>
5571 You can read this much like ordinary <literal>do</literal>-notation,
5572 but with commands in place of monadic expressions.
5573 The first line sends the value of <literal>x+1</literal> as an input to
5574 the arrow <literal>f</literal>, and matches its output against
5575 <literal>y</literal>.
5576 In the next line, the output is discarded.
5577 The arrow <function>returnA</function> is defined in the
5578 <ulink url="../libraries/base/Control-Arrow.html"><literal>Control.Arrow</literal></ulink>
5579 module as <literal>arr id</literal>.
5580 The above example is treated as an abbreviation for
5581 <screen>
5582 arr (\ x -> (x, x)) >>>
5583         first (arr (\ x -> x+1) >>> f) >>>
5584         arr (\ (y, x) -> (y, (x, y))) >>>
5585         first (arr (\ y -> 2*y) >>> g) >>>
5586         arr snd >>>
5587         arr (\ (x, y) -> let z = x+y in ((x, z), z)) >>>
5588         first (arr (\ (x, z) -> x*z) >>> h) >>>
5589         arr (\ (t, z) -> t+z) >>>
5590         returnA
5591 </screen>
5592 Note that variables not used later in the composition are projected out.
5593 After simplification using rewrite rules (see <xref linkend="rewrite-rules"/>)
5594 defined in the
5595 <ulink url="../libraries/base/Control-Arrow.html"><literal>Control.Arrow</literal></ulink>
5596 module, this reduces to
5597 <screen>
5598 arr (\ x -> (x+1, x)) >>>
5599         first f >>>
5600         arr (\ (y, x) -> (2*y, (x, y))) >>>
5601         first g >>>
5602         arr (\ (_, (x, y)) -> let z = x+y in (x*z, z)) >>>
5603         first h >>>
5604         arr (\ (t, z) -> t+z)
5605 </screen>
5606 which is what you might have written by hand.
5607 With arrow notation, GHC keeps track of all those tuples of variables for you.
5608 </para>
5609
5610 <para>
5611 Note that although the above translation suggests that
5612 <literal>let</literal>-bound variables like <literal>z</literal> must be
5613 monomorphic, the actual translation produces Core,
5614 so polymorphic variables are allowed.
5615 </para>
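<para>
The correspondence between the <literal>do</literal>-notation and the
hand-written combinator form can be checked with a small program.
This is only a sketch: it takes plain functions as the arrow type, and the
definitions of <literal>f</literal>, <literal>g</literal> and
<literal>h</literal> are arbitrary choices made for the illustration.
</para>
<programlisting>
{-# LANGUAGE Arrows #-}
module Main where

import Control.Arrow

-- Arbitrary stand-ins for the arrows f, g and h used in the text.
f, g, h :: Int -> Int
f = (+ 10)
g = negate
h = (* 3)

-- The command written with do-notation.
withNotation :: Int -> Int
withNotation = proc x -> do
        y &lt;- f -&lt; x+1
        g -&lt; 2*y
        let z = x+y
        t &lt;- h -&lt; x*z
        returnA -&lt; t+z

-- The simplified combinator form, with the tuples managed by hand.
byHand :: Int -> Int
byHand = arr (\ x -> (x+1, x)) >>>
        first f >>>
        arr (\ (y, x) -> (2*y, (x, y))) >>>
        first g >>>
        arr (\ (_, (x, y)) -> let z = x+y in (x*z, z)) >>>
        first h >>>
        arr (\ (t, z) -> t+z)

main :: IO ()
main = do
  print (withNotation 1)                                   -- prints 52
  print (map withNotation [0..5] == map byHand [0..5])     -- prints True
</programlisting>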
5616
5617 <para>
5618 It's also possible to have mutually recursive bindings,
5619 using the new <literal>rec</literal> keyword, as in the following example:
5620 <programlisting>
5621 counter :: ArrowCircuit a => a Bool Int
5622 counter = proc reset -> do
5623         rec     output &lt;- returnA -&lt; if reset then 0 else next
5624                 next &lt;- delay 0 -&lt; output+1
5625         returnA -&lt; output
5626 </programlisting>
5627 The translation of such forms uses the <function>loop</function> combinator,
5628 so the arrow concerned must belong to the <literal>ArrowLoop</literal> class.
5629 </para>
5630
5631 </sect2>
5632
5633 <sect2>
5634 <title>Conditional commands</title>
5635
5636 <para>
5637 In the previous example, we used a conditional expression to construct the
5638 input for an arrow.
5639 Sometimes we want to conditionally execute different commands, as in
5640 <screen>
5641 proc (x,y) ->
5642         if f x y
5643         then g -&lt; x+1
5644         else h -&lt; y+2
5645 </screen>
5646 which is translated to
5647 <screen>
arr (\ (x,y) -> if f x y then Left x else Right y) >>>
        (arr (\x -> x+1) >>> g) ||| (arr (\y -> y+2) >>> h)
5650 </screen>
5651 Since the translation uses <function>|||</function>,
5652 the arrow concerned must belong to the <literal>ArrowChoice</literal> class.
5653 </para>
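<para>
The following sketch compares an <literal>if</literal> command with a
hand-written use of <function>|||</function>, again taking plain functions
as the arrow type (they belong to <literal>ArrowChoice</literal>).
The predicate <literal>p</literal> plays the role of <literal>f</literal> in
the text; it and the definitions of <literal>g</literal> and
<literal>h</literal> are hypothetical.
</para>
<programlisting>
{-# LANGUAGE Arrows #-}
module Main where

import Control.Arrow

-- Hypothetical predicate and branch arrows.
p :: Int -> Int -> Bool
p x y = x > y

g, h :: Int -> Int
g = (* 2)
h = negate

-- Conditional command: only the selected branch is run.
withIf :: (Int, Int) -> Int
withIf = proc (x,y) ->
        if p x y
        then g -&lt; x+1
        else h -&lt; y+2

-- Hand-written equivalent: tag the input, then choose a branch with |||.
byHand :: (Int, Int) -> Int
byHand = arr (\ (x,y) -> if p x y then Left x else Right y) >>>
        ((arr (\x -> x+1) >>> g) ||| (arr (\y -> y+2) >>> h))

main :: IO ()
main = do
  print (withIf (3,1), byHand (3,1))  -- prints (8,8)
  print (withIf (1,3), byHand (1,3))  -- prints (-5,-5)
</programlisting>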
5654
5655 <para>
5656 There are also <literal>case</literal> commands, like
5657 <screen>
5658 case input of
5659     [] -> f -&lt; ()
5660     [x] -> g -&lt; x+1
5661     x1:x2:xs -> do
5662         y &lt;- h -&lt; (x1, x2)
5663         ys &lt;- k -&lt; xs
5664         returnA -&lt; y:ys
5665 </screen>
5666 The syntax is the same as for <literal>case</literal> expressions,
5667 except that the bodies of the alternatives are commands rather than expressions.
5668 The translation is similar to that of <literal>if</literal> commands.
5669 </para>
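<para>
A complete, if artificial, use of a <literal>case</literal> command might
look as follows.
The component arrows <literal>whenEmpty</literal>, <literal>single</literal>
and <literal>pairSum</literal> are hypothetical plain functions standing in
for <literal>f</literal>, <literal>g</literal> and <literal>h</literal>.
</para>
<programlisting>
{-# LANGUAGE Arrows #-}
module Main where

import Control.Arrow

-- Hypothetical component arrows (plain functions).
whenEmpty :: () -> [Int]
whenEmpty () = []

single :: Int -> [Int]
single n = [n]

pairSum :: (Int, Int) -> Int
pairSum (a, b) = a + b

-- The bodies of the alternatives are commands, not expressions.
pairSums :: [Int] -> [Int]
pairSums = proc input ->
    case input of
        [] -> whenEmpty -&lt; ()
        [x] -> single -&lt; x+1
        x1:x2:xs -> do
            y &lt;- pairSum -&lt; (x1, x2)
            ys &lt;- pairSums -&lt; xs
            returnA -&lt; y:ys

main :: IO ()
main = print (pairSums [1,2,3,4,5])  -- prints [3,7,6]
</programlisting>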
5670
5671 </sect2>
5672
5673 <sect2>
5674 <title>Defining your own control structures</title>
5675
5676 <para>
As we've seen, arrow notation provides constructs,
5678 modelled on those for expressions,
5679 for sequencing, value recursion and conditionals.
5680 But suitable combinators,
5681 which you can define in ordinary Haskell,
5682 may also be used to build new commands out of existing ones.
5683 The basic idea is that a command defines an arrow from environments to values.
5684 These environments assign values to the free local variables of the command.
5685 Thus combinators that produce arrows from arrows
5686 may also be used to build commands from commands.
For example, the <literal>ArrowPlus</literal> class includes a combinator
<programlisting>
ArrowPlus a => (&lt;+>) :: a e c -> a e c -> a e c
</programlisting>
5690 </programlisting>
5691 so we can use it to build commands:
5692 <programlisting>
5693 expr' = proc x -> do
5694                 returnA -&lt; x
5695         &lt;+> do
5696                 symbol Plus -&lt; ()
5697                 y &lt;- term -&lt; ()
5698                 expr' -&lt; x + y
5699         &lt;+> do
5700                 symbol Minus -&lt; ()
5701                 y &lt;- term -&lt; ()
5702                 expr' -&lt; x - y
5703 </programlisting>
5704 (The <literal>do</literal> on the first line is needed to prevent the first
5705 <literal>&lt;+> ...</literal> from being interpreted as part of the
5706 expression on the previous line.)
5707 This is equivalent to
5708 <programlisting>
5709 expr' = (proc x -> returnA -&lt; x)
5710         &lt;+> (proc x -> do
5711                 symbol Plus -&lt; ()
5712                 y &lt;- term -&lt; ()
5713                 expr' -&lt; x + y)
5714         &lt;+> (proc x -> do
5715                 symbol Minus -&lt; ()
5716                 y &lt;- term -&lt; ()
5717                 expr' -&lt; x - y)
5718 </programlisting>
5719 It is essential that this operator be polymorphic in <literal>e</literal>
5720 (representing the environment input to the command
5721 and thence to its subcommands)
5722 and satisfy the corresponding naturality property
5723 <screen>
5724 arr k >>> (f &lt;+> g) = (arr k >>> f) &lt;+> (arr k >>> g)
5725 </screen>
5726 at least for strict <literal>k</literal>.
5727 (This should be automatic if you're not using <function>seq</function>.)
5728 This ensures that environments seen by the subcommands are environments
5729 of the whole command,
5730 and also allows the translation to safely trim these environments.
5731 The operator must also not use any variable defined within the current
5732 arrow abstraction.
5733 </para>
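<para>
For a runnable illustration of an operator applied to commands, one possible
choice of arrow is <literal>Kleisli []</literal>, which is an instance of the
<literal>ArrowPlus</literal> class of <literal>Control.Arrow</literal>.
The names <literal>neighbours</literal> and <literal>neighbours'</literal>
are hypothetical.
</para>
<programlisting>
{-# LANGUAGE Arrows #-}
module Main where

import Control.Arrow

-- For Kleisli arrows over the list monad, &lt;+> concatenates the
-- results of its two operands.
neighbours :: Kleisli [] Int Int
neighbours = proc x -> (returnA -&lt; x-1) &lt;+> (returnA -&lt; x+1)

-- The same arrow, with the operator applied to two separate procedures.
neighbours' :: Kleisli [] Int Int
neighbours' = (proc x -> returnA -&lt; x-1) &lt;+> (proc x -> returnA -&lt; x+1)

main :: IO ()
main = do
  print (runKleisli neighbours 5)   -- prints [4,6]
  print (runKleisli neighbours' 5)  -- prints [4,6]
</programlisting>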
5734
5735 <para>
5736 We could define our own operator
5737 <programlisting>
5738 untilA :: ArrowChoice a => a e () -> a e Bool -> a e ()
untilA body cond = proc x -> do
5740         b &lt;- cond -&lt; x
5741         if b then returnA -&lt; ()
5742         else do
5743                 body -&lt; x
5744                 untilA body cond -&lt; x
5745 </programlisting>
5746 and use it in the same way.
5747 Of course this infix syntax only makes sense for binary operators;
5748 there is also a more general syntax involving special brackets:
5749 <screen>
5750 proc x -> do
5751         y &lt;- f -&lt; x+1
5752         (|untilA (increment -&lt; x+y) (within 0.5 -&lt; x)|)
5753 </screen>
5754 </para>
5755
5756 </sect2>
5757
5758 <sect2>
5759 <title>Primitive constructs</title>
5760
5761 <para>
5762 Some operators will need to pass additional inputs to their subcommands.
5763 For example, in an arrow type supporting exceptions,
5764 the operator that attaches an exception handler will wish to pass the
5765 exception that occurred to the handler.
5766 Such an operator might have a type
5767 <screen>
5768 handleA :: ... => a e c -> a (e,Ex) c -> a e c
5769 </screen>
5770 where <literal>Ex</literal> is the type of exceptions handled.
5771 You could then use this with arrow notation by writing a command
5772 <screen>
5773 body `handleA` \ ex -> handler
5774 </screen>
5775 so that if an exception is raised in the command <literal>body</literal>,
5776 the variable <literal>ex</literal> is bound to the value of the exception
5777 and the command <literal>handler</literal>,
5778 which typically refers to <literal>ex</literal>, is entered.
5779 Though the syntax here looks like a functional lambda,
5780 we are talking about commands, and something different is going on.
5781 The input to the arrow represented by a command consists of values for
5782 the free local variables in the command, plus a stack of anonymous values.
5783 In all the prior examples, this stack was empty.
5784 In the second argument to <function>handleA</function>,
5785 this stack consists of one value, the value of the exception.
5786 The command form of lambda merely gives this value a name.
5787 </para>
5788
5789 <para>
5790 More concretely,
5791 the values on the stack are paired to the right of the environment.
5792 So operators like <function>handleA</function> that pass
5793 extra inputs to their subcommands can be designed for use with the notation
5794 by pairing the values with the environment in this way.
5795 More precisely, the type of each argument of the operator (and its result)
5796 should have the form
5797 <screen>
5798 a (...(e,t1), ... tn) t
5799 </screen>
5800 where <replaceable>e</replaceable> is a polymorphic variable
5801 (representing the environment)
5802 and <replaceable>ti</replaceable> are the types of the values on the stack,
5803 with <replaceable>t1</replaceable> being the <quote>top</quote>.
5804 The polymorphic variable <replaceable>e</replaceable> must not occur in
5805 <replaceable>a</replaceable>, <replaceable>ti</replaceable> or
5806 <replaceable>t</replaceable>.
5807 However the arrows involved need not be the same.
5808 Here are some more examples of suitable operators:
5809 <screen>
5810 bracketA :: ... => a e b -> a (e,b) c -> a (e,c) d -> a e d
5811 runReader :: ... => a e c -> a' (e,State) c
5812 runState :: ... => a e c -> a' (e,State) (c,State)
5813 </screen>
5814 We can supply the extra input required by commands built with the last two
5815 by applying them to ordinary expressions, as in
5816 <screen>
5817 proc x -> do
5818         s &lt;- ...
5819         (|runReader (do { ... })|) s
5820 </screen>
5821 which adds <literal>s</literal> to the stack of inputs to the command
5822 built using <function>runReader</function>.
5823 </para>
5824
5825 <para>
5826 The command versions of lambda abstraction and application are analogous to
5827 the expression versions.
5828 In particular, the beta and eta rules describe equivalences of commands.
5829 These three features (operators, lambda abstraction and application)
5830 are the core of the notation; everything else can be built using them,
5831 though the results would be somewhat clumsy.
5832 For example, we could simulate <literal>do</literal>-notation by defining
5833 <programlisting>
5834 bind :: Arrow a => a e b -> a (e,b) c -> a e c
5835 u `bind` f = returnA &amp;&amp;&amp; u >>> f
5836
5837 bind_ :: Arrow a => a e b -> a e c -> a e c
5838 u `bind_` f = u `bind` (arr fst >>> f)
5839 </programlisting>
5840 We could simulate <literal>if</literal> by defining
5841 <programlisting>
5842 cond :: ArrowChoice a => a e b -> a e b -> a (e,Bool) b
5843 cond f g = arr (\ (e,b) -> if b then Left e else Right e) >>> f ||| g
5844 </programlisting>
5845 </para>
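<para>
These combinators are ordinary Haskell, so their behaviour can be explored
without any arrow notation.
The sketch below applies them directly to plain functions; the arguments
passed in <literal>main</literal> are arbitrary.
</para>
<programlisting>
module Main where

import Control.Arrow

bind :: Arrow a => a e b -> a (e,b) c -> a e c
u `bind` f = returnA &amp;&amp;&amp; u >>> f

bind_ :: Arrow a => a e b -> a e c -> a e c
u `bind_` f = u `bind` (arr fst >>> f)

cond :: ArrowChoice a => a e b -> a e b -> a (e,Bool) b
cond f g = arr (\ (e,b) -> if b then Left e else Right e) >>> f ||| g

main :: IO ()
main = do
  -- The environment is 3 and the first arrow yields 4,
  -- so the second arrow receives the pair (3,4).
  print (((+ 1) `bind` uncurry (*)) 3)   -- prints 12
  -- The result of the first arrow is discarded.
  print (((+ 1) `bind_` (* 10)) 3)       -- prints 30
  -- The Boolean paired with the environment selects the branch.
  print (cond (+ 1) negate (5, True))    -- prints 6
  print (cond (+ 1) negate (5, False))   -- prints -5
</programlisting>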
5846
5847 </sect2>
5848
5849 <sect2>
5850 <title>Differences with the paper</title>
5851
5852 <itemizedlist>
5853
5854 <listitem>
5855 <para>Instead of a single form of arrow application (arrow tail) with two
5856 translations, the implementation provides two forms
5857 <quote><literal>-&lt;</literal></quote> (first-order)
5858 and <quote><literal>-&lt;&lt;</literal></quote> (higher-order).
5859 </para>
5860 </listitem>
5861
5862 <listitem>
5863 <para>User-defined operators are flagged with banana brackets instead of
5864 a new <literal>form</literal> keyword.
5865 </para>
5866 </listitem>
5867
5868 </itemizedlist>
5869
5870 </sect2>
5871
5872 <sect2>
5873 <title>Portability</title>
5874
5875 <para>
5876 Although only GHC implements arrow notation directly,
5877 there is also a preprocessor
5878 (available from the
5879 <ulink url="http://www.haskell.org/arrows/">arrows web page</ulink>)
5880 that translates arrow notation into Haskell 98
5881 for use with other Haskell systems.
5882 You would still want to check arrow programs with GHC;
5883 tracing type errors in the preprocessor output is not easy.
5884 Modules intended for both GHC and the preprocessor must observe some
5885 additional restrictions:
5886 <itemizedlist>
5887
5888 <listitem>
5889 <para>
5890 The module must import
5891 <ulink url="../libraries/base/Control-Arrow.html"><literal>Control.Arrow</literal></ulink>.
5892 </para>
5893 </listitem>
5894
5895 <listitem>
5896 <para>
5897 The preprocessor cannot cope with other Haskell extensions.
5898 These would have to go in separate modules.
5899 </para>
5900 </listitem>
5901
5902 <listitem>
5903 <para>
5904 Because the preprocessor targets Haskell (rather than Core),
5905 <literal>let</literal>-bound variables are monomorphic.
5906 </para>
5907 </listitem>
5908
5909 </itemizedlist>
5910 </para>
5911
5912 </sect2>
5913
5914 </sect1>
5915
5916 <!-- ==================== BANG PATTERNS =================  -->
5917
5918 <sect1 id="bang-patterns">
5919 <title>Bang patterns
5920 <indexterm><primary>Bang patterns</primary></indexterm>
5921 </title>
5922 <para>GHC supports an extension of pattern matching called <emphasis>bang
5923 patterns</emphasis>.   Bang patterns are under consideration for Haskell Prime.
5924 The <ulink
5925 url="http://hackage.haskell.org/trac/haskell-prime/wiki/BangPatterns">Haskell
5926 prime feature description</ulink> contains more discussion and examples
5927 than the material below.
5928 </para>
5929 <para>
5930 Bang patterns are enabled by the flag <option>-XBangPatterns</option>.
5931 </para>
5932
5933 <sect2 id="bang-patterns-informal">
5934 <title>Informal description of bang patterns
5935 </title>
5936 <para>
5937 The main idea is to add a single new production to the syntax of patterns:
5938 <programlisting>
5939   pat ::= !pat
5940 </programlisting>
5941 Matching an expression <literal>e</literal> against a pattern <literal>!p</literal> is done by first
5942 evaluating <literal>e</literal> (to WHNF) and then matching the result against <literal>p</literal>.
5943 Example:
5944 <programlisting>
5945 f1 !x = True
5946 </programlisting>
This definition makes <literal>f1</literal> strict in <literal>x</literal>,
5948 whereas without the bang it would be lazy.
5949 Bang patterns can be nested of course:
5950 <programlisting>
5951 f2 (!x, y) = [x,y]
5952 </programlisting>
5953 Here, <literal>f2</literal> is strict in <literal>x</literal> but not in
5954 <literal>y</literal>.
5955 A bang only really has an effect if it precedes a variable or wild-card pattern:
5956 <programlisting>
5957 f3 !(x,y) = [x,y]
5958 f4 (x,y)  = [x,y]
5959 </programlisting>
5960 Here, <literal>f3</literal> and <literal>f4</literal> are identical; putting a bang before a pattern that
5961 forces evaluation anyway does nothing.
5962 </para><para>
5963 Bang patterns work in <literal>case</literal> expressions too, of course:
5964 <programlisting>
5965 g5 x = let y = f x in body
5966 g6 x = case f x of { y -&gt; body }
5967 g7 x = case f x of { !y -&gt; body }
5968 </programlisting>
5969 The functions <literal>g5</literal> and <literal>g6</literal> mean exactly the same thing.
5970 But <literal>g7</literal> evaluates <literal>(f x)</literal>, binds <literal>y</literal> to the
5971 result, and then evaluates <literal>body</literal>.
5972 </para><para>
5973 Bang patterns work in <literal>let</literal> and <literal>where</literal>
5974 definitions too. For example:
5975 <programlisting>
5976 let ![x,y] = e in b
5977 </programlisting>
5978 is a strict pattern: operationally, it evaluates <literal>e</literal>, matches
it against the pattern <literal>[x,y]</literal>, and then evaluates <literal>b</literal>.
5980 The "<literal>!</literal>" should not be regarded as part of the pattern; after all,
5981 in a function argument <literal>![x,y]</literal> means the
5982 same as <literal>[x,y]</literal>.  Rather, the "<literal>!</literal>"
5983 is part of the syntax of <literal>let</literal> bindings.
5984 </para>
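<para>
The strictness of <literal>f1</literal> and <literal>f2</literal> can be
observed by applying them to <literal>undefined</literal>.
In the sketch below, <literal>f1'</literal> (the same definition without the
bang) and the helper <literal>diverges</literal> are hypothetical additions;
<literal>diverges</literal> can only detect bottoms that raise an exception,
not non-termination.
</para>
<programlisting>
{-# LANGUAGE BangPatterns #-}
module Main where

import Control.Exception (SomeException, evaluate, try)

f1 :: a -> Bool
f1 !x = True

-- The same definition without the bang, for comparison.
f1' :: a -> Bool
f1' x = True

f2 :: (Int, Int) -> [Int]
f2 (!x, y) = [x,y]

-- Hypothetical helper: evaluates its argument to WHNF and reports
-- whether doing so raised an exception.
diverges :: a -> IO Bool
diverges v = do
  r &lt;- try (evaluate v)
  return (either isException (const False) r)
  where
    isException :: SomeException -> Bool
    isException _ = True

main :: IO ()
main = do
  diverges (f1 undefined)  >>= print               -- prints True
  diverges (f1' undefined) >>= print               -- prints False
  diverges (length (f2 (undefined, 1))) >>= print  -- prints True
  diverges (length (f2 (1, undefined))) >>= print  -- prints False
</programlisting>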
5985 </sect2>
5986
5987
5988 <sect2 id="bang-patterns-sem">
5989 <title>Syntax and semantics
5990 </title>
5991 <para>
5992
5993 We add a single new production to the syntax of patterns:
5994 <programlisting>
5995   pat ::= !pat
5996 </programlisting>
5997 There is one problem with syntactic ambiguity.  Consider:
5998 <programlisting>
5999 f !x = 3
6000 </programlisting>
6001 Is this a definition of the infix function "<literal>(!)</literal>",
or of the function "<literal>f</literal>" with a bang pattern? GHC resolves this
6003 ambiguity in favour of the latter.  If you want to define
6004 <literal>(!)</literal> with bang-patterns enabled, you have to do so using
6005 prefix notation:
6006 <programlisting>
6007 (!) f x = 3
6008 </programlisting>
6009 The semantics of Haskell pattern matching is described in <ulink
6010 url="http://www.haskell.org/onlinereport/exps.html#sect3.17.2">
6011 Section 3.17.2</ulink> of the Haskell Report.  To this description add
6012 one extra item 10, saying:
6013 <itemizedlist><listitem><para>Matching
6014 the pattern <literal>!pat</literal> against a value <literal>v</literal> behaves as follows:
6015 <itemizedlist><listitem><para>if <literal>v</literal> is bottom, the match diverges</para></listitem>
6016                 <listitem><para>otherwise, <literal>pat</literal> is matched against
6017                 <literal>v</literal></para></listitem>
6018 </itemizedlist>
6019 </para></listitem></itemizedlist>
6020 Similarly, in Figure 4 of  <ulink url="http://www.haskell.org/onlinereport/exps.html#sect3.17.3">
6021 Section 3.17.3</ulink>, add a new case (t):
6022 <programlisting>
6023 case v of { !pat -> e; _ -> e' }
6024    = v `seq` case v of { pat -> e; _ -> e' }
6025 </programlisting>
6026 </para><para>
6027 That leaves let expressions, whose translation is given in
6028 <ulink url="http://www.haskell.org/onlinereport/exps.html#sect3.12">Section
6029 3.12</ulink>
6030 of the Haskell Report.
6031 In the translation box, first apply
6032 the following transformation:  for each pattern <literal>pi</literal> that is of
form <literal>!qi = ei</literal>, transform it to <literal>(xi,!qi) = ((),ei)</literal>, and replace <literal>e0</literal>
6034 by <literal>(xi `seq` e0)</literal>.  Then, when none of the left-hand-side patterns
6035 have a bang at the top, apply the rules in the existing box.
6036 </para>
6037 <para>The effect of the let rule is to force complete matching of the pattern
6038 <literal>qi</literal> before evaluation of the body is begun.  The bang is
6039 retained in the translated form in case <literal>qi</literal> is a variable,
6040 thus:
6041 <programlisting>
6042   let !y = f x in b
6043 </programlisting>
6044
6045 </para>
6046 <para>
6047 The let-binding can be recursive.  However, it is much more common for
6048 the let-binding to be non-recursive, in which case the following law holds:
6049 <literal>(let !p = rhs in body)</literal>
6050      is equivalent to
6051 <literal>(case rhs of !p -> body)</literal>
6052 </para>
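<para>
The law can be illustrated with a right-hand side that fails for some
inputs.
This is a sketch only: <literal>check</literal>, <literal>lazyLet</literal>,
<literal>strictLet</literal>, <literal>viaCase</literal> and
<literal>diverges</literal> are hypothetical names, and
<literal>diverges</literal> detects only bottoms that raise an exception.
</para>
<programlisting>
{-# LANGUAGE BangPatterns #-}
module Main where

import Control.Exception (SomeException, evaluate, try)

-- Hypothetical right-hand side: bottom for negative arguments.
check :: Int -> Int
check n | n >= 0    = n
        | otherwise = error "negative"

-- y is never demanded, so the lazy binding never evaluates check x.
lazyLet :: Int -> Int
lazyLet x = let y = check x in 0

-- The bang forces check x before the body is evaluated.
strictLet :: Int -> Int
strictLet x = let !y = check x in 0

-- The equivalent case expression for this non-recursive binding.
viaCase :: Int -> Int
viaCase x = case check x of !y -> 0

-- Hypothetical helper: reports whether evaluation to WHNF raised an exception.
diverges :: a -> IO Bool
diverges v = do
  r &lt;- try (evaluate v)
  return (either isException (const False) r)
  where
    isException :: SomeException -> Bool
    isException _ = True

main :: IO ()
main = do
  diverges (lazyLet (-1))   >>= print  -- prints False
  diverges (strictLet (-1)) >>= print  -- prints True
  diverges (viaCase (-1))   >>= print  -- prints True
</programlisting>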
6053 <para>
6054 A pattern with a bang at the outermost level is not allowed at the top level of
6055 a module.
6056 </para>
6057 </sect2>
6058 </sect1>
6059
6060 <!-- ==================== ASSERTIONS =================  -->
6061
6062 <sect1 id="assertions">
6063 <title>Assertions
6064 <indexterm><primary>Assertions</primary></indexterm>
6065 </title>
6066
6067 <para>
6068 If you want to make use of assertions in your standard Haskell code, you
6069 could define a function like the following:
6070 </para>
6071
6072 <para>
6073
6074 <programlisting>
6075 assert :: Bool -> a -> a
6076 assert False x = error "assertion failed!"
6077 assert _     x = x
6078 </programlisting>
6079
6080 </para>
6081
6082 <para>
6083 which works, but gives you back a less than useful error message --
6084 an assertion failed, but which and where?
6085 </para>
6086
6087 <para>
6088 One way out is to define an extended <function>assert</function> function which also
6089 takes a descriptive string to include in the error message and
6090 perhaps combine this with the use of a pre-processor which inserts
6091 the source location where <function>assert</function> was used.
6092 </para>
6093
6094 <para>
GHC offers a helping hand here, doing all of this for you. For every
6096 use of <function>assert</function> in the user's source:
6097 </para>
6098
6099 <para>
6100
6101 <programlisting>
6102 kelvinToC :: Double -> Double
kelvinToC k = assert (k &gt;= 0.0) (k-273.15)
6104 </programlisting>
6105
6106 </para>
6107
6108 <para>
GHC will rewrite this to also include the source location where the
6110 assertion was made,
6111 </para>
6112
6113 <para>
6114
6115 <programlisting>
6116 assert pred val ==> assertError "Main.hs|15" pred val
6117 </programlisting>
6118
6119 </para>
6120
6121 <para>
6122 The rewrite is only performed by the compiler when it spots
6123 applications of <function>Control.Exception.assert</function>, so you
6124 can still define and use your own versions of
<function>assert</function>, should you so wish. If not, import
<literal>Control.Exception</literal> to make use of
<function>assert</function> in your code.
6128 </para>
6129
6130 <para>
6131 GHC ignores assertions when optimisation is turned on with the
6132       <option>-O</option><indexterm><primary><option>-O</option></primary></indexterm> flag.  That is, expressions of the form
6133 <literal>assert pred e</literal> will be rewritten to
6134 <literal>e</literal>.  You can also disable assertions using the
6135       <option>-fignore-asserts</option>
6136       option<indexterm><primary><option>-fignore-asserts</option></primary>
6137       </indexterm>.</para>
6138
6139 <para>
Assertion failures can be caught; see the documentation for the
6141 <literal>Control.Exception</literal> library for the details.
6142 </para>
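<para>
As a minimal sketch of catching a failed assertion, the program below uses
<function>try</function> and the <literal>AssertionFailed</literal>
exception type from <literal>Control.Exception</literal>.
The function <literal>celsiusFromKelvin</literal> is a hypothetical checked
conversion written for this illustration.
</para>
<programlisting>
module Main where

import Control.Exception (AssertionFailed (..), assert, evaluate, try)

-- Hypothetical checked conversion: temperatures below 0 K are rejected.
celsiusFromKelvin :: Double -> Double
celsiusFromKelvin k = assert (k >= 0.0) (k - 273.15)

main :: IO ()
main = do
  -- evaluate forces the result inside IO so that try can see the failure.
  r &lt;- try (evaluate (celsiusFromKelvin (-1.0)))
  case r of
    Left (AssertionFailed msg) -> putStrLn ("caught: " ++ msg)
    Right c                    -> print c
</programlisting>
<para>
When compiled without optimisation, the first branch is expected to run and
the message should include the source location of the assertion.
With <option>-O</option> or <option>-fignore-asserts</option> the assertion
is dropped, so the second branch prints the unchecked result instead.
</para>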
6143
6144 </sect1>
6145
6146
6147 <!-- =============================== PRAGMAS ===========================  -->
6148
6149   <sect1 id="pragmas">
6150     <title>Pragmas</title>
6151
6152     <indexterm><primary>pragma</primary></indexterm>
6153
6154     <para>GHC supports several pragmas, or instructions to the
6155     compiler placed in the source code.  Pragmas don't normally affect
6156     the meaning of the program, but they might affect the efficiency
6157     of the generated code.</para>
6158
6159     <para>Pragmas all take the form
6160
6161 <literal>{-# <replaceable>word</replaceable> ... #-}</literal>
6162
6163     where <replaceable>word</replaceable> indicates the type of
6164     pragma, and is followed optionally by information specific to that
6165     type of pragma.  Case is ignored in
6166     <replaceable>word</replaceable>.  The various values for
6167     <replaceable>word</replaceable> that GHC understands are described
6168     in the following sections; any pragma encountered with an
6169     unrecognised <replaceable>word</replaceable> is (silently)
6170     ignored. The layout rule applies in pragmas, so the closing <literal>#-}</literal>
6171     should start in a column to the right of the opening <literal>{-#</literal>. </para>
6172
6173     <para>Certain pragmas are <emphasis>file-header pragmas</emphasis>.  A file-header
6174       pragma must precede the <literal>module</literal> keyword in the file.
6175       There can be as many file-header pragmas as you please, and they can be
6176       preceded or followed by comments.</para>
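
    <para>The following sketch shows one plausible layout for the top of a
      source file; the particular extension and option are chosen only for
      illustration:</para>

<programlisting>
-- Comments may precede file-header pragmas...
{-# LANGUAGE ScopedTypeVariables #-}
{-# OPTIONS_GHC -Wall #-}
-- ...and may follow them, but the pragmas must come before "module".
module Main (main) where

main :: IO ()
main = putStrLn "file-header pragmas accepted"
</programlisting>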
6177
6178     <sect2 id="language-pragma">
6179       <title>LANGUAGE pragma</title>
6180
6181       <indexterm><primary>LANGUAGE</primary><secondary>pragma</secondary></indexterm>
6182       <indexterm><primary>pragma</primary><secondary>LANGUAGE</secondary></indexterm>
6183
6184       <para>The <literal>LANGUAGE</literal> pragma allows language extensions to be enabled
6185         in a portable way.
6186         It is the intention that all Haskell compilers support the
6187         <literal>LANGUAGE</literal> pragma with the same syntax, although not
6188         all extensions are supported by all compilers, of
6189         course.  The <literal>LANGUAGE</literal> pragma should be used instead
6190         of <literal>OPTIONS_GHC</literal>, if possible.</para>
6191
6192       <para>For example, to enable the FFI and preprocessing with CPP:</para>
6193
6194 <programlisting>{-# LANGUAGE ForeignFunctionInterface, CPP #-}</programlisting>
6195
6196         <para><literal>LANGUAGE</literal> is a file-header pragma (see <xref linkend="pragmas"/>).</para>
6197
6198       <para>Every language extension can also be turned into a command-line flag
6199         by prefixing it with "<literal>-X</literal>"; for example <option>-XForeignFunctionInterface</option>.
        (Similarly, all "<literal>-X</literal>" flags can be written as <literal>LANGUAGE</literal> pragmas.)
6201       </para>
6202
6203       <para>A list of all supported language extensions can be obtained by invoking
6204         <literal>ghc --supported-languages</literal> (see <xref linkend="modes"/>).</para>
6205
6206       <para>Any extension from the <literal>Extension</literal> type defined in
6207         <ulink
6208           url="../libraries/Cabal/Language-Haskell-Extension.html"><literal>Language.Haskell.Extension</literal></ulink>
6209         may be used.  GHC will report an error if any of the requested extensions are not supported.</para>
6210     </sect2>
6211
6212
6213     <sect2 id="options-pragma">
6214       <title>OPTIONS_GHC pragma</title>
6215       <indexterm><primary>OPTIONS_GHC</primary>
6216       </indexterm>
6217       <indexterm><primary>pragma</primary><secondary>OPTIONS_GHC</secondary>
6218       </indexterm>
6219
6220       <para>The <literal>OPTIONS_GHC</literal> pragma is used to specify
6221       additional options that are given to the compiler when compiling
6222       this source file.  See <xref linkend="source-file-options"/> for
6223       details.</para>
6224
6225       <para>Previous versions of GHC accepted <literal>OPTIONS</literal> rather
6226         than <literal>OPTIONS_GHC</literal>, but that is now deprecated.</para>

        <para><literal>OPTIONS_GHC</literal> is a file-header pragma (see <xref linkend="pragmas"/>).</para>
    </sect2>
6230
6231     <sect2 id="include-pragma">
6232       <title>INCLUDE pragma</title>
6233
6234       <para>The <literal>INCLUDE</literal> pragma is for specifying the names
6235         of C header files that should be <literal>#include</literal>'d into
6236         the C source code generated by the compiler for the current module (if
6237         compiling via C).  For example:</para>
6238
6239 <programlisting>
6240 {-# INCLUDE "foo.h" #-}
6241 {-# INCLUDE &lt;stdio.h&gt; #-}</programlisting>
6242
6243         <para><literal>INCLUDE</literal> is a file-header pragma (see <xref linkend="pragmas"/>).</para>
6244
6245       <para>An <literal>INCLUDE</literal> pragma is  the preferred alternative
6246         to the <option>-#include</option> option (<xref
6247           linkend="options-C-compiler" />), because the
6248         <literal>INCLUDE</literal> pragma is understood by other
6249         compilers.  Yet another alternative is to add the include file to each
6250         <literal>foreign import</literal> declaration in your code, but we
6251         don't recommend using this approach with GHC.</para>
6252     </sect2>
6253
6254     <sect2 id="warning-deprecated-pragma">
6255       <title>WARNING and DEPRECATED pragmas</title>
6256       <indexterm><primary>WARNING</primary></indexterm>
6257       <indexterm><primary>DEPRECATED</primary></indexterm>
6258
6259       <para>The WARNING pragma allows you to attach an arbitrary warning
6260       to a particular function, class, or type.
6261       A DEPRECATED pragma lets you specify that
6262       a particular function, class, or type is deprecated.
6263       There are two ways of using these pragmas.
6264
6265       <itemizedlist>
6266         <listitem>
6267           <para>You can work on an entire module thus:</para>
6268 <programlisting>
6269    module Wibble {-# DEPRECATED "Use Wobble instead" #-} where
6270      ...
6271 </programlisting>
6272       <para>Or:</para>
6273 <programlisting>
6274    module Wibble {-# WARNING "This is an unstable interface." #-} where
6275      ...
6276 </programlisting>
          <para>When you compile any module that imports
6278           <literal>Wibble</literal>, GHC will print the specified
6279           message.</para>
6280         </listitem>
6281
6282         <listitem>
6283           <para>You can attach a warning to a function, class, type, or data constructor, with the
6284           following top-level declarations:</para>
6285 <programlisting>
6286    {-# DEPRECATED f, C, T "Don't use these" #-}
6287    {-# WARNING unsafePerformIO "This is unsafe; I hope you know what you're doing" #-}
6288 </programlisting>
6289           <para>When you compile any module that imports and uses any
6290           of the specified entities, GHC will print the specified
6291           message.</para>
          <para>You can only attach warnings to entities declared at top level in the module
          being compiled, and you can only use unqualified names in the list of
          entities. A capitalised name, such as <literal>T</literal>,
          refers to <emphasis>either</emphasis> the type constructor <literal>T</literal>
          <emphasis>or</emphasis> the data constructor <literal>T</literal>, or both if
          both are in scope.  If both are in scope, there is currently no way to
          specify one without the other (cf. fixities,
          <xref linkend="infix-tycons"/>).</para>
6300         </listitem>
6301       </itemizedlist>
6302       Warnings and deprecations are not reported for
6303       (a) uses within the defining module, and
6304       (b) uses in an export list.
6305       The latter reduces spurious complaints within a library
6306       in which one module gathers together and re-exports
6307       the exports of several others.
6308       </para>
6309       <para>You can suppress the warnings with the flag
6310       <option>-fno-warn-warnings-deprecations</option>.</para>
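
      <para>As a small sketch of the second form (the module and function
      names here are invented for illustration), a library might steer users
      from an old function to its replacement like this:</para>

<programlisting>
module Wibble (oldLookup, newLookup) where

-- Hypothetical API: oldLookup is kept only for backwards compatibility.
{-# DEPRECATED oldLookup "Use newLookup instead" #-}
oldLookup :: Eq k => k -> [(k, v)] -> Maybe v
oldLookup = lookup

newLookup :: Eq k => [(k, v)] -> k -> Maybe v
newLookup table key = lookup key table
</programlisting>

      <para>A module that imports <literal>Wibble</literal> and calls
      <function>oldLookup</function> should then get the message at compile
      time, while the mention of <function>oldLookup</function> in the export
      list above stays silent.</para>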
6311     </sect2>
6312
6313     <sect2 id="inline-noinline-pragma">
6314       <title>INLINE and NOINLINE pragmas</title>
6315
6316       <para>These pragmas control the inlining of function
6317       definitions.</para>
6318
6319       <sect3 id="inline-pragma">
6320         <title>INLINE pragma</title>
6321         <indexterm><primary>INLINE</primary></indexterm>
6322
6323         <para>GHC (with <option>-O</option>, as always) tries to
6324         inline (or &ldquo;unfold&rdquo;) functions/values that are
6325         &ldquo;small enough,&rdquo; thus avoiding the call overhead
6326         and possibly exposing other more-wonderful optimisations.
6327         Normally, if GHC decides a function is &ldquo;too
6328         expensive&rdquo; to inline, it will not do so, nor will it
6329         export that unfolding for other modules to use.</para>
6330
6331         <para>The sledgehammer you can bring to bear is the
6332         <literal>INLINE</literal><indexterm><primary>INLINE
6333         pragma</primary></indexterm> pragma, used thusly:</para>
6334
6335 <programlisting>
6336 key_function :: Int -> String -> (Bool, Double)
6337 {-# INLINE key_function #-}
6338 </programlisting>
6339
6340         <para>The major effect of an <literal>INLINE</literal> pragma
6341         is to declare a function's &ldquo;cost&rdquo; to be very low.
6342         The normal unfolding machinery will then be very keen to
6343         inline it.  However, an <literal>INLINE</literal> pragma for a
6344         function "<literal>f</literal>" has a number of other effects:
6345 <itemizedlist>
6346 <listitem><para>
6347 No functions are inlined into <literal>f</literal>.  Otherwise
6348 GHC might inline a big function into <literal>f</literal>'s right hand side,
6349 making <literal>f</literal> big; and then inline <literal>f</literal> blindly.
6350 </para></listitem>
6351 <listitem><para>
6352 The float-in, float-out, and common-sub-expression transformations are not
6353 applied to the body of <literal>f</literal>.
6354 </para></listitem>
6355 <listitem><para>
6356 An INLINE function is not worker/wrappered by strictness analysis.
6357 It's going to be inlined wholesale instead.
6358 </para></listitem>
6359 </itemizedlist>
6360 All of these effects are aimed at ensuring that what gets inlined is
6361 exactly what you asked for, no more and no less.
6362 </para>
6363 <para>GHC ensures that inlining cannot go on forever: every mutually-recursive
group is cut by one or more <emphasis>loop breakers</emphasis> that are never inlined
6365 (see <ulink url="http://research.microsoft.com/%7Esimonpj/Papers/inlining/index.htm">
6366 Secrets of the GHC inliner, JFP 12(4) July 2002</ulink>).
6367 GHC tries not to select a function with an INLINE pragma as a loop breaker, but
6368 when there is no choice even an INLINE function can be selected, in which case
6369 the INLINE pragma is ignored.
6370 For example, for a self-recursive function, the loop breaker can only be the function
6371 itself, so an INLINE pragma is always ignored.</para>
6372
6373         <para>Syntactically, an <literal>INLINE</literal> pragma for a
6374         function can be put anywhere its type signature could be
6375         put.</para>
6376
6377         <para><literal>INLINE</literal> pragmas are a particularly
6378         good idea for the
6379         <literal>then</literal>/<literal>return</literal> (or
6380         <literal>bind</literal>/<literal>unit</literal>) functions in
6381         a monad.  For example, in GHC's own
6382         <literal>UniqueSupply</literal> monad code, we have:</para>
6383
6384 <programlisting>
6385 {-# INLINE thenUs #-}
6386 {-# INLINE returnUs #-}
6387 </programlisting>
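
        <para>A self-contained sketch in the same spirit follows.  The
        <literal>Counter</literal> type and its operations are hypothetical
        stand-ins, not GHC's actual <literal>UniqueSupply</literal>
        code:</para>

<programlisting>
module Main (main) where

-- Hypothetical state-passing monad: a computation threads an Int counter.
newtype Counter a = Counter (Int -> (a, Int))

runCounter :: Counter a -> Int -> (a, Int)
runCounter (Counter m) = m

-- The pragma may appear anywhere the type signature could.
{-# INLINE thenC #-}
thenC :: Counter a -> (a -> Counter b) -> Counter b
thenC m k = Counter $ \s ->
  case runCounter m s of
    (a, s') -> runCounter (k a) s'

{-# INLINE returnC #-}
returnC :: a -> Counter a
returnC a = Counter $ \s -> (a, s)

-- Hand out the current counter value and increment it.
fresh :: Counter Int
fresh = Counter $ \s -> (s, s + 1)

main :: IO ()
main = print (fst (runCounter (fresh `thenC` \a ->
                               fresh `thenC` \b ->
                               returnC (a, b)) 0))
</programlisting>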
6388
6389         <para>See also the <literal>NOINLINE</literal> pragma (<xref
6390         linkend="noinline-pragma"/>).</para>
6391
6392         <para>Note: the HBC compiler doesn't like <literal>INLINE</literal> pragmas,
6393           so if you want your code to be HBC-compatible you'll have to surround
6394           the pragma with C pre-processor directives
6395           <literal>#ifdef __GLASGOW_HASKELL__</literal>...<literal>#endif</literal>.</para>
6396
6397       </sect3>
6398
6399       <sect3 id="noinline-pragma">
6400         <title>NOINLINE pragma</title>
6401
6402         <indexterm><primary>NOINLINE</primary></indexterm>
6403         <indexterm><primary>NOTINLINE</primary></indexterm>
6404
6405         <para>The <literal>NOINLINE</literal> pragma does exactly what
6406         you'd expect: it stops the named function from being inlined
6407         by the compiler.  You shouldn't ever need to do this, unless
6408         you're very cautious about code size.</para>
6409
6410         <para><literal>NOTINLINE</literal> is a synonym for
6411         <literal>NOINLINE</literal> (<literal>NOINLINE</literal> is
6412         specified by Haskell 98 as the standard way to disable
6413         inlining, so it should be used if you want your code to be
6414         portable).</para>
6415       </sect3>
6416
6417       <sect3 id="phase-control">
6418         <title>Phase control</title>
6419
6420         <para> Sometimes you want to control exactly when in GHC's
6421         pipeline the INLINE pragma is switched on.  Inlining happens
6422         only during runs of the <emphasis>simplifier</emphasis>.  Each
6423         run of the simplifier has a different <emphasis>phase
6424         number</emphasis>; the phase number decreases towards zero.
6425         If you use <option>-dverbose-core2core</option> you'll see the
6426         sequence of phase numbers for successive runs of the
6427         simplifier.  In an INLINE pragma you can optionally specify a
6428         phase number, thus:
6429         <itemizedlist>
6430           <listitem>
6431             <para>"<literal>INLINE[k] f</literal>" means: do not inline
6432             <literal>f</literal>
6433               until phase <literal>k</literal>, but from phase
6434               <literal>k</literal> onwards be very keen to inline it.
6435             </para></listitem>
6436           <listitem>
6437             <para>"<literal>INLINE[~k] f</literal>" means: be very keen to inline
6438             <literal>f</literal>
6439               until phase <literal>k</literal>, but from phase
6440               <literal>k</literal> onwards do not inline it.
6441             </para></listitem>
6442           <listitem>
6443             <para>"<literal>NOINLINE[k] f</literal>" means: do not inline
6444             <literal>f</literal>
6445               until phase <literal>k</literal>, but from phase
6446               <literal>k</literal> onwards be willing to inline it (as if
6447               there was no pragma).
6448             </para></listitem>
6449             <listitem>
6450             <para>"<literal>NOINLINE[~k] f</literal>" means: be willing to inline
6451             <literal>f</literal>
6452               until phase <literal>k</literal>, but from phase
6453               <literal>k</literal> onwards do not inline it.
6454             </para></listitem>
6455         </itemizedlist>
6456 The same information is summarised here:
6457 <programlisting>
6458                            -- Before phase 2     Phase 2 and later
6459   {-# INLINE   [2]  f #-}  --      No                 Yes
6460   {-# INLINE   [~2] f #-}  --      Yes                No
6461   {-# NOINLINE [2]  f #-}  --      No                 Maybe
6462   {-# NOINLINE [~2] f #-}  --      Maybe              No
6463
6464   {-# INLINE   f #-}       --      Yes                Yes
6465   {-# NOINLINE f #-}       --      No                 No
6466 </programlisting>
6467 By "Maybe" we mean that the usual heuristic inlining rules apply (if the
6468 function body is small, or it is applied to interesting-looking arguments etc).
6469 Another way to understand the semantics is this:
6470 <itemizedlist>
6471 <listitem><para>For both INLINE and NOINLINE, the phase number says
6472 when inlining is allowed at all.</para></listitem>
6473 <listitem><para>The INLINE pragma has the additional effect of making the
6474 function body look small, so that when inlining is allowed it is very likely to
6475 happen.
6476 </para></listitem>
6477 </itemizedlist>
6478 </para>
6479 <para>The same phase-numbering control is available for RULES
6480         (<xref linkend="rewrite-rules"/>).</para>
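
<para>One common reason to give a phase number is to keep a function from
being inlined until a rewrite rule that mentions it has had a chance to fire.
The sketch below assumes a made-up function <function>scale</function> and a
made-up rule; it illustrates the syntax rather than a recommended
optimisation:</para>

<programlisting>
module Main (main) where

-- Do not inline scale before phase 1, so the rule below can still see
-- calls to it in the earlier phases; from phase 1 on, inline it eagerly.
{-# INLINE [1] scale #-}
scale :: Int -> Int -> Int
scale k x = k * x

-- Active only before phase 1, i.e. while scale is still not inlined.
{-# RULES
"scale/scale" [~1] forall j k x. scale j (scale k x) = scale (j * k) x
  #-}

main :: IO ()
main = print (scale 2 (scale 3 7))
</programlisting>

<para>The program computes the same result whether or not the rule fires;
only the generated code differs.</para>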
6481       </sect3>
6482     </sect2>
6483
6484     <sect2 id="line-pragma">
6485       <title>LINE pragma</title>
6486
6487       <indexterm><primary>LINE</primary><secondary>pragma</secondary></indexterm>
6488       <indexterm><primary>pragma</primary><secondary>LINE</secondary></indexterm>
      <para>This pragma is similar to C's <literal>&num;line</literal>
      directive, and is mainly for use in automatically generated Haskell
6491       code.  It lets you specify the line number and filename of the
6492       original code; for example</para>
6493
6494 <programlisting>{-# LINE 42 "Foo.vhs" #-}</programlisting>
6495
6496       <para>if you'd generated the current file from something called
6497       <filename>Foo.vhs</filename> and this line corresponds to line
6498       42 in the original.  GHC will adjust its error messages to refer
6499       to the line/file named in the <literal>LINE</literal>
6500       pragma.</para>
6501     </sect2>
6502
6503     <sect2 id="rules">
6504       <title>RULES pragma</title>
6505
6506       <para>The RULES pragma lets you specify rewrite rules.  It is
6507       described in <xref linkend="rewrite-rules"/>.</para>
6508     </sect2>
6509
6510     <sect2 id="specialize-pragma">
6511       <title>SPECIALIZE pragma</title>
6512
6513       <indexterm><primary>SPECIALIZE pragma</primary></indexterm>
6514       <indexterm><primary>pragma, SPECIALIZE</primary></indexterm>
6515       <indexterm><primary>overloading, death to</primary></indexterm>
6516
      <para>(UK spelling also accepted.)  For key overloaded
      functions, you can create extra versions (NB: more code space)
      specialised to particular types.  For example, suppose you have an
      overloaded function:</para>
6521
6522 <programlisting>
6523   hammeredLookup :: Ord key => [(key, value)] -> key -> value
6524 </programlisting>
6525
6526       <para>If it is heavily used on lists with
6527       <literal>Widget</literal> keys, you could specialise it as
6528       follows:</para>
6529
6530 <programlisting>
6531   {-# SPECIALIZE hammeredLookup :: [(Widget, value)] -> Widget -> value #-}
6532 </programlisting>
6533
6534       <para>A <literal>SPECIALIZE</literal> pragma for a function can
6535       be put anywhere its type signature could be put.</para>
6536
6537       <para>A <literal>SPECIALIZE</literal> has the effect of generating
6538       (a) a specialised version of the function and (b) a rewrite rule
6539       (see <xref linkend="rewrite-rules"/>) that rewrites a call to the
6540       un-specialised function into a call to the specialised one.</para>
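
      <para>Put together, a complete module might look like the sketch
      below.  The <literal>Widget</literal> type and the body of
      <function>hammeredLookup</function> are assumptions made up for
      illustration; only the signature and the pragma come from the text
      above:</para>

<programlisting>
module Main (main) where

-- Hypothetical key type.
newtype Widget = Widget Int deriving (Eq, Ord)

-- Overloaded lookup; fails if the key is absent.
hammeredLookup :: Ord key => [(key, value)] -> key -> value
hammeredLookup [] _ = error "hammeredLookup: key not found"
hammeredLookup ((k, v) : rest) key
  | k == key  = v
  | otherwise = hammeredLookup rest key
-- Request an extra copy with the Ord dictionary fixed to Widget.
{-# SPECIALIZE hammeredLookup :: [(Widget, value)] -> Widget -> value #-}

main :: IO ()
main = putStrLn (hammeredLookup [(Widget 1, "one"), (Widget 2, "two")] (Widget 2))
</programlisting>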
6541
6542       <para>The type in a SPECIALIZE pragma can be any type that is less
6543         polymorphic than the type of the original function.  In concrete terms,
6544         if the original function is <literal>f</literal> then the pragma
6545 <programlisting>
6546   {-# SPECIALIZE f :: &lt;type&gt; #-}
6547 </programlisting>
6548       is valid if and only if the definition
6549 <programlisting>
6550   f_spec :: &lt;type&gt;
6551   f_spec = f
6552 </programlisting>
6553       is valid.  Here are some examples (where we only give the type signature
6554       for the original function, not its code):
6555 <programlisting>
6556   f :: Eq a => a -> b -> b
6557   {-# SPECIALISE f :: Int -> b -> b #-}
6558
6559   g :: (Eq a, Ix b) => a -> b -> b
6560   {-# SPECIALISE g :: (Eq a) => a -> Int -> Int #-}
6561
6562   h :: Eq a => a -> a -> a
6563   {-# SPECIALISE h :: (Eq a) => [a] -> [a] -> [a] #-}
6564 </programlisting>
6565 The last of these examples will generate a
6566 RULE with a somewhat-complex left-hand side (try it yourself), so it might not fire very
6567 well.  If you use this kind of specialisation, let us know how well it works.
6568 </para>
6569
<para>A <literal>SPECIALIZE</literal> pragma can optionally be followed by an
<literal>INLINE</literal> or <literal>NOINLINE</literal> pragma, optionally
6572 followed by a phase, as described in <xref linkend="inline-noinline-pragma"/>.
6573 The <literal>INLINE</literal> pragma affects the specialised version of the
6574 function (only), and applies even if the function is recursive.  The motivating
6575 example is this:
6576 <programlisting>
6577 -- A GADT for arrays with type-indexed representation
6578 data Arr e where
6579   ArrInt :: !Int -> ByteArray# -> Arr Int
6580   ArrPair :: !Int -> Arr e1 -> Arr e2 -> Arr (e1, e2)
6581
6582 (!:) :: Arr e -> Int -> e
6583 {-# SPECIALISE INLINE (!:) :: Arr Int -> Int -> Int #-}
6584 {-# SPECIALISE INLINE (!:) :: Arr (a, b) -> Int -> (a, b) #-}
6585 (ArrInt _ ba)     !: (I# i) = I# (indexIntArray# ba i)
6586 (ArrPair _ a1 a2) !: i      = (a1 !: i, a2 !: i)
6587 </programlisting>
6588 Here, <literal>(!:)</literal> is a recursive function that indexes arrays
6589 of type <literal>Arr e</literal>.  Consider a call to  <literal>(!:)</literal>
6590 at type <literal>(Int,Int)</literal>.  The second specialisation will fire, and
6591 the specialised function will be inlined.  It has two calls to
6592 <literal>(!:)</literal>,
6593 both at type <literal>Int</literal>.  Both these calls fire the first
6594 specialisation, whose body is also inlined.  The result is a type-based
6595 unrolling of the indexing function.</para>
6596 <para>Warning: you can make GHC diverge by using <literal>SPECIALISE INLINE</literal>
6597 on an ordinarily-recursive function.</para>
6598
6599       <para>Note: In earlier versions of GHC, it was possible to provide your own
6600       specialised function for a given type:
6601
6602 <programlisting>
6603 {-# SPECIALIZE hammeredLookup :: [(Int, value)] -> Int -> value = intLookup #-}
6604 </programlisting>
6605
6606       This feature has been removed, as it is now subsumed by the
6607       <literal>RULES</literal> pragma (see <xref linkend="rule-spec"/>).</para>
6608
6609     </sect2>
6610
6611 <sect2 id="specialize-instance-pragma">
6612 <title>SPECIALIZE instance pragma
6613 </title>
6614
6615 <para>
6616 <indexterm><primary>SPECIALIZE pragma</primary></indexterm>
6617 <indexterm><primary>overloading, death to</primary></indexterm>
6618 Same idea, except for instance declarations.  For example:
6619
6620 <programlisting>
6621 instance (Eq a) => Eq (Foo a) where {
6622    {-# SPECIALIZE instance Eq (Foo [(Int, Bar)]) #-}
6623    ... usual stuff ...
6624  }
6625 </programlisting>
6626 The pragma must occur inside the <literal>where</literal> part
6627 of the instance declaration.
6628 </para>
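
<para>
A complete, if artificial, version of this example might look as follows;
the definitions of <literal>Foo</literal> and <literal>Bar</literal> are
assumptions made up for the sketch:
</para>

<programlisting>
module Main (main) where

newtype Bar = Bar Int deriving Eq

data Foo a = Foo a a

instance Eq a => Eq (Foo a) where
  -- Ask for a copy of this instance specialised to one element type.
  {-# SPECIALIZE instance Eq (Foo [(Int, Bar)]) #-}
  Foo a b == Foo c d = and [a == c, b == d]

-- The annotation pins the element type to the specialised one.
sample :: Foo [(Int, Bar)]
sample = Foo [(1, Bar 2)] []

main :: IO ()
main = print (sample == sample)
</programlisting>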
6629 <para>
6630 Compatible with HBC, by the way, except perhaps in the placement
6631 of the pragma.
6632 </para>
6633
6634 </sect2>
6635
6636     <sect2 id="unpack-pragma">
6637       <title>UNPACK pragma</title>
6638
6639       <indexterm><primary>UNPACK</primary></indexterm>
6640
      <para>The <literal>UNPACK</literal> pragma indicates to the compiler
6642       that it should unpack the contents of a constructor field into
6643       the constructor itself, removing a level of indirection.  For
6644       example:</para>
6645
6646 <programlisting>
6647 data T = T {-# UNPACK #-} !Float
6648            {-# UNPACK #-} !Float
6649 </programlisting>
6650
6651       <para>will create a constructor <literal>T</literal> containing
6652       two unboxed floats.  This may not always be an optimisation: if
6653       the <function>T</function> constructor is scrutinised and the
6654       floats passed to a non-strict function for example, they will
6655       have to be reboxed (this is done automatically by the
6656       compiler).</para>
6657
6658       <para>Unpacking constructor fields should only be used in
6659       conjunction with <option>-O</option>, in order to expose
6660       unfoldings to the compiler so the reboxing can be removed as
6661       often as possible.  For example:</para>
6662
6663 <programlisting>
6664 f :: T -&#62; Float
6665 f (T f1 f2) = f1 + f2
6666 </programlisting>
6667
6668       <para>The compiler will avoid reboxing <function>f1</function>
6669       and <function>f2</function> by inlining <function>+</function>
6670       on floats, but only when <option>-O</option> is on.</para>
6671
6672       <para>Any single-constructor data is eligible for unpacking; for
6673       example</para>
6674
6675 <programlisting>
6676 data T = T {-# UNPACK #-} !(Int,Int)
6677 </programlisting>
6678
6679       <para>will store the two <literal>Int</literal>s directly in the
6680       <function>T</function> constructor, by flattening the pair.
6681       Multi-level unpacking is also supported:
6682
6683 <programlisting>
6684 data T = T {-# UNPACK #-} !S
6685 data S = S {-# UNPACK #-} !Int {-# UNPACK #-} !Int
6686 </programlisting>
6687
6688       will store two unboxed <literal>Int&num;</literal>s
6689       directly in the <function>T</function> constructor.  The
6690       unpacker can see through newtypes, too.</para>
6691
6692       <para>If a field cannot be unpacked, you will not get a warning,
6693       so it might be an idea to check the generated code with
6694       <option>-ddump-simpl</option>.</para>
6695
6696       <para>See also the <option>-funbox-strict-fields</option> flag,
6697       which essentially has the effect of adding
6698       <literal>{-#&nbsp;UNPACK&nbsp;#-}</literal> to every strict
6699       constructor field.</para>
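
      <para>A small self-contained sketch, using a hypothetical
      <literal>Vec2</literal> type, shows the pragma in a whole
      program:</para>

<programlisting>
module Main (main) where

-- Hypothetical 2-D vector: both strict fields are marked for unpacking.
data Vec2 = Vec2 {-# UNPACK #-} !Float
                 {-# UNPACK #-} !Float

-- Pattern matching reads the fields straight out of the constructor.
norm1 :: Vec2 -> Float
norm1 (Vec2 x y) = abs x + abs y

main :: IO ()
main = print (norm1 (Vec2 3 (-4)))
</programlisting>

      <para>Compiling this with <option>-O</option> and
      <option>-ddump-simpl</option> is one way to check whether the fields
      were in fact unpacked.</para>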
6700     </sect2>
6701
6702     <sect2 id="source-pragma">
6703       <title>SOURCE pragma</title>
6704
6705       <indexterm><primary>SOURCE</primary></indexterm>
6706      <para>The <literal>{-# SOURCE #-}</literal> pragma is used only in <literal>import</literal> declarations,
6707      to break a module loop.  It is described in detail in <xref linkend="mutual-recursion"/>.
6708      </para>
6709 </sect2>
6710
6711 </sect1>
6712
6713 <!--  ======================= REWRITE RULES ======================== -->
6714
6715 <sect1 id="rewrite-rules">
6716 <title>Rewrite rules
6717
6718 <indexterm><primary>RULES pragma</primary></indexterm>
6719 <indexterm><primary>pragma, RULES</primary></indexterm>
6720 <indexterm><primary>rewrite rules</primary></indexterm></title>
6721
6722 <para>
6723 The programmer can specify rewrite rules as part of the source program
6724 (in a pragma).
6725 Here is an example:
6726
6727 <programlisting>
6728   {-# RULES
6729   "map/map"    forall f g xs.  map f (map g xs) = map (f.g) xs
6730     #-}
6731 </programlisting>
6732 </para>
6733 <para>
6734 Use the debug flag <option>-ddump-simpl-stats</option> to see what rules fired.
6735 If you need more information, then <option>-ddump-rule-firings</option> shows you
6736 each individual rule firing in detail.
6737 </para>
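
<para>
To experiment with these flags, a rule over a locally defined function is
often easier to follow than one over a library function.  The sketch below
uses a hypothetical <function>myMap</function>; whether the rule actually
fires depends on the optimiser:
</para>

<programlisting>
module Main (main) where

-- Hypothetical local copy of map; NOINLINE keeps calls to it visible
-- so that the rule below has something to match.
{-# NOINLINE myMap #-}
myMap :: (a -> b) -> [a] -> [b]
myMap _ []       = []
myMap f (x : xs) = f x : myMap f xs

{-# RULES
"myMap/myMap" forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs
  #-}

main :: IO ()
main = print (myMap (+ 1) (myMap (* 2) [1, 2, 3 :: Int]))
</programlisting>

<para>
Compiling with something like
<literal>ghc -O -ddump-rule-firings Main.hs</literal> should report any
firing of <literal>"myMap/myMap"</literal>.
</para>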
6738
6739 <sect2>
6740 <title>Syntax</title>
6741
6742 <para>
6743 From a syntactic point of view:
6744
6745 <itemizedlist>
6746
6747 <listitem>
6748 <para>
6749  There may be zero or more rules in a <literal>RULES</literal> pragma, separated by semicolons (which
6750  may be generated by the layout rule).
6751 </para>
6752 </listitem>
6753
6754 <listitem>
6755 <para>
6756 The layout rule applies in a pragma.
Currently no new indentation level
is set, so if you put several rules in a single RULES pragma and wish to use layout to separate them,
you must lay them out starting in the same column as the enclosing definitions.
6760 <programlisting>
6761   {-# RULES
6762   "map/map"    forall f g xs.  map f (map g xs) = map (f.g) xs
6763   "map/append" forall f xs ys. map f (xs ++ ys) = map f xs ++ map f ys
6764     #-}
6765 </programlisting>
6766 Furthermore, the closing <literal>#-}</literal>
6767 should start in a column to the right of the opening <literal>{-#</literal>.
6768 </para>
6769 </listitem>
6770
6771 <listitem>
6772 <para>
6773  Each rule has a name, enclosed in double quotes.  The name itself has
6774 no significance at all.  It is only used when reporting how many times the rule fired.
6775 </para>
6776 </listitem>
6777
6778 <listitem>
6779 <para>
6780 A rule may optionally have a phase-control number (see <xref linkend="phase-control"/>),
6781 immediately after the name of the rule.  Thus:
6782 <programlisting>
6783   {-# RULES
6784         "map/map" [2]  forall f g xs. map f (map g xs) = map (f.g) xs
6785     #-}
6786 </programlisting>
6787 The "[2]" means that the rule is active in Phase 2 and subsequent phases.  The inverse
6788 notation "[~2]" is also accepted, meaning that the rule is active up to, but not including,
6789 Phase 2.
6790 </para>
6791 </listitem>
6792
6793
6794
6795 <listitem>
6796 <para>
6797  Each variable mentioned in a rule must either be in scope (e.g. <function>map</function>),
6798 or bound by the <literal>forall</literal> (e.g. <function>f</function>, <function>g</function>, <function>xs</function>).  The variables bound by
6799 the <literal>forall</literal> are called the <emphasis>pattern</emphasis> variables.  They are separated
6800 by spaces, just like in a type <literal>forall</literal>.
6801 </para>
6802 </listitem>
6803 <listitem>
6804
6805 <para>
6806  A pattern variable may optionally have a type signature.
6807 If the type of the pattern variable is polymorphic, it <emphasis>must</emphasis> have a type signature.
6808 For example, here is the <literal>foldr/build</literal> rule:
6809
6810 <programlisting>
6811 "fold/build"  forall k z (g::forall b. (a->b->b) -> b -> b) .
6812               foldr k z (build g) = g k z
6813 </programlisting>
6814
6815 Since <function>g</function> has a polymorphic type, it must have a type signature.
6816
6817 </para>
6818 </listitem>
6819 <listitem>
6820
6821 <para>
6822 The left hand side of a rule must consist of a top-level variable applied
6823 to arbitrary expressions.  For example, this is <emphasis>not</emphasis> OK:
6824
6825 <programlisting>
6826 "wrong1"   forall e1 e2.  case True of { True -> e1; False -> e2 } = e1
6827 "wrong2"   forall f.      f True = True
6828 </programlisting>
6829
6830 In <literal>"wrong1"</literal>, the LHS is not an application; in <literal>"wrong2"</literal>, the LHS has a pattern variable
6831 in the head.
6832 </para>
6833 </listitem>
6834 <listitem>
6835
6836 <para>
6837  A rule does not need to be in the same module as (any of) the
6838 variables it mentions, though of course they need to be in scope.
6839 </para>
6840 </listitem>
6841 <listitem>
6842
6843 <para>
6844  All rules are implicitly exported from the module, and are therefore
6845 in force in any module that imports the module that defined the rule, directly
6846 or indirectly.  (That is, if A imports B, which imports C, then C's rules are
6847 in force when compiling A.)  The situation is very similar to that for instance
6848 declarations.
6849 </para>
6850 </listitem>
6851
6852 <listitem>
6853
6854 <para>
6855 Inside a RULE "<literal>forall</literal>" is treated as a keyword, regardless of
6856 any other flag settings.  Furthermore, inside a RULE, the language extension
6857 <option>-XScopedTypeVariables</option> is automatically enabled; see
6858 <xref linkend="scoped-type-variables"/>.
6859 </para>
6860 </listitem>
6861 <listitem>
6862
6863 <para>
Like other pragmas, RULES pragmas are always checked for scope errors, and
6865 are typechecked. Typechecking means that the LHS and RHS of a rule are typechecked,
6866 and must have the same type.  However, rules are only <emphasis>enabled</emphasis>
6867 if the <option>-fenable-rewrite-rules</option> flag is
6868 on (see <xref linkend="rule-semantics"/>).
6869 </para>
6870 </listitem>
6871 </itemizedlist>
6872
6873 </para>
6874
6875 </sect2>
6876
6877 <sect2 id="rule-semantics">
6878 <title>Semantics</title>
6879
6880 <para>
6881 From a semantic point of view:
6882
6883 <itemizedlist>
6884 <listitem>
6885 <para>
6886 Rules are enabled (that is, used during optimisation)
6887 by the <option>-fenable-rewrite-rules</option> flag.
6888 This flag is implied by <option>-O</option>, and may be switched
6889 off (as usual) by <option>-fno-enable-rewrite-rules</option>.
6890 (NB: enabling <option>-fenable-rewrite-rules</option> without <option>-O</option>
6891 may not do what you expect, though, because without <option>-O</option> GHC
6892 ignores all optimisation information in interface files;
6893 see <option>-fignore-interface-pragmas</option>, <xref linkend="options-f"/>.)
6894 Note that <option>-fenable-rewrite-rules</option> is an <emphasis>optimisation</emphasis> flag, and
6895 has no effect on parsing or typechecking.
6896 </para>
6897 </listitem>
6898
6899 <listitem>
6900 <para>
6901  Rules are regarded as left-to-right rewrite rules.
6902 When GHC finds an expression that is a substitution instance of the LHS
6903 of a rule, it replaces the expression by the (appropriately-substituted) RHS.
6904 By "a substitution instance" we mean that the LHS can be made equal to the
6905 expression by substituting for the pattern variables.
6906
6907 </para>
6908 </listitem>
6909 <listitem>
6910
6911 <para>
6912  GHC makes absolutely no attempt to verify that the LHS and RHS
6913 of a rule have the same meaning.  That is undecidable in general, and
6914 infeasible in most interesting cases.  The responsibility is entirely the programmer's!
6915
6916 </para>
6917 </listitem>
6918 <listitem>
6919
6920 <para>
6921  GHC makes no attempt to make sure that the rules are confluent or
6922 terminating.  For example:
6923
6924 <programlisting>
6925   "loop"        forall x y.  f x y = f y x
6926 </programlisting>
6927
6928 This rule will cause the compiler to go into an infinite loop.
6929
6930 </para>
6931 </listitem>
6932 <listitem>
6933
6934 <para>
6935  If more than one rule matches a call, GHC will choose one arbitrarily to apply.
6936
6937 </para>
6938 </listitem>
6939 <listitem>
6940 <para>
 GHC currently uses a very simple, syntactic matching algorithm
for matching a rule LHS with an expression.  It seeks a substitution
which makes the LHS and expression syntactically equal modulo alpha
conversion.  The pattern (rule), but not the expression, is eta-expanded if
necessary.  (Eta-expanding the expression can lead to laziness bugs.)
Matching is not done modulo beta conversion (that would be higher-order matching).
6947 </para>
6948
6949 <para>
6950 Matching is carried out on GHC's intermediate language, which includes
6951 type abstractions and applications.  So a rule only matches if the
6952 types match too.  See <xref linkend="rule-spec"/> below.
6953 </para>
6954 </listitem>
6955 <listitem>
6956
6957 <para>
6958  GHC keeps trying to apply the rules as it optimises the program.
6959 For example, consider:
6960
6961 <programlisting>
6962   let s = map f
6963       t = map g
6964   in
6965   s (t xs)
6966 </programlisting>
6967
6968 The expression <literal>s (t xs)</literal> does not match the rule <literal>"map/map"</literal>, but GHC
6969 will substitute for <varname>s</varname> and <varname>t</varname>, giving an expression which does match.
If <varname>s</varname> or <varname>t</varname> were (a) used more than once, and (b) large or a redex, then it would
not be substituted, and the rule would not fire.
6972
6973 </para>
6974 </listitem>
6975 <listitem>
6976
6977 <para>
Ordinary inlining happens at the same time as rule rewriting, which may lead to unexpected
results.  Consider this (artificial) example:
6980 <programlisting>
6981 f x = x
6982 {-# RULES "f" f True = False #-}
6983
6984 g y = f y
6985
6986 h z = g True
6987 </programlisting>
6988 Since <literal>f</literal>'s right-hand side is small, it is inlined into <literal>g</literal>,
6989 to give
6990 <programlisting>
6991 g y = y
6992 </programlisting>
6993 Now <literal>g</literal> is inlined into <literal>h</literal>, but <literal>f</literal>'s RULE has
6994 no chance to fire.
6995 If instead GHC had first inlined <literal>g</literal> into <literal>h</literal> then there
6996 would have been a better chance that <literal>f</literal>'s RULE might fire.
6997 </para>
6998 <para>
6999 The way to get predictable behaviour is to use a NOINLINE
7000 pragma on <literal>f</literal>, to ensure
7001 that it is not inlined until its RULEs have had a chance to fire.
7002 </para>
7003 </listitem>
7004 </itemizedlist>
7005
7006 </para>
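<para>
The <literal>NOINLINE</literal> advice above can be sketched as a complete
module.  This is the artificial example again; the type signatures, the
<literal>NOINLINE</literal> pragma and <function>main</function> are additions
made for illustration.  Without <option>-O</option> the rule is not enabled
and the program prints <literal>True</literal>.  With <option>-O</option> the
intent is that <literal>g True</literal> is inlined to
<literal>f True</literal> while <function>f</function> itself is left alone,
so the rule gets a chance to rewrite the call to <literal>False</literal>.
</para>

<programlisting>
module Main where

f :: Bool -> Bool
f x = x
{-# NOINLINE f #-}                -- f is not inlined, so calls to it stay visible
{-# RULES "f" f True = False #-}  -- deliberately changes the meaning of (f True)

g :: Bool -> Bool
g y = f y

h :: () -> Bool
h _ = g True

main :: IO ()
main = print (h ())
</programlisting>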
7007
7008 </sect2>
7009
7010 <sect2>
7011 <title>List fusion</title>
7012
7013 <para>
7014 The RULES mechanism is used to implement fusion (deforestation) of common list functions.
7015 If a "good consumer" consumes an intermediate list constructed by a "good producer", the
7016 intermediate list should be eliminated entirely.
7017 </para>
7018
7019 <para>
7020 The following are good producers:
7021
7022 <itemizedlist>
7023 <listitem>
7024
7025 <para>
7026  List comprehensions
7027 </para>
7028 </listitem>
7029 <listitem>
7030
7031 <para>
7032  Enumerations of <literal>Int</literal> and <literal>Char</literal> (e.g. <literal>['a'..'z']</literal>).
7033 </para>
7034 </listitem>
7035 <listitem>
7036
7037 <para>
7038  Explicit lists (e.g. <literal>[True, False]</literal>)
7039 </para>
7040 </listitem>
7041 <listitem>
7042
7043 <para>
 The cons constructor (e.g. <literal>3:4:[]</literal>)
7045 </para>
7046 </listitem>
7047 <listitem>
7048
7049 <para>
7050  <function>++</function>
7051 </para>
7052 </listitem>
7053
7054 <listitem>
7055 <para>
7056  <function>map</function>
7057 </para>
7058 </listitem>
7059
7060 <listitem>
7061 <para>
7062 <function>take</function>, <function>filter</function>
7063 </para>
7064 </listitem>
7065 <listitem>
7066
7067 <para>
7068  <function>iterate</function>, <function>repeat</function>
7069 </para>
7070 </listitem>
7071 <listitem>
7072
7073 <para>
7074  <function>zip</function>, <function>zipWith</function>
7075 </para>
7076 </listitem>
7077
7078 </itemizedlist>
7079
7080 </para>
7081
7082 <para>
7083 The following are good consumers:
7084
7085 <itemizedlist>
7086 <listitem>
7087
7088 <para>
7089  List comprehensions
7090 </para>
7091 </listitem>
7092 <listitem>
7093
7094 <para>
7095  <function>array</function> (on its second argument)
7096 </para>
7097 </listitem>
7098 <listitem>
7099
7100 <para>
7101  <function>++</function> (on its first argument)
7102 </para>
7103 </listitem>
7104
7105 <listitem>
7106 <para>
7107  <function>foldr</function>
7108 </para>
7109 </listitem>
7110
7111 <listitem>
7112 <para>
7113  <function>map</function>
7114 </para>
7115 </listitem>
7116 <listitem>
7117
7118 <para>
7119 <function>take</function>, <function>filter</function>
7120 </para>
7121 </listitem>
7122 <listitem>
7123
7124 <para>
7125  <function>concat</function>
7126 </para>
7127 </listitem>
7128 <listitem>
7129
7130 <para>
7131  <function>unzip</function>, <function>unzip2</function>, <function>unzip3</function>, <function>unzip4</function>
7132 </para>
7133 </listitem>
7134 <listitem>
7135
7136 <para>
7137  <function>zip</function>, <function>zipWith</function> (but on one argument only; if both are good producers, <function>zip</function>
7138 will fuse with one but not the other)
7139 </para>
7140 </listitem>
7141 <listitem>
7142
7143 <para>
7144  <function>partition</function>
7145 </para>
7146 </listitem>
7147 <listitem>
7148
7149 <para>
7150  <function>head</function>
7151 </para>
7152 </listitem>
7153 <listitem>
7154
7155 <para>
7156  <function>and</function>, <function>or</function>, <function>any</function>, <function>all</function>
7157 </para>
7158 </listitem>
7159 <listitem>
7160
7161 <para>
7162  <function>sequence&lowbar;</function>
7163 </para>
7164 </listitem>
7165 <listitem>
7166
7167 <para>
7168  <function>msum</function>
7169 </para>
7170 </listitem>
7171 <listitem>
7172
7173 <para>
7174  <function>sortBy</function>
7175 </para>
7176 </listitem>
7177
7178 </itemizedlist>
7179
7180 </para>
7181
7182  <para>
7183 So, for example, the following should generate no intermediate lists:
7184
7185 <programlisting>
7186 array (1,10) [(i,i*i) | i &#60;- map (+ 1) [0..9]]
7187 </programlisting>
7188
7189 </para>
7190
7191 <para>
7192 This list could readily be extended; if there are Prelude functions that you use
7193 a lot which are not included, please tell us.
7194 </para>
7195
7196 <para>
7197 If you want to write your own good consumers or producers, look at the
7198 Prelude definitions of the above functions to see how to do so.
7199 </para>
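<para>
As a minimal sketch of what a hand-written good producer might look like, the
following module defines a list in terms of <function>build</function>
(imported here from <literal>GHC.Exts</literal>) and consumes it with
<function>foldr</function>.  The names <function>countdown</function> and
<function>total</function> are hypothetical, and the
<literal>INLINE</literal> pragma is an assumption about how to expose the
<function>build</function> call to callers; the library definitions remain
the authoritative model.
</para>

<programlisting>
module Main where

import GHC.Exts (build)

-- Hypothetical producer: the list is expressed through the abstract
-- "cons" and "nil" arguments supplied by build.
countdown :: Int -> [Int]
countdown n = build (\cons nil ->
  let go k | k &lt;= 0    = nil
           | otherwise = k `cons` go (k - 1)
  in go n)
{-# INLINE countdown #-}

-- foldr is listed above as a good consumer.
total :: [Int] -> Int
total xs = foldr (+) 0 xs

main :: IO ()
main = print (total (countdown 5))  -- prints 15
</programlisting>

<para>
Whether the intermediate list is actually eliminated depends on the
optimisation level and on the rules firing; the result is the same either way.
</para>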
7200
7201 </sect2>
7202
7203 <sect2 id="rule-spec">
7204 <title>Specialisation
7205 </title>
7206
7207 <para>
7208 Rewrite rules can be used to get the same effect as a feature
7209 present in earlier versions of GHC.
7210 For example, suppose that:
7211
7212 <programlisting>
7213 genericLookup :: Ord a => Table a b   -> a   -> b
7214 intLookup     ::          Table Int b -> Int -> b
7215 </programlisting>
7216
7217 where <function>intLookup</function> is an implementation of
7218 <function>genericLookup</function> that works very fast for
7219 keys of type <literal>Int</literal>.  You might wish
7220 to tell GHC to use <function>intLookup</function> instead of
7221 <function>genericLookup</function> whenever the latter was called with
7222 type <literal>Table Int b -&gt; Int -&gt; b</literal>.
7223 It used to be possible to write
7224
7225 <programlisting>
7226 {-# SPECIALIZE genericLookup :: Table Int b -> Int -> b = intLookup #-}
7227 </programlisting>
7228
7229 This feature is no longer in GHC, but rewrite rules let you do the same thing:
7230
7231 <programlisting>
7232 {-# RULES "genericLookup/Int" genericLookup = intLookup #-}
7233 </programlisting>
7234
7235 This slightly odd-looking rule instructs GHC to replace
7236 <function>genericLookup</function> by <function>intLookup</function>
7237 <emphasis>whenever the types match</emphasis>.
7238 What is more, this rule does not need to be in the same
7239 file as <function>genericLookup</function>, unlike the
7240 <literal>SPECIALIZE</literal> pragmas which currently do (so that they
7241 have an original definition available to specialise).
7242 </para>
7243
7244 <para>It is <emphasis>Your Responsibility</emphasis> to make sure that
7245 <function>intLookup</function> really behaves as a specialised version
7246 of <function>genericLookup</function>!!!</para>
7247
7248 <para>An example in which using <literal>RULES</literal> for
7249 specialisation will Win Big:
7250
7251 <programlisting>
7252 toDouble :: Real a => a -> Double
7253 toDouble = fromRational . toRational
7254
7255 {-# RULES "toDouble/Int" toDouble = i2d #-}
7256 i2d (I# i) = D# (int2Double# i) -- uses Glasgow prim-op directly
7257 </programlisting>
7258
7259 The <function>i2d</function> function is virtually one machine
7260 instruction; the default conversion&mdash;via an intermediate
7261 <literal>Rational</literal>&mdash;is obscenely expensive by
7262 comparison.
7263 </para>
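<para>
For experimentation, the <function>toDouble</function> example can be fleshed
out into a self-contained module along the following lines.  The imports, the
type signature for <function>i2d</function>, the
<literal>NOINLINE</literal> pragma and <function>main</function> are
additions for this sketch and are not part of the original fragment.
</para>

<programlisting>
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Int (I#), Double (D#), int2Double#)

toDouble :: Real a => a -> Double
toDouble = fromRational . toRational
{-# NOINLINE toDouble #-}  -- assumption: keeps calls visible to the rule

-- Specialised conversion that uses the Glasgow prim-op directly.
i2d :: Int -> Double
i2d (I# i) = D# (int2Double# i)

-- Matches only where toDouble is used at type Int -> Double.
{-# RULES "toDouble/Int" toDouble = i2d #-}

main :: IO ()
main = print (toDouble (3 :: Int))  -- prints 3.0 with or without the rule
</programlisting>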
7264
7265 </sect2>
7266
7267 <sect2>
7268 <title>Controlling what's going on</title>
7269
7270 <para>
7271
7272 <itemizedlist>
7273 <listitem>
7274
7275 <para>
7276  Use <option>-ddump-rules</option> to see what transformation rules GHC is using.
7277 </para>
7278 </listitem>
7279 <listitem>
7280
7281 <para>
7282  Use <option>-ddump-simpl-stats</option> to see what rules are being fired.
7283 If you add <option>-dppr-debug</option> you get a more detailed listing.
7284 </para>
7285 </listitem>
7286 <listitem>
7287
7288 <para>
7289  The definition of (say) <function>build</function> in <filename>GHC/Base.lhs</filename> looks like this:
7290
7291 <programlisting>
7292         build   :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a]
7293         {-# INLINE build #-}
7294         build g = g (:) []
7295 </programlisting>
7296
Notice the <literal>INLINE</literal>!  That prevents <literal>(:)</literal> from being inlined when compiling
<literal>GHC.Base</literal>, so that an importing module will &ldquo;see&rdquo; the <literal>(:)</literal>, and can
match it on the LHS of a rule.  <literal>INLINE</literal> prevents any inlining happening
in the RHS of the <literal>INLINE</literal> thing.  I regret the delicacy of this.
7301
7302 </para>
7303 </listitem>
7304 <listitem>
7305
7306 <para>
 In <filename>libraries/base/GHC/Base.lhs</filename> look at the rules for <function>map</function> to
see how to write rules that will do fusion and yet give an efficient
program even if fusion doesn't happen.  There are more rules in <filename>GHC/List.lhs</filename>.
7310 </para>
7311 </listitem>
7312
7313 </itemizedlist>
7314
7315 </para>
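<para>
One way to try the dump flags above on a small test case is to set them in the
source file itself.  The sketch below assumes that passing them through an
<literal>OPTIONS_GHC</literal> pragma is acceptable for your build; they can
equally be given on the command line.  The function
<function>twice</function> and its rule are hypothetical.
</para>

<programlisting>
{-# OPTIONS_GHC -O -ddump-rules -ddump-simpl-stats #-}
module Main where

-- Hypothetical function with a rule attached, so that the dumps
-- have something to report.
twice :: Int -> Int
twice x = x + x
{-# NOINLINE twice #-}

{-# RULES "twice/twice" forall x. twice (twice x) = 4 * x #-}

main :: IO ()
main = print (twice (twice 3))  -- prints 12 whether or not the rule fires
</programlisting>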
7316
7317 </sect2>
7318
7319 <sect2 id="core-pragma">
7320   <title>CORE pragma</title>
7321
7322   <indexterm><primary>CORE pragma</primary></indexterm>
7323   <indexterm><primary>pragma, CORE</primary></indexterm>
7324   <indexterm><primary>core, annotation</primary></indexterm>
7325
7326 <para>
7327   The external core format supports <quote>Note</quote> annotations;
7328   the <literal>CORE</literal> pragma gives a way to specify what these
7329   should be in your Haskell source code.  Syntactically, core
7330   annotations are attached to expressions and take a Haskell string
7331   literal as an argument.  The following function definition shows an
7332   example:
7333
7334 <programlisting>
7335 f x = ({-# CORE "foo" #-} show) ({-# CORE "bar" #-} x)
7336 </programlisting>
7337
7338   Semantically, this is equivalent to:
7339
7340 <programlisting>
7341 g x = show x
7342 </programlisting>
7343 </para>
7344
7345 <para>
7346   However, when external core is generated (via
7347   <option>-fext-core</option>), there will be Notes attached to the
7348   expressions <function>show</function> and <varname>x</varname>.
7349   The core function declaration for <function>f</function> is:
7350 </para>
7351
<programlisting>
  f :: %forall a . GHCziShow.ZCTShow a ->
                   a -> GHCziBase.ZMZN GHCziBase.Char =
    \ @ a (zddShow::GHCziShow.ZCTShow a) (eta::a) ->
        (%note "foo"
         %case zddShow %of (tpl::GHCziShow.ZCTShow a)
           {GHCziShow.ZCDShow
            (tpl1::GHCziBase.Int ->
                   a ->
                   GHCziBase.ZMZN GHCziBase.Char -> GHCziBase.ZMZN GHCziBase.Char)
            (tpl2::a -> GHCziBase.ZMZN GHCziBase.Char)
            (tpl3::GHCziBase.ZMZN a ->
                   GHCziBase.ZMZN GHCziBase.Char -> GHCziBase.ZMZN GHCziBase.Char) ->
              tpl2})
        (%note "bar"
         eta);
</programlisting>
7371
7372 <para>
7373   Here, we can see that the function <function>show</function> (which
7374   has been expanded out to a case expression over the Show dictionary)
7375   has a <literal>%note</literal> attached to it, as does the
7376   expression <varname>eta</varname> (which used to be called
7377   <varname>x</varname>).
7378 </para>
7379
7380 </sect2>
7381
7382 </sect1>
7383
7384 <sect1 id="special-ids">
7385 <title>Special built-in functions</title>
7386 <para>GHC has a few built-in functions with special behaviour.  These
7387 are now described in the module <ulink
7388 url="../libraries/base/GHC-Prim.html"><literal>GHC.Prim</literal></ulink>
7389 in the library documentation.</para>
7390 </sect1>
7391
7392
7393 <sect1 id="generic-classes">
7394 <title>Generic classes</title>
7395
7396 <para>
7397 The ideas behind this extension are described in detail in "Derivable type classes",
7398 Ralf Hinze and Simon Peyton Jones, Haskell Workshop, Montreal Sept 2000, pp94-105.
7399 An example will give the idea:
7400 </para>
7401
7402 <programlisting>
7403   import Generics
7404
7405   class Bin a where
7406     toBin   :: a -> [Int]
7407     fromBin :: [Int] -> (a, [Int])
7408
7409     toBin {| Unit |}    Unit      = []
7410     toBin {| a :+: b |} (Inl x)   = 0 : toBin x
7411     toBin {| a :+: b |} (Inr y)   = 1 : toBin y
7412     toBin {| a :*: b |} (x :*: y) = toBin x ++ toBin y
7413
7414     fromBin {| Unit |}    bs      = (Unit, bs)
7415     fromBin {| a :+: b |} (0:bs)  = (Inl x, bs')    where (x,bs') = fromBin bs
7416     fromBin {| a :+: b |} (1:bs)  = (Inr y, bs')    where (y,bs') = fromBin bs
7417     fromBin {| a :*: b |} bs      = (x :*: y, bs'') where (x,bs' ) = fromBin bs
7418                                                           (y,bs'') = fromBin bs'
7419 </programlisting>
7420 <para>
7421 This class declaration explains how <literal>toBin</literal> and <literal>fromBin</literal>
7422 work for arbitrary data types.  They do so by giving cases for unit, product, and sum,
7423 which are defined thus in the library module <literal>Generics</literal>:
7424 </para>
7425 <programlisting>
7426   data Unit    = Unit
7427   data a :+: b = Inl a | Inr b
7428   data a :*: b = a :*: b
7429 </programlisting>
7430 <para>
7431 Now you can make a data type into an instance of Bin like this:
7432 <programlisting>
7433   instance (Bin a, Bin b) => Bin (a,b)
7434   instance Bin a => Bin [a]
7435 </programlisting>
7436 That is, just leave off the "where" clause.  Of course, you can put in the
7437 where clause and over-ride whichever methods you please.
7438 </para>
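<para>
To get a feel for what the generic equations mean, it can help to write the
corresponding instances by hand in ordinary Haskell.  The sketch below does
this for the <literal>Bin</literal> class, with local copies of the
representation types and a hand-written conversion for
<literal>Bool</literal>.  It is only an approximation of the idea, not the
code GHC generates; the names <literal>BoolRep</literal>,
<function>fromBool</function> and <function>toBool</function> are
hypothetical, and the catch-all <function>fromBin</function> equation is an
addition.
</para>

<programlisting>
{-# LANGUAGE TypeOperators #-}
module Main where

-- Local copies of the representation types.
data Unit    = Unit
data a :+: b = Inl a | Inr b
data a :*: b = a :*: b

class Bin a where
  toBin   :: a -> [Int]
  fromBin :: [Int] -> (a, [Int])

-- One ordinary instance per representation type, using the equations
-- from the generic class declaration.
instance Bin Unit where
  toBin Unit = []
  fromBin bs = (Unit, bs)

instance (Bin a, Bin b) => Bin (a :+: b) where
  toBin (Inl x) = 0 : toBin x
  toBin (Inr y) = 1 : toBin y
  fromBin (0:bs) = (Inl x, bs') where (x, bs') = fromBin bs
  fromBin (1:bs) = (Inr y, bs') where (y, bs') = fromBin bs
  fromBin _      = error "fromBin: unexpected input"

instance (Bin a, Bin b) => Bin (a :*: b) where
  toBin (x :*: y) = toBin x ++ toBin y
  fromBin bs = (x :*: y, bs'')
    where (x, bs')  = fromBin bs
          (y, bs'') = fromBin bs'

-- Bool viewed as a sum of two units (a hand-written conversion).
type BoolRep = Unit :+: Unit

fromBool :: Bool -> BoolRep
fromBool False = Inl Unit
fromBool True  = Inr Unit

toBool :: BoolRep -> Bool
toBool (Inl Unit) = False
toBool (Inr Unit) = True

instance Bin Bool where
  toBin = toBin . fromBool
  fromBin bs = (toBool r, bs') where (r, bs') = fromBin bs

main :: IO ()
main = do
  print (toBin True)                             -- prints [1]
  print (fst (fromBin [0, 7] :: (Bool, [Int])))  -- prints False
</programlisting>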
7439
7440     <sect2>
7441       <title> Using generics </title>
7442       <para>To use generics you need to</para>
7443       <itemizedlist>
7444         <listitem>
          <para>Use the flags <option>-fglasgow-exts</option> (to enable the extra syntax),
                <option>-XGenerics</option> (to generate extra per-data-type code),
                and <option>-package lang</option> (to make the <literal>Generics</literal> library
                available).</para>
7449         </listitem>
7450         <listitem>
7451           <para>Import the module <literal>Generics</literal> from the
7452           <literal>lang</literal> package.  This import brings into
7453           scope the data types <literal>Unit</literal>,
7454           <literal>:*:</literal>, and <literal>:+:</literal>.  (You
7455           don't need this import if you don't mention these types
7456           explicitly; for example, if you are simply giving instance
7457           declarations.)</para>
7458         </listitem>
7459       </itemizedlist>
7460     </sect2>
7461
7462 <sect2> <title> Changes wrt the paper </title>
7463 <para>
7464 Note that the type constructors <literal>:+:</literal> and <literal>:*:</literal>
7465 can be written infix (indeed, you can now use
7466 any operator starting in a colon as an infix type constructor).  Also note that
7467 the type constructors are not exactly as in the paper (Unit instead of 1, etc).
7468 Finally, note that the syntax of the type patterns in the class declaration
7469 uses "<literal>{|</literal>" and "<literal>|}</literal>" brackets; curly braces
alone would be ambiguous when they appear on right hand sides (an extension we
7471 anticipate wanting).
7472 </para>
7473 </sect2>
7474
7475 <sect2> <title>Terminology and restrictions</title>
7476 <para>
7477 Terminology.  A "generic default method" in a class declaration
7478 is one that is defined using type patterns as above.
7479 A "polymorphic default method" is a default method defined as in Haskell 98.
7480 A "generic class declaration" is a class declaration with at least one
7481 generic default method.
7482 </para>
7483
7484 <para>
7485 Restrictions:
7486 <itemizedlist>
7487 <listitem>
7488 <para>
7489 Alas, we do not yet implement the stuff about constructor names and
7490 field labels.
7491 </para>
7492 </listitem>
7493
7494 <listitem>
7495 <para>
7496 A generic class can have only one parameter; you can't have a generic
7497 multi-parameter class.
7498 </para>
7499 </listitem>
7500
7501 <listitem>
7502 <para>
7503 A default method must be defined entirely using type patterns, or entirely
7504 without.  So this is illegal:
7505 <programlisting>
7506   class Foo a where
7507     op :: a -> (a, Bool)
7508     op {| Unit |} Unit = (Unit, True)
7509     op x               = (x,    False)
7510 </programlisting>
7511 However it is perfectly OK for some methods of a generic class to have
7512 generic default methods and others to have polymorphic default methods.
7513 </para>
7514 </listitem>
7515
7516 <listitem>
7517 <para>
The type variable(s) in the type pattern for a generic method declaration
scope over the right hand side.  So this is legal (note the use of the type variable <literal>p</literal> in a type signature on the right hand side):
7520 <programlisting>
7521   class Foo a where
7522     op :: a -> Bool
7523     op {| p :*: q |} (x :*: y) = op (x :: p)
7524     ...
7525 </programlisting>
7526 </para>
7527 </listitem>
7528
7529 <listitem>
7530 <para>
7531 The type patterns in a generic default method must take one of the forms:
7532 <programlisting>
7533        a :+: b
7534        a :*: b
7535        Unit
7536 </programlisting>
7537 where "a" and "b" are type variables.  Furthermore, all the type patterns for
7538 a single type constructor (<literal>:*:</literal>, say) must be identical; they
7539 must use the same type variables.  So this is illegal:
7540 <programlisting>
7541   class Foo a where
7542     op :: a -> Bool
7543     op {| a :+: b |} (Inl x) = True
7544     op {| p :+: q |} (Inr y) = False
7545 </programlisting>
7546 The type patterns must be identical, even in equations for different methods of the class.
7547 So this too is illegal:
7548 <programlisting>
7549   class Foo a where
7550     op1 :: a -> Bool
7551     op1 {| a :*: b |} (x :*: y) = True
7552
7553     op2 :: a -> Bool
7554     op2 {| p :*: q |} (x :*: y) = False
7555 </programlisting>
7556 (The reason for this restriction is that we gather all the equations for a particular type constructor
7557 into a single generic instance declaration.)
7558 </para>
7559 </listitem>
7560
7561 <listitem>
7562 <para>
7563 A generic method declaration must give a case for each of the three type constructors.
7564 </para>
7565 </listitem>
7566
7567 <listitem>
7568 <para>
7569 The type for a generic method can be built only from:
7570   <itemizedlist>
7571   <listitem> <para> Function arrows </para> </listitem>
7572   <listitem> <para> Type variables </para> </listitem>
7573   <listitem> <para> Tuples </para> </listitem>
7574   <listitem> <para> Arbitrary types not involving type variables </para> </listitem>
7575   </itemizedlist>
7576 Here are some example type signatures for generic methods:
7577 <programlisting>
7578     op1 :: a -> Bool
7579     op2 :: Bool -> (a,Bool)
7580     op3 :: [Int] -> a -> a
7581     op4 :: [a] -> Bool
7582 </programlisting>
7583 Here, op1, op2, op3 are OK, but op4 is rejected, because it has a type variable
7584 inside a list.
7585 </para>
7586 <para>
7587 This restriction is an implementation restriction: we just haven't got around to
7588 implementing the necessary bidirectional maps over arbitrary type constructors.
7589 It would be relatively easy to add specific type constructors, such as Maybe and list,
7590 to the ones that are allowed.</para>
7591 </listitem>
7592
7593 <listitem>
7594 <para>
7595 In an instance declaration for a generic class, the idea is that the compiler
7596 will fill in the methods for you, based on the generic templates.  However it can only
7597 do so if
7598   <itemizedlist>
7599   <listitem>
7600   <para>
7601   The instance type is simple (a type constructor applied to type variables, as in Haskell 98).
7602   </para>
7603   </listitem>
7604   <listitem>
7605   <para>
7606   No constructor of the instance type has unboxed fields.
7607   </para>
7608   </listitem>
7609   </itemizedlist>
7610 (Of course, these things can only arise if you are already using GHC extensions.)
However, you can still give instance declarations for types which break these rules,
7612 provided you give explicit code to override any generic default methods.
7613 </para>
7614 </listitem>
7615
7616 </itemizedlist>
7617 </para>
7618
7619 <para>
7620 The option <option>-ddump-deriv</option> dumps incomprehensible stuff giving details of
7621 what the compiler does with generic declarations.
7622 </para>
7623
7624 </sect2>
7625
7626 <sect2> <title> Another example </title>
7627 <para>
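Just to finish with, here's another example I rather like
(<literal>bot</literal> is not defined in the listing; it is presumably a bottom
value, such as <literal>undefined</literal>, that is used only for its type):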
7629 <programlisting>
7630   class Tag a where
7631     nCons :: a -> Int
7632     nCons {| Unit |}    _ = 1
7633     nCons {| a :*: b |} _ = 1
7634     nCons {| a :+: b |} _ = nCons (bot::a) + nCons (bot::b)
7635
7636     tag :: a -> Int
7637     tag {| Unit |}    _       = 1
7638     tag {| a :*: b |} _       = 1
7639     tag {| a :+: b |} (Inl x) = tag x
7640     tag {| a :+: b |} (Inr y) = nCons (bot::a) + tag y
7641 </programlisting>
7642 </para>
7643 </sect2>
7644 </sect1>
7645
7646 <sect1 id="monomorphism">
7647 <title>Control over monomorphism</title>
7648
7649 <para>GHC supports two flags that control the way in which generalisation is
7650 carried out at let and where bindings.
7651 </para>
7652
7653 <sect2>
7654 <title>Switching off the dreaded Monomorphism Restriction</title>
7655           <indexterm><primary><option>-XNoMonomorphismRestriction</option></primary></indexterm>
7656
7657 <para>Haskell's monomorphism restriction (see
7658 <ulink url="http://www.haskell.org/onlinereport/decls.html#sect4.5.5">Section
7659 4.5.5</ulink>
7660 of the Haskell Report)
7661 can be completely switched off by
7662 <option>-XNoMonomorphismRestriction</option>.
7663 </para>
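<para>
A small sketch of the effect: in the module below,
<function>plus</function> is a simple variable binding with no arguments and
no type signature, so the monomorphism restriction would normally force it to
a single type, and using it at both <literal>Int</literal> and
<literal>Double</literal> would be rejected.  With the restriction switched
off it is generalised as usual.  The name <function>plus</function> is
hypothetical.
</para>

<programlisting>
{-# LANGUAGE NoMonomorphismRestriction #-}
module Main where

-- No arguments and no signature: subject to the monomorphism
-- restriction unless it is switched off.
plus = (+)

main :: IO ()
main = do
  print (plus (1 :: Int) 2)       -- prints 3
  print (plus (1.5 :: Double) 2)  -- prints 3.5
</programlisting>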
7664 </sect2>
7665
7666 <sect2>
7667 <title>Monomorphic pattern bindings</title>
7668           <indexterm><primary><option>-XNoMonoPatBinds</option></primary></indexterm>
7669           <indexterm><primary><option>-XMonoPatBinds</option></primary></indexterm>
7670
7671           <para> As an experimental change, we are exploring the possibility of
7672           making pattern bindings monomorphic; that is, not generalised at all.
7673             A pattern binding is a binding whose LHS has no function arguments,
7674             and is not a simple variable.  For example:
7675 <programlisting>
7676   f x = x                    -- Not a pattern binding
7677   f = \x -> x                -- Not a pattern binding
7678   f :: Int -> Int = \x -> x  -- Not a pattern binding
7679
7680   (g,h) = e                  -- A pattern binding
7681   (f) = e                    -- A pattern binding
7682   [x] = e                    -- A pattern binding
7683 </programlisting>
7684 Experimentally, GHC now makes pattern bindings monomorphic <emphasis>by
7685 default</emphasis>.  Use <option>-XNoMonoPatBinds</option> to recover the
7686 standard behaviour.
7687 </para>
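<para>
The following sketch shows the kind of program that is affected.  The names
<varname>ident</varname> and <varname>unit</varname> are hypothetical.  Under
the standard behaviour, <varname>ident</varname> is generalised and may be
used at two different types; with monomorphic pattern bindings it would be
expected to be rejected, because <varname>ident</varname> gets a single
monomorphic type.
</para>

<programlisting>
module Main where

-- A pattern binding: the LHS is a tuple pattern, not a simple variable.
(ident, unit) = (\x -> x, ())

main :: IO ()
main = do
  print (ident True)  -- ident used at Bool ...
  print (ident 'c')   -- ... and at Char, which needs ident to be generalised
  print unit
</programlisting>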
7688 </sect2>
7689 </sect1>
7690
7691
7692
7693 <!-- Emacs stuff:
7694      ;;; Local Variables: ***
7695      ;;; mode: xml ***
7696      ;;; sgml-parent-document: ("users_guide.xml" "book" "chapter" "sect1") ***
7697      ;;; ispell-local-dictionary: "british" ***
7698      ;;; End: ***
7699  -->
7700