docs/users_guide/glasgow_exts.xml

   1 <?xml version="1.0" encoding="iso-8859-1"?>
   2 <para>
   3 <indexterm><primary>language, GHC</primary></indexterm>
   4 <indexterm><primary>extensions, GHC</primary></indexterm>
   5 As with all known Haskell systems, GHC implements some extensions to
   6 the language.  They are all enabled by options; by default GHC
   7 understands only plain Haskell 98.
   8 </para>
   9
  10 <para>
  11 Some of the Glasgow extensions serve to give you access to the
  12 underlying facilities with which we implement Haskell.  Thus, you can
  13 get at the Raw Iron, if you are willing to write some non-portable
  14 code at a more primitive level.  You need not be &ldquo;stuck&rdquo;
  15 on performance because of the implementation costs of Haskell's
  16 &ldquo;high-level&rdquo; features&mdash;you can always code
  17 &ldquo;under&rdquo; them.  In an extreme case, you can write all your
  18 time-critical code in C, and then just glue it together with Haskell!
  19 </para>
  20
  21 <para>
  22 Before you get too carried away working at the lowest level (e.g.,
  23 sloshing <literal>MutableByteArray&num;</literal>s around your
  24 program), you may wish to check if there are libraries that provide a
  25 &ldquo;Haskellised veneer&rdquo; over the features you want.  The
  26 separate <ulink url="../libraries/index.html">libraries
  27 documentation</ulink> describes all the libraries that come with GHC.
  28 </para>
  29
  30 <!-- LANGUAGE OPTIONS -->
  31   <sect1 id="options-language">
  32     <title>Language options</title>
  33
  34     <indexterm><primary>language</primary><secondary>option</secondary>
  35     </indexterm>
  36     <indexterm><primary>options</primary><secondary>language</secondary>
  37     </indexterm>
  38     <indexterm><primary>extensions</primary><secondary>options controlling</secondary>
  39     </indexterm>
  40
    <para>The language option flags control which variations of the language are
    permitted.  Leaving out all of them gives you standard Haskell
    98.</para>
  44
  45     <para>Language options can be controlled in two ways:
  46     <itemizedlist>
      <listitem><para>Every language option can be switched on by a command-line flag "<option>-X...</option>"
        (e.g. <option>-XTemplateHaskell</option>), and switched off by the flag "<option>-XNo...</option>"
        (e.g. <option>-XNoTemplateHaskell</option>).</para></listitem>
  50       <listitem><para>
  51           Language options recognised by Cabal can also be enabled using the <literal>LANGUAGE</literal> pragma,
  52           thus <literal>{-# LANGUAGE TemplateHaskell #-}</literal> (see <xref linkend="language-pragma"/>). </para>
  53           </listitem>
  54       </itemizedlist></para>
  55
  56     <para>The flag <option>-fglasgow-exts</option>
  57           <indexterm><primary><option>-fglasgow-exts</option></primary></indexterm>
  58           is equivalent to enabling the following extensions:
  59           <option>-XPrintExplicitForalls</option>,
  60           <option>-XForeignFunctionInterface</option>,
  61           <option>-XUnliftedFFITypes</option>,
  62           <option>-XGADTs</option>,
  63           <option>-XImplicitParams</option>,
  64           <option>-XScopedTypeVariables</option>,
  65           <option>-XUnboxedTuples</option>,
  66           <option>-XTypeSynonymInstances</option>,
  67           <option>-XStandaloneDeriving</option>,
  68           <option>-XDeriveDataTypeable</option>,
  69           <option>-XFlexibleContexts</option>,
  70           <option>-XFlexibleInstances</option>,
  71           <option>-XConstrainedClassMethods</option>,
  72           <option>-XMultiParamTypeClasses</option>,
  73           <option>-XFunctionalDependencies</option>,
  74           <option>-XMagicHash</option>,
  75           <option>-XPolymorphicComponents</option>,
  76           <option>-XExistentialQuantification</option>,
  77           <option>-XUnicodeSyntax</option>,
  78           <option>-XPostfixOperators</option>,
  79           <option>-XPatternGuards</option>,
  80           <option>-XLiberalTypeSynonyms</option>,
  81           <option>-XExplicitForAll</option>,
  82           <option>-XRankNTypes</option>,
  83           <option>-XImpredicativeTypes</option>,
  84           <option>-XTypeOperators</option>,
  85           <option>-XDoRec</option>,
  86           <option>-XParallelListComp</option>,
  87           <option>-XEmptyDataDecls</option>,
  88           <option>-XKindSignatures</option>,
  89           <option>-XGeneralizedNewtypeDeriving</option>,
  90           <option>-XTypeFamilies</option>.
  91             Enabling these options is the <emphasis>only</emphasis>
  92             effect of <option>-fglasgow-exts</option>.
  93           We are trying to move away from this portmanteau flag,
  94           and towards enabling features individually.</para>
  95
  96   </sect1>
  97
  98 <!-- UNBOXED TYPES AND PRIMITIVE OPERATIONS -->
  99 <sect1 id="primitives">
 100   <title>Unboxed types and primitive operations</title>
 101
 102 <para>GHC is built on a raft of primitive data types and operations;
 103 "primitive" in the sense that they cannot be defined in Haskell itself.
 104 While you really can use this stuff to write fast code,
 105   we generally find it a lot less painful, and more satisfying in the
 106   long run, to use higher-level language features and libraries.  With
 107   any luck, the code you write will be optimised to the efficient
 108   unboxed version in any case.  And if it isn't, we'd like to know
 109   about it.</para>
 110
 111 <para>All these primitive data types and operations are exported by the
 112 library <literal>GHC.Prim</literal>, for which there is
 113 <ulink url="&libraryGhcPrimLocation;/GHC-Prim.html">detailed online documentation</ulink>.
 114 (This documentation is generated from the file <filename>compiler/prelude/primops.txt.pp</filename>.)
 115 </para>
 116 <para>
 117 If you want to mention any of the primitive data types or operations in your
 118 program, you must first import <literal>GHC.Prim</literal> to bring them
 119 into scope.  Many of them have names ending in "&num;", and to mention such
 120 names you need the <option>-XMagicHash</option> extension (<xref linkend="magic-hash"/>).
 121 </para>
 122
 123 <para>The primops make extensive use of <link linkend="glasgow-unboxed">unboxed types</link>
 124 and <link linkend="unboxed-tuples">unboxed tuples</link>, which
 125 we briefly summarise here. </para>
 126
 127 <sect2 id="glasgow-unboxed">
 128 <title>Unboxed types
 129 </title>
 130
 131 <para>
 132 <indexterm><primary>Unboxed types (Glasgow extension)</primary></indexterm>
 133 </para>
 134
 135 <para>Most types in GHC are <firstterm>boxed</firstterm>, which means
 136 that values of that type are represented by a pointer to a heap
 137 object.  The representation of a Haskell <literal>Int</literal>, for
 138 example, is a two-word heap object.  An <firstterm>unboxed</firstterm>
type, however, is represented by the value itself; no pointers or heap
allocation are involved.
 141 </para>
 142
 143 <para>
 144 Unboxed types correspond to the &ldquo;raw machine&rdquo; types you
 145 would use in C: <literal>Int&num;</literal> (long int),
 146 <literal>Double&num;</literal> (double), <literal>Addr&num;</literal>
 147 (void *), etc.  The <emphasis>primitive operations</emphasis>
 148 (PrimOps) on these types are what you might expect; e.g.,
 149 <literal>(+&num;)</literal> is addition on
 150 <literal>Int&num;</literal>s, and is the machine-addition that we all
 151 know and love&mdash;usually one instruction.
 152 </para>
 153
 154 <para>
 155 Primitive (unboxed) types cannot be defined in Haskell, and are
 156 therefore built into the language and compiler.  Primitive types are
 157 always unlifted; that is, a value of a primitive type cannot be
 158 bottom.  We use the convention (but it is only a convention)
 159 that primitive types, values, and
 160 operations have a <literal>&num;</literal> suffix (see <xref linkend="magic-hash"/>).
 161 For some primitive types we have special syntax for literals, also
 162 described in the <link linkend="magic-hash">same section</link>.
 163 </para>
 164
 165 <para>
 166 Primitive values are often represented by a simple bit-pattern, such
 167 as <literal>Int&num;</literal>, <literal>Float&num;</literal>,
 168 <literal>Double&num;</literal>.  But this is not necessarily the case:
 169 a primitive value might be represented by a pointer to a
 170 heap-allocated object.  Examples include
 171 <literal>Array&num;</literal>, the type of primitive arrays.  A
 172 primitive array is heap-allocated because it is too big a value to fit
 173 in a register, and would be too expensive to copy around; in a sense,
 174 it is accidental that it is represented by a pointer.  If a pointer
 175 represents a primitive value, then it really does point to that value:
no unevaluated thunks, no indirections&hellip;nothing other than the
primitive value can be at the other end of the pointer.
 178 A numerically-intensive program using unboxed types can
 179 go a <emphasis>lot</emphasis> faster than its &ldquo;standard&rdquo;
 180 counterpart&mdash;we saw a threefold speedup on one example.
 181 </para>
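<para>
As an illustration of this style of code, here is a small sketch (not taken
from GHC's libraries; the names <literal>sumTo</literal> and
<literal>go</literal> are hypothetical) that sums the integers from 1 to
<literal>n</literal> with an unboxed <literal>Int&num;</literal> accumulator.
It assumes GHC 7.8 or later, where comparison primops such as
<literal>(&gt;&num;)</literal> return an <literal>Int&num;</literal> that
<literal>isTrue&num;</literal> converts to a <literal>Bool</literal>.  Everything
is imported from <literal>GHC.Exts</literal>, which re-exports
<literal>GHC.Prim</literal> together with boxing constructors such as
<literal>I&num;</literal>.
</para>

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, isTrue#, (+#), (-#), (>#))

-- Sum of 1..n, computed with an unboxed Int# accumulator.
-- The boxed Int is unpacked once on entry (I# n) and re-boxed once on exit.
sumTo :: Int -> Int
sumTo (I# n) = I# (go 0# n)
  where
    -- go has a function type (a lifted value), so this recursive binding is
    -- allowed; only its arguments and result are unboxed.
    go :: Int# -> Int# -> Int#
    go acc k
      | isTrue# (k ># 0#) = go (acc +# k) (k -# 1#)
      | otherwise         = acc
```

<para>
With this definition, <literal>sumTo 100</literal> evaluates to
<literal>5050</literal>.  As noted above, GHC may well generate a comparable
unboxed loop from the ordinary <literal>Int</literal> version, so it is worth
checking before writing code like this by hand.
</para>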
 182
 183 <para>
 184 There are some restrictions on the use of primitive types:
 185 <itemizedlist>
 186 <listitem><para>The main restriction
 187 is that you can't pass a primitive value to a polymorphic
 188 function or store one in a polymorphic data type.  This rules out
 189 things like <literal>[Int&num;]</literal> (i.e. lists of primitive
 190 integers).  The reason for this restriction is that polymorphic
 191 arguments and constructor fields are assumed to be pointers: if an
 192 unboxed integer is stored in one of these, the garbage collector would
 193 attempt to follow it, leading to unpredictable space leaks.  Or a
 194 <function>seq</function> operation on the polymorphic component may
 195 attempt to dereference the pointer, with disastrous results.  Even
 196 worse, the unboxed value might be larger than a pointer
 197 (<literal>Double&num;</literal> for instance).
 198 </para>
 199 </listitem>
 200 <listitem><para> You cannot define a newtype whose representation type
 201 (the argument type of the data constructor) is an unboxed type.  Thus,
 202 this is illegal:
 203 <programlisting>
 204   newtype A = MkA Int#
 205 </programlisting>
 206 </para></listitem>
 207 <listitem><para> You cannot bind a variable with an unboxed type
 208 in a <emphasis>top-level</emphasis> binding.
 209 </para></listitem>
 210 <listitem><para> You cannot bind a variable with an unboxed type
 211 in a <emphasis>recursive</emphasis> binding.
 212 </para></listitem>
 213 <listitem><para> You may bind unboxed variables in a (non-recursive,
 214 non-top-level) pattern binding, but you must make any such pattern-match
 215 strict.  For example, rather than:
 216 <programlisting>
 217   data Foo = Foo Int Int#
 218
 219   f x = let (Foo a b, w) = ..rhs.. in ..body..
 220 </programlisting>
 221 you must write:
 222 <programlisting>
 223   data Foo = Foo Int Int#
 224
 225   f x = let !(Foo a b, w) = ..rhs.. in ..body..
 226 </programlisting>
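since <literal>b</literal> has type <literal>Int#</literal>.  (The <literal>!</literal>
is a bang pattern, which requires the <option>-XBangPatterns</option> extension.)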
 228 </para>
 229 </listitem>
 230 </itemizedlist>
 231 </para>
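<para>
The last restriction can be tried out with a minimal, compilable variant of the
<literal>Foo</literal> example above.  The helper <literal>mkFoo</literal>
(standing in for <literal>..rhs..</literal>) and the body of
<literal>f</literal> are made up for illustration, and the sketch enables
<option>-XBangPatterns</option> for the <literal>!</literal>.
</para>

```haskell
{-# LANGUAGE MagicHash, BangPatterns #-}
import GHC.Exts (Int (I#), Int#)

-- A data type with an unboxed field, as in the text.
data Foo = Foo Int Int#

-- Hypothetical helper standing in for "..rhs..": builds a Foo
-- (whose second field is unboxed) together with a second result.
mkFoo :: Int -> (Foo, Int)
mkFoo n@(I# n#) = (Foo n n#, n)

-- b :: Int#, so the pattern binding must be strict; the outermost !
-- makes it so. Without the !, GHC rejects the binding.
f :: Int -> Int
f x = let !(Foo a b, w) = mkFoo x in a + w + I# b
```

<para>
With these definitions, <literal>f 5</literal> evaluates to <literal>15</literal>.
</para>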
 232
 233 </sect2>
 234
 235 <sect2 id="unboxed-tuples">
 236 <title>Unboxed Tuples
 237 </title>
 238
 239 <para>
Unboxed tuples aren't really exported by <literal>GHC.Exts</literal>;
they are a syntactic extension, enabled by <option>-XUnboxedTuples</option>
(and hence also by <option>-fglasgow-exts</option>).  An
unboxed tuple looks like this:
 243 </para>
 244
 245 <para>
 246
 247 <programlisting>
 248 (# e_1, ..., e_n #)
 249 </programlisting>
 250
 251 </para>
 252
 253 <para>
where <literal>e&lowbar;1..e&lowbar;n</literal> are expressions of any
type (primitive or non-primitive).  The type of an unboxed tuple is written
the same way, with types in place of expressions; for example,
<literal>(&num; Int, Double&num; &num;)</literal>.
 257 </para>
 258
 259 <para>
 260 Unboxed tuples are used for functions that need to return multiple
 261 values, but they avoid the heap allocation normally associated with
 262 using fully-fledged tuples.  When an unboxed tuple is returned, the
 263 components are put directly into registers or on the stack; the
 264 unboxed tuple itself does not have a composite representation.  Many
 265 of the primitive operations listed in <literal>primops.txt.pp</literal> return unboxed
 266 tuples.
 267 In particular, the <literal>IO</literal> and <literal>ST</literal> monads use unboxed
 268 tuples to avoid unnecessary allocation during sequences of operations.
 269 </para>
 270
 271 <para>
 272 There are some pretty stringent restrictions on the use of unboxed tuples:
 273 <itemizedlist>
 274 <listitem>
 275
 276 <para>
 277 Values of unboxed tuple types are subject to the same restrictions as
 278 other unboxed types; i.e. they may not be stored in polymorphic data
 279 structures or passed to polymorphic functions.
 280
 281 </para>
 282 </listitem>
 283 <listitem>
 284
 285 <para>
 286 No variable can have an unboxed tuple type, nor may a constructor or function
 287 argument have an unboxed tuple type.  The following are all illegal:
 288
 289
 290 <programlisting>
 291   data Foo = Foo (# Int, Int #)
 292
 293   f :: (# Int, Int #) -&#62; (# Int, Int #)
 294   f x = x
 295
 296   g :: (# Int, Int #) -&#62; Int
 297   g (# a,b #) = a
 298
 299   h x = let y = (# x,x #) in ...
 300 </programlisting>
 301 </para>
 302 </listitem>
 303 </itemizedlist>
 304 </para>
 305 <para>
 306 The typical use of unboxed tuples is simply to return multiple values,
 307 binding those multiple results with a <literal>case</literal> expression, thus:
 308 <programlisting>
 309   f x y = (# x+1, y-1 #)
 310   g x = case f x x of { (# a, b #) -&#62; a + b }
 311 </programlisting>
 312 You can have an unboxed tuple in a pattern binding, thus
 313 <programlisting>
 314   f x = let (# p,q #) = h x in ..body..
 315 </programlisting>
 316 If the types of <literal>p</literal> and <literal>q</literal> are not unboxed,
 317 the resulting binding is lazy like any other Haskell pattern binding.  The
 318 above example desugars like this:
 319 <programlisting>
  f x = let t = case h x of { (# p,q #) -> (p,q) }
            p = fst t
            q = snd t
        in ..body..
 324 </programlisting>
 325 Indeed, the bindings can even be recursive.
 326 </para>
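<para>
A small sketch of that idiom, assuming only <option>-XUnboxedTuples</option>:
the hypothetical function <literal>quotRemU</literal> returns a quotient and a
remainder as an unboxed pair, and its (equally hypothetical) caller
<literal>sumQuotRem</literal> takes the pair apart at once with a
<literal>case</literal> expression, so no boxed pair is built for the result.
</para>

```haskell
{-# LANGUAGE UnboxedTuples #-}

-- Hypothetical helper: quotient and remainder returned as an unboxed pair,
-- so no boxed (,) value is built for the result.
quotRemU :: Int -> Int -> (# Int, Int #)
quotRemU x y = (# x `quot` y, x `rem` y #)

-- The caller takes the pair apart at once with a case expression.
sumQuotRem :: Int -> Int -> Int
sumQuotRem x y = case quotRemU x y of
  (# q, r #) -> q + r
```

<para>
For instance, <literal>sumQuotRem 17 5</literal> evaluates to
<literal>5</literal> (quotient 3 plus remainder 2).
</para>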
 327
 328 </sect2>
 329 </sect1>
 330
 331
 332 <!-- ====================== SYNTACTIC EXTENSIONS =======================  -->
 333
 334 <sect1 id="syntax-extns">
 335 <title>Syntactic extensions</title>
 336
 337     <sect2 id="unicode-syntax">
 338       <title>Unicode syntax</title>
 339       <para>The language
 340       extension <option>-XUnicodeSyntax</option><indexterm><primary><option>-XUnicodeSyntax</option></primary></indexterm>
 341       enables Unicode characters to be used to stand for certain ASCII
 342       character sequences.  The following alternatives are provided:</para>
 343
 344       <informaltable>
        <tgroup cols="4" align="left" colsep="1" rowsep="1">
          <thead>
            <row>
              <entry>ASCII</entry>
              <entry>Unicode alternative</entry>
              <entry>Code point</entry>
              <entry>Name</entry>
            </row>
          </thead>

<!--
               to find the DocBook entities for these characters, find
               the Unicode code point (e.g. 0x2237), and grep for it in
               /usr/share/sgml/docbook/xml-dtd-*/ent/* (or equivalent on
               your system).  Some of these Unicode code points don't have
               equivalent DocBook entities.
            -->

          <tbody>
            <row>
              <entry><literal>::</literal></entry>
              <entry>&#x2237;</entry>
              <entry>0x2237</entry>
              <entry>PROPORTION</entry>
            </row>
            <row>
              <entry><literal>=&gt;</literal></entry>
              <entry>&rArr;</entry>
              <entry>0x21D2</entry>
              <entry>RIGHTWARDS DOUBLE ARROW</entry>
            </row>
            <row>
              <entry><literal>forall</literal></entry>
              <entry>&forall;</entry>
              <entry>0x2200</entry>
              <entry>FOR ALL</entry>
            </row>
            <row>
              <entry><literal>-&gt;</literal></entry>
              <entry>&rarr;</entry>
              <entry>0x2192</entry>
              <entry>RIGHTWARDS ARROW</entry>
            </row>
            <row>
              <entry><literal>&lt;-</literal></entry>
              <entry>&larr;</entry>
              <entry>0x2190</entry>
              <entry>LEFTWARDS ARROW</entry>
            </row>
            <row>
              <entry><literal>-&lt;</literal></entry>
              <entry>&#x2919;</entry>
              <entry>0x2919</entry>
              <entry>LEFTWARDS ARROW-TAIL</entry>
            </row>
            <row>
              <entry><literal>&gt;-</literal></entry>
              <entry>&#x291A;</entry>
              <entry>0x291A</entry>
              <entry>RIGHTWARDS ARROW-TAIL</entry>
            </row>
            <row>
              <entry><literal>-&lt;&lt;</literal></entry>
              <entry>&#x291B;</entry>
              <entry>0x291B</entry>
              <entry>LEFTWARDS DOUBLE ARROW-TAIL</entry>
            </row>
            <row>
              <entry><literal>&gt;&gt;-</literal></entry>
              <entry>&#x291C;</entry>
              <entry>0x291C</entry>
              <entry>RIGHTWARDS DOUBLE ARROW-TAIL</entry>
            </row>
            <row>
              <entry><literal>*</literal></entry>
              <entry>&starf;</entry>
              <entry>0x2605</entry>
              <entry>BLACK STAR</entry>
            </row>
          </tbody>
        </tgroup>
 450       </informaltable>
 451     </sect2>
 452
 453     <sect2 id="magic-hash">
 454       <title>The magic hash</title>
 455       <para>The language extension <option>-XMagicHash</option> allows "&num;" as a
 456         postfix modifier to identifiers.  Thus, "x&num;" is a valid variable, and "T&num;" is
 457         a valid type constructor or data constructor.</para>
 458
      <para>The hash sign does not change semantics at all.  We tend to use variable
 460         names ending in "&num;" for unboxed values or types (e.g. <literal>Int&num;</literal>),
 461         but there is no requirement to do so; they are just plain ordinary variables.
 462         Nor does the <option>-XMagicHash</option> extension bring anything into scope.
 463         For example, to bring <literal>Int&num;</literal> into scope you must
 464         import <literal>GHC.Prim</literal> (see <xref linkend="primitives"/>);
 465         the <option>-XMagicHash</option> extension
 466         then allows you to <emphasis>refer</emphasis> to the <literal>Int&num;</literal>
 467         that is now in scope.</para>
      <para> The <option>-XMagicHash</option> extension also enables some new forms of literals (see <xref linkend="glasgow-unboxed"/>):
 469         <itemizedlist>
 470           <listitem><para> <literal>'x'&num;</literal> has type <literal>Char&num;</literal></para> </listitem>
 471           <listitem><para> <literal>&quot;foo&quot;&num;</literal> has type <literal>Addr&num;</literal></para> </listitem>
          <listitem><para> <literal>3&num;</literal> has type <literal>Int&num;</literal>. In general,
          any Haskell 98 integer lexeme followed by a <literal>&num;</literal> is an <literal>Int&num;</literal> literal, e.g.
            <literal>-0x3A&num;</literal> as well as <literal>32&num;</literal>.</para></listitem>
 475           <listitem><para> <literal>3&num;&num;</literal> has type <literal>Word&num;</literal>. In general,
 476           any non-negative Haskell 98 integer lexeme followed by <literal>&num;&num;</literal>
 477               is a <literal>Word&num;</literal>. </para> </listitem>
 478           <listitem><para> <literal>3.2&num;</literal> has type <literal>Float&num;</literal>.</para> </listitem>
 479           <listitem><para> <literal>3.2&num;&num;</literal> has type <literal>Double&num;</literal></para> </listitem>
 480           </itemizedlist>
 481       </para>
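      <para>The following sketch exercises each literal form listed above,
        boxing every result with the corresponding constructor
        (<literal>C&num;</literal>, <literal>I&num;</literal>, <literal>W&num;</literal>,
        <literal>F&num;</literal>, <literal>D&num;</literal>) so that it can be used as an
        ordinary value.  The binding names are made up; the sketch assumes a recent GHC in
        which these constructors, the primops <literal>(+&num;)</literal> and
        <literal>(*&num;&num;)</literal>, and <literal>unpackCString&num;</literal> are all
        importable from <literal>GHC.Exts</literal>.</para>

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts
  ( Char (C#), Double (D#), Float (F#), Int (I#), Word (W#)
  , (+#), (*##), unpackCString# )

-- 'x'# :: Char#, boxed back into an ordinary Char with C#
aChar :: Char
aChar = C# 'x'#

-- "foo"# :: Addr#, the address of a NUL-terminated string literal
aString :: String
aString = unpackCString# "foo"#

-- 3# and 0x3A# :: Int#, added with the primitive (+#)
anInt :: Int
anInt = I# (3# +# 0x3A#)

-- 3## :: Word#
aWord :: Word
aWord = W# 3##

-- 3.2# :: Float#
aFloat :: Float
aFloat = F# 3.2#

-- 3.2## and 2.0## :: Double#, multiplied with the primitive (*##)
aDouble :: Double
aDouble = D# (3.2## *## 2.0##)
```

      <para>Here <literal>anInt</literal>, for instance, is <literal>61</literal>
        (3 plus 0x3A).</para>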
 482    </sect2>
 483
 484     <sect2 id="new-qualified-operators">
 485       <title>New qualified operator syntax</title>
 486
 487       <para>A new syntax for referencing qualified operators is
 488         planned to be introduced by Haskell', and is enabled in GHC
 489         with
 490         the <option>-XNewQualifiedOperators</option><indexterm><primary><option>-XNewQualifiedOperators</option></primary></indexterm>
 491         option.  In the new syntax, the prefix form of a qualified
 492         operator is
 493         written <literal><replaceable>module</replaceable>.(<replaceable>symbol</replaceable>)</literal>
 494         (in Haskell 98 this would
 495         be <literal>(<replaceable>module</replaceable>.<replaceable>symbol</replaceable>)</literal>),
 496         and the infix form is
 497         written <literal>`<replaceable>module</replaceable>.(<replaceable>symbol</replaceable>)`</literal>
 498         (in Haskell 98 this would
        be <literal>`<replaceable>module</replaceable>.<replaceable>symbol</replaceable>`</literal>).
 500         For example:
 501 <programlisting>
 502   add x y = Prelude.(+) x y
 503   subtract y = (`Prelude.(-)` y)
 504 </programlisting>
 505         The new form of qualified operators is intended to regularise
 506         the syntax by eliminating odd cases
 507         like <literal>Prelude..</literal>.  For example,
 508         when <literal>NewQualifiedOperators</literal> is on, it is possible to
 509         write the enumerated sequence <literal>[Monday..]</literal>
 510         without spaces, whereas in Haskell 98 this would be a
        reference to the operator &lsquo;<literal>.</literal>&rsquo;
 512         from module <literal>Monday</literal>.</para>
 513
 514       <para>When <option>-XNewQualifiedOperators</option> is on, the old Haskell
 515         98 syntax for qualified operators is not accepted, so this
 516         option may cause existing Haskell 98 code to break.</para>
 517
 518     </sect2>
 519
 520
 521     <!-- ====================== HIERARCHICAL MODULES =======================  -->
 522
 523
 524     <sect2 id="hierarchical-modules">
 525       <title>Hierarchical Modules</title>
 526
 527       <para>GHC supports a small extension to the syntax of module
 528       names: a module name is allowed to contain a dot
 529       <literal>&lsquo;.&rsquo;</literal>.  This is also known as the
 530       &ldquo;hierarchical module namespace&rdquo; extension, because
 531       it extends the normally flat Haskell module namespace into a
 532       more flexible hierarchy of modules.</para>
 533
 534       <para>This extension has very little impact on the language
      itself; module names are <emphasis>always</emphasis> fully
 536       qualified, so you can just think of the fully qualified module
 537       name as <quote>the module name</quote>.  In particular, this
 538       means that the full module name must be given after the
 539       <literal>module</literal> keyword at the beginning of the
 540       module; for example, the module <literal>A.B.C</literal> must
 541       begin</para>
 542
 543 <programlisting>module A.B.C</programlisting>
 544
 545
 546       <para>It is a common strategy to use the <literal>as</literal>
 547       keyword to save some typing when using qualified names with
 548       hierarchical modules.  For example:</para>
 549
 550 <programlisting>
 551 import qualified Control.Monad.ST.Strict as ST
 552 </programlisting>
 553
 554       <para>For details on how GHC searches for source and interface
 555       files in the presence of hierarchical modules, see <xref
 556       linkend="search-path"/>.</para>
 557
 558       <para>GHC comes with a large collection of libraries arranged
 559       hierarchically; see the accompanying <ulink
 560       url="../libraries/index.html">library
 561       documentation</ulink>.  More libraries to install are available
 562       from <ulink
 563       url="http://hackage.haskell.org/packages/hackage.html">HackageDB</ulink>.</para>
 564     </sect2>
 565
 566     <!-- ====================== PATTERN GUARDS =======================  -->
 567
 568 <sect2 id="pattern-guards">
 569 <title>Pattern guards</title>
 570
 571 <para>
 572 <indexterm><primary>Pattern guards (Glasgow extension)</primary></indexterm>
 573 The discussion that follows is an abbreviated version of Simon Peyton Jones's original <ulink url="http://research.microsoft.com/~simonpj/Haskell/guards.html">proposal</ulink>. (Note that the proposal was written before pattern guards were implemented, so refers to them as unimplemented.)
 574 </para>
 575
 576 <para>
 577 Suppose we have an abstract data type of finite maps, with a
 578 lookup operation:
 579
 580 <programlisting>
 581 lookup :: FiniteMap -> Int -> Maybe Int
 582 </programlisting>
 583
 584 The lookup returns <function>Nothing</function> if the supplied key is not in the domain of the mapping, and <function>(Just v)</function> otherwise,
 585 where <varname>v</varname> is the value that the key maps to.  Now consider the following definition:
 586 </para>
 587
 588 <programlisting>
clunky env var1 var2 | ok1 &amp;&amp; ok2 = val1 + val2
                     | otherwise  = var1 + var2
  where
    m1 = lookup env var1
    m2 = lookup env var2
    ok1 = maybeToBool m1
    ok2 = maybeToBool m2
    val1 = expectJust m1
    val2 = expectJust m2
 598 </programlisting>
 599
 600 <para>
 601 The auxiliary functions are
 602 </para>
 603
 604 <programlisting>
 605 maybeToBool :: Maybe a -&gt; Bool
 606 maybeToBool (Just x) = True
 607 maybeToBool Nothing  = False
 608
 609 expectJust :: Maybe a -&gt; a
 610 expectJust (Just x) = x
 611 expectJust Nothing  = error "Unexpected Nothing"
 612 </programlisting>
 613
 614 <para>
 615 What is <function>clunky</function> doing? The guard <literal>ok1 &amp;&amp;
 616 ok2</literal> checks that both lookups succeed, using
 617 <function>maybeToBool</function> to convert the <function>Maybe</function>
 618 types to booleans. The (lazily evaluated) <function>expectJust</function>
calls extract the values from the results of the lookups, and bind the
 620 returned values to <varname>val1</varname> and <varname>val2</varname>
 621 respectively.  If either lookup fails, then clunky takes the
 622 <literal>otherwise</literal> case and returns the sum of its arguments.
 623 </para>
 624
 625 <para>
 626 This is certainly legal Haskell, but it is a tremendously verbose and
 627 un-obvious way to achieve the desired effect.  Arguably, a more direct way
 628 to write clunky would be to use case expressions:
 629 </para>
 630
 631 <programlisting>
 632 clunky env var1 var2 = case lookup env var1 of
 633   Nothing -&gt; fail
 634   Just val1 -&gt; case lookup env var2 of
 635     Nothing -&gt; fail
 636     Just val2 -&gt; val1 + val2
  where
    fail = var1 + var2
 639 </programlisting>
 640
 641 <para>
 642 This is a bit shorter, but hardly better.  Of course, we can rewrite any set
 643 of pattern-matching, guarded equations as case expressions; that is
precisely what the compiler does when compiling equations! Haskell provides
guarded equations because they allow us to write down
 646 the cases we want to consider, one at a time, independently of each other.
 647 This structure is hidden in the case version.  Two of the right-hand sides
 648 are really the same (<function>fail</function>), and the whole expression
 649 tends to become more and more indented.
 650 </para>
 651
 652 <para>
 653 Here is how I would write clunky:
 654 </para>
 655
 656 <programlisting>
 657 clunky env var1 var2
 658   | Just val1 &lt;- lookup env var1
 659   , Just val2 &lt;- lookup env var2
 660   = val1 + val2
 661 ...other equations for clunky...
 662 </programlisting>
 663
 664 <para>
 665 The semantics should be clear enough.  The qualifiers are matched in order.
 666 For a <literal>&lt;-</literal> qualifier, which I call a pattern guard, the
 667 right hand side is evaluated and matched against the pattern on the left.
 668 If the match fails then the whole guard fails and the next equation is
 669 tried.  If it succeeds, then the appropriate binding takes place, and the
 670 next qualifier is matched, in the augmented environment.  Unlike list
 671 comprehensions, however, the type of the expression to the right of the
 672 <literal>&lt;-</literal> is the same as the type of the pattern to its
 673 left.  The bindings introduced by pattern guards scope over all the
 674 remaining guard qualifiers, and over the right hand side of the equation.
 675 </para>
 676
 677 <para>
 678 Just as with list comprehensions, boolean expressions can be freely mixed
with the pattern guards.  For example:
 680 </para>
 681
 682 <programlisting>
 683 f x | [y] &lt;- x
 684     , y > 3
 685     , Just z &lt;- h y
 686     = ...
 687 </programlisting>
 688
 689 <para>
 690 Haskell's current guards therefore emerge as a special case, in which the
 691 qualifier list has just one element, a boolean expression.
 692 </para>
 693 </sect2>
 694
 695     <!-- ===================== View patterns ===================  -->
 696
 697 <sect2 id="view-patterns">
 698 <title>View patterns
 699 </title>
 700
 701 <para>
View patterns are enabled by the flag <option>-XViewPatterns</option>.
 703 More information and examples of view patterns can be found on the
 704 <ulink url="http://hackage.haskell.org/trac/ghc/wiki/ViewPatterns">Wiki
 705 page</ulink>.
 706 </para>
 707
 708 <para>
 709 View patterns are somewhat like pattern guards that can be nested inside
 710 of other patterns.  They are a convenient way of pattern-matching
 711 against values of abstract types. For example, in a programming language
 712 implementation, we might represent the syntax of the types of the
 713 language as follows:
 714
 715 <programlisting>
 716 type Typ
 717
 718 data TypView = Unit
 719              | Arrow Typ Typ
 720
view :: Typ -> TypView
 722
 723 -- additional operations for constructing Typ's ...
 724 </programlisting>
 725
 726 The representation of Typ is held abstract, permitting implementations
 727 to use a fancy representation (e.g., hash-consing to manage sharing).
 728
Without view patterns, using this signature is a little inconvenient:
 730 <programlisting>
 731 size :: Typ -> Integer
 732 size t = case view t of
 733   Unit -> 1
 734   Arrow t1 t2 -> size t1 + size t2
 735 </programlisting>
 736
 737 It is necessary to iterate the case, rather than using an equational
 738 function definition. And the situation is even worse when the matching
 739 against <literal>t</literal> is buried deep inside another pattern.
 740 </para>
 741
 742 <para>
 743 View patterns permit calling the view function inside the pattern and
 744 matching against the result:
 745 <programlisting>
 746 size (view -> Unit) = 1
 747 size (view -> Arrow t1 t2) = size t1 + size t2
 748 </programlisting>
 749
 750 That is, we add a new form of pattern, written
 751 <replaceable>expression</replaceable> <literal>-></literal>
 752 <replaceable>pattern</replaceable> that means "apply the expression to
 753 whatever we're trying to match against, and then match the result of
 754 that application against the pattern". The expression can be any Haskell
 755 expression of function type, and view patterns can be used wherever
 756 patterns are used.
 757 </para>
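<para>
The following is a minimal, self-contained sketch of the <literal>Typ</literal> example.  Because
everything lives in one module, <literal>Typ</literal> is not really abstract here: the newtype
constructor <literal>MkTyp</literal> and the functions <literal>unit</literal> and
<literal>arrow</literal> are hypothetical stand-ins for whatever fancier representation a real
implementation would hide behind its module interface.
</para>
<programlisting>
{-# LANGUAGE ViewPatterns #-}

-- Hypothetical concrete representation standing in for the abstract Typ.
newtype Typ = MkTyp TypView

data TypView = Unit
             | Arrow Typ Typ

-- The view function exposes one layer of structure.
view :: Typ -> TypView
view (MkTyp v) = v

-- Hypothetical constructor functions for building Typs.
unit :: Typ
unit = MkTyp Unit

arrow :: Typ -> Typ -> Typ
arrow t1 t2 = MkTyp (Arrow t1 t2)

-- Each equation applies view to the argument and matches the result.
size :: Typ -> Integer
size (view -> Unit)        = 1
size (view -> Arrow t1 t2) = size t1 + size t2
</programlisting>
<para>
For instance, <literal>size (arrow unit (arrow unit unit))</literal> evaluates to <literal>3</literal>.
</para>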
 758
 759 <para>
 760 The semantics of a pattern <literal>(</literal>
 761 <replaceable>exp</replaceable> <literal>-></literal>
 762 <replaceable>pat</replaceable> <literal>)</literal> are as follows:
 763
 764 <itemizedlist>
 765
<listitem><para>Scoping:</para>
 767
 768 <para>The variables bound by the view pattern are the variables bound by
 769 <replaceable>pat</replaceable>.
 770 </para>
 771
 772 <para>
 773 Any variables in <replaceable>exp</replaceable> are bound occurrences,
 774 but variables bound "to the left" in a pattern are in scope.  This
 775 feature permits, for example, one argument to a function to be used in
 776 the view of another argument.  For example, the function
 777 <literal>clunky</literal> from <xref linkend="pattern-guards" /> can be
 778 written using view patterns as follows:
 779
 780 <programlisting>
 781 clunky env (lookup env -> Just val1) (lookup env -> Just val2) = val1 + val2
 782 ...other equations for clunky...
 783 </programlisting>
 784 </para>
 785
 786 <para>
 787 More precisely, the scoping rules are:
 788 <itemizedlist>
 789 <listitem>
 790 <para>
 791 In a single pattern, variables bound by patterns to the left of a view
 792 pattern expression are in scope. For example:
 793 <programlisting>
 794 example :: Maybe ((String -> Integer,Integer), String) -> Bool
example (Just ((f,_), f -> 4)) = True
 796 </programlisting>
 797
 798 Additionally, in function definitions, variables bound by matching earlier curried
 799 arguments may be used in view pattern expressions in later arguments:
 800 <programlisting>
 801 example :: (String -> Integer) -> String -> Bool
 802 example f (f -> 4) = True
 803 </programlisting>
 804 That is, the scoping is the same as it would be if the curried arguments
 805 were collected into a tuple.
 806 </para>
 807 </listitem>
 808
 809 <listitem>
 810 <para>
 811 In mutually recursive bindings, such as <literal>let</literal>,
 812 <literal>where</literal>, or the top level, view patterns in one
 813 declaration may not mention variables bound by other declarations.  That
 814 is, each declaration must be self-contained.  For example, the following
 815 program is not allowed:
 816 <programlisting>
 817 let {(x -> y) = e1 ;
 818      (y -> x) = e2 } in x
 819 </programlisting>
 820
 821 (We may lift this
 822 restriction in the future; the only cost is that type checking patterns
 823 would get a little more complicated.)
 824
 825
 826 </para>
 827 </listitem>
 828 </itemizedlist>
 829
 830 </para>
 831 </listitem>
 832
 833 <listitem><para> Typing: If <replaceable>exp</replaceable> has type
 834 <replaceable>T1</replaceable> <literal>-></literal>
 835 <replaceable>T2</replaceable> and <replaceable>pat</replaceable> matches
 836 a <replaceable>T2</replaceable>, then the whole view pattern matches a
 837 <replaceable>T1</replaceable>.
 838 </para></listitem>
 839
 840 <listitem><para> Matching: To the equations in Section 3.17.3 of the
 841 <ulink url="http://www.haskell.org/onlinereport/">Haskell 98
 842 Report</ulink>, add the following:
 843 <programlisting>
 844 case v of { (e -> p) -> e1 ; _ -> e2 }
 845  =
 846 case (e v) of { p -> e1 ; _ -> e2 }
 847 </programlisting>
 848 That is, to match a variable <replaceable>v</replaceable> against a pattern
 849 <literal>(</literal> <replaceable>exp</replaceable>
 850 <literal>-></literal> <replaceable>pat</replaceable>
 851 <literal>)</literal>, evaluate <literal>(</literal>
<replaceable>exp</replaceable> <replaceable>v</replaceable>
 853 <literal>)</literal> and match the result against
 854 <replaceable>pat</replaceable>.
 855 </para></listitem>
 856
 857 <listitem><para> Efficiency: When the same view function is applied in
 858 multiple branches of a function definition or a case expression (e.g.,
 859 in <literal>size</literal> above), GHC makes an attempt to collect these
 860 applications into a single nested case expression, so that the view
 861 function is only applied once.  Pattern compilation in GHC follows the
 862 matrix algorithm described in Chapter 4 of <ulink
 863 url="http://research.microsoft.com/~simonpj/Papers/slpj-book-1987/">The
 864 Implementation of Functional Programming Languages</ulink>.  When the
 865 top rows of the first column of a matrix are all view patterns with the
 866 "same" expression, these patterns are transformed into a single nested
 867 case.  This includes, for example, adjacent view patterns that line up
 868 in a tuple, as in
 869 <programlisting>
 870 f ((view -> A, p1), p2) = e1
 871 f ((view -> B, p3), p4) = e2
 872 </programlisting>
 873 </para>
 874
 875 <para> The current notion of when two view pattern expressions are "the
 876 same" is very restricted: it is not even full syntactic equality.
 877 However, it does include variables, literals, applications, and tuples;
 878 e.g., two instances of <literal>view ("hi", "there")</literal> will be
 879 collected.  However, the current implementation does not compare up to
 880 alpha-equivalence, so two instances of <literal>(x, view x ->
 881 y)</literal> will not be coalesced.
 882 </para>
 883
 884 </listitem>
 885
 886 </itemizedlist>
 887 </para>
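<para>
As a runnable sketch of the scoping and matching rules, here is a version of <literal>clunky</literal>
in which, as an assumption for illustration, the environment is an association list searched with the
Prelude's <literal>lookup</literal> (which takes the key first, hence <literal>flip</literal>), and the
fallback equation simply adds the two arguments.  The second function, <literal>clunkyByHand</literal>
(a hypothetical name), spells out what the matching rule above turns the view patterns into.
</para>
<programlisting>
{-# LANGUAGE ViewPatterns #-}

-- env is bound by the first argument, so it is in scope in the
-- view expressions of the second and third arguments.
clunky :: [(Int, Int)] -> Int -> Int -> Int
clunky env (flip lookup env -> Just val1) (flip lookup env -> Just val2) = val1 + val2
clunky _ var1 var2 = var1 + var2  -- fallback equation chosen for illustration

-- The same function with each view pattern (e -> p) matched against v
-- rewritten as a case on (e v), following the matching equation above.
clunkyByHand :: [(Int, Int)] -> Int -> Int -> Int
clunkyByHand env var1 var2 =
  case flip lookup env var1 of
    Just val1 -> case flip lookup env var2 of
                   Just val2 -> val1 + val2
                   _         -> var1 + var2
    _ -> var1 + var2
</programlisting>
<para>
With <literal>env = [(1,10),(2,20)]</literal>, both functions map the arguments <literal>1</literal>
and <literal>2</literal> to <literal>30</literal>; if key <literal>2</literal> is missing, both fall
through to the last alternative and return <literal>3</literal>.
</para>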
 888
 889 </sect2>
 890
 891     <!-- ===================== n+k patterns ===================  -->
 892
 893 <sect2 id="n-k-patterns">
 894 <title>n+k patterns</title>
 895 <indexterm><primary><option>-XNoNPlusKPatterns</option></primary></indexterm>
 896
 897 <para>
 898 <literal>n+k</literal> pattern support is enabled by default. To disable
 899 it, you can use the <option>-XNoNPlusKPatterns</option> flag.
 900 </para>
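<para>
For readers unfamiliar with them, here is a small sketch of an <literal>n+k</literal> pattern.
The <literal>NPlusKPatterns</literal> pragma is included so that the example still compiles
in a setting where the feature has been switched off; with the default described above it is redundant.
</para>
<programlisting>
{-# LANGUAGE NPlusKPatterns #-}

-- The pattern (n + 1) matches any integer m with m >= 1, binding n to m - 1.
factorial :: Integer -> Integer
factorial 0       = 1
factorial (n + 1) = (n + 1) * factorial n
</programlisting>
<para>
A negative argument matches neither equation, so <literal>factorial</literal> fails on it.
</para>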
 901
 902 </sect2>
 903
 904     <!-- ===================== Recursive do-notation ===================  -->
 905
 906 <sect2 id="recursive-do-notation">
 907 <title>The recursive do-notation
 908 </title>
 909
 910 <para>
 911 The do-notation of Haskell 98 does not allow <emphasis>recursive bindings</emphasis>,
 912 that is, the variables bound in a do-expression are visible only in the textually following
 913 code block. Compare this to a let-expression, where bound variables are visible in the entire binding
 914 group. It turns out that several applications can benefit from recursive bindings in
 915 the do-notation.  The <option>-XDoRec</option> flag provides the necessary syntactic support.
 916 </para>
 917 <para>
 918 Here is a simple (albeit contrived) example:
 919 <programlisting>
 920 {-# LANGUAGE DoRec #-}
 921 justOnes = do { rec { xs &lt;- Just (1:xs) }
 922               ; return (map negate xs) }
 923 </programlisting>
As you can guess, <literal>justOnes</literal> will evaluate to <literal>Just [-1,-1,-1,...</literal>.
 925 </para>
 926 <para>
 927 The background and motivation for recursive do-notation is described in
 928 <ulink url="http://sites.google.com/site/leventerkok/">A recursive do for Haskell</ulink>,
 929 by Levent Erkok, John Launchbury,
 930 Haskell Workshop 2002, pages: 29-37. Pittsburgh, Pennsylvania.
 931 The theory behind monadic value recursion is explained further in Erkok's thesis
 932 <ulink url="http://sites.google.com/site/leventerkok/erkok-thesis.pdf">Value Recursion in Monadic Computations</ulink>.
 933 However, note that GHC uses a different syntax than the one described in these documents.
 934 </para>
 935
 936 <sect3>
 937 <title>Details of recursive do-notation</title>
 938 <para>
 939 The recursive do-notation is enabled with the flag <option>-XDoRec</option> or, equivalently,
 940 the LANGUAGE pragma <option>DoRec</option>.  It introduces the single new keyword "<literal>rec</literal>",
 941 which wraps a mutually-recursive group of monadic statements,
 942 producing a single statement.
 943 </para>
 944 <para>Similar to a <literal>let</literal>
 945 statement, the variables bound in the <literal>rec</literal> are
 946 visible throughout the <literal>rec</literal> group, and below it.
 947 For example, compare
 948 <programlisting>
 949 do { a &lt;- getChar              do { a &lt;- getChar
 950    ; let { r1 = f a r2             ; rec { r1 &lt;- f a r2
 951          ; r2 = g r1 }                   ; r2 &lt;- g r1 }
 952    ; return (r1 ++ r2) }          ; return (r1 ++ r2) }
 953 </programlisting>
 954 In both cases, <literal>r1</literal> and <literal>r2</literal> are
 955 available both throughout the <literal>let</literal> or <literal>rec</literal> block, and
 956 in the statements that follow it.  The difference is that <literal>let</literal> is non-monadic,
 957 while <literal>rec</literal> is monadic.  (In Haskell <literal>let</literal> is
 958 really <literal>letrec</literal>, of course.)
 959 </para>
 960 <para>
 961 The static and dynamic semantics of <literal>rec</literal> can be described as follows:
 962 <itemizedlist>
 963 <listitem><para>
 964 First,
 965 similar to let-bindings, the <literal>rec</literal> is broken into
 966 minimal recursive groups, a process known as <emphasis>segmentation</emphasis>.
 967 For example:
 968 <programlisting>
 969 rec { a &lt;- getChar      ===>     a &lt;- getChar
 970     ; b &lt;- f a c                 rec { b &lt;- f a c
 971     ; c &lt;- f b a                     ; c &lt;- f b a }
 972     ; putChar c }                putChar c
 973 </programlisting>
 974 The details of segmentation are described in Section 3.2 of
 975 <ulink url="http://sites.google.com/site/leventerkok/">A recursive do for Haskell</ulink>.
 976 Segmentation improves polymorphism, reduces the size of the recursive "knot", and, as the paper
 977 describes, also has a semantic effect (unless the monad satisfies the right-shrinking law).
 978 </para></listitem>
 979 <listitem><para>
 980 Then each resulting <literal>rec</literal> is desugared, using a call to <literal>Control.Monad.Fix.mfix</literal>.
 981 For example, the <literal>rec</literal> group in the preceding example is desugared like this:
 982 <programlisting>
 983 rec { b &lt;- f a c     ===>    (b,c) &lt;- mfix (\~(b,c) -> do { b &lt;- f a c
 984     ; c &lt;- f b a }                                        ; c &lt;- f b a
 985                                                           ; return (b,c) })
 986 </programlisting>
In general, the statement <literal>rec <replaceable>ss</replaceable></literal>
 988 is desugared to the statement
 989 <programlisting>
 990 <replaceable>vs</replaceable> &lt;- mfix (\~<replaceable>vs</replaceable> -&gt; do { <replaceable>ss</replaceable>; return <replaceable>vs</replaceable> })
 991 </programlisting>
 992 where <replaceable>vs</replaceable> is a tuple of the variables bound by <replaceable>ss</replaceable>.
 993 </para><para>
 994 The original <literal>rec</literal> typechecks exactly
 995 when the above desugared version would do so.  For example, this means that
 996 the variables <replaceable>vs</replaceable> are all monomorphic in the statements
 997 following the <literal>rec</literal>, because they are bound by a lambda.
 998 </para>
 999 <para>
1000 The <literal>mfix</literal> function is defined in the <literal>MonadFix</literal>
1001 class, in <literal>Control.Monad.Fix</literal>, thus:
1002 <programlisting>
1003 class Monad m => MonadFix m where
1004    mfix :: (a -> m a) -> m a
1005 </programlisting>
1006 </para>
1007 </listitem>
1008 </itemizedlist>
1009 </para>
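<para>
To make the desugaring concrete, the sketch below writes out by hand the <literal>mfix</literal> calls
that <literal>rec</literal> groups are translated into, using the <literal>Maybe</literal> monad.
<literal>justOnesByHand</literal> corresponds to the <literal>justOnes</literal> example earlier in
this section; <literal>pairKnot</literal> is a made-up two-variable group that shows the tuple of
bound variables and the lazy pattern <literal>~(b, c)</literal>.
</para>
<programlisting>
import Control.Monad.Fix (mfix)

-- do { rec { xs &lt;- Just (1:xs) }; return (map negate xs) }, desugared by hand.
justOnesByHand :: Maybe [Integer]
justOnesByHand =
  mfix (\xs -> Just (1 : xs)) >>= \xs -> return (map negate xs)

-- rec { b &lt;- Just (1 : take 2 c); c &lt;- Just (2 : take 2 b) }, desugared by hand.
-- The bound variables are tupled, and the lambda's pattern is lazy so that
-- mfix can tie the knot without forcing the result too early.
pairKnot :: Maybe ([Int], [Int])
pairKnot =
  mfix (\ ~(b, c) ->
          Just (1 : take 2 c) >>= \b' ->
          Just (2 : take 2 b') >>= \c' ->
          return (b', c'))
</programlisting>
<para>
Here <literal>fmap (take 3) justOnesByHand</literal> is <literal>Just [-1,-1,-1]</literal>, and
<literal>pairKnot</literal> is <literal>Just ([1,2,1],[2,1,2])</literal>.
</para>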
1010 <para>
1011 Here are some other important points in using the recursive-do notation:
1012 <itemizedlist>
1013 <listitem><para>
1014 It is enabled with the flag <literal>-XDoRec</literal>, which is in turn implied by
1015 <literal>-fglasgow-exts</literal>.
1016 </para></listitem>
1017
1018 <listitem><para>
1019 If recursive bindings are required for a monad,
1020 then that monad must be declared an instance of the <literal>MonadFix</literal> class.
1021 </para></listitem>
1022
1023 <listitem><para>
1024 The following instances of <literal>MonadFix</literal> are automatically provided: List, Maybe, IO.
1025 Furthermore, the Control.Monad.ST and Control.Monad.ST.Lazy modules provide the instances of the MonadFix class
1026 for Haskell's internal state monad (strict and lazy, respectively).
1027 </para></listitem>
1028
1029 <listitem><para>
1030 Like <literal>let</literal> and <literal>where</literal> bindings,
1031 name shadowing is not allowed within a <literal>rec</literal>;
1032 that is, all the names bound in a single <literal>rec</literal> must
1033 be distinct (Section 3.3 of the paper).
1034 </para></listitem>
1035 <listitem><para>
1036 It supports rebindable syntax (see <xref linkend="rebindable-syntax"/>).
1037 </para></listitem>
1038 </itemizedlist>
1039 </para>
1040 </sect3>
1041
1042 <sect3 id="mdo-notation"> <title> Mdo-notation (deprecated) </title>
1043
1044 <para> GHC used to support the flag <option>-XRecursiveDo</option>,
1045 which enabled the keyword <literal>mdo</literal>, precisely as described in
1046 <ulink url="http://sites.google.com/site/leventerkok/">A recursive do for Haskell</ulink>,
1047 but this is now deprecated.  Instead of <literal>mdo { Q; e }</literal>, write
1048 <literal>do { rec Q; e }</literal>.
1049 </para>
1050 <para>
1051 Historical note: The old implementation of the mdo-notation (and most
1052 of the existing documents) used the name
1053 <literal>MonadRec</literal> for the class and the corresponding library.
1054 This name is not supported by GHC.
1055 </para>
1056 </sect3>
1057
1058 </sect2>
1059
1060
1061    <!-- ===================== PARALLEL LIST COMPREHENSIONS ===================  -->
1062
1063   <sect2 id="parallel-list-comprehensions">
1064     <title>Parallel List Comprehensions</title>
1065     <indexterm><primary>list comprehensions</primary><secondary>parallel</secondary>
1066     </indexterm>
1067     <indexterm><primary>parallel list comprehensions</primary>
1068     </indexterm>
1069
1070     <para>Parallel list comprehensions are a natural extension to list
1071     comprehensions.  List comprehensions can be thought of as a nice
1072     syntax for writing maps and filters.  Parallel comprehensions
1073     extend this to include the zipWith family.</para>
1074
1075     <para>A parallel list comprehension has multiple independent
1076     branches of qualifier lists, each separated by a `|' symbol.  For
1077     example, the following zips together two lists:</para>
1078
1079 <programlisting>
1080    [ (x, y) | x &lt;- xs | y &lt;- ys ]
1081 </programlisting>
1082
1083     <para>The behavior of parallel list comprehensions follows that of
1084     zip, in that the resulting list will have the same length as the
1085     shortest branch.</para>
1086
1087     <para>We can define parallel list comprehensions by translation to
1088     regular comprehensions.  Here's the basic idea:</para>
1089
1090     <para>Given a parallel comprehension of the form: </para>
1091
1092 <programlisting>
1093    [ e | p1 &lt;- e11, p2 &lt;- e12, ...
1094        | q1 &lt;- e21, q2 &lt;- e22, ...
1095        ...
1096    ]
1097 </programlisting>
1098
1099     <para>This will be translated to: </para>
1100
1101 <programlisting>
1102    [ e | ((p1,p2), (q1,q2), ...) &lt;- zipN [(p1,p2) | p1 &lt;- e11, p2 &lt;- e12, ...]
1103                                          [(q1,q2) | q1 &lt;- e21, q2 &lt;- e22, ...]
1104                                          ...
1105    ]
1106 </programlisting>
1107
1108     <para>where `zipN' is the appropriate zip for the given number of
1109     branches.</para>
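<para>
As a sketch of this translation for two branches, the function below (a hypothetical name) computes
by hand what <literal>[ (x, y, z) | x &lt;- xs, y &lt;- ys | z &lt;- zs ]</literal> means:
the first branch is an ordinary nested traversal, and its result is zipped with the second branch.
</para>
<programlisting>
-- [ (x, y, z) | x &lt;- xs, y &lt;- ys | z &lt;- zs ], written without comprehension syntax.
parallelByHand :: [Int] -> [Int] -> String -> [(Int, Int, Char)]
parallelByHand xs ys zs =
  map (\((x, y), z) -> (x, y, z))
      (zip (concatMap (\x -> map (\y -> (x, y)) ys) xs)  -- first branch
           zs)                                           -- second branch
</programlisting>
<para>
For example, <literal>parallelByHand [1, 2] [10, 20] "abc"</literal> gives
<literal>[(1,10,'a'),(1,20,'b'),(2,10,'c')]</literal>: the first branch produces four pairs,
but <literal>zip</literal> stops after three.
</para>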
1110
1111   </sect2>
1112
1113   <!-- ===================== TRANSFORM LIST COMPREHENSIONS ===================  -->
1114
1115   <sect2 id="generalised-list-comprehensions">
1116     <title>Generalised (SQL-Like) List Comprehensions</title>
1117     <indexterm><primary>list comprehensions</primary><secondary>generalised</secondary>
1118     </indexterm>
1119     <indexterm><primary>extended list comprehensions</primary>
1120     </indexterm>
1121     <indexterm><primary>group</primary></indexterm>
1122     <indexterm><primary>sql</primary></indexterm>
1123
1124
1125     <para>Generalised list comprehensions are a further enhancement to the
1126     list comprehension syntactic sugar to allow operations such as sorting
1127     and grouping which are familiar from SQL.   They are fully described in the
1128         paper <ulink url="http://research.microsoft.com/~simonpj/papers/list-comp">
1129           Comprehensive comprehensions: comprehensions with "order by" and "group by"</ulink>,
1130     except that the syntax we use differs slightly from the paper.</para>
1131 <para>The extension is enabled with the flag <option>-XTransformListComp</option>.</para>
1132 <para>Here is an example:
1133 <programlisting>
1134 employees = [ ("Simon", "MS", 80)
1135 , ("Erik", "MS", 100)
1136 , ("Phil", "Ed", 40)
1137 , ("Gordon", "Ed", 45)
1138 , ("Paul", "Yale", 60)]
1139
1140 output = [ (the dept, sum salary)
1141 | (name, dept, salary) &lt;- employees
1142 , then group by dept
1143 , then sortWith by (sum salary)
1144 , then take 5 ]
1145 </programlisting>
1146 In this example, the list <literal>output</literal> would take on
1147     the value:
1148
1149 <programlisting>
1150 [("Yale", 60), ("Ed", 85), ("MS", 180)]
1151 </programlisting>
1152 </para>
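<para>
One way to see what these qualifiers compute is to write the same query with ordinary functions.
The sketch below is only an illustration, not the compiler's actual translation: it uses
<literal>groupWith</literal>, <literal>sortWith</literal> and <literal>the</literal> from
<literal>GHC.Exts</literal>, while the helpers <literal>dept</literal>, <literal>salary</literal> and
<literal>summarise</literal> are made up for the example.
</para>
<programlisting>
import GHC.Exts (groupWith, sortWith, the)

employees :: [(String, String, Integer)]
employees = [ ("Simon", "MS", 80)
            , ("Erik", "MS", 100)
            , ("Phil", "Ed", 40)
            , ("Gordon", "Ed", 45)
            , ("Paul", "Yale", 60) ]

dept :: (String, String, Integer) -> String
dept (_, d, _) = d

salary :: (String, String, Integer) -> Integer
salary (_, _, s) = s

-- After "then group by dept", dept stands for a list of equal values
-- (hence "the") and salary for a list of numbers (hence "sum").
summarise :: [(String, String, Integer)] -> (String, Integer)
summarise grp = (the (map dept grp), sum (map salary grp))

-- group by dept, then sortWith by (sum salary), then take 5
outputByHand :: [(String, Integer)]
outputByHand = take 5 (sortWith snd (map summarise (groupWith dept employees)))
</programlisting>
<para>
This yields the same list as the comprehension, <literal>[("Yale",60),("Ed",85),("MS",180)]</literal>.
</para>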
1153 <para>There are three new keywords: <literal>group</literal>, <literal>by</literal>, and <literal>using</literal>.
1154 (The function <literal>sortWith</literal> is not a keyword; it is an ordinary
1155 function that is exported by <literal>GHC.Exts</literal>.)</para>
1156
1157 <para>There are five new forms of comprehension qualifier,
1158 all introduced by the (existing) keyword <literal>then</literal>:
1159     <itemizedlist>
1160     <listitem>
1161
1162 <programlisting>
1163 then f
1164 </programlisting>
1165
    <para>This statement requires that <literal>f</literal> have the type <literal>
    forall a. [a] -> [a]</literal>. You can see an example of its use in the
    motivating example, as this form is used to apply <literal>take 5</literal>.</para>
1169
1170     </listitem>
1171
1172
1173     <listitem>
1174 <para>
1175 <programlisting>
1176 then f by e
1177 </programlisting>
1178
1179     This form is similar to the previous one, but allows you to create a function
1180     which will be passed as the first argument to f. As a consequence f must have
1181     the type <literal>forall a. (a -> t) -> [a] -> [a]</literal>. As you can see
1182     from the type, this function lets f &quot;project out&quot; some information
1183     from the elements of the list it is transforming.</para>
1184
1185     <para>An example is shown in the opening example, where <literal>sortWith</literal>
1186     is supplied with a function that lets it find out the <literal>sum salary</literal>
1187     for any item in the list comprehension it transforms.</para>
1188
1189     </listitem>
1190
1191
1192     <listitem>
1193
1194 <programlisting>
1195 then group by e using f
1196 </programlisting>
1197
1198     <para>This is the most general of the grouping-type statements. In this form,
1199     f is required to have type <literal>forall a. (a -> t) -> [a] -> [[a]]</literal>.
1200     As with the <literal>then f by e</literal> case above, the first argument
1201     is a function supplied to f by the compiler which lets it compute e on every
1202     element of the list being transformed. However, unlike the non-grouping case,
1203     f additionally partitions the list into a number of sublists: this means that
1204     at every point after this statement, binders occurring before it in the comprehension
1205     refer to <emphasis>lists</emphasis> of possible values, not single values. To help understand
1206     this, let's look at an example:</para>
1207
1208 <programlisting>
import Data.List (groupBy)
import GHC.Exts (the)

-- This works similarly to groupWith in GHC.Exts, but doesn't sort its input first
groupRuns :: Eq b => (a -> b) -> [a] -> [[a]]
groupRuns f = groupBy (\x y -> f x == f y)
1212
1213 output = [ (the x, y)
1214 | x &lt;- ([1..3] ++ [1..2])
1215 , y &lt;- [4..6]
1216 , then group by x using groupRuns ]
1217 </programlisting>
1218
1219     <para>This results in the variable <literal>output</literal> taking on the value below:</para>
1220
1221 <programlisting>
1222 [(1, [4, 5, 6]), (2, [4, 5, 6]), (3, [4, 5, 6]), (1, [4, 5, 6]), (2, [4, 5, 6])]
1223 </programlisting>
1224
1225     <para>Note that we have used the <literal>the</literal> function to change the type
1226     of x from a list to its original numeric type. The variable y, in contrast, is left
1227     unchanged from the list form introduced by the grouping.</para>
1228
1229     </listitem>
1230
1231     <listitem>
1232
1233 <programlisting>
1234 then group by e
1235 </programlisting>
1236
1237     <para>This form of grouping is essentially the same as the one described above. However,
    since no function to use for the grouping has been supplied, it will fall back on the
1239     <literal>groupWith</literal> function defined in
1240     <ulink url="&libraryBaseLocation;/GHC-Exts.html"><literal>GHC.Exts</literal></ulink>. This
1241     is the form of the group statement that we made use of in the opening example.</para>
1242
1243     </listitem>
1244
1245
1246     <listitem>
1247
1248 <programlisting>
1249 then group using f
1250 </programlisting>
1251
1252     <para>With this form of the group statement, f is required to simply have the type
1253     <literal>forall a. [a] -> [[a]]</literal>, which will be used to group up the
1254     comprehension so far directly. An example of this form is as follows:</para>
1255
1256 <programlisting>
import Data.List (inits)

output = [ x
1258 | y &lt;- [1..5]
1259 , x &lt;- "hello"
1260 , then group using inits]
1261 </programlisting>
1262
    <para>This will yield a list containing every prefix of the word "hello" written out 5 times, that is, every prefix of <literal>"hellohellohellohellohello"</literal>:</para>
1264
1265 <programlisting>
1266 ["","h","he","hel","hell","hello","helloh","hellohe","hellohel","hellohell","hellohello","hellohelloh",...]
1267 </programlisting>
1268
1269     </listitem>
1270 </itemizedlist>
1271 </para>
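<para>
To see where the result of the <literal>then group using inits</literal> example comes from, note
that just before the grouping qualifier the comprehension has produced one <literal>x</literal> per
(<literal>y</literal>, <literal>x</literal>) pair, that is, the characters of "hello" five times
over; <literal>inits</literal> is then applied to that whole list.  The sketch below (with the
hypothetical name <literal>helloPrefixes</literal>) computes the same list directly.
</para>
<programlisting>
import Data.List (inits)

-- The characters that x ranges over before grouping, in order:
-- "hellohellohellohellohello".
helloPrefixes :: [String]
helloPrefixes = inits (concatMap (const "hello") [1 .. 5 :: Int])
</programlisting>
<para>
The list has 26 elements, from the empty string up to the full 25-character string.
</para>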
1272   </sect2>
1273
1274    <!-- ===================== REBINDABLE SYNTAX ===================  -->
1275
1276 <sect2 id="rebindable-syntax">
1277 <title>Rebindable syntax and the implicit Prelude import</title>
1278
1279  <para><indexterm><primary>-XNoImplicitPrelude
1280  option</primary></indexterm> GHC normally imports
1281  <filename>Prelude.hi</filename> files for you.  If you'd
1282  rather it didn't, then give it a
1283  <option>-XNoImplicitPrelude</option> option.  The idea is
1284  that you can then import a Prelude of your own.  (But don't
1285  call it <literal>Prelude</literal>; the Haskell module
1286  namespace is flat, and you must not conflict with any
1287  Prelude module.)</para>
1288
1289             <para>Suppose you are importing a Prelude of your own
1290               in order to define your own numeric class
1291             hierarchy.  It completely defeats that purpose if the
1292             literal "1" means "<literal>Prelude.fromInteger
1293             1</literal>", which is what the Haskell Report specifies.
1294             So the <option>-XNoImplicitPrelude</option>
1295               flag <emphasis>also</emphasis> causes
1296             the following pieces of built-in syntax to refer to
1297             <emphasis>whatever is in scope</emphasis>, not the Prelude
1298             versions:
1299             <itemizedlist>
1300               <listitem>
1301                 <para>An integer literal <literal>368</literal> means
1302                 "<literal>fromInteger (368::Integer)</literal>", rather than
1303                 "<literal>Prelude.fromInteger (368::Integer)</literal>".
1304 </para> </listitem>
1305
      <listitem><para>Fractional literals are handled in just the same way,
1307           except that the translation is
1308               <literal>fromRational (3.68::Rational)</literal>.
1309 </para> </listitem>
1310
1311           <listitem><para>The equality test in an overloaded numeric pattern
1312               uses whatever <literal>(==)</literal> is in scope.
1313 </para> </listitem>
1314
1315           <listitem><para>The subtraction operation, and the
1316           greater-than-or-equal test, in <literal>n+k</literal> patterns
1317               use whatever <literal>(-)</literal> and <literal>(>=)</literal> are in scope.
1318               </para></listitem>
1319
1320               <listitem>
                <para>Negation (e.g. "<literal>- (f x)</literal>")
                means "<literal>negate (f x)</literal>", in both numeric
                patterns and expressions.
1324               </para></listitem>
1325
1326               <listitem>
          <para>"Do" notation is translated using whatever
              functions <literal>(>>=)</literal>,
              <literal>(>>)</literal>, and <literal>fail</literal>
              are in scope (not the Prelude
              versions).  List comprehensions, mdo (<xref linkend="mdo-notation"/>), and parallel array
              comprehensions are unaffected.  </para></listitem>
1333
1334               <listitem>
1335                 <para>Arrow
1336                 notation (see <xref linkend="arrow-notation"/>)
1337                 uses whatever <literal>arr</literal>,
1338                 <literal>(>>>)</literal>, <literal>first</literal>,
1339                 <literal>app</literal>, <literal>(|||)</literal> and
1340                 <literal>loop</literal> functions are in scope. But unlike the
1341                 other constructs, the types of these functions must match the
1342                 Prelude types very closely.  Details are in flux; if you want
1343                 to use this, ask!
1344               </para></listitem>
1345             </itemizedlist>
1346 In all cases (apart from arrow notation), the static semantics should be that of the desugared form,
1347 even if that is a little unexpected. For example, the
1348 static semantics of the literal <literal>368</literal>
1349 is exactly that of <literal>fromInteger (368::Integer)</literal>; it's fine for
1350 <literal>fromInteger</literal> to have any of the types:
1351 <programlisting>
1352 fromInteger :: Integer -> Integer
1353 fromInteger :: forall a. Foo a => Integer -> a
1354 fromInteger :: Num a => a -> Integer
1355 fromInteger :: Integer -> Bool -> Bool
1356 </programlisting>
1357 </para>
1358
1359              <para>Be warned: this is an experimental facility, with
1360              fewer checks than usual.  Use <literal>-dcore-lint</literal>
1361              to typecheck the desugared program.  If Core Lint is happy
1362              you should be all right.</para>
1363
1364 </sect2>
1365
1366 <sect2 id="postfix-operators">
1367 <title>Postfix operators</title>
1368
1369 <para>
1370   The <option>-XPostfixOperators</option> flag enables a small
1371 extension to the syntax of left operator sections, which allows you to
1372 define postfix operators.  The extension is this: the left section
1373 <programlisting>
1374   (e !)
1375 </programlisting>
1376 is equivalent (from the point of view of both type checking and execution) to the expression
1377 <programlisting>
1378   ((!) e)
1379 </programlisting>
(for any expression <literal>e</literal> and operator <literal>(!)</literal>).
1381 The strict Haskell 98 interpretation is that the section is equivalent to
1382 <programlisting>
1383   (\y -> (!) e y)
1384 </programlisting>
1385 That is, the operator must be a function of two arguments.  GHC allows it to
1386 take only one argument, and that in turn allows you to write the function
1387 postfix.
1388 </para>
1389 <para>The extension does not extend to the left-hand side of function
1390 definitions; you must define such a function in prefix form.</para>
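<para>
A minimal sketch: the factorial operator below is a made-up example, defined in prefix form as the
previous paragraph requires, and then used postfix through a left section.
</para>
<programlisting>
{-# LANGUAGE PostfixOperators #-}

-- A one-argument operator, defined in prefix form.
(!) :: Integer -> Integer
(!) n = product [1 .. n]

-- With -XPostfixOperators, (5 !) means ((!) 5), which is 120.
-- Under the Haskell 98 reading, (\y -> (!) 5 y), it would not typecheck.
fiveFactorial :: Integer
fiveFactorial = (5 !)
</programlisting>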
1391
1392 </sect2>
1393
1394 <sect2 id="tuple-sections">
1395 <title>Tuple sections</title>
1396
1397 <para>
1398   The <option>-XTupleSections</option> flag enables Python-style partially applied
1399   tuple constructors. For example, the following program
1400 <programlisting>
1401   (, True)
1402 </programlisting>
  is an alternative notation for the more unwieldy
1404 <programlisting>
1405   \x -> (x, True)
1406 </programlisting>
1407 You can omit any combination of arguments to the tuple, as in the following
1408 <programlisting>
1409   (, "I", , , "Love", , 1337)
1410 </programlisting>
1411 which translates to
1412 <programlisting>
1413   \a b c d -> (a, "I", b, c, "Love", d, 1337)
1414 </programlisting>
1415 </para>
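<para>
A short sketch of both sections above in use; the names <literal>pairWithTrue</literal> and
<literal>fillHoles</literal> are made up for illustration.
</para>
<programlisting>
{-# LANGUAGE TupleSections #-}

-- (, True) is \x -> (x, True)
pairWithTrue :: [Char] -> [(Char, Bool)]
pairWithTrue = map (, True)

-- Four holes, so the section takes four arguments, filled left to right.
fillHoles :: Int -> Int -> Int -> Int -> (Int, String, Int, Int, String, Int, Int)
fillHoles = (, "I", , , "Love", , 1337)
</programlisting>
<para>
So <literal>pairWithTrue "ab"</literal> is <literal>[('a',True),('b',True)]</literal>, and
<literal>fillHoles 1 2 3 4</literal> is <literal>(1,"I",2,3,"Love",4,1337)</literal>.
</para>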
1416
1417 <para>
1418   If you have <link linkend="unboxed-tuples">unboxed tuples</link> enabled, tuple sections
1419   will also be available for them, like so
1420 <programlisting>
1421   (# , True #)
1422 </programlisting>
1423 Because there is no unboxed unit tuple, the following expression
1424 <programlisting>
1425   (# #)
1426 </programlisting>
1427 continues to stand for the unboxed singleton tuple data constructor.
1428 </para>
1429
1430 </sect2>
1431
1432 <sect2 id="disambiguate-fields">
1433 <title>Record field disambiguation</title>
1434 <para>
1435 In record construction and record pattern matching
1436 it is entirely unambiguous which field is referred to, even if there are two different
1437 data types in scope with a common field name.  For example:
1438 <programlisting>
1439 module M where
1440   data S = MkS { x :: Int, y :: Bool }
1441
1442 module Foo where
1443   import M
1444
1445   data T = MkT { x :: Int }
1446
1447   ok1 (MkS { x = n }) = n+1   -- Unambiguous
1448   ok2 n = MkT { x = n+1 }     -- Unambiguous
1449
1450   bad1 k = k { x = 3 }  -- Ambiguous
1451   bad2 k = x k          -- Ambiguous
1452 </programlisting>
1453 Even though there are two <literal>x</literal>'s in scope,
1454 it is clear that the <literal>x</literal> in the pattern in the
1455 definition of <literal>ok1</literal> can only mean the field
1456 <literal>x</literal> from type <literal>S</literal>. Similarly for
1457 the function <literal>ok2</literal>.  However, in the record update
1458 in <literal>bad1</literal> and the record selection in <literal>bad2</literal>
1459 it is not clear which of the two types is intended.
1460 </para>
1461 <para>
1462 Haskell 98 regards all four as ambiguous, but with the
1463 <option>-XDisambiguateRecordFields</option> flag, GHC will accept
1464 the former two.  The rules are precisely the same as those for instance
1465 declarations in Haskell 98, where the method names on the left-hand side
1466 of the method bindings in an instance declaration refer unambiguously
1467 to the method of that class (provided they are in scope at all), even
1468 if there are other variables in scope with the same name.
1469 This reduces the clutter of qualified names when you import two
1470 records from different modules that use the same field name.
1471 </para>
1472 <para>
1473 Some details:
1474 <itemizedlist>
1475 <listitem><para>
Field disambiguation can be combined with punning (see <xref linkend="record-puns"/>). For example:
1477 <programlisting>
1478 module Foo where
1479   import M
1480   x=True
1481   ok3 (MkS { x }) = x+1   -- Uses both disambiguation and punning
1482 </programlisting>
1483 </para></listitem>
1484
1485 <listitem><para>
With <option>-XDisambiguateRecordFields</option> you can use <emphasis>unqualified</emphasis>
field names even if the corresponding selector is only in scope <emphasis>qualified</emphasis>.
1488 For example, assuming the same module <literal>M</literal> as in our earlier example, this is legal:
1489 <programlisting>
1490 module Foo where
1491   import qualified M    -- Note qualified
1492
1493   ok4 (M.MkS { x = n }) = n+1   -- Unambiguous
1494 </programlisting>
Since the constructor <literal>MkS</literal> is only in scope qualified, you must
1496 name it <literal>M.MkS</literal>, but the field <literal>x</literal> does not need
1497 to be qualified even though <literal>M.x</literal> is in scope but <literal>x</literal>
1498 is not.  (In effect, it is qualified by the constructor.)
1499 </para></listitem>
1500 </itemizedlist>
1501 </para>
1502
1503 </sect2>
1504
1505     <!-- ===================== Record puns ===================  -->
1506
1507 <sect2 id="record-puns">
1508 <title>Record puns
1509 </title>
1510
1511 <para>
1512 Record puns are enabled by the flag <literal>-XNamedFieldPuns</literal>.
1513 </para>
1514
1515 <para>
1516 When using records, it is common to write a pattern that binds a
1517 variable with the same name as a record field, such as:
1518
1519 <programlisting>
1520 data C = C {a :: Int}
1521 f (C {a = a}) = a
1522 </programlisting>
1523 </para>
1524
1525 <para>
1526 Record punning permits the variable name to be elided, so one can simply
1527 write
1528
1529 <programlisting>
1530 f (C {a}) = a
1531 </programlisting>
1532
1533 to mean the same pattern as above.  That is, in a record pattern, the
1534 pattern <literal>a</literal> expands into the pattern <literal>a =
1535 a</literal> for the same name <literal>a</literal>.
1536 </para>
1537
1538 <para>
1539 Note that:
1540 <itemizedlist>
1541 <listitem><para>
1542 Record punning can also be used in an expression, writing, for example,
1543 <programlisting>
1544 let a = 1 in C {a}
1545 </programlisting>
1546 instead of
1547 <programlisting>
1548 let a = 1 in C {a = a}
1549 </programlisting>
1550 The expansion is purely syntactic, so the expanded right-hand side
1551 expression refers to the nearest enclosing variable that is spelled the
1552 same as the field name.
1553 </para></listitem>
1554
1555 <listitem><para>
1556 Puns and other patterns can be mixed in the same record:
1557 <programlisting>
1558 data C = C {a :: Int, b :: Int}
1559 f (C {a, b = 4}) = a
1560 </programlisting>
1561 </para></listitem>
1562
1563 <listitem><para>
1564 Puns can be used wherever record patterns occur (e.g. in
1565 <literal>let</literal> bindings or at the top-level).
1566 </para></listitem>
1567
1568 <listitem><para>
1569 A pun on a qualified field name is expanded by stripping off the module qualifier.
1570 For example:
1571 <programlisting>
f (M.C {M.a}) = a
1573 </programlisting>
1574 means
1575 <programlisting>
1576 f (M.C {M.a = a}) = a
1577 </programlisting>
1578 (This is useful if the field selector <literal>a</literal> for constructor <literal>M.C</literal>
1579 is only in scope in qualified form.)
1580 </para></listitem>
1581 </itemizedlist>
1582 </para>
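<para>
The points above can be seen together in the following minimal sketch, which assumes a
GHC that accepts the <literal>NamedFieldPuns</literal> LANGUAGE pragma (the pragma form of
<option>-XNamedFieldPuns</option>).  The type <literal>Point</literal> and the functions
<literal>norm1</literal>, <literal>mkPoint</literal> and <literal>onXAxis</literal> are
hypothetical, chosen only for illustration.
</para>
<programlisting>
{-# LANGUAGE NamedFieldPuns #-}

-- Hypothetical record type, used only for illustration.
data Point = Point { px :: Int, py :: Int } deriving (Show, Eq)

-- Pattern puns: {px, py} means {px = px, py = py}, binding two local variables.
norm1 :: Point -> Int
norm1 (Point {px, py}) = abs px + abs py

-- Expression pun: each field is filled from the nearest variable spelled
-- the same as the field, here the function's own arguments.
mkPoint :: Int -> Int -> Point
mkPoint px py = Point {px, py}

-- A pun mixed with an ordinary field pattern.
onXAxis :: Point -> Maybe Int
onXAxis (Point {px, py = 0}) = Just px
onXAxis _                    = Nothing
</programlisting>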
1583
1584
1585 </sect2>
1586
1587     <!-- ===================== Record wildcards ===================  -->
1588
1589 <sect2 id="record-wildcards">
1590 <title>Record wildcards
1591 </title>
1592
1593 <para>
1594 Record wildcards are enabled by the flag <literal>-XRecordWildCards</literal>.
1595 This flag implies <literal>-XDisambiguateRecordFields</literal>.
1596 </para>
1597
1598 <para>
1599 For records with many fields, it can be tiresome to write out each field
1600 individually in a record pattern, as in
1601 <programlisting>
1602 data C = C {a :: Int, b :: Int, c :: Int, d :: Int}
1603 f (C {a = 1, b = b, c = c, d = d}) = b + c + d
1604 </programlisting>
1605 </para>
1606
1607 <para>
1608 Record wildcard syntax permits a "<literal>..</literal>" in a record
1609 pattern, where each elided field <literal>f</literal> is replaced by the
1610 pattern <literal>f = f</literal>.  For example, the above pattern can be
1611 written as
1612 <programlisting>
1613 f (C {a = 1, ..}) = b + c + d
1614 </programlisting>
1615 </para>
1616
1617 <para>
1618 More details:
1619 <itemizedlist>
1620 <listitem><para>
1621 Wildcards can be mixed with other patterns, including puns
(<xref linkend="record-puns"/>); for example, in a pattern <literal>C {a
= 1, b, ..}</literal>.  Additionally, record wildcards can be used
1624 wherever record patterns occur, including in <literal>let</literal>
1625 bindings and at the top-level.  For example, the top-level binding
1626 <programlisting>
1627 C {a = 1, ..} = e
1628 </programlisting>
1629 defines <literal>b</literal>, <literal>c</literal>, and
1630 <literal>d</literal>.
1631 </para></listitem>
1632
1633 <listitem><para>
1634 Record wildcards can also be used in expressions, writing, for example,
1635 <programlisting>
1636 let {a = 1; b = 2; c = 3; d = 4} in C {..}
1637 </programlisting>
1638 in place of
1639 <programlisting>
1640 let {a = 1; b = 2; c = 3; d = 4} in C {a=a, b=b, c=c, d=d}
1641 </programlisting>
1642 The expansion is purely syntactic, so the record wildcard
1643 expression refers to the nearest enclosing variables that are spelled
1644 the same as the omitted field names.
1645 </para></listitem>
1646
1647 <listitem><para>
1648 The "<literal>..</literal>" expands to the missing
1649 <emphasis>in-scope</emphasis> record fields, where "in scope"
1650 includes both unqualified and qualified-only.
1651 Any fields that are not in scope are not filled in.  For example
1652 <programlisting>
1653 module M where
1654   data R = R { a,b,c :: Int }
1655 module X where
  import qualified M( R(R,a,b) )
  f a b = M.R { .. }
1658 </programlisting>
1659 The <literal>{..}</literal> expands to <literal>{M.a=a,M.b=b}</literal>,
1660 omitting <literal>c</literal> since it is not in scope at all.
1661 </para></listitem>
1662 </itemizedlist>
1663 </para>
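<para>
A minimal sketch of record wildcards in both patterns and expressions, assuming the
<literal>RecordWildCards</literal> LANGUAGE pragma.  The record <literal>C</literal> mirrors
the one used above; <literal>sumRest</literal> and <literal>mkC</literal> are hypothetical
helpers.
</para>
<programlisting>
{-# LANGUAGE RecordWildCards #-}

-- The four-field record from the text.
data C = C { a :: Int, b :: Int, c :: Int, d :: Int } deriving (Show, Eq)

-- Pattern: field a must equal 1, and ".." binds b, c and d as local variables.
sumRest :: C -> Maybe Int
sumRest (C {a = 1, ..}) = Just (b + c + d)
sumRest _               = Nothing

-- Expression: ".." fills every field from the variable of the same name
-- bound by the enclosing let.
mkC :: Int -> C
mkC n = let { a = n; b = n + 1; c = n + 2; d = n + 3 } in C {..}
</programlisting>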
1664
1665 </sect2>
1666
1667     <!-- ===================== Local fixity declarations ===================  -->
1668
1669 <sect2 id="local-fixity-declarations">
1670 <title>Local Fixity Declarations
1671 </title>
1672
1673 <para>A careful reading of the Haskell 98 Report reveals that fixity
1674 declarations (<literal>infix</literal>, <literal>infixl</literal>, and
1675 <literal>infixr</literal>) are permitted to appear inside local bindings
such as those introduced by <literal>let</literal> and
1677 <literal>where</literal>.  However, the Haskell Report does not specify
1678 the semantics of such bindings very precisely.
1679 </para>
1680
1681 <para>In GHC, a fixity declaration may accompany a local binding:
1682 <programlisting>
1683 let f = ...
1684     infixr 3 `f`
1685 in
1686     ...
1687 </programlisting>
1688 and the fixity declaration applies wherever the binding is in scope.
1689 For example, in a <literal>let</literal>, it applies in the right-hand
1690 sides of other <literal>let</literal>-bindings and the body of the
<literal>let</literal>. Similarly, in recursive <literal>do</literal>
1692 expressions (<xref linkend="recursive-do-notation"/>), the local fixity
1693 declarations of a <literal>let</literal> statement scope over other
1694 statements in the group, just as the bound name does.
1695 </para>
1696
1697 <para>
Moreover, a local fixity declaration <emphasis>must</emphasis> accompany a local binding of
that name: it is not possible to revise the fixity of a name bound
1700 elsewhere, as in
1701 <programlisting>
1702 let infixr 9 $ in ...
1703 </programlisting>
1704
1705 Because local fixity declarations are technically Haskell 98, no flag is
1706 necessary to enable them.
1707 </para>
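<para>
To illustrate, here is a small sketch in which the same locally bound operator is given
different local fixities in two <literal>where</literal> clauses.  The operator
<literal>|+|</literal> (which appends a decimal digit) is hypothetical; as noted above, no
flag is needed.
</para>
<programlisting>
-- Each where clause binds its own hypothetical operator |+| together with
-- a local fixity declaration for it.

-- Declared infixl: 1 |+| 2 |+| 3 parses as (1 |+| 2) |+| 3.
leftAssoc :: Int
leftAssoc = 1 |+| 2 |+| 3
  where
    infixl 6 |+|
    x |+| y = x * 10 + y

-- Declared infixr: the same text parses as 1 |+| (2 |+| 3).
rightAssoc :: Int
rightAssoc = 1 |+| 2 |+| 3
  where
    infixr 6 |+|
    x |+| y = x * 10 + y
</programlisting>
<para>
Here <literal>leftAssoc</literal> evaluates to 123 and <literal>rightAssoc</literal> to 33,
because each fixity declaration applies wherever its accompanying binding is in scope.
</para>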
1708 </sect2>
1709
1710 <sect2 id="package-imports">
1711   <title>Package-qualified imports</title>
1712
1713   <para>With the <option>-XPackageImports</option> flag, GHC allows
1714   import declarations to be qualified by the package name that the
1715     module is intended to be imported from.  For example:</para>
1716
1717 <programlisting>
1718 import "network" Network.Socket
1719 </programlisting>
1720
1721   <para>would import the module <literal>Network.Socket</literal> from
1722     the package <literal>network</literal> (any version).  This may
1723     be used to disambiguate an import when the same module is
1724     available from multiple packages, or is present in both the
1725     current package being built and an external package.</para>
1726
  <para>Note: you probably don't need to use this feature; it was
1728     added mainly so that we can build backwards-compatible versions of
1729     packages when APIs change.  It can lead to fragile dependencies in
1730     the common case: modules occasionally move from one package to
1731     another, rendering any package-qualified imports broken.</para>
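<para>
As a sketch, the module below imports <literal>Data.Char</literal> explicitly from the
<literal>base</literal> package, assuming the <literal>PackageImports</literal> LANGUAGE
pragma.  The package name is a string literal placed before the module name, and an
ordinary import list may follow.  The function <literal>shout</literal> is hypothetical.
</para>
<programlisting>
{-# LANGUAGE PackageImports #-}

-- Import Data.Char specifically from the "base" package.
import "base" Data.Char (toUpper)

-- Hypothetical helper using the imported function.
shout :: String -> String
shout = map toUpper
</programlisting>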
1732 </sect2>
1733
1734 <sect2 id="syntax-stolen">
1735 <title>Summary of stolen syntax</title>
1736
1737     <para>Turning on an option that enables special syntax
1738     <emphasis>might</emphasis> cause working Haskell 98 code to fail
1739     to compile, perhaps because it uses a variable name which has
1740     become a reserved word.  This section lists the syntax that is
1741     "stolen" by language extensions.
1742      We use
1743     notation and nonterminal names from the Haskell 98 lexical syntax
1744     (see the Haskell 98 Report).
1745     We only list syntax changes here that might affect
1746     existing working programs (i.e. "stolen" syntax).  Many of these
1747     extensions will also enable new context-free syntax, but in all
1748     cases programs written to use the new syntax would not be
1749     compilable without the option enabled.</para>
1750
1751 <para>There are two classes of special
1752     syntax:
1753
1754     <itemizedlist>
1755       <listitem>
1756         <para>New reserved words and symbols: character sequences
1757         which are no longer available for use as identifiers in the
1758         program.</para>
1759       </listitem>
1760       <listitem>
1761         <para>Other special syntax: sequences of characters that have
1762         a different meaning when this particular option is turned
1763         on.</para>
1764       </listitem>
1765     </itemizedlist>
1766
1767 The following syntax is stolen:
1768
1769     <variablelist>
1770       <varlistentry>
1771         <term>
1772           <literal>forall</literal>
1773           <indexterm><primary><literal>forall</literal></primary></indexterm>
1774         </term>
1775         <listitem><para>
1776         Stolen (in types) by: <option>-XExplicitForAll</option>, and hence by
1777             <option>-XScopedTypeVariables</option>,
1778             <option>-XLiberalTypeSynonyms</option>,
1779             <option>-XRank2Types</option>,
1780             <option>-XRankNTypes</option>,
1781             <option>-XPolymorphicComponents</option>,
1782             <option>-XExistentialQuantification</option>
1783           </para></listitem>
1784       </varlistentry>
1785
1786       <varlistentry>
1787         <term>
1788           <literal>mdo</literal>
1789           <indexterm><primary><literal>mdo</literal></primary></indexterm>
1790         </term>
1791         <listitem><para>
        Stolen by: <option>-XRecursiveDo</option>
1793           </para></listitem>
1794       </varlistentry>
1795
1796       <varlistentry>
1797         <term>
1798           <literal>foreign</literal>
1799           <indexterm><primary><literal>foreign</literal></primary></indexterm>
1800         </term>
1801         <listitem><para>
        Stolen by: <option>-XForeignFunctionInterface</option>
1803           </para></listitem>
1804       </varlistentry>
1805
1806       <varlistentry>
1807         <term>
1808           <literal>rec</literal>,
1809           <literal>proc</literal>, <literal>-&lt;</literal>,
1810           <literal>&gt;-</literal>, <literal>-&lt;&lt;</literal>,
1811           <literal>&gt;&gt;-</literal>, and <literal>(|</literal>,
1812           <literal>|)</literal> brackets
1813           <indexterm><primary><literal>proc</literal></primary></indexterm>
1814         </term>
1815         <listitem><para>
        Stolen by: <option>-XArrows</option>
1817           </para></listitem>
1818       </varlistentry>
1819
1820       <varlistentry>
1821         <term>
1822           <literal>?<replaceable>varid</replaceable></literal>,
1823           <literal>%<replaceable>varid</replaceable></literal>
1824           <indexterm><primary>implicit parameters</primary></indexterm>
1825         </term>
1826         <listitem><para>
        Stolen by: <option>-XImplicitParams</option>
1828           </para></listitem>
1829       </varlistentry>
1830
1831       <varlistentry>
1832         <term>
1833           <literal>[|</literal>,
1834           <literal>[e|</literal>, <literal>[p|</literal>,
1835           <literal>[d|</literal>, <literal>[t|</literal>,
1836           <literal>$(</literal>,
1837           <literal>$<replaceable>varid</replaceable></literal>
1838           <indexterm><primary>Template Haskell</primary></indexterm>
1839         </term>
1840         <listitem><para>
        Stolen by: <option>-XTemplateHaskell</option>
1842           </para></listitem>
1843       </varlistentry>
1844
1845       <varlistentry>
1846         <term>
1847           <literal>[:<replaceable>varid</replaceable>|</literal>
1848           <indexterm><primary>quasi-quotation</primary></indexterm>
1849         </term>
1850         <listitem><para>
        Stolen by: <option>-XQuasiQuotes</option>
1852           </para></listitem>
1853       </varlistentry>
1854
1855       <varlistentry>
1856         <term>
1857               <replaceable>varid</replaceable>{<literal>&num;</literal>},
1858               <replaceable>char</replaceable><literal>&num;</literal>,
1859               <replaceable>string</replaceable><literal>&num;</literal>,
1860               <replaceable>integer</replaceable><literal>&num;</literal>,
1861               <replaceable>float</replaceable><literal>&num;</literal>,
1862               <replaceable>float</replaceable><literal>&num;&num;</literal>,
1863               <literal>(&num;</literal>, <literal>&num;)</literal>,
1864         </term>
1865         <listitem><para>
        Stolen by: <option>-XMagicHash</option>
1867           </para></listitem>
1868       </varlistentry>
1869     </variablelist>
1870 </para>
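<para>
For example, the following sketch is legal Haskell 98, because <literal>proc</literal>,
<literal>rec</literal> and <literal>mdo</literal> are ordinary identifiers there.  It assumes
a GHC recent enough to accept the <literal>Haskell98</literal> LANGUAGE pragma.  According to
the list above, turning on <option>-XArrows</option> would steal <literal>proc</literal> and
<literal>rec</literal>, and <option>-XRecursiveDo</option> would steal <literal>mdo</literal>,
so the same module would then be rejected.
</para>
<programlisting>
{-# LANGUAGE Haskell98 #-}

-- In plain Haskell 98 these are ordinary variable names.
-- With -XArrows, proc and rec become reserved; with -XRecursiveDo, mdo does.
proc :: Int -> Int
proc x = x + 1

rec :: [Int]
rec = [1, 2, 3]

mdo :: Int
mdo = proc (sum rec)
</programlisting>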
1871 </sect2>
1872 </sect1>
1873
1874
1875 <!-- TYPE SYSTEM EXTENSIONS -->
1876 <sect1 id="data-type-extensions">
1877 <title>Extensions to data types and type synonyms</title>
1878
1879 <sect2 id="nullary-types">
1880 <title>Data types with no constructors</title>
1881
1882 <para>With the <option>-fglasgow-exts</option> flag, GHC lets you declare
1883 a data type with no constructors.  For example:</para>
1884
1885 <programlisting>
1886   data S      -- S :: *
1887   data T a    -- T :: * -> *
1888 </programlisting>
1889
1890 <para>Syntactically, the declaration lacks the "= constrs" part.  The
1891 type can be parameterised over types of any kind, but if the kind is
1892 not <literal>*</literal> then an explicit kind annotation must be used
1893 (see <xref linkend="kinding"/>).</para>
1894
1895 <para>Such data types have only one value, namely bottom.
1896 Nevertheless, they can be useful when defining "phantom types".</para>
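<para>
A minimal sketch of the phantom-type idiom: the empty types below serve only as type-level
tags.  The names are hypothetical, and the <literal>EmptyDataDecls</literal> LANGUAGE pragma
is assumed to be how your GHC enables this extension (Haskell 2010 accepts empty data
declarations without it).
</para>
<programlisting>
{-# LANGUAGE EmptyDataDecls #-}

-- Hypothetical unit tags: neither type has any constructors,
-- so their only value is bottom.
data Meters
data Feet

-- The parameter u is a phantom: it appears only in the type.
newtype Distance u = Distance Double

-- Both arguments must carry the same tag u.
addDist :: Distance u -> Distance u -> Distance u
addDist (Distance x) (Distance y) = Distance (x + y)

-- Forget the tag to get at the underlying number.
value :: Distance u -> Double
value (Distance x) = x

-- Adding a Distance Meters to a Distance Feet is a type error,
-- even though both are represented by a Double.
</programlisting>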
1897 </sect2>
1898
1899 <sect2 id="infix-tycons">
1900 <title>Infix type constructors, classes, and type variables</title>
1901
1902 <para>
1903 GHC allows type constructors, classes, and type variables to be operators, and
1904 to be written infix, very much like expressions.  More specifically:
1905 <itemizedlist>
1906 <listitem><para>
1907   A type constructor or class can be an operator, beginning with a colon; e.g. <literal>:*:</literal>.
1908   The lexical syntax is the same as that for data constructors.
1909   </para></listitem>
1910 <listitem><para>
1911   Data type and type-synonym declarations can be written infix, parenthesised
1912   if you want further arguments.  E.g.
1913 <screen>
1914   data a :*: b = Foo a b
1915   type a :+: b = Either a b
1916   class a :=: b where ...
1917
1918   data (a :**: b) x = Baz a b x
1919   type (a :++: b) y = Either (a,b) y
1920 </screen>
1921   </para></listitem>
1922 <listitem><para>
1923   Types, and class constraints, can be written infix.  For example
1924   <screen>
1925         x :: Int :*: Bool
1926         f :: (a :=: b) => a -> b
1927   </screen>
1928   </para></listitem>
1929 <listitem><para>
1930   A type variable can be an (unqualified) operator e.g. <literal>+</literal>.
1931   The lexical syntax is the same as that for variable operators, excluding "(.)",
1932   "(!)", and "(*)".  In a binding position, the operator must be
1933   parenthesised.  For example:
1934 <programlisting>
1935    type T (+) = Int + Int
1936    f :: T Either
1937    f = Left 3
1938
1939    liftA2 :: Arrow (~>)
1940           => (a -> b -> c) -> (e ~> a) -> (e ~> b) -> (e ~> c)
1941    liftA2 = ...
1942 </programlisting>
1943   </para></listitem>
1944 <listitem><para>
1945   Back-quotes work
1946   as for expressions, both for type constructors and type variables;  e.g. <literal>Int `Either` Bool</literal>, or
1947   <literal>Int `a` Bool</literal>.  Similarly, parentheses work the same; e.g.  <literal>(:*:) Int Bool</literal>.
1948   </para></listitem>
1949 <listitem><para>
1950   Fixities may be declared for type constructors, or classes, just as for data constructors.  However,
1951   one cannot distinguish between the two in a fixity declaration; a fixity declaration
1952   sets the fixity for a data constructor and the corresponding type constructor.  For example:
1953 <screen>
1954   infixl 7 T, :*:
1955 </screen>
1956   sets the fixity for both type constructor <literal>T</literal> and data constructor <literal>T</literal>,
1957   and similarly for <literal>:*:</literal>.
1959   </para></listitem>
1960 <listitem><para>
  The function arrow is <literal>infixr</literal> with precedence 0.  (This might change; I'm not sure what it should be.)
1962   </para></listitem>
1963
1964 </itemizedlist>
1965 </para>
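<para>
The sketch below puts several of these points together.  It assumes a GHC that enables
infix type constructors via the <literal>TypeOperators</literal> LANGUAGE pragma; the type
<literal>:*:</literal>, the synonym <literal>IntOrBool</literal> and the functions are
hypothetical.
</para>
<programlisting>
{-# LANGUAGE TypeOperators #-}

-- One fixity declaration covers both the type constructor :*:
-- and the data constructor :*: that shares its name.
infixr 6 :*:

-- Infix data type declaration; the constructor is also written infix.
data a :*: b = a :*: b deriving (Show, Eq)

-- The type operator appears infix in signatures, just as in expressions.
swapP :: a :*: b -> b :*: a
swapP (x :*: y) = y :*: x

-- A back-quoted type constructor.
type IntOrBool = Int `Either` Bool

pick :: IntOrBool -> Int
pick (Left n)  = n
pick (Right b) = if b then 1 else 0
</programlisting>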
1966 </sect2>
1967
1968 <sect2 id="type-synonyms">
1969 <title>Liberalised type synonyms</title>
1970
1971 <para>
1972 Type synonyms are like macros at the type level, but Haskell 98 imposes many rules
1973 on individual synonym declarations.
1974 With the <option>-XLiberalTypeSynonyms</option> extension,
1975 GHC does validity checking on types <emphasis>only after expanding type synonyms</emphasis>.
1976 That means that GHC can be very much more liberal about type synonyms than Haskell 98.
1977
1978 <itemizedlist>
1979 <listitem> <para>You can write a <literal>forall</literal> (including overloading)
1980 in a type synonym, thus:
1981 <programlisting>
1982   type Discard a = forall b. Show b => a -> b -> (a, String)
1983
1984   f :: Discard a
1985   f x y = (x, show y)
1986
1987   g :: Discard Int -> (Int,String)    -- A rank-2 type
1988   g f = f 3 True
1989 </programlisting>
1990 </para>
1991 </listitem>
1992
1993 <listitem><para>
1994 If you also use <option>-XUnboxedTuples</option>,
1995 you can write an unboxed tuple in a type synonym:
1996 <programlisting>
1997   type Pr = (# Int, Int #)
1998
1999   h :: Int -> Pr
2000   h x = (# x, x #)
2001 </programlisting>
2002 </para></listitem>
2003
2004 <listitem><para>
2005 You can apply a type synonym to a forall type:
2006 <programlisting>
2007   type Foo a = a -> a -> Bool
2008
2009   f :: Foo (forall b. b->b)
2010 </programlisting>
2011 After expanding the synonym, <literal>f</literal> has the legal (in GHC) type:
2012 <programlisting>
2013   f :: (forall b. b->b) -> (forall b. b->b) -> Bool
2014 </programlisting>
2015 </para></listitem>
2016
2017 <listitem><para>
2018 You can apply a type synonym to a partially applied type synonym:
2019 <programlisting>
2020   type Generic i o = forall x. i x -> o x
2021   type Id x = x
2022
2023   foo :: Generic Id []
2024 </programlisting>
2025 After expanding the synonym, <literal>foo</literal> has the legal (in GHC) type:
2026 <programlisting>
2027   foo :: forall x. x -> [x]
2028 </programlisting>
2029 </para></listitem>
2030
2031 </itemizedlist>
2032 </para>
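<para>
The examples above can be assembled into one module, sketched below under the assumption
that <literal>RankNTypes</literal> is also needed for the <literal>forall</literal> types and
<literal>UnboxedTuples</literal> for the unboxed tuple.  The helper <literal>sumPr</literal>
is hypothetical.
</para>
<programlisting>
{-# LANGUAGE LiberalTypeSynonyms, RankNTypes, UnboxedTuples #-}

-- A forall, including a class constraint, inside a type synonym.
type Discard a = forall b. Show b => a -> b -> (a, String)

f :: Discard a
f x y = (x, show y)

-- After expansion, g's argument has a rank-2 type.
g :: Discard Int -> (Int, String)
g k = k 3 True

-- A synonym applied to a partially applied synonym (Id).
type Generic i o = forall x. i x -> o x
type Id x = x

foo :: Generic Id []
foo x = [x]

-- An unboxed tuple in a type synonym, used only on the right of an arrow.
type Pr = (# Int, Int #)

h :: Int -> Pr
h x = (# x, x #)

-- Hypothetical helper that takes the unboxed pair apart immediately.
sumPr :: Int -> Int
sumPr n = case h n of (# p, q #) -> p + q
</programlisting>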
2033
2034 <para>
GHC currently does kind checking before expanding synonyms (though even that
could be changed).
2037 </para>
2038 <para>
After expanding type synonyms, GHC does validity checking on types, looking for
the following kinds of malformedness, which kind checking alone does not detect:
2041 <itemizedlist>
2042 <listitem><para>
2043 Type constructor applied to a type involving for-alls.
2044 </para></listitem>
2045 <listitem><para>
2046 Unboxed tuple on left of an arrow.
2047 </para></listitem>
2048 <listitem><para>
2049 Partially-applied type synonym.
2050 </para></listitem>
2051 </itemizedlist>
2052 So, for example,
2053 this will be rejected:
2054 <programlisting>
2055   type Pr = (# Int, Int #)
2056
2057   h :: Pr -> Int
2058   h x = ...
2059 </programlisting>
because GHC does not allow unboxed tuples on the left of a function arrow.
2061 </para>
2062 </sect2>
2063
2064
2065 <sect2 id="existential-quantification">
2066 <title>Existentially quantified data constructors
2067 </title>
2068
2069 <para>
2070 The idea of using existential quantification in data type declarations
2071 was suggested by Perry, and implemented in Hope+ (Nigel Perry, <emphasis>The Implementation
2072 of Practical Functional Programming Languages</emphasis>, PhD Thesis, University of
2073 London, 1991). It was later formalised by Laufer and Odersky
2074 (<emphasis>Polymorphic type inference and abstract data types</emphasis>,
2075 TOPLAS, 16(5), pp1411-1430, 1994).
2076 It's been in Lennart
2077 Augustsson's <command>hbc</command> Haskell compiler for several years, and
2078 proved very useful.  Here's the idea.  Consider the declaration:
2079 </para>
2080
2081 <para>
2082
2083 <programlisting>
2084   data Foo = forall a. MkFoo a (a -> Bool)
2085            | Nil
2086 </programlisting>
2087
2088 </para>
2089
2090 <para>
2091 The data type <literal>Foo</literal> has two constructors with types:
2092 </para>
2093
2094 <para>
2095
2096 <programlisting>
2097   MkFoo :: forall a. a -> (a -> Bool) -> Foo
2098   Nil   :: Foo
2099 </programlisting>
2100
2101 </para>
2102
2103 <para>
2104 Notice that the type variable <literal>a</literal> in the type of <function>MkFoo</function>
2105 does not appear in the data type itself, which is plain <literal>Foo</literal>.
2106 For example, the following expression is fine:
2107 </para>
2108
2109 <para>
2110
2111 <programlisting>
2112   [MkFoo 3 even, MkFoo 'c' isUpper] :: [Foo]
2113 </programlisting>
2114
2115 </para>
2116
2117 <para>
2118 Here, <literal>(MkFoo 3 even)</literal> packages an integer with a function
2119 <function>even</function> that maps an integer to <literal>Bool</literal>; and <function>MkFoo 'c'
2120 isUpper</function> packages a character with a compatible function.  These
2121 two things are each of type <literal>Foo</literal> and can be put in a list.
2122 </para>
2123
2124 <para>
What can we do with a value of type <literal>Foo</literal>?  In particular,
2126 what happens when we pattern-match on <function>MkFoo</function>?
2127 </para>
2128
2129 <para>
2130
2131 <programlisting>
2132   f (MkFoo val fn) = ???
2133 </programlisting>
2134
2135 </para>
2136
2137 <para>
2138 Since all we know about <literal>val</literal> and <function>fn</function> is that they
2139 are compatible, the only (useful) thing we can do with them is to
2140 apply <function>fn</function> to <literal>val</literal> to get a boolean.  For example:
2141 </para>
2142
2143 <para>
2144
2145 <programlisting>
2146   f :: Foo -> Bool
2147   f (MkFoo val fn) = fn val
2148 </programlisting>
2149
2150 </para>
2151
2152 <para>
2153 What this allows us to do is to package heterogeneous values
2154 together with a bunch of functions that manipulate them, and then treat
2155 that collection of packages in a uniform manner.  You can express
2156 quite a bit of object-oriented-like programming this way.
2157 </para>
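<para>
A runnable version of the <literal>Foo</literal> example is sketched below, assuming the
<literal>ExistentialQuantification</literal> LANGUAGE pragma.  The equation for
<literal>Nil</literal> is an addition (the text leaves that case open), and the list
<literal>foos</literal> is hypothetical.
</para>
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}

import Data.Char (isUpper)

data Foo = forall a. MkFoo a (a -> Bool)
         | Nil

-- The only useful thing to do with the packaged pair is to apply fn to val.
f :: Foo -> Bool
f (MkFoo val fn) = fn val
f Nil            = False   -- arbitrary choice for the empty case

-- A heterogeneous list: each MkFoo hides a different type.
foos :: [Foo]
foos = [MkFoo (4 :: Int) even, MkFoo 'c' isUpper, MkFoo "" null, Nil]
</programlisting>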
2158
2159 <sect3 id="existential">
2160 <title>Why existential?
2161 </title>
2162
2163 <para>
2164 What has this to do with <emphasis>existential</emphasis> quantification?
2165 Simply that <function>MkFoo</function> has the (nearly) isomorphic type
2166 </para>
2167
2168 <para>
2169
2170 <programlisting>
2171   MkFoo :: (exists a . (a, a -> Bool)) -> Foo
2172 </programlisting>
2173
2174 </para>
2175
2176 <para>
2177 But Haskell programmers can safely think of the ordinary
2178 <emphasis>universally</emphasis> quantified type given above, thereby avoiding
2179 adding a new existential quantification construct.
2180 </para>
2181
2182 </sect3>
2183
2184 <sect3 id="existential-with-context">
2185 <title>Existentials and type classes</title>
2186
2187 <para>
2188 An easy extension is to allow
2189 arbitrary contexts before the constructor.  For example:
2190 </para>
2191
2192 <para>
2193
2194 <programlisting>
2195 data Baz = forall a. Eq a => Baz1 a a
2196          | forall b. Show b => Baz2 b (b -> b)
2197 </programlisting>
2198
2199 </para>
2200
2201 <para>
2202 The two constructors have the types you'd expect:
2203 </para>
2204
2205 <para>
2206
2207 <programlisting>
2208 Baz1 :: forall a. Eq a => a -> a -> Baz
2209 Baz2 :: forall b. Show b => b -> (b -> b) -> Baz
2210 </programlisting>
2211
2212 </para>
2213
2214 <para>
2215 But when pattern matching on <function>Baz1</function> the matched values can be compared
2216 for equality, and when pattern matching on <function>Baz2</function> the first matched
2217 value can be converted to a string (as well as applying the function to it).
2218 So this program is legal:
2219 </para>
2220
2221 <para>
2222
2223 <programlisting>
2224   f :: Baz -> String
2225   f (Baz1 p q) | p == q    = "Yes"
2226                | otherwise = "No"
2227   f (Baz2 v fn)            = show (fn v)
2228 </programlisting>
2229
2230 </para>
2231
2232 <para>
2233 Operationally, in a dictionary-passing implementation, the
2234 constructors <function>Baz1</function> and <function>Baz2</function> must store the
2235 dictionaries for <literal>Eq</literal> and <literal>Show</literal> respectively, and
extract them on pattern matching.
2237 </para>
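<para>
As a sketch, the <literal>Baz</literal> type and the function <literal>f</literal> above
compile as follows with the <literal>ExistentialQuantification</literal> LANGUAGE pragma.
The list <literal>bazs</literal> is hypothetical; it shows that values hiding different
types, each stored with its own dictionary, can be processed uniformly.
</para>
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}

data Baz = forall a. Eq a => Baz1 a a
         | forall b. Show b => Baz2 b (b -> b)

-- Matching on Baz1 brings Eq a into scope; matching on Baz2 brings Show b.
f :: Baz -> String
f (Baz1 p q) | p == q    = "Yes"
             | otherwise = "No"
f (Baz2 v fn)            = show (fn v)

-- Hypothetical values hiding Char, Int and Bool.
bazs :: [Baz]
bazs = [Baz1 'x' 'x', Baz1 (1 :: Int) 2, Baz2 (41 :: Int) (+ 1), Baz2 True not]
</programlisting>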
2238
2239 </sect3>
2240
2241 <sect3 id="existential-records">
2242 <title>Record Constructors</title>
2243
2244 <para>
GHC allows existentials to be used with record syntax as well.  For example:
2246
2247 <programlisting>
2248 data Counter a = forall self. NewCounter
2249     { _this    :: self
2250     , _inc     :: self -> self
2251     , _display :: self -> IO ()
2252     , tag      :: a
2253     }
2254 </programlisting>
2255 Here <literal>tag</literal> is a public field, with a well-typed selector
2256 function <literal>tag :: Counter a -> a</literal>.  The <literal>self</literal>
2257 type is hidden from the outside; any attempt to apply <literal>_this</literal>,
2258 <literal>_inc</literal> or <literal>_display</literal> as functions will raise a
2259 compile-time error.  In other words, <emphasis>GHC defines a record selector function
2260 only for fields whose type does not mention the existentially-quantified variables</emphasis>.
(This example used an underscore in the fields for which record selectors
will not be defined, but that is only a programming convention; GHC attaches no meaning to the underscores.)
2263 </para>
2264
2265 <para>
2266 To make use of these hidden fields, we need to create some helper functions:
2267
2268 <programlisting>
2269 inc :: Counter a -> Counter a
2270 inc (NewCounter x i d t) = NewCounter
2271     { _this = i x, _inc = i, _display = d, tag = t }
2272
2273 display :: Counter a -> IO ()
2274 display NewCounter{ _this = x, _display = d } = d x
2275 </programlisting>
2276
2277 Now we can define counters with different underlying implementations:
2278
2279 <programlisting>
2280 counterA :: Counter String
2281 counterA = NewCounter
2282     { _this = 0, _inc = (1+), _display = print, tag = "A" }
2283
2284 counterB :: Counter String
2285 counterB = NewCounter
2286     { _this = "", _inc = ('#':), _display = putStrLn, tag = "B" }
2287
2288 main = do
2289     display (inc counterA)         -- prints "1"
2290     display (inc (inc counterB))   -- prints "##"
2291 </programlisting>
2292
2293 Record update syntax is supported for existentials (and GADTs):
2294 <programlisting>
2295 setTag :: Counter a -> a -> Counter a
2296 setTag obj t = obj{ tag = t }
2297 </programlisting>
2298 The rule for record update is this: <emphasis>
2299 the types of the updated fields may
2300 mention only the universally-quantified type variables
2301 of the data constructor.  For GADTs, the field may mention only types
2302 that appear as a simple type-variable argument in the constructor's result
2303 type</emphasis>.  For example:
2304 <programlisting>
2305 data T a b where { T1 { f1::a, f2::b, f3::(b,c) } :: T a b } -- c is existential
2306 upd1 t x = t { f1=x }   -- OK:   upd1 :: T a b -> a' -> T a' b
2307 upd2 t x = t { f3=x }   -- BAD   (f3's type mentions c, which is
2308                         --        existentially quantified)
2309
2310 data G a b where { G1 { g1::a, g2::c } :: G a [c] }
2311 upd3 g x = g { g1=x }   -- OK:   upd3 :: G a b -> c -> G c b
upd4 g x = g { g2=x }   -- BAD (g2's type mentions c, which is not a simple
2313                         --      type-variable argument in G1's result type)
2314 </programlisting>
2315 </para>
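<para>
Because <literal>display</literal> in the example above performs I/O, the following sketch
uses a hypothetical variant of <literal>Counter</literal> whose <literal>_display</literal>
field returns a <literal>String</literal>, so the results can be inspected directly.  It
assumes the <literal>ExistentialQuantification</literal> LANGUAGE pragma and keeps the
record update on <literal>tag</literal>, whose type mentions only the universally quantified
<literal>a</literal>.
</para>
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}

-- Variant of Counter: _display returns a String instead of doing IO.
data Counter a = forall self. NewCounter
    { _this    :: self
    , _inc     :: self -> self
    , _display :: self -> String
    , tag      :: a
    }

inc :: Counter a -> Counter a
inc (NewCounter x i d t) = NewCounter
    { _this = i x, _inc = i, _display = d, tag = t }

display :: Counter a -> String
display NewCounter{ _this = x, _display = d } = d x

-- Allowed: tag's type does not mention the existential self.
setTag :: Counter a -> a -> Counter a
setTag obj t = obj { tag = t }

counterA :: Counter String
counterA = NewCounter
    { _this = 0 :: Int, _inc = (1+), _display = show, tag = "A" }

counterB :: Counter String
counterB = NewCounter
    { _this = "", _inc = ('#':), _display = id, tag = "B" }
</programlisting>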
2316
2317 </sect3>
2318
2319
2320 <sect3>
2321 <title>Restrictions</title>
2322
2323 <para>
2324 There are several restrictions on the ways in which existentially-quantified
constructors can be used.
2326 </para>
2327
2328 <para>
2329
2330 <itemizedlist>
2331 <listitem>
2332
2333 <para>
2334  When pattern matching, each pattern match introduces a new,
2335 distinct, type for each existential type variable.  These types cannot
2336 be unified with any other type, nor can they escape from the scope of
2337 the pattern match.  For example, these fragments are incorrect:
2338
2339
2340 <programlisting>
2341 f1 (MkFoo a f) = a
2342 </programlisting>
2343
2344
2345 Here, the type bound by <function>MkFoo</function> "escapes", because <literal>a</literal>
2346 is the result of <function>f1</function>.  One way to see why this is wrong is to
2347 ask what type <function>f1</function> has:
2348
2349
2350 <programlisting>
2351   f1 :: Foo -> a             -- Weird!
2352 </programlisting>
2353
2354
2355 What is this "<literal>a</literal>" in the result type? Clearly we don't mean
2356 this:
2357
2358
2359 <programlisting>
2360   f1 :: forall a. Foo -> a   -- Wrong!
2361 </programlisting>
2362
2363
The original program is just plain wrong.  Here's another sort of error:
2365
2366
2367 <programlisting>
2368   f2 (Baz1 a b) (Baz1 p q) = a==q
2369 </programlisting>
2370
2371
2372 It's ok to say <literal>a==b</literal> or <literal>p==q</literal>, but
2373 <literal>a==q</literal> is wrong because it equates the two distinct types arising
2374 from the two <function>Baz1</function> constructors.
2375
2376
2377 </para>
2378 </listitem>
2379 <listitem>
2380
2381 <para>
2382 You can't pattern-match on an existentially quantified
2383 constructor in a <literal>let</literal> or <literal>where</literal> group of
2384 bindings. So this is illegal:
2385
2386
2387 <programlisting>
2388   f3 x = a==b where { Baz1 a b = x }
2389 </programlisting>
2390
2391 Instead, use a <literal>case</literal> expression:
2392
2393 <programlisting>
2394   f3 x = case x of Baz1 a b -> a==b
2395 </programlisting>
2396
2397 In general, you can only pattern-match
2398 on an existentially-quantified constructor in a <literal>case</literal> expression or
2399 in the patterns of a function definition.
2400
2401 The reason for this restriction is really an implementation one.
2402 Type-checking binding groups is already a nightmare without
2403 existentials complicating the picture.  Also an existential pattern
2404 binding at the top level of a module doesn't make sense, because it's
2405 not clear how to prevent the existentially-quantified type "escaping".
2406 So for now, there's a simple-to-state restriction.  We'll see how
2407 annoying it is.
2408
2409 </para>
2410 </listitem>
2411 <listitem>
2412
2413 <para>
2414 You can't use existential quantification for <literal>newtype</literal>
2415 declarations.  So this is illegal:
2416
2417
2418 <programlisting>
2419   newtype T = forall a. Ord a => MkT a
2420 </programlisting>
2421
2422
2423 Reason: a value of type <literal>T</literal> must be represented as a
pair of a dictionary for <literal>Ord a</literal> and a value of type
<literal>a</literal>.  That contradicts the idea that
2426 <literal>newtype</literal> should have no concrete representation.
2427 You can get just the same efficiency and effect by using
2428 <literal>data</literal> instead of <literal>newtype</literal>.  If
2429 there is no overloading involved, then there is more of a case for
2430 allowing an existentially-quantified <literal>newtype</literal>,
2431 because the <literal>data</literal> version does carry an
2432 implementation cost, but single-field existentially quantified
2433 constructors aren't much use.  So the simple restriction (no
2434 existential stuff on <literal>newtype</literal>) stands, unless there
2435 are convincing reasons to change it.
2436
2437
2438 </para>
2439 </listitem>
2440 <listitem>
2441
2442 <para>
You can't use <literal>deriving</literal> to define instances of a
data type with existentially quantified data constructors.

Reason: in most cases it would not make sense. For example:
2447
2448 <programlisting>
2449 data T = forall a. MkT [a] deriving( Eq )
2450 </programlisting>
2451
2452 To derive <literal>Eq</literal> in the standard way we would need to have equality
2453 between the single component of two <function>MkT</function> constructors:
2454
2455 <programlisting>
2456 instance Eq T where
2457   (MkT a) == (MkT b) = ???
2458 </programlisting>
2459
2460 But <varname>a</varname> and <varname>b</varname> have distinct types, and so can't be compared.
2461 It's just about possible to imagine examples in which the derived instance
2462 would make sense, but it seems altogether simpler simply to prohibit such
2463 declarations.  Define your own instances!
2464 </para>
2465 </listitem>
2466
2467 </itemizedlist>
2468
2469 </para>
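<para>
The workarounds suggested above (unpacking in a <literal>case</literal> expression or a function pattern,
and defining instances by hand) can be combined as in the following sketch. It assumes
<option>-XExistentialQuantification</option>; the type <literal>Showable</literal>, its constructor
<literal>MkShowable</literal> and the function <literal>describe</literal> are hypothetical names chosen
for illustration:
<programlisting>
{-# LANGUAGE ExistentialQuantification #-}
module Main where

-- Hypothetical type: each MkShowable hides a value of some type a,
-- together with the (Show a) dictionary for that type.
data Showable = forall a. Show a => MkShowable a

-- "deriving( Show )" is not allowed here, so define the instance by hand.
-- Matching on MkShowable in the method's pattern brings (Show a) into scope.
instance Show Showable where
  show (MkShowable a) = show a

-- Unpack with a case expression rather than a let/where pattern binding.
describe :: Showable -> String
describe s = case s of
  MkShowable a -> "value: " ++ show a

main :: IO ()
main = mapM_ (putStrLn . describe)
             [MkShowable (1 :: Int), MkShowable True, MkShowable "hi"]
</programlisting>
Each element of the list in <literal>main</literal> hides a different type, but each carries its own
<literal>Show</literal> dictionary, so each can be displayed.
</para>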
2470
2471 </sect3>
2472 </sect2>
2473
2474 <!-- ====================== Generalised algebraic data types =======================  -->
2475
2476 <sect2 id="gadt-style">
2477 <title>Declaring data types with explicit constructor signatures</title>
2478
2479 <para>GHC allows you to declare an algebraic data type by
2480 giving the type signatures of constructors explicitly.  For example:
2481 <programlisting>
2482   data Maybe a where
2483       Nothing :: Maybe a
2484       Just    :: a -> Maybe a
2485 </programlisting>
2486 The form is called a "GADT-style declaration"
2487 because Generalised Algebraic Data Types, described in <xref linkend="gadt"/>,
2488 can only be declared using this form.</para>
2489 <para>Notice that GADT-style syntax generalises existential types (<xref linkend="existential-quantification"/>).
2490 For example, these two declarations are equivalent:
2491 <programlisting>
2492   data Foo = forall a. MkFoo a (a -> Bool)
2493   data Foo' where { MKFoo :: a -> (a->Bool) -> Foo' }
2494 </programlisting>
2495 </para>
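<para>
As a small sketch of this equivalence (assuming <option>-XExistentialQuantification</option> and
<option>-XGADTs</option>; the functions <literal>test</literal> and <literal>test'</literal> are
hypothetical), values of both types are consumed in exactly the same way, by applying the packed
function to the packed value:
<programlisting>
{-# LANGUAGE ExistentialQuantification, GADTs #-}
module Main where

data Foo = forall a. MkFoo a (a -> Bool)
data Foo' where { MKFoo :: a -> (a->Bool) -> Foo' }

-- The hidden type a can only be used via the function stored with it.
test :: Foo -> Bool
test (MkFoo x p) = p x

test' :: Foo' -> Bool
test' (MKFoo x p) = p x

main :: IO ()
main = do
  print (map test  [MkFoo (3 :: Int) even, MkFoo 'c' (== 'c')])
  print (map test' [MKFoo (3 :: Int) even, MKFoo 'c' (== 'c')])
</programlisting>
</para>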
2496 <para>Any data type that can be declared in standard Haskell-98 syntax
2497 can also be declared using GADT-style syntax.
2498 The choice is largely stylistic, but GADT-style declarations differ in one important respect:
2499 they treat class constraints on the data constructors differently.
2500 Specifically, if the constructor is given a type-class context, that
2501 context is made available by pattern matching.  For example:
2502 <programlisting>
2503   data Set a where
2504     MkSet :: Eq a => [a] -> Set a
2505
2506   makeSet :: Eq a => [a] -> Set a
2507   makeSet xs = MkSet (nub xs)
2508
2509   insert :: a -> Set a -> Set a
2510   insert a (MkSet as) | a `elem` as = MkSet as
2511                       | otherwise   = MkSet (a:as)
2512 </programlisting>
2513 A use of <literal>MkSet</literal> as a constructor (e.g. in the definition of <literal>makeSet</literal>)
2514 gives rise to a <literal>(Eq a)</literal>
2515 constraint, as you would expect.  The new feature is that pattern-matching on <literal>MkSet</literal>
2516 (as in the definition of <literal>insert</literal>) makes <emphasis>available</emphasis> an <literal>(Eq a)</literal>
2517 context.  In implementation terms, the <literal>MkSet</literal> constructor has a hidden field that stores
2518 the <literal>(Eq a)</literal> dictionary that is passed to <literal>MkSet</literal>; so
2519 when pattern-matching that dictionary becomes available for the right-hand side of the match.
2520 In the example, the equality dictionary is used to satisfy the equality constraint
2521 generated by the call to <literal>elem</literal>, so that the type of
2522 <literal>insert</literal> itself has no <literal>Eq</literal> constraint.
2523 </para>
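<para>
Put together as a complete module, the <literal>Set</literal> example might look as follows. This is a
sketch: the import of <literal>nub</literal> from <literal>Data.List</literal>, left implicit above, is
needed, and <literal>member</literal> and <literal>toList</literal> are hypothetical helpers added so that
the result can be observed. Neither <literal>insert</literal> nor <literal>member</literal> mentions
<literal>Eq</literal> in its signature:
<programlisting>
{-# LANGUAGE GADTs #-}
module Main where

import Data.List (nub)

data Set a where
  MkSet :: Eq a => [a] -> Set a

makeSet :: Eq a => [a] -> Set a
makeSet xs = MkSet (nub xs)

-- The (Eq a) needed by elem, and by MkSet itself, comes from the match on MkSet.
insert :: a -> Set a -> Set a
insert a (MkSet as) | a `elem` as = MkSet as
                    | otherwise   = MkSet (a:as)

-- Hypothetical helpers, not part of the example above.
member :: a -> Set a -> Bool
member a (MkSet as) = a `elem` as

toList :: Set a -> [a]
toList (MkSet as) = as

main :: IO ()
main = do
  let s = insert 4 (insert 1 (makeSet [1, 2, 2, 3 :: Int]))
  print (toList s)
  print (member 5 s)
</programlisting>
</para>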
2524 <para>
One possible application is to reify dictionaries:
2526 <programlisting>
2527    data NumInst a where
2528      MkNumInst :: Num a => NumInst a
2529
2530    intInst :: NumInst Int
2531    intInst = MkNumInst
2532
2533    plus :: NumInst a -> a -> a -> a
2534    plus MkNumInst p q = p + q
2535 </programlisting>
2536 Here, a value of type <literal>NumInst a</literal> is equivalent
2537 to an explicit <literal>(Num a)</literal> dictionary.
2538 </para>
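<para>
To illustrate, here is a sketch that passes such dictionaries around explicitly, assuming
<option>-XGADTs</option>. The witness <literal>doubleInst</literal> and the function
<literal>sumWith</literal> are hypothetical additions: <literal>sumWith</literal> uses
<literal>+</literal> and the literal <literal>0</literal> at an arbitrary type <literal>a</literal>, yet
needs no <literal>Num a</literal> constraint, because matching on <literal>MkNumInst</literal> supplies it:
<programlisting>
{-# LANGUAGE GADTs #-}
module Main where

data NumInst a where
  MkNumInst :: Num a => NumInst a

intInst :: NumInst Int
intInst = MkNumInst

-- Hypothetical second witness, for Double.
doubleInst :: NumInst Double
doubleInst = MkNumInst

plus :: NumInst a -> a -> a -> a
plus MkNumInst p q = p + q

-- Hypothetical: sum a list using only the reified (Num a) dictionary.
sumWith :: NumInst a -> [a] -> a
sumWith MkNumInst = foldr (+) 0

main :: IO ()
main = do
  print (plus intInst 2 3)
  print (sumWith doubleInst [0.5, 1.5])
</programlisting>
</para>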
2539 <para>
2540 All this applies to constructors declared using the syntax of <xref linkend="existential-with-context"/>.
2541 For example, the <literal>NumInst</literal> data type above could equivalently be declared
2542 like this:
2543 <programlisting>
2544    data NumInst a
      = Num a => MkNumInst
2546 </programlisting>
2547 Notice that, unlike the situation when declaring an existential, there is
2548 no <literal>forall</literal>, because the <literal>Num</literal> constrains the
2549 data type's universally quantified type variable <literal>a</literal>.
2550 A constructor may have both universal and existential type variables: for example,
2551 the following two declarations are equivalent:
2552 <programlisting>
2553    data T1 a
2554         = forall b. (Num a, Eq b) => MkT1 a b
2555    data T2 a where
2556         MkT2 :: (Num a, Eq b) => a -> b -> T2 a
2557 </programlisting>
2558 </para>
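<para>
One way to see the equivalence is to convert between the two types, as in the following sketch
(assuming <option>-XExistentialQuantification</option> and <option>-XGADTs</option>; the functions
<literal>toT2</literal>, <literal>toT1</literal> and <literal>bump</literal> are hypothetical).
Building <literal>MkT2 x y</literal> requires both <literal>Num a</literal> and <literal>Eq b</literal>,
and the match on <literal>MkT1</literal> makes both available, and vice versa:
<programlisting>
{-# LANGUAGE ExistentialQuantification, GADTs #-}
module Main where

data T1 a
     = forall b. (Num a, Eq b) => MkT1 a b
data T2 a where
     MkT2 :: (Num a, Eq b) => a -> b -> T2 a

-- Each conversion re-packs the existential b together with both dictionaries.
toT2 :: T1 a -> T2 a
toT2 (MkT1 x y) = MkT2 x y

toT1 :: T2 a -> T1 a
toT1 (MkT2 x y) = MkT1 x y

-- The (Num a) from the match is used for (+); the existential field is ignored.
bump :: T2 a -> a
bump (MkT2 x _) = x + 1

main :: IO ()
main = print (bump (toT2 (toT1 (MkT2 (41 :: Int) "hidden"))))
</programlisting>
</para>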
2559 <para>All this behaviour contrasts with Haskell 98's peculiar treatment of
2560 contexts on a data type declaration (Section 4.2.1 of the Haskell 98 Report).
2561 In Haskell 98 the definition
2562 <programlisting>
2563   data Eq a => Set' a = MkSet' [a]
2564 </programlisting>
2565 gives <literal>MkSet'</literal> the same type as <literal>MkSet</literal> above.  But instead of
2566 <emphasis>making available</emphasis> an <literal>(Eq a)</literal> constraint, pattern-matching
2567 on <literal>MkSet'</literal> <emphasis>requires</emphasis> an <literal>(Eq a)</literal> constraint!
2568 GHC faithfully implements this behaviour, odd though it is.  But for GADT-style declarations,
2569 GHC's behaviour is much more useful, as well as much more intuitive.
2570 </para>
2571
2572 <para>
2573 The rest of this section gives further details about GADT-style data
2574 type declarations.
2575
2576 <itemizedlist>
2577 <listitem><para>
2578 The result type of each data constructor must begin with the type constructor being defined.
2579 If the result type of all constructors
2580 has the form <literal>T a1 ... an</literal>, where <literal>a1 ... an</literal>
2581 are distinct type variables, then the data type is <emphasis>ordinary</emphasis>;
otherwise it is a <emphasis>generalised</emphasis> data type (<xref linkend="gadt"/>).
2583 </para></listitem>
2584
2585 <listitem><para>
2586 As with other type signatures, you can give a single signature for several data constructors.
2587 In this example we give a single signature for <literal>T1</literal> and <literal>T2</literal>:
2588 <programlisting>
2589   data T a where
2590     T1,T2 :: a -> T a
2591     T3 :: T a
2592 </programlisting>
2593 </para></listitem>
2594
2595 <listitem><para>
2596 The type signature of
2597 each constructor is independent, and is implicitly universally quantified as usual.
2598 In particular, the type variable(s) in the "<literal>data T a where</literal>" header
2599 have no scope, and different constructors may have different universally-quantified type variables:
2600 <programlisting>
2601   data T a where        -- The 'a' has no scope
2602     T1,T2 :: b -> T b   -- Means forall b. b -> T b
2603     T3 :: T a           -- Means forall a. T a
2604 </programlisting>
2605 </para></listitem>
2606
2607 <listitem><para>
2608 A constructor signature may mention type class constraints, which can differ for
2609 different constructors.  For example, this is fine:
2610 <programlisting>
2611   data T a where
2612     T1 :: Eq b => b -> b -> T b
2613     T2 :: (Show c, Ix c) => c -> [c] -> T c
2614 </programlisting>
When pattern matching, these constraints are made available to discharge constraints
2616 in the body of the match. For example:
2617 <programlisting>
2618   f :: T a -> String
2619   f (T1 x y) | x==y      = "yes"
2620              | otherwise = "no"
2621   f (T2 a b)             = show a
2622 </programlisting>
2623 Note that <literal>f</literal> is not overloaded; the <literal>Eq</literal> constraint arising
2624 from the use of <literal>==</literal> is discharged by the pattern match on <literal>T1</literal>
2625 and similarly the <literal>Show</literal> constraint arising from the use of <literal>show</literal>.
2626 </para></listitem>
2627
2628 <listitem><para>
2629 Unlike a Haskell-98-style
2630 data type declaration, the type variable(s) in the "<literal>data Set a where</literal>" header
2631 have no scope.  Indeed, one can write a kind signature instead:
2632 <programlisting>
2633   data Set :: * -> * where ...
2634 </programlisting>
2635 or even a mixture of the two:
2636 <programlisting>
2637   data Bar a :: (* -> *) -> * where ...
2638 </programlisting>
The type variables (if given) may be explicitly kinded, so we could also write the header for <literal>Bar</literal>
2640 like this:
2641 <programlisting>
2642   data Bar a (b :: * -> *) where ...
2643 </programlisting>
2644 </para></listitem>
2645
2646
2647 <listitem><para>
2648 You can use strictness annotations, in the obvious places
2649 in the constructor type:
2650 <programlisting>
2651   data Term a where
2652       Lit    :: !Int -> Term Int
2653       If     :: Term Bool -> !(Term a) -> !(Term a) -> Term a
2654       Pair   :: Term a -> Term b -> Term (a,b)
2655 </programlisting>
2656 </para></listitem>
2657
2658 <listitem><para>
2659 You can use a <literal>deriving</literal> clause on a GADT-style data type
declaration.  For example, these two declarations are equivalent:
2661 <programlisting>
2662   data Maybe1 a where {
2663       Nothing1 :: Maybe1 a ;
2664       Just1    :: a -> Maybe1 a
2665     } deriving( Eq, Ord )
2666
2667   data Maybe2 a = Nothing2 | Just2 a
2668        deriving( Eq, Ord )
2669 </programlisting>
2670 </para></listitem>
2671
2672 <listitem><para>
2673 The type signature may have quantified type variables that do not appear
2674 in the result type:
2675 <programlisting>
2676   data Foo where
2677      MkFoo :: a -> (a->Bool) -> Foo
2678      Nil   :: Foo
2679 </programlisting>
2680 Here the type variable <literal>a</literal> does not appear in the result type
2681 of either constructor.
2682 Although it is universally quantified in the type of the constructor, such
2683 a type variable is often called "existential".
2684 Indeed, the above declaration declares precisely the same type as
2685 the <literal>data Foo</literal> in <xref linkend="existential-quantification"/>.
2686 </para><para>
2687 The type may contain a class context too, of course:
2688 <programlisting>
2689   data Showable where
2690     MkShowable :: Show a => a -> Showable
2691 </programlisting>
2692 </para></listitem>
2693
2694 <listitem><para>
2695 You can use record syntax on a GADT-style data type declaration:
2696
2697 <programlisting>
2698   data Person where
2699       Adult :: { name :: String, children :: [Person] } -> Person
2700       Child :: Show a => { name :: !String, funny :: a } -> Person
2701 </programlisting>
2702 As usual, for every constructor that has a field <literal>f</literal>, the type of
2703 field <literal>f</literal> must be the same (modulo alpha conversion).
2704 The <literal>Child</literal> constructor above shows that the signature
2705 may have a context, existentially-quantified variables, and strictness annotations,
2706 just as in the non-record case.  (NB: the "type" that follows the double-colon
2707 is not really a type, because of the record syntax and strictness annotations.
2708 A "type" of this form can appear only in a constructor signature.)
2709 </para></listitem>
2710
2711 <listitem><para>
Record updates are allowed with GADT-style declarations,
but only for fields whose type mentions no existential type variables.
2715 </para></listitem>
2716
2717 <listitem><para>
2718 As in the case of existentials declared using the Haskell-98-like record syntax
2719 (<xref linkend="existential-records"/>),
2720 record-selector functions are generated only for those fields that have well-typed
2721 selectors.
2722 Here is the example of that section, in GADT-style syntax:
2723 <programlisting>
data Counter a where
    NewCounter :: { _this    :: self
                  , _inc     :: self -> self
                  , _display :: self -> IO ()
                  , tag      :: a
                  } -> Counter a
2731 </programlisting>
2732 As before, only one selector function is generated here, that for <literal>tag</literal>.
2733 Nevertheless, you can still use all the field names in pattern matching and record construction.
2734 </para></listitem>
2735 </itemizedlist></para>
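<para>
Several of the points above can be exercised together in one module. The following is a sketch,
assuming <option>-XGADTs</option>; <literal>describe</literal> and the values in <literal>main</literal>
are illustrative only. It uses the <literal>T</literal> and <literal>f</literal> of the constraint
example, where <literal>f</literal> has no <literal>Eq</literal> or <literal>Show</literal> constraint of
its own, and the <literal>Person</literal> record type, where no selector is generated for
<literal>funny</literal> but the field name is still usable in patterns and in record construction:
<programlisting>
{-# LANGUAGE GADTs #-}
module Main where

import Data.Ix (Ix)

-- Constraints differ per constructor and are made available by matching.
data T a where
  T1 :: Eq b => b -> b -> T b
  T2 :: (Show c, Ix c) => c -> [c] -> T c

f :: T a -> String
f (T1 x y) | x == y    = "yes"
           | otherwise = "no"
f (T2 a _)             = show a

-- Record syntax with a context, an existential field and a strict field.
data Person where
    Adult :: { name :: String, children :: [Person] } -> Person
    Child :: Show a => { name :: !String, funny :: a } -> Person

-- No selector exists for 'funny', but the field name works in patterns.
describe :: Person -> String
describe (Adult { name = n, children = cs }) = n ++ " has " ++ show (length cs) ++ " children"
describe (Child { name = n, funny = x })     = n ++ " likes " ++ show x

main :: IO ()
main = do
  putStrLn (f (T1 'x' 'x'))
  putStrLn (f (T2 (3 :: Int) []))
  putStrLn (describe (Adult { name = "Pat", children = [] }))
  putStrLn (describe (Child { name = "Ann", funny = (7 :: Int) }))
</programlisting>
</para>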
2736 </sect2>
2737
2738 <sect2 id="gadt">
2739 <title>Generalised Algebraic Data Types (GADTs)</title>
2740
2741 <para>Generalised Algebraic Data Types generalise ordinary algebraic data types
2742 by allowing constructors to have richer return types.  Here is an example:
2743 <programlisting>
2744   data Term a where
2745       Lit    :: Int -> Term Int
2746       Succ   :: Term Int -> Term Int
2747       IsZero :: Term Int -> Term Bool
2748       If     :: Term Bool -> Term a -> Term a -> Term a
2749       Pair   :: Term a -> Term b -> Term (a,b)
2750 </programlisting>
2751 Notice that the return type of the constructors is not always <literal>Term a</literal>, as is the
2752 case with ordinary data types.  This generality allows us to
2753 write a well-typed <literal>eval</literal> function
2754 for these <literal>Terms</literal>:
2755 <programlisting>
2756   eval :: Term a -> a
2757   eval (Lit i)      = i
2758   eval (Succ t)     = 1 + eval t
2759   eval (IsZero t)   = eval t == 0
2760   eval (If b e1 e2) = if eval b then eval e1 else eval e2
2761   eval (Pair e1 e2) = (eval e1, eval e2)
2762 </programlisting>
2763 The key point about GADTs is that <emphasis>pattern matching causes type refinement</emphasis>.
2764 For example, in the right hand side of the equation
2765 <programlisting>
2766   eval :: Term a -> a
2767   eval (Lit i) =  ...
2768 </programlisting>
2769 the type <literal>a</literal> is refined to <literal>Int</literal>.  That's the whole point!
2770 A precise specification of the type rules is beyond what this user manual aspires to,
2771 but the design closely follows that described in
2772 the paper <ulink
2773 url="http://research.microsoft.com/%7Esimonpj/papers/gadt/">Simple
2774 unification-based type inference for GADTs</ulink>,
2775 (ICFP 2006).
2776 The general principle is this: <emphasis>type refinement is only carried out
2777 based on user-supplied type annotations</emphasis>.
2778 So if no type signature is supplied for <literal>eval</literal>, no type refinement happens,
2779 and lots of obscure error messages will
2780 occur.  However, the refinement is quite general.  For example, if we had:
2781 <programlisting>
2782   eval :: Term a -> a -> a
2783   eval (Lit i) j =  i+j
2784 </programlisting>
the pattern match causes the type <literal>a</literal> to be refined to <literal>Int</literal> (because of the type
of the constructor <literal>Lit</literal>), and that refinement also applies to the type of <literal>j</literal> and
to the result type of the equation.  Hence the addition <literal>i+j</literal> is legal.
2788 </para>
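<para>
A complete module for the <literal>Term</literal> example might look like this. It is a sketch assuming
<option>-XGADTs</option>; <literal>evalPlus</literal> is a hypothetical name for the second
<literal>eval</literal> shown above, extended with a catch-all equation so that it covers every
constructor, and <literal>example</literal> is an illustrative term:
<programlisting>
{-# LANGUAGE GADTs #-}
module Main where

data Term a where
    Lit    :: Int -> Term Int
    Succ   :: Term Int -> Term Int
    IsZero :: Term Int -> Term Bool
    If     :: Term Bool -> Term a -> Term a -> Term a
    Pair   :: Term a -> Term b -> Term (a,b)

-- The signature makes the scrutinee's type rigid, so each equation
-- can refine a (for example, to Int in the Lit equation).
eval :: Term a -> a
eval (Lit i)      = i
eval (Succ t)     = 1 + eval t
eval (IsZero t)   = eval t == 0
eval (If b e1 e2) = if eval b then eval e1 else eval e2
eval (Pair e1 e2) = (eval e1, eval e2)

-- In the Lit equation a is refined to Int, so j :: Int and i+j is well typed.
evalPlus :: Term a -> a -> a
evalPlus (Lit i) j = i+j
evalPlus t       _ = eval t

example :: Term (Int, Int)
example = If (IsZero (Lit 0)) (Pair (Lit 1) (Succ (Lit 1))) (Pair (Lit 0) (Lit 0))

main :: IO ()
main = do
  print (eval example)
  print (evalPlus (Lit 2) 3)
</programlisting>
Running it should print <literal>(1,2)</literal> and then <literal>5</literal>.
</para>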
2789 <para>
These and many other examples are given in papers by Hongwei Xi and
Tim Sheard. There is a longer introduction
2792 <ulink url="http://www.haskell.org/haskellwiki/GADT">on the wiki</ulink>,
2793 and Ralf Hinze's
2794 <ulink url="http://www.informatik.uni-bonn.de/~ralf/publications/With.pdf">Fun with phantom types</ulink> also has a number of examples. Note that papers
2795 may use different notation to that implemented in GHC.
2796 </para>
2797 <para>
2798 The rest of this section outlines the extensions to GHC that support GADTs.   The extension is enabled with
2799 <option>-XGADTs</option>.  The <option>-XGADTs</option> flag also sets <option>-XRelaxedPolyRec</option>.
2800 <itemizedlist>
2801 <listitem><para>
2802 A GADT can only be declared using GADT-style syntax (<xref linkend="gadt-style"/>);
2803 the old Haskell-98 syntax for data declarations always declares an ordinary data type.
2804 The result type of each constructor must begin with the type constructor being defined,
2805 but for a GADT the arguments to the type constructor can be arbitrary monotypes.
2806 For example, in the <literal>Term</literal> data
2807 type above, the type of each constructor must end with <literal>Term ty</literal>, but
2808 the <literal>ty</literal> need not be a type variable (e.g. the <literal>Lit</literal>
2809 constructor).
2810 </para></listitem>
2811
2812 <listitem><para>
2813 It is permitted to declare an ordinary algebraic data type using GADT-style syntax.
2814 What makes a GADT into a GADT is not the syntax, but rather the presence of data constructors
2815 whose result type is not just <literal>T a b</literal>.
2816 </para></listitem>
2817
2818 <listitem><para>
2819 You cannot use a <literal>deriving</literal> clause for a GADT; only for
2820 an ordinary data type.
2821 </para></listitem>
2822
2823 <listitem><para>
2824 As mentioned in <xref linkend="gadt-style"/>, record syntax is supported.
2825 For example:
2826 <programlisting>
2827   data Term a where
      Lit    :: { val  :: Int }      -> Term Int
      Succ   :: { num  :: Term Int } -> Term Int
      Pred   :: { num  :: Term Int } -> Term Int
      IsZero :: { arg  :: Term Int } -> Term Bool
      Pair   :: { arg1 :: Term a
                , arg2 :: Term b
                }                    -> Term (a,b)
      If     :: { cnd  :: Term Bool
                , tru  :: Term a
                , fls  :: Term a
                }                    -> Term a
2839 </programlisting>
2840 However, for GADTs there is the following additional constraint:
2841 every constructor that has a field <literal>f</literal> must have
the same result type (modulo alpha conversion).
Hence, in the above example, we cannot merge the <literal>num</literal>
and <literal>arg</literal> fields into a
2845 single name.  Although their field types are both <literal>Term Int</literal>,
2846 their selector functions actually have different types:
2847
2848 <programlisting>
2849   num :: Term Int -> Term Int
2850   arg :: Term Bool -> Term Int
2851 </programlisting>
2852 </para></listitem>
2853
2854 <listitem><para>
2855 When pattern-matching against data constructors drawn from a GADT,
2856 for example in a <literal>case</literal> expression, the following rules apply:
2857 <itemizedlist>
2858 <listitem><para>The type of the scrutinee must be rigid.</para></listitem>
2859 <listitem><para>The type of the entire <literal>case</literal> expression must be rigid.</para></listitem>
2860 <listitem><para>The type of any free variable mentioned in any of
2861 the <literal>case</literal> alternatives must be rigid.</para></listitem>
2862 </itemizedlist>
2863 A type is "rigid" if it is completely known to the compiler at its binding site.  The easiest
way to ensure that a variable has a rigid type is to give it a type signature.
2865 For more precise details see <ulink url="http://research.microsoft.com/%7Esimonpj/papers/gadt">
2866 Simple unification-based type inference for GADTs
2867 </ulink>. The criteria implemented by GHC are given in the Appendix.
2868
2869 </para></listitem>
2870
2871 </itemizedlist>
2872 </para>
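<para>
The following sketch (assuming <option>-XGADTs</option>) puts a record-syntax <literal>Term</literal> to
work, written with the <literal>Con :: { fields } -> T</literal> constructor form used for
<literal>Person</literal> in <xref linkend="gadt-style"/>. Because <literal>eval</literal> has a type
signature, both the scrutinee <literal>t</literal> and the result of the <literal>case</literal>
expression have rigid types, as the rules above require. The sample terms in <literal>main</literal>
are illustrative:
<programlisting>
{-# LANGUAGE GADTs #-}
module Main where

data Term a where
    Lit    :: { val  :: Int }      -> Term Int
    Succ   :: { num  :: Term Int } -> Term Int
    Pred   :: { num  :: Term Int } -> Term Int
    IsZero :: { arg  :: Term Int } -> Term Bool
    Pair   :: { arg1 :: Term a
              , arg2 :: Term b
              }                    -> Term (a,b)
    If     :: { cnd  :: Term Bool
              , tru  :: Term a
              , fls  :: Term a
              }                    -> Term a

-- t (the scrutinee) and the result type a are both fixed by the signature.
eval :: Term a -> a
eval t = case t of
    Lit i    -> i
    Succ n   -> eval n + 1
    Pred n   -> eval n - 1
    IsZero n -> eval n == 0
    Pair x y -> (eval x, eval y)
    If c x y -> if eval c then eval x else eval y

main :: IO ()
main = do
  print (eval (num (Succ (Lit 41))))          -- num :: Term Int -> Term Int
  print (eval (arg (IsZero (Pred (Lit 1)))))  -- arg :: Term Bool -> Term Int
  print (eval (If (IsZero (Pred (Lit 1))) (Lit 10) (Lit 20)))
</programlisting>
</para>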
2873
2874 </sect2>
2875 </sect1>
2876
2877 <!-- ====================== End of Generalised algebraic data types =======================  -->
2878
2879 <sect1 id="deriving">
2880 <title>Extensions to the "deriving" mechanism</title>
2881
2882 <sect2 id="deriving-inferred">
2883 <title>Inferred context for deriving clauses</title>
2884
2885 <para>
2886 The Haskell Report is vague about exactly when a <literal>deriving</literal> clause is
2887 legal.  For example:
2888 <programlisting>
2889   data T0 f a = MkT0 a         deriving( Eq )
2890   data T1 f a = MkT1 (f a)     deriving( Eq )
2891   data T2 f a = MkT2 (f (f a)) deriving( Eq )
2892 </programlisting>
2893 The natural generated <literal>Eq</literal> code would result in these instance declarations:
2894 <programlisting>
2895   instance Eq a         => Eq (T0 f a) where ...
2896   instance Eq (f a)     => Eq (T1 f a) where ...
2897   instance Eq (f (f a)) => Eq (T2 f a) where ...
2898 </programlisting>
2899 The first of these is obviously fine. The second is still fine, although less obviously.
2900 The third is not Haskell 98, and risks losing termination of instances.
2901 </para>
2902 <para>
GHC takes a conservative position: it accepts the first two, but not the third.  The rule is this:
2904 each constraint in the inferred instance context must consist only of type variables,
2905 with no repetitions.
2906 </para>
2907 <para>
2908 This rule is applied regardless of flags.  If you want a more exotic context, you can write
2909 it yourself, using the <link linkend="stand-alone-deriving">standalone deriving mechanism</link>.
2910 </para>
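<para>
For instance, the third declaration above can be given an <literal>Eq</literal> instance by writing the
context out by hand. The following is a sketch; since the context <literal>Eq (f (f a))</literal> is not
of the restricted form that instance declarations normally require, it assumes
<option>-XFlexibleContexts</option> and <option>-XUndecidableInstances</option>
(see <xref linkend="instance-decls"/>) in addition to <option>-XStandaloneDeriving</option>.
The values compared in <literal>main</literal> are illustrative:
<programlisting>
{-# LANGUAGE StandaloneDeriving, FlexibleContexts, UndecidableInstances #-}
module Main where

data T2 f a = MkT2 (f (f a))

-- The context GHC declines to infer, supplied explicitly.
deriving instance Eq (f (f a)) => Eq (T2 f a)

main :: IO ()
main = do
  print (MkT2 (Just (Just 'x')) == MkT2 (Just (Just 'x')))
  print (MkT2 [[1 :: Int]] == MkT2 [[2]])
</programlisting>
</para>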
2911 </sect2>
2912
2913 <sect2 id="stand-alone-deriving">
2914 <title>Stand-alone deriving declarations</title>
2915
2916 <para>
GHC now allows stand-alone <literal>deriving</literal> declarations, enabled by <option>-XStandaloneDeriving</option>:
2918 <programlisting>
2919   data Foo a = Bar a | Baz String
2920
2921   deriving instance Eq a => Eq (Foo a)
2922 </programlisting>
2923 The syntax is identical to that of an ordinary instance declaration apart from (a) the keyword
2924 <literal>deriving</literal>, and (b) the absence of the <literal>where</literal> part.
2925 Note the following points:
2926 <itemizedlist>
2927 <listitem><para>
2928 You must supply an explicit context (in the example the context is <literal>(Eq a)</literal>),
2929 exactly as you would in an ordinary instance declaration.
2930 (In contrast, in a <literal>deriving</literal> clause
2931 attached to a data type declaration, the context is inferred.)
2932 </para></listitem>
2933
2934 <listitem><para>
2935 A <literal>deriving instance</literal> declaration
2936 must obey the same rules concerning form and termination as ordinary instance declarations,
2937 controlled by the same flags; see <xref linkend="instance-decls"/>.
2938 </para></listitem>
2939
2940 <listitem><para>
2941 Unlike a <literal>deriving</literal>
2942 declaration attached to a <literal>data</literal> declaration, the instance can be more specific
2943 than the data type (assuming you also use
2944 <literal>-XFlexibleInstances</literal>, <xref linkend="instance-rules"/>).  Consider
2945 for example
2946 <programlisting>
2947   data Foo a = Bar a | Baz String
2948
2949   deriving instance Eq a => Eq (Foo [a])
2950   deriving instance Eq a => Eq (Foo (Maybe a))
2951 </programlisting>
2952 This will generate a derived instance for <literal>(Foo [a])</literal> and <literal>(Foo (Maybe a))</literal>,
2953 but other types such as <literal>(Foo (Int,Bool))</literal> will not be an instance of <literal>Eq</literal>.
2954 </para></listitem>
2955
2956 <listitem><para>
2957 Unlike a <literal>deriving</literal>
2958 declaration attached to a <literal>data</literal> declaration,
2959 GHC does not restrict the form of the data type.  Instead, GHC simply generates the appropriate
2960 boilerplate code for the specified class, and typechecks it. If there is a type error, it is
2961 your problem. (GHC will show you the offending code if it has a type error.)
2962 The merit of this is that you can derive instances for GADTs and other exotic
2963 data types, providing only that the boilerplate code does indeed typecheck.  For example:
2964 <programlisting>
2965   data T a where
2966      T1 :: T Int
2967      T2 :: T Bool
2968
2969   deriving instance Show (T a)
2970 </programlisting>
2971 In this example, you cannot say <literal>... deriving( Show )</literal> on the
2972 data type declaration for <literal>T</literal>,
2973 because <literal>T</literal> is a GADT, but you <emphasis>can</emphasis> generate
2974 the instance declaration using stand-alone deriving.
2975 </para>
2976 </listitem>
2977
2978 <listitem>
2979 <para>The stand-alone syntax is generalised for newtypes in exactly the same
2980 way that ordinary <literal>deriving</literal> clauses are generalised (<xref linkend="newtype-deriving"/>).
2981 For example:
2982 <programlisting>
2983   newtype Foo a = MkFoo (State Int a)
2984
2985   deriving instance MonadState Int Foo
2986 </programlisting>
2987 GHC always treats the <emphasis>last</emphasis> parameter of the instance
2988 (<literal>Foo</literal> in this example) as the type whose instance is being derived.
2989 </para></listitem>
2990 </itemizedlist></para>
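<para>
The examples above can be combined into a single module, as in this sketch. It assumes
<option>-XFlexibleInstances</option> (for the instances at <literal>Foo [a]</literal> and
<literal>Foo (Maybe a)</literal>) and <option>-XGADTs</option> (for <literal>T</literal>); the expressions
in <literal>main</literal> are illustrative. Comparing two values of type <literal>Foo (Int,Bool)</literal>
in the same module would be a type error, since no <literal>Eq</literal> instance exists at that type:
<programlisting>
{-# LANGUAGE StandaloneDeriving, FlexibleInstances, GADTs #-}
module Main where

data Foo a = Bar a | Baz String

-- Derived instances only for these two argument shapes.
deriving instance Eq a => Eq (Foo [a])
deriving instance Eq a => Eq (Foo (Maybe a))

-- A GADT: the Show instance is derived stand-alone.
data T a where
   T1 :: T Int
   T2 :: T Bool

deriving instance Show (T a)

main :: IO ()
main = do
  print (Bar [1, 2 :: Int] == Bar [1, 2])
  print (Baz "x" == (Baz "y" :: Foo (Maybe Char)))
  print T1
  print T2
</programlisting>
</para>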
2991
2992 </sect2>
2993
2994
2995 <sect2 id="deriving-typeable">
2996 <title>Deriving clause for extra classes (<literal>Typeable</literal>, <literal>Data</literal>, etc)</title>
2997
2998 <para>
2999 Haskell 98 allows the programmer to add "<literal>deriving( Eq, Ord )</literal>" to a data type
3000 declaration, to generate a standard instance declaration for classes specified in the <literal>deriving</literal> clause.
3001 In Haskell 98, the only classes that may appear in the <literal>deriving</literal> clause are the standard
3002 classes <literal>Eq</literal>, <literal>Ord</literal>,
3003 <literal>Enum</literal>, <literal>Ix</literal>, <literal>Bounded</literal>, <literal>Read</literal>, and <literal>Show</literal>.
3004 </para>
3005 <para>
3006 GHC extends this list with several more classes that may be automatically derived:
3007 <itemizedlist>
3008 <listitem><para> With <option>-XDeriveDataTypeable</option>, you can derive instances of the classes
<literal>Typeable</literal> and <literal>Data</literal>, defined in the library
3010 modules <literal>Data.Typeable</literal> and <literal>Data.Generics</literal> respectively.
3011 </para>
3012 <para>An instance of <literal>Typeable</literal> can only be derived if the
3013 data type has seven or fewer type parameters, all of kind <literal>*</literal>.
3014 The reason for this is that the <literal>Typeable</literal> class is derived using the scheme
3015 described in
3016 <ulink url="http://research.microsoft.com/%7Esimonpj/papers/hmap/gmap2.ps">
3017 Scrap More Boilerplate: Reflection, Zips, and Generalised Casts
3018 </ulink>.
3019 (Section 7.4 of the paper describes the multiple <literal>Typeable</literal> classes that
3020 are used, and only <literal>Typeable1</literal> up to
3021 <literal>Typeable7</literal> are provided in the library.)
In other cases, there is nothing to stop the programmer writing a <literal>TypeableX</literal>
3023 class, whose kind suits that of the data type constructor, and
3024 then writing the data type instance by hand.
3025 </para>
3026 </listitem>
3027
3028 <listitem><para> With <option>-XDeriveFunctor</option>, you can derive instances of
3029 the class <literal>Functor</literal>,
3030 defined in <literal>GHC.Base</literal>.
3031 </para></listitem>
3032
3033 <listitem><para> With <option>-XDeriveFoldable</option>, you can derive instances of
3034 the class <literal>Foldable</literal>,
3035 defined in <literal>Data.Foldable</literal>.
3036 </para></listitem>
3037
3038 <listitem><para> With <option>-XDeriveTraversable</option>, you can derive instances of
3039 the class <literal>Traversable</literal>,
3040 defined in <literal>Data.Traversable</literal>.
3041 </para></listitem>
3042 </itemizedlist>
3043 In each case the appropriate class must be in scope before it
3044 can be mentioned in the <literal>deriving</literal> clause.
3045 </para>
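<para>
As a sketch of the last three flags, the following module derives <literal>Functor</literal>,
<literal>Foldable</literal> and <literal>Traversable</literal> for a hypothetical binary-tree type
<literal>Tree</literal>, importing the classes explicitly so that they are in scope. The helper
<literal>checkPositive</literal> is likewise illustrative:
<programlisting>
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
module Main where

import Data.Foldable (Foldable, toList)
import Data.Traversable (Traversable, traverse)

-- Hypothetical example type.
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Show, Functor, Foldable, Traversable)

tree :: Tree Int
tree = Node (Node Leaf 1 Leaf) 2 (Node Leaf 3 Leaf)

-- Traversable: yields Nothing if any element is not positive.
checkPositive :: Tree Int -> Maybe (Tree Int)
checkPositive = traverse (\x -> if x > 0 then Just x else Nothing)

main :: IO ()
main = do
  print (fmap (* 10) tree)                 -- Functor: maps over every element
  print (toList tree)                      -- Foldable: elements in field order
  print (fmap toList (checkPositive tree))
</programlisting>
</para>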
3046 </sect2>
3047
3048 <sect2 id="newtype-deriving">
3049 <title>Generalised derived instances for newtypes</title>
3050
3051 <para>
3052 When you define an abstract type using <literal>newtype</literal>, you may want
3053 the new type to inherit some instances from its representation. In
3054 Haskell 98, you can inherit instances of <literal>Eq</literal>, <literal>Ord</literal>,
3055 <literal>Enum</literal> and <literal>Bounded</literal> by deriving them, but for any
3056 other classes you have to write an explicit instance declaration. For
3057 example, if you define
3058
3059 <programlisting>
3060   newtype Dollars = Dollars Int
3061 </programlisting>
3062
3063 and you want to use arithmetic on <literal>Dollars</literal>, you have to
3064 explicitly define an instance of <literal>Num</literal>:
3065
3066 <programlisting>
3067   instance Num Dollars where
3068     Dollars a + Dollars b = Dollars (a+b)
3069     ...
3070 </programlisting>
3071 All the instance does is apply and remove the <literal>newtype</literal>
3072 constructor. It is particularly galling that, since the constructor
3073 doesn't appear at run-time, this instance declaration defines a
3074 dictionary which is <emphasis>wholly equivalent</emphasis> to the <literal>Int</literal>
3075 dictionary, only slower!
3076 </para>
3077
3078
3079 <sect3> <title> Generalising the deriving clause </title>
3080 <para>
3081 GHC now permits such instances to be derived instead,
3082 using the flag <option>-XGeneralizedNewtypeDeriving</option>,
3083 so one can write
3084 <programlisting>
3085   newtype Dollars = Dollars Int deriving (Eq,Show,Num)
3086 </programlisting>
3087
3088 and the implementation uses the <emphasis>same</emphasis> <literal>Num</literal> dictionary
3089 for <literal>Dollars</literal> as for <literal>Int</literal>. Notionally, the compiler
3090 derives an instance declaration of the form
3091
3092 <programlisting>
3093   instance Num Int => Num Dollars
3094 </programlisting>
3095
3096 which just adds or removes the <literal>newtype</literal> constructor according to the type.
3097 </para>
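<para>
A minimal sketch of the <literal>Dollars</literal> example with
<option>-XGeneralizedNewtypeDeriving</option> follows; the function <literal>total</literal> is
hypothetical. The literal <literal>0</literal> and the <literal>+</literal> in <literal>total</literal>
use the <literal>Num</literal> instance inherited from <literal>Int</literal>, while <literal>Eq</literal>
and <literal>Show</literal> are derived in the ordinary way:
<programlisting>
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main where

-- Num reuses Int's instance; Eq and Show are derived as usual.
newtype Dollars = Dollars Int deriving (Eq,Show,Num)

-- Hypothetical helper: 0 and (+) come from the derived Num instance.
total :: [Dollars] -> Dollars
total = foldr (+) 0

main :: IO ()
main = print (total [Dollars 3, Dollars 4, 5])
</programlisting>
Running it should print <literal>Dollars 12</literal>.
</para>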
3098 <para>
3099
3100 We can also derive instances of constructor classes in a similar
3101 way. For example, suppose we have implemented state and failure monad
3102 transformers, such that
3103
3104 <programlisting>
3105   instance Monad m => Monad (State s m)
3106   instance Monad m => Monad (Failure m)
3107 </programlisting>
3108 In Haskell 98, we can define a parsing monad by
3109 <programlisting>
3110   type Parser tok m a = State [tok] (Failure m) a
3111 </programlisting>
3112
3113 which is automatically a monad thanks to the instance declarations
3114 above. With the extension, we can make the parser type abstract,
3115 without needing to write an instance of class <literal>Monad</literal>, via
3116
3117 <programlisting>
3118   newtype Parser tok m a = Parser (State [tok] (Failure m) a)
3119                          deriving Monad
3120 </programlisting>
3121 In this case the derived instance declaration is of the form
3122 <programlisting>
3123   instance Monad (State [tok] (Failure m)) => Monad (Parser tok m)
3124 </programlisting>
3125
3126 Notice that, since <literal>Monad</literal> is a constructor class, the
3127 instance is a <emphasis>partial application</emphasis> of the new type, not the
3128 entire left hand side. We can imagine that the type declaration is
3129 "eta-converted" to generate the context of the instance
3130 declaration.
3131 </para>
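<para>
The following self-contained sketch makes the parser example concrete. The transformers <literal>State</literal> and <literal>Failure</literal>, the primitive <literal>item</literal>, and the runners <literal>runParser</literal> and <literal>parse</literal> are hypothetical definitions written only for this illustration; they are not library types. Because current versions of GHC require <literal>Functor</literal> and <literal>Applicative</literal> instances as superclasses of <literal>Monad</literal>, those classes are derived as well:
<programlisting>
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Applicative (liftA2)
import Control.Monad (liftM, liftM2)
import Data.Functor.Identity (Identity (..))

-- Hypothetical state transformer: threads a state of type s.
newtype State s m a = State { runState :: s -> m (a, s) }

instance Monad m => Functor (State s m) where
  fmap = liftM

instance Monad m => Applicative (State s m) where
  pure a = State (\s -> return (a, s))
  liftA2 = liftM2

instance Monad m => Monad (State s m) where
  State g >>= k = State (\s -> g s >>= \(a, s') -> runState (k a) s')

-- Hypothetical failure transformer: Nothing signals failure.
newtype Failure m a = Failure { runFailure :: m (Maybe a) }

instance Monad m => Functor (Failure m) where
  fmap = liftM

instance Monad m => Applicative (Failure m) where
  pure a = Failure (return (Just a))
  liftA2 = liftM2

instance Monad m => Monad (Failure m) where
  Failure m >>= k = Failure (m >>= maybe (return Nothing) (runFailure . k))

-- The abstract parser type.  Its Functor, Applicative and Monad
-- instances all reuse the dictionaries of State [tok] (Failure m).
newtype Parser tok m a = Parser (State [tok] (Failure m) a)
  deriving (Functor, Applicative, Monad)

-- Consume one token, failing at the end of the input.
item :: Monad m => Parser tok m tok
item = Parser (State next)
  where
    next []       = Failure (return Nothing)
    next (t:rest) = Failure (return (Just (t, rest)))

-- Run a parser, returning the result and the unconsumed input.
runParser :: Parser tok m a -> [tok] -> m (Maybe (a, [tok]))
runParser (Parser p) ts = runFailure (runState p ts)

-- Convenience runner for the pure case (m = Identity).
parse :: Parser tok Identity a -> [tok] -> Maybe (a, [tok])
parse p = runIdentity . runParser p
</programlisting>
For example, <literal>parse (item >> item) "a"</literal> returns <literal>Nothing</literal>, because the second <literal>item</literal> finds no input left.
</para>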
3132 <para>
3133
3134 We can even derive instances of multi-parameter classes, provided the
newtype is the last class parameter. In this case, a &ldquo;partial
application&rdquo; of the class appears in the <literal>deriving</literal>
3137 clause. For example, given the class
3138
3139 <programlisting>
3140   class StateMonad s m | m -> s where ...
3141   instance Monad m => StateMonad s (State s m) where ...
3142 </programlisting>
3143 then we can derive an instance of <literal>StateMonad</literal> for <literal>Parser</literal>s by
3144 <programlisting>
3145   newtype Parser tok m a = Parser (State [tok] (Failure m) a)
3146                          deriving (Monad, StateMonad [tok])
3147 </programlisting>
3148
3149 The derived instance is obtained by completing the application of the
3150 class to the new type:
3151
3152 <programlisting>
3153   instance StateMonad [tok] (State [tok] (Failure m)) =>
3154            StateMonad [tok] (Parser tok m)
3155 </programlisting>
3156 </para>
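<para>
Here is a similarly hedged sketch of deriving a multi-parameter class. To keep it short it drops the <literal>Failure</literal> layer and runs <literal>State</literal> over <literal>Identity</literal>; the <literal>State</literal> type, the methods <literal>getS</literal> and <literal>putS</literal>, and <literal>runParser</literal> are hypothetical names for this example only:
<programlisting>
{-# LANGUAGE GeneralizedNewtypeDeriving, MultiParamTypeClasses,
             FunctionalDependencies, FlexibleInstances #-}

import Control.Applicative (liftA2)
import Control.Monad (liftM, liftM2)
import Data.Functor.Identity (Identity (..))

-- Hypothetical state transformer: threads a state of type s.
newtype State s m a = State { runState :: s -> m (a, s) }

instance Monad m => Functor (State s m) where
  fmap = liftM

instance Monad m => Applicative (State s m) where
  pure a = State (\s -> return (a, s))
  liftA2 = liftM2

instance Monad m => Monad (State s m) where
  State g >>= k = State (\s -> g s >>= \(a, s') -> runState (k a) s')

-- The class from the text; getS and putS are hypothetical methods.
class StateMonad s m | m -> s where
  getS :: m s
  putS :: s -> m ()

instance Monad m => StateMonad s (State s m) where
  getS   = State (\s -> return (s, s))
  putS s = State (\_ -> return ((), s))

-- A simplified parser (no failure layer).  "StateMonad [tok]" is a
-- partial application of the class; the newtype fills the last slot.
newtype Parser tok a = Parser (State [tok] Identity a)
  deriving (Functor, Applicative, Monad, StateMonad [tok])

runParser :: Parser tok a -> [tok] -> (a, [tok])
runParser (Parser p) ts = runIdentity (runState p ts)
</programlisting>
Evaluating <literal>runParser (putS "xy" >> getS) "abc"</literal> gives <literal>("xy","xy")</literal>; the functional dependency lets GHC work out that the state of a <literal>Parser Char</literal> is a <literal>String</literal>.
</para>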
3157 <para>
3158
As a result of this extension, all derived instances in newtype
declarations are treated uniformly (and implemented just by reusing
the dictionary for the representation type), <emphasis>except</emphasis>
<literal>Show</literal> and <literal>Read</literal>, which really behave differently for
the newtype and its representation (the same holds for <literal>Typeable</literal>
and <literal>Data</literal>, as described below).
3164 </para>
3165 </sect3>
3166
3167 <sect3> <title> A more precise specification </title>
3168 <para>
3169 Derived instance declarations are constructed as follows. Consider the
3170 declaration (after expansion of any type synonyms)
3171
3172 <programlisting>
3173   newtype T v1...vn = T' (t vk+1...vn) deriving (c1...cm)
3174 </programlisting>
3175
3176 where
3177  <itemizedlist>
3178 <listitem><para>
3179   The <literal>ci</literal> are partial applications of
3180   classes of the form <literal>C t1'...tj'</literal>, where the arity of <literal>C</literal>
3181   is exactly <literal>j+1</literal>.  That is, <literal>C</literal> lacks exactly one type argument.
3182 </para></listitem>
3183 <listitem><para>
3184   The <literal>k</literal> is chosen so that <literal>ci (T v1...vk)</literal> is well-kinded.
3185 </para></listitem>
3186 <listitem><para>
3187   The type <literal>t</literal> is an arbitrary type.
3188 </para></listitem>
3189 <listitem><para>
3190   The type variables <literal>vk+1...vn</literal> do not occur in <literal>t</literal>,
3191   nor in the <literal>ci</literal>, and
3192 </para></listitem>
3193 <listitem><para>
3194   None of the <literal>ci</literal> is <literal>Read</literal>, <literal>Show</literal>,
3195                 <literal>Typeable</literal>, or <literal>Data</literal>.  These classes
3196                 should not "look through" the type or its constructor.  You can still
3197                 derive these classes for a newtype, but it happens in the usual way, not
3198                 via this new mechanism.
3199 </para></listitem>
3200 </itemizedlist>
3201 Then, for each <literal>ci</literal>, the derived instance
3202 declaration is:
3203 <programlisting>
3204   instance ci t => ci (T v1...vk)
3205 </programlisting>
3206 As an example which does <emphasis>not</emphasis> work, consider
3207 <programlisting>
3208   newtype NonMonad m s = NonMonad (State s m s) deriving Monad
3209 </programlisting>
3210 Here we cannot derive the instance
3211 <programlisting>
3212   instance Monad (State s m) => Monad (NonMonad m)
3213 </programlisting>
3214
3215 because the type variable <literal>s</literal> occurs in <literal>State s m</literal>,
3216 and so cannot be "eta-converted" away. It is a good thing that this
<literal>deriving</literal> clause is rejected, because <literal>NonMonad m</literal> is
not, in fact, a monad, for the same reason. Try defining
<literal>&gt;&gt;=</literal> with the correct type: you won't be able to.
3220 </para>
3221 <para>
3222
3223 Notice also that the <emphasis>order</emphasis> of class parameters becomes
3224 important, since we can only derive instances for the last one. If the
3225 <literal>StateMonad</literal> class above were instead defined as
3226
3227 <programlisting>
3228   class StateMonad m s | m -> s where ...
3229 </programlisting>
3230
3231 then we would not have been able to derive an instance for the
3232 <literal>Parser</literal> type above. We hypothesise that multi-parameter
3233 classes usually have one "main" parameter for which deriving new
3234 instances is most interesting.
3235 </para>
<para>Lastly, all of this applies only to classes other than
<literal>Read</literal>, <literal>Show</literal>, <literal>Typeable</literal>,
and <literal>Data</literal>, for which the built-in derivation applies (section
4.3.3 of the Haskell Report).
3240 (For the standard classes <literal>Eq</literal>, <literal>Ord</literal>,
3241 <literal>Ix</literal>, and <literal>Bounded</literal> it is immaterial whether
3242 the standard method is used or the one described here.)
3243 </para>
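<para>
The following sketch, using a hypothetical user-defined class <literal>Describe</literal>, shows both kinds of derivation side by side in one <literal>deriving</literal> clause:
<programlisting>
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- A hypothetical user-defined class with an instance for Int.
class Describe a where
  describe :: a -> String

instance Describe Int where
  describe n = "the number " ++ show n

-- Describe goes through the newtype mechanism and reuses the Int
-- dictionary.  Show is derived by the standard rules, so it mentions
-- the Age constructor.  For Eq and Ord either method gives the same
-- result.
newtype Age = Age Int deriving (Eq, Ord, Show, Describe)
</programlisting>
Here <literal>describe (Age 3)</literal> is <literal>"the number 3"</literal>, exactly as for the underlying <literal>Int</literal>, whereas <literal>show (Age 3)</literal> is <literal>"Age 3"</literal>.
</para>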
3244 </sect3>
3245 </sect2>
3246 </sect1>
3247
3248
3249 <!-- TYPE SYSTEM EXTENSIONS -->
3250 <sect1 id="type-class-extensions">
3251 <title>Class and instances declarations</title>
3252
3253 <sect2 id="multi-param-type-classes">
3254 <title>Class declarations</title>
3255
3256 <para>
3257 This section, and the next one, documents GHC's type-class extensions.
3258 There's lots of background in the paper <ulink
3259 url="http://research.microsoft.com/~simonpj/Papers/type-class-design-space/">Type
3260 classes: exploring the design space</ulink> (Simon Peyton Jones, Mark
3261 Jones, Erik Meijer).
3262 </para>
3263 <para>
3264 All the extensions are enabled by the <option>-fglasgow-exts</option> flag.
3265 </para>
3266
3267 <sect3>
3268 <title>Multi-parameter type classes</title>
3269 <para>
3270 Multi-parameter type classes are permitted, with flag <option>-XMultiParamTypeClasses</option>.
3271 For example:
3272
3273
3274 <programlisting>
3275   class Collection c a where
3276     union :: c a -> c a -> c a
3277     ...etc.
3278 </programlisting>
3279
3280 </para>
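<para>
A small runnable variant of this class is sketched below. The extra method <literal>cinsert</literal> is a hypothetical addition for illustration, and the list instance treats a list as a collection without duplicates. The instance head mentions a bare type variable, so this sketch also assumes <option>-XFlexibleInstances</option>:
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- The Collection class, plus a hypothetical cinsert method.
class Collection c a where
  union   :: c a -> c a -> c a
  cinsert :: a -> c a -> c a

-- Lists as duplicate-free collections of any equality type.
instance Eq a => Collection [] a where
  union xs ys  = foldr cinsert ys xs
  cinsert x xs = if x `elem` xs then xs else x : xs
</programlisting>
With this instance, <literal>union [1,2,3] [2,3,4 :: Int]</literal> evaluates to <literal>[1,2,3,4]</literal>.
</para>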
3281 </sect3>
3282
3283 <sect3 id="superclass-rules">
3284 <title>The superclasses of a class declaration</title>
3285
3286 <para>
3287 In Haskell 98 the context of a class declaration (which introduces superclasses)
3288 must be simple; that is, each predicate must consist of a class applied to
3289 type variables.  The flag <option>-XFlexibleContexts</option>
3290 (<xref linkend="flexible-contexts"/>)
3291 lifts this restriction,
3292 so that the only restriction on the context in a class declaration is
3293 that the class hierarchy must be acyclic.  So these class declarations are OK:
3294
3295
3296 <programlisting>
3297   class Functor (m k) => FiniteMap m k where
3298     ...
3299
3300   class (Monad m, Monad (t m)) => Transform t m where
3301     lift :: m a -> (t m) a
3302 </programlisting>
3303
3304
3305 </para>
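<para>
As a sketch of a class whose superclass context is not simple, the <literal>Transform</literal> class can be given an instance for a hypothetical identity transformer <literal>IdT</literal>, whose monad instances are obtained with <option>-XGeneralizedNewtypeDeriving</option>. The head <literal>Transform IdT m</literal> mentions a bare type variable, so <option>-XFlexibleInstances</option> is assumed too:
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FlexibleContexts, FlexibleInstances,
             GeneralizedNewtypeDeriving #-}

-- The superclass context mentions (t m), not just type variables.
class (Monad m, Monad (t m)) => Transform t m where
  lift :: m a -> (t m) a

-- A hypothetical identity transformer; its Functor, Applicative and
-- Monad instances are obtained by newtype deriving from m.
newtype IdT m a = IdT { runIdT :: m a }
  deriving (Functor, Applicative, Monad)

instance Monad m => Transform IdT m where
  lift = IdT
</programlisting>
</para>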
3306 <para>
As in Haskell 98, the class hierarchy must be acyclic.  However, the definition
3308 of "acyclic" involves only the superclass relationships.  For example,
3309 this is OK:
3310
3311
3312 <programlisting>
3313   class C a where {
3314     op :: D b => a -> b -> b
3315   }
3316
3317   class C a => D a where { ... }
3318 </programlisting>
3319
3320
3321 Here, <literal>C</literal> is a superclass of <literal>D</literal>, but it's OK for a
3322 class operation <literal>op</literal> of <literal>C</literal> to mention <literal>D</literal>.  (It
3323 would not be OK for <literal>D</literal> to be a superclass of <literal>C</literal>.)
3324 </para>
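<para>
The acyclicity rule can be checked with a small compilable sketch of the <literal>C</literal>/<literal>D</literal> example. The method <literal>bump</literal> and the <literal>Int</literal> instances are hypothetical additions, and no flags are needed:
<programlisting>
-- op in class C mentions class D, and C is a superclass of D.
-- This is accepted: only superclass edges count towards cycles.
class C a where
  op :: D b => a -> b -> b

class C a => D a where
  bump :: a -> a          -- hypothetical method, for illustration

-- op n applies bump n times.
instance C Int where
  op n b = iterate bump b !! n

instance D Int where
  bump = (+ 1)
</programlisting>
Here <literal>op (3 :: Int) (10 :: Int)</literal> applies <literal>bump</literal> three times and yields <literal>13</literal>.
</para>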
3325 </sect3>
3326
3327
3328
3329
3330 <sect3 id="class-method-types">
3331 <title>Class method types</title>
3332
3333 <para>
Haskell 98 prohibits class method types from mentioning constraints on the
class type variable, thus:
3336 <programlisting>
3337   class Seq s a where
3338     fromList :: [a] -> s a
3339     elem     :: Eq a => a -> s a -> Bool
3340 </programlisting>
The type of <literal>elem</literal> is illegal in Haskell 98, because it
contains the constraint <literal>Eq a</literal>, which constrains only the
class type variable (in this case <literal>a</literal>).
3344 GHC lifts this restriction (flag <option>-XConstrainedClassMethods</option>).
3345 </para>
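<para>
A compilable version of the <literal>Seq</literal> example, with a hypothetical list instance, might look as follows. The Prelude's <literal>elem</literal> is hidden so that the class method can keep its name, and <option>-XFlexibleInstances</option> is assumed for the instance head <literal>Seq [] a</literal>:
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, ConstrainedClassMethods,
             FlexibleInstances #-}

import Prelude hiding (elem)

class Seq s a where
  fromList :: [a] -> s a
  elem     :: Eq a => a -> s a -> Bool   -- constrains the class variable a

-- Lists are sequences of any element type; only elem needs Eq.
instance Seq [] a where
  fromList = id
  elem x   = any (== x)
</programlisting>
</para>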
3346
3347
3348 </sect3>
3349 </sect2>
3350
3351 <sect2 id="functional-dependencies">
3352 <title>Functional dependencies
3353 </title>
3354
3355 <para> Functional dependencies are implemented as described by Mark Jones
3356 in &ldquo;<ulink url="http://citeseer.ist.psu.edu/jones00type.html">Type Classes with Functional Dependencies</ulink>&rdquo;, Mark P. Jones,
in Proceedings of the 9th European Symposium on Programming,
ESOP 2000, Berlin, Germany, March 2000, Springer-Verlag LNCS 1782.
3360 </para>
3361 <para>
3362 Functional dependencies are introduced by a vertical bar in the syntax of a
3363 class declaration;  e.g.
3364 <programlisting>
3365   class (Monad m) => MonadState s m | m -> s where ...
3366
3367   class Foo a b c | a b -> c where ...
3368 </programlisting>
3369 There should be more documentation, but there isn't (yet).  Yell if you need it.
3370 </para>
3371
3372 <sect3><title>Rules for functional dependencies </title>
3373 <para>
3374 In a class declaration, all of the class type variables must be reachable (in the sense
3375 mentioned in <xref linkend="flexible-contexts"/>)
3376 from the free variables of each method type.
3377 For example:
3378
3379 <programlisting>
3380   class Coll s a where
3381     empty  :: s
3382     insert :: s -> a -> s
3383 </programlisting>
3384
3385 is not OK, because the type of <literal>empty</literal> doesn't mention
3386 <literal>a</literal>.  Functional dependencies can make the type variable
3387 reachable:
3388 <programlisting>
3389   class Coll s a | s -> a where
3390     empty  :: s
3391     insert :: s -> a -> s
3392 </programlisting>
3393
Alternatively, <literal>Coll</literal> might be rewritten as
3395
3396 <programlisting>
3397   class Coll s a where
3398     empty  :: s a
3399     insert :: s a -> a -> s a
3400 </programlisting>
3401
3402
3403 which makes the connection between the type of a collection of
3404 <literal>a</literal>'s (namely <literal>(s a)</literal>) and the element type <literal>a</literal>.
3405 Occasionally this really doesn't work, in which case you can split the
3406 class like this:
3407
3408
3409 <programlisting>
3410   class CollE s where
3411     empty  :: s
3412
3413   class CollE s => Coll s a where
3414     insert :: s -> a -> s
3415 </programlisting>
3416 </para>
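<para>
For example, the version of <literal>Coll</literal> with the functional dependency can be given an instance for lists. This is a minimal sketch; the list instance is a hypothetical illustration, and <option>-XFlexibleInstances</option> is assumed because the head <literal>Coll [a] a</literal> repeats a type variable:
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
             FlexibleInstances #-}

-- With s -> a, the type of empty (which mentions only s) is no
-- longer ambiguous: s determines a.
class Coll s a | s -> a where
  empty  :: s
  insert :: s -> a -> s

-- Lists as collections: the element type is fixed by the list type.
instance Coll [a] a where
  empty      = []
  insert s x = x : s
</programlisting>
The expression <literal>insert (insert empty 'x') 'y'</literal>, used at type <literal>String</literal>, evaluates to <literal>"yx"</literal>: the dependency fixes the element type once the collection type is known.
</para>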
3417 </sect3>
3418
3419
3420 <sect3>
3421 <title>Background on functional dependencies</title>
3422
3423 <para>The following description of the motivation and use of functional dependencies is taken
3424 from the Hugs user manual, reproduced here (with minor changes) by kind
3425 permission of Mark Jones.
3426 </para>
3427 <para>
3428 Consider the following class, intended as part of a
3429 library for collection types:
3430 <programlisting>
3431    class Collects e ce where
3432        empty  :: ce
3433        insert :: e -> ce -> ce
3434        member :: e -> ce -> Bool
3435 </programlisting>
3436 The type variable e used here represents the element type, while ce is the type
3437 of the container itself. Within this framework, we might want to define
3438 instances of this class for lists or characteristic functions (both of which
3439 can be used to represent collections of any equality type), bit sets (which can
3440 be used to represent collections of characters), or hash tables (which can be
3441 used to represent any collection whose elements have a hash function). Omitting
3442 standard implementation details, this would lead to the following declarations:
3443 <programlisting>
3444    instance Eq e => Collects e [e] where ...
3445    instance Eq e => Collects e (e -> Bool) where ...
3446    instance Collects Char BitSet where ...
   instance (Hashable e, Collects e ce)
3448               => Collects e (Array Int ce) where ...
3449 </programlisting>
3450 All this looks quite promising; we have a class and a range of interesting
3451 implementations. Unfortunately, there are some serious problems with the class
3452 declaration. First, the empty function has an ambiguous type:
3453 <programlisting>
3454    empty :: Collects e ce => ce
3455 </programlisting>
3456 By "ambiguous" we mean that there is a type variable e that appears on the left
3457 of the <literal>=&gt;</literal> symbol, but not on the right. The problem with
3458 this is that, according to the theoretical foundations of Haskell overloading,
3459 we cannot guarantee a well-defined semantics for any term with an ambiguous
3460 type.
3461 </para>
3462 <para>
3463 We can sidestep this specific problem by removing the empty member from the
3464 class declaration. However, although the remaining members, insert and member,
3465 do not have ambiguous types, we still run into problems when we try to use
3466 them. For example, consider the following two functions:
3467 <programlisting>
3468    f x y = insert x . insert y
3469    g     = f True 'a'
3470 </programlisting>
3471 for which GHC infers the following types:
3472 <programlisting>
3473    f :: (Collects a c, Collects b c) => a -> b -> c -> c
3474    g :: (Collects Bool c, Collects Char c) => c -> c
3475 </programlisting>
3476 Notice that the type for f allows the two parameters x and y to be assigned
3477 different types, even though it attempts to insert each of the two values, one
3478 after the other, into the same collection. If we're trying to model collections
3479 that contain only one type of value, then this is clearly an inaccurate
3480 type. Worse still, the definition for g is accepted, without causing a type
3481 error. As a result, the error in this code will not be flagged at the point
3482 where it appears. Instead, it will show up only when we try to use g, which
3483 might even be in a different module.
3484 </para>
3485
3486 <sect4><title>An attempt to use constructor classes</title>
3487
3488 <para>
3489 Faced with the problems described above, some Haskell programmers might be
3490 tempted to use something like the following version of the class declaration:
3491 <programlisting>
3492    class Collects e c where
3493       empty  :: c e
3494       insert :: e -> c e -> c e
3495       member :: e -> c e -> Bool
3496 </programlisting>
3497 The key difference here is that we abstract over the type constructor c that is
3498 used to form the collection type c e, and not over that collection type itself,
3499 represented by ce in the original class declaration. This avoids the immediate
3500 problems that we mentioned above: empty has type <literal>Collects e c => c
3501 e</literal>, which is not ambiguous.
3502 </para>
3503 <para>
3504 The function f from the previous section has a more accurate type:
3505 <programlisting>
3506    f :: (Collects e c) => e -> e -> c e -> c e
3507 </programlisting>
3508 The function g from the previous section is now rejected with a type error as
3509 we would hope because the type of f does not allow the two arguments to have
3510 different types.
3511 This, then, is an example of a multiple parameter class that does actually work
3512 quite well in practice, without ambiguity problems.
3513 There is, however, a catch. This version of the Collects class is nowhere near
3514 as general as the original class seemed to be: only one of the four instances
3515 for <literal>Collects</literal>
given above can be used with this version of Collects, because only one of
them (the instance for lists) has a collection type that can be written in
the form c e, for some type constructor c and element type e.
3519 </para>
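<para>
A compilable sketch of this constructor-class version, with an illustrative instance for lists, is shown below. The signature given for <literal>f</literal> is the type described above; <option>-XFlexibleInstances</option> is assumed because the head <literal>Collects e []</literal> mentions a bare type variable:
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}

-- Abstracting over the type constructor c rather than over (c e).
class Collects e c where
  empty  :: c e
  insert :: e -> c e -> c e
  member :: e -> c e -> Bool

-- Lists are the only instance from the text that fits the (c e) shape.
instance Eq e => Collects e [] where
  empty  = []
  insert = (:)
  member = elem

-- Both arguments are now forced to have the same element type e.
f :: Collects e c => e -> e -> c e -> c e
f x y = insert x . insert y
</programlisting>
</para>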
3520 </sect4>
3521
3522 <sect4><title>Adding functional dependencies</title>
3523
3524 <para>
To get a more useful version of the Collects class, Hugs and GHC provide a mechanism
that allows programmers to specify dependencies between the parameters of a
multiple parameter class. (For readers with an interest in theoretical
foundations and previous work: the use of dependency information can be seen
either as a generalization of the proposal for &ldquo;parametric type classes&rdquo; that was
put forward by Chen, Hudak, and Odersky, or as a special case of Mark Jones's
later framework for &ldquo;improvement&rdquo; of qualified types. The
underlying ideas are also discussed in a more theoretical and abstract setting
in a manuscript [implparam], where they are identified as one point in a
general design space for systems of implicit parameterization.)
3535
3536 To start with an abstract example, consider a declaration such as:
3537 <programlisting>
3538    class C a b where ...
3539 </programlisting>
3540 which tells us simply that C can be thought of as a binary relation on types
3541 (or type constructors, depending on the kinds of a and b). Extra clauses can be
3542 included in the definition of classes to add information about dependencies
3543 between parameters, as in the following examples:
3544 <programlisting>
3545    class D a b | a -> b where ...
3546    class E a b | a -> b, b -> a where ...
3547 </programlisting>
The notation <literal>a -&gt; b</literal> used here between the | and where
symbols (not to be confused with a function type) indicates that the a parameter uniquely
determines the b parameter, and might be read as "a determines b." Thus D is
3552 not just a relation, but actually a (partial) function. Similarly, from the two
3553 dependencies that are included in the definition of E, we can see that E
3554 represents a (partial) one-one mapping between types.
3555 </para>
3556 <para>
3557 More generally, dependencies take the form <literal>x1 ... xn -&gt; y1 ... ym</literal>,
where x1, ..., xn, and y1, ..., ym are type variables with n&gt;0 and
m&gt;=0, meaning that the y parameters are uniquely determined by the x
parameters. Spaces can be used as separators if more than one variable appears
on any single side of a dependency, as in <literal>t -&gt; a b</literal>. Note that a class may be
annotated with multiple dependencies using commas as separators, as in the
definition of E above. Some dependencies that we can write in this notation are
redundant, and will be rejected because they don't serve any useful
purpose, and may instead indicate an error in the program. Examples of
dependencies like this include <literal>a -&gt; a</literal>,
<literal>a -&gt; a a</literal>,
<literal>a -&gt;</literal>, etc. There can also be
some redundancy if multiple dependencies are given, as in
<literal>a-&gt;b</literal>,
<literal>b-&gt;c</literal>, <literal>a-&gt;c</literal>,
in which some subset implies the remaining dependencies. Examples like this are
3573 not treated as errors. Note that dependencies appear only in class
3574 declarations, and not in any other part of the language. In particular, the
3575 syntax for instance declarations, class constraints, and types is completely
3576 unchanged.
3577 </para>
3578 <para>
3579 By including dependencies in a class declaration, we provide a mechanism for
3580 the programmer to specify each multiple parameter class more precisely. The
3581 compiler, on the other hand, is responsible for ensuring that the set of
3582 instances that are in scope at any given point in the program is consistent
3583 with any declared dependencies. For example, the following pair of instance
3584 declarations cannot appear together in the same scope because they violate the
3585 dependency for D, even though either one on its own would be acceptable:
3586 <programlisting>
3587    instance D Bool Int where ...
3588    instance D Bool Char where ...
3589 </programlisting>
3590 Note also that the following declaration is not allowed, even by itself:
3591 <programlisting>
3592    instance D [a] b where ...
3593 </programlisting>
3594 The problem here is that this instance would allow one particular choice of [a]
3595 to be associated with more than one choice for b, which contradicts the
3596 dependency specified in the definition of D. More generally, this means that,
3597 in any instance of the form:
3598 <programlisting>
3599    instance D t s where ...
3600 </programlisting>
3601 for some particular types t and s, the only variables that can appear in s are
3602 the ones that appear in t, and hence, if the type t is known, then s will be
3603 uniquely determined.
3604 </para>
3605 <para>
3606 The benefit of including dependency information is that it allows us to define
3607 more general multiple parameter classes, without ambiguity problems, and with
3608 the benefit of more accurate types. To illustrate this, we return to the
3609 collection class example, and annotate the original definition of <literal>Collects</literal>
3610 with a simple dependency:
3611 <programlisting>
3612    class Collects e ce | ce -> e where
3613       empty  :: ce
3614       insert :: e -> ce -> ce
3615       member :: e -> ce -> Bool
3616 </programlisting>
3617 The dependency <literal>ce -&gt; e</literal> here specifies that the type e of elements is uniquely
3618 determined by the type of the collection ce. Note that both parameters of
3619 Collects are of kind *; there are no constructor classes here. Note too that
3620 all of the instances of Collects that we gave earlier can be used
3621 together with this new definition.
3622 </para>
3623 <para>
3624 What about the ambiguity problems that we encountered with the original
3625 definition? The empty function still has type Collects e ce => ce, but it is no
3626 longer necessary to regard that as an ambiguous type: Although the variable e
3627 does not appear on the right of the => symbol, the dependency for class
3628 Collects tells us that it is uniquely determined by ce, which does appear on
3629 the right of the => symbol. Hence the context in which empty is used can still
3630 give enough information to determine types for both ce and e, without
3631 ambiguity. More generally, we need only regard a type as ambiguous if it
3632 contains a variable on the left of the => that is not uniquely determined
3633 (either directly or indirectly) by the variables on the right.
3634 </para>
3635 <para>
3636 Dependencies also help to produce more accurate types for user defined
3637 functions, and hence to provide earlier detection of errors, and less cluttered
3638 types for programmers to work with. Recall the previous definition for a
3639 function f:
3640 <programlisting>
   f x y = insert x . insert y
3642 </programlisting>
3643 for which we originally obtained a type:
3644 <programlisting>
3645    f :: (Collects a c, Collects b c) => a -> b -> c -> c
3646 </programlisting>
3647 Given the dependency information that we have for Collects, however, we can
deduce that a and b must be equal because they both appear as the first
parameter in a Collects constraint with the same second parameter c. Hence we
3650 can infer a shorter and more accurate type for f:
3651 <programlisting>
3652    f :: (Collects a c) => a -> a -> c -> c
3653 </programlisting>
3654 In a similar way, the earlier definition of g will now be flagged as a type error.
3655 </para>
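<para>
The following sketch puts the dependency-annotated class together with two of the earlier instances (lists and characteristic functions) and the function <literal>f</literal>, whose signature is deliberately omitted so that GHC infers the shorter type. The method bodies are illustrative, and <option>-XFlexibleInstances</option> is assumed for the instance heads:
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
             FlexibleInstances #-}

class Collects e ce | ce -> e where
  empty  :: ce
  insert :: e -> ce -> ce
  member :: e -> ce -> Bool

-- Lists of elements.
instance Eq e => Collects e [e] where
  empty  = []
  insert = (:)
  member = elem

-- Characteristic functions.
instance Eq e => Collects e (e -> Bool) where
  empty      = const False
  insert x p = \y -> y == x || p y
  member x p = p x

-- No signature: thanks to ce -> e, GHC infers
--   f :: Collects a c => a -> a -> c -> c
f x y = insert x . insert y
</programlisting>
Adding <literal>g = f True 'a'</literal> to this module is rejected with a type error, because both arguments of <literal>f</literal> must now have the same type.
</para>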
3656 <para>
3657 Although we have given only a few examples here, it should be clear that the
3658 addition of dependency information can help to make multiple parameter classes
3659 more useful in practice, avoiding ambiguity problems, and allowing more general
3660 sets of instance declarations.
3661 </para>
3662 </sect4>
3663 </sect3>
3664 </sect2>
3665
3666 <sect2 id="instance-decls">
3667 <title>Instance declarations</title>
3668
3669 <para>An instance declaration has the form
3670 <screen>
3671   instance ( <replaceable>assertion</replaceable><subscript>1</subscript>, ..., <replaceable>assertion</replaceable><subscript>n</subscript>) =&gt; <replaceable>class</replaceable> <replaceable>type</replaceable><subscript>1</subscript> ... <replaceable>type</replaceable><subscript>m</subscript> where ...
3672 </screen>
3673 The part before the "<literal>=&gt;</literal>" is the
3674 <emphasis>context</emphasis>, while the part after the
3675 "<literal>=&gt;</literal>" is the <emphasis>head</emphasis> of the instance declaration.
3676 </para>
3677
3678 <sect3 id="flexible-instance-head">
3679 <title>Relaxed rules for the instance head</title>
3680
3681 <para>
3682 In Haskell 98 the head of an instance declaration
3683 must be of the form <literal>C (T a1 ... an)</literal>, where
3684 <literal>C</literal> is the class, <literal>T</literal> is a data type constructor,
3685 and the <literal>a1 ... an</literal> are distinct type variables.
3686 GHC relaxes these rules in two ways.
3687 <itemizedlist>
3688 <listitem>
3689 <para>
3690 The <option>-XFlexibleInstances</option> flag allows the head of the instance
3691 declaration to mention arbitrary nested types.
3692 For example, this becomes a legal instance declaration
3693 <programlisting>
3694   instance C (Maybe Int) where ...
3695 </programlisting>
3696 See also the <link linkend="instance-overlap">rules on overlap</link>.
3697 </para></listitem>
3698 <listitem><para>
3699 With the <option>-XTypeSynonymInstances</option> flag, instance heads may use type
3700 synonyms. As always, using a type synonym is just shorthand for
3701 writing the RHS of the type synonym definition.  For example:
3702
3703
3704 <programlisting>
3705   type Point = (Int,Int)
3706   instance C Point   where ...
3707   instance C [Point] where ...
3708 </programlisting>
3709
3710
3711 is legal.  However, if you added
3712
3713
3714 <programlisting>
3715   instance C (Int,Int) where ...
3716 </programlisting>
3717
3718
3719 as well, then the compiler will complain about the overlapping
3720 (actually, identical) instance declarations.  As always, type synonyms
3721 must be fully applied.  You cannot, for example, write:
3722
3723 <programlisting>
3724   type P a = [[a]]
3725   instance Monad P where ...
3726 </programlisting>
3727
3728 </para></listitem>
3729 </itemizedlist>
3730 </para>
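<para>
A minimal sketch combining both relaxations; the class <literal>C</literal> is given a hypothetical method <literal>name</literal> so that the instances can be exercised:
<programlisting>
{-# LANGUAGE FlexibleInstances, TypeSynonymInstances #-}

class C a where
  name :: a -> String      -- hypothetical method

type Point = (Int, Int)

-- Nested types in the head (FlexibleInstances).
instance C (Maybe Int) where
  name _ = "Maybe Int"

-- Type synonyms in the head; these mean C (Int, Int) and C [(Int, Int)].
instance C Point where
  name _ = "Point"

instance C [Point] where
  name _ = "list of Points"
</programlisting>
</para>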
3731 </sect3>
3732
3733 <sect3 id="instance-rules">
3734 <title>Relaxed rules for instance contexts</title>
3735
3736 <para>In Haskell 98, the assertions in the context of the instance declaration
3737 must be of the form <literal>C a</literal> where <literal>a</literal>
3738 is a type variable that occurs in the head.
3739 </para>
3740
3741 <para>
3742 The <option>-XFlexibleContexts</option> flag relaxes this rule, as well
3743 as the corresponding rule for type signatures (see <xref linkend="flexible-contexts"/>).
With this flag, each assertion in the context of the instance declaration can be an arbitrary
(well-kinded) assertion <literal>(C t1 ... tn)</literal>, subject only to the
3746 following rules:
3747 <orderedlist>
3748 <listitem><para>
3749 The Paterson Conditions: for each assertion in the context
3750 <orderedlist>
3751 <listitem><para>No type variable has more occurrences in the assertion than in the head</para></listitem>
3752 <listitem><para>The assertion has fewer constructors and variables (taken together
3753       and counting repetitions) than the head</para></listitem>
3754 </orderedlist>
3755 </para></listitem>
3756
3757 <listitem><para>The Coverage Condition.  For each functional dependency,
3758 <replaceable>tvs</replaceable><subscript>left</subscript> <literal>-&gt;</literal>
3759 <replaceable>tvs</replaceable><subscript>right</subscript>,  of the class,
3760 every type variable in
3761 S(<replaceable>tvs</replaceable><subscript>right</subscript>) must appear in
3762 S(<replaceable>tvs</replaceable><subscript>left</subscript>), where S is the
3763 substitution mapping each type variable in the class declaration to the
3764 corresponding type in the instance declaration.
3765 </para></listitem>
3766 </orderedlist>
3767 These restrictions ensure that context reduction terminates: each reduction
3768 step makes the problem smaller by at least one
3769 constructor.  Both the Paterson Conditions and the Coverage Condition are lifted
3770 if you give the <option>-XUndecidableInstances</option>
3771 flag (<xref linkend="undecidable-instances"/>).
3772 You can find lots of background material about the reason for these
3773 restrictions in the paper <ulink
3774 url="http://research.microsoft.com/%7Esimonpj/papers/fd%2Dchr/">
3775 Understanding functional dependencies via Constraint Handling Rules</ulink>.
3776 </para>
3777 <para>
3778 For example, these are OK:
3779 <programlisting>
3780   instance C Int [a]          -- Multiple parameters
3781   instance Eq (S [a])         -- Structured type in head
3782
3783       -- Repeated type variable in head
3784   instance C4 a a => C4 [a] [a]
3785   instance Stateful (ST s) (MutVar s)
3786
3787       -- Head can consist of type variables only
3788   instance C a
3789   instance (Eq a, Show b) => C2 a b
3790
3791       -- Non-type variables in context
3792   instance Show (s a) => Show (Sized s a)
3793   instance C2 Int a => C3 Bool [a]
3794   instance C2 Int a => C3 [a] b
3795 </programlisting>
3796 But these are not:
3797 <programlisting>
3798       -- Context assertion no smaller than head
3799   instance C a => C a where ...
      -- (C b b) has more occurrences of b than the head
3801   instance C b b => Foo [b] where ...
3802 </programlisting>
3803 </para>
3804
3805 <para>
3806 The same restrictions apply to instances generated by
3807 <literal>deriving</literal> clauses.  Thus the following is accepted:
3808 <programlisting>
3809   data MinHeap h a = H a (h a)
3810     deriving (Show)
3811 </programlisting>
3812 because the derived instance
3813 <programlisting>
3814   instance (Show a, Show (h a)) => Show (MinHeap h a)
3815 </programlisting>
3816 conforms to the above rules.
3817 </para>
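<para>
As a runnable check of the <literal>MinHeap</literal> example (the values used in the test are illustrative):
<programlisting>
{-# LANGUAGE FlexibleContexts #-}

-- The derived instance is
--   instance (Show a, Show (h a)) => Show (MinHeap h a)
-- whose context mentions (h a), which needs FlexibleContexts.
data MinHeap h a = H a (h a)
  deriving (Show)
</programlisting>
For instance, <literal>show (H (1 :: Int) [2, 3])</literal> produces <literal>"H 1 [2,3]"</literal>.
</para>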
3818
3819 <para>
3820 A useful idiom permitted by the above rules is as follows.
3821 If one allows overlapping instance declarations then it's quite
3822 convenient to have a "default instance" declaration that applies if
3823 something more specific does not:
3824 <programlisting>
3825   instance C a where
3826     op = ... -- Default
3827 </programlisting>
3828 </para>
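<para>
Here is a minimal sketch of that idiom with a hypothetical class <literal>Label</literal>. It marks the default instance with the per-instance <literal>OVERLAPPABLE</literal> pragma available in GHC 7.10 and later; that is an assumption about the GHC version, not the flag-based mechanism described in this section:
<programlisting>
{-# LANGUAGE FlexibleInstances #-}

class Label a where
  label :: a -> String

-- The default instance, used when nothing more specific matches.
instance {-# OVERLAPPABLE #-} Label a where
  label _ = "something"

instance Label Bool where
  label b = if b then "yes" else "no"
</programlisting>
Here <literal>label True</literal> picks the more specific instance and gives <literal>"yes"</literal>, while <literal>label 'x'</literal> falls back to the default and gives <literal>"something"</literal>.
</para>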
3829 </sect3>
3830
3831 <sect3 id="undecidable-instances">
3832 <title>Undecidable instances</title>
3833
3834 <para>
3835 Sometimes even the rules of <xref linkend="instance-rules"/> are too onerous.
3836 For example, sometimes you might want to use the following to get the
3837 effect of a "class synonym":
3838 <programlisting>
3839   class (C1 a, C2 a, C3 a) => C a where { }
3840
3841   instance (C1 a, C2 a, C3 a) => C a where { }
3842 </programlisting>
3843 This allows you to write shorter signatures:
3844 <programlisting>
3845   f :: C a => ...
3846 </programlisting>
3847 instead of
3848 <programlisting>
3849   f :: (C1 a, C2 a, C3 a) => ...
3850 </programlisting>
3851 The restrictions on functional dependencies (<xref
3852 linkend="functional-dependencies"/>) are particularly troublesome.
3853 It is tempting to introduce type variables in the context that do not appear in
3854 the head, something that is excluded by the normal rules. For example:
3855 <programlisting>
3856   class HasConverter a b | a -> b where
3857      convert :: a -> b
3858
3859   data Foo a = MkFoo a
3860
3861   instance (HasConverter a b,Show b) => Show (Foo a) where
3862      show (MkFoo value) = show (convert value)
3863 </programlisting>
3864 This is dangerous territory, however. Here, for example, is a program that would make the
3865 typechecker loop:
3866 <programlisting>
3867   class D a
3868   class F a b | a->b
3869   instance F [a] [[a]]
3870   instance (D c, F a c) => D [a]   -- 'c' is not mentioned in the head
3871 </programlisting>
3872 Similarly, it can be tempting to lift the coverage condition:
3873 <programlisting>
3874   class Mul a b c | a b -> c where
3875         (.*.) :: a -> b -> c
3876
3877   instance Mul Int Int Int where (.*.) = (*)
3878   instance Mul Int Float Float where x .*. y = fromIntegral x * y
3879   instance Mul a b c => Mul a [b] [c] where x .*. v = map (x.*.) v
3880 </programlisting>
3881 The third instance declaration does not obey the coverage condition;
3882 and indeed the (somewhat strange) definition:
3883 <programlisting>
3884   f = \ b x y -> if b then x .*. [y] else y
3885 </programlisting>
3886 makes instance inference go into a loop, because it requires the constraint
3887 <literal>(Mul a [b] b)</literal>.
3888 </para>
3889 <para>
3890 Nevertheless, GHC allows you to experiment with more liberal rules.  If you use
3891 the experimental flag <option>-XUndecidableInstances</option>
3892 <indexterm><primary>-XUndecidableInstances</primary></indexterm>,
3893 both the Paterson Conditions and the Coverage Condition
3894 (described in <xref linkend="instance-rules"/>) are lifted.  Termination is ensured by having a
3895 fixed-depth recursion stack.  If you exceed the stack depth you get a
3896 sort of backtrace, and the opportunity to increase the stack depth
3897 with <option>-fcontext-stack=</option><emphasis>N</emphasis>.
3898 </para>
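<para>
As a sketch of the "class synonym" idiom above, the following module uses the standard
classes <literal>Show</literal> and <literal>Num</literal> in place of
<literal>C1</literal>, <literal>C2</literal> and <literal>C3</literal>.  The names
<literal>ShowNum</literal> and <literal>doubleAndShow</literal> are hypothetical.  The
catch-all instance has a context that is no smaller than its head, so it is accepted only
with <option>-XUndecidableInstances</option>.
<programlisting>
{-# LANGUAGE FlexibleInstances, UndecidableInstances #-}
module Main where

-- Hypothetical "class synonym": ShowNum a abbreviates (Show a, Num a).
class (Show a, Num a) => ShowNum a

-- The context (Show a, Num a) is not smaller than the head (ShowNum a),
-- which violates the Paterson Conditions; hence -XUndecidableInstances.
instance (Show a, Num a) => ShowNum a

-- The short signature stands for (Show a, Num a) => a -> String.
doubleAndShow :: ShowNum a => a -> String
doubleAndShow x = show (x + x)

main :: IO ()
main = do
  putStrLn (doubleAndShow (21 :: Int))      -- prints 42
  putStrLn (doubleAndShow (1.5 :: Double))  -- prints 3.0
</programlisting>
</para>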
3899
3900 </sect3>
3901
3902
3903 <sect3 id="instance-overlap">
3904 <title>Overlapping instances</title>
3905 <para>
In general, <emphasis>GHC requires that it be unambiguous which instance
3907 declaration
3908 should be used to resolve a type-class constraint</emphasis>. This behaviour
3909 can be modified by two flags: <option>-XOverlappingInstances</option>
3910 <indexterm><primary>-XOverlappingInstances
3911 </primary></indexterm>
3912 and <option>-XIncoherentInstances</option>
3913 <indexterm><primary>-XIncoherentInstances
3914 </primary></indexterm>, as this section discusses.  Both these
3915 flags are dynamic flags, and can be set on a per-module basis, using
3916 an <literal>OPTIONS_GHC</literal> pragma if desired (<xref linkend="source-file-options"/>).</para>
3917 <para>
3918 When GHC tries to resolve, say, the constraint <literal>C Int Bool</literal>,
3919 it tries to match every instance declaration against the
3920 constraint,
3921 by instantiating the head of the instance declaration.  For example, consider
3922 these declarations:
3923 <programlisting>
3924   instance context1 => C Int a     where ...  -- (A)
3925   instance context2 => C a   Bool  where ...  -- (B)
3926   instance context3 => C Int [a]   where ...  -- (C)
3927   instance context4 => C Int [Int] where ...  -- (D)
3928 </programlisting>
3929 The instances (A) and (B) match the constraint <literal>C Int Bool</literal>,
3930 but (C) and (D) do not.  When matching, GHC takes
3931 no account of the context of the instance declaration
3932 (<literal>context1</literal> etc).
3933 GHC's default behaviour is that <emphasis>exactly one instance must match the
3934 constraint it is trying to resolve</emphasis>.
3935 It is fine for there to be a <emphasis>potential</emphasis> of overlap (by
3936 including both declarations (A) and (B), say); an error is only reported if a
3937 particular constraint matches more than one.
3938 </para>
3939
3940 <para>
3941 The <option>-XOverlappingInstances</option> flag instructs GHC to allow
3942 more than one instance to match, provided there is a most specific one.  For
3943 example, the constraint <literal>C Int [Int]</literal> matches instances (A),
3944 (C) and (D), but the last is more specific, and hence is chosen.  If there is no
3945 most-specific match, the program is rejected.
3946 </para>
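<para>
The "most specific instance" rule can be observed in a small program.  In this minimal
sketch, instances (A), (C) and (D) mirror the declarations above with empty contexts, and
the method <literal>which</literal> is hypothetical: it simply reports which instance was
selected.  Instance (B) is left out, because with it the constraint
<literal>C Int Bool</literal> would match both (A) and (B), neither of which is more
specific, and the program would be rejected.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, OverlappingInstances #-}
module Main where

class C a b where
  which :: a -> b -> String    -- hypothetical method naming the chosen instance

instance C Int a     where which _ _ = "A"
instance C Int [a]   where which _ _ = "C"
instance C Int [Int] where which _ _ = "D"

main :: IO ()
main = do
  putStrLn (which (1 :: Int) [2 :: Int])  -- C Int [Int]: (A), (C), (D) match; (D) wins
  putStrLn (which (1 :: Int) "abc")       -- C Int [Char]: (A), (C) match; (C) wins
  putStrLn (which (1 :: Int) True)        -- C Int Bool: only (A) matches
</programlisting>
</para>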
3947 <para>
3948 However, GHC is conservative about committing to an overlapping instance.  For example:
3949 <programlisting>
3950   f :: [b] -> [b]
3951   f x = ...
3952 </programlisting>
3953 Suppose that from the RHS of <literal>f</literal> we get the constraint
3954 <literal>C Int [b]</literal>.  But
3955 GHC does not commit to instance (C), because in a particular
call of <literal>f</literal>, <literal>b</literal> might be instantiated
3957 to <literal>Int</literal>, in which case instance (D) would be more specific still.
3958 So GHC rejects the program.
3959 (If you add the flag <option>-XIncoherentInstances</option>,
3960 GHC will instead pick (C), without complaining about
3961 the problem of subsequent instantiations.)
3962 </para>
3963 <para>
3964 Notice that we gave a type signature to <literal>f</literal>, so GHC had to
3965 <emphasis>check</emphasis> that <literal>f</literal> has the specified type.
3966 Suppose instead we do not give a type signature, asking GHC to <emphasis>infer</emphasis>
3967 it instead.  In this case, GHC will refrain from
3968 simplifying the constraint <literal>C Int [b]</literal> (for the same reason
3969 as before) but, rather than rejecting the program, it will infer the type
3970 <programlisting>
3971   f :: C Int [b] => [b] -> [b]
3972 </programlisting>
3973 That postpones the question of which instance to pick to the
3974 call site for <literal>f</literal>
3975 by which time more is known about the type <literal>b</literal>.
3976 You can write this type signature yourself if you use the
3977 <link linkend="flexible-contexts"><option>-XFlexibleContexts</option></link>
3978 flag.
3979 </para>
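<para>
Writing the postponed constraint yourself might look like the following sketch.  Only
instances (C) and (D) are kept, the method <literal>which</literal> is hypothetical, and
<literal>describeList</literal> stands in for <literal>f</literal> (it returns a
<literal>String</literal> so that the choice of instance is visible).  Because the
signature mentions <literal>C Int [b]</literal>, each call site resolves the constraint
once <literal>b</literal> is known.
<programlisting>
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, FlexibleContexts, OverlappingInstances #-}
module Main where

class C a b where
  which :: a -> b -> String    -- hypothetical method naming the chosen instance

instance C Int [a]   where which _ _ = "C"
instance C Int [Int] where which _ _ = "D"

-- The constraint C Int [b] stays in the signature, so the choice
-- between (C) and (D) is made at each call site.
describeList :: C Int [b] => [b] -> String
describeList xs = which (0 :: Int) xs

main :: IO ()
main = do
  putStrLn (describeList "abc")       -- b = Char: only (C) matches
  putStrLn (describeList [1 :: Int])  -- b = Int: (D) is more specific
</programlisting>
</para>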
3980 <para>
3981 Exactly the same situation can arise in instance declarations themselves.  Suppose we have
3982 <programlisting>
3983   class Foo a where
3984      f :: a -> a
3985   instance Foo [b] where
3986      f x = ...
3987 </programlisting>
3988 and, as before, the constraint <literal>C Int [b]</literal> arises from <literal>f</literal>'s
right-hand side.  GHC will reject the instance, complaining as before that it does not know how to resolve
3990 the constraint <literal>C Int [b]</literal>, because it matches more than one instance
3991 declaration.  The solution is to postpone the choice by adding the constraint to the context
3992 of the instance declaration, thus:
3993 <programlisting>
3994   instance C Int [b] => Foo [b] where
3995      f x = ...
3996 </programlisting>
3997 (You need <link linkend="instance-rules"><option>-XFlexibleInstances</option></link> to do this.)
3998 </para>
3999 <para>
4000 Warning: overlapping instances must be used with care.  They
can give rise to incoherence (i.e., different instance choices are made
4002 in different parts of the program) even without <option>-XIncoherentInstances</option>. Consider:
4003 <programlisting>
4004 {-# LANGUAGE OverlappingInstances #-}
4005 module Help where
4006
4007     class MyShow a where
4008       myshow :: a -> String
4009
4010     instance MyShow a => MyShow [a] where
4011       myshow xs = concatMap myshow xs
4012
4013     showHelp :: MyShow a => [a] -> String
4014     showHelp xs = myshow xs
4015
4016 {-# LANGUAGE FlexibleInstances, OverlappingInstances #-}
4017 module Main where
4018     import Help
4019
4020     data T = MkT
4021
4022     instance MyShow T where
4023       myshow x = "Used generic instance"
4024
4025     instance MyShow [T] where
4026       myshow xs = "Used more specific instance"
4027
4028     main = do { print (myshow [MkT]); print (showHelp [MkT]) }
4029 </programlisting>
4030 In function <literal>showHelp</literal> GHC sees no overlapping
4031 instances, and so uses the <literal>MyShow [a]</literal> instance
4032 without complaint.  In the call to <literal>myshow</literal> in <literal>main</literal>,
4033 GHC resolves the <literal>MyShow [T]</literal> constraint using the overlapping
4034 instance declaration in module <literal>Main</literal>. As a result,
4035 the program prints
4036 <programlisting>
4037   "Used more specific instance"
4038   "Used generic instance"
4039 </programlisting>
4040 (An alternative possible behaviour, not currently implemented,
4041 would be to reject module <literal>Help</literal>
4042 on the grounds that a later instance declaration might overlap the local one.)
4043 </para>
4044 <para>
4045 The willingness to be overlapped or incoherent is a property of
4046 the <emphasis>instance declaration</emphasis> itself, controlled by the
4047 presence or otherwise of the <option>-XOverlappingInstances</option>
4048 and <option>-XIncoherentInstances</option> flags when that module is
4049 being defined.  Neither flag is required in a module that imports and uses the
4050 instance declaration.  Specifically, during the lookup process:
4051 <itemizedlist>
4052 <listitem><para>
4053 An instance declaration is ignored during the lookup process if (a) a more specific
4054 match is found, and (b) the instance declaration was compiled with
4055 <option>-XOverlappingInstances</option>.  The flag setting for the
4056 more-specific instance does not matter.
4057 </para></listitem>
4058 <listitem><para>
4059 Suppose an instance declaration does not match the constraint being looked up, but
4060 does unify with it, so that it might match when the constraint is further
4061 instantiated.  Usually GHC will regard this as a reason for not committing to
some other instance.  But if the instance declaration was compiled with
4063 <option>-XIncoherentInstances</option>, GHC will skip the "does-it-unify?"
4064 check for that declaration.
4065 </para></listitem>
4066 </itemizedlist>
4067 These rules make it possible for a library author to design a library that relies on
4068 overlapping instances without the library client having to know.
4069 </para>
4070 <para>
4071 If an instance declaration is compiled without
4072 <option>-XOverlappingInstances</option>,
4073 then that instance can never be overlapped.  This could perhaps be
4074 inconvenient.  Perhaps the rule should instead say that the
4075 <emphasis>overlapping</emphasis> instance declaration should be compiled in
4076 this way, rather than the <emphasis>overlapped</emphasis> one.  Perhaps overlap
4077 at a usage site should be permitted regardless of how the instance declarations
4078 are compiled, if the <option>-XOverlappingInstances</option> flag is
4079 used at the usage site.  (Mind you, the exact usage site can occasionally be
4080 hard to pin down.)  We are interested to receive feedback on these points.
4081 </para>
4082 <para>The <option>-XIncoherentInstances</option> flag implies the
4083 <option>-XOverlappingInstances</option> flag, but not vice versa.
4084 </para>
4085 </sect3>
4086
4087
4088
4089 </sect2>
4090
4091 <sect2 id="overloaded-strings">
4092 <title>Overloaded string literals
4093 </title>
4094
4095 <para>
4096 GHC supports <emphasis>overloaded string literals</emphasis>.  Normally a
4097 string literal has type <literal>String</literal>, but with overloaded string
4098 literals enabled (with <literal>-XOverloadedStrings</literal>)
4099  a string literal has type <literal>(IsString a) => a</literal>.
4100 </para>
4101 <para>
4102 This means that the usual string syntax can be used, e.g., for packed strings
4103 and other variations of string like types.  String literals behave very much
4104 like integer literals, i.e., they can be used in both expressions and patterns.
If used in a pattern, the literal will be replaced by an equality test, in the same
4106 way as an integer literal is.
4107 </para>
4108 <para>
4109 The class <literal>IsString</literal> is defined as:
4110 <programlisting>
4111 class IsString a where
4112     fromString :: String -> a
4113 </programlisting>
4114 The only predefined instance is the obvious one to make strings work as usual:
4115 <programlisting>
4116 instance IsString [Char] where
4117     fromString cs = cs
4118 </programlisting>
4119 The class <literal>IsString</literal> is not in scope by default.  If you want to mention
4120 it explicitly (for example, to give an instance declaration for it), you can import it
4121 from module <literal>GHC.Exts</literal>.
4122 </para>
4123 <para>
4124 Haskell's defaulting mechanism is extended to cover string literals, when <option>-XOverloadedStrings</option> is specified.
4125 Specifically:
4126 <itemizedlist>
4127 <listitem><para>
4128 Each type in a default declaration must be an
4129 instance of <literal>Num</literal> <emphasis>or</emphasis> of <literal>IsString</literal>.
4130 </para></listitem>
4131
4132 <listitem><para>
4133 The standard defaulting rule (<ulink url="http://www.haskell.org/onlinereport/decls.html#sect4.3.4">Haskell Report, Section 4.3.4</ulink>)
4134 is extended thus: defaulting applies when all the unresolved constraints involve standard classes
4135 <emphasis>or</emphasis> <literal>IsString</literal>; and at least one is a numeric class
4136 <emphasis>or</emphasis> <literal>IsString</literal>.
4137 </para></listitem>
4138 </itemizedlist>
4139 </para>
4140 <para>
4141 A small example:
4142 <programlisting>
{-# LANGUAGE OverloadedStrings #-}
module Main where
4144
4145 import GHC.Exts( IsString(..) )
4146
4147 newtype MyString = MyString String deriving (Eq, Show)
4148 instance IsString MyString where
4149     fromString = MyString
4150
4151 greet :: MyString -> MyString
4152 greet "hello" = "world"
4153 greet other = other
4154
4155 main = do
4156     print $ greet "hello"
4157     print $ greet "fool"
4158 </programlisting>
4159 </para>
4160 <para>
4161 Note that deriving <literal>Eq</literal> is necessary for the pattern matching
4162 to work since it gets translated into an equality comparison.
4163 </para>
4164 </sect2>
4165
4166 </sect1>
4167
4168 <sect1 id="type-families">
4169 <title>Type families</title>
4170
4171 <para>
4172   <firstterm>Indexed type families</firstterm> are a new GHC extension to
4173   facilitate type-level
4174   programming. Type families are a generalisation of <firstterm>associated
4175   data types</firstterm>
4176   (&ldquo;<ulink url="http://www.cse.unsw.edu.au/~chak/papers/CKPM05.html">Associated
4177   Types with Class</ulink>&rdquo;, M. Chakravarty, G. Keller, S. Peyton Jones,
4178   and S. Marlow. In Proceedings of &ldquo;The 32nd Annual ACM SIGPLAN-SIGACT
4179      Symposium on Principles of Programming Languages (POPL'05)&rdquo;, pages
4180   1-13, ACM Press, 2005) and <firstterm>associated type synonyms</firstterm>
  (&ldquo;<ulink url="http://www.cse.unsw.edu.au/~chak/papers/CKP05.html">Associated
  Type Synonyms</ulink>&rdquo;, M. Chakravarty, G. Keller, and
4183   S. Peyton Jones.
4184   In Proceedings of &ldquo;The Tenth ACM SIGPLAN International Conference on
4185   Functional Programming&rdquo;, ACM Press, pages 241-253, 2005).  Type families
4186   themselves are described in the paper &ldquo;<ulink
4187   url="http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html">Type
4188   Checking with Open Type Functions</ulink>&rdquo;, T. Schrijvers,
  S. Peyton Jones,
4190   M. Chakravarty, and M. Sulzmann, in Proceedings of &ldquo;ICFP 2008: The
4191   13th ACM SIGPLAN International Conference on Functional
4192   Programming&rdquo;, ACM Press, pages 51-62, 2008. Type families
4193   essentially provide type-indexed data types and named functions on types,
4194   which are useful for generic programming and highly parameterised library
4195   interfaces as well as interfaces with enhanced static information, much like
4196   dependent types. They might also be regarded as an alternative to functional
4197   dependencies, but provide a more functional style of type-level programming
4198   than the relational style of functional dependencies.
4199 </para>
4200 <para>
4201   Indexed type families, or type families for short, are type constructors that
4202   represent sets of types. Set members are denoted by supplying the type family
4203   constructor with type parameters, which are called <firstterm>type
4204   indices</firstterm>. The
4205   difference between vanilla parametrised type constructors and family
  constructors is much like that between parametrically polymorphic functions and
  (ad-hoc polymorphic) methods of type classes. Parametrically polymorphic functions
  behave the same at all type instances, whereas class methods can change their
  behaviour depending on the class type parameters. Similarly, vanilla type
4210   constructors imply the same data representation for all type instances, but
4211   family constructors can have varying representation types for varying type
4212   indices.
4213 </para>
4214 <para>
4215   Indexed type families come in two flavours: <firstterm>data
4216     families</firstterm> and <firstterm>type synonym
4217     families</firstterm>. They are the indexed family variants of algebraic
4218   data types and type synonyms, respectively. The instances of data families
4219   can be data types and newtypes.
4220 </para>
4221 <para>
4222   Type families are enabled by the flag <option>-XTypeFamilies</option>.
4223   Additional information on the use of type families in GHC is available on
4224   <ulink url="http://www.haskell.org/haskellwiki/GHC/Indexed_types">the
4225   Haskell wiki page on type families</ulink>.
4226 </para>
4227
4228 <sect2 id="data-families">
4229   <title>Data families</title>
4230
4231   <para>
4232     Data families appear in two flavours: (1) they can be defined on the
4233     toplevel
4234     or (2) they can appear inside type classes (in which case they are known as
4235     associated types). The former is the more general variant, as it lacks the
4236     requirement for the type-indexes to coincide with the class
4237     parameters. However, the latter can lead to more clearly structured code and
4238     compiler warnings if some type instances were - possibly accidentally -
4239     omitted. In the following, we always discuss the general toplevel form first
4240     and then cover the additional constraints placed on associated types.
4241   </para>
4242
4243   <sect3 id="data-family-declarations">
4244     <title>Data family declarations</title>
4245
4246     <para>
4247       Indexed data families are introduced by a signature, such as
4248 <programlisting>
4249 data family GMap k :: * -> *
4250 </programlisting>
      The special keyword <literal>family</literal> distinguishes family declarations from standard
4252       data declarations.  The result kind annotation is optional and, as
4253       usual, defaults to <literal>*</literal> if omitted.  An example is
4254 <programlisting>
4255 data family Array e
4256 </programlisting>
4257       Named arguments can also be given explicit kind signatures if needed.
4258       Just as with
      <ulink url="http://www.haskell.org/ghc/docs/latest/html/users_guide/gadt.html">GADT
      declarations</ulink>, named arguments are entirely optional, so that we can
4261       declare <literal>Array</literal> alternatively with
4262 <programlisting>
4263 data family Array :: * -> *
4264 </programlisting>
4265     </para>
4266
4267     <sect4 id="assoc-data-family-decl">
4268       <title>Associated data family declarations</title>
4269       <para>
4270         When a data family is declared as part of a type class, we drop
        the <literal>family</literal> keyword.  The <literal>GMap</literal>
4272         declaration takes the following form
4273 <programlisting>
4274 class GMapKey k where
4275   data GMap k :: * -> *
4276   ...
4277 </programlisting>
4278         In contrast to toplevel declarations, named arguments must be used for
4279         all type parameters that are to be used as type-indexes.  Moreover,
4280         the argument names must be class parameters.  Each class parameter may
        be used at most once per associated type, but some may be omitted
4282         and they may be in an order other than in the class head.  Hence, the
4283         following contrived example is admissible:
4284 <programlisting>
4285   class C a b c where
    data T c a :: *
4287 </programlisting>
4288       </para>
4289     </sect4>
4290   </sect3>
4291
4292   <sect3 id="data-instance-declarations">
4293     <title>Data instance declarations</title>
4294
4295     <para>
4296       Instance declarations of data and newtype families are very similar to
4297       standard data and newtype declarations.  The only two differences are
4298       that the keyword <literal>data</literal> or <literal>newtype</literal>
4299       is followed by <literal>instance</literal> and that some or all of the
4300       type arguments can be non-variable types, but may not contain forall
4301       types or type synonym families.  However, data families are generally
4302       allowed in type parameters, and type synonyms are allowed as long as
4303       they are fully applied and expand to a type that is itself admissible -
4304       exactly as this is required for occurrences of type synonyms in class
4305       instance parameters.  For example, the <literal>Either</literal>
4306       instance for <literal>GMap</literal> is
4307 <programlisting>
4308 data instance GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)
4309 </programlisting>
4310       In this example, the declaration has only one variant.  In general, it
4311       can be any number.
4312     </para>
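<para>
A toplevel data family with two instances can be exercised in a small module.  This is a
minimal sketch: the <literal>()</literal> instance and the helpers
<literal>lookupUnit</literal> and <literal>lookupBool</literal> are hypothetical.  It also
shows that a function working on one <emphasis>fixed</emphasis> instance of the family is
an ordinary function.
<programlisting>
{-# LANGUAGE TypeFamilies #-}
module Main where

data family GMap k :: * -> *

-- A map with unit keys holds at most one value (hypothetical instance).
data instance GMap () v = GMapUnit (Maybe v)

-- A map with Either keys is a pair of maps, one per summand.
data instance GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)

-- Hypothetical helpers; each works on one fixed instance of the family.
lookupUnit :: GMap () v -> Maybe v
lookupUnit (GMapUnit mv) = mv

lookupBool :: Either () () -> GMap (Either () ()) v -> Maybe v
lookupBool (Left ())  (GMapEither m _) = lookupUnit m
lookupBool (Right ()) (GMapEither _ m) = lookupUnit m

main :: IO ()
main = do
  let m = GMapEither (GMapUnit (Just "left")) (GMapUnit Nothing)
  print (lookupBool (Left ()) m)   -- Just "left"
  print (lookupBool (Right ()) m)  -- Nothing
</programlisting>
</para>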
4313     <para>
4314       Data and newtype instance declarations are only permitted when an
      appropriate family declaration is in scope, just as a class instance declaration
4316       requires the class declaration to be visible.  Moreover, each instance
4317       declaration has to conform to the kind determined by its family
4318       declaration.  This implies that the number of parameters of an instance
4319       declaration matches the arity determined by the kind of the family.
4320     </para>
4321     <para>
      A data family instance declaration can use the full expressiveness of
4323       ordinary <literal>data</literal> or <literal>newtype</literal> declarations:
4324       <itemizedlist>
      <listitem><para> Although a data family is <emphasis>introduced</emphasis> with
4326       the keyword "<literal>data</literal>", a data family <emphasis>instance</emphasis> can
4327       use either <literal>data</literal> or <literal>newtype</literal>. For example:
4328 <programlisting>
4329 data family T a
4330 data    instance T Int  = T1 Int | T2 Bool
4331 newtype instance T Char = TC Bool
4332 </programlisting>
4333       </para></listitem>
4334       <listitem><para> A <literal>data instance</literal> can use GADT syntax for the data constructors,
4335       and indeed can define a GADT.  For example:
4336 <programlisting>
4337 data family G a b
4338 data instance G [a] b where
4339    G1 :: c -> G [Int] b
4340    G2 :: G [a] Bool
4341 </programlisting>
4342       </para></listitem>
4343       <listitem><para> You can use a <literal>deriving</literal> clause on a
4344       <literal>data instance</literal> or <literal>newtype instance</literal>
4345       declaration.
4346       </para></listitem>
4347       </itemizedlist>
4348     </para>
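<para>
The three points above can be combined in one compilable module.  This sketch reuses the
<literal>T</literal> and <literal>G</literal> declarations from the list, adds
<literal>deriving Show</literal> clauses, and defines a hypothetical function
<literal>describeG</literal> that pattern matches on the GADT-style instance.  It assumes
<option>-XGADTs</option> for the existential, type-refining constructor
<literal>G1</literal>.
<programlisting>
{-# LANGUAGE TypeFamilies, GADTs #-}
module Main where

data family T a
data    instance T Int  = T1 Int | T2 Bool deriving Show   -- data instance
newtype instance T Char = TC Bool          deriving Show   -- newtype instance

data family G a b
data instance G [a] b where        -- GADT syntax in a data instance
  G1 :: c -> G [Int] b
  G2 :: G [a] Bool

-- Hypothetical function matching on the constructors of one instance.
describeG :: G [a] b -> String
describeG (G1 _) = "G1"
describeG G2     = "G2"

main :: IO ()
main = do
  print [T1 3, T2 False]          -- [T1 3,T2 False]
  print (TC True)                 -- TC True
  putStrLn (describeG (G1 'x'))   -- G1
  putStrLn (describeG G2)         -- G2
</programlisting>
</para>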
4349
4350     <para>
4351       Even if type families are defined as toplevel declarations, functions
4352       that perform different computations for different family instances may still
4353       need to be defined as methods of type classes.  In particular, the
4354       following is not possible:
4355 <programlisting>
4356 data family T a
4357 data instance T Int  = A
4358 data instance T Char = B
4359 foo :: T a -> Int
4360 foo A = 1             -- WRONG: These two equations together...
4361 foo B = 2             -- ...will produce a type error.
4362 </programlisting>
4363 Instead, you would have to write <literal>foo</literal> as a class operation, thus:
4364 <programlisting>
4365 class C a where
4366   foo :: T a -> Int
instance C Int where
  foo A = 1
instance C Char where
4370   foo B = 2
4371 </programlisting>
4372       (Given the functionality provided by GADTs (Generalised Algebraic Data
4373       Types), it might seem as if a definition, such as the above, should be
      feasible.  However, type families, in contrast to GADTs, are
      <emphasis>open</emphasis>; i.e., new instances can always be added,
4376       possibly in other
4377       modules.  Supporting pattern matching across different data instances
4378       would require a form of extensible case construct.)
4379     </para>
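<para>
A complete version of the class-based approach might look like this sketch.  The data
family and its two instances are taken from the example above, and
<literal>foo</literal> dispatches on the family index through the class
<literal>C</literal>.
<programlisting>
{-# LANGUAGE TypeFamilies #-}
module Main where

data family T a
data instance T Int  = A
data instance T Char = B

-- One method per family instance, selected by the class parameter.
class C a where
  foo :: T a -> Int

instance C Int where
  foo A = 1

instance C Char where
  foo B = 2

main :: IO ()
main = print (foo A, foo B)   -- (1,2)
</programlisting>
Since a data family is injective, the argument type <literal>T Int</literal> of
<literal>A</literal> fixes the class parameter to <literal>Int</literal>, so no extra
annotation is needed at the call site.
</para>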
4380
4381     <sect4 id="assoc-data-inst">
4382       <title>Associated data instances</title>
4383       <para>
4384         When an associated data family instance is declared within a type
4385         class instance, we drop the <literal>instance</literal> keyword in the
4386         family instance.  So, the <literal>Either</literal> instance
4387         for <literal>GMap</literal> becomes:
4388 <programlisting>
4389 instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where
4390   data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)
4391   ...
4392 </programlisting>
4393         The most important point about associated family instances is that the
4394         type indexes corresponding to class parameters must be identical to
4395         the type given in the instance head; here this is the first argument
4396         of <literal>GMap</literal>, namely <literal>Either a b</literal>,
4397         which coincides with the only class parameter.  Any parameters to the
4398         family constructor that do not correspond to class parameters, need to
4399         be variables in every instance; here this is the
4400         variable <literal>v</literal>.
4401       </para>
4402       <para>
4403         Instances for an associated family can only appear as part of
        instance declarations of the class in which the family was declared,
4405         just as with the equations of the methods of a class.  Also in
4406         correspondence to how methods are handled, declarations of associated
4407         types can be omitted in class instances.  If an associated family
4408         instance is omitted, the corresponding instance type is not inhabited;
4409         i.e., only diverging expressions, such
4410         as <literal>undefined</literal>, can assume the type.
4411       </para>
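<para>
A fuller sketch of the running <literal>GMapKey</literal> example with an associated data
family follows.  The method names <literal>empty</literal>, <literal>lookup</literal> and
<literal>insert</literal> and the constructor names follow the export examples in
<xref linkend="data-family-impexp-examples"/>.  The method types, the
<literal>Int</literal> and <literal>()</literal> instances, and the use of
<literal>Data.Map</literal> from the containers package are assumptions made for
illustration.
<programlisting>
{-# LANGUAGE TypeFamilies #-}
module Main where

import Prelude hiding (lookup)
import qualified Data.Map as Map

class GMapKey k where
  data GMap k :: * -> *
  empty  :: GMap k v
  lookup :: k -> GMap k v -> Maybe v
  insert :: k -> v -> GMap k v -> GMap k v

instance GMapKey Int where
  data GMap Int v = GMapInt (Map.Map Int v)
  empty                  = GMapInt Map.empty
  lookup k (GMapInt m)   = Map.lookup k m
  insert k v (GMapInt m) = GMapInt (Map.insert k v m)

instance GMapKey () where
  data GMap () v = GMapUnit (Maybe v)
  empty                    = GMapUnit Nothing
  lookup () (GMapUnit mv)  = mv
  insert () v (GMapUnit _) = GMapUnit (Just v)

-- The index (Either a b) is identical to the instance head; v stays a variable.
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where
  data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)
  empty = GMapEither empty empty
  lookup (Left ka)  (GMapEither m _) = lookup ka m
  lookup (Right kb) (GMapEither _ m) = lookup kb m
  insert (Left ka)  v (GMapEither m1 m2) = GMapEither (insert ka v m1) m2
  insert (Right kb) v (GMapEither m1 m2) = GMapEither m1 (insert kb v m2)

main :: IO ()
main = do
  let m = insert (Left 1) "one" (insert (Right ()) "unit" empty)
            :: GMap (Either Int ()) String
  print (lookup (Left 1) m)    -- Just "one"
  print (lookup (Right ()) m)  -- Just "unit"
  print (lookup (Left 2) m)    -- Nothing
</programlisting>
</para>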
4412     </sect4>
4413
4414     <sect4 id="scoping-class-params">
4415       <title>Scoping of class parameters</title>
4416       <para>
4417         In the case of multi-parameter type classes, the visibility of class
4418         parameters in the right-hand side of associated family instances
4419         depends <emphasis>solely</emphasis> on the parameters of the data
4420         family.  As an example, consider the simple class declaration
4421 <programlisting>
4422 class C a b where
4423   data T a
4424 </programlisting>
4425         Only one of the two class parameters is a parameter to the data
4426         family.  Hence, the following instance declaration is invalid:
4427 <programlisting>
4428 instance C [c] d where
4429   data T [c] = MkT (c, d)    -- WRONG!!  'd' is not in scope
4430 </programlisting>
4431         Here, the right-hand side of the data instance mentions the type
4432         variable <literal>d</literal> that does not occur in its left-hand
4433         side.  We cannot admit such data instances as they would compromise
4434         type safety.
4435       </para>
4436     </sect4>
4437
4438     <sect4 id="family-class-inst">
4439       <title>Type class instances of family instances</title>
4440       <para>
4441         Type class instances of instances of data families can be defined as
4442         usual, and in particular data instance declarations can
4443         have <literal>deriving</literal> clauses.  For example, we can write
4444 <programlisting>
data instance GMap () v = GMapUnit (Maybe v)
                        deriving Show
4447 </programlisting>
4448         which implicitly defines an instance of the form
4449 <programlisting>
4450 instance Show v => Show (GMap () v) where ...
4451 </programlisting>
4452       </para>
4453       <para>
4454         Note that class instances are always for
4455         particular <emphasis>instances</emphasis> of a data family and never
4456         for an entire family as a whole.  This is for essentially the same
4457         reasons that we cannot define a toplevel function that performs
4458         pattern matching on the data constructors
4459         of <emphasis>different</emphasis> instances of a single type family.
4460         It would require a form of extensible case construct.
4461       </para>
4462     </sect4>
4463
4464     <sect4 id="data-family-overlap">
4465       <title>Overlap of data instances</title>
4466       <para>
4467         The instance declarations of a data family used in a single program
4468         may not overlap at all, independent of whether they are associated or
4469         not.  In contrast to type class instances, this is not only a matter
4470         of consistency, but one of type safety.
4471       </para>
4472     </sect4>
4473
4474   </sect3>
4475
4476   <sect3 id="data-family-import-export">
4477     <title>Import and export</title>
4478
4479     <para>
4480       The association of data constructors with type families is more dynamic
      than is the case with standard data and newtype declarations.  In
4482       the standard case, the notation <literal>T(..)</literal> in an import or
4483       export list denotes the type constructor and all the data constructors
4484       introduced in its declaration.  However, a family declaration never
4485       introduces any data constructors; instead, data constructors are
4486       introduced by family instances.  As a result, which data constructors
4487       are associated with a type family depends on the currently visible
4488       instance declarations for that family.  Consequently, an import or
4489       export item of the form <literal>T(..)</literal> denotes the family
4490       constructor and all currently visible data constructors - in the case of
4491       an export item, these may be either imported or defined in the current
4492       module.  The treatment of import and export items that explicitly list
4493       data constructors, such as <literal>GMap(GMapEither)</literal>, is
4494       analogous.
4495     </para>
4496
4497     <sect4 id="data-family-impexp-assoc">
4498       <title>Associated families</title>
4499       <para>
4500         As expected, an import or export item of the
        form <literal>C(..)</literal> denotes all of the class's methods and
4502         associated types.  However, when associated types are explicitly
4503         listed as subitems of a class, we need some new syntax, as uppercase
4504         identifiers as subitems are usually data constructors, not type
4505         constructors.  To clarify that we denote types here, each associated
4506         type name needs to be prefixed by the keyword <literal>type</literal>.
4507         So for example, when explicitly listing the components of
4508         the <literal>GMapKey</literal> class, we write <literal>GMapKey(type
4509         GMap, empty, lookup, insert)</literal>.
4510       </para>
4511     </sect4>
4512
4513     <sect4 id="data-family-impexp-examples">
4514       <title>Examples</title>
4515       <para>
4516         Assuming our running <literal>GMapKey</literal> class example, let us
4517         look at some export lists and their meaning:
4518         <itemizedlist>
4519           <listitem>
4520             <para><literal>module GMap (GMapKey) where...</literal>: Exports
4521               just the class name.</para>
4522           </listitem>
4523           <listitem>
4524             <para><literal>module GMap (GMapKey(..)) where...</literal>:
4525               Exports the class, the associated type <literal>GMap</literal>
4526               and the member
4527               functions <literal>empty</literal>, <literal>lookup</literal>,
4528               and <literal>insert</literal>.  None of the data constructors is
4529               exported.</para>
4530           </listitem>
4531           <listitem>
4532             <para><literal>module GMap (GMapKey(..), GMap(..))
4533                 where...</literal>: As before, but also exports all the data
4534               constructors <literal>GMapInt</literal>,
4535               <literal>GMapChar</literal>,
              <literal>GMapUnit</literal>, <literal>GMapPair</literal>,
              and <literal>GMapEither</literal>.</para>
4538           </listitem>
4539           <listitem>
4540             <para><literal>module GMap (GMapKey(empty, lookup, insert),
4541             GMap(..)) where...</literal>: As before.</para>
4542           </listitem>
4543           <listitem>
4544             <para><literal>module GMap (GMapKey, empty, lookup, insert, GMap(..))
4545                 where...</literal>: As before.</para>
4546           </listitem>
4547         </itemizedlist>
4548       </para>
4549       <para>
        Finally, you can write <literal>GMapKey(type GMap)</literal> to denote
        both the class <literal>GMapKey</literal> and its associated
        type <literal>GMap</literal>.  However, you cannot
4553         write <literal>GMapKey(type GMap(..))</literal> &mdash; i.e.,
4554         sub-component specifications cannot be nested.  To
4555         specify <literal>GMap</literal>'s data constructors, you have to list
4556         it separately.
4557       </para>
4558     </sect4>
4559
4560     <sect4 id="data-family-impexp-instances">
4561       <title>Instances</title>
4562       <para>
4563         Family instances are implicitly exported, just like class instances.
4564         However, this applies only to the heads of instances, not to the data
4565         constructors an instance defines.
4566       </para>
4567     </sect4>
4568
4569   </sect3>
4570
4571 </sect2>
4572
4573 <sect2 id="synonym-families">
4574   <title>Synonym families</title>
4575
4576   <para>
    Type families appear in two flavours: (1) they can be defined at the
    top level or (2) they can appear inside type classes (in which case they
    are known as associated type synonyms).  The former is the more general
    variant, as it lacks the requirement for the type indexes to coincide with
    the class parameters.  However, the latter can lead to more clearly
    structured code, and to compiler warnings if some type instances are
    (possibly accidentally) omitted.  In the following, we always discuss the
    general top-level form first and then cover the additional constraints
    placed on associated types.
4586   </para>
4587
4588   <sect3 id="type-family-declarations">
4589     <title>Type family declarations</title>
4590
4591     <para>
4592       Indexed type families are introduced by a signature, such as
4593 <programlisting>
4594 type family Elem c :: *
4595 </programlisting>
      The special identifier <literal>family</literal> distinguishes family
      declarations from standard type declarations.  The result kind
      annotation is optional and, as usual, defaults to <literal>*</literal>
      if omitted.  An example is
4599 <programlisting>
4600 type family Elem c
4601 </programlisting>
      Parameters can also be given explicit kind signatures if needed.  We
      call the number of parameters in a type family declaration the family's
      arity, and all applications of a type family must be fully saturated
      with respect to that arity.  This requirement is unlike ordinary type
      synonyms, and it implies that the kind of a type family is not sufficient
      to determine the family's arity, and hence, in general, also insufficient
      to determine whether a type family application is well formed.  As an
      example, consider the following declaration:
4610 <programlisting>
4611 type family F a b :: * -> *   -- F's arity is 2,
4612                               -- although its overall kind is * -> * -> * -> *
4613 </programlisting>
4614       Given this declaration the following are examples of well-formed and
4615       malformed types:
4616 <programlisting>
4617 F Char [Int]       -- OK!  Kind: * -> *
4618 F Char [Int] Bool  -- OK!  Kind: *
4619 F IO Bool          -- WRONG: kind mismatch in the first argument
4620 F Bool             -- WRONG: unsaturated application
4621 </programlisting>
4622       </para>
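    <para>
      To make the arity rule concrete, here is a minimal sketch, assuming a GHC
      with <option>-XTypeFamilies</option> (enabled below by
      a <literal>LANGUAGE</literal> pragma).  The instance
      for <literal>F Char [Int]</literal> and the value <literal>x</literal>
      are hypothetical additions: the instance's right-hand side has
      kind <literal>* -> *</literal>, so the saturated application can be
      applied to one more argument.
<programlisting>
{-# LANGUAGE TypeFamilies #-}

-- F has arity 2, although its result has kind * -> *
type family F a b :: * -> *

-- Hypothetical instance: the right-hand side is the unapplied constructor Maybe
type instance F Char [Int] = Maybe

-- F Char [Int] is saturated (kind * -> *), so it may be applied to Bool
x :: F Char [Int] Bool
x = Just True
</programlisting>
    </para>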
4623
4624     <sect4 id="assoc-type-family-decl">
4625       <title>Associated type family declarations</title>
4626       <para>
        When a type family is declared as part of a type class, we drop
        the special identifier <literal>family</literal>.  The <literal>Elem</literal>
        declaration then takes the following form:
4630 <programlisting>
4631 class Collects ce where
4632   type Elem ce :: *
4633   ...
4634 </programlisting>
4635         The argument names of the type family must be class parameters.  Each
        class parameter may be used at most once per associated type, but
        some may be omitted, and they may appear in a different order from the
        class head.  Hence, the following contrived example is admissible:
4639 <programlisting>
4640 class C a b c where
4641   type T c a :: *
4642 </programlisting>
4643         These rules are exactly as for associated data families.
4644       </para>
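      <para>
        A runnable variant of the contrived class above might look as follows.
        This is a sketch: the method <literal>combine</literal> and the instance
        are hypothetical.  In the instance, the type indexes of the associated
        type, <literal>Char</literal> and <literal>Int</literal>, are exactly the
        instance-head types for <literal>c</literal> and <literal>a</literal>.
<programlisting>
{-# LANGUAGE TypeFamilies, MultiParamTypeClasses #-}

-- T mentions the class parameters c and a, in a different order from the
-- class head, and omits b
class C a b c where
  type T c a
  combine :: a -> b -> c -> T c a   -- hypothetical method

-- Here a = Int and c = Char, so the associated instance is for T Char Int
instance C Int Bool Char where
  type T Char Int = (Char, Int)
  combine n _ ch = (ch, n)
</programlisting>
      </para>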
4645     </sect4>
4646   </sect3>
4647
4648   <sect3 id="type-instance-declarations">
4649     <title>Type instance declarations</title>
4650     <para>
4651       Instance declarations of type families are very similar to standard type
      synonym declarations.  The only two differences are that the
      keyword <literal>type</literal> is followed
      by <literal>instance</literal>, and that some or all of the type
      arguments can be non-variable types.  These arguments may not contain
      forall types or type synonym families; however, data families are
      generally allowed, and type synonyms are allowed as long as they are
      fully applied and expand to an admissible type.  These are exactly the
      same requirements as for data instances.  For example,
      the <literal>[e]</literal> instance
4660       for <literal>Elem</literal> is
4661 <programlisting>
4662 type instance Elem [e] = e
4663 </programlisting>
4664     </para>
4665     <para>
      Type family instance declarations are only legitimate when an
      appropriate family declaration is in scope, just as class instances
      require the class declaration to be visible.  Moreover, each instance
      declaration has to conform to the kind determined by its family
      declaration, and the number of type parameters in an instance
      declaration must match the number of type parameters in the family
      declaration.  Finally, the right-hand side of a type instance must be a
      monotype (i.e., it may not include foralls), and after the expansion of
      all saturated vanilla type synonyms, no synonyms other than family
      synonyms may remain.  Here are some examples of admissible and illegal type
4676       instances:
4677 <programlisting>
4678 type family F a :: *
4679 type instance F [Int]              = Int         -- OK!
4680 type instance F String             = Char        -- OK!
4681 type instance F (F a)              = a           -- WRONG: type parameter mentions a type family
4682 type instance F (forall a. (a, b)) = b           -- WRONG: a forall type appears in a type parameter
4683 type instance F Float              = forall a.a  -- WRONG: right-hand side may not be a forall type
4684
4685 type family G a b :: * -> *
4686 type instance G Int            = (,)     -- WRONG: must be two type parameters
4687 type instance G Int Char Float = Double  -- WRONG: must be two type parameters
4688 </programlisting>
4689     </para>
4690
4691     <sect4 id="assoc-type-instance">
4692       <title>Associated type instance declarations</title>
4693       <para>
4694         When an associated family instance is declared within a type class
4695         instance, we drop the <literal>instance</literal> keyword in the family
4696         instance.  So, the <literal>[e]</literal> instance
4697         for <literal>Elem</literal> becomes:
4698 <programlisting>
4699 instance (Eq (Elem [e])) => Collects ([e]) where
4700   type Elem [e] = e
4701   ...
4702 </programlisting>
4703         The most important point about associated family instances is that the
4704         type indexes corresponding to class parameters must be identical to the
4705         type given in the instance head; here this is <literal>[e]</literal>,
4706         which coincides with the only class parameter.
4707       </para>
4708       <para>
        Instances for an associated family can only appear as part of instance
        declarations of the class in which the family was declared, just as
        with the equations of the methods of a class.  Also, as with methods,
        declarations of associated types can be omitted
        in class instances.  If an associated family instance is omitted, the
4714         corresponding instance type is not inhabited; i.e., only diverging
4715         expressions, such as <literal>undefined</literal>, can assume the type.
4716       </para>
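      <para>
        Putting the pieces together, a minimal <literal>Collects</literal>
        class with an associated <literal>Elem</literal> type might look like
        the sketch below.  The methods <literal>empty</literal>,
        <literal>insert</literal> and <literal>toL</literal> are assumptions
        (the text elides the class body), and the instance
        context <literal>Eq (Elem [e])</literal> from the example above is
        dropped for brevity.
<programlisting>
{-# LANGUAGE TypeFamilies #-}

class Collects ce where
  type Elem ce
  empty  :: ce
  insert :: Elem ce -> ce -> ce
  toL    :: ce -> [Elem ce]      -- hypothetical method, for inspection

-- Inside the class instance there is no 'instance' keyword on the family
-- instance, and its index [e] is exactly the type in the instance head
instance Collects [e] where
  type Elem [e] = e
  empty  = []
  insert = (:)
  toL    = id
</programlisting>
      </para>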
4717     </sect4>
4718
4719     <sect4 id="type-family-overlap">
4720       <title>Overlap of type synonym instances</title>
4721       <para>
4722         The instance declarations of a type family used in a single program
4723         may only overlap if the right-hand sides of the overlapping instances
4724         coincide for the overlapping types.  More formally, two instance
4725         declarations overlap if there is a substitution that makes the
4726         left-hand sides of the instances syntactically the same.  Whenever
4727         that is the case, the right-hand sides of the instances must also be
4728         syntactically equal under the same substitution.  This condition is
4729         independent of whether the type family is associated or not, and it is
4730         not only a matter of consistency, but one of type safety.
4731       </para>
4732       <para>
        Here are two examples that illustrate the condition under which overlap
        is permitted.
4735 <programlisting>
4736 type instance F (a, Int) = [a]
4737 type instance F (Int, b) = [b]   -- overlap permitted
4738
4739 type instance G (a, Int)  = [a]
4740 type instance G (Char, a) = [a]  -- ILLEGAL overlap, as [Char] /= [Int]
4741 </programlisting>
4742       </para>
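      <para>
        The permitted overlap between the two <literal>F</literal> instances can
        be checked with a sketch like the following, assuming
        <option>-XTypeFamilies</option>; the names <literal>both</literal>
        and <literal>left</literal> are hypothetical.  The
        illegal <literal>G</literal> instances are deliberately left out,
        because a module containing them is rejected.
<programlisting>
{-# LANGUAGE TypeFamilies #-}

type family F a

-- These overlap at (Int, Int), where both right-hand sides are [Int]
type instance F (a, Int) = [a]
type instance F (Int, b) = [b]

-- At the overlapping index, either instance gives the same result
both :: F (Int, Int)
both = [1, 2, 3]

-- Only the first instance matches here
left :: F (Bool, Int)
left = [True]
</programlisting>
      </para>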
4743     </sect4>
4744
4745     <sect4 id="type-family-decidability">
4746       <title>Decidability of type synonym instances</title>
4747       <para>
        In order to guarantee that type inference in the presence of type
        families is decidable, we need to place a number of additional
        restrictions on the formation of type instance declarations (cf.
4751         Definition 5 (Relaxed Conditions) of &ldquo;<ulink
4752         url="http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html">Type
4753           Checking with Open Type Functions</ulink>&rdquo;).  Instance
4754           declarations have the general form
4755 <programlisting>
4756 type instance F t1 .. tn = t
4757 </programlisting>
4758         where we require that for every type family application <literal>(G s1
4759         .. sm)</literal> in <literal>t</literal>,
4760         <orderedlist>
4761           <listitem>
4762             <para><literal>s1 .. sm</literal> do not contain any type family
4763             constructors,</para>
4764           </listitem>
4765           <listitem>
4766             <para>the total number of symbols (data type constructors and type
4767             variables) in <literal>s1 .. sm</literal> is strictly smaller than
4768             in <literal>t1 .. tn</literal>, and</para>
4769           </listitem>
4770           <listitem>
4771             <para>for every type
4772             variable <literal>a</literal>, <literal>a</literal> occurs
4773             in <literal>s1 .. sm</literal> at most as often as in <literal>t1
4774             .. tn</literal>.</para>
4775           </listitem>
4776         </orderedlist>
4777         These restrictions are easily verified and ensure termination of type
4778         inference.  However, they are not sufficient to guarantee completeness
        of type inference in the presence of so-called &ldquo;loopy equalities&rdquo;,
        such as <literal>a ~ [F a]</literal>, where a recursive occurrence of
        a type variable is underneath a family application and a data
        constructor application; see the above-mentioned paper for details.
4783       </para>
4784       <para>
4785         If the option <option>-XUndecidableInstances</option> is passed to the
        compiler, the above restrictions are not enforced, and it is up to the
        programmer to ensure termination of the normalisation of type families
4788         during type inference.
4789       </para>
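      <para>
        As an illustration, the following sketch (assuming
        <option>-XTypeFamilies</option>; the instances and the
        name <literal>deep</literal> are hypothetical) contains a recursive
        instance that satisfies all three conditions, so it should be accepted
        without <option>-XUndecidableInstances</option>.
<programlisting>
{-# LANGUAGE TypeFamilies #-}

type family F a

type instance F Int = Bool

-- The family application F a on the right has one symbol (a), strictly
-- fewer than the two symbols ([] and a) in the index [a], and a occurs
-- no more often on the right than on the left
type instance F [a] = F a

-- Normalisation terminates: F [[Int]] ~> F [Int] ~> F Int ~> Bool
deep :: F [[Int]]
deep = True

-- By contrast,  type instance F (Maybe a) = F [a]  has as many symbols
-- on the right as on the left, so it would need -XUndecidableInstances
</programlisting>
      </para>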
4790     </sect4>
4791   </sect3>
4792
  <sect3 id="equality-constraints">
4794     <title>Equality constraints</title>
4795     <para>
      Type contexts can include equality constraints of the form <literal>t1 ~
4797       t2</literal>, which denote that the types <literal>t1</literal>
4798       and <literal>t2</literal> need to be the same.  In the presence of type
4799       families, whether two types are equal cannot generally be decided
4800       locally.  Hence, the contexts of function signatures may include
4801       equality constraints, as in the following example:
4802 <programlisting>
4803 sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2
4804 </programlisting>
      where we require that the element types of <literal>c1</literal>
      and <literal>c2</literal> are the same.  In general, the
4807       types <literal>t1</literal> and <literal>t2</literal> of an equality
4808       constraint may be arbitrary monotypes; i.e., they may not contain any
4809       quantifiers, independent of whether higher-rank types are otherwise
4810       enabled.
4811     </para>
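    <para>
      One possible implementation of <literal>sumCollects</literal> is
      sketched below.  The <literal>Collects</literal> methods and
      the <literal>Bag</literal> type are assumptions made for illustration.
      The equality constraint is what allows elements taken
      from <literal>c1</literal> to be inserted into <literal>c2</literal>.
<programlisting>
{-# LANGUAGE TypeFamilies #-}

class Collects ce where
  type Elem ce
  empty  :: ce
  insert :: Elem ce -> ce -> ce
  toL    :: ce -> [Elem ce]

instance Collects [e] where
  type Elem [e] = e
  empty  = []
  insert = (:)
  toL    = id

-- Hypothetical second collection type, so that c1 and c2 can differ
newtype Bag a = Bag [a]

instance Collects (Bag a) where
  type Elem (Bag a) = a
  empty             = Bag []
  insert x (Bag xs) = Bag (x : xs)
  toL (Bag xs)      = xs

-- Without Elem c1 ~ Elem c2, inserting elements of c1 into c2 would not type-check
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2
sumCollects c1 c2 = foldr insert c2 (toL c1)
</programlisting>
    </para>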
4812     <para>
4813       Equality constraints can also appear in class and instance contexts.
4814       The former enable a simple translation of programs using functional
4815       dependencies into programs using family synonyms instead.  The general
4816       idea is to rewrite a class declaration of the form
4817 <programlisting>
4818 class C a b | a -> b
4819 </programlisting>
4820       to
4821 <programlisting>
4822 class (F a ~ b) => C a b where
4823   type F a
4824 </programlisting>
4825       That is, we represent every functional dependency (FD) <literal>a1 .. an
4826       -> b</literal> by an FD type family <literal>F a1 .. an</literal> and a
4827       superclass context equality <literal>F a1 .. an ~ b</literal>,
4828       essentially giving a name to the functional dependency.  In class
4829       instances, we define the type instances of FD families in accordance
4830       with the class head.  Method signatures are not affected by that
4831       process.
4832     </para>
4833     <para>
4834       NB: Equalities in superclass contexts are not fully implemented in
4835       GHC 6.10.
4836     </para>
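    <para>
      For a concrete but hypothetical class <literal>Container f e</literal>
      with the dependency <literal>f -> e</literal>, the translation could look
      like the sketch below.  It assumes a GHC version in which superclass
      equalities are fully supported (see the note above regarding GHC 6.10);
      the class, its method <literal>firstOr</literal>, and the instance are
      illustrative only.
<programlisting>
{-# LANGUAGE TypeFamilies, MultiParamTypeClasses,
             FlexibleInstances, FlexibleContexts #-}

-- FD version, for comparison:  class Container f e | f -> e where ...
-- The FD f -> e becomes the family F plus a superclass equality
class (F f ~ e) => Container f e where
  type F f
  firstOr :: e -> f -> e          -- hypothetical method

-- The FD family instance follows the class head: F [x] = x
instance Container [x] x where
  type F [x] = x
  firstOr d []      = d
  firstOr _ (y : _) = y
</programlisting>
    </para>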
4837   </sect3>
4838
  <sect3 id="ty-fams-in-instances">
4840     <title>Type families and instance declarations</title>
4841     <para>Type families require us to extend the rules for
4842       the form of instance heads, which are given
4843       in <xref linkend="flexible-instance-head"/>.
4844       Specifically:
4845 <itemizedlist>
4846  <listitem><para>Data type families may appear in an instance head</para></listitem>
4847  <listitem><para>Type synonym families may not appear (at all) in an instance head</para></listitem>
4848 </itemizedlist>
The reason for the latter restriction is that there is no way to check for overlap between such instances. Consider
4850 <programlisting>
4851    type family F a
4852    type instance F Bool = Int
4853
4854    class C a
4855
4856    instance C Int
4857    instance C (F a)
4858 </programlisting>
4859 Now a constraint <literal>(C (F Bool))</literal> would match both instances.
4860 The situation is especially bad because the type instance for <literal>F Bool</literal>
4861 might be in another module, or even in a module that is not yet written.
4862 </para>
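<para>
The first rule can be illustrated with a sketch like the following, assuming
<option>-XTypeFamilies</option> and <option>-XFlexibleInstances</option>; the names
<literal>D</literal>, <literal>DInt</literal> and <literal>Describe</literal> are hypothetical.
The commented-out instance at the end shows the form that the second rule forbids.
<programlisting>
{-# LANGUAGE TypeFamilies, FlexibleInstances #-}

data family D a
data instance D Int = DInt Int

class Describe a where
  describe :: a -> String

-- Allowed: the data family D appears in the instance head
-- (FlexibleInstances is needed only because Int is not a type variable)
instance Describe (D Int) where
  describe (DInt n) = "DInt " ++ show n

-- Not allowed: a type synonym family in an instance head, e.g.
--   type family F a
--   instance Describe (F a)
</programlisting>
</para>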
4863 </sect3>
4864 </sect2>
4865
4866 </sect1>
4867
4868 <sect1 id="other-type-extensions">
4869 <title>Other type system extensions</title>
4870
4871 <sect2 id="explicit-foralls"><title>Explicit universal quantification (forall)</title>
4872 <para>
4873 Haskell type signatures are implicitly quantified.  When the language option <option>-XExplicitForAll</option>
4874 is used, the keyword <literal>forall</literal>
4875 allows us to say exactly what this means.  For example:
4876 </para>
4877 <para>
4878 <programlisting>
4879         g :: b -> b
4880 </programlisting>
4881 means this:
4882 <programlisting>
4883         g :: forall b. (b -> b)
4884 </programlisting>
4885 The two are treated identically.
4886 </para>
4887 <para>
4888 Of course <literal>forall</literal> becomes a keyword; you can't use <literal>forall</literal> as
4889 a type variable any more!
4890 </para>
4891 </sect2>
4892
4893
4894 <sect2 id="flexible-contexts"><title>The context of a type signature</title>
4895 <para>
4896 The <option>-XFlexibleContexts</option> flag lifts the Haskell 98 restriction
4897 that the type-class constraints in a type signature must have the
4898 form <emphasis>(class type-variable)</emphasis> or
4899 <emphasis>(class (type-variable type-variable ...))</emphasis>.
4900 With <option>-XFlexibleContexts</option>
4901 these type signatures are perfectly OK
4902 <programlisting>
4903   g :: Eq [a] => ...
4904   g :: Ord (T a ()) => ...
4905 </programlisting>
4906 The flag <option>-XFlexibleContexts</option> also lifts the corresponding
4907 restriction on class declarations (<xref linkend="superclass-rules"/>) and instance declarations
4908 (<xref linkend="instance-rules"/>).
4909 </para>
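<para>
For instance, the following sketch (with a hypothetical function <literal>allSame</literal>)
uses a constraint on a list type, which Haskell 98 rejects but
<option>-XFlexibleContexts</option> permits.
<programlisting>
{-# LANGUAGE FlexibleContexts #-}

-- Eq [a] is not of the form (class type-variable)
allSame :: Eq [a] => [[a]] -> Bool
allSame []       = True
allSame (x : xs) = all (== x) xs
</programlisting>
</para>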
4910
4911 <para>
4912 GHC imposes the following restrictions on the constraints in a type signature.
4913 Consider the type:
4914
4915 <programlisting>
  forall tv1..tvn. (c1, ..., cn) => type
4917 </programlisting>
4918
4919 (Here, we write the "foralls" explicitly, although the Haskell source
4920 language omits them; in Haskell 98, all the free type variables of an
4921 explicit source-language type signature are universally quantified,
4922 except for the class type variables in a class declaration.  However,
4923 in GHC, you can give the foralls if you want.  See <xref linkend="explicit-foralls"/>).
4924 </para>
4925
4926 <para>
4927
4928 <orderedlist>
4929 <listitem>
4930
4931 <para>
4932  <emphasis>Each universally quantified type variable
4933 <literal>tvi</literal> must be reachable from <literal>type</literal></emphasis>.
4934
4935 A type variable <literal>a</literal> is "reachable" if it appears
4936 in the same constraint as either a type variable free in
4937 <literal>type</literal>, or another reachable type variable.
4938 A value with a type that does not obey
4939 this reachability restriction cannot be used without introducing
4940 ambiguity; that is why the type is rejected.
4941 Here, for example, is an illegal type:
4942
4943
4944 <programlisting>
4945   forall a. Eq a => Int
4946 </programlisting>
4947
4948
4949 When a value with this type was used, the constraint <literal>Eq tv</literal>
4950 would be introduced where <literal>tv</literal> is a fresh type variable, and
4951 (in the dictionary-translation implementation) the value would be
4952 applied to a dictionary for <literal>Eq tv</literal>.  The difficulty is that we
4953 can never know which instance of <literal>Eq</literal> to use because we never
4954 get any more information about <literal>tv</literal>.
4955 </para>
4956 <para>
4957 Note
4958 that the reachability condition is weaker than saying that <literal>a</literal> is
4959 functionally dependent on a type variable free in
4960 <literal>type</literal> (see <xref
4961 linkend="functional-dependencies"/>).  The reason for this is there
4962 might be a "hidden" dependency, in a superclass perhaps.  So
4963 "reachable" is a conservative approximation to "functionally dependent".
4964 For example, consider:
4965 <programlisting>
4966   class C a b | a -> b where ...
4967   class C a b => D a b where ...
4968   f :: forall a b. D a b => a -> a
4969 </programlisting>
This is fine, because in fact <literal>a</literal> does functionally determine <literal>b</literal>,
but that is not immediately apparent from <literal>f</literal>'s type.
4972 </para>
4973 </listitem>
4974 <listitem>
4975
4976 <para>
4977  <emphasis>Every constraint <literal>ci</literal> must mention at least one of the
4978 universally quantified type variables <literal>tvi</literal></emphasis>.
4979
4980 For example, this type is OK because <literal>C a b</literal> mentions the
4981 universally quantified type variable <literal>b</literal>:
4982
4983
4984 <programlisting>
4985   forall a. C a b => burble
4986 </programlisting>
4987
4988
4989 The next type is illegal because the constraint <literal>Eq b</literal> does not
4990 mention <literal>a</literal>:
4991
4992
4993 <programlisting>
4994   forall a. Eq b => burble
4995 </programlisting>
4996
4997
4998 The reason for this restriction is milder than the other one.  The
4999 excluded types are never useful or necessary (because the offending
5000 context doesn't need to be witnessed at this point; it can be floated
5001 out).  Furthermore, floating them out increases sharing. Lastly,
5002 excluding them is a conservative choice; it leaves a patch of
5003 territory free in case we need it later.
5004
5005 </para>
5006 </listitem>
5007
5008 </orderedlist>
5009
5010 </para>
5011
5012 </sect2>
5013
5014 <sect2 id="implicit-parameters">
5015 <title>Implicit parameters</title>
5016
5017 <para> Implicit parameters are implemented as described in
5018 "Implicit parameters: dynamic scoping with static types",
5019 J Lewis, MB Shields, E Meijer, J Launchbury,
5020 27th ACM Symposium on Principles of Programming Languages (POPL'00),
5021 Boston, Jan 2000.
5022 </para>
5023
5024 <para>(Most of the following, still rather incomplete, documentation is
5025 due to Jeff Lewis.)</para>
5026
5027 <para>Implicit parameter support is enabled with the option
5028 <option>-XImplicitParams</option>.</para>
5029
5030 <para>
5031 A variable is called <emphasis>dynamically bound</emphasis> when it is bound by the calling
5032 context of a function and <emphasis>statically bound</emphasis> when bound by the callee's
5033 context. In Haskell, all variables are statically bound. Dynamic
5034 binding of variables is a notion that goes back to Lisp, but was later
5035 discarded in more modern incarnations, such as Scheme. Dynamic binding
5036 can be very confusing in an untyped language, and unfortunately, typed
5037 languages, in particular Hindley-Milner typed languages like Haskell,
5038 only support static scoping of variables.
5039 </para>
5040 <para>
5041 However, by a simple extension to the type class system of Haskell, we
5042 can support dynamic binding. Basically, we express the use of a
5043 dynamically bound variable as a constraint on the type. These
5044 constraints lead to types of the form <literal>(?x::t') => t</literal>, which says "this
5045 function uses a dynamically-bound variable <literal>?x</literal>
5046 of type <literal>t'</literal>". For
5047 example, the following expresses the type of a sort function,
5048 implicitly parameterized by a comparison function named <literal>cmp</literal>.
5049 <programlisting>
5050   sort :: (?cmp :: a -> a -> Bool) => [a] -> [a]
5051 </programlisting>
5052 The dynamic binding constraints are just a new form of predicate in the type class system.
5053 </para>
5054 <para>
5055 An implicit parameter occurs in an expression using the special form <literal>?x</literal>,
5056 where <literal>x</literal> is
5057 any valid identifier (e.g. <literal>ord ?x</literal> is a valid expression).
5058 Use of this construct also introduces a new
5059 dynamic-binding constraint in the type of the expression.
5060 For example, the following definition
5061 shows how we can define an implicitly parameterized sort function in
5062 terms of an explicitly parameterized <literal>sortBy</literal> function:
5063 <programlisting>
5064   sortBy :: (a -> a -> Bool) -> [a] -> [a]
5065
5066   sort   :: (?cmp :: a -> a -> Bool) => [a] -> [a]
5067   sort    = sortBy ?cmp
5068 </programlisting>
5069 </para>
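<para>
A self-contained version of the <literal>sort</literal> example is sketched below. The text
gives only the signature of <literal>sortBy</literal>, so the insertion sort here is a
hypothetical stand-in; note that it differs from <literal>Data.List.sortBy</literal>, which
takes an <literal>Ordering</literal>-valued comparison. The hypothetical function
<literal>descending</literal> binds <literal>?cmp</literal> using the <literal>let</literal>
form described below.
<programlisting>
{-# LANGUAGE ImplicitParams #-}

-- Hypothetical Bool-valued insertion sort standing in for sortBy
sortBy :: (a -> a -> Bool) -> [a] -> [a]
sortBy before = foldr ins []
  where
    ins x [] = [x]
    ins x (y : ys)
      | before x y = x : y : ys
      | otherwise  = y : ins x ys

sort :: (?cmp :: a -> a -> Bool) => [a] -> [a]
sort = sortBy ?cmp

-- The caller supplies ?cmp; (>=) gives descending order
descending :: [Int] -> [Int]
descending xs = let ?cmp = (>=) in sort xs
</programlisting>
</para>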
5070
5071 <sect3>
5072 <title>Implicit-parameter type constraints</title>
5073 <para>
5074 Dynamic binding constraints behave just like other type class
5075 constraints in that they are automatically propagated. Thus, when a
5076 function is used, its implicit parameters are inherited by the
5077 function that called it. For example, our <literal>sort</literal> function might be used
5078 to pick out the least value in a list:
5079 <programlisting>
5080   least   :: (?cmp :: a -> a -> Bool) => [a] -> a
5081   least xs = head (sort xs)
5082 </programlisting>
5083 Without lifting a finger, the <literal>?cmp</literal> parameter is
5084 propagated to become a parameter of <literal>least</literal> as well. With explicit
parameters, the default is that parameters must always be explicitly
propagated. With implicit parameters, the default is to always
5087 propagate them.
5088 </para>
5089 <para>
5090 An implicit-parameter type constraint differs from other type class constraints in the
5091 following way: All uses of a particular implicit parameter must have
5092 the same type. This means that the type of <literal>(?x, ?x)</literal>
5093 is <literal>(?x::a) => (a,a)</literal>, and not
5094 <literal>(?x::a, ?x::b) => (a, b)</literal>, as would be the case for type
5095 class constraints.
5096 </para>
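<para>
A small sketch of this rule (the names <literal>twice</literal> and <literal>useTwice</literal>
are hypothetical): both occurrences of <literal>?x</literal> share the single type
<literal>a</literal>.
<programlisting>
{-# LANGUAGE ImplicitParams #-}

-- Both uses of ?x have the same type a
twice :: (?x :: a) => (a, a)
twice = (?x, ?x)

useTwice :: (Char, Char)
useTwice = let ?x = 'c' in twice
</programlisting>
</para>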
5097
5098 <para> You can't have an implicit parameter in the context of a class or instance
5099 declaration.  For example, both these declarations are illegal:
5100 <programlisting>
5101   class (?x::Int) => C a where ...
5102   instance (?x::a) => Foo [a] where ...
5103 </programlisting>
5104 Reason: exactly which implicit parameter you pick up depends on exactly where
you invoke a function. But the &ldquo;invocation&rdquo; of instance declarations is done
behind the scenes by the compiler, so it's hard to figure out exactly where it is done.
The easiest thing is to outlaw the offending types.</para>
5108 <para>
5109 Implicit-parameter constraints do not cause ambiguity.  For example, consider:
5110 <programlisting>
5111    f :: (?x :: [a]) => Int -> Int
5112    f n = n + length ?x
5113
5114    g :: (Read a, Show a) => String -> String
5115    g s = show (read s)
5116 </programlisting>
5117 Here, <literal>g</literal> has an ambiguous type, and is rejected, but <literal>f</literal>
5118 is fine.  The binding for <literal>?x</literal> at <literal>f</literal>'s call site is
5119 quite unambiguous, and fixes the type <literal>a</literal>.
5120 </para>
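<para>
The example for <literal>f</literal> can be completed with a call site (<literal>useF</literal>
is a hypothetical name); the binding of <literal>?x</literal> there determines
<literal>a = Char</literal>.
<programlisting>
{-# LANGUAGE ImplicitParams #-}

f :: (?x :: [a]) => Int -> Int
f n = n + length ?x

-- Binding ?x to a String fixes a = Char at this call site
useF :: Int
useF = let ?x = "abc" in f 1
</programlisting>
</para>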
5121 </sect3>
5122
5123 <sect3>
5124 <title>Implicit-parameter bindings</title>
5125
5126 <para>
5127 An implicit parameter is <emphasis>bound</emphasis> using the standard
5128 <literal>let</literal> or <literal>where</literal> binding forms.
For example, we define the <literal>min</literal> function by binding
<literal>?cmp</literal>.
<programlisting>
  min :: Ord a => [a] -> a
  min  = let ?cmp = (&lt;=) in least
5134 </programlisting>
5135 </para>
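<para>
A runnable sketch of <literal>least</literal> in action follows. The Bool-valued insertion
sort again stands in for the <literal>sortBy</literal> whose body the text omits, and
<literal>largest</literal> is a hypothetical companion to <literal>min</literal> that binds
<literal>?cmp</literal> to <literal>(>=)</literal>, so that <literal>least</literal> returns the
largest element.
<programlisting>
{-# LANGUAGE ImplicitParams #-}

-- Hypothetical Bool-valued insertion sort standing in for sortBy
sortBy :: (a -> a -> Bool) -> [a] -> [a]
sortBy before = foldr ins []
  where
    ins x [] = [x]
    ins x (y : ys)
      | before x y = x : y : ys
      | otherwise  = y : ins x ys

sort :: (?cmp :: a -> a -> Bool) => [a] -> [a]
sort = sortBy ?cmp

-- ?cmp is inherited from sort without being mentioned in the body
least :: (?cmp :: a -> a -> Bool) => [a] -> a
least xs = head (sort xs)

-- Binding ?cmp to (>=) makes least return the largest element
largest :: Ord a => [a] -> a
largest = let ?cmp = (>=) in least
</programlisting>
</para>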
5136 <para>
5137 A group of implicit-parameter bindings may occur anywhere a normal group of Haskell
5138 bindings can occur, except at top level.  That is, they can occur in a <literal>let</literal>
5139 (including in a list comprehension, or do-notation, or pattern guards),
5140 or a <literal>where</literal> clause.
5141 Note the following points:
5142 <itemizedlist>
5143 <listitem><para>
5144 An implicit-parameter binding group must be a
5145 collection of simple bindings to implicit-style variables (no
5146 function-style bindings, and no type signatures); these bindings are
neither polymorphic nor recursive.
5148 </para></listitem>
5149 <listitem><para>
5150 You may not mix implicit-parameter bindings with ordinary bindings in a
5151 single <literal>let</literal>
5152 expression; use two nested <literal>let</literal>s instead.
5153 (In the case of <literal>where</literal> you are stuck, since you can't nest <literal>where</literal> clauses.)
5154 </para></listitem>
5155
5156 <listitem><para>
5157 You may put multiple implicit-parameter bindings in a
5158 single binding group; but they are <emphasis>not</emphasis> treated
5159 as a mutually recursive group (as ordinary <literal>let</literal> bindings are).
Instead they are treated as a non-recursive group, simultaneously binding all the implicit
parameters.  The bindings are not nested, and may be reordered without changing
5162 the meaning of the program.
5163 For example, consider:
5164 <programlisting>
5165   f t = let { ?x = t; ?y = ?x+(1::Int) } in ?x + ?y
5166 </programlisting>
5167 The use of <literal>?x</literal> in the binding for <literal>?y</literal> does not "see"
5168 the binding for <literal>?x</literal>, so the type of <literal>f</literal> is
5169 <programlisting>
5170   f :: (?x::Int) => Int -> Int
5171 </programlisting>
5172 </para></listitem>
5173 </itemizedlist>
5174 </para>
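<para>
Concretely, in the sketch below (<literal>useF</literal> is a hypothetical name) the outer
binding supplies <literal>?x = 10</literal> to the binding for <literal>?y</literal>, while the
body of the <literal>let</literal> sees <literal>?x = t</literal>.
<programlisting>
{-# LANGUAGE ImplicitParams #-}

-- The ?x used in the binding for ?y is the outer one, not t
f :: (?x :: Int) => Int -> Int
f t = let { ?x = t; ?y = ?x + (1 :: Int) } in ?x + ?y

-- Outer ?x = 10, so ?y = 11 and the result is 3 + 11
useF :: Int
useF = let ?x = 10 in f 3
</programlisting>
</para>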
5175
5176 </sect3>
5177
5178 <sect3><title>Implicit parameters and polymorphic recursion</title>
5179
5180 <para>
5181 Consider these two definitions:
5182 <programlisting>
5183   len1 :: [a] -> Int
5184   len1 xs = let ?acc = 0 in len_acc1 xs
5185
5186   len_acc1 [] = ?acc
5187   len_acc1 (x:xs) = let ?acc = ?acc + (1::Int) in len_acc1 xs
5188
5189   ------------
5190
5191   len2 :: [a] -> Int
5192   len2 xs = let ?acc = 0 in len_acc2 xs
5193
5194   len_acc2 :: (?acc :: Int) => [a] -> Int
5195   len_acc2 [] = ?acc
5196   len_acc2 (x:xs) = let ?acc = ?acc + (1::Int) in len_acc2 xs
5197 </programlisting>
5198 The only difference between the two groups is that in the second group
<literal>len_acc2</literal> is given a type signature.
5200 In the former case, <literal>len_acc1</literal> is monomorphic in its own
5201 right-hand side, so the implicit parameter <literal>?acc</literal> is not
5202 passed to the recursive call.  In the latter case, because <literal>len_acc2</literal>
5203 has a type signature, the recursive call is made to the
5204 <emphasis>polymorphic</emphasis> version, which takes <literal>?acc</literal>
5205 as an implicit parameter.  So we get the following results in GHCi:
5206 <programlisting>
5207   Prog> len1 "hello"
5208   0
5209   Prog> len2 "hello"
5210   5
5211 </programlisting>
5212 Adding a type signature dramatically changes the result!  This is a rather
5213 counter-intuitive phenomenon, worth watching out for.
5214 </para>
5215 </sect3>
5216
5217 <sect3><title>Implicit parameters and monomorphism</title>
5218
5219 <para>GHC applies the dreaded Monomorphism Restriction (section 4.5.5 of the
5220 Haskell Report) to implicit parameters.  For example, consider:
5221 <programlisting>
  f :: Int -> Int
5223   f v = let ?x = 0     in
5224         let y = ?x + v in
5225         let ?x = 5     in
5226         y
5227 </programlisting>
5228 Since the binding for <literal>y</literal> falls under the Monomorphism
5229 Restriction it is not generalised, so the type of <literal>y</literal> is
5230 simply <literal>Int</literal>, not <literal>(?x::Int) => Int</literal>.
5231 Hence, <literal>(f 9)</literal> returns result <literal>9</literal>.
5232 If you add a type signature for <literal>y</literal>, then <literal>y</literal>
5233 will get type <literal>(?x::Int) => Int</literal>, so the occurrence of
5234 <literal>y</literal> in the body of the <literal>let</literal> will see the
5235 inner binding of <literal>?x</literal>, so <literal>(f 9)</literal> will return
5236 <literal>14</literal>.
5237 </para>
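<para>
The following sketch puts both variants side by side. The names <literal>fMono</literal> and <literal>fSig</literal> are hypothetical, and the <literal>MonomorphismRestriction</literal> pragma only makes explicit the Haskell 98 default that the text relies on (GHCi's interactive prompt, for instance, may switch it off).
<programlisting>
{-# LANGUAGE ImplicitParams, MonomorphismRestriction #-}
module Main where

-- No signature for y: the Monomorphism Restriction stops y from being
-- generalised, so y uses the ?x = 0 in scope where y is bound.
fMono :: Int -> Int
fMono v = let ?x = 0     in
          let y = ?x + v in
          let ?x = 5     in
          y

-- With a signature, y is generalised over ?x, so its use in the body
-- picks up the innermost binding ?x = 5.
fSig :: Int -> Int
fSig v = let ?x = 0 in
         let y :: (?x::Int) => Int
             y = ?x + v
         in
         let ?x = 5 in
         y

main :: IO ()
main = print (fMono 9, fSig 9)   -- prints (9,14)
</programlisting>
</para>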
5238 </sect3>
5239 </sect2>
5240
5241     <!--   ======================= COMMENTED OUT ========================
5242
5243     We intend to remove linear implicit parameters, so I'm at least removing
5244     them from the 6.6 user manual
5245
5246 <sect2 id="linear-implicit-parameters">
5247 <title>Linear implicit parameters</title>
5248 <para>
5249 Linear implicit parameters are an idea developed by Koen Claessen,
5250 Mark Shields, and Simon PJ.  They address the long-standing
5251 problem that monads seem over-kill for certain sorts of problem, notably:
5252 </para>
5253 <itemizedlist>
5254 <listitem> <para> distributing a supply of unique names </para> </listitem>
5255 <listitem> <para> distributing a supply of random numbers </para> </listitem>
5256 <listitem> <para> distributing an oracle (as in QuickCheck) </para> </listitem>
5257 </itemizedlist>
5258
5259 <para>
5260 Linear implicit parameters are just like ordinary implicit parameters,
5261 except that they are "linear"; that is, they cannot be copied, and
5262 must be explicitly "split" instead.  Linear implicit parameters are
5263 written '<literal>%x</literal>' instead of '<literal>?x</literal>'.
5264 (The '/' in the '%' suggests the split!)
5265 </para>
5266 <para>
5267 For example:
5268 <programlisting>
5269     import GHC.Exts( Splittable )
5270
5271     data NameSupply = ...
5272
5273     splitNS :: NameSupply -> (NameSupply, NameSupply)
5274     newName :: NameSupply -> Name
5275
5276     instance Splittable NameSupply where
5277         split = splitNS
5278
5279
5280     f :: (%ns :: NameSupply) => Env -> Expr -> Expr
5281     f env (Lam x e) = Lam x' (f env e)
5282                     where
5283                       x'   = newName %ns
5284                       env' = extend env x x'
5285     ...more equations for f...
5286 </programlisting>
5287 Notice that the implicit parameter %ns is consumed
5288 <itemizedlist>
5289 <listitem> <para> once by the call to <literal>newName</literal> </para> </listitem>
5290 <listitem> <para> once by the recursive call to <literal>f</literal> </para></listitem>
5291 </itemizedlist>
5292 </para>
5293 <para>
5294 So the translation done by the type checker makes
5295 the parameter explicit:
5296 <programlisting>
5297     f :: NameSupply -> Env -> Expr -> Expr
5298     f ns env (Lam x e) = Lam x' (f ns1 env e)
5299                        where
5300                          (ns1,ns2) = splitNS ns
5301                          x' = newName ns2
                         env' = extend env x x'
5303 </programlisting>
5304 Notice the call to 'split' introduced by the type checker.
5305 How did it know to use 'splitNS'?  Because what it really did
5306 was to introduce a call to the overloaded function 'split',
5307 defined by the class <literal>Splittable</literal>:
5308 <programlisting>
5309         class Splittable a where
5310           split :: a -> (a,a)
5311 </programlisting>
5312 The instance for <literal>Splittable NameSupply</literal> tells GHC how to implement
5313 split for name supplies.  But we can simply write
5314 <programlisting>
5315         g x = (x, %ns, %ns)
5316 </programlisting>
5317 and GHC will infer
5318 <programlisting>
5319         g :: (Splittable a, %ns :: a) => b -> (b,a,a)
5320 </programlisting>
5321 The <literal>Splittable</literal> class is built into GHC.  It's exported by module
5322 <literal>GHC.Exts</literal>.
5323 </para>
5324 <para>
5325 Other points:
5326 <itemizedlist>
5327 <listitem> <para> '<literal>?x</literal>' and '<literal>%x</literal>'
5328 are entirely distinct implicit parameters: you
5329   can use them together and they won't interfere with each other. </para>
5330 </listitem>
5331
5332 <listitem> <para> You can bind linear implicit parameters in 'with' clauses. </para> </listitem>
5333
5334 <listitem> <para>You cannot have implicit parameters (whether linear or not)
5335   in the context of a class or instance declaration. </para></listitem>
5336 </itemizedlist>
5337 </para>
5338
5339 <sect3><title>Warnings</title>
5340
5341 <para>
5342 The monomorphism restriction is even more important than usual.
5343 Consider the example above:
5344 <programlisting>
5345     f :: (%ns :: NameSupply) => Env -> Expr -> Expr
5346     f env (Lam x e) = Lam x' (f env e)
5347                     where
5348                       x'   = newName %ns
5349                       env' = extend env x x'
5350 </programlisting>
5351 If we replaced the two occurrences of x' by (newName %ns), which is
5352 usually a harmless thing to do, we get:
5353 <programlisting>
5354     f :: (%ns :: NameSupply) => Env -> Expr -> Expr
5355     f env (Lam x e) = Lam (newName %ns) (f env e)
5356                     where
5357                       env' = extend env x (newName %ns)
5358 </programlisting>
5359 But now the name supply is consumed in <emphasis>three</emphasis> places
(the two calls to newName, and the recursive call to f), so
5361 the result is utterly different.  Urk!  We don't even have
5362 the beta rule.
5363 </para>
5364 <para>
5365 Well, this is an experimental change.  With implicit
5366 parameters we have already lost beta reduction anyway, and
5367 (as John Launchbury puts it) we can't sensibly reason about
5368 Haskell programs without knowing their typing.
5369 </para>
5370
5371 </sect3>
5372
5373 <sect3><title>Recursive functions</title>
<para>Linear implicit parameters can be particularly tricky when you have a recursive function.
5375 Consider
5376 <programlisting>
5377         foo :: %x::T => Int -> [Int]
5378         foo 0 = []
5379         foo n = %x : foo (n-1)
5380 </programlisting>
5381 where T is some type in class Splittable.</para>
5382 <para>
5383 Do you get a list of all the same T's or all different T's
5384 (assuming that split gives two distinct T's back)?
5385 </para><para>
5386 If you supply the type signature, taking advantage of polymorphic
5387 recursion, you get what you'd probably expect.  Here's the
5388 translated term, where the implicit param is made explicit:
5389 <programlisting>
5390         foo x 0 = []
5391         foo x n = let (x1,x2) = split x
5392                   in x1 : foo x2 (n-1)
5393 </programlisting>
5394 But if you don't supply a type signature, GHC uses the Hindley
5395 Milner trick of using a single monomorphic instance of the function
5396 for the recursive calls. That is what makes Hindley Milner type inference
5397 work.  So the translation becomes
5398 <programlisting>
5399         foo x = let
5400                   foom 0 = []
5401                   foom n = x : foom (n-1)
5402                 in
5403                 foom
5404 </programlisting>
5405 Result: 'x' is not split, and you get a list of identical T's.  So the
5406 semantics of the program depends on whether or not foo has a type signature.
5407 Yikes!
5408 </para><para>
5409 You may say that this is a good reason to dislike linear implicit parameters
5410 and you'd be right.  That is why they are an experimental feature.
5411 </para>
5412 </sect3>
5413
5414 </sect2>
5415
5416 ================ END OF Linear Implicit Parameters commented out -->
5417
5418 <sect2 id="kinding">
5419 <title>Explicitly-kinded quantification</title>
5420
5421 <para>
5422 Haskell infers the kind of each type variable.  Sometimes it is nice to be able
5423 to give the kind explicitly as (machine-checked) documentation,
5424 just as it is nice to give a type signature for a function.  On some occasions,
5425 it is essential to do so.  For example, in his paper "Restricted Data Types in Haskell" (Haskell Workshop 1999)
5426 John Hughes had to define the data type:
5427 <screen>
5428      data Set cxt a = Set [a]
5429                     | Unused (cxt a -> ())
5430 </screen>
5431 The only use for the <literal>Unused</literal> constructor was to force the correct
5432 kind for the type variable <literal>cxt</literal>.
5433 </para>
5434 <para>
With the flag <option>-XKindSignatures</option>, GHC instead allows you to specify the kind
of a type variable directly, wherever that type variable is explicitly bound.
5437 </para>
5438 <para>
5439 This flag enables kind signatures in the following places:
5440 <itemizedlist>
5441 <listitem><para><literal>data</literal> declarations:
5442 <screen>
5443   data Set (cxt :: * -> *) a = Set [a]
5444 </screen></para></listitem>
5445 <listitem><para><literal>type</literal> declarations:
5446 <screen>
5447   type T (f :: * -> *) = f Int
5448 </screen></para></listitem>
5449 <listitem><para><literal>class</literal> declarations:
5450 <screen>
5451   class (Eq a) => C (f :: * -> *) a where ...
5452 </screen></para></listitem>
5453 <listitem><para><literal>forall</literal>'s in type signatures:
5454 <screen>
5455   f :: forall (cxt :: * -> *). Set cxt Int
5456 </screen></para></listitem>
5457 </itemizedlist>
5458 </para>
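<para>
As a small sketch of why this matters, the module below restates Hughes's type with a kind signature in place of the dummy <literal>Unused</literal> constructor. Under Haskell 98 kind inference, an unannotated <literal>cxt</literal> would default to kind <literal>*</literal>, and <literal>Set Maybe a</literal> would then be a kind error. The functions <literal>fromList</literal> and <literal>toList</literal>, and the choice of <literal>Maybe</literal> as the context, are hypothetical.
<programlisting>
{-# LANGUAGE KindSignatures #-}
module Main where

-- The annotation fixes cxt :: * -> * directly; no Unused constructor needed.
data Set (cxt :: * -> *) a = Set [a]

-- Maybe is used only as an arbitrary type constructor of kind * -> *.
fromList :: [a] -> Set Maybe a
fromList = Set

toList :: Set cxt a -> [a]
toList (Set xs) = xs

main :: IO ()
main = print (toList (fromList "abc"))
</programlisting>
Recent GHC versions may warn about <literal>*</literal> and suggest <literal>Type</literal> from <literal>Data.Kind</literal>; the sketch keeps <literal>*</literal> to match the text above.
</para>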
5459
5460 <para>
5461 The parentheses are required.  Some of the spaces are required too, to
5462 separate the lexemes.  If you write <literal>(f::*->*)</literal> you
5463 will get a parse error, because "<literal>::*->*</literal>" is a
5464 single lexeme in Haskell.
5465 </para>
5466
5467 <para>
5468 As part of the same extension, you can put kind annotations in types
5469 as well.  Thus:
5470 <screen>
5471    f :: (Int :: *) -> Int
5472    g :: forall a. a -> (a :: *)
5473 </screen>
5474 The syntax is
5475 <screen>
   atype ::= '(' ctype '::' kind ')'
5477 </screen>
5478 The parentheses are required.
5479 </para>
5480 </sect2>
5481
5482
5483 <sect2 id="universal-quantification">
5484 <title>Arbitrary-rank polymorphism
5485 </title>
5486
5487 <para>
5488 GHC's type system supports <emphasis>arbitrary-rank</emphasis>
5489 explicit universal quantification in
5490 types.
5491 For example, all the following types are legal:
5492 <programlisting>
5493     f1 :: forall a b. a -> b -> a
5494     g1 :: forall a b. (Ord a, Eq  b) => a -> b -> a
5495
5496     f2 :: (forall a. a->a) -> Int -> Int
5497     g2 :: (forall a. Eq a => [a] -> a -> Bool) -> Int -> Int
5498
5499     f3 :: ((forall a. a->a) -> Int) -> Bool -> Bool
5500
5501     f4 :: Int -> (forall a. a -> a)
5502 </programlisting>
5503 Here, <literal>f1</literal> and <literal>g1</literal> are rank-1 types, and
5504 can be written in standard Haskell (e.g. <literal>f1 :: a->b->a</literal>).
5505 The <literal>forall</literal> makes explicit the universal quantification that
5506 is implicitly added by Haskell.
5507 </para>
5508 <para>
5509 The functions <literal>f2</literal> and <literal>g2</literal> have rank-2 types;
5510 the <literal>forall</literal> is on the left of a function arrow.  As <literal>g2</literal>
5511 shows, the polymorphic type on the left of the function arrow can be overloaded.
5512 </para>
5513 <para>
5514 The function <literal>f3</literal> has a rank-3 type;
5515 it has rank-2 types on the left of a function arrow.
5516 </para>
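<para>
The following sketch gives definitions with the types of <literal>f2</literal>, <literal>g2</literal> and <literal>f3</literal>. The bodies are hypothetical, chosen only so that each argument really is used at more than one type, and the module needs <option>-XRankNTypes</option> because <literal>f3</literal> is rank-3.
<programlisting>
{-# LANGUAGE RankNTypes #-}
module Main where

-- Rank 2: the argument is used at both Bool and Int, so it must be polymorphic.
f2 :: (forall a. a -> a) -> Int -> Int
f2 i n = if i True then i n else 0

-- Rank 2 with an overloaded argument, used at both Char and Int.
g2 :: (forall a. Eq a => [a] -> a -> Bool) -> Int -> Int
g2 member n = length (filter id [member "abc" 'b', member [1, 2, 3] n])

-- Rank 3: the argument k itself expects a polymorphic function.
f3 :: ((forall a. a -> a) -> Int) -> Bool -> Bool
f3 k b = if k id == 0 then b else not b

main :: IO ()
main = print ( f2 id 7
             , g2 (flip elem) 2
             , f3 (\i -> length (i "xy")) False )   -- prints (7,2,True)
</programlisting>
</para>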
5517 <para>
5518 GHC has three flags to control higher-rank types:
5519 <itemizedlist>
5520 <listitem><para>
5521  <option>-XPolymorphicComponents</option>: data constructors (only) can have polymorphic argument types.
5522 </para></listitem>
5523 <listitem><para>
5524  <option>-XRank2Types</option>: any function (including data constructors) can have a rank-2 type.
5525 </para></listitem>
5526 <listitem><para>
5527  <option>-XRankNTypes</option>: any function (including data constructors) can have an arbitrary-rank type.
5528 That is,  you can nest <literal>forall</literal>s
5529 arbitrarily deep in function arrows.
5530 In particular, a forall-type (also called a "type scheme"),
including an optional type class context, is legal:
5532 <itemizedlist>
5533 <listitem> <para> On the left or right (see <literal>f4</literal>, for example)
5534 of a function arrow </para> </listitem>
5535 <listitem> <para> As the argument of a constructor, or type of a field, in a data type declaration. For
5536 example, any of the <literal>f1,f2,f3,g1,g2</literal> above would be valid
5537 field type signatures.</para> </listitem>
5538 <listitem> <para> As the type of an implicit parameter </para> </listitem>
5539 <listitem> <para> In a pattern type signature (see <xref linkend="scoped-type-variables"/>) </para> </listitem>
5540 </itemizedlist>
5541 </para></listitem>
5542 </itemizedlist>
5543 </para>
5544
5545
5546 <sect3 id="univ">
5547 <title>Examples
5548 </title>
5549
5550 <para>
5551 In a <literal>data</literal> or <literal>newtype</literal> declaration one can quantify
5552 the types of the constructor arguments.  Here are several examples:
5553 </para>
5554
5555 <para>
5556
5557 <programlisting>
5558 data T a = T1 (forall b. b -> b -> b) a
5559
5560 data MonadT m = MkMonad { return :: forall a. a -> m a,
5561                           bind   :: forall a b. m a -> (a -> m b) -> m b
5562                         }
5563
5564 newtype Swizzle = MkSwizzle (Ord a => [a] -> [a])
5565 </programlisting>
5566
5567 </para>
5568
5569 <para>
5570 The constructors have rank-2 types:
5571 </para>
5572
5573 <para>
5574
5575 <programlisting>
5576 T1 :: forall a. (forall b. b -> b -> b) -> a -> T a
5577 MkMonad :: forall m. (forall a. a -> m a)
5578                   -> (forall a b. m a -> (a -> m b) -> m b)
5579                   -> MonadT m
5580 MkSwizzle :: (Ord a => [a] -> [a]) -> Swizzle
5581 </programlisting>
5582
5583 </para>
5584
5585 <para>
5586 Notice that you don't need to use a <literal>forall</literal> if there's an
5587 explicit context.  For example in the first argument of the
5588 constructor <function>MkSwizzle</function>, an implicit "<literal>forall a.</literal>" is
5589 prefixed to the argument type.  The implicit <literal>forall</literal>
5590 quantifies all type variables that are not already in scope, and are
5591 mentioned in the type quantified over.
5592 </para>
5593
5594 <para>
5595 As for type signatures, implicit quantification happens for non-overloaded
5596 types too.  So if you write this:
5597
5598 <programlisting>
5599   data T a = MkT (Either a b) (b -> b)
5600 </programlisting>
5601
5602 it's just as if you had written this:
5603
5604 <programlisting>
5605   data T a = MkT (forall b. Either a b) (forall b. b -> b)
5606 </programlisting>
5607
5608 That is, since the type variable <literal>b</literal> isn't in scope, it's
5609 implicitly universally quantified.  (Arguably, it would be better
5610 to <emphasis>require</emphasis> explicit quantification on constructor arguments
5611 where that is what is wanted.  Feedback welcomed.)
5612 </para>
5613
5614 <para>
You construct values of types <literal>T, MonadT, Swizzle</literal> by applying
5616 the constructor to suitable values, just as usual.  For example,
5617 </para>
5618
5619 <para>
5620
5621 <programlisting>
5622     a1 :: T Int
    a1 = T1 (\x y->x) 3
5624
5625     a2, a3 :: Swizzle
5626     a2 = MkSwizzle sort
5627     a3 = MkSwizzle reverse
5628
5629     a4 :: MonadT Maybe
5630     a4 = let r x = Just x
5631              b m k = case m of
5632                        Just y -> k y
5633                        Nothing -> Nothing
5634          in
5635          MkMonad r b
5636
    mkTs :: (forall b. b -> b -> b) -> a -> a -> [T a]
5638     mkTs f x y = [T1 f x, T1 f y]
5639 </programlisting>
5640
5641 </para>
5642
5643 <para>
5644 The type of the argument can, as usual, be more general than the type
5645 required, as <literal>(MkSwizzle reverse)</literal> shows.  (<function>reverse</function>
5646 does not need the <literal>Ord</literal> constraint.)
5647 </para>
5648
5649 <para>
5650 When you use pattern matching, the bound variables may now have
5651 polymorphic types.  For example:
5652 </para>
5653
5654 <para>
5655
5656 <programlisting>
5657     f :: T a -> a -> (a, Char)
5658     f (T1 w k) x = (w k x, w 'c' 'd')
5659
5660     g :: (Ord a, Ord b) => Swizzle -> [a] -> (a -> b) -> [b]
5661     g (MkSwizzle s) xs f = s (map f (s xs))
5662
5663     h :: MonadT m -> [m a] -> m [a]
5664     h m [] = return m []
5665     h m (x:xs) = bind m x          $ \y ->
5666                  bind m (h m xs)   $ \ys ->
5667                  return m (y:ys)
5668 </programlisting>
5669
5670 </para>
5671
5672 <para>
5673 In the function <function>h</function> we use the record selectors <literal>return</literal>
5674 and <literal>bind</literal> to extract the polymorphic bind and return functions
5675 from the <literal>MonadT</literal> data structure, rather than using pattern
5676 matching.
5677 </para>
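<para>
For experimentation, the examples above can be collected into one module, as in the sketch below. Two adjustments are assumptions made for current GHC releases: the Prelude's <literal>return</literal> is hidden so that the field name can be reused, and the <literal>forall</literal> in <literal>Swizzle</literal> is written explicitly, since recent GHC versions no longer quantify constructor argument types implicitly as described above. The <literal>main</literal> function is an illustrative addition.
<programlisting>
{-# LANGUAGE RankNTypes #-}
module Main where

import Prelude hiding (return)   -- the MonadT field below is called return
import Data.List (sort)

data T a = T1 (forall b. b -> b -> b) a

data MonadT m = MkMonad { return :: forall a. a -> m a,
                          bind   :: forall a b. m a -> (a -> m b) -> m b
                        }

-- Explicit forall: recent GHCs do not add it implicitly here.
newtype Swizzle = MkSwizzle (forall a. Ord a => [a] -> [a])

a4 :: MonadT Maybe
a4 = let r x = Just x
         b m k = case m of
                   Just y -> k y
                   Nothing -> Nothing
     in
     MkMonad r b

f :: T a -> a -> (a, Char)
f (T1 w k) x = (w k x, w 'c' 'd')

g :: (Ord a, Ord b) => Swizzle -> [a] -> (a -> b) -> [b]
g (MkSwizzle s) xs f = s (map f (s xs))

h :: MonadT m -> [m a] -> m [a]
h m []     = return m []
h m (x:xs) = bind m x          $ \y ->
             bind m (h m xs)   $ \ys ->
             return m (y:ys)

main :: IO ()
main = do
  print (f (T1 (\p _ -> p) 1) 2)              -- (1,'c')
  print (g (MkSwizzle sort) [3, 1, 2] negate) -- [-3,-2,-1]
  print (h a4 [Just 1, Just 2])               -- Just [1,2]
</programlisting>
</para>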
5678 </sect3>
5679
5680 <sect3>
5681 <title>Type inference</title>
5682
5683 <para>
5684 In general, type inference for arbitrary-rank types is undecidable.
5685 GHC uses an algorithm proposed by Odersky and Laufer ("Putting type annotations to work", POPL'96)
5686 to get a decidable algorithm by requiring some help from the programmer.
5687 We do not yet have a formal specification of "some help" but the rule is this:
5688 </para>
5689 <para>
5690 <emphasis>For a lambda-bound or case-bound variable, x, either the programmer
5691 provides an explicit polymorphic type for x, or GHC's type inference will assume
5692 that x's type has no foralls in it</emphasis>.
5693 </para>
5694 <para>
5695 What does it mean to "provide" an explicit type for x?  You can do that by
5696 giving a type signature for x directly, using a pattern type signature
5697 (<xref linkend="scoped-type-variables"/>), thus:
5698 <programlisting>
     \ (f :: forall a. a->a) -> (f True, f 'c')
5700 </programlisting>
5701 Alternatively, you can give a type signature to the enclosing
5702 context, which GHC can "push down" to find the type for the variable:
5703 <programlisting>
5704      (\ f -> (f True, f 'c')) :: (forall a. a->a) -> (Bool,Char)
5705 </programlisting>
5706 Here the type signature on the expression can be pushed inwards
5707 to give a type signature for f.  Similarly, and more commonly,
5708 one can give a type signature for the function itself:
5709 <programlisting>
5710      h :: (forall a. a->a) -> (Bool,Char)
5711      h f = (f True, f 'c')
5712 </programlisting>
5713 You don't need to give a type signature if the lambda bound variable
5714 is a constructor argument.  Here is an example we saw earlier:
5715 <programlisting>
5716     f :: T a -> a -> (a, Char)
5717     f (T1 w k) x = (w k x, w 'c' 'd')
5718 </programlisting>
5719 Here we do not need to give a type signature to <literal>w</literal>, because
5720 it is an argument of constructor <literal>T1</literal> and that tells GHC all
5721 it needs to know.
5722 </para>
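<para>
The three ways of supplying the type can be tried side by side in the following sketch; the names <literal>byPattern</literal> and <literal>byContext</literal> are illustrative, and the module assumes <option>-XRankNTypes</option> and <option>-XScopedTypeVariables</option>. Removing every annotation, leaving just <literal>\f -> (f True, f 'c')</literal>, would be rejected, because <literal>f</literal> would then be assumed to have a monomorphic type.
<programlisting>
{-# LANGUAGE RankNTypes, ScopedTypeVariables #-}
module Main where

-- (1) A pattern type signature on the lambda-bound variable itself.
byPattern :: (Bool, Char)
byPattern = (\(f :: forall a. a -> a) -> (f True, f 'c')) id

-- (2) A signature on the enclosing expression, pushed inwards to f.
byContext :: (Bool, Char)
byContext = ((\f -> (f True, f 'c')) :: (forall a. a -> a) -> (Bool, Char)) id

-- (3) A signature on the function itself.
h :: (forall a. a -> a) -> (Bool, Char)
h f = (f True, f 'c')

main :: IO ()
main = print (byPattern, byContext, h id)
</programlisting>
</para>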
5723
5724 </sect3>
5725
5726
5727 <sect3 id="implicit-quant">
5728 <title>Implicit quantification</title>
5729
5730 <para>
5731 GHC performs implicit quantification as follows.  <emphasis>At the top level (only) of
5732 user-written types, if and only if there is no explicit <literal>forall</literal>,
5733 GHC finds all the type variables mentioned in the type that are not already
5734 in scope, and universally quantifies them.</emphasis>  For example, the following pairs are
5735 equivalent:
5736 <programlisting>
5737   f :: a -> a
5738   f :: forall a. a -> a
5739
5740   g (x::a) = let
5741                 h :: a -> b -> b
5742                 h x y = y
5743              in ...
5744   g (x::a) = let
5745                 h :: forall b. a -> b -> b
5746                 h x y = y
5747              in ...
5748 </programlisting>
5749 </para>
5750 <para>
5751 Notice that GHC does <emphasis>not</emphasis> find the innermost possible quantification
5752 point.  For example:
5753 <programlisting>
5754   f :: (a -> a) -> Int
5755            -- MEANS
5756   f :: forall a. (a -> a) -> Int
5757            -- NOT
5758   f :: (forall a. a -> a) -> Int
5759
5760
5761   g :: (Ord a => a -> a) -> Int
5762            -- MEANS the illegal type
5763   g :: forall a. (Ord a => a -> a) -> Int
5764            -- NOT
5765   g :: (forall a. Ord a => a -> a) -> Int
5766 </programlisting>
The implicit quantification of <literal>g</literal> produces an illegal type, which
you might think is silly, but at least the rule is simple.  If you want the rank-2
type, you can write your <literal>forall</literal>s explicitly.  Indeed, doing so is
strongly advised for rank-2 types.
5771 </para>
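<para>
To make the difference concrete, here is a small sketch; all names are illustrative, and <literal>g</literal> here is a variant whose argument takes two values so that <literal>max</literal> can be passed. <literal>f</literal> receives the implicitly quantified rank-1 type, whereas <literal>f'</literal> and <literal>g</literal> write the inner <literal>forall</literal> explicitly to obtain rank-2 types.
<programlisting>
{-# LANGUAGE RankNTypes #-}
module Main where

-- Implicit quantification: means forall a. (a -> a) -> Int,
-- so the caller chooses a.
f :: (a -> a) -> Int
f _ = 0

-- Explicit inner forall: the argument must itself be polymorphic,
-- because k is used at both String and Int.
f' :: (forall a. a -> a) -> Int
f' k = k (length (k "abc"))

-- An overloaded rank-2 argument, written with an explicit forall.
g :: (forall a. Ord a => a -> a -> a) -> (Int, Char)
g pick = (pick 1 2, pick 'x' 'y')

main :: IO ()
main = print (f not, f' id, g max)   -- prints (0,3,(2,'y'))
</programlisting>
Passing <literal>not</literal> to <literal>f'</literal> would be rejected, since <literal>not</literal> only works at <literal>Bool</literal>.
</para>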
5772 </sect3>
5773 </sect2>
5774
5775
5776 <sect2 id="impredicative-polymorphism">
5777 <title>Impredicative polymorphism
5778 </title>
5779 <para><emphasis>NOTE: the impredicative-polymorphism feature is deprecated in GHC 6.12, and
5780 will be removed or replaced in GHC 6.14.</emphasis></para>
5781
5782 <para>GHC supports <emphasis>impredicative polymorphism</emphasis>,
5783 enabled with <option>-XImpredicativeTypes</option>.
5784 This means
5785 that you can call a polymorphic function at a polymorphic type, and
5786 parameterise data structures over polymorphic types.  For example:
5787 <programlisting>
5788   f :: Maybe (forall a. [a] -> [a]) -> Maybe ([Int], [Char])
5789   f (Just g) = Just (g [3], g "hello")
5790   f Nothing  = Nothing
5791 </programlisting>
5792 Notice here that the <literal>Maybe</literal> type is parameterised by the
5793 <emphasis>polymorphic</emphasis> type <literal>(forall a. [a] ->
5794 [a])</literal>.
5795 </para>
5796 <para>The technical details of this extension are described in the paper
5797 <ulink url="http://research.microsoft.com/%7Esimonpj/papers/boxy/">Boxy types:
5798 type inference for higher-rank types and impredicativity</ulink>,
5799 which appeared at ICFP 2006.
5800 </para>
5801 </sect2>
5802
5803 <sect2 id="scoped-type-variables">
5804 <title>Lexically scoped type variables
5805 </title>
5806
5807 <para>
5808 GHC supports <emphasis>lexically scoped type variables</emphasis>, without
5809 which some type signatures are simply impossible to write. For example:
5810 <programlisting>
5811 f :: forall a. [a] -> [a]
5812 f xs = ys ++ ys
5813      where
5814        ys :: [a]
5815        ys = reverse xs
5816 </programlisting>
5817 The type signature for <literal>f</literal> brings the type variable <literal>a</literal> into scope,
5818 because of the explicit <literal>forall</literal> (<xref linkend="decl-type-sigs"/>).
5819 The type variables bound by a <literal>forall</literal> scope over
5820 the entire definition of the accompanying value declaration.
5821 In this example, the type variable <literal>a</literal> scopes over the whole
5822 definition of <literal>f</literal>, including over
5823 the type signature for <varname>ys</varname>.
5824 In Haskell 98 it is not possible to declare
5825 a type for <varname>ys</varname>; a major benefit of scoped type variables is that
5826 it becomes possible to do so.
5827 </para>
5828 <para>Lexically-scoped type variables are enabled by
5829 <option>-XScopedTypeVariables</option>.  This flag implies <option>-XRelaxedPolyRec</option>.
5830 </para>
5831 <para>Note: GHC 6.6 contains substantial changes to the way that scoped type
5832 variables work, compared to earlier releases.  Read this section
5833 carefully!</para>
5834
5835 <sect3>
5836 <title>Overview</title>
5837
<para>The design follows these principles:
5839 <itemizedlist>
5840 <listitem><para>A scoped type variable stands for a type <emphasis>variable</emphasis>, and not for
5841 a <emphasis>type</emphasis>. (This is a change from GHC's earlier
5842 design.)</para></listitem>
5843 <listitem><para>Furthermore, distinct lexical type variables stand for distinct
5844 type variables.  This means that every programmer-written type signature
5845 (including one that contains free scoped type variables) denotes a
5846 <emphasis>rigid</emphasis> type; that is, the type is fully known to the type
5847 checker, and no inference is involved.</para></listitem>
5848 <listitem><para>Lexical type variables may be alpha-renamed freely, without
5849 changing the program.</para></listitem>
5850 </itemizedlist>
5851 </para>
5852 <para>
5853 A <emphasis>lexically scoped type variable</emphasis> can be bound by:
5854 <itemizedlist>
5855 <listitem><para>A declaration type signature (<xref linkend="decl-type-sigs"/>)</para></listitem>
5856 <listitem><para>An expression type signature (<xref linkend="exp-type-sigs"/>)</para></listitem>
5857 <listitem><para>A pattern type signature (<xref linkend="pattern-type-sigs"/>)</para></listitem>
5858 <listitem><para>Class and instance declarations (<xref linkend="cls-inst-scoped-tyvars"/>)</para></listitem>
5859 </itemizedlist>
5860 </para>
5861 <para>
5862 In Haskell, a programmer-written type signature is implicitly quantified over
5863 its free type variables (<ulink
5864 url="http://www.haskell.org/onlinereport/decls.html#sect4.1.2">Section
5865 4.1.2</ulink>
5866 of the Haskell Report).
Lexically scoped type variables affect these implicit quantification rules
5868 as follows: any type variable that is in scope is <emphasis>not</emphasis> universally
5869 quantified. For example, if type variable <literal>a</literal> is in scope,
5870 then
5871 <programlisting>
5872   (e :: a -> a)     means     (e :: a -> a)
5873   (e :: b -> b)     means     (e :: forall b. b->b)
5874   (e :: a -> b)     means     (e :: forall b. a->b)
5875 </programlisting>
5876 </para>
5877
5878
5879 </sect3>
5880
5881
5882 <sect3 id="decl-type-sigs">
5883 <title>Declaration type signatures</title>
5884 <para>A declaration type signature that has <emphasis>explicit</emphasis>
5885 quantification (using <literal>forall</literal>) brings into scope the
5886 explicitly-quantified
5887 type variables, in the definition of the named function.  For example:
5888 <programlisting>
5889   f :: forall a. [a] -> [a]
5890   f (x:xs) = xs ++ [ x :: a ]
5891 </programlisting>
5892 The "<literal>forall a</literal>" brings "<literal>a</literal>" into scope in
5893 the definition of "<literal>f</literal>".
5894 </para>
5895 <para>This only happens if:
5896 <itemizedlist>
5897 <listitem><para> The quantification in <literal>f</literal>'s type
5898 signature is explicit.  For example:
5899 <programlisting>
5900   g :: [a] -> [a]
5901   g (x:xs) = xs ++ [ x :: a ]
5902 </programlisting>
5903 This program will be rejected, because "<literal>a</literal>" does not scope
over the definition of "<literal>g</literal>", so "<literal>x::a</literal>"
5905 means "<literal>x::forall a. a</literal>" by Haskell's usual implicit
5906 quantification rules.
5907 </para></listitem>
5908 <listitem><para> The signature gives a type for a function binding or a bare variable binding,
5909 not a pattern binding.
5910 For example:
5911 <programlisting>
5912   f1 :: forall a. [a] -> [a]
5913   f1 (x:xs) = xs ++ [ x :: a ]   -- OK
5914
5915   f2 :: forall a. [a] -> [a]
5916   f2 = \(x:xs) -> xs ++ [ x :: a ]   -- OK
5917
5918   f3 :: forall a. [a] -> [a]
5919   Just f3 = Just (\(x:xs) -> xs ++ [ x :: a ])   -- Not OK!
5920 </programlisting>
5921 The binding for <literal>f3</literal> is a pattern binding, and so its type signature
5922 does not bring <literal>a</literal> into scope.   However <literal>f1</literal> is a
5923 function binding, and <literal>f2</literal> binds a bare variable; in both cases
5924 the type signature brings <literal>a</literal> into scope.
5925 </para></listitem>
5926 </itemizedlist>
5927 </para>
5928 </sect3>
5929
5930 <sect3 id="exp-type-sigs">
5931 <title>Expression type signatures</title>
5932
5933 <para>An expression type signature that has <emphasis>explicit</emphasis>
5934 quantification (using <literal>forall</literal>) brings into scope the
5935 explicitly-quantified
5936 type variables, in the annotated expression.  For example:
5937 <programlisting>
5938   f = runST ( (op >>= \(x :: STRef s Int) -> g x) :: forall s. ST s Bool )
5939 </programlisting>
Here, the type signature <literal>forall s. ST s Bool</literal> brings the
5941 type variable <literal>s</literal> into scope, in the annotated expression
5942 <literal>(op >>= \(x :: STRef s Int) -> g x)</literal>.
5943 </para>
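<para>
A runnable version of this example might look like the following sketch. The definitions of <literal>op</literal> and <literal>g</literal> are hypothetical stand-ins; they are not part of the original example.
<programlisting>
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

import Control.Monad.ST
import Data.STRef

-- Hypothetical stand-ins for op and g.
op :: ST s (STRef s Int)
op = newSTRef 41

g :: STRef s Int -> ST s Bool
g r = modifySTRef r (+ 1) >> fmap (== 42) (readSTRef r)

-- The forall s in the expression signature brings s into scope,
-- so the pattern signature on x can mention it.
f :: Bool
f = runST ( (op >>= \(x :: STRef s Int) -> g x) :: forall s. ST s Bool )

main :: IO ()
main = print f   -- prints True
</programlisting>
</para>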
5944
5945 </sect3>
5946
5947 <sect3 id="pattern-type-sigs">
5948 <title>Pattern type signatures</title>
5949 <para>
5950 A type signature may occur in any pattern; this is a <emphasis>pattern type
5951 signature</emphasis>.
5952 For example:
5953 <programlisting>
5954   -- f and g assume that 'a' is already in scope
5955   f = \(x::Int, y::a) -> x
5956   g (x::a) = x
5957   h ((x,y) :: (Int,Bool)) = (y,x)
5958 </programlisting>
5959 In the case where all the type variables in the pattern type signature are
5960 already in scope (i.e. bound by the enclosing context), matters are simple: the
5961 signature simply constrains the type of the pattern in the obvious way.
5962 </para>
5963 <para>
5964 Unlike expression and declaration type signatures, pattern type signatures are not implicitly generalised.
5965 The pattern in a <emphasis>pattern binding</emphasis> may only mention type variables
5966 that are already in scope.  For example:
5967 <programlisting>
5968   f :: forall a. [a] -> (Int, [a])
5969   f xs = (n, zs)
5970     where
5971       (ys::[a], n) = (reverse xs, length xs) -- OK
5972       zs::[a] = xs ++ ys                     -- OK
5973
5974       Just (v::b) = ...  -- Not OK; b is not in scope
5975 </programlisting>
5976 Here, the pattern signatures for <literal>ys</literal> and <literal>zs</literal>
5977 are fine, but the one for <literal>v</literal> is not because <literal>b</literal> is
5978 not in scope.
5979 </para>
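<para>
With the illegal binding for <literal>v</literal> removed, the example runs once <option>-XScopedTypeVariables</option> is on; the module header and <literal>main</literal> below are illustrative additions. For the input <literal>"ab"</literal>, <literal>ys</literal> is <literal>"ba"</literal>, <literal>n</literal> is 2 and <literal>zs</literal> is <literal>"abba"</literal>.
<programlisting>
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

f :: forall a. [a] -> (Int, [a])
f xs = (n, zs)
  where
    (ys::[a], n) = (reverse xs, length xs)  -- a is in scope from f's forall
    zs::[a] = xs ++ ys

main :: IO ()
main = print (f "ab")   -- prints (2,"abba")
</programlisting>
</para>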
5980 <para>
5981 However, in all patterns <emphasis>other</emphasis> than pattern bindings, a pattern
5982 type signature may mention a type variable that is not in scope; in this case,
5983 <emphasis>the signature brings that type variable into scope</emphasis>.
5984 This is particularly important for existential data constructors.  For example:
5985 <programlisting>
5986   data T = forall a. MkT [a]
5987
5988   k :: T -> T
5989   k (MkT [t::a]) = MkT t3
5990                  where
5991                    t3::[a] = [t,t,t]
5992 </programlisting>
5993 Here, the pattern type signature <literal>(t::a)</literal> mentions a lexical type
5994 variable that is not already in scope.  Indeed, it <emphasis>cannot</emphasis> already be in scope,
5995 because it is bound by the pattern match.  GHC's rule is that in this situation
5996 (and only then), a pattern type signature can mention a type variable that is
5997 not already in scope; the effect is to bring it into scope, standing for the
5998 existentially-bound type variable.
5999 </para>
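<para>
A self-contained sketch of the existential example follows. The helper <literal>size</literal> and the <literal>main</literal> function are hypothetical additions used only to observe the result; like the original, <literal>k</literal> only matches a one-element list.
<programlisting>
{-# LANGUAGE ExistentialQuantification, ScopedTypeVariables #-}
module Main where

data T = forall a. MkT [a]

-- (t::a) binds a to the type hidden inside MkT, so the where clause
-- can mention a in the signature for t3.
k :: T -> T
k (MkT [t::a]) = MkT t3
  where
    t3::[a] = [t,t,t]

-- Hypothetical helper for observing the result.
size :: T -> Int
size (MkT xs) = length xs

main :: IO ()
main = print (size (k (MkT "x")))   -- prints 3
</programlisting>
</para>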
6000 <para>
6001 When a pattern type signature binds a type variable in this way, GHC insists that the
6002 type variable is bound to a <emphasis>rigid</emphasis>, or fully-known, type variable.
6003 This means that any user-written type signature always stands for a completely known type.
6004 </para>
6005 <para>
6006 If all this seems a little odd, we think so too.  But we must have
6007 <emphasis>some</emphasis> way to bring such type variables into scope, else we
6008 could not name existentially-bound type variables in subsequent type signatures.
6009 </para>
6010 <para>
6011 This is (now) the <emphasis>only</emphasis> situation in which a pattern type
6012 signature is allowed to mention a lexical variable that is not already in
6013 scope.
6014 For example, both <literal>f</literal> and <literal>g</literal> would be
illegal if <literal>a</literal> were not already in scope.
6016 </para>
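<para>
Here is a small, self-contained sketch of the existential case. It assumes the
<option>-XScopedTypeVariables</option> and <option>-XExistentialQuantification</option> flags,
given here as a <literal>LANGUAGE</literal> pragma. The type <literal>Box</literal> and the function
<literal>boxLength</literal> are made-up names for illustration. The pattern type signature
<literal>(xs :: [a])</literal> brings the existentially-bound <literal>a</literal> into scope, so the
signature for <literal>doubled</literal> in the <literal>where</literal> clause can mention it.
</para>
<programlisting>
{-# LANGUAGE ScopedTypeVariables, ExistentialQuantification #-}

-- A hypothetical existential type: the element type of the list is hidden.
data Box = forall a. MkBox [a]

-- The pattern signature (xs :: [a]) binds the existential type variable a,
-- which the local signature for 'doubled' then refers to.
boxLength :: Box -> Int
boxLength (MkBox (xs :: [a])) = length doubled
  where
    doubled :: [a]
    doubled = xs ++ xs
</programlisting>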
6017
6018
6019 </sect3>
6020
6021 <!-- ==================== Commented out part about result type signatures
6022
6023 <sect3 id="result-type-sigs">
6024 <title>Result type signatures</title>
6025
6026 <para>
6027 The result type of a function, lambda, or case expression alternative can be given a signature, thus:
6028
6029 <programlisting>
6030   {- f assumes that 'a' is already in scope -}
6031   f x y :: [a] = [x,y,x]
6032
6033   g = \ x :: [Int] -> [3,4]
6034
6035   h :: forall a. [a] -> a
6036   h xs = case xs of
6037             (y:ys) :: a -> y
6038 </programlisting>
6039 The final <literal>:: [a]</literal> after the patterns of <literal>f</literal> gives the type of
the result of the function.  Similarly, the body of the lambda in the RHS of
<literal>g</literal> has type <literal>[Int]</literal>, and the RHS of the case
alternative in <literal>h</literal> has type <literal>a</literal>.
6043 </para>
6044 <para> A result type signature never brings new type variables into scope.</para>
6045 <para>
6046 There are a couple of syntactic wrinkles.  First, notice that all three
6047 examples would parse quite differently with parentheses:
6048 <programlisting>
6049   {- f assumes that 'a' is already in scope -}
6050   f x (y :: [a]) = [x,y,x]
6051
6052   g = \ (x :: [Int]) -> [3,4]
6053
6054   h :: forall a. [a] -> a
6055   h xs = case xs of
6056             ((y:ys) :: a) -> y
6057 </programlisting>
6058 Now the signature is on the <emphasis>pattern</emphasis>; and
6059 <literal>h</literal> would certainly be ill-typed (since the pattern
<literal>(y:ys)</literal> cannot have the type <literal>a</literal>).
6061
6062 Second, to avoid ambiguity, the type after the &ldquo;<literal>::</literal>&rdquo; in a result
6063 pattern signature on a lambda or <literal>case</literal> must be atomic (i.e. a single
6064 token or a parenthesised type of some sort).  To see why,
6065 consider how one would parse this:
6066 <programlisting>
6067   \ x :: a -> b -> x
6068 </programlisting>
6069 </para>
6070 </sect3>
6071
6072  -->
6073
6074 <sect3 id="cls-inst-scoped-tyvars">
6075 <title>Class and instance declarations</title>
6076 <para>
6077
6078 The type variables in the head of a <literal>class</literal> or <literal>instance</literal> declaration
6079 scope over the methods defined in the <literal>where</literal> part.  For example:
6080
6081
6082 <programlisting>
6083   class C a where
6084     op :: [a] -> a
6085
6086     op xs = let ys::[a]
6087                 ys = reverse xs
6088             in
6089             head ys
6090 </programlisting>
6091 </para>
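<para>
The same holds for the head of an <literal>instance</literal> declaration. The following sketch
assumes <option>-XScopedTypeVariables</option>; the class <literal>Describe</literal> is hypothetical.
Because the <literal>a</literal> in the instance head scopes over the method, the local signature
<literal>ys :: [a]</literal> refers to that same <literal>a</literal>, rather than to a fresh,
universally quantified one.
</para>
<programlisting>
{-# LANGUAGE ScopedTypeVariables #-}

-- A hypothetical class, used only for illustration.
class Describe a where
  describe :: a -> String

instance Describe Bool where
  describe b = show b

-- The 'a' from the instance head scopes over the method body, so the
-- signature for 'ys' mentions the same 'a' as the instance head.
instance Describe a => Describe [a] where
  describe xs = concatMap describe ys
    where
      ys :: [a]
      ys = reverse xs
</programlisting>
<para>
With these definitions, <literal>describe [True, False]</literal> evaluates to <literal>"FalseTrue"</literal>.
</para>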
6092 </sect3>
6093
6094 </sect2>
6095
6096
6097 <sect2 id="typing-binds">
6098 <title>Generalised typing of mutually recursive bindings</title>
6099
6100 <para>
6101 The Haskell Report specifies that a group of bindings (at top level, or in a
6102 <literal>let</literal> or <literal>where</literal>) should be sorted into
6103 strongly-connected components, and then type-checked in dependency order
6104 (<ulink url="http://www.haskell.org/onlinereport/decls.html#sect4.5.1">Haskell
6105 Report, Section 4.5.1</ulink>).
6106 As each group is type-checked, any binders of the group that
6107 have
6108 an explicit type signature are put in the type environment with the specified
6109 polymorphic type,
6110 and all others are monomorphic until the group is generalised
6111 (<ulink url="http://www.haskell.org/onlinereport/decls.html#sect4.5.2">Haskell Report, Section 4.5.2</ulink>).
6112 </para>
6113
6114 <para>Following a suggestion of Mark Jones, in his paper
6115 <ulink url="http://citeseer.ist.psu.edu/424440.html">Typing Haskell in
6116 Haskell</ulink>,
6117 GHC implements a more general scheme.  If <option>-XRelaxedPolyRec</option> is
6118 specified:
6119 <emphasis>the dependency analysis ignores references to variables that have an explicit
6120 type signature</emphasis>.
6121 As a result of this refined dependency analysis, the dependency groups are smaller, and more bindings will
6122 typecheck.  For example, consider:
6123 <programlisting>
6124   f :: Eq a =&gt; a -> Bool
6125   f x = (x == x) || g True || g "Yes"
6126
6127   g y = (y &lt;= y) || f True
6128 </programlisting>
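This is rejected by Haskell 98 (<literal>f</literal> and <literal>g</literal> form a single
dependency group, so <literal>g</literal> is monomorphic while the group is checked, yet it is
applied to both a <literal>Bool</literal> and a <literal>String</literal>), but under Jones's scheme the definition for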
6130 <literal>g</literal> is typechecked first, separately from that for
6131 <literal>f</literal>,
6132 because the reference to <literal>f</literal> in <literal>g</literal>'s right
6133 hand side is ignored by the dependency analysis.  Then <literal>g</literal>'s
6134 type is generalised, to get
6135 <programlisting>
6136   g :: Ord a =&gt; a -> Bool
6137 </programlisting>
6138 Now, the definition for <literal>f</literal> is typechecked, with this type for
6139 <literal>g</literal> in the type environment.
6140 </para>
6141
6142 <para>
6143 The same refined dependency analysis also allows the type signatures of
6144 mutually-recursive functions to have different contexts, something that is illegal in
6145 Haskell 98 (Section 4.5.2, last sentence).  With
6146 <option>-XRelaxedPolyRec</option>
GHC only insists that the type signatures of a <emphasis>refined</emphasis> group have identical
contexts; in practice this means that only variables bound by the same
6149 pattern binding must have the same context.  For example, this is fine:
6150 <programlisting>
6151   f :: Eq a =&gt; a -> Bool
6152   f x = (x == x) || g True
6153
6154   g :: Ord a =&gt; a -> Bool
6155   g y = (y &lt;= y) || f True
6156 </programlisting>
6157 </para>
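<para>
Both examples in this section can be checked with a small module such as the one below. This is a
sketch that assumes <option>-XRelaxedPolyRec</option> is in effect; we believe later GHC releases
always behave this way, so no flag is written in the code. The second pair is renamed
<literal>f'</literal> and <literal>g'</literal> so that both pairs fit in one module, and
<literal>(y &gt;= y)</literal> stands in for <literal>(y &lt;= y)</literal>; it needs the same
<literal>Ord</literal> context, so the typing is unchanged.
</para>
<programlisting>
-- First example: g has no signature. Its reference to f is ignored by
-- the dependency analysis, so g is generalised on its own (to
-- Ord a => a -> Bool) before f is checked.
f :: Eq a => a -> Bool
f x = (x == x) || g True || g "Yes"

g y = (y >= y) || f True

-- Second example: both functions have signatures, with different
-- contexts (Eq versus Ord), which Haskell 98 does not allow.
f' :: Eq a => a -> Bool
f' x = (x == x) || g' True

g' :: Ord a => a -> Bool
g' y = (y >= y) || f' True
</programlisting>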
6158 </sect2>
6159
6160 <sect2 id="mono-local-binds">
6161 <title>Monomorphic local bindings</title>
6162 <para>
6163 We are actively thinking of simplifying GHC's type system, by <emphasis>not generalising local bindings</emphasis>.
6164 The rationale is described in the paper
6165 <ulink url="http://research.microsoft.com/~simonpj/papers/constraints/index.htm">Let should not be generalised</ulink>.
6166 </para>
6167 <para>
6168 The experimental new behaviour is enabled by the flag <option>-XMonoLocalBinds</option>.  The effect is
6169 that local (that is, non-top-level) bindings without a type signature are not generalised at all.  You can
6170 think of it as an extreme (but much more predictable) version of the Monomorphism Restriction.
6171 If you supply a type signature, then the flag has no effect.
6172 </para>
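<para>
A minimal sketch of the last point, assuming only the <option>-XMonoLocalBinds</option> flag: the
local binding <literal>ident</literal> (an illustrative name) carries a type signature, so it stays
polymorphic and can be used at both <literal>Int</literal> and <literal>Bool</literal>.
</para>
<programlisting>
{-# LANGUAGE MonoLocalBinds #-}

-- 'ident' is local, but it has a type signature, so the flag does not
-- affect it: it remains polymorphic and is used at two different types.
pairUp :: (Int, Bool)
pairUp = (ident 3, ident True)
  where
    ident :: b -> b
    ident x = x
</programlisting>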
6173 </sect2>
6174
6175 </sect1>
6176 <!-- ==================== End of type system extensions =================  -->
6177
6178 <!-- ====================== TEMPLATE HASKELL =======================  -->
6179
6180 <sect1 id="template-haskell">
6181 <title>Template Haskell</title>
6182
6183 <para>Template Haskell allows you to do compile-time meta-programming in
6184 Haskell.
6185 The background to
6186 the main technical innovations is discussed in "<ulink
6187 url="http://research.microsoft.com/~simonpj/papers/meta-haskell/">
6188 Template Meta-programming for Haskell</ulink>" (Proc Haskell Workshop 2002).
6189 </para>
6190 <para>
6191 There is a Wiki page about
6192 Template Haskell at <ulink url="http://www.haskell.org/haskellwiki/Template_Haskell">
6193 http://www.haskell.org/haskellwiki/Template_Haskell</ulink>, and that is the best place to look for
6194 further details.
6195 You may also
6196 consult the <ulink
6197 url="http://www.haskell.org/ghc/docs/latest/html/libraries/index.html">online
6198 Haskell library reference material</ulink>
6199 (look for module <literal>Language.Haskell.TH</literal>).
6200 Many changes to the original design are described in
6201       <ulink url="http://research.microsoft.com/~simonpj/papers/meta-haskell/notes2.ps">
6202 Notes on Template Haskell version 2</ulink>.
6203 Not all of these changes are in GHC, however.
6204 </para>
6205
6206 <para> The first example from that paper is set out below (<xref linkend="th-example"/>)
6207 as a worked example to help get you started.
6208 </para>
6209
6210 <para>
6211 The documentation here describes the realisation of Template Haskell in GHC.  It is not detailed enough to
6212 understand Template Haskell; see the <ulink url="http://haskell.org/haskellwiki/Template_Haskell">
6213 Wiki page</ulink>.
6214 </para>
6215
6216     <sect2>
6217       <title>Syntax</title>
6218
6219       <para> Template Haskell has the following new syntactic
6220       constructions.  You need to use the flag
6221       <option>-XTemplateHaskell</option>
6222         <indexterm><primary><option>-XTemplateHaskell</option></primary>
6223       </indexterm>to switch these syntactic extensions on
6224       (<option>-XTemplateHaskell</option> is no longer implied by
6225       <option>-fglasgow-exts</option>).</para>
6226
6227         <itemizedlist>
6228               <listitem><para>
6229                   A splice is written <literal>$x</literal>, where <literal>x</literal> is an
6230                   identifier, or <literal>$(...)</literal>, where the "..." is an arbitrary expression.
6231                   There must be no space between the "$" and the identifier or parenthesis.  This use
6232                   of "$" overrides its meaning as an infix operator, just as "M.x" overrides the meaning
6233                   of "." as an infix operator.  If you want the infix operator, put spaces around it.
6234                   </para>
6235               <para> A splice can occur in place of
6236                   <itemizedlist>
6237                     <listitem><para> an expression; the spliced expression must
6238                     have type <literal>Q Exp</literal></para></listitem>
                    <listitem><para> a type; the spliced expression must
                    have type <literal>Q Type</literal></para></listitem>
6241                     <listitem><para> a list of top-level declarations; the spliced expression
6242                     must have type <literal>Q [Dec]</literal></para></listitem>
6243                     </itemizedlist>
6244             Note that pattern splices are not supported.
            Inside a splice you can only call functions defined in imported modules,
6246             not functions defined elsewhere in the same module.</para></listitem>
6247
6248               <listitem><para>
                  A quotation is written in Oxford brackets, thus:
6250                   <itemizedlist>
6251                     <listitem><para> <literal>[| ... |]</literal>, or <literal>[e| ... |]</literal>,
6252                              where the "..." is an expression;
6253                              the quotation has type <literal>Q Exp</literal>.</para></listitem>
6254                     <listitem><para> <literal>[d| ... |]</literal>, where the "..." is a list of top-level declarations;
6255                              the quotation has type <literal>Q [Dec]</literal>.</para></listitem>
6256                     <listitem><para> <literal>[t| ... |]</literal>, where the "..." is a type;
6257                              the quotation has type <literal>Q Type</literal>.</para></listitem>
6258                     <listitem><para> <literal>[p| ... |]</literal>, where the "..." is a pattern;
6259                              the quotation has type <literal>Q Pat</literal>.</para></listitem>
6260                   </itemizedlist></para></listitem>
6261
6262               <listitem><para>
                  A quasi-quotation can appear in an expression, pattern, type, or top-level
                  declaration context and is also written in Oxford brackets:
6265                   <itemizedlist>
6266                     <listitem><para> <literal>[<replaceable>varid</replaceable>| ... |]</literal>,
6267                         where the "..." is an arbitrary string; a full description of the
6268                         quasi-quotation facility is given in <xref linkend="th-quasiquotation"/>.</para></listitem>
6269                   </itemizedlist></para></listitem>
6270
6271               <listitem><para>
6272                   A name can be quoted with either one or two prefix single quotes:
6273                   <itemizedlist>
6274                     <listitem><para> <literal>'f</literal> has type <literal>Name</literal>, and names the function <literal>f</literal>.
6275                   Similarly <literal>'C</literal> has type <literal>Name</literal> and names the data constructor <literal>C</literal>.
6276                   In general <literal>'</literal><replaceable>thing</replaceable> interprets <replaceable>thing</replaceable> in an expression context.
6277                      </para></listitem>
6278                     <listitem><para> <literal>''T</literal> has type <literal>Name</literal>, and names the type constructor  <literal>T</literal>.
6279                   That is, <literal>''</literal><replaceable>thing</replaceable> interprets <replaceable>thing</replaceable> in a type context.
6280                      </para></listitem>
6281                   </itemizedlist>
                  These <literal>Name</literal>s can be used to construct Template Haskell expressions, patterns, declarations etc.  They
6283                   may also be given as an argument to the <literal>reify</literal> function.
6284                  </para>
6285                 </listitem>
6286
6287               <listitem><para> You may omit the <literal>$(...)</literal> in a top-level declaration splice.
6288               Simply writing an expression (rather than a declaration) implies a splice.  For example, you can write
6289 <programlisting>
6290 module Foo where
6291 import Bar
6292
6293 f x = x
6294
6295 $(deriveStuff 'f)   -- Uses the $(...) notation
6296
6297 g y = y+1
6298
6299 deriveStuff 'g      -- Omits the $(...)
6300
6301 h z = z-1
6302 </programlisting>
            This abbreviation makes top-level declaration splices quieter and less intimidating.
6304             </para></listitem>
6305
6306
6307         </itemizedlist>
<para>
(Compared to the original paper, there are many differences of detail.
The syntax for a declaration splice uses "<literal>$</literal>" not "<literal>splice</literal>".
The type of the enclosed expression must be <literal>Q [Dec]</literal>, not <literal>[Q Dec]</literal>.
Pattern splices are not implemented.)
</para>
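<para>
Quotations and name quotes, unlike splices, can be used in the module that defines them, which makes
them easy to experiment with. The sketch below assumes <option>-XTemplateHaskell</option> and the
<literal>Language.Haskell.TH</literal> functions <literal>nameBase</literal> and <literal>runQ</literal>;
<literal>runQ</literal> runs a <literal>Q</literal> computation in the <literal>IO</literal> monad so that
the syntax tree built by a quotation can be inspected. The names <literal>mapName</literal>,
<literal>maybeName</literal>, <literal>isInfixApp</literal>, and <literal>quotedSum</literal> are
illustrative only.
</para>
<programlisting>
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH

-- 'map names the function map; ''Maybe names the type constructor Maybe.
mapName, maybeName :: Name
mapName   = 'map
maybeName = ''Maybe

-- True for an infix application node in the Template Haskell syntax tree.
isInfixApp :: Exp -> Bool
isInfixApp (InfixE _ _ _) = True
isInfixApp _              = False

-- An expression quotation, run in IO so its result can be examined.
quotedSum :: IO Exp
quotedSum = runQ [| 1 + 2 |]
</programlisting>
<para>
Here <literal>nameBase mapName</literal> is <literal>"map"</literal>, <literal>nameBase maybeName</literal>
is <literal>"Maybe"</literal>, and the quotation <literal>[| 1 + 2 |]</literal> yields an
<literal>InfixE</literal> node; the exact printed form of the tree may vary between versions.
</para>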
6312
6313 </sect2>
6314
6315 <sect2>  <title> Using Template Haskell </title>
6316 <para>
6317 <itemizedlist>
6318     <listitem><para>
6319     The data types and monadic constructor functions for Template Haskell are in the library
    <literal>Language.Haskell.TH</literal>.
6321     </para></listitem>
6322
6323     <listitem><para>
6324     You can only run a function at compile time if it is imported from another module.  That is,
6325             you can't define a function in a module, and call it from within a splice in the same module.
6326             (It would make sense to do so, but it's hard to implement.)
6327    </para></listitem>
6328
6329    <listitem><para>
6330    You can only run a function at compile time if it is imported
6331    from another module <emphasis>that is not part of a mutually-recursive group of modules
6332    that includes the module currently being compiled</emphasis>.  Furthermore, all of the modules of
6333    the mutually-recursive group must be reachable by non-SOURCE imports from the module where the
6334    splice is to be run.</para>
6335    <para>
6336    For example, when compiling module A,
6337    you can only run Template Haskell functions imported from B if B does not import A (directly or indirectly).
6338    The reason should be clear: to run B we must compile and run A, but we are currently type-checking A.
6339    </para></listitem>
6340
6341     <listitem><para>
6342             The flag <literal>-ddump-splices</literal> shows the expansion of all top-level splices as they happen.
6343    </para></listitem>
6344     <listitem><para>
6345             If you are building GHC from source, you need at least a stage-2 bootstrap compiler to
6346               run Template Haskell.  A stage-1 compiler will reject the TH constructs.  Reason: TH
6347               compiles and runs a program, and then looks at the result.  So it's important that
6348               the program it compiles produces results whose representations are identical to
6349               those of the compiler itself.
6350    </para></listitem>
6351 </itemizedlist>
6352 </para>
6353 <para> Template Haskell works in any mode (<literal>--make</literal>, <literal>--interactive</literal>,
6354         or file-at-a-time).  There used to be a restriction to the former two, but that restriction
6355         has been lifted.
6356 </para>
6357 </sect2>
6358
6359 <sect2 id="th-example">  <title> A Template Haskell Worked Example </title>
6360 <para>To help you get over the confidence barrier, try out this skeletal worked example.
6361   First cut and paste the two modules below into "Main.hs" and "Printf.hs":</para>
6362
6363 <programlisting>
6364
6365 {- Main.hs -}
6366 module Main where
6367
6368 -- Import our template "pr"
6369 import Printf ( pr )
6370
6371 -- The splice operator $ takes the Haskell source code
6372 -- generated at compile time by "pr" and splices it into
6373 -- the argument of "putStrLn".
6374 main = putStrLn ( $(pr "Hello") )
6375
6376
6377 {- Printf.hs -}
6378 module Printf where
6379
6380 -- Skeletal printf from the paper.
6381 -- It needs to be in a separate module to the one where
6382 -- you intend to use it.
6383
6384 -- Import some Template Haskell syntax
6385 import Language.Haskell.TH
6386
6387 -- Describe a format string
6388 data Format = D | S | L String
6389
6390 -- Parse a format string.  This is left largely to you
6391 -- as we are here interested in building our first ever
6392 -- Template Haskell program and not in building printf.
6393 parse :: String -> [Format]
6394 parse s   = [ L s ]
6395
6396 -- Generate Haskell source code from a parsed representation
6397 -- of the format string.  This code will be spliced into
6398 -- the module which calls "pr", at compile time.
6399 gen :: [Format] -> Q Exp
6400 gen [D]   = [| \n -> show n |]
6401 gen [S]   = [| \s -> s |]
6402 gen [L s] = stringE s
6403
6404 -- Here we generate the Haskell code for the splice
6405 -- from an input format string.
6406 pr :: String -> Q Exp
6407 pr s = gen (parse s)
6408 </programlisting>
6409
<para>Now run the compiler (here we are at a Cygwin prompt on Windows):
6411 </para>
6412 <programlisting>
$ ghc --make -XTemplateHaskell Main.hs -o main.exe
6414 </programlisting>
6415
6416 <para>Run "main.exe" and here is your output:</para>
6417
6418 <programlisting>
6419 $ ./main
6420 Hello
6421 </programlisting>
6422
6423 </sect2>
6424
6425 <sect2>
6426 <title>Using Template Haskell with Profiling</title>
6427 <indexterm><primary>profiling</primary><secondary>with Template Haskell</secondary></indexterm>
6428
6429 <para>Template Haskell relies on GHC's built-in bytecode compiler and
6430 interpreter to run the splice expressions.  The bytecode interpreter
6431 runs the compiled expression on top of the same runtime on which GHC
6432 itself is running; this means that the compiled code referred to by
6433 the interpreted expression must be compatible with this runtime, and
6434 in particular this means that object code that is compiled for
6435 profiling <emphasis>cannot</emphasis> be loaded and used by a splice
6436 expression, because profiled object code is only compatible with the
6437 profiling version of the runtime.</para>
6438
6439 <para>This causes difficulties if you have a multi-module program
6440 containing Template Haskell code and you need to compile it for
6441 profiling, because GHC cannot load the profiled object code and use it
6442 when executing the splices.  Fortunately GHC provides a workaround.
6443 The basic idea is to compile the program twice:</para>
6444
6445 <orderedlist>
6446 <listitem>
6447   <para>Compile the program or library first the normal way, without
6448   <option>-prof</option><indexterm><primary><option>-prof</option></primary></indexterm>.</para>
6449 </listitem>
6450 <listitem>
6451   <para>Then compile it again with <option>-prof</option>, and
6452   additionally use <option>-osuf
6453   p_o</option><indexterm><primary><option>-osuf</option></primary></indexterm>
6454   to name the object files differently (you can choose any suffix
6455   that isn't the normal object suffix here).  GHC will automatically
6456   load the object files built in the first step when executing splice
6457   expressions.  If you omit the <option>-osuf</option> flag when
6458   building with <option>-prof</option> and Template Haskell is used,
6459   GHC will emit an error message. </para>
6460 </listitem>
6461 </orderedlist>
6462 </sect2>
6463
6464 <sect2 id="th-quasiquotation">  <title> Template Haskell Quasi-quotation </title>
6465 <para>Quasi-quotation allows patterns and expressions to be written using
6466 programmer-defined concrete syntax; the motivation behind the extension and
6467 several examples are documented in
6468 "<ulink url="http://www.eecs.harvard.edu/~mainland/ghc-quasiquoting/">Why It's
6469 Nice to be Quoted: Quasiquoting for Haskell</ulink>" (Proc Haskell Workshop
6470 2007). The example below shows how to write a quasiquoter for a simple
6471 expression language.</para>
6472 <para>
Here are the salient features:
6474 <itemizedlist>
6475 <listitem><para>
6476 A quasi-quote has the form
6477 <literal>[<replaceable>quoter</replaceable>| <replaceable>string</replaceable> |]</literal>.
6478 <itemizedlist>
6479 <listitem><para>
6480 The <replaceable>quoter</replaceable> must be the (unqualified) name of an imported
6481 quoter; it cannot be an arbitrary expression.
6482 </para></listitem>
6483 <listitem><para>
6484 The <replaceable>quoter</replaceable> cannot be "<literal>e</literal>",
6485 "<literal>t</literal>", "<literal>d</literal>", or "<literal>p</literal>", since
6486 those overlap with Template Haskell quotations.
6487 </para></listitem>
6488 <listitem><para>
6489 There must be no spaces in the token
6490 <literal>[<replaceable>quoter</replaceable>|</literal>.
6491 </para></listitem>
6492 <listitem><para>
6493 The quoted <replaceable>string</replaceable>
6494 can be arbitrary, and may contain newlines.
6495 </para></listitem>
6496 </itemizedlist>
6497 </para></listitem>
6498
6499 <listitem><para>
6500 A quasiquote may appear in place of
6501 <itemizedlist>
6502 <listitem><para>An expression</para></listitem>
6503 <listitem><para>A pattern</para></listitem>
6504 <listitem><para>A type</para></listitem>
6505 <listitem><para>A top-level declaration</para></listitem>
6506 </itemizedlist>
6507 (Only the first two are described in the paper.)
6508 </para></listitem>
6509
6510 <listitem><para>
6511 A quoter is a value of type <literal>Language.Haskell.TH.Quote.QuasiQuoter</literal>,
6512 which is defined thus:
6513 <programlisting>
6514 data QuasiQuoter = QuasiQuoter { quoteExp  :: String -> Q Exp,
6515                                  quotePat  :: String -> Q Pat,
6516                                  quoteType :: String -> Q Type,
6517                                  quoteDec  :: String -> Q [Dec] }
6518 </programlisting>
6519 That is, a quoter is a tuple of four parsers, one for each of the contexts
6520 in which a quasi-quote can occur.
6521 </para></listitem>
6522 <listitem><para>
6523 A quasi-quote is expanded by applying the appropriate parser to the string
6524 enclosed by the Oxford brackets.  The context of the quasi-quote (expression, pattern,
6525 type, declaration) determines which of the parsers is called.
6526 </para></listitem>
6527 </itemizedlist>
6528 </para>
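<para>
To make the shape of a quoter concrete, here is a sketch of a trivial, hypothetical quoter
<literal>str</literal> that turns the quoted text into a string literal in expressions and patterns,
and rejects the type and declaration contexts. It assumes the four-field <literal>QuasiQuoter</literal>
shown above and the <literal>Language.Haskell.TH</literal> functions <literal>stringE</literal>,
<literal>litP</literal>, <literal>stringL</literal>, and <literal>runQ</literal>. Because of the stage
restriction, <literal>str</literal> can only be used in quasi-quotes in <emphasis>other</emphasis>
modules, but its parsers can be called directly, as ordinary functions.
</para>
<programlisting>
import Language.Haskell.TH
import Language.Haskell.TH.Quote (QuasiQuoter (..))

-- A hypothetical quoter: the quoted text becomes a string literal.
-- It has no sensible meaning as a type or a declaration, so those
-- two parsers simply fail.
str :: QuasiQuoter
str = QuasiQuoter
  { quoteExp  = stringE
  , quotePat  = litP . stringL
  , quoteType = \_ -> fail "str: not usable as a type"
  , quoteDec  = \_ -> fail "str: not usable as a declaration"
  }
</programlisting>
<para>
For example, <literal>runQ (quoteExp str "hello")</literal> returns <literal>LitE (StringL "hello")</literal>.
In another module compiled with <option>-XQuasiQuotes</option>, the quasi-quote
<literal>[str|hello world|]</literal> would then stand for the expression <literal>"hello world"</literal>.
</para>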
6529 <para>
6530 The example below shows quasi-quotation in action.  The quoter <literal>expr</literal>
6531 is bound to a value of type <literal>QuasiQuoter</literal> defined in module <literal>Expr</literal>.
6532 The example makes use of an antiquoted
6533 variable <literal>n</literal>, indicated by the syntax <literal>'int:n</literal>
6534 (this syntax for anti-quotation was defined by the parser's
6535 author, <emphasis>not</emphasis> by GHC). This binds <literal>n</literal> to the
6536 integer value argument of the constructor <literal>IntExpr</literal> when
6537 pattern matching. Please see the referenced paper for further details regarding
6538 anti-quotation as well as the description of a technique that uses SYB to
6539 leverage a single parser of type <literal>String -> a</literal> to generate both
6540 an expression parser that returns a value of type <literal>Q Exp</literal> and a
6541 pattern parser that returns a value of type <literal>Q Pat</literal>.
6542 </para>
6543
6544 <para>
6545 Quasiquoters must obey the same stage restrictions as Template Haskell, e.g., in
6546 the example, <literal>expr</literal> cannot be defined
6547 in <literal>Main.hs</literal> where it is used, but must be imported.
6548 </para>
6549
6550 <programlisting>
6551 {- ------------- file Main.hs --------------- -}
6552 module Main where
6553
6554 import Expr
6555
6556 main :: IO ()
6557 main = do { print $ eval [expr|1 + 2|]
6558           ; case IntExpr 1 of
6559               { [expr|'int:n|] -> print n
6560               ;  _              -> return ()
6561               }
6562           }
6563
6564
{- ------------- file Expr.hs --------------- -}
{-# LANGUAGE DeriveDataTypeable #-}
module Expr where

import Data.Data (Data, Typeable)
import qualified Language.Haskell.TH as TH
import Language.Haskell.TH.Quote
6570
6571 data Expr  =  IntExpr Integer
6572            |  AntiIntExpr String
6573            |  BinopExpr BinOp Expr Expr
6574            |  AntiExpr String
6575     deriving(Show, Typeable, Data)
6576
6577 data BinOp  =  AddOp
6578             |  SubOp
6579             |  MulOp
6580             |  DivOp
6581     deriving(Show, Typeable, Data)
6582
6583 eval :: Expr -> Integer
6584 eval (IntExpr n)        = n
6585 eval (BinopExpr op x y) = (opToFun op) (eval x) (eval y)
6586   where
6587     opToFun AddOp = (+)
6588     opToFun SubOp = (-)
6589     opToFun MulOp = (*)
6590     opToFun DivOp = div
6591
-- Only the expression and pattern parsers are supplied, so expr
-- cannot be used in a type or declaration context.
expr :: QuasiQuoter
expr = QuasiQuoter { quoteExp = parseExprExp, quotePat =  parseExprPat }

-- Parse an Expr, returning its representation as
-- either a Q Exp or a Q Pat. See the referenced paper
-- for how to use SYB to do this by writing a single
-- parser of type String -> Expr instead of two
-- separate parsers.

parseExprExp :: String -> TH.Q TH.Exp
parseExprExp ...

parseExprPat :: String -> TH.Q TH.Pat
parseExprPat ...
6605 </programlisting>
6606
6607 <para>Now run the compiler:
6608 <programlisting>
6609 $ ghc --make -XQuasiQuotes Main.hs -o main
6610 </programlisting>
6611 </para>
6612
6613 <para>Run "main" and here is your output:
6614 <programlisting>
6615 $ ./main
6616 3
6617 1
6618 </programlisting>
6619 </para>
6620 </sect2>
6621
6622 </sect1>
6623
6624 <!-- ===================== Arrow notation ===================  -->
6625
6626 <sect1 id="arrow-notation">
6627 <title>Arrow notation
6628 </title>
6629
6630 <para>Arrows are a generalization of monads introduced by John Hughes.
6631 For more details, see
6632 <itemizedlist>
6633
6634 <listitem>
6635 <para>
6636 &ldquo;Generalising Monads to Arrows&rdquo;,
6637 John Hughes, in <citetitle>Science of Computer Programming</citetitle> 37,
6638 pp67&ndash;111, May 2000.
6639 The paper that introduced arrows: a friendly introduction, motivated with
6640 programming examples.
6641 </para>
6642 </listitem>
6643
6644 <listitem>
6645 <para>
6646 &ldquo;<ulink url="http://www.soi.city.ac.uk/~ross/papers/notation.html">A New Notation for Arrows</ulink>&rdquo;,
6647 Ross Paterson, in <citetitle>ICFP</citetitle>, Sep 2001.
6648 Introduced the notation described here.
6649 </para>
6650 </listitem>
6651
6652 <listitem>
6653 <para>
6654 &ldquo;<ulink url="http://www.soi.city.ac.uk/~ross/papers/fop.html">Arrows and Computation</ulink>&rdquo;,
6655 Ross Paterson, in <citetitle>The Fun of Programming</citetitle>,
6656 Palgrave, 2003.
6657 </para>
6658 </listitem>
6659
6660 <listitem>
6661 <para>
6662 &ldquo;<ulink url="http://www.cs.chalmers.se/~rjmh/afp-arrows.pdf">Programming with Arrows</ulink>&rdquo;,
6663 John Hughes, in <citetitle>5th International Summer School on
6664 Advanced Functional Programming</citetitle>,
6665 <citetitle>Lecture Notes in Computer Science</citetitle> vol. 3622,
6666 Springer, 2004.
6667 This paper includes another introduction to the notation,
6668 with practical examples.
6669 </para>
6670 </listitem>
6671
6672 <listitem>
6673 <para>
6674 &ldquo;<ulink url="http://www.haskell.org/ghc/docs/papers/arrow-rules.pdf">Type and Translation Rules for Arrow Notation in GHC</ulink>&rdquo;,
6675 Ross Paterson and Simon Peyton Jones, September 16, 2004.
6676 A terse enumeration of the formal rules used
6677 (extracted from comments in the source code).
6678 </para>
6679 </listitem>
6680
6681 <listitem>
6682 <para>
6683 The arrows web page at
6684 <ulink url="http://www.haskell.org/arrows/"><literal>http://www.haskell.org/arrows/</literal></ulink>.
6685 </para>
6686 </listitem>
6687
6688 </itemizedlist>
6689 With the <option>-XArrows</option> flag, GHC supports the arrow
6690 notation described in the second of these papers,
6691 translating it using combinators from the
6692 <ulink url="&libraryBaseLocation;/Control-Arrow.html"><literal>Control.Arrow</literal></ulink>
6693 module.
6694 What follows is a brief introduction to the notation;
6695 it won't make much sense unless you've read Hughes's paper.
6696 </para>
6697
6698 <para>The extension adds a new kind of expression for defining arrows:
6699 <screen>
6700 <replaceable>exp</replaceable><superscript>10</superscript> ::= ...
6701        |  proc <replaceable>apat</replaceable> -> <replaceable>cmd</replaceable>
6702 </screen>
6703 where <literal>proc</literal> is a new keyword.
6704 The variables of the pattern are bound in the body of the
6705 <literal>proc</literal>-expression,
6706 which is a new sort of thing called a <firstterm>command</firstterm>.
6707 The syntax of commands is as follows:
6708 <screen>
6709 <replaceable>cmd</replaceable>   ::= <replaceable>exp</replaceable><superscript>10</superscript> -&lt;  <replaceable>exp</replaceable>
6710        |  <replaceable>exp</replaceable><superscript>10</superscript> -&lt;&lt; <replaceable>exp</replaceable>
6711        |  <replaceable>cmd</replaceable><superscript>0</superscript>
6712 </screen>
6713 with <replaceable>cmd</replaceable><superscript>0</superscript> up to
6714 <replaceable>cmd</replaceable><superscript>9</superscript> defined using
6715 infix operators as for expressions, and
6716 <screen>
6717 <replaceable>cmd</replaceable><superscript>10</superscript> ::= \ <replaceable>apat</replaceable> ... <replaceable>apat</replaceable> -> <replaceable>cmd</replaceable>
6718        |  let <replaceable>decls</replaceable> in <replaceable>cmd</replaceable>
6719        |  if <replaceable>exp</replaceable> then <replaceable>cmd</replaceable> else <replaceable>cmd</replaceable>
6720        |  case <replaceable>exp</replaceable> of { <replaceable>calts</replaceable> }
6721        |  do { <replaceable>cstmt</replaceable> ; ... <replaceable>cstmt</replaceable> ; <replaceable>cmd</replaceable> }
6722        |  <replaceable>fcmd</replaceable>
6723
6724 <replaceable>fcmd</replaceable>  ::= <replaceable>fcmd</replaceable> <replaceable>aexp</replaceable>
6725        |  ( <replaceable>cmd</replaceable> )
6726        |  (| <replaceable>aexp</replaceable> <replaceable>cmd</replaceable> ... <replaceable>cmd</replaceable> |)
6727
6728 <replaceable>cstmt</replaceable> ::= let <replaceable>decls</replaceable>
6729        |  <replaceable>pat</replaceable> &lt;- <replaceable>cmd</replaceable>
6730        |  rec { <replaceable>cstmt</replaceable> ; ... <replaceable>cstmt</replaceable> [;] }
6731        |  <replaceable>cmd</replaceable>
6732 </screen>
6733 where <replaceable>calts</replaceable> are like <replaceable>alts</replaceable>
6734 except that the bodies are commands instead of expressions.
6735 </para>
6736
6737 <para>
6738 Commands produce values, but (like monadic computations)
6739 may yield more than one value,
6740 or none, and may do other things as well.
6741 For the most part, familiarity with monadic notation is a good guide to
6742 using commands.
6743 However the values of expressions, even monadic ones,
6744 are determined by the values of the variables they contain;
6745 this is not necessarily the case for commands.
6746 </para>
6747
6748 <para>
6749 A simple example of the new notation is the expression
6750 <screen>
6751 proc x -> f -&lt; x+1
6752 </screen>
6753 We call this a <firstterm>procedure</firstterm> or
6754 <firstterm>arrow abstraction</firstterm>.
6755 As with a lambda expression, the variable <literal>x</literal>
6756 is a new variable bound within the <literal>proc</literal>-expression.
6757 It refers to the input to the arrow.
In the above example, <literal>-&lt;</literal> is not an identifier but a
new reserved symbol used for building commands from an expression of arrow
type and an expression to be fed as input to that arrow.
(The weird look will make more sense later.)
It may be read as the analogue of application for arrows.
6763 The above example is equivalent to the Haskell expression
6764 <screen>
6765 arr (\ x -> x+1) >>> f
6766 </screen>
That translation would make no sense if the expression to the left of
<literal>-&lt;</literal> involved the bound variable <literal>x</literal>.
6769 More generally, the expression to the left of <literal>-&lt;</literal>
6770 may not involve any <firstterm>local variable</firstterm>,
6771 i.e. a variable bound in the current arrow abstraction.
6772 For such a situation there is a variant <literal>-&lt;&lt;</literal>, as in
6773 <screen>
6774 proc x -> f x -&lt;&lt; x+1
6775 </screen>
6776 which is equivalent to
6777 <screen>
6778 arr (\ x -> (f x, x+1)) >>> app
6779 </screen>
6780 so in this case the arrow must belong to the <literal>ArrowApply</literal>
6781 class.
6782 Such an arrow is equivalent to a monad, so if you're using this form
6783 you may find a monadic formulation more convenient.
6784 </para>
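<para>
Because any function is an arrow, the two translations above can be tried out
with the function arrow <literal>(->)</literal>, which is also an instance of
<literal>ArrowApply</literal>.
The following is a minimal sketch; <literal>f</literal> and <literal>scale</literal>
are hypothetical arrows chosen only for illustration, and the
<literal>Arrows</literal> extension is switched on with a <literal>LANGUAGE</literal> pragma.
<programlisting>
{-# LANGUAGE Arrows #-}
import Control.Arrow

-- Hypothetical first-order arrow: any function is an arrow
f :: Int -> Int
f = (* 2)

-- First-order application and its translation
viaProc, viaArr :: Int -> Int
viaProc = proc x -> f -&lt; x+1
viaArr  = arr (\ x -> x+1) >>> f

-- Hypothetical arrow-valued function: the arrow fed by -&lt;&lt;
-- depends on the local variable x, so ArrowApply is needed
scale :: Int -> (Int -> Int)
scale k = (* k)

viaProcApp, viaArrApp :: Int -> Int
viaProcApp = proc x -> scale x -&lt;&lt; x+1
viaArrApp  = arr (\ x -> (scale x, x+1)) >>> app
</programlisting>
Each notation form and its translation compute the same function:
<literal>viaProc 3</literal> and <literal>viaArr 3</literal> both give 8,
while <literal>viaProcApp 3</literal> and <literal>viaArrApp 3</literal> both give 12.
</para>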
6785
6786 <sect2>
6787 <title>do-notation for commands</title>
6788
6789 <para>
6790 Another form of command is a form of <literal>do</literal>-notation.
6791 For example, you can write
6792 <screen>
6793 proc x -> do
6794         y &lt;- f -&lt; x+1
6795         g -&lt; 2*y
6796         let z = x+y
6797         t &lt;- h -&lt; x*z
6798         returnA -&lt; t+z
6799 </screen>
6800 You can read this much like ordinary <literal>do</literal>-notation,
6801 but with commands in place of monadic expressions.
6802 The first line sends the value of <literal>x+1</literal> as an input to
6803 the arrow <literal>f</literal>, and matches its output against
6804 <literal>y</literal>.
In the next line, the output of <literal>g</literal> is discarded.
6806 The arrow <function>returnA</function> is defined in the
6807 <ulink url="&libraryBaseLocation;/Control-Arrow.html"><literal>Control.Arrow</literal></ulink>
6808 module as <literal>arr id</literal>.
6809 The above example is treated as an abbreviation for
6810 <screen>
6811 arr (\ x -> (x, x)) >>>
6812         first (arr (\ x -> x+1) >>> f) >>>
6813         arr (\ (y, x) -> (y, (x, y))) >>>
6814         first (arr (\ y -> 2*y) >>> g) >>>
6815         arr snd >>>
6816         arr (\ (x, y) -> let z = x+y in ((x, z), z)) >>>
6817         first (arr (\ (x, z) -> x*z) >>> h) >>>
6818         arr (\ (t, z) -> t+z) >>>
6819         returnA
6820 </screen>
6821 Note that variables not used later in the composition are projected out.
6822 After simplification using rewrite rules (see <xref linkend="rewrite-rules"/>)
6823 defined in the
6824 <ulink url="&libraryBaseLocation;/Control-Arrow.html"><literal>Control.Arrow</literal></ulink>
6825 module, this reduces to
6826 <screen>
6827 arr (\ x -> (x+1, x)) >>>
6828         first f >>>
6829         arr (\ (y, x) -> (2*y, (x, y))) >>>
6830         first g >>>
6831         arr (\ (_, (x, y)) -> let z = x+y in (x*z, z)) >>>
6832         first h >>>
6833         arr (\ (t, z) -> t+z)
6834 </screen>
6835 which is what you might have written by hand.
6836 With arrow notation, GHC keeps track of all those tuples of variables for you.
6837 </para>
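<para>
The equivalence of the notation and the simplified composition can be checked
with the function arrow.
This is a minimal sketch in which <literal>f</literal>, <literal>g</literal> and
<literal>h</literal> are hypothetical stand-ins for the arrows in the text;
with a pure function arrow, the output of <literal>g</literal> is simply thrown away.
<programlisting>
{-# LANGUAGE Arrows #-}
import Control.Arrow

-- Hypothetical stand-ins for the arrows f, g and h
f, g, h :: Int -> Int
f = (* 3)
g = negate
h = subtract 1

-- The do-notation example from the text
example :: Int -> Int
example = proc x -> do
        y &lt;- f -&lt; x+1
        g -&lt; 2*y
        let z = x+y
        t &lt;- h -&lt; x*z
        returnA -&lt; t+z

-- The simplified translation from the text
simplified :: Int -> Int
simplified =
        arr (\ x -> (x+1, x)) >>>
        first f >>>
        arr (\ (y, x) -> (2*y, (x, y))) >>>
        first g >>>
        arr (\ (_, (x, y)) -> let z = x+y in (x*z, z)) >>>
        first h >>>
        arr (\ (t, z) -> t+z)
</programlisting>
With these definitions, both <literal>example 2</literal> and
<literal>simplified 2</literal> evaluate to 32.
</para>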
6838
6839 <para>
6840 Note that although the above translation suggests that
6841 <literal>let</literal>-bound variables like <literal>z</literal> must be
6842 monomorphic, the actual translation produces Core,
6843 so polymorphic variables are allowed.
6844 </para>
6845
6846 <para>
6847 It's also possible to have mutually recursive bindings,
6848 using the new <literal>rec</literal> keyword, as in the following example:
6849 <programlisting>
6850 counter :: ArrowCircuit a => a Bool Int
6851 counter = proc reset -> do
6852         rec     output &lt;- returnA -&lt; if reset then 0 else next
6853                 next &lt;- delay 0 -&lt; output+1
6854         returnA -&lt; output
6855 </programlisting>
6856 The translation of such forms uses the <function>loop</function> combinator,
6857 so the arrow concerned must belong to the <literal>ArrowLoop</literal> class.
6858 </para>
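<para>
The <literal>ArrowCircuit</literal> class and <function>delay</function> used above
are not part of <literal>Control.Arrow</literal>, so the counter cannot be run
with the standard library alone.
As a minimal sketch of <literal>rec</literal> that needs only
<literal>Control.Arrow</literal>, note that the function arrow is an instance of
<literal>ArrowLoop</literal>, and its <function>loop</function> ties the knot
lazily, so a local variable may be defined in terms of itself.
The name <literal>powersOfTwo</literal> is hypothetical.
<programlisting>
{-# LANGUAGE Arrows #-}
import Control.Arrow

-- xs is defined recursively inside the rec block; the translation
-- goes through loop, which for (->) is lazy enough for this to work
powersOfTwo :: Int -> [Integer]
powersOfTwo = proc n -> do
        rec xs &lt;- returnA -&lt; 1 : map (* 2) xs
        returnA -&lt; take n xs
</programlisting>
For example, <literal>powersOfTwo 5</literal> is <literal>[1,2,4,8,16]</literal>.
</para>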
6859
6860 </sect2>
6861
6862 <sect2>
6863 <title>Conditional commands</title>
6864
6865 <para>
6866 In the previous example, we used a conditional expression to construct the
6867 input for an arrow.
6868 Sometimes we want to conditionally execute different commands, as in
6869 <screen>
6870 proc (x,y) ->
6871         if f x y
6872         then g -&lt; x+1
6873         else h -&lt; y+2
6874 </screen>
6875 which is translated to
6876 <screen>
6877 arr (\ (x,y) -> if f x y then Left x else Right y) >>>
        (arr (\x -> x+1) >>> g) ||| (arr (\y -> y+2) >>> h)
6879 </screen>
6880 Since the translation uses <function>|||</function>,
6881 the arrow concerned must belong to the <literal>ArrowChoice</literal> class.
6882 </para>
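<para>
The function arrow is an instance of <literal>ArrowChoice</literal>, so the
conditional command and its translation can be compared directly.
In this minimal sketch the predicate <literal>f</literal> and the arrows
<literal>g</literal> and <literal>h</literal> are hypothetical.
<programlisting>
{-# LANGUAGE Arrows #-}
import Control.Arrow

-- Hypothetical predicate and arrows standing in for f, g and h
f :: Int -> Int -> Bool
f x y = x &lt; y

g, h :: Int -> Int
g = (* 10)
h = negate

viaIf, viaChoice :: (Int, Int) -> Int
viaIf = proc (x,y) ->
        if f x y
        then g -&lt; x+1
        else h -&lt; y+2
viaChoice = arr (\ (x,y) -> if f x y then Left x else Right y) >>>
        (arr (\x -> x+1) >>> g) ||| (arr (\y -> y+2) >>> h)
</programlisting>
Both <literal>viaIf (1,5)</literal> and <literal>viaChoice (1,5)</literal> give 20,
and both give -3 for the input <literal>(5,1)</literal>.
</para>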
6883
6884 <para>
6885 There are also <literal>case</literal> commands, like
6886 <screen>
6887 case input of
6888     [] -> f -&lt; ()
6889     [x] -> g -&lt; x+1
6890     x1:x2:xs -> do
6891         y &lt;- h -&lt; (x1, x2)
6892         ys &lt;- k -&lt; xs
6893         returnA -&lt; y:ys
6894 </screen>
6895 The syntax is the same as for <literal>case</literal> expressions,
6896 except that the bodies of the alternatives are commands rather than expressions.
6897 The translation is similar to that of <literal>if</literal> commands.
6898 </para>
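<para>
A minimal runnable <literal>case</literal> command, again using the function
arrow; the name <literal>classify</literal> is hypothetical, and each
alternative is a command built from <function>returnA</function>.
<programlisting>
{-# LANGUAGE Arrows #-}
import Control.Arrow

-- Each alternative is a command; the scrutinee and patterns are
-- ordinary case-expression syntax
classify :: [Int] -> String
classify = proc input -> case input of
    []      -> returnA -&lt; "empty"
    [x]     -> returnA -&lt; "one: " ++ show (x+1)
    x1:x2:_ -> do
        s &lt;- returnA -&lt; x1 + x2
        returnA -&lt; "first two sum to " ++ show s
</programlisting>
For instance, <literal>classify [4]</literal> is <literal>"one: 5"</literal> and
<literal>classify [1,2,3]</literal> is <literal>"first two sum to 3"</literal>.
</para>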
6899
6900 </sect2>
6901
6902 <sect2>
6903 <title>Defining your own control structures</title>
6904
6905 <para>
As we've seen, arrow notation provides constructs,
6907 modelled on those for expressions,
6908 for sequencing, value recursion and conditionals.
6909 But suitable combinators,
6910 which you can define in ordinary Haskell,
6911 may also be used to build new commands out of existing ones.
6912 The basic idea is that a command defines an arrow from environments to values.
6913 These environments assign values to the free local variables of the command.
6914 Thus combinators that produce arrows from arrows
6915 may also be used to build commands from commands.
For example, the <literal>ArrowPlus</literal> class includes a combinator
<programlisting>
(&lt;+>) :: ArrowPlus a => a e c -> a e c -> a e c
6919 </programlisting>
6920 so we can use it to build commands:
6921 <programlisting>
6922 expr' = proc x -> do
6923                 returnA -&lt; x
6924         &lt;+> do
6925                 symbol Plus -&lt; ()
6926                 y &lt;- term -&lt; ()
6927                 expr' -&lt; x + y
6928         &lt;+> do
6929                 symbol Minus -&lt; ()
6930                 y &lt;- term -&lt; ()
6931                 expr' -&lt; x - y
6932 </programlisting>
6933 (The <literal>do</literal> on the first line is needed to prevent the first
6934 <literal>&lt;+> ...</literal> from being interpreted as part of the
6935 expression on the previous line.)
6936 This is equivalent to
6937 <programlisting>
6938 expr' = (proc x -> returnA -&lt; x)
6939         &lt;+> (proc x -> do
6940                 symbol Plus -&lt; ()
6941                 y &lt;- term -&lt; ()
6942                 expr' -&lt; x + y)
6943         &lt;+> (proc x -> do
6944                 symbol Minus -&lt; ()
6945                 y &lt;- term -&lt; ()
6946                 expr' -&lt; x - y)
6947 </programlisting>
6948 It is essential that this operator be polymorphic in <literal>e</literal>
6949 (representing the environment input to the command
6950 and thence to its subcommands)
6951 and satisfy the corresponding naturality property
6952 <screen>
6953 arr k >>> (f &lt;+> g) = (arr k >>> f) &lt;+> (arr k >>> g)
6954 </screen>
6955 at least for strict <literal>k</literal>.
6956 (This should be automatic if you're not using <function>seq</function>.)
6957 This ensures that environments seen by the subcommands are environments
6958 of the whole command,
6959 and also allows the translation to safely trim these environments.
6960 The operator must also not use any variable defined within the current
6961 arrow abstraction.
6962 </para>
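<para>
A small way to try <literal>&lt;+></literal> on commands is the Kleisli arrow
over lists, which is an instance of <literal>ArrowPlus</literal> because lists
form a <literal>MonadPlus</literal>; its <literal>&lt;+></literal> collects the
results of both subcommands.
This is a minimal sketch, and the name <literal>choices</literal> is hypothetical.
<programlisting>
{-# LANGUAGE Arrows #-}
import Control.Arrow

-- Both subcommands see the same environment (here just x);
-- for Kleisli [], (&lt;+>) concatenates their lists of results
choices :: Kleisli [] Int Int
choices = proc x -> (returnA -&lt; x) &lt;+> (returnA -&lt; x * 10)
</programlisting>
Running <literal>runKleisli choices 3</literal> yields <literal>[3,30]</literal>.
</para>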
6963
6964 <para>
6965 We could define our own operator
6966 <programlisting>
6967 untilA :: ArrowChoice a => a e () -> a e Bool -> a e ()
untilA body cond = proc x -> do
6969         b &lt;- cond -&lt; x
6970         if b then returnA -&lt; ()
6971         else do
6972                 body -&lt; x
6973                 untilA body cond -&lt; x
6974 </programlisting>
6975 and use it in the same way.
6976 Of course this infix syntax only makes sense for binary operators;
6977 there is also a more general syntax involving special brackets:
6978 <screen>
6979 proc x -> do
6980         y &lt;- f -&lt; x+1
6981         (|untilA (increment -&lt; x+y) (within 0.5 -&lt; x)|)
6982 </screen>
6983 </para>
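<para>
Because <function>untilA</function> only terminates when <literal>cond</literal>
eventually changes its answer, running it needs an arrow with effects.
The following minimal sketch assumes the Kleisli arrow over <literal>IO</literal>
(an instance of <literal>ArrowChoice</literal>) and an <literal>IORef</literal>
counter that <literal>body</literal> changes and <literal>cond</literal> inspects;
<literal>countUpTo</literal>, <literal>incr</literal> and <literal>done</literal>
are hypothetical names.
The definition of <function>untilA</function> is repeated so that the block stands alone,
and the operator is invoked with the bracket syntax.
<programlisting>
{-# LANGUAGE Arrows #-}
import Control.Arrow
import Data.IORef

untilA :: ArrowChoice a => a e () -> a e Bool -> a e ()
untilA body cond = proc x -> do
        b &lt;- cond -&lt; x
        if b
          then returnA -&lt; ()
          else do
                body -&lt; x
                untilA body cond -&lt; x

-- Hypothetical use: increment a counter until it reaches the limit
countUpTo :: IORef Int -> Kleisli IO Int ()
countUpTo ref = proc limit -> (|untilA (incr -&lt; ()) (done -&lt; limit)|)
  where
    incr = Kleisli (\ () -> modifyIORef ref (+1))
    done = Kleisli (\ lim -> (>= lim) &lt;$> readIORef ref)
</programlisting>
Starting from a counter holding 0, <literal>runKleisli (countUpTo ref) 5</literal>
leaves 5 in the counter; the condition is tested before the body, so a counter
that already exceeds the limit is left unchanged.
</para>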
6984
6985 </sect2>
6986
6987 <sect2>
6988 <title>Primitive constructs</title>
6989
6990 <para>
6991 Some operators will need to pass additional inputs to their subcommands.
6992 For example, in an arrow type supporting exceptions,
6993 the operator that attaches an exception handler will wish to pass the
6994 exception that occurred to the handler.
6995 Such an operator might have a type
6996 <screen>
6997 handleA :: ... => a e c -> a (e,Ex) c -> a e c
6998 </screen>
6999 where <literal>Ex</literal> is the type of exceptions handled.
7000 You could then use this with arrow notation by writing a command
7001 <screen>
7002 body `handleA` \ ex -> handler
7003 </screen>
7004 so that if an exception is raised in the command <literal>body</literal>,
7005 the variable <literal>ex</literal> is bound to the value of the exception
7006 and the command <literal>handler</literal>,
7007 which typically refers to <literal>ex</literal>, is entered.
7008 Though the syntax here looks like a functional lambda,
7009 we are talking about commands, and something different is going on.
7010 The input to the arrow represented by a command consists of values for
7011 the free local variables in the command, plus a stack of anonymous values.
7012 In all the prior examples, this stack was empty.
7013 In the second argument to <function>handleA</function>,
7014 this stack consists of one value, the value of the exception.
7015 The command form of lambda merely gives this value a name.
7016 </para>
7017
7018 <para>
7019 More concretely,
7020 the values on the stack are paired to the right of the environment.
7021 So operators like <function>handleA</function> that pass
7022 extra inputs to their subcommands can be designed for use with the notation
7023 by pairing the values with the environment in this way.
7024 More precisely, the type of each argument of the operator (and its result)
7025 should have the form
7026 <screen>
7027 a (...(e,t1), ... tn) t
7028 </screen>
7029 where <replaceable>e</replaceable> is a polymorphic variable
7030 (representing the environment)
7031 and <replaceable>ti</replaceable> are the types of the values on the stack,
7032 with <replaceable>t1</replaceable> being the <quote>top</quote>.
7033 The polymorphic variable <replaceable>e</replaceable> must not occur in
7034 <replaceable>a</replaceable>, <replaceable>ti</replaceable> or
7035 <replaceable>t</replaceable>.
7036 However the arrows involved need not be the same.
7037 Here are some more examples of suitable operators:
7038 <screen>
7039 bracketA :: ... => a e b -> a (e,b) c -> a (e,c) d -> a e d
7040 runReader :: ... => a e c -> a' (e,State) c
7041 runState :: ... => a e c -> a' (e,State) (c,State)
7042 </screen>
7043 We can supply the extra input required by commands built with the last two
7044 by applying them to ordinary expressions, as in
7045 <screen>
7046 proc x -> do
7047         s &lt;- ...
7048         (|runReader (do { ... })|) s
7049 </screen>
7050 which adds <literal>s</literal> to the stack of inputs to the command
7051 built using <function>runReader</function>.
7052 </para>
7053
7054 <para>
7055 The command versions of lambda abstraction and application are analogous to
7056 the expression versions.
7057 In particular, the beta and eta rules describe equivalences of commands.
7058 These three features (operators, lambda abstraction and application)
7059 are the core of the notation; everything else can be built using them,
7060 though the results would be somewhat clumsy.
7061 For example, we could simulate <literal>do</literal>-notation by defining
7062 <programlisting>
7063 bind :: Arrow a => a e b -> a (e,b) c -> a e c
7064 u `bind` f = returnA &amp;&amp;&amp; u >>> f
7065
7066 bind_ :: Arrow a => a e b -> a e c -> a e c
7067 u `bind_` f = u `bind` (arr fst >>> f)
7068 </programlisting>
7069 We could simulate <literal>if</literal> by defining
7070 <programlisting>
7071 cond :: ArrowChoice a => a e b -> a e b -> a (e,Bool) b
7072 cond f g = arr (\ (e,b) -> if b then Left e else Right e) >>> f ||| g
7073 </programlisting>
7074 </para>
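<para>
These combinators are ordinary Haskell, so they can be exercised without arrow
notation at all.
A minimal sketch using the function arrow follows; <literal>timesSucc</literal>,
<literal>twice</literal> and <literal>absA</literal> are hypothetical examples,
and the comments give the commands they imitate.
<programlisting>
import Control.Arrow

bind :: Arrow a => a e b -> a (e,b) c -> a e c
u `bind` f = returnA &amp;&amp;&amp; u >>> f

bind_ :: Arrow a => a e b -> a e c -> a e c
u `bind_` f = u `bind` (arr fst >>> f)

cond :: ArrowChoice a => a e b -> a e b -> a (e,Bool) b
cond f g = arr (\ (e,b) -> if b then Left e else Right e) >>> f ||| g

-- Like: proc x -> do { y &lt;- arr (+1) -&lt; x; returnA -&lt; x*y }
timesSucc :: Int -> Int
timesSucc = arr (+1) `bind` arr (\ (x, y) -> x * y)

-- The output of the first arrow is ignored, as in a statement without a binding
twice :: Int -> Int
twice = arr (const "ignored") `bind_` arr (* 2)

-- Like: proc x -> if x &lt; 0 then arr negate -&lt; x else returnA -&lt; x
absA :: Int -> Int
absA = (returnA &amp;&amp;&amp; arr (&lt; 0)) >>> cond (arr negate) returnA
</programlisting>
Here <literal>timesSucc 3</literal> is 12, <literal>twice 4</literal> is 8,
and <literal>absA (-5)</literal> is 5.
</para>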
7075
7076 </sect2>
7077
7078 <sect2>
7079 <title>Differences with the paper</title>
7080
7081 <itemizedlist>
7082
7083 <listitem>
7084 <para>Instead of a single form of arrow application (arrow tail) with two
7085 translations, the implementation provides two forms
7086 <quote><literal>-&lt;</literal></quote> (first-order)
7087 and <quote><literal>-&lt;&lt;</literal></quote> (higher-order).
7088 </para>
7089 </listitem>
7090
7091 <listitem>
7092 <para>User-defined operators are flagged with banana brackets instead of
7093 a new <literal>form</literal> keyword.
7094 </para>
7095 </listitem>
7096
7097 </itemizedlist>
7098
7099 </sect2>
7100
7101 <sect2>
7102 <title>Portability</title>
7103
7104 <para>
7105 Although only GHC implements arrow notation directly,
7106 there is also a preprocessor
7107 (available from the
7108 <ulink url="http://www.haskell.org/arrows/">arrows web page</ulink>)
7109 that translates arrow notation into Haskell 98
7110 for use with other Haskell systems.
7111 You would still want to check arrow programs with GHC;
7112 tracing type errors in the preprocessor output is not easy.
7113 Modules intended for both GHC and the preprocessor must observe some
7114 additional restrictions:
7115 <itemizedlist>
7116
7117 <listitem>
7118 <para>
7119 The module must import
7120 <ulink url="&libraryBaseLocation;/Control-Arrow.html"><literal>Control.Arrow</literal></ulink>.
7121 </para>
7122 </listitem>
7123
7124 <listitem>
7125 <para>
7126 The preprocessor cannot cope with other Haskell extensions.
7127 These would have to go in separate modules.
7128 </para>
7129 </listitem>
7130
7131 <listitem>
7132 <para>
7133 Because the preprocessor targets Haskell (rather than Core),
7134 <literal>let</literal>-bound variables are monomorphic.
7135 </para>
7136 </listitem>
7137
7138 </itemizedlist>
7139 </para>
7140
7141 </sect2>
7142
7143 </sect1>
7144
7145 <!-- ==================== BANG PATTERNS =================  -->
7146
7147 <sect1 id="bang-patterns">
7148 <title>Bang patterns
7149 <indexterm><primary>Bang patterns</primary></indexterm>
7150 </title>
7151 <para>GHC supports an extension of pattern matching called <emphasis>bang
7152 patterns</emphasis>, written <literal>!<replaceable>pat</replaceable></literal>.
7153 Bang patterns are under consideration for Haskell Prime.
7154 The <ulink
7155 url="http://hackage.haskell.org/trac/haskell-prime/wiki/BangPatterns">Haskell
7156 prime feature description</ulink> contains more discussion and examples
7157 than the material below.
7158 </para>
7159 <para>
7160 The key change is the addition of a new rule to the
7161 <ulink url="http://haskell.org/onlinereport/exps.html#sect3.17.2">semantics of pattern matching in the Haskell 98 report</ulink>.
7162 Add new bullet 10, saying: Matching the pattern <literal>!</literal><replaceable>pat</replaceable>
7163 against a value <replaceable>v</replaceable> behaves as follows:
7164 <itemizedlist>
7165 <listitem><para>if <replaceable>v</replaceable> is bottom, the match diverges</para></listitem>
7166 <listitem><para>otherwise, <replaceable>pat</replaceable> is matched against <replaceable>v</replaceable>  </para></listitem>
7167 </itemizedlist>
7168 </para>
7169 <para>
7170 Bang patterns are enabled by the flag <option>-XBangPatterns</option>.
7171 </para>
7172
7173 <sect2 id="bang-patterns-informal">
7174 <title>Informal description of bang patterns
7175 </title>
7176 <para>
7177 The main idea is to add a single new production to the syntax of patterns:
7178 <programlisting>
7179   pat ::= !pat
7180 </programlisting>
7181 Matching an expression <literal>e</literal> against a pattern <literal>!p</literal> is done by first
7182 evaluating <literal>e</literal> (to WHNF) and then matching the result against <literal>p</literal>.
7183 Example:
7184 <programlisting>
7185 f1 !x = True
7186 </programlisting>
7187 This definition makes <literal>f1</literal> is strict in <literal>x</literal>,
7188 whereas without the bang it would be lazy.
7189 Bang patterns can be nested of course:
7190 <programlisting>
7191 f2 (!x, y) = [x,y]
7192 </programlisting>
7193 Here, <literal>f2</literal> is strict in <literal>x</literal> but not in
7194 <literal>y</literal>.
7195 A bang only really has an effect if it precedes a variable or wild-card pattern:
7196 <programlisting>
7197 f3 !(x,y) = [x,y]
7198 f4 (x,y)  = [x,y]
7199 </programlisting>
7200 Here, <literal>f3</literal> and <literal>f4</literal> are identical;
7201 putting a bang before a pattern that
7202 forces evaluation anyway does nothing.
7203 </para>
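<para>
The effect of a bang can be observed by passing <literal>undefined</literal>,
which stands in for bottom here.
This is a minimal sketch assuming <option>-XBangPatterns</option>;
<literal>lazyF1</literal> and <literal>throws</literal> are hypothetical helpers,
and exceptions are caught with <function>try</function> and
<function>evaluate</function> from <literal>Control.Exception</literal>.
<programlisting>
{-# LANGUAGE BangPatterns #-}
import Control.Exception

-- f1 and f2 from the text
f1 :: a -> Bool
f1 !x = True

f2 :: (a, a) -> [a]
f2 (!x, y) = [x,y]

-- Hypothetical lazy counterpart of f1, without the bang
lazyF1 :: a -> Bool
lazyF1 x = True

-- Hypothetical helper: True if evaluating the argument to WHNF
-- raises an exception
throws :: a -> IO Bool
throws v = either isExc (const False) &lt;$> try (evaluate v)
  where
    isExc :: SomeException -> Bool
    isExc _ = True
</programlisting>
Here <literal>throws (f1 (undefined :: Int))</literal> returns <literal>True</literal>
but <literal>throws (lazyF1 (undefined :: Int))</literal> returns <literal>False</literal>.
Likewise <literal>f2 (undefined :: Int, 2)</literal> raises an exception, while
<literal>length (f2 (1 :: Int, undefined))</literal> is 2, because the bang on
<literal>x</literal> does not force <literal>y</literal>.
</para>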
7204 <para>
7205 There is one (apparent) exception to this general rule that a bang only
7206 makes a difference when it precedes a variable or wild-card: a bang at the
7207 top level of a <literal>let</literal> or <literal>where</literal>
7208 binding makes the binding strict, regardless of the pattern. For example:
7209 <programlisting>
7210 let ![x,y] = e in b
7211 </programlisting>
7212 is a strict binding: operationally, it evaluates <literal>e</literal>, matches
7213 it against the pattern <literal>[x,y]</literal>, and then evaluates <literal>b</literal>.
7214 (We say "apparent" exception because the Right Way to think of it is that the bang
7215 at the top of a binding is not part of the <emphasis>pattern</emphasis>; rather it
7216 is part of the syntax of the <emphasis>binding</emphasis>.)
7217 Nested bangs in a pattern binding behave uniformly with all other forms of
7218 pattern matching.  For example
7219 <programlisting>
7220 let (!x,[y]) = e in b
7221 </programlisting>
7222 is equivalent to this:
7223 <programlisting>
7224 let { t = case e of (x,[y]) -> x `seq` (x,y)
7225       x = fst t
7226       y = snd t }
7227 in b
7228 </programlisting>
7229 The binding is lazy, but when either <literal>x</literal> or <literal>y</literal> is
7230 evaluated by <literal>b</literal> the entire pattern is matched, including forcing the
7231 evaluation of <literal>x</literal>.
7232 </para>
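<para>
The difference between a bang at the top of a binding and a nested one can be
observed in the same way.
A minimal sketch, with a hypothetical <literal>throws</literal> helper that reports
whether evaluation to WHNF raises an exception; the bindings
<literal>unused</literal>, <literal>usesY</literal>, <literal>strictTop</literal>
and <literal>lazyTop</literal> are illustrative only.
<programlisting>
{-# LANGUAGE BangPatterns #-}
import Control.Exception

-- Hypothetical helper: True if evaluating the argument raises an exception
throws :: a -> IO Bool
throws v = either isExc (const False) &lt;$> try (evaluate v)
  where
    isExc :: SomeException -> Bool
    isExc _ = True

-- Lazy binding with a nested bang: nothing is matched unless x or y is used
unused :: Int
unused = let (!x, y) = (undefined :: Int, 5 :: Int) in 0

-- Demanding y matches the whole pattern, which also forces x
usesY :: Int
usesY = let (!x, y) = (undefined :: Int, 5 :: Int) in y

-- Bang at the top: the binding is strict, so the failing match is
-- reported even though the body uses neither x nor y
strictTop :: Int
strictTop = let ![x, y] = [1 :: Int] in 0

-- Without the bang the failing match is never attempted
lazyTop :: Int
lazyTop = let [x, y] = [1 :: Int] in 0
</programlisting>
Evaluating <literal>unused</literal> or <literal>lazyTop</literal> gives 0,
whereas evaluating <literal>usesY</literal> or <literal>strictTop</literal>
raises an exception.
</para>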
7233 <para>
7234 Bang patterns work in <literal>case</literal> expressions too, of course:
7235 <programlisting>
7236 g5 x = let y = f x in body
7237 g6 x = case f x of { y -&gt; body }
7238 g7 x = case f x of { !y -&gt; body }
7239 </programlisting>
7240 The functions <literal>g5</literal> and <literal>g6</literal> mean exactly the same thing.
7241 But <literal>g7</literal> evaluates <literal>(f x)</literal>, binds <literal>y</literal> to the
7242 result, and then evaluates <literal>body</literal>.
7243 </para>
7244 </sect2>
7245
7246
7247 <sect2 id="bang-patterns-sem">
7248 <title>Syntax and semantics
7249 </title>
7250 <para>
7251
7252 We add a single new production to the syntax of patterns:
7253 <programlisting>
7254   pat ::= !pat
7255 </programlisting>
7256 There is one problem with syntactic ambiguity.  Consider:
7257 <programlisting>
7258 f !x = 3
7259 </programlisting>
7260 Is this a definition of the infix function "<literal>(!)</literal>",
7261 or of the "<literal>f</literal>" with a bang pattern? GHC resolves this
7262 ambiguity in favour of the latter.  If you want to define
7263 <literal>(!)</literal> with bang-patterns enabled, you have to do so using
7264 prefix notation:
7265 <programlisting>
7266 (!) f x = 3
7267 </programlisting>
7268 The semantics of Haskell pattern matching is described in <ulink
7269 url="http://www.haskell.org/onlinereport/exps.html#sect3.17.2">
7270 Section 3.17.2</ulink> of the Haskell Report.  To this description add
7271 one extra item 10, saying:
7272 <itemizedlist><listitem><para>Matching
7273 the pattern <literal>!pat</literal> against a value <literal>v</literal> behaves as follows:
7274 <itemizedlist><listitem><para>if <literal>v</literal> is bottom, the match diverges</para></listitem>
7275                 <listitem><para>otherwise, <literal>pat</literal> is matched against
7276                 <literal>v</literal></para></listitem>
7277 </itemizedlist>
7278 </para></listitem></itemizedlist>
7279 Similarly, in Figure 4 of  <ulink url="http://www.haskell.org/onlinereport/exps.html#sect3.17.3">
7280 Section 3.17.3</ulink>, add a new case (t):
7281 <programlisting>
7282 case v of { !pat -> e; _ -> e' }
7283    = v `seq` case v of { pat -> e; _ -> e' }
7284 </programlisting>
7285 </para><para>
7286 That leaves let expressions, whose translation is given in
7287 <ulink url="http://www.haskell.org/onlinereport/exps.html#sect3.12">Section
7288 3.12</ulink>
7289 of the Haskell Report.
7290 In the translation box, first apply
7291 the following transformation:  for each pattern <literal>pi</literal> that is of
form <literal>!qi = ei</literal>, transform it to <literal>(xi,!qi) = ((),ei)</literal>, and replace <literal>e0</literal>
7293 by <literal>(xi `seq` e0)</literal>.  Then, when none of the left-hand-side patterns
7294 have a bang at the top, apply the rules in the existing box.
7295 </para>
<para>The effect of the let rule is to force complete matching of the pattern
<literal>qi</literal> before evaluation of the body is begun.  The bang is
retained in the translated form in case <literal>qi</literal> is a variable.
For example,
<programlisting>
  let !y = f x in b
</programlisting>
is translated to
<programlisting>
  let (x1,!y) = ((),f x) in x1 `seq` b
</programlisting>
Without the retained bang, matching <literal>(x1,y)</literal> would not
force <literal>f x</literal>.
</para>
7305 <para>
7306 The let-binding can be recursive.  However, it is much more common for
7307 the let-binding to be non-recursive, in which case the following law holds:
7308 <literal>(let !p = rhs in body)</literal>
7309      is equivalent to
7310 <literal>(case rhs of !p -> body)</literal>
7311 </para>
7312 <para>
7313 A pattern with a bang at the outermost level is not allowed at the top level of
7314 a module.
7315 </para>
7316 </sect2>
7317 </sect1>
7318
7319 <!-- ==================== ASSERTIONS =================  -->
7320
7321 <sect1 id="assertions">
7322 <title>Assertions
7323 <indexterm><primary>Assertions</primary></indexterm>
7324 </title>
7325
7326 <para>
7327 If you want to make use of assertions in your standard Haskell code, you
7328 could define a function like the following:
7329 </para>
7330
7331 <para>
7332
7333 <programlisting>
7334 assert :: Bool -> a -> a
7335 assert False x = error "assertion failed!"
7336 assert _     x = x
7337 </programlisting>
7338
7339 </para>
7340
7341 <para>
7342 which works, but gives you back a less than useful error message --
7343 an assertion failed, but which and where?
7344 </para>
7345
7346 <para>
7347 One way out is to define an extended <function>assert</function> function which also
7348 takes a descriptive string to include in the error message and
7349 perhaps combine this with the use of a pre-processor which inserts
7350 the source location where <function>assert</function> was used.
7351 </para>
7352
7353 <para>
GHC offers a helping hand here, doing all of this for you. For every
7355 use of <function>assert</function> in the user's source:
7356 </para>
7357
7358 <para>
7359
7360 <programlisting>
7361 kelvinToC :: Double -> Double
kelvinToC k = assert (k &gt;= 0.0) (k-273.15)
7363 </programlisting>
7364
7365 </para>
7366
7367 <para>
GHC will rewrite this to also include the source location where the
7369 assertion was made,
7370 </para>
7371
7372 <para>
7373
7374 <programlisting>
7375 assert pred val ==> assertError "Main.hs|15" pred val
7376 </programlisting>
7377
7378 </para>
7379
7380 <para>
7381 The rewrite is only performed by the compiler when it spots
7382 applications of <function>Control.Exception.assert</function>, so you
7383 can still define and use your own versions of
<function>assert</function>, should you so wish. If not, import
<literal>Control.Exception</literal> to make use of
<function>assert</function> in your code.
7387 </para>
7388
7389 <para>
7390 GHC ignores assertions when optimisation is turned on with the
7391       <option>-O</option><indexterm><primary><option>-O</option></primary></indexterm> flag.  That is, expressions of the form
7392 <literal>assert pred e</literal> will be rewritten to
7393 <literal>e</literal>.  You can also disable assertions using the
7394       <option>-fignore-asserts</option>
7395       option<indexterm><primary><option>-fignore-asserts</option></primary>
7396       </indexterm>.</para>
7397
7398 <para>
Assertion failures can be caught; see the documentation for the
7400 <literal>Control.Exception</literal> library for the details.
7401 </para>
7402
7403 </sect1>
7404
7405
7406 <!-- =============================== PRAGMAS ===========================  -->
7407
7408   <sect1 id="pragmas">
7409     <title>Pragmas</title>
7410
7411     <indexterm><primary>pragma</primary></indexterm>
7412
7413     <para>GHC supports several pragmas, or instructions to the
7414     compiler placed in the source code.  Pragmas don't normally affect
7415     the meaning of the program, but they might affect the efficiency
7416     of the generated code.</para>
7417
7418     <para>Pragmas all take the form
7419
7420 <literal>{-# <replaceable>word</replaceable> ... #-}</literal>
7421
7422     where <replaceable>word</replaceable> indicates the type of
7423     pragma, and is followed optionally by information specific to that
7424     type of pragma.  Case is ignored in
7425     <replaceable>word</replaceable>.  The various values for
7426     <replaceable>word</replaceable> that GHC understands are described
7427     in the following sections; any pragma encountered with an
7428     unrecognised <replaceable>word</replaceable> is
7429     ignored. The layout rule applies in pragmas, so the closing <literal>#-}</literal>
7430     should start in a column to the right of the opening <literal>{-#</literal>. </para>
7431
7432     <para>Certain pragmas are <emphasis>file-header pragmas</emphasis>:
7433       <itemizedlist>
7434       <listitem><para>
7435           A file-header
7436           pragma must precede the <literal>module</literal> keyword in the file.
7437           </para></listitem>
7438       <listitem><para>
7439       There can be as many file-header pragmas as you please, and they can be
7440       preceded or followed by comments.
7441           </para></listitem>
7442       <listitem><para>
7443       File-header pragmas are read once only, before
7444       pre-processing the file (e.g. with cpp).
7445           </para></listitem>
7446       <listitem><para>
7447          The file-header pragmas are: <literal>{-# LANGUAGE #-}</literal>,
7448         <literal>{-# OPTIONS_GHC #-}</literal>, and
7449         <literal>{-# INCLUDE #-}</literal>.
7450           </para></listitem>
7451       </itemizedlist>
7452       </para>
7453
7454     <sect2 id="language-pragma">
7455       <title>LANGUAGE pragma</title>
7456
7457       <indexterm><primary>LANGUAGE</primary><secondary>pragma</secondary></indexterm>
7458       <indexterm><primary>pragma</primary><secondary>LANGUAGE</secondary></indexterm>
7459
7460       <para>The <literal>LANGUAGE</literal> pragma allows language extensions to be enabled
7461         in a portable way.
7462         It is the intention that all Haskell compilers support the
7463         <literal>LANGUAGE</literal> pragma with the same syntax, although not
7464         all extensions are supported by all compilers, of
7465         course.  The <literal>LANGUAGE</literal> pragma should be used instead
7466         of <literal>OPTIONS_GHC</literal>, if possible.</para>
7467
7468       <para>For example, to enable the FFI and preprocessing with CPP:</para>
7469
7470 <programlisting>{-# LANGUAGE ForeignFunctionInterface, CPP #-}</programlisting>
7471
7472         <para><literal>LANGUAGE</literal> is a file-header pragma (see <xref linkend="pragmas"/>).</para>
7473
7474       <para>Every language extension can also be turned into a command-line flag
7475         by prefixing it with "<literal>-X</literal>"; for example <option>-XForeignFunctionInterface</option>.
        (Similarly, all "<literal>-X</literal>" flags can be written as <literal>LANGUAGE</literal> pragmas.)
7477       </para>
7478
7479       <para>A list of all supported language extensions can be obtained by invoking
7480         <literal>ghc --supported-languages</literal> (see <xref linkend="modes"/>).</para>
7481
7482       <para>Any extension from the <literal>Extension</literal> type defined in
7483         <ulink
7484           url="&libraryCabalLocation;/Language-Haskell-Extension.html"><literal>Language.Haskell.Extension</literal></ulink>
7485         may be used.  GHC will report an error if any of the requested extensions are not supported.</para>
7486     </sect2>
7487
7488
7489     <sect2 id="options-pragma">
7490       <title>OPTIONS_GHC pragma</title>
7491       <indexterm><primary>OPTIONS_GHC</primary>
7492       </indexterm>
7493       <indexterm><primary>pragma</primary><secondary>OPTIONS_GHC</secondary>
7494       </indexterm>
7495
7496       <para>The <literal>OPTIONS_GHC</literal> pragma is used to specify
7497       additional options that are given to the compiler when compiling
7498       this source file.  See <xref linkend="source-file-options"/> for
7499       details.</para>
7500
7501       <para>Previous versions of GHC accepted <literal>OPTIONS</literal> rather
7502         than <literal>OPTIONS_GHC</literal>, but that is now deprecated.</para>
        <para><literal>OPTIONS_GHC</literal> is a file-header pragma (see <xref linkend="pragmas"/>).</para>
    </sect2>
7506
7507     <sect2 id="include-pragma">
7508       <title>INCLUDE pragma</title>
7509
      <para>The <literal>INCLUDE</literal> pragma used to be necessary for
7511         specifying header files to be included when using the FFI and
7512         compiling via C.  It is no longer required for GHC, but is
7513         accepted (and ignored) for compatibility with other
7514         compilers.</para>
7515     </sect2>
7516
7517     <sect2 id="warning-deprecated-pragma">
7518       <title>WARNING and DEPRECATED pragmas</title>
7519       <indexterm><primary>WARNING</primary></indexterm>
7520       <indexterm><primary>DEPRECATED</primary></indexterm>
7521
7522       <para>The WARNING pragma allows you to attach an arbitrary warning
7523       to a particular function, class, or type.
7524       A DEPRECATED pragma lets you specify that
7525       a particular function, class, or type is deprecated.
7526       There are two ways of using these pragmas.
7527
7528       <itemizedlist>
7529         <listitem>
7530           <para>You can work on an entire module thus:</para>
7531 <programlisting>
7532    module Wibble {-# DEPRECATED "Use Wobble instead" #-} where
7533      ...
7534 </programlisting>
7535       <para>Or:</para>
7536 <programlisting>
7537    module Wibble {-# WARNING "This is an unstable interface." #-} where
7538      ...
7539 </programlisting>
          <para>When you compile any module that imports
7541           <literal>Wibble</literal>, GHC will print the specified
7542           message.</para>
7543         </listitem>
7544
7545         <listitem>
7546           <para>You can attach a warning to a function, class, type, or data constructor, with the
7547           following top-level declarations:</para>
7548 <programlisting>
7549    {-# DEPRECATED f, C, T "Don't use these" #-}
7550    {-# WARNING unsafePerformIO "This is unsafe; I hope you know what you're doing" #-}
7551 </programlisting>
7552           <para>When you compile any module that imports and uses any
7553           of the specified entities, GHC will print the specified
7554           message.</para>
          <para>You can only attach warnings to entities declared at top level in the module
          being compiled, and you can only use unqualified names in the list of
          entities. A capitalised name, such as <literal>T</literal>,
          refers to <emphasis>either</emphasis> the type constructor <literal>T</literal>
          <emphasis>or</emphasis> the data constructor <literal>T</literal>, or both if
          both are in scope.  If both are in scope, there is currently no way to
          specify one without the other (cf. fixities,
          <xref linkend="infix-tycons"/>).</para>
7563         </listitem>
7564       </itemizedlist>
7565       Warnings and deprecations are not reported for
7566       (a) uses within the defining module, and
7567       (b) uses in an export list.
7568       The latter reduces spurious complaints within a library
7569       in which one module gathers together and re-exports
7570       the exports of several others.
7571       </para>
7572       <para>You can suppress the warnings with the flag
7573       <option>-fno-warn-warnings-deprecations</option>.</para>
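      <para>A single-module sketch of the declaration form follows (the names
      <literal>oldLookup</literal>, <literal>newLookup</literal> and
      <literal>lookupTwice</literal> are hypothetical).  Because the use of
      <literal>oldLookup</literal> is inside its defining module, GHC does not
      report it; a module that imported and used <literal>oldLookup</literal>
      would see the message.</para>
<programlisting>
-- Hypothetical replacement function.
newLookup :: Eq k => k -> [(k, v)] -> Maybe v
newLookup = lookup

-- Hypothetical old name, kept for compatibility but deprecated.
oldLookup :: Eq k => k -> [(k, v)] -> Maybe v
oldLookup = newLookup
{-# DEPRECATED oldLookup "Use newLookup instead" #-}

-- A use inside the defining module: no deprecation message is printed.
lookupTwice :: Eq k => k -> [(k, v)] -> (Maybe v, Maybe v)
lookupTwice k kvs = (oldLookup k kvs, newLookup k kvs)
</programlisting>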
7574     </sect2>
7575
7576     <sect2 id="inline-noinline-pragma">
7577       <title>INLINE and NOINLINE pragmas</title>
7578
7579       <para>These pragmas control the inlining of function
7580       definitions.</para>
7581
7582       <sect3 id="inline-pragma">
7583         <title>INLINE pragma</title>
7584         <indexterm><primary>INLINE</primary></indexterm>
7585
7586         <para>GHC (with <option>-O</option>, as always) tries to
7587         inline (or &ldquo;unfold&rdquo;) functions/values that are
7588         &ldquo;small enough,&rdquo; thus avoiding the call overhead
7589         and possibly exposing other more-wonderful optimisations.
7590         Normally, if GHC decides a function is &ldquo;too
7591         expensive&rdquo; to inline, it will not do so, nor will it
7592         export that unfolding for other modules to use.</para>
7593
7594         <para>The sledgehammer you can bring to bear is the
7595         <literal>INLINE</literal><indexterm><primary>INLINE
7596         pragma</primary></indexterm> pragma, used thusly:</para>
7597
7598 <programlisting>
7599 key_function :: Int -> String -> (Bool, Double)
7600 {-# INLINE key_function #-}
7601 </programlisting>
7602
7603         <para>The major effect of an <literal>INLINE</literal> pragma
7604         is to declare a function's &ldquo;cost&rdquo; to be very low.
7605         The normal unfolding machinery will then be very keen to
7606         inline it.  However, an <literal>INLINE</literal> pragma for a
7607         function "<literal>f</literal>" has a number of other effects:
7608 <itemizedlist>
7609 <listitem><para>
7610 No functions are inlined into <literal>f</literal>.  Otherwise
7611 GHC might inline a big function into <literal>f</literal>'s right hand side,
7612 making <literal>f</literal> big; and then inline <literal>f</literal> blindly.
7613 </para></listitem>
7614 <listitem><para>
7615 The float-in, float-out, and common-sub-expression transformations are not
7616 applied to the body of <literal>f</literal>.
7617 </para></listitem>
7618 <listitem><para>
7619 An INLINE function is not worker/wrappered by strictness analysis.
7620 It's going to be inlined wholesale instead.
7621 </para></listitem>
7622 </itemizedlist>
7623 All of these effects are aimed at ensuring that what gets inlined is
7624 exactly what you asked for, no more and no less.
7625 </para>
7626 <para>GHC ensures that inlining cannot go on forever: every mutually-recursive
group is cut by one or more <emphasis>loop breakers</emphasis> that are never inlined
7628 (see <ulink url="http://research.microsoft.com/%7Esimonpj/Papers/inlining/index.htm">
7629 Secrets of the GHC inliner, JFP 12(4) July 2002</ulink>).
7630 GHC tries not to select a function with an INLINE pragma as a loop breaker, but
7631 when there is no choice even an INLINE function can be selected, in which case
7632 the INLINE pragma is ignored.
7633 For example, for a self-recursive function, the loop breaker can only be the function
7634 itself, so an INLINE pragma is always ignored.</para>
7635
7636         <para>Syntactically, an <literal>INLINE</literal> pragma for a
7637         function can be put anywhere its type signature could be
7638         put.</para>
7639
7640         <para><literal>INLINE</literal> pragmas are a particularly
7641         good idea for the
7642         <literal>then</literal>/<literal>return</literal> (or
7643         <literal>bind</literal>/<literal>unit</literal>) functions in
7644         a monad.  For example, in GHC's own
7645         <literal>UniqueSupply</literal> monad code, we have:</para>
7646
7647 <programlisting>
7648 {-# INLINE thenUs #-}
7649 {-# INLINE returnUs #-}
7650 </programlisting>
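        <para>As an illustration only (this is not GHC's actual
        <literal>UniqueSupply</literal> code; <literal>Us</literal>,
        <literal>getUnique</literal> and <literal>twoUniques</literal> are
        hypothetical), a small counter monad with its plumbing marked
        <literal>INLINE</literal> might look like this:</para>
<programlisting>
-- Hypothetical counter monad: a computation takes the next free number
-- and returns a result together with the updated counter.
newtype Us a = Us (Int -> (a, Int))

runUs :: Us a -> Int -> (a, Int)
runUs (Us m) = m

returnUs :: a -> Us a
returnUs x = Us (\n -> (x, n))
{-# INLINE returnUs #-}

thenUs :: Us a -> (a -> Us b) -> Us b
thenUs (Us m) k = Us (\n -> let (x, n') = m n in runUs (k x) n')
{-# INLINE thenUs #-}

-- Hand out the current number and advance the counter.
getUnique :: Us Int
getUnique = Us (\n -> (n, n + 1))

-- With -O, GHC is very keen to inline thenUs and returnUs here.
twoUniques :: Us (Int, Int)
twoUniques = getUnique `thenUs` \a -> getUnique `thenUs` \b -> returnUs (a, b)
</programlisting>
        <para>For instance, <literal>runUs twoUniques 10</literal> evaluates to
        <literal>((10, 11), 12)</literal>.</para>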
7651
7652         <para>See also the <literal>NOINLINE</literal> pragma (<xref
7653         linkend="noinline-pragma"/>).</para>
7654
7655         <para>Note: the HBC compiler doesn't like <literal>INLINE</literal> pragmas,
7656           so if you want your code to be HBC-compatible you'll have to surround
7657           the pragma with C pre-processor directives
7658           <literal>#ifdef __GLASGOW_HASKELL__</literal>...<literal>#endif</literal>.</para>
7659
7660       </sect3>
7661
7662       <sect3 id="noinline-pragma">
7663         <title>NOINLINE pragma</title>
7664
7665         <indexterm><primary>NOINLINE</primary></indexterm>
7666         <indexterm><primary>NOTINLINE</primary></indexterm>
7667
7668         <para>The <literal>NOINLINE</literal> pragma does exactly what
7669         you'd expect: it stops the named function from being inlined
7670         by the compiler.  You shouldn't ever need to do this, unless
7671         you're very cautious about code size.</para>
7672
7673         <para><literal>NOTINLINE</literal> is a synonym for
7674         <literal>NOINLINE</literal> (<literal>NOINLINE</literal> is
7675         specified by Haskell 98 as the standard way to disable
7676         inlining, so it should be used if you want your code to be
7677         portable).</para>
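        <para>A sketch of the code-size use case (both functions are
        hypothetical): the rarely taken error path is kept out of line, so that
        inlining <literal>safeIndex</literal> at its call sites does not copy the
        error-message construction each time.</para>
<programlisting>
-- Hypothetical cold path: builds an error message, called only on failure.
indexError :: Int -> Int -> a
indexError i n =
  error ("index " ++ show i ++ " out of range (length " ++ show n ++ ")")
{-# NOINLINE indexError #-}

-- Hypothetical checked list indexing.  If safeIndex is inlined, only the
-- range test and the call to indexError are copied.
safeIndex :: [a] -> Int -> a
safeIndex xs i
  | inRange   = xs !! i
  | otherwise = indexError i n
  where
    n       = length xs
    inRange = and [i >= 0, n > i]
</programlisting>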
7678       </sect3>
7679
7680       <sect3 id="conlike-pragma">
7681         <title>CONLIKE modifier</title>
7682         <indexterm><primary>CONLIKE</primary></indexterm>
7683         <para>An INLINE or NOINLINE pragma may have a CONLIKE modifier,
7684         which affects matching in RULEs (only).  See <xref linkend="conlike"/>.
7685         </para>
7686       </sect3>
7687
7688       <sect3 id="phase-control">
7689         <title>Phase control</title>
7690
7691         <para> Sometimes you want to control exactly when in GHC's
7692         pipeline the INLINE pragma is switched on.  Inlining happens
7693         only during runs of the <emphasis>simplifier</emphasis>.  Each
7694         run of the simplifier has a different <emphasis>phase
7695         number</emphasis>; the phase number decreases towards zero.
7696         If you use <option>-dverbose-core2core</option> you'll see the
7697         sequence of phase numbers for successive runs of the
7698         simplifier.  In an INLINE pragma you can optionally specify a
7699         phase number, thus:
7700         <itemizedlist>
7701           <listitem>
7702             <para>"<literal>INLINE[k] f</literal>" means: do not inline
7703             <literal>f</literal>
7704               until phase <literal>k</literal>, but from phase
7705               <literal>k</literal> onwards be very keen to inline it.
7706             </para></listitem>
7707           <listitem>
7708             <para>"<literal>INLINE[~k] f</literal>" means: be very keen to inline
7709             <literal>f</literal>
7710               until phase <literal>k</literal>, but from phase
7711               <literal>k</literal> onwards do not inline it.
7712             </para></listitem>
7713           <listitem>
7714             <para>"<literal>NOINLINE[k] f</literal>" means: do not inline
7715             <literal>f</literal>
7716               until phase <literal>k</literal>, but from phase
7717               <literal>k</literal> onwards be willing to inline it (as if
7718               there was no pragma).
7719             </para></listitem>
7720             <listitem>
7721             <para>"<literal>NOINLINE[~k] f</literal>" means: be willing to inline
7722             <literal>f</literal>
7723               until phase <literal>k</literal>, but from phase
7724               <literal>k</literal> onwards do not inline it.
7725             </para></listitem>
7726         </itemizedlist>
7727 The same information is summarised here:
7728 <programlisting>
7729                            -- Before phase 2     Phase 2 and later
7730   {-# INLINE   [2]  f #-}  --      No                 Yes
7731   {-# INLINE   [~2] f #-}  --      Yes                No
7732   {-# NOINLINE [2]  f #-}  --      No                 Maybe
7733   {-# NOINLINE [~2] f #-}  --      Maybe              No
7734
7735   {-# INLINE   f #-}       --      Yes                Yes
7736   {-# NOINLINE f #-}       --      No                 No
7737 </programlisting>
7738 By "Maybe" we mean that the usual heuristic inlining rules apply (if the
7739 function body is small, or it is applied to interesting-looking arguments etc).
7740 Another way to understand the semantics is this:
7741 <itemizedlist>
7742 <listitem><para>For both INLINE and NOINLINE, the phase number says
7743 when inlining is allowed at all.</para></listitem>
7744 <listitem><para>The INLINE pragma has the additional effect of making the
7745 function body look small, so that when inlining is allowed it is very likely to
7746 happen.
7747 </para></listitem>
7748 </itemizedlist>
7749 </para>
7750 <para>The same phase-numbering control is available for RULES
7751         (<xref linkend="rewrite-rules"/>).</para>
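        <para>A common pattern, shown here as a sketch with a hypothetical
        function <literal>double</literal>, pairs a phased
        <literal>NOINLINE</literal> with a rule that is active only in the
        earlier phases: the function is kept out of line until phase 1, giving
        the rule a chance to match calls to it, and may be inlined from phase 1
        onwards.</para>
<programlisting>
-- Hypothetical function: not inlined before phase 1, maybe inlined after.
double :: Int -> Int
double x = x + x
{-# NOINLINE [1] double #-}

-- Active only before phase 1, while calls to double are still visible.
{-# RULES
"double/double" [~1] forall x. double (double x) = x * 4
  #-}

quadruple :: Int -> Int
quadruple n = double (double n)
</programlisting>
        <para>Since both sides of the rule are equal, <literal>quadruple 5</literal>
        is <literal>20</literal> whether or not the rule fires.</para>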
7752       </sect3>
7753     </sect2>
7754
7755     <sect2 id="annotation-pragmas">
7756       <title>ANN pragmas</title>
7757
7758       <para>GHC offers the ability to annotate various code constructs with additional
7759       data by using three pragmas.  This data can then be inspected at a later date by
7760       using GHC-as-a-library.</para>
7761
7762       <sect3 id="ann-pragma">
7763         <title>Annotating values</title>
7764
7765         <indexterm><primary>ANN</primary></indexterm>
7766
        <para>Any expression whose type has both <literal>Typeable</literal> and <literal>Data</literal> instances may be attached to a top-level value
7768         binding using an <literal>ANN</literal> pragma. In particular, this means you can use <literal>ANN</literal>
7769         to annotate data constructors (e.g. <literal>Just</literal>) as well as normal values (e.g. <literal>take</literal>).
7770         By way of example, to annotate the function <literal>foo</literal> with the annotation <literal>Just "Hello"</literal>
7771         you would do this:</para>
7772
7773 <programlisting>
7774 {-# ANN foo (Just "Hello") #-}
7775 foo = ...
7776 </programlisting>
7777
7778         <para>
7779           A number of restrictions apply to use of annotations:
7780           <itemizedlist>
7781             <listitem><para>The binder being annotated must be at the top level (i.e. no nested binders)</para></listitem>
7782             <listitem><para>The binder being annotated must be declared in the current module</para></listitem>
7783             <listitem><para>The expression you are annotating with must have a type with <literal>Typeable</literal> and <literal>Data</literal> instances</para></listitem>
            <listitem><para>The <link linkend="using-template-haskell">Template Haskell staging restrictions</link> apply to the
            expression being annotated with, so for example you cannot run a function from the module being compiled.</para>
7786
7787             <para>To be precise, the annotation <literal>{-# ANN x e #-}</literal> is well staged if and only if <literal>$(e)</literal> would be
7788             (disregarding the usual type restrictions of the splice syntax, and the usual restriction on splicing inside a splice - <literal>$([|1|])</literal> is fine as an annotation, albeit redundant).</para></listitem>
7789           </itemizedlist>
7790
7791           If you feel strongly that any of these restrictions are too onerous, <ulink url="http://hackage.haskell.org/trac/ghc/wiki/MailingListsAndIRC">
7792           please give the GHC team a shout</ulink>.
7793         </para>
7794
7795         <para>However, apart from these restrictions, many things are allowed, including expressions which are not fully evaluated!
7796         Annotation expressions will be evaluated by the compiler just like Template Haskell splices are. So, this annotation is fine:</para>
7797
7798 <programlisting>
7799 {-# ANN f SillyAnnotation { foo = (id 10) + $([| 20 |]), bar = 'f } #-}
7800 f = ...
7801 </programlisting>
7802       </sect3>
7803
7804       <sect3 id="typeann-pragma">
7805         <title>Annotating types</title>
7806
7807         <indexterm><primary>ANN type</primary></indexterm>
7808         <indexterm><primary>ANN</primary></indexterm>
7809
7810         <para>You can annotate types with the <literal>ANN</literal> pragma by using the <literal>type</literal> keyword. For example:</para>
7811
7812 <programlisting>
7813 {-# ANN type Foo (Just "A `Maybe String' annotation") #-}
7814 data Foo = ...
7815 </programlisting>
7816       </sect3>
7817
7818       <sect3 id="modann-pragma">
7819         <title>Annotating modules</title>
7820
7821         <indexterm><primary>ANN module</primary></indexterm>
7822         <indexterm><primary>ANN</primary></indexterm>
7823
7824         <para>You can annotate modules with the <literal>ANN</literal> pragma by using the <literal>module</literal> keyword. For example:</para>
7825
7826 <programlisting>
7827 {-# ANN module (Just "A `Maybe String' annotation") #-}
7828 </programlisting>
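        <para>Putting the three forms together, here is a sketch of a complete
        module (<literal>Colour</literal> and <literal>favourite</literal> are
        hypothetical).  Every annotation expression uses only imported types,
        which keeps it within the staging restriction above, and
        <literal>Data.Data</literal> is imported so that the required
        <literal>Data</literal> instances are available.</para>
<programlisting>
import Data.Data (Data)

-- Annotation on the module as a whole.
{-# ANN module (Just "Example module annotation") #-}

-- Annotation on a (hypothetical) type.
{-# ANN type Colour (Just "A `Maybe String' annotation") #-}
data Colour = Red | Green | Blue
  deriving (Eq, Show)

-- Annotation on a top-level value; this payload is a Maybe Int.
{-# ANN favourite (Just (42 :: Int)) #-}
favourite :: Colour
favourite = Green
</programlisting>
        <para>The annotations do not change what the program computes; they are
        only visible when read back using GHC-as-a-library.</para>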
7829       </sect3>
7830     </sect2>
7831
7832     <sect2 id="line-pragma">
7833       <title>LINE pragma</title>
7834
7835       <indexterm><primary>LINE</primary><secondary>pragma</secondary></indexterm>
7836       <indexterm><primary>pragma</primary><secondary>LINE</secondary></indexterm>
      <para>This pragma is similar to C's <literal>&num;line</literal>
      directive, and is mainly for use in automatically generated Haskell
7839       code.  It lets you specify the line number and filename of the
7840       original code; for example</para>
7841
7842 <programlisting>{-# LINE 42 "Foo.vhs" #-}</programlisting>
7843
7844       <para>if you'd generated the current file from something called
7845       <filename>Foo.vhs</filename> and this line corresponds to line
7846       42 in the original.  GHC will adjust its error messages to refer
7847       to the line/file named in the <literal>LINE</literal>
7848       pragma.</para>
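      <para>A sketch of what such generated code might look like (the file
      name <literal>Foo.vhs</literal> comes from the text above; the binding
      <literal>answer</literal> is hypothetical):</para>
<programlisting>
-- Hypothetical generated code.  After the pragma, GHC's error messages
-- refer to Foo.vhs, counting from line 42, instead of this file.
{-# LINE 42 "Foo.vhs" #-}
answer :: Int
answer = 6 * 7
</programlisting>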
7849     </sect2>
7850
7851     <sect2 id="rules">
7852       <title>RULES pragma</title>
7853
7854       <para>The RULES pragma lets you specify rewrite rules.  It is
7855       described in <xref linkend="rewrite-rules"/>.</para>
7856     </sect2>
7857
7858     <sect2 id="specialize-pragma">
7859       <title>SPECIALIZE pragma</title>
7860
7861       <indexterm><primary>SPECIALIZE pragma</primary></indexterm>
7862       <indexterm><primary>pragma, SPECIALIZE</primary></indexterm>
7863       <indexterm><primary>overloading, death to</primary></indexterm>
7864
7865       <para>(UK spelling also accepted.)  For key overloaded
7866       functions, you can create extra versions (NB: more code space)
7867       specialised to particular types.  Thus, if you have an
7868       overloaded function:</para>
7869
7870 <programlisting>
7871   hammeredLookup :: Ord key => [(key, value)] -> key -> value
7872 </programlisting>
7873
7874       <para>If it is heavily used on lists with
7875       <literal>Widget</literal> keys, you could specialise it as
7876       follows:</para>
7877
7878 <programlisting>
7879   {-# SPECIALIZE hammeredLookup :: [(Widget, value)] -> Widget -> value #-}
7880 </programlisting>
7881
7882       <para>A <literal>SPECIALIZE</literal> pragma for a function can
7883       be put anywhere its type signature could be put.</para>
7884
      <para>A <literal>SPECIALIZE</literal> pragma has the effect of generating
7886       (a) a specialised version of the function and (b) a rewrite rule
7887       (see <xref linkend="rewrite-rules"/>) that rewrites a call to the
7888       un-specialised function into a call to the specialised one.</para>
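      <para>As a runnable sketch of the <literal>hammeredLookup</literal> example,
      assuming a simple association-list implementation and using
      <literal>Int</literal> keys in place of the <literal>Widget</literal> type
      (<literal>widgetName</literal> is hypothetical):</para>
<programlisting>
-- Assumed implementation: linear search, comparing keys with 'compare'.
hammeredLookup :: Ord key => [(key, value)] -> key -> value
hammeredLookup [] _ = error "hammeredLookup: key not found"
hammeredLookup ((k, v) : rest) key =
  case compare key k of
    EQ -> v
    _  -> hammeredLookup rest key

-- Generates a copy of hammeredLookup for Int keys, plus a rule that
-- redirects calls at that type to the copy.
{-# SPECIALIZE hammeredLookup :: [(Int, value)] -> Int -> value #-}

-- A call at Int keys, which the specialised version can serve.
widgetName :: Int -> String
widgetName = hammeredLookup [(1, "button"), (2, "slider")]
</programlisting>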
7889
7890       <para>The type in a SPECIALIZE pragma can be any type that is less
7891         polymorphic than the type of the original function.  In concrete terms,
7892         if the original function is <literal>f</literal> then the pragma
7893 <programlisting>
7894   {-# SPECIALIZE f :: &lt;type&gt; #-}
7895 </programlisting>
7896       is valid if and only if the definition
7897 <programlisting>
7898   f_spec :: &lt;type&gt;
7899   f_spec = f
7900 </programlisting>
7901       is valid.  Here are some examples (where we only give the type signature
7902       for the original function, not its code):
7903 <programlisting>
7904   f :: Eq a => a -> b -> b
7905   {-# SPECIALISE f :: Int -> b -> b #-}
7906
7907   g :: (Eq a, Ix b) => a -> b -> b
7908   {-# SPECIALISE g :: (Eq a) => a -> Int -> Int #-}
7909
7910   h :: Eq a => a -> a -> a
7911   {-# SPECIALISE h :: (Eq a) => [a] -> [a] -> [a] #-}
7912 </programlisting>
7913 The last of these examples will generate a
7914 RULE with a somewhat-complex left-hand side (try it yourself), so it might not fire very
7915 well.  If you use this kind of specialisation, let us know how well it works.
7916 </para>
7917
<para>A <literal>SPECIALIZE</literal> pragma can optionally be followed by an
7919 <literal>INLINE</literal> or <literal>NOINLINE</literal> pragma, optionally
7920 followed by a phase, as described in <xref linkend="inline-noinline-pragma"/>.
7921 The <literal>INLINE</literal> pragma affects the specialised version of the
7922 function (only), and applies even if the function is recursive.  The motivating
7923 example is this:
7924 <programlisting>
{-# LANGUAGE GADTs, MagicHash #-}
import GHC.Exts (ByteArray#, Int(I#), indexIntArray#)

-- A GADT for arrays with type-indexed representation
7926 data Arr e where
7927   ArrInt :: !Int -> ByteArray# -> Arr Int
7928   ArrPair :: !Int -> Arr e1 -> Arr e2 -> Arr (e1, e2)
7929
7930 (!:) :: Arr e -> Int -> e
7931 {-# SPECIALISE INLINE (!:) :: Arr Int -> Int -> Int #-}
7932 {-# SPECIALISE INLINE (!:) :: Arr (a, b) -> Int -> (a, b) #-}
7933 (ArrInt _ ba)     !: (I# i) = I# (indexIntArray# ba i)
7934 (ArrPair _ a1 a2) !: i      = (a1 !: i, a2 !: i)
7935 </programlisting>
7936 Here, <literal>(!:)</literal> is a recursive function that indexes arrays
7937 of type <literal>Arr e</literal>.  Consider a call to  <literal>(!:)</literal>
7938 at type <literal>(Int,Int)</literal>.  The second specialisation will fire, and
7939 the specialised function will be inlined.  It has two calls to
7940 <literal>(!:)</literal>,
7941 both at type <literal>Int</literal>.  Both these calls fire the first
7942 specialisation, whose body is also inlined.  The result is a type-based
7943 unrolling of the indexing function.</para>
7944 <para>Warning: you can make GHC diverge by using <literal>SPECIALISE INLINE</literal>
7945 on an ordinarily-recursive function.</para>
7946
7947       <para>Note: In earlier versions of GHC, it was possible to provide your own
7948       specialised function for a given type:
7949
7950 <programlisting>
7951 {-# SPECIALIZE hammeredLookup :: [(Int, value)] -> Int -> value = intLookup #-}
7952 </programlisting>
7953
7954       This feature has been removed, as it is now subsumed by the
7955       <literal>RULES</literal> pragma (see <xref linkend="rule-spec"/>).</para>
7956
7957     </sect2>
7958
7959 <sect2 id="specialize-instance-pragma">
7960 <title>SPECIALIZE instance pragma
7961 </title>
7962
7963 <para>
7964 <indexterm><primary>SPECIALIZE pragma</primary></indexterm>
7965 <indexterm><primary>overloading, death to</primary></indexterm>
7966 Same idea, except for instance declarations.  For example:
7967
7968 <programlisting>
7969 instance (Eq a) => Eq (Foo a) where {
7970    {-# SPECIALIZE instance Eq (Foo [(Int, Bar)]) #-}
7971    ... usual stuff ...
7972  }
7973 </programlisting>
7974 The pragma must occur inside the <literal>where</literal> part
7975 of the instance declaration.
7976 </para>
7977 <para>
7978 Compatible with HBC, by the way, except perhaps in the placement
7979 of the pragma.
7980 </para>
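<para>
A small, self-contained sketch of the same idea (the
<literal>newtype</literal> <literal>Foo</literal> here is hypothetical and
simpler than the one above):
</para>
<programlisting>
-- Hypothetical wrapper type whose Eq instance is overloaded on the element type.
newtype Foo a = Foo [a]

instance Eq a => Eq (Foo a) where
  -- Ask GHC for an extra copy of this instance specialised to Foo Int.
  {-# SPECIALIZE instance Eq (Foo Int) #-}
  Foo xs == Foo ys = xs == ys
</programlisting>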
7981
7982 </sect2>
7983
7984     <sect2 id="unpack-pragma">
7985       <title>UNPACK pragma</title>
7986
7987       <indexterm><primary>UNPACK</primary></indexterm>
7988
      <para>The <literal>UNPACK</literal> pragma indicates to the compiler
7990       that it should unpack the contents of a constructor field into
7991       the constructor itself, removing a level of indirection.  For
7992       example:</para>
7993
7994 <programlisting>
7995 data T = T {-# UNPACK #-} !Float
7996            {-# UNPACK #-} !Float
7997 </programlisting>
7998
7999       <para>will create a constructor <literal>T</literal> containing
8000       two unboxed floats.  This may not always be an optimisation: if
8001       the <function>T</function> constructor is scrutinised and the
8002       floats passed to a non-strict function for example, they will
8003       have to be reboxed (this is done automatically by the
8004       compiler).</para>
8005
8006       <para>Unpacking constructor fields should only be used in
8007       conjunction with <option>-O</option>, in order to expose
8008       unfoldings to the compiler so the reboxing can be removed as
8009       often as possible.  For example:</para>
8010
8011 <programlisting>
8012 f :: T -&#62; Float
8013 f (T f1 f2) = f1 + f2
8014 </programlisting>
8015
8016       <para>The compiler will avoid reboxing <function>f1</function>
8017       and <function>f2</function> by inlining <function>+</function>
8018       on floats, but only when <option>-O</option> is on.</para>
8019
      <para>Any single-constructor data type is eligible for unpacking; for
8021       example</para>
8022
8023 <programlisting>
8024 data T = T {-# UNPACK #-} !(Int,Int)
8025 </programlisting>
8026
8027       <para>will store the two <literal>Int</literal>s directly in the
8028       <function>T</function> constructor, by flattening the pair.
8029       Multi-level unpacking is also supported:
8030
8031 <programlisting>
8032 data T = T {-# UNPACK #-} !S
8033 data S = S {-# UNPACK #-} !Int {-# UNPACK #-} !Int
8034 </programlisting>
8035
8036       will store two unboxed <literal>Int&num;</literal>s
8037       directly in the <function>T</function> constructor.  The
8038       unpacker can see through newtypes, too.</para>
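      <para>As a sketch (the types <literal>Point</literal> and
      <literal>Rect</literal> are hypothetical), multi-level unpacking lets a
      rectangle hold its four coordinates directly when compiled with
      <option>-O</option>; the code reads exactly as it would without the
      pragmas:</para>
<programlisting>
-- Each Point stores two unboxed Doubles (under -O).
data Point = Point {-# UNPACK #-} !Double {-# UNPACK #-} !Double

-- Unpacking both Points flattens a Rect into four unboxed Doubles.
data Rect = Rect {-# UNPACK #-} !Point {-# UNPACK #-} !Point

-- Pattern matching is unchanged; GHC reboxes fields only if a boxed
-- value is actually needed.
area :: Rect -> Double
area (Rect (Point x1 y1) (Point x2 y2)) = abs ((x2 - x1) * (y2 - y1))
</programlisting>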
8039
8040       <para>If a field cannot be unpacked, you will not get a warning,
8041       so it might be an idea to check the generated code with
8042       <option>-ddump-simpl</option>.</para>
8043
8044       <para>See also the <option>-funbox-strict-fields</option> flag,
8045       which essentially has the effect of adding
8046       <literal>{-#&nbsp;UNPACK&nbsp;#-}</literal> to every strict
8047       constructor field.</para>
8048     </sect2>
8049
8050     <sect2 id="source-pragma">
8051       <title>SOURCE pragma</title>
8052
8053       <indexterm><primary>SOURCE</primary></indexterm>
8054      <para>The <literal>{-# SOURCE #-}</literal> pragma is used only in <literal>import</literal> declarations,
8055      to break a module loop.  It is described in detail in <xref linkend="mutual-recursion"/>.
8056      </para>
8057 </sect2>
8058
8059 </sect1>
8060
8061 <!--  ======================= REWRITE RULES ======================== -->
8062
8063 <sect1 id="rewrite-rules">
8064 <title>Rewrite rules
8065
8066 <indexterm><primary>RULES pragma</primary></indexterm>
8067 <indexterm><primary>pragma, RULES</primary></indexterm>
8068 <indexterm><primary>rewrite rules</primary></indexterm></title>
8069
8070 <para>
8071 The programmer can specify rewrite rules as part of the source program
8072 (in a pragma).
8073 Here is an example:
8074
8075 <programlisting>
8076   {-# RULES
8077   "map/map"    forall f g xs.  map f (map g xs) = map (f.g) xs
8078     #-}
8079 </programlisting>
8080 </para>
8081 <para>
8082 Use the debug flag <option>-ddump-simpl-stats</option> to see what rules fired.
8083 If you need more information, then <option>-ddump-rule-firings</option> shows you
8084 each individual rule firing in detail.
8085 </para>
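<para>
As a sketch of a complete module (the function <literal>mapL</literal> is a
hypothetical stand-in for <function>map</function>), the rules below start each
rule name in the same column as the top-level definitions, so that layout
separates them.  <literal>mapL</literal> is marked <literal>NOINLINE</literal>
so that it is not inlined before the rules have a chance to match.  Compiling
with <option>-O</option> and <option>-ddump-rule-firings</option> should show
which of them fire.
</para>
<programlisting>
-- Hypothetical list map, kept out of line so the rules can see its calls.
mapL :: (a -> b) -> [a] -> [b]
mapL _ []       = []
mapL f (x : xs) = f x : mapL f xs
{-# NOINLINE mapL #-}

{-# RULES
"mapL/mapL" forall f g xs. mapL f (mapL g xs) = mapL (f . g) xs
"mapL/nil"  forall f.      mapL f []          = []
  #-}

-- A candidate for "mapL/mapL": the two traversals can be fused into one.
incDoubled :: [Int] -> [Int]
incDoubled xs = mapL (+ 1) (mapL (* 2) xs)
</programlisting>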
8086
8087 <sect2>
8088 <title>Syntax</title>
8089
8090 <para>
8091 From a syntactic point of view:
8092
8093 <itemizedlist>
8094
8095 <listitem>
8096 <para>
8097  There may be zero or more rules in a <literal>RULES</literal> pragma, separated by semicolons (which
8098  may be generated by the layout rule).
8099 </para>
8100 </listitem>
8101
8102 <listitem>
8103 <para>
The layout rule applies in a pragma.
Currently no new indentation level
is set, so if you put several rules in a single RULES pragma and wish to use layout to separate them,
you must start each rule in the same column as the enclosing definitions.
8108 <programlisting>
8109   {-# RULES
8110   "map/map"    forall f g xs.  map f (map g xs) = map (f.g) xs
8111   "map/append" forall f xs ys. map f (xs ++ ys) = map f xs ++ map f ys
8112     #-}
8113 </programlisting>
8114 Furthermore, the closing <literal>#-}</literal>
8115 should start in a column to the right of the opening <literal>{-#</literal>.
8116 </para>
8117 </listitem>
8118
8119 <listitem>
8120 <para>
8121  Each rule has a name, enclosed in double quotes.  The name itself has
8122 no significance at all.  It is only used when reporting how many times the rule fired.
8123 </para>
8124 </listitem>
8125
8126 <listitem>
8127 <para>
8128 A rule may optionally have a phase-control number (see <xref linkend="phase-control"/>),
8129 immediately after the name of the rule.  Thus:
8130 <programlisting>
8131   {-# RULES
8132         "map/map" [2]  forall f g xs. map f (map g xs) = map (f.g) xs
8133     #-}
8134 </programlisting>
8135 The "[2]" means that the rule is active in Phase 2 and subsequent phases.  The inverse
8136 notation "[~2]" is also accepted, meaning that the rule is active up to, but not including,
8137 Phase 2.
8138 </para>
8139 </listitem>
8140
8141
8142
8143 <listitem>
8144 <para>
8145  Each variable mentioned in a rule must either be in scope (e.g. <function>map</function>),
8146 or bound by the <literal>forall</literal> (e.g. <function>f</function>, <function>g</function>, <function>xs</function>).  The variables bound by
8147 the <literal>forall</literal> are called the <emphasis>pattern</emphasis> variables.  They are separated
8148 by spaces, just like in a type <literal>forall</literal>.
8149 </para>
8150 </listitem>
8151 <listitem>
8152
8153 <para>
8154  A pattern variable may optionally have a type signature.
8155 If the type of the pattern variable is polymorphic, it <emphasis>must</emphasis> have a type signature.
8156 For example, here is the <literal>foldr/build</literal> rule:
8157
8158 <programlisting>
8159 "fold/build"  forall k z (g::forall b. (a->b->b) -> b -> b) .
8160               foldr k z (build g) = g k z
8161 </programlisting>
8162
8163 Since <function>g</function> has a polymorphic type, it must have a type signature.
8164
8165 </para>
8166 </listitem>
8167 <listitem>
8168
8169 <para>
8170 The left hand side of a rule must consist of a top-level variable applied
8171 to arbitrary expressions.  For example, this is <emphasis>not</emphasis> OK:
8172
8173 <programlisting>
8174 "wrong1"   forall e1 e2.  case True of { True -> e1; False -> e2 } = e1
8175 "wrong2"   forall f.      f True = True
8176 </programlisting>
8177
8178 In <literal>"wrong1"</literal>, the LHS is not an application; in <literal>"wrong2"</literal>, the LHS has a pattern variable
8179 in the head.
8180 </para>
8181 </listitem>
8182 <listitem>
8183
8184 <para>
8185  A rule does not need to be in the same module as (any of) the
8186 variables it mentions, though of course they need to be in scope.
8187 </para>
8188 </listitem>
8189 <listitem>
8190
8191 <para>
8192  All rules are implicitly exported from the module, and are therefore
8193 in force in any module that imports the module that defined the rule, directly
8194 or indirectly.  (That is, if A imports B, which imports C, then C's rules are
8195 in force when compiling A.)  The situation is very similar to that for instance
8196 declarations.
8197 </para>
8198 </listitem>
8199
8200 <listitem>
8201
8202 <para>
8203 Inside a RULE "<literal>forall</literal>" is treated as a keyword, regardless of
8204 any other flag settings.  Furthermore, inside a RULE, the language extension
8205 <option>-XScopedTypeVariables</option> is automatically enabled; see
8206 <xref linkend="scoped-type-variables"/>.
8207 </para>
8208 </listitem>
8209 <listitem>
8210
8211 <para>
8212 Like other pragmas, RULE pragmas are always checked for scope errors, and
8213 are typechecked. Typechecking means that the LHS and RHS of a rule are typechecked,
8214 and must have the same type.  However, rules are only <emphasis>enabled</emphasis>
8215 if the <option>-fenable-rewrite-rules</option> flag is
8216 on (see <xref linkend="rule-semantics"/>).
8217 </para>
8218 </listitem>
8219 </itemizedlist>
8220
8221 </para>
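<para>
The <literal>foldr/build</literal> rule above can be tried out in isolation.
The following sketch uses hypothetical copies, <literal>myFoldr</literal> and
<literal>myBuild</literal>, so that it does not interact with the rules already
in GHC's libraries; the pattern variable <function>g</function> has a
polymorphic type and so carries a signature.
</para>
<programlisting>
{-# LANGUAGE RankNTypes #-}

-- Hypothetical copies of foldr and build, kept out of line so that the
-- rule below can match calls to them.
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []       = z
myFoldr k z (x : xs) = k x (myFoldr k z xs)
{-# NOINLINE myFoldr #-}

myBuild :: (forall b. (a -> b -> b) -> b -> b) -> [a]
myBuild g = g (:) []
{-# NOINLINE myBuild #-}

-- g is polymorphic, so its type signature is required.
{-# RULES
"myFoldr/myBuild" forall k z (g :: forall b. (a -> b -> b) -> b -> b).
    myFoldr k z (myBuild g) = g k z
  #-}

-- The numbers 1..n as a list produced by myBuild.
upTo :: Int -> [Int]
upTo n = myBuild (\cons nil -> foldr cons nil [1 .. n])

-- If upTo is inlined here and the rule fires, myBuild's list is never
-- built: the sum is computed by applying g to (+) and 0 directly.
sumUpTo :: Int -> Int
sumUpTo n = myFoldr (+) 0 (upTo n)
</programlisting>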
8222
8223 </sect2>
8224
8225 <sect2 id="rule-semantics">
8226 <title>Semantics</title>
8227
8228 <para>
8229 From a semantic point of view:
8230
8231 <itemizedlist>
8232 <listitem>
8233 <para>
8234 Rules are enabled (that is, used during optimisation)
8235 by the <option>-fenable-rewrite-rules</option> flag.
8236 This flag is implied by <option>-O</option>, and may be switched
8237 off (as usual) by <option>-fno-enable-rewrite-rules</option>.
8238 (NB: enabling <option>-fenable-rewrite-rules</option> without <option>-O</option>
8239 may not do what you expect, though, because without <option>-O</option> GHC
8240 ignores all optimisation information in interface files;
8241 see <option>-fignore-interface-pragmas</option>, <xref linkend="options-f"/>.)
8242 Note that <option>-fenable-rewrite-rules</option> is an <emphasis>optimisation</emphasis> flag, and
8243 has no effect on parsing or typechecking.
8244 </para>
8245 </listitem>
8246
8247 <listitem>
8248 <para>
8249  Rules are regarded as left-to-right rewrite rules.
8250 When GHC finds an expression that is a substitution instance of the LHS
8251 of a rule, it replaces the expression by the (appropriately-substituted) RHS.
8252 By "a substitution instance" we mean that the LHS can be made equal to the
8253 expression by substituting for the pattern variables.
8254
8255 </para>
8256 </listitem>
8257 <listitem>
8258
8259 <para>
8260  GHC makes absolutely no attempt to verify that the LHS and RHS
8261 of a rule have the same meaning.  That is undecidable in general, and
8262 infeasible in most interesting cases.  The responsibility is entirely the programmer's!
8263
8264 </para>
8265 </listitem>
8266 <listitem>
8267
8268 <para>
8269  GHC makes no attempt to make sure that the rules are confluent or
8270 terminating.  For example:
8271
8272 <programlisting>
8273   "loop"        forall x y.  f x y = f y x
8274 </programlisting>
8275
8276 This rule will cause the compiler to go into an infinite loop.
8277
8278 </para>
8279 </listitem>
8280 <listitem>
8281
8282 <para>
8283  If more than one rule matches a call, GHC will choose one arbitrarily to apply.
8284
8285 </para>
8286 </listitem>
8287 <listitem>
8288 <para>
 GHC currently uses a very simple, syntactic, matching algorithm
for matching a rule LHS with an expression.  It seeks a substitution
which makes the LHS and expression syntactically equal modulo alpha
conversion, but not modulo beta conversion (that would be higher-order matching).
The pattern (rule), but not the expression, is eta-expanded if
necessary.  (Eta-expanding the expression can lead to laziness bugs.)
8295 </para>
8296
8297 <para>
8298 Matching is carried out on GHC's intermediate language, which includes
8299 type abstractions and applications.  So a rule only matches if the
8300 types match too.  See <xref linkend="rule-spec"/> below.
8301 </para>
8302 </listitem>
8303 <listitem>
8304
8305 <para>
8306  GHC keeps trying to apply the rules as it optimises the program.
8307 For example, consider:
8308
8309 <programlisting>
8310   let s = map f
8311       t = map g
8312   in
8313   s (t xs)
8314 </programlisting>
8315
8316 The expression <literal>s (t xs)</literal> does not match the rule <literal>"map/map"</literal>, but GHC
8317 will substitute for <varname>s</varname> and <varname>t</varname>, giving an expression which does match.
If <varname>s</varname> or <varname>t</varname> were (a) used more than once, and (b) large or a redex, then it would
8319 not be substituted, and the rule would not fire.
8320
8321 </para>
8322 </listitem>
8323 </itemizedlist>
8324
8325 </para>
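<para>
As a concrete illustration, here is a small, self-contained module with a
rule in the style of <literal>"map/map"</literal>.  The function
<function>myMap</function> and the rule name are hypothetical, chosen only
for this sketch.  The <literal>NOINLINE</literal> pragma keeps calls to
<function>myMap</function> visible so that the rule has something to match.
Whether the rule actually fires is up to the optimiser; compiling with
<option>-O</option> and <option>-ddump-rule-firings</option> is one way to
check.  Either way the program prints the same result, because the two sides
of the rule have the same meaning.
</para>
<programlisting>
module Main (main) where

-- A hypothetical list-mapping function.  NOINLINE stops GHC from
-- inlining it, so calls to myMap stay visible to the rule below.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []       = []
myMap f (x : xs) = f x : myMap f xs
{-# NOINLINE myMap #-}

-- A left-to-right rewrite that turns two traversals into one.  GHC does
-- not check that both sides mean the same thing; that is up to us.
{-# RULES
"myMap/myMap" forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs
  #-}

main :: IO ()
main = print (myMap (+ 1) (myMap (* 2) [1, 2, 3 :: Int]))  -- prints [3,5,7]
</programlisting>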
8326
8327 </sect2>
8328
8329 <sect2 id="conlike">
8330 <title>How rules interact with INLINE/NOINLINE and CONLIKE pragmas</title>
8331
8332 <para>
8333 Ordinary inlining happens at the same time as rule rewriting, which may lead to unexpected
8334 results.  Consider this (artificial) example
8335 <programlisting>
8336 f x = x
8337 g y = f y
8338 h z = g True
8339
8340 {-# RULES "f" f True = False #-}
8341 </programlisting>
8342 Since <literal>f</literal>'s right-hand side is small, it is inlined into <literal>g</literal>,
8343 to give
8344 <programlisting>
8345 g y = y
8346 </programlisting>
8347 Now <literal>g</literal> is inlined into <literal>h</literal>, but <literal>f</literal>'s RULE has
8348 no chance to fire.
8349 If instead GHC had first inlined <literal>g</literal> into <literal>h</literal> then there
8350 would have been a better chance that <literal>f</literal>'s RULE might fire.
8351 </para>
8352 <para>
8353 The way to get predictable behaviour is to use a NOINLINE
8354 pragma, or an INLINE[<replaceable>phase</replaceable>] pragma, on <literal>f</literal>, to ensure
8355 that it is not inlined until its RULEs have had a chance to fire.
8356 </para>
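<para>
Here is a sketch of the same situation with the fix applied, using a rule
that preserves meaning (unlike the artificial rule above).  The names
<function>double</function> and <function>quadruple</function> are
hypothetical.  The <literal>INLINE[1]</literal> pragma stops
<function>double</function> being inlined until phase 1, which leaves the
earlier phase for the rule to match the nested call; from phase 1 onwards
<function>double</function> is inlined as usual.  A plain
<literal>NOINLINE</literal> pragma would also work, at the cost of never
inlining <function>double</function>.
</para>
<programlisting>
module Main (main) where

-- Hypothetical small function: without a pragma, GHC would probably
-- inline it into quadruple before the rule below could see the call.
double :: Int -> Int
double x = x + x
-- Do not inline double until phase 1, giving the rule a chance first.
{-# INLINE [1] double #-}

quadruple :: Int -> Int
quadruple y = double (double y)

-- Meaning-preserving: double (double x) = (x + x) + (x + x) = 4 * x.
-- [~1] makes the rule active only before phase 1, while double is
-- still guaranteed not to have been inlined.
{-# RULES "double/double" [~1] forall x. double (double x) = 4 * x #-}

main :: IO ()
main = print (quadruple 5)  -- prints 20
</programlisting>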
8357 <para>
8358 GHC is very cautious about duplicating work.  For example, consider
8359 <programlisting>
f k z g = let xs = build g
          in ...(foldr k z xs)...sum xs...
{-# RULES "foldr/build" forall k z g. foldr k z (build g) = g k z #-}
8363 </programlisting>
8364 Since <literal>xs</literal> is used twice, GHC does not fire the foldr/build rule.  Rightly
8365 so, because it might take a lot of work to compute <literal>xs</literal>, which would be
8366 duplicated if the rule fired.
8367 </para>
8368 <para>
Sometimes, however, this approach is over-cautious, and we <emphasis>do</emphasis> want the
rule to fire, even though doing so would duplicate a redex.  There is no way that GHC can work out
when this is a good idea, so we provide the CONLIKE pragma to declare it, thus:
<programlisting>
{-# INLINE CONLIKE [1] f #-}
f x = <replaceable>blah</replaceable>
</programlisting>
CONLIKE is a modifier to an INLINE or NOINLINE pragma.  It specifies that an application
of <literal>f</literal> to one argument (in general, the number of arguments to the left of the '=' sign)
should be considered cheap enough to duplicate, if such a duplication would make a rule
fire.  (The name "CONLIKE" is short for "constructor-like", because constructors certainly
have such a property.)
The CONLIKE pragma is a modifier to INLINE/NOINLINE because it really only makes sense to match
<literal>f</literal> on the LHS of a rule if you are sure that <literal>f</literal> is
not going to be inlined before the rule has a chance to fire.
8384 </para>
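<para>
The following sketch shows the kind of code where CONLIKE matters.  All the
names in it (<literal>Box</literal>, <function>mkBox</function>,
<function>unBox</function>, <function>useTwice</function>) are hypothetical.
In <function>useTwice</function> the binding <literal>b</literal> is used
twice, so ordinarily GHC would not substitute <literal>mkBox k</literal> into
both uses, and the rule could not match <literal>unBox (mkBox k)</literal>.
Marking <function>mkBox</function> as CONLIKE tells GHC that duplicating the
call is acceptable if it lets a rule fire.  Whether the rule actually fires is
up to the optimiser; the program's output is the same either way.
</para>
<programlisting>
module Main (main) where

-- Hypothetical wrapper type with a cheap "smart constructor".
newtype Box = Box Int deriving Show

-- CONLIKE: an application (mkBox n) is cheap enough to duplicate if that
-- lets a rule fire.  NOINLINE keeps the call visible to the rule.
mkBox :: Int -> Box
mkBox n = Box n
{-# NOINLINE CONLIKE mkBox #-}

unBox :: Box -> Int
unBox (Box n) = n
{-# NOINLINE unBox #-}

{-# RULES "unBox/mkBox" forall n. unBox (mkBox n) = n #-}

-- b is used twice, so GHC would ordinarily not substitute (mkBox k)
-- into (unBox b); CONLIKE permits that duplication for the rule's sake.
useTwice :: Int -> (Int, Box)
useTwice k = let b = mkBox k in (unBox b, b)

main :: IO ()
main = print (useTwice 3)  -- prints (3,Box 3)
</programlisting>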
8385 </sect2>
8386
8387 <sect2>
8388 <title>List fusion</title>
8389
8390 <para>
8391 The RULES mechanism is used to implement fusion (deforestation) of common list functions.
8392 If a "good consumer" consumes an intermediate list constructed by a "good producer", the
8393 intermediate list should be eliminated entirely.
8394 </para>
8395
8396 <para>
8397 The following are good producers:
8398
8399 <itemizedlist>
8400 <listitem>
8401
8402 <para>
8403  List comprehensions
8404 </para>
8405 </listitem>
8406 <listitem>
8407
8408 <para>
8409  Enumerations of <literal>Int</literal> and <literal>Char</literal> (e.g. <literal>['a'..'z']</literal>).
8410 </para>
8411 </listitem>
8412 <listitem>
8413
8414 <para>
8415  Explicit lists (e.g. <literal>[True, False]</literal>)
8416 </para>
8417 </listitem>
8418 <listitem>
8419
8420 <para>
 The cons constructor (e.g. <literal>3:4:[]</literal>)
8422 </para>
8423 </listitem>
8424 <listitem>
8425
8426 <para>
8427  <function>++</function>
8428 </para>
8429 </listitem>
8430
8431 <listitem>
8432 <para>
8433  <function>map</function>
8434 </para>
8435 </listitem>
8436
8437 <listitem>
8438 <para>
8439 <function>take</function>, <function>filter</function>
8440 </para>
8441 </listitem>
8442 <listitem>
8443
8444 <para>
8445  <function>iterate</function>, <function>repeat</function>
8446 </para>
8447 </listitem>
8448 <listitem>
8449
8450 <para>
8451  <function>zip</function>, <function>zipWith</function>
8452 </para>
8453 </listitem>
8454
8455 </itemizedlist>
8456
8457 </para>
8458
8459 <para>
8460 The following are good consumers:
8461
8462 <itemizedlist>
8463 <listitem>
8464
8465 <para>
8466  List comprehensions
8467 </para>
8468 </listitem>
8469 <listitem>
8470
8471 <para>
8472  <function>array</function> (on its second argument)
8473 </para>
8474 </listitem>
8475 <listitem>
8476
8477 <para>
8478  <function>++</function> (on its first argument)
8479 </para>
8480 </listitem>
8481
8482 <listitem>
8483 <para>
8484  <function>foldr</function>
8485 </para>
8486 </listitem>
8487
8488 <listitem>
8489 <para>
8490  <function>map</function>
8491 </para>
8492 </listitem>
8493 <listitem>
8494
8495 <para>
8496 <function>take</function>, <function>filter</function>
8497 </para>
8498 </listitem>
8499 <listitem>
8500
8501 <para>
8502  <function>concat</function>
8503 </para>
8504 </listitem>
8505 <listitem>
8506
8507 <para>
8508  <function>unzip</function>, <function>unzip2</function>, <function>unzip3</function>, <function>unzip4</function>
8509 </para>
8510 </listitem>
8511 <listitem>
8512
8513 <para>
8514  <function>zip</function>, <function>zipWith</function> (but on one argument only; if both are good producers, <function>zip</function>
8515 will fuse with one but not the other)
8516 </para>
8517 </listitem>
8518 <listitem>
8519
8520 <para>
8521  <function>partition</function>
8522 </para>
8523 </listitem>
8524 <listitem>
8525
8526 <para>
8527  <function>head</function>
8528 </para>
8529 </listitem>
8530 <listitem>
8531
8532 <para>
8533  <function>and</function>, <function>or</function>, <function>any</function>, <function>all</function>
8534 </para>
8535 </listitem>
8536 <listitem>
8537
8538 <para>
8539  <function>sequence&lowbar;</function>
8540 </para>
8541 </listitem>
8542 <listitem>
8543
8544 <para>
8545  <function>msum</function>
8546 </para>
8547 </listitem>
8548 <listitem>
8549
8550 <para>
8551  <function>sortBy</function>
8552 </para>
8553 </listitem>
8554
8555 </itemizedlist>
8556
8557 </para>
8558
8559  <para>
8560 So, for example, the following should generate no intermediate lists:
8561
8562 <programlisting>
8563 array (1,10) [(i,i*i) | i &#60;- map (+ 1) [0..9]]
8564 </programlisting>
8565
8566 </para>
8567
8568 <para>
8569 This list could readily be extended; if there are Prelude functions that you use
8570 a lot which are not included, please tell us.
8571 </para>
8572
8573 <para>
8574 If you want to write your own good consumers or producers, look at the
8575 Prelude definitions of the above functions to see how to do so.
8576 </para>
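<para>
As a starting point, here is a minimal sketch of a hypothetical good producer
and good consumer.  The producer, <function>countDown</function>, builds its
list with <function>build</function> (exported by
<literal>GHC.Exts</literal>); the consumer, <function>sumList</function>, is
written with <function>foldr</function>.  Both are marked
<literal>INLINE</literal> so that, at a call site, <function>foldr</function>
meets <function>build</function> and the library's
<literal>"foldr/build"</literal> rule has a chance to eliminate the
intermediate list.  The real Prelude definitions are more elaborate: they also
arrange to rewrite back to an efficient directly recursive form when fusion
does not happen.
</para>
<programlisting>
module Main (main) where

import GHC.Exts (build)

-- Hypothetical good producer: [n, n-1 .. 1] (empty if n is not positive).
-- Abstracting over (:) and [] via build lets foldr/build fuse it.
countDown :: Int -> [Int]
countDown n0 = build (\cons nil ->
  let go n
        | n > 0     = n `cons` go (n - 1)
        | otherwise = nil
  in go n0)
{-# INLINE countDown #-}

-- Hypothetical good consumer: a plain foldr over its argument.
sumList :: [Int] -> Int
sumList xs = foldr (+) 0 xs
{-# INLINE sumList #-}

main :: IO ()
main = print (sumList (countDown 100))  -- prints 5050
</programlisting>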
8577
8578 </sect2>
8579
8580 <sect2 id="rule-spec">
8581 <title>Specialisation
8582 </title>
8583
8584 <para>
8585 Rewrite rules can be used to get the same effect as a feature
8586 present in earlier versions of GHC.
8587 For example, suppose that:
8588
8589 <programlisting>
8590 genericLookup :: Ord a => Table a b   -> a   -> b
8591 intLookup     ::          Table Int b -> Int -> b
8592 </programlisting>
8593
8594 where <function>intLookup</function> is an implementation of
8595 <function>genericLookup</function> that works very fast for
8596 keys of type <literal>Int</literal>.  You might wish
8597 to tell GHC to use <function>intLookup</function> instead of
<function>genericLookup</function> whenever the latter is called with
8599 type <literal>Table Int b -&gt; Int -&gt; b</literal>.
8600 It used to be possible to write
8601
8602 <programlisting>
8603 {-# SPECIALIZE genericLookup :: Table Int b -> Int -> b = intLookup #-}
8604 </programlisting>
8605
8606 This feature is no longer in GHC, but rewrite rules let you do the same thing:
8607
8608 <programlisting>
8609 {-# RULES "genericLookup/Int" genericLookup = intLookup #-}
8610 </programlisting>
8611
8612 This slightly odd-looking rule instructs GHC to replace
8613 <function>genericLookup</function> by <function>intLookup</function>
8614 <emphasis>whenever the types match</emphasis>.
8615 What is more, this rule does not need to be in the same
8616 file as <function>genericLookup</function>, unlike the
8617 <literal>SPECIALIZE</literal> pragmas which currently do (so that they
8618 have an original definition available to specialise).
8619 </para>
8620
8621 <para>It is <emphasis>Your Responsibility</emphasis> to make sure that
8622 <function>intLookup</function> really behaves as a specialised version
8623 of <function>genericLookup</function>!!!</para>
8624
8625 <para>An example in which using <literal>RULES</literal> for
8626 specialisation will Win Big:
8627
8628 <programlisting>
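{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int(I#), Double(D#), int2Double#)

toDouble :: Real a => a -> Double
toDouble = fromRational . toRational
{-# NOINLINE toDouble #-}  -- keep calls visible so the rule can match them

{-# RULES "toDouble/Int" toDouble = i2d #-}
i2d (I# i) = D# (int2Double# i) -- uses Glasgow prim-op directly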
8634 </programlisting>
8635
8636 The <function>i2d</function> function is virtually one machine
8637 instruction; the default conversion&mdash;via an intermediate
8638 <literal>Rational</literal>&mdash;is obscenely expensive by
8639 comparison.
8640 </para>
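<para>
The same technique works for ordinary library code.  In this sketch the
overloaded <function>sumAll</function> and the <literal>Int</literal>-only
<function>sumInts</function> are hypothetical names; the rule replaces the
former by the latter only at type <literal>[Int] -&gt; Int</literal>, so the
call at type <literal>Double</literal> is left alone.  As above, it is up to
you to make sure that the two functions agree.
</para>
<programlisting>
module Main (main) where

import Data.List (foldl')

-- Hypothetical overloaded function.  NOINLINE keeps calls to it
-- visible, so that the rule below can match them.
sumAll :: Num a => [a] -> a
sumAll = foldr (+) 0
{-# NOINLINE sumAll #-}

-- Hypothetical Int-specific version using a strict left fold.
sumInts :: [Int] -> Int
sumInts = foldl' (+) 0

-- Only applies where sumAll is used at type [Int] -> Int.
{-# RULES "sumAll/Int" sumAll = sumInts #-}

main :: IO ()
main = do
  print (sumAll [1 .. 100 :: Int])     -- the rule may rewrite this call
  print (sumAll [0.5, 1.5 :: Double])  -- types differ: the rule cannot apply
</programlisting>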
8641
8642 </sect2>
8643
8644 <sect2 id="controlling-rules">
8645 <title>Controlling what's going on in rewrite rules</title>
8646
8647 <para>
8648
8649 <itemizedlist>
8650 <listitem>
8651
8652 <para>
8653 Use <option>-ddump-rules</option> to see the rules that are defined
8654 <emphasis>in this module</emphasis>.
8655 This includes rules generated by the specialisation pass, but excludes
8656 rules imported from other modules.
8657 </para>
8658 </listitem>
8659
8660 <listitem>
8661 <para>
8662  Use <option>-ddump-simpl-stats</option> to see what rules are being fired.
8663 If you add <option>-dppr-debug</option> you get a more detailed listing.
8664 </para>
8665 </listitem>
8666
8667 <listitem>
8668 <para>
8669  Use <option>-ddump-rule-firings</option> to see in great detail what rules are being fired.
8670 If you add <option>-dppr-debug</option> you get a still more detailed listing.
8671 </para>
8672 </listitem>
8673
8674 <listitem>
8675 <para>
8676  The definition of (say) <function>build</function> in <filename>GHC/Base.lhs</filename> looks like this:
8677
8678 <programlisting>
8679         build   :: forall a. (forall b. (a -> b -> b) -> b -> b) -> [a]
8680         {-# INLINE build #-}
8681         build g = g (:) []
8682 </programlisting>
8683
8684 Notice the <literal>INLINE</literal>!  That prevents <literal>(:)</literal> from being inlined when compiling
<literal>GHC.Base</literal>, so that an importing module will &ldquo;see&rdquo; the <literal>(:)</literal>, and can
8686 match it on the LHS of a rule.  <literal>INLINE</literal> prevents any inlining happening
8687 in the RHS of the <literal>INLINE</literal> thing.  I regret the delicacy of this.
8688
8689 </para>
8690 </listitem>
8691 <listitem>
8692
8693 <para>
8694  In <filename>libraries/base/GHC/Base.lhs</filename> look at the rules for <function>map</function> to
8695 see how to write rules that will do fusion and yet give an efficient
program even if fusion does not happen.  There are more rules in <filename>GHC/List.lhs</filename>.
8697 </para>
8698 </listitem>
8699
8700 </itemizedlist>
8701
8702 </para>
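<para>
The following sketch imitates that technique for a hypothetical
<function>myMap</function>; the names <function>myMap</function> and
<function>myMapFB</function> and the rule names are made up, and the phase
annotations are modelled on the library's rules for <function>map</function>.
Before phase 1, the rule <literal>"myMap"</literal> turns calls into a
<function>build</function>/<function>foldr</function> form that can fuse with
good producers and consumers.  If no fusion happens, the rule
<literal>"myMapList"</literal> (active from phase 1) turns the leftover
<literal>foldr (myMapFB (:) f) []</literal> back into a call to the directly
recursive <function>myMap</function>.  The list-specific
<function>foldr</function> is imported from <literal>GHC.Base</literal>,
where the library rules are defined.  Use
<option>-ddump-rule-firings</option> to see which rules fire.
</para>
<programlisting>
module Main (main) where

import Prelude hiding (foldr)
import GHC.Base (build, foldr)  -- the list-specific foldr used by the library rules

-- Hypothetical directly recursive map: efficient when no fusion happens.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []       = []
myMap f (x : xs) = f x : myMap f xs
{-# NOINLINE myMap #-}

-- Hypothetical worker for the fusible form: apply f, then pass the
-- result to the "cons" argument c.
myMapFB :: (b -> r -> r) -> (a -> b) -> a -> r -> r
myMapFB c f = \x ys -> c (f x) ys
{-# INLINE [0] myMapFB #-}

-- "myMap" (before phase 1): rewrite to build/foldr so that fusion can happen.
-- "myMapList" (phase 1 onwards): if no consumer fused with the build, turn
-- what is left back into a call to myMap.
{-# RULES
"myMap"     [~1] forall f xs. myMap f xs = build (\c n -> foldr (myMapFB c f) n xs)
"myMapList" [1]  forall f.    foldr (myMapFB (:) f) [] = myMap f
  #-}

main :: IO ()
main = print (sum (myMap (* 2) [1 .. 10 :: Int]))  -- prints 110
</programlisting>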
8703
8704 </sect2>
8705
8706 <sect2 id="core-pragma">
8707   <title>CORE pragma</title>
8708
8709   <indexterm><primary>CORE pragma</primary></indexterm>
8710   <indexterm><primary>pragma, CORE</primary></indexterm>
8711   <indexterm><primary>core, annotation</primary></indexterm>
8712
8713 <para>
8714   The external core format supports <quote>Note</quote> annotations;
8715   the <literal>CORE</literal> pragma gives a way to specify what these
8716   should be in your Haskell source code.  Syntactically, core
8717   annotations are attached to expressions and take a Haskell string
8718   literal as an argument.  The following function definition shows an
8719   example:
8720
8721 <programlisting>
8722 f x = ({-# CORE "foo" #-} show) ({-# CORE "bar" #-} x)
8723 </programlisting>
8724
8725   Semantically, this is equivalent to:
8726
8727 <programlisting>
f x = show x
8729 </programlisting>
8730 </para>
8731
8732 <para>
8733   However, when external core is generated (via
8734   <option>-fext-core</option>), there will be Notes attached to the
8735   expressions <function>show</function> and <varname>x</varname>.
8736   The core function declaration for <function>f</function> is:
8737 </para>
8738
8739 <programlisting>
  f :: %forall a . GHCziShow.ZCTShow a ->
                   a -> GHCziBase.ZMZN GHCziBase.Char =
    \ @ a (zddShow::GHCziShow.ZCTShow a) (eta::a) ->
        (%note "foo"
         %case zddShow %of (tpl::GHCziShow.ZCTShow a)
           {GHCziShow.ZCDShow
            (tpl1::GHCziBase.Int ->
                   a ->
                   GHCziBase.ZMZN GHCziBase.Char -> GHCziBase.ZMZN GHCziBase.Char)
            (tpl2::a -> GHCziBase.ZMZN GHCziBase.Char)
            (tpl3::GHCziBase.ZMZN a ->
                   GHCziBase.ZMZN GHCziBase.Char -> GHCziBase.ZMZN GHCziBase.Char) ->
              tpl2})
        (%note "bar"
         eta);
8757 </programlisting>
8758
8759 <para>
8760   Here, we can see that the function <function>show</function> (which
8761   has been expanded out to a case expression over the Show dictionary)
8762   has a <literal>%note</literal> attached to it, as does the
8763   expression <varname>eta</varname> (which used to be called
8764   <varname>x</varname>).
8765 </para>
8766
8767 </sect2>
8768
8769 </sect1>
8770
8771 <sect1 id="special-ids">
8772 <title>Special built-in functions</title>
8773 <para>GHC has a few built-in functions with special behaviour.  These
8774 are now described in the module <ulink
8775 url="&libraryGhcPrimLocation;/GHC-Prim.html"><literal>GHC.Prim</literal></ulink>
8776 in the library documentation.</para>
8777 </sect1>
8778
8779
8780 <sect1 id="generic-classes">
8781 <title>Generic classes</title>
8782
8783 <para>
8784 The ideas behind this extension are described in detail in "Derivable type classes",
8785 Ralf Hinze and Simon Peyton Jones, Haskell Workshop, Montreal Sept 2000, pp94-105.
8786 An example will give the idea:
8787 </para>
8788
8789 <programlisting>
8790   import Generics
8791
8792   class Bin a where
8793     toBin   :: a -> [Int]
8794     fromBin :: [Int] -> (a, [Int])
8795
8796     toBin {| Unit |}    Unit      = []
8797     toBin {| a :+: b |} (Inl x)   = 0 : toBin x
8798     toBin {| a :+: b |} (Inr y)   = 1 : toBin y
8799     toBin {| a :*: b |} (x :*: y) = toBin x ++ toBin y
8800
8801     fromBin {| Unit |}    bs      = (Unit, bs)
8802     fromBin {| a :+: b |} (0:bs)  = (Inl x, bs')    where (x,bs') = fromBin bs
8803     fromBin {| a :+: b |} (1:bs)  = (Inr y, bs')    where (y,bs') = fromBin bs
8804     fromBin {| a :*: b |} bs      = (x :*: y, bs'') where (x,bs' ) = fromBin bs
8805                                                           (y,bs'') = fromBin bs'
8806 </programlisting>
8807 <para>
8808 This class declaration explains how <literal>toBin</literal> and <literal>fromBin</literal>
8809 work for arbitrary data types.  They do so by giving cases for unit, product, and sum,
8810 which are defined thus in the library module <literal>Generics</literal>:
8811 </para>
8812 <programlisting>
8813   data Unit    = Unit
8814   data a :+: b = Inl a | Inr b
8815   data a :*: b = a :*: b
8816 </programlisting>
8817 <para>
8818 Now you can make a data type into an instance of Bin like this:
8819 <programlisting>
8820   instance (Bin a, Bin b) => Bin (a,b)
8821   instance Bin a => Bin [a]
8822 </programlisting>
That is, just leave off the <literal>where</literal> clause.  Of course, you can put in the
<literal>where</literal> clause and override whichever methods you please.
8825 </para>
8826
8827     <sect2>
8828       <title> Using generics </title>
8829       <para>To use generics you need to</para>
8830       <itemizedlist>
8831         <listitem>
8832           <para>Use the flags <option>-fglasgow-exts</option> (to enable the extra syntax),
8833                 <option>-XGenerics</option> (to generate extra per-data-type code),
                and <option>-package lang</option> (to make the <literal>Generics</literal> library
                available).</para>
8836         </listitem>
8837         <listitem>
8838           <para>Import the module <literal>Generics</literal> from the
8839           <literal>lang</literal> package.  This import brings into
8840           scope the data types <literal>Unit</literal>,
8841           <literal>:*:</literal>, and <literal>:+:</literal>.  (You
8842           don't need this import if you don't mention these types
8843           explicitly; for example, if you are simply giving instance
8844           declarations.)</para>
8845         </listitem>
8846       </itemizedlist>
8847     </sect2>
8848
8849 <sect2> <title> Changes wrt the paper </title>
8850 <para>
8851 Note that the type constructors <literal>:+:</literal> and <literal>:*:</literal>
8852 can be written infix (indeed, you can now use
8853 any operator starting in a colon as an infix type constructor).  Also note that
8854 the type constructors are not exactly as in the paper (Unit instead of 1, etc).
8855 Finally, note that the syntax of the type patterns in the class declaration
8856 uses "<literal>{|</literal>" and "<literal>|}</literal>" brackets; curly braces
alone would be ambiguous when they appear on right-hand sides (an extension we
8858 anticipate wanting).
8859 </para>
8860 </sect2>
8861
8862 <sect2> <title>Terminology and restrictions</title>
8863 <para>
8864 Terminology.  A "generic default method" in a class declaration
8865 is one that is defined using type patterns as above.
8866 A "polymorphic default method" is a default method defined as in Haskell 98.
8867 A "generic class declaration" is a class declaration with at least one
8868 generic default method.
8869 </para>
8870
8871 <para>
8872 Restrictions:
8873 <itemizedlist>
8874 <listitem>
8875 <para>
8876 Alas, we do not yet implement the stuff about constructor names and
8877 field labels.
8878 </para>
8879 </listitem>
8880
8881 <listitem>
8882 <para>
8883 A generic class can have only one parameter; you can't have a generic
8884 multi-parameter class.
8885 </para>
8886 </listitem>
8887
8888 <listitem>
8889 <para>
8890 A default method must be defined entirely using type patterns, or entirely
8891 without.  So this is illegal:
8892 <programlisting>
8893   class Foo a where
8894     op :: a -> (a, Bool)
8895     op {| Unit |} Unit = (Unit, True)
8896     op x               = (x,    False)
8897 </programlisting>
8898 However it is perfectly OK for some methods of a generic class to have
8899 generic default methods and others to have polymorphic default methods.
8900 </para>
8901 </listitem>
8902
8903 <listitem>
8904 <para>
8905 The type variable(s) in the type pattern for a generic method declaration
scope over the right-hand side.  So this is legal (note the use of the type variable <literal>p</literal> in a type signature on the right-hand side):
8907 <programlisting>
8908   class Foo a where
8909     op :: a -> Bool
8910     op {| p :*: q |} (x :*: y) = op (x :: p)
8911     ...
8912 </programlisting>
8913 </para>
8914 </listitem>
8915
8916 <listitem>
8917 <para>
8918 The type patterns in a generic default method must take one of the forms:
8919 <programlisting>
8920        a :+: b
8921        a :*: b
8922        Unit
8923 </programlisting>
8924 where "a" and "b" are type variables.  Furthermore, all the type patterns for
8925 a single type constructor (<literal>:*:</literal>, say) must be identical; they
8926 must use the same type variables.  So this is illegal:
8927 <programlisting>
8928   class Foo a where
8929     op :: a -> Bool
8930     op {| a :+: b |} (Inl x) = True
8931     op {| p :+: q |} (Inr y) = False
8932 </programlisting>
8933 The type patterns must be identical, even in equations for different methods of the class.
8934 So this too is illegal:
8935 <programlisting>
8936   class Foo a where
8937     op1 :: a -> Bool
8938     op1 {| a :*: b |} (x :*: y) = True
8939
8940     op2 :: a -> Bool
8941     op2 {| p :*: q |} (x :*: y) = False
8942 </programlisting>
8943 (The reason for this restriction is that we gather all the equations for a particular type constructor
8944 into a single generic instance declaration.)
8945 </para>
8946 </listitem>
8947
8948 <listitem>
8949 <para>
8950 A generic method declaration must give a case for each of the three type constructors.
8951 </para>
8952 </listitem>
8953
8954 <listitem>
8955 <para>
8956 The type for a generic method can be built only from:
8957   <itemizedlist>
8958   <listitem> <para> Function arrows </para> </listitem>
8959   <listitem> <para> Type variables </para> </listitem>
8960   <listitem> <para> Tuples </para> </listitem>
8961   <listitem> <para> Arbitrary types not involving type variables </para> </listitem>
8962   </itemizedlist>
8963 Here are some example type signatures for generic methods:
8964 <programlisting>
8965     op1 :: a -> Bool
8966     op2 :: Bool -> (a,Bool)
8967     op3 :: [Int] -> a -> a
8968     op4 :: [a] -> Bool
8969 </programlisting>
8970 Here, op1, op2, op3 are OK, but op4 is rejected, because it has a type variable
8971 inside a list.
8972 </para>
8973 <para>
8974 This restriction is an implementation restriction: we just haven't got around to
8975 implementing the necessary bidirectional maps over arbitrary type constructors.
8976 It would be relatively easy to add specific type constructors, such as Maybe and list,
8977 to the ones that are allowed.</para>
8978 </listitem>
8979
8980 <listitem>
8981 <para>
8982 In an instance declaration for a generic class, the idea is that the compiler
8983 will fill in the methods for you, based on the generic templates.  However it can only
8984 do so if
8985   <itemizedlist>
8986   <listitem>
8987   <para>
8988   The instance type is simple (a type constructor applied to type variables, as in Haskell 98).
8989   </para>
8990   </listitem>
8991   <listitem>
8992   <para>
8993   No constructor of the instance type has unboxed fields.
8994   </para>
8995   </listitem>
8996   </itemizedlist>
8997 (Of course, these things can only arise if you are already using GHC extensions.)
However, you can still give instance declarations for types that break these rules,
8999 provided you give explicit code to override any generic default methods.
9000 </para>
9001 </listitem>
9002
9003 </itemizedlist>
9004 </para>
9005
9006 <para>
9007 The option <option>-ddump-deriv</option> dumps incomprehensible stuff giving details of
9008 what the compiler does with generic declarations.
9009 </para>
9010
9011 </sect2>
9012
9013 <sect2> <title> Another example </title>
9014 <para>
9015 Just to finish with, here's another example I rather like:
9016 <programlisting>
  class Tag a where
    nCons :: a -> Int
    nCons {| Unit |}    _ = 1
    nCons {| a :*: b |} _ = 1
    nCons {| a :+: b |} _ = nCons (undefined :: a) + nCons (undefined :: b)

    tag :: a -> Int
    tag {| Unit |}    _       = 1
    tag {| a :*: b |} _       = 1
    tag {| a :+: b |} (Inl x) = tag x
    tag {| a :+: b |} (Inr y) = nCons (undefined :: a) + tag y
9028 </programlisting>
9029 </para>
9030 </sect2>
9031 </sect1>
9032
9033 <sect1 id="monomorphism">
9034 <title>Control over monomorphism</title>
9035
9036 <para>GHC supports two flags that control the way in which generalisation is
9037 carried out at let and where bindings.
9038 </para>
9039
9040 <sect2>
9041 <title>Switching off the dreaded Monomorphism Restriction</title>
9042           <indexterm><primary><option>-XNoMonomorphismRestriction</option></primary></indexterm>
9043
9044 <para>Haskell's monomorphism restriction (see
9045 <ulink url="http://www.haskell.org/onlinereport/decls.html#sect4.5.5">Section
9046 4.5.5</ulink>
9047 of the Haskell Report)
9048 can be completely switched off by
9049 <option>-XNoMonomorphismRestriction</option>.
9050 </para>
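<para>
For example, in the following sketch (the name <function>plus</function> is
hypothetical) the binding <literal>plus = (+)</literal> has no arguments and
no type signature.  Under the monomorphism restriction it would get a single
monomorphic type, and the module would be rejected because
<function>plus</function> is used at both <literal>Int</literal> and
<literal>Double</literal>.  With the restriction switched off,
<function>plus</function> is generalised to
<literal>Num a =&gt; a -&gt; a -&gt; a</literal> and both uses are accepted.
</para>
<programlisting>
{-# LANGUAGE NoMonomorphismRestriction #-}
module Main (main) where

-- No arguments and no type signature: with the monomorphism restriction
-- on, plus would get one monomorphic type; here it stays polymorphic.
plus = (+)

main :: IO ()
main = do
  print (plus (1 :: Int) 2)       -- prints 3
  print (plus (1.5 :: Double) 2)  -- prints 3.5
</programlisting>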
9051 </sect2>
9052
9053 <sect2>
9054 <title>Monomorphic pattern bindings</title>
9055           <indexterm><primary><option>-XNoMonoPatBinds</option></primary></indexterm>
9056           <indexterm><primary><option>-XMonoPatBinds</option></primary></indexterm>
9057
9058           <para> As an experimental change, we are exploring the possibility of
9059           making pattern bindings monomorphic; that is, not generalised at all.
9060             A pattern binding is a binding whose LHS has no function arguments,
9061             and is not a simple variable.  For example:
9062 <programlisting>
9063   f x = x                    -- Not a pattern binding
9064   f = \x -> x                -- Not a pattern binding
9065   f :: Int -> Int = \x -> x  -- Not a pattern binding
9066
9067   (g,h) = e                  -- A pattern binding
9068   (f) = e                    -- A pattern binding
9069   [x] = e                    -- A pattern binding
9070 </programlisting>
9071 Experimentally, GHC now makes pattern bindings monomorphic <emphasis>by
9072 default</emphasis>.  Use <option>-XNoMonoPatBinds</option> to recover the
9073 standard behaviour.
9074 </para>
9075 </sect2>
9076 </sect1>
9077
9078
9079
9080 <!-- Emacs stuff:
9081      ;;; Local Variables: ***
9082      ;;; mode: xml ***
9083      ;;; sgml-parent-document: ("users_guide.xml" "book" "chapter" "sect1") ***
9084      ;;; ispell-local-dictionary: "british" ***
9085      ;;; End: ***
9086  -->
9087