Improve documentation of MagicHash and primitive types generally (Trac #2547)

[ghc-hetmet.git] / docs / users_guide / glasgow_exts.xml
diff --git a/docs/users_guide/glasgow_exts.xml b/docs/users_guide/glasgow_exts.xml

index 8b6ec73..4d31dd1 100644
--- a/docs/users_guide/glasgow_exts.xml
+++ b/docs/users_guide/glasgow_exts.xml
@@ -131,6 +131,16 @@ documentation</ulink> describes all the libraries that come with GHC.
  
        <varlistentry>
         <term>
+          <option>-XMagicHash</option>:
+        </term>
+       <listitem>
+         <para> Allow "&num;" as a <link linkend="magic-hash">postfix modifier on identifiers</link>.
+          </para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
+       <term>
            <option>-XMonomorphismRestriction</option>,<option>-XMonoPatBinds</option>:
          </term>
         <listitem>
@@ -308,7 +318,8 @@ documentation</ulink> describes all the libraries that come with GHC.
  <sect1 id="primitives">
    <title>Unboxed types and primitive operations</title>
  
-<para>GHC is built on a raft of primitive data types and operations.
+<para>GHC is built on a raft of primitive data types and operations,
+"primitive" in the sense that they cannot be defined in Haskell itself.
  While you really can use this stuff to write fast code,
    we generally find it a lot less painful, and more satisfying in the
    long run, to use higher-level language features and libraries.  With
@@ -316,28 +327,21 @@ While you really can use this stuff to write fast code,
    unboxed version in any case.  And if it isn't, we'd like to know
    about it.</para>
  
-<para>We do not currently have good, up-to-date documentation about the
-primitives, perhaps because they are mainly intended for internal use.
-There used to be a long section about them here in the User Guide, but it
-became out of date, and wrong information is worse than none.</para>
-
-<para>The Real Truth about what primitive types there are, and what operations
-work over those types, is held in the file
-<filename>compiler/prelude/primops.txt.pp</filename>.
-This file is used directly to generate GHC's primitive-operation definitions, so
-it is always correct!  It is also intended for processing into text.</para>
-
-<para>Indeed,
-the result of such processing is part of the description of the 
- <ulink
-      url="http://www.haskell.org/ghc/docs/papers/core.ps.gz">External
-        Core language</ulink>.
-So that document is a good place to look for a type-set version.
-We would be very happy if someone wanted to volunteer to produce an XML
-back end to the program that processes <filename>primops.txt</filename> so that
-we could include the results here in the User Guide.</para>
-
-<para>What follows here is a brief summary of some main points.</para>
+<para>All these primitive data types and operations are exported by the
+module <literal>GHC.Prim</literal>, for which there is
+<ulink url="../libraries/base/GHC.Prim.html">detailed online documentation</ulink>.
+(This documentation is generated from the file <filename>compiler/prelude/primops.txt.pp</filename>.)
+</para>
+<para>
+If you want to mention any of the primitive data types or operations in your
+program, you must first import <literal>GHC.Prim</literal> to bring them
+into scope.  Many of them have names ending in "&num;", and to mention such
+names you need the <option>-XMagicHash</option> extension (<xref linkend="magic-hash"/>).
+</para>
+
+<para>The primops make extensive use of <link linkend="glasgow-unboxed">unboxed types</link> 
+and <link linkend="unboxed-tuples">unboxed tuples</link>, which
+we briefly summarise here. </para>
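A minimal sketch of the paragraphs above in practice, assuming a reasonably recent GHC. It imports GHC.Exts, which re-exports the contents of GHC.Prim together with boxing constructors such as I#, rather than importing GHC.Prim directly. The helper names plusTimes and quotRemBoxed are hypothetical. The primop quotRemInt# returns an unboxed pair, which is why UnboxedTuples is switched on as well as MagicHash.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
-- GHC.Exts re-exports the contents of GHC.Prim (Int#, (+#), (*#),
-- quotRemInt#, ...) together with the boxing constructor I#.
import GHC.Exts (Int (I#), quotRemInt#, (*#), (+#))

-- Hypothetical helper: unbox both arguments, compute (x + y) * 3 with
-- primitive operations on Int#, then box the result again.
plusTimes :: Int -> Int -> Int
plusTimes (I# x) (I# y) = I# ((x +# y) *# 3#)

-- Hypothetical helper: quotRemInt# returns an unboxed pair (# q, r #),
-- which is taken apart with a case expression and then boxed.
quotRemBoxed :: Int -> Int -> (Int, Int)
quotRemBoxed (I# x) (I# y) =
  case quotRemInt# x y of
    (# q, r #) -> (I# q, I# r)
```

Loaded into GHCi, quotRemBoxed 17 5 evaluates to (3,2). The boxed wrappers keep the unboxed values from escaping into ordinary, lazily evaluated code.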
    
  <sect2 id="glasgow-unboxed">
  <title>Unboxed types
@@ -366,26 +370,15 @@ would use in C: <literal>Int&num;</literal> (long int),
  know and love&mdash;usually one instruction.
  </para>
  
-<para> For some primitive types we have special syntax for literals.
-Anything that would be an integer lexeme followed by a
-<literal>&num;</literal> is an <literal>Int&num;</literal> literal, e.g.
-<literal>32&num;</literal> and <literal>-0x3A&num;</literal>. Likewise,
-any non-negative integer literal followed by
-<literal>&num;&num;</literal> is a <literal>Word&num;</literal> literal.
-Likewise, any floating point literal followed by a
-<literal>&num;</literal> is a <literal>Float&num;</literal> literal, and
-followed by <literal>&num;&num;</literal> is a
-<literal>Double&num;</literal>. Finally, a string literal followed by a
-<literal>&num;</literal>, e.g. <literal>&quot;foo&quot;&num;</literal>,
-is a <literal>Addr&num;</literal> literal.
-</para>
-
  <para>
  Primitive (unboxed) types cannot be defined in Haskell, and are
  therefore built into the language and compiler.  Primitive types are
  always unlifted; that is, a value of a primitive type cannot be
-bottom.  We use the convention that primitive types, values, and
-operations have a <literal>&num;</literal> suffix.
+bottom.  We use the convention (but it is only a convention) 
+that primitive types, values, and
+operations have a <literal>&num;</literal> suffix (see <xref linkend="magic-hash"/>).
+For some primitive types we have special syntax for literals, also
+described in the <link linkend="magic-hash">same section</link>.
  </para>
  
  <para>
@@ -562,8 +555,40 @@ Indeed, the bindings can even be recursive.
  <sect1 id="syntax-extns">
  <title>Syntactic extensions</title>
   
+    <sect2 id="magic-hash">
+      <title>The magic hash</title>
+      <para>The language extension <option>-XMagicHash</option> allows "&num;" as a
+       postfix modifier to identifiers.  Thus, "x&num;" is a valid variable, and "T&num;" is
+       a valid type constructor or data constructor.</para>
+
+      <para>The hash sign does not change semantics at all.  We tend to use variable
+       names ending in "&num;" for unboxed values or types (e.g. <literal>Int&num;</literal>), 
+       but there is no requirement to do so; they are just plain ordinary variables.
+       Nor does the <option>-XMagicHash</option> extension bring anything into scope.
+       For example, to bring <literal>Int&num;</literal> into scope you must 
+       import <literal>GHC.Prim</literal> (see <xref linkend="primitives"/>); 
+       the <option>-XMagicHash</option> extension
+       then allows you to <emphasis>refer</emphasis> to the <literal>Int&num;</literal>
+       that is now in scope.</para>
+      <para>The <option>-XMagicHash</option> extension also enables some new forms of literals, whose types are the unboxed types described in <xref linkend="glasgow-unboxed"/>:
+       <itemizedlist> 
+         <listitem><para> <literal>'x'&num;</literal> has type <literal>Char&num;</literal>.</para> </listitem>
+         <listitem><para> <literal>&quot;foo&quot;&num;</literal> has type <literal>Addr&num;</literal>.</para> </listitem>
+         <listitem><para> <literal>3&num;</literal> has type <literal>Int&num;</literal>. In general,
+         any Haskell 98 integer lexeme, optionally preceded by a minus sign, followed by a <literal>&num;</literal> is an <literal>Int&num;</literal> literal, e.g.
+            <literal>-0x3A&num;</literal> as well as <literal>32&num;</literal>.</para></listitem>
+         <listitem><para> <literal>3&num;&num;</literal> has type <literal>Word&num;</literal>. In general,
+         any non-negative Haskell 98 integer lexeme followed by <literal>&num;&num;</literal>
+             is a <literal>Word&num;</literal> literal.</para> </listitem>
+         <listitem><para> <literal>3.2&num;</literal> has type <literal>Float&num;</literal>. In general, any floating-point literal followed by <literal>&num;</literal> is a <literal>Float&num;</literal> literal.</para> </listitem>
+         <listitem><para> <literal>3.2&num;&num;</literal> has type <literal>Double&num;</literal>. In general, any floating-point literal followed by <literal>&num;&num;</literal> is a <literal>Double&num;</literal> literal.</para> </listitem>
+         </itemizedlist>
+      </para>
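The sketch below wraps each kind of primitive literal listed above in its boxed type so that the values can be inspected in GHCi. It assumes GHC.Exts as the source of the boxing constructors C#, I#, W#, F#, D# and of unpackCString#, which copies a NUL-terminated Addr# string into an ordinary String. The binding names are illustrative only.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Char (C#), Double (D#), Float (F#), Int (I#), Word (W#), unpackCString#)

-- Each primitive literal is boxed with the data constructor of its boxed type.
charLit :: Char
charLit = C# 'x'#        -- 'x'# :: Char#

intLit :: Int
intLit = I# 32#          -- 32# :: Int#

wordLit :: Word
wordLit = W# 3##         -- 3## :: Word#

floatLit :: Float
floatLit = F# 3.2#       -- 3.2# :: Float#

doubleLit :: Double
doubleLit = D# 3.2##     -- 3.2## :: Double#

-- "foo"# :: Addr# points at a NUL-terminated byte string;
-- unpackCString# turns it into an ordinary String.
stringLit :: String
stringLit = unpackCString# "foo"#
```

In GHCi, stringLit evaluates to "foo" and wordLit to 3. Note that the hash on the binding names is not needed: charLit is an ordinary variable that happens to hold a boxed value.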
+   </sect2>
+
      <!-- ====================== HIERARCHICAL MODULES =======================  -->
  
+
      <sect2 id="hierarchical-modules">
        <title>Hierarchical Modules</title>
  
@@ -1410,7 +1435,7 @@ records from different modules that use the same field name.
  </title>
  
  <para>
-Record puns are enabled by the flag <literal>-XRecordPuns</literal>.
+Record puns are enabled by the flag <literal>-XNamedFieldPuns</literal>.
  </para>
  
  <para>
@@ -1568,6 +1593,29 @@ necessary to enable them.
  </para>
  </sect2>
  
+<sect2 id="package-imports">
+  <title>Package-qualified imports</title>
+
+  <para>With the <option>-XPackageImports</option> flag, GHC allows
+  import declarations to be qualified by the package name that the
+    module is intended to be imported from.  For example:</para>
+
+<programlisting>
+import "network" Network.Socket
+</programlisting>
+  
+  <para>would import the module <literal>Network.Socket</literal> from
+    the package <literal>network</literal> (any version).  This may
+    be used to disambiguate an import when the same module is
+    available from multiple packages, or is present in both the
+    current package being built and an external package.</para>
+
+  <para>Note: you probably don't need to use this feature; it was
+    added mainly so that we can build backwards-compatible versions of
+    packages when APIs change.  It can lead to fragile dependencies in
+    the common case: modules occasionally move from one package to
+    another, rendering any package-qualified imports broken.</para>
+</sect2>
  </sect1>
  
  
@@ -2436,11 +2484,17 @@ The result type of each constructor must begin with the type constructor being d
  but for a GADT the arguments to the type constructor can be arbitrary monotypes.  
  For example, in the <literal>Term</literal> data
  type above, the type of each constructor must end with <literal>Term ty</literal>, but
-the <literal>ty</literal> may not be a type variable (e.g. the <literal>Lit</literal>
+the <literal>ty</literal> need not be a type variable (e.g. the <literal>Lit</literal>
  constructor).
  </para></listitem>
  
  <listitem><para>
+It is permitted to declare an ordinary algebraic data type using GADT-style syntax.
+What makes a GADT into a GADT is not the syntax, but rather the presence of data constructors
+whose result type is not just <literal>T a b</literal>.
+</para></listitem>
+
+<listitem><para>
  You cannot use a <literal>deriving</literal> clause for a GADT; only for
  an ordinary data type.
  </para></listitem>
@@ -2476,6 +2530,19 @@ their selector functions actually have different types:
  </programlisting>
  </para></listitem>
  
+<listitem><para>
+When pattern-matching against data constructors drawn from a GADT, 
+for example in a <literal>case</literal> expression, the following rules apply:
+<itemizedlist>
+<listitem><para>The type of the scrutinee must be rigid.</para></listitem>
+<listitem><para>The type of the result of the <literal>case</literal> expression must be rigid.</para></listitem>
+<listitem><para>The type of any free variable mentioned in any of
+the <literal>case</literal> alternatives must be rigid.</para></listitem>
+</itemizedlist>
+A type is "rigid" if it is completely known to the compiler at its binding site.  The easiest
+way to ensure that a variable has a rigid type is to give it a type signature.
+</para></listitem>
+
  </itemizedlist>
  </para>
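A minimal sketch of the rigidity rule, using a hypothetical cut-down Term type modelled on the one earlier in this section (the constructor set here is an assumption, not the guide's full definition). The type signature on eval is what makes the scrutinee and result types rigid.

```haskell
{-# LANGUAGE GADTs #-}

-- The constructors' result types are not all "Term a": Lit returns
-- Term Int and IsZero returns Term Bool, which makes this a GADT.
data Term a where
  Lit    :: Int -> Term Int
  IsZero :: Term Int -> Term Bool
  If     :: Term Bool -> Term a -> Term a -> Term a

-- The signature makes the scrutinee type (Term a) and the result type (a)
-- rigid, so GADT pattern matching can refine a to Int or Bool per branch.
-- Without it, GHC would typically be unable to infer a type for eval.
eval :: Term a -> a
eval (Lit i)    = i
eval (IsZero t) = eval t == 0
eval (If c t e) = if eval c then eval t else eval e
```

For example, eval (If (IsZero (Lit 0)) (Lit 1) (Lit 2)) evaluates to 1.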
  
@@ -2533,9 +2600,27 @@ The syntax is identical to that of an ordinary instance declaration apart from (
  You must supply a context (in the example the context is <literal>(Eq a)</literal>), 
  exactly as you would in an ordinary instance declaration.
  (In contrast the context is inferred in a <literal>deriving</literal> clause 
-attached to a data type declaration.) These <literal>deriving instance</literal>
-rules obey the same rules concerning form and termination as ordinary instance declarations,
-controlled by the same flags; see <xref linkend="instance-decls"/>. </para>
+attached to a data type declaration.) 
+
+A <literal>deriving instance</literal> declaration
+must obey the same rules concerning form and termination as ordinary instance declarations,
+controlled by the same flags; see <xref linkend="instance-decls"/>.
+</para>
+<para>
+Unlike a <literal>deriving</literal>
+clause attached to a <literal>data</literal> declaration, the instance can be more specific
+than the data type (assuming you also use 
+<literal>-XFlexibleInstances</literal>, <xref linkend="instance-rules"/>).  Consider
+for example
+<programlisting>
+  data Foo a = Bar a | Baz String
+
+  deriving instance Eq a => Eq (Foo [a])
+  deriving instance Eq a => Eq (Foo (Maybe a))
+</programlisting>
+This will generate a derived instance for <literal>(Foo [a])</literal> and <literal>(Foo (Maybe a))</literal>,
+but other types such as <literal>(Foo (Int,Bool))</literal> will not be instances of <literal>Eq</literal>.
+</para>
  
  <para>The stand-alone syntax is generalised for newtypes in exactly the same
  way that ordinary <literal>deriving</literal> clauses are generalised (<xref linkend="newtype-deriving"/>).
@@ -3251,7 +3336,7 @@ corresponding type in the instance declaration.
  These restrictions ensure that context reduction terminates: each reduction
  step makes the problem smaller by at least one
  constructor.  Both the Paterson Conditions and the Coverage Condition are lifted 
-if you give the <option>-fallow-undecidable-instances</option> 
+if you give the <option>-XUndecidableInstances</option> 
  flag (<xref linkend="undecidable-instances"/>).
  You can find lots of background material about the reason for these
  restrictions in the paper <ulink
@@ -6144,56 +6229,63 @@ Assertion failures can be caught, see the documentation for the
         don't recommend using this approach with GHC.</para>
      </sect2>
  
-    <sect2 id="deprecated-pragma">
-      <title>DEPRECATED pragma</title>
-      <indexterm><primary>DEPRECATED</primary>
-      </indexterm>
+    <sect2 id="warning-deprecated-pragma">
+      <title>WARNING and DEPRECATED pragmas</title>
+      <indexterm><primary>WARNING</primary></indexterm>
+      <indexterm><primary>DEPRECATED</primary></indexterm>
  
-      <para>The DEPRECATED pragma lets you specify that a particular
-      function, class, or type, is deprecated.  There are two
-      forms.
+      <para>The WARNING pragma allows you to attach an arbitrary warning
+      to a particular function, class, or type.
+      A DEPRECATED pragma lets you specify that
+      a particular function, class, or type is deprecated.
+      There are two ways of using these pragmas.
  
        <itemizedlist>
         <listitem>
-         <para>You can deprecate an entire module thus:</para>
+         <para>You can attach a warning or deprecation to an entire module thus:</para>
  <programlisting>
     module Wibble {-# DEPRECATED "Use Wobble instead" #-} where
       ...
  </programlisting>
+      <para>Or:</para>
+<programlisting>
+   module Wibble {-# WARNING "This is an unstable interface." #-} where
+     ...
+</programlisting>
           <para>When you compile any module that import
            <literal>Wibble</literal>, GHC will print the specified
            message.</para>
         </listitem>
  
         <listitem>
-         <para>You can deprecate a function, class, type, or data constructor, with the
-         following top-level declaration:</para>
+         <para>You can attach a warning to a function, class, type, or data constructor, with the
+         following top-level declarations:</para>
  <programlisting>
     {-# DEPRECATED f, C, T "Don't use these" #-}
+   {-# WARNING unsafePerformIO "This is unsafe; I hope you know what you're doing" #-}
  </programlisting>
           <para>When you compile any module that imports and uses any
            of the specified entities, GHC will print the specified
            message.</para>
-         <para> You can only deprecate entities declared at top level in the module
+         <para> You can only attach warnings or deprecations to entities declared at top level in the module
           being compiled, and you can only use unqualified names in the list of
-         entities being deprecated.  A capitalised name, such as <literal>T</literal>
+         entities. A capitalised name, such as <literal>T</literal>
           refers to <emphasis>either</emphasis> the type constructor <literal>T</literal>
           <emphasis>or</emphasis> the data constructor <literal>T</literal>, or both if
-         both are in scope.  If both are in scope, there is currently no way to deprecate 
-         one without the other (c.f. fixities <xref linkend="infix-tycons"/>).</para>
+         both are in scope.  If both are in scope, there is currently no way to
+         specify one without the other (cf. fixities,
+         <xref linkend="infix-tycons"/>).</para>
         </listitem>
        </itemizedlist>
-      Any use of the deprecated item, or of anything from a deprecated
-      module, will be flagged with an appropriate message.  However,
-      deprecations are not reported for
-      (a) uses of a deprecated function within its defining module, and
-      (b) uses of a deprecated function in an export list.
+      Warnings and deprecations are not reported for
+      (a) uses within the defining module, and
+      (b) uses in an export list.
        The latter reduces spurious complaints within a library
        in which one module gathers together and re-exports 
        the exports of several others.
        </para>
        <para>You can suppress the warnings with the flag
-      <option>-fno-warn-deprecations</option>.</para>
+      <option>-fno-warn-warnings-deprecations</option>.</para>
      </sect2>
  
      <sect2 id="inline-noinline-pragma">
@@ -6607,15 +6699,7 @@ data S = S {-# UNPACK #-} !Int {-# UNPACK #-} !Int
  
  <para>
  The programmer can specify rewrite rules as part of the source program
-(in a pragma).  GHC applies these rewrite rules wherever it can, provided (a) 
-the <option>-O</option> flag (<xref linkend="options-optimise"/>) is on, 
-and (b) the <option>-fno-rewrite-rules</option> flag
-(<xref linkend="options-f"/>) is not specified, and (c) the
-<option>-fglasgow-exts</option> (<xref linkend="options-language"/>)
-flag is active.
-</para>
-
-<para>
+(in a pragma).  
  Here is an example:
  
  <programlisting>
@@ -6624,6 +6708,11 @@ Here is an example:
      #-}
  </programlisting>
  </para>
+<para>
+Use the debug flag <option>-ddump-simpl-stats</option> to see what rules fired.
+If you need more information, then <option>-ddump-rule-firings</option> shows you
+each individual rule firing in detail.
+</para>
  
  <sect2>
  <title>Syntax</title>
@@ -6730,17 +6819,40 @@ variables it mentions, though of course they need to be in scope.
  <listitem>
  
  <para>
- Rules are automatically exported from a module, just as instance declarations are.
+ All rules are implicitly exported from the module, and are therefore
+in force in any module that imports the module that defined the rule, directly
+or indirectly.  (That is, if A imports B, which imports C, then C's rules are
+in force when compiling A.)  The situation is very similar to that for instance
+declarations.
+</para>
+</listitem>
+
+<listitem>
+
+<para>
+Inside a RULE, "<literal>forall</literal>" is treated as a keyword, regardless of
+any other flag settings.  Furthermore, inside a RULE, the language extension
+<option>-XScopedTypeVariables</option> is automatically enabled; see 
+<xref linkend="scoped-type-variables"/>.
  </para>
  </listitem>
+<listitem>
  
+<para>
+Like other pragmas, RULE pragmas are always checked for scope errors, and
+are typechecked. Typechecking means that the LHS and RHS of a rule are typechecked, 
+and must have the same type.  However, rules are only <emphasis>enabled</emphasis>
+if the <option>-fenable-rewrite-rules</option> flag is 
+on (see <xref linkend="rule-semantics"/>).
+</para>
+</listitem>
  </itemizedlist>
  
  </para>
  
  </sect2>
  
-<sect2>
+<sect2 id="rule-semantics">
  <title>Semantics</title>
  
  <para>
@@ -6748,9 +6860,17 @@ From a semantic point of view:
  
  <itemizedlist>
  <listitem>
-
  <para>
-Rules are only applied if you use the <option>-O</option> flag.
+Rules are enabled (that is, used during optimisation)
+by the <option>-fenable-rewrite-rules</option> flag.
+This flag is implied by <option>-O</option>, and may be switched
+off (as usual) by <option>-fno-enable-rewrite-rules</option>.
+(NB: enabling <option>-fenable-rewrite-rules</option> without <option>-O</option> 
+may not do what you expect, though, because without <option>-O</option> GHC 
+ignores all optimisation information in interface files;
+see <option>-fignore-interface-pragmas</option>, <xref linkend="options-f"/>.)
+Note that <option>-fenable-rewrite-rules</option> is an <emphasis>optimisation</emphasis> flag, and
+has no effect on parsing or typechecking.
  </para>
  </listitem>
  
@@ -6767,14 +6887,6 @@ expression by substituting for the pattern variables.
  <listitem>
  
  <para>
- The LHS and RHS of a rule are typechecked, and must have the
-same type.
-
-</para>
-</listitem>
-<listitem>
-
-<para>
   GHC makes absolutely no attempt to verify that the LHS and RHS
  of a rule have the same meaning.  That is undecidable in general, and
  infeasible in most interesting cases.  The responsibility is entirely the programmer's!
@@ -6841,48 +6953,32 @@ not be substituted, and the rule would not fire.
  <listitem>
  
  <para>
- In the earlier phases of compilation, GHC inlines <emphasis>nothing
-that appears on the LHS of a rule</emphasis>, because once you have substituted
-for something you can't match against it (given the simple minded
-matching).  So if you write the rule
-
+Ordinary inlining happens at the same time as rule rewriting, which may lead to unexpected
+results.  Consider this (artificial) example:
  <programlisting>
-        "map/map"       forall f,g.  map f . map g = map (f.g)
-</programlisting>
+f x = x
+{-# RULES "f" f True = False #-}
  
-this <emphasis>won't</emphasis> match the expression <literal>map f (map g xs)</literal>.
-It will only match something written with explicit use of ".".
-Well, not quite.  It <emphasis>will</emphasis> match the expression
+g y = f y
  
-<programlisting>
-wibble f g xs
+h z = g True
  </programlisting>
-
-where <function>wibble</function> is defined:
-
+Since <literal>f</literal>'s right-hand side is small, it is inlined into <literal>g</literal>,
+to give
  <programlisting>
-wibble f g = map f . map g
+g y = y
  </programlisting>
-
-because <function>wibble</function> will be inlined (it's small).
-
-Later on in compilation, GHC starts inlining even things on the
-LHS of rules, but still leaves the rules enabled.  This inlining
-policy is controlled by the per-simplification-pass flag <option>-finline-phase</option><emphasis>n</emphasis>.
-
+Now <literal>g</literal> is inlined into <literal>h</literal>, but <literal>f</literal>'s RULE has
+no chance to fire, because the call to <literal>f</literal> has already disappeared.
+If instead GHC had first inlined <literal>g</literal> into <literal>h</literal>, the expression
+<literal>f True</literal> would have appeared and <literal>f</literal>'s RULE could have fired.
  </para>
-</listitem>
-<listitem>
-
  <para>
- All rules are implicitly exported from the module, and are therefore
-in force in any module that imports the module that defined the rule, directly
-or indirectly.  (That is, if A imports B, which imports C, then C's rules are
-in force when compiling A.)  The situation is very similar to that for instance
-declarations.
+The way to get predictable behaviour is to use a NOINLINE
+pragma on <literal>f</literal>, to ensure
+that it is not inlined, so that its RULEs still have a chance to fire.
  </para>
  </listitem>
-
  </itemizedlist>
  
  </para>
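A sketch of the NOINLINE advice above. It uses a hypothetical, meaning-preserving rule (the names double, quad and "double/double" are illustrative) instead of the artificial f/g/h example, so the program gives the same answers whether or not the rule fires.

```haskell
-- Hypothetical function whose RULE rewrites nested calls.
double :: Int -> Int
double x = x + x
-- NOINLINE stops double from being inlined at its call sites, so the
-- left-hand side of the rule below can still match.
{-# NOINLINE double #-}

-- The rule preserves meaning: double (double x) == 4 * x.
{-# RULES "double/double" forall x. double (double x) = 4 * x #-}

quad :: Int -> Int
quad n = double (double n)
```

Compiling with -O together with the -ddump-rule-firings flag described earlier should report whether "double/double" fired in quad. Without the NOINLINE pragma, double is small enough that it might be inlined first, and the rule would then never see its left-hand side.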