[project @ 2002-10-23 15:56:39 by simonpj]

[ghc-hetmet.git] / ghc / docs / users_guide / glasgow_exts.sgml
diff --git a/ghc/docs/users_guide/glasgow_exts.sgml b/ghc/docs/users_guide/glasgow_exts.sgml

index cbfe208..063527b 100644 (file)
--- a/ghc/docs/users_guide/glasgow_exts.sgml
+++ b/ghc/docs/users_guide/glasgow_exts.sgml
@@ -152,7 +152,222 @@ with GHC.
  
  <!-- UNBOXED TYPES AND PRIMITIVE OPERATIONS -->
  <!--    included from primitives.sgml  -->
-&primitives;
+<!-- &primitives; -->
+<sect1 id="primitives">
+  <title>Unboxed types and primitive operations</title>
+
+<para>GHC is built on a raft of primitive data types and operations.
+While you really can use this stuff to write fast code,
+  we generally find it a lot less painful, and more satisfying in the
+  long run, to use higher-level language features and libraries.  With
+  any luck, the code you write will be optimised to the efficient
+  unboxed version in any case.  And if it isn't, we'd like to know
+  about it.</para>
+
+<para>We do not currently have good, up-to-date documentation about the
+primitives, perhaps because they are mainly intended for internal use.
+There used to be a long section about them here in the User Guide, but it
+became out of date, and wrong information is worse than none.</para>
+
+<para>The Real Truth about what primitive types there are, and what operations
+work over those types, is held in the file
+<filename>fptools/ghc/compiler/prelude/primops.txt</filename>.
+This file is used directly to generate GHC's primitive-operation definitions, so
+it is always correct!  It is also intended for processing into text.</para>
+
+<para> Indeed,
+the result of such processing is part of the description of the 
+ <ulink
+      url="http://haskell.cs.yale.edu/ghc/docs/papers/core.ps.gz">External
+        Core language</ulink>.
+So that document is a good place to look for a type-set version.
+We would be very happy if someone wanted to volunteer to produce an SGML
+back end to the program that processes <filename>primops.txt</filename> so that
+we could include the results here in the User Guide.</para>
+
+<para>What follows here is a brief summary of some main points.</para>
+  
+<sect2 id="glasgow-unboxed">
+<title>Unboxed types
+</title>
+
+<para>
+<indexterm><primary>Unboxed types (Glasgow extension)</primary></indexterm>
+</para>
+
+<para>Most types in GHC are <firstterm>boxed</firstterm>, which means
+that values of that type are represented by a pointer to a heap
+object.  The representation of a Haskell <literal>Int</literal>, for
+example, is a two-word heap object.  An <firstterm>unboxed</firstterm>
+type, however, is represented by the value itself, no pointers or heap
+allocation are involved.
+</para>
+
+<para>
+Unboxed types correspond to the &ldquo;raw machine&rdquo; types you
+would use in C: <literal>Int&num;</literal> (long int),
+<literal>Double&num;</literal> (double), <literal>Addr&num;</literal>
+(void *), etc.  The <emphasis>primitive operations</emphasis>
+(PrimOps) on these types are what you might expect; e.g.,
+<literal>(+&num;)</literal> is addition on
+<literal>Int&num;</literal>s, and is the machine-addition that we all
+know and love&mdash;usually one instruction.
+</para>
+
+<para>
+Primitive (unboxed) types cannot be defined in Haskell, and are
+therefore built into the language and compiler.  Primitive types are
+always unlifted; that is, a value of a primitive type cannot be
+bottom.  We use the convention that primitive types, values, and
+operations have a <literal>&num;</literal> suffix.
+</para>
+
+<para>
+Primitive values are often represented by a simple bit-pattern, such
+as <literal>Int&num;</literal>, <literal>Float&num;</literal>,
+<literal>Double&num;</literal>.  But this is not necessarily the case:
+a primitive value might be represented by a pointer to a
+heap-allocated object.  Examples include
+<literal>Array&num;</literal>, the type of primitive arrays.  A
+primitive array is heap-allocated because it is too big a value to fit
+in a register, and would be too expensive to copy around; in a sense,
+it is accidental that it is represented by a pointer.  If a pointer
+represents a primitive value, then it really does point to that value:
+no unevaluated thunks, no indirections&hellip;nothing can be at the
+other end of the pointer than the primitive value.
+</para>
+
+<para>
+There are some restrictions on the use of primitive types, the main
+one being that you can't pass a primitive value to a polymorphic
+function or store one in a polymorphic data type.  This rules out
+things like <literal>[Int&num;]</literal> (i.e. lists of primitive
+integers).  The reason for this restriction is that polymorphic
+arguments and constructor fields are assumed to be pointers: if an
+unboxed integer is stored in one of these, the garbage collector would
+attempt to follow it, leading to unpredictable space leaks.  Or a
+<function>seq</function> operation on the polymorphic component may
+attempt to dereference the pointer, with disastrous results.  Even
+worse, the unboxed value might be larger than a pointer
+(<literal>Double&num;</literal> for instance).
+</para>
+
+<para>
+Nevertheless, A numerically-intensive program using unboxed types can
+go a <emphasis>lot</emphasis> faster than its &ldquo;standard&rdquo;
+counterpart&mdash;we saw a threefold speedup on one example.
+</para>
+
+</sect2>
+
+<sect2 id="unboxed-tuples">
+<title>Unboxed Tuples
+</title>
+
+<para>
+Unboxed tuples aren't really exported by <literal>GHC.Exts</literal>,
+they're available by default with <option>-fglasgow-exts</option>.  An
+unboxed tuple looks like this:
+</para>
+
+<para>
+
+<programlisting>
+(# e_1, ..., e_n #)
+</programlisting>
+
+</para>
+
+<para>
+where <literal>e&lowbar;1..e&lowbar;n</literal> are expressions of any
+type (primitive or non-primitive).  The type of an unboxed tuple looks
+the same.
+</para>
+
+<para>
+Unboxed tuples are used for functions that need to return multiple
+values, but they avoid the heap allocation normally associated with
+using fully-fledged tuples.  When an unboxed tuple is returned, the
+components are put directly into registers or on the stack; the
+unboxed tuple itself does not have a composite representation.  Many
+of the primitive operations listed in this section return unboxed
+tuples.
+</para>
+
+<para>
+There are some pretty stringent restrictions on the use of unboxed tuples:
+</para>
+
+<para>
+
+<itemizedlist>
+<listitem>
+
+<para>
+ Unboxed tuple types are subject to the same restrictions as
+other unboxed types; i.e. they may not be stored in polymorphic data
+structures or passed to polymorphic functions.
+
+</para>
+</listitem>
+<listitem>
+
+<para>
+ Unboxed tuples may only be constructed as the direct result of
+a function, and may only be deconstructed with a <literal>case</literal> expression.
+eg. the following are valid:
+
+
+<programlisting>
+f x y = (# x+1, y-1 #)
+g x = case f x x of { (# a, b #) -&#62; a + b }
+</programlisting>
+
+
+but the following are invalid:
+
+
+<programlisting>
+f x y = g (# x, y #)
+g (# x, y #) = x + y
+</programlisting>
+
+
+</para>
+</listitem>
+<listitem>
+
+<para>
+ No variable can have an unboxed tuple type.  This is illegal:
+
+
+<programlisting>
+f :: (# Int, Int #) -&#62; (# Int, Int #)
+f x = x
+</programlisting>
+
+
+because <literal>x</literal> has an unboxed tuple type.
+
+</para>
+</listitem>
+
+</itemizedlist>
+
+</para>
+
+<para>
+Note: we may relax some of these restrictions in the future.
+</para>
+
+<para>
+The <literal>IO</literal> and <literal>ST</literal> monads use unboxed
+tuples to avoid unnecessary allocation during sequences of operations.
+</para>
+
+</sect2>
+</sect1>
+
  
  <!-- ====================== SYNTACTIC EXTENSIONS =======================  -->
  
@@ -357,6 +572,8 @@ the do-notation, and this extension provides the necessary syntactic support.
  Here is a simple (yet contrived) example:
  </para>
  <programlisting>
+import Control.Monad.Fix
+
  justOnes = mdo xs <- Just (1:xs)
                 return xs
  </programlisting>
@@ -378,8 +595,9 @@ then that monad must be declared an instance of the <literal>MonadFix</literal>
  For details, see the above mentioned reference.
  </para>
  <para>
-The following instances of <literal>MonadFix</literal> are automatically provided: List, Maybe, IO, and
-state monads (both lazy and strict).
+The following instances of <literal>MonadFix</literal> are automatically provided: List, Maybe, IO. 
+Furthermore, the Control.Monad.ST and Control.Monad.ST.Lazy modules provide the instances of the MonadFix class 
+for Haskell's internal state monad (strict and lazy, respectively).
  </para>
  <para>
  There are three important points in using the recursive-do notation:
@@ -390,14 +608,10 @@ than <literal>do</literal>).
  </para></listitem>
  
  <listitem><para>
-If you want to declare an instance of the <literal>MonadFix</literal> class for one of 
-your own monads, or you need to refer to the class name <literal>MonadFix</literal> in any other way (for 
-instance when writing a type constraint), then your program should 
-<literal>import Control.Monad.MonadFix</literal>.
-Otherwise, you don't need to import any special libraries to use the mdo-notation. That is,
-as long as you only use the predefined instances mentioned above, the mdo-notation will
-be automatically available. 
-To be on the safe side, of course, you can simply import it in all cases.
+You should <literal>import Control.Monad.Fix</literal>.
+(Note: Strictly speaking, this import is required only when you need to refer to the name
+<literal>MonadFix</literal> in your program, but the import is always safe, and the programmers
+are encouraged to always import this module when using the mdo-notation.)
  </para></listitem>
  
  <listitem><para>
@@ -407,17 +621,44 @@ As with other extensions, ghc should be given the flag <literal>-fglasgow-exts</
  </para>
  
  <para>
+The web page: <ulink url="http://www.cse.ogi.edu/PacSoft/projects/rmb">http://www.cse.ogi.edu/PacSoft/projects/rmb</ulink>
+contains up to date information on recursive monadic bindings.
+</para>
+
+<para>
  Historical note: The old implementation of the mdo-notation (and most
  of the existing documents) used the name
  <literal>MonadRec</literal> for the class and the corresponding library.
-This name is no longer supported.
+This name is not supported by GHC.
+</para>
+
+</sect2>
+
+
+<sect2> <title> Infix type constructors </title>
+
+<para>GHC supports infix type constructors, much as it supports infix data constructors.  For example:
+<programlisting>
+  infixl 5 :+:
+
+  data a :+: b = Inl a | Inr b
+
+  f :: a `Either` b -> a :+: b
+  f (Left x) = Inl x
+</programlisting>
  </para>
+<para>The lexical 
+syntax of an infix type constructor is just like that of an infix data constructor: either
+it's an operator beginning with ":", or it is an ordinary (alphabetic) type constructor enclosed in
+back-quotes.</para>
  
  <para>
-The web page: <ulink url="http://www.cse.ogi.edu/PacSoft/projects/rmb">http://www.cse.ogi.edu/PacSoft/projects/rmb</ulink>
-contains up to date information on recursive monadic bindings.
+When you give a fixity declaration, the fixity applies to both the data constructor and the
+type constructor with the specified name.  You cannot give different fixities to the type constructor T
+and the data constructor T.
  </para>
  
+
  </sect2>
  
     <!-- ===================== PARALLEL LIST COMPREHENSIONS ===================  -->
@@ -1314,10 +1555,12 @@ implicitly parameterized by a comparison function named <literal>cmp</literal>.
  The dynamic binding constraints are just a new form of predicate in the type class system.
  </para>
  <para>
-An implicit parameter is introduced by the special form <literal>?x</literal>, 
+An implicit parameter occurs in an exprssion using the special form <literal>?x</literal>, 
  where <literal>x</literal> is
-any valid identifier. Use if this construct also introduces new
-dynamic binding constraints. For example, the following definition
+any valid identifier (e.g. <literal>ord ?x</literal> is a valid expression). 
+Use of this construct also introduces a new
+dynamic-binding constraint in the type of the expression. 
+For example, the following definition
  shows how we can define an implicitly parameterized sort function in
  terms of an explicitly parameterized <literal>sortBy</literal> function:
  <programlisting>
@@ -1326,6 +1569,11 @@ terms of an explicitly parameterized <literal>sortBy</literal> function:
    sort   :: (?cmp :: a -> a -> Bool) => [a] -> [a]
    sort    = sortBy ?cmp
  </programlisting>
+</para>
+
+<sect3>
+<title>Implicit-parameter type constraints</title>
+<para>
  Dynamic binding constraints behave just like other type class
  constraints in that they are automatically propagated. Thus, when a
  function is used, its implicit parameters are inherited by the
@@ -1342,68 +1590,80 @@ propagated. With implicit parameters, the default is to always
  propagate them.
  </para>
  <para>
-An implicit parameter differs from other type class constraints in the
+An implicit-parameter type constraint differs from other type class constraints in the
  following way: All uses of a particular implicit parameter must have
  the same type. This means that the type of <literal>(?x, ?x)</literal> 
  is <literal>(?x::a) => (a,a)</literal>, and not 
  <literal>(?x::a, ?x::b) => (a, b)</literal>, as would be the case for type
  class constraints.
  </para>
+
+<para> You can't have an implicit parameter in the context of a class or instance
+declaration.  For example, both these declarations are illegal:
+<programlisting>
+  class (?x::Int) => C a where ...
+  instance (?x::a) => Foo [a] where ...
+</programlisting>
+Reason: exactly which implicit parameter you pick up depends on exactly where
+you invoke a function. But the ``invocation'' of instance declarations is done
+behind the scenes by the compiler, so it's hard to figure out exactly where it is done.
+Easiest thing is to outlaw the offending types.</para>
+</sect3>
+
+<sect3>
+<title>Implicit-parameter bindings</title>
+
  <para>
-An implicit parameter is bound using the standard
-<literal>let</literal> binding form, where the bindings must be a
-collection of simple bindings to implicit-style variables (no
-function-style bindings, and no type signatures); these bindings are
-neither polymorphic or recursive. This form binds the implicit
-parameters arising in the body, not the free variables as a
-<literal>let</literal> or <literal>where</literal> would do. For
-example, we define the <literal>min</literal> function by binding
-<literal>cmp</literal>.</para>
+An implicit parameter is <emphasis>bound</emphasis> using the standard
+<literal>let</literal> or <literal>where</literal> binding forms.
+For example, we define the <literal>min</literal> function by binding
+<literal>cmp</literal>.
  <programlisting>
    min :: [a] -> a
    min  = let ?cmp = (<=) in least
  </programlisting>
+</para>
  <para>
+A group of implicit-parameter bindings may occur anywhere a normal group of Haskell
+bindings can occur, except at top level.  That is, they can occur in a <literal>let</literal> 
+(including in a list comprehension, or do-notation, or pattern guards), 
+or a <literal>where</literal> clause.
  Note the following points:
  <itemizedlist>
  <listitem><para>
+An implicit-parameter binding group must be a
+collection of simple bindings to implicit-style variables (no
+function-style bindings, and no type signatures); these bindings are
+neither polymorphic or recursive.  
+</para></listitem>
+<listitem><para>
  You may not mix implicit-parameter bindings with ordinary bindings in a 
  single <literal>let</literal>
  expression; use two nested <literal>let</literal>s instead.
+(In the case of <literal>where</literal> you are stuck, since you can't nest <literal>where</literal> clauses.)
  </para></listitem>
  
  <listitem><para>
  You may put multiple implicit-parameter bindings in a
-single <literal>let</literal> expression; they are <emphasis>not</emphasis> treated
+single binding group; but they are <emphasis>not</emphasis> treated
  as a mutually recursive group (as ordinary <literal>let</literal> bindings are).
-Instead they are treated as a non-recursive group, each scoping over the bindings that
-follow.  For example, consider:
+Instead they are treated as a non-recursive group, simultaneously binding all the implicit
+parameter.  The bindings are not nested, and may be re-ordered without changing
+the meaning of the program.
+For example, consider:
  <programlisting>
-  f y = let { ?x = y; ?x = ?x+1 } in ?x
+  f t = let { ?x = t; ?y = ?x+(1::Int) } in ?x + ?y
  </programlisting>
-This function adds one to its argument.
-</para></listitem>
-
-<listitem><para>
-You may not have an implicit-parameter binding in a <literal>where</literal> clause,
-only in a <literal>let</literal> binding.
-</para></listitem>
-
-<listitem>
-<para> You can't have an implicit parameter in the context of a class or instance
-declaration.  For example, both these declarations are illegal:
+The use of <literal>?x</literal> in the binding for <literal>?y</literal> does not "see"
+the binding for <literal>?x</literal>, so the type of <literal>f</literal> is
  <programlisting>
-  class (?x::Int) => C a where ...
-  instance (?x::a) => Foo [a] where ...
+  f :: (?x::Int) => Int -> Int
  </programlisting>
-Reason: exactly which implicit parameter you pick up depends on exactly where
-you invoke a function. But the ``invocation'' of instance declarations is done
-behind the scenes by the compiler, so it's hard to figure out exactly where it is done.
-Easiest thing is to outlaw the offending types.</para>
-</listitem>
+</para></listitem>
  </itemizedlist>
  </para>
  
+</sect3>
  </sect2>
  
  <sect2 id="linear-implicit-parameters">