Implement SSE2 floating-point support in the x86 native code generator (#594)

diff --git a/docs/users_guide/using.xml b/docs/users_guide/using.xml
index f668639..329c31f 100644
--- a/docs/users_guide/using.xml
+++ b/docs/users_guide/using.xml
@@ -860,6 +860,7 @@ ghc -c Foo.hs</screen>
           <indexterm><primary>-W option</primary></indexterm>
           <para>Provides the standard warnings plus
           <option>-fwarn-incomplete-patterns</option>,
+         <option>-fwarn-dodgy-exports</option>,
           <option>-fwarn-dodgy-imports</option>,
           <option>-fwarn-unused-matches</option>,
           <option>-fwarn-unused-imports</option>, and
@@ -991,6 +992,20 @@ foreign import "&amp;f" f :: FunPtr t
        </varlistentry>
  
        <varlistentry>
+       <term><option>-fwarn-dodgy-exports</option>:</term>
+       <listitem>
+         <indexterm><primary><option>-fwarn-dodgy-exports</option></primary>
+         </indexterm>
+         <para>Causes a warning to be emitted when a datatype
+      <literal>T</literal> is exported
+      with all constructors, i.e. <literal>T(..)</literal>, but it is
+      just a type synonym.</para>
+         <para>Also causes a warning to be emitted when a module is
+      re-exported, but that module exports nothing.</para>
+       </listitem>
+      </varlistentry>
+
+      <varlistentry>
         <term><option>-fwarn-dodgy-imports</option>:</term>
         <listitem>
           <indexterm><primary><option>-fwarn-dodgy-imports</option></primary>
@@ -1652,6 +1667,26 @@ f "2"    = 2
  
         <varlistentry>
           <term>
+            <option>-fno-float-in</option>
+            <indexterm><primary><option>-fno-float-in</option></primary></indexterm>
+          </term>
+         <listitem>
+           <para>Turns off the float-in transformation.</para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry>
+         <term>
+            <option>-fno-specialise</option>
+            <indexterm><primary><option>-fno-specialise</option></primary></indexterm>
+          </term>
+         <listitem>
+           <para>Turns off the automatic specialisation of overloaded functions.</para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry>
+         <term>
              <option>-fspec-constr</option>
              <indexterm><primary><option>-fspec-constr</option></primary></indexterm>
            </term>
@@ -1936,6 +1971,10 @@ f "2"    = 2
  
              <para>There is no means (currently) by which this value
               may vary after the program has started.</para>
+
+            <para>The current value of the <option>-N</option> option
+              is available to the Haskell program
+              via <literal>GHC.Conc.numCapabilities</literal>.</para>
           </listitem>
         </varlistentry>
        </variablelist>
@@ -1945,6 +1984,17 @@ f "2"    = 2
  
        <variablelist>
         <varlistentry>
+         <term><option>-qa</option></term>
+          <indexterm><primary><option>-qa</option></primary><secondary>RTS
+          option</secondary></indexterm>
+         <listitem>
+            <para>Use the OS's affinity facilities to try to pin OS
+              threads to CPU cores.  This is an experimental feature,
+              and may or may not be useful.  Please let us know
+              whether it helps for you!</para>
+          </listitem>
+        </varlistentry>
+       <varlistentry>
           <term><option>-qm</option></term>
            <indexterm><primary><option>-qm</option></primary><secondary>RTS
            option</secondary></indexterm>
@@ -1952,9 +2002,16 @@ f "2"    = 2
              <para>Disable automatic migration for load balancing.
              Normally the runtime will automatically try to schedule
              threads across the available CPUs to make use of idle
-            CPUs; this option disables that behaviour.  It is probably
-            only of use if you are explicitly scheduling threads onto
-            CPUs with <literal>GHC.Conc.forkOnIO</literal>.</para>
+            CPUs; this option disables that behaviour.  Note that
+              migration only applies to threads; sparks created
+              by <literal>par</literal> are load-balanced separately
+              by work-stealing.</para>
+
+            <para>
+              This option is probably only of use for concurrent
+              programs that explicitly schedule threads onto CPUs
+              with <literal>GHC.Conc.forkOnIO</literal>.
+            </para>
            </listitem>
          </varlistentry>
         <varlistentry>
@@ -1987,19 +2044,20 @@ f "2"    = 2
         whether your program got faster by using more CPUs or not.  If the user
         time is greater than
         the elapsed time, then the program used more than one CPU.  You should
-       also run the program without <literal>-N</literal> for comparison.</para>
-
-      <para>GHC's parallelism support is new and experimental.  It may make your
-       program go faster, or it might slow it down - either way, we'd be
-       interested to hear from you.</para>
-      
-      <para>One significant limitation with the current implementation is that
-       the garbage collector is still single-threaded, and all execution must
-       stop when GC takes place.  This can be a significant bottleneck in a
-       parallel program, especially if your program does a lot of GC.  If this
-       happens to you, then try reducing the cost of GC by tweaking the GC
-       settings (<xref linkend="rts-options-gc" />): enlarging the heap or the
-       allocation area size is a good start.</para>
+       also run the program without <literal>-N</literal> for
+       comparison.</para>
+
+      <para>The output of <literal>+RTS -s</literal> tells you how
+        many &ldquo;sparks&rdquo; were created and executed during the
+        run of the program (see <xref linkend="rts-options-gc" />), which
+        will give you an idea how well your <literal>par</literal>
+        annotations are working.</para>
+
+      <para>GHC's parallelism support has improved in 6.12.1 as a
+        result of much experimentation and tuning in the runtime
+        system.  We'd still be interested to hear how well it works
+        for you, and we're also interested in collecting parallel
+        programs to add to our benchmarking suite.</para>
      </sect2>
    </sect1>
  
@@ -2016,9 +2074,27 @@ f "2"    = 2
      <variablelist>
  
        <varlistentry>
+       <term><option>-msse2</option>:</term>
+       <listitem>
+          <para>
+            (x86 only, added in GHC 6.14.1) Use the SSE2 registers and
+            instruction set to implement floating point operations
+            when using the native code generator.  This gives a
+            substantial performance improvement for floating point,
+            but the resulting compiled code will only run on
+            processors that support SSE2 (Intel Pentium 4 and later,
+            or AMD Athlon 64 and later).
+          </para>
+          <para>
+            SSE2 is unconditionally used on x86-64 platforms.
+          </para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
         <term><option>-monly-[32]-regs</option>:</term>
         <listitem>
-         <para>(iX86 machines)<indexterm><primary>-monly-N-regs
+         <para>(x86 only)<indexterm><primary>-monly-N-regs
            option (iX86 only)</primary></indexterm> GHC tries to
            &ldquo;steal&rdquo; four registers from GCC, for performance
            reasons; it almost always works.  However, when GCC is