ghc/docs/users_guide/bugs.xml

   1 <?xml version="1.0" encoding="iso-8859-1"?>
   2 <chapter id="bugs-and-infelicities">
   3   <title>Known bugs and infelicities</title>
   4
   5   <sect1 id="vs-Haskell-defn">
   6     <title>Haskell&nbsp;98 vs.&nbsp;Glasgow Haskell: language non-compliance
   7 </title>
   8
   9     <indexterm><primary>GHC vs the Haskell 98 language</primary></indexterm>
  10     <indexterm><primary>Haskell 98 language vs GHC</primary></indexterm>
  11
  12   <para>This section lists Glasgow Haskell infelicities in its
  13   implementation of Haskell&nbsp;98.  See also the &ldquo;when things
  14   go wrong&rdquo; section (<xref linkend="wrong"/>) for information
  15   about crashes, space leaks, and other undesirable phenomena.</para>
  16
  17   <para>The limitations here are listed in Haskell Report order
  18   (roughly).</para>
  19
  20   <sect2 id="haskell98-divergence">
  21     <title>Divergence from Haskell&nbsp;98</title>
  22
  23
  24     <sect3 id="infelicities-lexical">
  25       <title>Lexical syntax</title>
  26
  27       <itemizedlist>
  28         <listitem>
  29           <para>The Haskell report specifies that programs may be
  30           written using Unicode.  GHC only accepts the ISO-8859-1
  31           character set at the moment.</para>
  32         </listitem>
  33
  34         <listitem>
  35           <para>Certain lexical rules regarding qualified identifiers
  36           are slightly different in GHC compared to the Haskell
  37           report.  When you have
  38           <replaceable>module</replaceable><literal>.</literal><replaceable>reservedop</replaceable>,
  39           such as <literal>M.\</literal>, GHC will interpret it as a
  40           single qualified operator rather than the two lexemes
  41           <literal>M</literal> and <literal>.\</literal>.</para>
  42         </listitem>
  43       </itemizedlist>
  44     </sect3>
  45
  46       <sect3 id="infelicities-syntax">
  47         <title>Context-free syntax</title>
  48
  49         <itemizedlist>
  50           <listitem>
  51             <para>GHC is a little less strict about the layout rule when used
  52               in <literal>do</literal> expressions.  Specifically, the
  53               restriction that "a nested context must be indented further to
  54               the right than the enclosing context" is relaxed to allow the
  55               nested context to be at the same level as the enclosing context,
  56               if the enclosing context is a <literal>do</literal>
  57               expression.</para>
  58
  59             <para>For example, the following code is accepted by GHC:
  60
  61 <programlisting>
  62 main = do args &lt;- getArgs
  63           if null args then return [] else do
  64           ps &lt;- mapM process args
  65           mapM print ps</programlisting>
  66
  67               </para>
  68           </listitem>
  69
  70         <listitem>
  71           <para>GHC doesn't do fixity resolution in expressions during
  72           parsing.  For example, according to the Haskell report, the
  73           following expression is legal Haskell:
  74 <programlisting>
  75     let x = 42 in x == 42 == True</programlisting>
  76         and parses as:
  77 <programlisting>
  78     (let x = 42 in x == 42) == True</programlisting>
  79
  80           because according to the report, the <literal>let</literal>
  81           expression <quote>extends as far to the right as
  82           possible</quote>.  Since it can't extend past the second
  83           equals sign without causing a parse error
  84           (<literal>==</literal> is non-associative), the
  85           <literal>let</literal>-expression must terminate there.  GHC
  86           simply gobbles up the whole expression, parsing like this:
  87 <programlisting>
  88     (let x = 42 in x == 42 == True)</programlisting>
  89
  90           The Haskell report is arguably wrong here, but nevertheless
  91           it's a difference between GHC and Haskell&nbsp;98.</para>
  92         </listitem>
  93       </itemizedlist>
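
        <para>A minimal sketch of a portable spelling of the
        <literal>let</literal> example above: writing the parentheses
        explicitly gives the grouping that the Haskell report
        describes, so the meaning no longer depends on when fixity
        resolution is performed.  The <literal>main</literal> wrapper
        is an addition made only to turn the fragment into a complete
        program.</para>

<programlisting>
main :: IO ()
main =
  -- Parenthesised the way the Haskell 98 report groups the expression;
  -- the let-expression ends before the second (==).
  print ((let x = 42 in x == 42) == True)</programlisting>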
  94     </sect3>
  95
  96   <sect3 id="infelicities-exprs-pats">
  97       <title>Expressions and patterns</title>
  98
  99         <para>None known.</para>
 100     </sect3>
 101
 102     <sect3 id="infelicities-decls">
 103       <title>Declarations and bindings</title>
 104
 105       <para>None known.</para>
 106     </sect3>
 107
 108       <sect3 id="infelicities-Modules">
 109         <title>Module system and interface files</title>
 110
 111         <para>None known.</para>
 112     </sect3>
 113
 114     <sect3 id="infelicities-numbers">
 115       <title>Numbers, basic types, and built-in classes</title>
 116
 117       <variablelist>
 118         <varlistentry>
 119           <term>Multiply-defined array elements&mdash;not checked:</term>
 120           <listitem>
 121             <para>This code fragment should
 122             elicit a fatal error, but it does not:
 123
 124 <programlisting>
 125 main = print (array (1,1) [(1,2), (1,3)])</programlisting>
 126 GHC's implementation of <literal>array</literal> takes the value of an
 127 array slot from the last (index,value) pair in the list, and does no
 128 checking for duplicates.  The reason for this is efficiency, pure and simple.
 129             </para>
 130           </listitem>
 131         </varlistentry>
 132       </variablelist>
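
      <para>Where duplicate indices must be detected, the check can be
      made before calling <literal>array</literal>.  The following is a
      sketch only: <literal>checkedArray</literal> is a hypothetical
      helper, not part of any library, and the sketch assumes the
      hierarchical module names <literal>Data.Array</literal> and
      <literal>Data.List</literal>.</para>

<programlisting>
import Data.Array (Array, Ix, array)
import Data.List (group, sort)

-- Hypothetical helper: return the indices that occur more than once
-- (Left), or build the array if every index is defined at most once.
checkedArray :: Ix i => (i, i) -> [(i, e)] -> Either [i] (Array i e)
checkedArray bnds ies
  | null dups = Right (array bnds ies)
  | otherwise = Left dups
  where
    -- Ix has Ord as a superclass, so the indices can be sorted and grouped.
    dups = [i | (i : _ : _) &lt;- group (sort (map fst ies))]

main :: IO ()
main = do
  -- The fragment from the text: index 1 is defined twice.
  print (checkedArray (1, 1) [(1, 2), (1, 3)] :: Either [Int] (Array Int Int))
  -- An association list without duplicates is passed on to array.
  print (checkedArray (1, 2) [(1, 'a'), (2, 'b')] :: Either [Int] (Array Int Char))</programlisting>

      <para>The extra sort costs time proportional to
      <replaceable>n</replaceable> log <replaceable>n</replaceable> in
      the length of the list, which is the kind of overhead the
      unchecked implementation avoids.</para>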
 133
 134     </sect3>
 135
 136       <sect3 id="infelicities-Prelude">
 137         <title>In <literal>Prelude</literal> support</title>
 138
 139       <variablelist>
 140         <varlistentry>
 141           <term>Arbitrary-sized tuples</term>
 142           <listitem>
 143             <para>Tuples are currently limited to size 100.  However,
 144             standard instances for tuples (<literal>Eq</literal>,
 145             <literal>Ord</literal>, <literal>Bounded</literal>,
 146             <literal>Ix</literal>, <literal>Read</literal>, and
 147             <literal>Show</literal>) are available
 148             <emphasis>only</emphasis> up to 16-tuples.</para>
 149
 150             <para>This limitation is easily subvertible, so please ask
 151             if you get stuck on it.</para>
 152             </listitem>
 153           </varlistentry>
 154
 155           <varlistentry>
 156             <term><literal>Read</literal>ing integers</term>
 157             <listitem>
 158               <para>GHC's implementation of the
 159               <literal>Read</literal> class for integral types accepts
 160               hexadecimal and octal literals (the code in the Haskell
 161               98 report doesn't).  So, for example,
 162 <programlisting>read "0xf00" :: Int</programlisting>
 163               works in GHC.</para>
 164               <para>A possible reason for this is that <literal>readLitChar</literal> accepts hex and
 165                 octal escapes, so it seems inconsistent not to do so for integers too.</para>
 166             </listitem>
 167           </varlistentry>
 168
 169           <varlistentry>
 170             <term><literal>isAlpha</literal></term>
 171             <listitem>
 172               <para>The Haskell 98 definition of <literal>isAlpha</literal>
 173               is:</para>
 174
 175 <programlisting>isAlpha c = isUpper c || isLower c</programlisting>
 176
 177               <para>GHC's implementation diverges from the Haskell 98
 178               definition in the sense that Unicode alphabetic characters which
 179               are neither upper nor lower case will still be identified as
 180               alphabetic by <literal>isAlpha</literal>.</para>
 181             </listitem>
 182           </varlistentry>
 183         </variablelist>
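
        <para>The <literal>Read</literal> and
        <literal>isAlpha</literal> entries above can be tried out with
        a small program.  This is an illustrative sketch, not a test
        from the GHC sources: it assumes the hierarchical modules
        <literal>Data.Char</literal> and <literal>Numeric</literal>,
        and the choice of U+05D0 as a caseless letter is ours.
        <literal>readHex</literal> is shown as a spelling that does not
        rely on the <literal>Read</literal> instance accepting
        hexadecimal literals.</para>

<programlisting>
import Data.Char (isAlpha, isLower, isUpper)
import Numeric (readHex)

main :: IO ()
main = do
  -- Accepted by GHC's Read instance, as described above.
  print (read "0xf00" :: Int)
  -- readHex takes the digits without the "0x" prefix.
  case readHex "f00" of
    [(n, "")] -> print (n :: Int)
    _         -> putStrLn "no parse"
  -- U+05D0 (Hebrew letter alef) is alphabetic but has no case, so it
  -- falls outside the Haskell 98 definition isUpper c || isLower c.
  let alef = '\x05D0'
  print (isAlpha alef, isUpper alef, isLower alef)</programlisting>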
 184     </sect3>
 185   </sect2>
 186
 187   <sect2 id="haskell98-undefined">
 188     <title>GHC's interpretation of undefined behaviour in
 189     Haskell&nbsp;98</title>
 190
 191     <para>This section documents GHC's take on various issues that are
 192     left undefined or implementation-specific in Haskell 98.</para>
 193
 194     <variablelist>
 195       <varlistentry>
 196         <term>
 197           The <literal>Char</literal> type
 198           <indexterm><primary><literal>Char</literal></primary><secondary>size of</secondary></indexterm>
 199         </term>
 200         <listitem>
 201           <para>Following the ISO-10646 standard,
 202           <literal>maxBound :: Char</literal> in GHC is
 203           <literal>0x10FFFF</literal>.</para>
 204         </listitem>
 205       </varlistentry>
 206
 207       <varlistentry>
 208         <term>
 209           Sized integral types
 210           <indexterm><primary><literal>Int</literal></primary><secondary>size of</secondary></indexterm>
 211         </term>
 212         <listitem>
 213           <para>In GHC the <literal>Int</literal> type follows the
 214           size of an address on the host architecture; in other words
 215           it holds 32 bits on a 32-bit machine, and 64 bits on a
 216           64-bit machine.</para>
 217
 218           <para>Arithmetic on <literal>Int</literal> is unchecked for
 219           overflow<indexterm><primary>overflow</primary><secondary><literal>Int</literal></secondary>
 220             </indexterm>, so all operations on <literal>Int</literal> happen
 221           modulo
 222           2<superscript><replaceable>n</replaceable></superscript>
 223           where <replaceable>n</replaceable> is the size in bits of
 224           the <literal>Int</literal> type.</para>
 225
 226           <para>The <literal>fromInteger</literal><indexterm><primary><literal>fromInteger</literal></primary>
 227             </indexterm> function (and hence
 228           also <literal>fromIntegral</literal><indexterm><primary><literal>fromIntegral</literal></primary>
 229             </indexterm>) is a special case when
 230           converting to <literal>Int</literal>.  The value of
 231           <literal>fromIntegral x :: Int</literal> is given by taking
 232           the lower <replaceable>n</replaceable> bits of <literal>(abs
 233           x)</literal>, multiplied by the sign of <literal>x</literal>
 234           (in 2's complement <replaceable>n</replaceable>-bit
 235           arithmetic).  This behaviour was chosen so that for example
 236           writing <literal>0xffffffff :: Int</literal> preserves the
 237           bit-pattern in the resulting <literal>Int</literal>.</para>
 238
 239
 240            <para>Negative literals, such as <literal>-3</literal>, are
 241              specified by (a careful reading of) the Haskell Report as
 242              meaning <literal>Prelude.negate (Prelude.fromInteger 3)</literal>.
 243              So <literal>-2147483648</literal> means <literal>negate (fromInteger 2147483648)</literal>.
 244              On a 32-bit machine, <literal>fromInteger</literal> takes the lower 32 bits of the representation, so
 245              <literal>fromInteger (2147483648::Integer)</literal>, computed at type <literal>Int</literal>, is
 246              <literal>-2147483648::Int</literal>.  The <literal>negate</literal> operation then
 247              overflows, but it is unchecked, so <literal>negate (-2147483648::Int)</literal> is just
 248              <literal>-2147483648</literal>.  In short, one can write <literal>minBound::Int</literal> as
 249              a literal with the expected meaning (but that is not in general guaranteed).
 250              </para>
 251
 252           <para>The <literal>fromIntegral</literal> function also
 253           preserves bit-patterns when converting between the sized
 254           integral types (<literal>Int8</literal>,
 255           <literal>Int16</literal>, <literal>Int32</literal>,
 256           <literal>Int64</literal> and the unsigned
 257           <literal>Word</literal> variants), see the modules
 258           <literal>Data.Int</literal> and <literal>Data.Word</literal>
 259           in the library documentation.</para>
 260         </listitem>
 261       </varlistentry>
 262
 263       <varlistentry>
 264         <term>Unchecked float arithmetic</term>
 265         <listitem>
 266           <para>Operations on <literal>Float</literal> and
 267           <literal>Double</literal> numbers are
 268           <emphasis>unchecked</emphasis> for overflow, underflow, and
 269           other sad occurrences.  (Note, however, that some
 270           architectures trap floating-point overflow and
 271           loss-of-precision and report a floating-point exception,
 272           probably terminating the
 273           program.)<indexterm><primary>floating-point
 274           exceptions</primary></indexterm></para>
 275         </listitem>
 276       </varlistentry>
 277     </variablelist>
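
    <para>The wrap-around and bit-pattern behaviour described above
    can be observed directly.  The program below is a sketch under
    stated assumptions: the particular values are our own choice, the
    sized types are used so that the results do not depend on the word
    size of the machine, and the last line assumes an architecture
    that does not trap floating-point overflow.</para>

<programlisting>
import Data.Int (Int8, Int32)
import Data.Word (Word8)

main :: IO ()
main = do
  -- Int arithmetic is unchecked and wraps around modulo 2^n.
  print ((maxBound :: Int) + 1 == minBound)
  -- Negating minBound overflows and yields minBound again.
  print (negate (minBound :: Int) == minBound)
  -- All 32 bits set: the bit pattern is preserved and reads as -1.
  print (fromIntegral (0xffffffff :: Integer) :: Int32)
  -- 300 is 0x12C; only the low 8 bits (0x2C = 44) are kept.
  print (fromIntegral (300 :: Int) :: Word8)
  -- 200 is 0xC8, which is -56 in 8-bit two's complement.
  print (fromIntegral (200 :: Word8) :: Int8)
  -- Floating-point overflow is not reported; the result is an infinity.
  print (isInfinite (1e308 * 10 :: Double))</programlisting>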
 278
 279     </sect2>
 280   </sect1>
 281
 282
 283   <sect1 id="bugs">
 284     <title>Known bugs or infelicities</title>
 285
 286     <para>In addition to the divergences from the Haskell 98 standard
 287     listed above, GHC has the following known bugs or
 288     infelicities.</para>
 289
 290   <sect2 id="bugs-ghc">
 291     <title>Bugs in GHC</title>
 292
 293     <itemizedlist>
 294       <listitem>
 295         <para>GHC can warn about non-exhaustive or overlapping
 296         patterns (see <xref linkend="options-sanity"/>), and usually
 297         does so correctly, but not always: it gets confused by
 298         string patterns and by guards, and can then emit bogus
 299         warnings.  The entire overlap-check code needs an
 300         overhaul.</para>
 301       </listitem>
 302
 303       <listitem>
 304         <para>GHC does not allow you to have a data type with a context
 305            that mentions type variables that are not data type parameters.
 306           For example, GHC rejects the declaration
 307 <programlisting>
 308   data C a b => T a = MkT a
 309 </programlisting>
 310           which would give <literal>MkT</literal> the type
 311 <programlisting>
 312   MkT :: forall a b. C a b => a -> T a
 313 </programlisting>
 314         In principle, with a suitable class declaration with a functional dependency,
 315          it's possible that this type is not ambiguous; but GHC nevertheless rejects
 316           it.  The type variables mentioned in the context of the data type declaration must
 317         be among the type parameters of the data type.</para>
 318       </listitem>
 319
 320       <listitem>
 321         <para>GHC's inliner can be persuaded into non-termination
 322         using the standard way to encode recursion via a data type:</para>
 323 <programlisting>
 324   data U = MkU (U -> Bool)
 325
 326   russel :: U -> Bool
 327   russel u@(MkU p) = not $ p u
 328
 329   x :: Bool
 330   x = russel (MkU russel)
 331 </programlisting>
 332
 333         <para>We have never found another class of programs, other
 334         than this contrived one, that makes GHC diverge, and fixing
 335         the problem would impose an extra overhead on every
 336         compilation.  So the bug remains unfixed.  There is more
 337         background in <ulink
 338         url="http://research.microsoft.com/~simonpj/Papers/inlining">
 339         Secrets of the GHC inliner</ulink>.</para>
 340       </listitem>
 341     </itemizedlist>
 342   </sect2>
 343
 344   <sect2 id="bugs-ghci">
 345     <title>Bugs in GHCi (the interactive GHC)</title>
 346     <itemizedlist>
 347       <listitem>
 348         <para>GHCi does not respect the <literal>default</literal>
 349         declaration in the module whose scope you are in.  Instead,
 350         for expressions typed at the command line, you always get the
 351         default default-type behaviour; that is,
 352         <literal>default(Integer,Double)</literal>.</para>
 353
 354         <para>It would be better for GHCi to record what the default
 355         settings in each module are, and use those of the 'current'
 356         module (whatever that is).</para>
 357       </listitem>
 358
 359       <listitem>
 360         <para>GHCi does not keep careful track of what instance
 361         declarations are 'in scope' if they come from other packages.
 362         Instead, all instance declarations that GHC has seen in other
 363         packages are all in scope everywhere, whether or not the
 364         module from that package is used by the command-line
 365         expression.</para>
 366       </listitem>
 367
 368       <listitem>
 369       <para>On Windows, there's a GNU ld/BFD bug
 370       whereby it emits bogus PE object files that have more than
 371       0xffff relocations. When GHCi tries to load a package affected by this
 372       bug, you get an error message of the form
 373 <screen>
 374 Loading package javavm ... linking ... WARNING: Overflown relocation field (# relocs found: 30765)
 375 </screen>
 376       The last time we looked, this bug still
 377       wasn't fixed in the BFD codebase, and there wasn't any
 378       noticeable interest in fixing it when we reported the bug
 379       back in 2001 or so.
 380       </para>
 381       <para>The workaround is to split up the .o files that make up
 382       your package into two or more .o's, along the lines of
 383       how the "base" package does it.</para>
 384       </listitem>
 385     </itemizedlist>
 386   </sect2>
 387   </sect1>
 388
 389 </chapter>
 390
 391 <!-- Emacs stuff:
 392      ;;; Local Variables: ***
 393      ;;; mode: xml ***
 394      ;;; sgml-parent-document: ("users_guide.xml" "book" "chapter") ***
 395      ;;; End: ***
 396  -->