Implement SSE2 floating-point support in the x86 native code generator (#594)
author Simon Marlow <marlowsd@gmail.com>
Thu, 4 Feb 2010 10:48:49 +0000
committer Simon Marlow <marlowsd@gmail.com>
Thu, 4 Feb 2010 10:48:49 +0000
commit 335b9f366ac440259318777c4c07e4fa42fbbec6
tree 6eaa6bee7a0af467c18ed1d42eb47b38c52a9169
parent d9f7177402769968e8f42b49c1941661e18c5773

The new flag -msse2 enables code generation for SSE2 on x86.  It
results in substantially faster floating-point performance; the main
reason for doing this was that our x87 code generation is appallingly
bad, and since we plan to drop -fvia-C soon, we need a way to generate
half-decent floating-point code.

The catch is that code compiled with -msse2 runs only on CPUs that
support SSE2 (P4+, AMD K8+).  We'll have to think hard about whether to
enable it by default for the libraries we ship.  In the meantime, at least
-msse2 should be an acceptable replacement for "-fvia-C
-optc-ffast-math -fexcess-precision".

SSE2 also has the advantage of performing all operations at the
correct precision, so floating-point results are consistent with other
platforms.
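
As a rough illustration of the precision point, here is a minimal sketch
of a test program; it is not part of this commit, and the file name
Precision.hs and the function sumDoubles are hypothetical.  Compiled on
x86 with something like "ghc -O -msse2 Precision.hs", every addition
should be rounded to 64-bit double precision.  With x87 code generation,
intermediates may be kept at extended precision, so the last digits may
differ.

```haskell
-- Hypothetical test program (Precision.hs), not part of this commit.
-- Assumed build command on x86:  ghc -O -msse2 Precision.hs
import Data.List (foldl')

-- Sum a list of Doubles with a strict left fold, so the running
-- total is computed one floating-point addition at a time.
sumDoubles :: [Double] -> Double
sumDoubles = foldl' (+) 0

main :: IO ()
main = do
  let xs = replicate 10 0.1 :: [Double]
  -- Print the sum and whether it compares equal to 1.0.  Comparing
  -- this output between an -msse2 build and an x87 build (or another
  -- platform) shows whether intermediate precision affected the result.
  print (sumDoubles xs)
  print (sumDoubles xs == 1.0)
```

Running the same binary source through both code generators and diffing
the output is a cheap way to spot excess-precision differences.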

I also tweaked the x87 code generation a bit while I was here; it is
now slightly less bad than before.

17 files changed:
compiler/main/DynFlags.hs
compiler/nativeGen/PPC/Ppr.hs
compiler/nativeGen/PPC/Regs.hs
compiler/nativeGen/Reg.hs
compiler/nativeGen/RegAlloc/Graph/Main.hs
compiler/nativeGen/RegAlloc/Graph/TrivColorable.hs
compiler/nativeGen/RegClass.hs
compiler/nativeGen/SPARC/Instr.hs
compiler/nativeGen/SPARC/Ppr.hs
compiler/nativeGen/SPARC/Regs.hs
compiler/nativeGen/X86/CodeGen.hs
compiler/nativeGen/X86/Instr.hs
compiler/nativeGen/X86/Ppr.hs
compiler/nativeGen/X86/RegInfo.hs
compiler/nativeGen/X86/Regs.hs
docs/users_guide/flags.xml
docs/users_guide/using.xml