Implement SSE2 floating-point support in the x86 native code generator (#594)
The new flag -msse2 enables code generation for SSE2 on x86. It
results in substantially faster floating-point performance; the main
reason for doing this was that our x87 code generation is appallingly
bad, and since we plan to drop -fvia-C soon, we need a way to generate
half-decent floating-point code.
The catch is that code built with -msse2 runs only on CPUs that support
SSE2 (P4+, AMD K8+). We'll have to think hard about whether we should
enable it by default for the libraries we ship. In the meantime, at least
-msse2 should be an acceptable replacement for "-fvia-C
-optc-ffast-math -fexcess-precision".
SSE2 also has the advantage of performing all operations at the
correct precision, so floating-point results are consistent with other
platforms.
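
To make the precision point concrete, here is a rough sketch in Python (not
code from this patch). It contrasts rounding to double after every operation,
which is what SSE2 scalar arithmetic does, with keeping a wider intermediate
and rounding once at the end. The function names are made up for illustration,
and Fraction is only a stand-in for a wider register format such as the x87's
80-bit type; it does not model x87 itself.

```python
from fractions import Fraction


def per_op_rounding(x: float) -> float:
    """Round to double after every operation (SSE2-style behaviour)."""
    return (x * 10.0) / 10.0


def wide_intermediate(x: float) -> float:
    """Keep the intermediate exact and round to double once at the end.

    Assumption: Fraction stands in for a wider intermediate format;
    this is an illustration, not an emulation of x87 registers.
    """
    return float(Fraction(x) * 10 / 10)


x = 1e308
print(per_op_rounding(x))    # x * 10 overflows the double range: inf
print(wide_intermediate(x))  # intermediate never overflows: 1e+308
```

With per-operation rounding the answer is the same on every platform that
implements IEEE doubles. With a wider intermediate, the answer depends on
whether and when the compiler spills the value to memory, which is the kind
of inconsistency the paragraph above refers to.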
I also tweaked the x87 code generation a bit while I was here; it is
now slightly less bad than before.
17 files changed: