<body BGCOLOR="FFFFFF">
<h1>The GHC Commentary - The Native Code Generator</h1>
<p>
- On x86 and sparc platforms, GHC can generate assembly code
+ On some platforms (currently x86 and PowerPC, with bitrotted
+ support for Sparc and Alpha), GHC can generate assembly code
directly, without having to go via C. This can sometimes almost
halve compilation time, and avoids the fragility and
- horribleness of the mangler. The NCG is enabled by default for
- non-optimising compilation on x86 and sparc. For most programs
- it generates code which runs only about 1% slower than that
+ horribleness of the <a href="mangler.html">mangler</a>. The NCG
+ is enabled by default for
+ non-optimising compilation on supported platforms. For most programs
+ it generates code which runs only 1-3% slower
+ (depending on platform and type of code) than that
created by gcc on x86s, so it is well worth using even with
optimised compilation. FP-intensive x86 programs see a bigger
- slowdown, and all sparc code runs about 5% slower due to
+ slowdown, and all Sparc code runs about 5% slower due to
us not filling branch delay slots.
<p>
- In the distant past
- the NCG could also generate Alpha code, and that machinery
- is still there, but will need extensive refurbishment to
- get it going again, due to underlying infrastructural changes.
- Budding hackers thinking of doing a PowerPC port would do well
- to use the sparc bits as a starting point.
- <p>
The NCG has always been something of a second-class citizen
inside GHC, an unloved child, rather. This means that its
integration into the compiler as a whole is rather clumsy, which
proper is fairly cleanly designed, as target-independent as it
reasonably can be, and so should not be difficult to retarget.
<p>
- The following details are correct as per the CVS head of end-Jan
- 2002.
+ <b>NOTE!</b> The native code generator was largely rewritten as part
+ of the C-- backend changes, around May 2004. Unfortunately the
+ rest of this document still refers to the old version, as it stood in
+ the CVS head at end-Jan 2002. Some of it is still relevant,
+ some of it isn't.
<h2>Overview</h2>
The top-level code generator fn is
implement on all targets, and their meaning is intended to be
unambiguous, and the same on all targets, regardless of word
size or endianness.
+ <p>
+ <b>A note on <code>MagicId</code>s.</b>
+ Those which are assigned to
+ registers on the current target are left unmodified. Those
+ which are not are stored in memory as offsets from
+ <code>BaseReg</code> (which is assumed to permanently have the
+ value <code>(&amp;MainCapability.r)</code>), so the constant folder
+ calculates the offsets and inserts suitable loads/stores. One
+ complication is that not all archs have <code>BaseReg</code>
+ itself in a register, so for those (Sparc), we instead
+ generate the address as an offset from the static symbol
+ <code>MainCapability</code>, since the register table lives in
+ there.
+ <p>
+ Finally, <code>BaseReg</code> does occasionally itself get
+ mentioned in Stix expression trees, and in this case what is
+ denoted is precisely <code>(&amp;MainCapability.r)</code>, not, as
+ in all other cases, the value of memory at some offset from
+ the start of the register table. Since what it denotes is an
+ r-value and not an l-value, assigning <code>BaseReg</code> is
+ meaningless, so the machinery checks to ensure this never
+ happens. All these details are taken into account by the
+ constant folder.
<p>
<li><b>Instruction selection.</b> This is the only majorly
target-specific phase. It turns Stix statements and
simplified <code>getRegister</code> scheme described above, in which
<code>iselExpr64</code> generates its results into two vregs which
can always safely be modified afterwards.
-
+<p>
Virtual registers are, unsurprisingly, distinguished by their
<code>Unique</code>s. There is a small difficulty in how to
know what the vreg for the upper 32 bits of a value is, given the vreg
<li>This doesn't work properly, because it doesn't observe the normal
conventions for x86 FP code generation. It turns out that each of
the 8 elements in the x86 FP register stack has a tag bit which
- indicates whether or not that slot contains a valid value. If you
- do a FPU operation which happens to read such a value, you get a
- x87 FPU exception, which is normally handled by the FPU without
- passing it to the OS: the program keeps going, but the resulting
- FP values are garbage. (The OS can ask for the FPU to pass it FP
- stack-invalid exceptions, but it usually doesn't).
+ indicates whether that register is notionally in use or
+ not. If you do an FPU operation which happens to read a
+ tagged-as-empty register, you get an x87 FPU (stack invalid)
+ exception, which is normally handled by the FPU without passing it
+ to the OS: the program keeps going, but the resulting FP values
+ are garbage. The OS can ask for the FPU to pass it FP
+ stack-invalid exceptions, but it usually doesn't.
<p>
- Anyways: inside NCG created x86 FP code this all works fine, but
- when control returns to a gcc-generated world, the stack tag bits
- soon cause stack exceptions, and thus garbage results.
+ Anyways: inside NCG-created x86 FP code this all works fine.
+ However, the NCG's fiction of a flat register set does not operate
+ the x87 register stack in the required stack-like way. When
+ control returns to a gcc-generated world, the stack tag bits soon
+ cause stack exceptions, and thus garbage results.
<p>
The only fix I could think of -- and it is horrible -- is to clear
all the tag bits just before the next STG-level entry, in chunks
comment in the code.
+<h3>Duplicate implementation for many STG macros</h3>
+
+This has been discussed at length already. It has caused a couple of
+nasty bugs due to subtle untracked divergence in the macro
+translations. The macro-expander really should be pushed up into the
+Abstract C phase, so the problem can't happen.
+<p>
+Doing so would have the added benefit that the NCG could be used to
+compile more "ways" -- well, at least the 'p' profiling way.
+
<h3>How to debug the NCG without losing your sanity/hair/cool</h3>