X-Git-Url: http://git.megacz.com/?a=blobdiff_plain;f=ghc%2Fdocs%2Fcomm%2Fthe-beast%2Fncg.html;h=5810a352126232bfe592597efad525e5a261f55e;hb=adc50ac8d406b712e093c808c916a8bd44731ee5;hp=0dea3e2ddd1f8948b99ad94eeee00690fef2f8c8;hpb=1100e42470e5baa05aef6833772805901398b48b;p=ghc-hetmet.git
diff --git a/ghc/docs/comm/the-beast/ncg.html b/ghc/docs/comm/the-beast/ncg.html
index 0dea3e2..5810a35 100644
--- a/ghc/docs/comm/the-beast/ncg.html
+++ b/ghc/docs/comm/the-beast/ncg.html
@@ -8,24 +8,20 @@

The GHC Commentary - The Native Code Generator

- On x86 and sparc platforms, GHC can generate assembly code
+ On some platforms (currently x86 and PowerPC, with bitrotted
+ support for Sparc and Alpha), GHC can generate assembly code
 directly, without having to go via C. This can sometimes almost halve compilation time, and avoids the fragility and
- horribleness of the mangler. The NCG is enabled by default for
- non-optimising compilation on x86 and sparc. For most programs
- it generates code which runs only about 1% slower than that
+ horribleness of the mangler. The NCG
+ is enabled by default for
+ non-optimising compilation on supported platforms. For most programs
+ it generates code which runs only 1-3% slower
+ (depending on platform and type of code) than that
 created by gcc on x86s, so it is well worth using even with optimised compilation. FP-intensive x86 programs see a bigger
- slowdown, and all sparc code runs about 5% slower due to
+ slowdown, and all Sparc code runs about 5% slower due to
 us not filling branch delay slots.

- In the distant past
- the NCG could also generate Alpha code, and that machinery
- is still there, but will need extensive refurbishment to
- get it going again, due to underlying infrastructural changes.
- Budding hackers thinking of doing a PowerPC port would do well
- to use the sparc bits as a starting point.
-

 The NCG has always been something of a second-class citizen inside GHC, an unloved child, rather. This means that its integration into the compiler as a whole is rather clumsy, which
@@ -33,8 +29,11 @@
 proper is fairly cleanly designed, as target-independent as it reasonably can be, and so should not be difficult to retarget.

- The following details are correct as per the CVS head of end-Jan
- 2002.
+ NOTE! The native code generator was largely rewritten as part
+ of the C-- backend changes, around May 2004. Unfortunately the
+ rest of this document still refers to the old version, and was written
+ with relation to the CVS head as of end-Jan 2002. Some of it is relevant,
+ some of it isn't.

Overview

 The top-level code generator fn is
@@ -96,6 +95,29 @@
 implement on all targets, and their meaning is intended to be unambiguous, and the same on all targets, regardless of word size or endianness.
+

+ A note on MagicIds.
+ Those which are assigned to
+ registers on the current target are left unmodified. Those
+ which are not are stored in memory as offsets from
+ BaseReg (which is assumed to permanently have the
+ value (&MainCapability.r)), so the constant folder
+ calculates the offsets and inserts suitable loads/stores. One
+ complication is that not all archs have BaseReg
+ itself in a register, so for those (sparc), we instead
+ generate the address as an offset from the static symbol
+ MainCapability, since the register table lives in
+ there.

+ Finally, BaseReg does occasionally itself get
+ mentioned in Stix expression trees, and in this case what is
+ denoted is precisely (&MainCapability.r), not, as
+ in all other cases, the value of memory at some offset from
+ the start of the register table. Since what it denotes is an
+ r-value and not an l-value, assigning BaseReg is
+ meaningless, so the machinery checks to ensure this never
+ happens. All these details are taken into account by the
+ constant folder.

 • Instruction selection. This is the only majorly target-specific phase. It turns Stix statements and
@@ -435,7 +457,7 @@
 bit code as simply as possible. To this end, I used the simplified getRegister scheme described above, in which iselExpr64 generates its results into two vregs which can always safely be modified afterwards.
-
+

 Virtual registers are, unsurprisingly, distinguished by their Uniques. There is a small difficulty in how to know what the vreg for the upper 32 bits of a value is, given the vreg
@@ -593,16 +615,19 @@
 There are, however, two unforeseen bad side effects:

 • This doesn't work properly, because it doesn't observe the normal conventions for x86 FP code generation. It turns out that each of the 8 elements in the x86 FP register stack has a tag bit which
- indicates whether or not that slot contains a valid value. If you
- do a FPU operation which happens to read such a value, you get a
- x87 FPU exception, which is normally handled by the FPU without
- passing it to the OS: the program keeps going, but the resulting
- FP values are garbage. (The OS can ask for the FPU to pass it FP
- stack-invalid exceptions, but it usually doesn't).
+ indicates whether or not that register is notionally in use or
+ not. If you do a FPU operation which happens to read a
+ tagged-as-empty register, you get an x87 FPU (stack invalid)
+ exception, which is normally handled by the FPU without passing it
+ to the OS: the program keeps going, but the resulting FP values
+ are garbage. The OS can ask for the FPU to pass it FP
+ stack-invalid exceptions, but it usually doesn't.

- Anyways: inside NCG created x86 FP code this all works fine, but
- when control returns to a gcc-generated world, the stack tag bits
- soon cause stack exceptions, and thus garbage results.
+ Anyways: inside NCG created x86 FP code this all works fine.
+ However, the NCG's fiction of a flat register set does not operate
+ the x87 register stack in the required stack-like way. When
+ control returns to a gcc-generated world, the stack tag bits soon
+ cause stack exceptions, and thus garbage results.

 The only fix I could think of -- and it is horrible -- is to clear all the tag bits just before the next STG-level entry, in chunks
@@ -635,6 +660,16 @@
 some unnecessary reg-reg moves. The reason is explained in a comment in the code.
+

    Duplicate implementation for many STG macros

+
+This has been discussed at length already. It has caused a couple of
+nasty bugs due to subtle untracked divergence in the macro
+translations. The macro-expander really should be pushed up into the
+Abstract C phase, so the problem can't happen.
+

+Doing so would have the added benefit that the NCG could be used to
+compile more "ways" -- well, at least the 'p' profiling way.
+

    How to debug the NCG without losing your sanity/hair/cool