X-Git-Url: http://git.megacz.com/?a=blobdiff_plain;f=ghc%2Fdocs%2Fcomm%2Fthe-beast%2Fncg.html;h=5810a352126232bfe592597efad525e5a261f55e;hb=adc50ac8d406b712e093c808c916a8bd44731ee5;hp=0dea3e2ddd1f8948b99ad94eeee00690fef2f8c8;hpb=1100e42470e5baa05aef6833772805901398b48b;p=ghc-hetmet.git
diff --git a/ghc/docs/comm/the-beast/ncg.html b/ghc/docs/comm/the-beast/ncg.html
index 0dea3e2..5810a35 100644
--- a/ghc/docs/comm/the-beast/ncg.html
+++ b/ghc/docs/comm/the-beast/ncg.html
@@ -8,24 +8,20 @@
- On x86 and sparc platforms, GHC can generate assembly code
+ On some platforms (currently x86 and PowerPC, with bitrotted
+ support for Sparc and Alpha), GHC can generate assembly code
directly, without having to go via C. This can sometimes almost halve compilation time, and avoids the fragility and
- horribleness of the mangler. The NCG is enabled by default for
- non-optimising compilation on x86 and sparc. For most programs
- it generates code which runs only about 1% slower than that
+ horribleness of the mangler. The NCG
+ is enabled by default for
+ non-optimising compilation on supported platforms. For most programs
+ it generates code which runs only 1-3% slower
+ (depending on platform and type of code) than that
created by gcc on x86s, so it is well worth using even with optimised compilation. FP-intensive x86 programs see a bigger
- slowdown, and all sparc code runs about 5% slower due to
+ slowdown, and all Sparc code runs about 5% slower due to
us not filling branch delay slots.
- In the distant past
- the NCG could also generate Alpha code, and that machinery
- is still there, but will need extensive refurbishment to
- get it going again, due to underlying infrastructural changes.
- Budding hackers thinking of doing a PowerPC port would do well
- to use the sparc bits as a starting point.
-
The NCG has always been something of a second-class citizen inside GHC, an unloved child, rather. This means that its integration into the compiler as a whole is rather clumsy, which
@@ -33,8 +29,11 @@
proper is fairly cleanly designed, as target-independent as it reasonably can be, and so should not be difficult to retarget.
- The following details are correct as per the CVS head of end-Jan
- 2002.
+ NOTE! The native code generator was largely rewritten as part
+ of the C-- backend changes, around May 2004. Unfortunately the
+ rest of this document still refers to the old version, and was written
+ with relation to the CVS head as of end-Jan 2002. Some of it is relevant,
+ some of it isn't.
+ A note on MagicIds.
+ Those which are assigned to
+ registers on the current target are left unmodified. Those
+ which are not are stored in memory as offsets from
+ BaseReg (which is assumed to permanently have the
+ value (&MainCapability.r)), so the constant folder
+ calculates the offsets and inserts suitable loads/stores. One
+ complication is that not all archs have BaseReg
+ itself in a register, so for those (sparc), we instead
+ generate the address as an offset from the static symbol
+ MainCapability, since the register table lives in
+ there.
+
+ Finally, BaseReg does occasionally itself get
+ mentioned in Stix expression trees, and in this case what is
+ denoted is precisely (&MainCapability.r), not, as
+ in all other cases, the value of memory at some offset from
+ the start of the register table. Since what it denotes is an
+ r-value and not an l-value, assigning BaseReg is
+ meaningless, so the machinery checks to ensure this never
+ happens. All these details are taken into account by the
+ constant folder.
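The MagicId treatment described in the added text above can be sketched roughly as follows. This is an illustration only, not the NCG's actual code: the Expr, Stmt and Target types, the functions lowerExpr, lowerStmt, baseAddr and slotAddr, and all offsets are hypothetical stand-ins for the real Stix machinery. Only the names MagicId, BaseReg and MainCapability come from the text.

```haskell
module Main where

-- Hypothetical, much-simplified stand-ins for the Stix types.
data MagicId = BaseReg | Sp | Hp | VanillaReg Int
  deriving (Eq, Show)

data Expr
  = Reg MagicId      -- the value of a magic id
  | Sym String       -- the address of a static symbol
  | Lit Int
  | Add Expr Expr
  | Load Expr        -- the value of memory at an address
  deriving (Eq, Show)

data Stmt = Assign MagicId Expr | Store Expr Expr
  deriving (Eq, Show)

-- Assumed description of a target; every field is invented for this sketch.
data Target = Target
  { inRealReg      :: MagicId -> Bool  -- is the id pinned to a machine register?
  , regTableOffset :: MagicId -> Int   -- offset of the id's slot in the register table
  , regTableStart  :: Int              -- offset of the table inside MainCapability
  }

-- What BaseReg denotes as an r-value: the address of the register table.
baseAddr :: Target -> Expr
baseAddr t
  | inRealReg t BaseReg = Reg BaseReg
  | otherwise           = Add (Sym "MainCapability") (Lit (regTableStart t))

-- Address of the memory slot of a magic id that is not in a register,
-- with the two constant offsets already folded into one literal.
slotAddr :: Target -> MagicId -> Expr
slotAddr t m
  | inRealReg t BaseReg = Add (Reg BaseReg) (Lit (regTableOffset t m))
  | otherwise =
      Add (Sym "MainCapability") (Lit (regTableStart t + regTableOffset t m))

-- Rewrite reads of magic ids: register-resident ids are left unmodified,
-- the rest become loads from their register-table slots.
lowerExpr :: Target -> Expr -> Expr
lowerExpr t e = case e of
  Reg BaseReg | not (inRealReg t BaseReg) -> baseAddr t
  Reg m
    | inRealReg t m -> e
    | otherwise     -> Load (slotAddr t m)
  Add a b -> Add (lowerExpr t a) (lowerExpr t b)
  Load a  -> Load (lowerExpr t a)
  _       -> e

-- Rewrite writes likewise, rejecting any assignment to BaseReg.
lowerStmt :: Target -> Stmt -> Either String Stmt
lowerStmt _ (Assign BaseReg _) = Left "assignment to BaseReg is meaningless"
lowerStmt t (Assign m rhs)
  | inRealReg t m = Right (Assign m (lowerExpr t rhs))
  | otherwise     = Right (Store (slotAddr t m) (lowerExpr t rhs))
lowerStmt t (Store a v) = Right (Store (lowerExpr t a) (lowerExpr t v))

main :: IO ()
main = do
  let offs m = case m of      -- made-up slot offsets
        VanillaReg n -> 4 * (n - 1)
        Sp           -> 64
        Hp           -> 72
        BaseReg      -> 0
      withBase    = Target (`elem` [BaseReg, Sp]) offs 16  -- BaseReg in a register
      withoutBase = Target (== Sp) offs 16                 -- BaseReg not in a register
  print (lowerExpr withBase (Reg Hp))
  print (lowerExpr withoutBase (Reg Hp))
  print (lowerExpr withoutBase (Reg BaseReg))
  print (lowerStmt withoutBase (Assign Hp (Add (Reg Hp) (Lit 8))))
  print (lowerStmt withBase (Assign BaseReg (Lit 0)))
```

In this sketch the withoutBase target plays the role of an arch, such as sparc in the text, on which BaseReg itself has no register, so every slot address is expressed relative to the static symbol instead.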
getRegister scheme described above, in which iselExpr64 generates its results into two vregs which
can always safely be modified afterwards.
-
+
Virtual registers are, unsurprisingly, distinguished by their Uniques. There is a small difficulty in how to
know what the vreg for the upper 32 bits of a value is, given the vreg
@@ -593,16 +615,19 @@ There are, however, two unforeseen bad side effects:
- Anyways: inside NCG created x86 FP code this all works fine, but
- when control returns to a gcc-generated world, the stack tag bits
- soon cause stack exceptions, and thus garbage results.
+ Anyways: inside NCG created x86 FP code this all works fine.
+ However, the NCG's fiction of a flat register set does not operate
+ the x87 register stack in the required stack-like way. When
+ control returns to a gcc-generated world, the stack tag bits soon
+ cause stack exceptions, and thus garbage results.
The only fix I could think of -- and it is horrible -- is to clear all the tag bits just before the next STG-level entry, in chunks
@@ -635,6 +660,16 @@
some unnecessary reg-reg moves. The reason is explained in a comment in the code.
+
+Doing so would have the added benefit that the NCG could be used to
+compile more "ways" -- well, at least the 'p' profiling way.
+