2 % (c) The AQUA Project, Glasgow University, 1993-1998
6 module AsmCodeGen ( nativeCodeGen ) where
8 #include "HsVersions.h"
9 #include "nativeGen/NCG.h"
12 import List ( intersperse )
19 import AbsCStixGen ( genCodeAbstractC )
20 import AbsCSyn ( AbstractC, MagicId )
21 import AsmRegAlloc ( runRegAllocate )
22 import PrimOp ( commutableOp, PrimOp(..) )
23 import RegAllocInfo ( mkMRegsState, MRegsState, findReservedRegs )
24 import Stix ( StixTree(..), StixReg(..),
25 pprStixTrees, ppStixTree, CodeSegment(..),
26 stixCountTempUses, stixSubst,
27 NatM, initNat, mapNat,
28 NatM_State, mkNatM_State,
29 uniqOfNatM_State, deltaOfNatM_State )
30 import PrimRep ( isFloatingRep, PrimRep(..) )
31 import UniqSupply ( returnUs, thenUs, mapUs, initUs,
32 initUs_, UniqSM, UniqSupply )
33 import MachMisc ( IF_ARCH_i386(i386_insert_ffrees,) )
35 import OrdList ( fromOL, concatOL )
40 The 96/03 native-code generator has machine-independent and
41 machine-dependent modules (those \tr{#include}'ing \tr{NCG.h}).
43 This module (@AsmCodeGen@) is the top-level machine-independent
44 module. It uses @AbsCStixGen.genCodeAbstractC@ to produce @StixTree@s
45 (defined in module @Stix@), using support code from @StixInfo@ (info
46 tables), @StixPrim@ (primitive operations), @StixMacro@ (Abstract C
47 macros), and @StixInteger@ (GMP arbitrary-precision operations).
49 Before entering machine-dependent land, we do some machine-independent
50 @genericOpt@imisations (defined below) on the @StixTree@s.
52 We convert to the machine-specific @Instr@ datatype with
53 @stmt2Instrs@, assuming an ``infinite'' supply of registers. We then
54 use a machine-independent register allocator (@runRegAllocate@) to
55 rejoin reality. Obviously, @runRegAllocate@ has machine-specific
56 helper functions (see about @RegAllocInfo@ below).
58 The machine-dependent bits break down as follows:
60 \item[@MachRegs@:] Everything about the target platform's machine
61 registers (and immediate operands, and addresses, which tend to
62 intermingle/interact with registers).
64 \item[@MachMisc@:] Includes the @Instr@ datatype (possibly should
65 have a module of its own), plus a miscellany of other things
66 (e.g., @targetDoubleSize@, @smStablePtrTable@, ...)
68 \item[@MachCode@:] @stmt2Instrs@ is where @Stix@ stuff turns into
71 \item[@PprMach@:] @pprInstr@ turns an @Instr@ into text (well, really
74 \item[@RegAllocInfo@:] In the register allocator, we manipulate
75 @MRegsState@s, which are @BitSet@s, one bit per machine register.
76 When we want to say something about a specific machine register
77 (e.g., ``it gets clobbered by this instruction''), we set/unset
78 its bit. Obviously, we do this @BitSet@ thing for efficiency
81 The @RegAllocInfo@ module collects together the machine-specific
82 info needed to do register allocation.
88 nativeCodeGen :: AbstractC -> UniqSupply -> (SDoc, SDoc)
90 = let (stixRaw, us1) = initUs us (genCodeAbstractC absC)
91 stixOpt = map genericOpt stixRaw
92 insns = initUs_ us1 (codeGen stixOpt)
93 debug_stix = vcat (map pprStixTrees stixOpt)
95 trace "nativeGen: begin"
99 @codeGen@ is the top-level code-generation function:
101 codeGen :: [[StixTree]] -> UniqSM SDoc
104 = mapUs genMachCode stixFinal `thenUs` \ dynamic_codes ->
106 fp_kludge :: [Instr] -> [Instr]
107 fp_kludge = IF_ARCH_i386(i386_insert_ffrees,id)
109 static_instrss :: [[Instr]]
110 static_instrss = map fp_kludge (scheduleMachCode dynamic_codes)
111 docs = map (vcat . map pprInstr) static_instrss
113 -- for debugging only
114 docs_prealloc = map (vcat . map pprInstr . fromOL)
116 text_prealloc = vcat (intersperse (char ' ' $$ char ' ') docs_prealloc)
118 --trace (showSDoc text_prealloc) (
119 returnUs (vcat (intersperse (char ' '
120 $$ ptext SLIT("# ___stg_split_marker")
126 Top level code generator for a chunk of stix code. For this part of
127 the computation, we switch from the UniqSM monad to the NatM monad.
128 The latter carries not only a Unique, but also an Int denoting the
129 current C stack pointer offset in the generated code; this is needed
130 for creating correct spill offsets on architectures which don't offer,
131 or for which it would be prohibitively expensive to employ, a frame
132 pointer register. Viz, x86.
134 The offset is measured in bytes, and indicates the difference between
135 the current (simulated) C stack-ptr and the value it was at the
136 beginning of the block. For stacks which grow down, this value should
137 be either zero or negative.
139 Switching between the two monads whilst carrying along the same Unique
140 supply breaks abstraction. Is that bad?
143 genMachCode :: [StixTree] -> UniqSM InstrBlock
145 genMachCode stmts initial_us
146 = let initial_st = mkNatM_State initial_us 0
147 (blocks, final_st) = initNat initial_st
148 (mapNat stmt2Instrs stmts)
149 instr_list = concatOL blocks
150 final_us = uniqOfNatM_State final_st
151 final_delta = deltaOfNatM_State final_st
154 then (instr_list, final_us)
155 else pprPanic "genMachCode: nonzero final delta"
159 The next bit does the code scheduling. The scheduler must also deal
160 with register allocation of temporaries. Much parallelism can be
161 exposed via the OrdList, but more might occur, so further analysis
165 scheduleMachCode :: [InstrBlock] -> [[Instr]]
168 = map (runRegAllocate freeRegsState findReservedRegs)
170 freeRegsState = mkMRegsState (extractMappedRegNos freeRegs)
173 %************************************************************************
175 \subsection[NCOpt]{The Generic Optimiser}
177 %************************************************************************
179 This is called between translating Abstract C to its Tree and actually
180 using the Native Code Generator to generate the annotations. It's a
181 chance to do some strength reductions.
183 ** Remember these all have to be machine independent ***
185 Note that constant-folding should have already happened, but we might
186 have introduced some new opportunities for constant-folding wrt
187 address manipulations.
190 genericOpt :: [StixTree] -> [StixTree]
191 genericOpt = map stixConFold . stixPeep
195 stixPeep :: [StixTree] -> [StixTree]
197 -- This transformation assumes that the temp assigned to in t1
198 -- is not assigned to in t2; for otherwise the target of the
199 -- second assignment would be substituted for, giving nonsense
200 -- code. As far as I can see, StixTemps are only ever assigned
201 -- to once. It would be nice to be sure!
203 stixPeep ( t1@(StAssign pka (StReg (StixTemp u pk)) rhs)
206 | stixCountTempUses u t2 == 1
207 && sum (map (stixCountTempUses u) ts) == 0
208 = trace ("nativeGen: stixInline: " ++ showSDoc (ppStixTree rhs))
209 (stixPeep (stixSubst u rhs t2 : ts))
211 stixPeep (t1:t2:ts) = t1 : stixPeep (t2:ts)
216 -- disable stix inlining until we figure out how to fix the
217 -- latent bugs in the register allocator which are exposed by
222 For most nodes, just optimize the children.
225 stixConFold :: StixTree -> StixTree
227 stixConFold (StInd pk addr) = StInd pk (stixConFold addr)
229 stixConFold (StAssign pk dst src)
230 = StAssign pk (stixConFold dst) (stixConFold src)
232 stixConFold (StJump addr) = StJump (stixConFold addr)
234 stixConFold (StCondJump addr test)
235 = StCondJump addr (stixConFold test)
237 stixConFold (StCall fn cconv pk args)
238 = StCall fn cconv pk (map stixConFold args)
241 Fold indices together when the types match:
243 stixConFold (StIndex pk (StIndex pk' base off) off')
245 = StIndex pk (stixConFold base)
246 (stixConFold (StPrim IntAddOp [off, off']))
248 stixConFold (StIndex pk base off)
249 = StIndex pk (stixConFold base) (stixConFold off)
252 For PrimOps, we first optimize the children, and then we try our hand
253 at some constant-folding.
256 stixConFold (StPrim op args) = stixPrimFold op (map stixConFold args)
259 Replace register leaves with appropriate StixTrees for the given
263 stixConFold leaf@(StReg (StixMagicId id))
264 = case (stgReg id) of
265 Always tree -> stixConFold tree
268 stixConFold other = other
271 Now, try to constant-fold the PrimOps. The arguments have already
272 been optimized and folded.
276 :: PrimOp -- The operation from an StPrim
277 -> [StixTree] -- The optimized arguments
280 stixPrimFold op arg@[StInt x]
282 IntNegOp -> StInt (-x)
285 stixPrimFold op args@[StInt x, StInt y]
287 CharGtOp -> StInt (if x > y then 1 else 0)
288 CharGeOp -> StInt (if x >= y then 1 else 0)
289 CharEqOp -> StInt (if x == y then 1 else 0)
290 CharNeOp -> StInt (if x /= y then 1 else 0)
291 CharLtOp -> StInt (if x < y then 1 else 0)
292 CharLeOp -> StInt (if x <= y then 1 else 0)
293 IntAddOp -> StInt (x + y)
294 IntSubOp -> StInt (x - y)
295 IntMulOp -> StInt (x * y)
296 IntQuotOp -> StInt (x `quot` y)
297 IntRemOp -> StInt (x `rem` y)
298 IntGtOp -> StInt (if x > y then 1 else 0)
299 IntGeOp -> StInt (if x >= y then 1 else 0)
300 IntEqOp -> StInt (if x == y then 1 else 0)
301 IntNeOp -> StInt (if x /= y then 1 else 0)
302 IntLtOp -> StInt (if x < y then 1 else 0)
303 IntLeOp -> StInt (if x <= y then 1 else 0)
304 -- ToDo: WordQuotOp, WordRemOp.
308 When possible, shift the constants to the right-hand side, so that we
309 can match for strength reductions. Note that the code generator will
310 also assume that constants have been shifted to the right when
314 stixPrimFold op [x@(StInt _), y] | commutableOp op = stixPrimFold op [y, x]
317 We can often do something with constants of 0 and 1 ...
320 stixPrimFold op args@[x, y@(StInt 0)]
333 IntNeOp | is_comparison -> x
338 StPrim opp [_, _] -> opp `elem` comparison_ops
341 stixPrimFold op args@[x, y@(StInt 1)]
349 Now look for multiplication/division by powers of 2 (integers).
352 stixPrimFold op args@[x, y@(StInt n)]
354 IntMulOp -> case exactLog2 n of
355 Nothing -> StPrim op args
356 Just p -> StPrim ISllOp [x, StInt p]
357 IntQuotOp -> case exactLog2 n of
358 Nothing -> StPrim op args
359 Just p -> StPrim ISrlOp [x, StInt p]
363 Anything else is just too hard.
366 stixPrimFold op args = StPrim op args
371 = [ CharGtOp , CharGeOp , CharEqOp , CharNeOp , CharLtOp , CharLeOp,
372 IntGtOp , IntGeOp , IntEqOp , IntNeOp , IntLtOp , IntLeOp,
373 WordGtOp , WordGeOp , WordEqOp , WordNeOp , WordLtOp , WordLeOp,
374 AddrGtOp , AddrGeOp , AddrEqOp , AddrNeOp , AddrLtOp , AddrLeOp,
375 FloatGtOp , FloatGeOp , FloatEqOp , FloatNeOp , FloatLtOp , FloatLeOp,
376 DoubleGtOp, DoubleGeOp, DoubleEqOp, DoubleNeOp, DoubleLtOp, DoubleLeOp