2 % (c) The AQUA Project, Glasgow University, 1993-1998
6 module AsmCodeGen ( nativeCodeGen ) where
8 #include "HsVersions.h"
9 #include "nativeGen/NCG.h"
12 import List ( intersperse )
19 import AbsCStixGen ( genCodeAbstractC )
20 import AbsCSyn ( AbstractC, MagicId )
21 import AsmRegAlloc ( runRegAllocate )
22 import PrimOp ( commutableOp, PrimOp(..) )
23 import RegAllocInfo ( mkMRegsState, MRegsState, findReservedRegs )
24 import Stix ( StixTree(..), StixReg(..),
25 pprStixTrees, ppStixTree, CodeSegment(..),
26 stixCountTempUses, stixSubst,
27 NatM, initNat, mapNat,
28 NatM_State, mkNatM_State,
29 uniqOfNatM_State, deltaOfNatM_State )
30 import PrimRep ( isFloatingRep, PrimRep(..) )
31 import UniqSupply ( returnUs, thenUs, mapUs, initUs,
32 initUs_, UniqSM, UniqSupply )
33 import MachMisc ( IF_ARCH_i386(i386_insert_ffrees,) )
35 import OrdList ( fromOL, concatOL )
40 The 96/03 native-code generator has machine-independent and
41 machine-dependent modules (those \tr{#include}'ing \tr{NCG.h}).
43 This module (@AsmCodeGen@) is the top-level machine-independent
44 module. It uses @AbsCStixGen.genCodeAbstractC@ to produce @StixTree@s
45 (defined in module @Stix@), using support code from @StixInfo@ (info
46 tables), @StixPrim@ (primitive operations), @StixMacro@ (Abstract C
47 macros), and @StixInteger@ (GMP arbitrary-precision operations).
49 Before entering machine-dependent land, we do some machine-independent
50 @genericOpt@imisations (defined below) on the @StixTree@s.
52 We convert to the machine-specific @Instr@ datatype with
53 @stmt2Instrs@, assuming an ``infinite'' supply of registers. We then
54 use a machine-independent register allocator (@runRegAllocate@) to
55 rejoin reality. Obviously, @runRegAllocate@ has machine-specific
56 helper functions (see about @RegAllocInfo@ below).
58 The machine-dependent bits break down as follows:
60 \item[@MachRegs@:] Everything about the target platform's machine
61 registers (and immediate operands, and addresses, which tend to
62 intermingle/interact with registers).
64 \item[@MachMisc@:] Includes the @Instr@ datatype (possibly should
65 have a module of its own), plus a miscellany of other things
66 (e.g., @targetDoubleSize@, @smStablePtrTable@, ...)
68 \item[@MachCode@:] @stmt2Instrs@ is where @Stix@ stuff turns into
71 \item[@PprMach@:] @pprInstr@ turns an @Instr@ into text (well, really
74 \item[@RegAllocInfo@:] In the register allocator, we manipulate
75 @MRegsState@s, which are @BitSet@s, one bit per machine register.
76 When we want to say something about a specific machine register
77 (e.g., ``it gets clobbered by this instruction''), we set/unset
78 its bit. Obviously, we do this @BitSet@ thing for efficiency
81 The @RegAllocInfo@ module collects together the machine-specific
82 info needed to do register allocation.
88 nativeCodeGen :: AbstractC -> UniqSupply -> (SDoc, SDoc)
90 = let (stixRaw, us1) = initUs us (genCodeAbstractC absC)
91 stixOpt = map genericOpt stixRaw
92 insns = initUs_ us1 (codeGen stixOpt)
93 debug_stix = vcat (map pprStixTrees stixOpt)
94 in {- trace "nativeGen: begin" -} (debug_stix, insns)
97 @codeGen@ is the top-level code-generation function:
99 codeGen :: [[StixTree]] -> UniqSM SDoc
102 = mapUs genMachCode stixFinal `thenUs` \ dynamic_codes ->
104 fp_kludge :: [Instr] -> [Instr]
105 fp_kludge = IF_ARCH_i386(i386_insert_ffrees,id)
107 static_instrss :: [[Instr]]
108 static_instrss = map fp_kludge (scheduleMachCode dynamic_codes)
109 docs = map (vcat . map pprInstr) static_instrss
111 -- for debugging only
112 docs_prealloc = map (vcat . map pprInstr . fromOL)
114 text_prealloc = vcat (intersperse (char ' ' $$ char ' ') docs_prealloc)
116 --trace (showSDoc text_prealloc) (
117 returnUs (vcat (intersperse (char ' '
118 $$ ptext SLIT("# ___stg_split_marker")
124 Top level code generator for a chunk of stix code. For this part of
125 the computation, we switch from the UniqSM monad to the NatM monad.
126 The latter carries not only a Unique, but also an Int denoting the
127 current C stack pointer offset in the generated code; this is needed
128 for creating correct spill offsets on architectures which don't offer,
129 or for which it would be prohibitively expensive to employ, a frame
130 pointer register. Viz, x86.
132 The offset is measured in bytes, and indicates the difference between
133 the current (simulated) C stack-ptr and the value it was at the
134 beginning of the block. For stacks which grow down, this value should
135 be either zero or negative.
137 Switching between the two monads whilst carrying along the same Unique
138 supply breaks abstraction. Is that bad?
141 genMachCode :: [StixTree] -> UniqSM InstrBlock
143 genMachCode stmts initial_us
144 = let initial_st = mkNatM_State initial_us 0
145 (blocks, final_st) = initNat initial_st
146 (mapNat stmt2Instrs stmts)
147 instr_list = concatOL blocks
148 final_us = uniqOfNatM_State final_st
149 final_delta = deltaOfNatM_State final_st
152 then (instr_list, final_us)
153 else pprPanic "genMachCode: nonzero final delta"
157 The next bit does the code scheduling. The scheduler must also deal
158 with register allocation of temporaries. Much parallelism can be
159 exposed via the OrdList, but more might occur, so further analysis
163 scheduleMachCode :: [InstrBlock] -> [[Instr]]
166 = map (runRegAllocate freeRegsState findReservedRegs)
168 freeRegsState = mkMRegsState (extractMappedRegNos freeRegs)
171 %************************************************************************
173 \subsection[NCOpt]{The Generic Optimiser}
175 %************************************************************************
177 This is called between translating Abstract C to its Tree and actually
178 using the Native Code Generator to generate the annotations. It's a
179 chance to do some strength reductions.
181 ** Remember these all have to be machine independent ***
183 Note that constant-folding should have already happened, but we might
184 have introduced some new opportunities for constant-folding wrt
185 address manipulations.
188 genericOpt :: [StixTree] -> [StixTree]
189 genericOpt = map stixConFold . stixPeep
193 stixPeep :: [StixTree] -> [StixTree]
195 -- This transformation assumes that the temp assigned to in t1
196 -- is not assigned to in t2; for otherwise the target of the
197 -- second assignment would be substituted for, giving nonsense
198 -- code. As far as I can see, StixTemps are only ever assigned
199 -- to once. It would be nice to be sure!
201 stixPeep ( t1@(StAssign pka (StReg (StixTemp u pk)) rhs)
204 | stixCountTempUses u t2 == 1
205 && sum (map (stixCountTempUses u) ts) == 0
206 = trace ("nativeGen: stixInline: " ++ showSDoc (ppStixTree rhs))
207 (stixPeep (stixSubst u rhs t2 : ts))
209 stixPeep (t1:t2:ts) = t1 : stixPeep (t2:ts)
214 -- disable stix inlining until we figure out how to fix the
215 -- latent bugs in the register allocator which are exposed by
220 For most nodes, just optimize the children.
223 stixConFold :: StixTree -> StixTree
225 stixConFold (StInd pk addr) = StInd pk (stixConFold addr)
227 stixConFold (StAssign pk dst src)
228 = StAssign pk (stixConFold dst) (stixConFold src)
230 stixConFold (StJump addr) = StJump (stixConFold addr)
232 stixConFold (StCondJump addr test)
233 = StCondJump addr (stixConFold test)
235 stixConFold (StCall fn cconv pk args)
236 = StCall fn cconv pk (map stixConFold args)
239 Fold indices together when the types match:
241 stixConFold (StIndex pk (StIndex pk' base off) off')
243 = StIndex pk (stixConFold base)
244 (stixConFold (StPrim IntAddOp [off, off']))
246 stixConFold (StIndex pk base off)
247 = StIndex pk (stixConFold base) (stixConFold off)
250 For PrimOps, we first optimize the children, and then we try our hand
251 at some constant-folding.
254 stixConFold (StPrim op args) = stixPrimFold op (map stixConFold args)
257 Replace register leaves with appropriate StixTrees for the given
261 stixConFold leaf@(StReg (StixMagicId id))
262 = case (stgReg id) of
263 Always tree -> stixConFold tree
266 stixConFold other = other
269 Now, try to constant-fold the PrimOps. The arguments have already
270 been optimized and folded.
274 :: PrimOp -- The operation from an StPrim
275 -> [StixTree] -- The optimized arguments
278 stixPrimFold op arg@[StInt x]
280 IntNegOp -> StInt (-x)
283 stixPrimFold op args@[StInt x, StInt y]
285 CharGtOp -> StInt (if x > y then 1 else 0)
286 CharGeOp -> StInt (if x >= y then 1 else 0)
287 CharEqOp -> StInt (if x == y then 1 else 0)
288 CharNeOp -> StInt (if x /= y then 1 else 0)
289 CharLtOp -> StInt (if x < y then 1 else 0)
290 CharLeOp -> StInt (if x <= y then 1 else 0)
291 IntAddOp -> StInt (x + y)
292 IntSubOp -> StInt (x - y)
293 IntMulOp -> StInt (x * y)
294 IntQuotOp -> StInt (x `quot` y)
295 IntRemOp -> StInt (x `rem` y)
296 IntGtOp -> StInt (if x > y then 1 else 0)
297 IntGeOp -> StInt (if x >= y then 1 else 0)
298 IntEqOp -> StInt (if x == y then 1 else 0)
299 IntNeOp -> StInt (if x /= y then 1 else 0)
300 IntLtOp -> StInt (if x < y then 1 else 0)
301 IntLeOp -> StInt (if x <= y then 1 else 0)
302 -- ToDo: WordQuotOp, WordRemOp.
306 When possible, shift the constants to the right-hand side, so that we
307 can match for strength reductions. Note that the code generator will
308 also assume that constants have been shifted to the right when
312 stixPrimFold op [x@(StInt _), y] | commutableOp op = stixPrimFold op [y, x]
315 We can often do something with constants of 0 and 1 ...
318 stixPrimFold op args@[x, y@(StInt 0)]
331 IntNeOp | is_comparison -> x
336 StPrim opp [_, _] -> opp `elem` comparison_ops
339 stixPrimFold op args@[x, y@(StInt 1)]
347 Now look for multiplication/division by powers of 2 (integers).
350 stixPrimFold op args@[x, y@(StInt n)]
352 IntMulOp -> case exactLog2 n of
353 Nothing -> StPrim op args
354 Just p -> StPrim ISllOp [x, StInt p]
355 IntQuotOp -> case exactLog2 n of
356 Nothing -> StPrim op args
357 Just p -> StPrim ISrlOp [x, StInt p]
361 Anything else is just too hard.
364 stixPrimFold op args = StPrim op args
369 = [ CharGtOp , CharGeOp , CharEqOp , CharNeOp , CharLtOp , CharLeOp,
370 IntGtOp , IntGeOp , IntEqOp , IntNeOp , IntLtOp , IntLeOp,
371 WordGtOp , WordGeOp , WordEqOp , WordNeOp , WordLtOp , WordLeOp,
372 AddrGtOp , AddrGeOp , AddrEqOp , AddrNeOp , AddrLtOp , AddrLeOp,
373 FloatGtOp , FloatGeOp , FloatEqOp , FloatNeOp , FloatLtOp , FloatLeOp,
374 DoubleGtOp, DoubleGeOp, DoubleEqOp, DoubleNeOp, DoubleLtOp, DoubleLeOp