%
-% (c) The AQUA Project, Glasgow University, 1993-1995
+% (c) The AQUA Project, Glasgow University, 1993-1998
%
\begin{code}
+module AsmCodeGen ( nativeCodeGen ) where
+
#include "HsVersions.h"
-#include "../../includes/platform.h"
-#include "../../includes/GhcConstants.h"
-
-module AsmCodeGen (
- writeRealAsm,
- dumpRealAsm,
-
- -- And, I guess we need these...
- AbstractC, GlobalSwitch, SwitchResult,
- UniqSupply, UniqSM(..)
- ) where
-
-import AbsCSyn ( AbstractC )
-import AbsCStixGen ( genCodeAbstractC )
-import PrelInfo ( PrimRep, PrimOp(..)
- IF_ATTACK_PRAGMAS(COMMA tagOf_PrimOp)
- IF_ATTACK_PRAGMAS(COMMA pprPrimOp)
- )
-import CmdLineOpts ( GlobalSwitch(..), stringSwitchSet, switchIsOn, SwitchResult(..) )
-import MachDesc
-import Maybes ( Maybe(..) )
+#include "nativeGen/NCG.h"
+
+import List ( intersperse )
+
+import MachMisc
+import MachRegs
+import MachCode
+import PprMach
+
+import AbsCStixGen ( genCodeAbstractC )
+import AbsCSyn ( AbstractC )
+import AbsCUtils ( mkAbsCStmtList )
+import AsmRegAlloc ( runRegAllocate )
+import PrimOp ( commutableOp, PrimOp(..) )
+import RegAllocInfo ( findReservedRegs )
+import Stix ( StixTree(..), StixReg(..),
+ pprStixTrees, pprStixTree,
+ stixCountTempUses, stixSubst,
+ initNat, mapNat,
+ mkNatM_State,
+ uniqOfNatM_State, deltaOfNatM_State )
+import UniqSupply ( returnUs, thenUs, initUs,
+ UniqSM, UniqSupply,
+ lazyMapUs )
+import MachMisc ( IF_ARCH_i386(i386_insert_ffrees,) )
+
+import OrdList ( concatOL )
import Outputable
-#if alpha_TARGET_ARCH
-import AlphaDesc ( mkAlpha )
-#endif
-#if i386_TARGET_ARCH
-import I386Desc ( mkI386 )
-#endif
-#if sparc_TARGET_ARCH
-import SparcDesc ( mkSparc )
-#endif
-import Stix
-import UniqSupply
-import Unpretty
-import Util
+
\end{code}
-This is a generic assembly language generator for the Glasgow Haskell
-Compiler. It has been a long time in germinating, basically due to
-time constraints and the large spectrum of design possibilities.
-Presently it generates code for:
-\begin{itemize}
-\item Sparc
-\end{itemize}
-In the pipeline (sic) are plans and/or code for 680x0, 386/486.
-
-The code generator presumes the presence of a working C port. This is
-because any code that cannot be compiled (e.g. @casm@s) is re-directed
-via this route. It also help incremental development. Because this
-code generator is specially written for the Abstract C produced by the
-Glasgow Haskell Compiler, several optimisation opportunities are open
-to us that are not open to @gcc@. In particular, we know that the A
-and B stacks and the Heap are all mutually exclusive wrt. aliasing,
-and that expressions have no side effects (all state transformations
-are top level objects).
-
-There are two main components to the code generator.
-\begin{itemize}
-\item Abstract C is considered in statements,
- with a Twig-like system handling each statement in turn.
-\item A scheduler turns the tree of assembly language orderings
- into a sequence suitable for input to an assembler.
-\end{itemize}
-The @codeGenerate@ function returns the final assembly language output
-(as a String). We can return a string, because there is only one way
-of printing the output suitable for assembler consumption. It also
-allows limited abstraction of different machines from the Main module.
-
-The first part is the actual assembly language generation. First we
-split up the Abstract C into individual functions, then consider
-chunks in isolation, giving back an @OrdList@ of assembly language
-instructions. The generic algorithm is heavily inspired by Twig
-(ref), but also draws concepts from (ref). The basic idea is to
-(dynamically) walk the Abstract C syntax tree, annotating it with
-possible code matches. For example, on the Sparc, a possible match
-(with its translation) could be
-@
- :=
- / \
- i r2 => ST r2,[r1]
- |
- r1
-@
-where @r1,r2@ are registers, and @i@ is an indirection. The Twig
-bit twiddling algorithm for tree matching has been abandoned. It is
-replaced with a more direct scheme. This is because, after careful
-consideration it is felt that the overhead of handling many bit
-patterns would be heavier that simply looking at the syntax of the
-tree at the node being considered, and dynamically choosing and
-pruning rules.
-
-The ultimate result of the first part is a Set of ordering lists of
-ordering lists of assembly language instructions (yes, really!), where
-each element in the set is basic chunk. Now several (generic)
-simplifications and transformations can be performed. This includes
-ones that turn the the ordering of orderings into just a single
-ordering list. (The equivalent of applying @concat@ to a list of
-lists.) A lot of the re-ordering and optimisation is actually done
-(generically) here! The final part, the scheduler, can now be used on
-this structure. The code sequence is optimised (obviously) to avoid
-stalling the pipeline. This part {\em has} to be heavily machine
-dependent.
-
-[The above seems to describe mostly dreamware. -- JSM]
-
-The flag that needs to be added is -fasm-<platform> where platform is one of
-the choices below.
+The 96/03 native-code generator has machine-independent and
+machine-dependent modules (those \tr{#include}'ing \tr{NCG.h}).
-\begin{code}
-writeRealAsm :: (GlobalSwitch -> SwitchResult) -> _FILE -> AbstractC -> UniqSupply -> PrimIO ()
+This module (@AsmCodeGen@) is the top-level machine-independent
+module. It uses @AbsCStixGen.genCodeAbstractC@ to produce @StixTree@s
+(defined in module @Stix@), using support code from @StixInfo@ (info
+tables), @StixPrim@ (primitive operations), @StixMacro@ (Abstract C
+macros), and @StixInteger@ (GMP arbitrary-precision operations).
-writeRealAsm flags file absC uniq_supply
- = uppAppendFile file 80 (runNCG (code flags absC) uniq_supply)
+Before entering machine-dependent land, we do some machine-independent
+@genericOpt@imisations (defined below) on the @StixTree@s.
-dumpRealAsm :: (GlobalSwitch -> SwitchResult) -> AbstractC -> UniqSupply -> String
+We convert to the machine-specific @Instr@ datatype with
+@stmt2Instrs@, assuming an ``infinite'' supply of registers. We then
+use a machine-independent register allocator (@runRegAllocate@) to
+rejoin reality. Obviously, @runRegAllocate@ has machine-specific
+helper functions (see about @RegAllocInfo@ below).
-dumpRealAsm flags absC uniq_supply = uppShow 80 (runNCG (code flags absC) uniq_supply)
+The machine-dependent bits break down as follows:
+\begin{description}
+\item[@MachRegs@:] Everything about the target platform's machine
+ registers (and immediate operands, and addresses, which tend to
+ intermingle/interact with registers).
-runNCG m uniq_supply = m uniq_supply
+\item[@MachMisc@:] Includes the @Instr@ datatype (possibly should
+ have a module of its own), plus a miscellany of other things
+ (e.g., @targetDoubleSize@, @smStablePtrTable@, ...)
-code flags absC =
- genCodeAbstractC target absC `thenUs` \ treelists ->
- let
- stix = map (map (genericOpt target)) treelists
- in
- codeGen {-target-} sty stix
- where
- sty = PprForAsm (switchIsOn flags) (underscore {-target-}) (fmtAsmLbl {-target-})
-
- (target, codeGen, underscore, fmtAsmLbl)
- = case stringSwitchSet flags AsmTarget of
-#if ! OMIT_NATIVE_CODEGEN
-# if alpha_TARGET_ARCH
- Just _ {-???"alpha-dec-osf1"-} -> mkAlpha flags
-# endif
-# if i386_TARGET_ARCH
- Just _ {-???"i386_unknown_linuxaout"-} -> mkI386 True flags
-# endif
-# if sparc_sun_sunos4_TARGET
- Just _ {-???"sparc-sun-sunos4"-} -> mkSparc True flags
-# endif
-# if sparc_sun_solaris2_TARGET
- Just _ {-???"sparc-sun-solaris2"-} -> mkSparc False flags
-# endif
-#endif
- _ -> error
- ("ERROR:Trying to generate assembly language for an unsupported architecture\n"++
- "(or one for which this build is not configured).")
+\item[@MachCode@:] @stmt2Instrs@ is where @Stix@ stuff turns into
+ machine instructions.
+
+\item[@PprMach@:] @pprInstr@ turns an @Instr@ into text (well, really
+ an @Doc@).
+
+\item[@RegAllocInfo@:] In the register allocator, we manipulate
+ @MRegsState@s, which are @BitSet@s, one bit per machine register.
+ When we want to say something about a specific machine register
+ (e.g., ``it gets clobbered by this instruction''), we set/unset
+ its bit. Obviously, we do this @BitSet@ thing for efficiency
+ reasons.
+ The @RegAllocInfo@ module collects together the machine-specific
+ info needed to do register allocation.
+\end{description}
+
+So, here we go:
+
+\begin{code}
+nativeCodeGen :: AbstractC -> UniqSupply -> (SDoc, SDoc)
+nativeCodeGen absC us
+ = let absCstmts = mkAbsCStmtList absC
+ (sdoc_pairs, us1) = initUs us (lazyMapUs absCtoNat absCstmts)
+ stix_sdocs = map fst sdoc_pairs
+ insn_sdocs = map snd sdoc_pairs
+
+ insn_sdoc = my_vcat insn_sdocs
+ stix_sdoc = vcat stix_sdocs
+
+# ifdef NCG_DEBUG
+ my_trace m x = trace m x
+ my_vcat sds = vcat (intersperse (char ' '
+ $$ ptext SLIT("# ___ncg_debug_marker")
+ $$ char ' ')
+ sds)
+# else
+ my_vcat sds = vcat sds
+ my_trace m x = x
+# endif
+ in
+ my_trace "nativeGen: begin"
+ (stix_sdoc, insn_sdoc)
+
+
+absCtoNat :: AbstractC -> UniqSM (SDoc, SDoc)
+absCtoNat absC
+ = _scc_ "genCodeAbstractC" genCodeAbstractC absC `thenUs` \ stixRaw ->
+ _scc_ "genericOpt" genericOpt stixRaw `bind` \ stixOpt ->
+ _scc_ "genMachCode" genMachCode stixOpt `thenUs` \ pre_regalloc ->
+ _scc_ "regAlloc" regAlloc pre_regalloc `bind` \ almost_final ->
+ _scc_ "x86fp_kludge" x86fp_kludge almost_final `bind` \ final_mach_code ->
+ _scc_ "vcat" vcat (map pprInstr final_mach_code) `bind` \ final_sdoc ->
+ _scc_ "pprStixTrees" pprStixTrees stixOpt `bind` \ stix_sdoc ->
+ returnUs (stix_sdoc, final_sdoc)
+ where
+ bind f x = x f
+
+ x86fp_kludge :: [Instr] -> [Instr]
+ x86fp_kludge = IF_ARCH_i386(i386_insert_ffrees,id)
+
+ regAlloc :: InstrBlock -> [Instr]
+ regAlloc = runRegAllocate allocatableRegs findReservedRegs
+\end{code}
+
+Top level code generator for a chunk of stix code. For this part of
+the computation, we switch from the UniqSM monad to the NatM monad.
+The latter carries not only a Unique, but also an Int denoting the
+current C stack pointer offset in the generated code; this is needed
+for creating correct spill offsets on architectures which don't offer,
+or for which it would be prohibitively expensive to employ, a frame
+pointer register. Viz, x86.
+
+The offset is measured in bytes, and indicates the difference between
+the current (simulated) C stack-ptr and the value it was at the
+beginning of the block. For stacks which grow down, this value should
+be either zero or negative.
+
+Switching between the two monads whilst carrying along the same Unique
+supply breaks abstraction. Is that bad?
+
+\begin{code}
+genMachCode :: [StixTree] -> UniqSM InstrBlock
+
+genMachCode stmts initial_us
+ = let initial_st = mkNatM_State initial_us 0
+ (blocks, final_st) = initNat initial_st
+ (mapNat stmt2Instrs stmts)
+ instr_list = concatOL blocks
+ final_us = uniqOfNatM_State final_st
+ final_delta = deltaOfNatM_State final_st
+ in
+ if final_delta == 0
+ then (instr_list, final_us)
+ else pprPanic "genMachCode: nonzero final delta"
+ (int final_delta)
\end{code}
%************************************************************************
%* *
%************************************************************************
-This is called between translating Abstract C to its Tree
-and actually using the Native Code Generator to generate
-the annotations. It's a chance to do some strength reductions.
+This is called between translating Abstract C to its Tree and actually
+using the Native Code Generator to generate the annotations. It's a
+chance to do some strength reductions.
** Remember these all have to be machine independent ***
-Note that constant-folding should have already happened, but we might have
-introduced some new opportunities for constant-folding wrt address manipulations.
+Note that constant-folding should have already happened, but we might
+have introduced some new opportunities for constant-folding wrt
+address manipulations.
\begin{code}
-
-genericOpt
- :: Target
- -> StixTree
- -> StixTree
-
+genericOpt :: [StixTree] -> [StixTree]
+genericOpt = map stixConFold . stixPeep
+
+
+
+stixPeep :: [StixTree] -> [StixTree]
+
+-- This transformation assumes that the temp assigned to in t1
+-- is not assigned to in t2; for otherwise the target of the
+-- second assignment would be substituted for, giving nonsense
+-- code. As far as I can see, StixTemps are only ever assigned
+-- to once. It would be nice to be sure!
+
+stixPeep ( t1@(StAssign pka (StReg (StixTemp u pk)) rhs)
+ : t2
+ : ts )
+ | stixCountTempUses u t2 == 1
+ && sum (map (stixCountTempUses u) ts) == 0
+ =
+# ifdef NCG_DEBUG
+ trace ("nativeGen: inlining " ++ showSDoc (pprStixTree rhs))
+# endif
+ (stixPeep (stixSubst u rhs t2 : ts))
+
+stixPeep (t1:t2:ts) = t1 : stixPeep (t2:ts)
+stixPeep [t1] = [t1]
+stixPeep [] = []
+
+-- disable stix inlining until we figure out how to fix the
+-- latent bugs in the register allocator which are exposed by
+-- the inliner.
+--stixPeep = id
\end{code}
For most nodes, just optimize the children.
\begin{code}
--- hacking with Uncle Will:
-#define target_STRICT target@(Target _ _ _ _ _ _ _ _)
-
-genericOpt target_STRICT (StInd pk addr) =
- StInd pk (genericOpt target addr)
+stixConFold :: StixTree -> StixTree
-genericOpt target (StAssign pk dst src) =
- StAssign pk (genericOpt target dst) (genericOpt target src)
+stixConFold (StInd pk addr) = StInd pk (stixConFold addr)
-genericOpt target (StJump addr) =
- StJump (genericOpt target addr)
+stixConFold (StAssign pk dst src)
+ = StAssign pk (stixConFold dst) (stixConFold src)
-genericOpt target (StCondJump addr test) =
- StCondJump addr (genericOpt target test)
+stixConFold (StJump dsts addr) = StJump dsts (stixConFold addr)
-genericOpt target (StCall fn pk args) =
- StCall fn pk (map (genericOpt target) args)
+stixConFold (StCondJump addr test)
+ = StCondJump addr (stixConFold test)
+stixConFold (StCall fn cconv pk args)
+ = StCall fn cconv pk (map stixConFold args)
\end{code}
-Fold indices together when the types match.
-
+Fold indices together when the types match:
\begin{code}
+stixConFold (StIndex pk (StIndex pk' base off) off')
+ | pk == pk'
+ = StIndex pk (stixConFold base)
+ (stixConFold (StPrim IntAddOp [off, off']))
-genericOpt target (StIndex pk (StIndex pk' base off) off')
- | pk == pk' =
- StIndex pk (genericOpt target base)
- (genericOpt target (StPrim IntAddOp [off, off']))
-
-genericOpt target (StIndex pk base off) =
- StIndex pk (genericOpt target base)
- (genericOpt target off)
-
+stixConFold (StIndex pk base off)
+ = StIndex pk (stixConFold base) (stixConFold off)
\end{code}
-For primOps, we first optimize the children, and then we try our hand
+For PrimOps, we first optimize the children, and then we try our hand
at some constant-folding.
\begin{code}
-
-genericOpt target (StPrim op args) =
- primOpt op (map (genericOpt target) args)
-
+stixConFold (StPrim op args) = stixPrimFold op (map stixConFold args)
\end{code}
-Replace register leaves with appropriate StixTrees for the given target.
-(Oh, so this is why we've been hauling the target around!)
+Replace register leaves with appropriate StixTrees for the given
+target.
\begin{code}
+stixConFold leaf@(StReg (StixMagicId id))
+ = case (stgReg id) of
+ Always tree -> stixConFold tree
+ Save _ -> leaf
-genericOpt target leaf@(StReg (StixMagicId id)) =
- case stgReg target id of
- Always tree -> genericOpt target tree
- Save _ -> leaf
-
-genericOpt target other = other
-
+stixConFold other = other
\end{code}
-Now, try to constant-fold the primOps. The arguments have
-already been optimized and folded.
+Now, try to constant-fold the PrimOps. The arguments have already
+been optimized and folded.
\begin{code}
-
-primOpt
+stixPrimFold
:: PrimOp -- The operation from an StPrim
-> [StixTree] -- The optimized arguments
-> StixTree
-primOpt op arg@[StInt x] =
- case op of
+stixPrimFold op arg@[StInt x]
+ = case op of
IntNegOp -> StInt (-x)
- IntAbsOp -> StInt (abs x)
_ -> StPrim op arg
-primOpt op args@[StInt x, StInt y] =
- case op of
- CharGtOp -> StInt (if x > y then 1 else 0)
+stixPrimFold op args@[StInt x, StInt y]
+ = case op of
+ CharGtOp -> StInt (if x > y then 1 else 0)
CharGeOp -> StInt (if x >= y then 1 else 0)
CharEqOp -> StInt (if x == y then 1 else 0)
CharNeOp -> StInt (if x /= y then 1 else 0)
- CharLtOp -> StInt (if x < y then 1 else 0)
+ CharLtOp -> StInt (if x < y then 1 else 0)
CharLeOp -> StInt (if x <= y then 1 else 0)
IntAddOp -> StInt (x + y)
IntSubOp -> StInt (x - y)
IntMulOp -> StInt (x * y)
IntQuotOp -> StInt (x `quot` y)
IntRemOp -> StInt (x `rem` y)
- IntGtOp -> StInt (if x > y then 1 else 0)
+ IntGtOp -> StInt (if x > y then 1 else 0)
IntGeOp -> StInt (if x >= y then 1 else 0)
IntEqOp -> StInt (if x == y then 1 else 0)
IntNeOp -> StInt (if x /= y then 1 else 0)
- IntLtOp -> StInt (if x < y then 1 else 0)
+ IntLtOp -> StInt (if x < y then 1 else 0)
IntLeOp -> StInt (if x <= y then 1 else 0)
+ -- ToDo: WordQuotOp, WordRemOp.
_ -> StPrim op args
-
\end{code}
When possible, shift the constants to the right-hand side, so that we
can match for strength reductions. Note that the code generator will
-also assume that constants have been shifted to the right when possible.
+also assume that constants have been shifted to the right when
+possible.
\begin{code}
-primOpt op [x@(StInt _), y] | commutableOp op = primOpt op [y, x]
+stixPrimFold op [x@(StInt _), y] | commutableOp op = stixPrimFold op [y, x]
\end{code}
We can often do something with constants of 0 and 1 ...
\begin{code}
-primOpt op args@[x, y@(StInt 0)] =
- case op of
+stixPrimFold op args@[x, y@(StInt 0)]
+ = case op of
IntAddOp -> x
IntSubOp -> x
IntMulOp -> y
- AndOp -> y
- OrOp -> x
- SllOp -> x
- SraOp -> x
- SrlOp -> x
- ISllOp -> x
- ISraOp -> x
- ISrlOp -> x
- _ -> StPrim op args
-
-primOpt op args@[x, y@(StInt 1)] =
- case op of
- IntMulOp -> x
+ AndOp -> y
+ OrOp -> x
+ XorOp -> x
+ SllOp -> x
+ SrlOp -> x
+ ISllOp -> x
+ ISraOp -> x
+ ISrlOp -> x
+ IntNeOp | is_comparison -> x
+ _ -> StPrim op args
+ where
+ is_comparison
+ = case x of
+ StPrim opp [_, _] -> opp `elem` comparison_ops
+ _ -> False
+
+stixPrimFold op args@[x, y@(StInt 1)]
+ = case op of
+ IntMulOp -> x
IntQuotOp -> x
- IntRemOp -> StInt 0
- _ -> StPrim op args
+ IntRemOp -> StInt 0
+ _ -> StPrim op args
\end{code}
Now look for multiplication/division by powers of 2 (integers).
\begin{code}
-primOpt op args@[x, y@(StInt n)] =
- case op of
- IntMulOp -> case exact_log2 n of
+stixPrimFold op args@[x, y@(StInt n)]
+ = case op of
+ IntMulOp -> case exactLog2 n of
Nothing -> StPrim op args
- Just p -> StPrim SllOp [x, StInt p]
- IntQuotOp -> case exact_log2 n of
+ Just p -> StPrim ISllOp [x, StInt p]
+ IntQuotOp -> case exactLog2 n of
Nothing -> StPrim op args
- Just p -> StPrim SraOp [x, StInt p]
+ Just p -> StPrim ISrlOp [x, StInt p]
_ -> StPrim op args
\end{code}
Anything else is just too hard.
\begin{code}
-primOpt op args = StPrim op args
-\end{code}
-
-The commutable ops are those for which we will try to move constants
-to the right hand side for strength reduction.
-
-\begin{code}
-commutableOp :: PrimOp -> Bool
-
-commutableOp CharEqOp = True
-commutableOp CharNeOp = True
-commutableOp IntAddOp = True
-commutableOp IntMulOp = True
-commutableOp AndOp = True
-commutableOp OrOp = True
-commutableOp IntEqOp = True
-commutableOp IntNeOp = True
-commutableOp IntegerAddOp = True
-commutableOp IntegerMulOp = True
-commutableOp FloatAddOp = True
-commutableOp FloatMulOp = True
-commutableOp FloatEqOp = True
-commutableOp FloatNeOp = True
-commutableOp DoubleAddOp = True
-commutableOp DoubleMulOp = True
-commutableOp DoubleEqOp = True
-commutableOp DoubleNeOp = True
-commutableOp _ = False
+stixPrimFold op args = StPrim op args
\end{code}
-This algorithm for determining the $\log_2$ of exact powers of 2 comes
-from gcc. It requires bit manipulation primitives, so we have a ghc
-version and an hbc version. Other Haskell compilers are on their own.
-
\begin{code}
-w2i x = word2Int# x
-i2w x = int2Word# x
-i2w_s x = (x::Int#)
-
-exact_log2 :: Integer -> Maybe Integer
-exact_log2 x
- | x <= 0 || x >= 2147483648 = Nothing
- | otherwise = case fromInteger x of
- I# x# -> if (w2i ((i2w x#) `and#` (i2w (0# -# x#))) /=# x#) then Nothing
- else Just (toInteger (I# (pow2 x#)))
-
- where pow2 x# | x# ==# 1# = 0#
- | otherwise = 1# +# pow2 (w2i (i2w x# `shiftr` i2w_s 1#))
-
- shiftr x y = shiftRA# x y
+comparison_ops
+ = [ CharGtOp , CharGeOp , CharEqOp , CharNeOp , CharLtOp , CharLeOp,
+ IntGtOp , IntGeOp , IntEqOp , IntNeOp , IntLtOp , IntLeOp,
+ WordGtOp , WordGeOp , WordEqOp , WordNeOp , WordLtOp , WordLeOp,
+ AddrGtOp , AddrGeOp , AddrEqOp , AddrNeOp , AddrLtOp , AddrLeOp,
+ FloatGtOp , FloatGeOp , FloatEqOp , FloatNeOp , FloatLtOp , FloatLeOp,
+ DoubleGtOp, DoubleGeOp, DoubleEqOp, DoubleNeOp, DoubleLtOp, DoubleLeOp
+ ]
\end{code}