-Notes on new codegen (Sept 09)\r
+Notes on new codegen (Aug 10)\r
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r
\r
Things to do:\r
+ - We insert spills for variables before the stack check! This is the reason for\r
+ some fishy code in StgCmmHeap.entryHeapCheck where we are doing some strange\r
+ things to fix up the stack pointer before GC calls/jumps.\r
+\r
+ The reason spills are inserted before the sp check is that at the entry to a\r
+ function we always store the parameters passed in registers to local variables.\r
+ The spill pass simply inserts spills at variable definitions. We should instead\r
+ sink the spills so that we can avoid spilling them on branches that never\r
+ reload them.\r
+\r
+ This will fix the spill-before-stack-check problem, but only really as a side\r
+ effect. A 'real fix' probably requires making the spiller know about sp checks.\r
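A minimal sketch of the sinking idea, on a toy CFG rather than the real CmmGraph. The `Label`/`CFG` types and the `needsSpill`/`spillEdges` functions are hypothetical, not CmmSpillReload's API. A spill of x is only needed on paths that can still reach a reload of x, so we compute the set of blocks from which a reload is reachable and put the spill on an out-edge of the defining block only when the edge's target is in that set. It handles one variable at a time and ignores redefinitions.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

-- Hypothetical toy CFG: block label -> successor labels.
type Label = String
type CFG   = M.Map Label [Label]

-- Blocks from which some reload of the variable is reachable
-- (including the reloading blocks themselves). Backward fixpoint:
-- a block needs the spill if it reloads, or if any successor needs it.
needsSpill :: CFG -> S.Set Label -> S.Set Label
needsSpill cfg reloads = go reloads
  where
    go acc
      | acc' == acc = acc
      | otherwise   = go acc'
      where
        acc' = S.union acc (M.keysSet (M.filter (any (`S.member` acc)) cfg))

-- Out-edges of the defining block on which a sunk spill must be placed:
-- only those whose target can still reach a reload.
spillEdges :: CFG -> S.Set Label -> Label -> [(Label, Label)]
spillEdges cfg reloads def =
  [ (def, s) | s <- M.findWithDefault [] def cfg, s `S.member` need ]
  where
    need = needsSpill cfg reloads
```

E.g. if the entry block branches to a slow path that reloads x and a fast path that never does, `spillEdges` returns only the edge to the slow path, so the fast path (and anything before the branch) no longer spills x.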
+\r
+ - There is some silly stuff happening with the Sp. We end up with code like:\r
+ Sp = Sp + 8; R1 = _vwf::I64; Sp = Sp -8\r
+ This may be caused by the issue above, but perhaps an optimisation\r
+ pass is also needed?\r
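A minimal sketch of such a peephole pass over a hypothetical straight-line instruction type (not the real CmmNode). Pending Sp adjustments are accumulated and slid forward past assignments that do not mention Sp, and are flushed just before any instruction that does touch Sp, so `Sp = Sp + 8; R1 = _vwf::I64; Sp = Sp - 8` collapses to the single assignment. A real pass would also need to deal with calls, stores relative to Sp, and block boundaries.

```haskell
-- Hypothetical, simplified straight-line instruction stream.
data Instr
  = AdjSp Int              -- Sp = Sp + n (n may be negative)
  | Assign String String   -- reg = expr, where expr does not mention Sp
  | UsesSp String          -- any instruction that reads or writes Sp
  deriving (Eq, Show)

-- Slide Sp adjustments forward past instructions that do not touch Sp,
-- merging consecutive adjustments and dropping any that sum to zero.
foldSpAdjustments :: [Instr] -> [Instr]
foldSpAdjustments = go 0
  where
    go pending []                    = flush pending []
    go pending (AdjSp n : is)        = go (pending + n) is
    go pending (i@(Assign _ _) : is) = i : go pending is
    go pending (i : is)              = flush pending (i : go 0 is)

    -- Emit the accumulated adjustment, unless it is a no-op.
    flush 0 rest = rest
    flush n rest = AdjSp n : rest
```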
+\r
+ - Proc-points pass all arguments on the stack, adding more code and slowing things down\r
+ a lot. We either need to fix this or, even better, get rid of\r
+ proc-points altogether.\r
+\r
+ - CmmInfo.cmmToRawCmm uses Old.Cmm, so it is called after converting Cmm.Cmm to\r
+ Old.Cmm. We should abstract it to work on both representations; it only needs to\r
+ convert a CmmInfoTable to [CmmStatic].\r
+\r
+ - The MkGraph currently uses a different semantics for <*> than Hoopl. Maybe\r
+ we could convert codeGen/StgCmm* clients to Hoopl's semantics?\r
+ It's all deeply unsatisfactory.\r
+\r
+ - Improve the performance of Hoopl.\r
+\r
+ A nofib comparison of -fasm vs -fnew-codegen compilation\r
+ (using the same ghc-cmm branch + libraries compiled by the old code generator)\r
+ is at http://fox.auryn.cz/msrc/0517_hoopl/32bit.oldghcoldgen.oldghchoopl.txt\r
+ - the code produced is 10.9% slower, the compilation is +118% slower!\r
+\r
+ The same comparison with ghc-head with zip representation is at\r
+ http://fox.auryn.cz/msrc/0517_hoopl/32bit.oldghcoldgen.oldghczip.txt\r
+ - the code produced is 11.7% slower, the compilation is +78% slower.\r
+\r
+ When compiling nofib, ghc-cmm + libraries compiled with -fnew-codegen\r
+ is 23.7% slower (http://fox.auryn.cz/msrc/0517_hoopl/32bit.oldghcoldgen.hooplghcoldgen.txt).\r
+ When compiling nofib, ghc-head + libraries compiled with -fnew-codegen\r
+ is 31.4% slower (http://fox.auryn.cz/msrc/0517_hoopl/32bit.oldghcoldgen.zipghcoldgen.txt).\r
+\r
+ So compared with the zip representation, Hoopl gives slightly better code\r
+ (10.9% vs 11.7% slower), but compilation takes longer (+118% vs +78%)!\r
+\r
+ - Are all the uses of blockToNodeList and blockOfNodeList really needed? Maybe we could\r
+ splice blocks instead?\r
+\r
+ In CmmContFlowOpt.blockConcat, using Dataflow seems too clumsy. Still,\r
+ a block concatenation function would probably be nicer than the blockToNodeList\r
+ / blockOfNodeList combo.\r
+\r
+ - lowerSafeForeignCall seems too low-level. Just use Dataflow. After that\r
+ delete splitEntrySeq from HooplUtils.\r
+\r
+ - manifestSP seems to touch a lot of the graph representation. It is\r
+ also slow for CmmSwitch nodes: O(block_nodes * switch_statements).\r
+ Maybe rewrite manifestSP to use Dataflow?\r
+\r
+ - Sort out Label, LabelMap, LabelSet versus BlockId, BlockEnv, BlockSet\r
+ dichotomy. Mostly this means global replace, but we also need to make\r
+ Label an instance of Outputable (probably in the Outputable module).\r
+\r
+ - NB that CmmProcPoint line 283 has a hack that works around a GADT-related\r
+ bug in 6.10.\r
+\r
+ - SDM (2010-02-26) can we remove the Foreign constructor from Convention?\r
+ Reason: we never generate code for a function with the Foreign\r
+ calling convention, and the code for calling foreign calls is generated\r
+\r
+ - AsmCodeGen has a generic Cmm optimiser; move this into the new pipeline\r
+\r
+ - AsmCodeGen has post-native-cg branch eliminator (shortCutBranches);\r
+ we ultimately want to share this with the Cmm branch eliminator.\r
+\r
+ - At the moment, references to global registers like Hp are "lowered" \r
+ late (in CgUtils.fixStgRegisters). We should do this early, in the\r
+ new native codegen, much in the way that we lower calling conventions.\r
+ Might need to be a bit sophisticated about aliasing.\r
+\r
+ - Question: currently we lift procpoints to become separate\r
+ CmmProcs. Do we still want to do this?\r
+ \r
+ NB: an advantage of continuing to do this is that\r
+ we can do common-proc elimination!\r
+\r
+ - Move to new Cmm rep:\r
+ * Make native CG consume New Cmm; \r
+ * Convert Old Cmm->New Cmm to keep old path alive\r
+ * Produce New Cmm when reading in .cmm files\r
+\r
+ - Consider module names\r
+\r
- Top-level SRT threading is a bit ugly\r
\r
- Add type/newtype for CmmModule = [CmmGroup] -- A module\r
regardless of SplitObjs. Question: can we *always* generate M.o if there\r
is just one element in the list (rather than M/M1.o, M/M2.o etc)\r
\r
- - Change \r
- type CmmZ = GenCmm CmmStatic CmmInfo (CmmStackInfo, CmmGraph)\r
- to\r
- type CmmZ = GenCmm CmmStatic (CmmInfo, CmmStackInfo) CmmGraph\r
- -- And perhaps take opportunity to prune CmmInfo?\r
-\r
- - Clarify which fields of CmmInfo are still used\r
- - Maybe get rid of CmmFormals arg of CmmProc in all versions?\r
+ One SRT per group.\r
\r
- - We aren't sure whether cmmToRawCmm is actively used by the new pipeline; check\r
- And what does CmmBuildInfoTables do?!\r
-\r
- - Nuke CmmZipUtil, move zipPreds into ZipCfg\r
+ - See "CAFs" below; we want to totally refactor the way SRTs are calculated\r
\r
- Pull out Areas into its own module\r
Parameterise AreaMap\r
type SubArea = (Area, ByteOff, ByteWidth) \r
ByteOff should not be defined in SMRep -- that is too high up the hierarchy\r
\r
+ - SMRep should not be imported by any module in cmm/! Make it so.\r
+ -- ByteOff etc ==> CmmExpr\r
+ -- rET_SMALL etc ==> CmmInfo\r
+ Check that there are no other imports from codeGen in cmm/\r
+\r
+ - If you eliminate a label by branch chain elimination,\r
+ what happens if there's an Area associated with that label?\r
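To make the question concrete, a minimal sketch of branch-chain elimination on a toy block map. The `Block` type and the `chaseChain`/`eliminateChains` functions are hypothetical, not CmmContFlowOpt's actual code. A block whose body is only `goto L'` has its predecessors retargeted to the end of the chain and then disappears, so its label vanishes from the graph; any Area keyed on that label would be left referring to a block that no longer exists.

```haskell
import qualified Data.Map as M

type Label = String

-- Hypothetical block: either nothing but an unconditional branch,
-- or some real code followed by its successor labels.
data Block = Goto Label | Code String [Label]
  deriving (Eq, Show)

-- Follow a chain of pure-goto blocks to its end. The `seen` list
-- stops us looping forever on cycles such as L1: goto L2; L2: goto L1.
chaseChain :: M.Map Label Block -> Label -> Label
chaseChain blocks = go []
  where
    go seen l = case M.lookup l blocks of
      Just (Goto l') | l' `notElem` seen -> go (l : seen) l'
      _                                  -> l

-- Retarget every branch to the end of its chain, then drop blocks whose
-- label is no longer mentioned anywhere (and is not the entry).
eliminateChains :: Label -> M.Map Label Block -> M.Map Label Block
eliminateChains entry blocks = M.filterWithKey live retargeted
  where
    chase                = chaseChain blocks
    retarget (Goto l)    = Goto (chase l)
    retarget (Code c ls) = Code c (map chase ls)
    retargeted           = M.map retarget blocks
    succs (Goto l)       = [l]
    succs (Code _ ls)    = ls
    referenced           = entry : concatMap succs (M.elems retargeted)
    live l _             = l `elem` referenced
```

With blocks A: `x = 1; goto B`, B: `goto C`, C: `return`, the result keeps only A (now branching straight to C) and C; label B is gone.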
+\r
- Think about a non-flattened representation?\r
\r
- LastCall: \r
http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGenPipeline\r
\r
\r
- - We believe that all of CmmProcPointZ.addProcPointProtocols is dead. What\r
+ - We believe that all of CmmProcPoint.addProcPointProtocols is dead. What\r
goes wrong if we simply never call it?\r
\r
- Something fishy in CmmStackLayout.hs\r
points that are not successors of a call, we think) can be treated\r
uniformly: zero-size Area, and use inSP.\r
\r
-Dead files\r
-~~~~~~~~~~\r
-CmmProcPoint (Michael Adams)\r
-CmmCPS (ditto)\r
+\r
+ - Currently the AsmCodeGen top level calls AsmCodeGen.cmmToCmm, which is a small\r
+ C-- optimiser. It has quite a lot of boilerplate folding code in AsmCodeGen\r
+ (cmmBlockConFold, cmmStmtConFold, cmmExprConFold), before calling out to\r
+ CmmOpt. ToDo: see what optimisations are being done; and do them before\r
+ AsmCodeGen.\r
+\r
+ - Modularise the CPS pipeline; instead of ...; A;B;C; ...\r
+ use ..; ABC; ....\r
+\r
+ - Most of HscMain.tryNewCodeGen does not belong in HscMain. Instead\r
+ if new_cg then\r
+ StgCmm.codeGen\r
+ processCmm [including generating "raw" cmm]\r
+ else\r
+ CodeGen.codeGen\r
+ cmmToRawCmm\r
+\r
+\r
+ - If we stick CAF and stack liveness info on a LastCall node (not LastRet/Jump)\r
+ then all CAF and stack liveness stuff can be completed before we split\r
+ into separate C procedures.\r
+\r
+ Short term:\r
+ compute and attach liveness info to LastCall\r
+ right at end, split, cvt to old rep\r
+ [must split before cvt, because old rep is not expressive enough]\r
+\r
+ Longer term: \r
+ when old rep disappears, \r
+ move the whole splitting game into the C back end *only*\r
+ (guided by the procpoint set)\r
+\r
+----------------------------------------------------\r
+ Modules in cmm/\r
+----------------------------------------------------\r
+\r
+-------- Testing stuff ------------\r
HscMain.optionallyConvertAndOrCPS\r
testCmmConversion\r
-DynFlags: -fconvert-to-zipper-and-back, -frun-cps, -frun-cpsz\r
+DynFlags: -fconvert-to-zipper-and-back, -frun-cpsz\r
\r
-Proc-points\r
-~~~~~~~~~~~~\r
-Consider this program, which has a diamond control flow, \r
-with a call on one branch\r
- fn(p,x) {\r
- h()\r
- if b then { ... f(x) ...; q=5; goto J }\r
- else { ...; q=7; goto J }\r
- J: ..p...q...\r
- }\r
-then the join point J is a "proc-point". So, is 'p' passed to J\r
-as a parameter? Or, if 'p' was saved on the stack anyway, perhaps\r
-to keep it alive across the call to h(), maybe 'p' gets communicated\r
-to J that way. This is an awkward choice. (We think that we currently\r
-never pass variables to join points via arguments.)\r
+-------- Moribund stuff ------------\r
+OldCmm.hs Definition of flowgraph of old representation\r
+OldCmmUtil.hs Utilities that operate mostly on CmmStmt\r
+OldPprCmm.hs Pretty print for CmmStmt, GenBasicBlock and ListGraph\r
+CmmCvt.hs Conversion between old and new Cmm reps\r
+CmmOpt.hs Hopefully-redundant optimiser\r
\r
-Furthermore, there is *no way* to pass q to J in a register (other\r
-than a paramter register).\r
+-------- Stuff to keep ------------\r
+CmmCPS.hs Driver for new pipeline\r
\r
-What we want is to do register allocation across the whole caboodle.\r
-Then we could drop all the code that deals with the above awkward\r
-decisions about spilling variables across proc-points.\r
+CmmLive.hs Liveness analysis, dead code elim\r
+CmmProcPoint.hs Identifying and splitting out proc-points\r
\r
-Note that J doesn't need an info table.\r
+CmmSpillReload.hs Save and restore across calls\r
\r
-What we really want is for each Block to have an optional info table.\r
-To do that, we need to be polymorphic over first nodes.\r
+CmmCommonBlockElim.hs Common block elim\r
+CmmContFlowOpt.hs Other optimisations (branch-chain, merging)\r
\r
-Figuring out proc-points\r
-~~~~~~~~~~~~~~~~~~~~~~~~\r
-Proc-points are identified by\r
-CmmProcPointZ.minimalProcPointSet/extendPPSet Although there isn't\r
-that much code, JD thinks that it could be done much more nicely using\r
-a dominator analysis, using the Dataflow Engine.\r
+CmmBuildInfoTables.hs New info-table \r
+CmmStackLayout.hs and stack layout \r
+CmmCallConv.hs\r
+CmmInfo.hs Defn of InfoTables, and conversion to exact byte layout\r
+\r
+---------- Cmm data types --------------\r
+Cmm.hs Cmm instantiations of dataflow graph framework\r
+MkGraph.hs Interface for building Cmm for codeGen/Stg*.hs modules\r
+\r
+CmmDecl.hs Shared Cmm types of both representations\r
+CmmExpr.hs Type of Cmm expression\r
+CmmType.hs Type of Cmm types and their widths\r
+CmmMachOp.hs MachOp type and accompanying utilities\r
+\r
+CmmUtils.hs\r
+CmmLint.hs\r
+\r
+PprC.hs Pretty print Cmm in C syntax\r
+PprCmm.hs Pretty printer for CmmGraph.\r
+PprCmmDecl.hs Pretty printer for common Cmm types.\r
+PprCmmExpr.hs Pretty printer for Cmm expressions.\r
+\r
+CLabel.hs CLabel\r
+BlockId.hs BlockId, BlockEnv, BlockSet\r
\r
----------------------------------------------------\r
Top-level structure\r
* HscMain.tryNewCodeGen\r
- STG->Cmm: StgCmm.codeGen (new codegen)\r
- Optimise: CmmContFlowOpt (simple optimisations, very self contained)\r
- - Cps convert: CmmCPSZ.protoCmmCPSZ \r
+ - Cps convert: CmmCPS.protoCmmCPS \r
- Optimise: CmmContFlowOpt again\r
- Convert: CmmCvt.cmmOfZgraph (convert to old rep) very self contained\r
\r
\r
\r
----------------------------------------------------\r
- CmmCPSZ.protoCmmCPSZ The new pipeline\r
+ CmmCPS.protoCmmCPS The new pipeline\r
----------------------------------------------------\r
\r
-CmmCPSZprotoCmmCPSZ:\r
+CmmCPS.protoCmmCPS:\r
1. Do cpsTop for each procedures separately\r
2. Build SRT representation; this spans multiple procedures\r
(unless split-objs)\r
\r
cpsTop:\r
- * CmmCommonBlockElimZ.elimCommonBlocks:\r
+ * CmmCommonBlockElim.elimCommonBlocks:\r
eliminate common blocks \r
\r
- * CmmProcPointZ.minimalProcPointSet\r
+ * CmmProcPoint.minimalProcPointSet\r
identify proc-points\r
+ no change to graph\r
\r
- * CmmProcPointZ.addProcPointProtocols\r
+ * CmmProcPoint.addProcPointProtocols\r
something to do with the MA optimisation\r
probably entirely unnecessary\r
\r
-\r
* Spill and reload:\r
- CmmSpillReload.dualLivenessWithInsertion\r
insert spills/reloads across \r
\r
- CmmStackLayout.layout\r
Lay out the stack, returning an AreaMap\r
- type AreaMap = FiniteMap Area ByteOff\r
+ type AreaMap = FiniteMap Area ByteOff\r
-- Byte offset of the oldest byte of the Area, \r
-- relative to the oldest byte of the Old Area\r
\r
Manifest the stack pointer\r
\r
* Split into separate procedures\r
- - CmmProcPointZ.procPointAnalysis\r
+ - CmmProcPoint.procPointAnalysis\r
Given set of proc points, which blocks are reachable from each\r
+ Claim: too few proc-points => code duplication, but program still works??\r
\r
- - CmmProcPointZ.splitAtProcPoints\r
+ - CmmProcPoint.splitAtProcPoints\r
Using this info, split into separate procedures\r
\r
+ - CmmBuildInfoTables.setInfoTableStackMap\r
+ Attach stack maps to each info table\r
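A minimal sketch of the reachability step described for procPointAnalysis, on a toy CFG. The types and the `reachableFrom`/`owners`/`duplicated` functions are hypothetical; the real code works over Hoopl graphs and is not structured like this. From each proc-point we walk successors without entering another proc-point, and record which proc-points reach each block. A block reached from more than one proc-point would be copied into each resulting procedure, which is one way to read the claim that too few proc-points lead to code duplication.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type Label = String
type CFG   = M.Map Label [Label]   -- block -> successor blocks

-- Blocks reachable from proc-point p without entering another
-- proc-point (p itself is included).
reachableFrom :: CFG -> S.Set Label -> Label -> S.Set Label
reachableFrom cfg pps p = go (S.singleton p) [p]
  where
    go seen []       = seen
    go seen (b : bs) =
      let next = [ s | s <- M.findWithDefault [] b cfg
                     , not (s `S.member` pps)
                     , not (s `S.member` seen) ]
      in go (foldr S.insert seen next) (next ++ bs)

-- For every block, the proc-points whose procedure would contain it.
owners :: CFG -> S.Set Label -> M.Map Label (S.Set Label)
owners cfg pps = M.fromListWith S.union
  [ (b, S.singleton p) | p <- S.toList pps, b <- S.toList (reachableFrom cfg pps p) ]

-- Blocks that splitting would copy into more than one procedure.
duplicated :: CFG -> S.Set Label -> [Label]
duplicated cfg pps = M.keys (M.filter ((> 1) . S.size) (owners cfg pps))
```

On the diamond from the proc-point discussion (entry fn, continuation k of the call to h(), continuation t2 of the call to f(x), join J), taking only fn, k and t2 as proc-points makes J reachable from both k and t2, so it would be duplicated; adding J to the set removes the duplication.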
+\r
+\r
+----------------------------------------------------\r
+ Proc-points\r
+----------------------------------------------------\r
+\r
+Consider this program, which has a diamond control flow, \r
+with a call on one branch\r
+ fn(p,x) {\r
+ h()\r
+ if b then { ... f(x) ...; q=5; goto J }\r
+ else { ...; q=7; goto J }\r
+ J: ..p...q...\r
+ }\r
+then the join point J is a "proc-point". So, is 'p' passed to J\r
+as a parameter? Or, if 'p' was saved on the stack anyway, perhaps\r
+to keep it alive across the call to h(), maybe 'p' gets communicated\r
+to J that way. This is an awkward choice. (We think that we currently\r
+never pass variables to join points via arguments.)\r
+\r
+Furthermore, there is *no way* to pass q to J in a register (other\r
+than a parameter register).\r
+\r
+What we want is to do register allocation across the whole caboodle.\r
+Then we could drop all the code that deals with the above awkward\r
+decisions about spilling variables across proc-points.\r
+\r
+Note that J doesn't need an info table.\r
+\r
+What we really want is for each LastCall (not LastJump/Ret) \r
+to have an info table. Note that ProcPoints that are not successors\r
+of calls don't need an info table.\r
+\r
+Figuring out proc-points\r
+~~~~~~~~~~~~~~~~~~~~~~~~\r
+Proc-points are identified by\r
+CmmProcPoint.minimalProcPointSet/extendPPSet. Although there isn't\r
+that much code, JD thinks that it could be done much more nicely using\r
+a dominator analysis, using the Dataflow Engine.\r
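For reference, a minimal sketch of the classic iterative dominator computation, Dom(entry) = {entry} and Dom(b) = {b} plus the intersection of Dom(p) over all predecessors p, on a toy CFG. It is a plain fixpoint loop, not written against the Dataflow engine or Hoopl, and the type and function names are hypothetical. How exactly proc-points would be derived from the dominator sets is left open here, as in the note.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type Label = String
type CFG   = M.Map Label [Label]   -- every block appears as a key

-- Predecessor lists, derived from the successor map.
predecessors :: CFG -> M.Map Label [Label]
predecessors cfg = M.fromListWith (++) $
  [ (s, [b]) | (b, ss) <- M.toList cfg, s <- ss ] ++ [ (b, []) | b <- M.keys cfg ]

-- Dom(entry) = {entry}; Dom(b) = {b} `union` (intersection of Dom(p) over preds p),
-- iterated from "every block dominates everything" down to a fixpoint.
dominators :: Label -> CFG -> M.Map Label (S.Set Label)
dominators entry cfg = go initial
  where
    everything = M.keysSet cfg
    preds      = predecessors cfg
    initial    = M.mapWithKey (\b _ -> if b == entry then S.singleton b else everything) cfg
    update doms b
      | b == entry = S.singleton b
      | otherwise  = case [ doms M.! p | p <- M.findWithDefault [] b preds ] of
          []       -> everything   -- no predecessors and not the entry: unreachable, leave at "top"
          (d : ds) -> S.insert b (foldr S.intersection d ds)
    step doms  = M.mapWithKey (\b _ -> update doms b) cfg
    go doms    = let doms' = step doms in if doms' == doms then doms else go doms'
```

On a diamond entry -> {then, else} -> J, the join J ends up dominated only by entry and itself, which is the kind of fact a dominator-based proc-point analysis could work from.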
+\r
----------------------------------------------------\r
CAFs\r
----------------------------------------------------\r
If f is live, then so is g. f's SRT must include g's closure.\r
\r
* The CLabel for the entry-point/closure reveals whether g is \r
- a CAF (or refers to CAFs). See the IdLabell constructor of CLabel.\r
+ a CAF (or refers to CAFs). See the IdLabel constructor of CLabel.\r
\r
* The CAF-ness of the original top-level definitions is figured out\r
(by TidyPgm) before we generate C--. This CafInfo is only set for\r
- top-level Ids; nested bindings stay with NoCafRefs.\r
+ top-level Ids; nested bindings stay with MayHaveCafRefs.\r
\r
* Currently an SRT contains (only) pointers to (top-level) closures.\r
\r
This generates C-- roughly like this:\r
f_closure: .word f_entry\r
f_entry() [info-tbl-for-f] { ...jump g_entry...jump h2... }\r
- g_entry() [info-tbl-for-g] { ...jump h1 }\r
+ g_entry() [info-tbl-for-g] { ...jump h1... }\r
\r
Note that there is no top-level closure for g (only an info table).\r
- So: info-tbl-for-f must have an SRT that keeps h1,h2 alive\r
+ This fact (whether or not there is a top-level closure) is recorded\r
+ in the InfoTable attached to the CmmProc for f, g\r
+ INVARIANT: \r
+ Any out-of-Group references to an IdLabel go to\r
+ a Proc whose InfoTable says "I have a top-level closure".\r
+ Equivalently: \r
+ A CmmProc whose InfoTable says "I do not have a top-level\r
+ closure" is referred to only from its own Group.\r
+\r
+* So: info-tbl-for-f must have an SRT that keeps h1,h2 alive\r
info-tbl-for-g must have an SRT that keeps h1 (only) alive\r
\r
But if we just look for the free CAF refs, we get:\r
f's keep-alive refs to include h1.\r
\r
* The SRT info is the C_SRT field of Cmm.ClosureTypeInfo in a\r
- CmmInfoTable attached to each CmmProc. CmmCPSZ.toTops actually does\r
+ CmmInfoTable attached to each CmmProc. CmmCPS.toTops actually does\r
the attaching, right at the end of the pipeline. The C_SRT part\r
gives offsets within a single, shared table of closure pointers.\r
\r
+* DECIDED: we can generate SRTs based on the final Cmm program\r
+ without knowledge of how it is generated.\r
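A minimal sketch of the SRT rule applied to the f/g/h1/h2 example above. It assumes, hypothetically, that each proc is summarised by whether it has a top-level closure and which CAF-referring labels its code mentions (`ProcInfo` and `srts` are illustrative names, not CmmBuildInfoTables' API). A reference to something with a closure goes straight into the SRT, while a reference to a closure-less proc such as g is replaced by g's own SRT. Iterating from empty SRTs to a fixpoint copes with recursion among closure-less procs.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type Name = String

-- Hypothetical per-proc summary.
data ProcInfo = ProcInfo
  { hasClosure :: Bool     -- does the proc have a top-level closure?
  , refs       :: [Name]   -- CAF-referring labels its code mentions
  }

-- SRT of each proc: closures are kept alive directly; references to
-- closure-less procs are replaced by the contents of their SRTs.
-- Labels not in the map are assumed to have top-level closures.
srts :: M.Map Name ProcInfo -> M.Map Name (S.Set Name)
srts procs = go (M.map (const S.empty) procs)
  where
    expand cur r = case M.lookup r procs of
      Just p | not (hasClosure p) -> M.findWithDefault S.empty r cur
      _                           -> S.singleton r
    step cur = M.map (\p -> S.unions (map (expand cur) (refs p))) procs
    go cur = let cur' = step cur in if cur' == cur then cur else go cur'
```

With f (closure, jumps to g and h2) and g (no closure, jumps to h1), this gives f's SRT = {h1, h2} and g's SRT = {h1}, matching the note.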
+\r
----------------------------------------------------\r
Foreign calls\r
----------------------------------------------------\r
\r
-See Note [Foreign calls] in ZipCfgCmmRep! This explains that a safe\r
+See Note [Foreign calls] in CmmNode! This explains that a safe\r
foreign call must do this:\r
save thread state\r
push info table (on thread stack) to describe frame\r
Cmm representations\r
----------------------------------------------------\r
\r
-* Cmm.hs\r
+* CmmDecl.hs\r
The type [GenCmm d h g] represents a whole module, \r
** one list element per .o file **\r
Without SplitObjs, the list has exactly one element\r
\r
\r
-------------\r
-OLD BACK END representations (Cmm.hs): \r
+OLD BACK END representations (OldCmm.hs): \r
type Cmm = GenCmm CmmStatic CmmInfo (ListGraph CmmStmt)\r
-- A whole module\r
newtype ListGraph i = ListGraph [GenBasicBlock i]\r
\r
-------------\r
NEW BACK END representations \r
-* Not Cmm-specific at all\r
- ZipCfg.hs defines Graph, LGraph, FGraph,\r
- ZHead, ZTail, ZBlock ...\r
+* Uses the Hoopl library, a zero-boot package\r
+* CmmNode defines a node of a flow graph.\r
+* Cmm defines CmmGraph, CmmTop, Cmm\r
+ - CmmGraph is a closed/closed graph + an entry node.\r
\r
- classes LastNode, HavingSuccessors\r
+ data CmmGraph = CmmGraph { g_entry :: BlockId\r
+ , g_graph :: Graph CmmNode C C }\r
\r
- MkZipCfg.hs: AGraph: building graphs\r
+ - CmmTop is a top-level chunk, a specialization of GenCmmTop from CmmDecl.hs\r
+ with CmmGraph as a flow graph.\r
+ - Cmm is a collection of CmmTops.\r
\r
-* ZipCfgCmmRep: instantiates ZipCfg for Cmm\r
- data Middle = ...CmmExpr...\r
- data Last = ...CmmExpr...\r
- type CmmGraph = Graph Middle Last\r
+ type Cmm = GenCmm CmmStatic CmmTopInfo CmmGraph\r
+ type CmmTop = GenCmmTop CmmStatic CmmTopInfo CmmGraph\r
\r
- type CmmZ = GenCmm CmmStatic CmmInfo (CmmStackInfo, CmmGraph)\r
- type CmmStackInfo = (ByteOff, Maybe ByteOff)\r
- -- (SP offset on entry, update frame space = SP offset on exit)\r
- -- The new codegen produces CmmZ, but once the stack is \r
- -- manifested we can drop that in favour of \r
- -- GenCmm CmmStatic CmmInfo CmmGraph\r
+ - CmmTop uses CmmTopInfo, which is a CmmInfoTable and CmmStackInfo\r
\r
- Inside a CmmProc:\r
- - CLabel: used\r
- - CmmInfo: partly used by NEW\r
- - CmmFormals: not used at all PERHAPS NOT EVEN BY OLD PIPELINE!\r
+ data CmmTopInfo = TopInfo {info_tbl :: CmmInfoTable, stack_info :: CmmStackInfo}\r
\r
-* MkZipCfgCmm.hs: smart constructors for ZipCfgCmmRep\r
- Depends on (a) MkZipCfg (Cmm-independent)\r
- (b) ZipCfgCmmRep (Cmm-specific)\r
+ - CmmStackInfo\r
\r
--------------\r
-* SHARED stuff\r
- CmmExpr.hs defines the Cmm expression types\r
- - CmmExpr, CmmReg, Width, CmmLit, LocalReg, GlobalReg\r
- - CmmType, Width etc (saparate module?)\r
- - MachOp (separate module?)\r
- - Area, AreaId etc (separate module?)\r
+ data CmmStackInfo = StackInfo {arg_space :: ByteOff, updfr_space :: Maybe ByteOff}\r
\r
- BlockId.hs defines BlockId, BlockEnv, BlockSet\r
+ * arg_space = SP offset on entry\r
+ * updfr_space = SP offset on exit\r
+ Once the stack is manifested, we could drop CmmStackInfo, i.e. get\r
+ GenCmm CmmStatic CmmInfoTable CmmGraph, but we do not do that currently.\r
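To illustrate why dropping CmmStackInfo would be cheap: since the header is a type parameter of GenCmmTop, it amounts to mapping `info_tbl` over the header. The types below are cut-down stand-ins (the real CmmProc also carries a CLabel, the CmmData case a Section, and CmmInfoTable is much richer); `mapHeader` and `dropStackInfo` are hypothetical helpers, not existing GHC functions.

```haskell
type ByteOff = Int

data CmmStackInfo = StackInfo { arg_space :: ByteOff, updfr_space :: Maybe ByteOff }
  deriving (Eq, Show)

-- Placeholder for the real info-table type.
newtype CmmInfoTable = CmmInfoTable String
  deriving (Eq, Show)

data CmmTopInfo = TopInfo { info_tbl :: CmmInfoTable, stack_info :: CmmStackInfo }
  deriving (Eq, Show)

-- Cut-down top-level chunk: a proc with header h, a label and graph g,
-- or a data section of statics d.
data GenCmmTop d h g = CmmProc h String g | CmmData [d]
  deriving (Eq, Show)

-- The header is a type parameter, so changing it is just a map.
mapHeader :: (h -> h') -> GenCmmTop d h g -> GenCmmTop d h' g
mapHeader f (CmmProc h lbl g) = CmmProc (f h) lbl g
mapHeader _ (CmmData ds)      = CmmData ds

-- Once the stack is manifest, keep only the info table.
dropStackInfo :: GenCmmTop d CmmTopInfo g -> GenCmmTop d CmmInfoTable g
dropStackInfo = mapHeader info_tbl
```

A whole Cmm (the list of CmmTops) would then be converted with `map dropStackInfo`.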
\r
--------------\r
\r
+* MkGraph.hs: smart constructors for Cmm.hs\r
+ Beware, the CmmAGraph defined here does not use AGraph from Hoopl,\r
+ as CmmAGraph can be open or closed at exit. See the notes in that module.\r
\r
-------------\r
-* Transactions indicate whether or not the result changes: CmmTx \r
- type Tx a = a -> TxRes a\r
- data TxRes a = TxRes ChangeFlag a\r
+* SHARED stuff\r
+ CmmDecl.hs - GenCmm and GenCmmTop types\r
+ CmmExpr.hs - defines the Cmm expression types\r
+ - CmmExpr, CmmReg, CmmLit, LocalReg, GlobalReg\r
+ - Area, AreaId etc (separate module?)\r
+ CmmType.hs - CmmType, Width etc (separate module?)\r
+ CmmMachOp.hs - MachOp and CallishMachOp types\r
+\r
+ BlockId.hs defines BlockId, BlockEnv, BlockSet\r
+-------------\r