X-Git-Url: http://git.megacz.com/?p=ghc-hetmet.git;a=blobdiff_plain;f=compiler%2Fcmm%2Fcmm-notes;h=98c2e836994c8e05990c47205af7de371a4d2ff2;hp=5ec489571f223ec50a6c8b817295d4d1b621587e;hb=d25676a6b1c42495702048b6ca6f26ebd15205d8;hpb=2bb3a439c106935d97fae7f7a0b60c21493d1bef

diff --git a/compiler/cmm/cmm-notes b/compiler/cmm/cmm-notes
index 5ec4895..98c2e83 100644
--- a/compiler/cmm/cmm-notes
+++ b/compiler/cmm/cmm-notes
@@ -1,7 +1,146 @@
-Notes on new codegen (Sept 09)
+More notes (June 11)
+~~~~~~~~~~~~~~~~~~~~
+* Kill dead code assignArguments, argumentsSize in CmmCallConv.
+  Bake in ByteOff to ParamLocation and ArgumentFormat
+  CmmActuals -> [CmmActual], and similarly CmmFormals
+
+* Possible refactoring: Nuke AGraph in favour of
+      mkIfThenElse :: Expr -> Graph -> Graph -> FCode Graph
+  or even
+      mkIfThenElse :: HasUniques m => Expr -> Graph -> Graph -> m Graph
+  (Remember that the .cmm file parser must use this function)
+
+  or parameterise FCode over its envt; the CgState part seems useful for both
+
+* Move top and tail calls to runCmmContFlowOpts from HscMain to CmmCps.cpsTop
+  (and rename the latter!)
+
+* "Remove redundant reloads" in CmmSpillReload should be redundant; since
+  insertLateReloads is now gone, every reload is reloading a live variable.
+  Test and nuke.
+
+* Sink and inline S(RegSlot(x)) = e in precisely the same way that we
+  sink and inline x = e
+
+* Stack layout is very like register assignment: find non-conflicting assignments.
+  In particular we can use colouring or linear scan (etc).
+
+  We'd make interference fine-grained (on a word-by-word basis) to get maximum overlap.
+  But that may make very big interference graphs.  So linear scan might be
+  more attractive.
+
+  NB: linear scan does on-the-fly live range splitting.
+
+* When stubbing dead slots, be careful not to write into an area that
+  overlaps with an area that's in use.  So stubbing needs to *follow*
+  stack layout.
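The June 11 note compares stack layout to register assignment. The sketch below shows how a linear-scan style assignment could let stack areas share words. It makes several assumptions that are not GHC code:

- Each area has already been summarised as a live interval over a linear numbering of program points, plus a size in words.
- `Interval` and `assignSlots` are hypothetical names.
- The allocation strategy is first-fit.
- Unlike a real linear-scan allocator, it does no live-range splitting.

```haskell
import Data.List (sortOn)
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Hypothetical summary of one stack area: live over program points
-- ivStart..ivEnd (inclusive) and needing ivWords contiguous words.
data Interval = Interval
  { ivName  :: String
  , ivStart :: Int
  , ivEnd   :: Int
  , ivWords :: Int
  } deriving Show

-- First-fit linear scan: visit areas by start point, drop areas whose
-- live range has ended, and place the new area at the lowest offset
-- whose words are not used by any still-live area.  Interference is
-- tracked word by word; no live-range splitting is attempted.
assignSlots :: [Interval] -> M.Map String Int
assignSlots ivs = go (sortOn ivStart ivs) [] M.empty
  where
    go [] _ acc = acc
    go (iv : rest) active acc =
      let live = [ (a, aOff) | (a, aOff) <- active, ivEnd a >= ivStart iv ]
          used = S.fromList [ w | (a, aOff) <- live
                                , w <- [aOff .. aOff + ivWords a - 1] ]
          fits o = all (`S.notMember` used) [o .. o + ivWords iv - 1]
          off = until fits (+ 1) 0
      in go rest ((iv, off) : live) (M.insert (ivName iv) off acc)
```

Consider three areas:

- x, live over 0..3, 1 word
- y, live over 1..2, 2 words
- z, live over 4..6, 2 words

The sketch gives x offset 0. It gives y offset 1, because word 0 is still in use by x. It gives z offset 0 again, because both x and y are dead by point 4.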
+
+
+More notes (May 11)
+~~~~~~~~~~~~~~~~~~~
+In CmmNode, consider splitting CmmCall into two: call and jump
+
+Notes on new codegen (Aug 10)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Things to do:
+ - We insert spills for variables before the stack check! This is the reason for
+   some fishy code in StgCmmHeap.entryHeapCheck where we are doing some strange
+   things to fix up the stack pointer before GC calls/jumps.
+
+   The reason spills are inserted before the sp check is that at the entry to a
+   function we always store the parameters passed in registers to local variables.
+   The spill pass simply inserts spills at variable definitions. We should instead
+   sink the spills so that we can avoid spilling them on branches that never
+   reload them.
+
+   This will fix the spill-before-stack-check problem, but only really as a side
+   effect. A 'real fix' probably requires making the spiller know about sp checks.
+
+   EZY: I don't understand this comment. David Terei, can you clarify?
+
+ - Proc points pass all arguments on the stack, adding more code and
+   slowing things down a lot. We either need to fix this or, even better,
+   get rid of proc points.
+
+ - CmmInfo.cmmToRawCmm uses Old.Cmm, so it is called after converting Cmm.Cmm to
+   Old.Cmm. We should abstract it to work on both representations; it only needs
+   to convert a CmmInfoTable to [CmmStatic].
+
+ - The MkGraph currently uses a different semantics for <*> than Hoopl. Maybe
+   we could convert codeGen/StgCmm* clients to Hoopl's semantics?
+   It's all deeply unsatisfactory.
+
+ - Improve performance of Hoopl.
+
+   A nofib comparison of -fasm vs -fnewcodegen nofib compilation parameters
+   (using the same ghc-cmm branch + libraries compiled by the old code generator)
+   is at http://fox.auryn.cz/msrc/0517_hoopl/32bit.oldghcoldgen.oldghchoopl.txt
+   - the code produced is 10.9% slower, the compilation is +118% slower!
+
+   The same comparison with ghc-head with zip representation is at
+   http://fox.auryn.cz/msrc/0517_hoopl/32bit.oldghcoldgen.oldghczip.txt
+   - the code produced is 11.7% slower, the compilation is +78% slower.
+
+   When compiling nofib, ghc-cmm + libraries compiled with -fnew-codegen
+   is 23.7% slower (http://fox.auryn.cz/msrc/0517_hoopl/32bit.oldghcoldgen.hooplghcoldgen.txt).
+   When compiling nofib, ghc-head + libraries compiled with -fnew-codegen
+   is 31.4% slower (http://fox.auryn.cz/msrc/0517_hoopl/32bit.oldghcoldgen.zipghcoldgen.txt).
+
+   So we generate a bit better code, but it takes us longer!
+
+   EZY: Also importantly, Hoopl uses dramatically more memory than the
+   old code generator.
+
+ - Are all blockToNodeList and blockOfNodeList really needed? Maybe we could
+   splice blocks instead?
+
+   In CmmContFlowOpt.blockConcat, using Dataflow seems too clumsy. Still,
+   a block catenation function would probably be nicer than the
+   blockToNodeList / blockOfNodeList combo.
+
+ - lowerSafeForeignCall seems too low-level. Just use Dataflow. After that
+   delete splitEntrySeq from HooplUtils.
+
+ - manifestSP seems to touch a lot of the graph representation. It is
+   also slow for CmmSwitch nodes O(block_nodes * switch_statements).
+   Maybe rewrite manifestSP to use Dataflow?
+
+ - Sort out Label, LabelMap, LabelSet versus BlockId, BlockEnv, BlockSet
+   dichotomy. Mostly this means global replace, but we also need to make
+   Label an instance of Outputable (probably in the Outputable module).
+
+ - NB that CmmProcPoint line 283 has a hack that works around a GADT-related
+   bug in 6.10.
+
+ - SDM (2010-02-26) can we remove the Foreign constructor from Convention?
+   Reason: we never generate code for a function with the Foreign
+   calling convention, and the code for calling foreign calls is generated
+
+ - AsmCodeGen has a generic Cmm optimiser; move this into new pipeline
+   EZY (2011-04-16): The mini-inliner has been generalized and ported,
+   but the constant folding and other optimizations still need to be
+   ported.
+
+ - AsmCodeGen has a post-native-cg branch eliminator (shortCutBranches);
+   we ultimately want to share this with the Cmm branch eliminator.
+
+ - At the moment, references to global registers like Hp are "lowered"
+   late (in CgUtils.fixStgRegisters). We should do this early, in the
+   new native codegen, much in the way that we lower calling conventions.
+   Might need to be a bit sophisticated about aliasing.
+
+ - Question: currently we lift procpoints to become separate
+   CmmProcs.  Do we still want to do this?
+
+   NB: an advantage of continuing to do this is that
+   we can do common-proc elimination!
+
+ - Move to new Cmm rep:
+     * Make native CG consume New Cmm;
+     * Convert Old Cmm->New Cmm to keep old path alive
+     * Produce New Cmm when reading in .cmm files
+
 - Consider module names
 
 - Top-level SRT threading is a bit ugly
@@ -18,22 +157,8 @@ Things to do:
 
 - See "CAFs" below; we want to totally refactor the way SRTs are calculated
 
- - Change
-      type CmmZ = GenCmm CmmStatic CmmInfo (CmmStackInfo, CmmGraph)
-   to
-      type CmmZ = GenCmm CmmStatic (CmmInfo, CmmStackInfo) CmmGraph
-   -- And perhaps take opportunity to prune CmmInfo?
-
- - Clarify which fields of CmmInfo are still used
- - Maybe get rid of CmmFormals arg of CmmProc in all versions?
-
- - We aren't sure whether cmmToRawCmm is actively used by the new pipeline; check
-   And what does CmmBuildInfoTables do?!
-
- - Nuke CmmZipUtil, move zipPreds into ZipCfg
-
 - Pull out Areas into its own module
-   Parameterise AreaMap
+   Parameterise AreaMap (note there are type synonyms in CmmStackLayout!)
   Add ByteWidth = Int
   type SubArea    = (Area, ByteOff, ByteWidth)
   ByteOff should not be defined in SMRep -- that is too high up the hierarchy
@@ -43,6 +168,9 @@ Things to do:
     -- rET_SMALL etc ==> CmmInfo
   Check that there are no other imports from codeGen in cmm/
 
+ - If you eliminate a label by branch chain elimination,
+   what happens if there's an Area associated with that label?
+
 - Think about a non-flattened representation?
 
 - LastCall:
@@ -65,7 +193,7 @@ Things to do:
 
   http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGenPipeline
 
- - We believe that all of CmmProcPointZ.addProcPointProtocols is dead.  What
+ - We believe that all of CmmProcPoint.addProcPointProtocols is dead.  What
   goes wrong if we simply never call it?
 
 - Something fishy in CmmStackLayout.hs
@@ -110,75 +238,57 @@ Things to do:
   move the whole splitting game into the C back end *only*
   (guided by the procpoint set)
 
-
 ----------------------------------------------------
        Modules in cmm/
 ----------------------------------------------------
 
--------- Dead stuff ------------
-CmmProcPoint        Dead: Michael Adams
-CmmCPS              Dead: Michael Adams
-CmmCPSGen.hs        Dead: Michael Adams
-CmmBrokenBlock.hs   Dead: Michael Adams
-CmmLive.hs          Dead: Michael Adams
-CmmProcPoint.hs     Dead: Michael Adams
-Dataflow.hs         Dead: Michael Adams
-StackColor.hs       Norman?
-StackPlacements.hs  Norman?
-
+-------- Testing stuff ------------
 HscMain.optionallyConvertAndOrCPS
         testCmmConversion
-DynFlags:  -fconvert-to-zipper-and-back, -frun-cps, -frun-cpsz
+DynFlags:  -fconvert-to-zipper-and-back, -frun-cpsz
 
 -------- Moribund stuff ------------
+OldCmm.hs      Definition of flowgraph of old representation
+OldCmmUtil.hs  Utilities that operate mostly on CmmStmt
+OldPprCmm.hs   Pretty print for CmmStmt, GenBasicBlock and ListGraph
 CmmCvt.hs      Conversion between old and new Cmm reps
 CmmOpt.hs      Hopefully-redundant optimiser
-CmmZipUtil.hs  Only one function; move elsewhere
 
 -------- Stuff to keep ------------
-CmmCPSZ.hs            Driver for new pipeline
+CmmCPS.hs             Driver for new pipeline
 
-CmmLiveZ.hs           Liveness analysis, dead code elim
-CmmProcPointZ.hs      Identifying and splitting out proc-points
+CmmLive.hs            Liveness analysis, dead code elim
+CmmProcPoint.hs       Identifying and splitting out proc-points
 
 CmmSpillReload.hs     Save and restore across calls
 
-CmmCommonBlockElimZ.hs  Common block elim
+CmmCommonBlockElim.hs Common block elim
 CmmContFlowOpt.hs     Other optimisations (branch-chain, merging)
 
 CmmBuildInfoTables.hs New info-table
 CmmStackLayout.hs     and stack layout
 CmmCallConv.hs
-CmmInfo.hs            Defn of InfoTables, and conversion to exact layout
+CmmInfo.hs            Defn of InfoTables, and conversion to exact byte layout
 
 ---------- Cmm data types --------------
-ZipCfgCmmRep.hs   Cmm instantiations of dataflow graph framework
-MkZipCfgCmm.hs    Cmm instantiations of dataflow graph framework
+Cmm.hs            Cmm instantiations of dataflow graph framework
+MkGraph.hs        Interface for building Cmm for codeGen/Stg*.hs modules
+
+CmmDecl.hs        Shared Cmm types of both representations
+CmmExpr.hs        Type of Cmm expression
+CmmType.hs        Type of Cmm types and their widths
+CmmMachOp.hs      MachOp type and accompanying utilities
 
-Cmm.hs            Key module; a mix of old and new stuff
-                  so needs tidying up in due course
-CmmExpr.hs
 CmmUtils.hs
 CmmLint.hs
 
 PprC.hs           Pretty print Cmm in C syntax
-PprCmm.hs         Pretty printer for Cmm
-PprCmmZ.hs        Additional stuff for zipper rep
-
-CLabel.hs         CLabel
-
------------ Dataflow modules --------------
-   Goal: separate library; for now, separate directory
-
-MkZipCfg.hs
-ZipCfg.hs
-ZipCfgExtras.hs
-ZipDataflow.hs
-CmmTx.hs          Transactions
-OptimizationFuel.hs  Fuel
-BlockId.hs        BlockId, BlockEnv, BlockSet
-DFMonad.hs
+PprCmm.hs         Pretty printer for CmmGraph.
+PprCmmDecl.hs     Pretty printer for common Cmm types.
+PprCmmExpr.hs     Pretty printer for Cmm expressions.
+CLabel.hs         CLabel
+BlockId.hs        BlockId, BlockEnv, BlockSet
 
 ----------------------------------------------------
            Top-level structure
@@ -194,7 +304,7 @@ DFMonad.hs
 * HscMain.tryNewCodeGen
     - STG->Cmm:    StgCmm.codeGen (new codegen)
     - Optimise:    CmmContFlowOpt (simple optimisations, very self contained)
-    - Cps convert: CmmCPSZ.protoCmmCPSZ
+    - Cps convert: CmmCPS.protoCmmCPS
     - Optimise:    CmmContFlowOpt again
     - Convert:     CmmCvt.cmmOfZgraph (convert to old rep) very self contained
 
@@ -203,23 +313,23 @@ DFMonad.hs
 
 
 ----------------------------------------------------
-  CmmCPSZ.protoCmmCPSZ   The new pipeline
+  CmmCPS.protoCmmCPS   The new pipeline
 ----------------------------------------------------
 
-CmmCPSZprotoCmmCPSZ:
+CmmCPS.protoCmmCPS:
    1. Do cpsTop for each procedures separately
    2. Build SRT representation; this spans multiple procedures
         (unless split-objs)
 
 cpsTop:
-  * CmmCommonBlockElimZ.elimCommonBlocks:
+  * CmmCommonBlockElim.elimCommonBlocks:
         eliminate common blocks
 
-  * CmmProcPointZ.minimalProcPointSet
+  * CmmProcPoint.minimalProcPointSet
         identify proc-points
         no change to graph
 
-  * CmmProcPointZ.addProcPointProtocols
+  * CmmProcPoint.addProcPointProtocols
         something to do with the MA optimisation
         probably entirely unnecessary
 
@@ -228,8 +338,8 @@ cpsTop:
         insert spills/reloads across
            LastCalls, and
            Branches to proc-points
-     Now sink those reloads:
-         - CmmSpillReload.insertLateReloads
+     Now sink those reloads (and other instructions):
+         - CmmSpillReload.rewriteAssignments
          - CmmSpillReload.removeDeadAssignmentsAndReloads
 
   * CmmStackLayout.stubSlotsOnDeath
@@ -249,11 +359,11 @@ cpsTop:
        Manifest the stack pointer
 
   * Split into separate procedures
-      - CmmProcPointZ.procPointAnalysis
+      - CmmProcPoint.procPointAnalysis
              Given set of proc points, which blocks are reachable from each
              Claim: too few proc-points => code duplication, but program still works??
 
-      - CmmProcPointZ.splitAtProcPoints
+      - CmmProcPoint.splitAtProcPoints
              Using this info, split into separate procedures
 
       - CmmBuildInfoTables.setInfoTableStackMap
@@ -279,7 +389,7 @@ to J that way. This is an awkward choice.  (We think that we currently
 never pass variables to join points via arguments.)
 
 Furthermore, there is *no way* to pass q to J in a register (other
-than a paramter register).
+than a parameter register).
 
 What we want is to do register allocation across the whole caboodle.
 Then we could drop all the code that deals with the above awkward
@@ -294,7 +404,7 @@ of calls don't need an info table.
 
 Figuring out proc-points
 ~~~~~~~~~~~~~~~~~~~~~~~~
 Proc-points are identified by
-CmmProcPointZ.minimalProcPointSet/extendPPSet Although there isn't
+CmmProcPoint.minimalProcPointSet/extendPPSet Although there isn't
 that much code, JD thinks that it could be done much more nicely using
 a dominator analysis, using the Dataflow Engine.
@@ -347,7 +457,7 @@ a dominator analysis, using the Dataflow Engine.
   f's keep-alive refs to include h1.
 
 * The SRT info is the C_SRT field of Cmm.ClosureTypeInfo in a
-  CmmInfoTable attached to each CmmProc.  CmmCPSZ.toTops actually does
+  CmmInfoTable attached to each CmmProc.  CmmCPS.toTops actually does
   the attaching, right at the end of the pipeline.  The C_SRT part
   gives offsets within a single, shared table of closure pointers.
 
@@ -358,7 +468,7 @@ a dominator analysis, using the Dataflow Engine.
            Foreign calls
 ----------------------------------------------------
 
-See Note [Foreign calls] in ZipCfgCmmRep!  This explains that a safe
+See Note [Foreign calls] in CmmNode!  This explains that a safe
 foreign call must do this:
   save thread state
   push info table (on thread stack) to describe frame
@@ -393,7 +503,7 @@ NEW PLAN for foreign calls:
           Cmm representations
 ----------------------------------------------------
 
-* Cmm.hs
+* CmmDecl.hs
   The type [GenCmm d h g] represents a whole module,
     ** one list element per .o file **
     Without SplitObjs, the list has exactly one element
@@ -408,7 +518,7 @@ NEW PLAN for foreign calls:
 
 -------------
 
-OLD BACK END representations (Cmm.hs):
+OLD BACK END representations (OldCmm.hs):
   type Cmm = GenCmm CmmStatic CmmInfo (ListGraph CmmStmt)
                                 -- A whole module
   newtype ListGraph i = ListGraph [GenBasicBlock i]
@@ -423,49 +533,47 @@ OLD BACK END representations (Cmm.hs):
 -------------
 NEW BACK END representations
-* Not Cmm-specific at all
-  ZipCfg.hs defines  Graph, LGraph, FGraph,
-                     ZHead, ZTail, ZBlock ...
+* Uses Hoopl library, a zero-boot package
+* CmmNode defines a node of a flow graph.
+* Cmm defines CmmGraph, CmmTop, Cmm
+  - CmmGraph is a closed/closed graph + an entry node.
 
-  classes LastNode, HavingSuccessors
+      data CmmGraph = CmmGraph { g_entry :: BlockId
+                               , g_graph :: Graph CmmNode C C }
 
-  MkZipCfg.hs: AGraph: building graphs
+  - CmmTop is a top-level chunk, a specialization of GenCmmTop from CmmDecl.hs
+    with CmmGraph as the flow graph.
+  - Cmm is a collection of CmmTops.
 
-* ZipCfgCmmRep: instantiates ZipCfg for Cmm
-      data Middle = ...CmmExpr...
-      data Last = ...CmmExpr...
-      type CmmGraph = Graph Middle Last
+      type Cmm = GenCmm CmmStatic CmmTopInfo CmmGraph
+      type CmmTop = GenCmmTop CmmStatic CmmTopInfo CmmGraph
 
-  type CmmZ = GenCmm CmmStatic CmmInfo (CmmStackInfo, CmmGraph)
-  type CmmStackInfo = (ByteOff, Maybe ByteOff)
-    -- (SP offset on entry, update frame space = SP offset on exit)
-    -- The new codegen produces CmmZ, but once the stack is
-    -- manifested we can drop that in favour of
-    --    GenCmm CmmStatic CmmInfo CmmGraph
+  - CmmTop uses CmmTopInfo, which is a CmmInfoTable and CmmStackInfo
 
-  Inside a CmmProc:
-     - CLabel: used
-     - CmmInfo: partly used by NEW
-     - CmmFormals: not used at all  PERHAPS NOT EVEN BY OLD PIPELINE!
+      data CmmTopInfo = TopInfo {info_tbl :: CmmInfoTable, stack_info :: CmmStackInfo}
 
-* MkZipCfgCmm.hs: smart constructors for ZipCfgCmmRep
-  Depends on (a) MkZipCfg (Cmm-independent)
-             (b) ZipCfgCmmRep (Cmm-specific)
+  - CmmStackInfo
 
--------------
-* SHARED stuff
-  CmmExpr.hs defines the Cmm expression types
-    - CmmExpr, CmmReg, Width, CmmLit, LocalReg, GlobalReg
-    - CmmType, Width etc (saparate module?)
-    - MachOp (separate module?)
-    - Area, AreaId etc (separate module?)
+      data CmmStackInfo = StackInfo {arg_space :: ByteOff, updfr_space :: Maybe ByteOff}
 
-  BlockId.hs defines  BlockId, BlockEnv, BlockSet
+        * arg_space = SP offset on entry
+        * updfr_space = update frame space = SP offset on exit
+       Once the stack is manifested, we could drop CmmStackInfo, i.e. get
+         GenCmm CmmStatic CmmInfoTable CmmGraph, but we do not do that currently.
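A closed/closed graph has no fall-through entry, so the separate entry label (`g_entry` above) is where any traversal must start. The toy model below, which is not Hoopl's `Graph` API, makes this concrete. `ToyGraph`, `tgEntry`, `tgBlocks` and `reversePostorder` are hypothetical names. The sketch keeps only each block's successors and lists the blocks reachable from the entry in reverse postorder, a common iteration order for forward dataflow analyses.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type BlockId = Int

-- Toy stand-in for a closed/closed graph with an entry label.  Every
-- block ends in a control transfer, so only its successors are kept.
data ToyGraph = ToyGraph
  { tgEntry  :: BlockId
  , tgBlocks :: M.Map BlockId [BlockId]
  }

-- Blocks reachable from the entry, in reverse postorder: a block is
-- consed on only after all of its successors have been explored.
-- Blocks unreachable from the entry never appear.
reversePostorder :: ToyGraph -> [BlockId]
reversePostorder g = fst (visit (tgEntry g) ([], S.empty))
  where
    visit b (order, seen)
      | b `S.member` seen = (order, seen)
      | otherwise =
          let succs = M.findWithDefault [] b (tgBlocks g)
              (order', seen') = foldr visit (order, S.insert b seen) succs
          in (b : order', seen')
```

Take a diamond 0 -> {1,2} -> 3, plus a block 4 that only jumps to 3. For this graph the sketch returns `[0,1,2,3]`: each block comes before its successors (except along loop back edges), and block 4 is absent because nothing reaches it from the entry.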
--------------
+* MkGraph.hs: smart constructors for Cmm.hs
+  Beware: the CmmAGraph defined here does not use AGraph from Hoopl,
+  as CmmAGraph can be open or closed at exit; see the notes in that module.
+
+-------------
+* SHARED stuff
+  CmmDecl.hs - GenCmm and GenCmmTop types
+  CmmExpr.hs - defines the Cmm expression types
+             - CmmExpr, CmmReg, CmmLit, LocalReg, GlobalReg
+             - Area, AreaId etc     (separate module?)
+  CmmType.hs - CmmType, Width etc   (separate module?)
+  CmmMachOp.hs - MachOp and CallishMachOp types
+  BlockId.hs defines  BlockId, BlockEnv, BlockSet
 -------------
-* Transactions indicate whether or not the result changes: CmmTx
-    type Tx a = a -> TxRes a
-    data TxRes a = TxRes ChangeFlag a
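The removed CmmTx notes describe transactions that report whether they changed their input. The sketch below shows one way such transactions can be sequenced and iterated to a fixpoint. It rests on these assumptions:

- `ChangeFlag` has the two constructors `NoChange` and `SomeChange`. The notes name only the type.
- `orFlag`, `seqTx`, `untilFixpoint` and `halveEven` are hypothetical helpers, not the module's actual API.

```haskell
-- Assumed constructors; the notes only name the ChangeFlag type.
data ChangeFlag = NoChange | SomeChange
  deriving (Eq, Show)

data TxRes a = TxRes ChangeFlag a
  deriving (Eq, Show)

type Tx a = a -> TxRes a

-- The combined result changed if either part changed.
orFlag :: ChangeFlag -> ChangeFlag -> ChangeFlag
orFlag NoChange NoChange = NoChange
orFlag _        _        = SomeChange

-- Run t1, then t2 on its result, merging the change flags.
seqTx :: Tx a -> Tx a -> Tx a
seqTx t1 t2 x =
  let TxRes f1 y = t1 x
      TxRes f2 z = t2 y
  in TxRes (orFlag f1 f2) z

-- Re-run t until it reports NoChange; the final flag says whether any
-- iteration changed anything.  Terminates only if t reaches a fixpoint.
untilFixpoint :: Tx a -> Tx a
untilFixpoint t = go NoChange
  where
    go acc x = case t x of
      TxRes NoChange   y -> TxRes acc y
      TxRes SomeChange y -> go SomeChange y

-- Hypothetical example transaction: halve a nonzero even number.
halveEven :: Tx Int
halveEven n
  | n /= 0 && even n = TxRes SomeChange (n `div` 2)
  | otherwise        = TxRes NoChange n
```

Here `untilFixpoint halveEven 40` steps through 20 and 10 and yields `TxRes SomeChange 5`. By contrast, `untilFixpoint halveEven 7` yields `TxRes NoChange 7`, because 7 is already a fixpoint.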