X-Git-Url: http://git.megacz.com/?p=ghc-hetmet.git;a=blobdiff_plain;f=compiler%2Fcmm%2Fcmm-notes;h=4a87911ec544ea5d076f73ea86a61751f50e74c0;hp=ee5162476d1bfabbdd72730b8459aa52fce393c4;hb=de2d10e18ce23e5df7fa4f3433b85c95d6092b58;hpb=fc9bd6476e648e719787df2c3656413369a7cc30 diff --git a/compiler/cmm/cmm-notes b/compiler/cmm/cmm-notes index ee51624..4a87911 100644 --- a/compiler/cmm/cmm-notes +++ b/compiler/cmm/cmm-notes @@ -1,7 +1,141 @@ -Notes on new codegen (Sept 09) +More notes (June 11) +~~~~~~~~~~~~~~~~~~~~ +* Possible refactoring: Nuke AGraph in favour of + mkIfThenElse :: Expr -> Graph -> Graph -> FCode Graph + or even + mkIfThenElse :: HasUniques m => Expr -> Graph -> Graph -> m Graph + (Remmber that the .cmm file parser must use this function) + + or parameterise FCode over its envt; the CgState part seem useful for both + +* "Remove redundant reloads" in CmmSpillReload should be redundant; since + insertLateReloads is now gone, every reload is reloading a live variable. + Test and nuke. + +* Sink and inline S(RegSlot(x)) = e in precisely the same way that we + sink and inline x = e + +* Stack layout is very like register assignment: find non-conflicting assigments. + In particular we can use colouring or linear scan (etc). + + We'd fine-grain interference (on a word by word basis) to get maximum overlap. + But that may make very big interference graphs. So linear scan might be + more attactive. + + NB: linear scan does on-the-fly live range splitting. + +* When stubbing dead slots be careful not to write into an area that + overlaps with an area that's in use. So stubbing needs to *follow* + stack layout. + + +More notes (May 11) +~~~~~~~~~~~~~~~~~~~ +In CmmNode, consider spliting CmmCall into two: call and jump + +Notes on new codegen (Aug 10) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Things to do: + - We insert spills for variables before the stack check! This is the reason for + some fishy code in StgCmmHeap.entryHeapCheck where we are doing some strange + things to fix up the stack pointer before GC calls/jumps. + + The reason spills are inserted before the sp check is that at the entry to a + function we always store the parameters passed in registers to local variables. + The spill pass simply inserts spills at variable definitions. We instead should + sink the spills so that we can avoid spilling them on branches that never + reload them. + + This will fix the spill before stack check problem but only really as a side + effect. A 'real fix' probably requires making the spiller know about sp checks. + + EZY: I don't understand this comment. David Terei, can you clarify? + + - Proc points pass all arguments on the stack, adding more code and + slowing down things a lot. We either need to fix this or even better + would be to get rid of proc points. + + - CmmInfo.cmmToRawCmm uses Old.Cmm, so it is called after converting Cmm.Cmm to + Old.Cmm. We should abstract it to work on both representations, it needs only to + convert a CmmInfoTable to [CmmStatic]. + + - The MkGraph currenty uses a different semantics for <*> than Hoopl. Maybe + we could convert codeGen/StgCmm* clients to the Hoopl's semantics? + It's all deeply unsatisfactory. + + - Improve performance of Hoopl. + + A nofib comparison of -fasm vs -fnewcodegen nofib compilation parameters + (using the same ghc-cmm branch +libraries compiled by the old codegenerator) + is at http://fox.auryn.cz/msrc/0517_hoopl/32bit.oldghcoldgen.oldghchoopl.txt + - the code produced is 10.9% slower, the compilation is +118% slower! + + The same comparison with ghc-head with zip representation is at + http://fox.auryn.cz/msrc/0517_hoopl/32bit.oldghcoldgen.oldghczip.txt + - the code produced is 11.7% slower, the compilation is +78% slower. + + When compiling nofib, ghc-cmm + libraries compiled with -fnew-codegen + is 23.7% slower (http://fox.auryn.cz/msrc/0517_hoopl/32bit.oldghcoldgen.hooplghcoldgen.txt). + When compiling nofib, ghc-head + libraries compiled with -fnew-codegen + is 31.4% slower (http://fox.auryn.cz/msrc/0517_hoopl/32bit.oldghcoldgen.zipghcoldgen.txt). + + So we generate a bit better code, but it takes us longer! + + EZY: Also importantly, Hoopl uses dramatically more memory than the + old code generator. + + - Are all blockToNodeList and blockOfNodeList really needed? Maybe we could + splice blocks instead? + + In the CmmContFlowOpt.blockConcat, using Dataflow seems too clumsy. Still, + a block catenation function would be probably nicer than blockToNodeList + / blockOfNodeList combo. + + - lowerSafeForeignCall seems too lowlevel. Just use Dataflow. After that + delete splitEntrySeq from HooplUtils. + + - manifestSP seems to touch a lot of the graph representation. It is + also slow for CmmSwitch nodes O(block_nodes * switch_statements). + Maybe rewrite manifestSP to use Dataflow? + + - Sort out Label, LabelMap, LabelSet versus BlockId, BlockEnv, BlockSet + dichotomy. Mostly this means global replace, but we also need to make + Label an instance of Outputable (probably in the Outputable module). + + - NB that CmmProcPoint line 283 has a hack that works around a GADT-related + bug in 6.10. + + - SDM (2010-02-26) can we remove the Foreign constructor from Convention? + Reason: we never generate code for a function with the Foreign + calling convention, and the code for calling foreign calls is generated + + - AsmCodeGen has a generic Cmm optimiser; move this into new pipeline + EZY (2011-04-16): The mini-inliner has been generalized and ported, + but the constant folding and other optimizations need to still be + ported. + + - AsmCodeGen has post-native-cg branch eliminator (shortCutBranches); + we ultimately want to share this with the Cmm branch eliminator. + + - At the moment, references to global registers like Hp are "lowered" + late (in CgUtils.fixStgRegisters). We should do this early, in the + new native codegen, much in the way that we lower calling conventions. + Might need to be a bit sophisticated about aliasing. + + - Question: currently we lift procpoints to become separate + CmmProcs. Do we still want to do this? + + NB: and advantage of continuing to do this is that + we can do common-proc elimination! + + - Move to new Cmm rep: + * Make native CG consume New Cmm; + * Convert Old Cmm->New Cmm to keep old path alive + * Produce New Cmm when reading in .cmm files + + - Consider module names + - Top-level SRT threading is a bit ugly - Add type/newtype for CmmModule = [CmmGroup] -- A module @@ -12,26 +146,24 @@ Things to do: regardless of SplitObjs. Question: can we *always* generate M.o if there is just one element in the list (rather than M/M1.o, M/M2.o etc) - - Change - type CmmZ = GenCmm CmmStatic CmmInfo (CmmStackInfo, CmmGraph) - to - type CmmZ = GenCmm CmmStatic (CmmInfo, CmmStackInfo) CmmGraph - -- And perhaps take opportunity to prune CmmInfo? - - - Clarify which fields of CmmInfo are still used - - Maybe get rid of CmmFormals arg of CmmProc in all versions? + One SRT per group. - - We aren't sure whether cmmToRawCmm is actively used by the new pipeline; check - And what does CmmBuildInfoTables do?! - - - Nuke CmmZipUtil, move zipPreds into ZipCfg + - See "CAFs" below; we want to totally refactor the way SRTs are calculated - Pull out Areas into its own module - Parameterise AreaMap + Parameterise AreaMap (note there are type synonyms in CmmStackLayout!) Add ByteWidth = Int type SubArea = (Area, ByteOff, ByteWidth) ByteOff should not be defined in SMRep -- that is too high up the hierarchy + - SMRep should not be imported by any module in cmm/! Make it so. + -- ByteOff etc ==> CmmExpr + -- rET_SMALL etc ==> CmmInfo + Check that there are no other imports from codeGen in cmm/ + + - If you eliminate a label by branch chain elimination, + what happens if there's an Area associated with that label? + - Think about a non-flattened representation? - LastCall: @@ -54,7 +186,7 @@ Things to do: http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGenPipeline - - We believe that all of CmmProcPointZ.addProcPointProtocols is dead. What + - We believe that all of CmmProcPoint.addProcPointProtocols is dead. What goes wrong if we simply never call it? - Something fishy in CmmStackLayout.hs @@ -66,48 +198,90 @@ Things to do: points that are not successors of a call, we think) can be treated uniformly: zero-size Area, and use inSP. -Dead files -~~~~~~~~~~ -CmmProcPoint (Michael Adams) -CmmCPS (ditto) + + - Currently AsmCodeGen top level calls AsmCodeGen.cmmToCmm, which is a small + C-- optimiser. It has quite a lot of boilerplate folding code in AsmCodeGen + (cmmBlockConFold, cmmStmtConFold, cmmExprConFold), before calling out to + CmmOpt. ToDo: see what optimisations are being done; and do them before + AsmCodeGen. + + - Modularise the CPS pipeline; instead of ...; A;B;C; ... + use ..; ABC; .... + + - Most of HscMain.tryNewCodeGen does not belong in HscMain. Instead + if new_cg then + StgCmm.codeGen + processCmm [including generating "raw" cmm] + else + CodeGen.codeGen + cmmToRawCmm + + + - If we stick CAF and stack liveness info on a LastCall node (not LastRet/Jump) + then all CAF and stack liveness stuff be completed before we split + into separate C procedures. + + Short term: + compute and attach liveness into to LastCall + right at end, split, cvt to old rep + [must split before cvt, because old rep is not expressive enough] + + Longer term: + when old rep disappears, + move the whole splitting game into the C back end *only* + (guided by the procpoint set) + +---------------------------------------------------- + Modules in cmm/ +---------------------------------------------------- + +-------- Testing stuff ------------ HscMain.optionallyConvertAndOrCPS testCmmConversion -DynFlags: -fconvert-to-zipper-and-back, -frun-cps, -frun-cpsz +DynFlags: -fconvert-to-zipper-and-back, -frun-cpsz -Proc-points -~~~~~~~~~~~~ -Consider this program, which has a diamond control flow, -with a call on one branch - fn(p,x) { - h() - if b then { ... f(x) ...; q=5; goto J } - else { ...; q=7; goto J } - J: ..p...q... - } -then the join point J is a "proc-point". So, is 'p' passed to J -as a parameter? Or, if 'p' was saved on the stack anyway, perhaps -to keep it alive across the call to h(), maybe 'p' gets communicated -to J that way. This is an awkward choice. (We think that we currently -never pass variables to join points via arguments.) +-------- Moribund stuff ------------ +OldCmm.hs Definition of flowgraph of old representation +OldCmmUtil.hs Utilites that operates mostly on on CmmStmt +OldPprCmm.hs Pretty print for CmmStmt, GenBasicBlock and ListGraph +CmmCvt.hs Conversion between old and new Cmm reps +CmmOpt.hs Hopefully-redundant optimiser -Furthermore, there is *no way* to pass q to J in a register (other -than a paramter register). +-------- Stuff to keep ------------ +CmmPipeline.hs Driver for new pipeline -What we want is to do register allocation across the whole caboodle. -Then we could drop all the code that deals with the above awkward -decisions about spilling variables across proc-points. +CmmLive.hs Liveness analysis, dead code elim +CmmProcPoint.hs Identifying and splitting out proc-points -Note that J doesn't need an info table. +CmmSpillReload.hs Save and restore across calls -What we really want is for each Block to have an optional info table. -To do that, we need to be polymorphic over first nodes. +CmmCommonBlockElim.hs Common block elim +CmmContFlowOpt.hs Other optimisations (branch-chain, merging) -Figuring out proc-points -~~~~~~~~~~~~~~~~~~~~~~~~ -Proc-points are identified by -CmmProcPointZ.minimalProcPointSet/extendPPSet Although there isn't -that much code, JD thinks that it could be done much more nicely using -a dominator analysis, using the Dataflow Engine. +CmmBuildInfoTables.hs New info-table +CmmStackLayout.hs and stack layout +CmmCallConv.hs +CmmInfo.hs Defn of InfoTables, and conversion to exact byte layout + +---------- Cmm data types -------------- +Cmm.hs Cmm instantiations of dataflow graph framework +MkGraph.hs Interface for building Cmm for codeGen/Stg*.hs modules + +CmmDecl.hs Shared Cmm types of both representations +CmmExpr.hs Type of Cmm expression +CmmType.hs Type of Cmm types and their widths +CmmMachOp.hs MachOp type and accompanying utilities + +CmmUtils.hs +CmmLint.hs + +PprC.hs Pretty print Cmm in C syntax +PprCmm.hs Pretty printer for CmmGraph. +PprCmmDecl.hs Pretty printer for common Cmm types. +PprCmmExpr.hs Pretty printer for Cmm expressions. + +CLabel.hs CLabel +BlockId.hs BlockId, BlockEnv, BlockSet ---------------------------------------------------- Top-level structure @@ -121,44 +295,44 @@ a dominator analysis, using the Dataflow Engine. type RawCmm = GenCmm CmmStatic [CmmStatic] (ListGraph CmmStmt) * HscMain.tryNewCodeGen - - STG->Cmm: StgCmm.codeGen (new codegen) - - Optimise: CmmContFlowOpt (simple optimisations, very self contained) - - Cps convert: CmmCPSZ.protoCmmCPSZ - - Optimise: CmmContFlowOpt again - - Convert: CmmCvt.cmmOfZgraph (convert to old rep) very self contained + - STG->Cmm: StgCmm.codeGen (new codegen) + - Optimize and CPS: CmmPipeline.cmmPipeline + - Convert: CmmCvt.cmmOfZgraph (convert to old rep) very self contained * StgCmm.hs The new STG -> Cmm conversion code generator Lots of modules StgCmmXXX ---------------------------------------------------- - CmmCPSZ.protoCmmCPSZ The new pipeline + CmmPipeline.cmmPipeline The new pipeline ---------------------------------------------------- -CmmCPSZprotoCmmCPSZ: - 1. Do cpsTop for each procedures separately - 2. Build SRT representation; this spans multiple procedures - (unless split-objs) +CmmPipeline.cmmPipeline: + 1. Do control flow optimization + 2. Do cpsTop for each procedures separately + 3. Build SRT representation; this spans multiple procedures + (unless split-objs) + 4. Do control flow optimization on all resulting procedures cpsTop: - * CmmCommonBlockElimZ.elimCommonBlocks: + * CmmCommonBlockElim.elimCommonBlocks: eliminate common blocks - * CmmProcPointZ.minimalProcPointSet + * CmmProcPoint.minimalProcPointSet identify proc-points + no change to graph - * CmmProcPointZ.addProcPointProtocols + * CmmProcPoint.addProcPointProtocols something to do with the MA optimisation probably entirely unnecessary - * Spill and reload: - CmmSpillReload.dualLivenessWithInsertion insert spills/reloads across LastCalls, and Branches to proc-points - Now sink those reloads: - - CmmSpillReload.insertLateReloads + Now sink those reloads (and other instructions): + - CmmSpillReload.rewriteAssignments - CmmSpillReload.removeDeadAssignmentsAndReloads * CmmStackLayout.stubSlotsOnDeath @@ -170,7 +344,7 @@ cpsTop: - CmmStackLayout.layout Lay out the stack, returning an AreaMap - type AreaMap = FiniteMap Area ByteOff + type AreaMap = FiniteMap Area ByteOff -- Byte offset of the oldest byte of the Area, -- relative to the oldest byte of the Old Area @@ -178,12 +352,55 @@ cpsTop: Manifest the stack pointer * Split into separate procedures - - CmmProcPointZ.procPointAnalysis + - CmmProcPoint.procPointAnalysis Given set of proc points, which blocks are reachable from each + Claim: too few proc-points => code duplication, but program still works?? - - CmmProcPointZ.splitAtProcPoints + - CmmProcPoint.splitAtProcPoints Using this info, split into separate procedures + - CmmBuildInfoTables.setInfoTableStackMap + Attach stack maps to each info table + + +---------------------------------------------------- + Proc-points +---------------------------------------------------- + +Consider this program, which has a diamond control flow, +with a call on one branch + fn(p,x) { + h() + if b then { ... f(x) ...; q=5; goto J } + else { ...; q=7; goto J } + J: ..p...q... + } +then the join point J is a "proc-point". So, is 'p' passed to J +as a parameter? Or, if 'p' was saved on the stack anyway, perhaps +to keep it alive across the call to h(), maybe 'p' gets communicated +to J that way. This is an awkward choice. (We think that we currently +never pass variables to join points via arguments.) + +Furthermore, there is *no way* to pass q to J in a register (other +than a parameter register). + +What we want is to do register allocation across the whole caboodle. +Then we could drop all the code that deals with the above awkward +decisions about spilling variables across proc-points. + +Note that J doesn't need an info table. + +What we really want is for each LastCall (not LastJump/Ret) +to have an info table. Note that ProcPoints that are not successors +of calls don't need an info table. + +Figuring out proc-points +~~~~~~~~~~~~~~~~~~~~~~~~ +Proc-points are identified by +CmmProcPoint.minimalProcPointSet/extendPPSet Although there isn't +that much code, JD thinks that it could be done much more nicely using +a dominator analysis, using the Dataflow Engine. + ---------------------------------------------------- CAFs ---------------------------------------------------- @@ -193,11 +410,11 @@ cpsTop: If f is live, then so is g. f's SRT must include g's closure. * The CLabel for the entry-point/closure reveals whether g is - a CAF (or refers to CAFs). See the IdLabell constructor of CLabel. + a CAF (or refers to CAFs). See the IdLabel constructor of CLabel. * The CAF-ness of the original top-level defininions is figured out (by TidyPgm) before we generate C--. This CafInfo is only set for - top-level Ids; nested bindings stay with NoCafRefs. + top-level Ids; nested bindings stay with MayHaveCafRefs. * Currently an SRT contains (only) pointers to (top-level) closures. @@ -210,10 +427,19 @@ cpsTop: This generates C-- roughly like this: f_closure: .word f_entry f_entry() [info-tbl-for-f] { ...jump g_entry...jump h2... } - g_entry() [info-tbl-for-g] { ...jump h1 } + g_entry() [info-tbl-for-g] { ...jump h1... } Note that there is no top-level closure for g (only an info table). - So: info-tbl-for-f must have an SRT that keeps h1,h2 alive + This fact (whether or not there is a top-level closure) is recorded + in the InfoTable attached to the CmmProc for f, g + INVARIANT: + Any out-of-Group references to an IdLabel goes to + a Proc whose InfoTable says "I have a top-level closure". + Equivalently: + A CmmProc whose InfoTable says "I do not have a top-level + closure" is referred to only from its own Group. + +* So: info-tbl-for-f must have an SRT that keeps h1,h2 alive info-tbl-for-g must have an SRT that keeps h1 (only) alive But if we just look for the free CAF refs, we get: @@ -224,15 +450,18 @@ cpsTop: f's keep-alive refs to include h1. * The SRT info is the C_SRT field of Cmm.ClosureTypeInfo in a - CmmInfoTable attached to each CmmProc. CmmCPSZ.toTops actually does + CmmInfoTable attached to each CmmProc. CmmPipeline.toTops actually does the attaching, right at the end of the pipeline. The C_SRT part gives offsets within a single, shared table of closure pointers. +* DECIDED: we can generate SRTs based on the final Cmm program + without knowledge of how it is generated. + ---------------------------------------------------- Foreign calls ---------------------------------------------------- -See Note [Foreign calls] in ZipCfgCmmRep! This explains that a safe +See Note [Foreign calls] in CmmNode! This explains that a safe foreign call must do this: save thread state push info table (on thread stack) to describe frame @@ -267,7 +496,7 @@ NEW PLAN for foreign calls: Cmm representations ---------------------------------------------------- -* Cmm.hs +* CmmDecl.hs The type [GenCmm d h g] represents a whole module, ** one list element per .o file ** Without SplitObjs, the list has exactly one element @@ -282,7 +511,7 @@ NEW PLAN for foreign calls: ------------- -OLD BACK END representations (Cmm.hs): +OLD BACK END representations (OldCmm.hs): type Cmm = GenCmm CmmStatic CmmInfo (ListGraph CmmStmt) -- A whole module newtype ListGraph i = ListGraph [GenBasicBlock i] @@ -297,49 +526,47 @@ OLD BACK END representations (Cmm.hs): ------------- NEW BACK END representations -* Not Cmm-specific at all - ZipCfg.hs defines Graph, LGraph, FGraph, - ZHead, ZTail, ZBlock ... +* Uses Hoopl library, a zero-boot package +* CmmNode defines a node of a flow graph. +* Cmm defines CmmGraph, CmmTop, Cmm + - CmmGraph is a closed/closed graph + an entry node. - classes LastNode, HavingSuccessors + data CmmGraph = CmmGraph { g_entry :: BlockId + , g_graph :: Graph CmmNode C C } - MkZipCfg.hs: AGraph: building graphs + - CmmTop is a top level chunk, specialization of GenCmmTop from CmmDecl.hs + with CmmGraph as a flow graph. + - Cmm is a collection of CmmTops. -* ZipCfgCmmRep: instantiates ZipCfg for Cmm - data Middle = ...CmmExpr... - data Last = ...CmmExpr... - type CmmGraph = Graph Middle Last + type Cmm = GenCmm CmmStatic CmmTopInfo CmmGraph + type CmmTop = GenCmmTop CmmStatic CmmTopInfo CmmGraph - type CmmZ = GenCmm CmmStatic CmmInfo (CmmStackInfo, CmmGraph) - type CmmStackInfo = (ByteOff, Maybe ByteOff) - -- (SP offset on entry, update frame space = SP offset on exit) - -- The new codegen produces CmmZ, but once the stack is - -- manifested we can drop that in favour of - -- GenCmm CmmStatic CmmInfo CmmGraph + - CmmTop uses CmmTopInfo, which is a CmmInfoTable and CmmStackInfo - Inside a CmmProc: - - CLabel: used - - CmmInfo: partly used by NEW - - CmmFormals: not used at all PERHAPS NOT EVEN BY OLD PIPELINE! + data CmmTopInfo = TopInfo {info_tbl :: CmmInfoTable, stack_info :: CmmStackInfo} -* MkZipCfgCmm.hs: smart constructors for ZipCfgCmmRep - Depends on (a) MkZipCfg (Cmm-independent) - (b) ZipCfgCmmRep (Cmm-specific) + - CmmStackInfo -------------- -* SHARED stuff - CmmExpr.hs defines the Cmm expression types - - CmmExpr, CmmReg, Width, CmmLit, LocalReg, GlobalReg - - CmmType, Width etc (saparate module?) - - MachOp (separate module?) - - Area, AreaId etc (separate module?) + data CmmStackInfo = StackInfo {arg_space :: ByteOff, updfr_space :: Maybe ByteOff} - BlockId.hs defines BlockId, BlockEnv, BlockSet + * arg_space = SP offset on entry + * updfr_space space = SP offset on exit + Once the staci is manifested, we could drom CmmStackInfo, ie. get + GenCmm CmmStatic CmmInfoTable CmmGraph, but we do not do that currently. -------------- +* MkGraph.hs: smart constructors for Cmm.hs + Beware, the CmmAGraph defined here does not use AGraph from Hoopl, + as CmmAGraph can be opened or closed at exit, See the notes in that module. ------------- -* Transactions indicate whether or not the result changes: CmmTx - type Tx a = a -> TxRes a - data TxRes a = TxRes ChangeFlag a +* SHARED stuff + CmmDecl.hs - GenCmm and GenCmmTop types + CmmExpr.hs - defines the Cmm expression types + - CmmExpr, CmmReg, CmmLit, LocalReg, GlobalReg + - Area, AreaId etc (separate module?) + CmmType.hs - CmmType, Width etc (saparate module?) + CmmMachOp.hs - MachOp and CallishMachOp types + + BlockId.hs defines BlockId, BlockEnv, BlockSet +-------------