compiler/cmm/cmm-notes

   1 Notes on new codegen (Sept 09)\r
   2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r
   3 \r
   4 Things to do:\r
   5 \r
   6 \r
   7  - All dataflow analyses are in the FuelMonad, even though they\r
   8    are guarnteed to consume no fuel.  This seems silly\r
   9 \r
  10  - CmmContFlowOpt.runCmmContFlowOptZs is not called!\r
  11  - Why is runCmmOpts called from HscMain?  Seems too "high up".\r
  12    In fact HscMain calls (runCmmOpts cmmCfgOptsZ) which is what\r
  13    runCmmContFlowOptZs does.  Tidy up!\r
  14 \r
  15 \r
  16  - AsmCodeGen has a generic Cmm optimiser; move this into new pipeline\r
  17 \r
  18  - AsmCodeGen has post-native-cg branch elimiator (shortCutBranches);\r
  19    we ultimately want to share this with the Cmm branch eliminator.\r
  20 \r
  21  - At the moment, references to global registers like Hp are "lowered" \r
  22    late (in AsmCodeGen.fixAssignTop and cmmToCmm). We should do this\r
  23    early, in the new native codegen, much in the way that we lower \r
  24    calling conventions.  Might need to be a bit sophisticated about\r
  25    aliasing.\r
  26 \r
  27  - Refactor Cmm so that it contains only shared stuff\r
  28    Add a module MoribundCmm which contains stuff from\r
  29    Cmm for old code gen path\r
  30 \r
  31  - Question: currently we lift procpoints to become separate\r
  32    CmmProcs.  Do we still want to do this?\r
  33     \r
  34    NB: and advantage of continuing to do this is that\r
  35    we can do common-proc elimination!\r
  36 \r
  37  - Move to new Cmm rep:\r
  38      * Make native CG consume New Cmm; \r
  39      * Convert Old Cmm->New Cmm to keep old path alive\r
  40      * Produce New Cmm when reading in .cmm files\r
  41 \r
  42  - Consider module names\r
  43 \r
  44  - Top-level SRT threading is a bit ugly\r
  45 \r
  46  - Add type/newtype for CmmModule = [CmmGroup]    -- A module\r
  47                         CmmGroup  = [CmmTop]      -- A .o file\r
  48                         CmmTop    = Proc | Data   -- A procedure or data\r
  49 \r
  50  - This is a *change*: currently a CmmGroup is one function's-worth of code\r
  51    regardless of SplitObjs.   Question: can we *always* generate M.o if there\r
  52    is just one element in the list (rather than M/M1.o, M/M2.o etc)\r
  53 \r
  54    One SRT per group.\r
  55 \r
  56  - See "CAFs" below; we want to totally refactor the way SRTs are calculated\r
  57 \r
  58  - Change  \r
  59       type CmmZ = GenCmm CmmStatic CmmInfo (CmmStackInfo, CmmGraph)\r
  60    to\r
  61       type CmmZ = GenCmm CmmStatic (CmmInfo, CmmStackInfo) CmmGraph\r
  62         -- And perhaps take opportunity to prune CmmInfo?\r
  63 \r
  64  - Clarify which fields of CmmInfo are still used\r
  65  - Maybe get rid of CmmFormals arg of CmmProc in all versions?\r
  66 \r
  67  - We aren't sure whether cmmToRawCmm is actively used by the new pipeline; check\r
  68    And what does CmmBuildInfoTables do?!\r
  69 \r
  70  - Nuke CmmZipUtil, move zipPreds into ZipCfg\r
  71 \r
  72  - Pull out Areas into its own module\r
  73    Parameterise AreaMap\r
  74    Add ByteWidth = Int\r
  75    type SubArea    = (Area, ByteOff, ByteWidth) \r
  76    ByteOff should not be defined in SMRep -- that is too high up the hierarchy\r
  77    \r
  78  - SMRep should not be imported by any module in cmm/!  Make it so.\r
  79         -- ByteOff etc   ==>  CmmExpr\r
  80         -- rET_SMALL etc ==> CmmInfo\r
  81    Check that there are no other imports from codeGen in cmm/\r
  82 \r
  83  - Think about a non-flattened representation?\r
  84 \r
  85  - LastCall: \r
  86     * Use record fields for LastCall!\r
  87     * cml_ret_off should be a ByteOff\r
  88     * Split into \r
  89          LastCall (which has a successor) and\r
  90          LastJump (which does not, includes return?)\r
  91            - does not have cml_cont, cml_ret_args, cml_ret_off\r
  92          LastForeignCall \r
  93            - safe! \r
  94            - expands into save/MidForeignCall/restore/goto\r
  95            - like any LastCall, target of the call gets an info table\r
  96 \r
  97  - JD: remind self of what goes wrong if you turn off the \r
  98    liveness of the update frame\r
  99 \r
 100  - Garbage-collect http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/CPS\r
 101    moving good stuff into \r
 102    http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGenPipeline\r
 103 \r
 104 \r
 105  - We believe that all of CmmProcPointZ.addProcPointProtocols is dead.  What\r
 106    goes wrong if we simply never call it?\r
 107 \r
 108  - Something fishy in CmmStackLayout.hs\r
 109    * In particular, 'getAreaSize' returns an AreaMap, but we *know* the width of\r
 110         LocalRegs, so it'd be better to return FiniteMap AreaId ByteWidth\r
 111    * setSuccSPs looks fishy.  Rather than lookin in procPoints, it could\r
 112         just lookup the block in areaSize which, after all, has a binding\r
 113         for precisely successors of calls.  All other blocks (including proc\r
 114         points that are not successors of a call, we think) can be treated\r
 115         uniformly: zero-size Area, and use inSP.\r
 116 \r
 117 \r
 118  - Currently AsmCodeGen top level calls AsmCodeGen.cmmToCmm, which is a small\r
 119    C-- optimiser.  It has quite a lot of boilerplate folding code in AsmCodeGen\r
 120    (cmmBlockConFold, cmmStmtConFold, cmmExprConFold), before calling out to\r
 121    CmmOpt.  ToDo: see what optimisations are being done; and do them before\r
 122    AsmCodeGen.\r
 123 \r
 124  - Modularise the CPS pipeline; instead of ...; A;B;C; ...\r
 125                                 use  ..; ABC; ....\r
 126 \r
 127  - Most of HscMain.tryNewCodeGen does not belong in HscMain.  Instead\r
 128         if new_cg then\r
 129              StgCmm.codeGen\r
 130              processCmm  [including generating "raw" cmm]\r
 131         else\r
 132              CodeGen.codeGen\r
 133              cmmToRawCmm\r
 134 \r
 135 \r
 136  - If we stick CAF and stack liveness info on a LastCall node (not LastRet/Jump)\r
 137    then all CAF and stack liveness stuff be completed before we split\r
 138    into separate C procedures.\r
 139 \r
 140    Short term:\r
 141      compute and attach liveness into to LastCall\r
 142      right at end, split, cvt to old rep\r
 143      [must split before cvt, because old rep is not expressive enough]\r
 144 \r
 145    Longer term: \r
 146      when old rep disappears, \r
 147      move the whole splitting game into the C back end *only*\r
 148          (guided by the procpoint set)\r
 149 \r
 150       \r
 151 ----------------------------------------------------\r
 152         Modules in cmm/\r
 153 ----------------------------------------------------\r
 154 \r
 155 -------- Dead stuff ------------\r
 156 CmmProcPoint        Dead: Michael Adams\r
 157 CmmCPS              Dead: Michael Adams\r
 158 CmmCPSGen.hs        Dead: Michael Adams\r
 159 CmmBrokenBlock.hs   Dead: Michael Adams\r
 160 CmmLive.hs          Dead: Michael Adams\r
 161 CmmProcPoint.hs     Dead: Michael Adams\r
 162 Dataflow.hs         Dead: Michael Adams\r
 163 StackColor.hs       Norman?\r
 164 StackPlacements.hs  Norman?\r
 165 \r
 166 HscMain.optionallyConvertAndOrCPS\r
 167         testCmmConversion\r
 168 DynFlags:  -fconvert-to-zipper-and-back, -frun-cps, -frun-cpsz\r
 169 \r
 170 -------- Moribund stuff ------------\r
 171 CmmCvt.hs      Conversion between old and new Cmm reps\r
 172 CmmOpt.hs      Hopefully-redundant optimiser\r
 173 CmmZipUtil.hs  Only one function; move elsewhere\r
 174 \r
 175 -------- Stuff to keep ------------\r
 176 CmmCPSZ.hs                Driver for new pipeline\r
 177 \r
 178 CmmLiveZ.hs               Liveness analysis, dead code elim\r
 179 CmmProcPointZ.hs          Identifying and splitting out proc-points\r
 180 \r
 181 CmmSpillReload.hs         Save and restore across calls\r
 182 \r
 183 CmmCommonBlockElimZ.hs    Common block elim\r
 184 CmmContFlowOpt.hs         Other optimisations (branch-chain, merging)\r
 185 \r
 186 CmmBuildInfoTables.hs     New info-table \r
 187 CmmStackLayout.hs         and stack layout \r
 188 CmmCallConv.hs\r
 189 CmmInfo.hs                Defn of InfoTables, and conversion to exact layout\r
 190 \r
 191 ---------- Cmm data types --------------\r
 192 ZipCfgCmmRep.hs     Cmm instantiations of dataflow graph framework\r
 193 MkZipCfgCmm.hs      Cmm instantiations of dataflow graph framework\r
 194 \r
 195 Cmm.hs        Key module; a mix of old and new stuff\r
 196                   so needs tidying up in due course\r
 197 CmmExpr.hs\r
 198 CmmUtils.hs\r
 199 CmmLint.hs\r
 200 \r
 201 PprC.hs             Pretty print Cmm in C syntax\r
 202 PprCmm.hs           Pretty printer for Cmm\r
 203 PprCmmZ.hs          Additional stuff for zipper rep\r
 204 \r
 205 CLabel.hs     CLabel\r
 206 \r
 207 ----------  Dataflow modules --------------\r
 208    Goal: separate library; for now, separate directory\r
 209 \r
 210 MkZipCfg.hs\r
 211 ZipCfg.hs\r
 212 ZipCfgExtras.hs\r
 213 ZipDataflow.hs\r
 214 CmmTx.hs              Transactions\r
 215 OptimizationFuel.hs   Fuel\r
 216 BlockId.hs    BlockId, BlockEnv, BlockSet\r
 217 DFMonad.hs            \r
 218 \r
 219 \r
 220 ----------------------------------------------------\r
 221       Top-level structure\r
 222 ----------------------------------------------------\r
 223 \r
 224 * New codgen called in HscMain.hscGenHardCode, by calling HscMain.tryNewCodeGen, \r
 225   enabled by -fnew-codegen (Opt_TryNewCodeGen)\r
 226 \r
 227   THEN it calls CmmInfo.cmmToRawCmm to lay out the details of info tables\r
 228       type Cmm    = GenCmm CmmStatic CmmInfo     (ListGraph CmmStmt)\r
 229       type RawCmm = GenCmm CmmStatic [CmmStatic] (ListGraph CmmStmt)\r
 230 \r
 231 * HscMain.tryNewCodeGen\r
 232     - STG->Cmm:    StgCmm.codeGen (new codegen)\r
 233     - Optimise:    CmmContFlowOpt (simple optimisations, very self contained)\r
 234     - Cps convert: CmmCPSZ.protoCmmCPSZ \r
 235     - Optimise:    CmmContFlowOpt again\r
 236     - Convert:     CmmCvt.cmmOfZgraph (convert to old rep) very self contained\r
 237 \r
 238 * StgCmm.hs  The new STG -> Cmm conversion code generator\r
 239   Lots of modules StgCmmXXX\r
 240 \r
 241 \r
 242 ----------------------------------------------------\r
 243       CmmCPSZ.protoCmmCPSZ   The new pipeline\r
 244 ----------------------------------------------------\r
 245 \r
 246 CmmCPSZprotoCmmCPSZ:\r
 247    1. Do cpsTop for each procedures separately\r
 248    2. Build SRT representation; this spans multiple procedures\r
 249         (unless split-objs)\r
 250 \r
 251 cpsTop:\r
 252   * CmmCommonBlockElimZ.elimCommonBlocks:\r
 253         eliminate common blocks \r
 254 \r
 255   * CmmProcPointZ.minimalProcPointSet\r
 256         identify proc-points\r
 257         no change to graph\r
 258 \r
 259   * CmmProcPointZ.addProcPointProtocols\r
 260         something to do with the MA optimisation\r
 261         probably entirely unnecessary\r
 262 \r
 263   * Spill and reload:\r
 264      - CmmSpillReload.dualLivenessWithInsertion\r
 265        insert spills/reloads across \r
 266            LastCalls, and \r
 267            Branches to proc-points\r
 268      Now sink those reloads:\r
 269      - CmmSpillReload.insertLateReloads\r
 270      - CmmSpillReload.removeDeadAssignmentsAndReloads\r
 271 \r
 272   * CmmStackLayout.stubSlotsOnDeath\r
 273         debug only: zero out dead slots when they die\r
 274 \r
 275   * Stack layout\r
 276      - CmmStackLayout.lifeSlotAnal: \r
 277        find which sub-areas are live on entry to each block\r
 278 \r
 279      - CmmStackLayout.layout\r
 280        Lay out the stack, returning an AreaMap\r
 281          type AreaMap = FiniteMap Area ByteOff\r
 282           -- Byte offset of the oldest byte of the Area, \r
 283           -- relative to the oldest byte of the Old Area\r
 284 \r
 285      - CmmStackLayout.manifestSP\r
 286        Manifest the stack pointer\r
 287 \r
 288    * Split into separate procedures\r
 289       - CmmProcPointZ.procPointAnalysis\r
 290         Given set of proc points, which blocks are reachable from each\r
 291         Claim: too few proc-points => code duplication, but program still works??\r
 292 \r
 293       - CmmProcPointZ.splitAtProcPoints\r
 294         Using this info, split into separate procedures\r
 295 \r
 296       - CmmBuildInfoTables.setInfoTableStackMap\r
 297         Attach stack maps to each info table\r
 298 \r
 299 \r
 300 ----------------------------------------------------\r
 301         Proc-points\r
 302 ----------------------------------------------------\r
 303 \r
 304 Consider this program, which has a diamond control flow, \r
 305 with a call on one branch\r
 306  fn(p,x) {\r
 307         h()\r
 308         if b then { ... f(x) ...; q=5; goto J }\r
 309              else { ...; q=7; goto J }\r
 310      J: ..p...q...\r
 311   }\r
 312 then the join point J is a "proc-point".  So, is 'p' passed to J\r
 313 as a parameter?  Or, if 'p' was saved on the stack anyway, perhaps\r
 314 to keep it alive across the call to h(), maybe 'p' gets communicated\r
 315 to J that way. This is an awkward choice.  (We think that we currently\r
 316 never pass variables to join points via arguments.)\r
 317 \r
 318 Furthermore, there is *no way* to pass q to J in a register (other\r
 319 than a paramter register).\r
 320 \r
 321 What we want is to do register allocation across the whole caboodle.\r
 322 Then we could drop all the code that deals with the above awkward\r
 323 decisions about spilling variables across proc-points.\r
 324 \r
 325 Note that J doesn't need an info table.\r
 326 \r
 327 What we really want is for each LastCall (not LastJump/Ret) \r
 328 to have an info table.   Note that ProcPoints that are not successors\r
 329 of calls don't need an info table.\r
 330 \r
 331 Figuring out proc-points\r
 332 ~~~~~~~~~~~~~~~~~~~~~~~~\r
 333 Proc-points are identified by\r
 334 CmmProcPointZ.minimalProcPointSet/extendPPSet Although there isn't\r
 335 that much code, JD thinks that it could be done much more nicely using\r
 336 a dominator analysis, using the Dataflow Engine.\r
 337 \r
 338 ----------------------------------------------------\r
 339                 CAFs\r
 340 ----------------------------------------------------\r
 341 \r
 342 * The code for a procedure f may refer to either the *closure* \r
 343   or the *entry point* of another top-level procedure g.  \r
 344   If f is live, then so is g.  f's SRT must include g's closure.\r
 345 \r
 346 * The CLabel for the entry-point/closure reveals whether g is \r
 347   a CAF (or refers to CAFs).  See the IdLabel constructor of CLabel.\r
 348 \r
 349 * The CAF-ness of the original top-level defininions is figured out\r
 350   (by TidyPgm) before we generate C--.  This CafInfo is only set for\r
 351   top-level Ids; nested bindings stay with MayHaveCafRefs.\r
 352 \r
 353 * Currently an SRT contains (only) pointers to (top-level) closures.\r
 354 \r
 355 * Consider this Core code\r
 356         f = \x -> let g = \y -> ...x...y...h1...\r
 357                   in ...h2...g...\r
 358   and suppose that h1, h2 have IdInfo of MayHaveCafRefs.\r
 359   Therefore, so will f,  But g will not (since it's nested).\r
 360 \r
 361   This generates C-- roughly like this:\r
 362      f_closure: .word f_entry\r
 363      f_entry() [info-tbl-for-f] { ...jump g_entry...jump h2... }\r
 364      g_entry() [info-tbl-for-g] { ...jump h1... }\r
 365 \r
 366   Note that there is no top-level closure for g (only an info table).\r
 367   This fact (whether or not there is a top-level closure) is recorded\r
 368   in the InfoTable attached to the CmmProc for f, g\r
 369   INVARIANT: \r
 370      Any out-of-Group references to an IdLabel goes to\r
 371      a Proc whose InfoTable says "I have a top-level closure".\r
 372   Equivalently: \r
 373      A CmmProc whose InfoTable says "I do not have a top-level\r
 374      closure" is referred to only from its own Group.\r
 375 \r
 376 * So:   info-tbl-for-f must have an SRT that keeps h1,h2 alive\r
 377         info-tbl-for-g must have an SRT that keeps h1 (only) alive\r
 378 \r
 379   But if we just look for the free CAF refs, we get:\r
 380         f   h2 (only)\r
 381         g   h1\r
 382 \r
 383   So we need to do a transitive closure thing to flesh out \r
 384   f's keep-alive refs to include h1.\r
 385 \r
 386 * The SRT info is the C_SRT field of Cmm.ClosureTypeInfo in a\r
 387   CmmInfoTable attached to each CmmProc.  CmmCPSZ.toTops actually does\r
 388   the attaching, right at the end of the pipeline.  The C_SRT part\r
 389   gives offsets within a single, shared table of closure pointers.\r
 390 \r
 391 * DECIDED: we can generate SRTs based on the final Cmm program\r
 392   without knowledge of how it is generated.\r
 393 \r
 394 ----------------------------------------------------\r
 395                 Foreign calls\r
 396 ----------------------------------------------------\r
 397 \r
 398 See Note [Foreign calls] in ZipCfgCmmRep!  This explains that a safe\r
 399 foreign call must do this:\r
 400   save thread state\r
 401   push info table (on thread stack) to describe frame\r
 402   make call (via C stack)\r
 403   pop info table\r
 404   restore thread state\r
 405 and explains why this expansion must be done late in the day.\r
 406 \r
 407 Hence, \r
 408   - Every foreign call is represented as a middle node\r
 409 \r
 410   - *Unsafe* foreign calls are simply "fat machine instructions"\r
 411       and are passed along to the native code generator\r
 412 \r
 413   - *Safe* foreign calls are "lowered" to unsafe calls by wrapping\r
 414       them in the above save/restore sequence. This step is done\r
 415       very late in the pipeline, just before handing to the native\r
 416       code gen.   \r
 417   \r
 418       This lowering is done by BuildInfoTables.lowerSafeForeignCalls\r
 419 \r
 420 \r
 421 NEW PLAN for foreign calls:\r
 422   - Unsafe foreign calls remain as a middle node (fat machine instruction)\r
 423     Even the parameter passing is not lowered (just as machine instrs\r
 424     get arguments).\r
 425 \r
 426   - Initially, safe foreign calls appear as LastCalls with \r
 427         \r
 428 \r
 429 ----------------------------------------------------\r
 430                 Cmm representations\r
 431 ----------------------------------------------------\r
 432 \r
 433 * Cmm.hs\r
 434      The type [GenCmm d h g] represents a whole module, \r
 435         ** one list element per .o file **\r
 436         Without SplitObjs, the list has exactly one element\r
 437 \r
 438      newtype GenCmm d h g = Cmm [GenCmmTop d h g]  -- A whole .o file\r
 439      data GenCmmTop d h g\r
 440          = CmmProc h g           -- One procedure, graph d\r
 441          | CmmData <stuff> [d]   -- Initialised data, items d\r
 442 \r
 443   Old and new piplines use different representations\r
 444         (CmmCvt.hs converts between the two)\r
 445 \r
 446 \r
 447 -------------\r
 448 OLD BACK END representations (Cmm.hs):  \r
 449       type Cmm = GenCmm CmmStatic CmmInfo (ListGraph CmmStmt)\r
 450                                 -- A whole module\r
 451       newtype ListGraph i = ListGraph [GenBasicBlock i]\r
 452 \r
 453       data CmmStmt = Assign | Store | Return etc -- OLD BACK END ONLY\r
 454 \r
 455 \r
 456    Once the info tables are laid out, we replace CmmInfo with [CmmStatic]\r
 457       type RawCmm    = GenCmm CmmStatic [CmmStatic] (ListGraph CmmStmt)\r
 458    which represents the info tables as data, that should \r
 459    immediately precede the code\r
 460   \r
 461 -------------\r
 462 NEW BACK END representations \r
 463 * Not Cmm-specific at all\r
 464     ZipCfg.hs defines  Graph, LGraph, FGraph,\r
 465                        ZHead, ZTail, ZBlock ...\r
 466 \r
 467               classes  LastNode, HavingSuccessors\r
 468 \r
 469     MkZipCfg.hs: AGraph: building graphs\r
 470 \r
 471 * ZipCfgCmmRep: instantiates ZipCfg for Cmm\r
 472       data Middle = ...CmmExpr...\r
 473       data Last = ...CmmExpr...\r
 474       type CmmGraph = Graph Middle Last\r
 475 \r
 476       type CmmZ = GenCmm CmmStatic CmmInfo (CmmStackInfo, CmmGraph)\r
 477       type CmmStackInfo = (ByteOff, Maybe ByteOff)\r
 478                 -- (SP offset on entry, update frame space = SP offset on exit)\r
 479                 -- The new codegen produces CmmZ, but once the stack is \r
 480                 -- manifested we can drop that in favour of \r
 481                 --    GenCmm CmmStatic CmmInfo CmmGraph\r
 482 \r
 483       Inside a CmmProc:\r
 484            - CLabel: used\r
 485            - CmmInfo: partly used by NEW\r
 486            - CmmFormals: not used at all  PERHAPS NOT EVEN BY OLD PIPELINE!\r
 487 \r
 488 * MkZipCfgCmm.hs: smart constructors for ZipCfgCmmRep\r
 489    Depends on (a) MkZipCfg (Cmm-independent)\r
 490               (b) ZipCfgCmmRep (Cmm-specific)\r
 491 \r
 492 -------------\r
 493 * SHARED stuff\r
 494   CmmExpr.hs defines the Cmm expression types\r
 495         - CmmExpr, CmmReg, Width, CmmLit, LocalReg, GlobalReg\r
 496         - CmmType, Width etc   (saparate module?)\r
 497         - MachOp               (separate module?)\r
 498         - Area, AreaId etc     (separate module?)\r
 499 \r
 500   BlockId.hs defines  BlockId, BlockEnv, BlockSet\r
 501 \r
 502 -------------\r
 503 \r
 504 \r
 505 -------------\r
 506 * Transactions indicate whether or not the result changes: CmmTx \r
 507      type Tx a = a -> TxRes a\r
 508      data TxRes a = TxRes ChangeFlag a\r