compiler/cmm/cmm-notes

   1 Notes on new codegen (Sept 09)\r
   2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r
   3 \r
   4 Things to do:\r
   5 \r
   6  - SDM (2010-02-26) can we remove the Foreign constructor from Convention?\r
   7    Reason: we never generate code for a function with the Foreign\r
   8    calling convention, and the code for calling foreign calls is generated\r
   9 \r
  10  - All dataflow analyses are in the FuelMonad, even though they\r
  11    are guarnteed to consume no fuel.  This seems silly\r
  12 \r
  13  - CmmContFlowOpt.runCmmContFlowOptZs is not called!\r
  14  - Why is runCmmOpts called from HscMain?  Seems too "high up".\r
  15    In fact HscMain calls (runCmmOpts cmmCfgOptsZ) which is what\r
  16    runCmmContFlowOptZs does.  Tidy up!\r
  17 \r
  18 \r
  19  - AsmCodeGen has a generic Cmm optimiser; move this into new pipeline\r
  20 \r
  21  - AsmCodeGen has post-native-cg branch elimiator (shortCutBranches);\r
  22    we ultimately want to share this with the Cmm branch eliminator.\r
  23 \r
  24  - At the moment, references to global registers like Hp are "lowered" \r
  25    late (in AsmCodeGen.fixAssignTop and cmmToCmm). We should do this\r
  26    early, in the new native codegen, much in the way that we lower \r
  27    calling conventions.  Might need to be a bit sophisticated about\r
  28    aliasing.\r
  29 \r
  30  - Refactor Cmm so that it contains only shared stuff\r
  31    Add a module MoribundCmm which contains stuff from\r
  32    Cmm for old code gen path\r
  33 \r
  34  - Question: currently we lift procpoints to become separate\r
  35    CmmProcs.  Do we still want to do this?\r
  36     \r
  37    NB: and advantage of continuing to do this is that\r
  38    we can do common-proc elimination!\r
  39 \r
  40  - Move to new Cmm rep:\r
  41      * Make native CG consume New Cmm; \r
  42      * Convert Old Cmm->New Cmm to keep old path alive\r
  43      * Produce New Cmm when reading in .cmm files\r
  44 \r
  45  - Consider module names\r
  46 \r
  47  - Top-level SRT threading is a bit ugly\r
  48 \r
  49  - Add type/newtype for CmmModule = [CmmGroup]    -- A module\r
  50                         CmmGroup  = [CmmTop]      -- A .o file\r
  51                         CmmTop    = Proc | Data   -- A procedure or data\r
  52 \r
  53  - This is a *change*: currently a CmmGroup is one function's-worth of code\r
  54    regardless of SplitObjs.   Question: can we *always* generate M.o if there\r
  55    is just one element in the list (rather than M/M1.o, M/M2.o etc)\r
  56 \r
  57    One SRT per group.\r
  58 \r
  59  - See "CAFs" below; we want to totally refactor the way SRTs are calculated\r
  60 \r
  61  - Change  \r
  62       type CmmZ = GenCmm CmmStatic CmmInfo (CmmStackInfo, CmmGraph)\r
  63    to\r
  64       type CmmZ = GenCmm CmmStatic (CmmInfo, CmmStackInfo) CmmGraph\r
  65         -- And perhaps take opportunity to prune CmmInfo?\r
  66 \r
  67  - Clarify which fields of CmmInfo are still used\r
  68  - Maybe get rid of CmmFormals arg of CmmProc in all versions?\r
  69 \r
  70  - We aren't sure whether cmmToRawCmm is actively used by the new pipeline; check\r
  71    And what does CmmBuildInfoTables do?!\r
  72 \r
  73  - Nuke CmmZipUtil, move zipPreds into ZipCfg\r
  74 \r
  75  - Pull out Areas into its own module\r
  76    Parameterise AreaMap\r
  77    Add ByteWidth = Int\r
  78    type SubArea    = (Area, ByteOff, ByteWidth) \r
  79    ByteOff should not be defined in SMRep -- that is too high up the hierarchy\r
  80    \r
  81  - SMRep should not be imported by any module in cmm/!  Make it so.\r
  82         -- ByteOff etc   ==>  CmmExpr\r
  83         -- rET_SMALL etc ==> CmmInfo\r
  84    Check that there are no other imports from codeGen in cmm/\r
  85 \r
  86  - Think about a non-flattened representation?\r
  87 \r
  88  - LastCall: \r
  89     * Use record fields for LastCall!\r
  90     * cml_ret_off should be a ByteOff\r
  91     * Split into \r
  92          LastCall (which has a successor) and\r
  93          LastJump (which does not, includes return?)\r
  94            - does not have cml_cont, cml_ret_args, cml_ret_off\r
  95          LastForeignCall \r
  96            - safe! \r
  97            - expands into save/MidForeignCall/restore/goto\r
  98            - like any LastCall, target of the call gets an info table\r
  99 \r
 100  - JD: remind self of what goes wrong if you turn off the \r
 101    liveness of the update frame\r
 102 \r
 103  - Garbage-collect http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/CPS\r
 104    moving good stuff into \r
 105    http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGenPipeline\r
 106 \r
 107 \r
 108  - We believe that all of CmmProcPointZ.addProcPointProtocols is dead.  What\r
 109    goes wrong if we simply never call it?\r
 110 \r
 111  - Something fishy in CmmStackLayout.hs\r
 112    * In particular, 'getAreaSize' returns an AreaMap, but we *know* the width of\r
 113         LocalRegs, so it'd be better to return FiniteMap AreaId ByteWidth\r
 114    * setSuccSPs looks fishy.  Rather than lookin in procPoints, it could\r
 115         just lookup the block in areaSize which, after all, has a binding\r
 116         for precisely successors of calls.  All other blocks (including proc\r
 117         points that are not successors of a call, we think) can be treated\r
 118         uniformly: zero-size Area, and use inSP.\r
 119 \r
 120 \r
 121  - Currently AsmCodeGen top level calls AsmCodeGen.cmmToCmm, which is a small\r
 122    C-- optimiser.  It has quite a lot of boilerplate folding code in AsmCodeGen\r
 123    (cmmBlockConFold, cmmStmtConFold, cmmExprConFold), before calling out to\r
 124    CmmOpt.  ToDo: see what optimisations are being done; and do them before\r
 125    AsmCodeGen.\r
 126 \r
 127  - Modularise the CPS pipeline; instead of ...; A;B;C; ...\r
 128                                 use  ..; ABC; ....\r
 129 \r
 130  - Most of HscMain.tryNewCodeGen does not belong in HscMain.  Instead\r
 131         if new_cg then\r
 132              StgCmm.codeGen\r
 133              processCmm  [including generating "raw" cmm]\r
 134         else\r
 135              CodeGen.codeGen\r
 136              cmmToRawCmm\r
 137 \r
 138 \r
 139  - If we stick CAF and stack liveness info on a LastCall node (not LastRet/Jump)\r
 140    then all CAF and stack liveness stuff be completed before we split\r
 141    into separate C procedures.\r
 142 \r
 143    Short term:\r
 144      compute and attach liveness into to LastCall\r
 145      right at end, split, cvt to old rep\r
 146      [must split before cvt, because old rep is not expressive enough]\r
 147 \r
 148    Longer term: \r
 149      when old rep disappears, \r
 150      move the whole splitting game into the C back end *only*\r
 151          (guided by the procpoint set)\r
 152 \r
 153       \r
 154 ----------------------------------------------------\r
 155         Modules in cmm/\r
 156 ----------------------------------------------------\r
 157 \r
 158 -------- Dead stuff ------------\r
 159 CmmProcPoint        Dead: Michael Adams\r
 160 CmmCPS              Dead: Michael Adams\r
 161 CmmCPSGen.hs        Dead: Michael Adams\r
 162 CmmBrokenBlock.hs   Dead: Michael Adams\r
 163 CmmLive.hs          Dead: Michael Adams\r
 164 CmmProcPoint.hs     Dead: Michael Adams\r
 165 Dataflow.hs         Dead: Michael Adams\r
 166 StackColor.hs       Norman?\r
 167 StackPlacements.hs  Norman?\r
 168 \r
 169 HscMain.optionallyConvertAndOrCPS\r
 170         testCmmConversion\r
 171 DynFlags:  -fconvert-to-zipper-and-back, -frun-cps, -frun-cpsz\r
 172 \r
 173 -------- Moribund stuff ------------\r
 174 CmmCvt.hs      Conversion between old and new Cmm reps\r
 175 CmmOpt.hs      Hopefully-redundant optimiser\r
 176 CmmZipUtil.hs  Only one function; move elsewhere\r
 177 \r
 178 -------- Stuff to keep ------------\r
 179 CmmCPSZ.hs                Driver for new pipeline\r
 180 \r
 181 CmmLiveZ.hs               Liveness analysis, dead code elim\r
 182 CmmProcPointZ.hs          Identifying and splitting out proc-points\r
 183 \r
 184 CmmSpillReload.hs         Save and restore across calls\r
 185 \r
 186 CmmCommonBlockElimZ.hs    Common block elim\r
 187 CmmContFlowOpt.hs         Other optimisations (branch-chain, merging)\r
 188 \r
 189 CmmBuildInfoTables.hs     New info-table \r
 190 CmmStackLayout.hs         and stack layout \r
 191 CmmCallConv.hs\r
 192 CmmInfo.hs                Defn of InfoTables, and conversion to exact layout\r
 193 \r
 194 ---------- Cmm data types --------------\r
 195 ZipCfgCmmRep.hs     Cmm instantiations of dataflow graph framework\r
 196 MkZipCfgCmm.hs      Cmm instantiations of dataflow graph framework\r
 197 \r
 198 Cmm.hs        Key module; a mix of old and new stuff\r
 199                   so needs tidying up in due course\r
 200 CmmExpr.hs\r
 201 CmmUtils.hs\r
 202 CmmLint.hs\r
 203 \r
 204 PprC.hs             Pretty print Cmm in C syntax\r
 205 PprCmm.hs           Pretty printer for Cmm\r
 206 PprCmmZ.hs          Additional stuff for zipper rep\r
 207 \r
 208 CLabel.hs     CLabel\r
 209 \r
 210 ----------  Dataflow modules --------------\r
 211    Goal: separate library; for now, separate directory\r
 212 \r
 213 MkZipCfg.hs\r
 214 ZipCfg.hs\r
 215 ZipCfgExtras.hs\r
 216 ZipDataflow.hs\r
 217 CmmTx.hs              Transactions\r
 218 OptimizationFuel.hs   Fuel\r
 219 BlockId.hs    BlockId, BlockEnv, BlockSet\r
 220 DFMonad.hs            \r
 221 \r
 222 \r
 223 ----------------------------------------------------\r
 224       Top-level structure\r
 225 ----------------------------------------------------\r
 226 \r
 227 * New codgen called in HscMain.hscGenHardCode, by calling HscMain.tryNewCodeGen, \r
 228   enabled by -fnew-codegen (Opt_TryNewCodeGen)\r
 229 \r
 230   THEN it calls CmmInfo.cmmToRawCmm to lay out the details of info tables\r
 231       type Cmm    = GenCmm CmmStatic CmmInfo     (ListGraph CmmStmt)\r
 232       type RawCmm = GenCmm CmmStatic [CmmStatic] (ListGraph CmmStmt)\r
 233 \r
 234 * HscMain.tryNewCodeGen\r
 235     - STG->Cmm:    StgCmm.codeGen (new codegen)\r
 236     - Optimise:    CmmContFlowOpt (simple optimisations, very self contained)\r
 237     - Cps convert: CmmCPSZ.protoCmmCPSZ \r
 238     - Optimise:    CmmContFlowOpt again\r
 239     - Convert:     CmmCvt.cmmOfZgraph (convert to old rep) very self contained\r
 240 \r
 241 * StgCmm.hs  The new STG -> Cmm conversion code generator\r
 242   Lots of modules StgCmmXXX\r
 243 \r
 244 \r
 245 ----------------------------------------------------\r
 246       CmmCPSZ.protoCmmCPSZ   The new pipeline\r
 247 ----------------------------------------------------\r
 248 \r
 249 CmmCPSZprotoCmmCPSZ:\r
 250    1. Do cpsTop for each procedures separately\r
 251    2. Build SRT representation; this spans multiple procedures\r
 252         (unless split-objs)\r
 253 \r
 254 cpsTop:\r
 255   * CmmCommonBlockElimZ.elimCommonBlocks:\r
 256         eliminate common blocks \r
 257 \r
 258   * CmmProcPointZ.minimalProcPointSet\r
 259         identify proc-points\r
 260         no change to graph\r
 261 \r
 262   * CmmProcPointZ.addProcPointProtocols\r
 263         something to do with the MA optimisation\r
 264         probably entirely unnecessary\r
 265 \r
 266   * Spill and reload:\r
 267      - CmmSpillReload.dualLivenessWithInsertion\r
 268        insert spills/reloads across \r
 269            LastCalls, and \r
 270            Branches to proc-points\r
 271      Now sink those reloads:\r
 272      - CmmSpillReload.insertLateReloads\r
 273      - CmmSpillReload.removeDeadAssignmentsAndReloads\r
 274 \r
 275   * CmmStackLayout.stubSlotsOnDeath\r
 276         debug only: zero out dead slots when they die\r
 277 \r
 278   * Stack layout\r
 279      - CmmStackLayout.lifeSlotAnal: \r
 280        find which sub-areas are live on entry to each block\r
 281 \r
 282      - CmmStackLayout.layout\r
 283        Lay out the stack, returning an AreaMap\r
 284          type AreaMap = FiniteMap Area ByteOff\r
 285           -- Byte offset of the oldest byte of the Area, \r
 286           -- relative to the oldest byte of the Old Area\r
 287 \r
 288      - CmmStackLayout.manifestSP\r
 289        Manifest the stack pointer\r
 290 \r
 291    * Split into separate procedures\r
 292       - CmmProcPointZ.procPointAnalysis\r
 293         Given set of proc points, which blocks are reachable from each\r
 294         Claim: too few proc-points => code duplication, but program still works??\r
 295 \r
 296       - CmmProcPointZ.splitAtProcPoints\r
 297         Using this info, split into separate procedures\r
 298 \r
 299       - CmmBuildInfoTables.setInfoTableStackMap\r
 300         Attach stack maps to each info table\r
 301 \r
 302 \r
 303 ----------------------------------------------------\r
 304         Proc-points\r
 305 ----------------------------------------------------\r
 306 \r
 307 Consider this program, which has a diamond control flow, \r
 308 with a call on one branch\r
 309  fn(p,x) {\r
 310         h()\r
 311         if b then { ... f(x) ...; q=5; goto J }\r
 312              else { ...; q=7; goto J }\r
 313      J: ..p...q...\r
 314   }\r
 315 then the join point J is a "proc-point".  So, is 'p' passed to J\r
 316 as a parameter?  Or, if 'p' was saved on the stack anyway, perhaps\r
 317 to keep it alive across the call to h(), maybe 'p' gets communicated\r
 318 to J that way. This is an awkward choice.  (We think that we currently\r
 319 never pass variables to join points via arguments.)\r
 320 \r
 321 Furthermore, there is *no way* to pass q to J in a register (other\r
 322 than a paramter register).\r
 323 \r
 324 What we want is to do register allocation across the whole caboodle.\r
 325 Then we could drop all the code that deals with the above awkward\r
 326 decisions about spilling variables across proc-points.\r
 327 \r
 328 Note that J doesn't need an info table.\r
 329 \r
 330 What we really want is for each LastCall (not LastJump/Ret) \r
 331 to have an info table.   Note that ProcPoints that are not successors\r
 332 of calls don't need an info table.\r
 333 \r
 334 Figuring out proc-points\r
 335 ~~~~~~~~~~~~~~~~~~~~~~~~\r
 336 Proc-points are identified by\r
 337 CmmProcPointZ.minimalProcPointSet/extendPPSet Although there isn't\r
 338 that much code, JD thinks that it could be done much more nicely using\r
 339 a dominator analysis, using the Dataflow Engine.\r
 340 \r
 341 ----------------------------------------------------\r
 342                 CAFs\r
 343 ----------------------------------------------------\r
 344 \r
 345 * The code for a procedure f may refer to either the *closure* \r
 346   or the *entry point* of another top-level procedure g.  \r
 347   If f is live, then so is g.  f's SRT must include g's closure.\r
 348 \r
 349 * The CLabel for the entry-point/closure reveals whether g is \r
 350   a CAF (or refers to CAFs).  See the IdLabel constructor of CLabel.\r
 351 \r
 352 * The CAF-ness of the original top-level defininions is figured out\r
 353   (by TidyPgm) before we generate C--.  This CafInfo is only set for\r
 354   top-level Ids; nested bindings stay with MayHaveCafRefs.\r
 355 \r
 356 * Currently an SRT contains (only) pointers to (top-level) closures.\r
 357 \r
 358 * Consider this Core code\r
 359         f = \x -> let g = \y -> ...x...y...h1...\r
 360                   in ...h2...g...\r
 361   and suppose that h1, h2 have IdInfo of MayHaveCafRefs.\r
 362   Therefore, so will f,  But g will not (since it's nested).\r
 363 \r
 364   This generates C-- roughly like this:\r
 365      f_closure: .word f_entry\r
 366      f_entry() [info-tbl-for-f] { ...jump g_entry...jump h2... }\r
 367      g_entry() [info-tbl-for-g] { ...jump h1... }\r
 368 \r
 369   Note that there is no top-level closure for g (only an info table).\r
 370   This fact (whether or not there is a top-level closure) is recorded\r
 371   in the InfoTable attached to the CmmProc for f, g\r
 372   INVARIANT: \r
 373      Any out-of-Group references to an IdLabel goes to\r
 374      a Proc whose InfoTable says "I have a top-level closure".\r
 375   Equivalently: \r
 376      A CmmProc whose InfoTable says "I do not have a top-level\r
 377      closure" is referred to only from its own Group.\r
 378 \r
 379 * So:   info-tbl-for-f must have an SRT that keeps h1,h2 alive\r
 380         info-tbl-for-g must have an SRT that keeps h1 (only) alive\r
 381 \r
 382   But if we just look for the free CAF refs, we get:\r
 383         f   h2 (only)\r
 384         g   h1\r
 385 \r
 386   So we need to do a transitive closure thing to flesh out \r
 387   f's keep-alive refs to include h1.\r
 388 \r
 389 * The SRT info is the C_SRT field of Cmm.ClosureTypeInfo in a\r
 390   CmmInfoTable attached to each CmmProc.  CmmCPSZ.toTops actually does\r
 391   the attaching, right at the end of the pipeline.  The C_SRT part\r
 392   gives offsets within a single, shared table of closure pointers.\r
 393 \r
 394 * DECIDED: we can generate SRTs based on the final Cmm program\r
 395   without knowledge of how it is generated.\r
 396 \r
 397 ----------------------------------------------------\r
 398                 Foreign calls\r
 399 ----------------------------------------------------\r
 400 \r
 401 See Note [Foreign calls] in ZipCfgCmmRep!  This explains that a safe\r
 402 foreign call must do this:\r
 403   save thread state\r
 404   push info table (on thread stack) to describe frame\r
 405   make call (via C stack)\r
 406   pop info table\r
 407   restore thread state\r
 408 and explains why this expansion must be done late in the day.\r
 409 \r
 410 Hence, \r
 411   - Every foreign call is represented as a middle node\r
 412 \r
 413   - *Unsafe* foreign calls are simply "fat machine instructions"\r
 414       and are passed along to the native code generator\r
 415 \r
 416   - *Safe* foreign calls are "lowered" to unsafe calls by wrapping\r
 417       them in the above save/restore sequence. This step is done\r
 418       very late in the pipeline, just before handing to the native\r
 419       code gen.   \r
 420   \r
 421       This lowering is done by BuildInfoTables.lowerSafeForeignCalls\r
 422 \r
 423 \r
 424 NEW PLAN for foreign calls:\r
 425   - Unsafe foreign calls remain as a middle node (fat machine instruction)\r
 426     Even the parameter passing is not lowered (just as machine instrs\r
 427     get arguments).\r
 428 \r
 429   - Initially, safe foreign calls appear as LastCalls with \r
 430         \r
 431 \r
 432 ----------------------------------------------------\r
 433                 Cmm representations\r
 434 ----------------------------------------------------\r
 435 \r
 436 * Cmm.hs\r
 437      The type [GenCmm d h g] represents a whole module, \r
 438         ** one list element per .o file **\r
 439         Without SplitObjs, the list has exactly one element\r
 440 \r
 441      newtype GenCmm d h g = Cmm [GenCmmTop d h g]  -- A whole .o file\r
 442      data GenCmmTop d h g\r
 443          = CmmProc h g           -- One procedure, graph d\r
 444          | CmmData <stuff> [d]   -- Initialised data, items d\r
 445 \r
 446   Old and new piplines use different representations\r
 447         (CmmCvt.hs converts between the two)\r
 448 \r
 449 \r
 450 -------------\r
 451 OLD BACK END representations (Cmm.hs):  \r
 452       type Cmm = GenCmm CmmStatic CmmInfo (ListGraph CmmStmt)\r
 453                                 -- A whole module\r
 454       newtype ListGraph i = ListGraph [GenBasicBlock i]\r
 455 \r
 456       data CmmStmt = Assign | Store | Return etc -- OLD BACK END ONLY\r
 457 \r
 458 \r
 459    Once the info tables are laid out, we replace CmmInfo with [CmmStatic]\r
 460       type RawCmm    = GenCmm CmmStatic [CmmStatic] (ListGraph CmmStmt)\r
 461    which represents the info tables as data, that should \r
 462    immediately precede the code\r
 463   \r
 464 -------------\r
 465 NEW BACK END representations \r
 466 * Not Cmm-specific at all\r
 467     ZipCfg.hs defines  Graph, LGraph, FGraph,\r
 468                        ZHead, ZTail, ZBlock ...\r
 469 \r
 470               classes  LastNode, HavingSuccessors\r
 471 \r
 472     MkZipCfg.hs: AGraph: building graphs\r
 473 \r
 474 * ZipCfgCmmRep: instantiates ZipCfg for Cmm\r
 475       data Middle = ...CmmExpr...\r
 476       data Last = ...CmmExpr...\r
 477       type CmmGraph = Graph Middle Last\r
 478 \r
 479       type CmmZ = GenCmm CmmStatic CmmInfo (CmmStackInfo, CmmGraph)\r
 480       type CmmStackInfo = (ByteOff, Maybe ByteOff)\r
 481                 -- (SP offset on entry, update frame space = SP offset on exit)\r
 482                 -- The new codegen produces CmmZ, but once the stack is \r
 483                 -- manifested we can drop that in favour of \r
 484                 --    GenCmm CmmStatic CmmInfo CmmGraph\r
 485 \r
 486       Inside a CmmProc:\r
 487            - CLabel: used\r
 488            - CmmInfo: partly used by NEW\r
 489            - CmmFormals: not used at all  PERHAPS NOT EVEN BY OLD PIPELINE!\r
 490 \r
 491 * MkZipCfgCmm.hs: smart constructors for ZipCfgCmmRep\r
 492    Depends on (a) MkZipCfg (Cmm-independent)\r
 493               (b) ZipCfgCmmRep (Cmm-specific)\r
 494 \r
 495 -------------\r
 496 * SHARED stuff\r
 497   CmmExpr.hs defines the Cmm expression types\r
 498         - CmmExpr, CmmReg, Width, CmmLit, LocalReg, GlobalReg\r
 499         - CmmType, Width etc   (saparate module?)\r
 500         - MachOp               (separate module?)\r
 501         - Area, AreaId etc     (separate module?)\r
 502 \r
 503   BlockId.hs defines  BlockId, BlockEnv, BlockSet\r
 504 \r
 505 -------------\r
 506 \r
 507 \r
 508 -------------\r
 509 * Transactions indicate whether or not the result changes: CmmTx \r
 510      type Tx a = a -> TxRes a\r
 511      data TxRes a = TxRes ChangeFlag a\r