update submodule pointers

diff --git a/compiler/cmm/cmm-notes b/compiler/cmm/cmm-notes

index 0852711..4a87911 100644
--- a/compiler/cmm/cmm-notes
+++ b/compiler/cmm/cmm-notes
@@ -1,3 +1,38 @@
+More notes (June 11)\r
+~~~~~~~~~~~~~~~~~~~~\r
+* Possible refactoring: Nuke AGraph in favour of \r
+      mkIfThenElse :: Expr -> Graph -> Graph -> FCode Graph\r
+  or even\r
+      mkIfThenElse :: HasUniques m => Expr -> Graph -> Graph -> m Graph\r
+  (Remember that the .cmm file parser must use this function)\r
+\r
+  or parameterise FCode over its envt; the CgState part seems useful for both\r
+\r
+* "Remove redundant reloads" in CmmSpillReload should be redundant; since\r
+  insertLateReloads is now gone, every reload is reloading a live variable.\r
+  Test and nuke.\r
+\r
+* Sink and inline S(RegSlot(x)) = e in precisely the same way that we\r
+  sink and inline x = e\r
+\r
+* Stack layout is very like register assignment: find non-conflicting assignments.\r
+  In particular we can use colouring or linear scan (etc).\r
+\r
+  We'd fine-grain interference (on a word by word basis) to get maximum overlap.\r
+  But that may make very big interference graphs.  So linear scan might be\r
+  more attractive.\r
+\r
+  NB: linear scan does on-the-fly live range splitting.\r
+\r
+* When stubbing dead slots be careful not to write into an area that\r
+  overlaps with an area that's in use.  So stubbing needs to *follow* \r
+  stack layout.\r
+\r
+\r
+More notes (May 11)\r
+~~~~~~~~~~~~~~~~~~~\r
+In CmmNode, consider splitting CmmCall into two: call and jump\r
+\r
  Notes on new codegen (Aug 10)\r
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r
  \r
@@ -15,14 +50,11 @@ Things to do:
         This will fix the spill before stack check problem but only really as a side\r
         effect. A 'real fix' probably requires making the spiller know about sp checks.\r
  \r
- - There is some silly stuff happening with the Sp. We end up with code like:\r
-   Sp = Sp + 8; R1 = _vwf::I64; Sp = Sp -8\r
-       Seems to be perhaps caused by the issue above but also maybe a optimisation\r
-       pass needed?\r
+   EZY: I don't understand this comment. David Terei, can you clarify?\r
  \r
- - Proc pass all arguments on the stack, adding more code and slowing down things\r
-   a lot. We either need to fix this or even better would be to get rid of\r
-       proc points.\r
+ - Proc points pass all arguments on the stack, adding more code and\r
+   slowing down things a lot. We either need to fix this or even better\r
+   would be to get rid of proc points.\r
  \r
   - CmmInfo.cmmToRawCmm uses Old.Cmm, so it is called after converting Cmm.Cmm to\r
     Old.Cmm. We should abstract it to work on both representations, it needs only to\r
@@ -32,7 +64,7 @@ Things to do:
     we could convert codeGen/StgCmm* clients to the Hoopl's semantics?\r
     It's all deeply unsatisfactory.\r
  \r
- - Improve preformance of Hoopl.\r
+ - Improve performance of Hoopl.\r
  \r
     A nofib comparison of -fasm vs -fnewcodegen nofib compilation parameters\r
     (using the same ghc-cmm branch +libraries compiled by the old codegenerator)\r
@@ -50,6 +82,9 @@ Things to do:
  \r
     So we generate a bit better code, but it takes us longer!\r
  \r
+   EZY: Also importantly, Hoopl uses dramatically more memory than the\r
+   old code generator.\r
+\r
   - Are all blockToNodeList and blockOfNodeList really needed? Maybe we could\r
     splice blocks instead?\r
  \r
@@ -57,7 +92,7 @@ Things to do:
     a block catenation function would be probably nicer than blockToNodeList\r
     / blockOfNodeList combo.\r
  \r
- - loweSafeForeignCall seems too lowlevel. Just use Dataflow. After that\r
+ - lowerSafeForeignCall seems too lowlevel. Just use Dataflow. After that\r
     delete splitEntrySeq from HooplUtils.\r
  \r
   - manifestSP seems to touch a lot of the graph representation. It is\r
@@ -76,6 +111,9 @@ Things to do:
     calling convention, and the code for calling foreign calls is generated\r
  \r
   - AsmCodeGen has a generic Cmm optimiser; move this into new pipeline\r
+   EZY (2011-04-16): The mini-inliner has been generalized and ported,\r
+   but the constant folding and other optimizations still need to be\r
+   ported.\r
  \r
   - AsmCodeGen has post-native-cg branch eliminator (shortCutBranches);\r
     we ultimately want to share this with the Cmm branch eliminator.\r
@@ -113,7 +151,7 @@ Things to do:
   - See "CAFs" below; we want to totally refactor the way SRTs are calculated\r
  \r
   - Pull out Areas into its own module\r
-   Parameterise AreaMap\r
+   Parameterise AreaMap (note there are type synonyms in CmmStackLayout!)\r
     Add ByteWidth = Int\r
     type SubArea    = (Area, ByteOff, ByteWidth) \r
     ByteOff should not be defined in SMRep -- that is too high up the hierarchy\r
@@ -210,7 +248,7 @@ CmmCvt.hs      Conversion between old and new Cmm reps
  CmmOpt.hs      Hopefully-redundant optimiser\r
  \r
  -------- Stuff to keep ------------\r
-CmmCPS.hs                 Driver for new pipeline\r
+CmmPipeline.hs            Driver for new pipeline\r
  \r
  CmmLive.hs                Liveness analysis, dead code elim\r
  CmmProcPoint.hs           Identifying and splitting out proc-points\r
@@ -257,24 +295,24 @@ BlockId.hs          BlockId, BlockEnv, BlockSet
        type RawCmm = GenCmm CmmStatic [CmmStatic] (ListGraph CmmStmt)\r
  \r
  * HscMain.tryNewCodeGen\r
-    - STG->Cmm:    StgCmm.codeGen (new codegen)\r
-    - Optimise:    CmmContFlowOpt (simple optimisations, very self contained)\r
-    - Cps convert: CmmCPS.protoCmmCPS \r
-    - Optimise:    CmmContFlowOpt again\r
-    - Convert:     CmmCvt.cmmOfZgraph (convert to old rep) very self contained\r
+    - STG->Cmm:         StgCmm.codeGen (new codegen)\r
+    - Optimize and CPS: CmmPipeline.cmmPipeline\r
+    - Convert:          CmmCvt.cmmOfZgraph (convert to old rep) very self contained\r
  \r
  * StgCmm.hs  The new STG -> Cmm conversion code generator\r
    Lots of modules StgCmmXXX\r
  \r
  \r
  ----------------------------------------------------\r
-      CmmCPS.protoCmmCPS   The new pipeline\r
+      CmmPipeline.cmmPipeline   The new pipeline\r
  ----------------------------------------------------\r
  \r
-CmmCPS.protoCmmCPS:\r
-   1. Do cpsTop for each procedures separately\r
-   2. Build SRT representation; this spans multiple procedures\r
-       (unless split-objs)\r
+CmmPipeline.cmmPipeline:\r
+   1. Do control flow optimization\r
+   2. Do cpsTop for each procedure separately\r
+   3. Build SRT representation; this spans multiple procedures\r
+        (unless split-objs)\r
+   4. Do control flow optimization on all resulting procedures\r
  \r
  cpsTop:\r
    * CmmCommonBlockElim.elimCommonBlocks:\r
@@ -293,8 +331,8 @@ cpsTop:
         insert spills/reloads across \r
            LastCalls, and \r
            Branches to proc-points\r
-     Now sink those reloads:\r
-     - CmmSpillReload.insertLateReloads\r
+     Now sink those reloads (and other instructions):\r
+     - CmmSpillReload.rewriteAssignments\r
       - CmmSpillReload.removeDeadAssignmentsAndReloads\r
  \r
    * CmmStackLayout.stubSlotsOnDeath\r
@@ -344,7 +382,7 @@ to J that way. This is an awkward choice.  (We think that we currently
  never pass variables to join points via arguments.)\r
  \r
  Furthermore, there is *no way* to pass q to J in a register (other\r
-than a paramter register).\r
+than a parameter register).\r
  \r
  What we want is to do register allocation across the whole caboodle.\r
  Then we could drop all the code that deals with the above awkward\r
@@ -412,7 +450,7 @@ a dominator analysis, using the Dataflow Engine.
    f's keep-alive refs to include h1.\r
  \r
  * The SRT info is the C_SRT field of Cmm.ClosureTypeInfo in a\r
-  CmmInfoTable attached to each CmmProc.  CmmCPS.toTops actually does\r
+  CmmInfoTable attached to each CmmProc.  CmmPipeline.toTops actually does\r
    the attaching, right at the end of the pipeline.  The C_SRT part\r
    gives offsets within a single, shared table of closure pointers.\r
  \r