1 Notes on new codegen (Sept 09)
\r
2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
\r
6 - SDM (2010-02-26) can we remove the Foreign constructor from Convention?
\r
7 Reason: we never generate code for a function with the Foreign
\r
8 calling convention, and the code for calling foreign calls is generated
\r
10 - All dataflow analyses are in the FuelMonad, even though they
\r
11 are guarnteed to consume no fuel. This seems silly
\r
13 - CmmContFlowOpt.runCmmContFlowOptZs is not called!
\r
14 - Why is runCmmOpts called from HscMain? Seems too "high up".
\r
15 In fact HscMain calls (runCmmOpts cmmCfgOptsZ) which is what
\r
16 runCmmContFlowOptZs does. Tidy up!
\r
19 - AsmCodeGen has a generic Cmm optimiser; move this into new pipeline
\r
21 - AsmCodeGen has post-native-cg branch elimiator (shortCutBranches);
\r
22 we ultimately want to share this with the Cmm branch eliminator.
\r
24 - At the moment, references to global registers like Hp are "lowered"
\r
25 late (in AsmCodeGen.fixAssignTop and cmmToCmm). We should do this
\r
26 early, in the new native codegen, much in the way that we lower
\r
27 calling conventions. Might need to be a bit sophisticated about
\r
30 - Refactor Cmm so that it contains only shared stuff
\r
31 Add a module MoribundCmm which contains stuff from
\r
32 Cmm for old code gen path
\r
34 - Question: currently we lift procpoints to become separate
\r
35 CmmProcs. Do we still want to do this?
\r
37 NB: and advantage of continuing to do this is that
\r
38 we can do common-proc elimination!
\r
40 - Move to new Cmm rep:
\r
41 * Make native CG consume New Cmm;
\r
42 * Convert Old Cmm->New Cmm to keep old path alive
\r
43 * Produce New Cmm when reading in .cmm files
\r
45 - Consider module names
\r
47 - Top-level SRT threading is a bit ugly
\r
49 - Add type/newtype for CmmModule = [CmmGroup] -- A module
\r
50 CmmGroup = [CmmTop] -- A .o file
\r
51 CmmTop = Proc | Data -- A procedure or data
\r
53 - This is a *change*: currently a CmmGroup is one function's-worth of code
\r
54 regardless of SplitObjs. Question: can we *always* generate M.o if there
\r
55 is just one element in the list (rather than M/M1.o, M/M2.o etc)
\r
59 - See "CAFs" below; we want to totally refactor the way SRTs are calculated
\r
62 type CmmZ = GenCmm CmmStatic CmmInfo (CmmStackInfo, CmmGraph)
\r
64 type CmmZ = GenCmm CmmStatic (CmmInfo, CmmStackInfo) CmmGraph
\r
65 -- And perhaps take opportunity to prune CmmInfo?
\r
67 - Clarify which fields of CmmInfo are still used
\r
68 - Maybe get rid of CmmFormals arg of CmmProc in all versions?
\r
70 - We aren't sure whether cmmToRawCmm is actively used by the new pipeline; check
\r
71 And what does CmmBuildInfoTables do?!
\r
73 - Nuke CmmZipUtil, move zipPreds into ZipCfg
\r
75 - Pull out Areas into its own module
\r
76 Parameterise AreaMap
\r
78 type SubArea = (Area, ByteOff, ByteWidth)
\r
79 ByteOff should not be defined in SMRep -- that is too high up the hierarchy
\r
81 - SMRep should not be imported by any module in cmm/! Make it so.
\r
82 -- ByteOff etc ==> CmmExpr
\r
83 -- rET_SMALL etc ==> CmmInfo
\r
84 Check that there are no other imports from codeGen in cmm/
\r
86 - Think about a non-flattened representation?
\r
89 * Use record fields for LastCall!
\r
90 * cml_ret_off should be a ByteOff
\r
92 LastCall (which has a successor) and
\r
93 LastJump (which does not, includes return?)
\r
94 - does not have cml_cont, cml_ret_args, cml_ret_off
\r
97 - expands into save/MidForeignCall/restore/goto
\r
98 - like any LastCall, target of the call gets an info table
\r
100 - JD: remind self of what goes wrong if you turn off the
\r
101 liveness of the update frame
\r
103 - Garbage-collect http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/CPS
\r
104 moving good stuff into
\r
105 http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/NewCodeGenPipeline
\r
108 - We believe that all of CmmProcPointZ.addProcPointProtocols is dead. What
\r
109 goes wrong if we simply never call it?
\r
111 - Something fishy in CmmStackLayout.hs
\r
112 * In particular, 'getAreaSize' returns an AreaMap, but we *know* the width of
\r
113 LocalRegs, so it'd be better to return FiniteMap AreaId ByteWidth
\r
114 * setSuccSPs looks fishy. Rather than lookin in procPoints, it could
\r
115 just lookup the block in areaSize which, after all, has a binding
\r
116 for precisely successors of calls. All other blocks (including proc
\r
117 points that are not successors of a call, we think) can be treated
\r
118 uniformly: zero-size Area, and use inSP.
\r
121 - Currently AsmCodeGen top level calls AsmCodeGen.cmmToCmm, which is a small
\r
122 C-- optimiser. It has quite a lot of boilerplate folding code in AsmCodeGen
\r
123 (cmmBlockConFold, cmmStmtConFold, cmmExprConFold), before calling out to
\r
124 CmmOpt. ToDo: see what optimisations are being done; and do them before
\r
127 - Modularise the CPS pipeline; instead of ...; A;B;C; ...
\r
130 - Most of HscMain.tryNewCodeGen does not belong in HscMain. Instead
\r
133 processCmm [including generating "raw" cmm]
\r
139 - If we stick CAF and stack liveness info on a LastCall node (not LastRet/Jump)
\r
140 then all CAF and stack liveness stuff be completed before we split
\r
141 into separate C procedures.
\r
144 compute and attach liveness into to LastCall
\r
145 right at end, split, cvt to old rep
\r
146 [must split before cvt, because old rep is not expressive enough]
\r
149 when old rep disappears,
\r
150 move the whole splitting game into the C back end *only*
\r
151 (guided by the procpoint set)
\r
154 ----------------------------------------------------
\r
156 ----------------------------------------------------
\r
158 -------- Dead stuff ------------
\r
159 CmmProcPoint Dead: Michael Adams
\r
160 CmmCPS Dead: Michael Adams
\r
161 CmmCPSGen.hs Dead: Michael Adams
\r
162 CmmBrokenBlock.hs Dead: Michael Adams
\r
163 CmmLive.hs Dead: Michael Adams
\r
164 CmmProcPoint.hs Dead: Michael Adams
\r
165 Dataflow.hs Dead: Michael Adams
\r
166 StackColor.hs Norman?
\r
167 StackPlacements.hs Norman?
\r
169 HscMain.optionallyConvertAndOrCPS
\r
171 DynFlags: -fconvert-to-zipper-and-back, -frun-cps, -frun-cpsz
\r
173 -------- Moribund stuff ------------
\r
174 CmmCvt.hs Conversion between old and new Cmm reps
\r
175 CmmOpt.hs Hopefully-redundant optimiser
\r
176 CmmZipUtil.hs Only one function; move elsewhere
\r
178 -------- Stuff to keep ------------
\r
179 CmmCPSZ.hs Driver for new pipeline
\r
181 CmmLiveZ.hs Liveness analysis, dead code elim
\r
182 CmmProcPointZ.hs Identifying and splitting out proc-points
\r
184 CmmSpillReload.hs Save and restore across calls
\r
186 CmmCommonBlockElimZ.hs Common block elim
\r
187 CmmContFlowOpt.hs Other optimisations (branch-chain, merging)
\r
189 CmmBuildInfoTables.hs New info-table
\r
190 CmmStackLayout.hs and stack layout
\r
192 CmmInfo.hs Defn of InfoTables, and conversion to exact layout
\r
194 ---------- Cmm data types --------------
\r
195 ZipCfgCmmRep.hs Cmm instantiations of dataflow graph framework
\r
196 MkZipCfgCmm.hs Cmm instantiations of dataflow graph framework
\r
198 Cmm.hs Key module; a mix of old and new stuff
\r
199 so needs tidying up in due course
\r
204 PprC.hs Pretty print Cmm in C syntax
\r
205 PprCmm.hs Pretty printer for Cmm
\r
206 PprCmmZ.hs Additional stuff for zipper rep
\r
210 ---------- Dataflow modules --------------
\r
211 Goal: separate library; for now, separate directory
\r
217 CmmTx.hs Transactions
\r
218 OptimizationFuel.hs Fuel
\r
219 BlockId.hs BlockId, BlockEnv, BlockSet
\r
223 ----------------------------------------------------
\r
224 Top-level structure
\r
225 ----------------------------------------------------
\r
227 * New codgen called in HscMain.hscGenHardCode, by calling HscMain.tryNewCodeGen,
\r
228 enabled by -fnew-codegen (Opt_TryNewCodeGen)
\r
230 THEN it calls CmmInfo.cmmToRawCmm to lay out the details of info tables
\r
231 type Cmm = GenCmm CmmStatic CmmInfo (ListGraph CmmStmt)
\r
232 type RawCmm = GenCmm CmmStatic [CmmStatic] (ListGraph CmmStmt)
\r
234 * HscMain.tryNewCodeGen
\r
235 - STG->Cmm: StgCmm.codeGen (new codegen)
\r
236 - Optimise: CmmContFlowOpt (simple optimisations, very self contained)
\r
237 - Cps convert: CmmCPSZ.protoCmmCPSZ
\r
238 - Optimise: CmmContFlowOpt again
\r
239 - Convert: CmmCvt.cmmOfZgraph (convert to old rep) very self contained
\r
241 * StgCmm.hs The new STG -> Cmm conversion code generator
\r
242 Lots of modules StgCmmXXX
\r
245 ----------------------------------------------------
\r
246 CmmCPSZ.protoCmmCPSZ The new pipeline
\r
247 ----------------------------------------------------
\r
249 CmmCPSZprotoCmmCPSZ:
\r
250 1. Do cpsTop for each procedures separately
\r
251 2. Build SRT representation; this spans multiple procedures
\r
252 (unless split-objs)
\r
255 * CmmCommonBlockElimZ.elimCommonBlocks:
\r
256 eliminate common blocks
\r
258 * CmmProcPointZ.minimalProcPointSet
\r
259 identify proc-points
\r
262 * CmmProcPointZ.addProcPointProtocols
\r
263 something to do with the MA optimisation
\r
264 probably entirely unnecessary
\r
266 * Spill and reload:
\r
267 - CmmSpillReload.dualLivenessWithInsertion
\r
268 insert spills/reloads across
\r
270 Branches to proc-points
\r
271 Now sink those reloads:
\r
272 - CmmSpillReload.insertLateReloads
\r
273 - CmmSpillReload.removeDeadAssignmentsAndReloads
\r
275 * CmmStackLayout.stubSlotsOnDeath
\r
276 debug only: zero out dead slots when they die
\r
279 - CmmStackLayout.lifeSlotAnal:
\r
280 find which sub-areas are live on entry to each block
\r
282 - CmmStackLayout.layout
\r
283 Lay out the stack, returning an AreaMap
\r
284 type AreaMap = FiniteMap Area ByteOff
\r
285 -- Byte offset of the oldest byte of the Area,
\r
286 -- relative to the oldest byte of the Old Area
\r
288 - CmmStackLayout.manifestSP
\r
289 Manifest the stack pointer
\r
291 * Split into separate procedures
\r
292 - CmmProcPointZ.procPointAnalysis
\r
293 Given set of proc points, which blocks are reachable from each
\r
294 Claim: too few proc-points => code duplication, but program still works??
\r
296 - CmmProcPointZ.splitAtProcPoints
\r
297 Using this info, split into separate procedures
\r
299 - CmmBuildInfoTables.setInfoTableStackMap
\r
300 Attach stack maps to each info table
\r
303 ----------------------------------------------------
\r
305 ----------------------------------------------------
\r
307 Consider this program, which has a diamond control flow,
\r
308 with a call on one branch
\r
311 if b then { ... f(x) ...; q=5; goto J }
\r
312 else { ...; q=7; goto J }
\r
315 then the join point J is a "proc-point". So, is 'p' passed to J
\r
316 as a parameter? Or, if 'p' was saved on the stack anyway, perhaps
\r
317 to keep it alive across the call to h(), maybe 'p' gets communicated
\r
318 to J that way. This is an awkward choice. (We think that we currently
\r
319 never pass variables to join points via arguments.)
\r
321 Furthermore, there is *no way* to pass q to J in a register (other
\r
322 than a paramter register).
\r
324 What we want is to do register allocation across the whole caboodle.
\r
325 Then we could drop all the code that deals with the above awkward
\r
326 decisions about spilling variables across proc-points.
\r
328 Note that J doesn't need an info table.
\r
330 What we really want is for each LastCall (not LastJump/Ret)
\r
331 to have an info table. Note that ProcPoints that are not successors
\r
332 of calls don't need an info table.
\r
334 Figuring out proc-points
\r
335 ~~~~~~~~~~~~~~~~~~~~~~~~
\r
336 Proc-points are identified by
\r
337 CmmProcPointZ.minimalProcPointSet/extendPPSet Although there isn't
\r
338 that much code, JD thinks that it could be done much more nicely using
\r
339 a dominator analysis, using the Dataflow Engine.
\r
341 ----------------------------------------------------
\r
343 ----------------------------------------------------
\r
345 * The code for a procedure f may refer to either the *closure*
\r
346 or the *entry point* of another top-level procedure g.
\r
347 If f is live, then so is g. f's SRT must include g's closure.
\r
349 * The CLabel for the entry-point/closure reveals whether g is
\r
350 a CAF (or refers to CAFs). See the IdLabel constructor of CLabel.
\r
352 * The CAF-ness of the original top-level defininions is figured out
\r
353 (by TidyPgm) before we generate C--. This CafInfo is only set for
\r
354 top-level Ids; nested bindings stay with MayHaveCafRefs.
\r
356 * Currently an SRT contains (only) pointers to (top-level) closures.
\r
358 * Consider this Core code
\r
359 f = \x -> let g = \y -> ...x...y...h1...
\r
361 and suppose that h1, h2 have IdInfo of MayHaveCafRefs.
\r
362 Therefore, so will f, But g will not (since it's nested).
\r
364 This generates C-- roughly like this:
\r
365 f_closure: .word f_entry
\r
366 f_entry() [info-tbl-for-f] { ...jump g_entry...jump h2... }
\r
367 g_entry() [info-tbl-for-g] { ...jump h1... }
\r
369 Note that there is no top-level closure for g (only an info table).
\r
370 This fact (whether or not there is a top-level closure) is recorded
\r
371 in the InfoTable attached to the CmmProc for f, g
\r
373 Any out-of-Group references to an IdLabel goes to
\r
374 a Proc whose InfoTable says "I have a top-level closure".
\r
376 A CmmProc whose InfoTable says "I do not have a top-level
\r
377 closure" is referred to only from its own Group.
\r
379 * So: info-tbl-for-f must have an SRT that keeps h1,h2 alive
\r
380 info-tbl-for-g must have an SRT that keeps h1 (only) alive
\r
382 But if we just look for the free CAF refs, we get:
\r
386 So we need to do a transitive closure thing to flesh out
\r
387 f's keep-alive refs to include h1.
\r
389 * The SRT info is the C_SRT field of Cmm.ClosureTypeInfo in a
\r
390 CmmInfoTable attached to each CmmProc. CmmCPSZ.toTops actually does
\r
391 the attaching, right at the end of the pipeline. The C_SRT part
\r
392 gives offsets within a single, shared table of closure pointers.
\r
394 * DECIDED: we can generate SRTs based on the final Cmm program
\r
395 without knowledge of how it is generated.
\r
397 ----------------------------------------------------
\r
399 ----------------------------------------------------
\r
401 See Note [Foreign calls] in ZipCfgCmmRep! This explains that a safe
\r
402 foreign call must do this:
\r
404 push info table (on thread stack) to describe frame
\r
405 make call (via C stack)
\r
407 restore thread state
\r
408 and explains why this expansion must be done late in the day.
\r
411 - Every foreign call is represented as a middle node
\r
413 - *Unsafe* foreign calls are simply "fat machine instructions"
\r
414 and are passed along to the native code generator
\r
416 - *Safe* foreign calls are "lowered" to unsafe calls by wrapping
\r
417 them in the above save/restore sequence. This step is done
\r
418 very late in the pipeline, just before handing to the native
\r
421 This lowering is done by BuildInfoTables.lowerSafeForeignCalls
\r
424 NEW PLAN for foreign calls:
\r
425 - Unsafe foreign calls remain as a middle node (fat machine instruction)
\r
426 Even the parameter passing is not lowered (just as machine instrs
\r
429 - Initially, safe foreign calls appear as LastCalls with
\r
432 ----------------------------------------------------
\r
433 Cmm representations
\r
434 ----------------------------------------------------
\r
437 The type [GenCmm d h g] represents a whole module,
\r
438 ** one list element per .o file **
\r
439 Without SplitObjs, the list has exactly one element
\r
441 newtype GenCmm d h g = Cmm [GenCmmTop d h g] -- A whole .o file
\r
442 data GenCmmTop d h g
\r
443 = CmmProc h g -- One procedure, graph d
\r
444 | CmmData <stuff> [d] -- Initialised data, items d
\r
446 Old and new piplines use different representations
\r
447 (CmmCvt.hs converts between the two)
\r
451 OLD BACK END representations (Cmm.hs):
\r
452 type Cmm = GenCmm CmmStatic CmmInfo (ListGraph CmmStmt)
\r
454 newtype ListGraph i = ListGraph [GenBasicBlock i]
\r
456 data CmmStmt = Assign | Store | Return etc -- OLD BACK END ONLY
\r
459 Once the info tables are laid out, we replace CmmInfo with [CmmStatic]
\r
460 type RawCmm = GenCmm CmmStatic [CmmStatic] (ListGraph CmmStmt)
\r
461 which represents the info tables as data, that should
\r
462 immediately precede the code
\r
465 NEW BACK END representations
\r
466 * Not Cmm-specific at all
\r
467 ZipCfg.hs defines Graph, LGraph, FGraph,
\r
468 ZHead, ZTail, ZBlock ...
\r
470 classes LastNode, HavingSuccessors
\r
472 MkZipCfg.hs: AGraph: building graphs
\r
474 * ZipCfgCmmRep: instantiates ZipCfg for Cmm
\r
475 data Middle = ...CmmExpr...
\r
476 data Last = ...CmmExpr...
\r
477 type CmmGraph = Graph Middle Last
\r
479 type CmmZ = GenCmm CmmStatic CmmInfo (CmmStackInfo, CmmGraph)
\r
480 type CmmStackInfo = (ByteOff, Maybe ByteOff)
\r
481 -- (SP offset on entry, update frame space = SP offset on exit)
\r
482 -- The new codegen produces CmmZ, but once the stack is
\r
483 -- manifested we can drop that in favour of
\r
484 -- GenCmm CmmStatic CmmInfo CmmGraph
\r
488 - CmmInfo: partly used by NEW
\r
489 - CmmFormals: not used at all PERHAPS NOT EVEN BY OLD PIPELINE!
\r
491 * MkZipCfgCmm.hs: smart constructors for ZipCfgCmmRep
\r
492 Depends on (a) MkZipCfg (Cmm-independent)
\r
493 (b) ZipCfgCmmRep (Cmm-specific)
\r
497 CmmExpr.hs defines the Cmm expression types
\r
498 - CmmExpr, CmmReg, Width, CmmLit, LocalReg, GlobalReg
\r
499 - CmmType, Width etc (saparate module?)
\r
500 - MachOp (separate module?)
\r
501 - Area, AreaId etc (separate module?)
\r
503 BlockId.hs defines BlockId, BlockEnv, BlockSet
\r
509 * Transactions indicate whether or not the result changes: CmmTx
\r
510 type Tx a = a -> TxRes a
\r
511 data TxRes a = TxRes ChangeFlag a
\r