1 -----------------------------------------------------------------------------
3 -- Stg to C-- code generation: expressions
5 -- (c) The University of Glasgow 2004-2006
7 -----------------------------------------------------------------------------
9 module StgCmmExpr ( cgExpr ) where
11 #define FAST_STRING_NOT_NEEDED
12 #include "HsVersions.h"
14 import {-# SOURCE #-} StgCmmBind ( cgBind )
40 import CostCentre ( CostCentreStack, currentCCS )
46 ------------------------------------------------------------------------
47 -- cgExpr: the main function
48 ------------------------------------------------------------------------
50 cgExpr :: StgExpr -> FCode ()
52 cgExpr (StgApp fun args) = cgIdApp fun args
53 cgExpr (StgOpApp op args ty) = cgOpApp op args ty
54 cgExpr (StgConApp con args) = cgConApp con args
55 cgExpr (StgSCC cc expr) = do { emitSetCCC cc; cgExpr expr }
56 cgExpr (StgTick m n expr) = do { emit (mkTickBox m n); cgExpr expr }
57 cgExpr (StgLit lit) = do cmm_lit <- cgLit lit
58 emitReturn [CmmLit cmm_lit]
60 cgExpr (StgLet binds expr) = do { cgBind binds; cgExpr expr }
61 cgExpr (StgLetNoEscape _ _ binds expr) = do { cgLneBinds binds; cgExpr expr }
63 cgExpr (StgCase expr _live_vars _save_vars bndr srt alt_type alts) =
64 cgCase expr bndr srt alt_type alts
66 cgExpr (StgLam {}) = panic "cgExpr: StgLam"
68 ------------------------------------------------------------------------
70 ------------------------------------------------------------------------
72 {- Generating code for a let-no-escape binding, aka join point is very
73 very similar to what we do for a case expression. The duality is
80 That is, the RHS of 'x' (ie 'b') will execute *later*, just like
81 the alternative of the case; it needs to be compiled in an environment
82 in which all volatile bindings are forgotten, and the free vars are
83 bound only to stable things like stack locations.. The 'e' part will
84 execute *next*, just like the scrutinee of a case. -}
86 -------------------------
87 cgLneBinds :: StgBinding -> FCode ()
88 cgLneBinds (StgNonRec bndr rhs)
89 = do { local_cc <- saveCurrentCostCentre
90 -- See Note [Saving the current cost centre]
91 ; info <- cgLetNoEscapeRhs local_cc bndr rhs
92 ; addBindC (cg_id info) info }
94 cgLneBinds (StgRec pairs)
95 = do { local_cc <- saveCurrentCostCentre
96 ; new_bindings <- fixC (\ new_bindings -> do
97 { addBindsC new_bindings
98 ; listFCs [ cgLetNoEscapeRhs local_cc b e
101 ; addBindsC new_bindings }
104 -------------------------
105 cgLetNoEscapeRhs, cgLetNoEscapeRhsBody
106 :: Maybe LocalReg -- Saved cost centre
111 cgLetNoEscapeRhs local_cc bndr rhs =
112 do { (info, rhs_body) <- getCodeR $ cgLetNoEscapeRhsBody local_cc bndr rhs
113 ; let (bid, _) = expectJust "cgLetNoEscapeRhs" $ maybeLetNoEscape info
114 ; emit (outOfLine $ mkLabel bid emptyStackInfo <*> rhs_body)
118 cgLetNoEscapeRhsBody local_cc bndr (StgRhsClosure cc _bi _ _upd _ args body)
119 = cgLetNoEscapeClosure bndr local_cc cc (nonVoidIds args) body
120 cgLetNoEscapeRhsBody local_cc bndr (StgRhsCon cc con args)
121 = cgLetNoEscapeClosure bndr local_cc cc [] (StgConApp con args)
122 -- For a constructor RHS we want to generate a single chunk of
123 -- code which can be jumped to from many places, which will
124 -- return the constructor. It's easy; just behave as if it
125 -- was an StgRhsClosure with a ConApp inside!
127 -------------------------
130 -> Maybe LocalReg -- Slot for saved current cost centre
131 -> CostCentreStack -- XXX: *** NOT USED *** why not?
132 -> [NonVoid Id] -- Args (as in \ args -> body)
133 -> StgExpr -- Body (as in above)
136 cgLetNoEscapeClosure bndr cc_slot _unused_cc args body
137 = do { arg_regs <- forkProc $ do
138 { restoreCurrentCostCentre cc_slot
139 ; arg_regs <- bindArgsToRegs args
140 ; altHeapCheck arg_regs (cgExpr body)
141 -- Using altHeapCheck just reduces
142 -- instructions to save on stack
144 ; return $ lneIdInfo bndr arg_regs}
147 ------------------------------------------------------------------------
149 ------------------------------------------------------------------------
151 {- Note [Compiling case expressions]
152 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
153 It is quite interesting to decide whether to put a heap-check at the
154 start of each alternative. Of course we certainly have to do so if
155 the case forces an evaluation, or if there is a primitive op which can
158 A more interesting situation is this (a Plan-B situation)
164 default -> !R!; ...R...
166 where !x! indicates a possible heap-check point. The heap checks
167 in the alternatives *can* be omitted, in which case the topmost
168 heapcheck will take their worst case into account.
170 In favour of omitting !Q!, !R!:
172 - *May* save a heap overflow test,
173 if ...P... allocates anything.
175 - We can use relative addressing from a single Hp to
176 get at all the closures so allocated.
178 - No need to save volatile vars etc across heap checks
181 Against omitting !Q!, !R!
183 - May put a heap-check into the inner loop. Suppose
184 the main loop is P -> R -> P -> R...
185 Q is the loop exit, and only it does allocation.
186 This only hurts us if P does no allocation. If P allocates,
187 then there is a heap check in the inner loop anyway.
189 - May do more allocation than reqd. This sometimes bites us
190 badly. For example, nfib (ha!) allocates about 30\% more space if the
191 worst-casing is done, because many many calls to nfib are leaf calls
192 which don't need to allocate anything.
194 We can un-allocate, but that costs an instruction
196 Neither problem hurts us if there is only one alternative.
198 Suppose the inner loop is P->R->P->R etc. Then here is
199 how many heap checks we get in the *inner loop* under various
202 Alooc Heap check in branches (!Q!, !R!)?
203 P Q R yes no (absorb to !P!)
204 --------------------------------------
211 Best choices: absorb heap checks from Q and R into !P! iff
212 a) P itself does some allocation
214 b) P does allocation, or there is exactly one alternative
216 We adopt (b) because that is more likely to put the heap check at the
217 entry to a function, when not many things are live. After a bunch of
218 single-branch cases, we may have lots of things live
220 Hence: two basic plans for
224 ------ Plan A: the general case ---------
226 ...save current cost centre...
229 with sequel (SetLocals r)
231 ...restore current cost centre...
233 ...alts do their own heap checks
235 ------ Plan B: special case when ---------
236 (i) e does not allocate or call GC
237 (ii) either upstream code performs allocation
238 or there is just one alternative
240 Then heap allocation in the (single) case branch
241 is absorbed by the upstream check.
242 Very common example: primops on unboxed values
245 with sequel (SetLocals r)...
253 -------------------------------------
255 = GcInAlts -- Put a GC check at the start the case alternatives,
256 [LocalReg] -- which binds these registers
257 SRT -- using this SRT
258 | NoGcInAlts -- The scrutinee is a primitive value, or a call to a
259 -- primitive op which does no GC. Absorb the allocation
260 -- of the case alternative(s) into the upstream check
262 -------------------------------------
263 -- See Note [case on Bool]
264 cgCase :: StgExpr -> Id -> SRT -> AltType -> [StgAlt] -> FCode ()
266 cgCase (OpApp ) bndr srt AlgAlt [(DataAlt flase, a2]
267 | isBoolTy (idType bndr)
272 cgCase scrut bndr srt alt_type alts
273 = do { up_hp_usg <- getVirtHp -- Upstream heap usage
274 ; let ret_bndrs = chooseReturnBndrs bndr alt_type alts
275 alt_regs = map idToReg ret_bndrs
276 simple_scrut = isSimpleScrut scrut alt_type
277 gcInAlts | not simple_scrut = True
278 | isSingleton alts = False
279 | up_hp_usg > 0 = False
281 gc_plan = if gcInAlts then GcInAlts alt_regs srt else NoGcInAlts
283 ; mb_cc <- maybeSaveCostCentre simple_scrut
284 ; withSequel (AssignTo alt_regs gcInAlts) (cgExpr scrut)
285 ; restoreCurrentCostCentre mb_cc
287 -- JD: We need Note: [Better Alt Heap Checks]
288 ; bindArgsToRegs ret_bndrs
289 ; cgAlts gc_plan (NonVoid bndr) alt_type alts }
292 maybeSaveCostCentre :: Bool -> FCode (Maybe LocalReg)
293 maybeSaveCostCentre simple_scrut
294 | simple_scrut = saveCurrentCostCentre
295 | otherwise = return Nothing
299 isSimpleScrut :: StgExpr -> AltType -> Bool
300 -- Simple scrutinee, does not block or allocate; hence safe to amalgamate
301 -- heap usage from alternatives into the stuff before the case
302 -- NB: if you get this wrong, and claim that the expression doesn't allocate
303 -- when it does, you'll deeply mess up allocation
304 isSimpleScrut (StgOpApp op _ _) _ = isSimpleOp op
305 isSimpleScrut (StgLit _) _ = True -- case 1# of { 0# -> ..; ... }
306 isSimpleScrut (StgApp _ []) (PrimAlt _) = True -- case x# of { 0# -> ..; ... }
307 isSimpleScrut _ _ = False
309 isSimpleOp :: StgOp -> Bool
310 -- True iff the op cannot block or allocate
311 isSimpleOp (StgFCallOp (CCall (CCallSpec _ _ safe)) _) = not (playSafe safe)
312 isSimpleOp (StgFCallOp (DNCall _) _) = False -- Safe!
313 isSimpleOp (StgPrimOp op) = not (primOpOutOfLine op)
316 chooseReturnBndrs :: Id -> AltType -> [StgAlt] -> [NonVoid Id]
317 -- These are the binders of a case that are assigned
318 -- by the evaluation of the scrutinee
319 -- Only non-void ones come back
320 chooseReturnBndrs bndr (PrimAlt _) _alts
323 chooseReturnBndrs _bndr (UbxTupAlt _) [(_, ids, _, _)]
324 = nonVoidIds ids -- 'bndr' is not assigned!
326 chooseReturnBndrs bndr (AlgAlt _) _alts
327 = nonVoidIds [bndr] -- Only 'bndr' is assigned
329 chooseReturnBndrs bndr PolyAlt _alts
330 = nonVoidIds [bndr] -- Only 'bndr' is assigned
332 chooseReturnBndrs _ _ _ = panic "chooseReturnBndrs"
333 -- UbxTupALt has only one alternative
335 -------------------------------------
336 cgAlts :: GcPlan -> NonVoid Id -> AltType -> [StgAlt] -> FCode ()
337 -- At this point the result of the case are in the binders
338 cgAlts gc_plan _bndr PolyAlt [(_, _, _, rhs)]
339 = maybeAltHeapCheck gc_plan (cgExpr rhs)
341 cgAlts gc_plan _bndr (UbxTupAlt _) [(_, _, _, rhs)]
342 = maybeAltHeapCheck gc_plan (cgExpr rhs)
343 -- Here bndrs are *already* in scope, so don't rebind them
345 cgAlts gc_plan bndr (PrimAlt _) alts
346 = do { tagged_cmms <- cgAltRhss gc_plan bndr alts
348 ; let bndr_reg = CmmLocal (idToReg bndr)
349 (DEFAULT,deflt) = head tagged_cmms
350 -- PrimAlts always have a DEFAULT case
351 -- and it always comes first
353 tagged_cmms' = [(lit,code)
354 | (LitAlt lit, code) <- tagged_cmms]
355 ; emit (mkCmmLitSwitch (CmmReg bndr_reg) tagged_cmms' deflt) }
357 cgAlts gc_plan bndr (AlgAlt tycon) alts
358 = do { tagged_cmms <- cgAltRhss gc_plan bndr alts
360 ; let fam_sz = tyConFamilySize tycon
361 bndr_reg = CmmLocal (idToReg bndr)
362 mb_deflt = case tagged_cmms of
363 ((DEFAULT,rhs) : _) -> Just rhs
365 -- DEFAULT is always first, if present
367 branches = [ (dataConTagZ con, cmm)
368 | (DataAlt con, cmm) <- tagged_cmms ]
370 -- Is the constructor tag in the node reg?
371 ; if isSmallFamily fam_sz
372 then let -- Yes, bndr_reg has constr. tag in ls bits
373 tag_expr = cmmConstrTag1 (CmmReg bndr_reg)
374 branches' = [(tag+1,branch) | (tag,branch) <- branches]
376 emitSwitch tag_expr branches' mb_deflt 1 fam_sz
378 else -- No, get tag from info table
379 let -- Note that ptr _always_ has tag 1
380 -- when the family size is big enough
381 untagged_ptr = cmmRegOffB bndr_reg (-1)
382 tag_expr = getConstrTag (untagged_ptr)
384 emitSwitch tag_expr branches mb_deflt 0 (fam_sz - 1) }
386 cgAlts _ _ _ _ = panic "cgAlts"
387 -- UbxTupAlt and PolyAlt have only one alternative
390 cgAltRhss :: GcPlan -> NonVoid Id -> [StgAlt] -> FCode [(AltCon, CmmAGraph)]
391 cgAltRhss gc_plan bndr alts
392 = forkAlts (map cg_alt alts)
394 base_reg = idToReg bndr
395 cg_alt :: StgAlt -> FCode (AltCon, CmmAGraph)
396 cg_alt (con, bndrs, _uses, rhs)
398 maybeAltHeapCheck gc_plan $
399 do { bindConArgs con base_reg bndrs
403 maybeAltHeapCheck :: GcPlan -> FCode a -> FCode a
404 maybeAltHeapCheck NoGcInAlts code
406 maybeAltHeapCheck (GcInAlts regs _) code
407 = altHeapCheck regs code
409 -----------------------------------------------------------------------------
411 -----------------------------------------------------------------------------
413 cgConApp :: DataCon -> [StgArg] -> FCode ()
414 cgConApp con stg_args
415 | isUnboxedTupleCon con -- Unboxed tuple: assign and return
416 = do { arg_exprs <- getNonVoidArgAmodes stg_args
417 ; tickyUnboxedTupleReturn (length arg_exprs)
418 ; emitReturn arg_exprs }
420 | otherwise -- Boxed constructors; allocate and return
421 = ASSERT( stg_args `lengthIs` dataConRepArity con )
422 do { (idinfo, init) <- buildDynCon (dataConWorkId con) currentCCS con stg_args
423 -- The first "con" says that the name bound to this closure is
424 -- is "con", which is a bit of a fudge, but it only affects profiling
427 ; emitReturn [idInfoToAmode idinfo] }
430 cgIdApp :: Id -> [StgArg] -> FCode ()
431 cgIdApp fun_id [] | isVoidId fun_id = emitReturn []
433 = do { fun_info <- getCgIdInfo fun_id
434 ; case maybeLetNoEscape fun_info of
435 Just (blk_id, lne_regs) -> cgLneJump blk_id lne_regs args
436 Nothing -> cgTailCall fun_id fun_info args }
438 cgLneJump :: BlockId -> [LocalReg] -> [StgArg] -> FCode ()
439 cgLneJump blk_id lne_regs args -- Join point; discard sequel
440 = do { cmm_args <- getNonVoidArgAmodes args
441 ; emit (mkMultiAssign lne_regs cmm_args
442 <*> mkBranch blk_id) }
444 cgTailCall :: Id -> CgIdInfo -> [StgArg] -> FCode ()
445 cgTailCall fun_id fun_info args = do
446 dflags <- getDynFlags
447 case (getCallMethod dflags fun_name (idCafInfo fun_id) lf_info (length args)) of
449 -- A value in WHNF, so we can just return it.
450 ReturnIt -> emitReturn [fun] -- ToDo: does ReturnIt guarantee tagged?
452 EnterIt -> ASSERT( null args ) -- Discarding arguments
453 do { let fun' = CmmLoad fun (cmmExprType fun)
454 ; [ret,call] <- forkAlts [
455 getCode $ emitReturn [fun], -- Is tagged; no need to untag
456 getCode $ do emit (mkAssign nodeReg fun)
457 emitCall Native (entryCode fun') []] -- Not tagged
458 ; emit (mkCmmIfThenElse (cmmIsTagged fun) ret call) }
460 SlowCall -> do -- A slow function call via the RTS apply routines
461 { tickySlowCall lf_info args
462 ; emit $ mkComment $ mkFastString "slowCall"
463 ; slowCall fun args }
465 -- A direct function call (possibly with some left-over arguments)
466 DirectEntry lbl arity -> do
467 { tickyDirectCall arity args
468 ; if node_points then
469 do emit $ mkComment $ mkFastString "directEntry"
470 emit (mkAssign nodeReg fun)
471 directCall lbl arity args
472 -- directCall lbl (arity+1) (StgVarArg fun_id : args))
473 -- >>= (emit . (mkComment (mkFastString "DirectEntry") <*>))
474 else do emit $ mkComment $ mkFastString "directEntry else"
475 directCall lbl arity args }
477 JumpToIt {} -> panic "cgTailCall" -- ???
480 fun_name = idName fun_id
481 fun = idInfoToAmode fun_info
482 lf_info = cgIdInfoLF fun_info
483 node_points = nodeMustPointToIt lf_info
486 {- Note [case on Bool]
488 A case on a Boolean value does two things:
489 1. It looks up the Boolean in a closure table and assigns the
490 result to the binder.
491 2. It branches to the True or False case through analysis
492 of the closure assigned to the binder.
493 But the indirection through the closure table is unnecessary
494 if the assignment to the binder will be dead code (use isDeadBndr).
496 The following example illustrates how badly the code turns out:
498 case <=## [ww_s7Hx y_s7HD] of wild2_sbH8 {
499 GHC.Bool.False -> <true code> // sbH8 dead
500 GHC.Bool.True -> <false code> // sbH8 dead
503 _s7HD::F64 = F64[_sbH7::I64 + 7]; // MidAssign
504 _ccsW::I64 = %MO_F_Le_W64(_s7Hx::F64, _s7HD::F64); // MidAssign
505 // emitReturn // MidComment
506 _sbH8::I64 = I64[ghczmprim_GHCziBool_Bool_closure_tbl + (_ccsW::I64 << 3)]; // MidAssign
507 _ccsX::I64 = _sbH8::I64 & 7; // MidAssign
508 if (_ccsX::I64 >= 2) goto ccsH; else goto ccsI; // LastCondBranch
510 The assignments to _sbH8 and _ccsX are completely unnecessary.
511 Instead, we should branch based on the value of _ccsW.
514 {- Note [Better Alt Heap Checks]
515 If two function calls can share a return point, then they will also
516 get the same info table. Therefore, it's worth our effort to make
517 those opportunities appear as frequently as possible.
519 Here are a few examples of how it should work:
523 True -> <True code -- including allocation>
524 False -> <False code>
526 r = call f(x) returns to L;
528 if r & 7 >= 2 goto L1 else goto L2;
533 <True code -- including allocation>
536 Note that the code following both the call to f(x) and the code to gc(r)
537 should be the same, which will allow the common blockifier to discover
538 that they are the same. Therefore, both function calls will return to the same
539 block, and they will use the same info table.
541 Here's an example of the Cmm code we want from a primOp.
542 The primOp doesn't produce an info table for us to reuse, but that's okay:
543 we should still generate the same code:
546 0 -> <0-case code -- including allocation>
547 _ -> <default-case code>
551 if r == 0 then goto L1 else goto L2;
556 <0-case code -- including allocation>