1 -----------------------------------------------------------------------------
3 -- Stg to C-- code generation: expressions
5 -- (c) The University of Glasgow 2004-2006
7 -----------------------------------------------------------------------------
9 module StgCmmExpr ( cgExpr ) where
11 #define FAST_STRING_NOT_NEEDED
12 #include "HsVersions.h"
14 import {-# SOURCE #-} StgCmmBind ( cgBind )
40 import CostCentre ( CostCentreStack, currentCCS )
47 ------------------------------------------------------------------------
48 -- cgExpr: the main function
49 ------------------------------------------------------------------------
51 cgExpr :: StgExpr -> FCode ()
53 cgExpr (StgApp fun args) = cgIdApp fun args
54 cgExpr (StgOpApp op args ty) = cgOpApp op args ty
55 cgExpr (StgConApp con args) = cgConApp con args
56 cgExpr (StgSCC cc expr) = do { emitSetCCC cc; cgExpr expr }
57 cgExpr (StgTick m n expr) = do { emit (mkTickBox m n); cgExpr expr }
58 cgExpr (StgLit lit) = do cmm_lit <- cgLit lit
59 emitReturn [CmmLit cmm_lit]
61 cgExpr (StgLet binds expr) = do { cgBind binds; cgExpr expr }
62 cgExpr (StgLetNoEscape _ _ binds expr) =
63 do { us <- newUniqSupply
64 ; let join_id = mkBlockId (uniqFromSupply us)
65 ; cgLneBinds join_id binds
67 ; emit $ mkLabel join_id}
69 cgExpr (StgCase expr _live_vars _save_vars bndr srt alt_type alts) =
70 cgCase expr bndr srt alt_type alts
72 cgExpr (StgLam {}) = panic "cgExpr: StgLam"
74 ------------------------------------------------------------------------
76 ------------------------------------------------------------------------
78 {- Generating code for a let-no-escape binding, aka join point is very
79 very similar to what we do for a case expression. The duality is
86 That is, the RHS of 'x' (ie 'b') will execute *later*, just like
87 the alternative of the case; it needs to be compiled in an environment
88 in which all volatile bindings are forgotten, and the free vars are
89 bound only to stable things like stack locations.. The 'e' part will
90 execute *next*, just like the scrutinee of a case. -}
92 -------------------------
93 cgLneBinds :: BlockId -> StgBinding -> FCode ()
94 cgLneBinds join_id (StgNonRec bndr rhs)
95 = do { local_cc <- saveCurrentCostCentre
96 -- See Note [Saving the current cost centre]
97 ; info <- cgLetNoEscapeRhs join_id local_cc bndr rhs
98 ; addBindC (cg_id info) info }
100 cgLneBinds join_id (StgRec pairs)
101 = do { local_cc <- saveCurrentCostCentre
102 ; new_bindings <- fixC (\ new_bindings -> do
103 { addBindsC new_bindings
104 ; listFCs [ cgLetNoEscapeRhs join_id local_cc b e
105 | (b,e) <- pairs ] })
106 ; addBindsC new_bindings }
109 -------------------------
111 :: BlockId -- join point for successor of let-no-escape
112 -> Maybe LocalReg -- Saved cost centre
117 cgLetNoEscapeRhs join_id local_cc bndr rhs =
118 do { (info, rhs_body) <- getCodeR $ cgLetNoEscapeRhsBody local_cc bndr rhs
119 ; let (bid, _) = expectJust "cgLetNoEscapeRhs" $ maybeLetNoEscape info
120 ; emit (outOfLine $ mkLabel bid <*> rhs_body <*> mkBranch join_id)
125 :: Maybe LocalReg -- Saved cost centre
129 cgLetNoEscapeRhsBody local_cc bndr (StgRhsClosure cc _bi _ _upd _ args body)
130 = cgLetNoEscapeClosure bndr local_cc cc (nonVoidIds args) body
131 cgLetNoEscapeRhsBody local_cc bndr (StgRhsCon cc con args)
132 = cgLetNoEscapeClosure bndr local_cc cc [] (StgConApp con args)
133 -- For a constructor RHS we want to generate a single chunk of
134 -- code which can be jumped to from many places, which will
135 -- return the constructor. It's easy; just behave as if it
136 -- was an StgRhsClosure with a ConApp inside!
138 -------------------------
141 -> Maybe LocalReg -- Slot for saved current cost centre
142 -> CostCentreStack -- XXX: *** NOT USED *** why not?
143 -> [NonVoid Id] -- Args (as in \ args -> body)
144 -> StgExpr -- Body (as in above)
147 cgLetNoEscapeClosure bndr cc_slot _unused_cc args body
148 = do { arg_regs <- forkProc $ do
149 { restoreCurrentCostCentre cc_slot
150 ; arg_regs <- bindArgsToRegs args
151 ; altHeapCheck arg_regs (cgExpr body)
152 -- Using altHeapCheck just reduces
153 -- instructions to save on stack
155 ; return $ lneIdInfo bndr arg_regs}
158 ------------------------------------------------------------------------
160 ------------------------------------------------------------------------
162 {- Note [Compiling case expressions]
163 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
164 It is quite interesting to decide whether to put a heap-check at the
165 start of each alternative. Of course we certainly have to do so if
166 the case forces an evaluation, or if there is a primitive op which can
169 A more interesting situation is this (a Plan-B situation)
175 default -> !R!; ...R...
177 where !x! indicates a possible heap-check point. The heap checks
178 in the alternatives *can* be omitted, in which case the topmost
179 heapcheck will take their worst case into account.
181 In favour of omitting !Q!, !R!:
183 - *May* save a heap overflow test,
184 if ...P... allocates anything.
186 - We can use relative addressing from a single Hp to
187 get at all the closures so allocated.
189 - No need to save volatile vars etc across heap checks
192 Against omitting !Q!, !R!
194 - May put a heap-check into the inner loop. Suppose
195 the main loop is P -> R -> P -> R...
196 Q is the loop exit, and only it does allocation.
197 This only hurts us if P does no allocation. If P allocates,
198 then there is a heap check in the inner loop anyway.
200 - May do more allocation than reqd. This sometimes bites us
201 badly. For example, nfib (ha!) allocates about 30\% more space if the
202 worst-casing is done, because many many calls to nfib are leaf calls
203 which don't need to allocate anything.
205 We can un-allocate, but that costs an instruction
207 Neither problem hurts us if there is only one alternative.
209 Suppose the inner loop is P->R->P->R etc. Then here is
210 how many heap checks we get in the *inner loop* under various
213 Alooc Heap check in branches (!Q!, !R!)?
214 P Q R yes no (absorb to !P!)
215 --------------------------------------
222 Best choices: absorb heap checks from Q and R into !P! iff
223 a) P itself does some allocation
225 b) P does allocation, or there is exactly one alternative
227 We adopt (b) because that is more likely to put the heap check at the
228 entry to a function, when not many things are live. After a bunch of
229 single-branch cases, we may have lots of things live
231 Hence: two basic plans for
235 ------ Plan A: the general case ---------
237 ...save current cost centre...
240 with sequel (SetLocals r)
242 ...restore current cost centre...
244 ...alts do their own heap checks
246 ------ Plan B: special case when ---------
247 (i) e does not allocate or call GC
248 (ii) either upstream code performs allocation
249 or there is just one alternative
251 Then heap allocation in the (single) case branch
252 is absorbed by the upstream check.
253 Very common example: primops on unboxed values
256 with sequel (SetLocals r)...
264 -------------------------------------
266 = GcInAlts -- Put a GC check at the start the case alternatives,
267 [LocalReg] -- which binds these registers
268 SRT -- using this SRT
269 | NoGcInAlts -- The scrutinee is a primitive value, or a call to a
270 -- primitive op which does no GC. Absorb the allocation
271 -- of the case alternative(s) into the upstream check
273 -------------------------------------
274 -- See Note [case on Bool]
275 cgCase :: StgExpr -> Id -> SRT -> AltType -> [StgAlt] -> FCode ()
277 cgCase (OpApp ) bndr srt AlgAlt [(DataAlt flase, a2]
278 | isBoolTy (idType bndr)
283 cgCase scrut bndr srt alt_type alts
284 = do { up_hp_usg <- getVirtHp -- Upstream heap usage
285 ; let ret_bndrs = chooseReturnBndrs bndr alt_type alts
286 alt_regs = map idToReg ret_bndrs
287 simple_scrut = isSimpleScrut scrut alt_type
288 gcInAlts | not simple_scrut = True
289 | isSingleton alts = False
290 | up_hp_usg > 0 = False
292 gc_plan = if gcInAlts then GcInAlts alt_regs srt else NoGcInAlts
294 ; mb_cc <- maybeSaveCostCentre simple_scrut
295 ; withSequel (AssignTo alt_regs gcInAlts) (cgExpr scrut)
296 ; restoreCurrentCostCentre mb_cc
298 -- JD: We need Note: [Better Alt Heap Checks]
299 ; bindArgsToRegs ret_bndrs
300 ; cgAlts gc_plan (NonVoid bndr) alt_type alts }
303 maybeSaveCostCentre :: Bool -> FCode (Maybe LocalReg)
304 maybeSaveCostCentre simple_scrut
305 | simple_scrut = saveCurrentCostCentre
306 | otherwise = return Nothing
310 isSimpleScrut :: StgExpr -> AltType -> Bool
311 -- Simple scrutinee, does not block or allocate; hence safe to amalgamate
312 -- heap usage from alternatives into the stuff before the case
313 -- NB: if you get this wrong, and claim that the expression doesn't allocate
314 -- when it does, you'll deeply mess up allocation
315 isSimpleScrut (StgOpApp op _ _) _ = isSimpleOp op
316 isSimpleScrut (StgLit _) _ = True -- case 1# of { 0# -> ..; ... }
317 isSimpleScrut (StgApp _ []) (PrimAlt _) = True -- case x# of { 0# -> ..; ... }
318 isSimpleScrut _ _ = False
320 isSimpleOp :: StgOp -> Bool
321 -- True iff the op cannot block or allocate
322 isSimpleOp (StgFCallOp (CCall (CCallSpec _ _ safe)) _) = not (playSafe safe)
323 isSimpleOp (StgFCallOp (DNCall _) _) = False -- Safe!
324 isSimpleOp (StgPrimOp op) = not (primOpOutOfLine op)
327 chooseReturnBndrs :: Id -> AltType -> [StgAlt] -> [NonVoid Id]
328 -- These are the binders of a case that are assigned
329 -- by the evaluation of the scrutinee
330 -- Only non-void ones come back
331 chooseReturnBndrs bndr (PrimAlt _) _alts
334 chooseReturnBndrs _bndr (UbxTupAlt _) [(_, ids, _, _)]
335 = nonVoidIds ids -- 'bndr' is not assigned!
337 chooseReturnBndrs bndr (AlgAlt _) _alts
338 = nonVoidIds [bndr] -- Only 'bndr' is assigned
340 chooseReturnBndrs bndr PolyAlt _alts
341 = nonVoidIds [bndr] -- Only 'bndr' is assigned
343 chooseReturnBndrs _ _ _ = panic "chooseReturnBndrs"
344 -- UbxTupALt has only one alternative
346 -------------------------------------
347 cgAlts :: GcPlan -> NonVoid Id -> AltType -> [StgAlt] -> FCode ()
348 -- At this point the result of the case are in the binders
349 cgAlts gc_plan _bndr PolyAlt [(_, _, _, rhs)]
350 = maybeAltHeapCheck gc_plan (cgExpr rhs)
352 cgAlts gc_plan _bndr (UbxTupAlt _) [(_, _, _, rhs)]
353 = maybeAltHeapCheck gc_plan (cgExpr rhs)
354 -- Here bndrs are *already* in scope, so don't rebind them
356 cgAlts gc_plan bndr (PrimAlt _) alts
357 = do { tagged_cmms <- cgAltRhss gc_plan bndr alts
359 ; let bndr_reg = CmmLocal (idToReg bndr)
360 (DEFAULT,deflt) = head tagged_cmms
361 -- PrimAlts always have a DEFAULT case
362 -- and it always comes first
364 tagged_cmms' = [(lit,code)
365 | (LitAlt lit, code) <- tagged_cmms]
366 ; emit (mkCmmLitSwitch (CmmReg bndr_reg) tagged_cmms' deflt) }
368 cgAlts gc_plan bndr (AlgAlt tycon) alts
369 = do { tagged_cmms <- cgAltRhss gc_plan bndr alts
371 ; let fam_sz = tyConFamilySize tycon
372 bndr_reg = CmmLocal (idToReg bndr)
373 mb_deflt = case tagged_cmms of
374 ((DEFAULT,rhs) : _) -> Just rhs
376 -- DEFAULT is always first, if present
378 branches = [ (dataConTagZ con, cmm)
379 | (DataAlt con, cmm) <- tagged_cmms ]
381 -- Is the constructor tag in the node reg?
382 ; if isSmallFamily fam_sz
383 then let -- Yes, bndr_reg has constr. tag in ls bits
384 tag_expr = cmmConstrTag1 (CmmReg bndr_reg)
385 branches' = [(tag+1,branch) | (tag,branch) <- branches]
387 emitSwitch tag_expr branches' mb_deflt 1 fam_sz
389 else -- No, get tag from info table
390 let -- Note that ptr _always_ has tag 1
391 -- when the family size is big enough
392 untagged_ptr = cmmRegOffB bndr_reg (-1)
393 tag_expr = getConstrTag (untagged_ptr)
395 emitSwitch tag_expr branches mb_deflt 0 (fam_sz - 1) }
397 cgAlts _ _ _ _ = panic "cgAlts"
398 -- UbxTupAlt and PolyAlt have only one alternative
401 cgAltRhss :: GcPlan -> NonVoid Id -> [StgAlt] -> FCode [(AltCon, CmmAGraph)]
402 cgAltRhss gc_plan bndr alts
403 = forkAlts (map cg_alt alts)
405 base_reg = idToReg bndr
406 cg_alt :: StgAlt -> FCode (AltCon, CmmAGraph)
407 cg_alt (con, bndrs, _uses, rhs)
409 maybeAltHeapCheck gc_plan $
410 do { bindConArgs con base_reg bndrs
414 maybeAltHeapCheck :: GcPlan -> FCode a -> FCode a
415 maybeAltHeapCheck NoGcInAlts code
417 maybeAltHeapCheck (GcInAlts regs _) code
418 = altHeapCheck regs code
420 -----------------------------------------------------------------------------
422 -----------------------------------------------------------------------------
424 cgConApp :: DataCon -> [StgArg] -> FCode ()
425 cgConApp con stg_args
426 | isUnboxedTupleCon con -- Unboxed tuple: assign and return
427 = do { arg_exprs <- getNonVoidArgAmodes stg_args
428 ; tickyUnboxedTupleReturn (length arg_exprs)
429 ; emitReturn arg_exprs }
431 | otherwise -- Boxed constructors; allocate and return
432 = ASSERT( stg_args `lengthIs` dataConRepArity con )
433 do { (idinfo, init) <- buildDynCon (dataConWorkId con) currentCCS con stg_args
434 -- The first "con" says that the name bound to this closure is
435 -- is "con", which is a bit of a fudge, but it only affects profiling
438 ; emitReturn [idInfoToAmode idinfo] }
441 cgIdApp :: Id -> [StgArg] -> FCode ()
442 cgIdApp fun_id [] | isVoidId fun_id = emitReturn []
444 = do { fun_info <- getCgIdInfo fun_id
445 ; case maybeLetNoEscape fun_info of
446 Just (blk_id, lne_regs) -> cgLneJump blk_id lne_regs args
447 Nothing -> cgTailCall fun_id fun_info args }
449 cgLneJump :: BlockId -> [LocalReg] -> [StgArg] -> FCode ()
450 cgLneJump blk_id lne_regs args -- Join point; discard sequel
451 = do { cmm_args <- getNonVoidArgAmodes args
452 ; emit (mkMultiAssign lne_regs cmm_args
453 <*> mkBranch blk_id) }
455 cgTailCall :: Id -> CgIdInfo -> [StgArg] -> FCode ()
456 cgTailCall fun_id fun_info args = do
457 dflags <- getDynFlags
458 case (getCallMethod dflags fun_name (idCafInfo fun_id) lf_info (length args)) of
460 -- A value in WHNF, so we can just return it.
461 ReturnIt -> emitReturn [fun] -- ToDo: does ReturnIt guarantee tagged?
463 EnterIt -> ASSERT( null args ) -- Discarding arguments
464 do { let fun' = CmmLoad fun (cmmExprType fun)
465 ; [ret,call] <- forkAlts [
466 getCode $ emitReturn [fun], -- Is tagged; no need to untag
467 getCode $ do -- emit (mkAssign nodeReg fun)
468 emitCall (NativeNodeCall, NativeReturn)
469 (entryCode fun') [fun]] -- Not tagged
470 ; emit (mkCmmIfThenElse (cmmIsTagged fun) ret call) }
472 SlowCall -> do -- A slow function call via the RTS apply routines
473 { tickySlowCall lf_info args
474 ; emit $ mkComment $ mkFastString "slowCall"
475 ; slowCall fun args }
477 -- A direct function call (possibly with some left-over arguments)
478 DirectEntry lbl arity -> do
479 { tickyDirectCall arity args
480 ; if node_points then
481 do emit $ mkComment $ mkFastString "directEntry"
482 emit (mkAssign nodeReg fun)
483 directCall lbl arity args
484 else do emit $ mkComment $ mkFastString "directEntry else"
485 directCall lbl arity args }
487 JumpToIt {} -> panic "cgTailCall" -- ???
490 fun_name = idName fun_id
491 fun = idInfoToAmode fun_info
492 lf_info = cgIdInfoLF fun_info
493 node_points = nodeMustPointToIt lf_info
496 {- Note [case on Bool]
498 A case on a Boolean value does two things:
499 1. It looks up the Boolean in a closure table and assigns the
500 result to the binder.
501 2. It branches to the True or False case through analysis
502 of the closure assigned to the binder.
503 But the indirection through the closure table is unnecessary
504 if the assignment to the binder will be dead code (use isDeadBndr).
506 The following example illustrates how badly the code turns out:
508 case <=## [ww_s7Hx y_s7HD] of wild2_sbH8 {
509 GHC.Bool.False -> <true code> // sbH8 dead
510 GHC.Bool.True -> <false code> // sbH8 dead
513 _s7HD::F64 = F64[_sbH7::I64 + 7]; // MidAssign
514 _ccsW::I64 = %MO_F_Le_W64(_s7Hx::F64, _s7HD::F64); // MidAssign
515 // emitReturn // MidComment
516 _sbH8::I64 = I64[ghczmprim_GHCziBool_Bool_closure_tbl + (_ccsW::I64 << 3)]; // MidAssign
517 _ccsX::I64 = _sbH8::I64 & 7; // MidAssign
518 if (_ccsX::I64 >= 2) goto ccsH; else goto ccsI; // LastCondBranch
520 The assignments to _sbH8 and _ccsX are completely unnecessary.
521 Instead, we should branch based on the value of _ccsW.
524 {- Note [Better Alt Heap Checks]
525 If two function calls can share a return point, then they will also
526 get the same info table. Therefore, it's worth our effort to make
527 those opportunities appear as frequently as possible.
529 Here are a few examples of how it should work:
533 True -> <True code -- including allocation>
534 False -> <False code>
536 r = call f(x) returns to L;
538 if r & 7 >= 2 goto L1 else goto L2;
543 <True code -- including allocation>
546 Note that the code following both the call to f(x) and the code to gc(r)
547 should be the same, which will allow the common blockifier to discover
548 that they are the same. Therefore, both function calls will return to the same
549 block, and they will use the same info table.
551 Here's an example of the Cmm code we want from a primOp.
552 The primOp doesn't produce an info table for us to reuse, but that's okay:
553 we should still generate the same code:
556 0 -> <0-case code -- including allocation>
557 _ -> <default-case code>
561 if r == 0 then goto L1 else goto L2;
566 <0-case code -- including allocation>