1 -----------------------------------------------------------------------------
3 -- Stg to C-- code generation: expressions
5 -- (c) The University of Glasgow 2004-2006
7 -----------------------------------------------------------------------------
9 module StgCmmExpr ( cgExpr ) where
11 #define FAST_STRING_NOT_NEEDED
12 #include "HsVersions.h"
14 import {-# SOURCE #-} StgCmmBind ( cgBind )
40 import CostCentre ( CostCentreStack, currentCCS )
47 ------------------------------------------------------------------------
48 -- cgExpr: the main function
49 ------------------------------------------------------------------------
51 cgExpr :: StgExpr -> FCode ()
53 cgExpr (StgApp fun args) = cgIdApp fun args
54 cgExpr (StgOpApp op args ty) = cgOpApp op args ty
55 cgExpr (StgConApp con args) = cgConApp con args
56 cgExpr (StgSCC cc expr) = do { emitSetCCC cc; cgExpr expr }
57 cgExpr (StgTick m n expr) = do { emit (mkTickBox m n); cgExpr expr }
58 cgExpr (StgLit lit) = do cmm_lit <- cgLit lit
59 emitReturn [CmmLit cmm_lit]
61 cgExpr (StgLet binds expr) = do { cgBind binds; cgExpr expr }
62 cgExpr (StgLetNoEscape _ _ binds expr) =
63 do { us <- newUniqSupply
64 ; let join_id = mkBlockId (uniqFromSupply us)
65 ; cgLneBinds join_id binds
67 ; emit $ mkLabel join_id}
69 cgExpr (StgCase expr _live_vars _save_vars bndr srt alt_type alts) =
70 cgCase expr bndr srt alt_type alts
72 cgExpr (StgLam {}) = panic "cgExpr: StgLam"
74 ------------------------------------------------------------------------
76 ------------------------------------------------------------------------
78 {- Generating code for a let-no-escape binding, aka join point is very
79 very similar to what we do for a case expression. The duality is
86 That is, the RHS of 'x' (ie 'b') will execute *later*, just like
87 the alternative of the case; it needs to be compiled in an environment
88 in which all volatile bindings are forgotten, and the free vars are
89 bound only to stable things like stack locations.. The 'e' part will
90 execute *next*, just like the scrutinee of a case. -}
92 -------------------------
93 cgLneBinds :: BlockId -> StgBinding -> FCode ()
94 cgLneBinds join_id (StgNonRec bndr rhs)
95 = do { local_cc <- saveCurrentCostCentre
96 -- See Note [Saving the current cost centre]
97 ; info <- cgLetNoEscapeRhs join_id local_cc bndr rhs
98 ; addBindC (cg_id info) info }
100 cgLneBinds join_id (StgRec pairs)
101 = do { local_cc <- saveCurrentCostCentre
102 ; new_bindings <- fixC (\ new_bindings -> do
103 { addBindsC new_bindings
104 ; listFCs [ cgLetNoEscapeRhs join_id local_cc b e
105 | (b,e) <- pairs ] })
106 ; addBindsC new_bindings }
109 -------------------------
111 :: BlockId -- join point for successor of let-no-escape
112 -> Maybe LocalReg -- Saved cost centre
117 cgLetNoEscapeRhs join_id local_cc bndr rhs =
118 do { (info, rhs_body) <- getCodeR $ cgLetNoEscapeRhsBody local_cc bndr rhs
119 ; let (bid, _) = expectJust "cgLetNoEscapeRhs" $ maybeLetNoEscape info
120 ; emit (outOfLine $ mkLabel bid <*> rhs_body <*> mkBranch join_id)
125 :: Maybe LocalReg -- Saved cost centre
129 cgLetNoEscapeRhsBody local_cc bndr (StgRhsClosure cc _bi _ _upd _ args body)
130 = cgLetNoEscapeClosure bndr local_cc cc (nonVoidIds args) body
131 cgLetNoEscapeRhsBody local_cc bndr (StgRhsCon cc con args)
132 = cgLetNoEscapeClosure bndr local_cc cc [] (StgConApp con args)
133 -- For a constructor RHS we want to generate a single chunk of
134 -- code which can be jumped to from many places, which will
135 -- return the constructor. It's easy; just behave as if it
136 -- was an StgRhsClosure with a ConApp inside!
138 -------------------------
141 -> Maybe LocalReg -- Slot for saved current cost centre
142 -> CostCentreStack -- XXX: *** NOT USED *** why not?
143 -> [NonVoid Id] -- Args (as in \ args -> body)
144 -> StgExpr -- Body (as in above)
147 cgLetNoEscapeClosure bndr cc_slot _unused_cc args body
148 = do { arg_regs <- forkProc $ do
149 { restoreCurrentCostCentre cc_slot
150 ; arg_regs <- bindArgsToRegs args
151 ; altHeapCheck arg_regs (cgExpr body)
152 -- Using altHeapCheck just reduces
153 -- instructions to save on stack
155 ; return $ lneIdInfo bndr arg_regs}
158 ------------------------------------------------------------------------
160 ------------------------------------------------------------------------
162 {- Note [Compiling case expressions]
163 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
164 It is quite interesting to decide whether to put a heap-check at the
165 start of each alternative. Of course we certainly have to do so if
166 the case forces an evaluation, or if there is a primitive op which can
169 A more interesting situation is this (a Plan-B situation)
175 default -> !R!; ...R...
177 where !x! indicates a possible heap-check point. The heap checks
178 in the alternatives *can* be omitted, in which case the topmost
179 heapcheck will take their worst case into account.
181 In favour of omitting !Q!, !R!:
183 - *May* save a heap overflow test,
184 if ...P... allocates anything.
186 - We can use relative addressing from a single Hp to
187 get at all the closures so allocated.
189 - No need to save volatile vars etc across heap checks
192 Against omitting !Q!, !R!
194 - May put a heap-check into the inner loop. Suppose
195 the main loop is P -> R -> P -> R...
196 Q is the loop exit, and only it does allocation.
197 This only hurts us if P does no allocation. If P allocates,
198 then there is a heap check in the inner loop anyway.
200 - May do more allocation than reqd. This sometimes bites us
201 badly. For example, nfib (ha!) allocates about 30\% more space if the
202 worst-casing is done, because many many calls to nfib are leaf calls
203 which don't need to allocate anything.
205 We can un-allocate, but that costs an instruction
207 Neither problem hurts us if there is only one alternative.
209 Suppose the inner loop is P->R->P->R etc. Then here is
210 how many heap checks we get in the *inner loop* under various
213 Alooc Heap check in branches (!Q!, !R!)?
214 P Q R yes no (absorb to !P!)
215 --------------------------------------
222 Best choices: absorb heap checks from Q and R into !P! iff
223 a) P itself does some allocation
225 b) P does allocation, or there is exactly one alternative
227 We adopt (b) because that is more likely to put the heap check at the
228 entry to a function, when not many things are live. After a bunch of
229 single-branch cases, we may have lots of things live
231 Hence: two basic plans for
235 ------ Plan A: the general case ---------
237 ...save current cost centre...
240 with sequel (SetLocals r)
242 ...restore current cost centre...
244 ...alts do their own heap checks
246 ------ Plan B: special case when ---------
247 (i) e does not allocate or call GC
248 (ii) either upstream code performs allocation
249 or there is just one alternative
251 Then heap allocation in the (single) case branch
252 is absorbed by the upstream check.
253 Very common example: primops on unboxed values
256 with sequel (SetLocals r)...
264 -------------------------------------
266 = GcInAlts -- Put a GC check at the start the case alternatives,
267 [LocalReg] -- which binds these registers
268 SRT -- using this SRT
269 | NoGcInAlts -- The scrutinee is a primitive value, or a call to a
270 -- primitive op which does no GC. Absorb the allocation
271 -- of the case alternative(s) into the upstream check
273 -------------------------------------
274 -- See Note [case on Bool]
275 cgCase :: StgExpr -> Id -> SRT -> AltType -> [StgAlt] -> FCode ()
277 cgCase (OpApp ) bndr srt AlgAlt [(DataAlt flase, a2]
278 | isBoolTy (idType bndr)
283 cgCase scrut bndr srt alt_type alts
284 = do { up_hp_usg <- getVirtHp -- Upstream heap usage
285 ; let ret_bndrs = chooseReturnBndrs bndr alt_type alts
286 alt_regs = map idToReg ret_bndrs
287 simple_scrut = isSimpleScrut scrut alt_type
288 gcInAlts | not simple_scrut = True
289 | isSingleton alts = False
290 | up_hp_usg > 0 = False
292 gc_plan = if gcInAlts then GcInAlts alt_regs srt else NoGcInAlts
294 ; mb_cc <- maybeSaveCostCentre simple_scrut
295 ; withSequel (AssignTo alt_regs gcInAlts) (cgExpr scrut)
296 ; restoreCurrentCostCentre mb_cc
298 -- JD: We need Note: [Better Alt Heap Checks]
299 ; _ <- bindArgsToRegs ret_bndrs
300 ; cgAlts gc_plan (NonVoid bndr) alt_type alts }
303 maybeSaveCostCentre :: Bool -> FCode (Maybe LocalReg)
304 maybeSaveCostCentre simple_scrut
305 | simple_scrut = saveCurrentCostCentre
306 | otherwise = return Nothing
310 isSimpleScrut :: StgExpr -> AltType -> Bool
311 -- Simple scrutinee, does not block or allocate; hence safe to amalgamate
312 -- heap usage from alternatives into the stuff before the case
313 -- NB: if you get this wrong, and claim that the expression doesn't allocate
314 -- when it does, you'll deeply mess up allocation
315 isSimpleScrut (StgOpApp op _ _) _ = isSimpleOp op
316 isSimpleScrut (StgLit _) _ = True -- case 1# of { 0# -> ..; ... }
317 isSimpleScrut (StgApp _ []) (PrimAlt _) = True -- case x# of { 0# -> ..; ... }
318 isSimpleScrut _ _ = False
320 isSimpleOp :: StgOp -> Bool
321 -- True iff the op cannot block or allocate
322 isSimpleOp (StgFCallOp (CCall (CCallSpec _ _ safe)) _) = not (playSafe safe)
323 isSimpleOp (StgFCallOp (DNCall _) _) = False -- Safe!
324 isSimpleOp (StgPrimOp op) = not (primOpOutOfLine op)
325 isSimpleOp (StgPrimCallOp _) = False
328 chooseReturnBndrs :: Id -> AltType -> [StgAlt] -> [NonVoid Id]
329 -- These are the binders of a case that are assigned
330 -- by the evaluation of the scrutinee
331 -- Only non-void ones come back
332 chooseReturnBndrs bndr (PrimAlt _) _alts
335 chooseReturnBndrs _bndr (UbxTupAlt _) [(_, ids, _, _)]
336 = nonVoidIds ids -- 'bndr' is not assigned!
338 chooseReturnBndrs bndr (AlgAlt _) _alts
339 = nonVoidIds [bndr] -- Only 'bndr' is assigned
341 chooseReturnBndrs bndr PolyAlt _alts
342 = nonVoidIds [bndr] -- Only 'bndr' is assigned
344 chooseReturnBndrs _ _ _ = panic "chooseReturnBndrs"
345 -- UbxTupALt has only one alternative
347 -------------------------------------
348 cgAlts :: GcPlan -> NonVoid Id -> AltType -> [StgAlt] -> FCode ()
349 -- At this point the result of the case are in the binders
350 cgAlts gc_plan _bndr PolyAlt [(_, _, _, rhs)]
351 = maybeAltHeapCheck gc_plan (cgExpr rhs)
353 cgAlts gc_plan _bndr (UbxTupAlt _) [(_, _, _, rhs)]
354 = maybeAltHeapCheck gc_plan (cgExpr rhs)
355 -- Here bndrs are *already* in scope, so don't rebind them
357 cgAlts gc_plan bndr (PrimAlt _) alts
358 = do { tagged_cmms <- cgAltRhss gc_plan bndr alts
360 ; let bndr_reg = CmmLocal (idToReg bndr)
361 (DEFAULT,deflt) = head tagged_cmms
362 -- PrimAlts always have a DEFAULT case
363 -- and it always comes first
365 tagged_cmms' = [(lit,code)
366 | (LitAlt lit, code) <- tagged_cmms]
367 ; emit (mkCmmLitSwitch (CmmReg bndr_reg) tagged_cmms' deflt) }
369 cgAlts gc_plan bndr (AlgAlt tycon) alts
370 = do { tagged_cmms <- cgAltRhss gc_plan bndr alts
372 ; let fam_sz = tyConFamilySize tycon
373 bndr_reg = CmmLocal (idToReg bndr)
374 mb_deflt = case tagged_cmms of
375 ((DEFAULT,rhs) : _) -> Just rhs
377 -- DEFAULT is always first, if present
379 branches = [ (dataConTagZ con, cmm)
380 | (DataAlt con, cmm) <- tagged_cmms ]
382 -- Is the constructor tag in the node reg?
383 ; if isSmallFamily fam_sz
384 then let -- Yes, bndr_reg has constr. tag in ls bits
385 tag_expr = cmmConstrTag1 (CmmReg bndr_reg)
386 branches' = [(tag+1,branch) | (tag,branch) <- branches]
388 emitSwitch tag_expr branches' mb_deflt 1 fam_sz
390 else -- No, get tag from info table
391 let -- Note that ptr _always_ has tag 1
392 -- when the family size is big enough
393 untagged_ptr = cmmRegOffB bndr_reg (-1)
394 tag_expr = getConstrTag (untagged_ptr)
396 emitSwitch tag_expr branches mb_deflt 0 (fam_sz - 1) }
398 cgAlts _ _ _ _ = panic "cgAlts"
399 -- UbxTupAlt and PolyAlt have only one alternative
402 cgAltRhss :: GcPlan -> NonVoid Id -> [StgAlt] -> FCode [(AltCon, CmmAGraph)]
403 cgAltRhss gc_plan bndr alts
404 = forkAlts (map cg_alt alts)
406 base_reg = idToReg bndr
407 cg_alt :: StgAlt -> FCode (AltCon, CmmAGraph)
408 cg_alt (con, bndrs, _uses, rhs)
410 maybeAltHeapCheck gc_plan $
411 do { _ <- bindConArgs con base_reg bndrs
415 maybeAltHeapCheck :: GcPlan -> FCode a -> FCode a
416 maybeAltHeapCheck NoGcInAlts code
418 maybeAltHeapCheck (GcInAlts regs _) code
419 = altHeapCheck regs code
421 -----------------------------------------------------------------------------
423 -----------------------------------------------------------------------------
425 cgConApp :: DataCon -> [StgArg] -> FCode ()
426 cgConApp con stg_args
427 | isUnboxedTupleCon con -- Unboxed tuple: assign and return
428 = do { arg_exprs <- getNonVoidArgAmodes stg_args
429 ; tickyUnboxedTupleReturn (length arg_exprs)
430 ; emitReturn arg_exprs }
432 | otherwise -- Boxed constructors; allocate and return
433 = ASSERT( stg_args `lengthIs` dataConRepArity con )
434 do { (idinfo, init) <- buildDynCon (dataConWorkId con) currentCCS con stg_args
435 -- The first "con" says that the name bound to this closure is
436 -- is "con", which is a bit of a fudge, but it only affects profiling
439 ; emitReturn [idInfoToAmode idinfo] }
442 cgIdApp :: Id -> [StgArg] -> FCode ()
443 cgIdApp fun_id [] | isVoidId fun_id = emitReturn []
445 = do { fun_info <- getCgIdInfo fun_id
446 ; case maybeLetNoEscape fun_info of
447 Just (blk_id, lne_regs) -> cgLneJump blk_id lne_regs args
448 Nothing -> cgTailCall fun_id fun_info args }
450 cgLneJump :: BlockId -> [LocalReg] -> [StgArg] -> FCode ()
451 cgLneJump blk_id lne_regs args -- Join point; discard sequel
452 = do { cmm_args <- getNonVoidArgAmodes args
453 ; emit (mkMultiAssign lne_regs cmm_args
454 <*> mkBranch blk_id) }
456 cgTailCall :: Id -> CgIdInfo -> [StgArg] -> FCode ()
457 cgTailCall fun_id fun_info args = do
458 dflags <- getDynFlags
459 case (getCallMethod dflags fun_name (idCafInfo fun_id) lf_info (length args)) of
461 -- A value in WHNF, so we can just return it.
462 ReturnIt -> emitReturn [fun] -- ToDo: does ReturnIt guarantee tagged?
464 EnterIt -> ASSERT( null args ) -- Discarding arguments
465 do { let fun' = CmmLoad fun (cmmExprType fun)
466 ; [ret,call] <- forkAlts [
467 getCode $ emitReturn [fun], -- Is tagged; no need to untag
468 getCode $ do -- emit (mkAssign nodeReg fun)
469 emitCall (NativeNodeCall, NativeReturn)
470 (entryCode fun') [fun]] -- Not tagged
471 ; emit (mkCmmIfThenElse (cmmIsTagged fun) ret call) }
473 SlowCall -> do -- A slow function call via the RTS apply routines
474 { tickySlowCall lf_info args
475 ; emit $ mkComment $ mkFastString "slowCall"
476 ; slowCall fun args }
478 -- A direct function call (possibly with some left-over arguments)
479 DirectEntry lbl arity -> do
480 { tickyDirectCall arity args
481 ; if node_points then
482 do emit $ mkComment $ mkFastString "directEntry"
483 emit (mkAssign nodeReg fun)
484 directCall lbl arity args
485 else do emit $ mkComment $ mkFastString "directEntry else"
486 directCall lbl arity args }
488 JumpToIt {} -> panic "cgTailCall" -- ???
491 fun_name = idName fun_id
492 fun = idInfoToAmode fun_info
493 lf_info = cgIdInfoLF fun_info
494 node_points = nodeMustPointToIt lf_info
497 {- Note [case on Bool]
499 A case on a Boolean value does two things:
500 1. It looks up the Boolean in a closure table and assigns the
501 result to the binder.
502 2. It branches to the True or False case through analysis
503 of the closure assigned to the binder.
504 But the indirection through the closure table is unnecessary
505 if the assignment to the binder will be dead code (use isDeadBndr).
507 The following example illustrates how badly the code turns out:
509 case <=## [ww_s7Hx y_s7HD] of wild2_sbH8 {
510 GHC.Bool.False -> <true code> // sbH8 dead
511 GHC.Bool.True -> <false code> // sbH8 dead
514 _s7HD::F64 = F64[_sbH7::I64 + 7]; // MidAssign
515 _ccsW::I64 = %MO_F_Le_W64(_s7Hx::F64, _s7HD::F64); // MidAssign
516 // emitReturn // MidComment
517 _sbH8::I64 = I64[ghczmprim_GHCziBool_Bool_closure_tbl + (_ccsW::I64 << 3)]; // MidAssign
518 _ccsX::I64 = _sbH8::I64 & 7; // MidAssign
519 if (_ccsX::I64 >= 2) goto ccsH; else goto ccsI; // LastCondBranch
521 The assignments to _sbH8 and _ccsX are completely unnecessary.
522 Instead, we should branch based on the value of _ccsW.
525 {- Note [Better Alt Heap Checks]
526 If two function calls can share a return point, then they will also
527 get the same info table. Therefore, it's worth our effort to make
528 those opportunities appear as frequently as possible.
530 Here are a few examples of how it should work:
534 True -> <True code -- including allocation>
535 False -> <False code>
537 r = call f(x) returns to L;
539 if r & 7 >= 2 goto L1 else goto L2;
544 <True code -- including allocation>
547 Note that the code following both the call to f(x) and the code to gc(r)
548 should be the same, which will allow the common blockifier to discover
549 that they are the same. Therefore, both function calls will return to the same
550 block, and they will use the same info table.
552 Here's an example of the Cmm code we want from a primOp.
553 The primOp doesn't produce an info table for us to reuse, but that's okay:
554 we should still generate the same code:
557 0 -> <0-case code -- including allocation>
558 _ -> <default-case code>
562 if r == 0 then goto L1 else goto L2;
567 <0-case code -- including allocation>