1 -----------------------------------------------------------------------------
3 -- Stg to C-- code generation: expressions
5 -- (c) The University of Glasgow 2004-2006
7 -----------------------------------------------------------------------------
9 module StgCmmExpr ( cgExpr ) where
11 #define FAST_STRING_NOT_NEEDED
12 #include "HsVersions.h"
14 import {-# SOURCE #-} StgCmmBind ( cgBind )
42 import CostCentre ( CostCentreStack, currentCCS )
43 import Control.Monad (when)
50 ------------------------------------------------------------------------
51 -- cgExpr: the main function
52 ------------------------------------------------------------------------
54 cgExpr :: StgExpr -> FCode ()
56 cgExpr (StgApp fun args) = cgIdApp fun args
57 cgExpr (StgOpApp op args ty) = cgOpApp op args ty
58 cgExpr (StgConApp con args) = cgConApp con args
59 cgExpr (StgSCC cc expr) = do { emitSetCCC cc; cgExpr expr }
60 cgExpr (StgTick m n expr) = do { emit (mkTickBox m n); cgExpr expr }
61 cgExpr (StgLit lit) = do cmm_lit <- cgLit lit
62 emitReturn [CmmLit cmm_lit]
64 cgExpr (StgLet binds expr) = do { cgBind binds; cgExpr expr }
65 cgExpr (StgLetNoEscape _ _ binds expr) =
66 do { us <- newUniqSupply
67 ; let join_id = mkBlockId (uniqFromSupply us)
68 ; cgLneBinds join_id binds
70 ; emit $ mkLabel join_id}
72 cgExpr (StgCase expr _live_vars _save_vars bndr srt alt_type alts) =
73 cgCase expr bndr srt alt_type alts
75 cgExpr (StgLam {}) = panic "cgExpr: StgLam"
77 ------------------------------------------------------------------------
79 ------------------------------------------------------------------------
81 {- Generating code for a let-no-escape binding, aka join point is very
82 very similar to what we do for a case expression. The duality is
89 That is, the RHS of 'x' (ie 'b') will execute *later*, just like
90 the alternative of the case; it needs to be compiled in an environment
91 in which all volatile bindings are forgotten, and the free vars are
92 bound only to stable things like stack locations.. The 'e' part will
93 execute *next*, just like the scrutinee of a case. -}
95 -------------------------
96 cgLneBinds :: BlockId -> StgBinding -> FCode ()
97 cgLneBinds join_id (StgNonRec bndr rhs)
98 = do { local_cc <- saveCurrentCostCentre
99 -- See Note [Saving the current cost centre]
100 ; info <- cgLetNoEscapeRhs join_id local_cc bndr rhs
101 ; addBindC (cg_id info) info }
103 cgLneBinds join_id (StgRec pairs)
104 = do { local_cc <- saveCurrentCostCentre
105 ; new_bindings <- fixC (\ new_bindings -> do
106 { addBindsC new_bindings
107 ; listFCs [ cgLetNoEscapeRhs join_id local_cc b e
108 | (b,e) <- pairs ] })
109 ; addBindsC new_bindings }
112 -------------------------
114 :: BlockId -- join point for successor of let-no-escape
115 -> Maybe LocalReg -- Saved cost centre
120 cgLetNoEscapeRhs join_id local_cc bndr rhs =
121 do { (info, rhs_body) <- getCodeR $ cgLetNoEscapeRhsBody local_cc bndr rhs
122 ; let (bid, _) = expectJust "cgLetNoEscapeRhs" $ maybeLetNoEscape info
123 ; emit (outOfLine $ mkLabel bid <*> rhs_body <*> mkBranch join_id)
128 :: Maybe LocalReg -- Saved cost centre
132 cgLetNoEscapeRhsBody local_cc bndr (StgRhsClosure cc _bi _ _upd _ args body)
133 = cgLetNoEscapeClosure bndr local_cc cc (nonVoidIds args) body
134 cgLetNoEscapeRhsBody local_cc bndr (StgRhsCon cc con args)
135 = cgLetNoEscapeClosure bndr local_cc cc [] (StgConApp con args)
136 -- For a constructor RHS we want to generate a single chunk of
137 -- code which can be jumped to from many places, which will
138 -- return the constructor. It's easy; just behave as if it
139 -- was an StgRhsClosure with a ConApp inside!
141 -------------------------
144 -> Maybe LocalReg -- Slot for saved current cost centre
145 -> CostCentreStack -- XXX: *** NOT USED *** why not?
146 -> [NonVoid Id] -- Args (as in \ args -> body)
147 -> StgExpr -- Body (as in above)
150 cgLetNoEscapeClosure bndr cc_slot _unused_cc args body
151 = do { arg_regs <- forkProc $ do
152 { restoreCurrentCostCentre cc_slot
153 ; arg_regs <- bindArgsToRegs args
154 ; altHeapCheck arg_regs (cgExpr body)
155 -- Using altHeapCheck just reduces
156 -- instructions to save on stack
158 ; return $ lneIdInfo bndr arg_regs}
161 ------------------------------------------------------------------------
163 ------------------------------------------------------------------------
165 {- Note [Compiling case expressions]
166 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
167 It is quite interesting to decide whether to put a heap-check at the
168 start of each alternative. Of course we certainly have to do so if
169 the case forces an evaluation, or if there is a primitive op which can
172 A more interesting situation is this (a Plan-B situation)
178 default -> !R!; ...R...
180 where !x! indicates a possible heap-check point. The heap checks
181 in the alternatives *can* be omitted, in which case the topmost
182 heapcheck will take their worst case into account.
184 In favour of omitting !Q!, !R!:
186 - *May* save a heap overflow test,
187 if ...P... allocates anything.
189 - We can use relative addressing from a single Hp to
190 get at all the closures so allocated.
192 - No need to save volatile vars etc across heap checks
195 Against omitting !Q!, !R!
197 - May put a heap-check into the inner loop. Suppose
198 the main loop is P -> R -> P -> R...
199 Q is the loop exit, and only it does allocation.
200 This only hurts us if P does no allocation. If P allocates,
201 then there is a heap check in the inner loop anyway.
203 - May do more allocation than reqd. This sometimes bites us
204 badly. For example, nfib (ha!) allocates about 30\% more space if the
205 worst-casing is done, because many many calls to nfib are leaf calls
206 which don't need to allocate anything.
208 We can un-allocate, but that costs an instruction
210 Neither problem hurts us if there is only one alternative.
212 Suppose the inner loop is P->R->P->R etc. Then here is
213 how many heap checks we get in the *inner loop* under various
216 Alooc Heap check in branches (!Q!, !R!)?
217 P Q R yes no (absorb to !P!)
218 --------------------------------------
225 Best choices: absorb heap checks from Q and R into !P! iff
226 a) P itself does some allocation
228 b) P does allocation, or there is exactly one alternative
230 We adopt (b) because that is more likely to put the heap check at the
231 entry to a function, when not many things are live. After a bunch of
232 single-branch cases, we may have lots of things live
234 Hence: two basic plans for
238 ------ Plan A: the general case ---------
240 ...save current cost centre...
243 with sequel (SetLocals r)
245 ...restore current cost centre...
247 ...alts do their own heap checks
249 ------ Plan B: special case when ---------
250 (i) e does not allocate or call GC
251 (ii) either upstream code performs allocation
252 or there is just one alternative
254 Then heap allocation in the (single) case branch
255 is absorbed by the upstream check.
256 Very common example: primops on unboxed values
259 with sequel (SetLocals r)...
267 -------------------------------------
269 = GcInAlts -- Put a GC check at the start the case alternatives,
270 [LocalReg] -- which binds these registers
271 SRT -- using this SRT
272 | NoGcInAlts -- The scrutinee is a primitive value, or a call to a
273 -- primitive op which does no GC. Absorb the allocation
274 -- of the case alternative(s) into the upstream check
276 -------------------------------------
277 -- See Note [case on Bool]
278 cgCase :: StgExpr -> Id -> SRT -> AltType -> [StgAlt] -> FCode ()
280 cgCase (OpApp ) bndr srt AlgAlt [(DataAlt flase, a2]
281 | isBoolTy (idType bndr)
286 -- Note [ticket #3132]: we might be looking at a case of a lifted Id
287 -- that was cast to an unlifted type. The Id will always be bottom,
288 -- but we don't want the code generator to fall over here. If we
289 -- just emit an assignment here, the assignment will be
290 -- type-incorrect Cmm. Hence, we emit the usual enter/return code,
291 -- (and because bottom must be untagged, it will be entered and the
292 -- program will crash).
293 -- The Sequel is a type-correct assignment, albeit bogus.
294 -- The (dead) continuation loops; it would be better to invoke some kind
295 -- of panic function here.
297 -- However, we also want to allow an assignment to be generated
298 -- in the case when the types are compatible, because this allows
299 -- some slightly-dodgy but occasionally-useful casts to be used,
300 -- such as in RtClosureInspect where we cast an HValue to a MutVar#
301 -- so we can print out the contents of the MutVar#. If we generate
302 -- code that enters the HValue, then we'll get a runtime panic, because
303 -- the HValue really is a MutVar#. The types are compatible though,
304 -- so we can just generate an assignment.
305 cgCase (StgApp v []) bndr _ alt_type@(PrimAlt _) alts
306 | isUnLiftedType (idType v)
308 = -- assignment suffices for unlifted types
309 do { when (not reps_compatible) $
310 panic "cgCase: reps do not match, perhaps a dodgy unsafeCoerce?"
311 ; v_info <- getCgIdInfo v
312 ; emit (mkAssign (CmmLocal (idToReg (NonVoid bndr))) (idInfoToAmode v_info))
313 ; _ <- bindArgsToRegs [NonVoid bndr]
314 ; cgAlts NoGcInAlts (NonVoid bndr) alt_type alts }
316 reps_compatible = idCgRep v == idCgRep bndr
318 cgCase scrut@(StgApp v []) _ _ (PrimAlt _) _
319 = -- fail at run-time, not compile-time
320 do { mb_cc <- maybeSaveCostCentre True
321 ; withSequel (AssignTo [idToReg (NonVoid v)] False) (cgExpr scrut)
322 ; restoreCurrentCostCentre mb_cc
323 ; emit $ mkComment $ mkFastString "should be unreachable code"
324 ; emit $ withFreshLabel "l" (\l -> mkLabel l <*> mkBranch l)}
326 cgCase scrut bndr srt alt_type alts
327 = -- the general case
328 do { up_hp_usg <- getVirtHp -- Upstream heap usage
329 ; let ret_bndrs = chooseReturnBndrs bndr alt_type alts
330 alt_regs = map idToReg ret_bndrs
331 simple_scrut = isSimpleScrut scrut alt_type
332 gcInAlts | not simple_scrut = True
333 | isSingleton alts = False
334 | up_hp_usg > 0 = False
336 gc_plan = if gcInAlts then GcInAlts alt_regs srt else NoGcInAlts
338 ; mb_cc <- maybeSaveCostCentre simple_scrut
339 ; withSequel (AssignTo alt_regs gcInAlts) (cgExpr scrut)
340 ; restoreCurrentCostCentre mb_cc
342 -- JD: We need Note: [Better Alt Heap Checks]
343 ; _ <- bindArgsToRegs ret_bndrs
344 ; cgAlts gc_plan (NonVoid bndr) alt_type alts }
347 maybeSaveCostCentre :: Bool -> FCode (Maybe LocalReg)
348 maybeSaveCostCentre simple_scrut
349 | simple_scrut = saveCurrentCostCentre
350 | otherwise = return Nothing
354 isSimpleScrut :: StgExpr -> AltType -> Bool
355 -- Simple scrutinee, does not block or allocate; hence safe to amalgamate
356 -- heap usage from alternatives into the stuff before the case
357 -- NB: if you get this wrong, and claim that the expression doesn't allocate
358 -- when it does, you'll deeply mess up allocation
359 isSimpleScrut (StgOpApp op _ _) _ = isSimpleOp op
360 isSimpleScrut (StgLit _) _ = True -- case 1# of { 0# -> ..; ... }
361 isSimpleScrut (StgApp _ []) (PrimAlt _) = True -- case x# of { 0# -> ..; ... }
362 isSimpleScrut _ _ = False
364 isSimpleOp :: StgOp -> Bool
365 -- True iff the op cannot block or allocate
366 isSimpleOp (StgFCallOp (CCall (CCallSpec _ _ safe)) _) = not (playSafe safe)
367 isSimpleOp (StgPrimOp op) = not (primOpOutOfLine op)
368 isSimpleOp (StgPrimCallOp _) = False
371 chooseReturnBndrs :: Id -> AltType -> [StgAlt] -> [NonVoid Id]
372 -- These are the binders of a case that are assigned
373 -- by the evaluation of the scrutinee
374 -- Only non-void ones come back
375 chooseReturnBndrs bndr (PrimAlt _) _alts
378 chooseReturnBndrs _bndr (UbxTupAlt _) [(_, ids, _, _)]
379 = nonVoidIds ids -- 'bndr' is not assigned!
381 chooseReturnBndrs bndr (AlgAlt _) _alts
382 = nonVoidIds [bndr] -- Only 'bndr' is assigned
384 chooseReturnBndrs bndr PolyAlt _alts
385 = nonVoidIds [bndr] -- Only 'bndr' is assigned
387 chooseReturnBndrs _ _ _ = panic "chooseReturnBndrs"
388 -- UbxTupALt has only one alternative
390 -------------------------------------
391 cgAlts :: GcPlan -> NonVoid Id -> AltType -> [StgAlt] -> FCode ()
392 -- At this point the result of the case are in the binders
393 cgAlts gc_plan _bndr PolyAlt [(_, _, _, rhs)]
394 = maybeAltHeapCheck gc_plan (cgExpr rhs)
396 cgAlts gc_plan _bndr (UbxTupAlt _) [(_, _, _, rhs)]
397 = maybeAltHeapCheck gc_plan (cgExpr rhs)
398 -- Here bndrs are *already* in scope, so don't rebind them
400 cgAlts gc_plan bndr (PrimAlt _) alts
401 = do { tagged_cmms <- cgAltRhss gc_plan bndr alts
403 ; let bndr_reg = CmmLocal (idToReg bndr)
404 (DEFAULT,deflt) = head tagged_cmms
405 -- PrimAlts always have a DEFAULT case
406 -- and it always comes first
408 tagged_cmms' = [(lit,code)
409 | (LitAlt lit, code) <- tagged_cmms]
410 ; emit (mkCmmLitSwitch (CmmReg bndr_reg) tagged_cmms' deflt) }
412 cgAlts gc_plan bndr (AlgAlt tycon) alts
413 = do { tagged_cmms <- cgAltRhss gc_plan bndr alts
415 ; let fam_sz = tyConFamilySize tycon
416 bndr_reg = CmmLocal (idToReg bndr)
417 mb_deflt = case tagged_cmms of
418 ((DEFAULT,rhs) : _) -> Just rhs
420 -- DEFAULT is always first, if present
422 branches = [ (dataConTagZ con, cmm)
423 | (DataAlt con, cmm) <- tagged_cmms ]
425 -- Is the constructor tag in the node reg?
426 ; if isSmallFamily fam_sz
427 then let -- Yes, bndr_reg has constr. tag in ls bits
428 tag_expr = cmmConstrTag1 (CmmReg bndr_reg)
429 branches' = [(tag+1,branch) | (tag,branch) <- branches]
431 emitSwitch tag_expr branches' mb_deflt 1 fam_sz
433 else -- No, get tag from info table
434 let -- Note that ptr _always_ has tag 1
435 -- when the family size is big enough
436 untagged_ptr = cmmRegOffB bndr_reg (-1)
437 tag_expr = getConstrTag (untagged_ptr)
439 emitSwitch tag_expr branches mb_deflt 0 (fam_sz - 1) }
441 cgAlts _ _ _ _ = panic "cgAlts"
442 -- UbxTupAlt and PolyAlt have only one alternative
445 cgAltRhss :: GcPlan -> NonVoid Id -> [StgAlt] -> FCode [(AltCon, CmmAGraph)]
446 cgAltRhss gc_plan bndr alts
447 = forkAlts (map cg_alt alts)
449 base_reg = idToReg bndr
450 cg_alt :: StgAlt -> FCode (AltCon, CmmAGraph)
451 cg_alt (con, bndrs, _uses, rhs)
453 maybeAltHeapCheck gc_plan $
454 do { _ <- bindConArgs con base_reg bndrs
458 maybeAltHeapCheck :: GcPlan -> FCode a -> FCode a
459 maybeAltHeapCheck NoGcInAlts code
461 maybeAltHeapCheck (GcInAlts regs _) code
462 = altHeapCheck regs code
464 -----------------------------------------------------------------------------
466 -----------------------------------------------------------------------------
468 cgConApp :: DataCon -> [StgArg] -> FCode ()
469 cgConApp con stg_args
470 | isUnboxedTupleCon con -- Unboxed tuple: assign and return
471 = do { arg_exprs <- getNonVoidArgAmodes stg_args
472 ; tickyUnboxedTupleReturn (length arg_exprs)
473 ; emitReturn arg_exprs }
475 | otherwise -- Boxed constructors; allocate and return
476 = ASSERT( stg_args `lengthIs` dataConRepArity con )
477 do { (idinfo, init) <- buildDynCon (dataConWorkId con) currentCCS con stg_args
478 -- The first "con" says that the name bound to this closure is
479 -- is "con", which is a bit of a fudge, but it only affects profiling
482 ; emitReturn [idInfoToAmode idinfo] }
485 cgIdApp :: Id -> [StgArg] -> FCode ()
486 cgIdApp fun_id [] | isVoidId fun_id = emitReturn []
488 = do { fun_info <- getCgIdInfo fun_id
489 ; case maybeLetNoEscape fun_info of
490 Just (blk_id, lne_regs) -> cgLneJump blk_id lne_regs args
491 Nothing -> cgTailCall fun_id fun_info args }
493 cgLneJump :: BlockId -> [LocalReg] -> [StgArg] -> FCode ()
494 cgLneJump blk_id lne_regs args -- Join point; discard sequel
495 = do { cmm_args <- getNonVoidArgAmodes args
496 ; emit (mkMultiAssign lne_regs cmm_args
497 <*> mkBranch blk_id) }
499 cgTailCall :: Id -> CgIdInfo -> [StgArg] -> FCode ()
500 cgTailCall fun_id fun_info args = do
501 dflags <- getDynFlags
502 case (getCallMethod dflags fun_name (idCafInfo fun_id) lf_info (length args)) of
504 -- A value in WHNF, so we can just return it.
505 ReturnIt -> emitReturn [fun] -- ToDo: does ReturnIt guarantee tagged?
507 EnterIt -> ASSERT( null args ) -- Discarding arguments
508 do { let fun' = CmmLoad fun (cmmExprType fun)
509 ; [ret,call] <- forkAlts [
510 getCode $ emitReturn [fun], -- Is tagged; no need to untag
511 getCode $ do -- emit (mkAssign nodeReg fun)
512 emitCall (NativeNodeCall, NativeReturn)
513 (entryCode fun') [fun]] -- Not tagged
514 ; emit (mkCmmIfThenElse (cmmIsTagged fun) ret call) }
516 SlowCall -> do -- A slow function call via the RTS apply routines
517 { tickySlowCall lf_info args
518 ; emit $ mkComment $ mkFastString "slowCall"
519 ; slowCall fun args }
521 -- A direct function call (possibly with some left-over arguments)
522 DirectEntry lbl arity -> do
523 { tickyDirectCall arity args
524 ; if node_points then
525 do emit $ mkComment $ mkFastString "directEntry"
526 emit (mkAssign nodeReg fun)
527 directCall lbl arity args
528 else do emit $ mkComment $ mkFastString "directEntry else"
529 directCall lbl arity args }
531 JumpToIt {} -> panic "cgTailCall" -- ???
534 fun_name = idName fun_id
535 fun = idInfoToAmode fun_info
536 lf_info = cgIdInfoLF fun_info
537 node_points = nodeMustPointToIt lf_info
540 {- Note [case on Bool]
542 A case on a Boolean value does two things:
543 1. It looks up the Boolean in a closure table and assigns the
544 result to the binder.
545 2. It branches to the True or False case through analysis
546 of the closure assigned to the binder.
547 But the indirection through the closure table is unnecessary
548 if the assignment to the binder will be dead code (use isDeadBndr).
550 The following example illustrates how badly the code turns out:
552 case <=## [ww_s7Hx y_s7HD] of wild2_sbH8 {
553 GHC.Bool.False -> <true code> // sbH8 dead
554 GHC.Bool.True -> <false code> // sbH8 dead
557 _s7HD::F64 = F64[_sbH7::I64 + 7]; // MidAssign
558 _ccsW::I64 = %MO_F_Le_W64(_s7Hx::F64, _s7HD::F64); // MidAssign
559 // emitReturn // MidComment
560 _sbH8::I64 = I64[ghczmprim_GHCziBool_Bool_closure_tbl + (_ccsW::I64 << 3)]; // MidAssign
561 _ccsX::I64 = _sbH8::I64 & 7; // MidAssign
562 if (_ccsX::I64 >= 2) goto ccsH; else goto ccsI; // LastCondBranch
564 The assignments to _sbH8 and _ccsX are completely unnecessary.
565 Instead, we should branch based on the value of _ccsW.
568 {- Note [Better Alt Heap Checks]
569 If two function calls can share a return point, then they will also
570 get the same info table. Therefore, it's worth our effort to make
571 those opportunities appear as frequently as possible.
573 Here are a few examples of how it should work:
577 True -> <True code -- including allocation>
578 False -> <False code>
580 r = call f(x) returns to L;
582 if r & 7 >= 2 goto L1 else goto L2;
587 <True code -- including allocation>
590 Note that the code following both the call to f(x) and the code to gc(r)
591 should be the same, which will allow the common blockifier to discover
592 that they are the same. Therefore, both function calls will return to the same
593 block, and they will use the same info table.
595 Here's an example of the Cmm code we want from a primOp.
596 The primOp doesn't produce an info table for us to reuse, but that's okay:
597 we should still generate the same code:
600 0 -> <0-case code -- including allocation>
601 _ -> <default-case code>
605 if r == 0 then goto L1 else goto L2;
610 <0-case code -- including allocation>