1 -----------------------------------------------------------------------------
3 -- Stg to C-- code generation: expressions
5 -- (c) The University of Glasgow 2004-2006
7 -----------------------------------------------------------------------------
9 module StgCmmExpr ( cgExpr ) where
11 #define FAST_STRING_NOT_NEEDED
12 #include "HsVersions.h"
14 import {-# SOURCE #-} StgCmmBind ( cgBind )
41 import CostCentre ( CostCentreStack, currentCCS )
42 import Control.Monad (when)
49 ------------------------------------------------------------------------
50 -- cgExpr: the main function
51 ------------------------------------------------------------------------
53 cgExpr :: StgExpr -> FCode ()
55 cgExpr (StgApp fun args) = cgIdApp fun args
56 cgExpr (StgOpApp op args ty) = cgOpApp op args ty
57 cgExpr (StgConApp con args) = cgConApp con args
58 cgExpr (StgSCC cc expr) = do { emitSetCCC cc; cgExpr expr }
59 cgExpr (StgTick m n expr) = do { emit (mkTickBox m n); cgExpr expr }
60 cgExpr (StgLit lit) = do cmm_lit <- cgLit lit
61 emitReturn [CmmLit cmm_lit]
63 cgExpr (StgLet binds expr) = do { cgBind binds; cgExpr expr }
64 cgExpr (StgLetNoEscape _ _ binds expr) =
65 do { us <- newUniqSupply
66 ; let join_id = mkBlockId (uniqFromSupply us)
67 ; cgLneBinds join_id binds
69 ; emit $ mkLabel join_id}
71 cgExpr (StgCase expr _live_vars _save_vars bndr srt alt_type alts) =
72 cgCase expr bndr srt alt_type alts
74 cgExpr (StgLam {}) = panic "cgExpr: StgLam"
76 ------------------------------------------------------------------------
78 ------------------------------------------------------------------------
80 {- Generating code for a let-no-escape binding, aka join point is very
81 very similar to what we do for a case expression. The duality is
88 That is, the RHS of 'x' (ie 'b') will execute *later*, just like
89 the alternative of the case; it needs to be compiled in an environment
90 in which all volatile bindings are forgotten, and the free vars are
91 bound only to stable things like stack locations.. The 'e' part will
92 execute *next*, just like the scrutinee of a case. -}
94 -------------------------
95 cgLneBinds :: BlockId -> StgBinding -> FCode ()
96 cgLneBinds join_id (StgNonRec bndr rhs)
97 = do { local_cc <- saveCurrentCostCentre
98 -- See Note [Saving the current cost centre]
99 ; info <- cgLetNoEscapeRhs join_id local_cc bndr rhs
100 ; addBindC (cg_id info) info }
102 cgLneBinds join_id (StgRec pairs)
103 = do { local_cc <- saveCurrentCostCentre
104 ; new_bindings <- fixC (\ new_bindings -> do
105 { addBindsC new_bindings
106 ; listFCs [ cgLetNoEscapeRhs join_id local_cc b e
107 | (b,e) <- pairs ] })
108 ; addBindsC new_bindings }
111 -------------------------
113 :: BlockId -- join point for successor of let-no-escape
114 -> Maybe LocalReg -- Saved cost centre
119 cgLetNoEscapeRhs join_id local_cc bndr rhs =
120 do { (info, rhs_body) <- getCodeR $ cgLetNoEscapeRhsBody local_cc bndr rhs
121 ; let (bid, _) = expectJust "cgLetNoEscapeRhs" $ maybeLetNoEscape info
122 ; emit (outOfLine $ mkLabel bid <*> rhs_body <*> mkBranch join_id)
127 :: Maybe LocalReg -- Saved cost centre
131 cgLetNoEscapeRhsBody local_cc bndr (StgRhsClosure cc _bi _ _upd _ args body)
132 = cgLetNoEscapeClosure bndr local_cc cc (nonVoidIds args) body
133 cgLetNoEscapeRhsBody local_cc bndr (StgRhsCon cc con args)
134 = cgLetNoEscapeClosure bndr local_cc cc [] (StgConApp con args)
135 -- For a constructor RHS we want to generate a single chunk of
136 -- code which can be jumped to from many places, which will
137 -- return the constructor. It's easy; just behave as if it
138 -- was an StgRhsClosure with a ConApp inside!
140 -------------------------
143 -> Maybe LocalReg -- Slot for saved current cost centre
144 -> CostCentreStack -- XXX: *** NOT USED *** why not?
145 -> [NonVoid Id] -- Args (as in \ args -> body)
146 -> StgExpr -- Body (as in above)
149 cgLetNoEscapeClosure bndr cc_slot _unused_cc args body
150 = do { arg_regs <- forkProc $ do
151 { restoreCurrentCostCentre cc_slot
152 ; arg_regs <- bindArgsToRegs args
153 ; altHeapCheck arg_regs (cgExpr body)
154 -- Using altHeapCheck just reduces
155 -- instructions to save on stack
157 ; return $ lneIdInfo bndr arg_regs}
160 ------------------------------------------------------------------------
162 ------------------------------------------------------------------------
164 {- Note [Compiling case expressions]
165 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
166 It is quite interesting to decide whether to put a heap-check at the
167 start of each alternative. Of course we certainly have to do so if
168 the case forces an evaluation, or if there is a primitive op which can
171 A more interesting situation is this (a Plan-B situation)
177 default -> !R!; ...R...
179 where !x! indicates a possible heap-check point. The heap checks
180 in the alternatives *can* be omitted, in which case the topmost
181 heapcheck will take their worst case into account.
183 In favour of omitting !Q!, !R!:
185 - *May* save a heap overflow test,
186 if ...P... allocates anything.
188 - We can use relative addressing from a single Hp to
189 get at all the closures so allocated.
191 - No need to save volatile vars etc across heap checks
194 Against omitting !Q!, !R!
196 - May put a heap-check into the inner loop. Suppose
197 the main loop is P -> R -> P -> R...
198 Q is the loop exit, and only it does allocation.
199 This only hurts us if P does no allocation. If P allocates,
200 then there is a heap check in the inner loop anyway.
202 - May do more allocation than reqd. This sometimes bites us
203 badly. For example, nfib (ha!) allocates about 30\% more space if the
204 worst-casing is done, because many many calls to nfib are leaf calls
205 which don't need to allocate anything.
207 We can un-allocate, but that costs an instruction
209 Neither problem hurts us if there is only one alternative.
211 Suppose the inner loop is P->R->P->R etc. Then here is
212 how many heap checks we get in the *inner loop* under various
215 Alooc Heap check in branches (!Q!, !R!)?
216 P Q R yes no (absorb to !P!)
217 --------------------------------------
224 Best choices: absorb heap checks from Q and R into !P! iff
225 a) P itself does some allocation
227 b) P does allocation, or there is exactly one alternative
229 We adopt (b) because that is more likely to put the heap check at the
230 entry to a function, when not many things are live. After a bunch of
231 single-branch cases, we may have lots of things live
233 Hence: two basic plans for
237 ------ Plan A: the general case ---------
239 ...save current cost centre...
242 with sequel (SetLocals r)
244 ...restore current cost centre...
246 ...alts do their own heap checks
248 ------ Plan B: special case when ---------
249 (i) e does not allocate or call GC
250 (ii) either upstream code performs allocation
251 or there is just one alternative
253 Then heap allocation in the (single) case branch
254 is absorbed by the upstream check.
255 Very common example: primops on unboxed values
258 with sequel (SetLocals r)...
266 -------------------------------------
268 = GcInAlts -- Put a GC check at the start the case alternatives,
269 [LocalReg] -- which binds these registers
270 SRT -- using this SRT
271 | NoGcInAlts -- The scrutinee is a primitive value, or a call to a
272 -- primitive op which does no GC. Absorb the allocation
273 -- of the case alternative(s) into the upstream check
275 -------------------------------------
276 -- See Note [case on Bool]
277 cgCase :: StgExpr -> Id -> SRT -> AltType -> [StgAlt] -> FCode ()
279 cgCase (OpApp ) bndr srt AlgAlt [(DataAlt flase, a2]
280 | isBoolTy (idType bndr)
285 -- Note [ticket #3132]: we might be looking at a case of a lifted Id
286 -- that was cast to an unlifted type. The Id will always be bottom,
287 -- but we don't want the code generator to fall over here. If we
288 -- just emit an assignment here, the assignment will be
289 -- type-incorrect Cmm. Hence, we emit the usual enter/return code,
290 -- (and because bottom must be untagged, it will be entered and the
291 -- program will crash).
292 -- The Sequel is a type-correct assignment, albeit bogus.
293 -- The (dead) continuation loops; it would be better to invoke some kind
294 -- of panic function here.
296 -- However, we also want to allow an assignment to be generated
297 -- in the case when the types are compatible, because this allows
298 -- some slightly-dodgy but occasionally-useful casts to be used,
299 -- such as in RtClosureInspect where we cast an HValue to a MutVar#
300 -- so we can print out the contents of the MutVar#. If we generate
301 -- code that enters the HValue, then we'll get a runtime panic, because
302 -- the HValue really is a MutVar#. The types are compatible though,
303 -- so we can just generate an assignment.
304 cgCase (StgApp v []) bndr _ alt_type@(PrimAlt _) alts
305 | isUnLiftedType (idType v)
307 = -- assignment suffices for unlifted types
308 do { when (not reps_compatible) $
309 panic "cgCase: reps do not match, perhaps a dodgy unsafeCoerce?"
310 ; v_info <- getCgIdInfo v
311 ; emit (mkAssign (CmmLocal (idToReg (NonVoid bndr))) (idInfoToAmode v_info))
312 ; _ <- bindArgsToRegs [NonVoid bndr]
313 ; cgAlts NoGcInAlts (NonVoid bndr) alt_type alts }
315 reps_compatible = idCgRep v == idCgRep bndr
317 cgCase scrut@(StgApp v []) _ _ (PrimAlt _) _
318 = -- fail at run-time, not compile-time
319 do { mb_cc <- maybeSaveCostCentre True
320 ; withSequel (AssignTo [idToReg (NonVoid v)] False) (cgExpr scrut)
321 ; restoreCurrentCostCentre mb_cc
322 ; emit $ mkComment $ mkFastString "should be unreachable code"
323 ; emit $ withFreshLabel "l" (\l -> mkLabel l <*> mkBranch l)}
325 cgCase scrut bndr srt alt_type alts
326 = -- the general case
327 do { up_hp_usg <- getVirtHp -- Upstream heap usage
328 ; let ret_bndrs = chooseReturnBndrs bndr alt_type alts
329 alt_regs = map idToReg ret_bndrs
330 simple_scrut = isSimpleScrut scrut alt_type
331 gcInAlts | not simple_scrut = True
332 | isSingleton alts = False
333 | up_hp_usg > 0 = False
335 gc_plan = if gcInAlts then GcInAlts alt_regs srt else NoGcInAlts
337 ; mb_cc <- maybeSaveCostCentre simple_scrut
338 ; withSequel (AssignTo alt_regs gcInAlts) (cgExpr scrut)
339 ; restoreCurrentCostCentre mb_cc
341 -- JD: We need Note: [Better Alt Heap Checks]
342 ; _ <- bindArgsToRegs ret_bndrs
343 ; cgAlts gc_plan (NonVoid bndr) alt_type alts }
346 maybeSaveCostCentre :: Bool -> FCode (Maybe LocalReg)
347 maybeSaveCostCentre simple_scrut
348 | simple_scrut = saveCurrentCostCentre
349 | otherwise = return Nothing
353 isSimpleScrut :: StgExpr -> AltType -> Bool
354 -- Simple scrutinee, does not block or allocate; hence safe to amalgamate
355 -- heap usage from alternatives into the stuff before the case
356 -- NB: if you get this wrong, and claim that the expression doesn't allocate
357 -- when it does, you'll deeply mess up allocation
358 isSimpleScrut (StgOpApp op _ _) _ = isSimpleOp op
359 isSimpleScrut (StgLit _) _ = True -- case 1# of { 0# -> ..; ... }
360 isSimpleScrut (StgApp _ []) (PrimAlt _) = True -- case x# of { 0# -> ..; ... }
361 isSimpleScrut _ _ = False
363 isSimpleOp :: StgOp -> Bool
364 -- True iff the op cannot block or allocate
365 isSimpleOp (StgFCallOp (CCall (CCallSpec _ _ safe)) _) = not (playSafe safe)
366 isSimpleOp (StgPrimOp op) = not (primOpOutOfLine op)
367 isSimpleOp (StgPrimCallOp _) = False
370 chooseReturnBndrs :: Id -> AltType -> [StgAlt] -> [NonVoid Id]
371 -- These are the binders of a case that are assigned
372 -- by the evaluation of the scrutinee
373 -- Only non-void ones come back
374 chooseReturnBndrs bndr (PrimAlt _) _alts
377 chooseReturnBndrs _bndr (UbxTupAlt _) [(_, ids, _, _)]
378 = nonVoidIds ids -- 'bndr' is not assigned!
380 chooseReturnBndrs bndr (AlgAlt _) _alts
381 = nonVoidIds [bndr] -- Only 'bndr' is assigned
383 chooseReturnBndrs bndr PolyAlt _alts
384 = nonVoidIds [bndr] -- Only 'bndr' is assigned
386 chooseReturnBndrs _ _ _ = panic "chooseReturnBndrs"
387 -- UbxTupALt has only one alternative
389 -------------------------------------
390 cgAlts :: GcPlan -> NonVoid Id -> AltType -> [StgAlt] -> FCode ()
391 -- At this point the result of the case are in the binders
392 cgAlts gc_plan _bndr PolyAlt [(_, _, _, rhs)]
393 = maybeAltHeapCheck gc_plan (cgExpr rhs)
395 cgAlts gc_plan _bndr (UbxTupAlt _) [(_, _, _, rhs)]
396 = maybeAltHeapCheck gc_plan (cgExpr rhs)
397 -- Here bndrs are *already* in scope, so don't rebind them
399 cgAlts gc_plan bndr (PrimAlt _) alts
400 = do { tagged_cmms <- cgAltRhss gc_plan bndr alts
402 ; let bndr_reg = CmmLocal (idToReg bndr)
403 (DEFAULT,deflt) = head tagged_cmms
404 -- PrimAlts always have a DEFAULT case
405 -- and it always comes first
407 tagged_cmms' = [(lit,code)
408 | (LitAlt lit, code) <- tagged_cmms]
409 ; emit (mkCmmLitSwitch (CmmReg bndr_reg) tagged_cmms' deflt) }
411 cgAlts gc_plan bndr (AlgAlt tycon) alts
412 = do { tagged_cmms <- cgAltRhss gc_plan bndr alts
414 ; let fam_sz = tyConFamilySize tycon
415 bndr_reg = CmmLocal (idToReg bndr)
416 mb_deflt = case tagged_cmms of
417 ((DEFAULT,rhs) : _) -> Just rhs
419 -- DEFAULT is always first, if present
421 branches = [ (dataConTagZ con, cmm)
422 | (DataAlt con, cmm) <- tagged_cmms ]
424 -- Is the constructor tag in the node reg?
425 ; if isSmallFamily fam_sz
426 then let -- Yes, bndr_reg has constr. tag in ls bits
427 tag_expr = cmmConstrTag1 (CmmReg bndr_reg)
428 branches' = [(tag+1,branch) | (tag,branch) <- branches]
430 emitSwitch tag_expr branches' mb_deflt 1 fam_sz
432 else -- No, get tag from info table
433 let -- Note that ptr _always_ has tag 1
434 -- when the family size is big enough
435 untagged_ptr = cmmRegOffB bndr_reg (-1)
436 tag_expr = getConstrTag (untagged_ptr)
438 emitSwitch tag_expr branches mb_deflt 0 (fam_sz - 1) }
440 cgAlts _ _ _ _ = panic "cgAlts"
441 -- UbxTupAlt and PolyAlt have only one alternative
444 cgAltRhss :: GcPlan -> NonVoid Id -> [StgAlt] -> FCode [(AltCon, CmmAGraph)]
445 cgAltRhss gc_plan bndr alts
446 = forkAlts (map cg_alt alts)
448 base_reg = idToReg bndr
449 cg_alt :: StgAlt -> FCode (AltCon, CmmAGraph)
450 cg_alt (con, bndrs, _uses, rhs)
452 maybeAltHeapCheck gc_plan $
453 do { _ <- bindConArgs con base_reg bndrs
457 maybeAltHeapCheck :: GcPlan -> FCode a -> FCode a
458 maybeAltHeapCheck NoGcInAlts code
460 maybeAltHeapCheck (GcInAlts regs _) code
461 = altHeapCheck regs code
463 -----------------------------------------------------------------------------
465 -----------------------------------------------------------------------------
467 cgConApp :: DataCon -> [StgArg] -> FCode ()
468 cgConApp con stg_args
469 | isUnboxedTupleCon con -- Unboxed tuple: assign and return
470 = do { arg_exprs <- getNonVoidArgAmodes stg_args
471 ; tickyUnboxedTupleReturn (length arg_exprs)
472 ; emitReturn arg_exprs }
474 | otherwise -- Boxed constructors; allocate and return
475 = ASSERT( stg_args `lengthIs` dataConRepArity con )
476 do { (idinfo, init) <- buildDynCon (dataConWorkId con) currentCCS con stg_args
477 -- The first "con" says that the name bound to this closure is
478 -- is "con", which is a bit of a fudge, but it only affects profiling
481 ; emitReturn [idInfoToAmode idinfo] }
484 cgIdApp :: Id -> [StgArg] -> FCode ()
485 cgIdApp fun_id [] | isVoidId fun_id = emitReturn []
487 = do { fun_info <- getCgIdInfo fun_id
488 ; case maybeLetNoEscape fun_info of
489 Just (blk_id, lne_regs) -> cgLneJump blk_id lne_regs args
490 Nothing -> cgTailCall fun_id fun_info args }
492 cgLneJump :: BlockId -> [LocalReg] -> [StgArg] -> FCode ()
493 cgLneJump blk_id lne_regs args -- Join point; discard sequel
494 = do { cmm_args <- getNonVoidArgAmodes args
495 ; emit (mkMultiAssign lne_regs cmm_args
496 <*> mkBranch blk_id) }
498 cgTailCall :: Id -> CgIdInfo -> [StgArg] -> FCode ()
499 cgTailCall fun_id fun_info args = do
500 dflags <- getDynFlags
501 case (getCallMethod dflags fun_name (idCafInfo fun_id) lf_info (length args)) of
503 -- A value in WHNF, so we can just return it.
504 ReturnIt -> emitReturn [fun] -- ToDo: does ReturnIt guarantee tagged?
506 EnterIt -> ASSERT( null args ) -- Discarding arguments
507 do { let fun' = CmmLoad fun (cmmExprType fun)
508 ; [ret,call] <- forkAlts [
509 getCode $ emitReturn [fun], -- Is tagged; no need to untag
510 getCode $ do -- emit (mkAssign nodeReg fun)
511 emitCall (NativeNodeCall, NativeReturn)
512 (entryCode fun') [fun]] -- Not tagged
513 ; emit (mkCmmIfThenElse (cmmIsTagged fun) ret call) }
515 SlowCall -> do -- A slow function call via the RTS apply routines
516 { tickySlowCall lf_info args
517 ; emit $ mkComment $ mkFastString "slowCall"
518 ; slowCall fun args }
520 -- A direct function call (possibly with some left-over arguments)
521 DirectEntry lbl arity -> do
522 { tickyDirectCall arity args
523 ; if node_points then
524 do emit $ mkComment $ mkFastString "directEntry"
525 emit (mkAssign nodeReg fun)
526 directCall lbl arity args
527 else do emit $ mkComment $ mkFastString "directEntry else"
528 directCall lbl arity args }
530 JumpToIt {} -> panic "cgTailCall" -- ???
533 fun_name = idName fun_id
534 fun = idInfoToAmode fun_info
535 lf_info = cgIdInfoLF fun_info
536 node_points = nodeMustPointToIt lf_info
539 {- Note [case on Bool]
541 A case on a Boolean value does two things:
542 1. It looks up the Boolean in a closure table and assigns the
543 result to the binder.
544 2. It branches to the True or False case through analysis
545 of the closure assigned to the binder.
546 But the indirection through the closure table is unnecessary
547 if the assignment to the binder will be dead code (use isDeadBndr).
549 The following example illustrates how badly the code turns out:
551 case <=## [ww_s7Hx y_s7HD] of wild2_sbH8 {
552 GHC.Types.False -> <true code> // sbH8 dead
553 GHC.Types.True -> <false code> // sbH8 dead
556 _s7HD::F64 = F64[_sbH7::I64 + 7]; // MidAssign
557 _ccsW::I64 = %MO_F_Le_W64(_s7Hx::F64, _s7HD::F64); // MidAssign
558 // emitReturn // MidComment
559 _sbH8::I64 = I64[ghczmprim_GHCziBool_Bool_closure_tbl + (_ccsW::I64 << 3)]; // MidAssign
560 _ccsX::I64 = _sbH8::I64 & 7; // MidAssign
561 if (_ccsX::I64 >= 2) goto ccsH; else goto ccsI; // LastCondBranch
563 The assignments to _sbH8 and _ccsX are completely unnecessary.
564 Instead, we should branch based on the value of _ccsW.
567 {- Note [Better Alt Heap Checks]
568 If two function calls can share a return point, then they will also
569 get the same info table. Therefore, it's worth our effort to make
570 those opportunities appear as frequently as possible.
572 Here are a few examples of how it should work:
576 True -> <True code -- including allocation>
577 False -> <False code>
579 r = call f(x) returns to L;
581 if r & 7 >= 2 goto L1 else goto L2;
586 <True code -- including allocation>
589 Note that the code following both the call to f(x) and the code to gc(r)
590 should be the same, which will allow the common blockifier to discover
591 that they are the same. Therefore, both function calls will return to the same
592 block, and they will use the same info table.
594 Here's an example of the Cmm code we want from a primOp.
595 The primOp doesn't produce an info table for us to reuse, but that's okay:
596 we should still generate the same code:
599 0 -> <0-case code -- including allocation>
600 _ -> <default-case code>
604 if r == 0 then goto L1 else goto L2;
609 <0-case code -- including allocation>