1 ------------------------------------
2 Mainly PredTypes (28 Sept 00)
3 ------------------------------------
5 Three things in this commit:
7 1. Main thing: tidy up PredTypes
8 2. Move all Keys into PrelNames
9 3. Check for unboxed tuples in function args
13 The main thing in this commit is to modify the representation of Types
14 so that they are a (much) better for the qualified-type world. This
15 should simplify Jeff's life as he proceeds with implicit parameters
16 and functional dependencies. In particular, PredType, introduced by
17 Jeff, is now blessed and dignified with a place in TypeRep.lhs:
19 data PredType = Class Class [Type]
22 Consider these examples:
23 f :: (Eq a) => a -> Int
24 g :: (?x :: Int -> Int) => a -> Int
25 h :: (r\l) => {r} => {l::Int | r}
27 Here the "Eq a" and "?x :: Int -> Int" and "r\l" are all called
28 *predicates*, and are represented by a PredType. (We don't support
29 TREX records yet, but the setup is designed to expand to allow them.)
31 In addition, Type gains an extra constructor:
33 data Type = .... | PredTy PredType
35 so that PredType is injected directly into Type. So the type
40 I have deleted the hackish IPNote stuff; predicates are dealt with entirely
41 through PredTys, not through NoteTy at all.
44 2. Move Keys into PrelNames
45 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
46 This is just a housekeeping operation. I've moved all the pre-assigned Uniques
47 (aka Keys) from Unique.lhs into PrelNames.lhs. I've also moved knowKeyRdrNames
48 from PrelInfo down into PrelNames. This localises in PrelNames lots of stuff
49 about predefined names. Previously one had to alter three files to add one,
54 Add a static check for unboxed tuple arguments. E.g.
55 data T = T (# Int, Int #)
60 ---------------------------------------
62 ---------------------------------------
64 -funfolding-update-in-place
65 Switching it on doesn't affect many programs, except these
66 sphere is because it makes a critical function (vecsub) more inlinable
74 cichelli 33546k +2.69%
75 typecheck 146023k +1.47%
78 ---------------------------------------
79 Simon's tuning changes: early Sept 2000
80 ---------------------------------------
84 * Eta expand PrelShow.showLitChar. It's impossible to compile this well,
85 and it makes a big difference to some programs (e.g. gen_regexps)
87 * Make PrelList.concat into a good producer (in the foldr/build sense)
92 * Add -ddump-hi-diffs to print out changes in interface files. Useful
93 when watching what the compiler is doing
95 * Add -funfolding-update-in-place to enable the experimental optimisation
96 that makes the inliner a bit keener to inline if it's in the RHS of
97 a thunk that might be updated in place. Sometimes this is a bad idea
98 (one example is in spectral/sphere; see notes in nofib/Simon-nofib-notes)
103 * Fix a bug in SetLevels.lvlMFE. (change ctxt_lvl to dest_level)
104 I don't think this has any performance effect, but it saves making
105 a redundant let-binding that is later eliminated.
107 * Desugar.dsProgram and DsForeign
108 Glom together all the bindings into a single Rec. Previously the
109 bindings generated by 'foreign' declarations were not glommed together, but
110 this led to an infelicity (i.e. poorer code than necessary) in the modules
111 that actually declare Float and Double (explained a bit more in Desugar.dsProgram)
113 * OccurAnal.shortMeOut and IdInfo.shortableIdInfo
114 Don't do the occurrence analyser's shorting out stuff for things which
115 have rules. Comments near IdInfo.shortableIdInfo.
116 This is deeply boring, and mainly to do with making rules work well.
117 Maybe rules should have phases attached too....
119 * CprAnalyse.addIdCprInfo
120 Be a bit more willing to add CPR information to thunks;
121 in particular, if the strictness analyser has just discovered that this
122 is a strict let, then the let-to-case transform will happen, and CPR is fine.
123 This made a big difference to PrelBase.modInt, which had something like
124 modInt = \ x -> let r = ... -> I# v in
125 ...body strict in r...
126 r's RHS isn't a value yet; but modInt returns r in various branches, so
127 if r doesn't have the CPR property then neither does modInt
129 * MkId.mkDataConWrapId
130 Arrange that vanilla constructors, like (:) and I#, get unfoldings that are
131 just a simple variable $w:, $wI#. This ensures they'll be inlined even into
132 rules etc, which makes matching a bit more reliable. The downside is that in
133 situations like (map (:) xs), we'll end up with (map (\y ys. $w: y ys) xs.
134 Which is tiresome but it doesn't happen much.
136 * SaAbsInt.findStrictness
137 Deal with the case where a thing with no arguments is bottom. This is Good.
138 E.g. module M where { foo = error "help" }
139 Suppose we have in another module
141 Then we'd like to do the case-of-error transform, without inlining foo.
146 * Reorganised Simplify.completeBinding (again).
148 * Removed the is_bot field in CoreUnfolding (is_cheap is true if is_bot is!)
149 This is just a tidy up
152 Remove the NewCon constructor from ConDecl. It just added code, and nothing else.
153 And it led to a bug in MkIface, which though that a newtype decl was always changing!
155 * IdInfo and many others
156 Remove all vestiges of UpdateInfo (hasn't been used for years)
158 ------------------------------
159 Join points Sept 2000
160 ------------------------------
162 With Andrew Kennedy, I found out why a few of the join points introduced by
163 the simplifier end up as *not* let-no-escpaed. Here's an example:
165 f x y = case (pwr x b) == 1 of
170 f = \ @ t w :: Integer ->
172 $j :: (State# RealWorld -> Bool)
175 = \ w1 :: (State# RealWorld) ->
176 case pwr w c of wild {
177 S# i -> case i of wild1 { 1 -> $wTrue; __DEFAULT -> $wFalse };
179 case cmpIntegerInt# s d1 1 of wild2 {
180 0 -> $wTrue; __DEFAULT -> $wFalse
184 case pwr w b of wild {
186 case i of wild1 { 1 -> $j realWorld#; __DEFAULT -> $wFalse };
188 case cmpIntegerInt# s d1 1 of wild2 {
189 0 -> $j realWorld#; __DEFAULT -> $wFalse
199 Suppose f is inlined into this case. No new join points are introduced,
200 because the alternatives are both small. But the consumer
201 case [.] of {True -> False; False -> True}
202 will move into the body of f, be duplicated 4 ways, and end up consuming
203 the result of the four outcomes at the body of f. This yields:
204 $j :: (State# RealWorld -> Bool)
207 = \ w1 :: (State# RealWorld) ->
208 case pwr w c of wild {
209 S# i -> case i of wild1 { 1 -> $wTrue; __DEFAULT -> $wFalse };
211 case cmpIntegerInt# s d1 1 of wild2 {
212 0 -> $wTrue; __DEFAULT -> $wFalse
216 case pwr w b of wild {
218 case i of wild1 { 1 -> case $j realWorld# of {T->F; F->T}
219 ; __DEFAULT -> $wTrue };
221 case cmpIntegerInt# s d1 1 of wild2 {
222 0 -> case $j realWorld# of {T->F; F->T}
223 ; __DEFAULT -> $wTrue
227 And, voila, the join point $j isn't let-no-escaped any more.
228 The point is that the consuming context can't "see inside" the join point.
229 It's a phase ordering thing. If f is inlined before the join points
230 are built in the first place, then all is well.
234 -----------------------------
236 -----------------------------
238 * Make the simplifier's Stop continuation record whether the expression being
239 simplified is the RHS of a thunk, or (say) the body of a lambda or case RHS.
240 In the thunk case we want to be a bit keener about inlining if the type of
241 the thunk is amenable to update in place.
243 * SetLevels was being a bit too eager to float things to the top
244 level; e.g. _inline_me_ (\a -> e); here e got floated...
245 Easily fixed by a change to ltMajLvl
247 * Make CoreUnfold.calcUnfoldingGuidance a bit less keen to make case expressions
248 seem small. The original idea was to make inlined wrappers look small, so that
249 when we inline a wrapper it doesn't make call site (much) bigger
250 Otherwise we get nasty phase ordering stuff:
253 If we inline g's wrapper, f looks big, and doesn't get inlined
254 into h; if we inline f first, while it looks small, then g's
255 wrapper will get inlined later anyway. To avoid this nasty
256 ordering difference, we make (case a of (x,y) -> ...),
257 *where a is one of the arguments* look free.
259 BUT (a) It's too eager. We don't want to inline a wrapper into a
260 context with no benefit.
261 E.g. \ x. f (x+x) o point in inlining (+) here!
263 (b) It's ineffective. Once g's wrapper is inlined, its case-expressions
264 aren't scrutinising arguments any more
266 So I've rescinded this idea for now. cases still look fairly small.
268 * Fix interestingArg, which was being too liberal, and hence doing
271 * Extended CoreUtils.exprIsCheap to make two more things cheap:
272 - case (coerce x) of ...
274 This makes a bit more eta expansion happen. It was provoked by
275 a program of Marcin's.
277 * The simplifier used to glom together all the top-level bindings into
278 a single Rec every time it was invoked. The reason for this is explained
279 in SimplCore.lhs, but for at least one simple program it meant that the
280 simplifier never got around to unravelling the recursive group into
281 non-recursive pieces. So I've put the glomming under explicit flag
282 control with a -fglom-binds simplifier pass. A side benefit is
283 that because it happens less often, the (expensive) SCC algorithm
286 * MkIface.ifaceBinds. Make sure that we emit rules for things
287 (like class operations) that don't get a top-level binding in the
288 interface file. Previously such rules were silently forgotten.
290 * Move transformRhs to *after* simplification, which makes it a
291 little easier to do, and means that the arity it computes is
292 readily available to completeBinding. This gets much better
295 * Do coerce splitting in completeBinding. This gets good code for
296 newtype CInt = CInt Int
306 * Modify the meaning of "arity" so that during compilation it means
307 "if you apply this function to fewer args, it will do virtually
308 no work". So, for example
309 f = coerce t (\x -> e)
310 has arity at least 1. When a function is exported, it's arity becomes
311 the number of exposed, top-level lambdas, which is subtly different.
314 I removed CoreUtils.exprArity altogether: it looked only at the exposed
315 lambdas. Instead, we use exprEtaExpandArity exclusively.
317 All of this makes I/O programs work much better.
320 -----------------------------
322 -----------------------------
324 * PrimRep, TysPrim. Add PrimPtrRep as the representation for
325 MVars and MutVars. Previously they were given PtrRep, but that
326 crashed dataReturnConvPrim! Here's the program the killed it:
327 data STRef s a = STRef (MutVar# s a)
330 * Make the desugarer use string equality for string literal
331 patterns longer than 1 character. And put a specialised
332 eqString into PrelBase, with a suitable specialisation rule.
333 This makes a huge difference to the size of the code generated
334 by deriving(Read) notably in Time.lhs
336 -----------------------------
337 Marktoberdorf Commits (Aug 2000)
338 -----------------------------
340 1. Tidy up the renaming story for "system binders", such as
341 dictionary functions, default methods, constructor workers etc. These
342 are now documented in HsDecls. The main effect of the change, apart
343 from tidying up, is to make the *type-checker* (instead of the
344 renamer) generate names for dict-funs and default-methods. This is
345 good because Sergei's generic-class stuff generates new classes at
349 2. Fix the CSE pass so it does not require the no-shadowing invariant.
350 Keith discovered that the simplifier occasionally returns a result
351 with shadowing. After much fiddling around (which has improved the
352 code in the simplifier a bit) I found that it is nearly impossible to
353 arrange that it really does do no-shadowing. So I gave up and fixed
354 the CSE pass (which is the only one to rely on it) instead.
357 3. Fix a performance bug in the simplifier. The change is in
358 SimplUtils.interestingArg. It computes whether an argment should
359 be considered "interesting"; if a function is applied to an interesting
360 argument, we are more likely to inline that function.
363 The 'x' argument was considered "uninteresting" for a silly reason.
364 Since x only occurs once, it was unconditionally substituted, but
365 interestingArg didn't take account of that case. Now it does.
367 I also made interestingArg a bit more liberal. Let's see if we
368 get too much inlining now.
371 4. In the occurrence analyser, we were choosing a bad loop breaker.
372 Here's the comment that's now in OccurAnal.reOrderRec
374 score ((bndr, rhs), _, _)
375 | exprIsTrivial rhs = 3 -- Practically certain to be inlined
376 -- Used to have also: && not (isExportedId bndr)
377 -- But I found this sometimes cost an extra iteration when we have
378 -- rec { d = (a,b); a = ...df...; b = ...df...; df = d }
379 -- where df is the exported dictionary. Then df makes a really
380 -- bad choice for loop breaker
382 I also increased the score for bindings with a non-functional type, so that
383 dictionaries have a better chance of getting inlined early
386 5. Add a hash code to the InScopeSet (and make it properly abstract)
387 This should make uniqAway a lot more robust. Simple experiments suggest
388 that uniqAway no longer gets into the long iteration chains that it used
392 6. Fix a bug in the inliner that made the simplifier tend to get into
393 a loop where it would keep iterating ("4 iterations, bailing out" message).
394 In SimplUtils.mkRhsTyLam we float bindings out past a big lambda, thus:
395 x = /\ b -> let g = \x -> f x x
398 g* = /\a -> \x -> f x x
399 x = /\ b -> let g = g* b in E
401 It's essential that we don't simply inling g* back into the RHS of g,
402 else we will be back to square 1. The inliner is meant not to do this
403 because there's no benefit to the inlining, but the size calculation
404 was a little off in CoreUnfold.
407 7. In SetLevels we were bogus-ly building a Subst with an empty in-scope
408 set, so a WARNING popped up when compiling some modules. (knights/ChessSetList
409 was the example that tickled it.) Now in fact the warning wasn't an error,
410 but the Right Thing to do is to carry down a proper Subst in SetLevels, so
411 that is what I have now done. It is very little more expensive.
419 This is a pretty big commit! It adds stuff I've been working on
420 over the last month or so. DO NOT MERGE IT WITH 4.07!
422 Recompilation checking
423 ~~~~~~~~~~~~~~~~~~~~~~
424 Substantial improvement in recompilation checking. The version management
425 is now entirely internal to GHC. ghc-iface.lprl is dead!
427 The trick is to generate the new interface file in two steps:
428 - first convert Types etc to HsTypes etc, and thereby
429 build a new ParsedIface
430 - then compare against the parsed (but not renamed) version of the old
432 Doing this meant adding code to convert *to* HsSyn things, and to
433 compare HsSyn things for equality. That is the main tedious bit.
435 Another improvement is that we now track version info for
436 fixities and rules, which was missing before.
439 Interface file reading
440 ~~~~~~~~~~~~~~~~~~~~~~
441 Make interface files reading more robust.
442 * If the old interface file is unreadable, don't fail. [bug fix]
444 * If the old interface file mentions interfaces
445 that are unreadable, don't fail. [bug fix]
447 * When we can't find the interface file,
448 print the directories we are looking in. [feature]
453 * New flag -ddump-types to print type signatures
459 data T = T1 A | T2 B | T3 C
460 it seems excessive to import the types A, B, C as well, unless
461 the constructors T1, T2 etc are used. A,B,C might be more types,
462 and importing them may mean reading more interfaces, and so on.
463 So the idea is that the renamer will just import the decl
465 unless one of the constructors is used. This turns out to be quite
466 easy to implement. The downside is that we must make sure the
467 constructors are always available if they are really needed, so
468 I regard this as an experimental feature.
471 Elimininate ThinAir names
472 ~~~~~~~~~~~~~~~~~~~~~~~~~
473 Eliminate ThinAir.lhs and all its works. It was always a hack, and now
474 the desugarer carries around an environment I think we can nuke ThinAir
477 As part of this, I had to move all the Prelude RdrName defns from PrelInfo
478 to PrelMods --- so I renamed PrelMods as PrelNames.
480 I also had to move the builtinRules so that they are injected by the renamer
481 (rather than appearing out of the blue in SimplCore). This is if anything simpler.
485 * Tidy up the data types involved in Rules
487 * Eliminate RnEnv.better_provenance; use Name.hasBetterProv instead
489 * Add Unique.hasKey :: Uniquable a => a -> Unique -> Bool
490 It's useful in a lot of places
492 * Fix a bug in interface file parsing for __U[!]
495 =======================================
498 * Try the effect of enhancing update in place with the CPR
499 idea in CoreUnfold.calcUnfoldingGuidance
501 * Check with Simon M re srt on Lit
503 * Make all primops return a data type so that we can't over-apply a primop
504 This makes code gen simpler. Currently the only primops with a polymorphic
507 catch# :: a -> (b->a) -> a
508 tagToEnum# :: Int -> a
510 Very strange code for PrelException.catchException! What has STret got
515 * Missing w/w for coerce in go2 functions of fibToList' in fibheaps
517 * Watch out for re-boxing in workers; sometimes it happens
518 and then w/w is a Bad Thing
520 * Only two uses of mkCompulsoryUnfolding -- try to nuke it
522 * Note that mkDupAlt makes alts that have binders that
523 are guaranteed to appear just once or not at all
525 Same for case binder, but that's harder to take into account.
527 * max :: Int -> Int -> Int could be CPRd but isn't.
529 * In mandel2 we do a little less well than 4.04 because we aren't
530 inlining point_colour, and that means we have to box up an argument
531 before calling it. [This was due to a bug in 4.04]
532 There's also a great opportunity for liberateCase
533 in check_radius, where it loops around with two lazy F# built each time
535 * In PrelShow.itos' we find a thunk like:
536 tpl = case chrzh {(zpzh {(remIntzh {x{-aMf-} 10}) 48})}
537 of tpl{-X1j-} __D P { __DEFAULT ->
538 PrelBase.Czh{-62,s-} {tpl{-X1j-}}
540 This is a pity. The remInt# can't fail because the divisor isn't 0,
541 so we could do the sum eagerly and allocate a charcter instead of a thunk.
543 * It's good to do let-to-case before we wrap up. Consider
544 f b xs = let ys = partition isUpper xs
545 zs = case ys of (a,b) -> a
550 (a,b) -> (zs ++ xs,[])
551 If we don't do let-to-case at all, we get 3 redundant case ys left.
552 On the other hand we don't want to do it too early, because it
553 prevents inlining into strict arg positions, which is important for
556 * Strict dictionaries.
558 * INLINE functions are not always INLINEd, so it's sad to leave
559 stuff in their bodies like constructors that havn't been inlined.
561 * If let x = e in b is strict, then CPR can use the CPR info from x
562 This bites in the mod method of Integral Int
564 * Inline wrappers if they are the RHS of a let, so that update in place
567 * Consider doing unboxing on strict constr args in a pattern match,
570 * In spectral/expert/Search.ask there's a statically visible CSE. Catching this
571 depends almost entirely on chance, which is a pity.
573 * Think about exprEtaExpandArity in WwLib. Perhaps eliminate eta expand in simplify?
574 Perhaps use even if no coerces etc, just eta expansion. (e.g. PrelArr.done)
576 * In knights/KnightHeuristic, we don't find that possibleMoves is strict
577 (with important knock-on effects) unless we apply rules before floating
578 out the literal list [A,B,C...].
579 Similarly, in f_se (F_Cmp ...) in listcompr (but a smaller effect)
581 * Floating can float the entire body of an INLINE thing out.
583 This is sad, and a bit stupid.
585 * In spectral/multiplier, we have
586 xor = lift21 forceBit f
587 where f :: Bit -> Bit -> Bit
592 Trouble is, f is CPR'd, and that means that instead of returning
593 the constants I# 0, I# 1, it returns 0,1 and then boxes them.
594 So allocation goes up. I don't see a way around this.
596 * spectral/hartel/parstof ends up saying
597 case (unpackCString "x") of { c:cs -> ... }
598 quite a bit. We should spot these and behave accordingly.
600 * Try a different hashing algorithms in hashUFM. This might reduce long CSE lists
601 as well as making uniqAway faster.
603 * [I'm not sure this is really important in the end.]
604 Don't float out partial applications in lvlMFE. E.g. (in hPutStr defn of shoveString)
608 setBufWPtr has arity 3. Floating it out is plain silly. And in this particular
609 case it's harmful, because it ends up preventing eta expansion on the \x.
610 That in turn leads to a big extra cost in hPutStr.
612 *** Try not doing lvlMFE on the body of a lambda and case alternative ***
614 * PrelNumExtra.lhs we get three copies of dropTrailing0s. Too much inlining!
615 drop0 has cost 21, but gets a discount of 6 (3 * #constrs) for its arg.
616 With a keen-neess factor of 2, that makes a discount of 12. Add two for
617 the arguments and we get 21-12-2, which is just small enough to inline.
618 But that is plainly stupid.
620 Add one for cases; and decrease discount for constructors.
622 * IO.hGetContents still doesn't see that it is strict in the handle.
623 Coerces still getting in the way.
625 * Try not having really_interesting_cont (subsumed by changes in the
626 way guidance is calculated for inline things?)
628 * Enumeration types in worker/wrapper for strictness analysis
630 * This should be reported as an error:
631 data T k = MkT (k Int#)
633 * Bogus report of overlapped pattern for
634 f (R {field = [c]}) = 1
636 This shows up for TyCon.maybeTyConSingleCon
638 * > module Main( main ) where
650 For ``f'' the following is reported.
653 Pattern match(es) are overlapped in the definition of function `f'
656 There are no complaints for definition for ``g''.
658 * Without -O I don't think we need change the module version
659 if the usages change; I forget why it changes even with -O
661 * Record selectors for existential type; no good! What to do?
662 Record update doesn't make sense either.
664 Need to be careful when figuring out strictness, and when generating
665 worker-wrapper split.
673 A fairly big pile of work originally aimed at
674 removing the Con form of Core expression, and replacing it with simple
675 Lit form. However, I wanted to make sure that the resulting thing
676 performed better than the original, so I ended up making an absolute
677 raft of other changes.
679 Removing the Con form of Core expressions
680 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
681 The big thing is that
683 For every constructor C there are now *two* Ids:
685 C is the constructor's *wrapper*. It evaluates and unboxes arguments
686 before calling $wC. It has a perfectly ordinary top-level defn
687 in the module defining the data type.
689 $wC is the constructor's *worker*. It is like a primop that simply
690 allocates and builds the constructor value. Its arguments are the
691 actual representation arguments of the constructor.
693 For every primop P there is *one* Id, its (curried) Id
695 Neither contructor worker Id nor the primop Id have a defminition anywhere.
696 Instead they are saturated during the core-to-STG pass, and the code generator
697 generates code for them directly. The STG language still has saturated
698 primops and constructor applications.
700 * The Const type disappears, along with Const.lhs. The literal part
701 of Const.lhs reappears as Literal.lhs. Much tidying up in here,
702 to bring all the range checking into this one module.
704 * I got rid of NoRep literals entirely. They just seem to be too much trouble.
706 * Because Con's don't exist any more, the funny C { args } syntax
707 disappears from inteface files.
709 * Every constructor, C, comes with a
711 *wrapper*, called C, whose type is exactly what it looks like
712 in the source program. It is an ordinary function,
713 and it gets a top-level binding like any other function
715 *worker*, called $wC, which is the actual data constructor.
716 Its type may be different to C, because:
717 - useless dict args are dropped
718 - strict args may be flattened
719 It does not have a binding.
721 The worker is very like a primop, in that it has no binding,
726 * Result type signatures now work
727 f :: Int -> Int = \x -> x
728 -- The Int->Int is the type of f
731 -- The Int is the type of the result of (g x y)
734 Recompilation checking and make
735 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
736 * The .hi file for a modules is not touched if it doesn't change. (It used to
737 be touched regardless, forcing a chain of recompilations.) The penalty for this
738 is that we record exported things just as if they were mentioned in the body of
739 the module. And the penalty for that is that we may recompile a module when
740 the only things that have changed are the things it is passing on without using.
741 But it seems like a good trade.
743 * -recomp is on by default
748 foreign export zoo :: Int -> IO Int
749 then you get a C produre called 'zoo', not 'zzoo' as before.
750 I've also added a check that complains if you export (or import) a C
751 procedure whose name isn't legal C.
754 Code generation and labels
755 ~~~~~~~~~~~~~~~~~~~~~~~~~~
756 * Now that constructor workers and wrappers have distinct names, there's
757 no need to have a Foo_static_closure and a Foo_closure for constructor Foo.
758 I nuked the entire StaticClosure story. This has effects in some of
759 the RTS headers (i.e. s/static_closure/closure/g)
762 Rules, constant folding
763 ~~~~~~~~~~~~~~~~~~~~~~~
764 * Constant folding becomes just another rewrite rule, attached to the Id for the
765 PrimOp. To achieve this, there's a new form of Rule, a BuiltinRule (see CoreSyn.lhs).
766 The prelude rules are in prelude/PrelRules.lhs, while simplCore/ConFold.lhs has gone.
768 * Appending of constant strings now works, using fold/build fusion, plus
770 unpack "foo" c (unpack "baz" c n) = unpack "foobaz" c n
771 Implemented in PrelRules.lhs
773 * The CCall primop is tidied up quite a bit. There is now a data type CCall,
774 defined in PrimOp, that packages up the info needed for a particular CCall.
775 There is a new Id for each new ccall, with an big "occurrence name"
776 {__ccall "foo" gc Int# -> Int#}
777 In interface files, this is parsed as a single Id, which is what it is, really.
781 * There were numerous places where the host compiler's
782 minInt/maxInt was being used as the target machine's minInt/maxInt.
783 I nuked all of these; everything is localised to inIntRange and inWordRange,
786 * Desugaring record updates was broken: it didn't generate correct matches when
787 used withe records with fancy unboxing etc. It now uses matchWrapper.
789 * Significant tidying up in codeGen/SMRep.lhs
791 * Add __word, __word64, __int64 terminals to signal the obvious types
792 in interface files. Add the ability to print word values in hex into
795 * PrimOp.lhs is no longer part of a loop. Remove PrimOp.hi-boot*
800 * isProductTyCon no longer returns False for recursive products, nor
801 for unboxed products; you have to test for these separately.
802 There's no reason not to do CPR for recursive product types, for example.
803 Ditto splitProductType_maybe.
807 * New -fno-case-of-case flag for the simplifier. We use this in the first run
808 of the simplifier, where it helps to stop messing up expressions that
809 the (subsequent) full laziness pass would otherwise find float out.
810 It's much more effective than previous half-baked hacks in inlining.
812 Actually, it turned out that there were three places in Simplify.lhs that
813 needed to know use this flag.
815 * Make the float-in pass push duplicatable bindings into the branches of
816 a case expression, in the hope that we never have to allocate them.
817 (see FloatIn.sepBindsByDropPoint)
819 * Arrange that top-level bottoming Ids get a NOINLINE pragma
820 This reduced gratuitous inlining of error messages.
821 But arrange that such things still get w/w'd.
823 * Arrange that a strict argument position is regarded as an 'interesting'
824 context, so that if we see
826 then we'll be inclined to inline g; this can expose a build.
828 * There was a missing case in CoreUtils.exprEtaExpandArity that meant
829 we were missing some obvious cases for eta expansion
830 Also improve the code when handling applications.
832 * Make record selectors (identifiable by their IdFlavour) into "cheap" operations.
833 [The change is a 2-liner in CoreUtils.exprIsCheap]
834 This means that record selection may be inlined into function bodies, which
835 greatly improves the arities of overloaded functions.
837 * Make a cleaner job of inlining "lone variables". There was some distributed
838 cunning, but I've centralised it all now in SimplUtils.analyseCont, which
839 analyses the context of a call to decide whether it is "interesting".
841 * Don't specialise very small functions in Specialise.specDefn
842 It's better to inline it. Rather like the worker/wrapper case.
844 * Be just a little more aggressive when floating out of let rhss.
845 See comments with Simplify.wantToExpose
846 A small change with an occasional big effect.
848 * Make the inline-size computation think that
849 case x of I# x -> ...
855 * Fix what was essentially a bug in CPR analysis. Consider
857 letrec f x = let g y = let ... in f e1
859 if ... then (a,b) else g x
861 g has the CPR property if f does; so when generating the final annotated
862 RHS for f, we must use an envt in which f is bound to its final abstract
863 value. This wasn't happening. Instead, f was given the CPR tag but g
864 wasn't; but of course the w/w pass gives rotten results in that case!!
865 (Because f's CPR-ness relied on g's.)
867 On they way I tidied up the code in CprAnalyse. It's quite a bit shorter.
869 The fact that some data constructors return a constructed product shows
870 up in their CPR info (MkId.mkDataConId) not in CprAnalyse.lhs
874 Strictness analysis and worker/wrapper
875 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
876 * BIG THING: pass in the demand to StrictAnal.saExpr. This affects situations
878 f (let x = e1 in (x,x))
879 where f turns out to have strictness u(SS), say. In this case we can
880 mark x as demanded, and use a case expression for it.
882 The situation before is that we didn't "know" that there is the u(SS)
883 demand on the argument, so we simply computed that the body of the let
884 expression is lazy in x, and marked x as lazily-demanded. Then even after
887 let x = e1 in case (x,x) of (a,b) -> $wf a b
891 let x = e1 in $wf a b
893 I found a much more complicated situation in spectral/sphere/Main.shade,
894 which improved quite a bit with this change.
896 * Moved the StrictnessInfo type from IdInfo to Demand. It's the logical
897 place for it, and helps avoid module loops
899 * Do worker/wrapper for coerces even if the arity is zero. Thus:
900 stdout = coerce Handle (..blurg..)
902 wibble = (...blurg...)
903 stdout = coerce Handle wibble
904 This is good because I found places where we were saying
905 case coerce t stdout of { MVar a ->
907 case coerce t stdout of { MVar b ->
909 and the redundant case wasn't getting eliminated because of the coerce.
915 * Fix a few renamer bugs
917 * Substantially reorganise the Prelude to eliminate all orphan declarations.
918 Details in PrelBase.lhs
920 * Do a much better job of appending literal strings
922 - move unpackCString stuff to PrelBase
923 - add BuiltinRules to the Rule type
924 - add fold/build rules for literal strings
930 * Fix a terrible bug in Simplify.mkDupableAlt; we were duplicating a small
931 *InAlt*, but doing so invalidated occurrence info, which could lead to
932 substantial code duplication.
934 * Fix a bug in WwLib.mkWWcpr; I was generating CPR wrappers like
936 which is utterly wrong. It should be
938 (The effect was to make functions stricter than they really are.)
940 * Try doing no inlining at all in phase 0. This noticeably improved
941 spectral/fish (esp Main.hs I think), by improving floating.
942 This single change has quite a large effect on some programs (allocation)
944 Don't inline Don't inline
946 in phase 0 in phase 0
948 cichelli 28962k -3.12%
949 wave4main 88089k +130.45%
950 fibheaps 31731k +19.01%
952 typecheck 148713k +4.91%
954 But I found that fish worked much better if we inline *local* things
955 in phase 0, but not *imported* things.
957 * Fix a terrible bug in Simplify.mkLamBndrZapper. It was counting
958 type args in one place, but not type binders, so it was sometimes
959 inlining into unsaturated lambdas!
961 * I found that there were some very bad loss-of-arity cases in PrelShow.
962 In particular, we had:
964 showl "" = showChar '"' s
965 showl ('"':xs) = showString "\\\"" . showl xs
966 showl (x:xs) = showLitChar x . showl xs
969 showl = \xs -> case xs of
971 (x:xs) -> let f = showLitChar x
974 which is TERRIBLE. We can't spot that showLitChar has arity 2 because
978 showLitChar c = showString ('\\' : asciiTab!!ord c)
980 notice that the (asciiTab!!orc c) is outside the \s, so GHC can't rewrite it to
982 showLitChar c = \s -> showString ('\\' : asciiTab!!ord c) s
984 So I've changed PrelShow.showLitChar to use explicit \s. Even then, showl
985 doesn't work, because GHC can't see that showl xs can be pushed inside the \s.
986 So I've put an explict \s there too.
988 showl "" s = showChar '"' s
989 showl ('"':xs) s = showString "\\\"" (showl xs s)
990 showl (x:xs) s = showLitChar x (showl xs s)
992 Net result: imaginary/gen_regexps more than halves in allocation!
994 Turns out that the mkLamBndrZapper bug (above) meant that showl was
995 erroneously inlining showLitChar x and showl xs, which is why this
996 problem hasn't shown up before.
998 * Improve CSE a bit. In ptic
999 case h x of y -> ...(h x)...
1000 replaces (h x) by y.
1002 * Inline INLINE things very agressively, even though we get code duplication
1003 thereby. Reason: otherwise we sometimes call the original un-inlined INLINE
1004 defns, which have constructors etc still un-inlined in their RHSs. The
1005 improvement is dramatic for a few programs:
1007 typecheck 150865k -1.43%
1008 wave4main 114216k -22.87%
1010 cichelli 33786k -14.28%
1012 rewrite 14665k -4.91%
1015 Code size increases by 10% which is not so good. There must be a better way.
1016 Another bad thing showed up in fish/Main.hs. Here we have
1017 (x1,y1) `vec_add` (x2,y2) = (x1+x2, y1+y2)
1018 which tends to get inlined. But if we first inline (+), it looks big,
1019 so we don't inline it. Sigh.
1022 * Don't inline constructors in INLINE RHSs. Ever. Otherwise rules don't match.
1025 * In ebnf2ps/Lexer.uncommentString, it would be a good idea to inline a constructor
1026 that occurs once in each branch of a case. That way it doesn't get allocated
1027 in the branches that don't use it. And in fact in this particular case
1028 something else good happens. So CoreUnfold now does that.
1030 * Reverted to n_val_binders+2 in calcUnfoldingGuidance
1031 Otherwise wrappers are inlined even if there's no benefit.
1036 * Arrange that simplConArgs works in one less pass than before.
1037 This exposed a bug: a bogus call to completeBeta.
1039 * Add a top-level flag in CoreUnfolding, used in callSiteInline
1041 * Extend w/w to use etaExpandArity, so it does eta/coerce expansion
1043 * Don't float anything out of an INLINE.
1044 Don't float things to top level unless they also escape a value lambda.
1045 [see comments with SetLevels.lvlMFE
1046 Without at least one of these changes, I found that
1047 {-# INLINE concat #-}
1048 concat = __inline (/\a -> foldr (++) [])
1049 was getting floated to
1050 concat = __inline( /\a -> lvl a )
1051 lvl = ...inlined version of foldr...
1053 Subsequently I found that not floating constants out of an INLINE
1054 gave really bad code like
1055 __inline (let x = e in \y -> ...)
1056 so I now let things float out of INLINE
1058 * Implement inline phases. The meaning of the inline pragmas is
1059 described in CoreUnfold.lhs
1061 * Implement the "reverse-mapping" idea for CSE; actually it turned out to be easier
1062 to implement it in SetLevels, and may benefit full laziness too.
1066 * It's a good idea to inline inRange. Consider
1068 index (l,h) i = case inRange (l,h) i of
1071 inRange itself isn't strict in h, but if it't inlined then 'index'
1072 *does* become strict in h. Interesting!
1074 * Big change to the way unfoldings and occurrence info is propagated in the simplifier
1075 The plan is described in Subst.lhs with the Subst type
1076 Occurrence info is now in a separate IdInfo field than user pragmas
1079 (coerce T (coerce S (\x.e))) y
1080 didn't simplify in one round. First we get to
1082 and only then do the beta. Solution: cancel the coerces in the continuation
1084 * Amazingly, CoreUnfold wasn't counting the cost of a function an application.
1088 * No commas between for-alls in RULES
1090 * Disable rules in initial simplifier run. Otherwise full laziness
1091 doesn't get a chance to lift out a MFE before a rule (e.g. fusion)
1092 zaps it. queens is a case in point
1094 * Improve float-out stuff significantly. The big change is that if we have
1096 \x -> ... /\a -> ...let p = ..a.. in let q = ...p...
1098 where p's rhs doesn't x, we abstract a from p, so that we can get p past x.
1099 (We did that before.) But we also substitute (p a) for p in q, and then
1100 we can do the same thing for q. (We didn't do that, so q got stuck.)
1101 This is much better. It involves doing a substitution "as we go" in SetLevels,
1107 * exprIsDupable for an application (f e1 .. en) wasn't calling exprIsDupable
1108 on the arguments!! So applications with few, but large, args were being dupliated.
1110 * sizeExpr on an application wasn't doing a nukeScrutDiscount on the arg of
1111 an application!! So bogus discounts could accumulate from arguments!
1113 * Improve handling of INLINE pragmas in calcUnfoldingGuidance. It was really
1116 * Substantially improve handling of coerces in worker/wrapper
1120 * Fix Kevin Atkinson's cant-find-instance bug. Turns out that Rename.slurpSourceRefs
1121 needs to repeatedly call getImportedInstDecls, and then go back to slurping
1122 source-refs. Comments with Rename.slurpSourceRefs.
1124 * Add a case to Simplify.mkDupableAlt for the quite-common case where there's
1125 a very simple alternative, in which case there's no point in creating a
1128 * Fix CoreUtils.exprOkForSpeculation so that it returns True of (==# a# b#).
1129 This lack meant that
1130 case ==# a# b# of { True -> x; False -> x }
1133 * Make float-out dump bindings at the top of a function argument, as
1134 at the top of a let(rec) rhs. See notes with FloatOut.floatRhs
1136 * Make the ArgOf case of mkDupableAlt generate a OneShot lambda.
1137 This gave a noticeable boost to spectral/boyer2
1142 Work, using IO.hPutStr as an example, to reduce the number of coerces.
1143 The main idea is in WwLib.mkWWcoerce. The gloss is that we must do
1144 the w/w split even for small non-recursive things. See notes with
1150 Study why gen_regexps is slower than before. Problem is in IO.writeLines,
1151 in particular the local defn shoveString. Two things are getting
1152 in the way of arity expansion, which means we build far more function
1153 closures than we should:
1154 shove = \ x -> let lvl = \s -> ...
1155 in \s -> ... lvl ...
1159 b) full laziness floats
1162 Solution to (a): add coerces to the worker/wrapper stuff.
1163 See notes with WwLib.mkWWcoerce.
1165 This further complicated getWorkerId, so I finally bit the bullet and
1166 make the workerInfo field of the IdInfo work properly, including
1167 under substitutions. Death to getWorkerId.
1171 Solution to (b): make all lambdas over realWorldStatePrimTy
1172 into one-shot lambdas. This is a GROSS HACK.
1174 * Also make the occurrence analyser aware of one-shot lambdas.
1179 Fix SetLevels so that it does not clone top-level bindings, but it
1180 *does* clone bindings that are destined for the top level.
1182 The global invariant is that the top level bindings are always
1183 unique, and never cloned.