1 ------------------------------------
3 ------------------------------------
5 * Don't forget to put deferred-type-decls back into RnIfaces
7 * Do we want to record a package name in a .hi file?
8 Does pi_mod have a ModuleName or a Module?
12 ------------------------------------
13 Mainly PredTypes (28 Sept 00)
14 ------------------------------------
16 Three things in this commit:
18 1. Main thing: tidy up PredTypes
19 2. Move all Keys into PrelNames
20 3. Check for unboxed tuples in function args
24 The main thing in this commit is to modify the representation of Types
25 so that they are a (much) better for the qualified-type world. This
26 should simplify Jeff's life as he proceeds with implicit parameters
27 and functional dependencies. In particular, PredType, introduced by
28 Jeff, is now blessed and dignified with a place in TypeRep.lhs:
30 data PredType = Class Class [Type]
33 Consider these examples:
34 f :: (Eq a) => a -> Int
35 g :: (?x :: Int -> Int) => a -> Int
36 h :: (r\l) => {r} => {l::Int | r}
38 Here the "Eq a" and "?x :: Int -> Int" and "r\l" are all called
39 *predicates*, and are represented by a PredType. (We don't support
40 TREX records yet, but the setup is designed to expand to allow them.)
42 In addition, Type gains an extra constructor:
44 data Type = .... | PredTy PredType
46 so that PredType is injected directly into Type. So the type
51 I have deleted the hackish IPNote stuff; predicates are dealt with entirely
52 through PredTys, not through NoteTy at all.
55 2. Move Keys into PrelNames
56 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
57 This is just a housekeeping operation. I've moved all the pre-assigned Uniques
58 (aka Keys) from Unique.lhs into PrelNames.lhs. I've also moved knowKeyRdrNames
59 from PrelInfo down into PrelNames. This localises in PrelNames lots of stuff
60 about predefined names. Previously one had to alter three files to add one,
65 Add a static check for unboxed tuple arguments. E.g.
66 data T = T (# Int, Int #)
71 ---------------------------------------
73 ---------------------------------------
75 -funfolding-update-in-place
76 Switching it on doesn't affect many programs, except these
77 sphere is because it makes a critical function (vecsub) more inlinable
85 cichelli 33546k +2.69%
86 typecheck 146023k +1.47%
89 ---------------------------------------
90 Simon's tuning changes: early Sept 2000
91 ---------------------------------------
95 * Eta expand PrelShow.showLitChar. It's impossible to compile this well,
96 and it makes a big difference to some programs (e.g. gen_regexps)
98 * Make PrelList.concat into a good producer (in the foldr/build sense)
103 * Add -ddump-hi-diffs to print out changes in interface files. Useful
104 when watching what the compiler is doing
106 * Add -funfolding-update-in-place to enable the experimental optimisation
107 that makes the inliner a bit keener to inline if it's in the RHS of
108 a thunk that might be updated in place. Sometimes this is a bad idea
109 (one example is in spectral/sphere; see notes in nofib/Simon-nofib-notes)
114 * Fix a bug in SetLevels.lvlMFE. (change ctxt_lvl to dest_level)
115 I don't think this has any performance effect, but it saves making
116 a redundant let-binding that is later eliminated.
118 * Desugar.dsProgram and DsForeign
119 Glom together all the bindings into a single Rec. Previously the
120 bindings generated by 'foreign' declarations were not glommed together, but
121 this led to an infelicity (i.e. poorer code than necessary) in the modules
122 that actually declare Float and Double (explained a bit more in Desugar.dsProgram)
124 * OccurAnal.shortMeOut and IdInfo.shortableIdInfo
125 Don't do the occurrence analyser's shorting out stuff for things which
126 have rules. Comments near IdInfo.shortableIdInfo.
127 This is deeply boring, and mainly to do with making rules work well.
128 Maybe rules should have phases attached too....
130 * CprAnalyse.addIdCprInfo
131 Be a bit more willing to add CPR information to thunks;
132 in particular, if the strictness analyser has just discovered that this
133 is a strict let, then the let-to-case transform will happen, and CPR is fine.
134 This made a big difference to PrelBase.modInt, which had something like
135 modInt = \ x -> let r = ... -> I# v in
136 ...body strict in r...
137 r's RHS isn't a value yet; but modInt returns r in various branches, so
138 if r doesn't have the CPR property then neither does modInt
140 * MkId.mkDataConWrapId
141 Arrange that vanilla constructors, like (:) and I#, get unfoldings that are
142 just a simple variable $w:, $wI#. This ensures they'll be inlined even into
143 rules etc, which makes matching a bit more reliable. The downside is that in
144 situations like (map (:) xs), we'll end up with (map (\y ys. $w: y ys) xs.
145 Which is tiresome but it doesn't happen much.
147 * SaAbsInt.findStrictness
148 Deal with the case where a thing with no arguments is bottom. This is Good.
149 E.g. module M where { foo = error "help" }
150 Suppose we have in another module
152 Then we'd like to do the case-of-error transform, without inlining foo.
157 * Reorganised Simplify.completeBinding (again).
159 * Removed the is_bot field in CoreUnfolding (is_cheap is true if is_bot is!)
160 This is just a tidy up
163 Remove the NewCon constructor from ConDecl. It just added code, and nothing else.
164 And it led to a bug in MkIface, which though that a newtype decl was always changing!
166 * IdInfo and many others
167 Remove all vestiges of UpdateInfo (hasn't been used for years)
169 ------------------------------
170 Join points Sept 2000
171 ------------------------------
173 With Andrew Kennedy, I found out why a few of the join points introduced by
174 the simplifier end up as *not* let-no-escpaed. Here's an example:
176 f x y = case (pwr x b) == 1 of
181 f = \ @ t w :: Integer ->
183 $j :: (State# RealWorld -> Bool)
186 = \ w1 :: (State# RealWorld) ->
187 case pwr w c of wild {
188 S# i -> case i of wild1 { 1 -> $wTrue; __DEFAULT -> $wFalse };
190 case cmpIntegerInt# s d1 1 of wild2 {
191 0 -> $wTrue; __DEFAULT -> $wFalse
195 case pwr w b of wild {
197 case i of wild1 { 1 -> $j realWorld#; __DEFAULT -> $wFalse };
199 case cmpIntegerInt# s d1 1 of wild2 {
200 0 -> $j realWorld#; __DEFAULT -> $wFalse
210 Suppose f is inlined into this case. No new join points are introduced,
211 because the alternatives are both small. But the consumer
212 case [.] of {True -> False; False -> True}
213 will move into the body of f, be duplicated 4 ways, and end up consuming
214 the result of the four outcomes at the body of f. This yields:
215 $j :: (State# RealWorld -> Bool)
218 = \ w1 :: (State# RealWorld) ->
219 case pwr w c of wild {
220 S# i -> case i of wild1 { 1 -> $wTrue; __DEFAULT -> $wFalse };
222 case cmpIntegerInt# s d1 1 of wild2 {
223 0 -> $wTrue; __DEFAULT -> $wFalse
227 case pwr w b of wild {
229 case i of wild1 { 1 -> case $j realWorld# of {T->F; F->T}
230 ; __DEFAULT -> $wTrue };
232 case cmpIntegerInt# s d1 1 of wild2 {
233 0 -> case $j realWorld# of {T->F; F->T}
234 ; __DEFAULT -> $wTrue
238 And, voila, the join point $j isn't let-no-escaped any more.
239 The point is that the consuming context can't "see inside" the join point.
240 It's a phase ordering thing. If f is inlined before the join points
241 are built in the first place, then all is well.
245 -----------------------------
247 -----------------------------
249 * Make the simplifier's Stop continuation record whether the expression being
250 simplified is the RHS of a thunk, or (say) the body of a lambda or case RHS.
251 In the thunk case we want to be a bit keener about inlining if the type of
252 the thunk is amenable to update in place.
254 * SetLevels was being a bit too eager to float things to the top
255 level; e.g. _inline_me_ (\a -> e); here e got floated...
256 Easily fixed by a change to ltMajLvl
258 * Make CoreUnfold.calcUnfoldingGuidance a bit less keen to make case expressions
259 seem small. The original idea was to make inlined wrappers look small, so that
260 when we inline a wrapper it doesn't make call site (much) bigger
261 Otherwise we get nasty phase ordering stuff:
264 If we inline g's wrapper, f looks big, and doesn't get inlined
265 into h; if we inline f first, while it looks small, then g's
266 wrapper will get inlined later anyway. To avoid this nasty
267 ordering difference, we make (case a of (x,y) -> ...),
268 *where a is one of the arguments* look free.
270 BUT (a) It's too eager. We don't want to inline a wrapper into a
271 context with no benefit.
272 E.g. \ x. f (x+x) o point in inlining (+) here!
274 (b) It's ineffective. Once g's wrapper is inlined, its case-expressions
275 aren't scrutinising arguments any more
277 So I've rescinded this idea for now. cases still look fairly small.
279 * Fix interestingArg, which was being too liberal, and hence doing
282 * Extended CoreUtils.exprIsCheap to make two more things cheap:
283 - case (coerce x) of ...
285 This makes a bit more eta expansion happen. It was provoked by
286 a program of Marcin's.
288 * The simplifier used to glom together all the top-level bindings into
289 a single Rec every time it was invoked. The reason for this is explained
290 in SimplCore.lhs, but for at least one simple program it meant that the
291 simplifier never got around to unravelling the recursive group into
292 non-recursive pieces. So I've put the glomming under explicit flag
293 control with a -fglom-binds simplifier pass. A side benefit is
294 that because it happens less often, the (expensive) SCC algorithm
297 * MkIface.ifaceBinds. Make sure that we emit rules for things
298 (like class operations) that don't get a top-level binding in the
299 interface file. Previously such rules were silently forgotten.
301 * Move transformRhs to *after* simplification, which makes it a
302 little easier to do, and means that the arity it computes is
303 readily available to completeBinding. This gets much better
306 * Do coerce splitting in completeBinding. This gets good code for
307 newtype CInt = CInt Int
317 * Modify the meaning of "arity" so that during compilation it means
318 "if you apply this function to fewer args, it will do virtually
319 no work". So, for example
320 f = coerce t (\x -> e)
321 has arity at least 1. When a function is exported, it's arity becomes
322 the number of exposed, top-level lambdas, which is subtly different.
325 I removed CoreUtils.exprArity altogether: it looked only at the exposed
326 lambdas. Instead, we use exprEtaExpandArity exclusively.
328 All of this makes I/O programs work much better.
331 -----------------------------
333 -----------------------------
335 * PrimRep, TysPrim. Add PrimPtrRep as the representation for
336 MVars and MutVars. Previously they were given PtrRep, but that
337 crashed dataReturnConvPrim! Here's the program the killed it:
338 data STRef s a = STRef (MutVar# s a)
341 * Make the desugarer use string equality for string literal
342 patterns longer than 1 character. And put a specialised
343 eqString into PrelBase, with a suitable specialisation rule.
344 This makes a huge difference to the size of the code generated
345 by deriving(Read) notably in Time.lhs
347 -----------------------------
348 Marktoberdorf Commits (Aug 2000)
349 -----------------------------
351 1. Tidy up the renaming story for "system binders", such as
352 dictionary functions, default methods, constructor workers etc. These
353 are now documented in HsDecls. The main effect of the change, apart
354 from tidying up, is to make the *type-checker* (instead of the
355 renamer) generate names for dict-funs and default-methods. This is
356 good because Sergei's generic-class stuff generates new classes at
360 2. Fix the CSE pass so it does not require the no-shadowing invariant.
361 Keith discovered that the simplifier occasionally returns a result
362 with shadowing. After much fiddling around (which has improved the
363 code in the simplifier a bit) I found that it is nearly impossible to
364 arrange that it really does do no-shadowing. So I gave up and fixed
365 the CSE pass (which is the only one to rely on it) instead.
368 3. Fix a performance bug in the simplifier. The change is in
369 SimplUtils.interestingArg. It computes whether an argment should
370 be considered "interesting"; if a function is applied to an interesting
371 argument, we are more likely to inline that function.
374 The 'x' argument was considered "uninteresting" for a silly reason.
375 Since x only occurs once, it was unconditionally substituted, but
376 interestingArg didn't take account of that case. Now it does.
378 I also made interestingArg a bit more liberal. Let's see if we
379 get too much inlining now.
382 4. In the occurrence analyser, we were choosing a bad loop breaker.
383 Here's the comment that's now in OccurAnal.reOrderRec
385 score ((bndr, rhs), _, _)
386 | exprIsTrivial rhs = 3 -- Practically certain to be inlined
387 -- Used to have also: && not (isExportedId bndr)
388 -- But I found this sometimes cost an extra iteration when we have
389 -- rec { d = (a,b); a = ...df...; b = ...df...; df = d }
390 -- where df is the exported dictionary. Then df makes a really
391 -- bad choice for loop breaker
393 I also increased the score for bindings with a non-functional type, so that
394 dictionaries have a better chance of getting inlined early
397 5. Add a hash code to the InScopeSet (and make it properly abstract)
398 This should make uniqAway a lot more robust. Simple experiments suggest
399 that uniqAway no longer gets into the long iteration chains that it used
403 6. Fix a bug in the inliner that made the simplifier tend to get into
404 a loop where it would keep iterating ("4 iterations, bailing out" message).
405 In SimplUtils.mkRhsTyLam we float bindings out past a big lambda, thus:
406 x = /\ b -> let g = \x -> f x x
409 g* = /\a -> \x -> f x x
410 x = /\ b -> let g = g* b in E
412 It's essential that we don't simply inling g* back into the RHS of g,
413 else we will be back to square 1. The inliner is meant not to do this
414 because there's no benefit to the inlining, but the size calculation
415 was a little off in CoreUnfold.
418 7. In SetLevels we were bogus-ly building a Subst with an empty in-scope
419 set, so a WARNING popped up when compiling some modules. (knights/ChessSetList
420 was the example that tickled it.) Now in fact the warning wasn't an error,
421 but the Right Thing to do is to carry down a proper Subst in SetLevels, so
422 that is what I have now done. It is very little more expensive.
430 This is a pretty big commit! It adds stuff I've been working on
431 over the last month or so. DO NOT MERGE IT WITH 4.07!
433 Recompilation checking
434 ~~~~~~~~~~~~~~~~~~~~~~
435 Substantial improvement in recompilation checking. The version management
436 is now entirely internal to GHC. ghc-iface.lprl is dead!
438 The trick is to generate the new interface file in two steps:
439 - first convert Types etc to HsTypes etc, and thereby
440 build a new ParsedIface
441 - then compare against the parsed (but not renamed) version of the old
443 Doing this meant adding code to convert *to* HsSyn things, and to
444 compare HsSyn things for equality. That is the main tedious bit.
446 Another improvement is that we now track version info for
447 fixities and rules, which was missing before.
450 Interface file reading
451 ~~~~~~~~~~~~~~~~~~~~~~
452 Make interface files reading more robust.
453 * If the old interface file is unreadable, don't fail. [bug fix]
455 * If the old interface file mentions interfaces
456 that are unreadable, don't fail. [bug fix]
458 * When we can't find the interface file,
459 print the directories we are looking in. [feature]
464 * New flag -ddump-types to print type signatures
470 data T = T1 A | T2 B | T3 C
471 it seems excessive to import the types A, B, C as well, unless
472 the constructors T1, T2 etc are used. A,B,C might be more types,
473 and importing them may mean reading more interfaces, and so on.
474 So the idea is that the renamer will just import the decl
476 unless one of the constructors is used. This turns out to be quite
477 easy to implement. The downside is that we must make sure the
478 constructors are always available if they are really needed, so
479 I regard this as an experimental feature.
482 Elimininate ThinAir names
483 ~~~~~~~~~~~~~~~~~~~~~~~~~
484 Eliminate ThinAir.lhs and all its works. It was always a hack, and now
485 the desugarer carries around an environment I think we can nuke ThinAir
488 As part of this, I had to move all the Prelude RdrName defns from PrelInfo
489 to PrelMods --- so I renamed PrelMods as PrelNames.
491 I also had to move the builtinRules so that they are injected by the renamer
492 (rather than appearing out of the blue in SimplCore). This is if anything simpler.
496 * Tidy up the data types involved in Rules
498 * Eliminate RnEnv.better_provenance; use Name.hasBetterProv instead
500 * Add Unique.hasKey :: Uniquable a => a -> Unique -> Bool
501 It's useful in a lot of places
503 * Fix a bug in interface file parsing for __U[!]
506 =======================================
509 * Try the effect of enhancing update in place with the CPR
510 idea in CoreUnfold.calcUnfoldingGuidance
512 * Check with Simon M re srt on Lit
514 * Make all primops return a data type so that we can't over-apply a primop
515 This makes code gen simpler. Currently the only primops with a polymorphic
518 catch# :: a -> (b->a) -> a
519 tagToEnum# :: Int -> a
521 Very strange code for PrelException.catchException! What has STret got
526 * Missing w/w for coerce in go2 functions of fibToList' in fibheaps
528 * Watch out for re-boxing in workers; sometimes it happens
529 and then w/w is a Bad Thing
531 * Only two uses of mkCompulsoryUnfolding -- try to nuke it
533 * Note that mkDupAlt makes alts that have binders that
534 are guaranteed to appear just once or not at all
536 Same for case binder, but that's harder to take into account.
538 * max :: Int -> Int -> Int could be CPRd but isn't.
540 * In mandel2 we do a little less well than 4.04 because we aren't
541 inlining point_colour, and that means we have to box up an argument
542 before calling it. [This was due to a bug in 4.04]
543 There's also a great opportunity for liberateCase
544 in check_radius, where it loops around with two lazy F# built each time
546 * In PrelShow.itos' we find a thunk like:
547 tpl = case chrzh {(zpzh {(remIntzh {x{-aMf-} 10}) 48})}
548 of tpl{-X1j-} __D P { __DEFAULT ->
549 PrelBase.Czh{-62,s-} {tpl{-X1j-}}
551 This is a pity. The remInt# can't fail because the divisor isn't 0,
552 so we could do the sum eagerly and allocate a charcter instead of a thunk.
554 * It's good to do let-to-case before we wrap up. Consider
555 f b xs = let ys = partition isUpper xs
556 zs = case ys of (a,b) -> a
561 (a,b) -> (zs ++ xs,[])
562 If we don't do let-to-case at all, we get 3 redundant case ys left.
563 On the other hand we don't want to do it too early, because it
564 prevents inlining into strict arg positions, which is important for
567 * Strict dictionaries.
569 * INLINE functions are not always INLINEd, so it's sad to leave
570 stuff in their bodies like constructors that havn't been inlined.
572 * If let x = e in b is strict, then CPR can use the CPR info from x
573 This bites in the mod method of Integral Int
575 * Inline wrappers if they are the RHS of a let, so that update in place
578 * Consider doing unboxing on strict constr args in a pattern match,
581 * In spectral/expert/Search.ask there's a statically visible CSE. Catching this
582 depends almost entirely on chance, which is a pity.
584 * Think about exprEtaExpandArity in WwLib. Perhaps eliminate eta expand in simplify?
585 Perhaps use even if no coerces etc, just eta expansion. (e.g. PrelArr.done)
587 * In knights/KnightHeuristic, we don't find that possibleMoves is strict
588 (with important knock-on effects) unless we apply rules before floating
589 out the literal list [A,B,C...].
590 Similarly, in f_se (F_Cmp ...) in listcompr (but a smaller effect)
592 * Floating can float the entire body of an INLINE thing out.
594 This is sad, and a bit stupid.
596 * In spectral/multiplier, we have
597 xor = lift21 forceBit f
598 where f :: Bit -> Bit -> Bit
603 Trouble is, f is CPR'd, and that means that instead of returning
604 the constants I# 0, I# 1, it returns 0,1 and then boxes them.
605 So allocation goes up. I don't see a way around this.
607 * spectral/hartel/parstof ends up saying
608 case (unpackCString "x") of { c:cs -> ... }
609 quite a bit. We should spot these and behave accordingly.
611 * Try a different hashing algorithms in hashUFM. This might reduce long CSE lists
612 as well as making uniqAway faster.
614 * [I'm not sure this is really important in the end.]
615 Don't float out partial applications in lvlMFE. E.g. (in hPutStr defn of shoveString)
619 setBufWPtr has arity 3. Floating it out is plain silly. And in this particular
620 case it's harmful, because it ends up preventing eta expansion on the \x.
621 That in turn leads to a big extra cost in hPutStr.
623 *** Try not doing lvlMFE on the body of a lambda and case alternative ***
625 * PrelNumExtra.lhs we get three copies of dropTrailing0s. Too much inlining!
626 drop0 has cost 21, but gets a discount of 6 (3 * #constrs) for its arg.
627 With a keen-neess factor of 2, that makes a discount of 12. Add two for
628 the arguments and we get 21-12-2, which is just small enough to inline.
629 But that is plainly stupid.
631 Add one for cases; and decrease discount for constructors.
633 * IO.hGetContents still doesn't see that it is strict in the handle.
634 Coerces still getting in the way.
636 * Try not having really_interesting_cont (subsumed by changes in the
637 way guidance is calculated for inline things?)
639 * Enumeration types in worker/wrapper for strictness analysis
641 * This should be reported as an error:
642 data T k = MkT (k Int#)
644 * Bogus report of overlapped pattern for
645 f (R {field = [c]}) = 1
647 This shows up for TyCon.maybeTyConSingleCon
649 * > module Main( main ) where
661 For ``f'' the following is reported.
664 Pattern match(es) are overlapped in the definition of function `f'
667 There are no complaints for definition for ``g''.
669 * Without -O I don't think we need change the module version
670 if the usages change; I forget why it changes even with -O
672 * Record selectors for existential type; no good! What to do?
673 Record update doesn't make sense either.
675 Need to be careful when figuring out strictness, and when generating
676 worker-wrapper split.
684 A fairly big pile of work originally aimed at
685 removing the Con form of Core expression, and replacing it with simple
686 Lit form. However, I wanted to make sure that the resulting thing
687 performed better than the original, so I ended up making an absolute
688 raft of other changes.
690 Removing the Con form of Core expressions
691 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
692 The big thing is that
694 For every constructor C there are now *two* Ids:
696 C is the constructor's *wrapper*. It evaluates and unboxes arguments
697 before calling $wC. It has a perfectly ordinary top-level defn
698 in the module defining the data type.
700 $wC is the constructor's *worker*. It is like a primop that simply
701 allocates and builds the constructor value. Its arguments are the
702 actual representation arguments of the constructor.
704 For every primop P there is *one* Id, its (curried) Id
706 Neither contructor worker Id nor the primop Id have a defminition anywhere.
707 Instead they are saturated during the core-to-STG pass, and the code generator
708 generates code for them directly. The STG language still has saturated
709 primops and constructor applications.
711 * The Const type disappears, along with Const.lhs. The literal part
712 of Const.lhs reappears as Literal.lhs. Much tidying up in here,
713 to bring all the range checking into this one module.
715 * I got rid of NoRep literals entirely. They just seem to be too much trouble.
717 * Because Con's don't exist any more, the funny C { args } syntax
718 disappears from inteface files.
720 * Every constructor, C, comes with a
722 *wrapper*, called C, whose type is exactly what it looks like
723 in the source program. It is an ordinary function,
724 and it gets a top-level binding like any other function
726 *worker*, called $wC, which is the actual data constructor.
727 Its type may be different to C, because:
728 - useless dict args are dropped
729 - strict args may be flattened
730 It does not have a binding.
732 The worker is very like a primop, in that it has no binding,
737 * Result type signatures now work
738 f :: Int -> Int = \x -> x
739 -- The Int->Int is the type of f
742 -- The Int is the type of the result of (g x y)
745 Recompilation checking and make
746 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
747 * The .hi file for a modules is not touched if it doesn't change. (It used to
748 be touched regardless, forcing a chain of recompilations.) The penalty for this
749 is that we record exported things just as if they were mentioned in the body of
750 the module. And the penalty for that is that we may recompile a module when
751 the only things that have changed are the things it is passing on without using.
752 But it seems like a good trade.
754 * -recomp is on by default
759 foreign export zoo :: Int -> IO Int
760 then you get a C produre called 'zoo', not 'zzoo' as before.
761 I've also added a check that complains if you export (or import) a C
762 procedure whose name isn't legal C.
765 Code generation and labels
766 ~~~~~~~~~~~~~~~~~~~~~~~~~~
767 * Now that constructor workers and wrappers have distinct names, there's
768 no need to have a Foo_static_closure and a Foo_closure for constructor Foo.
769 I nuked the entire StaticClosure story. This has effects in some of
770 the RTS headers (i.e. s/static_closure/closure/g)
773 Rules, constant folding
774 ~~~~~~~~~~~~~~~~~~~~~~~
775 * Constant folding becomes just another rewrite rule, attached to the Id for the
776 PrimOp. To achieve this, there's a new form of Rule, a BuiltinRule (see CoreSyn.lhs).
777 The prelude rules are in prelude/PrelRules.lhs, while simplCore/ConFold.lhs has gone.
779 * Appending of constant strings now works, using fold/build fusion, plus
781 unpack "foo" c (unpack "baz" c n) = unpack "foobaz" c n
782 Implemented in PrelRules.lhs
784 * The CCall primop is tidied up quite a bit. There is now a data type CCall,
785 defined in PrimOp, that packages up the info needed for a particular CCall.
786 There is a new Id for each new ccall, with an big "occurrence name"
787 {__ccall "foo" gc Int# -> Int#}
788 In interface files, this is parsed as a single Id, which is what it is, really.
792 * There were numerous places where the host compiler's
793 minInt/maxInt was being used as the target machine's minInt/maxInt.
794 I nuked all of these; everything is localised to inIntRange and inWordRange,
797 * Desugaring record updates was broken: it didn't generate correct matches when
798 used withe records with fancy unboxing etc. It now uses matchWrapper.
800 * Significant tidying up in codeGen/SMRep.lhs
802 * Add __word, __word64, __int64 terminals to signal the obvious types
803 in interface files. Add the ability to print word values in hex into
806 * PrimOp.lhs is no longer part of a loop. Remove PrimOp.hi-boot*
811 * isProductTyCon no longer returns False for recursive products, nor
812 for unboxed products; you have to test for these separately.
813 There's no reason not to do CPR for recursive product types, for example.
814 Ditto splitProductType_maybe.
818 * New -fno-case-of-case flag for the simplifier. We use this in the first run
819 of the simplifier, where it helps to stop messing up expressions that
820 the (subsequent) full laziness pass would otherwise find float out.
821 It's much more effective than previous half-baked hacks in inlining.
823 Actually, it turned out that there were three places in Simplify.lhs that
824 needed to know use this flag.
826 * Make the float-in pass push duplicatable bindings into the branches of
827 a case expression, in the hope that we never have to allocate them.
828 (see FloatIn.sepBindsByDropPoint)
830 * Arrange that top-level bottoming Ids get a NOINLINE pragma
831 This reduced gratuitous inlining of error messages.
832 But arrange that such things still get w/w'd.
834 * Arrange that a strict argument position is regarded as an 'interesting'
835 context, so that if we see
837 then we'll be inclined to inline g; this can expose a build.
839 * There was a missing case in CoreUtils.exprEtaExpandArity that meant
840 we were missing some obvious cases for eta expansion
841 Also improve the code when handling applications.
843 * Make record selectors (identifiable by their IdFlavour) into "cheap" operations.
844 [The change is a 2-liner in CoreUtils.exprIsCheap]
845 This means that record selection may be inlined into function bodies, which
846 greatly improves the arities of overloaded functions.
848 * Make a cleaner job of inlining "lone variables". There was some distributed
849 cunning, but I've centralised it all now in SimplUtils.analyseCont, which
850 analyses the context of a call to decide whether it is "interesting".
852 * Don't specialise very small functions in Specialise.specDefn
853 It's better to inline it. Rather like the worker/wrapper case.
855 * Be just a little more aggressive when floating out of let rhss.
856 See comments with Simplify.wantToExpose
857 A small change with an occasional big effect.
859 * Make the inline-size computation think that
860 case x of I# x -> ...
866 * Fix what was essentially a bug in CPR analysis. Consider
868 letrec f x = let g y = let ... in f e1
870 if ... then (a,b) else g x
872 g has the CPR property if f does; so when generating the final annotated
873 RHS for f, we must use an envt in which f is bound to its final abstract
874 value. This wasn't happening. Instead, f was given the CPR tag but g
875 wasn't; but of course the w/w pass gives rotten results in that case!!
876 (Because f's CPR-ness relied on g's.)
878 On they way I tidied up the code in CprAnalyse. It's quite a bit shorter.
880 The fact that some data constructors return a constructed product shows
881 up in their CPR info (MkId.mkDataConId) not in CprAnalyse.lhs
885 Strictness analysis and worker/wrapper
886 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
887 * BIG THING: pass in the demand to StrictAnal.saExpr. This affects situations
889 f (let x = e1 in (x,x))
890 where f turns out to have strictness u(SS), say. In this case we can
891 mark x as demanded, and use a case expression for it.
893 The situation before is that we didn't "know" that there is the u(SS)
894 demand on the argument, so we simply computed that the body of the let
895 expression is lazy in x, and marked x as lazily-demanded. Then even after
898 let x = e1 in case (x,x) of (a,b) -> $wf a b
902 let x = e1 in $wf a b
904 I found a much more complicated situation in spectral/sphere/Main.shade,
905 which improved quite a bit with this change.
907 * Moved the StrictnessInfo type from IdInfo to Demand. It's the logical
908 place for it, and helps avoid module loops
910 * Do worker/wrapper for coerces even if the arity is zero. Thus:
911 stdout = coerce Handle (..blurg..)
913 wibble = (...blurg...)
914 stdout = coerce Handle wibble
915 This is good because I found places where we were saying
916 case coerce t stdout of { MVar a ->
918 case coerce t stdout of { MVar b ->
920 and the redundant case wasn't getting eliminated because of the coerce.
926 * Fix a few renamer bugs
928 * Substantially reorganise the Prelude to eliminate all orphan declarations.
929 Details in PrelBase.lhs
931 * Do a much better job of appending literal strings
933 - move unpackCString stuff to PrelBase
934 - add BuiltinRules to the Rule type
935 - add fold/build rules for literal strings
941 * Fix a terrible bug in Simplify.mkDupableAlt; we were duplicating a small
942 *InAlt*, but doing so invalidated occurrence info, which could lead to
943 substantial code duplication.
945 * Fix a bug in WwLib.mkWWcpr; I was generating CPR wrappers like
947 which is utterly wrong. It should be
949 (The effect was to make functions stricter than they really are.)
951 * Try doing no inlining at all in phase 0. This noticeably improved
952 spectral/fish (esp Main.hs I think), by improving floating.
953 This single change has quite a large effect on some programs (allocation)
955 Don't inline Don't inline
957 in phase 0 in phase 0
959 cichelli 28962k -3.12%
960 wave4main 88089k +130.45%
961 fibheaps 31731k +19.01%
963 typecheck 148713k +4.91%
965 But I found that fish worked much better if we inline *local* things
966 in phase 0, but not *imported* things.
968 * Fix a terrible bug in Simplify.mkLamBndrZapper. It was counting
969 type args in one place, but not type binders, so it was sometimes
970 inlining into unsaturated lambdas!
972 * I found that there were some very bad loss-of-arity cases in PrelShow.
973 In particular, we had:
975 showl "" = showChar '"' s
976 showl ('"':xs) = showString "\\\"" . showl xs
977 showl (x:xs) = showLitChar x . showl xs
980 showl = \xs -> case xs of
982 (x:xs) -> let f = showLitChar x
985 which is TERRIBLE. We can't spot that showLitChar has arity 2 because
989 showLitChar c = showString ('\\' : asciiTab!!ord c)
991 notice that the (asciiTab!!orc c) is outside the \s, so GHC can't rewrite it to
993 showLitChar c = \s -> showString ('\\' : asciiTab!!ord c) s
995 So I've changed PrelShow.showLitChar to use explicit \s. Even then, showl
996 doesn't work, because GHC can't see that showl xs can be pushed inside the \s.
997 So I've put an explict \s there too.
999 showl "" s = showChar '"' s
1000 showl ('"':xs) s = showString "\\\"" (showl xs s)
1001 showl (x:xs) s = showLitChar x (showl xs s)
1003 Net result: imaginary/gen_regexps more than halves in allocation!
1005 Turns out that the mkLamBndrZapper bug (above) meant that showl was
1006 erroneously inlining showLitChar x and showl xs, which is why this
1007 problem hasn't shown up before.
1009 * Improve CSE a bit. In ptic
1010 case h x of y -> ...(h x)...
1011 replaces (h x) by y.
1013 * Inline INLINE things very agressively, even though we get code duplication
1014 thereby. Reason: otherwise we sometimes call the original un-inlined INLINE
1015 defns, which have constructors etc still un-inlined in their RHSs. The
1016 improvement is dramatic for a few programs:
1018 typecheck 150865k -1.43%
1019 wave4main 114216k -22.87%
1021 cichelli 33786k -14.28%
1023 rewrite 14665k -4.91%
1026 Code size increases by 10% which is not so good. There must be a better way.
1027 Another bad thing showed up in fish/Main.hs. Here we have
1028 (x1,y1) `vec_add` (x2,y2) = (x1+x2, y1+y2)
1029 which tends to get inlined. But if we first inline (+), it looks big,
1030 so we don't inline it. Sigh.
1033 * Don't inline constructors in INLINE RHSs. Ever. Otherwise rules don't match.
1036 * In ebnf2ps/Lexer.uncommentString, it would be a good idea to inline a constructor
1037 that occurs once in each branch of a case. That way it doesn't get allocated
1038 in the branches that don't use it. And in fact in this particular case
1039 something else good happens. So CoreUnfold now does that.
1041 * Reverted to n_val_binders+2 in calcUnfoldingGuidance
1042 Otherwise wrappers are inlined even if there's no benefit.
1047 * Arrange that simplConArgs works in one less pass than before.
1048 This exposed a bug: a bogus call to completeBeta.
1050 * Add a top-level flag in CoreUnfolding, used in callSiteInline
1052 * Extend w/w to use etaExpandArity, so it does eta/coerce expansion
1054 * Don't float anything out of an INLINE.
1055 Don't float things to top level unless they also escape a value lambda.
1056 [see comments with SetLevels.lvlMFE
1057 Without at least one of these changes, I found that
1058 {-# INLINE concat #-}
1059 concat = __inline (/\a -> foldr (++) [])
1060 was getting floated to
1061 concat = __inline( /\a -> lvl a )
1062 lvl = ...inlined version of foldr...
1064 Subsequently I found that not floating constants out of an INLINE
1065 gave really bad code like
1066 __inline (let x = e in \y -> ...)
1067 so I now let things float out of INLINE
1069 * Implement inline phases. The meaning of the inline pragmas is
1070 described in CoreUnfold.lhs
1072 * Implement the "reverse-mapping" idea for CSE; actually it turned out to be easier
1073 to implement it in SetLevels, and may benefit full laziness too.
1077 * It's a good idea to inline inRange. Consider
1079 index (l,h) i = case inRange (l,h) i of
1082 inRange itself isn't strict in h, but if it't inlined then 'index'
1083 *does* become strict in h. Interesting!
1085 * Big change to the way unfoldings and occurrence info is propagated in the simplifier
1086 The plan is described in Subst.lhs with the Subst type
1087 Occurrence info is now in a separate IdInfo field than user pragmas
1090 (coerce T (coerce S (\x.e))) y
1091 didn't simplify in one round. First we get to
1093 and only then do the beta. Solution: cancel the coerces in the continuation
1095 * Amazingly, CoreUnfold wasn't counting the cost of a function an application.
1099 * No commas between for-alls in RULES
1101 * Disable rules in initial simplifier run. Otherwise full laziness
1102 doesn't get a chance to lift out a MFE before a rule (e.g. fusion)
1103 zaps it. queens is a case in point
1105 * Improve float-out stuff significantly. The big change is that if we have
1107 \x -> ... /\a -> ...let p = ..a.. in let q = ...p...
1109 where p's rhs doesn't x, we abstract a from p, so that we can get p past x.
1110 (We did that before.) But we also substitute (p a) for p in q, and then
1111 we can do the same thing for q. (We didn't do that, so q got stuck.)
1112 This is much better. It involves doing a substitution "as we go" in SetLevels,
1118 * exprIsDupable for an application (f e1 .. en) wasn't calling exprIsDupable
1119 on the arguments!! So applications with few, but large, args were being dupliated.
1121 * sizeExpr on an application wasn't doing a nukeScrutDiscount on the arg of
1122 an application!! So bogus discounts could accumulate from arguments!
1124 * Improve handling of INLINE pragmas in calcUnfoldingGuidance. It was really
1127 * Substantially improve handling of coerces in worker/wrapper
1131 * Fix Kevin Atkinson's cant-find-instance bug. Turns out that Rename.slurpSourceRefs
1132 needs to repeatedly call getImportedInstDecls, and then go back to slurping
1133 source-refs. Comments with Rename.slurpSourceRefs.
1135 * Add a case to Simplify.mkDupableAlt for the quite-common case where there's
1136 a very simple alternative, in which case there's no point in creating a
1139 * Fix CoreUtils.exprOkForSpeculation so that it returns True of (==# a# b#).
1140 This lack meant that
1141 case ==# a# b# of { True -> x; False -> x }
1144 * Make float-out dump bindings at the top of a function argument, as
1145 at the top of a let(rec) rhs. See notes with FloatOut.floatRhs
1147 * Make the ArgOf case of mkDupableAlt generate a OneShot lambda.
1148 This gave a noticeable boost to spectral/boyer2
1153 Work, using IO.hPutStr as an example, to reduce the number of coerces.
1154 The main idea is in WwLib.mkWWcoerce. The gloss is that we must do
1155 the w/w split even for small non-recursive things. See notes with
1161 Study why gen_regexps is slower than before. Problem is in IO.writeLines,
1162 in particular the local defn shoveString. Two things are getting
1163 in the way of arity expansion, which means we build far more function
1164 closures than we should:
1165 shove = \ x -> let lvl = \s -> ...
1166 in \s -> ... lvl ...
1170 b) full laziness floats
1173 Solution to (a): add coerces to the worker/wrapper stuff.
1174 See notes with WwLib.mkWWcoerce.
1176 This further complicated getWorkerId, so I finally bit the bullet and
1177 make the workerInfo field of the IdInfo work properly, including
1178 under substitutions. Death to getWorkerId.
1182 Solution to (b): make all lambdas over realWorldStatePrimTy
1183 into one-shot lambdas. This is a GROSS HACK.
1185 * Also make the occurrence analyser aware of one-shot lambdas.
1190 Fix SetLevels so that it does not clone top-level bindings, but it
1191 *does* clone bindings that are destined for the top level.
1193 The global invariant is that the top level bindings are always
1194 unique, and never cloned.