1 ------------------------------------
3 ------------------------------------
5 * Don't forget to put deferred-type-decls back into RnIfaces
7 * Do we want to record a package name in a .hi file?
8 Does pi_mod have a ModuleName or a Module?
10 ------------------------------------
11 Mainly FunDeps (23 Jan 01)
12 ------------------------------------
14 This commit re-engineers the handling of functional dependencies.
15 A functional dependency is no longer an Inst; instead, the necessary
16 dependencies are snaffled out of their Class when necessary.
18 As part of this exercise I found that I had to re-work how to do generalisation
19 in a binding group. There is rather exhaustive documentation on the new Plan
20 at the top of TcSimplify.
23 WARNING: I have compiled all the libraries with this new compiler
24 and all looks well, but I have not run many programs.
25 Things may break. Let me know if so.
28 The main changes are these:
30 1. typecheck/TcBinds and TcSimplify have a lot of changes due to the
31 new generalisation and context reduction story. There are extensive
32 comments at the start of TcSimplify
34 2. typecheck/TcImprove is removed altogether. Instead, improvement is
35 interleaved with context reduction (until a fixpoint is reached).
36 All this is done in TcSimplify.
38 3. types/FunDeps has new exports
39 * 'improve' does improvement, returning a list of equations
40 * 'grow' and 'oclose' close a list of type variables wrt a set of
41 PredTypes, but in slightly different ways. Comments in file.
43 4. I improved the way in which we check that main::IO t. It's tidier now.
47 * typecheck/TcMatches:
48 a) Tidy up, introducing a common function tcCheckExistentialPat
50 b) Improve the typechecking of parallel list comprehensions,
51 which wasn't quite right before. (see comments with tcStmts)
53 WARNING: (b) is untested! Jeff, you might want to check.
55 * Numerous other incidental changes in the typechecker
57 * Manuel found that rules don't fire well when you have partial applications
58 from overloading. For example, we may get
60 f a (d::Ord a) = let m_g = g a d
62 \y :: a -> ...(m_g (h y))...
64 The 'method' m_g doesn't get inlined because (g a d) might be a redex.
65 Yet a rule that looks like
67 won't fire because that doesn't show up. One way out would be to make
68 the rule matcher a bit less paranoid about duplicating work, but instead
71 which controls whether we generate things like m_g in the first place.
72 It's not clear that they are a win in the first place.
74 The flag is actually consulted in Inst.tcInstId
78 ------------------------------------
79 Mainly PredTypes (28 Sept 00)
80 ------------------------------------
82 Three things in this commit:
84 1. Main thing: tidy up PredTypes
85 2. Move all Keys into PrelNames
86 3. Check for unboxed tuples in function args
90 The main thing in this commit is to modify the representation of Types
91 so that they are a (much) better for the qualified-type world. This
92 should simplify Jeff's life as he proceeds with implicit parameters
93 and functional dependencies. In particular, PredType, introduced by
94 Jeff, is now blessed and dignified with a place in TypeRep.lhs:
96 data PredType = Class Class [Type]
99 Consider these examples:
100 f :: (Eq a) => a -> Int
101 g :: (?x :: Int -> Int) => a -> Int
102 h :: (r\l) => {r} => {l::Int | r}
104 Here the "Eq a" and "?x :: Int -> Int" and "r\l" are all called
105 *predicates*, and are represented by a PredType. (We don't support
106 TREX records yet, but the setup is designed to expand to allow them.)
108 In addition, Type gains an extra constructor:
110 data Type = .... | PredTy PredType
112 so that PredType is injected directly into Type. So the type
117 I have deleted the hackish IPNote stuff; predicates are dealt with entirely
118 through PredTys, not through NoteTy at all.
121 2. Move Keys into PrelNames
122 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
123 This is just a housekeeping operation. I've moved all the pre-assigned Uniques
124 (aka Keys) from Unique.lhs into PrelNames.lhs. I've also moved knowKeyRdrNames
125 from PrelInfo down into PrelNames. This localises in PrelNames lots of stuff
126 about predefined names. Previously one had to alter three files to add one,
131 Add a static check for unboxed tuple arguments. E.g.
132 data T = T (# Int, Int #)
137 ---------------------------------------
139 ---------------------------------------
141 -funfolding-update-in-place
142 Switching it on doesn't affect many programs, except these
143 sphere is because it makes a critical function (vecsub) more inlinable
145 sphere 66465k -20.61%
151 cichelli 33546k +2.69%
152 typecheck 146023k +1.47%
155 ---------------------------------------
156 Simon's tuning changes: early Sept 2000
157 ---------------------------------------
161 * Eta expand PrelShow.showLitChar. It's impossible to compile this well,
162 and it makes a big difference to some programs (e.g. gen_regexps)
164 * Make PrelList.concat into a good producer (in the foldr/build sense)
169 * Add -ddump-hi-diffs to print out changes in interface files. Useful
170 when watching what the compiler is doing
172 * Add -funfolding-update-in-place to enable the experimental optimisation
173 that makes the inliner a bit keener to inline if it's in the RHS of
174 a thunk that might be updated in place. Sometimes this is a bad idea
175 (one example is in spectral/sphere; see notes in nofib/Simon-nofib-notes)
180 * Fix a bug in SetLevels.lvlMFE. (change ctxt_lvl to dest_level)
181 I don't think this has any performance effect, but it saves making
182 a redundant let-binding that is later eliminated.
184 * Desugar.dsProgram and DsForeign
185 Glom together all the bindings into a single Rec. Previously the
186 bindings generated by 'foreign' declarations were not glommed together, but
187 this led to an infelicity (i.e. poorer code than necessary) in the modules
188 that actually declare Float and Double (explained a bit more in Desugar.dsProgram)
190 * OccurAnal.shortMeOut and IdInfo.shortableIdInfo
191 Don't do the occurrence analyser's shorting out stuff for things which
192 have rules. Comments near IdInfo.shortableIdInfo.
193 This is deeply boring, and mainly to do with making rules work well.
194 Maybe rules should have phases attached too....
196 * CprAnalyse.addIdCprInfo
197 Be a bit more willing to add CPR information to thunks;
198 in particular, if the strictness analyser has just discovered that this
199 is a strict let, then the let-to-case transform will happen, and CPR is fine.
200 This made a big difference to PrelBase.modInt, which had something like
201 modInt = \ x -> let r = ... -> I# v in
202 ...body strict in r...
203 r's RHS isn't a value yet; but modInt returns r in various branches, so
204 if r doesn't have the CPR property then neither does modInt
206 * MkId.mkDataConWrapId
207 Arrange that vanilla constructors, like (:) and I#, get unfoldings that are
208 just a simple variable $w:, $wI#. This ensures they'll be inlined even into
209 rules etc, which makes matching a bit more reliable. The downside is that in
210 situations like (map (:) xs), we'll end up with (map (\y ys. $w: y ys) xs.
211 Which is tiresome but it doesn't happen much.
213 * SaAbsInt.findStrictness
214 Deal with the case where a thing with no arguments is bottom. This is Good.
215 E.g. module M where { foo = error "help" }
216 Suppose we have in another module
218 Then we'd like to do the case-of-error transform, without inlining foo.
223 * Reorganised Simplify.completeBinding (again).
225 * Removed the is_bot field in CoreUnfolding (is_cheap is true if is_bot is!)
226 This is just a tidy up
229 Remove the NewCon constructor from ConDecl. It just added code, and nothing else.
230 And it led to a bug in MkIface, which though that a newtype decl was always changing!
232 * IdInfo and many others
233 Remove all vestiges of UpdateInfo (hasn't been used for years)
235 ------------------------------
236 Join points Sept 2000
237 ------------------------------
239 With Andrew Kennedy, I found out why a few of the join points introduced by
240 the simplifier end up as *not* let-no-escpaed. Here's an example:
242 f x y = case (pwr x b) == 1 of
247 f = \ @ t w :: Integer ->
249 $j :: (State# RealWorld -> Bool)
252 = \ w1 :: (State# RealWorld) ->
253 case pwr w c of wild {
254 S# i -> case i of wild1 { 1 -> $wTrue; __DEFAULT -> $wFalse };
256 case cmpIntegerInt# s d1 1 of wild2 {
257 0 -> $wTrue; __DEFAULT -> $wFalse
261 case pwr w b of wild {
263 case i of wild1 { 1 -> $j realWorld#; __DEFAULT -> $wFalse };
265 case cmpIntegerInt# s d1 1 of wild2 {
266 0 -> $j realWorld#; __DEFAULT -> $wFalse
276 Suppose f is inlined into this case. No new join points are introduced,
277 because the alternatives are both small. But the consumer
278 case [.] of {True -> False; False -> True}
279 will move into the body of f, be duplicated 4 ways, and end up consuming
280 the result of the four outcomes at the body of f. This yields:
281 $j :: (State# RealWorld -> Bool)
284 = \ w1 :: (State# RealWorld) ->
285 case pwr w c of wild {
286 S# i -> case i of wild1 { 1 -> $wTrue; __DEFAULT -> $wFalse };
288 case cmpIntegerInt# s d1 1 of wild2 {
289 0 -> $wTrue; __DEFAULT -> $wFalse
293 case pwr w b of wild {
295 case i of wild1 { 1 -> case $j realWorld# of {T->F; F->T}
296 ; __DEFAULT -> $wTrue };
298 case cmpIntegerInt# s d1 1 of wild2 {
299 0 -> case $j realWorld# of {T->F; F->T}
300 ; __DEFAULT -> $wTrue
304 And, voila, the join point $j isn't let-no-escaped any more.
305 The point is that the consuming context can't "see inside" the join point.
306 It's a phase ordering thing. If f is inlined before the join points
307 are built in the first place, then all is well.
311 -----------------------------
313 -----------------------------
315 * Make the simplifier's Stop continuation record whether the expression being
316 simplified is the RHS of a thunk, or (say) the body of a lambda or case RHS.
317 In the thunk case we want to be a bit keener about inlining if the type of
318 the thunk is amenable to update in place.
320 * SetLevels was being a bit too eager to float things to the top
321 level; e.g. _inline_me_ (\a -> e); here e got floated...
322 Easily fixed by a change to ltMajLvl
324 * Make CoreUnfold.calcUnfoldingGuidance a bit less keen to make case expressions
325 seem small. The original idea was to make inlined wrappers look small, so that
326 when we inline a wrapper it doesn't make call site (much) bigger
327 Otherwise we get nasty phase ordering stuff:
330 If we inline g's wrapper, f looks big, and doesn't get inlined
331 into h; if we inline f first, while it looks small, then g's
332 wrapper will get inlined later anyway. To avoid this nasty
333 ordering difference, we make (case a of (x,y) -> ...),
334 *where a is one of the arguments* look free.
336 BUT (a) It's too eager. We don't want to inline a wrapper into a
337 context with no benefit.
338 E.g. \ x. f (x+x) o point in inlining (+) here!
340 (b) It's ineffective. Once g's wrapper is inlined, its case-expressions
341 aren't scrutinising arguments any more
343 So I've rescinded this idea for now. cases still look fairly small.
345 * Fix interestingArg, which was being too liberal, and hence doing
348 * Extended CoreUtils.exprIsCheap to make two more things cheap:
349 - case (coerce x) of ...
351 This makes a bit more eta expansion happen. It was provoked by
352 a program of Marcin's.
354 * The simplifier used to glom together all the top-level bindings into
355 a single Rec every time it was invoked. The reason for this is explained
356 in SimplCore.lhs, but for at least one simple program it meant that the
357 simplifier never got around to unravelling the recursive group into
358 non-recursive pieces. So I've put the glomming under explicit flag
359 control with a -fglom-binds simplifier pass. A side benefit is
360 that because it happens less often, the (expensive) SCC algorithm
363 * MkIface.ifaceBinds. Make sure that we emit rules for things
364 (like class operations) that don't get a top-level binding in the
365 interface file. Previously such rules were silently forgotten.
367 * Move transformRhs to *after* simplification, which makes it a
368 little easier to do, and means that the arity it computes is
369 readily available to completeBinding. This gets much better
372 * Do coerce splitting in completeBinding. This gets good code for
373 newtype CInt = CInt Int
383 * Modify the meaning of "arity" so that during compilation it means
384 "if you apply this function to fewer args, it will do virtually
385 no work". So, for example
386 f = coerce t (\x -> e)
387 has arity at least 1. When a function is exported, it's arity becomes
388 the number of exposed, top-level lambdas, which is subtly different.
391 I removed CoreUtils.exprArity altogether: it looked only at the exposed
392 lambdas. Instead, we use exprEtaExpandArity exclusively.
394 All of this makes I/O programs work much better.
397 -----------------------------
399 -----------------------------
401 * PrimRep, TysPrim. Add PrimPtrRep as the representation for
402 MVars and MutVars. Previously they were given PtrRep, but that
403 crashed dataReturnConvPrim! Here's the program the killed it:
404 data STRef s a = STRef (MutVar# s a)
407 * Make the desugarer use string equality for string literal
408 patterns longer than 1 character. And put a specialised
409 eqString into PrelBase, with a suitable specialisation rule.
410 This makes a huge difference to the size of the code generated
411 by deriving(Read) notably in Time.lhs
413 -----------------------------
414 Marktoberdorf Commits (Aug 2000)
415 -----------------------------
417 1. Tidy up the renaming story for "system binders", such as
418 dictionary functions, default methods, constructor workers etc. These
419 are now documented in HsDecls. The main effect of the change, apart
420 from tidying up, is to make the *type-checker* (instead of the
421 renamer) generate names for dict-funs and default-methods. This is
422 good because Sergei's generic-class stuff generates new classes at
426 2. Fix the CSE pass so it does not require the no-shadowing invariant.
427 Keith discovered that the simplifier occasionally returns a result
428 with shadowing. After much fiddling around (which has improved the
429 code in the simplifier a bit) I found that it is nearly impossible to
430 arrange that it really does do no-shadowing. So I gave up and fixed
431 the CSE pass (which is the only one to rely on it) instead.
434 3. Fix a performance bug in the simplifier. The change is in
435 SimplUtils.interestingArg. It computes whether an argment should
436 be considered "interesting"; if a function is applied to an interesting
437 argument, we are more likely to inline that function.
440 The 'x' argument was considered "uninteresting" for a silly reason.
441 Since x only occurs once, it was unconditionally substituted, but
442 interestingArg didn't take account of that case. Now it does.
444 I also made interestingArg a bit more liberal. Let's see if we
445 get too much inlining now.
448 4. In the occurrence analyser, we were choosing a bad loop breaker.
449 Here's the comment that's now in OccurAnal.reOrderRec
451 score ((bndr, rhs), _, _)
452 | exprIsTrivial rhs = 3 -- Practically certain to be inlined
453 -- Used to have also: && not (isExportedId bndr)
454 -- But I found this sometimes cost an extra iteration when we have
455 -- rec { d = (a,b); a = ...df...; b = ...df...; df = d }
456 -- where df is the exported dictionary. Then df makes a really
457 -- bad choice for loop breaker
459 I also increased the score for bindings with a non-functional type, so that
460 dictionaries have a better chance of getting inlined early
463 5. Add a hash code to the InScopeSet (and make it properly abstract)
464 This should make uniqAway a lot more robust. Simple experiments suggest
465 that uniqAway no longer gets into the long iteration chains that it used
469 6. Fix a bug in the inliner that made the simplifier tend to get into
470 a loop where it would keep iterating ("4 iterations, bailing out" message).
471 In SimplUtils.mkRhsTyLam we float bindings out past a big lambda, thus:
472 x = /\ b -> let g = \x -> f x x
475 g* = /\a -> \x -> f x x
476 x = /\ b -> let g = g* b in E
478 It's essential that we don't simply inling g* back into the RHS of g,
479 else we will be back to square 1. The inliner is meant not to do this
480 because there's no benefit to the inlining, but the size calculation
481 was a little off in CoreUnfold.
484 7. In SetLevels we were bogus-ly building a Subst with an empty in-scope
485 set, so a WARNING popped up when compiling some modules. (knights/ChessSetList
486 was the example that tickled it.) Now in fact the warning wasn't an error,
487 but the Right Thing to do is to carry down a proper Subst in SetLevels, so
488 that is what I have now done. It is very little more expensive.
496 This is a pretty big commit! It adds stuff I've been working on
497 over the last month or so. DO NOT MERGE IT WITH 4.07!
499 Recompilation checking
500 ~~~~~~~~~~~~~~~~~~~~~~
501 Substantial improvement in recompilation checking. The version management
502 is now entirely internal to GHC. ghc-iface.lprl is dead!
504 The trick is to generate the new interface file in two steps:
505 - first convert Types etc to HsTypes etc, and thereby
506 build a new ParsedIface
507 - then compare against the parsed (but not renamed) version of the old
509 Doing this meant adding code to convert *to* HsSyn things, and to
510 compare HsSyn things for equality. That is the main tedious bit.
512 Another improvement is that we now track version info for
513 fixities and rules, which was missing before.
516 Interface file reading
517 ~~~~~~~~~~~~~~~~~~~~~~
518 Make interface files reading more robust.
519 * If the old interface file is unreadable, don't fail. [bug fix]
521 * If the old interface file mentions interfaces
522 that are unreadable, don't fail. [bug fix]
524 * When we can't find the interface file,
525 print the directories we are looking in. [feature]
530 * New flag -ddump-types to print type signatures
536 data T = T1 A | T2 B | T3 C
537 it seems excessive to import the types A, B, C as well, unless
538 the constructors T1, T2 etc are used. A,B,C might be more types,
539 and importing them may mean reading more interfaces, and so on.
540 So the idea is that the renamer will just import the decl
542 unless one of the constructors is used. This turns out to be quite
543 easy to implement. The downside is that we must make sure the
544 constructors are always available if they are really needed, so
545 I regard this as an experimental feature.
548 Elimininate ThinAir names
549 ~~~~~~~~~~~~~~~~~~~~~~~~~
550 Eliminate ThinAir.lhs and all its works. It was always a hack, and now
551 the desugarer carries around an environment I think we can nuke ThinAir
554 As part of this, I had to move all the Prelude RdrName defns from PrelInfo
555 to PrelMods --- so I renamed PrelMods as PrelNames.
557 I also had to move the builtinRules so that they are injected by the renamer
558 (rather than appearing out of the blue in SimplCore). This is if anything simpler.
562 * Tidy up the data types involved in Rules
564 * Eliminate RnEnv.better_provenance; use Name.hasBetterProv instead
566 * Add Unique.hasKey :: Uniquable a => a -> Unique -> Bool
567 It's useful in a lot of places
569 * Fix a bug in interface file parsing for __U[!]
572 =======================================
575 * Try the effect of enhancing update in place with the CPR
576 idea in CoreUnfold.calcUnfoldingGuidance
578 * Check with Simon M re srt on Lit
580 * Make all primops return a data type so that we can't over-apply a primop
581 This makes code gen simpler. Currently the only primops with a polymorphic
584 catch# :: a -> (b->a) -> a
585 tagToEnum# :: Int -> a
587 Very strange code for PrelException.catchException! What has STret got
592 * Missing w/w for coerce in go2 functions of fibToList' in fibheaps
594 * Watch out for re-boxing in workers; sometimes it happens
595 and then w/w is a Bad Thing
597 * Only two uses of mkCompulsoryUnfolding -- try to nuke it
599 * Note that mkDupAlt makes alts that have binders that
600 are guaranteed to appear just once or not at all
602 Same for case binder, but that's harder to take into account.
604 * max :: Int -> Int -> Int could be CPRd but isn't.
606 * In mandel2 we do a little less well than 4.04 because we aren't
607 inlining point_colour, and that means we have to box up an argument
608 before calling it. [This was due to a bug in 4.04]
609 There's also a great opportunity for liberateCase
610 in check_radius, where it loops around with two lazy F# built each time
612 * In PrelShow.itos' we find a thunk like:
613 tpl = case chrzh {(zpzh {(remIntzh {x{-aMf-} 10}) 48})}
614 of tpl{-X1j-} __D P { __DEFAULT ->
615 PrelBase.Czh{-62,s-} {tpl{-X1j-}}
617 This is a pity. The remInt# can't fail because the divisor isn't 0,
618 so we could do the sum eagerly and allocate a charcter instead of a thunk.
620 * It's good to do let-to-case before we wrap up. Consider
621 f b xs = let ys = partition isUpper xs
622 zs = case ys of (a,b) -> a
627 (a,b) -> (zs ++ xs,[])
628 If we don't do let-to-case at all, we get 3 redundant case ys left.
629 On the other hand we don't want to do it too early, because it
630 prevents inlining into strict arg positions, which is important for
633 * Strict dictionaries.
635 * INLINE functions are not always INLINEd, so it's sad to leave
636 stuff in their bodies like constructors that havn't been inlined.
638 * If let x = e in b is strict, then CPR can use the CPR info from x
639 This bites in the mod method of Integral Int
641 * Inline wrappers if they are the RHS of a let, so that update in place
644 * Consider doing unboxing on strict constr args in a pattern match,
647 * In spectral/expert/Search.ask there's a statically visible CSE. Catching this
648 depends almost entirely on chance, which is a pity.
650 * Think about exprEtaExpandArity in WwLib. Perhaps eliminate eta expand in simplify?
651 Perhaps use even if no coerces etc, just eta expansion. (e.g. PrelArr.done)
653 * In knights/KnightHeuristic, we don't find that possibleMoves is strict
654 (with important knock-on effects) unless we apply rules before floating
655 out the literal list [A,B,C...].
656 Similarly, in f_se (F_Cmp ...) in listcompr (but a smaller effect)
658 * Floating can float the entire body of an INLINE thing out.
660 This is sad, and a bit stupid.
662 * In spectral/multiplier, we have
663 xor = lift21 forceBit f
664 where f :: Bit -> Bit -> Bit
669 Trouble is, f is CPR'd, and that means that instead of returning
670 the constants I# 0, I# 1, it returns 0,1 and then boxes them.
671 So allocation goes up. I don't see a way around this.
673 * spectral/hartel/parstof ends up saying
674 case (unpackCString "x") of { c:cs -> ... }
675 quite a bit. We should spot these and behave accordingly.
677 * Try a different hashing algorithms in hashUFM. This might reduce long CSE lists
678 as well as making uniqAway faster.
680 * [I'm not sure this is really important in the end.]
681 Don't float out partial applications in lvlMFE. E.g. (in hPutStr defn of shoveString)
685 setBufWPtr has arity 3. Floating it out is plain silly. And in this particular
686 case it's harmful, because it ends up preventing eta expansion on the \x.
687 That in turn leads to a big extra cost in hPutStr.
689 *** Try not doing lvlMFE on the body of a lambda and case alternative ***
691 * PrelNumExtra.lhs we get three copies of dropTrailing0s. Too much inlining!
692 drop0 has cost 21, but gets a discount of 6 (3 * #constrs) for its arg.
693 With a keen-neess factor of 2, that makes a discount of 12. Add two for
694 the arguments and we get 21-12-2, which is just small enough to inline.
695 But that is plainly stupid.
697 Add one for cases; and decrease discount for constructors.
699 * IO.hGetContents still doesn't see that it is strict in the handle.
700 Coerces still getting in the way.
702 * Try not having really_interesting_cont (subsumed by changes in the
703 way guidance is calculated for inline things?)
705 * Enumeration types in worker/wrapper for strictness analysis
707 * This should be reported as an error:
708 data T k = MkT (k Int#)
710 * Bogus report of overlapped pattern for
711 f (R {field = [c]}) = 1
713 This shows up for TyCon.maybeTyConSingleCon
715 * > module Main( main ) where
727 For ``f'' the following is reported.
730 Pattern match(es) are overlapped in the definition of function `f'
733 There are no complaints for definition for ``g''.
735 * Without -O I don't think we need change the module version
736 if the usages change; I forget why it changes even with -O
738 * Record selectors for existential type; no good! What to do?
739 Record update doesn't make sense either.
741 Need to be careful when figuring out strictness, and when generating
742 worker-wrapper split.
750 A fairly big pile of work originally aimed at
751 removing the Con form of Core expression, and replacing it with simple
752 Lit form. However, I wanted to make sure that the resulting thing
753 performed better than the original, so I ended up making an absolute
754 raft of other changes.
756 Removing the Con form of Core expressions
757 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
758 The big thing is that
760 For every constructor C there are now *two* Ids:
762 C is the constructor's *wrapper*. It evaluates and unboxes arguments
763 before calling $wC. It has a perfectly ordinary top-level defn
764 in the module defining the data type.
766 $wC is the constructor's *worker*. It is like a primop that simply
767 allocates and builds the constructor value. Its arguments are the
768 actual representation arguments of the constructor.
770 For every primop P there is *one* Id, its (curried) Id
772 Neither contructor worker Id nor the primop Id have a defminition anywhere.
773 Instead they are saturated during the core-to-STG pass, and the code generator
774 generates code for them directly. The STG language still has saturated
775 primops and constructor applications.
777 * The Const type disappears, along with Const.lhs. The literal part
778 of Const.lhs reappears as Literal.lhs. Much tidying up in here,
779 to bring all the range checking into this one module.
781 * I got rid of NoRep literals entirely. They just seem to be too much trouble.
783 * Because Con's don't exist any more, the funny C { args } syntax
784 disappears from inteface files.
786 * Every constructor, C, comes with a
788 *wrapper*, called C, whose type is exactly what it looks like
789 in the source program. It is an ordinary function,
790 and it gets a top-level binding like any other function
792 *worker*, called $wC, which is the actual data constructor.
793 Its type may be different to C, because:
794 - useless dict args are dropped
795 - strict args may be flattened
796 It does not have a binding.
798 The worker is very like a primop, in that it has no binding,
803 * Result type signatures now work
804 f :: Int -> Int = \x -> x
805 -- The Int->Int is the type of f
808 -- The Int is the type of the result of (g x y)
811 Recompilation checking and make
812 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
813 * The .hi file for a modules is not touched if it doesn't change. (It used to
814 be touched regardless, forcing a chain of recompilations.) The penalty for this
815 is that we record exported things just as if they were mentioned in the body of
816 the module. And the penalty for that is that we may recompile a module when
817 the only things that have changed are the things it is passing on without using.
818 But it seems like a good trade.
820 * -recomp is on by default
825 foreign export zoo :: Int -> IO Int
826 then you get a C produre called 'zoo', not 'zzoo' as before.
827 I've also added a check that complains if you export (or import) a C
828 procedure whose name isn't legal C.
831 Code generation and labels
832 ~~~~~~~~~~~~~~~~~~~~~~~~~~
833 * Now that constructor workers and wrappers have distinct names, there's
834 no need to have a Foo_static_closure and a Foo_closure for constructor Foo.
835 I nuked the entire StaticClosure story. This has effects in some of
836 the RTS headers (i.e. s/static_closure/closure/g)
839 Rules, constant folding
840 ~~~~~~~~~~~~~~~~~~~~~~~
841 * Constant folding becomes just another rewrite rule, attached to the Id for the
842 PrimOp. To achieve this, there's a new form of Rule, a BuiltinRule (see CoreSyn.lhs).
843 The prelude rules are in prelude/PrelRules.lhs, while simplCore/ConFold.lhs has gone.
845 * Appending of constant strings now works, using fold/build fusion, plus
847 unpack "foo" c (unpack "baz" c n) = unpack "foobaz" c n
848 Implemented in PrelRules.lhs
850 * The CCall primop is tidied up quite a bit. There is now a data type CCall,
851 defined in PrimOp, that packages up the info needed for a particular CCall.
852 There is a new Id for each new ccall, with an big "occurrence name"
853 {__ccall "foo" gc Int# -> Int#}
854 In interface files, this is parsed as a single Id, which is what it is, really.
858 * There were numerous places where the host compiler's
859 minInt/maxInt was being used as the target machine's minInt/maxInt.
860 I nuked all of these; everything is localised to inIntRange and inWordRange,
863 * Desugaring record updates was broken: it didn't generate correct matches when
864 used withe records with fancy unboxing etc. It now uses matchWrapper.
866 * Significant tidying up in codeGen/SMRep.lhs
868 * Add __word, __word64, __int64 terminals to signal the obvious types
869 in interface files. Add the ability to print word values in hex into
872 * PrimOp.lhs is no longer part of a loop. Remove PrimOp.hi-boot*
877 * isProductTyCon no longer returns False for recursive products, nor
878 for unboxed products; you have to test for these separately.
879 There's no reason not to do CPR for recursive product types, for example.
880 Ditto splitProductType_maybe.
884 * New -fno-case-of-case flag for the simplifier. We use this in the first run
885 of the simplifier, where it helps to stop messing up expressions that
886 the (subsequent) full laziness pass would otherwise find float out.
887 It's much more effective than previous half-baked hacks in inlining.
889 Actually, it turned out that there were three places in Simplify.lhs that
890 needed to know use this flag.
892 * Make the float-in pass push duplicatable bindings into the branches of
893 a case expression, in the hope that we never have to allocate them.
894 (see FloatIn.sepBindsByDropPoint)
896 * Arrange that top-level bottoming Ids get a NOINLINE pragma
897 This reduced gratuitous inlining of error messages.
898 But arrange that such things still get w/w'd.
900 * Arrange that a strict argument position is regarded as an 'interesting'
901 context, so that if we see
903 then we'll be inclined to inline g; this can expose a build.
905 * There was a missing case in CoreUtils.exprEtaExpandArity that meant
906 we were missing some obvious cases for eta expansion
907 Also improve the code when handling applications.
909 * Make record selectors (identifiable by their IdFlavour) into "cheap" operations.
910 [The change is a 2-liner in CoreUtils.exprIsCheap]
911 This means that record selection may be inlined into function bodies, which
912 greatly improves the arities of overloaded functions.
914 * Make a cleaner job of inlining "lone variables". There was some distributed
915 cunning, but I've centralised it all now in SimplUtils.analyseCont, which
916 analyses the context of a call to decide whether it is "interesting".
918 * Don't specialise very small functions in Specialise.specDefn
919 It's better to inline it. Rather like the worker/wrapper case.
921 * Be just a little more aggressive when floating out of let rhss.
922 See comments with Simplify.wantToExpose
923 A small change with an occasional big effect.
925 * Make the inline-size computation think that
926 case x of I# x -> ...
932 * Fix what was essentially a bug in CPR analysis. Consider
934 letrec f x = let g y = let ... in f e1
936 if ... then (a,b) else g x
938 g has the CPR property if f does; so when generating the final annotated
939 RHS for f, we must use an envt in which f is bound to its final abstract
940 value. This wasn't happening. Instead, f was given the CPR tag but g
941 wasn't; but of course the w/w pass gives rotten results in that case!!
942 (Because f's CPR-ness relied on g's.)
944 On they way I tidied up the code in CprAnalyse. It's quite a bit shorter.
946 The fact that some data constructors return a constructed product shows
947 up in their CPR info (MkId.mkDataConId) not in CprAnalyse.lhs
951 Strictness analysis and worker/wrapper
952 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
953 * BIG THING: pass in the demand to StrictAnal.saExpr. This affects situations
955 f (let x = e1 in (x,x))
956 where f turns out to have strictness u(SS), say. In this case we can
957 mark x as demanded, and use a case expression for it.
959 The situation before is that we didn't "know" that there is the u(SS)
960 demand on the argument, so we simply computed that the body of the let
961 expression is lazy in x, and marked x as lazily-demanded. Then even after
964 let x = e1 in case (x,x) of (a,b) -> $wf a b
968 let x = e1 in $wf a b
970 I found a much more complicated situation in spectral/sphere/Main.shade,
971 which improved quite a bit with this change.
973 * Moved the StrictnessInfo type from IdInfo to Demand. It's the logical
974 place for it, and helps avoid module loops
976 * Do worker/wrapper for coerces even if the arity is zero. Thus:
977 stdout = coerce Handle (..blurg..)
979 wibble = (...blurg...)
980 stdout = coerce Handle wibble
981 This is good because I found places where we were saying
982 case coerce t stdout of { MVar a ->
984 case coerce t stdout of { MVar b ->
986 and the redundant case wasn't getting eliminated because of the coerce.
992 * Fix a few renamer bugs
994 * Substantially reorganise the Prelude to eliminate all orphan declarations.
995 Details in PrelBase.lhs
997 * Do a much better job of appending literal strings
999 - move unpackCString stuff to PrelBase
1000 - add BuiltinRules to the Rule type
1001 - add fold/build rules for literal strings
1007 * Fix a terrible bug in Simplify.mkDupableAlt; we were duplicating a small
1008 *InAlt*, but doing so invalidated occurrence info, which could lead to
1009 substantial code duplication.
1011 * Fix a bug in WwLib.mkWWcpr; I was generating CPR wrappers like
1013 which is utterly wrong. It should be
1015 (The effect was to make functions stricter than they really are.)
1017 * Try doing no inlining at all in phase 0. This noticeably improved
1018 spectral/fish (esp Main.hs I think), by improving floating.
1019 This single change has quite a large effect on some programs (allocation)
1021 Don't inline Don't inline
1023 in phase 0 in phase 0
1025 cichelli 28962k -3.12%
1026 wave4main 88089k +130.45%
1027 fibheaps 31731k +19.01%
1029 typecheck 148713k +4.91%
1031 But I found that fish worked much better if we inline *local* things
1032 in phase 0, but not *imported* things.
1034 * Fix a terrible bug in Simplify.mkLamBndrZapper. It was counting
1035 type args in one place, but not type binders, so it was sometimes
1036 inlining into unsaturated lambdas!
1038 * I found that there were some very bad loss-of-arity cases in PrelShow.
1039 In particular, we had:
1041 showl "" = showChar '"' s
1042 showl ('"':xs) = showString "\\\"" . showl xs
1043 showl (x:xs) = showLitChar x . showl xs
1046 showl = \xs -> case xs of
1048 (x:xs) -> let f = showLitChar x
1051 which is TERRIBLE. We can't spot that showLitChar has arity 2 because
1055 showLitChar c = showString ('\\' : asciiTab!!ord c)
1057 notice that the (asciiTab!!orc c) is outside the \s, so GHC can't rewrite it to
1059 showLitChar c = \s -> showString ('\\' : asciiTab!!ord c) s
1061 So I've changed PrelShow.showLitChar to use explicit \s. Even then, showl
1062 doesn't work, because GHC can't see that showl xs can be pushed inside the \s.
1063 So I've put an explict \s there too.
1065 showl "" s = showChar '"' s
1066 showl ('"':xs) s = showString "\\\"" (showl xs s)
1067 showl (x:xs) s = showLitChar x (showl xs s)
1069 Net result: imaginary/gen_regexps more than halves in allocation!
1071 Turns out that the mkLamBndrZapper bug (above) meant that showl was
1072 erroneously inlining showLitChar x and showl xs, which is why this
1073 problem hasn't shown up before.
1075 * Improve CSE a bit. In ptic
1076 case h x of y -> ...(h x)...
1077 replaces (h x) by y.
1079 * Inline INLINE things very agressively, even though we get code duplication
1080 thereby. Reason: otherwise we sometimes call the original un-inlined INLINE
1081 defns, which have constructors etc still un-inlined in their RHSs. The
1082 improvement is dramatic for a few programs:
1084 typecheck 150865k -1.43%
1085 wave4main 114216k -22.87%
1087 cichelli 33786k -14.28%
1089 rewrite 14665k -4.91%
1092 Code size increases by 10% which is not so good. There must be a better way.
1093 Another bad thing showed up in fish/Main.hs. Here we have
1094 (x1,y1) `vec_add` (x2,y2) = (x1+x2, y1+y2)
1095 which tends to get inlined. But if we first inline (+), it looks big,
1096 so we don't inline it. Sigh.
1099 * Don't inline constructors in INLINE RHSs. Ever. Otherwise rules don't match.
1102 * In ebnf2ps/Lexer.uncommentString, it would be a good idea to inline a constructor
1103 that occurs once in each branch of a case. That way it doesn't get allocated
1104 in the branches that don't use it. And in fact in this particular case
1105 something else good happens. So CoreUnfold now does that.
1107 * Reverted to n_val_binders+2 in calcUnfoldingGuidance
1108 Otherwise wrappers are inlined even if there's no benefit.
1113 * Arrange that simplConArgs works in one less pass than before.
1114 This exposed a bug: a bogus call to completeBeta.
1116 * Add a top-level flag in CoreUnfolding, used in callSiteInline
1118 * Extend w/w to use etaExpandArity, so it does eta/coerce expansion
1120 * Don't float anything out of an INLINE.
1121 Don't float things to top level unless they also escape a value lambda.
1122 [see comments with SetLevels.lvlMFE
1123 Without at least one of these changes, I found that
1124 {-# INLINE concat #-}
1125 concat = __inline (/\a -> foldr (++) [])
1126 was getting floated to
1127 concat = __inline( /\a -> lvl a )
1128 lvl = ...inlined version of foldr...
1130 Subsequently I found that not floating constants out of an INLINE
1131 gave really bad code like
1132 __inline (let x = e in \y -> ...)
1133 so I now let things float out of INLINE
1135 * Implement inline phases. The meaning of the inline pragmas is
1136 described in CoreUnfold.lhs
1138 * Implement the "reverse-mapping" idea for CSE; actually it turned out to be easier
1139 to implement it in SetLevels, and may benefit full laziness too.
1143 * It's a good idea to inline inRange. Consider
1145 index (l,h) i = case inRange (l,h) i of
1148 inRange itself isn't strict in h, but if it't inlined then 'index'
1149 *does* become strict in h. Interesting!
1151 * Big change to the way unfoldings and occurrence info is propagated in the simplifier
1152 The plan is described in Subst.lhs with the Subst type
1153 Occurrence info is now in a separate IdInfo field than user pragmas
1156 (coerce T (coerce S (\x.e))) y
1157 didn't simplify in one round. First we get to
1159 and only then do the beta. Solution: cancel the coerces in the continuation
1161 * Amazingly, CoreUnfold wasn't counting the cost of a function an application.
1165 * No commas between for-alls in RULES
1167 * Disable rules in initial simplifier run. Otherwise full laziness
1168 doesn't get a chance to lift out a MFE before a rule (e.g. fusion)
1169 zaps it. queens is a case in point
1171 * Improve float-out stuff significantly. The big change is that if we have
1173 \x -> ... /\a -> ...let p = ..a.. in let q = ...p...
1175 where p's rhs doesn't x, we abstract a from p, so that we can get p past x.
1176 (We did that before.) But we also substitute (p a) for p in q, and then
1177 we can do the same thing for q. (We didn't do that, so q got stuck.)
1178 This is much better. It involves doing a substitution "as we go" in SetLevels,
1184 * exprIsDupable for an application (f e1 .. en) wasn't calling exprIsDupable
1185 on the arguments!! So applications with few, but large, args were being dupliated.
1187 * sizeExpr on an application wasn't doing a nukeScrutDiscount on the arg of
1188 an application!! So bogus discounts could accumulate from arguments!
1190 * Improve handling of INLINE pragmas in calcUnfoldingGuidance. It was really
1193 * Substantially improve handling of coerces in worker/wrapper
1197 * Fix Kevin Atkinson's cant-find-instance bug. Turns out that Rename.slurpSourceRefs
1198 needs to repeatedly call getImportedInstDecls, and then go back to slurping
1199 source-refs. Comments with Rename.slurpSourceRefs.
1201 * Add a case to Simplify.mkDupableAlt for the quite-common case where there's
1202 a very simple alternative, in which case there's no point in creating a
1205 * Fix CoreUtils.exprOkForSpeculation so that it returns True of (==# a# b#).
1206 This lack meant that
1207 case ==# a# b# of { True -> x; False -> x }
1210 * Make float-out dump bindings at the top of a function argument, as
1211 at the top of a let(rec) rhs. See notes with FloatOut.floatRhs
1213 * Make the ArgOf case of mkDupableAlt generate a OneShot lambda.
1214 This gave a noticeable boost to spectral/boyer2
1219 Work, using IO.hPutStr as an example, to reduce the number of coerces.
1220 The main idea is in WwLib.mkWWcoerce. The gloss is that we must do
1221 the w/w split even for small non-recursive things. See notes with
1227 Study why gen_regexps is slower than before. Problem is in IO.writeLines,
1228 in particular the local defn shoveString. Two things are getting
1229 in the way of arity expansion, which means we build far more function
1230 closures than we should:
1231 shove = \ x -> let lvl = \s -> ...
1232 in \s -> ... lvl ...
1236 b) full laziness floats
1239 Solution to (a): add coerces to the worker/wrapper stuff.
1240 See notes with WwLib.mkWWcoerce.
1242 This further complicated getWorkerId, so I finally bit the bullet and
1243 make the workerInfo field of the IdInfo work properly, including
1244 under substitutions. Death to getWorkerId.
1248 Solution to (b): make all lambdas over realWorldStatePrimTy
1249 into one-shot lambdas. This is a GROSS HACK.
1251 * Also make the occurrence analyser aware of one-shot lambdas.
1256 Fix SetLevels so that it does not clone top-level bindings, but it
1257 *does* clone bindings that are destined for the top level.
1259 The global invariant is that the top level bindings are always
1260 unique, and never cloned.