ghc/docs/NOTES.desugar

   1 (91/08/08: OLD!)
   2
   3 These are notes about a _simple_ but complete pattern-matching
   4 compiler for Haskell.  I presume familiarity with Phil's
   5 pattern-matching stuff in Simon's book and use roughly the same notation.
   6
   7 Abbreviations: "p" for pattern, "e" (or "E") for expression, "g" for
   8 guard, "v" for variable, "u" for new variable I made up. "[]" for
   9 FATBAR.
  10
  11 Subscripts: "p11" is really short for "p_{1,1}".  Sometimes I'll use
  12 a "?", as in "pm1 ... pm?", to mean the second subscript goes up to
  13 something I'm really not worried about.
  14
  15 NB: LETRECS NOT DEALT WITH YET.
  16
  17 ---------------------------------------------------------------------
  18 We need a slightly souped-up "match" for Haskell (vs the Phil-chapter
  19 one).  Simon suggested a re-arrangement of things, which I have then
  20 further re-arranged...
  21
  22 Proposal (Simon)
  23 ~~~~~~~~
  24
  25 Eliminate default arg of match (3rd arg in Phil-chapter match) in
  26 favour of returning the variable (not special value) fail.  Thus a
  27 possible translation for
  28
  29         f [] [] = e1
  30         f x  y  = e2
  31
  32 would be
  33
  34         f p q = case p of
  35                 [] -> case q of
  36                         [] -> e1
  37                         _  -> fail
  38                 _ -> fail
  39                 where
  40                 fail = e2
  41
  42 Now the issue of whether to duplicate code or share it becomes whether
  43 to substitute copies of e2 or not.  This is a decision we need to take
  44 anyway for all other let-bound things, so why not for fail too?  If
  45 fail is used only once, we will certainly substitute for it.
  46
  47 We could even detect that fail is used only in a head position, so it
  48 can be implemented as a stack-adjust then a jump.  This might well
  49 apply to other let-bound things too.
  50
  51 Now here's a proposal for the "match" function.  The main difference is
  52     1) no default argument
  53     2) [contra simon's suggestion]  Patterns are still per-row as in
  54        Phil's chapter.
  55     3) [partain] even the input exprs are CoreExprs
  56
  57 OK, for a "match" for m equations each with n patterns:
  58
  59 match :: [Name]
  60                 -- n (variable) names, one per pattern column, bound
  61                 -- to the n expressions we are matching against the
  62                 -- patterns
  63
  64       -> [([Pat], CoreExpr)]
  65                 -- one pair for each of the m equations: the n
  66                 -- patterns in that equation, then the CoreExpr that
  67                 -- is evaluated if we get a match.  The CoreExpr may
  68                 -- contain free "fail"s; some hackery required to
  69                 -- ensure that is OK; see below
  70
  71       -> CoreExpr
  72                 -- the resulting code to do the matching
  73
  74 In words,
  75   takes
  76     (1) a list of n (match-expression, pattern-column) pairs
  77     (2) a list of m post-match expressions, expr i to be inserted
  78         immediately after equation i's lhs matches
  79   returns
  80     (1) a desugared expr equivalent of the whole "match"
  81
  82 Meaning
  83 ~~~~~~~
  84     match [u1, ..., un]
  85           [([p11, ..., p1n], e1), ..., ([pm1, ..., pmn], em)]
  86
  87     match [ (e1, [p11, ...,pm1]), ..., (en, [p1n, ...,pmn])]
  88           [ E1, ... Em ]
  89
  90     ********* MEANS *********
  91
  92     case (u1, ..., un) of
  93          (p11, ..., p1n) -> e1
  94          _               -> fail
  95     where
  96     fail = case (u1, ..., un) of
  97                 (p21, ..., p2n) -> e2
  98                 _               -> fail
  99     ... and so on ...
 100
 101 Alternatively, this specification could be given in terms of
 102 pattern-matching lambdas, as in Phil's chapter.
 103
 104 NOT CHANGED BEYOND HERE
 105
 106 -------------------------------------------------------------------
 107 Cranking through a good old function definition with the above:
 108
 109     f p11 p12 ... p1n | g11 = e11
 110                       | g12 = e12
 111                       ...
 112                       | g1? = e1?
 113     ...
 114     f pm1 pm2 ... pmn | gm1 = em1
 115                       ...
 116                       | gm? = em?
 117
 118 The "match" equivalent is:
 119
 120 f = \u1.\u2...\un ->
 121         match [ (u1, [p11, ...,pm1]), ..., (un, [p1n, ...,pmn])]
 122               [ E1, ..., Em ]
 123         where fail = error "pattern-match for f failed\n"
 124               E1   = if g11 then e11 else if g12 then ... else fail
 125               ...
 126               Em   = if gm1 then em1 else if gm2 then ... else fail
 127
 128 Boring, huh?
 129
 130 -------------------------------------------------------------------
 131 It is helpful to me to think about the simple/base cases for this
 132 complicated "match".
 133
 134 ALL LISTS EMPTY
 135
 136     match [] []
 137
 138   corresponds to the syntactically bogus (zero equations!?)
 139
 140     case () of
 141       () -> {- nothing!! -}
 142       _  -> fail
 143
 144
 145 EMPTY RULE -- no more patterns
 146
 147     match [] [ ([], E1), ..., ([], Em) ]
 148
 149   [where, incidentally, each Ei will be of the form
 150    (not that it has to be...)
 151
 152     Ei = let x1 = e1 in
 153          let x2 = e2 in
 154          ...
 155          let x? = e? in
 156          if g1 then e'1
 157          else if g2 then
 158          ...
 159          else if g? then e'?
 160          else fail
 161   ]
 162
 163   becomes ("E1 [] E2 [] ... [] Em" in Phil's chapter...)
 164
 165     E1
 166     where
 167        fail = E2
 168        where
 169        ...
 170          fail = Em-1
 171          where fail = Em
 172
 173   with any "fail" in Em being bound from an outer scope; perhaps it's
 174   easier to see written as:
 175
 176     let fail = Em
 177      in let fail = Em-1
 178         in ...
 179             let fail = E2 in E1
 180 -------------------------------------------------------------------
 181 HANDLING LAZY ("TWIDDLE") PATTERNS
 182
 183 For Haskell, the "mixture rule" (p.~88) looks at a pattern-column and
 184 splits the equations into groups, depending on whether it sees
 185
 186     * all constructors, or
 187     * all variables _OR LAZY PATTERNS_
 188
 189 The following example shows what "match" does when confronted by one
 190 of these variables/lazy-patterns combinations.  Note the use of the
 191 binding lists.
 192
 193     f  v | g11 = e11
 194          ...
 195          | g1? = e1?
 196     f ~p | g21 = e21
 197          ...
 198          | g2? = e2?
 199
 200 is
 201
 202     f = \ u1 ->
 203         match [(u1, [ v, ~p ])]
 204               [ if g11 then e11 else if ... else fail, -- E1
 205                 if g21 then e21 else if ... else fail  -- E2
 206               ]
 207     where fail = error "no match in f\n"
 208
 209 which transmogrifies into
 210
 211     f = \ u1 ->
 212         let u2 = u1 in
 213         match []
 214               [ -- E1 --
 215                 let v = u2
 216                 in
 217                 if g11 then e11 else if ... else fail
 218
 219                ,-- E2 --
 220                 let free_var1_of_p = match [(u2, [ p ])] [ free_var1_of_p ]
 221                     ...
 222                     free_var?_of_p = match [(u2, [ p ])] [ free_var?_of_p ]
 223                 in
 224                 if g21 then e21 else if ... else fail  -- E2
 225
 226               ]
 227     where fail = error "no match in f\n"
 228
 229 For more specific match-failure error messages, one could insert
 230 "let fail = ..."'s in strategic places.
 231
 232 -------------------------------------------------------------------
 233 "match" EQUIVALENTS FOR VARIOUS HASKELL CONSTRUCTS
 234
 235 * function definition -- shown above
 236
 237 * pattern-matching lambda (souped up version in static semantics)
 238
 239     \ p1 p2 ... pn | g1 -> e1
 240                    | g2 -> e2
 241                    ...
 242                    | gm -> em
 243
 244   is the same as
 245
 246     \ u1.\u2 ... \un ->
 247         match [ (u1, [p1]), ..., (un, [pn])]
 248               [ if g1 then e1 else if ... then em else fail
 249               ]
 250         where fail = error "no match in pattern-matching lambda at line 293\n"
 251
 252 * pattern-matching (simple, non-recursive) "let"
 253
 254     let p = e
 255     in E
 256
 257   corresponds to
 258
 259     case e of
 260       ~p -> E
 261
 262   which has a "match" equivalent of
 263
 264     match [(e, [~p])] [ E ]
 265
 266   The full-blown Haskell "let" is more horrible:
 267
 268     let p | g1 = e1
 269           ...
 270           | gn = en
 271     in E
 272
 273   corresponds to
 274
 275     case ( if g1 then e1 else... else if gn then en else error "?" ) of
 276       ~p -> E
 277
 278   thinking about which I am not able to sleep well at night.
 279   (Won't those g's have things bound from inside p ?)
 280
 281 * pattern-matching (not-quite-so simple, non-recursive) "let"
 282
 283 <mumble>
 284
 285 * pattern binding
 286
 287     p | g1 = e1
 288       | g2 = e2
 289       ...
 290       | gm = em
 291
 292   That's the same as
 293
 294     p = if g1 then e1 else if ... else if gm then em else fail
 295     where fail = "...some appropriate thing..."
 296
 297   which corresponds to
 298
 299     match [ (if g1 ... then em else fail, [ ~p ]) ]
 300           [ {-nothing-} ]
 301     where fail = "...some appropriate thing..."
 302
 303 * "case" expressions (souped up version in static semantics)
 304
 305     case e0 of
 306       p1 | g11 -> e11
 307          ...
 308          | g1? -> e1?
 309       ...
 310       pm | gm1 -> em1
 311          ...
 312          | gm? -> em?
 313
 314   is the same as
 315
 316     match [ (e0, [p1, ..., pm]) ]
 317           [ if g11 then e11 else if ... else fail -- E1
 318             , ... ,
 319             if gm1 then em1 else if ... else fail
 320           ]
 321     where fail = error "pattern-matching case at line xxx failed\n"
 322
 323 * list comprehensions