(91/08/08: OLD!) These are notes about a _simple_ but complete pattern-matching compiler for Haskell. I presume familiarity with Phil's pattern-matching stuff in Simon's book and use roughly the same notation. Abbreviations: "p" for pattern, "e" (or "E") for expression, "g" for guard, "v" for variable, "u" for new variable I made up. "[]" for FATBAR. Subscripts: "p11" is really short for "p_{1,1}". Sometimes I'll use a "?", as in "pm1 ... pm?", to mean the second subscript goes up to something I'm really not worried about. NB: LETRECS NOT DEALT WITH YET. --------------------------------------------------------------------- We need a slightly souped-up "match" for Haskell (vs the Phil-chapter one). Simon suggested a re-arrangement of things, which I have then further re-arranged... Proposal (Simon) ~~~~~~~~ Eliminate default arg of match (3rd arg in Phil-chapter match) in favour of returning the variable (not special value) fail. Thus a possible translation for f [] [] = e1 f x y = e2 would be f p q = case p of [] -> case q of [] -> e1 _ -> fail _ -> fail where fail = e2 Now the issue of whether to duplicate code or share it becomes whether to substitute copies of e2 or not. This is a decision we need to take anyway for all other let-bound things, so why not for fail too? If fail is used only once, we will certainly substitute for it. We could even detect that fail is used only in a head position, so it can be implemented as a stack-adjust then a jump. This might well apply to other let-bound things too. Now here's a proposal for the "match" function. The main difference is 1) no default argument 2) [contra simon's suggestion] Patterns are still per-row as in Phil's chapter. 3) [partain] even the input exprs are CoreExprs OK, for a "match" for m equations each with n patterns: match :: [Name] -- n (variable) names, one per pattern column, bound -- to the n expressions we are matching against the -- patterns -> [([Pat], CoreExpr)] -- one pair for each of the m equations: the n -- patterns in that equation, then the CoreExpr that -- is evaluated if we get a match. The CoreExpr may -- contain free "fail"s; some hackery required to -- ensure that is OK; see below -> CoreExpr -- the resulting code to do the matching In words, takes (1) a list of n (match-expression, pattern-column) pairs (2) a list of m post-match expressions, expr i to be inserted immediately after equation i's lhs matches returns (1) a desugared expr equivalent of the whole "match" Meaning ~~~~~~~ match [u1, ..., un] [([p11, ..., p1n], e1), ..., ([pm1, ..., pmn], em)] match [ (e1, [p11, ...,pm1]), ..., (en, [p1n, ...,pmn])] [ E1, ... Em ] ********* MEANS ********* case (u1, ..., un) of (p11, ..., p1n) -> e1 _ -> fail where fail = case (u1, ..., un) of (p21, ..., p2n) -> e2 _ -> fail ... and so on ... Alternatively, this specification could be given in terms of pattern-matching lambdas, as in Phil's chapter. NOT CHANGED BEYOND HERE ------------------------------------------------------------------- Cranking through a good old function definition with the above: f p11 p12 ... p1n | g11 = e11 | g12 = e12 ... | g1? = e1? ... f pm1 pm2 ... pmn | gm1 = em1 ... | gm? = em? The "match" equivalent is: f = \u1.\u2...\un -> match [ (u1, [p11, ...,pm1]), ..., (un, [p1n, ...,pmn])] [ E1, ..., Em ] where fail = error "pattern-match for f failed\n" E1 = if g11 then e11 else if g12 then ... else fail ... Em = if gm1 then em1 else if gm2 then ... else fail Boring, huh? ------------------------------------------------------------------- It is helpful to me to think about the simple/base cases for this complicated "match". ALL LISTS EMPTY match [] [] corresponds to the syntactically bogus (zero equations!?) case () of () -> {- nothing!! -} _ -> fail EMPTY RULE -- no more patterns match [] [ ([], E1), ..., ([], Em) ] [where, incidentally, each Ei will be of the form (not that it has to be...) Ei = let x1 = e1 in let x2 = e2 in ... let x? = e? in if g1 then e'1 else if g2 then ... else if g? then e'? else fail ] becomes ("E1 [] E2 [] ... [] Em" in Phil's chapter...) E1 where fail = E2 where ... fail = Em-1 where fail = Em with any "fail" in Em being bound from an outer scope; perhaps it's easier to see written as: let fail = Em in let fail = Em-1 in ... let fail = E2 in E1 ------------------------------------------------------------------- HANDLING LAZY ("TWIDDLE") PATTERNS For Haskell, the "mixture rule" (p.~88) looks at a pattern-column and splits the equations into groups, depending on whether it sees * all constructors, or * all variables _OR LAZY PATTERNS_ The following example shows what "match" does when confronted by one of these variables/lazy-patterns combinations. Note the use of the binding lists. f v | g11 = e11 ... | g1? = e1? f ~p | g21 = e21 ... | g2? = e2? is f = \ u1 -> match [(u1, [ v, ~p ])] [ if g11 then e11 else if ... else fail, -- E1 if g21 then e21 else if ... else fail -- E2 ] where fail = error "no match in f\n" which transmogrifies into f = \ u1 -> let u2 = u1 in match [] [ -- E1 -- let v = u2 in if g11 then e11 else if ... else fail ,-- E2 -- let free_var1_of_p = match [(u2, [ p ])] [ free_var1_of_p ] ... free_var?_of_p = match [(u2, [ p ])] [ free_var?_of_p ] in if g21 then e21 else if ... else fail -- E2 ] where fail = error "no match in f\n" For more specific match-failure error messages, one could insert "let fail = ..."'s in strategic places. ------------------------------------------------------------------- "match" EQUIVALENTS FOR VARIOUS HASKELL CONSTRUCTS * function definition -- shown above * pattern-matching lambda (souped up version in static semantics) \ p1 p2 ... pn | g1 -> e1 | g2 -> e2 ... | gm -> em is the same as \ u1.\u2 ... \un -> match [ (u1, [p1]), ..., (un, [pn])] [ if g1 then e1 else if ... then em else fail ] where fail = error "no match in pattern-matching lambda at line 293\n" * pattern-matching (simple, non-recursive) "let" let p = e in E corresponds to case e of ~p -> E which has a "match" equivalent of match [(e, [~p])] [ E ] The full-blown Haskell "let" is more horrible: let p | g1 = e1 ... | gn = en in E corresponds to case ( if g1 then e1 else... else if gn then en else error "?" ) of ~p -> E thinking about which I am not able to sleep well at night. (Won't those g's have things bound from inside p ?) * pattern-matching (not-quite-so simple, non-recursive) "let" * pattern binding p | g1 = e1 | g2 = e2 ... | gm = em That's the same as p = if g1 then e1 else if ... else if gm then em else fail where fail = "...some appropriate thing..." which corresponds to match [ (if g1 ... then em else fail, [ ~p ]) ] [ {-nothing-} ] where fail = "...some appropriate thing..." * "case" expressions (souped up version in static semantics) case e0 of p1 | g11 -> e11 ... | g1? -> e1? ... pm | gm1 -> em1 ... | gm? -> em? is the same as match [ (e0, [p1, ..., pm]) ] [ if g11 then e11 else if ... else fail -- E1 , ... , if gm1 then em1 else if ... else fail ] where fail = error "pattern-matching case at line xxx failed\n" * list comprehensions