1 ______________________________________________________________________________
4 Token -- an indivisible datum. Most things in SBP are parameterized
7 Tree -- a tree of Tokens; each node consists of a Token (called the
8 node's "head") and zero or more children (which are in
9 turn Trees). To use a Haskell-like notation,
11 data Tree a = Tree a [Tree]
13 Note that (1) this is not the algebraic type most
14 commonly used for trees and (2) this model of trees is
15 not simply specific to the data structures used to
16 implement SBP -- the parser assumes that everything it
17 deals with is a Token, Tree, or Forest (see below).
19 Forest -- a set of trees. There is an implicit assumption here that
20 a Forest can be represented efficiently when its
21 constituent trees share supertrees and subtrees.
23 SBP does not yet commit to a specific model of Forests,
24 and the API exposes little more than the ability to expand
25 a Forest into a set of Trees. This may change.
27 One key issue to resolve is how to efficiently represent
28 trees with a huge number of children where only a subset
29 of these children are shared (this effectively requires a
30 Tomita-style SPPF just for the child-list). Currently
31 this issue is sidestepped by the fact that the "raw" parse
32 forest will never have a tree of arity greater than the
33 longest production rule (and therefore not unreasonably
34 long); the "unfolding" transformation used to create
35 arrays out of EBNF-repetition productions takes place
36 during the Forest-to-TreeSet expansion. Clearly this is
37 not an ideal long-term solution.
39 Element -- a pattern which:
41 1. matches a sequence of zero or more Trees
42 2. uses the matched Trees to create a sequence of output Forests
44 The restriction that the result of an Element can only be
45 zero or one Trees (rather than zero or more) is one which
48 Atom -- an Element which matches exactly one Token. An Atom is
49 effectively a (possibly infinite) set of Tokens; matching
52 Sequence -- an Element which matches the juxtaposition of zero or
53 more sub-Elements, and which creates its output Tree by
54 composing the output Trees of those sub-Elements.
56 Position -- within a specific Sequence, a Position is the
57 imaginary space before or after one of the elements
58 (ie at the beginning of the sequence, at the end, or
59 between two elements). Positions correspond exactly
60 to the automata-theoretic notion of an LR "item", but
61 "item" is such a horribly chosen and vague name that
62 I just couldn't bring myself to perpetuate it. Note
63 that Positions and Locations have no relationship to
66 Union -- an Element which matches exactly one of a set of one or
67 more sub-Elements; its output forest is union
69 Location -- a specific location in the input tree. This notion
70 still needs to be formalized; for example, not all
71 nodes in the output Forest have a sensible notion of
75 ______________________________________________________________________________
78 The following notation is used for Tree<String> (Trees of Strings); it
79 can be generalized to Tree<X> where X maps onto Strings (ie
82 Tree ::= Head // equivalent to Head:{}
83 | "{" Body "}" // equivalent to "":{}
84 | Head ":" "{" Body "}"
85 | Head ":" (Tree) // ":" is right-associative
87 Head ::= "\"\"" // the empty string
88 | "\"" [~\"]+ "\"" // Strings containing spaces, braces, quotes, or colons
89 | [~\":{}\r\n\ ]+ // non-empty strings of "normal" characters
91 Body ::= Tree */ (" "+) // zero or more Trees separated by whitespace
107 ______________________________________________________________________________
110 This has not yet been finalized