2 <head><title>The Scannerless Boolean Parser (SBP)</title>
8 font-family: helvetica, verdana, arial, sans-serif;
13 border-top-width: 2pt;
14 border-top-style: solid;
18 font-family: helvetica, verdana, arial, sans-serif;
25 font-family: helvetica, verdana, arial, sans-serif;
31 font-family: helvetica, verdana, arial, sans-serif;
36 LI { margin-top: 5px; }
41 <center><table><tr><td width=600>
44 <font style='font-size:24pt; font-family:helvetica, verdana, arial, sans-serif'>
45 <b>SBP: the Scannerless Boolean Parser</b></font>
50 16-Aug: A new snapshot is <a href=../../edu.berkeley.sbp.tgz>here</a>.
56 <table width=500 style='background: #daa; padding: 10px'><tr><td>
<p style='padding: 5px; color:white; background: red; width:100%'><b>Update:</b> [29-July-2006]</p>
60 <a href=../../images/error.png>
61 <img align=right src=../../images/error.png width=200>
Error handling has been massively improved. Here's an example error
report from a parse using a substantial portion of the <a href=../tests/java15.g>Java 1.5
grammar</a>. The <a href=../tests/java15.test>input</a> is missing a
66 closing angle-bracket on a generic type definition. Click on the
67 image to view <a href=../../images/error.png>full size</a>. Type
68 <tt>make java15</tt> after a checkout to try it yourself.
76 <table width=400><tr><td>
77 <font color=gray><b>Update:</b> [22-July-2006]<br><br>
79 The <a href=api/edu/berkeley/sbp/package-summary.html>API has been finalized</a> and includes a <a href=api/edu/berkeley/sbp/package-summary.html#package_description>decent example/mini-tutorial</a>.
86 <table width=400><tr><td>
87 <font color=gray><b>Update:</b></font> [17-July-2006]<br><br>
89 There is now a <a href=http://research.cs.berkeley.edu/project/sbp/list/>mailing list</a>.
96 <table width=400><tr><td>
97 <font color=gray><b>Update:</b> [05-July-2006]</font><br><br>
99 The reflective grammar-to-java bindings are complete, so SBP is now
100 vastly easier to use. You can find example code <a
101 href=../src/edu/berkeley/sbp/misc/Demo.java>here</a>
102 and the companion grammar <a
103 href=../tests/demo.g>here</a>.
110 The Scannerless Boolean Parser (SBP) is a scannerless parser for <a
111 href=http://www.cs.queensu.ca/home/okhotin/boolean/>boolean
112 grammars</a> (a superset of context-free grammars). It is written in
113 Java and emits Java source code.
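<p>
To make "superset of context-free grammars" concrete: the language
a<sup>n</sup>b<sup>n</sup>c<sup>n</sup> is not context-free, but it is the
intersection of two context-free languages, so a conjunctive rule can
describe it. The sketch below is plain Java and does not use the SBP API.
The class and method names (<tt>ConjunctiveDemo</tt>, <tt>runs</tt>,
<tt>inL1</tt>, <tt>inL2</tt>, <tt>inIntersection</tt>) are hypothetical, and
the two membership checks are hand-written counters standing in for real
parsers.
</p>
<pre>
```java
class ConjunctiveDemo {
    // Lengths of the three letter runs in a string of the shape a*b*c*,
    // or null when the input does not have that shape.
    static int[] runs(String s) {
        int[] n = new int[3];
        int pos = 0;
        for (int k = 0; k != 3; k++) {
            char letter = (char) ('a' + k);
            while (pos != s.length() && s.charAt(pos) == letter) {
                n[k]++;
                pos++;
            }
        }
        return pos == s.length() ? n : null;
    }

    // L1 = { a^i b^j c^k : i == j }, a context-free language.
    static boolean inL1(String s) {
        int[] n = runs(s);
        return n != null && n[0] == n[1];
    }

    // L2 = { a^i b^j c^k : j == k }, also context-free.
    static boolean inL2(String s) {
        int[] n = runs(s);
        return n != null && n[1] == n[2];
    }

    // The conjunctive rule S -> L1 & L2 accepts exactly a^n b^n c^n.
    static boolean inIntersection(String s) {
        return inL1(s) && inL2(s);
    }

    public static void main(String[] args) {
        String[] inputs = { "", "abc", "aabbcc", "aabbc", "abcc", "acb" };
        for (String s : inputs) {
            System.out.println("\"" + s + "\" -> " + inIntersection(s));
        }
    }
}
```
</pre>
<p>
Running <tt>main</tt> should report <tt>true</tt> for the first three inputs
and <tt>false</tt> for the rest. A boolean grammar would additionally allow
a rule to exclude a language through the complement operator.
</p>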
115 <h1>What is interesting about it?</h1>
117 SBP deliberately sacrifices performance in favor of ease of extensibility.
120 Since it is an implementation of the (modified) <a
121 href=http://www.program-transformation.org/Sdf/GeneralizedLR>Lang-Tomita
122 GLR algorithm</a>, SBP supports all context-free languages.
126 href=http://en.wikipedia.org/wiki/Lexerless_parsing>scannerless</a>
127 (does not require a lexer). This allows it to easily handle languages
128 which have non-regular lexical structure or lack a clear lexer-parser
129 distinction, such as TeX, XML, RFC1738 (URLs), ASN.1, SMTP headers,
133 In addition to the juxtaposition and union operators provided in
134 context-free languages, SBP supports grammars which use the
135 intersection operator (<a
136 href=http://www.cs.queensu.ca/home/okhotin/conjunctive/>conjunctive
137 grammars</a>) and the complement operator (<a
138 href=http://www.cs.queensu.ca/home/okhotin/boolean/>boolean
141 <h1>What features does it have?</h1>
143 Features fully implemented are in <font color=green>green</font>;
144 those partially implemented are in <font color=orange>orange</font>;
145 those unimplemented (but planned) are in <font color=red>red</font>.
147 <ul> <li> <b>An implementation of the Lang-Tomita GLR parsing algorithm</b>
<li> Including <font color=green>Johnstone &amp; Scott's RNGLR algorithm</font> for epsilon-productions
151 <li> <a href=http://citeseer.ist.psu.edu/vandenbrand02disambiguation.html><font color=green>Visser's</font> extensions</a>
152 for <font color=green>scannerless parsing</font>
153 <ul> <li> <font color=green>Follow</font>, <font color=green>Avoid, Prefer</font>, <font color=green>Reject</font> constraints
154 <li> <font color=green>Character ranges</font>
155 <li> Automatic insertion of <font color=green>whitespace/comments</font>
158 <li> <font color=green>Any topological space</font> can be
159 used as an alphabet (need not be discrete)
160 <ul> <li> <font color=green>Unicode</font>
161 <li> <font color=orange>Trees</font>
164 <li> <font color=green>Associativity constraints</font> on <font color=green><i>n</i>-ary operators</font>
168 <li> <b>Ability to parse a wide variety of grammars in
169 </b> O(n<sup>3</sup>) time:
172 <li> <font color=green>all context-free grammars</font>
174 <li> <font color=green>epsilon productions</font>, <font
175 color=green>included in the parse forest</font>
177 <li> <font color=green>circularities</font>, <font
178 color=red>included in the parse forest</font>.
180 <li> Regular expression operators (
181 <tt><font color=green>*</font></tt>,
182 <tt><font color=green>?</font></tt>,
183 <tt><font color=green>+</font></tt>
186 <li> <font color=green>conjunctive grammars</font>
187 (<font color=green>intersection</font> operator)
189 <li> <font color=orange>boolean grammars</font> (<font
190 color=green>intersection</font>, <font
191 color=green>intersect-with-complement</font>, and
192 <font color=orange>generalized-complement</font>)
196 <li> <b>Facilitates experimenting with grammars</b>
199 <li> <font color=green>Interpreted mode</font>, in which the
200 parse table is interpreted directly, eliminating the
201 need for a compiler and making it easier for grammars
202 to operate on grammars.
204 <li> <font color=green>Simple
205 <a href=api/edu/berkeley/sbp/package-summary.html>API</a></font>
206 makes it easy to generate, analyze, and modify grammars
210 <li> Components of a grammar (nonterminals,
211 productions, etc) <font
212 color=green>represented as objects</font>
<li> Composite elements implement <font color=green><tt>Iterable&lt;T&gt;</tt></font>
216 <li> <font color=red>Compiled mode</font>, in which Java
217 source code is emitted; compiling this code yields a
218 parser. The resulting parser is <i>much</i> faster.
224 <h1>What is it deliberately missing?</h1>
226 <ul> <li> Semantic actions; the only option is to return a parse forest.
228 <li> This keeps the grammar specification language-neutral.
229 <li> A grammar can, however, indicate that certain parts of the parse tree should be dropped.
233 <h1>What features would be nice to have?</h1>
236 <li> <strike>Drop Farshi's algorithm and use <a
237 href=http://doi.ieeecomputersociety.org/10.1109/HICSS.2002.994495>GRMLR</a></strike>.
238 <font color=green>Done!</font>
240 <li> An implementation of the <a
241 href=http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elkhound/algorithm.html>McPeak-Necula
242 optimization</a> for bounded-depth determinism.
<li> Lazy parse trees, to decrease the best-case space requirement from
&Omega;(n) to &Omega;(1) [the worst case remains O(n)].
247 <li> Consider implementing <a
248 href=http://www.cs.uvic.ca/~nigelh/Publications/cc99-paper.pdf>
Aycock-Horspool</a> unrolling. This improves performance with
only a highly localized increase in algorithmic complexity.
251 Subsumes many other optimizations.
255 <h1>What are the long term goals?</h1>
257 As we come to a more mature understanding of the pragmatic aspects of
258 boolean grammars, a long-term goal is to migrate support for these
259 features to existing high-performance GLR implementations (<a
260 href=http://www.cs.berkeley.edu/~smcpeak/elkhound/>Elkhound</a>, <a
261 href=http://www.delorie.com/gnu/docs/bison/bison_90.html>bison-glr</a>).
263 <h1>Where can I read more about it?</h1>
265 <ul> <li> The <a href=../README>README</a> file is the best place to start
266 <li> After that, be sure to read <a href=jargon.txt>jargon.txt</a>
268 href=api/edu/berkeley/sbp/package-summary.html>javadoc</a>
269 is the best description of the API
270 <li> There's a <a href=../tests/meta.g>tentative metagrammar</a>,
272 <li> You can also get <a href=osq.lunch.talk.pdf>slides</a>
273 from my talk at the OSQ Lunch on 02-Nov-2005, though some of
274 the stuff (specifically what SBP can and cannot do) is
276 <li> A <a href=preprint.pdf>preprint</a> of one of my conference
280 <h1>Where can I get it?</h1>
282 The color coding above accurately reflects the state of the
283 implementation (<font color=green>11-Dec-2005</font>). However, in its current state it is a
284 bit messy, and may require a bit of fiddling to get it to do what you
285 want. This situation should improve in the next few weeks as I am
286 done adding features (for now) and am currently focusing on
287 reliability, cleanliness, and performance.
290 SBP is available under the BSD license.
293 You can download a snapshot (<font color=green>11-Dec-2005</font>) <a
294 href=../../edu.berkeley.sbp.tgz>here</a>. The parser-generator
295 requires Java 1.5 or later; the Java code it emits <font
296 color=orange>should run on any Java 1.1+ JVM</font>. After unpacking
297 the archive, simply type <tt>make</tt> to compile SBP and run the
302 </td></tr></table></center>