The public APIs in this package are stable; package-private
APIs and all other packages are subject to change in future
releases.
Be sure to read doc/jargon.txt and the description below.
This package forms the stable core of the SBP API Classes fall into
five categories:
- Elements of the grammar -- the
pieces from which a grammar is composed.
- Input, Location, and Region -- the
input to be parsed, as well as classes for describing
locations and regions of that input.
- Parser -- the engine that actually performs the parsing
process.
- Trees and Forests -- used to
represent the output of the parsing process.
- Exceptions.
Theory of Operation
The input that you parse is considered to be a stream of
Tokens; this stream is represented by an
Input<Token>. In order to create this Input,
you must first decide what kind of tokens you want to parse. Based on
this decision, you should then implement subclasses of Input,
Parser, and Atom for that token type. If you are
parsing characters (which you usually are), these subclasses are
provided in the edu.berkeley.sbp.chr.* package so you don't
have to write them yourself.
You then create a grammar by instantiating objects belonging to your
subclass of Atom and forming them into sequences using
Sequence.create___() and new Union().
Ultimately you will wind up with an instance of Union
corresponding to the "start nonterminal" of your grammar. You can
then provide this Union to the constructor of your
Parser subclass and invoke the Parser.parse(Input)
method on the Input to be parsed.
The result will be a Forest, which is an efficient
representation of a set of one or more trees that may share subtrees.
If the parse was ambiguous, you can use
Forest.expand(HashSet) to expand the Forest into all the
possible trees (there is not yet a stable API for inspecting the
Forest directly).
If the parse was not ambiguous, you can call
Forest.expand1() to return the single possible parsing as a
Tree. You would then typically use the methods of the
Tree class to examine the parse tree.
Guide to the API
Example
package edu.berkeley.sbp.misc;
import edu.berkeley.sbp.*;
public class Demo2 {
private static Atom atom(char c) {
return new edu.berkeley.sbp.chr.CharAtom(c); }
private static Atom atom(char c1, char c2) {
return new edu.berkeley.sbp.chr.CharAtom(c1, c2); }
public static void main(String[] s) throws Exception {
Union expr = new Union("Expr");
Element[] add = new Element[] { expr, atom('+'), expr };
Element[] mult = new Element[] { expr, atom('*'), expr };
Element[] paren = new Element[] { atom('('), expr, atom(')') };
Sequence addSequence = Sequence.create("add", add, null, false);
Sequence multSequence = Sequence.create("mult", mult, null, false);
// uncomment this line to disambiguate
//multSequence = multSequence.andnot(Sequence.create("add", add, null, false));
expr.add(Sequence.create(paren, 1));
expr.add(addSequence);
expr.add(multSequence);
expr.add(Sequence.create(atom('0', '9')));
String input = "8+(1+3)*7";
System.out.println("input: \""+input+"\"");
StringBuffer sb = new StringBuffer();
expr.toString(sb);
System.out.println("grammar: \n"+sb);
Forest f = new edu.berkeley.sbp.chr.CharParser(expr).parse(input);
System.out.println("output: "+f.expand1().toPrettyString());
}
}
Executing this code gives the following:
java -Xmx900m -cp edu.berkeley.sbp.jar edu.berkeley.sbp.misc.Demo2
input: "8+(1+3)*7"
grammar:
Expr = [(] Expr [)]
| "add":: Expr [+] Expr
| "mult":: Expr [*] Expr
| [0-9]
Exception in thread "main" unresolved ambiguity; shared subtrees are shown as "*"
possibility: mult:{add:{* * *} * *}
possibility: add:{* * mult:{* * *}}
If we uncomment the line in the example, the result is:
java -Xmx900m -cp edu.berkeley.sbp.jar edu.berkeley.sbp.misc.Demo2
input: "8+(1+3)*7"
grammar:
Expr = [(] Expr [)]
| "add":: Expr [+] Expr
| "mult":: Expr [*] Expr &~ "add":: Expr [+] Expr
| [0-9]
output: add:{8 + mult:{add:{1 + 3} * 7}}