The public APIs in this package are stable; package-private APIs and all other packages are subject to change in future releases.

Be sure to read doc/jargon.txt and the description below.

This package forms the stable core of the SBP API Classes fall into five categories:

Theory of Operation

The input that you parse is considered to be a stream of Tokens; this stream is represented by an Input<Token>. In order to create this Input, you must first decide what kind of tokens you want to parse. Based on this decision, you should then implement subclasses of Input, Parser, and Atom for that token type. If you are parsing characters (which you usually are), these subclasses are provided in the edu.berkeley.sbp.chr.* package so you don't have to write them yourself.

You then create a grammar by instantiating objects belonging to your subclass of Atom and forming them into sequences using Sequence.create___() and new Union().

Ultimately you will wind up with an instance of Union corresponding to the "start nonterminal" of your grammar. You can then provide this Union to the constructor of your Parser subclass and invoke the Parser.parse(Input) method on the Input to be parsed.

The result will be a Forest, which is an efficient representation of a set of one or more trees that may share subtrees.

If the parse was ambiguous, you can use Forest.expand(HashSet) to expand the Forest into all the possible trees (there is not yet a stable API for inspecting the Forest directly).

If the parse was not ambiguous, you can call Forest.expand1() to return the single possible parsing as a Tree. You would then typically use the methods of the Tree class to examine the parse tree.

Guide to the API

Example

package edu.berkeley.sbp.misc;

import edu.berkeley.sbp.*;

public class Demo2 {

    private static Atom atom(char c) {
        return new edu.berkeley.sbp.chr.CharAtom(c); }
    private static Atom atom(char c1char c2) {
        return new edu.berkeley.sbp.chr.CharAtom(c1, c2); }

    public static void main(String[] s) throws Exception {

        Union expr = new Union("Expr");

        Element[] add   = new Element[] { expr, atom('+'), expr };
        Element[] mult  = new Element[] { expr, atom('*'), expr };
        Element[] paren = new Element[] { atom('('), expr, atom(')') };

        Sequence addSequence = Sequence.create("add", add, nullfalse);
        Sequence multSequence = Sequence.create("mult", mult, nullfalse);

        // uncomment this line to disambiguate
        //multSequence = multSequence.andnot(Sequence.create("add", add, null, false));

        expr.add(Sequence.create(paren, 1));
        expr.add(addSequence);
        expr.add(multSequence);
        expr.add(Sequence.create(atom('0''9')));

        String input = "8+(1+3)*7";

        System.out.println("input:  \""+input+"\"");

        StringBuffer sb = new StringBuffer();
        expr.toString(sb);
        System.out.println("grammar: \n"+sb);

        Forest f = new edu.berkeley.sbp.chr.CharParser(expr).parse(input);
        System.out.println("output: "+f.expand1().toPrettyString());
    }

}

Executing this code gives the following:

java -Xmx900m -cp edu.berkeley.sbp.jar edu.berkeley.sbp.misc.Demo2
input:  "8+(1+3)*7"
grammar:
Expr            = [(] Expr [)]
                | "add":: Expr [+] Expr
                | "mult":: Expr [*] Expr
                | [0-9]

Exception in thread "main" unresolved ambiguity; shared subtrees are shown as "*"
  possibility: mult:{add:{* * *} * *}
  possibility: add:{* * mult:{* * *}}

If we uncomment the line in the example, the result is:

java -Xmx900m -cp edu.berkeley.sbp.jar edu.berkeley.sbp.misc.Demo2
input:  "8+(1+3)*7"
grammar: 
Expr            = [(] Expr [)] 
                | "add":: Expr [+] Expr 
                | "mult":: Expr [*] Expr &~ "add":: Expr [+] Expr 
                | [0-9] 

output: add:{8 + mult:{add:{1 + 3} * 7}}