The public APIs in this package are stable; package-private APIs and all other packages are subject to change in future releases.

Be sure to read doc/jargon.txt and the description below; there is also a faq a the end of this document.

This package forms the stable core of the SBP API Classes fall into five categories:

Theory of Operation

The input that you parse is considered to be a stream of Tokens; this stream is represented by an Input<Token>. In order to create this Input, you must first decide what kind of tokens you want to parse. Based on this decision, you should then implement subclasses of Input, Parser, and Atom for that token type. If you are parsing characters (which you usually are), these subclasses are provided in the edu.berkeley.sbp.chr.* package so you don't have to write them yourself.

You then create a grammar by instantiating objects belonging to your subclass of Atom and forming them into sequences using Sequence.create___() and new Union().

Ultimately you will wind up with an instance of Union corresponding to the "start nonterminal" of your grammar. You can then provide this Union to the constructor of your Parser subclass and invoke the Parser.parse(Input) method on the Input to be parsed.

The result will be a Forest, which is an efficient representation of a set of one or more trees that may share subtrees.

If the parse was ambiguous, you can use Forest.expand(HashSet) to expand the Forest into all the possible trees (there is not yet a stable API for inspecting the Forest directly).

If the parse was not ambiguous, you can call Forest.expand1() to return the single possible parsing as a Tree. You would then typically use the methods of the Tree class to examine the parse tree.

Guide to the API

Example

package edu.berkeley.sbp.misc;

import edu.berkeley.sbp.*;

public class Demo2 {

    private static Atom atom(char c) {
        return new edu.berkeley.sbp.chr.CharAtom(c); }
    private static Atom atom(char c1, char c2) {
        return new edu.berkeley.sbp.chr.CharAtom(c1, c2); }

    public static void main(String[] s) throws Exception {

        Union expr = new Union("Expr");

        Element[] add   = new Element[] { expr, atom('+'), expr };
        Element[] mult  = new Element[] { expr, atom('*'), expr };
        Element[] paren = new Element[] { atom('('), expr, atom(')') };

        Sequence addSequence = Sequence.create("add", add, null, false);
        Sequence multSequence = Sequence.create("mult", mult, null, false);

        // uncomment this line to disambiguate
        //multSequence = multSequence.andnot(Sequence.create("add", add, null, false));

        expr.add(Sequence.create(paren, 1));
        expr.add(addSequence);
        expr.add(multSequence);
        expr.add(Sequence.create(atom('0', '9')));

        String input = "8+(1+3)*7";

        System.out.println("input:  \""+input+"\"");

        StringBuffer sb = new StringBuffer();
        expr.toString(sb);
        System.out.println("grammar: \n"+sb);

        Forest f = new edu.berkeley.sbp.chr.CharParser(expr).parse(input);
        System.out.println("output: "+f.expand1().toPrettyString());
    }

}

java -Xmx900m -cp edu.berkeley.sbp.jar edu.berkeley.sbp.misc.Demo2
input:  "8+(1+3)*7"
grammar:
Expr            = [(] Expr [)]
                | "add":: Expr [+] Expr
                | "mult":: Expr [*] Expr
                | [0-9]

Exception in thread "main" unresolved ambiguity; shared subtrees are shown as "*"
  possibility: mult:{add:{* * *} * *}
  possibility: add:{* * mult:{* * *}}

java -Xmx900m -cp edu.berkeley.sbp.jar edu.berkeley.sbp.misc.Demo2
input:  "8+(1+3)*7"
grammar:
Expr            = [(] Expr [)]
                | "add":: Expr [+] Expr
                | "mult":: Expr [*] Expr &~ "add":: Expr [+] Expr
                | [0-9]

output: add:{8 + mult:{add:{1 + 3} * 7}}

FAQs

I get the error java.lang.Error: multiple non-dropped elements in sequence, what does this mean?

Note: this question deals with the package edu.berkeley.sbp.meta, which is not considered stable.

When using the class edu.berkeley.sbp.meta.Grammar, you must supply an instance of Grammar.Bindings; this instance tells SBP how to create a parse tree for an expression using the parse trees of its subexpressions.

SBP has no trouble determining what to do when parsing an expression that drops all of its subexpressions, or all but one -- for example:

... in this example, only C is "non-dropped". In this case, the result of parsing A is simply the result of parsing C.

However, if we were to leave more than one element un-dropped, SBP needs to know how to form a single tree out of the two non-dropped subtrees. There are two ways to do this. The simplest is to provide a tag -- a string which becomes the common parent of the two subtrees:

Expr = Mult:: Expr "*" Expr

If you are using AnnotationGrammarBindings, you can also deal with this situation by declaring a method/inner-class whose name matches the nonterminal (Expr) and has appropriate annotations. This is fairly advanced stuff, and the code it uses isn't quite as mature as the rest of the code.

Reporting Bugs

Bug reports are especially appreciated when you submit them as a test case (here's the grammar and some examples). This way we can add your bug report as part of the regression suite, and be sure we never release a new version in which the bug has crept back in!