From: adam <adam@megacz.com>
Date: Sun, 27 May 2007 20:24:57 +0000 (-0400)
Subject: TODO updates
X-Git-Url: http://git.megacz.com/?p=sbp.git;a=commitdiff_plain;h=78a166e98747ddeb79310ccba340f292fa8a6dca

TODO updates

darcs-hash:20070527202457-5007d-b7ffdb37194bc2f610b6badf0a770b60fb8f855f.gz
---

diff --git a/TODO b/TODO
index 230ec19..29ca7f0 100644
--- a/TODO
+++ b/TODO
@@ -1,138 +1,109 @@
 _____________________________________________________________________________
 Immediately
-  - check ability to use epsilon as a conjunct
+
+  - clean up util package
+
+  - serializable parse tables?
+     - all that is required now is to separate Pos and Position
+
+  - currently we GC the doomed stack when the parent dies... but we
+    should also GC the parent when the doomed stack dies if it was a
+    positive conjunct.
+
+  - single-tree-return assumption
+     - have a way to let question marks not be tagged?
+     - "flat" sequences (no subtrees contain "::"s?) -- stringifiable
+     - make it so that we can have multi-result nonterminals
+       so long as they always appear under double-colons?
+       auto-insert the unwrap?
+
+  - get rid of Sequence.Singleton if possible
+
   - use 'a'-'z' or 'a-z' instead of [a-z]?
-  - EOF token?
   - de-genericize?
-  - better toString() methods all around...
   - foo.add(x)
     foo.add(y.andnot(x)) ==> this is broken
       - distinguish Conjunct from Sequence?
             => !(Conjunct instanceof Reducible)
-  - document the assumption that Sequences that match epsilon
-    must have tag, and that ONLY that tag is returned
-    when the sequence matches epsilon
-  - try to avoid building the parts of the tree that end up getting
-    dropped
+  - avoid building the parts of the tree that end up getting dropped
+      - is it worth adding an additional class of states for these?
+         - or perhaps just a runtime node marker (hasNonDroppedParent)
+      - "ambiguity modulo dropped fragments"?
+         - this may conceal highly inefficient grammars...
   - double-check all the region logic
-
-  ..................................................
-
-  - paper/techreport opportunities
-      - interaction between RNGLR and follow restrictions
-      - "doomed node" optimization
-
   - automatically collect time statistics and display
-  - serializable parse tables?
-  - better ambiguity reporting
-      - colorized tree-diffs?
-      - graphviz?
 
 ______________________________________________________________________________
 v1.1
 
+  - MUST HAVE BETTER ERROR MESSAGES
+     - use for developing java15.g
+     - better ambiguity reporting
+        - colorized tree-diffs?
+        - graphviz?
+     - better toString() methods all around...
+
   - Treewalker code compiler?
-  - circular gramars?
-      s = A
-      A = A | "b"
+  - detect and reject circular gramars
   - skeleton generator?
   - precedes restrictions ("<-")
-  - MUST HAVE BETTER ERROR MESSAGES
-     - use for developing java15.g
-  - java15.g
-     - once this is ready, do big announcement
-  - broader regression testing (for stuff like error messages, etc)
   - More topology untangling [later]
   - grammar highlighting?
-  - Forest needs a "manual access" API
-      - the unwrap bit in Forest makes it really hard
-        to expose an API for forests
+  - Forest needs a "manual access" API (lifting makes this hard)
   - rewriting language? multiple passes?
-
-______________________________________________________________________________
-v1.2
-
-  - finalize metagrammar and rdp-op's
   - write some grammars
-      - Java grammar
+      - Java grammar (java15.g)
       - TeX (math?)
-      - URL (RFC)
       - RFC2822 (email message/headers)
-  - clean up the whole Walk situation (?)
+      - Wikipedia grammar (needs to be both lexerless and boolean)
 
 
 ______________________________________________________________________________
-Soon
-
-  - serialization of parse tables
+Features
 
-  - "ambiguity modulo dropped fragments"?
-       - can this be checked statically?
-       - eliminated statically?
-
-  - substring parsing for better error messages
+  - try harder to "fuse" states together along two dimensions:
+     - identical (equivalent) states, or states that subsume each other
+     - unnecessary intermediate states ("short cut" GLR)
 
+  - substring parsing 
+      - better error messages
+      - Rekers & Koorn note that this can do really elegant and generalized "autocompletion".
   - Parameterized LR
   - "Regular Right Part" grammars (NP Chapman, etc)
   - Attribute unification
+      - Partly-Linear-PATR? (O(n^6) unification grammar)
+
+  - optional "prefer whitespace higher up" disambiguation heuristic
 
-  - inference of rejections for literals
-  - "prefer whitespace higher up" (?)
+  - Incremental parse table construction
 
-  - Labeled edges on trees (associate a label with each slot in the
-    child array in Forest.Body?  might make equality tough) --
-    equivalent to Feature Structures.  Colon-labeling.
+  - "lazy GLR" and "lazy trees" -> language with first-class CF matching
+      - perhaps linear boolean grammars instead? (linear time, quad space)
+
+  - Followed-by and not-followed-by predicates of arbitrary length
+      - expands the grammar beyond Boolean LR...
+      - requires *very* smart garbage collection
 
 ______________________________________________________________________________
-Later
+Optimizations
 
   - understand and implement the RNGLR "kernel state" optimization.
     The _Practical Early Parsing_ paper may help.
 
-  - Partly-Linear-PATR? (O(n^6) unification grammar)
+  - implement Johnstone's algorithm for "reduced, resolved LR
+    tables" to eliminate superfluous reductions on
+    epsilon-transitions.
 
   - Implement a k-token peek buffer (for each state, see if it "dead
     ends" during the next k Phases based solely on state -- ignoring
     result SPPF)
 
-  - Arrange for the SPPF corresponding to dropped subtrees to never be
-    generated (or merged, etc)
-
   - Is there any way we can avoid creating a GSS.Node instance for
     nodes which are transient in the sense that they have only one
     eligible reduction?
 
   - Re-read Rekers, particularly the stuff on optimal sharing
 
-  - Isolate the Element objects from Parse.Table/GSS so we can move
-    towards compilation.
-
-  - consider allowing a Forest.Body to represent some other Tree whose
-    Body's should be [recursively] considered part of this Forest.
-
-      - perhaps not: right now we have a nice situation where
-        Forest.Ref instances become immutable once iterator()ed.  This
-        also gives us a strong place to to culling with the certainty
-        that we won't throw out a Body which would later be salvaged
-        by some yet-to-be-added dependency.
-
-  - Figure out if there is a way to:
-
-      - allow unwrapping of children other than the very last one.
-
-      - fold repetitions into an array form in Forest, before
-        conversion to Tree.  The major problem here is that multiple
-        tree-arrays are possible, all of different lengths.  Worse,
-        even if they're all the same length, not all elements belong
-        in the same "possibility vector" as all others.  You
-        essentially need a GSS to represent the array, which perhaps
-        is what the unfolded form was in the first place.
-
-  - Wikipedia grammar (needs to be both lexerless and boolean)
-
-  - Boolean Parsing
-      => Ordered Choice (";" operator)
-
   - bring back in parse-table phase resolution of precedence (just
     like associativity).  This can be inferred from the use of ">"
     when the rules are in one of these special forms:
@@ -155,26 +126,3 @@ Later
     form (try to prove this, you probably can), but it's "what people
     intend" most of the time.
 
-  - implement Johnstone's algorithm for "reduced, resolved LR
-    tables" to eliminate superfluous reductions on
-    epsilon-transitions.
-
-______________________________________________________________________________
-Neat Ideas
-
-  - Rekers & Koorn note that GLR Substring Parsing can be used to do
-    really elegant and generalized "autocompletion".
-
-
-______________________________________________________________________________
-Ideas for the Future
-
-- Incremental parse table construction
-- "lazy GLR" and "lazy trees" -> language with first-class CF matching
-    - perhaps linear boolean grammars instead? (linear time, quad space)
-- Forest parsing => chained parsers
-- unification parsing, attributes, etc
-- RRP grammars?
-- Take another stab at maximal-match?  Nonterminal not-followed-by is
-  too strong.
-- Error recovery based on substring parsing
diff --git a/src/edu/berkeley/sbp/Parser.java b/src/edu/berkeley/sbp/Parser.java
index 0b2229d..99e2d08 100644
--- a/src/edu/berkeley/sbp/Parser.java
+++ b/src/edu/berkeley/sbp/Parser.java
@@ -6,10 +6,6 @@ import edu.berkeley.sbp.Sequence.Position;
 import java.io.*;
 import java.util.*;
 
-// FEATURE: try harder to "fuse" states together along two dimensions:
-//   - identical (equivalent) states, or states that subsume each other
-//   - unnecessary intermediate states ("short cut" GLR)
-
 /** a parser which translates an Input&lt;Token&gt; into a Forest&lt;NodeType&gt; */
 public abstract class Parser<Token, NodeType> {
 
diff --git a/src/edu/berkeley/sbp/meta/GrammarAST.java b/src/edu/berkeley/sbp/meta/GrammarAST.java
index 63c4392..ca236e9 100644
--- a/src/edu/berkeley/sbp/meta/GrammarAST.java
+++ b/src/edu/berkeley/sbp/meta/GrammarAST.java
@@ -10,26 +10,11 @@ import java.lang.annotation.*;
 import java.lang.reflect.*;
 import java.io.*;
 
-// FIXME: deal with question-marks
-
-// FIXME: put back in associativity: ^")"
-//            A = A "->" (A)
-//        means that the FIRST "A" cannot match the SAME sequence in
-//        which the (A) occurs
-//        -- and add a test case
-
-// FIXME: make it so that we can have multi-result nonterminals
-//        so long as they always appear under double-colons?
-//        auto-insert the unwrap?
-
-// FIXME: put associativity back in
-
-// FIXME: "flat" sequences (no subtrees contain "::"s?)
-
-// freezing problem is related to comments in grammar files
-
-/** The java classes typically used to represent a parsed grammar AST; each inner class is a type of AST node. */
-public class GrammarBuilder {
+/**
+ *  The inner classes of this class represent nodes in the Abstract
+ *  Syntax Tree of a grammar.
+ */
+public class GrammarAST {
 
     /**
      *  Create a grammar from a parse tree and binding resolver