X-Git-Url: http://git.megacz.com/?p=sbp.git;a=blobdiff_plain;f=TODO;h=8aa79409fc613206d3442286393638a407b3e928;hp=daf3667f5e3c10aa0f602930ae2165abf91a6d49;hb=d27d65cb994c6b9ae57a39f4bec99dd75c6bd6f7;hpb=dc9bb3a45ed306e2e35549076842b3e74efecb48

diff --git a/TODO b/TODO
index daf3667..8aa7940 100644
--- a/TODO
+++ b/TODO
@@ -1,137 +1,111 @@
 _____________________________________________________________________________
 Immediately
+
+  - make sure I'm achieving optimal sharing (zipping heads and tails)
+
+  - clean up util package
+
+  - serializable parse tables?
+     - all that is required now is to separate Pos and Position
+
+  - currently we GC the doomed stack when the parent dies... but we
+    should also GC the parent when the doomed stack dies if it was a
+    positive conjunct.
+
+  - single-tree-return assumption
+     - have a way to let question marks not be tagged?
+     - "flat" sequences (no subtrees contain "::"s?) -- stringifiable
+     - make it so that we can have multi-result nonterminals
+       so long as they always appear under double-colons?
+       auto-insert the unwrap?
+
+  - get rid of Sequence.Singleton if possible
+
   - use 'a'-'z' or 'a-z' instead of [a-z]?
-  - EOF token?
   - de-genericize?
-  - better toString() methods all around...
   - foo.add(x)
     foo.add(y.andnot(x)) ==> this is broken
       - distinguish Conjunct from Sequence?
             => !(Conjunct instanceof Reducible)
-  - document the assumption that Sequences that match epsilon
-    must have tag, and that ONLY that tag is returned
-    when the sequence matches epsilon
-  - try to avoid building the parts of the tree that end up getting
-    dropped
+  - avoid building the parts of the tree that end up getting dropped
+      - is it worth adding an additional class of states for these?
+         - or perhaps just a runtime node marker (hasNonDroppedParent)
+      - "ambiguity modulo dropped fragments"?
+         - this may conceal highly inefficient grammars...
   - double-check all the region logic
-
-  ..................................................
-
-  - paper/techreport opportunities
-      - interaction between RNGLR and follow restrictions
-      - "doomed node" optimization
-
   - automatically collect time statistics and display
-  - serializable parse tables?
-  - better ambiguity reporting
-      - colorized tree-diffs?
-      - graphviz?
 
 ______________________________________________________________________________
 v1.1
 
+  - MUST HAVE BETTER ERROR MESSAGES
+     - use for developing java15.g
+     - better ambiguity reporting
+        - colorized tree-diffs?
+        - graphviz?
+     - better toString() methods all around...
+
   - Treewalker code compiler?
-  - circular gramars?
-      s = A
-      A = A | "b"
+  - detect and reject circular gramars
   - skeleton generator?
   - precedes restrictions ("<-")
-  - MUST HAVE BETTER ERROR MESSAGES
-     - use for developing java15.g
-  - java15.g
-     - once this is ready, do big announcement
-  - broader regression testing (for stuff like error messages, etc)
   - More topology untangling [later]
   - grammar highlighting?
-  - Forest needs a "manual access" API
-      - the unwrap bit in Forest makes it really hard
-        to expose an API for forests
+  - Forest needs a "manual access" API (lifting makes this hard)
   - rewriting language? multiple passes?
-
-______________________________________________________________________________
-v1.2
-
-  - finalize metagrammar and rdp-op's
   - write some grammars
-      - Java grammar
+      - Java grammar (java15.g)
       - TeX (math?)
-      - URL (RFC)
       - RFC2822 (email message/headers)
-  - clean up the whole Walk situation (?)
+      - Wikipedia grammar (needs to be both lexerless and boolean)
 
 
 ______________________________________________________________________________
-Soon
-
-  - serialization of parse tables
+Features
 
-  - "ambiguity modulo dropped fragments"?
-       - can this be checked statically?
-       - eliminated statically?
-
-  - substring parsing for better error messages
+  - try harder to "fuse" states together along two dimensions:
+     - identical (equivalent) states, or states that subsume each other
+     - unnecessary intermediate states ("short cut" GLR)
 
+  - substring parsing 
+      - better error messages
+      - Rekers & Koorn note that this can do really elegant and generalized "autocompletion".
   - Parameterized LR
   - "Regular Right Part" grammars (NP Chapman, etc)
   - Attribute unification
+      - Partly-Linear-PATR? (O(n^6) unification grammar)
+
+  - optional "prefer whitespace higher up" disambiguation heuristic
 
-  - inference of rejections for literals
-  - "prefer whitespace higher up" (?)
+  - Incremental parse table construction
 
-  - Labeled edges on trees (associate a label with each slot in the
-    child array in Forest.Body?  might make equality tough) --
-    equivalent to Feature Structures.  Colon-labeling.
+  - "lazy GLR" and "lazy trees" -> language with first-class CF matching
+      - perhaps linear boolean grammars instead? (linear time, quad space)
+
+  - Followed-by and not-followed-by predicates of arbitrary length
+      - expands the grammar beyond Boolean LR...
+      - requires *very* smart garbage collection
 
 ______________________________________________________________________________
-Later
+Optimizations
 
   - understand and implement the RNGLR "kernel state" optimization.
     The _Practical Early Parsing_ paper may help.
 
-  - Partly-Linear-PATR? (O(n^6) unification grammar)
+  - implement Johnstone's algorithm for "reduced, resolved LR
+    tables" to eliminate superfluous reductions on
+    epsilon-transitions.
 
   - Implement a k-token peek buffer (for each state, see if it "dead
     ends" during the next k Phases based solely on state -- ignoring
     result SPPF)
 
-  - Arrange for the SPPF corresponding to dropped subtrees to never be
-    generated (or merged, etc)
-
   - Is there any way we can avoid creating a GSS.Node instance for
     nodes which are transient in the sense that they have only one
     eligible reduction?
 
   - Re-read Rekers, particularly the stuff on optimal sharing
 
-  - Isolate the Element objects from Parse.Table/GSS so we can move
-    towards compilation.
-
-  - consider allowing a Forest.Body to represent some other Tree whose
-    Body's should be [recursively] considered part of this Forest.
-
-      - perhaps not: right now we have a nice situation where
-        Forest.Ref instances become immutable once iterator()ed.  This
-        also gives us a strong place to to culling with the certainty
-        that we won't throw out a Body which would later be salvaged
-        by some yet-to-be-added dependency.
-
-  - Figure out if there is a way to:
-
-      - allow unwrapping of children other than the very last one.
-
-      - fold repetitions into an array form in Forest, before
-        conversion to Tree.  The major problem here is that multiple
-        tree-arrays are possible, all of different lengths.  Worse,
-        even if they're all the same length, not all elements belong
-        in the same "possibility vector" as all others.  You
-        essentially need a GSS to represent the array, which perhaps
-        is what the unfolded form was in the first place.
-
-  - Wikipedia grammar (needs to be both lexerless and boolean)
-
-  - Boolean Parsing
-      => Ordered Choice (";" operator)
-
   - bring back in parse-table phase resolution of precedence (just
     like associativity).  This can be inferred from the use of ">"
     when the rules are in one of these special forms:
@@ -154,26 +128,3 @@ Later
     form (try to prove this, you probably can), but it's "what people
     intend" most of the time.
 
-  - implement Johnstone's algorithm for "reduced, resolved LR
-    tables" to eliminate superfluous reductions on
-    epsilon-transitions.
-
-______________________________________________________________________________
-Neat Ideas
-
-  - Rekers & Koorn note that GLR Substring Parsing can be used to do
-    really elegant and generalized "autocompletion".
-
-
-______________________________________________________________________________
-Ideas for the Future
-
-- Incremental parse table construction
-- "lazy GLR" and "lazy trees" -> language with first-class CF matching
-    - perhaps linear boolean grammars instead? (linear time, quad space)
-- Forest parsing => chained parsers
-- unification parsing, attributes, etc
-- RRP grammars?
-- Take another stab at maximal-match?  Nonterminal not-followed-by is
-  too strong.
-- Error recovery based on substring parsing