X-Git-Url: http://git.megacz.com/?p=sbp.git;a=blobdiff_plain;f=TODO;h=f8746d0b5f6ab9e0c92924920000dfbfbf798943;hp=1d628025dfb76a84c08b695ad58aacc26f7c6650;hb=9680415a3e36240dcac947b3d4d1ad6b312c4fce;hpb=d684010debf9069af108981d9451932c15b1ec75

diff --git a/TODO b/TODO
index 1d62802..f8746d0 100644
--- a/TODO
+++ b/TODO
@@ -1,95 +1,116 @@
 _____________________________________________________________________________
 Immediately
 
-  - decent/better error messages
-      - fix the location stuff, it's broken
-
-  - copyright notices
-
-  - documentation
+  - clean up util package
+
+  - currently we GC the doomed stack when the parent dies... but
+    [ideally] we should also GC the parent when the doomed stack dies
+    if it was a positive conjunct.
+
+  - single-tree-return assumption
+     - have a way to let question marks not be tagged?
+     - "flat" sequences (no subtrees contain "::"s?) -- stringifiable
+     - make it so that we can have multi-result nonterminals
+       so long as they always appear under double-colons?
+       auto-insert the unwrap?
+
+  - get rid of Sequence.Singleton if possible
+
+  - use 'a'-'z' or 'a-z' instead of [a-z]?
+  - de-genericize?
+  - foo.add(x)
+    foo.add(y.andnot(x)) ==> this is broken
+      - distinguish Conjunct from Sequence?
+            => !(Conjunct instanceof Reducible)
+  - avoid building the parts of the tree that end up getting dropped
+      - is it worth adding an additional class of states for these?
+         - or perhaps just a runtime node marker (hasNonDroppedParent)
+      - "ambiguity modulo dropped fragments"?
+         - this may conceal highly inefficient grammars...
+  - double-check all the region logic
+  - automatically collect time statistics and display
 
 ______________________________________________________________________________
 v1.1
 
-  - finalize metagrammar and rdp-op's
+- use regression/least-squares/trend-prof to look for reductions whose
+  behavior is O(n^2)? (ie performed a number of times proportional to
+  the input consumed so far).
+
+- Optimizations:
+    - (x &~ y) => (x & z)
+    - string lookahead
+    - don't form result forests for negated productions
+    - early kills: (a &~ ... b ...) -> once "... b" is seen, "a" is dead
+
+  - MUST HAVE BETTER ERROR MESSAGES
+     - use for developing java15.g
+     - better ambiguity reporting
+        - colorized tree-diffs?
+        - graphviz?
+     - better toString() methods all around...
+
+  - Treewalker code compiler?
+  - detect and reject circular gramars
+  - skeleton generator?
+  - precedes restrictions ("<-")
+  - More topology untangling [later]
+  - grammar highlighting?
+  - Forest needs a "manual access" API (lifting makes this hard)
+  - rewriting language? multiple passes?
   - write some grammars
-      - Java grammar
+      - Java grammar (java15.g)
       - TeX (math?)
-      - URL (RFC)
       - RFC2822 (email message/headers)
-  - clean up the whole Walk situation (?)
+      - Wikipedia grammar (needs to be both lexerless and boolean)
 
 
 ______________________________________________________________________________
-Soon
-
-  - serialization of parse tables
-
-  - "ambiguity modulo dropped fragments"?
-       - can this be checked statically?
-       - eliminated statically?
+Features
 
-  - substring parsing for better error messages
-  - "lift" cases:
-      - right now I can only lift the last child in a forest...  begs
-        the question of what the right representation for Forests is
-        if we need to be able to do lift operations on it.
+  - try harder to "fuse" states together along two dimensions:
+     - identical (equivalent) states, or states that subsume each other
+     - unnecessary intermediate states ("short cut" GLR)
 
+  - substring parsing 
+      - better error messages
+      - Rekers & Koorn note that this can do really elegant and generalized "autocompletion".
   - Parameterized LR
   - "Regular Right Part" grammars (NP Chapman, etc)
   - Attribute unification
+      - Partly-Linear-PATR? (O(n^6) unification grammar)
 
-  - inference of rejections for literals
-  - "prefer whitespace higher up" (?)
+  - optional "prefer whitespace higher up" disambiguation heuristic
 
+  - Incremental parse table construction
+
+  - "lazy GLR" and "lazy trees" -> language with first-class CF matching
+      - perhaps linear boolean grammars instead? (linear time, quad space)
+
+  - Followed-by and not-followed-by predicates of arbitrary length
+      - expands the grammar beyond Boolean LR...
+      - requires *very* smart garbage collection
 
 ______________________________________________________________________________
-Later
+Optimizations
+
+  - understand and implement the RNGLR "kernel state" optimization.
+    The _Practical Early Parsing_ paper may help.
 
-  - Partly-Linear-PATR? (O(n^6) unification grammar)
+  - implement Johnstone's algorithm for "reduced, resolved LR
+    tables" to eliminate superfluous reductions on
+    epsilon-transitions.
 
   - Implement a k-token peek buffer (for each state, see if it "dead
     ends" during the next k Phases based solely on state -- ignoring
     result SPPF)
 
-  - Arrange for the SPPF corresponding to dropped subtrees to never be
-    generated (or merged, etc)
-
   - Is there any way we can avoid creating a GSS.Node instance for
     nodes which are transient in the sense that they have only one
     eligible reduction?
 
   - Re-read Rekers, particularly the stuff on optimal sharing
 
-  - Isolate the Element objects from Parse.Table/GSS so we can move
-    towards compilation.
-
-  - consider allowing a Forest.Body to represent some other Tree whose
-    Body's should be [recursively] considered part of this Forest.
-
-      - perhaps not: right now we have a nice situation where
-        Forest.Ref instances become immutable once iterator()ed.  This
-        also gives us a strong place to to culling with the certainty
-        that we won't throw out a Body which would later be salvaged
-        by some yet-to-be-added dependency.
-
-  - Figure out if there is a way to:
-
-      - allow unwrapping of children other than the very last one.
-
-      - fold repetitions into an array form in Forest, before
-        conversion to Tree.  The major problem here is that multiple
-        tree-arrays are possible, all of different lengths.  Worse,
-        even if they're all the same length, not all elements belong
-        in the same "possibility vector" as all others.  You
-        essentially need a GSS to represent the array, which perhaps
-        is what the unfolded form was in the first place.
-
-  - Wikipedia grammar (needs to be both lexerless and boolean)
-
-  - Boolean Parsing
-      => Ordered Choice (";" operator)
-
   - bring back in parse-table phase resolution of precedence (just
     like associativity).  This can be inferred from the use of ">"
     when the rules are in one of these special forms:
@@ -112,26 +133,3 @@ Later
     form (try to prove this, you probably can), but it's "what people
     intend" most of the time.
 
-  - implement Johnstone's algorithm for "reduced, resolved LR
-    tables" to eliminate superfluous reductions on
-    epsilon-transitions.
-
-______________________________________________________________________________
-Neat Ideas
-
-  - Rekers & Koorn note that GLR Substring Parsing can be used to do
-    really elegant and generalized "autocompletion".
-
-
-______________________________________________________________________________
-Ideas for the Future
-
-- Incremental parse table construction
-- "lazy GLR" and "lazy trees" -> language with first-class CF matching
-    - perhaps linear boolean grammars instead? (linear time, quad space)
-- Forest parsing => chained parsers
-- unification parsing, attributes, etc
-- RRP grammars?
-- Take another stab at maximal-match?  Nonterminal not-followed-by is
-  too strong.
-- Error recovery based on substring parsing