minor: missed name change

[sbp.git] / TODO
diff --git a/TODO b/TODO

index 9c1b408..ecaed94 100644 (file)
--- a/TODO
+++ b/TODO
@@ -1,103 +1,119 @@
  _____________________________________________________________________________
  Immediately
  
  _____________________________________________________________________________
  Immediately
  
-  - Lay down the law on the different kinds of Sequence productions
-    and how they work.
-
-     => mydrop
-     => mylift
-
-  - whitespace-in-braces?
-  - Deal with the problem of zero-rep productions and whitespace insertion
-
-  - switch maximal to not-followed-by (~/~)
-
-  - should Union.add() be there?
-  - should Atom.top() be there?
-
-  - fix the location stuff, it's broken
-  - decent/better error messages
+  - keywordification (ie globally reject from all productions?
+  - generalized follow-by?)
+
+  - clean up util package
+
+  - currently we GC the doomed stack when the parent dies... but
+    [ideally] we should also GC the parent when the doomed stack dies
+    if it was a positive conjunct.
+
+  - single-tree-return assumption
+     - have a way to let question marks not be tagged?
+     - "flat" sequences (no subtrees contain "::"s?) -- stringifiable
+     - make it so that we can have multi-result nonterminals
+       so long as they always appear under double-colons?
+       auto-insert the unwrap?
+
+  - get rid of Sequence.Singleton if possible
+
+  - use 'a'-'z' or 'a-z' instead of [a-z]?
+  - de-genericize?
+  - foo.add(x)
+    foo.add(y.andnot(x)) ==> this is broken
+      - distinguish Conjunct from Sequence?
+            => !(Conjunct instanceof Reducible)
+  - avoid building the parts of the tree that end up getting dropped
+      - is it worth adding an additional class of states for these?
+         - or perhaps just a runtime node marker (hasNonDroppedParent)
+      - "ambiguity modulo dropped fragments"?
+         - this may conceal highly inefficient grammars...
+  - double-check all the region logic
+  - automatically collect time statistics and display
  
  
+______________________________________________________________________________
+v1.1
+
+- use regression/least-squares/trend-prof to look for reductions whose
+  behavior is O(n^2)? (ie performed a number of times proportional to
+  the input consumed so far).
+
+- Optimizations:
+    - (x &~ y) => (x & z)
+    - string lookahead
+    - don't form result forests for negated productions
+    - early kills: (a &~ ... b ...) -> once "... b" is seen, "a" is dead
+
+  - MUST HAVE BETTER ERROR MESSAGES
+     - use for developing java15.g
+     - better ambiguity reporting
+        - colorized tree-diffs?
+        - graphviz?
+     - better toString() methods all around...
+
+  - Treewalker code compiler?
+  - detect and reject circular gramars
+  - skeleton generator?
+  - precedes restrictions ("<-")
+  - More topology untangling [later]
+  - grammar highlighting?
+  - Forest needs a "manual access" API (lifting makes this hard)
+  - rewriting language? multiple passes?
    - write some grammars
    - write some grammars
-      - Java grammar
+      - Java grammar (java15.g)
        - TeX (math?)
        - TeX (math?)
-      - URL (RFC)
        - RFC2822 (email message/headers)
        - RFC2822 (email message/headers)
-      - Wiki grammar
-
-______________________________________________________________________________
-Soon
+      - Wikipedia grammar (needs to be both lexerless and boolean)
  
  
-  - clean up the whole Walk situation
  
  
-  - cleaner solution to "maximal"?
+______________________________________________________________________________
+Features
  
  
-  - "lift" cases:
-      - right now I can only lift the last child in a forest...  begs
-        the question of what the right representation for Forests is
-        if we need to be able to do lift operations on it.
+  - try harder to "fuse" states together along two dimensions:
+     - identical (equivalent) states, or states that subsume each other
+     - unnecessary intermediate states ("short cut" GLR)
  
  
+  - substring parsing 
+      - better error messages
+      - Rekers & Koorn note that this can do really elegant and generalized "autocompletion".
    - Parameterized LR
    - "Regular Right Part" grammars (NP Chapman, etc)
    - Attribute unification
    - Parameterized LR
    - "Regular Right Part" grammars (NP Chapman, etc)
    - Attribute unification
+      - Partly-Linear-PATR? (O(n^6) unification grammar)
+
+  - optional "prefer whitespace higher up" disambiguation heuristic
+
+  - Incremental parse table construction
+
+  - "lazy GLR" and "lazy trees" -> language with first-class CF matching
+      - perhaps linear boolean grammars instead? (linear time, quad space)
  
  
-  - serialization of parse tables
-  - inference of rejections for literals
-  - "prefer whitespace higher up" (?)
-  - "ambiguity modulo dropped fragments"?
-       - can this be checked statically?
-       - eliminated statically?
+  - Followed-by and not-followed-by predicates of arbitrary length
+      - expands the grammar beyond Boolean LR...
+      - requires *very* smart garbage collection
  
  ______________________________________________________________________________
  
  ______________________________________________________________________________
-Later
+Optimizations
+
+  - understand and implement the RNGLR "kernel state" optimization.
+    The _Practical Early Parsing_ paper may help.
+
+  - implement Johnstone's algorithm for "reduced, resolved LR
+    tables" to eliminate superfluous reductions on
+    epsilon-transitions.
  
    - Implement a k-token peek buffer (for each state, see if it "dead
      ends" during the next k Phases based solely on state -- ignoring
      result SPPF)
  
  
    - Implement a k-token peek buffer (for each state, see if it "dead
      ends" during the next k Phases based solely on state -- ignoring
      result SPPF)
  
-  - Arrange for the SPPF corresponding to dropped subtrees to never be
-    generated (or merged, etc)
-
    - Is there any way we can avoid creating a GSS.Node instance for
      nodes which are transient in the sense that they have only one
      eligible reduction?
  
    - Is there any way we can avoid creating a GSS.Node instance for
      nodes which are transient in the sense that they have only one
      eligible reduction?
  
-  - Implement "GLR syntactic predicates" -- the ability to do
-    arbitrary lookahead (ie "followed-by" and "not-followed-by" for
-    arbitrary patterns).  This enables generalized longest-match and
-    lets us drop the Maximal hack.
-
    - Re-read Rekers, particularly the stuff on optimal sharing
  
    - Re-read Rekers, particularly the stuff on optimal sharing
  
-  - Isolate the Element objects from Parse.Table/GSS so we can move
-    towards compilation.
-
-  - consider allowing a Forest.Body to represent some other Tree whose
-    Body's should be [recursively] considered part of this Forest.
-
-      - perhaps not: right now we have a nice situation where
-        Forest.Ref instances become immutable once iterator()ed.  This
-        also gives us a strong place to to culling with the certainty
-        that we won't throw out a Body which would later be salvaged
-        by some yet-to-be-added dependency.
-
-  - Figure out if there is a way to:
-
-      - allow unwrapping of children other than the very last one.
-
-      - fold repetitions into an array form in Forest, before
-        conversion to Tree.  The major problem here is that multiple
-        tree-arrays are possible, all of different lengths.  Worse,
-        even if they're all the same length, not all elements belong
-        in the same "possibility vector" as all others.  You
-        essentially need a GSS to represent the array, which perhaps
-        is what the unfolded form was in the first place.
-
-  - Wikipedia grammar (needs to be both lexerless and boolean)
-
-  - Boolean Parsing
-      => Ordered Choice (";" operator)
-
    - bring back in parse-table phase resolution of precedence (just
      like associativity).  This can be inferred from the use of ">"
      when the rules are in one of these special forms:
    - bring back in parse-table phase resolution of precedence (just
      like associativity).  This can be inferred from the use of ">"
      when the rules are in one of these special forms:
@@ -120,6 +136,3 @@ Later
      form (try to prove this, you probably can), but it's "what people
      intend" most of the time.
  
      form (try to prove this, you probably can), but it's "what people
      intend" most of the time.
  
-  - implement Johnstone's algorithm for "reduced, resolved LR
-    tables" to eliminate superfluous reductions on
-    epsilon-transitions.