add profile target to Makefile (for Adam)

[sbp.git] / TODO
diff --git a/TODO b/TODO

index c7ad150..f8746d0 100644 (file)
--- a/TODO
+++ b/TODO
@@ -1,105 +1,116 @@
  _____________________________________________________________________________
  Immediately
  
-  - Performance
-
-     - Forest: keep() and valid() -- can we do this with states
-       rather than subtrees?
-
-     - hash Long->long: it's all bogus
-
-  * pick back up cleaning up end of Parser.java (Reduction)
-
-  - [more] sensible tree-printout
-
-  - revamp Tib.Block (do it all in the parser using indent/dedent?)
-
-  - more natural phrasing of metagrammar?
-  - finalize metagrammar and rdp-op's
-
-  - should Union.add() be there?
-  - should Atom.top() be there?
-
-  - decent/better error messages
-      - fix the location stuff, it's broken
+  - clean up util package
+
+  - currently we GC the doomed stack when the parent dies... but
+    [ideally] we should also GC the parent when the doomed stack dies
+    if it was a positive conjunct.
+
+  - single-tree-return assumption
+     - have a way to let question marks not be tagged?
+     - "flat" sequences (no subtrees contain "::"s?) -- stringifiable
+     - make it so that we can have multi-result nonterminals
+       so long as they always appear under double-colons?
+       auto-insert the unwrap?
+
+  - get rid of Sequence.Singleton if possible
+
+  - use 'a'-'z' or 'a-z' instead of [a-z]?
+  - de-genericize?
+  - foo.add(x)
+    foo.add(y.andnot(x)) ==> this is broken
+      - distinguish Conjunct from Sequence?
+            => !(Conjunct instanceof Reducible)
+  - avoid building the parts of the tree that end up getting dropped
+      - is it worth adding an additional class of states for these?
+         - or perhaps just a runtime node marker (hasNonDroppedParent)
+      - "ambiguity modulo dropped fragments"?
+         - this may conceal highly inefficient grammars...
+  - double-check all the region logic
+  - automatically collect time statistics and display
  
+______________________________________________________________________________
+v1.1
+
+- use regression/least-squares/trend-prof to look for reductions whose
+  behavior is O(n^2)? (ie performed a number of times proportional to
+  the input consumed so far).
+
+- Optimizations:
+    - (x &~ y) => (x & z)
+    - string lookahead
+    - don't form result forests for negated productions
+    - early kills: (a &~ ... b ...) -> once "... b" is seen, "a" is dead
+
+  - MUST HAVE BETTER ERROR MESSAGES
+     - use for developing java15.g
+     - better ambiguity reporting
+        - colorized tree-diffs?
+        - graphviz?
+     - better toString() methods all around...
+
+  - Treewalker code compiler?
+  - detect and reject circular gramars
+  - skeleton generator?
+  - precedes restrictions ("<-")
+  - More topology untangling [later]
+  - grammar highlighting?
+  - Forest needs a "manual access" API (lifting makes this hard)
+  - rewriting language? multiple passes?
    - write some grammars
-      - Java grammar
+      - Java grammar (java15.g)
        - TeX (math?)
-      - URL (RFC)
        - RFC2822 (email message/headers)
+      - Wikipedia grammar (needs to be both lexerless and boolean)
  
  
  ______________________________________________________________________________
-Soon
+Features
  
-  - substring parsing for better error messages
-
-  - clean up the whole Walk situation
-
-  - "lift" cases:
-      - right now I can only lift the last child in a forest...  begs
-        the question of what the right representation for Forests is
-        if we need to be able to do lift operations on it.
+  - try harder to "fuse" states together along two dimensions:
+     - identical (equivalent) states, or states that subsume each other
+     - unnecessary intermediate states ("short cut" GLR)
  
+  - substring parsing 
+      - better error messages
+      - Rekers & Koorn note that this can do really elegant and generalized "autocompletion".
    - Parameterized LR
    - "Regular Right Part" grammars (NP Chapman, etc)
    - Attribute unification
+      - Partly-Linear-PATR? (O(n^6) unification grammar)
+
+  - optional "prefer whitespace higher up" disambiguation heuristic
+
+  - Incremental parse table construction
+
+  - "lazy GLR" and "lazy trees" -> language with first-class CF matching
+      - perhaps linear boolean grammars instead? (linear time, quad space)
  
-  - serialization of parse tables
-  - inference of rejections for literals
-  - "prefer whitespace higher up" (?)
-  - "ambiguity modulo dropped fragments"?
-       - can this be checked statically?
-       - eliminated statically?
+  - Followed-by and not-followed-by predicates of arbitrary length
+      - expands the grammar beyond Boolean LR...
+      - requires *very* smart garbage collection
  
  ______________________________________________________________________________
-Later
+Optimizations
  
-  - Partly-Linear-PATR? (O(n^6) unification grammar)
+  - understand and implement the RNGLR "kernel state" optimization.
+    The _Practical Early Parsing_ paper may help.
+
+  - implement Johnstone's algorithm for "reduced, resolved LR
+    tables" to eliminate superfluous reductions on
+    epsilon-transitions.
  
    - Implement a k-token peek buffer (for each state, see if it "dead
      ends" during the next k Phases based solely on state -- ignoring
      result SPPF)
  
-  - Arrange for the SPPF corresponding to dropped subtrees to never be
-    generated (or merged, etc)
-
    - Is there any way we can avoid creating a GSS.Node instance for
      nodes which are transient in the sense that they have only one
      eligible reduction?
  
    - Re-read Rekers, particularly the stuff on optimal sharing
  
-  - Isolate the Element objects from Parse.Table/GSS so we can move
-    towards compilation.
-
-  - consider allowing a Forest.Body to represent some other Tree whose
-    Body's should be [recursively] considered part of this Forest.
-
-      - perhaps not: right now we have a nice situation where
-        Forest.Ref instances become immutable once iterator()ed.  This
-        also gives us a strong place to to culling with the certainty
-        that we won't throw out a Body which would later be salvaged
-        by some yet-to-be-added dependency.
-
-  - Figure out if there is a way to:
-
-      - allow unwrapping of children other than the very last one.
-
-      - fold repetitions into an array form in Forest, before
-        conversion to Tree.  The major problem here is that multiple
-        tree-arrays are possible, all of different lengths.  Worse,
-        even if they're all the same length, not all elements belong
-        in the same "possibility vector" as all others.  You
-        essentially need a GSS to represent the array, which perhaps
-        is what the unfolded form was in the first place.
-
-  - Wikipedia grammar (needs to be both lexerless and boolean)
-
-  - Boolean Parsing
-      => Ordered Choice (";" operator)
-
    - bring back in parse-table phase resolution of precedence (just
      like associativity).  This can be inferred from the use of ">"
      when the rules are in one of these special forms:
@@ -122,26 +133,3 @@ Later
      form (try to prove this, you probably can), but it's "what people
      intend" most of the time.
  
-  - implement Johnstone's algorithm for "reduced, resolved LR
-    tables" to eliminate superfluous reductions on
-    epsilon-transitions.
-
-______________________________________________________________________________
-Neat Ideas
-
-  - Rekers & Koorn note that GLR Substring Parsing can be used to do
-    really elegant and generalized "autocompletion".
-
-
-______________________________________________________________________________
-Ideas for the Future
-
-- Incremental parse table construction
-- "lazy GLR" and "lazy trees" -> language with first-class CF matching
-    - perhaps linear boolean grammars instead? (linear time, quad space)
-- Forest parsing => chained parsers
-- unification parsing, attributes, etc
-- RRP grammars?
-- Take another stab at maximal-match?  Nonterminal not-followed-by is
-  too strong.
-- Error recovery based on substring parsing