_____________________________________________________________________________
Immediately
- - Performance
-
- - Forest: keep() and valid() -- can we do this with states
- rather than subtrees?
-
- - hash Long->long: it's all bogus
-
- * pick back up cleaning up end of Parser.java (Reduction)
-
- - [more] sensible tree-printout
-
- - revamp Tib.Block (do it all in the parser using indent/dedent?)
-
- - more natural phrasing of metagrammar?
- - finalize metagrammar and rdp-op's
-
- - should Union.add() be there?
- - should Atom.top() be there?
-
- - decent/better error messages
- - fix the location stuff, it's broken
+ - clean up util package
+
+ - currently we GC the doomed stack when the parent dies... but
+ [ideally] we should also GC the parent when the doomed stack dies
+ if it was a positive conjunct.
+
+ - single-tree-return assumption
+ - have a way to let question marks not be tagged?
+ - "flat" sequences (no subtrees contain "::"s?) -- stringifiable
+ - make it so that we can have multi-result nonterminals
+ so long as they always appear under double-colons?
+ auto-insert the unwrap?
+
+ - get rid of Sequence.Singleton if possible
+
+ - use 'a'-'z' or 'a-z' instead of [a-z]?
+ - de-genericize?
+ - foo.add(x)
+ foo.add(y.andnot(x)) ==> this is broken
+ - distinguish Conjunct from Sequence?
+ => !(Conjunct instanceof Reducible)
+ - avoid building the parts of the tree that end up getting dropped
+ - is it worth adding an additional class of states for these?
+ - or perhaps just a runtime node marker (hasNonDroppedParent)
+ - "ambiguity modulo dropped fragments"?
+ - this may conceal highly inefficient grammars...
+ - double-check all the region logic
+ - automatically collect time statistics and display
+______________________________________________________________________________
+v1.1
+
+- use regression/least-squares/trend-prof to look for reductions whose
+ behavior is O(n^2)? (ie performed a number of times proportional to
+ the input consumed so far).
+
+- Optimizations:
+ - (x &~ y) => (x & z)
+ - string lookahead
+ - don't form result forests for negated productions
+ - early kills: (a &~ ... b ...) -> once "... b" is seen, "a" is dead
+
+ - MUST HAVE BETTER ERROR MESSAGES
+ - use for developing java15.g
+ - better ambiguity reporting
+ - colorized tree-diffs?
+ - graphviz?
+ - better toString() methods all around...
+
+ - Treewalker code compiler?
+ - detect and reject circular gramars
+ - skeleton generator?
+ - precedes restrictions ("<-")
+ - More topology untangling [later]
+ - grammar highlighting?
+ - Forest needs a "manual access" API (lifting makes this hard)
+ - rewriting language? multiple passes?
- write some grammars
- - Java grammar
+ - Java grammar (java15.g)
- TeX (math?)
- - URL (RFC)
- RFC2822 (email message/headers)
+ - Wikipedia grammar (needs to be both lexerless and boolean)
______________________________________________________________________________
-Soon
+Features
- - substring parsing for better error messages
-
- - clean up the whole Walk situation
-
- - "lift" cases:
- - right now I can only lift the last child in a forest... begs
- the question of what the right representation for Forests is
- if we need to be able to do lift operations on it.
+ - try harder to "fuse" states together along two dimensions:
+ - identical (equivalent) states, or states that subsume each other
+ - unnecessary intermediate states ("short cut" GLR)
+ - substring parsing
+ - better error messages
+ - Rekers & Koorn note that this can do really elegant and generalized "autocompletion".
- Parameterized LR
- "Regular Right Part" grammars (NP Chapman, etc)
- Attribute unification
+ - Partly-Linear-PATR? (O(n^6) unification grammar)
+
+ - optional "prefer whitespace higher up" disambiguation heuristic
+
+ - Incremental parse table construction
+
+ - "lazy GLR" and "lazy trees" -> language with first-class CF matching
+ - perhaps linear boolean grammars instead? (linear time, quad space)
- - serialization of parse tables
- - inference of rejections for literals
- - "prefer whitespace higher up" (?)
- - "ambiguity modulo dropped fragments"?
- - can this be checked statically?
- - eliminated statically?
+ - Followed-by and not-followed-by predicates of arbitrary length
+ - expands the grammar beyond Boolean LR...
+ - requires *very* smart garbage collection
______________________________________________________________________________
-Later
+Optimizations
- - Partly-Linear-PATR? (O(n^6) unification grammar)
+ - understand and implement the RNGLR "kernel state" optimization.
+ The _Practical Early Parsing_ paper may help.
+
+ - implement Johnstone's algorithm for "reduced, resolved LR
+ tables" to eliminate superfluous reductions on
+ epsilon-transitions.
- Implement a k-token peek buffer (for each state, see if it "dead
ends" during the next k Phases based solely on state -- ignoring
result SPPF)
- - Arrange for the SPPF corresponding to dropped subtrees to never be
- generated (or merged, etc)
-
- Is there any way we can avoid creating a GSS.Node instance for
nodes which are transient in the sense that they have only one
eligible reduction?
- Re-read Rekers, particularly the stuff on optimal sharing
- - Isolate the Element objects from Parse.Table/GSS so we can move
- towards compilation.
-
- - consider allowing a Forest.Body to represent some other Tree whose
- Body's should be [recursively] considered part of this Forest.
-
- - perhaps not: right now we have a nice situation where
- Forest.Ref instances become immutable once iterator()ed. This
- also gives us a strong place to to culling with the certainty
- that we won't throw out a Body which would later be salvaged
- by some yet-to-be-added dependency.
-
- - Figure out if there is a way to:
-
- - allow unwrapping of children other than the very last one.
-
- - fold repetitions into an array form in Forest, before
- conversion to Tree. The major problem here is that multiple
- tree-arrays are possible, all of different lengths. Worse,
- even if they're all the same length, not all elements belong
- in the same "possibility vector" as all others. You
- essentially need a GSS to represent the array, which perhaps
- is what the unfolded form was in the first place.
-
- - Wikipedia grammar (needs to be both lexerless and boolean)
-
- - Boolean Parsing
- => Ordered Choice (";" operator)
-
- bring back in parse-table phase resolution of precedence (just
like associativity). This can be inferred from the use of ">"
when the rules are in one of these special forms:
form (try to prove this, you probably can), but it's "what people
intend" most of the time.
- - implement Johnstone's algorithm for "reduced, resolved LR
- tables" to eliminate superfluous reductions on
- epsilon-transitions.
-
-______________________________________________________________________________
-Neat Ideas
-
- - Rekers & Koorn note that GLR Substring Parsing can be used to do
- really elegant and generalized "autocompletion".
-
-
-______________________________________________________________________________
-Ideas for the Future
-
-- Incremental parse table construction
-- "lazy GLR" and "lazy trees" -> language with first-class CF matching
- - perhaps linear boolean grammars instead? (linear time, quad space)
-- Forest parsing => chained parsers
-- unification parsing, attributes, etc
-- RRP grammars?
-- Take another stab at maximal-match? Nonterminal not-followed-by is
- too strong.
-- Error recovery based on substring parsing