What is the most powerful deterministic subset of CFGs that we know how to detect? How do you deal with syntax like pat => expr or pat := expr, where pat can be a destructuring pattern, and has some overlap with expressions? (e.g. (a,b) is both an expr and a pat)
Conversation
I'm not sure what deterministic means in this context: do you mean unambiguous? If so, LR parsers are the largest statically provable unambiguous subset of CFGs. [Note: the LR parsing static type system is sound but not complete. There are unambiguous grammars that are not LR.]
1
1
Am I right thinking that the pat => expr go beyond LR, but is still unambiguous?
1
That depends heavily on the specifics of the grammar, so there's no guaranteed answer, but my gut instinct is that there's nothing obviously special about that syntax and so I would expect to be able to encode it as an LR grammar. Warning: I might be wrong.
1
1
The reason I think it may not be is that patterns and exprs can be nested ((a,b),b), so at the (a,b) we don't know whether we need to call the semantic action for the pattern version of (a,b) or the expression version of it, we only know that at =>?
1
One workaround may be to merge the productions for pat and expr, and then in a post pass walk over the AST and see if everything that's supposed to be an expr is an expr and everything that's supposed to be a pat is a pat, so as to reject things like (a,b+c) => ...
1
I tentatively believe is wrong about this being LR and you are correct; I have a blog post about exactly this issue, rntz.net/post/2018-07-1. IIUC Haskell uses the merge-pattern-and-expression approach you suggest.
3
2
Knuth actually discusses precisely this problem as the very last paragraph on his original paper on LR parsing. To my knowledge, his proposed generalization has never been explored, but I could easily be wrong about this. (I simply haven’t ever seen it discussed.)
1
3
Something I would find interesting would be a generalization of LR(1) that can handle these sorts of ambiguities without full backtracking by delaying the evaluation of semantic actions until the ambiguity is resolved, constructing a concrete “token tree” to be interpreted later.
1
2
That is, record the parsing work done so far so you never have to rewind your position in the stream and examine the same token more than once. Instead, construct a generic CST that can be “interpreted” against the appropriate production rules once the ambiguity is resolved.
1
2
How does GLR compare to this generalisation of LR? At least from what I heard in a talk on tree-sitter it sounds somewhat similar? I'm no expert on parsing though, so could be mistaken!
You can indeed do that with GLR by building a parse forest, and only after parsing interpret this parse forest by calling the semantic actions, or showing an error messages if there are unresolved ambiguities in the parse forest.
1
2
Spoofax uses this approach of building a parse forest with extra nodes indicating where ambiguity is not resolved, that you can then interpret in any way you like: metaborg.org/en/latest/
2
It’s true that you can get a similar result to what I’m describing using GLR, but GLR parsers provide no guarantee that the ambiguity will eventually be resolved—they may produce multiple results. I had in mind something that still statically guarantees an unambiguous parse.
1
1
Ahh, cool! Thanks for clarifying! Yeah, having nice static analysis for eventual ambiguities would be really nice - it's part of why I prefer LR generators, despite the error issues. (LALRPOP tends to reasonably good on the error front at least)




