Conversation

What is the most powerful deterministic subset of CFGs that we know how to detect? How do you deal with syntax like pat => expr or pat := expr, where pat can be a destructuring pattern, and has some overlap with expressions? (e.g. (a,b) is both an expr and a pat)
3
I'm not sure what deterministic means in this context: do you mean unambiguous? If so, LR parsers are the largest statically provable unambiguous subset of CFGs. [Note: the LR parsing static type system is sound but not complete. There are unambiguous grammars that are not LR.]
1
1
That depends heavily on the specifics of the grammar, so there's no guaranteed answer, but my gut instinct is that there's nothing obviously special about that syntax and so I would expect to be able to encode it as an LR grammar. Warning: I might be wrong.
1
1
The reason I think it may not be is that patterns and exprs can be nested ((a,b),b), so at the (a,b) we don't know whether we need to call the semantic action for the pattern version of (a,b) or the expression version of it, we only know that at =>?
1
One workaround may be to merge the productions for pat and expr, and then in a post pass walk over the AST and see if everything that's supposed to be an expr is an expr and everything that's supposed to be a pat is a pat, so as to reject things like (a,b+c) => ...
1
Knuth actually discusses precisely this problem as the very last paragraph on his original paper on LR parsing. To my knowledge, his proposed generalization has never been explored, but I could easily be wrong about this. (I simply haven’t ever seen it discussed.)
Image
1
3
Something I would find interesting would be a generalization of LR(1) that can handle these sorts of ambiguities without full backtracking by delaying the evaluation of semantic actions until the ambiguity is resolved, constructing a concrete “token tree” to be interpreted later.
1
2
That is, record the parsing work done so far so you never have to rewind your position in the stream and examine the same token more than once. Instead, construct a generic CST that can be “interpreted” against the appropriate production rules once the ambiguity is resolved.
1
2
It’s true that you can get a similar result to what I’m describing using GLR, but GLR parsers provide no guarantee that the ambiguity will eventually be resolved—they may produce multiple results. I had in mind something that still statically guarantees an unambiguous parse.
1
1
Ahh, cool! Thanks for clarifying! Yeah, having nice static analysis for eventual ambiguities would be really nice - it's part of why I prefer LR generators, despite the error issues. (LALRPOP tends to reasonably good on the error front at least)