But it was in fact quite tricky, and the whole process was a big eye-opener for me into how English/Latin-focused all of programming is.
So.... a thread of the things I needed to do to get this working.
Step 1: Adapting the grammar. We use the Lark parser, which has built-in support for variable names. I simply inherited that for Hedy, so we used to have this:
import common.CNAME -> NAME
Clean and easy! But this only supports Latin, being defined as something like a-zA-Z_.
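To make the limitation concrete, here is a minimal sketch in plain Python. It assumes the built-in rule behaves like an ASCII-only pattern (a letter or underscore, followed by letters, digits or underscores); the name `is_ascii_name` is made up for illustration and is not part of Lark or Hedy.

```python
import re

# Assumption: an ASCII-only approximation of the built-in name rule.
ASCII_NAME = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def is_ascii_name(text: str) -> bool:
    """Return True if the whole string matches the ASCII-only pattern."""
    return ASCII_NAME.fullmatch(text) is not None


# Only the Latin-script name gets through.
for candidate in ["naam", "имя", "اسم", "名前"]:
    print(candidate, is_ascii_name(candidate))
```

Every non-Latin name is rejected, which is exactly the problem for learners who want to name variables in their own language.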
I did a lot of thinking about how to extend this to more character sets. Clearly, we don't want to specify which characters we want, but which ones we don't want (no spaces, not starting with a number). This has been called negative parsing: twitter.com/Felienne/statu
It was hard to get right!
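A rough sketch of the negative idea, and of why it is hard to get right. The pattern and the name `is_name_negative` are my own illustration, not the rule that ended up in Hedy: it only says that a name must not start with a digit and must not contain whitespace.

```python
import re

# Assumption: "negative" rule = no leading digit, no whitespace anywhere.
NEGATIVE_NAME = re.compile(r"[^\s\d][^\s]*")


def is_name_negative(text: str) -> bool:
    """Accept anything that is not explicitly forbidden."""
    return NEGATIVE_NAME.fullmatch(text) is not None


print(is_name_negative("имя"))   # non-Latin letters are fine
print(is_name_negative("1abc"))  # leading digit is forbidden
print(is_name_negative("a b"))   # whitespace is forbidden
print(is_name_negative("a+b"))   # accepted, although "+" should be an operator
```

The last case shows the catch: everything you forget to exclude (operators, quotes, brackets) silently becomes part of a name, so the list of exclusions keeps growing.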
To deal with all sorts of variable names @ra uses what he calls "negative parsing" in which, rather than describing what an identifier looks like, he describes what it cannot look like, allowing for a diverse set of alphabets
Ultimately, I stole Python's definition:
NAME: ID_START ID_CONTINUE*
ID_START: /[\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}_]+/
ID_CONTINUE: ID_START | /[\p{Mn}\p{Mc}\p{Nd}\p{Pc}·]+/
So now I have to add this to my code base and I will admit I do not fully understand this code!
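For anyone else puzzling over the two-letter codes: they are Unicode general categories (Lu/Ll/Lt/Lm/Lo are kinds of letters, Nl is letter numbers, Mn/Mc are combining marks, Nd is decimal digits, Pc is connector punctuation such as the underscore). Below is a small sketch of the same rule using Python's standard `unicodedata` module. The helper names are made up for illustration, and it is a simplification: it checks one character at a time and skips the normalization step Python applies to real identifiers.

```python
import unicodedata

# Categories allowed at the start of a name (letters and letter numbers).
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Extra categories allowed after the first character (marks, digits, connectors).
EXTRA_CONTINUE_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}
MIDDLE_DOT = "\u00b7"  # the "·" listed explicitly in the grammar


def is_id_start(ch: str) -> bool:
    return ch == "_" or unicodedata.category(ch) in START_CATEGORIES


def is_id_continue(ch: str) -> bool:
    return (
        is_id_start(ch)
        or ch == MIDDLE_DOT
        or unicodedata.category(ch) in EXTRA_CONTINUE_CATEGORIES
    )


def is_name(text: str) -> bool:
    """One start character followed by any number of continue characters."""
    return (
        bool(text)
        and is_id_start(text[0])
        and all(is_id_continue(ch) for ch in text[1:])
    )


for candidate in ["naam", "имя", "اسم", "名前", "abc1", "1abc", "a b"]:
    print(candidate, is_name(candidate))
```

So instead of listing alphabets, the rule lists kinds of characters, and every script with letters in Unicode comes along for free.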
If you haven't seen it, the categories used are explained in the relevant PEP. It's interesting to see some of the rationale and background: python.org/dev/peps/pep-3
I actually got to this via Rust's RFC on non-ASCII identifiers – this thread made me curious about some of the decisions made, as this stuff is non-trivial! rust-lang.github.io/rfcs/2457-non-
Ahh, I see you posted a follow-up tweet mentioning the PEP, so sorry if I jumped the gun! Thanks anyway for this thread, this is important stuff to be aware of!
This is valid Python, because Python has supported non-ASCII letters since PEP 3131 (2007). That's why they have the nice grammar I could steal.
So before I deployed, I figured it would be good to test the code in the Hedy interface too and lo and behold... It says "bad token"?!

