Is there a way to use the Unicode tables in the Rust regex crate to match a single char without turning it into a String or &str? It seems like the internals must have a way to do this, but perhaps not exposed? Is there a trick?
(And you are right that the regex crate knows what a Unicode codepoint is. It is, in fact, the fundamental atom of a match for Unicode regexes. This is not as good as using grapheme clusters, but it is easier to implement!)
Is it ridiculous to consider exposing a "codepoint match" facility? Or did I just not understand something about what makes matching the `char` type to a codepoint difficult? (the ontology is complicated enough that I could be missing a mismatch somewhere)
I don't know if I would use the term 'ridiculous' necessarily, but I think I would need some compelling evidence to motivate it. There are some incongruities to consider (like regexes that never match a single codepoint), and there is the question of whether it's really worth a new API item.