Is there a way to use the Unicode tables in the Rust regex crate to match a single char without turning it into a String or &str? It seems like the internals must have a way to do this, but perhaps not exposed? Is there a trick?
Is it ridiculous to consider exposing a "codepoint match" facility? Or did I just not understand something about what makes matching the `char` type to a codepoint difficult? (the ontology is complicated enough that I could be missing a mismatch somewhere)
-
-
I don't know if I would use the term 'ridiculous' necessarily, but I think I would need some compelling evidence to motivate it. There are some incongruities to consider (like regexes that never match a single codepoint), and there's the question of whether it's really worth a new API item.
-
e.g., if there was an example that said, "match on a specific codepoint by doing `re.is_match(codepoint.encode_utf8(&mut [0; 4]))`" that might be enough. https://play.rust-lang.org/?gist=79cd9455d12d186af21ae685f6f909fb&version=stable …
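A minimal, std-only sketch of that trick, in case it helps: `char::encode_utf8` writes the char into a 4-byte stack buffer and hands back a string slice, so nothing is heap-allocated. The helper name `char_matches` and the closure are made up for illustration; with the regex crate you would presumably pass something like `|s| re.is_match(s)` instead.

```rust
/// Hypothetical helper (not part of the regex crate): runs any `&str`
/// predicate against a single `char` without allocating a `String`.
fn char_matches<F>(c: char, is_match: F) -> bool
where
    F: Fn(&str) -> bool,
{
    // A `char` needs at most 4 bytes when encoded as UTF-8.
    let mut buf = [0u8; 4];
    // `encode_utf8` returns a `&mut str` borrowing `buf`; it coerces to `&str`.
    let s: &str = c.encode_utf8(&mut buf);
    is_match(s)
}

fn main() {
    // Stand-in predicate (an assumption for this sketch); a real caller
    // would wrap `Regex::is_match` from the regex crate here.
    let is_greek_lower = |s: &str| s.chars().all(|ch| ('α'..='ω').contains(&ch));

    println!("{}", char_matches('λ', is_greek_lower));
    println!("{}", char_matches('x', is_greek_lower));
}
```

The buffer only has to outlive the call, which is why a temporary like `&mut [0; 4]` also works inline.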
-
I would never have thought to turn a char into a &str that way! I wonder if it means we should add `as_str(&self) -> &str` to `char`.
-
Hmm. Don't think that would work. You would need to return an array, but we don't have a type for "fixed size array whose contents are guaranteed to be UTF-8."
-
I mean once you have the &[u8] can't you read it into a &str? Where would the unsafety come from?
-
I don't think you can get a &[u8] for anything other than ASCII, since the in-memory representation is different between `char` and UTF-8. Would have to be owned
-
You could read the data into a new fixed size UTF8 stack array (doing the conversion) and then read it into the &str.
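Roughly what that could look like, as a sketch only: the `Utf8Char` type below is hypothetical (nothing like it is claimed to exist in std). It owns the encoded bytes, so `as_str` has something to borrow from, which a bare `char` does not, since it stores the scalar value rather than its UTF-8 encoding.

```rust
/// Hypothetical owned wrapper: the UTF-8 bytes of one `char` plus their length.
struct Utf8Char {
    bytes: [u8; 4],
    len: usize,
}

impl Utf8Char {
    fn new(c: char) -> Self {
        let mut bytes = [0u8; 4];
        // Do the conversion once, on the stack, and remember how many bytes were used.
        let len = c.encode_utf8(&mut bytes).len();
        Utf8Char { bytes, len }
    }

    fn as_str(&self) -> &str {
        // The bytes were written by `encode_utf8`, so this check should never fail.
        std::str::from_utf8(&self.bytes[..self.len]).expect("valid UTF-8")
    }
}

fn main() {
    let c = 'é';
    // The `char` itself holds the scalar value U+00E9 ...
    println!("{:#x}", c as u32);
    // ... while its UTF-8 form is the two bytes 0xC3 0xA9.
    let owned = Utf8Char::new(c);
    println!("{:?} {}", owned.as_str().as_bytes(), owned.as_str());
}
```

The checked `from_utf8` keeps the sketch free of `unsafe`; the cost is a validation pass over at most 4 bytes.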