I had huge problems with differences between how Swift and Objective-C count Unicode characters and handle ranges in languages where one human-readable character can be composed of multiple Unicode scalars (Thai, maybe? Can’t recall). Be careful with that.
I don’t recall the details, but my recollection is that the bridging between the two languages makes you think everything just works, but it doesn’t. Objective-C counts scalars, Swift counts human-readable characters, and NSString regex uses the Objective-C counting methodology.
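To make the counting question concrete, here is a minimal Rust sketch (standard library only) that measures the same text three ways. The `Counts` struct and `counts` function are hypothetical names made up for this illustration. One detail worth noting: `NSString`'s `length` is defined in UTF-16 code units, which equals the scalar count only for text inside the Basic Multilingual Plane, while Swift's `String.count` counts `Character`s (extended grapheme clusters). Grapheme counting is left out because the Rust standard library does not provide it.

```rust
/// Sizes of one string under three different "length" definitions.
/// (Hypothetical helper type, for illustration only.)
#[derive(Debug, PartialEq)]
struct Counts {
    utf8_bytes: usize,
    utf16_units: usize,
    scalars: usize,
}

fn counts(s: &str) -> Counts {
    Counts {
        utf8_bytes: s.len(),                   // what Rust's str::len reports
        utf16_units: s.encode_utf16().count(), // the unit NSString's length uses
        scalars: s.chars().count(),            // Unicode scalar values
    }
}

fn main() {
    // "e" followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
    let e_acute = "e\u{301}";
    // U+1F44D THUMBS UP SIGN plus U+1F3FD skin tone modifier: one visible emoji.
    let thumbs = "\u{1F44D}\u{1F3FD}";

    println!("{:?}", counts(e_acute));
    println!("{:?}", counts(thumbs));
}
```

Both strings render as a single human-readable character, yet none of the three counts is 1 for either of them, which is the kind of mismatch that bites when ranges cross a language bridge.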
Right, Rust’s regex library is Unicode-aware and speaks UTF-8, so no conversion from Swift’s String would be necessary.
I know nothing about Rust so I can’t comment. But it’s not just a matter of being UTF-8 aware; it’s really about how it does the character counting. Swift itself changed: initial versions counted scalars, and starting in some version (3 or 4, I think) it began counting human-readable chars.
A UTF-8 regex only needs to respect Unicode scalar boundaries, not grapheme clusters (what’s rendered for humans). It’s legal to split Swift strings at these boundaries. Whether Swift counts characters as scalars or as grapheme clusters doesn’t affect whether Rust’s regex will break it.
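The boundary rule can be sketched in Rust with the standard library alone. The helper name `split_on_scalar_boundary` is hypothetical; `str::is_char_boundary` and `str::split_at` are real `std` methods. This only illustrates what "legal" means for a UTF-8 string, not how any regex engine is implemented.

```rust
/// Returns the two halves of `s` split at byte offset `at`,
/// or None if `at` does not fall on a Unicode scalar boundary.
/// (Hypothetical helper, for illustration only.)
fn split_on_scalar_boundary(s: &str, at: usize) -> Option<(&str, &str)> {
    if s.is_char_boundary(at) {
        Some(s.split_at(at))
    } else {
        None
    }
}

fn main() {
    // "e" + U+0301 COMBINING ACUTE ACCENT; UTF-8 bytes: 0x65, 0xCC, 0x81.
    let s = "e\u{301}";

    // Offset 1 sits between the two scalars: a legal split,
    // even though it separates the accent from its base letter.
    println!("{:?}", split_on_scalar_boundary(s, 1));

    // Offset 2 is inside the two-byte encoding of U+0301: not legal.
    println!("{:?}", split_on_scalar_boundary(s, 2));
}
```

Both halves of the first split are still valid UTF-8 strings; the second split would produce invalid UTF-8, which is why it is rejected.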
Second, if the regex engine doesn’t take grapheme clusters into account, it may create ranges that are invalid because they would split a cluster into two parts.
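A rough Rust sketch of that failure mode, with loud assumptions: it uses plain substring search (`str::find`) as a stand-in for a regex match, and `is_combining_mark` is a hypothetical check that covers only the Combining Diacritical Marks block (U+0300 to U+036F), not the full grapheme cluster rules of UAX #29. It also inspects only the end of the match, not the start.

```rust
use std::ops::Range;

/// Crude approximation of "this scalar extends the previous cluster".
/// ASSUMPTION: only U+0300..=U+036F is checked; real grapheme
/// segmentation (UAX #29) covers far more than this.
fn is_combining_mark(c: char) -> bool {
    ('\u{300}'..='\u{36F}').contains(&c)
}

/// Finds `needle` in `haystack` and returns the matched byte range plus
/// a flag saying whether the match ends in the middle of a (roughly
/// detected) grapheme cluster. (Hypothetical helper, illustration only.)
fn find_with_cluster_check(haystack: &str, needle: &str) -> Option<(Range<usize>, bool)> {
    let start = haystack.find(needle)?;
    let end = start + needle.len();
    // If the scalar right after the match is a combining mark, the match
    // cut the final cluster in two.
    let splits_cluster = haystack[end..]
        .chars()
        .next()
        .map_or(false, is_combining_mark);
    Some((start..end, splits_cluster))
}

fn main() {
    // "cafe" + U+0301: the final "e" carries a combining acute accent.
    println!("{:?}", find_with_cluster_check("cafe\u{301} menu", "cafe"));
    // Plain ASCII: the same range, but no cluster is split.
    println!("{:?}", find_with_cluster_check("cafe menu", "cafe"));
}
```

In the first case the range `0..4` is a perfectly valid scalar-aligned range, yet highlighting or replacing it would detach the accent from its letter.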
As far as UTF-8 and Swift strings are concerned, splitting a grapheme cluster is legal as long as you’re not splitting Unicode scalars. /cc @FakeUnicode
Yeah, that is true. I was still thinking about my use case where I definitely didn’t want that to happen. :)
@burntsushi5 can you weigh in on what work would be needed for the regex crate to respect grapheme cluster boundaries?
If you're talking about pervasive support, then a complete rewrite and a total rethinking of how regex matching works. If you're talking about matching a grapheme cluster, e.g., \X in some regex engines, then that's doable today with some nastiness: https://github.com/BurntSushi/bstr/blob/master/scripts/regex/grapheme.sh …
I don't really have much context on what you're talking about. Is there an underlying assumption here that NSString regex supports grapheme clusters? If so, is there a link where I could read more?