I bet when they created NSString, they thought that a simple 16-bit-per-character encoding ought-to-be-enough-for-everybody, and later they hacked this into UTF-16. I would've preferred ASCII hacked into UTF-8 instead ;)
Replying to @FlohOfWoe @matiasgoldberg
...what I'd need is a simple way to loop over UTF-32 characters in the NSString, I'm aware that I can decode into an NSData object with any encoding, but that's a bit awkward :D
2 replies 0 retweets 0 likes -
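For reference, a minimal sketch of what "looping over UTF-32 characters" amounts to when the storage is UTF-16: surrogate pairs are folded back into single code points. This is illustrative Python, not NSString API. The helper names `utf16_units` and `code_points` are hypothetical, and lone surrogates are simply passed through unchanged, which is an assumption rather than anything stated in the thread.

```python
import struct


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def code_points(units):
    """Fold UTF-16 code units into Unicode code points (UTF-32 values)."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # A valid surrogate pair encodes one code point above U+FFFF.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # BMP code point, or a lone surrogate passed through as-is.
            out.append(u)
            i += 1
    return out


units = utf16_units("a\U0001F600")
points = code_points(units)
```

Here `units` holds three code units and `points` holds two code points, which is exactly the mismatch the thread is about.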
Replying to @FlohOfWoe @matiasgoldberg
You can read more about the history of NSString here: https://www.objc.io/issues/9-strings/unicode/ …, but generally speaking, a list of code points (≠ characters) isn’t what you want, regardless of encoding. Unless you’re writing your own shaping/layout engine, which is a Very Hard Problem.
2 replies 0 retweets 1 like -
Replying to @warrenm @FlohOfWoe
Well, he's doing UI so very likely he actually wants code units
1 reply 0 retweets 0 likes -
Replying to @matiasgoldberg @FlohOfWoe
What do you do with code points once you get them? If the answer is “index into an atlas or glyph array,” you’re Doing It Wrong.
1 reply 0 retweets 1 like -
Replying to @warrenm @FlohOfWoe
You mean because of combining characters?
1 reply 0 retweets 0 likes -
Replying to @matiasgoldberg @FlohOfWoe
Combining characters are but one issue in complex text layout. https://en.m.wikipedia.org/wiki/Complex_text_layout …
1 reply 0 retweets 1 like -
Replying to @warrenm @FlohOfWoe
Ah right, I forgot. I rely on HarfBuzz for that
1 reply 0 retweets 0 likes -
Replying to @matiasgoldberg @FlohOfWoe
And what do you give to HarfBuzz to get a sequence of glyph shapes? It doesn’t require enumerating code points manually.
1 reply 0 retweets 0 likes -
Replying to @warrenm @FlohOfWoe
To HarfBuzz I send the full substring (icu's ubidi separates ltr from rtl). I need code unit access for caret navigation in a basic editbox
2 replies 0 retweets 0 likes
Admittedly that works for the most basic cases, but even if you use a sane canonicalization, you still have to contend with the fact that Codepoints Aren’t Characters, and there are many instances where you don’t want an insertion point between them.
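A rough sketch of the insertion-point problem, assuming the simplest possible rule: never stop the caret directly before a combining mark. The function name `caret_stops` is hypothetical, and the rule is only an approximation. It ignores ZWJ emoji sequences, Hangul jamo, regional indicators and other cases covered by the grapheme cluster rules in Unicode Standard Annex #29, so a real editbox would more likely rely on a library implementation of those rules.

```python
import unicodedata


def caret_stops(text):
    """Return code point indices where a caret may rest (approximation)."""
    stops = [0]
    for i in range(1, len(text)):
        # A non-zero combining class means the code point attaches to the
        # preceding base character, so no caret position goes before it.
        if unicodedata.combining(text[i]) == 0:
            stops.append(i)
    if text:
        stops.append(len(text))
    return stops


# "e" followed by U+0301 COMBINING ACUTE ACCENT, then "x"
stops = caret_stops("e\u0301x")
```

For that input the result skips index 1, the position between the base letter and its accent. Note that Python indexes strings by code point, whereas NSString indexes by UTF-16 code unit, so the indices would differ for text outside the BMP.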
Replying to @warrenm @matiasgoldberg
At least for games, where localization is driven by demand and market size, you can usually get away with assuming that one UTF-32 code point is one character and can be rendered as such. The only exception in >20 languages we had was Arabic (and that was a marketing experiment).
0 replies 0 retweets 1 like