This would have allowed shorter encoding of a few chars and eliminated the confusion about "overlong sequences" & their invalidity.
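To make the offset idea concrete, here is a minimal sketch in C (hypothetical encoding, not real UTF-8) contrasting the naive two-byte decode, which can reproduce values already covered by one byte ("overlong"), with a punycode-style biased decode that cannot:

#include <stdint.h>

/* Naive 2-byte decode as real UTF-8 does it:
 * value = ((b0 & 0x1F) << 6) | (b1 & 0x3F), covering 0..0x7FF.
 * Since 0..0x7F are also reachable this way, real UTF-8 has to
 * outlaw lead bytes C0 and C1 as "overlong". */
static uint32_t decode2_naive(uint8_t b0, uint8_t b1)
{
    return ((uint32_t)(b0 & 0x1F) << 6) | (uint32_t)(b1 & 0x3F);
}

/* Hypothetical biased decode: add 0x80, the number of values the
 * 1-byte form already covers, so 2-byte sequences map to
 * 0x80..0x87F. No value has two encodings, and each length gains
 * a little extra range. */
static uint32_t decode2_biased(uint8_t b0, uint8_t b1)
{
    return decode2_naive(b0, b1) + 0x80;
}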
Replying to @RichFelker
Punycode does that offset thang. Another UTF-8 stupid is artificially limiting it to UTF-16 scope. UTF-8 coulda supported up to U+7FFFFFFF.
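For reference, the pre-RFC-3629 definition of UTF-8 really did run to 6 bytes. A sketch of that original-style encoder (the function name is made up for illustration):

#include <stddef.h>
#include <stdint.h>

/* Encodes c (up to U+7FFFFFFF) per the original UTF-8 scheme,
 * which allowed 5- and 6-byte sequences; RFC 3629 later capped
 * UTF-8 at 4 bytes / U+10FFFF to match what UTF-16 can reach.
 * Returns the number of bytes written (1..6). */
static size_t utf8_encode_original(uint32_t c, unsigned char out[6])
{
    size_t n;
    if      (c < 0x80)      { out[0] = c;                return 1; }
    else if (c < 0x800)     { out[0] = 0xC0 | (c >> 6);  n = 2; }
    else if (c < 0x10000)   { out[0] = 0xE0 | (c >> 12); n = 3; }
    else if (c < 0x200000)  { out[0] = 0xF0 | (c >> 18); n = 4; }
    else if (c < 0x4000000) { out[0] = 0xF8 | (c >> 24); n = 5; }
    else                    { out[0] = 0xFC | (c >> 30); n = 6; }
    /* Continuation bytes, lowest 6 bits first from the end. */
    for (size_t i = n - 1; i > 0; i--) {
        out[i] = 0x80 | (c & 0x3F);
        c >>= 6;
    }
    return n;
}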
Replying to @bofh453 @FakeUnicode
Because a 1M entry table is large but manageable. A 2G entry table is absolutely not.
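For scale, assuming one byte of properties per code point: the cap at U+10FFFF means 0x110000 = 1,114,112 entries, roughly 1 MiB; a ceiling of U+7FFFFFFF would mean 0x80000000 = 2,147,483,648 entries, i.e. 2 GiB for the same flat table.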
Thankfully there are ways to reduce the awfulness: multilevel tables, and exploiting the sparseness of the ranges whose properties aren't constant. But still...
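A minimal sketch of the multilevel-table idea in C, assuming hypothetical stage1/stage2 arrays emitted by an offline generator; because identical 256-entry blocks are shared, large uniform or unassigned ranges cost almost nothing:

#include <stdint.h>

#define BLOCK_BITS 8                  /* 256 code points per block */
#define BLOCK_SIZE (1 << BLOCK_BITS)

/* Hypothetical generated tables: stage1 maps each block of code
 * points to a (possibly shared) 256-entry row in stage2. */
extern const uint16_t stage1[0x110000 >> BLOCK_BITS];
extern const uint8_t  stage2[][BLOCK_SIZE];

static uint8_t codepoint_property(uint32_t cp)
{
    if (cp > 0x10FFFF) return 0;      /* out of range: default property */
    return stage2[stage1[cp >> BLOCK_BITS]][cp & (BLOCK_SIZE - 1)];
}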
Replying to @RichFelker @bofh453
why not keep going... pic.twitter.com/1OCblAL7Gz
Converting between encodings is simpler and faster if there is a small upper bound on the byte size of each encoded code point.
I don't think it's simpler or faster, but a limit of 4 bytes does allow in-place (reusing input buffer) conversion from UTF-32 to UTF-8.
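A sketch of that in-place conversion, assuming already-validated, native-endian UTF-32 input: each 4-byte input unit emits at most 4 UTF-8 bytes, so a write cursor moving left to right can never overtake the read cursor in the same buffer.

#include <stddef.h>
#include <stdint.h>

/* Converts n UTF-32 code points to UTF-8 in place, returning the
 * UTF-8 byte count now at the start of buf. Assumes valid input
 * (no surrogates, nothing above U+10FFFF). */
size_t utf32_to_utf8_inplace(uint32_t *buf, size_t n)
{
    unsigned char *out = (unsigned char *)buf;
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = buf[i];          /* read before any overlapping write */
        if (c < 0x80) {
            out[len++] = c;
        } else if (c < 0x800) {
            out[len++] = 0xC0 | (c >> 6);
            out[len++] = 0x80 | (c & 0x3F);
        } else if (c < 0x10000) {
            out[len++] = 0xE0 | (c >> 12);
            out[len++] = 0x80 | ((c >> 6) & 0x3F);
            out[len++] = 0x80 | (c & 0x3F);
        } else {
            out[len++] = 0xF0 | (c >> 18);
            out[len++] = 0x80 | ((c >> 12) & 0x3F);
            out[len++] = 0x80 | ((c >> 6) & 0x3F);
            out[len++] = 0x80 | (c & 0x3F);
        }
    }
    return len;
}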
Replying to @RichFelker @gparker and
Definitely potentially faster if your vector width is large enough and you have a compress operation.
I don't see any viable way to vectorize the conversion unless you can assume validity of input, and even then it sounds hard.
Replying to @RichFelker @gparker and
Vectorizing validity testing is easier than vectorizing conversion. (Neither is actually especially easy, merely possible).