UTF-8 really was a work of brilliance, guaranteeing pretty much the maximal set of important desirable properties like this.
-
-
Thankfully there are ways to reduce the awfulness, using multilevel tables and exploiting the sparseness of the ranges with non-constant properties. But still...
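Not from the thread, but a minimal sketch of the two-level table idea, with made-up block sizes and a toy property: code points are split into 256-entry blocks, a small stage-1 index says which distinct block applies, and long runs of identical property values all share a single stage-2 block.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_BITS 8
#define BLOCK_SIZE (1 << BLOCK_BITS)
#define NUM_BLOCKS (0x110000 >> BLOCK_BITS)   /* 4352 blocks cover U+0000..U+10FFFF */

/* Stage 2: only the distinct 256-entry blocks of property values are stored.
   Block 0 stays all-zero and is shared by every block with no set property. */
static uint8_t stage2[8][BLOCK_SIZE];          /* hypothetical: 8 distinct blocks */

/* Stage 1: one small index per 256 code points, pointing into stage 2. */
static uint8_t stage1[NUM_BLOCKS];

static uint8_t lookup_property(uint32_t cp) {
    return stage2[stage1[cp >> BLOCK_BITS]][cp & (BLOCK_SIZE - 1)];
}

int main(void) {
    /* Toy property ("is ASCII digit"), filled at runtime; real tables are
       generated offline from the Unicode data files and marked const. */
    stage1[0] = 1;                             /* U+0000..U+00FF uses block 1 */
    for (uint32_t cp = '0'; cp <= '9'; cp++)
        stage2[1][cp] = 1;

    printf("%d %d %d\n", lookup_property('7'),
           lookup_property('A'), lookup_property(0x1F600));  /* prints: 1 0 0 */
    return 0;
}
```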
-
-
Converting between encodings is simpler and faster if there is a small upper bound on the byte size of each encoded code point.
-
I don't think it's simpler or faster, but a limit of 4 bytes does allow in-place conversion from UTF-32 to UTF-8, reusing the input buffer.
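A minimal sketch (mine, not the poster's) of why the 4-byte cap makes that work: each code point occupies 4 bytes of UTF-32 input and at most 4 bytes of UTF-8 output, so a forward-moving write cursor can never overtake the read cursor. Assumes valid, native-endian UTF-32.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Converts 'count' UTF-32 code points in buf to UTF-8, writing over buf
   itself; returns the number of UTF-8 bytes now at the start of buf. */
size_t utf32_to_utf8_inplace(void *buf, size_t count) {
    uint8_t *out = buf;
    const uint8_t *in = buf;
    size_t written = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t c;
        memcpy(&c, in + 4 * i, 4);   /* read the code point before overwriting it */
        if (c < 0x80) {
            out[written++] = (uint8_t)c;
        } else if (c < 0x800) {
            out[written++] = 0xC0 | (c >> 6);
            out[written++] = 0x80 | (c & 0x3F);
        } else if (c < 0x10000) {
            out[written++] = 0xE0 | (c >> 12);
            out[written++] = 0x80 | ((c >> 6) & 0x3F);
            out[written++] = 0x80 | (c & 0x3F);
        } else {
            out[written++] = 0xF0 | (c >> 18);
            out[written++] = 0x80 | ((c >> 12) & 0x3F);
            out[written++] = 0x80 | ((c >> 6) & 0x3F);
            out[written++] = 0x80 | (c & 0x3F);
        }
        /* written <= 4 * (i + 1), so the write cursor never passes the read. */
    }
    return written;
}
```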
-
It can definitely be faster if your vector width is large enough and you have a compress operation.
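For concreteness, a minimal sketch of the compress idea, assuming valid input and AVX-512 VBMI2: each code point is expanded into a fixed 4-byte slot (lead byte first, unused bytes zero), and a byte-compress squeezes out the unused bytes. The slot and mask generation is done scalarly here for clarity; a fully vectorized converter would compute those with SIMD as well.

```c
/* Needs AVX-512 VBMI2; build with e.g. gcc -O2 -march=icelake-server */
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Expand one code point into a 4-byte slot; returns its UTF-8 length. */
static int expand_slot(uint32_t c, uint8_t slot[4]) {
    memset(slot, 0, 4);
    if (c < 0x80)    { slot[0] = (uint8_t)c; return 1; }
    if (c < 0x800)   { slot[0] = 0xC0 | (c >> 6);
                       slot[1] = 0x80 | (c & 0x3F); return 2; }
    if (c < 0x10000) { slot[0] = 0xE0 | (c >> 12);
                       slot[1] = 0x80 | ((c >> 6) & 0x3F);
                       slot[2] = 0x80 | (c & 0x3F); return 3; }
    slot[0] = 0xF0 | (c >> 18);
    slot[1] = 0x80 | ((c >> 12) & 0x3F);
    slot[2] = 0x80 | ((c >> 6) & 0x3F);
    slot[3] = 0x80 | (c & 0x3F); return 4;
}

/* Convert 16 code points; returns the number of UTF-8 bytes produced.
   The store always writes 64 bytes, so 'out' needs 64 bytes of room. */
static size_t convert16(const uint32_t *in, uint8_t *out) {
    uint8_t slots[64];
    uint64_t mask = 0;
    size_t total = 0;
    for (int i = 0; i < 16; i++) {
        int len = expand_slot(in[i], &slots[4 * i]);
        mask |= ((1ULL << len) - 1) << (4 * i);   /* keep 'len' bytes of slot i */
        total += (size_t)len;
    }
    __m512i v = _mm512_loadu_si512(slots);
    __m512i packed = _mm512_maskz_compress_epi8(mask, v);  /* squeeze out zeros */
    _mm512_storeu_si512(out, packed);
    return total;
}

int main(void) {
    uint32_t in[16] = { 'H', 'i', ' ', 0x00E9, 0x20AC, 0x1F600, '!', '\n',
                        'x', 'y', 'z', '1', '2', '3', '4', '5' };
    uint8_t out[64];
    size_t n = convert16(in, out);
    fwrite(out, 1, n, stdout);
    return 0;
}
```

A streaming caller would advance the output pointer by the returned length after each 16-code-point chunk, keeping 64 bytes of slack for the final store.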
-
I don't see any viable way to vectorize the conversion unless you can assume validity of input, and even then it sounds hard.
-
Vectorizing validity testing is easier than vectorizing conversion. (Neither is actually especially easy, merely possible.)