Just about the only thing that arguably could have been done better in UTF-8 would have been offsetting the base value for multibyte chars.
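A rough sketch of what that offsetting might look like, assuming the point is the usual one about overlong encodings: in real UTF-8 an n-byte form can also spell values that fit in a shorter form, so those patterns have to be rejected. If each length instead started counting where the previous one ended, every pattern would be distinct. The function name and the payload table below are made up for illustration, and the ranges are raw bit capacity, ignoring the U+10FFFF cap and surrogates.

```python
# Payload bits carried by each sequence length in UTF-8 as it exists today.
PAYLOAD_BITS = {1: 7, 2: 11, 3: 16, 4: 21}


def code_point_ranges(offset: bool) -> dict:
    """Return {length: (lowest, highest)} value each sequence length can represent.

    offset=False models real UTF-8, where the low end of each multibyte
    range is lost to forbidden overlong forms.
    offset=True models the hypothetical variant where each length is
    biased by the total capacity of all shorter lengths.
    """
    ranges = {}
    base = 0  # first value not yet covered (used only when offset=True)
    for length, bits in PAYLOAD_BITS.items():
        capacity = 1 << bits
        if offset:
            low, high = base, base + capacity - 1
            base = high + 1
        else:
            # Anything that fits in a shorter form is overlong here.
            low = 0 if length == 1 else 1 << PAYLOAD_BITS[length - 1]
            high = capacity - 1
        ranges[length] = (low, high)
    return ranges


if __name__ == "__main__":
    for label, flag in (("standard", False), ("offset", True)):
        for length, (low, high) in code_point_ranges(flag).items():
            print(f"{label:8} {length} byte(s): {low:#x}..{high:#x}")
```

Under these assumptions the 2-byte form would cover 0x80..0x87F instead of 0x80..0x7FF, at the cost of an extra add or subtract per multibyte char.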
I don't think it's simpler or faster, but a limit of 4 bytes does allow in-place (reusing input buffer) conversion from UTF-32 to UTF-8.
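A minimal sketch of the in-place idea, assuming little-endian UTF-32 in a mutable buffer (the function name is hypothetical). It works because each input unit is exactly 4 bytes and each output sequence is at most 4 bytes, so the write position can never pass the read position.

```python
import struct


def utf32le_to_utf8_in_place(buf: bytearray) -> int:
    """Overwrite UTF-32LE data in buf with UTF-8; return the UTF-8 byte count.

    The buffer is not shrunk; bytes past the returned length are stale.
    On invalid input the buffer may already be partially overwritten.
    """
    if len(buf) % 4:
        raise ValueError("UTF-32 input must be a multiple of 4 bytes")
    write = 0
    for read in range(0, len(buf), 4):
        # Fetch the code point before its four bytes can be overwritten.
        (cp,) = struct.unpack_from("<I", buf, read)
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point {cp:#x}")
        if cp < 0x80:
            out = (cp,)
        elif cp < 0x800:
            out = (0xC0 | (cp >> 6), 0x80 | (cp & 0x3F))
        elif cp < 0x10000:
            out = (0xE0 | (cp >> 12),
                   0x80 | ((cp >> 6) & 0x3F),
                   0x80 | (cp & 0x3F))
        else:
            out = (0xF0 | (cp >> 18),
                   0x80 | ((cp >> 12) & 0x3F),
                   0x80 | ((cp >> 6) & 0x3F),
                   0x80 | (cp & 0x3F))
        # write <= read and len(out) <= 4, so only consumed input is touched.
        buf[write:write + len(out)] = bytes(out)
        write += len(out)
    return write
```

With a 5- or 6-byte maximum the last line of reasoning fails: a long output sequence could run into input that has not been read yet.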
-
Definitely potentially faster if your vector width is large enough and you have a compress operation.
-
I don't see any viable way to vectorize the conversion unless you can assume validity of input, and even then it sounds hard.
-
Vectorizing validity testing is easier than vectorizing conversion. (Neither is actually especially easy, merely possible.)
-
Even when you can't reuse the input buffer the upper bound lets you pre-allocate an output buffer, avoiding overflow and reallocation bugs.
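A small sketch of the pre-allocation point, assuming the input is a list of code points and leaning on Python's built-in encoder for the per-character bytes (the function name is hypothetical). The buffer is sized once from the 4-byte worst case and never grows.

```python
def encode_with_preallocated_buffer(code_points: list) -> bytes:
    """Encode code points to UTF-8 into a buffer sized once, up front."""
    # Worst case under the 4-byte limit: every code point takes 4 bytes.
    out = bytearray(4 * len(code_points))
    pos = 0
    for cp in code_points:
        # chr() rejects values above 0x10FFFF; encode() rejects surrogates.
        encoded = chr(cp).encode("utf-8")
        # Same-length slice assignment, so the buffer is never resized.
        out[pos:pos + len(encoded)] = encoded
        pos += len(encoded)
    # Trim the unused tail.
    return bytes(out[:pos])
```

The same shape works with a 6-byte bound; only the multiplier changes.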
-
Yes, but 4:4 vs 4:6 allocation doesn't make much difference. As above, I find the 4-byte limit important, but this seems like a dubious reason.
-
True, 6 vs 4 doesn't matter much for allocation. (TBH I had forgotten that 6 is no longer the limit.)