Of course, the desirable properties necessitate one property that's hard to like: not all byte sequences can be legal/valid.
Thankfully there are ways to reduce the awfulness, e.g. multilevel tables that exploit the sparseness of the ranges with non-constant properties. But still...
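A minimal sketch of the multilevel-table idea in C (the names and toy property values here are hypothetical, not from any real Unicode table): a first stage maps the high bits of a code point to a 256-entry block, and blocks that are entirely uniform, e.g. all "Unknown", are shared, so large unassigned or uniform ranges cost almost nothing.

#include <stdint.h>

enum prop { PROP_UNKNOWN = 0, PROP_LETTER, PROP_DIGIT };

/* Toy two-level table: stage1 picks a 256-entry block by the high
 * bits of the code point; uniform blocks (here, the all-Unknown
 * block 0) are shared, so big sparse ranges cost one block total. */
static const uint8_t stage2[][256] = {
    { 0 },                                       /* block 0: all Unknown */
    { ['0'] = PROP_DIGIT, ['A'] = PROP_LETTER }, /* block 1: toy data   */
};

/* 0x1100 blocks of 256 code points cover U+0000..U+10FFFF; entries
 * default to block 0, the shared all-Unknown block. */
static const uint8_t stage1[0x1100] = { [0] = 1 };

static enum prop lookup(uint32_t cp)
{
    if (cp > 0x10FFFF) return PROP_UNKNOWN;
    return (enum prop)stage2[stage1[cp >> 8]][cp & 0xFF];
}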
-
-
-
Converting between encodings is simpler and faster if there is a small upper bound on the byte size of each encoded code point.
-
I don't think it's simpler or faster, but a limit of 4 bytes does allow in-place conversion (reusing the input buffer) from UTF-32 to UTF-8.
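A minimal sketch of that in-place conversion in C (the function name is hypothetical, and the input is assumed valid: no surrogates, nothing above U+10FFFF). The trick is that each code point consumes exactly 4 input bytes and emits at most 4 UTF-8 bytes, so a single forward pass can never overwrite input it hasn't read yet.

#include <stddef.h>
#include <stdint.h>

/* Convert UTF-32 to UTF-8 in the same buffer, assuming valid input.
 * The write cursor (len) never passes the read cursor (4*i), because
 * every code point reads 4 bytes and writes at most 4.
 * Returns the resulting UTF-8 length in bytes. */
static size_t utf32_to_utf8_inplace(uint32_t *buf, size_t n)
{
    uint8_t *out = (uint8_t *)buf;
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = buf[i]; /* read the slot before overwriting it */
        if (c < 0x80) {
            out[len++] = (uint8_t)c;
        } else if (c < 0x800) {
            out[len++] = (uint8_t)(0xC0 | (c >> 6));
            out[len++] = (uint8_t)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out[len++] = (uint8_t)(0xE0 | (c >> 12));
            out[len++] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
            out[len++] = (uint8_t)(0x80 | (c & 0x3F));
        } else {
            out[len++] = (uint8_t)(0xF0 | (c >> 18));
            out[len++] = (uint8_t)(0x80 | ((c >> 12) & 0x3F));
            out[len++] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
            out[len++] = (uint8_t)(0x80 | (c & 0x3F));
        }
    }
    return len;
}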
-
Definitely potentially faster if your vector width is large enough and you have a compress operation.
-
I don't see any viable way to vectorize the conversion unless you can assume validity of input, and even then it sounds hard.
-
Vectorizing validity testing is easier than vectorizing conversion. (Neither is actually especially easy, merely possible).
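A minimal sketch of vectorized UTF-32 validity testing in C with SSE4.1 intrinsics (the function name is hypothetical, and for brevity the count is assumed to be a multiple of 4; a real routine would also handle the tail). A scalar value is valid iff it is <= U+10FFFF and not a surrogate, and both checks map directly to packed compares.

#include <immintrin.h> /* SSE4.1 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true iff every code point in buf[0..n) is a valid Unicode
 * scalar value: <= 0x10FFFF and not in the surrogate range
 * U+D800..U+DFFF. Assumes n is a multiple of 4. */
static bool utf32_valid_sse41(const uint32_t *buf, size_t n)
{
    const __m128i max   = _mm_set1_epi32(0x10FFFF);
    const __m128i smask = _mm_set1_epi32(0xFFFFF800);
    const __m128i sval  = _mm_set1_epi32(0x0000D800);
    __m128i ok = _mm_set1_epi32(-1);
    for (size_t i = 0; i < n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        /* unsigned v <= 0x10FFFF  <=>  min_epu32(v, max) == v */
        __m128i in_range  = _mm_cmpeq_epi32(_mm_min_epu32(v, max), v);
        /* surrogate  <=>  (v & 0xFFFFF800) == 0xD800 */
        __m128i surrogate = _mm_cmpeq_epi32(_mm_and_si128(v, smask), sval);
        ok = _mm_and_si128(ok, _mm_andnot_si128(surrogate, in_range));
    }
    return _mm_movemask_epi8(ok) == 0xFFFF;
}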
End of conversation
New conversation -
-
-
so b/c assigned chars are sparse, you can limit the range for *properties* by mapping most to "Unknown" while still preserving the conversion range.
-
Only as long as they remain unassigned. That's not very comforting.
End of conversation