Just about the only thing that arguably could have been done better in UTF-8 would have been offsetting the base value for multibyte chars.
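A likely reading of "offsetting the base value": because a multibyte UTF-8 sequence's payload starts counting from zero rather than from just past the range of shorter forms, "overlong" encodings of small code points are syntactically expressible and must be rejected by decoders. A minimal sketch (the `encode_2byte` helper is hypothetical, for illustration only):

```python
# Hypothetical naive 2-byte UTF-8 encoder: 110xxxxx 10xxxxxx.
# Because the payload is not offset past the 1-byte range,
# it can express code points that already have a shorter form.
def encode_2byte(cp):
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])

# 'A' is U+0041; its canonical encoding is the single byte b'\x41',
# but the naive 2-byte form b'\xc1\x81' also "exists" -- an overlong
# encoding that conformant decoders must reject:
overlong = encode_2byte(0x41)
try:
    overlong.decode("utf-8")
except UnicodeDecodeError:
    pass  # CPython's decoder rejects it, as required
```

Had the 2-byte payload instead been biased by 0x80 (so it could only name values above the 1-byte range), every sequence length would cover a disjoint range and overlong forms could not exist.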
Replying to @RichFelker
Maybe it would've been better if it were incompatible with ASCII, avoiding confusion about which encoding a given string has.
Replying to @RichFelker @vilcans
In any case there's no such confusion. An ASCII string *is* (not "is confusable with") UTF-8.
Replying to @RichFelker
As ASCII is a subset of both Latin-1 and UTF-8, it causes a lot of confusion, I'd say.
Replying to @vilcans
Your point is either not clear in <140-char units or not well thought-out.
Replying to @RichFelker @vilcans
Only "confusion" I can see here is that when existing data is all ASCII you don't inherently know whether processes will accept UTF-8 edits.
Replying to @RichFelker @vilcans
But there's never any confusion about how to interpret the existing ASCII data. Any way you choose is right.
Replying to @RichFelker
If you are given a string, e.g. "Hello world", how do you know its encoding? You can't see the difference in that specific case.
Replying to @vilcans
It's ASCII. It's UTF-8. It's Latin-1. It's Latin-N for any N. It's ... Any answer is correct (except of course EBCDIC crap).
There's no lack of information here that prevents you from processing the existing data.
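The claim above can be checked directly: the same ASCII byte string decodes to the identical result under every ASCII-superset encoding, and the encodings only diverge once non-ASCII bytes appear. A small sketch:

```python
# Pure ASCII bytes: every ASCII-superset codec gives the same answer,
# so "which encoding is it?" has no observable consequence.
data = b"Hello world"
assert data.decode("ascii") == data.decode("utf-8") == data.decode("latin-1")

# Non-ASCII bytes are where the interpretations diverge:
assert b"\xc3\xa9".decode("utf-8") == "\u00e9"          # 'é', one char
assert b"\xc3\xa9".decode("latin-1") == "\u00c3\u00a9"  # 'Ã©', two chars
```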