Not only do ASCII bytes never appear in multibyte UTF-8 chars; NO character's byte sequence is ever a substring of another character's. UTF-8 is self-synchronizing.
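The two properties above can be checked directly (a minimal sketch; the sample characters here are my own illustration, not from the thread):

```python
# Verify UTF-8's self-synchronizing property on a few sample characters:
# 1-byte chars are pure ASCII (< 0x80); in multibyte chars, the lead byte
# is >= 0xC2 and continuation bytes fall in 0x80..0xBF, so no encoding
# can appear inside another.
chars = ["A", "é", "€", "漢", "😀"]
encoded = {c: c.encode("utf-8") for c in chars}

for c, b in encoded.items():
    if len(b) > 1:
        assert b[0] >= 0xC2                        # multibyte lead byte
        assert all(x >= 0x80 for x in b[1:])       # continuation bytes, never ASCII
    # No character's encoding is a substring of any other character's
    for other, ob in encoded.items():
        if other != c:
            assert b not in ob
    print(c, b.hex(" "))
```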
-
-
The only "confusion" I can see here is that when the existing data is all ASCII, you don't inherently know whether downstream processes will accept UTF-8 edits.
-
But there's never any confusion about how to interpret the existing ASCII data. Any way you choose is right.
-
If you are given a string, e.g. "Hello world", how do you know its encoding? You can't see the difference in that specific case.
-
It's ASCII. It's UTF-8. It's Latin-1. It's Latin-N for any N. It's ... Any answer is correct (except of course EBCDIC crap).
-
There's no lack of information here that prevents you from processing the existing data.
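The point is easy to demonstrate (a sketch using Python codec names; "cp1252" stands in for the Latin-N family):

```python
# An all-ASCII byte string decodes to the exact same text under every
# ASCII-compatible encoding, so any choice of interpretation is correct.
data = b"Hello world"
decodings = {enc: data.decode(enc) for enc in ("ascii", "utf-8", "latin-1", "cp1252")}

assert len(set(decodings.values())) == 1  # all interpretations agree
print(decodings)
```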