As ASCII is a subset of both Latin-1 and UTF-8, it causes a lot of confusion, I'd say.
Replying to @vilcans
Your point is either not clear in <140-char units or not well thought-out.
Replying to @RichFelker @vilcans
Only "confusion" I can see here is that when existing data is all ASCII you don't inherently know whether processes will accept UTF-8 edits.
Replying to @RichFelker @vilcans
But there's never any confusion about how to interpret the existing ASCII data. Any way you choose is right.
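A minimal sketch of that point, assuming Python 3 and its built-in codecs (the variable names are only illustrative): the same pure-ASCII bytes decode to identical text under all three encodings.

```python
# Pure ASCII bytes: every byte value is below 0x80.
data = b"Hello world"

# Decode the same bytes under each of the three encodings discussed.
decoded = {enc: data.decode(enc) for enc in ("ascii", "latin-1", "utf-8")}

# If ASCII is a common subset, all three results must be the same string.
all_same = len(set(decoded.values())) == 1
print(decoded)
print(all_same)
```

Whichever decoder is picked, the result is the same, so the choice only starts to matter once non-ASCII bytes appear.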
Replying to @RichFelker
If you are given a string, e.g. "Hello world", how do you know its encoding? You can't see the difference in that specific case.
Replying to @vilcans @RichFelker
In the future you might get strings with special characters, which will be encoded as UTF-8 or Latin-1, but you can't tell *now*.
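To illustrate what changes once a "special character" shows up, here is a rough sketch assuming Python 3; the sample string with "ö" is a made-up example, not from the thread. The two encodings produce different bytes, and guessing wrong either fails or yields mojibake.

```python
# Hypothetical future input: the same greeting with one non-ASCII letter (o-umlaut).
text = "Hello w\u00f6rld"

as_latin1 = text.encode("latin-1")  # one byte for U+00F6
as_utf8 = text.encode("utf-8")      # two bytes for U+00F6
print(as_latin1)
print(as_utf8)

# Latin-1 bytes read as UTF-8: the lone 0xF6 byte is not valid UTF-8.
try:
    as_latin1.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

# UTF-8 bytes read as Latin-1: never an error, but the text is garbled.
mojibake = as_utf8.decode("latin-1")
print(latin1_is_valid_utf8, mojibake)
```

The ASCII prefix `Hello w` is byte-for-byte identical in both outputs, which is why nothing can be inferred while the data is still all ASCII.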
Replying to @vilcans
That's inherent, not a property of UTF-8. The same applies to any character encoding you pick.
Replying to @RichFelker @vilcans
You can make the situation even worse by not even having a known common subset (ASCII) but you can't make it any better without metadata.
Replying to @RichFelker
You could have an encoding that can't be mistaken for Latin-1 but still readable. "Hello World" encoded as "hELLO_wORLD" for example.
Replying to @vilcans
Um, "hELLO_wORLD" is a valid ASCII (and thus Latin-1) string. You've crossed over into the realm of heuristic guesses....
And completely abandoned "requirement 0" of UTF-8. I don't think this thread is productive at this point...
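For what it's worth, the "hELLO_wORLD" objection can be checked directly. A small sketch, assuming Python 3.7+ (for `str.isascii`); the variable names are only illustrative:

```python
candidate = "hELLO_wORLD"

# Encoding as ASCII succeeds because every character is in the ASCII range.
raw = candidate.encode("ascii")
is_ascii = candidate.isascii()

# The same bytes therefore read back identically as Latin-1 and as UTF-8.
same_as_latin1 = raw.decode("latin-1") == candidate
same_as_utf8 = raw.decode("utf-8") == candidate
print(is_ascii, same_as_latin1, same_as_utf8)
```

So the swapped-case string is still indistinguishable from ordinary ASCII, Latin-1, or UTF-8 text at the byte level; telling it apart would take a heuristic on the content, not a property of the encoding.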