Apparently the extent of UTF-8's amazingness still isn't widely known. Today on #musl someone asked to confirm that multibyte chars never contain ASCII bytes.
Not only do ASCII bytes never appear in multibyte UTF-8 chars; NO character is ever a substring of another character.
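Both claims can be checked by hand. What follows is a minimal sketch, not anything from the thread itself: the helper names (`is_continuation`, `check_multibyte`, `byte_matches`, `char_offsets`) and the sample strings are made up for illustration. It relies on the fact that lead bytes and continuation bytes (`10xxxxxx`) occupy disjoint ranges, so a plain byte search for one character's encoding can only match where a character actually starts.

```python
def is_continuation(byte: int) -> bool:
    """True for bytes of the form 10xxxxxx, used only inside multibyte chars."""
    return byte & 0xC0 == 0x80


def check_multibyte(ch: str) -> bool:
    """Check the byte layout of one non-ASCII character's UTF-8 encoding."""
    enc = ch.encode("utf-8")
    no_ascii = all(b >= 0x80 for b in enc)              # no byte below 0x80
    lead_ok = not is_continuation(enc[0])               # first byte is a lead byte
    tail_ok = all(is_continuation(b) for b in enc[1:])  # the rest are 10xxxxxx
    return no_ascii and lead_ok and tail_ok


def byte_matches(haystack: str, needle: str) -> list:
    """Byte offsets where needle's encoding occurs in haystack's encoding."""
    h, n = haystack.encode("utf-8"), needle.encode("utf-8")
    hits, start = [], h.find(n)
    while start != -1:
        hits.append(start)
        start = h.find(n, start + 1)
    return hits


def char_offsets(text: str) -> set:
    """Byte offsets at which each character of text begins."""
    offsets, pos = set(), 0
    for ch in text:
        offsets.add(pos)
        pos += len(ch.encode("utf-8"))
    return offsets


# Hypothetical sample data: 2-, 3- and 4-byte characters.
samples = ["é", "€", "日", "😀"]
print(all(check_multibyte(ch) for ch in samples))

# Every raw byte-level match should land on a character boundary.
text = "naïve café, 5€, 日本語"
boundaries = char_offsets(text)
print(all(off in boundaries
          for ch in set(text)
          for off in byte_matches(text, ch)))
```

This is why byte-oriented tools (searching for `/` in a path, splitting on a newline) keep working on UTF-8 text without decoding it first.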
UTF-8 was really a work of brilliance, guaranteeing what's pretty much a maximal set of important desirable properties like this.
Of course the desirable properties necessitate one property that's hard to like: not all byte sequences can be legal/valid.
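To make "not all byte sequences are valid" concrete, here is a small sketch that feeds a few well-known kinds of ill-formed input to Python's strict UTF-8 decoder. The `INVALID` table and the `is_valid_utf8` helper are illustrative names, not part of any library.

```python
# Examples of byte sequences that a strict UTF-8 decoder must reject.
INVALID = {
    "byte never used in UTF-8": b"\xff",
    "lone continuation byte": b"\x80",
    "truncated 3-byte sequence": b"\xe2\x82",
    "overlong encoding of '/'": b"\xc0\xaf",
    "encoded UTF-16 surrogate": b"\xed\xa0\x80",
    "code point beyond U+10FFFF": b"\xf4\x90\x80\x80",
}


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for label, raw in INVALID.items():
    print(f"{label:28} {raw.hex():10} valid={is_valid_utf8(raw)}")

# For comparison, the complete encoding of the euro sign is accepted.
print(is_valid_utf8(b"\xe2\x82\xac"))
```

Rejecting overlong forms such as `C0 AF` is what keeps the "no ASCII bytes hidden inside multibyte characters" guarantee meaningful: there is exactly one valid encoding per code point.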
Replying to @RichFelker
I actually see that as a very nice "feature". It makes it easy to guess if a long piece of data is UTF-8 or random binary.
Replying to @creationix
Yes, once you weigh all the pros and cons, I agree it's a feature. But in my experience it's hard for ppl to accept it at first.
Replying to @RichFelker
I'm probably biased because I've written auto-detectors for various binary formats. When a format accepts all bit patterns, it's a pain.
With regard to auto-detection, the best part about UTF-8 is that false positives are extremely rare.
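A rough way to see the auto-detection point is to try decoding random bytes and count how often they happen to be valid UTF-8. This is only a sketch under stated assumptions: `looks_like_utf8` and `false_positive_rate` are hypothetical helpers, uniformly random bytes stand in for "random binary", and the trial counts are arbitrary.

```python
import random


def looks_like_utf8(data: bytes) -> bool:
    """Guess that data is UTF-8 if the strict decoder accepts all of it."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def false_positive_rate(length: int, trials: int, seed: int = 0) -> float:
    """Fraction of uniformly random byte strings that pass as UTF-8."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        blob = bytes(rng.getrandbits(8) for _ in range(length))
        hits += looks_like_utf8(blob)
    return hits / trials


# The rate should fall off quickly as the input gets longer.
for length in (4, 16, 64):
    print(length, false_positive_rate(length, trials=2000))
```

Two caveats apply to a check like this: pure ASCII data always passes, so it cannot tell UTF-8 from other ASCII-compatible encodings when no high bytes are present, and a buffer cut in the middle of a character will be rejected unless the trailing partial sequence is handled separately.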