Another candidate definition of a systems programming language: Must give you access to a *non-validating* UTF-8 codec.
The ability to process malformed UTF-8 non-destructively is important, but accepting these invalid inputs is neither necessary nor sufficient.
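One way to process malformed UTF-8 non-destructively without a separate non-validating codec is a decoder that carries bad bytes through instead of rejecting them. Below is a minimal sketch using Python's standard `surrogateescape` error handler (PEP 383), chosen here only for illustration; the thread itself does not propose it. The handler maps each undecodable byte to a lone surrogate in U+DC80..U+DCFF, so the exact original bytes come back on re-encoding. The sample bytes are arbitrary.

```python
# Arbitrary sample: valid ASCII around two invalid UTF-8 sequences,
# an overlong "a" (c1 a1) and an out-of-range 4-byte form (f4 9f bf bf).
raw = b"ok " + bytes.fromhex("c1 a1 f4 9f bf bf") + b" ok"

# Decode without failing: every byte that is not part of a valid sequence
# becomes a lone surrogate U+DC80..U+DCFF (U+DC00 + byte value).
text = raw.decode("utf-8", errors="surrogateescape")

# The valid parts can be handled as ordinary text...
print(text.startswith("ok "), text.endswith(" ok"))  # True True

# ...and encoding with the same handler restores the input byte-for-byte.
restored = text.encode("utf-8", errors="surrogateescape")
assert restored == raw
```

The trade-off is that the escaped `str` is no longer valid Unicode text, so it must go back out through the same error handler rather than a strict encoder.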
-
Wrong scenario; the one I care about right now is _generating_ invalid UTF-8 without hassle, in order to test another piece of code.
-
OK, that makes your original tweet make more sense, but I still think it's a mistake.
-
If the language is strongly typed and a UTF-8/Unicode string is a type, then malformed UTF-8 strings (of that type) simply can't come into existence.
-
Regardless, making byte strings that would be malformed if you attempt to interpret them as UTF-8 is trivial without lang/lib support.
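The typed-string point can be seen in Python, with one caveat: Python's `str` is looser than the strongly typed string described above (it can hold lone surrogates), but its strict UTF-8 encoder refuses to emit them. `bytes`, by contrast, carries no validity invariant, so malformed input is just a literal. `is_valid_utf8` is a hypothetical helper name, not a library function.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# bytes has no validity invariant, so a malformed sequence is just a literal:
# c1 a1 is an overlong two-byte encoding of "a".
print(is_valid_utf8(b"\xc1\xa1"))    # False
print(is_valid_utf8("a".encode()))   # True

# In the other direction, the strict encoder will not produce malformed
# UTF-8 from a str, even one holding a lone surrogate.
try:
    "\ud83d".encode("utf-8")
except UnicodeEncodeError:
    print("lone surrogate rejected")
```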
-
c1 a1 f8 80 9f 92 a9 ed a0 bd ed b2 a9 f0 8d a0 bd f0 8d b2 a9 f4 9f bf bf fc 80 84 9f bf bf
-
c1, f8, and fc never appear in valid UTF-8; the rest are more subtle.
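To see what the subtler groups are, the bytes can be decoded leniently using the original UTF-8 bit layout from RFC 2279 (up to six bytes per sequence, later restricted by RFC 3629 to four bytes and U+10FFFF) and each result checked for the three classic defects: overlong form, encoded surrogate, and code point above U+10FFFF. This is a minimal sketch for inspection only; `LEADS`, `MIN_FOR_LEN`, `lenient_decode`, and `classify` are hypothetical names.

```python
# Lead-byte patterns of the original UTF-8 design (RFC 2279), which allowed
# sequences of up to six bytes: (mask, expected value, sequence length).
LEADS = [(0x80, 0x00, 1), (0xE0, 0xC0, 2), (0xF0, 0xE0, 3),
         (0xF8, 0xF0, 4), (0xFC, 0xF8, 5), (0xFE, 0xFC, 6)]

# Smallest code point that genuinely needs n bytes; anything lower is overlong.
MIN_FOR_LEN = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000, 5: 0x200000, 6: 0x4000000}


def lenient_decode(data: bytes):
    """Yield (code_point, length) pairs, accepting overlong, surrogate and
    out-of-range forms instead of rejecting them."""
    i = 0
    while i < len(data):
        lead = data[i]
        for mask, value, n in LEADS:
            if lead & mask == value:
                break
        else:
            raise ValueError(f"byte {lead:#04x} at offset {i} cannot start a sequence")
        if i + n > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        # A lead byte for an n-byte sequence (n > 1) carries 7 - n payload bits.
        cp = lead if n == 1 else lead & (0x7F >> n)
        for b in data[i + 1 : i + n]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte {b:#04x} after offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        yield cp, n
        i += n


def classify(cp: int, n: int) -> list:
    """List the reasons a decoded (code point, length) pair is not valid UTF-8."""
    problems = []
    if cp < MIN_FOR_LEN[n]:
        problems.append(f"overlong ({n} bytes)")
    if 0xD800 <= cp <= 0xDFFF:
        problems.append("surrogate")
    if cp > 0x10FFFF:
        problems.append("above U+10FFFF")
    return problems


SAMPLE = ("c1 a1 f8 80 9f 92 a9 ed a0 bd ed b2 a9 f0 8d a0 bd f0 8d b2 a9 "
          "f4 9f bf bf fc 80 84 9f bf bf")

for cp, n in lenient_decode(bytes.fromhex(SAMPLE)):
    print(f"U+{cp:04X}: {', '.join(classify(cp, n)) or 'valid'}")
# U+0061: overlong (2 bytes)
# U+1F4A9: overlong (5 bytes)
# U+D83D: surrogate
# U+DCA9: surrogate
# U+D83D: overlong (4 bytes), surrogate
# U+DCA9: overlong (4 bytes), surrogate
# U+11FFFF: above U+10FFFF
# U+11FFFF: overlong (6 bytes), above U+10FFFF
```

Apart from the overlong "a", the next five groups all carry U+1F4A9: once as a 5-byte overlong form, and as the halves of its UTF-16 surrogate pair D83D DCA9, first in the 3-byte form CESU-8 uses and then in overlong 4-byte form.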
-
echo -n "c1 b6 c1 a9 c1 ad c0 a0 c1 a3 c1 a1 c1 ae c0 a0 c1 af c1 b6 c1 a5 c1 b2 c1 a2 c1 b9 c1 b4 fc 80 80 80 81 a5" | xxd -r -p | vim -
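Read leniently, the bytes in this command spell "vim can overbyte": every character is a 2-byte overlong form (c0/c1 lead bytes) and the final "e" is a 6-byte one. Generating input like this is easy to script. A minimal Python sketch follows, where `encode_fixed_width` is a hypothetical test helper (not a library function) that forces a code point into exactly n bytes using the old six-byte bit layout.

```python
def encode_fixed_width(cp: int, n: int) -> bytes:
    """Encode code point `cp` in exactly `n` bytes (1-6) using the old
    six-byte UTF-8 bit layout, producing an overlong form when n is larger
    than necessary. Hypothetical test helper; the output is NOT valid UTF-8
    unless n is the minimal length."""
    if not 1 <= n <= 6:
        raise ValueError("n must be between 1 and 6")
    if n == 1:
        if cp > 0x7F:
            raise ValueError(f"U+{cp:04X} needs more than one byte")
        return bytes([cp])
    if cp >= 1 << (6 * (n - 1) + (7 - n)):
        raise ValueError(f"U+{cp:04X} does not fit in {n} bytes")
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
        cp >>= 6
    lead = ((0xFF << (8 - n)) & 0xFF) | cp  # n leading 1 bits, then a 0
    return bytes([lead] + tail[::-1])


message = "vim can overbyte"
# Two-byte overlong forms for every character, a six-byte one for the last.
data = b"".join(encode_fixed_width(ord(ch), 2) for ch in message[:-1])
data += encode_fixed_width(ord(message[-1]), 6)
print(data.hex(" "))
```

The printed hex should match the string in the `echo` command; writing `data` with `sys.stdout.buffer.write(data)` and piping it into `vim -` would then replace the `echo | xxd` step.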