I liked UTF-8 more before I started to think hard about efficient RISC-V BitManip code for encoding/decoding it.
How does it handle edge cases?
- overbyte: c0 80 / e0 80 80 / f0 80 80 80 / f8 80 80 80 80 / fc 80 80 80 80 80
- cesu-8: ed a0 bd ed b1 86
- >10FFFF: f4 90 80 80
- 5 byte nonsense: f9 84 80 80 80
- split surrogates: ed a0 80 ed bf bf
- overbyte split surrogates: f0 8d a0 80 f0 8d bf bf
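For reference, a minimal sketch of a strict validator that should reject every sequence in that list, assuming the RFC 3629 rules (no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF, no 5/6-byte lead bytes). The names `is_valid_utf8` and `EDGE_CASES` are hypothetical, and this is a plain byte-at-a-time scalar check, not the RISC-V BitManip approach the thread is about.

```python
# Hypothetical strict UTF-8 validator following RFC 3629, for illustration only.

def is_valid_utf8(data: bytes) -> bool:
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:  # ASCII: single byte
            i += 1
            continue
        # The lead byte fixes the sequence length and the smallest code point
        # that length may encode (anything below it is an overlong form).
        if 0xC2 <= b0 <= 0xDF:
            length, cp, min_cp = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            length, cp, min_cp = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:
            length, cp, min_cp = 4, b0 & 0x07, 0x10000
        else:
            # 80-BF: stray continuation byte; C0/C1: always overlong;
            # F5-FF: above U+10FFFF or old 5/6-byte forms (f8, f9, fc, ...).
            return False
        if i + length > n:
            return False  # truncated sequence
        for k in range(1, length):
            c = data[i + k]
            if (c & 0xC0) != 0x80:
                return False  # continuation bytes must look like 10xxxxxx
            cp = (cp << 6) | (c & 0x3F)
        if cp < min_cp:
            return False  # overlong ("overbyte"), including overlong surrogates
        if 0xD800 <= cp <= 0xDFFF:
            return False  # surrogate code point (catches CESU-8 and split pairs)
        if cp > 0x10FFFF:
            return False  # beyond the Unicode code space
        i += length
    return True


# The malformed sequences from the thread, one per entry.
EDGE_CASES = [
    ("overbyte 2", "c0 80"),
    ("overbyte 3", "e0 80 80"),
    ("overbyte 4", "f0 80 80 80"),
    ("overbyte 5", "f8 80 80 80 80"),
    ("overbyte 6", "fc 80 80 80 80 80"),
    ("cesu-8", "ed a0 bd ed b1 86"),
    (">10FFFF", "f4 90 80 80"),
    ("5 byte nonsense", "f9 84 80 80 80"),
    ("split surrogates", "ed a0 80 ed bf bf"),
    ("overbyte split surrogates", "f0 8d a0 80 f0 8d bf bf"),
]

if __name__ == "__main__":
    for label, hex_bytes in EDGE_CASES:
        data = bytes.fromhex(hex_bytes)
        print(f"{label:28} {hex_bytes:26} valid={is_valid_utf8(data)}")
```

Every entry should print `valid=False`. Note the ordering in the last case: `f0 8d a0 80` decodes to U+D800, but it is already rejected as overlong before the surrogate check runs. Python's own `bytes.decode("utf-8")` is also strict and raises `UnicodeDecodeError` on all of these, which makes it a convenient cross-check.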
You should add these to the big list of naughty strings: https://github.com/minimaxir/big-list-of-naughty-strings …