What scripts break our Latin-1 assumptions in Unicode? I usually check against (Arabic|Hebrew), (some Indic script), Korean, Han, emoji.
Replying to @ManishEarth
(rtl + beginning/end ligature forms), separate combiners, inherent combiners, omg glyphs, wtf emoji, respectively
Replying to @ManishEarth
Whenever thinking about text I usually mentally check against each one of these. Useful. I'm probably missing some.
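A rough Python sketch of that checklist, for anyone who wants to poke at it. The sample strings and the `describe` helper are illustrative picks, not something from the thread, and the counts only show where "one character" stops meaning one code point, one UTF-16 unit, or one byte.

```python
import unicodedata

# Hypothetical sample strings, one per script family in the checklist.
samples = {
    "Arabic": "\u0644\u0627",        # lam + alef, usually drawn as one ligature
    "Hebrew": "\u05E9\u05DC\u05D5\u05DD",  # shalom, right-to-left
    "Devanagari": "\u0915\u093F",    # ka + vowel sign i (a separate combining mark)
    "Hangul": "\uD55C",              # precomposed syllable, decomposes into jamo
    "Han": "\U00020000",             # CJK Extension B, outside the BMP
    "Emoji": "\U0001F468\u200D\U0001F469\u200D\U0001F467",  # ZWJ family sequence
}

def describe(text):
    """Count the same string in several units that code often conflates."""
    return {
        "code_points": len(text),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
        "nfd_code_points": len(unicodedata.normalize("NFD", text)),
        "combining_marks": sum(
            unicodedata.category(ch).startswith("M") for ch in text
        ),
        "bidi_classes": [unicodedata.bidirectional(ch) for ch in text],
    }

for name, text in samples.items():
    print(name, describe(text))
```

None of these numbers is a grapheme cluster count; the standard library does not segment graphemes, so that part would need a separate library.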
Replying to @ManishEarth @sebmarkbage
there's also Japanese, for non-roundtripping through Unicode (cp932) and "\" / "¥" ambiguity
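A hedged way to see the round-tripping half of this from Python: scan the double-byte range of the `cp932` codec and keep every byte pair that decodes fine but re-encodes to different bytes. The function name and the lead/trail byte ranges are assumptions made for this sketch, and the result reflects whatever mapping tables the local Python build ships. It does not touch the "\" / "¥" question.

```python
def cp932_non_roundtrip_pairs():
    """Return (raw_bytes, decoded_text, re_encoded_bytes) triples for
    double-byte sequences that do not survive bytes -> str -> bytes."""
    # Assumed Shift_JIS-style lead and trail byte ranges for the scan.
    lead_bytes = list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD))
    trail_bytes = [b for b in range(0x40, 0xFD) if b != 0x7F]
    mismatches = []
    for lead in lead_bytes:
        for trail in trail_bytes:
            raw = bytes([lead, trail])
            try:
                text = raw.decode("cp932")
            except UnicodeDecodeError:
                continue  # not assigned in this codec
            try:
                back = text.encode("cp932")
            except UnicodeEncodeError:
                continue  # decodable but not encodable; skipped here
            if back != raw:
                mismatches.append((raw, text, back))
    return mismatches

pairs = cp932_non_roundtrip_pairs()
print(len(pairs), "double-byte sequences change when round-tripped")
for raw, text, back in pairs[:5]:
    print(raw.hex(), "->", repr(text), "->", back.hex())
```

Each hit is a character with more than one byte encoding, so bytes that pass through Unicode can come back different even though the text looks identical.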
have you seen https://github.com/jagracey/Awesome-Unicode … it's one of the best references I know of.
Replying to @wycats @sebmarkbage
Ah, Turkish I's (dotted İ / dotless ı) and ß for casing are a case I forgot.
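For reference, a small Python sketch of both cases. Python's `str` methods use the locale-independent Unicode case mappings, so they get Turkish wrong by design; `turkish_lower` is a hypothetical helper written for this example, not a library function.

```python
DOTTED_CAPITAL_I = "\u0130"   # İ
DOTLESS_SMALL_I = "\u0131"    # ı
SHARP_S = "\u00DF"            # ß

# Default (non-Turkish) mappings:
print(repr("I".lower()))               # plain ASCII i, wrong for Turkish text
print(repr(DOTTED_CAPITAL_I.lower()))  # i followed by U+0307 combining dot above
print(repr(DOTLESS_SMALL_I.upper()))   # plain ASCII I

# ß uppercases to two letters, so upper() then lower() does not round-trip.
print(repr(SHARP_S.upper()))
print(repr(SHARP_S.upper().lower()))

def turkish_lower(text):
    """Hypothetical helper: lowercase with Turkish rules for I and İ.

    Handles only these two letters before falling back to str.lower().
    """
    text = text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I)
    return text.lower()

print(turkish_lower("ISTANBUL"))
print(turkish_lower(DOTTED_CAPITAL_I + "STANBUL"))
```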
casefolding-to-ASCII is a good one to remember.
JS had a bug where [\w\W] in regexes had missing chars due to casefolding-to-ASCII mistakes.
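This is not a reproduction of the JS bug, just a Python sketch of the kind of character that causes it: non-ASCII code points whose case mapping lands in ASCII. The function name is made up for this example, and it uses Python's `str.lower` / `str.upper` mappings rather than the canonicalization rules a regex engine applies.

```python
import sys

def non_ascii_mapping_into_ascii(transform):
    """Map each non-ASCII code point to its transformed form when that
    form is pure ASCII and differs from the original character."""
    hits = {}
    for cp in range(0x80, sys.maxunicode + 1):
        ch = chr(cp)
        mapped = transform(ch)
        if mapped != ch and mapped.isascii():
            hits[cp] = mapped
    return hits

lower_hits = non_ascii_mapping_into_ascii(str.lower)
upper_hits = non_ascii_mapping_into_ascii(str.upper)

for label, hits in (("lower", lower_hits), ("upper", upper_hits)):
    for cp, mapped in sorted(hits.items()):
        print(f"{label}: U+{cp:04X} -> {mapped!r}")
```

Among the hits are U+212A KELVIN SIGN (lowercases to "k") and U+017F LATIN SMALL LETTER LONG S (uppercases to "S"), which is why an ASCII-only idea of "word character" can disagree with a case-insensitive Unicode match.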