To make room for more expression, we will now count all emojis as equal, including those with gender and skin tone modifiers. This is now reflected in Twitter-Text, our Open Source library.
Using Twitter-Text? See the forum post for detail: https://twittercommunity.com/t/new-update-to-the-twitter-text-library-emoji-character-count/114607 …
Replying to @TwitterAPI
Will this work with other multi-codepoint emoji as well?
keycaps
flags
gender ZWJ
family ZWJ
generic ZWJ
Will you be counting potential extended grapheme clusters, or just Unicode RGI?
6 replies 3 retweets 18 likes -
Replying to @FakeUnicode
According to this commit from their open source release, they'll be counting "all emoji as a single code point [...], including longer grapheme clusters combined by zero-width joiners" https://github.com/twitter/twitter-text/commit/9537bdf15f935b40fed0ea3ea44cee860fbb8880 …
1 reply 2 retweets 6 likes -
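To make the question above concrete, here is a minimal Python sketch of how many code points and UTF-16 code units some of the listed emoji kinds occupy. The sample names and helper functions are illustrative assumptions; this is not Twitter-Text's counting code.

```python
# Samples are written with explicit escapes so every code point is visible.
# The dictionary and both helpers are illustrative, not part of Twitter-Text.
SAMPLES = {
    "keycap": "\u0031\uFE0F\u20E3",               # digit 1 + VS16 + enclosing keycap
    "flag": "\U0001F1EF\U0001F1F5",               # regional indicators J + P
    "skin tone": "\U0001F469\U0001F3FD",          # woman + medium skin tone modifier
    "family ZWJ": "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466",
}


def code_points(text: str) -> int:
    """Number of Unicode code points (what len() reports in Python 3)."""
    return len(text)


def utf16_units(text: str) -> int:
    """Number of UTF-16 code units, as seen by Java or JavaScript strings."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    for name, text in SAMPLES.items():
        print(f"{name}: {code_points(text)} code points, {utf16_units(text)} UTF-16 units")
```

Each sample renders as one emoji where the sequence is supported, yet the raw lengths differ widely, which is why a counter needs an explicit notion of what one emoji is.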
Replying to @__cartr__
Sounds good on paper, but implementing that can be complex, since emoji are non-deterministic. Like, even country flags. Are they just counting any two RIS pairs, or using a list of ISO codes or Unicode RGI? These can change depending on world politics.
2 replies 2 retweets 8 likes -
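The two flag strategies in the question above can be sketched in Python as follows. `KNOWN_REGIONS` is a tiny hypothetical allow-list for illustration only; it is not Unicode's RGI list and not what Twitter-Text ships.

```python
RI_FIRST = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
RI_LAST = 0x1F1FF   # REGIONAL INDICATOR SYMBOL LETTER Z

# Hypothetical allow-list for illustration; a real one would come from
# Unicode's emoji data files and would change between releases.
KNOWN_REGIONS = {"FR", "JP", "US"}


def is_regional_indicator(ch: str) -> bool:
    return RI_FIRST <= ord(ch) <= RI_LAST


def region_code(text: str):
    """Return the two letters spelled by a regional-indicator pair, else None."""
    if len(text) != 2 or not all(is_regional_indicator(ch) for ch in text):
        return None
    return "".join(chr(ord(ch) - RI_FIRST + ord("A")) for ch in text)


def is_any_pair(text: str) -> bool:
    """Strategy 1: accept any two regional indicators."""
    return region_code(text) is not None


def is_listed_flag(text: str) -> bool:
    """Strategy 2: accept only pairs whose code is on the allow-list."""
    return region_code(text) in KNOWN_REGIONS
```

Under the first strategy a pair spelling "XX" counts as a flag even though no such region exists; under the second it does not, but the list has to be updated whenever the set of recognized flags changes.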
Replying to @FakeUnicode @__cartr__
Even skin tones can be complex. Will they simply skip those codepoints whenever they follow any other emoji, or use the current (annual) version of Unicode's RGI data, https://unicode.org/Public/emoji/11.0/emoji-data.txt … (only emoji with the property Emoji_Modifier_Base=yes take skin tones)? Some vendors vary, too. pic.twitter.com/mX89LmFWj8
1 reply 1 retweet 6 likes -
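The two skin tone strategies raised above behave differently on unusual input. The following Python sketch contrasts them; `SAMPLE_BASES` is a small illustrative subset of code points with Emoji_Modifier_Base=yes, and both counters are toy functions that count one unit per remaining code point, not Twitter-Text's algorithm.

```python
# Emoji modifiers (skin tones) occupy U+1F3FB..U+1F3FF.
MODIFIERS = range(0x1F3FB, 0x1F400)

# Illustrative subset of Emoji_Modifier_Base code points:
# waving hand, thumbs up, woman. The full set lives in emoji-data.txt.
SAMPLE_BASES = {0x1F44B, 0x1F44D, 0x1F469}


def is_modifier(ch: str) -> bool:
    return ord(ch) in MODIFIERS


def count_lenient(text: str) -> int:
    """Ignore every modifier that follows any other character."""
    return sum(1 for i, ch in enumerate(text) if not (i > 0 and is_modifier(ch)))


def count_strict(text: str, bases=SAMPLE_BASES) -> int:
    """Merge a modifier only when it directly follows a listed base."""
    count = 0
    prev_is_base = False
    for ch in text:
        if is_modifier(ch) and prev_is_base:
            prev_is_base = False  # only one modifier attaches to a base
            continue
        count += 1
        prev_is_base = ord(ch) in bases
    return count
```

With the lenient counter, a thumbs up followed by any number of modifiers still counts once; with the strict counter, only the first modifier merges and a modifier after a non-base emoji such as a grinning face is counted separately.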
Replying to @FakeUnicode
The actual regex they're using is here: https://github.com/twitter/twitter-text/blob/9537bdf15f935b40fed0ea3ea44cee860fbb8880/java/src/main/java/com/twitter/twittertext/TwitterTextEmojiRegex.java … I'm not super great at reading regex, but it appears they're checking for real country codes, emoji that actually take skin tones, etc.
1 reply 2 retweets 5 likes -
Replying to @__cartr__
Ugh, and since they are dealing with UCS-2 escape sequences, that is basically unreadable.
[maybe cc @mathias: is this vaguely based on your emoji regex, or just the RGI list dumped into UCS-2 ranges?]
1 reply 1 retweet 6 likes -
Can we add hundreds of skin tone modifiers and still count it as one emoji?
1 reply 2 retweets 1 like -
That was the thought, but devs confirmed that it is just Unicode's RGI list embedded in Twitter Text.
2 replies 2 retweets 4 likes
I hope to fix this with https://github.com/tc39/proposal-regexp-unicode-sequence-properties … over time.