If you want to split a JS string into characters, don't use .split(''): it will mangle characters outside the Basic Multilingual Plane, like most emoji. Instead, use spread:
$ node
> const ghosts = '👻👻👻'
undefined
> ghosts.split('')
[ '�', '�', '�', '�', '�', '�' ]
> [...ghosts]
[ '👻', '👻', '👻' ]
Replying to @TeslaNick @Dayhaysoos
Super interesting. By chance do you have a link or tl;dr on why?
It's because emoji are encoded using two code points of UTF8, which is just a transformation format for Unicode. Check out the length of emoji and you'll be surprised: '👻👻👻'.length === 6
There's a nice article on this problem by @mathias: https://mathiasbynens.be/notes/javascript-unicode
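To see where that 6 comes from, here is a small sketch (assuming Node; 👻 is U+1F47B, and the hex values below are its standard surrogate pair):

```javascript
const ghost = '👻'; // U+1F47B, outside the Basic Multilingual Plane

// .length counts UTF-16 code units, not characters:
console.log(ghost.length); // 2

// The two code units are a surrogate pair:
console.log(ghost.charCodeAt(0).toString(16)); // 'd83d' (high surrogate)
console.log(ghost.charCodeAt(1).toString(16)); // 'dc7b' (low surrogate)

// codePointAt() decodes the whole pair back into one code point:
console.log(ghost.codePointAt(0).toString(16)); // '1f47b'
```

Three emoji at two code units each is how '👻👻👻' ends up with a length of 6.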
Replying to @biteofpie @mtliendo
It looks like .split(...) doesn't handle UTF8 well.
Replying to @biteofpie @mtliendo
This has nothing to do with UTF-8. This is because JavaScript strings are internally 16-bit UCS-2, and astral characters (above U+FFFF) require two code units to form a UTF-16 surrogate pair. Safe ways to split:
[...'👻']
Array.from('👻')
'👻'.match(/[^]/ug)
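A runnable sketch of those three approaches side by side (assuming Node; the 👻 string stands in for any text containing astral characters):

```javascript
const ghosts = '👻👻👻';

// All three iterate by code point, so surrogate pairs stay intact:
const bySpread = [...ghosts];
const byArrayFrom = Array.from(ghosts);
const byMatch = ghosts.match(/[^]/gu); // [^] matches anything; /u makes it code-point aware

console.log(bySpread); // [ '👻', '👻', '👻' ]
console.log(byArrayFrom.length); // 3
console.log(byMatch.length); // 3

// split('') works per code unit and slices each pair into lone surrogates:
console.log(ghosts.split('').length); // 6
```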
Replying to @FakeUnicode @mtliendo
I meant UTF-16, of course. The ECMA standard doesn't strictly specify UCS-2; it depends on the implementation. UTF-16 is implied if not specified (see link). Anyway, thanks for the hint on how to convert safely. It's weird that .split(...) has so many quirks.
https://tc39.es/ecma262/#sec-conformance
Re: JavaScript using UTF-16 or UCS-2: it depends on how you prefer to explain things. UTF-16 doesn't support lone surrogates and JavaScript does. https://mathiasbynens.be/notes/javascript-encoding
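One way to see that difference (a sketch assuming Node; '\uD83D' is the high half of 👻's surrogate pair, standing on its own):

```javascript
// A lone high surrogate: not valid UTF-16 text, but a perfectly legal JS string.
const lone = '\uD83D';
console.log(lone.length); // 1

// Code-point iteration passes the lone surrogate through rather than throwing:
console.log([...lone].length); // 1

// ES2024 adds String.prototype.isWellFormed() to detect such strings:
if (lone.isWellFormed) {
  console.log(lone.isWellFormed()); // false
}
```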