Dart made the mistake of supporting compiling to JavaScript and heavily orienting itself around that ecosystem. I don't see why WebAssembly should inherit JavaScript's flaws rather than JavaScript being improved to support proper interoperability with nearly everything else.
I want to emphasize that nobody is forcing Wasm to support only UTF-16. The point is that Wasm, or more precisely Interface Types (IT), which is an external layer, should understand UTF-16 as well as UTF-8, so that IT can be used effectively not only by Rust and Go but also by Java, JS and C#.
JavaScript strings aren't UTF-16. JavaScript is perfectly capable of supporting actual UTF-16 or supporting much more sensible UTF-8 strings. WTF-16 is not UTF-16 and doesn't belong in new languages / environments.
JavaScript strings are arrays of 16-bit integers, not Unicode.
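A minimal TypeScript sketch of what that means in practice: a JS string literal can hold a lone surrogate, which is legal WTF-16 but not well-formed UTF-16. The helper `isWellFormedUtf16` is a hypothetical name written for illustration; newer engines ship a built-in `String.prototype.isWellFormed`, but this version only uses `charCodeAt` so it runs anywhere.

```typescript
// A JS string is a sequence of 16-bit code units; nothing stops an
// unpaired surrogate from appearing in it.
const lone: string = "a\uD83D"; // high surrogate with no low surrogate after it

// Hypothetical helper: true only if every surrogate is part of a valid pair,
// i.e. the string is well-formed UTF-16 rather than arbitrary WTF-16.
function isWellFormedUtf16(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const cu = s.charCodeAt(i);
    if (cu >= 0xd800 && cu <= 0xdbff) {
      // High surrogate: must be followed by a low surrogate.
      const next = s.charCodeAt(i + 1); // NaN when past the end
      if (!(next >= 0xdc00 && next <= 0xdfff)) return false;
      i++; // skip the low half of the pair
    } else if (cu >= 0xdc00 && cu <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return false;
    }
  }
  return true;
}

console.log(lone.length);                     // 2
console.log(isWellFormedUtf16(lone));         // false
console.log(isWellFormedUtf16("a\u{1F600}")); // true (valid surrogate pair)
```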
I agree that JS doesn't have UTF-16 but rather WTF-16, but that isn't a big deal, since all FFI crossings and UTF-16 to UTF-8 conversions within the engines are sanitized: unpaired surrogates are replaced with U+FFFD. The Wasm IT proposal is currently discussing how this sanitization will take place.
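For reference, the web platform's `TextEncoder` already does this kind of replacement when going from a JS string to UTF-8. This TypeScript sketch only shows that existing browser/Node behavior; it is not a description of what the IT proposal will specify, and `toUtf8` is a hypothetical wrapper name.

```typescript
// TextEncoder always emits UTF-8. It first converts the string to Unicode
// scalar values, replacing each unpaired surrogate with U+FFFD.
function toUtf8(s: string): Uint8Array {
  return new TextEncoder().encode(s);
}

const bytes = toUtf8("a\uD83D"); // "a" followed by a lone high surrogate
console.log(Array.from(bytes, (b) => b.toString(16))); // [ '61', 'ef', 'bf', 'bd' ]

// Decoding yields "a\uFFFD": the original surrogate code unit is gone.
const roundTrip = new TextDecoder().decode(bytes);
console.log(roundTrip === "a\uFFFD"); // true
```

The conversion is lossy by design, so a string with a lone surrogate can't survive a round trip through UTF-8.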
Why doesn't JavaScript add proper Unicode strings with a sensible in-memory format that doesn't waste a ton of memory and force conversions? Why should the people doing things better start using UTF-16?
Same question. Why does UTF-8 waste so much memory for Asian languages and emoji?
UTF-8 uses at most 4 bytes for a code point, just like UTF-16. Most emoji aren't in the BMP, so UTF-16 needs a 4-byte surrogate pair for them, the same amount of data as UTF-8.
UTF-8 takes 3 bytes rather than 2 for basic Chinese, Japanese and Korean characters, but the markup around them still takes 1 byte per character instead of 2.
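A small TypeScript sketch to check those numbers. `utf8Bytes` and `utf16Bytes` are hypothetical helper names; UTF-16 size is computed as code units times 2 with no BOM, and UTF-8 size comes from `TextEncoder`.

```typescript
// Byte cost of a string in each encoding (no BOM).
function utf8Bytes(s: string): number {
  return new TextEncoder().encode(s).length;
}
function utf16Bytes(s: string): number {
  return s.length * 2; // .length counts 16-bit code units
}

const samples = [
  ["ASCII letter", "a"],
  ["CJK character", "中"],
  ["emoji (outside the BMP)", "😀"],
  ["HTML tag", "<div>"],
] as const;

for (const [label, text] of samples) {
  console.log(`${label}: UTF-8 ${utf8Bytes(text)} B, UTF-16 ${utf16Bytes(text)} B`);
}
```

The CJK character costs one extra byte in UTF-8, the emoji costs the same in both, and every ASCII character in the markup costs one byte less.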
```shell
$ curl news.baidu.com > data.utf8
$ wc -c data.utf8
76013 data.utf8
$ iconv -f UTF-8 -t UTF-16 data.utf8 > data.utf16
$ wc -c data.utf16
142622 data.utf16
```
The page uses ASCII for its class names, IDs, most of its URLs, etc., so this isn't just a consequence of it being an HTML file.
UTF-8 is still more efficient for a program used in a Mandarin locale: it avoids conversions and stores markup, attributes, URLs and all the other mostly-ASCII data in strings more compactly.
It also stops people from pretending UTF-16 is fixed-width and ending up with broken Unicode handling, as JS does.
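A short TypeScript sketch of the fixed-width mistake: `.length` and `.slice` work on 16-bit code units, so truncating by index can cut an emoji in half. The string and cut point are arbitrary examples.

```typescript
const s = "hi 😀";

// .length counts UTF-16 code units, not characters.
console.log(s.length); // 5
// Spreading a string iterates by code point, so the emoji counts once.
console.log([...s].length); // 4

// Treating UTF-16 as fixed-width: cutting at code unit 4 splits the
// surrogate pair and leaves a lone high surrogate at the end.
const truncated = s.slice(0, 4);
console.log(truncated.charCodeAt(3).toString(16)); // d83d
```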
UCS-2 can't even represent the emoji you mentioned earlier, which are often broken in languages whose UCS-2 strings masquerade as UTF-16.
Why do users so often run into issues like that? Because people pushed this broken legacy cruft far past its best-before date, as you're doing now.
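To make the UCS-2 point concrete, here is a TypeScript sketch of UTF-16 encoding for a single code point. `toUtf16CodeUnits` is a hypothetical helper; the arithmetic is the standard surrogate-pair formula. UCS-2 only has the first branch (one 16-bit unit), so anything above U+FFFF, such as 😀 (U+1F600), simply has no representation in it.

```typescript
// Encode one Unicode scalar value as UTF-16 code units.
function toUtf16CodeUnits(cp: number): number[] {
  if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError(`not a Unicode scalar value: ${cp}`);
  }
  if (cp <= 0xffff) return [cp]; // BMP: one unit, the only case UCS-2 covers
  const v = cp - 0x10000; // 20 bits split across two surrogates
  return [0xd800 + (v >> 10), 0xdc00 + (v & 0x3ff)];
}

const units = toUtf16CodeUnits(0x1f600); // 😀
console.log(units.map((u) => u.toString(16))); // [ 'd83d', 'de00' ]
console.log(String.fromCharCode(...units) === "😀"); // true
```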

