Conversation

JavaScript has a mixed UCS-2 and UTF-16 API. Some methods, such as codePointAt, fromCodePoint, toUpperCase/toLowerCase, for...of over a string, Array.from(str), normalize, and regex test (with the u flag), are Unicode-aware. The rest, including substr and length, work on UCS-2 code units. But it doesn't really make much difference...
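To make the split between code-unit and code-point APIs concrete, here is a minimal sketch. The variable names are made up for illustration; the only assumption is a standard JavaScript engine such as Node.js or a browser.

```javascript
// U+1F600 lies outside the BMP, so it is stored as two 16-bit code units:
// the surrogate pair 0xD83D 0xDE00.
const sample = "a\u{1F600}";

// Code-unit view: length counts 16-bit units, not characters.
const codeUnitLength = sample.length;

// Code-point view: string iteration (used by Array.from) walks whole code points.
const codePointLength = Array.from(sample).length;

// charCodeAt returns a single code unit; codePointAt decodes the surrogate pair.
const firstUnitOfEmoji = sample.charCodeAt(1);
const wholeEmoji = sample.codePointAt(1);

// substr slices by code units, so it can cut a pair in half and
// leave a lone high surrogate behind.
const brokenSlice = sample.substr(1, 1);

console.log(codeUnitLength, codePointLength);
console.log(firstUnitOfEmoji.toString(16), wholeEmoji.toString(16));
console.log(brokenSlice.length, brokenSlice.charCodeAt(0).toString(16));
```

The last line shows why the distinction can matter: the slice is a valid JavaScript string but not valid Unicode text.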
The strings are inherently not UTF-16 because they can contain invalid Unicode. UTF-16 doesn't support any string that isn't also supported by UTF-8. There is no lossy conversion between UTF-8 and UTF-16. They're different encodings of the same thing able to represent it all.
std::string is just a dynamic array of bytes able to manage a terminating NUL for C string compatibility. It doesn't have any specific encoding. Being able to stick UTF-8 in std::string doesn't mean it provides any form of UTF-8 support or that it's a UTF-8 string type.
C strings and std::string are not Unicode string types and while they can hold UTF-8 they are not UTF-8 string types and do not provide any actual UTF-8 support. They're arrays of bytes. C++ std::string is implemented as a library and other libraries can implement string types.
There's no point in making this disingenuous strawman argument. The fact is that JavaScript has legacy, broken strings that haven't been fixed to support what Unicode has been for the past 21 years. JavaScript can be easily fixed rather than propagating brokenness further.
I agree that a new language inspired by JavaScript could have truly Unicode-aware operations for all the remaining UCS-2 methods. But changing the format from UTF-16/WTF-16 to UTF-8 would mean a complete change in the string API. Even Dart couldn't do that.
JavaScript strings are not UTF-16 or Unicode strings. Every UTF-16 string can be represented as UTF-8. If JavaScript had UTF-16 strings, there would never be one you couldn't convert to UTF-8. UTF-8 is what's used for input/output and JavaScript is already forcing conversions.
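The conversion claim can be checked with the standard TextEncoder and TextDecoder APIs, which are globals in browsers and recent Node.js. This is a rough sketch; `roundTripUtf8` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: encode a string to UTF-8 bytes and decode it back.
function roundTripUtf8(str) {
  const bytes = new TextEncoder().encode(str); // always produces UTF-8
  return new TextDecoder("utf-8").decode(bytes);
}

// A well-formed string: the surrogate pair is complete.
const wellFormed = "a\u{1F600}";

// An ill-formed string: a high surrogate with no low surrogate after it.
const loneSurrogate = "a\uD83D";

// Well-formed text survives the round trip unchanged.
console.log(roundTripUtf8(wellFormed) === wellFormed);

// The lone surrogate has no UTF-8 encoding, so the encoder substitutes
// U+FFFD (REPLACEMENT CHARACTER) and the original string is lost.
console.log(roundTripUtf8(loneSurrogate) === loneSurrogate);
console.log(roundTripUtf8(loneSurrogate) === "a\uFFFD");
```

The second string is a legal JavaScript value that cannot be converted to UTF-8 without loss, which would not be possible if it were actual UTF-16.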
I want to emphasize that nobody is forcing Wasm to support only UTF-16. The point is that Wasm, or more precisely Interface Types, which is an external layer, should understand UTF-16 along with UTF-8, so that IT can be used effectively not only by Rust/Go but also by Java, JS, and C#.
JavaScript strings aren't UTF-16. JavaScript is perfectly capable of supporting actual UTF-16 or supporting much more sensible UTF-8 strings. WTF-16 is not UTF-16 and doesn't belong in new languages / environments. JavaScript strings are arrays of 16-bit integers, not Unicode.
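As an illustration of the "arrays of 16-bit integers" point, the sketch below builds a string from arbitrary 16-bit values and reads them back unchanged. `hasLoneSurrogate` is a hypothetical helper written for this example, and the input values are made up.

```javascript
// Arbitrary 16-bit values, including a low surrogate followed by a high
// surrogate (the wrong order), which is not valid UTF-16.
const units = [0x0041, 0xDC00, 0xD800, 0xFFFF];

// JavaScript accepts them without validation.
const raw = String.fromCharCode(...units);

// Reading the code units back returns exactly what was put in.
const readBack = Array.from({ length: raw.length }, (_, i) => raw.charCodeAt(i));

// Hypothetical helper: true if the string contains an unpaired surrogate,
// meaning it is not well-formed UTF-16.
function hasLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // High surrogate: must be followed by a low surrogate.
      const next = str.charCodeAt(i + 1); // NaN past the end, so the test fails
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the valid pair
        continue;
      }
      return true;
    }
    if (unit >= 0xDC00 && unit <= 0xDFFF) {
      return true; // low surrogate with no high surrogate before it
    }
  }
  return false;
}

console.log(readBack.map((u) => u.toString(16)));
console.log(hasLoneSurrogate(raw));
console.log(hasLoneSurrogate("a\u{1F600}"));
```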