Conversation

Dart, C#, Java, and JavaScript all use UTF-16 internally, and they won't be able to change to UTF-8 even in the future: it would require a complete redo of the String API and would lose compatibility. Also, UTF-8 does not solve the security problems, as I said.
This isn't accurate. JavaScript doesn't have UTF-16 strings. If it did, every JavaScript string could be represented in standard UTF-8. JavaScript implements strings as arrays of UCS-2 code units, which is neither standard Unicode nor UTF-16. That's why an extended UTF-8 is required.
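A minimal sketch of that point, assuming a modern JavaScript runtime that provides TextEncoder: a lone surrogate is a perfectly legal JavaScript string value, but it is not a valid Unicode scalar value, so it cannot survive a round trip through standard UTF-8.

```js
// Illustration only: an unpaired high surrogate stored in a JS string.
const lone = "\uD800";

console.log(lone.length);         // 1 (one 16-bit code unit)
console.log(lone.codePointAt(0)); // 55296 (0xD800, a surrogate, not a scalar value)

// TextEncoder emits standard UTF-8 and must replace the unpaired
// surrogate with U+FFFD, so the original string is lost.
const bytes = new TextEncoder().encode(lone);
console.log(bytes); // Uint8Array [ 239, 191, 189 ]  (UTF-8 for U+FFFD)
```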
JavaScript has a mixed UCS-2 and UTF-16 API. Some methods, like codePointAt, fromCodePoint, toUpperCase/toLowerCase, for..of over a string, Array.from(str), normalize, and a regex's test with the u flag, are Unicode aware. The rest, including substr and length, are UCS-2. But it doesn't really make much difference...
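For reference, a rough sketch of that split, assuming any standard JavaScript engine and no external libraries:

```js
const s = "😀"; // U+1F600: one code point, two UTF-16 code units

// Code-unit (UCS-2 style) views:
console.log(s.length);             // 2
console.log(s.substr(0, 1));       // "\uD83D" (a lone high surrogate)

// Code-point (Unicode-aware) views:
console.log(s.codePointAt(0).toString(16)); // "1f600"
console.log([...s].length);        // 1 (for..of / spread iterate code points)
console.log(Array.from(s).length); // 1
console.log(/^.$/u.test(s));       // true with the u flag
console.log(/^.$/.test(s));        // false without it
```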
The strings are inherently not UTF-16 because they can contain invalid Unicode. UTF-16 doesn't support any string that isn't also supported by UTF-8. There is no lossy conversion between UTF-8 and UTF-16. They're different encodings of the same thing, each able to represent all of it.
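To illustrate the lossless-conversion claim (again assuming TextEncoder/TextDecoder are available): any string made of valid Unicode scalar values round-trips between the engine's UTF-16 representation and UTF-8 with nothing lost.

```js
// Sketch: UTF-16 (JS internal form) -> UTF-8 bytes -> back to a JS string.
const text = "héllo 😀, same text either way";
const utf8 = new TextEncoder().encode(text);          // UTF-8 bytes
const back = new TextDecoder("utf-8").decode(utf8);   // decoded back into UTF-16 internally
console.log(back === text); // true: no information was lost
```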
C strings and std::string are not Unicode string types, and while they can hold UTF-8, they are not UTF-8 string types and do not provide any actual UTF-8 support. They're arrays of bytes. C++'s std::string is implemented as a library, and other libraries can implement their own string types.