Dart, C#, Java, and JavaScript internally use UTF-16, and they won't be able to change to UTF-8 even in the future. Doing so would require a complete redo of the String API and lose compatibility. Also, UTF-8 does not solve the security problems, like I said.
That's not accurate. UTF-8, UTF-16 and UTF-32 represent exactly the same set of strings. They're encodings of Unicode. JavaScript is not really using UTF-16; it has a legacy implementation that's effectively an array of 16-bit UCS-2 code units, which permits invalid Unicode strings such as unpaired surrogates.
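
To make the "array of 16-bit units" point concrete, here is a minimal Python sketch. It is only an illustration of the idea, not a model of any JavaScript engine: the variable and helper names are made up for this example, and Python's `surrogatepass` error handler stands in for a string type that stores raw code units without validating them.

```python
# A lone high surrogate followed by "A": half of a surrogate pair with no partner.
lone = "\ud83dA"


def encodes_strictly(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with the default strict handler."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def decodes_strictly(data: bytes, encoding: str) -> bool:
    """Return True if `data` is valid in `encoding` under strict decoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Strict encoders refuse the lone surrogate: it is not a Unicode scalar value.
strict_utf8_ok = encodes_strictly(lone, "utf-8")
strict_utf16_ok = encodes_strictly(lone, "utf-16-le")

# "surrogatepass" writes the raw 16-bit units anyway, which is roughly
# what an unvalidated array of code units would hold.
raw_units = lone.encode("utf-16-le", "surrogatepass")

# A strict UTF-16 decoder rejects those same bytes.
raw_is_valid_utf16 = decodes_strictly(raw_units, "utf-16-le")

# A complete surrogate pair, by contrast, is valid UTF-16 for one code point.
pair = "\ud83d\ude00".encode("utf-16-le", "surrogatepass").decode("utf-16-le")
```

The sequence of code units is representable, but it is not valid UTF-16, and no strict encoder for any of the three encodings will accept it.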
UTF-8 is able to encode every single string that UTF-16 and UTF-32 can encode. Your claims are inaccurate. Rust is not working around any issues with UTF-8; rather, OS path names are usually not guaranteed to be Unicode. NTFS and ext4 paths are allowed to be invalid Unicode.
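
The claim that the three encodings cover the same strings can be checked by brute force. The sketch below is one way to do it in Python, with names chosen just for this example: it builds a string containing every Unicode scalar value (all code points except the surrogate range U+D800 to U+DFFF) and round-trips it through each encoding.

```python
# Every Unicode scalar value: all code points minus the surrogate range.
all_scalars = "".join(
    chr(cp) for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF
)

encodings = ("utf-8", "utf-16-le", "utf-32-le")

# Encode and decode the whole string with each encoding and compare.
round_trips = {
    name: all_scalars.encode(name).decode(name) == all_scalars
    for name in encodings
}

# Transcoding from UTF-16 to UTF-8 gives the same bytes as encoding directly.
via_utf16 = all_scalars.encode("utf-16-le").decode("utf-16-le").encode("utf-8")
transcode_matches = via_utf16 == all_scalars.encode("utf-8")
```

If any encoding could express something the others could not, one of these comparisons would fail.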
What's specifically wrong with the slash character in a path? At least for UTF-8, slash is in the first 128 characters (since slash exists in ASCII as well). Semantically, it carries additional info, but I don't see what's wrong with storing it in a UTF-8 string.
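
A small Python sketch of why the slash is safe in UTF-8 specifically: every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so the byte 0x2F can only ever mean an actual slash. The sample string is hypothetical and was picked because its UTF-16 encoding happens to contain stray 0x2F bytes.

```python
# Hypothetical sample: "d", U+012F, "r", a real slash, then U+2F00.
sample = "d\u012fr/\u2f00"

utf8 = sample.encode("utf-8")
utf16 = sample.encode("utf-16-le")

# Count raw 0x2F bytes in each encoding.
utf8_slash_bytes = utf8.count(b"/")    # only the real slash
utf16_slash_bytes = utf16.count(b"/")  # also halves of U+012F and U+2F00

# Exhaustive check: encode every non-ASCII scalar value as UTF-8 and
# take the smallest byte that ever appears.
non_ascii = "".join(
    chr(cp) for cp in range(0x80, 0x110000) if not 0xD800 <= cp <= 0xDFFF
)
lowest_byte = min(non_ascii.encode("utf-8"))
```

Since `lowest_byte` is never below 0x80, byte-oriented code that splits a UTF-8 path on 0x2F cannot cut a character in half.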
It's not a problem. I was explaining that the restrictions on what can be in a path stop far short of allowing only valid Unicode. UTF-8 works fine as the encoding for *nix paths but there's nothing stopping anyone from using any other byte strings without internal \0 characters.
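
As a rough illustration of a path that is legal on *nix but is not Unicode, here is a Python sketch. The file name and both helper functions are hypothetical, and the legality check is deliberately simplified: it only covers a single path component and ignores length limits and the reserved names "." and "..". It does not touch the filesystem, since some filesystems reject such names.

```python
# Hypothetical file name: byte 0xFF never appears in valid UTF-8.
raw_name = b"report-\xff.txt"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def is_legal_nix_component(data: bytes) -> bool:
    """Simplified rule for one path component: non-empty, no NUL, no slash."""
    return len(data) > 0 and b"\0" not in data and b"/" not in data


# Python can still carry such a name in a str by mapping each undecodable
# byte to a lone surrogate ("surrogateescape"), and it restores the exact
# bytes on the way back out.
smuggled = raw_name.decode("utf-8", "surrogateescape")
restored = smuggled.encode("utf-8", "surrogateescape")
```

This is the same gap that a separate OS string type has to cover: the name passes the path rules, fails UTF-8 validation, and still needs to round-trip byte for byte.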