Seems like premature panic. AssemblyScript can move to UTF-8 like everyone else.
UTF-8 doesn't solve security/validation problems:
onlineutf8tools.com/validate-utf8
websec.github.io/unicode-securi
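Checking that untrusted bytes are well-formed UTF-8 is a separate step from choosing UTF-8 as the encoding. Below is a minimal Python sketch that relies only on the standard library's strict `utf-8` codec. The `is_valid_utf8` helper and the sample byte strings are illustrative and do not come from the linked tools.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8 according to Python's strict codec."""
    try:
        data.decode("utf-8")  # the default "strict" error handler raises on bad input
    except UnicodeDecodeError:
        return False
    return True


samples = {
    "ASCII slash": b"/",
    "two-byte U+00E9": b"\xc3\xa9",
    "overlong slash": b"\xc0\xaf",         # non-shortest encoding of U+002F
    "encoded surrogate": b"\xed\xa0\x80",  # would decode to U+D800, not a scalar value
    "truncated sequence": b"\xe2\x82",     # first two of the three bytes of U+20AC
}

for name, raw in samples.items():
    print(f"{name}: {is_valid_utf8(raw)}")
```

This check covers well-formedness only. A string that passes can still contain look-alike characters or other confusables, and encoding validation alone does not address those.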
It's the standard string encoding in 2021 & good enough. Other encodings should be used for legacy purposes only.
Dart, C#, Java, and JavaScript use UTF-16 internally, and they won't be able to switch to UTF-8 even in the future, because that would require a complete redo of the String API and lose compatibility. Also, as I said, UTF-8 does not solve the security problems.
Yes, these are the "legacy" uses. At least C# and Java have had discussions about moving to UTF-8, possibly with a new Utf8String type.
That's not accurate. UTF-8, UTF-16 and UTF-32 represent exactly the same set of strings; they're all encodings of Unicode. JavaScript isn't really using UTF-16. It has a legacy implementation that is effectively an array of UCS-2 code units, which permits invalid Unicode strings.
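A small Python sketch can show that the three UTFs cover the same set of strings. All helper names here are illustrative. A string that mixes Latin, emoji, and CJK round-trips through all three encodings at different sizes. A lone surrogate, which a JavaScript string can contain, is rejected by every strict UTF encoder, not only by UTF-8.

```python
CODECS = ("utf-8", "utf-16-le", "utf-32-le")  # explicit byte order, so no BOM is emitted


def round_trips(text: str) -> bool:
    """True if `text` survives encode/decode through every codec unchanged."""
    return all(text.encode(c).decode(c) == text for c in CODECS)


def encoded_sizes(text: str) -> dict:
    """Number of bytes each codec needs for `text`."""
    return {c: len(text.encode(c)) for c in CODECS}


def rejecting_codecs(text: str) -> list:
    """Codecs whose strict encoder refuses `text`."""
    rejected = []
    for c in CODECS:
        try:
            text.encode(c)
        except UnicodeEncodeError:
            rejected.append(c)
    return rejected


sample = "na\u00efve \U0001F600 \u65e5\u672c"  # Latin, emoji (outside the BMP), CJK
print(round_trips(sample), encoded_sizes(sample))

lone = "\ud800"  # a lone high surrogate; JavaScript strings can hold this
print(rejecting_codecs(lone))
```

The encodings differ in size (18, 22, and 40 bytes for `sample`), but none of them can represent something the others cannot.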
UTF-8 is able to encode every single string that UTF-16 and UTF-32 can encode. Your claims are inaccurate.
Rust is not working around any issue with UTF-8. The real issue is that OS path names are usually not guaranteed to be valid Unicode: both NTFS and ext4 allow paths that are invalid Unicode.
Most *nix filesystems permit a path to be any NUL-terminated C string, with a special meaning for the slash character. That's why paths can't, in general, be represented as Unicode strings. This is not an issue with UTF-8. You can't represent them as UTF-16 or UTF-32 either. You're very wrong.
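Here is a short Python sketch of a filename that is legal on a typical *nix filesystem but is not valid UTF-8. The byte string is a made-up example. Python's `surrogateescape` error handler (PEP 383), which Python uses for OS paths on POSIX, is one workaround, and Rust's `OsStr`/`OsString` is another. Neither turns the bytes into valid Unicode in any UTF.

```python
raw_name = b"caf\xe9.txt"  # "café.txt" in Latin-1: legal *nix filename bytes, invalid UTF-8


def strict_utf8(name: bytes):
    """Decode as strict UTF-8, or return None if the bytes are not valid UTF-8."""
    try:
        return name.decode("utf-8")
    except UnicodeDecodeError:
        return None


def encodes_strictly(text: str, codec: str) -> bool:
    """True if `text` is encodable by `codec` with the default strict handler."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


print(strict_utf8(raw_name))  # None: no valid Unicode string has these UTF-8 bytes

# "surrogateescape" maps each undecodable byte to a lone surrogate
# (0xE9 -> U+DCE9), so the original bytes can be recovered exactly...
escaped = raw_name.decode("utf-8", "surrogateescape")
print(escaped.encode("utf-8", "surrogateescape") == raw_name)

# ...but the result is not valid Unicode, so strict UTF-16 and UTF-32 refuse it too.
print(encodes_strictly(escaped, "utf-16-le"), encodes_strictly(escaped, "utf-32-le"))
```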
What's specifically wrong with the slash character in a path? At least in UTF-8, slash is among the first 128 code points, which UTF-8 encodes as the same single bytes as ASCII.
Semantically, it carries additional information, but I don't see what's wrong with storing it in a UTF-8 string.
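The point about slash being ASCII can be checked directly. In UTF-8, the lead byte and continuation bytes of every multi-byte sequence are all >= 0x80, so the byte 0x2F never appears inside another character's encoding. A minimal Python sketch follows, with an illustrative path and a hypothetical helper name.

```python
from typing import List


def split_components(path: bytes) -> List[bytes]:
    """Split an encoded path on the '/' byte (0x2F)."""
    return path.split(b"/")


path = "/home/\u00e9l\u00e8ve/\u65e5\u8a18.txt"  # "/home/élève/日記.txt"
utf8 = path.encode("utf-8")

# Every byte of a multi-byte UTF-8 sequence is >= 0x80, so 0x2F can only mean '/'
# and byte-level splitting agrees with splitting the decoded string.
assert all(b >= 0x80 for ch in path if ord(ch) > 0x7F for b in ch.encode("utf-8"))
assert [p.decode("utf-8") for p in split_components(utf8)] == path.split("/")

# UTF-16 has no such guarantee: U+2F00 encodes (little-endian) as the bytes 00 2F,
# which contain both a NUL byte and a '/' byte although the character is neither.
print("\u2f00".encode("utf-16-le"))
```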
It's not a problem. I was explaining that the restrictions on what can be in a path stop far short of allowing only valid Unicode. UTF-8 works fine as the encoding for *nix paths, but nothing stops anyone from using any other byte string without internal \0 characters.
I only mentioned slash because its special meaning implies that a filename can contain neither NUL nor slash. A path as a whole just can't contain NUL.
Unicode permits NUL inside strings, so not every Unicode string can be losslessly converted to a path either.
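The filename/path distinction and the NUL point fit into two small checks. The Python sketch below uses hypothetical helper names and deliberately ignores other constraints, such as `NAME_MAX`/`PATH_MAX` limits and the special entries `.` and `..`.

```python
import os


def is_valid_filename(name: bytes) -> bool:
    """One *nix path component: any non-empty byte string without NUL or '/'."""
    return len(name) > 0 and b"\0" not in name and b"/" not in name


def is_valid_path(path: bytes) -> bool:
    """A whole *nix path: any non-empty byte string without NUL ('/' is allowed)."""
    return len(path) > 0 and b"\0" not in path


print(is_valid_filename(b"caf\xe9.txt"))  # True: not UTF-8, but still a legal name
print(is_valid_filename(b"dir/file"))     # False: contains '/'
print(is_valid_path(b"dir/file"))         # True

# U+0000 is a valid Unicode character, so this is a valid string...
text = "report\u0000final.txt"
print(is_valid_path(text.encode("utf-8")))  # False: it encodes to a 0x00 byte

# ...but it cannot name a file: Python raises ValueError before any system call.
try:
    os.stat(text)
except ValueError as exc:
    print("os.stat raised ValueError:", exc)
```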