
Replying to
What's specifically wrong with the slash character in a path? At least in UTF-8, slash is in the first 128 characters (since slash exists in ASCII as well). Semantically it carries additional info, but I don't see what's wrong with storing it in a UTF-8 string.
Replying to
It's not a problem. I was explaining that the restrictions on what can be in a path stop far short of requiring valid Unicode. UTF-8 works fine as the encoding for *nix paths, but there's nothing stopping anyone from using any other byte string, as long as it has no internal \0 (NUL) bytes.
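To make the distinction concrete, here is a minimal TypeScript sketch, assuming a path is handled as raw bytes in a `Uint8Array`. The helper names `isValidUnixPath` and `isValidUtf8` are hypothetical. The only hard rule described above is "no NUL byte"; whether those bytes also form valid UTF-8 is a separate, stricter question.

```typescript
// Hypothetical helpers: a *nix path is modeled as raw bytes, not text.

// A path may be any byte string without a NUL (0x00) byte,
// since NUL terminates the C string handed to the kernel.
function isValidUnixPath(bytes: Uint8Array): boolean {
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x00) return false;
  }
  return true;
}

// Strict UTF-8 validation: rejects stray continuation bytes, truncated
// sequences, overlong forms, surrogates (U+D800..U+DFFF) and values > U+10FFFF.
function isValidUtf8(bytes: Uint8Array): boolean {
  let i = 0;
  while (i < bytes.length) {
    const lead = bytes[i];
    if (lead < 0x80) {
      i += 1; // ASCII, including '/' (0x2F)
      continue;
    }
    let extra: number;
    let cp: number;
    let min: number;
    if (lead >= 0xc2 && lead <= 0xdf) {
      extra = 1; cp = lead & 0x1f; min = 0x80;
    } else if (lead >= 0xe0 && lead <= 0xef) {
      extra = 2; cp = lead & 0x0f; min = 0x800;
    } else if (lead >= 0xf0 && lead <= 0xf4) {
      extra = 3; cp = lead & 0x07; min = 0x10000;
    } else {
      return false; // 0x80..0xC1 and 0xF5..0xFF never start a sequence
    }
    if (i + extra >= bytes.length) return false; // truncated sequence
    for (let j = 1; j <= extra; j++) {
      const b = bytes[i + j];
      if ((b & 0xc0) !== 0x80) return false; // expected a continuation byte
      cp = (cp << 6) | (b & 0x3f);
    }
    if (cp < min || cp > 0x10ffff) return false; // overlong or out of range
    if (cp >= 0xd800 && cp <= 0xdfff) return false; // UTF-16 surrogate
    i += extra + 1;
  }
  return true;
}

// "/tmp/" followed by byte 0xFF (e.g. a Latin-1 "ÿ"): a usable path,
// but not valid UTF-8.
const latin1Name = new Uint8Array([0x2f, 0x74, 0x6d, 0x70, 0x2f, 0xff]);
// isValidUnixPath(latin1Name) === true, isValidUtf8(latin1Name) === false
```

So a tool that insists on decoding every path as UTF-8 will fail on names like this one even though the filesystem accepts them.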
Replying to and
I just mentioned slash because its special meaning means a filename can't contain either NUL or slash, unlike a path as a whole, which just can't contain NUL. Unicode permits NUL inside strings, so not every Unicode string can be converted losslessly to a path either.
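A minimal TypeScript sketch of both rules, assuming paths are encoded as UTF-8 bytes. The names `isValidFilename` and `stringToPathBytes` are hypothetical. A filename component rejects NUL and 0x2F; converting a JS string to path bytes fails on U+0000, and also on an unpaired surrogate, which has no UTF-8 encoding at all.

```typescript
// Hypothetical helper: a single filename component may contain
// neither NUL (0x00) nor '/' (0x2F).
function isValidFilename(bytes: Uint8Array): boolean {
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x00 || bytes[i] === 0x2f) return false;
  }
  return true;
}

// Hypothetical helper: encodes a JS string as UTF-8 path bytes, or returns
// null when no lossless path exists (U+0000 or an unpaired surrogate).
function stringToPathBytes(s: string): Uint8Array | null {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    let cp = s.charCodeAt(i);
    if (cp === 0) return null; // legal in a string, impossible in a path
    if (cp >= 0xd800 && cp <= 0xdbff) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) return null; // unpaired high surrogate
      cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
      i++; // the low surrogate was consumed too
    } else if (cp >= 0xdc00 && cp <= 0xdfff) {
      return null; // unpaired low surrogate
    }
    if (cp < 0x80) {
      out.push(cp);
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      out.push(0xf0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3f),
               0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    }
  }
  return new Uint8Array(out);
}
```

Every byte of a multi-byte UTF-8 sequence is 0x80 or above, so the 0x00 and 0x2F checks never misfire on non-ASCII text. That is why slash itself causes no trouble for UTF-8 paths.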
This Tweet was deleted by the Tweet author.
Replying to and
This should ideally be kept from becoming a problem elsewhere, and it should be solved for JavaScript. They could add a document- or program-wide mode where valid Unicode strings are enforced, so people can opt out of the problem. New features could require that mode, like they often do with TLS.
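No such mode exists today, so the following is only a hypothetical TypeScript sketch of the check it would apply. JS strings are sequences of 16-bit code units, and a string is well-formed UTF-16 only if every surrogate is part of a high+low pair. ES2024's `String.prototype.isWellFormed()` performs this same test natively; the hand-written `isWellFormedUtf16` and the enforcement hook `checkedString` are illustrative names, not a real API.

```typescript
// Returns false when any surrogate code unit is not part of a
// high (D800..DBFF) + low (DC00..DFFF) pair.
function isWellFormedUtf16(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const u = s.charCodeAt(i);
    if (u >= 0xd800 && u <= 0xdbff) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) return false; // unpaired high surrogate
      i++; // skip the paired low surrogate
    } else if (u >= 0xdc00 && u <= 0xdfff) {
      return false; // low surrogate with no preceding high surrogate
    }
  }
  return true;
}

// Hypothetical enforcement hook: under the proposed opt-in mode, producing
// an ill-formed string would raise an error instead of silently succeeding.
function checkedString(s: string): string {
  if (!isWellFormedUtf16(s)) {
    throw new TypeError("ill-formed UTF-16 string (unpaired surrogate)");
  }
  return s;
}
```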
Replying to and
I don't really think they need new APIs or an API redesign to fix JavaScript's Unicode issues. They need a way to opt out of legacy strings. In practice, not much would break, and the breakage would be a nice way of uncovering a lot of latent bugs, some of which are potentially already serious.
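One common example of such a latent bug, shown as a TypeScript sketch with hypothetical helper names: truncating text by `.length` or `.slice()` counts UTF-16 code units, so it can cut an emoji's surrogate pair in half and leave an unpaired surrogate behind. An enforcing mode would turn that silent corruption into a visible error; the code-point-aware version avoids it.

```typescript
// Buggy: counts UTF-16 code units, so it may split a surrogate pair.
function truncateUnits(s: string, max: number): string {
  return s.slice(0, max);
}

// Code-point-aware: treats a high+low surrogate pair as one unit,
// so the result is never ill-formed (grapheme clusters such as
// flag emoji can still be split, which is a separate issue).
function truncateCodePoints(s: string, max: number): string {
  let i = 0;
  let count = 0;
  while (i < s.length && count < max) {
    const u = s.charCodeAt(i);
    const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
    const isPair = u >= 0xd800 && u <= 0xdbff && next >= 0xdc00 && next <= 0xdfff;
    i += isPair ? 2 : 1;
    count++;
  }
  return s.slice(0, i);
}
```

For `"hi\ud83d\ude00"` ("hi😀"), `truncateUnits(s, 3)` returns `"hi\ud83d"`, an unpaired high surrogate, while `truncateCodePoints(s, 3)` returns the whole string.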
Replying to and
It would then be using UTF-16, which is still unfortunate due to wasted memory, engine complexity from optimizations meant to avoid wasting as much memory, conversion overhead, etc. A separate feature could add a nice new string type using UTF-8 that requires the Unicode mode.
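A rough worked comparison of the memory point, as a TypeScript sketch with hypothetical helper names: UTF-16 always costs two bytes per code unit, while UTF-8 uses one to four bytes per code point. Engines such as V8 already store Latin-1-only strings with one byte per character, which is likely the kind of optimization meant above; this sketch ignores such engine tricks and counts only the raw encodings.

```typescript
// Bytes needed for the UTF-16 form: two per code unit.
function utf16ByteLength(s: string): number {
  return s.length * 2;
}

// Bytes needed for the UTF-8 form. An unpaired surrogate is counted as
// 3 bytes, i.e. as U+FFFD, which is what TextEncoder substitutes for it.
function utf8ByteLength(s: string): number {
  let n = 0;
  for (let i = 0; i < s.length; i++) {
    const u = s.charCodeAt(i);
    if (u < 0x80) {
      n += 1;
    } else if (u < 0x800) {
      n += 2;
    } else if (u >= 0xd800 && u <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        n += 4; // supplementary code point
        i++;
      } else {
        n += 3; // unpaired high surrogate -> U+FFFD
      }
    } else {
      n += 3; // other BMP character, or unpaired surrogate -> U+FFFD
    }
  }
  return n;
}
```

ASCII text such as `"hello"` takes 10 bytes in UTF-16 but 5 in UTF-8. The saving is not universal: CJK text such as `"日本"` takes 4 bytes in UTF-16 and 6 in UTF-8.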
Replying to and
It really wouldn't be that hard to just turn JavaScript strings into valid UTF-16 where you opt in, either with an equivalent to "use strict" (applied globally or not at all) or via document metadata like a header. It'd be nice to give it a modern immutable UTF-8 string type, but that's not needed.
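Purely as an illustration of the optional type mentioned above, here is a hypothetical TypeScript sketch of an immutable UTF-8 string. `Utf8String`, `from`, `byteLength` and `toBytes` are invented names, not a proposed or existing API. It can only be built from well-formed text, stores UTF-8 bytes, and hands out copies so callers cannot mutate its buffer (the `private` modifier is enforced by TypeScript at compile time only).

```typescript
class Utf8String {
  private readonly bytes: Uint8Array;

  private constructor(bytes: Uint8Array) {
    this.bytes = bytes;
  }

  // Encodes to UTF-8, rejecting unpaired surrogates instead of repairing them.
  static from(s: string): Utf8String {
    const out: number[] = [];
    for (let i = 0; i < s.length; i++) {
      let cp = s.charCodeAt(i);
      if (cp >= 0xd800 && cp <= 0xdfff) {
        const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
        // Must be a high surrogate immediately followed by a low surrogate.
        if (cp > 0xdbff || next < 0xdc00 || next > 0xdfff) {
          throw new TypeError("unpaired surrogate at index " + i);
        }
        cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
        i++;
      }
      if (cp < 0x80) {
        out.push(cp);
      } else if (cp < 0x800) {
        out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
      } else if (cp < 0x10000) {
        out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
      } else {
        out.push(0xf0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3f),
                 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
      }
    }
    return new Utf8String(new Uint8Array(out));
  }

  // Length in UTF-8 bytes, not UTF-16 code units.
  get byteLength(): number {
    return this.bytes.length;
  }

  // Returns a fresh copy so the internal buffer is never exposed.
  toBytes(): Uint8Array {
    return new Uint8Array(this.bytes);
  }
}
```

Rejecting ill-formed input at construction, rather than substituting U+FFFD, matches the thread's preference for surfacing bugs instead of hiding them.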
This Tweet was deleted by the Tweet author.
This Tweet was deleted by the Tweet author.