I'm considering requiring file names to be UTF-8 on Sortix. Traditionally Unix allows any byte sequence (except '\0' and '/') in an unspecified encoding. This just creates problems. File names you can't type. Programming languages with UTF-16 strings can't access them.
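To make the proposed rule concrete, here is a minimal sketch of the check a syscall layer could apply to a single path component. The function name `is_valid_file_name` is hypothetical and not taken from Sortix; it only assumes the rule stated above (no '\0', no '/', well-formed UTF-8).

```python
def is_valid_file_name(name: bytes) -> bool:
    """Hypothetical check for one path component (not actual Sortix code)."""
    # Traditional Unix restrictions: non-empty, no NUL byte, no slash.
    if not name or b"\0" in name or b"/" in name:
        return False
    # Proposed extra restriction: the bytes must decode as strict UTF-8.
    # Python's strict decoder rejects overlong forms, encoded surrogates,
    # and truncated sequences.
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A Latin-1 name such as `b"caf\xe9.txt"` would be rejected under this rule, which is exactly the kind of legacy name the requirement would turn into an error.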
s/UTF-16/UCS-2 with illegal code values in the range D800-DFFF allowed/ (the FS makes no requirement that they be in pairs that are valid as UTF-16).
-
Right, my concerns are about the syscall / VFS level. The filesystem driver is responsible for converting the filenames to and from a UTF-8 representation that can be provided through syscalls. If there's no UTF-8 representation possible, then yeah, "filesystem corruption".
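As a rough illustration of that driver-side conversion, assuming an on-disk format that stores names as 16-bit code units: the helper below is a hypothetical sketch, not any real driver's API. It returns `None` when the name contains an unpaired surrogate and so has no UTF-8 representation, which is the case a driver would have to report as corruption.

```python
import struct

def utf16_name_to_utf8(units):
    """Hypothetical helper: 16-bit code units -> UTF-8 bytes, or None."""
    # Pack the code units as little-endian 16-bit values.
    raw = struct.pack("<%dH" % len(units), *units)
    try:
        # Strict decoding fails on unpaired surrogates (D800-DFFF).
        return raw.decode("utf-16-le").encode("utf-8")
    except UnicodeDecodeError:
        return None
```

A valid surrogate pair such as `[0xD83D, 0xDE00]` converts fine; a lone `0xD800` does not.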
-
Oh, right. Either UCS-2 or potentially-invalid UTF-16, depending on how you describe it.
-
The fun part is that countries like Japan widely adopted a multibyte encoding long before UTF-8 was invented, and have stuck with it.
-
Japan is pretty much the only partial holdout, and decreasingly so. Shift_JIS presence on the web dropped by ~50% between 2014 and 2018.
-
Because of the Great Firewall of China, you mean?
-
what
-
perhaps http://xahlee.info/w/what_encoding_do_chinese_websites_use.html … is out of date and nobody has legacy data.
-
I just checked all the sites there, and all of them except qq and 163 have converted to UTF-8. :)
-
curiously, 163 seems to have reverted to GBK since that list was made...