It's nice, in a way, that the kernel doesn't care about the encoding. But it shifts the problem to user space. Just today a bot broke at work because of a file name that wasn't UTF-8, and Python just couldn't handle that. It's 2018; I can try to make everyone use a UTF-8 locale.
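For illustration, here is a minimal sketch of the kind of failure described above and of one way Python can carry such a name anyway. The file name `raw_name` is a made-up example (Latin-1 bytes), not the one from the incident; the `surrogateescape` error handler is the one Python's `os.fsdecode`/`os.fsencode` use on POSIX systems.

```python
# Hypothetical file name as raw bytes: "café.txt" in Latin-1, which is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Strict decoding fails; code that assumes every name is UTF-8 breaks here.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate
# (0xE9 becomes U+DCE9), so the original bytes can be recovered exactly.
as_text = raw_name.decode("utf-8", errors="surrogateescape")
round_tripped = as_text.encode("utf-8", errors="surrogateescape")

# The escaped string still cannot be encoded strictly, e.g. when writing
# it to a UTF-8 log, which is another place a script can fall over.
try:
    as_text.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False
```

The name survives a round trip through `str`, but anything that tries to encode it strictly still has to deal with the error.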
I'm also considering disallowing newlines in file names. I can't think of a good use. Their existence in file names is why some tools produce/consume \0-delimited records, but many tools just use newlines (especially for portability). Forbidding them makes things a lot simpler.
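A small sketch of why the delimiter matters, using made-up file names and two hypothetical helper functions. It only models the record splitting, not any particular tool:

```python
def split_newline_records(data):
    # Treat every newline as a record separator, as many line-based tools do.
    return [r for r in data.split(b"\n") if r]


def split_nul_records(data):
    # Treat NUL as the separator; NUL cannot appear inside a file name.
    return [r for r in data.split(b"\0") if r]


# Two hypothetical file names, one of which contains a newline.
names = [b"plain.txt", b"two\nlines.txt"]

newline_stream = b"".join(n + b"\n" for n in names)
nul_stream = b"".join(n + b"\0" for n in names)

newline_result = split_newline_records(newline_stream)  # three bogus records
nul_result = split_nul_records(nul_stream)              # the two real names
```

With newline-delimited output the second name is split into two records that do not exist on disk; with \0 delimiters the names come back intact.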
Interoperability is a key concern. What if I mount such a filesystem? I could try to 'correct' the file name (drop or fix chars), or have a fallback representation (hex-encode). I could move the inode to /lost+found as corrupted, or return EIO.
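As a rough user-space sketch of the fallback-representation option, assuming a made-up `hex:` prefix and hypothetical helper names (`display_name`, `raw_name`); a real implementation would live in the filesystem driver and would need its own scheme:

```python
FALLBACK_PREFIX = "hex:"  # hypothetical marker, not taken from any real filesystem


def display_name(raw):
    """Return the name as text, or a hex-encoded fallback if it is not acceptable."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: expose the raw bytes in hex instead.
        return FALLBACK_PREFIX + raw.hex()
    if "\n" in text:
        # Valid UTF-8 but contains a newline, which this sketch also rejects.
        return FALLBACK_PREFIX + raw.hex()
    return text


def raw_name(shown):
    """Map a displayed name back to the on-disk bytes."""
    if shown.startswith(FALLBACK_PREFIX):
        return bytes.fromhex(shown[len(FALLBACK_PREFIX):])
    return shown.encode("utf-8")


good = display_name(b"ok.txt")
bad = display_name(b"caf\xe9.txt")
```

One known gap in this sketch: a legitimate file name that itself starts with `hex:` is indistinguishable from a fallback name, so a real scheme would have to escape or reserve the prefix.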
Replying to @sortiecat
IMO, the best approach is to have a fallback representation, and maybe do charset transcoding on mounted filesystems.
Replying to @pikhq @sortiecat
You already need to handle something like this if you want to mount FAT, after all, since the only charsets there are legacy and UTF-16.
Replying to @pikhq @sortiecat
s/UTF-16/UCS-2 with illegal code values in the range D800-DFFF allowed/ (the FS makes no requirement that they be in pairs that are valid as UTF-16).
Replying to @RichFelker @sortiecat
Oh, right. Either UCS-2 or potentially-invalid UTF-16, depending on how you describe it.
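To make the UCS-2 versus UTF-16 distinction concrete, here is a small sketch with a made-up sequence of 16-bit code units containing an unpaired surrogate. It only shows how Python's codecs treat such data, not how any FAT driver does:

```python
# Hypothetical 16-bit code units as they might be stored in a long file name:
# "a", then a lone high surrogate 0xD800, then "b".
units = [0x0061, 0xD800, 0x0062]
raw = b"".join(u.to_bytes(2, "little") for u in units)

# A strict UTF-16 decoder rejects the unpaired surrogate.
try:
    raw.decode("utf-16-le")
    valid_utf16 = True
except UnicodeDecodeError:
    valid_utf16 = False

# The surrogatepass error handler keeps the lone surrogate as a code point,
# which is closer to treating the name as a plain sequence of 16-bit values.
lenient = raw.decode("utf-16-le", errors="surrogatepass")
back = lenient.encode("utf-16-le", errors="surrogatepass")
```

The lenient form round-trips to the same bytes, but the resulting string still cannot be encoded as strict UTF-8, so transcoding such a name needs a fallback of some kind.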
The fun part is that countries like Japan widely adopted multibyte encodings long before UTF-8 was invented, and have stuck with them.
Japan is pretty much the only partial holdout, and decreasingly so. Shift_JIS presence on the web dropped by ~50% between 2014 and 2018.
Because of the Great Firewall of China, you mean?
Did you mean we're not seeing China as a holdout too because of the Great Firewall? I don't think that's the case. Rather, Unicode is just widely adopted there.
Replying to @RichFelker @landley and
I actually asked my parents about this and (at least according to them) Mainland China really has substantially shifted to using Unicode. It's not just the Great Firewall.