
Replying to
So the ultimate desired behaviour would be:
- Once rename() returns, new is replaced with old for the running system (in-memory)
- Some time after rename() is called, new is fsynced then replaces old on disk (if a restart happens)
The in-between time sounds scary to me?
Replying to @mik235 and @RichFelker
No, to do a transaction safely, you need to fsync the file to commit the data to storage, rename it and then fsync the containing directory to commit the transaction. By covering up the problem, ext4 is hurting performance while the bugs in software are still present and serious.
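The sequence described in this tweet (fsync the file, rename it, fsync the containing directory) can be sketched roughly as follows. This is a minimal illustration assuming Linux/POSIX semantics, not a definitive implementation. The function name `atomic_write` and the `.tmp` suffix are hypothetical choices made for the example.

```python
import os

def atomic_write(path, data):
    """Replace `path` with `data` (bytes) using write, fsync, rename, fsync-dir."""
    directory = os.path.dirname(os.path.abspath(path))
    # Hypothetical temp name; it must live on the same filesystem as `path`.
    tmp_path = path + ".tmp"

    # 1. Write the new contents to a temporary file and fsync it, so the
    #    data is on storage before the rename makes it visible.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # os.write may write fewer bytes
            view = view[written:]
        os.fsync(fd)
    finally:
        os.close(fd)

    # 2. Rename the temporary file over the old one.
    os.rename(tmp_path, path)

    # 3. fsync the containing directory to commit the rename itself.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A call such as `atomic_write("settings.json", b"{}")` would then leave either the previous or the new contents in place after a crash, under the assumptions above.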
Replying to and
Yes, that's the way to do it as an atomic transaction where either the previous data or the new data is guaranteed to be intact. Since many developers got it wrong and continue to get it wrong, the ext4 developers hard-wired hacks to work around broken code at the expense of perf.
Not every use of a rename pattern like this cares about having a fully atomic transaction. It's nasty to have detection of common patterns in the kernel with workarounds. It reminds me of the Windows approach to compatibility. There are some other nasty hacks like this in Linux.
The hack often doesn't solve the issue because it only works for a single file and programs doing this wrong often need to commit a transaction across multiple files. The way to do that correctly is to use a symlink to a directory and rename a symlink to a new (committed) directory over it.
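One way to read the multi-file approach is sketched below: write every file into a fresh directory, sync it, then rename a new symlink over the old one. This assumes Linux/POSIX semantics. The names `commit_directory`, `fsync_path`, and the `.tmp` link suffix are hypothetical, and cleanup of old versions and stale temporary links is left out.

```python
import os

def fsync_path(path):
    """fsync a file or directory by path."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

def commit_directory(base, link_name, version_name, files):
    """Write `files` (name -> bytes) into base/version_name, then
    switch the symlink base/link_name to point at it."""
    version_dir = os.path.join(base, version_name)
    os.mkdir(version_dir)

    # Write and fsync every file belonging to the new version.
    for name, data in files.items():
        with open(os.path.join(version_dir, name), "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())

    # Commit the new directory's entries, then its own entry in `base`.
    fsync_path(version_dir)
    fsync_path(base)

    # Create a temporary symlink (relative target) and rename it over the
    # live one; readers see either the old or the new directory.
    tmp_link = os.path.join(base, link_name + ".tmp")
    os.symlink(version_name, tmp_link)
    os.rename(tmp_link, os.path.join(base, link_name))

    # Commit the rename.
    fsync_path(base)
```

Readers would open files through the symlink, for example `base/current/a`, so all files in a version switch together.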
One of the main issues in this area is that developers don't want to pay the cost of using fsync properly, so it isn't used nearly as much as it should be, and filesystem developers have not appropriately prioritized optimizing it. There are also examples of bad code overusing it.
For example, Firefox has historically overused tiny SQLite transactions by committing every single change immediately, including doing commits with every letter that's typed in the address bar. They liked to blame SQLite when SQLite was doing exactly what they asked correctly.
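To illustrate the difference between committing every change and batching changes, here is a small sketch using Python's `sqlite3` module. It is not Firefox's code or schema. The table name `typed` and both function names are made up for the example.

```python
import sqlite3

def save_each_immediately(conn, entries):
    """One transaction (and one commit) per entry: the overuse pattern."""
    for text in entries:
        conn.execute("INSERT INTO typed(text) VALUES (?)", (text,))
        conn.commit()

def save_as_one_transaction(conn, entries):
    """All entries in a single transaction with a single commit."""
    with conn:  # commits once on success, rolls back on an exception
        conn.executemany(
            "INSERT INTO typed(text) VALUES (?)",
            [(text,) for text in entries],
        )
```

On an on-disk database with default settings, each commit generally forces a sync to storage, so the first function pays that cost once per entry while the second pays it once per batch.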
twitter.com/DanielMicay/st Also, to clarify, you can safely leave out the final directory fsync as long as it's okay for either the previous version or the new version to be present after a crash / power loss. The fsync (or fdatasync) of the file data is what's needed for that.
Quote Tweet
Replying to @mik235 and @RichFelker
No, to do a transaction safely, you need to fsync the file to commit the data to storage, rename it and then fsync the containing directory to commit the transaction. By covering up the problem, ext4 is hurting performance while the bugs in software are still present and serious.
Due to ext4 adding the auto_da_alloc hack, it will work without fdatasync in the default configuration since it forces the fsync itself, but it's not guaranteed to be configured that way and the hack doesn't exist on other modern file systems with delayed allocation like XFS.
I think it's terrible that ext4 added this hack because it encourages developers not to fix their code to use actual transactions. There's a pervasive mistake of using fsync on the file after the rename instead of before the rename, which is broken and can leave inconsistent data.