For example, Firefox has historically overused tiny SQLite transactions by committing every change immediately, including a commit for every letter typed in the address bar. They liked to blame SQLite when SQLite was correctly doing exactly what they asked.
twitter.com/DanielMicay/st
Also, to clarify, you can safely leave out the final directory fsync as long as it's okay for either the previous version or the new version to be present after a crash or power loss. The fsync (or fdatasync) of the file data is what's needed for that.
Replying to @mik235 and @RichFelker
No, to do a transaction safely, you need to fsync the file to commit the data to storage, rename it and then fsync the containing directory to commit the transaction. By covering up the problem, ext4 is hurting performance while the bugs in software are still present and serious.
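A minimal POSIX C sketch of the sequence described above: write a temporary file, commit its data, rename it over the original, then fsync the containing directory. It assumes the caller supplies a temporary path on the same filesystem and the path of the containing directory; `atomic_replace`, `write_all`, `tmp_path`, `dir_path` and `sync_dir` are hypothetical names for illustration. It uses fdatasync for the data, as suggested elsewhere in the thread, and `sync_dir` toggles the final directory fsync, which can be skipped when either version surviving a crash is acceptable.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Replace `path` with `data` so that a crash leaves either the old or the
 * new contents. Hypothetical interface: the caller picks `tmp_path`, which
 * must be on the same filesystem as `path`, and names the containing
 * directory in `dir_path`. */
int atomic_replace(const char *path, const char *tmp_path,
                   const char *dir_path, const char *data, size_t len,
                   bool sync_dir) {
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Steps 1-2: write the new contents, then commit them to storage
     * BEFORE the rename. fdatasync is enough here; fsync also works. */
    if (write_all(fd, data, len) < 0 || fdatasync(fd) < 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp_path);
        return -1;
    }

    /* Step 3: atomically swap the new file into place. */
    if (rename(tmp_path, path) < 0) {
        unlink(tmp_path);
        return -1;
    }

    /* Skipping the directory sync: after a crash the old or the new
     * version may be present, but the new name never points at
     * uncommitted data. */
    if (!sync_dir)
        return 0;

    /* Step 4: fsync the containing directory so the rename is durable. */
    int dfd = open(dir_path, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    if (close(dfd) < 0)
        rc = -1;
    return rc;
}
```

The rename is only atomic when both paths live on the same filesystem; across filesystems `rename()` fails with `EXDEV` rather than copying.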
My understanding is that with data=ordered or at least with data=journal, just journaling should get you this either-or property with no fsync or fdatasync whatsoever. Is that wrong?
You definitely do need fdatasync, but fdatasync is traditionally a very fast operation compared to fsync. It's true that on ext3, data=ordered worked as you described, but nothing guarantees that kind of coarse journaling, and ext4 changed this with delayed block allocation.
Because ext4 added the auto_da_alloc hack, this will work without fdatasync in the default configuration, since ext4 effectively forces the sync itself. But it's not guaranteed to be configured that way, and the hack doesn't exist on other modern filesystems with delayed allocation, such as XFS.
I think it's terrible that ext4 added this hack, because it encourages developers not to fix their code to use actual transactions. There's a pervasive mistake of calling fsync on the file after the rename instead of before it, which is broken and can leave inconsistent data.
IIRC the problem was that people were doing fsync/rename backwards and not fixing their code, even after widespread reports of data loss.
Presumably because previous filesystems happened to get away with it?
I guess slow is better than lost data?
It's not slow to do a properly scoped fdatasync before rename. It's faster than doing a full fsync on the file after rename, which doesn't even guarantee that the rename was synced because doing that requires an fsync on the directory rather than the file anyway.
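To illustrate the last point: a rename changes the directory, so it is a descriptor for the directory, not the file, that needs the fsync. A minimal sketch, assuming POSIX `dirname()` and an `open()` that accepts `O_DIRECTORY`; `fsync_parent_dir` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <libgen.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* fsync the directory containing `path`. This is what makes a completed
 * rename() durable; fsync on the renamed file's own descriptor is not
 * guaranteed to commit the directory entry. */
int fsync_parent_dir(const char *path) {
    char *copy = strdup(path); /* dirname() may modify its argument */
    if (copy == NULL)
        return -1;
    int dfd = open(dirname(copy), O_RDONLY | O_DIRECTORY);
    free(copy); /* safe: open() has already consumed the result */
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    if (close(dfd) < 0)
        rc = -1;
    return rc;
}
```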
By doing the rename before the sync, it's possible for the rename to be committed to storage before the data. On ext3 that wasn't possible with data=ordered, because ext3 didn't have delayed block allocation, but that was never guaranteed or portable. I doubt developers even knew about that ext3-specific behavior.
I disagree with this assessment. It was guaranteed by my understanding (and, I think, many others') of what data=ordered meant. Delayed allocation is a hack that licenses the fs driver to violate what data=ordered means.
It's only relevant what data=ordered means if the code was specifically written for ext3 rather than being portable across filesystems. It shouldn't matter what data=ordered means for the vast majority of code not written to target a specific filesystem implementation.
From my perspective, you're wrongly framing it as the code being "written for" anything. The code is just written to POSIX (without sync options), with no concept of power failure existing. The filesystem is providing user-facing (not app-facing) features on top...
...to give the user (not the application) a level of data integrity they've selected (and made tradeoffs for) based on whether they deem power failure or kernel crash something that could plausibly happen and whether they deem the data valuable.
The parameter doesn't exist at all for other filesystems like XFS or on other operating systems. Using fsync and fdatasync according to their actual semantics is straightforward, and correct code doesn't depend on ext4 hacks that inject an fsync based on non-portable, non-guaranteed heuristics.


