The feature is called auto_da_alloc, and the documented purpose is correct - ensuring that, *if* the rename is seen after async power failure, the data in the new file is also seen. But then it goes and does more...
...which is wrong. It makes the whole rename syscall sleep until the data is committed, as if it were rename+fsync. Which makes sense if you cared about the data, but then you would have called fsync yourself!
So the ultimate desired behaviour would be:
- Once rename() returns, the new file has replaced the old one for the running system (in memory).
- Some time after rename() returns, the new file's data is written out, and only then does it replace the old one on disk (which is what you see if a restart happens).
The in-between time sounds scary to me?
No scarier than any other write to a filesystem. The journal keeps things ordered such that you don't see later changes without earlier ones, and ensures you see all but the last few seconds of changes anyway.
Yeah, ok. So if you want actual synchronous behaviour (e.g. you accepted a message via SMTP) you should still actually call fsync(). It's unclear from the man page what rename() guarantees once it has returned, but if it's not disk-synchronous then you need to sync().
The filesystem journalling also isn't guaranteed to be coarse. It's important to actually sync what you need to be synced or there's no guarantee of it being committed to storage. Filesystems can and should implement fine-grained fsync since it encourages doing real transactions.
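For anyone who wants the two behaviours from this thread side by side, here is a minimal sketch, assuming a POSIX system where a directory can be opened and fsynced. The function name replace_file and its durable flag are made up for illustration. With durable=False you get only the atomic swap, and any ordering of data before rename after a crash is left to the filesystem. With durable=True the caller pays for fsync explicitly, on both the file and its directory.

```python
import os
import tempfile


def replace_file(path, data, durable=True):
    """Replace `path` with `data` (bytes) via a temp file and rename.

    replace_file and `durable` are hypothetical names for this sketch.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem),
    # otherwise the rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()  # push Python's buffer down to the kernel
            if durable:
                os.fsync(tmp.fileno())  # wait for the file data to reach storage
        os.replace(tmp_path, path)  # atomic swap as seen by running processes
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    if durable:
        # Sync the directory so the rename itself is committed too.
        # Assumption: POSIX; opening a directory like this fails on Windows.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

The SMTP case from above would call replace_file(path, message) and only acknowledge the message after it returns. A cache or config rewrite that only needs "old or new, never half" could pass durable=False and skip the wait.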