Conversation

The feature is called auto_da_alloc, and the documented purpose is correct - ensuring that, *if* the rename is seen after async power failure, the data in the new file is also seen. But then it goes and does more...
...which is wrong. It makes the whole rename syscall sleep until the data is committed, as if it were rename+fsync. Which makes sense if you cared about the data, but then you would have called fsync yourself!
Replying to
So the ultimate desired behaviour would be:
- Once rename() returns, new is replaced with old for the running system (in-memory)
- Some time after rename() is called, new is fsynced then replaces old on disk (if a restart happens)
The in-between time sounds scary to me?
Replying to
No scarier than any other write to a filesystem. The journal keeps things ordered such that you don't see later changes without earlier ones, and ensures you see all but the last few seconds of changes anyway.
Replying to
Yeah, ok. So if you want actual synchronous behaviour (e.g. you accepted a message via SMTP) you should still actually call fsync(). It's unclear from the man page what rename() guarantees once it returns - but if it's not disk-synchronous then you need to sync().
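For the "accepted a message via SMTP" case, here is a minimal sketch in C of the commonly used pattern: write a temporary file, fsync() it, rename() it over the target, then fsync() the containing directory so the rename itself reaches disk. The helper names `write_all` and `replace_file_durably`, the `.<name>.tmp` temporary naming, and the 0644 mode are assumptions made for illustration, not anything from the thread or from ext4.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes. Returns 0 or -1. */
static int write_all(int fd, const void *data, size_t len) {
    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: replace dir/name with `data` and do not return
 * success until both the contents and the rename have been fsynced. */
int replace_file_durably(const char *dir, const char *name,
                         const void *data, size_t len) {
    char final_path[4096], tmp_path[4096];
    int n;

    n = snprintf(final_path, sizeof final_path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof final_path) return -1;
    n = snprintf(tmp_path, sizeof tmp_path, "%s/.%s.tmp", dir, name);
    if (n < 0 || (size_t)n >= sizeof tmp_path) return -1;

    /* 1. Write the new contents to a temporary file in the same directory. */
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;

    /* 2. Flush the data to disk before the rename can make it visible. */
    if (write_all(fd, data, len) < 0 || fsync(fd) < 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp_path);
        return -1;
    }

    /* 3. Atomically swap the new file into place. */
    if (rename(tmp_path, final_path) < 0) {
        unlink(tmp_path);
        return -1;
    }

    /* 4. Flush the directory so the rename itself survives a power failure. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0) return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

Dropping steps 2 and 4 leaves a replace that is still atomic for the running system but makes no promise about what is on disk after a crash, which is the case the thread is arguing about.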
Replying to and
There are no guarantees about data or metadata being synced without explicit use of fdatasync or fsync. The ext4 rename hacks aren't relevant to the semantics of the API because they're an optional feature specific to ext4. The hacks aren't guaranteed to be implemented or enabled.