The biggest problem I've had so far with getting io_uring IO to compete with traditional sync IO is cases where many processes are constantly waiting for one IO to finish. That's the typical case for WAL when the changes made are all small.
The case I can't quite get to compete when done asynchronously is lots of concurrent *tiny* commits that all fit on a single page. I haven't found a good way, so far, for many different processes to wait for that IO without doing it essentially synchronously.
Does Postgres have the concept of group commit, à la SQL Server?
Yes - that's actually kind of what causes the problem here - lots of commits that are flushed in one IO, which causes a lot of connections to wait for that one IO (be it directly, or via another notification operation).
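
To make the waiting pattern concrete, here is a minimal toy sketch of group commit. It is not how Postgres implements it: `GroupCommitLog`, `flush_fn` and `demo` are hypothetical names, and the flush is simulated with a sleep. One committer becomes the leader and flushes everything appended so far, while every other committer blocks until that one IO completes.

```python
import threading
import time


class GroupCommitLog:
    """Toy group commit: one leader flushes, everyone else waits for it."""

    def __init__(self, flush_fn):
        self._flush_fn = flush_fn        # hypothetical durable-write callback
        self._cond = threading.Condition()
        self._written = 0                # bytes appended to the log so far
        self.flushed = 0                 # bytes known to be durable
        self._flushing = False           # True while a leader is inside flush_fn
        self.flush_calls = 0             # number of IOs actually issued

    def commit(self, nbytes):
        with self._cond:
            self._written += nbytes
            target = self._written       # this commit is durable once flushed >= target
        while True:
            with self._cond:
                # Followers sleep here while somebody else's flush is in flight.
                while self._flushing and self.flushed < target:
                    self._cond.wait()
                if self.flushed >= target:
                    return target        # covered by another committer's IO
                self._flushing = True    # become the leader
                upto = self._written     # flush everything appended so far
            ok = False
            try:
                self._flush_fn(upto)     # the slow IO, done outside the lock
                ok = True
            finally:
                with self._cond:
                    if ok:
                        self.flushed = max(self.flushed, upto)
                        self.flush_calls += 1
                    self._flushing = False
                    self._cond.notify_all()   # wake every waiter at once


def demo(n_clients=50, record_bytes=56, io_latency=0.01):
    # Simulated device: each flush just sleeps for io_latency seconds.
    log = GroupCommitLog(lambda upto: time.sleep(io_latency))
    threads = [
        threading.Thread(target=log.commit, args=(record_bytes,))
        for _ in range(n_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log.flushed, log.flush_calls
```

Calling `demo()` typically issues far fewer flushes than there are clients, and each completion has to wake a whole crowd of waiters. That wakeup fan-out is the part that stays effectively synchronous regardless of how the IO itself was submitted.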
For devices doing fewer than ~15k QD1 durable write IOPS it's not a problem. But beyond that the async paths are currently losing out :(
My torture workload does lots of tiny transactions, 56 bytes each in WAL terms. The synchronous path does ~340k TPS with ~26k IOPS; the asynchronous path does ~230k TPS with ~13k IOPS.
With slightly bigger transactions, or slightly higher IO latency, the async path easily wins.
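
A rough back-of-the-envelope reading of those numbers, assuming every counted IO is a WAL flush and every transaction writes exactly 56 bytes of WAL (both are simplifications, and the variable names are mine):

```python
RECORD_BYTES = 56          # WAL bytes per transaction, from the thread
WAL_PAGE_BYTES = 8192      # assumption: default 8 kB WAL block size

def commits_per_io(tps, iops):
    """Average number of commits covered by one flush."""
    return tps / iops

sync_batch = commits_per_io(340_000, 26_000)    # roughly 13 commits per IO
async_batch = commits_per_io(230_000, 13_000)   # roughly 18 commits per IO

for name, batch in (("sync", sync_batch), ("async", async_batch)):
    wal_bytes = batch * RECORD_BYTES
    print(f"{name}: {batch:.1f} commits/IO, ~{wal_bytes:.0f} WAL bytes/IO, "
          f"fits in one page: {wal_bytes < WAL_PAGE_BYTES}")
```

Under those assumptions the async path batches more commits into each IO but completes only half as many IOs per second, so per-IO cost (including waking all the waiters) dominates rather than bytes written.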
The wider context here is that those 56 byte transactions were designed to be as small as possible in order to represent the worst possible case, which is actually quite unrealistic -- right?
I think realistic transactions start around ~100 bytes (34 bytes for the commit record; an insert into a table with two int8 columns is 64 bytes after alignment).
At that size there's a smaller slowdown. It's just easier to analyze where the overhead is by going lower.
I was just going to point out that a B-Tree leaf insert is at least 64 bytes. The record header overhead significantly contributes to the total WAL overhead with "thin" tables and the like.


