Starting to play with dependent SQEs for io_uring. Enables things like "read data from here, then write to there". Or "open this file, then read this data from it". It's a powerful primitive, and eliminates wait points in sequences of operations.
Replying to @axboe
Somewhat related: What I was looking for the other day was doing a buffered write, and wanting to trigger immediate writeback. One can achieve that by doing a write and then sync_file_range with SYNC_FILE_RANGE_WRITE (which is what postgres does).
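For readers who have not seen the pattern, here is a minimal sketch of "buffered write, then kick off writeback" using plain syscalls. The helper name `write_and_kick_writeback` is hypothetical and is not taken from the PostgreSQL source. It assumes Linux with glibc, since `sync_file_range` is Linux-specific.

```c
#define _GNU_SOURCE          /* exposes sync_file_range() and SYNC_FILE_RANGE_WRITE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes at offset off through the page cache,
 * then ask the kernel to start writeback for exactly that range.
 * SYNC_FILE_RANGE_WRITE only initiates writeback of dirty pages; it does not
 * wait for that writeback to finish.
 * Returns 0 on success, -1 on error with errno set.
 */
int write_and_kick_writeback(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL);

    /* pwrite() may write fewer bytes than requested, so loop until done. */
    while (done < len) {
        ssize_t n = pwrite(fd, p + done, len - done, off + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0) {        /* no progress: report as an I/O error */
            errno = EIO;
            return -1;
        }
        done += (size_t)n;
    }

    /* Start writeback for the range just written, without waiting for it. */
    return sync_file_range(fd, off, (off_t)len, SYNC_FILE_RANGE_WRITE);
}
```

Each call here costs two syscalls, and each of them looks up the same pages in the page cache.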
Replying to @AndresFreundTec @axboe
Doable with io_uring, but I was wondering if there's a cheaper way than submitting those as separate queue entries.
Replying to @axboe
The goal is to *initiate* immediate writeback, but not to block waiting for it. Now obviously the blocking doesn't matter as much when done through io_uring as it does for plain pwrite(), but it'd require much bigger uring sizes, and a lot more userland buffers for the writes.
Replying to @AndresFreundTec @axboe
As postgres uses a per-process model (for now at least), the uring sizes + additional buffers wouldn't be exactly free...
Replying to @AndresFreundTec
uring size isn't that costly: each SQE is just 64 bytes and a CQE is 16 bytes. So if you needed 256 entries for 128 pending writes instead of just 128 entries, you'd be using 20KB of memory instead of 10KB. That's hardly more than a single write anyway.
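The arithmetic behind that estimate, as a small sketch. It counts one CQE per SQE, as the estimate does, and ignores ring headers and index arrays. The names `SQE_BYTES`, `CQE_BYTES` and `ring_entry_bytes` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Entry sizes as quoted in the thread. */
enum { SQE_BYTES = 64, CQE_BYTES = 16 };

/* Memory taken by the ring entries alone, assuming one CQE per SQE. */
size_t ring_entry_bytes(size_t entries)
{
    return entries * (SQE_BYTES + CQE_BYTES);
}

/* Checks the two figures from the estimate: 128 and 256 entries. */
void check_estimate(void)
{
    assert(ring_entry_bytes(128) == 10 * 1024);  /* 128 * 80 = 10240 bytes */
    assert(ring_entry_bytes(256) == 20 * 1024);  /* 256 * 80 = 20480 bytes */
}
```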
Replying to @axboe
I suspect that going from async buffered writeback to RWF_SYNC would require a significant number of queue entries for each process. And all of those would have to point to a userland 8KB buffer for each IO. It's not a problem if we just use two SQEs (write + sync_file_range).
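A rough sketch of what the two-SQE variant could look like, with the write linked to the sync_file_range so the second only starts once the first has completed. It only fills in the two entries and does not set up a ring or submit anything. In a real program the entries would be slots in the submission queue, not a local array. The function name `prep_write_then_writeback` is hypothetical, and the sketch assumes kernel headers recent enough to define `IORING_OP_WRITE`.

```c
#define _GNU_SOURCE          /* for SYNC_FILE_RANGE_WRITE in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <linux/io_uring.h>  /* struct io_uring_sqe, IORING_OP_*, IOSQE_IO_LINK */
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: fill two consecutive SQEs so that a buffered write is
 * followed by a request to start writeback of the same range.
 */
void prep_write_then_writeback(struct io_uring_sqe sqes[2], int fd,
                               const void *buf, unsigned len, uint64_t off)
{
    memset(sqes, 0, 2 * sizeof(sqes[0]));

    /* First entry: the buffered write. IOSQE_IO_LINK ties the next SQE to it. */
    sqes[0].opcode = IORING_OP_WRITE;
    sqes[0].flags = IOSQE_IO_LINK;
    sqes[0].fd = fd;
    sqes[0].addr = (uint64_t)(uintptr_t)buf;
    sqes[0].len = len;
    sqes[0].off = off;
    sqes[0].user_data = 1;   /* arbitrary tag to recognize the completion */

    /* Second entry: initiate writeback for the range, without waiting for it. */
    sqes[1].opcode = IORING_OP_SYNC_FILE_RANGE;
    sqes[1].fd = fd;
    sqes[1].off = off;
    sqes[1].len = len;
    sqes[1].sync_range_flags = SYNC_FILE_RANGE_WRITE;
    sqes[1].user_data = 2;
}
```

The userland buffer only has to stay valid until the write completes, which is what keeps the number of in-flight 8KB buffers small compared to waiting on synchronous writes.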
Replying to @AndresFreundTec
Ah ok, yes that makes a lot more sense to me. I wasn't advocating using RWF_SYNC unless it did what you needed. Doubling the SQE count per write where you want to kick off writeback doesn't seem like a big deal at all.
Replying to @axboe
The SQE count itself isn't the concern, but the separate page cache lookups aren't free (in fact, they're already a bottleneck in some workloads, independent of this). So there'd be some benefit in being able to do that as one command.