Conversation

Replying to
And it's true that NUMA scheduling problems aren't a "rust problem" per se either, but these default choices matter. Fundamentally, epoll doesn't scale, and its scaling cliff is rapidly approaching common deployment scenarios. Too much of this ecosystem derives its architecture from epoll.
Replying to
With SO_REUSEPORT, the kernel will evenly distribute incoming TCP connections across worker sockets. It's also possible to use SO_INCOMING_CPU to set a socket's CPU affinity, so connections are handled by a thread whose affinity matches the CPU the connection was received on. It distributes load well.
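
A rough sketch of that setup (Linux-only, Rust with the raw libc crate; the function name, port handling, and backlog are illustrative, and error checks on setsockopt are omitted): each worker creates its own listening socket on the same port, SO_REUSEPORT lets the kernel spread new connections across those sockets, and SO_INCOMING_CPU biases each socket toward the CPU its worker is pinned to. The pinning itself has to be done separately, e.g. with a CPU-affinity call on the worker thread.

    use std::io;
    use std::mem;
    use std::net::TcpListener;
    use std::os::unix::io::FromRawFd;

    // One listening socket per worker, all bound to the same port.
    // SO_REUSEPORT: the kernel hashes new connections across these sockets.
    // SO_INCOMING_CPU: biases this socket toward `cpu`, which should match the
    // CPU the owning worker thread is pinned to.
    unsafe fn worker_listener(port: u16, cpu: libc::c_int) -> io::Result<TcpListener> {
        let fd = libc::socket(libc::AF_INET, libc::SOCK_STREAM, 0);
        if fd < 0 {
            return Err(io::Error::last_os_error());
        }

        let one: libc::c_int = 1;
        libc::setsockopt(
            fd,
            libc::SOL_SOCKET,
            libc::SO_REUSEPORT,
            &one as *const _ as *const libc::c_void,
            mem::size_of_val(&one) as libc::socklen_t,
        );
        libc::setsockopt(
            fd,
            libc::SOL_SOCKET,
            libc::SO_INCOMING_CPU,
            &cpu as *const _ as *const libc::c_void,
            mem::size_of_val(&cpu) as libc::socklen_t,
        );

        let addr = libc::sockaddr_in {
            sin_family: libc::AF_INET as libc::sa_family_t,
            sin_port: port.to_be(),
            sin_addr: libc::in_addr { s_addr: libc::INADDR_ANY },
            sin_zero: [0; 8],
        };
        if libc::bind(
            fd,
            &addr as *const _ as *const libc::sockaddr,
            mem::size_of::<libc::sockaddr_in>() as libc::socklen_t,
        ) < 0
            || libc::listen(fd, 1024) < 0
        {
            return Err(io::Error::last_os_error());
        }

        Ok(TcpListener::from_raw_fd(fd))
    }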
Replying to
I totally agree; scaling your average socket server is a relatively well-known problem (at least there are lots of docs and experts now). The waker scheduling domain in this example is messier, though, as the events are spread across multiple multi-threaded reactors...
... so you can get some of the biasing back if you write a scheduler yourself (as most of the super modern scalable storage engines do, for example), and that solves part of the problem, but you're taking on the OS's job with fewer tools - at some point you end up fighting its balancing.
The defaults matter though, and this is one of my complaints about the ecosystem I'm unwinding in this program. The program is a small CLI daemon; it cares about efficiency and low-scale IO concurrency. It doesn't care about C1M-type problems...
... one of the concurrency libs under the hood makes a default choice: spawn a thread per core. It assumes you're on something like a 2-16 core machine. It assumes, as a result, that having every thread poll a common eventfd is OK. On a NUMA system, this is not OK.
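
Roughly the shape of that default, as an illustrative sketch only (not the library's actual code; the names here are made up): one worker per detected CPU, every worker parked in epoll_wait on the same shared instance. Without something like EPOLLEXCLUSIVE, one readiness event on the shared eventfd can unpark far more threads than it needs to, and on a NUMA box those wakeups also drag the shared queue state back and forth across nodes.

    use std::thread;

    // Illustrative only: N workers (one per detected CPU), all blocking on the
    // same epoll instance that fronts a shared eventfd.
    fn spawn_workers(shared_epfd: i32) {
        let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
        for _ in 0..cores {
            thread::spawn(move || loop {
                let mut events = [libc::epoll_event { events: 0, u64: 0 }; 64];
                let n = unsafe {
                    libc::epoll_wait(shared_epfd, events.as_mut_ptr(), 64, -1)
                };
                for ev in &events[..n.max(0) as usize] {
                    // dispatch whatever task this event identifies ...
                    let _ = ev;
                }
            });
        }
    }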
... moreover, it's extremely unlikely that there's any measured real-world sample under which the model they've chosen as a default scales well, even on home systems. These implementations all thrash between parks and IO events because of the mixed concurrency models...
... you can see it as soon as you trace the processes - if you kill off most of the IO, they still sit there park-thrashing, sometimes for minutes afterward, because the scheduler has completely lost the plot (not the scheduler's fault).
Replying to
Are they at least using EPOLLEXCLUSIVE to poll the shared fd? It avoids a lot of spurious wakeups. A major issue for the Rust ecosystem is that io_uring doesn't fit well into Rust's model for memory safety. They're still stuck building things on top of the epoll approach.
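
For reference, EPOLLEXCLUSIVE is just a flag set at registration time, and it's only accepted with EPOLL_CTL_ADD (Linux 4.5+). A minimal sketch with the libc crate (the helper name is made up):

    use std::io;
    use std::os::unix::io::RawFd;

    // Register `fd` with EPOLLEXCLUSIVE: when multiple epoll instances (one per
    // worker thread) watch the same fd, an event wakes only one, or a subset,
    // of the blocked epoll_wait callers instead of all of them.
    fn add_exclusive(epfd: RawFd, fd: RawFd) -> io::Result<()> {
        let mut ev = libc::epoll_event {
            events: (libc::EPOLLIN | libc::EPOLLEXCLUSIVE) as u32,
            u64: fd as u64,
        };
        let rc = unsafe { libc::epoll_ctl(epfd, libc::EPOLL_CTL_ADD, fd, &mut ev) };
        if rc < 0 {
            return Err(io::Error::last_os_error());
        }
        Ok(())
    }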
github.com/ringbahn/ringb is probably a good starting point. If I were building things with Rust right now, I don't think I would be interested in using any of the more popular/mature AIO libraries. I don't think the abstractions or implementations make sense, and they're not long-term.
I don't think it's possible to have an AIO library that's both good and portable right now. The whole point of Rust is having efficient abstractions mapping well to the low-level details. If it's not a thin abstraction over io_uring, it fails at being a good AIO library on Linux.
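
To make "thin abstraction" concrete, here is roughly what a one-shot read looks like against the tokio-rs io-uring crate (a sketch based on that crate's documented submit/complete API; the file name is just an example). The unsafe push is the crux: the kernel reads the SQE and writes into the buffer after submission, so the caller must keep both alive until the completion arrives, which is exactly what makes a safe, high-level Rust wrapper hard.

    use io_uring::{opcode, types, IoUring};
    use std::os::unix::io::AsRawFd;
    use std::{fs, io};

    fn main() -> io::Result<()> {
        let mut ring = IoUring::new(8)?;

        let file = fs::File::open("README.md")?;
        let mut buf = vec![0u8; 1024];

        // Build a read submission entry; `buf` must stay alive (and unmoved)
        // until the matching completion is reaped.
        let read_e = opcode::Read::new(types::Fd(file.as_raw_fd()), buf.as_mut_ptr(), buf.len() as _)
            .build()
            .user_data(0x42);

        unsafe {
            ring.submission().push(&read_e).expect("submission queue full");
        }
        ring.submit_and_wait(1)?;

        let cqe = ring.completion().next().expect("completion queue empty");
        assert_eq!(cqe.user_data(), 0x42);
        assert!(cqe.result() >= 0, "read error: {}", cqe.result());
        Ok(())
    }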
Linux actually being able to do AIO properly is a recent development, with crucial additions in recent kernel releases, so you need a bleeding-edge kernel too. Doing I/O in a thread pool and wiring that up to eventfd and epoll may make sense for JavaScript, but it doesn't for Rust.