Conversation

Replying to and
Blocking accept uses FIFO rather than LIFO, but has other issues. It requires a weird non-standard approach to properly handle graceful reloads, etc. It doesn't really make any sense that EPOLLEXCLUSIVE uses a different order than blocking accept by default. It's a weird choice.
Replying to and
LIFO is lower overhead under very low load. If you have high load, there are no cold workers. The nginx model is to make one worker per CPU core. On many high performance systems, those will even be pinned to cores. It wants to have even load balancing across those threads/CPUs.
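A toy model of the two wakeup orders, to make the low-load case concrete. This is only a simulation under a strong assumption: every worker finishes its connection and goes back to waiting before the next one arrives. The `simulate` function and its parameters are invented for this sketch and do not model the kernel's actual wait queue.

```python
from collections import Counter, deque


def simulate(policy, workers=4, connections=12):
    """Count connections per worker when every worker is idle between arrivals."""
    waiting = deque(range(workers))  # idle workers, longest waiter on the left
    handled = Counter()
    for _ in range(connections):
        # FIFO wakes the longest-waiting worker, LIFO the most recent waiter.
        worker = waiting.popleft() if policy == "fifo" else waiting.pop()
        handled[worker] += 1
        # Low load: the worker is done before the next arrival and waits again.
        waiting.append(worker)
    return [handled[w] for w in range(workers)]


print(simulate("fifo"))  # [3, 3, 3, 3]
print(simulate("lifo"))  # [0, 0, 0, 12]
```

Under LIFO one hot worker takes everything, which is cheap when load is very low. Under FIFO the same arrivals are spread evenly across the workers.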
The reason nginx uses multiple workers is scaling to multiple cores. It matters a lot with TLS, especially against denial of service attacks. It scales up disk I/O on Linux with a per-worker thread pool. It could use io_uring but that hasn't landed yet and isn't that much better.
It would help if they had work stealing to let workers with less load steal connections from those with more load, but it's hard to fit into how it works. Workers that are almost entirely busy quickly processing events still accept connections fairly fast under high load, making it worse.
Even if there was work stealing... you would still want the connections to be decently distributed across them. REUSEPORT is good enough at that but giving more connections to non-idle workers isn't good. FIFO EPOLLEXCLUSIVE would be great for baseline load balancing.
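A rough sketch of the userspace work stealing idea, assuming each worker's connections can be represented as a simple queue. The names `steal_once`, `rebalance` and the `threshold` parameter are hypothetical. How a connection is actually handed over is left out; between processes it would need something like passing the file descriptor over a Unix socket.

```python
from collections import deque


def steal_once(queues, threshold=2):
    """Move one connection from the busiest worker to the least busy one.

    Returns True if a connection moved, False once the gap between the
    busiest and least busy worker is below `threshold`.
    """
    busiest = max(range(len(queues)), key=lambda i: len(queues[i]))
    idlest = min(range(len(queues)), key=lambda i: len(queues[i]))
    if len(queues[busiest]) - len(queues[idlest]) < threshold:
        return False
    queues[idlest].append(queues[busiest].pop())
    return True


def rebalance(queues):
    """Keep stealing until the queues are within one connection of each other."""
    moves = 0
    while steal_once(queues):
        moves += 1
    return moves
```

The worse the initial distribution, the more moves this takes, which is the point about still wanting a decent baseline from the kernel before any stealing happens.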
Replying to and
There were also some patches for round-robin wakeup. All of this does give me a sense of why kernel developers go with BPF for the 'policy' part: we now know of two ways users may want this. BPF would allow them to inspect more kernel state to make that decision.
Replying to and
I think REUSEPORT + sophisticated BPF usage is a lot worse than FIFO EPOLLEXCLUSIVE + userspace work stealing. That would do a decent enough job distributing the initial load with low latency and then userspace can re-balance it over time. You cannot do it all up-front anyway.
There are use cases where REUSEPORT + BPF may be superior, but a public-facing high-load web server really isn't one of them. You're sacrificing latency to 'perfectly' balance the new connections across workers via REUSEPORT, yet the usage per connection is so mixed and varying that the result isn't really balanced.
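For comparison, the REUSEPORT side of this looks roughly like the sketch below: every worker opens its own listening socket on the same port and the kernel decides up front which socket each new connection lands on. This assumes a platform where `socket.SO_REUSEPORT` is available (Linux 3.9+). The `reuseport_listener` helper is made up for illustration, and the BPF steering part is not shown.

```python
import socket


def reuseport_listener(host, port, backlog=128):
    """Open a listening socket that can share its port with other listeners."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind() on every socket that shares the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


# Two "workers" bound to the same port; port 0 lets the kernel pick one.
first = reuseport_listener("127.0.0.1", 0)
port = first.getsockname()[1]
second = reuseport_listener("127.0.0.1", port)
```

Once a connection is queued on one worker's socket it stays there, so any later imbalance has to be handled some other way.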