🤣🐧🤡 This would not even be an issue if everyone hadn't jumped on the epoll 🤡🚗
Quote Tweet
I'm not surprised to find out that Linux kernel socket load balancing is in a terrible state with no good options. Some background: blog.cloudflare.com/the-sad-state- Traditional epoll wakes every thread and they race to accept the connection. EPOLLEXCLUSIVE fixes it but uses LIFO order.
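For illustration, a minimal Python sketch of the EPOLLEXCLUSIVE setup the quoted tweet describes: each worker registers the same shared listening socket in its own epoll instance with the exclusive flag, so one incoming connection does not wake every worker. The helper names (`make_listener`, `make_exclusive_epoll`, `accept_ready`) are hypothetical, and this assumes Linux 4.5+ and Python 3.6+. It is not how nginx or any other server does it.

```python
import select
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create one non-blocking listening socket shared by all workers."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(128)
    sock.setblocking(False)
    return sock


def make_exclusive_epoll(listener):
    """Per-worker epoll instance watching the shared listener exclusively."""
    ep = select.epoll()
    # EPOLLEXCLUSIVE must be set at registration time (EPOLL_CTL_ADD).
    ep.register(listener.fileno(), select.EPOLLIN | select.EPOLLEXCLUSIVE)
    return ep


def accept_ready(listener, ep, timeout=1.0):
    """Poll once and accept whatever this worker was woken up for."""
    conns = []
    for fd, events in ep.poll(timeout):
        if fd == listener.fileno() and events & select.EPOLLIN:
            try:
                conn, _addr = listener.accept()
            except BlockingIOError:
                # Another worker got there first; nothing to do.
                continue
            conns.append(conn)
    return conns
```

Which worker gets woken is decided by the kernel, not by this code, which is the whole complaint in the thread.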
The Linux kernel didn't really provide other viable options. The least bad option that doesn't require patching the kernel is probably SO_REUSEPORT combined with dealing with the BPF nonsense they expect you to do. You still use epoll, but with a socket per worker instead of sharing one.
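A rough sketch of the socket-per-worker part, assuming Linux and Python's `socket` module. The helper names are hypothetical. Each worker gets its own listening socket bound to the same address and port with SO_REUSEPORT, and would then register only its own socket in its own epoll instance. The BPF steering program (attached with SO_ATTACH_REUSEPORT_CBPF or SO_ATTACH_REUSEPORT_EBPF) is left out here, so the kernel's default hash-based distribution applies.

```python
import socket


def make_reuseport_listener(host, port):
    """One listening socket; SO_REUSEPORT must be set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock


def make_worker_listeners(n_workers, host="127.0.0.1", port=0):
    """Create one listener per worker, all sharing the same port."""
    first = make_reuseport_listener(host, port)
    # With port=0 the kernel picks a free port; reuse it for the rest.
    shared_port = first.getsockname()[1]
    rest = [make_reuseport_listener(host, shared_port)
            for _ in range(n_workers - 1)]
    return [first] + rest
```

In a real server each worker process would create or inherit exactly one of these sockets rather than holding them all in one list.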
Blocking accept uses FIFO rather than LIFO, but has other issues. It requires a weird non-standard approach to properly handle graceful reloads, etc. It doesn't really make any sense that EPOLLEXCLUSIVE uses a different order than blocking accept by default. It's a weird choice.
LIFO is lower overhead under very low load. If you have high load, there are no cold workers. The nginx model is to make one worker per CPU core. On many high performance systems, those will even be pinned to cores. It wants to have even load balancing across those threads/CPUs.
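As a sketch of the one-worker-per-core model with pinning, assuming Linux (where `os.sched_getaffinity` and `os.sched_setaffinity` are available). The function names and the `work` callback are hypothetical and only illustrate the layout, not nginx's actual implementation.

```python
import os


def worker_cpu_plan(n_workers=None):
    """Assign each worker a CPU from the set this process may run on."""
    cpus = sorted(os.sched_getaffinity(0))
    if n_workers is None:
        n_workers = len(cpus)  # one worker per usable core
    return [cpus[i % len(cpus)] for i in range(n_workers)]


def pin_current_process(cpu):
    """Restrict the calling process to a single CPU."""
    os.sched_setaffinity(0, {cpu})


def spawn_pinned_workers(work):
    """Fork one worker per core; each pins itself, then runs work(cpu)."""
    pids = []
    for cpu in worker_cpu_plan():
        pid = os.fork()
        if pid == 0:
            # Child: pin, run the worker body, and never return to the caller.
            code = 1
            try:
                pin_current_process(cpu)
                work(cpu)
                code = 0
            finally:
                os._exit(code)
        pids.append(pid)
    return pids
```

The parent is expected to `os.waitpid` on the returned PIDs. With workers laid out like this, uneven connection distribution shows up directly as uneven per-core load.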
The reason nginx uses multiple workers is scaling to multiple cores. It matters a lot with TLS, especially against denial of service attacks. It scales up disk I/O on Linux with a per-worker thread pool. It could use io_uring but that hasn't landed yet and isn't that much better.
It would help if they had work stealing to let workers with less load steal connections from those with more load, but it's hard to fit into how it works. Workers that are almost entirely busy quickly processing events still accept connections fairly fast under high load, making it worse.
There were also some patches for round-robin wakeup. All of this does give me a sense of why kernel developers go with BPF for the 'policy' part: we now know of two ways users may want this. BPF would allow them to inspect more kernel state to make that decision.