Conversation

🤣🐧🤡 This would not even be an issue if everyone hadn't jumped on the epoll 🤡🚗
Quote Tweet
I'm not surprised to find out that Linux kernel socket load balancing is in a terrible state with no good options. Some background: blog.cloudflare.com/the-sad-state- Traditional epoll wakes every thread and they race to accept the connection. EPOLLEXCLUSIVE fixes it but uses LIFO order.
Replying to
The Linux kernel didn't really provide other viable options. The least bad option that doesn't require patching the kernel is probably REUSEPORT, combined with dealing with the BPF nonsense they expect you to do. You still use epoll, but with a socket per worker instead of one shared socket.
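Roughly, the REUSEPORT approach looks like this in Python. This is a sketch assuming Linux 3.9+, and `make_reuseport_listener` and `open_worker_sockets` are hypothetical names. Each worker gets its own listening socket bound to the same address and port, plus its own epoll instance, so there is no shared socket for a herd to wake up on. The BPF part is left out; that means attaching a steering program with `SO_ATTACH_REUSEPORT_CBPF` or `SO_ATTACH_REUSEPORT_EBPF`. Without such a program, the kernel picks a listener from a hash of the connection's addresses and ports.

```python
import select
import socket


def make_reuseport_listener(host: str, port: int) -> socket.socket:
    """Create one worker's private listening socket with SO_REUSEPORT set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEPORT must be set on every socket before bind() so they can share the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    sock.setblocking(False)
    return sock


def open_worker_sockets(n_workers: int, host: str = "127.0.0.1", port: int = 0):
    """Return (port, [(listener, epoll), ...]) with one listener and one epoll per worker."""
    first = make_reuseport_listener(host, port)
    # If port 0 was requested, reuse whatever port the kernel picked for the first socket.
    port = first.getsockname()[1]
    listeners = [first] + [make_reuseport_listener(host, port) for _ in range(n_workers - 1)]
    workers = []
    for lsock in listeners:
        ep = select.epoll()
        # Nothing is shared between workers here, so plain EPOLLIN is enough.
        ep.register(lsock.fileno(), select.EPOLLIN)
        workers.append((lsock, ep))
    return port, workers
```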
Replying to
Blocking accept uses FIFO rather than LIFO, but has other issues. It requires a weird non-standard approach to properly handle graceful reloads, etc. It doesn't really make any sense that EPOLLEXCLUSIVE uses a different order than blocking accept by default. It's a weird choice.
Replying to
LIFO is lower overhead under very low load. If you have high load, there are no cold workers. The nginx model is to make one worker per CPU core. On many high performance systems, those will even be pinned to cores. It wants to have even load balancing across those threads/CPUs.
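A toy model makes the low-load point concrete; it is not kernel code, and its names are invented for illustration. Idle workers sit in a wait queue, each new connection wakes one of them, and a worker rejoins the queue when its connection finishes. With LIFO wakeup, the most recently idled worker is always picked first. Under light load, that one worker absorbs everything, which is cheap but leaves the other cores idle. With FIFO, the work rotates across all workers. Real traffic is not this tidy.

```python
from collections import Counter, deque


def simulate(order: str, n_workers: int, n_conns: int, overlap: int) -> Counter:
    """Count how many connections each worker handles in a toy wait-queue model.

    order:   "lifo" or "fifo", i.e. which idle worker is woken first.
    overlap: how many connections are in flight at once (1 means very low load:
             each connection finishes before the next one arrives).
    """
    if not 1 <= overlap <= n_workers:
        raise ValueError("overlap must be between 1 and n_workers")
    idle = deque(range(n_workers))  # wait queue of idle workers
    busy = deque()                  # workers currently handling a connection
    handled = Counter()
    for _ in range(n_conns):
        if len(busy) == overlap:    # the oldest in-flight connection finishes first
            idle.append(busy.popleft())
        # LIFO takes the most recently idled worker; FIFO takes the longest-idle one.
        worker = idle.pop() if order == "lifo" else idle.popleft()
        handled[worker] += 1
        busy.append(worker)
    return handled
```

Under these toy assumptions, `simulate("lifo", 4, 8, overlap=1)` gives all eight connections to worker 3, and `simulate("fifo", 4, 8, overlap=1)` gives each worker two. With `overlap=4`, every worker is busy, and LIFO also spreads the connections two apiece. That matches the point that high load leaves no cold workers.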
The reason nginx uses multiple workers is scaling to multiple cores. It matters a lot with TLS, especially against denial of service attacks. It scales up disk I/O on Linux with a per-worker thread pool. It could use io_uring but that hasn't landed yet and isn't that much better.
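Here is a rough analogy for the per-worker disk I/O thread pool. It uses Python asyncio, not nginx's C implementation, and `DISK_POOL`, `read_file_blocking` and `serve_file` are hypothetical names. The worker's event loop stays non-blocking for sockets. Blocking disk reads go to a small thread pool, so a slow disk doesn't stall every connection on that worker.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# One small pool per worker process; the size here is an arbitrary assumption.
DISK_POOL = ThreadPoolExecutor(max_workers=4, thread_name_prefix="disk-io")


def read_file_blocking(path: str) -> bytes:
    """Ordinary blocking read; it would stall the event loop if called on it directly."""
    return Path(path).read_bytes()


async def serve_file(path: str) -> bytes:
    """Hand the blocking read to the pool and await its result on the event loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(DISK_POOL, read_file_blocking, path)
```

Calling `asyncio.run(serve_file(path))` returns the file's bytes. While the read runs in a pool thread, the loop stays free to service other sockets.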
Even if there were work stealing... you would still want the connections to be decently distributed across the workers. REUSEPORT is good enough at that, but giving more connections to non-idle workers isn't good. FIFO EPOLLEXCLUSIVE would be great for baseline load balancing.
Replying to
There were also some patches for round-robin wakeup. All of this does give me a sense of why kernel developers go with BPF for the 'policy' part: we now know of two ways users may want this. BPF would allow them to inspect more kernel state to make that decision.
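The "mechanism in the kernel, policy in BPF" split can be illustrated with a toy selector. It is plain Python, not BPF or any kernel API, and all names are hypothetical. The mechanism only asks a pluggable function which idle worker to wake, which makes LIFO, FIFO and round-robin interchangeable policies.

```python
from typing import Callable, List

# A policy receives idle worker ids in the order they went idle and returns one to wake.
Policy = Callable[[List[int]], int]


def lifo(idle: List[int]) -> int:
    return idle[-1]  # most recently idled worker


def fifo(idle: List[int]) -> int:
    return idle[0]   # longest-idle worker


def make_round_robin(n_workers: int) -> Policy:
    """Round-robin over worker ids, skipping any that are currently busy."""
    last = -1

    def pick(idle: List[int]) -> int:
        nonlocal last
        for step in range(1, n_workers + 1):
            candidate = (last + step) % n_workers
            if candidate in idle:
                last = candidate
                return candidate
        raise RuntimeError("no idle worker")

    return pick


def wake_one(idle: List[int], policy: Policy) -> int:
    """The 'mechanism': remove and return whichever idle worker the policy chose."""
    worker = policy(idle)
    idle.remove(worker)
    return worker
```

Suppose workers went idle in the order `[2, 0, 1]`. Then `lifo` wakes 1, `fifo` wakes 2, and a fresh round-robin policy wakes 0. Swapping policies never touches `wake_one`, which is roughly the appeal of letting a BPF program make that decision.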