🤣🐧🤡 This would not even be an issue if everyone hadn't jumped on the epoll 🤡🚗
Quote Tweet
I'm not surprised to find out that Linux kernel socket load balancing is in a terrible state with no good options. Some background: blog.cloudflare.com/the-sad-state- Traditional epoll wakes every thread and they race to accept the connection. EPOLLEXCLUSIVE fixes it but uses LIFO order.
Replying to
The Linux kernel didn't really provide other viable options. The least bad option not requiring patching the kernel is probably REUSEPORT combined with dealing with the BPF nonsense they expect you to do. You still use epoll but with a socket per worker instead of sharing one.
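Here is a rough sketch of the per-worker `SO_REUSEPORT` setup described above, in Python on Linux with hypothetical helper names. Each worker gets its own listening socket bound to the same port, plus its own epoll instance that watches only that socket. The sketch leaves out the BPF part entirely: no `SO_ATTACH_REUSEPORT_CBPF` or `SO_ATTACH_REUSEPORT_EBPF` program is attached. The kernel therefore falls back to its default hash-based spreading of new connections across the group.

```python
import select
import socket
import time


def reuseport_listeners(n_workers, port=0):
    """One listening socket per worker, all bound to the same 127.0.0.1 port."""
    listeners = []
    for _ in range(n_workers):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Every socket in the group must set SO_REUSEPORT (Linux >= 3.9) before bind().
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sock.bind(("127.0.0.1", port))
        port = sock.getsockname()[1]  # later sockets join the kernel-chosen port
        sock.listen(128)
        sock.setblocking(False)
        listeners.append(sock)
    return listeners


def distribute(listeners, n_clients, timeout=2.0):
    """Connect n_clients; each worker drains only its own queue. Returns per-worker counts."""
    addr = listeners[0].getsockname()
    clients = [socket.create_connection(addr) for _ in range(n_clients)]
    # Each worker has a private epoll watching only its own listener, so a new
    # connection wakes at most the worker whose queue it was hashed into.
    epolls = []
    for sock in listeners:
        ep = select.epoll()
        ep.register(sock.fileno(), select.EPOLLIN)
        epolls.append(ep)
    counts = [0] * len(listeners)
    deadline = time.monotonic() + timeout
    while sum(counts) < n_clients and time.monotonic() < deadline:
        for i, (sock, ep) in enumerate(zip(listeners, epolls)):
            if not ep.poll(0.01):
                continue
            while True:
                try:
                    conn, _ = sock.accept()
                except BlockingIOError:
                    break  # this worker's accept queue is empty
                conn.close()
                counts[i] += 1
    for client in clients:
        client.close()
    return counts
```

The default selection is a hash, not a check of which worker is idle. A busy worker can therefore still be handed new connections while others sit idle. A custom selection policy is what an attached BPF program would provide.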
Replying to
Blocking accept in Linux has some serious issues itself and the way it's implemented doesn't scale. You have to use edge-triggered epoll to avoid O(n) algorithms. They don't really give you much of a choice. It's why nginx used to do this in userspace and probably still should.
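For reference, this is roughly what edge-triggered accept handling looks like. It is a minimal Python sketch for Linux with hypothetical helper names (`et_listener`, `drain_accept_queue`). It covers the `EPOLLET` registration and the drain-until-`EAGAIN` loop that edge-triggered mode requires. It does not attempt to demonstrate the scaling problem with blocking `accept` mentioned above.

```python
import select
import socket


def et_listener():
    """Non-blocking listener registered edge-triggered (EPOLLET) in its own epoll."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    sock.listen(128)
    # Non-blocking is required with EPOLLET: the drain loop below relies on
    # accept() failing with EAGAIN once the queue is empty.
    sock.setblocking(False)
    ep = select.epoll()
    ep.register(sock.fileno(), select.EPOLLIN | select.EPOLLET)
    return sock, ep


def drain_accept_queue(sock):
    """Accept until the kernel reports EAGAIN and return the new connections."""
    conns = []
    while True:
        try:
            conn, _ = sock.accept()
        except BlockingIOError:  # EAGAIN/EWOULDBLOCK: accept queue is empty
            return conns
        conns.append(conn)
```

With `EPOLLET`, epoll reports the listener when a new connection arrives, not merely because the queue is non-empty. A worker that accepted only one connection per wakeup could leave the rest queued until the next connection shows up. Draining to `EAGAIN` avoids that.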
Replying to and
They used to default to `accept_mutex on`, where they would do the load balancing in userspace. It implements what EPOLLEXCLUSIVE should: give the next connection to the worker that has been idle the longest. It's still available, but they disabled it by default once EPOLLEXCLUSIVE landed.
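A toy model can show the two dispatch orders being contrasted. This is not nginx's `accept_mutex` implementation. It is a hypothetical Python simulation of the policies as the thread describes them. FIFO hands the next connection to the worker that has been idle the longest. LIFO hands it to the one that went idle most recently, which is the order the quoted tweet attributes to EPOLLEXCLUSIVE. The model assumes each connection finishes before the next one arrives.

```python
from collections import deque


class IdleWorkers:
    """Toy model of which idle worker receives the next connection."""

    def __init__(self, worker_ids, fifo=True):
        self.fifo = fifo
        self.idle = deque(worker_ids)  # left end: idle the longest

    def go_idle(self, worker_id):
        """A worker finished its connection and is waiting again."""
        self.idle.append(worker_id)

    def dispatch(self):
        """Pick a worker for a new connection, or None if every worker is busy."""
        if not self.idle:
            return None
        # FIFO: longest-idle worker (left end). LIFO: most recently idle (right end).
        return self.idle.popleft() if self.fifo else self.idle.pop()


def simulate(fifo, n_workers=4, n_connections=8):
    """Serve n_connections one at a time; return the worker id that handled each."""
    pool = IdleWorkers(range(n_workers), fifo=fifo)
    served = []
    for _ in range(n_connections):
        worker = pool.dispatch()
        served.append(worker)
        pool.go_idle(worker)  # short request: the worker is idle again right away
    return served
```

`simulate(fifo=True)` returns `[0, 1, 2, 3, 0, 1, 2, 3]`, while `simulate(fifo=False)` returns `[3, 3, 3, 3, 3, 3, 3, 3]`. In this model, FIFO rotates work across all workers, while LIFO keeps feeding the same one.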
Replying to and
Their attitude seems to be that EPOLLEXCLUSIVE *should* do what they want, so they'll use the API and leave it to other people to get it fixed in the Linux kernel. No one really seems to care, though, and everyone just accepts the crappy REUSEPORT + BPF approach as a 'solution'.
Replying to and
Google appears to mostly use REUSEPORT with BPF, and Cloudflare fixed EPOLLEXCLUSIVE for themselves. The Linux kernel is increasingly taking the approach of declaring that BPF provides ways to solve the problem, and therefore they aren't going to do any work on improving the APIs.