🤣🐧🤡 This would not even be an issue if everyone hadn't jumped on the epoll 🤡🚗
Quote Tweet
I'm not surprised to find out that Linux kernel socket load balancing is in a terrible state with no good options. Some background: blog.cloudflare.com/the-sad-state- Traditional epoll wakes every thread and they race to accept the connection. EPOLLEXCLUSIVE fixes it but uses LIFO order.
The Linux kernel didn't really provide other viable options. The least bad option not requiring patching the kernel is probably REUSEPORT combined with dealing with the BPF nonsense they expect you to do. You still use epoll but with a socket per worker instead of sharing one.
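A rough sketch of the socket-per-worker setup described above, assuming Linux and Python's standard `socket` and `select` modules. The helper name `make_worker_listener` is made up for illustration, and the BPF steering part is not covered here; without it the kernel picks which listener gets each connection.

```python
import select
import socket


def make_worker_listener(port, host="127.0.0.1"):
    """Create one listening socket plus one epoll instance for a single worker.

    Hypothetical helper: every worker calls this with the same port, so each
    worker owns its own accept queue instead of sharing one socket.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEPORT must be set on every socket before bind() so that
    # several sockets can listen on the same address and port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    sock.setblocking(False)

    # Each worker polls only its own listener, so there is no shared
    # wakeup to race on.
    ep = select.epoll()
    ep.register(sock.fileno(), select.EPOLLIN)
    return sock, ep
```

Each worker process or thread would call the helper once and then loop on its own `ep.poll()`, accepting from its own socket.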
Blocking accept in Linux has some serious issues itself and the way it's implemented doesn't scale. You have to use edge-triggered epoll to avoid O(n) algorithms. They don't really give you much of a choice. It's why nginx used to do this in userspace and probably still should.
They used to default to `accept_mutex on`, where they would do the load balancing in userspace. It implements what EPOLLEXCLUSIVE should: give the next connection to the worker that has been idle the longest. It's still available, but they disabled it by default once EPOLLEXCLUSIVE landed.
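To make the ordering difference concrete, here is a toy model (not kernel or nginx code) that assumes every connection is handled instantly and the worker goes straight back to waiting. The function name `distribute` and the policy strings are invented for this illustration.

```python
from collections import deque


def distribute(num_workers, num_connections, policy):
    """Count connections per worker under a LIFO or FIFO wakeup policy.

    Toy assumption: a worker finishes its connection before the next one
    arrives, so all workers are idle at every arrival.
    """
    waiting = deque(range(num_workers))  # left end = idle the longest
    counts = [0] * num_workers
    for _ in range(num_connections):
        if policy == "lifo":
            worker = waiting.pop()      # most recently queued waiter
        else:
            worker = waiting.popleft()  # longest-idle waiter
        counts[worker] += 1
        waiting.append(worker)          # worker goes back to waiting
    return counts


print(distribute(4, 12, "lifo"))  # [0, 0, 0, 12]
print(distribute(4, 12, "fifo"))  # [3, 3, 3, 3]
```

Under this light-load assumption, LIFO keeps handing work to the same worker while FIFO rotates through all of them.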
They switched away because it adds some overhead, and at a very large scale the serialization through a single mutex for accepting connections starts to matter. The best choice for anywhere I use it is probably going back to that userspace implementation for now...
Their attitude seems to be that EPOLLEXCLUSIVE *should* do what they want, so they'll use the API and leave it up to other people to get the Linux kernel to fix it. No one really seems to care, though, and people just accept the crappy REUSEPORT approach + BPF as a 'solution'.
They have quite explicitly said that about REUSEPORT. It seems that the possibility of fixing EPOLLEXCLUSIVE is still open but the major players with the resources to easily do that kind of thing are happy with out-of-tree patches or BPF. :(