On Intel, with all the Meltdown/Spectre/leak mitigations enabled, the bare-bones syscall cost went from ~70ns to about ~350ns. It's crazy. I think we lost 10 years of system call performance.
So the incentives are now to create shared memory protocols (instead of using pipes/IPC), which are pretty hard to get right.
Replying to @cpuGoogle
I don't get it. You started talking about the per-invocation cost of syscalls, but now you're talking about reducing data movement? Aren't those mostly orthogonal? Whether you wake up another task by writing to a pipe or with futex() doesn't really change the number of syscalls?
Replying to @tehjh @cpuGoogle
I do understand that shared-memory-based protocols are generally more efficient, but isn't that just as true now as it used to be? (Or maybe even less now, because of Amdahl's law)
Replying to @tehjh @cpuGoogle
heck, classic pipe-based communication copies the memory *twice*. if you have a shared memory channel and don't trust the writer, you can just let the reader copy the data over into a second buffer and parse it from there, and you *still* have one less memory copy than before
Replying to @tehjh
The idea of shared mem protocols is to amortize the cost of syscalls. For example, with a futex in shared memory you only sometimes actually have to wait (block), which is the syscall. You go from one syscall per unit of work to, say, ~5 units of work per syscall.
oh, so in usual workloads you can frequently send multiple units of work in rapid succession before having to block? that kind of surprises me. but even then, I still think the choice of wakeup primitive and whether you parse data directly from the shared ring are orthogonal.