Second post about wakers, this time about wakers that care which thread they’re woken from https://boats.gitlab.io/blog/post/wakers-ii/ …
Replying to @withoutboats
> Using atomic reference counts. Since your application is single threaded, on x86 at least this should have essentially no overhead over using nonatomic reference counts.

This has been mentioned elsewhere, but it is unfortunately not true that uncontended atomics are cheap.
1 reply 0 retweets 1 like -
Replying to @glaebhoerl @withoutboats
corvus frugilegus Retweeted Eric Niebler
Relevant thread [the blog post linked there is new to me and is open in a tab; I also linked a PDF] https://twitter.com/ericniebler/status/1081765425011216384 …
1 reply 0 retweets 0 likes -
Replying to @glaebhoerl
Obviously extremely situational, and if that’s the case there are implementation strategies without atomics
1 reply 0 retweets 0 likes -
Replying to @withoutboats @glaebhoerl
Also I don’t think the specific cited problem applies here, the whole point is your program is single threaded so there’s no other core reading from that cacheline
1 reply 0 retweets 0 likes -
Replying to @withoutboats @glaebhoerl
It has been pointed out that it can in theory suppress optimizations since the compiler can’t prove it would be UB for concurrent access to happen. I don’t think that’s relevant in this use case (and there are alternative strategies)
1 reply 0 retweets 0 likes -
Replying to @withoutboats
It's not just that, AFAIU (simplifying it, probably) it blocks every other instruction while it goes through the pipeline, so that's something like 10 cycles * 4 ops/cycle cost at a minimum (and yeah, "no other core accessing the cacheline" is what uncontended means:)
1 reply 0 retweets 0 likes -
Replying to @glaebhoerl @withoutboats
A simple test, FWIW: I added a drop(task.clone()) to Tokio's notify() to see how much two additional atomic operations affect overall perf. The reactor benchmark got 2% worse, and it's not even representative of the real world since it just notifies a dummy I/O task in a loop.
1 reply 0 retweets 2 likes -
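For context on why the `drop(task.clone())` test above costs two atomic operations: cloning an `Arc` atomically increments its strong count, and dropping the clone atomically decrements it. A minimal sketch, assuming the task handle is an `Arc`; the `Task` struct and `clone_and_drop` function are hypothetical stand-ins, not Tokio's actual types or code:

```rust
use std::sync::Arc;

// Hypothetical stand-in for a task handle; not Tokio's real type.
struct Task {
    id: u32,
}

// Clones the handle and drops the clone, returning the strong count
// observed while the clone is alive and again after it is dropped.
fn clone_and_drop(task: &Arc<Task>) -> (usize, usize) {
    let extra = Arc::clone(task); // atomic increment of the strong count
    let during = Arc::strong_count(task);
    drop(extra); // atomic decrement of the strong count
    let after = Arc::strong_count(task);
    (during, after)
}

fn main() {
    let task = Arc::new(Task { id: 1 });
    let (during, after) = clone_and_drop(&task);
    println!("task {}: count during = {}, after = {}", task.id, during, after);
}
```

The net effect on the count is zero, so the only observable difference is the time spent on the two atomic read-modify-write operations.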
Replying to @stjepang @withoutboats
Also fwiw I'm not trying to argue for or against anything here (especially in futures/tokio) it's really not my area, just to provide context wrt the cost of atomics. I wish this were the default mode of communication...
2 replies 0 retweets 4 likes -
Replying to @glaebhoerl @withoutboats
I appreciate it! Tbh, although removing LocalWaker seems like the way to go, a part of me is a little worried so now I'm trying to verify how much performance is *really* affected.
1 reply 0 retweets 1 like
Here's a better benchmark that measures the difference between Arc and Rc: https://github.com/aturon/rfcs/pull/16#issuecomment-454529643 …
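The linked benchmark is not reproduced here. As a rough illustration of what an `Arc` versus `Rc` comparison can look like, the following is a naive single-threaded timing sketch using only the standard library. The function names `time_rc` and `time_arc` and the iteration count are assumptions for illustration, and a loop like this is far cruder than a proper benchmark harness, so treat any numbers it prints as indicative at best:

```rust
use std::hint::black_box;
use std::rc::Rc;
use std::sync::Arc;
use std::time::{Duration, Instant};

// Times `iters` clone-and-drop pairs on a non-atomic reference count.
fn time_rc(iters: u32) -> Duration {
    let rc = Rc::new(0u64);
    let start = Instant::now();
    for _ in 0..iters {
        // black_box discourages the optimizer from eliding the pair.
        drop(black_box(Rc::clone(&rc)));
    }
    start.elapsed()
}

// Times `iters` clone-and-drop pairs on an atomic reference count,
// uncontended because only one thread ever touches it.
fn time_arc(iters: u32) -> Duration {
    let arc = Arc::new(0u64);
    let start = Instant::now();
    for _ in 0..iters {
        drop(black_box(Arc::clone(&arc)));
    }
    start.elapsed()
}

fn main() {
    let iters = 10_000_000;
    println!("Rc:  {:?}", time_rc(iters));
    println!("Arc: {:?}", time_arc(iters));
}
```

Build with optimizations (for example `cargo run --release`) before comparing the two timings; debug builds distort the result.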