A single 16-byte atomic operation on x86_64 Linux takes over 1000 cycles. This is ridiculous.
-
I did a quick test using pthread mutexes, since they should be implemented with futexes on anything recent. That took roughly 360 cycles (median). The 16-byte loads do get faster over time, though, and approach roughly 200 cycles. They're expensive because each one is a call into a .so.
-
Here is my ugly and probably very incorrect cycle count code: https://gist.github.com/Leandros/35b8af120f861ef94a462f57fa10dc05 …
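The gist is not reproduced here. As a rough sketch of how a median figure like the one above could be taken (not the gist's code), the following times lock/unlock pairs on a pthread mutex. The helper names `read_ticks`, `median_u64` and `median_mutex_ticks` are made up for illustration, and timing an uncontended lock/unlock pair is an assumption about what was measured. Note that `__rdtsc` counts reference cycles rather than core cycles, so the numbers are only approximate.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the time stamp counter (reference cycles, not core cycles). */
static inline uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Fallback for non-x86 targets: nanoseconds from the monotonic clock. */
static inline uint64_t read_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* qsort comparator for uint64_t values, ascending. */
static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sorts the samples in place and returns the middle element
 * (the upper median when n is even). */
static uint64_t median_u64(uint64_t *samples, size_t n) {
    assert(n > 0);
    qsort(samples, n, sizeof samples[0], cmp_u64);
    return samples[n / 2];
}

/* Times n uncontended lock/unlock pairs (an assumption about the test)
 * and returns the median tick count per pair, or 0 if allocation fails. */
static uint64_t median_mutex_ticks(size_t n) {
    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    uint64_t *samples = malloc(n * sizeof *samples);
    if (samples == NULL) return 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t start = read_ticks();
        pthread_mutex_lock(&mutex);
        pthread_mutex_unlock(&mutex);
        samples[i] = read_ticks() - start;
    }
    uint64_t result = median_u64(samples, n);
    free(samples);
    return result;
}
```

Calling `median_mutex_ticks(100000)` and printing the result gives a number comparable in spirit to the 360-cycle median, though the measurement includes the overhead of the two counter reads. The median is used rather than the mean so that a few samples inflated by interrupts or cold caches do not skew the result.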
-
Which is the ridiculous part: who thought it was a good idea to implement 16-byte atomics as a dynamic library call?
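One way to see whether a given build would inline 16-byte atomics or fall back to a library call is the GCC/Clang builtin `__atomic_always_lock_free`. This is a minimal sketch, assuming a GCC-compatible compiler. The two wrapper function names are hypothetical, and the result depends on compiler, version and flags such as `-mcx16`.

```c
#include <stdbool.h>

/* True if the compiler guarantees lock-free, inline atomics for every
 * 16-byte object with typical alignment on this target. When this is
 * false, the compiler generally emits a call to an external __atomic_*
 * routine instead (with GCC, these normally live in libatomic). */
static bool inline_16_byte_atomics(void) {
    return __atomic_always_lock_free(16, 0);
}

/* Same check for pointer-sized objects, for comparison. */
static bool inline_pointer_atomics(void) {
    return __atomic_always_lock_free(sizeof(void *), 0);
}
```

On a typical x86_64 build the pointer-sized check is true, while the 16-byte check is often false, which matches the library call complained about above.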