The question is whether they're slow because they're actually contesting or not. IIRC they're about 20-30 clocks if uncontested and the data's in the cache? Deferring the update breaks the whole ordered memory thing, which is like - the whole point of doing the LOCK!
On Skylake, LOCK XADD / LOCK OR are 18 clocks when uncontested. That's fine for infrequent operations. But if you're building a fine-grained C++ multithreading library, they're so frequent that they can create a 2x-4x slowdown.
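For anyone who wants to check the uncontended cost on their own machine, here is a rough single-threaded sketch. The names `AddTimings` and `time_uncontended_adds` are made up for illustration. It assumes an x86-64 compiler that lowers `fetch_add` to a LOCK-prefixed add (LOCK XADD or LOCK ADD), and it reports wall-clock nanoseconds per operation rather than clocks, so treat the numbers as a ballpark only.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical result type: per-operation timings plus the final counter values.
struct AddTimings {
    double atomic_ns_per_op;
    double plain_ns_per_op;
    std::uint64_t atomic_total;
    std::uint64_t plain_total;
};

// Times `iterations` uncontended atomic increments against plain increments,
// all on the calling thread, so no other core ever competes for the cache line.
inline AddTimings time_uncontended_adds(std::uint64_t iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;

    std::atomic<std::uint64_t> atomic_counter{0};
    // volatile keeps the compiler from folding the plain loop into one add.
    volatile std::uint64_t plain_counter = 0;

    const auto t0 = clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        atomic_counter.fetch_add(1, std::memory_order_seq_cst);
    }
    const auto t1 = clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        plain_counter = plain_counter + 1;
    }
    const auto t2 = clock::now();

    const std::chrono::duration<double, std::nano> atomic_time = t1 - t0;
    const std::chrono::duration<double, std::nano> plain_time = t2 - t1;
    return {atomic_time.count() / static_cast<double>(iterations),
            plain_time.count() / static_cast<double>(iterations),
            atomic_counter.load(),
            plain_counter};
}
```

Calling `time_uncontended_adds(100000000)` and comparing the two per-op figures gives a feel for the gap. The plain loop goes through a volatile, so it is not a perfect baseline either.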
Isn’t it really that inter core synchronisation happens at the cache line level? Wouldn’t the cache line sync have to become like a top unit?
That's my understanding. Nontemporal stores aside, I think all the write buffers do is wait for the current core to own the cache line, then write their data into the local cache. That would be the place for a ROP to occur.
If you don't need the result before or after an add/or, you can keep one value per thread on separate cache lines and just perform a non-atomic add/or. This is often an optimization done in compute as well.
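A minimal sketch of the per-thread-slot idea, assuming C++17 (so that `std::vector` honours the over-aligned type) and a 64-byte cache line, which is typical on x86 but not guaranteed. `PaddedCounter` and `sharded_count` are illustrative names, not from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed cache-line size; adjust for the target CPU.
constexpr std::size_t kCacheLine = 64;

// One counter per thread, padded so no two counters share a cache line.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};

// Each worker does plain (non-atomic) adds into its own slot; the slots are
// only combined after every worker has been joined.
inline std::uint64_t sharded_count(std::size_t num_threads,
                                   std::uint64_t adds_per_thread) {
    assert(num_threads > 0);
    std::vector<PaddedCounter> slots(num_threads);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&slots, t, adds_per_thread] {
            for (std::uint64_t i = 0; i < adds_per_thread; ++i) {
                slots[t].value += 1;  // no LOCK prefix: only thread t writes slot t
            }
        });
    }
    // join() orders each worker's writes before the reads below.
    for (auto& w : workers) {
        w.join();
    }

    std::uint64_t total = 0;
    for (const auto& s : slots) {
        total += s.value;
    }
    return total;
}
```

The catch is exactly the one in the tweet: the combined value only exists once the slots are summed, so nothing can read a consistent total mid-flight.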
This is a good optimization for small things, but for something huge (like a garbage collector's reachability bitmask), it doesn't scale well. E.g. instead of 1 bit per flag, you have to choose either 1 byte per flag, or 1 bit per flag per thread.
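To put rough numbers on that trade-off, here is a sketch that computes the memory for the three layouts and shows the shared 1-bit-per-flag version using an atomic OR. `SharedBitmap` and the `bytes_*` helpers are hypothetical names for illustration; this is not how any particular garbage collector does it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed for n flags under each layout.
constexpr std::size_t bytes_shared_bits(std::size_t n) { return (n + 7) / 8; }
constexpr std::size_t bytes_byte_per_flag(std::size_t n) { return n; }
constexpr std::size_t bytes_bits_per_thread(std::size_t n, std::size_t threads) {
    return threads * bytes_shared_bits(n);
}

// Shared bitmap, 1 bit per flag. Every mark() is an atomic read-modify-write,
// which on x86 is a LOCK-prefixed instruction even with relaxed ordering.
class SharedBitmap {
public:
    explicit SharedBitmap(std::size_t flags) : words_((flags + 63) / 64) {}

    // Sets bit i; returns true if this call was the one that set it.
    bool mark(std::size_t i) {
        const std::uint64_t mask = std::uint64_t{1} << (i % 64);
        const std::uint64_t old =
            words_[i / 64].fetch_or(mask, std::memory_order_relaxed);
        return (old & mask) == 0;
    }

    bool test(std::size_t i) const {
        const std::uint64_t mask = std::uint64_t{1} << (i % 64);
        return (words_[i / 64].load(std::memory_order_relaxed) & mask) != 0;
    }

private:
    std::vector<std::atomic<std::uint64_t>> words_;
};
```

For 2^20 flags that works out to 128 KiB shared, 1 MiB at a byte per flag, and 1 MiB again at a bit per flag per thread with 8 threads, growing linearly with thread count from there.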
Lies, damned lies and performance projections :-)
Message-passing at instruction level would be nice.
+1. exchange data between siblings via registers at retirement stage; bypass the memory system. allow “placing” dataflow programs that exploit temporal parallelism, preserving message order to maintain coherence within reduced spatial bounds. warps; spatial temporal coherence.
@PlowRox - explain!