So you have to fit your entire state update inside a "ROP packet" or similar, which is fairly restrictive. Note that you can actually do this today: make one thread the "ROP" unit, have other threads send it update packets, and let it apply the packets in order.
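A minimal C++ sketch of that "ROP thread" idea, assuming a trivially small packet format. `UpdatePacket`, `RopUnit`, and `run_demo` are hypothetical names invented for illustration, and the mutex-protected queue is just one simple way to deliver packets in order:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical update packet: "add delta to state[index]".
struct UpdatePacket {
    std::size_t index;
    std::int64_t delta;
};

// Hypothetical "ROP" unit: one thread owns the state and is its only writer.
class RopUnit {
public:
    explicit RopUnit(std::size_t cells)
        : state_(cells, 0), worker_([this] { run(); }) {}

    ~RopUnit() { stop(); }

    // Called by any thread: enqueue a packet for the ROP thread.
    void send(UpdatePacket p) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(p);
        }
        ready_.notify_one();
    }

    // Drain whatever is queued, stop the ROP thread, return the final state.
    std::vector<std::int64_t> finish() {
        stop();
        return state_;
    }

private:
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        ready_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            ready_.wait(lock, [this] { return done_ || !queue_.empty(); });
            while (!queue_.empty()) {
                UpdatePacket p = queue_.front();
                queue_.pop();
                assert(p.index < state_.size());
                state_[p.index] += p.delta;  // applied in queue (FIFO) order
            }
            if (done_) return;
        }
    }

    std::vector<std::int64_t> state_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<UpdatePacket> queue_;
    bool done_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};

// Several producer threads each send `packets_each` "+1" packets to cell 0.
std::int64_t run_demo(int producers, int packets_each) {
    RopUnit rop(1);
    std::vector<std::thread> threads;
    for (int t = 0; t < producers; ++t) {
        threads.emplace_back([&rop, packets_each] {
            for (int i = 0; i < packets_each; ++i) rop.send({0, 1});
        });
    }
    for (auto& th : threads) th.join();
    return rop.finish()[0];
}
```

Producers never touch the state directly, so the only contended structure is the queue. That is also the restriction mentioned above: anything an update needs must fit in the packet.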
Please bear with me on this: There's currently a piece of hardware that can atomically write just 1 byte into a 64-byte cache line. So clearly it's transforming (old cache line, write request) -> (new cache line). QUESTION: What is this hardware, and can it not support a ROP?
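To make the "(old cache line, write request) -> (new cache line)" framing concrete, here is a small C++ sketch that models it as a pure function. This is only a software model of the idea, not a description of any real hardware; `CacheLine`, `ByteWrite`, and `apply_write` are hypothetical names:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLineBytes = 64;  // the 64-byte line size from the tweet
using CacheLine = std::array<std::uint8_t, kLineBytes>;

// Hypothetical write request: store one byte at an offset within the line.
struct ByteWrite {
    std::size_t offset;
    std::uint8_t value;
};

// (old cache line, write request) -> new cache line.
// Takes the line by value, so the caller's copy is left untouched.
CacheLine apply_write(CacheLine line, ByteWrite w) {
    assert(w.offset < kLineBytes);
    line[w.offset] = w.value;
    return line;
}
```

Seen this way, a one-byte store is just the simplest possible transform on the line; the question in the tweet is whether the same spot could apply a richer one.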
I think atomic RMW is the reason for locking the line, whether that is done with an LL/SC loop or an atomic RMW instruction. If you just write a literal value, fine. But if the value came from memory, you need to ensure it wasn’t modified by another bus master while the <op> and the store execute.
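A hedged C++ sketch of that read-modify-write pattern, using a compare-exchange retry loop from the standard `<atomic>` header. `atomic_update` and `hammer` are hypothetical helper names. How the loop is lowered is up to the compiler and target: typically a locked compare-exchange on x86, or a load-linked/store-conditional pair on LL/SC architectures.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: atomically replace `cell` with op(cell); returns the old value.
template <typename Op>
std::uint32_t atomic_update(std::atomic<std::uint32_t>& cell, Op op) {
    std::uint32_t seen = cell.load(std::memory_order_relaxed);
    // If another writer changed the cell between our load and our store,
    // compare_exchange_weak fails and refreshes `seen`, so op() is
    // recomputed from the current value on the next iteration.
    while (!cell.compare_exchange_weak(seen, op(seen),
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed)) {
    }
    return seen;
}

// Several threads increment one shared cell through atomic_update.
std::uint32_t hammer(int threads, int updates_each) {
    assert(threads > 0 && updates_each >= 0);
    std::atomic<std::uint32_t> cell{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&cell, updates_each] {
            for (int i = 0; i < updates_each; ++i) {
                atomic_update(cell, [](std::uint32_t v) { return v + 1; });
            }
        });
    }
    for (auto& th : pool) th.join();
    return cell.load();
}
```

No update is lost even though the `<op>` itself runs outside any lock, because a store only succeeds if the value it was computed from is still the one in memory.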
Is there any good explanation of what a ROP does and how it is designed in hardware?
Sure, it's the write coalescer on the L1 cache. This thing could totally be built. It would probably make CMPXCHG slightly faster. It would also make the cache bigger, more complex and slower. For every operation.
So is the extra cost in complexity, perf and power worth it? That's the question architects ask themselves every day. And it's not a simple question, and it doesn't have a simple answer.
It is what the Keebler Elves use to make the fudge striped cookies, right? And they don't make ROPs.
I believe it works by locking the cache line, not by doing a specialized ROP. Writing bytes to the same cache line from different threads on x86 is very slow. We specifically make sure our cores don't write to the same cache lines for this reason.
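One common way to keep cores off each other's cache lines is to pad per-thread data out to a full line. The C++ sketch below assumes 64-byte lines (the size mentioned earlier in the thread) and a fixed thread count; `PaddedCounter` and `count_without_sharing` are hypothetical names, not anything from the poster's codebase:

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

constexpr std::size_t kLineBytes = 64;  // assumption: 64-byte cache lines
constexpr int kThreads = 4;             // assumption: fixed worker count

// Each counter is aligned and padded to occupy a whole line by itself.
struct alignas(kLineBytes) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};
static_assert(sizeof(PaddedCounter) == kLineBytes, "one counter per cache line");

// Every thread increments only its own counter; the results are summed at the end.
std::uint64_t count_without_sharing(std::uint64_t increments_each) {
    std::array<PaddedCounter, kThreads> counters;
    std::array<std::thread, kThreads> workers;
    for (int t = 0; t < kThreads; ++t) {
        workers[t] = std::thread([&counters, t, increments_each] {
            for (std::uint64_t i = 0; i < increments_each; ++i) {
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    std::uint64_t total = 0;
    for (int t = 0; t < kThreads; ++t) {
        workers[t].join();
        assert(counters[t].value.load() == increments_each);
        total += counters[t].value.load();
    }
    return total;
}
```

Without the `alignas`, the four counters would likely sit in the same line, and every increment would force that line to move between cores.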
Are you talking about multiple cores accessing the same cache line, or a single core? Right now, the cache line is still locked when it moves to exclusive, so you can't really write anything in parallel from multiple cores (the writes get serialized).