CMPXCHG = 18 clocks, no ILP, so 100x slower than a non-atomic operation. And this is stupid, as some CPU architecture improvements could make uncontended atomics much faster.
Replying to @TimSweeneyEpic
Could they? Excellent. Drop me a line, I'll add them.
1 reply 0 retweets 3 likes -
Replying to @tom_forsyth
Please educate me on how write-buffers are implemented and whether adding a ROP is a feasible idea? I mean, if it's just a queue of (address,size,data), then adding an operation wouldn't be hard, and the hardware for add/or/xor/and ROPs would be minimal.
1 reply 0 retweets 1 like -
Replying to @TimSweeneyEpic
Right, but if you allow it to go out of order with other memory operations, you just broke all the code - coz they're not barriers any more. And if you don't then hey - it stalls.
1 reply 0 retweets 2 likes -
Replying to @tom_forsyth
For many common operations, atomicity is vital but order is not:
- ORing a shared bitmask, a common technique used by concurrent garbage collectors.
- Increasing or decreasing a reference count without checking its value.
- Adding to a shared counter.
2 replies 0 retweets 1 like -
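The three cases listed in the tweet above map onto relaxed read-modify-write operations in C++. The following is a minimal sketch, not anything from the thread: `SharedState`, `mark`, `add_ref`, `bump`, and `hammer` are hypothetical names chosen for illustration. Each update is atomic but requests no ordering against other memory operations.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical shared state, named for illustration only.
struct SharedState {
    std::atomic<std::uint64_t> mark_bits{0};  // e.g. a garbage collector's mark bitmask
    std::atomic<std::uint32_t> ref_count{1};  // a reference count
    std::atomic<std::uint64_t> counter{0};    // a shared statistics counter
};

// Atomically set one bit; no ordering is requested.
inline void mark(SharedState& s, unsigned bit) {
    s.mark_bits.fetch_or(std::uint64_t{1} << bit, std::memory_order_relaxed);
}

// Atomically bump the reference count without looking at its value.
inline void add_ref(SharedState& s) {
    s.ref_count.fetch_add(1, std::memory_order_relaxed);
}

// Atomically add to the shared counter.
inline void bump(SharedState& s, std::uint64_t n) {
    s.counter.fetch_add(n, std::memory_order_relaxed);
}

// Run `threads` workers, each doing `iters` updates; return the final counter.
inline std::uint64_t hammer(SharedState& s, unsigned threads, unsigned iters) {
    assert(threads <= 64);  // one mark bit per worker
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&s, t, iters] {
            mark(s, t);
            for (unsigned i = 0; i < iters; ++i) {
                add_ref(s);
                bump(s, 1);
            }
        });
    }
    for (auto& th : pool) th.join();
    return s.counter.load(std::memory_order_relaxed);
}
```

Note that on x86 a relaxed `fetch_or` or `fetch_add` still compiles to a `LOCK`-prefixed instruction, since the ISA offers no weaker atomic read-modify-write. That gap between what the source code asks for and what the hardware provides is the point being argued in the thread.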
Replying to @TimSweeneyEpic @tom_forsyth
More simply put, I can atomically MOV [BYTE PTR eax],IMM without LOCK but I can't atomically set bits that way. I see no fundamental reason why, if the CPU supports the former, it can't economically support the latter.
2 replies 0 retweets 0 likes -
Replying to @TimSweeneyEpic @tom_forsyth
You could probably build special hardware to pipeline atomics to specific memory locations, but generally you have to read/write an entire cache line at a time, so that piece of memory has to be locked while you do that
1 reply 0 retweets 0 likes -
Replying to @dankbaker @tom_forsyth
Please bear with me on this: There's currently a piece of hardware that can atomically write just 1 byte into a 64-byte cache line. So clearly it's transforming (old cache line, write request) -> (new cache line). QUESTION: What is this hardware, and can it not support a ROP?
5 replies 0 retweets 0 likes -
I think atomic RMW is the reason for locking the line, whether you use an LL/SC loop or an atomic RMW instruction. If you just write a literal value, fine. But if the value came from memory, you need to ensure it wasn't modified by another bus master while the <op> and the store execute.
2 replies 0 retweets 0 likes -
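The retry-loop approach mentioned in the tweet above can be sketched in C++ with a compare-and-swap loop, which is roughly what a CMPXCHG or LL/SC sequence does underneath. This is an illustrative sketch only: `fetch_or_via_cas` is a hypothetical helper, and it assumes a 32-bit atomic that is lock-free on the target.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: an atomic OR built from compare-and-swap.
// Returns the value the word held just before the update.
inline std::uint32_t fetch_or_via_cas(std::atomic<std::uint32_t>& word,
                                      std::uint32_t bits) {
    assert(word.is_lock_free());  // assumption of this sketch
    std::uint32_t old = word.load(std::memory_order_relaxed);
    // If another writer changed `word` between the load and the store,
    // compare_exchange_weak fails and refreshes `old` with the current
    // value, so the next attempt computes `old | bits` from fresh data.
    while (!word.compare_exchange_weak(old, old | bits,
                                       std::memory_order_relaxed,
                                       std::memory_order_relaxed)) {
        // retry
    }
    return old;
}
```

The loop makes the hazard explicit: the store is only allowed to land if nothing else modified the location while the new value was being computed.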
Replying to @brrrianb @TimSweeneyEpic and
Is there any good explanation on what a ROP does and how it is designed in hardware?
2 replies 0 retweets 0 likes
https://en.m.wikipedia.org/wiki/Render_output_unit … Generally, a ROP is a simple ALU that operates asynchronously at the end of the computing pipeline to merge its output into memory without stalling the front-end.