Are relaxed atomic loads and stores on GPUs any more expensive than unsynchronized global loads and stores? https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html suggests the only CPU arch where there's a difference is Itanium.
Doesn’t seem to be the case on ARM: https://gcc.godbolt.org/z/xxt300
Or x86-64: https://gcc.godbolt.org/z/Pz8YDF
-
I assume you’re referring to optimization and not the machine instructions?
-
I’m only talking about the machine instructions; obviously the semantics are different.
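For anyone who wants to check the machine-instruction question themselves, here is a minimal C++ sketch of the kind of comparison that can be pasted into a compiler explorer. It is not the content of the godbolt links above; the function and variable names are made up for illustration, and it only covers CPU targets, not GPUs.

```cpp
#include <atomic>

// Hypothetical globals for the comparison; the names are illustrative only.
int plain_value = 0;
std::atomic<int> atomic_value{0};

// Plain (unsynchronized) load and store.
int load_plain() { return plain_value; }
void store_plain(int v) { plain_value = v; }

// Relaxed atomic load and store: atomic, but no ordering constraints
// relative to other memory operations.
int load_relaxed() { return atomic_value.load(std::memory_order_relaxed); }
void store_relaxed(int v) { atomic_value.store(v, std::memory_order_relaxed); }
```

Compiling with something like `g++ -O2 -S` for the target of interest and comparing the bodies of `load_plain` against `load_relaxed` (and the two stores) shows whether the emitted instructions differ on that target. As noted in the thread, this says nothing about what the optimizer is allowed to do around each access.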
-
He only owns the instructions on NVIDIA GPUs

-
Ah, gotcha.