The bottleneck in fine-grained multithreading is the cost of interlocked operations like LOCK XADD, LOCK OR, etc. In the case where code doesn't depend on the opcode's result or flags, couldn't these be made free by extending CPU write buffers to support GPU-style ROPs?
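To make the "result not used" case concrete, here is a minimal C++ sketch. The helper name `run_fire_and_forget` and its parameters are hypothetical, chosen only for illustration. Each thread performs atomic read-modify-write operations and throws away the returned value. On x86, compilers typically still lower these to LOCK-prefixed instructions (e.g. LOCK ADD, LOCK OR) even with relaxed ordering, which is the cost the post is asking about.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: every thread bumps a shared counter and sets one bit
// in a shared flag word, never reading the value the atomic op returns.
// Assumes num_threads <= 32 so each thread gets its own flag bit.
std::uint64_t run_fire_and_forget(int num_threads, int iters_per_thread,
                                  std::atomic<std::uint32_t>& flags) {
    std::atomic<std::uint64_t> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, &flags, iters_per_thread, t] {
            for (int i = 0; i < iters_per_thread; ++i) {
                // Return value discarded: a pure "add to memory" request.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
            // Return value discarded: a pure "OR into memory" request.
            flags.fetch_or(1u << t, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    // Safe to read after join(): all worker writes are visible here.
    return counter.load(std::memory_order_relaxed);
}
```

Since nothing in the loop depends on the old value or the flags, the thread only needs the update to land eventually, which is the property a ROP-style write buffer would try to exploit.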
I think this type of code will play an important role in the future with many-core CPUs, but writing it today feels like writing a renderer on a 486: it works, yet it’s brutally inefficient, and it is just a couple of microarchitectural steps away from excellence.
-
Yes, but modern rendering is very different from the 486 era: smart non-parallel algorithms were replaced by brute-force parallel ones (e.g. a z-buffer instead of triangle sorting or BSP trees). IMO future CPUs will follow this pattern (Amdahl's law and x86's strict memory model).
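For reference, Amdahl's law as it is usually stated, with $p$ the fraction of the work that can run in parallel and $N$ the number of cores. The values $p = 0.95$ and $N = 64$ below are illustrative assumptions, not figures from the thread:

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}

% Illustrative numbers (assumed): p = 0.95, N = 64
S(64) = \frac{1}{0.05 + \dfrac{0.95}{64}} \approx 15.4,
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{0.05} = 20
```

Under these assumed numbers, the 5% serial remainder caps the speedup at 20x regardless of core count, which is one way to read the argument for trading clever serial algorithms for brute-force parallel ones.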