The bottleneck in fine-grained multithreading is the cost of interlocked operations like LOCK XADD, LOCK OR, etc. In the case where code doesn't depend on the opcode's result or flags, couldn't these be made free by extending CPU write buffers to support GPU-style ROPs?
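To make the distinction in the question concrete, here is a minimal C++ sketch of an interlocked operation whose result is discarded versus one whose result is used. The names `hits`, `flags`, `record_hit`, and `take_ticket` are hypothetical, invented for illustration. On x86, compilers typically emit `lock add` / `lock or` for the first case and `lock xadd` for the second, though the exact instructions depend on the compiler and flags. Only the first case is the kind of write-only update the question has in mind.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical shared state, for illustration only.
std::atomic<std::uint32_t> hits{0};
std::atomic<std::uint32_t> flags{0};

// Result discarded: the caller never observes the old value or any flags,
// so in principle the update could be deferred to wherever the data lives.
void record_hit(std::uint32_t flag_bit) {
    hits.fetch_add(1, std::memory_order_relaxed);
    flags.fetch_or(flag_bit, std::memory_order_relaxed);
}

// Result used: the old value feeds later code, so the core has to wait
// for the read-modify-write to complete before it can continue.
std::uint32_t take_ticket() {
    return hits.fetch_add(1, std::memory_order_relaxed);
}
```

Even with `memory_order_relaxed`, current x86 compilers still generate a lock-prefixed instruction for both functions, which is the cost the question is about.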
-
Yes, but modern rendering is very different from the 486 era: smart non-parallel algorithms were replaced by brute-force parallel ones (e.g. the z-buffer instead of triangle sorting or BSP trees). IMO future CPUs will follow this pattern (because of Amdahl's law and x86's strict memory architecture).
-
Since GPUs are already so awesome for parallel processing, I think we’ll see the further specialization of CPUs towards branchy, random-access, synchronization-heavy, transactional processing tasks rather than data parallelism.