Good explanation of the compute architecture of GPGPUs: SIMD < SIMT < SMT: parallelism in NVIDIA GPUs #gpgpu http://yosefk.com/blog/simd-simt-smt-parallelism-in-nvidia-gpus.html …
so iow a switch with many branches is inefficient because most time is spent not doing anything?
yeah, basically. at best, the dead lanes can be clock gated (but even that's not always possible)
sometimes you can get something in between, e.g. clock-gating register accesses at quad granularity
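
To put a rough number on the "most time is spent not doing anything" point, here is a toy Python model of a warp hitting a switch. It is only a sketch under stated assumptions, not how any particular GPU schedules work: a 32-lane warp, every case body costing the same, and each distinct case taken by at least one lane being run as its own pass with the other lanes masked off. The names `WARP_SIZE` and `warp_utilization` are made up for illustration.

```python
from collections import Counter

WARP_SIZE = 32  # assumption: a 32-lane warp


def warp_utilization(case_per_lane):
    """Fraction of issued lane-slots that do useful work for one switch.

    Toy model (assumption): each distinct case taken by any lane is executed
    as a separate pass over the whole warp, with non-matching lanes masked
    off, and every case body costs the same.
    """
    lanes = len(case_per_lane)
    passes = len(Counter(case_per_lane))  # one pass per distinct case taken
    useful = lanes                        # each lane does real work in exactly one pass
    issued = passes * lanes               # every pass occupies all lanes
    return useful / issued


if __name__ == "__main__":
    # All lanes take the same case: no divergence.
    print(warp_utilization([0] * WARP_SIZE))
    # Lanes spread over 4 cases.
    print(warp_utilization([i % 4 for i in range(WARP_SIZE)]))
    # Every lane takes a different case: worst case for this model.
    print(warp_utilization(list(range(WARP_SIZE))))
```

In this model utilization is simply one over the number of distinct cases taken, so a switch where the lanes scatter across many branches leaves most lane-slots idle. Clock gating the dead lanes saves power but does not win back those slots.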