Beginning to grok how to benchmark code running on an out-of-order CPU. For most code there is a critical dependency path of instructions that block each other. Any instruction not on that path may have zero impact on measured performance. However, if too many instructions not on
This and IACA are only for analyzing loops. So generally you put markers after the { and after the } of a loop, and then it assumes the loop will run forever and shows you that pipeline. They usually assume you ain't doing conditionals, I think.
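A minimal sketch of that marker placement, assuming IACA's `iacaMarks.h` header and its `IACA_START` / `IACA_END` macros. The `USE_IACA` switch and the `sum_floats` function are made up for illustration; without the switch the markers expand to nothing, so the file still compiles and runs as ordinary C.

```c
#include <stddef.h>

#ifdef USE_IACA
#include "iacaMarks.h"   /* ships with IACA; provides IACA_START / IACA_END */
#else
#define IACA_START       /* no-op fallback so this builds without IACA */
#define IACA_END
#endif

/* Hypothetical loop to analyze: a plain running sum. */
float sum_floats(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        IACA_START           /* right after the { of the loop */
        acc += x[i];         /* each add waits on the previous acc */
    }
    IACA_END                 /* right after the } of the loop */
    return acc;
}
```

The only dependency carried from one iteration to the next here is `acc`, so that chain is what the analyzer should report as the limit for the loop.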
That's presumably because they don't actually know what the branch predictor will do without live data, so they can't really give you any kind of estimate for the runtime at that point. You can just comment out either side of the if and analyze each side on its own, though!
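A rough sketch of the "comment out either side" trick, again assuming IACA's `IACA_START` / `IACA_END` macros from `iacaMarks.h`. The functions and the `USE_IACA` switch are hypothetical. Each variant keeps one side of the if, so the marked loop body has no conditional left for the analyzer to guess about.

```c
#include <stddef.h>

#ifdef USE_IACA
#include "iacaMarks.h"   /* ships with IACA; provides IACA_START / IACA_END */
#else
#define IACA_START       /* no-op fallback so this builds without IACA */
#define IACA_END
#endif

/* The real loop (hypothetical): which side runs depends on the data. */
long branchy_sum(const int *x, size_t n) {
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] >= 0) acc += x[i];
        else           acc -= 1;
    }
    return acc;
}

/* Analysis variant 1: keep only the "then" side. */
long then_side_only(const int *x, size_t n) {
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        IACA_START
        acc += x[i];
        /* else side commented out: acc -= 1; */
    }
    IACA_END
    return acc;
}

/* Analysis variant 2: keep only the "else" side. */
long else_side_only(const int *x, size_t n) {
    long acc = 0;
    (void)x;                 /* the data is not read on this side */
    for (size_t i = 0; i < n; i++) {
        IACA_START
        /* then side commented out: acc += x[i]; */
        acc -= 1;
    }
    IACA_END
    return acc;
}
```

The two variants compute different results than the real loop, of course. They only exist to get a per-side pipeline estimate, and how the real loop behaves still depends on how well the branch predicts on live data.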
I haven't studied branch behavior on modern CPUs so I don't know exactly what normally happens there, but at least at one point the pipeline was flushed, making analysis "through" an if kind of useless, because the if was either free or there wasn't any pipelining at all!