Beginning to grok how to benchmark code running on an out-of-order CPU. For most code there is a critical dependency path of instructions that block each other. Any instruction not on that path may have zero impact on measured performance. However, if too many instructions not on
the critical path are queued up together, they will fill up the instruction queue, and the CPU can't continue running the instructions on the critical path until they are cleared. So telling what instructions block the critical path is non-trivial.
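The critical-path idea can be sketched with two versions of the same sum. This is only an illustration, not code from the thread: the function names are made up, and whether the second version is actually faster depends on the CPU, the compiler and its flags. `double` is used because, without fast-math style flags, the compiler is not allowed to reorder the additions, so the single accumulator really is one long chain of dependent adds.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: one accumulator. Each add needs the result of the
// previous add, so the adds form a single critical dependency path.
double sum_single_chain(const std::vector<double>& v) {
    double acc = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        acc += v[i];
    }
    return acc;
}

// Hypothetical example: four accumulators. The adds are split into four
// independent chains, which an out-of-order core may be able to overlap.
double sum_four_chains(const std::vector<double>& v) {
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    // Leftover elements (fewer than four) go into the first accumulator.
    for (; i < n; ++i) {
        a0 += v[i];
    }
    return (a0 + a1) + (a2 + a3);
}
```

Both functions execute the same number of adds, so any timing difference between them would come from the dependency structure rather than the instruction count. Note that the two can differ in the last bits for general floating-point input, since the order of additions changes.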
Replying to @NoHatCoder
MCA (llvm-mca) is getting not terrible, and can be run from Godbolt now: https://godbolt.org/z/jS4uHd
Replying to @cmuratori
Interesting, but seems to have a lot of shortcomings. It doesn't really do branches? Seems like it just assumes that every instruction is going to run once.
Replying to @NoHatCoder
This and IACA are only for analyzing loops. So generally you put markers after the { and after the } of a loop, and then it assumes that the thing will run forever and shows you that pipeline. They usually assume you ain't doing conditionals, I think.
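For llvm-mca, the markers are assembly comments of the form `# LLVM-MCA-BEGIN` and `# LLVM-MCA-END`, which can be emitted from C++ through inline asm. A minimal sketch follows, assuming GCC or Clang targeting x86-64. The function name and the region name `loop_body` are made up for illustration, and here both markers sit inside the loop body so that only the body lands in the analyzed region. The inline asm statements can themselves change what the optimizer generates, so the marked code may not match the unmarked code exactly.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example: a small loop-carried computation with llvm-mca
// region markers around the loop body. The markers emit only assembly
// comments, so the function still computes the same result.
std::uint32_t rotate_xor_fold(const std::uint32_t* data, std::size_t n) {
    std::uint32_t h = 0;
    for (std::size_t i = 0; i < n; ++i) {
        __asm__ __volatile__("# LLVM-MCA-BEGIN loop_body" ::: "memory");
        h = (h << 5) ^ (h >> 27) ^ data[i];
        __asm__ __volatile__("# LLVM-MCA-END" ::: "memory");
    }
    return h;
}
```

The tool then treats the marked region as if it were executed repeatedly, which matches the "runs forever" assumption described above.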
Replying to @cmuratori @NoHatCoder
The reason for that is presumably just because they don't actually know what the branch predictor will do without live data, so they can't really give you any kind of estimate for the runtime at that point. You can just comment out either side of the if and get that, though!
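One way to picture "commenting out either side of the if" is to keep the branchy body next to two straight-line variants, one per side, and analyze each variant on its own. The functions below are hypothetical and only illustrate the split. The numbers from each variant describe that side alone and say nothing about misprediction cost.

```cpp
#include <cstdint>

// Hypothetical example: original body with a data-dependent branch.
std::uint32_t step_branchy(std::uint32_t x) {
    if (x & 1u) {
        x = x * 3u + 1u;
    } else {
        x >>= 1;
    }
    return x;
}

// Variant with only the "if" side kept, as if the else were commented out.
std::uint32_t step_odd_side_only(std::uint32_t x) {
    return x * 3u + 1u;
}

// Variant with only the "else" side kept, as if the if were commented out.
std::uint32_t step_even_side_only(std::uint32_t x) {
    return x >> 1;
}
```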
I haven't studied branch behavior on modern CPUs so I don't know exactly what normally happens there, but at least at one point the pipeline was flushed, making analysis "through" an if kind of useless, because the if was either free or there wasn't any pipelining at all!