@jqgregory Is there a more detailed explanation somewhere of the "pipeline stall" discussed in this passage from your book? I was asked about it, and I can't think of anything to which it could be referring. (https://imgur.com/76F458o )
Replying to @cmuratori
The idea came from this talk by Alexandrescu, I think: https://www.slideshare.net/andreialexandrescu1/three-optimization-tips-for-c (slide 16), but my example in the book is incorrect/confusing. I cite i++ in a LOOP as the example, but, as I actually point out in the second paragraph, the issue doesn't really apply to loop indices!
Replying to @jqgregory
[1/2] Does Alexandrescu have an explanation somewhere? That slide set doesn't really talk about it. I am having a very hard time convincing myself that there is a practical scenario where a[++i] compiles to a pipeline stall but a[i++] does not...
Replying to @cmuratori @jqgregory
[2/2] I can believe some compilers would better handle a[0] a[1] a+=2 than a[i++] a[i++], but a[++i] vs. a[i++] I can't really think of how the compiler would reliably produce more stalls with one than the other.
Replying to @cmuratori
Preincrement introduces a data hazard, because you need to wait for ++a to be calculated before it can be used. Whereas the value of a++ can be used immediately, while the increment op makes its way thru the pipeline sans data hazard.
Replying to @jqgregory @cmuratori
Sorry, ++i vs i++ (not ++a/a++). It matters because the very next thing you do is to use i to index into a[]. No useful work can be done in between. So with preinc you have to wait for the result of the inc to pop out the other end of the pipeline before indexing into a[].
Replying to @jqgregory
Sorry to keep beating this particular horse, but I still don't get it. What platform are we talking about? This is not true on x64, right, for several reasons, not the least of which is that memory addresses have built-in offsets which execute in the same cycle as the load.
Replying to @cmuratori @jqgregory
Separately, the add instruction is single-cycle and on at least three ports (for Intel anyways), so if the compiler does decide to preincrement the value, it's hidden by the load of the base address anyway.
Replying to @cmuratori @jqgregory
So if somebody has an example of some actual C code someone might write in the real world, where a pre-increment actually compiles to a cycle stall that would disappear if you switch to post-increment, I would like to see it so I can see what they're talking about.
My best guess is that this was not advice for modern processors; maybe it is based on older processors that had lower IPC?
Replying to @cmuratori @jqgregory
Here's the talk @incomputable gave associated with that presentation. Minute 18 is where he talks about it. So, yeah, he also stated data dependencies. This was back in 2012, btw.
It's clear you "get it" perfectly, but yeah, it does seem like it's a moot point on today's Intel CPUs. I'll verify, and if so I'll remove that section in the next edition. It's confusing at best, and the example is wrong regardless. Thanks for bringing it to my attention.