How can OoO fetch multiple items concurrently? I feel like I'm missing something...
"for (i = 0; i < N; ++i) doThing(x[i]);": doThing for i=1 can start before i=0 finishes (if there are no data deps between iterations)
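To make the "no data deps" condition concrete, here is a minimal sketch in C. The names do_thing, map_independent and chain_dependent are made up for illustration and are not from the thread. The first loop has the shape of the example above, so an out-of-order core is free to overlap its iterations. The second has a loop-carried dependency, so each iteration has to wait for the previous result. Whether and how far a given CPU actually overlaps them depends on the core; the C code only shows the dependency structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for doThing: a pure function of one value. */
static int do_thing(int v) { return v * v + 1; }

/* Independent iterations: out[i] depends only on x[i].
 * Nothing computed for i=0 is needed for i=1, so the hardware
 * may have several iterations in flight at once. */
static void map_independent(const int *x, int *out, size_t n) {
    assert(x != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i)
        out[i] = do_thing(x[i]);
}

/* Loop-carried dependency: iteration i needs acc from iteration i-1.
 * The loads of x[i] can still be issued early, but the do_thing
 * calls form a serial chain. */
static int chain_dependent(const int *x, size_t n) {
    assert(x != NULL);
    int acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc = do_thing(acc + x[i]);
    return acc;
}
```

Both functions are ordinary sequential C and return the same results regardless of how the CPU schedules them. The overlap is invisible to the program except in how long it takes.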
in practice it's not rare for the front end (FE) to be 3+ iterations ahead in short loops
yes! Often hundreds of instructions at a time in some phase of being processed.
Replying to @stephentyrone @rygorous and
Is that newer/Intel-specific, or also true of mid/late-2000s AMD chips (K9 & K10)?
Replying to @cr88192 @stephentyrone and
well over 100 instructions in flight is true of current Intel, AMD, and high-end ARM chips.
How does that avoid P4-era pathologically-bad performance on misses?
Replying to @RichFelker @rygorous and
better predictors, better at discarding results without needing to flush everything.
Replying to @stephentyrone @RichFelker and
i.e. finer grained tracking of everything. Power/area/complexity cost is worth it.
Replying to @stephentyrone @rygorous and
Why can't we just use that area for 100x as many super-dumb cores?
make -j800 ftw