Huh. SIMD often loses big time to the naive algorithm if it causes you to read more memory. Memory access latency is king!
Replying to @littlecalculist
@littlecalculist If some elements in your input have less data than the size of your SIMD registers, then you end up reading extraneous data.
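The extraneous-read point can be made concrete with a little arithmetic. The following is a minimal C sketch under assumptions that are not in the thread: 16-byte (128-bit) registers, and each element processed with one full-width load per 16-byte chunk, so short elements still cost a whole register's worth of memory traffic. The helper names `bytes_needed` and `bytes_loaded` are hypothetical.

```c
#include <stddef.h>

/* Assumption: 128-bit SIMD registers (e.g. SSE or NEON). */
#define VEC_BYTES 16

/* Bytes the algorithm actually needs: the sum of element lengths. */
static size_t bytes_needed(const size_t *lens, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += lens[i];
    return total;
}

/* Bytes fetched when every element is read with full-width loads:
   one VEC_BYTES load per started 16-byte chunk, and (by assumption)
   at least one load even for an empty element. */
static size_t bytes_loaded(const size_t *lens, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        size_t loads = (lens[i] + VEC_BYTES - 1) / VEC_BYTES;
        if (loads == 0)
            loads = 1;
        total += loads * VEC_BYTES;
    }
    return total;
}
```

For elements of 3, 5, 16 and 20 bytes, this model needs 44 bytes but loads 80, so nearly half the memory traffic is wasted, which is exactly the kind of extra reading that can let a naive scalar loop win.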
Replying to @pcwalton
@pcwalton @littlecalculist Are you reading values that straddle cache lines?
Replying to @asbradbury
@asbradbury @littlecalculist @cartazio I'm using posix_memalign() to get aligned data and it's still a problem.
8:57 AM - 30 Mar 2014
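The thread doesn't say why aligned allocation didn't help. One possible reason, offered here only as an assumption: `posix_memalign()` aligns the start of the buffer, not every element inside it, so a 16-byte load at a later offset can still cross a cache-line boundary. Below is a minimal C sketch assuming 64-byte cache lines; `alloc_cache_aligned` and `straddles_cache_line` are hypothetical helpers, not part of any library.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines, common on current x86-64 CPUs. */
#define CACHE_LINE 64

/* Allocate `size` bytes whose start is cache-line aligned.
   posix_memalign returns 0 on success; we return NULL on failure. */
static void *alloc_cache_aligned(size_t size) {
    void *p = NULL;
    if (posix_memalign(&p, CACHE_LINE, size) != 0)
        return NULL;
    return p;
}

/* Returns 1 if a load of `size` bytes (size > 0) starting at `p`
   touches two different cache lines, 0 otherwise. */
static int straddles_cache_line(const void *p, size_t size) {
    uintptr_t start = (uintptr_t)p;
    uintptr_t last = start + size - 1; /* address of the last byte read */
    return (start / CACHE_LINE) != (last / CACHE_LINE);
}
```

With a buffer from `alloc_cache_aligned(256)`, a 16-byte load at offset 48 covers bytes 48 to 63 and stays within one line, while a 16-byte load at offset 56 covers bytes 56 to 71 and crosses into the next line, even though the buffer itself is aligned.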