Huh. SIMD often loses big time to the naive algorithm if it causes you to read more memory. Memory access latency is king!
@littlecalculist If some elements in your input have less data than the size of your SIMD registers then you end up reading extraneous data.
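To put rough numbers on that point, here is a minimal sketch in C. It assumes, purely for illustration, that every element is fetched with whole vector loads of `width` bytes; the helper names `simd_bytes_read` and `scalar_bytes_read` are hypothetical and not from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes touched when each element is fetched with
   full-width vector loads (assumes a register of `width` bytes). */
size_t simd_bytes_read(const size_t *lens, size_t n, size_t width) {
    assert(width > 0);
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Round each element up to a whole number of vector loads. */
        size_t loads = (lens[i] + width - 1) / width;
        total += loads * width;
    }
    return total;
}

/* Hypothetical helper: bytes touched when only the real data is read. */
size_t scalar_bytes_read(const size_t *lens, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += lens[i];
    }
    return total;
}
```

For element lengths of 3, 5, 16 and 20 bytes with a 16-byte register, the vector path touches 80 bytes against 44 for the naive one.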
@pcwalton @littlecalculist are you reading values that straddle cache lines?
@asbradbury @littlecalculist @cartazio I'm using posix_memalign() to get aligned data and it's still a problem.
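For reference, a minimal sketch in C of what a cache-line-aligned allocation and a straddle check could look like. The 64-byte line size is an assumption (common on x86-64, but not stated in the thread), and `alloc_cache_aligned` and `straddles_cache_line` are hypothetical helper names; only `posix_memalign()` comes from the conversation.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Returns 1 if a load of `width` bytes starting at `p` crosses a
   cache-line boundary, 0 otherwise. */
int straddles_cache_line(const void *p, size_t width) {
    assert(width > 0);
    uintptr_t first = (uintptr_t)p / CACHE_LINE;
    uintptr_t last = ((uintptr_t)p + width - 1) / CACHE_LINE;
    return first != last;
}

/* Allocates `n` bytes whose base address is cache-line aligned.
   Returns NULL on failure; release the buffer with free(). */
void *alloc_cache_aligned(size_t n) {
    void *p = NULL;
    if (posix_memalign(&p, CACHE_LINE, n) != 0) {
        return NULL;
    }
    return p;
}
```

Note that this only aligns the start of the buffer. A 16-byte load at offset 56 from that base still crosses into the next line, so an aligned allocation does not by itself rule out straddling loads further into the data.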