Traditionally it's not a good case for Intel's SIMD, because the instruction set wasn't designed very well :/ The LRB (Larrabee) instructions...
But yeah, SSE2-wise, it's pretty janky, and you tend to have to do mask-and-shift stuff.
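For a concrete picture of the mask-and-shift pattern, here is a minimal sketch. The thread doesn't say which operation is being discussed, so splitting bytes into nibbles is only an assumed example, and `split_nibbles_sse2` is a made-up name. SSE2 has no per-byte shift, so you shift the 16-bit lanes and then mask off the bits that leaked in from the neighbouring byte.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: split 16 bytes into their low and high nibbles. */
static void split_nibbles_sse2(const uint8_t in[16],
                               uint8_t lo[16], uint8_t hi[16])
{
    const __m128i mask = _mm_set1_epi8(0x0F);
    __m128i v = _mm_loadu_si128((const __m128i *)in);

    /* Low nibble: a plain mask is enough. */
    __m128i l = _mm_and_si128(v, mask);

    /* High nibble: there is no 8-bit shift in SSE2, so shift the 16-bit
       lanes right by 4 and mask away the bits that crossed over from
       the upper byte of each lane. */
    __m128i h = _mm_and_si128(_mm_srli_epi16(v, 4), mask);

    _mm_storeu_si128((__m128i *)lo, l);
    _mm_storeu_si128((__m128i *)hi, h);
}
```

That's four ALU ops plus the loads and stores for 16 bytes, which is roughly the point: when the scalar version is also just a shift and a mask, the win can be small.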
And sometimes, if there's not much actual math being done, SIMD just isn't faster :(