It's not a good case for Intel's SIMD traditionally, because it wasn't designed very well :/ The LRB instructions...
-
-
But yeah, SSE2-wise, it's pretty janky, and you tend to have to do mask-and-shift stuff.
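To make the "mask-and-shift stuff" concrete, here is a rough sketch in C, not anyone's production code. It assumes the loop keeps values greater than zero, which is an arbitrary stand-in condition, and `append_positive_sse2` is a hypothetical name. The compare runs four lanes at a time, but the append still walks the lane mask bit by bit.

```c
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h> /* SSE2 header; also pulls in the SSE float intrinsics */
#define HAVE_SSE2_SKETCH 1
#endif

/* Hypothetical helper: append every src[i] > 0 to dst, return how many
   were written. dst must have room for n floats. */
size_t append_positive_sse2(const float *src, size_t n, float *dst)
{
    size_t count = 0;
    size_t i = 0;

#ifdef HAVE_SSE2_SKETCH
    const __m128 zero = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);
        /* One bit per lane: bit k is set when src[i + k] > 0. */
        int mask = _mm_movemask_ps(_mm_cmpgt_ps(v, zero));
        /* No pack store here, so shift through the mask and store
           the surviving lanes one at a time. */
        for (size_t lane = 0; mask != 0; ++lane, mask >>= 1) {
            if (mask & 1) {
                dst[count++] = src[i + lane];
            }
        }
    }
#endif

    /* Scalar tail (and the whole loop on non-x86 targets). */
    for (; i < n; ++i) {
        if (src[i] > 0.0f) {
            dst[count++] = src[i];
        }
    }
    return count;
}
```

With only a compare as the "math", the per-lane store loop does about as much work as plain scalar code would.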
-
And sometimes if there's not much actual math being done, it's actually just not faster :(
-
-
Yes, this is a conditional "append" pattern, and you want a "pack store" instruction. LRBni and AVX-512 have that.
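For reference, a scalar sketch of what the conditional "append" pattern computes, which is the behavior a pack store gives you across a whole vector at once. The name `pack_store_positive` and the "greater than zero" test are assumptions for illustration only.

```c
#include <stddef.h>

/* Scalar reference for the conditional-append ("pack store") pattern:
   each element that passes the test is written to the next free slot
   in dst, so the survivors end up contiguous and in original order.
   pack_store_positive is a hypothetical name, and "value > 0" is an
   arbitrary stand-in for whatever condition the real loop tests. */
size_t pack_store_positive(const float *src, size_t n, float *dst)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (src[i] > 0.0f) {
            dst[count++] = src[i]; /* append, then advance the write cursor */
        }
    }
    return count; /* number of elements appended */
}
```

The write cursor only advances for elements that pass, which is the part plain SSE2 has no single instruction for.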
-
I *highly* recommend you not try to do this stuff with raw SSE intrinsics. Instead use https://ispc.github.io/