It's not a good case for Intel's SIMD traditionally, because it wasn't designed very well :/ The LRB instructions...
Replying to @cmuratori @OswaldHurlem
... have scatter/gather and masked writes, which make this sort of thing easier, but I'm not sure if they made it into AVX512.
Maybe ask @tom_forsyth if AVX512 has good stuff for this? I'm not sure.
But yeah, SSE2-wise, it's pretty janky, and you tend to have to do mask-and-shift stuff.
And sometimes if there's not much actual math being done, it's actually just not faster :(
6:00 PM - 4 Jul 2016