Bit hacking versus memoization: a Stream VByte example
https://lemire.me/blog/2017/11/28/bit-hacking-versus-memoization-a-stream-vbyte-example/ …
by @lemire
-
-
I have updated my post with a link to your tweet.
-
My team suggests that I have had only 1 idea in my whole career: use PSHUFB for everything. This is unfair; I also like PDEP/PEXT. The combination of these instructions is lethal for columnar database operations (i.e. fast lookup of packed 6-bit fields against a set).
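A rough sketch of what "lookup of packed 6-bit fields against a set" could look like. This is one plausible reading, not the method the tweet refers to. It assumes ten 6-bit fields packed into a 64-bit word and a set of 6-bit values stored as a 64-bit bitmap. `pext64_model` is a portable scalar model of the PEXT semantics (on BMI2 hardware the instruction itself would be used); all function names here are hypothetical.

```c
#include <stdint.h>

/* Scalar model of PEXT: gather the bits of src selected by mask
 * into the low bits of the result, preserving their order. */
uint64_t pext64_model(uint64_t src, uint64_t mask) {
    uint64_t out = 0;
    unsigned k = 0;
    for (; mask != 0; mask &= mask - 1) {       /* clear lowest set bit each step */
        uint64_t lowest = mask & (~mask + 1);   /* isolate lowest set bit */
        if (src & lowest) out |= (uint64_t)1 << k;
        k++;
    }
    return out;
}

/* Assumption: field idx (0..9) occupies bits [6*idx, 6*idx + 5]. */
unsigned field6(uint64_t packed, unsigned idx) {
    return (unsigned)pext64_model(packed, (uint64_t)0x3F << (6 * idx));
}

/* Assumption: the set of 6-bit values is a bitmap, bit v set iff v is a member. */
int in_set(uint64_t set_bitmap, unsigned value) {
    return (int)((set_bitmap >> (value & 63)) & 1);
}
```

A 6-bit value has only 64 possible states, so set membership fits in a single 64-bit word and the test is one shift and one mask.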
-
We have used pshufb for many papers: https://arxiv.org/abs/1503.03465 https://arxiv.org/abs/1503.07387 https://arxiv.org/abs/1611.07612 https://arxiv.org/abs/1704.00605
-
I've seen some of this work but not all. Thanks! It's a great instruction. Even the magic bit 7 can occasionally be useful. I was sad when we lost the 2 independent 128-bit PSHUFBs to get a single 256-bit one. VBMI will generalize the shuffle architecture to bytes (eventually).
-
How did we lose the two-lane byte shuffle? What do you mean by "lose"? Where did it go?
-
In the transition from IVB->HSW (Ivy Bridge to Haswell) we lost the ability to do 2 128-bit PSHUFBs per cycle (ports 1 and 5), gaining instead the ability to do 1 256-bit VPSHUFB per cycle (port 5 only).
-
Makes sense. So we still get the two 128-bit byte shuffles per cycle, but need to use 256-bit registers.
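For readers unfamiliar with the instruction, here is a portable scalar model of what the thread describes. It is a reference sketch of the semantics, not how anyone would implement it for speed, and the function names are hypothetical. It shows the "magic bit 7" (a control byte with its high bit set produces a zero byte) and the fact that the 256-bit VPSHUFB behaves like two independent 128-bit shuffles, one per lane.

```c
#include <stdint.h>

/* Scalar model of 128-bit PSHUFB: each control byte selects a source
 * byte by its low 4 bits, or yields 0 when bit 7 is set. */
void pshufb128_model(const uint8_t src[16], const uint8_t ctrl[16],
                     uint8_t dst[16]) {
    uint8_t tmp[16];  /* temporary so dst may alias src */
    for (int i = 0; i < 16; i++)
        tmp[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0F];
    for (int i = 0; i < 16; i++)
        dst[i] = tmp[i];
}

/* Scalar model of 256-bit VPSHUFB: the same shuffle applied to each
 * 128-bit lane separately; bytes never cross between lanes. */
void vpshufb256_model(const uint8_t src[32], const uint8_t ctrl[32],
                      uint8_t dst[32]) {
    pshufb128_model(src, ctrl, dst);                 /* low lane  */
    pshufb128_model(src + 16, ctrl + 16, dst + 16);  /* high lane */
}
```

Because the control index is only 4 bits wide, a byte in the high lane can never be pulled into the low lane or vice versa, which is why the 256-bit form is best thought of as two 128-bit shuffles issued together.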
End of conversation
New conversation -
-
-
Agreed. This is good and will be fast.
-
But! Don’t forget that you need to get the result back to scalar registers... which has non-trivial latency.
-
In the 'large' case you just leave the sums in a SIMD register and reduce once; reduction is the most painful part IMO. Generally you have to shift/add, shift/add, etc. However, a sneaky person would use PSADBW against zero, which gives you the sum of the bytes in each 64-bit half.
-
Transfer to GPR from SIMD is not that bad on IA by the way. Maybe 1-2 cycles and cheap. The close binding of SIMD registers to GPR (and control flow) is a huge strength of Intel Arch and I was saying this before I was an Intel employee :-)
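A minimal sketch of the PSADBW trick from the exchange above, assuming the goal is simply to sum an array of bytes. `sum_bytes` is a hypothetical helper name. On x86 builds with SSE2 it uses `_mm_sad_epu8` against a zero vector, keeps the partial sums in a SIMD register, and moves them to scalar registers only once at the end; on other targets the plain loop handles everything.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: sum of n unsigned bytes. */
uint64_t sum_bytes(const uint8_t *p, size_t n) {
    uint64_t total = 0;
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = _mm_setzero_si128();  /* two 64-bit partial sums */
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(p + i));
        /* PSADBW against zero: sum of |byte - 0| over each 8-byte half */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(v, zero));
    }
    /* Reduce once: move both halves out and add them. */
    uint64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, acc);
    total = lanes[0] + lanes[1];
#endif
    for (; i < n; i++)  /* tail bytes, or everything without SSE2 */
        total += p[i];
    return total;
}
```

The shift/add ladder is avoided entirely: each PSADBW already collapses eight bytes into one 64-bit value, so the only cross-lane step left is the final addition of the two halves.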
End of conversation