I wish SIMD instruction sets had better support for working with bit vectors. I keep running into use cases for it, and I can get substantial performance benefits with fairly contrived code, but it could be much better if they'd dedicated more instructions and hardware to this.
It's frustrating that there are so many fancy scalar bit operations without vector equivalents:
en.wikipedia.org/wiki/Bit_Manip
It's possible to get some substantial performance wins for bit manipulation with vector code, but I often end up needing 3-4 vector instructions, if not more, to replace a single scalar operation.
PDEP is an amazing instruction that covers many of my use cases. I've been using it for proof-of-concept implementations of various features I've been working on. It has no equivalent in AVX, and yet I can still speed up the other operations I need by vectorizing them with AVX...
If you aren't familiar with PEXT / PDEP:
randombit.net/bitbashing/201
They're similar to SIMD scatter-gather, but at the bit level, and they're very fast on Intel hardware (about 3 cycles of latency at a throughput of one per cycle). They're not so fast on older AMD hardware, unfortunately, but even there they're still way better than the alternative.
