What's up with pext and pdep with a memory operand on Zen/Zen2?
Yeah, the regular reg-reg ones are already terrible: 6 or 7 uops and 19-cycle latency. But with the memory operand (which should add a single uop), they are off the charts: more than *150* uops.
What?
I just ran some tests: the performance seems to depend heavily on the value in the last operand (the mask); this is also the case for the register variants. If the last operand is set to -1 (i.e., all bits are 1), the instruction has 518 uops and needs more than 289 cycles!
Here is the nanoBench (github.com/andreas-abel/n) command I used for the test: sudo ./kernel-nanoBench.sh -asm "PDEP R8, R9, R10" -asm_init "mov R10, -1" -conf configs/cfg_Zen_common.txt
So it just happened that the memory-arg test used a different value than the reg one, right?
So it seems like the whole thing is microcoded and works on a bit-by-bit basis!?
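To make the bit-by-bit idea concrete, here is a minimal software model of PDEP and PEXT that loops once per set bit of the mask (the last operand). The names pdep_model/pext_model and the iteration counter are hypothetical, for illustration only. This is a sketch of the instruction semantics under the assumption of a loop-per-mask-bit implementation, not AMD's actual microcode; it only shows why such an implementation would slow down as more mask bits are set.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical reference model of PDEP: deposit the low-order bits of src
   into the positions of the set bits of mask, lowest set bit first.
   *iterations (if non-NULL) receives the loop count, i.e. popcount(mask). */
static uint64_t pdep_model(uint64_t src, uint64_t mask, unsigned *iterations)
{
    uint64_t result = 0;
    unsigned n = 0;
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set bit */
        if (src & 1)
            result |= lowest;                 /* place next source bit there */
        src >>= 1;
        mask &= mask - 1;                     /* clear lowest set bit */
        n++;
    }
    if (iterations != NULL)
        *iterations = n;
    return result;
}

/* Hypothetical reference model of PEXT: gather the bits of src selected by
   mask and pack them contiguously into the low-order bits of the result. */
static uint64_t pext_model(uint64_t src, uint64_t mask, unsigned *iterations)
{
    uint64_t result = 0;
    unsigned k = 0;
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set bit */
        if (src & lowest)
            result |= (uint64_t)1 << k;       /* append selected bit */
        k++;
        mask &= mask - 1;                     /* clear lowest set bit */
    }
    if (iterations != NULL)
        *iterations = k;
    return result;
}
```

With mask = -1 (all 64 bits set), both loops run 64 times, while a mask of 0 skips the loop entirely, which matches the direction of the measurements above. On BMI2 hardware the results should agree with the `_pdep_u64`/`_pext_u64` intrinsics.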
Yes, the more bits are set in the mask, the slower it gets. For every additional set bit, there are 8 additional uops.
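Taking the two figures from this thread at face value (518 uops with an all-ones mask, 8 extra uops per set bit), a rough linear model is uops ≈ 6 + 8 × popcount(mask), since 518 − 8 × 64 = 6. This is a two-point fit assumed for illustration, not a measurement, and the helper names below are hypothetical.

```c
#include <stdint.h>

/* Assumed linear fit to the numbers quoted in this thread, not measured:
   8 extra uops per set mask bit, 518 uops for an all-ones 64-bit mask. */
#define UOPS_PER_MASK_BIT 8u
#define UOPS_ALL_ONES     518u

/* Portable popcount: clear the lowest set bit until none remain. */
static unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Estimated uop count for PDEP/PEXT on Zen/Zen2 under the assumed fit. */
static unsigned estimated_uops(uint64_t mask)
{
    unsigned base = UOPS_ALL_ONES - 64u * UOPS_PER_MASK_BIT; /* 518 - 512 = 6 */
    return base + UOPS_PER_MASK_BIT * popcount64(mask);
}
```

For example, an 8-bit mask such as 0xFF would come out at 6 + 8 × 8 = 70 uops under this model; actual counts should be checked with nanoBench.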