We grab 8x8 squares, load the rows in bit-reversed order (0,4,2,6,1,5,3,7), transpose, store the rows again in bit-reversed order, and need just one small lookup for the bit-reverse of the block start offset.
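For anyone who wants to try the idea, here is a rough scalar sketch in Python. It assumes the array length is a power of two and at least 64, and it works out of place; the names `REV3`, `bit_reverse`, and `bit_reverse_permute` are made up for illustration and are not from the original post. A real version would presumably do the 8x8 transpose in SIMD registers.

```python
# Hypothetical sketch: bit-reversal permutation via 8x8 tiles.
# Index layout is (a, b, c): a = top 3 bits, b = middle bits, c = low 3 bits.
# The element at (a, b, c) must end up at (rev3(c), rev(b), rev3(a)).

REV3 = (0, 4, 2, 6, 1, 5, 3, 7)  # 3-bit bit-reversal of 0..7


def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_reverse_permute(x):
    """Return a new list y with y[bit_reverse(i)] == x[i] for every index i."""
    n = len(x)
    if n < 64 or n & (n - 1):
        raise ValueError("length must be a power of two and at least 64")
    bits = n.bit_length() - 1
    stride = n // 8           # distance between consecutive tile rows
    num_blocks = n // 64      # one 8x8 tile per value of the middle bits
    # The one small lookup: bit-reverse of each block index (middle bits).
    block_rev = [bit_reverse(b, bits - 6) for b in range(num_blocks)]

    y = [None] * n
    for b in range(num_blocks):
        src = 8 * b               # start offset of the source tile
        dst = 8 * block_rev[b]    # start offset of the destination tile
        # Load the 8 rows in bit-reversed order.
        tile = [x[src + REV3[i] * stride: src + REV3[i] * stride + 8]
                for i in range(8)]
        # Transpose the 8x8 tile.
        tile_t = [list(col) for col in zip(*tile)]
        # Store the 8 rows, again in bit-reversed order.
        for j in range(8):
            start = dst + REV3[j] * stride
            y[start:start + 8] = tile_t[j]
    return y
```

Every load and store touches 8 contiguous elements, so all the scattered single-element moves are confined to the transpose of one small tile.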