Dougall (@dougallj) · Aug 26
New blog post: "Reading bits with zero refill latency"
Further optimising zlib-dougallj with a surprisingly simple change.
[Link: dougallj.wordpress.com, "Reading bits with zero refill latency": In my recent post on optimising zlib decompression for the Apple M1, I used a loop that refilled a bit-buffer and decoded a huffman code each iteration, based on variant 4 from Fabian Giesen’…]

Fabian Giesen (@rygorous) · 3:16 AM · Sep 6, 2022, replying to @dougallj
on the table build via transpose, here's how that looks in Oodle:
[Link: gist.github.com, "MSB-first -> LSB-first Huff table transpose (x86/SSE2 version)", transpose.cpp]

Fabian Giesen (@rygorous) · Sep 6, replying to @rygorous and @dougallj
we grab 8x8 squares, load rows with bit-reverse permutation (0,4,2,6,1,5,3,7), transpose, again store with bit-reverse permutation, and we need one small lookup for the bit-reverse of the block start offset

Fabian Giesen (@rygorous) · Sep 6, replying to @rygorous and @dougallj
(on x86; on ARM it's just a single rbit.)
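
The 8x8-block scheme described in the thread can be sketched in plain Python. This is only an illustration, not the Oodle code from the gist. It assumes the goal is a table `dst` with `dst[bit_reverse(i)] == src[i]`, that the table has `2**nbits` entries with `nbits >= 6`, and that each 8x8 square is made of 8 rows of 8 contiguous entries spaced `2**(nbits - 3)` apart. The names `REV3`, `bit_reverse` and `transpose_table` are hypothetical.

```python
# Sketch only: scalar Python stand-in for the SIMD 8x8 transpose described above.
# All names here are hypothetical; the real code is the x86/SSE2 gist.

REV3 = (0, 4, 2, 6, 1, 5, 3, 7)  # 3-bit bit-reverse permutation quoted in the thread


def bit_reverse(value, nbits):
    """Reverse the low `nbits` bits of `value`."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def transpose_table(src, nbits):
    """Return dst with dst[bit_reverse(i, nbits)] == src[i], built from 8x8 blocks."""
    if nbits < 6 or len(src) != 1 << nbits:
        raise ValueError("assumes nbits >= 6 and len(src) == 2**nbits")
    mid_bits = nbits - 6            # index bits between the top 3 and the low 3
    row_stride = 1 << (nbits - 3)   # distance between rows of one 8x8 square
    dst = [None] * len(src)
    for mid in range(1 << mid_bits):
        src_base = mid * 8
        # Bit-reverse of the block start offset (a small lookup table in practice).
        dst_base = bit_reverse(mid, mid_bits) * 8
        # Load rows with the bit-reverse permutation: rows[k] is source row REV3[k].
        rows = []
        for k in range(8):
            start = src_base + REV3[k] * row_stride
            rows.append(src[start:start + 8])
        cols = list(zip(*rows))     # the 8x8 transpose
        # Store with the bit-reverse permutation: column b goes to row REV3[b].
        for b in range(8):
            start = dst_base + REV3[b] * row_stride
            dst[start:start + 8] = cols[b]
    return dst
```

Why this works, under the assumptions above: an index splits into (top 3 bits, middle bits, low 3 bits), and reversing the whole index swaps the top and low fields while reversing each field. The transpose does the swap, the two row permutations reverse the 3-bit fields, and the block offset lookup reverses the middle bits.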