New blog post: Faster zlib/DEFLATE decompression on the Apple M1 (and x86)
dougallj.wordpress.com/2022/08/20/fas
Conversation
One more thing to try: for table-building, there's a usually much better way. There's two bitpacking conventions, and zlib (and also Oodle) use LSB-first bitpacking, which means you have bit reverses and scatters during the table build with the obvious approach.
For a canonical Huffman code, if you build a MSB-first decode table, the structure is _much_ simpler, since all the matching codewords (with different ignored suffix bits) are right next to each other. You can write them in big blocks.
1
2
Oodle has straight (len,value) pairs, 8b each, and for table build sets up a MSB-first uint8 code_lens[] and uint8 code_vals[], then runs a batch pass that interleaves lens and vals and applies a bit reverse permutation to the results.
1
1
Show replies

