Conversation

I love libdivide. Moving to libdivide-2.0 in hardened_malloc is an easy win. 16 byte malloc microbenchmark on Broadwell-E: Hardware division: 1s libdivide-1.1: 0.74s libdivide-2.0: 0.71s In a lightweight build: Hardware division: 1s libdivide-1.1: 0.62s libdivide-2.0: 0.59s
4
76
Replying to
I never used to think about it but I'm come to realize that integer division is ridiculously slow and it stands out to me in code that's supposed to perform well. The trick with libdivide is that it's doing the same kind of division by a constant optimizations as compilers.
1
6
Replying to and
So it's doing a fair bit of work to figure out the proper shifts / multiplications when you set up the divisor and then you reuse it many times. In hardened_malloc, it sets up a slab size divisor and size divisor for each size class, and then uses those to find the metadata.
1
2
Replying to and
Once it has the slab metadata, it needs to figure out the bit index in the bitmap(s) corresponding to the slot within the slab. Division is crucial to this approach to fully out-of-line metadata without needing something like a hash table and without needing power of two sizes.
1