I love libdivide. Moving to libdivide-2.0 in hardened_malloc is an easy win.
16 byte malloc microbenchmark on Broadwell-E:
Hardware division: 1s
libdivide-1.1: 0.74s
libdivide-2.0: 0.71s
In a lightweight build:
Hardware division: 1s
libdivide-1.1: 0.62s
libdivide-2.0: 0.59s
Conversation
Replying to
this is amazing. I guess divide isn't a bottleneck often enough for Intel to give that unit a lot of attention?
3
1
Replying to
I never used to think about it but I'm come to realize that integer division is ridiculously slow and it stands out to me in code that's supposed to perform well. The trick with libdivide is that it's doing the same kind of division by a constant optimizations as compilers.
1
6
So it's doing a fair bit of work to figure out the proper shifts / multiplications when you set up the divisor and then you reuse it many times. In hardened_malloc, it sets up a slab size divisor and size divisor for each size class, and then uses those to find the metadata.
1
2
Replying to
ok, this answers my question "why doesn't the compiler just emit the right thing in the first place"
3
1
fwiw the same “slightly too sophisticated for compilers” problem comes up when implementing a decimal floating-point library. Ideally you'd like to write:
uint64_t power_of_ten[17]={1,10,100,1000…};
and then let the compiler optimize “e/power_of_ten[k]” but current compilers…
2
1
I didn't take away from Daniel's comment that it's too sophisticated, but rather that the compiler lacks the knowledge that the same divisor would be used many times
1
So for example, for free(p), it figures out it's a slab allocation with an address range check:
github.com/GrapheneOS/har
Next, a subtraction and division by a constant (size class region size) to determine the size class and get the size class index:
github.com/GrapheneOS/har
Arenas make that a bit more complicated, since it has to determine the arena too. No need for libdivide yet though, since both are compile-time constants. Next, it uses a similar division by slab size to find slab metadata index in the metadata region:
github.com/GrapheneOS/har
1
Slab size is based on the size class. The sizes / slab sizes are all known at compile-time, and as pointed out I could make a constant array of the libdivide divisors, but it wouldn't change much since they're tiny and one time initialization doesn't take that long.
1
Show replies


