I love libdivide. Moving to libdivide-2.0 in hardened_malloc is an easy win.
16-byte malloc microbenchmark on Broadwell-E:
Hardware division: 1s
libdivide-1.1: 0.74s
libdivide-2.0: 0.71s
In a lightweight build:
Hardware division: 1s
libdivide-1.1: 0.62s
libdivide-2.0: 0.59s
I’m surprised libdivide would need malloc at all (knowing next to nothing about how division is implemented). Could it be even faster using fixed allocation?
I'm not saying libdivide uses malloc. I use libdivide in hardened_malloc to divide by the slab size (to find the index of the slab metadata) and by the size class (to find the bitmap index for the slot in the slab): github.com/GrapheneOS/har. The benchmark is of hardened_malloc itself.
Using libdivide in hardened_malloc instead of hardware division cuts 25-40% off the runtime for a malloc microbenchmark, depending on the configuration. It's in the 25-30% range with all the optional security features enabled, since the relative cost of the divisions gets smaller.
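To make the idea concrete, here is a minimal sketch of dividing by a divisor that is fixed up front, which is the situation described in the reply above (a slab size and a size class that do not change between calls). It multiplies by a precomputed reciprocal and shifts, instead of issuing a hardware divide. This is a simplified textbook scheme for 32-bit operands, not libdivide's actual algorithm or API, and every name in it (`fast_div`, `fast_div_init`, `fast_div_do`, `slab_index`, `slot_index`) is hypothetical rather than taken from hardened_malloc.

```c
#include <assert.h>
#include <stdint.h>

/* Precomputed reciprocal for a divisor that is fixed at init time.
 * Hypothetical type for illustration; this is not libdivide's API. */
struct fast_div {
    uint64_t magic; /* ceil(2^64 / d) */
};

/* Done once per divisor, e.g. when a size class is set up. */
static struct fast_div fast_div_init(uint32_t d) {
    assert(d >= 2); /* d == 1 would make the magic number wrap to 0 */
    struct fast_div f = { UINT64_MAX / d + 1 };
    return f;
}

/* Returns n / d as the top 64 bits of the 96-bit product magic * n.
 * The product is split into two 64-bit multiplies so no 128-bit type is needed. */
static uint32_t fast_div_do(uint32_t n, const struct fast_div *f) {
    uint64_t hi = f->magic >> 32;
    uint64_t lo = f->magic & 0xffffffffu;
    return (uint32_t)((hi * n + ((lo * n) >> 32)) >> 32);
}

/* Assumed shape of the two lookups: offset within a region -> slab index,
 * offset within a slab -> slot (bitmap) index. */
static uint32_t slab_index(uint32_t region_offset, const struct fast_div *slab_size) {
    return fast_div_do(region_offset, slab_size);
}

static uint32_t slot_index(uint32_t slab_offset, const struct fast_div *slot_size) {
    return fast_div_do(slab_offset, slot_size);
}
```

The cost of computing the magic number is paid once per divisor, so the approach only helps when the same divisor is reused many times, as it is for per-size-class values in an allocator.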

