I love libdivide. Moving to libdivide-2.0 in hardened_malloc is an easy win.
16-byte malloc microbenchmark on Broadwell-E:
Hardware division: 1s
libdivide-1.1: 0.74s
libdivide-2.0: 0.71s
In a lightweight build:
Hardware division: 1s
libdivide-1.1: 0.62s
libdivide-2.0: 0.59s
This is compared against hardware division on a modern x86 CPU, so consider how much bigger the difference is on hardware without a hardware divider. The entire library is a single header with ~1.4k lines of code (~760 without C++ and SSE/AVX). Did I mention it's awesome?
It gives you compiler-style optimizations for division by a constant where the divisor is a runtime constant rather than a compile-time constant. In hardened_malloc, I use it to quickly perform division by the allocation size and slab size. It makes the out-of-line metadata fast.
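To make the idea concrete, here is a minimal sketch of division by a runtime constant: pay for one hardware division up front to derive a "magic" multiplier, then turn every later division into multiplies and shifts. This is not libdivide's code or API. The names `fast_divisor`, `fast_divisor_gen` and `fast_divisor_do` are made up for illustration, and the scheme shown (a 64-bit magic number for 32-bit numerators, divisor at least 2) is a simplified variant of the general technique.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical precomputed divisor; not libdivide's actual struct layout. */
struct fast_divisor {
    uint64_t magic; /* ceil(2^64 / d) */
};

/* One-time setup: the only place a hardware division is performed. */
static struct fast_divisor fast_divisor_gen(uint32_t d) {
    assert(d >= 2); /* d == 1 would wrap the magic number to 0 */
    struct fast_divisor div = { UINT64_MAX / d + 1 };
    return div;
}

/* Hot path: n / d is bits 64..95 of the 96-bit product magic * n,
 * assembled here from two 64-bit partial products. */
static uint32_t fast_divisor_do(uint32_t n, const struct fast_divisor *div) {
    uint64_t hi = (div->magic >> 32) * n;         /* upper half of magic times n */
    uint64_t lo = (div->magic & 0xffffffffu) * n; /* lower half of magic times n */
    return (uint32_t)((hi + (lo >> 32)) >> 32);
}
```

In an allocator, the setup call would run once per size class, and the hot-path call would map an offset within a slab region to a slot index, assuming offsets fit in 32 bits as they do in this sketch.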
Replying to
Wouldn't it be preferable to generate the constants statically at build time for all possible sizes rather than having wasteful init code and copies per process?
Replying to
The libdivide divisor structs are tiny: github.com/ridiculousfish. I could make a script to generate a constant array of these, but it wouldn't necessarily be faster and the memory usage isn't significant since it's just 2 of these structs for each size class allocator as a whole.
I have a lot of lower-hanging fruit to address before considering that kind of micro-optimization. It's these 2 fields in the per-size-class slab allocator: github.com/GrapheneOS/har. It could be another one of these tables if I generated it when building: github.com/GrapheneOS/har.
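For a sense of what a build-time table could look like, here is a rough sketch, not what hardened_malloc does. It assumes a simplified magic-number scheme (64-bit magic, 32-bit numerators, divisors of at least 2) rather than libdivide's structs, and the sizes 16, 32, 48 and 64 along with the names `DIV_MAGIC`, `example_size_magic` and `slot_index` are picked purely for illustration. With this scheme the magic numbers are constant expressions, so the table can be `static const` and end up in read-only data without a generator script.

```c
#include <stdint.h>

/* Compile-time magic number ceil(2^64 / d), valid for d >= 2
 * (illustrative scheme, not libdivide's). */
#define DIV_MAGIC(d) (UINT64_MAX / (d) + 1)

/* Hypothetical size classes 16, 32, 48, 64. Every initializer is a
 * constant expression, so the table can live in read-only data. */
static const uint64_t example_size_magic[] = {
    DIV_MAGIC(16), DIV_MAGIC(32), DIV_MAGIC(48), DIV_MAGIC(64),
};

/* offset / size for the given class, using only multiplies and shifts. */
static uint32_t slot_index(uint32_t offset, unsigned class_index) {
    uint64_t magic = example_size_magic[class_index];
    uint64_t hi = (magic >> 32) * offset;
    uint64_t lo = (magic & 0xffffffffu) * offset;
    return (uint32_t)((hi + (lo >> 32)) >> 32);
}
```

Whether this beats initializing the two fields at startup is exactly the open question above; the sketch only shows that the constant form is possible.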
Replying to
Imagine what happens if an attacker can corrupt the constants. Semantic constants should always be real rodata constants.