I love libdivide. Moving to libdivide-2.0 in hardened_malloc is an easy win.
16 byte malloc microbenchmark on Broadwell-E:
Hardware division: 1s
libdivide-1.1: 0.74s
libdivide-2.0: 0.71s
In a lightweight build:
Hardware division: 1s
libdivide-1.1: 0.62s
libdivide-2.0: 0.59s
Conversation
Replying to
this is amazing. I guess divide isn't a bottleneck often enough for Intel to give that unit a lot of attention?
3
1
Replying to
I never used to think about it but I'm come to realize that integer division is ridiculously slow and it stands out to me in code that's supposed to perform well. The trick with libdivide is that it's doing the same kind of division by a constant optimizations as compilers.
1
6
So it's doing a fair bit of work to figure out the proper shifts / multiplications when you set up the divisor and then you reuse it many times. In hardened_malloc, it sets up a slab size divisor and size divisor for each size class, and then uses those to find the metadata.
1
2
Replying to
ok, this answers my question "why doesn't the compiler just emit the right thing in the first place"
3
1
This Tweet was deleted by the Tweet author. Learn more
This Tweet was deleted by the Tweet author. Learn more
For my use case, I could technically try doing something like making a switch covering all possible divisors with each case performing division by a constant. I do know the set of divisors in advance but it's a specific one at runtime. Not sure Clang / GCC would handle it well.
1
I'm also not sure that would perform better, and it's a lot more work and it would be a bunch of code that needs to be kept consistent with the chosen size classes. Could also generate the libdivide divisors at compile-time and read them from a table but I don't think it'd help.
3
1
I think reading them from a table is a good idea. Any type of table lookup is controversial, sure, but it seems like the constants would pack down to a few cache lines. In practice loading constants like that will be almost free.
1
The reason it wouldn't help though is that it already has a convenient place to read them from at runtime. The only way a table would help is if the compiler was clever enough to do optimizations based on it which I wouldn't expect. It would just move them to a different place.
You mean libdivide is already doing a table lookup to get the magic constants?
1
I mean I'm already generating them once at initialization and storing them in a data structure that it already has to read from anyway. So, if I really wanted, I could generate them all at compile-time since I do know all the possible divisors but it wouldn't accomplish much.
1
1
Show replies


