I love libdivide. Moving to libdivide-2.0 in hardened_malloc is an easy win.

16 byte malloc microbenchmark on Broadwell-E:

Hardware division: 1s
libdivide-1.1: 0.74s
libdivide-2.0: 0.71s

In a lightweight build:

Hardware division: 1s
libdivide-1.1: 0.62s
libdivide-2.0: 0.59s
I never used to think about it, but I've come to realize that integer division is ridiculously slow, and it stands out to me in code that's supposed to perform well. The trick with libdivide is that it does the same kind of division-by-a-constant optimization that compilers do.
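
To make the division-by-a-constant idea concrete, here is a minimal sketch of the general technique: precompute a "magic" multiplier once per divisor, then replace each division with a multiply and a shift. This is not libdivide's API or its exact algorithm. The names `fast_divisor`, `fast_divisor_init`, and `fast_div` are hypothetical, the sketch handles only 32-bit numerators with divisors of 2 or more, and it assumes a compiler with the `unsigned __int128` extension (GCC or Clang on a 64-bit target).

```c
#include <stdint.h>

/* Hypothetical names for illustration only; this is not libdivide's API. */
typedef struct {
    uint64_t magic; /* ceil(2^64 / d), computed once per divisor */
} fast_divisor;

/* Precompute the multiplier. Assumes d >= 2: d == 1 would wrap the
 * multiplier to 0, and d == 0 is a division by zero. */
static fast_divisor fast_divisor_init(uint32_t d) {
    fast_divisor fd = { UINT64_MAX / d + 1 };
    return fd;
}

/* Compute n / d as the high 64 bits of a 64x64 -> 128-bit product,
 * so no hardware divide instruction is needed on the hot path.
 * unsigned __int128 is a GCC/Clang extension, not standard C. */
static uint32_t fast_div(fast_divisor fd, uint32_t n) {
    return (uint32_t)(((unsigned __int128)fd.magic * n) >> 64);
}
```

The one real division happens in `fast_divisor_init`, so the approach pays off when the same divisor is reused many times, for example a size-class divisor in an allocator.
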
fwiw the same “slightly too sophisticated for compilers” problem comes up when implementing a decimal floating-point library. Ideally you'd like to write: uint64_t power_of_ten[17]={1,10,100,1000…}; and then let the compiler optimize “e/power_of_ten[k]” but current compilers…
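
The tweet is cut off before it says what the author actually does, so the following is only a sketch of one common workaround, not the author's method. Instead of indexing a table, it spells out each power of ten as a literal in a `switch`, so every case is a division by a compile-time constant that the compiler can turn into a multiply and shift. The helper name `div_pow10` and the choice to return 0 for an out-of-range exponent are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: returns e / 10^k for 0 <= k <= 16.
 * Each case divides by a literal, so the compiler sees a constant divisor. */
static uint64_t div_pow10(uint64_t e, unsigned k) {
    switch (k) {
    case 0:  return e;
    case 1:  return e / UINT64_C(10);
    case 2:  return e / UINT64_C(100);
    case 3:  return e / UINT64_C(1000);
    case 4:  return e / UINT64_C(10000);
    case 5:  return e / UINT64_C(100000);
    case 6:  return e / UINT64_C(1000000);
    case 7:  return e / UINT64_C(10000000);
    case 8:  return e / UINT64_C(100000000);
    case 9:  return e / UINT64_C(1000000000);
    case 10: return e / UINT64_C(10000000000);
    case 11: return e / UINT64_C(100000000000);
    case 12: return e / UINT64_C(1000000000000);
    case 13: return e / UINT64_C(10000000000000);
    case 14: return e / UINT64_C(100000000000000);
    case 15: return e / UINT64_C(1000000000000000);
    case 16: return e / UINT64_C(10000000000000000);
    default: return 0; /* out of range for this sketch (assumption) */
    }
}
```

The trade-off is a branch on `k` in exchange for avoiding a hardware divide; whether that wins depends on the target and on how predictable `k` is, so it is worth measuring rather than assuming.
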