Clang has gotten clever enough to unroll all the iterations of this loop over the constant array. Unfortunately, the resulting code bloat kept it from inlining the function, and it wasn't clever enough to partially inline the fast paths. Bit manipulation to the rescue.
This is a nice performance win for sizes above the special case for the 16-byte spacing class. There isn't much more to optimize in the baseline hardened_malloc implementation. A few of the expensive optional features could still be significantly optimized or tuned to be lighter.
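For readers who want to see the general idea, here is a rough sketch of swapping a table scan for bit manipulation. It is not the actual hardened_malloc code. The table contents, the 1024-byte cap, and every function name are assumptions made up for illustration, and `__builtin_clzll` is a GCC/Clang builtin rather than standard C.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical size class table (an assumption, not taken from the source):
// 16-byte spacing up to 128, then four classes per doubling.
static const size_t size_classes[] = {
    16, 32, 48, 64, 80, 96, 112, 128, // spacing 16
    160, 192, 224, 256,               // spacing 32
    320, 384, 448, 512,               // spacing 64
    640, 768, 896, 1024,              // spacing 128
};
#define N_SIZE_CLASSES (sizeof(size_classes) / sizeof(size_classes[0]))

// Baseline: linear scan over the constant array. A compiler may fully
// unroll this, which makes the function large.
static size_t size_class_index_loop(size_t size) {
    assert(size >= 1 && size <= 1024);
    for (size_t i = 0; i < N_SIZE_CLASSES; i++) {
        if (size <= size_classes[i]) {
            return i;
        }
    }
    return N_SIZE_CLASSES - 1; // not reached for valid input
}

// floor(log2(x)) for x > 0, using a GCC/Clang builtin.
static inline unsigned floor_log2(size_t x) {
    return 63u - (unsigned)__builtin_clzll((unsigned long long)x);
}

// Branch-light version: the index is computed from the bits of size - 1.
static size_t size_class_index_bits(size_t size) {
    assert(size >= 1 && size <= 1024);
    if (size <= 128) {
        // special case: the 16-byte spacing class
        return (size - 1) >> 4;
    }
    // Each doubling is split into four classes, so the spacing is
    // 2^(floor(log2(size - 1)) - 2).
    unsigned shift = floor_log2(size - 1) - 2;
    // (shift - 5) * 4 counts the classes in earlier doublings above 128;
    // (size - 1) >> shift is in 4..7 and picks the class in this doubling.
    return (shift - 5) * 4 + 4 + ((size - 1) >> shift);
}

// Returns 1 if both lookups agree for every size in 1..1024.
static int lookups_agree(void) {
    for (size_t s = 1; s <= 1024; s++) {
        if (size_class_index_loop(s) != size_class_index_bits(s)) {
            return 0;
        }
    }
    return 1;
}
```

The bit version is a handful of instructions with a single branch, so it is small enough to inline at every call site, whereas the unrolled scan is not.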
