Mapping from addresses to slab metadata can be done in various ways, such as the alignment trick used by jemalloc, PartitionAlloc and OpenBSD malloc. That isn't needed here, because isolated regions are reserved for slab allocations and for metadata (for slabs, for large allocations, and for all other state).
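A rough sketch of why dedicated regions make the alignment trick unnecessary. It assumes a hypothetical layout: one reserved region split into a fixed-size subregion per size class, with slab metadata kept in a separate array rather than inside the slabs. The names, sizes and `lookup` helper are illustrative only, not taken from any real allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout parameters, chosen small for illustration. */
#define N_CLASSES 4
#define CLASS_REGION_SIZE ((uintptr_t)1 << 20) /* 1 MiB reserved per size class */
static const size_t slab_size[N_CLASSES] = {4096, 4096, 8192, 16384};

struct slab_metadata {
    uint64_t bitmap; /* one bit per slot: 1 = allocated */
};

/* Metadata lives in its own array, never inside the slabs themselves. */
#define MAX_SLABS_PER_CLASS (CLASS_REGION_SIZE / 4096)
static struct slab_metadata metadata[N_CLASSES][MAX_SLABS_PER_CLASS];

/* Returns the metadata for the slab containing p, or NULL if p is outside
   the slab region that starts at region_base. Pure arithmetic on the offset:
   no pointer alignment tricks and no global search structure. */
static struct slab_metadata *lookup(uintptr_t region_base, uintptr_t p) {
    if (p < region_base || p >= region_base + N_CLASSES * CLASS_REGION_SIZE)
        return NULL;
    uintptr_t offset = p - region_base;
    size_t cls = offset / CLASS_REGION_SIZE;
    size_t slab = (offset % CLASS_REGION_SIZE) / slab_size[cls];
    return &metadata[cls][slab];
}
```

Because every size class owns a fixed range of address space, the size class and slab index fall straight out of the pointer's offset from the region base.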
That could still be a perfectly suitable approach in a purely performance-oriented allocator. I'd just throw away all the security features layered on top, replace the bitmaps with free lists and add small array-based thread caches like jemalloc's to amortize the cost of locking.
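A minimal sketch of what a small array-based thread cache does, assuming a hypothetical `central_alloc_batch` that stands in for the locked per-size-class allocator. The point is that the lock is taken once per batch instead of once per allocation. Capacity, batch size and the fixed pool are illustrative, and `cached_free` simply refuses when full where a real cache would flush a batch back.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_CAPACITY 8
#define BATCH 4

/* Hypothetical central pool standing in for a locked size-class allocator. */
static char pool[64][16];
static size_t pool_next;
static size_t central_lock_acquisitions; /* how often the "lock" would be taken */

static size_t central_alloc_batch(void **out, size_t n) {
    central_lock_acquisitions++; /* one lock round trip per batch */
    size_t got = 0;
    while (got < n && pool_next < 64)
        out[got++] = pool[pool_next++];
    return got;
}

struct thread_cache {
    void *slots[CACHE_CAPACITY];
    size_t count;
};

static _Thread_local struct thread_cache cache;

/* Fast path pops from the per-thread array; refills in batches when empty. */
static void *cached_alloc(void) {
    if (cache.count == 0)
        cache.count = central_alloc_batch(cache.slots, BATCH);
    if (cache.count == 0)
        return NULL;
    return cache.slots[--cache.count];
}

/* Pushes onto the per-thread array; a real cache would flush when full. */
static int cached_free(void *p) {
    if (cache.count == CACHE_CAPACITY)
        return -1;
    cache.slots[cache.count++] = p;
    return 0;
}
```

Freed memory is handed straight back to the same thread without any locking, which is exactly the kind of fast reuse a security-focused allocator avoids.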
Scalability primarily comes from having locking divided up per-arena and per-size-class within the arenas, so with 4 arenas, there are 4 separate sets of size class regions, each with their own entirely independent locking (there is no global or arena-level locking for slabs).
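A sketch of locking divided per arena and per size class, with no global or arena-level lock. Assumptions: threads are assigned to arenas round-robin on first use (a hypothetical policy), and a C11 `atomic_flag` spinlock stands in for a real mutex to keep the example dependency-free. `record_alloc` is a hypothetical placeholder for the real slab allocation path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define N_ARENAS 4
#define N_CLASSES 8

struct size_class {
    atomic_flag lock;  /* guards only this size class in this arena */
    size_t allocated;  /* stand-in for the real per-class state */
};

static struct size_class classes[N_ARENAS][N_CLASSES];
static atomic_uint next_arena;              /* hypothetical round-robin assignment */
static _Thread_local int thread_arena = -1; /* arena chosen by this thread */

static void init_classes(void) {
    for (int a = 0; a < N_ARENAS; a++)
        for (int c = 0; c < N_CLASSES; c++) {
            atomic_flag_clear(&classes[a][c].lock);
            classes[a][c].allocated = 0;
        }
}

static int get_arena(void) {
    if (thread_arena < 0)
        thread_arena = (int)(atomic_fetch_add(&next_arena, 1u) % N_ARENAS);
    return thread_arena;
}

/* Only the lock for (arena, size class) is taken, so threads contend only
   when they share both the same arena and the same size class. */
static void record_alloc(int cls) {
    struct size_class *sc = &classes[get_arena()][cls];
    while (atomic_flag_test_and_set_explicit(&sc->lock, memory_order_acquire))
        ; /* spin */
    sc->allocated++;
    atomic_flag_clear_explicit(&sc->lock, memory_order_release);
}
```

With 4 arenas and N size classes this gives 4 × N independent locks.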
Being solely focused on 64-bit support is liberating and is what makes it very different from OpenBSD malloc. It removes the need for alignment tricks or global data structures and makes it possible to never mix or reuse address space between size classes or metadata.
Guard slabs also only hurt performance by making the address space sparser: the allocator simply skips them and they never get unprotected. With the default guard slab interval of 1, slabs with a single slot get guaranteed guards on both sides at no direct or major performance cost.
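A sketch of how skipping guard slabs can be pure index arithmetic. It assumes a hypothetical layout where one guard slab follows every `interval` usable slabs inside a size class region, and that the region's edges are themselves inaccessible reserved memory, so the first slab is guarded on its lower side too. Guard slots are never handed out, so they simply stay inaccessible (for example, PROT_NONE) without any extra work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Position of usable slab n when one guard slot is skipped after every
   `interval` usable slabs (hypothetical layout). */
static size_t slab_slot(size_t n, size_t interval) {
    return n + n / interval;
}

/* True for the slots that are skipped and never made accessible. */
static int is_guard_slot(size_t slot, size_t interval) {
    return slot % (interval + 1) == interval;
}

static uintptr_t slab_address(uintptr_t class_base, size_t slab_size,
                              size_t n, size_t interval) {
    return class_base + slab_slot(n, interval) * slab_size;
}
```

With `interval == 1` the layout alternates slab, guard, slab, guard, so every slab, including one holding a single slot, sits between inaccessible neighbours.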
In a purely performance-oriented allocator, the slab allocator approach is faster, has lower fragmentation and uses far less memory for metadata. Metadata overhead is below 1% for 16 byte allocations with a slab allocator using free lists and a bit over 1% with bitmaps.
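The arithmetic behind those figures, worked through for a 4096 byte slab of 16 byte slots (256 slots). One bitmap costs 256 bits = 32 bytes, about 0.78% on its own. The 16 byte per-slab header below is a hypothetical number picked only to show how a fixed cost pushes the bitmap case just over 1%. An embedded free list adds nothing per slot because the links live in the free slots themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Metadata overhead in basis points (hundredths of a percent).
   header_bytes: hypothetical fixed per-slab cost.
   bitmaps: number of one-bit-per-slot bitmaps (0 for an embedded free list). */
static size_t overhead_bp(size_t slab_bytes, size_t slot_bytes,
                          size_t header_bytes, size_t bitmaps) {
    size_t slots = slab_bytes / slot_bytes;
    size_t bitmap_bytes = bitmaps * ((slots + 7) / 8); /* round up to bytes */
    return (header_bytes + bitmap_bytes) * 10000 / slab_bytes;
}
```

Under these assumptions: free list with header 0.39%, one bitmap alone 0.78%, one bitmap with header 1.17%, and a second (quarantine) bitmap with header 1.95%.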
It's a bit higher here, since it also has a bitmap for the quarantine in order to provide reliable double-free detection without needing something like a hash table tracking every element in the quarantine. Canaries also waste space, but memory tagging will replace them.
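A sketch of double-free detection with a per-slab quarantine bitmap instead of a hash table. Assumed behaviour, for illustration only: a freed slot keeps its "allocated" bit while it sits in quarantine, so it can't be reused, and its "quarantined" bit is set, so a second free is caught with a single bit test. Slot selection takes the lowest free bit for simplicity, whereas a hardened allocator would randomize it. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct slab {
    uint64_t allocated;   /* bit set: slot handed out or still quarantined */
    uint64_t quarantined; /* bit set: slot freed but held in quarantine */
};

/* Returns a free slot index, or -1 if the slab is full. */
static int slab_alloc(struct slab *s) {
    for (unsigned i = 0; i < 64; i++) {
        uint64_t bit = UINT64_C(1) << i;
        if (!(s->allocated & bit)) {
            s->allocated |= bit;
            return (int)i;
        }
    }
    return -1;
}

/* Returns 0 on success, -1 on an invalid or double free
   (a real allocator would abort here). */
static int slab_free(struct slab *s, unsigned slot) {
    uint64_t bit = UINT64_C(1) << slot;
    if (!(s->allocated & bit) || (s->quarantined & bit))
        return -1;
    s->quarantined |= bit; /* stays marked allocated, so no reuse yet */
    return 0;
}

/* Called when the slot leaves the quarantine and becomes reusable. */
static void slab_release_from_quarantine(struct slab *s, unsigned slot) {
    uint64_t bit = UINT64_C(1) << slot;
    s->quarantined &= ~bit;
    s->allocated &= ~bit;
}
```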
Since it doesn't support 32-bit, I do need to choose something else for legacy 32-bit components. That may mean enabling Scudo globally for 32-bit, since it's going to be around anyway. I prefer the slab allocator approach though, and it just offers better security properties.
I was contributing upstream to OpenBSD malloc, but what I wanted was too different and needing to support 32-bit is too limiting. Scudo's goals are probably closer to what I want than OpenBSD malloc's, but its fundamental low-level approach is much further from what I want.
There's probably some direct collaboration that could happen, but even without it, it's immensely helpful to me for Android to be officially supporting multiple malloc implementations, especially one that's going to run into a lot of the same latent memory corruption bugs.

