Conversation

There's some documentation on the approach in the README. It was written to be a replacement for the previous port of OpenBSD malloc and shares a lot of the basic design concepts with it. It does lean more towards security than performance and cares a lot about low fragmentation.
I would be using a slab allocator approach with out-of-line slab metadata even for a fully performance-oriented allocator, as the Linux kernel does. Using bitmaps instead of free lists is the main difference from a purely performance-oriented approach, but jemalloc uses bitmaps to pack data too.
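
A minimal sketch of the bitmap approach (illustrative layout and names, not the allocator's actual code): each slab tracks slot usage with one bit per slot in out-of-line metadata, and allocation scans for the first clear bit.

```python
class SlabBitmap:
    """Out-of-line bitmap tracking slot usage for one slab (sketch)."""
    def __init__(self, slots=256):   # e.g. a 4 KiB slab of 16-byte slots
        self.slots = slots
        self.bits = 0                # bit i set => slot i is allocated

    def alloc(self):
        for i in range(self.slots):  # find the first clear bit
            if not (self.bits >> i) & 1:
                self.bits |= 1 << i
                return i
        return None                  # slab is full

    def free(self, i):
        assert (self.bits >> i) & 1, "invalid or double free"
        self.bits &= ~(1 << i)

slab = SlabBitmap()
a = slab.alloc()            # slot 0
b = slab.alloc()            # slot 1
slab.free(a)
assert slab.alloc() == a    # lowest free slot is handed out again
```

A free list would instead thread pointers through the free slots themselves; the bitmap keeps that state out of line, at the cost of scanning for a clear bit.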
Mapping from addresses to the slab metadata can be done in various ways, like the alignment trick in jemalloc, PartitionAlloc and OpenBSD malloc. That's not needed here, since isolated regions are reserved for slab allocations and for metadata (slabs, large allocations, and all other state).
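
For contrast, the alignment trick mentioned above can be sketched like this (chunk size and names are assumed for illustration): when every chunk is aligned to its own size, masking off the low bits of any interior pointer recovers the chunk base where metadata lives.

```python
CHUNK_SIZE = 2 * 1024 * 1024  # assumed 2 MiB chunks, aligned to their size

def chunk_base(addr):
    # with size-aligned chunks, masking low bits recovers the chunk start
    return addr & ~(CHUNK_SIZE - 1)

chunk = 7 * CHUNK_SIZE      # some chunk base address
p = chunk + 0x1234          # an allocation inside that chunk
assert chunk_base(p) == chunk
```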
That could still be a perfectly suitable approach in a purely performance-oriented allocator. I'd just throw away all the security features layered on top, replace the bitmaps with free lists, and add small array-based thread caches like jemalloc's to amortize the cost of locking.
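
A rough sketch of such an array-based thread cache (names and capacity are mine, loosely modeled on jemalloc's tcache): frees fill a small per-thread array, allocations drain it, and the locked slab path is only hit when the array is empty or full.

```python
CACHE_CAP = 16  # assumed small cache capacity

class ThreadCache:
    """Array-based per-thread cache in front of a locked slab path."""
    def __init__(self, slab_alloc, slab_free):
        self.stack = []               # LIFO array of cached free slots
        self.slab_alloc = slab_alloc  # slow path: allocate under slab lock
        self.slab_free = slab_free    # slow path: free under slab lock

    def alloc(self):
        if self.stack:
            return self.stack.pop()   # fast path, no lock taken
        return self.slab_alloc()

    def free(self, p):
        if len(self.stack) < CACHE_CAP:
            self.stack.append(p)      # fast path, no lock taken
        else:
            self.slab_free(p)         # flush to the slab under its lock

# count how often the locked slow paths are actually hit
calls = {"alloc": 0, "free": 0}
def slow_alloc():
    calls["alloc"] += 1
    return object()
def slow_free(p):
    calls["free"] += 1

tc = ThreadCache(slow_alloc, slow_free)
p = tc.alloc()          # slow path: cache starts empty
tc.free(p)              # cached, lock-free
assert tc.alloc() is p  # served from the cache, no slow path
assert calls == {"alloc": 1, "free": 0}
```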
Scalability primarily comes from having locking divided up per-arena and per-size-class within the arenas, so with 4 arenas there are 4 separate sets of size class regions, each with its own entirely independent locking (there is no global or arena-level locking for slabs).
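
That locking scheme can be sketched as one lock per (arena, size class) pair (the arena count and size classes here are illustrative, not the real configuration):

```python
import threading

ARENAS = 4
SIZE_CLASSES = (16, 32, 48, 64)   # illustrative subset of size classes

# One fully independent lock per (arena, size class) region: there is no
# global or arena-wide lock on the slab allocation path.
locks = {(a, sc): threading.Lock()
         for a in range(ARENAS) for sc in SIZE_CLASSES}
regions = {key: [] for key in locks}  # stand-in for per-region slab state

def slab_allocate(arena, size_class):
    key = (arena, size_class)
    with locks[key]:              # contends only with this exact region
        regions[key].append("slot")
        return key

assert slab_allocate(0, 16) == (0, 16)
assert len(locks) == ARENAS * len(SIZE_CLASSES)  # 16 independent locks
```

Two threads in different arenas, or in different size classes of the same arena, never touch the same lock.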
Being solely focused on 64-bit support is liberating and is what makes it very different from OpenBSD malloc. It frees the allocator from needing the alignment tricks or global data structures, and it enables never mixing or reusing address space between size classes or metadata.
Guard slabs also only hurt performance by making the address space sparser. The allocator simply skips them, and they never get unprotected. With the default guard slab interval of 1, slabs with a single slot get guaranteed guards on both sides at no direct or major performance cost.
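
Assuming a layout where every other slab is a guard (my simplification of the interval-1 case), the skipping logic is just index arithmetic:

```python
GUARD_INTERVAL = 1  # default: one guard slab between real slabs (assumed layout)

def is_guard(slab_index, interval=GUARD_INTERVAL):
    # layout sketch for interval 1: guard, slab, guard, slab, ...
    return slab_index % (interval + 1) == 0

# with interval 1, every real slab has a guard on both sides; the guard
# slabs are simply skipped and never get unprotected
layout = ["G" if is_guard(i) else "S" for i in range(8)]
assert layout == ["G", "S", "G", "S", "G", "S", "G", "S"]
```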
In a purely performance-oriented allocator, the slab allocator approach is faster, has lower fragmentation, and has far lower memory usage for metadata. Metadata overhead is lower than 1% for 16-byte allocations with a slab allocator using free lists, and a bit over 1% with bitmaps.
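
The rough arithmetic behind those percentages, with figures assumed for illustration (a 4 KiB slab of 16-byte slots and a 16-byte per-slab header):

```python
slab_size = 4096                 # assumed 4 KiB slab
slot_size = 16
slots = slab_size // slot_size   # 256 slots of 16 bytes

# bitmap: one bit per 16-byte slot, i.e. 1/128 of the data it tracks
bitmap_bytes = slots / 8         # 32 bytes
assert bitmap_bytes / slab_size == 1 / 128        # ~0.78%

# a free list threaded through the free slots themselves needs no per-slot
# out-of-line state, only a small per-slab header (size assumed here)
header_bytes = 16
assert header_bytes / slab_size < 0.01            # well under 1%

# the bitmap plus the same header lands at "a bit over 1%"
assert 0.01 < (bitmap_bytes + header_bytes) / slab_size < 0.02
```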
It's a bit higher here, since there's also a bitmap for the quarantine in order to provide proper, reliable double-free detection without needing an approach like a hash table tracking every element in the quarantine. Canaries also waste space, but memory tagging will replace them.
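
A toy sketch of bitmap-based double-free detection for the quarantine (structure and names assumed): a quarantined slot keeps its bit set, so a second free of the same slot is caught immediately, with no hash table tracking quarantine membership.

```python
class Quarantine:
    """Toy quarantine with a bitmap for O(1) double-free detection."""
    def __init__(self):
        self.bits = 0     # bit i set => slot i is currently quarantined
        self.fifo = []    # quarantine order (simplified)

    def free(self, i):
        if (self.bits >> i) & 1:
            raise RuntimeError("double free of slot %d" % i)
        self.bits |= 1 << i
        self.fifo.append(i)

    def drain_one(self):
        i = self.fifo.pop(0)   # slot leaves quarantine, becomes reusable
        self.bits &= ~(1 << i)
        return i

q = Quarantine()
q.free(3)
try:
    q.free(3)                 # second free of the same slot
    raise AssertionError("double free not detected")
except RuntimeError:
    pass                      # caught by the bitmap check alone
assert q.drain_one() == 3
```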
Since it doesn't support 32-bit, I do need to choose something else for legacy 32-bit components. That may mean enabling Scudo globally for 32-bit, since it's already going to be around. I prefer the slab allocator approach though, and it simply offers better security properties.
There's probably some direct collaboration that could happen, but even without it, it's immensely helpful to me for Android to be officially supporting multiple malloc implementations, especially one that's going to run into a lot of the same latent memory corruption bugs.