
There's more to come soon, including high-level design for the new malloc. Might post after I get a bite to eat; if not, tomorrow.
The quick version: the design I've mostly worked out now is in some sense slab-like, but with the flexibility to avoid being overly eager to grow, and without tying slab-structure assumptions into the layer that will be the fast path.
That is to say, the outer layer that works with an "active group" of slots per size class could just as well be hooked up to the "bump allocator with free" in the non-candidates post, with no change to logic.
No, no per-cpu/thread caching. That compromises both optimal use of free memory (avoiding fragmentation) and security properties, since caches inherently lack a globally consistent view of what's free.
I think it's also problematic that the batching caused by thread caches increases latency at the 95th percentile and above. Thread caches can also use a ridiculous amount of memory as the number of threads scales up, especially if they aren't tiny caches only for the smallest sizes.
It mostly needs to be concerned with very long-lived processes doing tons of allocation, where latency is critical. Throughput does matter, but I don't particularly care about microbenchmarks beyond good-enough performance and figuring out what each security feature costs.
For Android specifically, it used dlmalloc for a long time, and hardened_malloc provides comparable performance with far finer-grained locking. It's possible future app / OS code tuned around jemalloc's thread caches won't perform as well, but new code is increasingly high-level anyway.