Does it make sense to spin up a thread for cleaning up (merging deallocations) in the global allocator?
Traditional malloc implementations have quick coalescing via headers with a pointer to the previous header and the next one. Modern slab allocation designs don't need that stuff. Purging spans of dirty pages is the slow part, partly because those pages have to be faulted in again when they are reused.
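
To make the header-based coalescing concrete, here is a minimal sketch in C. It assumes a hypothetical `struct chunk` header linked to its neighbours in address order; the names and layout are made up for illustration and are not taken from any particular malloc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-allocation header, linked to its neighbours in address order. */
struct chunk {
    struct chunk *prev;  /* header of the chunk just below in memory, or NULL */
    struct chunk *next;  /* header of the chunk just above in memory, or NULL */
    size_t size;         /* bytes covered by this chunk */
    bool is_free;
};

/* Merge c->next into c. Both chunks must be free and adjacent. */
static void absorb_next(struct chunk *c) {
    struct chunk *n = c->next;
    assert(n != NULL && c->is_free && n->is_free);
    c->size += n->size;
    c->next = n->next;
    if (n->next != NULL)
        n->next->prev = c;
}

/* Mark c free and coalesce with free neighbours in constant time.
 * Returns the header of the resulting (possibly larger) free chunk. */
static struct chunk *chunk_free(struct chunk *c) {
    c->is_free = true;
    if (c->next != NULL && c->next->is_free)
        absorb_next(c);
    if (c->prev != NULL && c->prev->is_free) {
        c = c->prev;
        absorb_next(c);
    }
    return c;
}
```

Each free touches at most two neighbours, which is why this kind of merging is cheap and doesn't really need a background thread.
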
In jemalloc, they support enabling worker threads for that kind of maintenance work. They keep a ratio of freed dirty pages to allocated memory and purge ranges beyond it in two phases with time delayed MADV_FREE followed by a further delayed MADV_DONTNEED.
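
A rough sketch of that purge policy as plain decision logic in C. The struct, field names and thresholds are hypothetical (they are not jemalloc's option names or defaults), and the real `madvise` calls are left out so the sketch stays portable: it only reports which advice a worker thread would issue next.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Lifecycle of a freed page range in this sketch. */
enum range_state { RANGE_DIRTY, RANGE_LAZY_FREED, RANGE_PURGED };

/* What the maintenance thread should do with a range right now. */
enum purge_action { PURGE_NONE, PURGE_MADV_FREE, PURGE_MADV_DONTNEED };

/* Hypothetical tunables, not jemalloc's actual option names or defaults. */
struct purge_policy {
    unsigned lg_dirty_ratio;   /* tolerate dirty <= active >> lg_dirty_ratio */
    uint64_t lazy_delay_ms;    /* time spent dirty before MADV_FREE */
    uint64_t forced_delay_ms;  /* further time before MADV_DONTNEED */
};

/* Number of dirty pages beyond the tolerated dirty-to-active ratio. */
static size_t excess_dirty_pages(const struct purge_policy *p,
                                 size_t dirty, size_t active) {
    assert(p->lg_dirty_ratio < sizeof(size_t) * CHAR_BIT);
    size_t allowed = active >> p->lg_dirty_ratio;
    return dirty > allowed ? dirty - allowed : 0;
}

/* age_ms is the time the range has spent in its current state. */
static enum purge_action next_action(const struct purge_policy *p,
                                     enum range_state s, uint64_t age_ms) {
    switch (s) {
    case RANGE_DIRTY:
        return age_ms >= p->lazy_delay_ms ? PURGE_MADV_FREE : PURGE_NONE;
    case RANGE_LAZY_FREED:
        return age_ms >= p->forced_delay_ms ? PURGE_MADV_DONTNEED : PURGE_NONE;
    default:
        return PURGE_NONE;  /* already returned to the OS */
    }
}
```

On Linux the two actions would correspond to `madvise(addr, len, MADV_FREE)` and `madvise(addr, len, MADV_DONTNEED)`; a worker thread would loop over the ranges in excess of the ratio and apply whatever `next_action` returns.
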
dlmalloc-style allocators have quick coalescing and track free spans via power of 2 size class bins for a bad approximation of best fit allocation. They have a ton of metadata overhead (16 bytes or more per alloc instead of 2-4 bits) and, more importantly, still too much fragmentation.
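
A small sketch of why power of 2 bins only approximate best fit. It assumes the simplest possible scheme, where bin k holds free spans of size [2^k, 2^(k+1)); the function names are made up, and real dlmalloc binning is more fine-grained than this.

```c
#include <assert.h>
#include <stddef.h>

/* floor(log2(n)) for n > 0. */
static unsigned floor_log2(size_t n) {
    assert(n > 0);
    unsigned lg = 0;
    while (n >>= 1)
        lg++;
    return lg;
}

/* Bin a free span is filed under: bin k holds sizes in [2^k, 2^(k+1)). */
static unsigned bin_for_free_span(size_t size) {
    return floor_log2(size);
}

/* Lowest bin in which every span is guaranteed to be >= request,
 * so the first span found there can be used without searching. */
static unsigned first_bin_for_request(size_t request) {
    unsigned lg = floor_log2(request);
    return ((size_t)1 << lg) == request ? lg : lg + 1;
}
```

With this scheme a free 48-byte span lands in bin 5, but a 40-byte request starts its search at bin 6, so the span that would have fit is skipped and a larger one gets split instead.
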
Modern ones use slabs of allocations of a specific size class with internal free lists or bitmaps and then free lists of those slabs. They usually have mostly implicit (address alignment / ranges, etc.) and out-of-line metadata. What remains slow: getting / returning memory from OS.
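
A minimal sketch of the bitmap variant in C, assuming a hypothetical 64-slot slab whose metadata lives outside the slot storage. The slot index is recovered from the pointer's offset, so there is no per-allocation header; all names here are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLAB_SLOTS 64

/* Out-of-line slab metadata: one bit per slot instead of a header per allocation. */
struct slab {
    uint64_t used;        /* bit i set => slot i is allocated */
    size_t slot_size;     /* every slot in this slab has the same size class */
    unsigned char *base;  /* start of the slot storage */
};

/* Hand out the lowest free slot, or NULL if the slab is full. */
static void *slab_alloc(struct slab *s) {
    for (unsigned i = 0; i < SLAB_SLOTS; i++) {
        uint64_t bit = UINT64_C(1) << i;
        if (!(s->used & bit)) {
            s->used |= bit;
            return s->base + (size_t)i * s->slot_size;
        }
    }
    return NULL;
}

/* Free a slot; the index is implied by the address, no header lookup needed. */
static void slab_free(struct slab *s, void *p) {
    size_t offset = (size_t)((unsigned char *)p - s->base);
    assert(offset % s->slot_size == 0);
    size_t i = offset / s->slot_size;
    assert(i < SLAB_SLOTS && ((s->used >> i) & 1));
    s->used &= ~(UINT64_C(1) << i);
}
```

A full allocator would also keep lists of partially used slabs per size class and find the owning slab from the pointer's address range or alignment; that part is omitted here.
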