Conversation

does it make sense to spin up a thread for cleaning up (merging deallocations) the global allocator?
2
2
Replying to and
Traditional malloc implementations have quick coalescing via headers with a pointer to the previous header and the next one. Modern slab allocation designs don't need that stuff. Purging spans of dirty pages is the slow part partly since they need to be faulted in again too.
1
1
In jemalloc, they support enabling worker threads for that kind of maintenance work. They keep a ratio of freed dirty pages to allocated memory and purge ranges beyond it in two phases with time delayed MADV_FREE followed by a further delayed MADV_DONTNEED.
1
1
That's essentially O(1) when not obtaining memory from the OS or returning it, and you don't always have to do that. It's faster than you'd think. An intrusive rbtree ordered by (size, address) can give you precise first-best-fit but that's O(log n) and takes 2 pointers.
1
1
Modern ones use slabs of allocations of a specific size class with internal free lists or bitmaps and then free lists of those slabs. Usually have mostly implicit (address alignment / ranges, etc.) and out-of-line metadata, What remains slow: getting / returning memory from OS.