Conversation

does it make sense to spin up a thread for cleaning up (merging deallocations) the global allocator?
2
2
Replying to and
Traditional malloc implementations have quick coalescing via headers with a pointer to the previous header and the next one. Modern slab allocation designs don't need that stuff. Purging spans of dirty pages is the slow part partly since they need to be faulted in again too.
1
1
In jemalloc, they support enabling worker threads for that kind of maintenance work. They keep a ratio of freed dirty pages to allocated memory and purge ranges beyond it in two phases with time delayed MADV_FREE followed by a further delayed MADV_DONTNEED.
1
1
dlmalloc-style allocators have quick coalescing and track free spans via power of 2 size class bins for a bad approximation of best fit allocation. Has a ton of metadata overhead (16 bytes or more per alloc instead of 2-4 bits) and more importantly still too much fragmentation.
1
1
That's essentially O(1) when not obtaining memory from the OS or returning it, and you don't always have to do that. It's faster than you'd think. An intrusive rbtree ordered by (size, address) can give you precise first-best-fit but that's O(log n) and takes 2 pointers.
1
1