Consider a loop that repeatedly calls malloc(16) and then frees the result. Normally the allocation will always be in cache, but in an allocator that never reuses memory, every new page will miss.
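To make the loop concrete, here is a minimal sketch in C. The function name count_immediate_reuse is made up for illustration. It only counts how often malloc hands back the address it returned on the previous iteration; it does not measure cache misses, and the count depends entirely on which allocator is linked in.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: runs `iterations` malloc(16)/free pairs and counts
 * how many times malloc returned the same address as the iteration before.
 * The address is saved as an integer before the free, so no freed pointer
 * is ever dereferenced. */
size_t count_immediate_reuse(size_t iterations)
{
    size_t reused = 0;
    uintptr_t previous = 0;

    for (size_t i = 0; i < iterations; i++) {
        void *p = malloc(16);
        if (p == NULL) {
            break; /* out of memory: stop counting */
        }
        uintptr_t current = (uintptr_t)p;
        if (current == previous) {
            reused++;
        }
        previous = current;
        free(p);
    }
    return reused;
}
```

An allocator that recycles the most recently freed slot would be expected to report a count close to iterations - 1, while one that never reuses memory would report 0 and touch new memory on every pass.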
Indeed this example will be slow, but who calls malloc/free in a hot loop?
There's also a much more granular safety story - you could allow some of your trusted dependencies to be compiled in ReleaseFast mode but most of your application in ReleaseSafe.
We've had a long time to determine the answer to the question of whether it matters in practice that the memory malloc returns is in cache, and the answer is that yes, it does.
The state of the art in hardened malloc is GrapheneOS's hardened_malloc (github.com/GrapheneOS/har), and it does quarantine for large objects only (I think, could be wrong).
It has optional quarantines for both small and large allocations as separate features. Large means a mmap region with guards.
In both cases, it has a random quarantine based on swapping with a random element in an array and a ring buffer as a queue for a deterministic delay.
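A rough sketch of how those two stages can be chained, in C. This is not the hardened_malloc source: the struct, the function name quarantine_push, and the slot counts are all hypothetical, and rand() stands in for the cryptographically secure RNG a real hardened allocator would need.

```c
#include <stddef.h>
#include <stdlib.h>

#define RANDOM_SLOTS 8 /* hypothetical sizes, chosen for illustration */
#define QUEUE_SLOTS  8

struct quarantine {
    void *random[RANDOM_SLOTS]; /* stage 1: random delay */
    void *queue[QUEUE_SLOTS];   /* stage 2: ring buffer, fixed delay */
    size_t queue_index;         /* next ring buffer slot to overwrite */
};

/* Puts a freed pointer into quarantine. Returns the pointer that has now
 * left quarantine and can really be released, or NULL if nothing has come
 * out yet. The struct must start zero-initialized. */
void *quarantine_push(struct quarantine *q, void *p)
{
    /* Stage 1: swap the new pointer with a randomly chosen element. */
    size_t r = (size_t)rand() % RANDOM_SLOTS; /* assumption: not a CSPRNG */
    void *evicted = q->random[r];
    q->random[r] = p;
    if (evicted == NULL) {
        return NULL; /* the slot was empty, nothing moves on */
    }

    /* Stage 2: the evicted pointer overwrites the oldest queue entry. */
    void *oldest = q->queue[q->queue_index];
    q->queue[q->queue_index] = evicted;
    q->queue_index = (q->queue_index + 1) % QUEUE_SLOTS;
    return oldest; /* NULL until the ring buffer has wrapped around */
}
```

With this layout, a pointer waits an unpredictable number of frees in the random array and then exactly QUEUE_SLOTS further evictions in the ring buffer before its slot can be reused.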
For the large allocation quarantine, it replaces the allocation with fresh PROT_NONE pages so it's essentially a virtual memory quarantine.
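One way to get that effect on Linux and other POSIX-like systems is to map fresh anonymous PROT_NONE pages over the same address range with MAP_FIXED. The sketch below assumes that approach; the function name quarantine_large is hypothetical and this is not taken from the hardened_malloc source.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: replaces [addr, addr + size) with fresh inaccessible
 * pages. The old physical memory is dropped, but the address range stays
 * reserved, so nothing else can be mapped there while it is quarantined.
 * addr must be page-aligned. Returns 0 on success, -1 on failure. */
int quarantine_large(void *addr, size_t size)
{
    void *fresh = mmap(addr, size, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return fresh == MAP_FAILED ? -1 : 0;
}
```

Any later read or write through a stale pointer into the range faults, and leaving quarantine is just a munmap of the same range.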
Either way, it always deterministically detects any invalid free of any pointer that's not a valid non-free, non-quarantined allocation.
By default, the configuration is very security centric with all security features enabled other than the somewhat superfluous MPK-based metadata protection.
Quarantines are largely separate from the core code other than slab metadata having an extra bitmap for quarantined slots.
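As an illustration of the extra bitmap and the invalid-free check, here is a simplified sketch in C. The struct layout, the 64-slot limit, and the function names are assumptions made for the example, not the real metadata format. A hardened allocator would abort on a failed check; the sketch returns false instead so the behaviour is easy to observe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS_PER_SLAB 64 /* hypothetical: one 64-bit word per bitmap */

struct slab {
    char *base;           /* start of the slab's slots */
    size_t slot_size;     /* size class of this slab, in bytes */
    uint64_t allocated;   /* bit i set: slot i is in use or quarantined */
    uint64_t quarantined; /* bit i set: slot i was freed, reuse is delayed */
};

/* Returns the slot index p points at, or -1 if p is not a slot start. */
static int slot_index(const struct slab *s, const void *p)
{
    uintptr_t base = (uintptr_t)s->base;
    uintptr_t addr = (uintptr_t)p;
    if (addr < base || (addr - base) % s->slot_size != 0) {
        return -1;
    }
    uintptr_t index = (addr - base) / s->slot_size;
    return index < SLOTS_PER_SLAB ? (int)index : -1;
}

/* Validates a free and moves the slot into quarantine.
 * Returns false for any pointer that is not a live allocation. */
bool slab_free(struct slab *s, void *p)
{
    int i = slot_index(s, p);
    if (i < 0) {
        return false; /* interior or out-of-range pointer */
    }
    uint64_t bit = UINT64_C(1) << i;
    if (!(s->allocated & bit) || (s->quarantined & bit)) {
        return false; /* never allocated, or a double free */
    }
    s->quarantined |= bit; /* stays marked allocated, so it is not handed out */
    return true;
}

/* Called when slot i leaves quarantine: it becomes reusable again. */
void slab_release(struct slab *s, int i)
{
    uint64_t bit = UINT64_C(1) << i;
    s->allocated &= ~bit;
    s->quarantined &= ~bit;
}
```

Because the check reads only the two bitmaps, the outcome does not depend on what the freed memory currently contains.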
Have you measured the effect of quarantining all allocations forever? I'm skeptical but interested in seeing results.
(One thing I've been learning at BigCo is how much $$$ tiny malloc perf improvements add up to at scale…)
The small allocation quarantine is very expensive. It's not just that memory for allocations is colder either.
The large allocation quarantine is very cheap since large typically means > 128k, although it can mean > 16k if the extended slab allocation size classes are disabled.
The reason for 16k being where it naturally switches over is that the next size class (20480) is when they start being aligned to 4096 bytes. 64k would be the natural boundary with 16k pages.
github.com/GrapheneOS/har
The main issue with the small allocation quarantine is wasted memory.
twitter.com/DanielMicay/st (primarily about address space size vs. pointer authentication) and twitter.com/DanielMicay/st (current work on reducing system-wide slab allocation quarantine memory usage for GrapheneOS) are recent threads about related topics.
GrapheneOS uses our hardened_malloc allocator (github.com/GrapheneOS/har) with all the optional security features enabled.
The optional slab quarantine features inherently need to use a substantial amount of memory in order to delay reuse of slab allocations as long as possible.

