Conversation

It deallocates by replacing the memory it unprotected with a fresh PROT_NONE mapping. The way it ends up with fine-grained guard pages for slab allocation is that it simply skips 50% of the possible slab locations (can be configured) so each slab has a guard slab before/after it.
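A minimal sketch of the "skip half the slab locations" layout, assuming a fixed stride of 2 (the default 50%) and that the first slot in a region is a guard. The names usable_slab_offset and slot_is_guard are hypothetical and are not taken from hardened_malloc:

```c
#include <stddef.h>

/* Hypothetical layout: even-numbered slots are never handed out and stay
 * PROT_NONE (guards); odd-numbered slots hold usable slabs. */
static int slot_is_guard(size_t slot) {
    return slot % 2 == 0;
}

/* Byte offset of the n-th usable slab from the start of the size class
 * region. Usable slab n sits in slot 2n + 1, so it has a guard slot on
 * both sides. */
static size_t usable_slab_offset(size_t n, size_t slab_size) {
    return (2 * n + 1) * slab_size;
}
```

With this layout an overflow off either end of a slab lands in a slot that is still PROT_NONE and faults, at the cost of half the slots in the region.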
Caches a certain number of empty slabs based on total size and then switches to replacing them with fresh PROT_NONE mappings via mmap with MAP_FIXED. 1 system call (mprotect) to allocate a slab and 1 system call (mmap) to deallocate, each grabbing the mmap_sem write lock once.
Replying to and
so if you wanted to do that with VMA tree modifications, you'd basically need something that lets you say "fault on zero PTEs instead of allocating new pages"? you could probably hack something like that together with userfaultfd... allocation would be UFFDIO_ZEROPAGE, [cont]
Replying to and
That would probably work but then hardened_malloc would depend on that system call, which often isn't available in the sandboxes where it runs. The fact that it uses a per-size-class ChaCha8 CSPRNG seeded by getrandom for different internal uses of randomness has already been an issue.
I think it would probably be quite bad to add userfaultfd as a whitelisted system call in sandboxes so hardened_malloc can't really depend on it. Depending on getrandom and mprotect is no problem and should really already be whitelisted for sandboxes that are using malloc.
By default, hardened_malloc has 4 arenas which each have 49 size class regions that are each 64GB. Those have a 32GB usable region randomly located within the outer region. Half of that 32GB region ends up being used for guard slabs by default. All works fine without overcommit.
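To put those defaults in perspective, here is the arithmetic as a small sketch. It assumes "GB" means GiB (2^30 bytes); the constant and function names are hypothetical:

```c
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)

enum { N_ARENAS = 4, N_SIZE_CLASSES = 49 };

#define CLASS_REGION_SIZE  (64 * GIB) /* outer region per size class */
#define USABLE_REGION_SIZE (32 * GIB) /* randomly placed inside the outer region */

/* Address space reserved across all arenas and size classes:
 * 4 * 49 * 64 GiB = 12544 GiB = 12.25 TiB. */
static uint64_t total_reserved_bytes(void) {
    return N_ARENAS * N_SIZE_CLASSES * CLASS_REGION_SIZE;
}

/* Usable regions only: 4 * 49 * 32 GiB = 6272 GiB. */
static uint64_t total_usable_bytes(void) {
    return N_ARENAS * N_SIZE_CLASSES * USABLE_REGION_SIZE;
}

/* With half of each usable region spent on guard slabs, 3136 GiB of
 * address space remains for actual slabs. */
static uint64_t total_non_guard_bytes(void) {
    return total_usable_bytes() / 2;
}
```

That is about 12.25 TiB of reserved address space, which is why the reservation has to be something the kernel does not count against the commit limit.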
It reserves that entire slab region as a single massive PROT_NONE mapping. It wouldn't work without overcommit if it wasn't PROT_NONE and freeing it back to fresh PROT_NONE mappings is important for dropping the accounted memory. It has to make fresh mappings to free memory.
Setting a slab that was used back to PROT_NONE and purging it with madvise using MADV_DONTNEED doesn't drop the accounted memory. It has no choice but to use mmap with MAP_FIXED to make a fresh mapping, and it's more efficient (1 system call) than 2 system calls anyway.
It inherently needs a way of getting away with reserving immense amounts of the address space in advance and also ideally of fully releasing the memory back to the OS when it frees it. It's hard to release slab metadata memory back to the OS but that's tiny vs. slabs themselves.