The Linux kernel has had a few cases where 32-bit refcounts could overflow if you had enough RAM (starting at around sizeof(void*)*pow(2,32) = 32 GiB). In at least one of those cases, the fix was to just not allow that many references.
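The arithmetic behind that figure: each of the 2^32 references needs at least one stored pointer to the object, so on a 64-bit system overflowing a 32-bit count takes at least 8 × 2^32 bytes of pointer storage. A quick check of the claim:

```c
#include <stdint.h>

// Minimum bytes of memory needed to overflow a refcount of the given
// bit width, assuming each reference is at least one stored pointer.
// On an LP64 system, min_bytes_to_overflow(32) is 8 * 2^32 = 32 GiB.
static uint64_t min_bytes_to_overflow(unsigned bits) {
    return (uint64_t)sizeof(void *) << bits;
}
```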
In a similar context, there was once a proposal for dynamically-sized refcounts: make the inline refcount 16 or 32 bits wide, and reserve a value that means "the real value is stored in a global hash table indexed by the refcount's address", or store a table index in the refcount.
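A hypothetical userspace sketch of that idea (not kernel code): inline counts are 16 bits, values below the sentinel are literal, and the sentinel means "look up the real count in an external table keyed by the refcount's address". A real version would need atomics and a growable hash table; this uses a tiny fixed-size linear-probe map and never frees entries.

```c
#include <stdint.h>
#include <stdlib.h>

#define RC_SPILLED  UINT16_MAX  /* sentinel: count lives in the table */
#define SPILL_SLOTS 64

struct spill_entry { void *key; uint64_t count; };
static struct spill_entry spill_table[SPILL_SLOTS];

// Find (or claim) the external-count slot for this refcount address.
static uint64_t *spill_slot(void *key) {
    size_t start = ((uintptr_t)key >> 4) % SPILL_SLOTS;
    for (size_t i = 0; i < SPILL_SLOTS; i++) {
        struct spill_entry *e = &spill_table[(start + i) % SPILL_SLOTS];
        if (e->key == key || e->key == NULL) {
            e->key = key;
            return &e->count;
        }
    }
    abort();  // table full; a real implementation would grow it
}

void ref_inc(uint16_t *rc) {
    if (*rc == RC_SPILLED) {
        (*spill_slot(rc))++;           // already spilled: bump external count
    } else if (*rc == RC_SPILLED - 1) {
        *spill_slot(rc) = RC_SPILLED;  // would hit the sentinel: spill out
        *rc = RC_SPILLED;
    } else {
        (*rc)++;
    }
}

void ref_dec(uint16_t *rc) {
    if (*rc == RC_SPILLED) {
        uint64_t *c = spill_slot(rc);
        if (--*c < RC_SPILLED)
            *rc = (uint16_t)*c;        // fits inline again: unspill
    } else {
        (*rc)--;
    }
}

uint64_t ref_read(uint16_t *rc) {
    return *rc == RC_SPILLED ? *spill_slot(rc) : *rc;
}
```

Callers only ever see `ref_inc`/`ref_dec`/`ref_read`, which is the sense in which the scheme can be made transparent behind the existing refcount API.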
They often don't want to pay the memory cost of having properly sized reference counts, or they've prevented themselves from doing it by using up a limited amount of space for other things. I think my proposal was quite good, and it'd be easy to make it transparent via the same API.
They didn't seem to like the concept, though. I find it really horrible to put arbitrary limits on the number of objects: if you can reach that limit, you're going to cause serious problems, because it could be the wrong processes getting screwed over by other ones.
The default vm.max_map_count of 65530 is ridiculously low and increasingly inadequate for many use cases. Raising it is very commonly recommended for server applications. hardened_malloc defaults to very fine-grained use of guard pages, which requires raising the limit too.
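For reference, the limit a process is currently subject to can be read from procfs (Linux-only; raising it persistently is a `sysctl` / `sysctl.conf` matter):

```c
#include <stdio.h>

// Reads the current vm.max_map_count limit from procfs.
// Returns -1 if the file is unavailable (non-Linux, /proc not mounted).
long read_max_map_count(void) {
    FILE *f = fopen("/proc/sys/vm/max_map_count", "r");
    if (!f)
        return -1;
    long v = -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}
```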
I do think that the right way to do lots of guard pages would be a different one, though: if you want to sprinkle guard pages into a big anonymous mapping, you could do that by using special PTE values (just like swap/migration/... PTEs), and then avoid all that VMA overhead.
The way hardened_malloc does it for very large allocations is to choose a random guard size, with the minimum and maximum number of pages determined by the allocation size. That case could really just be a kernel feature taking care of managing randomly sized guards for you.
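A hypothetical sketch of size-proportional guard sizing (the 1/8 and 1/2 ratios are illustrative, not hardened_malloc's actual constants): pick a random guard length between a minimum and a maximum page count that both scale with the allocation size.

```c
#include <stdlib.h>

// Picks a random number of guard pages for an allocation, with bounds
// that grow with the allocation size. rand() stands in for a proper
// CSPRNG here.
size_t guard_pages(size_t alloc_size, size_t page_size) {
    size_t pages = (alloc_size + page_size - 1) / page_size;
    size_t min = pages / 8 > 0 ? pages / 8 : 1;  // illustrative ratio
    size_t max = pages / 2 > min ? pages / 2 : min;
    return min + (size_t)rand() % (max - min + 1);
}
```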
The harder case to handle would be the fine-grained use of guard pages in the slab allocation regions. It reserves a massive PROT_NONE region for all of the mutable allocation state and a separate massive region for each slab allocation size class. It allocates with mprotect.
It deallocates by replacing the memory it unprotected with a fresh PROT_NONE mapping. The way it ends up with fine-grained guard pages for slab allocation is that it simply skips 50% of the possible slab locations (configurable), so each slab has a guard slab before and after it.
For large allocations, it takes 2 system calls to allocate (mmap PROT_NONE, then mprotect the usable area to PROT_READ|PROT_WRITE) and 1-2 system calls to deallocate (it replaces the allocation with a PROT_NONE mapping and potentially unmaps a previously quarantined one pushed out of the quarantine).
The cost of VMAs in terms of overall memory is quite small, and nearly everything is either O(log n) in the number of VMAs or at least SHOULD be O(log n). It's true that there's some really stupid code doing O(n) operations based on the number of VMAs, but that's mostly just spawning threads.
so if you wanted to do that with VMA tree modifications, you'd basically need something that lets you say "fault on zero PTEs instead of allocating new pages"? you could probably hack something like that together with userfaultfd... allocation would be UFFDIO_ZEROPAGE, [cont]