In a way you never do with Linux. Because its design is generic. It’s meant to run programs for you. But when’s the last time its memory model got in your way?
mremap is super broken because it returns EINVAL if you try to use it on a range spanning multiple VMAs, but the kernel does not reliably merge compatible VMAs.
I have a simple test case involving switching to PROT_NONE and back where it fails to merge them, and then mremap will return EINVAL...
This is my test case if you're curious:
https://gist.github.com/thestinger/8a18e994ac88db1a48ba15dee3722c59…
I haven't really bothered trying to get this fixed upstream at this point. A fix wouldn't really help me that much since it would be so long before I could assume that the kernel actually works properly.
In this test case, the issue seems to be that it tracks whether a VMA was ever PROT_WRITE and it won't merge one that was PROT_WRITE at some point with one that wasn't. Maybe that makes sense. I think my actual problem was slightly different than this test case I ended up making.
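A rough sketch of that kind of sequence, for illustration only. This is not the gist above, and the function name `remap_after_prot_toggle` is made up here. It reserves a PROT_NONE range, makes one half writable and switches it back so both halves end up with identical protections, then asks mremap to grow the whole range. Whether the halves get merged depends on the kernel version, and the mremap(2) man page documents EFAULT for ranges covered by mappings of different types, so the sketch just reports whatever errno it gets instead of assuming one.

```c
#define _GNU_SOURCE            /* needed for mremap() and MREMAP_MAYMOVE */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the original test case.
 * Returns 0 if mremap() succeeded, the errno value if mremap() failed,
 * or -1 if the setup (mmap/mprotect) failed.
 */
int remap_after_prot_toggle(size_t pages_per_half)
{
    size_t half = pages_per_half * (size_t)sysconf(_SC_PAGESIZE);
    size_t total = 2 * half;

    /* Reserve both halves as PROT_NONE: nothing here was ever writable. */
    char *p = mmap(NULL, total, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Make the first half writable, then switch it back to PROT_NONE.
     * Both halves now have the same protection, but only one of them
     * was PROT_WRITE at some point. */
    if (mprotect(p, half, PROT_READ | PROT_WRITE) != 0 ||
        mprotect(p, half, PROT_NONE) != 0) {
        munmap(p, total);
        return -1;
    }

    /* Try to grow the whole range. This can only work if the kernel
     * treats [p, p + total) as a single VMA. */
    void *q = mremap(p, total, 2 * total, MREMAP_MAYMOVE);
    if (q == MAP_FAILED) {
        int err = errno;
        munmap(p, total);
        return err;
    }

    munmap(q, 2 * total);
    return 0;
}
```

Calling `remap_after_prot_toggle(4)` and printing the result with `strerror` is enough to see how a given kernel behaves.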
IIRC, the issue is that the expanded usable area between the guards doesn't get reliably merged into a single VMA. That ends up breaking the MREMAP_MAYMOVE case below since it gets EINVAL when it tries to remap the usable area. It's such a nonsense API. Really hate it.
I can't remember why that happens. Maybe it doesn't like to merge a fresh PROT_READ|PROT_WRITE VMA with one that was touched. Maybe I could fix it by touching it, toggling it back to PROT_NONE, then back to PROT_READ|PROT_WRITE. Should just delete it since it barely helps anyway.
https://mail-archive.com/linux-kernel@vger.kernel.org/msg1793333.html… was the MPK issue. It would entirely lose your pkey setup on fork. I had to work around it by disabling memory protection keys in the post-fork handler so it would just stop using the feature until exec. I've removed that workaround now, at least...