I found an apparent bug in the Linux kernel Memory Protection Keys implementation (github.com/AndroidHardeni): it loses the Memory Protection Keys setup on fork. I wasn't entirely sure if this was a bug or just an odd design, but I contacted the maintainer and they think it's a bug.
It's similar to how I found the MAP_FIXED_NOREPLACE clobbering bug by trying out these features as an early adopter (twitter.com/DanielMicay/st). It also shares the problem of being difficult to use even after the fix, since unpatched kernels will still have the broken semantics.
Quote Tweet
I tried out the MAP_FIXED_NOREPLACE API introduced in Linux 4.17 for a minor use case in my hardened allocator and it turns out that it has been very broken since it was introduced. It can clobber adjacent mappings (marc.info/?l=linux-mm&m=). It will hopefully be fixed in 4.19.
MAP_FIXED_NOREPLACE was designed to transparently fall back to normal mmap hint handling on older kernels, so checking for support isn't necessary. However, the broken early implementation means there are kernels where it will cause memory corruption and more caution is needed.
On the positive side, it's fairly easy to make a safe runtime test for the MAP_FIXED_NOREPLACE bug and fall back to not passing the flag if the implementation is broken rather than hard-wiring the version for each branch where it becomes safe. The MPK issue is trickier to handle.
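A minimal sketch of what such a runtime test could look like, assuming the bug is triggered when an existing mapping overlaps the requested range anywhere other than its start. The function and enum names are hypothetical, and this is not necessarily how the allocator does it. It reserves three pages, frees the first one, then asks for two pages at the freed address so that only the second page collides with the reservation.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_FIXED_NOREPLACE
/* Assumption: older headers lack the flag; this is the value on most architectures. */
#define MAP_FIXED_NOREPLACE 0x100000
#endif

/* Hypothetical result type for this sketch. */
enum noreplace_status {
    NOREPLACE_PROBE_FAILED = -1, /* the probe itself could not run */
    NOREPLACE_WORKS = 0,         /* overlapping request was refused with EEXIST */
    NOREPLACE_IGNORED = 1,       /* flag unknown, address was treated as a hint */
    NOREPLACE_BROKEN = 2,        /* request succeeded on top of an existing mapping */
};

static enum noreplace_status probe_map_fixed_noreplace(void) {
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0) {
        return NOREPLACE_PROBE_FAILED;
    }
    size_t page = (size_t)page_size;

    /* Reserve three pages, then free the first so only the start of the request is unmapped. */
    char *base = mmap(NULL, 3 * page, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        return NOREPLACE_PROBE_FAILED;
    }
    if (munmap(base, page) != 0) {
        munmap(base, 3 * page);
        return NOREPLACE_PROBE_FAILED;
    }

    /* The second page of this request overlaps the remaining reservation. */
    void *p = mmap(base, 2 * page, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);

    enum noreplace_status status;
    if (p == MAP_FAILED) {
        status = (errno == EEXIST) ? NOREPLACE_WORKS : NOREPLACE_PROBE_FAILED;
    } else if (p == (void *)base) {
        /* Only our own reservation was clobbered, so the probe is harmless. */
        status = NOREPLACE_BROKEN;
    } else {
        status = NOREPLACE_IGNORED;
        munmap(p, 2 * page);
    }

    /* Unmapping the whole original range is fine even where nothing is mapped. */
    munmap(base, 3 * page);
    return status;
}
```

A caller would pass the flag only for NOREPLACE_WORKS or NOREPLACE_IGNORED. In a multithreaded process another thread could map the freed page between the two calls, which would make a broken kernel look fixed, so the probe is best run early during initialization.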
I can't think of a similar way to detect the MPK problem at runtime in library code because I can't spawn a process to see if it works. If I can detect the broken semantics, I can work around it by reapplying the pkey setup, but doing that would be broken with fixed semantics.
I expect that I'll need to check the kernel version via uname and hard-wire the stable releases with the fix for each kernel branch. I won't be able to use it on older kernels because someone might cherry-pick the fix without upgrading the version so I can't apply the workaround.
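A rough sketch of that version check, assuming the caller supplies a table with the first fixed stable release for each branch plus the first mainline release carrying the fix. All names here are hypothetical, and the real version numbers would have to come from the stable changelogs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper type: a kernel version as major.minor.patch. */
struct kver {
    unsigned major, minor, patch;
};

/* Parse release strings such as "4.19.5-arch1"; patch stays 0 when absent ("5.0-rc1"). */
static bool parse_release(const char *release, struct kver *out) {
    out->major = out->minor = out->patch = 0;
    return sscanf(release, "%u.%u.%u", &out->major, &out->minor, &out->patch) >= 2;
}

/*
 * fixed[] lists the first stable release with the fix for each branch.
 * Branches without an entry count as fixed only if they are at or after
 * first_mainline, the first mainline release that shipped the fix.
 */
static bool release_has_fix(struct kver v, const struct kver *fixed, size_t n,
                            struct kver first_mainline) {
    for (size_t i = 0; i < n; i++) {
        if (fixed[i].major == v.major && fixed[i].minor == v.minor) {
            return v.patch >= fixed[i].patch;
        }
    }
    if (v.major != first_mainline.major) {
        return v.major > first_mainline.major;
    }
    return v.minor >= first_mainline.minor;
}

/* Check the running kernel; treat any failure as "not known to be fixed". */
static bool running_kernel_has_fix(const struct kver *fixed, size_t n,
                                   struct kver first_mainline) {
    struct utsname u;
    struct kver v;
    if (uname(&u) != 0 || !parse_release(u.release, &v)) {
        return false;
    }
    return release_has_fix(v, fixed, n, first_mainline);
}
```

Failing closed matters here: an unknown or unparsable version is treated as unfixed, so the feature is simply not used rather than used with broken semantics.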
For the time being, I think I'll work around it by disabling metadata sealing in the post fork handler for the child process. It will start working again after exec so fancy metadata protection will only be missing for fork-based concurrency designs and tiny fork -> exec windows.