Conversation

Some initial documentation for my new hardened malloc implementation: github.com/AndroidHardeni. It's still in an early state of development, with major parts of the core design and many security features not yet implemented. It should already work with nearly any program, though.
Scudo is entirely based on inline metadata and free lists. It relies on CRC32 to detect metadata corruption and can't reliably detect invalid frees in the same way. Having fully out-of-line metadata is extremely important for providing many other security properties too.
Similarly, it can't offer the same kind of fine-grained randomization. One of the biggest differences is that my design isolates each size class at a unique random base within the outer reserved region, rather than mixing everything together dlmalloc-style.
Metadata is entirely out-of-line with no deterministic offsets to it and similarly there are no deterministic offsets between size classes. Address space is reserved for a size class and never reused for another. It's a bit like PartitionAlloc, but with much stronger isolation.
My goal is to provide better security properties than OpenBSD malloc, which also uses a hardened design from the bottom up with fully out-of-line metadata. It's also going to have much better performance, lower memory overhead and far better scalability than OpenBSD malloc.
OpenBSD malloc's performance is good enough for the real world, and it's possible to do much better without sacrificing security. I'd consider authoritative out-of-line metadata the core of any hardened allocator; it makes performance a challenge, but it can still be done.
Similarly, thread caches (or queues) are at odds with having an authoritative source of truth capable of good double-free and invalid-free detection. It's possible to offer great scalability without them by using arenas, especially layered on top of fine-grained locking.
The main reason jemalloc uses thread caches is really to amortize the cost of locking, and it has the best understanding of and approach to them that I've seen. It's also easy to waste tons of memory per thread that way and end up with extremely bad worst-case memory usage.
I'd also point to the lessons learned and implemented in jemalloc about allocator performance. A faster allocator with higher internal/external fragmentation and higher metadata overhead will often be slower in the real world. Keeping the working set of memory small matters.