This. CFI is all snake oil that, at best, gives attackers headaches. But memory tagging (MT) admits mitigations that fully shut off lots of vulns.
Unless I'm mistaken, even just 3 tag values suffice to ensure that no adjacent objects have the same tag, which eliminates all sequential-store buffer overflows, or can be used to protect metadata between objects from *all* OOB stores.
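A rough sketch of why 3 is enough: a slot has at most two neighbours, so at most two values are ruled out and one always remains, whatever order the slots get tagged in. `pick_tag` and `tag_slots_in_random_order` are made-up names for illustration, not anything from a real allocator.

```python
import random

NUM_TAGS = 3  # assumption: the smallest tag space that still works


def pick_tag(tags, i, rng):
    """Pick a tag for slot i that differs from both neighbours' tags."""
    excluded = set()
    if i > 0 and tags[i - 1] is not None:
        excluded.add(tags[i - 1])
    if i + 1 < len(tags) and tags[i + 1] is not None:
        excluded.add(tags[i + 1])
    # At most two values are excluded, so at least one of three remains.
    return rng.choice([t for t in range(NUM_TAGS) if t not in excluded])


def tag_slots_in_random_order(n, seed=0):
    """Tag n slots in arbitrary order, the way allocations would arrive."""
    rng = random.Random(seed)
    tags = [None] * n
    order = list(range(n))
    rng.shuffle(order)
    for i in order:
        tags[i] = pick_tag(tags, i, rng)
    return tags
```

With only 2 values this breaks as soon as a slot sits between neighbours holding different tags, which is why the arbitrary-order case needs the third value.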
It can also be used for deterministic use-after-free mitigation for a given number of cycles. I've documented the approach that I'll be using for it in hardened_malloc:
github.com/GrapheneOS/har
There will be a value reserved for free memory, and then it increments the old value.
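As a sketch of that state machine (the `Slot` class, the choice of 0 as the reserved value and the 16-value tag space are assumptions for illustration, not the hardened_malloc code):

```python
FREE_TAG = 0   # assumption: which value is reserved is arbitrary
NUM_TAGS = 16  # assumption: 4-bit tags


def next_tag(old_tag):
    """Increment the previous tag, wrapping around and skipping FREE_TAG."""
    tag = (old_tag + 1) % NUM_TAGS
    if tag == FREE_TAG:
        tag = (tag + 1) % NUM_TAGS
    return tag


class Slot:
    def __init__(self, initial_tag):
        self.tag = initial_tag  # tag currently set on the memory
        self.saved_tag = None   # old tag remembered while the slot is free

    def free(self):
        self.saved_tag = self.tag
        self.tag = FREE_TAG     # dangling pointers no longer match

    def allocate(self):
        self.tag = next_tag(self.saved_tag)
        self.saved_tag = None
        return self.tag
```

A pointer kept from before the `free()` carries the old tag, so it matches neither the free tag nor the tag handed out by the next `allocate()`.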
This assumes freed chunks aren't splittable/mergeable, I think, which is another motivating factor for dropping dlmalloc design.
A performance-oriented allocator would work well with a similar region-based slab allocation design. It could just use free lists within slabs instead of bitmaps, while still using the same approach to out-of-line metadata for tracking slabs. Main loss is double-free detection.
There are other approaches to out-of-line metadata than the address space reservation. I'm doing that largely to get dedicated, isolated memory regions for metadata and each size class with the address space never being mixed / reused. Can also use an approach based on alignment.
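One possible reading of the alignment-based approach, as a sketch: if every slab is aligned to its (power-of-two) size, masking a pointer gives the slab base, and that can index a separate metadata table. `SLAB_SIZE`, `HEAP_BASE` and the helper names are all hypothetical.

```python
SLAB_SIZE = 64 * 1024         # hypothetical; must be a power of two
HEAP_BASE = 0x7F00_0000_0000  # hypothetical; aligned to SLAB_SIZE


def slab_base(addr):
    """Mask off the low bits to find the slab containing addr."""
    return addr & ~(SLAB_SIZE - 1)


def metadata_index(addr):
    """Index of this slab's entry in an out-of-line metadata table."""
    return (slab_base(addr) - HEAP_BASE) // SLAB_SIZE


def slot_index(addr, slot_size):
    """Index of the slot within its slab, for a given size class."""
    return (addr - slab_base(addr)) // slot_size
```

This gets metadata out of line without reserving a dedicated region per size class, at the cost of the isolation properties mentioned above.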
Memory tagging with a reserved free tag makes it so that the free list gets memory protected (free list accessed with pointers using the reserved free tag) and it's possible to detect double-free by checking for the free tag instead of checking if the slot is free in the bitmap.
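A toy model of both effects, assuming tags are just integers in a list and a mismatched access raises instead of faulting in hardware. `load`, `free_slot` and the exception names are invented for the sketch.

```python
FREE_TAG = 0  # assumption: which value is reserved is arbitrary


class TagMismatch(Exception):
    """Stands in for the hardware fault on a mismatched access."""


class DoubleFree(Exception):
    pass


def load(memory_tags, slot, pointer_tag):
    """Model a tagged access: it succeeds only if the two tags match."""
    if memory_tags[slot] != pointer_tag:
        raise TagMismatch(f"pointer tag {pointer_tag} != memory tag {memory_tags[slot]}")


def free_slot(memory_tags, slot):
    """Free a slot, detecting double-free from the memory tag alone."""
    if memory_tags[slot] == FREE_TAG:
        raise DoubleFree(f"slot {slot} is already free")
    memory_tags[slot] = FREE_TAG
```

After `free_slot`, only a pointer carrying `FREE_TAG` (the allocator's own free-list pointer) passes `load`, and a second `free_slot` on the same slot is caught without consulting a bitmap.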
Note that this depends on clearing the free list pointers from the chunk before returning it to satisfy an allocation. Otherwise the "uninitialized" memory returned by malloc contains valid (correctly-tagged!) pointers into the free list.
Yeah, and also to preserve the nice property of having allocated memory guaranteed to be zero without information leaks. In my approach, the old tag for the previous allocation of that slot will be saved in the free slot too, in order to increment it, so that gets zeroed too.
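Roughly, the allocation path has to read the in-slot metadata out and then wipe it. A sketch with a `bytearray` standing in for the slot; the layout (`NEXT_OFFSET`, `TAG_OFFSET`) and function names are made up for illustration.

```python
import struct

SLOT_SIZE = 16
NEXT_OFFSET, TAG_OFFSET = 0, 8  # hypothetical layout of a free slot


def write_free_metadata(slot, next_ptr, old_tag):
    """On free: store the free-list link and the previous tag in the slot."""
    struct.pack_into("<Q", slot, NEXT_OFFSET, next_ptr)
    struct.pack_into("<B", slot, TAG_OFFSET, old_tag)


def take_for_allocation(slot):
    """On malloc: read the metadata out, then zero it before handing over."""
    (next_ptr,) = struct.unpack_from("<Q", slot, NEXT_OFFSET)
    (old_tag,) = struct.unpack_from("<B", slot, TAG_OFFSET)
    slot[:] = bytes(len(slot))  # neither the link nor the old tag leaks out
    return next_ptr, old_tag
```

The caller then gets all-zero memory, and the allocator still has the old tag it needs to compute the incremented one.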
Leaking those isn't desirable since the initial value was randomized and the weak probabilistic part of memory tagging does have some value too. To bypass the incremented tag, the use-after-free can't happen until the slot has been allocated again 15 times and that allocation is still active.
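The 15 can be checked by walking the tag cycle. This is a simplified sketch that assumes 16 tag values with 0 reserved for free memory and a plain increment, ignoring any other adjustments; the function names are hypothetical.

```python
FREE_TAG = 0   # assumption: which value is reserved is arbitrary
NUM_TAGS = 16  # 16 tags, 1 reserved for free memory


def next_tag(old_tag):
    """Plain increment that wraps around and skips the free tag."""
    tag = (old_tag + 1) % NUM_TAGS
    return tag if tag != FREE_TAG else (tag + 1) % NUM_TAGS


def reuses_until_stale_tag_matches(stale_tag):
    """Count reallocations of a slot until a dangling pointer's tag is valid again."""
    tag, reuses = stale_tag, 0
    while True:
        tag = next_tag(tag)
        reuses += 1
        if tag == stale_tag:
            return reuses
```

Under these assumptions every non-free starting tag takes 15 reuses to come back around, since the cycle visits each of the 15 usable values once.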
Combine that with the quarantine, which for example with 128k per class will delay reuse of 16-byte allocations for 8192 alloc/free cycles. There are really 2 layers of quarantine, the ring buffer (FIFO) and a random array (swaps with random slot), which makes it hard to line up.
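A sketch of the arithmetic and of a two-layer quarantine. The order of the layers here (random array first, then the FIFO) and how the total is split between them are my assumptions, so both lengths are left as parameters; the `Quarantine` class is illustrative only.

```python
import random
from collections import deque

QUARANTINE_BYTES = 128 * 1024          # "128k per class" from the thread
SLOT_SIZE = 16
SLOTS = QUARANTINE_BYTES // SLOT_SIZE  # 8192 slots held back from reuse


class Quarantine:
    def __init__(self, random_len, queue_len, seed=None):
        self.rng = random.Random(seed)
        self.random_array = [None] * random_len
        self.queue = deque()
        self.queue_len = queue_len

    def push(self, slot):
        """Quarantine a freed slot; return a slot that may now be reused, or None."""
        # Layer 1: swap with a random entry, so eviction order is unpredictable.
        i = self.rng.randrange(len(self.random_array))
        slot, self.random_array[i] = self.random_array[i], slot
        if slot is None:
            return None
        # Layer 2: FIFO ring buffer, which guarantees a minimum delay.
        self.queue.append(slot)
        if len(self.queue) > self.queue_len:
            return self.queue.popleft()
        return None
```

The FIFO gives the deterministic floor on the delay, and the random layer adds an unpredictable amount on top, which is what makes counting frees to hit a specific reuse so awkward.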
A use-after-free actually has to line up with the slot being reused a multiple of 15 times later (there are 16 tags, with 1 reserved for free memory). It's a bit more complex than that, since incrementing skips the adjacent slots' tags, as does choosing the initial random tag.
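The neighbour-skipping increment could look something like this. It's a guess at the shape, not the real implementation: the function name and the choice of 0 as the reserved tag are assumptions.

```python
FREE_TAG = 0   # assumption: which value is reserved is arbitrary
NUM_TAGS = 16  # 16 tags, 1 reserved for free memory


def next_tag_skipping_neighbours(old_tag, left_tag, right_tag):
    """Increment old_tag, skipping the free tag and both neighbours' current tags."""
    excluded = {FREE_TAG, left_tag, right_tag}
    tag = old_tag
    while True:
        tag = (tag + 1) % NUM_TAGS
        # At most 3 of the 16 values are excluded, so this always terminates.
        if tag not in excluded:
            return tag
```

Because the neighbours' tags change over time, the number of reuses before a stale tag matches again is no longer a fixed 15, which is the extra complexity mentioned above.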

