MTE is really garbage. The history of processors is littered with similar errors.
No, MTE is the first non-garbage thing in this class. The rest are all garbage, yes. MTE was actually designed (or they randomly got it right on the Nth guess) around the software model problem rather than being some hw person's random idea with no correspondence to sw needs.
Used correctly, MTE completely eliminates all sequential overflow/underflow, completely eliminates attacks that clobber metadata, and provides strong protection (for multiple generations of alloc/free) against UAF.
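To make the claim concrete, here is a toy model of how an allocator could use tags. It is only a sketch under stated assumptions: `ToySlab`, its methods, and the tag-cycling policy are all hypothetical and are not how any real allocator is written. The 16-byte granule and 4-bit tags are the ARMv8.5 values.

```python
GRANULE = 16  # bytes covered by one tag (the ARMv8.5 value)


class ToySlab:
    """Hypothetical slab of granule-sized slots, one 4-bit tag per slot."""

    def __init__(self, slots):
        self.tags = [0] * slots  # tag 0 is reserved here for never-used slots

    def _next_tag(self, slot):
        # Cycle through tags 1..15, skipping whatever the neighbours hold,
        # so adjacent slots never match and old tags are not reused soon.
        neighbours = {self.tags[i] for i in (slot - 1, slot + 1)
                      if 0 <= i < len(self.tags)}
        tag = self.tags[slot]
        while True:
            tag = tag % 15 + 1
            if tag not in neighbours:
                return tag

    def alloc(self, slot):
        self.tags[slot] = self._next_tag(slot)
        return (self.tags[slot], slot * GRANULE)  # (pointer tag, address)

    def free(self, slot):
        self.tags[slot] = self._next_tag(slot)  # retag so stale pointers mismatch

    def access_ok(self, ptr, offset=0):
        # Models the hardware check: pointer tag must equal the memory tag.
        tag, addr = ptr
        slot = (addr + offset) // GRANULE
        return 0 <= slot < len(self.tags) and self.tags[slot] == tag
```

With `p = slab.alloc(0)` and `q = slab.alloc(1)`, `slab.access_ok(p, 16)` is false (sequential overflow into the neighbour), and after `slab.free(0)` the stale `p` keeps failing until the 15 usable tags wrap around, which is why the UAF protection holds for several generations rather than forever.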
It probably has the same weakness as ASan: it can't detect overflows between adjacent members within a struct or within an array of structs, because you can't insert arbitrary padding without breaking the ABI.
Memory tagging doesn't need padding. The pointer has a tag in some of the bits and there's a fault if it doesn't match the tag assigned to the memory being accessed. github.com/GrapheneOS/har is an approach to integrating it into slab allocators. It (mostly) obsoletes canaries, etc
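A rough sketch of that check, as a toy model rather than real hardware behavior. The helper names and the `memory_tags` dict are hypothetical. The tag position (address bits 59:56) and the 16-byte granule follow the ARMv8.5 documentation.

```python
TAG_SHIFT = 56               # MTE keeps the pointer's tag in address bits 59:56
TAG_MASK = 0xF << TAG_SHIFT
GRANULE = 16                 # bytes of memory covered by one tag


def with_tag(addr, tag):
    """Return addr with its 4-bit tag field set to tag."""
    return (addr & ~TAG_MASK) | ((tag & 0xF) << TAG_SHIFT)


def tag_of(ptr):
    return (ptr >> TAG_SHIFT) & 0xF


def address_of(ptr):
    """Strip the tag bits to recover the plain address."""
    return ptr & ~TAG_MASK


def access_ok(memory_tags, ptr):
    """memory_tags maps granule index -> tag; a mismatch models the fault."""
    granule = address_of(ptr) // GRANULE
    return memory_tags.get(granule) == tag_of(ptr)
```

For example, with `memory_tags = {0x100: 0x5}` and `p = with_tag(0x1000, 0x5)`, `access_ok(memory_tags, p)` holds, while `p + 16` carries the same tag into the next granule and fails the check. No padding or red zone is involved.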
The (correct) point above is that it can't tell when you overflow past the end of a[] in struct { char a[N1], b[N2]; }
That overflow isn't undefined in C, so faulting on it wouldn't be standards compliant for C, but memory tagging is capable of providing that functionality. I don't think ASan would be capable of doing that because it depends on inserting red zones. Memory tagging for that wouldn't require changing the struct layout.
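A sketch of what per-member tagging could look like, assuming each member is a whole number of 16-byte granules and the struct is granule-aligned. `Pair`, `member_tags`, and `access_ok` are hypothetical names, and `ctypes` is used only to get the same layout a C compiler would produce for `struct { char a[16], b[16]; }`.

```python
import ctypes

GRANULE = 16  # bytes covered by one tag


class Pair(ctypes.Structure):
    # Python mirror of: struct { char a[16], b[16]; }
    _fields_ = [("a", ctypes.c_char * 16), ("b", ctypes.c_char * 16)]


def member_tags(struct_type, base=0):
    """Give each member's granules a distinct tag; the layout is untouched."""
    tags = {}
    for tag, (name, _ctype) in enumerate(struct_type._fields_, start=1):
        field = getattr(struct_type, name)
        first = (base + field.offset) // GRANULE
        last = (base + field.offset + field.size - 1) // GRANULE
        for granule in range(first, last + 1):
            tags[granule] = tag
    return tags


def access_ok(tags, ptr_tag, addr):
    """Models the check: the pointer's tag must match the granule's tag."""
    return tags.get(addr // GRANULE) == ptr_tag
```

Here `member_tags(Pair)` tags `a` with 1 and `b` with 2 while `ctypes.sizeof(Pair)` stays 32, so a pointer derived from `a` (tag 1) passes at offset 15 and fails at offset 16, the first byte of `b`.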
Even if those two members fall within the same granule and the combined size is below 16 bytes?
In that case, no, but the tagging granularity seems designed to be variable / configurable eventually even though it's going to start off with 16 byte granularity. I don't think that's set in stone and if you wanted to sacrifice more memory I'd expect that to become possible.
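The trade-off can be put in numbers with a small sketch. It assumes tags stay 4 bits wide at every granule size, and the non-16-byte granules are purely hypothetical since only 16 bytes is documented for ARMv8.5. Both function names are made up for illustration.

```python
TAG_BITS = 4  # width of one MTE tag


def shares_granule(a_size, granule):
    """For struct { char a[a_size], b[...]; } at a granule-aligned address:
    True if the last byte of a and the first byte of b share one granule,
    in which case a single tag covers both and the overflow goes unseen."""
    last_byte_of_a = (a_size - 1) // granule
    first_byte_of_b = a_size // granule
    return last_byte_of_a == first_byte_of_b


def tag_overhead(granule):
    """Fraction of extra memory spent on tags for a given granule size."""
    return TAG_BITS / (granule * 8)
```

With `a_size = 4`, `shares_granule(4, 16)` is true, so the members can't be told apart, while a hypothetical 4-byte granule would separate them at a cost of `tag_overhead(4) == 0.125` instead of `tag_overhead(16) == 0.03125`. A cache-line-sized granule of 64 bytes would cost `0.0078125`.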
Are you sure? I can't imagine it being finer granularity than cache lines. The obvious implementation is to store the tag with the cache line and compare the tag field of the address with the tag value of the cache line selected by the address.
I'm sure it's going to have 16 byte granularity since that's what they've documented for ARMv8.5, but the architecture documentation is written in a very generic way that leaves open the possibility of having smaller / larger granularity.



