ShadowCallStack, inline malloc metadata, freed allocations, etc. can be protected with a single reserved tag. Making sure adjacent allocations have different tags or having protected metadata between them wipes out small / linear overflows. Can do a lot more than random tags.
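The deterministic scheme described above can be sketched as a toy simulation. This is purely illustrative (the class and method names are made up, not a real MTE API): tag 0 is reserved for metadata and freed memory, and adjacent allocations alternate between two deterministic tags so a small linear overflow always lands on a differently-tagged granule.

```python
# Toy model of granule-based memory tagging with a reserved tag and
# deterministic alternating tags for adjacent allocations.
GRANULE = 16
RESERVED_TAG = 0  # freed allocations, inline metadata, ShadowCallStack, etc.

class TaggedHeap:
    def __init__(self, granules):
        self.tags = [RESERVED_TAG] * granules  # per-granule memory tags
        self.next = 0
        self.alternate = 1  # deterministic: alternate between tags 1 and 2

    def malloc(self, size):
        n = -(-size // GRANULE)  # granules needed, rounded up
        tag = self.alternate
        self.alternate = 2 if tag == 1 else 1
        start = self.next
        for i in range(start, start + n):
            self.tags[i] = tag
        self.next = start + n
        return start * GRANULE, tag  # "pointer" plus its tag

    def free(self, addr, size):
        n = -(-size // GRANULE)
        g = addr // GRANULE
        for i in range(g, g + n):
            self.tags[i] = RESERVED_TAG  # any UAF access now traps

    def access(self, addr, tag):
        # hardware check: pointer tag must match the granule's memory tag
        if self.tags[addr // GRANULE] != tag:
            raise MemoryError(f"tag mismatch at {addr:#x}")

heap = TaggedHeap(64)
a, tag_a = heap.malloc(32)
b, tag_b = heap.malloc(32)
heap.access(a, tag_a)            # in-bounds access passes the check
try:
    heap.access(a + 32, tag_a)   # linear overflow into b
except MemoryError:
    print("overflow caught")
```

Note that nothing here is probabilistic: the overflow and the use-after-free are caught every time, which is the point of spending the tags deterministically rather than randomly.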
I think it's far more compelling than PAC even if you disregard the ability to use random tags and use entirely deterministic ones. PAC is yet another attempt at targeting specific exploit techniques: it only protects certain pointers rather than memory in general. IMO, it's underwhelming.
PAC is at odds with using the address space for exploit mitigations. It's directly opposed to approaches like splitting up the address space and avoiding reuse which is a *deterministic* UAF mitigation. It isn't just taking away bits from ASLR but also more interesting things.
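The "avoiding reuse" idea can be sketched with a toy allocator (names and the guard-gap size are illustrative assumptions, not any real kernel interface): freed virtual ranges stay unmapped forever, so every dangling-pointer access faults deterministically instead of probabilistically hitting whatever got allocated there next.

```python
# Toy model of a "never reuse addresses" policy as a deterministic
# use-after-free mitigation. The cost is address-space consumption,
# which is why schemes like this compete with PAC for upper bits.
class NoReuseVM:
    def __init__(self):
        self.cursor = 0x1000
        self.live = {}  # base -> size; the only mapped ranges

    def mmap(self, size):
        base = self.cursor
        self.cursor += size + 0x1000  # skip past a guard gap; never rewind
        self.live[base] = size
        return base

    def munmap(self, base):
        del self.live[base]  # the range is never handed out again

    def load(self, addr):
        for base, size in self.live.items():
            if base <= addr < base + size:
                return "ok"
        raise MemoryError(f"fault at {addr:#x}")  # guaranteed for any UAF

vm = NoReuseVM()
p = vm.mmap(0x1000)
vm.munmap(p)
q = vm.mmap(0x1000)
assert q != p  # the old address is never recycled
try:
    vm.load(p)  # use-after-free: faults every time
except MemoryError:
    print("UAF trapped")
```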
Those 'more interesting things' also include memory tagging since eventually it could be possible to use 24-bit or larger tags via unused upper bits. PAC is using up a bunch of those precious bits for an inherently very weak and hard to widely deploy probabilistic mitigation.
Going from the current 16 byte granularity to 64 byte would at least make it possible to use 16-bit tags instead of 4-bit without needing more storage. They can certainly go a lot further than that if they're willing to offer the option to waste more space on the tag metadata.
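The storage claim is easy to check with back-of-the-envelope arithmetic: tag overhead is tag bits divided by granule bits, so quadrupling both the tag width and the granule size leaves the metadata fraction unchanged.

```python
# Tag-storage overhead as a fraction of tagged memory.
def tag_overhead(tag_bits, granule_bytes):
    """Fraction of memory consumed by tag metadata."""
    return tag_bits / (granule_bytes * 8)

# Current MTE: 4-bit tags at 16-byte granularity.
print(f"{tag_overhead(4, 16):.3%}")   # 3.125%
# 16-bit tags at 64-byte granularity: identical storage cost.
print(f"{tag_overhead(16, 64):.3%}")  # 3.125%
# Wider tags at the same granularity cost proportionally more,
# e.g. 24-bit tags at 64 bytes:
print(f"{tag_overhead(24, 64):.3%}")
```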
If you go to 64-byte granularity, the amount of wasted memory for the majority of consumer workloads (browsers, etc.) becomes your bottleneck. SPARC worked fine with that because it aligned with Oracle DB needs, but consumer OSes are in a different space.
For the ecosystem we work on, nearly all of that is using concurrent compacting garbage collection. C++ and now Rust are used for particularly high performance libraries or application code and it wouldn't make much sense to use it if that code was heavily impacted by malloc.
Android used jemalloc from 5.x onward, which provided very high performance with low metadata overhead, but those times have ended. It now defaults to Scudo, a semi-hardened allocator: easily 4x slower with dramatically more metadata overhead for small allocations.
Android application code is normally written in Kotlin or Java, so most allocation is handled via concurrent compacting GC. Chromium is unusual since it's a port of an application from elsewhere: its application layer is entirely written in Java, but the browser renderer is all C++.
I think you are missing my point. The workloads I refer to are small-size dominated. When you force 64-byte alignment and round sizes up to multiples of 64, your live allocations consume much more memory (granted, lots of this is anonymous except in the kernel). It's always a DRAM problem :)
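The fragmentation argument can be made concrete with a rough sketch. The request mix below is made up purely for illustration (skewed toward small sizes, as claimed for browser-like workloads); it compares live memory under 16-byte versus 64-byte size rounding.

```python
# Internal fragmentation from size rounding at two granule sizes,
# for a hypothetical small-size-dominated allocation profile.
def round_up(size, align):
    return -(-size // align) * align

# Illustrative request mix: mostly small allocations.
requests = [8] * 40 + [24] * 30 + [48] * 20 + [96] * 10

payload = sum(requests)
used_16 = sum(round_up(s, 16) for s in requests)
used_64 = sum(round_up(s, 64) for s in requests)

print(f"requested: {payload} bytes")
print(f"16B granules: {used_16} bytes live")
print(f"64B granules: {used_64} bytes live ({used_64 / used_16:.2f}x)")
```

With this particular distribution the 64-byte case holds twice the memory of the 16-byte case for the same payload, which is the DRAM problem being described; the exact factor depends entirely on the size distribution.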