I think it's far more compelling than PAC even if you disregard the ability to use random tags and use entirely deterministic ones. PAC is yet another attempt at targeting specific exploit techniques: it only protects certain pointers rather than memory in general. IMO, it's underwhelming.
PAC is at odds with using the address space for exploit mitigations. It's directly opposed to approaches like splitting up the address space and avoiding reuse, which is a *deterministic* UAF mitigation. It isn't just taking away bits from ASLR but also from more interesting things.
Those 'more interesting things' also include memory tagging, since eventually it could be possible to use 24-bit or larger tags via unused upper bits. PAC is using up a bunch of those precious bits for an inherently weak probabilistic mitigation that's hard to deploy widely.
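The upper-bit budget being argued over can be sketched with simple arithmetic. This is a hypothetical illustration, assuming AArch64-style configurations with 48-bit or 39-bit virtual address spaces; the exact bit positions PAC and MTE use vary by configuration and aren't modeled here.

```python
# Sketch of the spare upper bits in a 64-bit pointer that mitigations like
# PAC and MTE compete for (assumption: AArch64-style VA sizes; illustrative only).
def spare_upper_bits(va_bits: int) -> int:
    """Bits above the virtual address available for tags, signatures, etc."""
    return 64 - va_bits

# With a 48-bit VA there are 16 spare bits: MTE's current 4-bit tag fits,
# but a 24-bit tag would not, and PAC signatures draw from the same budget.
print(spare_upper_bits(48))  # 16
# A 39-bit VA would leave 25 spare bits -- enough for a 24-bit tag only if
# nothing else (like PAC) is also claiming them.
print(spare_upper_bits(39))  # 25
```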
Going from the current 16-byte granularity to 64-byte would at least make it possible to use 16-bit tags instead of 4-bit without needing more storage. They can certainly go a lot further than that if they're willing to offer the option to waste more space on tag metadata.
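The storage claim checks out arithmetically: quadrupling both the granule size and the tag width keeps the tag-store ratio constant. A minimal sketch, assuming a tag store that holds `tag_bits` per `granule`-byte granule (MTE today: 4 bits per 16 bytes):

```python
# Tag-metadata storage cost for a given granule size and tag width,
# assuming tag_bits of metadata per granule-byte granule of tagged memory.
def tag_metadata_bytes(granule: int, tag_bits: int, mem_bytes: int = 2**30) -> int:
    granules = mem_bytes // granule
    return granules * tag_bits // 8

# 4-bit tags at 16-byte granularity and 16-bit tags at 64-byte granularity
# both cost 32 MiB of tag storage per GiB -- the same 1/32 overhead ratio.
print(tag_metadata_bytes(16, 4) // 2**20)   # 32
print(tag_metadata_bytes(64, 16) // 2**20)  # 32
```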
If you go to 64-byte granularity, the amount of wasted memory for the majority of consumer workloads (browsers, etc.) becomes your bottleneck. SPARC worked fine with that because it aligned with Oracle DB needs, but consumer OSes are in a different space.
For the ecosystem we work on, nearly all of that code uses concurrent compacting garbage collection. C++ and now Rust are used for particularly high-performance libraries or application code, and it wouldn't make much sense to use them if that code were heavily impacted by malloc.
Android used jemalloc from 5.x onward, which provided very high performance with low metadata overhead, but those times have ended. It now defaults to Scudo, a semi-hardened allocator that's easily 4x slower with dramatically more metadata overhead for small allocations.
Chromium has used the system allocator on Android ever since Android moved to jemalloc with 5.x, and it still does now that Android has mostly moved to Scudo, which is dramatically slower. Chromium mostly uses GC, PartitionAlloc and a few other allocators of its own.
I think you are missing my point. The workloads I refer to are dominated by small allocations. When you force 64-byte alignment and round sizes up to a multiple of 64, your live allocations consume much more memory (granted, lots of this is anonymous memory except in the kernel). It's always a DRAM problem :)
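The internal-fragmentation cost being described can be sketched as follows. The request sizes here are hypothetical but typical of small-object-heavy consumer workloads like browsers; this is illustrative arithmetic, not measured data.

```python
# Sketch of internal fragmentation when every allocation is rounded up to
# the tag granule size (assumption: made-up small request sizes).
def rounded(size: int, granule: int) -> int:
    """Round size up to the next multiple of granule."""
    return -(-size // granule) * granule

requests = [8, 24, 32, 48, 56]  # hypothetical small allocations (168 bytes total)
for granule in (16, 64):
    used = sum(rounded(s, granule) for s in requests)
    waste = used - sum(requests)
    print(granule, used, waste)
# At 16-byte granules these 168 requested bytes occupy 192 (24 wasted);
# at 64-byte granules they occupy 320 (152 wasted) -- nearly doubling live memory.
```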