I'm going to need rounding for large allocation sizes, as jemalloc does, to work around poorly written applications and libraries with element-by-element realloc growth loops. The GTK+ / Qt ecosystems are full of memory corruption and inefficient patterns like this.
It's all fine with server and command-line applications and on Android, but GTK+ and Qt applications often take a ridiculously long time to load because applications feel like reallocating 32 bytes to 32 MiB via realloc loops in increments of 32 bytes. Krita is particularly bad.
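The pattern in question looks something like this (a hypothetical sketch of the anti-pattern, not code from any specific application):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical element-by-element growth loop: every iteration grows the
 * buffer by one 32-byte element via realloc. With an allocator that only
 * rounds large sizes to 4 KiB granularity, nearly every call past the slab
 * size classes forces a copy, so total bytes copied grow quadratically. */
static unsigned char *grow_one_at_a_time(size_t n_elems) {
    unsigned char *buf = NULL;
    for (size_t i = 1; i <= n_elems; i++) {
        unsigned char *next = realloc(buf, i * 32);
        if (!next) {
            free(buf);
            return NULL;
        }
        buf = next;
        memset(buf + (i - 1) * 32, 0, 32); /* initialize the new element */
    }
    return buf;
}
```

Growing from 32 bytes to 32 MiB this way makes about a million realloc calls; whether it's merely wasteful or catastrophically slow depends entirely on how often the allocator has to move the buffer.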
Please, let's work together to get the offenders fixed upstream. This affects musl's current and future allocators too and we're not going to penalize reasonable apps with ridiculous memory bloat to facilitate these app bugs.
My approach to this won't increase actual memory usage, but I'd still rather not do it. It doesn't impact small allocations because the size classes are already divided up into 4 per doubling in size for efficiency reasons. That part is explained here: github.com/AndroidHardeni.
The code organizes them in rows based on scaling size, which doubles after reaching the next power of 2:
github.com/AndroidHardeni
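The rounding this produces can be sketched as follows (an illustrative model of "4 classes per doubling", not the allocator's actual tables):

```c
#include <stddef.h>

/* Round a requested size up to an illustrative size class. The class
 * spacing (scaling size) is a quarter of the power of two below the
 * size, so it doubles after each power of 2 is crossed, giving 4 size
 * classes per doubling. A sketch only; real class tables differ. */
static size_t round_to_size_class(size_t size) {
    if (size <= 16)
        return 16;
    /* largest power of two strictly below size (at least 16) */
    size_t pow2 = 16;
    while (pow2 * 2 < size)
        pow2 *= 2;
    size_t spacing = pow2 / 4 < 16 ? 16 : pow2 / 4;
    return (size + spacing - 1) / spacing * spacing;
}
```

For example, a 129-byte request lands in the 160-byte class: 31 bytes (~19%) of rounding waste in the worst case, in exchange for far fewer distinct classes.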
It's possible to add more size classes, trading less waste from rounding for more waste from usage spread across them, but this is a nice balance.
Right now, I switch over to using memory mappings once the scaling size has reached 4096 bytes (allocations beyond 16k bytes). I'll likely make the size classes go higher, since it only takes 4 more to reach 32k, another 4 for 64k and so on, but it's kept minimal for now.
So, for slab allocations, the problem doesn't matter because even growing by 1 byte in a loop only results in 4 copies per doubling in size. It played no part in choosing those size classes, but they do help with it. The issue begins once those end and it switches to 4k granularity.
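The "4 copies per doubling" figure can be checked with a quick simulation (again using an illustrative quarter-power-of-two class layout, not the allocator's real tables):

```c
#include <stddef.h>

/* Illustrative slab size classes: 16-byte spacing up to 128, then 4
 * classes per doubling (spacing is a quarter of the power of two below
 * the size). A sketch, not the actual class tables. */
static size_t slab_size_class(size_t size) {
    if (size <= 16)
        return 16;
    size_t pow2 = 16;
    while (pow2 * 2 < size)
        pow2 *= 2;
    size_t spacing = pow2 / 4 < 16 ? 16 : pow2 / 4;
    return (size + spacing - 1) / spacing * spacing;
}

/* Grow a buffer 1 byte at a time and count how often it would have to
 * move: only when it crosses into a new size class. */
static int copies_growing_to(size_t max) {
    int copies = 0;
    size_t current_class = 0;
    for (size_t s = 1; s <= max; s++) {
        size_t c = slab_size_class(s);
        if (c != current_class) {
            copies++;
            current_class = c;
        }
    }
    return copies;
}
```

With these classes, growing to 4096 bytes costs only 28 moves, roughly 4 per doubling, instead of one move per iteration at byte granularity.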
So what I plan to do is add the same kind of rounding for large allocations via a one-to-two-line calculation. It will use up to ~20% extra virtual memory in the worst case for large allocations, but not higher actual memory usage. It already uses up to 2x VM for guards.
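A plausible sketch of that calculation (the function name, 16k threshold, and exact form are my assumptions, not the actual implementation):

```c
#include <stddef.h>

/* Round a large (mapping-based) allocation size up to a quarter-power-
 * of-two boundary. Worst case: a size just above a power of two p is
 * rounded to p + p/4, reserving at most (p/4) / (p + p/4) = 20% extra
 * virtual memory. Untouched tail pages are never faulted in, so
 * resident memory usage doesn't grow. Hypothetical sketch only. */
static size_t round_large(size_t size) {
    size_t pow2 = 16384; /* large allocations start beyond 16k here */
    while (pow2 * 2 < size)
        pow2 *= 2;
    size_t spacing = pow2 / 4; /* always a multiple of the 4k page size */
    return (size + spacing - 1) / spacing * spacing;
}
```

Since the spacing is always at least a page, this composes cleanly with page-granularity mappings and guard regions.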
Also worth noting that this allocator is 64-bit only so systems without abundant virtual memory aren't a concern. There's an indirect impact on performance from a sparser heap, but this wouldn't be a substantial difference. Would rather not need to do this, but it really hurts.

