Long thread about allocator design choices: https://twitter.com/DanielMicay/status/1034204853193433088 …. It's entirely possible to provide a hardened allocator with decent performance, low memory use and great scalability without giving up core security properties like fully out-of-line and authoritative metadata.
Is part of the problem perhaps that people writing them are thinking about scaling to 1024 cores or something?
-
The lock cost you're trying to amortize should scale with the number of cores due to scaling of cache coherency traffic/contention, no?
-
It might scale up somewhat, but at least in jemalloc there are arenas assigned to threads either via round-robin (static assignment) or dynamic load balancing (sched_getcpu is the naive way, but Facebook has some kind of sophisticated replacement). The arenas are what give the scaling.
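A rough sketch of the static round-robin case, in Python purely for brevity. The names (`Arena`, `NUM_ARENAS`, `current_arena`, `malloc_sketch`) are made up for illustration and are not jemalloc's API; the "allocation" is just a counter standing in for real allocator state. The point is only that each thread is pinned to one arena on first use, so a given lock is shared by a fraction of the threads.

```python
import itertools
import threading

NUM_ARENAS = 4  # hypothetical value; real allocators tend to derive this from the CPU count


class Arena:
    """Toy arena: its own lock plus a counter standing in for allocator state."""

    def __init__(self, index):
        self.index = index
        self.lock = threading.Lock()
        self.allocations = 0

    def allocate(self):
        # Only the threads assigned to this arena ever contend on this lock.
        with self.lock:
            self.allocations += 1
            return (self.index, self.allocations)


arenas = [Arena(i) for i in range(NUM_ARENAS)]
_next_arena = itertools.count()
_assign_lock = threading.Lock()
_tls = threading.local()


def current_arena():
    """Return the calling thread's arena, assigning one round-robin on first use."""
    arena = getattr(_tls, "arena", None)
    if arena is None:
        with _assign_lock:
            arena = arenas[next(_next_arena) % NUM_ARENAS]
        _tls.arena = arena  # cached, so later calls skip the assignment lock
    return arena


def malloc_sketch():
    """Stand-in for malloc: route the request to the calling thread's arena."""
    return current_arena().allocate()
```

With 8 threads and 4 arenas, each arena ends up serving 2 threads. Dynamic load balancing would replace the counter in `current_arena` with something like a per-CPU lookup.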
-
Thread caches batch together a bunch of work, which results in holding the locks for much longer, especially if the thread caches are large. They're kept quite small in jemalloc and have non-trivial garbage collection / heuristics, especially compared to the massive ones used by TCMalloc.
-
In TCMalloc, the massive thread cache was a band-aid for not having an efficient underlying allocator. It's a lot better for the throughput to come from a very high performance allocator behind the locks, with the thread caches only needing to make small batches of work to wipe out locking costs.
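To make the batching point concrete, here is a toy model in Python. Everything in it is hypothetical (`CentralAllocator`, `ThreadCache`, `BATCH_SIZE`, integer "slots" in place of memory) and it does not mirror jemalloc's or TCMalloc's internals. It only shows that a small per-thread batch already divides the number of lock acquisitions by the batch size.

```python
import threading

BATCH_SIZE = 8  # hypothetical; deliberately small


class CentralAllocator:
    """Toy lock-protected allocator that hands out integer slots."""

    def __init__(self):
        self.lock = threading.Lock()
        self.lock_acquisitions = 0
        self._next_slot = 0

    def allocate_batch(self, count):
        # One lock acquisition serves `count` future allocations.
        with self.lock:
            self.lock_acquisitions += 1
            start = self._next_slot
            self._next_slot += count
            return list(range(start, start + count))


class ThreadCache:
    """Per-thread cache: refills from the central allocator in small batches."""

    def __init__(self, central, batch_size=BATCH_SIZE):
        self.central = central
        self.batch_size = batch_size
        self.free_slots = []

    def allocate(self):
        if not self.free_slots:
            # Slow path: take the lock once and grab a whole batch.
            self.free_slots = self.central.allocate_batch(self.batch_size)
        # Fast path: no lock, just pop from the thread-local list.
        return self.free_slots.pop()
```

In this model, 64 allocations through a cache with a batch of 8 take the lock 8 times instead of 64. A larger batch lowers that count further but keeps the lock held longer per refill and strands more memory in each thread.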
-
For my new hardened allocator, providing the option of having arenas will be natural since it just requires having multiple slab allocation regions and then assigning threads to them. It will only be a dozen lines of code.