Heads up: musl 1.2.1 is exposing bugs in several applications and libraries that run async-signal-unsafe code between fork (in a multithreaded process) and the subsequent exec.
I've heard these bugs blamed on mallocng, since they mostly involve invalid use of malloc in this context, but I think the change that actually made them visible is git.musl-libc.org/cgit/musl/comm
Previously, the forked child skipped libc-internal locks because it was single-threaded, so the app's UB manifested as access to inconsistent, partially modified state that should have been protected by the lock (and that access should have deadlocked).
Now, these programs safely deadlock if the parent was holding a lock at the moment of fork.
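A minimal sketch of the pattern these programs should follow instead, assuming a POSIX system: everything that may allocate or take a lock (building argv/envp, resolving the path) happens before fork, and the child calls only async-signal-safe functions (execve and _exit). The names run_and_wait, start_malloc_churn and malloc_churn are hypothetical; the churn thread exists only to make the parent multithreaded with malloc busy at fork time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical background thread that keeps malloc's internal locks busy,
 * so a fork from another thread can easily land while one of them is held. */
static void *malloc_churn(void *arg)
{
    (void)arg;
    for (;;) {
        void *volatile p = malloc(64); /* volatile keeps the pair from being optimized out */
        free(p);
    }
}

static int start_malloc_churn(void)
{
    pthread_t t;
    int err = pthread_create(&t, NULL, malloc_churn, NULL);
    if (err == 0)
        pthread_detach(t);
    return err;
}

/* The caller must have prepared path, argv and envp already. Between fork
 * and execve the child touches no malloc, stdio or other locked state. */
static int run_and_wait(const char *path, char *const argv[], char *const envp[])
{
    assert(path && argv && envp);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);
        _exit(127); /* _exit, not exit: skip atexit handlers and stdio flushing */
    }
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Usage: call start_malloc_churn(), then run_and_wait("/bin/sh", argv, environ) with argv = {"/bin/sh", "-c", "exit 3", NULL}; it returns 3 without the child ever touching malloc.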
The known culprits so far are dbus (code to auto-run a session dbus if it doesn't already exist), pulseaudio (code to auto-run the daemon if it isn't already running), and libvirt (its posix_spawn clone with extra hooks). All of them look fairly easy to fix.
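One plausible shape for fixing the auto-launch paths (a sketch, not the actual dbus or pulseaudio patches) is to stop running application code in a hand-rolled fork child and let posix_spawn do the fork/exec. Setup the child needs is described up front with file actions or spawn attributes, and the libc performs it in the child. spawn_helper is a hypothetical name, and redirecting stdin to /dev/null stands in for whatever setup a real launcher needs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Start `path` without running any of our own code in the child.
 * posix_spawn does no PATH search (posix_spawnp does), and it returns
 * an error number rather than setting errno. */
static int spawn_helper(const char *path, char *const argv[], pid_t *pid_out)
{
    assert(path && argv && pid_out);
    posix_spawn_file_actions_t fa;
    int err = posix_spawn_file_actions_init(&fa);
    if (err)
        return err;
    /* Example setup step, decided in the parent: child's stdin is /dev/null. */
    err = posix_spawn_file_actions_addopen(&fa, STDIN_FILENO, "/dev/null", O_RDONLY, 0);
    if (!err)
        err = posix_spawn(pid_out, path, &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    return err;
}
```

Custom per-child hooks of the kind libvirt adds don't fit this model, which is why that case needs more than a drop-in replacement.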
I had to add support for this (using malloc in the child after a multithreaded fork) in github.com/GrapheneOS/har for compatibility. Since other malloc implementations provide it as an extension, not supporting it uncovered many portability issues. The same code is reused for the memory tracing API extension (malloc_enable/malloc_disable).
I still haven't implemented most of that memory tracing API (used by android.googlesource.com/platform/syste and perhaps other tooling), since malloc_iterate can't be implemented yet: it would need to track large allocations in a data structure that supports range lookups, rather than a hash table.
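To see why a hash table falls short, consider a malloc_iterate-style query: report every allocation overlapping [base, base + size). A hash table keyed by start address answers "is this exact pointer a large allocation?" but has to scan every entry for a range query, while a structure ordered by address can binary-search to the first candidate. A minimal sketch, assuming allocations never overlap; region_table and its functions are hypothetical (not hardened_malloc or bionic code), and a sorted array stands in for the balanced tree a real allocator would more likely use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical registry of large allocations, sorted by start address. */
struct region { uintptr_t start; size_t size; };
struct region_table { struct region *v; size_t len, cap; };

/* Index of the first region whose start >= addr (binary search). */
static size_t lower_bound(const struct region_table *t, uintptr_t addr)
{
    size_t lo = 0, hi = t->len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (t->v[mid].start < addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* O(n) insert keeps the array sorted; a tree would make this O(log n). */
static int region_insert(struct region_table *t, uintptr_t start, size_t size)
{
    if (t->len == t->cap) {
        size_t ncap = t->cap ? t->cap * 2 : 16;
        struct region *nv = realloc(t->v, ncap * sizeof *nv);
        if (!nv)
            return -1;
        t->v = nv;
        t->cap = ncap;
    }
    size_t i = lower_bound(t, start);
    memmove(&t->v[i + 1], &t->v[i], (t->len - i) * sizeof *t->v);
    t->v[i] = (struct region){ start, size };
    t->len++;
    return 0;
}

/* Call cb (if non-NULL) for each region overlapping [base, base + size) and
 * return how many there were. Because regions don't overlap, only the one
 * just before lower_bound(base) can start below base and reach into range.
 * Assumes base + size does not wrap around. */
static size_t region_iterate(const struct region_table *t, uintptr_t base, size_t size,
                             void (*cb)(uintptr_t, size_t, void *), void *arg)
{
    uintptr_t end = base + size;
    size_t i = lower_bound(t, base), n = 0;
    if (i > 0 && t->v[i - 1].start + t->v[i - 1].size > base)
        i--;
    for (; i < t->len && t->v[i].start < end; i++, n++) {
        if (cb)
            cb(t->v[i].start, t->v[i].size, arg);
    }
    return n;
}
```

A query costs O(log n + k) for k matching regions, versus O(n) for walking every bucket of a hash table.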
It's annoying that the mutex issue can't be handled in a truly portable way because of the flawed POSIX mutex API. In practice it can be handled, and having malloc depend on a libc implementation detail (being able to call pthread_mutex_init again in the forked child) isn't that bad.
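A minimal sketch of how an allocator can stay usable in the forked child, assuming a toy allocator guarded by a fixed array of pthread mutexes. All toy_* names and arena_lock are hypothetical, and this is not hardened_malloc's code. The prepare/parent handlers double as a malloc_disable/malloc_enable-style pair, mirroring the code reuse mentioned earlier in the thread, and the child handler re-initializes the locks with pthread_mutex_init.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical toy allocator with a few arena locks. */
#define N_ARENAS 4
static pthread_mutex_t arena_lock[N_ARENAS] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/* Take every lock in a fixed order so no thread is mid-update at fork time. */
static void toy_malloc_disable(void)
{
    for (int i = 0; i < N_ARENAS; i++)
        pthread_mutex_lock(&arena_lock[i]);
}

static void toy_malloc_enable(void)
{
    for (int i = N_ARENAS - 1; i >= 0; i--)
        pthread_mutex_unlock(&arena_lock[i]);
}

/* Child: re-initialize instead of unlocking. POSIX leaves initializing an
 * already-initialized mutex undefined, so this relies on the libc treating
 * pthread_mutex_init as a plain reset: an implementation detail. */
static void toy_malloc_child_reset(void)
{
    for (int i = 0; i < N_ARENAS; i++)
        pthread_mutex_init(&arena_lock[i], NULL);
}

static int toy_malloc_install_fork_handlers(void)
{
    return pthread_atfork(toy_malloc_disable, toy_malloc_enable,
                          toy_malloc_child_reset);
}
```

Any code path that holds several arena locks at once has to take them in the same order as toy_malloc_disable, or the prepare handler can itself deadlock.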
It's stated that pubs.opengroup.org/onlinepubs/969 may be deprecated in a future version of the standard, but in an ideal world it would be fork itself that gets deprecated...
We miss out on a lot of memory and startup optimizations designed around fork, even though nearly all of them could be done without it.

