Heads up: musl 1.2.1 is exposing bugs in several applications and libraries from async-signal-unsafe code between (multithreaded) fork and subsequent exec.
I've heard them being blamed on mallocng, because they mostly involve invalid use of malloc in this context, but I think the actual change that made them visible is git.musl-libc.org/cgit/musl/comm
Previously, the forked child skipped libc-internal locks because it was single-threaded, so the app's UB manifested as access to inconsistent, partially modified state that the lock should have protected (an access that should have deadlocked).
Now, these programs safely deadlock if the parent was holding a lock at the moment of fork.
The known culprits so far are dbus (code to auto-run a session dbus if it doesn't already exist), pulseaudio (code to auto-run daemon if it doesn't already exist), and libvirt (its posix_spawn clone with extra hooks). All look fairly easily fixable.
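As a rough illustration of the general shape of such a fix (not the actual dbus, pulseaudio, or libvirt code), here is a minimal C sketch. It assumes a hypothetical helper, spawn_and_wait, that does all allocation and setup in the parent before fork and restricts the child to async-signal-safe calls (execv and _exit here) until exec.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (name and interface are illustrative only):
 * runs argv[0] with the given argv and returns its exit status,
 * or -1 if fork/waitpid failed or the child did not exit normally. */
int spawn_and_wait(char *const argv[])
{
    /* Anything that allocates memory or takes libc locks (building argv,
     * formatting strings, reading config) must be finished before fork. */
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child of a possibly multithreaded parent: only async-signal-safe
         * functions from here on. No malloc, no stdio, no setenv. */
        execv(argv[0], argv);
        /* exec failed; _exit skips atexit handlers and stdio flushing. */
        _exit(127);
    }

    /* Parent: reap the child, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Where the extra hooks between fork and exec are not needed, posix_spawn avoids running application code in the child at all.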
I had to add support for this in github.com/GrapheneOS/har for compatibility. Since other malloc implementations provide it as an extension, not supporting it uncovered many portability issues. Same code is reused for the memory tracing API extension (malloc_enable/malloc_disable).
It's not just malloc but *everything* that involves locks. malloc is just the most obvious one. Some *can't* be satisfied. For example, with stdio FILE locks there may be a blocking operation in progress that would deadlock with fork acquiring the lock.
Yeah, but it's particularly common for applications to depend on malloc after fork because of implementations commonly supporting it as an extension. Not really sure why they would have started supporting it, but since they do, it puts pressure on everyone else to do it too.
I think they depend on other things that don't work too, and just hit the deadlock or corruption rarely enough that everybody pretends it's ok...

