The kernel generates per-CPU batches of random numbers. Using the CSPRNG directly without a cache would have substantial locking overhead. That overhead is even higher for userspace, since it needs to make system calls, which have gotten much more expensive.
In userspace, you also need to remember to use MADV_WIPEONFORK (ideally) or hooks to avoid leaking it into child processes and to make them get fresh data from the kernel. It's also genuinely quite difficult to do per-CPU caching rather than per-thread caching in userspace.
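
A minimal sketch of the MADV_WIPEONFORK approach, assuming Linux 4.14 or later and a glibc that exposes the flag. The helper names `alloc_wipeonfork` and `child_sees_wiped_state` are hypothetical and exist only for illustration. The region holding generator state is a private anonymous mapping, and after fork() the child sees it zero-filled while the parent's copy is untouched.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map a private anonymous region for PRNG state and
 * ask the kernel to present it zero-filled in any forked child.
 * Returns NULL if mmap() or madvise() fails (e.g. on an older kernel). */
static void *alloc_wipeonfork(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
#ifdef MADV_WIPEONFORK
    if (madvise(p, len, MADV_WIPEONFORK) == 0)
        return p;
#endif
    munmap(p, len);
    return NULL;
}

/* Demonstration only: returns 1 if a forked child saw zeroed state,
 * 0 if it saw the parent's bytes, -1 if setup failed. */
static int child_sees_wiped_state(void) {
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *state = alloc_wipeonfork(len);
    if (state == NULL)
        return -1;
    memset(state, 0xAA, len);          /* stand-in for cached random bytes */

    pid_t pid = fork();
    if (pid < 0) {
        munmap(state, len);
        return -1;
    }
    if (pid == 0)                      /* child: report what it observes */
        _exit(state[0] == 0 ? 0 : 1);

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    int parent_intact = (state[0] == 0xAA);
    munmap(state, len);
    if (waited != pid || !WIFEXITED(status) || !parent_intact)
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

A generator built on this would typically keep an "initialized" flag inside the same region: a zero flag after fork tells the child to fetch fresh data from the kernel before producing output.
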
Consuming an extra page of memory per thread can be a real issue. Per-CPU caching can be done with restartable sequences but it's not simple. Kernel CSPRNG is certainly higher throughput than most CSPRNGs now. Still need more performance to get people to stop using non-CS PRNGs.
I think it's finally reasonable to say people should simply use (batched) getrandom(...) output for nearly all use cases explicitly needing secure random numbers. There are still many use cases for random numbers not involving cryptography where people expect better performance.
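
As a rough sketch of what "batched" getrandom(...) use could look like in C on Linux (glibc 2.25 or later): one system call fills a per-thread buffer, and callers are served from it until it runs dry. The names `BATCH_SIZE`, `refill_batch`, and `batched_random_bytes` are hypothetical, the buffer size is an arbitrary choice, and the sketch deliberately leaves out the fork handling discussed earlier in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/random.h>
#include <sys/types.h>

#define BATCH_SIZE 256   /* assumption: arbitrary batch size for illustration */

/* Per-thread cache, so no locking is needed in this sketch. */
static _Thread_local unsigned char batch[BATCH_SIZE];
static _Thread_local size_t batch_pos = BATCH_SIZE;   /* starts out empty */

/* Refill the whole batch from the kernel CSPRNG, retrying on EINTR and
 * on short reads. Returns 0 on success, -1 on error. */
static int refill_batch(void) {
    size_t got = 0;
    while (got < BATCH_SIZE) {
        ssize_t n = getrandom(batch + got, BATCH_SIZE - got, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        got += (size_t)n;
    }
    batch_pos = 0;
    return 0;
}

/* Hypothetical API: copy len random bytes into out, making a system call
 * only when the batch is exhausted. Returns 0 on success, -1 on error. */
static int batched_random_bytes(void *out, size_t len) {
    unsigned char *dst = out;
    assert(out != NULL || len == 0);
    while (len > 0) {
        if (batch_pos == BATCH_SIZE && refill_batch() != 0)
            return -1;
        size_t avail = BATCH_SIZE - batch_pos;
        size_t take = len < avail ? len : avail;
        memcpy(dst, batch + batch_pos, take);
        /* Zero consumed bytes so they cannot be recovered from the cache
         * later; a production version might prefer explicit_bzero(). */
        memset(batch + batch_pos, 0, take);
        batch_pos += take;
        dst += take;
        len -= take;
    }
    return 0;
}
```

With this shape, a request for 8 bytes costs one system call per 32 requests instead of one per request, which is the kind of saving the batching argument relies on.
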
Most programming languages split their random API into a regular one with a non-CS PRNG and a secure one with a CSPRNG. I think the biggest improvement would be convincing them to use ChaCha to replace their non-CS PRNGs. ChaCha8 is ridiculously fast especially with SSE2/NEON.
For example, OpenJDK directly uses the kernel CSPRNG for SecureRandom. They actually drive people away from using it by not batching it. Regular random APIs are insecure, but they get used in security-sensitive places all the time. It's not even fast, so people use alternatives.
It'd solve so much if those non-secure APIs gave people a nice userspace ChaCha8 implementation with wiping/reseeding on fork. It'd be incredibly rare for anyone to want more performance. Can still be marked not suitable for secure use in docs but wouldn't matter if people did.
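
To make the core of such a generator concrete, here is a scalar sketch of the ChaCha block function with the round count as a parameter, so the same code gives ChaCha8 with `rounds = 8` and ChaCha20 with `rounds = 20`. The function name `chacha_block` and its signature are illustrative, not taken from any particular library, and this is plain C rather than the SSE2/NEON code a fast implementation would use. Seeding, the counter increment between blocks, and wiping/reseeding on fork are left out.

```c
#include <assert.h>
#include <stdint.h>

static inline uint32_t rotl32(uint32_t v, int n) {
    return (v << n) | (v >> (32 - n));
}

/* The ChaCha quarter round on four words of the working state. */
#define CHACHA_QR(a, b, c, d)              \
    do {                                   \
        a += b; d ^= a; d = rotl32(d, 16); \
        c += d; b ^= c; b = rotl32(b, 12); \
        a += b; d ^= a; d = rotl32(d, 8);  \
        c += d; b ^= c; b = rotl32(b, 7);  \
    } while (0)

/* in:  16-word input state (4 constants, 8 key words, counter, nonce)
 * out: 16 words of keystream for that state
 * rounds: 8 for ChaCha8, 20 for ChaCha20; must be even because each
 *         loop iteration is one column round plus one diagonal round. */
static void chacha_block(const uint32_t in[16], uint32_t out[16], int rounds) {
    uint32_t x[16];
    assert(rounds > 0 && rounds % 2 == 0);
    for (int i = 0; i < 16; i++)
        x[i] = in[i];
    for (int i = 0; i < rounds; i += 2) {
        /* column round */
        CHACHA_QR(x[0], x[4], x[8],  x[12]);
        CHACHA_QR(x[1], x[5], x[9],  x[13]);
        CHACHA_QR(x[2], x[6], x[10], x[14]);
        CHACHA_QR(x[3], x[7], x[11], x[15]);
        /* diagonal round */
        CHACHA_QR(x[0], x[5], x[10], x[15]);
        CHACHA_QR(x[1], x[6], x[11], x[12]);
        CHACHA_QR(x[2], x[7], x[8],  x[13]);
        CHACHA_QR(x[3], x[4], x[9],  x[14]);
    }
    /* feed-forward: add the input state to the permuted state */
    for (int i = 0; i < 16; i++)
        out[i] = x[i] + in[i];
}
```

Each call yields 64 bytes of output, so a generator would call this once per 64 bytes served and bump the counter word in between. Dropping from 20 rounds to 8 cuts the work in the inner loop by a factor of 2.5.
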
The arc4random family is exactly that (though with ChaCha20 instead of 8). For some reason, it never caught on with glibc and thus the Linux ecosystem unless people pull in the external libbsd.
The reference implementation of arc4random is scalar ChaCha20 with a global cache and locking. It zeroes out the consumed data from the cache. The generation of uniform random numbers within a range is 32-bit only and could be faster, although with the locking costs it doesn't matter much.
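
For illustration, a sketch of 32-bit bounded generation in C. `uniform_below_reject` follows the modulo-with-rejection shape used by arc4random_uniform-style functions, and `uniform_below_mulshift` is a swapped-in alternative (Lemire's multiply-shift method) shown as one way the step could be made faster, not as what any existing libc does. All function names here are hypothetical, and `kernel_u32` is just a stand-in source that reads from getrandom() on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/random.h>
#include <sys/types.h>

typedef uint32_t (*rand32_fn)(void);

/* Stand-in source of uniform 32-bit values: reads 4 bytes from the
 * kernel CSPRNG, retrying on EINTR and short reads. Unbatched on purpose. */
static uint32_t kernel_u32(void) {
    uint32_t v = 0;
    size_t got = 0;
    while (got < sizeof v) {
        ssize_t n = getrandom((unsigned char *)&v + got, sizeof v - got, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            abort();
        }
        got += (size_t)n;
    }
    return v;
}

/* Rejection sampling: discard values below 2^32 mod bound so that the
 * remaining range is an exact multiple of bound, then reduce with %. */
static uint32_t uniform_below_reject(rand32_fn next, uint32_t bound) {
    if (bound < 2)
        return 0;
    uint32_t min = (0u - bound) % bound;   /* equals 2^32 mod bound */
    uint32_t r;
    do {
        r = next();
    } while (r < min);
    return r % bound;
}

/* Multiply-shift variant: the high 32 bits of r * bound are the result,
 * and a division is only needed on the rare path where the low 32 bits
 * fall below bound. */
static uint32_t uniform_below_mulshift(rand32_fn next, uint32_t bound) {
    if (bound < 2)
        return 0;
    uint64_t m = (uint64_t)next() * bound;
    uint32_t low = (uint32_t)m;
    if (low < bound) {
        uint32_t threshold = (0u - bound) % bound;
        while (low < threshold) {
            m = (uint64_t)next() * bound;
            low = (uint32_t)m;
        }
    }
    return (uint32_t)(m >> 32);
}
```

Both functions return values in `[0, bound)` without modulo bias. The second one avoids the `%` in the common case, which only pays off once the source itself is cheap, in line with the point above about locking costs dominating.
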