In case you missed it, the Linux CSPRNG is pretty good these days!
The extraction has been using ChaCha20 for a while. What just changed is that the entropy mixing will now use BLAKE2s, which makes a lot of sense since it's built on the same core as ChaCha20.
When you make good crypto fast, you get to use it everywhere, which ends up making things more secure and robust.
Here's Jason replacing a "best effort" kernel RNG function with the CSPRNG. With how fast getrandom(2) is, applications can do the same!
https://lore.kernel.org/lkml/20161222180741.24577-1-Jason@zx2c4.com/…
The kernel generates per-CPU batches of random numbers, since directly using the CSPRNG without a cache would have substantial locking overhead. The overhead is even higher for userspace, since it needs to make system calls, which have gotten much more expensive.
In userspace, you also need to remember to use MADV_WIPEONFORK (ideally) or fork hooks to avoid leaking the cache into child processes and to make them get fresh data from the kernel. It's also genuinely quite difficult to do per-CPU caching rather than per-thread caching in userspace.
Consuming an extra page of memory per thread can be a real issue. Per-CPU caching can be done with restartable sequences but it's not simple.
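A minimal sketch of the batching plus fork hook idea, in Python rather than C. Assumptions: the `BatchedRandom` class, its `refills` counter and the 4096-byte batch size are made up for illustration, `os.urandom` stands in where `os.getrandom` isn't available, and `os.register_at_fork` is the "hooks" route rather than MADV_WIPEONFORK. It's one shared cache with no locking, so it's neither per-thread nor per-CPU.

```python
import os

# Hypothetical batch size (one page), chosen only for illustration.
BATCH_SIZE = 4096

# os.getrandom is Linux-only; os.urandom is a portable stand-in.
_read_kernel = getattr(os, "getrandom", os.urandom)


class BatchedRandom:
    """Serve small requests from a buffer refilled with one kernel call per batch."""

    def __init__(self, batch_size=BATCH_SIZE):
        self._batch_size = batch_size
        self._buf = bytearray()
        self.refills = 0  # number of kernel calls, kept for demonstration only
        # Fork hook: a child must not hand out bytes the parent may also hand out.
        # Note that registering a bound method keeps this object alive for good.
        if hasattr(os, "register_at_fork"):
            os.register_at_fork(after_in_child=self.wipe)

    def wipe(self):
        # Best-effort zeroing; Python may still hold other copies of the data.
        self._buf[:] = bytes(len(self._buf))
        self._buf = bytearray()

    def _refill(self):
        # The kernel call may return fewer bytes than asked; read() copes with that.
        self._buf = bytearray(_read_kernel(self._batch_size))
        self.refills += 1

    def read(self, n):
        out = bytearray()
        while len(out) < n:
            if not self._buf:
                self._refill()
            take = min(n - len(out), len(self._buf))
            out += self._buf[:take]
            # Zero the bytes that were handed out, then drop them from the cache.
            self._buf[:take] = bytes(take)
            del self._buf[:take]
        return bytes(out)
```

With a cache like this, `BatchedRandom().read(16)` only touches the kernel once per batch, and the first read in a forked child starts from an empty cache, so it has to fetch fresh data.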
Kernel CSPRNG is certainly higher throughput than most CSPRNGs now. Still need more performance to get people to stop using non-CS PRNGs.
I think it's finally reasonable to say people should simply use (batched) getrandom(...) output for nearly all use cases explicitly needing secure random numbers.
There are still many use cases for random numbers not involving cryptography where people expect better performance.
Most programming languages split their random API into a regular one with a non-CS PRNG and a secure one with a CSPRNG. I think the biggest improvement would be convincing them to use ChaCha to replace their non-CS PRNGs. ChaCha8 is ridiculously fast, especially with SSE2/NEON.
For example, OpenJDK directly uses the kernel CSPRNG for SecureRandom. They actually drive people away from using it by not batching it.
Regular random APIs are insecure, but they get used in security-sensitive places all the time. They're not even fast, so people use alternatives.
It'd solve so much if those non-secure APIs gave people a nice userspace ChaCha8 implementation with wiping/reseeding on fork. It'd be incredibly rare for anyone to want more performance. It can still be marked as not suitable for secure use in the docs, but it wouldn't matter if people used it that way anyway.
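To make the shape of that concrete, here's a rough sketch of a userspace ChaCha8 generator. Assumptions: the `ChaCha8Rng` class and its method names are hypothetical, the state layout (32-bit counter, 96-bit nonce) follows the RFC 8439 variant of ChaCha with the round count dropped to 8, and reseeding on fork is done with `os.register_at_fork`. Pure Python shows the structure only; the speed comes from a SIMD implementation.

```python
import os
import struct

MASK32 = 0xFFFFFFFF
# "expand 32-byte k" as four little-endian 32-bit words.
SIGMA = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)


def rotl32(x, n):
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(s, a, b, c, d):
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = rotl32(s[b] ^ s[c], 7)


def chacha_block(key, counter, nonce, rounds=8):
    """Return one 64-byte keystream block (32-byte key, 12-byte nonce)."""
    init = (list(SIGMA) + list(struct.unpack("<8I", key))
            + [counter & MASK32] + list(struct.unpack("<3I", nonce)))
    s = list(init)
    for _ in range(rounds // 2):
        # Column round.
        quarter_round(s, 0, 4, 8, 12)
        quarter_round(s, 1, 5, 9, 13)
        quarter_round(s, 2, 6, 10, 14)
        quarter_round(s, 3, 7, 11, 15)
        # Diagonal round.
        quarter_round(s, 0, 5, 10, 15)
        quarter_round(s, 1, 6, 11, 12)
        quarter_round(s, 2, 7, 8, 13)
        quarter_round(s, 3, 4, 9, 14)
    return struct.pack("<16I", *((x + y) & MASK32 for x, y in zip(s, init)))


class ChaCha8Rng:
    """Hypothetical general-purpose generator backed by ChaCha8 keystream."""

    def __init__(self, seed=None):
        self._nonce = bytes(12)
        self._set_key(seed if seed is not None else os.urandom(32))
        # Only self-seeded generators reseed in the child; an explicit seed
        # is assumed to mean the caller wants a reproducible stream.
        if seed is None and hasattr(os, "register_at_fork"):
            os.register_at_fork(after_in_child=self.reseed)

    def _set_key(self, key):
        self._key = key
        self._counter = 0  # sketch only: the 32-bit counter wraps after 2**32 blocks
        self._buf = b""

    def reseed(self):
        self._set_key(os.urandom(32))

    def random_bytes(self, n):
        while len(self._buf) < n:
            self._buf += chacha_block(self._key, self._counter, self._nonce)
            self._counter += 1
        out, self._buf = self._buf[:n], self._buf[n:]
        return out
```

A fixed seed such as `ChaCha8Rng(bytes(32))` gives a reproducible stream for tests and simulations, while `ChaCha8Rng()` seeds itself from the kernel and picks a new key in a forked child.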