In case you missed it, the Linux CSPRNG is pretty good these days!
Output extraction has been using ChaCha20 for a while. What just changed is that entropy mixing now uses BLAKE2s, which makes a lot of sense: BLAKE2's compression function is built on ChaCha's quarter-round, so it's essentially the same core as ChaCha20.
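To make the "hash to mix, stream cipher to extract" split concrete, here is a heavily simplified sketch that uses Python's `hashlib.blake2s` as the mixer. It is not the kernel's code. `ToyEntropyPool`, its methods, and the rekeying step are hypothetical. The real driver also tracks entropy estimates, reseeds per-CPU ChaCha20 states, and feeds the seed into ChaCha20, which `hashlib` does not provide.

```python
import hashlib


class ToyEntropyPool:
    """Hypothetical, heavily simplified pool: BLAKE2s mixes inputs, then yields 32-byte seeds."""

    def __init__(self) -> None:
        self._state = hashlib.blake2s()

    def mix(self, data: bytes) -> None:
        # Absorb new input of unknown quality into the running BLAKE2s state.
        self._state.update(data)

    def extract_seed(self) -> bytes:
        # Hash everything mixed so far (digest() leaves the running state usable).
        pool_hash = self._state.digest()
        # Derive two domain-separated values: one to hand out (e.g. as a ChaCha20 key)
        # and one to key the next pool state.
        seed = hashlib.blake2s(b"output", key=pool_hash).digest()
        next_key = hashlib.blake2s(b"next pool", key=pool_hash).digest()
        # Rekey the pool so later pool state can't be used to recompute this seed.
        self._state = hashlib.blake2s(key=next_key)
        return seed
```

The design point is that mixing only needs a good hash, while bulk output comes from a fast stream cipher keyed by the extracted seed.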
When you make good crypto fast, you get to use it everywhere, which ends up making things more secure and robust.
Here's Jason replacing a "best effort" kernel RNG function with the CSPRNG. With how fast getrandom(2) is, applications can do the same!
https://lore.kernel.org/lkml/20161222180741.24577-1-Jason@zx2c4.com/…
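As a minimal userspace illustration of "just use getrandom(2)", here is a sketch built on Python's `os.getrandom` wrapper (Linux only, Python 3.6+; it falls back to `os.urandom` elsewhere). `getrandom_bytes` and `random_below` are hypothetical helper names; Python's own `secrets.randbelow` already covers the second one.

```python
import os


def getrandom_bytes(n: int) -> bytes:
    """Return exactly n bytes from the kernel CSPRNG."""
    if not hasattr(os, "getrandom"):
        # Non-Linux fallback: os.urandom also reads the OS CSPRNG.
        return os.urandom(n)
    out = bytearray()
    while len(out) < n:
        # Large getrandom(2) reads can return fewer bytes than requested
        # if a signal arrives, so keep reading until we have n.
        out += os.getrandom(n - len(out))
    return bytes(out)


def random_below(bound: int) -> int:
    """Uniform integer in [0, bound) via rejection sampling (no modulo bias)."""
    if bound <= 0:
        raise ValueError("bound must be positive")
    nbits = bound.bit_length()
    nbytes = (nbits + 7) // 8
    while True:
        candidate = int.from_bytes(getrandom_bytes(nbytes), "little")
        candidate >>= nbytes * 8 - nbits  # keep exactly nbits bits
        if candidate < bound:  # accepted with probability >= 1/2
            return candidate
```

Usage: `random_below(6) + 1` is an unbiased die roll.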
The kernel generates per-CPU batches of random numbers, because using the CSPRNG directly without a cache would add substantial locking overhead. That overhead is even higher for userspace, since each refill needs a system call, and system calls have gotten considerably more expensive.
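A userspace analogue of batching might look like the sketch below, assuming a per-thread buffer (`threading.local`) since per-CPU caching is hard outside the kernel. `BATCH_SIZE`, `batched_random_bytes`, and `batched_u64` are hypothetical. On modern Linux, `os.urandom` is documented to use the getrandom() syscall, so each refill is one kernel call amortized over many small requests.

```python
import os
import threading

BATCH_SIZE = 4096  # hypothetical: one page of kernel output per refill


class _ThreadBatch(threading.local):
    # threading.local runs __init__ once per thread, giving each thread its own buffer.
    def __init__(self) -> None:
        self.buf = b""
        self.pos = 0


_batch = _ThreadBatch()


def batched_random_bytes(n: int) -> bytes:
    """Serve n random bytes from this thread's buffer, refilling with one kernel call."""
    if not 0 <= n <= BATCH_SIZE:
        raise ValueError("request must fit in one batch")
    b = _batch
    if b.pos + n > len(b.buf):
        b.buf = os.urandom(BATCH_SIZE)
        b.pos = 0
    out = b.buf[b.pos:b.pos + n]
    b.pos += n
    return out


def batched_u64() -> int:
    return int.from_bytes(batched_random_bytes(8), "little")
```

This sketch is not fork-safe (a child inherits the unused buffer) and cannot wipe consumed bytes, since Python `bytes` are immutable.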
In userspace, you also need to remember to use MADV_WIPEONFORK (ideally) or fork hooks so the cached output doesn't leak into child processes and the children fetch fresh data from the kernel. It's also genuinely quite difficult to do per-CPU caching, rather than per-thread caching, in userspace.
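Here is a sketch of both fork-safety options, assuming Python 3.8+ on Linux. It keeps the buffer in a private anonymous `mmap` and applies `mmap.MADV_WIPEONFORK` (Linux 4.14+) when available, so a child sees the page zero-filled. Otherwise it falls back to an `os.register_at_fork` hook. `ForkSafeBatch` and the marker-byte layout are hypothetical choices for illustration.

```python
import mmap
import os

PAGE = mmap.PAGESIZE
VALID = 0  # offset of a marker byte: 1 = buffer holds unused output, 0 = empty or wiped


class ForkSafeBatch:
    """Random-byte buffer that a forked child never reuses (hypothetical helper)."""

    def __init__(self) -> None:
        # fileno -1 gives anonymous memory; MAP_PRIVATE is required for MADV_WIPEONFORK.
        self._buf = mmap.mmap(-1, PAGE, flags=mmap.MAP_PRIVATE)
        self._pos = PAGE
        self.wipe_on_fork = False
        advice = getattr(mmap, "MADV_WIPEONFORK", None)
        if advice is not None and hasattr(self._buf, "madvise"):
            try:
                # The child gets this page zero-filled, so the VALID marker reads 0.
                self._buf.madvise(advice)
                self.wipe_on_fork = True
            except OSError:
                pass  # e.g. older kernel without MADV_WIPEONFORK
        if not self.wipe_on_fork:
            # Fallback: clear the marker in the child right after fork().
            os.register_at_fork(after_in_child=self._invalidate)

    def _invalidate(self) -> None:
        self._buf[VALID] = 0

    def _refill(self) -> None:
        self._buf[1:PAGE] = os.urandom(PAGE - 1)
        self._buf[VALID] = 1
        self._pos = 1

    def random_bytes(self, n: int) -> bytes:
        if not 0 <= n < PAGE:
            raise ValueError("request must fit in one page")
        if self._buf[VALID] == 0 or self._pos + n > PAGE:
            self._refill()
        out = self._buf[self._pos:self._pos + n]
        # Zero the bytes we hand out so they don't linger in the buffer.
        self._buf[self._pos:self._pos + n] = bytes(n)
        self._pos += n
        return out
```

The marker byte lets both paths share one check: whether the kernel wiped the page or the hook cleared the marker, the next call refills from the kernel.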
Consuming an extra page of memory per thread can be a real issue. Per-CPU caching can be done with restartable sequences (rseq), but it's not simple.
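Rough arithmetic behind the per-thread concern, using hypothetical numbers chosen only for illustration (4 KiB pages, 10,000 threads, 64 CPUs):

```latex
\begin{aligned}
\text{per-thread: } & 10\,000 \times 4096\ \text{B} = 40\,960\,000\ \text{B} \approx 39\ \text{MiB} \\
\text{per-CPU: }    & 64 \times 4096\ \text{B} = 262\,144\ \text{B} = 256\ \text{KiB}
\end{aligned}
```

Per-thread cost scales with thread count, while per-CPU cost is bounded by the core count.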
The kernel CSPRNG certainly has higher throughput than most CSPRNGs now. Even more performance is still needed to get people to stop using non-CS PRNGs.
I think it's finally reasonable to say people should simply use (batched) getrandom(...) output for nearly all use cases explicitly needing secure random numbers.
There are still many use cases for random numbers not involving cryptography where people expect better performance.
Most programming languages split their random API into a regular one backed by a non-CS PRNG and a secure one backed by a CSPRNG. I think the biggest improvement would be convincing them to replace their non-CS PRNGs with ChaCha. ChaCha8 is ridiculously fast, especially with SSE2/NEON.
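To show what "use ChaCha as the regular PRNG" means, here is a minimal sketch of a seeded generator that takes its output from the ChaCha8 keystream. It uses the RFC 8439 block layout (32-bit counter, 96-bit nonce) with a zero nonce. `ChaCha8Rng` and its methods are hypothetical names, and pure Python is slow. The speed claim above is about native SIMD implementations, not this code.

```python
import struct

MASK = 0xFFFFFFFF
CONSTANTS = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)  # "expand 32-byte k"


def _rotl(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK


def _quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    s[a] = (s[a] + s[b]) & MASK; s[d] = _rotl(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK; s[b] = _rotl(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK; s[d] = _rotl(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK; s[b] = _rotl(s[b] ^ s[c], 7)


def chacha_block(key: bytes, counter: int, nonce: bytes, rounds: int = 8) -> bytes:
    """One 64-byte ChaCha block; rounds=20 is ChaCha20, rounds=8 is ChaCha8."""
    if rounds % 2:
        raise ValueError("rounds must be even")
    state = (list(CONSTANTS) + list(struct.unpack("<8I", key))
             + [counter & MASK] + list(struct.unpack("<3I", nonce)))
    w = state[:]
    for _ in range(rounds // 2):
        # Column round, then diagonal round.
        _quarter_round(w, 0, 4, 8, 12); _quarter_round(w, 1, 5, 9, 13)
        _quarter_round(w, 2, 6, 10, 14); _quarter_round(w, 3, 7, 11, 15)
        _quarter_round(w, 0, 5, 10, 15); _quarter_round(w, 1, 6, 11, 12)
        _quarter_round(w, 2, 7, 8, 13); _quarter_round(w, 3, 4, 9, 14)
    return struct.pack("<16I", *((x + y) & MASK for x, y in zip(w, state)))


class ChaCha8Rng:
    """Reproducible PRNG over the ChaCha8 keystream (hypothetical API)."""

    def __init__(self, seed: bytes) -> None:
        if len(seed) != 32:
            raise ValueError("seed must be 32 bytes")
        self._key, self._counter, self._buf, self._pos = seed, 0, b"", 0

    def next_u64(self) -> int:
        if self._pos + 8 > len(self._buf):
            if self._counter > MASK:
                raise OverflowError("32-bit block counter exhausted in this sketch")
            self._buf = chacha_block(self._key, self._counter, bytes(12), rounds=8)
            self._counter += 1
            self._pos = 0
        value = int.from_bytes(self._buf[self._pos:self._pos + 8], "little")
        self._pos += 8
        return value
```

Usage: seed it once from the kernel, e.g. `rng = ChaCha8Rng(os.urandom(32))`, or with a fixed seed when reproducible output is wanted. Each 64-byte block yields eight 64-bit values.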