It would be difficult to get any more performance out of the CSPRNG layer now: github.com/AndroidHardeni.
The ChaCha8 implementation could be much faster, which might help. It's just the scalar reference implementation converted into a pure keystream API by removing the XOR with the message.
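For readers unfamiliar with what a "pure keystream API" looks like, here is a minimal sketch of a scalar ChaCha block function that writes the raw keystream instead of XORing it into a message. It is not the code from the linked repository; the name `chacha_keystream_block` and the `rounds` parameter are assumptions made for illustration, and the caller is assumed to have already laid out the 16-word state (constants, key, counter, nonce).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

/* Standard ChaCha quarter round on four state words. */
#define QUARTERROUND(a, b, c, d)            \
    a += b; d ^= a; d = ROTL32(d, 16);      \
    c += d; b ^= c; b = ROTL32(b, 12);      \
    a += b; d ^= a; d = ROTL32(d, 8);       \
    c += d; b ^= c; b = ROTL32(b, 7)

/*
 * Hypothetical helper (name and signature are illustrative only):
 * produce one 64-byte keystream block from a prepared 16-word state.
 * rounds = 8 gives ChaCha8, rounds = 20 gives ChaCha20.
 * There is no message input: the keystream itself is the output.
 */
static void chacha_keystream_block(const uint32_t state[16], unsigned rounds,
                                   uint8_t out[64]) {
    assert(rounds % 2 == 0);

    uint32_t x[16];
    memcpy(x, state, sizeof(x));

    for (unsigned i = 0; i < rounds; i += 2) {
        /* column round */
        QUARTERROUND(x[0], x[4], x[8],  x[12]);
        QUARTERROUND(x[1], x[5], x[9],  x[13]);
        QUARTERROUND(x[2], x[6], x[10], x[14]);
        QUARTERROUND(x[3], x[7], x[11], x[15]);
        /* diagonal round */
        QUARTERROUND(x[0], x[5], x[10], x[15]);
        QUARTERROUND(x[1], x[6], x[11], x[12]);
        QUARTERROUND(x[2], x[7], x[8],  x[13]);
        QUARTERROUND(x[3], x[4], x[9],  x[14]);
    }

    /* Feed-forward add, then serialize little-endian. A cipher would XOR
     * these bytes into the message; a keystream API just returns them. */
    for (int i = 0; i < 16; i++) {
        uint32_t w = x[i] + state[i];
        out[4 * i + 0] = (uint8_t)(w);
        out[4 * i + 1] = (uint8_t)(w >> 8);
        out[4 * i + 2] = (uint8_t)(w >> 16);
        out[4 * i + 3] = (uint8_t)(w >> 24);
    }
}
```

A caller would increment the counter word(s) in `state` after each block. A scalar loop like this handles one block at a time, which is why a vectorized version computing several blocks in parallel can be much faster.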
The overhead tied to random number generation is acceptable right now but it's going to get higher when a few more features are using it. It needs to be insanely fast for those features to be worth having. There's a limited performance budget with many features to spend it on.
Maybe switch from ChaCha to AES on platforms that have hardware acceleration? (Which are probably all of the ones that matter.)
AES-CTR via AES-NI on x86 or the ARMv8-A instructions would be a good approach. There's no need for a fallback implementation, which makes it a much simpler problem. Going to try using a basic existing implementation to see how much it helps, and then it can be further optimized later on.
ChaCha8 using AVX2 can actually provide better performance than AES using CPU instructions:
bench.cr.yp.to/results-stream
ChaCha8 wins on Intel CPUs there and is quite competitive on the AMD CPUs and Tegra X1. I expect it loses on Snapdragon, which is all that really matters for arm64.
I would be a bit skeptical of those results; turning on the vector units will slow everything else down, which is fine as long as your workload is entirely vector operations, but probably terrible for this use case. See blog.cloudflare.com/on-the-dangers
Yeah, there are Turbo frequency offsets for AVX / AVX2 / AVX-512 (but not SSE). Using AVX2 would probably only make sense if the cache of random bytes was large, which isn't desirable due to memory usage, cache impact and latency. AES-NI is probably the best option since it'll beat SSE ChaCha8.
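To make the cache size trade-off concrete, here is a rough sketch of a small cache of random bytes sitting in front of a block generator. Everything in it is hypothetical: `CACHE_SIZE`, `struct rng_cache`, `rng_get_u16` and the other names are made up for illustration, and `fill_keystream` is a non-cryptographic stand-in for where ChaCha8 or AES-CTR output would go. A bigger `CACHE_SIZE` means fewer refills per value handed out, but each refill takes longer and the buffer occupies more memory and more CPU cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size: larger values amortize refills better but cost
 * memory, CPU cache footprint and worst-case latency per refill. */
#define CACHE_SIZE 256

struct rng_cache {
    uint8_t buf[CACHE_SIZE];
    size_t index;      /* next unread byte; CACHE_SIZE means empty */
    uint64_t counter;  /* state of the stand-in generator below */
    unsigned refills;  /* how many times the cache was refilled */
};

/* PLACEHOLDER, NOT A CSPRNG: a simple LCG stands in for the real
 * keystream generator (e.g. ChaCha8 or AES-CTR) so the sketch runs. */
static void fill_keystream(struct rng_cache *c) {
    for (size_t i = 0; i < CACHE_SIZE; i++) {
        c->counter = c->counter * 6364136223846793005ULL + 1442695040888963407ULL;
        c->buf[i] = (uint8_t)(c->counter >> 56);
    }
    c->index = 0;
    c->refills++;
}

static void rng_cache_init(struct rng_cache *c, uint64_t seed) {
    memset(c, 0, sizeof(*c));
    c->counter = seed;
    c->index = CACHE_SIZE; /* start empty so the first draw refills */
}

/* Fast path is a bounds check plus a 2-byte copy; the expensive
 * generator only runs once every CACHE_SIZE / 2 calls. */
static uint16_t rng_get_u16(struct rng_cache *c) {
    assert(c->index <= CACHE_SIZE);
    if (CACHE_SIZE - c->index < sizeof(uint16_t)) {
        fill_keystream(c);
    }
    uint16_t value;
    memcpy(&value, c->buf + c->index, sizeof(value));
    c->index += sizeof(value);
    return value;
}
```

With a 256-byte cache, 128 two-byte draws share a single refill. A wide vectorized generator generally wants a much larger buffer so it can produce several blocks per refill, which is the memory and latency cost mentioned above.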

