The overhead tied to random number generation is acceptable right now but it's going to get higher when a few more features are using it. It needs to be insanely fast for those features to be worth having. There's a limited performance budget with many features to spend it on.
AES-CTR via AES-NI on x86 or the ARMv8-A instructions would be a good approach. No need for a fallback implementation, which makes it a much simpler problem. Going to try using a basic existing implementation to see how much it helps, and then it can be further optimized later on.
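
For illustration only, here is roughly what a basic AES-CTR generator on top of AES-NI could look like. This is a minimal sketch, not the implementation referred to above. It assumes x86-64 with GCC or Clang, AES-128, a zero nonce with a 64-bit counter, and a CPU that actually has AES-NI (no runtime check, no fallback). The names `ctr_rng`, `ctr_rng_init` and `ctr_rng_fill` are hypothetical. Seeding from the OS and reseeding are left out.

```c
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2 */
#include <wmmintrin.h>  /* AES-NI */

/* Enable AES-NI per function so no global -maes flag is needed (GCC/Clang). */
#define AES_TARGET __attribute__((target("aes,sse2")))

/* Hypothetical generator state: expanded AES-128 key plus a block counter. */
typedef struct {
    __m128i round_keys[11];
    uint64_t counter;
} ctr_rng;

/* One step of the standard AES-128 key schedule using AESKEYGENASSIST output. */
AES_TARGET static __m128i expand_step(__m128i key, __m128i assist) {
    assist = _mm_shuffle_epi32(assist, 0xff);
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    return _mm_xor_si128(key, assist);
}

/* The round constant must be a compile-time immediate, hence the macro. */
#define EXPAND(i, rcon) \
    rk[i] = expand_step(rk[(i) - 1], _mm_aeskeygenassist_si128(rk[(i) - 1], rcon))

AES_TARGET static void ctr_rng_init(ctr_rng *rng, const uint8_t key[16]) {
    __m128i *rk = rng->round_keys;
    rk[0] = _mm_loadu_si128((const __m128i *)key);
    EXPAND(1, 0x01); EXPAND(2, 0x02); EXPAND(3, 0x04); EXPAND(4, 0x08);
    EXPAND(5, 0x10); EXPAND(6, 0x20); EXPAND(7, 0x40); EXPAND(8, 0x80);
    EXPAND(9, 0x1b); EXPAND(10, 0x36);
    rng->counter = 0;
}

/* Encrypt a single 16-byte block with AES-128 (10 rounds). */
AES_TARGET static __m128i aes128_encrypt_block(const __m128i rk[11], __m128i block) {
    block = _mm_xor_si128(block, rk[0]);
    for (int i = 1; i < 10; i++) {
        block = _mm_aesenc_si128(block, rk[i]);
    }
    return _mm_aesenclast_si128(block, rk[10]);
}

/* Fill out[0..len) with keystream: the encryption of successive counter values. */
AES_TARGET static void ctr_rng_fill(ctr_rng *rng, uint8_t *out, size_t len) {
    while (len > 0) {
        __m128i ctr_block = _mm_set_epi64x(0, (long long)rng->counter++);
        __m128i ks = aes128_encrypt_block(rng->round_keys, ctr_block);
        uint8_t tmp[16];
        _mm_storeu_si128((__m128i *)tmp, ks);
        size_t n = len < 16 ? len : 16;
        memcpy(out, tmp, n);
        out += n;
        len -= n;
    }
}
```

A caller would seed `ctr_rng_init` with 16 bytes from the OS and then call `ctr_rng_fill` whenever random bytes are needed. Encrypting several counter blocks per call to keep the AES units pipelined is the obvious next optimization, which this sketch leaves out.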
Yeah, there are Turbo frequency offsets for AVX / AVX2 / AVX-512 (but not SSE). A wider vector implementation would probably only make sense if the cache of generated output was large, which isn't desirable due to memory usage, cache impact and latency. AES-NI is probably the best option since it'll beat SSE ChaCha8.