Intersection of cryptography, software engineering, statistics, and performance:
If I want to take a random sample from a set of data, and I wish to simply do "hash(dataitem) & bitmask < value", what properties does my hash function need for this to be statistically sound? Clearly, a cryptographic hash will suffice in terms of uniformity of sampling, but how about non-cryptographic hashes? CityHash64?
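For concreteness, here is a minimal sketch of the sampling predicate in question, using SHA-256 as the "clearly sufficient" cryptographic baseline. The names `MASK`, `THRESHOLD`, and `in_sample` are made up for illustration, and the 16-bit mask and roughly 10% rate are arbitrary assumptions, not anything from the thread.

```python
import hashlib

# Illustrative assumptions: keep the low 16 bits, sample roughly 10% of items.
MASK = (1 << 16) - 1
THRESHOLD = 6554  # 6554 / 65536 is about 0.10

def in_sample(item: bytes, mask: int = MASK, threshold: int = THRESHOLD) -> bool:
    """Return True if the item falls in the sample."""
    digest = hashlib.sha256(item).digest()
    h = int.from_bytes(digest[:8], "little")  # use 64 bits of the digest
    # Parentheses are explicit on purpose: in C-family languages `<` binds
    # tighter than `&`, so the unparenthesized form would mean something else.
    return (h & mask) < threshold

items = [f"item-{i}".encode() for i in range(100_000)]
rate = sum(in_sample(x) for x in items) / len(items)
print(f"observed sampling rate: {rate:.4f}")
```

The decision depends only on the item's bytes, so the same item is always in or out of the sample. Swapping in a non-cryptographic hash changes only the line that computes `h`, and whether the masked bits stay uniform then is exactly the open question.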
SipHash is a cryptographically secure PRF (keyed hash) and is close to being as fast as one of those typical hacked-together keyed hashes with unclear strength.
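A rough sketch of what sampling with a keyed hash looks like. Python's standard library does not expose SipHash as a callable API, so keyed BLAKE2b from `hashlib` stands in for it here; that is a swap, not SipHash itself. `keyed_hash64`, `in_keyed_sample`, and `KEY` are hypothetical names for illustration.

```python
import hashlib
import secrets

# Hypothetical secret sampling key, chosen once and kept private.
KEY = secrets.token_bytes(16)

def keyed_hash64(item: bytes, key: bytes = KEY) -> int:
    """64-bit keyed hash of item (keyed BLAKE2b standing in for SipHash)."""
    digest = hashlib.blake2b(item, key=key, digest_size=8).digest()
    return int.from_bytes(digest, "little")

def in_keyed_sample(item: bytes, rate: float, key: bytes = KEY) -> bool:
    """Return True for roughly a `rate` fraction of items under this key."""
    threshold = int(rate * (1 << 64))  # compare against the full 64-bit range
    return keyed_hash64(item, key) < threshold

items = [f"item-{i}".encode() for i in range(50_000)]
picked = [x for x in items if in_keyed_sample(x, 0.10)]
print(f"sampled {len(picked)} of {len(items)} items")
```

With a secret key, whoever supplies the data cannot predict or steer which items land in the sample, and a different key gives a different sample over the same data.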
If your keys (the items being hashed) are 16 bytes or less, you can use AES directly as a PRF, since each one fits in a single block. That doesn't work if they're larger than that, though.
If your keys are something like a 64-bit integer or a pair of 64-bit integers, you can use reduced-round AES via AES-NI as a PRF.
You could also do this for small strings, if those are most of your inputs, and fall back to SipHash or something else for anything beyond the hard 16-byte limit.

