Intersection of cryptography, software engineering, statistics, and performance:
If I want to draw a random sample from a set of data, and I wish to simply do "hash(dataitem) & bitmask < value", what properties does my hash function need for the sample to be statistically sound? Clearly, a cryptographic hash will suffice in terms of uniformity of sampling, but how about non-cryptographic hashes? CityHash64?
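For concreteness, the predicate in the question can be sketched as follows. This is only an illustration under stated assumptions: BLAKE2b from Python's standard library stands in for the hash (it is cryptographic, so it is the "clearly suffices" case, not CityHash64), and the names `hash64`, `in_sample`, `observed_rate`, and the 16-bit `MASK` are hypothetical choices, not anything from the thread. Note that in C-family languages `<` binds tighter than `&`, so the expression needs explicit parentheses there.

```python
import hashlib

# Assumption: keep the low 16 bits, so the masked value lies in [0, 65535].
MASK = (1 << 16) - 1


def hash64(item: bytes) -> int:
    """64-bit value from BLAKE2b, used here as a stand-in hash function."""
    digest = hashlib.blake2b(item, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def in_sample(item: bytes, threshold: int, mask: int = MASK) -> bool:
    """Deterministic sampling decision: (hash & mask) < threshold.

    If the masked bits are uniform, an item is kept with probability
    threshold / (mask + 1).
    """
    return (hash64(item) & mask) < threshold


def observed_rate(n: int, threshold: int) -> float:
    """Fraction of the keys b"0", b"1", ..., kept at the given threshold."""
    hits = sum(in_sample(str(i).encode(), threshold) for i in range(n))
    return hits / n
```

With `threshold = 6554` the intended rate is 6554 / 65536, roughly 10%. Because the decision depends only on the item, the same item is always in or out of the sample, which is the main attraction of this approach over drawing random numbers.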
Ah, uniformity is a harsher constraint than limiting P(collision), so not many hashes will be optimized for it. What kind of data (and how much information per hash) are we talking about?
Millions, up to billions of elements. Should have ~30-40 bits of entropy in the input data.
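One way to probe whether a candidate hash is uniform enough on low-entropy keys is an empirical chi-squared check on the masked bits. The sketch below is an assumption-laden illustration, not a recommendation from the thread: it uses sequential 8-byte integers as the structured input, BLAKE2b as the hash under test (swap in whatever function is actually being evaluated), and the hypothetical helper names `low_bits` and `chi_squared`.

```python
import hashlib


def low_bits(item: bytes, bits: int) -> int:
    """Low `bits` bits of a 64-bit BLAKE2b digest (stand-in for the hash under test)."""
    h = int.from_bytes(hashlib.blake2b(item, digest_size=8).digest(), "little")
    return h & ((1 << bits) - 1)


def chi_squared(n: int, bits: int = 8) -> float:
    """Chi-squared statistic of bucket counts for the keys 0..n-1.

    Sequential integers are deliberately structured input: a weak hash
    tends to show its bias on exactly this kind of data.
    """
    buckets = 1 << bits
    counts = [0] * buckets
    for i in range(n):
        counts[low_bits(i.to_bytes(8, "little"), bits)] += 1
    expected = n / buckets
    return sum((c - expected) ** 2 / expected for c in counts)
```

For a hash whose masked bits are uniform, the statistic should land near `buckets - 1` (255 for 8 bits); a value far above that suggests the sampling rate would be skewed for this kind of input.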
How large are the keys in terms of bytes on average?


