Some nice empirical measurement of HyperLogLog++ and min-hash signatures (or, really, k-min-values) here: http://tech.adroll.com/media/hllminhash.pdf …
Replying to @vitalygordon
@BigDataSc @avibryant More interesting: real world performance vs HLL and inclusion-exclusion for intersection?
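For readers unfamiliar with the comparison being asked about: inclusion-exclusion estimates an intersection as |A| + |B| - |A ∪ B| from three cardinality estimates, while a k-min-values sketch can also estimate it from the overlap of the smallest hash values. The snippet below is only a rough sketch of both routes. It uses KMV for every cardinality estimate rather than a real HLL, and all function names and the choice of SHA-1 are assumptions for illustration, not anything taken from the linked paper.

```python
import hashlib

def h(item):
    """Hash an item to a float in [0, 1) using the first 8 bytes of SHA-1."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def kmv(items, k):
    """KMV sketch: the k smallest distinct hash values, in ascending order."""
    return sorted({h(x) for x in items})[:k]

def cardinality(sketch, k):
    """Estimate the distinct count as (k - 1) / (k-th smallest hash value)."""
    if len(sketch) < k:
        return float(len(sketch))  # fewer than k distinct items: count is exact
    return (k - 1) / sketch[k - 1]

def union(a, b, k):
    """Sketch of the union: the k smallest values across both sketches."""
    return sorted(set(a) | set(b))[:k]

def intersection_inclusion_exclusion(a, b, k):
    """|A| + |B| - |A u B|, each term estimated separately."""
    return cardinality(a, k) + cardinality(b, k) - cardinality(union(a, b, k), k)

def intersection_jaccard(a, b, k):
    """Estimated Jaccard (shared values within the union sketch) times |A u B|."""
    u = union(a, b, k)
    shared = len(set(u) & set(a) & set(b))
    return shared / len(u) * cardinality(u, k)

if __name__ == "__main__":
    k = 1024
    a = kmv(range(0, 20000), k)
    b = kmv(range(10000, 30000), k)  # true intersection size is 10000
    print(intersection_inclusion_exclusion(a, b, k))
    print(intersection_jaccard(a, b, k))
```

With inclusion-exclusion the errors of three separate estimates add up, which tends to hurt most when the intersection is small relative to the sets; how the two approaches compare on real data is exactly the open question in the tweet.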
Replying to @fdaapproved
@fdaapproved @BigDataSc yeah, I'm skeptical about the use of both HLL and KMV; bet you're better to throw all the bits at one or the other.
@fdaapproved @BigDataSc the main reason to use minhash IMO is to then use LSH to discover high-similarity pairs without needing n^2 compares
8:59 AM - 26 Sep 2013
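To illustrate the last point about minhash and LSH: a common way to avoid comparing all n^2 pairs is banding, where each signature is cut into bands and only sets that agree on an entire band are compared. What follows is a minimal sketch of that idea, not a tuned implementation. The function names, the seeded SHA-1 hashing, and the 16 bands of 4 rows are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def minhash_signature(items, num_hashes):
    """One minimum per seeded hash; two sets agree on a slot with probability
    roughly equal to their Jaccard similarity."""
    items = list(items)
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.sha1(f"{seed}:{x}".encode("utf-8")).digest()[:8], "big")
            for x in items
        ))
    return signature

def lsh_candidate_pairs(signatures, bands, rows):
    """signatures: dict of name -> signature with bands * rows entries.
    Returns pairs of names that collide in at least one band."""
    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for name, sig in signatures.items():
            key = tuple(sig[band * rows:(band + 1) * rows])
            buckets[key].append(name)
        # Only sets sharing a bucket are ever paired up.
        for names in buckets.values():
            for pair in combinations(sorted(names), 2):
                candidates.add(pair)
    return candidates

if __name__ == "__main__":
    bands, rows = 16, 4
    sets = {
        "a": range(0, 1000),
        "b": range(0, 1000),      # identical to a
        "c": range(5000, 6000),   # disjoint from a
        "d": range(10, 1000),     # Jaccard with a is 0.99
    }
    sigs = {name: minhash_signature(s, bands * rows) for name, s in sets.items()}
    print(sorted(lsh_candidate_pairs(sigs, bands, rows)))
```

With b bands of r rows, two sets with Jaccard similarity s become candidates with probability 1 - (1 - s^r)^b, so the band and row counts set the similarity threshold above which pairs are likely to be found.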