@avibryant take a look at this: https://github.com/twitter/algebird/pull/327 … — an obvious attempt at HLL intersections (which worked more or less in the REPL).
@posco wouldn't two huge but totally disjoint sets still claim to have a huge intersection?
-
-
@avibryant how huge? With 10 bits you have 1024 buckets. Not clear that error is off. If you are huge enough to saturate those % might be ok -
@posco too late for me to do the math, but add a test with 1e8 items instead of 1e2 and I'll be happier.
End of conversation
New conversation -
-
-
@posco in general this feels like a drastic overestimate. You're really dealing with min(|A|,|B|), which can be >> |A ^ B|. -
@avibryant you aren't dealing with min(|A|, |B|). Byte wise min will be smaller than that. Try it with some small sets. -
@posco no, I understand that. Just saying that on a byte by byte level, that's what you're starting with. -
@posco obviously you get lots of info from the sparsity of the vectors, but they don't stay sparse forever. -
@avibryant I'll do some analysis tomorrow. Do some numerical tests.
End of conversation
New conversation -
Loading seems to be taking a while.
Twitter may be over capacity or experiencing a momentary hiccup. Try again or visit Twitter Status for more information.