Tinkering with how to publish domain blacklists without immediately revealing the domains. Compressed Bloom filters or Golomb-coded sets, built with hash functions whose cost increases exponentially, plus a final memory-hard hash?
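One way to read this idea, as a minimal sketch under stated assumptions: each domain goes through 2**k rounds of SHA-256 (k is tunable, so the cost grows exponentially with it), then through a memory-hard scrypt step, and the resulting values are stored in a Golomb-coded set. The parameter values, the salt, and the function names (`slow_hash`, `build_gcs`, `gcs_contains`) are illustrative assumptions, not a worked-out design.

```python
import hashlib

# Illustrative parameters (assumptions, not tuned values).
WORK_EXPONENT = 10                          # 2**10 SHA-256 rounds; raise it to make brute force costlier
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**12, 8, 1  # memory-hard final step (about 4 MiB per call)
P_BITS = 10                                 # Golomb-Rice parameter: false-positive rate about 2**-10


def slow_hash(domain: str, salt: bytes) -> int:
    """Iterated SHA-256 followed by scrypt; returns a 64-bit integer."""
    state = salt + domain.encode()
    for _ in range(2 ** WORK_EXPONENT):
        state = hashlib.sha256(state).digest()
    out = hashlib.scrypt(state, salt=salt, n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=8)
    return int.from_bytes(out, "big")


def _to_range(domain: str, salt: bytes, n: int) -> int:
    # Map into [0, n * 2**P_BITS) so the expected gap between sorted values is 2**P_BITS.
    return slow_hash(domain, salt) % (n << P_BITS)


def build_gcs(domains: list[str], salt: bytes) -> tuple[int, list[int]]:
    """Golomb-Rice code the sorted gaps; bits are kept in a list for readability."""
    if not domains:
        raise ValueError("need at least one domain")
    n = len(domains)
    values = sorted({_to_range(d, salt, n) for d in domains})
    bits: list[int] = []
    prev = 0
    for v in values:
        delta, prev = v - prev, v
        q, r = delta >> P_BITS, delta & ((1 << P_BITS) - 1)
        bits.extend([1] * q + [0])                                   # quotient in unary
        bits.extend((r >> i) & 1 for i in reversed(range(P_BITS)))  # remainder, MSB first
    return n, bits


def gcs_contains(gcs: tuple[int, list[int]], domain: str, salt: bytes) -> bool:
    """Decode gaps in order until the running sum reaches or passes the target."""
    n, bits = gcs
    target = _to_range(domain, salt, n)
    pos = acc = 0
    while pos < len(bits):
        q = 0
        while bits[pos] == 1:
            q += 1
            pos += 1
        pos += 1                          # skip the unary terminator
        r = 0
        for _ in range(P_BITS):
            r = (r << 1) | bits[pos]
            pos += 1
        acc += (q << P_BITS) | r
        if acc >= target:                 # values are sorted, so we can stop here
            return acc == target
    return False
```

A published filter would pack `bits` into bytes and ship `n`, `P_BITS`, the work parameters, and the salt alongside it; changing the salt for each list version keeps precomputed tables from carrying over.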
A simpler approach would be to publish, for each domain, a set of truncated hashes with exponentially increasing work factors. Easier to build and update, but obviously less memory-efficient than probabilistic structures.
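A minimal sketch of one interpretation of this scheme (the level layout is an assumption): every domain gets one short truncated hash per level, each level continues the same SHA-256 chain with 4x more cumulative rounds, and a client only computes the costlier levels while the cheaper ones still match. `BASE_ROUNDS`, `LEVELS`, `TRUNC_BYTES` and the function names are hypothetical.

```python
import hashlib

# Illustrative parameters (assumptions): level i costs BASE_ROUNDS * 4**i SHA-256 rounds
# in total, and each level publishes only TRUNC_BYTES of its output.
BASE_ROUNDS = 256
LEVELS = 3
TRUNC_BYTES = 2


def level_hashes(domain: str, salt: bytes):
    """Yield one truncated hash per level, continuing the same hash chain.

    Being a generator, it only pays for the expensive levels if the caller asks for them.
    """
    state = salt + domain.encode()
    done = 0
    for i in range(LEVELS):
        target = BASE_ROUNDS * 4 ** i
        for _ in range(target - done):
            state = hashlib.sha256(state).digest()
        done = target
        yield state[:TRUNC_BYTES]


def publish(domains: list[str], salt: bytes) -> set[tuple[bytes, ...]]:
    # Adding a domain later is just adding one more tuple.
    return {tuple(level_hashes(d, salt)) for d in domains}


def is_listed(published: set[tuple[bytes, ...]], domain: str, salt: bytes) -> bool:
    candidates = published
    for i, h in enumerate(level_hashes(domain, salt)):
        candidates = {entry for entry in candidates if entry[i] == h}
        if not candidates:
            return False  # rejected early, before the costlier levels are computed
    return True
```

With these numbers each domain costs 6 bytes, and most unlisted domains exit after the first 256 rounds. An attacker enumerating candidate names can use level 0 as a cheap pre-filter too, so the cheap levels have to be truncated short enough that many unrelated names still pass them.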
Context: I want to make my blacklists for parental protection available. Cloud-based filters are not acceptable from a privacy standpoint. Yet, I don’t want to publish a huge list of offensive domains.
Maybe the tricky part with Bloom filters is managing the false-positive rate. Canonicalization of the domains can also be a source of issues. Not sure what the best data structure is...
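Two small sketches for the issues raised here, under stated assumptions. The textbook sizing formulas show how the false-positive rate trades against filter size (about 9.6 bits per domain for a 1% rate). The canonicalization step (lowercase, drop the trailing dot, convert to IDNA/punycode) plus a parent-domain lookup is one possible policy, not the only one. The helper names are hypothetical, and Python's built-in `idna` codec follows the older IDNA 2003 rules, so a production filter might prefer the third-party `idna` package.

```python
import math


def bloom_parameters(n: int, p: float) -> tuple[int, int]:
    """Bit count m and hash count k for n items at a target false-positive rate p.

    Standard approximations: m = -n ln p / (ln 2)^2, k = (m / n) ln 2.
    """
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Expected false-positive rate (1 - e^(-kn/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k


def canonicalize(domain: str) -> str:
    """Lowercase, drop the trailing root dot, and convert IDNs to their ASCII (punycode) form."""
    d = domain.strip().rstrip(".").lower()
    return d.encode("idna").decode("ascii")


def candidate_names(domain: str) -> list[str]:
    """The name plus each parent domain, so a listed evil.test also catches a.b.evil.test.

    Stops before the last label; multi-label public suffixes (e.g. co.uk) are not handled.
    """
    labels = canonicalize(domain).split(".")
    return [".".join(labels[i:]) for i in range(max(1, len(labels) - 1))]
```

For example, `bloom_parameters(1_000_000, 0.01)` gives k = 7 and roughly 9.6 million bits (about 1.2 MB), and every name from `candidate_names` would be tested against the filter, which also multiplies the per-lookup false-positive chance.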
Can't you check the full URL (or its hash) in case of a hit, to avoid (or at least reduce) FPs?
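A minimal sketch of this reply's idea, assuming the confirmation uses a full-length digest: the compact first stage (short hash prefixes) is allowed to produce false positives, and only on a hit is the full SHA-256 digest compared. The salt, `PREFIX_BYTES`, and the class name are illustrative; the second stage could equally be keyed on the full URL rather than the domain, and the full digests could live in a separate file that is only loaded when a prefix matches.

```python
import hashlib

PREFIX_BYTES = 4  # assumption: short prefixes make a compact first stage that admits false positives


def name_hash(name: str, salt: bytes) -> bytes:
    return hashlib.sha256(salt + name.encode()).digest()


class TwoStageList:
    """Stage 1: set of short hash prefixes (compact, may give false positives).
    Stage 2: full 32-byte digests, consulted only when stage 1 hits.
    """

    def __init__(self, names: list[str], salt: bytes):
        self.salt = salt
        self.full = {name_hash(n, salt) for n in names}
        self.prefixes = {h[:PREFIX_BYTES] for h in self.full}
        self.confirmations = 0  # how often the second stage was needed

    def contains(self, name: str) -> bool:
        h = name_hash(name, self.salt)
        if h[:PREFIX_BYTES] not in self.prefixes:
            return False
        self.confirmations += 1
        return h in self.full  # exact check removes stage-1 false positives
```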
If you solve that, you might solve another major problem related to database leaks: cases where a database only needs to verify personal information, not store it (so it can't be stolen).
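The usual building block for "verify but don't store" is a salted, deliberately slow hash, as used for passwords. Below is a minimal sketch with Python's `hashlib.scrypt`; the cost parameters and function names are illustrative. It shares the blacklist's weakness: for low-entropy or enumerable data (birth dates, phone numbers, domain names), anyone holding the stored values can still brute-force them, and the slow hash only raises the cost per guess.

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost parameters (about 16 MiB of memory per call).
N, R, P = 2**14, 8, 1


def _kdf(secret: str, salt: bytes) -> bytes:
    return hashlib.scrypt(secret.encode(), salt=salt, n=N, r=R, p=P, dklen=32)


def enroll(secret: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the secret itself."""
    salt = os.urandom(16)
    return salt, _kdf(secret, salt)


def verify(secret: str, salt: bytes, digest: bytes) -> bool:
    # Constant-time comparison avoids leaking how many leading bytes matched.
    return hmac.compare_digest(_kdf(secret, salt), digest)
```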