Programmers / software engineers -- does this exist: I'm looking for a very efficient high-speed ID lookup system where I pass an integer to it (8 bytes, but configurable would be awesome) and it will index that ID into file segments (configurable size) and periodically sort the file segments so that ID lookup is fast. In front of this might be a Bloom filter to speed it up even more so that it doesn't always have to hit the disks to check if the ID exists.
I would need something that scales up to ~10 trillion IDs.
Essentially, this would be a very low-level piece of software with an API that lets you insert an ID and/or check if an ID already exists. The files that hold the existing IDs should have little overhead, if any.
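To make the request concrete, here is a rough sketch of the shape being described: new IDs are buffered, flushed to sorted fixed-width segment files, looked up by binary search, and fronted by a Bloom filter. Everything named here (`IdStore`, `BloomFilter`, `segment_capacity`, the file naming, the filter sizes) is made up for illustration, not an existing product. Merging or re-sorting segments in the background is left out.

```python
import hashlib
import os
import struct
import tempfile

ID_FMT = ">Q"                      # 8-byte big-endian unsigned ID (assumed width)
ID_SIZE = struct.calcsize(ID_FMT)


class BloomFilter:
    """Tiny Bloom filter: never a false negative, sometimes a false positive."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        # num_hashes must be <= 8 because blake2b digests are at most 64 bytes.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, id_):
        digest = hashlib.blake2b(
            struct.pack(ID_FMT, id_), digest_size=8 * self.num_hashes
        ).digest()
        for i in range(self.num_hashes):
            yield int.from_bytes(digest[8 * i: 8 * i + 8], "big") % self.num_bits

    def add(self, id_):
        for pos in self._positions(id_):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, id_):
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(id_))


class IdStore:
    """Hypothetical insert/exists store backed by sorted segment files."""

    def __init__(self, directory, segment_capacity=4):
        self.directory = directory
        self.segment_capacity = segment_capacity  # IDs per segment file
        self.pending = set()                      # IDs not yet written to disk
        self.segments = []                        # paths of sorted segment files
        self.bloom = BloomFilter()

    def insert(self, id_):
        """Return True if the ID was new, False if it was already stored."""
        if self.contains(id_):
            return False
        self.pending.add(id_)
        self.bloom.add(id_)
        if len(self.pending) >= self.segment_capacity:
            self._flush()
        return True

    def contains(self, id_):
        if not self.bloom.might_contain(id_):
            return False                          # definite miss, no disk access
        if id_ in self.pending:
            return True
        return any(self._search(path, id_) for path in self.segments)

    def _flush(self):
        # Write the buffered IDs as one sorted, header-free segment file.
        path = os.path.join(self.directory, f"segment-{len(self.segments):06d}.ids")
        with open(path, "wb") as f:
            for id_ in sorted(self.pending):
                f.write(struct.pack(ID_FMT, id_))
        self.segments.append(path)
        self.pending.clear()

    def _search(self, path, id_):
        # Binary search over fixed-width records directly in the file.
        with open(path, "rb") as f:
            lo, hi = 0, os.path.getsize(path) // ID_SIZE
            while lo < hi:
                mid = (lo + hi) // 2
                f.seek(mid * ID_SIZE)
                (value,) = struct.unpack(ID_FMT, f.read(ID_SIZE))
                if value == id_:
                    return True
                if value < id_:
                    lo = mid + 1
                else:
                    hi = mid
        return False


with tempfile.TemporaryDirectory() as tmp:
    store = IdStore(tmp)
    for id_ in (42, 7, 10**13, 99, 5):
        store.insert(id_)
    print(store.contains(10**13), store.contains(12345))
```

The segment files here are nothing but packed IDs, so the on-disk overhead is zero bytes per ID. The cost is that a lookup may touch every segment, which is why periodic merging would matter at scale.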
Replying to
Redis Sets seem to be close to what you need. A trillion u64 IDs take about 8 TB for the raw values alone, so to make it work with Redis you would need to shard your IDs across different nodes, but other than that you should be fine. Also, the overhead should be minimal.
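For a feel of the numbers, a back-of-the-envelope sketch follows. The per-node RAM figure and the modulo sharding rule are assumptions picked for illustration, and the byte counts cover only the raw 8-byte values, so they are a lower bound that ignores whatever per-element overhead the store adds.

```python
ID_BYTES = 8
NODE_RAM_BYTES = 256 * 10**9   # hypothetical usable RAM per node


def raw_bytes(num_ids, id_bytes=ID_BYTES):
    """Bytes needed for the ID values alone, with no container overhead."""
    return num_ids * id_bytes


def nodes_needed(num_ids, node_ram=NODE_RAM_BYTES, id_bytes=ID_BYTES):
    """Minimum node count to hold the raw values (ceiling division)."""
    return -(-raw_bytes(num_ids, id_bytes) // node_ram)


def shard_for(id_, num_shards):
    """Pick a node by simple modulo; one of several possible sharding rules."""
    return id_ % num_shards


print(raw_bytes(10**12) / 10**12)   # raw terabytes for 1 trillion IDs
print(raw_bytes(10**13) / 10**12)   # raw terabytes for 10 trillion IDs
print(nodes_needed(10**13))         # nodes at the assumed RAM size
```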
Replying to
For an 8-byte ID, and without knowing whether you are looking for a library or a product, or what you mean by file segments, you can use github.com/RoaringBitmap/. Just partition your 10 trillion ID space so that each shard, when serialize()ed, fits in your disk partitions.
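One way to picture that partitioning is to split each ID into a shard key (the high bits) and an offset within the shard (the low bits). The sketch below does only that split. A plain Python `set` stands in for the per-shard bitmap, so no Roaring API is used or implied, and `SHARD_BITS` and `PartitionedIdSet` are hypothetical names.

```python
SHARD_BITS = 16   # hypothetical: top 16 bits of a 64-bit ID select the shard
ID_BITS = 64
LOW_BITS = ID_BITS - SHARD_BITS


def split_id(id_):
    """Return (shard, offset) for a 64-bit ID."""
    return id_ >> LOW_BITS, id_ & ((1 << LOW_BITS) - 1)


def join_id(shard, offset):
    """Inverse of split_id."""
    return (shard << LOW_BITS) | offset


class PartitionedIdSet:
    """ID set split into shards; each shard could be stored separately."""

    def __init__(self):
        self.shards = {}   # shard key -> set of offsets (bitmap stand-in)

    def add(self, id_):
        shard, offset = split_id(id_)
        self.shards.setdefault(shard, set()).add(offset)

    def __contains__(self, id_):
        shard, offset = split_id(id_)
        return offset in self.shards.get(shard, ())


ids = PartitionedIdSet()
ids.add(0x0001_0000_0000_0002)
ids.add(5)
print(split_id(0x0001_0000_0000_0002), sorted(ids.shards))
```

Note that splitting on high bits only balances the shards if the IDs are spread across the whole 64-bit range. Sequential IDs would all land in the first few shards, so a hash or modulo split may suit that case better.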


