The new "Rinzler" big data archiving and index system (written purely in Go) was just tested on a 64 core machine and can encode and index a whopping 500,000 records per second. This take JSON data, indexes key fields, compresses with zstd and applies ...
#bigdata #datascience
Conversation
Reed Solomon redundant data packets for each row. Even with redundancy, each row of data is compressed to around 25-30% of it's original size. The goal is to reach one million encodings per second.
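The thread doesn't share Rinzler's code, so here is a minimal sketch of that per-row pipeline using two well-known Go libraries (github.com/klauspost/compress/zstd and github.com/klauspost/reedsolomon). The record shape, the choice of "key fields", and the 4+2 shard split are assumptions for illustration, not Rinzler's actual design.

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/klauspost/compress/zstd"
	"github.com/klauspost/reedsolomon"
)

// encodeRow sketches the pipeline described above: parse JSON, pull out
// key fields for the index, compress the raw row with zstd, then split
// it into Reed-Solomon data shards and compute parity shards.
func encodeRow(enc *zstd.Encoder, rs reedsolomon.Encoder, row []byte) (index map[string]any, shards [][]byte, err error) {
	// 1. Index key fields (which fields count as "key" is an assumption).
	var fields map[string]any
	if err := json.Unmarshal(row, &fields); err != nil {
		return nil, nil, err
	}
	index = map[string]any{"id": fields["id"], "author": fields["author"]}

	// 2. Compress the original row with zstd.
	compressed := enc.EncodeAll(row, nil)

	// 3. Split into data shards and fill in the parity shards.
	shards, err = rs.Split(compressed)
	if err != nil {
		return nil, nil, err
	}
	if err = rs.Encode(shards); err != nil {
		return nil, nil, err
	}
	return index, shards, nil
}

func main() {
	enc, _ := zstd.NewWriter(nil)  // reusable one-shot zstd encoder
	rs, _ := reedsolomon.New(4, 2) // 4 data + 2 parity shards (assumed)

	idx, shards, err := encodeRow(enc, rs,
		[]byte(`{"id":"abc123","author":"someone","body":"hello"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(idx, len(shards)) // map[author:someone id:abc123] 6
}
```

With a 4+2 split, any two of the six shards per row can be lost and the compressed row is still recoverable, which is presumably the point of adding parity on top of compression.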
If it can reach 1.5 million encodings per second, it could theoretically catalog and index every Reddit comment (over 5.5 billion) in one day.
Wow, I can't even do basic math -- 1.5 million a second would index Reddit in about an hour, not a day. Whoops.
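A quick sanity check on that correction, using the comment count and rate from the thread:

```go
package main

import "fmt"

func main() {
	const comments = 5.5e9 // Reddit comments, per the thread
	const rate = 1.5e6     // target encodings per second
	seconds := comments / rate
	fmt.Printf("%.0f s ≈ %.1f h\n", seconds, seconds/3600) // 3667 s ≈ 1.0 h
}
```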
