Back on my "essays as commit messages" shithttps://github.com/rust-lang/crates.io/pull/2203 …
Replying to @sgrif
For http://lib.rs I had to give up on using a database for downloads. Now I use an in-memory equivalent of HashMap<version_id, Gzip<[u32; 366]>> for processing the download counts.
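A minimal sketch of what that shape could look like, assuming the flate2 crate for the gzip part; only the HashMap<version_id, Gzip<[u32; 366]>> shape comes from the tweet, every other name here is illustrative:

```rust
// Sketch only: one year of daily download counts per version,
// kept gzip-compressed in memory. Assumes the `flate2` crate.
use std::collections::HashMap;
use std::io::{Read, Write};

use flate2::read::GzDecoder;
use flate2::write::GzEncoder;
use flate2::Compression;

type VersionId = u32;

/// Compress 366 daily counters (leap-year sized) into a gzip blob.
fn pack(days: &[u32; 366]) -> std::io::Result<Vec<u8>> {
    let mut enc = GzEncoder::new(Vec::new(), Compression::default());
    for d in days {
        enc.write_all(&d.to_le_bytes())?;
    }
    enc.finish()
}

/// Decompress a blob back into the 366 daily counters.
fn unpack(blob: &[u8]) -> std::io::Result<[u32; 366]> {
    let mut bytes = Vec::with_capacity(366 * 4);
    GzDecoder::new(blob).read_to_end(&mut bytes)?;
    let mut days = [0u32; 366];
    for (i, chunk) in bytes.chunks_exact(4).enumerate().take(366) {
        days[i] = u32::from_le_bytes(chunk.try_into().unwrap());
    }
    Ok(days)
}

fn main() -> std::io::Result<()> {
    // The HashMap<version_id, Gzip<[u32; 366]>> from the tweet, spelled out.
    let mut downloads: HashMap<VersionId, Vec<u8>> = HashMap::new();

    let mut year = [0u32; 366];
    year[0] = 123; // e.g. 123 downloads on day 1 of the year
    downloads.insert(1, pack(&year)?);

    assert_eq!(unpack(&downloads[&1])?[0], 123);
    Ok(())
}
```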
Replying to @kornelski
We're scaled across multiple servers, so in-memory isn't an option
Replying to @sgrif @kornelski
What do download counts even mean for http://lib.rs? I didn't think you served crates directly?
Replying to @sgrif @kornelski
Based on the 366, I'm guessing it's an in-memory cache of one year of (upstream) daily download counts. As you said, since we have multiple servers updating the counts, we can't cache much in-process (plus Heroku can terminate us before we can write data out).
That's tough. But maybe you could use a more compact format in the db? Daily (version_id, date) adds ~200K identical dates every day. It could be (version_id, week_number) with a downloads[7] column, for example.
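Roughly what those two layouts look like side by side, as a hypothetical sketch; the field names are illustrative, not crates.io's actual schema, and chrono::NaiveDate stands in for the date column:

```rust
// Hypothetical sketch of the two row layouts discussed above.
use chrono::NaiveDate; // assumed stand-in for the SQL `date` column

/// Current layout: one row per version per day.
#[allow(dead_code)]
struct DailyDownloads {
    version_id: i32,
    date: NaiveDate,
    downloads: i32,
}

/// Suggested layout: one row per version per week, with the seven
/// daily counters packed into a single array column.
#[allow(dead_code)]
struct WeeklyDownloads {
    version_id: i32,
    week_number: i32,    // e.g. ISO week of the year
    downloads: [i32; 7], // one counter per weekday
}

fn main() {
    // ~200K daily rows collapse into weekly rows covering 7 days each.
    let _row = WeeklyDownloads {
        version_id: 1,
        week_number: 7,
        downloads: [0; 7],
    };
}
```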
Replying to @kornelski @jtgeibel
If we had a problem we were trying to solve, sure. But all our time-sensitive queries are on version id and date; the current structure is perfect for that.
I meant that as a way to reduce data size, given that the sheer volume of the data is causing you pain. The normalized table has 4-byte counters with ~50 bytes of overhead each. Merging them into one row per version per week (or longer) should considerably reduce table size.
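As a back-of-envelope check on that claim, using the ~200K rows/day and ~50 bytes of per-row overhead figures from this thread (both approximate, and assuming roughly the same set of versions is downloaded each day):

```rust
// Rough size comparison of the two layouts under the assumptions above.
// Ignores the array column's own header bytes and alignment/TOAST details.
fn main() {
    let rows_per_day: u64 = 200_000; // distinct versions downloaded per day (approx.)
    let row_overhead: u64 = 50;      // bytes of per-row overhead (approx.)
    let counter: u64 = 4;            // one 4-byte downloads counter

    // One row per (version, day), 365 days a year:
    let daily = rows_per_day * 365 * (row_overhead + counter);

    // One row per (version, week), 52 weeks a year, 7 counters per row:
    let weekly = rows_per_day * 52 * (row_overhead + 7 * counter);

    println!("daily layout : ~{} MB/year", daily / 1_000_000);  // ~3942 MB
    println!("weekly layout: ~{} MB/year", weekly / 1_000_000); // ~811 MB
}
```

Under those (rough) assumptions the weekly layout comes out several times smaller, which is the reduction being argued for.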
Replying to @kornelski @jtgeibel
The volume of data isn't the problem so much as how it's being accessed/laid out. Compressing rows would only delay the problem, and would likely slow down the access patterns that actually need to be fast. The only painful place right now is background jobs.
The tuple overhead here wouldn't really be reduced. Every write produces a new tuple no matter what, and the number of writes is the same. Deduplicating the dates doesn't really save anything either, since the overhead of an array element is the same as a date.