Status: fitting monotonic splines to t-digest CDF sketches.
Plots by EvilPlot
#DataScience pic.twitter.com/KYM8KDB8oT
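For readers who want to see what "fitting a monotonic spline to a CDF sketch" can look like, here is a minimal sketch. It is not the prototype code from the thread: it swaps in a monotone cubic Hermite interpolant with harmonic-mean tangents (in the style of Fritsch and Butland) as one possible choice of monotonic spline. The function name `monotone_cubic` and the knot values in the usage note are hypothetical, chosen only for illustration.

```python
import bisect


def monotone_cubic(xs, ys):
    """Build (cdf, pdf) callables from CDF knots.

    xs: strictly increasing locations (e.g. cluster locations from a sketch).
    ys: nondecreasing cumulative mass fractions at those locations.
    Assumption: harmonic-mean tangents, one of several monotone schemes.
    """
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    delta = [(ys[i + 1] - ys[i]) / h[i] for i in range(n - 1)]

    # Knot tangents: secant slope at the ends, harmonic mean of the
    # neighboring secant slopes inside (zero if either slope is zero).
    d = [0.0] * n
    d[0], d[-1] = delta[0], delta[-1]
    for i in range(1, n - 1):
        a, b = delta[i - 1], delta[i]
        d[i] = 0.0 if a * b <= 0 else 2 * a * b / (a + b)

    def locate(x):
        # Index of the interval [xs[i], xs[i+1]] containing x.
        i = bisect.bisect_right(xs, x) - 1
        return min(max(i, 0), n - 2)

    def cdf(x):
        x = min(max(x, xs[0]), xs[-1])  # clamp outside the knot range
        i = locate(x)
        t = (x - xs[i]) / h[i]
        h00 = (1 + 2 * t) * (1 - t) ** 2
        h10 = t * (1 - t) ** 2
        h01 = t * t * (3 - 2 * t)
        h11 = t * t * (t - 1)
        return (h00 * ys[i] + h10 * h[i] * d[i]
                + h01 * ys[i + 1] + h11 * h[i] * d[i + 1])

    def pdf(x):
        # Derivative of the interpolant; zero outside the knot range.
        if x < xs[0] or x > xs[-1]:
            return 0.0
        i = locate(x)
        t = (x - xs[i]) / h[i]
        dh00 = 6 * t * t - 6 * t
        dh10 = 3 * t * t - 4 * t + 1
        dh01 = -6 * t * t + 6 * t
        dh11 = 3 * t * t - 2 * t
        return ((dh00 * ys[i] + dh01 * ys[i + 1]) / h[i]
                + dh10 * d[i] + dh11 * d[i + 1])

    return cdf, pdf
```

Calling `cdf, pdf = monotone_cubic([0.0, 1.0, 2.0, 4.0], [0.0, 0.1, 0.6, 1.0])` with these made-up knots gives a CDF that passes through every knot and never decreases, so its derivative `pdf` stays non-negative. That property is the reason to prefer a monotone scheme over an ordinary cubic spline here.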
On the PDF plot, what are the green bins and why is the red line so much noisier than they are?
Green bins are a histogram of the data. The red line is the density as estimated from the t-digest sketch, where the "bins" are clusters with variable location, effective width, and mass. The noise comes from these variations. (I'm gradually working up to a blog post on these ideas.)
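To make the "variable location, effective width and mass" point concrete, here is a rough sketch of a per-cluster density estimate. It assumes the clusters are available as `(mean, mass)` pairs sorted by mean, and it takes each cluster's effective width to be half the distance to each neighbor. Both the function name `cluster_density` and that width rule are assumptions for illustration, not the t-digest library's API or the method behind the plot.

```python
def cluster_density(centroids):
    """Estimate density at each cluster as mass / (total mass * width).

    centroids: list of (mean, mass) pairs sorted by mean, at least two.
    Assumption: effective width = half the gap to each neighbor; edge
    clusters reuse their single gap on both sides.
    """
    total = sum(mass for _, mass in centroids)
    means = [mean for mean, _ in centroids]
    last = len(means) - 1
    out = []
    for i, (mean, mass) in enumerate(centroids):
        left = means[i] - means[i - 1] if i > 0 else means[1] - means[0]
        right = means[i + 1] - means[i] if i < last else means[last] - means[last - 1]
        width = 0.5 * (left + right)
        out.append((mean, mass / (total * width)))
    return out
```

With evenly spaced, equal-mass clusters such as `[(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]` every density comes out the same. In a real sketch both the spacing and the mass change from cluster to cluster, and each ratio jitters independently, which is one way to see where the noise in the red line comes from.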
TL;DR: if your data is too large or streaming, histogramming isn't an option, and instead you're working with some kind of sketch, such as a t-digest.
Yep, totally. I'm interested in this as an empirical way to get an analytical, continuous PDF prior from big data that I can then use for parameters in Bayesian inference on small data. Seems perfect for that.
Definitely! I'll post my prototype notebook code in a day or two. A real blog post may have to wait.