Hi Jason! Do you have a citation you can point me to that shows how you do this? We have a purchased set from the same time. Would be happy to compare...
Ya, that’s awesome. So you identify a subset of likely IDs (by reverse engineering their ID production method) and then brute-force ID calls and filter against that… is that a correct (super simple) interpretation?
That's correct! I did an analysis of how tweets are distributed under their Snowflake implementation and found that the full space of possible IDs is actually much smaller than the raw ID values would lead you to believe. But yes, your summary is pretty much exactly what the method does.
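(Not Jason's actual code, just a minimal sketch of the idea, based on Twitter's publicly documented Snowflake layout: a 41-bit millisecond timestamp since the epoch 1288834974657, a 5-bit datacenter ID, a 5-bit worker ID, and a 12-bit per-millisecond sequence. The `machines` set and `max_seq` cap below are illustrative stand-ins for whatever distribution the analysis actually found:)

```python
# Twitter's Snowflake epoch in milliseconds: 2010-11-04T01:42:54.657Z
TWEPOCH = 1288834974657

def unpack_snowflake(tweet_id):
    """Split a tweet ID into its Snowflake components."""
    ms = (tweet_id >> 22) + TWEPOCH       # 41-bit millisecond timestamp
    datacenter = (tweet_id >> 17) & 0x1F  # 5-bit datacenter ID
    worker = (tweet_id >> 12) & 0x1F      # 5-bit worker ID
    sequence = tweet_id & 0xFFF           # 12-bit per-millisecond sequence
    return ms, datacenter, worker, sequence

def pack_snowflake(ms, datacenter, worker, sequence):
    """Rebuild a candidate tweet ID from Snowflake components."""
    return ((ms - TWEPOCH) << 22) | (datacenter << 17) | (worker << 12) | sequence

def candidate_ids(start_ms, end_ms, machines, max_seq=2):
    """Enumerate plausible IDs for a time window.

    `machines` is the set of (datacenter, worker) pairs actually observed
    in real tweets; `max_seq` caps the per-millisecond sequence. Both are
    assumptions here -- in practice you'd derive them from a sample.
    """
    for ms in range(start_ms, end_ms):
        for dc, w in machines:
            for seq in range(max_seq):
                yield pack_snowflake(ms, dc, w, seq)

# Example: decompose a real-looking tweet ID into its fields
print(unpack_snowflake(1212799679346151424))
```

Because only a handful of (datacenter, worker) pairs ever mint tweets and the sequence counter rarely climbs, the candidate space per millisecond is tiny compared to the 2^22 low-order bits the raw IDs suggest.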
That’s really interesting. We often “catch” events a few minutes or hours after they start. It seems like this method could help fill the gap at the start. Is there a way to estimate data decay (via deletions/suspensions)?
I think it should be fairly straightforward to study the data decay as a direct result of failed calls made by tweet ID: pick a fixed span of time, say a month from about a year ago, engineer the IDs, and query away
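(A hedged sketch of that study design, reusing `candidate_ids` from the sketch above; `hydrate` is a hypothetical helper standing in for batched lookup calls that return whichever of the requested IDs still resolve to live tweets:)

```python
import random

TWEPOCH = 1288834974657
DAY_MS = 24 * 60 * 60 * 1000

def estimate_decay(day_start_ms, machines, hydrate, n_samples=10_000):
    """Estimate the fraction of engineered IDs that fail to resolve.

    Caveat (see the next question): a failed lookup conflates tweets
    that were deleted or suspended with IDs that were never assigned.
    """
    # A full day is ~86.4M millisecond slots, so sample rather than enumerate.
    sampled_ms = random.sample(range(day_start_ms, day_start_ms + DAY_MS), n_samples)
    ids = [i for ms in sampled_ms for i in candidate_ids(ms, ms + 1, machines)]
    live = hydrate(ids)
    return 1 - len(live) / len(ids)

# e.g. for a day roughly a year back:
# from datetime import datetime, timezone
# start = int(datetime(2019, 1, 15, tzinfo=timezone.utc).timestamp() * 1000)
# estimate_decay(start, machines={(10, 3)}, hydrate=my_hydrate)
```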
How do you know, when you query an ID, whether it was a deleted tweet or one that was never assigned at all? Is there a distinction? I haven't noticed one in the past, but if there is, that would be very helpful.
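(One way to probe this empirically, assuming a v1.1 bearer token; the error codes in the comments are documented ones, but whether they actually separate these two cases is precisely what would need checking:)

```python
import requests

BEARER = "YOUR_BEARER_TOKEN"  # placeholder

def probe(tweet_id):
    """Fetch one tweet via the v1.1 show endpoint; on failure, return the
    error payload so the codes can be compared across cases."""
    r = requests.get(
        "https://api.twitter.com/1.1/statuses/show.json",
        params={"id": tweet_id},
        headers={"Authorization": f"Bearer {BEARER}"},
    )
    if r.ok:
        return "live"
    # v1.1 errors are code/message pairs, e.g. 144 "No status found with
    # that ID." or 63 "User has been suspended." Whether a deleted tweet
    # and a never-assigned ID produce *different* codes is exactly what
    # this comparison would reveal.
    return r.json().get("errors")

# Compare a tweet you know was deleted against an engineered ID from a
# sequence slot you believe was never used:
# print(probe(known_deleted_id), probe(engineered_unused_id))
```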