I am collecting tweet data for the 7.1 earthquake now. The volume of tweets is amazing. Since this quake reached into Vegas and was stronger in Los Angeles, I expect a peak at least 2x greater than the 6.4 quake's. Friday night seems like prime time for people to report via Twitter.
Waiting for all this data to ingest at 6am on a Saturday morning.
Replying to
Have you written about or shared anywhere your setup and process for ingesting the data?
Replying to
There are some things scattered about on Pushshift.io, but if there is interest, I'll take some time out and write up the process from start to finish.
Replying to
I guess this tweet made me wonder if you do anything special for an unusual event vs. the normal ingest process... and, if so, what a one-off process looks like.
Replying to
I mainly use the same scripts, but I go in and change around some variables and do some pre-analysis. For example, I'll do a search for "earthquake" and grab 10,000+ tweets. Then I'll rank the hashtags and significant words associated with "earthquake", and go back and broaden the search by including the additional terms, to get as much data related to a specific event as possible.
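In code, that "search, rank, broaden" pre-analysis might look something like the sketch below: count hashtags and significant words in the initial sample, then build a wider OR query from the top terms. This is a minimal illustration under assumptions, not the actual Pushshift scripts; the "text" field assumes standard tweet JSON, and rank_terms/broaden_query are hypothetical helpers.

import re
from collections import Counter

# Assumed throwaway stopword list; the real scripts likely use something richer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "this", "that", "was", "just", "all", "rt"}

def rank_terms(tweets, top_n=20):
    """Rank hashtags and significant words in the initial tweet sample."""
    hashtags, words = Counter(), Counter()
    for tweet in tweets:
        text = tweet.get("text", "").lower()
        hashtags.update(re.findall(r"#(\w+)", text))
        words.update(w for w in re.findall(r"[a-z']{3,}", text)
                     if w not in STOPWORDS)
    return hashtags.most_common(top_n), words.most_common(top_n)

def broaden_query(seed, top_hashtags, top_words, extra=5):
    """Build a broader OR query from the seed plus top co-occurring terms."""
    terms = [seed]
    terms += ["#" + tag for tag, _ in top_hashtags[:extra]]
    terms += [word for word, _ in top_words[:extra] if word != seed]
    return " OR ".join(dict.fromkeys(terms))  # dedupe while keeping order

# Toy demo with two fake tweets standing in for the initial 10,000+ pull:
sample = [{"text": "Huge #earthquake in #Ridgecrest, felt it in Vegas!"},
          {"text": "That earthquake shook all of Los Angeles #quake"}]
tags, words = rank_terms(sample)
print(broaden_query("earthquake", tags, words))

Running the demo prints an expanded query like "earthquake OR #earthquake OR #ridgecrest OR #quake OR ...", which can then be fed back into the same search scripts.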
Replying to
Ah interesting! You mean you don’t have a 1:1 dump of all tweets “just in case”? ;-)
Replying to
I throw all raw data into dump files or sqlite3 DBs. I've burned myself enough times in the past, back when I was beginning, to learn that you should always keep raw data: you never know how you will fuck up your pipeline and then not have the original data to correct from. :)
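A minimal sketch of that habit, assuming tweets arrive as JSON dicts: append every raw payload to a dump file and mirror it into a sqlite3 table before any pipeline touches it. The filenames, table name, and schema here are illustrative assumptions.

import json
import sqlite3

def archive_raw(tweets, dump_path="tweets_raw.jsonl", db_path="tweets_raw.db"):
    """Persist untouched tweet payloads before any processing happens."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS raw_tweets (
                       id TEXT PRIMARY KEY,
                       payload TEXT NOT NULL)""")
    with open(dump_path, "a", encoding="utf-8") as dump:
        for tweet in tweets:
            line = json.dumps(tweet, ensure_ascii=False)
            dump.write(line + "\n")  # flat file: cheap, append-only
            # INSERT OR IGNORE so re-ingesting a batch never clobbers rows
            con.execute("INSERT OR IGNORE INTO raw_tweets VALUES (?, ?)",
                        (str(tweet.get("id")), line))
    con.commit()
    con.close()

archive_raw([{"id": 1, "text": "Huge #earthquake, felt it in Vegas!"}])

The append-only dump file is the safety net; the sqlite3 mirror just makes ad hoc queries against the raw data convenient.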