To share Twitter data with academic researchers, Pushshift needs to do so in a way that complies with Twitter's TOS. Twitter recently modified its TOS to allow sharing an unlimited number of tweet ids with other researchers.
The research team that receives this list of ids then needs to "rehydrate" the tweets via Twitter's API. One of the things I am looking into is what metadata can be shared along with each tweet id. This would enable researchers using the data to filter out tweets that aren't
needed for their research and reduce the number of API requests required to rehydrate the remaining tweet ids. Once I confirm with Twitter whether sharing metadata for each tweet id is possible, I want to ask other researchers which pieces of metadata are most important.
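To make the request savings concrete, here is a minimal sketch of how a list of tweet ids might be split into lookup batches. The batch size of 100 is an assumption based on Twitter's tweet lookup endpoints accepting up to 100 ids per request. `batch_ids` and `requests_needed` are hypothetical helper names, not part of any Pushshift or Twitter library.

```python
from typing import Iterator, List, Sequence

# Assumed per-request id limit; check the current Twitter API docs before relying on it.
BATCH_SIZE = 100


def batch_ids(tweet_ids: Sequence[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `size` tweet ids."""
    for start in range(0, len(tweet_ids), size):
        yield list(tweet_ids[start:start + size])


def requests_needed(n_ids: int, size: int = BATCH_SIZE) -> int:
    """Number of lookup requests needed to rehydrate `n_ids` tweets."""
    return -(-n_ids // size)  # ceiling division without floats
```

Under that assumption, and with purely illustrative numbers, filtering 1,000,000 ids down to 250,000 before rehydration would cut the lookups from 10,000 requests to 2,500.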
I am currently drafting a list of metadata fields to accompany each tweet id. My question to other researchers: what other metadata would help with pre-filtering the data? Here are the fields I've added so far:
tweet id (bigint)
user_id (bigint)
conversation_id (bigint)
is_retweet (bool)
is_quoted_tweet (bool)
is_reply (bool)
is_root (bool)
last_update_time (int)
retweet_count (int)
favorites_count (int)
reply_count (int)
contains_location_data (bool)
contains_media_types (enum int)
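As a rough illustration of how these fields could be used for pre-filtering, the sketch below models one metadata record and selects the ids worth rehydrating. The field names and types follow the list above; `TweetMeta`, `ids_to_rehydrate`, and `original_with_location` are hypothetical names, and the record format Pushshift will actually ship is not yet known.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class TweetMeta:
    """One metadata record per tweet id (hypothetical layout)."""
    tweet_id: int
    user_id: int
    conversation_id: int
    is_retweet: bool
    is_quoted_tweet: bool
    is_reply: bool
    is_root: bool
    last_update_time: int
    retweet_count: int
    favorites_count: int
    reply_count: int
    contains_location_data: bool
    contains_media_types: int  # enum encoding is not specified in the thread


def ids_to_rehydrate(
    records: Iterable[TweetMeta],
    keep: Callable[[TweetMeta], bool],
) -> List[int]:
    """Return the tweet ids of the records that pass the `keep` predicate."""
    return [r.tweet_id for r in records if keep(r)]


def original_with_location(r: TweetMeta) -> bool:
    """Example predicate: non-retweets that carry location data."""
    return not r.is_retweet and r.contains_location_data
```

A study of geotagged original tweets, for instance, could call `ids_to_rehydrate(records, original_with_location)` and send only the resulting ids to Twitter's API.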
[...] pre-filtering the data before rehydrating tweets. If you can think of any other useful metadata to accompany each tweet id, please feel free to add your ideas and recommendations. I will post an update once I get more guidance from Twitter.