I analyzed 24.7 million original tweets from all verified accounts and created a word-frequency CSV that can serve as a baseline for comparison against individual accounts. The file contains the top 250,000 words, each with its total count and frequency.
files.pushshift.io/twitter/twitte
#datascience
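One way such a file could be used as a baseline is sketched below in Python. This rests on assumptions: the column names `word,total,frequency` are guessed from the description above, not confirmed; `load_baseline` and `overrepresented` are hypothetical helper names; and the tiny tables are made-up toy values. The sketch converts an account's raw counts to relative frequencies and ranks words by how many times more often the account uses them than the baseline does.

```python
import csv
import io
from collections import Counter

def load_baseline(csv_text: str) -> dict[str, float]:
    """Parse a baseline table; the word,total,frequency header is an assumption."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["word"]: float(row["frequency"]) for row in reader}

def overrepresented(account_counts: Counter, baseline: dict[str, float],
                    floor: float = 1e-7, top: int = 10) -> list[tuple[str, float]]:
    """Rank an account's words by (account frequency / baseline frequency)."""
    total = sum(account_counts.values())
    scores = {}
    for word, count in account_counts.items():
        account_freq = count / total
        # Words missing from the baseline get an arbitrary small floor
        # so the ratio stays finite; the floor value is an assumption.
        scores[word] = account_freq / baseline.get(word, floor)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]

# Made-up toy data: a three-row baseline and one account's word counts.
baseline_csv = "word,total,frequency\nthe,500,0.05\ndata,10,0.001\nscience,5,0.0005\n"
account = Counter({"the": 5, "data": 3, "science": 2})
print(overrepresented(account, load_baseline(baseline_csv)))
```

Using a ratio means common words like "the" score near their baseline level, while words the account leans on unusually heavily rise to the top.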
Awesome 👏 Did you do any stopword removal or anything? Also, did you use any particular tokenization method?
I didn't remove any stop words. I excluded hashtags and user mentions from the pool so that the focus was on words actually used in sentences, etc.
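The reply doesn't name a tokenizer, so the following is only a minimal Python sketch. It assumes whitespace splitting, lowercasing, and trimming of surrounding punctuation. As described in the reply, tokens starting with `#` or `@` are dropped and stop words are kept. Treating "frequency" as each word's share of all counted tokens is also an assumption, and `tokenize`, `frequency_table`, and `write_csv` are hypothetical names.

```python
import csv
import io
from collections import Counter

# Assumed punctuation to trim from token edges; the original method is not specified.
EDGE_PUNCT = ".,!?;:\"'()[]"

def tokenize(text: str) -> list[str]:
    """Whitespace tokenizer that drops hashtags and @mentions but keeps stop words."""
    words = []
    for token in text.split():
        word = token.strip(EDGE_PUNCT)
        # Skip empty leftovers plus hashtags and user mentions.
        if not word or word.startswith(("#", "@")):
            continue
        words.append(word.lower())  # no stop-word removal
    return words

def frequency_table(tweets, top_n=250_000):
    """Return (word, total, frequency) rows for the top_n most common words."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    grand_total = sum(counts.values())
    # Frequency is assumed to be a word's share of all counted tokens.
    return [(word, total, total / grand_total)
            for word, total in counts.most_common(top_n)]

def write_csv(rows, out):
    """Write rows under a header; the column names are assumptions."""
    writer = csv.writer(out)
    writer.writerow(["word", "total", "frequency"])
    writer.writerows(rows)

# Toy usage with made-up tweets, writing to an in-memory buffer.
tweets = [
    "Loving #datascience with @friend today!",
    "Data is the new oil.",
    "The data speaks.",
]
rows = frequency_table(tweets)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

A run over millions of tweets would stream them from disk rather than hold them in a list; `Counter.update` works the same either way. When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends.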

