I analyzed 24.7 million original tweets from all verified accounts and created a word-frequency CSV that can be used as a baseline for comparison against individual accounts. The file contains the top 250,000 words, with the total and frequency for each.
files.pushshift.io/twitter/twitte
#datascience
Awesome 👏 Did you do any stopword removal or anything? Also, did you use any particular tokenization method?
Also, I lowercased the text to normalize it. Unfortunately, this causes issues with proper nouns, names, etc. It's a first attempt at getting a large baseline frequency for comparison against individual accounts.
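For anyone who wants to try the same idea on a single account, here is a minimal sketch of lowercase normalization followed by word counting. The thread does not say which tokenizer was used, so the regex below is purely an assumption, and `word_frequencies` and `sample` are hypothetical names for illustration. It also shows the proper-noun issue: "Apple" the company and "apple" the fruit end up as the same key.

```python
import re
from collections import Counter

def word_frequencies(texts):
    """Return (counts, relative frequencies) for lowercased tokens."""
    counts = Counter()
    for text in texts:
        # Assumed tokenizer (not from the thread): runs of letters,
        # digits, and apostrophes, taken after lowercasing the text.
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    total = sum(counts.values())
    # Relative frequency = word count / total token count.
    freqs = {w: c / total for w, c in counts.items()} if total else {}
    return counts, freqs

# Hypothetical sample: the brand and the fruit collapse into one key.
sample = ["Apple released a new phone", "I ate an apple today"]
counts, freqs = word_frequencies(sample)
print(counts["apple"], freqs["apple"])
```

Dividing an account's relative frequency for a word by the baseline's value for the same word would give a rough over- or under-use ratio, though how to handle words missing from the baseline is left open here.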

