I need to create a word frequency database in order to compare groups of data (for instance, one user's tweets / YouTube comments) and extract words / phrases that are used more often by that particular user / group than the universal word usage frequency would predict. Do you think it matters whether I use tweets, Reddit comments, or YouTube comments to generate the universal word frequency table? Would restrictions on the length of a comment / tweet influence the usage frequency of certain words? Would it make sense to combine Reddit, YouTube comments and

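A minimal sketch of the comparison described in the first tweet, in Python. It assumes a naive lowercase regex tokenizer and add-one (Laplace) smoothing; the helper names `tokenize`, `build_table`, and `overrepresented` are hypothetical, not from any particular library. The background table stands in for the "universal" frequency table, and each word in the group's text is scored by the log of its smoothed relative frequency in the group divided by its smoothed relative frequency in the background.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_table(docs: list[str]) -> Counter:
    # Count every token across all documents (tweets, comments, ...).
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def overrepresented(group: Counter, background: Counter, top: int = 10) -> list[tuple[str, float]]:
    # Score = log(group rate / background rate). Add-one smoothing over the shared
    # vocabulary keeps words missing from the background from dividing by zero.
    vocab_size = len(set(group) | set(background))
    g_total = sum(group.values())
    b_total = sum(background.values())
    scores = {}
    for word in group:
        p_group = (group[word] + 1) / (g_total + vocab_size)
        p_back = (background[word] + 1) / (b_total + vocab_size)
        scores[word] = math.log(p_group / p_back)
    # Most over-represented words first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


background = build_table(["the cat sat on the mat", "the dog ran", "a cat and a dog"])
user = build_table(["vaccine vaccine the vaccine works", "the vaccine"])
print(overrepresented(user, background, top=3))
```

With this toy background, "vaccine" ranks first, then "works", then "the" (which is common in both tables, so its score stays near zero). If rare words dominate the ranking on real data, a log-likelihood or weighted log-odds score would be a reasonable swap for the plain smoothed ratio.
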
search for users that use the word "vaccine(s)", you will get X users back, and many of them are just using it occasionally (perhaps because there is a discussion of vaccines going on). But what I want to do is pull out users that use the term much more often than the global frequency table would predict. Then you can detect users who are purposely trying to spread disinfo, users who are experts in the topic and like to talk about it, users who have a strong stance on the topic, etc.

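The filtering step in this reply can be sketched as a rate comparison: for each user, divide their hits on the target terms by their total token count and compare that to the same rate in the global table. The function name `flag_users`, the 5x `ratio` threshold, and the `min_tokens=500` cutoff are illustrative assumptions, and the counts below are toy numbers, not real data.

```python
from collections import Counter

# "vaccine(s)" from the thread, expanded into the two surface forms to match.
TARGET_TERMS = {"vaccine", "vaccines"}


def hits_and_total(counts: Counter, terms: set[str]) -> tuple[int, int]:
    """Return (occurrences of any target term, total tokens) for one count table."""
    return sum(counts[t] for t in terms), sum(counts.values())


def flag_users(
    user_counts: dict[str, Counter],
    global_counts: Counter,
    terms: set[str],
    ratio: float = 5.0,
    min_tokens: int = 500,
) -> list[tuple[str, int, int, float]]:
    """Flag users whose rate of `terms` is at least `ratio` times the global rate."""
    g_hits, g_total = hits_and_total(global_counts, terms)
    if g_hits == 0:
        raise ValueError("target terms never occur in the global table")
    global_rate = g_hits / g_total

    flagged = []
    for user, counts in user_counts.items():
        hits, total = hits_and_total(counts, terms)
        # Skip tiny corpora: one mention in 20 tokens would otherwise look extreme.
        if total < min_tokens or hits == 0:
            continue
        rate_ratio = (hits / total) / global_rate
        if rate_ratio >= ratio:
            flagged.append((user, hits, total, rate_ratio))
    # Highest over-representation first.
    return sorted(flagged, key=lambda row: row[3], reverse=True)


# Toy counts: the global table has 100 target hits per 1,000,000 tokens.
global_counts = Counter({"vaccine": 60, "vaccines": 40, "the": 999_900})
users = {
    "prolific": Counter({"vaccine": 10, "the": 49_990}),  # 10 in 50,000 tokens: 2x global
    "focused": Counter({"vaccines": 8, "the": 1_992}),    # 8 in 2,000 tokens: 40x global
    "casual": Counter({"vaccine": 1, "the": 99}),         # 100 tokens: below min_tokens
}
print(flag_users(users, global_counts, TARGET_TERMS))  # only "focused" is flagged
```

Ideally each user's own tokens would be subtracted from the global table before computing the global rate; with a large global corpus the difference is small, so the sketch skips it.
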
Just because a user uses a term X times doesn't mean they're discussing that topic more often than others; they might just be prolific and write a lot of comments / tweets. But if you can calculate how often they use the term relative to their entire corpus, then you
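The normalization idea in the last tweet can be made concrete with a significance score, since rates from small corpora are noisy. One common option, swapped in here as an assumption rather than anything the thread specifies, is Dunning's log-likelihood statistic (G²) on a 2x2 table of (user, reference) by (term, other tokens). The function names `g2` and `signed_g2` and the toy counts are hypothetical.

```python
import math


def g2(user_hits: int, user_total: int, ref_hits: int, ref_total: int) -> float:
    """Dunning log-likelihood (G^2) for a 2x2 table: (user, reference) x (term, other tokens)."""
    observed = [
        [user_hits, user_total - user_hits],
        [ref_hits, ref_total - ref_hits],
    ]
    row_totals = [user_total, ref_total]
    n = user_total + ref_total
    col_totals = [user_hits + ref_hits, n - user_hits - ref_hits]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute 0 (0 * ln 0 is taken as 0)
                expected = row_totals[i] * col_totals[j] / n
                stat += o * math.log(o / expected)
    return 2.0 * stat


def signed_g2(user_hits: int, user_total: int, ref_hits: int, ref_total: int) -> float:
    """Positive when the user's rate exceeds the reference rate, negative otherwise."""
    score = g2(user_hits, user_total, ref_hits, ref_total)
    overuse = user_hits / user_total > ref_hits / ref_total
    return score if overuse else -score


# Toy numbers, not real data. Reference corpus: 100 hits in 1,000,000 tokens.
prolific = signed_g2(10, 50_000, 100, 1_000_000)  # more raw mentions, 2x the reference rate
focused = signed_g2(8, 2_000, 100, 1_000_000)     # fewer raw mentions, 40x the reference rate
print(f"prolific: {prolific:.1f}  focused: {focused:.1f}")
```

Here the prolific user has more raw mentions (10 vs. 8), yet the focused user scores roughly ten times higher. Against the usual chi-square cutoff of 3.84 (1 degree of freedom, p = 0.05), only the focused user clears it.
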