Conversation

I need to create a word frequency database in order to compare groups of data (for instance, one user's tweets / YouTube comments) and extract words / phrases that are used more often by that particular user / group than in the universal word usage frequency. Do you think it matters if I use tweets or Reddit comments or YouTube comments to generate the universal word frequency table? Would restrictions on the length of a comment / tweet influence the usage frequency of certain words? Would it make sense to combine Reddit, YouTube comments and
A lot of great advice here -- thank you! What I'm trying to do is create a universal word frequency table so that we can then scan users to see which ones use terms / words at a much higher frequency than the universal set would lead you to expect. For instance, if you
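A rough sketch of the idea described above (compare each word's rate in one user's text against its rate in the universal table) might look like the following. Everything here is an assumption for illustration: the function names `count_tokens` and `overuse_ratios`, the simple regex tokenizer, the `min_count` cutoff, and the add-one smoothing that keeps words missing from the universal table from dividing by zero.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters/apostrophes (a naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def count_tokens(texts):
    """Return (Counter of word counts, total number of tokens) for a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts, sum(counts.values())


def overuse_ratios(user_texts, universal_texts, min_count=2):
    """Rank a user's words by (user rate) / (universal rate), highest first.

    Rates use add-one smoothing (an assumption, not something from the thread)
    so that a word absent from the universal table still gets a finite ratio.
    Words the user wrote fewer than `min_count` times are skipped as noise.
    """
    user_counts, user_total = count_tokens(user_texts)
    uni_counts, uni_total = count_tokens(universal_texts)
    vocab_size = len(set(user_counts) | set(uni_counts))

    ratios = {}
    for word, count in user_counts.items():
        if count < min_count:
            continue
        user_rate = (count + 1) / (user_total + vocab_size)
        uni_rate = (uni_counts[word] + 1) / (uni_total + vocab_size)
        ratios[word] = user_rate / uni_rate
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Toy data, made up for the example.
universal = ["the cat sat on the mat", "the dog ate the food", "a bird sang in the tree"]
user = ["the telescope is great", "my telescope shows the moon"]
ranked = overuse_ratios(user, universal)
```

With this toy data, "telescope" ends up at the top of `ranked` with a ratio above 1, while "the" lands below 1 because the user uses it less often than the universal set does. A real table would need far more text and probably a better tokenizer.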
Just because a user uses a term X times doesn't mean they're discussing that topic much more frequently than others -- they might just be prolific and write a lot of comments / tweets -- but if you calculate the frequency of use relative to their entire corpus, then you
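To make the raw-count versus relative-frequency point concrete, here is a small sketch with two made-up users. The helper `rate_per_thousand`, the word "telescope", and the corpus sizes are all hypothetical; the only point is that the user with the higher raw count can have the much lower rate.

```python
from collections import Counter


def rate_per_thousand(tokens, word):
    """Occurrences of `word` per 1,000 tokens in one user's corpus."""
    if not tokens:
        return 0.0
    return 1000 * Counter(tokens)[word] / len(tokens)


# A prolific user: 50 uses of the word, but buried in 100,000 tokens.
prolific = ["telescope"] * 50 + ["filler"] * 99_950
# A focused user: only 20 uses, but in a corpus of just 1,000 tokens.
focused = ["telescope"] * 20 + ["filler"] * 980

prolific_rate = rate_per_thousand(prolific, "telescope")  # 0.5 per 1,000 tokens
focused_rate = rate_per_thousand(focused, "telescope")    # 20.0 per 1,000 tokens
```

By raw count the prolific user looks more interested (50 vs 20), but per 1,000 tokens the focused user uses the word 40 times as often, which is the number worth comparing against the universal table.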