Conversation

I need to create a word frequency database in order to compare groups of data (for instance, one user's tweets / YouTube comments) and extract words / phrases that are used more often by that particular user / group than in the universal word usage frequency. Do you think
it matters if I use tweets or Reddit comments or YouTube comments to generate the universal word frequency table? Would restrictions on the length of a comment / tweet influence the usage frequency of certain words? Would it make sense to combine Reddit, YouTube comments and