I need to create a word frequency database in order to compare groups of data (for instance, one user's tweets / YouTube comments) and extract words / phrases that are used more often by that particular user / group than in the universal word usage frequency.
Do you think it matters if I use tweets or Reddit comments or YouTube comments to generate the universal word frequency table? Would restrictions on the length of a comment / tweet influence the usage frequency of certain words?
Would it make sense to combine Reddit comments, YouTube comments and tweets to create the global word usage frequency table?
Opinions?
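For reference, the comparison described above could be sketched roughly like this. It is a minimal illustration rather than a definitive method: the function names `word_counts` and `overused_words`, the simple regex tokenizer, and the add-one smoothing are all assumptions made for the example. It compares relative frequencies (counts divided by total tokens) rather than raw counts, so corpora of different sizes can be compared.

```python
import re
from collections import Counter


def word_counts(texts):
    """Count lowercase word tokens across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # Very simple tokenizer (an assumption): runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def overused_words(group_texts, reference_texts, top_n=10):
    """Rank words by how much more often the group uses them than the reference.

    Returns (word, ratio) pairs, highest ratio first. A ratio above 1 means
    the word is relatively more frequent in the group than in the reference.
    """
    group = word_counts(group_texts)
    reference = word_counts(reference_texts)
    vocab = set(group) | set(reference)

    # Add-one smoothing so words missing from the reference do not divide by zero.
    group_total = sum(group.values()) + len(vocab)
    reference_total = sum(reference.values()) + len(vocab)

    ratios = {}
    for word in group:
        group_freq = (group[word] + 1) / group_total
        reference_freq = (reference[word] + 1) / reference_total
        ratios[word] = group_freq / reference_freq

    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Toy data, invented purely for illustration.
user_posts = ["pizza pizza pizza is great", "i love pizza"]
universal_posts = ["the weather is great", "i love the rain", "the cat is here"]
ranking = overused_words(user_posts, universal_posts)
```

With the toy data, `ranking` starts with "pizza", since it is frequent for the user and absent from the reference. Building `universal_posts` from a different platform would change the reference frequencies, which is exactly the concern raised in the question.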
Replying to
Yes. But I think the larger issue you may face is that different moderation standards lead to different word choices.