I am currently working on a very large word frequency distribution table built from over one billion Reddit comments. This is more challenging than I originally expected, because I have to do a lot of filtering to remove bots from the word source: bots like AutoModerator would otherwise inflate the frequency of certain words. The first iteration of this project will simply use all words, lowercased.
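For anyone curious what "filter out bots, then count lowercased words" might look like, here is a minimal sketch. It assumes each comment is a dict with `author` and `body` fields. The bot list (`BOT_AUTHORS`) and the tokenizer regex are hypothetical placeholders, not the actual filter or tokenizer used for this project.

```python
import re
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical bot list; the real filter list is not specified in the post.
BOT_AUTHORS = {"automoderator"}

# Assumed tokenizer: runs of lowercase letters, digits, and apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def word_frequencies(comments: Iterable[Mapping[str, str]]) -> Counter:
    """Count lowercased words across comments, skipping known bot authors."""
    counts: Counter = Counter()
    for comment in comments:
        # Compare author names case-insensitively (AutoModerator vs automoderator).
        if comment.get("author", "").lower() in BOT_AUTHORS:
            continue
        # Lowercase the body first so "The" and "the" count as the same word.
        counts.update(WORD_RE.findall(comment.get("body", "").lower()))
    return counts


if __name__ == "__main__":
    sample = [
        {"author": "alice", "body": "The cat sat on the mat."},
        {"author": "AutoModerator", "body": "Your post has been removed."},
        {"author": "bob", "body": "THE END"},
    ]
    print(word_frequencies(sample).most_common(3))
```

Because `word_frequencies` accepts any iterable, it can be fed a generator that reads comments one at a time, so memory grows with the vocabulary size rather than with the number of comments.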
I'll be putting up a sample soon. Sorry if you have messaged me and I haven't had a chance to respond yet.
