Machine learning is too often perceived as a kind of oracle-like power capable of predicting the unknowable and working out of distribution.
The difference in ratings between what people could predict today (with respect to current articles) and what people would think 20 years from now is not something a classifier could possibly predict. So nothing is lost.
-
At the same time, the classifier trained on old articles would *simply not work* on today's articles. Too many words/concepts would be unknown or redefined, and the style cues it uses would be outdated. It would produce largely garbage output, or at least very biased output.
-
How do we know concept drift has been that significant since, say, the ‘90s? Particularly in formal news articles, where new slang is less common. I can think of particular examples of new/redefined words, but I don’t know how much this problem would degrade the classifier’s utility.
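One rough way to put a number on the vocabulary side of this question is to measure how many tokens in recent articles never appear in the older training set. The sketch below is only an illustration: `tokenize`, `oov_rate`, and the two toy sentences are made up for this example, and a real check would need actual article corpora from both periods.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def oov_rate(train_docs, test_docs):
    """Return the fraction of test tokens never seen in the training documents."""
    vocab = {tok for doc in train_docs for tok in tokenize(doc)}
    test_tokens = [tok for doc in test_docs for tok in tokenize(doc)]
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)

# Toy, made-up sentences standing in for "old" and "new" articles.
old_articles = ["The senator spoke about the budget on television."]
new_articles = ["The senator tweeted about the budget on a podcast."]

print(f"Out-of-vocabulary rate: {oov_rate(old_articles, new_articles):.2f}")
```

This only catches words that are new. It says nothing about words that were redefined or about outdated style cues, so a low rate would not by itself show that the classifier still works.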