Well this is annoying. How do you prevent a classifier from biasing towards the majority class in the test data? Oversample the minority?
Replying to @avibryant
.@avibryant binary text classifier. When it encounters an unknown feature, defaults to majority class, overriding weaker minority features
Replying to @abestanway
@abestanway I have very little experience with text classification but that just seems like a bug. An unknown feature should have no weight.
Replying to @avibryant
@avibryant revised: weak features alone also bias towards majority (common words like and, the, etc). Symptom of unbalanced data, seems like
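The thread does not say which classifier is in use. As a rough sketch only, assuming a multinomial naive Bayes model with add-one smoothing, the toy example below shows one way this symptom can arise: the unknown token contributes zero weight, yet the class prior learned from unbalanced data still outweighs a weak minority feature. The data, `train`, `log_odds`, and the `use_prior` flag are all hypothetical names made up for illustration.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (tokens, label) pairs with label 0 (majority) or 1 (minority)."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def log_odds(tokens, model, use_prior=True):
    """Log-odds of class 1 vs class 0; positive means the minority class wins."""
    word_counts, class_counts, vocab = model
    total = {c: sum(word_counts[c].values()) for c in (0, 1)}
    # The prior term is the only place class imbalance enters directly.
    score = math.log(class_counts[1] / class_counts[0]) if use_prior else 0.0
    for tok in tokens:
        if tok not in vocab:
            continue  # unknown feature: contributes no weight at all
        p1 = (word_counts[1][tok] + 1) / (total[1] + len(vocab))
        p0 = (word_counts[0][tok] + 1) / (total[0] + len(vocab))
        score += math.log(p1 / p0)
    return score

# Hypothetical unbalanced corpus: 8 majority documents, 2 minority documents.
docs = (
    [(["the", "report"], 0)] * 6
    + [(["the", "refund"], 0)] * 2
    + [(["the", "refund"], 1)] * 2
)
model = train(docs)

query = ["refund", "zzz"]  # one weak minority feature plus one unknown token
with_prior = log_odds(query, model)                      # negative: majority wins
without_prior = log_odds(query, model, use_prior=False)  # positive: minority wins
```

Dropping or flattening the prior here plays the same role as rebalancing the training set by oversampling the minority class; which of the two is appropriate depends on the real class frequencies at prediction time.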
Replying to @abestanway
@abestanway what does the ROC curve look like? Even if there's bias, is there no threshold you could pick that has good performance?
8:38 PM - 26 Oct 2014
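A minimal sketch of the threshold idea from the last reply, using made-up scores rather than anything from the thread: sweep every candidate cutoff, record the ROC point at each, and pick the one that maximises Youden's J (TPR minus FPR). The selection criterion and the helper names `roc_points` and `best_threshold` are assumptions chosen for illustration; other criteria (a fixed false-positive budget, F1) would work the same way.

```python
def roc_points(scores, labels):
    """Return (threshold, fpr, tpr) triples, predicting positive when score >= threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((thr, fp / neg, tp / pos))
    return points

def best_threshold(scores, labels):
    """Pick the threshold that maximises Youden's J = TPR - FPR."""
    return max(roc_points(scores, labels), key=lambda p: p[2] - p[1])[0]

# Hypothetical classifier outputs: every minority example (label 1) scores
# below 0.5, so a default 0.5 cutoff would predict the majority class for all.
scores = [0.05, 0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.35, 0.40]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

thr = best_threshold(scores, labels)
predictions = [int(s >= thr) for s in scores]
```

In this toy case the scores are biased toward the majority class but still rank the two classes correctly, so a lower cutoff separates them cleanly. If the ranking itself were poor, no threshold would help and the ROC curve would show it.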