Well this is annoying. How do you prevent a classifier from biasing towards the majority class in the test data? Oversample the minority?
@abestanway I have very little experience with text classification but that just seems like a bug. An unknown feature should have no weight.
@avibryant revised: weak features alone also bias towards the majority (common words like "and", "the", etc.). Symptom of unbalanced data, seems like -
@abestanway what does the ROC curve look like? Even if there's bias, is there no threshold you could pick that has good performance?
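A minimal sketch of the threshold idea in the last reply, assuming the classifier emits a score per example. The scores and labels below are made up, `roc_points` and `best_threshold` are hypothetical helper names, and picking the point that maximizes TPR minus FPR (Youden's J) is just one possible criterion, not something proposed in the thread.

```python
def roc_points(scores, labels):
    """Return (threshold, fpr, tpr) for each distinct score, highest first."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        # Predict positive whenever the score is at or above the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def best_threshold(scores, labels):
    """Pick the ROC point that maximizes TPR - FPR (an assumed criterion)."""
    return max(roc_points(scores, labels), key=lambda p: p[2] - p[1])


# Made-up, imbalanced example: 7 negatives, 3 positives, and every score
# sits below 0.5, so a default 0.5 cutoff would predict the majority class
# for all ten examples.
scores = [0.05, 0.10, 0.15, 0.20, 0.22, 0.25, 0.30, 0.35, 0.40, 0.45]
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]

threshold, fpr, tpr = best_threshold(scores, labels)
print(threshold, fpr, tpr)
```

On this toy data the scores still rank the minority class near the top, so a lower cutoff recovers it even though the raw scores are biased towards the majority.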