Seems it should increase the bias of the individual weak classifiers but avoid systematic bias across the ensemble. (cc @posco, @avibryant.)
1 more reply
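A quick way to sanity-check that intuition is to duplicate part of a training set (a crude stand-in for correlated observations) and compare a single weak learner against a bagged ensemble of the same learners. This is only an illustrative sketch; the dataset, models, and duplication scheme below are assumptions, not anything stated in the thread.

```python
# Illustrative sketch (dataset, models, and duplication scheme are assumptions):
# duplicate a chunk of the training rows to mimic correlated observations, then
# compare one shallow tree against a bagged ensemble of the same trees.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Duplicate 500 randomly chosen training rows (the "correlated" observations).
rng = np.random.RandomState(0)
dup = rng.choice(len(X_train), size=500, replace=True)
X_corr = np.vstack([X_train, X_train[dup]])
y_corr = np.concatenate([y_train, y_train[dup]])

weak = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_corr, y_corr)
bagged = BaggingClassifier(
    DecisionTreeClassifier(max_depth=3), n_estimators=200, random_state=0
).fit(X_corr, y_corr)

print("single weak learner:", weak.score(X_test, y_test))
print("bagged ensemble:    ", bagged.score(X_test, y_test))
```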
New conversation
@patrickc create a feature that describes why they're correlated. if non-generalizable (e.g., user_id), consider downweighting or subsampling
@ihat What if you don't know how they are or might be?
2 more replies
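As a rough sketch of the downweighting idea in that reply (the toy frame, column names like user_id, and the choice of model are assumptions): weight each row inversely to how many rows share its user_id, so no single user dominates training, and pass the weights to any estimator that accepts sample_weight. Subsampling is the same idea taken to the extreme of one row per user.

```python
# Illustrative sketch, not from the thread: downweight rows so each user_id
# contributes equal total weight, or subsample down to one row per user.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 4],
    "x":       [0.2, 0.3, 0.1, 0.9, 1.1, 0.5, 0.4, 1.3],
    "y":       [0, 0, 0, 1, 1, 0, 0, 1],
})

# Weight each row by 1 / (rows from the same user), so every user sums to 1.
weights = 1.0 / df.groupby("user_id")["user_id"].transform("count")

model = LogisticRegression()
model.fit(df[["x"]], df["y"], sample_weight=weights)

# Subsampling alternative: keep a single row per user.
one_per_user = df.groupby("user_id", group_keys=False).sample(n=1, random_state=0)
```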
New conversation
@patrickc That would be an ecumenical matter.
@patrickc PCA is the most popular technique here, but for full 2014 buzzword compliance, you'll need deep learning: http://www.cs.toronto.edu/~hinton/science.pdf
@patrickc TBH, random forests do well with correlated features as long as you use decent regularization. PCA more important for (e.g.) GMMs.
End of conversation
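For what the PCA route in that exchange can look like in practice, here is a minimal sketch: project onto uncorrelated principal components before fitting a model that is sensitive to correlated inputs. The dataset, scaler, and component counts are assumptions; the GMM is just the example named in the tweet.

```python
# Illustrative sketch: decorrelate features with PCA before a GMM,
# following the tweet's example. Dataset and component counts are assumptions.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

pipeline = make_pipeline(
    StandardScaler(),      # PCA assumes centered (and here, scaled) inputs
    PCA(n_components=2),   # keep the top 2 uncorrelated components
    GaussianMixture(n_components=3, random_state=0),
)
pipeline.fit(X)
print(pipeline.predict(X)[:10])  # cluster assignments from the GMM
```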
New conversation
@patrickc The term of art for this is “stratified sampling.”
@patrickc You can just employ stratified sampling and make the data representative of what you expect the incoming observations to be.
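A minimal sketch of the stratified sampling both replies point at: draw the split so that each stratum appears in the training and held-out sets in the same proportion as in the full data. Stratifying on the label y is an assumption; the thread doesn't say which variable defines the strata.

```python
# Illustrative sketch: a stratified split keeps class (stratum) proportions the
# same across the full, training, and held-out sets. Stratifying on y is assumed.
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
print("full:", Counter(y), "train:", Counter(y_train), "test:", Counter(y_test))
```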
@patrickc I'm not sure Patrick - I know there is some research on ML models with dependent data, I could dig up my MSc thesis...